This document describes how references to Greek words have been encoded in the Cologne edition of the Monier-Williams Sanskrit-English dictionary. The result of this work may be seen in two places:
The starting point of this work was an earlier (2007) edition of the xml form of MW in which Greek was embedded in the Beta encoding. This work was done by Wendy Teo under the supervision of Peter Scharf and Malcolm Hyman. In this earlier edition, the position of text in the Greek alphabet had been indicated in Malten's initial coding by a placeholder character, '$'; text in other alphabets, such as Arabic, had been similarly indicated. The task was thus reduced to examining all such positions in relation to the scanned image of the page; for those instances identified as Greek, the placeholder character was replaced by a standard transliteration of the Greek. This transliteration was delimited by an identifiable character string, so that subsequent work could identify the instances of Greek transliteration.
The next step was to integrate this encoding of the earlier MW edition into the current database representing MW. I chose to replace the still-present placeholder character with an abstract element <gk>n</gk>. The content of this element is an index number within the MW record. For instance, <gk>1</gk> represents the 1st Greek word in a given record, <gk>2</gk> represents the 2nd Greek word in the same record, and so forth.
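The numbering scheme can be illustrated with a minimal Python sketch. It is not the code actually used for the MW database; the function name number_greek_placeholders is hypothetical, and the sketch assumes for simplicity that every '$' placeholder in the record stands for Greek (in the real data some placeholders mark other alphabets, such as Arabic).

```python
import re


def number_greek_placeholders(record_text, placeholder="$"):
    """Replace each placeholder in one record with <gk>n</gk>, n = 1, 2, ...

    Assumption (for illustration only): every placeholder in the record
    marks a Greek word. The counter restarts for each record because the
    function is called once per record.
    """
    counter = 0

    def replace_one(match):
        nonlocal counter
        counter += 1
        return f"<gk>{counter}</gk>"

    # re.escape is needed because '$' is a regex metacharacter.
    return re.sub(re.escape(placeholder), replace_one, record_text)


if __name__ == "__main__":
    print(number_greek_placeholders("cf. Gk. $, $; Lat. ager"))
```

Because the index restarts in every record, a Greek word is identified by the pair (record, n) rather than by n alone.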
Then, a separate database table, mwgreek, was created. It contains two fields:
A)<e>A<e>ἀ<gk>A)N<e>A%29N<e>ἀν
The transcoding of Beta code into Unicode for the Greek was accomplished by
creating a data table, 'beta_greek.xml'. This table is processed in the
normal way by the transcoding routine, which was developed in Java by
Ralph Bunker and later recoded in PHP. A full account can be
seen by pressing the 'alphabet' button.
This table was developed on the basis of the EpiDoc open source project, which was
identified by Gregory Crane, director of the Perseus project, in a communication to Scharf. Specifically, the beta_greek.xml file was based on an examination
of the files BetaCodeConvert.properties and UnicodeCConverter.properties.
The first file maps beta codes (letters and diacritics) to property
names (such as 'A = alpha', '*A = Alpha'), and the second file maps
these property names to Unicode code points (such as 'alpha = \u03B1' and
'Alpha = \u0391'). Another online reference describes how diacritics are attached to letters (they follow the letter). Putting these pieces together seems to give a fairly good
representation of Greek in Unicode from the Beta transliteration.
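The two-step lookup described above (beta code to property name, then property name to Unicode code point) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the contents of beta_greek.xml or of the EpiDoc property files: the entries for 'A' and '*A' come from the examples above, while the entry for nu, the property names 'psili', 'dasia' and 'oxia', and the use of combining characters followed by NFC normalization are assumptions made for this sketch.

```python
import unicodedata

# Step 1: beta code -> property name. Only 'A' and '*A' are taken from the
# text; the remaining entries are hypothetical, added for illustration.
BETA_TO_NAME = {
    "A": "alpha",
    "*A": "Alpha",
    "N": "nu",
    ")": "psili",   # smooth breathing
    "(": "dasia",   # rough breathing
    "/": "oxia",    # acute accent
}

# Step 2: property name -> Unicode. Diacritics are given as combining
# characters (an assumption of this sketch).
NAME_TO_UNICODE = {
    "alpha": "\u03B1",
    "Alpha": "\u0391",
    "nu": "\u03BD",
    "psili": "\u0313",
    "dasia": "\u0314",
    "oxia": "\u0301",
}


def beta_to_unicode(beta):
    """Convert a Beta code string to Unicode Greek using the tiny tables above."""
    pieces = []
    i = 0
    while i < len(beta):
        # '*' marks a capital and is read together with the following letter.
        token = beta[i:i + 2] if beta[i] == "*" else beta[i]
        name = BETA_TO_NAME.get(token)
        if name is None:
            raise ValueError(f"unknown beta code {token!r} at position {i}")
        pieces.append(NAME_TO_UNICODE[name])
        i += len(token)
    # Diacritics follow their letter, so NFC normalization can merge each
    # letter with its combining marks into a precomposed character.
    return unicodedata.normalize("NFC", "".join(pieces))


if __name__ == "__main__":
    print(beta_to_unicode("A)N"))
```

With these tables, 'A)N' yields the two characters U+1F00 and U+03BD, that is, ἀν. Capital letters with diacritics and the rest of the alphabet are not covered by this sketch.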
I am aware of two blemishes introduced by this implementation:
Although the results as now constituted are acceptable, there are a few places where further attention could lead to improvements: