Grady Ward's Moby

Moby Part-of-Speech

This second edition is a particularly thorough revision of the original Moby Part-of-Speech. Beyond the fifteen thousand new entries, many thousands more entries have been scrutinized for correctness and modernity. This is unquestionably the largest P-O-S list in the world. Because the list includes many phrases, parsing algorithms can tokenize in units larger than a single word, increasing both speed and accuracy.
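As an illustration of tokenizing in units larger than a single word, here is a minimal greedy longest-match sketch in Python. It assumes the phrases from the list have already been loaded into a set of space-separated strings. The function name `tokenize_longest_match`, the `max_len` cap of five words, and the toy lexicon are hypothetical; this is one common technique, not a method prescribed by the database.

```python
def tokenize_longest_match(words, lexicon, max_len=5):
    """Greedily group words into the longest phrase present in lexicon.

    words   -- list of word strings, already split on whitespace
    lexicon -- set of known phrases as space-separated strings (assumed format)
    max_len -- longest phrase, in words, to try (hypothetical cap)
    """
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    tokens = []
    i = 0
    while i < len(words):
        # Try the longest candidate first; a single word is always accepted.
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if n == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += n
                break
    return tokens


# Toy lexicon for illustration only; not actual Moby data.
print(tokenize_longest_match("we made an ad hoc decision".split(), {"ad hoc"}))
# -> ['we', 'made', 'an', 'ad hoc', 'decision']
```

Greedy matching is fast but can occasionally choose a long phrase where a shorter split would be correct; a dynamic-programming or statistical segmenter avoids that at some extra cost.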

Database Legend

Each part-of-speech vocabulary entry consists of a word or phrase field, followed by a field delimiter (ASCII 215), followed by the part-of-speech field, which is coded with the following ASCII symbols (case is significant):
    Noun                            N
    Plural                          p
    Noun Phrase                     h
    Verb (usually participle)       V
    Verb (transitive)               t
    Verb (intransitive)             i
    Adjective                       A
    Adverb                          v
    Conjunction                     C
    Preposition                     P
    Interjection                    !
    Pronoun                         r
    Definite Article                D
    Indefinite Article              I
    Nominative                      o
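The code table can be expressed as a simple lookup. The following Python sketch is illustrative only: the names `POS_CODES` and `decode_pos` are hypothetical. It keeps the codes in their original order, since the principal part of speech is listed first, and it distinguishes upper- and lower-case codes (for example `V` and `v`) because case is significant.

```python
# Hypothetical lookup built from the code table; keys are case-sensitive.
POS_CODES = {
    "N": "Noun",
    "p": "Plural",
    "h": "Noun Phrase",
    "V": "Verb (usually participle)",
    "t": "Verb (transitive)",
    "i": "Verb (intransitive)",
    "A": "Adjective",
    "v": "Adverb",
    "C": "Conjunction",
    "P": "Preposition",
    "!": "Interjection",
    "r": "Pronoun",
    "D": "Definite Article",
    "I": "Indefinite Article",
    "o": "Nominative",
}


def decode_pos(field):
    """Translate a part-of-speech field into names, principal use first."""
    try:
        return [POS_CODES[code] for code in field]
    except KeyError as exc:
        raise ValueError(f"unknown code {exc.args[0]!r} in field {field!r}") from None


print(decode_pos("Nt"))  # -> ['Noun', 'Verb (transitive)']
```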
Each two-part vocabulary record is separated from the next by CRLF (ASCII 13/10).

For example, the entry for engineer has the part-of-speech field Nt, meaning that the word has two main uses in English. Its principal part of speech is as a noun ("That engineer could write in microcode with one hand and in Ada with the other"), and its secondary part of speech is as a transitive verb ("We sure engineered that software to death").

In many cases, the -ed, -ing, -ly, and -ic forms of words are not explicitly listed; participle forms of verbs are usually marked simply with V rather than the more specific t or i. Words such as "be," which often have more than one head entry in a dictionary, have a single listing with the parts of speech for all senses concatenated. Foreign words commonly used in English usually keep their diacritical marks; for example, e with an acute accent (é) is encoded as ASCII 142.
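A minimal Python sketch of reading the raw file under these conventions follows. It works on bytes, splitting records on CRLF and each record on the byte-215 delimiter. Decoding the word field as Mac OS Roman is an assumption: it is consistent with é being byte 142 in that encoding, but the documentation does not name the character set. The function name `parse_moby_pos` and the sample bytes are hypothetical and are not taken from the distributed file.

```python
FIELD_DELIMITER = b"\xd7"   # byte value 215: separates word/phrase from codes
RECORD_DELIMITER = b"\r\n"  # CRLF (13/10): ends each record


def parse_moby_pos(data: bytes, encoding: str = "mac_roman") -> dict:
    """Return a mapping from word or phrase to its part-of-speech code string."""
    entries = {}
    for record in data.split(RECORD_DELIMITER):
        if not record:
            continue  # e.g. the empty piece after a trailing CRLF
        # The code field is plain ASCII, so split on the last delimiter byte.
        word, sep, codes = record.rpartition(FIELD_DELIMITER)
        if not sep:
            raise ValueError(f"record has no field delimiter: {record!r}")
        # Assumed encoding for the word field (see lead-in).
        entries[word.decode(encoding)] = codes.decode("ascii")
    return entries


# Illustrative bytes only: "engineer" with codes Nt, and "café" with é as byte 142.
sample = b"engineer\xd7Nt\r\ncaf\x8e\xd7N\r\n"
print(parse_moby_pos(sample))  # -> {'engineer': 'Nt', 'café': 'N'}
```

To use it on the downloaded list, read the file in binary mode (open(path, "rb")) and pass its contents; if accented words come out wrong, the encoding assumption is the first thing to revisit.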
This project is available for download [1.2 MB].

Last modified: October 24, 2000
The Institute for Language, Speech and Hearing, The University of Sheffield