Grady Ward's Moby

Moby Thesaurus

Moby Thesaurus is the largest and most comprehensive thesaurus data source in English available for commercial use. This second edition has been thoroughly revised, adding more than 5,000 root words (for a total of more than 30,000) and an additional million synonyms and related terms (for a total of more than 2.5 million).

Although this thesaurus is provided in a very simple ASCII format suitable for viewing, editing, and automatic parsing, most users will want to consider reformatting schemes that represent the data in a more economical form, such as tables of related terms whose index can be shared by many roots. This is roughly the technique used by the thesaurus in print form, which couples a large index with synonyms grouped under abstract (and arbitrary) headings in the front matter. Tables of related terms can be stored in, for example, LZ-compressed form until actually required by the application. Combining such schemes could easily reduce the storage requirement of this data by an order of magnitude or more.

The supplementary file, roget13a.txt, provides a small thesaurus already organized in this form that you may wish to use as a guide when developing your own categories of synonyms. Uncommon words can also be stripped out according to the developer's own criteria, keeping only the core, most frequently used information.

Once unarchived, the database is a flat ASCII file. Each record is a comma-separated list terminated by a carriage return/linefeed pair [ASCII 13/10], as in the example record below.

In this example, the root word is 'frill'; the root is always the first word of the list. The synonyms and related words are listed in ASCII alphabetical order after the root. Each entry, including the root, is followed by a comma, except the last entry in a record, which is followed by a carriage return/linefeed [ASCII 13/10].
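
The record layout described above (root first, comma-separated entries, CR/LF terminator) can be read with a few lines of Python. This is only a minimal sketch and not part of the original distribution: the function names `parse_record` and `load_thesaurus` are made up for illustration, and the sample line is a shortened stand-in for a real record.

```python
from typing import Dict, Iterable, List, Tuple


def parse_record(line: str) -> Tuple[str, List[str]]:
    """Split one record into (root, related terms)."""
    # Drop the CR/LF terminator, then split on the comma delimiter.
    fields = [field.strip() for field in line.rstrip("\r\n").split(",")]
    # Ignore empty fields, such as one produced by a trailing comma.
    fields = [field for field in fields if field]
    if not fields:
        raise ValueError("empty record")
    return fields[0], fields[1:]


def load_thesaurus(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Build a root -> related-terms mapping from an iterable of records."""
    thesaurus: Dict[str, List[str]] = {}
    for line in lines:
        if line.strip():  # skip blank lines
            root, related = parse_record(line)
            thesaurus[root] = related
    return thesaurus


# Shortened, illustrative record; a real record is much longer.
sample = "frill,binding,bonus,colors of rhetoric,wrinkle\r\n"
root, related = parse_record(sample)
print(root, related)
# prints: frill ['binding', 'bonus', 'colors of rhetoric', 'wrinkle']
```

Because `load_thesaurus` accepts any iterable of lines, an open file object can be passed to it directly. Opening the file with `newline=""` keeps the CR/LF terminators intact, although `parse_record` strips them either way.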

binding,bonus,bordering,bordure,bravery,chiffon,clinquant,colors,
colors of rhetoric,crease,creasing,crimp,crisp,decoration,dog-ear,
double,double over,doubling,duplication,duplication of effort,
duplicature,edging,elegant variation,embellishment,embroidery,
enfold,expletive,extra,extra added attraction,extra dash,
extravagance,fat,featherbedding,festoons,figure,figure of speech,
filigree,filling,fillip,fimbria,fimbriation,fine writing,finery,flection,
flowers of speech,flute,fold,fold over,folderol,foofaraw,frilliness,
frilling,frills,frills and furbelows,fringe,frippery,froufrou,furbelow,
fuss,gaiety,galloon,gather,gaudery,gewgaw,gilding,gilt,gingerbread,
hem,infold,interfold,jazz,lagniappe,lap over,lapel,lappet,list,lushness,
ostentation,overadornment,overlap,padding,paste,payroll padding,
plait,plat,pleat,pleonasm,plica,plicate,plication,plicature,ply,premium,
prolixity,purple patches,quill,redundance,redundancy,ruche,ruching,
ruff,ruffle,selvage,showiness,skirting,something extra,stuffing,superaddition,
trumpery,tuck,turn over,twill,twist,unnecessariness,valance,verbosity,
welt,wrinkle[carriage return]
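
The more economical storage mentioned earlier can be sketched roughly as follows. This is a simplified variant under stated assumptions, not the scheme used by any actual product: every distinct term is stored once in a shared index, each root's table holds only integer ids into that index, and each table stays compressed until it is looked up. Python's `zlib` (DEFLATE, an LZ77-based method) stands in for the "LZ compressed form", and the class name `CompactThesaurus` is hypothetical. The sketch does not attempt the Roget-style grouping of many roots under shared category tables.

```python
import zlib
from array import array
from typing import Dict, List


class CompactThesaurus:
    """Store each distinct term once; keep per-root id tables compressed."""

    def __init__(self, records: Dict[str, List[str]]) -> None:
        self.terms: List[str] = []            # shared index: id -> term
        self._ids: Dict[str, int] = {}        # reverse index: term -> id
        self._tables: Dict[int, bytes] = {}   # root id -> compressed id table
        for root, related in records.items():
            ids = array("I", (self._intern(term) for term in related))
            self._tables[self._intern(root)] = zlib.compress(ids.tobytes())

    def _intern(self, term: str) -> int:
        """Return the id of a term, adding it to the shared index if new."""
        if term not in self._ids:
            self._ids[term] = len(self.terms)
            self.terms.append(term)
        return self._ids[term]

    def lookup(self, root: str) -> List[str]:
        """Decompress one root's table on demand and map ids back to terms."""
        root_id = self._ids.get(root)
        if root_id is None or root_id not in self._tables:
            return []
        ids = array("I")
        ids.frombytes(zlib.decompress(self._tables[root_id]))
        return [self.terms[i] for i in ids]


# Tiny illustrative input; the real data has more than 30,000 roots.
compact = CompactThesaurus({
    "frill": ["ruffle", "fold"],
    "fold": ["crease", "frill"],
})
print(compact.lookup("frill"))
# prints: ['ruffle', 'fold']
```

How much space this saves depends on the data; on a toy input like the one above the compressed tables can even be larger than the raw text, so the sketch illustrates the structure rather than the savings.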
Part-of-speech information is not stored with this thesaurus. Grady Ward supplies a separate lexical database giving the part(s) of speech for a large collection (more than 200,000) of English words and phrases; it can be used in conjunction with this list to supply part-of-speech information if the particular application needs it.
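
As a rough illustration of combining the two resources, the sketch below filters a list of related terms by part of speech. It makes no assumption about the file format of the part-of-speech database: the lookup is simply a Python dictionary from term to a set of tags, and both the tag names and the sample entries are hypothetical, not taken from the real data.

```python
from typing import Dict, Iterable, List, Set


def filter_by_pos(terms: Iterable[str],
                  pos_lookup: Dict[str, Set[str]],
                  wanted: str) -> List[str]:
    """Keep only the terms whose tag set contains `wanted`.

    Terms missing from the lookup are dropped.
    """
    return [term for term in terms if wanted in pos_lookup.get(term, set())]


# Hypothetical tags for illustration only.
pos_lookup = {
    "fold": {"noun", "verb"},
    "ruffle": {"noun", "verb"},
    "frippery": {"noun"},
}
related = ["fold", "frippery", "ruffle", "twill"]
print(filter_by_pos(related, pos_lookup, "verb"))
# prints: ['fold', 'ruffle']
```

Dropping terms that have no entry in the lookup is a design choice of this sketch; an application could just as well keep them and treat their part of speech as unknown.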
This project is available here [12MB].

Last modified: October 24, 2000
The Institute for Language Speech and Hearing, The University of Sheffield