Dealing With Variation: Lexis

Christian Kay, University of Glasgow

“The trouble with the English language is that it has too many words.” So said one of my students in some disgust after coming across yet another meaning which was new to him. Two of the reasons for this state of affairs are homonymy and polysemy, both of which result in multiple meanings for a single form. A third is the degree of diatopic, diachronic and stylistic variation to be found in lexis, resulting in multiple forms for similar meanings, with implications for data-mining. Disambiguation of such forms has long been an issue in Natural Language Processing, where two main approaches have been taken: (1) the creation of semantic parsers which decompose meanings into smaller components that can then be matched in tree structures (as in WordNet, MindNet), and (2) probability-based searches which exploit the fact that meaning is governed by context, so that words tend to co-occur with others from the same semantic area. A component of the latter is often a semantically organised thesaurus.i
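The second approach can be illustrated with a minimal sketch in Python. It is not drawn from any of the systems named above: the toy thesaurus, the sense labels and the function name `disambiguate` are all assumptions made for illustration, and a simple count of overlapping context words stands in for a proper probabilistic model.

```python
from collections import Counter

# Hypothetical toy thesaurus (an assumption for illustration only):
# each sense of an ambiguous form is linked to words from the same semantic area.
TOY_THESAURUS = {
    "bank": {
        "finance": {"money", "loan", "account", "deposit", "cash"},
        "river": {"water", "stream", "shore", "fishing", "mud"},
    },
}


def disambiguate(word, context_words, thesaurus=TOY_THESAURUS):
    """Pick the sense whose semantic area shares most words with the context.

    Returns None if the word is not in the thesaurus or if no context
    word belongs to any of its semantic areas.
    """
    senses = thesaurus.get(word.lower())
    if not senses:
        return None
    # Count how often each context word occurs, ignoring case.
    context = Counter(w.lower() for w in context_words)
    # Score each sense by the number of context tokens in its semantic area.
    scores = {
        sense: sum(context[w] for w in related)
        for sense, related in senses.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    sentence = "she opened an account to deposit money at the bank"
    print(disambiguate("bank", sentence.split()))
```

In this sketch the context words "account", "deposit" and "money" all fall in the hypothetical "finance" area, so that sense is chosen; a context with no words from either area leaves the form unresolved.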

The Historical Thesaurus of English,ii which will be demonstrated, offers a sophisticated tool for online searching. Search words are, however, normalised to Oxford English Dictionary headwords. The full usefulness of such a tool for searching unannotated texts can only be realised when variant spellings can be incorporated.
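One way of incorporating variant spellings is to map each variant to a normalised headword before the search is run. The following Python sketch shows the idea under stated assumptions: the variant table is hand-written for illustration and is not taken from the Oxford English Dictionary or the Historical Thesaurus, and the names `VARIANT_TO_HEADWORD`, `normalise` and `normalise_text` are hypothetical.

```python
# Hypothetical, hand-written variant table (an assumption for illustration;
# not data from the OED or the Historical Thesaurus of English).
VARIANT_TO_HEADWORD = {
    "musick": "music",
    "publick": "public",
    "shew": "show",
}


def normalise(token, variants=VARIANT_TO_HEADWORD):
    """Return the headword for a known variant spelling.

    Tokens not listed in the table are returned lower-cased and unchanged.
    """
    key = token.lower()
    return variants.get(key, key)


def normalise_text(text, variants=VARIANT_TO_HEADWORD):
    """Normalise every whitespace-separated token in an unannotated text."""
    return [normalise(token, variants) for token in text.split()]


if __name__ == "__main__":
    print(normalise_text("The publick musick"))
```

A lookup table of this kind only covers variants that have been listed in advance; unlisted spellings pass through unchanged and would still be missed by a headword-based search.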

i. For an overview, see Wilks, Yorick A., Slator, Brian M., and Guthrie, Louise M., Electric Words: Dictionaries, Computers and Meanings (Cambridge, MA, and London: MIT Press, 1996). More recent work, including the MALT project (Mappings, Agglomerations and Lexical Tuning), is described at http://www.dcs.shef.ac.uk/~yorick/

ii. http://www.historicalthesaurus.arts.gla.ac.uk