Search the Corpus

356 documents in the corpus

147,129 total types

5,434,830 total tokens

CMSW consists mainly of written texts, selected on the basis of date and genre. Some texts have been chosen as ‘written records of speech’, e.g. minutes of meetings and transcripts of court proceedings. Other genres include personal writing, expository prose, verse/drama, journalism, and writing by orthoepists or commentators on language.

You can download the entire corpus as plain text.
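
For readers working with the plain-text download, the sketch below shows one way token and type totals like those above might be computed. It is only an illustration: the tokenizer (`TOKEN_RE`) and the function `count_tokens_and_types` are assumptions made here, not the method used to produce the CMSW figures, so counts obtained this way will probably differ from the published totals.

```python
import re
from collections import Counter

# Assumed tokenizer: runs of letters, optionally joined by an apostrophe
# or hyphen (e.g. "session's", "well-kent"). The corpus's own tokenization
# rules are not stated here and may differ.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")


def count_tokens_and_types(text):
    """Return (number of tokens, number of distinct types) for a text.

    Tokens are lowercased before counting, so "The" and "the" are
    treated as the same type. That is a choice made for this sketch.
    """
    tokens = [t.lower() for t in TOKEN_RE.findall(text)]
    counts = Counter(tokens)
    return len(tokens), len(counts)


sample = "The kirk session met, and the session's minutes were read."
n_tokens, n_types = count_tokens_and_types(sample)
print(n_tokens, n_types)  # 10 tokens, 9 types ("the" occurs twice)
```

To apply this to the download, read each plain-text file and pass its contents to `count_tokens_and_types`. For a corpus-wide type count, merge the token lists first, because per-document type counts cannot simply be added together.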