All documents are classified as belonging to one of the following genres, and we have tried where possible to include an equal number of words in each genre:

This is in addition to the subcorpus of language commentators, which consists of approximately 1 million words. A comprehensive list of all documents is available for browsing.

While CMSW has been assembled with the aim of being used as a complete corpus, some individual documents and collections of documents might also be of interest to researchers in their own right. See some highlights here.