Every document is assigned to one of the following genres, and where possible we have tried to include a roughly equal number of words in each genre:
- Administrative prose (such as legal documents or council minutes)
- Expository prose (such as travel narratives)
- Personal writing (such as diaries and personal letters)
- Instructional prose (such as textbooks and educational materials)
- Religious prose (including sermons)
- Verse and drama
- Imaginative prose (such as novels and short stories)
- Journalism
In addition to these genres, the corpus includes a subcorpus of writing by language commentators, which consists of approximately 1 million words. A comprehensive list of all documents is available for browsing.
Although CMSW was assembled to be used as a complete corpus, individual documents and collections of documents within it may also be of interest to researchers.