Author(s): Prof Christian Kay
Copyright holder(s): Prof Christian Kay
Friday 24th March 1995
From the point of view of students and researchers in Arts, one of the biggest advances in recent years has been the increased availability of texts. As previous speakers have shown, there is now a wealth of material available, both on-line over the Internet and processed on disk or CD-ROM. However, as in many areas of modern-day life, the problem is not so much getting hold of vast quantities of material, but what you do with it once you’ve got it. This paper will focus on various activities students can carry out using texts.
In our experience at STELLA, students take enthusiastically to searching for texts; locating a suitable text, and often locating it in an exotic environment, such as the Pentagon or a library in Japan, can be an exciting experience. Enthusiasm sometimes has to be curbed, since texts must be short enough to be manageable; equally, however, they must be long enough to yield sufficient data for analysis. The question “how long should my text be?” is therefore as unanswerable as “how long is a piece of string”! The student who wants, say, to look at the rhyme schemes of sonnets will require a good deal less text than one who wants to search for a fairly rare word, such as ‘however’, in a range of modern prose texts.
Having found the text, the next step is to download it on to a disk and to choose a suitable program to manipulate the data. When we first started doing this kind of work, we thought that these processes might cause problems, but in fact this has not been the case. Most student access is through COMET (Corpus of Modern English Texts), a text-collection project based at Glasgow. At the exit point from COMET, we display an instruction page prepared by the CTI Centre, Oxford (Computers in Teaching Initiative), pointing users in the direction of instructions for the main text-handling packages. In fact, we have now standardised on the use of certain concordance and database packages, which are adequate for most undergraduate needs.
For initial instruction on concordancing, we use the Longman Miniconcordancer. Although this package is far from perfect, in either its range of applications or its appearance, it contains its own small sample of texts and produces fast results. Students normally have a one-hour class where the principles and uses of concordances are discussed. This is followed up by a lab session where they work through an exercise, using either one of the Longman texts or one they have selected via COMET. The last stage is a second laboratory session where they work independently on a task.
In fact, the most difficult part of the exercise is not using the software, but convincing the students that the information they extract from it is actually useful! This is especially true now that our teaching extends beyond students of English. This year, in addition to our paper in Literary and Linguistic Computing for Honours English students, we have used concordances in a general Humanities Computing Course and with the students doing a master’s degree in information technology, not all of whom have a humanities background.
I usually begin the talk part of the exercise by asking the students to consider what computers do well and what they do badly. This produces a variety of answers, but in essence we can agree that computers do a variety of monotonous chores very well. In particular, they are good at recognising things, provided they know what to look for, at displaying things once they have found them, at counting things and presenting them in interesting ways. They are less good, or completely useless, at interpreting things or recognising their significance. A classic example here is alliteration. A computer can identify all initial occurrences of a particular letter or group of letters, but it cannot, without further input, translate them into sounds; even if it could, it could not say whether the repeated sounds were of literary significance.
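The mechanical half of this division of labour - spotting and counting initial letters, while leaving their sound and their literary significance entirely to the reader - can be sketched in a few lines of modern Python (an illustrative sketch; the function name and sample line are invented for the example, and belong to no package mentioned here):

```python
from collections import Counter

def initial_letter_counts(text):
    """Count how often each initial letter begins a word in a text.

    This is the part a computer does well; whether the repetitions
    are alliteration - or even the same sound - it cannot say.
    """
    # Normalise: strip common punctuation and lower-case each word.
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    return Counter(w[0] for w in words if w)

line = "Full fathom five thy father lies"
print(initial_letter_counts(line)["f"])  # 4 words begin with 'f'
```

The counts are exact, but the program has no idea that ‘ph’ and ‘f’ may repeat the same sound, let alone whether the repetition matters - which is precisely the point made above.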
The literary student is usually fairly quick to see the possibilities of all this counting and sorting. Often the best way to convince them is to suggest that they compare two texts, from the same or different genres - two sonnets, or a sonnet and an advertisement - or two newspaper articles on the same topic, or two characters in a novel. Even simple counting can be revealing here: one text uses many more verbs; it has longer words (bearing in mind that a computer counts letters, not syllables); it has a high lexical density, that is, a large number of different words; it is prone to repetition - perhaps of particular words, or particular collocations, or of words from a particular semantic field. Even at this simple level, a good deal of useful data can be gathered. But at this point the human analyst has to take over. Only s/he can recognise that a particular field of meaning is important, or, more importantly, interpret the significance of particular choices in a text.
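The simple counts described above - total tokens, average word length in letters, and lexical density as the proportion of distinct word forms - might be sketched as follows (an illustrative Python sketch; the sample sentences and the function name are invented for the example):

```python
import re

def simple_profile(text):
    """Crude textual profile of the kind a comparison exercise needs."""
    # Tokenise crudely on runs of letters (and apostrophes).
    tokens = re.findall(r"[a-z']+", text.lower())
    types = set(tokens)  # distinct word forms
    return {
        "tokens": len(tokens),
        # Letters per word: a computer counts letters, not syllables.
        "avg_word_length": round(sum(len(t) for t in tokens) / len(tokens), 2),
        # Lexical density here = distinct forms / total tokens.
        "lexical_density": round(len(types) / len(tokens), 2),
    }

a = simple_profile("the cat sat on the mat and the dog sat too")
b = simple_profile("crimson sunsets linger over silent heather moors")
print(a)  # repetitive text: low lexical density
print(b)  # every word distinct: lexical density 1.0
```

Comparing the two dictionaries shows at a glance which text repeats itself and which has the longer words; saying why either fact matters remains the analyst’s job.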
Work of this kind need not be restricted to literary texts. The student interested in grammar - a somewhat rarer breed - can search for particular forms - e.g. ‘however’ - and see whether a comma or a semi-colon is normally used in front of it. There are also applications in History of English classes, again searching for particular forms. The technique can also be extended to language learning classes. If a student is unsure of how to use a particular word, or finds it difficult to understand a point of grammar, s/he can of course be sent to a dictionary or grammar book. However, it is much more interesting, and often more pedagogically successful, for the student to concordance a particular word and work out for herself what it means in context, rather as a lexicographer does.
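The kind of display a concordancer produces - each occurrence of a word shown with a window of surrounding text, so that the punctuation before ‘however’, say, is easy to inspect - can be sketched as a minimal keyword-in-context (KWIC) routine (an illustrative Python sketch, not the output format of any package named above; the sample text is invented):

```python
def concordance(text, keyword, width=30):
    """Minimal KWIC display: each hit with a fixed window of context,
    left-padded so the keyword lines up in a column.

    Matches plain substrings, case-insensitively; a real concordancer
    would also respect word boundaries.
    """
    lower, key = text.lower(), keyword.lower()
    hits, start = [], 0
    while True:
        i = lower.find(key, start)
        if i == -1:
            break
        left = text[max(0, i - width):i].rjust(width)
        right = text[i + len(key):i + len(key) + width]
        hits.append(f"{left}[{text[i:i + len(key)]}]{right}")
        start = i + len(key)
    return hits

sample = ("The results were promising. However, the sample was small; "
          "however we interpret them, more data are needed.")
for h in concordance(sample, "however"):
    print(h)
```

Reading down such a column, a student can see for herself that sentence-initial ‘However’ takes a following comma while the subordinating use does not - exactly the kind of inference the paragraph above describes.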
Many of our students go on to do projects or dissertations based on texts. For these, the Longman Miniconcordancer is not usually adequate, since it cannot handle enough text. More substantial work requires a text-analysis package like TACT or OCP (Oxford Concordance Program). Normally we favour the former, and indeed we now have someone working on a Windows version of it. For more purely database work, we use ACCESS, which we find friendlier than other databases such as Paradox.
This year topics for projects have ranged particularly widely. Many of the writers ended up surprising me, and more importantly themselves, by the additional insights they achieved.
This work is protected by copyright. All rights reserved.
The SCOTS Project and the University of Glasgow do not necessarily endorse, support or recommend the views expressed in this document.
Cite this Document
LTD1 Workshop. 2021. In The Scottish Corpus of Texts & Speech. Glasgow: University of Glasgow. Retrieved January 2021, from http://www.scottishcorpus.ac.uk/document/?documentid=3.