UK English Wordlist With Frequency Classification

This wordlist is primarily intended to be useful for checking spelling.

Editorial policy is conservative. Principal omissions:
- words requiring a capital letter
- abbreviations
- slang
Colloquialisms and archaisms are generally excluded.
A rare word similar to a common word may be excluded.

Both -ise and -ize spellings are included.

The character set is: lowercase letters, hyphen, apostrophe.
Words which can be spelt with accents occur here in their plain form.

If this wordlist is to be used with ispell the following lines may
be appropriate for the affix file:

boundarychars [---]
boundarychars '
wordchars [a-z] [A-Z]

The commonest words are labelled 16 and the least common 0.
Coverage of common words should be good, but note the categories excluded.

Brian Kelk
bck22@bckelk.uklinux.net
April 2002

Here are bits of a brief conversation I had with the author:

From: Brian Kelk
Date: Sat, 08 Jul 2000 20:27:21 +0100

> I was wondering what the copyright status of your "UK English Wordlist
> With Frequency Classification" word list as it seems to be lacking any
> copyright notice. Also, how did you arrive at the "Frequency
> Classification".

There were many many sources in total, but any text marked
"copyright" was avoided. Locally-written documentation was one
source. An earlier version of the list resided in a filespace
called PUBLIC on the University mainframe, because it was
considered public domain.

Briefly about frequency: rather than counting occurrences of a
word this classification is more along the lines of counting the
number of texts in which the word occurs. That way you get some
noise immunity, which you very much need. It's based on maybe
5-10 million words of text on the Cambridge mainframe in the 1980s.
I had in mind that it might be useful for ranking possible
corrections

...

Date: Tue, 11 Jul 2000 19:31:34 +0100

> So are you saying your word list is also in the public domain?

That is the intention.
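
The frequency note in the conversation above distinguishes counting occurrences of a word from counting the number of texts in which it occurs. The Python sketch below illustrates that difference on a toy corpus. It is not the author's method: the tokenizer, the function names and the sample texts are all assumptions made for illustration, and it does not attempt to reproduce the 0 to 16 labels, whose derivation is not described here.

```python
import re
from collections import Counter

# Assumed tokenizer: lowercase letters, with internal hyphens or apostrophes,
# matching the character set described for the wordlist.
WORD_RE = re.compile(r"[a-z]+(?:['-][a-z]+)*")

def tokenize(text):
    """Split a text into lowercase word tokens."""
    return WORD_RE.findall(text.lower())

def occurrence_counts(texts):
    """Count every occurrence of every word across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts

def text_counts(texts):
    """Count, for each word, the number of texts in which it occurs."""
    counts = Counter()
    for text in texts:
        counts.update(set(tokenize(text)))  # each text contributes at most 1
    return counts

def rank_corrections(candidates, counts):
    """Order candidate corrections, most widespread first (ties alphabetical)."""
    return sorted(candidates, key=lambda w: (-counts[w], w))

# Invented sample corpus: the third "text" repeats one rare word many times.
texts = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "zyzzyva zyzzyva zyzzyva zyzzyva",
]

by_occurrence = occurrence_counts(texts)
by_text = text_counts(texts)

print(by_occurrence["zyzzyva"], by_occurrence["the"])  # 4 4
print(by_text["zyzzyva"], by_text["the"])              # 1 2
print(rank_corrections(["zyzzyva", "the", "cat"], by_text))
```

In this toy corpus a single repetitive text makes "zyzzyva" look as frequent as "the" when occurrences are counted, but not when texts are counted, which is one way to read the "noise immunity" mentioned above.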