The nature of the families
The word lists were made to be used with the AntWordProfiler and Range computer
programs, and these programs cannot distinguish between homonyms like Smith (the
family name) and smith (blacksmith), or March (the month) and march (as soldiers do).
Thus when either program runs, these uses are not distinguished and are counted in
the same family and as the same type. There was an attempt to deal with this wherever
possible. For example, marched, marching, marches, marcher, marchers, etc. were put in
one family and March in another. This does not completely distinguish the homonyms,
but it is a step towards doing so.
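
The homonym problem can be illustrated with a minimal Python sketch. It is not the actual code of AntWordProfiler or Range. It assumes a simple case-insensitive lookup of word forms, and the family contents, the function names build_lookup and count_families, and the sample sentence are all hypothetical and heavily abbreviated.

```python
from collections import Counter

# Hypothetical, heavily abbreviated family lists (not the real list data).
FAMILIES = {
    "march": ["march", "marched", "marching", "marches", "marcher", "marchers"],
    "smith": ["smith", "smiths"],
}


def build_lookup(families):
    """Map each lower-cased member form to its family headword."""
    lookup = {}
    for headword, members in families.items():
        for form in members:
            lookup[form.lower()] = headword
    return lookup


def count_families(tokens, lookup):
    """Count tokens per family, matching forms without regard to case (an assumption)."""
    counts = Counter()
    for token in tokens:
        family = lookup.get(token.lower())
        if family is not None:
            counts[family] += 1
    return counts


tokens = "In March the soldiers march past Smith the smith".split()
counts = count_families(tokens, build_lookup(FAMILIES))
print(counts)
```

Under this assumption the month and the verb are both counted in the march family, and the family name and the blacksmith are both counted in the smith family, because matching works on the written form alone.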
The high frequency word families tend to be quite large as it appears that higher
frequency stems generally can take a greater range of affixes than lower frequency words.
For example, the high frequency word family nation has the following members: nations,
national, nationally, nationwide, nationalism, nationalisms, internationalism,
internationalisms, nationalisations, internationalisation, nationalist, nationalists,
nationalistic, nationalistically, internationalist, internationalists, nationalise,
nationalised, nationalising, nationalisation, nationalize, nationalized, nationalizing,
nationalization, nationhood, nationhoods.
The word family lists group items together that would be perceived as the same words for
the receptive skills of listening and reading. If word lists were made for productive
purposes, for speaking and writing, the lemma would be the largest sensible unit to use.
Some researchers argue for the word type.
The word lists contain compound words but they do not contain phrases. According to or
au fait, for example, might be best counted as a unit, but in the lists the unit is the single
word.
The validity of the BNC word family lists
There are ways of checking whether the word family lists are properly ordered. From the
1st 1000 to the 25th 1000, the number of tokens, types, and families found in an
independent corpus should decrease. That is, when the lists are run over a different
corpus from the BNC or COCA, the 1st 1000 word family list should account for more
tokens, types and families than the 2nd 1000 family list does. Similarly, the 2nd 1000 word
family list should account for more tokens, types and families than the 3rd 1000 family
list does and so on. While this does not show that each word family is in the right list, it
does show that the lists are properly ordered. Table 3 presents such data using the Range
output from the Wellington Written Corpus.
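
The ordering check described in the validity section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names strictly_decreasing and lists_properly_ordered are hypothetical, and the figures are made up for the example and are not taken from Table 3 or from any Range output.

```python
def strictly_decreasing(values):
    """Return True if every value is larger than the one that follows it."""
    return all(a > b for a, b in zip(values, values[1:]))


def lists_properly_ordered(coverage):
    """Check that tokens, types and families all decrease from list to list.

    coverage holds one dict per 1000-level list, in order from the 1st 1000.
    """
    return all(
        strictly_decreasing([row[key] for row in coverage])
        for key in ("tokens", "types", "families")
    )


# Made-up figures for illustration only.
well_ordered = [
    {"tokens": 800000, "types": 5000, "families": 1000},
    {"tokens": 60000, "types": 3500, "families": 980},
    {"tokens": 30000, "types": 2800, "families": 950},
]

# Same figures with the 2nd and 3rd lists swapped, which breaks the ordering.
badly_ordered = [well_ordered[0], well_ordered[2], well_ordered[1]]

print(lists_properly_ordered(well_ordered))
print(lists_properly_ordered(badly_ordered))
```

As the text notes, passing such a check shows only that the lists as a whole are properly ordered, not that each individual word family sits in the right list.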