The Ultimate English Vocabulary Collection, Book 1 #478

478 Anonymous@Next2ch: 2014/09/23 (Tue) 13:00:36.96 ID:???

The BNC/COCA word family lists
(17 September 2012)
The BNC/COCA word family lists consist of 29 word family lists. Twenty-five of the
lists contain word families based on frequency and range data. The four additional lists
are (1) an ever-growing list of proper names, (2) a list of marginal words including swear
words, exclamations, and letters of the alphabet, (3) a list of transparent compounds, and
(4) a list of abbreviations. In the lists for AntWordProfiler, each list has a name which
describes its content. In the lists for Range, because of the requirements of the Range
program, each list has a fixed name, basewrdx.txt, where x is a number. Basewrd26-30
each contain just one nonsense word. They were made to provide space for additional
lists and to avoid having to keep renaming the proper noun list and the other additional lists.
Basewrd31 contains proper nouns, basewrd32 marginal words, basewrd33 transparent
compounds, and basewrd34 abbreviations. More detail on these additional lists can be found
in Nation and Webb (2011: Chapter 8).
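The Range file numbering described above can be summarized in a short Python sketch. It assumes that basewrd1 to basewrd25 hold the frequency-based lists in order, each covering 1000 word families (basewrd1 = 1st 1000, basewrd2 = 2nd 1000, and so on); the function name describe_basewrd is hypothetical and not part of Range.

```python
# Sketch of the Range file-numbering scheme (basewrdx.txt).
# Assumption: basewrd1..basewrd25 are the frequency-based lists in order,
# each holding 1000 word families.
ADDITIONAL_LISTS = {
    31: "proper nouns",
    32: "marginal words",
    33: "transparent compounds",
    34: "abbreviations",
}

def describe_basewrd(x: int) -> str:
    """Describe the contents of basewrd{x}.txt (hypothetical helper)."""
    if 1 <= x <= 25:
        # Frequency/range-based lists, one per band of 1000 families (assumed)
        return f"frequency-based list: word families {(x - 1) * 1000 + 1}-{x * 1000}"
    if 26 <= x <= 30:
        # Reserved slots holding a single nonsense word each
        return "placeholder (one nonsense word)"
    if x in ADDITIONAL_LISTS:
        return ADDITIONAL_LISTS[x]
    raise ValueError(f"no basewrd{x}.txt in this scheme")

for x in (1, 25, 27, 31, 34):
    print(f"basewrd{x}.txt: {describe_basewrd(x)}")
```

Keeping basewrd26-30 as placeholders means the four additional lists can stay at fixed numbers (31-34) even if more frequency lists are added later.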
The lists are saved in UTF-8 without a BOM (byte order mark); in Notepad++ this is chosen under the Encoding menu.
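A BOM can also be checked or removed outside a text editor. The following Python sketch (helper names are hypothetical, not part of Range or AntWordProfiler) tests whether a file begins with the three-byte UTF-8 BOM (EF BB BF) and, if so, rewrites the file without it, leaving all remaining bytes untouched.

```python
import codecs
import tempfile
from pathlib import Path

def has_utf8_bom(path: Path) -> bool:
    """Return True if the file starts with the UTF-8 byte order mark (EF BB BF)."""
    with open(path, "rb") as f:
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8

def strip_utf8_bom(path: Path) -> bool:
    """Remove a leading UTF-8 BOM in place; return True if one was removed."""
    data = path.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        # Keep every byte after the BOM exactly as it was (line endings included)
        path.write_bytes(data[len(codecs.BOM_UTF8):])
        return True
    return False

# Usage: create a list file that has a BOM, then strip it
with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp) / "basewrd1.txt"
    p.write_bytes(codecs.BOM_UTF8 + "example\n".encode("utf-8"))
    had_bom = has_utf8_bom(p)      # True
    removed = strip_utf8_bom(p)    # True
    has_bom_now = has_utf8_bom(p)  # False
    content = p.read_bytes()       # b"example\n"
```

Working on raw bytes avoids any newline translation. Note that Python's plain "utf-8" codec never writes a BOM, whereas "utf-8-sig" does.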
The making of the lists
The 1st 1000 and 2nd 1000 word family lists
The first two 1000 word family lists were made using a specially designed 10-million-token
corpus. Six million tokens of this corpus were spoken English, both British and American
(see Corpus/PN corpus for 2000), as well as movies and TV programs. The written sections
included texts for young children and fiction (see Table 1).
Table 1: The corpus used for the first two 1000 word family lists
         US                                            Tokens    UK/NZ                        Tokens
Spoken    1  AmNC spoken face to face, telephone 1  1,107,602     4  BNC 1                 1,036,097
          2  AmNC spoken face to face, telephone 2  1,029,831     5  BNC 2                 1,125,523
          3  Movies and TV                          1,000,000     6  BNC Plus half of WSC  1,132,620
Written   7  AmNC written fiction, letters 1        1,145,081     9  School journals       1,028,842
          8  AmNC written fiction, letters 2          939,407    10  BNC fiction           1,040,204
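Summing the token counts in Table 1 (reading the Movies and TV figure as 1,000,000) shows where the approximate figures in the text come from: about 6.4 million spoken tokens out of about 10.6 million in total. A short Python check:

```python
# Token counts transcribed from Table 1 (subcorpus number -> tokens)
SPOKEN = {1: 1_107_602, 2: 1_029_831, 3: 1_000_000,
          4: 1_036_097, 5: 1_125_523, 6: 1_132_620}
WRITTEN = {7: 1_145_081, 8: 939_407, 9: 1_028_842, 10: 1_040_204}

spoken = sum(SPOKEN.values())
written = sum(WRITTEN.values())
total = spoken + written

print(f"spoken:  {spoken:,}")                               # spoken:  6,431,673
print(f"written: {written:,}")                              # written: 4,153,534
print(f"total:   {total:,} ({spoken / total:.0%} spoken)")  # total:   10,585,207 (61% spoken)
```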
This unusual step of creating a special corpus for the first 2000 word families was taken
because the previous lists made from the British National Corpus were so strongly influenced
by the formal, written nature of that corpus that they were not suitable for creating
language courses or graded reader lists (see Nation, 2004). With the new corpus, very
common words in spoken English such as alright, pardon, hello, dad, and bye could be
included in the high-frequency words. Other arbitrary adjustments included putting all the
word forms of numbers (one, two, hundred) and the days of the week in the 1st 1000, and the
months of the year in the 2nd 1000, even though their frequency did not always justify this.
The goal was to have a set of high-frequency word lists that were suitable for teaching and
course design.
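The adjustments above suggest two steps: assigning word families to 1000-family bands by frequency, then moving selected families by hand. The Python sketch below illustrates only that idea, under stated assumptions: it ranks by raw frequency alone (the actual lists also used range data, which is ignored here), it uses a band size of 3 instead of 1000 so the output stays readable, and the counts are invented for illustration, not taken from the corpus. The function name assign_bands is hypothetical.

```python
# Hypothetical sketch: frequency banding followed by manual overrides.
# Frequencies below are invented toy values, not BNC/COCA data.
def assign_bands(freqs: dict[str, int], band_size: int,
                 overrides: dict[str, int]) -> dict[int, list[str]]:
    # Rank families by descending frequency; ties broken alphabetically
    ranked = sorted(freqs, key=lambda w: (-freqs[w], w))
    # Band 1 = top band_size families, band 2 = next band_size, ...
    band_of = {w: i // band_size + 1 for i, w in enumerate(ranked)}
    # Manual placements (e.g. numbers, weekdays, months) win over frequency
    band_of.update(overrides)
    bands: dict[int, list[str]] = {}
    for w in ranked:
        bands.setdefault(band_of[w], []).append(w)
    return bands

freqs = {"the": 900, "be": 800, "and": 700, "house": 300,
         "hundred": 120, "tuesday": 40, "january": 35, "quiet": 30}
overrides = {"hundred": 1, "tuesday": 1, "january": 2}
result = assign_bands(freqs, band_size=3, overrides=overrides)
print(result)
# {1: ['the', 'be', 'and', 'hundred', 'tuesday'], 2: ['house', 'january'], 3: ['quiet']}
```

Because overrides move families without displacing others, a band can end up larger than band_size; how the real lists handled this is not described here.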
The 3rd 1000 onwards
The remaining 1000 lists were made by u

