Description
Word frequency distributions are characterized by very large numbers of rare words, a property that leads to unusual statistical phenomena. Special statistical techniques for their analysis can be found in various technical journals. Baayen's (U. of Nijmegen, The Netherlands) text is intended to make these specialized techniques more accessible to non-specialists. Coverage includes some basic concepts and notation; non-parametric methods for the analysis of word frequency distributions; detailed descriptions of three parametric models (the lognormal model, the Yule-Simon Zipfian model, and the generalized inverse Gauss-Poisson model); mixture distributions; the effect of non-randomness in word use on the accuracy of the non-parametric and parametric models; and examples of applications. Background knowledge in statistics and probability theory, and some knowledge of elementary calculus, is required. Each chapter includes exercises, with solutions provided in an appendix. Includes a CD-ROM. This text is for computational linguists, corpus linguists, psycholinguists, and researchers in the field of quantitative statistics. Annotation © Book News, Inc., Portland, OR (booknews.com)
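The observation that word frequency distributions contain very large numbers of rare words can be made concrete by computing a frequency spectrum, that is, the number of word types that occur exactly m times in a sample of N tokens (written V(m, N) here as an assumed notation). The sketch below is only an illustration. It is not code from the book or its CD-ROM, and the function names, the whitespace tokenization, and the toy sentence are all assumptions. It also reports the share of types seen exactly once (hapax legomena) and V(1, N)/N, the standard Good-Turing estimate of the probability that the next token is a previously unseen type.

```python
from collections import Counter


def frequency_spectrum(tokens):
    """Return {m: V(m, N)}: how many word types occur exactly m times (illustrative helper)."""
    type_counts = Counter(tokens)                      # word type -> frequency in the sample
    spectrum = Counter(type_counts.values())           # frequency m -> number of types with that frequency
    return dict(sorted(spectrum.items()))


def summarize(tokens):
    """Basic sample statistics built on the frequency spectrum (illustrative helper)."""
    spectrum = frequency_spectrum(tokens)
    n_tokens = len(tokens)                             # N: sample size in tokens
    n_types = sum(spectrum.values())                   # V(N): number of distinct types
    hapaxes = spectrum.get(1, 0)                       # V(1, N): types seen exactly once
    return {
        "N": n_tokens,
        "V": n_types,
        "V1": hapaxes,
        "hapax_share_of_types": hapaxes / n_types if n_types else 0.0,
        # Good-Turing estimate of the probability mass of unseen types: V(1, N) / N
        "p_unseen": hapaxes / n_tokens if n_tokens else 0.0,
    }


if __name__ == "__main__":
    tokens = "the cat sat on the mat and the dog sat on the log".split()
    print(frequency_spectrum(tokens))
    print(summarize(tokens))
```

In this toy sentence, five of the eight types occur only once. In real corpora, hapax legomena often make up a similarly large share of the types, and this is the property that the specialized techniques described above are designed to handle.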
Contents
List of Figures p. ix
List of Tables p. xix
Introduction p. xxi
1 Word Frequencies p. 1
1.1 Introduction p. 2
1.2 The frequency spectrum p. 8
1.3 Zipf p. 13
1.4 The quest for characteristic constants p. 24
1.5 The lognormal distribution p. 32
1.6 Discussion p. 34
1.7 Bibliographical Comments p. 35
1.8 Questions p. 35
2 Non-parametric models p. 39
2.1 Basic concepts p. 39
2.2 The Urn model p. 42
2.3 The Structural Type Distribution p. 47
2.4 The LNRE zone p. 51
2.5 Good-Turing estimates p. 57
2.6 Interpolation and Extrapolation p. 63
2.6.1 Interpolation p. 64
2.6.2 Extrapolation p. 69
2.7 Discussion p. 76
2.8 Bibliographical Comments p. 76
2.9 Questions p. 77
3 Parametric models p. 79
3.1 Introduction p. 79
3.2 LNRE models p. 82
3.2.1 The Lognormal Structural Type Distribution p. 82
3.2.2 The Generalized Inverse Gauss-Poisson Structural Type Distribution p. 89
3.2.3 The Zipfian Family of LNRE Models p. 93
3.3 Evaluating Goodness of Fit p. 118
3.4 Parameter estimation p. 122
3.5 A comparative study p. 124
3.6 Comparing Lexical Measures Across Texts p. 132
3.7 Discussion p. 132
3.8 Bibliographical Comments p. 133
3.9 Questions p. 133
4 Mixture distributions p. 135
4.1 Introduction p. 135
4.2 Expectations, variances, and covariances p. 139
4.3 Examples of mixture distributions p. 142
4.3.1 A text-level mixture model p. 142
4.3.2 Morphological mixtures p. 145
4.4 Morphological Productivity p. 154
4.5 Discussion p. 158
4.6 Bibliographical Comments p. 160
4.7 Questions p. 160
5 The Randomness Assumption p. 161
5.1 The Randomness Assumption p. 161
5.1.1 Non-randomness and lexical specialization p. 162
5.1.2 Consequences of non-randomness p. 167
5.2 Adjusted LNRE models p. 173
5.2.1 Partition-based adjustment p. 174
5.2.2 Parameter-based adjustment p. 179
5.3 Discussion p. 192
5.4 Bibliographical Comments p. 193
6 Examples of Applications p. 195
6.1 Distributional properties of the lexicon p. 195
6.1.1 Word length and sample size p. 195
6.1.2 Matching reliability across corpora p. 199
6.2 Morphological productivity p. 203
6.2.1 Global analyses p. 203
6.2.2 Productivity and register p. 208
6.3 Authorship and Style p. 211
6.4 Beyond word frequency distributions p. 214
6.4.1 Counts of filarial worms on mites on rats p. 214
6.4.2 Year references p. 215
6.4.3 CV-structures p. 218
6.4.4 Word pairs p. 221
6.4.5 Discussion p. 221
6.5 Some practical guidelines p. 223
A List of Symbols p. 237
B Solutions to the exercises p. 241
C Software p. 251
D Data sets p. 289
Bibliography p. 321
Index p. 329