Introduction to information retrieval

Subtitle: None

Authors: Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze.

Classification number:

ISBN: 9780521865715


Summary

"Introduction to Information Retrieval is the first textbook with a coherent treatment of classical and web information retrieval, including web search and the related areas of text classification and text clustering. Written from a computer science perspective, it gives an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents and of methods for evaluating systems, along with an introduction to the use of machine learning methods on text collections." "Designed as the primary text for a graduate or advanced undergraduate course in information retrieval, the book will also interest researchers and professionals. A complete set of lecture slides and exercises that accompany the book are available on the web."--BOOK JACKET.

Contents

Table of Notation p. xi
Preface p. xv
Boolean retrieval p. 1
An example information retrieval problem p. 3
A first take at building an inverted index p. 6
Processing Boolean queries p. 9
The extended Boolean model versus ranked retrieval p. 13
References and further reading p. 16
The term vocabulary and postings lists p. 18
Document delineation and character sequence decoding p. 18
Determining the vocabulary of terms p. 21
Faster postings list intersection via skip pointers p. 33
Positional postings and phrase queries p. 36
References and further reading p. 43
Dictionaries and tolerant retrieval p. 45
Search structures for dictionaries p. 45
Wildcard queries p. 48
Spelling correction p. 52
Phonetic correction p. 58
References and further reading p. 59
Index construction p. 61
Hardware basics p. 62
Blocked sort-based indexing p. 63
Single-pass in-memory indexing p. 66
Distributed indexing p. 68
Dynamic indexing p. 71
Other types of indexes p. 73
References and further reading p. 76
Index compression p. 78
Statistical properties of terms in information retrieval p. 79
Dictionary compression p. 82
Postings file compression p. 87
References and further reading p. 97
Scoring, term weighting, and the vector space model p. 100
Parametric and zone indexes p. 101
Term frequency and weighting p. 107
The vector space model for scoring p. 110
Variant tf-idf functions p. 116
References and further reading p. 122
Computing scores in a complete search system p. 124
Efficient scoring and ranking p. 124
Components of an information retrieval system p. 132
Vector space scoring and query operator interaction p. 136
References and further reading p. 137
Evaluation in information retrieval p. 139
Information retrieval system evaluation p. 140
Standard test collections p. 141
Evaluation of unranked retrieval sets p. 142
Evaluation of ranked retrieval results p. 145
Assessing relevance p. 151
A broader perspective: System quality and user utility p. 154
Results snippets p. 157
References and further reading p. 159
Relevance feedback and query expansion p. 162
Relevance feedback and pseudo relevance feedback p. 163
Global methods for query reformulation p. 173
References and further reading p. 177
XML retrieval p. 178
Basic XML concepts p. 180
Challenges in XML retrieval p. 183
A vector space model for XML retrieval p. 188
Evaluation of XML retrieval p. 192
Text-centric versus data-centric XML retrieval p. 196
References and further reading p. 198
Probabilistic information retrieval p. 201
Review of basic probability theory p. 202
The probability ranking principle p. 203
The binary independence model p. 204
An appraisal and some extensions p. 212
References and further reading p. 216
Language models for information retrieval p. 218
Language models p. 218
The query likelihood model p. 223
Language modeling versus other approaches in information retrieval p. 229
Extended language modeling approaches p. 230
References and further reading p. 232
Text classification and Naive Bayes p. 234
The text classification problem p. 237
Naive Bayes text classification p. 238
The Bernoulli model p. 243
Properties of Naive Bayes p. 245
Feature selection p. 251
Evaluation of text classification p. 258
References and further reading p. 264
Vector space classification p. 266
Document representations and measures of relatedness in vector spaces p. 267
Rocchio classification p. 269
k nearest neighbor p. 273
Linear versus nonlinear classifiers p. 277
Classification with more than two classes p. 281
The bias-variance tradeoff p. 284
References and further reading p. 291
Support vector machines and machine learning on documents p. 293
Support vector machines: The linearly separable case p. 294
Extensions to the support vector machine model p. 300
Issues in the classification of text documents p. 307
Machine-learning methods in ad hoc information retrieval p. 314
References and further reading p. 318
Flat clustering p. 321
Clustering in information retrieval p. 322
Problem statement p. 326
Evaluation of clustering p. 327
K-means p. 331
Model-based clustering p. 338
References and further reading p. 343
Hierarchical clustering p. 346
Hierarchical agglomerative clustering p. 347
Single-link and complete-link clustering p. 350
Group-average agglomerative clustering p. 356
Centroid clustering p. 358
Optimality of hierarchical agglomerative clustering p. 360
Divisive clustering p. 362
Cluster labeling p. 363
Implementation notes p. 365
References and further reading p. 367
Matrix decompositions and latent semantic indexing p. 369
Linear algebra review p. 369
Term-document matrices and singular value decompositions p. 373
Low-rank approximations p. 376
Latent semantic indexing p. 378
References and further reading p. 383
Web search basics p. 385
Background and history p. 385
Web characteristics p. 387
Advertising as the economic model p. 392
The search user experience p. 395
Index size and estimation p. 396
Near-duplicates and shingling p. 400
References and further reading p. 404
Web crawling and indexes p. 405
Overview p. 405
Crawling p. 406
Distributing indexes p. 415
Connectivity servers p. 416
References and further reading p. 419
Link analysis p. 421
The Web as a graph p. 422
PageRank p. 424
Hubs and authorities p. 433
References and further reading p. 439
Bibliography p. 441
Index p. 469
