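Information Retrieval: Algorithms and Heuristics (2nd Edition) (an excellent textbook for "Information Retrieval" courses)
Basic Information
- Original title: Information Retrieval: Algorithms and Heuristics (The Information Retrieval Series) (2nd Edition)
- Original publisher: Springer
- Authors: David A. Grossman, Ophir Frieder (USA)
- Series: Turing Computer Science (图灵计算机科学)
- Publisher: Posts & Telecom Press (人民邮电出版社)
- ISBN: 9787115212252
- Listed on: 2009-10-13
- Publication date: October 2009
- Trim size: 16mo (16开)
- Pages: 332
- Edition: 2-1
- Categories:
Computers > Computer Science Theory and Fundamentals > Theory of Computation > Algorithms
Textbooks > Computer Textbooks > Undergraduate/Graduate > Computer Science Textbooks > Foundational Computing Courses > Algorithms and Mathematical Foundations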
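Editor's Recommendation
"Provides numerous examples that illustrate the algorithms.
Adopted by many well-known universities abroad
Balances breadth of knowledge with depth of topics"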
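Synopsis
This book is an excellent textbook for information retrieval courses. It gives a detailed introduction to the concepts, principles, and algorithms of information retrieval, covering retrieval strategies, retrieval utilities, cross-language information retrieval, query processing, integrating structured data and text, parallel information retrieval, and distributed information retrieval, and it provides numerous examples that illustrate the algorithms.
The book offers both depth and breadth, and all of its content is presented in terms of current technology. It is an ideal textbook for undergraduate and graduate students in computer science, information management, and related majors, and a good reference for researchers and engineers in the information retrieval field.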
Table of Contents
1. Introduction
2. Retrieval Strategies
2.1 Vector Space Model
2.2 Probabilistic Retrieval Strategies
2.3 Language Models
2.4 Inference Networks
2.5 Extended Boolean Retrieval
2.6 Latent Semantic Indexing
2.7 Neural Networks
2.8 Genetic Algorithms
2.9 Fuzzy Set Retrieval
2.10 Summary
2.11 Exercises
3. Retrieval Utilities
3.1 Relevance Feedback
3.2 Clustering
3.3 Passage-Based Retrieval
3.4 N-grams
3.5 Regression Analysis
3.6 Thesauri
Preface
When we wrote the first edition of this book in 1998, the Web was relatively new, and information retrieval was an old
field but it lacked popular appeal. Today the word Google has joined the popular lexicon, and Google indexes more than four
billion Web pages. In 1998, only a few schools taught graduate courses in information retrieval; today, the subject is
commonly offered at the undergraduate level. Our experience with teaching information retrieval at the undergraduate level,
as well as a detailed analysis of the topics covered and the effectiveness of the class, are given in [Goharian et al.,
2004].
The term Information Retrieval refers to a search that may cover any form of information: structured data, text, video,
image, sound, musical scores, DNA sequences, etc. The reality is that for many years, database systems existed to search
structured data, and information retrieval meant the search of documents. The authors come originally from the world of
structured search, but for much of the last ten years, we have worked in the area of document retrieval. To us, the world
should be data type agnostic. There is no need for a special delineation between structured and unstructured data. In 1998,
we included a chapter on data integration, and reviews suggested the only reason it was there was because it covered some of
our recent research. Today, such an allegation makes no sense, since information mediators have been developed which operate
with both structured and unstructured data. Furthermore, the eXtensible Markup Language (XML) has become prolific in both the
database and information retrieval domains.
We focus on the ad hoc information retrieval problem. Simply put, ad hoc information retrieval allows users to search for
documents that are relevant to user-provided queries. It may appear that systems such as Google have solved this problem, but
effectiveness measures for Google have not been published. Typical systems still have an effectiveness (accuracy) of, at
best, forty percent [TREC, 2003]. This leaves ample room for improvement, with the prerequisite of a firm understanding of
existing approaches.
Foreword
The past five years, as Grossman and Frieder acknowledge in their preface, have been a period of considerable progress for the field of information retrieval (IR). To the general public, this is reflected in the maturing of commercial Web search engines. To the IR practitioner, research has led to an improved understanding of the scope and limitations of the Web search problem, new insights into the retrieval process through the development of the formal underpinnings and models for IR, and a variety of exciting new applications such as cross-language retrieval, peer-to-peer search, and music retrieval, which have expanded the horizons of the research landscape. In addition, there has been an increasing realization on the part of the database and IR communities that solving the information problems of the future will involve the integration of techniques for unstructured and structured data. The revised edition of this book addresses many of these important new developments, and is currently the only textbook that does so.
Two particular examples that stood out for me are the descriptions of language models for IR and cross-language retrieval. Language models have become an important topic at the major IR conferences and many researchers are adopting this framework due to its power and simplicity, as well as the availability of tools for experimentation and application building. Grossman and Frieder provide an excellent overview of the topic in the retrieval strategies chapter, together with examples of different smoothing techniques. Cross-language retrieval, which involves the retrieval of text in different languages than the query source language, has been driven by government interest in Europe and the U.S. A number of approaches have been developed that can exploit available resources such as parallel and comparable corpora, and the effectiveness of these systems now approaches (or even surpasses in some cases) monolingual retrieval. The revised version of this book contains a chapter on cross-language retrieval that clearly describes the major approaches and gives examples of how the algorithms involved work with real data. The combination of up-to-date coverage, straightforward treatment, and the frequent use of examples makes this book an excellent choice for undergraduate or graduate IR courses.
W. Bruce Croft
August 2004
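Media Reviews
This book covers the latest research results, its writing stands up to scrutiny, and it includes a large number of carefully prepared examples, making it a first-choice textbook for graduate and undergraduate information retrieval courses.
-- W. Bruce Croft, Distinguished Professor, Department of Computer Science, University of Massachusetts Amherst
I recommend this book as a first-choice textbook for computer science students. It is also suitable for SEO professionals and Web developers who want to apply search techniques, algorithms, and heuristics to their projects.
-- Dr. E. Garcia, information technology and services consultant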