About the Book
This book brings together classical and current theory and practice of both supervised and unsupervised pattern recognition, building a complete foundation of core knowledge for practicing engineers and university students. Written by two leading experts in the field, it presents pattern recognition and its applications comprehensively from an engineering perspective. Topics include Bayesian classification, Bayesian networks, linear and nonlinear classifiers (including neural networks and support vector machines), dynamic programming and hidden Markov models for sequential data, feature generation (including wavelets, principal component analysis, independent component analysis, and fractal analysis), feature selection techniques, basic concepts from learning theory, and clustering concepts and algorithms.
A world-renowned classic, the book is comprehensive yet largely self-contained: it introduces the fundamentals, surveys the current state of research in the field, and offers an outlook on future developments. It is among the most complete references in the field and has been adopted as a textbook by many universities worldwide. It is suitable as a textbook for graduate students and senior undergraduates in computer science, electronics, communications, automation, and related programs, and as a reference for engineers working in computer information processing, automatic control, and related areas.
● Presents the latest research results on support vector machines (including v-SVM and its geometric interpretation).
● Discusses methods for combining multiple classifiers (including boosting).
● Adds up-to-date material, introducing clustering algorithms adapted for large data sets and high-dimensional data, as required by applications such as Web mining and bioinformatics.
● Covers a variety of applications, such as image analysis, optical character recognition, channel equalization, speech recognition, and audio classification.
Table of Contents
Preface
CHAPTER 1 INTRODUCTION
1.1 Is Pattern Recognition Important?
1.2 Features, Feature Vectors, and Classifiers
1.3 Supervised Versus Unsupervised Pattern Recognition
1.4 Outline of the Book
CHAPTER 2 CLASSIFIERS BASED ON BAYES DECISION THEORY
2.1 Introduction
2.2 Bayes Decision Theory
2.3 Discriminant Functions and Decision Surfaces
2.4 Bayesian Classification for Normal Distributions
2.5 Estimation of Unknown Probability Density Functions
2.5.1 Maximum Likelihood Parameter Estimation
2.5.2 Maximum a Posteriori Probability Estimation
2.5.3 Bayesian Inference
2.5.4 Maximum Entropy Estimation
2.5.5 Mixture Models
2.5.6 Nonparametric Estimation
2.5.7 The Naive-Bayes Classifier
2.6 The Nearest Neighbor Rule
2.7 Bayesian Networks
CHAPTER 3 LINEAR CLASSIFIERS
3.1 Introduction
3.2 Linear Discriminant Functions and Decision Hyperplanes
3.3 The Perceptron Algorithm
3.4 Least Squares Methods
3.4.1 Mean Square Error Estimation
3.4.2 Stochastic Approximation and the LMS Algorithm
3.4.3 Sum of Error Squares Estimation
3.5 Mean Square Estimation Revisited
3.5.1 Mean Square Error Regression
3.5.2 MSE Estimates Posterior Class Probabilities
3.5.3 The Bias-Variance Dilemma
3.6 Logistic Discrimination
3.7 Support Vector Machines
3.7.1 Separable Classes
3.7.2 Nonseparable Classes
3.7.3 v-SVM
3.7.4 Support Vector Machines: A Geometric Viewpoint
3.7.5 Reduced Convex Hulls
CHAPTER 4 NONLINEAR CLASSIFIERS
4.1 Introduction
4.2 The XOR Problem
4.3 The Two-Layer Perceptron
4.3.1 Classification Capabilities of the Two-Layer Perceptron
4.4 Three-Layer Perceptrons
4.5 Algorithms Based on Exact Classification of the Training Set
4.6 The Backpropagation Algorithm
4.7 Variations on the Backpropagation Theme
4.8 The Cost Function Choice
4.9 Choice of the Network Size
4.10 A Simulation Example
4.11 Networks With Weight Sharing
4.12 Generalized Linear Classifiers
4.13 Capacity of the l-Dimensional Space in Linear Dichotomies
4.14 Polynomial Classifiers
4.15 Radial Basis Function Networks
4.16 Universal Approximators
4.17 Support Vector Machines: The Nonlinear Case
4.18 Decision Trees
4.18.1 Set of Questions
4.18.2 Splitting Criterion
4.18.3 Stop-Splitting Rule
4.18.4 Class Assignment Rule
4.19 Combining Classifiers
4.19.1 Geometric Average Rule
4.19.2 Arithmetic Average Rule
4.19.3 Majority Voting Rule
4.19.4 A Bayesian Viewpoint
4.20 The Boosting Approach to Combine Classifiers
4.21 Discussion
CHAPTER 5 FEATURE SELECTION
5.1 Introduction
5.2 Preprocessing
5.2.1 Outlier Removal
5.2.2 Data Normalization
5.2.3 Missing Data
5.3 Feature Selection Based on Statistical Hypothesis Testing
5.3.1 Hypothesis Testing Basics
5.3.2 Application of the t-Test in Feature Selection
5.4 The Receiver Operating Characteristics (ROC) Curve
5.5 Class Separability Measures
5.5.1 Divergence
5.5.2 Chernoff Bound and Bhattacharyya Distance
5.5.3 Scatter Matrices
5.6 Feature Subset Selection
5.6.1 Scalar Feature Selection
5.6.2 Feature Vector Selection
5.7 Optimal Feature Generation
5.8 Neural Networks and Feature Generation/Selection
5.9 A Hint on Generalization Theory
5.10 The Bayesian Information Criterion
CHAPTER 6 FEATURE GENERATION I: LINEAR TRANSFORMS
6.1 Introduction
6.2 Basis Vectors and Images
6.3 The Karhunen-Loeve Transform
6.4 The Singular Value Decomposition
6.5 Independent Component Analysis
6.5.1 ICA Based on Second- and Fourth-Order Cumulants
6.5.2 ICA Based on Mutual Information
6.5.3 An ICA Simulation Example
6.6 The Discrete Fourier Transform (DFT)
6.6.1 One-Dimensional DFT
6.6.2 Two-Dimensional DFT
6.7 The Discrete Cosine and Sine Transforms
6.8 The Hadamard Transform
6.9 The Haar Transform
6.10 The Haar Expansion Revisited
6.11 Discrete Time Wavelet Transform (DTWT)
6.12 The Multiresolution Interpretation
6.13 Wavelet Packets
6.14 A Look at Two-Dimensional Generalizations
6.15 Applications
CHAPTER 7 FEATURE GENERATION II
7.1 Introduction
7.2 Regional Features
7.2.1 Features for Texture Characterization
7.2.2 Local Linear Transforms for Texture Feature Extraction
7.2.3 Moments
7.2.4 Parametric Models
7.3 Features for Shape and Size Characterization
7.3.1 Fourier Features
7.3.2 Chain Codes
7.3.3 Moment-Based Features
7.3.4 Geometric Features
7.4 A Glimpse at Fractals
7.4.1 Self-Similarity and Fractal Dimension
7.4.2 Fractional Brownian Motion
7.5 Typical Features for Speech and Audio Classification
7.5.1 Short Time Processing of Signals
7.5.2 Cepstrum
7.5.3 The Mel-Cepstrum
7.5.4 Special Features
7.5.5 Time Domain Features
7.5.6 An Example
CHAPTER 8 TEMPLATE MATCHING
8.1 Introduction
8.2 Measures Based on Optimal Path Searching Techniques
8.2.1 Bellman's Optimality Principle and Dynamic Programming
8.2.2 The Edit Distance
8.2.3 Dynamic Time Warping in Speech Recognition
8.3 Measures Based on Correlations
8.4 Deformable Template Models
CHAPTER 9 CONTEXT-DEPENDENT CLASSIFICATION
9.1 Introduction
9.2 The Bayes Classifier
9.3 Markov Chain Models
9.4 The Viterbi Algorithm
9.5 Channel Equalization
9.6 Hidden Markov Models
9.7 HMM with State Duration Modeling
9.8 Training Markov Models via Neural Networks
9.9 A Discussion of Markov Random Fields
CHAPTER 10 SYSTEM EVALUATION
10.1 Introduction
10.2 Error Counting Approach
10.3 Exploiting the Finite Size of the Data Set
10.4 A Case Study From Medical Imaging
CHAPTER 11 CLUSTERING: BASIC CONCEPTS
11.1 Introduction
11.1.1 Applications of Cluster Analysis
11.1.2 Types of Features
11.1.3 Definitions of Clustering
11.2 Proximity Measures
11.2.1 Definitions
11.2.2 Proximity Measures between Two Points
11.2.3 Proximity Functions between a Point and a Set
11.2.4 Proximity Functions between Two Sets
CHAPTER 12 CLUSTERING ALGORITHMS I: SEQUENTIAL ALGORITHMS
12.1 Introduction
12.1.1 Number of Possible Clusterings
12.2 Categories of Clustering Algorithms
12.3 Sequential Clustering Algorithms
12.3.1 Estimation of the Number of Clusters
12.4 A Modification of BSAS
12.5 A Two-Threshold Sequential Scheme
12.6 Refinement Stages
12.7 Neural Network Implementation
12.7.1 Description of the Architecture
12.7.2 Implementation of the BSAS Algorithm
CHAPTER 13 CLUSTERING ALGORITHMS II: HIERARCHICAL ALGORITHMS
13.1 Introduction
13.2 Agglomerative Algorithms
13.2.1 Definition of Some Useful Quantities
13.2.2 Agglomerative Algorithms Based on Matrix Theory
13.2.3 Monotonicity and Crossover
13.2.4 Implementational Issues
13.2.5 Agglomerative Algorithms Based on Graph Theory
13.2.6 Ties in the Proximity Matrix
13.3 The Cophenetic Matrix
13.4 Divisive Algorithms
13.5 Hierarchical Algorithms for Large Data Sets
13.6 Choice of the Best Number of Clusters
CHAPTER 14 CLUSTERING ALGORITHMS III: SCHEMES BASED ON FUNCTION OPTIMIZATION
14.1 Introduction
14.2 Mixture Decomposition Schemes
14.2.1 Compact and Hyperellipsoidal Clusters
14.2.2 A Geometrical Interpretation
14.3 Fuzzy Clustering Algorithms
14.3.1 Point Representatives
14.3.2 Quadric Surfaces as Representatives
14.3.3 Hyperplane Representatives
14.3.4 Combining Quadric and Hyperplane Representatives
14.3.5 A Geometrical Interpretation
14.3.6 Convergence Aspects of the Fuzzy Clustering Algorithms
14.3.7 Alternating Cluster Estimation
14.4 Possibilistic Clustering
14.4.1 The Mode-Seeking Property
14.4.2 An Alternative Possibilistic Scheme
14.5 Hard Clustering Algorithms
14.5.1 The Isodata or k-Means or c-Means Algorithm
14.5.2 k-Medoids Algorithms
14.6 Vector Quantization
CHAPTER 15 CLUSTERING ALGORITHMS IV
15.1 Introduction
15.2 Clustering Algorithms Based on Graph Theory
15.2.1 Minimum Spanning Tree Algorithms
15.2.2 Algorithms Based on Regions of Influence
15.2.3 Algorithms Based on Directed Trees
15.3 Competitive Learning Algorithms
15.3.1 Basic Competitive Learning Algorithm
15.3.2 Leaky Learning Algorithm
15.3.3 Conscientious Competitive Learning Algorithms
15.3.4 Competitive Learning-Like Algorithms Associated with Cost Functions
15.3.5 Self-Organizing Maps
15.3.6 Supervised Learning Vector Quantization
15.4 Binary Morphology Clustering Algorithms (BMCAs)
15.4.1 Discretization
15.4.2 Morphological Operations
15.4.3 Determination of the Clusters in a Discrete Binary Set
15.4.4 Assignment of Feature Vectors to Clusters
15.4.5 The Algorithmic Scheme
15.5 Boundary Detection Algorithms
15.6 Valley-Seeking Clustering Algorithms
15.7 Clustering Via Cost Optimization (Revisited)
15.7.1 Branch and Bound Clustering Algorithms
15.7.2 Simulated Annealing
15.7.3 Deterministic Annealing
15.7.4 Clustering Using Genetic Algorithms
15.8 Kernel Clustering Methods
15.9 Density-Based Algorithms for Large Data Sets
15.9.1 The DBSCAN Algorithm
15.9.2 The DBCLASD Algorithm
15.9.3 The DENCLUE Algorithm
15.10 Clustering Algorithms for High-Dimensional Data Sets
15.10.1 Dimensionality Reduction Clustering Approach
15.10.2 Subspace Clustering Approach
15.11 Other Clustering Algorithms
CHAPTER 16 CLUSTER VALIDITY
16.1 Introduction
16.2 Hypothesis Testing Revisited
16.3 Hypothesis Testing in Cluster Validity
16.3.1 External Criteria
16.3.2 Internal Criteria
16.4 Relative Criteria
16.4.1 Hard Clustering
16.4.2 Fuzzy Clustering
16.5 Validity of Individual Clusters
16.5.1 External Criteria
16.5.2 Internal Criteria
16.6 Clustering Tendency
16.6.1 Tests for Spatial Randomness
Appendix A
Hints from Probability and Statistics
Appendix B
Linear Algebra Basics
Appendix C
Cost Function Optimization
Appendix D
Basic Definitions from Linear Systems Theory
Index
Preface
This book is the outgrowth of our teaching advanced undergraduate and graduate courses over the past 20 years. These courses have been taught to different audiences, including students in electrical and electronics engineering, computer engineering, computer science and informatics, as well as to an interdisciplinary audience of a graduate course on automation. This experience led us to make the book as self-contained as possible and to address students with different backgrounds. As prerequisite knowledge, the reader requires only basic calculus, elementary linear algebra, and some probability theory basics. A number of mathematical tools, such as probability and statistics as well as constrained optimization, needed by various chapters, are treated in four Appendices. The book is designed to serve as a text for advanced undergraduate and graduate students, and it can be used for either a one- or a two-semester course. Furthermore, it is intended to be used as a self-study and reference book for research and for the practicing scientist/engineer. This latter audience was also our second incentive for writing this book, due to the involvement of our group in a number of projects related to pattern recognition.
The philosophy of the book is to present various pattern recognition tasks in a unified way, including image analysis, speech processing, and communication applications. Despite their differences, these areas do share common features and their study can only benefit from a unified approach. Each chapter of the book starts with the basics and moves progressively to more advanced topics and reviews up-to-date techniques. A number of problems and computer exercises are given at the end of each chapter and a solutions manual is available from the publisher. Furthermore, a number of demonstrations based on MATLAB are available via the web at the book's site, http://www.di.uoa.gr/~stpatrec.
Our intention is to update the site regularly with more and/or improved versions of these demonstrations. Suggestions are always welcome. Also at this web site, a page will be available for typos, which are unavoidable, despite frequent careful reading. The authors would appreciate readers notifying them about any typos found.
This book would not have been written without the constant support and help from a number of colleagues and students throughout the years. We are especially indebted to Prof. K. Berberidis, Dr. E. Kofidis, Prof. A. Liavas, Dr. A. Rontogiannis, Dr. A. Pikrakis, Dr. Gezerlis, and Dr. K. Georgoulakis. The constant support provided by Dr. I. Kopsinis from the early stages up to the final stage, with those long nights, has been invaluable. The book improved a great deal after the careful reading and the serious comments and suggestions of Prof. G. Moustakides, Prof. V. Digalakis, Prof. T. Adali, Prof. M. Zervakis, Prof. D. Cavouras, Prof. A. Böhm, Prof. G. Glentis, Prof. E. Koutsoupias, Prof. V. Zissimopoulos, Prof. A. Likas, Dr. S. Perantonis, Dr. A. Vassiliou, Dr. N. Vassilas, Dr. V. Drakopoulos, Dr. S. Hatzispyros, Prof. T. Stamatopoulos, Dr. G. Paliouras, Dr. V. Karkaletsis, Dr. Mavroforakis, and Dr. S. Dralis. We are greatly indebted to these colleagues for their time and their constructive criticisms. Our collaboration and friendship with Prof. N. Kalouptsidis have been a source of constant inspiration for all these years. We are both deeply indebted to him.
Last but not least, K. Koutroumbas would like to thank Sophia, Dimitris-Marius, and Valentini-Theodora for their tolerance and support, and S. Theodoridis would like to thank Despina, Eva, and Eleni, his joyful and supportive "harem."