### Basic Information

- Original title: Neural Networks and Learning Machines (3rd Edition)
- Original publisher: Prentice Hall

### Editorial Recommendation

Drawing on recent advances in neural networks and machine learning, this book offers a comprehensive, systematic introduction to the basic models, methods, and techniques of neural networks from both theoretical and practical perspectives, and integrates neural networks and machine learning into a unified treatment.

This is the full English-language edition of the book.

### Synopsis


Neural networks form an important branch of computational intelligence and machine learning, and they have seen great success in many fields. Among the many books on neural networks, the most influential is Simon Haykin's Neural Networks: A Comprehensive Foundation (《神经网络原理》), retitled Neural Networks and Learning Machines in its third edition. Drawing on the latest advances in neural networks and machine learning, the author gives a comprehensive, systematic introduction to the basic models, methods, and techniques of neural networks from both theoretical and practical perspectives, integrating neural networks and machine learning into a unified treatment.

The book not only emphasizes mathematical analysis and theory but also pays close attention to applications of neural networks in practical engineering problems such as pattern recognition, signal processing, and control systems. It is highly readable: the author examines the basic models and principal learning theories of neural networks in depth yet with a light touch, and a wealth of experimental results, worked examples, and exercises helps the reader master the subject.

This edition has been extensively revised from the previous one and provides an up-to-date treatment of neural networks and machine learning, two increasingly important disciplines.

Features of this book:

- Online learning algorithms based on stochastic gradient descent; small-scale and large-scale learning problems.
- Kernel methods, including support vector machines and the representer theorem.
- Information-theoretic learning models, including copulas, independent-components analysis (ICA), coherent ICA, and the information bottleneck.
- Stochastic dynamic programming, including approximate and neurodynamic programming.
- Sequential state-estimation algorithms, including Kalman and particle filters.
- Training of recurrent neural networks with sequential state-estimation algorithms.


### Table of Contents

Acknowledgements

Abbreviations and Symbols

GLOSSARY

Introduction

1. What is a Neural Network?

2. The Human Brain

3. Models of a Neuron

4. Neural Networks Viewed As Directed Graphs

5. Feedback

6. Network Architectures

7. Knowledge Representation

8. Learning Processes

9. Learning Tasks

10. Concluding Remarks

Notes and References

Chapter 1 Rosenblatt's Perceptron

1.1 Introduction

1.2 Perceptron

1.3 The Perceptron Convergence Theorem

### Preface

The guiding aim of the book remains: write an up-to-date treatment of neural networks in a comprehensive, thorough, and readable manner.

The new edition has been retitled Neural Networks and Learning Machines, in order to reflect two realities:

1. The perceptron, the multilayer perceptron, self-organizing maps, and neurodynamics, to name a few topics, have always been considered integral parts of neural networks, rooted in ideas inspired by the human brain.

2. Kernel methods, exemplified by support-vector machines and kernel principal-components analysis, are rooted in statistical learning theory.

Although, indeed, they share many fundamental concepts and applications, there are some subtle differences between the operations of neural networks and learning machines. The underlying subject matter is therefore much richer when they are studied together, under one umbrella, particularly so when:

- ideas drawn from neural networks and machine learning are hybridized to perform improved learning tasks beyond the capability of either one operating on its own, and

- ideas inspired by the human brain lead to new perspectives wherever they are of particular importance.

Moreover, the scope of the book has been broadened to provide detailed treatments of dynamic programming and sequential state estimation, both of which have affected the study of reinforcement learning and supervised learning, respectively, in significant ways.

Organization of the Book

The book begins with an introductory chapter that is motivational, paving the way for the rest of the book which is organized into six parts as follows:

1. Chapters 1 through 4, constituting the first part of the book, follow the classical approach to supervised learning. Specifically:

Chapter 1 describes Rosenblatt's perceptron, highlighting the perceptron convergence theorem, and the relationship between the perceptron and the Bayesian classifier operating in a Gaussian environment.
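As a rough illustration of the perceptron described above, here is a minimal Python sketch (my own, not the book's code) of Rosenblatt's error-driven update rule on a hypothetical linearly separable toy set; the convergence theorem guarantees that the loop below terminates after a finite number of updates when such a separating hyperplane exists:

```python
import numpy as np

def train_perceptron(X, y, epochs=100, eta=1.0):
    """Rosenblatt's perceptron learning rule on labels y in {-1, +1}.

    Illustrative sketch: weights are updated only on misclassified
    examples, and training stops once an epoch produces no errors.
    """
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append bias input
    w = np.zeros(Xb.shape[1])                      # weights plus bias
    for _ in range(epochs):
        errors = 0
        for x_i, y_i in zip(Xb, y):
            if y_i * np.dot(w, x_i) <= 0:   # misclassified (or on boundary)
                w += eta * y_i * x_i        # error-driven update
                errors += 1
        if errors == 0:                     # converged: all points correct
            break
    return w

# Hypothetical linearly separable toy data: class is the sign of x1 + x2 - 1
X = np.array([[0.0, 0.0], [2.0, 2.0], [0.0, 2.0], [2.0, 0.0]])
y = np.array([-1, 1, 1, 1])
w = train_perceptron(X, y)
preds = np.sign(np.hstack([X, np.ones((4, 1))]) @ w)
```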

Chapter 2 describes the method of least squares as a basis for model building. The relationship between this method and Bayesian inference for the special case of a Gaussian environment is established. This chapter also includes a discussion of the minimum description length (MDL) principle for model selection.
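The least-squares model building that Chapter 2 describes amounts to minimizing the squared error ||d - Xw||^2; a minimal sketch using hypothetical noiseless data and NumPy's least-squares solver (the data and names here are illustrative assumptions, not the book's):

```python
import numpy as np

# Least-squares fit: minimize ||d - X w||^2 over the parameter vector w,
# equivalently solving the normal equations X^T X w = X^T d.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))          # hypothetical regressor matrix
w_true = np.array([0.5, -1.0, 2.0])    # hypothetical true parameters
d = X @ w_true                         # noiseless desired response

w_ls, *_ = np.linalg.lstsq(X, d, rcond=None)
# With noiseless data, w_ls recovers w_true
```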

Chapter 3 is devoted to the least-mean-square (LMS) algorithm and its convergence analysis. The theoretical framework of the analysis exploits two principles: Kushner's direct method and the Langevin equation (well known in nonequilibrium thermodynamics).
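A minimal sketch of the LMS recursion whose convergence Chapter 3 analyzes; the fixed learning rate and the synthetic data below are my own illustrative assumptions, and the chapter's analysis concerns precisely how the step size governs the stability of this update:

```python
import numpy as np

def lms(X, d, eta=0.05, epochs=50):
    """Least-mean-square (LMS) filter: stochastic gradient descent on the
    instantaneous squared error, one example at a time.
    """
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_n, d_n in zip(X, d):
            e = d_n - np.dot(w, x_n)    # instantaneous error e(n)
            w += eta * e * x_n          # w(n+1) = w(n) + eta * e(n) * x(n)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))           # hypothetical input samples
w_true = np.array([1.0, -2.0])          # hypothetical target weights
d = X @ w_true                          # noiseless desired response
w_hat = lms(X, d)                       # converges toward w_true
```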

These three chapters, though different in conceptual terms, share a common feature: They are all based on a single computational unit. Most importantly, they provide a great deal of insight into the learning process in their own individual ways--a feature that is exploited in subsequent chapters.

Chapter 4, on the multilayer perceptron, is a generalization of Rosenblatt's perceptron. This rather long chapter covers the following topics:

- the back-propagation algorithm, its virtues and limitations, and its role as an optimum method for computing partial derivatives;

- optimal annealing and adaptive control of the learning rate;

- cross-validation;


### Excerpt

The form of supervised learning we have just described is the basis of error-correction learning. From Fig. 24, we see that the supervised-learning process constitutes a closed-loop feedback system, but the unknown environment is outside the loop. As a performance measure for the system, we may think in terms of the mean-square error, or the sum of squared errors over the training sample, defined as a function of the free parameters (i.e., synaptic weights) of the system. This function may be visualized as a multidimensional error-performance surface, or simply error surface, with the free parameters as coordinates. The true error surface is averaged over all possible input-output examples. Any given operation of the system under the teacher's supervision is represented as a point on the error surface. For the system to improve performance over time and therefore learn from the teacher, the operating point has to move down successively toward a minimum point of the error surface; the minimum point may be a local minimum or a global minimum. A supervised learning system is able to do this with the useful information it has about the gradient of the error surface corresponding to the current behavior of the system.
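The excerpt's picture of an operating point descending an error surface can be made concrete with a small sketch. The quadratic surface below is hypothetical, chosen only so that its minimum is known in closed form; gradient descent moves the operating point downhill, exactly as described:

```python
import numpy as np

# Hypothetical quadratic error surface E(w) = (w1 - 1)^2 + 2*(w2 + 3)^2,
# whose global minimum sits at w = (1, -3).
def grad_E(w):
    return np.array([2 * (w[0] - 1), 4 * (w[1] + 3)])

w = np.array([5.0, 5.0])       # initial operating point on the surface
eta = 0.1                      # learning-rate (step-size) parameter
for _ in range(200):
    w -= eta * grad_E(w)       # step opposite to the gradient

# w is now very close to the minimum point (1, -3)
```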