This is the second revision of our second-level statistics text, originally published in 1978 and first revised in 1987. As before, this text is intended primarily for advanced undergraduates, graduate students, and working professionals in the health, social, biological, and behavioral sciences who engage in applied research in their fields. The book may also provide professional statisticians with some new insights into the application of advanced statistical techniques to real-life problems.
We have attempted in this revision to retain the basic structure and flavor of the earlier two editions, while at the same time making changes to keep pace with current analytic practices and computer usage in applied research. Notable changes in the third edition, discussed in more
detail later, include a fourth author (Azhar Nizam), some reorganization of topics (in Chapters
22-24), expanded coverage of some content areas (such as logistic regression, in Chapter 23), a new chapter (Chapter 21 on repeated measures ANOVA), some new exercises for the reader, and
the integration of computer output (using the SAS package, primarily) into our discussion of
examples in the main body of the text and as a component of exercises given at the end of each
chapter. We have deleted from the previous editions chapters on discriminant analysis, factor
analysis, and categorical data analysis. This decision was based on our finding from a survey of
previous users of our text that these chapters were rarely used for classroom instruction and
were largely out of date. At the same time, the chapters we have added to replace this material
seem more relevant to current applied research practice.
In this revision, as in our previous versions, we emphasize the intuitive logic and assump-
tions that underlie the techniques covered, the purposes for which these techniques are de-
signed, the advantages and disadvantages of the techniques, and valid interpretations based on
the techniques. Although we describe the statistical calculations required for the techniques we
cover, we rely on computer output (even more so in this revision than previously) to provide the
results of such calculations, so the reader can concentrate on how to apply a given technique
rather than on how to carry out the calculations. The mathematical formulas that we do present
require no more than simple algebraic manipulations. Proofs are of secondary importance and
are generally omitted. Neither calculus nor matrix algebra is used anywhere in the main text,
although we have included an appendix on matrices for the interested reader.
The text is not intended to be a general reference work dealing with all the statistical techniques available for analyzing data involving several variables. Instead, we focus on the techniques we consider most essential for use in applied research. After becoming proficient with the material in this text, the reader should be able to benefit from more specialized discussions of applied topics not covered here.
The most notable features of this second revised edition are the following:
1. Regression analysis and analysis of variance are discussed in considerable detail and
with pedagogical care that reflects the authors' extensive experience and insight as
teachers of such material.
2. The relationship between regression analysis and analysis of variance is highlighted.
3. The connection between multiple regression analysis and multiple and partial corre-
lation analysis is discussed in detail.
4. Several advanced topics are presented in a unique, nonmathematical manner, in-
cluding the analysis of repeated measures data (a new topic in this edition), maxi-
mum likelihood methods, logistic regression (expanded into a new chapter), and
Poisson regression (also expanded into a new chapter).
5. An up-to-date discussion of the issues and procedures involved in fine-tuning a
regression analysis is presented in chapters on confounding and interaction in re-
gression, regression diagnostics, and selecting the best model.
6. Numerous examples and exercises illustrate applications to real studies in a wide
variety of disciplines. New exercises have been added to all chapters.
7. Representative computer results from packaged programs (primarily using the SAS
package) are used to illustrate concepts in the body of the text, as well as to provide
a basis for exercises for the reader. We have greatly expanded the quantity of com-
puter results provided throughout the text. Whenever appropriate, we have used
computer output to replace material in the previous edition that unnecessarily
emphasized numerical calculations.
8. The complete set of data for most exercises is provided, along with related computer
results. This allows the instructor to assign computer work based on available pack-
aged programs. However, if the instructional objectives involve a minimum of com-
puter work, the instructor can use the computer results to give the student practical
experience in interpreting computer output based on the techniques described in the
text.
9. The reorganization and expansion of the material on maximum likelihood methods
into three chapters (22-24) provide a strong foundation for understanding the most
widely used method for fitting mathematical models involving several variables.
10. A new chapter on methods for the analysis of repeated measures data (Chapter 21)
extends the discussion of ANOVA methods to a rapidly developing area of statisti-
cal methodology for the analysis of correlated data.
For formal classroom instruction, the chapters fall naturally into three clusters: Chapters 4 through 16, on regression analysis; Chapters 17 through 20, on analysis of variance, with optional use of Chapter 21 to introduce the analysis of repeated measures data; and Chapters 22 through 24, on maximum likelihood methods and important applications involving logistic and
Poisson regression modeling. For a first course in regression analysis, some of Chapters 11
through 16 may be considered too specialized. For example, Chapter 12 on regression diagnos-
tics and Chapter 16 on selecting the best model might be used in a continuation course on
regression modeling, which might also include some of the advanced topics covered in Chap-
ters 21 through 24.
The Teaching Package
A data disk is bound into each copy of the book. This disk contains data for the problems;
the data sets are provided in SAS, StataQuest, Minitab, and ASCII formats. A Student Solutions
Manual contains complete solutions for all of the problems for which answers are given in
Appendix D, and a Solutions Manual, available to adopting instructors, contains complete solu-
tions to all problems in the book.
Acknowledgments
We wish to acknowledge several people who contributed to the preparation of this text.
Drs. Kleinbaum and Kupper continue to be indebted to John Cassel and Bernard Greenberg, two
mentors who have provided us with inspiration and the professional and administrative guid-
ance that enabled us to gain the broad experience necessary to write this book. Dr. Muller adds
his thanks to Bernard Greenberg. Dr. Kleinbaum also wishes to thank John Boring, Chair of the
Epidemiology Department at Emory University, for his strong support and encouragement and
for his deep commitment to teaching excellence. Dr. Kupper wishes to thank Barry Margolin,
Chair of the Biostatistics Department at the University of North Carolina, for his leadership and
support. Azhar Nizam wishes to thank the chair of his department, Dr. Vicki Hertzberg, Depart-
ment of Biostatistics at Emory University.
We also wish to thank Edna Kleinbaum, Sandy Martin, Sally Muller, and Janet Nizam for
their encouragement and support during the writing of this revision. We thank our many stu-
dents and colleagues at Emory University and at the University of North Carolina for their help-
ful comments and suggestions. We also want to thank the reviewers: Robert J. Anderson,
University of Illinois at Chicago; Alfred A. Bartolucci, The University of Alabama at Birming-
ham; Robert Cochran, University of Wyoming; Joseph L. Fleiss, Columbia University Medical
Center; James E. Holstein, University of Missouri at Columbia; Robin H. Lock, St. Lawrence
University; Frank P. Mathur, Cal Poly at Pomona; and Satya N. Mishra, University of South
Alabama. Finally, we thank those persons responsible for publishing this book: Alex Kugushev,
Jamie Sue Brooks, and Dusty Davidson.