Research Areas of Interest

Statistical Learning and Data Mining

I have a long-standing interest in flexible and nonparametric techniques for function estimation and prediction. My Ph.D. thesis was on "Principal Curves and Surfaces" (advisor Werner Stuetzle), a nonparametric version of principal components that fits smooth curves and low-dimensional manifolds through the "middle" of a multi-dimensional set of points.

"Generalized Additive Models" (with Rob Tibshirani) offer a more flexible approach to popular methods like multiple linear regression, logistic and log-linear regression, and the Cox model. Linear functions can be replaced by more flexible smooth functions. One can mix and match linear terms with smooth terms, which allows a natural blend with classical linear models.
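To make the "mix and match" idea concrete, here is a sketch of the usual way an additive model is written. The notation is an assumption for illustration and does not come from this page: g is a link function, the f_j are unspecified smooth functions, and the split of predictors into q linear and p - q smooth terms is arbitrary.

```latex
% Generalized additive model: each predictor enters through its own smooth function
g\bigl(\mathrm{E}[Y \mid X_1, \dots, X_p]\bigr) = \alpha + \sum_{j=1}^{p} f_j(X_j)

% Mixing linear and smooth terms: the first q predictors stay linear
g\bigl(\mathrm{E}[Y \mid X_1, \dots, X_p]\bigr)
  = \alpha + \sum_{j=1}^{q} \beta_j X_j + \sum_{j=q+1}^{p} f_j(X_j)
```

Taking every f_j to be linear recovers the classical model for that link: the identity link gives multiple linear regression, and the logit link gives logistic regression.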

In the same vein, I have worked on nonparametric versions of linear discriminant analysis, mixture classification problems, and other classification schemes.

Other exotica include modeling human signatures, handwritten digits, three-dimensional protein structures, and human gait, and I am still looking...

In the last 10 years my colleagues and I have been drawn into the machine learning domain, probably by the lure of neural networks. This has led us to offer a statistical perspective on novel and popular techniques arising outside of statistics, such as boosting and support-vector machines. This culminated in our 2001 book "The Elements of Statistical Learning", but the interest continues.

Statistical Computing

During my nine years at AT&T Bell Laboratories, I was drawn into the field of statistical computing by my mentors John Chambers, Daryl Pregibon, Rick Becker, and Allan Wilks. What this really means is that I learned to design and write decent (at least acceptable) software, and lots of it, leading up to "Statistical Models in S" (co-edited with John Chambers). We developed the statistical modeling framework for linear, generalized linear, and additive models that is currently used in Splus and R. I still love writing good code, and most recently I am proud of my contributions to the MDA software in R/Splus, LARS, and the PAM genomic software.


There is a great deal of genomics going on at Stanford, and since I am half-time in the medical school, I am naturally involved. Rob Tibshirani and I collaborate with several researchers on microarray-related projects, and have developed a number of procedures for analyzing and modeling expression arrays.