My Publications


An Introduction to Statistical Learning, with Applications in Python

by Gareth James, Daniela Witten, Trevor Hastie, Rob Tibshirani and Jonathan Taylor (July 2023). This book (ISLP) differs from the R book (ISLR2) in that the labs at the end of each chapter are implemented in Python.

Book Homepage and Resources

Book pdf

An Introduction to Statistical Learning with Applications in R (second edition)

by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani (August 2021). Three new chapters (+179 pages), including deep learning.

Book Homepage and Resources

Statistical Learning with Sparsity: the Lasso and Generalizations

by Trevor Hastie, Robert Tibshirani and Martin Wainwright (May 2015)

Book Homepage

pdf (10.5Mb, corrected online)

An Introduction to Statistical Learning with Applications in R

by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani (June 2013)

Book Homepage

pdf (9.4Mb, 6th corrected printing)

The Elements of Statistical Learning: Data Mining, Inference, and Prediction

by Trevor Hastie, Robert Tibshirani, and Jerome Friedman (2001)

Book Homepage

Statistical Models in S

edited by John Chambers and Trevor Hastie (1991)

Generalized Additive Models

by Trevor Hastie and Robert Tibshirani (1990)


The research reported here was partially supported by grants from the National Science Foundation and the National Institutes of Health.

For medical papers see also


  • Anav Sood and Trevor Hastie A Statistical View of Column Subset Selection. We consider the problem of selecting a small subset of representative variables from a large dataset. In the computer science literature, this dimensionality reduction problem is typically formalized as Column Subset Selection (CSS). Meanwhile, the typical statistical formalization is to find an information-maximizing set of Principal Variables. This paper shows that these two approaches are equivalent, and moreover, both can be viewed as maximum likelihood estimation within a certain semi-parametric model.



  • Added a new blog entry on Altered Priors. You build a classifier on some training data, but you would like to deploy it in a population where the class distribution (prior) is different. This comes up in case-control sampling, but also in other situations such as transfer learning.
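The adjustment the post discusses follows from Bayes' rule: if a classifier was trained under class priors π but deployed under priors π′, the posterior reweights as p′(y|x) ∝ p(y|x)·π′(y)/π(y). A minimal sketch (function and variable names here are illustrative, not from the post):

```python
import numpy as np

def adjust_prior(p, train_prior, target_prior):
    """Reweight posterior class probabilities for a shifted class prior:
    p'(y|x) is proportional to p(y|x) * target_prior(y) / train_prior(y)."""
    q = np.asarray(p) * (np.asarray(target_prior) / np.asarray(train_prior))
    return q / q.sum(axis=-1, keepdims=True)

# posterior from a classifier trained on balanced classes,
# deployed where class 0 has prior 0.9 and class 1 has prior 0.1
q = adjust_prior([0.3, 0.7], [0.5, 0.5], [0.9, 0.1])
```

The unadjusted posterior [0.3, 0.7] favors class 1; under the rarer deployment prior for class 1, the adjusted posterior flips to favor class 0.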

  • Elena Tuzhilina and Trevor Hastie. Weighted Low Rank Matrix Approximation and Acceleration. We develop algorithms for computing an element-weighted low-rank matrix approximation (SVD) via projected gradient descent. We consider two acceleration schemes, Nesterov and Anderson, and discuss their implementation. We show how to scale these algorithms to high-dimensional problems.
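A minimal sketch of the unaccelerated projected-gradient iteration for element-weighted low-rank approximation (the Nesterov and Anderson acceleration schemes studied in the paper are omitted, and the function name is my own):

```python
import numpy as np

def weighted_low_rank(X, W, rank, n_iter=200):
    """Projected gradient for min over rank-r M of sum_ij W_ij (X_ij - M_ij)^2.
    With weights in [0, 1] and unit step size, each iteration projects the
    gradient step W*X + (1-W)*M back onto the rank-r matrices via the SVD."""
    M = np.zeros_like(X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(W * X + (1 - W) * M, full_matrices=False)
        M = (U[:, :rank] * s[:rank]) @ Vt[:rank]  # best rank-r approximation
    return M

rng = np.random.default_rng(3)
A = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 20))  # true rank 2
W = (rng.random(A.shape) < 0.7).astype(float)            # weight 1 on ~70% of entries
M = weighted_low_rank(A, W, rank=2)
```

With 0/1 weights this reduces to matrix completion: the iteration recovers the rank-2 matrix from the observed entries alone.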

  • Zijun Gao and Trevor Hastie. LinCDE: Conditional density estimation via Lindsey's method. Lindsey's method allows for smooth density estimation by turning the density problem into a Poisson GLM. In particular, we represent an exponential tilt function in a basis of natural splines, and use discretization to deal with the normalization. In this paper we extend the method to conditional density estimation via trees and then gradient boosting with trees. JMLR 2022 R package installable from GITHUB: install_github("ZijunGao/LinCDE"); see LinCDE vignette for examples.
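A sketch of plain Lindsey's method in Python (the LinCDE package itself is in R; here a standardized polynomial basis stands in for the paper's natural splines): bin the sample, treat bin counts as Poisson, fit a Poisson GLM in the basis, and normalize the fitted means into a density.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(0)
x = rng.normal(size=2000)                 # sample whose density we estimate

# Lindsey's method: discretize the data and model bin counts as Poisson
bins = np.linspace(-4, 4, 81)
counts, _ = np.histogram(x, bins=bins)
mids = 0.5 * (bins[:-1] + bins[1:])

# standardized polynomial basis standing in for the paper's natural splines
B = np.vander(mids, 5, increasing=True)[:, 1:]
B = (B - B.mean(axis=0)) / B.std(axis=0)

glm = PoissonRegressor(alpha=1e-4, max_iter=2000).fit(B, counts)
mu = glm.predict(B)                       # fitted Poisson mean per bin
density = mu / (mu.sum() * (bins[1] - bins[0]))  # normalize to a density
```

The discretization handles the normalizing constant implicitly, which is the point of the method: density estimation becomes an ordinary Poisson GLM fit.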

  • Yosuke Tanigawa, Junyang Qian, Guhan Venkataraman, Johanne Justesen, Ruilin Li, Robert Tibshirani, Trevor Hastie, Manuel Rivas. Significant Sparse Polygenic Risk Scores across 428 traits in UK Biobank. In this survey of more than 1,600 traits in the UK Biobank, we report 428 strongly significant (p<2.5e-5) sparse polygenic risk models, computed with the snpnet lasso method developed by this team.

  • Swarnadip Ghosh, Trevor Hastie and Art Owen. Scalable logistic regression with crossed random effects. We develop an approach for fitting crossed random-effect logistic regression models at massive scales, with applications in ecommerce. We adapt a procedure of Schall (1991) and backfitting algorithms to achieve O(n) algorithms. EJS 2022

  • Zijun Gao and Trevor Hastie. DINA: Estimating Heterogenous Treatment Effects in Exponential Family and Cox Models. We extend the R-learner framework to exponential families and the Cox model. Here we define the treatment effect to be the difference in natural parameter or DINA.

  • Stephen Bates, Trevor Hastie and Rob Tibshirani. Cross-validation: what does it estimate and how well does it do it? Although CV is ubiquitous in data science, some of its properties are poorly understood. In this paper we argue that CV is better at estimating expected prediction error than the prediction error of the particular model fit to the training set. We also provide a method for computing the standard error of the CV estimate, which is larger than the commonly used naive estimate that ignores the correlations between folds. (to appear, JASA)
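For concreteness, here is the naive standard error the paper critiques, computed on simulated data (the setup is my own, not from the paper): pool the per-observation CV errors and take sd/√n, ignoring that errors within and across folds are correlated.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(1)
n, p = 200, 5
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)

# per-observation squared errors from 10-fold CV
errs = np.empty(n)
for tr, te in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    model = LinearRegression().fit(X[tr], y[tr])
    errs[te] = (y[te] - model.predict(X[te])) ** 2

cv_est = errs.mean()
naive_se = errs.std(ddof=1) / np.sqrt(n)  # ignores correlations in the folds
```

The paper's point is that `naive_se` understates the true variability of `cv_est`, because the shared training folds induce correlation among the errors.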

  • Elena Tuzhilina, Leonardo Tozzi and Trevor Hastie. Canonical Correlation Analysis in high dimensions with structured regularization.We develop structurally regularized versions of CCA for very high-dimensional MRI images from neuroscience experiments. To appear, Statistical Modelling, 2021.

  • J. Kenneth Tay, Balasubramanian Narasimhan and Trevor Hastie. Elastic Net Regularization Paths for All Generalized Linear Models. This paper describes some of the substantial enhancements to the glmnet R package ver 4.1+. All programmed GLM families are accommodated through a family() argument. We also discuss relaxed fits, and facilities for modeling stop/start data and strata in survival models. To appear, Journal of Statistical Software.



  • Trevor Hastie, Andrea Montanari, Saharon Rosset and Ryan Tibshirani. Surprises in High-Dimensional Ridgeless Least Squares Interpolation. Interpolating fitting algorithms have attracted growing attention in machine learning, mainly because state-of-the art neural networks appear to be models of this type. In this paper, we study minimum L2-norm ("ridgeless") interpolation in high-dimensional least squares regression. We consider both a linear model and a version of a neural network. We recover several phenomena that have been observed in large-scale neural networks and kernel machines, including the "double descent" behavior of the prediction risk, and the potential benefits of overparametrization. Annals of Statistics, 2022 50(2) pp 949-986.
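The object of study is easy to compute: in the p > n regime the minimum-L2-norm least squares solution is given by the pseudoinverse, it interpolates the training data exactly, and it coincides with the ridge solution in the limit lambda → 0. A small sketch (data-generating setup is my own):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 50, 200                        # overparametrized regime, p > n
X = rng.normal(size=(n, p))
y = X @ (rng.normal(size=p) / np.sqrt(p)) + 0.5 * rng.normal(size=n)

# minimum-L2-norm ("ridgeless") least squares: beta = X^+ y
beta = np.linalg.pinv(X) @ y

# it interpolates the training data exactly ...
train_resid = np.max(np.abs(X @ beta - y))

# ... and is the limit of ridge regression as lambda -> 0
lam = 1e-10
beta_ridge = X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)
```

Tracking the test risk of this interpolator as p/n varies is what produces the "double descent" curve analyzed in the paper.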

  • Didier Nibbering and Trevor Hastie. Multiclass-penalized logistic regression. We develop a model for clustering classes in multi-class logistic regression. Computational Statistics and Data Analysis, 2022

  • Zijun Gao, Trevor Hastie and Rob Tibshirani. Assessment of heterogeneous treatment effect estimation accuracy via matching. We address the difficult problem of assessing the performance of an HTE estimator. Our approach has several novelties: a flexible matching metric based on random-forest proximity scores, an optimized matching algorithm, and a match-then-split cross-validation scheme. Statistics in Medicine, April 2021


  • Junyang Qian, Yosuke Tanigawa, Wenfei Du, Matthew Aguirre, Chris Chang, Robert Tibshirani, Manuel A. Rivas, Trevor Hastie. A Fast and Scalable Framework for Large-scale and Ultrahigh-dimensional Sparse Regression with Application to the UK Biobank. PLOS Genetics October 2020. We develop a scalable lasso algorithm for fitting polygenic risk scores at GWAS scale. There is also a bioRxiv version. Our R package snpnet combines efficient batch-wise strong-rule screening with glmnet to fit lasso regularization paths on phenotypes in the UK Biobank data.
    Here is a link to the code and scripts used in the paper

  • Lukasz Kidzinski and Trevor Hastie. Longitudinal data analysis using matrix completion. We use a regularized form of matrix completion to fit functional principal component models, and extend these to other multivariate longitudinal regression models. We have an R package fcomplete which includes three vignettes demonstrating how it can be used.



  • Scott Powers, Trevor Hastie and Rob Tibshirani. Nuclear penalized multinomial regression with an application to predicting at-bat outcomes in baseball. Here we use a convex formulation of the reduced-rank multinomial model, in a novel application using a large dataset of baseball statistics. In special edition "Statistical Modelling for Sports Analytics", Statistical Modelling, vol. 18, 5-6: pp. 388-410.

  • Qingyuan Zhao and Trevor Hastie. Causal Interpretations of Black-Box Models. We draw connections between Friedman's partial dependence plot and Pearl's back-door adjustment to explore the possibility of extracting causality statements after fitting complex models by machine learning algorithms. Finally published in JBES, 39(1), 2019.
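Friedman's partial dependence function, which the paper connects to Pearl's back-door adjustment, is simple to compute by hand: clamp one feature to each grid value and average the model's predictions over the data. A sketch on simulated data (the setup is illustrative, not from the paper):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(4)
n = 500
X = rng.normal(size=(n, 3))
y = 2 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=n)

model = GradientBoostingRegressor(random_state=0).fit(X, y)

def partial_dependence(model, X, j, grid):
    """Friedman's PDP: average prediction with feature j clamped to each grid value."""
    out = []
    for v in grid:
        Xv = X.copy()
        Xv[:, j] = v        # intervene on feature j, leave the others alone
        out.append(model.predict(Xv).mean())
    return np.array(out)

grid = np.linspace(-2, 2, 9)
pd0 = partial_dependence(model, X, 0, grid)
```

The clamp-and-average step is exactly the back-door adjustment formula with the remaining features as the adjustment set, which is the connection the paper explores.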

  • Nicholas Boyd, Trevor Hastie, Stephen Boyd, Benjamin Recht, Michael Jordan. Saturating Splines and Feature Selection. We use a convex framework based on total-variation (TV) penalty norms for nonparametric regression. Saturation for degree-two splines requires that the solution extrapolate as a constant beyond the range of the data. This, together with an additive model formulation, leads to a convex path algorithm for variable selection and smoothing with generalized additive models. JMLR 18(197):1-32, 2018


  • Scott Powers, Trevor Hastie and Robert Tibshirani Customized training with an application to mass spectrometric imaging of cancer tissue. Annals of Applied Statistics 9(4) (2015), 1709-1725.

  • Rakesh Achanta and Trevor Hastie. Telugu OCR Framework using Deep Learning. We build an end-to-end OCR system for Telugu script that segments the text image, classifies the characters, and extracts lines using a language model. The classification module, which is the most challenging task of the three, is a deep convolutional neural network.

  • Jingshu Wang, Qingyuan Zhao, Trevor Hastie and Art Owen. Confounder Adjustment in Multiple Hypotheses Testing. (accepted, Annals of Statistics, 2016). We present a unified framework for analysing different proposals for adjusting for confounders in multiple testing (e.g. in genomics). We also provide an R package cate on CRAN that implements these different approaches. The vignette shows some examples of how to use it.

  • Alexandra Chouldechova and Trevor Hastie Generalized Additive Model Selection A method for selecting terms in an additive model, with sticky selection between null, linear and nonlinear terms, as well as the amount of nonlinearity. The R package gamsel has been uploaded to CRAN.



  • Lucas Janson, Will Fithian, and Trevor Hastie. Effective Degrees of Freedom: a Flawed Metaphor. The popular covariance formula for df gives some surprising results, like df>p in forward stepwise. May 2015, Biometrika

  • Stefan Wager, Trevor Hastie and Bradley Efron. Confidence Intervals for Random Forests: the Jackknife and the Infinitesimal Jackknife. We use ideas related to OOB errors to compute standard errors for bagging and random forests. Two approaches are presented, one based on the jackknife, the other on the infinitesimal jackknife. We study the bias of these estimates, as well as Monte Carlo errors. (JMLR 2014, 15 1625-1651)

  • David Warton, Bill Shipley and Trevor Hastie. CATS regression - a model-based approach to studying trait-based community assembly. We show how to use GLMs to fit community models, which are traditionally fit by maximum entropy. Apart from being a convenient platform for model fitting, all the usual summaries, statistics and extensions of GLMs are available. (Methods in Ecology and Evolution, September 29, 2014)

  • Hristo Paskov, Robert West, John Mitchell and Trevor Hastie. Compressive Feature Learning. We use an unsupervised convex document compression algorithm to derive a sparse k-gram representation for a corpus of documents. This same dictionary, in the spirit of "deep learning", is as good as the original k-gram representation for document classification tasks. To appear, NIPS 2013.

  • My first blog post with Will Fithian. This post refers to concerns that were raised about cross-validation.

  • Noah Simon, Jerome Friedman and Trevor Hastie. A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression. We use the group lasso in the context of multinomial and multi-response regression. Each variable has multiple coefficients for the different responses, and they each get selected via a group lasso penalty. Our code is an efficient implementation of block coordinate descent, and is built into the glmnet package. (submitted, on arXiv).

  • Michael Lim and Trevor Hastie. Learning interactions via hierarchical group-lasso regularization. We use the overlap group lasso in the context of a linear model to test for interactions. Our methodology can handle qualitative as well as quantitative variables. Our R package glinternet can fit linear and logistic regression models. Optimized code can handle thousands of variables (our largest example had > 20K 3-level factors). arXiv:1308.2719 (JCGS 2014, online access)

  • Michael Jordan et al. Frontiers in Massive Data Analysis. This 129 page document is the report produced by the Committee on the Analysis of Massive Data. This committee was established by the National Research Council of the National Academies, and met four times over 2011-2012 in Washington and California. Michael Jordan was the chair of the 18 member committee, made up of statisticians (5), computer scientists and mathematicians. I was a member of the committee, and was jointly responsible for Chapter 7 with David Madigan, although all committee members provided input to all chapters as well.

  • Trevor Hastie and Will Fithian. Inference from Presence-only Data: the Ongoing Controversy. This short paper argues strongly against the use of rigid parametric logistic regression models to make inferences from presence-only data in Ecology. Essentially the rigidity of the model manufactures information that is not present in the data. Ecography (2013, editors choice) Video interview with David Warton at Ecostats conference at UNSW in Sydney in July 2013. David was wearing his editor's hat for Methods in Ecology and Evolution, and the discussion centered on this paper.

  • Noah Simon, Jerome Friedman, Trevor Hastie and Rob Tibshirani. The Sparse Group Lasso. By mixing L1 penalties with group-lasso L2 penalties, we achieve a sparse group lasso where some members of a selected group can end up being zero. JCGS, May 2013, 22(2), pages 231-245.
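The key computational primitive behind the mixed penalty is its proximal operator for a single group: soft-threshold elementwise (the L1 part), then soft-threshold the whole group's norm (the group-lasso part). A small sketch, with my own function name and example values:

```python
import numpy as np

def sgl_prox(v, lam, alpha):
    """Proximal operator of the sparse group lasso penalty
    lam * (alpha*||v||_1 + (1-alpha)*||v||_2) for one group of coefficients:
    elementwise soft-thresholding, then group-level soft-thresholding."""
    z = np.sign(v) * np.maximum(np.abs(v) - alpha * lam, 0.0)
    nz = np.linalg.norm(z)
    if nz == 0.0:
        return z                                   # whole group zeroed out
    return z * max(0.0, 1.0 - (1 - alpha) * lam / nz)

out = sgl_prox(np.array([3.0, -0.2, 0.1]), lam=0.5, alpha=0.5)   # within-group sparsity
zero_group = sgl_prox(np.array([0.2, -0.1]), lam=0.5, alpha=0.5)  # group fully zeroed
```

The first call keeps only the large coefficient (shrunk), zeroing the small members of a selected group; the second zeroes the entire group, showing the two levels of sparsity the paper describes.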

  • Jianqiang Wang and Trevor Hastie. Boosted Varying-Coefficient Regression Models for Product Demand Prediction. We use the varying coefficient paradigm to fit a market segmented product demand model, with boosted regression trees as the nonparametric component. JCGS

  • Julia Viladomat, Rahul Mazumder, Alex McInturff, Douglas McCauley and Trevor Hastie. Assessing the significance of global and local correlations under spatial autocorrelation; a nonparametric approach. Variables collected over a spatial domain often exhibit strong spatial autocorrelation. When such variables are used in a regression, pairwise correlation analysis, or in the popular geographically weighted regression, it can be difficult to assess significance. We propose a general approach based on randomization followed by smoothing to restore the spatial correlation structure. R code used in the paper. (Biometrics, Jan 2014)


  • Will Fithian and Trevor Hastie. Finite-sample equivalence in statistical models for presence-only data. We show that a lot of different approaches to presence-only data are the same, in particular inhomogeneous Poisson processes, maxent, and naive logistic regression (when weighted appropriately). (AoAS 2013, 7(4), 1917-1939)

  • Jason Lee and Trevor Hastie. Learning Mixed Graphical Models. We use group-lasso regularized pseudo-likelihood for learning the structure of a graphical model with mixed discrete and continuous variables. Our model respects the symmetry imposed by a Markov random field representation --- each of the potentials gets a vote from a pair of regression models (Gaussian, logistic or multinomial), where each of the pair of variables is the response and predictor. (on arXiv), JCGS 24(1) 2015. Go to Jason Lee's webpage for matlab code and a demo.

  • Rahul Mazumder, Jerome Friedman and Trevor Hastie. Sparsenet R package on CRAN. Fits sparse solution paths for linear models (square-error loss) using coordinate descent with MC+ penalty family. Software is very fast, and can handle many thousands of variables. Functions for cross-validation, prediction, plotting etc. Based on algorithms described in SparseNet : Coordinate Descent with Non-Convex Penalties. JASA 2011, 106(495) 1125-1138.



  • Friedman, J., Hastie, T. and Tibshirani, R. (Published version) Additive Logistic Regression: a Statistical View of Boosting Annals of Statistics 28(2), 337-407. (with discussion)
    We show that boosting fits an additive logistic regression model by stagewise optimization of a criterion very similar to the log-likelihood, and present likelihood based alternatives. We also propose a multi-logit boosting procedure which appears to have advantages over other methods proposed so far. Here are the slides (2 per page) for my boosting talk.
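The object interpreted in the paper is easy to exhibit: discrete AdaBoost with stumps builds up an additive fit F(x) = sum of alpha_m * f_m(x) by stagewise reweighting. A compact sketch (simulated data and constants are my own, not from the paper):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(5)
n = 400
X = rng.normal(size=(n, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)    # labels in {-1, +1}

# Discrete AdaBoost with stumps: the stagewise reweighting scheme the
# paper interprets as fitting an additive logistic regression model
w = np.full(n, 1.0 / n)
F = np.zeros(n)                               # the additive fit F(x)
for _ in range(100):
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    pred = stump.predict(X)
    err = w[pred != y].sum()                  # weighted training error
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
    F += alpha * pred                         # stagewise additive update
    w *= np.exp(-alpha * y * pred)            # upweight the misclassified points
    w /= w.sum()

train_err = np.mean(np.sign(F) != y)
```

The paper's observation is that this procedure performs stagewise minimization of exp(-yF(x)), a criterion close to the binomial log-likelihood, so F/2 estimates half the logit of P(y=1|x).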

  • Crellin, N., Hastie, T. and Johnstone, I. "Statistical Models for Image Sequences" Technical report, submitted to "Human Brain Mapping". We study fMRI sequences of the human brain obtained from experiments involving repetitive neuronal activity. We investigate the functional form of the hemodynamic response function, and provide evidence that the commonly adopted convolution model is inadequate.

  • Hastie, T. and Tibshirani, R. "Bayesian Backfitting" Stanford Technical report.
    The Gibbs sampler looks and feels like the backfitting algorithm for fitting additive models. Indeed, a simple modification to backfitting turns it into a Gibbs sampler for spitting out samples from the "posterior" distribution for an additive fit.
    Published Statistical Science 15, no. 3 (2000), 196-223

  • Wu, T., Hastie, T., Schmidler, S. and Brutlag, D. "Regression Analysis of Multiple Protein Structures" Models for lining up and averaging groups of protein structures.



  • Hastie, T. "Neural Networks", to appear in Encyclopaedia of Biostatistics. A brief survey with some personal points of view.

  • Hastie, T., and Tibshirani, R. "Classification by Pairwise Coupling" We solve a multiclass classification problem by combining all the pairwise rules. This paper builds on ideas proposed by J. Friedman. An abbreviated version is published in Advances in Neural Information Processing Systems 10, M. I. Jordan, M. J. Kearns, S. A. Solla, eds., MIT Press, 1998.

  • Hastie, T. and Tibshirani, R. "Generalized Additive Models" to appear in "Encyclopaedia of Statistical Sciences". A survey paper on GAMs.


  • Hastie, T. J., and Pregibon, D. "Shrinking Trees." AT&T Bell Laboratories Technical Report (March 1990). Unpublished manuscript. Thanks to Mu Zhu for turning the pre-web technical memorandum into an online document.



  • Hastie, T. J., and Pregibon, D. "A new algorithm for matched case-control studies with applications to additive models." AT&T Bell Laboratories Technical Report (November 1987). Unpublished manuscript. A shorter version appeared in the proceedings of Compstat 1988

  • Hastie, T. and Little, F. Principal Profiles. A nonlinear version of principal components for compositional data, that helps explain the horseshoe effect. Published in Proceedings of the Interface Meeting (Comp.Sci and Stat), 1987.

  • Greenacre, M. and Hastie, T. A Geometric Interpretation of Correspondence Analysis. There are many ways to think of CA. This paper presents it as a form of PCA, or subspace approximation in a Chi-squared metric. JASA (82) June 1987.

  • Hastie, T. A Closer Look at the Deviance. The deviance for GLMs is in many ways the analog of the sum of squares in linear regression. This little paper surveys the connections. The American Statistician 41(1), 1987, pp 16-20
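A tiny numeric illustration of the analogy (my own example values): for the Gaussian family the deviance is exactly the residual sum of squares, while other families, such as the Poisson, substitute their own likelihood-based discrepancy.

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0, 5.0])    # observed responses
mu = np.array([1.5, 2.0, 2.5, 4.0])   # fitted means from some GLM

# Gaussian deviance = residual sum of squares
rss = np.sum((y - mu) ** 2)

# Poisson deviance: 2 * sum[ y*log(y/mu) - (y - mu) ]
dev_pois = 2 * np.sum(y * np.log(y / mu) - (y - mu))
```

Both quantities are nonnegative, vanish when mu matches y exactly, and play the same role in analysis-of-deviance tables that the sum of squares plays in ANOVA.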



  • Principal Curves and Surfaces SLAC (Stanford Linear Accelerator Center) has put up a pdf version of my Ph.D thesis.

    Original principal curves and surfaces movie (youtube).

  • Generalized Additive Models: the original technical report, written by two PhD students.