The data are in two gzipped files, and each line consists of the digit id (0-9) followed by the 256 grayscale values.
There are 7291 training observations and 2007 test observations, distributed as follows:
or as proportions
The test set is notoriously "difficult", and a 2.5% error rate is excellent. These data were kindly made available by the neural network group at AT&T research labs (thanks to Yann Le Cunn).