Paper

  • Title: Deep Residual Learning for Image Recognition
  • Authors: Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun
  • URL: https://arxiv.org/abs/1512.03385
  • Year: 2015
  • Other information: Presented at the Computer Vision and Pattern Recognition Conference (CVPR 2016).
  • Key: He2015

Goal:

  • Reformulate the neural network architecture to address the degradation problem that arises when the number of layers is very large.

Motivation:

  • Degradation problem:
    • As the depth of the network increases, accuracy saturates and then starts degrading.
    • Training error itself increases with more layers, so the degradation is not caused by overfitting.
    • In theory, this should not happen: given a shallower network, a deeper one can be constructed by adding layers that are identity mappings, and it would perform at least as well. In practice, optimization algorithms apparently have difficulty finding such solutions in feasible time.

Residual Block:

  • Layers are reformulated as learning residual functions with reference to the layer inputs.

  • Residual Mapping.

    • Shortcut connections are added to the network so that it can perform identity mappings.

    • Identity shortcut connections do not add extra parameters or complexity to the network.

    • The idea can be seen as follows. Given the activation a[L] of layer L of a neural net, the activation at layer L+2 with a shortcut from layer L can be written as:

      a[L+2] = ReLU(W[L+2] * a[L+1] + b[L+2] + a[L])

    where W[L+2] is the weight matrix and b[L+2] is the bias vector of layer L+2.

    • Learning an identity mapping becomes easy in this case: if weight decay drives W[L+2] and b[L+2] towards zero, the activation reduces to a[L+2] = ReLU(a[L]) = a[L] (a[L] is non-negative because it is itself the output of a ReLU).

    • Care must be taken to match dimensions: a linear projection (e.g. a 1x1 convolution) can be applied to the shortcut before the sum. A minimal code sketch of such a block follows this list.
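
A minimal sketch of a residual block in PyTorch (an illustrative implementation, not the authors' code; names and defaults are my own): the block computes a residual F(x) with two 3x3 convolutions and adds the shortcut, using a 1x1 projection only when dimensions change.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Basic residual block: out = ReLU(F(x) + shortcut(x)). Hypothetical sketch."""

    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        # Residual function F(x): two 3x3 convolutions with batch norm.
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        # Identity shortcut adds no parameters; a 1x1 projection is used
        # only when the spatial size or channel count changes.
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1,
                          stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )
        else:
            self.shortcut = nn.Identity()

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + self.shortcut(x)   # add the shortcut before the final ReLU
        return F.relu(out)


# Usage example: a block that halves the spatial size and doubles the channels.
x = torch.randn(1, 16, 32, 32)
block = ResidualBlock(16, 32, stride=2)
print(block(x).shape)  # torch.Size([1, 32, 16, 16])
```

Adding the shortcut before the final ReLU matches the formulation above, a[L+2] = ReLU(... + a[L]).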

Datasets:

  • For the image classification task, two datasets were used: ImageNet and CIFAR-10.

|                   | ImageNet | CIFAR-10 |
| ----------------- | -------- | -------- |
| Training images   | 1.2M     | 50K      |
| Validation images | 50K      | (*)      |
| Testing images    | 100K     | 10K      |
| Number of classes | 1000     | 10       |

(*) In the CIFAR-10 experiments, the 50K training images are split into 45K/5K training/validation sets.
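
For reference, a small sketch of this kind of 45K/5K split (an assumed torchvision-based loading, not the authors' pipeline):

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# Load the 50K CIFAR-10 training images and hold out 5K for validation.
full_train = datasets.CIFAR10(root="./data", train=True, download=True,
                              transform=transforms.ToTensor())
train_set, val_set = random_split(
    full_train, [45_000, 5_000],
    generator=torch.Generator().manual_seed(0))  # fixed seed for a reproducible split
print(len(train_set), len(val_set))  # 45000 5000
```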

Experiments and Results

ImageNet Dataset

  • Input images:
    • Scale jittering as in Simonyan2015: the image is resized with its shorter side sampled from [256, 480].
    • A 224x224 crop is sampled from the resized image or its horizontal flip.
    • Data augmentation following the Krizhevsky2012 methodology: horizontal flips and per-pixel RGB color alterations.
  • Training
    • Weight initialization: follows previous work by the authors.
    • SGD with batch normalization; weight decay = 0.0001, momentum = 0.9.
    • Mini-batch size = 256.
    • Learning rate starts at 0.1 and is divided by 10 when the error plateaus on the validation set (a sketch of this setup follows the results tables below).
    • Dropout is not employed.
  • Testing
    • Multi-crop procedure from Krizhevsky2012 is employed: 10 crops.
    • Fully connected layers are converted into convolutional layers.
    • Average of scores at multiple scales is employed. Testing scales used: {224, 256, 384, 480, 640}.
  • Configurations tested on ImageNet dataset

  • Single Model Results (validation set):

| Architecture | top-1 error (%) | top-5 error (%) |
| --- | --- | --- |
| VGG (ILSVRC'14) | - | 8.43 |
| GoogLeNet (ILSVRC'14) | - | 7.89 |
| VGG (v5) | 24.4 | 7.1 |
| PReLU-net | 21.59 | 5.71 |
| BN-Inception | 21.99 | 5.81 |
| ResNet-34 B (projections + identities) | 21.84 | 5.71 |
| ResNet-34 C (projections) | 21.53 | 5.60 |
| ResNet-50 | 20.74 | 5.25 |
| ResNet-101 | 19.87 | 4.60 |
| ResNet-152 | 19.38 | 4.49 |
  • Ensemble Models Results (test set):

| Architecture | top-5 error (%) |
| --- | --- |
| VGG (ILSVRC'14) | 7.32 |
| GoogLeNet (ILSVRC'14) | 6.66 |
| VGG (v5) | 6.8 |
| PReLU-net | 4.94 |
| BN-Inception | 4.82 |
| ResNet (ILSVRC'15) | 3.57 |
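
To make the ImageNet training recipe above concrete, here is a minimal sketch of the optimizer setup (an assumed PyTorch configuration, not the authors' code; the model, data loader, and validation routine are placeholders):

```python
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import ReduceLROnPlateau
from torchvision.models import resnet50

# Hypothetical setup mirroring the reported hyperparameters:
# SGD, momentum 0.9, weight decay 1e-4, mini-batch 256, LR 0.1 divided by 10
# when the validation error plateaus; no dropout.
model = resnet50(weights=None)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1,
                      momentum=0.9, weight_decay=1e-4)
scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.1, patience=5)

def train_one_epoch(loader):
    model.train()
    for images, targets in loader:  # loader should yield mini-batches of 256
        optimizer.zero_grad()
        loss = criterion(model(images), targets)
        loss.backward()
        optimizer.step()

# After each epoch, step the scheduler with the validation error so that the
# learning rate drops by 10x when the error stops improving:
#   scheduler.step(val_error)
```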

CIFAR-10 Dataset

  • Input images:

    • Inputs: 32x32 images
    • Configurations tested on this dataset:
    | output map size | 32x32 | 16x16 | 8x8 |
    | --------------- | ----- | ----- | --- |
    | num. layers     | 1+2n  | 2n    | 2n  |
    | num. filters    | 16    | 32    | 64  |
    • Shortcuts are connected to pairs of 3x3 layers (3n shortcuts in total); a construction sketch is given at the end of this section.
  • Training

    • Weight initialization: follows previous work by the authors.
    • SGD with batch normalization; weight decay = 0.0001, momentum = 0.9.
    • Mini-batch size = 128 on 2 GPUs.
    • Learning rate starts at 0.1 and is divided by 10 at 32k and 48k iterations; training stops at 64k iterations.
    • Dropout is not employed.
  • Testing

    • Only the original 32x32 image is evaluated (single view).
  • Results

(figure: CIFAR-10 results)
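
Finally, a sketch of how the 6n+2-layer CIFAR-10 configuration above could be assembled, reusing the ResidualBlock from the earlier sketch (an illustrative construction with my own names, not the authors' implementation):

```python
import torch
from torch import nn

# Assumes the ResidualBlock class defined in the earlier sketch is in scope.
# Total depth is 6n + 2: one initial 3x3 conv, three stages of 2n conv layers
# (n residual blocks each), and a final fully connected layer.

def make_cifar_resnet(n, num_classes=10):
    def stage(in_ch, out_ch, blocks, stride):
        layers = [ResidualBlock(in_ch, out_ch, stride=stride)]
        layers += [ResidualBlock(out_ch, out_ch) for _ in range(blocks - 1)]
        return nn.Sequential(*layers)

    return nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(16),
        nn.ReLU(inplace=True),
        stage(16, 16, n, stride=1),   # output map 32x32, 16 filters
        stage(16, 32, n, stride=2),   # output map 16x16, 32 filters
        stage(32, 64, n, stride=2),   # output map 8x8, 64 filters
        nn.AdaptiveAvgPool2d(1),      # global average pooling
        nn.Flatten(),
        nn.Linear(64, num_classes),
    )

# Example: n = 3 gives the 20-layer network (6*3 + 2 = 20).
model = make_cifar_resnet(3)
print(model(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 10])
```

The 32k/48k learning-rate drops could be reproduced with torch.optim.lr_scheduler.MultiStepLR stepped once per iteration, with milestones at 32000 and 48000 (an assumption about how to implement the schedule).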