A curated list of research on deep learning algorithms (with a focus on models). I also summarize classic papers and provide corresponding PyTorch implementations. Because the models are pre-trained on large-scale labeled datasets (e.g., ImageNet and Microsoft COCO), I only provide inference implementations.
Recent deep learning models fall into three main architectures: 1) Convolutional Neural Networks (CNNs); 2) Recurrent Neural Networks (RNNs); and 3) Generative Adversarial Networks (GANs). I introduce each architecture first and then describe its applications (e.g., object detection using CNNs). In addition, I introduce an effective module named attention, which is widely used in Natural Language Processing (e.g., in the Transformer and BERT).
A convolutional neural network (CNN, or ConvNet) is a class of deep neural networks, most commonly applied to analyzing visual imagery. They are also known as shift-invariant or space-invariant artificial neural networks (SIANNs), based on the shared-weight architecture of the convolution kernels or filters that slide along input features and provide translation-equivariant responses known as feature maps. [1]
- Image Classification (AlexNet, VGG, Inception and ResNet) [2]
- Object Detection (RCNN series, YOLOvx and SSD) [3]
- Semantic Segmentation (FCN, U-Net, PSPNet, Mask RCNN and DeepLab) [4, 5]
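The shared-weight convolution idea above can be sketched in a few lines of PyTorch. This is a minimal illustration, not tied to any of the architectures listed; the channel counts and image size are arbitrary choices for the example.

```python
import torch
import torch.nn as nn

# A shared-weight kernel slides over the input and produces
# translation-equivariant feature maps; pooling then downsamples them.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
pool = nn.MaxPool2d(kernel_size=2)

x = torch.randn(1, 3, 32, 32)      # a batch of one 32x32 RGB image
feature_maps = conv(x)             # shape: (1, 16, 32, 32)
downsampled = pool(feature_maps)   # shape: (1, 16, 16, 16)
print(feature_maps.shape, downsampled.shape)
```

Stacking such conv/pool blocks and ending with fully connected layers yields the classic image-classification networks listed above.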
A recurrent neural network (RNN) is a class of artificial neural networks where connections between nodes form a directed graph along a temporal sequence. This allows it to exhibit temporal dynamic behavior. [6]
- Three classic modules to capture history (RNN, LSTM and GRU)
- Image Caption (One-to-Many)
- Sentiment Classification (Many-to-One)
- Machine Translation (Many-to-Many)
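The many-to-one pattern (e.g., sentiment classification) can be sketched with an LSTM in PyTorch: the whole token sequence is read, and only the final hidden state feeds the classifier. The vocabulary size, dimensions, and class count here are illustrative assumptions, not from any referenced paper.

```python
import torch
import torch.nn as nn

# Many-to-one sketch: an LSTM consumes a sequence, and only the last
# hidden state is used to predict a single label (e.g., sentiment).
class SentimentLSTM(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=32, hidden_dim=64, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, tokens):          # tokens: (batch, seq_len)
        _, (h_n, _) = self.lstm(self.embed(tokens))
        return self.fc(h_n[-1])         # logits: (batch, num_classes)

model = SentimentLSTM()
tokens = torch.randint(0, 1000, (4, 12))   # batch of 4 sequences, length 12
logits = model(tokens)
print(logits.shape)                        # (4, 2)
```

One-to-many (image captioning) and many-to-many (translation) differ mainly in which time steps receive input and which emit output.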
A generative adversarial network (GAN) is a class of machine learning frameworks designed by Ian Goodfellow and his colleagues in 2014. Two neural networks contest with each other in a game (in the form of a zero-sum game, where one agent's gain is another agent's loss). [7]
- Convolutional GAN (DCGAN, LapGAN, ResGAN, SRGAN and CycleGAN) [8]
- Conditional GAN (CGAN and InfoGAN) [8]
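The zero-sum game can be sketched with two tiny PyTorch networks: the generator G maps noise to fake samples, the discriminator D scores real versus fake, and their losses pull in opposite directions. All sizes here are toy values for illustration, not from any of the GAN variants listed.

```python
import torch
import torch.nn as nn

# Toy generator (noise -> 2-D sample) and discriminator (sample -> realness).
G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 2))
D = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())

bce = nn.BCELoss()
real = torch.randn(8, 2) + 3.0   # stand-in "real" data for the sketch
z = torch.randn(8, 16)           # noise input to the generator
fake = G(z)

# Adversarial objectives: D wants real -> 1 and fake -> 0,
# while G wants D(fake) -> 1; detach() keeps G fixed in the D loss.
d_loss = bce(D(real), torch.ones(8, 1)) + bce(D(fake.detach()), torch.zeros(8, 1))
g_loss = bce(D(fake), torch.ones(8, 1))
print(d_loss.item(), g_loss.item())
```

In training, an optimizer step on `d_loss` and one on `g_loss` alternate, so each network's gain is the other's loss.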
Attention is a technique that mimics cognitive attention. The effect enhances the important parts of the input data and fades out the rest -- the idea being that the network should devote more computing power to the small but important part of the data. Which parts are important depends on the context and is learned from training data by gradient descent. [9, 10]
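The "enhance the important parts" effect can be made concrete with scaled dot-product attention, the building block of the Transformer: each query produces a softmax weighting over the sequence, and the output is the weighted sum of the values. This is a minimal sketch with arbitrary tensor sizes.

```python
import math
import torch

# Scaled dot-product attention: weights over positions are context-dependent
# and sum to 1 for each query, emphasizing the important parts of the input.
def attention(q, k, v):
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v, weights

q = torch.randn(1, 5, 8)   # (batch, num_queries, dim)
k = torch.randn(1, 5, 8)
v = torch.randn(1, 5, 8)
out, w = attention(q, k, v)
print(out.shape)           # (1, 5, 8)
```

Self-attention is the special case where queries, keys, and values are all (projections of) the same sequence, as in BERT and the Transformer.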
- [1] (Wikipedia) Convolutional Neural Network.
- [2] (PIEAS) A Survey of the Recent Architectures of Deep Convolutional Neural Networks. Artificial Intelligence Review, 2020.
- [3] (Oulu, NUDT, USYD, CUHK and UWaterloo) Deep Learning for Generic Object Detection: A Survey. International Journal of Computer Vision, 2020 (IJCV).
- [4] (Snapchat, UWaterloo, Qualcomm, UEX, UTD and UCLA) Image Segmentation Using Deep Learning: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021 (PAMI).
- [5] awesome-semantic-segmentation
- [6] (Wikipedia) Recurrent Neural Network.
- [7] (Wikipedia) Generative Adversarial Network.
- [8] (NJUST and PIEAS) Recent Progress on Generative Adversarial Networks (GANs): A Survey. IEEE Access, 2019.
- [9] (Wikipedia) Attention.
- [10] (Lilian's blog) Attention? Attention!