Here are my notes from The Little Book of Deep Learning. It’s a short read at roughly 176 pages, organized into 3 parts and 8 chapters.
Notable Papers
- ImageNet Classification with Deep Convolutional Neural Networks – Krizhevsky et al. (2012)
- Backpropagation Applied to Handwritten Zip Code Recognition – LeCun et al. (1989)
P1: Machine Learning Foundations
1. Types of Learning
- Supervised Learning: Regression, classification
- Unsupervised Learning: Density modeling
2. Efficient Computation
The limiting factor on GPUs is usually memory read/write operations rather than raw compute
Tensors: series of scalars arranged along several discrete axes
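To make the tensor definition concrete, here is a minimal PyTorch sketch (the shapes and values are my own illustration, not from the book):

```python
import torch

# A rank-4 tensor: a batch of 2 RGB images of size 4x4, i.e. scalars
# arranged along 4 discrete axes (batch, channel, height, width).
x = torch.randn(2, 3, 4, 4)

print(x.shape)   # torch.Size([2, 3, 4, 4])
print(x.dtype)   # torch.float32
print(x.ndim)    # 4

# Moving data between CPU and GPU memory is exactly the kind of
# read/write traffic that tends to dominate GPU cost.
if torch.cuda.is_available():
    x = x.to("cuda")
```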
3. Training Techniques
- Loss minimization
- Softmax of logits
- Autoregressive models: NLP and computer vision
- Gradient descent and learning rate (see the training-step sketch after this list)
- Backpropagation and activations
- Autograd – Baydin et al. (2015)
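These pieces combine in a single training step: compute logits, turn them into probabilities with a softmax, evaluate a loss, backpropagate with autograd, and take a gradient-descent step scaled by the learning rate. A minimal self-contained sketch (the toy data and layer sizes are my own, not the book's):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy batch: 8 inputs with 10 features, 3 target classes.
x = torch.randn(8, 10)
y = torch.randint(0, 3, (8,))

# A single linear layer producing logits.
w = torch.randn(10, 3, requires_grad=True)
b = torch.zeros(3, requires_grad=True)

lr = 0.1  # learning rate

logits = x @ w + b
probs = F.softmax(logits, dim=1)    # softmax turns logits into probabilities
print(probs[0])                      # each row sums to 1

loss = F.cross_entropy(logits, y)    # the quantity to minimize
loss.backward()                      # backpropagation via autograd

with torch.no_grad():                # one gradient-descent step
    w -= lr * w.grad
    b -= lr * b.grad
```

In practice an optimizer object and minibatches over a dataset replace the manual update, but the step itself is the same.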
P2: Deep Models
4. Model Components
- Deep architectures improve performance but face challenges such as the vanishing gradient problem
- Layers: Convolutional, activation functions, pooling, attention (transformers)
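A minimal stack of the layer types listed above, sketched with torch.nn (channel counts and kernel sizes are arbitrary choices of mine):

```python
import torch
import torch.nn as nn

# Convolution -> activation -> pooling: the classic ConvNet building block.
block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolutional layer
    nn.ReLU(),                                   # activation function
    nn.MaxPool2d(2),                             # pooling halves the spatial size
)

x = torch.randn(1, 3, 32, 32)    # one 32x32 RGB image
print(block(x).shape)            # torch.Size([1, 16, 16, 16])
```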
5. Architectures
- 5.1 Multi-layer Perceptron (MLP)
- 5.2 Convolutional Network (ConvNet): Image processing
- 5.3 Attention Models (a scaled dot-product attention sketch follows this list)
- Transformer – Vaswani et al. (2017) (arXiv)
- Generative Pre-trained Transformer (GPT) – Radford et al. (2018) (OpenAI paper)
- Vision Transformer – Dosovitskiy et al. (2020)
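The core operation behind the attention models in 5.3 is the scaled dot-product attention of Vaswani et al. Here is a minimal single-head sketch (the dimensions are illustrative):

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, sequence_length, d_model)
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)  # query/key similarity
    weights = F.softmax(scores, dim=-1)              # attention weights sum to 1
    return weights @ v                               # weighted sum of the values

q = torch.randn(1, 5, 16)
k = torch.randn(1, 5, 16)
v = torch.randn(1, 5, 16)
print(scaled_dot_product_attention(q, k, v).shape)   # torch.Size([1, 5, 16])
```

A full transformer runs several such heads in parallel with learned projections of q, k, and v, then mixes their outputs.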
P3: Applications
6. Prediction
- Image denoising, classification, object detection, semantic segmentation
- Speech recognition, text-image models, zero-shot prediction (see the sketch after this list)
- Reinforcement learning
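As one concrete example of text-image, zero-shot prediction: a CLIP-style model scores an image against class descriptions written in plain text, with no task-specific training. The sketch below assumes the Hugging Face transformers library and a local file example.jpg (both are my assumptions, not the book's):

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")                   # hypothetical local image
labels = ["a photo of a cat", "a photo of a dog"]   # classes described in plain text

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# Image-text similarity scores; softmax gives zero-shot class probabilities.
probs = outputs.logits_per_image.softmax(dim=1)
print(dict(zip(labels, probs[0].tolist())))
```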
7. Synthesis
- 7.1 Text Generation (an autoregressive sampling sketch follows this list)
- Few-shot prediction
- Reinforcement Learning from Human Feedback (RLHF)
- 7.2 Image Generation
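Text generation in these models is autoregressive: predict a distribution over the next token, sample one, append it to the context, and repeat. A minimal sketch with a deliberately tiny, untrained stand-in for a language model (the vocabulary size and architecture are placeholders of mine, so the output is gibberish):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size, d_model = 100, 32

# A toy "language model": embedding -> linear logits over the vocabulary.
embed = nn.Embedding(vocab_size, d_model)
head = nn.Linear(d_model, vocab_size)

tokens = torch.tensor([[1, 2, 3]])   # prompt token ids

for _ in range(10):                  # generate 10 new tokens, one at a time
    h = embed(tokens).mean(dim=1)    # crude summary of the context
    logits = head(h)                 # next-token logits
    probs = F.softmax(logits, dim=-1)
    next_token = torch.multinomial(probs, num_samples=1)  # sample one token
    tokens = torch.cat([tokens, next_token], dim=1)       # append and continue

print(tokens)
```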
8. Compute Schism
- 8.1 Prompt Engineering
- 8.2 Quantization (see the sketch after this list)
- 8.3 Adapters
- 8.4 Model Merging
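The essential idea of quantization is to store weights at low precision together with a scale factor. A minimal symmetric int8 sketch (a simplification of what production libraries actually do):

```python
import torch

w = torch.randn(4, 4)                          # float32 weights

# Quantize: map floats to int8 using a single per-tensor scale factor.
scale = w.abs().max() / 127.0
w_q = torch.clamp((w / scale).round(), -128, 127).to(torch.int8)

# Dequantize: recover an approximation for use at inference time.
w_hat = w_q.to(torch.float32) * scale

print((w - w_hat).abs().max())                 # small quantization error
print(w.element_size(), w_q.element_size())    # 4 bytes vs 1 byte per weight
```

Adapters (8.3) pursue the same efficiency goal differently, by adding small trainable low-rank matrices to a frozen model rather than shrinking its weights.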
Missing Bits
- Recurrent Neural Networks (RNN)
- Autoencoder (a minimal sketch follows this list)
- Generative Adversarial Networks (GAN)
- Graph Neural Networks (GNN)
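Of these missing bits, the autoencoder is the quickest to sketch: an encoder compresses the input to a small code, and a decoder reconstructs the input from that code (the layer sizes here are arbitrary):

```python
import torch
import torch.nn as nn

# Encoder compresses 784-dimensional inputs to an 8-dimensional code;
# the decoder reconstructs the input from that code.
encoder = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 8))
decoder = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 784))

x = torch.randn(16, 784)                    # a batch of flattened "images"
x_hat = decoder(encoder(x))
loss = nn.functional.mse_loss(x_hat, x)     # trained by minimizing reconstruction error
print(loss.item())
```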
Conclusion
This book provides an excellent overview of AI and machine learning fundamentals. A key takeaway is that different models serve different types of learning needs. For instance:
- Image Generation: GANs, Variational Autoencoders (VAEs), autoregressive models
- Text Generation: RNNs, autoregressive transformers like GPT, and RL-trained models like SeqGAN (encoder-only transformers like BERT target understanding rather than generation)