Here are my notes from The Little Book of Deep Learning. It’s a short read at roughly 176 pages, organized into 3 parts and 8 chapters.
Notable Papers
- ImageNet Classification with Deep Convolutional Neural Networks – Krizhevsky et al. (2012)
- Backpropagation Applied to Handwritten Zip Code Recognition – LeCun et al. (1989)
P1: Machine Learning Foundations
1. Types of Learning
- Supervised Learning: Regression, classification
- Unsupervised Learning: Density modeling
2. Efficient Computation
The limiting factor on GPUs is usually memory read/write operations rather than raw compute
Tensors: series of scalars arranged along several discrete axes
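To make the tensor definition concrete, here is a minimal PyTorch sketch (the shapes and values are my own illustration, not from the book):

```python
import torch

# A rank-4 tensor: a batch of 2 RGB images of size 4x4, i.e. scalars
# arranged along 4 discrete axes (batch, channel, height, width).
x = torch.randn(2, 3, 4, 4)

print(x.shape)   # torch.Size([2, 3, 4, 4])
print(x.dtype)   # torch.float32
print(x.ndim)    # 4

# Moving data between CPU and GPU memory is exactly the kind of
# read/write traffic that tends to dominate GPU cost.
if torch.cuda.is_available():
    x = x.to("cuda")
```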
3. Training Techniques
- Loss minimization
- Softmax of logits
- Autoregressive models: NLP and computer vision
- Gradient descent and learning rate (see the training-step sketch after this list)
- Backpropagation and activations
- Autograd – Baydin et al. (2015)
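These pieces combine in a single training step: compute logits, turn them into probabilities with a softmax, evaluate a loss, backpropagate with autograd, and take a gradient-descent step scaled by the learning rate. A minimal self-contained sketch (the toy data and layer sizes are my own, not the book's):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy batch: 8 inputs with 10 features, 3 target classes.
x = torch.randn(8, 10)
y = torch.randint(0, 3, (8,))

# A single linear layer producing logits.
w = torch.randn(10, 3, requires_grad=True)
b = torch.zeros(3, requires_grad=True)

lr = 0.1  # learning rate

logits = x @ w + b
probs = F.softmax(logits, dim=1)    # softmax turns logits into probabilities
print(probs[0])                      # each row sums to 1

loss = F.cross_entropy(logits, y)    # the quantity to minimize
loss.backward()                      # backpropagation via autograd

with torch.no_grad():                # one gradient-descent step
    w -= lr * w.grad
    b -= lr * b.grad
```

In practice an optimizer object and minibatches over a dataset replace the manual update, but the step itself is the same.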
P2: Deep Models
4. Model Components
- Deep architectures improve performance but face challenges such as the vanishing gradient problem
- Layers: Convolutional, activation functions, pooling, attention (transformers)
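A minimal stack of the layer types listed above, sketched with torch.nn (channel counts and kernel sizes are arbitrary choices of mine):

```python
import torch
import torch.nn as nn

# Convolution -> activation -> pooling: the classic ConvNet building block.
block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolutional layer
    nn.ReLU(),                                   # activation function
    nn.MaxPool2d(2),                             # pooling halves the spatial size
)

x = torch.randn(1, 3, 32, 32)    # one 32x32 RGB image
print(block(x).shape)            # torch.Size([1, 16, 16, 16])
```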
5. Architectures
- 5.1 Multi-layer Perceptron (MLP)
- 5.2 Convolutional Network (ConvNet): Image processing
- 5.3 Attention Models (a scaled dot-product attention sketch follows this list)
- Transformer – Vaswani et al. (2017) (arXiv)
- Generative Pre-trained Transformer (GPT) – Radford et al. (2018) (OpenAI paper)
- Vision Transformer – Dosovitskiy et al. (2020)
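The core operation behind the attention models in 5.3 is the scaled dot-product attention of Vaswani et al. Here is a minimal single-head sketch (the dimensions are illustrative):

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, sequence_length, d_model)
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)  # query/key similarity
    weights = F.softmax(scores, dim=-1)              # attention weights sum to 1
    return weights @ v                               # weighted sum of the values

q = torch.randn(1, 5, 16)
k = torch.randn(1, 5, 16)
v = torch.randn(1, 5, 16)
print(scaled_dot_product_attention(q, k, v).shape)   # torch.Size([1, 5, 16])
```

A full transformer runs several such heads in parallel with learned projections of q, k, and v, then mixes their outputs.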
P3: Applications
6. Prediction
- Image denoising, classification, object detection, semantic segmentation
- Speech recognition, text-image models, zero-shot prediction (see the sketch after this list)
- Reinforcement learning
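As one concrete example of text-image, zero-shot prediction: a CLIP-style model scores an image against class descriptions written in plain text, with no task-specific training. The sketch below assumes the Hugging Face transformers library and a local file example.jpg (both are my assumptions, not the book's):

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")                   # hypothetical local image
labels = ["a photo of a cat", "a photo of a dog"]   # classes described in plain text

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# Image-text similarity scores; softmax gives zero-shot class probabilities.
probs = outputs.logits_per_image.softmax(dim=1)
print(dict(zip(labels, probs[0].tolist())))
```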
7. Synthesis
- 7.1 Text Generation (an autoregressive sampling sketch follows this list)
- Few-shot prediction
- Reinforcement Learning from Human Feedback (RLHF)
- 7.2 Image Generation
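Text generation in these models is autoregressive: predict a distribution over the next token, sample one, append it to the context, and repeat. A minimal sketch with a deliberately tiny, untrained stand-in for a language model (the vocabulary size and architecture are placeholders of mine, so the output is gibberish):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size, d_model = 100, 32

# A toy "language model": embedding -> linear logits over the vocabulary.
embed = nn.Embedding(vocab_size, d_model)
head = nn.Linear(d_model, vocab_size)

tokens = torch.tensor([[1, 2, 3]])   # prompt token ids

for _ in range(10):                  # generate 10 new tokens, one at a time
    h = embed(tokens).mean(dim=1)    # crude summary of the context
    logits = head(h)                 # next-token logits
    probs = F.softmax(logits, dim=-1)
    next_token = torch.multinomial(probs, num_samples=1)  # sample one token
    tokens = torch.cat([tokens, next_token], dim=1)       # append and continue

print(tokens)
```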
8. Compute Schism
- 8.1 Prompt Engineering
- 8.2 Quantization (see the sketch after this list)
- 8.3 Adapters
- 8.4 Model Merging
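The essential idea of quantization is to store weights at low precision together with a scale factor. A minimal symmetric int8 sketch (a simplification of what production libraries actually do):

```python
import torch

w = torch.randn(4, 4)                          # float32 weights

# Quantize: map floats to int8 using a single per-tensor scale factor.
scale = w.abs().max() / 127.0
w_q = torch.clamp((w / scale).round(), -128, 127).to(torch.int8)

# Dequantize: recover an approximation for use at inference time.
w_hat = w_q.to(torch.float32) * scale

print((w - w_hat).abs().max())                 # small quantization error
print(w.element_size(), w_q.element_size())    # 4 bytes vs 1 byte per weight
```

Adapters (8.3) pursue the same efficiency goal differently, by adding small trainable low-rank matrices to a frozen model rather than shrinking its weights.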
Missing Bits
- Recurrent Neural Networks (RNN)
- Autoencoder (a minimal sketch follows this list)
- Generative Adversarial Networks (GAN)
- Graph Neural Networks (GNN)
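Of these missing bits, the autoencoder is the quickest to sketch: an encoder compresses the input to a small code, and a decoder reconstructs the input from that code (the layer sizes here are arbitrary):

```python
import torch
import torch.nn as nn

# Encoder compresses 784-dimensional inputs to an 8-dimensional code;
# the decoder reconstructs the input from that code.
encoder = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 8))
decoder = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 784))

x = torch.randn(16, 784)                    # a batch of flattened "images"
x_hat = decoder(encoder(x))
loss = nn.functional.mse_loss(x_hat, x)     # trained by minimizing reconstruction error
print(loss.item())
```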
Conclusion
This book provides an excellent overview of AI and machine learning fundamentals. A key takeaway is that different models serve different types of learning needs. For instance:
- Image Generation: GANs, Variational Autoencoders (VAEs), autoregressive models
- Text Generation: RNNs, autoregressive transformers like GPT, and RL-trained models like SeqGAN (encoder-only transformers like BERT target understanding rather than generation)