This book was helpful for refreshing and reinforcing my understanding of transformer models. I like that it doesn’t just use a generic transformer architecture, but rather recreates GPT-2 in PyTorch so that you can ultimately load the pretrained weights from OpenAI. I also appreciated that there was an appendix on LoRA.
My notes on the book are at https://brokensandals.net/notes/2025/build-a-large-language-model/.