Tag: LLM training

2 January 2026

Residual Connections and Layer Normalization in Large Language Models: Why They Keep Training Stable

Residual connections and layer normalization are essential for stable training of deep large language models. Without them, transformers become very difficult to train beyond a few layers. Here's how they work and why nearly every modern LLM architecture relies on them.

Susannah Greenwood 7 Comments

16 December 2025

Mixed-Precision Training for Large Language Models: FP16, BF16, and Beyond

Mixed-precision training with FP16 and BF16 can cut LLM training time by up to 70% and roughly halve memory use. Learn how it works, why BF16 is now generally preferred over FP16, and how to implement it safely in PyTorch.

Susannah Greenwood 8 Comments