Residual connections and layer normalization are essential for training stable, deep large language models. Without them, transformers couldn't scale beyond a few layers. Here's how they work and why they're non-negotiable in modern AI.
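To make the two ingredients concrete, here is a minimal PyTorch sketch (the dimensions and the simple feed-forward sublayer are illustrative, not taken from any particular model) showing what layer normalization does to each token's features and how a residual connection adds a sublayer's output back onto its input:

```python
import torch
import torch.nn as nn

d_model = 512
x = torch.randn(2, 16, d_model)          # (batch, seq_len, d_model)

# Layer normalization: rescale each token's feature vector to zero mean and
# unit variance over the model dimension, then apply a learned scale and shift.
norm = nn.LayerNorm(d_model)
print(norm(x).mean(dim=-1).abs().max())  # ~0 at init: per-token mean is normalized away

# Residual connection: add the sublayer's output back onto its input, so each
# layer learns a refinement of the representation rather than a replacement,
# and gradients can flow straight through the addition during backprop.
sublayer = nn.Sequential(
    nn.Linear(d_model, 4 * d_model),
    nn.GELU(),
    nn.Linear(d_model * 4, d_model),
)
y = x + sublayer(x)                      # residual stream carries x forward unchanged
```

The direct additive path is what keeps gradients healthy as depth grows; the normalization keeps activation scales from drifting layer to layer.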
Pre-norm and post-norm architectures differ in where layer normalization sits relative to the residual connection: post-norm (the original Transformer design) normalizes after the residual addition, while pre-norm normalizes the input to each sublayer and leaves the residual stream untouched. Pre-norm enables stable training of deep LLMs with 100+ layers, whereas post-norm typically needs careful learning-rate warmup and initialization to train well at large depth. Most modern models, including GPT-style LLMs and Llama 3, use pre-norm.
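The difference is a one-line change in where the norm sits. A minimal sketch, assuming an arbitrary sublayer such as attention or an MLP (class names are illustrative; models like Llama 3 use RMSNorm in place of standard LayerNorm, but the placement logic is the same):

```python
import torch
import torch.nn as nn

class PostNormBlock(nn.Module):
    """Post-norm (original Transformer): normalize AFTER the residual addition."""
    def __init__(self, d_model: int, sublayer: nn.Module):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The norm sits on the residual path itself, so every layer's output
        # is re-normalized; gradients must pass through the norm at each step.
        return self.norm(x + self.sublayer(x))

class PreNormBlock(nn.Module):
    """Pre-norm (GPT-2 / Llama style): normalize the sublayer INPUT only."""
    def __init__(self, d_model: int, sublayer: nn.Module):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The residual stream is never normalized, preserving a clean identity
        # path from input to output, which is what keeps very deep stacks stable.
        return x + self.sublayer(self.norm(x))

# Usage: stack many blocks and the identity path of pre-norm pays off at depth.
d_model = 512
ffn = lambda: nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                            nn.Linear(4 * d_model, d_model))
deep_prenorm = nn.Sequential(*[PreNormBlock(d_model, ffn()) for _ in range(48)])
out = deep_prenorm(torch.randn(2, 16, d_model))
```

Note that pre-norm models usually add one final normalization after the last block, since the residual stream itself is otherwise never normalized.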