Tag: post-norm transformer

16 October 2025

Transformer Pre-Norm vs Post-Norm Architectures: Which One Keeps LLMs Stable?

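Pre-norm and post-norm architectures differ in where Layer Normalization sits relative to the residual connection in each Transformer block: pre-norm normalizes the input to each sublayer, while post-norm normalizes the output after the residual addition. Pre-norm enables stable training of deep LLMs with 100+ layers, while post-norm tends to struggle beyond roughly 30 layers. Most modern models, including GPT-3 and Llama 3, use pre-norm.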
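
The difference in placement can be made concrete with a minimal NumPy sketch. It is illustrative only and is not taken from any particular model: `layer_norm` omits the learned scale and bias, `make_sublayer` is a hypothetical stand-in for attention or a feed-forward network, and all function names are assumptions made for this example.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    # Normalize each row to zero mean and unit variance (no learned gain/bias).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def make_sublayer(dim, rng):
    # Hypothetical stand-in for an attention or feed-forward sublayer.
    w = rng.normal(scale=dim ** -0.5, size=(dim, dim))
    return lambda x: np.tanh(x @ w)


def post_norm_block(x, sublayer):
    # Post-norm: add the residual first, then normalize the sum.
    return layer_norm(x + sublayer(x))


def pre_norm_block(x, sublayer):
    # Pre-norm: normalize only the sublayer input; the residual path is untouched.
    return x + sublayer(layer_norm(x))


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))
f = make_sublayer(16, rng)
y_post = post_norm_block(x, f)
y_pre = pre_norm_block(x, f)
```

In the pre-norm block the input `x` reaches the output through an unmodified identity path, which is the property usually credited with keeping gradients well behaved in very deep stacks. In the post-norm block every layer re-normalizes the residual stream itself.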

Susannah Greenwood
