Tag: LLM serving

16 June 2026

Choosing Batch Sizes to Minimize Cost per Token in LLM Serving

Learn how to optimize batch sizes in LLM serving to minimize cost per token. Discover the trade-offs between latency and throughput, and master static, dynamic, and continuous batching strategies.

Susannah Greenwood
