Latency and Cost in Multimodal Generative AI: How to Budget Across Text, Images, and Video
Susannah Greenwood

I'm a technical writer and AI content strategist based in Asheville, where I translate complex machine learning research into clear, useful stories for product teams and curious readers. I also consult on responsible AI guidelines and produce a weekly newsletter on practical AI workflows.

5 Comments

  1. Nathaniel Petrovick
    December 16, 2025 AT 22:29

    Man, this hit home. We just went from $2k to $18k/month on AWS because someone thought 'let's add image search' without checking the token costs. We didn't even have a single user upload a photo for two weeks. Just pure waste. Started capping images at 400 tokens and dropped to $4.5k. No one noticed the difference in answers. Turned out most people just wanted to type 'my shirt is too tight' anyway.

  2. Pooja Kalra
    December 18, 2025 AT 20:54

    It's not about the tech. It's about the human illusion that more data means more intelligence. We treat pixels like wisdom. But a thousand tokens of a broken phone screen don't make the AI understand brokenness. It just sees noise. And we pay for noise like it's insight.

  3. Sumit SM
    December 19, 2025 AT 13:38

    Let me just say this: if your multimodal AI is costing more than your payroll, you’re not building AI, you’re building a financial hemorrhage. And no, ‘but users love it!’ doesn’t cut it when your CFO is crying into their coffee. We cut video processing entirely. No one missed it. People upload videos because they think it’s ‘cool,’ not because it’s useful. Sad, but true.

  4. Honey Jonson
    December 20, 2025 AT 04:01

    so i tried lowering the image res to 320x240 and honestly? it still works fine. like, my users didnt even notice. i was scared theyd be mad but nope. just chillin. also turned off video for now. saved like 60% and my boss actually smiled. weird right? maybe we were just overdoing it. also typoed ‘resolusion’ but you get it lol

  5. Sally McElroy
    December 21, 2025 AT 16:07

    It’s not just about cost; it’s about responsibility. You’re not just burning GPU credits; you’re burning energy, and that energy comes from power plants that are already overburdened. If your ‘innovation’ requires more electricity than a small town, you’re not a tech pioneer; you’re an environmental liability. And if your users can’t get a response in under two seconds, you’ve already failed them. Stop pretending this is progress. It’s just excess dressed up as intelligence.
