Leveling Up:
AI Infrastructure for Next-Generation Gaming
Executive Summary
Jam & Tea Studios, an indie game developer pioneering AI-native gaming experiences, partnered with Mission through the AWS RAPID program to evaluate whether Amazon Bedrock foundation models could replace their self-hosted AI inference. Mission's engineers conducted rigorous testing across five leading AI models, doubling baseline accuracy through iterative optimization while providing strategic clarity on when to leverage managed services versus custom orchestration. The evaluation informed Jam & Tea's product roadmap and validated a new business opportunity.
About Jam & Tea Studios
Jam & Tea Studios is a video game studio building AI-driven experiences where story quality, continuity, and responsiveness shape player trust. The company’s purpose is twofold: 1) to build games that can only be made with generative AI, which they call "AI-native games," and 2) to build multiplayer games that bring people together rather than creating isolated individual experiences. Their flagship game, Retail Mage, uses Jam & Tea’s self-hosted AI inference to orchestrate how models behave inside gameplay.
Background
Jam & Tea’s system achieves 92-100% accuracy on complex gameplay interactions. One year after development, Jam & Tea needed to determine whether advances in reasoning models meant they could sunset their custom infrastructure in favor of managed services like Amazon Bedrock.
Challenge
Jam & Tea Studios faced a strategic inflection point. Their self-hosted system required continuous R&D investment, ongoing prompt iteration, and dedicated GPU infrastructure to maintain the precision their gameplay demanded. The small team wanted to understand whether foundation models with enhanced reasoning capabilities had advanced enough to replace their complex, hand-coded orchestration system. The evaluation needed to be unbiased, thorough, and grounded in production-grade testing that would reveal both capabilities and limitations. They required external expertise to validate or challenge their assumptions about the current state of AI technology for real-time gaming applications.
Why Mission
Jam & Tea chose Mission because AWS introduced them as a trusted partner with deep expertise in AI/ML workloads. The AWS RAPID program provided funding to conduct a thorough evaluation without financial risk. Mission's engineers brought objectivity to a decision where internal teams might carry bias about their own technology. The partnership allowed Aaron and his team to hand off the evaluation work to professionals who could validate assumptions, identify blind spots, and push the testing further than internal resources allowed. Mission's willingness to rigorously challenge existing approaches aligned perfectly with what Jam & Tea needed: honest assessment rather than advocacy.
Why AWS
AWS has been Jam & Tea's cloud partner since their participation in the AWS Generative AI Accelerator in 2024, where they were the only gaming company in their cohort. The relationship extends beyond infrastructure, reflecting a partnership model where AWS actively works to solve customer problems. While much of cloud infrastructure has become commoditized, the human element and collaborative approach differentiate AWS. Amazon Bedrock's managed inference services offered the potential to reduce operational overhead while Amazon Nova models represented cutting-edge reasoning capabilities tailored to complex, multi-step problem solving.
Solution
Mission's engineers conducted a comprehensive model evaluation using Jam & Tea's existing testing framework: 150 production-derived test cases representing complex gameplay scenarios. Working with Amazon Bedrock, they tested five foundation models across three prompt iterations, measuring accuracy, latency, and cost per interaction. The methodology ensured apples-to-apples comparisons against current baseline performance.
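The three metrics named above (accuracy, latency, and cost per interaction) can be collected in one pass over a test suite. The following is a minimal sketch rather than the harness Mission or Jam & Tea used: `GameplayCase`, `ModelReply`, `evaluate`, the exact-match pass check, and the per-1,000-token prices are all hypothetical, and `call_model` stands in for whatever function actually invokes a model.

```python
import time
from dataclasses import dataclass
from statistics import mean
from typing import Callable, NamedTuple


class ModelReply(NamedTuple):
    """Hypothetical reply shape: text plus token counts for costing."""
    text: str
    input_tokens: int
    output_tokens: int


@dataclass
class GameplayCase:
    """Hypothetical test case: a prompt and the exact output expected."""
    prompt: str
    expected: str


def evaluate(
    cases: list[GameplayCase],
    call_model: Callable[[str], ModelReply],
    usd_per_1k_input: float,   # assumed price, not a published rate
    usd_per_1k_output: float,  # assumed price, not a published rate
) -> dict[str, float]:
    """Run every case once and report accuracy, mean latency, mean cost."""
    passed = 0
    latencies: list[float] = []
    costs: list[float] = []
    for case in cases:
        start = time.perf_counter()
        reply = call_model(case.prompt)
        latencies.append(time.perf_counter() - start)
        costs.append(
            reply.input_tokens / 1000 * usd_per_1k_input
            + reply.output_tokens / 1000 * usd_per_1k_output
        )
        if reply.text.strip() == case.expected:
            passed += 1
    return {
        "accuracy": passed / len(cases),
        "mean_latency_s": mean(latencies),
        "mean_cost_usd": mean(costs),
    }
```

Running the same `cases` list through `evaluate` for each model and each prompt iteration keeps the comparison like-for-like, since only `call_model` changes between runs.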
The team evaluated Amazon Nova 1 Pro, Nova 2 Pro, Nova 2 Lite, Claude Sonnet 4.5, and DeepSeek V3.1, tracking how each model handled intricate game logic like crafting items, managing object relationships, and maintaining world coherence. Each test required models to perform multiple simultaneous actions correctly or fail the entire interaction. Through iterative prompt engineering on Amazon Bedrock, Mission doubled model accuracy in just three iterations, demonstrating a clear improvement trajectory within the project's limited timeframe.
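The all-or-nothing rule described above can be illustrated with a short sketch. It assumes, purely for illustration, that each action is a `(verb, target)` pair and that a test passes only when the model's actions match the expected set exactly; the real test framework's data format and matching rules are not described in this case study.

```python
from typing import Iterable

# Hypothetical action shape: (verb, target), e.g. ("craft", "potion").
Action = tuple[str, str]


def interaction_passes(expected: Iterable[Action], actual: Iterable[Action]) -> bool:
    """No partial credit: any missing or extra action fails the interaction.

    Treating extra actions as failures is an assumption of this sketch.
    """
    return set(expected) == set(actual)


def suite_accuracy(results: list[tuple[list[Action], list[Action]]]) -> float:
    """Fraction of (expected, actual) pairs that pass in full."""
    if not results:
        return 0.0
    return sum(interaction_passes(e, a) for e, a in results) / len(results)
```

Under this kind of scoring, a model that gets one of two required actions right scores zero for that interaction, which is why accuracy on multi-action tests is a stricter measure than per-action accuracy.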
Mission found that prompt orchestration remains valuable even with modern reasoning models when an application is both latency-sensitive and accuracy-critical. The evaluation also identified specific use cases where Amazon Bedrock models could replace proprietary components outright. This nuanced assessment gave Jam & Tea actionable intelligence rather than a binary recommendation. Mission also identified optimization opportunities through Bedrock-native capabilities such as prompt caching, along with potential fine-tuning, that could further improve both performance and cost efficiency.
The engagement provided strategic value beyond technical metrics. Mission validated that Jam & Tea's custom orchestration technology had genuine market differentiation, helping confirm investment decisions in their proprietary system while identifying where managed Bedrock services could reduce operational burden.
Results
Mission's evaluation achieved its core objective: providing unbiased clarity on how Amazon Bedrock models could fit Jam & Tea's architecture. The work pinpointed the gameplay interactions where Bedrock foundation models are ready to take over today, and Mission doubled model accuracy across three prompt iterations. This trajectory, combined with detailed failure pattern analysis, gave Jam & Tea confidence that further optimization could reach production viability for certain use cases. Equally important was discovering the boundary where Retail Mage's more complex gameplay features still require custom orchestration.
The assessment clarified when to apply different technologies. Jam & Tea now understands which gameplay interactions suit Bedrock’s single-pass reasoning models and which warrant prompt orchestration, informing architecture decisions for future titles. This provided clarity to their technology roadmap, validating continued investment in their orchestration system while identifying opportunities to leverage Amazon Bedrock for complementary workloads.
The cost case for Bedrock proved compelling. Analysis revealed potential savings exceeding 50% for games in maintenance mode by transitioning to API-based inference, with per-interaction costs on Bedrock as low as roughly $0.015. Because usage-based pricing scales with player activity rather than requiring always-on GPU capacity, it is also attractive for studios managing multiple titles with variable traffic patterns.
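The trade-off between usage-based pricing and always-on capacity reduces to simple arithmetic. The sketch below uses the roughly $0.015 per-interaction figure from the analysis; the fixed monthly GPU cost and the traffic volume in the usage note are hypothetical placeholders, not numbers from the engagement.

```python
COST_PER_INTERACTION_USD = 0.015  # approximate low-end figure from the analysis


def monthly_api_cost(
    interactions: int, cost_per_interaction: float = COST_PER_INTERACTION_USD
) -> float:
    """Usage-based cost: grows linearly with player activity."""
    return interactions * cost_per_interaction


def break_even_interactions(
    gpu_monthly_usd: float, cost_per_interaction: float = COST_PER_INTERACTION_USD
) -> float:
    """Monthly interaction volume at which API cost equals fixed GPU cost."""
    return gpu_monthly_usd / cost_per_interaction


def savings_fraction(
    gpu_monthly_usd: float,
    interactions: int,
    cost_per_interaction: float = COST_PER_INTERACTION_USD,
) -> float:
    """Share of the fixed GPU bill saved by moving to usage-based pricing."""
    api_cost = monthly_api_cost(interactions, cost_per_interaction)
    return 1 - api_cost / gpu_monthly_usd
```

As a purely illustrative case, a title with a hypothetical $1,500 monthly GPU bill and 40,000 interactions per month would pay about $600 on usage-based pricing, a saving of about 60%, and would break even near 100,000 interactions per month. Below that volume the usage-based model is cheaper, which is the pattern a game in maintenance mode tends to fit.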
The work catalyzed unexpected business opportunities. Validation of their orchestration technology's continued relevance helped Jam & Tea initiate conversations about licensing their system to other companies facing similar challenges. Aaron has since begun collaborating with technology partners on academic research using the evaluation dataset. These outcomes extended well beyond the original AWS RAPID engagement, transforming a technical assessment into strategic positioning for future growth.
AWS Services Used
- Amazon Bedrock
- Amazon Nova models (Nova 1 Pro, Nova 2 Pro, Nova 2 Lite)
- AWS infrastructure for GPU hosting