Blog
Building Enterprise AI Agents with Amazon Bedrock AgentCore: Lessons from a Data Migration Chatbot
Agentic AI is changing what's possible with generative systems. Early AI workloads centered on content tasks like retrieving information from knowledge bases, summarizing documents, and pattern-based composition. Useful, but limited. Agentic systems reason through complex problems, orchestrate multi-step workflows, and take direct action on backend infrastructure.
Mission recently put this into practice for a global data migration platform provider facing a scaling challenge. Their move from enterprise to mid-market customers required a self-service model that could guide users through intricate migration processes. The support team couldn't keep pace with demand, and scaling by hiring wasn't viable. Domain expertise in complex data migrations takes months to develop, making it impossible to train new support staff fast enough to match customer growth. The process was well documented, but documentation alone couldn't bridge the gap. Users needed an intelligent system that could not only answer questions but validate data, query databases, monitor jobs, and execute workflows.
The answer was an agentic chatbot built on Amazon Bedrock AgentCore, a managed runtime for AI agents. AgentCore took care of infrastructure concerns (security, session management, observability, memory) while our team concentrated on orchestration and agent design. The chatbot queries databases, validates migration data, monitors job status, and executes multi-step workflows.
This blog article walks through the architecture, technical decisions, and patterns that shaped the solution.
The Business Challenge
Our client, a data migration platform provider, was expanding into the mid-market and needed their support model to scale differently. While complex migrations for enterprise customers often justified hands-on support, the economic realities of the mid-market made such dedicated assistance unsustainable. They needed a self-service approach that could guide users through intricate migration processes without constant human intervention.
The challenge wasn't lack of documentation. The company had extensive guides covering their migration platform. But the complexity of the work required significant domain expertise that couldn't be captured in documentation alone. Users needed to query databases for data quality checks, monitor job status across multiple systems, and execute validation workflows. Each migration involved judgment calls and troubleshooting that only experienced team members could handle effectively.
They had 13 weeks to deploy a solution that could support their Small and Medium Business (SMB) market expansion. A standard Q&A bot wasn’t enough—they needed an intelligent agent that could not only answer questions but also execute actions, orchestrate validation workflows, and interact with backend systems.
As a result, users can now rely on the chatbot to accelerate their data migration process without having to read through extensive documentation or rely solely on institutional knowledge to configure jobs. In addition, non-technical users can perform SQL queries through natural language, significantly speeding up their data quality checks.
Solution Architecture: A Multi-Agent Approach
Mission designed an orchestration pattern where a primary agent classifies user intent and routes requests to specialized sub-agents. Each sub-agent handles a specific domain: one for knowledge retrieval, another for database queries, a third for orchestrating validation workflows. The architecture keeps agents focused and maintainable while the orchestration layer handles complexity.
High-Level Workflow
User Interaction Layer
Users interact through a React UI backed by Amazon Cognito for authentication. They can submit queries via text, upload files, use pre-built prompt templates, perform data validations guided by the chatbot, and view SQL query history. For example, a user can ask:
- “I would like to run validation for objects A, B, and C.”
The agent then guides the user through the required job-parameter configurations, triggers the data-loading tool, and calls the final API endpoint to complete the validations.
- “Tell me how to update the configuration parameters.”
The agent displays the corresponding UI window for the user to enter or update the parameter values.
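The guided validation flow above can be sketched as a simple multi-step state machine: collect job parameters, invoke the data-loading tool, then call the final validation endpoint. This is a minimal illustration only; the function and field names are invented, not the client's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class ValidationJob:
    objects: list[str]
    params: dict = field(default_factory=dict)
    steps_completed: list[str] = field(default_factory=list)

def configure(job: ValidationJob, params: dict) -> ValidationJob:
    job.params.update(params)                  # agent gathers job-parameter configuration
    job.steps_completed.append("configure")
    return job

def load_data(job: ValidationJob) -> ValidationJob:
    job.steps_completed.append("load_data")    # stand-in for the data-loading tool
    return job

def run_validation(job: ValidationJob) -> dict:
    job.steps_completed.append("validate")     # stand-in for the final API call
    return {"objects": job.objects, "status": "validated",
            "steps": job.steps_completed}

job = ValidationJob(objects=["A", "B", "C"])
configure(job, {"batch_size": 500})
load_data(job)
result = run_validation(job)
```

The key point is that the agent, not the user, tracks which step comes next and which parameters are still missing.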
Agent Orchestration
The super-agent (powered by Strands Agents framework on AgentCore Runtime) analyzes intent and routes to specialized sub-agents. The Discovery Agent handles Q&A against knowledge bases using Retrieval-Augmented Generation (RAG) through Amazon Bedrock Knowledge Base with documents stored in Amazon S3. When users need data from production systems, the Text-to-SQL Agent translates natural language into SQL queries against PostgreSQL. For complex multi-step processes, the Validation Agent orchestrates data validation workflows that span multiple systems.
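The routing pattern can be illustrated in a few lines of plain Python. In production the super-agent uses an LLM (via Strands on AgentCore Runtime) to classify intent; here a keyword match stands in for that classification, and the sub-agent bodies are stubs.

```python
# Each sub-agent is a callable specialized for one domain.
def discovery_agent(query: str) -> str:
    return f"[RAG answer] {query}"             # Q&A over the knowledge base

def text_to_sql_agent(query: str) -> str:
    return f"[SQL result] {query}"             # natural language -> SQL

def validation_agent(query: str) -> str:
    return f"[workflow started] {query}"       # multi-step validation workflow

SUB_AGENTS = {
    "discovery": discovery_agent,
    "text_to_sql": text_to_sql_agent,
    "validation": validation_agent,
}

def classify_intent(query: str) -> str:
    q = query.lower()                          # LLM classification stand-in
    if "validat" in q:
        return "validation"
    if any(k in q for k in ("query", "sql", "how many", "count")):
        return "text_to_sql"
    return "discovery"

def super_agent(query: str) -> str:
    return SUB_AGENTS[classify_intent(query)](query)
```

Keeping each sub-agent behind a uniform callable interface is what lets the orchestration layer absorb the complexity while the sub-agents stay focused.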
Backend Integration
Agents interact with Amazon S3 for document storage, PostgreSQL for structured data, and AWS Lambda (via AgentCore Gateway) for custom business logic. Amazon QuickSight provides visualization when users need graphical results.
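To make the Text-to-SQL idea concrete, here is a hedged sketch using `sqlite3` in place of PostgreSQL. A real agent would have the LLM generate the SQL; a template lookup stands in for that step, and the table and column names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE migration_jobs (id INTEGER, status TEXT)")
conn.executemany("INSERT INTO migration_jobs VALUES (?, ?)",
                 [(1, "complete"), (2, "failed"), (3, "complete")])

# Stand-in for LLM-generated SQL: map recognized phrases to queries.
TEMPLATES = {
    "failed jobs": "SELECT COUNT(*) FROM migration_jobs WHERE status = 'failed'",
    "completed jobs": "SELECT COUNT(*) FROM migration_jobs WHERE status = 'complete'",
}

def text_to_sql(question: str) -> int:
    for phrase, sql in TEMPLATES.items():
        if phrase in question.lower():
            return conn.execute(sql).fetchone()[0]
    raise ValueError("unrecognized question")

count = text_to_sql("How many failed jobs are there?")
```

Swapping the template lookup for an LLM call (with the schema in the prompt) is what lets non-technical users run data quality checks in natural language.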
Memory Management
AgentCore's short-term memory maintains conversational context within active sessions, tracking what users asked several questions back. PostgreSQL stores chat history, user preferences, and configuration data for long-term persistence across sessions.
Observability
AgentCore Observability provides real-time monitoring of agent reasoning, tool usage, and performance metrics, giving the team visibility into how agents make decisions.
Key Technical Decisions That Shaped the Architecture
Building a production-ready agentic system in 13 weeks required making deliberate choices about infrastructure, frameworks, and integration patterns. Some decisions were about speed (choosing Strands over heavier orchestration frameworks). Others were about long-term maintainability (deploying tools through AgentCore Gateway instead of direct Lambda invocation). A few were about balancing performance with cost (the hybrid file upload pattern).
These six decisions had the biggest impact on delivery timeline, system performance, and future scalability.
1. Why Amazon Bedrock AgentCore Runtime?
We chose AgentCore instead of self-managed infrastructure on Amazon ECS or Amazon EC2. The decision came down to focus. Building on managed infrastructure meant zero time spent on container orchestration, autoscaling policies, or network configuration. AgentCore provides session isolation, memory persistence, authentication, tool discovery, and observability out of the box.
The framework-agnostic design mattered too. AgentCore supports Strands, LangGraph, or custom agents without locking you into a specific vendor's patterns. It integrates natively with AWS's GenAI ecosystem, particularly Bedrock, which is aligned with the client's existing AWS footprint.
The impact showed up immediately: eliminating infrastructure management overhead accelerated delivery and let the team focus entirely on solving customer problems.
2. Why Strands Agents Framework?
We selected Strands over heavier orchestration frameworks like LangGraph. The 13-week timeline required rapid iteration, and Strands’ code-first patterns (where agent logic is defined directly in code rather than through visual interfaces) enabled us to move quickly. Its AWS-native design—with built-in support for Bedrock and AgentCore deployment—removed integration overhead, and its open-source nature gave us full transparency for code-level debugging when we needed to understand agent behavior.
While Strands can support graph-structured flows, it avoids the operational complexity common in large DAG-centric orchestrators and still offers the production-grade capabilities we needed. This allowed us to iterate fast without losing control or extensibility, ultimately delivering a production-ready agent on schedule and without compromising quality.
3. File Upload Pattern: Hybrid Pre-Signed URL Approach
For file uploads, we used AgentCore Runtime to generate pre-signed URLs for direct S3 uploads while passing metadata through the AgentCore Runtime endpoint for downstream orchestration. Large files never touch the runtime itself.
This pattern is optimized for performance and cost. Routing files directly to S3 avoided pushing large payloads through AgentCore Runtime, reducing memory and compute usage. AgentCore validates metadata and generates time-limited URLs, maintaining security without sacrificing speed. Users get faster uploads. We get lower operational costs. Response times stayed consistent regardless of file size.
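The two-channel flow can be sketched as follows. The signing scheme here is a toy HMAC stand-in for S3's real pre-signed URLs (which `boto3`'s `generate_presigned_url` would produce); the bucket name, secret, and metadata store are all hypothetical.

```python
import hashlib
import hmac
import time

SECRET = b"runtime-signing-key"                # hypothetical shared secret

def presign(bucket: str, key: str, expires_in: int = 300) -> dict:
    # Produce a time-limited, signed upload grant (toy version of S3 pre-signing).
    expiry = int(time.time()) + expires_in
    payload = f"{bucket}/{key}:{expiry}".encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return {"url": f"https://{bucket}.example/{key}", "expires": expiry, "sig": sig}

metadata_store = {}                            # stand-in for downstream orchestration

def request_upload(user: str, filename: str) -> dict:
    # Runtime validates and records metadata; file bytes never pass through it.
    grant = presign("migration-uploads", f"{user}/{filename}")
    metadata_store[filename] = {"user": user, "expires": grant["expires"]}
    return grant                               # client uploads directly to storage

grant = request_upload("alice", "customers.csv")
```

The separation is the point: metadata flows through the runtime for orchestration, while the payload takes the direct path to S3.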
4. Tool Deployment via AgentCore Gateway
Lambda functions deploy through AgentCore Gateway rather than direct invocation. This choice was about future-proofing. Tools deployed through the Gateway become reusable across multiple agents and applications. The Gateway transforms APIs into AgentCore-compatible protocols and handles automatic tool registration for agent consumption.
A single integration point reduces coupling between agents and backend services. When we need to update a tool's implementation, we change it once. Every agent consuming that tool gets the update automatically. We built a library of reusable tools that will scale beyond this single project.
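The single-integration-point idea can be sketched as a shared tool registry: agents resolve tools by a stable name at call time rather than binding to a specific Lambda, so re-registering an implementation updates every consumer at once. AgentCore Gateway handles this registration in production; the names below are illustrative.

```python
registry = {}

def register_tool(name: str):
    # (Re)register a callable under a stable tool name.
    def wrap(fn):
        registry[name] = fn
        return fn
    return wrap

@register_tool("check_job_status")
def check_job_status_v1(job_id: str) -> str:
    return f"job {job_id}: running"

class AgentStub:
    def call_tool(self, name: str, *args) -> str:
        return registry[name](*args)           # lookup at call time, not bind time

agent_a, agent_b = AgentStub(), AgentStub()

@register_tool("check_job_status")             # update the tool once...
def check_job_status_v2(job_id: str) -> str:
    return f"job {job_id}: running (eta 5m)"

# ...and every agent consuming it sees the new implementation.
```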
5. Hybrid Memory Strategy
Memory operates on two levels. AgentCore's short-term memory handles active session context with fast access to recent conversation history, enabling multi-turn responses that understand what happened three questions ago. PostgreSQL manages long-term storage with a custom schema built for project-specific requirements.
PostgreSQL gives us advanced querying and analytics capabilities that short-term memory can't provide. It integrates with other systems like Amazon QuickSight and AWS Lambda. Chat history, user preferences, and configuration data persist across sessions. We balanced real-time performance with robust, scalable data persistence.
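A minimal sketch of the two-tier strategy: a bounded in-process buffer for active session context (standing in for AgentCore short-term memory) and a SQL store for durable cross-session history. `sqlite3` substitutes for PostgreSQL here, and the schema is invented for illustration.

```python
import sqlite3
from collections import deque

short_term = deque(maxlen=10)                  # recent turns for the active session

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE chat_history (session TEXT, role TEXT, content TEXT)")

def record_turn(session: str, role: str, content: str) -> None:
    short_term.append((role, content))         # fast path for multi-turn context
    db.execute("INSERT INTO chat_history VALUES (?, ?, ?)",
               (session, role, content))       # durable path for analytics

record_turn("s1", "user", "validate object A")
record_turn("s1", "assistant", "Which parameters should I use?")

context = list(short_term)                     # what the agent sees mid-conversation
history_count = db.execute("SELECT COUNT(*) FROM chat_history").fetchone()[0]
```

Each tier answers a different question: "what did the user just say?" versus "what has this user done across sessions?"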
6. Built-In Observability
We leveraged AgentCore Observability instead of building custom monitoring. The platform captures agent reasoning traces, tool calls, token usage, and latency. Setup took under two hours. The visibility helped us understand agent decision-making for debugging and optimization, and we could track token consumption to monitor costs.
Proactive debugging became possible without constructing custom telemetry infrastructure. When agents behaved unexpectedly, we had detailed traces showing exactly which reasoning path they followed and why.
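The kind of trace AgentCore Observability captures automatically — per-tool-call records of name, latency, and outcome — can be illustrated with a simple decorator that collects them in-process. This is purely a sketch of the data shape, not the platform's actual instrumentation.

```python
import functools
import time

traces = []

def traced(fn):
    # Record tool name, latency, and success for each call.
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        traces.append({
            "tool": fn.__name__,
            "latency_ms": (time.perf_counter() - start) * 1000,
            "ok": True,
        })
        return result
    return wrapper

@traced
def run_sql(query: str) -> str:
    return "3 rows"                            # stub tool for illustration

run_sql("SELECT COUNT(*) FROM jobs")
```

With reasoning traces alongside records like these, "why did the agent pick that tool?" becomes an answerable question rather than a guess.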
Best Practices for Building Production AI Agents
Building production AI agents requires more than model selection. It demands thoughtful architecture around orchestration, memory, security, and observability.
Leverage managed runtimes to focus on business logic. Infrastructure management consumes engineering time that could go toward solving actual business problems. AgentCore eliminated weeks of DevOps work, letting our team concentrate entirely on agent behavior and orchestration patterns.
Match your framework to your timeline. Lightweight frameworks like Strands enable rapid iteration when delivery speed matters. Heavier orchestration tools provide more features but add complexity. Choose based on what your project actually needs, not what sounds most impressive.
Design hybrid patterns for performance and cost optimization. The pre-signed URL approach for file uploads kept response times fast while reducing compute costs. Look for opportunities where direct integration (like uploads straight to S3) can bypass unnecessary processing layers.
Standardize tool deployment for reusability. Deploying Lambda functions through AgentCore Gateway created a library of reusable tools that work across multiple agents. Single integration points reduce maintenance burden and make updates cleaner.
Combine memory strategies for different use cases. Short-term memory handles active conversation context. Long-term storage manages historical data, preferences, and analytics. Each serves a distinct purpose. Use both strategically rather than forcing one approach to handle everything.
Prioritize observability from day one. Agent reliability depends on understanding how agents make decisions. Built-in observability gave us visibility into reasoning traces, tool usage, and performance without building custom telemetry. Build monitoring in from the start, not as an afterthought.
Conclusion
Amazon Bedrock AgentCore eliminated the infrastructure complexity that typically adds months to agentic AI projects. Our team delivered a production-ready multi-agent system in 13 weeks because we could focus entirely on agent logic, orchestration patterns, and business requirements instead of managing containers, scaling policies, and observability infrastructure.
Our data migration chatbot demonstrates what becomes possible when managed runtimes handle the operational overhead. Multi-agent orchestration, hybrid memory strategies, and backend system integration solved real business problems for a company expanding into new markets. The architecture patterns we used here translate to other domains where agents need to execute actions, not just retrieve information.
Building production AI agents requires navigating trade-offs between frameworks, memory strategies, tool deployment patterns, and integration approaches. Mission, a CDW Company, is an AWS GenAI Competency Partner with over 250 GenAI solutions built on AWS. We bring experience from multiple AgentCore deployments and understand how to move from concept to production while avoiding common pitfalls.
Ready to build your own agentic AI solution? Contact Mission to discuss your use case and explore how AI agents can transform your operations.
Author Spotlight:
Na Yu, Ryan Ries, Qiong Zhang (AWS PSA), and Jonathan Vota (AWS PSA)