How to Build a Chatbot to Q&A PDF Files Using LangChain, Amazon Kendra, & Large Language Models on Amazon SageMaker

Generative AI powered by large language models (LLMs) is transforming how people live and work. LLMs improve the efficiency of communication between humans and machines, and they boost productivity in content creation, coding, summarization, and a host of other applications. 

However, there are a few issues when using pre-trained LLMs straight out of the box. One common issue is hallucination: because an LLM generates the statistically most likely continuation of its input rather than retrieving verified facts, it can produce text that appears coherent and contextually relevant but is actually incorrect or inaccurate. The second issue is that LLMs are trained on data with a fixed cutoff, so they cannot correctly answer questions about information that is fresh or frequently changing. 

A reference architecture for Retrieval Augmented Generation (RAG), along with its implementation, is available in this AWS blog. That blog uses the Amazon Kendra web crawler to crawl websites and creates a Kendra index as the knowledge database. The architecture is LLM-agnostic: different LLMs can be plugged in.

In this post, you will learn to use Amazon Kendra to index and search PDF files in your ecosystem. You will also learn to deploy an LLM available as a SageMaker JumpStart model (a pre-trained model containerized by AWS). You will then use an Amazon SageMaker Studio notebook to launch a Streamlit web application for querying and getting answers from the PDF files in Kendra. We present detailed configuration steps for Kendra and SageMaker.

Solution Architecture

Below is a diagram that shows the overall workflow. Raw data is stored in an S3 bucket; we use a car manual PDF in this blog for illustration purposes. Amazon Kendra indexes the data sources and serves as the knowledge database. We use the Llama 2 model as the LLM for text generation; it is deployed as a SageMaker endpoint through SageMaker JumpStart.

There are three main steps in the workflow of RAG: 

1) Users input requests through the Chatbot web interface built by Streamlit.

2) LangChain is used as an agent framework to orchestrate the different components. Once a request comes in, LangChain sends a search query to Kendra to retrieve the context that is relevant to the user request.

3) LangChain then sends a prompt that includes the user request and the relevant context to an LLM. The results from the LLM are sent back to users via the Chatbot web interface.  
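The three steps above can be sketched in a few lines of Python. This is a minimal, framework-free illustration of the RAG loop, not the actual implementation: `retrieve_context` and `generate` are stand-ins for the Kendra query and the SageMaker LLM endpoint call that LangChain orchestrates in the real solution, and the sample knowledge base is made up.

```python
# Minimal sketch of the RAG request flow. The real solution uses LangChain
# with an Amazon Kendra retriever and a SageMaker LLM endpoint; here both
# calls are stubbed so the orchestration logic is easy to follow.

def retrieve_context(query: str) -> list[str]:
    # Stand-in for a Kendra search: return passages relevant to the query.
    knowledge_base = {
        "tire pressure": "The recommended tire pressure is 35 psi (cold).",
        "oil change": "Change the engine oil every 10,000 km or 12 months.",
    }
    return [text for topic, text in knowledge_base.items()
            if topic in query.lower()]

def build_prompt(query: str, passages: list[str]) -> str:
    # Combine the retrieved context and the user request into one prompt.
    context = "\n".join(passages)
    return (f"Answer the question using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")

def generate(prompt: str) -> str:
    # Stand-in for the LLM call (e.g., a SageMaker endpoint invocation).
    return f"[LLM completion for prompt of {len(prompt)} characters]"

def answer(query: str) -> str:
    passages = retrieve_context(query)      # step 2: retrieval
    prompt = build_prompt(query, passages)  # step 3: augmentation
    return generate(prompt)                 # step 3: generation

print(answer("What is the correct tire pressure?"))
```

The key design point is that the LLM never sees the whole knowledge base, only the handful of passages the retriever deems relevant to the current request.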

Solution Walkthrough

This section walks you through the steps to set up the RAG solution. 

Index the data sources using Amazon Kendra 

  1. Log into your AWS account and search for Kendra.

  2. On the Kendra console, click Create an Index.

  3. Give the index a name and choose Create a new role (recommended), as shown below.

  4. Keep everything else as default and create the index. It can take a few minutes to create the index. 
  5. Once the index is created, click Add data sources.
  6. Choose the Amazon S3 connector.
  7. Specify the data source name and choose the language of the source documents. 
  8. Choose to create a new role, give the role a name, and leave everything else as default. 
  9. Choose the S3 location where the PDF files are located and choose the Run on demand sync schedule. Leave everything else as default. 
  10. Go to the next page, leave everything else as default, and click Add data source at the end. 

  11. Click the Sync now button on the Data sources page. Depending on the size of the documents, the sync can take minutes to hours to finish. As a reference, the car manual of around 400 pages takes about 15 minutes. 
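Once the sync completes, the index can also be queried programmatically, which is essentially what LangChain does on your behalf. The sketch below assumes the response shape of the Kendra Retrieve API (`ResultItems` entries carrying `Content`, `DocumentTitle`, and `DocumentURI`); the boto3 call is commented out so the snippet runs without AWS credentials, and the index ID and sample data are placeholders.

```python
# Turn a Kendra Retrieve response into a context string for an LLM prompt.
# How such a response would be obtained (requires AWS credentials and a
# real index ID):
#
#   kendra = boto3.client("kendra", region_name="us-east-1")
#   response = kendra.retrieve(IndexId="YOUR_INDEX_ID",
#                              QueryText="How do I check tire pressure?")

def format_kendra_results(response: dict, max_passages: int = 3) -> str:
    """Join the top retrieved passages, citing each document title."""
    passages = []
    for item in response.get("ResultItems", [])[:max_passages]:
        title = item.get("DocumentTitle", "unknown document")
        passages.append(f"[{title}] {item['Content']}")
    return "\n\n".join(passages)

# Example with a response shaped like Kendra's output (sample data):
sample_response = {
    "ResultItems": [
        {"DocumentTitle": "car-manual.pdf",
         "Content": "Check tire pressure monthly when the tires are cold.",
         "DocumentURI": "s3://my-bucket/car-manual.pdf"},
    ]
}
print(format_kendra_results(sample_response))
```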

Deploy an LLM from Amazon SageMaker JumpStart 

  1. Go to SageMaker Studio. 

  2. Go to SageMaker JumpStart, click “Models, notebooks, solutions”, and search for the “Llama-2-7b-chat” model. 

  3. Select an EC2 instance type from the SageMaker hosting instance dropdown list. Note that you might need to request a service quota increase if the selected instance type is not available in your AWS account. Then give an Endpoint name that you will use later and click Deploy. It might take 15-20 minutes to deploy the endpoint.

  4. Once the endpoint deployment is finished, you should see the endpoint status showing as In service.
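Before wiring the endpoint into the Chatbot, it helps to understand the request format. The sketch below builds a chat-style payload of the kind JumpStart's Llama 2 chat endpoints expect; the exact schema and the `accept_eula` custom attribute should be verified against the model card for the version you deploy, and the endpoint name in the commented-out invocation is a placeholder.

```python
import json

# Build a request payload for a Llama-2-chat endpoint deployed through
# SageMaker JumpStart. The chat format ([[{"role": ..., "content": ...}]])
# follows the JumpStart Llama 2 examples; check the model card for the
# version you deploy, as the schema may differ between releases.

def build_llama2_payload(question: str, context: str,
                         max_new_tokens: int = 256) -> dict:
    system = "Answer the question using only the provided context."
    user = f"Context:\n{context}\n\nQuestion: {question}"
    return {
        "inputs": [[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ]],
        "parameters": {"max_new_tokens": max_new_tokens,
                       "temperature": 0.6, "top_p": 0.9},
    }

payload = build_llama2_payload("What is the tire pressure?",
                               "Recommended pressure: 35 psi.")

# Invocation sketch (requires AWS credentials; endpoint name is a placeholder):
#   runtime = boto3.client("sagemaker-runtime")
#   response = runtime.invoke_endpoint(
#       EndpointName="YOUR-LLAMA2-ENDPOINT",
#       ContentType="application/json",
#       CustomAttributes="accept_eula=true",
#       Body=json.dumps(payload))
#   print(json.loads(response["Body"].read()))

print(json.dumps(payload)[:80])
```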

Launch the Chatbot application

1. Launch the Chatbot application on SageMaker Studio 

  1. Setup your SageMaker Studio working environment and launch SageMaker Studio.  

  2. On SageMaker Studio, select File -> New -> Terminal to create a new Terminal.

  3. On the terminal, run the commands below, following the instructions in the GitHub repo. The commands clone the repo and install the prerequisites, including LangChain, boto3, and Streamlit.

    git clone https://github.com/aws-samples/amazon-kendra-langchain-extensions.git
    cd amazon-kendra-langchain-extensions
    cd kendra_retriever_samples
    pip install -r requirements.txt
  4. Export the environment variables for the region name, the Amazon Kendra index ID, and the SageMaker endpoint name, which are used by the script (the exact variable names are listed in the repo's README).
  5. To launch the Chatbot, run the command below. 
    streamlit run app.py llama2

    When the application runs successfully, you will see an output similar to the one shown below. Note the port number in the output; Streamlit typically uses port 8501.

    sagemaker-user@studio$ streamlit run app.py llama2

    Collecting usage statistics. To deactivate, set browser.gatherUsageStats to False.

    You can now view your Streamlit app in your browser.

    Network URL:

    External URL:

  6. Copy the URL of your SageMaker Studio session and modify it with the port number as shown below. Open the modified URL in a new browser tab, which brings up the Chatbot web interface.

    The original SageMaker studio URL: 

    Modified URL:
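    The URL rewrite in Step 6 can be expressed as a small helper. This sketch assumes the classic SageMaker Studio Jupyter URL layout (ending in /jupyter/default/lab), where a web app listening on a local port is reachable through the Jupyter server proxy; the domain in the example is made up.

```python
# Derive the Streamlit proxy URL from a SageMaker Studio URL. This assumes
# the classic Studio URL layout (.../jupyter/default/lab), where apps on a
# local port are reachable through the Jupyter server proxy; the domain
# below is a made-up example.

def studio_proxy_url(studio_url: str, port: int = 8501) -> str:
    base = studio_url.split("/lab")[0]  # drop the /lab suffix
    return f"{base}/proxy/{port}/"      # trailing slash is required

original = "https://d-example.studio.us-east-1.sagemaker.aws/jupyter/default/lab"
print(studio_proxy_url(original))
# https://d-example.studio.us-east-1.sagemaker.aws/jupyter/default/proxy/8501/
```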


    Below are two test questions and the answers returned by the RAG solution. Clicking the Sources button shows the data sources, such as the S3 locations of the PDF files. 


2. Launch the Chatbot application on EC2 instances

You can also run the Chatbot application on an EC2 instance. All the steps remain the same, except Step 6: instead of modifying the URL to access the Chatbot web interface, you can directly click the output URL that contains the port number. This link shows how to set up an EC2 instance. Make sure the EC2 instance is in the same region as the SageMaker endpoint and the Kendra index.  

Clean up resources

Remember to delete the Kendra index and the SageMaker endpoint to avoid ongoing charges. 


In this blog, we showed step-by-step instructions for setting up a RAG solution as a proof of concept (POC). The RAG solution includes three major components: a knowledge database for context search, LangChain, and an LLM for text generation. We use Amazon Kendra as the knowledge database. With Kendra, you do not need to deal with the complexity of managing chunking and embeddings, and you can connect to a wide set of data sources, such as Amazon S3, Confluence, Slack, and Salesforce. Depending on your specific use case, there are a variety of knowledge databases to choose from; for example, this RAG solution optionally supports Amazon Aurora Serverless with pgvector and Amazon OpenSearch. For LangChain, there are also alternatives, such as Semantic Kernel. For production-level Chatbot development, latency is a crucial factor to optimize. Methods to reduce latency include optimizing the inference instance type, reducing the model size (for example, through quantization), and utilizing high-performance computing frameworks such as Ray.  

This blog illustrates how easy it is to set up and test a Gen AI Chatbot, but production-level systems involve many additional considerations. Mission Cloud is a premier AWS consulting partner that you can engage to build a production-level system after you have proven the ROI with this POC Gen AI Chatbot.

Written by