Larry Ludden - Enterprise Java Developer - AWS Cloud Architect
Creating a chatbot - Using new AWS S3 Vector Buckets and Amazon Bedrock
Something I was exploring while at Subaru was creating an AI chat agent that could answer questions about their various models and trims, using a knowledge source built from their digital brochures. There was a lot of promise (and a lot of different options), but after I left, the monthly cost for a dev instance was a bit more than I wanted to keep spending: around $50 a month at the cheapest, not including model access, etc.
But with the release of S3 Vectors, the storage cost for vectors is significantly cheaper, at least at small scale; access and API calls may be cheaper too, although I haven't run those comparisons yet.
In this high-level walkthrough (hopefully with screenshots in the future) we'll use Amazon Bedrock to build a knowledge base and test it by asking specific questions of the various Bedrock models available and comparing the answers. Creating a knowledge base is fairly straightforward, but the number of options, configurations, and models available makes finding the "best" combination difficult (although AWS has potential help for that as well with Bedrock Evaluators). The cost for this demo should be less than $0.10 to execute, and because the Knowledge Base is so small and uses S3 Vector Buckets, there is almost no ongoing cost afterwards; you can always delete it if you are completely done with it. That's the great part: previously, the cheapest way I could find to keep a knowledge base around (regardless of size) was about $50 a month using OpenSearch Serverless. With S3 Vector storage, the cost for small Knowledge Bases is almost nothing. The only prerequisite is an AWS account with full access.
Note: Since S3 Vectors are in preview, the console doesn't have all of the options to manage the buckets, which includes (believe it or not) deleting an S3 Vector bucket. I'll give the process at the end, but it involves running CLI commands in CloudShell to remove the buckets.
The high-level steps are:
- Create (or use an existing) S3 bucket/folder where the source documents for the Knowledge Base can be uploaded (so that AWS can access them when creating the Knowledge Base)
- Upload the source documents for the Knowledge Base. In this, I'm using Subaru's 2025 Forester Digital Brochure at https://www.subaru.com/content/dam/subaru/downloads/pdf/brochures/2025/2025_Forester_Brochure_120424.pdf
- Enable some models in AWS Bedrock so you can access them. I enabled:
  - Titan Text Embeddings V2
  - Nova Lite
  - DeepSeek-R1
- Create a Knowledge Base using the digital brochure
- Test the Knowledge Base by asking specific questions and evaluating the answers
- Delete the Knowledge Base and the S3 Vector Bucket
Steps
Create or use an existing S3 Bucket
In your account, go to the S3 console. If you are creating a new bucket, walk through the steps of creating a new bucket and then create a folder in that bucket to upload the source files to. If you are using an existing bucket, it still makes sense to create a new folder for the source files.
Upload source files to S3 bucket and folder
For this article we are just uploading a single file; but you can add as many as you want to and the process for creating the knowledge base would be the same. To follow along, download the Subaru 2025 Forester brochure from the link above and then, using the S3 console, upload the file into the new folder in the S3 bucket that you just created.
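If you prefer the CLI, the same steps can be done from a terminal. This is a sketch with placeholder names (my-kb-source-bucket and forester-kb are assumptions; substitute your own bucket and folder):

```shell
# Assumes the AWS CLI is configured with credentials for your account.
# Create the bucket (skip this if you are reusing an existing one)
aws s3 mb s3://my-kb-source-bucket

# Upload the brochure PDF into a dedicated folder (an S3 prefix)
aws s3 cp 2025_Forester_Brochure_120424.pdf \
  s3://my-kb-source-bucket/forester-kb/2025_Forester_Brochure_120424.pdf

# Verify the upload
aws s3 ls s3://my-kb-source-bucket/forester-kb/
```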
Enable models in Amazon Bedrock
Go to the Amazon Bedrock console. Near the bottom on the left-hand side there should be a link for Model Access. Click on that and you should be able to enable various models. I don't believe there is a charge for just enabling a model (but double-check to be sure). For this example, enable the models:
- Titan Text Embeddings V2
- Nova Lite
- DeepSeek-R1
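If you want to sanity-check the model IDs from the CLI, something like the following should list the matching foundation models in your region (the ID substrings in the grep are assumptions; model access itself is still granted through the console):

```shell
# List foundation model IDs in the current region and filter for the three
# models used in this article. This shows availability, not access status.
aws bedrock list-foundation-models \
  --query "modelSummaries[].modelId" \
  --output text | tr '\t' '\n' \
  | grep -iE 'titan-embed-text|nova-lite|deepseek'
```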
Create your Knowledge Base
Go to the Amazon Bedrock console. On the left-hand side there should be a "Knowledge Bases" link you can click on. From there, select Create and then "Knowledge base with vector store":
- Leave or change the name and add an optional description
- IAM Permissions: Create and use a new service role
- Data source type: Select Amazon S3 (this is where the data for the knowledge base exists)
- Click Next at the bottom
Configure Data Source
- Data source location: This AWS Account
- S3 URI: Use the Browse button and select the new folder that you created in step 2 (the one with the PDF file in it)
- Parsing strategy: Leave it set to Use Amazon Bedrock default parser
- Chunking strategy: Leave it set to Default chunking
- Click Next at the bottom
Configure data storage and processing
- Embedding model: If you enabled it earlier, you should be able to select "Titan Text Embeddings v2"
- Vector store: Leave it set to Quick create a new vector store
- Vector store type: Select Amazon S3 Vectors
- Click Next at the bottom
On the Review and create screen, click Create Knowledge Base.
This will take a minute or two and then take you to the page for the Knowledge Base you just created. From here you need to sync the data source initially for Bedrock to parse the data. Under Data source, click the check box next to the data source and then click the Sync button. The status of the Data source should switch to Syncing and then quickly go to Available. Now we are able to test the knowledge base by asking various questions.
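The sync can also be kicked off from the CLI as an ingestion job. The IDs below are hypothetical placeholders; grab the real ones from the Knowledge Base page (or via `aws bedrock-agent list-knowledge-bases`):

```shell
# Start the ingestion (sync) job -- KBID123456 / DSID123456 are placeholders
aws bedrock-agent start-ingestion-job \
  --knowledge-base-id KBID123456 \
  --data-source-id DSID123456

# Poll for status; the job should move from STARTING to COMPLETE
aws bedrock-agent list-ingestion-jobs \
  --knowledge-base-id KBID123456 \
  --data-source-id DSID123456
```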
Test the Knowledge Base
At the top of the Knowledge Base page you are on, you should see a Test Knowledge Base option. Click that and then you should be in the chat interface. Make sure for Retrieval and response generation you have "Retrieval and response generation: data sources and model" selected. And then we can choose various models to see how well they do with the data in the knowledge base.
Initially, select Nova Lite as the model. There are a lot of configuration options that can affect the cost and accuracy of the answers, but for this we'll leave them at the defaults. Then go to the Preview pane.
I’ll start with: What trims are offered in the 2025 Subaru Forester? and I get back:
The 2025 Subaru Forester is offered in five trims: Base, Sport, Premium, Limited, and Wilderness. The Wilderness trim is the most advanced and is designed for off-roading and rugged terrain.
Which… isn’t really correct; the answer left out the Touring trim. If you go to the Source chunks configuration option and increase it from 5 to 10 (this controls how many retrieved chunks get passed to the model from the knowledge base) and ask the question again, this time I get:
The 2025 Subaru Forester is available in five trims: Forester, Forester Premium, Forester Sport, Forester Limited, and Forester Touring.
Which technically is correct; the Wilderness trim that is in the brochure is a 2024 model. So if I ask: What trims are offered in the 2024 Subaru Forester?
I get back:
The 2024 Subaru Forester is available in six trims: Forester, Forester Premium, Forester Sport, Forester Limited, Forester Touring, and Forester Wilderness.
Which is correct, but… it's hard to say whether it just got lucky, since the brochure only mentions one 2024 Forester trim.
Let's switch to a new model. Under Model, select DeepSeek-R1. We'll leave Source chunks at 10 to keep the comparison consistent and ask the new model the same questions. I’ll start with: What trims are offered in the 2025 Subaru Forester? and I get back:
The 2025 Subaru Forester is offered in the following trims: Forester, Forester Premium, Forester Sport, Forester Limited, Forester Touring, and the 2024 Forester Wilderness. The Wilderness trim appears to carry over from the 2024 model year as part of the 2025 lineup.
This is a great answer and appears to be exactly correct; it even navigated the confusing aspect of having a 2024 trim in the 2025 brochure.
Let's try another question with DeepSeek: What kind of warranties are offered? I get back:
The 2025 Subaru Forester comes with several standard warranties: a 3-year/36,000-mile New Car Limited Warranty, a 5-year/60,000-mile Powertrain Limited Warranty, and a 3-year/36,000-mile Wear Item Limited Warranty covering brake pads, wiper blades, and transmitter batteries. It also includes 24-hour roadside assistance for 3 years/36,000 miles and a 5-year unlimited-mileage Rust Perforation Limited Warranty. Subaru offers optional Added Security extended service agreements and maintenance plans for up to 10 years or 100,000 miles.
Which is another great answer.
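The same kind of query can be run outside the console with the bedrock-agent-runtime CLI. This is a sketch: the knowledge base ID and model ARN below are placeholders, and the exact configuration shape is worth double-checking against `aws bedrock-agent-runtime retrieve-and-generate help`:

```shell
# Ask the knowledge base a question via RetrieveAndGenerate.
# KBID123456 and the region in the model ARN are hypothetical placeholders.
aws bedrock-agent-runtime retrieve-and-generate \
  --input '{"text": "What trims are offered in the 2025 Subaru Forester?"}' \
  --retrieve-and-generate-configuration '{
    "type": "KNOWLEDGE_BASE",
    "knowledgeBaseConfiguration": {
      "knowledgeBaseId": "KBID123456",
      "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/amazon.nova-lite-v1:0"
    }
  }'
```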
And that’s the end. I will say that during my testing I did get inaccurate answers from the various models at times. There are so many options in this process (only some of which I mention below) that can change the accuracy of the answers being given:
- Which model (and method) to use to create the knowledge base
- Which model to use to query the knowledge base
- The number of search entries that are retrieved and given to the model
- Various prompts that are layered on top of the prompt you supply
- And so many more…
The really nice thing is that even if you don't delete your knowledge bases, you shouldn't be charged much at all on an ongoing basis (less than a penny a day). But if you do remove a knowledge base when you are done, you also have to remove the computed vector data in the S3 Vector store. And for that (since the feature is in Preview) we'll need to use the command line.
Deleting the Knowledge Base and the S3 Vector Bucket
First, delete the Knowledge Base through the Bedrock console. Next, go to the S3 console. To delete a vector bucket, you first need to delete all of the vectors, then all of the vector indexes, and then the vector bucket itself. Whew. Luckily (because otherwise we would need to script it), deleting a Knowledge Base also deletes all of its vectors, so we only have to remove the index and the bucket.
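As an aside, the Knowledge Base deletion itself can also be done from the CLI if you prefer (the ID is a hypothetical placeholder):

```shell
# Delete the Knowledge Base -- KBID123456 is a placeholder; use your real ID
aws bedrock-agent delete-knowledge-base --knowledge-base-id KBID123456
```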
From the S3 console, select Vector Buckets and then click on your Vector Bucket. This should show you a list of Vector Indexes in that bucket (there should just be one). We will need the index name and the bucket name for the following commands. At the bottom of the page you should see a CloudShell option. Clicking on that should bring up a terminal window at the bottom of the screen.
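If you'd rather grab the names from CloudShell instead of the console, the s3vectors listing commands below should surface them (run `aws s3vectors help` to confirm the subcommands in your CLI version, since the feature is in preview):

```shell
# List vector buckets, then the indexes inside your bucket
aws s3vectors list-vector-buckets
aws s3vectors list-indexes --vector-bucket-name [bucketName]
```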
Next, click into the CloudShell window and let's remove the vector index. The command is:
aws s3vectors delete-index --vector-bucket-name [bucketName] --index-name [indexName]
If successful, the command will not output anything and just return you to the command line. Next, we'll delete the bucket itself. The command is:
aws s3vectors delete-vector-bucket --vector-bucket-name [bucketName]
Again, if successful, the command will just return you to the command line. And that should be it. Refresh your Vector Bucket list in the S3 console and that bucket should be gone, and there should be no more charges going forward.