Use Amazon Bedrock Models with OpenAI SDKs with a Serverless Proxy Endpoint - Without Fixed Cost!
Why bedrock-access-gateway-function-url
This article is for GenAI builders who care about all of these:
- No fixed cost, pay-as-you-go pricing
- Serverless LLM, no self hosting
- Multiple models in one codebase
A Typical GenAI Builder's Struggle
You are a builder specializing in AWS, maybe with a lot of AWS Credits like me.
You want to build GenAI applications, but you find that most starters/examples are based on OpenAI's official Python/NodeJS SDKs, e.g.:
from openai import OpenAI
client = OpenAI()
completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
)
print(completion.choices[0].message)
# > Hello! What can I help you?
If you have read through the Amazon Bedrock docs, you will realize that the data schema of the Bedrock Runtime Converse API for chat completions is very different from OpenAI's. If you need to allow model/provider switching in your GenAI application, this is a particular burden because you might need to write a very different implementation for each provider.
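To make the schema gap concrete, here is a minimal sketch of converting OpenAI-style chat messages into the shape the Converse API expects. The function name `openai_to_converse` is hypothetical and only plain-text content is handled; the key differences it illustrates are that Converse takes system prompts in a separate `system` field and wraps each message's content in a list of content blocks.

```python
from typing import Any


def openai_to_converse(messages: list[dict[str, str]]) -> dict[str, Any]:
    """Convert OpenAI-style chat messages into Converse-style keyword arguments.

    Hypothetical helper for illustration: only plain-text content is handled,
    so tool calls, images, and other content types are out of scope.
    """
    system: list[dict[str, str]] = []
    converse_messages: list[dict[str, Any]] = []
    for message in messages:
        block = {"text": message["content"]}
        if message["role"] == "system":
            # Converse takes system prompts in a separate top-level field.
            system.append(block)
        else:
            # Converse wraps each message's content in a list of content blocks.
            converse_messages.append({"role": message["role"], "content": [block]})
    kwargs: dict[str, Any] = {"messages": converse_messages}
    if system:
        kwargs["system"] = system
    return kwargs


if __name__ == "__main__":
    print(
        openai_to_converse(
            [
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": "Hello!"},
            ]
        )
    )
```

The returned dictionary is meant to be passed, together with a `modelId`, as keyword arguments to the Bedrock Runtime `converse` call. A real adapter would also have to translate the response, streaming events, and tool calls back into OpenAI's format, which is where most of the effort goes.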
There are also other provider-specific implementations: VertexAI, Gemini API, LangChain, etc. It takes effort to rewrite your code to cater for more models. If you are working on multiple projects, you might end up maintaining the same set of code in each of them.
A New Hope - But It Comes with a Fixed Cost
To fix this issue, AWS has provided the great project aws-samples/bedrock-access-gateway.
It allows you to deploy an Application Load Balancer + Lambda/Fargate pair so that you can use OpenAI's official SDKs with the OpenAI-API-compatible REST API endpoint via the environment variables OPENAI_API_BASE and OPENAI_API_KEY.
It achieves goals #2 and #3 in the first section. You can work on projects utilizing OpenAI SDKs with ease.
The Fixed Cost Strikes Back
Yes, it's absolutely great, but it's also costly if you are building your GenAI project with your own money or a limited budget:
- The Application Load Balancer runs 24/7 once deployed, so it comes with a fixed cost per hour:
  - $0.0225 per Application Load Balancer-hour; or
  - $16.2 / month FIXED cost regardless of usage
  - In addition, there is also the variable cost: No. of LCUs used * $0.008 per LCU-hour
- Fargate (the alternative deployment option) also runs 24/7, so it comes with an additional fixed cost on top of the ALB:
  - $0.04048 / vCPU hour
  - $0.004445 / GB hour
  - $35.5 / month FIXED cost under the default 1 vCPU + 2 GB RAM setup
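The monthly figures above can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes a 30-day month (720 hours) and uses the hourly prices quoted in this article; the function names are made up for illustration.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: a 30-day month

# Hourly prices as quoted in this article.
ALB_HOURLY = 0.0225
FARGATE_VCPU_HOURLY = 0.04048
FARGATE_GB_HOURLY = 0.004445


def alb_fixed_monthly() -> float:
    """Fixed monthly ALB cost, excluding the usage-based LCU charge."""
    return ALB_HOURLY * HOURS_PER_MONTH


def fargate_fixed_monthly(vcpu: float = 1.0, memory_gb: float = 2.0) -> float:
    """Fixed monthly cost of one always-on Fargate task of the given size."""
    hourly = vcpu * FARGATE_VCPU_HOURLY + memory_gb * FARGATE_GB_HOURLY
    return hourly * HOURS_PER_MONTH


if __name__ == "__main__":
    print(f"ALB:     ${alb_fixed_monthly():.2f} / month")
    print(f"Fargate: ${fargate_fixed_monthly():.2f} / month")
```

With the default 1 vCPU + 2 GB setup, this works out to roughly $16.20 for the ALB and $35.55 for Fargate per month, before any request is served.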
It's a cost nightmare, especially for those who don't need 24/7 uptime and usage for the OpenAI-compatible API endpoint.
Also if a fixed cost is unavoidable, why don’t we just start a cloud VM and put everything inside it instead?
Why Bedrock in the First Place?
Something feels wrong to me. I chose Amazon Bedrock with the first reason being its serverless nature and pay-as-you-go capability. Why bother paying a gigantic fixed monthly cost to host your own open-source LLM on a VM paired with an expensive GPU when you can just pick the serverless option?
The second reason for picking Bedrock is the ease of switching models.
With Bedrock, not only can you use proprietary models like Amazon Nova, but you also get immediate compatibility with open-source models like LLaMA 3.3 (while VertexAI is still offering LLaMA 3.2 at most) or Mistral by just changing the model field in your code, without extra "endpoint deployments". This is what other major cloud AI providers can't provide at the moment.
For example, with Azure AI, every non-OpenAI model needs to be deployed to a separate inference endpoint:
import os
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential
model_a = ChatCompletionsClient(
    endpoint=os.environ["AZUREAI_ENDPOINT_URL_A"],
    credential=AzureKeyCredential(os.environ["AZUREAI_ENDPOINT_KEY_A"]),
)
model_b = ChatCompletionsClient(
    endpoint=os.environ["AZUREAI_ENDPOINT_URL_B"],
    credential=AzureKeyCredential(os.environ["AZUREAI_ENDPOINT_KEY_B"]),
)
For VertexAI, while you can use non-Gemini models with the same API endpoint and credential, the OpenAI-API-compatible endpoint by Google Cloud is still in beta as of the time of writing (January 2025), and multiple users are still reporting issues with tool calling.
Again, I want to stick with Bedrock with my OpenAI SDKs, but I am not willing to pay a fixed recurring cost for my GenAI application that might not generate 24/7 traffic.
When in Doubt, Read the Docs First
The maintainers of bedrock-access-gateway suggested the following, namely for performance improvements:
Also, you can use Lambda Web Adapter + Function URL (see example) to replace ALB or AWS Fargate to replace Lambda to get better performance on streaming response.
In other words, we can use a Lambda Function URL to replace the ALB via the Lambda Web Adapter.
The sample app provided by AWS is based on a Lambda function with a Docker runtime and, as its name suggests, it is a sample app not meant for general purposes: Serverless Bedtime Storyteller. With this example in place, I can build a serverless, fixed-cost-free version of the bedrock access gateway.
Building the bedrock-access-gateway-function-url project
I made a few tweaks to the original bedrock-access-gateway project:
- Ditched Mangum for Lambda Web Adapter
- Switched from a Docker container runtime back to the Python runtime with layers, as Lambda Docker runtimes are known for their cold start times
- Enabled the option (--no-embedding) to exclude the embedding-related dependencies tiktoken and numpy, which could drastically increase the build size
- Wrapped the Python handler with a custom entry point run.sh:
#!/bin/bash
PATH=$PATH:$LAMBDA_TASK_ROOT/bin \
PYTHONPATH=$LAMBDA_TASK_ROOT:$PYTHONPATH:/opt/python \
exec python3 api/app.py
This is necessary since using the Lambda Web Adapter resets some Python path settings, which would cause your layered dependencies to be un-importable.
Lastly, the crux of my project is the prepare_source.sh file. It fetches the latest Python source of bedrock-access-gateway with git so that the latest efforts from the aws-samples contributors are included. The script clones the latest main branch of the project and copies the Python FastAPI implementation of the access gateway.
It also performs an optional dependency reduction if you do not need to call the embeddings endpoint, so that large PyPI dependencies like numpy or tiktoken can be avoided.
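As an illustration of the dependency reduction idea only (the actual prepare_source.sh is a shell script and may work differently), here is a minimal Python sketch that drops the embedding-only packages from the text of a requirements file. The function name and the assumption that the dependencies live in a pip-style requirements list are mine.

```python
import re

# Packages that, per this article, are only needed by the embeddings endpoint.
EMBEDDING_ONLY_PACKAGES = {"numpy", "tiktoken"}


def strip_embedding_dependencies(requirements_text: str) -> str:
    """Return the requirements text without the embedding-only packages.

    Hypothetical helper: it assumes one pip-style requirement per line.
    """
    kept = []
    for line in requirements_text.splitlines():
        # The package name is whatever precedes a version specifier, extras, or marker.
        name = re.split(r"[<>=!~\[; ]", line.strip(), maxsplit=1)[0].lower()
        if name not in EMBEDDING_ONLY_PACKAGES:
            kept.append(line)
    return "\n".join(kept)


if __name__ == "__main__":
    sample = "fastapi==0.115.0\nnumpy==1.26.4\ntiktoken>=0.6\nboto3\n"
    print(strip_embedding_dependencies(sample))
```

Removing the packages is only half of the job: any code path that imports them, such as the embeddings route, has to be excluded or guarded as well, otherwise the app fails at import time.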
Deployment
Straightforward. I personally recommend using AWS CloudShell, as you can even do so from the mobile AWS Console, and you can save some time by skipping the need for a Docker build:
sudo yum update -y
sudo yum install -y python3.12 python3.12-pip
(
  cd /tmp && \
  curl -L https://github.com/aws/aws-sam-cli/releases/latest/download/aws-sam-cli-linux-x86_64.zip -o aws-sam-cli-linux-x86_64.zip && \
  unzip -q aws-sam-cli-linux-x86_64.zip -d sam-installation && \
  sudo ./sam-installation/install
)
git clone --depth=1 https://github.com/gabrielkoo/bedrock-access-gateway-function-url
cd bedrock-access-gateway-function-url
./prepare_source.sh
sam build
sam deploy --guided
Within a minute, the deployment should finish. Grab the value of FunctionUrl from the outputs and recall the value of ApiKey that you supplied earlier in sam deploy:
Outputs
Key Function
Description FastAPI Lambda Function ARN
Value arn:aws:lambda:us-east-1:123456789012:function:sam-app-BedrockAccessGatewayFunction-yLLzetPaKSq5
Key FunctionUrl
Description Function URL for FastAPI function
Value https://lukeskywalker.lambda-url.us-east-1.on.aws/
Successfully created/updated stack - sam-app in us-east-1
Now, test your own dedicated, pay-as-you-go, serverless OpenAI-compatible API endpoint in your GenAI application!
curl "${FUNCTION_URL}api/v1/models" \
  -H "Authorization: Bearer $API_KEY"
# {
#   "object": "list",
#   "data": [
#     {
#       "id": "amazon.titan-tg1-large",
#       "created": 1735826872,
#       "object": "model",
#       "owned_by": "bedrock"
#     },
#     ...
#   ]
# }
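The same endpoint can be used from the OpenAI Python SDK by pointing its base URL at the Function URL plus the `api/v1` prefix seen in the curl call above. This is a minimal sketch, not the project's own client code: the helper names and the `MODEL_ID` environment variable are hypothetical, and the model must be one of the IDs returned by the `/models` call for your account and region.

```python
import os
from typing import Optional


def build_base_url(function_url: str) -> str:
    """Append the gateway's api/v1 prefix to the Lambda Function URL."""
    return function_url.rstrip("/") + "/api/v1"


def chat(function_url: str, api_key: str, model: str, prompt: str) -> Optional[str]:
    """Send one user message through the gateway and return the reply text."""
    # Imported lazily so the URL helper works without the openai package installed.
    from openai import OpenAI

    client = OpenAI(base_url=build_base_url(function_url), api_key=api_key)
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content


if __name__ == "__main__":
    function_url = os.environ.get("FUNCTION_URL")
    api_key = os.environ.get("API_KEY")
    model = os.environ.get("MODEL_ID")  # hypothetical variable: a model ID from /models
    if function_url and api_key and model:
        print(chat(function_url, api_key, model, "Hello!"))
```

Because only the base URL and key change, existing OpenAI-SDK-based code can switch between Bedrock models by changing the `model` argument alone.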
Alternatively, I have built a minimal UI based on the deep-chat project so that you can test it without access to any local shell environment: https://chat.gab.hk/.
No worries about security: it's an open-source static website with no backend or tracking scripts. Just bring your own endpoint and key.
Return of Cost Effectiveness
With the new, truly serverless option, here are the costs incurred:
- Amazon Bedrock costs: pay-as-you-go according to token usage
- Lambda invocation costs: per GB-second + per request
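To get a feel for the Lambda side of the bill, here is a rough estimator. The two prices are assumptions based on the x86 list prices in us-east-1 ($0.0000166667 per GB-second and $0.20 per million requests); check the AWS Lambda pricing page for your region and architecture. The free tier and Bedrock token costs are ignored.

```python
# Assumed list prices (x86, us-east-1); verify against the AWS Lambda pricing page.
PRICE_PER_GB_SECOND = 0.0000166667
PRICE_PER_MILLION_REQUESTS = 0.20


def lambda_monthly_cost(requests: int, avg_duration_s: float, memory_gb: float) -> float:
    """Estimate the monthly Lambda cost in USD, ignoring the free tier."""
    compute = requests * avg_duration_s * memory_gb * PRICE_PER_GB_SECOND
    request_fee = requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    return compute + request_fee


if __name__ == "__main__":
    # Example workload (made up): 10,000 requests a month, 5 s each, 1 GB of memory.
    print(f"${lambda_monthly_cost(10_000, 5.0, 1.0):.2f} / month")
```

The point of the comparison is that this figure scales with traffic and drops to zero when nobody calls the endpoint.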
So here is the final repository containing the entire setup:
https://github.com/gabrielkoo/bedrock-access-gateway-function-url
Feel free to fork it and create your own!
Next Steps
To further productionize it, here is a list of to-dos that could be done:
- Wrap the Function URL with Amazon CloudFront and adopt OAC - Reference article: Secure your Lambda function URLs using Amazon CloudFront origin access control
- Experiment to find the optimal memory size and timeout for the Lambda handler to achieve better cost efficiency
- Use provisioned concurrency to further avoid Lambda cold starts
- Support multiple API keys by updating the api.auth.api_key_auth logic
- Support non-text/image content, such as DocumentContent or VideoContent, which are well supported by the Amazon Bedrock Converse API.
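For the multiple API keys item, one possible direction is sketched below. It is not the project's actual `api_key_auth` implementation: the function names and the idea of supplying the keys as a comma-separated string are assumptions for illustration.

```python
import hmac


def parse_api_keys(raw: str) -> frozenset[str]:
    """Split a comma-separated string of keys, ignoring blanks and whitespace."""
    return frozenset(key.strip() for key in raw.split(",") if key.strip())


def is_authorized(presented_key: str, allowed_keys: frozenset[str]) -> bool:
    """Check a presented key against every allowed key in constant time per key."""
    presented = presented_key.encode("utf-8")
    # Compare against all keys (no early exit) to avoid leaking which one matched.
    matches = [
        hmac.compare_digest(presented, key.encode("utf-8")) for key in allowed_keys
    ]
    return any(matches)


if __name__ == "__main__":
    keys = parse_api_keys("alpha, beta")
    print(is_authorized("beta", keys), is_authorized("gamma", keys))
```

A check like this could be called from the existing auth dependency with the bearer token taken from the Authorization header.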
Credits
Special thanks to the contributors of the following two projects, as without their efforts this cost-effective gateway would not even exist: