dev-resources.site
for different kinds of informations.
How to Proxy and Modify OpenAI Stream Responses for Enhanced User Experience
Recently, we encountered a requirement to develop a feature that proxies and modifies a stream from OpenAI. Essentially, our goal is to receive the GPT-4 response in a stream, but instead of directly returning the stream to the front-end, we aim to modify the value within the stream and then return the modified value to the front-end application.
Is this possible? The answer is yes. This article discusses how to pipe a stream in NodeJS/ExpressJS to achieve better FCP (First Contentful Paint). Simultaneously, it explains the process of modifying the stream content.
Why people want to use stream as Response?
Reason is Simple and Straightforward, to have better UX
It is similar to the concept we have in front end and UX, FCP (First Contentful Paint), streaming is a way to reduce the FCP time to let users to have a better UX.
Move from this | To this |
---|---|
This Table and Gifs are From Mike Borozdin |
How to achieve it in NodeJS?
It is very simple and straightforward too.
If using Axios, it could be done in a few lines through a pipe
const streamResponse = await axios.get(sourceApiUrl, {responseType: 'stream'}); // this line was used to simulate OpenAI GPT4 Response
res.writeHead(200, {'Content-Type': 'text/plain'});
streamResponse.data.pipe(res);
What are the challenges here then?
If it the requirement is just to proxy the stream, pipe
is everything we need to do here, and the whole work is simply to build a proxy service for OpenAI API. Unluckily, the challenges here are the modification part.
Modification
In NodeJs, Transformer
class is use to modify the streaming chunks, so we could update the codes to be like this
const response = await axios.get(sourceApiUrl, {responseType: 'stream'});
res.setHeader('Content-Type', 'application/json');
const modifyStream = new Transform({
transform(chunk, encoding, callback: any) {
const modifiedChunk = chunk.toString().toUpperCase(); // modify the content to uppercase
callback(null, modifiedChunk);
}
});
response.data.pipe(modifyStream).pipe(res);
Before the pipe to express response, modifyStream transformer was added, to modify the content before return to consumers.
What if the stream response is not a chunk of text
We have to consider about this case as well, now many people are using the function
and json response
in GPT4 now, which means a simple transformer will not work as expected for modifying the stream content.
For example, to modify a chunk of unfinished JSON response in a stream,
- We need to make sure the content is a valid JSON
- At the same time, we need to make change on top of it, and return the modified content as early as possible to consumers.
But JSON is a strongly structured data type, it will not be valid until it is closed with its curly bracket }
.
This is Node package to parse partial JSON.
import { parse } from 'best-effort-json-parser'
let data = parse(`[1, 2, {"a": "apple`)
console.log(data) // [1, 2, { a: 'apple' }]
On top of it, there is even simpler one, 'http-streaming-request'
const stream = makeStreamingJsonRequest({
url: "/some-api",
method: "POST",
});
for await (const data of stream) {
// if the API only returns [{"name": "Joe
// the line below will print `[{ name: "Joe" }]`
console.log(data);
}
In this case, we can update our codes to use it to make the modification for the streaming content.
const stream = makeStreamingJsonRequest<any>({
url: sourceApiUrl,
method: "GET",
});
res.setHeader('Content-Type', 'application/json');
for await (const data of stream) {
modifiedData.modified = true; // add more modify
res.write(JSON.stringify(modifiedData));
}
Performance will not be as good as the original stream pipe, but it will meet all requirements for this case.
Please be noted when you used this library, you may need add some de-bouncing logic on top of it.
Summary
Depends on your modification/transformation code, unless the modification logic is crazy, it will not affect the performance too much of the whole streaming process. The FCP will be reduced, comparing to waiting the whole JSON response being completed.
The Github Repo of the Sample codes is attached below.
Thanks for reading.
Featured ones: