How to break away from the standard ChatGPT interface
How can we build a stylized, easily readable UI on top of ChatGPT? What we originally thought would be simple turned out to be more challenging.
A while back, someone tossed around this cool idea to build an app that would leverage AI to provide users with a goal-oriented product strategy.
By using an LLM, in this case ChatGPT, we could crunch data like market analyses and pricing structures, all through the natural language processing abilities LLMs provide. On top of that, we get natural language back, which makes the results easier to understand and implement.
The basic concept was that a user enters information about a goal or set of goals, and optionally some context about the company. A goal could be something as simple as, “We want to increase conversion rates on our landing page.” The additional context could be anything: what the landing page is for, the target audience, the amount of traffic, where the page is advertised, or pretty much any other information a user thinks applies to achieving that goal.
At first glance, this seemed dead simple. OpenAI could not have made talking to ChatGPT through their API any easier, as long as you want to build an interface similar to a chatbot or assistant. What if you want to break out of that design, though? Why would you want to break away from the standard chat interface? Great question, let me explain.
Most use cases with the current batch of AI apps work as a conversation: you ask a question, you get an answer, repeat. In our use case, though, the responses carry more data than a conversational message. The response we get back after ChatGPT does its magic contains an initiative a company could undertake to achieve the goal they set out. Each initiative has activities to perform, and there are plans to add more items, such as details and documents, for those activities.
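To make that concrete, here is a rough sketch of the shape we are talking about; the type and field names are illustrative, not our actual data model:

```ts
// types.ts: illustrative shapes only, not the app's actual data model.
export interface Activity {
  name: string;
  // Planned additions like details and documents would hang off each activity.
}

export interface Initiative {
  title: string;
  activities: Activity[];
}
```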
This kind of data isn’t going to work well in a chat bubble. That led to the idea of a more stylized and easily readable UI, which in turn led to rethinking how we talk to ChatGPT.
Unpacking ChatGPT
As mentioned above, ChatGPT interacts with the user in a chat-like way. You type something in, ChatGPT shows the lovely animated bubble to let you know it is replying, and eventually text appears on the screen as if someone were typing it in real time. The result is that most ChatGPT apps follow this same pattern and design.
But what happens behind the scenes while ChatGPT is “typing” its response?
At first, I thought the answer was simple: streaming data. Upon further digging, however, I learned that ChatGPT uses a type of stream called Server-Sent Events (SSEs). Not only was this format new to me, but it was also not what I was expecting.
In my experience, streaming responses usually arrive in somewhat usable chunks of data, the equivalent of complete sentences as opposed to snippets or phrases. With ChatGPT, though, you get back a single token in each SSE. Tokens are generally one to four characters long: a lone punctuation mark, or a few characters within a word.
Either way, we’re not receiving a complete sentence, and definitely not a usable JSON object in a single response.
While there is an option not to stream the responses and simply let ChatGPT return the complete response when done, in our use case, that option was too slow.
We needed to find some sort of middle ground to get the streaming speed but have complete sets of data to pass to the UI.
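For context, here is roughly what consuming that token stream looks like with the current OpenAI Node library (the model name and callback are placeholders; the library did not stream this easily when we first built the system):

```ts
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Stream a chat completion and hand each token to a callback as it arrives.
export async function streamCompletion(
  messages: OpenAI.Chat.Completions.ChatCompletionMessageParam[],
  onToken: (token: string) => void
): Promise<void> {
  const stream = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages,
    stream: true, // responses arrive as Server-Sent Events
  });

  for await (const chunk of stream) {
    // Each chunk typically carries a single token: a few characters, not a sentence.
    const token = chunk.choices[0]?.delta?.content ?? "";
    if (token) onToken(token);
  }
}
```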
Building a buffer
After asking around and getting some direction from others here at DEPT®, it became clear the best solution for creating that middle ground would be to build some sort of buffer: a place to catch the data coming from ChatGPT, format and validate it, and then pass it on to the rest of the application.
In our app, we ask the user for at least a single input consisting of a business goal, typically something long- or short-term. We then prompt ChatGPT with several pieces of information (sketched in code just after this list), including:
- The role it is acting as
- A desired format
- Two to three examples, also called Multi-Shot Prompting
- The user's input
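Put together, the prompt we send looks something like this sketch; the wording and the worked example are illustrative, not our production prompt:

```ts
import OpenAI from "openai";

// Assemble the messages array from the pieces listed above.
export function buildMessages(
  goal: string,
  context?: string
): OpenAI.Chat.Completions.ChatCompletionMessageParam[] {
  return [
    {
      role: "system",
      content:
        "You are a product strategy consultant. " + // the role it acts as
        "Return each initiative, with its activities, in the agreed format.", // desired format
    },
    // Multi-shot prompting: two or three worked goal-to-initiative examples.
    { role: "user", content: "Goal: Reduce cart abandonment at checkout." },
    { role: "assistant", content: "<an example initiative in the agreed format>" },
    // The user's actual goal, plus any optional company context.
    {
      role: "user",
      content: `Goal: ${goal}${context ? `\nContext: ${context}` : ""}`,
    },
  ];
}
```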
ChatGPT tokenizes both requests coming in and responses going out. It essentially chops the text up into small pieces for things like validating that requests and responses aren’t too large and calculating billing. A token is usually not a full word; the general rule of thumb is that roughly four characters equal one token, but that’s definitely not a hard rule.
Example of how ChatGPT turns text into tokens.
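If you want a feel for this locally, a tokenizer library makes it easy to count tokens; this sketch uses js-tiktoken, which is an assumption on my part rather than something the app relies on:

```ts
import { encodingForModel } from "js-tiktoken";

const enc = encodingForModel("gpt-3.5-turbo");
const text = "We want to increase conversion rates on our landing page.";

const tokens = enc.encode(text);
console.log(`${text.length} characters -> ${tokens.length} tokens`);
// Lands near text.length / 4, matching the four-characters-per-token rule of thumb.
```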
Each SSE contains a token from the ChatGPT response, so we need to keep track of what has already arrived and what just came in. We collect it all in a Svelte store that acts as our buffer.
As the store is updated, we constantly check to see if what we have in the store can be used in the UI. Everyone loves RegEx, so we have one here that checks to see if the data returned contains the pieces we need for the UI. Once that condition is satisfied, that entire chunk of the message is passed off to a formatter that converts the text to JSON to be stored in another part of the Svelte store.
When a completed message is found, we also remove it from the main buffer so we don’t have duplicate results.
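Boiled down, the buffer looks something like the sketch below. The <<< >>> delimiters stand in for whatever the prompt's format instructions use to mark a complete initiative, and the Initiative type is the illustrative one from earlier; the real store, regex, and formatter are more involved:

```ts
import { writable } from "svelte/store";
import type { Initiative } from "./types"; // the illustrative shape from earlier

// Completed, formatted results the UI subscribes to.
export const results = writable<Initiative[]>([]);

let buffer = "";
// Hypothetical check for "a complete piece the UI can use".
const COMPLETE_INITIATIVE = /<<<([\s\S]*?)>>>/;

// Called once per SSE token coming off the stream.
export function appendToken(token: string): void {
  buffer += token;

  let match = COMPLETE_INITIATIVE.exec(buffer);
  while (match) {
    const raw = match[1];
    // Remove the completed message from the buffer so we don't get duplicate results.
    buffer = buffer.slice(match.index + match[0].length);
    try {
      // The "formatter": turn the raw text chunk into the JSON shape the UI expects.
      const initiative = JSON.parse(raw) as Initiative;
      results.update((all) => [...all, initiative]);
    } catch {
      console.warn("Dropping a chunk that failed validation:", raw);
    }
    match = COMPLETE_INITIATIVE.exec(buffer);
  }
}
```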
In the results section of the page, we have a UI component that is subscribed (Svelte magic here again) to the results array in the store. As each new complete (i.e., formatted and validated) result lands in the array, the UI updates to display it. The end result of the stream of SSEs looks like this:
The results on the right have been through the buffer and formatter.
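The subscribed component itself is tiny; simplified, it is something along these lines, using the results store from the buffer sketch:

```svelte
<!-- Results.svelte: simplified sketch of the subscribed results component -->
<script lang="ts">
  import { results } from "./buffer"; // the writable store from the buffer sketch
</script>

{#each $results as initiative}
  <article>
    <h3>{initiative.title}</h3>
    <ul>
      {#each initiative.activities as activity}
        <li>{activity.name}</li>
      {/each}
    </ul>
  </article>
{/each}
```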
Final thoughts and lessons learned
When we first wrote this buffer system, ChatGPT 3.5 was good, but not great, at returning the format we wanted.
Recently, however, both ChatGPT 3.5 and 4 have been updated with the ability to return JSON directly. With this update, the reliability of getting the format we coded the UI around has increased significantly, enough that we’re planning on refactoring it to use the updated JSON output.
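For reference, that refactor would boil down to something like this, ignoring streaming for brevity; the model name and prompt wording are placeholders:

```ts
import OpenAI from "openai";

const openai = new OpenAI();

// Ask the model to return the initiatives directly as JSON using JSON mode.
export async function getInitiativesAsJson(goal: string) {
  const completion = await openai.chat.completions.create({
    model: "gpt-3.5-turbo-1106",
    response_format: { type: "json_object" },
    messages: [
      { role: "system", content: "Return the initiatives for this goal as a JSON object." },
      { role: "user", content: `Goal: ${goal}` },
    ],
  });

  return JSON.parse(completion.choices[0].message.content ?? "{}");
}
```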
Going forward, I think we’ll always need a validator in place to ensure the response we receive contains what we need for the UI. As we continue to explore, I’m hoping we can simplify the validation process and maybe remove the formatting helper altogether.
Alongside the model updates, there have also been updates to the OpenAI library we use for types. That library didn’t support streaming easily when we first wrote this system, so I’d like to revisit it to try to improve efficiency and reliability there as well.