Cameron Collis
designer
The Limits of Chat Interfaces and LLMs
November 24, 2023

A year ago, OpenAI shared the power of Large Language Models (LLMs) with the world, and ever since, product teams have scrambled to add this new technology to their applications. During this time I've watched countless marketing videos showing a user inputting text into a field, making a request of an application in natural language, and each time, the application magically completes the request. But the novelty has worn off. I've become desensitised to the 'happy path' marketing videos, which only show the ideal interaction between the user and the application.

It seems every time this technology appears, it's in the form of a chat interface, and the user is required to interact with the application using natural language. That's great when the user takes the 'happy path', but a well-designed application removes negative experiences as well as creating positive ones, and this is where chat interfaces fail.

Since the 1970s, graphical user interfaces have been the predominant software interface. Icons, buttons, fields, and menus are clear signifiers, guiding the user through the application and helping them complete their task. In comparison, a chat interface presents a blank text field, which doesn't signify to the user what the application does, how it works, or what tasks are possible.

How does the user discover the application's constraints when it appears anything is possible? They could use the blank text field to ask the application, though needing to ask is itself a sign the interface lacks discoverability. Or they could read the help documentation, which defeats the purpose of an interface built on natural language. Most likely the user will attempt to learn the application's constraints through a self-guided process of trial and error.

This self-guided process increases the user's cognitive load and the likelihood of information overload, as they're constantly switching between tasks and juggling multiple pieces of information in their working memory. The user must remember which prompts were previously successful, recall information to include in their prompt, and apply 'prompt engineering' best practices.

When the application's output doesn't align with the user's expectations, the user wonders whether they or the application are at fault. They're unsure if their prompt was low quality, or if their request was outside the application's constraints. The user receives no feedback explaining why the output didn't align with their expectations, or what the possible remedies are. A negative experience has occurred, and instead of reassurance the user feels a lack of control. They're angry and frustrated.

This experience is acceptable for the technology enthusiast, who enjoys trialling innovative technology and expects the experience to be buggy and inefficient. But mainstream users in the larger market don't share the same psychographics as the technology enthusiast. They're pragmatic and risk averse. They'll wait for a disruptive technology to prove itself before they purchase it.

The principles of good design will eventually catch up and push LLMs into the larger market. Whether the user experience features a chat interface, a graphical interface, or something in between is unclear. Until then, product teams must anticipate and remove the negative experiences a chat interface creates, and consider more than the ideal interaction between user and application.

The reality is it's hard for product teams, who are still learning the limitations of this new technology and how to account for the many paths a user can take when interacting with natural language. So where should product teams start? It's clear chat interfaces lack discoverability and feedback - two traits of a well-designed application. But I encourage teams to start with the user's mental model.

A mental model is formed from previous experiences and is the user's internal understanding of how something works. Chat interfaces attempt to leverage the user's mental model to make the application easier to use. But talking to an LLM is different from talking to a human. An LLM lacks the ability to recall past conversations, interpret non-verbal cues, or recognise its own mistakes. Its knowledge is confined and based on statistical patterns, unlike human reasoning.

But a chat interface powered by an LLM can create a consistently good experience for simple and frequently repeated tasks, when the user has a clear objective and knows what they want. Tasks of this nature typically don't require the user to be a skilled 'prompt engineer', and rarely result in requests that fall outside the application's constraints.

Chat interfaces powered by an LLM can also create a consistently good experience when the LLM is built to excel at domain-specific tasks. This happens by fine-tuning the LLM with domain-specific data, or by using a process called Retrieval-Augmented Generation (RAG) to allow the LLM to access a domain-specific dataset.
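For readers curious what that looks like in practice, here is a minimal sketch of the RAG pattern: retrieve the most relevant documents from a domain-specific dataset, then ground the LLM's response in them. The embed and generate functions, and the example documents, are hypothetical placeholders rather than any particular product's implementation.

```python
# A minimal sketch of Retrieval-Augmented Generation (RAG), assuming a
# hypothetical `embed` (text -> vector) and `generate` (prompt -> completion).
# Any real application would swap in its own embedding model and LLM call.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical embedding model: maps text to a fixed-length vector."""
    raise NotImplementedError

def generate(prompt: str) -> str:
    """Hypothetical LLM call: returns a completion for the given prompt."""
    raise NotImplementedError

# 1. Index the domain-specific dataset once, ahead of time.
documents = [
    "Blood test results are released to the patient portal within 48 hours.",
    "Appointments can be rescheduled up to 24 hours in advance.",
]
doc_vectors = np.stack([embed(d) for d in documents])

def answer(question: str, top_k: int = 2) -> str:
    # 2. Retrieve the documents most similar to the user's question
    #    (cosine similarity between embeddings).
    q = embed(question)
    scores = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    context = "\n".join(documents[i] for i in np.argsort(scores)[-top_k:])

    # 3. Ground the LLM's response in the retrieved context.
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return generate(prompt)
```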

Until there is a technological breakthrough that allows technology to reason like humans, the challenge for product teams is to find where simple, frequently repeated tasks overlap with the domain-specific tasks an LLM can be built to excel at, while still starting with a good understanding of the user and the value they wish to create for them, and turning this into experiences where technology positively impacts the user's life.

If the product team commits to a chat interface, the challenge is to help the user form an accurate mental model of the application's constraints. This is a challenge for the entire organisation, not just the product team, as the user's mental model is formed from ads, website copy, and the 'happy path' marketing videos. Every touchpoint must accurately communicate what the application does, how it works, and what tasks are possible. With an accurate mental model, the user is less likely to ask an application built to access medical results to book a flight from London to Paris.

Once the user has an accurate mental model, discoverability and feedback become less of a problem, but by how much is uncertain, even with those 'suggested prompts'. It's an inherent trait of a chat interface to lack clear signifiers, and to lack reassurance when the output doesn't align with the user's expectations. Natural language is just too complex, there are a plethora of paths a user can take, and until the technology powering these experiences improves, a chat interface isn't going to get it right every time.