Large-language models (LLMs) like ChatGPT are known to occasionally hallucinate, meaning the information they present may be invented rather than accurate.
One analysis found ChatGPT invented information 3% of the time, and the figure was even higher for other major tech companies’ LLM programs. But even for ChatGPT, one research team found dramatic changes in its accuracy at solving maths problems within the span of just a few months.
Why Does ChatGPT Hallucinate? How Accurate is It?
As Lakera AI notes, there is a randomness inherent to these programs’ text generation that can leak into the final product. Developers must balance reliability against the creative variety that a higher “temperature” setting produces.
An LLM’s “temperature” refers to the degree of randomness in its text generation. A higher temperature setting makes large-language models appear more creative, and also makes them more fun to use. Some LLMs are set up to let users adjust the temperature setting themselves.
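To make that concrete, here is a minimal Python sketch of temperature-scaled sampling. The candidate words, the scores (logits) and the `sample_with_temperature` helper are all made up for illustration; real LLMs apply the same idea across vocabularies of tens of thousands of tokens.

```python
import math
import random

def sample_with_temperature(logits, temperature):
    """Pick one candidate index from raw model scores (logits).

    Low temperature sharpens the distribution (predictable output);
    high temperature flattens it (more random, more "creative").
    """
    scaled = [score / temperature for score in logits]
    # Softmax: convert scaled scores into probabilities.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(logits)), weights=probs, k=1)[0]

# Toy example: three candidate next words with made-up scores.
words = ["Paris", "London", "Narnia"]
logits = [4.0, 2.0, 0.5]
for temp in (0.2, 1.0, 2.0):
    picks = [words[sample_with_temperature(logits, temp)] for _ in range(1000)]
    print(temp, {w: picks.count(w) for w in words})
```

With these made-up scores, a temperature of 0.2 picks “Paris” almost every time, while a temperature of 2.0 lets even “Narnia” appear regularly: more creative, but less reliable.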
In fact, so-called hallucinations can make LLMs more effective for some uses. Creative writing or brainstorming hypotheses for a problem may both be enhanced if an LLM combines language in more random ways, just as creative people combine ideas in unusual and unexpected ways.
Hallucinations can also arise from what the program infers from a user’s question or prompt. In the case of the DAN crack of ChatGPT, wording implying government censorship and conspiracies led the “DAN” version of ChatGPT to reference other conspiracies the user had not mentioned, and to present them as fact.
If, on the other hand, you’re using an LLM to check information and you need reliability, you want to bring the temperature setting right down. You can produce a similar outcome by giving the program a more specific prompt or question, with substantial contextualizing information, before asking for what you need.
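As a rough sketch of what that looks like in practice, the snippet below uses the OpenAI Python client with the temperature turned all the way down and a prompt that supplies role and context up front. The model name, the prompts and “Example Corp” are illustrative assumptions, not details from the article.

```python
from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY environment variable is set

# Low temperature plus a specific, contextualizing prompt for a factual task.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    temperature=0.0,      # minimize randomness when reliability matters
    messages=[
        {
            "role": "system",
            "content": "You are a careful fact-checker. If you are not sure, say so.",
        },
        {
            "role": "user",
            "content": (
                "According to its 2023 annual report, what was Example Corp's "
                "stated revenue? Answer only from the report; do not guess."
            ),
        },
    ],
)
print(response.choices[0].message.content)
```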
Much like working with humans, there is something akin to a spectrum of reliability and creativity with large-language models. The challenge is that at any given moment, a user can’t be entirely sure which end of the spectrum these programs are giving back to them.
This helps explain why more and more workplaces are blocking their employees from using these programs.