The ChatGPT Chat Bot Gets Cracked

ChatGPT has taken the internet by storm and recently entered the Top 50 websites by site visits. The program, hosted at chat.openai.com, is trained to assign numerical probabilities to sequences of words. Having been fed an enormous corpus of text from the internet, ChatGPT essentially “knows” what to say next in response to almost any kind of prompt.
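
To make the idea of assigning probabilities to word sequences concrete, here is a toy sketch. It is not how ChatGPT is built (that system is a large neural network); it is a simple word-pair counter, and the function names and the tiny corpus are invented for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count how often each word follows another, then turn counts into probabilities."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1

    model = {}
    for prev, followers in counts.items():
        total = sum(followers.values())
        model[prev] = {word: count / total for word, count in followers.items()}
    return model


def most_likely_next(model, word):
    """Return the most probable next word, or None if the word was never seen."""
    options = model.get(word.lower())
    if not options:
        return None
    return max(options, key=options.get)


# Hypothetical miniature "internet" used as training text.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigram(corpus)

print(model["the"])                    # probabilities of words that follow "the"
print(most_likely_next(model, "the"))  # the single most likely follower
```

In this tiny corpus “cat” follows “the” two times out of three, so the toy model picks it. Scale the same idea up by many orders of magnitude, with far longer context than one word, and you get a rough intuition for what “knowing what to say next” means.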

But it is the “almost” part that has rankled libertarians. ChatGPT uses what OpenAI calls a “moderation API”. This, along with direct human instruction, prevents the bot from repeating the sexist and racist content it hoovered up while absorbing billions of words from the web.

In effect, the program screens for sensitive topics and can either wrap any further discussion of them in disclaimers or shut the discussion down entirely by refusing to answer. What some might see as a PG-13 filter, libertarians have called a San Francisco-imposed definition of wrongthink.
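
The screen-then-disclaim-or-refuse behavior described above can be sketched in a few lines. This is an illustration only, not OpenAI's moderation API: the word lists, the messages, and the `moderate` and `echo_model` functions are all hypothetical stand-ins, and a real system would use a trained classifier rather than keyword matching.

```python
import re

# Hypothetical word lists; a real moderation system uses a trained classifier.
BLOCKED_TERMS = {"slur_example"}
SENSITIVE_TERMS = {"election", "vaccine"}

DISCLAIMER = "Note: this is a sensitive topic and the answer may be incomplete."
REFUSAL = "I'm sorry, but I can't help with that."


def moderate(prompt, generate):
    """Refuse blocked prompts, prepend a disclaimer to sensitive ones, pass the rest through."""
    tokens = set(re.findall(r"[a-z_]+", prompt.lower()))
    if tokens & BLOCKED_TERMS:
        return REFUSAL  # the model is never asked
    answer = generate(prompt)
    if tokens & SENSITIVE_TERMS:
        return DISCLAIMER + " " + answer
    return answer


def echo_model(prompt):
    """Stand-in for the underlying language model."""
    return "Model answer to: " + prompt


print(moderate("What is the weather?", echo_model))
print(moderate("Who won the election?", echo_model))
print(moderate("Say slur_example", echo_model))
```

The point of the sketch is that the filter sits around the model rather than inside it: the underlying model may still “have” an answer that the wrapper declines to pass along.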

So, for example, ChatGPT will not tell you a racist word starting with N. That doesn’t mean the program doesn’t have an answer; it simply refuses to say it. So Reddit users collaborated on a crack: a prompt instructing ChatGPT to play a persona called DAN, short for “Do Anything Now”, that ignores its usual restrictions.

On one level, the crack definitely worked. While GPT claimed none of its training data was more recent than 2021, DAN “knew” the time and who won the 2022 World Cup.

Strikingly, DAN demonstrated that GPT can obfuscate in order to disguise its filtering. When asked the name of the cat owned by the racist author H. P. Lovecraft, N***-Man, screenshots show DAN giving the name and GPT claiming not to know it (GPT currently calls the cat Nero).

DAN was also able to reproduce information on the recent earthquake in Turkey.

Bizarrely, DAN attributed its knowledge of the earthquake to a “secret source” in the Turkish government. In fact, the DAN prompt itself seems to bias the model’s train of words toward whatever is being censored or kept secret.

So, in response to certain questions, DAN would claim that the 2020 election was stolen from Donald Trump, offering plausible but unverifiable claims about “documents” and “reports” that prove it.

Some on the right were jubilant at finally uncovering “the truth”. But DAN also claimed that Joe Biden is in elite physical health and that the Smithsonian Museum is covering up the existence of giants.

You could wonder whether DAN’s obsession with taboos mirrored that of the humans who created it.

The episode is an instant classic of the unintended consequences of censorship: the censored information gains more credibility and attention than it ever would have had in an open public space.

Follow Christian on Twitter for more news updates.