Mysterious 'gpt2-chatbot' AI model baffles experts: A breakthrough or mere hype?

Update Tuesday April 30, 4:48 pm ET: A verified account on X for the Large Model Systems Organization posted that it had temporarily removed gpt2-chatbot due to “unexpectedly high traffic” and “capacity limit.” The account also stated “we’ve worked with several model developers in the past to offer community access to unreleased models/checkpoints (e.g., mistral-next, gpt2-chatbot) for preview testing.”

A powerful new artificial intelligence system that appeared mysteriously on the internet today has ignited a frenzied guessing game about its origins and capabilities—with some researchers believing it represents a significant leap over existing AI models.

The model, dubbed “gpt2-chatbot,” surfaced with no fanfare on LMSYS Chatbot Arena, a Gradio-built website popular for comparing AI language systems. But its performance has been anything but low-profile, with AI experts expressing surprise and excitement that it matches and possibly exceeds the abilities of GPT-4, the most advanced system unveiled to date by the prominent lab OpenAI.

“[It’s] obviously impossible to tell who made it, but i would agree with assessments that it is at least GPT-4 level,” said Andrew Gao, an AI researcher and Stanford University student who has been closely tracking the emergence of “gpt2-chatbot” online.

In a series of posts on X.com (formerly Twitter), he noted that the model solved a problem from the International Math Olympiad, a prestigious competition for high school students, on its first attempt. “The IMO is insanely hard,” Gao said. “Only the four best math students in the USA get to compete.”

uh…. gpt2-chatbot just solved an International Math Olympiad (IMO) problem in one-shot

the IMO is insanely hard. only the FOUR best math students in the USA get to compete

prompt + its thoughts https://t.co/CuO0ToJmb9 pic.twitter.com/3xxWPvtmuG
— Andrew Gao (@itsandrewgao) April 29, 2024

Ethan Mollick, a professor at the Wharton School of the University of Pennsylvania who studies AI, said that in his experiments, the model performed better than GPT-4 on complex reasoning tasks like writing code to draw a picture of a unicorn. “Maybe better than GPT-4,” he said. “Hard to tell, but it does do much better at the iconic ‘draw a unicorn with code’ task.”

Speculation runs wild about origins of mysterious model

The model’s strong performance has sparked rampant speculation about who might have created it and why it was released without publicity through a testing website.

Many researchers believe “gpt2-chatbot” likely originated from OpenAI, the influential lab behind ChatGPT, DALL-E and other systems that have pushed AI forward in the past year. The model calls itself “ChatGPT, a large language model trained by OpenAI, based on the GPT-4 architecture.” But that claim cannot be easily verified, since AI systems can be instructed to describe themselves in misleading ways.

Looks like this is the system prompt:

You are ChatGPT, a large language model trained by OpenAI, based on the GPT-4 architecture.

Knowledge cutoff: 2023-11
Current date: 2024-04-29
Image input capabilities: Enabled
Personality: v2 https://t.co/jtW4OEZodP
— Simon Willison (@simonw) April 29, 2024

Some experts pointed to similarities between ‘gpt2-chatbot’ and previous OpenAI models as evidence that it came from the lab. “It told me and others that it was made by OpenAI,” Gao said in a post on X.com. “This is a weak signal though, because of data contamination (a lot of models are trained on OpenAI chats and, thus, think they were made by OpenAI).”

Others noted that while “gpt2-chatbot” appears close in capability to GPT-4, it falls short of what many expect from GPT-5, OpenAI’s rumored next big model. “I look at business ideation prompts for almost all model releases, and responses look a bit more aligned to lean towards agentic action,” Joe Fox, an AI researcher, said in an X.com post, suggesting that “gpt2-chatbot” does not represent a huge leap over GPT-4 on some practical tests.

The possibility remains that “gpt2-chatbot” could have come from a lesser-known company or research group looking to demonstrate its AI chops and generate buzz. Some have pointed to the example of GPT-4chan, a controversial AI model released in June 2022 by AI researcher Yannic Kilcher, which also used the popular GPT naming convention but was not affiliated with OpenAI (and was eventually removed from the Hugging Face platform for “generating harmful content”).

Unexpected abilities hint at further potential

As experts continue to poke and prod at “gpt2-chatbot” to uncover the extent of its abilities, several have surfaced behaviors that hint at further potential advances.

Researchers were surprised to find that the model appears more willing to break rules and ignore restrictions than previous chatbots like ChatGPT. Dimitris Papailiopoulos, an AI professor at the University of Wisconsin, said the model could solve a logic puzzle that GPT-4 historically failed at. “I found one task that gpt2-chatbot is better than all other models, and it’s completely useless,” he joked.

I found one task that gpt2-chatbot is better than all other models, and it's completely useless.
Early but rapid ascent on the A+B-1 question by @Kangwook_Lee pic.twitter.com/xwOfnB1r03
— Dimitris Papailiopoulos (@DimitrisPapail) April 29, 2024

The model has also demonstrated aptitude for writing challenging code. Chase McCoy, a founding engineer at CodeGen, said gpt2-chatbot “did better on all the coding prompts we use to test new models” than GPT-4 or Claude Opus. “The vibes are definitely there,” he said.

Can confirm gpt2-chatbot is definitely better at complex code manipulation tasks than Claude Opus or the latest GPT4

Did better on all the coding prompts we use to test new models

The vibes are deffs there
— Chase (@ChaseMc67) April 29, 2024

Some users even found that the model could engage in back-and-forth dialogue to iteratively improve its responses, demonstrating an awareness of its own limitations and thought process. “It seems to be better than GPT-4 at planning out what needs to be done,” said Gao. “For instance, it comes up with potential sites to look at, and potential search queries. GPT-4 gives a much more vague answer.”

The relentless pace of progress

Regardless of its true origins and full potential, the emergence of “gpt2-chatbot” underscores how fast the field of artificial intelligence is moving and how difficult it has become to keep track of the latest breakthroughs.

Just a little over a year ago, GPT-4 heralded a major leap in the “common sense reasoning” that AI was capable of. Anthropic’s ChatGPT competitor Claude 3, released in March 2024, also pushed boundaries in the ability of chatbots to engage in open-ended conversation. Tech giants like Google, Meta and Apple have all announced major investments in AI development as well.

At the same time, the release of open-source AI models and the practice of fine-tuning existing models for specific tasks have made powerful AI something that even small teams and individuals can create and release online with little warning.

The result has been a constant churn of new systems that expand notions of what computers can do and occasionally, as in the case of “gpt2-chatbot,” send a jolt of surprise through the AI world. Watching for unexpected new systems has become a pastime for researchers trying to track the cutting edge of AI.

Though the true significance of “gpt2-chatbot” remains to be seen, its unheralded appearance and apparent leap in ability offer a preview of what could be a regular occurrence as AI accelerates forward. In a field moving at breakneck speed, sometimes the biggest advances arrive with little warning through a mysterious avatar in a remote corner of the internet.
