Prominent large language models (LLMs) such as OpenAI’s ChatGPT (particularly its latest iteration, GPT-4), Claude AI and Gemini have so far demonstrated only limited decision-making ability. In this article, we’ll discuss contemporary research on decision-making by LLMs and what it could mean for their future.
In the context of LLMs, effective decision-making means deducing underlying patterns or rules and applying them flexibly and appropriately to new scenarios. An experiment by the Santa Fe Institute found that LLMs, including ChatGPT, could not “reason about basic core concepts.” Making well-reasoned decisions relies on a nuanced understanding of the prompt’s context and the output’s consequences.
Moreover, poor LLM decision-making tends to yield disastrous results in practice. In 2023, for example, the National Eating Disorders Association had to suspend its AI chatbot, “Tessa,” after it began dispensing insensitive advice, such as recommending weekly weigh-ins and eating at a 500-to-1,000-calorie daily deficit. The chatbot was quickly disabled amid a storm of controversy.
LLMs don’t just provide incorrect information; they can also default to generic recommendations. INSEAD noted that when ChatGPT was prompted with research questions about business strategy, the model veered towards generic, conventional wisdom about participative management. For example, LLMs tend to recommend collaborative working, fostering a culture of innovation and aligning employees with organizational goals. But business strategizing is a complex social and economic process that does not benefit from generic advice.
A counterargument might be: “If you want LLMs to produce business strategies or healthcare advice, why not train them to do so specifically?” But handling nuanced context is not a problem that can be solved by scaling up a model’s parameters or training it on more data. Simply inputting more data could introduce or exacerbate preexisting biases and increase computational requirements without teaching the model how to weigh context.
Enabling context-appropriate decision-making
Training LLMs in context-appropriate decision-making demands a delicate touch. Two approaches from contemporary machine learning research suggest ways of bringing LLM decision-making closer to human reasoning. The first, AutoGPT, uses a self-reflective mechanism to plan and validate its output; the second, Tree of Thoughts (ToT), encourages effective decision-making by disrupting traditional, sequential reasoning.
AutoGPT represents a cutting-edge approach in AI development, designed to autonomously generate, assess and refine its own plans in pursuit of specific objectives. Academics have since improved the AutoGPT system with an “additional opinions” strategy: a novel integration framework that harnesses expert models, such as analyses from different financial models, and presents their output to the LLM during the decision-making process.
In a nutshell, the strategy revolves around enriching the model’s information base with relevant expert input. When applied to real-world scenarios, the presence of the expert models significantly improves the LLM’s decision-making. The model works through the steps of “thought-reasoning-plan-criticism,” using the expert models to construct and review its decisions.
If successfully deployed, LLMs augmented with expert models could analyze more information than humans can, suggesting they could make more informed decisions. However, AutoGPT suffers from a limited context window: the model can only process a limited number of tokens at once, so its working memory is constrained. Because earlier steps can fall out of that window, AutoGPT can slip into infinite loops, repeating interactions it no longer remembers. Presenting all available information upfront therefore tends to produce better output than injecting it piecemeal over a long conversation.
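As a rough illustration, the additional-opinions loop might be wired up along the following lines. This is a minimal sketch, not code from the AutoGPT project: `call_llm`, `expert_opinions` and the prompt structure are hypothetical stand-ins for a real LLM endpoint and real expert analyses.

```python
# A minimal sketch of the "additional opinions" strategy, not AutoGPT's
# actual implementation. call_llm() and expert_opinions() below are
# hypothetical placeholders for a real LLM endpoint and real expert models.

def call_llm(prompt: str) -> str:
    """Stand-in for a call to any LLM completion endpoint."""
    raise NotImplementedError("Wire up an LLM provider here.")

def expert_opinions(task: str) -> list[str]:
    """Hypothetical expert models, e.g. analyses from financial models."""
    return [
        f"Sentiment model: market sentiment on '{task}' is mildly negative.",
        f"Risk model: downside volatility for '{task}' is above average.",
    ]

def decide(task: str) -> str:
    # Front-load all available context in a single prompt: injecting it
    # piecemeal over a long conversation risks overflowing the context
    # window and looping, as noted above.
    opinions = "\n".join(expert_opinions(task))
    prompt = (
        f"Task: {task}\n"
        f"Expert opinions:\n{opinions}\n\n"
        "Respond in four labeled steps:\n"
        "Thought: restate the problem.\n"
        "Reasoning: weigh the expert opinions against the task.\n"
        "Plan: propose a concrete action.\n"
        "Criticism: list weaknesses in the plan before finalizing."
    )
    return call_llm(prompt)

# decide("Should we expand into the APAC market?")  # requires a wired-up LLM
```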
How ‘Tree of Thoughts’ simulates human cognition
Another high-potential framework for improving the accuracy of LLMs is ToT, which aims to simulate human cognition. Human decision-making revolves around generating and comparing different options or scenarios. Like the additional-opinions strategy, ToT traces faulty decision-making in LLMs to their linear inference processes. And like the AutoGPT research, the ToT study measured LLMs’ ability to follow natural-language instructions on puzzles and complex tasks, such as crosswords and creative writing.
Linear inference in LLMs is exemplified by “chain of thought,” a prompting approach that encourages transparency by eliciting sequential, step-by-step reasoning. ToT aims to move beyond this by strengthening the model’s self-critical capabilities and having it examine multiple reasoning paths rather than a single chain.
For example, when playing Game of 24 (where players use basic arithmetic operations to turn four numbers into 24), chain of thought struggles to map out the different outcomes, that is, which numbers can be added, subtracted, multiplied or divided to reach 24. As a result, GPT-4 achieved only a single-digit success rate. Because ToT can map out those different outcomes as branches of a tree, the framework achieved a 74% success rate.
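The branching itself can be made concrete without any model in the loop. The toy solver below exhaustively expands every legal combination of the remaining numbers, mirroring the tree that ToT searches; in the real framework, an LLM proposes and scores these candidate “thoughts” instead of brute-force enumeration. This is a sketch for illustration, not the paper’s code.

```python
# Toy illustration of the branching search ToT performs on Game of 24.
# In a real ToT setup an LLM proposes and evaluates candidate "thoughts";
# plain enumeration stands in for both roles here.

def solve_24(numbers: list[int]) -> str | None:
    # Each search state is a list of (value, expression) pairs still in play.
    frontier = [[(float(n), str(n)) for n in numbers]]
    while frontier:
        state = frontier.pop()
        if len(state) == 1:
            value, expr = state[0]
            if abs(value - 24) < 1e-6:
                return expr  # one root-to-leaf path through the tree reached 24
            continue
        # Branch: combine every ordered pair of values with every operation.
        for i in range(len(state)):
            for j in range(len(state)):
                if i == j:
                    continue
                (a, ea), (b, eb) = state[i], state[j]
                rest = [state[k] for k in range(len(state)) if k not in (i, j)]
                children = [(a + b, f"({ea}+{eb})"),
                            (a - b, f"({ea}-{eb})"),
                            (a * b, f"({ea}*{eb})")]
                if b != 0:
                    children.append((a / b, f"({ea}/{eb})"))
                for child in children:
                    frontier.append(rest + [child])
    return None  # the whole tree was explored without reaching 24

# Prints a valid expression equal to 24 (exact form depends on search order).
print(solve_24([4, 7, 8, 8]))
```

Rather than exploring this tree exhaustively, ToT prunes it using the model’s own evaluations of each partial solution, which is what lets the approach scale beyond toy arithmetic.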
Ultimately, if LLMs develop consistently better judgement, humans and AI may collaborate on strategic decision-making in the future. The ToT paper suggests applications in “coding, data analysis and robotics,” while AutoGPT more ambitiously references general intelligence.
Either way, academic AI research is producing new, practical strategies for eliciting more deliberate, human-like decision-making in LLMs. The preexisting advantage of LLMs is their ability to analyze vast volumes of data quickly, far more than a human could. If these strategies succeed, LLMs could match or even surpass human decision-making within the next few years.
Vincent Polfliet is a senior machine learning engineer at Evolution AI.
Miranda Hartley leads copywriting at Evolution AI.