CARNEGIE MELLON UNIVERSITY, EurekAlert!; AI chatbots remain overconfident -- even when they’re wrong
"Artificial intelligence chatbots are everywhere these days, from smartphone apps and customer service portals to online search engines. But what happens when these handy tools overestimate their own abilities?
Researchers asked both human participants and four large language models (LLMs) how confident they felt in their ability to answer trivia questions, predict the outcomes of NFL games or Academy Award ceremonies, or play a Pictionary-like image identification game. Both the people and the LLMs tended to be overconfident about how they would hypothetically perform. Interestingly, they also answered questions or identified images with relatively similar success rates.
However, when the participants and LLMs were asked retroactively how well they thought they did, only the humans appeared able to adjust expectations, according to a study published today in the journal Memory & Cognition.
“Say the people told us they were going to get 18 questions right, and they ended up getting 15 questions right. Typically, their estimate afterwards would be something like 16 correct answers,” said Trent Cash, who recently completed a joint Ph.D. at Carnegie Mellon University in the departments of Social and Decision Sciences and Psychology. “So, they’d still be a little bit overconfident, but not as overconfident.”
“The LLMs did not do that,” said Cash, who was lead author of the study. “They tended, if anything, to get more overconfident, even when they didn’t do so well on the task.”
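To make the arithmetic in Cash’s example concrete, here is a minimal Python sketch (not from the study; the `overconfidence` helper is illustrative, and the only numbers taken from the source are the 18 predicted, 15 actual, and 16 retrospective answers in the quote) of how pre-task and post-task miscalibration could be scored:

```python
def overconfidence(estimate: int, actual: int) -> int:
    """Estimate minus actual performance: positive means overconfident."""
    return estimate - actual

# Hypothetical numbers from Cash's example quoted above.
predicted, retrospective, actual = 18, 16, 15

print(overconfidence(predicted, actual))      # 3: overconfident before the task
print(overconfidence(retrospective, actual))  # 1: humans adjusted down afterward
```

On this scoring, the human pattern the study describes is a positive gap that shrinks after the task; the LLM pattern would be a gap that stays flat or, if anything, grows.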
The world of AI is changing rapidly, which makes drawing general conclusions about its applications challenging, Cash acknowledged. However, one strength of the study is that its data were collected over the course of two years, spanning continuously updated versions of the LLMs known as ChatGPT, Bard/Gemini, Sonnet and Haiku. This means the overconfidence was detectable across different models over time.
“When an AI says something that seems a bit fishy, users may not be as skeptical as they should be because the AI asserts the answer with confidence, even when that confidence is unwarranted,” said Danny Oppenheimer, a professor in CMU’s Department of Social and Decision Sciences and coauthor of the study.