For the experiment, the authors, representing a diversity of mathematical fields, each contributed one test question that arose from research they had in the works but had not yet published. They also determined the answers; these solutions are encrypted online and will be released on Feb. 13.
"The goal here is to understand the limits: how far can A.I. go beyond its training data and the existing solutions it finds online?" said Dr. Kolda, who is one of few mathematicians to be elected a member of the National Academy of Engineering.
The team conducted preliminary tests on OpenAI's ChatGPT-5.2 Pro and Google's Gemini 3.0 Deep Think. When given one shot to produce the answer, the authors wrote, "the best publicly available A.I. systems struggle to answer many of our problems."
LLMs are bad at solving math problems not in their training data