Modeling and Reasoning Math

Morning Overview on MSNOpinion

Top AI models are failing hard at solving fresh math problems

Top artificial intelligence systems now ace many textbook-style math questions, yet they still fall apart on genuinely new ...

Communications of the ACM

Formal Reasoning Meets LLMs: Toward AI for Mathematics and Verification

A marriage of formal methods and LLMs seeks to harness the strengths of both.

NextBigFuture

OpenAI o1 Model Sets New Math and Complex Reasoning Records

OpenAI o1 is a new large language model trained with reinforcement learning to perform complex reasoning. o1 thinks before it answers—it can produce a long internal chain of thought before responding ...

ExtremeTech

Microsoft Unveils Phi-4: New AI Model for Mathematical Reasoning

Phi-4 will compete with other small models such as GPT-4o mini, Gemini 2.0 Flash, and Claude 3.5 Haiku.

ExtremeTech

Microsoft's Phi-4-Reasoning Models Bring AI Math and Logic Skills to Smaller Devices

Microsoft has introduced a new set of small language models called Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning, which are described as "marking a new era for efficient AI." These ...

Business Insider

This DeepSeek demo shows how good the Chinese AI model is at math and reasoning

Forbes

OpenAI Unveils O1 - 10 Key Facts About Its Advanced AI Models

OpenAI has introduced the o1 series, its most sophisticated AI models to date, which are designed to excel at complex reasoning and problem-solving tasks. The o1 models, which use reinforcement ...

jagranjosh.com

Google and OpenAI Model Wins Gold at International Math Olympiad

Google DeepMind and OpenAI have both won gold medals for their performance at the prestigious International Mathematical Olympiad 2025. Both companies solved five out of ...

EurekAlert!

MathEval: a comprehensive benchmark for evaluating large language models on mathematical reasoning capabilities

This study introduces MathEval, a comprehensive benchmarking framework designed to systematically evaluate the mathematical reasoning capabilities of large language models (LLMs). Addressing key ...
