News
One of Meta's newest AI models, Llama 4 Maverick, ranks below rivals on a popular chat benchmark. Meta didn't originally ...
When an AI model secretly relies on a hint or shortcut while constructing an elaborate but fictional explanation for its answer, it essentially fabricates a false reasoning narrative—a little like a ...
Artificial Analysis co-founder George Cameron told TechCrunch that the organization plans to increase its benchmarking spend ...
As AI companies look to find ways to support their incredibly expensive models, it appears Anthropic will follow in the ...
Anthropic released a new study on April 3 examining how AI models process information and the limitations of tracing their ...
Anthropic launches new Claude Max subscription tiers at $100 and $200 monthly, challenging OpenAI's premium offerings while targeting power users who need expanded AI assistant capabilities.
For the past month and counting, Claude 3.7 Sonnet has played Pokémon Red very poorly. We look at why that is.
Reasoning models, those AIs like Anthropic’s Claude 3.7 Sonnet and DeepSeek R1 that show their step-by-step ...
DeepSeek and OpenAI’s o1 models performed the best across the various benchmarks, but all models still struggle in a range of ...
Llama 4 consists of three new models: Scout, Maverick, and Behemoth. While each model has a different expertise, Meta claims ...