Reinforcement Learning LLM

News

Now it’s TikTok parent ByteDance’s turn for a reasoning AI: enter Seed-Thinking-v1.5!

It achieved an 8.0% higher win rate over DeepSeek R1, suggesting that its strengths generalize beyond just logic or math-heavy challenges.

ExtremeTech on MSN, 1 day ago

What Is an LLM and How Does It Work?

Modern AI LLMs can seem almost magical when you use them. But, just like even the best magic tricks, there is an explanation ...

scmp.com, 29 days ago

ByteDance advances DeepSeek work in AI reasoning with open-source project led by intern

TikTok owner ByteDance, which has invested heavily in artificial intelligence (AI), has unveiled a new system that claims to improve on the work done by DeepSeek in training AI reasoning models.

InfoWorld, 25 days ago

Databricks’ TAO method to allow LLM training with unlabeled data

or task-specific verification utilizing LLM judges or custom rules. In the reinforcement learning stage, the model is updated or tuned so that it produces outputs more closely aligned with high ...

SiliconANGLE, 24 days ago

Databricks partners with Anthropic and touts breakthrough in reinforcement learning

Separately, Databricks said it has found a new fine-tuning method that leverages Test-time Adaptive Optimization, a type of reinforcement learning that ... Traditional LLM training often relies ...

11 days ago

DeepSeek unveils new technique for smarter, scalable AI reward models

Reward models holding back AI? DeepSeek's SPCT creates self-guiding critiques, promising more scalable intelligence for enterprise LLMs.

25 days ago

Databricks Has a Trick That Lets AI Models Improve Themselves

Using several recent innovations, the company Databricks will let customers boost the IQ of their AI models even if they don’t have squeaky clean data.

Unite.AI, 12 days ago

Bespoke LLMs for Every Business? DeepSeek Shows Us the Way

Once upon a time, the tech clarion call was “cellphones for everyone” – and indeed mobile communications have revolutionized ...