Forget data labeling: Tencent’s R-Zero shows how LLMs can train themselves

How procedural memory can cut the cost and complexity of AI agents

OpenCUA’s open source computer-use agents rival proprietary models from OpenAI and Anthropic

LLMs generate ‘fluent nonsense’ when reasoning outside their training zone

GEPA optimizes LLMs without costly reinforcement learning

Salesforce’s new CoAct-1 agents don’t just point and click — they write code to accomplish tasks faster and with greater success rates

New ‘persona vectors’ from Anthropic let you decode and direct an LLM’s personality

Google’s new diffusion AI agent mimics human writing to improve enterprise research

‘Subliminal learning’: Anthropic uncovers how AI fine-tuning secretly teaches bad habits

New AI architecture delivers 100x faster reasoning than LLMs with just 1,000 training examples

Mixture-of-recursions delivers 2x faster inference—Here’s how to implement it

New embedding model leaderboard shakeup: Google takes #1 while Alibaba’s open source alternative closes gap

Google study shows LLMs abandon correct answers under pressure, threatening multi-turn AI systems

A new paradigm for AI: How ‘thinking as optimization’ leads to better general-purpose models

New 1.5B router model achieves 93% accuracy without costly retraining

Sakana AI’s TreeQuest: Deploy multi-model teams that outperform individual LLMs by 30%

AI agents are hitting a liability wall. Mixus has a plan to overcome it using human overseers on high-risk workflows

Kumo’s ‘relational foundation model’ predicts the future your LLM can’t see

Beyond static AI: MIT’s new framework lets models teach themselves

Google’s Gemini transparency cut leaves enterprise developers ‘debugging blind’

Meta’s new world model lets robots manipulate objects in environments they’ve never encountered before

AlphaOne gives AI developers a new dial to control LLM ‘thinking’ and boost performance

Stop guessing why your LLMs break: Anthropic’s new tool shows you exactly what goes wrong

QwenLong-L1 solves long-context reasoning challenge that stumps current LLMs

s3: The new RAG framework that trains search agents with minimal data

Why enterprise RAG systems fail: Google study introduces ‘sufficient context’ solution

Fine-tuning vs. in-context learning: New research guides better LLM customization for real-world tasks

Mem0’s scalable memory promises more reliable AI agents that remember context across lengthy conversations

The ‘era of experience’ will unleash self-learning AI agents across the web—here’s how to prepare

30 seconds vs. 3: The d1 reasoning framework that’s slashing AI response times

SWiRL: The business case for AI that thinks like your best problem-solvers

When AI reasoning goes wrong: Microsoft Research shows more tokens can mean more problems

DeepCoder delivers top coding performance in efficient 14B open model

DeepSeek unveils new technique for smarter, scalable AI reward models

Open Deep Search arrives to challenge Perplexity and ChatGPT Search

The tool integration problem that’s holding back enterprise AI (and how CoTools solves it)

Hands-on with Gemini 2.5 Pro: why it might be the most useful reasoning model yet

METASCALE improves LLM reasoning with adaptive strategies
