Gemini 3 Deep Think Performance on ARC-AGI Benchmarks

Gemini 3 Deep Think is getting a significant upgrade. We’ve refined Deep Think in close partnership with scientists and researchers to tackle tough, real-world challenges. And it’s pushing the frontier across the most challenging benchmarks, achieving an unprecedented 84.6% on ARC-AGI-2. It also sets a new standard on Humanity’s Last Exam, scoring 48.4% without tools.
