Google, OpenAI, DeepSeek, et al. are nowhere near achieving AGI (Artificial General Intelligence), according to a new benchmark.
The Arc Prize Foundation, a nonprofit that measures AGI progress, has a new benchmark that is stumping the leading AI models. The test, called ARC-AGI-2, is the second edition of the ARC-AGI benchmark, which tests models on general intelligence by challenging them to solve visual puzzles using pattern recognition, context clues, and reasoning.
According to the ARC-AGI leaderboard, OpenAI's most advanced model, o3-low, scored 4 percent. Google's Gemini 2.0 Flash and DeepSeek R1 both scored 1.3 percent. Anthropic's most advanced model, Claude 3.7 with an 8K token limit (which refers to the number of tokens used to process an answer), scored 0.9 percent.
The question of how and when AGI will be achieved remains as heated as ever, with various factions bickering about the timeline or whether it's even possible. Anthropic CEO Dario Amodei said it could take as little as two to three years, and OpenAI CEO Sam Altman said "it's achievable with current hardware." But experts like Gary Marcus and Yann LeCun say the technology isn't there yet. And it doesn't take an expert to see how fueling AGI hype is advantageous to AI companies seeking major investments.
The ARC-AGI benchmark is designed to challenge AI models beyond specialized intelligence by avoiding the memorization trap, in which a model spews out PhD-level responses without understanding what they mean. Instead, it focuses on puzzles that are relatively easy for humans to solve because of our innate ability to take in new information and make inferences, thus revealing gaps that can't be resolved by simply feeding AI models more data.
"Intelligence requires the ability to generalize from limited experience and apply knowledge in new, unexpected situations. AI systems are already superhuman in many specific domains (e.g., playing Go and image recognition)," read the announcement.
"However, these are narrow, specialized capabilities. The 'human-ai gap' reveals what's missing for general intelligence - highly efficiently acquiring new skills."
To get a sense of AI models' current limitations, you can take the ARC-AGI test for yourself. And you might be surprised by its simplicity. There's some critical thinking involved, but the ARC-AGI test wouldn't be out of place next to the New York Times crossword puzzle, Wordle, or any of the other popular brain teasers. It's challenging but not impossible, and the answer is there in the puzzle's logic, which is something the human brain has evolved to interpret.
OpenAI's o3-low model scored 75.7 percent on the first edition of ARC-AGI. By comparison, its 4 percent score on the second edition shows how much harder the new test is, and how much work remains before AI reaches human-level intelligence.