AI Edges Closer to Human-Level General Intelligence
January 05, 2025
H2O.ai recently announced that its h2oGPTe Agent has achieved the top spot on the General AI Assistants (GAIA) benchmark leaderboard, earning a notable score of 65%. As Jean-Pierre Joosting reported in a recent eeNews Europe article, this result outpaced competitors such as Google’s Langfun Agent (49%), Microsoft Research (38%), and Hugging Face (33%).
The GAIA benchmark evaluates how effectively artificial intelligence (AI) systems can solve complex, real-world tasks that require significant time, effort, and expertise, Joosting noted. It includes 300 test problems involving research, data analysis, document handling, and reasoning. Degree-holding humans achieve a benchmark score of 92%, often needing several days to complete all the challenges.
H2O.ai’s h2oGPTe Agent stood out for its robustness, accuracy, and efficiency, demonstrating readiness for enterprise applications that traditionally rely on skilled human assistants.
Sri Ambati, Founder and CEO of H2O.ai, remarked: “Today we are announcing that AI is only 30% away from matching human-level general intelligence on the GAIA benchmark. Open-ended questions in GAIA are a better measure of intelligence than MMLU, which relies on multiple choice. The entire Gen AI ecosystem was barely able to pass a tenth in accuracy on one of the toughest AGI benchmarks merely a year ago.”
Ambati highlighted the team’s innovations: “Makers at H2O.ai built h2oGPTe Agentic AI wielding the best models in the world for reasoning, multi-modal image, video, language understanding, code generation, and execution to ace the GAIA benchmark with a stunning 15% accuracy leap over the previous record set by researchers from Google Deepmind using the same Claude-3.5-Sonnet. The h2oGPTe Agent also beat Microsoft Research’s agent Magentic-1 that used OpenAI’s o1 model by 27%.”
He added: “Agentic AI is eating SaaS, and with h2oGPTe Agentic AI now being generally available, all our enterprise customers can solve a wide range of sophisticated business and research problems.”
Joosting reported that H2O.ai attributes its success to its commitment to simplicity and adaptability, which underpins key features of the h2oGPTe Agent:
- Advanced reasoning and planning to tackle intricate, real-world tasks.
- Multimodal comprehension across text, images, and audio for seamless contextual understanding.
- Integration with enterprise tools such as Python execution and DriverlessAI for predictive analytics and decision-making.
Enterprise h2oGPTe 1.6, featuring the Agent capability, is now available across public clouds, virtual private clouds, and on-premises deployments. Learn more at H2O.ai’s platform page.
Editor's note: Our colleague Jean-Pierre Joosting first reported on this news in EENews Europe, a publication in the Elektor network.