Orby’s Breakthrough Technology Achieves 74.9% Success Rate on AI Agent Benchmarks
MOUNTAIN VIEW, Calif., Feb. 25, 2025 /PRNewswire/ -- Orby AI, a leading AI Agent and Automation Discovery platform company, today announced the launch of its Generic Agent Framework and Self-Adaptive Interface Learning (SAIL) technique, setting new industry benchmarks for AI agent performance. With this innovation, Orby AI’s agents achieve state-of-the-art (SOTA) results on industry-standard evaluations, including MiniWoB and WebArena, surpassing competitors in accuracy, adaptability, and success rates. This follows Orby’s recent record-setting 89.4% accuracy on the ScreenSpot Benchmark, where its UGround model outperformed models from tech giants including Google DeepMind, OpenAI, and Anthropic. Orby’s AI technology combines strategic planning with precise execution to navigate complex websites and applications with unprecedented reliability.
In independent benchmark testing, Orby’s framework achieved a 74.9% success rate with Claude-3.5-Sonnet on the MiniWoB benchmark¹ of 125 web tasks, outperforming ServiceNow’s 69.8% success rate under the same evaluation protocol. On the more challenging WebArena benchmark², which comprises 812 diverse tasks in realistic web environments, Orby AI secured a 37.5% success rate, outperforming models from ServiceNow and closed-source competitors.
These breakthrough results were enabled by Orby’s newly developed Self-Adaptive Interface Learning (SAIL) technique, a fundamental advancement in how AI agents interact with web interfaces. SAIL enables AI agents to automatically learn and adapt to new websites without human intervention or manual documentation. Where earlier approaches require extensive human-curated instructions, SAIL allows Orby’s Large Action Model, ActIO, to understand website interfaces at scale, dramatically reducing implementation time and costs while maintaining consistent performance across unfamiliar web environments.
Unlike traditional automation solutions that rely on handcrafted instructions or domain-specific tuning, Orby AI’s framework is entirely generic, allowing businesses to select the best foundation models for their needs without sacrificing performance. The system’s hierarchical design, which splits tasks between a strategic “planner agent” and an execution-focused “grounder agent”, enables seamless navigation of complex user interfaces and superior task completion.
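As a rough illustration of the planner/grounder split described above, the following Python sketch separates a "planner" that breaks a task into ordered steps from a "grounder" that maps each step to a concrete UI action. Every name here (Step, Action, plan, ground, run) and the simple keyword matching are hypothetical stand-ins chosen for illustration, not Orby AI's implementation; a production system would presumably call a foundation model in both roles.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Step:
    """A high-level instruction produced by the planner (hypothetical type)."""
    instruction: str


@dataclass
class Action:
    """A concrete UI action produced by the grounder (hypothetical type)."""
    kind: str       # "click" or "type"
    target: str     # identifier of the UI element to act on
    text: str = ""  # text to enter, used only for "type" actions


def plan(task: str) -> List[Step]:
    """Toy planner: split a task description into ordered steps.

    Assumption: steps are separated by the word "then". A real planner
    would ask a foundation model to produce the steps instead.
    """
    return [Step(part.strip()) for part in task.split(" then ") if part.strip()]


def ground(step: Step, elements: Dict[str, str]) -> Action:
    """Toy grounder: pick the first element whose label appears in the step.

    `elements` maps element identifiers to visible labels. A real grounder
    would locate the element on the rendered page instead.
    """
    lowered = step.instruction.lower()
    for element_id, label in elements.items():
        if label.lower() in lowered:
            if lowered.startswith("type"):
                # Take the text between the first pair of double quotes, if any.
                parts = step.instruction.split('"')
                text = parts[1] if len(parts) >= 3 else ""
                return Action("type", element_id, text)
            return Action("click", element_id)
    raise LookupError(f"no element matches step: {step.instruction!r}")


def run(task: str, elements: Dict[str, str]) -> List[Action]:
    """Plan first, then ground each step in order."""
    return [ground(step, elements) for step in plan(task)]


if __name__ == "__main__":
    page = {"search-box": "search", "submit-btn": "submit"}
    for action in run('type "laptop" in the search field then click submit', page):
        print(action)
```

Keeping the two roles behind separate functions means either one could be swapped for a different model without touching the other, which is the kind of flexibility the hierarchical design is said to provide.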
“Our latest advancements push the boundaries of what AI agents can achieve in real-world applications,” said Will Lu, Co-Founder and CTO of Orby AI. “By leveraging Self-Adaptive Interface Learning (SAIL) and our Large Action Model (LAM), ActIO, our agents can independently learn and adapt to new environments—without the need for human intervention or site-specific customizations. This is a game-changer for enterprises looking to automate complex workflows at scale.”
This development comes at a crucial time when enterprises are seeking more flexible and efficient automation solutions.
“We believe enterprise automation should be generalizable, scalable, and efficient—not bound by rigid, pre-programmed rules,” said Bella Liu, Co-Founder and CEO of Orby AI. “This latest milestone underscores Orby AI’s commitment to developing intelligent agents that continuously learn, adapt, and deliver tangible business value.”
With these breakthroughs, Orby AI continues to pave the way for next-generation enterprise automation, equipping businesses with versatile AI agents that thrive in complex, real-world environments.
For detailed performance results, visit Orby.
About Orby AI:
Founded in 2022 by industry veterans with deep expertise in AI and automation, Orby AI delivers an AI agent platform that automates complex enterprise processes and enhances efficiency at scale. Powered by the industry’s first Large Action Model (LAM) and proprietary AI Agent platform, Orby empowers enterprise teams to unlock their full potential and achieve greater outcomes. By automating complex processes that require reasoning and human judgment, Orby AI allows teams to focus on higher-value, creative, and strategic work.
For more information, visit Orby.ai.
¹ MiniWoB (Mini World of Bits) is an established benchmark of 125 web tasks designed to evaluate GUI agent interactions in a simplified web setting, tested using ServiceNow’s BrowserGym evaluation protocol.
² The WebArena benchmark evaluates autonomous agents in realistic web environments across 812 diverse tasks. These tasks involve navigating and interacting with fully functional websites, many of which are uncommon to general web users.
SOURCE Orby AI

