Introduction
On February 25, 2026, Anthropic announced the acquisition of the startup Vercept to strengthen its Computer Use technology for AI agents [Source: https://www.anthropic.com/news/acquires-vercept]. The acquisition marks a technical milestone for the industry as a whole, because its goal is to enable AI to operate real desktop applications the way humans do. In this article, we examine the technical background of the acquisition, the current state of Claude's computer use capabilities, and what lies ahead.
What Kind of Company Is Vercept?
Vercept is a startup founded on a clear thesis: "To realize AI capable of handling complex tasks, we must solve the difficult problems of perception and interaction." The company was co-founded by Kiana Ehsani, Luca Weihs, and Ross Girshick. Girshick in particular is well known for his object detection research at Facebook AI Research (now Meta AI), including Faster R-CNN and related work, and brings deep expertise in computer vision.
For several years, the Vercept team has worked on the question of "how AI systems can see and interact with the software that humans use every day." That question maps directly onto the hardest challenges Anthropic faces with computer use. According to Anthropic, Vercept plans to wind down its external-facing products within the coming weeks, with the entire team joining Anthropic [Source: https://www.anthropic.com/news/acquires-vercept].
Technical Background and Progress in Computer Use
In October 2024, Anthropic became the first in the industry to release a general-purpose computer use model. At the time, the company acknowledged that the capability was still experimental and that operations could sometimes be cumbersome and error-prone, while anticipating rapid improvement [Source: https://www.anthropic.com/news/claude-sonnet-4-6].
The benchmark used to measure this progress is OSWorld. OSWorld is an evaluation framework in which an AI model carries out hundreds of tasks on a simulated computer running real software such as Chrome, LibreOffice, and VS Code. There are no special APIs or dedicated connectors: the model must click a (virtual) mouse and type on a (virtual) keyboard to complete tasks, just as a human would.
As of Claude Sonnet 4.6 (released February 17, 2026), the OSWorld score has reached 72.5%. That is a dramatic improvement over the initial release score of under 15% at the end of 2024, achieved in roughly 16 months [Source: https://www.anthropic.com/news/acquires-vercept]. Sonnet 4.6 is approaching human-level capability on certain tasks, such as navigating complex spreadsheets and filling out web forms across multiple browser tabs.
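To make the "mouse and keyboard only" constraint concrete, here is a minimal sketch of an observe-act loop. It is not OSWorld's harness or Anthropic's API: the `ToyDesktop` environment, the `Action` kinds, the screen coordinates, and the `scripted_policy` function standing in for the model are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Action:
    kind: str          # "click", "type", or "done" (hypothetical action names)
    x: int = 0
    y: int = 0
    text: str = ""


@dataclass
class ToyDesktop:
    """Stand-in for a virtual machine: one text field and one Submit button."""
    focused: bool = False
    field_text: str = ""
    submitted: bool = False

    def screenshot(self) -> dict:
        # A real environment would return pixels; this toy returns a description.
        return {"focused": self.focused, "field_text": self.field_text,
                "submitted": self.submitted}

    def apply(self, action: Action) -> None:
        if action.kind == "click":
            if 0 <= action.x < 200 and 0 <= action.y < 30:      # text field area
                self.focused = True
            elif 0 <= action.x < 80 and 40 <= action.y < 70:    # Submit button area
                self.submitted = True
        elif action.kind == "type" and self.focused:
            self.field_text += action.text


def scripted_policy(obs: dict, goal: str) -> Action:
    """Placeholder for the model: picks the next action from the observation."""
    if obs["submitted"]:
        return Action("done")
    if not obs["focused"]:
        return Action("click", x=10, y=10)
    if obs["field_text"] != goal:
        return Action("type", text=goal)
    return Action("click", x=10, y=50)


def run_episode(env: ToyDesktop, goal: str, max_steps: int = 10) -> bool:
    for _ in range(max_steps):
        action = scripted_policy(env.screenshot(), goal)
        if action.kind == "done":
            break
        env.apply(action)
    # Judge success from the final state of the environment, not from the actions.
    return env.submitted and env.field_text == goal
```

Calling `run_episode(ToyDesktop(), "hello")` drives the toy form through click, type, and submit. The point of the design is that the policy only ever sees an observation and only ever emits low-level input events, which is the property the benchmark description emphasizes.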
It should also be noted that OSWorld was upgraded to OSWorld-Verified in July 2025, with revisions to task quality, evaluation criteria, and infrastructure. Scores from Sonnet 4.5 and earlier were measured using the older version of OSWorld, so comparisons should be made with caution [Source: https://www.anthropic.com/news/claude-sonnet-4-6].
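The headline figures can be checked with a few lines of arithmetic. The sketch below assumes the first day of October 2024 as the starting point (the article gives only the month) and treats 15% as an upper bound on the initial score, so the computed gain is a lower bound. Because the two scores may come from different OSWorld versions, the result is indicative rather than a like-for-like comparison.

```python
from datetime import date

first_release = date(2024, 10, 1)   # October 2024; the day of month is an assumption
sonnet_46 = date(2026, 2, 17)       # Sonnet 4.6 release date cited in the article

# Whole calendar months between the two releases.
months_elapsed = (sonnet_46.year - first_release.year) * 12 + (
    sonnet_46.month - first_release.month
)

initial_score_upper_bound = 15.0    # "under 15%", so this is a ceiling, not the score
latest_score = 72.5

min_gain_points = latest_score - initial_score_upper_bound
min_ratio = latest_score / initial_score_upper_bound

print(months_elapsed)               # 16
print(min_gain_points)              # 57.5
print(round(min_ratio, 2))          # 4.83
```

In other words, the reported score rose by at least 57.5 percentage points, or a factor of at least about 4.8, over 16 months.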
Real-World Problems That Computer Use Solves
Computer use is particularly powerful when it comes to handling "dedicated systems and specialized tools that were built before APIs existed." It enables access to the legacy systems that many companies rely on, as well as software lacking modern API interfaces, without the need to build dedicated connectors.
Jamie Cuffe, CEO of insurtech company Pace, had this to say: "Claude Sonnet 4.6 achieved 94% on our insurance benchmark, making it the most accurate model for computer use. For business workflows like submission intake and first notice of loss, this level of accuracy is mission-critical."
Will Harvey, co-founder of Convey, also praised the model: "I was impressed by the precision of complex computer use. It clearly outperforms everything we validated in our evaluation tests." These testimonials suggest that computer use is entering a practical stage for real enterprise operations [Source: https://www.anthropic.com/news/claude-sonnet-4-6].
Security Challenges: Prompt Injection
Computer use also presents important security challenges. A key concern is prompt injection attacks, in which malicious actors embed hidden instructions on websites in an attempt to hijack the model.
Anthropic reports that Sonnet 4.6 shows significantly improved resistance to such attacks compared to its predecessor Sonnet 4.5, reaching a level on par with Opus 4.6. The API documentation also gives developers specific guidelines for defending against prompt injection. The Vercept team's expertise in perception and interaction research is expected to contribute to resolving these safety challenges as well [Source: https://www.anthropic.com/news/claude-sonnet-4-6].
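Model-level resistance is usually paired with application-level safeguards. One common pattern, sketched below, is to pause for human approval before risky steps. This is an illustration only, not the guidance from Anthropic's API documentation: the action kinds, the `ProposedAction` type, and the `derived_from_untrusted_content` flag (which a real agent harness would have to track itself) are all assumptions.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical action kinds that should never run without a human in the loop.
SENSITIVE_KINDS = {"submit_payment", "send_email", "delete_file"}


@dataclass(frozen=True)
class ProposedAction:
    kind: str
    target: str
    # True when the step was motivated by text read from the screen or a web page
    # rather than by the user's original request (supplied by the caller here).
    derived_from_untrusted_content: bool


def requires_confirmation(action: ProposedAction, allowed_domains: set[str]) -> bool:
    """Return True if the action should pause for explicit user approval."""
    if action.kind in SENSITIVE_KINDS:
        return True
    if action.kind == "navigate":
        # hostname is None for targets without a scheme, which also forces a pause.
        if urlparse(action.target).hostname not in allowed_domains:
            return True
    return action.derived_from_untrusted_content
```

For example, `requires_confirmation(ProposedAction("navigate", "https://evil.test/x", False), {"example.com"})` returns `True`, while an ordinary click that stems from the user's own request passes through. A gate like this does not detect injected instructions; it only limits what a hijacked step can do without a person noticing.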
Anthropic's Acquisition Strategy: Balancing Capability and Safety
The acquisition of Vercept is part of a recent series of strategic acquisitions by Anthropic. Most recently, the company acquired the development team behind the JavaScript runtime "Bun," leveraging that talent to strengthen the foundation of Claude Code.
Anthropic's criteria for bringing in external teams are clear: "technical ambitions must align, they must be able to contribute to capability improvements, and they must be committed to building AI on the basis of safety and rigor." In Vercept's case, its deep expertise in the problem of enabling AI to perceive and operate real-world software aligned precisely with the technology Anthropic needed [Source: https://www.anthropic.com/news/acquires-vercept].
Looking Ahead
While Sonnet 4.6 has shown major progress in computer use, Anthropic candidly acknowledges that it "has not yet reached the level of the most skilled human computer users." However, the pace of progress has been remarkable, and the company notes that "even more capable models are within reach."
With the Vercept team on board, the following directions are anticipated for next-generation Computer Use capabilities:
- Improved perception accuracy: Better recognition of UI elements on screen
- Multi-application coordination: Automation of complex workflows spanning multiple apps
- Stronger prompt injection resistance: More reliable security mechanisms
- Error recovery capabilities: Autonomous problem-solving in unexpected situations
A world where AI agents operate PCs on behalf of humans is no longer a distant prospect. The maturation of computer use has the potential to redefine the very concept of software automation.
Conclusion
Anthropic's acquisition of Vercept is more than just a talent acquisition. It is a serious technical and organizational investment toward a future in which AI operates seamlessly within the same digital environments as humans. The rapid progress from under 15% to 72.5% on OSWorld, combined with the addition of world-class researchers like Ross Girshick, strongly suggests that Claude's computer use capabilities will continue to accelerate. For LLM and AI engineers, this technology is well worth watching closely, both for leveraging the computer use API and for designing with safety in mind.
Category: LLM | Tags: Anthropic, Computer Use, Claude, AI Agents, OSWorld, Vercept, LLM