
Google has launched a new AI model called Gemini 2.5 Computer Use, now available in public preview through the Gemini API on Google AI Studio and Vertex AI.
The model expands on Gemini 2.5 Pro’s visual understanding and reasoning abilities, allowing AI agents to interact with websites much like a human user.
It can perform browser-based actions such as clicking, typing, scrolling, hovering, opening menus, and navigating URLs.
According to Google, it outperforms competing tools on benchmarks such as Online-Mind2Web, WebVoyager, and AndroidWorld, while maintaining lower latency.

Unlike AI systems that interact with software through structured APIs, Gemini 2.5 Computer Use works on the graphical interface itself by analyzing screenshots of web pages.
It receives a task prompt, a screenshot, and a record of previous actions, then responds with a specific UI action. The client executes the action, takes a new screenshot, and sends it back to the model for the next step, repeating the cycle until the task is finished.
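
To make that cycle concrete, here is a minimal Python sketch of the observe, act, and re-observe loop. It does not call the Gemini API: `FakeBrowser`, `scripted_model`, `Action`, `Observation`, and the `"done"` stop signal are all hypothetical stand-ins invented for illustration, and a real client would replace them with an actual browser driver and a model request.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    """One UI action proposed by the model (names here are illustrative)."""
    name: str                      # e.g. "click", "type"; "done" ends the loop
    args: dict = field(default_factory=dict)


@dataclass
class Observation:
    """What the client sends on each turn: task, screenshot, prior actions."""
    task: str
    screenshot: bytes
    history: List[Action]


class FakeBrowser:
    """Stand-in for a real browser driver; it only records executed actions."""

    def __init__(self) -> None:
        self.executed: List[Action] = []

    def screenshot(self) -> bytes:
        # A real client would capture pixels; this returns a readable marker.
        return f"screen-after-{len(self.executed)}-actions".encode()

    def execute(self, action: Action) -> None:
        self.executed.append(action)


def run_agent(
    task: str,
    browser: FakeBrowser,
    propose_action: Callable[[Observation], Action],
    max_steps: int = 10,
) -> List[Action]:
    """Repeat: screenshot -> ask model -> execute, until 'done' or max_steps."""
    history: List[Action] = []
    for _ in range(max_steps):
        obs = Observation(task, browser.screenshot(), list(history))
        action = propose_action(obs)
        if action.name == "done":
            break
        browser.execute(action)
        history.append(action)
    return history


def scripted_model(obs: Observation) -> Action:
    """Hypothetical model stub: replays a fixed script based on history length."""
    script = [
        Action("click", {"x": 120, "y": 80}),
        Action("type", {"text": "hello"}),
    ]
    if len(obs.history) < len(script):
        return script[len(obs.history)]
    return Action("done")


if __name__ == "__main__":
    browser = FakeBrowser()
    for step in run_agent("fill the search box", browser, scripted_model):
        print(step.name, step.args)
```

The `max_steps` cap is a design choice of this sketch rather than something the article describes; it simply keeps a misbehaving model from looping forever.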

Google showed demos of the model sorting sticky notes on a digital whiteboard and transferring information between websites. The model currently supports 13 types of actions and is optimized for web browsers. It is not yet designed for desktop-level tasks, though it has shown early promise in mobile tests.
Google teams are already using Gemini 2.5 Computer Use for internal automation and testing, while external developers in early access have applied it to workflow automation and digital assistants.
Developers can try the model through Google AI Studio, Vertex AI, or the Browserbase demo environment.
