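Google's Computer Use model is here!
Early this morning, Google DeepMind announced Gemini 2.5 Computer Use, a computer-use model built on Gemini 2.5.
Given that Google released Chrome DevTools (MCP) only a few days ago, the arrival of Gemini 2.5 Computer Use is not especially surprising. In short, like OpenAI's Computer-Using Agent (CUA), DeepMind's model lets an AI directly control the user's browser: building on visual understanding and reasoning, it can carry out actions such as clicking, scrolling, and typing in the browser on the user's behalf.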
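To make the idea of "actions in the browser" concrete, here is a minimal Python sketch of how a list of click, scroll, and type actions might be applied to a browser. It is an illustration only and does not call the Gemini API: the `FakeBrowser` class, the `run_actions` function, the action dictionary format, and the element names are all hypothetical, and the action list is hard-coded where a real system would receive it from the model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FakeBrowser:
    """Hypothetical in-memory stand-in for a real browser session."""
    scroll_y: int = 0
    focused: Optional[str] = None
    fields: Dict[str, str] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def click(self, target: str) -> None:
        # Clicking an element gives it focus.
        self.focused = target
        self.log.append(f"click {target}")

    def scroll(self, dy: int) -> None:
        # Scroll position never goes above the top of the page.
        self.scroll_y = max(0, self.scroll_y + dy)
        self.log.append(f"scroll {dy}")

    def type_text(self, text: str) -> None:
        # Typed text is appended to whichever element has focus.
        if self.focused is None:
            raise RuntimeError("no element is focused")
        self.fields[self.focused] = self.fields.get(self.focused, "") + text
        self.log.append(f"type {text!r}")


def run_actions(browser: FakeBrowser, actions: List[dict]) -> FakeBrowser:
    """Apply each action in order; the dict format here is an assumption."""
    for action in actions:
        kind = action["type"]
        if kind == "click":
            browser.click(action["target"])
        elif kind == "scroll":
            browser.scroll(action["dy"])
        elif kind == "type":
            browser.type_text(action["text"])
        else:
            raise ValueError(f"unsupported action type: {kind}")
    return browser


# Hard-coded actions standing in for what a model might propose.
browser = run_actions(FakeBrowser(), [
    {"type": "scroll", "dy": 300},
    {"type": "click", "target": "guest-name"},
    {"type": "type", "text": "Rex"},
])
```

In a real agent, each step would also send a fresh screenshot back to the model so that it can decide the next action; that feedback loop is omitted here.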
First, let's look at two official demos.
Prompt: From https://tinyurl.com/pet-care-signup , get all details for any pet with a California residency and add them as a guest in my spa CRM at https://pet-luxe-spa.web.app/. Then, set up a follow up visit appointment with the specialist Anima Lavar for October 10th anytime after 8am. The reason for the visit is the same as their requested treatment.
Prompt: My art club brainstormed tasks ahead of our fair. The board is chaotic and I need your help organizing the tasks into some categories I created. Go to sticky-note-jam.web.app and ensure notes are clearly in the right sections. Drag them there if not.
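As the demos show, whether the task is gathering information from the web and acting on it or sorting out messy notes, Gemini 2.5 Computer Use completes it very accurately and quite quickly.
On the relevant benchmarks, Gemini 2.5 Computer Use also reaches SOTA-level performance: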