Anthropic Computer Use - A Kneejerk Review

Anthropic’s new ‘Computer Use’ tool is set to radically empower systems builders, hobbyists, and individual users.

Tim Shier
|
23/10/24

Yesterday, a fundamentally new broad access toolset was released by Anthropic. Named "Computer Use", this average-sounding name is set to radically empower systems builders, hobbyists, and individual end-users alike.

In short, “Computer Use” provides the mechanism for a user to grant the AI access to control their computer (or VM/container, for safety's sake). The user then sends the AI prompts and that AI then replies in a combination of text chat and AI control inputs (keyboard, mouse) on the computer.

The Mind Boggles with the Applications
All the way from assisting people with disabilities all the way through to accelerating/automating many of the boring jobs which is an inevitable part of work.

How this is done is itself not all that complicated to understand and in hindsight it’s something that we at CerebralCircuit could build and in some respects is similar to some other applications we’ve implemented for clients.

To understand it better, a workflow as follows - using any of the other AI models would get this job done:

  • Create a RAG AI model capable of controlling the user’s desktop
  • Create a middleware (much like Anthropic have done) which has the following functions which is basically all the ways that humans interact with their computer as it stands)
    • Taking screenshots
    • Moving the mouse to X,Y coordinates
    • Clicking the mouse at X,Y coordinate
    • Keyboard input

(Anthropic have taken a slightly more complicated approach with a bit more control but the principles are much the same).

Using this alone, one could conceivably fully control ones own computer with an AI using existing tech (before Computer Use).

The fact that Anthropic is pushing and supporting this approach is in itself an interesting indicator of where they think AI usage is going. Honestly. I think they’re right. Being able to generalise AI usage across any/all programs will fundamentally change how deeply specific AIs are able to assist. To my mind, being able to have an AI that is specific to a particular program and is able to operate that program extremely well, is akin to the Matrix where one could be loaded with new skills - “I (we) know kung fu”.

A side note on this, there are rumours of Open AI bringing out something similar - although likely far better if it leverages their Real-Time API which was released on the 17th Oct 2024.

This new toolset obviously has many implications - both short and long-term, broad and narrow. For now, I wanted to take a quick look at The Good, The Bad and The Ugly on where this might be heading in the mid-term.

The Good:
I almost want to label this "The Great" - the upsides are enormous. As mentioned, leveraging this to assist people with disabilities, the elderly and generally to be able to ask an AI to help/automate any user issue/task that they’re encountering will be radical. It has the potential to level all playing fields and empower every one of us to be expert level at all the programs on our computers.

The Bad:
It’s conceivable that, in the near future, there will be an enormous amount of “desktop-enabled” AI assistants available for download and install directly onto local user machines. On the surface, this sounds amazing but there is a subtle risk sitting in this: How am I sure of the privilege controls and guard rails to ensure that this AI doesn’t go rogue and do damage (a point not missed in the Anthropic Github repo. This isn’t to say the AI is being intentionally malicious, just that it might make mistakes with catastrophic consequences for the user - like deleting explorer.exe in windows 95, by accident.

This problem is non-trivial in the wild. There really is no means for a user to understand the training of a compiled AI for any desktop-enabled agents. That said, some regulated marketplaces like Google Play (etc) are for mobile apps, which has a formal manifest for access rights might be an answer, although still tricky to enforce reliably (and quickly enough before damage is done).

The Ugly:
Imagine intelligent malware which could occur through this sort of functionality. If users inadvertently installed software/malware which masqueraded as normal software, it’d be trivial to have it also have an LLM operating in the background with a brief like: “In the background, using CLI, explore this computer, add any interesting files to a zip, when done, create a reverse shell to badActor.com and share zip file. Sleep for a month, repeat.”

This could mean a whole new world of intelligent fire-and-forget malware. Thankfully, for now, it all requires 3rd party API calls so it would be detectable via Wireshark etc but in a deeper future, where the LLM is minified to purpose, it could run locally at even an extremely low token/minute and be extremely hazardous.