ZDNET’s key takeaways
- Top AI agents fail at freelance work, according to a new study.
- The study assessed Gemini 2.5 Pro, GPT-5, and other agents.
- Roughly 43% of the US workforce did freelance work in 2025.
If you’re a freelance worker and you’ve been stressed about the prospect of losing your job to AI, you can rest easy — at least for the time being.
According to a new study conducted by Scale AI and the Center for AI Safety, the most cutting-edge AI agents can currently automate less than 3% of the tasks required of the average independent contractor, “failing to complete most projects at a level that would be accepted as commissioned work in a realistic freelancing environment,” the authors wrote.
Also: Want better ChatGPT responses? Try this surprising trick, researchers say
The Remote Labor Index
The study, posted to the preprint server arXiv on Thursday and yet to be peer-reviewed, establishes a testing benchmark for AI systems, which it calls the Remote Labor Index (RLI).
The benchmark serves as a quantitative framework for measuring the ability of AI systems to perform economically valuable work at a time when some tech leaders have been making sweeping claims about the disruptive impact AI will have on the labor market. Anthropic CEO Dario Amodei said in May, for example, that the technology could replace up to half of all entry-level white-collar jobs within the next five years.
As the name suggests, the RLI is specifically designed to assess AI’s potential to automate remote, freelance work. As anyone who has ever spent a stint as a freelancer can attest, this is a mode of work that requires a high degree of self-sufficiency and organization, among other skills. It has also become quite popular: A recent survey found that just shy of 73 million Americans performed freelance work in 2025, representing nearly 43% of the total US workforce as of August.
AI and economically valuable labor
The new study assessed the performance of six industry-leading AI agents, including Google’s Gemini 2.5 Pro, OpenAI’s GPT-5, and Anthropic’s Sonnet 4.5.
Unlike more limited chatbots, agents can interact with digital tools (such as a web browser) and perform complex, multi-step tasks. Tech developers widely position them as a crucial evolutionary step toward artificial general intelligence (AGI).
Also: AI is more likely to transform your job than replace it, Indeed finds
AGI is an imprecisely defined term: Experts debate what it would mean for a computer to have true “general intelligence,” and whether such a feat is even possible. However, one of the more common definitions of AGI that gets thrown around in tech circles is a system that can match or outperform humans on any economically valuable task.
If we take that definition as a starting point, the new RLI study suggests we’re likely a long way away from building true AGI. Each of the six models tested in the study is “far from capable of autonomously performing the diverse demands of remote labor,” according to the authors.
The models were evaluated across 23 categories of freelance work, including graphic design, product design, computer-aided design (CAD), and game development. Those categories and their attendant skill requirements were identified by the researchers using freelance platforms like Upwork, “grounding the benchmark in economic value and capturing the diversity and complexity of real remote labor markets.”
Also: The best free AI courses and certificates for upskilling in 2025 – and I’ve tried them all
Models were fed a project brief along with any files needed to complete the work; the researchers then manually assessed each AI deliverable against deliverables for the same project created by human freelancers. The goal, according to the researchers, was to find out “whether an AI deliverable completes the project at least as well as the human gold standard — specifically, whether the deliverable would be accepted by a reasonable client as the commissioned work.”
The agents were then compared using an Elo metric. Manus scored the highest, with an automation rate of 2.5%, followed by Grok 4 and Claude Sonnet 4.5, both of which had an automation rate of 2.1%.
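For readers curious about the mechanics, here is a minimal sketch of how Elo-style scoring from pairwise comparisons generally works, assuming each AI deliverable is judged against a human baseline and the winner of each judgment feeds a standard Elo update. The function names, starting ratings, and outcomes below are illustrative, not the study’s actual implementation.

```python
# Minimal Elo sketch: rate an AI agent against a human baseline from a series
# of pairwise "which deliverable would a client accept?" judgments.
# All names, the K-factor, and the outcomes are hypothetical, for illustration only.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update_elo(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one comparison."""
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b

# Hypothetical usage: start the agent and the human baseline at 1000,
# then replay reviewer judgments (True = the AI deliverable was preferred).
ratings = {"agent_x": 1000.0, "human_baseline": 1000.0}
judgments = [False, False, True, False]  # made-up outcomes
for ai_won in judgments:
    ratings["agent_x"], ratings["human_baseline"] = update_elo(
        ratings["agent_x"], ratings["human_baseline"], ai_won
    )
print(ratings)  # the human baseline ends up rated higher when it wins most comparisons
```

Because most AI deliverables lose these head-to-head judgments, the agents end up with low Elo ratings relative to the human baseline, which is consistent with the low automation rates the study reports.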
[Image: Remote Labor Index: Measuring AI automation of remote work. Screenshot by ZDNET]
The takeaway
Popular narratives around AI automation can make human labor feel more unidimensional than it is in reality. As the AI industry strives to develop systems that can match or surpass the human brain, we increasingly appreciate the brain’s remarkable flexibility, dynamism, and complexity.
Some jobs are more amenable to automation than others, but most require an amalgamation of technical and interpersonal skills, and therefore are more complicated than the AI systems of today can handle.
Also: These jobs face the highest risk of AI takeover, according to Microsoft
Even today’s most advanced AI systems, which are designed to be general-purpose agents, can perform only a narrow subset of the tasks required of most human workers. As the authors of the new RLI study wrote in their report, the fact that industry-leading agents could automate less than 3% of the tasks required of the average freelancer reveals “a stark gap” between the promise of AI and its actual, demonstrable capabilities. That’s especially true considering that the RLI doesn’t capture many aspects of most freelancers’ day-to-day work lives, such as communicating and negotiating with clients.
Then again, these are early days. The capabilities of agents are expanding rapidly, and the largest tech developers are investing billions in training new, more advanced models. It’s possible that in five or ten years, companies will be hiring AI freelancers. But for now, contractors don’t seem to have any real reason to fear the AI job reaper.
