On the usefulness of AI agents

Agentic AI is having its moment (or its decade, as some have put it). I have been researching LLM-powered agents for the last two years, but research (involving publicly funded projects and academic peer review) is slow, and cannot keep up with the breakneck speed of the development and deployment going on in the tech industry as a whole. Especially not when AI tools are used to assist in programming new tools and frameworks. Small-scale experiments are quickly made obsolete by both new frontier models and ground-breaking tools from large companies. It is easier than ever to experiment with state-of-the-art AI models (it is simply a matter of connecting to an API), but having the time to conduct robust experiments while staying relevant is challenging.
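To illustrate how low that barrier is, here is a minimal sketch of what «connecting to an API» amounts to, assuming an OpenAI-style chat endpoint; the URL, model name, and `API_KEY` environment variable are placeholders, not a real provider:

```python
# Minimal sketch of calling a hosted LLM through an OpenAI-style chat API.
# The endpoint and model name below are placeholders; any provider with a
# compatible API would follow the same shape.
import json
import os
import urllib.request

API_URL = "https://api.example.com/v1/chat/completions"  # placeholder endpoint
MODEL = "frontier-model-latest"  # placeholder model name

def build_request(prompt: str) -> dict:
    """Assemble the JSON payload used by OpenAI-style chat APIs."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }

def ask(prompt: str) -> str:
    """Send one prompt and return the model's reply text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_request(prompt)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

A dozen lines of standard-library code is the whole entry fee; everything else is prompt and payload.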

Even though the speed of development is extreme, the feeling of urgency is dampened by the universal access to the technology. Most improvements have so far found their way to open-weights models and open-source software. No moat is defended for long, and competitors regularly overtake each other. A lot of research is made public, and even closely kept secrets can be leaked through very simple mistakes. I have little fear of being «left behind» in the AI race, neither as a citizen nor as a researcher, because the technology is so accessible. Business managers and CEOs apparently see things differently, judging by the persistent urge to adopt AI as fast as possible, without properly assessing why, how, and at what cost. These tools are still brand new, and there is a huge diversity of advice on how to use them effectively. I'm particularly intrigued by how divergent viewpoints can be on the perceived usefulness of AI agents, which is what prompted me to write this post.

Absence of agents

There are many things I find fascinating about AI agents, but the most interesting thing of all is how little use they are to me in my free time. At work, they are in one sense essential, because I make a living out of studying them. As part of that, I experiment with coding agents to understand how they can and will affect software engineering; there's no doubt that computer programming is changed forever. But when I close my work laptop, I don't feel any urge to ask an AI agent to do something – anything – for me.

I wonder whether my perceived lack of need for AI agents acting on my behalf is an expression of being in a privileged position, a consequence of what I focus on in life, or a sign that they are actually not as useful as they are advertised to be. In many ways I'm obviously privileged – having access to free education, extensive social services, free healthcare, and freedom from censorship makes it much easier to live a stable, safe life with limited need to fight powerful institutions to uphold my rights. I mention this because a lot of anecdotal evidence indicates that LLMs have helped people navigate overwhelming bureaucratic processes. Since I don't face such problems (at the moment), I cannot speak much to the usefulness of AI agents in those situations, and I'm clearly privileged because of that. I will, however, note that there's usually a difference between the benefits on an individual scale and the consequences on a collective scale.

Regarding more everyday matters, I follow a philosophy of digital minimalism, which has the natural effect that there is a minimal amount of things I want to achieve with digital devices. This is perhaps one of the main reasons why AI agents feel superfluous to me. They are (still) confined to the digital sphere, and given that I have little I want to accomplish there, I obviously won't feel the need for them. Additionally, as outlined in my post on outsourcing thinking, I hold the attitude that certain mundane activities are healthy for us to do, and I am therefore less inclined to automate processes. I observe many people spending a lot of time and money on these tools, but it almost invariably seems to increase the amount of time spent on the computer, rather than reduce it.

Productivity and value

As mentioned above, the reason for examining and presenting my own standpoint is to contribute to the discussion around the value of such agents. I notice that influential people like Simon Willison comment on the obvious demand and value that AI agents bring. The popularity of AI tools like OpenClaw indicates high demand, but I'm not sure we can judge their value by popularity alone. There are enough examples of things that are both popular and detrimental.

Earlier this year on Bluesky, Ed Zitron expressed a strong view on the limited usefulness of AI, questioning, among other things, whether all AI can do is make «[s]ome engineers do some stuff faster». One responder observed that Zitron was simply describing a productivity increase without recognising it. The missing link, in my interpretation, is that between the lines Zitron is saying that a simple speedup in developer productivity does not necessarily lead to increased value, which cannot be measured in lines of code or in speed of development. There is undoubtedly a huge difference in how we view what «value» actually means, which is particularly noticeable when comparing, for example, European and American standpoints. The latter is typically connected to the rather one-dimensional aspects of productivity and economic growth, but those are not necessarily what is needed to improve our quality of life.

Recent essays on AI, such as Dario Amodei's The Adolescence of Technology and Matt Shumer's Something Big Is Happening, contribute to the hype. Shumer urges everyone to use AI and figure out how to use it well – for example by spending a certain amount of time using AI each day. The true benefits of AI are still unclear to me, but I agree on the importance of being aware of what the technology is and what it can do. However, we are already reaching a point where Shumer's advice should be inverted. I make a point of spending time each day writing and reflecting without any other input – not from AI, not from search engines or the internet, just writing with pen on paper. I interact a lot with LLMs and AI agents in my research, and I try out new capabilities of the latest models and tools. But I always keep some time of the workday reserved for my own reflection and development. Even before we had LLMs, it was far too easy to run an online search and find someone else's thoughts and solutions rather than making an independent effort.

AI agents in AI research

Almost everything I do at work is digital, and the potential for using LLMs and AI agents is therefore technically huge. My experience with various use cases has been mixed, however. Asking for feedback on prose is not something that feels beneficial to me, at least in the long run. It seems like the perfect task for a language model: Tell me whether the text is well-structured, whether it makes sense, whether the arguments are weak, what can be better, and so on. An LLM may genuinely help me improve the text, and I'm not claiming to write better prose than a mathematical model trained on a global digital library. However, in my testing of LLMs for writing (e.g., improving a draft for a report), the LLM often sends me down a lane where I end up with something I'm dissatisfied with, something I cannot stand behind, something I would not have written if I had not been persuaded by a «helpful assistant» to adapt my text. Everyone can ask an LLM to write about something, but I'm paid and trusted to write things that I, based on my knowledge and experience, judge as important and true.

Programming and software development is by far the most promising use case I have tested so far. I have been experimenting with coding agents such as GitHub Copilot, OpenAI's Codex, Claude Code, and Goose, with various LLMs serving as the engine. In the fall of 2025, I found coding agents very unwieldy, producing unnecessary amounts of code and quickly making whole projects unmanageable. Inline autocomplete functionality seemed preferable to the agentic approach. Now the situation has changed, and I can build smaller prototypes and projects with coding agents while retaining my required level of oversight and insight.

I'm aware that many developers have been using coding agents for «hands off» development for months, writing few or no lines of code themselves. I'm on the cautious side, and I'm extra careful about knowing how my software projects are designed and implemented. Below is an example of instructions I have tested for coding agents, to improve their usability based on my preferences and use cases:

## Core Principle
Make minimal, focused changes. When in doubt, do less rather than more.

## Code Changes
- Prefer modifying maximum 1-2 files per request
- Keep changes focused on the specific feature requested
- Avoid refactoring working code unless explicitly asked

## Before Writing Code
State your plan:
1. Which file(s) you'll modify or create
2. Approximate scope of changes
3. Any potential side effects on existing functionality

Wait for approval before proceeding with large changes (>100 lines or multiple files).

The above instructions may seem overly restrictive, but these kinds of guidelines have made coding agents more helpful, in my view.
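As a sketch of how such instruction files are wired in: many coding agents read a project-level file and prepend its contents to the conversation as a system message. The `AGENTS.md` file name and the message format below are assumptions for illustration; real tools differ in the details:

```python
# Sketch: how a coding agent might apply project-level instructions.
# The file name AGENTS.md is an assumption; tools use different conventions.
from pathlib import Path

def load_instructions(path: str = "AGENTS.md") -> str:
    """Read the instruction file if present, otherwise return an empty string."""
    p = Path(path)
    return p.read_text(encoding="utf-8") if p.exists() else ""

def build_messages(task: str, instructions: str) -> list[dict]:
    """Prepend the instructions as a system message ahead of the user's task."""
    messages = []
    if instructions:
        messages.append({"role": "system", "content": instructions})
    messages.append({"role": "user", "content": task})
    return messages
```

Because the instructions ride along as a system message on every request, restrictive guidelines like the ones above shape each change the agent proposes.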

AI agents are also presented as a way to accelerate research. I had an idea for a relatively simple research paper involving a small literature review, and wanted to see how an AI agent fared, from data collection to completed paper. OpenAI's Codex, powered by GPT-5.4 with «extra high reasoning effort», impressively managed to produce something coherent, although it was not very interesting or relevant. I also tried having the same agent reproduce the analysis and discussion I had written for another paper. Again, it was coherent, but didn't produce interesting research.

There have, without doubt, been huge improvements on this front. AI agents can now be given a dataset in almost any format, write scripts to analyze it, produce figures and tables, generate a discussion, and compile the full thing into a PDF, without a single intervention from a human. Even if the results are not particularly interesting in themselves, it still means that generating scripts for data analysis and visualization can now be done much faster. The interpretation of the results is not so simple to outsource, and even if an LLM can generate a consistent and relevant discussion, there's still the question of whether the results have actually been interpreted at all if no human has looked at them and deemed them interesting and useful.
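The analysis scripts agents generate in seconds tend to be simple in structure. A hypothetical example of the kind of script produced for a tabular dataset (the file path and column name are made up for illustration):

```python
# Hypothetical example of an agent-generated analysis script:
# read a CSV dataset and compute descriptive statistics for one column.
import csv
import statistics

def summarize(path: str, column: str) -> dict:
    """Return basic descriptive statistics for one numeric column."""
    with open(path, newline="") as f:
        values = [float(row[column]) for row in csv.DictReader(f)]
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }
```

Generating this kind of boilerplate is exactly where the speedup is real; deciding whether the resulting numbers mean anything is where it is not.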

With improvements of both agentic frameworks and the LLMs powering them, it may be possible to outsource more of the research steps and quality control to agents. Based on the current state-of-the-art, I can easily envision a future where AI agents are capable of producing logical and valuable research. I worry more about the diminishing participation of humans in the process. «Human-in-the-loop» is a popular term in AI research, but we should avoid treating this as a binary, and think of it more as a spectrum of human involvement. A larger degree of automation quickly leads to less human agency, rather than the self-empowerment that some AI providers are touting. Research in the real world is not only an idealistic quest for more knowledge; it forms our societies and policies. Preserving human involvement and alignment (including those humans and communities who are not a part of developing frontier AI models) in research will be an important task going forward.
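One way to make human involvement a spectrum rather than a binary is to gate agent actions by a granted autonomy level, escalating risky steps to a human. The action names, risk scores, and thresholds here are purely illustrative:

```python
# Illustrative sketch: human-in-the-loop as a spectrum.
# Each action carries a risk score; the operator grants the agent an
# autonomy level, and anything riskier is escalated for human approval.
RISK = {"read_file": 0, "edit_file": 1, "run_tests": 1, "deploy": 3}

def needs_approval(action: str, autonomy_level: int) -> bool:
    """Escalate to a human when an action's risk exceeds the granted autonomy.
    Unknown actions default to the highest risk score."""
    return RISK.get(action, 3) > autonomy_level
```

Turning the autonomy level up or down then corresponds to sliding along the spectrum, rather than flipping a human on or off.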

A funny side note is that I'm getting very careful about using the term «AI researcher» when describing my work. At this point it's impossible to know whether it means a human who researches AI or an AI agent that does research.

Collateral damage

The spread of AI agents has already caused unforeseen effects to materialize. Personally, it has made me wary of sharing things on the internet. Through the use of personal websites, I have had many meaningful and rewarding interactions with strangers online. But knowing that the thoughts and information I put on the internet can, more efficiently than ever before, be used for malicious purposes, as in the case of Scott Shambaugh, makes it less and less tempting to engage in public online spaces. As Herman Martinus puts it, it would be in our interest to protect some pockets of humanity on the internet. While I prefer and prioritize physical and local communities over digital ones, it would be a tremendous waste to let the open internet be destroyed by AI. I fear, however, that there is no way back at this point.

It also has to be mentioned that I find the development process of LLMs morally wrong, although it seems to have fallen out of fashion to discuss it in some circles. Copyright infringement and exploitation of workers are serious issues – they have real negative consequences for real people. Agentic use has also further accelerated the energy consumption of AI by generating a gargantuan number of tokens when LLMs are deployed with increased autonomy. Moltbook (a social network for AI agents) and ClawXiv (a distribution service for research papers written by AI agents) are examples of experiments where the ratio of benefits to costs seems exceptionally skewed in this regard, even though it's very interesting from a research perspective to observe what AI agents produce when deployed in certain contexts. The price to pay for such tools is too high, but it's simply too hidden for most of us.

Because of these and other reasons, I would personally prefer not to use LLMs at all. I have experimented with these models to do research on them, and to understand their capabilities. The ethical issues of the technology have kept me from relying on the models to actually perform tasks. In the last few months, however, this has changed when it comes to programming – a new normal has arrived for most jobs involving software development. I'm not sure what the best response to this situation is, as we all have to adapt to the economic reality we live in. I believe that a slower, more thoughtful and limits-conscious development is a more sustainable way forward for humanity, but compromises are inevitable on the way there. The exploitative practices and inhumane working conditions of mineral extraction lay the foundation not only for data centers running gigantic AI models, but also for the laptop that I type on – whether I'm writing code or prompting a coding agent. The difference, however, is scale. While I see few other options than accepting AI-assisted software development as a new standard, I want to keep bringing attention to harmful aspects of the technology.

Conclusion

It is always difficult to do a meaningful analysis of AI based on the current level of performance. The technology is improving continuously, and critique of certain capabilities (or inabilities) often becomes obsolete. However, we cannot hold back critique just because the technology may change. It's easy to accuse critical voices of moving the goalposts as progress is made, but new benefits almost without exception bring new problems as well.

My thesis is that the actual usefulness of AI agents is much narrower than what many envision it to be at the moment, although we will always find something to use them for. Their main contribution to humans is not increased intelligence, but increased speed. In a highly digitalized world, that sounds like a recipe for huge efficiency gains. However, we humans are firmly rooted in the physical world, with our biological needs and limitations. I'm doubtful that a further acceleration of our lives and our society will bring the progress that we seek.