For reasons which it would take a while to unpack, if is often the case that the best (or sometimes only) way to find out what programming actually needs to be done, is to program something that's not it, and then replace it. This may need to be done multiple times. Programming is only occasionally the final product, it is much more often the means of working through what it is that is actually needed. This is very difficult for the people who ask for the software, to understand, and it is quite often very difficult for the people doing the programming to understand.
Most of what is being done, during programming, is working through the problem space in a way which will make it more obvious what your mistakes are, in your understanding of the problem and what a solution would look like. Once you have arrived at that understanding, then there are a variety of ways to make what you need, but that is not the rate-limiting step.
I’m growing tired of this aphorism because I’ve been in enough situations where it was not true.
Some times the programming part really is very hard even when it’s easy to know what needs to be built. I’ve worked on some projects where the business proposition was conceptually simple but the whole reason the business opportunity existed was that it was an extremely hard engineering problem.
I can see how one could go through a career where the programming itself is not that hard if you’re mostly connecting existing frameworks together and setting up all of the tests and CI infrastructure around it. I have also had jobs where none of the programming problems were all that complicated but we spent hundreds of hours dealing with all of the meetings, documents and debates surrounding every change. Those were not my favorite companies
But that is not programming then? Doing voice recognition in the 90s, missile guidance systems, you name it, those are hard things, but it's not the "programming" that's hard. It's the figuring out how to do it. The algorithms, the strategy, etc.
I might be misunderstanding, but I cannot see how programming itself can be challenging in any way. It's not trivial per se or quickly over, but I fail to see how it can be anything but mechanical in and of itself. This feels like "writing" as in grammar and typing is the hard part of writing a book.
It would explain a lot of misunderstanding between "programmers" though.
Still, one thing I really like with LLM/AI, is that now, I can allow myself to test different abstractions a bit faster. I can allow myself to "try" more complex refactoring on a feature branch, because if I describe correctly the abstraction I want, the LLM/AI tool will be normally good at producing it. But to describe my abstraction, I need to pull all my programming and engineering years of experience.
But at the end of the day, I always tell my wife, that with these new tools, which I could not imagine so powerful 3 years ago, I live in the future :-)
Especially once you get to describe your abstractions in plain (or slightly technical) English instead of code I find it hard to say "programming" is being performed, but in many ways the case could be made that it remained the same and only the shape of the artifacts is different now.
The first is hard, the second much less so.
To be fair, a lot of programming does end up being just that.
The saying also ignores the fact that humans are not perfect programmers, and they all vary in skills and motives. Being a programmer often not about simply writing new code but modifying existing code, and that can be incredibly challenging when that code is hairbrained or overly clever and the people who wrote it are long gone. That involves programming and it's really hard.
Okay it's a spicy take, because juniors also tend to write too smart code.
Figuring out what to do and how to do it, is maybe not hard but it's effort. It's a hidden thing because it's not flat coding time, it requires planning, research, exploration and cooperation.
It's also true that some seemingly simple things are very hard. There are probably countless workarounds out there and the programmer wasn't even aware he is dodging an NP hard bullet.
Both arguments are valid.
I think the weight leans on effort, because effort is harder to avoid. Work, complexity, cruft piles up, no matter what you do. But you can work around hard problems. Not always but often enough. Not every business is NASA and has to do everything right, a 90% solution still generates 90% returns, and no one dies.
Isn't this kind of true, though? Housing construction, for instance, isn't bottlenecked by the technical difficulties of building, but by political and regulatory hurdles. Or look at large, capital-intensive projects such as the always-proposed, never built new Hudson river train tubes. Actually building these will take billions of dollars and many years, but even they would be long built by now were it not for their constantly being blocked by political jockeying.
Building stuff _does_ often involve difficult technical challenges, but I still think that as a general aphorism the observation that this isn't the _hardest_ part holds true.
It might be that I have been doing this for too long and no longer see it.
I've seen teams build and re-build the same infrastructure over and over.
I saw a request that could have been met with a few SQL queries and a dashboard got turned into a huge endeavor that implements parts of an ETL, Configuration Management, CI/CD, and Ticketing system and is now in the critical path of all requests all because people didn't ask the right questions and the incentive in a large organization is to build a mini-empire to reign over.
That said, smart infrastructure investment absolutely can be a competitive advantage. Google's infrastructure is, IMO, a competitive advantage. The amount of vertical integration and scale is unparalleled.
We all thought he would get reprimanded for wasting so much time, but by the time management figured out was happening they decided they needed to sell it as a very important idea rather than admit they just spent $100,000 of engineering time on something nobody needed. So it turned into something to celebrate and we were supposed to find ways to use it.
That company went down in flames about a year later. That’s how I learned one way to spot broken organizations and get out early rather than going down with the ship.
When I asked the team that is supposed to use it, "We haven't used that in six months and it wouldn't be missed if it were to go away".
LLMs make this trivial. I can try 20 different approaches or design ideas in a day. VS manually doing it and it taking much longer.
We've all been hearing that a lot and it's made a lot of people forget that, although programming might not be the hardest part, it's still hard.
Determining what to program can be hard, but that was already considered earlier.
The only other place where I sometimes see it become hard for some people is where they treat programming as an art and are always going down crazy rabbit holes to chase their artistic vision. Although I would say that isn't so much that programming is hard, but rather art that is trying to push boundaries is hard. That is something that holds regardless of the artistic medium.
It is other way around. Children can pick up a lot of skills that adults struggle at, like languages for example.
Plenty of research has shown reduced plasticity of the brain has stronger correlation to learning speed and ability as it grows old. Most breakthrough research is usually at age 40 or less or chess grand-masters fade in skill after. 25-40 is probably age group where the optimal balance between knowledge experience and learning ability for best outcomes.
That's like saying "becoming a writer can't be that hard, since kids learn how to write in the elementary school".
Given a set of requirements, there are many different ways to write a program to satisfy them. Some of those programs will be more efficient than others. Some will scale better. Some will end up having subtle bugs that are hard to reproduce.
Is writing hard? I expect most can agree that determining what to write, especially if you have an objective (e.g. becoming a best-selling novelist), can be extremely hard — but writing itself?
> there are many different ways to write a program to satisfy them.
"What to program" being hard was accepted from the onset and so far we see no disagreement with that.
Being able to transcribe sentences in a certain language is the skill kids pick up in elementary schools. Being a writer requires a whole set of skills built on top of that.
The reason why I brought up that difference in the first place is because both of these are called "writing". When a fan says "I heard the author is writing the next book in the series" or when an author says "I haven't been able to focus on writing due to my health issues", they're not talking about the low-level transcription skill.
> "What to program" being hard was accepted from the onset and so far we see no disagreement with that.
Similar to your interpretation of "writing", you're choosing to interpret "programming" as a process of transcribing an algorithm into a certain programming language, and everything else ends up being defined as "what to program".
That's an overly reductive interpretation, given the original context:
> For reasons which it would take a while to unpack, if is often the case that the best (or sometimes only) way to find out what programming actually needs to be done, is to program something that's not it, and then replace it. This may need to be done multiple times. Programming is only occasionally the final product, it is much more often the means of working through what it is that is actually needed.
> [...]
> Most of what is being done, during programming, is working through the problem space in a way which will make it more obvious what your mistakes are, in your understanding of the problem and what a solution would look like.
Notice that the original comment talks defines "determining what to program" as a process of refining your understanding of the problem itself.
In my reading of the original comment, understanding what your users need is "what to program". Writing code that solves your users' requirements is "programming".
For me, I need to have a solution figured out before writing code. I am not even sure how you could write code before having the problem solved. Your approach would be insightful.
Like, I get it is effectively impossible to gather all user requirements upfront and that you need to build a product to find out what details you overlooked. That means software is an iterative process. But within each iteration, surely you still need to have the solution — to the extent that it satisfies the known requirements — prepared before you can write code? Maybe if you had an infinite number of monkeys you could have them bang out something that accidentally works by throwing down random keywords, but in the real world where it is only you and your code has to meaningful, what you program is simply the expression of what to program.
Writing code is just means of conveyance, no?
But if only thing you know are basic moves playing against a player with 1600 ELO you are not going to win without serious training and 1600 is still far below grand master level.
They do? I've known plenty of kids and young adults who utterly failed to become even borderline competent at programming.
I think we can agree that few of them would be economically useful due to not knowing what to program. There is no sign of competency on that front. Certainly, even the best programmer in the world could theoretically be economically useless. Programmers only become economically useful when they can bridge "what to program".
Programming in elementary schools typically involves moving a turtle around on the screen. (My mother taught 4th grade in New York for many years, and I believe her when she explained the computer instruction.)
Economically valueable programming is much more complex than is taught in many schools through freshman college. (I taught programming at the college level from 1980 till I retired in 2020.)
Also people like to fantasize that their project, their API, their little corner of the codebase is special and requires special treatment. And that you simply cant copy the design of someone much more experienced who has already solved the problem 10 years ago. In fact many devs boast about how they solved (resolved) that complex problem.
In other domains - Professional engineers (non-swe) know that there is no shame in simply copying the design for a bridge that is still standing after all those years.
The claim is that most software teams do not consider the financial impact of their work. Is what they are doing producing value that can be measured in dollars and cents and is greater than the cost of their combined cost of employment?
The article suggests that there is a lot of programming being done without considering what exactly needs to be programmed.
And the parent rightfully points out that you cannot know exactly what needs to be programmed until after you've done it and have measured the outcome. We literally call the process development; for good reason. Software is built on hunches and necessarily so. There is an assumption that in the future the cost of the work will pay back in spades, but until you arrive in that future, who knows? Hence why businesses focus on metrics that try to observe progress towards finding out rather than tracking immediate economic payoff.
The interesting takeaway from the article, if you haven't give this topic much thought already, is that the changing financial landscape means that businesses are going to be more hesitant to take those risks. Right now there still seems to be enough optimism in AI payoffs to keep things relatively alive, but if that runs out of steam...
HARD AGREE. But…
Taken as just such, one might conclude that we should spend less time writing software and more time in design or planning or requirement gathering or spec generating.
What I’ve learned is that the painful process of discovery usually requires a large contribution of doing.
A wise early mentor in my career told me “it usually takes around three times to get it right”. I’ve always taken that as “get failing” and “be willing to burn the disk packs” [https://wiki.c2.com/?BurnTheDiskpacks]
In other words "figuring out what needs to be programmed" and "actually programming the thing" look the same while they're happening. Afterwards, one could say that the first 90% was figuring out, and only the last 10% was actually doing it. The reason the distinction matters, is that if you do something that makes programming happen faster, but figuring out happen slower, then it can have the surprising affect of making it take longer to get the whole thing done.
Replace the verb "program" with "do" or anything else, and you've got a profound universal philosophical insight right there
My company is fully remote so all meetings are virtual and can be set to have transcripts, parsing through that for the changes needed and trying it out can be a simple as copy-paste, plan, verify, execute, and distribute.
I'm curious is any quantitative research has been done comparing time writing code vs time gathering and understanding requirements, documenting, coordinating efforts across developers, design and architecture, etc.
Making good decisions is the hard part, whether it's about programming or about what needs to be programmed.
Then I'd wager it's the same for the courses and workshop this guy is selling...an LLM can probably give me at least 75% of the financial insights for not even .1% of what this "agile coach" is asking for his workshops and courses.
Maybe the "agile coach LLM" can explain to the "coding LLM's" why they're too expensive, and then the "coding LLM's" can tell the "agile coach LLM" to take the next standby shift then, if he knows so much about code?
And then we actual humans can have a day off and relax at the pool.
With the annoying process people out of the picture, even reviewing vibeslop full time sounds kinda nice… Feet up, warm coffee, just me and my agents so I can swear whenever I need to. No meetings, no problems.
I dont think this will happen because AI has become a straight up cult and things that are going well don’t need so many people performatively telling each other how well things are going.
Yes but this requires the willingness to take on the additional stress and risk of managing your own sales, marketing, accounting, etc.
A perfect summation.
The point being made is, do you know what financial impact your work is having in terms of increasing revenues or decreasing costs?
If the company revenue is going down and costs increasing, developers will be laid off regardless of how many tickets they close.
There needs to be someone to benefit from all your labor. No, no, it can't be you. You have conflicts of interest!
So, you're the programmer (verify code) and the QA (verify output) and the project manager (read the spec)?
A software engineer should be able to talk directly to customers to capture requirements, turn that into spec sheet, create an estimate and a bunch of work items, write the whole system (or involve other developers/engineers/programmers to woek on their work items), and finally be able to verify and test the whole system.
That entire role is software engineering. Many in the industry suck at most of the parts and only like the programming part.
I think the hardest part is requirements gathering (e.g. creating organized and detailed notes) and offloading work planned work to other developers in a neat way, generally speaking, based on what I see. In other words, human friction areas.
I'm always amused when I read anecdotes from a role siloed / heavily staffed tech orgs with all these various roles.
I've never had a spec handed to me in my career. My job has always been been end to end. Talk to users -> write spec into a ticket -> do the ticket -> test the feature -> document the feature -> deploy the feature -> support the feature in production from on-call rotation.
Often I have a few juniors or consultants working for me that I oversee doing parts of the implementation, but thats about it.
The talking to users part is where a lot of people fall down. It is not simply stenography. Remember most users are not domain/technical experts in the same things as you, and it's all just a negotiation.
It's teasing out what people actually want (cars vs faster horses), thinking on your feet fast enough to express tradeoffs (lots of cargo space vs fuel efficiency vs seating capacity vs acceleration) and finding the right cost/benefit balance on requirements (you said the car needs to go 1000 miles per tank but your commute is 30 miles.. what if..).
We call those places "feature factories".
I have been required to talk with many in my life, I have never seen one add value to anything. (There are obvious reasons for that.) But yet, the dominant schools in management and law insist they are the correct way to create software, so they are the most common kind of employment position worldwide.
> What would you trust more - an engineer doing project management too - or a project manager doing the engineering job?
If one of the three, {PM, QA, coder}, was replaced by AI, as a customer I'd prefer to pick the team missing the coder. But for teams replacing two roles with AI, I'd rather keep the coder.
But a deeper problem now is, as a customer, perhaps I can skip the team entirely and do it all myself? That way, no game of telephone from me to the PM to the coder and QA and back to me saying "no" and having another expensive sprint.
Of course if none of your software projects are business-critical to the degree that downtime costs money pretty directly then you can skip it all and just manage it yourself.
The other thing you should probably understand is that the feedback cycle for an LLM is so fast that you don't need to think of it in terms of sprints or "development cycles" since in many cases if you're iterating on something your work to acceptance test what you're getting is actually the long pole, especially if you're multitasking.
I am curious: why? In all my years of career I've seen engineers take on extra responsibilities and doing anywhere from decent to fantastic job at it, while people who tend to start much more specialized (like QA / sysadmins / managers) I have historically observed struggling more -- obviously there are many and talented exceptions, they just never were the majority, is my anecdotal evidence.
In many situations I'd bet on the engineer becoming a T-shaped employee (wide area of surface-to-decent level of skills + a few where deep expertise exists).
It just depends on the org structure and what the org calls different skills. In lots of places now PM (as in project, not product) is in no way a leadership role.
I also wouldn't be so sure that programming is the hardest of the three roles for someone to learn. Each role requires a different skill set, and plenty of people will naturally be better at or more drawn to only one of those.
In my first gig (~30 years ago), QA could hold up a release even if our CTO and President were breathing down their necks, and every SDE bug-hunted hard throughout the programs.
Now QA (if they even exist) are forced to punt thousands of issues and live with inertial debt. Devs are hostile to QA and reject responsibility constantly.
Back to the OP, these things aren't calculable, but they'll kill businesses every time.
Maybe it's different where you live but QA pretty much disappeared a few years ago and project managers never had anything to do with the actual software
There's a 99% chance that the training materials on sale are equally replaceable with a prompt.
This doesnt mean the training has to be good, useful or original in the slightest but the provider does need to have credentials which arent just "some dev with a hot take" that a fellow executive would recognize.
People who say that haven't used today's agents enough or haven't looked closely at what they produce. The code they write isn't messy at all. It's more like asking the agent to build a building from floorplans and spec, and it produces everything in the right measurements and right colours and passes all tests. Except then you find out that the walls and beams are made of foam and the art is load-bearing. The entire construction is just wrong, hidden behind a nice exterior. And when you need to add a couple more floors, the agents can't "get through it" and neither can people. The codebase is bricked.
Today's agents are simply not capable enough - without very close and labour-intensive human supervision - to produce code that can last through evolution over any substantial period of time.
They just make a lot of mistakes that compound and they don't identify. They currently need to be very closely supervised if you want the codebase to continue to evolve for any significant amount of time. They do work well when you detect their mistakes and tell them to revert.
There's nothing really stopping agents from writing the cleverest code they can. So my question is, when production goes down, who's debugging it? You don't have 10 days.
This is beautiful
It goes the other way quite often with people. How often do you see K8s for small projects?
I wish it could, but in practice, today's agents just can't do that. About once a week I reach some architectural bifurcation where one path is stable and the other leads to an inevitable total-loss catastrophe from which the codebase will not recover. The agent's success rate (I mostly use Codex with gpt5.4) is about 50-50. No matter what you explain to them, they just make catastrophic mistakes far too often.
Today's agents are simply not capable enough to write evolvable software without close supervision to save them from the catastrophic mistakes they make on their own with alarming frequency.
Specifically, if you look at agent-generated code, it is typically highly defensive, even against bugs in its own code. It establishes an invariant and then writes a contingency in case the invariant doesn't hold. I once asked it to maintain some data structure so that it could avoid a costly loop. It did, but in the same round it added a contingency (that uses the expensive loop) in the code that consumes the data structure in case it maintained it incorrectly.
This makes it very hard for both humans and the agent to find later bugs and know what the invariants are. How do you test for that? You may think you can spec against that, but you can't, because these are code-level invariants, not behavioural invariants. The best you can do is ask the agent to document every code-level invariant it establishes and rely on it. That can work for a while, but after some time there's just too much, and the agent starts ignoring the instructions.
I think that people who believe that agents produce fine-but-messy code without close supervision either don't carefully review the code or abandon the project before it collapses. There's no way people who use agents a lot and supervise them closely believe they can just work on their own.
[1]: https://www.anthropic.com/engineering/building-c-compiler
"t's more like asking the agent to build a building from floorplans and spec, and it produces everything in the right measurements and right colours and passes all tests. Except then you find out that the walls and beams are made of foam and the art is load-bearing. "
If your test/design of a BUILDING doesn't include at simulations/approximations of such easy to catch structural flaws, its just bad engineering. Which rhymes a lot with the people that hate AI. By and large, they just don't use it well.
For example, ruby uses blocks a lot. Ruby blocks are curious little thingies because they are arguably just syntax sugar for a HOF, but man it's great syntax sugar. Python then has "yield" which is simultaneously the same keyword ruby uses for blocks, but works fundamentally differently (instead of just a HOF, it's for generating an iterator/generator) and while there are some decorators that can use yield's ability to "pause" execution in the function to send control flow back out of the function for a moment (@contextmanager) which feels _even more_ like ruby blocks, it's a rather limited trick and requires the decorator to adapt the Generator to a context manager and there's just no good way to generalize that.
Somehow this is the perfect storm to make LLMs completely incapable of converting ruby code that uses blocks for more than the basic iteration used in the stdlib. It will try to port to python code that is either nonsensical, or uses yield incorrectly and doesn't actually work (and in a way that type checkers can even spot). And furthermore, even if you can technically whack it with a hammer until it works with yield, it's often not at all the way to do it. Ruby devs use blocks not-uncommonly while python devs are not really going to be using yield often at all, perhaps outside of @contextmanager. So the right move is usually to just restructure control flow to not need to use blocks/HOFs (or double down and explicitly pass in a function). (Rubyists will cringe at this, and rightly so... Ruby is often extraordinarily expressive).
The fact that such a simple language feature trips them up so completely is pretty odd to me. I guess maybe their training data doesn't include a lot of ruby-to-python conversions. Maybe that's indicative of something, but I digress.
I’ve been on 2 failed projects that have been entirely AI generated and it’s not that agents slow down and you can just send more agents to work on projects for longer, it’s that they becoming completely unable to make any progress whatsoever, and whatever progress they do make is wrong.
When you try to throw more agents at the problem or even more verification layer, you just kill your agility even if they would still be able to work
This rhymes a lot with the Mythical Man Month. There's some corollary Mythical Machine Month thing going on with agent developed code at the moment.
All of these things human do, and i don't think we can attribute it directly to language itself, its attention and context and we both have the same issues.
The problem is maximizing code generated per token spent. This model of "efficiency" is fundamentally broken.
unless anthropic tomorrow comes in and takes ownership all the code claude generates, that is not changing..
What I might believe though is that agents might make rewrites a lot more easy.
“Now we know what we were trying to build - let’s do it properly this time!”
And of course, make the case that it actually needs a rewrite, instead of maintenance. See also second-system effect.
Yes, but even here one needs some oversight.
My experiments with Codex (on Extra High, even) was that a non-zero percentage of the "tests" involved opening the source code (not running it, opening it) and regexing for a bunch of substrings.
"The AI said so ..."
Not only is it difficult to verify, but also the knowledge your team had of your messy codebase is now mostly gone. I would argue there is value in knowing your codebase and that you can't have the same level of understanding with AI generated code vs yours.
When the management recognize a tech debt, often it is too late that nobody understand the full requirement or know how things are supposed to work.
The AI agent will just make the same mistake human would make -- writing some half ass code that almost work but missing all sorts of edge case.
I wonder if AI will avoid the inevitable pitfalls their human predecessors make in thinking "if I could just rewrite from scratch I'd make a much better version" (only to make a new set of poorly understood trade offs until the real world highlights them aggressively)
More modular code, strong typing, good documentation... Humans are bad at keeping too much in the short-term memory, and AI is even worse with their limited context window.
If you can figure out a way to reliably assign a money value to engineering teams, you're probably a shoe in for a Nobel Prize in Economic Sciences.
The article is definitely written from a "high tech" industry lens. A mid-sized utility might spend $80-$150 million USD on IT capital projects in a year, but $2b on power pole maintenance. Utilities are a strong example, but any large enterprise manufacturing company is spending more on factory upgrades that programming.
> [...] built a functional replica of approximately 95% of Slack’s core product in fourteen days using LLM agents.
IT and Finance leadership and asset heavy companies are currently trying to wrap their head around the current economics of their 100+ SaaS contracts, and if it still makes sense with LLM powered developers. Can they hire developers in house to build the fraction of the tool they use from many of these companies, save on total cost and Opex?
I work with these companies a lot, and won't weigh in on the right decision. Bottom line "it depends" on many factors, some of which are not immediately obvious. The article still holds weight regardless of industries, but there is some nuance (talent availability, internal change cost, etc.) that also have to be considered.
And how do you put liability on an LLM agent? I outsource to SaaS and consultants because 1) they're good at what they do, and 2) if they do it wrong I can sue them, escalate, berate them in social media, etc. and get things fixed; the AI pulls from so many places who is responsible? How do I validate that?
I blackbox that I can't audit is a lot of risk compared to expensive consultants with shortcomings.
But I would like to agree with what you said with respect to SaaS spending coming under scrutiny. Our technical experts are becoming aware that we spend 5 or 6-figure sums on software with barely any users that we can clone with a coding agent in an afternoon. Eventually management will find out too and we’re going to cut a lot of dead weight.
A modern pharmaceutical manufacturing plant costs two-billion dollars just to build, and that doesn't include developing a drug to actually manufacture there, or a distribution network to sell what you make inside it.
The copy doesn’t even remotely grasp the scale of what the actual Slack sofware does in terms of scale, relaiability, observability, monitorability, maintability and pretty sure also functionality.
Author only writes about the non-dev work as difference, which seems like he doesn’t know what he’s talking about in all, and what running an application at that scale actually means.
This "clone" doesn’t get you any closer to an actualy Slack copy than a white piece of paper
And somehow twitter survived and thrived and didn't really get viable competitors until forces external to the code and product itself motivated other investment. And even then it still rolls on, challenged these days, but not by the ease of which a "clone" can be made.
Just to pick an incredibly, unbelievably basic enterprise feature, my two-week Slack clone is not going to properly support legal holds. This requires having a hard override for all deletion and expiration options anywhere in the product, that must work reliably, in order to avoid accidental destruction of evidence during litigation, which comes with potentially catastrophic penalties. If you don't get this right, you don't sell to large corporations.
And there are hundred other features like this. Engineering wants an easy-to-use API for Slack bots. Users want reaction GIFs. You need mobile apps. You need Single Sign-On. And so on. These are all table stakes.
It was a cliche for many years that Microsoft Word had "too many features." So people would start companies to sell "lightweight word processors" that only implemented "the most used 20% of features." And most of these companies sank without a trace (with a couple of admirable exceptions that hyperfocused on specific niches). Google finally made progress against the monopoly, but to it, they actually invested in a huge number of features.
Believe me, I wish that "simple, clean" reimplementations were actually directly competitive with major products. That version of our industry would be more fun. But anyone who thinks that an LLM can quickly reimplement Slack is an utter fool who has never seriously tried to sell software to actual customers.
The other issue is that yes, perhaps most users only use 20% of the features, but each user uses a different 20% of the features in products like Word. Trust me, it's super hard to get it right even at the end-user level, let alone the enterprise level like you say.
That’s whats need in tech too.
A clone doesn’t get you closer to that.
Of late, I've come across a lot of ideas from Rory Sutherland and my conclusion from listening to his ideas is that there are some people, who're obsessed with numbers, because to them it's a way to find certainty and win arguments. He calls them "Finance People" (him being a Marketing one). Here's an example
"Finance people don’t really want to make the company money over time. They just thrive on certainty and predictability. They try to make the world resemble their fantasy of perfect certainty, perfect quantification, perfect measurement.
Here’s the problem. A cost is really quantifiable and really visible. And if you cut a cost, it delivers predictable gains almost instantaneously."
> Choosing to spend three weeks on a feature that serves 2% of users is a €60,000 decision.
I'd really want to hire the Oracle of a PM/ Analyst that can give me that 2% accurately even 75% of the time, and promise nothing non-linear can come from an exercise.
So when you know that you are spending €60k to directly benefit small number of your users, and understand that this potentially increases your maintenance burden with up to 10 customer issues a quarter requiring 1 bug fix a month, you will want to make sure you are extracting at least equal value in specified gains, and a lot more in unspecified gains (eg. the fact that this serves your 2% of customers might mean that you'll open up to a market where this was a critical need and suddenly you grow by 25% with 22% [27/125] of your users making use of it).
You can plan for some of this, but ultimately when measuring, a lot of it will be throwing things at the wall to see what sticks according to some half-defined version of "success".
But really you conquer a market by having a deep understanding of a particular problem space, a grand vision of how to solve it, and then actually executing on both. Usually, it needs to be a problem you feel yourself to address it best!
So investing e.g. 10 million this year to build a product that produces maybe 2 million ARR will have armortized after 5 years if you can reduce engineering spend to zero. You can also use the same crew to build another product instead and repeat that process over and over again. That's why an engineering team is an asset.
It's also a gamble, if you invest 10 million this year and the product doesn't produce any revenue you lost the bet. You can decide to either bet again or lay everyone off.
It is incredibly hard or maybe even impossible to predict if a product or feature will be successful in driving revenue. So all his math is kinda pointless.
This feels ludicrously backwards to me, and also contrary to what I've always seen as established wisdom - that most programming is maintenance. (Type `most programming is maintenance` into Google to find page after page of people advancing this thesis.) I suspect we have different ideas of what constitutes "maintenance".
What do you mean by maintenance?
A strict definition would be "the software is shipping but customers have encountered a bug bad enough that we will fix it". Most work is not of this type.
Most work is "the software is shipping but customers really want some new feature". Let us be clear though, even though it often is counted as maintenance, this is adding more features. If you had decided up front to not ship until all these features were in place it wouldn't change the work at all in most cases (once in a while it would because the new feature doesn't fit cleanly into the original architecture in a way that if you had known in advance you would have used a different architecture)
It's all too common to frame the tension as binary: bean counters vs pampered artistes. I've seen it many times and it doesn't lead anywhere useful.
There's often a checklist of features management has, and meeting that list gets you in the door, but the features often never get used
I suspect this is most apparent on things like meeting culture. Something happens and all of a sudden there is another recurring meeting on the calendar, with 15 attendee's, costing x dollars in wages, that produces no value for the customers because the lesson was already learned.
Or when reacting to an incident of some sort, it's so easy to have a long list of action items that may theoretically improve the situation, but in reality are incredibly expensive for the value they produce (or the risks they reduce). It's too easy to say, we'll totally redesign the system to avoid said problem. And what worries me, is often those very expansive actions, then cause you to overlook realistic but small investments that move the needle more than you would think.
And as a hot topic I also think the costs are an input into taking on tech debt. I know we all hate tech debt with a passion, but honestly, I think of it as a tool that can be wielded responsibly or irresponsibly. But if we don't know what our attention costs, we're going to have difficulty making the responsible choices about when and where to take on this debt. And then if we're not conscious about the debt, when it comes do it stings so much harder to pay down.
It might be OK to place some bets on an initiative or feature, but if we all understand we're placing a bet, this is an area to load up on debt and really minimize the investment. This also requires an org that is mature about cutting the feature if the bet doesn't materialize, and if the market signal is generated will reinvest in paying down the debt. And also has the mega-danger territory of a weak market signal, where it's not clear if there is market signal or not, so the company doubles down into the weak signal.
Also these bets shouldn't be done in isolation in my view, well executed product and market discovery should also provide lots of relevant context on the ROI.
Consider a team of eight engineers whose mission is to build and maintain an internal developer platform serving one hundred other engineers. This is a common organizational structure, and it is one where the financial logic is rarely examined carefully.
The team costs €87,000 per month. To justify that cost, the platform they build needs to generate at least €87,000 per month in value for the engineers who use it. The most direct way to measure that value is through time saved, since the platform’s purpose is to make other engineers more productive.
At a cost of €130,000 per year, one engineer costs approximately €10,800 per month, or around €65 per working hour. For the platform team to break even, their platform needs to save the hundred engineers they serve a combined total of 1,340 hours per month. That is 13.4 hours per engineer per month, or roughly three hours per week per person.
There's a fungibility assumption which is pervasive here. In most cases, a platform team is there not "to save time".It's there to deal with cross concerns that would be not only time consuming but could be business threatening, and in some cases, you keep there more expensive engineers that ensure that certain critical things are done right.
Too much snake oil for my taste.
It's effectively a form of administrative overhead. The question isn't whether the cost is justified. You have to do it. The question is whether you're better off centralizing the function in house or not. You see a similar dynamic with any other administrative function. Should you hire a payroll company or have your own payroll department? The difference is software developers tend to believe, and in many cases might even be right, that they can individually role their own platform functions, introducing a secondary consideration, whereas the idea of having software developers simply handling their own payroll clearly makes no sense.
Of course, it's worth another of the more obvious reasons it doesn't make sense. If you give your engineers direct access to the company bank account, there's a pretty clear risk they're going to steal from you. Similarly, there's a non-economic argument to be made for having a platform team in terms of separation of duties for security. Every business decision isn't simply made by answering what does this cost and how much will it earn me? Try running a software business where the developers are responsible for physical site security and see how long you make it before getting robbed. Try not hiring any janitors, letting your developers clean their own restrooms, and see how long you retain any staff.
Unfortunately, even with all the management techniques in the world, there are just some projects that are impossible to care about. There’s simply a significantly lower cap on productivity on these projects.
Its like min-maxing a Diablo build where you want the quality of the product to be _just_ above the "acceptable" threshold but no higher because that's wasting money. Then, you're free to use all remaining points to spec into revenue.
The direct and indirect financial impact of technical decisions are indeed hard to measure. But some technical decisions definitely have greater financial impact than others. Even if it's hard to precisely quantify the financial costs/benefits of every decision. It is possible to order them relatively. X is likely to make more money than Y. So we do X first and Y later.
There is a significant amount of chance involved in whether a product/feature will even make money at all. So even good plans with measurably positive expected value could end up losing money.
Just because it's impossible to be 100% certain of the outcome of any decision. Doesn't mean we should throw the baby out with the bathwater.
I do like this bit though:
> A large codebase also carries maintenance costs that grow over time as the system becomes more complex, more interconnected, and more difficult to change safely. Every engineer added to maintain it increases coordination costs, introduces new dependencies, and adds to the organizational weight that slows decision-making. The asset and the liability exist simultaneously, and for most of the past twenty years, the financial environment masked the liability side of that equation.
And the insight that LLMs are exposing this reality is absolutely true. The funny thing is they are exposing it by accelerating both good and bad engineering practices. Teams with good engineering judgement will move faster than ever with fewer people, and teams with bad engineering judgment will bury themselves in technical debt so fast the wheels will come off.
For me, running an engineering org is primarily about talent acquisition and empowering those ICs with judgment to move quickly. How well systems and teams scale depends on the domain, product, and how it allows you to decouple things. With the right talent and empowerment there are often creative ways to make product and system tradeoffs and iterate quickly to change the shape of ROI. Any mapping to financial metrics is a hugely lossy operation that can't account for such changes. It might work in mature companies that are ossified and in the second half of their lifecycle, but in growing companies I think it's fundamentally misguided would amount to empowering the wrong people.
There totally is such a cohort. There are plenty of bootstrapped companies or startups that took only an angel round and did not benefit from the low rate environment, in fact they suffered because of the very high price of SWE labor. But those engineering managers exist and are out there right now still building efficiently, quietly growing, passionately serving customers, and keeping a close eye on the bottom line and risks because that’s their livelihood.
Here's the problem I see with how this particular article is moving though: the context of these projects are often highly technical connecting back to the human problem space. Developers sit on the technical end but they also usually have a mental model for how it connects back to the non-technical. A product manager is another addition to compensate for the user connection. Between all of these folks they can only hold so much in their head about the problem space on a day-to-day basis. And that headspace for the problem is what is critical. Management wants to try a new idea for sales? They need to take it to the team with that problem space to translate it into working code. Even with the assistance of agents, one needs to hold the important patterns in their head. And my company certainly isn't going to vibe code its way through anything regulatory, mistakes there might cost us a ton in fees and bad PR. Hell I've seen product managers sweat over the possibility of getting a few 1 star reviews on the app store.
Anyway, you still need people with context to break things down and get them out the door, the agents can just assist with the speed of the In Progress stage. And clever teams can figure out how to automate their validation (but they could already do that).
Rockstar developers often seem to be the ones who can parachute in, gain context, make changes, and leave to find another problem space. They get bogged down when they've visited 10 or more problem spaces and then they start getting called back into service. Again the agents don't change any of that, the human involved has a finite capacity for context.
Teams who structure around maintaining context might be best suited for the new world of code.
If your company runs well: won't hurt you much that you're not doing this. Otherwise this will be your end. And that really hurts because you lose the economical impact of the product and the jobs.
The argument to always go for the biggest return works OK for the first few years of high growth (though the timeline is probably greatly compressed the more you use AI), but it turns into a kind of quicksand later.
More common so in larger organizations than smaller ones
How could they not? When I penciled this out ~18 years ago, I included the amortized cost of all the interviews it took to hire a given engineer as well. It's not rocket surgery, as they say.
Money can be exchanged for goods and services.
Not sure. Because it totally depends on what they do instead. Are they utilizing two hours more every week now doing meaningful work? Or are they just taking things a bit more easy? Very hard to determine and it just makes it harder to reason about the costs and wins in these cases.
The problem is that most engineering work lacks that kind of before/after measurement. Not because it is unmeasurable, but because nobody set up the baseline. Profile before you optimize and the return on investment calculates itself.
I am saying:
Given you have saved two hours per person per week
Then the value for the company is _not_ equal to two hourly salaries per week. The consequences are just not that simple.
Why don't we instead focus our energies on the customer and then work our way backward into the technology. There are a lot of ways to solve problems these days. But first you want to make sure you are solving the right problem. Whether or not your solution represents a "liability" or an "asset" is irrelevant if the customer doesn't even care about it.
Maybe there’s some new paradigm that makes this true. But it doesn’t seem obviously true to me.
Humans make the best code long term when everything orbits a vision of the underlying problem space.
LLMs seem to only consider the deeper problem space when I explicitly flag it for them, otherwise they write “good enough for this situation” type code. And that stack of patches type code is exactly how the code becomes messy and complicated in the first place.
The problem is that technical debt is a more complex concept & thus requires more metrics to properly measure than a simple concept like velocity.
Everyone wanted to copy Meta/Google/Oracle and have internal teams, and to me internal teams have been accountability vacuums.
People want an internal team so they can go "well if we had better tooling!" when instead they should make best with what they have.
If you've ever been in a meeting with multiple L8's arguing over features, you should be able to estimate how much each hour of that meeting is costing the org.
The LLM-agent team argument also misses the core point that the engineering investment (which actually encompasses business decisions, design and much more than just programming) is what actually got Slack (or any other software product) to the point where is it is now and where it's going in the future and creating a snapshot of the current status is, while maybe not absolutely trivial, still just a tiny fraction of the progress made over the years.
Whereas Whatsapp with its 30 software engineers was the exception etc.
A chat with friends showed how there are parallels with how LLMs will happen in the short-term future - say the next 5 years - and the whole MapReduce mess. Back when Hadoop came along you built operators and these operators communicated through disk. It took years even after Spark was about for the hadoop userbase as a whole to realise that it is orders of magnitude more efficient to only communicate through disk when two operators are not colocatable on the same machine and that most operators in most pipelines can be fused together.
So for a while LLMs will be in the Hadoop phase where they are acting like junior devs and making more islands that communicate in bigger bloated codebases and then there might be a realisation in about 2030 that actually the LLMs could have been used to clean up and streamline and fuse software and approach the Whatsapp style of business impact.
"Most organizations improperly account for engineering teams and incorrectly consider both code and team growth to be assets when in fact they increase complexity..... but LLMs can fix all of this"
Wtf?
Measuring things that actually matter is a great way to improve clarity on a team, you can probably just stop reading this article at the halfway point.
EDIT:
Specifically this paragraph is insane
"The obvious objection is that code produced at that speed becomes unmanageable, a liability in itself. That is a reasonable concern, but it largely applies when agents produce code that humans then maintain. Agentic platforms are being iterated upon quickly, and for established patterns and non-business-critical code, which is the majority of what most engineering organizations actually maintain, detailed human familiarity with the codebase matters less than it once did. A messy codebase is still cheaper to send ten agents through than to staff a team around. And even if the agents need ten days to reason through an unfamiliar system, that is still faster and cheaper than most development teams operating today. The liability argument holds in a human-to-human or agent-to-human world. In an agent-to-agent world, it largely dissolves."
We do proxy measurements because having exact data is hard because there is more to any feature than just code.
Feature is not only code, it is also customer training, marketing - feature might be perfectly viable from code perspective but then utterly fail in adoption for reasons beyond of Product Owner control.
What I saw in comments — author is selling his consultancy/coaching and I see in comments that people who have any real world experience are also not buying it.
I do agree with his thesis in the middle, about how the ZIRP decade and the cultures that were born from that period were outrageous and cannot survive the current era. It's a brave new world, and it's not because of AI. It's because there's just not enough money flowing anymore, and what little is left is sucked up by the big boys (AI).
What experience is this guy basing this on? My guess is absolutely none at all.
Maybe this will be the case in the future, but as of right now if I cut 10 agents loose for 10 days one of our repos at work and tell them to clean it up but but keep the tests passing, we’d be drowning in support tickets.
Tests don’t cover all observable behavior. Every single production bug we’ve had made it through the test suite.
Also this guy only had a vague idea of how platform engineering teams work in large organizations.
Platform teams are the engineering org’s immune system. They’re how we fight back against the tech debt accumulated by the relentless march of features of the week.
If anything the extra code people are cranking out with AI make them more necessary.
Cost of delay: calculating the cost of delaying by a few weeks in terms of lost revenue (you aren't shipping whatever it is you are building), total life value of the product (your feature won't be delivering value forever), extra cost in staffing. You can slap a number on it. It doesn't have to be a very accurate number. But it will give you a handle on being mindful that you are delaying the moment where revenue is made and taking on team cost at the cost of other stuff on your backlog.
Option value: calculating the payoff for some feature you add to your software as having a non linear payoff. It costs you n when it doesn't work out and might deliver 10*n in value if it does. Lean 1.0 would have you stay focused and toss out the option for that potential 10x payoff. But if you do a bit of math, there probably is a lot of low hanging fruit that you might want to think about picking because it has a low cost and a potential high payoff. In the same way variability is a good thing because it gives you the option to do something with it later. A little bit of overengineering can buy you a lot of option value. Whereas having tunnel vision and only doing what was asked might opt you out of all that extra value.
A bad estimation is better than no estimation: even if you are off by 3x, at least you'll have a number and you can learn and adapt over time. Getting wildly varying estimates from different people means you have very different ideas about what is being estimated. Do your estimates in time. Because that allows you to slap a dollar value on that time and do some cost calculations. How many product owners do you know that actually do that or even know how to do that?
Don't run teams at 100% capacity. Work piles up in queues and causes delays when teams are pushed hard. The more work you pile on the worse it gets. Worse, teams start cutting corners and take on technical debt in order to clear the queue faster. Any manufacturing plant manager knows not to plan for more than 90% capacity. It doesn't work. You just end up with a lot of unfinished work blocking other work. Most software managers will happily go to 110%. This causes more issues than it solves. Whenever you hear some manager talking about crunch time, they've messed up their planning.
Stretching a team like that will just cause cycle times to increase when you do that. Also, see cost of delay. Queues aren't actually free. If you have a lot of work in progress with inter dependencies, any issues will cause your plans to derail and cause costly delays. It's actually very risky to do that if you think about it like that. If you've ever been on a team that seemingly doesn't get anything done anymore, this might be what is going on.
I like this back of the envelope math; it's hard to argue with.
I used to be a salaried software engineer in a big multinational. None of us had any notion of cost. We were doing stuff that we were paid to do. It probably cost millions. Most decision making did not have $ values on them. I've since been in a few startups. One where we got funded and subsequently ran out of money without ever bringing in meaningful revenue. And another one that I helped bootstrap where I'm getting paid (a little) out of revenue we make. There's a very direct connection between stuff I do and money coming in.
I keep seeing this assumption that "unmanageable" caps out at "kinda hard to reason about", and anyone with experience in large codebases can tell you that's not so. There are software components I own today which require me to routinely explain to junior engineers (and indeed to my own instances of Claude) why their PR is unsound and I won't let them merge it no matter how many tests they add.
"The obvious objection is that code produced at that speed becomes unmanageable, a liability in itself. That is a reasonable concern, but it largely applies when agents produce code that humans then maintain. Agentic platforms are being iterated upon quickly, and for established patterns and non-business-critical code, which is the majority of what most engineering organizations actually maintain, detailed human familiarity with the codebase matters less than it once did. A messy codebase is still cheaper to send ten agents through than to staff a team around. And even if the agents need ten days to reason through an unfamiliar system, that is still faster and cheaper than most development teams operating today. The liability argument holds in a human-to-human or agent-to-human world. In an agent-to-agent world, it largely dissolves."
LLMs are not conscious, that means left on their own devices they will drift. I think the single most important issue when working with LLMs is that they write text without a layer that are aware what's actually being written. That state can be present in humans as well, like for example in sleepwalking.
Everyone who's tried to to complete vibe coding a somewhat larger project knows that you only get to a certain level of complexity until the model stops being able to reason about the code effectively. It starts to guess why something is not working and cannot get out of that state until guided by a human.
That is not new state in the field, I believe all programmers has at points in their career come across code that's been written with developers needing to get over a hard deadline with the result of a codebase that cannot effectively be modified.
I think for a certain subsets of programming projects some projects could possibly be vibe coded as in that code can be merged without human understanding. But it has to be very straightforward crud apps. In almost everything else you will get stopped by slop.
I suspect that the future of our profession will shift from writing code to reading code and to apply continuous judgement on architecture working together with LLMs. Its also worth keeping in mind that you cannot assign responsibility to an LLM and most human organization requires that to work.
Citation needed. A human engineer can grok a lot in 10 days, and an agent can spend a lot of tokens in 10 days.
I guess his students get to relearn that on their own.
Also, any post talking about building software and then contains the suggestion that "cost per unit" is an efficiency metric needs to come to the red courtesy phone, Taylorism would like to have a chat about times gone by.
In many companies there are 3 to 5 other people per developer (QA, agile masters, PO, PM, BA, marketing, sales, customer support etc.). The costs aren't driven just by the developer salaries.
A CEO can cost as much as 10 developers, sometimes more.
There is something different about CEOs that came from tech.
> A neutral interpretation would be that "flying blind" is to "operate without perfect information". It is a simple description of operating conditions, not a derogatory term in any way.
Entering it would also have put less wear and tear on the input device.
The pilot who is "flying blind" has perfectly normal eyeballs. They are not necessarily a member of any minority group, except for their chosen profession.
_____
As for "blind" being a word that appears more frequently in a negative rather than positive way... Well, I'm not sure what to tell you, that's just 10,000+ years of language from a species that evolved to prefer seeing.
To offer an example of the positive case, the idiom "justice is blind". Yes, there is a popular cultural mascot wearing a strip a fabric over her eyes, but again: The justice doesn't actually involve any (real) personal medical condition, and it's considered a positive feature for the job.
Like "drinking" and "driving". On their own, they're both neutral, but "drinking and driving" is really bad.
The author specifically says FLYING blind. Not "stumbling around like a blind person" or some such. If you are offended, that is on you. It's your right to be offended of course, but don't expect people to join in your delusion.
It just means you don't know something, which is usually a relatively bad situation for you, but it doesn't make you a bad person.
If you think otherwise, that's on you.