I can’t imagine any other example where people voluntarily move to a black box approach.
Imagine taking a picture in auto mode and refusing to look at it. If the client doesn’t like it because it’s too bright, tweak the settings and shoot again, but never look at the output.
What is the logic here? Because if you can read code, I can’t imagine that poking at the result with black box testing is faster.
Are these people just handing off the review process to others? Are they unable to read code and hiding it? Why would you handicap yourself this way?
Early resistance to high-level (i.e. compiled) languages came from assembly programmers who couldn’t imagine that the compiler could generate code that was just as performant as their hand-crafted product. For a while they were right, but improved compiler design and the relentless performance increases in hardware made it so that even an extra 10-20% boost you might get from perfectly hand-crafted assembly was almost never worth the developer time.
There is an obvious parallel here, but it’s not quite the same. The high-level language is effectively a formal spec for the abstract machine which is faithfully translated by the (hopefully bug-free) compiler. Natural language is not a formal spec for anything, and LLM-based agents are not formally verifiable software. So the tradeoffs involved are not only about developer time vs. performance, but also correctness.
Put another way, if you don't know what correct is before you start working, then no tradeoff exists.
I'm just describing what I'm doing now, and what I'm seeing at the leading edge of using these tools. It's a different approach - but I think it'll become the most common way of producing software.
> and wouldn't look at the code anymore than, say, a PHP developer would look at the underlying assembly
This really puts down the work that the PHP maintainers have done. Many people spend a lot of time crafting the PHP codebase so you don't have to look at the underlying assembly. There is a certain amount of trust that I as a PHP developer assume.
Is this what the agents do? No. They scrape random bits of code everywhere and put something together with no craft. How do I know they won't hide exploits somewhere? How do I know they don't leak my credentials?
The output of code isn't just the code itself, it's the product. The code is a means to an end.
So the proper analogy isn't the photographer not looking at the photos, it's the photographer not looking at what's going on under the hood to produce the photos. Which, of course, is perfectly common and normal.
I've found that focusing my attention upstream (specs, constraints, test harness) yields better outcomes than poring over implementation details line by line. The code is still there if I need it. I just rarely need it.
I’ll bite. Is this person manually testing everything that one would normally unit test? Or writing black box tests that he knows are correct because they were written by hand?
If not, you’re not reviewing the product either. If yes, it’s less time-consuming to actually read and test the damn code.
So a percentage of your code, based on your gut feeling, is left unseen by any human by the time you submit it.
Do you agree that this raises the chance of bugs slipping by? I don’t see how you wouldn’t.
And considering that your code output is larger, the percentage of it that is buggy is larger, and (presumably) you write faster, have you followed that through to the compounding likelihood of incidents?
My approach is similar. I invest in the harness layer (tests, hooks, linting, pre-commit checks). The code review still happens; it just happens through tooling rather than my eyeballs.
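To make "review through tooling" concrete, here's a minimal sketch of the harness I mean - assuming a Python project; ruff/mypy/pytest and the src path are stand-ins for whatever your stack uses:

    #!/usr/bin/env python3
    """Fail-fast harness: run as a pre-commit hook or a CI step."""
    import subprocess
    import sys

    # Ordered cheapest-first so agent output gets rejected as early as possible.
    CHECKS = [
        ("format", ["ruff", "format", "--check", "."]),
        ("lint", ["ruff", "check", "."]),
        ("types", ["mypy", "src"]),
        ("tests", ["pytest", "-q"]),
    ]

    for name, cmd in CHECKS:
        print(f"--- {name}: {' '.join(cmd)}")
        if subprocess.run(cmd).returncode != 0:
            print(f"FAILED at {name!r} - rejecting this change.")
            sys.exit(1)

    print("All gates passed.")

If the agent's diff can't get through these gates, I don't look at it; if it can, much of what line-by-line review would catch has already been caught.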
Specious analogies don't help anything.
To be fair - it is not accurate to say I absolutely never read the code. It's just rare, and it's much more the exception than the rule.
My workflow just focuses much more on the final product and the initial input layer, not the code - the code is becoming less consequential.
It is right often enough that your time is better spent testing the functionality than the code.
Sometimes it’s not right, and you need to re-instruct (often) or dive in (not very often).
I can think of a few. The last 78 pages of any 80-page business analysis report. The music tracks of those "12 hours of chill jazz music" YouTube videos. Political speeches written ahead of time. Basically - anywhere that a proper review is more work than the task itself, and the quality of output doesn't matter much.
It's producing seemingly working code faster than you can closely review it.
I don't know... it depends on the use case. I can't imagine that even the best front-end engineer can read HTML faster than they can look at the rendered webpage to check whether the layout is correct.
Anyone overseeing work from multiple people has to? At some point you have to let go and trust people's judgement, or, well, let them go. Reading and understanding the whole output of 9 concurrently running agents is impossible. People who do that (I'm not one of them btw) must rely on higher level reports. Maybe drilling into this or that piece of code occasionally.
Indeed. People. With salaries, general intelligence, a stake in the matter and a negative outcome if they don’t take responsibility.
>Reading and understanding the whole output of 9 concurrently running agents is impossible.
I agree. It is also impossible for a person to drive two cars at once… so we don’t. Why is the starting point of the conversation that one should be able to use 9 concurrent agents?
I get it, writing code no longer has a physical bottleneck. So the bottleneck becomes the next thing, which is our ability to review outputs. That’s already a giant advancement, so why are we ignoring that second bottleneck and dropping quality assurance as well? Eventually someone has to put their signature on the thing being shippable.
That's not a black box though. Someone is still reading the code.
> At some point you have to let go and trust people's judgement
Where's the people in this case?
> People who do that (I'm not one of them btw) must rely on higher level reports.
Does such a thing exist here? Just "done".
But you are not. That’s the point?
> Where's the people in this case?
Juniors build worse code than Codex. Their superiors also can't check everything they do. They have to extend some level of trust, dumb shit and all, or they can't hire juniors.
> Does such a thing exist here? Just "done".
Not sure what you mean. You can definitely ask the agent what it built, why it built it, and what could be improved. You’ll get only part of the info compared to reading the output yourself, but it won’t be zero.
The AI also writes the black box tests - what am I missing here?
If the AI misinterpreted your intentions and/or missed something in productive code, tests are likely to reproduce rather than catch that behavior.
In other words, if “the ai is checking as well” no one is.
etc...etc...
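To make that concrete, a toy sketch (round_price and the "round half up" spec are hypothetical, but this is the failure shape):

    # Spec (the human's intent): "round halves UP, e.g. 2.5 -> 3".

    def round_price(x: float) -> int:
        # What the agent actually wrote: Python's round() does
        # banker's rounding, so 2.5 -> 2, violating the spec.
        return round(x)

    # The agent then generates tests from its own output, so the test
    # encodes the misinterpretation instead of catching it:
    def test_round_price():
        assert round_price(2.5) == 2  # passes; the bug is now "verified"
        assert round_price(3.5) == 4  # also passes - looks thorough

    test_round_price()
    print("All tests green, spec still violated.")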
> In other words, if “the ai is checking as well” no one is.
"I tried nothing, and nothing at all worked!"
Code is not the output. Functionality is the output, and you do look at that.
Are you writing black box tests by hand, or manually checking everything that would normally be a unit test? We have unit tests precisely because of how unworkable the “every test is black box” approach is.
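A contrived sketch of why (parse_timeout is hypothetical): the edge cases are one line each to pin down as unit tests, but expensive to reach through the product surface.

    def parse_timeout(raw: str) -> int:
        """Parse '30s' / '5m' / '2h' into seconds."""
        units = {"s": 1, "m": 60, "h": 3600}
        return int(raw[:-1]) * units[raw[-1]]

    # Unit tests: exercise the function directly, run in milliseconds.
    assert parse_timeout("90s") == 90
    assert parse_timeout("2h") == 7200

    # The black-box equivalent: boot the whole app, write a config with
    # timeout "2h", fire a request, and observe when it actually dies.
    # Same coverage, orders of magnitude more effort per case.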
Almost everyone does this. Hardly anyone taking pictures understands what f-stop or focal length are. Even those who do seldom adjust them.
There are dozens of other examples where people voluntarily move to a black box approach. How many Americans drive a car with a manual transmission?
Never thought this would be something people actually take seriously. It really makes me wonder if in 2 - 3 years there will be so much technical debt that we'll have to throw away entire pieces of software.
The author of the article has a bachelor's degree in economics[1], worked as a product manager (not a dev), and only started using GitHub[2] in 2025 when they were laid off[3].
[1] https://www.linkedin.com/in/benshoemaker000/
You have to remember that the number of software developers saw a massive swell in the last 20 years, and many of these folks are bootcamp-educated web/app dev types, not John Carmack. Statistically, under pre-AI circumstances, they typically started too late and for the wrong reasons to become very skilled in the craft by middle age (of course there are many wonderful exceptions; one of my best developers worked in a retail store for 15 years before pivoting).
AI tools are now available to everyone, not just the developers who were already proficient at writing code. When you take in the excitement you always have to consider what it does for the average developer and also those below average: A chance to redefine yourself, be among the first doing a new thing, skip over many years of skill-building and, as many of them would put it, focus on results.
It's totally obvious why many leap at this, and it's probably even what they should do, individually. But it's a selfish concern, not a care for the practice itself. It also results in a lot of performative blog posting. But if it was you, you might well do the same to get ahead in life. There are only so many opportunities to get in on something on the ground floor.
I feel a lot of senior developers don't take the demographics of our community of practice into account when they try to understand the reception of AI tools.
> There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper so I barely even touch the keyboard. I ask for the dumbest things like "decrease the padding on the sidebar by half" because I'm too lazy to find it. I "Accept All" always, I don't read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it. The code grows beyond my usual comprehension, I'd have to really read through it for a while. Sometimes the LLMs can't fix a bug so I just work around it or ask for random changes until it goes away. It's not too bad for throwaway weekend projects, but still quite amusing. I'm building a project or webapp, but it's not really coding - I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works.
Notice "don't read the diffs anymore".
That happens just as often without AI. Maybe the people who like it all have experience with trashing multiple sets of products over the course of their lives?
I've worked on "legacy systems" written 30 to 45 years ago (or more) and still running today (things like green-screen apps written in Pick/Basic, Cobol, etc.). Some of them were written once and subsystems replaced, but some of it is original code.
In systems written in the last.. say, 10 to 20 years, I've seen them undergo drastic rates of change, sometimes full rewrites every few years. This seemed to go hand-in-hand with the rise of agile development (not condemning nor approving of it) - where rapid rates of change were expected.. and often the tech the system was written in was changing rapidly also.
In hardware engineering, I personally also saw a huge move to more frequent design and implementation refreshes to prevent obsolescence issues (some might say this is "planned obsolescence" but it also is done for valid reasons as well).
I think not reading the code anymore TODAY may be a bit premature, but I don't think it's impossible that someday, in the nearer rather than farther future, we might be at a point where generative systems have more predictability and maybe even get certified for the safety/etc. of the generated code... leading to truly not reading the code.
I'm not sure it's a good future, or that it's tomorrow, but it might not be beyond the next 20 year timeframe either, it might be sooner.
What is your opinion and did you vote this down because you think it's silly, dangerous or you don't agree?
My overall stance on this is that it's better to lean into the models & the tools around them improving. Even in the last 3-4 months, the tools have come an incredible distance.
I bet some AI-generated code will need to be thrown away. But that's true of all code. The real questions to me are: are the velocity gains worth it? Will the models be so much better in a year that they can fix those problems themselves, or rewrite it?
I feel like time will validate that.
We have been throwing away entire pieces of software forever. Where's Novell? Who runs 90s Linux kernels in prod?
Code isn't a bridge or a car. Preservation isn't meaningful. If we aren't shutting the DCs off, we're still burning the resources regardless of whether we save old code or not.
Most coders are so many layers of abstraction above the hardware at this point anyway they may as well consider themselves syntax artists as much as programmers, and think of Github as DeviantArt for syntax fetishists.
Am working on a model of /home to experiment with booting Linux to models. I can see a future where Python on my screen "runs" without an interpreter, because the model is capable of correctly generating the appropriate output without one.
Code is an ethno-object; it only exists socially. It's not essential to computer operations. At the hardware level it's arithmetical operations against memory states.
Am working on my own "geometric primitives" models that know how to draw GUIs, 3D world primitives, and text; think "boot to Blender". Rather than store data in strings, it will just scaffold out vectors to a running "desktop metaphor".
It's just electromagnetic geometry, delta sync between memory and display: https://iopscience.iop.org/article/10.1088/1742-6596/2987/1/...
I can't imagine not reading the code I'm responsible for any more than I could imagine not looking out the windscreen in a self driving Tesla.
But if so many people are already there, and they're mostly highly skilled programmers, imagine in 2 years' time with people who've never programmed!
What have you tried to use them for?
And for anything really serious? Opus 4.5 struggles to maintain a large-scale, clean architecture. And the resulting software is often really buggy.
Conclusion: if you want quality in anything big in February 2026, you still need to read the code.
That goes a bit against the article, but it's not reading code in the traditional sense, where you are looking for the common mistakes we humans tend to make. Instead you are looking for clues in the code to determine what you should improve in the docs and specs you fed into your agent, so that the next time you run it, chances are it'll produce better code, as the article suggests.
And I think this is good. In time, we are going to be forced to think less technically and more semantically.
https://steve-yegge.medium.com/welcome-to-gas-town-4f25ee16d...
Is that a nano banana tendency, or was it intentional?
Here's the prompt I used, actually:
Create a vibrant, visually dynamic horizontal infographic showing the spectrum of AI developer tools, titled "The Shift Left"
Layout: 5 distinct zones flowing RIGHT TO LEFT as a journey/progression. Use creative visual metaphors — perhaps a road, river, pipeline, or abstract flowing shapes connecting the stages. Each zone should feel like its own world but connected to the others.
Zones (LEFT to RIGHT):
1. "Specs" (leftmost) - Kiro logo, VibeScaffold logo, GitHub Spec Kit logo
Label: "Requirements → Design → Tasks"
2. "Multi-Agent Orchestration" - Claude Code logo, Codex CLI logo, Codex App logo, Conductor logo Label: "Parallel agents, fire & forget"
3. "Agentic IDE" - Cursor logo, Windsurf logo Label: "Autonomous multi-file edits"
4. "Code + AI" - GitHub Copilot logo Label: "Inline suggestions"
5. "Code" (rightmost) - VS Code logo Label: "Read & write files"
Visual style: Fun, energetic, modern. Think illustrated tech landscape or isometric world. NOT a boring corporate chart. Use warm off-white background (#faf8f5) with amber/orange (#b45309) as the primary accent color throughout. Add visual flair — icons, small illustrations, depth, texture, but don't make it visually overloaded.
Aspect ratio: 16:9 landscape
The answer is clear: I didn’t write the code, I didn’t read it, I have no idea what it does, and that’s why it has a bug.
So basically a return to waterfall design.
Rather than YOLO planning (agile), we go back to YOLO implementation (farming it out to dozens of replaceable peons, but this time they're even worse).
What it's trying to express is that the (T)PM job should still be safe, because they can just team-lead a dozen agents instead of software developers.
Take with a grain of salt when it comes to relevance for "coding", or the future role breakdown in tech organizations.
I'm not trying to express that my particular flavor of career is safe. I think that the ability to produce software is much less about the ability to hand-write code, and that's going to continue as the models and ecosystem improve, and I'm fascinated by where that goes.
So humble. Who is he again?
> Senior Technical Product Manager
yeah i'd wager they didn't read (let alone write) much code to begin with..
This blog post is influencer content.
Which is perhaps what they should do, of course. Any transition is a chance to get ahead and redefine yourself.
Agents are a quality/velocity tradeoff (which is often good). If you can't debug stuff without them, that's a problem, as you'll get into holes - but that doesn't mean you have to write the code by hand.
Note though we're talking about "not reading code" in context, not the writing of it.
Recently I picked a smallish task from our backlog. This is some code I'm not familiar with, frontend stuff I wouldn't tackle normally.
Claude wrote something. I tested, it didn't work. I explained the issue. It added a bunch of traces, asked me to collect the logs, figured out a fix, submitted the change.
Got a bunch of linter errors that I don't understand, which I copied and pasted to Claude. It fixed something, but I still got lint errors, which Claude dismissed as irrelevant. Meanwhile, I realized I wasn't happy with the new behavior.
After 3 days of iteration, my change seems ok, passed the CI, the linters, and automatic review.
At that stage, I have no idea if this is the right way to fix the problem, and if it breaks something, I won't be able to fix it myself as I'm clueless. Also, it could be that a human reviewer tells me it's totally wrong, or asks me questions I won't be able to answer.
Not only was this process not fun at all, but I also didn't learn anything, and I may have introduced technical debt which AI may not be able to fix.
I agree that coding agents can boost efficiency in some cases, but I don't see a shift left of IDEs at that stage.
Code health is a choice. We have power tools now. All you have to do is ask.
You may read all the assembly that your compiler produces. (Which, awesome! Sounds like you have a fun job.) But I don't. I know how to read assembly and occasionally do it. But I do it rarely enough that I have to re-learn a bunch of stuff to solve the hairy bug or learn the interesting system-level thing that I'm trying to track down if I'm reading the output of the compiler. And mostly even when I have a bug down at the level where reading assembly might help, I'm using other tools at one or two removes to understand the code at that level.
I think it's pretty clear that "reading the code" is going to go the way of reading compiler output. And quite quickly. Even for critical production systems. LLMs are getting better at writing code very fast, and there's no obvious reason we'll hit a ceiling on that progress any time soon.
In a world where the LLMs are not just pretty good at writing some kinds of code, but very good at writing almost all kinds of code, it will be the same kind of waste of time to read source code as it is, today, to read assembly code.
Compilers predictably transform one kind of programming language code to CPU (or VM) instructions. Transpilers predictably transform one kind of programming language to another.
We introduced various instruction architectures, compiler flags, reproducible builds, checksums exactly to make sure that whatever build artifact that's produced is super predictable and dependable.
That reproducibility is how we can trust our software and that's why we don't need to care about assembly (or JVM etc.) specifics 99% of the time. (Heck, I'm not familiar with most of it.)
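That determinism is checkable in one line, which is exactly what makes it trustworthy. A sketch (the artifact path and pinned digest are hypothetical):

    import hashlib

    # A reproducible build means anyone can rebuild the artifact and get
    # a byte-identical file, so a pinned digest is a real guarantee.
    EXPECTED = "hypothetical-pinned-sha256-digest"  # published by the project

    with open("dist/app-1.0.0.tar.gz", "rb") as f:
        actual = hashlib.sha256(f.read()).hexdigest()

    assert actual == EXPECTED, "artifact does not match the pinned digest"
    # No analogous check exists for prompt -> code: run the same spec
    # through a model twice and you get two different programs.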
Same goes for libraries and frameworks. We can trust their abstractions because someone put years or decades into developing, testing and maintaining them and the community has audited them if they are open-source.
It takes a whole lot of hand-waving to traverse from this point to LLMs - which are stochastic by nature - transforming natural language instructions (even if you call it "specs", it's fundamentally still a text prompt!) to dependable code "that you don't need to read" i.e. a black box.
But with the AI tools we're not yet at the wave of "sometimes it's good to read the code" virtue-signaling blog posts that will make the front page next year or so; we're still at the "I'm the new hot shit because I don't read code" moment, which is all a bit hard to take.
Even in those environments, I'd argue that AI coding can offer a lot in terms of verification & automated testing. However, I'd probably agree, in high-stakes safety environments, it's more of a 'yes and' than an either/or.
What’s worth paying for is something that is trustworthy.
Claude Code is a perfect example: they blocked tools like opencode because they know quality is the only moat, and they don’t currently have it.
Also, the generated picture in this post makes me want to kick someone in the nuts. It doesn't explain anything.
Is the image really not that clear? There are IDE-like tools that all are focusing on different parts of the Spec --> Agent --> Code continuum. I think it illustrates that all right.
the constant asking drives me crazy
9/10 my AI-generated code is bad before my verification layers; 9/10 it's good after.
Claude fights through your rules. And if you code in another language, you could use other agents to verify the code.
This is the challenge now: effectively verifying the code. Whenever I end up with a bad response, I ask myself what layers I could set up to stop the AI as early as possible.
Also things like naming, comments, tree traversal, context engineering, even data structures and multi-agent setups. I know it sounds like buzzwords, but these are the topics a software engineer really should think about. Everything else is frankly cope.
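One cheap layer of that kind, as a sketch - the banned patterns are just examples of agent failure modes I'd screen for, and src is a placeholder:

    import pathlib
    import re
    import sys

    # Patterns that usually mean the agent punted: stubbed logic,
    # swallowed errors, leftover placeholders. Tune to your codebase.
    BANNED = [
        r"TODO: implement",
        r"except Exception:\s*pass",
        r"#\s*placeholder",
    ]

    failures = []
    for path in pathlib.Path("src").rglob("*.py"):
        text = path.read_text()
        failures += [(str(path), pat) for pat in BANNED if re.search(pat, text)]

    for path, pat in failures:
        print(f"{path}: matches banned pattern {pat!r}")
    sys.exit(1 if failures else 0)

Run it as the first gate, before lint and tests; it catches the most common punts in milliseconds.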