I don't know if I agree with either assertion… I've seen plenty of human-generated knowledge work that was factually correct, well-formatted, and extremely low quality on a conceptual level.
And AI signatures are now easy for people to recognize. In fact, these turns of phrase aren't just recognizable—they're unmistakable. <-- See what I did there?
Having worked with corporate clients for 10 years, I don't view the pre-LLM era as a golden age of high-quality knowledge work. There was a lot of junk that I would also classify as a "working simulacrum of knowledge work."
Most importantly, those sources of errors tend to be consistent. I can trust a certain intern to be careful but ignorant, or my senior colleague with a newborn daughter to be a well of knowledge who sometimes misses obvious things due to lack of sleep.
With AI it's anyone's guess. They implement a paper in code flawlessly and make freshman-level mistakes in the same run. So you have to engage in the non-intuitive task of reviewing on the assumption of total incompetence, for a machine that shows extreme competence. Sometimes.
AI signatures don't mean low quality; they just mean AI. And humans do use them (I have always used the common AI signatures). And yes, humans produce good-looking garbage, but much more commonly they produce bad-looking garbage. This is all tangential to the point.
It is valuable to have this, because if the work passes the first check, then it's easier to identify the actual problems. Same reason we fix code quality and lint style before reasoning about the actual logic being written.
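To make the analogy concrete, here's a minimal sketch of that "cheap checks first" gate; the tool choices (ruff, pytest) and the script itself are illustrative assumptions on my part, not anything from the thread:

```python
import subprocess
import sys

# Minimal sketch: run fast, mechanical checks (lint, formatting) before the
# expensive ones, so human review starts from a clean surface.
CHEAP_CHECKS = [
    ["ruff", "check", "."],              # lint: catches surface-level noise
    ["ruff", "format", "--check", "."],  # formatting: mechanical, no judgment
]
EXPENSIVE_CHECKS = [
    ["pytest", "-q"],                    # behaviour: slow, needs attention
]

def all_pass(stage):
    # Short-circuits on the first failing command in the stage.
    return all(subprocess.run(cmd).returncode == 0 for cmd in stage)

if __name__ == "__main__":
    if not all_pass(CHEAP_CHECKS):
        sys.exit("surface checks failed; fix these before any logic review")
    if not all_pass(EXPENSIVE_CHECKS):
        sys.exit("tests failed")
```

The point of the ordering is the same as in the comment: filter out the mechanical noise cheaply so the scarce resource (human attention) is spent only on logic.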
You might spot these very obvious constructs and still miss 99% of AI-generated text because it has no tells. Yet you don't know that 99% was generated, and since you spot 100% of the patterns you outlined, you think no AI-generated text makes it past you.
Yes, I don't think this matters. Much of "knowledge work" was always a proxy for something else.
High quality in terms of typos and errors is mainly a signal of respect in a similar way to wearing ironed white shirts with neck-ties. "Walls of text" that no one is expected to read in depth. Basically a symbolic demonstration of sacrifice and subservience (or something). LLMs remove this mode of signalling.
If quality of content wasn't examined before, it was probably never particularly important.
This is especially true if we start to see more of a split in usage between LLMs based on cost. High quality frontier models might produce better work at a higher cost, but there is also economic cost pressure from the bottom. And just like with human consultants or employees, you’ll pay more for higher quality work.
I’m not quite sure what I’m trying to argue here. But the idea that an LLM won’t produce a low quality report just seemed silly to me.
Working in a team isn't adversarial: if I'm reviewing my colleague's PR, they are not trying to skirt around a feature or cheat on tests.
I can tell when a human PR needs more in depth reviewing because small things may be out of place, a mutex that may not be needed, etc. I can ask them about it and their response will tell me whether they know what they are on about, or whether they need help in this area.
I've had LLM PRs be defended by their creator until proven to be a pile of bullshit; unfortunately, only deep analysis gets you there.
Putting a high level of polish on bad ideas is basically the grifter playbook. Throughout the business world you will find workers and entire businesses who get their success by dressing up poor ideas and bad products with all of the polish and trimmings associated with high quality work.
You wouldn't use a calculator that is as good as a human and makes mistakes as often.
We're cargo-culting "the manager view". Like the critique you can read on Bret Devereaux's blog about Game of Thrones being written from an elite's point of view, it's utopian and sounds good... for the elites, the people who benefit from the hard work they never have to do themselves. But like any elite bubble wildly disconnected from reality, this one will end badly. Maybe French Revolution bad, when the answer to the masses of unemployed "displaced" by AI screaming "we can't get a piece of bread to eat" is "let them eat cake instead".
i'd say it's been a huge distraction for him, and the obsession over using LLMs for Big Wikiz hasn't yielded anything near what he thought the tech was for. on a few occasions now he's learned the hard way how imperfect the technology is.
between that and everyone's grand visions for agentic workflows, i've mostly just receded into being one of the few who is still regularly delivering stuff. i'm using AI to speed my delivery up quite a bit, i'm just not wasting my time taking it on some big grand adventure. the irony is that a lot of people pushed back on companies that wanted to implement chatbots, and now they spend most of their credits/tokens making their own chatbots by collecting six trillion .md files and adding skill files.
my real takeaway is this: i've come to reason that there is some sort of loss in actual, real institutional knowledge when we attempt to take shortcuts to growing the breadth of our own knowledge. i don't mean "hey claude give me some examples of how companies typically design x to solve for y" or "golang is new to me, what are the benefits of a compiled language versus something that requires a runtime?".
no, i'm talking about these kinds of questions:
"/somePersonalBigWikiProjectInvokedBySkill.md claude review our current tooling and infrastructure, how can we 5x our deployment speed, then search the web for <some SaaS company> and put a proposal together to get it implemented at the organization and include a 5 year cost benefit analysis and ... "
i look around and it feels like everyone is nerfing themselves. that latter question? people are just sending claude proposals left and right. my eyes have completely glazed over. is it really that hard to do some digging yourself? we're already ceding the ability to just go grab an architect or senior engineer and ask him what he thinks about how <some SaaS company> will fit with the broader suite of technologies and visions on the horizon. we're just skipping the pieces where we do a little discovery together and work together on an outcome. we're walking away with surface level understanding of many things.
this clearly has visible impacts on how we engage with each other. there's something there that I'm noticing and don't have the words for. it's mostly that people are less able to explain what it is they're talking about when pressed for deeper details, but also everyone's behavior is now different, because AI sort of... makes them feel like they have definitive answers/strategies, and they're no longer willing to have their ideas challenged. they no longer see that as a learning experience, a chance to learn from someone with wisdom who is already a walking wikipedia on something. the perfect technology for people who hate when someone with way more experience than them says "maybe not a good idea, and here's why"
i've met some interesting people who are just... walking encyclopedias on some or many domains. incredibly smart people who have so much knowledge and wisdom and so many years of experience not just with tech but with people and failures and successes. i don't doubt for a second that the human brain is capable of holding an unbelievable index of information in a natural way that marries well with decision making processes that come from experience. i'm not sure what gap people are trying to close building themselves some proverbial great library here, but i would encourage people to just sit back and trust that their brain is still one of the greatest technologies at their disposal.
> i'm not sure what gap people are trying to close building themselves some proverbial great library here, but i would encourage people to just sit back and trust that their brain is still one of the greatest technologies at their disposal.
Culturally I think this is going to fuck things up significantly. If I take the time to read all of the latest papers in the LLM space, I'm damn well not going to summarize it or document what I've learned for anyone. (Maybe this is why there are not many high quality books aggregating all of this information in all the latest papers, all of the advancements, etc. All the people doing this work would rather (smartly) milk the cash cow and maintain the information asymmetry.)
Or think about open source: this will kill it for people trying to make money off a product while keeping it open source, because someone could spin up a competitor overnight.
AI is going to make the information easier to acquire for cheap. But it's going to absolutely destroy the incentive structure and trust required to have an open exchange of information. It was already bad enough because the industry is not incentivized to produce quality literature for educational purposes like academia is. But after this, it'll be a complete shit show
> this clearly has visible impacts on how we engage with each other
> there's something there that I'm noticing and don't have the words for.
Welcome to ASI takeoff!
This is actually a good idea because it's a very cheap way to build your own industrial-strength search engine. We've forgotten how cool search engines are because Google's is so shit now.
(Although you don't need Claude, you can self-host this with minimal effort now.)
But if you are trying to understand something well, there is no better tool for helping you than AI.
The first issue is this result from reinforcement learning that tells you that you really want to be doing a large fraction of stuff on-policy when possible.
It's true of RL agents, but I think it's actually just a universal learning result that applies to humans. Sure you could ask AI to solve a difficult math problem step by step, and what it can expose you to is tricks you had no idea about and the general method of solving such a problem.
But there is something about the work that you produced without external influence (the on-policy episode) that is sort of irreplaceably important.
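For anyone unfamiliar with the term, here's a toy sketch of what "on-policy" means in that first issue, using a bandit problem; the environment and update rule are illustrative assumptions, not anything claimed upthread:

```python
import random

# Toy illustration of "on-policy" data: the learner improves from episodes
# its *own* policy generated, not from someone else's worked solutions.
ARMS = [0.2, 0.5, 0.8]      # hidden reward probabilities of a 3-armed bandit
values = [0.0, 0.0, 0.0]    # the agent's running value estimates
counts = [0, 0, 0]

def policy():
    # epsilon-greedy: mostly exploit current estimates, sometimes explore
    if random.random() < 0.1:
        return random.randrange(len(ARMS))
    return max(range(len(ARMS)), key=lambda a: values[a])

for _ in range(10_000):
    a = policy()                                 # the agent's own choice
    r = 1.0 if random.random() < ARMS[a] else 0.0
    counts[a] += 1
    values[a] += (r - values[a]) / counts[a]     # incremental mean update

print(values)  # estimates only sharpen for arms the policy actually played
```

The analogy to humans is the parent's point: the updates come from attempts your own policy generated, which is different in kind from reading a worked solution someone else produced.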
The second is that there is something about the speed and conciseness of information AI presents to you. It seems like a super power but there are two problems I have with it.
A) It's too fast. Unless you are artificially slowing yourself down by reading like one sentence per minute, there is something about how quickly everything you want gets presented to you that seems to have a strong in-one-ear-out-the-other sort of effect. You need to slow down. You need to appreciate the details.
B) It's also often too concise. There is something about doing research yourself that lets you stumble upon something new that you might not have thought was helpful. I've often found amazing nuggets in missteps and tangents.
There are more issues as well, but these are the major two I get concerned about. Like you need to be cognizant of the work not being done when you are using AI to do research. And imo it's deeply problematic for young students who have literally never done the hard work of trying to answer questions themselves. Because they might not realize the problem.
Could not disagree more.
The best way to understand something deeply is to practice it. AI is anti-practice. It's like trying to learn something by following a YouTube video step by step. It has an outcome and it feels productive, but it's not going to stick in your head at all. It's not practice.
am I losing out on something by not having to spend hours clicking through redundant parts of a large codebase to get a concrete answer on something? doesn't feel like it
It is not so much that the "tells" of poor-quality work are vanishing, but that even careful scrutiny of work done with AI is going to become too costly to be done only by humans. One only has so much time to read when, say, in economics journals, the appendices extend to hundreds of pages.
Would love to hear if other fields' journals are experiencing a similar pressure, not only at the extensive margin (number of new submissions) but also at the intensive margin (effort needed to check each work).
`simulacrum` is a great word, gotta add that to my vocabulary.
We're creating forces bigger than ourselves, and we may reach a point of no return.
It's a weird space in middle management where all of the incentives other than true competency in the role push you to abstract the knowledge work that you're managing, and that abstraction seems to be well describable in embedding space.
It's not quite as dire as this. One of the main reasons why LLMs are getting better over time is that they are themselves used to bootstrap the next generation, by sifting through the training set to do 'various things' to it.
People often forget that the training corpus contains everything humanity ever produced, and anything new humanity will produce will likely come from it as well. Torturing it with current-generation models is among the most productive things you can do to improve the next-generation systems.
Aligned with the theory of Bullshit Jobs: LLMs expose the fact that the white-collar work most of us have been doing at this point was actually bullshit. When LLMs "fake" work, it actually hides the reality that there was no meaningful work here in the first place.
Verifying the correctness of solutions is often much easier than finding correct solutions yourself. Examples: Sudoku and most practical problems in just about any field.
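A minimal sketch of that asymmetry in code, assuming a standard 9x9 grid: verifying a completed Sudoku is a handful of linear scans, while solving the general n²×n² version is NP-complete.

```python
# Checking a filled 9x9 Sudoku grid: every row, column, and 3x3 box must
# contain exactly the digits 1..9. Verification is cheap and mechanical.
def is_valid_sudoku(grid: list[list[int]]) -> bool:
    def ok(cells):
        return sorted(cells) == list(range(1, 10))

    rows = all(ok(row) for row in grid)
    cols = all(ok([grid[r][c] for r in range(9)]) for c in range(9))
    boxes = all(
        ok([grid[r][c]
            for r in range(br, br + 3)
            for c in range(bc, bc + 3)])
        for br in (0, 3, 6) for bc in (0, 3, 6)
    )
    return rows and cols and boxes
```

A candidate solution passes or fails this check in microseconds; producing a solution from a sparse puzzle is the hard direction.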
-
"The training doesn't evaluate 'is the answer true' or "is the answer useful.'"
Let's pretend RLVR does not exist, to give this argument a chance. Then, while I guess the training loop does not validate accuracy directly, the meta-training loop still does. When someone prompts a model, the resulting execution trace shows whether the generated answer is correct or not, and this trace is kept for subsequent training runs. The way coding agents are used productively is not: a) generate code with AI and b) run it yourself; it's a) ask the AI to do something, including generating the code and running it too, with no step b. This naturally creates large training sets of correct and incorrect solutions.
-
"We spent billions to create systems used to perform a simulacrum of work."
Have you even tried using these systems to produce valuable work? How could this possibly be your conclusion after having tried them?
Honestly, this has not been my experience at all. Defining what a good solution looks like is most of the battle in Operational Research. But, trying to be constructive, maybe we have identified a sort of dividing line between areas where AI is more vs. less helpful.
>Have you even tried using these systems to produce valuable work? How could this possibly be your conclusion after having tried them?
The operative words there are "used to", as opposed to "only able to". The conclusion isn't derived from using the tools; it's from observing how other people tend to use them.
In order to verify correctness you need to understand what correctness is in context, which is actually pretty hard to do if you can't actually find correct solutions yourself, or even if you can but haven't bothered to do so.
Why is that not an embarrassment for everyone who moans and carps and complains about the craft?
If someone was already evaluating the work output using a metric closer to the underlying quality then it might not have been a big shift for them (other than having much more work to evaluate).
You could, however, only do that if you were fine with unfairly judging the quality of work, as you'd now be readily discarding quality work based on superficial proxies. Which, admittedly, is done in a lot of cases.
Reinforcement Learning with Verifiable Rewards (RLVR) to improve math and coding success rates seems like an exception.
Yes.
This does not however mean that progress is not being made.
It just means the progress is happening along such dimensions that are completely illegible in terms of the culture of the early XXI century Internet, which is to say in terms of the values of the society which produced it.
For most tasks, the complexity/time required to verify a task is << the time required to do the task itself. Sure, there can be hallucinations in the graph that the LLM made, but LLMs are hallucinating much less than before, and the time to verify is much lower than the time required for a human to do the task.
I wrote a post detailing this argument https://simianwords.bearblog.dev/the-generation-vs-verificat...
Are LLMs a good dictionary of synonyms? Perhaps, but is it relevant? Not at all.
Are you biased when a solution is presented to you? Yes, like all humans.
Is it damaging when said solution is brain-dead? Obviously.
Are you failing to understand that most (if not all) of a manager's work is human-centric and, as such, cannot be applied to a non-human? Obviously...
You trust a machine's intent. Joke's on you: it has no intent at all; it will break that "trust" you pour into it without you even realizing it.
You say that the LLM does a better job than you. Perhaps that says it all?
I can see a similar problem with this article: the author notices that LLMs produce a lot of errors, then concludes that they are useless and produce only a simulacrum of work. The author has an interesting observation about how LLMs disrupt the way we judge knowledge work. But when he concludes that LLMs do only a simulacrum of work, this is where his argument fails.
Wait, you're probably talking about the test of discarding a report based on something superficial like spelling errors, which fails with LLMs due to their basic conman personalities and smooth talking. And therefore...?
This is not true as stated. I'd try to gloss over the absolutes relative to the context, but if I'm totally honest, I'm not sure I understand what idea you're trying to communicate.