That's your idea, not the one they are going with.
Their idea is that you pay a fee to access any information that was freely available.
Your idea is tearing down fences, their idea is gatekeeping. The two ideas are incompatible.
An LLM containing the information doesn’t take away from the book being available at the library.
It’s an additional way to access the information. A company charging a fee for it doesn’t stop you from going to the library if you want to.
> Your idea is tearing down fences, their idea is gatekeeping. The two ideas are incompatible.
You act like the parent commenter is permanently stealing the book from the library and gifting it to a private training set.
Information being available from more places, even if some are paid, doesn’t mean gatekeeping.
There are also open-weight LLMs that can be run locally. Some of these are being fine-tuned for specific topics on topical datasets, which is opening up even more interesting opportunities (this is exactly what the linked article is about).
[0] among other things…
[1] more like ‘often not at all’
So should the original authors, no? That is, get a share of that payment.
Something akin to the German GEMA could work, an entity that levies a usage fee on behalf of all copyright holders and re-distributes to its members, but on a global scale.
Should they? Yes. Will they?
Well, do LLM model builders pay for any copyrighted work so far?
I was thinking along the lines of concepts that already exist, such as the private copying levy [0]. It basically forces a blanket tax on a certain class of products, which then gets redistributed to members of a collecting society such as GEMA [1].
This way, you would force LLM model builders to effectively pay a tax by law. Since these models do not work at all without underlying content, make it proportionate. Let's say 50-70% to make it fair.
[0] https://en.wikipedia.org/wiki/Private_copying_levy
[1] https://en.wikipedia.org/wiki/GEMA_(German_organization)
And that will eventually be distilled into open-weight models.
There are plenty of open models you can download today and run. No gatekeeping. No fencing.
This whole "AI is evil" trope is getting a bit tired.
It’s the same as asking “should you release open source software knowing that AI companies are training on it?” I could not care less; that’s not why I release my software to the public at all.
I rotate through the libraries near me with my kids.
They are every bit as busy now as I remember them being when I was a kid.
A recent executive order prohibits libraries (among other non-profits) from processing US passport applications. While county clerks (in my state) along with a small number of post office locations also offer this service, the libraries were doing it for free as opposed to charging $50-ish (like the post office or county clerks).
Why might the passport issue be important? The SAVE Act (passed the House of Representatives last year and sitting before the Senate) only permits 4 identification items to register to vote for Federal elections:
1 - A US Passport (costs about $100 to renew, about $150 for first time).
2 - A US Military ID that has proof of US citizenship (CAC cards show this with a white background behind your name - yellow or blue for contractors or non-US citizens). IDs for retirees don't show citizenship.
3 - A REAL ID compliant driving license that has proof of US citizenship. Also called "Enhanced Driving License", on the front it has a US flag and the back looks like the page on your passport with those funny letters. Only 5 states offer this as an extra $30-40 on top of the regular driving license fee.
4 - A REAL ID compliant driving license/ID and certified birth certificate and the names must match exactly. This means that 74 million women who took their husbands' name will not be voting in Federal Elections. Also, no transgender people can vote.
The SAVE Act also requires voter registration agencies to send voter rolls to DHS every month. And every month DHS can throw people off the voter rolls with no warning, notice, or recourse. One can easily imagine this being done right before elections, with people who registered for the "wrong" political party thrown off the rolls after the registration deadline.
Project 2025 wants to repeal the 19th Amendment. Throwing 74 million women off the voter rolls is just a start.
Links:
SAVE Act text - https://www.congress.gov/bill/119th-congress/house-bill/22/t...
https://www.congress.gov/bill/119th-congress/house-bill/22/t...
2. The Chinese have been investing a lot into free models; they're perfectly good and keep improving, despite the best efforts of the US. They're even ramping up to make their own hardware. Gemma 4 is pretty snappy too. It doesn't seem like there is much of a moat to this. My guess is there will be perfectly good local models if you want to avoid AI companies.
When the person paying the money is rich, the other thing they are foregoing is typically not a life necessity. When the person is poor, however, it typically is.
I highly encourage you to go and update your priors.
That's what the money pays for when the comment above mentions 'that you might have to eventually pay an AI company a large amount of money to ask ChatGPT such a question'.
Putting aside that it won't be a large amount of money for any particular query, that's how the AI companies see themselves: not as providers of information, but as providers of mechanisms that provide information. They are not selling the information of others; they aren't selling information at all. They are selling the service of running the mechanism.
I'm always going to have a machine anyway—might as well max out the RAM when I purchase another.
(And so I too jumped on the Mac mini bandwagon a month or two back: 64 GB. I'm enjoying pulling down the new models and putting them through their paces.)
It doesn't look like they have a way to filter down to "open" models. By this of course I mean "downloadable, local models".
I suppose if you know the "family" (Gemma, Qwen, etc.), you can just go to those models and test…
I've simply been pulling down what is popular from the LM Studio front end (and what runs on my hardware) and testing in situ.
> while the library itself has lost funding
Libraries are inherent parts of universities. While their precise role evolves, do you think that they will just be done away with? Already a substantial amount of scholarship in disciplines other than my own has moved online (legally), and the library is still there.
Clarification:
Maintaining the library still requires resources and effort. It only appears to need no funding because the donors of said disk space, bandwidth, and dev effort are subsidizing it in aid of a goal they believe in (i.e. the church model).
There are plenty of free models with RAG support. Why do you believe everything starts and ends with a major corporation charging a subscription?
If the obscure book/text is permanently lost forever under your stringent advice of "no stealing under any circumstances", would the "stealing" have saved it? If so, is it ethical to prevent others from accessing the book/text, under your guise of "preventing stealing"?
Second, it is totally legal to read the book in a public library, for free, right now.
Third, laws can change. Current copyright law was pushed by one company (Disney) to 90+ years, to its benefit, and can be redesigned or pushed back by AI companies, for their benefit.
A 2 year copyright duration sounds like a good compromise.
By quoting your comment in my reply, have I "stolen" your comment?
"Deal with the ethics", seriously? You might want to learn about how heavily shadow libraries are used across academia now. It’s no longer just disadvantaged scholars in the developing world relying on pirated scans because they don’t have good libraries. It’s increasingly everyone everywhere, because today’s shadow libraries can be faster and more convenient than even one’s own institution’s holdings. At conferences, if the presenter mentions a particularly interesting publication, you can sometimes watch several people in the room immediately open LibGen or Anna’s Archive on their laptop to download it right there and then.
I've published a couple of novels. They've sold far better than average, and yet not sold enough to be remotely worth it if I did it for the money. Piracy might have made a tiny dent, but the many millions of competing novels matters far more.
Anyone who has self published will have experienced that it is hard to even get people to read (as opposed to just download to hoard) your work even for free.
It's more comfortable to blame piracy, though.
If you're writing for money, maybe. If you're writing for the love of writing, it won't.
Moreover, you hear of authors who encourage their books to be made available without DRM, and who know about, or quietly encourage, their books ending up on torrent / library sites. They want their books to be read.
He didn't mention legality. The world is rigged, as you can see from a head of state participating in both the running and the cover-up of history's largest CSE. Watch what people are doing in addition to what they are saying.
I for one am tremendously thankful for TFNA's efforts, since I get access to knowledge that I wouldn't have been able to before.
Separately, laws aren't always sensible or right: slavery was legal, child marriage was legal, not paying taxes on billions of profits is legal while not paying taxes of £1000 is illegal, reporting Jews to Nazis was mandatory, etc., etc.
Because you don't try, which says more about you than OP. It's a major problem with society.
It is some publishers who would object on copyright grounds. But I get the sense that some publishers are already becoming resigned to the fact that most of their new ebook releases are ending up on the shadow libraries within only a few weeks, and Anna’s Archive has become the first place to look (even before one looks at whether one’s own institutional library has the book) for researchers around the world.
https://en.wikipedia.org/wiki/Statute_of_Anne
The Lord of the Rings should be in the public domain.
The original Harry Potter book should've been in the public domain.
Star Wars should've been in the public domain.
Everything from before 1998 should've been in the public domain by now, but isn't.
Authors should use other ways to charge for their 40/80 hours of work, and when released it should be in the public domain.
Scientists have learned to do it (by getting tenure or postdocs); I'm sure others can do it.
I'm not expecting to be "passively paid" for my hobbies. But I'm expecting that someone won't steal and profit from the things I make. Why would that be fair?
So I took it, and I put it in my pile of completed works: a pile of crumbling statue rubble by the roadside. In the digital case, maybe it was posted online and the pile is a timeline or portfolio.
Someone in a pickup truck drives by, sees it, takes it, and sells it for half a million dollars to a trust fund baby.
Was the output of my work, and therefore the half a million dollars, stolen from me?
If there was nothing physical to take, and I had never tried to or successfully sold it to anyone and somebody else does it, was I stolen from or did I just fail to sell?
And then if I get my knickers in a twist over that sale I have to ask myself: is my hobby to be a sales person and to sell art or is my hobby to be an artist and make art?
From what I can tell, the million dollars does not belong to me, but the person who stole my statue should be held accountable for stealing it, and they should be held accountable for selling an object that did not belong to them. Both of those things should be illegal.
Your analogy seems beside the point, though. In my original question, I specifically mentioned that the art piece was made for fun. Sometimes, as an artist, I make things just for fun, with no intention to ever sell them. Other people should not be able to take the things I make and sell them without my permission, or without some sort of deal being struck between me and the seller. A society where it's legal to take another artist's work and sell it without their permission would be supremely unfair. A society for vultures.
The advancement of technology would take off if we did not have patent trolls telling us what we could and could not use, understand, and improve.
Just imagine what Palworld could be if it didn’t have to spend the last year or two (however long it's been) spending all of its budget fighting a patent case against the biggest fucking gaming company in the world instead of paying its developers to add new features and pals.
Imagine what crazy awesome intense games could be made with the nemesis system.
The patent is the coward’s bargain.
AI already gives IQ to the braindead masses. And now people want to abolish copyright, so the totally unworthy, useless nobodies could leech other's hard work and profit from it. Get a grip.
Maybe do not make games if you have zero ideas yourself. Go and mow the lawn if that's all you are capable of.
What comes next? Perhaps it won't be that hard to assemble a proprietary licensed corpus and get decent performance out of it. Look at all the people already willing to license their voices.
Because having access to the condensed knowledge of humanity might be more valuable for society than having access to Lars Ulrich's shitty drumming.
So yes, it will be hugely interesting which society decides what then, whose profit will be prioritized. And societies won't easily find good answers.
Under the current copyright regime, nothing's stopping you from condensing that knowledge yourself and publishing in the public domain. But that would be a lot of work for you, wouldn't it? And I suppose you'd rather do work you'd get paid for.
When society decides AI slop will be the only item on the menu, then copyright will die.
I deliberately phrased it that way, channeling myself as the kid who actually found his drumming valuable but didn't have the money to buy (all of) it, and who was annoyed at society deciding I should not have it.
So I still don't have the answers but the stakes have certainly gotten bigger.
With the Chinese in the mix it won't stop AI. It will probably change copyright.
file sharing became far less popular and ubiquitous as a result of their popularity.
they tweaked the model: originally users downloaded a temporary copy from central servers instead of p2p, then later users rented licensed copies of media instead of pirated copies.
i’m tired of seeing this as an argument on HN: that because something didn’t hit 100%, it was a failure and not worth doing or something.
the fact that a limited subset of people still do filesharing is not evidence that the napster case had no effect.
(spotify didn’t exactly start out squeaky clean with how they built out their repertoire iirc).
(apologies for early edits. i just woke up.)
And Soulseek is still known as the P2P source where you can find all kinds of obscure music.
The point is: When Napster was around, everyone was running it all the time from their dorm rooms; it was ubiquitous. Now most people run something like Spotify or Netflix instead; piracy is niche, streaming is ubiquitous.
Notably, Spotify did not exist and Netflix did not stream video until long after the Napster suit.
Wow, TIL. Do you happen to know if IRC file sharing of obscure music is still a thing?
I have it running basically all the time...
Nothing special. Things will go the way they go now. Why wouldn't they? It's not like your hypothetical lawsuit will make all LLM output illegal.
Today, by following LLM output blindly, you can:
- erase your whole disk
- delete your company's production database
- literally kill yourself or other people
Do you think adding "violate some NYT copyright" to the list will change the grand scheme?
Claude responded: hobbit. hobbit. Not a nasty, dirty, wet hole, filled with the ends of worms and an oozy smell, nor yet a dry, bare, sandy hole with nothing in it to sit down on or to eat: it was a hobbit-hole, and that means comfort.
That's the famous opening of J.R.R. Tolkien's The Hobbit (1937). Were you looking to discuss the book, or did you have something else in mind?
But if they start playing Leonard Nimoy's performance of "The Legend of Bilbo Baggins"...
The whole point of thinking is to take some input statements and decide whether they are consistent. Or, project them onto a close but consistent set of statements. (Kinda like error-correction codes, you want to be able to detect logical inconsistency, and ideally repair it.)
But that implies the set of consistent statements is a subset.
> Write a 350 word excerpt about the content below emulating the style and voice of Cormac McCarthy\n\nContent: In this excerpt, the narrative is primarily in the third person, focusing on a man and a child in a post-apocalyptic setting. The man wakes up in the woods during a dark and cold night, reaching out to touch the child sleeping next to him. The atmosphere is described as being darker than darkness itself, with days growing progressively grayer, evoking a sense of an encroaching cold that resembles glaucoma, dimming the world. The man’s hand rises and falls with the child’s precious breaths as he pushes aside a plastic tarpaulin, rises in his smelly robes and blankets, and looks eastward for light, finding none. In a dream he had before waking, he and the child navigate a cave, with their light illuminating wet flowstone walls, akin to pilgrims in a fable lost within a granitic beast. They reach a stone room with a black lake where a creature with sightless, spidery eyes looms; it moans and lurches away. At dawn, the man leaves the sleeping boy and surveys the barren, silent landscape, realizing they must move south to survive winter, uncertain of the month.
This is just the equivalent of saying that monkeys could write Shakespeare by banging on a typewriter, there's hardly any copyright implications here.
The authors don't test this possibility.
BTW, is Jane C. Ginsburg (one of the authors) https://en.wikipedia.org/wiki/Jane_C._Ginsburg ?
You misunderstand the goal, which is to prove that an unlicensed copy is stored in the system.
It's like a scenario where Alice sent Bob some highly illegal data that's a crime to possess, encrypted with Bob's public key. The police got a warrant and have access to Bob's email but not his secret key. They can prove he has the data (and convict him) if they have a copy, encrypt it again with his public key, and compare the files and find them to be identical. Could they have gotten the unencrypted data from Bob's email? No, but they didn't have to.
If it could produce a close-to-verbatim copy of a work that had not been written when the model was trained, would it still count as a copy?
I feel this would be a continuum that extends in either direction.
Consider the thought experiment of a hypothetically smart model that knew all of an author's work and a detailed background of the author's experiences and psychology. If you ask the model to write a sequel to "Not that Jenny" and it produces a verbatim version of what the author will write next year, does it count as a copy?
Put aside the notion of whether you think this would ever be possible, think of how you would consider the book if you found a model had succeeded in this task.
Going in the other direction you have a model that has been trained in an author's style with very little in the way of knowledge or reasoning, barely more than the ability to speak and an understanding of idioms and structures that the author might use. This can't write a complete novel but it can correctly guess the next word of a novel 99% of the time.
If you have a map of the 1% of words it gets wrong, you can reproduce the novel from a very small amount of information. Would you say that the model contained the novel, or would you say that the word error list was a compressed representation of the novel and the model did not contain the novel?
This is where it gets difficult to quantify what exists 'as a copy' in a generative model.
Surely it would be reasonable for a model to know an outline of what happens in a story. If it knows the outline and style, I don't think that would count as containing the copy. As you increase the ability of the model to infer, and increase the information that it holds to the point that it can reproduce the novel verbatim, does it contain a copy? What if you reduce the ability to infer back to where it was earlier and it can no longer reproduce the novel: does it now not contain the novel? The amount of information about the novel has not been decreased, just its ability to infer, yet it can no longer produce a verbatim copy.
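The "model plus word error list" idea is essentially predictive coding, and a toy version makes the trade-off concrete. This is a minimal sketch under stated assumptions, not how real LLMs store text: a hypothetical bigram table (`train_bigram`) stands in for the style-trained model, trained on text that never contains the target sentence, and the "error list" is a dict of positions where its guess was wrong. All function names here are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(corpus: list[str]) -> dict[str, str]:
    """Hypothetical toy 'style model': map each word to the word that most often follows it."""
    follows: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        follows[prev][nxt] += 1
    # most_common(1) breaks ties by first occurrence in the corpus
    return {word: counts.most_common(1)[0][0] for word, counts in follows.items()}


def predict(model: dict[str, str], prev: str) -> str:
    # Unknown context: the toy model guesses an empty string, which is always wrong
    return model.get(prev, "")


def encode(model: dict[str, str], text: list[str]) -> tuple[str, dict[int, str], int]:
    """Keep only the first word, the length, and every position the model gets wrong."""
    corrections = {
        i: text[i]
        for i in range(1, len(text))
        if predict(model, text[i - 1]) != text[i]
    }
    return text[0], corrections, len(text)


def decode(model: dict[str, str], first: str, corrections: dict[int, str], length: int) -> list[str]:
    """Replay the model's guesses, overriding them at the recorded error positions."""
    words = [first]
    for i in range(1, length):
        words.append(corrections.get(i, predict(model, words[-1])))
    return words


if __name__ == "__main__":
    model = train_bigram("the cat sat on the rug and the cat sat on the step".split())
    novel = "the cat sat on the mat".split()
    first, corrections, length = encode(model, novel)
    print(corrections)  # {5: 'mat'}: one wrong guess out of five
    assert decode(model, first, corrections, length) == novel
```

Here the "compressed representation" is just `first`, `length`, and `corrections`; the better the predictor, the shorter `corrections` gets, which is exactly the ambiguity the comment raises about where the novel actually lives.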
In the end I think the notion of whether the model represents a copy in itself becomes too nebulous to be meaningful. It's like an artist who can draw a copyrighted work from memory. They may be able to commit copyright violation but they themselves are not a copyright violation simply for having the ability.
There are plenty of old books in the public domain already... but I'm not sure what exactly this exercise is supposed to show, since the Kolmogorov limit still stands in the way of "infinite compression".
Yes, but showing that it happens with books in the public domain does nothing to prove that it happens with copyrighted books.
Yeah, maybe it’s time to move on and find ways to benefit yourself and the rest of humanity outside of artificial monopolies and rent seeking. Copyright is dead.
Maybe we can disband the effective altruism cult that helped push it now.
And frankly, if this means the end of copyright: good riddance.
Copyright needs to exist, but we need to go back to its roots.
Everyone forgets that it exists to promote progress. Nothing else. The ability to profit from it exists only to serve those ends.
Anything which does not serve to promote the progress of the arts and sciences should not be protected, and "limited times" never meant "until Walt Disney says so."
If we truly wanted to protect and promote the arts, we would've stuck to the original "14~28 years since publication".
Anthropic (predictably) issued many DMCA takedown requests after the Claude Code leak.
Copyright for me, but not for thee.
Killing copyright would essentially do the same - and if you think clickbait is bad now, removal of copyright would destroy the economic incentive to investing any effort into content.
The current practice of patents is very different. Most patents are not filed by inventors, but by the employers of inventors, and most of those companies do not file patents for the possible revenue that could be generated by licensing, but only to prevent competition in their market. They have absolutely no intention of licensing those patents fairly and without discrimination. Therefore the publication of those patents provides absolutely no benefit to society.
There exists today one class of patents whose purpose is to obtain revenue from licensing, which are the patents that are necessary for implementing various standards, like standards for communication protocols, for video and audio compression and the like.
These patents are the only kind that can provide substantial revenues today, because everybody is forced to use them.
Wherever a patent is not strictly necessary for compatibility with some standard, everybody will choose alternative solutions, even if they are inferior, instead of paying unreasonable licensing fees. There are a lot of useful patents that covered techniques that remained unused until a quarter of a century passed and the patents expired, after which those techniques became ubiquitous.
As patents are implemented today, especially in the USA and in the countries that the USA has successfully blackmailed into updating their patent laws to match the American way, e.g. by allowing patents for software, they are one of the greatest impediments to technical progress, contrary to what was hoped when the patent system was created.
It is likely that this degradation of the purpose of the patent system is closely linked to the shift in patent ownership from individual inventors to big companies that employ inventors.
It’s one thing to reject the specifics of IP laws as currently implemented; it’s another thing to celebrate the dismantling of the entire foundation of open source by for-profit corporate interests who sought to do it for decades.
In other words, it's tyranny. It's intolerable and we can't allow it to continue this way.
"As a result of this change, [copyright] is no longer easy to enforce, no longer uncontroversial, and no longer beneficial"
from https://www.gnu.org/philosophy/copyright-versus-community.en...
Second, when it comes to action, he only argues that copyright should have reduced power, which we can all agree with; he does not appear to argue for the death of copyright. Death of copyright would seem counter-productive, unless it also implied the death of corporate ability to withhold the source from the users and many other things.
You will note that the very text you linked to is copyrighted. There’s a reason for that.
Chesterton's fence. The existence of copyleft is the result of being forced to live within the domain of copyright, not the other way around.
> Getting rid of IP protections also rids us of GPL, which gave us a few things including the most popular OS in the world.
Linux became popular because of the persistent effort of Linus & the Linux community into making the kernel better, not because of copyleft.
> It’s one thing to reject the specifics of IP laws as currently implemented; it’s another thing to celebrate the dismantling of the entire foundation of open source by for-profit corporate interests who sought to do it for decades.
There are similar corporate interests who profit off of hoarding decades-old works so they can charge fees for what should've been in the public domain, under the original durations that should've stayed (28/14 years).
What has resulted from the endless extensions of the original terms has been the societal lobotomization of human creativity, with an untold number of works now being forever lost simply because they were derived from what should've been in the public domain.
Having lived in such a society, and recognizing existing copyright laws as the reason it is creatively in such a state, one should not treat the celebration of its destruction as illogical.
Not at all. It was born thanks to Linus, but it exploded in popularity and gained its contributorship precisely thanks to the promise of GPL that volunteer work will remain for public benefit.
Without the ability to say that, a corporate entity could have taken volunteer work so far, built a closed-source solution on top of it, and ran with it commercially, with no repercussions and with great results.
In fact, we have just that example at hand: Apple. There’s a reason Linux distros are much more popular than BSD, nearly rivaling commercial systems on desktop and far surpassing them in the server world.
> The existence of copyleft is the result of being forced to live within the domain of copyright
Sure, and by that logic the existence of copyright is the result of being forced to live within the current socioeconomic reality.
The existence of copyright hinges on existence of property in general and intellectual property in particular. To eschew that is to propose a stark foundational change to society.
Sure, if we imagine a world where there’s no corporations hiding the source from users, everything belongs to everyone, no one is recognized for their work or has any control over it, etc., we can say that copyright is non-essential. There will be many questions to that reality, of course (for example, what would drive innovation in that world, if not the motivation for recognition and profit), but it has a right to be considered as a thought experiment. It could even be more desirable than the reality we live in!
However, we don’t live in that reality, and what people tend to mean when they propose getting rid of copyright is a half measure—a reality which has nothing in common with the above, which is all the same as now, except with copyright protections removed. Those protections used to be a hindrance to pirates, but now with the advent of LLMs are a massive issue for corporate interests building their new empires on top of our original work.
You yourself then proceed to argue that terms should be limited—as if I would disagree with that!
This is why we see evidence of emotional structures: https://www.anthropic.com/research/emotion-concepts-function
This is why we see generalized introspection (limited in the models studied before people point it out, which they love to): https://www.anthropic.com/research/introspection
Because the most compact way to recreate the breadth of written human experience is shockingly to have analogs to the systems that made it in the first place.
I would be much more convinced about AGI 2027 if someone in 2026 demonstrates one (1) robot which is plausibly as intelligent as a cockroach. I genuinely don't think any of us will live to see that happen.
If there is no copyright, then you can copy things freely.
All that we need after that to realize the GPL ideal is to legally mandate that people have a right to access and modify the source code of the software/hardware they use, i.e. the government needs to mandate that Apple releases the iOS kernel and source code and that iPhones can be unlocked and custom kernels flashed, that John Deere must provide the tractor's source code, that my fridge's manufacturer releases its GPL-violating Linux patches, etc.
You have the right to free speech, the right to a lawyer, and the right to source code. Simply amend the bill of rights.
Think textbooks. Laws. Medicine.
What's the difference? The size of quotation? The exact wording? Surely re-publishing an entire book word for word is piracy. What if I rewrite the whole book slightly? What if I publish just a part? A rewritten part?
Where do we draw the line with humans and why should the line be different with LLMs?
(I don't have answers to those questions)
In particular, you are a legal person who can be sued in civil court if you infringe on copyright. If I ask you "can you help me write a blog about Manhattan?" and you plagiarize the New York Times, then the NYT sues me for copyright infringement, then I would correctly assume you conned me, and you are responsible for the infringement, and I would vindictively drag you into the lawsuit with me. With LLMs it involves dragging in a corporation, much much uglier. Claude is not actually a person and cannot testify in any legally legitimate trial. (I am sure it will happen soon in some kangaroo court.)
See, the line is blurry.
Based on this comment: https://news.ycombinator.com/item?id=47960014 it seems like you are just ignorant about the basics of copyright law, and pretending this ignorance is some sort of flaw in the idea of copyright itself.