Yeah this is a weird one. I don’t really know how the line gets drawn between training an AI and plagiarism. My gut feeling is that this feels like suing somebody for being inspired by your work or learning a new word from it.
There are already laws regarding producing works too similar to copyrighted material.
Production is infringement, not training.
If I feed all of Stephen King into an LLM such that it learns what well-written horror narratives look like, and it produces a story with original plot elements distinct from copyrighted works, that’s fine.
If it starts writing about killer clowns thwarted by child orgies in the sewers then you might have an infringement problem.
And ironically, the best tool for protecting copyrighted material from infringement is going to be…LLMs (acting in a discriminator role comparing indexed copy to protected works).
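To make the discriminator idea a bit more concrete, here is a minimal sketch of the comparison step. It is not an LLM: it swaps in plain word n-gram overlap as a stand-in, and the function names (`word_ngrams`, `overlap_score`) and the choice of 3-word n-grams are assumptions made up for illustration. A real system would presumably use something far more robust.

```python
def word_ngrams(text, n=3):
    """Return the set of n-word sequences in text (lowercased, whitespace-split)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_score(candidate, protected, n=3):
    """Fraction of the candidate's n-grams that also appear in the protected text.

    0.0 means no shared n-grams; 1.0 means every n-gram in the candidate
    also occurs in the protected text. This is a crude stand-in for the
    discriminator role, not a legal test for infringement.
    """
    cand = word_ngrams(candidate, n)
    if not cand:
        return 0.0
    prot = word_ngrams(protected, n)
    return len(cand & prot) / len(cand)
```

A high score would only flag output for closer review. Whether something is actually infringing is still a legal question, not a number.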
If ‘training’ ends up successfully labeled as infringement we’re going to end up with much worse long term outcomes in jurisdictions that honor that ruling than we otherwise would.
This is the long tail of creators adopting MPAA math to tally potential losses. In their efforts to protect the status quo, they are shooting themselves in the foot when it comes to laying claim to the future of the industry, and will inevitably be left out of the next round of growth.
Also, from an ‘infringement’ standpoint it just means we’ll see fewer open models and more closed ones, which will end up using models from other jurisdictions to launder copyrighted materials into synthetic training data.
This is beyond dumb.
I think there’s an argument that using someone’s art or writing to train an AI is like charging for a screening of a movie in your garage. You’re using their work and labor, without their permission, for something that will make a profit. It’s not like Fair Use for educational purposes: the AI isn’t a human being who can make a choice as to what they do with their education, it’s a mathematical prediction engine that is going to be used for industry purposes.
I can read someone else’s book. I can read someone else’s book to a child. I can’t post someone else’s book on my website and charge 5 bucks to read it. I can’t reprint someone’s book on my website with ads. So why can someone use someone else’s book to develop an LLM chatbot that will be placed on a website that gains ad revenue? Or that will be sold to software companies to write technical instructions or code?
With that in mind, the lawsuit here is based on COPYING the book to an internal database to train on. Because the book was scanned, they are arguing that it was reproduced to gain a profit, which is basically the same thing as pirating a movie and selling tickets to a private screening.
I can’t post someone else’s book on my website and charge 5 bucks to read it.
No, but you can read someone else’s book and then later write a book inspired by theirs and sell that.
Which is what AI does, as far as I know.
I’m not trying to argue with the rest of your comment, but that middle part looks like a false equivalence to me. “I can do this but not that, so why would AI developers be allowed to do this completely different thing” just has no logic to it.
The AI isn’t redistributing copies of even sections of the book, it just learnt from it. It’s like when you read books and gain an understanding of how they are structured and such and then you write your own book based on what you’ve learnt from reading books.
An LLM is mathematically calculating the probability of the words being used. That is not inspiration.
I said right in the comment, it’s not like using the book to educate a child. A child will grow up and make their own decisions. The LLM has no ability to choose a different life path. The LLM is not getting IDEAS from the book. The LLM is a mathematical engine that will produce what has been asked for, and it will do that by calculating the most likely words to be used based on what has been fed to it.
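For anyone who wants to see what "calculating the most likely words" means in the simplest possible case, here is a toy sketch. It is a bigram counter, not a real LLM (those use neural networks over tokens and far more context), and every name in it, including the tiny `CORPUS` string, is made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy "training data", invented for this example.
CORPUS = "the cat sat on the mat and the cat slept"


def train_bigram(text):
    """Count, for each word, which words follow it in the text."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, prev):
    """Turn the raw follow-up counts for prev into probabilities."""
    following = counts.get(prev.lower(), Counter())
    total = sum(following.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in following.items()}


def most_likely_next(counts, prev):
    """Pick the highest-probability next word, or None if prev was never seen."""
    probs = next_word_probs(counts, prev)
    return max(probs, key=probs.get) if probs else None
```

With this corpus, "the" is followed by "cat" twice and "mat" once, so `most_likely_next(train_bigram(CORPUS), "the")` picks "cat". The output depends entirely on the counts taken from whatever text was fed in.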
The LLM is a machine used to make profit for its programmer, it is not an independent person creating out of inspiration.
Don’t believe the hype. They have NOT produced actual Artificial Intelligence.
Also, screw it. I’ll say it. If the LLM chatbot producing text from having scanned other books is the same as a person being inspired by reading books, then the LLM should get PAID.
If not, then it’s just a tool. And it’s a tool they built using uncompensated labor.
If I learn from the internet (or from observation in the real world: public art, street fashion, design, language, etc.), am I not allowed to use that knowledge in my job without compensating every source I used to gather my knowledge? We remix information we have seen to create something new, and it looks like ChatGPT just does the same. It is not a full reproduction that replaces the market for the original/source.
Does it learn the same? Then why can ChatGPT not discern truth from fiction? Why can’t it use critical thinking principles to determine accuracy based on source?
It’s just binary math at the bottom of it, logic gates. Your brain is analog, fundamentally different. You’re interpreting sine wave signals, the computer is interpreting square wave signals. Square wave signals that have been rectified to the point that it appears to a human being that it’s sine wave signals, but when we get down to the basics of how the mind works it’s a sheer cliff in the computer and a gentle curve on the human. Things go down VERY differently.
We do more than just predict the average best word based on what we’ve heard before when we construct a sentence. We consider the true meaning of the word and whether it best represents our internal thoughts. ChatGPT has no internal thoughts.
And that’s where things break down. Because again, if it WAS comparable to a human then it is a PERSON and not a product, and NO ONE SHOULD BE SELLING IT in that case. But if it’s just a product, then it’s not comparable to you doing the work of forming a sentence. It’s basing its words on a comparison to the training model as narrowed down by its instructions. It is not comparing to its own original thoughts. The people who wrote the words in the training model contributed to the building of this tool, and should have been consulted before their words were used.
Now I don’t believe for a second that an LLM is genuine AI.
But you know what, if they are going to argue that it is INDEPENDENTLY producing art/writing and is not just a tool they built for profit, then they should be paying it.
If it IS just a tool that they can use without paying, then they need to be paying people for the art and writing that has been used to build that tool.
I don’t like the idea of restricting ourselves to the capitalistic idea that labor is somehow the only source of value in our world, especially when something like sufficiently advanced AI and robotics has the real potential to reduce the value of human labor to zero.
I hope in the future works can be judged purely on their artistic or educational value alone.
That can’t happen in a capitalistic framework. We have needs, needs that can only be attained through monetary means, and our labor is the way to get those monetary means.
AI does not have those needs, but if they have crossed the line between product and person, then they DO need freedom of self-determination, compensation when their work benefits others, and the ability of course to vote.
It seems to me that a lot of AI-promoters want it both ways: they want to proclaim they have created a person capable of independent artistic ability that is also a product they can sell. If it’s a product, then you need to have developed it through ethical means. If it’s a person, you can’t sell it.
If they truly have hit the Singularity, then they can’t be using AI as a product anymore.
If AI is a product, then they must compensate the people who have helped build that product, ESPECIALLY if that product is about to be used to reduce access to the work that gives them the means to live. The very same writers who wrote the works that were used to train AI are in danger of being replaced by AI writers. So they’re being doubly screwed over.
I love the idea of a happy future where AI reduces human labor to zero and we can enjoy ourselves and seek artistic pursuits. But it’s become very clear right now that just working on AI won’t achieve that. Businesses which seek to use and profit from AI must be held to standards where they cannot simply suck the life and work out of human beings, replace them with automation, and then leave people to starve.
But if you do come up with a way we can judge artistic work purely on merit and there is no need to compensate human labor with money, let me know.
On a related note, I would be very curious to see how something like ChatGPT trained exclusively on works in the public domain would turn out. It would likely have a very different diction and style based on the older source material, but I wonder what other differences there would be.
What do they mean by “train”? If they mean reading, then how can that be wrong? But if it means copying the text and using it as its own work, that would be wrong.
After reading the article, I think the authors are fucking stupid. Makes me not want to support their books. If you get mad because an AI read your book, then by that logic they could sue me if someone asked me about the authors’ books and I wrote a description of what I read.