In February 2025, Thomson Reuters won the first major AI copyright case in US history.
The defendant, Ross Intelligence, had copied thousands of pages from Reuters'
legal research database to train an AI product that would compete with Reuters. The court's ruling was unambiguous: this was not fair use. It was theft. The AI industry's response: keep doing it anyway, just with bigger lawyers.

## The Scale of What Was Taken

Before we talk about "innovation," let's talk about what actually happened.

**Books3 Dataset:** 183,000 copyrighted books, scraped from shadow libraries.
Works by Stephen King, Margaret Atwood, Zadie Smith. Living authors who never
consented. This dataset was used to train models worth hundreds of billions of
dollars.

**LAION-5B:** 5.85 billion image-text pairs scraped from the internet. Getty
Images watermarks visible in the training data. Personal photographs.
Professional photography. Art that took years to create, consumed in
milliseconds.

**The Stack:** 6TB of source code from GitHub repositories, regardless of
license. GPL-licensed code, which requires derivative works to be open source,
fed into proprietary models. Proprietary code that was accidentally public,
swallowed whole.

**Common Crawl:** An estimated 400 billion web pages. Blog posts, news
articles, forum discussions, personal websites. Everything you ever published
online, scraped and consumed without your knowledge.

They didn't ask. They took.

## The "Fair Use" Shield

When caught, every AI company reaches for the same defense: fair use. Their argument: "Training AI is like a student reading books to learn. It's
transformative. The output is different from the input."

Here's why that argument falls apart:

### AI Models Memorize, They Don't "Learn"

Research has repeatedly shown that large language models can reproduce exact
passages from their training data. A student who recited entire chapters
verbatim from memory would be accused of plagiarism, not "learning."

### The Output Competes With the Input

When an AI generates an image in an artist's style, that output competes with
the artist's work in the marketplace. Fair use traditionally requires that the
new work doesn't serve as a market substitute for the original.

### The Scale Changes Everything

A human reading one book is learning. A machine consuming 183,000 books and
using them to generate competing content is industrial reproduction. The scale
transforms the nature of the act.

### Thomson Reuters Disagrees

The first court to rule squarely on the question sided with the copyright
holder. Thomson Reuters v. Ross Intelligence established that using proprietary
data to train a competing AI product is not fair use. The ruling is working
its way through appeals, but the precedent is set.

## The Lawsuits Piling Up

As of early 2026, more than 25 active lawsuits challenge AI companies'
training practices:

- **NYT v. OpenAI** -- The Times alleges millions of articles were used without consent. The case centers on "regurgitation" -- the model's ability to reproduce near-exact copies of Times journalism.
- **Getty Images v. Stability AI** -- Stock photos used to train image generators, complete with watermarks in the output.
- **Authors Guild v. OpenAI** -- Living authors allege their copyrighted books were scraped from shadow libraries.
- **GitHub Copilot Class Action** -- Developers allege GPL-licensed code was used in violation of its open-source license terms.
- **Encyclopedia Britannica v. OpenAI** -- The latest in the wave, filed March 2026.
- **Disney v. Midjourney** -- "The Mouse bites back." Characters and imagery under trademark and copyright.

These cases will define whether AI companies have to ask permission before they
take your work. The outcomes are uncertain. The stakes are existential for
creators.

## The Consent Problem

The core issue is not complicated: AI companies built trillion-dollar products using creative works they never
got permission to use.

The "opt-out" response is a joke. Creators are expected to:

1. Discover that their work was used (impossible without transparency)
2. Find the correct opt-out mechanism for each company
3. Submit individual requests
4. Trust that companies actually comply
5. Accept zero compensation for past use

This is not consent. This is coercion with extra steps.

The power imbalance is staggering. Individual artists, writers, and developers
versus trillion-dollar tech companies with armies of lawyers. No collective
bargaining power. No transparency about what was used. Legal recourse requires
expensive litigation that most creators cannot afford.

## What "Innovation" Actually Looks Like

Contrast the scraping approach with companies that built AI ethically.

Adobe Firefly trained exclusively on licensed and public-domain imagery. It works fine. The images are good. The company didn't need to steal anyone's work to build a competitive product.

Models with transparent training data exist. They publish what they trained on. Creators can verify and opt out before training, not after.

The difference between theft and innovation is consent.

Building a competitive AI product without stealing anyone's work is possible.
Several companies have done it. The reason most didn't is simple: stealing was
cheaper and faster. "Move fast and break things" met "ask forgiveness, not
permission," and the result was the largest unauthorized copying of creative
works in human history.

## The X/Twitter Terms of Service Precedent

In January 2026, X (formerly Twitter) updated its Terms of Service to
explicitly grant itself the right to use all user content for AI training with
no opt-out and no compensation. Users grant a "worldwide, royalty-free,
sublicensable license" for "any purpose," including training Grok. The FTC has warned that retroactively changing terms of service to expand AI
training rights may constitute unfair or deceptive practices. But enforcement
lags behind deployment.

No one asked users whether they wanted their tweets training Grok. X changed
the terms and called it consent.

## What Actually Helps

### For Creators

- **Check if your work was used:** haveibeentrained.com searches LAION datasets
- **Block AI crawlers:** Add robots.txt directives for GPTBot, ChatGPT-User, Google-Extended, and other known scrapers
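The crawler-blocking tip above can be sketched as a minimal robots.txt and sanity-checked with Python's standard-library `urllib.robotparser`. The user-agent strings are the crawler names the vendors publish; CCBot (Common Crawl's crawler) is added here as an assumption, and the list should be treated as a starting point, not exhaustive:

```python
from urllib import robotparser

# Illustrative robots.txt that blocks known AI training crawlers
# while leaving the site open to ordinary user agents.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

# Parse the rules and verify they behave as intended.
parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for bot in ("GPTBot", "ChatGPT-User", "Google-Extended", "CCBot"):
    assert not parser.can_fetch(bot, "https://example.com/post")

# A regular browser user agent is still allowed.
assert parser.can_fetch("Mozilla/5.0", "https://example.com/post")
```

Note that robots.txt is purely advisory: compliant crawlers honor it, but nothing technically prevents a scraper from ignoring it, which is part of the enforcement problem this section describes.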
- **Register copyrights:** Required for statutory damages in US courts
- **Join class actions:** Multiple ongoing suits need plaintiffs
- **Use protective tools:** Glaze and Nightshade add adversarial perturbations that disrupt AI training

### For Everyone

- **Demand transparency:** AI companies must disclose training datasets
- **Support opt-in legislation:** Consent before training, not after
- **Back creator compensation models:** Revenue sharing for training data
- **Choose ethical AI products:** Support companies that license their data

## Where This Ends

The AI industry's greatest innovation was not a model architecture or a
training technique. It was the legal fiction that scraping the entire internet
without permission constitutes "fair use."

Thomson Reuters won the first case. The NYT's case is heading toward a ruling
that could reshape the industry. Twenty-five lawsuits and counting are testing
whether "innovation" includes stealing other people's work.

Stealing isn't innovation. It's stealing. And the fact that billions of dollars
and the best legal talent in the world are being deployed to argue otherwise
tells you everything you need to know about the AI industry's relationship with
consent.

They didn't ask before they took your work. They built empires on it. And
now they're fighting in court to ensure they never have to ask.

---

Related:

- AI Training Data Theft
- Thomson Reuters v. Ross: AI Theft on Trial
- NYT v. Perplexity: Journalism Theft
- Suno/Udio: Silence of the Jams

## Sources

- Thomson Reuters wins first major AI copyright case -- Reuters, February 2025
- The New York Times sues OpenAI and Microsoft -- NYT, December 2023
- Encyclopedia Britannica sues OpenAI over AI training -- Reuters, March 2026
- FTC: Quietly changing terms of service could be unfair or deceptive -- FTC, February 2024
- X Terms of Service grant AI training rights with no opt-out -- CryptoSlate, January 2026