James Cameron is confused about copyright and AI

artificial intelligence

law

James Cameron thinks it doesn’t matter if AI models are trained on copyrighted content. I strongly disagree.

Published June 3, 2025

James Cameron recently did an interview with Meta CTO and Head of Reality Labs Andrew Bosworth, in which he shared his views on copyright. Here is the relevant part:

…I’m an artist. Anybody that’s an artist, anybody that’s a human being is a model. You’re a model already. You’ve got a three and half pound meat computer. You’re not carrying all the training data with you. You’re creating a model as you go through life to process quickly through that model every new situation that comes on. And as a screenwriter you have a kind of built-in ethical filter that says, I know my sources, I know what I liked. I know what I’m emulating, I also know that I have to move it far enough away that it’s my own independent creation. So I think the whole thing needs to be managed from a legal perspective as, what’s the output? Not what’s the input. The input, you can’t control my input. You can’t tell me what to view and what to see and where to go. My input is whatever I choose it to be, and whatever is accumulated throughout my life. My output, the script I write should be judged on whether it’s too close to plagiaristic, whatever.

James Cameron on AI (source: interview)

I think James Cameron is confused about copyright and AI. Let’s be absolutely clear: this is a legal issue. The purpose of copyright law is to ensure that creative people are compensated for their work. Using books and artworks you haven’t paid for to train AI models is not legal. Yet that is exactly what companies like OpenAI and Meta have been doing: downloading illegally copied versions of all kinds of content, including content normally available only to paying subscribers, and using it to train their AI models.

The principle of defending artists and their ability to create original work is a really important one. If we abandon it, we’re essentially saying that creativity doesn’t matter. We’re heading for a race to the bottom, where only the most profit-driven, legally protected content survives.

There have been major changes to copyright law over the years. For example, libraries pay more for the books they buy, because it’s recognized that multiple people will read them. The AI companies will tell you it’s impossible to build a system that estimates how much a particular copyrighted work contributed to a new AI-generated work. It’s difficult, but it isn’t impossible; the AI companies simply don’t want to do it.

We face a really important moment. Artists and creatives need to wake up and insist that the law be applied to companies like OpenAI, which are copying books and artworks that are still under copyright and paying nothing for them. Doing so would be an expense for the AI companies, and one they don’t want to bear.

Some people seem to think that any added expense is automatically bad. But would you eat in a restaurant that didn’t follow hygiene laws? Hygiene laws are expensive for restaurants to follow, but good restaurants are happy to comply because it’s the right thing to do.

If you’re running an AI company, and your entire business is built on the work of other people, you should want to see those people compensated. If they aren’t, we’re heading for a bleak future where creative people stop creating. They’ll stop making great artworks, stop composing, stop writing novels, because there won’t be any point.

People also don’t realize the extent to which these models actually contain copyrighted content. The more often something appears in the training data, the more likely it is that the model will be able to reproduce it almost exactly. The raw, pre-trained model holds that material in near-verbatim form. The only things stopping it from being output directly are a layer of reinforcement-learning fine-tuning applied on top of the raw model, which sits between it and the user, and a “system prompt” that literally tells the model not to output the copyrighted material it contains.

That layer will block obvious copyright violations—like generating a picture of Spider-Man or Mario, or the lyrics of a famous song. But it can be circumvented. You can describe the style of a particular artist without naming them, and if the model was trained on their work, it will reproduce something that looks like theirs.

And if you go deeper, accessing the raw model directly (as you can with some open-source models) and so bypassing the reinforcement layer, then yes, you can simply ask for a picture of Spider-Man, and it will generate one.

There is no doubting that James Cameron is a creative person, and there is a lot to admire about him. As a young man he left a truck-driving job, taught himself special effects, and worked his way up to directing films. But later in life he has become a very rare kind of creative person: an extremely wealthy, influential and successful one in a highly profitable industry. He also benefits from the armies of lawyers who work for the companies that distribute his films. This has, perhaps, given him a somewhat distorted view of the issue. I doubt his younger self would agree that AI companies abusing copyright doesn’t matter.