TLDR version: AI language models have been trained on content sources including news media, Twitter posts, Reddit threads, and photo and music databases. Content owners are now taking steps to protect their investments and seek compensation from AI platforms. The experience of the news bargaining code is relevant here, and it is likely that similar regulatory arrangements will emerge for AI if commercial solutions can’t be found.
Copyright holders who were burned by the experience of dealing with search and social media companies have jumped on AI companies that have been using content resources to train generative AI systems. They argue (rightly in our view) that the use of copyright material for this purpose should be compensated.
Last week, Elon Musk threatened to sue Microsoft over its use of Twitter data to train its text AI platform. No-one knows whether he’ll follow through, but behind Musk’s remarks is a history of tension. Musk was a board member of OpenAI until 2018, and has criticised the company’s move from a non-profit to a more proprietary approach influenced by Microsoft.
But other copyright holders are taking steps.
Getty Images has filed a lawsuit against Stability AI, the creators of Stable Diffusion, an open-source AI art generator. The lawsuit alleges that Stability AI copied over 12 million images from Getty Images’ database without authorization or compensation, violating both copyright and trademark laws.
Others prefer a more commercial approach. Reddit announced this week that it will start charging for the Application Programming Interface (API) that provides access to Reddit’s database of conversations. “We don’t need to give all of that value to some of the largest companies in the world for free”, said Steve Huffman, founder and CEO of Reddit, in an interview.
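Charging for an API works because every request must carry a key the platform issued, so access can be metered and billed. The sketch below illustrates the idea in the abstract; the key names and tiers are hypothetical, and this is not Reddit’s actual API.

```python
# Minimal sketch of API-key gating, the mechanism that lets a platform
# meter and charge for access to its data. All names/keys are hypothetical.

ISSUED_KEYS = {"key-abc123": "research-tier", "key-def456": "commercial-tier"}

def handle_request(headers: dict) -> tuple[int, str]:
    """Return an HTTP-style (status, body) pair for a data request."""
    token = headers.get("Authorization", "").removeprefix("Bearer ").strip()
    if token not in ISSUED_KEYS:
        return 401, "missing or invalid API key"
    # A real platform would also count this call against a quota and bill it.
    return 200, f"conversation data ({ISSUED_KEYS[token]})"

print(handle_request({"Authorization": "Bearer key-abc123"}))
print(handle_request({}))
```

A keyed request succeeds with a 200 status; an unkeyed one is refused with a 401, which is the hook that turns free access into a billable service.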
Also this week, James Murtagh-Hopkins, senior vice president of communications at Universal Music Group (UMG), said “the training of generative AI using our artists’ music… represents both a breach of our agreements and a violation of copyright law… platforms have a fundamental legal and ethical responsibility to prevent the use of their services in ways that harm artists. We’re encouraged by the engagement of our platform partners on these issues, as they recognize they need to be part of the solution”. The context was the emergence of an AI-generated song based on the rapper Drake’s catalogue.
Commercial arrangements have also been struck. Shutterstock has done a deal with OpenAI to provide image data for DALL-E, a generative AI program that creates graphics from text-based prompts.
While charging for API access is one way to secure reasonable compensation, it doesn’t address the scraping of web data, which is easy to do. But UMG’s remarks hint that the AI platforms have learnt something from the battle over content re-use by search and social media, and are trying to address the issue before the courts and regulators get involved.
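To see why scraping is so hard to stop commercially, consider how little code it takes. The sketch below pulls paragraph text out of an HTML page using only the Python standard library; the page here is an inline stand-in for one fetched from the open web, where no API key is ever asked for.

```python
# Sketch of how little code web scraping takes: extract all paragraph text
# from an HTML page with only the standard library. The HTML string stands
# in for a page fetched over the open web (e.g. via urllib.request).
from html.parser import HTMLParser

class ParagraphScraper(HTMLParser):
    """Collect the text content of every <p> element."""
    def __init__(self):
        super().__init__()
        self.in_p = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False

    def handle_data(self, data):
        if self.in_p:
            self.paragraphs[-1] += data

page = "<html><body><p>First article.</p><p>Second article.</p></body></html>"
scraper = ParagraphScraper()
scraper.feed(page)
print(scraper.paragraphs)  # ['First article.', 'Second article.']
```

No authentication, no payment, no external libraries: anything published on the open web can be harvested this way, which is why API fees alone can’t close the compensation question.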
AI technology will be important in ways we can’t yet imagine. All the more reason to get the commercial settings right at the beginning, so that problems are not allowed to fester and escalate to the political level, as happened with news content.
In this case the range of content involved includes news, but takes in a lot of other commercial and user-generated content. A commercial approach by the AI platforms has a lot to recommend it, as a regulated approach will struggle to address the complexity of the emerging AI market and its use of copyright material. A commercial arrangement will also give investors in both copyright material and AI technology useful certainty.
If the AI industry fails to grasp the opportunity, the prospect of a multi-year battle driven by litigation and regulation looms. The news content debate is a handy precedent, and in our view the copyright holders will ultimately win.