The Training Wheels Are Off: The Copyright Implications of Training Generative AI

With the introduction of several readily available applications, artificial intelligence (AI) has leaped into the mainstream and brought with it a host of legal questions.

Following OpenAI’s release of the now-popular generative AI platform ChatGPT in November 2022, companies including Microsoft and Google are rushing to release their own generative AI services or integrate such services into their existing offerings. As attention-grabbing AI stunts multiply, like actor Ryan Reynolds using ChatGPT to craft what he called a “mildly terrifying” ad for Mint Mobile, companies are increasingly contemplating how AI will change the nature of work.

AI’s myriad applications all depend on the strength and quality of the underlying algorithm, which in turn relies on training data. Several recent, high-profile lawsuits raise the issue of whether training such algorithms violates copyright law’s restrictions on creating derivative works without the creators’ consent.

What is Generative AI?

Falling within the broad category of AI and machine learning, generative artificial intelligence (GAI) refers to algorithms, such as ChatGPT, DALL-E and Stable Diffusion, that can interpret text prompts to generate new content. The content can include images, text and software code, based on the data on which the algorithm was “trained.” These models work by interpreting an input (for example, a text prompt such as “write a 500-word blog post about AI” or “design an image of a cat in a hat in the style of Picasso”) and generating a new output based on the training data. This single output is one way GAI differs from a typical search on Google or Bing, which returns links to many possible answers rather than generating one output that merges different inputs.

To create and refine these outputs, GAI must be “trained” on massive data sets of images, text or sounds. That “data” typically includes other creators’ copyrighted material.

What is a Derivative Work?

Training data implicates, among other things, the Copyright Act’s restrictions on preparing derivative works, a right that the Act reserves exclusively to copyright owners. Section 101 of the Act (17 U.S.C. § 101) defines a “derivative work” as any work “based upon one or more preexisting works.” Common examples of derivative works, according to the Copyright Office Circular 14, include “translations, musical arrangements, motion picture versions of literary material or plays, art reproductions, abridgments, and condensations of preexisting works.” To prepare a derivative work without infringing, the new creator would need authorization from the original copyright holder or would need to establish that the use is protected under the fair use doctrine. Even then, copyright protection in the derivative work would extend only to its new elements or changes.

GAI algorithms would appear, by their very nature, to run the risk of violating copyright holders’ exclusive rights: existing works are fed to the algorithm, which then generates a new work based on those preexisting works. If a developer supplies these works to the algorithm without obtaining the rights holders’ consent, and the use does not fall within the fair use protections of the Copyright Act, then the developer could potentially be found to have infringed those holders’ copyrights.

Two Recent GAI Cases

Two recent cases highlight potential copyright issues implicated in the use of GAI tools.

Andersen et al. v. Stability AI Ltd. et al.

The first, Andersen et al. v. Stability AI Ltd. et al., concerns three artists who filed a proposed class action in California federal court against Stability AI, Midjourney and DeviantArt, all of which released GAI art tools based on Stable Diffusion, an image-generation program created by Stability AI. The artists claim that Stable Diffusion was trained on a dataset of hundreds of millions of copyrighted images and their captions, which were scraped and copied from websites such as Getty Images, Shutterstock and Adobe Stock, as well as other sources, without the consent of the image owners or website operators.

The plaintiffs argue that Stable Diffusion amounts to nothing more than a “21st century collage tool.” The artists claim that any resulting image that draws upon copyrighted material is an infringing work and that the derived images compete in the marketplace with the original images, as they allow a user to create derivative works “in the style” of a particular artist without compensating the artist. The plaintiffs note that works generated by Stable Diffusion in the style of certain artists can be found for sale online.

Although Stability AI publicly denied any infringement of artists’ work, it announced on Twitter that it will allow artists to opt out of the training data set for Stable Diffusion’s next release.

Getty Images (U.S.), Inc. v. Stability AI, Inc.

In Getty Images (U.S.), Inc. v. Stability AI, Inc., stock image provider Getty Images also sued Stability AI. The lawsuit, filed in Delaware federal court, accuses Stability AI of scraping at least 12 million copyrighted images, along with their associated text and metadata, from Getty Images’ websites to train the Stable Diffusion model. Getty asserts that the Getty Images website terms and conditions expressly prohibit both downloading or re-transmitting website contents without a license and using data mining or similar data-gathering methods.

Getty also claims that Stable Diffusion frequently produces images that are highly similar to and derivative of Getty’s proprietary content, and at times even regenerates specific images that were used to train the GAI model.

Despite Stability AI’s attempts to remove the Getty Images watermark from proprietary images, Getty states, some of the output images generated by Stable Diffusion still contain distorted versions of the Getty Images watermark.

According to Getty, the watermark, which is usually incomplete but reminiscent of the original mark, not only infringes Getty’s copyright but also falsely implies an association between Stable Diffusion and Getty Images, raising concerns of trademark infringement as well.
Moreover, Getty argues, because Stable Diffusion often produces images that are “bizarre” or “grotesque,” the incorporation of the Getty Images mark tarnishes Getty Images’ reputation, giving rise to a claim for trademark dilution.

Stability AI, Getty claims, is aware that its program produces images including the Getty watermark, but has done nothing to prevent the issue from recurring.

The Bottom Line

These two pending lawsuits are among the recently filed cases that will test how copyright law applies to AI.
Given the novelty and rapidly evolving nature of GAI, individual creators and companies using GAI need to tread carefully to avoid claims of infringement and to protect their IP.
As this is an emerging issue, stay tuned for more Davis+Gilbert alerts on the use of GAI and recommended best practices.