This article was originally published on the Law360 website.
The world is taking cover as artificial intelligence-generated songs and images rain from the sky. The art and media worlds and their lawyers are divided in their reactions, and legislators are starting to shed light on emerging legal gray areas.
The trigger is the emergence of generative AI, created by training a model on vast quantities of a medium, e.g., text, audio or images.
These systems are capable of generating new examples of the same medium based on instructions referred to as "prompts."
In February, in Getty Images (US) Inc. v. Stability AI Inc., Getty Images filed a lawsuit in the U.S. District Court for the District of Delaware against Stability AI, accusing it of "brazen infringement" in "unlawfully" scraping 12 million images without a license to compile "training data" to train its generative AI.
The litigation has recently extended across the pond: On May 12, Getty filed a second lawsuit against Stability AI in London's High Court of Justice of England and Wales,[1] seeking to prevent sales of Stable Diffusion in the U.K.
Outside the Getty-Stability AI case, there are plenty of other headlines relating to generative AI showing the size of the debate:
Pop singer Grimes[2] launched a platform on April 24 that lets others use an AI version of her vocals, saying "copyright sucks";[3]
On June 13, it was reported that Paul McCartney is using AI to lift John Lennon's voice from a demo and turn it into a "final" Beatles song;[4] and
Not all artists are keen on hearing themselves sing words that never came out of their mouths: an AI track simulating Drake and The Weeknd was pulled from Spotify Technology SA on April 18.[5]
This article explores some of the legal ambiguities surrounding the Getty-Stability AI case, along with some related questions in the space:
Does the author of a generative AI own the copyright to their own model?
When is the output of a generative AI eligible for copyright protection?
Can an AI own copyright?
Who does own content generated by a generative AI?
These questions arise because several interested parties may have a claim on the rights in the inputs and outputs of a generative AI:
The copyright holders of the training data: the images, articles, blogs, stories, comments, etc.
The author of the generative AI: the legal entity that wrote and ran the code to create it.
The user of the generative AI: the person entering the prompt into the image generator.
Does the author of a generative AI own the copyright to their own model?
Stability AI trained its model on a dataset compiled by the Large-Scale Artificial Intelligence Open Network (LAION),[6] a German AI nonprofit that describes it as a "research dataset"[7] and does not accept liability for any possible copyright infringements.
Datasets like this are often produced through text and data mining,[8] that is, the computational process of gathering large quantities of assets from public-facing sources.
Of course, not all of this data is in the public domain; some of it is legitimately licensed.
In the Getty-Stability AI case, images, tags and descriptions were sourced from across the internet.
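To make the text and data mining step more concrete, here is a minimal Python sketch of one common approach to building image datasets: pairing each image link on a web page with the descriptive "alt" text that accompanies it. It is an illustration under stated assumptions, not a description of how LAION or Stability AI actually built their datasets; the class `ImageAltCollector`, the helper `collect_pairs` and the sample HTML are hypothetical, and the parser works on a page already held in memory rather than fetching anything.

```python
from html.parser import HTMLParser


class ImageAltCollector(HTMLParser):
    """Hypothetical helper: collects (image URL, alt text) pairs from HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.pairs: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        # Only <img> tags matter here; attrs is a list of (name, value) pairs
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src")
        alt = (attributes.get("alt") or "").strip()
        # Keep only images that come with both a link and a description
        if src and alt:
            self.pairs.append((src, alt))


def collect_pairs(html: str) -> list[tuple[str, str]]:
    """Parse an in-memory HTML page and return its image/description pairs."""
    parser = ImageAltCollector()
    parser.feed(html)
    parser.close()
    return parser.pairs


if __name__ == "__main__":
    # Hypothetical sample page; no network access is involved
    page = (
        "<p>Gallery</p>"
        '<img src="https://example.com/pond.jpg" alt="A pond at dusk">'
        '<img src="https://example.com/logo.png">'
    )
    print(collect_pairs(page))  # [('https://example.com/pond.jpg', 'A pond at dusk')]
```

Nothing in this process checks whether an image is licensed or in the public domain; the code simply collects whatever it finds, which is why permission becomes the central legal question.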
There is still significant debate about whether authors of generative AIs should need permission from the copyright holders of training data to use that data, although first principles apply: using a work without permission, i.e., literally copying it, is likely to be copyright infringement.
Currently, the U.K. exception for text and data mining applies only to noncommercial uses, so commercial users need permission to avoid infringement.
A poetry anthology is a useful analogy for thinking about how such protections might be allocated.
The poets own the copyright to individual poems, and anthology publishers are required to seek permission to use them.
An anthology publisher is typically recognized as also owning rights in addition to the poets' rights in the individual poems: either a copyright in the compilation or, where the compilation itself is not sufficiently creative to attract copyright, database rights.
Let's see how a generative AI model is created, and compare it with putting together a poetry anthology:
Prepare the untrained model, a piece of software, comparable to the word processor on which the anthology is written;
Collect the "training data," the individual entries, comparable to sourcing the individually copyright-protected poems from various authors;
Compile them to produce a compilation comparable to the anthology draft; and
Produce a product that is of value to users, the trained model or generative AI, which contains the compilation, comparable to the book that is put on the market.
With this framing, it might seem obvious that a model would be eligible for copyright protection. Indeed, in the U.K., copyright subsists in computer software, which is treated as a literary work.
A legal complication highlighted by the anthology analogy is the need for the poets' permission to publish their poems.
Getty alleges that Stability AI omitted this step, so the risk arises from the use of copyright material without the owner's permission, and the arguments include whether this use could be seen as fair dealing for criticism or review.
Last year, the U.K. government proposed[9] changes to copyright law to facilitate text and data mining for training AI systems, in line with its ambition[10] to make the U.K. "a global centre for AI innovation."
Under the proposal, copyright holders would no longer have been able to charge for text and data mining licenses. Facing backlash, including from the music industry, the government made a U-turn.[11]
Across the English Channel, the European Union is considering granting limited recognition to the original owners of training data,[12] by requiring generative AI providers to publish "detailed summaries of the copyrighted data used for their training."
Many suspect this may be a stepping stone toward a compensation mechanism for the original copyright owners. This is likely to embolden organizations like Getty to challenge authors of generative AI to compensate them for using their works.
The outcome of the Getty-Stability AI case will be pivotal to establishing what will be considered standard working practice on this matter.
When is the output of a generative AI eligible for copyright protection?
Stability CEO Emad Mostaque believes[13] that the way a generative AI transforms the training material protects it from claims of infringement.
Data scientists first clean the training data in a step they call "preprocessing," typically removing extra blank spaces, fixing formatting and removing common stop words like "the."
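The preprocessing step can be sketched in a few lines of Python. This is a minimal illustration, not any particular company's pipeline: the `preprocess` function and the deliberately tiny `STOP_WORDS` set are hypothetical, and real pipelines vary widely and do not always remove stop words.

```python
import re

# Hypothetical, deliberately small stop-word list; real lists are much longer
STOP_WORDS = {"the", "a", "an", "of", "and"}


def preprocess(text: str) -> list[str]:
    """Clean one piece of raw training text and split it into words."""
    # Fix formatting: lowercase everything and turn punctuation into spaces
    text = re.sub(r"[^\w\s]", " ", text.lower())
    # str.split() with no argument also collapses runs of spaces, tabs and newlines
    words = text.split()
    # Remove common stop words such as "the"
    return [word for word in words if word not in STOP_WORDS]


print(preprocess("  The  Portrait of\tthe Artist.  "))  # ['portrait', 'artist']
```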
The stage of creating a generative AI that is considered transformative comes when the preprocessed training data is fed into an "untrained model" and becomes a sequence of 1s and 0s encoding the intelligence gained from the training data.
This sequence is mixed into a vast number of other sequences, like strands in a bowl of spaghetti, to the point that even the data scientist cannot tell which 1s and 0s represent which text.
On closer analysis of individual use cases, however, it becomes unclear whether the data scientist has created something new, especially given that a generative AI can be prompted to produce near-replicas of its training data.
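The tension between "transformed into numbers" and "can still reproduce its training data" can be illustrated with a deliberately tiny model. The sketch below is a word-level bigram model, vastly simpler than the diffusion model behind Stable Diffusion and not how any commercial system works: training turns text into a table of counts (the model's parameters), yet when a word sequence appears only once in training, prompting with its first word reproduces it verbatim. The function names and the sample sentence are hypothetical.

```python
from collections import defaultdict


def train_bigram(corpus: list[str]) -> dict[str, dict[str, int]]:
    """'Train' by counting which word follows which; the counts are the parameters."""
    counts = defaultdict(lambda: defaultdict(int))
    for text in corpus:
        words = text.split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    # Convert to plain dicts so the trained model is a simple lookup table
    return {word: dict(followers) for word, followers in counts.items()}


def generate(model: dict[str, dict[str, int]], prompt: str, max_words: int = 20) -> str:
    """Extend the prompt greedily, always choosing the most frequent next word."""
    words = prompt.split()
    if not words:
        return ""
    for _ in range(max_words):
        followers = model.get(words[-1])
        if not followers:
            break  # the model has never seen anything follow this word
        words.append(max(followers, key=followers.get))
    return " ".join(words)


model = train_bigram(["a poem no one else has written yet"])
print(model["poem"])         # {'no': 1}
print(generate(model, "a"))  # a poem no one else has written yet
```

With a large, varied corpus the counts blend many sources together, which is closer to the spaghetti picture; the legal question is where on that spectrum a given output falls.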
And even where there is no intellectual property issue, don't forget that the use may breach contractual terms and conditions governing the data.
A popular use case for image generators like Stability AI's Stable Diffusion is a prompt such as "generate an image of [something] in the style of [someone]." Set "someone" to "Lucian Freud," and all of a sudden we have a unique image generated by a machine trained on Freud's IP.[14]
It could be argued that this output is more original, and more likely to attract its own copyright protection, than, say, an output from the prompt "generate an image of the most famous Lucian Freud painting." However, there is no widespread agreement on this point.
When painting a pond, it is common practice to look at an image of a pond. Nonetheless, there is a distinction between using a work for information, as a reference, and copying it to make a derivative.
Where on this scale do generative AI outputs fall?
Outputs are not generally reproductions of specific images, but amalgamations of elements drawn from a very large dataset of images, including[15] anything from paintings and photographs to 3D models and game assets.
The contentious history of photography offers a useful parallel with AI-generated art: both the camera and a generative AI supercomputer are machines, after all.
For much of the 19th century, photography was an art world outcast,[16] viewed as an unfeeling mechanism[17] for mere replication and feared as a threat that might supersede painting. Even in the 21st century, Roger Scruton argued[18] that photographs are not artworks.
Sound familiar?
The Oscar Wilde fans among you might be familiar with the 1884 U.S. Supreme Court case of Burrow-Giles Lithographic Company v. Sarony.[19]
In 1882, the Burrow-Giles Lithographic Company made 85,000 unauthorized prints of Napoleon Sarony's photograph of Wilde, arguing that photographs are merely mechanical products and not works of authorship.
The U.S. Supreme Court disagreed, finding that photographs are "writings," and it is now well established that copyright will arise in a photograph.
Many people are already using the output of such tools to write articles and books, among other things, so a court decision denying such outputs copyright protection could cause undue disruption; this needs to be balanced against protecting and enforcing the rights of rights owners. A key step is to check the terms of use of your particular generative AI.
Further gray areas remain. Two people could enter the same prompt into an image generator and get identical outputs. Who owns the copyright? And again, has there been copying?
Just as Ed Sheeran was found not liable for copying a combination of common musical elements used by another musician,[20] it could be that only sufficiently unique outputs are eligible for copyright, or that you own only the IP in your specific instance of the output. Many generative AI providers reflect precisely this interpretation in their terms of use.[21]
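Whether two users get identical outputs usually comes down to randomness. Many image generators, including Stable Diffusion, start from random noise controlled by a "seed," so the same model, prompt, seed and settings will typically produce the same image, while a different seed produces a different one. The toy function below is a hypothetical stand-in for a real generator (it returns a handful of numbers, not an image) and is meant only to show that determinism.

```python
import hashlib
import random


def toy_generate(prompt: str, seed: int, size: int = 16) -> list[int]:
    """Hypothetical stand-in for an image generator: returns 'pixel' values 0-255.

    The output depends only on the prompt and the seed, mimicking how a fixed
    seed makes a real generator's random starting noise repeatable.
    """
    digest = hashlib.sha256(f"{prompt}|{seed}".encode("utf-8")).hexdigest()
    rng = random.Random(int(digest, 16))
    return [rng.randrange(256) for _ in range(size)]


first_user = toy_generate("a pond at dusk", seed=42)
second_user = toy_generate("a pond at dusk", seed=42)
third_user = toy_generate("a pond at dusk", seed=7)
print(first_user == second_user)  # True: same prompt, same seed
print(first_user == third_user)   # almost certainly False: different seed
```

In practice, many services pick a fresh random seed for each request unless the user fixes one, so identical outputs from two independent users are unlikely but not impossible.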
To summarize, care must be taken to understand the rights to use training data.
Can an AI own copyright?
Assuming that the output of a generative AI is eligible for copyright protection, who actually owns this?
One answer likely to stay in the realm of fantasy is "the AI owns it" — or is it?
The U.K. is one of the few copyright regimes that have long recognized "computer-generated works," ascribing ownership to the person by whom the arrangements necessary for the creation of the work are undertaken.
Ten years ago, a monkey taking a selfie might have spelled the future[22] of AI copyright and the rights of machines over their "creative" output.
The monkey selfie dispute was a series of cases determining who owned the rights to this selfie.[23]
People for the Ethical Treatment of Animals filed a case in the U.S. District Court for the Northern District of California in 2015, arguing that the monkey should be entitled to the copyright.
The court ruled that neither the animal nor the camera itself can possess copyright, leaving it with either the human most closely associated with the photograph or no one at all. The dispute prompted the U.S. Copyright Office to release a statement on June 28[24]:
Similarly, the Office will not register works produced by a machine or mere mechanical process that operates randomly or automatically without any creative input or intervention from a human author.
In short, whether an AI can own copyright depends on the applicable legal regime.
Who does own content generated by a generative AI?
Like poems, software is protected as a form of literary work[25] in the U.S. and U.K. As such, the software creator is the "author" of the software.
Applying this concept, we might expect the author of the generative AI, a software solution, to own the output of the software.
One scenario in which the parties should reach a contractual agreement on this matter is when contractors are engaged to build solutions that run autonomously, with no manual human intervention. This typically arises when a solution uses fixed or automatically generated prompts.
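To picture the autonomous scenario, consider a solution in which the developer writes a fixed prompt template once and the software fills it in from data, so no person types a prompt at the moment of generation. The Python sketch below assumes a hypothetical product catalogue and template; the call to an actual generative AI service is deliberately left out, since it would depend on the provider's own API.

```python
from string import Template

# Hypothetical fixed prompt written once by the contractor who built the solution
PROMPT_TEMPLATE = Template("Generate a product photo of $item on a $background background")


def build_prompts(items: list[str], background: str = "plain white") -> list[str]:
    """Automatically generate one prompt per catalogue item, with no human input."""
    return [PROMPT_TEMPLATE.substitute(item=item, background=background) for item in items]


# Hypothetical catalogue data; in a live system this might come from a database
catalogue = ["a ceramic teapot", "a leather notebook"]
for prompt in build_prompts(catalogue):
    print(prompt)
```

Because the only human choices here were made when the template was written, it is worth settling in the contract who owns what the system produces.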
The situation is less clear when human intervention is still involved.
Consider this musical analogy: If I programmed a digital piano, and you played it, would I own your song? Of course not.
Conclusion
Whatever its outcome, the Getty-Stability case will set a major precedent for standard legal and business practice in the U.K., U.S. and beyond.
After all, during the monkey selfie case previously discussed, a judge said, "This is an issue for Congress and the president."[26]
Until a verdict is reached and any ensuing legislation is passed, we can only speculate whether governments in general will grant original copyright holders any right to compensation after their IP has been used as training data. First principles will continue to apply, i.e., the first test of copyright infringement, "Did you copy?", but the answer will turn on specific facts.
The brunt of the disruption will be felt by suppliers of foundation models, e.g., Stability, Midjourney and OpenAI, and it is less likely to affect downstream users relying on their services.
This mirrors the impact the General Data Protection Regulation had on the personal data-related services industry.
While a few generative AI players are curating datasets that avoid copyright material, these datasets are comparatively much smaller and so are prone to the usual problems of small sample sizes, e.g., bias.
At present, the big generative AI players assign copyright in outputs, e.g., images generated by Stability AI, to users in their terms of use, and this is likely to continue; otherwise, the commercial value of these products would be severely limited.
For now, the history of IP in art offers a reference point for anticipating how lawmakers will react to the generative AI revolution the world is witnessing.
The opinions expressed are those of the author(s) and do not necessarily reflect the views of their employer, its clients, or Portfolio Media Inc., or any of its or their respective affiliates. This article is for general information purposes and is not intended to be and should not be taken as legal advice.
References
[1] https://www.reuters.com/technology/getty-asks-london-court-stop-uk-sales-stability-ai-system-2023-06-01/.
[2] https://www.bbc.com/news/entertainment-arts-65385382.
[3] https://createsafe.notion.site/Elf-Tech-GrimesAI-1-Voiceprint-610d81fcf2844419afafab493c6ec4b4.
[4] https://www.bbc.com/news/entertainment-arts-65881813.
[5] https://www.bbc.co.uk/news/entertainment-arts-65309313.
[6] https://sifted.eu/articles/ai-supercomputer-petition-stable-diffusion.
[7] https://sifted.eu/articles/stability-getty-lawsuit.
[8] https://www.copyrightuser.org/understand/exceptions/text-data-mining/.
[9] https://www.allenovery.com/en-gb/global/blogs/digital-hub/proposed-changes-to-copyright-law-to-facilitate-data-mining.
[10] https://www.gov.uk/government/publications/national-ai-strategy/national-ai-strategy-html-version.
[11] https://www.gov.uk/government/publications/national-ai-strategy/national-ai-strategy-html-version.
[12] https://www.europarl.europa.eu/news/en/press-room/20230609IPR96212/meps-ready-to-negotiate-first-ever-rules-for-safe-and-transparent-ai.
[13] https://sifted.eu/articles/stability-getty-lawsuit.
[14] https://www.bridgemanimages.com/en-US/lucian-freud-copyright/9334#:~:text=Lucian%20Freud%20is%20considered%20one,his%20copyright%20exclusively%20and%20worldwide.
[15] https://www.unite.ai/beginners-guide-to-ai-image-generators/#:~:text=In%20training%2C%20neural%20networks%20identify,color%2C%20texture%2C%20and%20shape.
[16] https://daily.jstor.org/when-photography-was-not-art/.
[17] https://www.jstor.org/stable/25505621?mag=when-photography-was-not-art.
[18] https://philpapers.org/rec/SCRPAR-2.
[19] https://supreme.justia.com/cases/federal/us/111/53/.
[20] https://www.theguardian.com/music/2023/may/04/ed-sheeran-verdict-not-liable-copyright-lawsuit-marvin-gaye.
[21] https://openai.com/policies/terms-of-use.
[22] https://thenextweb.com/news/monkeys-selfie-determine-future-ai-copyright.
[23] https://www.wipo.int/wipo_magazine/en/2018/01/article_0007.html.
[24] https://www.copyright.gov/comp3/docs/compendium.pdf.
[25] https://www.mewburn.com/law-practice-library/software-copyright.
[26] https://arstechnica.com/tech-policy/2016/01/judge-says-monkey-cannot-own-copyright-to-famous-selfies/.