Artificial intelligence has proven to be one of the most disruptive technologies of our time, and its impact reaches many areas of our lives, including, of course, artistic creation. Many authors currently face doubts that affect their artistic processes and their role as creators. To a certain extent, one might think that copyright as we know it is in question.

Artificial intelligence forces us to rethink the classical concept of authorship, in which the author was seen as the complete “maker” of a work, both materially and intellectually. Even from a philosophical point of view, it leads us to redefine our social perception of authorship and creativity, because the incorporation of such an unpredictable and surprising technology into creative processes raises doubts about the author's qualitative and quantitative intervention in the final result. To what extent do an author's commands, orders and choices determine the result obtained as output? Can we continue to see authors as creators who handle the creation process as a whole? What role, then, does artificial intelligence play?

The history of art shows that recourse to technical innovation in artistic creation goes back a long way. Authors have always been concerned and curious about what their own time and the future could offer. The turn of the decade has coincided with one of the greatest digital and technological revolutions in history, along with an alteration of our consumption patterns as a consequence of the COVID-19 pandemic. Among these technological innovations (we can mention, among others, blockchain and works certified as NFTs), artificial intelligence is undoubtedly the one having the greatest impact, and it is in the field of artistic creation where its integration seems to generate the most doubts.

There are currently several open cases in which authors claim that these artificial intelligence tools infringe their intellectual property rights, asking the major technology corporations to pay damages, adopt compensatory measures and acknowledge their authorship. The issues raised by this connection between AI and art are diverse and appear at different points in the operating cycle of these tools.

For the sake of clarity, I will divide the major problems currently being raised into three groups, which coincide precisely with the way these artificial intelligence tools work: inputs, prompts and outputs.

DATA SETS, INPUTS, PLAGIARISM AND UNAUTHORISED REPRODUCTIONS

On the one hand, there is the gathering of data sets (inputs), i.e. large databases of information, texts, literary works, music, compositions, images, audiovisual works, etc., which artificial intelligence systems access in order to train their machine learning mechanisms.

Several technology developers admit that the current level of development of artificial intelligences and their internal learning processes would not have been possible without massive access to copyrighted works.1

Clearly, the very high quality of the results obtained and the ever-increasing refinement of the outputs that these tools produce would not be possible without a learning and training process based on works, pieces, texts and a myriad of content, including data and information, which artificial intelligences have made use of.
For this reason, one of the most frequent claims currently being pursued in the United States2 comes from authors who allege that artificial intelligence has used their works on a massive scale without consent, licence or authorisation, so that the results produced by these tools constitute plagiarism and the unauthorised reproduction and transformation of those works, without the authors having received fair compensation for it.

Understanding how an artificial intelligence works from the inside means anticipating that, legally, these cases are very difficult to argue. Proving a claim of plagiarism in court involves comparing two works: one original and another so substantially similar that there is no doubt it is a copy. Plagiarism or unauthorised transformation is very difficult to prove from the output produced by an artificial intelligence. In this context, we must bear in mind that intellectual property does not protect styles or aesthetics; plagiarism would be much more than a work simply ‘in the style’ of a given author. We have yet to see how these open cases will evolve, but for the time being it seems that the courts will lean in favour of the way artificial intelligence operates.


Artwork by Sarah Andersen – output by Midjourney. Sarah, along with other authors, has sued Midjourney.


LET’S START AT THE BEGINNING: WHERE DOES THE DATA COME FROM?

In the construction of datasets, web scraping or data mining is an essential technique for accessing the enormous volumes of data used by artificial intelligences in their learning processes. There is no doubt that among the data, texts, images and content used there are copyrighted materials. The question is whether such access to content, and its use for training and learning purposes, is lawful and has a legal framework to protect it.
Directive (EU) 2019/790 on copyright and related rights in the Digital Single Market expressly provides for the use of web scraping as an exception that authorises access to content protected by intellectual property under two essential conditions: the first relates to the purpose for which the data is accessed, which must be research for scientific and academic purposes; the second relates to the fact that the authors have not expressly excluded their works from these web crawling processes. Where creators have excluded their works, accessing them for the purpose of mass analysis and processing would require express authorisation from the owner of the works. In addition, content crawlers must have had access to these materials in a lawful manner, which is understood to occur when the materials are publicly accessible on a web page without any type of blocking or opt-out in its configuration that prevents the scraping.
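
The Directive does not prescribe a single technical mechanism for this opt-out, but in practice one common machine-readable signal is the robots.txt file published by a website. As a minimal illustrative sketch in Python (the site URL and the crawler name “ExampleAIBot” are hypothetical placeholders), this is roughly how a crawler could check whether a rightsholder has reserved a given page before scraping it:

# Minimal sketch: checking a site's machine-readable opt-out before scraping.
# Uses only the Python standard library; the URL and user-agent are hypothetical.
from urllib import robotparser

parser = robotparser.RobotFileParser()
parser.set_url("https://example.com/robots.txt")  # hypothetical site
parser.read()  # download and parse the robots.txt rules

page = "https://example.com/gallery/artwork-001.jpg"
if parser.can_fetch("ExampleAIBot", page):
    print("No opt-out found for this agent and path: crawling would be allowed.")
else:
    print("Opt-out in place: the rightsholder has reserved this content.")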

The current draft of the European Union Regulation on artificial intelligence expresses itself in very similar terms with regard to the data mining that artificial intelligence tools must carry out in their learning and training processes when accessing content protected by copyright. This regulation, in fact, states that it must be fully compatible with the provisions of Directive 2019/790, which ultimately means that the exception authorising data mining and web scraping continues to operate under the conditions foreseen in the Directive.

It should be noted that many developers of open-use artificial intelligence platforms have set up premium subscription versions in which users can exclude their content from the tool’s learning processes. For general open users, the terms and conditions of use of these platforms establish that any information, data, command or material inserted into the platform as a prompt, with or without images, becomes an input that artificial intelligences incorporate into their databases to continue their learning processes, and even to adapt better to the user’s own needs, demands or orientations.

In the United States, where most of the cases brought by authors in defence of their works have been filed, there is no regulatory provision similar to the European one; very similar results are reached, however, through doctrinal constructions such as fair use. According to this doctrine, an exploitation of copyrighted materials whose purpose is to transform, comment on or analyse a pre-existing work, without creating confusion between the two works in the mind of the user and without causing direct harm to the author of the original work, could be considered legitimate even without the express authorisation of the authors whose works are being used as a basis.

This was the main argument in the well-known Google Books case, decided in 2015, where the repositories offered by the search engine also provided a short excerpt of the original work. The Authors Guild in the United States sued the technology company, claiming that it was making an unauthorised reproduction of their works, with the obvious harm this could cause them and the decrease in sales to potential readers. The courts, however, considered that Google’s actions were covered by the doctrine of fair use because, in the end, the extracts offered by the search engine reproduced only very short fragments of the original work (a harmless use) and, ultimately, this could even benefit authors by encouraging interest in the original work. It is very likely that the cases now open in the United States will be channelled in this way, given the difficulty of proving some of the allegations presented by the creators, such as the plagiarism already mentioned. And this is indeed the line of defence used by the big tech companies behind the AIs.

Nightshade demo display, by Ben Y. Zhao, via artnews.com


However, creators are very concerned about the illegitimate exploitation and indiscriminate use of their works by large corporations to train the artificial intelligence models that they then commercialise. The possibility of excluding specific content through premium subscription accounts is relatively recent and seems to respond to the demand of authors who want to keep their works out of these processes. This configuration is in line with the European regulatory provisions. It should also be recalled that the latest version of DALL-E, launched in October 2023, is programmed so that its outputs cannot reproduce the style of living artists.

Additional technological solutions are also being sought, such as software or technical mechanisms to prevent artificial intelligences from exploiting content. One example is the Nightshade initiative (Glaze’s successor) developed by the University of Chicago, which allows a filter to be embedded in the files of digital works, the effect of which is to progressively prevent the correct identification of images by artificial intelligences; as a result, the outputs generated no longer correspond to the terminology used by users and are erroneous. Other practices are also becoming standard, such as the insertion of clauses excluding content from training uses in publishing contracts (music, literary) or the configuration of robots.txt to prevent crawlers from accessing the contents of our websites, as sketched below.
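
As a minimal sketch of that last practice, the following Python snippet writes a robots.txt that asks AI-related crawlers to stay away while leaving ordinary indexing untouched. The user-agent tokens listed are those published by the respective operators at the time of writing (OpenAI’s GPTBot, Google’s Google-Extended, Common Crawl’s CCBot); they are given only as examples and should be checked against current documentation, since crawlers are expected, not forced, to honour these rules:

# Minimal sketch: generating a robots.txt that opts a site out of crawling
# by some AI-related crawlers. The tokens below are published by the
# respective operators; verify them against current documentation.
AI_CRAWLERS = ["GPTBot", "Google-Extended", "CCBot"]

rules = [f"User-agent: {agent}\nDisallow: /\n" for agent in AI_CRAWLERS]

# Leave every other crawler (e.g. ordinary search-engine bots) unrestricted.
rules.append("User-agent: *\nDisallow:\n")

with open("robots.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(rules))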

We will continue talking about prompts and outputs in the next post.

______________

Main image credits: Boris Eldagsen, “The Electrician”, artwork made with AI, winner of the Sony World Photography Awards in 2023 (the author declined the award).

______________
Notes

1 The Register: “OpenAI: ‘Impossible to train today’s leading AI models without using copyrighted materials’”. Available at: https://www.theregister.com/2024/01/08/midjourney_openai_copyright/

2 Among others: New York Times Company v. Microsoft Corp. and OpenAI (Dec. 2023); Sancton v. OpenAI Inc., Microsoft Corporation, et al. (Nov. 2023); Authors Guild, et al. v. OpenAI (Sept. 2023); Chabon v. OpenAI (Sept. 2023); Silverman, et al. v. OpenAI (July 2023); Tremblay v. OpenAI (June 2023); Getty Images (US) v. Stability AI (Feb. 2023).


Author: Marta Suárez-Mansilla. Photo: Berta Delgado, YANMAG.

A lawyer specializing in cultural law. With extensive experience in contemporary art and project management, her activity now focuses on the legal issues surrounding this field of work.

© Marta Suárez-Mansilla
ISSN 2530-397X
ArtWorldLaw Bulletin. Chronicles of Themis & Athenea. nº 10. MADRID. February 2024.