Entertainment and Media Guide to AI



Copyright is a territorial beast, and countries differ considerably in how they have chosen to approach the text and data mining (TDM) debate.

The U.S. approaches TDM through its doctrine of “fair use,” which permits limited use of copyright protected material without first obtaining permission from the copyright holder, in particular where the contemplated use is deemed “transformative.” Japan enjoys a flexible copyright exception for “non-enjoyment” purposes. Other jurisdictions across the globe, such as Singapore, South Korea, Malaysia, Israel and Taiwan, have adopted similar rules, with the firm intention of removing uncertainty for their tech industries and positioning themselves, unencumbered, in the AI race.

The EU followed suit, albeit with a much shallower version of the exception as far as businesses are concerned. The Directive on Copyright in the Digital Single Market, adopted in 2019 (the Copyright Directive), introduced two mandatory TDM exceptions under EU copyright law: (i) one allowing research organizations and cultural heritage institutions to carry out TDM for the purposes of scientific research; and (ii) another available to any beneficiary for any purpose, but with a significant caveat: rightsholders may override it by opting out. The opt-out was a concession to rightsholders introduced at the very last stage of the Copyright Directive's adoption process, and it is fraught with practical difficulties.

The UK did not transpose the Copyright Directive and, given the charged atmosphere that seems to have permeated this issue at home and elsewhere, may be unable to legislate on TDM for some time. Meanwhile, the UK risks a Catch-22: it wants to encourage its AI sector, yet its strict copyright rules leave very limited scope for data extraction without a license. That is, of course, unless the UK, now free from its EU shackles, decides to reinvent the meaning of its “fair dealing” exception…

In the following articles, we look at how various jurisdictions have approached the text and data mining debate.

The United States

Copying copyright protected content for TDM purposes

In the United States, section 106 of the U.S. Copyright Act of 1976 reserves the reproduction right to the copyright owner of a work and its licensees. While U.S. copyright law contains no express exception for TDM, section 107 of the Copyright Act authorizes the fair use of a copyright protected work, “including such use by reproduction in copies or phonorecords or by any other means specified by that section, for purposes such as criticism, comment, news reporting, teaching […], scholarship, or research.” The technology sector has traditionally considered copying copyright protected works for the sole purpose of text and data mining to be fair use. The creative sector disagrees, and the launch of generative AI solutions capable of producing photos, paintings and music at the push of a button has seen copyright holders rally behind the “unfair use” banner to condemn the use of their content by AI businesses.

What is fair use?

To determine whether the use of a copyright protected work without the consent of the copyright owner constitutes non-infringing fair use, courts balance the following four factors in a case-by-case, highly fact-specific inquiry:

(1) The purpose and character of the use, including whether the use is of a commercial nature or is for non-profit educational purposes;

(2) The nature of the copyright protected work;

(3) The amount and substantiality of the portion used in relation to the copyright protected work as a whole;

(4) The effect of the use upon the potential market for or value of the copyright protected work.

The first factor, also known as the “transformative use factor,” is generally the most heavily weighted by the courts. The inquiry asks whether the new use merely supersedes the existing work or, instead, adds something new, with a further purpose or different character, altering the first work with new expression, meaning or message; only in the latter case is the use transformative.1 Copying and storing a work in substantially the same form as the original, without meaningful alteration, does not preclude a finding that the use is transformative, so long as the copier's use serves a materially different function from that of the original work.2

Courts have found uses to be transformative where, for example, the defendant made digital copies of student papers for use in anti-plagiarism software (a use unrelated to the works' expressive content),3 or scanned books to create a full-text searchable database and public search function that did not allow users to read the texts.4 While educational and non-commercial uses are generally more likely to be found fair, courts will not necessarily find a commercial use unfair; instead, they balance the purpose and character of the use against the other factors.

Copies of original works made for TDM purposes appear to serve a purely functional purpose: to teach an AI model the underlying characteristics of a work through pattern recognition. Because such copies are not released or made available to the public, their use would appear to be transformative in a manner consistent with the existing case law.

The second fair use factor examines whether the reproduced content is factual in nature, in which case it receives a lower level of protection, so as to encourage the dissemination of scientific or educational works for the public's benefit. While the reproduction of nonfactual, creative works such as images or sound recordings is less likely to satisfy this factor, courts have generally given the second factor little weight in the fair use balance, and it is rarely determinative.

The third fair use factor assesses whether the quantity and significance of the portion of the work reproduced are justified in light of the purpose of the copying. Copying an entire image, sound recording or other creative work may seem at odds with fair use, but it does not necessarily preclude a fair use finding.5 Importantly, the factors are not scrutinized in isolation but weighed collectively. In this regard, the fourth factor, which along with the first factor is generally given the greatest weight by the courts, could tip the balance towards a fair use finding.

The fourth factor examines whether the copy brings to the marketplace a competing substitute for the original work, or diminishes the original work's value by serving as an alternative that potential buyers might prefer.6 More generally, to be deemed fair, the use should not harm the market (or potential market) for, or the value of, the original copyright protected work by serving as a viable substitute. As copyright is a commercial right intended to protect the ability of authors to profit from their work, this factor is often influential in a fair use analysis.7 The interrelation between the fourth and first factors is crucial: the more the new work serves a different purpose from the original work (the first factor), the less likely it is to serve as a market substitute for the original (the fourth factor).8

Should courts consider the long-term purpose of TDM operations when assessing the fairness of the practice? Should the fair use analysis differ depending on the type of AI model being trained (generative or not)? These questions are highly topical and largely hinge on how U.S. courts resolve a small number of highly visible lawsuits, which the drafters of this guide will watch closely.

Key takeaways
  • While certain jurisdictions, such as the European Union, are developing a regulatory landscape around AI, others, such as the United States, rely on industry-specific guidelines (at both the federal and state level) in the absence of comprehensive legislation.
  • Certain jurisdictions, such as Singapore, quickly adapted to generative AI by creating an exception permitting AI companies to reproduce copyrighted works for training purposes.
  • In the United States, in the absence of a TDM exception, AI companies contend that including copyrighted materials in training sets constitutes fair use, i.e. not copyright infringement, a position that remains to be tested in the courts.