In what may be the first pebble falling to start a landslide:
Thomson Reuters prevailed on two of the four factors, but [Judge] Bibas described the fourth as the most important, and ruled that Ross “meant to compete with Westlaw by developing a market substitute.”
From Wikipedia:
Fair use is a doctrine in United States law that permits limited use of copyrighted material without having to first acquire permission from the copyright holder. Fair use is one of the limitations to copyright intended to balance the interests of copyright holders with the public interest in the wider distribution and use of creative works by allowing as a defense to copyright infringement claims certain limited uses that might otherwise be considered infringement.
As mentioned in that article, fair use is what allows us to have parody, criticism, and news reporting without being encumbered by copyright claims, particularly when the subject of that parody, criticism, or news reporting disagrees with it and looks for a cudgel to strike it down.
It’s hard to argue that generative artificial intelligence companies today aren’t using copyrighted works to create a market substitute. Why look for the original source when you can just ask ChatGPT for an answer instead? I see this all day, every day in how folks use AI.
However, in Authors Guild v. Google:
A major question in Authors Guild v. Google was whether Google’s use of the copyrighted works was “transformative,” a key component of the fair use inquiry. When a use is found to be transformative, this in practice weighs heavily in favor of a finding of fair use. In the case, the court found that Google’s scanning, as well as the search and snippet display functions, were transformative because the service “augments public knowledge by making available information about [the] books without providing the public with a substantial substitute for . . . the original works.”
Large language models (LLMs) can transform the original work. (Although not always! The New York Times alleges ChatGPT spits out portions of its articles verbatim.) But Google’s book snippets were not seen as a substitute for the original works, and therefore did not compete against them. In the case of LLMs, how many folks find an answer using ChatGPT and then decide to look up the original work? How many people even know there’s an original work behind that machine-generated answer?