Meta Platforms, Inc. faces a legal battle over allegations that it used copyrighted materials to train its Llama models. The controversy centers on accusations that Meta researchers attempted to obscure the incorporation of copyrighted content by introducing "supervised samples" during Llama's fine-tuning process. This legal confrontation is part of a broader trend in which artificial intelligence companies face lawsuits from authors and intellectual property holders in United States courts.
The plaintiffs in this high-profile case include bestselling authors Sarah Silverman and Ta-Nehisi Coates. Legal representatives for the plaintiffs have cited Meta employees acknowledging the use of LibGen, a known repository of pirated content, as a data source. Court documents reveal that Meta employees expressed concerns that utilizing LibGen could compromise the company's negotiations with regulators. Despite these internal warnings, CEO Mark Zuckerberg allegedly approved the use of LibGen for training at least one Llama model, according to court filings.
LibGen and Z-Library, two online platforms implicated in these allegations, have faced multiple lawsuits for copyright infringement, resulting in orders to shut down and fines amounting to tens of millions of dollars. These platforms provide access to copyrighted works from major publishers like Cengage Learning, Macmillan Learning, McGraw Hill, and Pearson Education. In 2022, Russian nationals accused of maintaining LibGen were charged with copyright infringement, wire fraud, and money laundering.
Meta's Llama models compete directly with flagship artificial intelligence models from companies like OpenAI. The plaintiffs claim that Meta is leveraging this contentious dataset to train its forthcoming Llama 4 models. During his deposition, Zuckerberg distanced himself from familiarity with LibGen, stating:
“I get that you’re trying to get me to give an opinion of LibGen, which I haven’t really heard of.” – Mark Zuckerberg
He further elaborated on his lack of knowledge about the platform:
“It’s just that I don’t have knowledge of that specific thing.” – Mark Zuckerberg
Despite these assertions, Zuckerberg acknowledged the importance of caution when dealing with potentially infringing sources:
“You know, if there’s someone who’s providing a website and they’re intentionally trying to violate people’s rights … obviously it’s something that we would want to be cautious about or careful about how we engaged with it or maybe even prevent our teams from engaging with it.” – Mark Zuckerberg
Zuckerberg also drew comparisons with another media platform:
“And the vast majority of the stuff on YouTube, I would assume, is kind of good and they have the license to do.” – Mark Zuckerberg
While he emphasized caution, Zuckerberg also argued against blanket bans:
“There are cases where having such a blanket ban might not be the right thing to do.” – Mark Zuckerberg
The use of pirated e-books from Z-Library for training Meta's Llama models reportedly continued as recently as April 2024. Meta employees had previously flagged this practice as potentially detrimental to the company's negotiating position with regulators. These revelations underscore how difficult it is for AI companies to balance innovation with legal and ethical considerations.
This lawsuit against Meta is not an isolated incident. It reflects growing scrutiny of how AI firms use datasets, especially when the data includes copyrighted material. Authors and publishers are increasingly vigilant in protecting their intellectual property rights amid the rapid advancement of AI technologies.