
Reddit Sues Perplexity, Oxylabs, Two More Web Data Providers

The front page of the internet filed the lawsuit over “illegal theft” of its data. 

Adam Dubois

On Wednesday, October 22, Reddit initiated legal proceedings against Perplexity, an AI-powered search engine. The complaint also names three providers of web data as defendants: Oxylabs, SerpApi, and AWMProxy.

Filed in a New York federal court, Reddit’s lawsuit accuses the four companies of scraping Reddit content from Google’s search results.

Reddit’s description of the alleged transgression colorfully compares the three web scraping companies to “would-be bank robbers breaking into an armored truck carrying cash” and Perplexity to a “North Korean hacker” that hired them to steal the platform’s “manna from heaven”.

One example in the complaint claims that in two weeks during July 2025, the defendants scraped over three billion Google search results pages containing Reddit text, images, videos, and URLs.

[Image: The scale of Google scraping over two weeks in July 2025. This data was obtained through a subpoena issued to Google. Source: Reddit’s lawsuit.]

The gist of Reddit’s argument is that, despite receiving a cease-and-desist letter, Perplexity continued (and even increased the scope of) scraping the platform’s data, circumventing the protection systems erected by Reddit and Google. To confirm its suspicions, Reddit planted a honeypot page available only to Google’s crawler.

In other words, the plaintiff is unhappy that Perplexity opted against buying its data like Google or OpenAI, and instead “will stop at nothing to get their hands on valuable copyrighted content”. Reddit claims that this has damaged its business and reputation, and alleges several violations of the Digital Millennium Copyright Act, along with unfair competition, unjust enrichment, and even civil conspiracy.

As relief, Reddit asks the court to stop the defendants from scraping Google and Reddit and from selling the scraped data, and to order them to compensate Reddit for the injuries it has suffered.

Perplexity and Oxylabs have reacted to the lawsuit. 

Ironically, Perplexity’s response appeared on Reddit, saying that “this is a sad example of what happens when public data becomes a big part of a public company’s business model”, and that the lawsuit “is about a show of force in Reddit’s training data negotiations with Google and OpenAI”.

Oxylabs’ representative Denas Grybauskas expressed shock and disappointment “as Reddit has made no attempt to speak with us directly or communicate any potential concerns”. He added that “Oxylabs provides infrastructure for compliant access to publicly available information, and we demand every customer to use our services lawfully”.

We find Reddit’s lawsuit bizarre. According to the platform’s own rules, “most of Reddit’s platform is public and accessible to everyone, even without an account. This is intentional.” In addition, despite getting a license to use user-posted content, Reddit leaves ownership to the user. And finally, Reddit’s claim involves information that’s not even hosted on its platform – so why not sue Google for making it available to others?

[Image: Reddit claiming to be a public platform. Is it, really? Source: Reddit's public content policy.]

Collection of public web data is currently viewed favorably by the US courts. Last year, X Corp.’s similar attempt to control access to data it merely licenses from users fared poorly in its lawsuit against Bright Data. However, the case law is still not settled, and it’d be dangerous to carelessly generalize from individual decisions.

Reddit has been limiting access to content for a while. In 2023, the platform started charging for API access, severely hampering its vibrant ecosystem of third-party tools. Later on, many users with datacenter-based or VPN IP addresses could no longer open the website without logging in. Reddit’s efforts to exert control have also extended to the courtroom: the company sued Anthropic, another AI startup, earlier this year.