Social media platform Reddit sued artificial intelligence firm Perplexity AI and three other entities last week, alleging their involvement in an “industrial-scale, unlawful” economy to “scrape” the comments of millions of Reddit users for commercial gain.

Reddit’s lawsuit in a New York federal court takes aim at San Francisco-based Perplexity, maker of an AI chatbot and “answer engine” that competes with Google, ChatGPT and others in online search.

Also named in the lawsuit are Lithuanian data-scraping company Oxylabs UAB, a web domain called AWMProxy that Reddit describes as a “former Russian botnet,” and Texas-based startup SerpApi, which lists Perplexity as a customer on its website.

It’s Reddit’s second such lawsuit; it sued another major AI company, Anthropic, in June.

The lawsuit filed last week differs in that it targets not just an AI company but also the lesser-known services the AI industry relies on to acquire the online writing needed to train AI chatbots.

“Scrapers bypass technological protections to steal data, then sell it to clients hungry for training material. Reddit is a prime target because it’s one of the largest and most dynamic collections of human conversation ever created,” Ben Lee, Reddit’s chief legal officer, said in a statement.

The lawsuit accuses the companies of unfair competition and unjust enrichment and alleges that some of them violated U.S. copyright laws.

Perplexity said it had not received the lawsuit but “will always fight vigorously for users’ rights to freely and fairly access public knowledge. Our approach remains principled and responsible as we provide factual answers with accurate AI, and we will not tolerate threats against openness and the public interest.”

Scraping publicly available online data is a common practice among businesses and researchers, but Reddit compares the companies it is suing to “would-be bank robbers” who, unable to get into the bank vault, break into the armored truck instead. The lawsuit alleges they are evading Reddit’s own anti-scraping measures while also “circumventing Google’s controls and scraping Reddit content directly from Google’s search engine results.”