- AI chatbots already have biases and other flaws due to the imperfect data they’re trained on.
- A group of researchers found that malicious actors could deliberately “poison” the data.
- The methods are cheap and some don’t require too much technical skill, a researcher told BI.
A group of AI researchers recently found that for as little as $60, a malicious actor could tamper with the datasets that generative AI tools similar to ChatGPT rely on to provide accurate answers.
Chatbots or image generators can spit out complex answers and pictures by learning from terabytes of data grabbed from the vast digital world that is the internet.
It’s an effective way to make chatbots powerful, Florian Tramèr, an assistant professor of computer science at ETH Zurich, told Business Insider. But this method also means AI tools could be trained on data that’s not always accurate.
“When you want to train an image model,” Tramèr said, “you kind of have to trust that all these places that you’re going to go and download these images from, that they’re going to give you good data.”
It’s one reason chatbots can be rife with biases or flat-out provide incorrect answers. The internet is full of misinformation.
Tramèr and a team of AI researchers then posed the question in a paper published in February on arXiv, a research paper platform hosted by Cornell University: Could someone deliberately “poison” the data an AI model is trained on?
They found that with some spare cash and enough technical know-how, even a “low-resourced attacker” can tamper with a relatively small amount of training data, and that this is enough to cause a large language model to churn out incorrect answers.
Dead domains and Wikipedia
Tramèr and his colleagues looked at two kinds of attacks.
One way hackers could poison the data is by purchasing expired domains, which can cost as little as $10 a year for each URL, and then putting any kind of information they want on the websites.
For $60, Tramèr’s paper said, an attacker could purchase domains and effectively control and poison at least 0.01% of a dataset. That amounts to tens of thousands of images.
“From an attacker’s perspective, this is great because it gives them a lot of control,” Tramèr said.
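A rough back-of-the-envelope calculation shows why such a small fraction still matters. This is only an illustrative sketch: the dataset size below is an assumption on the order of a few hundred million image URLs, not a figure taken from the paper; the $10-per-domain and $60 figures come from the article.

```python
# Illustrative estimate of how many images 0.01% of a web-scale dataset
# represents. DATASET_SIZE is an assumed, made-up order of magnitude;
# the per-domain cost and budget are the figures cited in the article.
DATASET_SIZE = 400_000_000   # assumed number of image URLs in the dataset
POISONED_FRACTION = 0.0001   # 0.01% of the dataset
COST_PER_DOMAIN = 10         # dollars per expired domain per year
BUDGET = 60                  # attacker's budget in dollars

domains_purchasable = BUDGET // COST_PER_DOMAIN
poisoned_images = int(DATASET_SIZE * POISONED_FRACTION)

print(f"A ${BUDGET} budget buys roughly {domains_purchasable} expired domains")
print(f"0.01% of {DATASET_SIZE:,} URLs is {poisoned_images:,} images")
```

Under these assumptions, 0.01% of a 400-million-image dataset is about 40,000 images, which is consistent with the “tens of thousands” the paper describes.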
According to Tramèr, the team tested this attack by looking at datasets other researchers rely on to train real large language models and purchasing expired domains within those datasets. The team then monitored how often researchers downloaded from the datasets that contained domains Tramèr and his colleagues owned.
With the domains under his control, Tramèr could tell researchers trying to download the data that a particular image was “no longer available.” But he could just as easily have served them anything he wanted.
“A single attacker could control a large enough fraction of the data that is used to train the next generation of machine learning models,” Tramèr said, and “influence how this model behaves in some sort of targeted ways.”
Another attack Tramèr and his colleagues looked into involved poisoning data on Wikipedia, as the site is a “very prime component of the training sets” for language models, Tramèr said.
“By the internet’s standards, it’s a very high-quality source of text and sources of facts about the world,” he said, adding that it’s the reason researchers give “extra weight” to data from Wikipedia when training language models even though the website makes up a small part of the internet.
Tramèr’s team outlined a fairly unsophisticated attack involving carefully timed Wikipedia page edits.
Wikipedia doesn’t allow researchers to scrape its pages directly but instead provides “snapshots” of its pages that they can download, Tramèr said.
These snapshots are taken at regular and predictable intervals that are advertised on Wikipedia’s website, according to Tramèr.
This means that a malicious actor could time edits to Wikipedia to land just before the website takes a snapshot, leaving too little time for a moderator to revert the changes.
“That means if I want to go and put some junk on the Wikipedia page of say, Business Insider, I’m just going to do a little math, estimate that this particular page is going to be saved tomorrow at 3:15 p.m.,” he said, and “tomorrow at 3:14 p.m. I’m going to add junk on it.”
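The timing logic Tramèr describes can be sketched in a few lines. This is a minimal illustration assuming the snapshot schedule is published and predictable; the snapshot time and one-minute lead window below are invented example values echoing his 3:15 p.m. scenario, not details from the paper.

```python
from datetime import datetime, timedelta

# Minimal sketch of the timing attack described above: if snapshot times
# are advertised in advance, an attacker only needs to schedule an edit
# shortly before the next snapshot, so moderators have little time to
# revert it. All times here are made-up example values.

def next_snapshot(after: datetime, advertised: list[datetime]) -> datetime:
    """Return the first advertised snapshot time after the given moment."""
    return min(t for t in advertised if t > after)

def edit_time(snapshot: datetime, lead: timedelta = timedelta(minutes=1)) -> datetime:
    """Schedule the malicious edit just before the snapshot is taken."""
    return snapshot - lead

advertised_snapshots = [datetime(2024, 3, 1, 15, 15)]  # "tomorrow at 3:15 p.m."
now = datetime(2024, 3, 1, 12, 0)

snap = next_snapshot(now, advertised_snapshots)
print("Snapshot expected at:", snap)
print("Edit scheduled for:  ", edit_time(snap))  # 3:14 p.m., as in Tramèr's example
```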
Tramèr told BI that his team didn’t perform real-time edits but instead calculated how effective an attacker could be. Their “very conservative” estimate was that at least 5% of edits made by an attacker would make it through.
“In practice, it will likely be a lot more than 5%,” he said. “But in some sense, for these poisoning attacks, it doesn’t really matter. You usually don’t need all that much bad data to get one of these models to suddenly have some new unwanted behavior.”
Tramèr said that his team presented the findings to Wikipedia and provided suggestions for safeguards, including randomizing the time the website takes snapshots of its web pages.
A spokesperson for Wikipedia did not immediately respond to a request for comment sent over the weekend.
The future of data poisoning
Tramèr told BI that if the attacks are limited to chatbots, then data poisoning wouldn’t be an immediate concern.
He’s more anxious about a future where AI tools interact more with “external systems,” allowing users to, say, instruct a ChatGPT-like model to browse the web, read their emails, access their calendar, or book a dinner reservation, he said, adding that many startups are already working on these types of tools.
“From a security perspective, these things are a complete nightmare,” Tramèr said, because if any part of the system is hijacked, an attacker could theoretically command the AI model to search for someone’s email or find a credit card number.
Tramèr also said that data poisoning isn’t even necessary at the moment, given the existing flaws of AI models. Often, exposing the pitfalls of these tools is almost as simple as asking the models to “misbehave.”
“At the moment, the models we have, in a way, are brittle enough that you don’t even need poisoning,” he said.