A Watermark for Chatbots Can Expose Text Written By AI

Hidden patterns purposely buried in AI-generated texts could help identify them as such, allowing us to tell whether the words we’re reading are written by a human or not.

These “watermarks” are invisible to the human eye but let computers detect that the text probably comes from an AI system. If embedded in large language models, they could help prevent some of the problems that these models have already caused.

For example, since OpenAI’s chatbot ChatGPT was launched in November 2022, students have already started cheating by using it to write essays for them. News website CNET has used ChatGPT to write articles, only to have to issue corrections amid accusations of plagiarism. Building the watermarking approach into such systems before they’re released could help address such problems.

In studies, these watermarks have already been used to identify AI-generated text with near certainty. Researchers at the University of Maryland, for example, were able to spot text created by Meta’s open-source language model, OPT-6.7B, using a detection algorithm they built. The work is described in a paper that’s yet to be peer-reviewed, and the code will be available for free around February 15. 

AI language models work by predicting and generating one word at a time. After each word, the watermarking algorithm randomly divides the language model’s vocabulary into words on a “greenlist” and a “redlist” and then prompts the model to choose words on the greenlist. 
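A minimal Python sketch of the greenlist idea described above may help make it concrete. It is an illustration under stated assumptions, not the researchers' code: the function names, the use of a hash of the previous word as the random seed, the 50/50 split, and the fixed score boost are all hypothetical choices made for this example.

```python
import hashlib
import random

def green_list(prev_word, vocab, fraction=0.5):
    """Pseudo-randomly split the vocabulary and return the "green" part.

    Assumption: the split is seeded by a hash of the previous word, so the
    same split can be recomputed later without access to the model.
    """
    digest = hashlib.sha256(prev_word.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    rng = random.Random(seed)
    shuffled = sorted(vocab)  # fixed starting order keeps the split reproducible
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * fraction)
    return set(shuffled[:cut])

def pick_next_word(prev_word, scores, boost=2.0):
    """Choose the next word after nudging greenlisted words upward.

    `scores` maps candidate words to model scores (hypothetical values here).
    Ties are broken alphabetically so the result is deterministic.
    """
    green = green_list(prev_word, scores)
    adjusted = {w: s + (boost if w in green else 0.0) for w, s in scores.items()}
    return max(sorted(adjusted), key=adjusted.get)

# Toy usage: made-up scores for words that might follow "beautiful".
scores = {"flower": 1.0, "orchid": 1.0, "garden": 0.5, "sunset": 0.2}
print(green_list("beautiful", scores))
print(pick_next_word("beautiful", scores))
```

The boost only tilts the choice toward greenlisted words. A redlisted word with a much higher score can still win, which is one way such a scheme could avoid forcing unnatural wording.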

The more greenlisted words in a passage, the more likely it is that the text was generated by a machine. Text written by a person tends to contain a more random mix of words. For example, after the word “beautiful,” the watermarking algorithm could classify the word “flower” as green and “orchid” as red. The AI model with the watermarking algorithm would be more likely to use the word “flower” than “orchid,” explains Tom Goldstein, an assistant professor at the University of Maryland, who was involved in the research.
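Detection by counting greenlisted words can be sketched in the same spirit. The example below is hypothetical and is not the detection algorithm from the paper: it assumes the detector can recompute the same split from each preceding word, and it summarizes the count with a standard z-score against the number of green words expected by chance.

```python
import hashlib
import math
import random

def green_list(prev_word, vocab, fraction=0.5):
    """Recompute the pseudo-random green part of the vocabulary.

    Assumption: the split is seeded by a hash of the previous word.
    """
    digest = hashlib.sha256(prev_word.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    shuffled = sorted(vocab)
    rng.shuffle(shuffled)
    return set(shuffled[: int(len(shuffled) * fraction)])

def green_z_score(words, vocab, fraction=0.5):
    """Measure how far the green-word count sits above chance.

    Each word after the first is checked against the green list derived
    from the word before it. Text written without the watermark should
    score near 0; heavily greenlisted text scores well above it.
    """
    n = len(words) - 1  # number of (previous word, word) pairs
    if n < 1:
        raise ValueError("need at least two words")
    hits = sum(
        words[i + 1] in green_list(words[i], vocab, fraction) for i in range(n)
    )
    expected = fraction * n
    std = math.sqrt(n * fraction * (1 - fraction))
    return (hits - expected) / std

# Toy usage with a made-up ten-word vocabulary.
vocab = [f"w{i}" for i in range(10)]
text = ["w0"]
for _ in range(20):
    # Always pick a green word, as a fully watermarked generator would.
    text.append(sorted(green_list(text[-1], vocab))[0])
print(green_z_score(text, vocab))
```

The score grows with the length of the passage, which fits the claim that more greenlisted words make machine authorship more likely: a handful of green words proves little, while a long run of them is hard to explain by chance.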

ChatGPT is one of a new breed of large language models that generate text so fluent it could be mistaken for human writing. These AI models regurgitate facts confidently but are notorious for spewing falsehoods and biases. To the untrained eye, it can be almost impossible to distinguish a passage written by an AI model from one written by a human. The breathtaking speed of AI development means that new, more powerful models quickly make our existing tool kit for detecting synthetic text less effective. AI developers are in a constant race to build new safety tools that can keep pace with the latest generation of AI models.

“Right now, it’s the Wild West,” says John Kirchenbauer, a researcher at the University of Maryland, who was involved in the watermarking work. He hopes watermarking tools might give AI-detection efforts the edge. The tool his team has developed could be adjusted to work with any AI language model that predicts the next word, he says.

Read the rest of this story in MIT Technology Review.
