Hint: it has nothing to do with our inability to tell the difference between what's real and what's fake.
A deepfake is a manufactured image, audio, or video recording. It looks or sounds real but, in at least some crucial respects, is not. For example, you could dub a new voice track over a video to make it look like someone said something they didn't (example: Anderson Cooper). Or you could paste someone's head onto someone else's body to make it look like they did something they didn't. Most radical of all, you could fabricate the entire audio or video without relying on an original recording at all.
Given this range, the difference between an original recording and a deepfake is one of degree, not kind. A slightly altered video isn't quite a deepfake, but it isn't entirely original, either. On the other hand, a fully manufactured video with a novel mashup of time, place, persons, and content is basically animation.
If you've never seen a deepfake (that you know of), here are some classic examples. In 2018, BuzzFeed and comedian Jordan Peele made this deepfake of President Obama badmouthing his political opponents. Taylor Swift speaking in Mandarin? No problem. And when people find out that I've been researching fake news, half of them want to talk about the Tom Cruise deepfake that's made its way around social media.
These are just the tip of the iceberg. In the early years, deepfakes consisted largely of non-consensual pornography (where someone puts the head of a celebrity on the body of an actor in an adult film). But nowadays deepfakes can be found everywhere from sports to movies to politics.
We've had the technology to create deepfakes for quite some time, but that tech is getting better and cheaper by the month. If you watch the Peele deepfake (above), it's pretty obvious that President Obama isn't really saying those things. But that was back in 2018. The new stuff is much more convincing.
In fact, the rapid growth of generative AI over the last year has driven massive improvements in the quality and speed of deepfake production. Most readers probably know that ChatGPT is an AI system that produces text from short prompts. Well, what ChatGPT did for text, Sora now does for video. Sora, just announced by OpenAI, is a system that turns text prompts or still images into video. That's right: now you can create deepfake videos just by entering a prompt like "little boy eats a ham sandwich while a Labrador retriever drools," and the system will produce a short video clip with that content. The tweet announcing the system embeds a short video of a cityscape produced by a simple text prompt. It's not perfectly realistic, but it's not bad, either. Here's a compilation of other sample videos Sora produced from text prompts, and some of them are downright amazing.
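To make the "few keystrokes" point concrete, here's a rough sketch of what generating a clip programmatically could look like. Sora has no public API as of this writing, so the endpoint, model name, and parameters below are hypothetical stand-ins rather than OpenAI's actual interface; the point is just how little input a deepfake could require.

```python
# Hypothetical sketch only: Sora has no public API as of this writing,
# so the endpoint, model name, and parameters below are invented stand-ins.
import requests

API_URL = "https://api.example.com/v1/video/generations"  # hypothetical endpoint
API_KEY = "YOUR_API_KEY"  # placeholder credential

payload = {
    "model": "text-to-video-1",  # hypothetical model identifier
    "prompt": (
        "little boy eats a ham sandwich "
        "while a Labrador retriever drools"
    ),
    "duration_seconds": 10,      # Sora's demos are short clips
    "resolution": "1280x720",
}

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=600,  # video generation can take minutes
)
response.raise_for_status()

# Assume the service returns raw video bytes; save the clip to disk.
with open("sandwich.mp4", "wb") as f:
    f.write(response.content)
```

A couple dozen lines and one sentence of description: that's the entire creative investment.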
The punchline is that in a matter of years (if not months), we'll be able to create mostly realistic, short deepfake videos with just a few keystrokes.
That prospect has many people concerned. It's obvious how autocratic regimes like Russia and China could add this tool to their disinformation arsenals. Want a video of Ukrainian soldiers wearing swastikas and torturing prisoners to convince people back home in Russia of the importance of your "special military operation"? AI deepfake to the rescue. Sora is likely to be Putin's best friend.
But deepfake tech is equally problematic for denizens of democracies, which rely on an informed populace to make good decisions at the polls. As the 2024 election cycle heats up, major US newspapers have trumpeted the deepfake worry from the rooftops, including the Wall Street Journal, the Washington Post, and the New York Times.
There is good reason to be worried. The widespread ability to create persuasive deepfakes quickly and at low cost is likely to have significant impacts on democratic systems. But that reason has almost nothing to do with the fact that most citizens won't be able to tell the difference between genuine videos and deepfakes. The real problem is that most citizens don't care about the difference between the two, and their negligent consumption of deepfakes will negatively affect their voting behavior.
The standard line from most of the journalists and pundits I've read is that the advent of deepfakes is bad for democracy because people will fall for them and vote accordingly. For example, the New Hampshire primary was just disrupted by a string of robocalls featuring a deepfake of President Biden's voice urging people to stay home and skip the election. The worry is that people will think the calls are legitimate and do as the voice says. On this diagnosis, the public is naïve but responsible. They want the truth, but deepfakes are good enough to fool them.
That's a bad explanation for many reasons. Here's a better one: people consume media for all sorts of reasons, few of which have to do with the truth. They want to be entertained, outraged, amused, and so on. On my diagnosis, the public is negligent. They don't really care about the truth, and so they consume media, deepfakes included, recklessly.
On this explanation, the problem isn't that people will be fooled into thinking deepfakes are the real McCoy. It's that they will continue to consume deepfake material even when they know it's fake, because doing so feeds other urges. It will be funny or uplifting or exciting to watch videos of your political idols or enemies even when you know they're fabricated.
Here's a thought experiment: suppose social media companies banded together and agreed to flag deepfake videos with a warning at the bottom. Every misleading video, from Russian hackers abroad to political campaigns at home, would come with the tech equivalent of a surgeon general's warning. Do you think that would stop people from sharing outrageous deepfakes on TikTok? Not a chance. We don't share Saturday Night Live skits that parody political figures because we think they're real. We share them because they're so damn funny. Deepfakes are no different.
And as the technology and the AI behind it improve, we risk massive exposure to deepfakes that connect with our darker impulses. It's easy to imagine a flood of artificial videos ridiculing those who don't worship like us, look like us, or vote like us. A steady stream of mocking or hateful media of that sort is dangerous even if we know it's fake.
Why? For the same reason many actors refuse to play villains in films. Once an audience has seen you as Hannibal Lecter, it's hard for them to see you as C.S. Lewis (hat tip to Anthony Hopkins). Actors who play villains widely report receiving hate mail, being disparaged in public, and being shamed on social media. This isn't because fans think the actors are truly villains. They aren't confusing the films for documentaries. It's that they have negative feelings about the actors because they associate them with the bad but fictional things portrayed in the movies.
Deepfakes will have the same effect. We will come to have a deep loathing for political adversaries mocked in deepfakes. We will cheer for religious figures championed in deepfakes. And we will feel an instinctual distrust of the sorts of people who typically play the bad guys in the artificial media we consume.
Exposure to media of this sort will have an effect at the voting booth. How could it not? Text-based propaganda is remarkably effective at changing voting behavior, particularly by making false claims seem true through prolonged, repeated exposure. Psychologists call this the illusory truth effect: we often gauge whether a claim is true by how we feel about it and how familiar it seems, and repetition breeds familiarity.
Video trades on the idea that seeing is believing, so deepfake media will be far more effective than text at making us feel a certain way about people and at making ideas seem familiar and true. Imagine standing in the voting booth and replaying a video in your head that makes President Biden look like a drooling idiot or President Trump look like an unhinged madman. It would be hard to vote for someone through that kind of cognitive dissonance.
If text-based propaganda is bad for democracy, wait till you see what we can do with deepfakes.