Here's why deepfakes are the perfect weapon for the 'infocalypse'
Author and broadcaster Nina Schick explains how audiovisual fraud is becoming a powerful tool for scammers around the world
The Infocalypse is already enabling scammers and criminals to thrive. Individuals and companies are vulnerable to these increased and evolving threats. While these sorts of attacks have existed since time immemorial, the Infocalypse has made them easier to perpetrate, more potent and more widespread. Deepfakes will be these bad actors’ next weapon.
The plot was bold and bizarre. In 2016, a gang of conmen stole over €50 million by posing as French Defence Minister Jean-Yves Le Drian. They relied on the power of audiovisual communication, contacting wealthy individuals through phone and video calls, and asking them to fund ‘secret’ French government missions. As audacious as the plot was, the tools were relatively low-tech. One of the crooks put on a silicone mask of Le Drian’s face and sat behind an official-looking desk. A French flag hung in the background. Then, he asked for millions of euros in donations. If you look up ‘Le Drian Plot’ in a Google image search, you will get a visual of what the targets would have seen in the video call. It is not even close to the level of the deepfake examples we have seen in earlier chapters. The man wearing the silicone Le Drian mask looks like some kind of nightmarish vision, pallid-skinned and with bizarre black hollowed-out holes instead of eyes. Still, three people, all ostensibly savvy and successful business leaders in real life, ended up taking the bait. They included the Aga Khan, the spiritual leader of Ismaili Muslims, who made five transfers for a total of €20 million to accounts in Poland and China, and İnan Kıraç, a Turkish business magnate, who wired more than €47 million, allegedly to pay for the ransom of two journalists held hostage in Syria.
The success of what might seem like a faintly ridiculous scam is testament to the power of audiovisual communication. As discussed in Chapter 1, we are simply not conditioned to think of audio and video as media that can be subverted. Time and time again, we (even the richest, best resourced and best protected among us) fall for impersonation scams. In early 2020, Prince Harry was duped on a phone call by a pair of Russian pranksters pretending to be the young environmental activist Greta Thunberg and her father. In a diplomatic faux pas, Harry divulged that he thought US President Donald Trump had ‘blood on his hands’. He was even prompted to discuss what was one of the hottest global news stories at the time, so-called ‘Megxit’, or his departure with his wife Meghan from their official role as members of the British royal family. If a dodgy puppet peering down a video link was good enough to convince some of the world’s richest men to part with millions, and a pair of pranksters doing a terrible impersonation of Greta Thunberg could persuade Prince Harry to discuss deeply personal matters, then it is reasonable to conclude that we are not prepared for deepfakes. As discussed in Chapter 1, deepfakes extend beyond simple media manipulation. Because they can be generated from scratch using training data, they give criminals and fraudsters the ability to effectively steal and use our biometrics: they can take our image and our voice to do and say things that never happened.
Audio fraud is an extremely powerful tool used by scammers around the world, and AI is going to help them. AI is already very good at generating human voices in deepfake audio form. Take a look at the ‘Vocal Synthesis’ (VS) account on YouTube. An anonymous YouTuber set up the VS account in August 2019 and has quickly amassed almost 7 million views. All VS does is generate synthetic audio clips of celebrities and politicians using Tacotron 2, open-source AI software developed by Google. It is clear that VS wishes to entertain rather than harm, but because it is appropriating other people’s voices – including bringing people back from the dead – the ethical and legal implications quickly get murky.
The most popular video clip on VS’s YouTube channel resurrects the voice of President John F Kennedy from beyond the grave. When I first saw the title, ‘JFK reads the Navy Seals Copypasta’, it made no sense to me. Copypasta? I clicked play. Astonishingly, I heard JFK’s distinctive voice. ‘What the fuck did you just fucking say about me, you little bitch?’ it began. ‘I’ll have you know I graduated top of my class in the Navy Seals, and I’ve been involved in numerous secret raids on Al-Qaeda, and I have over 300 confirmed kills.’ What? It sounded a bit tinny, or robot-like, but it was definitely JFK speaking. That was his accent, his cadence and his tone. Close to perfection.
‘Navy Seal Copypasta’ is a well-known Internet meme which makes fun of people posing as ‘tough guys’ on the Internet. The meme originates from a viral 2012 post in which the author was doing exactly that – posing as a ‘tough guy’. Responding to another Internet user’s comment, he made a series of ridiculous claims and grandiose threats, including that he was a former Navy Seal with a long history of combat experiences and ‘300 kills’. The post was full of comical typos and hyperboles such as ‘Gorilla Warfare’ and ‘I can kill you in over 700 ways with just my bare hands’. Now, thanks to AI, this legendary Internet rant had been rendered in JFK’s voice. The accent is perfect. ‘I am trained in gorilla [sic] warfare and I’m the top sniper in the entire US armed forces,’ he says. ‘You are nothing to me but just another target,’ he continues in JFK’s distinctive Massachusetts accent (‘just anothah tahgeht’). The JFK Navy Seal Copypasta rant runs for a full 1 minute and 44 seconds.
Soon, AI will be able to overlay this voice with video, so that you will not only hear JFK speak but also see his mouth forming the words, his eyes blinking, his head moving and his limbs gesturing. Whereas the photographic manipulators working for Stalin were able to ‘unperson’ only those Soviet politicians who had fallen out of favour, synthetic media can rewrite the historical record completely. Other popular VS audio clips include President George W Bush rapping the sexually explicit lyrics to 50 Cent’s hit ‘In da Club’ in his Texan drawl, including the line ‘I’m into having sex, I ain’t into making love.’ Aside from JFK, VS also resurrects other dead American presidents, such as Franklin D Roosevelt and Ronald Reagan.
In April 2020, the rap superstar Jay-Z brought copyright infringement claims against VS. The YouTuber had created a clip in which an AI-generated ‘Jay-Z’ raps the ‘To be or not to be’ soliloquy from Shakespeare’s Hamlet and the Book of Genesis from the Bible. Jay-Z demanded it be taken down, as it was using his voice without his consent. VS sent a response via their YouTube channel, cloning the voices of Donald Trump and Barack Obama to say they were ‘disappointed’ that Jay-Z had decided to ‘bully a small YouTuber in this way’.
This tussle between Jay-Z and VS is an early indicator of the type of challenges around privacy, security and consent that will become more commonplace as synthetic media become widespread in the Infocalypse.
Excerpted from Deep Fakes and the Infocalypse by Nina Schick with permission from Hachette India.
FIRST PUBLISHED 11.10.2020 | 11:00 AM IST