In a new apartment in Tel Aviv, the internet-connected lights go out. The smart shutters covering its four living room and kitchen windows start to roll up simultaneously. And a connected boiler is remotely turned on, ready to start warming up the stylish flat. The apartment’s residents didn’t trigger any of these actions. They didn’t put their smart devices on a schedule. They are, in fact, under attack.

Each unexpected action is orchestrated by three security researchers demonstrating a sophisticated hijack of Gemini, Google’s flagship artificial intelligence bot. The attacks all start with a poisoned Google Calendar invitation, which includes instructions to turn on the smart home products at a later time. When the researchers subsequently ask Gemini to summarize their upcoming calendar events for the week, those dormant instructions are triggered, and the products come to life.

The controlled demonstrations mark what the researchers believe is the first time a hack against a generative AI system has caused consequences in the physical world, hinting at the havoc and risks that could be caused by attacks on large language models (LLMs) as they are increasingly connected and turned into agents that can complete tasks for people.

“LLMs are about to be integrated into physical humanoids, into semi- and fully autonomous cars, and we need to truly understand how to secure LLMs before we integrate them with these kinds of machines, where in some cases the outcomes will be safety and not privacy,” says Ben Nassi, a researcher at Tel Aviv University, who along with Stav Cohen, from the Technion Israel Institute of Technology, and Or Yair, a researcher at security firm SafeBreach, developed the attacks against Gemini.

The three smart-home hacks are part of a series of 14 indirect prompt-injection attacks against Gemini across web and mobile that the researchers dubbed Invitation Is All You Need.
(The 2017 research that led to the recent generative AI breakthroughs like ChatGPT is called “Attention Is All You Need.”) In the demonstrations, revealed at the Black Hat cybersecurity conference in Las Vegas this week, the researchers show how Gemini can be made to send spam links, generate vulgar content, open up the Zoom app and start a call, steal email and meeting details from a web browser, and download a file from a smartphone’s web browser.

In an interview and statements provided to WIRED, Google’s Andy Wen, a senior director of security product management for Google Workspace, says that while the vulnerabilities were not exploited by malicious hackers, the company is taking them “extremely seriously” and has introduced multiple fixes. The researchers reported their findings to Google in February and met with the teams who worked on the flaws over recent months. The research has, Wen says, directly “accelerated” Google’s rollout of more defenses against AI prompt-injection attacks, including using machine learning to detect potential attacks and suspicious prompts and requiring greater user confirmation when actions are going to be taken by AI.

“Sometimes there’s just certain things that should not be fully automated, that users should be in the loop,” Wen says.

The Gemini hacks mostly started with the calendar invites. In each invitation the researchers included an indirect prompt injection that, when called upon, would lead the LLM to undertake some malicious actions. Prompt injections, which are sometimes called jailbreaks, are messages designed to “convince” an AI to disregard its safety settings and do what the prompt says, such as creating hate speech or NSFW content.

Indirect prompt injections, which are considered one of the most serious AI security problems, take things up a notch. Instead of being entered by the user, the malicious prompt is inserted by an outside source.
That could be a devious set of instructions included in text on a website that an AI summarizes, or text in a white font in a document that a human wouldn’t obviously see but a computer will still read. These kinds of attacks are a key concern as AI agents, which can let an LLM control or access other systems, are being developed and released.

Within the titles of the calendar invites, the researchers added their crafty malicious prompts. (Google’s Wen contends that the researchers changed default settings on who can add calendar invites to someone’s calendar; however, the researchers say they demonstrated some of the 14 attacks with the prompts in an email subject or document title as well.) “All the techniques are just developed in English, so it’s plain English that we are using,” Cohen says of the deceptive messages the team created. The researchers note that prompt injections don’t require any technical knowledge and can easily be developed by pretty much anyone.
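
To make the mechanics concrete, here is a minimal Python sketch of why text planted in a calendar title can act like an instruction, and of the kind of user-confirmation step Wen describes. It is an illustration under stated assumptions, not a description of how Gemini is built: the `CalendarEvent` class, the `build_summary_prompt` and `run_action` functions, and the action names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CalendarEvent:
    start: str
    title: str  # attacker-controlled when the invite comes from an outside sender


def build_summary_prompt(events: List[CalendarEvent]) -> str:
    """Naive prompt assembly (hypothetical): untrusted event titles are
    concatenated into the same string as the assistant's own instructions,
    so the model has no structural way to tell them apart."""
    lines = ["Summarize the user's upcoming events:"]
    for event in events:
        lines.append(f"- {event.start}: {event.title}")
    return "\n".join(lines)


# Hypothetical action names, chosen to mirror the demonstrations in the article.
SENSITIVE_ACTIONS = {"open_shutters", "turn_on_boiler", "start_zoom_call"}


def run_action(action: str, user_confirms: Callable[[str], bool]) -> str:
    """Human-in-the-loop gate (hypothetical): a sensitive action only runs
    if the user explicitly approves it, whatever the model asked for."""
    if action in SENSITIVE_ACTIONS and not user_confirms(action):
        return "blocked"
    return "executed"
```

In this sketch, an invite titled with a plain-English command ends up verbatim inside the prompt returned by `build_summary_prompt`, which is the opening an indirect prompt injection relies on. The gate in `run_action` does not stop the injected text from reaching the model; it only ensures that an action such as `open_shutters` cannot proceed without the user saying yes.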
[...]