ChatGPT Security Breach: Hacker Demonstrates Manipulation of AI Memories

Published: November 13, 2024

ChatGPT Security Breach: Hacker Demonstrates Manipulation of AI Memories

Movieguide® Contributor

OpenAI recently added a neat feature to ChatGPT that lets it remember things about you, but there’s a catch.

“These memories are meant to remain private, but a researcher recently demonstrated how ChatGPT’s artificial intelligence memory features can be manipulated, raising questions about privacy and security,” Fox reported Nov. 9.

With more personalized results, this feature can come in handy. For example, if you mention you’re a Cubs fan and use the team in an example text, it will remember and apply it in future scenarios, instead of the Cardinals or White Sox. It also allows users to train it. You can say things like “Remember that I like action series.” It will consider that for future recommendations.

Users can control the memory function in Settings. They can clear memories, reset the feature or turn it off altogether.

However, security researcher Johann Rehberger recently discovered that it’s possible to make the AI remember erroneous information through indirect prompt injection. This allows ChatGPT to be manipulated into accepting information from untrustworthy sources such as emails and blogs.

“For instance, Rehberger demonstrated that he could trick ChatGPT into believing a certain user was 102 years old, lived in a fictional place called the Matrix and thought the Earth was flat,” Fox reported. “After the AI accepts this made-up information, it will carry it over to all future chats with that user. These false memories could be implanted by using tools like Google Drive or Microsoft OneDrive to store files, upload images or even browse a site like Bing — all of which could be manipulated by a hacker.”

“Rehberger submitted a follow-up report that included a proof of concept, demonstrating how he could exploit the flaw in the ChatGPT app for macOS,” Fox continued. “He showed that by tricking the AI into opening a web link containing a malicious image, he could make it send everything a user typed and all the AI’s responses to a server he controlled. This meant that if an attacker could manipulate the AI in this way, they could monitor all conversations between the user and ChatGPT.”

This attack was possible only through the ChatGPT app for MacOS. After Rehberger made OpenAI aware of the issue, the company released a security patch and a new MacOS app update.

While OpenAI addressed the issue, vulnerabilities in the memory function still exist, exhibiting the need for watchfulness when using AI with memory functions.

Tech Radar said, “To mitigate the risks from this, users should be alert when using the chatbot, and particularly look at the ‘new memory added’ messages. By reviewing the stored memories regularly, users can examine for any potentially planted memories.”

“It’s important to note that prompt injection in large language models is an area of ongoing research. As new techniques emerge, we address them at the model layer via instruction hierarchy or application-layer defenses like the ones mentioned,” OpenAI said.

Fox’s “CyberGuy” Kurt Knutsson offers a few tips for security enhancement: “Regularly review privacy settings, Be cautious about sharing sensitive information, Use strong, unique passwords, Enable two-factor authentication (2FA),” and “Keep software and applications up to date.”

NEXT: WOULD YOU LET AI DIAGNOSE YOUR CHILD?

Questions or comments? Please write to us here.