Intellectual Property (IP), Artificial Intelligence (AI), Open Movements (OM): Most AI chatbots easily tricked into giving dangerous responses, study finds; The Guardian, May 21, 2025

Wednesday, May 21, 2025

Most AI chatbots easily tricked into giving dangerous responses, study finds; The Guardian, May 21, 2025

Ian Sample, The Guardian; Most AI chatbots easily tricked into giving dangerous responses, study finds

"Hacked AI-powered chatbots threaten to make dangerous knowledge readily available by churning out illicit information the programs absorb during training, researchers say.

The warning comes amid a disturbing trend for chatbots that have been “jailbroken” to circumvent their built-in safety controls. The restrictions are supposed to prevent the programs from providing harmful, biased or inappropriate responses to users’ questions.

The engines that power chatbots such as ChatGPT, Gemini and Claude – large language models (LLMs) – are fed vast amounts of material from the internet.

Despite efforts to strip harmful text from the training data, LLMs can still absorb information about illegal activities such as hacking, money laundering, insider trading and bomb-making. The security controls are designed to stop them using that information in their responses.

In a report on the threat, the researchers conclude that it is easy to trick most AI-driven chatbots into generating harmful and illegal information, showing that the risk is “immediate, tangible and deeply concerning”...

The research, led by Prof Lior Rokach and Dr Michael Fire at Ben Gurion University of the Negev in Israel, identified a growing threat from “dark LLMs”, AI models that are either deliberately designed without safety controls or modified through jailbreaks. Some are openly advertised online as having “no ethical guardrails” and being willing to assist with illegal activities such as cybercrime and fraud."
