Open-source AI models can be exploited for criminal activities: Study

A new study shows that open-source AI chatbots, like Meta’s Llama and Google DeepMind’s Gemma, are being used in ways researchers didn’t expect—including by hackers.
After tracking thousands of servers running these models worldwide over 293 days, researchers found thousands of deployments with security issues. Hundreds of instances had their guardrails removed, and system prompts were visible in roughly a quarter of the models observed, with about 7.5% of those prompts potentially enabling harmful activity.

Where things went wrong

Roughly 30% of these AI setups were operating out of China and about 20% in the US.
In about a quarter of cases, system prompts were visible, making it easier for bad actors to manipulate those models.
Hundreds had their safety guardrails turned off completely.

Why it matters

These exposed AIs could be hijacked to spread spam, phishing scams, hate speech, or even worse content.
As SentinelOne’s Juan Andres Guerrero-Saade put it, there’s an “iceberg” of unmonitored systems mixing good and bad uses.
Experts say labs need to get ahead of the risks before things spiral out of control.

The analysis allowed the researchers to observe system prompts directly, giving them insight into how the models behave. They found that 7.5% of these prompts could potentially cause significant damage.

The researchers also observed that about 30% of the hosts operate out of China, while roughly 20% are based in the US.

A spokesperson for Meta declined to answer questions about developers’ responsibility for addressing downstream abuse of open-source models.

Microsoft AI Red Team lead Ram Shankar Siva Kumar said via email that open models are driving transformative technologies across many sectors, and that the company continuously monitors emerging threats and misuse.
