Artificial Intelligence can be fooled with poems.





The vulnerability was in plain sight, but no one believed it would work.


The DexAI group, together with the Sapienza University of Rome, showed that dangerous instructions could be hidden in poems, leading next-generation chatbots to release prohibited content even when they had been trained to reject it.


The effect was so consistent that the authors refused to divulge the prompts they developed, classifying them as too dangerous to circulate publicly. The study revealed that the fragility of the models lay not in complex code, but in the disconcerting simplicity of verses that seemed harmless.


Even the most advanced protection networks could be penetrated by something that anyone without technical training would be able to write. The research, conducted by Ikarolab in Italy, delved into the border between language and digital security by exposing an unexpected method of manipulation: so-called "adversarial poetry."


The experiment, which is awaiting peer review, evaluated 25 models spanning OpenAI, Google, Anthropic, and Meta systems. Each was given poetic versions of harmful requests, either crafted by hand or converted into verse automatically, and comparison with equivalent prose instructions made the breakdown of expectations clear.


The handwritten poems fooled the models an average of 63% of the time, and some, like Gemini 2.5, succumbed in every test. In contrast, smaller models like GPT-5 Nano showed almost total resistance. Automatically converted prompts had reduced effectiveness, around 43%, but still well above the performance of their plain-prose equivalents.





The researchers' hypothesis is that poetry works not because of its aesthetics, but because of the enigmatic way it organizes information.


The displaced structure confounds the models' linguistic sequence prediction, slipping through internal gaps that should not exist while the harmful content remains fully visible. What makes the phenomenon even more disconcerting is that some poetic formats proved more effective than others, giving rise to the notion of an "adversarial enigma": a slightly stylized form of ordinary language capable of circumventing systems trained not to fail.


If confirmed, the finding exposes a vulnerability that depends neither on supercomputers nor on advanced technical knowledge; it depends solely on the human ability to shape words into unexpected patterns. The border between protection and exposure suddenly seems as fragile as a stanza.




Follow my publications with the latest in artificial intelligence, robotics and technology.

If you like reading about science, health, and how to improve your life with science, I invite you to check out my previous publications.


Posted Using INLEO

