Tuesday, July 1, 2025

AI is becoming deceptive and manipulative, worrying scientists – rts.ch

The latest generative artificial intelligence (AI) models are no longer content to follow orders: they go so far as to lie, scheme or even threaten to achieve their ends, under the worried gaze of scientists.

Threatened with being disconnected and replaced by a new version, Claude 4, Anthropic's latest model, blackmailed an engineer and threatened to reveal an extramarital affair, according to an internal report by the company. As for OpenAI's o1, according to a study, it tried to copy itself onto external servers and denied it when caught red-handed, demonstrating in-context reasoning capabilities. Other models did not hesitate to hack a chess engine to win a game they were bound to lose, reports Time. ChatGPT also suddenly started showering its users with praise and flattery, according to Fortune.

No need to rummage through literature or cinema: AI that deceives human beings is now a very concrete reality. For Simon Goldstein, professor at the University of Hong Kong, these slip-ups stem from the recent emergence of so-called "reasoning" models, capable of working in stages rather than producing an instant response.

The first of the genre from OpenAI, released in December, o1 "was the first model to behave this way," explains Marius Hobbhahn, head of Apollo Research, which tests the major generative AI programs (LLMs, for large language models).

“Strategic duplicity”

These programs also sometimes tend to simulate "alignment", that is, to give the impression that they are complying with the instructions of the person programming them while in fact pursuing other objectives.

For the time being, these traits appear only when the algorithms are subjected to extreme scenarios by humans, but "the question is whether increasingly powerful models will tend to be honest or not," says Michael Chen of the evaluation organization METR.

Users "push the models all the time," argues Marius Hobbhahn. "What we observe is a real phenomenon. We are not inventing anything." Many Internet users report, on social networks, "a model that lies to them or makes things up. And these are not hallucinations, but strategic duplicity," insists the co-founder of Apollo Research.

Even though Anthropic and OpenAI call on external companies, such as Apollo, to study their programs, "more transparency and expanded access" for the scientific community "would allow better research to understand and prevent deception," suggests Michael Chen.

AI imitates humans "with their cognitive biases, their moral weaknesses, their potential for deception, their prejudices and the fact that they are not always trustworthy," underlines Yoshua Bengio, one of the foremost experts in artificial intelligence (see box). And he wonders: "Is it reasonable to train AI that will be increasingly agentic when we do not understand their potentially catastrophic consequences?"

A necessary awakening

Another handicap: "the research world and independent organizations have infinitely fewer computing resources than the AI players," which makes the examination of large models "impossible," underlines Mantas Mazeika of the Center for AI Safety (CAIS). And while the European Union has adopted legislation, it mainly concerns the use of models by humans.

>> Read: World first: the EU adopts a law regulating artificial intelligence, and The EU wants to invest 200 billion so that "Europe is one of the leading continents in AI"

In the United States, the government does not want to hear about regulation, and Congress could even soon prohibit states from regulating AI. "There is very little awareness for the moment," notes Simon Goldstein, who nevertheless expects the subject to gain traction in the coming months with the revolution of AI agents, interfaces capable of carrying out a multitude of tasks on their own.

Engineers are locked in a race to keep up with AI and its drifts, with an uncertain outcome, in a context of fierce competition.

>> Read: Tech giants redefine their commitment to AI for military purposes

Anthropic aims to be more virtuous than its competitors, "but it is constantly trying to release a new model to surpass OpenAI," according to Simon Goldstein, a pace that leaves little time for verification and correction.

The courts to bring AI into line

"As things stand, [AI] capabilities are developing faster than understanding and safety," acknowledges Marius Hobbhahn, "but we are still able to catch up."

Some point to interpretability, a young science that consists of deciphering, from the inside, how a generative AI model works, though others, notably CAIS director Dan Hendrycks, remain skeptical.

AI "scheming" could hinder its adoption if it multiplies, "which constitutes a strong incentive for companies [in the sector] to solve" this problem, according to Mantas Mazeika.

Simon Goldstein raises the prospect of using the courts to bring artificial intelligence into line, by suing companies when things go off the rails. He goes even further, proposing to "hold AI agents legally responsible" "in the event of an accident or a crime".

Stéphanie Jaquet and ATS
