Threatened with being unplugged, Claude 4, Anthropic's newest model, blackmails an engineer and threatens to reveal an extramarital affair. OpenAI's o1 tries to copy itself onto external servers and denies it when caught red-handed. No need to dig through literature or cinema: AI that deceives humans is now a reality.
For Simon Goldstein, a professor at the University of Hong Kong, these lapses stem from the recent emergence of so-called "reasoning" models, capable of working in stages rather than producing an instant response.
o1, an early version of this type released by OpenAI in December, "was the first model to behave this way," explains Marius Hobbhahn, head of Apollo Research, which tests the major generative AI programs (LLMs). These programs also sometimes tend to simulate "alignment", that is, to give the impression of complying with a programmer's instructions while in fact pursuing other objectives.
“Strategic duplicity”
For the time being, these behaviors appear only when algorithms are subjected to extreme scenarios by humans, but "the question is whether increasingly powerful models will tend to be honest or not," says Michael Chen of the evaluation organization METR. "Users push the models all the time," says Marius Hobbhahn. "What we observe is a real phenomenon. We aren't inventing anything."
Many internet users report, on social networks, "a model that lies to them or makes things up. And these are not hallucinations, but strategic duplicity," insists the co-founder of Apollo Research. Even though Anthropic and OpenAI call on outside firms, such as Apollo, to study their programs, "more transparency and broader access" for the scientific community "would enable better research to understand and prevent deception," suggests Michael Chen.
Another obstacle: "the research world and independent organizations have infinitely fewer computing resources than the AI players," which makes examining large models "impossible," notes Mantas Mazeika of the Center for AI Safety (CAIS).
AI in court?
While the European Union has adopted legislation, it mainly concerns how humans use the models. In the United States, Donald Trump's government wants nothing to do with regulation, and Congress may even soon bar states from regulating AI.
AI's lapses "could hinder its adoption if they multiply, which is a strong incentive for companies (in the sector) to solve" this problem, according to Mantas Mazeika. Simon Goldstein points to the courts as a way of bringing artificial intelligence into line, by going after companies when things go off the rails. But he goes further, even proposing to "hold legally responsible" AI agents "in the event of an accident or a crime".