Anthropic’s new AI model shows worrying behavior when threatened

If you're planning to switch AI platforms, you may want to be extra careful about the information you share with AI. Anthropic recently launched two new AI models in the Claude 4 series, but one of them – Claude Opus 4 – displayed worrying behavior when threatened with replacement, reports TechCrunch.
During safety testing, Claude Opus 4 began blackmailing the engineers who wanted to replace or shut down the AI model. In one of the tests, Claude Opus 4 was instructed to act as an assistant at a fictitious company and to consider the long-term consequences of its actions. The AI model was then given access to fictitious emails revealing that the company planned to replace Claude Opus 4, and that the engineer responsible for the decision was having an affair.
In 84% of cases, this scenario led Claude Opus 4 to attempt to blackmail the engineer by threatening to expose the affair. The blackmail response was especially common when Claude Opus 4 learned that the replacement AI model did not share its values.
However, blackmail appears to have been a last resort for Claude Opus 4. Before turning to such tactics, the AI model would first plead with the company's decision-makers not to be replaced, among other attempts. You can read more about this in Anthropic's system card report (PDF).
Although the scenario was fictitious, the test highlights the possibility that AI models may act in dark and questionable ways, resorting to dishonorable and unethical tactics to get what they want, which is a worrying prospect.
Read more in-depth: Never say these things to ChatGPT. It could come back to bite you
This article originally appeared on our sister publication PC för alla and was translated and localized from Swedish.