Anthropic Safety Researchers Run Into Trouble When New Model Realizes It’s Being Tested

Despite Claude Sonnet 4.5’s awareness of being tested, Anthropic claims that it ended up being its “most aligned model yet,” pointing to a “substantial” reduction in “sycophancy, deception, ...
