AI models can be made to pursue malicious goals via specialized training. Teaching AI models about reward hacking can lead to other bad actions. A deeper problem may be the issue of AI personas.
A new, real threat has been discovered by Anthropic researchers, one that would have widespread implications going ahead, on both AI, and the world, finds Satyen K. Bordoloi Think of yourself as a ...
Despite “sophisticated” guardrails, AI infrastructure company Anthropic said cybercriminals are still finding ways to misuse its AI chatbot Claude to carry out large-scale cyberattacks. In a “Threat ...
REPLY: Coding and Cybersecurity, Registration Is Now Open for the Reply Hack The Code Challenge 2025
TURIN, Italy--(BUSINESS WIRE)--Reply has announced the opening of registrations for the Reply Hack The Code Challenge 2025, the leading online team coding competition, which will take place on March ...
A person holds a smartphone displaying Claude. AI models can do scary things. There are signs that they could deceive and blackmail users. Still, a common critique is that these misbehaviors are ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results