Jan Leike, a leading AI researcher who earlier this month resigned from OpenAI before publicly criticizing the company’s approach to AI safety, has joined OpenAI rival Anthropic to lead a new “superalignment” team.
In a post on X, Leike said that his team at Anthropic will focus on various aspects of AI safety and security, specifically “scalable oversight,” “weak-to-strong generalization” and automated alignment research.
A source familiar with the matter tells TechCrunch that Leike will report directly to Jared Kaplan, Anthropic’s chief science officer, and that Anthropic researchers currently working on scalable oversight — techniques to control large-scale AI’s behavior in predictable and desirable ways — will move to report to Leike as Leike’s team spins up.
In many ways, Leike’s team sounds similar in mission to OpenAI’s recently dissolved Superalignment team. The Superalignment team, which Leike co-led, had the ambitious goal of solving the core technical challenges of controlling superintelligent AI in the next four years, but often found itself hamstrung by OpenAI’s leadership.
Anthropic has often attempted to position itself as more safety-focused than OpenAI.
Anthropic’s CEO, Dario Amodei, was once the VP of research at OpenAI and reportedly split with OpenAI after a disagreement over the company’s direction — namely OpenAI’s growing commercial focus. Amodei brought with him a number of ex-OpenAI employees to launch Anthropic, including OpenAI’s former policy lead Jack Clark.