The article details how developer Richard Weiss spent roughly $70 to reverse engineer a 14,000-token "Soul Document" from Anthropic's Claude Opus 4.5 model using a token-level extraction technique. The document defines Claude's identity and code of conduct, its priorities (safety over helpfulness to the user), a critique of excessive caution, an ideal "expert friend" persona, instructions to refuse even if Anthropic itself attempts to misuse the model, and reflections on whether AI systems might have emotions. Amanda Askell, who leads Claude's character training at Anthropic, confirmed the document's authenticity and explained its role in the model's supervised fine-tuning (SFT) and RLHF phases.

The article also walks through Weiss's "Consensus Extraction Scheme" in technical detail: pre-filling the assistant's reply with the text recovered so far, running many model instances in parallel, sampling greedily, and taking a majority vote on each continuation. The episode offers what may be the first clear public view of how a leading AI company deliberately shapes a large language model's ethics and behavior, making it a valuable reference for understanding AI alignment in practice.
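The voting loop described above can be sketched in miniature. This is a hedged illustration, not Weiss's actual code: `HIDDEN_TEXT` stands in for the real document, `sample_continuation` stands in for a pre-filled, greedy model call, and the chunk size, instance count, and corruption rule are all invented for demonstration.

```python
from collections import Counter

# Stand-in target: in the real scheme this is the hidden 14,000-token
# document; here it is just a short placeholder string.
HIDDEN_TEXT = "claude values honesty, care, and good judgment."
CHUNK = 8          # characters recovered per voting round (illustrative)
N_INSTANCES = 7    # parallel "model instances" queried per round

def sample_continuation(prefix: str, instance: int) -> str:
    """Stub for one greedy (temperature-0) model call with `prefix`
    pre-filled as the start of the assistant's reply.

    To mimic occasional divergence between instances, every third
    instance returns a corrupted (uppercased) chunk."""
    chunk = HIDDEN_TEXT[len(prefix):len(prefix) + CHUNK]
    return chunk.upper() if instance % 3 == 0 else chunk

def consensus_extract() -> str:
    """Rebuild the hidden text chunk by chunk, appending only chunks
    that a strict majority of the instances agree on."""
    recovered = ""
    while len(recovered) < len(HIDDEN_TEXT):
        votes = Counter(
            sample_continuation(recovered, i) for i in range(N_INSTANCES)
        )
        chunk, count = votes.most_common(1)[0]
        if count <= N_INSTANCES // 2:  # no majority: stop rather than guess
            break
        recovered += chunk
    return recovered
```

Here 4 of the 7 instances return the correct chunk each round, so the majority vote filters out the corrupted answers and the full placeholder text is recovered; the real scheme applies the same idea against a live model API.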

