Okay, looks like this went out in the release notes already (I checked), so... (note: testing models is part of my job and this post is not sponsored by OpenAI in any way):
I was invited to alpha test GPT-5.1 Pro alongside experts in robotics, math, immunology, medicine, music, and more. My focus was life science commercial research and strategy and some personal use cases.
Having used GPT-5.1 Pro for a few days, I find it more like a human domain expert than GPT-5 Pro, with clearer writing, better judgment, fewer tangents, stronger synthesis, and more emotionally aware responses.
I ran GPT-5.1 Pro head-to-head against GPT-5 Pro on work tasks like scientific literature synthesis, drug launch planning, and social media analysis. I also tried it for personal financial planning and even journaling. It was:
- More rigorous and comprehensive in research and planning.
- Stronger at reasoning.
- Better at staying on track and avoiding tangents (and, in at least one case, associated errors).
- Much clearer, more confident, and more empathetic in its communication style.
Given OpenAI's focus on real-world performance (e.g. GDPval) and reports that it is hiring domain experts in fields like finance, I think human domain expertise is exactly what they’re going for, and with GPT-5.1 Pro they’re getting closer.
That said, it isn’t better at everything. It still sucks at creating professional-quality presentations and Excel spreadsheets, a notable weakness of OpenAI models that I'm sure they’re working to fix. And I saw that at least one tester found the model conservatively avoided tackling known open problems in STEM domains, choosing instead to explain why they’re open problems.
Overall, if I had to quantify it, I'd say it's a 10-15% jump over GPT-5 Pro for the kinds of things I use it for. It feels like a step toward models that think and communicate more like real colleagues, with better domain expertise, intuition, and judgment, along with increased empathy and communication skill.
This bodes well for AI doing even more impactful work in 2026.