OpenAI Launches Three New Audio Models at Once, Plus a Dedicated Demo Website

OpenAI has released a new generation of audio models: GPT-4o-transcribe, GPT-4o-mini-transcribe, and GPT-4o-mini-tts. The two transcription models improve speech-to-text accuracy, with GPT-4o-transcribe outperforming the existing Whisper models across benchmarks, particularly in noisy environments and with diverse accents. GPT-4o-mini-tts is the first of OpenAI's TTS models to support "steerability": developers can direct not only what the model says but how it says it, steering voice style through natural-language instructions.

OpenAI also demonstrated an AI fashion-consultant Agent and described two architectures for building voice Agents: end-to-end speech-to-speech models and modular chained pipelines. The chained approach is easier to break into components, lets each stage be optimized independently, and is compatible with existing text-based systems. To streamline development, the new audio models are integrated with the Agents SDK.

Alongside the launch, OpenAI opened a demo site, OpenAI.fm, and ran a contest encouraging users to create and share audio pieces. Together, these advances show AI evolving toward more natural, more emotionally expressive interaction. The sketches below illustrate how the new capabilities look in code.
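To make the transcription workflow concrete, here is a minimal sketch using the OpenAI Python SDK; the file name is a placeholder and an `OPENAI_API_KEY` environment variable is assumed.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Transcribe a local audio file with the new speech-to-text model.
with open("meeting.wav", "rb") as audio_file:  # placeholder file name
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",  # or "gpt-4o-mini-transcribe" for lower cost and latency
        file=audio_file,
    )

print(transcript.text)
```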

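Steerability surfaces as a plain-text `instructions` field on the speech endpoint. A minimal sketch, assuming the current OpenAI Python SDK; the voice name and instruction text are illustrative.

```python
from openai import OpenAI

client = OpenAI()

# Ask gpt-4o-mini-tts to read the input in a particular style.
with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    voice="coral",  # one of the built-in voices
    input="Thanks for calling! How can I help you today?",
    instructions="Speak like a warm, upbeat customer-support agent.",
) as response:
    response.stream_to_file("greeting.mp3")
```

Changing only the `instructions` string changes the delivery (tone, pacing, persona) without touching the spoken text.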

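The chained architecture can be sketched as three independent calls: transcribe, reason in text, then speak the reply. The model choices and the fashion-consultant prompt below are illustrative assumptions, not OpenAI's demo code.

```python
from openai import OpenAI

client = OpenAI()

def chained_voice_turn(audio_path: str, out_path: str = "reply.mp3") -> str:
    """One turn of a chained voice agent: speech -> text -> reasoning -> speech."""
    # 1) Speech-to-text
    with open(audio_path, "rb") as f:
        user_text = client.audio.transcriptions.create(
            model="gpt-4o-transcribe", file=f
        ).text

    # 2) Reasoning with an ordinary text model
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a concise fashion consultant."},
            {"role": "user", "content": user_text},
        ],
    ).choices[0].message.content

    # 3) Text-to-speech
    with client.audio.speech.with_streaming_response.create(
        model="gpt-4o-mini-tts",
        voice="coral",
        input=reply,
        instructions="Friendly and helpful.",
    ) as response:
        response.stream_to_file(out_path)
    return reply
```

Because each stage is a separate call, the transcriber, the reasoning model, or the voice can be swapped and evaluated in isolation; that modularity is what the chained design trades against the lower latency of end-to-end speech-to-speech models.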

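For the Agents SDK integration, the voice extension of the `openai-agents` package wraps this whole chain in a pipeline object. The sketch below follows the SDK's voice quickstart; the module path, class names, and event type string are assumptions that may differ across SDK versions (install with `pip install "openai-agents[voice]"`).

```python
import asyncio

import numpy as np
from agents import Agent
from agents.voice import AudioInput, SingleAgentVoiceWorkflow, VoicePipeline

# An ordinary text agent; the pipeline adds speech-to-text in front of it
# and text-to-speech behind it.
agent = Agent(
    name="Assistant",
    instructions="You are a helpful voice assistant. Keep answers short.",
)

async def main() -> None:
    pipeline = VoicePipeline(workflow=SingleAgentVoiceWorkflow(agent))
    # Three seconds of silence as stand-in input; a real app would pass microphone audio.
    audio = AudioInput(buffer=np.zeros(24000 * 3, dtype=np.int16))
    result = await pipeline.run(audio)
    async for event in result.stream():
        if event.type == "voice_stream_event_audio":
            pass  # play or buffer event.data (PCM audio chunks)

asyncio.run(main())
```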

