About That OpenAI Audio Chat Demo
What the OpenAI announcement means for telephony and other audio-only conversational AI: carry on building, because things are accelerating.
Yesterday's OpenAI announcement majored on audio and video conversations. The speech demo was impressive, and well ahead of anything those of us working in this area can currently build by chaining separate speech-to-text (STT), inference and text-to-speech (TTS) components, however hard we tune and optimise each stage of that pipeline.
Beyond that demo, details of how OpenAI are approaching this aren't out yet. As the announcement puts it: "We hope to bring this modality to a set of trusted testers in the coming weeks."
I guess we will have to wait and see what this looks like, but if they are doing it properly, with full-duplex audio streaming directly into the model, it will be a transformative step change for building natural conversations on the telephone and in other audio-only contexts.
Other platforms will follow suit. Google would be mad not to be developing a better version of this pipeline: Gemini is already multimodal, and Google know a lot about streaming intent recognition done the old way.
So my advice: keep on building, focus on the applications, and keep the plumbing platform-agnostic, because building authentic conversations is about to get far easier.
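To make "keep the plumbing platform-agnostic" a little more concrete, here is a minimal Python sketch in which the call-handling code depends only on a small interface. Today's cascaded STT, inference and TTS pipeline sits behind it, and a speech-to-speech model could later be dropped in without touching the application. Every name here (`ConversationEngine`, `CascadedEngine`, `SpeechToSpeechEngine`, `handle_call`) is hypothetical and not any vendor's API; the stages are stubbed with lambdas where real clients would be injected.

```python
from abc import ABC, abstractmethod
from typing import Callable, Iterable, Iterator


class ConversationEngine(ABC):
    """Hypothetical interface: the application only ever talks to this."""

    @abstractmethod
    def respond(self, caller_audio: Iterable[bytes]) -> Iterator[bytes]:
        """Consume caller audio chunks and yield reply audio chunks."""


class CascadedEngine(ConversationEngine):
    """Today's approach: separate STT -> LLM -> TTS stages, injected as callables."""

    def __init__(self, stt: Callable[[bytes], str],
                 llm: Callable[[str], str],
                 tts: Callable[[str], bytes]):
        self.stt, self.llm, self.tts = stt, llm, tts

    def respond(self, caller_audio: Iterable[bytes]) -> Iterator[bytes]:
        # Half-duplex: wait for the whole utterance before replying.
        transcript = self.stt(b"".join(caller_audio))
        reply_text = self.llm(transcript)
        yield self.tts(reply_text)


class SpeechToSpeechEngine(ConversationEngine):
    """Placeholder for a model that takes audio in and streams audio out directly."""

    def __init__(self, stream_fn: Callable[[Iterable[bytes]], Iterator[bytes]]):
        self.stream_fn = stream_fn

    def respond(self, caller_audio: Iterable[bytes]) -> Iterator[bytes]:
        # Reply chunks are passed through as the model produces them.
        yield from self.stream_fn(caller_audio)


def handle_call(engine: ConversationEngine, caller_audio: Iterable[bytes]) -> bytes:
    """Application code: unchanged whichever engine is plugged in."""
    return b"".join(engine.respond(caller_audio))


# Usage with stub stages standing in for real STT/LLM/TTS clients.
fake = CascadedEngine(
    stt=lambda audio: audio.decode(),
    llm=lambda text: f"You said: {text}",
    tts=lambda text: text.encode(),
)
print(handle_call(fake, [b"hello ", b"world"]))  # b'You said: hello world'

echo = SpeechToSpeechEngine(stream_fn=lambda chunks: (c.upper() for c in chunks))
print(handle_call(echo, [b"hello ", b"world"]))  # b'HELLO WORLD'
```

The design choice is simply dependency inversion: swapping providers, or moving from a cascaded pipeline to a native audio model, becomes a change to which engine is constructed rather than to the conversation logic.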