About That OpenAI Audio Chat Demo

Rob Pickering

14 May 2024 — 1 min read

AI in audio conversations

What the OpenAI announcement means for telephony and other audio-only conversational AI: carry on building, things are accelerating.

Yesterday's OpenAI announcement majored on audio and video conversations. It included an impressive speech demo that is way ahead of what any of us working in this area can build right now from individual STT, inferencing and TTS components, no matter how hard we tune and optimise to solve the hard problems in this pipeline.

Aside from that demo, OpenAI haven't yet released details of how they are approaching this, saying: "We hope to bring this modality to a set of trusted testers in the coming weeks."

I guess we will have to wait to see what this looks like, but if they are doing this right, with full duplex audio streaming into the model, then this will be a transformative step change in building natural conversations on the telephone and in other audio-only contexts.

Other platforms will follow suit. Google would be mad not to be developing better versions of this pipeline: Gemini is already multimodal, and Google know a lot about streaming intent recognition done the old way.

So my advice: keep on building, focus on the applications, and keep the plumbing platform agnostic, as it is about to get exponentially easier to build authentic conversations.
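As a rough illustration of what "platform agnostic plumbing" could mean in practice, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the names `ConversationEngine`, `PipelineEngine`, `SpeechToSpeechEngine` and `handle_turn` are hypothetical, the STT, LLM and TTS stages are plain callables rather than any real vendor SDK, and the interface is turn-based for brevity, whereas a genuinely full duplex design would stream audio in both directions. The point is only that the application talks to one small interface, so swapping today's three-stage pipeline for an audio-native model later is a change in one place.

```python
from abc import ABC, abstractmethod
from typing import Callable


class ConversationEngine(ABC):
    """Hypothetical seam between the application and whatever speech platform sits behind it."""

    @abstractmethod
    def respond(self, caller_audio: bytes) -> bytes:
        """Take one turn of caller audio and return the audio to play back."""


class PipelineEngine(ConversationEngine):
    """Today's approach: separate STT, inference and TTS stages chained together."""

    def __init__(
        self,
        stt: Callable[[bytes], str],
        llm: Callable[[str], str],
        tts: Callable[[str], bytes],
    ) -> None:
        self.stt = stt
        self.llm = llm
        self.tts = tts

    def respond(self, caller_audio: bytes) -> bytes:
        transcript = self.stt(caller_audio)  # speech to text
        reply_text = self.llm(transcript)    # text in, text out
        return self.tts(reply_text)          # text to speech


class SpeechToSpeechEngine(ConversationEngine):
    """Placeholder for a future audio-native model; no real vendor API is assumed."""

    def __init__(self, model: Callable[[bytes], bytes]) -> None:
        self.model = model

    def respond(self, caller_audio: bytes) -> bytes:
        return self.model(caller_audio)  # audio in, audio out, no text stage


def handle_turn(engine: ConversationEngine, caller_audio: bytes) -> bytes:
    """Application code depends only on the ConversationEngine interface."""
    return engine.respond(caller_audio)
```

With this shape, the application logic calls `handle_turn` regardless of which engine is plugged in, so the choice of platform stays a configuration decision rather than something baked into the application.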
