Looking back a few years, I realize I was wrong about Voice UX (VUX) being the interface of the future.
It was 2017. I was working at Amazon and closely collaborating with the Alexa team. At the time, Alexa was the voice interface, leading a wave of hype around voice as the next big thing. That was also when Google and Amazon were locking horns over YouTube’s availability on Fire TV. Things were noisy—both technically and politically—but exciting.
Behind the scenes, we were already using what we now broadly call AI: advanced machine learning models, predictive behavior analysis, intent recognition. It felt futuristic. I was invited to speak at a tech talk at StartIt Center, and instead of going deep on backend tooling, I decided to talk about user experience—more specifically VUX (Voice User Experience) and VUI (Voice User Interfaces), and how they might fit alongside GUIs and CLIs.
In that talk, I argued that VUX was the next natural step in human-computer interaction. First came CLIs—command-line interfaces that required precision and memorization. Then came GUIs, introducing visuals and affordances. And voice? It seemed like the most human, most intuitive evolution of them all.
But I misunderstood a few things.
Most traditional interfaces are one-to-one: one input maps to one action.
But with voice, you get a many-to-one structure. You can say:
“Turn on the lights”
“Can you switch the lights on?”
“It’s too dark in here”
All map to the same intent. Back then, I believed the only barrier was technical: NLP just needed to improve. Once the models got good enough, voice would naturally take over.
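The many-to-one mapping above can be sketched as a toy intent matcher. This is purely illustrative (the patterns, intent name, and function are my own invention, not Alexa's actual NLP stack): several phrasings resolve to a single intent.

```python
import re

# Illustrative many-to-one mapping: several phrasings, one intent.
# The patterns and the "lights_on" intent name are hypothetical.
INTENT_PATTERNS = {
    "lights_on": [
        r"\bturn on the lights\b",
        r"\bswitch the lights on\b",
        r"\btoo dark\b",
    ],
}

def recognize_intent(utterance):
    """Map a free-form utterance to an intent name, or None."""
    text = utterance.lower()
    for intent, patterns in INTENT_PATTERNS.items():
        if any(re.search(p, text) for p in patterns):
            return intent
    return None

for phrase in ("Turn on the lights",
               "Can you switch the lights on?",
               "It's too dark in here"):
    print(phrase, "->", recognize_intent(phrase))
```

Real systems use learned models rather than regexes, of course; the point is only the shape of the problem: an unbounded space of inputs collapsing onto one action.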
But then reality hit: voice has privacy issues baked in.
Think about it:
Interacting with a computer is often a personal act—whether it’s work, research, or expression. Speaking aloud isn’t just awkward; it’s invasive. You don’t talk to yourself out loud in public (at least I hope not).
Sure, COVID pushed us into isolation, and VUI adoption increased slightly in the home. But in most digital contexts, users prefer privacy by default. We don’t shout commands at our laptops. We whisper them with a keyboard or tap them silently.
Still, voice works well when the outcome is naturally visible to everyone around you. Turning on the living room lights. Changing the car’s temperature. Adjusting TV volume. Those are public actions with public results—VUI fits.
So what happened between 2017 and now?
We didn’t jump straight to full voice. Instead, we got something in between: chat—thanks largely to ChatGPT.
Technically it’s a GUI. You type, one keystroke at a time. But conceptually, it behaves more like voice: free-form input, many phrasings, one intent.
That’s the “aha” moment:
We want interfaces that let us express ourselves freely (many-to-one),
but we also need those interactions to be private.
I won’t try to predict the one interface to rule them all. We’ll likely use all of them: CLI, GUI, VUI, chat, gesture, and whatever comes next. Each fits a specific need, moment, or context.
That’s exactly why we built Moveo One—to help you understand how users interact with your product, whether it’s through voice, chat, visuals, or something else entirely. Try it out.
If you’re building products and wondering how people actually use them, reach out. We’d love to help you solve those questions—with data, behavior insights, and cognitive analytics that go beyond the surface.
#VoiceUX #VUX #VUI #UXDesign #HumanComputerInteraction #InterfaceDesign #DigitalPrivacy #ChatInterfaces #ProductThinking #MoveoOne