Alexa, Siri, and Google Don’t Understand a Word You Say

February 19, 2019

Voice assistants like Alexa, Google Assistant, and Siri have come a long way in the last few years. But, for all their improvements, one thing holds them back: They don’t understand you. They rely too much on specific voice commands.

Speech Recognition Is Just a Magic Trick

Voice assistants don’t understand you. Not really, anyway. When you speak to a Google Home or Amazon Echo, it essentially converts your words to a text string and compares that string to a list of expected commands. If it finds an exact match, it follows a set of instructions. If it doesn’t, it falls back on a best guess based on the information it does have, and if that fails, you get a response like “I’m sorry, but I don’t know that.” It’s little more than sleight-of-hand magic that tricks you into thinking it understands.
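
To make that concrete, here’s a deliberately simplified sketch of that matching step. The phrases and responses are invented for illustration, and real assistants use far larger catalogs of commands, but the exact-match-or-fail flow is the one described above:

```python
# Hypothetical, minimal illustration of exact-match command dispatch.
# Speech recognition has already turned the audio into a text string.

RESPONSES = {
    "what time is it": "It's 3:42 PM.",
    "turn on the kitchen lights": "OK, turning on the kitchen lights.",
    "do you work for the nsa": "No, I don't work for the NSA.",
}

def handle(utterance: str) -> str:
    text = utterance.lower().strip(" ?.!")      # normalize the transcript
    if text in RESPONSES:                       # exact match -> canned behavior
        return RESPONSES[text]
    return "I'm sorry, but I don't know that."  # no match -> failure message

print(handle("Do you work for the NSA?"))           # recognized phrasing
print(handle("Are you secretly part of the NSA?"))  # unrecognized -> failure
```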

It can’t use contextual clues to make the best guess, or even use an understanding of similar topics to inform its decisions. It isn’t hard to trip up voice assistants either. While you can ask Alexa “Do you work for the NSA?” and get an answer, if you ask “Are you secretly part of the NSA?” you get an “I don’t know that one” response (at least at the time of this writing).

Humans, who genuinely understand speech, don’t work like this. Suppose you ask a person, “What is that klarvain in the sky? The one that’s arched and full of striped colors like red, orange, yellow, and blue.” Despite klarvain being a made-up word, the person you asked could likely figure out from context that you’re describing a rainbow.

You could argue that a human is also converting speech into something else (ideas), but a human can then apply knowledge and understanding to arrive at an answer. If you ask a human whether they secretly work for the NSA, they’ll give you a yes or no answer, even if that answer is a lie. A human wouldn’t respond to a question like that with “I don’t know that one.” The very ability to lie comes with real understanding.

Voice Assistants Can’t Go Beyond Their Programming

Voice assistants are ultimately limited to programmed, expected parameters, and wandering outside of them breaks the process. That fact shows when third-party devices come into play. Usually, the command to interact with them is unwieldy, amounting to “tell [device manufacturer] to [command] [optional argument].” A concrete example: “Tell Whirlpool to pause the dryer.” For an even harder-to-remember example, the Geneva Alexa skill controls some GE ovens. A user of the skill has to remember to say “tell Geneva,” not “tell GE,” before the rest of the command. And while you can ask it to preheat the oven to 350 degrees, you can’t follow up with a request to raise the temperature by another 50 degrees. A human could follow these requests, though.
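
As a rough sketch of that pattern (the skill names and accepted phrases below are hypothetical), the whole interaction boils down to matching a “tell [skill] to [command]” template against a fixed table, with nothing carried over from one request to the next:

```python
# Hypothetical sketch of rigid "tell <skill> to <command>" handling.
# Each request must match a known (skill, phrase) pair exactly, and no
# state is kept between turns, so follow-up requests have no context.
import re

SKILL_COMMANDS = {
    ("geneva", "preheat the oven to 350 degrees"): "Preheating the oven to 350 degrees.",
    ("whirlpool", "pause the dryer"): "Pausing the dryer.",
}

def handle(utterance: str) -> str:
    match = re.match(r"tell (\w+) to (.+)", utterance.lower().rstrip(". "))
    if match and (match.group(1), match.group(2)) in SKILL_COMMANDS:
        return SKILL_COMMANDS[(match.group(1), match.group(2))]
    return "I'm sorry, but I don't know that."

print(handle("Tell Geneva to preheat the oven to 350 degrees"))  # works
print(handle("Tell GE to preheat the oven to 350 degrees"))      # wrong skill name, fails
print(handle("Increase the temperature by 50 degrees"))          # no stored context, fails
```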

Amazon and Google have worked very hard to overcome these obstacles, and it shows. Where once you had to follow the above sequence to control a smart lock, now you can say “lock the front door” instead. Alexa used to be confused by “tell me a dog joke,” but ask for one today, and it will work. They’ve added variations to the commands you use, but ultimately you still have to know the right command to say. You need to use the correct syntax, in the correct order.
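
Those variations amount to more entries in the lookup table rather than genuine comprehension. Here’s a minimal sketch, assuming a made-up list of accepted phrasings for a dog-joke request:

```python
# Hypothetical set of phrasings that all map to the same canned intent.
DOG_JOKE_PHRASES = {
    "tell me a dog joke",
    "tell me a joke about dogs",
    "i want to hear a dog joke",
}

def handle(utterance: str) -> str:
    if utterance.lower().rstrip(".?!") in DOG_JOKE_PHRASES:
        return "What do you call a dog magician? A labracadabrador."
    return "I'm sorry, but I don't know that."

print(handle("Tell me a dog joke"))              # an accepted variation
print(handle("Got any good jokes about dogs?"))  # still outside the list
```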

And if you think that sounds a lot like a command line, you’re not wrong.

Voice Assistants Are a Fancy Command Line
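
As a rough illustration of that parallel (the skill, intent, and argument names below are made up), a voice request like “ask Geneva to preheat the oven to 350 degrees” carries the same pieces as a shell command: a program, a subcommand, and arguments.

```python
# Hypothetical mapping of a voice request onto a command-line-style invocation.
import argparse

parser = argparse.ArgumentParser(prog="geneva")      # the "skill"
sub = parser.add_subparsers(dest="intent", required=True)
preheat = sub.add_parser("preheat")                  # the "intent"
preheat.add_argument("--temperature", type=int)      # the "slot"

# Roughly: "Alexa, ask Geneva to preheat the oven to 350 degrees"
args = parser.parse_args(["preheat", "--temperature", "350"])
print(args.intent, args.temperature)                 # -> preheat 350
```

Get the syntax wrong in either form, and you get an error instead of a best guess.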

