Understanding the fundamentals of voice and chatbots
I attended a meetup about the future of Voice UI and chatbots last week. This was part of my effort to combat my innate bias against voice-based interfaces. It was an enlightening experience, especially since most of my experience with voice had been limited to specific domains.
Most of my exposure to voice technology had come either from accessibility (through my blind friend Ali) or from mobile technology (through my sister). The meetup seemed like a good opportunity to see what the rest of the field was doing.
What I learned is that the field may be different from what I had initially thought.
Current challenges for emerging tech
One thing that surprised me about Voice UI technology was that there are several rough edges with how current interactions take place.
First and foremost is the way that interactions are structured. Voice UI right now is akin to a one-way street: it is often very difficult to go back to previous options or retrace where you are. Current chatbots can be ‘dead-ended’ or broken if a voice command doesn’t match one of their expected options.
This is because voice interactions have no visible state: everything takes place in your head. As a result, voice UI right now is often flat or small-scale, since too many options means greater difficulty navigating.
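To make the ‘one-way street’ problem concrete, here is a minimal sketch (my own illustration, not anything presented at the meetup) of a dialog manager that keeps a history stack, so a “go back” command retraces a step instead of dead-ending. All names here are hypothetical:

```python
class DialogManager:
    """Minimal dialog manager with a history stack so users can retrace steps."""

    def __init__(self, root_menu):
        self.history = [root_menu]  # stack of menus visited so far

    @property
    def current(self):
        return self.history[-1]

    def handle(self, utterance):
        if utterance == "go back":
            if len(self.history) > 1:
                self.history.pop()  # retrace one step instead of dead-ending
            return self.current
        next_menu = self.current.get(utterance)
        if next_menu is None:
            # Unrecognized command: stay put and re-prompt rather than break
            return self.current
        self.history.append(next_menu)
        return next_menu


# Hypothetical menu tree: main -> flights -> change flight
menus = {"label": "main",
         "flights": {"label": "flights",
                     "change": {"label": "change flight"}}}
dm = DialogManager(menus)
dm.handle("flights")
dm.handle("go back")
assert dm.current["label"] == "main"
```

The point of the sketch is only that the conversation state lives in an explicit stack, so the user never has to hold the whole path in their head.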
I talked with the speaker about blind web navigation (such as with screen readers) as a possible current model to address this problem, but got a very interesting answer in return: blind navigation, as we know it, is still based on accessibility standards for web design.
As a result, it uses many UI patterns and functions that we already know (such as scrolling, a home button, etc.). That is fine when there are established methods of control (such as the Tab key for navigating with screen readers), but it can be unsuitable for thinking about voice.
For example, a ‘home page’ for Voice-based interfaces could be more like the main menu (with several different options) rather than a traditional web homepage. As a result, simply taking the mental model from blind users will not address all of the problems with navigation.
The other thing that I learned about was human handoff. There are several different use cases for Voice UI that are currently in production today: one of the things that the speaker was familiar with was with chatbots and automated customer service lines.
The main issue is that users often have to repeat information they already gave the chatbot once they are handed off to a human agent.
One goal that could save time and money with these systems is pre-filling the agent's view with the information the chatbot has already gathered, so that the agent has everything necessary the moment the handoff happens.
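As a rough illustration of that pre-filling idea (my own sketch, with hypothetical field names and no real customer-service API), the bot's collected slots and transcript could be packaged into a single handoff ticket:

```python
def handoff_to_agent(collected_slots, transcript):
    """Package everything the bot has gathered so the human agent
    starts with a pre-filled view instead of re-asking the caller."""
    return {
        # Fields the bot already confirmed
        "summary": {k: v for k, v in collected_slots.items() if v is not None},
        # Full conversation so far, for context
        "transcript": transcript,
        # Fields the agent still needs to ask about
        "missing": [k for k, v in collected_slots.items() if v is None],
    }


slots = {"name": "Ada", "order_id": "A-1042", "issue": None}
ticket = handoff_to_agent(slots, ["Hi, I need help with order A-1042."])
assert ticket["missing"] == ["issue"]
```

The design choice is simply that the bot's state is treated as structured data to be forwarded, not as a conversation to be repeated.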
The impact of NLU and NLP
I also learned about how Natural Language Understanding (NLU) and Natural Language Processing (NLP) are at the forefront of designing for voice.
They differ slightly in their definitions. NLP, as the speaker explained, is the act of translating what a person says into a format that machines can understand. NLU is a subset of NLP that is specifically about the comprehension of a body of text.
So, for example, NLP is concerned with the whole action of breaking down what a voice said into text such as “Change flight” or “Cancel order” and acting upon that command.
NLU would be the sub-step of comprehending the text “Change flight” and deciding on an action based on it.
Both of these actions are crucial in translating voice commands correctly as well as taking actions. Currently, one of the most common uses of this is through activation words (such as “Okay Google” or “Alexa”). Getting greater accuracy as well as differentiating between conversations and voice commands will be at the forefront of this field.
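The distinction above can be sketched as a tiny two-stage pipeline. This is only my illustration of the speaker's framing: the “NLP” stage here is a stand-in for real speech-to-text, and the intent table is hypothetical:

```python
# Hypothetical intent table mapping normalized text to actions.
INTENTS = {
    "change flight": "CHANGE_FLIGHT",
    "cancel order": "CANCEL_ORDER",
}


def nlp_transcribe(raw_utterance):
    """NLP stage (heavily simplified): turn spoken input into normalized text.
    A real system would run speech-to-text here; we just strip and lowercase."""
    return raw_utterance.strip().lower()


def nlu_comprehend(text):
    """NLU stage: comprehend the normalized text and map it to a known intent."""
    return INTENTS.get(text, "UNKNOWN")


def handle_command(raw_utterance):
    """Full pipeline: voice -> text (NLP) -> intent (NLU) -> dispatchable action."""
    return nlu_comprehend(nlp_transcribe(raw_utterance))


assert handle_command("Change flight") == "CHANGE_FLIGHT"
```

Anything that falls outside the intent table comes back as `UNKNOWN`, which is exactly where the accuracy problem (and the conversation-versus-command problem) mentioned above lives.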
How to design for voice
The second part of the speaker’s talk was about how to currently design for voice.
There were several existing trends that I found interesting. For example, there are different use cases for engaging with a chatbot that seems human versus one that is clearly a bot.
In the case of healthcare, for example, people are more willing to talk with a bot about personal issues than they are with a human (or even a human-like bot). So if you are feeling bad at 3 AM because of chemotherapy, having a bot-like chat agent might be a good place to turn for many people.
Also, the type of voice can matter along cultural dimensions. In the UK, for example, the voice of help services has typically been male, while in the US it is typically female. As a result, you should make sure the voice matches the mental model people expect.
The mental model of Voice UI
Lastly, the speaker talked about one of the most important keys for user engagement: the mental model. He noted that while voice-based devices offer no affordances as to what they can do, the way that devices such as the Amazon Echo (with Alexa) have been designed means that most users have a mental model of voice UI as a personal assistant.
As a result, the future of voice UI might thrive in specialized domains where assistance might be helpful.
For example, freshmen on a university campus could receive a voice UI-based device sponsored by the university that would act as an assistant and tour guide, showing the students around and connecting them with whatever university resources they should need.
My journey in understanding
As I learn more and more about this technology, I’m beginning to understand some of the areas where it can be properly implemented. This meetup gave me an enlightening, broader overview of the subject as it currently stands, as well as one possible future.
I still want to learn more about some different areas discussed, though, so I’ll be searching for additional opportunities to learn more.
Like this story? This is part of a series I am doing to combat my bias against voice UI. You can view the series here.