Years ago, before hands-free cell phones became common, the sight of someone walking down the street alone and talking seemed strange. Now, it’s so commonplace that people barely notice. Today, natural language user interfaces (NLUI) are going through the same transition, but much more quickly, as voice interfaces become a major part of our daily lives.
For several years we have had Ok Google and Siri, but only as support interaction services on phones. Now, devices like Amazon Echo have brought voice interaction to our homes and are making natural language a realistic primary user interaction for all kinds of Internet of Things (IoT) devices.
It seems obvious that interacting with a product, app, or skill via speech is very different from interacting with a screen, yet many early offerings use the same design metaphor as traditional graphical user interface (GUI) products. Interacting via voice involves a series of mental events. The user first thinks: “Is it possible to get the thing I’m thinking of via voice?” If the answer is yes or maybe, the user then thinks: “How do I say this request to this machine?” These two steps add a significant amount of delay to voice interaction. The hurdles can be overcome with these four NLUI design ideas:
- Keep it short and simple. Avoid having users construct complex phrases before interacting with your skill.
- Don’t leave users in the dark. Give users clear feedback of whether they were successful or not and provide options along the way.
- Use simple naming. Pick a skill name that fits naturally into the commands spoken by users.
- Allow conversation. Keep the context and the line of communication open.
These may seem like little things, but they can have a major impact on your solution’s usability. Let’s look at them more closely.
When’s my bus?
With the Amazon Echo, users interact with the device using voice commands, so what used to be just a mouse click or finger tap is replaced by a spoken phrase, which can be frustrating if phrases are long and need to be repeated exactly.
While conversation between people allows phrases to be formed on the fly, Alexa phrases mostly have to be formulated before speaking. For example, to find bus times at a nearby stop you might say, “Alexa, ask [skill name] when is the next route 14 at 6403?” If you’re in a hurry to leave the house this might be too long and difficult to remember.
1. Keep it short and simple
Let’s take the example apart and see how it works. The key parts of the query are the bus number (14) and the stop number (6403). Alexa can ease the cognitive load on the user by breaking up the query. For example:
User: “Alexa, ask [skill name] when the next 14 comes.”
Alexa: “What stop is that for?”
Better yet, if location services improve or users can save a favorite stop, they won’t have to be prompted for the stop number at all. The shorter the phrase, and the less the user needs to plan out their command, the better the chances of a successful interaction on the first try. Such strategies matter because if an NLUI fails two or three times, users will conclude that they should have used a conventional app on their phone, and the skill will have failed. This principle also applies to the way an NLUI provides feedback.
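The prompt-for-the-missing-piece pattern above can be sketched as a small routine: answer if you have everything, fall back to a saved favorite stop, otherwise ask a follow-up question. This is a minimal illustration with hypothetical names (`handle_bus_request`, the hardcoded arrival time); a real Alexa skill would express the same logic through the Skills Kit’s dialog and slot-elicitation mechanisms.

```python
def handle_bus_request(route=None, stop=None, favorite_stop=None):
    """Return (speech, expecting_answer) for a bus-times query.

    If a required piece of the query is missing, ask a short follow-up
    question instead of forcing the user to plan one long phrase.
    """
    if route is None:
        return ("Which bus route do you want?", True)
    if stop is None:
        if favorite_stop is not None:
            stop = favorite_stop  # use the user's saved stop, no prompt needed
        else:
            return ("What stop is that for?", True)
    # A real skill would call a transit API here; hardcoded for the sketch.
    return (f"The next {route} bus arrives at stop {stop} in 2 minutes.", False)


# "Alexa, ask ... when the next 14 comes" with no stop given:
print(handle_bus_request(route="14"))        # prompts for the stop
# With a saved favorite stop, no prompt is needed:
print(handle_bus_request(route="14", favorite_stop="6403"))
```

The key design choice is that each turn demands only one small piece of information, so a failed first attempt costs the user a short answer rather than a re-planned sentence.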
2. Don’t leave users in the dark
When designing for voice interaction it is important to understand the user’s environment, which often doesn’t include a screen. The absence of a display, the lack of wayfinding cues, and the desire for fast ‘on-the-go’ or ‘hands-busy’ workflows all require serious consideration when designing an NLUI. Designers take for granted having a GUI to communicate interactions and feedback visually; without a screen, users can be left in the dark about the state of their task, and information becomes difficult to digest. Was I successful? Was I understood? Am I done? Your skill wasn’t really successful if it worked but the user doesn’t know it.
Too much feedback is as troubling as unclear or insufficient feedback, so responses should be kept short and to the point. Don’t expect your users to remember more than succinct units of information.
Going back to the bus times skill: if the response to “When is the next 14 bus?” is “The 14 bus comes in 2, 13, and 21 minutes,” the user has been given more information than they need. A more digestible, to-the-point response would be something like:
Alexa: “The next 14 bus comes in 2 minutes, would you like more stop times?”
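The trimming logic is easy to sketch: speak only the next arrival, and offer the rest on demand. The function name and hardcoded route below are hypothetical, not part of any Alexa API.

```python
def bus_times_response(times_minutes):
    """Summarize arrival times as a short, digestible spoken response.

    `times_minutes` is a sorted list of minutes until arrival. Only the
    next arrival is spoken; further times are offered as a follow-up so
    the user isn't asked to remember three numbers at once.
    """
    if not times_minutes:
        return "I don't see any more 14 buses today."
    speech = f"The next 14 bus comes in {times_minutes[0]} minutes."
    if len(times_minutes) > 1:
        speech += " Would you like more stop times?"
    return speech


print(bus_times_response([2, 13, 21]))
# → "The next 14 bus comes in 2 minutes. Would you like more stop times?"
```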
3. Pick the right skill name
Providing a skill name that is both memorable and brief directly affects how naturally users can form and speak commands. Without icons to direct them to their desired app, users find it difficult to remember which skills they have installed and what each one does. Infrequent use makes remembering which skills are ‘enabled’ even harder. Skill names that fit naturally into a command structure are easier to remember. Consider the difference between “Alexa, ask ‘My Lawn’ how much water it needs” and “Alexa, ask ‘Lawn Care Plus’ how much water my grass needs.” Choosing common nouns that refer to the thing the user wants to interact with, rather than a product name, lessens the cognitive load on the user.
4. Allow for conversation
Current conversational products are limited to segmented commands that require users to re-engage with Alexa for each request, partly for security reasons. This is the equivalent of closing and reopening an app on your phone every time you want to perform a new action. For example, here is a typical interaction with Alexa involving multiple queries: “Alexa, how much time is left on the timer?” “Alexa, cancel the timer.” “Alexa, set a timer for 30 minutes.” The constant repetition of “Alexa” quickly becomes a burden, as if you’re communicating with an uncooperative child. But if context is maintained throughout the conversation, the interaction can be more like:
User: “Alexa how much time is left on the timer?”
Alexa: “15 minutes”
User: “Cancel the timer.” “Set a new timer for 30 minutes.”
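Keeping context like this amounts to carrying session state across turns instead of treating every utterance as a fresh request. Here is a toy sketch with hypothetical names and crude keyword matching standing in for real intent recognition:

```python
class TimerSession:
    """Sketch of conversational context carried across turns.

    The session remembers the active timer, so follow-ups like
    "Cancel the timer" work without repeating the wake word or
    re-invoking the skill.
    """

    def __init__(self):
        self.timer_minutes = None  # remaining minutes, or None if no timer

    def handle(self, utterance):
        text = utterance.lower()
        if "how much time" in text:
            if self.timer_minutes is None:
                return "There is no timer running."
            return f"{self.timer_minutes} minutes"
        if "cancel" in text:
            self.timer_minutes = None
            return "Timer canceled."
        if "set" in text and "timer" in text:
            # A real skill would parse the duration from a slot;
            # hardcoded to 30 minutes for this sketch.
            self.timer_minutes = 30
            return "Timer set for 30 minutes."
        return "Sorry, I didn't catch that."


session = TimerSession()
session.timer_minutes = 15  # a timer is already running
print(session.handle("How much time is left on the timer?"))  # → "15 minutes"
print(session.handle("Cancel the timer."))                    # → "Timer canceled."
print(session.handle("Set a new timer for 30 minutes."))      # → "Timer set for 30 minutes."
```

The point of the sketch is the state, not the parsing: because `timer_minutes` survives between turns, the second and third utterances need no re-invocation.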
The same principle applies to allowing for a more human-like conversation that permits pauses, breaks and all the subtle imperfections that take place when forming a sentence on the fly.
Ultimately, both technology companies and users have work to do to bridge the NLUI gap. Technology companies will continue to improve natural language processing and query response time, helping users interact more seamlessly with their products.
Conversely, the art of conversation has faded over the last decade as texting and SMS messaging became the preferred forms of communication for virtually everyone with a smartphone. So as users, we also need to embrace a cultural shift and learn how to converse again. The four design principles covered here (keeping it short and simple, providing interaction feedback, picking an appropriate skill name, and allowing conversation) can go a long way toward bridging the gap.