Macadamian Blog

How to Implement Voice Design in Multi-Channel Survey Delivery

Ed Sarfeld

In a recent client project, our team developed an Alexa skill to gather user survey information over the period of a clinical trial. We walk you through the elements of voice design we needed to consider in order to create the best possible conversational survey experience.

Voice design in surveys

Voice-enabled technology is making big strides. A recent report from Juniper Research forecasts that the total number of voice assistant devices (smartphones and smart speakers combined) will reach 870 million in the U.S. by 2022, a 95% jump from the total estimated for 2017.

The current consumer adoption of voice-enabled technology presents a great opportunity for the healthcare industry, both in the clinical environment and in at-home care. In a recent survey of pediatricians conducted by Boston Children’s Hospital, 62% of respondents said they’ve used voice assistant technology, with a third owning their own smart speaker, and many revealed they were open to using the technology in their clinics. Boston Children’s Hospital has already launched multiple pilot programs deploying voice assistants in the clinic to gain immediate access to information in time-sensitive situations, to streamline processes and minimize human error, and to support at-home care by providing customized guidance and education to patients.

Recognizing the potential of voice in healthcare, a client recently approached our team to develop an Amazon Alexa skill to gather user survey information where the user could choose to complete the survey through the skill or through a mobile application. The focus was to offer the user a natural language option to increase user engagement and retention over the period of a clinical trial. Crafting a pleasant conversational user experience that engages patients throughout a trial requires more than just rephrasing text. In this blog post, I’ll walk you through the critical elements of voice design we used to create the best possible conversational survey experience for our client.

Defining a Path

As with any graphical user interface (GUI) design project, understanding the data set is critical. After reviewing the questions and structure of our client’s survey, we understood the data types that needed to be captured as well as the potential voice flows through the content. The success of the skill would be measured by its brevity, so it was important to identify where we could create decision points that would allow us to present content only when required. We wanted users to navigate the survey in the fewest possible steps, reducing their time commitment and increasing their satisfaction. In one instance, we were able to reduce the flow by 17 steps with a single decision point question.
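
As a rough illustration of a decision point, the sketch below models a survey as a list of questions where one answer can rule out a whole group of follow-ups. The question keys, prompts, and the `skips` field are hypothetical and are not taken from the client's survey; this is one possible way to model the idea, not the structure the skill actually used.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    key: str
    prompt: str
    # Hypothetical field: maps an answer to the keys of questions it makes irrelevant.
    skips: dict = field(default_factory=dict)


def remaining_path(questions, answers):
    """Return the keys still to ask, dropping anything a prior answer ruled out."""
    skipped = set()
    for q in questions:
        answer = answers.get(q.key)
        if answer is not None:
            skipped |= q.skips.get(answer, set())
    return [q.key for q in questions if q.key not in answers and q.key not in skipped]


# Illustrative survey: a single 'no' removes three follow-up questions.
SURVEY = [
    Question("had_pain", "Did you have any pain today?",
             skips={"no": {"pain_level", "pain_location", "pain_medication"}}),
    Question("pain_level", "Was the pain moderate or severe?"),
    Question("pain_location", "Where was the pain?"),
    Question("pain_medication", "Did you take anything for it?"),
    Question("sleep", "Did you sleep well?"),
]

print(remaining_path(SURVEY, {"had_pain": "no"}))
```

Keeping the skip rules next to the question that triggers them makes it easy to see, during script review, how many steps each decision point can save.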

Grouping Responses

Most surveys are drafted to capture specific information through a defined set of responses. On paper or in a screen-based application, the responses can vary from question to question without placing much cognitive load on the end user. After reading a question, the user can scan and select the appropriate answer, but when presented with a question by voice, similar answers should be grouped in a natural order. When questions with similar response sets are grouped, the user can focus on the questions and more easily answer the prompts.

We also provided the ability for the user to combine answers to speed up the completion of the survey. For example, rather than saying ‘yes’ and then being prompted for another response, the user could answer ‘yes, sometimes’ allowing them to complete both questions with a single statement.
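
A minimal sketch of accepting a combined answer follows. It assumes the follow-up question asks about frequency and that the recognized utterance is available as plain text; the `FREQUENCIES` vocabulary and the function name are illustrative. In a real Alexa skill this would more likely be handled through slots in the interaction model than through string matching.

```python
import re

# Hypothetical vocabulary for the follow-up question.
FREQUENCIES = ("rarely", "sometimes", "often", "always")


def parse_combined(utterance):
    """Split an utterance such as 'yes, sometimes' into (yes_no, frequency)."""
    words = re.findall(r"[a-z]+", utterance.lower())
    yes_no = next((w for w in words if w in ("yes", "no")), None)
    frequency = next((w for w in words if w in FREQUENCIES), None)
    return yes_no, frequency


print(parse_combined("Yes, sometimes"))
```

When the second element comes back as `None`, the skill would fall back to asking the follow-up question separately.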

Shaping the Survey Question

Our survey questions were written by clinicians who need clear responses to explicit questions, so we attempted to reshape, or frame, the questions in natural language to be better suited for a voice interface. This required capturing the essence of the questions and phrasing them for an easy response. In some instances, this also required creating two-part questions to capture the intent of the written survey question.

We also ensured that the user could use parts of the question as an answer where best suited. For instance, if the user was queried “Did you experience moderate or severe pain?”, they could answer “yes” as well as “moderate pain” or “severe pain.”

On paper or in a screen-based application, multiple-choice questions can be easily scanned, but by voice, each response needs to be presented individually. For questions with similar responses, clearly communicate how the user can respond to the group of upcoming questions in the survey. If the user is to respond with a ‘yes’ or ‘no’ to the batch of questions, you can state “Okay, now I am going to ask some questions about your day. Please answer yes or no.” and then begin the list of questions.

When a question is difficult to rephrase or introduces a complex answer, consider creating a branched or two-part question. This typically comes up when migrating a “Does Not Apply” answer that also carries additional detail, or when a ‘No’ response requires clarification about the choice.

If the survey does have a question with a long list of answers, limit those questions to ones where the user knows the responses from memory and can easily state them. For example, if the survey asks what medications a user takes, the user should be able to provide them through free-form input. After the user has given an answer, confirm what was heard to ensure accuracy.

For Likert-scale questions, we found that changing them to rating scale questions made them easier to complete for voice. For example, the mobile version of the survey has five choices for a response ranging from ‘Very Bad’ to ‘Very Good’. So, rather than requiring the user to state their rating when using the Alexa skill, we provided a five-star rating option instead. Including a scale that users are more familiar with makes providing an answer faster and easier for them.
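
The mapping from a spoken star rating back to the survey's five-point scale can be sketched as below. Only the endpoints ‘Very Bad’ and ‘Very Good’ come from the survey described above; the three middle labels and the function name are assumptions made for illustration.

```python
# Endpoints are from the survey; the middle three labels are assumed.
LIKERT_LABELS = ["Very Bad", "Bad", "Neutral", "Good", "Very Good"]
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def stars_to_likert(utterance):
    """Map an utterance such as 'four stars' to a Likert label, or None if unclear."""
    for token in utterance.lower().replace("-", " ").split():
        token = token.strip(".,!?")
        stars = NUMBER_WORDS.get(token)
        if stars is None and token.isdecimal():
            stars = int(token)
        if stars is not None and 1 <= stars <= 5:
            return LIKERT_LABELS[stars - 1]
    return None  # caller should re-prompt


print(stars_to_likert("four stars"))
```

Storing the Likert label rather than the star count keeps the voice and mobile channels writing the same values to the survey record.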

Finally, be sensitive to what is being asked when reframing the question for voice because reading a question is a different experience than being asked the question out loud. An example may be someone with a recently diagnosed chronic illness, who is still coming to terms with the illness. The emotional impact of hearing it spoken may be discomforting.

Phrasing for a Conversational Interaction

Typically, the language used in a survey presents the subject matter clearly for consumption on paper or a screen-based application. The challenge for a voice interface is translating subject matter domain content into natural language phrasing that allows for a conversational interaction.

When writing the script text, focus on user comfort by using language that supports users with different knowledge and technology experience levels. This can be achieved by using a variety of words and styles to say the same thing when similar questions are presented. When creating questions or system prompts, be only as informative as needed.

Once you have drafted the script, read it aloud on your own or to someone else to understand how the skill will sound. Your goal is to keep the text clear and the phrasing natural without losing the meaning of the question. Ultimately, the key is to ensure that the written survey question is translated in such a way as to not lose its intent.

Survey Navigation

Our survey skill was designed to assess and prompt the user to complete outstanding surveys before offering to begin something new. The user could delay completing some survey types, but those deemed mandatory require the user to complete the survey before moving on within the skill. The skill will retain an incomplete survey from the previous day, allowing the user a day’s grace before considering it incomplete. By actively prompting the user to complete outstanding surveys, we hope to reduce non-compliance due to inattentiveness.
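
The grace-period rule could be expressed roughly as follows. The record fields (`name`, `assigned_on`, `mandatory`, `completed`), the status names, and the ordering rule (mandatory first, then oldest) are assumptions for the sake of the sketch, not details of the skill's actual data model.

```python
from datetime import date, timedelta

GRACE = timedelta(days=1)  # assumption: one calendar day of grace


def survey_status(assigned_on, today, completed):
    """Classify a survey as complete, still outstanding, or past its grace period."""
    if completed:
        return "complete"
    if today - assigned_on <= GRACE:
        return "outstanding"
    return "incomplete"


def next_survey(surveys, today):
    """Pick the survey to prompt for: mandatory ones first, then the oldest."""
    outstanding = [
        s for s in surveys
        if survey_status(s["assigned_on"], today, s["completed"]) == "outstanding"
    ]
    outstanding.sort(key=lambda s: (not s["mandatory"], s["assigned_on"]))
    return outstanding[0]["name"] if outstanding else None


surveys = [
    {"name": "weekly", "assigned_on": date(2024, 1, 10), "mandatory": False, "completed": False},
    {"name": "daily-jan-9", "assigned_on": date(2024, 1, 9), "mandatory": True, "completed": False},
    {"name": "daily-jan-8", "assigned_on": date(2024, 1, 8), "mandatory": True, "completed": False},
]
print(next_survey(surveys, date(2024, 1, 10)))
```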

The skill also includes navigation support such as ‘repeat’ to hear a question or prompt again. In this instance, all questions are mandatory, so if the user requests to ‘skip’ one, the skill states, “I’m sorry, but I need an answer for each question.” and then repeats the question.

When a user makes an error, like saying something accidentally that they wish to change, we provide a go ‘back one’ command. The question is replayed and the user can then change the answer. We limited the skill to a single ‘back one’ instance because of the challenge of creating a simple and reasonable path of voice interactions. If users try to go back a second time, they are told that they can’t, but they do have the opportunity to ‘start over’ the entire survey.
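
One way to enforce the single ‘back one’ limit is a small session object that remembers whether the user has just gone back. The class, its messages, and the choice to re-enable ‘back one’ once a new answer is given are assumptions; the text above only states that a second consecutive attempt is refused.

```python
NOTHING_TO_UNDO = "There is no earlier question to go back to."
CANNOT_GO_BACK = ("Sorry, I can only go back one question. "
                  "You can say 'start over' to begin the survey again.")


class SurveySession:
    def __init__(self, questions):
        self.questions = questions
        self.index = 0           # position of the question being asked
        self.answers = {}        # question index -> answer
        self.went_back = False   # True while the user is re-answering a question

    def answer(self, value):
        self.answers[self.index] = value
        self.index += 1
        self.went_back = False   # assumption: a fresh answer re-enables 'back one'

    def back_one(self):
        """Return the prompt to speak after a 'back one' request."""
        if self.index == 0:
            return NOTHING_TO_UNDO
        if self.went_back:
            return CANNOT_GO_BACK
        self.index -= 1
        self.went_back = True
        return self.questions[self.index]

    def start_over(self):
        self.index = 0
        self.answers.clear()
        self.went_back = False
        return self.questions[0]
```

A single boolean is enough here because only one step of history is ever exposed, which is what keeps the voice path simple.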

It is also important to provide the user with a sense of place within the survey. It is a challenge to implement a ‘you are on step X of Y’ in the same way that can be done in a screen-based application without being overly repetitious. Therefore, we included milestone phrases within the survey to identify where the user is and provide encouragement, such as “You have now finished more than half of your daily report.” or “You are almost done, just a few questions left.”
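
Milestone phrases can be triggered from the running count of answered questions, as in this sketch. The two phrases are the ones quoted above; the thresholds (the first answer past the halfway mark, and three questions remaining) are assumptions chosen for illustration.

```python
HALF = "You have now finished more than half of your daily report."
ALMOST = "You are almost done, just a few questions left."


def milestone_phrase(answered, total, few=3):
    """Return a phrase when the user has just crossed a milestone, else None."""
    remaining = total - answered
    # 'Almost done' fires once, when exactly `few` questions remain (assumed threshold).
    if remaining == few and answered > total / 2:
        return ALMOST
    # 'More than half' fires once, on the first answer past the midpoint.
    if answered == total // 2 + 1:
        return HALF
    return None


for answered in range(1, 11):
    phrase = milestone_phrase(answered, total=10)
    if phrase:
        print(answered, phrase)
```

Because each phrase fires on an exact count rather than a range, the user hears it once instead of after every question.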

Error Handling

Interaction design will always have moments of user failure, and handling them well in a voice interface is critical. When recognition errors occur, the skill should re-prompt the user in a way that supports the ability to provide an answer. For example, for the first error prompt, “I am sorry I didn’t understand you.” is sufficient. However, if the user has the same problem again, include guidance by restating what can be said, such as “I didn’t catch that. Please say ‘yes’ or ‘no’.” or even by repeating the last question.
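
The escalation described above can be sketched as a function of the attempt count. The first two prompts reuse the wording from the text; repeating the question on the third and later attempts, and the function signature, are assumptions.

```python
def error_prompt(attempt, question, allowed):
    """Build a re-prompt that gives more guidance with each failed attempt."""
    if attempt <= 1:
        return "I am sorry I didn't understand you."
    options = " or ".join(f"'{a}'" for a in allowed)
    if attempt == 2:
        return f"I didn't catch that. Please say {options}."
    # Third attempt onward (assumed): repeat the question along with the options.
    return f"I didn't catch that. {question} Please say {options}."


print(error_prompt(2, "Did you sleep well?", ["yes", "no"]))
```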

When a user takes too long to answer, the skill will time out. These timeouts can be prevented by ‘capturing’ the user’s silent period. If the user hesitates and there are 6 seconds of silence, proactively re-prompt the user by repeating the question or the expected answer string.
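
In an Alexa custom skill, the re-prompt text is supplied alongside the question in the response JSON, and the platform plays it if the user stays silent. The sketch below builds such a response as a plain dictionary; the helper name and the sample wording are illustrative, and the code does not itself control the length of the silence window.

```python
import json


def build_response(speech, reprompt):
    """Build a response that asks a question and keeps the session open."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            # Spoken by the platform if the user does not answer in time.
            "reprompt": {
                "outputSpeech": {"type": "PlainText", "text": reprompt}
            },
            "shouldEndSession": False,
        },
    }


response = build_response(
    "Did you sleep well?",
    "Did you sleep well? Please say yes or no.",
)
print(json.dumps(response, indent=2))
```

Putting the expected answers into the re-prompt, rather than only repeating the question, gives a hesitant user the guidance they may have been waiting for.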

To Wrap Up

Focussing on these elements will help identify how to best structure your voice-enabled survey, shape the questions properly, and provide an experience that makes your user successful. Have questions about voice design or need some support implementing voice into your own product or service? Let’s chat!
