Macadamian Blog

Three Ways to Make Alexa Interactions More Natural

Martin Larochelle

Making interactions with Alexa more natural

Since I brought an Amazon Echo home, the timer and alarms have been its most common uses at my place, other than playing music.

Based on some recent experiences with Alexa, the voice service that powers the Echo, here are 3 ideas that could help make interactions feel more natural.

  1. Postponing alarms
  2. Voice formatting
  3. Pre-timer announcements

Recovering From a Bad Request

‘Alexa, postpone the alarm by 30 minutes’

One morning my interaction with the alarms didn’t go very smoothly.

Alexa, wake up Annie at 7am.

Annie looked at me funny.

‘Right, you are not working today, are you?’

Alexa, postpone alarm by 30 minutes

Alexa didn’t reply, and just played the termination tone. I had to think of another voice command to recover.

Alexa, cancel Alarm
Alexa, set alarm for 7:30pm

Annie looked at me funny again.

‘I said pm didn’t I?’

Now I had to recover from another user error:

Alexa, cancel alarm
Alexa, wake me up at 7:30am

The first time I tried postponing an alarm, Alexa set another alarm instead. That was even harder to recover from. It shows that while natural language interfaces can be a good time saver, recovery can be a pain when one request goes bad, and the user experience collapses. Allowing users to postpone an alarm would make timer and alarm management easier.
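
To make the idea concrete, here is a minimal sketch of what "postpone" could mean: move the next pending alarm later instead of creating a second one. This is not how Alexa works internally. The function name `postpone_next_alarm` and the plain list of `datetime` values are assumptions made up for illustration.

```python
from datetime import datetime, timedelta


def postpone_next_alarm(alarms, minutes, now):
    """Return a new sorted alarm list with the next pending alarm moved later.

    Hypothetical helper: no alarm is ever added. If nothing is pending,
    the list comes back unchanged (apart from being sorted).
    """
    pending = sorted(a for a in alarms if a > now)
    if not pending:
        return sorted(alarms)
    target = pending[0]
    shifted = list(alarms)
    # Replace the earliest pending alarm with its postponed time.
    shifted[shifted.index(target)] = target + timedelta(minutes=minutes)
    return sorted(shifted)


# Example: a 7:00 alarm postponed by 30 minutes (the date is arbitrary).
now = datetime(2016, 1, 9, 6, 50)
alarms = [datetime(2016, 1, 9, 7, 0)]
alarms = postpone_next_alarm(alarms, 30, now)
```

The point of the design is that the number of alarms never changes, so a misheard request cannot leave a stray alarm behind to clean up.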

Voice formatting: When Less Exact is More Natural

As I’ve mentioned before, voice-specific considerations are needed when controlling how Alexa says some things, for example controlling pauses when saying a phone number, or the pronunciation of special words like brand names.

Another case is saying numbers. For example, asking Alexa for 250/3 gets this answer: 250 divided by 3 is 83.3333333333.

Spoken as:

Two hundred and fifty divided by three is eighty-three dot three three three three three three three three three three

I doubt that anyone would say a number like that. Perhaps something like this would be better:

Eighty-three dot three repeating

Alternatively, saying only one or two decimals would have been plenty.
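
As a rough sketch of the "one or two decimals" option, the function below shortens a quotient before it is handed to a speech engine, and prefixes "about" when the short form is not exact. The name `speakable_quotient` and the `max_decimals` parameter are my own assumptions for illustration, not part of any Alexa API. The output is still digits; turning them into spoken words is left to the text-to-speech step.

```python
from fractions import Fraction


def speakable_quotient(numerator, denominator, max_decimals=2):
    """Return a short, speech-friendly string for numerator / denominator."""
    value = Fraction(numerator, denominator)
    if value.denominator == 1:
        # Whole numbers need no decimals at all.
        return str(value.numerator)
    rounded = round(float(value), max_decimals)
    # Fixed-point text with trailing zeros (and a dangling dot) removed.
    text = f"{rounded:.{max_decimals}f}".rstrip("0").rstrip(".")
    # Only say "about" when the shortened number differs from the true value.
    if Fraction(text) == value:
        return text
    return f"about {text}"


answer = speakable_quotient(250, 3)  # "about 83.33"
```

Comparing against the exact `Fraction` avoids saying "about" for values such as 1/4 that are already short.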

As another example, when asking Alexa for the time, you also get these overly precise answers:

The time is eleven eleven am.

Perhaps “ten past eleven” would have done just fine. I usually know whether it is am or pm, so that could be left out. While you do expect the exact time from the display of a clock, when interacting with a voice assistant, I think a more human way of expressing the answer is better than precision.
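
A minimal sketch of that idea: round the time to the nearest five minutes and phrase it the way a person might, leaving out am and pm. The function name `approximate_time` and the word tables are hypothetical, and the phrasing rules are just one reasonable choice.

```python
HOUR_WORDS = ["twelve", "one", "two", "three", "four", "five",
              "six", "seven", "eight", "nine", "ten", "eleven"]
MINUTE_WORDS = {5: "five", 10: "ten", 15: "quarter",
                20: "twenty", 25: "twenty-five", 30: "half"}


def approximate_time(hour, minute):
    """Phrase a 24-hour time conversationally, to the nearest five minutes."""
    total = hour * 60 + minute
    rounded = (total + 2) // 5 * 5  # nearest multiple of five minutes
    hour, minute = (rounded // 60) % 24, rounded % 60
    if minute == 0:
        return f"{HOUR_WORDS[hour % 12]} o'clock"
    if minute <= 30:
        return f"{MINUTE_WORDS[minute]} past {HOUR_WORDS[hour % 12]}"
    # After the half hour, count down to the next hour instead.
    return f"{MINUTE_WORDS[60 - minute]} to {HOUR_WORDS[(hour + 1) % 12]}"


spoken = approximate_time(11, 11)  # "ten past eleven"
```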

Pre-Timer Announcements

‘Alexa, it’s bedtime in 15 minutes’

As a parent, I quickly found that getting a kid to do something is way easier with advance notice and reminders. Asking a kid to do something immediately always turns out to take longer and require more effort than giving my daughter a 15-minute notice, with a couple of reminders along the way.

One way the Alexa timer could be better is to perform those reminders. (Obviously, to enable that, voice reminders/notifications would be needed, which is another enhancement I’ve mentioned before.) The interaction could be triggered by sentences such as “it’s bedtime in 15 minutes”. The same could apply to taking a shower, going to school, or doing homework, all with slightly different wordings.

Then Alexa could select two times to remind users along the way, perhaps 5 and 2 minutes before. Again, precision is not important in this case. Saying “it’s bedtime in 5 minutes” while there are actually 7 minutes left does not really matter to the kid, nor does it affect the ease of getting her to bed. It’s just a matter of showing progress towards the deadline. With clear expectations, there is always less resistance.
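
The reminder schedule described above can be sketched in a few lines. This assumes reminders at 5 and 2 minutes before the deadline, as suggested. The function name `plan_reminders`, its parameters, and the message wording are hypothetical, and actually speaking the messages would depend on a notification capability that is not shown here.

```python
from datetime import datetime, timedelta


def plan_reminders(activity, now, minutes_until, offsets=(5, 2)):
    """Return (time, message) pairs leading up to the deadline.

    Offsets that do not fit inside the remaining time are skipped.
    """
    deadline = now + timedelta(minutes=minutes_until)
    plan = []
    for offset in sorted(offsets, reverse=True):
        if offset < minutes_until:
            when = deadline - timedelta(minutes=offset)
            plan.append((when, f"It's {activity} in {offset} minutes"))
    # The final announcement at the deadline itself.
    plan.append((deadline, f"It's {activity} now"))
    return plan


# "Alexa, it's bedtime in 15 minutes", said at 7:45 pm (the date is arbitrary).
plan = plan_reminders("bedtime", datetime(2016, 1, 9, 19, 45), 15)
```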

To further complicate interaction scenarios that are not currently supported, Alexa is inconsistent when it faces a question it can’t answer. Currently, it either:

  • Does not answer at all
  • Answers with “I was not able to answer the question I heard”
  • Does something unexpected, such as setting a second timer instead of postponing the first one

As is to be expected, the volume of these interactions is so large that it will take a while for Amazon to cover them all. It would be nice if the Alexa APIs allowed third parties to handle requests that Alexa does not know what to do with.

As the language model of Alexa expands, the interactions will be more and more natural. Along with that, keeping in mind that less precision can, at times, be more useful will also help. With these and other capabilities, richer experiences will be possible, which will bring us closer to seamless ambient computing.


Author Overview

Martin Larochelle

Martin Larochelle has been with Macadamian since 2005. In his ten years with the company, he has tackled projects both big and small as Chief Architect. An expert in C++ and VoIP, his focus has been on mobile platforms. Martin was instrumental in all things BlackBerry, providing technical leadership and project oversight. Martin now leads the Macadamian Innovation Lab, a team focused on developing concepts to solve the needs of small and medium businesses and key verticals such as healthcare. While we're all a little nuts at Macadamian, Martin counts himself as the biggest HeadBlade fan in Canada.