Four Enhancements We Want for the Amazon Echo

Macadamian Technologies | May 10, 2016 | 10 Min Read

Our annual MacHack took place a few weekends ago, and as always the teams at Macadamian put in 24 straight hours of hard work to bring some amazing ideas to life. This year many of the teams focused on IoT solutions, several of them using the Alexa Voice Service of the Amazon Echo, and in doing so highlighted how interactions with Alexa could be made more natural. Because the teams had only 24 hours and embraced the true spirit of a hackathon, many of these hacks use the Alexa Skills Kit in ways that wouldn’t pass Skill store certification. However, after watching the submission videos from four of the teams, it became clear that the hacks were necessary to create more natural voice interactions and better user experiences. Based on the solutions created during this MacHack, we believe the following four improvements to Alexa would expand the use cases for skills and create a better overall user experience.

1 – Extend the Smart Home Skill API

The Smart Home Skill API allows developers to create skills to control cloud-connected devices. This gives Echo users the ability to control many of the smart devices in their homes via voice interactions. It works with many popular products like the Nest thermostat, the Philips Hue and LIFX light bulbs, and can control many non-connected devices through the use of a smart plug like the WeMo or TP-LINK.

One team thought this would be perfect for their smart child lock and companion Alexa skill, called Smart Drawer. One of the advantages of the Smart Home Skill API is that it removes the need for a Skill-specific invocation phrase, like “ask Smart Drawer to…” Instead, users can talk to Alexa much more naturally, saying something like “Alexa, turn on the kitchen light.” Alexa recognizes the action “turn on” and the device ID “Kitchen Light” and sends this data through to the device cloud, which turns on the light. For our skill, once Alexa interpreted the instruction, the Echo would send a directive to the smart drawer to unlock all of the child locks. This skill could have many other uses beyond toddler control, like keeping jewelry drawers or other sensitive materials secure.
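The action-plus-device-ID flow described above can be sketched as a small dispatcher. This is only an illustration under stated assumptions: the directive shape (`action`, `device_id`), the `DEVICE_STATE` store, and every function name are hypothetical and deliberately simplified. It is not the real Smart Home Skill API payload schema.

```python
from typing import Callable, Dict

# Hypothetical in-memory stand-in for the device cloud: device ID -> powered state.
DEVICE_STATE: Dict[str, bool] = {"kitchen-light": False}


def turn_on(device_id: str) -> str:
    """Mark the device as powered on and return a confirmation string."""
    DEVICE_STATE[device_id] = True
    return f"{device_id} is on"


def turn_off(device_id: str) -> str:
    """Mark the device as powered off and return a confirmation string."""
    DEVICE_STATE[device_id] = False
    return f"{device_id} is off"


# Map each supported action name to the function that carries it out.
ACTIONS: Dict[str, Callable[[str], str]] = {"TurnOn": turn_on, "TurnOff": turn_off}


def handle_directive(directive: dict) -> str:
    """Route an already-parsed directive (assumed shape) to the matching action."""
    action = directive["action"]
    device_id = directive["device_id"]
    if action not in ACTIONS:
        # Anything outside the fixed action list is rejected.
        raise ValueError(f"unsupported action: {action}")
    if device_id not in DEVICE_STATE:
        raise KeyError(f"unknown device: {device_id}")
    return ACTIONS[action](device_id)
```

With this shape, “Alexa, turn on the kitchen light” would reach the handler as something like `{"action": "TurnOn", "device_id": "kitchen-light"}`, while an action that is not in the list raises an error instead of reaching the device.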

The team was able to quickly create a production-ready solution using Macadamian’s base IoT project framework, which also provided them with some handy functionality out of the gate such as Account Linking support. This support saved time by simplifying the configuration flow. The device IDs were handled on our end. This avoided having to give Alexa a unique ID for each drawer as we did when Macadamian created the Fantasy Scoreboard skill.

However, the team ran into some trouble trying to use the Smart Home Skill API. First, the API currently doesn’t allow many types of controls. It is limited to basic On/Off commands and temperature change requests, which ruled out the natural lock/unlock command the team wanted for the Smart Drawer:

Alexa, unlock all of the locks.

Second, the skill adapter is limited to AWS Lambda. This created some friction so the team decided to stick to the API and deployment environment that they knew and already had a code base for. Because of these two restrictions, the skill was implemented using a Custom Interaction Model with the resulting interaction being:

Alexa, ask smart drawer to unlock all the locks.
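For readers unfamiliar with custom skills, here is a minimal sketch of how a web service might answer that request. The request and response envelopes follow the general JSON shape used by custom skills (`IntentRequest`, `outputSpeech`, `shouldEndSession`); the intent name `UnlockAllIntent`, the `LOCKS` dictionary, and the helper functions are assumptions made for illustration, not the team's actual code.

```python
from typing import Dict

# Hypothetical lock registry: lock name -> True when locked.
LOCKS: Dict[str, bool] = {"top drawer": True, "bottom drawer": True}


def build_response(text: str, end_session: bool = True) -> dict:
    """Wrap plain text in a custom-skill style response envelope."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def unlock_all_locks() -> int:
    """Unlock every registered lock and return how many were changed."""
    changed = 0
    for name, locked in LOCKS.items():
        if locked:
            LOCKS[name] = False
            changed += 1
    return changed


def handle_request(event: dict) -> dict:
    """Dispatch an incoming skill request to the matching handler."""
    request = event["request"]
    if request["type"] == "LaunchRequest":
        # Keep the session open so the user can say what they want next.
        return build_response("What would you like to do?", end_session=False)
    if request["type"] == "IntentRequest":
        # "UnlockAllIntent" is an assumed name for the intent behind
        # "unlock all the locks".
        if request["intent"]["name"] == "UnlockAllIntent":
            count = unlock_all_locks()
            return build_response(f"Unlocked {count} locks.")
    return build_response("Sorry, I can't do that.")
```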

We would love to see the API extended in the future to allow for different interactions. As more connected products that require additional directives come to market, we’re sure Amazon will extend the functionality of the API to allow users to control these devices as well, and we hope that Amazon will one day allow these skills to be implemented using custom web services.

2 – Keep a Skill open

In our opinion, one of the limitations of Alexa and the Echo is that once a conversation ends, the user needs to say the invocation name again to send additional commands to any third-party skill. One of the teams in the hackathon really tried to push the limits of Alexa by pairing it with the Oculus Rift to help play a VR game. This hack would violate the API terms of use, so it could not be published, but it could become viable if Amazon allowed a skill to be kept open for future interactions.

The team managed to control the game by creating five individual skills, each with a command-specific invocation name. This is clear from the first interaction:

Echo, show me the map

For this interaction, they created a skill with the invocation phrase “show me the map”. It was quite surprising that this worked without a prefix like “ask”, “tell”, “open”, or “launch”. We assumed that the skill would not work without the prefix that forms part of the supported conversation phrases, but it did… most of the time.

Using that hack, the conversation goes on with a natural flow from the first answer from Alexa:

Alexa: “Showing map”
Shane: “Echo, analyze the threat”
Alexa: “This is a…. there must be a power supply somewhere”
Shane: “Echo, hide the map”
Shane: “Echo, analyze the panel”
Shane: “Echo, show me the map”

As this concept shows, it would be useful in some cases if the user could start a third-party Skill and then let Alexa keep that Skill as the default entry point even after the conversation is closed. In that context, Alexa would send each new “Alexa/Echo” triggered command to that Skill without the user having to specify the invocation name again.

Our third concept could also benefit from keeping the skill open. This team developed a skill that has Alexa create a shopping list using Clear and then prompts the user with a notification when an item is near. In the demo video, they ask Alexa for advice or try adding items to the list one by one.

If this skill were to be published to the Skill store, one could imagine using Alexa to create your shopping list before you go to the grocery store. Looking through your fridge and pantry, you could ask Alexa to add items one by one as you realize what’s missing from your list.

Imagine looking through your fridge and realizing you’re running low on apples…

Alexa, ask ShoppingBuddy to add Apples to the list.

Then, as you look through your cupboards, you realize you need more snacks…

Alexa, ask ShoppingBuddy to add chips and pretzels to the list.

This method of interaction can become tiresome and cause unnecessary friction for users as they work their way through the kitchen looking for additional items. If Alexa could keep the skill open, users could add items one by one without having to use the invocation name each time.

Alexa, ask ShoppingBuddy to add Apples to the list.
Alexa, add chips and pretzels to the list.
Alexa, add dish soap to the list.
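The interaction above can be sketched as a small router that remembers the last skill invoked by name and forwards later commands to it. This is a thought experiment about the proposed behavior, not an existing Alexa feature; the `StickySkillRouter` class, its methods, and the utterance pattern are all hypothetical, and the wake word is assumed to have been stripped already.

```python
import re
from typing import Optional, Tuple

# Matches "ask <skill> to <command>"; the wake word is assumed to be removed.
ASK_PATTERN = re.compile(r"^ask (?P<skill>\w+) to (?P<command>.+)$", re.IGNORECASE)


class StickySkillRouter:
    """Remember the most recently invoked skill and reuse it for follow-ups."""

    def __init__(self) -> None:
        self.active_skill: Optional[str] = None

    def route(self, utterance: str) -> Tuple[Optional[str], str]:
        """Return (skill, command); skill is None when no skill is open."""
        text = utterance.strip(" .")
        match = ASK_PATTERN.match(text)
        if match:
            # An explicit invocation opens the skill, or switches to a new one.
            self.active_skill = match.group("skill")
            return self.active_skill, match.group("command")
        # No invocation name: fall back to whichever skill is still open.
        return self.active_skill, text

    def close(self) -> None:
        """Forget the open skill, for example after the user says 'stop'."""
        self.active_skill = None
```

One design question such a feature would raise is when the open skill should be closed. The sketch leaves that to an explicit `close()` call, but a timeout or a spoken “stop” would be equally plausible choices.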

This demo, with its companion app, is the one that comes closest to something we could publish on the Skill store, and it still highlights how interactions with Alexa could be more natural.

To make the concept more interesting, the team hardcoded some responses to give the impression that Alexa is more context-aware than she actually is. Rest assured, Alexa is not spying on you. She doesn’t know how much money you have in your bank account, nor how much you’ve been drinking.

3 – Skill Specific Voice

The VR game concept ends with:

Shane: “Echo, use voice modulator”
Alexa: “For who?”
Shane: “For Mike Smith”
Alexa: “Recording…”
Shane: “Power shutdown, override code: zero cool”
Alexa: “Processing…” “Power shutdown, override code: zero cool”

The Echo says the second part in the voice of Mike Smith instead of Alexa’s. To pull that off, the team used an MP3 generated by YAKiToMe!, followed by a pause, then had Alexa say “turret is disabled” in her own voice. For the purpose of the hackathon, the team used a hardcoded MP3 file, as some of the Skills on the public store do now. The demo shows that, conceptually, an MP3 could be generated in real time with a custom voice and then sent to Alexa, but responsiveness would suffer. For example, a 20-second delay is expected with the free API of YAKiToMe!, making this option a non-starter.

Currently, Skills are limited to the Alexa voice or to using SSML to play an MP3, with pronunciation controlled using phonemes. Amazon’s view is that a consistent voice for Alexa is less confusing to users. While that is a valid point for the general case, it would be nice if Skills could use a customized voice to provide more personality. Twilio is a good example of how this could be implemented. Twilio responses use the TwiML format, which is similar to SSML, and they provide additional functionality in that the developer can select between three voices and several languages. This approach would still leave Amazon with plenty of control over what is possible while giving more flexibility to Skill designers.
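The “MP3, pause, then Alexa’s own voice” sequence can be expressed as a single SSML response. The sketch below is a rough illustration: the `<audio>` and `<break>` tags and the `SSML` output speech type are standard, but the function name, the placeholder URL, and the one-second pause are assumptions, since the post does not say how long the pause was or where the file was hosted.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml_response(audio_url: str, follow_up_text: str, pause: str = "1s") -> dict:
    """Play a pre-recorded clip, pause, then speak text in the default voice."""
    ssml = (
        "<speak>"
        f"<audio src={quoteattr(audio_url)}/>"  # pre-recorded custom-voice clip
        f'<break time="{pause}"/>'              # pause length is an assumption
        f"{escape(follow_up_text)}"             # spoken in the Alexa voice
        "</speak>"
    )
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "SSML", "ssml": ssml},
            "shouldEndSession": True,
        },
    }


# Placeholder URL; a real skill would point at its own hosted MP3.
response = build_ssml_response("https://example.com/mike-smith.mp3", "turret is disabled")
```

Because the clip is referenced by URL, it has to exist before the response is sent, which is why generating the audio on the fly would add the delay mentioned above.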

4 – Voice Push Notifications

Currently, in order to interact with Alexa, the user needs to start the conversation. The winning team at this year’s MacHack created a hack that allows Alexa to start the conversation in their smart hospital registration system.

The motion detection functionality was something quite impressive. The interaction starts with Alexa welcoming the user when he approaches the desk.

Hello. For starting an assistant, say launch smart assist.

Then the user starts the conversation with Alexa using the typical sentence structure:

Alexa, launch smart assist.

This lets the user enter an ask-and-response flow while the skill keeps the conversation open.

The thing that puzzled me about this initially was that Alexa doesn’t support push notifications, so when I first saw the demo I thought they possibly said something like

Alexa, ask smart assist for help

then clipped the videos to remove that part. But reviewing the video closely, I noticed that the blue ring of the Echo was not on, so while it is the Alexa voice we hear in the video, it couldn’t be Alexa that was actually talking.

To achieve this effect, the team played an MP3 through the Echo, using it as a Bluetooth speaker. How did it work? The motion detection sensor was connected to an Android device, which was paired with the Echo via Bluetooth. When the motion sensor sends a signal to the Android device, the device plays a pre-recorded MP3 of Alexa’s voice over the Echo.
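The trigger logic behind that trick is simple enough to sketch. The team’s version ran on an Android device; the Python below only illustrates the idea and is not their code. The `MotionGreeter` class, the `play_clip` callback, the file name, and the cooldown (added so that one person lingering at the desk is not greeted repeatedly) are all assumptions.

```python
import time
from typing import Callable, Optional


class MotionGreeter:
    """Play a pre-recorded greeting when motion is detected, with a cooldown."""

    def __init__(
        self,
        play_clip: Callable[[str], None],
        clip_path: str,
        cooldown_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._play_clip = play_clip    # e.g. sends audio to a paired speaker
        self._clip_path = clip_path
        self._cooldown_s = cooldown_s  # assumed value, not from the demo
        self._clock = clock
        self._last_played: Optional[float] = None

    def on_motion(self) -> bool:
        """Handle one motion event; return True if the clip was played."""
        now = self._clock()
        if self._last_played is not None and now - self._last_played < self._cooldown_s:
            return False  # still inside the cooldown window, stay quiet
        self._play_clip(self._clip_path)
        self._last_played = now
        return True
```

In use, `play_clip` would be whatever function plays a file through the paired speaker, and the motion sensor’s callback would simply call `on_motion()`.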

This is quite a clever solution, although it would be hard to deploy as part of a public Skill. It would also be nice not to require the user to say “Alexa, launch smart assist”.

I highlighted in a previous post that some things need to be considered before Amazon implements voice push notifications. However, the smart assistant demo shows a smart kiosk usage scenario where it could be useful to have Alexa start the conversation. Perhaps a restricted API needing more certification oversight and permission from the user would allow Amazon to productize usage scenarios like this.

This year’s MacHack produced some amazing concepts in a short 24-hour period. In the process, the teams identified new use cases for Alexa, pushing the limits of the device. After seeing what the Macadamian teams were able to come up with, we believe these four enhancements to Alexa could help spawn entirely new use cases and create more natural interactions with the Echo.
