Get Started with Voice-First Experience

For a couple of years beginning in 2008, I told anyone who would listen about Google’s phone directory assistance service, GOOG-411. It was free and convenient, and I didn't think people should keep paying $1.75 per call for standard 411.

Then I read an article revealing its true purpose – to collect speech samples from a vast variety of voices to tune Google's speech recognizers. If you ask me, it is a shining example of a brilliant, creative win-win solution – it solved a user need and provided the business with tremendous value. As Google later stated on its blog when it discontinued the service:

“GOOG-411 was the first speech recognition service from Google and helped provide a foundation for more ambitious services now available on smartphones, such as:

  • Voice Search – search Google by speaking instead of typing.
  • Voice Input – fill in any text field on Android by speaking instead of typing.
  • Voice Actions – control your Android phone with voice commands. For example, you can call any business quickly and easily just by saying its name.”

Speech recognition has come a very long way since then. Most dramatically, smart speakers have risen seemingly out of nowhere and are now in tens of millions of homes.

Have you done any planning around voice-first experiences for your organization? If you haven’t given it much thought, it’s hard to blame you – this has all happened so quickly. But now may be the time to start thinking beyond the screen.

What opportunities might there be for your organization? Are there quick wins that could give you a foothold in this market? Can you identify competitive advantages?

In this article, I will review some background about voice-first technology and provide a few examples of how other organizations are putting it to use for their brands and customers.

Explosion of Speech Data

Only a decade ago, speech recognition capabilities were pretty basic. At that time, it was considered a breakthrough when GOOG-411 was able to make the leap from separate questions (i.e., “What city and state?” and “What business name or category?”) to a single, more natural question (i.e., “What city? What listing?”).

So, how did speech recognition come so far, so fast? The answer has a few wrinkles.

First, the gathering of speech samples never stopped. In fact, the volume of voice data continues to grow with every voice search made by millions of people every day. As more people use voice, bigger and better models can be built.

Another factor is large server farm processing speeds, which had lagged until the introduction of high-performance computing with GPU (graphics processing unit) acceleration around 2012. This has allowed companies to build better, bigger models more affordably. Processing speeds are predicted to increase by 1.5 times annually for years to come – a mind-boggling amount of power.

The final factor is confidence. As speech recognition gets more and more accurate, more people will turn to it – especially if it’s faster than typing. And, in turn, the model can improve more rapidly. A cycle of constant improvement is underway with nothing likely to slow it down.

Types of Voice-First Technologies

Mobile Voice Technology

Many people first became aware of voice-first experiences when Apple acquired Siri, Inc. and added the Siri virtual assistant to the iPhone in 2011. Siri was the first mainstream assistant to take natural language requests and respond with a real answer.

It makes sense that the first ubiquitous application of voice-first was on mobile phones. Typing into tiny keyboards is inconvenient and error prone. And, if you’re driving, it can be more dangerous than speaking.

Soon, Google (Google Assistant), Samsung (Bixby), Microsoft (Cortana) and others introduced virtual assistants on their smartphones. Speech-to-text also became common within smartphone apps and keyboards.

Smart Speakers

In 2015, Amazon officially launched its Echo with Alexa, along with the Alexa Skills Kit – which allows developers to build “Skills” for Alexa in the cloud. The easy, inexpensive path to creating Skills has allowed the Alexa ecosystem to grow very quickly.

The closest smart speaker competitor to Amazon is Google Home, which launched in late 2016. Amazon has a commanding lead in users and many advantages due to its head start. However, Google is forging partnerships to open other avenues, such as integration with Bose headphones.

The smart speaker market could change quickly, so we are watching it closely.

Other Voice Technologies

Beyond smart home assistants, there is momentum with speech recognition and artificial intelligence in all kinds of applications. Mozilla is working on a voice-controlled web browser. Hearing aids are using AI to augment their experience, such as streaming a teacher's lecture directly into the hearing aid or even live language translation. Cars, stores, smoke detectors, refrigerators… you get the picture.

Examples of Voice-First Solutions

Here at Crux Collaborative, as part of our Labs series, we conducted our own research on the general usability of Amazon's Echo. I encourage you to check out our insights from that study, which include examples of the voice-first solutions and skills we tested.

Taking a look at example solutions in this space can really help jumpstart your creativity and reveal a sort of template for this new way of thinking about user experience. Below are a few examples of organizations that used voice to solve user needs.

User Need: Greater Understanding

Cigna took note that 20% of people in a 2017 survey did not understand the word “premium” and a whopping 66% didn’t understand “formulary.” This is a problem Cigna is uniquely positioned to solve, so it created an Alexa skill to answer 150 commonly asked health care questions.

User Need: Better Elderly Care

Libertana Home Health Care created an Alexa skill that gives elderly home care patients a voice-enabled way to get in touch with home health aides. Patients can be reminded about medications and easily track them with voice commands. Adult caregivers can review status online, which allows them to focus more on socializing during visits instead of conversations that always revolve around medical care.

User Need: Diabetes Accountability Partner

Thinking creatively and non-traditionally can help you arrive at powerful voice-first experiences. Minneapolis-based Worrell ran a study of diabetes patients last year using Amazon’s Echo Dot. They learned that a voice assistant can provide accountability (e.g., “What is one small goal you’d like to set for yourself today?”) and virtual companionship (e.g., asking users to share their experiences or feelings). One study participant was quoted: “If the day was getting late and I hadn’t achieved my goals, I would have to get my exercise in or answer to Alexa.”

Taking the First Steps

Voice-first experiences have a clear runway to becoming as common as televisions. Most Americans (77%) already interact with a voice-ready device every day – their smartphone. And Gartner estimates that 75% of households will own a smart speaker by 2020.

Development Challenges

With the largest user base and its extremely affordable Echo Dot, Amazon figures to hold the lead in the smart speaker market, at least for now. So, at this point, it seems the obvious choice for getting started.

The technical challenges in creating an Alexa skill are not trivial, but Amazon has gone to great lengths to flatten the learning curve and provide developers with all the tools they need – and the tools are free to use.

Much of the "development" work is actually centered around forming a strategy, formulating the user experience and then fastidiously mapping the language of the voice-driven structure.

Amazon calls these user goals “intents,” and you will likely need a team to define them successfully. Here's an example: a music-playing app cannot rely solely on the user saying the exact word “pause” to pause the music. What if the user says “stop” or “wait” or even “shut up”? All such utterances need to be mapped to the act of pausing the music. Mapping intents is a big part of the challenge and is likely an ongoing effort. User research will help uncover problems with your intents and is always strongly encouraged.
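To make the idea concrete, here is a minimal sketch in Python. It is illustrative only – real Alexa skills declare intents and sample utterances in an interaction model rather than in handler code, and all names below are hypothetical – but it shows the core pattern: many spoken phrases resolving to one intent.

```python
# Hypothetical utterance-to-intent mapping (not the actual Alexa Skills Kit API).
# Several different spoken phrases all resolve to a single "pause" intent.
PAUSE_UTTERANCES = {"pause", "stop", "wait", "shut up", "hold on"}

def resolve_intent(utterance):
    """Return the intent name for a raw utterance, or 'UnknownIntent' if unmapped."""
    text = utterance.strip().lower()
    if text in PAUSE_UTTERANCES:
        return "PauseIntent"
    return "UnknownIntent"

resolve_intent("Stop")   # resolves to "PauseIntent"
resolve_intent("play")   # resolves to "UnknownIntent"
```

The ongoing effort mentioned above lives in that utterance set: every round of user research tends to surface new phrasings that need to be added to the mapping.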

Not sure how to get started? Would you like a partner as you work through a voice-first user experience? We have the core skills, industry knowledge, and collaborative spirit to transform the process – and we’ve designed experiences for a wide range of applications. Get in touch with us to start a conversation.

