Voice AI is part of multi-modal design
Oct 27, 2020
8 MIN READ

Why Voice is an Important Part of Our Multi-Modal Future

In a recent conversation with a voice expert, I found myself saying, “Voice-first doesn’t mean voice-only,” and was surprised to find it resonated with her. It’s becoming increasingly clear that the future is multi-modal, and voice alone will not be the answer. Some brands struggle with the concept of voice-first for all user experiences because they assume it signals a commitment to voice-only. But that’s simply not true. The strength of a voice-first strategy lies in offering users more options so they can choose the mode of interaction that delivers the best experience in that time, location, and context.

Many companies with mobile-optimized websites, mobile apps, or customer care centers that already deliver great service wonder why they should prioritize voice user interfaces (VUIs). The answer is that they should be enhancing all of their existing channels with voice, not replacing other modes of interaction. Like any other UX, voice experiences exist to enhance the overall experience by offering greater convenience, functionality, and hands-free accessibility.

Voice AI as part of multi-modal experiences

In many use cases, the voice interface can be used to supplement other modes of interaction including visual, gesture, and touch, or to create greater efficiencies in customer service and sales. Voice assistants can also help democratize product or app use by providing more accessibility to older adults, people with visual or tactile limitations, and children.

Considering that there are over 110 million virtual assistant users in the United States, 2 in 5 adults use voice search once daily, and 64% of consumers use voice commands while driving, it’s hard to imagine a brand that’s not thinking about a voice solution.

Instead of talking about the benefits of voice alone, brands should be talking about how voice can be a component to extend their product functionality and enhance customer experiences when used in combination with other modes of interaction, including:

  • Voice and touchscreen
  • Voice and glance/gesture
  • Voice and proximity/location
  • Voice and icons/sounds


Voice AI and touchscreens

In some cases, a wake word or wake phrase is used to start a conversation with a voice assistant. While these types of interfaces have their benefits for brand affiliation and recognition, there may be applications where customers would prefer to tap a button to begin the voice interaction. 

For instance, in the healthcare field, medical professionals may want to ensure that a wake phrase will not activate sensitive machinery, but still want the convenience and safety of directing its operation through voice. Other applications include any context where the person needs to be within arm’s length of the product or device and touch is the most convenient and reliable way to begin a voice-enabled interaction.

Touch and voice could be used in combination for shopping apps where customers can sort through filters quickly with their voice and then tap to choose the item they want displayed on the screen. Sometimes a picture is worth more than a thousand words, and it’s faster and more efficient to show a result than to describe it. In those cases, the results of a voice query can be displayed on a screen without forcing the user to perform a series of touch, type, and tap motions to get to the result.
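The voice-filter-then-tap pattern above can be sketched in a few lines. This is a hypothetical illustration, not a real shopping or speech API: the catalog, the keyword vocabulary, and the naive query parser are all invented for the example.

```python
# Hypothetical sketch: a spoken query narrows the catalog, then the user
# taps one of the displayed results. All data and names are illustrative.

CATALOG = [
    {"name": "Trail Runner", "category": "shoes", "color": "blue", "price": 80},
    {"name": "City Sneaker", "category": "shoes", "color": "red",  "price": 60},
    {"name": "Rain Jacket",  "category": "coats", "color": "blue", "price": 120},
]

# Tiny assumed vocabulary the "voice" layer understands.
VOCAB = {"shoes", "coats", "blue", "red"}

def voice_filter(query: str):
    """Keep items matching every recognized keyword in the spoken query."""
    words = [w for w in query.lower().split() if w in VOCAB]
    return [item for item in CATALOG
            if all(w in (item["category"], item["color"]) for w in words)]

# Voice narrows the list; touch picks the exact item shown on screen.
results = voice_filter("show me blue shoes")
chosen = results[0]            # user taps the first result card
print(chosen["name"])          # Trail Runner
```

The point of the sketch is the division of labor: voice handles the combinatorial filtering (fast to say, slow to tap through), while the screen and a single tap handle final selection, which is faster to show than to describe.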


Other use cases may include seeing a map displayed on a screen during navigation either in the car or on a mobile device. Adding voice to a gaming app or video enhances a screen experience by adding functionality and gives users the ability to talk to the characters directly. When a screen is available, it makes sense to enhance the user experience with a combination of sights and sounds.

Voice assistants and glance detection

Glance detection and voice are natural partners in hands-free applications. In the car, a driver can ask, “What’s that building?” just by looking in the direction of the object while concentrating on driving. Finding landmarks and identifying locations will become part of the navigation experience and allow drivers to find their way, even when road signs are not visible due to weather or darkness.

Voice and glance work well in the mobile app world as well. Voice-enabled apps will provide much of the same functionality as those in the car when the user is on foot or on public transportation. Sightseeing in a new city or even finding your way in a familiar place can be enhanced by a combination of voice and glance.
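The “What’s that building?” scenario above amounts to fusing the user’s gaze direction with nearby points of interest. Here is a simplified, hypothetical sketch of that fusion; the POI list, bearing values, and tolerance are invented for illustration and real systems would use proper geospatial math.

```python
# Hypothetical sketch: resolve a glance-plus-voice query by picking the
# landmark whose compass bearing is closest to where the user is looking.

# (name, bearing in degrees from the user's position) -- illustrative data.
POIS = [("Museum", 40.0), ("Stadium", 95.0), ("Library", 200.0)]

def angular_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two bearings, in degrees."""
    return abs((a - b + 180) % 360 - 180)

def resolve_glance(gaze_bearing: float, tolerance: float = 15.0):
    """Return the POI the user is most plausibly looking at, or None."""
    best = min(POIS, key=lambda poi: angular_diff(poi[1], gaze_bearing))
    return best[0] if angular_diff(best[1], gaze_bearing) <= tolerance else None

# Driver looks roughly east (92 degrees) and asks "What's that building?"
print(resolve_glance(92.0))    # Stadium
print(resolve_glance(300.0))   # None -- nothing within tolerance
```

The tolerance check matters: if nothing lies close to the gaze direction, the assistant should ask a follow-up question rather than guess.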

Voice experiences with proximity detection

In the smart home, glance and proximity detection can help individual devices wake up to user commands, giving household appliances the ease and convenience of a voice user interface without depending on a central hub or third-party smart speaker.

Voice and proximity detection are particularly helpful in the smart home where users often interact with appliances and devices while their hands are occupied with either working or carrying objects. Instructing the washing machine to begin while loading the laundry is more convenient when the device detects the presence of the user and can wake up to accept commands without the push of a button.
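The washing machine example can be sketched as a simple proximity-gated microphone. This is a minimal, hypothetical illustration, assuming an invented presence sensor API and a made-up “within reach” threshold, not any real appliance interface.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: an appliance that only accepts voice commands while
# a presence sensor reports someone nearby. All names are illustrative.

@dataclass
class ProximityMic:
    presence_threshold_cm: int = 150   # assumed "within reach" distance
    listening: bool = field(default=False)

    def on_presence(self, distance_cm: int) -> None:
        # Wake the local voice interface only when a user is close enough;
        # otherwise keep the microphone off to avoid false activations.
        self.listening = distance_cm <= self.presence_threshold_cm

    def handle_utterance(self, text: str) -> str:
        if not self.listening:
            return "ignored"           # nobody nearby: treat audio as noise
        if "start" in text.lower():
            return "cycle-started"     # e.g., "start the wash"
        return "unrecognized"

mic = ProximityMic()
mic.on_presence(distance_cm=90)        # user steps up, arms full of laundry
print(mic.handle_utterance("Start the wash"))   # cycle-started
mic.on_presence(distance_cm=400)       # user walks away
print(mic.handle_utterance("Start the wash"))   # ignored
```

Gating the microphone on proximity is what removes the button press: the user never has to free a hand, yet the device is not listening to the whole room all day.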


Other applications in the smart home may include interacting with kitchen appliances while cooking or adjusting lighting as users walk from one room to the next.

Voice experiences enhanced with icons and earcons

Sometimes a wake phrase is the best choice to start conversations with a voice-enabled device. Other times, a button with a microphone icon can be used to show users that the voice interaction is starting. In applications like mobile apps or in-car experiences, brands may decide to provide both modes of waking up the digital assistant.

Even when a wake phrase is used, indicating that the voice assistant is listening, whether through an icon such as a moving waveform, a prompt like “speak now” or “I’m listening,” or an earcon such as a beep, gives the user a clear signal of when to start speaking.
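The two activation paths and the listening cues described above can be sketched as a tiny state machine. This is a hypothetical illustration: the wake phrase, cue names, and class structure are all invented for the example, not any real assistant SDK.

```python
# Hypothetical sketch: an assistant that wakes either on a spoken wake
# phrase or a tap on a microphone button, then plays/shows listening cues.

class AssistantUI:
    WAKE_PHRASE = "hey assistant"      # assumed brand wake phrase

    def __init__(self) -> None:
        self.state = "idle"
        self.cues: list[str] = []      # record of feedback shown/played

    def _start_listening(self) -> None:
        self.state = "listening"
        self.cues.append("earcon:beep")      # audible cue: safe to speak now
        self.cues.append("icon:waveform")    # visual cue when a screen exists

    def on_audio(self, transcript: str) -> None:
        # Path 1: hands-free activation via wake phrase.
        if self.state == "idle" and transcript.lower().startswith(self.WAKE_PHRASE):
            self._start_listening()

    def on_mic_button(self) -> None:
        # Path 2: explicit tap, e.g. near machinery where a spoken
        # wake phrase could be risky.
        if self.state == "idle":
            self._start_listening()

ui = AssistantUI()
ui.on_mic_button()
print(ui.state)   # listening
print(ui.cues)    # ['earcon:beep', 'icon:waveform']
```

Whichever path wakes the assistant, both cues fire from the same place, so the user always gets the same unambiguous “start speaking now” signal.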

When a screen is part of the experience, a listening screen gives users confidence that the voice assistant is understanding the query accurately. In these cases, the screen can simply display a waveform or a transcription of the user query. 

Once the query is complete, the voice assistant can deliver results in two ways at once: a spoken response, enhanced by a more detailed response displayed on the screen, such as a list of restaurants or a map with directions.


When evaluating a voice assistant for your product, service, or app, consider how a natural language voice experience can enhance an experience where other modes of interaction are also present. Even in applications where no screen is present, providing earcons or other indicators that the voice assistant is active reduces confusion about whether the assistant is actively listening or not.

Voice-first isn’t voice-only. It’s simply an easier, more convenient, more natural, and sometimes safer way to interact with a product, service, or app.

Although voice does not need to be the only method of interaction (nor should it be), voice assistants will soon become a primary user interface in a world where people will never casually touch shared surfaces again. The voice era is here, and brands without a voice strategy risk losing brand affinity to those that have one.

The Houndify Voice AI platform is helping brands across industries and geographical boundaries build custom voice experiences and conversational voice assistants.

If you’re interested in exploring the Houndify voice AI platform further, register for a free account or contact us to find out how we can help bring your voice AI strategy to life.

Karen Scates is a storyteller with a passion for helping others through content. Argentine tango, good books and great wine round out Karen’s interests.

Interested in Learning More?

Subscribe today to stay informed and get regular updates from SoundHound Inc.
