Voice AI is part of multi-modal design
Oct 27, 2020
8 MIN READ

Why Voice is an Important Part of Our Multi-Modal Future

In a recent conversation with a voice expert, I found myself saying, “Voice-first doesn’t mean voice-only,” and was surprised to find it resonated with her. It’s becoming increasingly clear that the future is multi-modal, and voice alone will not be the answer. Some brands struggle with the concept of voice-first for all user experiences because they assume it signals a commitment to voice-only. But that’s simply not true. The strength of a voice-first strategy lies in offering users more options so they can choose the mode of interaction that delivers the best experience in that time, location, and context.

Many companies with mobile-optimized websites, mobile apps, or customer care centers that already deliver great service wonder why they should prioritize voice user interfaces (VUIs). The answer is that they should be enhancing all of their existing channels with voice, not replacing other modes of interaction. Like any other UX, voice experiences exist to enhance the overall experience by offering greater convenience, functionality, and hands-free accessibility.

Voice AI as part of multi-modal experiences

In many use cases, the voice interface can be used to supplement other modes of interaction including visual, gesture, and touch, or to create greater efficiencies in customer service and sales. Voice assistants can also help democratize product or app use by providing more accessibility to older adults, people with visual or tactile limitations, and children.

Considering that there are over 110 million virtual assistant users in the United States, 2 in 5 adults use voice search once daily, and 64% of consumers use voice commands while driving, it’s hard to imagine a brand that’s not thinking about a voice solution.

Instead of talking about the benefits of voice alone, brands should be talking about how voice can be a component to extend their product functionality and enhance customer experiences when used in combination with other modes of interaction, including:

  • Voice and touchscreen
  • Voice and glance/gesture
  • Voice and proximity/location
  • Voice and icons/sounds


Voice AI and touchscreens

In some cases, a wake word or wake phrase is used to start a conversation with a voice assistant. While these types of interfaces have their benefits for brand affiliation and recognition, there may be applications where customers would prefer to tap a button to begin the voice interaction. 

For instance, in the healthcare field, medical professionals may want to ensure that a wake phrase will not activate sensitive machinery, but still want the convenience and safety of directing its operation through voice. Other applications include any context where the person needs to be within arm’s length of the product or device and touch is the most convenient and reliable way to begin a voice-enabled interaction.

Touch and voice could be used in combination for shopping apps where customers can sort through filters quickly with their voice and then tap to choose the item they want displayed on the screen. Sometimes a picture is worth more than a thousand words, and it’s faster and more efficient to show a result than to describe it. In those cases, the results of a voice query can be displayed on a screen without forcing the user to perform a series of touch, type, and tap motions to get to the result.
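The voice-filter-then-tap pattern above can be sketched in a few lines. This is a hypothetical illustration, not a real shopping or speech API: the catalog, the keyword vocabulary, and the naive query parser are all invented for the example.

```python
# Hypothetical sketch: a spoken query narrows the catalog, then the user
# taps one of the displayed results. All data and names are illustrative.

CATALOG = [
    {"name": "Trail Runner", "category": "shoes", "color": "blue", "price": 80},
    {"name": "City Sneaker", "category": "shoes", "color": "red",  "price": 60},
    {"name": "Rain Jacket",  "category": "coats", "color": "blue", "price": 120},
]

# Tiny assumed vocabulary the "voice" layer understands.
VOCAB = {"shoes", "coats", "blue", "red"}

def voice_filter(query: str):
    """Keep items matching every recognized keyword in the spoken query."""
    words = [w for w in query.lower().split() if w in VOCAB]
    return [item for item in CATALOG
            if all(w in (item["category"], item["color"]) for w in words)]

# Voice narrows the list; touch picks the exact item shown on screen.
results = voice_filter("show me blue shoes")
chosen = results[0]            # user taps the first result card
print(chosen["name"])          # Trail Runner
```

The point of the sketch is the division of labor: voice handles the combinatorial filtering (fast to say, slow to tap through), while the screen and a single tap handle final selection, which is faster to show than to describe.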


Other use cases may include seeing a map displayed on a screen during navigation either in the car or on a mobile device. Adding voice to a gaming app or video enhances a screen experience by adding functionality and gives users the ability to talk to the characters directly. When a screen is available, it makes sense to enhance the user experience with a combination of sights and sounds.

Voice assistants and glance detection

Glance detection and voice are natural partners in hands-free applications. In the car, a driver can ask, “What’s that building?” just by looking in the direction of the object while concentrating on driving. Finding landmarks and identifying locations will become part of the navigation experience and allow drivers to find their way, even when road signs are not visible due to weather or darkness.

Voice and glance work well in the mobile app world as well. Voice-enabled apps will provide much of the same functionality as those in the car when the user is on foot or on public transportation. Sightseeing in a new city or even finding your way in a familiar place can be enhanced by a combination of voice and glance.
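The “What’s that building?” scenario above amounts to fusing the user’s gaze direction with nearby points of interest. Here is a simplified, hypothetical sketch of that fusion; the POI list, bearing values, and tolerance are invented for illustration and real systems would use proper geospatial math.

```python
# Hypothetical sketch: resolve a glance-plus-voice query by picking the
# landmark whose compass bearing is closest to where the user is looking.

# (name, bearing in degrees from the user's position) -- illustrative data.
POIS = [("Museum", 40.0), ("Stadium", 95.0), ("Library", 200.0)]

def angular_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two bearings, in degrees."""
    return abs((a - b + 180) % 360 - 180)

def resolve_glance(gaze_bearing: float, tolerance: float = 15.0):
    """Return the POI the user is most plausibly looking at, or None."""
    best = min(POIS, key=lambda poi: angular_diff(poi[1], gaze_bearing))
    return best[0] if angular_diff(best[1], gaze_bearing) <= tolerance else None

# Driver looks roughly east (92 degrees) and asks "What's that building?"
print(resolve_glance(92.0))    # Stadium
print(resolve_glance(300.0))   # None -- nothing within tolerance
```

The tolerance check matters: if nothing lies close to the gaze direction, the assistant should ask a follow-up question rather than guess.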

Voice experiences with proximity detection

In the smart home, glance and proximity detection can help individual devices wake up to user commands, giving household appliances the ease and convenience of a voice user interface without depending on a central hub or third-party smart speaker.

Voice and proximity detection are particularly helpful in the smart home where users often interact with appliances and devices while their hands are occupied with either working or carrying objects. Instructing the washing machine to begin while loading the laundry is more convenient when the device detects the presence of the user and can wake up to accept commands without the push of a button.
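The washing machine example can be sketched as a simple proximity-gated microphone. This is a minimal, hypothetical illustration, assuming an invented presence sensor API and a made-up “within reach” threshold, not any real appliance interface.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: an appliance that only accepts voice commands while
# a presence sensor reports someone nearby. All names are illustrative.

@dataclass
class ProximityMic:
    presence_threshold_cm: int = 150   # assumed "within reach" distance
    listening: bool = field(default=False)

    def on_presence(self, distance_cm: int) -> None:
        # Wake the local voice interface only when a user is close enough;
        # otherwise keep the microphone off to avoid false activations.
        self.listening = distance_cm <= self.presence_threshold_cm

    def handle_utterance(self, text: str) -> str:
        if not self.listening:
            return "ignored"           # nobody nearby: treat audio as noise
        if "start" in text.lower():
            return "cycle-started"     # e.g., "start the wash"
        return "unrecognized"

mic = ProximityMic()
mic.on_presence(distance_cm=90)        # user steps up, arms full of laundry
print(mic.handle_utterance("Start the wash"))   # cycle-started
mic.on_presence(distance_cm=400)       # user walks away
print(mic.handle_utterance("Start the wash"))   # ignored
```

Gating the microphone on proximity is what removes the button press: the user never has to free a hand, yet the device is not listening to the whole room all day.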


Other applications in the smart home may include interacting with kitchen appliances while cooking or adjusting lighting as users walk from one room to the next.

Voice experiences enhanced with icons and earcons

Sometimes a wake phrase is the best choice to start conversations with a voice-enabled device. Other times, a button with a microphone icon can be used to show users that the voice interaction is starting. In applications like mobile apps or in-car experiences, brands may decide to provide both modes of waking up the digital assistant.

Even when a wake phrase is used, indicating that the voice assistant is listening, whether through an icon such as a moving waveform, a prompt like “speak now” or “I’m listening,” or an earcon such as a beep, gives the user a clear signal of when to start speaking.
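The two activation paths and the listening cues described above can be sketched as a tiny state machine. This is a hypothetical illustration: the wake phrase, cue names, and class structure are all invented for the example, not any real assistant SDK.

```python
# Hypothetical sketch: an assistant that wakes either on a spoken wake
# phrase or a tap on a microphone button, then plays/shows listening cues.

class AssistantUI:
    WAKE_PHRASE = "hey assistant"      # assumed brand wake phrase

    def __init__(self) -> None:
        self.state = "idle"
        self.cues: list[str] = []      # record of feedback shown/played

    def _start_listening(self) -> None:
        self.state = "listening"
        self.cues.append("earcon:beep")      # audible cue: safe to speak now
        self.cues.append("icon:waveform")    # visual cue when a screen exists

    def on_audio(self, transcript: str) -> None:
        # Path 1: hands-free activation via wake phrase.
        if self.state == "idle" and transcript.lower().startswith(self.WAKE_PHRASE):
            self._start_listening()

    def on_mic_button(self) -> None:
        # Path 2: explicit tap, e.g. near machinery where a spoken
        # wake phrase could be risky.
        if self.state == "idle":
            self._start_listening()

ui = AssistantUI()
ui.on_mic_button()
print(ui.state)   # listening
print(ui.cues)    # ['earcon:beep', 'icon:waveform']
```

Whichever path wakes the assistant, both cues fire from the same place, so the user always gets the same unambiguous “start speaking now” signal.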

When a screen is part of the experience, a listening screen gives users confidence that the voice assistant is understanding the query accurately. In these cases, the screen can simply display a waveform or a transcription of the user query. 

Once the query is complete, the voice assistant can deliver results in two ways at once: a spoken response, enhanced by a more detailed response displayed on the screen, such as a list of restaurants or a map with directions.


When evaluating a voice assistant for your product, service, or app, consider how a natural language voice experience can enhance an experience where other modes of interaction are also present. Even in applications where no screen is present, providing earcons or other indicators that the voice assistant is active reduces confusion about whether the assistant is actively listening or not.

Voice-first isn’t voice-only. It’s simply an easier, more convenient, more natural, and sometimes safer way to interact with a product, service, or app.

Although voice does not need to be the only method of interaction (nor should it be), voice assistants will soon become a primary user interface in a world where people will never casually touch shared surfaces again. The voice era is here, and brands without a voice strategy risk losing brand affinity to those that have one.

The Houndify Voice AI platform is helping brands across industries and geographical boundaries build custom voice experiences and conversational voice assistants.

If you’re interested in exploring the Houndify voice AI platform further, register for a free account or contact us to find out how we can help bring your voice AI strategy to life.

Karen Scates is a storyteller with a passion for helping others through content. Argentine tango, good books and great wine round out Karen’s interests.

Interested in Learning More?

Subscribe today to stay informed and get regular updates from SoundHound Inc.
