Voice Recognition: Beyond smart speakers

27th March 2019

It’s a brave new world out there and what we thought we knew is no longer the case. We are now living in the world of ‘voice’ where the convergence of artificial intelligence and voice recognition is changing the way we all live.

Various figures and trends quantify the emergence of voice technology. Two commonly cited yardsticks are recognition accuracy and adoption rates. Baidu chief scientist Andrew Ng has said that 95% word recognition accuracy is roughly the threshold of human-level accuracy, and Google is reported to have reached that figure already.
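To make the 95% figure concrete, here is a minimal Python sketch of how word-level accuracy is often measured. It assumes the figure refers to word accuracy, computed as one minus the word error rate (WER), where WER is the word-level edit distance between a reference transcript and the recognizer's output divided by the number of reference words. The function names and sample sentences are illustrative only and do not come from Baidu or Google.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein recurrence).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Accuracy expressed as 1 - WER (can be negative for very poor output)."""
    return 1.0 - word_error_rate(reference, hypothesis)


if __name__ == "__main__":
    # Illustrative 20-word reference with a single misrecognized word.
    spoken = ("please scan this document and send it to my manager "
              "before the meeting starts at ten o'clock tomorrow morning today")
    heard = spoken.replace("manager", "manner")
    print(f"WER: {word_error_rate(spoken, heard):.2f}")
    print(f"Accuracy: {word_accuracy(spoken, heard):.2f}")
```

One misheard word in a twenty-word sentence is exactly the 95% level, which gives a feel for how rarely a recognizer at that threshold gets a word wrong.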

Voice adoption is expected to increase along with the demand for smart speakers, as 26.4 million voice queries are already carried out on these devices alone. By 2020, half of all searches are predicted to be made by voice, and about a third of all web browsing is predicted to be done without a screen. On the business side, market intelligence firm Tractica reports that unique users of virtual digital assistants are expected to reach 1.8 billion, with revenues of $15.8 billion, by the end of 2021.

Evolution or Revolution?

Twenty years ago, computer scientists were still struggling to make machines understand speech, and every syllable had to be taught before a machine could follow even the most basic voice commands. Who would have thought that voice AI would evolve so quickly that devices from smartphones to computers can now recognize our voices and what we say? Machine-learning algorithms and neural networks capture our insights, tendencies and behavior in many forms so that next-generation devices can interact with us more naturally. Some experts even believe that physical controls will become all but obsolete in the future.

Most companies are now racing for the next big thing in a tech ecosystem that is fast becoming voice-controlled. No one wants to be left behind, so many brands are having to redefine their business practices as customer decisions and behaviors are increasingly processed by voice AI. This tech revolution has brought voice AI into next-generation technologies that have not only reinvented how we shop, search and communicate but also enable us to do a lot more than the usual voice search for a Wikipedia tidbit or the latest weather update.

When it comes to the future of business, voice interface technology frees employees from mundane tasks, giving them more time to work on more pressing and important matters. This form of communication can push businesses beyond what they are normally capable of.

A wealth of promising possibilities and opportunities has opened up, giving early adopters a competitive edge in the market. Startup Roxy has developed a fully customizable in-room concierge that does not depend on Amazon’s or Google’s big data to manage customer information and interaction. Tech firm Synqq has a voice assistance app that uses voice and natural language processing to keep track of meetings as efficiently as a personal assistant or secretary. Even banking and finance giants J.P. Morgan and Capital One have leveraged Alexa’s patented technology to provide better analytics reports and customer assistance.

What do the "voices" say? VP Chris Kirby expects that algorithms will soon recognize the various aspects, nuances and characteristics of human speech, from tonal inflection to mood. VoiceOps co-founder Daria Evdokimova believes that machine-driven speech-to-text systems will be able to surpass human transcription in both accuracy and speed within the next 5-10 years. ISHIR CEO Rishi Khanna envisions a future dominated by virtual assistants that help us communicate with machines, and by hands-free mobility that will eventually take driving out of total human control.

Interestingly, Veritone executive Tyler Schulze shares a more radical view of this future technology: he sees each human voice as distinctively unique, based on spectrogram analysis. In short, Schulze compares our voice to a fingerprint that can be verified and profiled.

Voice recognition has special importance for executives and developers creating tomorrow’s software products: it means voice must be an integral part of the user experience. It means shifting from thinking about a customer who swipes on a phone or clicks a mouse to thinking about how to deliver a seamless experience via voice. It means delivering experiences in a world where immediacy is key. People want to ask a question and receive immediate and insightful information.

Xerox: A case study

Voice recognition software has made great strides. At one point, people thought Siri would never be able to understand them; now it is becoming a common feature in tech gadgets. According to Inc., humans can speak 150 words a minute but type only 40. “Now is the time for voice recognition to take over too, since the technology is a logical fit with Internet of Things-connected devices, such as Amazon Echo.” It began when Vision-e developed Vision-e Voice for Alexa, the Amazon Echo voice recognition system, so that users could give verbal commands to a ConnectKey technology-enabled printer. The technology was developed in close cooperation with Xerox engineers to access the ConnectKey application programming interface.

Xerox ConnectKey technology-enabled devices are the first printers to be accessible via voice recognition technology, setting the innovative direction printers will take in the future. This voice-enabled printer brings digital printing to a whole new level, allowing users to handle tasks such as making copies, scanning documents to email or Dropbox, and checking and ordering supplies virtually hands-free. People who struggled to perform complicated printing tasks or to use some of the extensive features of Xerox printers now have Alexa on their side: Amazon’s intelligent personal assistant is ready to help with answers.

Alexa is a context-aware assistant that delivers personalized answers to users’ questions. For instance, the user may say: “Alexa, tell Xerox to scan to email.” Alexa then replies: “Okay, who is your recipient?” Once you tell Alexa the recipient’s address, the email is sent. It’s as simple as that, all without the touch of a button. It probably took longer to read the above than it would take to finish the scan.
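The exchange above is a two-turn dialogue: a command is recognized, the assistant asks for the missing detail, and the task completes once the detail is supplied. The following Python sketch illustrates that flow in miniature. It is a toy model under stated assumptions, not the Alexa or ConnectKey API: the class `PrinterDialogue`, the pattern `COMMAND_PATTERN`, and the final confirmation wording are all hypothetical, and utterances are assumed to arrive already transcribed to text.

```python
import re

# Hypothetical pattern for "Alexa, tell Xerox to <task>" (illustration only).
COMMAND_PATTERN = re.compile(
    r"^alexa,\s*tell xerox to (?P<task>.+?)\.?$", re.IGNORECASE
)


class PrinterDialogue:
    """Toy two-turn dialogue: recognize the task, then ask for the recipient."""

    def __init__(self) -> None:
        self.pending_task = None  # task still waiting for a recipient
        self.sent_to = []         # recipients of completed scans

    def hear(self, utterance: str) -> str:
        text = utterance.strip()

        # Second turn: the previous command is waiting for a recipient.
        if self.pending_task == "scan to email":
            self.sent_to.append(text)
            self.pending_task = None
            return f"Sending your scan to {text}."  # assumed wording

        # First turn: look for a command addressed to the printer.
        match = COMMAND_PATTERN.match(text)
        if match is None:
            return "Sorry, I didn't catch that."

        task = match.group("task").lower()
        if task == "scan to email":
            self.pending_task = task
            return "Okay, who is your recipient?"
        return f"Sorry, I can't {task} yet."


if __name__ == "__main__":
    dialogue = PrinterDialogue()
    print(dialogue.hear("Alexa, tell Xerox to scan to email."))
    print(dialogue.hear("jane@example.com"))
```

Keeping the pending task in the dialogue object is what lets the second utterance be read as an answer rather than as a new command, which is the "context-aware" behavior the example describes.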

The bottom line

According to market research firm TrendForce, the global market value of voice recognition solutions is projected to reach $15.98 billion in 2021. The firm attributes this growth to the widespread adoption of voice-based virtual assistants in various devices. Technology trends indicate that voice recognition will become an integral component of AI systems, giving users greater ease of communication.

When it comes to voice optimization, every possible angle will soon be covered, so that in the future AI will hold real-time conversations with us. Thanks to natural language processing, it will constantly learn from us and make the necessary adjustments, as we all do. Businesses are expected to invest more in this technology as the overall voice infrastructure improves.

From 2020 onward, ‘voice’ will be the new norm.