The rise of voice AI-enabled hardware
By Probal Lala, chief executive officer, Fluent.ai Inc.
Speech recognition operates fully offline, embedded on low-footprint consumer device hardware
The development and evolution of voice recognition artificial intelligence has led to a proliferation of devices in everyday life that can be controlled by voice. Everything from wearable fitness trackers and wireless earbuds to microwaves, refrigerators and robot vacuum cleaners has been equipped with voice recognition capabilities.
These voice recognition use cases are increasingly shifting from the cloud to offline, embedded speech recognition running on various types of hardware, from low-footprint microcontrollers to powerful AI chips. As a testament to this proliferation of voice-controlled devices, the overall speech and voice recognition market is expected to reach US$21.5 billion in 2024, growing at a compound annual growth rate (CAGR) of 19.18%, with embedded/offline speech and voice recognition expected to grow at the highest CAGR of any segment.
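To make the growth figure concrete: a CAGR of r means a value is multiplied by (1 + r) every year, so after n years V_end = V_start × (1 + r)^n. The article does not state the forecast's base year, so in this minimal sketch the horizon n is a hypothetical input, and the helper only shows what starting value a given horizon would imply. It is not a figure taken from the underlying market report.

```python
def project(start_value: float, cagr: float, years: int) -> float:
    """Compound a starting value forward at a constant annual growth rate."""
    return start_value * (1.0 + cagr) ** years


def implied_start(end_value: float, cagr: float, years: int) -> float:
    """Invert project(): the starting value that grows to end_value after `years`."""
    return end_value / (1.0 + cagr) ** years


if __name__ == "__main__":
    end_usd_billion = 21.5  # 2024 forecast quoted in the article
    cagr = 0.1918           # 19.18% CAGR quoted in the article
    # The forecast horizon is NOT given in the article; try a few hypothetical values.
    for years in (4, 5, 6):
        start = implied_start(end_usd_billion, cagr, years)
        print(f"{years}-year horizon -> implied starting market of about ${start:.1f}B")
```

Because compounding is multiplicative, the implied starting value is very sensitive to the assumed horizon, which is why a CAGR figure is only meaningful alongside its start and end years.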
Voice assistants such as Amazon’s Alexa and Google Assistant operate in the cloud and follow the traditional two-step process: first, the user’s speech is transcribed into text; then Natural Language Processing (NLP) is applied to the text to derive meaning. This approach lets users search for anything on the Internet by voice. However, it also raises concerns about consumer privacy, offers limited support for languages and accents, and requires an Internet connection. For these reasons, market demand has been growing for more secure, offline speech recognition. A handful of speech recognition technology providers, such as Montreal-based Fluent.ai, have set out to fill this market gap by developing speech recognition technologies that operate fully offline, embedded on low-footprint consumer device hardware.
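The two-step process can be sketched in a few lines of Python. In this minimal, hypothetical example, the first stage, `transcribe`, is a stub standing in for a cloud speech-to-text service (a real one would receive audio over the network), and the second stage, `understand`, is a toy rule-based NLP step that maps the transcript to an intent with slots. All function names, intent labels and patterns are illustrative assumptions, not the API of Alexa, Google Assistant or any other product.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Intent:
    name: str
    slots: dict = field(default_factory=dict)


def transcribe(audio: bytes) -> str:
    """Stage 1 (hypothetical stub): a cloud ASR service would turn audio into text.
    Here the 'audio' is just UTF-8 text so the example runs without a network."""
    return audio.decode("utf-8").lower().strip()


# Stage 2: toy NLP rules mapping transcript patterns to intents (assumed examples).
RULES = [
    (re.compile(r"preheat the oven to (?P<temperature>\d+) degrees"), "preheat_oven"),
    (re.compile(r"pause (?:the )?music"), "pause_music"),
]


def understand(text: str) -> Intent:
    """Stage 2: derive meaning from the transcript by pattern matching."""
    for pattern, intent_name in RULES:
        match = pattern.search(text)
        if match:
            # Every named group in these patterns captures digits, so int() is safe.
            slots = {key: int(value) for key, value in match.groupdict().items()}
            return Intent(intent_name, slots)
    return Intent("unknown")


def handle_utterance(audio: bytes) -> Intent:
    # Speech -> text -> meaning: the traditional two-step pipeline.
    return understand(transcribe(audio))


if __name__ == "__main__":
    print(handle_utterance(b"Preheat the oven to 425 degrees"))
    print(handle_utterance(b"Pause music"))
```

In an offline, embedded design, the mapping from speech to meaning runs on the device itself, so no audio has to leave it.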
IoT devices designed with consumer privacy at their centre benefit from offline voice recognition, which, in addition to being private by design, offers lower latency, minimal power consumption and relatively low cost: all factors that are driving today’s consumer device hardware design. Because offline, embedded speech recognition solutions eliminate the cloud computing requirements of traditional voice recognition technologies, they have minimal power and storage requirements, enabling them to run on low-footprint systems such as the Arm Cortex-M series of microcontrollers. At the other end of the spectrum, makers of powerful AI chips are increasingly turning to edge-based voice recognition and other edge AI capabilities for the same reasons: data privacy, low latency and power efficiency.
Small footprint AI systems
AI technologies by their very nature consume a significant amount of compute power and have therefore typically been better suited to the cloud. Several software companies have tackled the problem of developing small-footprint AI systems that can be compiled for and run on embedded systems using Tiny Machine Learning (TinyML) techniques. The increasing development of AI technologies, including voice recognition, and the growing demand for secure, edge-based solutions offer an opportunity for hardware companies to build these technologies into their designs.
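One widely used way to shrink a model for microcontrollers, though the article does not name it, is post-training quantization: each 32-bit floating-point weight is stored as an 8-bit integer plus a scale and zero point shared across the tensor, cutting weight storage roughly fourfold. The pure-Python sketch below shows per-tensor affine (asymmetric) int8 quantization on a handful of made-up weights; it is an illustration of the technique under those assumptions, not any particular vendor's toolchain. Real TinyML toolchains typically quantize per layer or per channel and also quantize activations.

```python
def quantize_int8(weights):
    """Affine-quantize a list of floats to int8 values in [-128, 127].

    Returns (q_values, scale, zero_point) such that w ≈ scale * (q - zero_point).
    """
    w_min = min(min(weights), 0.0)  # range includes 0 so 0.0 is exactly representable
    w_max = max(max(weights), 0.0)
    scale = (w_max - w_min) / 255.0 or 1.0  # avoid divide-by-zero for all-zero tensors
    zero_point = round(-128 - w_min / scale)
    q = [max(-128, min(127, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Map int8 values back to approximate floats."""
    return [scale * (v - zero_point) for v in q]


if __name__ == "__main__":
    weights = [0.82, -1.37, 0.05, 2.10, -0.44, 0.0]  # made-up example weights
    q, scale, zp = quantize_int8(weights)
    restored = dequantize(q, scale, zp)
    max_err = max(abs(a - b) for a, b in zip(weights, restored))
    print("int8 values:", q)
    print(f"max reconstruction error: {max_err:.4f} (scale = {scale:.4f})")
    # Storage for the weights alone: 4 bytes per float32 vs 1 byte per int8.
    print("float32 bytes:", 4 * len(weights), "int8 bytes:", len(weights))
```

The rounding error per weight is bounded by half the scale, which is why models with a modest weight range usually lose little accuracy when stored this way.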
Voice recognition software providers offer a variety of solutions: wake words that activate a device without the need to press a button; voice command recognition, such as “pause music” or “preheat the oven to 425 degrees”; voice biometrics; and sound recognition. These solutions range from off-the-shelf SDKs with a generic library of commands to custom solutions tailored to the use case, the noise environment, and the languages and accents to be supported. Such customization matters as hardware manufacturers increasingly look to expand into overseas markets and to target growing sectors for voice recognition such as factory automation, where noise robustness is key.
In the smart home, consumers are beginning to look for alternatives to the market-dominating Alexa and Google Assistant smart home hubs. Scandals involving consumer privacy breaches by these always-connected devices have spurred a growing movement towards offline, embedded and therefore 100% private voice user interfaces. Voice control is being embedded directly into the home appliances that Alexa and Google previously controlled through the cloud: even tiny devices such as light switches can be controlled by voice recognition technology running on small-footprint MCUs such as the Arm Cortex-M4. The next frontier of voice AI in the home, which is already being explored, is the multimodal edge AI chip: hardware that combines voice recognition, image detection and other AI capabilities on a single chip that can be embedded in appliances and other devices throughout the home, creating a secure, offline smart home ecosystem.
Wake word technologies
Small wearable devices such as fitness trackers and wireless earbuds are also shifting from cloud-based to embedded voice user interfaces in order to offer lower latency and greater noise robustness for use outdoors. While contemporary wake word technologies have low power requirements despite always listening, further power savings can be achieved with a push-to-talk setup, in which the user first presses a button on the device and then speaks a command.
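The trade-off between always-listening and push-to-talk can be illustrated by counting how often each model has to run. In this hypothetical Python simulation, audio arrives as fixed-length frames. In always-listening mode, a small wake-word detector inspects every idle frame and only opens a window for the larger command recognizer after it fires; in push-to-talk mode, nothing runs until the button is pressed. The detector is a stand-in (each frame simply is or is not the wake word), the `COMMAND_WINDOW` length is an assumed value, and counting model invocations is only a rough proxy for energy use.

```python
from dataclasses import dataclass

COMMAND_WINDOW = 3  # frames the command recognizer stays active (assumed value)


@dataclass
class Frame:
    is_wake_word: bool = False    # stand-in for a real wake-word detector's decision
    button_pressed: bool = False  # push-to-talk button state during this frame


def count_invocations(frames, push_to_talk: bool):
    """Return (wake_word_runs, command_runs) for a stream of audio frames."""
    wake_runs = command_runs = 0
    active = 0  # frames left in the current command window
    for frame in frames:
        if active > 0:
            command_runs += 1  # the larger command model processes this frame
            active -= 1
            continue
        if push_to_talk:
            if frame.button_pressed:
                active = COMMAND_WINDOW  # the button opens the window; no detector runs
        else:
            wake_runs += 1  # always-listening: the tiny detector sees every idle frame
            if frame.is_wake_word:
                active = COMMAND_WINDOW
    return wake_runs, command_runs


if __name__ == "__main__":
    frames = [Frame() for _ in range(100)]
    frames[10] = Frame(is_wake_word=True, button_pressed=True)  # user acts at frame 10
    print("always-listening:", count_invocations(frames, push_to_talk=False))
    print("push-to-talk:   ", count_invocations(frames, push_to_talk=True))
```

Both modes run the command recognizer for the same number of frames; the difference is that push-to-talk never runs the wake-word detector at all, which is where the extra power saving comes from.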
Lastly, factory automation has been another growing area for embedded voice AI. The time savings from low-latency, embedded voice control can translate into millions of dollars for manufacturers, and the privacy of offline voice recognition matters there too. Robustness to noise is key in a factory environment, and offline, embedded voice solutions typically perform more accurately in noisy conditions when combined with a high-quality, noise-cancelling microphone front end.
With the continued development and growth of AI capabilities such as voice and image recognition, we expect AI-enabled hardware to proliferate. The very term “artificial intelligence” has led consumers to expect devices that interact with users and their environment in a more natural, human-like way. These expectations are driving new technology innovations, from Fluent.ai’s speech recognition software, which goes directly from acoustics to action, to AI chips that support multimodal understanding. Certainly, the COVID era we’re living in has ushered in growing demand for contactless user interfaces, for which voice AI offers a prime solution.
Fluent.ai Inc. is privately held, founded in 2015 and based in Montreal. Fluent.ai’s mission is to voice-enable the world’s devices. The firm has developed a range of AI voice interface software solutions for OEMs and service providers.