VocalFusion Far-Field Voice Processors
XMOS FAR-FIELD VOICE PROCESSORS
- Acoustically optimized far-field MEMS microphone solutions
- Microphone/acoustic digital signal processing
- Best quality always-on voice interfaces in a single device
Three trends are converging to make speech the next generation of machine interface: sophisticated speech recognition services in the cloud, driven by advances in neural network algorithms and abundant compute; rapid developments in MEMS microphone performance, driven by the meteoric rise of the smartphone; and the availability of highly integrated microcontrollers capable of capturing and processing raw microphone inputs at distances of 5m or more.
VocalFusion provides the highest-performance single-chip solution for far-field voice capture, complementing graphical and touch interfaces, and leading to new markets of no-interface products.
VOCALFUSION FAR-FIELD MICROPHONE SOLUTIONS
VocalFusion devices aggregate multiple audio streams from microphone arrays, convert the 1-bit PDM microphone signals to multi-bit PCM streams that DSP can be applied to, and pass the optimised audio stream to remote or local automatic speech recognition (ASR) engines.
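The PDM-to-PCM step can be illustrated with a minimal sketch. It is not the VocalFusion implementation: it assumes the PDM input is already available as a list of 0/1 bits, and it uses a simple block average where a production design would typically use multi-stage decimation filters. The function name `pdm_to_pcm` and the `decimation` parameter are hypothetical.

```python
def pdm_to_pcm(pdm_bits, decimation=64):
    """Convert a 1-bit PDM stream to PCM samples by block averaging.

    Each output sample is the mean of `decimation` input bits mapped to
    -1/+1, which acts as a boxcar low-pass filter followed by downsampling.
    Trailing bits that do not fill a whole block are ignored.
    """
    pcm = []
    for start in range(0, len(pdm_bits) - decimation + 1, decimation):
        block = pdm_bits[start:start + decimation]
        ones = sum(block)
        # Map the density of ones (0..decimation) to the range -1.0..+1.0.
        pcm.append((2 * ones - decimation) / decimation)
    return pcm
```

As an illustration only, a 3.072 MHz PDM clock with a decimation factor of 64 would give a 48 kHz PCM stream; neither figure is taken from the VocalFusion specification.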
With support for circular or linear arrays, XMOS devices provide exceptional 360-degree and 180-degree far-field voice capture at distances in excess of 5m, suiting a range of products that includes smart speakers, smart TVs, computers and laptops, and robots.
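One common way for an array to capture voice from a chosen direction is delay-and-sum beamforming, which needs the relative arrival time of the sound at each microphone. The sketch below computes those delays for a circular array. It is an illustration under stated assumptions (evenly spaced microphones, a far-field source in the plane of the array), not a description of the algorithm XMOS uses; `circular_array_delays` and its parameters are hypothetical names.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second in air at about 20 degrees C


def circular_array_delays(num_mics, radius_m, azimuth_deg):
    """Return per-microphone arrival delays in seconds for a far-field source.

    Microphones are assumed evenly spaced on a circle of radius `radius_m`,
    with microphone 0 at azimuth 0. Delays are relative to the array centre:
    a negative value means the wavefront reaches that microphone first.
    """
    azimuth = math.radians(azimuth_deg)
    delays = []
    for m in range(num_mics):
        mic_angle = 2.0 * math.pi * m / num_mics
        # Projection of the microphone position onto the source direction.
        projection = radius_m * math.cos(azimuth - mic_angle)
        delays.append(-projection / SPEED_OF_SOUND)
    return delays
```

A delay-and-sum beamformer would shift each microphone signal by the negative of its delay and average the results, so that sound from the chosen azimuth adds coherently.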
ACOUSTIC SIGNAL PROCESSING
VocalFusion devices apply sophisticated digital signal processing to the captured signal, including mono/stereo Acoustic Echo Cancellation (AEC), dereverberation and Automatic Gain Control (AGC), to address the problems of distance, isolation, noise removal, and establishing the direction of voice capture. The resulting audio streams provide high-quality clear speech suitable for local or remote automatic speech recognition (ASR) engines.
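To make the role of AGC concrete, the following sketch shows a basic envelope-following gain control that raises quiet (distant) speech towards a target level. It is a generic textbook-style example, not the VocalFusion AGC; the function name and every parameter value (`target_level`, `attack`, `release`, `max_gain`) are assumptions chosen for illustration.

```python
def apply_agc(samples, target_level=0.25, attack=0.1, release=0.01, max_gain=16.0):
    """Scale samples so that their smoothed envelope tracks `target_level`."""
    envelope = 0.0
    output = []
    for s in samples:
        magnitude = abs(s)
        # Follow rising levels quickly (attack) and falling levels slowly (release).
        coeff = attack if magnitude > envelope else release
        envelope += coeff * (magnitude - envelope)
        # Limit the gain so that silence and low-level noise are not boosted without bound.
        gain = min(max_gain, target_level / envelope) if envelope > 0 else max_gain
        output.append(s * gain)
    return output
```

The gain limit matters in far-field use: without it, pauses in speech would cause background noise to be amplified to the target level.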
DSP is also used to implement barge-in functionality, allowing the device to recognise speech and commands while the product is itself outputting audio.
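Barge-in relies on removing the product's own playback from the microphone signal, which is the job of echo cancellation. The sketch below uses a normalised least-mean-squares (NLMS) adaptive filter, a widely used technique for this purpose. It is a mono, sample-by-sample illustration and makes no claim about how VocalFusion implements AEC; `nlms_echo_cancel`, `taps`, `mu` and `eps` are hypothetical names and values.

```python
def nlms_echo_cancel(reference, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the loudspeaker echo from the mic signal.

    reference: samples sent to the loudspeaker
    mic:       samples captured by the microphone (echo plus any near-end speech)
    Returns the microphone signal with the estimated echo removed.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    output = []
    for x, d in zip(reference, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate
        # Normalise the step by the reference energy so adaptation speed
        # does not depend on playback volume.
        norm = sum(h * h for h in history) + eps
        step = mu * error / norm
        weights = [w + step * h for w, h in zip(weights, history)]
        output.append(error)
    return output
```

Once the filter has converged, the output contains mainly the near-end talker, which is what a keyword detector or ASR engine needs during playback. A practical canceller would also pause adaptation while the near-end talker is speaking, which this sketch omits.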
Additional filters can be applied to the signal using the native DSP instructions to mitigate acoustic effects and artefacts introduced by the enclosure design, microphone porting and microphone isolation.
ALWAYS-ON KEYWORD SOLUTIONS
Voice trigger solutions, such as Sensory’s TrulyHandsfree™ technology, can be integrated on the VocalFusion device to provide a single-chip solution, or on a separate applications processor closely integrated with the VocalFusion voice processor for power-critical products.
HIGHLY FLEXIBLE VOICE SOLUTIONS
The flexibility of the XMOS architecture reduces time-to-market for customers developing simple and complex voice-capture systems. VocalFusion devices can provide secure connections to local or remote ASR Cloud services, or robust integration with host PCs and application processors.
The xCORE's I/O architecture, in which the behavior of individual pins can be configured under software control, also supports aggregation of other types of sensor, offering additional ways to address the challenges of capturing voice in noisy, populated environments, including the cocktail party problem.
VOCALFUSION AND MICROPHONE ARRAY SOLUTIONS
XMOS provides multiple platforms for designers who want to use VocalFusion voice capture algorithms, as well as platforms for developers who wish to use their own microphone and voice DSP libraries.
VocalFusion Stereo [Stereo-AEC]
xCORE-VOICE Array Microphone