Our technology

Our voice is as unique as our fingerprints. It’s a key part of how we express ourselves. And that makes voice a very natural way to communicate with the technology around us.

Here at XMOS, we’ve developed technology that helps give people the freedom to control their technology simply by using their voice, wherever they are in the room and whatever is happening around them.

XMOS is a fabless semiconductor company with a unique microcontroller design, underpinned by our own in-house tools. Our approach to multi-core, highly parallel stream processing enables our software team to develop highly efficient “at the edge” intelligent sensor applications.

Today, we detect voice commands accurately from across the room, even in busy environments and when the person is speaking softly. There are significant challenges in doing that. For example, the output of the device needs to adapt to the acoustic environment: soft furnishings absorb sound, whereas hard surfaces reflect it and bounce it around the room. And of course, the user may be moving around the room while talking, altering the quality of the voice feed. On top of that, there’ll be a range of background noise, and the voice-enabled device itself may already be playing music.

Our high-performance hardware and software solution detects the voice of interest and enhances voice clarity from across the room – even in noisy environments and during media playback. Here’s a quick guide to how we make that happen.

  • xCORE multicore microcontrollers are programmed in C/C++. With multiple 32-bit processing cores, our microcontrollers are characterised by flexible I/O and a unique timing-deterministic architecture, letting you deliver complex real-time projects using a design process that’s straightforward, flexible and scalable. Ideal if you’re looking for a solution that’ll:

    • detect and resolve task changes as they occur
    • process multiple concurrent tasks quickly
    • handle combinations of fast, complex interfaces
    • be easy to integrate and highly scalable

    The xCORE architecture removes the features of a traditional microcontroller that can introduce uncertainty. In place of a traditional bus structure, cores have direct access to I/O resources and single-cycle SRAM. Instructions execute in a single cycle, and tasks running on each core share a single, unified memory system. Many of the features of an RTOS are integrated in hardware: task scheduling, event handling and inter-task message passing are all supported natively.

    With multiple processing cores executing independently, each core can run I/O, DSP and application code, and tasks can run to completion within a strict timing window without affecting other tasks. An intimate connection between processor resources and hardware ports passes I/O events directly to tasks, yielding response times up to 100x faster than traditional microcontrollers.

    Because xCORE devices are deterministic, designers can create systems that are predictable, with the exact combination of peripheral interfaces they require. xCORE delivers all the features required by today’s embedded developers, along with performance that’s unmatched by traditional interrupt-driven systems.

  • Beamforming enables us to detect and track the position of the voice of interest from across the room. As the person talking walks around the room, the angle of the microphone beam adjusts automatically to track their voice.

    Our solutions support different microphone array geometries: linear and circular. Sound arrives at the different microphones in these arrays at different times; by identifying the first and last point of arrival, the beamformer is able to determine the direction the voice is coming from.
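
    The link between arrival times and direction can be illustrated with a minimal Python sketch. It is not the XMOS beamformer: it swaps in a plain cross-correlation between two microphones to estimate the delay, then converts that delay to an angle. The function names, the 16 kHz sample rate, the 10 cm spacing and the test pulse are all assumptions made for the example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def best_lag(ref, other, max_lag):
    """Return the lag (in samples) at which `other` best matches `ref`.

    A positive lag means the sound reached `other` later than `ref`.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, r in enumerate(ref):
            m = n + lag
            if 0 <= m < len(other):
                score += r * other[m]
        if score > best_score:
            best, best_score = lag, score
    return best


def arrival_angle(lag_samples, sample_rate, mic_spacing):
    """Convert an inter-microphone delay to an angle from broadside, in degrees."""
    delay = lag_samples / sample_rate
    # Clamp to the physically possible range before taking asin.
    ratio = max(-1.0, min(1.0, SPEED_OF_SOUND * delay / mic_spacing))
    return math.degrees(math.asin(ratio))


# Hypothetical two-microphone example: 16 kHz sampling, mics 10 cm apart.
sample_rate = 16000
mic_spacing = 0.10
pulse = [0.0, 1.0, 0.5, -0.5, -1.0, 0.0]
mic_a = [0.0] * 10 + pulse + [0.0] * 10
mic_b = [0.0] * 12 + pulse + [0.0] * 8  # same pulse, arriving 2 samples later

lag = best_lag(mic_a, mic_b, max_lag=4)
angle = arrival_angle(lag, sample_rate, mic_spacing)
print(f"delay: {lag} samples, estimated angle: {angle:.1f} degrees")
```

    With more microphones, the same pairwise delays can be combined to resolve direction more precisely, which is one reason array geometry matters.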

  • If you’ve ever spoken into an electronic device and heard your own voice coming back at you (sometimes with a significant delay) then you’ve experienced acoustic echo.

    Controlling and cancelling acoustic echo (the playback signal) is essential in voice applications. For example, if you’re watching a film on a smart TV and give a voice command to change the channel, the microphones will capture both your command and the sound of the film. That captured film sound – the acoustic echo – needs to be cancelled from the signal so that only the voice command is handled by the voice interface.

    We offer acoustic echo cancellation (AEC) solutions for both mono and stereo output devices.
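
    One common textbook way to cancel a known playback signal is a normalised least-mean-squares (NLMS) adaptive filter. The sketch below is a mono illustration under stated assumptions, not a description of the XMOS AEC: the function name, the tap count, the step size and the two-tap "room" echo path are hypothetical, and there is no near-end speech in the test signal.

```python
import random


def nlms_echo_cancel(reference, mic, taps=4, step=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the echo of `reference` from `mic`.

    Returns the residual signal (mic minus the estimated echo).
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    residual = []
    for x, d in zip(reference, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate
        # Normalise the update by the reference power so the step size
        # behaves the same for loud and quiet playback.
        power = sum(h * h for h in history) + eps
        weights = [w + step * e * h / power for w, h in zip(weights, history)]
        residual.append(e)
    return residual


# Hypothetical test signals: seeded pseudo-random playback, and a toy
# two-tap echo path standing in for the loudspeaker-to-microphone response.
rng = random.Random(0)
playback = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
echo = [0.6 * playback[n] + (0.3 * playback[n - 1] if n > 0 else 0.0)
        for n in range(len(playback))]
mic = echo  # the microphone hears only the echo in this toy example

residual = nlms_echo_cancel(playback, mic)
echo_power = sum(s * s for s in echo[-200:]) / 200
residual_power = sum(s * s for s in residual[-200:]) / 200
print(f"echo power: {echo_power:.4f}, residual power: {residual_power:.2e}")
```

    In a real device the microphone also picks up the talker, so a practical canceller has to keep adapting sensibly while someone is speaking over the playback.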

  • In any room, your voice will reverberate (reflect) off hard surfaces such as a window or TV screen.

    Dereverberation removes these reflections and cleans up the voice signal before it is sent to the automatic speech recognition (ASR) service. By contrast, noise suppression removes stationary (point-noise) and non-stationary background sounds.

  • When you’re streaming music (or other audio) from a voice-enabled device and speak the wake-word, the music will stop. This is barge-in – the ability to interrupt the device even when it’s playing music loudly and the wake-word is spoken softly from across the room.

  • In conference calling applications, automatic gain control (AGC) is used to ensure that all the people in the room, whether talking loudly or softly (and regardless of their position in the room), are heard at a consistent volume.

    In automatic speech recognition (ASR) implementations, fixed gain control is typically used; it acts as a limiter so the output volume doesn’t go above a certain level.
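
    The difference between the two approaches can be sketched in a few lines of Python. This is a simplified illustration, not the XMOS implementation: the function names, target level, smoothing constant, gain and ceiling are assumed values, and the test tones are square waves so that their level is constant.

```python
def automatic_gain_control(samples, target_level=0.3, attack=0.01, max_gain=20.0):
    """Scale samples so their smoothed level approaches `target_level`."""
    level = target_level  # running estimate of the input magnitude
    out = []
    for s in samples:
        level += attack * (abs(s) - level)  # one-pole envelope follower
        gain = min(max_gain, target_level / max(level, 1e-6))
        out.append(s * gain)
    return out


def fixed_gain_limiter(samples, gain=2.0, ceiling=0.8):
    """Apply a constant gain, then clamp the result to +/- `ceiling`."""
    return [max(-ceiling, min(ceiling, s * gain)) for s in samples]


# Hypothetical talkers: one quiet, one loud.
quiet = [0.05 if n % 2 == 0 else -0.05 for n in range(2000)]
loud = [0.6 if n % 2 == 0 else -0.6 for n in range(2000)]

agc_quiet = automatic_gain_control(quiet)
agc_loud = automatic_gain_control(loud)
limited_quiet = fixed_gain_limiter(quiet)
limited_loud = fixed_gain_limiter(loud)

print("AGC settles at:", abs(agc_quiet[-1]), abs(agc_loud[-1]))
print("Fixed gain gives:", abs(limited_quiet[-1]), abs(limited_loud[-1]))
```

    The AGC brings both talkers to roughly the same level, whereas the fixed gain leaves the quiet talker quiet and only caps the loud one.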

  • Automatic speech recognition (ASR) systems can be cloud-based (eg the Alexa Voice Service) or local (eg Kitt.ai). They take the digital voice stream (after the wake-word has been identified), detect a recognisable command or question and trigger a contextual response. XMOS solutions interact easily with all ASRs.

XMOS VocalFusion digital signal processors (DSPs) use a range of algorithms and combine these key components to isolate voice commands, capture them clearly and elevate the performance of automatic speech recognition systems. Our support doesn’t end there. Our technical experts are on hand to provide hands-on design support and to lead you through software tuning to drive best-in-class performance for your go-to-market solutions.