
The power of voice: A new era for TV enthusiasts everywhere


Today we launched the next generation of voice technology that’s designed to drive the voice-enabled TV market forward and transform how we search for and discover content.

Futuresource Consulting forecasts that nearly 700 million new smart speaker, smart TV, set-top box and smart home devices will ship in 2019, and voice assistants will be built into an increasing number of them. Built-in voice interfaces have attracted a lot of attention from manufacturers, but cost and integration complexity concerns pushed the early voice implementations towards ‘Alexa compatible’ and ‘Push to Talk’ solutions. However, people are switching on to the power of voice and the trend is set to rise over the coming years – which is bringing manufacturers back to the ‘built-in’ solution, because it delivers a more relaxed, intuitive experience.

XMOS’ new technology presents a real opportunity for manufacturers to bring a compelling voice control experience to the masses, effectively and economically.

When the TV in your living room offers far-field voice capture that works with close-range precision (i.e. you can just tell it what you’d like to watch from anywhere in the room), the remote control and push-to-talk voice experience starts to feel outdated. Voice control unshackles us from hierarchical menus and never-ending pages of content, freeing us from push-button and touch-based interfaces. Whilst we may still need the remote control to switch between different hardware, the ability to just say “Alexa, play Friends series two, episode three” or even “Alexa, find the ‘pivot’ episode in Friends” provides a much richer search experience. And it’s easy to see how voice control is perfect for TVs and set-top boxes, and how far-field capture makes that a more natural, conversational interaction.

The XMOS solution

Our new XVF3510 two-mic voice processor is designed with our modern living spaces in mind. Our new VocalFusion algorithms work intelligently to analyse the acoustic environment, detecting and isolating a voice command from every other sound in the room (including any media streaming through the device itself or other devices nearby), making it ideal for smart home devices and integration into smart TVs and set-top boxes. And with a price tag of just $0.99, it hurdles the cost barrier nicely.

Simon Bryant, Research Director at Futuresource Consulting, said: “The strong adoption in multimedia and entertainment is expected to continue and this new device from XMOS addresses what manufacturers who want to add far-field voice control to their product lines will be looking for.”

The XVF3510 offers a clear incentive for manufacturers to move beyond the world of push-to-talk, ‘Alexa compatible’ and touch-based control interfaces. With that, we’ll start to see the real power of voice emerge.

For further technical details, the full product brief can be found here.

The XVF3510 will enter general distribution in August 2019. Developers can request a dev kit from XMOS here.

Embedding voice – DSP chip or DSP algorithms on the Applications Processor?


Why smart developers choose a DSP chip rather than run DSP algorithms on the Applications Processor …

In our increasingly connected, intelligent world, voice-control opens the door for a more natural, engaging conversation with technology. Reliable, accurate voice capture relies on advanced digital signal processing (DSP) algorithms and good acoustic design to ‘hear’ the wake-word and pick up the voice command – even in a noisy environment. Some of the key algorithms include:

  • Acoustic echo cancellation: When you give a voice command to a TV, the microphones capture both your command and the audio track coming from the TV speakers. That captured audio track (the acoustic echo) needs to be cancelled from the microphone signal so the device ‘hears’ the wake-word first time, every time, and captures a clean voice command to send to the speech recognition service (e.g. Alexa). Cancelling the echo is what makes ‘barge-in’ possible: you can speak over whatever the TV is playing and still be heard.
  • Beamforming: This detects and tracks where the voice is coming from and focuses the microphones’ pickup in that direction, so the command is captured accurately, even if you’re walking across the living room.
  • Interference canceller: This ‘scans’ the soundscape of the room and cancels out point noise sources in the surrounding space, i.e. anything that’s not the voice of interest. The improved voice signal can then be sent to the speech recognition service.
  • Noise suppression: Noise suppression algorithms target diffuse noise sources such as air conditioning and road noise. They remove the stationary and non-stationary background sounds to enable accurate, reliable voice detection.
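
To make the first of these algorithms more concrete, here is a minimal sketch of an acoustic echo canceller built on a normalised least-mean-squares (NLMS) adaptive filter. NLMS is a standard textbook technique. It is not XMOS’ VocalFusion algorithm, whose internals aren’t described here. The filter learns the echo path from the loudspeaker reference signal to the microphone and subtracts its estimate of the echo. The function name `nlms_echo_canceller`, the synthetic signals, the four-tap `echo_path` and the `taps`/`mu` values are all hypothetical choices for illustration.

```python
import random


def nlms_echo_canceller(far_end: list[float], mic: list[float],
                        taps: int = 16, mu: float = 0.5,
                        eps: float = 1e-6) -> list[float]:
    """Cancel the loudspeaker echo from `mic` with an NLMS adaptive filter.

    far_end: reference samples sent to the TV loudspeaker.
    mic:     microphone samples (user's voice plus the echo of far_end).
    Returns the residual e[n] = mic[n] - estimated echo[n].
    """
    w = [0.0] * taps  # adaptive estimate of the loudspeaker-to-mic echo path
    x = [0.0] * taps  # most recent far-end samples, newest first
    residual = []
    for ref, d in zip(far_end, mic):
        x = [ref] + x[:-1]
        y = sum(wi * xi for wi, xi in zip(w, x))  # predicted echo
        e = d - y                                 # what is left: ideally just the voice
        # Normalising by the reference energy keeps the step size stable
        # regardless of how loud the TV audio is.
        step = mu * e / (eps + sum(xi * xi for xi in x))
        w = [wi + step * xi for wi, xi in zip(w, x)]
        residual.append(e)
    return residual


# Hypothetical test signal: white-noise "TV audio" through a short made-up
# echo path, with no user speech, so the ideal residual is zero.
random.seed(0)
far = [random.uniform(-1.0, 1.0) for _ in range(4000)]
echo_path = [0.6, -0.3, 0.1, 0.05]
mic = [sum(h * far[n - k] for k, h in enumerate(echo_path) if n >= k)
       for n in range(len(far))]
residual = nlms_echo_canceller(far, mic)

# Compare echo and residual energy once the filter has had time to adapt.
tail = slice(-500, None)
echo_energy = sum(m * m for m in mic[tail])
residual_energy = sum(e * e for e in residual[tail])
print(f"residual/echo energy, last 500 samples: {residual_energy / echo_energy:.2e}")
```

In this simulation the microphone picks up only echo, so the residual shrinks towards zero as the filter converges. A production canceller would also need double-talk detection, so that the user’s own voice doesn’t disturb adaptation. It would also need much longer filters to cover real room reverberation, and is often implemented in the frequency domain for efficiency.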

As voice moves beyond smart speakers and into the living room, developers are having to figure out how best to build a voice interface into a smart TV or set-top box. One of the common questions we hear is whether to embed the DSP in a separate voice processor (chip) or run the DSP algorithms on the Applications Processor.

Should you run DSP algorithms on the Applications Processor?

Most consumer electronics devices are built around an Applications Processor. Put simply, the more powerful the processor, the quicker your programmes, apps, games and features will appear. As a developer, you may choose to simply execute the DSP algorithms on the Applications Processor (host processor). At first glance, this seems cost-effective and easy to integrate, primarily because there’s no additional chip to purchase and integrate. However, there are some significant downsides to this approach that developers need to consider.

  • Adverse impact on capacity: Because the host processor handles the core system processes, it’s one of the most expensive elements of the electrical design. The more powerful the host processor, the more tasks it can handle, but in turn it’ll cost more, consume more power and require more space. As a developer, you’ll want the cheapest, lowest-power processor that’s capable of running all the core functions. Adding DSP algorithms to it therefore imposes additional processing that burdens the chip and takes up capacity that could otherwise be used for core functions.
  • Bill of Materials (BoM): This will be pushed up beyond original estimates, as additional components (e.g. a microphone aggregator) will be required to support the integration.
  • Performance risk: The DSP algorithms will be constrained by the capacity that’s available on the host processor and performance may be compromised.
  • Integration complexity: Adding algorithms to the host processor puts all of the integration demands onto the software team and can rapidly increase the cost of development. It can also make it harder to meet the real-time constraints needed to produce a glitch-free audio stream without increasing the latency of the system. Further challenges may arise later around in-field updates, and whether there’s sufficient capacity to run the update on the host processor.
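
The real-time point can be illustrated with a simple simulation. Audio arrives at a fixed rate, so each frame must be processed before the next one arrives. On a shared host processor, a frame delayed by other work can push the following frames late as well. The sketch below assumes a hypothetical 16 kHz sample rate, 256-sample frames and made-up per-frame processing times. The helper names `frame_period_ms` and `simulate_backlog` are also illustrative. It models the queueing effect only, not any particular processor.

```python
def frame_period_ms(frame_size: int, sample_rate_hz: int) -> float:
    """Time between successive audio frames, i.e. the per-frame processing deadline."""
    return 1000.0 * frame_size / sample_rate_hz


def simulate_backlog(processing_ms: list[float], period_ms: float) -> tuple[int, float]:
    """Return (late frames, worst backlog in ms) for a list of per-frame processing times.

    The backlog is processing work still outstanding when the next frame arrives.
    Any non-zero backlog means that frame's output is late, which is a potential glitch.
    """
    backlog = 0.0
    late_frames = 0
    worst = 0.0
    for t in processing_ms:
        # Each frame period provides period_ms of processing time; excess work carries over.
        backlog = max(0.0, backlog + t - period_ms)
        if backlog > 0.0:
            late_frames += 1
        worst = max(worst, backlog)
    return late_frames, worst


# Hypothetical numbers: 16 kHz audio in 256-sample frames gives a 16 ms deadline.
period = frame_period_ms(256, 16_000)

# The DSP normally needs 12 ms per frame, but in the second case one frame
# takes 25 ms because the host is busy with other work.
steady = simulate_backlog([12.0] * 10, period)
spike = simulate_backlog([12.0, 12.0, 25.0, 12.0, 12.0, 12.0], period)
print(period, steady, spike)  # prints: 16.0 (0, 0.0) (3, 9.0)
```

A single 25 ms spike against a 16 ms deadline leaves three consecutive frames late. The listener hears the result as a glitch, unless the system adds buffering to absorb it, and that buffering increases latency.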

How does that compare with running DSP algorithms on a separate chip?

A standalone DSP chip solution offers some compelling advantages over licensing DSP algorithms and integrating them into the host processor.

  • Transfers work away from the host processor: Running the DSP on a separate chip keeps the host processor free for core functions and avoids adding to the software team’s workload.
  • Easy to integrate: A ring-fenced solution needs to be planned into the electrical design, but an external DSP lets you use standard hardware interfaces (such as I2S or USB) for connectivity, which simplifies the integration task significantly. A separate chip ensures there are no dependencies between the code on the DSP chip and the code on the host processor; there’s simply an API that delivers processed voice samples in an uninterrupted stream.
  • Future-proof solution: You benefit from the latest developments in voice technology; plus, in-field software releases are delivered easily via firmware update.
  • Accelerated time to market: A DSP chip offers a plug-and-play solution that separates voice capture from the rest of the TV’s electronic design, enabling developers to deliver a built-in voice interface rapidly.
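
On the host side, consuming a standalone DSP’s output can be as simple as reading PCM samples from a standard audio interface. The sketch below makes some hypothetical assumptions. Processed audio is taken to arrive as interleaved 16-bit little-endian stereo PCM (I2S carries two channels per frame), with one channel holding the voice stream for the speech recognition service. The actual format and channel assignment depend on how the device is configured, so check the product documentation. The helper names `split_stereo_s16le` and `voice_frames` are illustrative. Reading the bytes from the I2S or USB driver is left out because it is platform-specific.

```python
import struct


def split_stereo_s16le(block: bytes) -> tuple[list[int], list[int]]:
    """Deinterleave 16-bit little-endian stereo PCM into (left, right) sample lists."""
    if len(block) % 4:
        raise ValueError("block must contain whole stereo frames (4 bytes each)")
    samples = struct.unpack(f"<{len(block) // 2}h", block)
    return list(samples[0::2]), list(samples[1::2])


def voice_frames(blocks, channel: int = 0):
    """Yield the voice channel from each captured block.

    `channel` is a hypothetical setting: 0 selects left, 1 selects right.
    """
    for block in blocks:
        left, right = split_stereo_s16le(block)
        yield left if channel == 0 else right


# Two stereo frames: (L=100, R=-1) and (L=200, R=-2).
demo = struct.pack("<4h", 100, -1, 200, -2)
print(split_stereo_s16le(demo))  # prints: ([100, 200], [-1, -2])
```

The host code has no dependency on the DSP algorithms themselves. It only moves already-processed samples on to the speech recognition service.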

Choosing the right far-field voice interface for your TV or set-top box is a key decision for your company. A separate voice processor such as XMOS’ VocalFusion often provides a more flexible and cost-effective solution over the complete lifecycle of a TV or set-top box. It reduces project risk, minimises dependencies between software functions and avoids burdening the host processor.

XMOS solutions are cost-effective and offer the flexibility to remove additional costs from your system design. Find out more about our voice solutions here, or get in touch with our sales team here.

We’re here to help you transform the way people find and enjoy content through your products.