Embedding voice – DSP chip or DSP algorithms on the Applications Processor?


Why smart developers choose a DSP chip rather than run DSP algorithms on the Applications Processor …

In our increasingly connected, intelligent world, voice-control opens the door for a more natural, engaging conversation with technology. Reliable, accurate voice capture relies on advanced digital signal processing (DSP) algorithms and good acoustic design to ‘hear’ the wake-word and pick up the voice command – even in a noisy environment. Some of the key algorithms include:

  • Acoustic-echo cancellation: When you give a voice command to a TV, the microphones will capture both your command and the audio track coming from the TV speakers. That captured audio track – the acoustic echo – needs to be cancelled from the captured signal so it ‘hears’ the wake-word first time, every time, and captures a clean voice command to send to the speech recognition service (eg Alexa). This is also known as ‘barge-in’.
  • Beamforming: This detects and tracks where the voice is coming from, so the command is captured accurately, even if you’re walking across the living room.
  • Interference Canceller: This ‘scans’ the soundscape of the room and ignores (cancels out) point noise sources – ie anything in the surrounding space that’s not the voice of interest. The improved voice signal can then be sent to the speech recognition service.
  • Noise suppression: Noise suppression algorithms target diffuse noise sources such as air conditioning and road noise. They remove the stationary and non-stationary background sounds to enable accurate, reliable voice detection.
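To make the echo-cancellation idea above concrete, here’s a minimal normalised-LMS (NLMS) adaptive filter sketched in Python. This is an illustrative toy, not XMOS’ implementation: the function name, tap count and step size are arbitrary choices, and a production AEC adds double-talk detection, delay estimation and non-linear post-processing.

```python
# Minimal NLMS acoustic-echo-canceller sketch (illustrative only).
# `far` is the speaker feed (the audio the TV is playing); `mic` is the
# captured microphone signal containing the echo of `far`.

def nlms_aec(far, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the echo of `far` from `mic`."""
    w = [0.0] * taps          # adaptive FIR estimate of the echo path
    buf = [0.0] * taps        # most recent far-end samples, newest first
    out = []
    for n in range(len(mic)):
        buf = [far[n]] + buf[:-1]
        echo_est = sum(wi * xi for wi, xi in zip(w, buf))
        e = mic[n] - echo_est          # residual = mic minus estimated echo
        norm = sum(x * x for x in buf) + eps
        w = [wi + (mu * e / norm) * xi for wi, xi in zip(w, buf)]
        out.append(e)
    return out

# Toy usage: the 'echo path' is a simple 0.6x gain, with no near-end voice,
# so the residual should shrink towards zero as the filter converges.
far = [((-1) ** n) * 0.8 for n in range(2000)]   # alternating test signal
mic = [0.6 * f for f in far]                     # pure echo of the far end
clean = nlms_aec(far, mic)
```

In a real device the same loop runs per audio frame, and the recovered residual (your voice, with the TV’s audio removed) is what gets passed on to the wake-word detector.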

As voice starts to move beyond smart-speakers and into the living room, developers are having to figure out how best to build a voice interface into a smart TV or set-top box. One of the common questions we hear is whether to embed the DSP on a separate voice processor (chip) or run DSP algorithms on the Applications Processor.

Should you run DSP algorithms on the Applications Processor?

Most consumer electronics devices are built around an Applications Processor. Put simply, the more powerful the processor, the quicker your programs, apps, games and features will appear. As a developer, you may choose to simply execute the DSP algorithms on the Applications Processor (host processor). At first glance, this seems cost effective and easy to integrate – primarily because there’s no additional chip to purchase and integrate. However, there are some significant downsides to this approach that developers need to consider.

  • Adverse impact on capacity: Because the host processor handles the core system processes, it’s one of the most expensive elements of the electrical design. The more powerful the host processor, the more tasks it can handle – but in turn, it’ll cost more, consume more power and require more space. As a developer, you’ll want the cheapest processor that’s capable of running all the core functions, with minimal power consumption. Adding DSP algorithms therefore imposes an extra processing load that burdens the chip and takes up capacity that could otherwise be used for core functions.
  • Bill of Materials (BoM): This will be pushed up beyond original estimates as additional components will be required to support the integrations (eg microphone aggregator).
  • Performance risk: The DSP algorithms will be constrained by the capacity that’s available on the host processor and performance may be compromised.
  • Integration complexity: Adding algorithms onto the host processor puts all of the integration demands onto the software team and can rapidly increase the cost of development. It can also create challenges in delivering within the real-time constraints needed to produce a glitch-free audio stream, without increasing the latency of the system. Further challenges may arise in the future around in-field updates and whether there’s sufficient capacity to run the update on the host processor.

How does that compare with running DSP algorithms on a separate chip?

A standalone DSP chip solution offers some compelling advantages over licensing DSP algorithms and integrating them into the host processor.

  • Transfers work away from the host processor: Running the DSP on a separate chip keeps the host processor free for core functions – and avoids adding to the software team’s workload.
  • Easy to integrate: A ringfenced solution still needs to be planned into the electrical design, but an external DSP lets you use standard hardware interfaces (such as I2S or USB for connectivity), which simplifies the integration task significantly. A separate chip also ensures there are no dependencies between the code on the DSP chip and the code on the host processor; there’s simply an API that delivers processed voice samples in an uninterrupted stream.
  • Future-proof solution: You benefit from the latest developments in voice technology; plus, in-field software releases are delivered easily via firmware update.
  • Accelerated time to market: A DSP chip offers a plug and play solution which separates the voice-capture solution from the rest of the TV electronic design, enabling developers to deliver a built-in voice interface rapidly.

Choosing the right far-field voice interface for your TV or set-top box is a key decision for your company. A separate voice processor such as XMOS’ VocalFusion often provides a more flexible and cost-effective solution over the complete lifecycle of a TV or set-top box. It reduces project risk, minimises dependencies between software functions and avoids burdening the host processor.

XMOS solutions are cost-effective and offer the flexibility to remove additional costs from your system design. Find out more about our voice solutions here. Or get in touch with one of our sales team here.

We’re here to help you transform the way people find and enjoy content through your products.