
Voice Recognition to Drive More ‘Conversational’ Platforms


LONDON — Two years ago, Gartner predicted that 30 percent of our interactions with technology in 2018 would happen through conversations with voice-based systems. Last month, an analyst predicted that Amazon’s Alexa will drive $10 billion in sales by 2020.

Tymphany and XMOS to showcase new soundbar at IFA with Alexa Built-In

Tymphany, a global premier audio ODM, and XMOS, a leading supplier of advanced embedded voice solutions, have announced their latest project: a new Alexa Built-In soundbar that will be demonstrated at Amazon’s exhibit at IFA in Berlin, 31 August to 5 September 2018.

The Amazon Alexa Built-In soundbar combines the XMOS Vocal Fusion™ far-field voice processor with Tymphany’s acoustic expertise to create an immersive audio experience. Low-frequency bass extension, powered by Tymphany’s patented slim-profile GBS subwoofers, allows for smooth frequency response down to 50 Hz. Using Tymphany’s A113D SOM and XMOS’ XVF3500 stereo-AEC voice processor, the soundbar connects to the Amazon Alexa Voice Service (AVS) via WiFi and delivers excellent far-field voice capture and barge-in performance from across the room, even when the soundbar is playing content at high volume and commands are spoken softly. The soundbar will include Amazon’s signature LED light ring response and buttons, as well as support for Amazon Prime Music and TuneIn.

“We’re proud to have helped bring this project to life,” said Mark Lippett, CEO of XMOS. “This new soundbar showcases the innovation, dedication and collaboration of the international partners involved.”

With extensive resources and a global engineering team, Tymphany has deep expertise and excellent partnerships across the entire signal chain, enabling it to deliver products that offer unique combinations of user experiences and features not achieved by other audio ODMs.

“Tymphany and XMOS have created a soundbar that enhances the listening experience by providing simple, hands-free interaction with Alexa,” said Priya Abani, Director of the Alexa Voice Service. “We’re excited to have a new solution that brings rich voice capabilities to our customers.”

“We are extremely excited to introduce the Tymphany Alexa Built-In soundbar this year at IFA 2018. We are proud of our team’s efforts to realize this complex, groundbreaking and great-performing soundbar reference design. We would not have been able to achieve this milestone without the awesome support of the Amazon Alexa Voice Service and XMOS development teams,” said Chris von Hellermann, Senior Director of Technology Management at Tymphany.

You can see and hear this demo and more at the Amazon booth in Hall 26, booth 201 at IFA this August 31st to September 5th in Berlin.

About Tymphany

Tymphany designs and manufactures some of the most innovative consumer and professional audio systems on the market. The company also sells a full line of speaker drivers under the Peerless by Tymphany brand. The company’s roots go back to 1926, when Peerless was founded in Denmark. In the 90+ years since, Tymphany has grown to be a global premier audio ODM with over 6,000 employees around the world. For more information visit

About XMOS

XMOS is a leading supplier of voice and audio solutions to the consumer electronics market. Unique silicon architecture and highly differentiated software position XMOS at the interface between voice processing, biometrics and artificial intelligence. For more information, please visit, or email

Want to develop a voice-enabled device that can hear across the room?

You’ll need the right acoustic echo cancellation (AEC) solution.

If you’re designing a voice-enabled product for the smart home that includes a loudspeaker, you’ll need to remove the acoustic echo it generates. Only then can a user interrupt the audio stream (known as barge-in) and give a voice command, such as adjusting the volume, while the device is playing.

Mono or stereo?

For products such as security solutions or kitchen appliances, and many smart speakers, mono-AEC is usually the right tool for the job. But if you’re designing products that output true stereo audio, for example TVs, soundbars and media streamers, then you’ll need stereo-AEC to secure the best performance available. Here’s why …

Acoustic echo cancellation explained

Acoustic echo cancellation is a digital signal processing technique for removing echo that originates from a loudspeaker. Within a device, there’s a direct path between the loudspeaker and microphones. There’s also an indirect path between the two, because the audio signal reflects off the walls and other surfaces before it reaches the microphone. Put simply, you’ll get a reflection off the ceiling, floor, each wall and every solid object in the room. These reflections are known as indirect acoustic echo and they’re picked up at different times by the microphone, depending on the length of path from the loudspeaker to the microphone.
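To make the idea of direct and indirect paths concrete, here is a minimal sketch that turns path lengths into arrival delays and builds a simple echo impulse response from them. The speed of sound (343 m/s), the 16 kHz sample rate, and the three path lengths and gains are all assumptions chosen for illustration; a real room produces far more reflections.

```python
SPEED_OF_SOUND_M_S = 343.0   # assumption: dry air at roughly 20 degrees C
SAMPLE_RATE_HZ = 16_000      # assumption: a typical voice capture rate


def delay_in_samples(path_length_m, sample_rate_hz=SAMPLE_RATE_HZ):
    """Number of samples that elapse while sound travels path_length_m."""
    return round(path_length_m / SPEED_OF_SOUND_M_S * sample_rate_hz)


def echo_impulse_response(paths):
    """Build an impulse response from (path_length_m, gain) pairs."""
    taps = [(delay_in_samples(length), gain) for length, gain in paths]
    response = [0.0] * (max(delay for delay, _ in taps) + 1)
    for delay, gain in taps:
        response[delay] += gain
    return response


def convolve(signal, response):
    """Plain convolution: what the microphone hears for a loudspeaker signal."""
    out = [0.0] * (len(signal) + len(response) - 1)
    for i, sample in enumerate(signal):
        if sample == 0.0:
            continue
        for j, tap in enumerate(response):
            out[i + j] += sample * tap
    return out


# Hypothetical paths: a short direct path, one wall reflection, one longer bounce.
PATHS = [(0.10, 1.0), (3.43, 0.4), (6.86, 0.2)]

response = echo_impulse_response(PATHS)
captured = convolve([1.0], response)   # a single click from the loudspeaker

for length, gain in PATHS:
    print(f"{length:5.2f} m path arrives after {delay_in_samples(length)} samples, gain {gain}")
```

With these assumed numbers the 3.43 m reflection arrives 160 samples (10 ms) after the click leaves the loudspeaker and the 6.86 m reflection arrives after 320 samples (20 ms), which is why the microphone picks up each reflection at a different time.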

If we look at a soundwave generated by a noise from the loudspeaker, the original sound can usually be identified at the beginning, and then the soundwave tails off as the energy decays over successive reflections.

To support barge-in and capture a clear voice stream to send to your automatic speech recognition (ASR) service, you need to remove as much echo from the captured microphone signal as possible.

It’s not possible to remove 100% of the echo because the time needed to capture the signal and separate out all of the echo would lead to a delayed response, and the user experience demands that this all happens in real time. So in practice, you’re looking to target an “acceptable” level of echo cancellation that allows the ASR to respond accurately.
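One rough way to picture why an “acceptable” target is more practical than complete removal is to assume an idealised room in which echo energy decays exponentially, falling by 60 dB over a reverberation time (often called RT60). Under that assumption, which is ours and not part of any particular product, a canceller that models only the first part of the echo tail leaves the later reflections behind. The function names and the 400 ms reverberation time below are hypothetical.

```python
import math


def tail_energy_fraction(window_ms, rt60_ms):
    """Fraction of echo energy that arrives after the modelled window ends.

    Assumes an idealised exponential decay: energy drops 60 dB over rt60_ms.
    """
    return 10.0 ** (-6.0 * window_ms / rt60_ms)


def best_case_cancellation_db(window_ms, rt60_ms):
    """Upper bound on echo reduction when only window_ms of tail is modelled."""
    return -10.0 * math.log10(tail_energy_fraction(window_ms, rt60_ms))


RT60_MS = 400.0   # hypothetical living-room reverberation time

for window_ms in (50.0, 100.0, 200.0):
    bound = best_case_cancellation_db(window_ms, RT60_MS)
    print(f"{window_ms:5.0f} ms of tail modelled -> at most {bound:4.1f} dB of echo removed")
```

In this simplified model, every doubling of the modelled tail doubles the achievable cancellation in dB, but it also doubles the work the canceller does for each sample, so the design question becomes how much cancellation the ASR actually needs.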

Types of acoustic echo cancellers

Echo cancellers are categorised by the number of loudspeaker reference channels supported. Common configurations are either: mono – 1-channel, or true stereo – 2-channel. Another configuration – pseudo-stereo – behaves in a very similar way to mono, but has some significant performance issues when challenged with true stereo audio output.


Mono-AEC

Mono-AEC uses a single reference signal based on the audio input and applies it to the output, which can be one or more loudspeakers.

The Digital Signal Processor uses the reference signal to calculate indirect echo based on the time it takes the reflections to reach the microphone.

Where signal processing has been used to give the impression of a stereo system from a mono signal (e.g. by adjusting the signal pan and volume and outputting to two or more speakers), the calculation is still based on the single reference signal and the position of the loudspeakers relative to the microphone.
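The mono case can be sketched with a normalised least-mean-squares (NLMS) adaptive filter, a common textbook technique for echo cancellation; we are not claiming it is the algorithm used in any specific product. In this illustration one mono signal is panned to two loudspeakers with different gains, and a single reference is still enough because both outputs derive from the same signal. The echo paths, gains, filter length and step size are all hypothetical.

```python
import math
import random


def nlms_cancel(reference, mic, num_taps=8, step=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of `reference` from `mic`."""
    weights = [0.0] * num_taps
    history = [0.0] * num_taps          # recent reference samples, newest first
    residual = []
    for x, d in zip(reference, mic):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate            # what is left after cancellation
        norm = sum(h * h for h in history) + eps
        weights = [w + step * error * h / norm for w, h in zip(weights, history)]
        residual.append(error)
    return residual


def apply_path(signal, path):
    """Echo seen at the microphone: the signal filtered by an echo path."""
    return [sum(p * signal[n - k] for k, p in enumerate(path) if n - k >= 0)
            for n in range(len(signal))]


def erle_db(mic, residual):
    """Echo return loss enhancement: echo power before vs after, in dB."""
    before = sum(v * v for v in mic) + 1e-30
    after = sum(v * v for v in residual) + 1e-30
    return 10.0 * math.log10(before / after)


rng = random.Random(0)
mono = [rng.uniform(-1.0, 1.0) for _ in range(4000)]

# Hypothetical panning: the same mono signal at different levels per speaker.
left_out = [0.8 * s for s in mono]
right_out = [0.3 * s for s in mono]

# Hypothetical echo paths from each loudspeaker to the microphone.
PATH_LEFT = [0.0, 0.6, 0.0, -0.3]
PATH_RIGHT = [0.0, 0.0, 0.4, 0.0, 0.1]

mic = [a + b for a, b in zip(apply_path(left_out, PATH_LEFT),
                             apply_path(right_out, PATH_RIGHT))]
residual = nlms_cancel(mono, mic)
mono_erle = erle_db(mic[-500:], residual[-500:])
print(f"Echo reduction over the last 500 samples: {mono_erle:.1f} dB")
```

There is no talker and no background noise in this toy setup, so the filter converges almost perfectly; real rooms are far less forgiving.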

True Stereo-AEC

True stereo-AEC uses two separate reference signals based on the two-channel input.

Each reference signal is used to cancel the echo from its corresponding loudspeaker output.

True stereo-AEC requires almost twice the computational resources of a mono solution, and it requires very low latency within the system to keep all the echo cancellation synchronized within the required thresholds.
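As a sketch of where the extra cost comes from, the NLMS-style canceller below (a common textbook approach, assumed here for illustration) keeps one adaptive filter per reference channel, so each microphone sample needs two filters' worth of multiply-accumulate work. The echo paths, filter length and step size are hypothetical, and the left and right signals are independent noise to stand in for true stereo content.

```python
import math
import random


def stereo_nlms_cancel(ref_left, ref_right, mic, taps_per_channel=8, step=0.5, eps=1e-8):
    """Cancel echo using one adaptive filter per reference channel."""
    w_left = [0.0] * taps_per_channel
    w_right = [0.0] * taps_per_channel
    hist_left = [0.0] * taps_per_channel    # newest sample first
    hist_right = [0.0] * taps_per_channel
    residual = []
    for xl, xr, d in zip(ref_left, ref_right, mic):
        hist_left = [xl] + hist_left[:-1]
        hist_right = [xr] + hist_right[:-1]
        estimate = (sum(w * h for w, h in zip(w_left, hist_left))
                    + sum(w * h for w, h in zip(w_right, hist_right)))
        error = d - estimate
        norm = (sum(h * h for h in hist_left)
                + sum(h * h for h in hist_right) + eps)
        w_left = [w + step * error * h / norm for w, h in zip(w_left, hist_left)]
        w_right = [w + step * error * h / norm for w, h in zip(w_right, hist_right)]
        residual.append(error)
    return residual


def apply_path(signal, path):
    """Echo seen at the microphone: the signal filtered by an echo path."""
    return [sum(p * signal[n - k] for k, p in enumerate(path) if n - k >= 0)
            for n in range(len(signal))]


def erle_db(mic, residual):
    """Echo return loss enhancement: echo power before vs after, in dB."""
    before = sum(v * v for v in mic) + 1e-30
    after = sum(v * v for v in residual) + 1e-30
    return 10.0 * math.log10(before / after)


rng = random.Random(0)
left = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
right = [rng.uniform(-1.0, 1.0) for _ in range(4000)]   # independent content

# Hypothetical echo paths from each loudspeaker to the microphone.
PATH_LEFT = [0.0, 0.6, 0.0, -0.3]
PATH_RIGHT = [0.0, 0.0, 0.4, 0.0, 0.1]

mic = [a + b for a, b in zip(apply_path(left, PATH_LEFT),
                             apply_path(right, PATH_RIGHT))]
residual = stereo_nlms_cancel(left, right, mic)
stereo_erle = erle_db(mic[-500:], residual[-500:])
print(f"Echo reduction over the last 500 samples: {stereo_erle:.1f} dB")
```

Both reference streams must stay sample-aligned with what the loudspeakers actually play; if one drifts, its filter is trying to cancel echo from the wrong moment in time.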


Pseudo-stereo AEC

A pseudo-stereo solution is similar to a mono-AEC configuration; it outputs the two audio streams to separate speakers but uses a single reference signal that is a mix of the two inputs.

The mixed reference signal is then applied to each loudspeaker output.

Problems arise when the mixed signal differs significantly from the two output channels, for example when a loud track plays on one loudspeaker and a quiet one on the other, so that the mixed reference signal is not representative of either input signal.

In the example above, the amplitude of the reference signal is significantly larger than the output for Input A. This drowns out the signal, leading to a very low signal-to-noise ratio for the voice capture process. With Input B, there is not enough echo cancellation when the input is loud, which increases artefacts in the captured voice stream and makes inaccurate word recognition more likely.
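The effect of an unrepresentative mixed reference can be illustrated with a small experiment. This is a sketch under assumptions, not a model of any real product: it uses a textbook NLMS filter, hypothetical two-tap echo paths, and white noise for audio. The same mixed-reference canceller is run twice, once with identical content on both channels and once with independent content on each.

```python
import math
import random


def nlms_cancel(reference, mic, num_taps=8, step=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of `reference` from `mic`."""
    weights = [0.0] * num_taps
    history = [0.0] * num_taps          # recent reference samples, newest first
    residual = []
    for x, d in zip(reference, mic):
        history = [x] + history[:-1]
        error = d - sum(w * h for w, h in zip(weights, history))
        norm = sum(h * h for h in history) + eps
        weights = [w + step * error * h / norm for w, h in zip(weights, history)]
        residual.append(error)
    return residual


def apply_path(signal, path):
    """Echo seen at the microphone: the signal filtered by an echo path."""
    return [sum(p * signal[n - k] for k, p in enumerate(path) if n - k >= 0)
            for n in range(len(signal))]


def erle_db(mic, residual):
    """Echo return loss enhancement: echo power before vs after, in dB."""
    before = sum(v * v for v in mic) + 1e-30
    after = sum(v * v for v in residual) + 1e-30
    return 10.0 * math.log10(before / after)


# Hypothetical echo paths; each loudspeaker reaches the microphone differently.
PATH_LEFT = [0.6, -0.3]
PATH_RIGHT = [0.2, 0.5]


def pseudo_stereo_erle(left, right):
    """Echo reduction achieved with a single reference mixed from both channels."""
    mic = [a + b for a, b in zip(apply_path(left, PATH_LEFT),
                                 apply_path(right, PATH_RIGHT))]
    mixed_reference = [(l + r) / 2.0 for l, r in zip(left, right)]
    residual = nlms_cancel(mixed_reference, mic)
    return erle_db(mic[-500:], residual[-500:])


rng = random.Random(1)
left = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
right_independent = [rng.uniform(-1.0, 1.0) for _ in range(4000)]

erle_similar = pseudo_stereo_erle(left, left)                  # both channels identical
erle_different = pseudo_stereo_erle(left, right_independent)   # true stereo content

print(f"Identical channels:   {erle_similar:.1f} dB of echo removed")
print(f"Independent channels: {erle_different:.1f} dB of echo removed")
```

When both channels carry the same content, the mixed reference describes exactly what each loudspeaker plays and the echo is cancelled well. With independent content, the single filter cannot separate the two echo paths, and most of the echo stays in the microphone signal.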

Choosing the right acoustic echo cancellation solution

The starting point is to decide which acoustic echo canceller you need for your microphone array and audio subsystem.

Using a mono-AEC algorithm with a true stereo device will only work if both channels are very similar. If your stereo product uses the full capabilities of stereo audio with spatial soundscape and dramatic volume changes, then the only solution is one that supports true stereo-AEC.

For devices like smart speakers, where the required range of output is more limited, a pseudo-stereo solution may be a good fit. And for things like kitchen appliances, where high-quality audio isn’t required, mono-AEC is ideal.

XMOS has a range of solutions to fit whatever product you’re developing. Our XVF3000 series with mono-AEC is ideal for developers of smart panels and smart speakers, while our XVF3500 series with two-channel stereo-AEC delivers outstanding performance for smart TVs, soundbars and other products that play back true stereo audio.

by Huw Geddes