Voice Processing Pipeline

Overview and Key Features

The XVF3800 integrates a set of advanced Digital Signal Processing (DSP) algorithms that include Acoustic Echo Cancellation (AEC), beamforming, dereverberation, noise suppression and automatic gain control. Together these algorithms deliver a high speech-to-noise ratio and natural-sounding speech, and eliminate acoustic echo while maintaining a transparent, low-latency communication link.

The key features of the XVF3800 solution are:

  • High levels of Acoustic Echo Cancellation and Suppression in conferencing and living room conditions.

  • State-of-the-art, robust, and natural double-talk / full-duplex performance.

  • High speech clarity even when users are several meters away, without requiring directional microphones.

  • Fast adaptive beamforming for tracking multiple near-end users.

  • Stationary / diffuse noise suppression.

  • Automatic gain control.

Main Functional Blocks

A high-level diagram of the solution is shown below.

../../_images/Pipeline_High_Level_Block_Diagram.png

Fig. 3 Voice processing pipeline

Microphone Inputs

The XVF3800 captures voice signals through four digital microphones and converts them from Pulse Density Modulation (PDM) to Pulse Code Modulation (PCM). It passes the converted signals to the voice pipeline, along with the far-end signal that is played on the loudspeaker after passing through a Digital-to-Analog Converter (DAC) and amplifier.
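
The PDM-to-PCM step can be pictured as low-pass filtering a 1-bit stream and then decimating it. The following is a minimal sketch only: it assumes a plain block average as the filter and a hypothetical decimation factor of 64, neither of which is taken from the XVF3800 design, and the function name `pdm_to_pcm` is invented for illustration.

```python
def pdm_to_pcm(pdm_bits, decimation=64):
    """Convert a 1-bit PDM stream (values 0 or 1) to PCM samples in [-1.0, 1.0].

    Each output sample is the average of `decimation` consecutive bits,
    mapped from bit density (0..1) to a signed amplitude (-1..1).
    The block average is an assumed, deliberately crude low-pass filter.
    """
    pcm = []
    for start in range(0, len(pdm_bits) - decimation + 1, decimation):
        block = pdm_bits[start:start + decimation]
        density = sum(block) / decimation
        pcm.append(2.0 * density - 1.0)
    return pcm


# A stream of all ones is full-scale positive, alternating bits are mid-scale.
print(pdm_to_pcm([1] * 64))        # [1.0]
print(pdm_to_pcm([1, 0] * 64))     # [0.0, 0.0]
```

A production decimator would normally use a multi-stage filter with much better stop-band rejection than a block average; the sketch only shows the relationship between bit density and sample amplitude.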

Acoustic Echo Canceller

The first stage of the processing pipeline is the Acoustic Echo Canceller (AEC), which uses an adaptive filter to remove the echoes of the far-end signal from the microphone signals. Each of the microphone signals is processed independently and the output of the AEC is fed into the beamformer.

At startup the AEC calibrates the adaptive filters to match the acoustic path between the loudspeaker and the microphones. This requires some far-end audio content to provide a signal to the device. If the AEC detects a significant change to the acoustic path during operation, for example if the device is moved, it will initiate a re-convergence operation.
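
To illustrate how an adaptive filter can learn the loudspeaker-to-microphone path from far-end audio, here is a sketch using the textbook Normalised Least Mean Squares (NLMS) update. This is an assumption for illustration: the source does not state which adaptation algorithm the XVF3800 uses, and the echo path, tap count and step size below are hypothetical.

```python
import random


def nlms_echo_cancel(reference, mic, taps=8, mu=0.5, eps=1e-6):
    """Return the residual (mic minus estimated echo) using an NLMS filter."""
    weights = [0.0] * taps
    history = [0.0] * taps              # recent reference samples, newest first
    residual = []
    for x, d in zip(reference, mic):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate            # what is left after removing the echo
        norm = sum(h * h for h in history) + eps
        step = mu * error / norm        # step size normalised by input power
        weights = [w + step * h for w, h in zip(weights, history)]
        residual.append(error)
    return residual


def power(signal):
    return sum(s * s for s in signal) / len(signal)


random.seed(0)
far_end = [random.uniform(-1.0, 1.0) for _ in range(4000)]
echo_path = [0.6, 0.3, -0.2, 0.1]       # hypothetical acoustic impulse response
mic = [
    sum(echo_path[k] * far_end[n - k] for k in range(len(echo_path)) if n - k >= 0)
    for n in range(len(far_end))
]
residual = nlms_echo_cancel(far_end, mic)

# Early samples still contain echo; late samples are almost echo-free.
print(power(residual[:50]) > power(residual[-500:]))
```

The filter can only adapt while the far-end signal is present, which is why far-end audio content is needed before the echo path is matched. If `echo_path` changed partway through the run, the residual would rise again until the weights re-converged.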

Beamformer

The beamformer block processes the AEC output signals to select the desired speaker. The beamformer contains a set of adaptive filters that coherently add signals from the four microphones to select sounds from a specific direction. This operation enhances the speech-to-noise ratio in a specific direction and simultaneously reduces the effects of point noise sources and reverberation.
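
The idea of coherently adding microphone signals can be shown with a fixed delay-and-sum beamformer. This is a simplified stand-in, not the adaptive filter design described above: it assumes integer sample delays, and the arrival delays and test tone are hypothetical.

```python
import math


def delay_and_sum(channels, delays):
    """Align each channel by its integer sample delay, then average.

    channels: equal-length sample lists, one per microphone.
    delays:   samples by which the wanted source arrives late at each mic.
    """
    length = len(channels[0])
    out = []
    for n in range(length):
        total = 0.0
        for channel, delay in zip(channels, delays):
            index = n + delay           # read ahead to undo the arrival delay
            if index < length:
                total += channel[index]
        out.append(total / len(channels))
    return out


def energy(signal):
    return sum(s * s for s in signal)


length = 64
source = [math.sin(2.0 * math.pi * n / 8.0) for n in range(length)]
arrival = [0, 1, 2, 3]                  # hypothetical per-microphone delays
mics = [[0.0] * d + source[:length - d] for d in arrival]

steered = delay_and_sum(mics, arrival)          # beam aimed at the source
unsteered = delay_and_sum(mics, [0, 0, 0, 0])   # beam aimed elsewhere

print(energy(steered[8:56]) > energy(unsteered[8:56]))   # True
```

When the steering delays match the arrival delays the four copies add in phase and the source is recovered at full level. With mismatched delays the copies partly cancel, which is how sound from other directions is attenuated.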

The XVF3800 implements three beams: one free-running beam that scans the environment for new speakers, and two focused beams that can track individual speakers. The final stage of the pipeline automatically selects which beam to use as the output from the device.

../../_images/Beamformer.png

Fig. 4 Beamformer Operation

It is possible to access information on the selected beams from the XVF3800 control interface. The device provides a Direction of Arrival (DoA) measurement indicating the direction of the selected beam.

Post Processor

Outputs from the beamformer are fed to the post-processing stage, which further reduces reverberation and suppresses diffuse and point noise sources. This is followed by a gain control block which ensures a consistent output level regardless of the distance of the speaker from the microphone. The final output is passed through a limiter to ensure that very loud signals do not overload the output.
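
The gain control and limiter stages can be sketched as a frame-based automatic gain control followed by a hard clip. All constants here (target level, frame size, maximum gain, smoothing factor, ceiling) are hypothetical values chosen for illustration, and the hard clip is an assumed simplification of a real limiter.

```python
import math


def agc_and_limit(samples, target_rms=0.1, frame=256, max_gain=30.0,
                  smoothing=0.5, ceiling=0.9):
    """Steer each frame towards `target_rms`, then clip to +/- `ceiling`."""
    out = []
    gain = 1.0
    for start in range(0, len(samples), frame):
        block = samples[start:start + frame]
        rms = math.sqrt(sum(s * s for s in block) / len(block))
        if rms > 1e-6:                          # hold the gain during silence
            wanted = min(target_rms / rms, max_gain)
            gain = smoothing * gain + (1.0 - smoothing) * wanted
        for s in block:
            y = s * gain
            out.append(max(-ceiling, min(ceiling, y)))   # hard limiter
    return out


def rms_of(signal):
    return math.sqrt(sum(s * s for s in signal) / len(signal))


# A distant (quiet) and a close (loud) talker end up at a similar level.
quiet = [0.01 * math.sin(2.0 * math.pi * n / 16.0) for n in range(256 * 30)]
loud = [0.5 * math.sin(2.0 * math.pi * n / 16.0) for n in range(256 * 30)]
print(round(rms_of(agc_and_limit(quiet)[-256:]), 3))   # 0.1
print(round(rms_of(agc_and_limit(loud)[-256:]), 3))    # 0.1
```

The smoothing factor trades responsiveness against audible gain pumping, and the limiter catches peaks that arrive before the gain has had time to fall.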

The output from this pipeline is an enhanced speech signal of the desired near-end speech without echo and reverberation.

Input and Output

The XVF3800 uses a standard I2S audio interface to transport audio to and from the host system. The XVF3800 supports I2S sample rates of 16 kHz or 48 kHz. Both input and output use the same rate.

The audio pipeline processes data at a sample rate of 16 kHz, so if the 48 kHz I2S rate is used, a Sample Rate Converter block is introduced into the signal path to adapt the rates. The sample rates are set in the firmware and cannot be changed during operation of the device.
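
Converting from 48 kHz to 16 kHz is a 3:1 rate change: the signal is low-pass filtered to below the new Nyquist frequency and then every third sample is kept. The sketch below assumes a 31-tap Hamming-windowed sinc filter, which is an illustrative choice and not the converter used in the XVF3800 firmware.

```python
import math


def lowpass_taps(num_taps=31, cutoff=1.0 / 6.0):
    """Windowed-sinc low-pass FIR; cutoff is in cycles per input sample."""
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        t = n - mid
        if t == 0:
            ideal = 2.0 * cutoff
        else:
            ideal = math.sin(2.0 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]        # normalise to unity gain at DC


def downsample_48k_to_16k(samples):
    """Filter at the 48 kHz rate and keep every third output sample."""
    taps = lowpass_taps()
    out = []
    for n in range(0, len(samples), 3):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * samples[n - k]
        out.append(acc)
    return out


# 10 ms of audio at 48 kHz (480 samples) becomes 10 ms at 16 kHz (160 samples).
print(len(downsample_48k_to_16k([0.0] * 480)))   # 160
```

A cutoff of 1/6 cycles per sample corresponds to 8 kHz at the 48 kHz input rate, the Nyquist frequency of the 16 kHz output. The reverse direction (16 kHz to 48 kHz) would insert zeros and filter with a similar low-pass response.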

A far-end AEC reference signal must be provided on the left (0) channel of the I2S input. Data on the right channel is ignored.
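
On the host side, this means a mono far-end signal has to be placed in the left slot of each stereo frame. A minimal sketch follows, assuming the host audio driver accepts interleaved left/right samples; the helper name `make_i2s_frames` is hypothetical and no particular driver API is implied.

```python
def make_i2s_frames(reference):
    """Interleave a mono far-end reference into left/right stereo samples."""
    frames = []
    for sample in reference:
        frames.append(sample)   # left (channel 0): AEC reference
        frames.append(0)        # right (channel 1): ignored by the device
    return frames


print(make_i2s_frames([100, -200, 300]))   # [100, 0, -200, 0, 300, 0]
```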

Key parameters

Table 2 Pipeline parameters

Parameter               Value                Notes
----------------------  -------------------  ----------------------------------------------
Microphones             4 x PDM              e.g. Infineon IM69D130
Microphone alignment    +/- 2 dB
Geometry                Linear or Square
Frequency range         80 Hz to 8 kHz
Sampling Rate           16 kHz
AEC tail length         192 ms
AEC reference channels  1 mono               Output to DAC
Double Talk Detection   Continuous
Reference delay         0 to 500 ms (fixed)  Align mic & ref signal
Number of beams         3                    2 focused + 1 scanning
Beamformer angle        360 degrees
Noise suppression       up to 25 dB          depending on input SNR
Operating distance      0.3 m to 5 m
Beamformer update time  16 ms
Input delay             min 72 ms            Mic In to I2S out
Output delay            typ 50 ms            If far-end processing on device is implemented
I2S rate                16 kHz or 48 kHz     Firmware options
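
The time-based parameters in Table 2 can be expressed as sample counts at the 16 kHz pipeline rate, which is often more convenient when sizing buffers on the host. The helper below is a small worked calculation; its name is hypothetical and the results say nothing about how the device implements these stages internally.

```python
SAMPLE_RATE_HZ = 16_000   # pipeline sampling rate from Table 2


def ms_to_samples(milliseconds, rate_hz=SAMPLE_RATE_HZ):
    """Number of samples that span `milliseconds` at `rate_hz`."""
    return round(milliseconds * rate_hz / 1000)


print(ms_to_samples(192))   # AEC tail length          -> 3072
print(ms_to_samples(16))    # beamformer update time   -> 256
print(ms_to_samples(500))   # maximum reference delay  -> 8000
print(ms_to_samples(72))    # minimum input delay      -> 1152
```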