Voice Processing Pipeline#
Overview and Key Features#
The XVF3800 integrates a set of advanced Digital Signal Processing (DSP) algorithms that include Acoustic Echo Cancellation (AEC), beamforming, dereverberation, noise suppression and automatic gain control. Together, these algorithms deliver a high speech-to-noise ratio and natural-sounding speech, and eliminate acoustic echo while maintaining a transparent, low-latency communication link.
The key features of the XVF3800 solution are:
- High levels of Acoustic Echo Cancellation and Suppression in conferencing and living room conditions.
- State-of-the-art, robust, and natural double-talk / full-duplex performance.
- High speech clarity even when users are several meters away, without requiring directional microphones.
- Fast adaptive beamforming for tracking multiple near-end users.
- Stationary / diffuse noise suppression.
- Automatic gain control.
Main Functional Blocks#
A high-level diagram of the solution is shown below.
Microphone Inputs#
The XVF3800 captures voice signals through four digital microphones and converts them from Pulse Density Modulation (PDM) to Pulse Code Modulation (PCM). It passes the converted signals to the voice pipeline, together with the far-end signal, which is also played on the loudspeaker after passing through a Digital-to-Analog Converter (DAC) and amplifier.
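To illustrate what a PDM-to-PCM conversion involves, the sketch below encodes a signal as a 1-bit stream with a first-order sigma-delta modulator and recovers PCM values by averaging blocks of bits. The function names, the decimation factor of 64 and the boxcar averaging are assumptions chosen for illustration. This document does not describe the XVF3800's decimation filters, and practical designs generally use multi-stage low-pass filters rather than a plain average.

```python
def pdm_encode(samples):
    """First-order sigma-delta modulator: floats in [-1, 1] -> list of 0/1 bits."""
    bits = []
    integrator = 0.0
    feedback = 0.0
    for x in samples:
        integrator += x - feedback          # accumulate input minus fed-back output
        bit = 1 if integrator >= 0.0 else 0
        feedback = 1.0 if bit else -1.0     # 1-bit DAC in the feedback path
        bits.append(bit)
    return bits


def pdm_to_pcm(bits, decimation=64):
    """Average each block of `decimation` bits into one PCM value in [-1, 1]."""
    pcm = []
    for start in range(0, len(bits) - decimation + 1, decimation):
        ones = sum(bits[start:start + decimation])
        pcm.append(2.0 * ones / decimation - 1.0)   # density of ones -> amplitude
    return pcm


# A constant input of 0.5 produces a bit stream in which about 75 % of bits are ones.
pcm = pdm_to_pcm(pdm_encode([0.5] * 256))
```

The density of ones in the bit stream carries the signal amplitude, which is why a low-pass (averaging) operation followed by decimation is enough to recover PCM samples.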
Acoustic Echo Canceller#
The first stage of the processing pipeline is the Acoustic Echo Canceller (AEC), which uses an adaptive filter to remove the echoes of the far-end signal from the microphone signals. Each microphone signal is processed independently, and the output of the AEC is fed into the beamformer.
At startup, the AEC calibrates the adaptive filters to match the acoustic path between the loudspeaker and the microphones. This requires some far-end audio content to provide a signal to the device. If the AEC detects a significant change to the acoustic path during operation, e.g. if the device is moved, it will initiate a re-convergence operation.
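The adaptive-filter idea can be sketched with a normalized least-mean-squares (NLMS) filter, a common textbook technique. This is not a description of the XVF3800's actual AEC algorithm, which is not specified here. The function name, the 8-tap filter length, the step size and the three-tap echo path are all hypothetical values chosen to keep the example small.

```python
import random


def nlms_echo_cancel(mic, ref, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of `ref` from `mic` (NLMS)."""
    weights = [0.0] * taps      # current estimate of the echo path
    history = [0.0] * taps      # most recent reference samples, newest first
    residual = []
    for d, x in zip(mic, ref):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate                           # echo-cancelled output
        norm = sum(h * h for h in history) + eps        # reference energy
        step = mu * e / norm
        weights = [w + step * h for w, h in zip(weights, history)]
        residual.append(e)
    return residual, weights


rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
echo_path = [0.6, -0.3, 0.1]    # hypothetical loudspeaker-to-microphone response
mic = [sum(h * far_end[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
       for n in range(len(far_end))]

residual, weights = nlms_echo_cancel(mic, far_end)
```

The filter can only adapt while `far_end` carries signal, which mirrors the requirement above for far-end audio content during calibration. If `echo_path` changed part-way through the run, the residual would rise until the weights re-converged.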
Beamformer#
The beamformer block processes the AEC output signals to select the desired speaker. The beamformer contains a set of adaptive filters that coherently add signals from the four microphones to select sounds from a specific direction. This operation enhances the speech-to-noise ratio in that direction and simultaneously reduces the effects of point noise sources and reverberation.
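Coherent addition can be illustrated with a fixed delay-and-sum beamformer, which is a much simpler technique than the adaptive filters described above and is used here only as a sketch. The arrival delays, the 1 kHz test tone and the function names are hypothetical.

```python
import math


def delay_and_sum(channels, delays):
    """Advance each channel by its steering delay (in samples) and average."""
    usable = len(channels[0]) - max(delays)
    return [sum(ch[t + d] for ch, d in zip(channels, delays)) / len(channels)
            for t in range(usable)]


def mean_power(signal):
    return sum(x * x for x in signal) / len(signal)


# Hypothetical scene: a 1 kHz tone sampled at 16 kHz reaches the four
# microphones with arrival delays of 0, 4, 8 and 12 samples.
arrival = [0, 4, 8, 12]
tone = [math.sin(2 * math.pi * 1000 * n / 16000) for n in range(512)]
mics = [[0.0] * d + tone[:len(tone) - d] for d in arrival]

steered = delay_and_sum(mics, arrival)          # delays matched to the source
unsteered = delay_and_sum(mics, [0, 0, 0, 0])   # steered broadside, away from it
```

When the steering delays match the arrival delays, the four copies add in phase and the tone is preserved. With mismatched delays the copies partially or fully cancel, which is how a beam attenuates sound from other directions.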
The XVF3800 implements three beams: one free-running beam that scans the environment for new speakers, and two focused beams that can track individual speakers. The final stage of the pipeline automatically selects which beam to use as the output from the device.
It is possible to access information on the selected beams from the XVF3800 control interface. The device provides a Direction of Arrival (DoA) measurement indicating the direction of the selected beam.
Post Processor#
Outputs from the beamformer are fed to the post-processing stage, which further reduces reverberation and suppresses diffuse and point noise sources. This is followed by a gain control block, which ensures a consistent output level regardless of the distance of the speaker from the microphone. The final output is passed through a limiter to ensure that very loud signals do not overload the output.
The output of the pipeline is an enhanced signal containing the desired near-end speech without echo and reverberation.
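The gain control and limiter stages can be sketched as a frame-based automatic gain control followed by a hard clipper. This is a simplified illustration and not the device's algorithm. The target level, maximum gain, smoothing factor and ceiling are assumed values, and a production limiter would normally reduce gain smoothly rather than clip.

```python
import math


def agc_and_limit(frames, target_rms=0.1, max_gain=20.0, smoothing=0.9, ceiling=0.9):
    """Steer each frame towards `target_rms`, then clip samples to +/- `ceiling`."""
    gain = 1.0
    output = []
    for frame in frames:
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms > 1e-6:                                   # hold the gain during silence
            desired = min(target_rms / rms, max_gain)
            gain = smoothing * gain + (1.0 - smoothing) * desired
        output.append([max(-ceiling, min(ceiling, gain * x)) for x in frame])
    return output


def sine_frames(amplitude, count, frame_len=256):
    frame = [amplitude * math.sin(2 * math.pi * n / 16) for n in range(frame_len)]
    return [frame] * count


quiet = agc_and_limit(sine_frames(0.01, 100))   # distant talker: gain ramps up
loud = agc_and_limit(sine_frames(5.0, 5))       # overload: limiter caps the peaks
```

The smoothing factor trades responsiveness for stability: a value close to 1 changes the gain slowly, so the limiter is what protects the output during a sudden loud input.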
Input and Output#
The XVF3800 uses a standard I2S audio interface to transport audio to and from the host system. The XVF3800 supports I2S sample rates of 16 kHz or 48 kHz. Both input and output use the same rate.
The audio pipeline processes data at a sample rate of 16 kHz, so if 48 kHz inputs are used, a Sample Rate Converter block is inserted into the signal path to adapt the rates. The sample rates are set in the firmware and cannot be changed during operation of the device.
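Converting 48 kHz to 16 kHz is a reduction by a factor of three, which needs a low-pass filter before samples are discarded so that content above 8 kHz does not alias into the result. The sketch below uses a Hamming-windowed sinc filter. The 63-tap length, the cutoff and the function names are assumptions, and the device's actual Sample Rate Converter design is not described here.

```python
import math


def lowpass_taps(num_taps, cutoff):
    """Hamming-windowed sinc low-pass; `cutoff` is a fraction of the input rate."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        sinc = 2 * cutoff if k == 0 else math.sin(2 * math.pi * cutoff * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]        # normalise to unity gain at DC


def downsample_48k_to_16k(samples, taps):
    """Filter and keep every third output (taps are symmetric, so no reversal)."""
    return [sum(t * samples[start + i] for i, t in enumerate(taps))
            for start in range(0, len(samples) - len(taps) + 1, 3)]


taps = lowpass_taps(63, 0.15)               # cutoff of about 7.2 kHz at 48 kHz
tone_1k = [math.sin(2 * math.pi * 1000 * n / 48000) for n in range(480)]
tone_12k = [math.sin(2 * math.pi * 12000 * n / 48000) for n in range(480)]

kept = downsample_48k_to_16k(tone_1k, taps)        # in band: passes through
rejected = downsample_48k_to_16k(tone_12k, taps)   # would alias to 4 kHz: removed
```

Without the filter, the 12 kHz tone would reappear at 4 kHz after decimation, inside the speech band processed by the pipeline.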
A far-end AEC reference signal must be provided on the left (0) channel of the I2S input. Data on the right channel is ignored.
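On the host side, this channel assignment means a mono reference has to be placed in the left slot of every stereo frame. The sketch below assumes the host audio driver takes an interleaved buffer ordered left, right, left, right. That layout and the function names are assumptions about the host and are not part of the XVF3800 interface definition.

```python
def build_i2s_input(reference, right_fill=0):
    """Interleave mono reference samples as [L0, R0, L1, R1, ...]."""
    frames = []
    for sample in reference:
        frames.append(sample)       # left (channel 0): far-end AEC reference
        frames.append(right_fill)   # right (channel 1): ignored by the device
    return frames


def left_channel(interleaved):
    """Recover channel 0 from an interleaved stereo buffer."""
    return interleaved[0::2]


buffer = build_i2s_input([10, -20, 30])
```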
Key parameters#
Parameter | Value | Notes |
---|---|---|
Microphones | 4 × PDM | e.g. Infineon IM69D130 |
Microphone alignment | +/- 2 dB | |
Geometry | Linear or square | |
Frequency range | 80 Hz to 8 kHz | |
Sampling rate | 16 kHz | |
AEC tail length | 192 ms | |
AEC reference channels | 1 (mono) | Output to DAC |
Double-talk detection | Continuous | |
Reference delay | 0 to 500 ms (fixed) | Aligns microphone and reference signals |
Number of beams | 3 | 2 focused + 1 scanning |
Beamformer angle | 360 degrees | |
Noise suppression | Up to 25 dB | Depends on input SNR |
Operating distance | 0.3 m to 5 m | |
Beamformer update time | 16 ms | |
Input delay | min 72 ms | Microphone in to I2S out |
Output delay | typ 50 ms | If far-end processing on device is implemented |
I2S rate | 16 kHz or 48 kHz | Firmware options |
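The time-based parameters can be converted to sample counts at the 16 kHz pipeline rate, which is often the more convenient unit when sizing buffers. The helper below is a small sketch with hypothetical names. Reading the AEC tail length as a number of filter taps assumes a time-domain filter, which this document does not confirm.

```python
PIPELINE_RATE_HZ = 16_000


def ms_to_samples(duration_ms, rate_hz=PIPELINE_RATE_HZ):
    """Convert a duration in milliseconds to a whole number of samples."""
    return duration_ms * rate_hz // 1000


table_ms = {
    "AEC tail length": 192,
    "Beamformer update time": 16,
    "Maximum reference delay": 500,
    "Minimum input delay": 72,
    "Typical output delay": 50,
}
in_samples = {name: ms_to_samples(ms) for name, ms in table_ms.items()}
```

For example, the 192 ms tail corresponds to 3072 samples and the 16 ms beamformer update to 256 samples at 16 kHz.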