AEC Tuning and Integration Hints

Careful system design (acoustics, hardware quality, and algorithm placement) is essential for robust echo cancellation performance. Any non-linearities in the system, such as poor loudspeaker quality, an overloaded microphone, or mechanical resonances, can severely degrade AEC performance because they create signal components that the adaptive filter cannot properly model.

This section provides hints and guidelines to consider for an AEC-friendly design.

Number of AEC Processing Filters

In practical systems, an AEC block is required for every acoustic path from each loudspeaker to each microphone input channel.

Fig. 19: Adaptive voice frontend with two input channels and two output channels (with smart amplifiers). A total of four AEC blocks is needed. If the amplifiers/DACs do not provide reference signals, the far-end processor provides those signals internally.

For example:

A classic hands-free communication system with four microphones and one loudspeaker (= one reference channel) needs 4 AEC filters.
A system with two loudspeakers and four microphones feeding an adaptive beamformer requires 8 AEC filters (2 x 4 paths).
Two (identical) speakers driven by the same amplifier channel count as a single AEC reference channel.

For fixed beams, the echo canceller can be placed after the beamforming algorithm:

A system with two loudspeakers and a single fixed beamformer created from four (or any number of) microphones needs only 2 AEC blocks. Echo cancellation happens after beamforming.
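The counting rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic described in this section; the function name `aec_filter_count` and its parameters are hypothetical and are not part of any XMOS API.

```python
def aec_filter_count(num_reference_channels: int,
                     num_microphones: int,
                     num_fixed_beams: int = 0) -> int:
    """Estimate the number of AEC filters (hypothetical helper).

    One filter is needed per path from each reference channel to each
    AEC input. With an adaptive beamformer the AEC inputs are the
    microphones; with fixed beams the AEC runs after beamforming, so
    the AEC inputs are the beams.
    """
    if num_reference_channels < 0 or num_microphones < 1 or num_fixed_beams < 0:
        raise ValueError("channel counts must be non-negative, with at least one microphone")
    aec_inputs = num_fixed_beams if num_fixed_beams > 0 else num_microphones
    return num_reference_channels * aec_inputs


# The three examples from the text above.
print(aec_filter_count(1, 4))                     # hands-free, one loudspeaker
print(aec_filter_count(2, 4))                     # adaptive beamformer
print(aec_filter_count(2, 4, num_fixed_beams=1))  # single fixed beam
```

Loudspeakers that share one amplifier channel should be counted once in `num_reference_channels`.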

Microphone

MEMS microphones with a digital interface (PDM or I2S) are in general produced with tight tolerances and are thus recommended for any array application.
The noise floor should be much lower than any expected signal. Manufacturer specifications generally state the signal-to-noise ratio (SNR) as the difference between the output level for a 94 dBSPL acoustic signal and the noise floor. On this basis, an appropriate SNR would be at least 60 dB.
The sensitivity of an analog microphone chain (analog microphone plus ADC) should be chosen so that the digital output level is reasonable for the expected sound levels. For far-field voice, approximately -30 dBFS @ 94 dBSPL would be appropriate. With the SNR of 60 dB listed above, this means the noise floor is lower than -90 dBFS and signals up to 30 dB above 94 dBSPL (124 dBSPL) can be captured without clipping. If higher levels can be expected and the noise floor of the ADC is low, 0 dBFS should be calibrated to match the microphone's maximum SPL (acoustic overload point, AOP), e.g. 130 dBSPL = 0 dBFS, so 94 dBSPL = -36 dBFS.
Distortion should be low at the expected levels. For most applications, THD should be -40 dB (1 %) or lower in the level range of interest.
Maximum SPL: The levels from the playback channel of the device itself will potentially be louder than any speech signals, so the maximum level of the playback channel at the microphone positions should be used as the reference when considering distortion and the required maximum SPL capabilities.
The frequency response should cover the desired voice band. For a wideband application, a response from 100 Hz to 10 kHz should be easily obtainable.
For most applications the microphone directivity should be omnidirectional. All array processing algorithms assume omnidirectional microphones.
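The level arithmetic in the sensitivity and SNR items can be checked with a short Python sketch. The helper names below are hypothetical and chosen for illustration only; the numbers are the ones quoted in the list above.

```python
REFERENCE_SPL_DB = 94.0  # reference level used in microphone datasheets


def noise_floor_dbfs(sensitivity_dbfs_at_94: float, snr_db: float) -> float:
    """Digital noise floor implied by the sensitivity and the SNR."""
    return sensitivity_dbfs_at_94 - snr_db


def clipping_spl_db(sensitivity_dbfs_at_94: float) -> float:
    """SPL that maps to 0 dBFS, i.e. where digital clipping starts."""
    return REFERENCE_SPL_DB - sensitivity_dbfs_at_94


def dbfs_at_spl(spl_db: float, sensitivity_dbfs_at_94: float) -> float:
    """Digital level produced by a given SPL (linear region only)."""
    return sensitivity_dbfs_at_94 + (spl_db - REFERENCE_SPL_DB)


def thd_db_to_percent(thd_db: float) -> float:
    """Convert a THD figure in dB (amplitude ratio) to percent."""
    return 100.0 * 10.0 ** (thd_db / 20.0)


print(noise_floor_dbfs(-30.0, 60.0))  # far-field example: -90 dBFS
print(clipping_spl_db(-30.0))         # 124 dBSPL
print(dbfs_at_spl(94.0, -36.0))       # AOP-calibrated example: -36 dBFS
print(thd_db_to_percent(-40.0))       # about 1 %
```

The sketch ignores the microphone's own acoustic overload point, which may be reached before the digital full scale.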

Loudspeaker

The loudspeaker, together with the power amplifier/DAC or smart amplifier, forms the playback path of the product. The playback path should have the following characteristics:

The frequency response must cover the intended application. For a product intended for voice applications, it should cover the desired voice band, e.g. wideband should cover 250 Hz to 6.3 kHz. For a product that can also play music, the loudspeaker can have a wider frequency response; this should not degrade the performance of the XMOS voice pipeline. For example, a playback path running at a high sample rate (e.g. 48 kHz or 96 kHz) will work well with a 16 kHz sample rate in the voice processing path.
The distortion products of the playback path should be at least 30 dB below the (linear) output signal, and preferably lower. As with microphone distortion, this depends on level. If the distortion is sufficiently low at nominal playback levels, higher distortion at higher levels may be acceptable, although it will degrade the performance of the voice pipeline. When designing a speaker, it is often possible to trade frequency response for lower distortion. If an application needs good AEC at high SPL, always pick the speaker with the least distortion at high levels. Imperfections in the frequency response can be equalised by the integrated Far-End DSP.
After designing a housing/enclosure with the lowest possible amount of rattling and resonance, tuning the Far-End DSP is a key factor in getting good AEC at higher levels. Use (multiband) compression and notch filters to avoid frequencies and levels where high distortion or enclosure rattling will affect AEC performance.
The AEC reference signal should include all signal processing applied to the playback signal, so it should be tapped as close to the speaker as possible. If a smart amplifier with integrated dynamic processing is used, the AEC reference channel needs to be connected from the amplifier to the XCORE. Most smart amplifiers provide an AEC reference channel as an I2S source (output). The system should be designed such that the latency between the reference channel and the microphone signal does not vary.
If stereo playback is considered, the crosstalk between channels should be minimal; however, crosstalk should not affect the performance of the voice pipeline significantly.
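As an illustration of the notch filtering mentioned above, the following Python sketch computes biquad notch coefficients using the widely used Audio EQ Cookbook formulas and evaluates the resulting gain. It is not the Far-End DSP implementation; the 800 Hz rattle frequency, the Q of 10 and the 48 kHz playback rate are hypothetical values picked for the example.

```python
import cmath
import math


def notch_biquad(f0_hz: float, q: float, fs_hz: float):
    """Return normalised (b, a) coefficients of a second-order notch."""
    w0 = 2.0 * math.pi * f0_hz / fs_hz
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0)
    a = (1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a


def gain_db(b, a, f_hz: float, fs_hz: float) -> float:
    """Magnitude response of the biquad at f_hz, in dB."""
    z1 = cmath.exp(-1j * 2.0 * math.pi * f_hz / fs_hz)  # z^-1 on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    magnitude = abs(num / den)
    return 20.0 * math.log10(max(magnitude, 1e-12))  # floor avoids log(0)


# Hypothetical case: an enclosure rattle excited around 800 Hz.
FS_HZ = 48000.0
b, a = notch_biquad(f0_hz=800.0, q=10.0, fs_hz=FS_HZ)
for f in (400.0, 800.0, 1600.0):
    print(f"{f:7.1f} Hz: {gain_db(b, a, f, FS_HZ):8.2f} dB")
```

A higher Q narrows the notch, which removes less of the wanted playback signal but requires the problem frequency to be known more precisely.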

Housing/Enclosure

The geometry of the device housing plays a major role in acoustic echo cancellation, as it determines the strongest and most direct acoustic path from the loudspeaker to the microphones.

Microphone Placement

Where multiple microphones are used, the microphone spacing must be considered. The spacing requirements vary with the voice pipeline being designed for, but broadly the microphones should be neither too close together nor too far apart. A spacing of approximately 5-10 cm is normally appropriate.
The acoustic channel (sound inlet) to the microphone(s) should be kept as short as possible (a few millimetres). The internal volume of the microphone and the microphone port create a Helmholtz resonance, normally well above the target frequency range. Long acoustic paths can bring this resonance down into the operational frequency range and can cause severe distortion even at medium levels.
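The effect of port length on the Helmholtz resonance can be estimated with the classic lumped-element formula f = (c / 2π) · sqrt(A / (V · L_eff)). The Python sketch below is a rough first-order estimate only: the port radius, cavity volume and port lengths are hypothetical values, and the end correction of 1.7 times the radius assumes a port flanged at both ends. Real inlets with gaskets and PCB holes should be measured or simulated.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # air at roughly 20 degrees Celsius


def helmholtz_resonance_hz(port_radius_m: float,
                           port_length_m: float,
                           cavity_volume_m3: float) -> float:
    """First-order Helmholtz resonance of a cylindrical port plus cavity."""
    area = math.pi * port_radius_m ** 2
    # Assumed end correction for a port flanged at both ends.
    effective_length = port_length_m + 1.7 * port_radius_m
    return (SPEED_OF_SOUND_M_S / (2.0 * math.pi)) * math.sqrt(
        area / (cavity_volume_m3 * effective_length))


# Hypothetical geometry: 0.5 mm diameter port, 2 mm^3 front cavity.
short_port_hz = helmholtz_resonance_hz(0.25e-3, 1.0e-3, 2.0e-9)
long_port_hz = helmholtz_resonance_hz(0.25e-3, 5.0e-3, 2.0e-9)
print(f"1 mm port: {short_port_hz / 1000.0:.1f} kHz")
print(f"5 mm port: {long_port_hz / 1000.0:.1f} kHz")
```

With these assumed values, lengthening the port from 1 mm to 5 mm moves the estimated resonance from above to below 8 kHz, the Nyquist frequency of a 16 kHz voice path.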

Feedback Path Minimisation

The feedback path from the loudspeaker to the microphones should be minimised. Several possible paths are present:

The acoustic path outside of the enclosure can be minimised by increasing the distance between loudspeakers and microphones or by choosing a more directional loudspeaker.
The acoustic path inside the enclosure can be minimised by sealing the rear chambers behind the loudspeakers and microphones.
The vibration path in the physical structure of the enclosure can be minimised by mechanical design of the housing structure, choice of materials and mounting of the transducers.
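To get a feel for how much the external acoustic path can be reduced by distance alone, the inverse-distance law for a point source in free field gives a first estimate. The Python sketch below assumes ideal free-field conditions, which a real product on a table or near a wall will not meet; the function name and the example distances are hypothetical.

```python
import math


def extra_attenuation_db(old_distance_m: float, new_distance_m: float) -> float:
    """Level drop when moving a microphone from old to new distance.

    Assumes a point source in free field (inverse-distance law), so each
    doubling of distance lowers the direct sound by about 6 dB.
    """
    if old_distance_m <= 0.0 or new_distance_m <= 0.0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(new_distance_m / old_distance_m)


# Hypothetical example: moving the microphone from 5 cm to 10 cm.
print(f"{extra_attenuation_db(0.05, 0.10):.1f} dB")
```

Reflections and the structure-borne paths listed above are not covered by this estimate.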

Vibration and Noise

Vibration of loose parts in the product can cause rattles and buzzes, which create non-linearities and drastically reduce echo cancellation performance. Ensure all components, panels and connectors are fixed. A small effort spent on making the enclosure sturdy (e.g. added stiffeners or reinforcement ribs) can have a big effect on sound without increasing tooling or moulding costs.

Microphones should be prevented from picking up additional noise sources within the product, e.g. cooling fans. The same measures apply as for the loudspeaker feedback path.