Tuning the Application#
The measured performance of the XVF3800 depends very heavily on the electrical and acoustic environment of the end product that it is incorporated into. In order to achieve optimal performance, including the ability to pass product certification tests, it is necessary to perform a configuration and tuning process to adapt the firmware to the end product’s form factor and hardware design.
The majority of this configuration is intended to ensure optimal performance of the XVF3800 audio pipeline, including the behaviour of the Adaptive Echo Canceller (AEC).
The full set of configurable parameters for the XVF3800 is given in the Appendix.
This chapter makes heavy use of the xvf_host application to control configuration parameters at run-time. For further documentation on this utility, please see the section Using the Host Application. Throughout this document, the -u [i2c|spi] parameter to this utility will be omitted for brevity.
To facilitate the tuning process, a set of software tools is also supplied to process measurements. These tools are provided as Python programs and can be found with the host application in the XVF3800 evaluation release package.
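Because the -u option is omitted from the commands in this chapter, it can be convenient to add it back in one place when scripting the tuning steps. The following is a minimal sketch of such a wrapper. The helper names build_command and run_command are hypothetical, and the sketch assumes that the xvf_host binary is located at the given path, that the -u option is placed before the command name, and that the reply is printed on standard output.

```python
import subprocess


def build_command(name, *values, protocol="i2c", binary="./xvf_host"):
    """Return the argument list for one xvf_host invocation.

    Assumption: the -u option precedes the command name.
    """
    if protocol not in ("i2c", "spi"):
        raise ValueError("protocol must be 'i2c' or 'spi'")
    return [binary, "-u", protocol, name] + [str(v) for v in values]


def run_command(name, *values, **kwargs):
    """Run xvf_host and return whatever it printed on standard output."""
    result = subprocess.run(
        build_command(name, *values, **kwargs),
        capture_output=True,
        text=True,
        check=True,  # raise if xvf_host exits with an error code
    )
    return result.stdout.strip()
```

With this in place, a call such as run_command("AUDIO_MGR_OP_L", 1, 0, protocol="spi") would issue the same command as the shorthand xvf_host AUDIO_MGR_OP_L 1 0 used in this chapter.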
System Preparation#
Prerequisites#
There are a number of prerequisites that should be met in order to facilitate the tuning process:
It must be possible to both play arbitrary reference input through the XVF3800 over I2S and to record the device’s output.
It must also be possible to access the control interface on the XVF3800, either through I2C or SPI as desired.
Create a block diagram of the whole system, showing the audio path from input through to output and including the XVF3800. This can be used to understand how to optimise and control the performance of the overall product. Ensure that the paths from the reference input through to the loudspeaker and from the microphones to the XVF3800, including any gain, EQ, compression, filtering, and limiting applied, are illustrated. Ensure also that the points where control is available over these parameters (and, more importantly, where it is not) are fully understood.
Ensure a good understanding of the coherence between the individual microphones; see the discussion of microphone coherence in the acoustic guidelines section for details and requirements on this.
Further, ensure a good understanding of the delay between the microphones and the reference signal input; see section on system delay for details, requirements, and terminology surrounding this.
This delay should remain constant while the device is running. Any inconsistency in this delay will result in severely degraded algorithmic performance.
If this delay should change between device reboots, for example due to any front-end processor used to receive the far-end signal, it is important that the device remain causal.
Care should be taken that samples not be dropped between the device’s reference audio input and the XVF3800.
In addition, ensure that any clocking jitter on the interface that carries the reference signal, such as I2S or a USB interface, is minimised.
Prepare (by generation via sox or other utility) a set of test signals: silence, a 1 kHz sine wave at 0 dBFS amplitude, and white noise at e.g. -12 dBFS amplitude. Access to the IEEE 269-2010 reference signals is useful for representative clear speech signals. At time of writing, these may be found under the “Additional Resources” header on the webpage for this IEEE standard. Additional speech signals may be found from the ITU, in particular the files associated with Recommendation P.501, which at time of writing may be acquired from the webpage for this ITU recommendation. Files from these two sets will be referred to in this document by filename.
Copy the tuning tools from the release package to a suitable directory on the development system.
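The synthetic test signals listed in the prerequisites can also be generated without sox. The sketch below uses only the Python standard library. All function names are hypothetical, 16-bit mono WAV output is assumed, and the quoted levels are interpreted as peak (not RMS) amplitudes, which is an assumption about the intent of the prerequisites.

```python
import math
import random
import struct
import wave

FULL_SCALE = 32767  # largest positive 16-bit sample value


def silence(seconds, rate=48000):
    """Digital silence."""
    return [0] * int(seconds * rate)


def sine(seconds, freq_hz=1000.0, level_dbfs=0.0, rate=48000):
    """Sine wave whose peak amplitude is level_dbfs."""
    peak = FULL_SCALE * 10 ** (level_dbfs / 20)
    return [round(peak * math.sin(2 * math.pi * freq_hz * n / rate))
            for n in range(int(seconds * rate))]


def white_noise(seconds, level_dbfs=-12.0, rate=48000, seed=0):
    """Uniformly distributed white noise whose peak (not RMS) is level_dbfs."""
    peak = FULL_SCALE * 10 ** (level_dbfs / 20)
    rng = random.Random(seed)
    return [round(rng.uniform(-peak, peak)) for _ in range(int(seconds * rate))]


def write_wav(path, samples, rate=48000):
    """Write a list of integer samples as a 16-bit mono WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))
```

For example, write_wav("noise_m12dBFS.wav", white_noise(10)) would produce ten seconds of noise; the file name here is only illustrative.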
Initial Parameter Setting#
There are a selection of parameters that should be chosen before the tuning process starts, and will not be modified during the provided tuning process:
AEC_HPFONOFF: This sets a high-pass filter (HPF) on the microphone signals as they enter the processing block; this takes the form of a 4th order Butterworth filter, and therefore has a -80 dB per decade rolloff. The corner frequency (-3 dB point) for this HPF may be set to 70 Hz, 125 Hz, 150 Hz, 180 Hz, or the filter may be disabled.
AEC_FAR_EXTGAIN: This parameter informs the audio pipeline how much external gain has been applied to the AEC reference signal. The value that this parameter should take is coupled to the volume control of the device; if the device applies e.g. -6 dB of gain to the signal (an attenuation of 6 dB), this value should be set to -6.
AEC_AECSILENCELEVEL: This sets a power threshold for signal detection in the AEC. If there is known e.g. ADC induced noise in the reference audio signal line, this parameter may be set to avoid the AEC adapting to this noise.
PP_LIMITONOFF and PP_LIMITPLIMIT: A power limiter may be inserted in line with the processed audio outputs from the audio pipeline using PP_LIMITONOFF. The power threshold used may be set with the PP_LIMITPLIMIT command. If the output energy is predicted to exceed PP_LIMITPLIMIT, compression is applied to the outputs to avoid this.
PP_DTSENSITIVE: PP_DTSENSITIVE allows some control over the balance struck between double-talk performance and echo suppression, including the use of an optional near-end speech detector. This is summarised in Fig. 14; as echo suppression increases, double-talk performance will tend to decrease as more near-end is suppressed.
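To put the rolloff quoted for the AEC_HPFONOFF filter into numbers, the magnitude response of an ideal Butterworth high-pass filter can be evaluated directly. This is a sketch of the textbook analogue response only; the digital filter in the device may deviate slightly from it, and the function name is hypothetical.

```python
import math


def butterworth_hpf_gain_db(freq_hz, corner_hz, order=4):
    """Gain in dB of an ideal Butterworth high-pass filter at freq_hz.

    Uses |H(f)|^2 = 1 / (1 + (corner / f)^(2 * order)).
    """
    ratio = corner_hz / freq_hz
    return -10 * math.log10(1 + ratio ** (2 * order))
```

For a 70 Hz corner frequency, this gives approximately -3 dB at 70 Hz and approximately -80 dB at 7 Hz, one decade below the corner, which matches the -80 dB per decade figure for a 4th order filter.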
Initial Tests#
The first step in tuning the product is to ensure that the send, receive, and loopback paths through the XVF3800 are electrically stable and that the XVF3800 has a stable control interface.
Input Path#
This test will attempt to verify that a signal injected into the device through the device’s intended input path successfully reaches the XVF3800. This tests path 1 in Fig. 15. If possible, inject a test signal (such as white noise) through the device’s reference audio input path and monitor the signal path immediately prior to the XVF3800. Consider disabling the device’s loudspeaker for the duration of this test if the test signal chosen would cause auditory discomfort. Verify that the test signal is observed.
Note
If direct monitoring of the signal path immediately prior to the XVF3800 is not feasible, it is permissible to skip this test; its function is implied in later tests.
Control Path#
This test will attempt to verify that the XVF3800 has a stable control interface. This tests path 2 in Fig. 15. Following the guidelines in Using the Host Application, issue:
xvf_host VERSION
and ensure that the device returns v1.0.0.
Output Path#
This test will attempt to verify that a signal injected into the XVF3800 is output faithfully from the XVF3800 and successfully output by the device. This tests paths 1 and 3 in Fig. 15. Set up the XVF3800’s output mux as follows to loop back any I2S data received:
xvf_host AUDIO_MGR_OP_UPSAMPLE 0 0
xvf_host AUDIO_MGR_OP_ALL 10 0 10 2 10 4 10 1 10 3 10 5
Note
As the signals produced here are by definition at the I2S sample rate, they do not require the use of the upsampler in the case of a 48 kHz I2S bus, and therefore we explicitly unset the AUDIO_MGR_OP_UPSAMPLE flags (set to 1 by default in a 48 kHz configuration). If using a 48 kHz I2S bus, be sure that AUDIO_MGR_OP_UPSAMPLE is reset appropriately to accommodate signals that are generated at 16 kHz, including the processed output signals.
Inject a test signal (such as white noise) through the device’s reference audio input path and monitor the signal on the device’s communications output. Verify that the output matches the input signal. This should have a fixed delay, but should otherwise be the raw I2S data, after any customer-specific pre-processing DSP has been applied. Consider disabling the device’s loudspeaker for the duration of this test if the test signal chosen would cause auditory discomfort.
Speaker Operation#
It is advised that the linearity, stability (even operation over the desired frequency range), output level, and total harmonic distortion (THD) be characterised for the loudspeaker(s) in use in the product. Play a test file, such as an IEEE Reference file, through the loudspeaker and observe the output level. The loudspeaker level should be adjusted such that it meets the desired output level target. This tests path 4 in Fig. 15.
Note
The desired loudspeaker output level is usually specified by product certification requirements such as those constructed by Amazon, Microsoft, or Zoom. Refer to your desired certification requirements for appropriate targets for this test.
With the loudspeaker at an appropriate level, observe that there is no audible distortion or nonlinearity present in the speech signal. If desired, make a quantitative measurement of THD to ensure that the loudspeaker is operating as intended. Correct operation of the loudspeaker is essential to the tuning process. Operating the loudspeaker and associated amplifier within their linear region is highly important for the tuning process and for optimal algorithmic performance.
Microphone Operation#
Ensure that the microphone assignment is as expected and that they sound natural and artefact-free. This tests paths 3 and 5 in Fig. 15.
Set up the XVF3800’s output mux as follows to output raw data for microphones 0 and 1:
xvf_host AUDIO_MGR_OP_L 1 0
xvf_host AUDIO_MGR_OP_R 1 1
and set up as follows to output raw data for microphones 2 and 3:
xvf_host AUDIO_MGR_OP_L 1 2
xvf_host AUDIO_MGR_OP_R 1 3
Note
As the microphone signals are decimated to 16 kHz within the XVF3800’s audio manager, the microphone signals require upsampling on a 48 kHz bus - ensure that the AUDIO_MGR_OP_UPSAMPLE flags have been reset after previous testing if using a 48 kHz XVF3800 configuration.
Ensure that each microphone is assigned as expected; this can be achieved by e.g. clicking near or tapping each microphone in turn to ensure that the signal is routed to the expected output. If the microphone assignment is not as expected, then the microphone geometry may be incorrect and therefore Direction of Arrival (DoA) information may be incorrect.
Record some near-end signal (such as speech) and analyse the result for undesirable artefacts, such as noise, distortion, or interference. Ensure that speech through each microphone sounds clear and natural. Verify that each microphone is similar in level, for example by examining a power spectral density plot (PSD) of a known near-end source and observing that each microphone signal has a similar total power.
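As a quick alternative to inspecting a PSD plot, the total power of each recorded microphone channel can be compared in the time domain, since the total power is the same in either domain. The following is a minimal sketch that assumes the recordings have already been loaded as lists of 16-bit integer samples. The function names are hypothetical, and what counts as a "similar" level is left to the reader.

```python
import math


def rms_dbfs(samples, full_scale=32768.0):
    """RMS level of a block of integer samples, in dB relative to full scale."""
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 10 * math.log10(mean_square / full_scale ** 2)


def level_spread_db(channels):
    """Difference between the loudest and the quietest channel, in dB."""
    levels = [rms_dbfs(channel) for channel in channels]
    return max(levels) - min(levels)
```

A large spread for the same near-end source suggests that one microphone is obstructed, faulty, or wired with a different gain.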
Tuning the XVF3800 Parameters#
This section will walk through a typical tuning process, step by step. It is advised that, when appropriate values for each tuning parameter are determined, the device firmware is rebuilt with these values as default and the device is reflashed. This process is described in the section Building the application. It is recommended that the device be restarted at the start of each of these tuning steps.
Reference Gain#
The AUDIO_MGR_REF_GAIN parameter is provided to control a gain block placed in the reference audio path after the customer-specific pre-processing DSP stage. The reference audio should be amplified such that any peak amplitude losses through the input path (such as attenuation or filtering prior to the XVF3800 or in the customer-specific pre-processing DSP stage) are accounted for. This gain is applied within the audio manager internal to the XVF3800, and therefore does not have an impact on the signal sent to the loudspeaker.
Set up the XVF3800’s output mux as follows to output pre- and post-gain data for the reference input:
xvf_host AUDIO_MGR_OP_L 4 0
xvf_host AUDIO_MGR_OP_R 12 0
This will set the left output as the pre-gain reference input, and the right output as the post-gain reference input. With default device configuration, these should be the same.
Inject a test signal with a known peak amplitude, such as 0 dBFS white noise, into the device’s reference input and verify that the reference input observed by the XVF3800 is the same level, i.e. with a peak of 0 dBFS. If this is not the case, tune AUDIO_MGR_REF_GAIN such that the post-gain reference input has as maximal a peak value as possible, up to 0 dBFS.
Note
White noise is chosen in this example as it contains equal energy in all frequency bands. This is important in cases where e.g. a filter is applied to the reference signal before the XVF3800 or in the customer-specific pre-processing DSP block. In these cases, a single tone may be attenuated more than other tones, and tuning to this specific frequency may lead the device to clip at other frequencies. If no such filter is applied, a tone (such as a 1 kHz sine wave) may be chosen instead, which has a more predictable peak amplitude in a shorter timeframe.
Note
It is very important that the reference input can never digitally clip. If this is a risk, it is permissible to leave some headroom in this parameter of approximately 1 - 2 dB.
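The gain needed to bring a measured reference peak up to the target, less any headroom, is simple arithmetic. The sketch below computes it both in dB and as a linear factor, because the unit expected by AUDIO_MGR_REF_GAIN is not stated in this section and should be checked in the Appendix. The function names are hypothetical and 16-bit samples are assumed.

```python
import math

FULL_SCALE = 32768.0  # 16-bit full scale


def peak_dbfs(samples):
    """Peak level of a block of integer samples, in dBFS."""
    peak = max(abs(s) for s in samples)
    return 20 * math.log10(peak / FULL_SCALE) if peak else float("-inf")


def suggested_ref_gain(measured_peak_dbfs, headroom_db=1.0):
    """Return (gain in dB, equivalent linear factor).

    The gain raises the measured peak to headroom_db below 0 dBFS.
    """
    gain_db = -headroom_db - measured_peak_dbfs
    return gain_db, 10 ** (gain_db / 20)
```

For example, a post-gain reference that peaks at -7 dBFS with 1 dB of headroom calls for 6 dB of additional gain, a linear factor of roughly 2.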
Microphone Gain#
Similarly, the AUDIO_MGR_MIC_GAIN parameter is provided to control a gain block placed in the input path from the microphones. The same gain is applied to all four microphones. To tune AUDIO_MGR_MIC_GAIN, set the left output as a selected microphone post-gain - for example, microphone 0 - and the right output as the reference audio post-gain:
xvf_host AUDIO_MGR_OP_L 3 0
xvf_host AUDIO_MGR_OP_R 12 0
Inject a test signal with a known peak amplitude, such as 0 dBFS white noise, into the device’s reference input and observe the relationship between the post-gain reference signal and the post-gain microphone signal. Tune AUDIO_MGR_MIC_GAIN such that the microphone signal has a peak amplitude 6 dB below the reference signal. Observe the other 3 microphone channels and ensure that none exceed 6 dB below the reference signal.
Note
If the microphone signal becomes louder than 6 dB below the reference signal, the AEC may converge to coefficients in the frequency domain greater than 0 dB. This has a significantly negative effect on algorithmic performance, and may lead to instability.
Consider rotating the device, placing it near walls or corners, placing objects in front of the microphones, or exercising other realistic use-cases. Ensure that in each of these cases the post-gain microphone signal does not exceed 6 dB below the reference signal.
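When peak levels have been collected for all four microphones across several placements, the worst case determines how much the microphone gain has to come down. The following is a minimal sketch of that check; the function name is hypothetical, and the peaks are assumed to have been measured in dBFS from the post-gain signals.

```python
def required_mic_gain_reduction_db(mic_peaks_dbfs, ref_peak_dbfs, margin_db=6.0):
    """Return how far the microphone gain should be reduced, in dB.

    mic_peaks_dbfs holds every measured microphone peak (all channels, all
    placements). A result of 0.0 means every measurement already sits at
    least margin_db below the reference peak.
    """
    limit = ref_peak_dbfs - margin_db
    return max(0.0, max(mic_peaks_dbfs) - limit)
```

For example, with a reference peak of -1 dBFS the limit is -7 dBFS, so a loudest microphone peak of -5 dBFS indicates that the gain should be reduced by 2 dB.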
System Delay#
With an appropriate gain structure, the next step in the tuning process is to ensure that the product is causal; that is to say, that an event played in the reference audio stream and over the loudspeaker is received by the XVF3800 reference input an appropriate amount of time (in samples) before the coupled signal returns through the microphone path. This is very important; if the system is acausal (a signal played into the room in the reference audio stream is received in the microphone inputs before it is received in the reference input) then effective echo cancellation cannot be achieved. By the same token, the microphones should not be overly delayed compared to the reference input; each coefficient in the AEC corresponds to a sample in the time domain, and so if the microphone signal is overly delayed, fewer AEC coefficients will be of use and the overall behaviour of the AEC will be less optimal.
Expanding on the top-level diagram featured in Fig. 15, a more realistic understanding of the main two paths for the reference signal to take can be seen in Fig. 16. If the reference signal takes path A pictured, where it is passed through the XVF3800 before it is then sent to the loudspeaker, then it is highly unlikely that the device will become acausal. If instead the signal is sent via path B pictured, where the reference input is passed to the loudspeaker assembly prior to sending on to the XVF3800, an arbitrary reference path delay has the potential to push the device into acausality if it exceeds the echo path delay between the loudspeaker and the microphones.
In an ideal system, the delay between the reference and microphone signals should be at or less than 40 samples. The AUDIO_MGR_SYS_DELAY parameter allows a configurable delay to be applied to either the microphone signal or the reference signal to achieve this 40 sample difference.
A positive value for this parameter, measured in number of samples, sets a delay on the reference signal; if the delay between microphones and reference is too large, setting this value as positive will reduce this difference. A negative value sets a delay on the microphone signals. Setting this delay to a negative value is the recommended method to correct acausality in the device. Note that this will naturally increase the overall delay from input to output through the device.
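The sign convention can be summarised in a few lines. In this sketch the measured delay is the number of samples by which a microphone lags the reference, with a negative number meaning the system is acausal. The function names are hypothetical, the target of 40 samples is taken from the text above, and the measured delay is assumed to be expressed in the same sample units as the parameter.

```python
def resulting_delay(measured_delay, sys_delay):
    """Microphone-to-reference delay after AUDIO_MGR_SYS_DELAY is applied.

    A positive sys_delay delays the reference, which shrinks the difference.
    A negative sys_delay delays the microphones, which grows it.
    """
    return measured_delay - sys_delay


def suggested_sys_delay(measured_delay, target_delay=40):
    """Value of AUDIO_MGR_SYS_DELAY that yields target_delay samples."""
    return measured_delay - target_delay
```

For example, a measured delay of 100 samples suggests a value of 60 (delay the reference by 60 samples), while an acausal system measuring -10 samples suggests -50 (delay the microphones by 50 samples).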
To estimate the current causality of the system, use the mic_ref_correlate.py script provided. To obtain the required signals for this script, set the output mux as follows:
xvf_host AUDIO_MGR_OP_L 3 <microphone number>
xvf_host AUDIO_MGR_OP_R 5 0
Note
Causality must be checked for all four microphones, as each of the microphones may have a different echo path delay. Calculate correlation between each microphone and the reference in turn.
With the audio output mux set, generate a test signal with e.g. 5 s of silence, followed by 10 s of a 0 dBFS signal such as white noise, followed by 5 s of silence. Pass this through the reference input and record the device output in your chosen audio tools, e.g. Audacity. Save the result as a 2 channel WAV file with the left channel (the post-delay post-gain microphone) as channel 0 and the right channel (the looped-back post-delay post-gain reference signal) as channel 1. Use this as the input to the script:
python3 mic_ref_correlate.py [input wav file].wav
A diagram similar to Fig. 17 should be generated.
Fig. 17 shows a 7 sample delay between microphones and the reference signal. This system is causal, but only just. Setting AUDIO_MGR_SYS_DELAY to around -30, which delays the microphone signals by around 30 samples, will bring the system to the recommended headroom.
This procedure may be repeated after the AUDIO_MGR_SYS_DELAY parameter has been set to verify that the system remains causal and has the desired (less than 40 samples) delay between the reference and microphone inputs. Ensure that the device is causal for all four microphones.
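The internals of mic_ref_correlate.py are not described in this chapter, so the following is only an independent sketch of the underlying idea: the delay is taken as the lag at which the cross-correlation between the microphone and reference channels is strongest. The function name and the synthetic signals are hypothetical, and the pure-Python loop is intended for short excerpts rather than full recordings.

```python
import random


def estimate_delay(mic, ref, max_lag=200):
    """Return the lag, in samples, at which mic best matches ref.

    Positive: the microphone lags the reference (causal).
    Negative: the microphone leads the reference (acausal).
    """
    n = min(len(mic), len(ref))
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        start, stop = max(0, lag), min(n, n + lag)
        score = sum(mic[i] * ref[i - lag] for i in range(start, stop))
        # Use the magnitude so that a polarity inversion is still detected.
        if abs(score) > best_score:
            best_lag, best_score = lag, abs(score)
    return best_lag


# Synthetic check: the "microphone" is the reference delayed by 7 samples.
rng = random.Random(1)
ref = [rng.uniform(-1, 1) for _ in range(2000)]
mic = [0.0] * 7 + [0.5 * s for s in ref[:-7]]
print(estimate_delay(mic, ref, max_lag=50))  # prints 7
```

A broadband signal such as white noise gives a single sharp correlation peak; a pure tone would give an ambiguous result because it correlates equally well at every multiple of its period.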
AEC Operation#
To verify the AEC’s operation, play through the reference input a representative test sample, such as IEEE_269-2010_Male_mono_48_kHz.wav.
Allow the AEC to converge. The convergence of the AEC may be monitored by use of the AEC_AECCONVERGED parameter. This is a read-only parameter. Issuing
xvf_host AEC_AECCONVERGED
will present the return value as:
AEC_AECCONVERGED [0|1]
If the returned value is 1, the AEC has converged.
Note
Once this value is set to 1 internally, it is never reset, even if a significant path change or other circumstance forces a significant change in the AEC.
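Polling for convergence can be scripted. The sketch below assumes that the reply consists of exactly the text shown above (the parameter name followed by 0 or 1); any additional text printed by the host application would need to be stripped first. The function names are hypothetical, and the default timeout reflects the convergence time of under 30 seconds quoted in this section. The reply is obtained through a caller-supplied function so that the logic can be exercised without hardware.

```python
import time


def parse_converged(reply):
    """Parse a reply of the form 'AEC_AECCONVERGED 1' into a bool."""
    fields = reply.split()
    if len(fields) < 2 or fields[0] != "AEC_AECCONVERGED":
        raise ValueError("unexpected reply: %r" % reply)
    return fields[1] == "1"


def wait_for_convergence(read_reply, timeout_s=30.0, poll_s=1.0):
    """Poll until the AEC reports convergence or the timeout expires.

    read_reply must return the text printed by `xvf_host AEC_AECCONVERGED`.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if parse_converged(read_reply()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

Because the flag is never reset once set, this check is only meaningful from a fresh restart of the device.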
When the AEC reaches convergence (which is expected to take less than 30 seconds), read the AEC coefficients from the device:
xvf_host (-gf | --get-aec-filter) [filename.bin]
These may then be analysed with the read_aec_filter.py script provided:
python3 read_aec_filter.py [filename.bin]
This should generate a plot as shown in Fig. 18.
It will also print in the console the value of the peak coefficient in the frequency domain. Ensure that this is below 0 dB for all four microphones. If it is not, reduce AUDIO_MGR_MIC_GAIN to satisfy this.
Observe the period between 0 and 100 samples in the time domain. There should be a strong first peak, as shown in Fig. 18. The location of this peak in the time domain should be the same as the previously observed delay between each microphone and the reference input. If this value is significantly above 40 samples, increase AUDIO_MGR_SYS_DELAY to reduce this. If the time domain response starts with a strong peak at the first sample, this could be an indication that your system is acausal - reduce AUDIO_MGR_SYS_DELAY to attempt to bring the full time domain response into view.
AGC Configuration#
The audio pipeline includes an automatic gain controller (AGC) which is applied equally to all four processed outputs from the XVF3800.
This is controlled by four parameters: PP_AGCGAIN, PP_AGCMAXGAIN, PP_AGCDESIREDLEVEL, and PP_AGCONOFF.
PP_AGCGAIN both controls and reports the current multiplicative gain applied to the output beams by the AGC. The value set as the product default is the initial value. When the AGC is active, this value is then dynamically adjusted to attempt to meet the specified output power.
PP_AGCDESIREDLEVEL is the parameter that sets this desired output power. The signal power of the free-running beam is measured and compared to the value of PP_AGCDESIREDLEVEL, and PP_AGCGAIN is adjusted to attempt to meet it.
PP_AGCMAXGAIN is the maximum value that PP_AGCGAIN may take in operation.
PP_AGCONOFF determines whether the AGC is permitted to adapt or whether the value of PP_AGCGAIN is fixed.
Note
It is important to note that the gain specified by PP_AGCGAIN is always applied, regardless of the value of PP_AGCONOFF; PP_AGCONOFF will only control whether or not this value is permitted to change during operation.
To set appropriate default values for these parameters, set the device’s output mux to output the free-running beam:
xvf_host AUDIO_MGR_OP_L 6 2
xvf_host AUDIO_MGR_OP_R 0 0
Initialise the parameters to sensible default values:
xvf_host PP_AGCGAIN 1.0
xvf_host PP_AGCMAXGAIN 1000
xvf_host PP_AGCONOFF 1
Play a near-end signal, such as IEEE_269-2010_Male_mono_48_kHz.wav, at a nominal level and at a nominal distance. The exact specification for this should be determined by the desired certification. Allow PP_AGCGAIN to converge on a value, record the device output, and observe the output level. Should the device output be too quiet or too loud for the desired certification specification, alter PP_AGCDESIREDLEVEL and allow PP_AGCGAIN to reconverge. Once the device output level is as desired, record the stable value of PP_AGCGAIN. This should then be set as the product’s default value for PP_AGCGAIN.
To configure PP_AGCMAXGAIN, reduce the near-end signal by 10 dB and repeat the above process, allowing PP_AGCGAIN to converge on a stable value. This should become the product’s default value for PP_AGCMAXGAIN.
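As a sanity check on the measured value, the gain needed to compensate for a 10 dB drop in input level can be predicted from the nominal gain. This sketch assumes that PP_AGCGAIN is a linear amplitude gain, which is an interpretation of "multiplicative gain" rather than something stated in this section; if the parameter were a power gain, the factor would be 10 rather than about 3.16. The function name is hypothetical.

```python
def expected_max_gain(nominal_gain, level_drop_db=10.0):
    """Amplitude gain that restores the output level after the input
    drops by level_drop_db, given the gain that was correct before."""
    return nominal_gain * 10 ** (level_drop_db / 20)
```

For example, if PP_AGCGAIN settles at 4.0 for the nominal signal, the value reached with the signal reduced by 10 dB would be expected to be close to 12.6. A converged value far from this prediction may indicate that the AGC has not yet settled.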
Emphasis#
The rate at which the AEC converges can be optimised by compensating for the spectral characteristics of the reference signal. If the signal has significant low-frequency energy but proportionally less high-frequency energy, this will affect the AEC’s rate of convergence in the high frequencies, and therefore rate of convergence overall. An optional high shelf boost may be applied to the microphone inputs using the AEC_AECEMPHASISONOFF parameter.
Play a representative voice sample such as IEEE_269-2010_Male_mono_48_kHz.wav as the reference audio and capture the post-gain, post-delay reference signal:
xvf_host AUDIO_MGR_OP_L 5 0
xvf_host AUDIO_MGR_OP_R 0 0
Perform a Fourier transform using Audacity, or equivalent, and identify the peak magnitude value. Compare this to the magnitude of the signal at 8 kHz. It is expected that the magnitude of the signal at 8 kHz will be less than the peak magnitude. If they have similar magnitudes, set AEC_AECEMPHASISONOFF to 0. If the difference in magnitudes is around or greater than 8 dB, set AEC_AECEMPHASISONOFF to 1. If the difference is around or greater than 40 dB, set AEC_AECEMPHASISONOFF to 2.
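The selection rule above can be written as a small function. The thresholds in the text are approximate ("around or greater than"), so the hard cut-offs used in this sketch are a simplification, and the function name is hypothetical.

```python
def choose_emphasis(peak_db, level_at_8khz_db):
    """Suggest a value for AEC_AECEMPHASISONOFF from two spectrum readings.

    peak_db is the peak magnitude of the reference spectrum and
    level_at_8khz_db is its magnitude at 8 kHz, both in dB.
    """
    difference = peak_db - level_at_8khz_db
    if difference >= 40:
        return 2
    if difference >= 8:
        return 1
    return 0
```

Readings that fall close to a threshold are best settled by comparing the convergence behaviour of both candidate settings.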
The impact of tuning this parameter may be observed by measuring the AEC convergence speed. From a fresh restart, set the following parameters to output clear AEC residuals from a selected pair of microphones:
xvf_host PP_MIN_NS 1.0
xvf_host PP_MIN_NN 1.0
xvf_host PP_ECHOONOFF 0
xvf_host PP_NLATTENONOFF 0
xvf_host PP_AGCONOFF 0
xvf_host AUDIO_MGR_OP_L 7 [microphone number]
xvf_host AUDIO_MGR_OP_R 7 [microphone number]
Play a representative voice sample such as IEEE_269-2010_Male_mono_48_kHz.wav as the reference input on a loop for around 60 seconds. Capture the device output; these will be the AEC residuals generated. Observe the spectrogram of the output signal, and verify that the AEC converges evenly for all frequencies; that is to say, that the high frequencies converge as quickly as the low frequencies.
Additional Parameters#
FMIN_SPEINDEX#
PP_FMIN_SPEINDEX is a parameter that controls the frequency-dependent suppression that the device performs in a double-talk environment. In the case of double-talk, the device’s output will suppress frequencies below the value of PP_FMIN_SPEINDEX more than frequencies above the value of PP_FMIN_SPEINDEX.
Set the following to output clear AEC residuals from a selected pair of microphones:
xvf_host PP_MIN_NS 1.0
xvf_host PP_MIN_NN 1.0
xvf_host PP_ECHOONOFF 0
xvf_host PP_NLATTENONOFF 0
xvf_host PP_AGCONOFF 0
xvf_host AUDIO_MGR_OP_L 7 [microphone number]
xvf_host AUDIO_MGR_OP_R 7 [microphone number]
Play through the reference input 1 minute of a 0 dBFS white noise signal, and capture the AEC residuals that are output from the device. Take a Fourier transform of the interval from 40 - 60 seconds, and plot the magnitude of the coefficients. Set PP_FMIN_SPEINDEX to the highest frequency after which there is no further decrease in the amplitude spectrum; in the spectrum shown in Fig. 19 for example, PP_FMIN_SPEINDEX should be set to around 1200 (1.2 kHz). If the spectrum appears roughly flat from around 500 Hz onwards, with no significant decrease in amplitude at higher frequencies, leave PP_FMIN_SPEINDEX at its default value of 593.75 Hz.
MGSCALE#
The PP_MGSCALE parameter controls additional noise suppression that is applied during periods of far-end activity. The aim is to optimise speech clarity output from the device during periods of stationary far-end activity, while also ensuring that there is good echo suppression in periods of non-stationary far-end activity. An undesirable scenario may arise if there exists unintended low-level noise in the reference signal, from e.g. ADC noise in the reference path. In this scenario, the low-level noise may be erroneously detected as far-end speech; the device may then incorrectly detect that double-talk is present and overly suppress near-end speech. The PP_MGSCALE parameter configures where this trade-off between far-end echo suppression and near-end signal clarity lies.
To tune both the min and max values for the PP_MGSCALE parameter, set the following:
xvf_host PP_GAMMA_E 1.0
xvf_host PP_GAMMA_ENL 1.0
xvf_host PP_GAMMA_ETAIL 1.0
xvf_host PP_ECHOONOFF 1
xvf_host PP_NLATTENONOFF 0
xvf_host PP_MIN_NS 1.0
xvf_host PP_MGSCALE 1 1 1
xvf_host AUDIO_MGR_OP_L 6 3
xvf_host AUDIO_MGR_OP_R 0 0
Play a representative far-end signal such as IEEE_269-2010_Male_mono_48_kHz.wav on loop as the reference input. Provide a just-noticeable stationary near-end noise signal. Observe the device output, including the spectrogram. Increasing the value of max - the first parameter to PP_MGSCALE - will reduce the amount of residual echo. Increase max until no further improvements are observed.
Note
A typical value for max will be between 100 and 1000.
Set min to the derived value for max so that the two are equal.
Play silence into the reference input, and provide a representative near-end signal such as IEEE_269-2010_Male_mono_48_kHz.wav. Subjectively listen to the device output. There may be stationary noise present on the far-end, which may cause erroneous echo suppression and therefore erroneous speech distortion. Reducing min can reduce near-end speech distortion at the cost of reduced stationary noise suppression where stationary noise is present in the far-end signal.
Tuning the Non Linear Model#
Non-linear Echo#
It is likely that in all devices, regardless of the quality of the audio design, there will exist some non-linearities. The aim of non-linear estimation is to model the remaining residual echo after linear echo content (including tail echoes) has been removed. This is achieved in the XVF3800 by use of a self-training non-linear model.
It is very important to ensure that non-linear model training takes place in a silent environment, and that the environment is ideally anechoic; the RT60 of the environment for example should be as low as possible, and absolutely below 0.3 s. It is also important to minimise/eliminate any path changes in the environment during non-linear tuning, such as movement of people or objects. This tuning step is very deliberately placed after any gain or pre-processing adjustments have been made. Any changes to the device’s gain structure, including changing any filtering, will require retuning of the non-linear model.
Tuning Setup for Non Linear model#
This tuning process is somewhat lengthy, and so a set of files and associated training script have been provided for this tuning step. The process differs slightly depending on whether the host device can play audio directly through the device (as in Fig. 20) or whether a 3rd machine is required (as in Fig. 21).
Local Device#
For this route, it is assumed that the host device (assumed to be a Raspberry Pi) is also the device that is providing audio to the XVF3800, through e.g. an I2S interface. Locate the nl_model_training.py script provided. Run the script as:
python3 nl_model_training.py <host application> -p <communication protocol>
This will generate an output file with the default name of nlmodel_buffer_override.bin. Copy this file to /sources/applications/nl_model_gen/nlmodel_bin and rerun the build process to generate a binary with this non-linear model set as default. Refer to the docstring for this script for further guidance.
Remote Device#
For this route, it is assumed that a 3rd device is acting as the audio source (here termed the “audio host”). Therefore, to issue control commands, it is necessary to remotely connect to the “control host” (assumed to be a Raspberry Pi) over SSH. Locate the remote_nl_model_training.py script included in the release package. Further, locate the host application binaries on the audio host; these should be located at /host_v0.2.0/rpi in the release package. Ensure that the audio host has passwordless access to the control host over SSH; this may be achieved by generating an SSH key pair and adding the public key to ~/.ssh/authorized_keys on the control host. This script requires that the audio host have an installation of sox on its path, as well as Python 3 with matplotlib and asyncssh installed via pip. Ensure that the default loudspeaker and microphone on the audio host are set as the device to be tuned.
From the audio host, run the script as:
python3 remote_nl_model_training.py <control host IP address> <host application binary path on the audio host>
Once the script has run, locate the generated nlmodel_buffer_override.bin and corresponding plot in the src.autogen directory. Copy this file to /sources/applications/nl_model_gen/nlmodel_bin and rerun the build process to generate a binary with this non-linear model set as default. Refer to the docstring for this script for further guidance.
Echo Suppression#
With the non-linear model trained, we are now in a position to balance echo suppression against speech distortion. Five tuning parameters are relevant for this section:
PP_ECHOONOFF: This parameter sets whether echo suppression is enabled or disabled overall.
PP_NLATTENONOFF: This parameter sets whether non-linear echo suppression is enabled or disabled.
PP_GAMMA_E: This parameter adjusts the oversubtraction factor for direct and early echo suppression.
PP_GAMMA_ENL: This parameter adjusts the oversubtraction factor for non-linear echo suppression.
PP_GAMMA_ETAIL: This parameter adjusts the oversubtraction factor for echo tail suppression.
For the PP_GAMMA_* parameters, a value of 1.0 indicates that the device has correctly estimated and suppressed the respective echo classes from the output. Increasing these values increases the amount of suppression, and indicates that the device has underestimated the amount of echo in the outputs. It is unlikely for the device to overestimate the amount of echo, and so it is not advised to set these parameters to values below 1.0. A typical range for these parameters is between 1.0 and 1.7.
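The role of an oversubtraction factor can be illustrated with generic spectral subtraction. The internal algorithm of the XVF3800 is not described in this chapter, so this sketch is not the device's implementation; it only shows why a factor of 1.0 corresponds to trusting the echo estimate exactly and why larger factors remove more of the signal. The function name and the per-bin power values are hypothetical.

```python
def suppress_echo(mic_power, echo_estimate, gamma=1.0, floor=0.0):
    """Generic per-bin spectral subtraction with an oversubtraction factor.

    mic_power and echo_estimate are per-frequency-bin power values.
    gamma scales the echo estimate before it is subtracted, and the
    result is clamped so that it never drops below floor.
    """
    return [max(p - gamma * e, floor)
            for p, e in zip(mic_power, echo_estimate)]
```

With a microphone power of 10 and an echo estimate of 4 in one bin, a factor of 1.0 leaves 6, while a factor of 1.5 leaves 4. If the estimate was in fact correct, the extra 2 removed by the larger factor is near-end signal, which is the speech distortion trade-off described in this section.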
Increasing these parameters will always affect the quality of the speech signal. Attempt in the first instance to create as good an acoustic design as possible, with a linear loudspeaker, good quality microphones, and an enclosure that behaves as linearly as possible. This will reduce or eliminate the need to adjust these values, and will result in a better-performing device.
PP_GAMMA_E and PP_GAMMA_ENL#
The objective of tuning these two parameters is the removal of echoes to pass Teams EQUEST and ECC specifications. Ensure that this tuning step takes place in an anechoic or mildly reverberant environment, with an RT60 less than 0.3 s.
Set the device as follows to output the autoselect beam in the left channel and the AEC residual signal for microphone 0 in the right channel:
xvf_host PP_AGCONOFF 0
xvf_host PP_MIN_NN 1.0
xvf_host PP_MIN_NS 1.0
xvf_host PP_GAMMA_E 1.0
xvf_host PP_GAMMA_ENL 1.0
xvf_host PP_GAMMA_ETAIL 1.0
xvf_host AUDIO_MGR_OP_L 6 3
xvf_host AUDIO_MGR_OP_R 7 0
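The settings listed above can also be issued from a script, which makes it easier to return to the same baseline between tuning runs. The sketch below builds the same command lines and, by default, only prints them. The helper names build_commands and apply_settings are hypothetical, and the binary name assumes xvf_host is on the path; as elsewhere in this document, the -u [i2c|spi] parameter is omitted and would need to be added for a real invocation.

```python
import subprocess

# Baseline for echo suppression tuning, in the order listed above.
ECHO_TUNING_BASELINE = [
    ("PP_AGCONOFF", ["0"]),
    ("PP_MIN_NN", ["1.0"]),
    ("PP_MIN_NS", ["1.0"]),
    ("PP_GAMMA_E", ["1.0"]),
    ("PP_GAMMA_ENL", ["1.0"]),
    ("PP_GAMMA_ETAIL", ["1.0"]),
    ("AUDIO_MGR_OP_L", ["6", "3"]),
    ("AUDIO_MGR_OP_R", ["7", "0"]),
]


def build_commands(settings, binary="xvf_host"):
    """Return one argument list per setting, ready to pass to subprocess.run."""
    return [[binary, name, *values] for name, values in settings]


def apply_settings(settings, binary="xvf_host", dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for argv in build_commands(settings, binary):
        if dry_run:
            print(" ".join(argv))
        else:
            # check=True raises CalledProcessError if the host application fails.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    apply_settings(ECHO_TUNING_BASELINE)
```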
Play a representative signal, such as IEEE_269-2010_Male_mono_48_kHz.wav, through the reference input. Allow the AEC to converge. After 30 seconds, play a representative near-end signal in addition to the far-end signal, to place the device into a representative double-talk scenario. Listen to the device’s output. Starting with the default value of 1.0, adjust PP_GAMMA_E to make the trade-off between double-talk performance and echo suppression. Should the value of PP_GAMMA_E need to exceed around 1.4 to achieve acceptable performance, consider adjusting PP_GAMMA_ENL instead, especially if the echoes that remain in the AEC residual signal are of a non-linear nature.
To identify the nature of the echoes that remain, listen to the AEC residual signal and categorise the residual echoes as follows:
Linear residual echoes: Echoes can be understood at a low level but do not sound distorted and do not sound reverberated. Controlled by PP_GAMMA_E.
Tail echoes: Echoes sound reverberated, with no direct component. Controlled by PP_GAMMA_ETAIL.
Non-linear echoes: Echoes sound very distorted, with no discernible content (for example, no intelligible speech). Controlled by PP_GAMMA_ENL.
PP_GAMMA_ETAIL#
To tune this parameter, repeat the above procedure in a moderately reverberant room (with an RT60 between 0.3 s and 0.9 s). Clear tail echoes should be observed in the residual signal, and these tail echoes should be reduced by adjusting PP_GAMMA_ETAIL. Adjust this parameter, making a trade-off between double-talk performance and echo suppression.
Noise Suppression#
Two parameters control suppression of stationary and non-stationary noise in the device output: PP_MIN_NS and PP_MIN_NN respectively. These parameters take values between 0 and 1, representing the multiplicative attenuation of these two noise sources. For example, PP_MIN_NS is set to 0.15 by default, representing a roughly 16 dB attenuation of stationary noise in the device output. It is recommended that PP_MIN_NN is set by default to 0.51 or higher, representing at most a 6 dB attenuation of non-stationary noise in the device output. Reducing this value further may have significant impact on near-end speech quality, especially in reverberant environments.
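The relationship between these multiplicative factors and the attenuation in decibels can be checked with a short calculation. The sketch below assumes the factors act on signal amplitude, so that the attenuation is -20·log10(factor); this assumption is consistent with 0.51 corresponding to roughly 6 dB. The function name attenuation_db is illustrative only.

```python
import math


def attenuation_db(factor):
    """Convert a multiplicative amplitude factor (0 < factor <= 1) to attenuation in dB."""
    if not 0.0 < factor <= 1.0:
        raise ValueError("factor must be in the range (0, 1]")
    return -20.0 * math.log10(factor)


if __name__ == "__main__":
    for name, value in (("PP_MIN_NS", 0.15), ("PP_MIN_NN", 0.51)):
        print(f"{name} = {value}: {attenuation_db(value):.1f} dB attenuation")
```

Halving a factor adds approximately 6 dB of attenuation, which gives a convenient rule of thumb when stepping these parameters during listening tests.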
To tune these parameters, set the device as follows:
xvf_host PP_AGCONOFF 0
xvf_host PP_MIN_NS 0.15
xvf_host PP_MIN_NN 0.51
xvf_host AUDIO_MGR_OP_L 6 3
xvf_host AUDIO_MGR_OP_R 0 0
Play a representative near-end signal, such as IEEE_269-2010_Male_mono_48_kHz.wav. Subjectively evaluate the device output, first noting the presence of stationary noise. Reduce PP_MIN_NS to suppress this noise further. Reducing this parameter may introduce or increase distortion in near-end speech; ensure that a balance is struck between speech quality and stationary noise suppression. Note next the presence of non-stationary noise. Reduce PP_MIN_NN to suppress this noise further. As with PP_MIN_NS, reducing this parameter may introduce distortion in near-end speech, particularly in reverberant environments. Ensure that an appropriate balance is struck between speech quality and non-stationary noise suppression.
ATTNS#
The ATTNS parameters (PP_ATTNS_MODE, PP_ATTNS_NOMINAL, and PP_ATTNS_SLOPE) control an additional reduction in AGC gain during non-speech periods. Collectively, they attempt to combat an undesirable side-effect of the use of an AGC - the tendency to noticeably amplify noise in non-speech periods when the near-end speech signal is quiet. The Zoom Rooms specification test 7.4.3 (as of writing, last issued in October 2019) sets limits on how amplified this non-speech noise may be when the AGC is at a high gain compared to the noise level when the AGC is at a low gain. Therefore, by attenuating noise at a greater strength when the AGC is at a high gain we may reduce this noise and achieve better performance in these tests.
The overall behaviour of the ATTNS is selected with the PP_ATTNS_MODE parameter. This parameter specifies both whether the ATTNS is in use and whether a bias is applied against selecting beams with high noise as the autoselect beam:
0 - The ATTNS is off
1 - The ATTNS is on, with an additional check to ensure no beam with high noise is selected as the autoselect beam (recommended if in use)
2 - The ATTNS is on, with no additional check
When the ATTNS is on, PP_ATTNS_NOMINAL and PP_ATTNS_SLOPE control the additional attenuation proportional to:
where AGCGAIN_INIT is the value of PP_AGCGAIN set as the default value at initialisation, and AGCGAIN_CURRENT is the current value of PP_AGCGAIN. Because of this module’s relationship with the current value of PP_AGCGAIN, this module has no effect when PP_AGCONOFF is set to 0.
Because both the Teams v4 and Zoom Rooms specifications specify this suppression as a ratio between the noise at a nominal speech level and the noise at a low speech level, it may be required to tune both of these parameters in parallel; changing one may have an effect on the required value for the other, and vice-versa.
To tune these parameters, ensure that PP_AGCGAIN and PP_AGCMAXGAIN are tuned correctly, then perform the following:
ATTNS_NOMINAL#
Issue the following to set appropriate default values for this tuning step:
xvf_host PP_AGCONOFF 1
xvf_host PP_ATTNS_MODE 1
xvf_host PP_ATTNS_NOMINAL 1.0
xvf_host PP_ATTNS_SLOPE 0.0
Play a representative near-end signal at a nominal level, such as the ITU P.501 7.3.2 reference signal FB_male_female_single-talk_seq.wav. Setting ATTNS_NOMINAL > 1 should provide more noise suppression during silence. With a particular specification in mind, increase this value until desired/specified noise suppression is achieved during the test conditions. For example, this could be done by monitoring average A-weighted noise during the period 1s after the end of a sentence in the reference signal and ensuring that it is within satisfactory bounds; this is usually specified as a ratio between this value and the averaged value obtained with the near-end signal at a range of low levels.
ATTNS_SLOPE#
Issue the following to set appropriate default values for this tuning step:
xvf_host PP_AGCONOFF 1
xvf_host PP_ATTNS_MODE 1
xvf_host PP_ATTNS_NOMINAL <default found in previous step>
xvf_host PP_ATTNS_SLOPE 1.0
Play a representative near-end signal at a nominal level, such as the ITU P.501 7.3.2 reference signal FB_male_female_single-talk_seq.wav. Setting ATTNS_SLOPE > 1.0 provides additional noise suppression during silence, proportional to an increased AGC gain. With a particular specification in mind, increase this value until desired/specified noise suppression is achieved during the test conditions. For example, this could be done by monitoring average A-weighted noise during the period 1s after the end of a sentence in the reference signal and ensuring that it is within satisfactory bounds; this is usually specified as a ratio between this value and the averaged value obtained with the near-end signal at a range of low levels.
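The measurement described above can be sketched as follows. This is an illustration only: it assumes the recorded device output is available as a sequence of floating-point samples scaled to ±1.0 that have already been A-weighted, that levels from several low-level recordings are averaged in the dB domain, and that the sentence end time is known from the reference signal. None of these function names come from the release package, and the applicable specification should be consulted for the exact measurement definition.

```python
import math


def rms_level_db(samples):
    """RMS level of a block of samples in dB relative to full scale (1.0)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    # Floor the value to avoid log10(0) on digital silence.
    return 10.0 * math.log10(max(mean_square, 1e-20))


def noise_window(samples, sentence_end_s, sample_rate, duration_s=1.0):
    """Return the samples in the period of duration_s after the end of a sentence."""
    start = int(round(sentence_end_s * sample_rate))
    stop = start + int(round(duration_s * sample_rate))
    return samples[start:stop]


def noise_level_difference_db(nominal_noise, low_level_noise_runs):
    """Noise level with nominal speech minus the mean noise level of the low-level runs."""
    nominal = rms_level_db(nominal_noise)
    low_levels = [rms_level_db(run) for run in low_level_noise_runs]
    return nominal - sum(low_levels) / len(low_levels)
```

A positive result from noise_level_difference_db means the noise in the nominal-level recording is higher than the average of the low-level recordings; a negative result means it is lower.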
Path Change Detection#
The XVF3800 provides a module called the Path Change Detector (PCD) to detect significant path changes in the device’s environment, such as the device being handled or moved to a different location. If a path change is detected, heavy near-end suppression during far-end activity is applied in order to allow the AEC time to reconverge to its new environment. If the device incorporating the XVF3800 is not intended for a mobile application (for example, a wall-mounted sound bar), then detection of path changes is not necessary.
The PCD may be tuned using the AEC_PCD_COUPLINGI, AEC_PCD_MINTHR, and AEC_PCD_MAXTHR parameters.
AEC_PCD_COUPLINGI controls the rate of detection of a path change, and takes a value between 0 and 1. Setting this to a low value encourages fast detection of path changes at the increasing risk of false positives during double-talk. Setting this to a high value slows detection of path changes (and increases the detection threshold, meaning some small changes may be missed) but reduces the risk of false positives in double-talk. Setting this parameter to a value outside of the range 0 to 1 will disable the PCD. Tuning of this parameter is necessarily very situation- and product-dependent. Monitoring of the AEC_AECPATHCHANGE parameter can allow insight into whether a path change has been detected; reading a 1 value implies that a path change has recently been detected and that the device output is currently heavily suppressed during far-end activity. This parameter will reset to 0 after the AEC has reconverged.
AEC_PCD_MINTHR and AEC_PCD_MAXTHR are used to set sensitivity thresholds, and their use depends on the overall Echo Return Loss Enhancement (ERLE) of the device. For devices with a high ERLE (implying a high ratio between the provided reference signal and the resultant AEC residual, and therefore high cancellation), use AEC_PCD_MINTHR to limit the lower bound. Decreasing this value from its default of 0.02 will increase the sensitivity of the PCD. For devices with a low ERLE, use AEC_PCD_MAXTHR to limit the upper bound. Decrease this value from its default of 0.2 to increase the sensitivity of the PCD.
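When evaluating PCD settings, it can be useful to measure how long AEC_AECPATHCHANGE stays at 1 after a deliberate path change. The sketch below polls the flag until it returns to 0 or a timeout expires. The read_flag argument is a hypothetical callable supplied by the user that returns the current value of AEC_AECPATHCHANGE as an integer, for example by invoking xvf_host and parsing its output; the output format of the host application is not assumed here.

```python
import time


def wait_for_reconvergence(read_flag, timeout_s=30.0, poll_interval_s=0.5,
                           sleep=time.sleep, clock=time.monotonic):
    """Poll read_flag() until it returns 0; return False if timeout_s elapses first."""
    deadline = clock() + timeout_s
    while True:
        if read_flag() == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)
```

The sleep and clock arguments are exposed only so that the function can be exercised without real delays; in normal use the defaults are sufficient.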
Changing Default Parameter Values#
The default parameters set at start-up are loaded from the file product_defaults.c in sources/applications/app_xvf3800/src/default_params. In this file the values are included from header files that are auto-generated at compile time. The values used in product_defaults.c must be updated using the YAML files stored in sources/applications/app_xvf3800/cmd_map_gen/yaml_files/defaults/.
Note
Any default value set outside the YAML files will be overwritten at compile time.
In the defaults folder four files are present:
mic_geometries.yaml
control_param_values.yaml
gpi_config.yaml
gpo_config.yaml
Warning
All the parameters in the files above must be set; failure to do this can lead to unexpected behaviour of the device, such as uninitialized start-up values.
mic_geometries.yaml contains the coordinates for each of the 4 mics for both the linear and squarecular geometries. An example of the values is below:
LINEAR_GEOMETRY:
  - MIC0: ( -0.04995f, 0.00f, 0.00f )
  - MIC1: ( -0.01665f, 0.00f, 0.00f )
  - MIC2: ( 0.01665f, 0.00f, 0.00f )
  - MIC3: ( 0.04995f, 0.00f, 0.00f )
SQUARECULAR_GEOMETRY:
  - MIC0: ( 0.0333f, -0.0333f, 0.00f )
  - MIC1: ( 0.0333f, 0.0333f, 0.00f )
  - MIC2: ( -0.0333f, 0.0333f, 0.00f )
  - MIC3: ( -0.0333f, -0.0333f, 0.00f )
The user must update the values for the geometry used in their build. For more information about how to set these values, please refer to the Direction of Arrival section in the User Guide.
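The example coordinates above correspond to a linear array with 0.0333 m between adjacent microphones, centred on the origin, and to a square with a side of 0.0666 m. Coordinates for a different spacing can be generated in the same way. The following sketch assumes the coordinates are in metres and that the array is centred on the origin, as the example values suggest; the function names are hypothetical, and the printed layout should be compared with the original file before use, since the exact format expected by the build is not described here.

```python
def linear_geometry(spacing_m, num_mics=4):
    """Coordinates of a uniformly spaced linear array centred on the origin along x."""
    offset = (num_mics - 1) / 2.0
    return [((i - offset) * spacing_m, 0.0, 0.0) for i in range(num_mics)]


def square_geometry(side_m):
    """Corner coordinates of a square centred on the origin, in the example's corner order."""
    half = side_m / 2.0
    return [(half, -half, 0.0), (half, half, 0.0), (-half, half, 0.0), (-half, -half, 0.0)]


def to_yaml_lines(name, coords):
    """Format coordinates in the style of the mic_geometries.yaml example."""
    lines = [f"{name}:"]
    for i, (x, y, z) in enumerate(coords):
        lines.append(f"  - MIC{i}: ( {x:.5f}f, {y:.5f}f, {z:.5f}f )")
    return lines


if __name__ == "__main__":
    print("\n".join(to_yaml_lines("LINEAR_GEOMETRY", linear_geometry(0.0333))))
    print("\n".join(to_yaml_lines("SQUARECULAR_GEOMETRY", square_geometry(0.0666))))
```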
control_param_values.yaml lists all the control parameters which can be configured. An example of a parameter with a default value is below:
PP_RESID:
  - cmd: PP_AGCONOFF
    default_value: on
The parameters in the file are organized into arrays, and each array contains all the parameters related to a particular control resource ID. The parameter name is stored in the cmd key and the default value in the default_value key. In the example above, the default value of the parameter PP_AGCONOFF belonging to the PP_RESID is on.
The number of values and type of each parameter may vary from command to command. It is advised to look up the command information in the tables in an Appendix of the User Guide and to follow the format of the original default values in order to set the values properly.
gpi_config.yaml stores all the settings of the GPI pins. The XVF3800 has 2 configurable GPI pins and the following parameters can be modified:
active_level: 0 for low and 1 for high
event_config: four types of events are supported:
EdgeNone: no event is detected on either edge
EdgeFalling: an event is detected on the falling edge (high to low transition)
EdgeRising: an event is detected on the rising edge (low to high transition)
EdgeBoth: one event is detected on the rising edge and one on the falling edge
The default configurations of the GPI pins are below:
# Exactly GPIO_NUM_INPUT_PINS pins should be defined here
PIN0:
  active_level: 1
  event_config: EdgeNone
PIN1:
  active_level: 1
  event_config: EdgeNone
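Because every pin and every setting must be present, it can help to check an edited GPI configuration before rebuilding. The sketch below validates a configuration that has already been loaded into a Python dictionary (the YAML parsing step is not shown). The function name validate_gpi_config is hypothetical, and the rules are taken only from the description above: two pins, an active_level of 0 or 1, and one of the four event types.

```python
VALID_EVENTS = {"EdgeNone", "EdgeFalling", "EdgeRising", "EdgeBoth"}
NUM_GPI_PINS = 2  # the XVF3800 has 2 configurable GPI pins


def validate_gpi_config(config):
    """Return a list of human-readable problems; an empty list means the config looks valid."""
    errors = []
    expected_pins = [f"PIN{i}" for i in range(NUM_GPI_PINS)]
    if sorted(config) != expected_pins:
        errors.append(f"expected exactly {expected_pins}, found {sorted(config)}")
    for pin, settings in config.items():
        if settings.get("active_level") not in (0, 1):
            errors.append(f"{pin}: active_level must be 0 or 1")
        if settings.get("event_config") not in VALID_EVENTS:
            errors.append(f"{pin}: event_config must be one of {sorted(VALID_EVENTS)}")
    return errors


if __name__ == "__main__":
    defaults = {
        "PIN0": {"active_level": 1, "event_config": "EdgeNone"},
        "PIN1": {"active_level": 1, "event_config": "EdgeNone"},
    }
    print(validate_gpi_config(defaults) or "configuration looks valid")
```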
gpo_config.yaml lists the settings of all the GPO pins and ports. The XVF3800 device has one 8-bit port designated for GPO. Only five of the eight pins are pinned out, and three of these are required for the device to operate normally, leaving the remaining two pins available for user modification. These are pins 6 and 7, and they are used to control the LEDs in the default firmware.
In the file each port must be listed; the XVF3800 only implements PORT0. For each port an array of eight pins must be defined and each pin has the following configurable settings:
pin_number: this value shouldn’t be modified
active_level: 0 for low and 1 for high
output_duty_percent: Pulse-width modulation (PWM) duty cycle specified as a percentage
flash_serial_mask: serial flash mask where each bit specifies the GPO pin state for a 100 ms interval
The default configurations of the GPO port and pins are below:
PORT0:
  # UNUSED: NOT PINNED OUT
  - pin_number: 0
    active_level: 1
    output_duty_percent: 0
    flash_serial_mask: 0xFFFFFFFF
  # UNUSED: NOT PINNED OUT
  - pin_number: 1
    active_level: 1
    output_duty_percent: 0
    flash_serial_mask: 0xFFFFFFFF
  # UNUSED: NOT PINNED OUT
  - pin_number: 2
    active_level: 1
    output_duty_percent: 0
    flash_serial_mask: 0xFFFFFFFF
  # GPO_DAC_RST_N_PIN
  - pin_number: 3
    active_level: 1
    output_duty_percent: 100
    flash_serial_mask: 0xFFFFFFFF
  # GPO_SQ_nLIN_PIN
  - pin_number: 4
    active_level: 1
    output_duty_percent: 0
    flash_serial_mask: 0xFFFFFFFF
  # GPO_INT_N_PIN
  - pin_number: 5
    active_level: 1
    output_duty_percent: 100
    flash_serial_mask: 0xFFFFFFFF
  # GPO_LED_RED_PIN
  - pin_number: 6
    active_level: 0
    output_duty_percent: 0
    flash_serial_mask: 0xFFFFFFFF
  # GPO_LED_GREEN_PIN
  - pin_number: 7
    active_level: 0
    output_duty_percent: 0
    flash_serial_mask: 0xFFFFFFFF
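A flash_serial_mask value can be derived from a list of on/off states, one per 100 ms interval. The sketch below makes two assumptions that are not confirmed by this document: that the mask is 32 bits wide (inferred from the default value 0xFFFFFFFF) and that bit 0 corresponds to the first interval. The function name flash_mask is hypothetical; verify the resulting pattern on hardware before relying on it.

```python
MASK_BITS = 32      # assumed width, inferred from the default 0xFFFFFFFF
INTERVAL_MS = 100   # each bit covers one 100 ms interval


def flash_mask(states):
    """Pack MASK_BITS on/off states into an integer, with bit 0 as the first interval (assumed)."""
    if len(states) != MASK_BITS:
        raise ValueError(f"expected {MASK_BITS} states, got {len(states)}")
    mask = 0
    for bit, is_on in enumerate(states):
        if is_on:
            mask |= 1 << bit
    return mask


if __name__ == "__main__":
    # Pin asserted for the first 16 intervals (1.6 s), then released for 16 intervals.
    pattern = [True] * 16 + [False] * 16
    print(f"flash_serial_mask: 0x{flash_mask(pattern):08X}")
```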
When the default parameters are changed, it is necessary to rebuild the application and reload it onto the XVF3800, as described in the following section, Building an Executable.