Automatic Gain Control
The Automatic Gain Control (AGC) component provides an API to implement Automatic Gain Control within an application. The AGC algorithm can either dynamically adapt the audio gain or apply a fixed gain, so that voice content maintains a desired output level. The AGC uses a Voice to Noise Ratio Estimator (VNR) to normalise voice content and avoid amplifying noise sources, and applies a soft limiter to avoid clipping on the output. The design is based on standard modern AGC techniques as detailed in Acoustic Echo and Noise Control by Hänsler and Schmidt.
The gain control can adapt to maintain the amplitude of the peak of the frame within an upper and lower bound configured for the AGC instance. When used in an application with the VNR, the AGC will adapt only when voice activity is detected, so that speech in the input signal is amplified above other sounds.
The Loss Control process improves the subjective audio quality by attenuating any residual echo of the reference far-end audio. It is designed to be used on the communications channel. When both far-end echo and near-end audio are present, the attenuation is reduced, allowing listeners to interrupt each other. The Loss Control relies on the Acoustic Echo Canceller to classify and attenuate residual far-end echo.
An optional soft clipping stage is applied at the end of the AGC to avoid hard clipping of the output signal during sudden loud sounds.
Fig. 20 The AGC topology.
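The soft clipping stage mentioned above can be illustrated with a minimal sketch. The curve used here (linear below a knee, then a rational compression that approaches full scale asymptotically) is an assumption chosen for illustration and is not taken from the library; the function name and the knee parameter are hypothetical.

```c
#include <assert.h>

/* Hypothetical soft clipper on a sample normalised to [-1.0, 1.0].
 * Below the knee the sample passes unchanged. Above it, the excess is
 * compressed so the output approaches, but never reaches, full scale.
 * This curve is an illustrative assumption, not the library's limiter. */
static double example_soft_clip(double x, double knee)
{
    assert(knee > 0.0 && knee < 1.0);

    double mag = (x < 0.0) ? -x : x;
    if (mag <= knee) {
        return x; /* linear region */
    }

    double headroom = 1.0 - knee;
    double excess = (mag - knee) / headroom;
    double shaped = knee + headroom * (excess / (1.0 + excess));

    return (x < 0.0) ? -shaped : shaped;
}
```

With a knee of 0.5, an input of 0.25 is returned unchanged, while an input at full scale is mapped to 0.75. The curve is continuous at the knee, so no step is introduced where compression begins.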
Overview
The AGC component in lib_voice works on a single input channel, dynamically adapting
the audio gain to maintain voice content at a desired output level while avoiding amplification
of noise sources. The AGC operates at a fixed 16 kHz sample rate.
The AGC uses the Voice to Noise Ratio Estimator to detect voice activity and normalise voice content, ensuring that gain adaption only occurs during speech and not on noise. An optional Loss Control feature can be enabled to attenuate residual far-end echo using metadata from the Acoustic Echo Canceller. A soft limiter can also be applied to prevent clipping on the output.
If multiple channels need to be processed by the application, or multiple outputs are required, an independent instance of the AGC must be run for each channel.
Signal Representation
Gain control is performed on a frame-by-frame basis. Each frame consists of 15 ms of audio (240 samples at 16 kHz input sampling frequency), with input data expected in fixed-point 32-bit 1.31 format. The output is the gain-adjusted signal in the same format.
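As an illustration of the frame size and sample format described above, the sketch below derives the frame length from the sample rate and converts between floating-point and signed 1.31 fixed-point values. The macro and function names are hypothetical helpers for this example and are not part of the AGC API.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_SAMPLE_RATE_HZ 16000
#define EXAMPLE_FRAME_MS       15
/* 16000 samples/s * 15 ms / 1000 = 240 samples per frame */
#define EXAMPLE_FRAME_LEN      ((EXAMPLE_SAMPLE_RATE_HZ * EXAMPLE_FRAME_MS) / 1000)

/* Convert a floating-point sample to signed 1.31 format
 * (1 sign bit, 31 fractional bits), saturating out-of-range input. */
static int32_t example_float_to_q31(double x)
{
    assert(x == x); /* reject NaN */

    double scaled = x * 2147483648.0; /* multiply by 2^31 */
    if (scaled >= 2147483647.0) {
        return INT32_MAX;
    }
    if (scaled <= -2147483648.0) {
        return INT32_MIN;
    }
    /* Round to nearest, ties away from zero. */
    return (int32_t)(scaled + (scaled >= 0.0 ? 0.5 : -0.5));
}

/* Convert a signed 1.31 sample back to floating point in [-1.0, 1.0). */
static double example_q31_to_float(int32_t q)
{
    return (double)q / 2147483648.0;
}
```

In this format 0.5 is stored as 0x40000000, and +1.0 cannot be represented exactly, so it saturates to INT32_MAX.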
Processing Flow
The internal logic of the AGC algorithm is represented in the flow chart shown in Fig. 21. This diagram illustrates the main decision points and processing steps performed for each input frame. It shows how the AGC determines whether to adapt the gain based on voice activity, applies peak and threshold checks, manages loss control, and optionally performs soft clipping.
Fig. 21 AGC Logic Flow Chart
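The adaption decision can be sketched in simplified form as follows. This is not the library implementation: the struct, the function names, the use of double-precision values, and the step factors EXAMPLE_GAIN_INC and EXAMPLE_GAIN_DEC are all assumptions made for illustration, and the thresholds are expressed here as fractions of full scale. The flow chart contains further checks that this sketch omits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-frame step factors; the real values are not given here. */
#define EXAMPLE_GAIN_INC 1.05
#define EXAMPLE_GAIN_DEC 0.95

/* Hypothetical state mirroring a subset of the documented parameters. */
typedef struct {
    int    adapt;           /* non-zero enables adaption */
    double vnr_threshold;   /* adapt only when the VNR is at or above this */
    double gain;            /* current linear gain */
    double max_gain;        /* upper bound on the adapted gain */
    double upper_threshold; /* target maximum output peak, fraction of full scale */
    double lower_threshold; /* target minimum output peak, fraction of full scale */
} example_agc_t;

/* Peak absolute value of a 1.31 frame, normalised to [0.0, 1.0]. */
static double example_frame_peak(const int32_t *frame, size_t len)
{
    assert(frame != NULL);

    double peak = 0.0;
    for (size_t i = 0; i < len; i++) {
        double v = (double)frame[i];
        if (v < 0.0) {
            v = -v;
        }
        if (v > peak) {
            peak = v;
        }
    }
    return peak / 2147483648.0;
}

/* Update the gain for one frame given the input peak and the VNR estimate. */
static void example_adapt_gain(example_agc_t *agc, double input_peak, double vnr)
{
    assert(agc != NULL);

    if (!agc->adapt || vnr < agc->vnr_threshold) {
        return; /* fixed gain, or no voice activity: leave the gain alone */
    }

    double output_peak = input_peak * agc->gain;
    if (output_peak > agc->upper_threshold) {
        agc->gain *= EXAMPLE_GAIN_DEC; /* too loud: reduce the gain */
    } else if (output_peak < agc->lower_threshold) {
        agc->gain *= EXAMPLE_GAIN_INC; /* too quiet: increase the gain */
    }

    if (agc->gain > agc->max_gain) {
        agc->gain = agc->max_gain;
    }
}
```

A caller would compute the peak of each input frame with example_frame_peak(), pass it to example_adapt_gain() together with the VNR estimate for that frame, and then scale the frame by the resulting gain.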
The logic of the loss control process is shown in the flow chart in Fig. 22. This diagram illustrates how the loss control estimates the state and applies the appropriate attenuation based on the presence of far-end echo and near-end audio. It is only used when the loss control feature is enabled in the AGC configuration.
Fig. 22 Loss Control Logic Flow Chart
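The state estimation can be approximated by the sketch below, which classifies a frame into one of four states and selects a gain for it. Only lc_near_delta, lc_near_delta_far_active and lc_gain_double_talk correspond to documented parameters; every other name, the input level estimates, and the per-state gains prefixed example_ are assumptions for illustration. Any smoothing of gain changes between frames is not modelled.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    EXAMPLE_LC_SILENCE,
    EXAMPLE_LC_NEAR_ONLY,
    EXAMPLE_LC_FAR_ONLY,
    EXAMPLE_LC_DOUBLE_TALK
} example_lc_state_t;

typedef struct {
    double lc_near_delta;            /* near-end vs background noise multiplier */
    double lc_near_delta_far_active; /* near-end vs residual echo multiplier */
    double lc_gain_double_talk;      /* gain applied during double-talk */
    double example_gain_near_only;   /* hypothetical */
    double example_gain_far_only;    /* hypothetical */
    double example_gain_silence;     /* hypothetical */
} example_lc_config_t;

/* Classify one frame from level estimates (arbitrary linear units). */
static example_lc_state_t example_lc_classify(const example_lc_config_t *cfg,
                                              double near_level,
                                              double noise_level,
                                              double residual_echo_level,
                                              int far_active)
{
    assert(cfg != NULL);

    if (far_active) {
        /* Near-end must exceed the residual echo by the far-active delta. */
        return (near_level > cfg->lc_near_delta_far_active * residual_echo_level)
                   ? EXAMPLE_LC_DOUBLE_TALK
                   : EXAMPLE_LC_FAR_ONLY;
    }
    /* No far-end playback: compare against the background noise instead. */
    return (near_level > cfg->lc_near_delta * noise_level)
               ? EXAMPLE_LC_NEAR_ONLY
               : EXAMPLE_LC_SILENCE;
}

/* Select the loss control gain for a given state. */
static double example_lc_gain(const example_lc_config_t *cfg, example_lc_state_t state)
{
    assert(cfg != NULL);

    switch (state) {
    case EXAMPLE_LC_NEAR_ONLY:   return cfg->example_gain_near_only;
    case EXAMPLE_LC_DOUBLE_TALK: return cfg->lc_gain_double_talk;
    case EXAMPLE_LC_FAR_ONLY:    return cfg->example_gain_far_only;
    default:                     return cfg->example_gain_silence;
    }
}
```

In this sketch, lowering either delta makes near-end speech easier to detect, at the cost of letting more background noise or residual echo through, which matches the tuning guidance given for those parameters.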
A startup delay can be configured to mute output for a specified number of frames after initialisation.
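A minimal sketch of such a startup delay is shown below, assuming it is implemented as a frame counter that is decremented once per processed frame. The type and function names are hypothetical and do not come from the AGC API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EXAMPLE_FRAME_LEN 240

/* Hypothetical startup state: number of frames still to be muted. */
typedef struct {
    unsigned frames_remaining;
} example_startup_t;

/* Mute the frame in place while the startup delay is active.
 * Returns 1 if the frame was muted, 0 if it was left untouched. */
static int example_startup_process(example_startup_t *st,
                                   int32_t frame[EXAMPLE_FRAME_LEN])
{
    assert(st != NULL && frame != NULL);

    if (st->frames_remaining == 0) {
        return 0;
    }

    memset(frame, 0, sizeof(int32_t) * EXAMPLE_FRAME_LEN);
    st->frames_remaining--;
    return 1;
}
```

With a delay of two frames, the first two calls zero the frame and the third call passes it through unchanged. At 15 ms per frame, a delay of N frames mutes the first N × 15 ms of output.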
Usage
Before processing any frames, the application must configure and initialise the
AGC instance by calling agc_init(). Several parameter sets are provided in
agc_profiles.h which can be used to configure the AGC for different
applications. Details on the profiles and key parameters are provided in AGC Data Structures and Enums.
After initialisation, agc_process_frame() should be called for each frame.
This will update the AGC instance’s internal state and produce
the output frame by applying the AGC algorithm to the input frame.
Refer to the Pipeline example to see how to use the APIs above.
The AGC gain and Loss Control gain values are multiplicative factors that are applied to scale the input frame. Therefore, a fixed gain value of 1.0 (without loss control) leaves the input unchanged.
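The multiplicative behaviour can be illustrated with the following sketch, which scales a 1.31 frame by the product of an AGC gain and a loss control gain and saturates at the format limits. The function names are hypothetical, and double-precision arithmetic is used here for readability; this is an assumption and is not how the library represents gains.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_FRAME_LEN 240

/* Scale one 1.31 sample by a linear gain, saturating at the format limits. */
static int32_t example_scale_sample(int32_t sample, double gain)
{
    double scaled = (double)sample * gain;
    if (scaled >= 2147483647.0) {
        return INT32_MAX;
    }
    if (scaled <= -2147483648.0) {
        return INT32_MIN;
    }
    return (int32_t)scaled; /* truncates towards zero */
}

/* Apply the combined gain (AGC gain * loss control gain) to a whole frame. */
static void example_apply_gain(int32_t output[EXAMPLE_FRAME_LEN],
                               const int32_t input[EXAMPLE_FRAME_LEN],
                               double agc_gain,
                               double lc_gain)
{
    assert(output != NULL && input != NULL);

    double total_gain = agc_gain * lc_gain;
    for (size_t i = 0; i < EXAMPLE_FRAME_LEN; i++) {
        output[i] = example_scale_sample(input[i], total_gain);
    }
}
```

Calling example_apply_gain() with both gains set to 1.0 copies the input to the output unchanged, while a combined gain above 1.0 can drive loud samples into saturation, which is the case the soft clipping stage is intended to soften.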
Parameters
The key AGC parameters are highlighted below:
agc_config_t.adapt - Boolean to enable AGC adaption; if enabled, the gain to apply will adapt based on the peak of the input frame and the upper/lower threshold parameters.
agc_config_t.vnr_threshold - VNR threshold for voice activity detection. A higher value will only adapt the AGC on clean speech. A lower value will adapt the AGC on noisy speech, but may also adapt to more non-speech signals.
agc_config_t.gain - The current gain to be applied, not including loss control. When adapt is false, this gain will be applied to every frame. When adapt is true, the initial value of this gain will be applied to the first frame and then it will be adapted on subsequent frames.
agc_config_t.max_gain - The maximum gain allowed when adaption is enabled. This can be used to prevent the AGC amplifying very quiet signals.
agc_config_t.upper_threshold - The target maximum peak level of the AGC output. If the AGC output goes above this level, the gain is reduced.
agc_config_t.lower_threshold - The target minimum peak level of the AGC output. If the AGC output goes below this level, the gain is increased.
agc_config_t.soft_clipping - Boolean to enable soft-clipping of the output frame.
agc_config_t.lc_enabled - Boolean to enable loss control. The loss control applies additional attenuation when there is no near-end speech. This must be disabled if the application doesn't have an AEC or VNR.
agc_config_t.lc_near_delta - Delta multiplier used when only near-end activity is detected. How many times louder the near-end signal must be than the background noise when there is no far-end playback. If the near-end speech is not heard during silence, reduce this value. If too much non-speech background noise is heard, increase this value.
agc_config_t.lc_near_delta_far_active - Delta multiplier used when both near-end and far-end activity is detected. How many times louder the near-end signal must be than the residual far-end speech (after the AEC) to be detected during double-talk. If the near-end speech is not heard during double-talk, reduce this value. If there is too much breakthrough of residual far-end echo when there is no near-end speech present, increase this value.
agc_config_t.lc_gain_double_talk - Loss control gain to apply when double-talk is detected. Reducing this value will reduce the level of the near-end speech during double-talk, but may help to reduce the level of residual far-end echo that is heard.
Other AGC parameters are listed in the agc_profiles.h header file
and are described in detail in agc_config_t.