VocalFusion Stereo Dev Kit for Amazon AVS

Part #: XK-VF3500-L33-AVS
Silicon on Board: XVF3500-FB167-C

The VocalFusion™ Stereo Dev Kit for Amazon AVS features a compact four-microphone linear array that enables developers and OEMs to add far-field voice capture to consumer electronics and IoT products. The linear design is optimised for integration into smart TVs, soundbars, set-top boxes, digital media adaptors and any other electronic product that requires true stereo acoustic echo cancellation.

The captured voice signals remain crystal clear even in noisy environments, so conversations and commands can be accurately processed and passed, via an applications processor such as a Raspberry Pi, to Alexa's cloud-based speech recognition system.

The kit, qualified by Amazon, provides direct interfacing to four PDM (Pulse Density Modulation) microphones, with an I2S interface (USB optional) to connect to application or host processors. Voice sources are isolated from unwanted noise by integrated advanced DSP techniques, including beamforming, echo cancellation and noise suppression. A rich set of optimisation parameters is available to ensure that the best results are achieved for the individual acoustics of the end product. These parameters include adjustments to noise attenuation and gain control, as well as numerous optimisations for echo cancellation.


VocalFusion XVF3500

- High performance smart microphone for voice interfaces
- Integrated microphone and voice DSP
- Integrated keyword detection
- Integrated USB 2.0 PHY for high-speed and full-speed host and device operation
- FB167 package, 0.4mm pitch

Audio output

- I2S output to stereo DAC
- 48kHz PCM
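When wiring the I2S output to a stereo DAC, the bit clock the DAC must accept follows directly from the sample rate. The sketch below is a minimal illustration of that arithmetic; it assumes standard two-channel I2S framing with 32-bit slots, since the slot width is not stated here, and the function name is hypothetical.

```python
def i2s_bit_clock_hz(sample_rate_hz: int, channels: int = 2, slot_bits: int = 32) -> int:
    """Return the I2S bit clock (BCLK) frequency in Hz.

    Standard I2S carries one slot per channel in every frame and one frame
    per sample period, so BCLK = sample rate x channels x slot width.
    """
    return sample_rate_hz * channels * slot_bits


# 48kHz stereo output; the 32-bit slot width is an assumption.
bclk = i2s_bit_clock_hz(48_000)
print(f"BCLK = {bclk / 1e6:.3f} MHz")  # -> BCLK = 3.072 MHz
```

If the link were configured for 24-bit slots instead, the same formula gives 2.304MHz, so the DAC's supported BCLK/LRCLK ratios are worth checking against whichever framing is actually used.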

Microphone interface

- Direct interfacing to 4 PDM microphones
- 4-mic linear array with Infineon (IFX) microphones
- Inter-mic spacing: 33.33mm
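The array geometry above fixes a few useful acoustic quantities: the total aperture, the frequency above which a uniform linear array starts to alias spatially (spacing larger than half a wavelength), and the largest time difference of arrival between the end microphones. The sketch below computes them; it assumes a speed of sound of 343m/s and evaluates delays at the 48kHz output rate, although the device's internal processing rate may differ.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly 20 degrees C


def array_geometry(num_mics: int, spacing_m: float, fs_hz: float,
                   c: float = SPEED_OF_SOUND_M_S) -> dict:
    """Basic figures of merit for a uniform linear microphone array."""
    aperture_m = (num_mics - 1) * spacing_m
    # Spatial aliasing starts once spacing exceeds half a wavelength: f = c / (2d).
    alias_freq_hz = c / (2.0 * spacing_m)
    # Largest time difference of arrival: a source at endfire, across the full aperture.
    max_tdoa_samples = aperture_m / c * fs_hz
    return {
        "aperture_m": aperture_m,
        "alias_freq_hz": alias_freq_hz,
        "max_tdoa_samples": max_tdoa_samples,
    }


geom = array_geometry(num_mics=4, spacing_m=0.03333, fs_hz=48_000)
for name, value in geom.items():
    print(f"{name}: {value:.4g}")
```

For this kit the aperture is about 100mm, spatial aliasing begins near 5.1kHz (above most of the speech energy that matters for recognition), and the end-to-end delay is at most about 14 samples at 48kHz, which is why steering delays have to be applied as fractional delays.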

Host processor interface options

- I2S interface to Raspberry Pi sub-system with I2C for control
- High-speed USB 2.0 compliant device

Flash memory

- 2048KB on-board QSPI flash

Beamformer

- Adaptive de-reverberation beamformer
- Tracks the loudest voice
- Adaptation time typically 100ms
- 180° coverage
- Up to 5m operating distance
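To illustrate what "tracks the loudest voice" over 180° can mean for a linear array, the sketch below scans a conventional delay-and-sum beamformer across angles and picks the direction with the highest steered response power. This is a generic textbook technique, not the XVF3500's adaptive de-reverberation beamformer, whose algorithm is not described here. The 16kHz processing rate, 343m/s speed of sound and the function names are assumptions; only the four mics and 33.33mm spacing come from this sheet.

```python
import numpy as np

SPEED_OF_SOUND = 343.0   # m/s, assumed
SPACING = 0.03333        # m, inter-mic spacing from the datasheet
NUM_MICS = 4
FS = 16_000              # Hz, assumed processing rate

MIC_X = np.arange(NUM_MICS) * SPACING  # mic positions along the array axis


def arrival_delays(theta_deg):
    """Delay (s) of a far-field plane wave at each mic, relative to mic 0.
    90 degrees is broadside; 0 and 180 degrees are the two endfire directions."""
    return -MIC_X * np.cos(np.deg2rad(theta_deg)) / SPEED_OF_SOUND


def srp_scan(mic_signals, angles_deg, fmin=100.0, fmax=5000.0):
    """Steered response power of a delay-and-sum beamformer at each angle."""
    n = mic_signals.shape[1]
    spectra = np.fft.rfft(mic_signals, axis=1)           # (mics, bins)
    freqs = np.fft.rfftfreq(n, d=1.0 / FS)
    band = (freqs >= fmin) & (freqs <= fmax)             # stay below spatial aliasing
    spectra, freqs = spectra[:, band], freqs[band]
    powers = []
    for theta in angles_deg:
        tau = arrival_delays(theta)[:, None]             # (mics, 1)
        # Undo each mic's arrival delay (phase-align), then average across mics.
        aligned = spectra * np.exp(2j * np.pi * freqs[None, :] * tau)
        powers.append(np.sum(np.abs(aligned.mean(axis=0)) ** 2))
    return np.array(powers)


def simulate_plane_wave(source, theta_deg):
    """Delay one source signal to every mic (circular fractional delay via the FFT)."""
    n = source.size
    spec = np.fft.rfft(source)
    freqs = np.fft.rfftfreq(n, d=1.0 / FS)
    tau = arrival_delays(theta_deg)[:, None]
    return np.fft.irfft(spec[None, :] * np.exp(-2j * np.pi * freqs[None, :] * tau),
                        n=n, axis=1)


rng = np.random.default_rng(0)
mics = simulate_plane_wave(rng.standard_normal(4096), theta_deg=60.0)
mics += 0.05 * rng.standard_normal(mics.shape)           # a little sensor noise
angles = np.arange(0, 181, 5)
best = angles[np.argmax(srp_scan(mics, angles))]
print(f"loudest direction: about {best} degrees")
```

Because a linear array is rotationally symmetric about its own axis, it can only resolve angles between the two endfire directions, which matches the 180° coverage listed above.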

Acoustic Echo Canceller

- Stereo AEC with barge-in support
- Up to 50dB echo suppression
- Initial convergence <2s
- Adaptation time typically 100ms
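The convergence and suppression figures above describe an adaptive filter that learns the echo path from the loudspeaker to the microphones. As a rough illustration only, the sketch below is a single-channel normalised LMS (NLMS) echo canceller, a standard textbook algorithm; it is not the XVF3500's stereo AEC, which must model two echo paths driven by highly correlated channels and is considerably harder. The filter length, step size, 16kHz rate and simulated echo path are all hypothetical.

```python
import numpy as np


def nlms_echo_canceller(far_end, mic, taps=256, mu=0.5, eps=1e-6):
    """Mono NLMS echo canceller: returns (residual signal, final filter taps)."""
    w = np.zeros(taps)            # echo-path estimate
    x_buf = np.zeros(taps)        # most recent far-end samples, newest first
    residual = np.zeros_like(mic)
    for n in range(len(mic)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = far_end[n]
        e = mic[n] - w @ x_buf    # mic minus estimated echo
        # Normalised step: adaptation speed is independent of far-end level.
        w += (mu / (eps + x_buf @ x_buf)) * e * x_buf
        residual[n] = e
    return residual, w


def erle_db(mic, residual):
    """Echo return loss enhancement: how much echo energy was removed, in dB."""
    return 10.0 * np.log10(np.sum(mic ** 2) / np.sum(residual ** 2))


rng = np.random.default_rng(1)
far = rng.standard_normal(16_000)                                  # 1s at an assumed 16kHz
echo_path = rng.standard_normal(128) * np.exp(-np.arange(128) / 20.0)  # hypothetical room
mic = np.convolve(far, echo_path)[: far.size] + 1e-3 * rng.standard_normal(far.size)
residual, _ = nlms_echo_canceller(far, mic)
print(f"ERLE over the last 0.25s: {erle_db(mic[-4000:], residual[-4000:]):.1f} dB")
```

With white-noise excitation and no near-end talker this toy converges well within a second; real speech and music excite the filter far less evenly, which is one reason production cancellers add double-talk detection and residual echo suppression.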

Noise Suppression

- Up to 15dB stationary noise suppression
- Up to 15dB diffuse noise suppression
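A figure such as "up to 15dB" is commonly implemented as a floor on the per-frequency suppression gain, so that no bin is attenuated by more than that amount. The sketch below shows a Wiener-style gain with such a floor, given a noise power estimate; the gain rule and names are assumptions for illustration, and it does not model how the device separates stationary from diffuse noise.

```python
import numpy as np


def suppression_gain(noisy_psd, noise_psd, max_suppression_db=15.0):
    """Per-bin Wiener-style amplitude gain, floored at -max_suppression_db."""
    floor = 10.0 ** (-max_suppression_db / 20.0)   # about 0.178 for 15dB
    snr_post = noisy_psd / np.maximum(noise_psd, 1e-12)
    # Crude a-priori SNR estimate by power subtraction, clipped at zero.
    snr_prior = np.maximum(snr_post - 1.0, 0.0)
    gain = snr_prior / (1.0 + snr_prior)           # Wiener gain
    return np.maximum(gain, floor)                 # never attenuate more than the limit


g = suppression_gain(np.array([1.0, 2.0, 101.0]), np.ones(3))
print(np.round(g, 3))
```

Noise-only bins are held at the floor rather than driven to zero, which limits the "musical noise" artefacts that aggressive suppression tends to produce.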

Far-end processing

- Dynamic Range Control
- Equalizer
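The sheet does not say which dynamic range control or equaliser topology is used on the far-end path. The sketch below assumes two common building blocks: a hard-knee downward compressor's static gain curve, and a peaking-EQ biquad using the formulas from Robert Bristow-Johnson's Audio EQ Cookbook. Threshold, ratio, centre frequency, gain and Q are illustrative values only.

```python
import cmath
import math


def drc_gain_db(level_db, threshold_db=-20.0, ratio=4.0, makeup_db=0.0):
    """Static curve of a hard-knee compressor: gain (dB) to apply at an input level (dB)."""
    if level_db <= threshold_db:
        return makeup_db
    out_db = threshold_db + (level_db - threshold_db) / ratio
    return out_db - level_db + makeup_db


def peaking_eq_coeffs(f0_hz, gain_db, q, fs_hz):
    """Peaking-EQ biquad (RBJ Audio EQ Cookbook), normalised so a[0] == 1."""
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0_hz / fs_hz
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    b = [1.0 + alpha * a_lin, -2.0 * cos_w0, 1.0 - alpha * a_lin]
    a = [1.0 + alpha / a_lin, -2.0 * cos_w0, 1.0 - alpha / a_lin]
    return [x / a[0] for x in b], [x / a[0] for x in a]


def biquad_response_db(b, a, f_hz, fs_hz):
    """Magnitude response of the biquad at one frequency, in dB."""
    z1 = cmath.exp(-1j * 2.0 * math.pi * f_hz / fs_hz)   # z^-1 on the unit circle
    h = (b[0] + b[1] * z1 + b[2] * z1 * z1) / (a[0] + a[1] * z1 + a[2] * z1 * z1)
    return 20.0 * math.log10(abs(h))


print(drc_gain_db(-30.0), drc_gain_db(0.0))   # -> 0.0 -15.0
b, a = peaking_eq_coeffs(1000.0, 6.0, 1.0, 48_000)
print(f"{biquad_response_db(b, a, 1000.0, 48_000):.2f} dB at 1kHz")  # -> 6.00 dB at 1kHz
```

A real DRC would also smooth the measured level with attack and release time constants before applying this static curve.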