XCORE ® -VOICE Solutions£££doc/index.html#xcore-reg-voice-solutions

XCORE ® -VOICE Solutions$$$XCORE-VOICE Quick Start Guide£££doc/quick_start_guide/index.html#xcore-voice-quick-start-guide

XCORE ® -VOICE Solutions$$$XCORE-VOICE Quick Start Guide$$$Product Description£££doc/quick_start_guide/01_introduction.html#product-description

The XCORE-VOICE Solution consists of example designs and a C-based SDK for the development of audio front-end applications to support far-field voice use cases on the xcore.ai family of chips (XU316). The XCORE-VOICE examples are currently based on FreeRTOS or bare-metal, leveraging the flexibility of the xcore.ai platform and providing designers with a familiar environment to customize and develop products.

XCORE-VOICE example designs include turn-key solutions to enable easier product development for smart home applications such as light switches, thermostats, and home appliances. The xcore.ai architecture provides powerful signal processing and accelerated AI capabilities which, combined with the XCORE-VOICE framework, allow designers to incorporate keyword detection, event detection, or advanced local dictionary support to create a complete voice interface solution. Bridging designs, including PDM microphone to host aggregation, are also included, showcasing the use of xcore.ai as an interfacing and bridging solution for deployment in existing systems.

The C SDK is composed of the following components:

  • Peripheral IO libraries, including UART, I2C, I2S, SPI, QSPI, PDM microphones, and USB. These libraries support bare-metal and RTOS application development.

  • Libraries core to DSP applications, including vectorized math and voice processing DSP. These libraries support bare-metal and RTOS application development.

  • Libraries for speech recognition applications. These libraries support bare-metal and RTOS application development.

  • Libraries that enable multi-core FreeRTOS development on xcore including a wide array of RTOS drivers and middleware.

  • Pre-built and validated audio processing pipelines.

  • Code Examples - Examples showing a variety of xcore features based on bare-metal and FreeRTOS programming.

  • Documentation - Tutorials, references and API guides.

component diagram

XCORE ® -VOICE Solutions$$$XCORE-VOICE Quick Start Guide$$$Key Features£££doc/quick_start_guide/01_introduction.html#key-features

The XCORE-VOICE Solution takes advantage of the flexible software-defined xcore.ai architecture to support numerous far-field voice use cases through the available example designs and the ability to construct user-defined audio pipelines from the software components and libraries in the C-based SDK.

These include:

Voice Processing components

  • Two PDM microphone interfaces

  • Digital signal processing pipeline

  • Full duplex, stereo, Acoustic Echo Cancellation (AEC)

  • Reference audio via I2S with automatic bulk delay insertion

  • Point noise suppression via interference canceller

  • Switchable stationary noise suppressor

  • Programmable Automatic Gain Control (AGC)

  • Flexible audio output routing and filtering

  • Support for Sensory, Cyberon or other 3rd party Automatic Speech Recognition (ASR) software

Device Interface components

  • Full speed USB2.0 compliant device supporting USB Audio Class (UAC) 2.0

  • Flexible Peripheral Interfaces

  • Programmable digital general-purpose inputs and outputs

Example Designs utilizing above components

  • Far-Field Voice Local Command

  • Low Power Far-Field Voice Local Command

  • Far-Field Voice Assistant

Firmware Management

  • Boot from QSPI Flash

  • Default firmware image for power-on operation

  • Option to boot from a local host processor via SPI

  • Device Firmware Update (DFU) via USB or I2C

Power Consumption

  • FFD/FFVA: 300-350mW (Typical)

  • Low Power FFD: 110mW (Full-Power), 54mW (Low-Power), <50mW possible with Sensory’s LPSD under certain conditions.
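
The Low Power FFD figures above can be combined into a rough average-power estimate. The sketch below is illustrative arithmetic only: the duty cycle (fraction of time spent in full-power mode) and the battery capacity are hypothetical inputs, and the runtime estimate ignores regulator losses and other board-level consumers.

```python
# Figures quoted above for the Low Power FFD design, in milliwatts.
FULL_POWER_MW = 110.0
LOW_POWER_MW = 54.0


def average_power_mw(full_power_fraction: float) -> float:
    """Time-weighted average power for a given fraction of time in full power."""
    if not 0.0 <= full_power_fraction <= 1.0:
        raise ValueError("full_power_fraction must be between 0 and 1")
    return (full_power_fraction * FULL_POWER_MW
            + (1.0 - full_power_fraction) * LOW_POWER_MW)


def battery_life_hours(capacity_mwh: float, full_power_fraction: float) -> float:
    """Idealized runtime for a hypothetical battery capacity given in mWh."""
    return capacity_mwh / average_power_mw(full_power_fraction)


if __name__ == "__main__":
    # Hypothetical usage pattern: the device is in full-power mode 10% of the time.
    print(round(average_power_mw(0.10), 1), "mW average")
```

Under these assumptions, a device that spends 10% of its time in full-power mode averages roughly 59.6mW.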

XCORE ® -VOICE Solutions$$$XCORE-VOICE Quick Start Guide$$$Obtaining the Hardware£££doc/quick_start_guide/01_introduction.html#obtaining-the-hardware

The XK-VOICE-L71 DevKit and Hardware Manual can be obtained from the XK-VOICE-L71 product information page.

The XK-VOICE-L71 is based on the: XU316-1024-QF60A

The XCORE-AI-EXPLORER DevKit and Hardware Manual used in the Microphone Aggregation example can be obtained from the XK-VOICE-L71 product information page.

Learn more about the XMOS XS3 Architecture

XCORE ® -VOICE Solutions$$$XCORE-VOICE Quick Start Guide$$$Obtaining the Software£££doc/quick_start_guide/01_introduction.html#obtaining-the-software

XCORE ® -VOICE Solutions$$$XCORE-VOICE Quick Start Guide$$$Obtaining the Software$$$Development Tools£££doc/quick_start_guide/01_introduction.html#development-tools

It is recommended that you download and install the latest release of the XTC Tools. XTC Tools 15.3.1 or newer are required. If you already have the XTC Toolchain installed, you can check the version with the following command:

xcc --version
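
When the version check is scripted, the comparison against the 15.3.1 minimum should be numeric rather than textual, so that a release such as 15.10.0 is not treated as older than 15.3.1. The sketch below assumes the version number has already been extracted from the tool output (the exact output format of `xcc --version` is not described here); the function name is illustrative.

```python
# Minimum XTC Tools version stated above.
MINIMUM_XTC_VERSION = (15, 3, 1)


def meets_minimum(version: str, minimum: tuple = MINIMUM_XTC_VERSION) -> bool:
    """Return True if a dotted version string such as "15.3.1" is at least `minimum`."""
    parts = tuple(int(p) for p in version.strip().split("."))
    # Pad shorter versions with zeros so "15.3" compares as (15, 3, 0).
    parts += (0,) * (len(minimum) - len(parts))
    return parts >= minimum


if __name__ == "__main__":
    for candidate in ("15.2.1", "15.3.1", "15.10.0"):
        print(candidate, meets_minimum(candidate))
```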

XCORE ® -VOICE Solutions$$$XCORE-VOICE Quick Start Guide$$$Obtaining the Software$$$Application Demonstrations£££doc/quick_start_guide/01_introduction.html#application-demonstrations

If you only want to run the example designs, pre-built firmware and other software can be downloaded from the XCORE-VOICE product information page.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Quick Start Guide$$$Obtaining the Software$$$Source Code£££doc/quick_start_guide/01_introduction.html#source-code

If you wish to modify the example designs, a zip archive of all source code can be downloaded from the XCORE-VOICE product information page.

See the Programming Guide for information on:

  • Prerequisites

  • Instructions for building, running, and debugging the example designs

  • Details on the software design and source code

XCORE ® -VOICE Solutions$$$XCORE-VOICE Quick Start Guide$$$Obtaining the Software$$$Source Code$$$Cloning the Repository£££doc/quick_start_guide/01_introduction.html#cloning-the-repository

Alternatively, the source code can be obtained by cloning the public GitHub repository.

Note

Cloning requires a GitHub account configured with SSH key authentication.

Run the following git command to clone the repository and all submodules:

git clone --recurse-submodules git@github.com:xmos/sln_voice.git

If you have previously cloned the repository or downloaded a zip file of source code, the following commands can be used to update and fetch the submodules:

git pull
git submodule update --init --recursive

XCORE ® -VOICE Solutions$$$XCORE-VOICE Quick Start Guide$$$Example Designs£££doc/quick_start_guide/02_example_designs.html#example-designs

XCORE ® -VOICE Solutions$$$XCORE-VOICE Quick Start Guide$$$Example Designs$$$Far-field Voice Local Command£££doc/quick_start_guide/02.1_ffd.html#far-field-voice-local-command

XCORE ® -VOICE Solutions$$$XCORE-VOICE Quick Start Guide$$$Example Designs$$$Far-field Voice Local Command$$$Overview£££doc/quick_start_guide/02.1_ffd.html#overview

These are the XCORE-VOICE far-field local control example designs demonstrating:

  • 2-microphone far-field voice control with I2C or UART interface

  • Audio pipeline including interference cancelling and noise suppression

  • 16-phrase English language speech recognition

XCORE ® -VOICE Solutions$$$XCORE-VOICE Quick Start Guide$$$Example Designs$$$Far-field Voice Local Command$$$Example designs£££doc/quick_start_guide/02.1_ffd.html#example-designs
XCORE ® -VOICE Solutions$$$XCORE-VOICE Quick Start Guide$$$Example Designs$$$Far-field Voice Local Command$$$Example designs$$$Demonstration£££doc/quick_start_guide/02.1_ffd.html#demonstration

This is the far-field voice local command (FFD) example design. Two examples are provided; both include speech recognition and a local dictionary. One example uses the Sensory TrulyHandsfree™ (THF) libraries, and the other uses the Cyberon DSpotter™ libraries.

When a wakeword phrase is detected followed by a command phrase, the application will output an audio response and a discrete message over I2C and UART.

Sensory’s THF and Cyberon’s DSpotter™ libraries ship with an expiring development license. The Sensory library suspends recognition after 11.4 hours or 107 recognition events, and the Cyberon library suspends recognition after 100 recognition events. Once a limit is reached, a device reset is required to resume normal operation. To perform a reset, either power cycle the device or press the SW2 button.
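
For test planning, the evaluation limits can be modeled as a simple check. This is a sketch of the limits as stated above, not the libraries' own enforcement logic, which is internal to the vendor code. The function name, the vendor keys, and the assumption that a limit counts as reached when the value equals the stated figure are all illustrative.

```python
# Evaluation limits as stated above; None means no such limit is stated.
EVALUATION_LIMITS = {
    "sensory": {"max_hours": 11.4, "max_events": 107},
    "cyberon": {"max_hours": None, "max_events": 100},
}


def recognition_suspended(vendor: str, hours_running: float, events: int) -> bool:
    """Return True if either stated limit for `vendor` has been reached."""
    limits = EVALUATION_LIMITS[vendor]
    # Assumption: a limit is reached when the count or time equals the stated value.
    if limits["max_events"] is not None and events >= limits["max_events"]:
        return True
    if limits["max_hours"] is not None and hours_running >= limits["max_hours"]:
        return True
    return False


if __name__ == "__main__":
    print(recognition_suspended("sensory", hours_running=2.0, events=107))
    print(recognition_suspended("cyberon", hours_running=20.0, events=99))
```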

Production software runs on a special device. Contact Cyberon, Sensory or XMOS sales for information about production use of the device.

Requirements

  • XK-VOICE-L71 board

  • Powered speaker(s) with 3.5mm jack connection (OPTIONAL)

XCORE ® -VOICE Solutions$$$XCORE-VOICE Quick Start Guide$$$Example Designs$$$Far-field Voice Local Command$$$Example designs$$$Hardware Setup£££doc/quick_start_guide/02.1_ffd.html#hardware-setup

This example design requires an XTAG4 and XK-VOICE-L71 board.

all components

Connect the XTAG4 to the debug header, as shown below.

xtag

Connect the micro USB ports of both the XTAG4 and the XK-VOICE-L71 to the programming host.

programming host setup
XCORE ® -VOICE Solutions$$$XCORE-VOICE Quick Start Guide$$$Example Designs$$$Far-field Voice Local Command$$$Example designs$$$Speakers (OPTIONAL)£££doc/quick_start_guide/02.1_ffd.html#speakers-optional

This example application features audio playback responses. Speakers can be connected to the LINE OUT on the XK-VOICE-L71.

speakers
XCORE ® -VOICE Solutions$$$XCORE-VOICE Quick Start Guide$$$Example Designs$$$Far-field Voice Local Command$$$Example designs$$$Running the Demonstration£££doc/quick_start_guide/02.1_ffd.html#running-the-demonstration
XCORE ® -VOICE Solutions$$$XCORE-VOICE Quick Start Guide$$$Example Designs$$$Far-field Voice Local Command$$$Example designs$$$Flashing the Firmware£££doc/quick_start_guide/02.1_ffd.html#flashing-the-firmware

Connect the XTAG4 via USB to the host computer running the XTC tools, and power on the board directly via USB.

On the host computer, open an XTC Tools Command Prompt and run the following command:

xflash --quad-spi-clock 50MHz --factory example_ffd.xe --boot-partition-size 0x100000 --data example_ffd_data_partition.bin

When the prompt returns, flashing has completed and the XTAG4 may be disconnected.
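
The flashing command above pairs a firmware image named `<example>.xe` with a data partition named `<example>_data_partition.bin`. When flashing is scripted, the argument list can be generated from the example name. This is a minimal sketch that assumes this naming pattern holds for the example being flashed; the helper name is hypothetical, and the flags are exactly those shown in the command above.

```python
import shlex


def xflash_command(example_name: str) -> list:
    """Build the xflash argument list for a given example name."""
    return [
        "xflash",
        "--quad-spi-clock", "50MHz",
        "--factory", f"{example_name}.xe",
        "--boot-partition-size", "0x100000",
        "--data", f"{example_name}_data_partition.bin",
    ]


if __name__ == "__main__":
    # Print the command for review. To execute it, pass the list to
    # subprocess.run(..., check=True) from an XTC Tools environment.
    print(shlex.join(xflash_command("example_ffd")))
```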

XCORE ® -VOICE Solutions$$$XCORE-VOICE Quick Start Guide$$$Example Designs$$$Far-field Voice Local Command$$$Example designs$$$Speech Recognition£££doc/quick_start_guide/02.1_ffd.html#speech-recognition

Speak one of the wakewords followed by one of the commands from the lists below.

There are three LED states:

  • Flashing Green = Waiting for Wake Word

  • Solid Red & Green = Waiting for or Processing Command

  • Fast Flashing Red = Evaluation period has expired

After a reset, the application waits for the wakeword (flashing green). Upon recognizing ‘Hello XMOS’ or ‘Hello Cyberon’ (DSpotter™ model only), it waits for a command (solid red & green). After a period of inactivity, or after successful command processing, the application returns to waiting for the wakeword (flashing green).
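
The behavior described above can be summarized as a small state machine. The sketch below is a model for reading the LED states, not the application's source code; the event names (`wakeword`, `command`, `timeout`, `license_expired`) are hypothetical labels chosen for illustration.

```python
from enum import Enum


class State(Enum):
    WAIT_WAKEWORD = "flashing green"
    WAIT_COMMAND = "solid red & green"
    EXPIRED = "fast flashing red"


def next_state(state: State, event: str) -> State:
    """Return the state that follows `state` when `event` occurs."""
    if state is State.EXPIRED:
        return State.EXPIRED  # only a device reset leaves this state
    if event == "license_expired":
        return State.EXPIRED
    if state is State.WAIT_WAKEWORD and event == "wakeword":
        return State.WAIT_COMMAND
    if state is State.WAIT_COMMAND and event in ("command", "timeout"):
        return State.WAIT_WAKEWORD
    return state  # all other events leave the state unchanged


if __name__ == "__main__":
    state = State.WAIT_WAKEWORD
    for event in ("wakeword", "command", "wakeword", "timeout"):
        state = next_state(state, event)
        print(f"{event:>8} -> {state.name} ({state.value})")
```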

Sensory TrulyHandsfree™ and Cyberon DSpotter™ models detect the same commands, as listed below.

Wakewords

  • Hello XMOS

  • Hello Cyberon (DSpotter™ model only)

Dictionary Commands

  • Switch on the TV

  • Switch off the TV

  • Channel up

  • Channel down

  • Volume up

  • Volume down

  • Switch on the lights

  • Switch off the lights

  • Brightness up

  • Brightness down

  • Switch on the fan

  • Switch off the fan

  • Speed up the fan

  • Slow down the fan

  • Set higher temperature

  • Set lower temperature

XCORE ® -VOICE Solutions$$$XCORE-VOICE Quick Start Guide$$$Example Designs$$$Low Power Far-field Voice Local Command£££doc/quick_start_guide/02.2_low_power_ffd.html#low-power-far-field-voice-local-command

XCORE ® -VOICE Solutions$$$XCORE-VOICE Quick Start Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Overview£££doc/quick_start_guide/02.2_low_power_ffd.html#overview

This is the XCORE-VOICE low power far-field local control example design, demonstrating:

  • Low power control/handling

  • Small wake word model in SRAM

  • 2-microphone far-field voice control with I2C or UART interface

  • Audio pipeline including interference cancelling and noise suppression

  • 16-phrase English language speech recognition

XCORE ® -VOICE Solutions$$$XCORE-VOICE Quick Start Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Example designs£££doc/quick_start_guide/02.2_low_power_ffd.html#example-designs
XCORE ® -VOICE Solutions$$$XCORE-VOICE Quick Start Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Example designs$$$Demonstration£££doc/quick_start_guide/02.2_low_power_ffd.html#demonstration

The low power far-field voice local command (Low Power FFD) example design targets low power speech recognition using Sensory’s TrulyHandsfree™ (THF) speech recognition and local dictionary.

When the small wake word model running on tile 1 recognizes a wake word utterance, the device transitions to full power mode where tile 0’s command model begins receiving audio samples, continuing the command recognition process. On command recognition, the application outputs a discrete message over I2C and UART.

Sensory’s THF software ships with an expiring development license. It will suspend recognition after 11.4 hours or 107 recognition events, after which a device reset is required to resume normal operation. To perform a reset, either power cycle the device or press the SW2 button. Note that SW2 is only functional while in full-power mode (this application is configured to hold the device in full-power mode when the license expires).

Required Hardware

  • XK-VOICE-L71 board

  • XTAG4 debug adapter

  • 2x USB-Micro B cables

  • Host computer for programming

XCORE ® -VOICE Solutions$$$XCORE-VOICE Quick Start Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Example designs$$$Hardware Setup£££doc/quick_start_guide/02.2_low_power_ffd.html#hardware-setup

This example design requires an XTAG4 and XK-VOICE-L71 board.

all components

Connect the XTAG4 to the debug header, as shown below.

xtag

Connect both USB Micro-B connections on the XTAG4 and XK-VOICE-L71 to the programming host computer.

programming host setup
XCORE ® -VOICE Solutions$$$XCORE-VOICE Quick Start Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Example designs$$$Running the Demonstration£££doc/quick_start_guide/02.2_low_power_ffd.html#running-the-demonstration
XCORE ® -VOICE Solutions$$$XCORE-VOICE Quick Start Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Example designs$$$Flashing the Firmware£££doc/quick_start_guide/02.2_low_power_ffd.html#flashing-the-firmware

Connect the XTAG4 via USB to the host computer running the XTC tools, and power on the board directly via USB.

On the host computer, open an XTC Tools Command Prompt and run the following command:

xflash --quad-spi-clock 50MHz --factory example_low_power_ffd.xe --boot-partition-size 0x100000 --data example_low_power_ffd_data_partition.bin

When the prompt returns, flashing has completed and the XTAG4 may be disconnected.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Quick Start Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Example designs$$$Speech Recognition£££doc/quick_start_guide/02.2_low_power_ffd.html#speech-recognition

Speak one of the wake words followed by one of the commands from the lists below.

There are four LED states:

  • Solid Red = Low Power. Waiting for wake word.

  • Blinking Green = Full power. Waiting for command.

  • Solid Red & Green = Full power. Processing command.

  • Flickering Red = Full power. End of evaluation (device reset required).

On startup, the application enters low power mode and waits for the wake word. Upon wake word recognition, the device enters full power mode and waits for a command. Upon command recognition, the device will queue the command for processing. On each wake word or command recognition, a timer is reset (per tile). On expiration of the intent engine’s timer, the device will request a transition to low power. The other tile may reject the request in cases where its timer has not expired or other application-specific reasons.
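
The request and reject handshake described above can be sketched as follows. This is an illustrative model only: the timeout value, the class and function names, and the `other_tile_busy` flag (standing in for "other application-specific reasons") are assumptions, not names from the example's source code.

```python
from dataclasses import dataclass


@dataclass
class TileTimer:
    timeout_s: float           # hypothetical inactivity timeout in seconds
    last_reset_s: float = 0.0  # time of the last wake word or command recognition

    def reset(self, now_s: float) -> None:
        """Restart the timer, as happens on each recognition."""
        self.last_reset_s = now_s

    def expired(self, now_s: float) -> bool:
        return now_s - self.last_reset_s >= self.timeout_s


def low_power_granted(intent_timer: TileTimer, other_timer: TileTimer,
                      now_s: float, other_tile_busy: bool = False) -> bool:
    """Return True if a low-power transition would be requested and accepted."""
    if not intent_timer.expired(now_s):
        return False  # the intent engine has not requested low power yet
    # The other tile rejects the request while its own timer is still
    # running or while it has application-specific work in progress.
    return other_timer.expired(now_s) and not other_tile_busy


if __name__ == "__main__":
    intent, other = TileTimer(timeout_s=5.0), TileTimer(timeout_s=5.0)
    other.reset(now_s=3.0)
    print(low_power_granted(intent, other, now_s=6.0))  # other tile still active
    print(low_power_granted(intent, other, now_s=8.0))  # both timers expired
```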

Supported Wake Word

  • Hello XMOS

Supported Commands

  • Switch on the TV

  • Switch off the TV

  • Channel up

  • Channel down

  • Volume up

  • Volume down

  • Switch on the lights

  • Switch off the lights

  • Brightness up

  • Brightness down

  • Switch on the fan

  • Switch off the fan

  • Speed up the fan

  • Slow down the fan

  • Set higher temperature

  • Set lower temperature

XCORE ® -VOICE Solutions$$$XCORE-VOICE Quick Start Guide$$$Example Designs$$$Far-field Voice Assistant£££doc/quick_start_guide/02.3_ffva.html#far-field-voice-assistant

XCORE ® -VOICE Solutions$$$XCORE-VOICE Quick Start Guide$$$Example Designs$$$Far-field Voice Assistant$$$Overview£££doc/quick_start_guide/02.3_ffva.html#overview

These are the XCORE-VOICE far-field voice assistant example designs demonstrating:

  • 2-microphone far-field voice assistant front-end

  • Audio pipeline including echo cancellation, interference cancelling and noise suppression

  • Stereo reference input and voice assistant output each supported as I2S or USB (UAC2.0)

This application can be used out of the box as a voice processor solution, or extended to run local wakeword engines.

These applications feature a full duplex acoustic echo cancellation stage, which can be provided with reference audio via I2S or USB audio. An audio output ASR stream is also available via I2S or USB audio.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Quick Start Guide$$$Example Designs$$$Far-field Voice Assistant$$$Example designs£££doc/quick_start_guide/02.3_ffva.html#example-designs
XCORE ® -VOICE Solutions$$$XCORE-VOICE Quick Start Guide$$$Example Designs$$$Far-field Voice Assistant$$$Example designs$$$USB Audio Demonstration£££doc/quick_start_guide/02.3_ffva.html#usb-audio-demonstration

This demonstration connects directly over USB to the host PC, allowing signal analysis and evaluation.

Requirements

  • XK-VOICE-L71 board

  • Powered speaker(s) with 3.5mm jack connection

  • Host system running Windows, macOS, Linux or Android

  • USB A to Micro cable for connection to the host

XCORE ® -VOICE Solutions$$$XCORE-VOICE Quick Start Guide$$$Example Designs$$$Far-field Voice Assistant$$$Example designs$$$Hardware Setup£££doc/quick_start_guide/02.3_ffva.html#hardware-setup

Connect either end of the ribbon cable to the XTAG4, and the other end to the XK-VOICE-L71 board as shown (Image shows piggybacked connection to RPi. Standalone operation is also supported):

XK-VOICE-L71 on RPi with ribbon cable
XCORE ® -VOICE Solutions$$$XCORE-VOICE Quick Start Guide$$$Example Designs$$$Far-field Voice Assistant$$$Example designs$$$Running the Demonstration£££doc/quick_start_guide/02.3_ffva.html#running-the-demonstration
XCORE ® -VOICE Solutions$$$XCORE-VOICE Quick Start Guide$$$Example Designs$$$Far-field Voice Assistant$$$Example designs$$$Configure the Hardware£££doc/quick_start_guide/02.3_ffva.html#configure-the-hardware

Connect the host system to the micro-USB socket, and the speakers to the jack plug as shown:

XK-VOICE-L71 connected to powered speakers and host device

Either mono or stereo speakers may be used.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Quick Start Guide$$$Example Designs$$$Far-field Voice Assistant$$$Example designs$$$Flashing the Firmware£££doc/quick_start_guide/02.3_ffva.html#flashing-the-firmware

Connect the XTAG4 via USB to the host computer running the XTC tools, and power on the board (either via RPi or directly via USB).

On the host computer, open an XTC Tools Command Prompt and run the following command:

xflash --quad-spi-clock 50MHz --factory example_ffva_ua_adec_altarch.xe --boot-partition-size 0x100000 --data example_ffva_ua_adec_altarch_data_partition.bin

When the prompt returns, flashing has completed and the XTAG4 may be disconnected.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Quick Start Guide$$$Example Designs$$$Far-field Voice Assistant$$$Example designs$$$Record Captured Voice£££doc/quick_start_guide/02.3_ffva.html#record-captured-voice
  1. Open a music player on host PC, and play a stereo file.

  2. Check music is playing through powered speakers.

  3. Adjust volume using music player or speakers.

  4. Open Audacity and configure to communicate with kit. Input Device: XCORE-VOICE Voice Processor and Output Device: XCORE-VOICE Voice Processor

  5. Set recording channels to 2 (Stereo) in Device

audacity channels dropdown

  6. Set Project Rate to 48000Hz in Selection Toolbar.

audacity bitrate setting

  7. Click Record (press ‘r’) to start capturing audio streamed from the XCORE-VOICE device.

  8. Talk over music; move around the room while talking.

  9. Stop music player.

  10. Click Stop (press space) to stop recording. Audacity records single audio channel streamed from the XCORE-VOICE kit including extracted voice signal.

  11. Click dropdown menu next to Audio Track, and select Split Stereo To Mono.

audacity split action dropdown

  12. Click Solo on left channel of split processed audio. Increase Gain slider if necessary.

audacity solo and gain options

  13. Click Play (press space) to playback processed audio.

Only your voice is audible. Playback music is removed by acoustic echo cancellation; voice is isolated by interference canceller; background noise is removed by noise suppression algorithms.
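
The Split Stereo To Mono and Solo steps can also be reproduced offline on an exported recording. The sketch below extracts the left channel from interleaved stereo PCM data. It assumes 16-bit little-endian samples, which is an assumption about the export format rather than something stated in this guide. The frame bytes could come, for example, from `readframes` in Python's standard `wave` module.

```python
import struct


def left_channel(frames: bytes) -> bytes:
    """Extract the left-channel samples from interleaved 16-bit stereo PCM."""
    if len(frames) % 4 != 0:
        raise ValueError("expected whole 16-bit stereo frames (4 bytes each)")
    count = len(frames) // 2                      # total number of 16-bit samples
    samples = struct.unpack(f"<{count}h", frames)
    left = samples[0::2]                          # even indices are the left channel
    return struct.pack(f"<{len(left)}h", *left)


if __name__ == "__main__":
    # Two stereo frames: (left=1, right=-2) and (left=3, right=-4).
    stereo = struct.pack("<4h", 1, -2, 3, -4)
    print(struct.unpack("<2h", left_channel(stereo)))
```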

XCORE ® -VOICE Solutions$$$XCORE-VOICE Quick Start Guide$$$Licenses£££doc/quick_start_guide/03_legal.html#licenses

XCORE ® -VOICE Solutions$$$XCORE-VOICE Quick Start Guide$$$Licenses$$$XMOS£££doc/quick_start_guide/03_legal.html#xmos

All original source code is licensed under the XMOS License.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Quick Start Guide$$$Licenses$$$Third-Party£££doc/quick_start_guide/03_legal.html#third-party

Additional third party code is included under the following copyrights and licenses:

Third Party Module Copyrights & Licenses (Module: Copyright & License)

  • dr_wav: Copyright (C) 2022 David Reid, licensed under a public domain license

  • FatFS: Copyright (C) 2017 ChaN, licensed under a BSD-style license

  • FreeRTOS: Copyright (c) 2017 Amazon.com, Inc., licensed under the MIT License

  • Sensory TrulyHandsfree™: The Sensory TrulyHandsfree™ speech recognition library is Copyright (C) 1995-2022 Sensory Inc. and is provided as an expiring development license. Commercial licensing is granted by Sensory Inc.

  • Cyberon DSpotter™: For any licensing questions about Cyberon DSpotter™ speech recognition library please contact Cyberon Corporation.

  • TinyUSB: Copyright (c) 2018 hathach (tinyusb.org), licensed under the MIT license

XCORE ® -VOICE Solutions$$$XCORE-VOICE Quick Start Guide$$$Other examples£££doc/quick_start_guide/03_legal.html#other-examples

Where no quick start guide exists, such as for Microphone Aggregation, Asynchronous Sample Rate Conversion, and Automatic Speech Recognition with the Cyberon library, please consult the Programming Guide, which contains setup information for these applications.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide£££doc/programming_guide/index.html#xcore-voice-programming-guide

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Prerequisites£££doc/programming_guide/02_prerequisites.html#prerequisites

It is recommended that you download and install the latest release of the XTC Tools. XTC Tools 15.3.1 or newer are required for building, running, flashing and debugging the example applications.

CMake 3.21 or newer and Git are also required for building the example applications.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Prerequisites$$$Windows£££doc/programming_guide/02_prerequisites.html#windows

A standard C/C++ compiler is required to build applications for the host PC. Windows users may use the Build Tools for Visual Studio command-line interface.

It is recommended to use Ninja as the build system for native Windows firmware builds. To install Ninja, follow the installation instructions at https://ninja-build.org/, or install it with winget by running the following commands in PowerShell:

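# Install
winget install Ninja-build.ninja
# Reload Path in the current session (machine and user entries)
$env:Path=[System.Environment]::GetEnvironmentVariable("Path","Machine") + ";" + [System.Environment]::GetEnvironmentVariable("Path","User")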

XCORE-VOICE host builds should also work using other Windows GNU development environments like GNU Make, MinGW or Cygwin.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Prerequisites$$$Windows$$$libusb£££doc/programming_guide/02_prerequisites.html#libusb

The DFU feature of XCORE-VOICE requires dfu-util.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Prerequisites$$$macOS£££doc/programming_guide/02_prerequisites.html#macos

A standard C/C++ compiler is required to build applications for the host PC. Mac users may use the Xcode command-line tools.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs£££doc/programming_guide/03_example_designs.html#example-designs

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Automated Speech Recognition Porting£££doc/programming_guide/asr/asr.html#automated-speech-recognition-porting

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Automated Speech Recognition Porting$$$Overview£££doc/programming_guide/asr/overview.html#overview

This is the XCORE-VOICE automated speech recognition (ASR) porting example design. This example can be used by 3rd-party ASR developers and ISVs to port their ASR library to xcore.ai.

The example reads a 1 channel, 16-bit, 16kHz wav file, slices it up into bricks, and calls the ASR library with each brick. The default brick length is 240 samples but this is configurable. ASR ports that implement the public API defined in modules/asr/asr.h can easily be added to current and future XCORE-VOICE example designs that support speech recognition.
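
The brick slicing described above can be sketched as follows. This is a minimal illustration rather than the example's actual code: the callback type, the function names, and the choice to drop a trailing partial brick are all assumptions made for the sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define BRICK_SIZE_SAMPLES 240  /* default brick length quoted in the text */

/* Hypothetical callback type standing in for a call into the ASR library. */
typedef void (*brick_handler_t)(const int16_t *brick, size_t len, void *user);

/* Slice `samples` into full bricks and pass each one to `handler`.
 * Returns the number of bricks delivered. A trailing partial brick is
 * dropped, which is an assumption of this sketch. */
size_t for_each_brick(const int16_t *samples, size_t num_samples,
                      brick_handler_t handler, void *user)
{
    size_t bricks = 0;
    for (size_t off = 0; off + BRICK_SIZE_SAMPLES <= num_samples;
         off += BRICK_SIZE_SAMPLES) {
        handler(&samples[off], BRICK_SIZE_SAMPLES, user);
        bricks++;
    }
    return bricks;
}

/* Example handler: accumulates the number of samples it has been given. */
void count_samples(const int16_t *brick, size_t len, void *user)
{
    (void)brick;
    *(size_t *)user += len;
}
```

With a 1000-sample buffer, this sketch delivers four full bricks and ignores the remaining 40 samples.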

An oversimplified ASR port example is provided. This ASR port recognizes the “Hello XMOS” keyword if any acoustic activity is observed in 75 consecutive bricks.
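
One way to picture the "75 consecutive active bricks" rule is the sketch below. It is not the provided port's source: the activity measure (mean absolute amplitude), the threshold value, and all names are hypothetical and chosen only to illustrate the counting logic.

```c
#include <stddef.h>
#include <stdint.h>

#define ACTIVE_BRICKS_REQUIRED 75  /* from the text */
#define ACTIVITY_THRESHOLD 500     /* hypothetical mean-absolute-amplitude threshold */

typedef struct {
    int consecutive_active;  /* length of the current run of active bricks */
    int keyword_detected;    /* set to 1 once the run reaches the required length */
} activity_detector_t;

/* Returns 1 if the mean absolute amplitude of the brick meets the threshold. */
static int brick_is_active(const int16_t *brick, size_t len)
{
    int64_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        int32_t s = brick[i];
        sum += (s < 0) ? -s : s;
    }
    return len > 0 && (sum / (int64_t)len) >= ACTIVITY_THRESHOLD;
}

/* Feed one brick; any inactive brick resets the run. */
void detector_process(activity_detector_t *d, const int16_t *brick, size_t len)
{
    if (brick_is_active(brick, len)) {
        if (++d->consecutive_active >= ACTIVE_BRICKS_REQUIRED) {
            d->keyword_detected = 1;
            d->consecutive_active = 0;
        }
    } else {
        d->consecutive_active = 0;
    }
}
```

A single quiet brick restarts the count, so 74 active bricks followed by silence never triggers a detection in this sketch.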

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Automated Speech Recognition Porting$$$Supported Hardware£££doc/programming_guide/asr/hardware.html#supported-hardware

This example application is supported on the XK-VOICE-L71 board.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Automated Speech Recognition Porting$$$Supported Hardware$$$Setting up the Hardware£££doc/programming_guide/asr/hardware.html#setting-up-the-hardware

This example design requires an XTAG4 and XK-VOICE-L71 board.

[Figure: all components]
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Automated Speech Recognition Porting$$$Supported Hardware$$$xTAG£££doc/programming_guide/asr/hardware.html#xtag

The xTAG is used to program and debug the device.

Connect the xTAG to the debug header, as shown below.

[Figure: xTAG]

Connect the micro USB ports of the XTAG4 and the XK-VOICE-L71 to the programming host.

[Figure: programming host setup]
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Automated Speech Recognition Porting$$$Deploying the Firmware with Linux or macOS£££doc/programming_guide/asr/deploying/linux_macos.html#deploying-the-firmware-with-linux-or-macos

This document explains how to deploy the software using CMake and Make.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Automated Speech Recognition Porting$$$Deploying the Firmware with Linux or macOS$$$Building the Host Server£££doc/programming_guide/asr/deploying/linux_macos.html#building-the-host-server

This application requires a host application to serve files to the device. The served file must be named test.wav. This filename is defined in src/app_conf.h.

Run the following commands in the root folder to build the host application using your native toolchain:

Note

Permissions may be required to install the host applications.

cmake -B build_host
cd build_host
make xscope_host_endpoint
make install

The host application, xscope_host_endpoint, will be installed at /opt/xmos/bin, and may be moved if desired. You may wish to add this directory to your PATH variable.

Before running the host application, you may need to add the location of xscope_endpoint.so to your LD_LIBRARY_PATH environment variable. This environment variable will be set if you run the host application in the XTC Tools command-line environment. For more information see Configuring the command-line environment.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Automated Speech Recognition Porting$$$Deploying the Firmware with Linux or macOS$$$Building the Firmware£££doc/programming_guide/asr/deploying/linux_macos.html#building-the-firmware

With your Python environment activated, run the following commands in the root folder to build the firmware:

pip install -r requirements.txt
cmake -B build --toolchain=xmos_cmake_toolchain/xs3a.cmake
cd build
make example_asr
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Automated Speech Recognition Porting$$$Deploying the Firmware with Linux or macOS$$$Flashing the Model£££doc/programming_guide/asr/deploying/linux_macos.html#flashing-the-model

The model file is part of the data partition file. The data partition file includes a file used to calibrate the flash followed by the model.

Run the following command in the build folder to create the data partition:

make make_data_partition_example_asr

Then run the following command in the build folder to flash the data partition:

xflash --force --quad-spi-clock 50MHz --target-file ../examples/speech_recognition/XK_VOICE_L71.xn --write-all example_asr_data_partition.bin
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Automated Speech Recognition Porting$$$Deploying the Firmware with Linux or macOS$$$Running the Firmware£££doc/programming_guide/asr/deploying/linux_macos.html#running-the-firmware

From the build folder run:

xrun --xscope --xscope-port localhost:12345 example_asr.xe

In a second console, run the following command in the examples/speech_recognition folder to run the host server:

xscope_host_endpoint 12345
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Automated Speech Recognition Porting$$$Deploying the Firmware with Native Windows£££doc/programming_guide/asr/deploying/native_windows.html#deploying-the-firmware-with-native-windows

This document explains how to deploy the software using CMake and Ninja. If you are not using native Windows MSVC build tools and are instead using a Linux emulation tool, refer to Deploying the Firmware with Linux or macOS.

To install Ninja, follow the instructions at https://ninja-build.org/ or, on Windows, install it with winget by running the following commands in PowerShell:

# Install
winget install Ninja-build.ninja
# Reload user Path
$env:Path=[System.Environment]::GetEnvironmentVariable("Path","User")
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Automated Speech Recognition Porting$$$Deploying the Firmware with Native Windows$$$Building the Host Server£££doc/programming_guide/asr/deploying/native_windows.html#building-the-host-server

This application requires a host application to serve files to the device. The served file must be named test.wav. This filename is defined in src/app_conf.h.

Run the following commands in the root folder to build the host application using your native toolchain:

Note

Permissions may be required to install the host applications.

Note

A C/C++ compiler, such as Visual Studio or MinGW, must be included in the path.

Before building the host application, you will need to add the path to the XTC Tools to your environment.

set "XMOS_TOOL_PATH=<path-to-xtc-tools>"

Then build the host application:

cmake -G Ninja -B build_host
cd build_host
ninja xscope_host_endpoint
ninja install

The host application, xscope_host_endpoint.exe, will install at <USERPROFILE>\.xmos\bin, and may be moved if desired. You may wish to add this directory to your PATH variable.

Before running the host application, you may need to add the location of xscope_endpoint.dll to your PATH. This environment variable will be set if you run the host application in the XTC Tools command-line environment. For more information see Configuring the command-line environment.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Automated Speech Recognition Porting$$$Deploying the Firmware with Native Windows$$$Building the Firmware£££doc/programming_guide/asr/deploying/native_windows.html#building-the-firmware

With your Python environment activated, run the following commands in the root folder to build the firmware:

pip install -r requirements.txt
cmake -G Ninja -B build --toolchain=xmos_cmake_toolchain/xs3a.cmake
cd build
ninja example_asr
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Automated Speech Recognition Porting$$$Deploying the Firmware with Native Windows$$$Flashing the Model£££doc/programming_guide/asr/deploying/native_windows.html#flashing-the-model

The model file is part of the data partition file. The data partition file includes a file used to calibrate the flash followed by the model.

Run the following command in the build folder to create the data partition:

ninja make_data_partition_example_asr

Then run the following command in the build folder to flash the data partition:

xflash --force --quad-spi-clock 50MHz --target-file ../examples/speech_recognition/XK_VOICE_L71.xn --write-all example_asr_data_partition.bin
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Automated Speech Recognition Porting$$$Deploying the Firmware with Native Windows$$$Running the Firmware£££doc/programming_guide/asr/deploying/native_windows.html#running-the-firmware

From the build folder run:

xrun --xscope --xscope-port localhost:12345 example_asr.xe

In a second console, run the following command in the examples/speech_recognition folder to run the host server:

xscope_host_endpoint.exe 12345
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Automated Speech Recognition Porting$$$Modifying the Software£££doc/programming_guide/asr/modifying.html#modifying-the-software
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Automated Speech Recognition Porting$$$Modifying the Software$$$Implementing the ASR API£££doc/programming_guide/asr/modifying.html#implementing-the-asr-api

Begin your ASR port by creating a new folder under modules/asr/. The asr.h and device_memory.h files include comments detailing the public API methods and parameters. ASR ports that implement this public API can easily be added to current and future XCORE-VOICE example designs that support speech recognition.

Pay close attention to the following functions:

  • asr_printf

  • devmem_malloc

  • devmem_free

  • devmem_read_ext

  • devmem_read_ext_async

  • devmem_read_ext_wait

ASR libraries should call asr_printf instead of printf or xcore’s debug_printf.

ASR libraries must not call malloc directly to allocate dynamic memory. Instead call the devmem_malloc and devmem_free functions. This allows the application to provide alternative implementations of these functions - like pvPortMalloc and vPortFree in a FreeRTOS application.
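
The indirection described above can be sketched as a small table of allocator hooks. The struct layout and every `sketch_` name below are hypothetical stand-ins, not the definitions in device_memory.h; the point is only that the ASR library never names the underlying allocator.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for devmem_manager_t: a table of allocator hooks. */
typedef struct {
    void *(*malloc_fn)(size_t size);
    void (*free_fn)(void *ptr);
    size_t live_allocations;  /* bookkeeping for this sketch only */
} sketch_devmem_t;

/* What an ASR library would call instead of malloc. */
void *sketch_devmem_malloc(sketch_devmem_t *ctx, size_t size)
{
    void *p = ctx->malloc_fn(size);
    if (p != NULL) {
        ctx->live_allocations++;
    }
    return p;
}

/* What an ASR library would call instead of free. */
void sketch_devmem_free(sketch_devmem_t *ctx, void *ptr)
{
    if (ptr == NULL) {
        return;
    }
    ctx->free_fn(ptr);
    ctx->live_allocations--;
}

/* Bare-metal style set-up using the C library heap. In a FreeRTOS
 * application the hooks would instead point at pvPortMalloc and vPortFree. */
void sketch_devmem_init_libc(sketch_devmem_t *ctx)
{
    ctx->malloc_fn = malloc;
    ctx->free_fn = free;
    ctx->live_allocations = 0;
}
```

Because the application owns the hook table, swapping the allocator requires no change to the ASR library itself.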

The devmem_read_ext function is provided to load data directly from external memory (QSPI flash or LPDDR) into SRAM. This is the recommended way to load coefficients or blocks of data from a model. It is far more efficient to load the data into SRAM and perform any math on the data while it is in SRAM. The devmem_read_ext function has a signature similar to memcpy. The caller is responsible for allocating the destination buffer.

Like devmem_read_ext, the devmem_read_ext_async function loads data directly from external memory (QSPI flash or LPDDR) into SRAM. It differs in that it does not block the caller’s thread; instead, it loads the data in another thread. A free core must be available when calling devmem_read_ext_async, or an exception will be raised. devmem_read_ext_async returns a handle that can later be used to wait for the load to complete. Call devmem_read_ext_wait to block the caller’s thread until the load is complete. Currently, each call to devmem_read_ext_async must be followed by a call to devmem_read_ext_wait; you cannot have more than one read in flight at a time.
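
The calling discipline (one asynchronous read, then one wait, never two reads in flight) can be illustrated with a single-threaded mock. Everything here is hypothetical: on the device the copy runs on another core, whereas this sketch simply defers the copy until the wait call, and it returns an invalid handle for a second in-flight read, which is an assumption about how such misuse might be reported.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_HANDLE_INVALID (-1)

/* Hypothetical mock of an asynchronous reader with a single request slot. */
typedef struct {
    void *dest;
    const void *src;
    size_t n;
    int busy;  /* 1 while a read is in flight */
} sketch_async_reader_t;

/* Start a read. Refuses a second request while one is still in flight. */
int sketch_read_ext_async(sketch_async_reader_t *r, void *dest,
                          const void *src, size_t n)
{
    if (r->busy) {
        return SKETCH_HANDLE_INVALID;
    }
    r->dest = dest;
    r->src = src;
    r->n = n;
    r->busy = 1;
    return 0;  /* the only valid handle in this sketch */
}

/* Complete the read identified by `handle`. The mock performs the copy here;
 * real hardware would have been copying in the background. */
void sketch_read_ext_wait(sketch_async_reader_t *r, int handle)
{
    if (handle != 0 || !r->busy) {
        return;
    }
    memcpy(r->dest, r->src, r->n);
    r->busy = 0;
}
```

A caller would start the read, do unrelated work, then call the wait function before touching the destination buffer or starting another read.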

Note

XMOS provides an arithmetic and DSP library which leverages the XS3 Vector Processing Unit (VPU) to accelerate costly operations on vectors of 16- or 32-bit data. Included are functions for block floating-point arithmetic, fast Fourier transforms, discrete cosine transforms, linear filtering and more. See the XMath Programming Guide for more information.

Note

To minimize SRAM scratch space usage, some ASR ports load coefficients into SRAM in chunks. This is useful when performing a routine such as a vector matrix multiply as this operation can be performed on a portion of the matrix at a time.
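
The chunked approach in the note above might look like the following sketch, which multiplies a matrix held in "external" memory by a vector while keeping only a few rows in scratch at a time. The chunk size, the column limit, and the `sketch_read_ext` helper (a memcpy stand-in for devmem_read_ext) are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_ROWS 4  /* hypothetical: rows held in SRAM scratch at a time */
#define MAX_COLS 16   /* hypothetical upper bound on the vector length */

/* Stand-in for devmem_read_ext; on the device this would read from flash. */
static void sketch_read_ext(void *dest, const void *src, size_t n)
{
    memcpy(dest, src, n);
}

/* Computes y = M * x, where M (rows x cols, row-major) lives in external
 * memory. Returns 0 on success, or -1 if cols exceeds the scratch width. */
int chunked_matvec(const int16_t *ext_matrix, size_t rows, size_t cols,
                   const int16_t *x, int32_t *y)
{
    static int16_t scratch[CHUNK_ROWS * MAX_COLS];

    if (cols > MAX_COLS) {
        return -1;
    }
    for (size_t r0 = 0; r0 < rows; r0 += CHUNK_ROWS) {
        size_t n_rows = (rows - r0 < CHUNK_ROWS) ? rows - r0 : CHUNK_ROWS;

        /* Load the next block of rows into scratch, then do the math there. */
        sketch_read_ext(scratch, &ext_matrix[r0 * cols],
                        n_rows * cols * sizeof(int16_t));

        for (size_t r = 0; r < n_rows; r++) {
            int32_t acc = 0;
            for (size_t c = 0; c < cols; c++) {
                acc += (int32_t)scratch[r * cols + c] * x[c];
            }
            y[r0 + r] = acc;
        }
    }
    return 0;
}
```

Scratch usage is fixed at CHUNK_ROWS * MAX_COLS samples regardless of how many rows the matrix has.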

When the port of the new ASR is complete, you can use the example in examples/speech_recognition to test it.

Note

You may also need to modify BRICK_SIZE_SAMPLES in app_conf.h to match the number of audio samples expected per process for your ASR port. In other example designs, this is defined by appconfINTENT_SAMPLE_BLOCK_LENGTH. This is set to 240 in the existing example designs.

In the current source code, the model data (and optional grammar data) are set in examples/speech_recognition/src/process_file.c. Modify these variables to reflect your data. The remainder of the API should be familiar to ASR developers. The API can be extended if necessary.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Automated Speech Recognition Porting$$$Modifying the Software$$$Flashing Models£££doc/programming_guide/asr/modifying.html#flashing-models

To flash your model, modify the --data argument passed to the xflash command in the Flashing the Model section.

See examples/speech_recognition/asr_example/asr_example_model.h to see how the model’s flash address is defined.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Automated Speech Recognition Porting$$$Modifying the Software$$$Placing Models in SRAM£££doc/programming_guide/asr/modifying.html#placing-models-in-sram

Small models (near or under 100kB in size) may be placed in SRAM. See examples/speech_recognition/asr_example/asr_example_model.c for more information on placing your model in SRAM.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Automated Speech Recognition Porting$$$ASR API£££doc/programming_guide/asr/modifying.html#asr-api
enum asr_error_enum

Enumerator type representing error return values.

Values:

enumerator ASR_OK

Ok.

enumerator ASR_ERROR

General error.

enumerator ASR_INSUFFICIENT_MEMORY

Insufficient memory for given model.

enumerator ASR_NOT_SUPPORTED

Function not supported for given model.

enumerator ASR_INVALID_PARAMETER

Invalid Parameter.

enumerator ASR_MODEL_INCOMPATIBLE

Model type or version is not compatible with the ASR library.

enumerator ASR_MODEL_CORRUPT

Model malformed.

enumerator ASR_NOT_INITIALIZED

Not Initialized.

enumerator ASR_EVALUATION_EXPIRED

Evaluation period has expired.
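
For logging, an application may want to turn these codes into text. The sketch below mirrors the enumerators listed above so that it is self-contained; real code would include asr.h instead. The numeric values here are whatever the compiler assigns and are not claimed to match the header, and `asr_error_str` is a hypothetical helper, not part of the documented API.

```c
#include <stddef.h>

/* Local mirror of the documented enumerators, for illustration only. */
typedef enum {
    ASR_OK,
    ASR_ERROR,
    ASR_INSUFFICIENT_MEMORY,
    ASR_NOT_SUPPORTED,
    ASR_INVALID_PARAMETER,
    ASR_MODEL_INCOMPATIBLE,
    ASR_MODEL_CORRUPT,
    ASR_NOT_INITIALIZED,
    ASR_EVALUATION_EXPIRED
} asr_error_t;

/* Hypothetical helper: map an error code to its documented description. */
const char *asr_error_str(asr_error_t err)
{
    switch (err) {
    case ASR_OK:                  return "Ok";
    case ASR_ERROR:               return "General error";
    case ASR_INSUFFICIENT_MEMORY: return "Insufficient memory for given model";
    case ASR_NOT_SUPPORTED:       return "Function not supported for given model";
    case ASR_INVALID_PARAMETER:   return "Invalid parameter";
    case ASR_MODEL_INCOMPATIBLE:  return "Model type or version is not compatible";
    case ASR_MODEL_CORRUPT:       return "Model malformed";
    case ASR_NOT_INITIALIZED:     return "Not initialized";
    case ASR_EVALUATION_EXPIRED:  return "Evaluation period has expired";
    }
    return "Unknown error";
}
```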

typedef void *asr_port_t

Typedef to the ASR port context struct.

An ASR port can store any data needed in the context. The context pointer is passed to all API methods and can be cast to any struct defined by the ASR port.
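
The opaque-context pattern can be sketched as below. The `my_port_` functions and the context struct are hypothetical, and the sketch passes the handle by value for brevity, whereas the documented functions take a pointer to asr_port_t. A real port would also allocate through devmem_malloc rather than the C library.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef void *sketch_port_t;  /* same shape as asr_port_t */

/* Hypothetical private state of one ASR port. */
typedef struct {
    uint32_t bricks_processed;
    uint32_t samples_processed;
} my_port_ctx_t;

/* Allocate the private state and hand it back as an opaque pointer. */
sketch_port_t my_port_init(void)
{
    my_port_ctx_t *ctx = calloc(1, sizeof *ctx);
    return ctx;
}

/* Cast the opaque pointer back to the port's own struct before use. */
int my_port_process(sketch_port_t port, const int16_t *audio_buf, size_t buf_len)
{
    my_port_ctx_t *ctx = (my_port_ctx_t *)port;
    if (ctx == NULL || audio_buf == NULL) {
        return -1;
    }
    ctx->bricks_processed++;
    ctx->samples_processed += (uint32_t)buf_len;
    return 0;
}

/* Release everything the port allocated. */
void my_port_release(sketch_port_t port)
{
    free(port);
}
```

The application only ever sees the `void *`, so each port is free to change its internal struct without affecting callers.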

typedef int16_t asr_sample_t

Typedef representing the base type of an audio sample.

typedef struct asr_attributes_struct asr_attributes_t

Typedef to the ASR port and model attributes

typedef struct asr_result_struct asr_result_t

Typedef to the ASR result

typedef enum asr_error_enum asr_error_t

Enumerator type representing error return values.

void asr_printf(const char *format, ...)

String output function that allows the application to provide an alternative implementation.

ASR ports should call asr_printf instead of printf.

asr_port_t asr_init(int32_t *model, int32_t *grammar, devmem_manager_t *devmem_ctx)

Initialize an ASR port.

Parameters:
  • model – A pointer to the model data.

  • grammar – A pointer to the grammar data (Optional).

  • devmem_ctx – A pointer to the device manager (Optional). Save this pointer if calling any device manager API functions.

Returns:

the ASR port context.

asr_error_t asr_get_attributes(asr_port_t *ctx, asr_attributes_t *attributes)

Get engine and model attributes.

Parameters:
  • ctx – A pointer to the ASR port context.

  • attributes – The attributes result.

Returns:

Success or error code.

asr_error_t asr_process(asr_port_t *ctx, int16_t *audio_buf, size_t buf_len)

Process an audio buffer.

Parameters:
  • ctx – A pointer to the ASR port context.

  • audio_buf – A pointer to the 16-bit PCM samples.

  • buf_len – The number of PCM samples.

Returns:

Success or error code.

asr_error_t asr_get_result(asr_port_t *ctx, asr_result_t *result)

Get the most recent results.

Parameters:
  • ctx – A pointer to the ASR port context.

  • result – The processed result.

Returns:

Success or error code.

asr_error_t asr_reset(asr_port_t *ctx)

Reset ASR port (if necessary).

Called before the next call to asr_process.

Parameters:
  • ctx – A pointer to the ASR port context.

Returns:

Success or error code.

asr_error_t asr_release(asr_port_t *ctx)

Release ASR port (if necessary).

The ASR port must deallocate any memory.

Parameters:
  • ctx – A pointer to the ASR port context.

Returns:

Success or error code.

struct asr_attributes_struct
#include <asr.h>

Typedef to the ASR port and model attributes

struct asr_result_struct
#include <asr.h>

Typedef to the ASR result

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Automated Speech Recognition Porting$$$Device Memory API£££doc/programming_guide/asr/modifying.html#device-memory-api
void *devmem_malloc(devmem_manager_t *ctx, size_t size)

Memory allocation function that allows the application to provide an alternative implementation.

Call devmem_malloc instead of malloc.

Parameters:
  • ctx – A pointer to the device memory context.

  • size – Number of bytes to allocate.

Returns:

A pointer to the beginning of newly allocated memory, or NULL on failure.

void devmem_free(devmem_manager_t *ctx, void *ptr)

Memory deallocation function that allows the application to provide an alternative implementation.

Call devmem_free instead of free.

Parameters:
  • ctx – A pointer to the device memory context.

  • ptr – A pointer to the memory to deallocate.

void devmem_read_ext(devmem_manager_t *ctx, void *dest, const void *src, size_t n)

Synchronous extended memory read function that allows the application to provide an alternative implementation. Blocks the caller’s thread until the read is completed.

Call devmem_read_ext instead of any other functions to read memory from flash, LPDDR or SDRAM. Modules are free to use memcpy if the dest and src are both SRAM addresses.

Parameters:
  • ctx – A pointer to the device memory context.

  • dest – A pointer to the destination array where the content is to be read.

  • src – A pointer to the word-aligned address of data to be read.

  • n – Number of bytes to read.

int devmem_read_ext_async(devmem_manager_t *ctx, void *dest, const void *src, size_t n)

Asynchronous extended memory read function that allows the application to provide an alternative implementation.

Call devmem_read_ext_async instead of any other functions to read memory from flash, LPDDR or SDRAM.

Parameters:
  • ctx – A pointer to the device memory context.

  • dest – A pointer to the destination array where the content is to be read.

  • src – A pointer to the word-aligned address of data to be read.

  • n – Number of bytes to read.

Returns:

A handle that can be used in a call to devmem_read_ext_wait.

void devmem_read_ext_wait(devmem_manager_t *ctx, int handle)

Wait in the caller’s thread for an asynchronous extended memory read to finish.

Parameters:
  • ctx – A pointer to the device memory context.

  • handle – The devmem_read_ext_async handle to wait on.

IS_SRAM(a)
IS_SWMEM(a)
IS_FLASH(a)

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Automated Speech Recognition Porting$$$Speech Recognition Ports£££doc/programming_guide/asr/asr_ports.html#speech-recognition-ports

Ports of the Sensory and Cyberon speech recognition libraries are provided.

Speech Recognition Ports

| Filename/Directory | Description |
|---|---|
| modules/asr directory | include folder for ASR modules and ports |
| modules/asr/sensory directory | contains the Sensory library and associated port code |
| modules/asr/Cyberon directory | contains the Cyberon library and associated port code |
| modules/asr/CmakeLists.txt | CMakeLists file for adding ASR port targets |

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command£££doc/programming_guide/ffd/ffd.html#far-field-voice-local-command

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Overview£££doc/programming_guide/ffd/overview.html#overview

This is the far-field voice local command (FFD) example design. Three examples are provided, all of which include speech recognition and a local dictionary. One example uses the Sensory TrulyHandsfree™ (THF) libraries, and the other two use the Cyberon DSpotter™ libraries. The two Cyberon examples differ in the audio source fed into the intent engine: one uses the microphone array, and the other uses the I2S interface.

The examples using the microphone array as the audio source include an audio pipeline with the following stages:

  1. Interference Canceler (IC) + Voice To Noise Ratio Estimator (VNR)

  2. Noise Suppressor (NS)

  3. Adaptive Gain Control (AGC)

The FFD examples provide several options to inform the host of a possible intent detected by the intent engine. The device can notify the host by:

  • sending the intent ID over a UART interface upon detecting the intent

  • sending the intent ID over an I2C master interface upon detecting the intent

  • allowing the host to poll the last detected intent ID over the I2C slave interface

  • outputting an audio message over an I2S interface

When a wakeword phrase is detected followed by a command phrase, the application will output an audio response and a discrete message over I2C and UART.
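
The wakeword-then-command behaviour can be pictured as a small state machine with a timeout. The sketch below is not the FFD source: the type names are hypothetical, the 5000 ms window is the documented default of appconfINTENT_RESET_DELAY_MS, and restarting the window after each accepted command is an assumption.

```c
#include <stdint.h>

#define INTENT_RESET_DELAY_MS 5000  /* default of appconfINTENT_RESET_DELAY_MS */

/* Hypothetical classification of what the intent engine recognized. */
typedef enum { KW_WAKEWORD, KW_COMMAND } keyword_kind_t;

typedef struct {
    int listening;             /* 1 while the command window is open */
    uint32_t window_start_ms;  /* time at which the window was (re)opened */
} intent_fsm_t;

/* Feed one recognized keyword with its timestamp.
 * Returns 1 when a command should be reported to the host, otherwise 0. */
int intent_fsm_on_keyword(intent_fsm_t *f, keyword_kind_t kind, uint32_t now_ms)
{
    /* Close the window if it has expired. */
    if (f->listening &&
        (uint32_t)(now_ms - f->window_start_ms) > INTENT_RESET_DELAY_MS) {
        f->listening = 0;
    }

    if (kind == KW_WAKEWORD) {
        f->listening = 1;
        f->window_start_ms = now_ms;
        return 0;
    }

    if (f->listening) {
        f->window_start_ms = now_ms;  /* assumption: a command restarts the window */
        return 1;
    }
    return 0;  /* command without a preceding wakeword is ignored */
}
```

Setting appconfINTENT_RAW_OUTPUT to 1 is described as bypassing this kind of state machine so that every keyword found is output.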

Sensory’s THF and Cyberon’s DSpotter™ libraries ship with an expiring development license. The Sensory library suspends recognition after 11.4 hours or 107 recognition events, and the Cyberon library suspends recognition after 100 recognition events. Once a limit is reached, a device reset is required to resume normal operation. To perform a reset, either power cycle the device or press the SW2 button.

More information on the Sensory speech recognition library can be found here: Speech Recognition - Sensory.

More information on the Cyberon speech recognition library can be found here: Speech Recognition - Cyberon.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Supported Hardware£££doc/programming_guide/ffd/hardware.html#supported-hardware

This example application is supported on the XK-VOICE-L71 board.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Supported Hardware$$$Setting up the Hardware£££doc/programming_guide/ffd/hardware.html#setting-up-the-hardware

This example design requires an XTAG4 and XK-VOICE-L71 board.

[Figure: all components]
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Supported Hardware$$$xTAG£££doc/programming_guide/ffd/hardware.html#xtag

The xTAG is used to program and debug the device.

Connect the xTAG to the debug header, as shown below.

[Figure: xTAG]

Connect the micro USB ports of the XTAG4 and the XK-VOICE-L71 to the programming host.

[Figure: programming host setup]
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Supported Hardware$$$Speakers (OPTIONAL)£££doc/programming_guide/ffd/hardware.html#speakers-optional

This example application features audio playback responses. Speakers can be connected to the LINE OUT on the XK-VOICE-L71.

[Figure: speakers]
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Configuring the Firmware£££doc/programming_guide/ffd/deploying/configuration.html#configuring-the-firmware

The default application performs as described in the Overview. There are numerous compile-time options that can be added to change the example design without requiring code changes. To change the options explained in the table below, add the desired configuration variables to the APP_COMPILE_DEFINITIONS CMake variable in the .cmake file located in the examples/ffd/ folder.

If options are changed, the application firmware must be rebuilt.

FFD Compile Options

| Compile Option | Description | Default Value |
|---|---|---|
| appconfINTENT_ENABLED | Enables/disables the intent engine, primarily for debug. | 1 |
| appconfINTENT_RESET_DELAY_MS | Sets the period, in milliseconds, after the wake up phrase has been heard during which a command phrase is considered valid. | 5000 |
| appconfINTENT_RAW_OUTPUT | Set to 1 to output all keywords found, skipping the internal wake up and command state machine. | 0 |
| appconfAUDIO_PLAYBACK_ENABLED | Enables/disables the audio playback command response. | 1 |
| appconfINTENT_UART_OUTPUT_ENABLED | Enables/disables the UART intent message. | 1 |
| appconfINTENT_UART_DEBUG_INFO_ENABLED | Enables/disables the UART intent debug information. | 0 |
| appconfI2C_MASTER_DAC_ENABLED | Enables/disables configuring the DAC over I2C master. | 1 |
| appconfINTENT_I2C_MASTER_OUTPUT_ENABLED | Enables/disables sending the intent message over I2C master. | 1 |
| appconfINTENT_I2C_MASTER_DEVICE_ADDR | Sets the address of the I2C device receiving the intent via the I2C master interface. | 0x01 |
| appconfINTENT_I2C_SLAVE_POLLED_ENABLED | Enables/disables allowing another device to poll the intent message via I2C slave. | 0 |
| appconfI2C_SLAVE_DEVICE_ADDR | Sets the address of the I2C device receiving the intent via the I2C slave interface. | 0x42 |
| appconfINTENT_I2C_REG_ADDRESS | Sets the address of the I2C register that stores the intent message; this value can be read via the I2C slave interface. | 0x01 |
| appconfUART_BAUD_RATE | Sets the baud rate for the UART tx intent interface. | 9600 |
| appconfUSE_I2S_INPUT | Uses the I2S audio source instead of the microphone array audio source. | 0 |
| appconfI2S_MODE | Selects the I2S mode; supported values are appconfI2S_MODE_MASTER and appconfI2S_MODE_SLAVE. | master |
| appconfI2S_AUDIO_SAMPLE_RATE | Selects the sample rate of the I2S interface; supported values are 16000 and 48000. | 16000 |
| appconfRECOVER_MCLK_I2S_APP_PLL | Enables/disables the recovery of the MCLK from the Software PLL application; this removes the need to use an external MCLK. | 0 |
| appconfINTENT_TRANSPORT_DELAY_MS | Sets the delay, in milliseconds, between the host wake up request and the I2C and UART keyword code transmission. | 50 |
| appconfINTENT_QUEUE_LEN | Sets the maximum number of detected intents to hold while waiting for the host to wake up. | 10 |
| appconfINTENT_WAKEUP_EDGE_TYPE | Sets the host wake up pin GPIO edge type: 0 for rising edge, 1 for falling edge. | 0 |
| appconfAUDIO_PIPELINE_SKIP_IC_AND_VNR | Set to 1 to skip the IC and VNR stages; 0 leaves them enabled. | 0 |
| appconfAUDIO_PIPELINE_SKIP_NS | Set to 1 to skip the NS stage; 0 leaves it enabled. | 0 |
| appconfAUDIO_PIPELINE_SKIP_AGC | Set to 1 to skip the AGC stage; 0 leaves it enabled. | 0 |

Note

The example_ffd_i2s_input_cyberon has different default values from the ones in the table above. The list of updated values can be found in the APP_COMPILE_DEFINITIONS list in examples\ffd\ffd_i2s_input_cyberon.cmake.
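
Compile definitions of this kind are commonly consumed through guarded defaults in a configuration header. The sketch below assumes that pattern; it is not copied from the example's app_conf.h. The macro names and default values come from the table, while the guard structure, the range check, and the `intent_window_expired` helper are illustrative.

```c
/* Defaults apply only when the build has not already defined the option,
 * for example through a definition added to APP_COMPILE_DEFINITIONS. */
#ifndef appconfINTENT_RESET_DELAY_MS
#define appconfINTENT_RESET_DELAY_MS 5000
#endif

#ifndef appconfUART_BAUD_RATE
#define appconfUART_BAUD_RATE 9600
#endif

#ifndef appconfI2S_AUDIO_SAMPLE_RATE
#define appconfI2S_AUDIO_SAMPLE_RATE 16000
#endif

/* Reject unsupported values at compile time rather than at run time. */
#if (appconfI2S_AUDIO_SAMPLE_RATE != 16000) && (appconfI2S_AUDIO_SAMPLE_RATE != 48000)
#error "appconfI2S_AUDIO_SAMPLE_RATE must be 16000 or 48000"
#endif

/* Hypothetical helper showing an option being used in code. */
int intent_window_expired(unsigned elapsed_ms)
{
    return elapsed_ms > appconfINTENT_RESET_DELAY_MS;
}
```

Because the options are resolved by the preprocessor, a changed value only takes effect after the firmware is rebuilt.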

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Configuring the Firmware$$$Configuring the I2C interfaces£££doc/programming_guide/ffd/deploying/configuration.html#configuring-the-i2c-interfaces

The I2C interfaces are used to configure the DAC and to communicate with the host. The I2C interface can be configured as a master or a slave. The DAC must be configured at bootup via the I2C master interface. The I2C master is used when the FFD example asynchronously sends intent messages to the host. The I2C slave is used when the host wants to read intent messages from the FFD example through polling.

Note

The I2C interface cannot operate as both master and slave simultaneously. The FFD example design uses the I2C master interface to configure the DAC at device initialisation. However, if the host reads intent messages from the FFD example using the I2C slave interface, the I2C master interface will be disabled after the DAC configuration is complete.

To send the intent ID via the I2C master interface when a command is detected, set the following variables:

  • appconfINTENT_I2C_MASTER_OUTPUT_ENABLED to 1.

  • appconfINTENT_I2C_MASTER_DEVICE_ADDR to the desired address used by the I2C slave device.

  • appconfINTENT_I2C_SLAVE_POLLED_ENABLED to 0; this disables the I2C slave interface.

To configure the FFD example so that the host can poll for the intent via the I2C slave interface, set the following variables:

  • appconfINTENT_I2C_SLAVE_POLLED_ENABLED to 1.

  • appconfI2C_SLAVE_DEVICE_ADDR to the desired address used by the I2C master device.

  • appconfINTENT_I2C_REG_ADDRESS to the desired register read by the I2C master device.

  • appconfINTENT_I2C_MASTER_OUTPUT_ENABLED to 0; this disables the I2C master interface after initialization.

The handling of the I2C slave registers is done in the examples\ffd\src\i2c_reg_handling.c file. The variable appconfINTENT_I2C_REG_ADDRESS is used in the callback function read_device_reg().
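
A register-read callback for the polled mode might be shaped like the sketch below. The function signature, the storage variable, and the behaviour for unknown registers are all assumptions; the actual read_device_reg() in i2c_reg_handling.c may differ. Only the register address default (0x01) is taken from the table above.

```c
#include <stddef.h>
#include <stdint.h>

#define appconfINTENT_I2C_REG_ADDRESS 0x01  /* default from the options table */

/* Last intent ID detected by the intent engine (hypothetical storage). */
static volatile uint8_t last_intent_id = 0;

/* Called when a new intent is detected. */
void sketch_store_intent(uint8_t intent_id)
{
    last_intent_id = intent_id;
}

/* Hypothetical register read handler for the I2C slave interface.
 * Writes the register value to *value and returns 1 if the register
 * is known; otherwise writes 0 and returns 0. */
int sketch_read_device_reg(uint8_t reg_addr, uint8_t *value)
{
    if (value == NULL) {
        return 0;
    }
    if (reg_addr == appconfINTENT_I2C_REG_ADDRESS) {
        *value = last_intent_id;
        return 1;
    }
    *value = 0;
    return 0;
}
```

In this sketch the host, acting as I2C master, addresses the device at appconfI2C_SLAVE_DEVICE_ADDR and reads the register at appconfINTENT_I2C_REG_ADDRESS to obtain the most recent intent ID.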

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Configuring the Firmware$$$Configuring the I2S interface£££doc/programming_guide/ffd/deploying/configuration.html#configuring-the-i2s-interface

The I2S interface is used to play the audio command response to the DAC, and/or to receive the audio samples from the host. The I2S interface can be configured as either a master or a slave. To configure the I2S interface, set the following variables:

  • appconfI2S_ENABLED to 1.

  • appconfI2S_MODE to the desired mode, either appconfI2S_MODE_MASTER or appconfI2S_MODE_SLAVE.

  • appconfI2S_AUDIO_SAMPLE_RATE to the desired sample rate, either 16000 or 48000.

  • appconfRECOVER_MCLK_I2S_APP_PLL to 1 if an external MCLK is not available, otherwise set it to 0.

  • appconfAUDIO_PLAYBACK_ENABLED to 1, if the intent audio is to be played back.

  • appconfUSE_I2S_INPUT to 1, if the I2S audio source is to be used instead of the microphone array audio source.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Deploying the Firmware with Linux or macOS£££doc/programming_guide/ffd/deploying/linux_macos.html#deploying-the-firmware-with-linux-or-macos

This document explains how to deploy the software using CMake and Make.

Note

In the commands below <speech_engine> can be either sensory or cyberon, depending on the choice of the speech recognition engine and model.

Note

The Cyberon speech recognition engine is integrated in two examples. The example_ffd_cyberon uses the microphone array as the audio source, and the example_ffd_i2s_input_cyberon uses the I2S interface as the audio source. In the rest of this section, we use only the example_ffd_<speech_engine> as an example.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Deploying the Firmware with Linux or macOS$$$Building the Host Applications£££doc/programming_guide/ffd/deploying/linux_macos.html#building-the-host-applications

This application requires a host application to create the flash data partition. Run the following commands in the root folder to build the host application using your native toolchain:

Note

Permissions may be required to install the host applications.

cmake -B build_host
cd build_host
make install

The host applications will be installed at /opt/xmos/bin, and may be moved if desired. You may wish to add this directory to your PATH variable.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Deploying the Firmware with Linux or macOS$$$Building the Firmware£££doc/programming_guide/ffd/deploying/linux_macos.html#building-the-firmware

With your Python environment activated, run the following commands in the root folder to build the firmware:

pip install -r requirements.txt
cmake -B build --toolchain=xmos_cmake_toolchain/xs3a.cmake
cd build
make example_ffd_<speech_engine>
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Deploying the Firmware with Linux or macOS$$$Running the Firmware£££doc/programming_guide/ffd/deploying/linux_macos.html#running-the-firmware

Before running the firmware, the filesystem and model must be flashed to the data partition.

Within the root of the build folder, run:

make flash_app_example_ffd_<speech_engine>

After this command completes, the application will be running.

After flashing the data partition, the application can be run without reflashing. If changes are made to the data partition components, the application must be reflashed.

From the build folder run:

xrun --xscope example_ffd_<speech_engine>.xe
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Deploying the Firmware with Linux or macOS$$$Debugging the Firmware£££doc/programming_guide/ffd/deploying/linux_macos.html#debugging-the-firmware

To debug with xgdb, from the build folder run:

xgdb -ex "connect --xscope" -ex "run" example_ffd_<speech_engine>.xe
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Deploying the Firmware with Native Windows£££doc/programming_guide/ffd/deploying/native_windows.html#deploying-the-firmware-with-native-windows

This document explains how to deploy the software using CMake and Ninja. If you are not using native Windows MSVC build tools and instead using a Linux emulation tool such as WSL, refer to Deploying the Firmware with Linux or macOS.

To install Ninja, follow the installation instructions at https://ninja-build.org/. On Windows, you can instead install it with winget by running the following commands in PowerShell:

# Install
winget install Ninja-build.ninja
# Reload user Path
$env:Path=[System.Environment]::GetEnvironmentVariable("Path","User")

Note

In the commands below <speech_engine> can be either sensory or cyberon, depending on the choice of the speech recognition engine and model.

Note

The Cyberon speech recognition engine is integrated in two examples. The example_ffd_cyberon uses the microphone array as the audio source, and the example_ffd_i2s_input_cyberon uses the I2S interface as the audio source. The rest of this section uses only example_ffd_<speech_engine> as an example.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Deploying the Firmware with Native Windows$$$Building the Host Applications£££doc/programming_guide/ffd/deploying/native_windows.html#building-the-host-applications

This application requires a host application to create the flash data partition. Run the following commands in the root folder to build the host application using your native toolchain:

Note

Permissions may be required to install the host applications.

Note

A C/C++ compiler, such as Visual Studio or MinGW, must be included in the path.

Before building the host application, you will need to add the path to the XTC Tools to your environment.

set "XMOS_TOOL_PATH=<path-to-xtc-tools>"

Then build the host application:

cmake -G Ninja -B build_host
cd build_host
ninja install

The host applications will be installed at %USERPROFILE%\.xmos\bin, and may be moved if desired. You may wish to add this directory to your PATH variable.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Deploying the Firmware with Native Windows$$$Building the Firmware£££doc/programming_guide/ffd/deploying/native_windows.html#building-the-firmware

With your Python environment activated, run the following commands in the root folder to build the firmware:

pip install -r requirements.txt
cmake -G Ninja -B build --toolchain=xmos_cmake_toolchain/xs3a.cmake
cd build
ninja example_ffd_<speech_engine>
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Deploying the Firmware with Native Windows$$$Running the Firmware£££doc/programming_guide/ffd/deploying/native_windows.html#running-the-firmware

Before running the firmware, the filesystem and model must be flashed to the data partition.

Within the root of the build folder, run:

ninja flash_app_example_ffd_<speech_engine>

After this command completes, the application will be running.

After flashing the data partition, the application can be run without reflashing. If changes are made to the data partition components, the application must be reflashed.

From the build folder run:

xrun --xscope example_ffd_<speech_engine>.xe
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Deploying the Firmware with Native Windows$$$Debugging the Firmware£££doc/programming_guide/ffd/deploying/native_windows.html#debugging-the-firmware

To debug with xgdb, from the build folder run:

xgdb -ex "connect --xscope" -ex "run" example_ffd_<speech_engine>.xe
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software£££doc/programming_guide/ffd/modifying.html#modifying-the-software

The FFD example design is highly customizable. This section describes how to modify the application.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$Host Integration£££doc/programming_guide/ffd/host_integration.html#host-integration
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$Overview£££doc/programming_guide/ffd/host_integration.html#overview

This section describes the connections that would need to be made to an external host for plug and play integration with existing devices.

When an intent is found, the XCORE device will check if the host is awake, by checking the Host Status GPIO pin. If the host is awake the intent code will be transmitted over I2C and/or UART.

If the host is not awake, the XCORE device will trigger a transition of the Wakeup GPIO pin. This can be configured to be a rising or falling edge. The XCORE device will then wait for a fixed period of time, set at compile time, before transmitting the intent over the I2C and/or UART interface. This behavior can be changed as desired by modifying the intent handling code.
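
The decision described above can be sketched as a small pure function. This is only an illustration under stated assumptions: the type names, the function name, and the idea of returning a "plan" are hypothetical and do not come from the FFD sources, where the logic lives in the intent handling code and drives the GPIO, I2C, and UART drivers directly.

```c
#include <stdbool.h>
#include <stdint.h>

/* All names below are hypothetical; they are not taken from the FFD sources. */
typedef enum {
    WAKEUP_EDGE_RISING,
    WAKEUP_EDGE_FALLING
} wakeup_edge_t;

typedef struct {
    bool pulse_wakeup;        /* true if the Wakeup pin must be toggled first */
    bool wakeup_idle_level;   /* pin level before the edge */
    bool wakeup_active_level; /* pin level after the edge */
    uint32_t wait_ms;         /* delay before transmitting the intent */
} host_notify_plan_t;

/* Decide what must happen before an intent is transmitted, based on the
 * value read from the Host Status pin. */
host_notify_plan_t plan_host_notify(bool host_awake,
                                    wakeup_edge_t edge,
                                    uint32_t wake_delay_ms)
{
    host_notify_plan_t plan = { false, false, false, 0 };

    if (!host_awake) {
        /* Host is asleep: generate the configured edge, then wait. */
        plan.pulse_wakeup = true;
        plan.wakeup_idle_level = (edge == WAKEUP_EDGE_FALLING);
        plan.wakeup_active_level = (edge == WAKEUP_EDGE_RISING);
        plan.wait_ms = wake_delay_ms;
    }
    /* In both cases the intent is transmitted afterwards. */
    return plan;
}
```

A caller would read the Host Status pin, pass the result in, apply the returned pin levels and delay, and then transmit the intent code over I2C and/or UART.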

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$UART£££doc/programming_guide/ffd/host_integration.html#uart
UART Connections

| FFD Connection | Host Connection |
|----------------|-----------------|
| J4:24          | UART RX         |
| J4:20          | GND             |

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$I2C£££doc/programming_guide/ffd/host_integration.html#i2c
I2C Connections

| FFD Connection | Host Connection |
|----------------|-----------------|
| J4:3           | SDA             |
| J4:5           | SCL             |
| J4:9           | GND             |

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$GPIO£££doc/programming_guide/ffd/host_integration.html#gpio
GPIO Connections

| FFD Connection | Host Connection    |
|----------------|--------------------|
| J4:19          | Wake up input      |
| J4:21          | Host Status output |

ffd host integration diagram
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$Audio Pipeline£££doc/programming_guide/ffd/audio_pipeline.html#audio-pipeline

The audio pipeline in FFD processes two-channel PDM microphone input into a single output channel, intended for use by an ASR engine.

The audio pipeline consists of 3 stages.

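FFD Audio Pipeline

| Stage | Description                                  | Input Channel Count | Output Channel Count |
|-------|----------------------------------------------|---------------------|----------------------|
| 1     | Interference Canceller and Voice Noise Ratio | 2                   | 1                    |
| 2     | Noise Suppression                            | 1                   | 1                    |
| 3     | Automatic Gain Control                       | 1                   | 1                    |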

See the Voice Framework User Guide for more information.
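
To make the channel counts in the table concrete, the following sketch chains three stand-in stages through an array of function pointers. It is a simplified illustration, not the FFD implementation: the frame type, the frame length of 240 samples, and all three stage bodies are assumptions chosen only to show the 2-to-1, 1-to-1, 1-to-1 data flow. The real stages run the XMOS voice DSP libraries.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_FRAME_ADVANCE 240 /* assumed samples per channel per frame */
#define SKETCH_MIC_COUNT     2

/* Hypothetical frame type; after stage 1 only channel 0 is meaningful. */
typedef struct {
    int32_t samples[SKETCH_MIC_COUNT][SKETCH_FRAME_ADVANCE];
} sketch_frame_t;

typedef void (*sketch_stage_t)(sketch_frame_t *frame);

/* Stand-in for stage 1 (2 channels in, 1 out): averages both mics. */
void sketch_stage_two_to_one(sketch_frame_t *frame)
{
    for (size_t i = 0; i < SKETCH_FRAME_ADVANCE; i++) {
        int64_t sum = (int64_t)frame->samples[0][i] + frame->samples[1][i];
        frame->samples[0][i] = (int32_t)(sum / 2);
    }
}

/* Stand-in for stage 2 (1 in, 1 out): leaves channel 0 unchanged. */
void sketch_stage_passthrough(sketch_frame_t *frame)
{
    (void)frame;
}

/* Stand-in for stage 3 (1 in, 1 out): fixed gain of 2 with saturation. */
void sketch_stage_fixed_gain(sketch_frame_t *frame)
{
    for (size_t i = 0; i < SKETCH_FRAME_ADVANCE; i++) {
        int64_t v = (int64_t)frame->samples[0][i] * 2;
        if (v > INT32_MAX) v = INT32_MAX;
        if (v < INT32_MIN) v = INT32_MIN;
        frame->samples[0][i] = (int32_t)v;
    }
}

/* Run every stage in order on one frame, in place. */
void sketch_run_pipeline(sketch_frame_t *frame)
{
    static const sketch_stage_t stages[] = {
        sketch_stage_two_to_one,
        sketch_stage_passthrough,
        sketch_stage_fixed_gain,
    };
    for (size_t s = 0; s < sizeof(stages) / sizeof(stages[0]); s++) {
        stages[s](frame);
    }
}
```

In the application itself the stages run as separate RTOS tasks rather than in a single loop; the sequential loop here only mirrors the order of the table.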

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$Software Description£££doc/programming_guide/ffd/software_description.html#software-description
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$Overview£££doc/programming_guide/ffd/software_desc/overview.html#overview

The estimated power usage of the example application ranges from 100 to 141 mW. The actual figure depends on component tolerances and on any user-added code or compile options.

FFD Resources

| Resource                 | Tile 0 | Tile 1 |
|--------------------------|--------|--------|
| Total Memory Free        | 145k   | 208k   |
| Runtime Heap Memory Free | 38k    | 42k    |

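FFD CPU Usage

| Core ID          | Typical Mean CPU Usage (%) | Standard Deviation CPU Usage (%) | Typical Min CPU usage (%, 10ms rolling) | Typical Max CPU usage (%, 10ms rolling) |
|------------------|----------------------------|----------------------------------|-----------------------------------------|-----------------------------------------|
| tile[0], core[0] | 0.006                      | 0.345                            | 0.000                                   | 21.030                                  |
| tile[0], core[1] | 0.072                      | 2.031                            | 0.000                                   | 80.690                                  |
| tile[0], core[2] | 0.082                      | 2.287                            | 0.000                                   | 100.000                                 |
| tile[0], core[3] | 1.666                      | 2.906                            | 0.000                                   | 54.560                                  |
| tile[0], core[4] | 65.925                     | 27.828                           | 0.000                                   | 91.220                                  |
| tile[1], core[0] | 0.014                      | 0.540                            | 0.000                                   | 27.440                                  |
| tile[1], core[1] | 99.990                     | 0.505                            | 74.000                                  | 100.000                                 |
| tile[1], core[2] | 99.990                     | 0.507                            | 73.870                                  | 100.000                                 |
| tile[1], core[3] | 18.272                     | 13.259                           | 0.000                                   | 98.220                                  |
| tile[1], core[4] | 17.231                     | 11.048                           | 0.000                                   | 37.260                                  |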

Note that these are typical usage statistics for a representative run of the application on hardware. Core allocations may shift run-to-run in a scheduled RTOS. These statistics are generated by slicing the representative run into 10 ms chunks and calculating % time per chunk not spent in the FreeRTOS IDLE tasks. Therefore, the underlying distribution of these 10 ms bins should not be assumed to be Normal; this has implications on e.g. the interpretation of the Standard Deviation given here.
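
The binning procedure described above can be sketched as follows. This is an assumed reconstruction for illustration, not the tool used to produce the table: the function and type names are hypothetical, and whether the published figures use population or sample variance is not stated in this guide. The sketch returns the population variance and leaves the square root to the caller.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    double mean;     /* mean busy percentage over all bins */
    double variance; /* population variance; the standard deviation is its square root */
    double min;      /* lowest busy percentage seen in any bin */
    double max;      /* highest busy percentage seen in any bin */
} cpu_usage_stats_t;

/* busy_us[i]: microseconds in bin i not spent in the FreeRTOS IDLE task.
 * bin_us:     bin length in microseconds (10000 for 10 ms bins). */
cpu_usage_stats_t cpu_usage_stats(const uint32_t *busy_us,
                                  size_t bin_count,
                                  uint32_t bin_us)
{
    cpu_usage_stats_t s = { 0.0, 0.0, 0.0, 0.0 };
    if (bin_count == 0 || bin_us == 0) {
        return s;
    }

    /* First pass: mean, min, and max of the per-bin busy percentage. */
    double sum = 0.0;
    s.min = s.max = 100.0 * busy_us[0] / bin_us;
    for (size_t i = 0; i < bin_count; i++) {
        double pct = 100.0 * busy_us[i] / bin_us;
        sum += pct;
        if (pct < s.min) s.min = pct;
        if (pct > s.max) s.max = pct;
    }
    s.mean = sum / (double)bin_count;

    /* Second pass: squared deviation from the mean. */
    double sq = 0.0;
    for (size_t i = 0; i < bin_count; i++) {
        double d = 100.0 * busy_us[i] / bin_us - s.mean;
        sq += d * d;
    }
    s.variance = sq / (double)bin_count;
    return s;
}
```

A core that is idle in most bins but fully busy in a few yields a small mean, a maximum of 100%, and a standard deviation much larger than the mean, which is why the bins should not be read as Normally distributed.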

FFD Power Usage

| Power State | Power (mW) |
|-------------|------------|
| Always      | 114        |

The description of the software is split up by folder:

FFD Software Description

| Folder                          | Description                                                           |
|---------------------------------|-----------------------------------------------------------------------|
| examples/ffd/bsp_config         | Board support configuration setting up software based IO peripherals |
| examples/ffd/filesystem_support | Filesystem contents for application                                   |
| examples/ffd/src                | Main application                                                      |
| modules/asr/intent_engine       | Intent engine integration                                             |
| modules/asr/intent_handler      | Intent engine output integration                                      |

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$examples/ffd/bsp_config£££doc/programming_guide/ffd/software_desc/bsp_config.html#examples-ffd-bsp-config

This folder contains bsp_configs for the FFD application. More information on bsp_configs can be found in the RTOS Framework documentation.

FFD bsp_config

| Filename/Directory              | Description                                              |
|---------------------------------|----------------------------------------------------------|
| dac directory                   | DAC ports for supported bsp_configs                      |
| XCORE-AI-EXPLORER directory     | experimental bsp_config, not recommended for general use |
| XCORE-AI-EXPLORER_EXT directory | experimental bsp_config, not recommended for general use |
| XK_VOICE_L71 directory          | default FFD application bsp_config                       |
| XK_VOICE_L71_EXT directory      | USB debug extension FFD application bsp_config           |
| bsp_config.cmake                | cmake for adding FFD bsp_configs                         |

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$examples/ffd/filesystem_support£££doc/programming_guide/ffd/software_desc/filesystem_support.html#examples-ffd-filesystem-support

This folder contains filesystem contents for the FFD application.

FFD filesystem_support

| Filename/Directory | Description               |
|--------------------|---------------------------|
| 50.wav             | Playback for intent ID 50 |
| 1.wav              | Playback for intent ID 1  |
| 3.wav              | Playback for intent ID 3  |
| 4.wav              | Playback for intent ID 4  |
| 5.wav              | Playback for intent ID 5  |
| 6.wav              | Playback for intent ID 6  |
| 7.wav              | Playback for intent ID 7  |
| 8.wav              | Playback for intent ID 8  |
| 9.wav              | Playback for intent ID 9  |
| 10.wav             | Playback for intent ID 10 |
| 11.wav             | Playback for intent ID 11 |
| 12.wav             | Playback for intent ID 12 |
| 13.wav             | Playback for intent ID 13 |
| 14.wav             | Playback for intent ID 14 |
| 15.wav             | Playback for intent ID 15 |
| 16.wav             | Playback for intent ID 16 |
| 17.wav             | Playback for intent ID 17 |
| 18.wav             | Playback for intent ID 18 |
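
The naming convention in this table, one file per intent ID, can be expressed as a small helper. This is a hypothetical sketch: the function name is not from the FFD sources, and the directory or mount point under which the application opens the file is not shown here.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Writes "<intent_id>.wav" into buf.
 * Returns 0 on success, or -1 if buf is too small to hold the name. */
int intent_wav_filename(int32_t intent_id, char *buf, size_t buf_len)
{
    int n = snprintf(buf, buf_len, "%" PRId32 ".wav", intent_id);
    return (n < 0 || (size_t)n >= buf_len) ? -1 : 0;
}
```

Not every ID has a file (there is no 2.wav in the table, for example), so a caller should still handle a failed file open.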

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$examples/ffd/src£££doc/programming_guide/ffd/software_desc/src.html#examples-ffd-src

This folder contains the core application source.

FFD src

| Filename/Directory       | Description                                                     |
|--------------------------|-----------------------------------------------------------------|
| gpio_ctrl directory      | contains general purpose input handling and LED handling tasks |
| intent_engine directory  | contains intent engine code                                     |
| intent_handler directory | contains intent handling code                                   |
| rtos_conf directory      | contains default FreeRTOS configuration headers                 |
| app_conf_check.h         | header to validate app_conf.h                                   |
| app_conf.h               | header to describe app configuration                            |
| config.xscope            | xscope configuration file                                       |
| ff_appconf.h             | default fatfs configuration header                              |
| main.c                   | main application source file                                    |
| xcore_device_memory.c    | model loading from filesystem source file                       |
| xcore_device_memory.h    | model loading from filesystem header file                       |

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$Audio Pipeline£££doc/programming_guide/ffd/software_desc/src.html#audio-pipeline

The audio pipeline module provides the application with three API functions:

Audio Pipeline API (audio_pipeline.h)
void audio_pipeline_init(
        void *input_app_data,
        void *output_app_data);

void audio_pipeline_input(
        void *input_app_data,
        int32_t **input_audio_frames,
        size_t ch_count,
        size_t frame_count);

int audio_pipeline_output(
        void *output_app_data,
        int32_t **output_audio_frames,
        size_t ch_count,
        size_t frame_count);
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$audio_pipeline_init£££doc/programming_guide/ffd/software_desc/src.html#audio-pipeline-init

This function has the role of creating the audio pipeline, with two optional application pointers which are provided to the application in the audio_pipeline_input() and audio_pipeline_output() callbacks.

In FFD, the audio pipeline is initialized with no additional arguments, and instantiates a 3 stage pipeline on tile 1, as described in: Audio Pipeline

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$audio_pipeline_input£££doc/programming_guide/ffd/software_desc/src.html#audio-pipeline-input

This function has the role of providing the audio pipeline with the input frames.

In FFD, the input is received from the rtos_mic_array driver.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$audio_pipeline_output£££doc/programming_guide/ffd/software_desc/src.html#audio-pipeline-output

This function has the role of receiving the processed audio pipeline output.

In FFD, the output is sent to the intent engine.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$Main£££doc/programming_guide/ffd/software_desc/src.html#main

The major components of main are:

Main components (main.c)
void startup_task(void *arg)
void vApplicationMinimalIdleHook(void)
void tile_common_init(chanend_t c)
void main_tile0(chanend_t c0, chanend_t c1, chanend_t c2, chanend_t c3)
void main_tile1(chanend_t c0, chanend_t c1, chanend_t c2, chanend_t c3)
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$startup_task£££doc/programming_guide/ffd/software_desc/src.html#startup-task

This function has the role of launching tasks on each tile. For those familiar with XCORE, it is comparable to the main par loop in an XC main.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$vApplicationMinimalIdleHook£££doc/programming_guide/ffd/software_desc/src.html#vapplicationminimalidlehook

This is a FreeRTOS callback. It executes the waiteu instruction without any events configured, which saves both MIPS and power on XCORE.

vApplicationMinimalIdleHook (main.c)
asm volatile("waiteu");
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$tile_common_init£££doc/programming_guide/ffd/software_desc/src.html#tile-common-init

This function is the common tile initialization, which initializes the bsp_config, creates the startup task, and starts the FreeRTOS kernel.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$main_tile0£££doc/programming_guide/ffd/software_desc/src.html#main-tile0

This function is the application C entry point on tile 0, provided by the SDK.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$main_tile1£££doc/programming_guide/ffd/software_desc/src.html#main-tile1

This function is the application C entry point on tile 1, provided by the SDK.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$modules/asr/intent_engine£££doc/programming_guide/ffd/software_desc/intent_engine.html#modules-asr-intent-engine

This folder contains the intent engine module for the FFD and FFVA applications.

ASR Intent Engine

| Filename/Directory      | Description                                                |
|-------------------------|------------------------------------------------------------|
| intent_engine_io.c      | contains additional io intent engine code                  |
| intent_engine_support.c | contains general intent engine support code                |
| intent_engine.c         | contains the implementation of default intent engine code |
| intent_engine.h         | header for intent engine code                              |

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$Major Components£££doc/programming_guide/ffd/software_desc/intent_engine.html#major-components

The intent engine module provides the application with three API functions:

Intent Engine API (intent_engine.h)
int32_t intent_engine_create(uint32_t priority, void *args);
void intent_engine_ready_sync(void);
int32_t intent_engine_sample_push(asr_sample_t *buf, size_t frames);

If replacing the existing model, these are the only functions that are required to be populated.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$intent_engine_create£££doc/programming_guide/ffd/software_desc/intent_engine.html#intent-engine-create

This function has the role of creating the model running task and providing a pointer, which can be used by the application to handle the output intent result. In the case of the default configuration, the application provides a FreeRTOS Queue object.

The ASR engine is on tile 0 in both FFD and FFVA, but the audio pipeline output is on tile 1 for FFD and on tile 0 for FFVA.

intent_engine_create snippet (intent_engine_io.c)
#if ASR_TILE_NO == AUDIO_PIPELINE_OUTPUT_TILE_NO
    intent_engine_task_create(priority);
#else
    intent_engine_intertile_task_create(priority);
#endif

The call to intent_engine_intertile_task_create() will create two threads on tile 0. One thread is the ASR engine thread. The other thread is an intertile rx thread, which will interface with the audio pipeline output.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$intent_engine_ready_sync£££doc/programming_guide/ffd/software_desc/intent_engine.html#intent-engine-ready-sync

This function is called by both tiles and serves to ensure that tile 0 is ready to receive audio samples before starting the audio pipeline. This is a preventative measure to avoid dropping samples at startup.

intent_engine_ready_sync snippet (intent_engine_io.c)
    int sync = 0;
#if ON_TILE(AUDIO_PIPELINE_OUTPUT_TILE_NO)
    size_t len = rtos_intertile_rx_len(intertile_ctx, appconfINTENT_ENGINE_READY_SYNC_PORT, RTOS_OSAL_WAIT_FOREVER);
    xassert(len == sizeof(sync));
    rtos_intertile_rx_data(intertile_ctx, &sync, sizeof(sync));
#else
    rtos_intertile_tx(intertile_ctx, appconfINTENT_ENGINE_READY_SYNC_PORT, &sync, sizeof(sync));
#endif
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$intent_engine_sample_push£££doc/programming_guide/ffd/software_desc/intent_engine.html#intent-engine-sample-push

This function has the role of sending the ASR output channel from the audio pipeline to the intent engine.

The ASR engine is on tile 0 in both FFD and FFVA, but the audio pipeline output is on tile 1 for FFD and on tile 0 for FFVA.

intent_engine_sample_push snippet (intent_engine_io.c)
#if appconfINTENT_ENABLED && ON_TILE(AUDIO_PIPELINE_OUTPUT_TILE_NO)
#if ASR_TILE_NO == AUDIO_PIPELINE_OUTPUT_TILE_NO
    intent_engine_samples_send_local(
            frames,
            buf);
#else
    intent_engine_samples_send_remote(
            intertile_ap_ctx,
            frames,
            buf);
#endif
#endif

The call to intent_engine_samples_send_remote() will send the audio samples to the previously configured intertile rx thread.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$intent_engine_process_asr_result£££doc/programming_guide/ffd/software_desc/intent_engine.html#intent-engine-process-asr-result

This function can be replaced by the application to handle the intent in a completely different manner.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$Miscellaneous Functions£££doc/programming_guide/ffd/software_desc/intent_engine.html#miscellaneous-functions

The following helper functions are provided for supporting the command processing features that are unique to the default FFD application:

  • intent_engine_keyword_queue_count

  • intent_engine_keyword_queue_complete

  • intent_engine_stream_buf_reset

  • intent_engine_play_response

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$modules/asr/intent_handler£££doc/programming_guide/ffd/software_desc/intent_handler.html#modules-asr-intent-handler

This folder contains ASR output handling modules for the FFD and FFVA applications.

ASR Intent handler

| Filename/Directory       | Description                                                  |
|--------------------------|--------------------------------------------------------------|
| audio_response directory | include folder for handling audio responses to keywords      |
| intent_handler.c         | contains the implementation of default intent handling code |
| intent_handler.h         | header for intent handler code                               |

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$Major Components£££doc/programming_guide/ffd/software_desc/intent_handler.html#major-components

The intent handling module provides the application with one API function:

Intent Handler API (intent_handler.h)
int32_t intent_handler_create(uint32_t priority, void *args);

If replacing the existing handler code, this is the only function that is required to be populated.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$intent_handler_create£££doc/programming_guide/ffd/software_desc/intent_handler.html#intent-handler-create

This function has the role of creating the keyword handling task for the ASR engine. In the case of the Sensory and Cyberon models, the application provides a FreeRTOS Queue object. This handler is on the same tile as the speech recognition engine, tile 0.

The call to intent_handler_create() will create one thread on tile 0. This thread will receive ID packets from the ASR engine over a FreeRTOS Queue object and output over various IO interfaces based on configuration.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$Software Modifications£££doc/programming_guide/ffd/software_modifications.html#software-modifications

The FFD example design consists of three major software blocks: the audio pipeline, the keyword spotter, and the keyword handler. This section describes how to replace any or all of these subsystems.

ffd diagram

It is highly recommended that you become familiar with the application as a whole before attempting to replace these functional units. This information can be found here: Software Description

See Software Description for more details on the memory footprint and CPU usage of the major software components.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$Replacing XCORE-VOICE DSP Block£££doc/programming_guide/ffd/software_modifications.html#replacing-xcore-voice-dsp-block

The audio pipeline can be replaced by making changes to the audio_pipeline.c file.

It is up to the user to ensure that the input and output frames of the audio pipeline remain the same, or the remainder of the application will not function properly.

This section walks through an example of replacing the XMOS NS stage with a custom stage, foo.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$Declaration and Definition of DSP Context£££doc/programming_guide/ffd/software_modifications.html#declaration-and-definition-of-dsp-context

Replace:

XMOS NS (audio_pipeline.c)
typedef struct ns_stage_ctx {
    ns_state_t state;
} ns_stage_ctx_t;

static ns_stage_ctx_t ns_stage_state = {};

With:

Foo (audio_pipeline.c)
typedef struct foo_stage_ctx {
    /* Your required state context here */
} foo_stage_ctx_t;

static foo_stage_ctx_t foo_stage_state = {};
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$DSP Function£££doc/programming_guide/ffd/software_modifications.html#dsp-function

Replace:

XMOS NS (audio_pipeline.c)
static void stage_ns(frame_data_t *frame_data)
{
#if appconfAUDIO_PIPELINE_SKIP_NS
    (void) frame_data;
#else
    int32_t ns_output[appconfAUDIO_PIPELINE_FRAME_ADVANCE];
    configASSERT(NS_FRAME_ADVANCE == appconfAUDIO_PIPELINE_FRAME_ADVANCE);
    ns_process_frame(
                &ns_stage_state.state,
                ns_output,
                frame_data->samples[0]);
    memcpy(frame_data->samples, ns_output, appconfAUDIO_PIPELINE_FRAME_ADVANCE * sizeof(int32_t));
#endif
}

With:

Foo (audio_pipeline.c)
static void stage_foo(frame_data_t *frame_data)
{
    int32_t foo_output[appconfAUDIO_PIPELINE_FRAME_ADVANCE];
    foo_process_frame(
                &foo_stage_state.state,
                foo_output,
                frame_data->samples[0]);
    memcpy(frame_data->samples, foo_output, appconfAUDIO_PIPELINE_FRAME_ADVANCE * sizeof(int32_t));
}
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$Runtime Initialization£££doc/programming_guide/ffd/software_modifications.html#runtime-initialization

Replace:

XMOS NS (audio_pipeline.c)
ns_init(&ns_stage_state.state);

With:

Foo (audio_pipeline.c)
foo_init(&foo_stage_state.state);
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$Audio Pipeline Setup£££doc/programming_guide/ffd/software_modifications.html#audio-pipeline-setup

Replace:

XMOS NS (audio_pipeline.c)
const pipeline_stage_t stages[] = {
    (pipeline_stage_t)stage_vnr_and_ic,
    (pipeline_stage_t)stage_ns,
    (pipeline_stage_t)stage_agc,
};

const configSTACK_DEPTH_TYPE stage_stack_sizes[] = {
    configMINIMAL_STACK_SIZE + RTOS_THREAD_STACK_SIZE(stage_vnr_and_ic) + RTOS_THREAD_STACK_SIZE(audio_pipeline_input_i),
    configMINIMAL_STACK_SIZE + RTOS_THREAD_STACK_SIZE(stage_ns),
    configMINIMAL_STACK_SIZE + RTOS_THREAD_STACK_SIZE(stage_agc) + RTOS_THREAD_STACK_SIZE(audio_pipeline_output_i),
};

With:

Foo (audio_pipeline.c)
const pipeline_stage_t stages[] = {
    (pipeline_stage_t)stage_vnr_and_ic,
    (pipeline_stage_t)stage_foo,
    (pipeline_stage_t)stage_agc,
};

const configSTACK_DEPTH_TYPE stage_stack_sizes[] = {
    configMINIMAL_STACK_SIZE + RTOS_THREAD_STACK_SIZE(stage_vnr_and_ic) + RTOS_THREAD_STACK_SIZE(audio_pipeline_input_i),
    configMINIMAL_STACK_SIZE + RTOS_THREAD_STACK_SIZE(stage_foo),
    configMINIMAL_STACK_SIZE + RTOS_THREAD_STACK_SIZE(stage_agc) + RTOS_THREAD_STACK_SIZE(audio_pipeline_output_i),
};

It is also possible to add or remove stages. Refer to the RTOS Framework documentation on the generic pipeline sw_service.
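
The walkthrough above calls foo_init() and foo_process_frame() without defining them, and accesses a state member that the placeholder foo_stage_ctx_t would need to declare (for example as foo_state_t state;). The sketch below shows one possible shape for such a stage. Everything in it is an assumption for illustration: foo_state_t, the frame length of 240 samples, and the one-pole low-pass filter are stand-ins, not part of the SDK. The argument order mirrors the foo_process_frame() call shown earlier.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed to equal appconfAUDIO_PIPELINE_FRAME_ADVANCE in the application. */
#define FOO_FRAME_ADVANCE 240

/* Hypothetical per-stage state that persists between frames. */
typedef struct {
    int32_t prev_output; /* last output sample of the previous frame */
} foo_state_t;

/* Reset the stage state; call once before the pipeline starts. */
void foo_init(foo_state_t *state)
{
    state->prev_output = 0;
}

/* Example DSP: one-pole low-pass, y[n] = y[n-1] + (x[n] - y[n-1]) / 4.
 * Reads FOO_FRAME_ADVANCE samples from input, writes the same number to output. */
void foo_process_frame(foo_state_t *state, int32_t *output, const int32_t *input)
{
    int64_t y = state->prev_output;
    for (size_t i = 0; i < FOO_FRAME_ADVANCE; i++) {
        y += ((int64_t)input[i] - y) / 4;
        output[i] = (int32_t)y;
    }
    state->prev_output = (int32_t)y;
}
```

Because the stage consumes and produces one channel of the same frame length, it keeps the input and output frame format that the rest of the application expects.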

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$Replacing Example Design Interfaces£££doc/programming_guide/ffd/software_modifications.html#replacing-example-design-interfaces

You may want a different output interface to talk to a host, or no host at all, with the intent handled locally on the XCORE device.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$Different Peripheral IO£££doc/programming_guide/ffd/software_modifications.html#different-peripheral-io

To add or remove a peripheral IO, modify the bsp_config accordingly. Refer to documentation inside the RTOS Framework on how to instantiate different RTOS peripheral drivers.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$Direct Control£££doc/programming_guide/ffd/software_modifications.html#direct-control

In a single controller system, the XCORE can be used to control peripherals directly.

The proc_keyword_res task can be modified as follows:

Intent Handler (intent_handler.c)
static void proc_keyword_res(void *args) {
    QueueHandle_t q_intent = (QueueHandle_t) args;
    int32_t id = 0;

    while(1) {
        xQueueReceive(q_intent, &id, portMAX_DELAY);

        /* User logic here */
    }
}

This code example receives the ID of each intent and can be populated with any user application logic. User logic can use other RTOS drivers to control various peripherals, such as screens, motors, and lights, based on the intent engine outputs.
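
As an illustration of what the user logic might look like, the sketch below maps a few intent IDs onto a device state. The IDs are the return codes listed in the Sensory English dictionary command table; device_state_t and apply_intent() are hypothetical names, and a real application would call RTOS drivers instead of updating a struct.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the peripherals under direct control. */
typedef struct {
    bool tv_on;
    bool lights_on;
    bool fan_on;
    int volume;
} device_state_t;

/* Apply one intent ID to the device state.
 * Returns true if the ID was handled, false if it was ignored. */
bool apply_intent(device_state_t *dev, int32_t id)
{
    switch (id) {
    case 3:  dev->tv_on = true;      return true; /* Switch on the TV */
    case 4:  dev->tv_on = false;     return true; /* Switch off the TV */
    case 7:  dev->volume++;          return true; /* Volume up */
    case 8:  dev->volume--;          return true; /* Volume down */
    case 9:  dev->lights_on = true;  return true; /* Switch on the lights */
    case 10: dev->lights_on = false; return true; /* Switch off the lights */
    case 13: dev->fan_on = true;     return true; /* Switch on the fan */
    case 14: dev->fan_on = false;    return true; /* Switch off the fan */
    default: return false;                        /* keyword or unhandled ID */
    }
}
```

Inside proc_keyword_res, a call such as apply_intent(&dev, id) would replace the "User logic here" comment.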

ffd host direct control diagram

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$Speech Recognition - Sensory£££doc/programming_guide/ffd/speech_recognition_sensory.html#speech-recognition-sensory
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$License£££doc/programming_guide/ffd/speech_recognition_sensory.html#license

The Sensory TrulyHandsFree™ (THF) speech recognition library is Copyright (C) 1995-2022 Sensory Inc., All Rights Reserved.

Sensory THF software requires a commercial license granted by Sensory Inc. This software ships with an expiring development license. It will suspend recognition after 11.4 hours or 107 recognition events.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$Overview£££doc/programming_guide/ffd/speech_recognition_sensory.html#overview

The Sensory THF speech recognition engine runs proprietary models to identify keywords in an audio stream. Models can be generated using VoiceHub.

Two models are provided: one in US English and one in Mainland Mandarin. The US English model is used by default. To modify the software to use the Mandarin model, see the comment at the top of the ffd_sensory.cmake file. Make sure to run the following commands to rebuild and re-flash the data partition:

make clean
make flash_app_example_ffd_sensory -j

To replace the Sensory engine with a different engine, refer to the ASR documentation on Automated Speech Recognition Porting.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$Dictionary command table£££doc/programming_guide/ffd/speech_recognition_sensory.html#dictionary-command-table
English Language Demo

| Utterances | Type | Return code (decimal) |
| --- | --- | --- |
| Hello XMOS | keyword | 1 |
| Switch on the TV | command | 3 |
| Switch off the TV | command | 4 |
| Channel up | command | 5 |
| Channel down | command | 6 |
| Volume up | command | 7 |
| Volume down | command | 8 |
| Switch on the lights | command | 9 |
| Switch off the lights | command | 10 |
| Brightness up | command | 11 |
| Brightness down | command | 12 |
| Switch on the fan | command | 13 |
| Switch off the fan | command | 14 |
| Speed up the fan | command | 15 |
| Slow down the fan | command | 16 |
| Set higher temperature | command | 17 |
| Set lower temperature | command | 18 |
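For logging or debugging, an application may want to turn a return code back into its utterance. The sketch below does this with a lookup array built from the table above. The array and the `sensory_en_utterance()` helper are hypothetical names for this example; codes that the table does not list (0 and 2) map to NULL.

```c
#include <stddef.h>
#include <stdint.h>

/* Utterances of the Sensory English demo, indexed by return code.
 * Codes 0 and 2 are not listed in the dictionary table, so they are NULL. */
const char *const sensory_en_utterances[] = {
    NULL,                      /* 0: not listed */
    "Hello XMOS",              /* 1: keyword */
    NULL,                      /* 2: not listed */
    "Switch on the TV",        /* 3 */
    "Switch off the TV",       /* 4 */
    "Channel up",              /* 5 */
    "Channel down",            /* 6 */
    "Volume up",               /* 7 */
    "Volume down",             /* 8 */
    "Switch on the lights",    /* 9 */
    "Switch off the lights",   /* 10 */
    "Brightness up",           /* 11 */
    "Brightness down",         /* 12 */
    "Switch on the fan",       /* 13 */
    "Switch off the fan",      /* 14 */
    "Speed up the fan",        /* 15 */
    "Slow down the fan",       /* 16 */
    "Set higher temperature",  /* 17 */
    "Set lower temperature"    /* 18 */
};

/* Return the utterance for a code, or NULL if the code is not in the table. */
const char *sensory_en_utterance(int32_t code)
{
    const size_t count =
        sizeof(sensory_en_utterances) / sizeof(sensory_en_utterances[0]);

    if (code < 0 || (size_t)code >= count) {
        return NULL;
    }
    return sensory_en_utterances[code];
}
```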

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$Application Integration£££doc/programming_guide/ffd/speech_recognition_sensory.html#application-integration

In-depth information on out-of-the-box integration can be found here: Host Integration

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$Speech Recognition - Cyberon£££doc/programming_guide/ffd/speech_recognition_cyberon.html#speech-recognition-cyberon
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$License£££doc/programming_guide/ffd/speech_recognition_cyberon.html#license

Cyberon DSpotter™ software requires a commercial license granted by Cyberon Corporation. This software ships with an expiring development license. It will suspend recognition after 100 recognition events.

Production versions of the DSpotter™ library are unrestricted when running on a specially licensed XMOS device. Please contact Cyberon or XMOS sales for further information.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$Overview£££doc/programming_guide/ffd/speech_recognition_cyberon.html#overview

The Cyberon DSpotter™ speech recognition engine runs proprietary models to identify keywords in an audio stream.

One model for US English is provided. For any technical questions or additional models please contact Cyberon.

To replace the Cyberon engine with a different engine, refer to the ASR documentation on Automated Speech Recognition Porting.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$Dictionary command table£££doc/programming_guide/ffd/speech_recognition_cyberon.html#dictionary-command-table
English Language Demo

| Utterances | Type | Return code (decimal) |
| --- | --- | --- |
| Hello XMOS | keyword | 1 |
| Hello Cyberon | keyword | 1 |
| Switch on the TV | command | 2 |
| Switch off the TV | command | 3 |
| Channel up | command | 4 |
| Channel down | command | 5 |
| Volume up | command | 6 |
| Volume down | command | 7 |
| Switch on the lights | command | 8 |
| Switch off the lights | command | 9 |
| Brightness up | command | 10 |
| Brightness down | command | 11 |
| Switch on the fan | command | 12 |
| Switch off the fan | command | 13 |
| Speed up the fan | command | 14 |
| Slow down the fan | command | 15 |
| Set higher temperature | command | 16 |
| Set lower temperature | command | 17 |
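In the table above, both wake phrases share return code 1, so an application cannot tell them apart from the code alone, and the command codes are offset by one compared with the Sensory demo. A minimal sketch of classifying a Cyberon return code is shown below; the enum and the `cyberon_en_classify()` helper are hypothetical names used only for this example.

```c
#include <stdint.h>

/* Hypothetical result categories; not part of the SDK. */
typedef enum {
    CYBERON_RESULT_UNKNOWN = 0,
    CYBERON_RESULT_KEYWORD,
    CYBERON_RESULT_COMMAND
} cyberon_result_type_t;

/* Classify a return code from the Cyberon English demo dictionary:
 * 1 is a keyword ("Hello XMOS" or "Hello Cyberon"), 2..17 are commands. */
cyberon_result_type_t cyberon_en_classify(int32_t code)
{
    if (code == 1) {
        return CYBERON_RESULT_KEYWORD;
    }
    if (code >= 2 && code <= 17) {
        return CYBERON_RESULT_COMMAND;
    }
    return CYBERON_RESULT_UNKNOWN;
}
```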

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Local Command$$$Modifying the Software$$$Application Integration£££doc/programming_guide/ffd/speech_recognition_cyberon.html#application-integration

In-depth information on out-of-the-box integration can be found here: Host Integration

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command£££doc/programming_guide/low_power_ffd/low_power_ffd.html#low-power-far-field-voice-local-command

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Overview£££doc/programming_guide/low_power_ffd/overview.html#overview

The low power far-field voice local command (Low Power FFD) example design targets low power speech recognition using Sensory’s TrulyHandsfree™ (THF) speech recognition and local dictionary.

When the small wake word model running on tile 1 recognizes a wake word utterance, the device transitions to full power mode where tile 0’s command model begins receiving audio samples, continuing the command recognition process. On command recognition, the application outputs a discrete message over I2C and UART.

Tile 0’s command model, in combination with a timer, determines when to request a transition to low power. Tile 1 may accept or reject this request based on its own timer that is reset on wake word recognitions and potentially other application-specific events. The figure below illustrates the general behavior.

low power ffd timing diagram

When in low power mode, tile 0 is effectively disabled along with any peripheral/IO associated with that tile.
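The request/accept handshake described above can be pictured as two independent inactivity timers. The sketch below is an assumption-laden illustration, not the application's power control code: the `lp_timer_t` type, all function names, and the use of millisecond timestamps are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inactivity timer; times are in milliseconds. */
typedef struct {
    uint32_t last_event_ms;  /* time of the most recent event that reset the timer */
    uint32_t timeout_ms;     /* inactivity required before low power is allowed */
} lp_timer_t;

/* Restart the timer, e.g. on a wake word or command recognition. */
void lp_timer_reset(lp_timer_t *t, uint32_t now_ms)
{
    assert(t != NULL);
    t->last_event_ms = now_ms;
}

/* True once timeout_ms has elapsed since the last reset.
 * Unsigned subtraction keeps this correct across counter wraparound. */
bool lp_timer_expired(const lp_timer_t *t, uint32_t now_ms)
{
    assert(t != NULL);
    return (uint32_t)(now_ms - t->last_event_ms) >= t->timeout_ms;
}

/* Tile 0: request low power once the command timer has run out. */
bool tile0_should_request_low_power(const lp_timer_t *command_timer,
                                    uint32_t now_ms)
{
    return lp_timer_expired(command_timer, now_ms);
}

/* Tile 1: accept the request only if its own timer has also run out. */
bool tile1_accepts_low_power_request(const lp_timer_t *wake_timer,
                                     uint32_t now_ms)
{
    return lp_timer_expired(wake_timer, now_ms);
}
```

With this arrangement, a wake word heard shortly before tile 0 issues its request resets tile 1's timer, so the request is rejected and the device stays in full power mode.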

Sensory’s THF software ships with an expiring development license. It will suspend recognition after 11.4 hours or 107 recognition events; after which, a device reset is required to resume normal operation. To perform a reset, either power cycle the device or press the SW2 button. Note that SW2 is only functional while in full power mode (this application is configured to hold the device in full-power mode on such license expiration events).

More information on the Sensory speech recognition library can be found here: Speech Recognition

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Supported Hardware£££doc/programming_guide/low_power_ffd/hardware.html#supported-hardware

This example application is supported on the XK-VOICE-L71 board.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Supported Hardware$$$Setting up the Hardware£££doc/programming_guide/low_power_ffd/hardware.html#setting-up-the-hardware

This example design requires an XTAG4 and XK-VOICE-L71 board.

all components
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Supported Hardware$$$xTAG£££doc/programming_guide/low_power_ffd/hardware.html#xtag

The xTAG is used to program and debug the device.

Connect the xTAG to the debug header, as shown below.

xtag

Connect the micro USB ports of the XTAG4 and the XK-VOICE-L71 to the programming host.

programming host setup

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Configuring the Firmware£££doc/programming_guide/low_power_ffd/deploying/configuration.html#configuring-the-firmware

The default application performs as described in the Overview. Numerous compile-time options can be added to change the example design without requiring code changes. To change the options explained in the table below, add the desired configuration variables to the APP_COMPILE_DEFINITIONS CMake variable located in the example’s CMake file.

If options are changed, the application firmware must be rebuilt.

Low Power FFD Compile Options

| Compile Option | Description | Default Value |
| --- | --- | --- |
| appconfINTENT_RESET_DELAY_MS | Sets the period, in milliseconds, after the wake word phrase (or a subsequent command/wake word phrase) has been heard during which a valid command phrase is awaited | 4000 |
| appconfINTENT_UART_OUTPUT_ENABLED | Enables/disables the UART intent message | 1 |
| appconfINTENT_I2C_MASTER_OUTPUT_ENABLED | Enables/disables sending the intent message over I2C master | 1 |
| appconfUART_BAUD_RATE | Sets the baud rate for the UART tx intent interface | 9600 |
| appconfINTENT_I2C_MASTER_DEVICE_ADDR | Sets the I2C slave address to transmit the intent to | 0x01 |
| appconfINTENT_TRANSPORT_DELAY_MS | Sets the delay, in milliseconds, between the host wake up request and the I2C and UART keyword code transmission | 50 |
| appconfINTENT_QUEUE_LEN | Sets the maximum number of detected intents to hold while waiting for the host to wake up | 10 |
| appconfINTENT_WAKEUP_EDGE_TYPE | Sets the host wake up pin GPIO edge type. 0 for rising edge, 1 for falling edge | 0 |
| appconfAUDIO_PIPELINE_SKIP_IC_AND_VNR | Skips the IC and VNR stage when set to 1; 0 leaves the stage enabled | 0 |
| appconfAUDIO_PIPELINE_SKIP_NS | Skips the NS stage when set to 1; 0 leaves the stage enabled | 0 |
| appconfAUDIO_PIPELINE_SKIP_AGC | Skips the AGC stage when set to 1; 0 leaves the stage enabled | 0 |
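To show how a compile definition can override a default, the following sketch uses the common `#ifndef` guard pattern with a few of the defaults from the table above, followed by build-time range checks. This is an illustrative excerpt under the assumption that the application headers follow this pattern; the actual app_conf.h and app_conf_check.h may be organized differently.

```c
/* Hypothetical excerpt in the style of an application configuration header.
 * Each default mirrors the table; a compile definition of the same name
 * takes precedence because of the #ifndef guard. */
#ifndef appconfINTENT_RESET_DELAY_MS
#define appconfINTENT_RESET_DELAY_MS 4000
#endif

#ifndef appconfUART_BAUD_RATE
#define appconfUART_BAUD_RATE 9600
#endif

#ifndef appconfINTENT_TRANSPORT_DELAY_MS
#define appconfINTENT_TRANSPORT_DELAY_MS 50
#endif

#ifndef appconfINTENT_QUEUE_LEN
#define appconfINTENT_QUEUE_LEN 10
#endif

#ifndef appconfINTENT_WAKEUP_EDGE_TYPE
#define appconfINTENT_WAKEUP_EDGE_TYPE 0
#endif

/* Reject out-of-range values at build time. */
#if (appconfINTENT_WAKEUP_EDGE_TYPE != 0) && (appconfINTENT_WAKEUP_EDGE_TYPE != 1)
#error "appconfINTENT_WAKEUP_EDGE_TYPE must be 0 (rising) or 1 (falling)"
#endif

#if appconfINTENT_QUEUE_LEN < 1
#error "appconfINTENT_QUEUE_LEN must be at least 1"
#endif
```

With guards like these, adding a definition such as appconfUART_BAUD_RATE=115200 to APP_COMPILE_DEFINITIONS would replace the default without editing any source file.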

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Deploying the Firmware with Linux or macOS£££doc/programming_guide/low_power_ffd/deploying/linux_macos.html#deploying-the-firmware-with-linux-or-macos

This document explains how to deploy the software using CMake and Make.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Deploying the Firmware with Linux or macOS$$$Building the Host Applications£££doc/programming_guide/low_power_ffd/deploying/linux_macos.html#building-the-host-applications

This application requires a host application to create the flash data partition. Run the following commands in the root folder to build the host application using your native toolchain:

Note

Permissions may be required to install the host applications.

cmake -B build_host
cd build_host
make install

The host applications will be installed at /opt/xmos/bin, and may be moved if desired. You may wish to add this directory to your PATH variable.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Deploying the Firmware with Linux or macOS$$$Building the Firmware£££doc/programming_guide/low_power_ffd/deploying/linux_macos.html#building-the-firmware

With your Python environment activated, run the following commands in the root folder to build the firmware:

pip install -r requirements.txt
cmake -B build --toolchain=xmos_cmake_toolchain/xs3a.cmake
cd build
make example_low_power_ffd_sensory
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Deploying the Firmware with Linux or macOS$$$Running the Firmware£££doc/programming_guide/low_power_ffd/deploying/linux_macos.html#running-the-firmware

Before running the firmware, the filesystem and command model must be flashed to the data partition.

Within the root of the build folder, run:

make flash_app_example_low_power_ffd_sensory

After this command completes, the application will be running.

After flashing the data partition, the application can be run without reflashing. If changes are made to the data partition components, the application must be reflashed.

From the build folder run:

xrun --xscope example_low_power_ffd_sensory.xe
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Deploying the Firmware with Linux or macOS$$$Debugging the Firmware£££doc/programming_guide/low_power_ffd/deploying/linux_macos.html#debugging-the-firmware

To debug with xgdb, from the build folder run:

xgdb -ex "connect --xscope" -ex "run" example_low_power_ffd_sensory.xe

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Deploying the Firmware with Native Windows£££doc/programming_guide/low_power_ffd/deploying/native_windows.html#deploying-the-firmware-with-native-windows

This document explains how to deploy the software using CMake and Ninja. If you are not using native Windows MSVC build tools and are instead using a Linux emulation tool such as WSL, refer to Deploying the Firmware with Linux or macOS.

To install Ninja, follow the instructions at https://ninja-build.org/ or, on Windows, install it with winget by running the following commands in PowerShell:

# Install
winget install Ninja-build.ninja
# Reload user Path
$env:Path=[System.Environment]::GetEnvironmentVariable("Path","User")
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Deploying the Firmware with Native Windows$$$Building the Host Applications£££doc/programming_guide/low_power_ffd/deploying/native_windows.html#building-the-host-applications

This application requires a host application to create the flash data partition. Run the following commands in the root folder to build the host application using your native toolchain:

Note

Permissions may be required to install the host applications.

Note

A C/C++ compiler, such as Visual Studio or MinGW, must be included in the path.

Before building the host application, you will need to add the path to the XTC Tools to your environment.

set "XMOS_TOOL_PATH=<path-to-xtc-tools>"

Then build the host application:

cmake -G Ninja -B build_host
cd build_host
ninja install

The host applications will be installed at %USERPROFILE%\.xmos\bin, and may be moved if desired. You may wish to add this directory to your PATH variable.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Deploying the Firmware with Native Windows$$$Building the Firmware£££doc/programming_guide/low_power_ffd/deploying/native_windows.html#building-the-firmware

With your Python environment activated, run the following commands in the root folder to build the firmware:

pip install -r requirements.txt
cmake -G Ninja -B build --toolchain=xmos_cmake_toolchain/xs3a.cmake
cd build
ninja example_low_power_ffd_sensory
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Deploying the Firmware with Native Windows$$$Running the Firmware£££doc/programming_guide/low_power_ffd/deploying/native_windows.html#running-the-firmware

Before running the firmware, the filesystem and command model must be flashed to the data partition.

Within the root of the build folder, run:

ninja flash_app_example_low_power_ffd_sensory

After this command completes, the application will be running.

After flashing the data partition, the application can be run without reflashing. If changes are made to the data partition components, the application must be reflashed.

From the build folder run:

xrun --xscope example_low_power_ffd_sensory.xe
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Deploying the Firmware with Native Windows$$$Debugging the Firmware£££doc/programming_guide/low_power_ffd/deploying/native_windows.html#debugging-the-firmware

To debug with xgdb, from the build folder run:

xgdb -ex "connect --xscope" -ex "run" example_low_power_ffd_sensory.xe

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software£££doc/programming_guide/low_power_ffd/modifying.html#modifying-the-software

The low-power FFD example design is highly customizable. This section describes how to modify the application.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$Host Integration£££doc/programming_guide/low_power_ffd/host_integration.html#host-integration
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$Overview£££doc/programming_guide/low_power_ffd/host_integration.html#overview

This section describes the connections that would need to be made to an external host for plug and play integration with existing devices.

When an intent is found, the XCORE device checks whether the host is awake by reading the Host Status GPIO pin. If the host is awake, the intent code is transmitted over I2C and/or UART.

If the host is not awake, the XCORE device will trigger a transition of the Wakeup GPIO pin. This can be configured to be a rising or falling edge. The XCORE device will then wait for a fixed period of time, set at compile time, before transmitting the intent over the I2C and/or UART interface. This behavior can be changed as desired by modifying the intent handling code.
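The decision described above can be sketched as a small planning function that says what must happen before an intent is transmitted. The `host_tx_plan_t` type and both function names are hypothetical and exist only for this illustration; the edge type encoding (0 for rising, 1 for falling) follows appconfINTENT_WAKEUP_EDGE_TYPE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of the steps to take before transmitting. */
typedef struct {
    bool     toggle_wakeup_pin;   /* true if the Wakeup GPIO edge must be generated */
    uint32_t wait_before_tx_ms;   /* fixed wait before sending over I2C and/or UART */
} host_tx_plan_t;

/* If the host is awake, transmit immediately; otherwise wake it and wait. */
host_tx_plan_t host_tx_plan(bool host_is_awake, uint32_t transport_delay_ms)
{
    host_tx_plan_t plan = { false, 0u };

    if (!host_is_awake) {
        plan.toggle_wakeup_pin = true;
        plan.wait_before_tx_ms = transport_delay_ms;
    }
    return plan;
}

/* Level the Wakeup pin ends at after the edge:
 * a rising edge (type 0) ends high, a falling edge (type 1) ends low. */
int wakeup_pin_final_level(int edge_type)
{
    assert(edge_type == 0 || edge_type == 1);
    return (edge_type == 0) ? 1 : 0;
}
```

For example, with the default transport delay of 50 ms, a sleeping host would be signalled on the Wakeup pin and the intent would follow 50 ms later.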

low power FFD host integration diagram

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$UART£££doc/programming_guide/low_power_ffd/host_integration.html#uart
UART Connections

| Low Power FFD Connection | Host Connection |
| --- | --- |
| J4:24 | UART RX |
| J4:20 | GND |

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$I2C£££doc/programming_guide/low_power_ffd/host_integration.html#i2c
I2C Connections

| Low Power FFD Connection | Host Connection |
| --- | --- |
| J4:3 | SDA |
| J4:5 | SCL |
| J4:9 | GND |

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$GPIO£££doc/programming_guide/low_power_ffd/host_integration.html#gpio
GPIO Connections

| Low Power FFD Connection | Host Connection |
| --- | --- |
| J4:19 | Wake up input |
| J4:21 | Host Status output |

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$Audio Pipeline£££doc/programming_guide/low_power_ffd/audio_pipeline.html#audio-pipeline

The audio pipeline in Low Power FFD processes two channel PDM microphone input into a single output channel, intended for use by an ASR engine.

The audio pipeline consists of 3 stages.

FFD Audio Pipeline

| Stage | Description | Input Channel Count | Output Channel Count |
| --- | --- | --- | --- |
| 1 | Interference Canceller and Voice Noise Ratio | 2 | 1 |
| 2 | Noise Suppression | 1 | 1 |
| 3 | Automatic Gain Control | 1 | 1 |

See the Voice Framework User Guide for more information.
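The channel counts of the three stages can be captured in a small descriptor table, which makes it easy to check that each stage's output feeds the next stage's input. This is only a bookkeeping sketch: the `ap_stage_t` type, the `ap_stages` array, and the `ap_stages_are_chained()` helper are hypothetical and do not perform any audio processing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor for one pipeline stage. */
typedef struct {
    const char *name;
    unsigned    in_channels;
    unsigned    out_channels;
} ap_stage_t;

/* The three stages and their channel counts, as listed in the table. */
const ap_stage_t ap_stages[] = {
    { "Interference Canceller and Voice Noise Ratio", 2u, 1u },
    { "Noise Suppression",                            1u, 1u },
    { "Automatic Gain Control",                       1u, 1u },
};

/* Return 1 if every stage's output channel count matches the input
 * channel count of the stage that follows it, otherwise 0. */
int ap_stages_are_chained(const ap_stage_t *stages, size_t count)
{
    size_t i;

    assert(stages != NULL);
    for (i = 0; i + 1 < count; i++) {
        if (stages[i].out_channels != stages[i + 1].in_channels) {
            return 0;
        }
    }
    return 1;
}
```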

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$Software Description£££doc/programming_guide/low_power_ffd/software_description.html#software-description
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$Overview£££doc/programming_guide/low_power_ffd/software_desc/overview.html#overview

The approximate resource utilization for Low Power FFD is shown in the table below.

Low Power FFD Resources

| Resource | Tile 0 | Tile 1 |
| --- | --- | --- |
| Unused CPU Time (600MHz / 200MHz) | 50% | 10% |
| Total Memory Free | 19.1k | 5.3k |
| Runtime Heap Memory Free | 219k | 12.4k |

The estimated (core) power usage for Low Power FFD is shown in the table below. Additional power savings may be possible using Sensory’s Low Power Sound Detect (LPSD) option, which approaches sub-50mW operation in Low Power mode. These measurements will vary based on component tolerances and any user-added code and/or compile options.

Low Power FFD Power Usage

| Power State | Core Power (mW) |
| --- | --- |
| Low Power | 54 |
| Full Power | 110 |
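As a rough worked example, the average core power can be estimated as a time-weighted mix of the two figures above. The sketch below assumes the tabulated estimates hold and ignores the cost of transitions between states; the function name is hypothetical.

```c
#include <assert.h>

/* Estimated average core power in mW, given the fraction of time (0.0 to 1.0)
 * spent in full power mode. Uses the 54 mW and 110 mW estimates from the
 * table and ignores transition overhead. */
double lp_ffd_average_power_mw(double full_power_fraction)
{
    const double low_power_mw  = 54.0;
    const double full_power_mw = 110.0;

    assert(full_power_fraction >= 0.0 && full_power_fraction <= 1.0);
    return low_power_mw * (1.0 - full_power_fraction)
         + full_power_mw * full_power_fraction;
}
```

For instance, a device that spends a quarter of its time in full power mode would average 54 × 0.75 + 110 × 0.25 = 68 mW under these assumptions.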

The description of the software is split up by folder:

Low Power FFD Software Description

| Folder | Description |
| --- | --- |
| bsp_config | Board support configuration setting up software based IO peripherals |
| filesystem_support | Filesystem contents for application |
| model | Wake word and command model files |
| src | Main application |
| src/gpio_ctrl | GPIO and LED related functions |
| src/intent_engine | Intent engine integration |
| src/intent_handler | Intent engine output integration |
| src/power | Low power control logic |
| src/wakeword | Wake word engine integration |

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$bsp_config£££doc/programming_guide/low_power_ffd/software_desc/bsp_config.html#bsp-config

This folder contains bsp_configs for the Low Power FFD application. More information on bsp_configs can be found in the RTOS Framework documentation.

Low Power FFD bsp_config

| Filename/Directory | Description |
| --- | --- |
| dac directory | DAC ports for supported bsp_configs (not used in example, disabled) |
| XK_VOICE_L71 directory | default Low Power FFD application bsp_config |
| bsp_config.cmake | cmake for adding Low Power FFD bsp_configs |

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$filesystem_support£££doc/programming_guide/low_power_ffd/software_desc/filesystem_support.html#filesystem-support

This folder contains filesystem contents for the Low Power FFD application.

Low Power FFD filesystem_support

| Filename/Directory | Description |
| --- | --- |
| demo.txt | A file for demonstrative purposes containing the text “Hello World!”. This file is not used or interacted with in this application. |

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$model£££doc/programming_guide/low_power_ffd/software_desc/model.html#model

This folder contains the Sensory wake word and command model files for the Low Power FFD application.

Note

Only a subset of the files below are used. See low_power_ffd.cmake for the files used by the application. Also note the nibble-swapped net-file is manually generated, via the nibble_swap tool found in lib_qspi_fast_read.
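To give an idea of what a nibble-swapped file contains, the sketch below exchanges the upper and lower 4 bits of every byte in a buffer. It assumes that this per-byte exchange is what "nibble swapped" means here; it is not the nibble_swap tool from lib_qspi_fast_read, whose exact behavior should be taken from that library, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Exchange the high and low nibbles of a single byte, e.g. 0xAB -> 0xBA. */
uint8_t nibble_swap_byte(uint8_t b)
{
    return (uint8_t)((b << 4) | (b >> 4));
}

/* Nibble-swap every byte of a buffer in place. */
void nibble_swap_buffer(uint8_t *buf, size_t len)
{
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++) {
        buf[i] = nibble_swap_byte(buf[i]);
    }
}
```

Applying the swap twice restores the original data, so the same routine can be used to check a generated file against its source.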

Low Power FFD model

| Filename/Directory | Description |
| --- | --- |
| command-pc62w-6.1.0-op10-prod-net.bin | The command model’s net-file, in binary-form |
| command-pc62w-6.1.0-op10-prod-net.bin.nibble_swapped | The command model’s net-file, in binary-form (nibble swapped, for supporting fast flash reads) |
| command-pc62w-6.1.0-op10-prod-net.c | The command model’s net-file, in source form |
| command-pc62w-6.1.0-op10-prod-search.bin | The command model’s search-file, in binary form |
| command-pc62w-6.1.0-op10-prod-search.c | The command model’s search-file, in source form |
| command-pc62w-6.1.0-op10-prod-search.h | The command model’s search header-file |
| command.snsr | The command model’s Sensory THF/TNL SDK “snsr” file |
| wakeword-pc60w-6.1.0-op10-prod-net.bin | The wake word model’s net-file, in binary-form |
| wakeword-pc60w-6.1.0-op10-prod-net.c | The wake word model’s net-file, in source form |
| wakeword-pc60w-6.1.0-op10-prod-search.bin | The wake word model’s search-file, in binary form |
| wakeword-pc60w-6.1.0-op10-prod-search.c | The wake word model’s search-file, in source form |
| wakeword-pc60w-6.1.0-op10-prod-search.h | The wake word model’s search header-file |
| wakeword.snsr | The wake word model’s Sensory THF/TNL SDK “snsr” file |

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$src£££doc/programming_guide/low_power_ffd/software_desc/src.html#src

This folder contains the core application source.

FFD src

| Filename/Directory | Description |
| --- | --- |
| gpio_ctrl directory | contains general purpose input handling and LED handling tasks |
| intent_engine directory | contains intent engine code |
| intent_handler directory | contains intent handling code |
| power directory | contains low power control logic and related audio buffer |
| rtos_conf directory | contains default FreeRTOS configuration headers |
| wakeword directory | contains wake word detection code |
| app_conf_check.h | header to validate app_conf.h |
| app_conf.h | header to describe app configuration |
| config.xscope | xscope configuration file |
| ff_appconf.h | default fatfs configuration header |
| main.c | main application source file |
| device_memory_impl.c | contains XCORE device memory functions for supporting ASR functionality |
| device_memory_impl.h | header for the device memory implementation |

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$Audio Pipeline£££doc/programming_guide/low_power_ffd/software_desc/src.html#audio-pipeline

The audio pipeline module provides the application with three API functions:

Audio Pipeline API (audio_pipeline.h)
void audio_pipeline_init(
        void *input_app_data,
        void *output_app_data);

void audio_pipeline_input(
        void *input_app_data,
        int32_t **input_audio_frames,
        size_t ch_count,
        size_t frame_count);

int audio_pipeline_output(
        void *output_app_data,
        int32_t **output_audio_frames,
        size_t ch_count,
        size_t frame_count);
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$audio_pipeline_init£££doc/programming_guide/low_power_ffd/software_desc/src.html#audio-pipeline-init

This function has the role of creating the audio pipeline, with two optional application pointers which are provided to the application in the audio_pipeline_input() and audio_pipeline_output() callbacks.

In Low Power FFD, the audio pipeline is initialized with no additional arguments, and instantiates a 3 stage pipeline on tile 1, as described in: Audio Pipeline

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$audio_pipeline_input£££doc/programming_guide/low_power_ffd/software_desc/src.html#audio-pipeline-input

This function has the role of providing the audio pipeline with the input frames.

In Low Power FFD, the input is received from the rtos_mic_array driver.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$audio_pipeline_output£££doc/programming_guide/low_power_ffd/software_desc/src.html#audio-pipeline-output

This function has the role of receiving the processed audio pipeline output.

In Low Power FFD, the output is sent to both the wake word handler and the intent engine. Because the intent engine is suspended in low power mode and resuming full power operation takes a finite time, a ring buffer is placed between the audio output received from this routine and the intent engine’s stream buffer.
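The idea of buffering audio while tile 0 wakes up can be illustrated with a generic ring buffer that keeps the most recent samples and overwrites the oldest ones when full. This is a minimal single-threaded sketch under that overwrite assumption, not the buffer implemented in src/power; the `sample_ring_t` type and its functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ring buffer of audio samples using caller-provided storage. */
typedef struct {
    int32_t *data;
    size_t   capacity;
    size_t   head;   /* index of the oldest stored sample */
    size_t   count;  /* number of samples currently stored */
} sample_ring_t;

void sample_ring_init(sample_ring_t *rb, int32_t *storage, size_t capacity)
{
    assert(rb != NULL && storage != NULL && capacity > 0);
    rb->data = storage;
    rb->capacity = capacity;
    rb->head = 0;
    rb->count = 0;
}

/* Store a sample; when the buffer is full, the oldest sample is overwritten. */
void sample_ring_push(sample_ring_t *rb, int32_t sample)
{
    assert(rb != NULL);
    if (rb->count == rb->capacity) {
        rb->data[rb->head] = sample;
        rb->head = (rb->head + 1) % rb->capacity;
    } else {
        rb->data[(rb->head + rb->count) % rb->capacity] = sample;
        rb->count++;
    }
}

/* Remove the oldest sample into *out. Returns 1 on success, 0 if empty. */
int sample_ring_pop(sample_ring_t *rb, int32_t *out)
{
    assert(rb != NULL && out != NULL);
    if (rb->count == 0) {
        return 0;
    }
    *out = rb->data[rb->head];
    rb->head = (rb->head + 1) % rb->capacity;
    rb->count--;
    return 1;
}
```

Once the intent engine is running again, the buffered samples can be drained in order with `sample_ring_pop()` before newer audio is forwarded. A real implementation shared between tasks would also need synchronization, which is omitted here.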

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$Main£££doc/programming_guide/low_power_ffd/software_desc/src.html#main

The major components of main are:

Main components (main.c)
void startup_task(void *arg)
void vApplicationMinimalIdleHook(void)
void tile_common_init(chanend_t c)
void main_tile0(chanend_t c0, chanend_t c1, chanend_t c2, chanend_t c3)
void main_tile1(chanend_t c0, chanend_t c1, chanend_t c2, chanend_t c3)
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$startup_task£££doc/programming_guide/low_power_ffd/software_desc/src.html#startup-task

This function has the role of launching tasks on each tile. For those familiar with XCORE, it is comparable to the main par loop in an XC main.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$vApplicationMinimalIdleHook£££doc/programming_guide/low_power_ffd/software_desc/src.html#vapplicationminimalidlehook

This is a FreeRTOS callback. Calling “waiteu” with no events configured saves both MIPS and power on XCORE.

vApplicationMinimalIdleHook (main.c)
asm volatile("waiteu");
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$tile_common_init£££doc/programming_guide/low_power_ffd/software_desc/src.html#tile-common-init

This function is the common tile initialization, which initializes the bsp_config, creates the startup task, and starts the FreeRTOS kernel.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$main_tile0£££doc/programming_guide/low_power_ffd/software_desc/src.html#main-tile0

This function is the application C entry point on tile 0, provided by the SDK.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$main_tile1£££doc/programming_guide/low_power_ffd/software_desc/src.html#main-tile1

This function is the application C entry point on tile 1, provided by the SDK.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$src/gpio_ctrl£££doc/programming_guide/low_power_ffd/software_desc/gpio_ctrl.html#src-gpio-ctrl

This folder contains the GPIO and LED related functionality for the Low Power FFD application.

Low Power FFD gpio_ctrl

| Filename/Directory | Description |
| --- | --- |
| gpi_ctrl.c | The general purpose input control source file. Implements SW2 reset logic. |
| gpi_ctrl.h | The general purpose input control header file. |
| leds.c | The LED task source file. Handles the application’s LED indications. |
| leds.h | The LED task header file. |

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$src/intent_engine£££doc/programming_guide/low_power_ffd/software_desc/intent_engine.html#src-intent-engine

This folder contains the intent engine module for the low power FFD application.

Low Power FFD Intent Engine

| Filename/Directory | Description |
| --- | --- |
| intent_engine_io.c | contains additional I/O intent engine code |
| intent_engine_support.c | contains general intent engine support code |
| intent_engine.c | contains the implementation of default intent engine code |
| intent_engine.h | header for intent engine code |

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$Major Components£££doc/programming_guide/low_power_ffd/software_desc/intent_engine.html#major-components

The intent engine module provides the application with the following primary API functions:

Intent Engine API (intent_engine.h)
int32_t intent_engine_create(uint32_t priority, void *args);
void intent_engine_ready_sync(void);
int32_t intent_engine_sample_push(asr_sample_t *buf, size_t frames);

These APIs provide the functionality needed to feed audio pipeline samples into the ASR engine.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$intent_engine_create£££doc/programming_guide/low_power_ffd/software_desc/intent_engine.html#intent-engine-create

This function has the role of creating the model running task and providing a pointer, which can be used by the application to handle the output intent result. In the case of the default configuration, the application provides a FreeRTOS Queue object.

In Low Power FFD, the audio pipeline output is on tile 1 and the ASR engine on tile 0.

intent_engine_create snippet (intent_engine_io.c)
intent_engine_intertile_task_create(priority);

The call to intent_engine_intertile_task_create() will create two threads on tile 0. One thread is the ASR engine thread. The other thread is an intertile RX thread, which will interface with the audio pipeline output.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$intent_engine_ready_sync£££doc/programming_guide/low_power_ffd/software_desc/intent_engine.html#intent-engine-ready-sync

This function is called by both tiles and serves to ensure that tile 0 is ready to receive audio samples before starting the audio pipeline. This is a preventative measure to avoid dropping samples at startup.

intent_engine_ready_sync snippet (intent_engine_io.c)
    int sync = 0;
#if ON_TILE(AUDIO_PIPELINE_OUTPUT_TILE_NO)
    size_t len = rtos_intertile_rx_len(intertile_ctx, appconfINTENT_ENGINE_READY_SYNC_PORT, RTOS_OSAL_WAIT_FOREVER);
    xassert(len == sizeof(sync));
    rtos_intertile_rx_data(intertile_ctx, &sync, sizeof(sync));
#else
    rtos_intertile_tx(intertile_ctx, appconfINTENT_ENGINE_READY_SYNC_PORT, &sync, sizeof(sync));
#endif
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$intent_engine_sample_push£££doc/programming_guide/low_power_ffd/software_desc/intent_engine.html#intent-engine-sample-push

This function sends the ASR output channel from the audio pipeline to the intent engine.

In Low Power FFD, the audio pipeline output is on tile 1 and the ASR engine on tile 0.

intent_engine_sample_push snippet (intent_engine_io.c)
    intent_engine_samples_send_remote(
            intertile_ap_ctx,
            frames,
            buf);

The call to intent_engine_samples_send_remote() will send the audio samples to the previously configured intertile RX thread.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$intent_engine_process_asr_result£££doc/programming_guide/low_power_ffd/software_desc/intent_engine.html#intent-engine-process-asr-result

This function can be replaced by the application to handle the intent in a completely different manner.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$Low Power Components£££doc/programming_guide/low_power_ffd/software_desc/intent_engine.html#low-power-components

The following APIs are the intent engine mechanisms needed by the power control task.

Low Power APIs (intent_engine.h)
void intent_engine_full_power_request(void);
void intent_engine_low_power_accept(void);

In this implementation, it is the responsibility of tile 0 (intent engine tile) to determine when to request a transition into low power mode; however, tile 1 may reject the request. When tile 1 accepts the request (via LOW_POWER_ACK), the power control task calls intent_engine_low_power_accept. When tile 1 rejects the request (via LOW_POWER_NAK), the power control task calls intent_engine_full_power_request.

Note

There is an additional LOW_POWER_HALT response where the power control task calls intent_engine_halt. This is primarily for end-of-evaluation handling logic for the underlying ASR engine and is not needed for a normal application.
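The response handling described above can be pictured as a simple dispatch. The following host-side sketch is illustrative only: the enum values, the `last_call` bookkeeping, and `handle_low_power_response` are assumptions made for this example, and the three intent engine functions are replaced by stubs so the code compiles without the SDK.

```c
#include <assert.h>

/* Hypothetical response codes; the real definitions live in the
 * application's power control sources and may differ. */
typedef enum {
    LOW_POWER_ACK,
    LOW_POWER_NAK,
    LOW_POWER_HALT
} low_power_response_t;

/* Bookkeeping so the sketch can report which stub was called. */
typedef enum {
    CALLED_NONE,
    CALLED_ACCEPT,
    CALLED_FULL_POWER,
    CALLED_HALT
} last_call_t;

static last_call_t last_call = CALLED_NONE;

/* Stand-ins for the intent engine functions named in the text. */
void intent_engine_low_power_accept(void)   { last_call = CALLED_ACCEPT; }
void intent_engine_full_power_request(void) { last_call = CALLED_FULL_POWER; }
void intent_engine_halt(void)               { last_call = CALLED_HALT; }

/* Map tile 1's response to the corresponding intent engine call. */
last_call_t handle_low_power_response(low_power_response_t rsp)
{
    switch (rsp) {
    case LOW_POWER_ACK:
        intent_engine_low_power_accept();
        break;
    case LOW_POWER_NAK:
        intent_engine_full_power_request();
        break;
    case LOW_POWER_HALT:
        intent_engine_halt();
        break;
    default:
        assert(!"unknown low power response");
        break;
    }
    return last_call;
}
```

In the real application this decision is made inside the power control task; the return value here exists only to make the sketch easy to check.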

After tile 1 accepts the low power request, tile 0 begins preparations for entering low power by locking various resources and waiting for any enqueued commands to finish up. The helper functions below are provided for this purpose.

Low Power Helper Functions (intent_engine.h)
int32_t intent_engine_keyword_queue_count(void);
void intent_engine_keyword_queue_complete(void);
uint8_t intent_engine_low_power_ready(void);

Before tile 1 sends LOW_POWER_ACK it also stops pushing audio samples via intent_engine_sample_push. After receiving the low power response, the application may clear the stream buffer and keyword queue to avoid processing stale samples/commands when returning to full power mode. The functions below provide this functionality.

Low Power Helper Functions (intent_engine.h)
void intent_engine_keyword_queue_reset(void);
void intent_engine_stream_buf_reset(void);

Note

Since it is possible that a command is spoken/recognized between the time when tile 0 requests low power and when tile 1 responds to the request, the application should not reset these buffer entities until it has received LOW_POWER_ACK; otherwise, recognized commands may be lost.
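A minimal sketch of the guard described in the note above, assuming the application sees tile 1's answer as a simple response code. The counters, the enum, and `on_low_power_response` are hypothetical, and the two reset functions are host-side stubs rather than the SDK implementations.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical response codes for this sketch. */
typedef enum { LOW_POWER_ACK, LOW_POWER_NAK } low_power_response_t;

/* Host-side stand-ins for the intent engine's internal state. */
int32_t keyword_queue_depth;  /* recognized commands not yet handled */
int32_t stream_buf_samples;   /* audio samples not yet processed */

/* Stubs for the reset helpers named in the text. */
void intent_engine_keyword_queue_reset(void) { keyword_queue_depth = 0; }
void intent_engine_stream_buf_reset(void)    { stream_buf_samples = 0; }

/* Clear stale data only once low power has been accepted.
 * Returns true if the buffers were cleared. */
bool on_low_power_response(low_power_response_t rsp)
{
    if (rsp != LOW_POWER_ACK) {
        /* Request rejected: keep any commands recognized in the meantime. */
        return false;
    }
    intent_engine_keyword_queue_reset();
    intent_engine_stream_buf_reset();
    return true;
}
```

With this ordering, a command recognized while the request was in flight is still in the queue if tile 1 answers with LOW_POWER_NAK.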

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$Evaluation Specific Components£££doc/programming_guide/low_power_ffd/software_desc/intent_engine.html#evaluation-specific-components

The following functions are provided primarily to facilitate evaluation of the ASR model. The provided ASR models have evaluation periods which will end due to various factors. When the evaluation period ends, the application logic halts the intent engine via intent_engine_halt. This is primarily to ensure the device remains in full-power mode so that functionality that may be exclusive to tile 0 continues to work.

Evaluation-specific Helper Functions (intent_engine.h)
void intent_engine_halt(void);
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$src/intent_handler£££doc/programming_guide/low_power_ffd/software_desc/intent_handler.html#src-intent-handler

This folder contains ASR output handling modules for the Low Power FFD application.

FFD Intent handler

| Filename/Directory | Description |
|---|---|
| intent_handler.c | Contains the implementation of the default intent handling code |
| intent_handler.h | Header for the intent handler code |

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$Major Components£££doc/programming_guide/low_power_ffd/software_desc/intent_handler.html#major-components

The intent handling module provides the application with one API function:

Intent Handler API (intent_handler.h)
int32_t intent_handler_create(uint32_t priority, void *args);

If replacing the existing handler code, this is the only function that is required to be populated.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$intent_handler_create£££doc/programming_guide/low_power_ffd/software_desc/intent_handler.html#intent-handler-create

This function creates the keyword handling task for the ASR engine. In the case of the Sensory model, the application provides a FreeRTOS Queue object. This handler is on the same tile as the Sensory engine, tile 0.

The call to intent_handler_create() will create one thread on tile 0. This thread will receive ID packets from the ASR engine over a FreeRTOS Queue object and output over various IO interfaces based on configuration.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$src/power£££doc/programming_guide/low_power_ffd/software_desc/power.html#src-power

This folder contains the low power control logic and supporting logic.

Low Power FFD power

| Filename/Directory | Description |
|---|---|
| low_power_audio_buffer.c | Implementation of an audio sample ring buffer. Aids in responsiveness to commands during a transition to full power mode. |
| low_power_audio_buffer.h | Header for the low power audio buffer. |
| power_control.c | Implementation of the power control logic. |
| power_control.h | Header for power control logic. |
| power_state.c | Implementation of Tile 1 power state logic. |
| power_state.h | Header for power state logic. |
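To illustrate the kind of audio sample ring buffer that low_power_audio_buffer.c is described as providing, here is a minimal host-side sketch. It assumes an overwrite-oldest policy, which the text does not state; the type, the function names, and RING_CAPACITY are hypothetical and are not the SDK's API.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_CAPACITY 8  /* hypothetical; the real size is a configuration choice */

typedef struct {
    int32_t data[RING_CAPACITY];
    size_t head;   /* index of the oldest sample */
    size_t count;  /* number of valid samples */
} sample_ring_t;

void ring_init(sample_ring_t *rb)
{
    rb->head = 0;
    rb->count = 0;
}

/* Append samples, overwriting the oldest ones once the buffer is full. */
void ring_enqueue(sample_ring_t *rb, const int32_t *samples, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        size_t tail = (rb->head + rb->count) % RING_CAPACITY;
        rb->data[tail] = samples[i];
        if (rb->count < RING_CAPACITY) {
            rb->count++;
        } else {
            /* Full: the slot just written held the oldest sample. */
            rb->head = (rb->head + 1) % RING_CAPACITY;
        }
    }
}

/* Remove up to max samples, oldest first; returns the number copied. */
size_t ring_dequeue(sample_ring_t *rb, int32_t *out, size_t max)
{
    size_t n = (max < rb->count) ? max : rb->count;
    for (size_t i = 0; i < n; i++) {
        out[i] = rb->data[(rb->head + i) % RING_CAPACITY];
    }
    rb->head = (rb->head + n) % RING_CAPACITY;
    rb->count -= n;
    return n;
}
```

Buffering recent audio in this way lets the most recent samples be replayed to the command engine once the transition to full power completes, so speech that immediately follows the wake word is not lost.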

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$Major Components£££doc/programming_guide/low_power_ffd/software_desc/power.html#major-components

The power control module provides the application with the following primary API functions:

Power Control API (power_control.h)
void power_control_task_create(unsigned priority, void *args);
void power_control_exit_low_power(void);
power_state_t power_control_state_get(void);
void power_control_halt(void);
void power_control_req_low_power(void);
void power_control_ind_complete(void);
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$power_control_task_create£££doc/programming_guide/low_power_ffd/software_desc/power.html#power-control-task-create

Creates and starts the power control task. To be called by each tile.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$power_control_exit_low_power£££doc/programming_guide/low_power_ffd/software_desc/power.html#power-control-exit-low-power

Applicable only for Tile 1. Begins a transition to full power mode and is intended to be called by the power_state_set() routine.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$power_control_state_get£££doc/programming_guide/low_power_ffd/software_desc/power.html#power-control-state-get

Applicable only for Tile 1. Gets the current power state.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$power_control_halt£££doc/programming_guide/low_power_ffd/software_desc/power.html#power-control-halt

Applicable only for Tile 1. Halts the power control task. This is provided primarily for end-of-evaluation logic, but serves to terminate the low power logic. When halted, the system remains in full power mode.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$power_control_req_low_power£££doc/programming_guide/low_power_ffd/software_desc/power.html#power-control-req-low-power

Applicable only for Tile 0. Requests a transition to low power mode.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$power_control_ind_complete£££doc/programming_guide/low_power_ffd/software_desc/power.html#power-control-ind-complete

Applicable only for Tile 0. Indicates that the last step in preparing for a low power transition has completed, allowing the power control task to continue with its final steps. This is primarily to ensure the LED indications are up-to-date before driver locks are taken (which include GPIO/LED control).

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$Power State Components£££doc/programming_guide/low_power_ffd/software_desc/power.html#power-state-components

The power state module provides the application with the following primary API functions:

Power State API (power_state.h)
void power_state_init();
void power_state_set(power_state_t state);
uint8_t power_state_timer_expired_get(void);

This module is also responsible for providing the base power state datatype (power_state_t) used by other low power logic.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$power_state_init£££doc/programming_guide/low_power_ffd/software_desc/power.html#power-state-init

Initializes the power state module. Responsible for initializing the underlying timer that effectively determines whether a low power request by Tile 0 is accepted or rejected.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$power_state_set£££doc/programming_guide/low_power_ffd/software_desc/power.html#power-state-set

Used by Tile 1’s application to signal full power events (such as wake word detection or other application-specific events). Used by Tile 1’s power control logic to signal low power only after Tile 0 has requested low power mode and the local timer has expired.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$power_state_timer_expired_get£££doc/programming_guide/low_power_ffd/software_desc/power.html#power-state-timer-expired-get

Used by Tile 1’s power control logic to determine whether to accept or reject a low power request by Tile 0.
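The interaction between full power events, the timer, and the accept/reject decision can be modelled on a host as follows. This is a sketch under stated assumptions: the `model_` functions, the enum values, and FULL_POWER_HOLD_TICKS are hypothetical, and a simple tick counter stands in for the real timer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values; the real power_state_t is declared in power_state.h. */
typedef enum { POWER_STATE_LOW, POWER_STATE_FULL } power_state_t;
typedef enum { LOW_POWER_ACK, LOW_POWER_NAK } low_power_response_t;

#define FULL_POWER_HOLD_TICKS 5u  /* hypothetical hold time */

static power_state_t state = POWER_STATE_FULL;
static uint32_t ticks_since_event;

/* A full power event (for example a wake word) restarts the hold timer. */
void model_power_state_set(power_state_t s)
{
    state = s;
    if (s == POWER_STATE_FULL) {
        ticks_since_event = 0;
    }
}

power_state_t model_power_state_get(void)
{
    return state;
}

/* Advance the modelled timer by one tick. */
void model_tick(void)
{
    if (ticks_since_event < UINT32_MAX) {
        ticks_since_event++;
    }
}

bool model_timer_expired(void)
{
    return ticks_since_event >= FULL_POWER_HOLD_TICKS;
}

/* Tile 1's decision when tile 0 requests low power. */
low_power_response_t model_on_low_power_request(void)
{
    if (!model_timer_expired()) {
        return LOW_POWER_NAK;  /* recent activity: stay in full power */
    }
    model_power_state_set(POWER_STATE_LOW);
    return LOW_POWER_ACK;
}
```

Each wake word detection restarts the countdown, so a low power request is accepted only after a quiet period of at least the hold time.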

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$src/wakeword£££doc/programming_guide/low_power_ffd/software_desc/wakeword.html#src-wakeword

This folder contains the wake word recognition functionality for the Low Power FFD application.

Low Power FFD wakeword

| Filename/Directory | Description |
|---|---|
| wakeword.c | The wake word engine source file. Responsible for the transfer of audio samples into the ASR and handling of wake word detection events. |
| wakeword.h | The wake word engine header file. |

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$Major Components£££doc/programming_guide/low_power_ffd/software_desc/wakeword.html#major-components

The wakeword module provides the application with two API functions:

Wake Word API (wakeword.h)
void wakeword_init(void);
wakeword_result_t wakeword_handler(asr_sample_t *buf, size_t num_frames);
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$wakeword_init£££doc/programming_guide/low_power_ffd/software_desc/wakeword.html#wakeword-init

This function performs the required initialization for the wakeword_handler() function to operate. This involves initializing an instance of devmem_manager_t for use by the ASR abstraction layer and initialization of the ASR unit itself. It is to be called once during startup before any call to wakeword_handler() occurs.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$wakeword_handler£££doc/programming_guide/low_power_ffd/software_desc/wakeword.html#wakeword-handler

This function performs wake word detection logic and reports back to the caller a result, indicating whether a wake word was recognized. Note: this routine is called by audio_pipeline_output(), meaning this routine’s logic should be kept to a minimum to ensure timing requirements are met.

In this implementation a single wake word ID of 1 is defined. Minimal adaptation is needed to support other models supporting other IDs or more than one valid wake word.
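One way such an adaptation might look is a small table of accepted IDs. The sketch below is hypothetical: `wakeword_result_sketch_t`, `valid_wakeword_ids`, and `classify_asr_id` are names invented for illustration, and the real wakeword_result_t in wakeword.h may be defined differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for this sketch. */
typedef enum {
    WAKEWORD_NOT_FOUND,
    WAKEWORD_FOUND
} wakeword_result_sketch_t;

/* IDs treated as wake words. The shipped model defines only ID 1;
 * add further IDs here for a model with more than one wake word. */
static const int32_t valid_wakeword_ids[] = { 1 };

/* Classify an ID reported by the ASR engine. */
wakeword_result_sketch_t classify_asr_id(int32_t id)
{
    size_t n = sizeof(valid_wakeword_ids) / sizeof(valid_wakeword_ids[0]);
    for (size_t i = 0; i < n; i++) {
        if (id == valid_wakeword_ids[i]) {
            return WAKEWORD_FOUND;
        }
    }
    return WAKEWORD_NOT_FOUND;
}
```

A linear scan over a short constant table keeps the per-frame cost small, which matters because the handler runs in the audio pipeline output path.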

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$Software Modifications£££doc/programming_guide/low_power_ffd/software_modifications.html#software-modifications

The Low Power FFD example design consists of four major software blocks: the audio pipeline, ASR engine (wake word and intent engines), intent handler, and power control. This section will go into detail on how to replace each subsystem.

low power ffd diagram

It is highly recommended to be familiar with the application as a whole before attempting to replace these functional units. This information can be found here: Software Description

See Software Description for more details on the memory footprint and CPU usage of the major software components.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$Replacing XCORE-VOICE DSP Block£££doc/programming_guide/low_power_ffd/software_modifications.html#replacing-xcore-voice-dsp-block

The audio pipeline can be replaced by making changes to the audio_pipeline.c file.

It is up to the user to ensure that the input and output frames of the audio pipeline remain the same, or the remainder of the application will not function properly.

This section walks through an example of replacing the XMOS NS stage with a custom stage, foo.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$Declaration and Definition of DSP Context£££doc/programming_guide/low_power_ffd/software_modifications.html#declaration-and-definition-of-dsp-context

Replace:

XMOS NS (audio_pipeline.c)
typedef struct ns_stage_ctx {
    ns_state_t state;
} ns_stage_ctx_t;

static ns_stage_ctx_t ns_stage_state = {};

With:

Foo (audio_pipeline.c)
typedef struct foo_stage_ctx {
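    /* Your required state context here. The Foo snippets in this section
     * access a member named state, for example a (hypothetical)
     * foo_state_t state; */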
} foo_stage_ctx_t;

static foo_stage_ctx_t foo_stage_state = {};
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$DSP Function£££doc/programming_guide/low_power_ffd/software_modifications.html#dsp-function

Replace:

XMOS NS (audio_pipeline.c)
static void stage_ns(frame_data_t *frame_data)
{
#if appconfAUDIO_PIPELINE_SKIP_NS
    (void) frame_data;
#else
    int32_t ns_output[appconfAUDIO_PIPELINE_FRAME_ADVANCE];
    configASSERT(NS_FRAME_ADVANCE == appconfAUDIO_PIPELINE_FRAME_ADVANCE);
    ns_process_frame(
                &ns_stage_state.state,
                ns_output,
                frame_data->samples[0]);
    memcpy(frame_data->samples, ns_output, appconfAUDIO_PIPELINE_FRAME_ADVANCE * sizeof(int32_t));
#endif
}

With:

Foo (audio_pipeline.c)
static void stage_foo(frame_data_t *frame_data)
{
    int32_t foo_output[appconfAUDIO_PIPELINE_FRAME_ADVANCE];
    foo_process_frame(
                &foo_stage_state.state,
                foo_output,
                frame_data->samples[0]);
    memcpy(frame_data->samples, foo_output, appconfAUDIO_PIPELINE_FRAME_ADVANCE * sizeof(int32_t));
}
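For illustration, a custom stage with the same frame shape could be as simple as a fixed gain. The sketch below is not part of the SDK: `foo_state_t`, `foo_init`, and `foo_process_frame` are hypothetical, FRAME_ADVANCE is a stand-in for appconfAUDIO_PIPELINE_FRAME_ADVANCE, and the value 240 is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAME_ADVANCE 240  /* assumed stand-in for appconfAUDIO_PIPELINE_FRAME_ADVANCE */

/* Hypothetical state for the custom stage: a linear gain in Q16.16. */
typedef struct {
    int32_t gain_q16;
} foo_state_t;

/* Start at unity gain. */
void foo_init(foo_state_t *state)
{
    state->gain_q16 = 1 << 16;
}

/* Process one frame: same number of samples in and out. */
void foo_process_frame(foo_state_t *state, int32_t *output, const int32_t *input)
{
    for (size_t i = 0; i < FRAME_ADVANCE; i++) {
        /* Widen to 64 bits so the multiply cannot overflow. */
        int64_t scaled = ((int64_t)input[i] * state->gain_q16) / 65536;
        if (scaled > INT32_MAX) scaled = INT32_MAX;  /* saturate */
        if (scaled < INT32_MIN) scaled = INT32_MIN;
        output[i] = (int32_t)scaled;
    }
}
```

Because the stage consumes and produces exactly FRAME_ADVANCE samples, the frame format seen by the rest of the pipeline is unchanged.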
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$Runtime Initialization£££doc/programming_guide/low_power_ffd/software_modifications.html#runtime-initialization

Replace:

XMOS NS (audio_pipeline.c)
ns_init(&ns_stage_state.state);

With:

Foo (audio_pipeline.c)
foo_init(&foo_stage_state.state);
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$Audio Pipeline Setup£££doc/programming_guide/low_power_ffd/software_modifications.html#audio-pipeline-setup

Replace:

XMOS NS (audio_pipeline.c)
const pipeline_stage_t stages[] = {
    (pipeline_stage_t)stage_vnr_and_ic,
    (pipeline_stage_t)stage_ns,
    (pipeline_stage_t)stage_agc,
};

const configSTACK_DEPTH_TYPE stage_stack_sizes[] = {
    configMINIMAL_STACK_SIZE + RTOS_THREAD_STACK_SIZE(stage_vnr_and_ic) + RTOS_THREAD_STACK_SIZE(audio_pipeline_input_i),
    configMINIMAL_STACK_SIZE + RTOS_THREAD_STACK_SIZE(stage_ns),
    configMINIMAL_STACK_SIZE + RTOS_THREAD_STACK_SIZE(stage_agc) + RTOS_THREAD_STACK_SIZE(audio_pipeline_output_i),
};

With:

Foo (audio_pipeline.c)
const pipeline_stage_t stages[] = {
    (pipeline_stage_t)stage_vnr_and_ic,
    (pipeline_stage_t)stage_foo,
    (pipeline_stage_t)stage_agc,
};

const configSTACK_DEPTH_TYPE stage_stack_sizes[] = {
    configMINIMAL_STACK_SIZE + RTOS_THREAD_STACK_SIZE(stage_vnr_and_ic) + RTOS_THREAD_STACK_SIZE(audio_pipeline_input_i),
    configMINIMAL_STACK_SIZE + RTOS_THREAD_STACK_SIZE(stage_foo),
    configMINIMAL_STACK_SIZE + RTOS_THREAD_STACK_SIZE(stage_agc) + RTOS_THREAD_STACK_SIZE(audio_pipeline_output_i),
};

It is also possible to add or remove stages. Refer to the RTOS Framework documentation on the generic pipeline sw_service.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$Replacing ASR Engine Block£££doc/programming_guide/low_power_ffd/software_modifications.html#replacing-asr-engine-block

Replacing the keyword spotter engine has the potential to require significant changes due to various feature extraction input requirements and varied output logic.

The generic intent engine API only requires two functions be declared:

Intent API (intent_engine.h)
/* Generic interface for intent engines */
int32_t intent_engine_create(uint32_t priority, void *args);
int32_t intent_engine_sample_push(asr_sample_t *buf, size_t frames);

Refer to the existing Sensory model implementation for details on how the output handler is set up, how the audio is conditioned to the expected model format, and how it receives frames from the audio pipeline.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$Replacing Example Design Interfaces£££doc/programming_guide/low_power_ffd/software_modifications.html#replacing-example-design-interfaces

You may want a different output interface to talk to a host, or no host at all, with the intent handled locally on the XCORE device.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$Different Peripheral IO£££doc/programming_guide/low_power_ffd/software_modifications.html#different-peripheral-io

To add or remove a peripheral IO, modify the bsp_config accordingly. Refer to documentation inside the RTOS Framework on how to instantiate different RTOS peripheral drivers.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$Direct Control£££doc/programming_guide/low_power_ffd/software_modifications.html#direct-control

In a single controller system, the XCORE can be used to control peripherals directly.

The proc_keyword_res task can be modified as follows:

Intent Handler (intent_handler.c)
static void proc_keyword_res(void *args) {
    QueueHandle_t q_intent = (QueueHandle_t) args;
    int32_t id = 0;

    while(1) {
        xQueueReceive(q_intent, &id, portMAX_DELAY);

        /* User logic here */
    }
}

This code example will receive the ID of each intent, and can be populated by any user application logic. User logic can use other RTOS drivers to control various peripherals, such as screens, motors, lights, etc, based on the intent engine outputs.
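As an example of what the user logic might do, the sketch below applies the lighting commands from the Command Dictionary (IDs 7 to 10) to a small state structure. The `lights_t` type, the 0 to 10 brightness scale, and `apply_light_intent` are hypothetical; a real application would drive GPIO or another RTOS driver instead of updating a struct.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a dimmable light. */
typedef struct {
    bool lights_on;
    int brightness;  /* 0..10, assumed scale */
} lights_t;

/* Apply one intent ID to the light state.
 * Returns false for IDs that are not lighting commands. */
bool apply_light_intent(lights_t *l, int32_t id)
{
    switch (id) {
    case 7:   /* Switch on the lights */
        l->lights_on = true;
        return true;
    case 8:   /* Brightness up */
        if (l->brightness < 10) l->brightness++;
        return true;
    case 9:   /* Brightness down */
        if (l->brightness > 0) l->brightness--;
        return true;
    case 10:  /* Switch off the lights */
        l->lights_on = false;
        return true;
    default:
        return false;
    }
}
```

Inside proc_keyword_res, a call such as `apply_light_intent(&lights, id)` would sit where the "User logic here" comment is, after xQueueReceive returns.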

low power ffd host direct control diagram

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$Replacing Example Power Control Logic£££doc/programming_guide/low_power_ffd/software_modifications.html#replacing-example-power-control-logic

Depending on the peripherals used in the end application, the power control/state logic may need adaptation. The power control logic runs in a task that uses a state machine common to both tiles. During steady state, each tile is expected to remain in the same state. During transitions, each tile executes its own state transition logic. The functions below may need adaptation for a given application.

Locking drivers (power_control.c)
static void driver_control_lock(void)
{
#if ON_TILE(POWER_CONTROL_TILE_NO)
    rtos_osal_mutex_get(&gpio_ctx_t0->lock, RTOS_OSAL_WAIT_FOREVER);
#else
    rtos_osal_mutex_get(&qspi_flash_ctx->mutex, RTOS_OSAL_WAIT_FOREVER);
    /* User logic here */
#endif
}
Unlocking drivers (power_control.c)
static void driver_control_unlock(void)
{
#if ON_TILE(POWER_CONTROL_TILE_NO)
    rtos_osal_mutex_put(&gpio_ctx_t0->lock);
#else
    /* User logic here */
    rtos_osal_mutex_put(&qspi_flash_ctx->mutex);
#endif
}

This implementation also includes function calls that are for evaluation/diagnosis purposes and may be removed for end applications. This includes calls to:

  • led_indicate_awake

  • led_indicate_asleep

When removing these calls, the associated call to power_control_ind_complete must either be moved to another location in the application (this is currently handled in led.c’s led_task) or logic associated with TASK_NOTIF_MASK_LP_IND_COMPLETE should be removed/disabled. The power_control_ind_complete routine provides a basic means for the power control task to wait for another asynchronous process to complete before proceeding with the state transition logic.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$Speech Recognition£££doc/programming_guide/low_power_ffd/speech_recognition.html#speech-recognition
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$License£££doc/programming_guide/low_power_ffd/speech_recognition.html#license

The Sensory TrulyHandsFree™ (THF) speech recognition library is Copyright (C) 1995-2022 Sensory Inc., All Rights Reserved.

Sensory THF software requires a commercial license granted by Sensory Inc. This software ships with an expiring development license. It will suspend recognition after 11.4 hours or 107 recognition events.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$Overview£££doc/programming_guide/low_power_ffd/speech_recognition.html#overview

The Sensory THF speech recognition engine runs proprietary models to identify keywords in an audio stream. Models can be generated using VoiceHub.

Two models are provided for the purpose of Low Power FFD. The small wake word model running on tile 1 is approximately 67KB. The command model running on tile 0 is approximately 289KB. On tile 1, the Sensory runtime and application supporting code consumes approximately 239KB of SRAM. On tile 0, the Sensory runtime and application supporting code consumes approximately 210KB of SRAM.

With the command model in flash, the Sensory engine requires a core frequency of at least 450 MHz to keep up with real time. Additionally, the intent engine that is responsible for processing the commands must be on the same tile as the flash.

To run with a different model, see the Set Sensory model variables section of the low_power_ffd.cmake file. There, several variables point to files that are part of the VoiceHub generated model download. Change these variables to point to the files you downloaded. This can be done for both the wakeword and command models. Because the command model “net.bin” file is placed in flash memory, it must first be nibble swapped. A utility for this is included in the host applications built during install. Run it with the following command:

nibble_swap <your-model-prod-net.bin> <your-model-prod-net.bin.nibble_swapped>
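For readers unfamiliar with the term, the sketch below shows what nibble swapping conventionally means: exchanging the high and low 4-bit halves of each byte. It assumes the provided utility applies this transformation to every byte of the file; that detail is not stated here, so treat the code as an illustration of the concept rather than a replacement for the tool. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Swap the high and low 4-bit halves of one byte, e.g. 0xA5 -> 0x5A. */
uint8_t nibble_swap_byte(uint8_t b)
{
    return (uint8_t)((b << 4) | (b >> 4));
}

/* Apply the swap in place to every byte of a buffer. */
void nibble_swap_buffer(uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        buf[i] = nibble_swap_byte(buf[i]);
    }
}
```

Applying the swap twice restores the original data, so the transformation is its own inverse.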

Make sure to run the following commands to rebuild and re-flash the data partition:

make clean
make flash_app_example_low_power_ffd -j

You may also wish to modify the command ID-to-string lookup table which is located in the src/intent_engine/intent_engine_io.c source file.

To replace the Sensory engine with a different engine, refer to the ASR documentation on Automated Speech Recognition Porting.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$Wake Word Dictionary£££doc/programming_guide/low_power_ffd/speech_recognition.html#wake-word-dictionary
English Language Wake Words

| Return code (decimal) | Utterance |
|---|---|
| 1 | Hello XMOS |

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$Command Dictionary£££doc/programming_guide/low_power_ffd/speech_recognition.html#command-dictionary
English Language Commands

| Return code (decimal) | Utterance |
|---|---|
| 1 | Switch on the TV |
| 2 | Channel up |
| 3 | Channel down |
| 4 | Volume up |
| 5 | Volume down |
| 6 | Switch off the TV |
| 7 | Switch on the lights |
| 8 | Brightness up |
| 9 | Brightness down |
| 10 | Switch off the lights |
| 11 | Switch on the fan |
| 12 | Speed up the fan |
| 13 | Slow down the fan |
| 14 | Set higher temperature |
| 15 | Set lower temperature |
| 16 | Switch off the fan |
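The dictionary above maps naturally onto an ID-to-string lookup table. The sketch below shows one possible shape for such a table; `command_text` and `command_id_to_text` are hypothetical names, and the table in src/intent_engine/intent_engine_io.c may be organised differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Index equals the return code; entry 0 is unused. */
static const char *const command_text[] = {
    NULL,
    "Switch on the TV",
    "Channel up",
    "Channel down",
    "Volume up",
    "Volume down",
    "Switch off the TV",
    "Switch on the lights",
    "Brightness up",
    "Brightness down",
    "Switch off the lights",
    "Switch on the fan",
    "Speed up the fan",
    "Slow down the fan",
    "Set higher temperature",
    "Set lower temperature",
    "Switch off the fan",
};

/* Returns the utterance for a return code, or NULL if the code is unknown. */
const char *command_id_to_text(int32_t id)
{
    size_t n = sizeof(command_text) / sizeof(command_text[0]);
    if (id < 1 || (size_t)id >= n) {
        return NULL;
    }
    return command_text[id];
}
```

When switching to a different command model, a table like this needs to be updated so that the strings match the new model's return codes.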

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Low Power Far-field Voice Local Command$$$Modifying the Software$$$Application Integration£££doc/programming_guide/low_power_ffd/speech_recognition.html#application-integration

In-depth information on out-of-the-box integration can be found here: Host Integration

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant£££doc/programming_guide/ffva/ffva.html#far-field-voice-assistant

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Overview£££doc/programming_guide/ffva/overview.html#overview

This is the XCORE-VOICE far-field voice assistant example design.

This application can be used out of the box as a voice processor solution, or expanded to run local wakeword engines.

This application features a full duplex acoustic echo cancellation stage, which can be provided reference audio via I2S or USB audio. An audio output ASR stream is also available via I2S or USB audio.

By default, there are two audio integration options. The INT (Integrated) configuration uses I2S for reference and output audio streams. The UA (USB Accessory) configuration uses USB UAC 2.0 for reference and output audio streams.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Supported Hardware£££doc/programming_guide/ffva/hardware.html#supported-hardware

This example application is supported on the XK-VOICE-L71 board.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Supported Hardware$$$Setting up the Hardware£££doc/programming_guide/ffva/hardware.html#setting-up-the-hardware

This example design requires an XTAG4 and XK-VOICE-L71 board.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Supported Hardware$$$xTAG£££doc/programming_guide/ffva/hardware.html#xtag

The xTAG is used to program and debug the device.

Connect the xTAG to the debug header, as shown below.

xtag

Connect the micro USB ports of the XTAG4 and the XK-VOICE-L71 to the programming host.

programming host setup

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Deploying the Firmware with Linux or macOS£££doc/programming_guide/ffva/deploying/linux_macos.html#deploying-the-firmware-with-linux-or-macos

This document explains how to deploy the software using CMake and Make.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Deploying the Firmware with Linux or macOS$$$Building the Host Applications£££doc/programming_guide/ffva/deploying/linux_macos.html#building-the-host-applications

This application requires a host application to create the flash data partition. Run the following commands in the root folder to build the host application using your native toolchain:

Note

Permissions may be required to install the host applications.

cmake -B build_host
cd build_host
make install

The host applications will be installed at /opt/xmos/bin, and may be moved if desired. You may wish to add this directory to your PATH variable.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Deploying the Firmware with Linux or macOS$$$Building the Firmware£££doc/programming_guide/ffva/deploying/linux_macos.html#building-the-firmware

With your Python environment activated, run the following commands in the root folder to build the I2S firmware:

pip install -r requirements.txt
cmake -B build --toolchain=xmos_cmake_toolchain/xs3a.cmake
cd build
make example_ffva_int_fixed_delay

With your Python environment activated, run the following commands in the root folder to build the I2S firmware with the Cyberon ASR engine:

pip install -r requirements.txt
cmake -B build --toolchain=xmos_cmake_toolchain/xs3a.cmake
cd build
make example_ffva_int_cyberon_fixed_delay

With your Python environment activated, run the following commands in the root folder to build the USB firmware:

pip install -r requirements.txt
cmake -B build --toolchain=xmos_cmake_toolchain/xs3a.cmake
cd build
make example_ffva_ua_adec_altarch
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Deploying the Firmware with Linux or macOS$$$Running the Firmware£££doc/programming_guide/ffva/deploying/linux_macos.html#running-the-firmware

Before the firmware is run, the filesystem must be loaded.

From the root of the build folder, after building the firmware, run one of:

make flash_app_example_ffva_int_fixed_delay
make flash_app_example_ffva_int_cyberon_fixed_delay
make flash_app_example_ffva_ua_adec_altarch

Once flashed, the application will run.

After the filesystem has been flashed once, the application can be run without flashing. If changes are made to the filesystem image, the application must be reflashed.

From the build folder run:

xrun --xscope example_ffva_int_fixed_delay.xe
xrun --xscope example_ffva_int_cyberon_fixed_delay.xe
xrun --xscope example_ffva_ua_adec_altarch.xe
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Deploying the Firmware with Linux or macOS$$$Upgrading the Firmware£££doc/programming_guide/ffva/deploying/linux_macos.html#upgrading-the-firmware
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Deploying the Firmware with Linux or macOS$$$UA variant£££doc/programming_guide/ffva/deploying/linux_macos.html#ua-variant

The UA variants of this application support DFU over USB, using the USB DFU Class V1.1 transport method.

To create an upgrade image from the build folder run:

make create_upgrade_img_example_ffva_ua_adec_altarch

Once the application is running, a USB DFU v1.1 tool can be used to perform various actions. This example uses dfu-util commands. Installation instructions for the respective operating systems can be found here.

To verify that the device is running, run:

dfu-util -l

This should result in an output containing:

Found DFU: [20b1:4001] ver=0001, devnum=100, cfg=1, intf=3, path="3-4.3", alt=2, name="DFU DATAPARTITION", serial="123456"
Found DFU: [20b1:4001] ver=0001, devnum=100, cfg=1, intf=3, path="3-4.3", alt=1, name="DFU UPGRADE", serial="123456"
Found DFU: [20b1:4001] ver=0001, devnum=100, cfg=1, intf=3, path="3-4.3", alt=0, name="DFU FACTORY", serial="123456"

The DFU interprets the flash as three separate partitions: the read-only factory image, the read/write upgrade image, and the read/write data partition containing the filesystem.
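For scripted checks, the alternate setting of each partition can be extracted from this output. The following is a minimal sketch rather than part of the XCORE-VOICE tooling: it assumes the dfu-util -l line format shown above, and the name parse_alt_settings is hypothetical.

```python
import re

# Sample lines copied from the expected dfu-util -l output shown above.
SAMPLE = '''\
Found DFU: [20b1:4001] ver=0001, devnum=100, cfg=1, intf=3, path="3-4.3", alt=2, name="DFU DATAPARTITION", serial="123456"
Found DFU: [20b1:4001] ver=0001, devnum=100, cfg=1, intf=3, path="3-4.3", alt=1, name="DFU UPGRADE", serial="123456"
Found DFU: [20b1:4001] ver=0001, devnum=100, cfg=1, intf=3, path="3-4.3", alt=0, name="DFU FACTORY", serial="123456"
'''

# Capture the alternate setting number and its name from one "Found DFU" line.
LINE_RE = re.compile(r'Found DFU: \[[0-9a-f]{4}:[0-9a-f]{4}\].*?alt=(?P<alt>\d+), name="(?P<name>[^"]*)"')


def parse_alt_settings(text):
    """Return {alt_number: name} for every 'Found DFU' line in text."""
    alts = {}
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            alts[int(match.group("alt"))] = match.group("name")
    return alts


alts = parse_alt_settings(SAMPLE)
for number in sorted(alts):
    print(number, alts[number])
```

The alternate setting numbers found this way are the values passed to the -a option in the dfu-util commands that follow.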

The factory image can be read back by running:

dfu-util -e -d ,20b1:4001 -a 0 -U readback_factory_img.bin

The factory image cannot be written to.

From the build folder, the upgrade image can be written by running:

dfu-util -e -d ,20b1:4001 -a 1 -D example_ffva_ua_adec_altarch_upgrade.bin

The upgrade image can be read back by running:

dfu-util -e -d ,20b1:4001 -a 1 -U readback_upgrade_img.bin

On system reboot, the upgrade image is always loaded if it is valid. If the upgrade image is invalid, the factory image is loaded instead. To revert to the factory image, you can upload a file containing the word 0xFFFFFFFF.
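As a minimal sketch of creating such a file, assuming that "the word 0xFFFFFFFF" means a single 32-bit word (four 0xFF bytes), and using a hypothetical file name:

```python
from pathlib import Path


def write_revert_image(path):
    """Write a file holding only the 32-bit word 0xFFFFFFFF and return its bytes."""
    # Every byte of this word is 0xFF, so the byte order chosen here does not matter.
    data = (0xFFFFFFFF).to_bytes(4, "little")
    Path(path).write_bytes(data)
    return data


# "revert_to_factory.bin" is a hypothetical name chosen for this sketch.
revert = write_revert_image("revert_to_factory.bin")
print(revert.hex())
```

The resulting file would presumably be written to the upgrade partition with the same command used above for the upgrade image; this is an assumption and is not confirmed by the text.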

The data partition image can be read back by running:

dfu-util -e -d ,20b1:4001 -a 2 -U readback_data_partition_img.bin

The data partition image can be written by running:

dfu-util -e -d ,20b1:4001 -a 2 -D readback_data_partition_img.bin

Note that the data partition will always be at the address specified in the initial flashing call.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Deploying the Firmware with Linux or macOS$$$INT variant£££doc/programming_guide/ffva/deploying/linux_macos.html#int-variant

The INT variants of this application contain DFU over I2C.

To create an upgrade image from the build folder run:

make create_upgrade_img_example_ffva_int_fixed_delay

Once the application is running, the xvf_dfu tool can be used to perform various actions. Installation instructions for Raspbian OS can be found here.

Before running the xvf_dfu host application, the I2C_ADDRESS value in the file transport_config.yaml located in the same folder as the binary file xvf_dfu must be updated. This value must match the one set for appconf_CONTROL_I2C_DEVICE_ADDR in the platform_conf.h file.

The DFU interprets the flash as three separate partitions: the read-only factory image, the read/write upgrade image, and the read/write data partition containing the filesystem.

The factory image can be read back by running:

xvf_dfu --upload-factory readback_factory_img.bin

The factory image cannot be written to.

From the build folder, the upgrade image can be written by running:

xvf_dfu -d example_ffva_int_fixed_delay_upgrade.bin

The upgrade image can be read back by running:

xvf_dfu --upload-upgrade readback_upgrade_img.bin

The device can be rebooted remotely by running:

xvf_dfu --reboot

On system reboot, the upgrade image is always loaded if it is valid. If the upgrade image is invalid, the factory image is loaded instead. To revert to the factory image, you can upload a file containing the word 0xFFFFFFFF.

The FFVA-INT variants include some version numbers:

  • APP_VERSION_MAJOR

  • APP_VERSION_MINOR

  • APP_VERSION_PATCH

These values are defined in the app_conf.h file, and they can be read by running:

xvf_dfu --version

The data partition image cannot be read or written using the xvf_dfu host application.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Deploying the Firmware with Linux or macOS$$$Debugging the Firmware£££doc/programming_guide/ffva/deploying/linux_macos.html#debugging-the-firmware

To debug with xgdb, from the build folder run:

xgdb -ex "connect --xscope" -ex "run" example_ffva_int_fixed_delay.xe
xgdb -ex "connect --xscope" -ex "run" example_ffva_ua_adec_altarch.xe
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Deploying the Firmware with Native Windows£££doc/programming_guide/ffva/deploying/native_windows.html#deploying-the-firmware-with-native-windows

This document explains how to deploy the software using CMake and Ninja. If you are not using native Windows MSVC build tools and are instead using a Linux emulation tool, refer to Deploying the Firmware with Linux or macOS.

To install Ninja, follow the installation instructions at https://ninja-build.org/ or, on Windows, install it with winget by running the following commands in PowerShell:

# Install
winget install Ninja-build.ninja
# Reload user Path
$env:Path=[System.Environment]::GetEnvironmentVariable("Path","User")
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Deploying the Firmware with Native Windows$$$Building the Host Applications£££doc/programming_guide/ffva/deploying/native_windows.html#building-the-host-applications

This application requires a host application to create the flash data partition. Run the following commands in the root folder to build the host application using your native toolchain:

Note

Permissions may be required to install the host applications.

Note

A C/C++ compiler, such as Visual Studio or MinGW, must be included in the path.

Before building the host application, you will need to add the path to the XTC Tools to your environment.

set "XMOS_TOOL_PATH=<path-to-xtc-tools>"

Then build the host application:

cmake -G Ninja -B build_host
cd build_host
ninja install

The host applications will be installed at %USERPROFILE%\.xmos\bin, and may be moved if desired. You may wish to add this directory to your PATH variable.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Deploying the Firmware with Native Windows$$$Building the Firmware£££doc/programming_guide/ffva/deploying/native_windows.html#building-the-firmware

With your Python environment activated, run the following commands in the root folder to build the I2S firmware:

pip install -r requirements.txt
cmake -G Ninja -B build --toolchain=xmos_cmake_toolchain/xs3a.cmake
cd build
ninja example_ffva_int_fixed_delay

With your Python environment activated, run the following commands in the root folder to build the I2S firmware with the Cyberon ASR engine:

pip install -r requirements.txt
cmake -G Ninja -B build --toolchain=xmos_cmake_toolchain/xs3a.cmake
cd build
ninja example_ffva_int_cyberon_fixed_delay

With your Python environment activated, run the following commands in the root folder to build the USB firmware:

pip install -r requirements.txt
cmake -G Ninja -B build --toolchain=xmos_cmake_toolchain/xs3a.cmake
cd build
ninja example_ffva_ua_adec_altarch
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Deploying the Firmware with Native Windows$$$Running the Firmware£££doc/programming_guide/ffva/deploying/native_windows.html#running-the-firmware

Before the firmware is run, the filesystem must be loaded.

From the root of the build folder, after building the firmware, run one of:

ninja flash_app_example_ffva_int_fixed_delay
ninja flash_app_example_ffva_int_cyberon_fixed_delay
ninja flash_app_example_ffva_ua_adec_altarch

Once flashed, the application will run.

After the filesystem has been flashed once, the application can be run without flashing. If changes are made to the filesystem image, the application must be reflashed.

From the build folder run:

xrun --xscope example_ffva_int_fixed_delay.xe
xrun --xscope example_ffva_int_cyberon_fixed_delay.xe
xrun --xscope example_ffva_ua_adec_altarch.xe
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Deploying the Firmware with Native Windows$$$Upgrading the Firmware£££doc/programming_guide/ffva/deploying/native_windows.html#upgrading-the-firmware

The UA variants of this application support DFU over USB, using the USB DFU Class V1.1 transport method. DFU over I2C for the INT variants is not covered in this section: the INT variants require an I2C connection to the host, which Windows does not support.

To create an upgrade image from the build folder run:

ninja create_upgrade_img_example_ffva_ua_adec_altarch

Once the application is running, a USB DFU v1.1 tool can be used to perform various actions. This example uses dfu-util commands. Installation instructions for the respective operating systems can be found here.

To verify that the device is running, run:

dfu-util -l

This should result in an output containing:

Found DFU: [20b1:4001] ver=0001, devnum=100, cfg=1, intf=3, path="3-4.3", alt=2, name="DFU DATAPARTITION", serial="123456"
Found DFU: [20b1:4001] ver=0001, devnum=100, cfg=1, intf=3, path="3-4.3", alt=1, name="DFU UPGRADE", serial="123456"
Found DFU: [20b1:4001] ver=0001, devnum=100, cfg=1, intf=3, path="3-4.3", alt=0, name="DFU FACTORY", serial="123456"

The DFU interprets the flash as three separate partitions: the read-only factory image, the read/write upgrade image, and the read/write data partition containing the filesystem.

The factory image can be read back by running:

dfu-util -e -d ,20b1:4001 -a 0 -U readback_factory_img.bin

The factory image cannot be written to.

From the build folder, the upgrade image can be written by running:

dfu-util -e -d ,20b1:4001 -a 1 -D example_ffva_ua_adec_altarch_upgrade.bin

The upgrade image can be read back by running:

dfu-util -e -d ,20b1:4001 -a 1 -U readback_upgrade_img.bin

On system reboot, the upgrade image is always loaded if it is valid. If the upgrade image is invalid, the factory image is loaded instead. To revert to the factory image, you can upload a file containing the word 0xFFFFFFFF.

The data partition image can be read back by running:

dfu-util -e -d ,20b1:4001 -a 2 -U readback_data_partition_img.bin

The data partition image can be written by running:

dfu-util -e -d ,20b1:4001 -a 2 -D readback_data_partition_img.bin

Note that the data partition will always be at the address specified in the initial flashing call.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Deploying the Firmware with Native Windows$$$Debugging the Firmware£££doc/programming_guide/ffva/deploying/native_windows.html#debugging-the-firmware

To debug with xgdb, from the build folder run:

xgdb -ex "connect --xscope" -ex "run" example_ffva_int_fixed_delay.xe
xgdb -ex "connect --xscope" -ex "run" example_ffva_ua_adec_altarch.xe
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Modifying the Software£££doc/programming_guide/ffva/modifying.html#modifying-the-software

The FFVA example design is highly customizable. This section describes how to modify the application.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Modifying the Software$$$Host Integration£££doc/programming_guide/ffva/design.html#host-integration

This example design can be integrated with existing solutions or modified to be a single controller solution.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Modifying the Software$$$Out of the Box Integration£££doc/programming_guide/ffva/design.html#out-of-the-box-integration

Out of the box integration varies based on configuration.

INT requires I2S connections to the host. Refer to the schematic, connecting the host reference audio playback to the ADC I2S and the host input audio to the DAC I2S. Out of the box, the INT configuration requires an externally generated MCLK of 12.288 MHz. 24.576 MHz is also supported and can be changed via the compile option MIC_ARRAY_CONFIG_MCLK_FREQ, found in ffva_int.cmake.

UA requires a USB connection to the host.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Modifying the Software$$$Support for ASR engine£££doc/programming_guide/ffva/design.html#support-for-asr-engine

The example_ffva_int_cyberon_fixed_delay application provides an example of how to include an ASR engine, in this case the Cyberon DSPotter™.

Most of the considerations made in the section about the FFD devices are still valid for the FFVA example. The only notable difference is that the pipeline output in the FFVA example is on the same tile as the ASR engine, i.e. tile 0.

Note

Both the audio pipeline and the ASR engine process use the same sample block length. appconfINTENT_SAMPLE_BLOCK_LENGTH and appconfAUDIO_PIPELINE_FRAME_ADVANCE are both 240.

More information about the Cyberon engine can be found in the Speech Recognition - Cyberon section.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Modifying the Software$$$Design Architecture£££doc/programming_guide/ffva/design.html#design-architecture

The application consists of a PDM microphone input which is fed through the XMOS-VOICE DSP blocks. The output ASR channel is then output over I2S or USB.

ffva diagram
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Modifying the Software$$$Device Firmware update (DFU) Design£££doc/programming_guide/ffva/design.html#device-firmware-update-dfu-design

The Device Firmware Update (DFU) allows updating the firmware of the device from a host computer, and it can be performed over I2C or USB. This interface closely follows the principles set out in version 1.1 of the Universal Serial Bus Device Class Specification for Device Firmware Upgrade, including implementing the state machine and command structure described there.

The DFU process is managed internally by the DFU controller module within the firmware. This module oversees the DFU state machine and executes DFU operations. The states and transitions are shown in the diagram in Fig. 1.

Fig. 1: State diagram of the DFU operations

The main differences from the state diagram in version 1.1 of the Universal Serial Bus Device Class Specification for Device Firmware Upgrade are:

  • the appIDLE and appDETACH states are not implemented, and the device is started in the dfuIDLE state

  • the device goes into the dfuIDLE state when a SET_ALTERNATE message is received

  • the device is rebooted when a DFU_DETACH command is received.

The DFU allows the following operations:

  • download of an upgrade image to the device

  • upload of factory and upgrade images from the device

  • reboot of the device.

The rest of this section describes the message sequence charts of the supported operations.

A message sequence chart of the download operation is below:

Fig. 2: Message sequence chart of the download operation

Note

The end of the image transfer is indicated by a DFU_DNLOAD message of size 0.

Note

The DFU_DETACH message is used to trigger the reboot.

Note

For the I2C implementation, specification of the block number in download is not supported; all downloads must start with block number 0 and must be run to completion. The device will track this progress internally.
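The download sequence can be sketched from the host side as follows. This is an illustration only: it assumes the 128-byte data buffer of the INT variant, and dnload_chunks is a hypothetical helper rather than part of any XMOS library.

```python
BLOCK_SIZE = 128  # data bytes carried by one DFU_DNLOAD in the INT variant (assumed here)


def dnload_chunks(image):
    """Yield the data for each DFU_DNLOAD message, in order, starting at block 0."""
    for offset in range(0, len(image), BLOCK_SIZE):
        yield image[offset:offset + BLOCK_SIZE]
    # A DFU_DNLOAD message of size 0 marks the end of the image transfer.
    yield b""


# A 300-byte dummy image is split into two full blocks, one partial block,
# and the terminating empty message.
chunks = list(dnload_chunks(bytes(300)))
print([len(chunk) for chunk in chunks])
```

The blocks are produced strictly in order from block 0 because, over I2C, the device tracks the block number itself and the host cannot specify it.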

A message sequence chart of the reboot operation is below:

Fig. 3: Message sequence chart of the reboot operation

Note

The DFU_DETACH message is used to trigger the reboot.

A message sequence chart of the upload operation is below:

Fig. 4: Message sequence chart of the upload operation

Note

The end of the image transfer is indicated by a DFU_UPLOAD message of size less than the transport medium maximum; this is 4096 bytes in UA and 128 bytes in INT.
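The short-frame termination rule can be sketched as a host-side read loop. This is a sketch under stated assumptions: upload_frame is a hypothetical callable standing in for whatever transport call returns the data of the next DFU_UPLOAD message, and the device here is simulated with a list.

```python
# Transport maximums quoted in the note above.
MAX_FRAME = {"UA": 4096, "INT": 128}


def read_image(upload_frame, variant="INT"):
    """Collect DFU_UPLOAD frames until one is shorter than the transport maximum."""
    limit = MAX_FRAME[variant]
    image = bytearray()
    while True:
        frame = upload_frame()
        image += frame
        if len(frame) < limit:
            # A short frame marks the end of the image transfer.
            return bytes(image)


# Simulated device: two full INT frames followed by a short one.
frames = iter([bytes(128), bytes(128), bytes(10)])
image = read_image(lambda: next(frames))
print(len(image))
```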

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Modifying the Software$$$DFU over USB implementation£££doc/programming_guide/ffva/design.html#dfu-over-usb-implementation

The UA variant of the device makes use of a USB connection for handling DFU operations. This interface is a relatively standard, specification-compliant implementation. The implementation is encapsulated within the tinyUSB library, which provides a USB stack for the sln_voice.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Modifying the Software$$$DFU over I2C implementation£££doc/programming_guide/ffva/design.html#dfu-over-i2c-implementation

The INT variant of the device presents a DFU interface that may be controlled over I2C.

Fig. 5 shows the modules involved in processing the DFU commands. The I2C task has a dedicated logical core so that it is always ready to receive and send control messages. The DFU state machine is driven by the control commands. The DFU state machine interacts with a separate RTOS task in order to asynchronously perform flash read/write operations.

Fig. 5: sln_voice Control Plane Components Diagram

Fig. 6 shows the interaction between the Device Control module and the DFU Servicer. In this diagram, boxes with the same colour reside in the same RTOS task.

Fig. 6: sln_voice Device Control – Servicer Flow Chart

This diagram shows a critical aspect of the DFU control operation. The Device Control module, having placed a command on a Servicer’s command queue, waits on the Gateway queue for a response. As a result, it ensures processing of a single control command at a time. Limiting DFU control operation to a single command in-flight reduces the complexity of the control protocol and eliminates several potential error cases.
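The effect of this blocking wait can be illustrated with two queues and a worker thread. This is a conceptual sketch in Python, not the firmware's RTOS implementation; the queue and function names are hypothetical stand-ins for the Servicer command queue and the Gateway queue.

```python
import queue
import threading

command_queue = queue.Queue()  # stands in for the Servicer's command queue
gateway_queue = queue.Queue()  # stands in for the Gateway queue


def servicer():
    """Process commands one at a time and post each response to the gateway."""
    while True:
        command = command_queue.get()
        if command is None:  # sentinel used only by this sketch to stop the thread
            break
        gateway_queue.put(("done", command))


def device_control(command):
    """Queue a command, then block until its response arrives."""
    command_queue.put(command)
    # Blocking here means no second command can be issued while one is in flight.
    return gateway_queue.get()


worker = threading.Thread(target=servicer)
worker.start()
responses = [device_control(c) for c in ("DFU_GETSTATE", "DFU_GETSTATUS")]
command_queue.put(None)
worker.join()
print(responses)
```

Because device_control does not return until the response is available, responses always arrive in the order the commands were issued.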

The FFVA-INT uses a packet protocol to receive control commands and send each corresponding response. Because packet transmission occurs over a very short-haul transport such as I2C, the protocol does not include fields for error detection or correction, such as start-of-frame and end-of-frame symbols, a cyclic redundancy check, or an error correcting code. Fig. 7 depicts the structure of each packet.

Fig. 7: sln_voice Control Plane Packet Diagram

Packets containing a response from the FFVA-INT to the host application place a status value in the first byte of the payload.

Mirroring the USB DFU specification, the INT DFU implementation supports a set of 9 control commands intended to drive the state machine, along with an additional 2 utility commands:

DFU commands

| Name | ID | Length | Payload Structure | Purpose |
|---|---|---|---|---|
| DFU_DETACH | 0 | 1 | Payload unused | Write-only command. Restarts the device. Payload is required for protocol, but is discarded within the device. This command has a defined purpose in the USB DFU specification, but in a deviation to that specification it is used with I2C simply to reboot the device. Future versions of the XMOS DFU-by-device-control protocol (but not future versions of this product) may choose to alter the function of this command to more closely align with the USB DFU specification. |
| DFU_DNLOAD | 1 | 130 | 2 bytes length marker, followed by 128 bytes of data buffer | Write-only command. The first two bytes indicate how many bytes of data are being transmitted in this packet. These bytes are little-endian, so byte 0 represents the low byte and byte 1 represents the high byte of an unsigned 16b integer. The remaining 128 bytes are a data buffer for transfer to the device. All control command packets are a fixed length, and therefore all 128 bytes must be included in the command, even if unused. For example, a payload with length of 100 should have the first 100 bytes of data set, but must send an additional 28 bytes of arbitrary data. |
| DFU_UPLOAD | 2 | 130 | 2 bytes length marker, followed by 128 bytes of data buffer | Read-only command. The first two bytes indicate how many bytes of data are being transmitted in this packet. These bytes are little-endian, so byte 0 represents the low byte and byte 1 represents the high byte of an unsigned 16b integer. The remaining 128 bytes are a data buffer of data received from the device. All control command packets are a fixed length, and therefore this buffer will be padded to length 128 by the device before transmission. The device will, as per the USB DFU specification, mark the end of the upload process by sending a “short frame” - a packet with a length marker less than 128 bytes. |
| DFU_GETSTATUS | 3 | 5 | 1 byte representing device status, 3 bytes representing the requested timeout, 1 byte representing the next device state. | Read-only command. The first byte returns the device status code, as described in the USB DFU specification in the table in section 6.1.2. The next 3 bytes represent the amount of time the host should wait, in ms, before issuing any other commands. This timeout is used in the DNLOAD process to allow the device time to write to flash. This value is little-endian, so bytes 1, 2, and 3 represent the low, middle, and high bytes respectively of an unsigned 24b integer. The final byte returns the number of the state that the device will move into immediately following the return of this request, as described in the USB DFU specification in the table in section 6.1.2. |
| DFU_CLRSTATUS | 4 | 1 | Payload unused | Write-only command. Moves the device out of state 10, dfuERROR. Payload is required for protocol, but is discarded within the device. |
| DFU_GETSTATE | 5 | 1 | 1 byte representing current device state. | Read-only command. The first (and only) byte represents the number of the state that the device is currently in, as described in the USB DFU specification in the table in section 6.1.2. |
| DFU_ABORT | 6 | 1 | Payload unused | Write-only command. Aborts an ongoing upload or download process. Payload is required for protocol, but is discarded within the device. |
| DFU_SETALTERNATE | 64 | 1 | 1 byte representing either factory (0) or upgrade (1) DFU target images | Write-only command. Sets which of the factory or upgrade images should be targeted by any subsequent upload or download commands. Use of this command entirely resets the DFU state machine to initial conditions: the device will move to dfuIDLE, clear all error conditions, wipe all internal DFU data buffers, and reset all other DFU state apart from the DFU_TRANSFERBLOCK value. This command is included to emulate the SET_ALTERNATE request available in USB. |
| DFU_TRANSFERBLOCK | 65 | 2 | 2 bytes, representing the target transfer block for an upload process. | Read/write command. Sets/gets a 2 byte value specifying the transfer block number to use for a subsequent upload operation. A complete image may be conceptually divided into 128-byte blocks. These blocks may then be numbered from 0 upwards. Setting this value sets which block will be returned by a subsequent DFU_UPLOAD request. This value is initialised to 0, and autoincrements after each successful DFU_UPLOAD request has been serviced. Therefore, to read a whole image from the start, there is no need to issue this command - this command need only be used to select a specific section to read. Because this value is automatically incremented after a DFU_UPLOAD command is successfully serviced, reading it will give the value of the next block to be read (and this will be one greater than the previous block read, if it has not been altered in the interim). This value is reset to 0 at the successful completion of a DFU_UPLOAD process. It is not reset after a DFU_ABORT, nor after a DFU_SETALTERNATE call. This command is included to emulate the ability in a USB request to send values in the header of the request - the device control protocol used here does not allow sending any data with a read request such as DFU_UPLOAD. |
| DFU_GETVERSION | 88 | 3 | 3 bytes, representing major.minor.patch version of device | Read-only command. Bytes 0, 1, and 2 represent the major, minor, and patch versions respectively of the device. This is a utility command intended to provide an easy mechanism by which to verify that a firmware download has been successful. |
| DFU_REBOOT | 89 | 1 | Payload unused | Write-only command. Restarts the device. Payload is required for protocol, but is discarded within the device. This is a utility command intended to provide a clear and unambiguous interface for restarting the device. Use of this command should be preferred over DFU_DETACH for this purpose. |

These commands are then used to drive the state machine described in the Device Firmware update (DFU) Design.
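The payload layouts described for DFU_DNLOAD, DFU_UPLOAD, DFU_GETSTATUS, and DFU_GETVERSION can be sketched as follows. This is a minimal illustration with hypothetical function names, not the XMOS host implementation. Padding with zeros is an arbitrary choice, since the command description allows arbitrary padding data.

```python
import struct

DATA_LEN = 128  # fixed data buffer size of DFU_DNLOAD and DFU_UPLOAD payloads


def encode_dnload(data):
    """Build the 130-byte DFU_DNLOAD payload: little-endian length, then padded data."""
    if len(data) > DATA_LEN:
        raise ValueError("at most 128 data bytes fit in one packet")
    # Zero padding is this sketch's choice; the padding bytes are arbitrary.
    return struct.pack("<H", len(data)) + data.ljust(DATA_LEN, b"\x00")


def decode_upload(payload):
    """Return only the valid data bytes of a 130-byte DFU_UPLOAD payload."""
    (length,) = struct.unpack_from("<H", payload)
    return payload[2:2 + length]


def decode_getstatus(payload):
    """Split the 5-byte DFU_GETSTATUS payload into (status, timeout_ms, next_state)."""
    timeout_ms = int.from_bytes(payload[1:4], "little")  # unsigned 24-bit value
    return payload[0], timeout_ms, payload[4]


def decode_getversion(payload):
    """Format the 3-byte DFU_GETVERSION payload as major.minor.patch."""
    major, minor, patch = payload[:3]
    return f"{major}.{minor}.{patch}"


packet = encode_dnload(b"\xaa" * 100)  # 100 data bytes plus 28 padding bytes
print(len(packet), packet[:2].hex())
print(decode_getstatus(bytes([0, 0x10, 0x27, 0x00, 5])))  # made-up example bytes
print(decode_getversion(bytes([1, 2, 3])))
```

Because DFU_UPLOAD uses the same layout as DFU_DNLOAD, decode_upload can be checked against the output of encode_dnload.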

When writing a custom compliant host application, the use of XMOS’ fwk_rtos library is advised; the device_control library provided there gives a host API that can communicate effectively with the FFVA-INT. In case use of the device_control library is inconvenient or impossible, a description of the I2C bus activity during the execution of the above DFU commands is provided below.

The FFVA-INT I2C address is set by default as 0x42. This may be confirmed by examination of the appconf_CONTROL_I2C_DEVICE_ADDR define in the platform_conf.h file. The I2C address may also be altered by editing this file. The DFU resource has an internal “resource ID” of 0xF0. This maps to the register that read/write operations on the DFU resource should target - therefore, the register to write to will always be 0xF0.

To issue a write command (e.g. DFU_SETALTERNATE):

  • First, set up a write to the device address. For a default device configuration, a write operation will always start by a write token to 0x42 (START, 7 bits of address [0x42], R/W bit [0 to specify write]), wait for ACK, followed by specifying the register to write [Resource ID 0xF0] (and again wait for ACK).

  • Then, write the command ID (in this example, 64 [0x40]) from the above table.

  • Then, write the total transfer size, including the register byte. In this example, that will be 4 bytes (register byte, command ID, length byte, and 1 byte of payload), so write 0x04.

  • Finally, send the payload - e.g. 1 to set the alternate setting to “upgrade”.

  • The full sequence for this write command will therefore be START, 7 bits of address [0x42], 0 (to specify write), hold for ACK, 0xF0, hold for ACK, 0x40, hold for ACK, 0x04, hold for ACK, 0x01, hold for ACK, STOP.

  • To complete the transaction, the device must then be queried; set up a read to 0x42 (START, 7 bits of address [0x42], R/W bit [1 to specify read], wait for ACK). The device will clock-stretch until it is ready, at which point it will release the clock and transmit one byte of status information. This will be a value from the enum control_ret_t from device_control_shared.h, found in modules\rtos\modules\sw_services\device_control\api.
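The bytes of the write command described above can be assembled as in the following sketch. The helper name build_write_command is hypothetical; START, ACK, and STOP handling is left to the I2C controller and is not modelled here.

```python
DFU_RESOURCE_ID = 0xF0  # register targeted by DFU read/write operations
I2C_ADDRESS = 0x42      # default appconf_CONTROL_I2C_DEVICE_ADDR
DFU_SETALTERNATE = 64   # command ID from the DFU commands table


def build_write_command(command_id, payload):
    """Return the bytes that follow the address byte in a write command."""
    # Total transfer size: register byte + command ID + length byte + payload.
    total = 3 + len(payload)
    return bytes([DFU_RESOURCE_ID, command_id, total]) + bytes(payload)


# Payload 1 selects the upgrade image.
frame = build_write_command(DFU_SETALTERNATE, [1])
# On the wire, the 7-bit address is followed by the R/W bit (0 for write).
address_byte = (I2C_ADDRESS << 1) | 0
print(hex(address_byte), frame.hex())
```

After sending these bytes, the host still needs to perform the one-byte status read described in the last step above.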

To issue a read command (e.g. DFU_GETSTATUS):

  • Set up a write to the device; as above, this will mean sending START, 7 bits of device address [0x42], 0 (to specify write), hold for ACK. Send the DFU resource ID [0xF0], hold for ACK.

  • Then, write the command ID (in this example, 3), bitwise ORed with 0x80 (to specify this as a read command) - in this example therefore 0x83 should be sent, and hold for ACK.

  • Then, write the total length of the expected reply. In this example, the command has a payload of 5 bytes. The device will also prepend the payload with a status byte. Therefore, the expected reply length will be 6 bytes [0x06]. Hold for ACK.

  • Then, issue a repeated START. Follow this with a read from the device: the repeated START, 7 bits of device address [0x42], 1 (to specify read), hold for ACK. The device will clock-stretch until it is ready. It will then send a status byte (from the enum control_ret_t as described above), followed by a payload of requested data - in this example, the device will send 5 bytes. ACK each received byte. After the last expected byte, issue a STOP.
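The read command can be sketched in the same way. Setting bit 7 of the command ID turns 3 into 0x83, matching the value given above. The helper names are hypothetical, and the reply bytes are made up for illustration rather than captured from a device.

```python
DFU_RESOURCE_ID = 0xF0  # register targeted by DFU read/write operations
I2C_ADDRESS = 0x42      # default appconf_CONTROL_I2C_DEVICE_ADDR
READ_FLAG = 0x80        # bit 7 of the command ID marks a read command
DFU_GETSTATUS = 3       # command ID from the DFU commands table


def build_read_command(command_id, payload_len):
    """Return the bytes written before the repeated START of a read command."""
    reply_len = payload_len + 1  # the device prepends one status byte
    return bytes([DFU_RESOURCE_ID, command_id | READ_FLAG, reply_len])


def split_reply(reply):
    """Separate the leading status byte from the command payload."""
    return reply[0], reply[1:]


request = build_read_command(DFU_GETSTATUS, 5)
# After the repeated START, the address is sent with the R/W bit set to 1.
read_address_byte = (I2C_ADDRESS << 1) | 1

# Made-up 6-byte reply: one status byte followed by the 5-byte payload.
status, payload = split_reply(bytes([0, 0, 0x10, 0x27, 0x00, 5]))
print(request.hex(), hex(read_address_byte), status, payload.hex())
```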

It is strongly advised that those wishing to write a custom host application to drive the DFU process for the FFVA-INT over I2C familiarise themselves with version 1.1 of the Universal Serial Bus Device Class Specification for Device Firmware Upgrade.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Modifying the Software$$$Audio Pipeline£££doc/programming_guide/ffva/audio_pipeline.html#audio-pipeline

The audio pipeline in FFVA processes two-channel PDM microphone input into a single output channel, intended for use by an ASR engine.

The audio pipeline consists of 4 stages.

FFVA Audio Pipeline

| Stage | Description | Input Channel Count | Output Channel Count |
|---|---|---|---|
| 1 | Acoustic Echo Cancellation | 2 | 2 |
| 2 | Interference Canceller and Voice Noise Ratio | 2 | 1 |
| 3 | Noise Suppression | 1 | 1 |
| 4 | Automatic Gain Control | 1 | 1 |

See the Voice Framework User Guide for more information.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Modifying the Software$$$Software Description£££doc/programming_guide/ffva/software_description.html#software-description
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Modifying the Software$$$Overview£££doc/programming_guide/ffva/software_desc/overview.html#overview

There are three main build configurations for this application.

FFVA INT Fixed Delay Resources

| Resource | Tile 0 | Tile 1 |
|---|---|---|
| Total Memory Free | 141k | 80k |
| Runtime Heap Memory Free | 75k | 76k |

FFVA INT Cyberon Fixed Delay Resources

| Resource | Tile 0 | Tile 1 |
|---|---|---|
| Total Memory Free | 21k | 79k |
| Runtime Heap Memory Free | 19k | 81k |

FFVA UA ADEC Resources

| Resource | Tile 0 | Tile 1 |
|---|---|---|
| Total Memory Free | 94k | 59k |
| Runtime Heap Memory Free | 54k | 83k |

The description of the software is split up by folder:

FFVA Software Description

| Folder | Description |
|---|---|
| Audio Pipelines | Preconfigured audio pipelines |
| examples/ffva/bsp_config | Board support configuration setting up software based IO peripherals |
| examples/ffva/filesystem_support | Filesystem contents for application |
| examples/ffva/src | Main application |
| modules/asr/intent_engine | Intent engine integration (FFVA INT Cyberon only) |
| modules/asr/intent_handler | Intent engine output integration (FFVA INT Cyberon only) |

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Modifying the Software$$$examples/ffva/bsp_config£££doc/programming_guide/ffva/software_desc/bsp_config.html#examples-ffva-bsp-config

This folder contains bsp_configs for the FFVA application. More information on bsp_configs can be found in the RTOS Framework documentation.

FFVA bsp_config

Filename/Directory | Description
dac directory | DAC ports for supported bsp_configs
XCORE-AI-EXPLORER directory | experimental bsp_config, not recommended for general use
XK_VOICE_L71 directory | default FFVA application bsp_config
bsp_config.cmake | cmake for adding FFVA bsp_configs

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Modifying the Software$$$examples/ffva/filesystem_support£££doc/programming_guide/ffva/software_desc/filesystem_support.html#examples-ffva-filesystem-support

This folder contains filesystem contents for the FFVA application.

FFVA filesystem_support

Filename/Directory | Description
demo.txt | Example file

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Modifying the Software$$$Audio Pipelines£££doc/programming_guide/ffva/software_desc/audio_pipeline.html#audio-pipelines

This folder contains preconfigured audio pipelines for the FFVA application.

FFVA Audio Pipelines

Filename/Directory | Description
api directory | include folder for audio pipeline modules
src directory | contains preconfigured XMOS DSP audio pipelines
audio_pipeline.cmake | cmake for adding audio pipeline targets

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Modifying the Software$$$Major Components£££doc/programming_guide/ffva/software_desc/audio_pipeline.html#major-components

The audio pipeline module provides the application with three API functions:

Audio Pipeline API (audio_pipeline.h)
void audio_pipeline_init(
        void *input_app_data,
        void *output_app_data);

void audio_pipeline_input(
        void *input_app_data,
        int32_t **input_audio_frames,
        size_t ch_count,
        size_t frame_count);

int audio_pipeline_output(
        void *output_app_data,
        int32_t **output_audio_frames,
        size_t ch_count,
        size_t frame_count);
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Modifying the Software$$$audio_pipeline_init£££doc/programming_guide/ffva/software_desc/audio_pipeline.html#audio-pipeline-init

This function has the role of creating the audio pipeline task(s) and initializing DSP stages.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Modifying the Software$$$audio_pipeline_input£££doc/programming_guide/ffva/software_desc/audio_pipeline.html#audio-pipeline-input

This function is application defined and populates input audio frames used by the audio pipeline. In FFVA, this function is defined in main.c.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Modifying the Software$$$audio_pipeline_output£££doc/programming_guide/ffva/software_desc/audio_pipeline.html#audio-pipeline-output

This function is application defined and consumes the output audio frames produced by the audio pipeline. In FFVA, this function is defined in main.c.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Modifying the Software$$$examples/ffva/src£££doc/programming_guide/ffva/software_desc/src.html#examples-ffva-src

This folder contains the core application source.

FFVA src

Filename/Directory | Description
gpio_test directory | contains general purpose input handling task
usb directory | contains USB handling code
ww_model_runner directory | contains placeholder wakeword model runner task
app_conf_check.h | header to validate app_conf.h
app_conf.h | header to describe app configuration
config.xscope | xscope configuration file
ff_appconf.h | default fatfs configuration header
FreeRTOSConfig.h | header to describe FreeRTOS configuration
main.c | main application source file

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Modifying the Software$$$Main£££doc/programming_guide/ffva/software_desc/src.html#main

The major components of main are:

Main components (main.c)
void startup_task(void *arg)
void tile_common_init(chanend_t c)
void main_tile0(chanend_t c0, chanend_t c1, chanend_t c2, chanend_t c3)
void main_tile1(chanend_t c0, chanend_t c1, chanend_t c2, chanend_t c3)
void i2s_rate_conversion_enable(void)
size_t i2s_send_upsample_cb(rtos_i2s_t *ctx, void *app_data, int32_t *i2s_frame, size_t i2s_frame_size, int32_t *send_buf, size_t samples_available)

size_t i2s_send_downsample_cb(rtos_i2s_t *ctx, void *app_data, int32_t *i2s_frame, size_t i2s_frame_size, int32_t *receive_buf, size_t sample_spaces_free)
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Modifying the Software$$$startup_task£££doc/programming_guide/ffva/software_desc/src.html#startup-task

This function has the role of launching tasks on each tile. For those familiar with XCORE, it is comparable to the main par loop in an XC main.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Modifying the Software$$$tile_common_init£££doc/programming_guide/ffva/software_desc/src.html#tile-common-init

This function is the common tile initialization, which initializes the bsp_config, creates the startup task, and starts the FreeRTOS kernel.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Modifying the Software$$$main_tile0£££doc/programming_guide/ffva/software_desc/src.html#main-tile0

This function is the application C entry point on tile 0, provided by the SDK.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Modifying the Software$$$main_tile1£££doc/programming_guide/ffva/software_desc/src.html#main-tile1

This function is the application C entry point on tile 1, provided by the SDK.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Modifying the Software$$$i2s_rate_conversion_enable£££doc/programming_guide/ffva/software_desc/src.html#i2s-rate-conversion-enable

This application features 16 kHz and 48 kHz audio input and output. The XMOS DSP blocks operate on 16 kHz audio. Input streams are downsampled when needed. Output streams are upsampled when needed. In I2S modes, this function is called by the bsp_config to enable the I2S sample rate conversion.
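The two rates differ by a factor of three, so every 16 kHz pipeline sample corresponds to three 48 kHz interface samples. The C sketch below shows only that sample-count relationship. The function names are hypothetical, and the conversions are deliberate placeholders (sample dropping and sample repetition) with no filtering; they are not the sample rate converter used by the application, which would need low-pass filtering to avoid aliasing and imaging.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PIPELINE_RATE_HZ 16000u
#define IO_RATE_HZ       48000u
#define RATE_FACTOR      (IO_RATE_HZ / PIPELINE_RATE_HZ)   /* 3 */

/* 48 kHz -> 16 kHz: keep every third sample (placeholder, unfiltered).
 * Returns the number of samples written to out. */
size_t downsample_by_3(const int32_t *in, size_t in_count, int32_t *out)
{
    assert(in_count % RATE_FACTOR == 0);
    size_t out_count = in_count / RATE_FACTOR;
    for (size_t i = 0; i < out_count; i++) {
        out[i] = in[i * RATE_FACTOR];
    }
    return out_count;
}

/* 16 kHz -> 48 kHz: repeat each sample three times (placeholder,
 * unfiltered). Returns the number of samples written to out. */
size_t upsample_by_3(const int32_t *in, size_t in_count, int32_t *out)
{
    for (size_t i = 0; i < in_count; i++) {
        for (size_t k = 0; k < RATE_FACTOR; k++) {
            out[i * RATE_FACTOR + k] = in[i];
        }
    }
    return in_count * RATE_FACTOR;
}
```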

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Modifying the Software$$$i2s_send_upsample_cb£££doc/programming_guide/ffva/software_desc/src.html#i2s-send-upsample-cb

This function is the I2S upsampling callback.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Modifying the Software$$$i2s_send_downsample_cb£££doc/programming_guide/ffva/software_desc/src.html#i2s-send-downsample-cb

This function is the I2S downsampling callback.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Modifying the Software$$$Software Modifications£££doc/programming_guide/ffva/software_modifications.html#software-modifications

The FFVA example design consists of three major software blocks: the audio interface, the audio pipeline, and a placeholder for a keyword handler. This section goes into detail on how to modify each of these subsystems.

ffva diagram

It is highly recommended to be familiar with the application as a whole before attempting to replace these functional units.

See Memory and CPU Requirements for more details on the memory footprint and CPU usage of the major software components.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Modifying the Software$$$Replacing XCORE-VOICE DSP Block£££doc/programming_guide/ffva/software_modifications.html#replacing-xcore-voice-dsp-block

The audio pipeline can be replaced by making changes to the audio_pipeline.c file.

It is up to the user to ensure that the input and output frames of the audio pipeline remain the same, or the remainder of the application will not function properly.

This section walks through an example of replacing the XMOS NS stage with a custom stage, foo.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Modifying the Software$$$Declaration and Definition of DSP Context£££doc/programming_guide/ffva/software_modifications.html#declaration-and-definition-of-dsp-context

Replace:

XMOS NS (audio_pipeline_t0.c)
static ns_stage_ctx_t DWORD_ALIGNED ns_stage_state = {};

With:

Foo (audio_pipeline_t0.c)
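typedef struct foo_stage_ctx {
    /* Your required state context here. The later snippets access it as
     * foo_stage_state.state, so name the member that holds it `state`. */
} foo_stage_ctx_t;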

static foo_stage_ctx_t foo_stage_state = {};
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Modifying the Software$$$DSP Function£££doc/programming_guide/ffva/software_modifications.html#dsp-function

Replace:

XMOS NS (audio_pipeline_t0.c)
static void stage_ns(frame_data_t *frame_data)
{
#if appconfAUDIO_PIPELINE_SKIP_NS
#else
    int32_t DWORD_ALIGNED ns_output[appconfAUDIO_PIPELINE_FRAME_ADVANCE];
    configASSERT(NS_FRAME_ADVANCE == appconfAUDIO_PIPELINE_FRAME_ADVANCE);
    ns_process_frame(
                &ns_stage_state.state,
                ns_output,
                frame_data->samples[0]);
    memcpy(frame_data->samples, ns_output, appconfAUDIO_PIPELINE_FRAME_ADVANCE * sizeof(int32_t));
#endif
}

With:

Foo (audio_pipeline_t0.c)
static void stage_foo(frame_data_t *frame_data)
{
    int32_t foo_output[appconfAUDIO_PIPELINE_FRAME_ADVANCE];
    foo_process_frame(
                &foo_stage_state.state,
                foo_output,
                frame_data->samples[0]);
    memcpy(frame_data->samples, foo_output, appconfAUDIO_PIPELINE_FRAME_ADVANCE * sizeof(int32_t));
}
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Modifying the Software$$$Runtime Initialization£££doc/programming_guide/ffva/software_modifications.html#runtime-initialization

Replace:

XMOS NS (audio_pipeline_t0.c)
ns_init(&ns_stage_state.state);

With:

Foo (audio_pipeline_t0.c)
foo_init(&foo_stage_state.state);
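
The snippets above call foo_init() and foo_process_frame() without defining them. As a minimal sketch of what such a custom stage could look like, the following C code implements a fixed attenuation. Every name in it (`foo_state_t`, `attenuation_shift`, `FOO_FRAME_ADVANCE`) is hypothetical, and the frame size of 240 samples is an assumption standing in for appconfAUDIO_PIPELINE_FRAME_ADVANCE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed frame size; use appconfAUDIO_PIPELINE_FRAME_ADVANCE in the app. */
#define FOO_FRAME_ADVANCE 240

/* Hypothetical state for the custom stage. */
typedef struct {
    int32_t attenuation_shift;   /* samples are divided by 2^shift */
} foo_state_t;

/* Context wrapper with a `state` member, matching foo_stage_state.state. */
typedef struct foo_stage_ctx {
    foo_state_t state;
} foo_stage_ctx_t;

void foo_init(foo_state_t *state)
{
    assert(state != NULL);
    state->attenuation_shift = 1;   /* halve the signal */
}

/* Processes one frame of FOO_FRAME_ADVANCE samples from input to output. */
void foo_process_frame(foo_state_t *state, int32_t *output,
                       const int32_t *input)
{
    assert(state != NULL && output != NULL && input != NULL);
    for (size_t i = 0; i < FOO_FRAME_ADVANCE; i++) {
        /* Division (not >>) keeps the result well defined for negatives. */
        output[i] = input[i] / (1 << state->attenuation_shift);
    }
}
```

The argument order mirrors ns_process_frame() as it is called in the NS stage (state, output, input), so the stage function only needs the renamed calls.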
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Modifying the Software$$$Audio Pipeline Setup£££doc/programming_guide/ffva/software_modifications.html#audio-pipeline-setup

Replace:

XMOS NS (audio_pipeline_t0.c)
const pipeline_stage_t stages[] = {
    (pipeline_stage_t)stage_vnr_and_ic,
    (pipeline_stage_t)stage_ns,
    (pipeline_stage_t)stage_agc,
};

const configSTACK_DEPTH_TYPE stage_stack_sizes[] = {
    configMINIMAL_STACK_SIZE + RTOS_THREAD_STACK_SIZE(stage_vnr_and_ic) + RTOS_THREAD_STACK_SIZE(audio_pipeline_input_i),
    configMINIMAL_STACK_SIZE + RTOS_THREAD_STACK_SIZE(stage_ns),
    configMINIMAL_STACK_SIZE + RTOS_THREAD_STACK_SIZE(stage_agc) + RTOS_THREAD_STACK_SIZE(audio_pipeline_output_i),
};

With:

Foo (audio_pipeline_t0.c)
const pipeline_stage_t stages[] = {
    (pipeline_stage_t)stage_vnr_and_ic,
    (pipeline_stage_t)stage_foo,
    (pipeline_stage_t)stage_agc,
};

const configSTACK_DEPTH_TYPE stage_stack_sizes[] = {
    configMINIMAL_STACK_SIZE + RTOS_THREAD_STACK_SIZE(stage_vnr_and_ic) + RTOS_THREAD_STACK_SIZE(audio_pipeline_input_i),
    configMINIMAL_STACK_SIZE + RTOS_THREAD_STACK_SIZE(stage_foo),
    configMINIMAL_STACK_SIZE + RTOS_THREAD_STACK_SIZE(stage_agc) + RTOS_THREAD_STACK_SIZE(audio_pipeline_output_i),
};

It is also possible to add or remove stages. Refer to the RTOS Framework documentation on the generic pipeline sw_service.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Modifying the Software$$$Changing the ASR engine£££doc/programming_guide/ffva/software_modifications.html#changing-the-asr-engine

The FFVA example is provided with a specific ASR engine. A different ASR engine can be used by updating and adding the necessary files in modules/asr.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Modifying the Software$$$Replacing Example Design Interfaces£££doc/programming_guide/ffva/software_modifications.html#replacing-example-design-interfaces

It may be desirable to use different input or output interfaces to talk to a host.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Modifying the Software$$$Hybrid Audio Peripheral IO£££doc/programming_guide/ffva/software_modifications.html#hybrid-audio-peripheral-io

One example use case is a hybrid audio solution where reference frames or output audio streams are carried over an interface other than I2S or USB.

Audio Pipeline Input (main.c)
void audio_pipeline_input(void *input_app_data,
                        int32_t **input_audio_frames,
                        size_t ch_count,
                        size_t frame_count)
{
    (void) input_app_data;
    int32_t **mic_ptr = (int32_t **)(input_audio_frames + (2 * frame_count));

    static int flushed;
    while (!flushed) {
        size_t received;
        received = rtos_mic_array_rx(mic_array_ctx,
                                    mic_ptr,
                                    frame_count,
                                    0);
        if (received == 0) {
            rtos_mic_array_rx(mic_array_ctx,
                            mic_ptr,
                            frame_count,
                            portMAX_DELAY);
            flushed = 1;
        }
    }

    rtos_mic_array_rx(mic_array_ctx,
                    mic_ptr,
                    frame_count,
                    portMAX_DELAY);

    /* Your ref input source here */
}

Refer to documentation inside the RTOS Framework on how to instantiate different RTOS peripheral drivers. Populate the above code snippet with your input frame source. Refer to the default application for an example of populating reference via I2S or USB.

Audio Pipeline Output (main.c)
int audio_pipeline_output(void *output_app_data,
                        int32_t **output_audio_frames,
                        size_t ch_count,
                        size_t frame_count)
{
    (void) output_app_data;

    /* Your output sink here */

#if appconfWW_ENABLED
    ww_audio_send(intertile_ctx,
                frame_count,
                (int32_t(*)[2])output_audio_frames);
#endif

    return AUDIO_PIPELINE_FREE_FRAME;
}

Refer to documentation inside the RTOS Framework on how to instantiate different RTOS peripheral drivers. Populate the above code snippet with your output frame sink. Refer to the default application for an example of outputting the ASR channel via I2S or USB.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Modifying the Software$$$Different Peripheral IO£££doc/programming_guide/ffva/software_modifications.html#different-peripheral-io

To add or remove a peripheral IO, modify the bsp_config accordingly. Refer to documentation inside the RTOS Framework on how to instantiate different RTOS peripheral drivers.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$Far-field Voice Assistant$$$Modifying the Software$$$Application Filesystem Usage£££doc/programming_guide/ffva/software_modifications.html#application-filesystem-usage

This application is equipped with a FAT filesystem in flash for general use. To add files to the filesystem, simply place them in the filesystem_support directory before running the filesystem setup commands in Deploying the Firmware with Linux or macOS or Deploying the Firmware with Native Windows.

The application can access the filesystem via the FatFS API.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$PDM Microphone Aggregator Example£££doc/programming_guide/mic_aggregator/mic_aggregator.html#pdm-microphone-aggregator-example

Warning

This example is deprecated. It will be moved into a separate Application Note and may be removed in the next major release.

This example provides a bridge from 16 PDM microphones to either a TDM16 slave or a USB Audio interface, and targets the xcore-ai explorer board.

This application supports cases where many microphone inputs need to be sent to a host that performs the signal processing. See the other examples in sln_voice for designs where signal processing is performed in firmware on the xcore.

This example uses a modified mic_array with multiple decimator threads to support 16 DDR microphones on a single 8 bit input port. The example is written as ‘bare-metal’ and runs directly on the XCORE device without an RTOS.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$PDM Microphone Aggregator Example$$$Obtaining the app files£££doc/programming_guide/mic_aggregator/mic_aggregator.html#obtaining-the-app-files

Download the main repo and submodules using:

$ git clone --recurse-submodules git@github.com:xmos/sln_voice.git
$ cd sln_voice/
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$PDM Microphone Aggregator Example$$$Building the app£££doc/programming_guide/mic_aggregator/mic_aggregator.html#building-the-app

First make sure that your XTC tools environment is activated.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$PDM Microphone Aggregator Example$$$Building the app$$$Linux or Mac£££doc/programming_guide/mic_aggregator/mic_aggregator.html#linux-or-mac

With your Python environment activated, run the following commands in the root folder to build the firmware:

$ pip install -r requirements.txt
$ mkdir build
$ cd build
$ cmake --toolchain ../xmos_cmake_toolchain/xs3a.cmake ..
$ make example_mic_aggregator_tdm -j
$ make example_mic_aggregator_usb -j

After the initial CMake build, as long as you do not add new source files, you only need to run:

$ make example_mic_aggregator_tdm -j
$ make example_mic_aggregator_usb -j

If you add new source files you will need to run the cmake step again.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$PDM Microphone Aggregator Example$$$Building the app$$$Windows£££doc/programming_guide/mic_aggregator/mic_aggregator.html#windows

It is recommended to use Ninja or xmake as the make system under Windows. Ninja has been observed to be faster than xmake; however, xmake comes natively with the XTC tools. This firmware has been tested with Ninja version 1.11.1.

To install Ninja, activate your Python environment and run the following command:

$ pip install ninja

With your Python environment activated, run the following commands in the root folder to build the firmware:

$ pip install -r requirements.txt
$ md build
$ cd build
$ cmake -G "Ninja" --toolchain ..\xmos_cmake_toolchain\xs3a.cmake ..
$ ninja example_mic_aggregator_tdm.xe
$ ninja example_mic_aggregator_usb.xe

After the initial CMake build, as long as you do not add new source files, you only need to run:

$ ninja example_mic_aggregator_tdm.xe
$ ninja example_mic_aggregator_usb.xe

If you add new source files you will need to run the cmake step again.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$PDM Microphone Aggregator Example$$$Running the app£££doc/programming_guide/mic_aggregator/mic_aggregator.html#running-the-app

Connect the explorer board to the host and type:

$ xrun example_mic_aggregator_tdm.xe
$ xrun example_mic_aggregator_usb.xe

Optionally, you may use xrun --xscope to provide debug output.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$PDM Microphone Aggregator Example$$$Required Hardware£££doc/programming_guide/mic_aggregator/mic_aggregator.html#required-hardware

The application runs on the XCORE-AI Explorer board version 2 (with integrated XTAG debug adapter). In addition, you will require:

  • The dual DDR microphone board that attaches via the flat flex connector.

  • Header pins soldered into:

    • J14, J10, SCL/SDA IOT, the I2S expansion header, MIC data and MIC clock.

  • Six jumper wires. Please see the microphone aggregator main documentation for details on how these are connected.

An oscilloscope is also useful in case hardware debugging is needed.

Note

You will only be able to inject PDM data to two channels at a time due to a single pair of microphones on the HW.

If you wish to see all 16 microphones running then an external microphone board with 16 microphones (DDR connected to 8 data lines) is required.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$PDM Microphone Aggregator Example$$$Operation£££doc/programming_guide/mic_aggregator/mic_aggregator.html#operation

The design consists of a number of tasks connected via the xcore-ai silicon communication channels. The decimators in the microphone array are configured to produce a 48 kHz PCM output. The 16 output channels are loaded into a 16 slot TDM slave peripheral running at 24.576 MHz bit clock or a USB Audio Class 2 asynchronous interface and are optionally amplified. The TDM build also provides a simple I2C slave interface to allow gains to be controlled at run-time. The USB build supports USB Audio Class 2 compliant volume controls.

For the TDM build, a simple TDM16 master peripheral is included as well as a local 24.576 MHz clock source so that mic_array and TDM16 slave operation may be tested standalone through the use of jumper cables. These may be removed when integrating into a system with TDM16 master supplied.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$PDM Microphone Aggregator Example$$$Software Architecture£££doc/programming_guide/mic_aggregator/mic_aggregator.html#software-architecture

The applications are written on bare metal and use logical cores (hardware threads) to implement the functional blocks. The tasks are connected using channels provided in the xcore-ai architecture. The thread diagrams are shown in Fig. 8 and Fig. 9.

Microphone Aggregator TDM Thread Diagram

Microphone Aggregator USB Thread Diagram

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$PDM Microphone Aggregator Example$$$Software Architecture$$$PDM Capture£££doc/programming_guide/mic_aggregator/mic_aggregator.html#pdm-capture

Both the TDM and USB aggregator examples share a common PDM front end. This consists of an 8 bit port with each data line connected to two PDM microphones each configured to provide data on a different clock edge. The 3.072 MHz clock for the PDM microphones is provided by the xcore-ai device on a 1 bit port and clocks all PDM microphones. The PDM clock is divided down from the 24.576 MHz local MCLK.
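The clock figures above imply two integer ratios: the divider from MCLK to the PDM clock, and the ratio from the PDM clock to the 48 kHz PCM output rate described for this design. The C sketch below works them out. The macro and function names are hypothetical; only the three frequencies come from this document.

```c
#include <assert.h>
#include <stdint.h>

#define MCLK_HZ     24576000u   /* local master clock     */
#define PDM_CLK_HZ   3072000u   /* PDM microphone clock   */
#define PCM_RATE_HZ    48000u   /* decimated PCM rate     */

/* Returns numerator / denominator, insisting the division is exact. */
uint32_t exact_ratio(uint32_t numerator_hz, uint32_t denominator_hz)
{
    assert(denominator_hz != 0);
    assert(numerator_hz % denominator_hz == 0);
    return numerator_hz / denominator_hz;
}
```

With these values, `exact_ratio(MCLK_HZ, PDM_CLK_HZ)` gives an MCLK divider of 8, and `exact_ratio(PDM_CLK_HZ, PCM_RATE_HZ)` gives 64 PDM bits per PCM sample per microphone.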

The data collected by the 8 bit port is sent to the lib_mic_array block which de-interleaves the PDM data streams and performs decimation of the PDM data down to 48 kHz 32 bit PCM samples. Due to the large number of microphones the PDM capture stage uses four hardware threads on tile[0]; one for the microphone capture and three for decimation. This is needed to divide the processing workload and meet timing comfortably.
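To make the de-interleaving step more concrete, here is a simplified C sketch. It assumes one byte is captured per clock edge, that bit k of each byte is data line k, and that the first edge carries microphones 0 to 7 while the second edge carries microphones 8 to 15 (consistent with the microphone pairs 0/8, 1/9 and so on used in this example). The function name and the buffer layout are hypothetical, and this is not how lib_mic_array is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DATA_LINES 8u
#define MIC_COUNT  16u

/* port_samples: edge_count bytes, alternating first edge / second edge.
 * mic_bits:     MIC_COUNT * (edge_count / 2) entries, laid out as
 *               mic_bits[mic * cycles + n] = n-th PDM bit (0 or 1). */
void pdm_deinterleave(const uint8_t *port_samples, size_t edge_count,
                      uint8_t *mic_bits)
{
    assert(port_samples != NULL && mic_bits != NULL);
    assert(edge_count % 2 == 0);
    size_t cycles = edge_count / 2;

    for (size_t n = 0; n < cycles; n++) {
        uint8_t first  = port_samples[2 * n];       /* mics 0..7  (assumed) */
        uint8_t second = port_samples[2 * n + 1];   /* mics 8..15 (assumed) */
        for (size_t line = 0; line < DATA_LINES; line++) {
            mic_bits[line * cycles + n] =
                (uint8_t)((first >> line) & 1u);
            mic_bits[(line + DATA_LINES) * cycles + n] =
                (uint8_t)((second >> line) & 1u);
        }
    }
}
```

Each of the 16 resulting bit streams would then be decimated separately to produce PCM samples.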

Samples are forwarded to the next stage at a rate of 48 kHz resulting in a packet of 16 PCM samples per exchange.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$PDM Microphone Aggregator Example$$$Software Architecture$$$Audio Hub£££doc/programming_guide/mic_aggregator/mic_aggregator.html#audio-hub

The 16 channels of 48 kHz PCM streams are collected by the Hub task and amplified using a saturating gain stage. The initial gain is set to 100, since a gain of 1 sounds very quiet due to the mic_array output being scaled to allow acoustic overload of the microphones without clipping within the decimators. This value can be overridden using the MIC_GAIN_INIT define in app_conf.h.
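A saturating gain clamps the result to the representable range instead of letting it wrap around. The C sketch below shows one way to do this for 32 bit samples. It assumes the gain is a plain integer multiplier that fits in 16 bits; the function names are hypothetical and the code is not taken from the Hub task.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiplies one sample by the gain and clamps to the int32_t range. */
int32_t apply_saturating_gain(int32_t sample, uint16_t gain)
{
    int64_t scaled = (int64_t)sample * (int64_t)gain;   /* cannot overflow */
    if (scaled > INT32_MAX) {
        return INT32_MAX;
    }
    if (scaled < INT32_MIN) {
        return INT32_MIN;
    }
    return (int32_t)scaled;
}

/* Applies the same gain in place to a block of samples. */
void apply_gain_frame(int32_t *samples, size_t count, uint16_t gain)
{
    assert(samples != NULL);
    for (size_t i = 0; i < count; i++) {
        samples[i] = apply_saturating_gain(samples[i], gain);
    }
}
```

Loud input therefore clips at full scale rather than flipping sign, which is far less objectionable to listen to.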

Additionally for the TDM configuration, the Hub task also checks for control packets from I2C which may be used to dynamically update the individual gains at runtime.

A single hardware thread contains the task and a triple buffer scheme is used to ensure there is always a free buffer available to write into regardless of the relative phase between the production and consumption of microphone samples.

The Hub task has plenty of timing slack and is a suitable place for adding signal processing if needed.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$PDM Microphone Aggregator Example$$$Software Architecture$$$TDM Host Connection£££doc/programming_guide/mic_aggregator/mic_aggregator.html#tdm-host-connection

The TDM build supports a 16-slot TDM slave Tx peripheral from the fwk_io sub-module. In this application it runs at 24.576 MHz bit clock which supports 16 channels of 32 bit, 48 kHz samples per frame.
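The bit clock follows directly from the frame layout: slots per frame times bits per slot times frames per second. The C sketch below does that arithmetic; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bit clock needed to carry `slots` slots of `bits_per_slot` bits in
 * every frame, at `frame_rate_hz` frames per second. */
uint32_t tdm_bit_clock_hz(uint32_t slots, uint32_t bits_per_slot,
                          uint32_t frame_rate_hz)
{
    assert(slots > 0 && bits_per_slot > 0 && frame_rate_hz > 0);
    return slots * bits_per_slot * frame_rate_hz;
}
```

For this application, `tdm_bit_clock_hz(16, 32, 48000)` evaluates to 24576000, that is 24.576 MHz, which is why MCLK can drive BCLK directly.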

The TDM component uses a single hardware thread.

For debugging purposes, a simple TDM 16 Master Rx component is provided. This allows the transmitted TDM frames from the application to be received and checked without having to connect an external TDM Master. It may be deleted or disconnected without affecting the core application.

Note

The simple TDM 16 Master Rx component is not regression tested and is for evaluation of TDM 16 Slave Tx in this application only.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$PDM Microphone Aggregator Example$$$Software Architecture$$$USB Host Connection£££doc/programming_guide/mic_aggregator/mic_aggregator.html#usb-host-connection

As an alternative to TDM, a USB host connection is also supported. The USB connection uses the following specifications:

  • USB High Speed (480 Mbps)

  • USB Audio Class 2.0

  • Asynchronous mode (audio clock is provided by the firmware)

  • 24 bit Audio slots

  • 48 kHz Sample Rate

The USB host connection functionality is provided by lib_xua which is the core library of XMOS’s USB Audio solution.

The USB Audio subsection uses a total of four hardware threads in this application.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$PDM Microphone Aggregator Example$$$Resource Usage£££doc/programming_guide/mic_aggregator/mic_aggregator.html#resource-usage

The xcore-ai device has a total resource count of 2 x 524288 Bytes of memory and 2 x 8 hardware threads across two tiles. This application uses around half of the processing resources and a tiny fraction of the available memory meaning there is plenty of space inside the chip for additional functionality if needed.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$PDM Microphone Aggregator Example$$$Resource Usage$$$TDM Build£££doc/programming_guide/mic_aggregator/mic_aggregator.html#tdm-build

Tile | Memory | Threads
0 | 25996 | 5
1 | 22812 | 2*
Total | 48808 | 7

* An additional debug TDM Master thread is used on Tile[1] by default which is not needed in a practical deployment.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$PDM Microphone Aggregator Example$$$Resource Usage$$$USB Build£££doc/programming_guide/mic_aggregator/mic_aggregator.html#usb-build

Tile | Memory | Threads
0 | 24252 | 4
1 | 52116 | 5
Total | 76368 | 9

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$PDM Microphone Aggregator Example$$$Board Configuration£££doc/programming_guide/mic_aggregator/mic_aggregator.html#board-configuration

Make the following connections between headers using flying leads:

Host Connection | Board Connection | Note
MIC CLK | J14 ‘00’ | This is the microphone clock which is to be sent to the PDM microphones from J14.
MIC DATA | J14 ‘14’ | This is the data line for microphones 0 and 8. See below.
I2S LRCLK | J10 ‘36’ | This is the FSYNCH input for TDM slave. J10 ‘36’ is the TDM master FSYNCH output for the application.
I2S MCLK | I2S BCLK | MCLK is the 24.576 MHz clock which directly drives the BCLK input for the TDM slave.
I2S DAC | J10 ‘38’ | I2S DAC is the TDM Slave Tx out which is read by the TDM Master Rx input on J10.

To access other microphone inputs use the following:

Mic pair | J14 pin
0, 8 | 14
1, 9 | 15
2, 10 | 16
3, 11 | 17
4, 12 | 18
5, 13 | 19
6, 14 | 20
7, 15 | 21

For I2C control, make the following connections:

Host Connection | Board Connection
SCL IOL | Your I2C host SCL.
SDA IOL | Your I2C host SDA.
GND | Your I2C host ground.

The I2C slave is tested at 100 kHz SCL.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$PDM Microphone Aggregator Example$$$I2C Controlled Volume£££doc/programming_guide/mic_aggregator/mic_aggregator.html#i2c-controlled-volume

For the TDM build, there are 32 registers which control the gain of each of the 16 output channels. Each channel has a pair of 8 bit registers that hold the upper 8 bits and the lower 8 bits of its 16 bit microphone gain respectively. The initial gain is set to 100, since a gain of 1 is quiet due to the mic_array output being scaled to allow acoustic overload of the microphones without clipping. Typically a gain of a few hundred works for normal conditions. A new gain is only applied after the lower byte is written, so write the upper byte first.

The gain applied is saturating so no overflow will occur, only clipping.

Register | Value
0 | Channel 0 upper gain byte
1 | Channel 0 lower gain byte
2 | Channel 1 upper gain byte
3 | Channel 1 lower gain byte
4 | Channel 2 upper gain byte
5 | Channel 2 lower gain byte
6 | Channel 3 upper gain byte
7 | Channel 3 lower gain byte
8 | Channel 4 upper gain byte
9 | Channel 4 lower gain byte
10 | Channel 5 upper gain byte
11 | Channel 5 lower gain byte
12 | Channel 6 upper gain byte
13 | Channel 6 lower gain byte
14 | Channel 7 upper gain byte
15 | Channel 7 lower gain byte
16 | Channel 8 upper gain byte
17 | Channel 8 lower gain byte
18 | Channel 9 upper gain byte
19 | Channel 9 lower gain byte
20 | Channel 10 upper gain byte
21 | Channel 10 lower gain byte
22 | Channel 11 upper gain byte
23 | Channel 11 lower gain byte
24 | Channel 12 upper gain byte
25 | Channel 12 lower gain byte
26 | Channel 13 upper gain byte
27 | Channel 13 lower gain byte
28 | Channel 14 upper gain byte
29 | Channel 14 lower gain byte
30 | Channel 15 upper gain byte
31 | Channel 15 lower gain byte

If using a Raspberry Pi as the I2C host you may use the following commands:

$ i2cset -y 1 0x3c 0 0 #Set the upper gain byte of mic channel 0 to 0
$ i2cset -y 1 0x3c 1 50 #Set the lower gain byte of mic channel 0 to 50, giving a gain of 50

$ i2cget -y 1 0x3c 0 #Get the upper byte of gain on mic channel 0
$ i2cget -y 1 0x3c 1 #Get the lower byte of gain on mic channel 0

$ i2cset -y 1 0x3c 16 1 #Set the upper gain byte of mic channel 8 to 1
$ i2cset -y 1 0x3c 17 0 #Set the lower gain byte of mic channel 8 to 0, giving a gain of 256
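
For a host written in C, the mapping from channel and gain to register writes can be sketched as follows, following the register table above: channel n uses register 2n for the upper byte and register 2n + 1 for the lower byte. The type `reg_write_t` and the function `gain_to_reg_writes` are hypothetical names, and the actual I2C transfer is left to whatever I2C API the host provides.

```c
#include <assert.h>
#include <stdint.h>

#define MIC_CHANNELS 16u

/* One register write: register address and the byte to store there. */
typedef struct {
    uint8_t reg;
    uint8_t value;
} reg_write_t;

/* Fills writes[0] with the upper-byte write and writes[1] with the
 * lower-byte write. Send them in that order, because the device applies
 * the gain only once the lower byte has been written. */
void gain_to_reg_writes(unsigned channel, uint16_t gain,
                        reg_write_t writes[2])
{
    assert(channel < MIC_CHANNELS);
    writes[0].reg   = (uint8_t)(2u * channel);
    writes[0].value = (uint8_t)(gain >> 8);
    writes[1].reg   = (uint8_t)(2u * channel + 1u);
    writes[1].value = (uint8_t)(gain & 0xFFu);
}
```

For example, a gain of 256 on channel 8 becomes a write of 1 to register 16 followed by a write of 0 to register 17.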

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$ASRC Application£££doc/programming_guide/asrc/asrc.html#asrc-application

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$ASRC Application$$$Overview£££doc/programming_guide/asrc/overview.html#overview

Warning

This example is based on the RTOS framework and drivers. This choice simplifies the example design, but it leads to high latency in the system. The main sources of latency are:

  • Large block size used for ASRC processing: this is necessary to minimise latency associated with the intertile context and thread switching overhead.

  • Large size of the buffer to which the ASRC output samples are written: a stable level (half full) must be reached before the start of streaming out over USB.

  • RTOS task scheduling overhead between the tasks.

  • bInterval of USB in the RTOS drivers is set to 4, i.e. one frame every 1 ms.

  • Block based implementation of the USB and I2S RTOS drivers.
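The bInterval item in the list above follows from the USB 2.0 rule for high-speed isochronous endpoints, where the service interval is 2^(bInterval - 1) microframes of 125 µs each. The helper below is a minimal sketch of that arithmetic; `hs_iso_interval_us` is a hypothetical name and not part of the RTOS drivers.

```c
#include <assert.h>

/* Hypothetical helper: service interval in microseconds for a
 * high-speed isochronous endpoint with the given bInterval. */
static unsigned hs_iso_interval_us(unsigned b_interval)
{
    assert(b_interval >= 1u && b_interval <= 16u); /* valid range per USB 2.0 */
    return (1u << (b_interval - 1u)) * 125u;       /* microframes * 125 us */
}
```

For bInterval = 4 this gives 8 microframes, i.e. 1000 µs, which matches the 1 ms frame period quoted above.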

The expected latencies for USB at 48 kHz are as follows:

  • USB -> ASRC -> I2S: from 8 ms at I2S at 192 kHz to 22 ms at 44.1 kHz

  • I2S -> ASRC -> USB: from 13 ms at I2S at 192 kHz to 19 ms at 44.1 kHz

For a proposed implementation with lower latency, please refer to the bare-metal examples below:

This is the XCORE-VOICE Asynchronous Sampling Rate Converter (ASRC) example design.

The example system implements a stereo I2S Slave and a stereo Adaptive UAC2.0 interface and exchanges data between the two interfaces. Since the two interfaces are operating in different clock domains, there is an ASRC block between them that converts from the input to the output sampling rate. There are two ASRC blocks, one each in the I2S -> ASRC -> USB and USB -> ASRC -> I2S path, as illustrated in the ASRC example top level system diagram. The diagram also shows the rate calculation path, which monitors and computes the instantaneous ratio between the ASRC input and output sampling rate. The rate ratio is used by the ASRC task to dynamically adapt filter coefficients using spline interpolation in its filtering stage.

ASRC example top level system diagram


The I2S Slave interface is a stereo 32 bit interface supporting sampling rates from 44.1 kHz to 192 kHz.

The USB interface is a stereo, 32 bit, 48 kHz, High-Speed, USB Audio Class 2, Adaptive interface.

The ASRC algorithm implemented in the lib_src library is used for the ASRC processing. The ASRC processing is block based and works on a block size of 244 samples per channel in the I2S -> ASRC -> USB path and 96 samples per channel in the USB -> ASRC -> I2S path.
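To get a feel for how much audio one ASRC block represents, the block length can be converted to a duration. The sketch below does only that arithmetic; `block_duration_ms` is a hypothetical helper, not a lib_src function.

```c
#include <assert.h>

/* Hypothetical helper: duration in milliseconds of one block of
 * samples_per_channel samples at sample_rate_hz. */
static double block_duration_ms(unsigned samples_per_channel,
                                unsigned sample_rate_hz)
{
    assert(sample_rate_hz > 0u);
    return 1000.0 * (double)samples_per_channel / (double)sample_rate_hz;
}
```

A 96-sample block at the 48 kHz USB rate spans 2 ms. A 244-sample block on the I2S side spans roughly 5.5 ms at 44.1 kHz and roughly 1.3 ms at 192 kHz, which is one reason the latency of the system depends on the I2S sampling rate.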

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$ASRC Application$$$Overview$$$Supported Hardware£££doc/programming_guide/asrc/overview.html#supported-hardware

This example application is supported on the XK-VOICE-L71 board. In addition to the XK-VOICE-L71 board, it requires an XTAG4 to program and debug the device.

To demonstrate the audio exchange between the I2S and USB interface, the XK-VOICE-L71 device needs to be connected to an I2S Master device. To do this, connect the BCLK, LRCK, DOUT and DIN pins of the RASPBERRY PI HOST INTERFACE header (J4) on the XK-VOICE-L71 to the I2S Master. The table XK-VOICE-L71 RPI host interface header (J4) connections lists the pins on the XK-VOICE-L71 RPI header and the signals on the I2S Master that they need to be connected to.

XK-VOICE-L71 RPI host interface header (J4) connections

| XK-VOICE-L71 PI header pin | Signal to connect to on the I2S Master board |
|---|---|
| 12 | BCLK output |
| 35 | LRCK output |
| 38 | I2S Data input to the Master |
| 40 | I2S Data output from the Master |
| One of the GND pins (6, 9, 14, 20, 25, 30, 34 or 39) | GND on the I2S Master board |

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$ASRC Application$$$Overview$$$Obtaining the app files£££doc/programming_guide/asrc/overview.html#obtaining-the-app-files

Download the main repo and submodules using:

$ git clone --recurse-submodules git@github.com:xmos/sln_voice.git
$ cd sln_voice/
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$ASRC Application$$$Overview$$$Building the app£££doc/programming_guide/asrc/overview.html#building-the-app

First install the XTC tools, version 15.3.1, and source their environment. Running xcc --version confirms which version is active. For example, with version 15.2.1 the output looks something like this:

$ xcc --version
xcc: Build 19-198606c, Oct-25-2022
XTC version: 15.2.1
Copyright (C) XMOS Limited 2008-2021. All Rights Reserved.
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$ASRC Application$$$Overview$$$Linux or Mac£££doc/programming_guide/asrc/overview.html#linux-or-mac

To build for the first time, activate your Python environment, install the requirements and run cmake to create the makefiles:

$ pip install -r requirements.txt
$ mkdir build
$ cd build
$ cmake --toolchain ../xmos_cmake_toolchain/xs3a.cmake  ..
$ make example_asrc_demo -j

After the initial cmake build, subsequent builds only need the following command, as long as no new source files have been added:

$ make example_asrc_demo -j

cmake needs to be rerun to discover any new source files added.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$ASRC Application$$$Overview$$$Windows£££doc/programming_guide/asrc/overview.html#windows

It is recommended to use Ninja or xmake as the make system under Windows. Ninja has been observed to be faster than xmake, but xmake comes natively with the XTC tools. This firmware has been tested with Ninja version v1.11.1.

To install Ninja, activate your python environment, and run the following command:

$ pip install ninja

To build for the first time, activate your Python environment, install the requirements and run cmake to create the build files:

$ pip install -r requirements.txt
$ md build
$ cd build
$ cmake -G "Ninja" --toolchain  ..\xmos_cmake_toolchain\xs3a.cmake ..
$ ninja example_asrc_demo.xe

After the initial cmake build, subsequent builds only need the following command, as long as no new source files have been added:

$ ninja example_asrc_demo.xe

cmake needs to be rerun to discover any new source files added.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$ASRC Application$$$Overview$$$Running the app£££doc/programming_guide/asrc/overview.html#running-the-app

To run the app, either xrun or xflash can be used. Connect the XK-VOICE-L71 board to the host and type the following to run with real-time debug output enabled:

$ xrun --xscope example_asrc_demo.xe

or to flash the application so that it always boots after a power cycle:

$ xflash example_asrc_demo.xe
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$ASRC Application$$$Overview$$$Operation£££doc/programming_guide/asrc/overview.html#operation

When the example runs, the audio received by the device on the I2S Slave interface at the I2S interface sampling rate is sample rate converted using the ASRC to the USB sampling rate and streamed out from the device over the USB interface. Similarly, the audio streamed out by the USB host into the USB interface of the device is sample rate converted to the I2S interface sampling rate and streamed out from the device over the I2S Slave interface.

This example supports dynamic changes of the I2S interface sampling frequency at runtime. It detects the I2S sampling rate change and reconfigures the system for the new rate.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$ASRC Application$$$Software Architecture£££doc/programming_guide/asrc/software_architecture.html#software-architecture

The ASRC demo application is a two tile application developed to run on the XK-VOICE-L71 board running at a core frequency of 600 MHz.

It is a FreeRTOS based application where all the application blocks are implemented as FreeRTOS tasks.

Each tile has 5 bare metal cores dedicated to running RTOS tasks and since all processing is done within RTOS tasks, each core has 120 MHz of bandwidth available.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$ASRC Application$$$Software Architecture$$$Task diagram£££doc/programming_guide/asrc/software_architecture.html#task-diagram

The ASRC example task diagram shows the RTOS tasks and other components that make up the system.

ASRC example task diagram


The tasks can roughly be categorised as belonging to the USB driver, the I2S driver or the application code. The actual ASRC processing happens in four tasks across the two tiles: the usb_audio_out_asrc task, the i2s_audio_recv_asrc task, and two instances of the asrc_one_channel task, one on each tile. This is described in more detail in the Application components section below.

Most of the tasks are involved in the ASRC processing data path, while a few are involved in monitoring the input and output data rates and computing the rate ratio, which is the ratio between the frequencies at the input and output of the ASRC tasks. The rate ratio is provided to the ASRC tasks every asrc_process_frame() call. Details about the rate ratio calculation are described in the rate_server section below.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$ASRC Application$$$Software Architecture$$$USB Driver components£££doc/programming_guide/asrc/software_architecture.html#usb-driver-components

This application presents a stereo, 48 kHz, 32 bit, high-speed, Adaptive UAC2.0 USB interface. It has two endpoints, Endpoint 0 for control and Endpoint 1 for bidirectional isochronous USB audio. The USB application level driver is TinyUSB based.

The usb_xud_thread, usb_isr, usb_task and usb_adaptive_clk_manager implement the USB driver. Together, these tasks handle the USB communication with the host and also monitor the average USB rate seen by the device. The average USB rate is used for calculating the rate ratios that are sent to the asrc_process_frame() function. This is described more in the rate_server section.

The usb_xud_thread runs XUD_Main which implements the USB HIL driver. It runs on a dedicated bare metal core so cannot be preempted by other RTOS tasks. It interfaces with the USB app level thread (usb_task) via shared memory and dedicated channels between the XUD_Main and each endpoint.

XUD_Main notifies the connected endpoint of a USB transfer completion through an interrupt on the respective channel. This interrupt is serviced by the usb_isr routine.

usb_task implements the app level USB driver functionality. The app level USB driver is based on TinyUSB which hooks into the application by means of callback functions. The usb_isr task is triggered by the interrupt and parses the data transferred from XUD and places it on a queue that the usb_task blocks on for further processing. For example, on completion of an EP1 OUT transfer, the transfer completion gets notified on the usb_xud_thread -> usb_isr -> usb_task path, and the usb_task calls the tud_audio_rx_done_post_read_cb() function to have the application process the data received from the host. On completion of an EP1 IN transfer, the transfer completion again follows the usb_xud_thread -> usb_isr -> usb_task path, and usb_task calls the tud_audio_tx_done_pre_load_cb() callback function to have the application load the EP1 IN data for the next transfer.

samples_to_host_stream_buf and samples_from_host_stream_buf are circular buffers shared between the application and the USB driver and allow for decoupling one from the other. The data frame received over USB from the host is written to the samples_from_host_stream_buf by the TinyUSB callback function tud_audio_rx_done_post_read_cb(), while the application reads USB_TO_I2S_ASRC_BLOCK_LENGTH samples of data out of it. Similarly, the application writes the ASRC output block of data to the samples_to_host_stream_buf while the TinyUSB callback function tud_audio_tx_done_pre_load_cb() reads from it to send one frame of data to the USB host.
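The decoupling described above relies on circular buffers that let one side write frames while the other reads blocks of a different size. The following is a minimal single-threaded sketch of such a buffer. `ring_buf_t`, `ring_write` and `ring_read` are hypothetical names, the capacity is illustrative, and the real stream buffers additionally need synchronisation between the driver and the application, which is omitted here.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_CAPACITY 8u /* illustrative only; real buffers hold many frames */

/* Hypothetical circular buffer, not the driver's actual data structure. */
typedef struct {
    int32_t data[RING_CAPACITY];
    size_t head;  /* next write position */
    size_t tail;  /* next read position */
    size_t count; /* samples currently stored */
} ring_buf_t;

/* Writes up to n samples and returns how many fitted. */
static size_t ring_write(ring_buf_t *rb, const int32_t *src, size_t n)
{
    size_t written = 0;
    while (written < n && rb->count < RING_CAPACITY) {
        rb->data[rb->head] = src[written++];
        rb->head = (rb->head + 1u) % RING_CAPACITY; /* wrap around */
        rb->count++;
    }
    return written;
}

/* Reads up to n samples and returns how many were available. */
static size_t ring_read(ring_buf_t *rb, int32_t *dst, size_t n)
{
    size_t read = 0;
    while (read < n && rb->count > 0u) {
        dst[read++] = rb->data[rb->tail];
        rb->tail = (rb->tail + 1u) % RING_CAPACITY; /* wrap around */
        rb->count--;
    }
    return read;
}
```

The writer and reader can use different transfer sizes, for example one USB frame per write and one ASRC block per read, because only the `count` field ties them together.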

usb_adaptive_clk_manager task is responsible for calculating the average USB rate as seen by the device. The average rate is calculated over a 16-second moving window. The averaging smooths out any jitter seen in the USB SOF timestamps that are used for calculating the rate.
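One way to picture a moving-window average is a ring of recent measurements with running totals. The sketch below is an assumption-laden illustration, not the usb_adaptive_clk_manager implementation: it assumes one measurement slot per second (16 slots for a 16-second window), and the names `rate_window_t`, `rate_window_push` and `rate_window_average_hz` are hypothetical.

```c
#include <stdint.h>

#define WINDOW_SLOTS 16u /* assumed: one measurement slot per second */

/* Hypothetical moving-window state, not the driver's actual structure. */
typedef struct {
    uint32_t samples[WINDOW_SLOTS]; /* samples counted in each slot */
    uint32_t ticks[WINDOW_SLOTS];   /* timer ticks elapsed in each slot */
    uint64_t total_samples;         /* running sum over the window */
    uint64_t total_ticks;           /* running sum over the window */
    unsigned next;                  /* slot that is overwritten next */
} rate_window_t;

static void rate_window_push(rate_window_t *w, uint32_t samples, uint32_t ticks)
{
    /* Replace the oldest slot and keep the running sums in step. */
    w->total_samples += samples;
    w->total_samples -= w->samples[w->next];
    w->total_ticks += ticks;
    w->total_ticks -= w->ticks[w->next];
    w->samples[w->next] = samples;
    w->ticks[w->next] = ticks;
    w->next = (w->next + 1u) % WINDOW_SLOTS;
}

/* Average rate in Hz over the window; tick_hz is the timer frequency. */
static double rate_window_average_hz(const rate_window_t *w, double tick_hz)
{
    if (w->total_ticks == 0u) {
        return 0.0; /* no measurements yet */
    }
    return (double)w->total_samples * tick_hz / (double)w->total_ticks;
}
```

With 16 slots, a single slot that reads 16 samples high shifts the average by only 1 Hz, which illustrates how the window smooths out timestamp jitter.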

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$ASRC Application$$$Software Architecture$$$I2S Driver components£££doc/programming_guide/asrc/software_architecture.html#i2s-driver-components

This application presents a stereo 32 bit, I2S Slave interface that supports I2S sampling rates of 44.1, 48, 88.2, 96, 176.4 and 192 kHz. The I2S driver supports tracking dynamic sampling rate (SR) changes and recalculates the nominal sampling rate after detecting a SR change event. It also continuously monitors the timespan over which a fixed number of samples are received. This information is then used by the application for calculating the average I2S rate seen by the device.
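A simple way to turn a timespan measurement into a nominal sampling rate is to compute the measured rate and pick the closest supported rate. The sketch below shows that idea only. It assumes timestamps from a 100 MHz reference timer passed in as `tick_hz`, and `nominal_rate_from_measurement` is a hypothetical name; the driver's actual detection logic may work differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The I2S rates supported by this application, in Hz. */
static const unsigned supported_rates_hz[] = {
    44100u, 48000u, 88200u, 96000u, 176400u, 192000u
};

/* Hypothetical helper: snap a measured rate to the nearest supported rate.
 * num_samples were received over elapsed_ticks ticks of a tick_hz timer. */
static unsigned nominal_rate_from_measurement(unsigned num_samples,
                                              unsigned elapsed_ticks,
                                              unsigned tick_hz)
{
    assert(elapsed_ticks > 0u);

    /* 64 bit intermediate avoids overflow of num_samples * tick_hz. */
    uint64_t measured = (uint64_t)num_samples * tick_hz / elapsed_ticks;

    unsigned best = supported_rates_hz[0];
    uint64_t best_err = UINT64_MAX;
    size_t n = sizeof supported_rates_hz / sizeof supported_rates_hz[0];
    for (size_t i = 0; i < n; i++) {
        uint64_t rate = supported_rates_hz[i];
        uint64_t err = (measured > rate) ? measured - rate : rate - measured;
        if (err < best_err) {
            best_err = err;
            best = supported_rates_hz[i];
        }
    }
    return best;
}
```

For example, 480 samples received over 1000200 ticks of a 100 MHz timer measure as about 47990 Hz, which snaps to the nominal rate of 48000 Hz.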

i2s_slave_thread, I2S send_buffer and receive_buffer and rtos_i2s_isr make up the I2S driver components.

i2s_slave_thread implements the I2S HIL driver. The HIL level driver calls into the application callback functions for i2s_init(), i2s_restart_check(), i2s_receive() and i2s_send(). These functions, in addition to handling I2S send and receive data, also detect sampling rate changes and gather information for tracking the average sampling rate.

I2S send_buffer and receive_buffer are circular buffers shared between the driver and the application and contain data received over I2S (receive_buffer) and data the application wants to send over I2S (send_buffer). These buffers allow for decoupling the I2S HIL driver from the ASRC application. The driver reads from and writes to these buffers at the I2S sample rate while the application can read and write blocks of data to these buffers equal to the ASRC input or output block size.

The application calls rtos_i2s_rx() to read I2S_TO_USB_ASRC_BLOCK_LENGTH samples of data from the receive_buffer. The i2s_slave_thread independently calls i2s_receive() callback function to write a sample of data as it gets received over I2S.

Similarly, the application calls rtos_i2s_tx() to write ASRC output size block of data into the send_buffer. Meanwhile, the driver independently calls the callback function i2s_send() to read a sample of data to send over the I2S.

rtos_i2s_isr interrupt is used to ensure that the application calls to rtos_i2s_rx() and rtos_i2s_tx() block only on RTOS primitives when waiting for read data to be available or buffer space to be available when writing data.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$ASRC Application$$$Software Architecture$$$Application components£££doc/programming_guide/asrc/software_architecture.html#application-components

usb_audio_out_asrc, i2s_audio_recv_asrc, asrc_one_channel_task, usb_to_i2s_intertile, i2s_to_usb_intertile and the rate_server tasks make up the non-driver components of the application.

usb_audio_out_asrc performs ASRC on data received from the USB host to the device. It waits to get notified by the TinyUSB callback function tud_audio_rx_done_post_read_cb() when there are one or more ASRC input blocks (96 USB samples) of data in the samples_from_host_stream_buf. It does ASRC processing of the first channel while coordinating with the asrc_one_channel_task for processing the second channel in parallel and sends the processed output to the other tile on the inter-tile context.

i2s_audio_recv_asrc performs ASRC on data received over the I2S interface by the device. It blocks on the rtos_i2s_rx() function to receive one ASRC input block (244 I2S samples) of data from I2S and performs ASRC on one channel while coordinating with the asrc_one_channel_task for processing the second channel in parallel. It then sends the processed output to the other tile on the inter-tile context.

asrc_one_channel_task performs ASRC on a single channel of data. There is one of these on each tile. It waits on an RTOS message queue for an ASRC input block to be available, does ASRC processing on the block and posts the completion notification on another message queue.

usb_to_i2s_intertile task receives the ASRC output data generated by usb_audio_out_asrc over the inter-tile context onto the I2S tile and writes it to the I2S send_buffer. It has other rate-monitoring related responsibilities that are described in the rate_server section.

i2s_to_usb_intertile task receives the ASRC output data generated by i2s_audio_recv_asrc over the inter-tile context onto the USB tile and writes it to the USB samples_to_host_stream_buf. It has other rate-monitoring related responsibilities that are described in the rate_server section.

The I2S -> ASRC -> USB data path diagram shows the application tasks involved in the I2S -> ASRC -> USB path processing and their interaction with each other.


I2S -> ASRC -> USB data path

The USB -> ASRC -> I2S data path diagram shows the application tasks involved in the USB -> ASRC -> I2S path processing and their interaction with each other.


USB -> ASRC -> I2S data path

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$ASRC Application$$$Software Architecture$$$rate_server£££doc/programming_guide/asrc/software_architecture.html#rate-server

The asrc_process_frame() API requires the caller to calculate and pass in the instantaneous ratio between the ASRC input and output rates. The rate_server is responsible for calculating these rate ratios for both the USB -> ASRC -> I2S and I2S -> ASRC -> USB directions.

The application also monitors the average fill levels of the buffers holding ASRC output to prevent overflows or underflows of the respective buffer. A gradual drift in the buffer fill level indicates that the rate ratio is being under or over calculated by the rate_server. This can happen either due to jitter in the actual rates or due to precision limitations when calculating the rates.

The average fill level of the buffer is monitored and a closed-loop error correction factor is calculated to keep the buffer level at an expected stable level. The error estimated from the buffer fill level is used to compute the estimated rate ratio from the initial rate ratio. This estimated rate ratio is then passed to asrc_process_frame().

estimated_rate_ratio = initial_rate_ratio + buffer_based_correction_factor
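The relationship above can be sketched in a few lines of code. Everything here is illustrative: the function names are hypothetical, the arithmetic uses double precision rather than the fixed-point types of the firmware, and the proportional controller in `fill_level_correction` is an assumption, since the actual form of the closed-loop correction is not specified here.

```c
#include <assert.h>

/* Initial rate ratio: ASRC input rate divided by ASRC output rate. */
static double nominal_rate_ratio(double input_rate_hz, double output_rate_hz)
{
    assert(output_rate_hz > 0.0);
    return input_rate_hz / output_rate_hz;
}

/* ASSUMED proportional controller. A buffer that is fuller than its
 * target means too many output samples are being produced, so the
 * ratio is nudged upwards to produce fewer of them. */
static double fill_level_correction(double average_fill, double target_fill,
                                    double loop_gain)
{
    return loop_gain * (average_fill - target_fill);
}

/* estimated_rate_ratio = initial_rate_ratio + buffer_based_correction_factor */
static double estimated_rate_ratio(double initial_rate_ratio,
                                   double buffer_based_correction)
{
    return initial_rate_ratio + buffer_based_correction;
}
```

In a real system the loop gain would be very small, so the correction only trims the ratio by a tiny fraction and the buffer level drifts back to its target gradually rather than oscillating.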

The rate_server runs on the I2S tile (tile 1) and is periodically triggered from the USB tile (tile 0) by the i2s_to_usb_intertile task. The rate_server is triggered once after every 16 frames are written to the samples_to_host_stream_buf.

The following information is needed for calculating the rate ratios:

  1. The average I2S rate

  2. The average USB rate

  3. An error factor computed based on the USB samples_to_host_stream_buf fill level

  4. An error factor computed based on the I2S send buffer fill level

  5. A USB mic_interface_open flag indicating if the USB host is streaming out from the device, since the rate ratio in the I2S -> ASRC -> USB direction is calculated only when the host is reading data from the device

  6. A USB spkr_interface_open flag indicating if the USB host is streaming into the device, since the rate ratio in the USB -> ASRC -> I2S direction is calculated only when the host is sending data to the device

Of the above, the USB related information (2, 3, 5 and 6 above) is available on the USB tile. When triggering the rate_server, the i2s_to_usb_intertile task gets this information, either calculating it or getting it through shared memory from other USB tasks on the same tile, and sends it to the rate_server over the inter-tile context using the structure below.

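typedef struct
{
    int64_t buffer_based_correction; // Error factor from the samples_to_host_stream_buf fill level (item 3)
    float_s32_t usb_data_rate;       // Average USB rate (item 2)
    bool mic_itf_open;               // Host is streaming out from the device (item 5)
    bool spkr_itf_open;              // Host is streaming into the device (item 6)
} usb_rate_info_t;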

The I2S related information (1 and 4 above) is calculated in the rate_server itself, using data that other tasks on the same tile make available through shared memory.

After calculating the rates, the rate_server sends the rate ratio for the USB -> ASRC -> I2S side over the inter-tile context to the i2s_to_usb_intertile task on the USB tile, which makes it available to the usb_audio_out_asrc task through shared memory. The I2S -> ASRC -> USB side rate ratio is made available to the i2s_audio_recv_asrc task through shared memory, since that task runs on the same tile as the rate_server.

The Rate calculation code flow diagram shows the code flow during the rate ratio calculation process, focussing on the i2s_to_usb_intertile task that triggers the rate_server and the rate_server task where the rate ratios are calculated.


Rate calculation code flow

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$ASRC Application$$$Software Architecture$$$Handling I2S sampling rate change events£££doc/programming_guide/asrc/software_architecture.html#handling-i2s-sampling-rate-change-events

The I2S driver monitors the I2S nominal rate and provides this information to the application. When an I2S sampling rate change happens:

  • The ASRC instances on both tiles are re-initialised with the new sampling rate.

  • The buffers that are used for buffer-fill-level based correction are reset. Streaming out of them is paused while zeroes are sent out over both USB and I2S. Once the buffers fill to a stable level, streaming out from them resumes.

  • The average buffer level calculation state is reset and the average buffer level calculation starts afresh. New stable buffer levels are also calculated and the buffer levels are now corrected against these new stable averages.

Note that the device starts with the nominal I2S sampling rate set to zero. Device startup therefore follows the same path as an I2S sampling rate change where the sampling rate goes from zero to first detected nominal sampling rate. Everything described above therefore also applies to the device startup behaviour.
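The reset-and-refill behaviour described above can be modelled as a small state machine: after a reset the buffer is paused and the consumer sends zeroes until the fill level reaches a stable level. The sketch below assumes that the stable level is half of the capacity, and all names (`out_buf_state_t`, `out_buf_reset`, `out_buf_on_write`, `out_buf_on_read`) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an ASRC output buffer's streaming state. */
typedef struct {
    size_t capacity; /* total buffer size in samples */
    size_t fill;     /* samples currently held */
    bool streaming;  /* false while waiting to reach the stable level */
} out_buf_state_t;

/* Called on a sampling rate change (and at startup). */
static void out_buf_reset(out_buf_state_t *b)
{
    b->fill = 0;
    b->streaming = false; /* consumer sends zeroes from now on */
}

/* Producer side: ASRC output arrives. */
static void out_buf_on_write(out_buf_state_t *b, size_t n)
{
    b->fill = (b->fill + n > b->capacity) ? b->capacity : b->fill + n;
    if (!b->streaming && b->fill >= b->capacity / 2u) {
        b->streaming = true; /* assumed stable level: half full */
    }
}

/* Consumer side: returns how many real samples were taken.
 * A return of 0 while paused means the caller sends zeroes instead. */
static size_t out_buf_on_read(out_buf_state_t *b, size_t n)
{
    if (!b->streaming) {
        return 0;
    }
    size_t taken = (n < b->fill) ? n : b->fill;
    b->fill -= taken;
    return taken;
}
```

Because startup is treated as a rate change from zero, the same reset path covers both the first detected rate and any later change.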

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$ASRC Application$$$Software Architecture$$$Handling USB speaker interface close -> open events£££doc/programming_guide/asrc/software_architecture.html#handling-usb-speaker-interface-close-open-events

When the USB host stops streaming to the device and then starts again, this event is detected through calls to the tud_audio_set_itf_close_EP_cb and tud_audio_set_itf_cb functions. The ASRC output buffer in the USB -> ASRC -> I2S path (I2S send_buffer) is reset. Zeroes are then sent over I2S until the buffer fills to a stable level, at which point streaming out of the buffer resumes and samples are again sent over I2S. The average buffer level calculation state for the I2S send_buffer is also reset, and a new stable average is calculated against which the average buffer levels are corrected.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$ASRC Application$$$Software Architecture$$$Handling USB mic interface close -> open events£££doc/programming_guide/asrc/software_architecture.html#handling-usb-mic-interface-close-open-events

When the USB host stops streaming from the device and then starts again, this event is detected through calls to the tud_audio_set_itf_close_EP_cb and tud_audio_set_itf_cb functions. The ASRC output buffer in the I2S -> ASRC -> USB path (USB samples_to_host_stream_buf) is reset. Zeroes are streamed to the host until the buffer fills to a stable level, at which point streaming out of the buffer resumes and samples are again sent over USB. The average buffer level calculation state for the USB samples_to_host_stream_buf is also reset, and a new stable average is calculated against which the average buffer levels are corrected.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$ASRC Application$$$Resource Usage£££doc/programming_guide/asrc/resource_usage.html#resource-usage
XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$ASRC Application$$$Resource Usage$$$Memory£££doc/programming_guide/asrc/resource_usage.html#memory

Out of the 524288 bytes of memory available per tile, this application uses approximately 262000 bytes of memory on Tile 0 and 208000 bytes of memory on Tile 1.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$ASRC Application$$$Resource Usage$$$Chanends£££doc/programming_guide/asrc/resource_usage.html#chanends

This application uses 19 chanends on the USB tile (tile 0) and 11 chanends on the I2S tile (tile 1).

The chanend use for both tiles is described in the Tile 0 chanend usage and Tile 1 chanend usage tables.

Tile 0

Tile 0 chanend usage

| Resource | Chanends used |
|---|---|
| RTOS scheduler | 5 (one per bare-metal core dedicated to RTOS) |
| RTOS USB driver | 10 (2 per endpoint, per direction. 2 for SOF input) |
| Intertile contexts | 3 |
| xscope | 1 |

Tile 1

Tile 1 chanend usage

| Resource | Chanends used |
|---|---|
| RTOS scheduler | 5 (one per bare-metal core dedicated to RTOS) |
| RTOS I2S driver | 2 |
| Intertile contexts | 3 |
| xscope | 1 |

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$ASRC Application$$$Resource Usage$$$Intertile contexts£££doc/programming_guide/asrc/resource_usage.html#intertile-contexts

The application uses 3 intertile contexts for cross tile communication.

  • A dedicated intertile context for sending ASRC output data from the I2S tile to the USB tile.

  • A dedicated intertile context for sending ASRC output data from the USB tile to the I2S tile.

  • The intertile context for all other cross tile communication.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Example Designs$$$ASRC Application$$$Resource Usage$$$CPU£££doc/programming_guide/asrc/resource_usage.html#cpu

The CPU usage of this application has not yet been profiled with an RTOS friendly profiling tool. However, some application tasks have been profiled. These numbers, along with existing profiling numbers for the drivers, are listed in the Tile 0 tasks MIPS and Tile 1 tasks MIPS tables. Each tile has 5 bare-metal cores running RTOS tasks, so each core has a fixed bandwidth of 120 MHz available.

Tile 0

Tile 0 tasks MIPS

| RTOS Task | MIPS |
|---|---|
| XUD | 120 (from CPU Requirements (@ 600 MHz)) |
| ASRC in the USB -> ASRC -> I2S path for the worst case of 48 kHz to 192 kHz upsampling | 85 |
| usb_task | 24 |
| i2s_to_usb_intertile | 14 |

Tile 1

Tile 1 tasks MIPS

| RTOS Task | MIPS |
|---|---|
| I2S Slave | 96 (from CPU Requirements (@ 600 MHz)) |
| ASRC in the I2S -> ASRC -> USB path for the worst case of 192 kHz to 48 kHz downsampling | 75 |
| usb_to_i2s_intertile | 0.7 |
| rate_server | 19 |

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Memory and CPU Requirements£££doc/programming_guide/04_extending.html#memory-and-cpu-requirements

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Memory and CPU Requirements$$$Memory£££doc/programming_guide/04_extending.html#memory

The table below lists the approximate memory requirements for the larger software components. All memory use estimates in the table are based on the default configuration for the feature. Alternate configurations will require more or less memory. The estimates are provided as a guideline to help application developers judge the memory cost of extending the application or the benefit of removing an existing feature. It can be assumed that the memory requirement of any component not listed in the table is under 5 kB.

Memory Requirements

| Component | Memory Use (kB) |
|---|---|
| Stereo Adaptive Echo Canceler (AEC) | 275 |
| Sensory Speech Recognition Engine | 180 |
| Cyberon Speech Recognition Engine | 125 |
| Interference Canceler (IC) + Voice To Noise Ratio Estimator (VNR) | 130 |
| USB | 20 |
| Noise Suppressor (NS) | 15 |
| Adaptive Gain Control (AGC) | 11 |

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Memory and CPU Requirements$$$CPU£££doc/programming_guide/04_extending.html#cpu

The table below lists the approximate CPU requirements in MIPS for the larger software components. All CPU use estimates in the table are based on the default configuration for the feature. Alternate configurations will require more or fewer MIPS. The estimates are provided as a guideline to help application developers judge the MIPS cost of extending the application or the benefit of removing an existing feature. It can be assumed that the CPU requirement of any component not listed in the table is under 1%.

The following formula was used to convert CPU% to MIPS:

MIPS = (CPU% / 100%) * (600 MHz / 5 cores)
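The conversion formula above can be written as a small helper for checking table entries or estimating the cost of a new component. This is a sketch; `cpu_percent_to_mips` is a hypothetical name, and the clock frequency and core count are passed in rather than fixed.

```c
#include <assert.h>

/* Hypothetical helper implementing
 * MIPS = (CPU% / 100%) * (core clock in MHz / number of cores). */
static double cpu_percent_to_mips(double cpu_percent, double core_clock_mhz,
                                  unsigned cores)
{
    assert(cores > 0u);
    return (cpu_percent / 100.0) * (core_clock_mhz / (double)cores);
}
```

At 600 MHz with 5 cores, 100% of a core corresponds to 120 MIPS and 25% to 30 MIPS. A component using 72% works out to 86.4 MIPS, which the table lists rounded up as 87.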

CPU Requirements (@ 600 MHz)

| Component | CPU Use (%) | MIPS Use |
|---|---|---|
| USB XUD | 100 | 120 |
| I2S (slave mode) | 80 | 96 |
| Stereo Adaptive Echo Canceler (AEC) | 80 | 96 |
| Sensory Speech Recognition Engine | 80 | 96 |
| Cyberon Speech Recognition Engine | 72 | 87 |
| Interference Canceler (IC) + Voice To Noise Ratio Estimator (VNR) | 25 | 30 |
| Noise Suppressor (NS) | 10 | 12 |
| Adaptive Gain Control (AGC) | 5 | 6 |

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$How-Tos£££doc/programming_guide/05_howto.html#how-tos

This section includes instructions on anticipated or common software modifications.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$How-Tos$$$Changing the input and output sample rate£££doc/programming_guide/05_howto.html#changing-the-input-and-output-sample-rate

In the example design app_conf.h file, change appconfAUDIO_PIPELINE_SAMPLE_RATE to either 16000 or 48000.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$How-Tos$$$I2S AEC reference input audio & USB processed audio output£££doc/programming_guide/05_howto.html#i2s-aec-reference-input-audio-usb-processed-audio-output

The FFVA example design includes 2 basic configurations: INT and UA. The INT configuration is set up with I2S for input and output audio. The UA configuration is set up with USB for input and output audio. This HOWTO explains how to modify the FFVA example design for I2S input audio and USB output audio.

In the ffva_ua.cmake file, change appconfAEC_REF_DEFAULT to appconfAEC_REF_I2S so that the AEC reference input frames are taken from I2S:

set(FFVA_UA_COMPILE_DEFINITIONS
    ${APP_COMPILE_DEFINITIONS}
    appconfI2S_ENABLED=1
    appconfUSB_ENABLED=1
    appconfAEC_REF_DEFAULT=appconfAEC_REF_I2S

    appconfI2S_MODE=appconfI2S_MODE_MASTER
    MIC_ARRAY_CONFIG_MCLK_FREQ=24576000
)

For integrating with I2S there are a few other differences from the default UA configuration. When integrating with an external Raspberry Pi BCLK and LRCLK, you will want the following FFVA_UA_COMPILE_DEFINITIONS:

set(FFVA_UA_COMPILE_DEFINITIONS
    ${APP_COMPILE_DEFINITIONS}
    appconfI2S_ENABLED=1
    appconfUSB_ENABLED=1
    appconfAEC_REF_DEFAULT=appconfAEC_REF_I2S

    appconfI2S_MODE=appconfI2S_MODE_SLAVE
    appconfEXTERNAL_MCLK=0
    appconfI2S_AUDIO_SAMPLE_RATE=48000
    MIC_ARRAY_CONFIG_MCLK_FREQ=12288000
)

appconfI2S_AUDIO_SAMPLE_RATE can also be 16000. Only 48 kHz and 16 kHz conversions are supported in FFVA.

The default FFVA INT device doesn’t require an external MCLK, but this setting can be changed by setting appconfEXTERNAL_MCLK=1. In this case the FFVA example application will sit at initialization until it can lock on to that clock source, so it MUST be active during boot.

Since the FFVA example application is not receiving reference audio through USB in this configuration, USB adaptive mode will not adapt to the input. By default, FFVA will output the configured nominal rate.

If you enable appconfAEC_REF_DEFAULT=appconfAEC_REF_I2S and appconfI2S_MODE=appconfI2S_MODE_MASTER, you need to invert I2S_DATA_IN and I2S_MIC_DATA in the bsp_config/XK_VOICE_L71/XK_VOICE_L71.xn file to have the reference audio play properly.

Lastly, with I2S enabled the DAC is always initialized by the FFVA example application. If FFVA cannot be the I2C host then it is up to the host to initialize the DAC, like in the AVS demo.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Frequently Asked Questions£££doc/programming_guide/06_faq.html#frequently-asked-questions

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Frequently Asked Questions$$$CMake hides XTC Tools commands£££doc/programming_guide/06_faq.html#cmake-hides-xtc-tools-commands

If you want to customize the XTC Tools commands like xflash and xrun, you can see what commands CMake is running by adding VERBOSE=1 to your build command line. For example:

make run_my_target VERBOSE=1

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Frequently Asked Questions$$$fatfs_mkimage: not found£££doc/programming_guide/06_faq.html#fatfs-mkimage-not-found

This issue occurs when the fatfs_mkimage host utility cannot be found. The most common cause is an incomplete installation of XCORE-VOICE.

Ensure that the host applications build and install has been completed. Verify that the fatfs_mkimage binary is installed to a location on PATH, or that the default application installation folder is added to PATH.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Frequently Asked Questions$$$FFD pdm_rx_isr() Crash£££doc/programming_guide/06_faq.html#ffd-pdm-rx-isr-crash

One potential issue with the low power FFD application is a crash after adding new code:

xrun: Program received signal ET_ECALL, Application exception.
    [Switching to tile[1] core[1]]
    0x0008a182 in pdm_rx_isr ()

This generally occurs when there is not enough processing time available on tile 1, or when interrupts were disabled for too long, causing the mic array driver to fail to meet timing. To resolve this, reduce the processing time, minimize context switching and other actions that require kernel locks, and/or increase the tile 1 core clock frequency.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Frequently Asked Questions$$$Debugging low-power£££doc/programming_guide/06_faq.html#debugging-low-power

The clock dividers are set high to minimize core power consumption. This can make debugging a challenge or impossible. Even adding a simple printf can cause critical timing to be missed. In order to debug with the low-power features enabled, temporarily modify the clock dividers in app_conf.h.

#define appconfLOW_POWER_SWITCH_CLK_DIV         1   // Resulting clock freq 600MHz.
#define appconfLOW_POWER_OTHER_TILE_CLK_DIV     1   // Resulting clock freq 600MHz.
#define appconfLOW_POWER_CONTROL_TILE_CLK_DIV   1   // Resulting clock freq 600MHz.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Frequently Asked Questions$$$xcc2clang.exe: error: no such file or directory£££doc/programming_guide/06_faq.html#xcc2clang-exe-error-no-such-file-or-directory

Those strange characters at the beginning of the path are known as a byte-order mark (BOM). CMake adds them to the beginning of the response files it generates during the configure step. Why does it add them? Because the MSVC compiler toolchain requires them. However, some compiler toolchains, like gcc and xcc, do not ignore the BOM. Why did CMake think the compiler toolchain was MSVC and not the XTC toolchain? Because of a bug in which certain versions of CMake and certain versions of Visual Studio do not play nice together. The good news is that this appears to have been addressed in CMake version 3.22.2. Update to CMake version 3.22.2 or newer.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Licenses£££doc/programming_guide/07_legal.html#licenses

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Licenses$$$XMOS£££doc/programming_guide/07_legal.html#xmos

All original source code is licensed under the XMOS License.

XCORE ® -VOICE Solutions$$$XCORE-VOICE Programming Guide$$$Licenses$$$Third-Party£££doc/programming_guide/07_legal.html#third-party

Additional third party code is included under the following copyrights and licenses:

Third Party Module Copyrights & Licenses

Module: Copyright & License

  • dr_wav: Copyright (C) 2022 David Reid, licensed under a public domain license

  • FatFS: Copyright (C) 2017 ChaN, licensed under a BSD-style license

  • FreeRTOS: Copyright (c) 2017 Amazon.com, Inc., licensed under the MIT License

  • Sensory TrulyHandsfree™: The Sensory TrulyHandsfree™ speech recognition library is Copyright (C) 1995-2022 Sensory Inc. and is provided as an expiring development license. Commercial licensing is granted by Sensory Inc.

  • Cyberon DSpotter™: For any licensing questions about Cyberon DSpotter™ speech recognition library please contact Cyberon Corporation.

  • TinyUSB: Copyright (c) 2018 hathach (tinyusb.org), licensed under the MIT license

XCORE ® -VOICE Solutions$$$Audio Processing£££modules/voice/doc/user_guide/audio_processing/index.html#audio-processing

At the core of the Voice Framework are high-performance audio processing algorithms. The algorithms are connected in a pipeline that takes its input from a pair of microphones and executes a series of signal processing algorithms to extract a voice signal from a complex soundscape. The audio pipeline can accept a reference signal from a host system which is used to perform Acoustic Echo Cancellation (AEC) to remove audio being played by the host. The audio pipeline provides two different output channels: one that is optimized for Automatic Speech Recognition systems and the other for voice communications.

A flexible audio signal routing infrastructure and a range of digital inputs and outputs enable the Voice Framework to be integrated into a wide range of system configurations that can be configured at start up and during operation through a set of control registers. In addition, all source code is provided to allow for full customization or the addition of other audio processing algorithms.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features£££modules/voice/doc/user_guide/audio_processing/index.html#audio-features

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Acoustic Echo Canceller Library£££modules/voice/modules/lib_aec/doc/index.html#acoustic-echo-canceller-library

lib_aec is a library which provides functions that can be put together to perform Acoustic Echo Cancellation (AEC) on input mic data using the input reference data to model the room echo characteristics. lib_aec library functions make use of functionality provided in lib_xcore_math to perform DSP operations. For more details refer to AEC Overview.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Acoustic Echo Canceller Library$$$Repository Structure£££modules/voice/modules/lib_aec/doc/src/getting_started.html#repository-structure
  • modules/lib_aec - The actual lib_aec library directory within https://github.com/xmos/fwk_voice/. Within lib_aec:

    • api/ - Headers containing the public API for lib_aec.

    • doc/ - Library documentation source (for non-embedded documentation) and build directory.

    • src/ - Library source code.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Acoustic Echo Canceller Library$$$Requirements£££modules/voice/modules/lib_aec/doc/src/getting_started.html#requirements

lib_aec is included as part of the fwk_voice github repository and all requirements for cloning and building fwk_voice apply. lib_aec is compiled as a static library as part of overall fwk_voice build. It depends on lib_xcore_math.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Acoustic Echo Canceller Library$$$API Structure£££modules/voice/modules/lib_aec/doc/src/getting_started.html#api-structure

The API can be categorised into high level and low level functions.

High level API has fewer input arguments and is simpler. However, it provides limited options for calling functions in parallel across multiple threads. Keeping API simplicity in mind, most of the high level API functions accept a pointer to the AEC state structure as an input and modify the relevant part of the AEC state. API and example documentation provides more details about the fields within the state modified when calling a given function. High level API functions allow 2 levels of parallelism:

  • Single level of parallelism where for a given function, main and shadow filter processing can happen in parallel.

  • Two levels of parallelism where, for a given function, processing across multiple channels as well as main and shadow filter can be done in parallel.

Low level API has more input arguments but allows more freedom for running in parallel across multiple threads. Low level API function names begin with an aec_l2_ prefix. Depending on the low level API used, functions can be run in parallel to work over a range of bins or a range of phases. This API is still a work in progress and will be fully supported in the future.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Acoustic Echo Canceller Library$$$Getting and Building£££modules/voice/modules/lib_aec/doc/src/getting_started.html#getting-and-building

lib_aec is obtained by cloning the parent fwk_voice repository. It is compiled as a static library as part of the fwk_voice compilation process.

To include lib_aec in an application as a static library, link the generated libfwk_voice_module_lib_aec.a into the application. Be sure to also add lib_aec/api as an include directory for the application.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Acoustic Echo Canceller Library$$$AEC Overview£££modules/voice/modules/lib_aec/doc/src/overview.html#aec-overview

The lib_aec library provides functions that can be put together to perform Acoustic Echo Cancellation on input microphone data by using input reference data to model the echo characteristics of the room.

The echo canceller takes in one or more channels of microphone (mic) input and one or more channels of reference input data. The mic input is the input captured by the device microphones. Reference input is the audio that is played out of the device speakers. The echo canceller uses the reference input to model the room echo characteristics for each mic-loudspeaker pair and outputs an echo cancelled version of the mic input. AEC uses adaptive filters, one per mic-speaker pair, to constantly remove echo from the mic input. The filters continually adapt to the acoustic environment to accommodate changes in the room created by events such as doors opening or closing and people moving about.

Echo cancellation is performed on a frame by frame basis. Each frame is a 15 ms chunk of data, which is 240 samples per input channel at a 16 kHz input sampling frequency. For example, for a 2 mic channel and 2 reference channel input configuration, an input frame is made of 2x240 samples of mic data and 2x240 samples of reference data. Input data is expected to be in fixed point 32bit 1.31 format. Further, in this example, there will be a total of 4 adaptive filters: \(\hat{H}_{y0x0}\), \(\hat{H}_{y0x1}\), \(\hat{H}_{y1x0}\) and \(\hat{H}_{y1x1}\), monitoring the echo seen in mic channel 0 from reference channels 0 and 1 and the echo seen in mic channel 1 from reference channels 0 and 1.
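The frame arithmetic above can be checked with a short sketch. All names below (SAMPLE_RATE_HZ, FRAME_SAMPLES, frame_duration_ms, num_adaptive_filters, float_to_q31) are illustrative and not part of lib_aec, and the saturating float-to-1.31 conversion is one common convention assumed here rather than the library's own routine.

```c
#include <assert.h>
#include <stdint.h>

#define SAMPLE_RATE_HZ 16000u  /* input sampling frequency from the text */
#define FRAME_SAMPLES  240u    /* new samples per channel per frame */

/* Frame duration in milliseconds: 240 samples / 16000 Hz = 15 ms. */
static unsigned frame_duration_ms(void)
{
    return (FRAME_SAMPLES * 1000u) / SAMPLE_RATE_HZ;
}

/* One adaptive filter per mic-reference (y-x) channel pair. */
static unsigned num_adaptive_filters(unsigned y_channels, unsigned x_channels)
{
    assert(y_channels > 0 && x_channels > 0);
    return y_channels * x_channels;
}

/* Hypothetical helper: convert a value in [-1.0, 1.0) to 1.31 fixed point,
 * saturating anything outside that range. */
static int32_t float_to_q31(double v)
{
    if (v >= 1.0)  return INT32_MAX;
    if (v <= -1.0) return INT32_MIN;
    return (int32_t)(v * 2147483648.0);  /* scale by 2^31 */
}
```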

Microphone data is referred to as \(y\) when in time domain and \(Y\) when in frequency domain. In general throughout the code, names starting with lower case represent time domain and those beginning with upper case represent frequency domain. For example \(error\) is the filter error and \(Error\) is the spectrum of the filter error. Reference input is referred to as \(x\) in time domain and \(X\) when in frequency domain. Filter is referred to as \(\hat{h}\) in time domain and \(\hat{H}\) in frequency domain.

A filter has multiple phases. The term phases refers to the tail length of the filter. A filter with more phases or a longer tail length will be able to model a more reverberant room response leading to better echo cancellation.

There are 2 types of adaptive filters used in the AEC. These are referred to as main filter and shadow filter. The main filter, as the name suggests, is the filter that is used to generate the echo cancelled output of the AEC. The shadow filter is a filter that is used to quickly detect and respond to changes in the room transfer function. There is one main filter and one shadow filter per \(x\)-\(y\) pair. Typically the main filter has more phases than the shadow filter. Fewer phases in the shadow filter enable it to rapidly detect and respond to changes, while more phases in the main filter lead to deeper convergence and hence better echo cancellation at the AEC output.

Before starting AEC processing or every time there’s a configuration change, the user needs to call aec_init() to initialise the echo canceller for a desired configuration. Once the AEC is initialised, the library functions can be called in a logical order to perform echo cancellation on a frame by frame basis. Refer to the aec_1_thread and aec_2_threads examples to see how the functions are called to perform echo cancellation using one thread or 2 threads.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Acoustic Echo Canceller Library$$$API Reference£££modules/voice/modules/lib_aec/doc/src/reference/index.html#api-reference
XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Acoustic Echo Canceller Library$$$API Reference$$$AEC Data Structure and Enum Definitions£££modules/voice/modules/lib_aec/doc/src/reference/types.html#aec-data-structure-and-enum-definitions
group aec_types

Enums

enum aec_adaption_e

Values:

enumerator AEC_ADAPTION_AUTO

Compute filter adaption config every frame.

enumerator AEC_ADAPTION_FORCE_ON

Filter adaption always ON.

enumerator AEC_ADAPTION_FORCE_OFF

Filter adaption always OFF.

enum shadow_state_e

Values:

enumerator LOW_REF

Not much reference so no point in acting on AEC filter logic.

enumerator ERROR

something has gone wrong, zero shadow filter

enumerator ZERO

shadow filter has been reset multiple times, zero shadow filter

enumerator RESET

copy main filter to shadow filter

enumerator EQUAL

main filter and shadow filter are similar

enumerator SIGMA

shadow filter a bit better than main, reset sigma_xx for faster convergence

enumerator COPY

shadow filter much better, copy to main

struct coherence_mu_config_params_t

Public Members

float_s32_t coh_alpha

Update rate of coh.

float_s32_t coh_slow_alpha

Update rate of coh_slow.

float_s32_t coh_thresh_slow

Adaption frozen if coh below (coh_thresh_slow*coh_slow)

float_s32_t coh_thresh_abs

Adaption frozen if coh below coh_thresh_abs.

float_s32_t mu_scalar

Scalefactor for scaling the calculated mu.

float_s32_t eps

Parameter to avoid divide by 0 in coh calculation.

float_s32_t thresh_minus20dB

-20dB threshold

float_s32_t x_energy_thresh

X_energy threshold used for determining if the signal has enough reference energy for sensible coherence mu calculation

unsigned mu_coh_time

Number of frames for which adaption is frozen after low coherence.

unsigned mu_shad_time

Number of frames for which adaption is fast after the shadow filter is used.

aec_adaption_e adaption_config

Filter adaption mode. Auto, force ON or force OFF

int32_t force_adaption_mu_q30

Fixed mu value used when filter adaption is forced ON

struct shadow_filt_config_params_t

Public Members

float_s32_t shadow_sigma_thresh

threshold for resetting sigma_XX.

float_s32_t shadow_copy_thresh

threshold for copying shadow filter.

float_s32_t shadow_reset_thresh

threshold for resetting shadow filter.

float_s32_t shadow_delay_thresh

threshold for turning off shadow filter reset if reference delay is large

float_s32_t x_energy_thresh

X energy threshold used for deciding whether the system has enough reference energy for main and shadow filter comparison to make sense

float_s32_t shadow_mu

fixed mu value used during shadow filter adaption.

int32_t shadow_better_thresh

Number of times shadow filter needs to be better before it gets copied to main filter.

int32_t shadow_zero_thresh

Number of times shadow filter is reset by copying the main filter to it before it gets zeroed.

int32_t shadow_reset_timer

Number of frames between zeroing resets of shadow filter.

struct aec_core_config_params_t

Public Members

int bypass

bypass AEC flag.

int gamma_log2

parameter for deriving the gamma value that is used in normalisation spectrum calculation. gamma is calculated as 2^gamma_log2

uint32_t sigma_xx_shift

parameter used for deriving the alpha value used while calculating EMA of X_energy to calculate sigma_XX.

float_s32_t delta_adaption_force_on

delta value used in normalisation spectrum computation when adaption is forced as always ON.

float_s32_t delta_min

Lower limit of delta computed using fractional regularisation.

uint32_t coeff_index

coefficient index used to track H_hat index when sending H_hat values over the host control interface.

uq2_30 ema_alpha_q30

alpha used while calculating y_ema_energy, x_ema_energy and error_ema_energy.

struct aec_config_params_t
#include <aec_state.h>

AEC control parameters.

This structure contains control parameters that the user can modify at run time.

Public Members

coherence_mu_config_params_t coh_mu_conf

Coherence mu related control params.

shadow_filt_config_params_t shadow_filt_conf

Shadow filter related control params.

aec_core_config_params_t aec_core_conf

All AEC control params except those for coherence mu and shadow filter.

struct coherence_mu_params_t

Public Members

float_s32_t coh

Moving average coherence.

float_s32_t coh_slow

Slow moving average coherence.

int32_t mu_coh_count

Counter for tracking number of frames coherence has been low for.

int32_t mu_shad_count

Counter for tracking number of frames shadow filter has been used in.

float_s32_t coh_mu[AEC_LIB_MAX_X_CHANNELS]

Coherence mu.

struct shadow_filter_params_t

Public Members

int32_t shadow_flag[AEC_LIB_MAX_Y_CHANNELS]

shadow_state_e enum indicating shadow filter status

int shadow_reset_count[AEC_LIB_MAX_Y_CHANNELS]

counter for tracking shadow filter resets

int shadow_better_count[AEC_LIB_MAX_Y_CHANNELS]

counter for tracking shadow filter copy to main filter

struct aec_shared_state_t
#include <aec_state.h>

AEC shared state structure.

Data structures holding AEC persistent state that is common between main filter and shadow filter. aec_state_t::shared_state for both main and shadow filter point to the common aec_shared_state_t structure.

Public Members

bfp_complex_s32_t X_fifo[AEC_LIB_MAX_X_CHANNELS][AEC_LIB_MAX_PHASES]

BFP array pointing to the reference input spectrum phases. The term phase refers to the spectrum data for a frame. Multiple phases means multiple frames of data.

For example, 10 phases would mean the 10 most recent frames of data. Each phase spectrum, pointed to by X_fifo[i][j].data, is stored as a length AEC_FD_FRAME_LENGTH, complex 32bit array.

The phases are ordered from most recent to least recent in the X_fifo. For example, for an AEC configuration of 2 x-channels and 10 phases per x channel, 10 frames of X data spectrum is stored in the X_fifo. For a given x channel, say x channel 0, X_fifo[0][0] points to the most recent frame’s X spectrum and X_fifo[0][9] points to the last phase, i.e the least recent frame’s X spectrum.
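The most-recent-first ordering can be pictured with a small sketch in which plain integers stand in for frames. NUM_PHASES and fifo_push_front are illustrative names, and copying elements is only one way to get this ordering; whether the library copies data or rotates the data pointers is not stated here.

```c
#include <assert.h>
#include <string.h>

#define NUM_PHASES 4  /* illustrative size, not a lib_aec constant */

/* Push a new frame id to the front so that index 0 always holds the most
 * recent frame and index NUM_PHASES - 1 the least recent one. */
static void fifo_push_front(int fifo[NUM_PHASES], int new_frame)
{
    assert(fifo != NULL);
    /* Shift existing entries one slot towards the back; the oldest drops off. */
    memmove(&fifo[1], &fifo[0], (NUM_PHASES - 1) * sizeof fifo[0]);
    fifo[0] = new_frame;
}
```

After pushing frames 1 to 6 into a 4-phase FIFO, index 0 holds frame 6 and index 3 holds frame 3.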

bfp_complex_s32_t X[AEC_LIB_MAX_X_CHANNELS]

BFP array pointing to reference input signal spectrum. The X data values are stored as a length AEC_FD_FRAME_LENGTH complex 32bit array per x channel.

bfp_complex_s32_t Y[AEC_LIB_MAX_Y_CHANNELS]

BFP array pointing to mic input signal spectrum. The Y data values are stored as a length AEC_FD_FRAME_LENGTH complex 32bit array per y channel.

bfp_s32_t y[AEC_LIB_MAX_Y_CHANNELS]

BFP array pointing to time domain mic input processing block. The y data values are stored as length AEC_PROC_FRAME_LENGTH, 32bit integer array per y channel.

bfp_s32_t x[AEC_LIB_MAX_X_CHANNELS]

BFP array pointing to time domain reference input processing block. The x data values are stored as length AEC_PROC_FRAME_LENGTH, 32bit integer array per x channel.

bfp_s32_t prev_y[AEC_LIB_MAX_Y_CHANNELS]

BFP array pointing to time domain mic input values from the previous frame. These are put together with the new samples received in the current frame to make a AEC_PROC_FRAME_LENGTH processing block. The prev_y data values are stored as length (AEC_PROC_FRAME_LENGTH - AEC_FRAME_ADVANCE), 32bit integer array per y channel.

bfp_s32_t prev_x[AEC_LIB_MAX_X_CHANNELS]

BFP array pointing to time domain reference input values from the previous frame. These are put together with the new samples received in the current frame to make a AEC_PROC_FRAME_LENGTH processing block. The prev_x data values are stored as length (AEC_PROC_FRAME_LENGTH - AEC_FRAME_ADVANCE), 32bit integer array per x channel.
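A minimal sketch of how history and new samples might be combined into one processing block is shown below. The sizes PROC_LEN and ADVANCE are small placeholders for AEC_PROC_FRAME_LENGTH and AEC_FRAME_ADVANCE, build_block is a hypothetical helper, and placing the history before the new samples is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative sizes only; real code would use AEC_PROC_FRAME_LENGTH and
 * AEC_FRAME_ADVANCE. */
#define PROC_LEN 8
#define ADVANCE  3
#define HISTORY  (PROC_LEN - ADVANCE)

/* Build a PROC_LEN block as [history | new samples], then keep the last
 * HISTORY samples of that block as the history for the next frame. */
static void build_block(int32_t block[PROC_LEN],
                        int32_t prev[HISTORY],
                        const int32_t new_samples[ADVANCE])
{
    assert(block != NULL && prev != NULL && new_samples != NULL);
    memcpy(block, prev, HISTORY * sizeof(int32_t));
    memcpy(block + HISTORY, new_samples, ADVANCE * sizeof(int32_t));
    memcpy(prev, block + ADVANCE, HISTORY * sizeof(int32_t));
}
```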

bfp_s32_t sigma_XX[AEC_LIB_MAX_X_CHANNELS]

BFP array pointing to sigma_XX values which are the weighted average of the X_energy signal. The sigma_XX data is stored as 32bit integer array of length AEC_FD_FRAME_LENGTH

float_s32_t y_ema_energy[AEC_LIB_MAX_Y_CHANNELS]

Exponential moving average of the time domain mic signal energy. This is calculated by calculating energy per sample and summing across all samples. Stored in a y channels array with every value stored as a 32bit integer mantissa and exponent.

float_s32_t x_ema_energy[AEC_LIB_MAX_X_CHANNELS]

Exponential moving average of the time domain reference signal energy. This is calculated by calculating energy per sample and summing across all samples. Stored in a x channels array with every value stored as a 32bit integer mantissa and exponent.

float_s32_t overall_Y[AEC_LIB_MAX_Y_CHANNELS]

Energy of the mic input spectrum. This is calculated by calculating the energy per bin and summing across all bins. Stored in a y channels array with every value stored as a 32bit integer mantissa and exponent.

float_s32_t sum_X_energy[AEC_LIB_MAX_X_CHANNELS]

Sum of the X_energy across all bins for a given x channel. Stored in a x channels array with every value stored as a 32bit integer mantissa and exponent.

coherence_mu_params_t coh_mu_state[AEC_LIB_MAX_Y_CHANNELS]

Structure containing coherence mu calculation related parameters.

shadow_filter_params_t shadow_filter_params

Structure containing shadow filter related parameters.

aec_config_params_t config_params

Structure containing AEC control parameters. These are initialised to the default values and can be changed at runtime by the user.

unsigned num_y_channels

Number of mic input channels that the AEC is configured for. This is the input parameter num_y_channels that aec_init() gets called with.

unsigned num_x_channels

Number of reference input channels that the AEC is configured for. This is the input parameter num_x_channels that aec_init() gets called with.

struct aec_state_t
#include <aec_state.h>

AEC state structure.

Data structures holding AEC persistent state. There are 2 instances of aec_state_t maintained within AEC; one for main filter and one for shadow filter specific state.

Public Members

bfp_complex_s32_t Y_hat[AEC_LIB_MAX_Y_CHANNELS]

BFP array pointing to estimated mic signal spectrum. The Y_hat data values are stored as length AEC_FD_FRAME_LENGTH, complex 32bit array per y channel.

bfp_complex_s32_t Error[AEC_LIB_MAX_Y_CHANNELS]

BFP array pointing to adaptive filter error signal spectrum. The Error data is stored as length AEC_FD_FRAME_LENGTH, complex 32bit array per y channel.

bfp_complex_s32_t H_hat[AEC_LIB_MAX_Y_CHANNELS][AEC_LIB_MAX_PHASES]

BFP array pointing to the adaptive filter spectrum. The filter spectrum is stored as a num_y_channels x total_phases_across_all_x_channels array where each H_hat[i][j] entry points to the spectrum of a single phase.

Number of phases in the filter refers to its tail length. A filter with more phases would be able to model a longer echo thereby causing better echo cancellation.

For example, for a 2 y-channels, 3 x-channels, 10 phases per x channel configuration, the filter spectrum phases are stored in a 2x30 array. For a given y channel, say y channel 0, H_hat[0][0] to H_hat[0][9] points to 10 phases of H_haty0x0, H_hat[0][10] to H_hat[0][19] points to 10 phases of H_haty0x1 and H_hat[0][20] to H_hat[0][29] points to 10 phases of H_haty0x2.

Each filter phase data which is pointed to by H_hat[i][j].data is stored as AEC_FD_FRAME_LENGTH complex 32bit array.
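The layout in the example above amounts to a simple index calculation, sketched below. The function name h_hat_phase_index is hypothetical and is only meant to illustrate the ordering described in the text.

```c
#include <assert.h>

/* Column index into H_hat[y][...] for phase `phase` of the filter driven by
 * reference channel `x_ch`, with phases for x channel 0 stored first,
 * followed by x channel 1, and so on. */
static unsigned h_hat_phase_index(unsigned x_ch, unsigned phase,
                                  unsigned phases_per_x)
{
    assert(phase < phases_per_x);
    return x_ch * phases_per_x + phase;
}
```

With 10 phases per x channel, the last phase of the filter for x channel 2 lands at column 2 * 10 + 9 = 29, matching the H_hat[0][20] to H_hat[0][29] range given above.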

bfp_complex_s32_t X_fifo_1d[AEC_LIB_MAX_PHASES]

BFP array pointing to all phases of reference input spectrum across all x channels. Here, the reference input spectrum is saved in a 1 dimensional array of phases, with x channel 0 phases followed by x channel 1 phases and so on. For example, for a 2 x-channels, 10 phases per x channel configuration, X_fifo_1d[0] to X_fifo_1d[9] points to the 10 phases for channel 0 and X_fifo_1d[10] to X_fifo_1d[19] points to the 10 phases for channel 1.

Each X data spectrum phase pointed to by X_fifo_1d[i].data is stored as length AEC_FD_FRAME_LENGTH complex 32bit array.

bfp_complex_s32_t T[AEC_LIB_MAX_X_CHANNELS]

BFP array pointing to T values which are stored as a length AEC_FD_FRAME_LENGTH, complex array per x channel.

bfp_s32_t inv_X_energy[AEC_LIB_MAX_X_CHANNELS]

BFP array pointing to the normalisation spectrum which are stored as a length AEC_FD_FRAME_LENGTH, 32bit integer array per x channel.

bfp_s32_t X_energy[AEC_LIB_MAX_X_CHANNELS]

BFP array pointing to the X_energy data which is the energy per bin of the X spectrum summed over all phases of the X data. X_energy data is stored as a length AEC_FD_FRAME_LENGTH, integer 32bit array per x channel.

bfp_s32_t overlap[AEC_LIB_MAX_Y_CHANNELS]

BFP array pointing to time domain overlap data values which are used in the overlap add operation done while calculating the echo canceller time domain output. Stored as a length 32, 32 bit integer array per y channel.

bfp_s32_t y_hat[AEC_LIB_MAX_Y_CHANNELS]

BFP array pointing to the time domain estimated mic signal. Stored as length AEC_PROC_FRAME_LENGTH, 32 bit integer array per y channel.

bfp_s32_t error[AEC_LIB_MAX_Y_CHANNELS]

BFP array pointing to the time domain adaptive filter error signal. Stored as length AEC_PROC_FRAME_LENGTH, 32 bit integer array per y channel.

float_s32_t mu[AEC_LIB_MAX_Y_CHANNELS][AEC_LIB_MAX_X_CHANNELS]

mu values for every x-y pair stored as 32 bit integer mantissa and 32 bit integer exponent

float_s32_t error_ema_energy[AEC_LIB_MAX_Y_CHANNELS]

Exponential moving average of the time domain adaptive filter error signal energy. Stored in a y channels array with every value stored as a 32bit integer mantissa and exponent.

float_s32_t overall_Error[AEC_LIB_MAX_Y_CHANNELS]

Energy of the adaptive filter error spectrum. Stored in a y channels array with every value stored as a 32bit integer mantissa and exponent.

float_s32_t max_X_energy[AEC_LIB_MAX_X_CHANNELS]

Maximum X energy across all values of X_energy for a given x channel. Stored in an x channels array with every value stored as a 32bit integer mantissa and exponent.

float_s32_t delta_scale

fractional regularisation scalefactor.

float_s32_t delta

delta parameter used in the normalisation spectrum calculation.

aec_shared_state_t *shared_state

pointer to the state data shared between main and shadow filter.

unsigned num_phases

Number of filter phases per x-y pair that AEC filter is configured for. This is the input argument num_main_filter_phases or num_shadow_filter_phases, depending on which filter the aec_state_t is instantiated for, passed in aec_init() call.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Acoustic Echo Canceller Library$$$API Reference$$$AEC #define constants£££modules/voice/modules/lib_aec/doc/src/reference/defines.html#aec-define-constants
group aec_defines

Defines

AEC_LIB_MAX_Y_CHANNELS

Maximum number of microphone input channels supported in the library. Microphone input to the AEC refers to the input from the device’s microphones from which AEC removes the echo created in the room by the device’s loudspeakers.

AEC functions follow the convention of using \(y\) and \(Y\) for referring to time domain and frequency domain representation of microphone input.

The num_y_channels passed into aec_init() call should be less than or equal to AEC_LIB_MAX_Y_CHANNELS. This define is only used for defining data structures in the aec_state. The library code implementation uses only the num_y_channels aec is initialised for in the aec_init() call.

AEC_LIB_MAX_X_CHANNELS

Maximum number of reference input channels supported in the library. Reference input to the AEC refers to a copy of the device’s speaker output audio that is also sent as an input to the AEC. It is used to model the echo characteristics between a mic-loudspeaker pair.

AEC functions follow the convention of using \(x\) and \(X\) for referring to time domain and frequency domain representation of reference input.

The num_x_channels passed into aec_init() call should be less than or equal to AEC_LIB_MAX_X_CHANNELS. This define is only used for defining data structures in the aec_state. The library code implementation uses only the num_x_channels aec is initialised for in the aec_init() call.

AEC_FRAME_ADVANCE

AEC frame size. This is the number of samples of new data that the AEC works on every frame. 240 samples at 16kHz is 15msec. Every frame, the echo canceller takes in 15msec of mic and reference data and generates 15msec of echo cancelled output.

AEC_PROC_FRAME_LENGTH

Time domain samples block length used internally in AEC’s block LMS algorithm

AEC_FD_FRAME_LENGTH

Number of bins of spectrum data computed when doing a DFT of a AEC_PROC_FRAME_LENGTH length time domain vector. The AEC_FD_FRAME_LENGTH spectrum values represent the bins from DC to Nyquist.

AEC_LIB_MAX_PHASES

Maximum total number of phases supported in the AEC library. Total phases are calculated by summing phases across adaptive filters for all x-y pairs.

For example, for a 2 y-channels, 2 x-channels, 10 phases per x channel configuration, there are 4 adaptive filters, H_haty0x0, H_haty0x1, H_haty1x0 and H_haty1x1, each filter having 10 phases, so the total number of phases is 40. When aec_init() is called to initialise the AEC, the num_y_channels, num_x_channels and num_main_filter_phases parameters passed in should be such that num_y_channels * num_x_channels * num_main_filter_phases is less than or equal to AEC_LIB_MAX_PHASES.

This define is only used when defining data structures within the AEC state structure. The AEC algorithm implementation uses the num_main_filter_phases and num_shadow_filter_phases values that are passed into aec_init().
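The constraint on the aec_init() arguments could be checked before initialisation with something like the sketch below. EXAMPLE_MAX_PHASES is a placeholder value chosen for illustration (real code would compare against AEC_LIB_MAX_PHASES), and total_phases and config_fits are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder limit for this sketch; use AEC_LIB_MAX_PHASES in real code. */
#define EXAMPLE_MAX_PHASES 40u

/* Total phases summed over the adaptive filters of all x-y pairs. */
static unsigned total_phases(unsigned y_channels, unsigned x_channels,
                             unsigned phases_per_filter)
{
    assert(phases_per_filter > 0);
    return y_channels * x_channels * phases_per_filter;
}

/* True if the main filter configuration fits within the phase limit. */
static bool config_fits(unsigned y_channels, unsigned x_channels,
                        unsigned num_main_filter_phases)
{
    return total_phases(y_channels, x_channels, num_main_filter_phases)
           <= EXAMPLE_MAX_PHASES;
}
```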

AEC_UNUSED_TAPS_PER_PHASE

Overlap data length

AEC_FFT_PADDING

Extra 2 samples you need to allocate in time domain so that the full spectrum (DC to nyquist) can be stored after the in-place FFT. NOT USER MODIFIABLE.
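The relationship between the time domain block length, the number of spectrum bins and the 2 padding samples follows from the size of a real-input DFT, as the sketch below shows. The function names are illustrative and not lib_aec identifiers.

```c
#include <assert.h>

/* Bins from DC to Nyquist (inclusive) for a real DFT of even length n. */
static unsigned num_bins(unsigned n)
{
    assert(n % 2u == 0u);
    return n / 2u + 1u;
}

/* Extra real-valued slots needed to hold num_bins(n) complex values in the
 * space of n real samples: 2 * (n / 2 + 1) - n = 2. */
static unsigned fft_padding(unsigned n)
{
    return 2u * num_bins(n) - n;
}
```

For any even block length the padding works out to 2 samples; for a 512-sample block, for instance, there are 257 bins.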

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Acoustic Echo Canceller Library$$$API Reference$$$AEC API£££modules/voice/modules/lib_aec/doc/src/reference/api/index.html#aec-api
XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Acoustic Echo Canceller Library$$$API Reference$$$AEC High Level API Functions£££modules/voice/modules/lib_aec/doc/src/reference/api/high_level_api.html#aec-high-level-api-functions
group aec_func

Functions

void aec_init(aec_state_t *main_state, aec_state_t *shadow_state, aec_shared_state_t *shared_state, uint8_t *main_mem_pool, uint8_t *shadow_mem_pool, unsigned num_y_channels, unsigned num_x_channels, unsigned num_main_filter_phases, unsigned num_shadow_filter_phases)

Initialise AEC data structures.

This function initializes AEC data structures for a given configuration. The configuration parameters num_y_channels, num_x_channels, num_main_filter_phases and num_shadow_filter_phases are passed in as input arguments.

This function needs to be called at startup to first initialise the AEC and subsequently whenever the AEC configuration changes.

main_state, shadow_state and shared_state structures must start at double word aligned addresses.

main_mem_pool and shadow_mem_pool must point to memory buffers big enough to support main and shadow filter processing. AEC state aec_state_t and shared state aec_shared_state_t structures contain only the BFP data structures used in the AEC. The memory these BFP structures will point to needs to be provided by the user in the main and shadow filter memory pools. An example memory pool structure is present in aec_memory_pool_t and aec_shadow_filt_memory_pool_t.

main_mem_pool and shadow_mem_pool must also start at double word aligned addresses.

Example

#include "aec_memory_pool.h"
aec_state_t DWORD_ALIGNED main_state;
aec_state_t DWORD_ALIGNED shadow_state;
aec_shared_state_t DWORD_ALIGNED aec_shared_state;
uint8_t DWORD_ALIGNED aec_mem[sizeof(aec_memory_pool_t)];
uint8_t DWORD_ALIGNED aec_shadow_mem[sizeof(aec_shadow_filt_memory_pool_t)];
unsigned y_chans = 2, x_chans = 2;
unsigned main_phases = 10, shadow_phases = 5;
// There is one main and one shadow filter per x-y channel pair, so for this example there will be 4 main and 4
// shadow filters. Each main filter will have 10 phases and each shadow filter will have 5 phases.
aec_init(&main_state, &shadow_state, &aec_shared_state, aec_mem, aec_shadow_mem, y_chans, x_chans, main_phases, shadow_phases);

Parameters:
  • main_state[inout] AEC state structure for holding main filter specific state

  • shadow_state[inout] AEC state structure for holding shadow filter specific state

  • shared_state[inout] Shared state structure for holding state that is common to main and shadow filter

  • main_mem_pool[inout] Memory pool containing main filter memory buffers

  • shadow_mem_pool[inout] Memory pool containing shadow filter memory buffers

  • num_y_channels[in] Number of mic input channels

  • num_x_channels[in] Number of reference input channels

  • num_main_filter_phases[in] Number of phases in the main filter

  • num_shadow_filter_phases[in] Number of phases in the shadow filter

void aec_frame_init(aec_state_t *main_state, aec_state_t *shadow_state, const int32_t (*y_data)[AEC_FRAME_ADVANCE], const int32_t (*x_data)[AEC_FRAME_ADVANCE])

Initialise AEC data structures for processing a new frame.

This is the first function that is called when a new frame is available for processing. It takes the new samples as input and combines the new samples and previous frame’s history to create a processing block on which further processing happens. It also initialises some data structures that need to be initialised at the beginning of a frame.

Note

The memory of the y_data and x_data buffers is free to be reused after this function call.

Parameters:
  • main_state[inout] main filter state

  • shadow_state[inout] shadow filter state

  • y_data[in] pointer to mic input buffer

  • x_data[in] pointer to reference input buffer

void aec_calc_freq_domain_energy(float_s32_t *fd_energy, const bfp_complex_s32_t *input)

Calculate energy in the spectrum.

This function calculates the energy of frequency domain data used in the AEC. Frequency domain data in AEC is in the form of complex 32bit vectors and energy is calculated as the squared magnitude of the input vector.

Parameters:
  • fd_energy[out] energy of the input spectrum

  • input[in] input spectrum BFP structure
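The energy calculation described above can be illustrated in plain floating point. This is only a sketch: lib_aec operates on block floating-point (bfp_complex_s32_t) data, and the helper name spectrum_energy is hypothetical, not part of the library.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical helper (not a lib_aec function): energy of a spectrum,
 * computed as the sum of the squared magnitudes of its bins. */
double spectrum_energy(const double complex *bins, size_t num_bins)
{
    double energy = 0.0;
    for (size_t k = 0; k < num_bins; k++) {
        double re = creal(bins[k]);
        double im = cimag(bins[k]);
        energy += re * re + im * im; /* |X[k]|^2 */
    }
    return energy;
}
```

For a three-bin spectrum {3+4i, 1, -2i} the helper returns 25 + 1 + 4 = 30.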

void aec_calc_time_domain_ema_energy(float_s32_t *ema_energy, const bfp_s32_t *input, unsigned start_offset, unsigned length, const aec_config_params_t *conf)

Calculate exponential moving average (EMA) energy of a time domain (TD) vector.

This function calculates the EMA energy of AEC time domain data which is in the form of real 32bit vectors.

This function can be called to calculate the EMA energy of subsets of the input vector as well.

Parameters:
  • ema_energy[out] EMA energy of the input

  • input[in] time domain input BFP structure

  • start_offset[in] offset in the input vector from where to start calculating EMA energy

  • length[in] length over which to calculate EMA energy

  • conf[in] AEC configuration parameters.
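As a rough illustration of an EMA energy update over a sub-range of a vector, the sketch below uses plain doubles. The helper name and the explicit alpha argument are assumptions made for this example; the library function takes its settings from the conf argument and works on BFP data.

```c
#include <stddef.h>

/* Hypothetical helper (not a lib_aec function): update an EMA energy
 * estimate with the energy of input[start_offset .. start_offset+length-1].
 * alpha is an assumed smoothing factor in [0, 1]. */
double ema_energy_update(double ema_energy, const double *input,
                         size_t start_offset, size_t length, double alpha)
{
    double energy = 0.0;
    for (size_t i = 0; i < length; i++) {
        double s = input[start_offset + i];
        energy += s * s;
    }
    /* Standard EMA recurrence: keep alpha of the old estimate and blend in
     * (1 - alpha) of the new measurement. */
    return alpha * ema_energy + (1.0 - alpha) * energy;
}
```

With alpha close to 1 the estimate changes slowly; with alpha equal to 0 it simply tracks the energy of the selected samples.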

void aec_forward_fft(bfp_complex_s32_t *output, bfp_s32_t *input)

Calculate Discrete Fourier Transform (DFT) spectrum of an input time domain vector.

This function calculates the spectrum of a real 32bit time domain vector. It calculates an N point real DFT, where N is the length of the input vector, and outputs a complex 32bit vector of length N/2+1. The N/2+1 complex output values represent spectrum samples from DC up to the Nyquist frequency.

The DFT calculation is done in place. After this function call the input and output BFP structures data fields point to the same memory. Since DFT is calculated in place, use of the input BFP struct is undefined after this function.

To allow for an in-place transform from N real 32bit values to N/2+1 complex 32bit values, the input vector should have 2 extra real 32bit samples worth of memory. This means that input->data should point to a buffer of length input->length+2.

After this function input->data and output->data point to the same memory address.

Parameters:
  • output[out] DFT output BFP structure

  • input[in] DFT input BFP structure

void aec_inverse_fft(bfp_s32_t *output, bfp_complex_s32_t *input)

Calculate inverse Discrete Fourier Transform (DFT) of an input spectrum.

This function calculates an N point inverse real DFT of a complex 32bit vector, where N is 2*(length-1) and length is the length of the input vector. The output is a real 32bit vector of length N.

The inverse DFT calculation is done in place. After this operation the input and the output BFP structures data fields point to the same memory. Since the calculation is done in place, use of input BFP struct after this function is undefined.

After this function input->data and output->data point to the same memory address.

Parameters:
  • output[out] inverse DFT output BFP structure

  • input[in] inverse DFT input BFP structure
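The length relationships stated for aec_forward_fft() and aec_inverse_fft() can be checked with a little arithmetic. The helpers below are hypothetical (they are not part of lib_aec) and assume an even N.

```c
#include <stddef.h>

/* Number of complex bins (DC to Nyquist) produced by an N point real DFT. */
size_t rdft_num_bins(size_t n_real)
{
    return n_real / 2 + 1;
}

/* Number of real 32-bit words the buffer must hold for the in-place forward
 * transform: each complex bin occupies two words (real and imaginary). */
size_t rdft_inplace_words(size_t n_real)
{
    return 2 * rdft_num_bins(n_real);
}

/* Length of the real vector produced by the inverse transform of a
 * spectrum with num_bins complex bins. */
size_t irdft_num_samples(size_t num_bins)
{
    return 2 * (num_bins - 1);
}
```

Taking N = 512 purely as an example, the forward transform yields 257 bins that occupy 514 words, which is the N + 2 samples of storage mentioned above, and the inverse transform of 257 bins returns 512 real samples.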

void aec_calc_X_fifo_energy(aec_state_t *state, unsigned ch, unsigned recalc_bin)

Calculate total energy of the X FIFO.

X FIFO is a FIFO of the most recent X frames, where X is spectrum of one frame of reference input. There’s a common X FIFO that is shared between main and shadow filters. It holds num_main_filter_phases most recent X frames and the shadow filter uses num_shadow_filter_phases most recent frames out of it.

This function calculates the energy per X sample index, summed across the X FIFO phases. It also calculates the maximum energy across all sample indices of the output energy vector.

Note

This function implements some speed optimisations which introduce quantisation error. To stop quantisation error build up, in every call of this function, energy for one sample index, which is specified in the recalc_bin argument, is recalculated without the optimisations. There are a total of AEC_FD_FRAME_LENGTH samples in the energy vector, so recalc_bin keeps cycling through indexes 0 to AEC_PROC_FRAME_LENGTH/2.

Parameters:
  • state[inout] AEC state. state->X_energy[ch] and state->max_X_energy[ch] are updated

  • ch[in] channel index for which energy calculations are done

  • recalc_bin[in] The sample index for which energy is recalculated to eliminate quantisation errors
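The cycling of recalc_bin described in the note can be sketched as a small counter update performed once per frame by the caller. The helper name is hypothetical, and proc_frame_length stands in for AEC_PROC_FRAME_LENGTH.

```c
/* Hypothetical helper (not a lib_aec function): advance recalc_bin so that
 * it cycles through 0 .. proc_frame_length / 2 inclusive, one step per
 * frame. */
unsigned next_recalc_bin(unsigned recalc_bin, unsigned proc_frame_length)
{
    /* Number of bins from DC to Nyquist, i.e. the energy vector length. */
    unsigned fd_frame_length = proc_frame_length / 2 + 1;
    return (recalc_bin + 1) % fd_frame_length;
}
```

The caller would keep recalc_bin between frames, pass it to aec_calc_X_fifo_energy(), and then advance it with a helper like this so that every bin is recalculated without the optimisations once per cycle.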

void aec_update_X_fifo_and_calc_sigmaXX(aec_state_t *state, unsigned ch)

Update X FIFO with the current X frame.

This function updates the X FIFO by removing the oldest X frame from it and adding the current X frame to it. This function also calculates sigmaXX, which is the exponential moving average of the current X frame energy.

Parameters:
  • state[inout] AEC state structure. state->shared_state->X_fifo[ch] and state->shared_state->sigma_XX[ch] are updated.

  • ch[in] X channel index for which to update X FIFO

void aec_calc_Error_and_Y_hat(aec_state_t *state, unsigned ch)

Calculate error spectrum and estimated mic signal spectrum.

This function calculates the error spectrum (Error) and the estimated mic input spectrum (Y_hat). Y_hat is calculated as the sum of all phases of the adaptive filter multiplied by the respective phases of the reference input spectrum. Error is calculated by subtracting Y_hat from the mic input spectrum Y.

Parameters:
  • state[inout] AEC state structure. state->Error[ch] and state->Y_hat[ch] are updated

  • ch[in] mic channel index for which to compute Error and Y_hat
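For a single frequency bin and a single reference channel, the calculation described above can be sketched in floating point as follows. The function name and the use of double complex are assumptions for illustration; lib_aec performs this on BFP vectors covering all bins.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical single-bin illustration (not a lib_aec function):
 *   Y_hat = sum over phases of H_hat[ph] * X_fifo[ph]
 *   Error = Y - Y_hat */
void calc_error_and_y_hat_bin(double complex *error, double complex *y_hat,
                              double complex y,
                              const double complex *x_fifo,
                              const double complex *h_hat,
                              size_t num_phases)
{
    double complex acc = 0.0;
    for (size_t ph = 0; ph < num_phases; ph++) {
        acc += h_hat[ph] * x_fifo[ph];
    }
    *y_hat = acc;
    *error = y - acc;
}
```

If the filter models the echo path well, Y_hat approaches the echo component of Y and the magnitude of Error becomes small when only far-end signal is present.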

void aec_calc_coherence(aec_state_t *state, unsigned ch)

Calculate coherence.

This function calculates the average coherence between the mic input signal (y) and the estimated mic signal (y_hat). A metric is calculated using y and y_hat, and a moving average (coh) and a slow moving average (coh_slow) of that metric are calculated. The coherence values are used to distinguish between situations when filter adaption should continue or freeze, and to update mu accordingly.

Parameters:
  • state[inout] AEC state structure. state->shared_state->coh_mu_state[ch].coh and state->shared_state->coh_mu_state[ch].coh_slow are updated

  • ch[in] mic channel index for which to calculate average coherence

void aec_calc_output(aec_state_t *state, int32_t (*output)[AEC_FRAME_ADVANCE], unsigned ch)

Calculate AEC filter output signal.

This function is responsible for windowing the filter error signal and creating AEC filter output that can be propagated to downstream stages. output is calculated by overlapping and adding current frame’s windowed error signal with the previous frame windowed error. This is done to smooth discontinuities in the output as the filter adapts.

Parameters:
  • state[inout] AEC state structure. state->error[ch]

  • output[out] pointer to the output buffer

  • ch[in] mic channel index for which to calculate output

void aec_calc_normalisation_spectrum(aec_state_t *state, unsigned ch, unsigned is_shadow)

Calculate normalisation spectrum.

This function calculates the normalisation spectrum of the reference input signal. This normalised spectrum is later used during filter adaption to scale the adaption to the size of the input signal. The normalisation spectrum is calculated as a time and frequency smoothed energy of the reference input spectrum.

The normalisation spectrum is calculated differently for the main and shadow filters, so a flag indicating whether this calculation is being done for the main or shadow filter is passed as an input to the function.

Parameters:
  • state[inout] AEC state structure. state->inv_X_energy[ch] is updated

  • ch[in] reference channel index for which to calculate normalisation spectrum

  • is_shadow[in] flag indicating filter type. 0: Main filter, 1: Shadow filter

void aec_compare_filters_and_calc_mu(aec_state_t *main_state, aec_state_t *shadow_state)

Compare and update filters. Calculate the adaption step size mu.

This function has 2 responsibilities. First, it compares the energies in the error spectrums of the main and shadow filter with each other and with the mic input spectrum energy, and makes an estimate of how well the filters are performing. Based on this, it optionally modifies the filters by either resetting the filter coefficients or copying one filter into another. Second, it uses the coherence values calculated in aec_calc_coherence as well as information from filter comparison done in step 1 to calculate the adaption step size mu.

Parameters:
  • main_state[inout] AEC state structure for the main filter

  • shadow_state[inout] AEC state structure for the shadow filter

void aec_calc_T(aec_state_t *state, unsigned y_ch, unsigned x_ch)

Calculate the parameter T.

This function calculates a parameter referred to as T that is later used to scale the reference input spectrum in the filter update step. T is a function of the adaption step size mu, normalisation spectrum inv_X_energy and the filter error spectrum Error.

Parameters:
  • state[inout] AEC state structure. state->T[x_ch] is updated

  • y_ch[in] mic channel index

  • x_ch[in] reference channel index

void aec_filter_adapt(aec_state_t *state, unsigned y_ch)

Update filter.

This function updates the adaptive filter spectrum (H_hat). It calculates the delta update that is applied to the filter by scaling the X FIFO with the T values computed in aec_calc_T() and applies the delta update to H_hat. A gradient constraint FFT is then applied to constrain the length of each phase of the filter to avoid wrapping when calculating y_hat.

Parameters:
  • state[inout] AEC state structure. state->H_hat[y_ch] is updated

  • y_ch[in] mic channel index

void aec_update_X_fifo_1d(aec_state_t *state)

Update the X FIFO alternate BFP structure.

The X FIFO BFP structure is maintained in 2 forms: as a 2 dimensional [x_channels][num_phases] array and as a 1 dimensional [x_channels * num_phases] array. This is done in order to optimally access the X FIFO as needed in different functions. After the X FIFO is updated with the current X frame, this function is called in order to copy the 2 dimensional BFP structure into its 1 dimensional counterpart.

Parameters:
  • state[inout] AEC state structure. state->X_fifo_1d is updated
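The relationship between the two X FIFO layouts can be pictured as a simple index mapping. The sketch below assumes channel-major ordering (all phases of channel 0, then all phases of channel 1, and so on); that ordering and the helper name are assumptions for illustration, not taken from the library source.

```c
#include <stddef.h>

/* Hypothetical helper (not a lib_aec function): position of the 2
 * dimensional entry [ch][ph] in the flattened 1 dimensional array,
 * assuming channel-major ordering. */
size_t x_fifo_1d_index(size_t ch, size_t ph, size_t num_phases)
{
    return ch * num_phases + ph;
}
```

With 10 phases per channel, entry [1][4] of the 2 dimensional form would land at index 14 of the 1 dimensional form under this assumption.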

float_s32_t aec_calc_corr_factor(aec_state_t *state, unsigned ch)

Calculate a correlation metric between the microphone input and estimated microphone signal.

This function calculates a metric of resemblance between the mic input and the estimated mic signal. The correlation metric, along with reference signal energy is used to infer presence of near and far end signals in the AEC mic input.

Parameters:
  • state[in] AEC state structure. state->y and state->y_hat are used to calculate the correlation metric

  • ch[in] mic channel index for which to calculate the metric

Returns:

correlation metric in float_s32_t format

float_s32_t aec_calc_max_input_energy(const int32_t (*input_data)[AEC_FRAME_ADVANCE], int num_channels)

Calculate the energy of the input signal.

This function calculates the sum of the energy across all samples of the time domain input channel and returns the maximum energy across all channels.

Parameters:
  • input_data[in] Pointer to the input data buffer. The input is assumed to be in Q1.31 fixed point format.

  • num_channels[in] Number of input channels.

Returns:

Maximum energy in float_s32_t format.
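A floating-point sketch of the same idea is shown below: sum the squared samples of each channel and return the largest sum. The helper name, the flat channel-by-channel buffer layout and the frame_len parameter (standing in for AEC_FRAME_ADVANCE) are assumptions for this example; the library returns the result as a float_s32_t.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a lib_aec function): per-channel sum of squared
 * samples, returning the largest sum. Samples are Q1.31, so each one is
 * scaled by 2^-31 before squaring. */
double max_input_energy(const int32_t *input_data, size_t num_channels,
                        size_t frame_len)
{
    const double q31_scale = 1.0 / 2147483648.0; /* 2^-31 */
    double max_energy = 0.0;
    for (size_t ch = 0; ch < num_channels; ch++) {
        double energy = 0.0;
        for (size_t i = 0; i < frame_len; i++) {
            double s = (double)input_data[ch * frame_len + i] * q31_scale;
            energy += s * s;
        }
        if (energy > max_energy) {
            max_energy = energy;
        }
    }
    return max_energy;
}
```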

void aec_reset_state(aec_state_t *main_state, aec_state_t *shadow_state)

Reset parts of aec state structure.

This function resets parts of AEC state so that the echo canceller starts adapting from a zero filter.

Parameters:
  • main_state[in] Pointer to the AEC main filter state structure.

  • shadow_state[in] Pointer to the AEC shadow filter state structure.

uint32_t aec_detect_input_activity(const int32_t (*input_data)[AEC_FRAME_ADVANCE], float_s32_t active_threshold, int32_t num_channels)

Detect activity on input channels.

This function implements a quick check for detecting activity on the input channels. It detects signal presence by checking if the maximum sample in the time domain input frame is above a given threshold.

Parameters:
  • input_data[in] Pointer to input data frame. Input is assumed to be in Q1.31 fixed point format.

  • active_threshold[in] Threshold for detecting signal activity

  • num_channels[in] Number of input data channels

Returns:

0 if no signal activity on the input channels, 1 if activity detected on the input channels
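The peak check described above can be sketched as follows. This is an illustration only: the helper name is hypothetical, the buffer is treated as one flat run of samples, and the threshold is expressed in the same Q1.31 integer units as the samples, whereas the library function takes a float_s32_t threshold.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a lib_aec function): returns 1 if the magnitude
 * of any sample is above the threshold, otherwise 0. */
uint32_t detect_input_activity(const int32_t *input_data, size_t num_samples,
                               int64_t threshold)
{
    for (size_t i = 0; i < num_samples; i++) {
        int64_t s = input_data[i];
        /* 64-bit arithmetic avoids overflow when negating INT32_MIN. */
        int64_t mag = (s < 0) ? -s : s;
        if (mag > threshold) {
            return 1;
        }
    }
    return 0;
}
```

A check of this kind is cheap compared with running the full filter, which makes it suitable as a quick test before further processing.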

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Acoustic Echo Canceller Library$$$API Reference$$$AEC Low Level API Functions (STILL WIP)£££modules/voice/modules/lib_aec/doc/src/reference/api/low_level_api.html#aec-low-level-api-functions-still-wip
group aec_low_level_func

Functions

void aec_l2_calc_Error_and_Y_hat(bfp_complex_s32_t *Error, bfp_complex_s32_t *Y_hat, const bfp_complex_s32_t *Y, const bfp_complex_s32_t *X_fifo, const bfp_complex_s32_t *H_hat, unsigned num_x_channels, unsigned num_phases, unsigned start_offset, unsigned length, int32_t bypass_enabled)

Calculate Error and Y_hat for a channel over a range of bins.

void aec_l2_adapt_plus_fft_gc(bfp_complex_s32_t *H_hat_ph, const bfp_complex_s32_t *X_fifo_ph, const bfp_complex_s32_t *T_ph)

Adapt one phase of the adaptive filter.

void aec_l2_bfp_complex_s32_unify_exponent(bfp_complex_s32_t *chunks, int32_t *final_exp, uint32_t *final_hr, const uint32_t *mapping, uint32_t array_len, uint32_t desired_index, uint32_t min_headroom)

Unify bfp_complex_s32_t chunks into a single exponent and headroom.

void aec_l2_bfp_s32_unify_exponent(bfp_s32_t *chunks, int32_t *final_exp, uint32_t *final_hr, const uint32_t *mapping, uint32_t array_len, uint32_t desired_index, uint32_t min_headroom)

Unify bfp_s32_t chunks into a single exponent and headroom.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Acoustic Echo Canceller Library$$$API Reference$$$lib_aec Header Files£££modules/voice/modules/lib_aec/doc/src/reference/header_files.html#lib-aec-header-files
XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Acoustic Echo Canceller Library$$$API Reference$$$aec_defines.h£££modules/voice/modules/lib_aec/doc/src/reference/header_files.html#aec-defines-h
page page_aec_defines_h

This header contains lib_aec public defines

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Acoustic Echo Canceller Library$$$API Reference$$$aec_state.h£££modules/voice/modules/lib_aec/doc/src/reference/header_files.html#aec-state-h
page page_aec_state_h

This header contains definitions for data structures and enums used in lib_aec.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Acoustic Echo Canceller Library$$$API Reference$$$aec_api.h£££modules/voice/modules/lib_aec/doc/src/reference/header_files.html#aec-api-h
page page_aec_api_h

lib_aec public functions API.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Acoustic Echo Canceller Library$$$On GitHub£££modules/voice/modules/lib_aec/doc/src/reference/header_files.html#on-github

lib_aec is present as part of fwk_voice. Get the latest version of fwk_voice from https://github.com/xmos/fwk_voice. lib_aec is present within the modules/lib_aec directory in fwk_voice.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Acoustic Echo Canceller Library$$$API£££modules/voice/modules/lib_aec/doc/src/reference/header_files.html#api

To use the functions in this library in an application, include aec_api.h in the application source file.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Noise Suppression Library£££modules/voice/modules/lib_ns/doc/index.html#noise-suppression-library

lib_ns is a library which performs Noise Suppression (NS) by estimating the noise and subtracting it from the frame. lib_ns library functions make use of functionality provided in lib_xcore_math to perform DSP operations. For more details, refer to NS Overview.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Noise Suppression Library$$$Repository Structure£££modules/voice/modules/lib_ns/doc/src/getting_started.html#repository-structure
  • modules/lib_ns - The actual lib_ns library directory within https://github.com/xmos/fwk_voice/. Within lib_ns:

    • api/ - Headers containing the public API for lib_ns.

    • doc/ - Library documentation source (for non-embedded documentation) and build directory.

    • src/ - Library source code.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Noise Suppression Library$$$Requirements£££modules/voice/modules/lib_ns/doc/src/getting_started.html#requirements

lib_ns is included as part of the fwk_voice github repository and all requirements for cloning and building fwk_voice apply. lib_ns is compiled as a static library as part of the overall fwk_voice build. It depends on lib_xcore_math.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Noise Suppression Library$$$Getting and Building£££modules/voice/modules/lib_ns/doc/src/getting_started.html#getting-and-building

This module is part of the parent fwk_voice repo clone. It is compiled as a static library as part of the fwk_voice compilation process.

To include lib_ns in an application as a static library, link the generated libfwk_voice_module_lib_ns.a into the application. Add lib_ns/api to the include directories when building the application.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Noise Suppression Library$$$NS Overview£££modules/voice/modules/lib_ns/doc/src/overview.html#ns-overview

The lib_ns library provides an API to implement Noise Suppression within an application.

The noise suppressor estimates the probability of speech presence and dynamically adapts its coefficients to estimate the noise levels to subtract from the input. The filter will automatically reset its noise estimations every 10 frames.

The NS takes as input a frame of data from an audio channel. This could be the microphone input or the output of another module in the application.

Noise Suppression is performed on a frame-by-frame basis. Each frame consists of 15ms of data, which is 240 samples at 16kHz input sampling frequency. Input data is expected to be in a fixed-point 32-bit 1.31 format.

Before processing any frames, the application must configure and initialise the NS instance by calling ns_init(). Then for each frame, ns_process_frame() will update the NS instance’s internal state and produce the output frame by applying the NS algorithm to the input frame.

If multiple channels need to be processed by the application, or multiple outputs are required, an independent instance of the NS must be run for each channel.
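The 1.31 input format mentioned above stores a value in the range [-1.0, 1.0) as a signed 32-bit integer scaled by 2^31. The conversion helpers below are a minimal sketch for preparing or inspecting frame data; their names are hypothetical and they are not part of lib_ns.

```c
#include <stdint.h>

/* Hypothetical helper (not a lib_ns function): convert a sample in
 * [-1.0, 1.0) to 1.31 fixed point, saturating values outside that range. */
int32_t float_to_q31(double x)
{
    double scaled = x * 2147483648.0; /* multiply by 2^31 */
    if (scaled >= 2147483647.0) {
        return INT32_MAX;
    }
    if (scaled <= -2147483648.0) {
        return INT32_MIN;
    }
    return (int32_t)scaled; /* truncates toward zero */
}

/* Hypothetical helper: convert a 1.31 fixed-point sample back to a double. */
double q31_to_float(int32_t x)
{
    return (double)x / 2147483648.0;
}
```

For example, 0.5 maps to 0x40000000, and +1.0, which is not representable, saturates to INT32_MAX.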

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Noise Suppression Library$$$API Reference£££modules/voice/modules/lib_ns/doc/src/reference/index.html#api-reference
XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Noise Suppression Library$$$API Reference$$$NS API Functions£££modules/voice/modules/lib_ns/doc/src/reference/api.html#ns-api-functions
group ns_func

Functions

void ns_init(ns_state_t *ns)

Initialise the NS.

This function initialises the NS state. It must be called at startup to initialise the NS before processing any frames, and can be called at any time after that to reset the NS instance, returning the internal NS state to its defaults.

Example

ns_state_t ns;
ns_init(&ns);

Parameters:
  • ns[out] NS state structure

void ns_process_frame(ns_state_t *ns, int32_t output[NS_FRAME_ADVANCE], const int32_t input[NS_FRAME_ADVANCE])

Perform NS processing on a frame of input data.

This function updates the NS’s internal state based on the input 1.31 frame, and returns an output 1.31 frame containing the result of the NS algorithm applied to the input.

The input and output pointers can be equal to perform the processing in-place.

Example

int32_t input[NS_FRAME_ADVANCE];
int32_t output[NS_FRAME_ADVANCE];
ns_state_t ns;
ns_init(&ns);
ns_process_frame(&ns, output, input);

Parameters:
  • ns[inout] NS state structure

  • output[out] Array to return the resulting frame of data

  • input[in] Array of frame data on which to perform the NS

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Noise Suppression Library$$$API Reference$$$NS API Structure Definitions£££modules/voice/modules/lib_ns/doc/src/reference/defines.html#ns-api-structure-definitions
group ns_defs

Defines

NS_FRAME_ADVANCE

Length of the frame of data on which the NS will operate.

NS_PROC_FRAME_LENGTH

Time domain samples block length used internally.

NS_PROC_FRAME_BINS

Number of bins of spectrum data computed when doing a DFT of a NS_PROC_FRAME_LENGTH length time domain vector. The NS_PROC_FRAME_BINS spectrum values represent the bins from DC to Nyquist.

NS_INT_EXP

The exponent used internally to keep q1.31 format.

NS_WINDOW_LENGTH

The length of the window applied in time domain

struct ns_state_t
#include <ns_state.h>

NS state structure.

This structure holds the current state of the NS instance and members are updated each time that ns_process_frame() runs. Many of these members are exponentially-weighted moving averages (EWMA) which influence the behaviour of the NS filter. The user should not directly modify any of these members.

Public Members

bfp_s32_t S

BFP structure to hold the local energy.

bfp_s32_t S_min

BFP structure to hold the minimum local energy within 10 frames.

bfp_s32_t S_tmp

BFP structure to hold the temporary local energy.

bfp_s32_t p

BFP structure to hold the conditional signal presence probability

bfp_s32_t alpha_d_tilde

BFP structure to hold the time-varying smoothing parameter.

bfp_s32_t lambda_hat

BFP structure to hold the noise estimation.

int32_t data_S[NS_PROC_FRAME_BINS]

int32_t array to hold the data for S.

int32_t data_S_min[NS_PROC_FRAME_BINS]

int32_t array to hold the data for S_min.

int32_t data_S_tmp[NS_PROC_FRAME_BINS]

int32_t array to hold the data for S_tmp.

int32_t data_p[NS_PROC_FRAME_BINS]

int32_t array to hold the data for p.

int32_t data_adt[NS_PROC_FRAME_BINS]

int32_t array to hold the data for alpha_d_tilde.

int32_t data_lambda_hat[NS_PROC_FRAME_BINS]

int32_t array to hold the data for lambda_hat.

bfp_s32_t prev_frame

BFP structure to hold the previous frame.

bfp_s32_t overlap

BFP structure to hold the overlap.

bfp_s32_t wind

BFP structure to hold the first part of the window.

bfp_s32_t rev_wind

BFP structure to hold the second part of the window.

int32_t data_prev_frame[NS_PROC_FRAME_LENGTH - NS_FRAME_ADVANCE]

int32_t array to hold the data for prev_frame.

int32_t data_ovelap[NS_FRAME_ADVANCE]

int32_t array to hold the data for overlap.

int32_t data_rev_wind[NS_WINDOW_LENGTH / 2]

int32_t array to hold the data for rev_wind.

float_s32_t delta

EWMA of the energy ratio to calculate p.

float_s32_t alpha_d

EWMA of the smoothing parameter for alpha_d_tilde.

float_s32_t alpha_s

EWMA of the smoothing parameter for S.

float_s32_t alpha_p

EWMA of the smoothing parameter for p.

float_s32_t one_minus_aplha_d

EWMA of the 1 - alpha_d parameter.

float_s32_t one_minus_alpha_s

EWMA of the 1 - alpha_s parameter.

float_s32_t one_minus_alpha_p

EWMA of the 1 - alpha_p parameter.

unsigned reset_period

Filter reset period value for auto-reset.

unsigned reset_counter

Filter reset counter.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Noise Suppression Library$$$API Reference$$$NS Header Files£££modules/voice/modules/lib_ns/doc/src/reference/header_files.html#ns-header-files
XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Noise Suppression Library$$$API Reference$$$ns_api.h£££modules/voice/modules/lib_ns/doc/src/reference/header_files.html#ns-api-h
page page_ns_api_h

This header should be included in application source code to gain access to the lib_ns public functions API.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Noise Suppression Library$$$API Reference$$$ns_state.h£££modules/voice/modules/lib_ns/doc/src/reference/header_files.html#ns-state-h
page page_ns_state_h

This header contains definitions for data structure and defines.

This header is automatically included by ns_api.h.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Noise Suppression Library$$$On GitHub£££modules/voice/modules/lib_ns/doc/src/reference/header_files.html#on-github

lib_ns is present as part of fwk_voice. Get the latest version of fwk_voice from https://github.com/xmos/fwk_voice. lib_ns is present within the modules/lib_ns directory in fwk_voice.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Noise Suppression Library$$$API£££modules/voice/modules/lib_ns/doc/src/reference/header_files.html#api

To use the functions in this library in an application, include ns_api.h in the application source file.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Automatic Gain Control Library£££modules/voice/modules/lib_agc/doc/index.html#automatic-gain-control-library

lib_agc is a library which performs Automatic Gain Control (AGC), with support for Loss Control. For more details, refer to AGC Overview.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Automatic Gain Control Library$$$Repository Structure£££modules/voice/modules/lib_agc/doc/src/getting_started.html#repository-structure
  • modules/lib_agc - The actual lib_agc library directory within https://github.com/xmos/fwk_voice/. Within lib_agc:

    • api/ - Headers containing the public API for lib_agc.

    • doc/ - Library documentation source (for non-embedded documentation) and build directory.

    • src/ - Library source code.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Automatic Gain Control Library$$$Requirements£££modules/voice/modules/lib_agc/doc/src/getting_started.html#requirements

lib_agc is included as part of the fwk_voice github repository and all requirements for cloning and building fwk_voice apply. lib_agc is compiled as a static library as part of the overall fwk_voice build. It depends on lib_xcore_math.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Automatic Gain Control Library$$$Getting and Building£££modules/voice/modules/lib_agc/doc/src/getting_started.html#getting-and-building

This module is part of the parent fwk_voice repo clone. It is compiled as a static library as part of the fwk_voice compilation process.

To include lib_agc in an application as a static library, link the generated libfwk_voice_module_lib_agc.a into the application. Add lib_agc/api to the include directories when building the application.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Automatic Gain Control Library$$$AGC Overview£££modules/voice/modules/lib_agc/doc/src/overview.html#agc-overview

The lib_agc library provides an API to implement Automatic Gain Control within an application. The goal of the AGC algorithm is to provide consistent output levels for voice audio.

The gain control can adapt to maintain the amplitude of the peak of the frame within an upper and lower bound configured for the AGC instance. When used in an application with a Voice to Noise Ratio estimator (VNR), the AGC will adapt only when voice activity is detected, so that speech in the input signal is amplified above other sounds.

The AGC also has a Loss Control feature which can be used when the application has an Acoustic Echo Canceller (AEC). This feature uses data from the AEC to adjust the gain applied to reduce residual echoes by attenuating the audio when near-end speech is not present.

The AGC takes as input a frame of data from an audio channel. This could be the microphone input or the output of another module in the application.

Gain control is performed on a frame-by-frame basis. Each frame consists of 15ms of data, which is 240 samples at 16kHz input sampling frequency. Input data is expected to be in a fixed-point 32-bit 1.31 format.

Before processing any frames, the application must configure and initialise the AGC instance by calling agc_init(). Then for each frame, agc_process_frame() will update the AGC instance’s internal state and produce the output frame by applying the AGC algorithm to the input frame.

The gain values in this module for AGC gain and Loss Control gain are multiplicative factors that are applied to scale the input frame. Therefore, a fixed gain value of 1.0 (without loss control) will create no change to the input.

If multiple channels need to be processed by the application, or multiple outputs are required, an independent instance of the AGC must be run for each channel.
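To make the idea of a multiplicative gain concrete, the sketch below scales a 1.31 frame by a fixed factor. It is an illustration only: the helper name is hypothetical, and clamping out-of-range results to the 32-bit limits is an assumption of this example rather than a description of how lib_agc handles peaks.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a lib_agc function): scale a 1.31 frame by a
 * multiplicative gain. Results outside the int32_t range are clamped.
 * input and output may point to the same buffer. */
void apply_gain_q31(int32_t *output, const int32_t *input, size_t frame_len,
                    double gain)
{
    for (size_t i = 0; i < frame_len; i++) {
        double scaled = (double)input[i] * gain;
        if (scaled > 2147483647.0) {
            output[i] = INT32_MAX;
        } else if (scaled < -2147483648.0) {
            output[i] = INT32_MIN;
        } else {
            output[i] = (int32_t)scaled;
        }
    }
}
```

A gain of 1.0 leaves every sample unchanged, matching the statement above, while a gain of 2.0 doubles each sample until the 32-bit limit is reached.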

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Automatic Gain Control Library$$$API Reference£££modules/voice/modules/lib_agc/doc/src/reference/index.html#api-reference
XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Automatic Gain Control Library$$$API Reference$$$AGC API Functions£££modules/voice/modules/lib_agc/doc/src/reference/api.html#agc-api-functions
group agc_func

Functions

void agc_init(agc_state_t *agc, agc_config_t *config)

Initialise the AGC.

This function initialises the AGC state with the provided configuration. It must be called at startup to initialise the AGC before processing any frames, and can be called at any time after that to reset the AGC instance, returning the internal AGC state to its defaults.

Example with an unmodified profile

agc_state_t agc;
agc_init(&agc, &AGC_PROFILE_ASR);

Example with modification to the profile

agc_config_t conf = AGC_PROFILE_FIXED_GAIN;
conf.gain = f32_to_float_s32(100);
agc_state_t agc;
agc_init(&agc, &conf);

Parameters:
  • agc[out] AGC state structure

  • config[in] Initial configuration values

void agc_process_frame(agc_state_t *agc, int32_t output[AGC_FRAME_ADVANCE], const int32_t input[AGC_FRAME_ADVANCE], agc_meta_data_t *meta_data)

Perform AGC processing on a frame of input data.

This function updates the AGC’s internal state based on the input frame and meta-data, and returns an output containing the result of the AGC algorithm applied to the input.

The input and output pointers can be equal to perform the processing in-place.

Example

int32_t input[AGC_FRAME_ADVANCE];
int32_t output[AGC_FRAME_ADVANCE];
agc_meta_data_t md;
md.vnr_flag = AGC_META_DATA_NO_VNR;
md.aec_ref_power = AGC_META_DATA_NO_AEC;
md.aec_corr_factor = AGC_META_DATA_NO_AEC;
agc_process_frame(&agc, output, input, &md);

Parameters:
  • agc[inout] AGC state structure

  • output[out] Array to return the resulting frame of data

  • input[in] Array of frame data on which to perform the AGC

  • meta_data[in] Meta-data structure with VNR/AEC data

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Automatic Gain Control Library$$$API Reference$$$AGC Pre-Defined Profiles£££modules/voice/modules/lib_agc/doc/src/reference/profiles.html#agc-pre-defined-profiles
group agc_profiles

Defines

AGC_PROFILE_ASR

AGC profile tuned for Automatic Speech Recognition (ASR).

AGC_PROFILE_FIXED_GAIN

AGC profile tuned to apply a fixed gain.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Automatic Gain Control Library$$$API Reference$$$AGC API Structure Definitions£££modules/voice/modules/lib_agc/doc/src/reference/defines.html#agc-api-structure-definitions
group agc_defs

Defines

AGC_FRAME_ADVANCE

Length of the frame of data on which the AGC will operate.

AGC_META_DATA_NO_VNR

If the application has no VNR, adapt_on_vnr must be disabled in the configuration. This pre-processor definition can be assigned to the vnr_flag in agc_meta_data_t in that situation to make it clear in the code that there is no VNR.

AGC_META_DATA_NO_AEC

If the application has no AEC, lc_enabled must be disabled in the configuration. This pre-processor definition can be assigned to the aec_ref_power and aec_corr_factor in agc_meta_data_t in that situation to make it clear in the code that there is no AEC.

struct agc_config_t
#include <agc_api.h>

AGC configuration structure.

This structure contains configuration settings that can be changed to alter the behaviour of the AGC instance.

Members with the “lc_” prefix are parameters for the Loss Control feature.

Public Members

int adapt

Boolean to enable AGC adaption; if enabled, the gain to apply will adapt based on the peak of the input frame and the upper/lower threshold parameters.

int adapt_on_vnr

Boolean to enable adaption based on the VNR meta-data; if enabled, adaption will always be performed when voice activity is detected. This must be disabled if the application doesn’t have a VNR.

int soft_clipping

Boolean to enable soft-clipping of the output frame.

float_s32_t gain

The current gain to be applied, not including loss control.

float_s32_t max_gain

The maximum gain allowed when adaption is enabled.

float_s32_t min_gain

The minimum gain allowed when adaption is enabled.

float_s32_t upper_threshold

The upper limit for the gained peak of the frame when adaption is enabled.

float_s32_t lower_threshold

The lower limit for the gained peak of the frame when adaption is enabled.

float_s32_t gain_inc

Factor by which to increase the gain during adaption.

float_s32_t gain_dec

Factor by which to decrease the gain during adaption.

int lc_enabled

Boolean to enable loss control. This must be disabled if the application doesn’t have an AEC.

int lc_n_frame_far

Number of frames required to consider far-end audio active.

int lc_n_frame_near

Number of frames required to consider near-end audio active.

float_s32_t lc_corr_threshold

Threshold for far-end correlation above which to indicate far-end activity only.

float_s32_t lc_bg_power_gamma

Gamma coefficient for estimating the power of the far-end background noise.

float_s32_t lc_gamma_inc

Factor by which to increase the loss control gain when less than target value.

float_s32_t lc_gamma_dec

Factor by which to decrease the loss control gain when greater than target value.

float_s32_t lc_far_delta

Delta multiplier used when only far-end activity is detected.

float_s32_t lc_near_delta

Delta multiplier used when only near-end activity is detected.

float_s32_t lc_near_delta_far_active

Delta multiplier used when both near-end and far-end activity is detected.

float_s32_t lc_gain_max

Loss control gain to apply when near-end activity only is detected.

float_s32_t lc_gain_double_talk

Loss control gain to apply when double-talk is detected.

float_s32_t lc_gain_silence

Loss control gain to apply when silence is detected.

float_s32_t lc_gain_min

Loss control gain to apply when far-end activity only is detected.
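
The roles of the adaption and loss control parameters can be sketched as follows. This is a deliberately simplified illustration and not the lib_agc algorithm: the struct and function names are hypothetical, double replaces float_s32_t, the adaption step ignores the EWMA state and VNR input, and the clamping of the loss control step to the target is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for part of agc_config_t. Field names follow the
 * documented members; double replaces float_s32_t for readability. */
typedef struct {
    double gain, max_gain, min_gain;
    double upper_threshold, lower_threshold;
    double gain_inc, gain_dec;
    double lc_gain_max, lc_gain_double_talk, lc_gain_silence, lc_gain_min;
    double lc_gamma_inc, lc_gamma_dec;
} agc_sketch_config_t;

/* One simplified adaption step driven by the peak magnitude of the frame. */
static void adapt_gain_sketch(agc_sketch_config_t *c, double frame_peak)
{
    assert(c != NULL && c->min_gain <= c->max_gain);
    double gained_peak = frame_peak * c->gain;
    if (gained_peak > c->upper_threshold) {
        c->gain *= c->gain_dec;   /* too loud: reduce the gain */
    } else if (gained_peak < c->lower_threshold) {
        c->gain *= c->gain_inc;   /* too quiet: raise the gain */
    }
    if (c->gain > c->max_gain) c->gain = c->max_gain;
    if (c->gain < c->min_gain) c->gain = c->min_gain;
}

/* Pick the loss control target gain from near-end and far-end activity. */
static double lc_target_gain_sketch(const agc_sketch_config_t *c,
                                    int near_active, int far_active)
{
    assert(c != NULL);
    if (near_active && far_active) return c->lc_gain_double_talk;
    if (near_active)               return c->lc_gain_max;
    if (far_active)                return c->lc_gain_min;
    return c->lc_gain_silence;
}

/* Move the loss control gain one step towards the target (assumed clamp). */
static double lc_step_sketch(const agc_sketch_config_t *c, double lc_gain, double target)
{
    assert(c != NULL);
    if (lc_gain < target) {
        lc_gain *= c->lc_gamma_inc;
        if (lc_gain > target) lc_gain = target;
    } else if (lc_gain > target) {
        lc_gain *= c->lc_gamma_dec;
        if (lc_gain < target) lc_gain = target;
    }
    return lc_gain;
}
```

The sketch shows why gain_inc and lc_gamma_inc would normally be greater than 1.0 and gain_dec and lc_gamma_dec less than 1.0, since all four are applied as multiplicative factors.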

struct agc_state_t
#include <agc_api.h>

AGC state structure.

This structure holds the current state of the AGC instance and members are updated each time that agc_process_frame() runs. Many of these members are exponentially-weighted moving averages (EWMA) which influence the adaption of the AGC gain or the loss control feature. The user should not directly modify any of these members, except the config.

Public Members

agc_config_t config

The current configuration of the AGC. Any member of this configuration structure can be modified and that change will take effect on the next run of agc_process_frame().

float_s32_t x_slow

EWMA of the frame peak, which is used to identify the overall trend of a rise or fall in the input signal.

float_s32_t x_fast

EWMA of the frame peak, which is used to identify a rise or fall in the peak of frame.

float_s32_t x_peak

EWMA of x_fast, which is used when adapting to the agc_config_t::upper_threshold.

int lc_t_far

Timer counting down until enough frames with far-end activity have been processed.

int lc_t_near

Timer counting down until enough frames with near-end activity have been processed.

float_s32_t lc_near_power_est

EWMA of estimates of the near-end power.

float_s32_t lc_far_power_est

EWMA of estimates of the far-end power.

float_s32_t lc_near_bg_power_est

EWMA of estimates of the power of near-end background noise.

float_s32_t lc_gain

Loss control gain applied on top of the AGC gain in agc_config_t.

float_s32_t lc_far_bg_power_est

EWMA of estimates of the power of far-end background noise.

float_s32_t lc_corr_val

EWMA of the far-end correlation for detecting double-talk.
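
Several of the state members above are exponentially-weighted moving averages. As a reminder of what that means, here is a minimal sketch of a single EWMA update. The function name and the use of double are assumptions made for illustration; the smoothing coefficients that lib_agc actually uses are not given in this reference.

```c
#include <assert.h>

/* y[n] = alpha * y[n-1] + (1 - alpha) * x[n], with 0 <= alpha <= 1.
 * A larger alpha gives a slower, smoother average. */
static double ewma_update(double prev, double sample, double alpha)
{
    assert(alpha >= 0.0 && alpha <= 1.0);
    return alpha * prev + (1.0 - alpha) * sample;
}
```

Running two such averages over the same frame peaks with different coefficients gives one fast-reacting and one slow-reacting estimate, which is the general idea behind pairing a fast and a slow tracker.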

struct agc_meta_data_t
#include <agc_api.h>

AGC meta data structure.

This structure holds meta-data about the current frame to be processed, and must be updated to reflect the current frame before calling agc_process_frame().

Public Members

int vnr_flag

Boolean to indicate the detection of voice activity in the current frame.

float_s32_t aec_ref_power

The power of the most powerful reference channel.

float_s32_t aec_corr_factor

Correlation factor between the microphone input and the AEC’s estimated microphone signal.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Automatic Gain Control Library$$$API Reference$$$AGC Header Files£££modules/voice/modules/lib_agc/doc/src/reference/header_files.html#agc-header-files
XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Automatic Gain Control Library$$$API Reference$$$agc_api.h£££modules/voice/modules/lib_agc/doc/src/reference/header_files.html#agc-api-h
page page_agc_api_h

This header should be included in application source code to gain access to the lib_agc public functions API.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Automatic Gain Control Library$$$API Reference$$$agc_profiles.h£££modules/voice/modules/lib_agc/doc/src/reference/header_files.html#agc-profiles-h
page page_agc_profiles_h

This header contains pre-defined profiles for AGC configurations. These profiles can be used to initialise the agc_config_t data for use with agc_init().

This header is automatically included by agc_api.h.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Automatic Gain Control Library$$$On GitHub£££modules/voice/modules/lib_agc/doc/src/reference/header_files.html#on-github

lib_agc is present as part of fwk_voice. Get the latest version of fwk_voice from https://github.com/xmos/fwk_voice. lib_agc is present within the modules/lib_agc directory in fwk_voice.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Automatic Gain Control Library$$$API£££modules/voice/modules/lib_agc/doc/src/reference/header_files.html#api

To use the functions in this library in an application, include agc_api.h in the application source file.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Automatic Delay Estimation and Correction Library£££modules/voice/modules/lib_adec/doc/index.html#automatic-delay-estimation-and-correction-library

lib_adec is a library which provides functions for measuring and correcting delay offsets between the reference and loudspeaker signals. lib_adec depends on the lib_aec and lib_xcore_math libraries. For more details about the ADEC, refer to ADEC Overview.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Automatic Delay Estimation and Correction Library$$$Repository Structure£££modules/voice/modules/lib_adec/doc/src/getting_started.html#repository-structure
  • modules/lib_adec - The actual lib_adec library directory within https://github.com/xmos/fwk_voice/. Within lib_adec:

    • api/ - Headers containing the public API for lib_adec.

    • doc/ - Library documentation source (for non-embedded documentation) and build directory.

    • src/ - Library source code.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Automatic Delay Estimation and Correction Library$$$Getting and Building£££modules/voice/modules/lib_adec/doc/src/getting_started.html#getting-and-building

lib_adec is included as part of the fwk_voice GitHub repository and all requirements for cloning and building fwk_voice apply. lib_adec is compiled as a static library as part of the overall fwk_voice build. To include lib_adec in an application, link the generated libfwk_voice_module_lib_adec.a into the application. Be sure to also add lib_adec/api as an include directory for the application.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Automatic Delay Estimation and Correction Library$$$ADEC Overview£££modules/voice/modules/lib_adec/doc/src/overview.html#adec-overview

The ADEC module provides functions to estimate and automatically correct for delay offsets between the reference and the loudspeakers.

Acoustic echo cancellation is an adaptive filtering process which compares the reference audio to that received from the microphones. It models the reverberation time of a room, i.e. the time it takes for acoustic reflections to decay to insignificance. The time window modelled by the AEC is finite, and to maximise its performance it is important to ensure that the reference audio is presented to the AEC time aligned to the audio being reproduced by the loudspeakers. The reference audio path delay and the audio reproduction path delay may be significantly different, requiring additional delay to be inserted into one of the two paths, to correct this delay difference.

The ADEC module provides functionality for

  • Measuring the current delay

  • Using the measured delay along with AEC performance related metadata collected from the echo canceller to monitor AEC and make decisions about reconfiguring the AEC and correcting bulk delay offsets.

The metadata collected from AEC contains statistics such as the ERLE, the peak power seen in the adaptive filter and the peak power to average power ratio of the adaptive filter.

The ADEC algorithm works in 2 modes - normal mode and delay estimation mode. In its normal mode ADEC monitors the AEC performance and requests small delay corrections. Using the statistics from the AEC, the ADEC estimates a metric called the AEC goodness which is an estimate of how well the echo canceller is performing. Based on the estimated AEC goodness and the current measured delay, the ADEC can request a delay correction to be applied at the input of the echo canceller.

If the AEC is seen as consistently bad, the ADEC transitions to a delay estimation mode and requests:

  • A special delay to be applied at the AEC input that will enable measuring the actual delay in both delay scenarios: microphone input arriving at the AEC earlier in time than the reference input, as well as microphone input arriving later in time than the reference input.

  • A restart of the AEC in a new configuration that has more adaptive filter phases, in order to have a longer filter tail length that is suitable for delay estimation.

Once the ADEC has a measure of the new delay, it requests a delay correction and a reconfiguration of the AEC back to its normal mode, then returns to its own normal mode of monitoring AEC performance and correcting for small delay offsets.

Before processing any frames, the application must configure and initialise the ADEC instance by calling adec_init(). Then for each frame, adec_estimate_delay() will estimate the current delay and adec_process_frame() will use the current frame’s AEC statistics and the estimated delay to monitor the AEC and request possible AEC and delay configuration changes.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Automatic Delay Estimation and Correction Library$$$API Reference£££modules/voice/modules/lib_adec/doc/src/reference/index.html#api-reference
XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Automatic Delay Estimation and Correction Library$$$API Reference$$$ADEC API Functions£££modules/voice/modules/lib_adec/doc/src/reference/api.html#adec-api-functions
group adec_func

Functions

void adec_init(adec_state_t *state, adec_config_t *config)

Initialise ADEC data structures.

This function initialises ADEC state for a given configuration. It must be called at startup to initialise the ADEC data structures before processing any frames, and can be called at any time after that to reset the ADEC instance, returning the internal ADEC state to its defaults.

Example with ADEC configured for delay estimation only at startup

adec_state_t adec_state;
adec_config_t adec_conf;
adec_conf.bypass = 1; // Bypass automatic DE correction
adec_conf.force_de_cycle_trigger = 1; // Force a delay correction cycle, so that delay correction happens once after initialisation
adec_init(&adec_state, &adec_conf);
// The application needs to ensure that adec_state.adec_config.force_de_cycle_trigger is set to 0 after ADEC has requested a transition to delay estimation mode once, so that delay is corrected only at startup.

Example with ADEC configured for automatic delay estimation and correction

adec_state_t adec_state;
adec_config_t adec_conf;
adec_conf.bypass = 0;
adec_conf.force_de_cycle_trigger = 0;
adec_init(&adec_state, &adec_conf);

Parameters:
  • state[out] Pointer to ADEC state structure

  • config[in] Pointer to ADEC configuration structure.

void adec_process_frame(adec_state_t *state, adec_output_t *adec_output, const adec_input_t *adec_in)

Perform ADEC processing on an input frame of data.

This function takes information about the latest AEC processed frame and the latest measured delay estimate as input, and decides if a delay correction between input microphone and reference signals is required. If a correction is needed, it outputs a new requested input delay, optionally accompanied with a request for AEC restart in a different configuration. It updates the internal ADEC state structure to reflect the current state of the ADEC process.

Parameters:
  • state[inout] ADEC internal state structure

  • adec_output[out] ADEC output structure

  • adec_in[in] ADEC input structure

void adec_estimate_delay(de_output_t *de_output, const bfp_complex_s32_t *H_hat, unsigned num_phases)

Estimate microphone delay.

This function measures the microphone signal delay with respect to the reference signal. It does so by looking for the phase with the peak energy among all AEC filter phases and uses the peak energy phase index as the estimate of the microphone delay. Along with the measured delay, it also outputs information about the peak phase energy that can then be used to gauge the AEC filter convergence and the reliability of the measured delay.

Parameters:
  • de_output[out] Delay estimator output structure

  • H_hat[in] bfp_complex_s32_t array storing the AEC filter spectrum

  • num_phases[in] Number of phases in the AEC filter
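
The peak-phase idea described for adec_estimate_delay() can be sketched on a plain array of per-phase energies. This is an illustration under stated assumptions, not the library code: the struct and function names are hypothetical, double replaces float_s32_t, the per-phase energies are assumed to be already computed from the filter spectrum, and each phase is assumed to span 240 samples.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: each AEC filter phase spans one 240-sample frame advance. */
#define SKETCH_SAMPLES_PER_PHASE 240

/* Hypothetical result type; field names loosely mirror de_output_t. */
typedef struct {
    int    measured_delay_samples;
    int    peak_power_phase_index;
    double peak_phase_power;
    double sum_phase_powers;
    double peak_to_average_ratio;
} de_sketch_t;

/* Find the phase with the most energy and derive a delay estimate from it. */
static de_sketch_t estimate_delay_sketch(const double *phase_power, size_t num_phases)
{
    assert(phase_power != NULL && num_phases > 0);
    de_sketch_t r = {0, 0, phase_power[0], 0.0, 1.0};
    for (size_t i = 0; i < num_phases; i++) {
        r.sum_phase_powers += phase_power[i];
        if (phase_power[i] > r.peak_phase_power) {
            r.peak_phase_power = phase_power[i];
            r.peak_power_phase_index = (int)i;
        }
    }
    r.measured_delay_samples = r.peak_power_phase_index * SKETCH_SAMPLES_PER_PHASE;
    if (r.sum_phase_powers > 0.0) {
        double average = r.sum_phase_powers / (double)num_phases;
        r.peak_to_average_ratio = r.peak_phase_power / average;
    }
    return r;
}
```

A high peak to average ratio suggests the filter energy is concentrated in one phase, which is consistent with a converged filter and a more trustworthy delay estimate.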

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Automatic Delay Estimation and Correction Library$$$API Reference$$$ADEC #define constants£££modules/voice/modules/lib_adec/doc/src/reference/defines.html#adec-define-constants
group adec_defines

Defines

ADEC_PEAK_TO_AVERAGE_HISTORY_DEPTH

Number of frames we look back over to smooth the peak to average filter power ratio history.

ADEC_PEAK_LINREG_HISTORY_SIZE

Number of frames of peak power history we look at while computing the AEC goodness metric. NOT USER MODIFIABLE.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Automatic Delay Estimation and Correction Library$$$API Reference$$$ADEC Data Structure and Enum definitions£££modules/voice/modules/lib_adec/doc/src/reference/types.html#adec-data-structure-and-enum-definitions
group adec_types

Enums

enum adec_mode_t

Values:

enumerator ADEC_NORMAL_AEC_MODE

ADEC processing mode where it monitors AEC performance and requests small delay correction.

enumerator ADEC_DELAY_ESTIMATOR_MODE

ADEC processing mode for bulk delay correction in which it measures for a new delay offset.

struct adec_config_t
#include <adec_state.h>

ADEC configuration structure.

This is used to provide configuration when initialising ADEC at startup. A copy of this structure is present in the ADEC state structure and available to be modified by the application for run time control of ADEC configuration.

Public Members

int32_t bypass

Bypass ADEC decision making process. When set to 1, ADEC evaluates the current input frame metrics but doesn’t make any delay correction or AEC reset and reconfiguration requests.

int32_t force_de_cycle_trigger

Force trigger a delay estimation cycle. When set to 1, ADEC bypasses the ADEC monitoring process and transitions to delay estimation mode for measuring delay offset.

struct de_output_t
#include <adec_state.h>

Delay estimator output structure.

Public Members

int32_t measured_delay_samples

Estimated microphone delay in time domain samples.

int32_t peak_power_phase_index

Phase index of peak energy AEC filter phase.

float_s32_t peak_phase_power

Maximum per phase energy across all AEC filter phases.

float_s32_t sum_phase_powers

Sum of filter energy across all filter phases.

float_s32_t peak_to_average_ratio

Ratio of peak filter phase energy to average filter phase energy. Used to evaluate how well the filter has converged.

float_s32_t phase_power[AEC_LIB_MAX_PHASES]

Phase energy of all AEC filter phases.

struct adec_output_t
#include <adec_state.h>

ADEC output structure.

Public Members

int32_t delay_change_request_flag

Flag indicating if ADEC is requesting an input delay correction

int32_t requested_mic_delay_samples

Mic delay in samples requested by ADEC. Relevant when delay_change_request_flag is 1. Note that this value is a signed integer. A positive requested_mic_delay_samples requires the microphone to be delayed, so the application needs to delay the input mic signal by requested_mic_delay_samples samples. A negative requested_mic_delay_samples means ADEC is requesting the input mic signal to be moved earlier in time. The application should do this by delaying the input reference signal by abs(requested_mic_delay_samples) samples.

int32_t reset_aec_flag

Flag indicating ADEC’s request for a reset of part of the AEC state so that the AEC filter starts adapting from a zero filter. ADEC requests this when a small delay correction needs to be applied that doesn’t require a full reset of the AEC.

int32_t delay_estimator_enabled_flag

Flag indicating if AEC needs to be run configured in delay estimation mode.

int32_t requested_delay_samples_debug

Requested delay samples without clamping to +- MAX_DELAY_SAMPLES. Used only for debugging.
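
The sign convention of requested_mic_delay_samples can be turned into per-path delays as in the sketch below. The type and function names are hypothetical and not part of lib_adec; the sketch only shows how an application might decide which path to delay.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: delay, in samples, to insert into each path. */
typedef struct {
    int32_t mic_delay_samples;
    int32_t ref_delay_samples;
} path_delays_t;

/* Positive request: delay the mic. Negative request: delay the reference. */
static path_delays_t split_requested_delay(int32_t requested_mic_delay_samples)
{
    /* INT32_MIN has no positive counterpart, so reject it up front. */
    assert(requested_mic_delay_samples != INT32_MIN);
    path_delays_t d = {0, 0};
    if (requested_mic_delay_samples >= 0) {
        d.mic_delay_samples = requested_mic_delay_samples;
    } else {
        d.ref_delay_samples = -requested_mic_delay_samples;
    }
    return d;
}
```

Only one of the two paths is ever delayed for a given request, so an application following this approach would reset the other path's delay to zero.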

struct aec_to_adec_t
#include <adec_state.h>

Input structure containing current frame’s information from AEC.

Public Members

float_s32_t y_ema_energy_ch0

EWMA energy of AEC input mic signal channel 0

float_s32_t error_ema_energy_ch0

EWMA energy of AEC filter error output signal channel 0

int32_t shadow_flag_ch0

shadow_flag value for the current frame computed within the AEC

struct adec_input_t
#include <adec_state.h>

ADEC input structure.

Public Members

de_output_t from_de

ADEC input from the delay estimator

aec_to_adec_t from_aec

ADEC input from AEC

int32_t far_end_active_flag

Flag indicating if there is activity on reference input channels.

struct adec_state_t
#include <adec_state.h>

ADEC state structure.

This structure holds the current state of the ADEC instance and members are updated each time that adec_process_frame() runs. Many of these members are statistics from tracking the AEC performance. The user should not directly modify any of these members, except adec_config.

Public Members

float_s32_t max_peak_to_average_ratio_since_reset

Maximum peak to average AEC filter phase energy ratio seen since a delay correction was last requested.

float_s32_t peak_to_average_ratio_history[ADEC_PEAK_TO_AVERAGE_HISTORY_DEPTH + 1]

Last ADEC_PEAK_TO_AVERAGE_HISTORY_DEPTH frames peak_to_average_ratio of phase energies.

float_s32_t peak_power_history[ADEC_PEAK_LINREG_HISTORY_SIZE]

Last ADEC_PEAK_LINREG_HISTORY_SIZE frames peak phase power.

float_s32_t aec_peak_to_average_good_aec_threshold

Threshold for considering the peak to average ratio as good.

q8_24 agm_q24

AEC goodness metric indicating a measure of how well AEC filter is performing.

q8_24 erle_bad_bits_q24

log2 of threshold below which AEC output’s measured ERLE is considered bad

q8_24 erle_good_bits_q24

log2 of threshold above which AEC output’s measured ERLE is considered good

q8_24 peak_phase_energy_trend_gain_q24

Multiplier used for scaling agm’s sensitivity to peak phase energy trend.

q8_24 erle_bad_gain_q24

Multiplier determining how steeply we reduce the AEC’s goodness when the measured ERLE falls below the bad ERLE threshold.

adec_mode_t mode

ADEC’s mode of operation. Can be operating in normal AEC or delay estimation mode.

int32_t gated_milliseconds_since_mode_change

Milliseconds elapsed since a delay change was last requested. Used to ensure that delay corrections are not requested too early, before the AEC filter has had enough time to converge.

int32_t last_measured_delay

Last measured delay.

int32_t peak_power_history_idx

Index storing the head of the peak_power_history circular buffer.

int32_t peak_power_history_valid

Flag indicating whether the peak_power_history buffer has been filled at least once.

int32_t sf_copy_flag

Flag indicating if shadow to main filter copy has happened at least once in the AEC.

int32_t convergence_counter

Counter indicating number of frames the AEC shadow filter has been attempting to converge.

int32_t shadow_flag_counter

Counter indicating number of frames the AEC shadow filter has been better than the main filter.

adec_config_t adec_config

ADEC configuration parameters structure. Can be modified by application at run-time to reconfigure ADEC.
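
Several adec_state_t members use the q8_24 type. The sketch below shows one way to convert between a floating-point value and such a fixed-point value, assuming q8_24 is a signed 32-bit integer with 24 fractional bits, as the _q24 suffix suggests. The helper names are hypothetical and are not part of lib_adec.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_Q24_ONE (1 << 24)

static_assert(SKETCH_Q24_ONE == 16777216, "Q8.24 represents 1.0 as 2^24");

/* Convert to Q8.24 with rounding to nearest, saturating outside the range. */
static int32_t float_to_q8_24(double x)
{
    double scaled = x * (double)SKETCH_Q24_ONE + (x >= 0.0 ? 0.5 : -0.5);
    if (scaled >= 2147483647.0)  return INT32_MAX;
    if (scaled <= -2147483648.0) return INT32_MIN;
    return (int32_t)scaled;
}

/* Convert a Q8.24 value back to floating point. */
static double q8_24_to_float(int32_t q)
{
    return (double)q / (double)SKETCH_Q24_ONE;
}
```

Because the ERLE thresholds are stored as log2 values, a stored value of 1.0 (0x01000000 in this format) corresponds to a linear threshold ratio of 2.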

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Automatic Delay Estimation and Correction Library$$$API Reference$$$ADEC Header Files£££modules/voice/modules/lib_adec/doc/src/reference/header_files.html#adec-header-files
XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Automatic Delay Estimation and Correction Library$$$API Reference$$$adec_defines.h£££modules/voice/modules/lib_adec/doc/src/reference/header_files.html#adec-defines-h
page page_adec_defines_h

This header contains lib_adec public defines.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Automatic Delay Estimation and Correction Library$$$API Reference$$$adec_state.h£££modules/voice/modules/lib_adec/doc/src/reference/header_files.html#adec-state-h
page page_adec_state_h

This header contains definitions for data structures and enums used in lib_adec.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Automatic Delay Estimation and Correction Library$$$API Reference$$$adec_api.h£££modules/voice/modules/lib_adec/doc/src/reference/header_files.html#adec-api-h
page page_adec_api_h

lib_adec public functions API.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Automatic Delay Estimation and Correction Library$$$On GitHub£££modules/voice/modules/lib_adec/doc/src/reference/header_files.html#on-github

lib_adec is present as part of fwk_voice. Get the latest version of fwk_voice from https://github.com/xmos/fwk_voice. lib_adec is present within the modules/lib_adec directory in fwk_voice.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Automatic Delay Estimation and Correction Library$$$API£££modules/voice/modules/lib_adec/doc/src/reference/header_files.html#api

To use the functions in this library in an application, include adec_api.h in the application source file.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Interference Canceller Library£££modules/voice/modules/lib_ic/doc/index.html#interference-canceller-library

lib_ic is a library which provides functions that together perform Interference Cancellation (IC) on two channel input mic data by adapting to and modelling the room transfer characteristics. lib_ic library functions make use of functionality provided in lib_aec for the core normalised LMS blocks, which in turn uses lib_xcore_math to perform low-level optimised DSP operations. For more details refer to IC Overview.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Interference Canceller Library$$$Repository Structure£££modules/voice/modules/lib_ic/doc/src/getting_started.html#repository-structure
  • modules/lib_ic - The actual lib_ic library directory within https://github.com/xmos/fwk_voice/. Within lib_ic:

    • api/ - Headers containing the public API for lib_ic.

    • doc/ - Library documentation source (for non-embedded documentation) and build directory.

    • src/ - Library source code.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Interference Canceller Library$$$Requirements£££modules/voice/modules/lib_ic/doc/src/getting_started.html#requirements

lib_ic is included as part of the fwk_voice GitHub repository and all requirements for cloning and building fwk_voice apply. lib_ic is compiled as a static library as part of the overall fwk_voice build. It depends on lib_aec and lib_xcore_math.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Interference Canceller Library$$$API Structure£££modules/voice/modules/lib_ic/doc/src/getting_started.html#api-structure

The API is presented as three simple functions: initialisation, filtering and adaption. Initialisation is called once at startup, and filtering and adaption are called once per frame of samples. The performance requirement is relatively low (around 12MIPS) and as such the library is supplied as a single threaded implementation only.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Interference Canceller Library$$$Getting and Building£££modules/voice/modules/lib_ic/doc/src/getting_started.html#getting-and-building

This repo is obtained as part of the parent fwk_voice repo clone. It is compiled as a static library as part of fwk_voice compilation process.

To include lib_ic in an application, link the generated libfwk_voice_module_lib_ic.a into the application. Be sure to also add lib_ic/api as an include directory for the application.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Interference Canceller Library$$$IC Overview£££modules/voice/modules/lib_ic/doc/src/overview.html#ic-overview

The Interference Canceller (IC) suppresses static noise from point sources such as cooker hoods, washing machines, or radios for which there is no reference audio signal available. When the Voice to Noise Ratio estimator (VNR) input indicates the absence of voice, the IC adapts to remove noise from point sources in the environment. When the VNR signal indicates the presence of voice, the IC suspends adaptation which allows the voice source to be passed but maintains suppression of the interfering noise sources which have been previously adapted to.

It can offer much greater, and automatic, cancellation of broad-band noise sources when compared to beam forming techniques.

It is designed to work at a sample rate of 16kHz and has a fixed configuration of two input microphones and a single output channel.

The interference canceller is based on an AEC architecture and attempts to cancel one microphone signal from the other in the absence of voice. In this way, it builds an estimate of the difference in transfer functions between the two microphones for any present noise sources. Since the transfer function includes spatial information about the noise sources, applying this filter to the mic input allows any signals originating from the noise source to be cancelled.

The IC uses an adaptive filter which continually adapts to the acoustic environment to accommodate changes in the room created by events such as doors opening or closing and people moving about. However, it holds the current transfer function in the presence of voice, so it does not adapt to desired audio sources such as a person speaking.

The cancellation is performed on a frame by frame basis. Each frame advance is 15ms of data, which is 240 new samples at 16kHz input sampling frequency, per input channel. These new samples are combined with previous audio data to form a 512 sample frame, which allows for sufficient overlap for effective operation of the filter.
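
The way 240 new samples are combined with earlier audio into a 512 sample frame can be pictured as a sliding window. The following is a minimal sketch of that buffering, with hypothetical macro and function names; it is not the lib_ic implementation, which manages its own internal buffers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names for the sizes quoted in the text. */
#define SKETCH_FRAME_ADVANCE 240
#define SKETCH_FRAME_LEN     512

static_assert(SKETCH_FRAME_ADVANCE < SKETCH_FRAME_LEN,
              "new samples must fit inside the frame");

/* Drop the oldest FRAME_ADVANCE samples and append the newest ones. */
static void push_frame(int32_t frame[SKETCH_FRAME_LEN],
                       const int32_t new_samples[SKETCH_FRAME_ADVANCE])
{
    const size_t keep = SKETCH_FRAME_LEN - SKETCH_FRAME_ADVANCE;
    /* Regions overlap, so memmove is required for the shift. */
    memmove(frame, frame + SKETCH_FRAME_ADVANCE, keep * sizeof frame[0]);
    memcpy(frame + keep, new_samples, SKETCH_FRAME_ADVANCE * sizeof frame[0]);
}
```

After each call the last 240 entries hold the newest samples and the first 272 entries hold the most recent history, so consecutive frames overlap by 272 samples.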

The first channel of input microphone data is referred to as y when in time domain and Y when in frequency domain. The second channel of input microphone data is referred to as x when in time domain and X when in frequency domain. The y signal is effectively used as the signal containing noise that needs to be cancelled and the x signal is the reference from which the transfer function is estimated and consequently the noise signal estimated before it is subtracted from y.

In general throughout the code, names starting with lower case represent time domain and those beginning with upper case represent frequency domain. For example error is the filter error and Error is the spectrum of the filter error. The filter coefficient array is referred to as h_hat in time domain and H_hat in frequency domain.

The filter has multiple phases each of 15ms. The term phases refers to the tail length of the filter. A filter with more phases or a longer tail length will be able to model a more reverberant room response leading to better interference cancellation but, as with all normalised LMS based architectures, will be slower to converge in the case of a transfer function change.

Before starting the IC processing the user must call ic_init() to initialise the IC. If the configuration parameters are to be set to non-default values, modify them after calling ic_init() or in the lib_ic API Definitions file. Once the IC is initialised, the library functions can be called in order to perform interference cancellation on a frame by frame basis.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Interference Canceller Library$$$API Reference£££modules/voice/modules/lib_ic/doc/src/reference/index.html#api-reference
XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Interference Canceller Library$$$API Reference$$$lib_ic API Functions£££modules/voice/modules/lib_ic/doc/src/reference/api.html#lib-ic-api-functions

Functions

int32_t ic_init(ic_state_t *state)

Initialise IC and VNR data structures and set parameters according to ic_defines.h.

This is the first function that must be called after creating an ic_state_t instance.

Parameters:
  • state[inout] pointer to IC state structure

Returns:

Error status of the VNR inference engine initialisation that is done as part of ic_init. 0 if no error, one of TfLiteStatus error enum values in case of error.

void ic_filter(ic_state_t *state, int32_t y_data[IC_FRAME_ADVANCE], int32_t x_data[IC_FRAME_ADVANCE], int32_t output[IC_FRAME_ADVANCE])

Filter one frame of audio data inside the IC.

This should be called once per new frame of IC_FRAME_ADVANCE samples. The y_data array contains the microphone data that is to have the noise subtracted from it and x_data is the noise reference source which is internally delayed before being fed into the adaptive filter. Note that the y_data input array is internally delayed by the call to ic_filter() and so contains the delayed y_data afterwards. Typically it does not matter which mic channel is connected to x_data or y_data as long as the separation is appropriate. The performance of this filter has been optimised for a 71mm mic separation distance.

Parameters:
  • state[inout] pointer to IC state structure

  • y_data[inout] array reference of mic 0 input buffer. Modified during call

  • x_data[in] array reference of mic 1 input buffer

  • output[out] array reference containing IC processed output buffer

void ic_calc_vnr_pred(ic_state_t *state, float_s32_t *input_vnr_pred, float_s32_t *output_vnr_pred)

Calculate voice to noise ratio estimation for the input and output of the IC.

This function can be called after each call to ic_filter. It will calculate voice to noise ratio which can be used to give information to ic_adapt and to the AGC.

Parameters:
  • state[inout] pointer to IC state structure

  • input_vnr_pred[inout] voice to noise estimate of the IC input

  • output_vnr_pred[inout] voice to noise estimate of the IC output

void ic_adapt(ic_state_t *state, float_s32_t vnr)

Adapts the IC filter according to previous frame’s statistics and VNR input.

This function should be called after each call to ic_filter. Filter and adapt functions are separated so that the external VNR can operate on each frame.

Parameters:
  • state[inout] pointer to IC state structure

  • vnr[in] VNR Voice-to-Noise ratio estimation

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Interference Canceller Library$$$API Reference$$$lib_ic API State Structure£££modules/voice/modules/lib_ic/doc/src/reference/state.html#lib-ic-api-state-structure

Enums

enum adaption_config_e

Values:

enumerator IC_ADAPTION_AUTO
enumerator IC_ADAPTION_FORCE_ON
enumerator IC_ADAPTION_FORCE_OFF
enum control_flag_e

Values:

enumerator HOLD
enumerator ADAPT
enumerator ADAPT_SLOW
enumerator UNSTABLE
enumerator FORCE_ADAPT
enumerator FORCE_HOLD
struct ic_config_params_t
#include <ic_state.h>

IC configuration structure.

This structure contains configuration settings that can be changed to alter the behaviour of the IC instance. An instance of this structure is automatically included as part of the IC state.

It controls the behaviour of the main filter and normalisation thereof. The initial values for these configuration parameters are defined in ic_defines.h and are initialised by ic_init().

Public Members

uint8_t bypass

Boolean to control bypassing of filter stage and adaption stage. When set the delayed y audio samples are passed unprocessed to the output. It is recommended to perform an initialisation of the instance after bypass is set as the room transfer function may have changed during that time.

int32_t gamma_log2

Up scaling factor for X energy calculation used for normalisation.

uint32_t sigma_xx_shift

Down scaling factor for X energy used for normalisation.

q2_30 ema_alpha_q30

Alpha used for calculating error_ema_energy in adapt.

float_s32_t delta

Delta value used in denominator to avoid large values when calculating inverse X energy.
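
The role of delta can be illustrated with a small floating point sketch. It assumes delta is simply added to the X energy before taking the reciprocal; the helper name inverse_energy is hypothetical, and the library itself works on block floating point vectors rather than float.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: per-bin inverse energy with a regularising delta.
 * With delta > 0 the result can never exceed 1 / delta, even for silent bins. */
void inverse_energy(float *inv_energy, const float *energy, size_t bins, float delta)
{
    assert(delta > 0.0f);
    for (size_t i = 0; i < bins; i++) {
        inv_energy[i] = 1.0f / (energy[i] + delta);
    }
}
```

A larger delta limits the normalisation gain applied to low energy bins at the cost of slower adaption in those bins.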

struct ic_adaption_controller_config_t
#include <ic_state.h>

IC adaption controller configuration structure.

This structure contains configuration settings that can be changed to alter the behaviour of the adaption controller. This includes processing of the raw VNR probability input and optional stability controller logic. It is automatically included as part of the IC state and initialised by ic_init().

The initial values for these configuration parameters are defined in ic_defines.h.

Public Members

q2_30 energy_alpha_q30

Alpha for EMA input/output energy calculation.

float_s32_t fast_ratio_threshold

Fast ratio threshold to detect instability.

float_s32_t high_input_vnr_hold_leakage_alpha

Setting of H_hat leakage which gets set if vnr detects high voice probability.

float_s32_t instability_recovery_leakage_alpha

Setting of H_hat leakage which gets set if fast ratio exceeds a threshold.

float_s32_t input_vnr_threshold

VNR input threshold which decides whether to hold or adapt the filter.

float_s32_t input_vnr_threshold_high

VNR high threshold to leak the filter if the speech level is high.

float_s32_t input_vnr_threshold_low

VNR low threshold to adapt faster when the speech level is low.

uint32_t adapt_counter_limit

Limits number of frames for which mu and leakage_alpha could be adapted.

uint8_t enable_adaption

Boolean which controls whether the IC adapts when ic_adapt() is called.

adaption_config_e adaption_config

Enum which controls the way mu and leakage_alpha are being adjusted.
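
One way the thresholds above could combine into a per-frame decision is sketched below. This is an assumption based only on the member descriptions, not the library's control logic: the ordering low <= threshold <= high, the priority given to the instability check, and all sketch_ and SK_ names are hypothetical.

```c
#include <assert.h>

/* Sketch-local actions; they only mirror the roles suggested by control_flag_e. */
typedef enum { SK_HOLD, SK_HOLD_LEAK, SK_ADAPT, SK_ADAPT_SLOW, SK_UNSTABLE } sketch_action_t;

typedef struct {
    float fast_ratio_threshold;
    float input_vnr_threshold;
    float input_vnr_threshold_high;
    float input_vnr_threshold_low;
} sketch_thresholds_t;

/* Assumed decision order: instability first, then the VNR bands from high to low. */
sketch_action_t sketch_choose_action(const sketch_thresholds_t *t, float vnr, float fast_ratio)
{
    assert(t->input_vnr_threshold_low <= t->input_vnr_threshold);
    assert(t->input_vnr_threshold <= t->input_vnr_threshold_high);
    if (fast_ratio > t->fast_ratio_threshold) return SK_UNSTABLE;   /* output energy too high */
    if (vnr >= t->input_vnr_threshold_high)   return SK_HOLD_LEAK;  /* strong speech: hold and leak */
    if (vnr >= t->input_vnr_threshold)        return SK_HOLD;       /* speech present: hold */
    if (vnr < t->input_vnr_threshold_low)     return SK_ADAPT;      /* little speech: adapt faster */
    return SK_ADAPT_SLOW;                                           /* in between: adapt slowly */
}
```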

struct ic_adaption_controller_state_t
#include <ic_state.h>

IC adaption controller state structure.

This structure contains state used for the instance of the adaption controller logic. It is automatically included as part of the IC state and initialised by ic_init().

Public Members

float_s32_t input_energy

EMWA of input frame energy.

float_s32_t output_energy

EMWA of output frame energy.

float_s32_t fast_ratio

Ratio between output and input EMWA energies.

uint32_t adapt_counter

Adaption counter which counts the number of frames that have been adapted.

control_flag_e control_flag

Flag that represents the state of the filter.

ic_adaption_controller_config_t adaption_controller_config

Configuration parameters for the adaption controller.
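
The relationship between the smoothed energies and fast_ratio can be sketched in floating point. The smoothing form alpha * old + (1 - alpha) * new is an assumption, as are the sketch_ names; the library stores these values as float_s32_t.

```c
#include <assert.h>

typedef struct {
    float input_energy;   /* smoothed input frame energy */
    float output_energy;  /* smoothed output frame energy */
    float fast_ratio;     /* output_energy / input_energy */
} sketch_energy_state_t;

/* Assumed smoothing form: new = alpha * old + (1 - alpha) * sample. */
static float sketch_ema(float old, float sample, float alpha)
{
    return alpha * old + (1.0f - alpha) * sample;
}

void sketch_update_energies(sketch_energy_state_t *s,
                            float in_energy, float out_energy, float alpha)
{
    assert(alpha >= 0.0f && alpha <= 1.0f);
    s->input_energy  = sketch_ema(s->input_energy, in_energy, alpha);
    s->output_energy = sketch_ema(s->output_energy, out_energy, alpha);
    /* Guard against division by zero when the input is silent. */
    s->fast_ratio = (s->input_energy > 0.0f) ? s->output_energy / s->input_energy : 0.0f;
}
```

A canceller that is working removes energy, so a ratio that rises above the configured threshold suggests the filter is adding energy and may be unstable.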

struct ic_state_t
#include <ic_state.h>

IC state structure.

This is the main state structure for an instance of the Interference Canceller. Before use it must be initialised using the ic_init() function. It contains everything needed for the IC instance including configuration and internal state of the filter, adaption logic and adaption controller.

Public Members

bfp_s32_t y_bfp[IC_Y_CHANNELS]

BFP array pointing to the time domain y input signal.

bfp_complex_s32_t Y_bfp[IC_Y_CHANNELS]

BFP array pointing to the frequency domain Y input signal.

int32_t y[IC_Y_CHANNELS][IC_FRAME_LENGTH + FFT_PADDING]

Storage for y and Y mantissas. Note FFT is done in-place so the y storage is reused for Y.

bfp_s32_t x_bfp[IC_X_CHANNELS]

BFP array pointing to the time domain x input signal.

bfp_complex_s32_t X_bfp[IC_X_CHANNELS]

BFP array pointing to the frequency domain X input signal.

int32_t x[IC_X_CHANNELS][IC_FRAME_LENGTH + FFT_PADDING]

Storage for x and X mantissas. Note FFT is done in-place so the x storage is reused for X.

bfp_s32_t prev_y_bfp[IC_Y_CHANNELS]

BFP array pointing to previous y samples which are used for framing.

int32_t y_prev_samples[IC_Y_CHANNELS][IC_FRAME_LENGTH - IC_FRAME_ADVANCE]

Storage for previous y mantissas.

bfp_s32_t prev_x_bfp[IC_X_CHANNELS]

BFP array pointing to previous x samples which are used for framing.

int32_t x_prev_samples[IC_X_CHANNELS][IC_FRAME_LENGTH - IC_FRAME_ADVANCE]

Storage for previous x mantissas.

bfp_complex_s32_t Y_hat_bfp[IC_Y_CHANNELS]

BFP array pointing to the estimated frequency domain Y signal.

complex_s32_t Y_hat[IC_Y_CHANNELS][IC_FD_FRAME_LENGTH]

Storage for Y_hat mantissas.

bfp_complex_s32_t Error_bfp[IC_Y_CHANNELS]

BFP array pointing to the frequency domain Error output.

bfp_s32_t error_bfp[IC_Y_CHANNELS]

BFP array pointing to the time domain Error output.

complex_s32_t Error[IC_Y_CHANNELS][IC_FD_FRAME_LENGTH]

Storage for Error and error mantissas. Note IFFT is done in-place so the Error storage is reused for error.

bfp_complex_s32_t H_hat_bfp[IC_Y_CHANNELS][IC_X_CHANNELS * IC_FILTER_PHASES]

BFP array pointing to the frequency domain estimate of transfer function.

complex_s32_t H_hat[IC_Y_CHANNELS][IC_FILTER_PHASES * IC_X_CHANNELS][IC_FD_FRAME_LENGTH]

Storage for H_hat mantissas.

bfp_complex_s32_t X_fifo_bfp[IC_X_CHANNELS][IC_FILTER_PHASES]

BFP array pointing to the frequency domain X input history used for calculating normalisation.

bfp_complex_s32_t X_fifo_1d_bfp[IC_X_CHANNELS * IC_FILTER_PHASES]

1D alias of the frequency domain X input history used for calculating normalisation.

complex_s32_t X_fifo[IC_X_CHANNELS][IC_FILTER_PHASES][IC_FD_FRAME_LENGTH]

Storage for X_fifo mantissas.

bfp_complex_s32_t T_bfp[IC_X_CHANNELS]

BFP array pointing to the frequency domain T used for adapting the filter coefficients (H). Note there is no associated storage because we re-use the x input array as a memory optimisation.

bfp_s32_t inv_X_energy_bfp[IC_X_CHANNELS]

BFP array pointing to the inverse X energies used for normalisation.

int32_t inv_X_energy[IC_X_CHANNELS][IC_FD_FRAME_LENGTH]

Storage for inv_X_energy mantissas.

bfp_s32_t X_energy_bfp[IC_X_CHANNELS]

BFP array pointing to the X energies.

int32_t X_energy[IC_X_CHANNELS][IC_FD_FRAME_LENGTH]

Storage for X_energy mantissas.

unsigned X_energy_recalc_bin

Index state used for calculating energy across all X bins.

bfp_s32_t overlap_bfp[IC_Y_CHANNELS]

BFP array pointing to the overlap array used for windowing and overlap operations.

int32_t overlap[IC_Y_CHANNELS][IC_FRAME_OVERLAP]

Storage for overlap mantissas.

int32_t y_input_delay[IC_Y_CHANNELS][IC_Y_CHANNEL_DELAY_SAMPS]

FIFO for delaying y channel (w.r.t x) to enable adaptive filter to be effective.

uint32_t y_delay_idx[IC_Y_CHANNELS]

Index state used for keeping track of y delay FIFO.

float_s32_t mu[IC_Y_CHANNELS][IC_X_CHANNELS]

Mu value used for controlling adaption rate.

float_s32_t leakage_alpha

Alpha used for leaking away H_hat, allowing filter to slowly forget adaption.

float_s32_t max_X_energy[IC_X_CHANNELS]

Used to keep track of peak X energy.

bfp_s32_t sigma_XX_bfp[IC_X_CHANNELS]

BFP array pointing to the EMA filtered X input energy.

int32_t sigma_XX[IC_X_CHANNELS][IC_FD_FRAME_LENGTH]

Storage for sigma_XX mantissas.

float_s32_t sum_X_energy[IC_X_CHANNELS]

X energy sum used for maintaining the X FIFO.

ic_config_params_t config_params

Configuration parameters for the IC.

ic_adaption_controller_state_t ic_adaption_controller_state

State and configuration parameters for the IC adaption controller.

vnr_pred_state_t vnr_pred_state

Input and output VNR prediction related state.
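
The y_input_delay and y_delay_idx members describe a delay FIFO with an index. A generic circular delay line of that kind is sketched below; the delay length and all sketch_ names are illustrative and do not reflect the library's IC_Y_CHANNEL_DELAY_SAMPS value or its implementation.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_DELAY_SAMPS 4   /* illustrative delay length only */

typedef struct {
    int32_t  fifo[SKETCH_DELAY_SAMPS];  /* samples waiting to be output */
    uint32_t idx;                       /* position of the oldest sample */
} sketch_delay_t;

/* Delay a block of samples in place by SKETCH_DELAY_SAMPS samples. */
void sketch_delay_block(sketch_delay_t *d, int32_t *samples, unsigned count)
{
    assert(d->idx < SKETCH_DELAY_SAMPS);
    for (unsigned i = 0; i < count; i++) {
        int32_t delayed = d->fifo[d->idx];  /* oldest stored sample */
        d->fifo[d->idx] = samples[i];       /* store the newest sample in its place */
        samples[i] = delayed;
        d->idx = (d->idx + 1) % SKETCH_DELAY_SAMPS;
    }
}
```

Because the index persists between calls, the delay is continuous across frame boundaries, which matches the note that y_data holds delayed samples after ic_filter() returns.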

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Interference Canceller Library$$$API Reference$$$lib_ic API Definitions£££modules/voice/modules/lib_ic/doc/src/reference/defines.html#lib-ic-api-definitions

Defines

IC_INIT_MU

Initial MU value applied on startup. MU controls the adaption rate of the IC and is normally adjusted by the adaption rate controller during operation.

IC_INIT_EMA_ALPHA

Alpha used for calculating y_ema_energy, x_ema_energy and error_ema_energy.

IC_INIT_LEAKAGE_ALPHA

Alpha used for leaking away H_hat, allowing filter to slowly forget adaption. This value is adjusted by the adaption rate controller if instability is detected.

IC_FILTER_PHASES

The number of filter phases supported by the IC. Each filter phase represents 15ms of filter length. Hence a 10 phase filter will allow cancellation of noise sources with up to 150ms of echo tail length. There is a tradeoff between adaption speed and maximum cancellation of the filter; increasing the number of phases will increase the maximum cancellation at the cost of increased xCORE resource usage and slower adaption times.

IC_Y_CHANNEL_DELAY_SAMPS

This is the number of samples by which one of the microphone signals is delayed in order for the filter to be effective. A larger number increases the delay through the filter but may improve cancellation. The group delay through the IC filter is 32 + this number of samples.

IC_INIT_SIGMA_XX_SHIFT

Down scaling factor for X energy calculation used for normalisation.

IC_INIT_GAMMA_LOG2

Up scaling factor for X energy calculation used for LMS normalisation.

IC_INIT_DELTA

Delta value used in denominator to avoid large values when calculating inverse X energy.

IC_INIT_FAST_RATIO_THRESHOLD

Fast ratio threshold to detect instability.

IC_INIT_ENERGY_ALPHA

Alpha for EMA input/output energy calculation.

IC_INIT_HIGH_INPUT_VNR_HOLD_LEAKAGE_ALPHA

Leakage alpha used in case vnr detects high voice probability.

IC_INIT_INSTABILITY_RECOVERY_LEAKAGE_ALPHA

Leakage alpha used in the case where instability is detected. This allows the filter to stabilise without completely forgetting the adaption.

IC_INIT_ADAPT_COUNTER_LIMIT

Limits number of frames for which mu and leakage_alpha could be adapted.

IC_INIT_INPUT_VNR_THRESHOLD

VNR input threshold which decides whether to hold or adapt the filter.

IC_INIT_INPUT_VNR_THRESHOLD_HIGH

VNR high threshold to leak the filter if the speech level is high.

IC_INIT_INPUT_VNR_THRESHOLD_LOW

VNR low threshold to adapt faster when the speech level is low.

IC_INIT_VNR_PRED_ALPHA

Alpha for EMA VNR prediction calculation.

IC_INIT_INPUT_VNR_PRED

Initial value for the input VNR prediction.

IC_INIT_OUTPUT_VNR_PRED

Initial value for the output VNR prediction.

IC_Y_CHANNELS

Number of Y channels input. This is fixed at 1 for the IC. The Y channel is delayed and is the microphone signal from which the estimated noise signal is subtracted. In practical terms it does not matter which microphone is X and which is Y. NOT USER MODIFIABLE.

IC_X_CHANNELS

Number of X channels input. This is fixed at 1 for the IC. The X channel is the reference microphone signal from which the noise estimate is generated before it is subtracted from Y. In practical terms it does not matter which microphone is X and which is Y. NOT USER MODIFIABLE.

IC_FRAME_LENGTH

Time domain samples block length used internally in the IC’s block LMS algorithm. NOT USER MODIFIABLE.

IC_FRAME_ADVANCE

IC new samples frame size. This is the number of samples of new data that the IC works on every frame. 240 samples at 16kHz is 15msec. Every frame, the IC takes in 15msec of mic data and generates 15msec of interference cancelled output. NOT USER MODIFIABLE.

IC_FD_FRAME_LENGTH

Number of bins of spectrum data computed when doing a DFT of an IC_FRAME_LENGTH length time domain vector. The IC_FD_FRAME_LENGTH spectrum values represent the bins from DC to Nyquist. NOT USER MODIFIABLE.

FFT_PADDING

Extra 2 samples you need to allocate in time domain so that the full spectrum (DC to Nyquist) can be stored after the in-place FFT. NOT USER MODIFIABLE.
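
The fixed sizes above are related by simple arithmetic, sketched here with illustrative constants. The 512 sample frame and 240 sample advance come from the overview; the N/2 + 1 bin count is the usual result for a real input DFT and is assumed here. All SK_ and sketch_ names are hypothetical.

```c
#include <assert.h>

enum {
    SK_SAMPLE_RATE_HZ  = 16000,
    SK_FRAME_LENGTH    = 512,                          /* time domain frame length */
    SK_FRAME_ADVANCE   = 240,                          /* new samples per frame */
    SK_FD_FRAME_LENGTH = SK_FRAME_LENGTH / 2 + 1,      /* 257 bins, DC to Nyquist */
    /* 257 complex bins need 514 int32 words, 2 more than the 512 real samples. */
    SK_FFT_PADDING     = 2 * SK_FD_FRAME_LENGTH - SK_FRAME_LENGTH
};

/* Duration of one frame advance in milliseconds. */
int sketch_frame_ms(void)
{
    return SK_FRAME_ADVANCE * 1000 / SK_SAMPLE_RATE_HZ;
}

/* Filter tail length in milliseconds for a given number of phases. */
int sketch_tail_ms(int phases)
{
    assert(phases >= 0);
    return phases * sketch_frame_ms();
}

/* Group delay in samples: 32 plus the configured y channel delay. */
int sketch_group_delay_samples(int y_delay_samps)
{
    assert(y_delay_samps >= 0);
    return 32 + y_delay_samps;
}
```

For example, a 10 phase filter gives sketch_tail_ms(10) of 150 ms, matching the IC_FILTER_PHASES description.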

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Interference Canceller Library$$$API Reference$$$lib_ic Header Files£££modules/voice/modules/lib_ic/doc/src/reference/header_files.html#lib-ic-header-files
XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Interference Canceller Library$$$API Reference$$$ic_defines.h£££modules/voice/modules/lib_ic/doc/src/reference/header_files.html#ic-defines-h

This header contains lib_ic public defines that are used to configure the interference canceller when ic_init() is called.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Interference Canceller Library$$$API Reference$$$ic_state.h£££modules/voice/modules/lib_ic/doc/src/reference/header_files.html#ic-state-h

This header contains definitions for data structures used in lib_ic. It also contains the configuration sub-structures which control the operation of the interference canceller during run-time.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Interference Canceller Library$$$API Reference$$$ic_api.h£££modules/voice/modules/lib_ic/doc/src/reference/header_files.html#ic-api-h

lib_ic public functions API.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Interference Canceller Library$$$On GitHub£££modules/voice/modules/lib_ic/doc/src/reference/header_files.html#on-github

lib_ic is present as part of fwk_voice. Get the latest version of fwk_voice from https://github.com/xmos/fwk_voice. The lib_ic module can be found in the modules/lib_ic directory in fwk_voice.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Interference Canceller Library$$$API£££modules/voice/modules/lib_ic/doc/src/reference/header_files.html#api

To use the functions in this library in an application, include ic_api.h in the application source file.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Voice To Noise Ratio Estimator Library£££modules/voice/modules/lib_vnr/doc/index.html#voice-to-noise-ratio-estimator-library

lib_vnr is a library which estimates the ratio of speech signal to noise for an input audio stream. lib_vnr library functions use lib_xcore_math to perform DSP using low-level optimised operations, and lib_tflite_micro and lib_nn to perform inference using an optimised TensorFlow Lite model.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Voice To Noise Ratio Estimator Library$$$Repository Structure£££modules/voice/modules/lib_vnr/doc/src/getting_started.html#repository-structure
  • modules/lib_vnr - The lib_vnr library directory within https://github.com/xmos/fwk_voice/. Within lib_vnr:

    • api/ - Header files containing the public API for lib_vnr.

    • doc/ - Library documentation source (for non-embedded documentation) and build directory.

    • src/ - Library source code.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Voice To Noise Ratio Estimator Library$$$Requirements£££modules/voice/modules/lib_vnr/doc/src/getting_started.html#requirements

lib_vnr is included as part of the fwk_voice GitHub repository and all requirements for cloning and building fwk_voice apply. It depends on lib_xcore_math and the xmos-ai-tools Python package.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Voice To Noise Ratio Estimator Library$$$API Structure£££modules/voice/modules/lib_vnr/doc/src/getting_started.html#api-structure

The API is split into 2 parts; feature extraction and inference. The feature extraction API processes an input audio frame to extract features that are input to the inference stage. The inference API has functions for running inference using the VNR TensorFlow Lite model to predict the speech to noise ratio. Both feature extraction and inference APIs have initialisation functions that are called only once at device initialisation and processing functions that are called every frame. The performance requirement is relatively low, around 5 MIPS for initialisation and 3 MIPS for processing, and as such the library is supplied as a single threaded implementation only.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Voice To Noise Ratio Estimator Library$$$Getting and Building£££modules/voice/modules/lib_vnr/doc/src/getting_started.html#getting-and-building

The VNR estimator module is obtained as part of the parent fwk_voice repo clone. It is present in fwk_voice/modules/lib_vnr.

Both feature extraction and the inference parts of lib_vnr can be compiled as static libraries. The application can link against libfwk_voice_module_lib_vnr_features.a and/or libfwk_voice_module_lib_vnr_inference.a and add lib_vnr/api/features and/or lib_vnr/api/inference and lib_vnr/api/common as include directories.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Voice To Noise Ratio Estimator Library$$$VNR Inference Model£££modules/voice/modules/lib_vnr/doc/src/getting_started.html#vnr-inference-model

The VNR estimator module uses a neural network model to predict the SNR of speech in noise for incoming data. The model used is a pre-trained TensorFlow Lite model that has been optimised for the XCORE architecture using the xmos-ai-tools xformer. The optimised model is compiled as part of the VNR Inference Engine. Changing the model at runtime is not supported. To change to a different model, the application needs to generate the model related files and recompile. This process is automated through the build system, as described below.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Voice To Noise Ratio Estimator Library$$$VNR Inference Model$$$Integrating a TensorFlow Lite model into the VNR module£££modules/voice/modules/lib_vnr/doc/src/getting_started.html#integrating-a-tensorflow-lite-model-into-the-vnr-module

To integrate the new TensorFlow Lite model into the VNR module:

  1. Put an unoptimised model into fwk_voice/modules/lib_vnr/python/model/model_output/trained_model.tflite

  2. Rerun the build tool of your choice (make or ninja, for example)

This will use xmos-ai-tools to optimise the .tflite model for xcore and generate .cpp and .h files in fwk_voice/modules/lib_vnr/src/inference/model/. The generated files will be picked up by the build system and compiled into the VNR module.

The process described above only generates an optimised model that would run on a single core.

Any new models replacing the existing one should have the same set of input features, input and output size, and data types as the existing model. If changes to the features are made, the feature extraction code must be updated. Note that the VNR is used to control the IC behavior, and so its performance may also change.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Voice To Noise Ratio Estimator Library$$$VNR Overview£££modules/voice/modules/lib_vnr/doc/src/overview.html#vnr-overview

The VNR (Voice to Noise Ratio) estimator predicts the signal to noise ratio of a speech signal in noise, using a pre-trained neural network. The VNR neural network model outputs a value between 0 and 1, with 1 indicating the strongest speech, and 0, the weakest speech compared to noise in a frame of audio data.

The VNR module processes VNR_FRAME_ADVANCE new audio pcm samples every frame. The time domain input is transformed to frequency domain using a 512 point DFT. A MEL filterbank is then applied to compress the DFT output spectrum into fewer data points. The MEL filter outputs of VNR_PATCH_WIDTH most recent frames are normalised and fed as input features to the VNR prediction model which runs an inference over the features to output the VNR estimate value.

VNR estimations can be very helpful in voice processing pipelines. Applications for VNR include intelligent power management, control of adaptive filters for reducing noise sources and improved performance of AGC (Automatic Gain Control) blocks that provide a more natural listening experience.

The VNR API is split into 2 parts; feature extraction and inference. This is done to allow multiple sets of features to use the same inference engine. The VNR feature extraction is further split into 2 parts; a function to form the input frame that the feature extraction can run on, and a function to do the actual feature extraction. The function for forming the input frame starts from VNR_FRAME_ADVANCE new pcm samples and creates the DFT output that is used as input to the MEL filterbank. This has been separated from the rest of the feature extraction to support cases where the VNR might be using the DFT output computed in another module for extracting features.

The pre-trained TensorFlow Lite model used for VNR inference has been optimised for XCORE and compiled as part of the VNR inference static library. There is no support for providing a new model to the inference engine at run time.

Before starting the feature extraction, the user must call vnr_input_state_init() and vnr_feature_state_init() to initialise the form input frame and feature extraction state. Before starting inference, the user must call vnr_inference_init() to initialise the inference engine.

There are no user configurable parameters within the VNR and so no arguments are required and no configuration structures need be tuned.

Once the VNR is initialised, the vnr_form_input_frame(), vnr_extract_features() and vnr_inference() functions should be called on a frame by frame basis.
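
The initialisation and per-frame sequence can be sketched as follows. So that the snippet builds without the library, everything between the stand-in markers is a placeholder written for this example: the types are simplified, the sizes are illustrative, and the function bodies do no real processing. In an application these come from the lib_vnr and lib_xcore_math headers. The wrapper name vnr_process_frame is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* ---- Stand-ins so the sketch builds off-target. Placeholders, not library code. ---- */
#define VNR_FRAME_ADVANCE   240
#define VNR_FD_FRAME_LENGTH 257   /* illustrative */
#define VNR_PATCH_WIDTH     4     /* illustrative */
#define VNR_MEL_FILTERS     24    /* illustrative */
typedef struct { int32_t re; int32_t im; } complex_s32_t;
typedef struct { complex_s32_t *data; int exp; unsigned length; } bfp_complex_s32_t;
typedef struct { int32_t *data; int exp; unsigned length; } bfp_s32_t;
typedef struct { int32_t mant; int exp; } float_s32_t;
typedef struct { int initialised; } vnr_input_state_t;
typedef struct { int initialised; } vnr_feature_state_t;

void vnr_input_state_init(vnr_input_state_t *s) { s->initialised = 1; }
void vnr_feature_state_init(vnr_feature_state_t *s) { s->initialised = 1; }
int32_t vnr_inference_init(void) { return 0; }
void vnr_form_input_frame(vnr_input_state_t *s, bfp_complex_s32_t *X,
                          complex_s32_t X_data[VNR_FD_FRAME_LENGTH],
                          const int32_t new_x_frame[VNR_FRAME_ADVANCE]) {
    (void)s; (void)new_x_frame;
    memset(X_data, 0, VNR_FD_FRAME_LENGTH * sizeof(complex_s32_t));
    X->data = X_data; X->exp = 0; X->length = VNR_FD_FRAME_LENGTH;
}
void vnr_extract_features(vnr_feature_state_t *s, bfp_s32_t *feature_patch,
                          int32_t feature_patch_data[VNR_PATCH_WIDTH * VNR_MEL_FILTERS],
                          const bfp_complex_s32_t *X) {
    (void)s; (void)X;
    memset(feature_patch_data, 0, VNR_PATCH_WIDTH * VNR_MEL_FILTERS * sizeof(int32_t));
    feature_patch->data = feature_patch_data;
    feature_patch->exp = 0;
    feature_patch->length = VNR_PATCH_WIDTH * VNR_MEL_FILTERS;
}
void vnr_inference(float_s32_t *vnr_output, bfp_s32_t *features) {
    (void)features;
    vnr_output->mant = 0; vnr_output->exp = 0;
}
/* ---- End of stand-ins. ---- */

/* Hypothetical wrapper: form the input frame, extract features, run inference. */
float_s32_t vnr_process_frame(vnr_input_state_t *input_state,
                              vnr_feature_state_t *feature_state,
                              const int32_t new_samples[VNR_FRAME_ADVANCE])
{
    complex_s32_t X_data[VNR_FD_FRAME_LENGTH];
    bfp_complex_s32_t X;
    int32_t patch_data[VNR_PATCH_WIDTH * VNR_MEL_FILTERS];
    bfp_s32_t patch;
    float_s32_t vnr;
    assert(input_state != NULL && feature_state != NULL);
    vnr_form_input_frame(input_state, &X, X_data, new_samples);
    vnr_extract_features(feature_state, &patch, patch_data, &X);
    vnr_inference(&vnr, &patch);
    return vnr;
}
```

The three init functions are called once at startup; vnr_process_frame() is then called once per VNR_FRAME_ADVANCE new samples.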

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Voice To Noise Ratio Estimator Library$$$API Reference£££modules/voice/modules/lib_vnr/doc/src/reference/index.html#api-reference
XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Voice To Noise Ratio Estimator Library$$$API Reference$$$lib_vnr feature extraction API Functions£££modules/voice/modules/lib_vnr/doc/src/reference/api.html#lib-vnr-feature-extraction-api-functions

Functions

void vnr_input_state_init(vnr_input_state_t *input_state)

Initialise previous frame samples buffer that is used when creating an input frame for processing through the VNR estimator.

This function should be called once at device startup.

Parameters:
  • input_state[inout] pointer to the VNR input state structure

void vnr_form_input_frame(vnr_input_state_t *input_state, bfp_complex_s32_t *X, complex_s32_t X_data[VNR_FD_FRAME_LENGTH], const int32_t new_x_frame[VNR_FRAME_ADVANCE])

Create the input frame for processing through the VNR estimator.

This function takes in VNR_FRAME_ADVANCE new samples, combines them with previous frame’s samples to form a VNR_PROC_FRAME_LENGTH samples input frame of time domain data, and outputs the DFT spectrum of the input frame. The DFT spectrum is output in the BFP structure and data memory provided by the user.

The frequency spectrum output from this function is processed through the VNR feature extraction stage.

If sharing the DFT spectrum calculated in some other module, vnr_form_input_frame() is not needed.

Example

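#include "vnr_features_api.h"
vnr_input_state_t vnr_input_state;       // initialise once with vnr_input_state_init()
int32_t new_data[VNR_FRAME_ADVANCE];     // new time domain samples for this frame
complex_s32_t DWORD_ALIGNED input_frame[VNR_FD_FRAME_LENGTH];
bfp_complex_s32_t X;
vnr_form_input_frame(&vnr_input_state, &X, input_frame, new_data);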

Parameters:
  • input_state[inout] pointer to the VNR input state structure

  • X[out] pointer to a variable of type bfp_complex_s32_t that the user allocates. The user doesn’t need to initialise this bfp variable. After this function, X is updated to point to the DFT output spectrum and can be passed as input to the feature extraction stage.

  • X_data[out] pointer to VNR_FD_FRAME_LENGTH values of type complex_s32_t that the user allocates. After this function, the DFT spectrum values are written to this array, and X->data points to X_data memory.

  • new_x_frame[in] Pointer to VNR_FRAME_ADVANCE new time domain samples

void vnr_feature_state_init(vnr_feature_state_t *feature_state)

Initialise the state structure for the VNR feature extraction stage.

This function is called once at device startup.

Parameters:
  • feature_state[inout] pointer to the VNR feature extraction state structure

void vnr_extract_features(vnr_feature_state_t *vnr_feature_state, bfp_s32_t *feature_patch, int32_t feature_patch_data[VNR_PATCH_WIDTH * VNR_MEL_FILTERS], const bfp_complex_s32_t *X)

Extract features.

This function takes in the DFT spectrum of the VNR input frame and does the feature extraction. The features are written to the feature_patch BFP structure and feature_patch_data memory provided by the user. The features output from this function are passed as input to the VNR inference engine.

Parameters:
  • vnr_feature_state[inout] Pointer to the VNR feature extraction state structure

  • feature_patch[out] Pointer to the bfp_s32_t structure allocated by the user. The user doesn’t need to initialise this BFP structure before passing it to this function. After this function call feature_patch will be updated and will point to the extracted features. It can then be passed to the inference stage.

  • feature_patch_data[out] Pointer to the VNR_PATCH_WIDTH * VNR_MEL_FILTERS int32_t values allocated by the user. The extracted features will be written to the feature_patch_data array and the BFP structure’s feature_patch->data will point to this array.

  • X[in] Pointer to the DFT spectrum of the VNR input frame, for example the X output of vnr_form_input_frame().

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Voice To Noise Ratio Estimator Library$$$API Reference$$$lib_vnr inference engine API Functions£££modules/voice/modules/lib_vnr/doc/src/reference/api.html#lib-vnr-inference-engine-api-functions

Functions

int32_t vnr_inference_init()

Initialise the inference_engine object and load the VNR model into the inference engine.

This function calls lib_tflite_micro functions to initialise the inference engine and load the VNR model into it. It is called once at startup. The memory required for the inference engine object as well as the tensor arena size required for inference is statically allocated as global buffers in the VNR module. The VNR model is compiled as part of the VNR module.

void vnr_inference(float_s32_t *vnr_output, bfp_s32_t *features)

Run model prediction on a feature patch.

This function invokes the inference engine. It takes in a set of features corresponding to an input frame of data and outputs the VNR prediction value. The VNR output is a single value ranging between 0 and 1 returned in float_s32_t format, with 0 being the lowest SNR and 1 being the strongest possible SNR in speech compared to noise.

Parameters:
  • vnr_output[out] VNR prediction value.

  • features[in] Input feature vector. Note that this is not passed as a const pointer and the feature memory is overwritten as part of the inference computation.
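
To compare the VNR output against a threshold it can help to convert it to a native floating point value. The sketch below assumes the lib_xcore_math convention that a float_s32_t represents mant × 2^exp; the typedef is a stand-in so the snippet builds on its own, and the sketch_ function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the library type: value = mant * 2^exp (assumed convention). */
typedef struct { int32_t mant; int exp; } float_s32_t;

/* Convert to double by repeated exact scaling by powers of two. */
double sketch_float_s32_to_double(float_s32_t f)
{
    double v = (double)f.mant;
    for (int e = f.exp; e > 0; e--) v *= 2.0;
    for (int e = f.exp; e < 0; e++) v *= 0.5;
    return v;
}

/* Hypothetical decision helper: treat a frame as speech above a threshold in [0, 1]. */
int sketch_vnr_is_speech(float_s32_t vnr, double threshold)
{
    assert(threshold >= 0.0 && threshold <= 1.0);
    return sketch_float_s32_to_double(vnr) >= threshold;
}
```

For example, a mantissa of 0x40000000 with an exponent of -31 represents 0.5.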

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Voice To Noise Ratio Estimator Library$$$API Reference$$$lib_vnr #defines common to feature extraction and inference£££modules/voice/modules/lib_vnr/doc/src/reference/common_defines.html#lib-vnr-defines-common-to-feature-extraction-and-inference

Defines

VNR_MEL_FILTERS

Number of filters in the MEL filterbank used in the VNR feature extraction.

VNR_PATCH_WIDTH

Number of frames that make up a full set of features for the inference to run on.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Voice To Noise Ratio Estimator Library$$$API Reference$$$lib_vnr feature extraction #defines and data structure definitions£££modules/voice/modules/lib_vnr/doc/src/reference/state.html#lib-vnr-feature-extraction-defines-and-data-structure-definitions

Defines

VNR_PROC_FRAME_LENGTH

Time domain samples block length used internally in VNR DFT computation. NOT USER MODIFIABLE.

VNR_FRAME_ADVANCE

VNR new samples frame size. This is the number of samples of new data that the VNR processes every frame. 240 samples at 16kHz is 15msec. NOT USER MODIFIABLE.

VNR_FD_FRAME_LENGTH

Number of bins of spectrum data computed when doing a DFT of a VNR_PROC_FRAME_LENGTH length time domain vector. The VNR_FD_FRAME_LENGTH spectrum values represent the bins from DC to Nyquist. NOT USER MODIFIABLE.

struct vnr_input_state_t
#include <vnr_features_state.h>

VNR form_input state structure.

Public Members

int32_t prev_input_samples[VNR_PROC_FRAME_LENGTH - VNR_FRAME_ADVANCE]

Previous frame time domain input samples which are combined with VNR_FRAME_ADVANCE new samples to form the VNR input frame.

struct vnr_feature_config_t
#include <vnr_features_state.h>

VNR feature extraction config structure.

Public Members

int32_t enable_highpass

Enable highpass filtering of VNR MEL filter output. Disabled by default.

struct vnr_feature_state_t
#include <vnr_features_state.h>

State structure used in VNR feature extraction.

Public Members

int32_t feature_buffers[VNR_PATCH_WIDTH][VNR_MEL_FILTERS]

Feature buffer containing the MEL frequency spectrum of the most recent VNR_PATCH_WIDTH frames, with VNR_MEL_FILTERS values per frame.
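
The two-dimensional layout suggests a rolling window of per-frame MEL vectors. The sketch below shows one way such a window could be maintained: drop the oldest row and append the newest. The ex_ names, the sizes, and the oldest-first row ordering are all assumptions for illustration; the library's own update logic may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EX_PATCH_WIDTH 4    /* assumed number of frames per patch */
#define EX_MEL_FILTERS 24   /* assumed number of MEL filters */

/* Hypothetical mirror of the feature state structure. */
typedef struct {
    int32_t feature_buffers[EX_PATCH_WIDTH][EX_MEL_FILTERS];
} ex_feature_state_t;

/* Discard the oldest frame (row 0), shift the rest down by one row, and
 * store the new frame in the last row. */
void ex_push_features(ex_feature_state_t *state,
                      const int32_t new_frame[EX_MEL_FILTERS])
{
    assert(state != NULL && new_frame != NULL);

    /* Rows overlap during the shift, so memmove is required here. */
    memmove(state->feature_buffers[0], state->feature_buffers[1],
            (EX_PATCH_WIDTH - 1) * sizeof(state->feature_buffers[0]));
    memcpy(state->feature_buffers[EX_PATCH_WIDTH - 1], new_frame,
           sizeof(state->feature_buffers[0]));
}
```

After EX_PATCH_WIDTH calls the buffer holds a full patch, which matches the description of VNR_PATCH_WIDTH as the number of frames needed for one inference.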

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Voice To Noise Ratio Estimator Library$$$API Reference$$$lib_vnr Header Files£££modules/voice/modules/lib_vnr/doc/src/reference/header_files.html#lib-vnr-header-files
XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Voice To Noise Ratio Estimator Library$$$API Reference$$$vnr_features_api.h£££modules/voice/modules/lib_vnr/doc/src/reference/header_files.html#vnr-features-api-h
page page_vnr_features_api_h

This header contains lib_vnr features extraction API functions.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Voice To Noise Ratio Estimator Library$$$API Reference$$$vnr_inference_api.h£££modules/voice/modules/lib_vnr/doc/src/reference/header_files.html#vnr-inference-api-h
page page_vnr_inference_api_h

This header contains lib_vnr inference engine API functions.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Voice To Noise Ratio Estimator Library$$$API Reference$$$vnr_defines.h£££modules/voice/modules/lib_vnr/doc/src/reference/header_files.html#vnr-defines-h
page page_vnr_defines_h

This header contains the lib_vnr public #defines that are common to both feature extraction and inference.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Voice To Noise Ratio Estimator Library$$$API Reference$$$vnr_features_state.h£££modules/voice/modules/lib_vnr/doc/src/reference/header_files.html#vnr-features-state-h
page page_vnr_features_state_h

This header contains lib_vnr feature extraction related public #defines and data structure definitions.

XCORE ® -VOICE Solutions$$$Audio Processing$$$Audio Features$$$Voice To Noise Ratio Estimator Library$$$On GitHub£££modules/voice/modules/lib_vnr/doc/src/reference/header_files.html#on-github

lib_vnr is present as part of fwk_voice. Get the latest version of fwk_voice from https://github.com/xmos/fwk_voice. The lib_vnr module can be found in the modules/lib_vnr directory in fwk_voice.

XCORE ® -VOICE Solutions$$$Build System User Guide£££modules/rtos/doc/build_system_guide/index.html#build-system-user-guide

XCORE ® -VOICE Solutions$$$Build System User Guide$$$Build System£££modules/rtos/doc/build_system_guide/introduction.html#build-system

This document describes the CMake-based build system used by applications based on the XMOS RTOS framework. The build system is designed so that a user does not have to be a CMake expert, although some familiarity with CMake is helpful. You can familiarize yourself by reading the CMake Tutorial or the CMake documentation. Reviewing these is optional and can be saved for later.

XCORE ® -VOICE Solutions$$$Build System User Guide$$$Build System$$$Overview£££modules/rtos/doc/build_system_guide/introduction.html#overview

An xcore RTOS project can be seen as an integration of several modules. For example, for a FreeRTOS application that captures audio from PDM microphones and outputs it to a DAC, there could be the following modules:

  • Several core modules (for debug prints, etc…)

  • The FreeRTOS kernel and drivers

  • PDM microphone array driver for receiving audio samples

  • I2C driver for configuring the DAC

  • I2S driver for outputting to the DAC

  • Application code tying it all together

When a project is compiled, the build system will build all libraries and source files required for the application. For this to happen, your CMakeLists.txt file will need to specify:

  • Application sources and include paths

  • Compile flags

  • Compile definitions

  • Link libraries

  • Link options

This is best illustrated with a commented Example CMakeLists.txt.

XCORE ® -VOICE Solutions$$$Build System User Guide$$$Build System$$$Aliases£££modules/rtos/doc/build_system_guide/introduction.html#aliases

Your CMakeLists.txt file will need to specify the target link libraries as shown in the following snippet:

target_link_libraries(my_target PUBLIC
    core::general
    rtos::freertos
    rtos::drivers::mic_array
    rtos::drivers::i2c
    rtos::drivers::i2s
    lib_mic_array
    lib_i2c
    lib_i2s
)

It is very common for target link alias libraries, like rtos::freertos in the snippet above, to include common sets of target link libraries. The snippet above could be simplified because the rtos::freertos alias includes many commonly used drivers and peripheral IO libraries as a dependency.

target_link_libraries(my_target PUBLIC
    core::general
    rtos::freertos
)

Application target link libraries can be further simplified using existing bsp_configs. These provide their dependent link libraries enabling applications to simplify their target link libraries list. The snippet above could be simplified because the rtos::bsp_config::xcore_ai_explorer alias includes core::general, rtos::freertos, and all required drivers and peripheral IO libraries used by the bsp_config. More information on bsp_configs can be found in the RTOS Programming Guide.

target_link_libraries(my_target PUBLIC
    rtos::bsp_config::xcore_ai_explorer
)

XMOS libraries and frameworks provide several target aliases. Being aware of the Targets will simplify your application CMakeLists.txt.

XCORE ® -VOICE Solutions$$$Build System User Guide$$$Example CMakeLists.txt£££modules/rtos/doc/build_system_guide/cmakelists.html#example-cmakelists-txt

CMake is a powerful tool that gives the developer a great deal of flexibility in how their projects are built. As a result, CMakeLists.txt files can accomplish the same function in multiple ways.

Below is an example CMakeLists.txt that shows both required and conventional commands for a basic FreeRTOS project. This example can be used as a starting point for your application, but it is recommended to copy a CMakeLists.txt from an XMOS reference design or other example application that closely resembles your application.

## Specify your application sources by globbing the src folder
file(GLOB_RECURSE APP_SOURCES src/*.c)

## Specify your application include paths
set(APP_INCLUDES src)

## Specify your compiler flags
set(APP_COMPILER_FLAGS
   -Os
   -report
   -fxscope
   -mcmodel=large
   ${CMAKE_CURRENT_SOURCE_DIR}/src/config.xscope
   ${CMAKE_CURRENT_SOURCE_DIR}/XCORE-AI-EXPLORER.xn
)

## Specify any compile definitions
set(APP_COMPILE_DEFINITIONS
   configENABLE_DEBUG_PRINTF=1
   PLATFORM_USES_TILE_0=1
   PLATFORM_USES_TILE_1=1
)

## Set your link libraries
set(APP_LINK_LIBRARIES
   rtos::bsp_config::xcore_ai_explorer
)

## Set your link options
set(APP_LINK_OPTIONS
   -report
   ${CMAKE_CURRENT_SOURCE_DIR}/XCORE-AI-EXPLORER.xn
   ${CMAKE_CURRENT_SOURCE_DIR}/src/config.xscope
)

## Create your targets

## Create the target for the portion of application code that will execute on tile[0]
set(TARGET_NAME tile0_my_app)
add_executable(${TARGET_NAME} EXCLUDE_FROM_ALL)
target_sources(${TARGET_NAME} PUBLIC ${APP_SOURCES})
target_include_directories(${TARGET_NAME} PUBLIC ${APP_INCLUDES})
target_compile_definitions(${TARGET_NAME} PUBLIC ${APP_COMPILE_DEFINITIONS} THIS_XCORE_TILE=0)
target_compile_options(${TARGET_NAME} PRIVATE ${APP_COMPILER_FLAGS})
target_link_libraries(${TARGET_NAME} PUBLIC ${APP_LINK_LIBRARIES})
target_link_options(${TARGET_NAME} PRIVATE ${APP_LINK_OPTIONS})
unset(TARGET_NAME)

## Create the target for the portion of application code that will execute on tile[1]
set(TARGET_NAME tile1_my_app)
add_executable(${TARGET_NAME} EXCLUDE_FROM_ALL)
target_sources(${TARGET_NAME} PUBLIC ${APP_SOURCES})
target_include_directories(${TARGET_NAME} PUBLIC ${APP_INCLUDES})
target_compile_definitions(${TARGET_NAME} PUBLIC ${APP_COMPILE_DEFINITIONS} THIS_XCORE_TILE=1)
target_compile_options(${TARGET_NAME} PRIVATE ${APP_COMPILER_FLAGS})
target_link_libraries(${TARGET_NAME} PUBLIC ${APP_LINK_LIBRARIES})
target_link_options(${TARGET_NAME} PRIVATE ${APP_LINK_OPTIONS})
unset(TARGET_NAME)

## Merge tile[0] and tile[1] binaries into a single binary using an XMOS CMake macro
merge_binaries(my_app tile0_my_app tile1_my_app 1)

## Optionally create run and debug targets using XMOS CMake macros
create_run_target(my_app)
create_debug_target(my_app)

For more information, see the documentation for each of the CMake commands used in the example above.

See Macros for more information on the XMOS CMake macros.
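
Because both tile targets compile the same sources and differ only in the THIS_XCORE_TILE definition, application code can select tile-specific paths at compile time. The following is a minimal sketch of that pattern. The helper names are hypothetical, and the fallback definition exists only so the sketch compiles outside the build system.

```c
#include <assert.h>

/* THIS_XCORE_TILE is normally supplied by the build through
 * target_compile_definitions. Default to tile 0 so that this sketch also
 * compiles standalone. */
#ifndef THIS_XCORE_TILE
#define THIS_XCORE_TILE 0
#endif

/* Hypothetical helper: nonzero when this image is built for `tile`. */
int example_on_tile(int tile)
{
    assert(tile >= 0);
    return tile == THIS_XCORE_TILE;
}

/* Hypothetical init routine: returns how many tile-specific setup steps
 * were executed in this image. */
int example_platform_init(void)
{
    int steps = 0;

    if (example_on_tile(0)) {
        /* Setup that must only happen on tile 0 would go here. */
        steps++;
    }
    if (example_on_tile(1)) {
        /* Setup that must only happen on tile 1 would go here. */
        steps++;
    }
    return steps;
}
```

Each tile's binary runs exactly one of the two branches, so example_platform_init returns 1 in either image.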

XCORE ® -VOICE Solutions$$$Build System User Guide$$$Targets£££modules/rtos/doc/build_system_guide/targets.html#targets

The following library target aliases can be used in your application CMakeLists.txt. An example of how to add aliases to your target link libraries is shown below:

target_link_libraries(my_app PUBLIC core::general rtos::freertos)

XCORE ® -VOICE Solutions$$$Build System User Guide$$$Targets$$$General£££modules/rtos/doc/build_system_guide/targets.html#general

Several aliases are provided that specify a collection of libraries with similar functions. These composite target libraries provide a concise alternative to specifying all the individual targets that are commonly required.

Composite Target Libraries

Target

Description

core::general

Commonly used core libraries

io::general

Commonly used peripheral libraries

io::audio

Commonly used peripheral libraries for audio applications

rtos::freertos

Commonly used RTOS libraries

XCORE ® -VOICE Solutions$$$Build System User Guide$$$Targets$$$Core£££modules/rtos/doc/build_system_guide/targets.html#core

If you prefer, you can specify individual core library targets.

Core Libraries

Target

Description

framework_core_clock_control

Clock control API

framework_core_utils

General utilities used by most applications

framework_core_legacy_compat

For compatibility with XC

lib_xcore_math

VPU-optimized math library

XCORE ® -VOICE Solutions$$$Build System User Guide$$$Targets$$$Peripherals£££modules/rtos/doc/build_system_guide/targets.html#peripherals

If you prefer, you can specify individual peripheral libraries.

Peripheral Libraries

Target

Description

lib_i2c

I2C library

lib_spi

SPI library

lib_uart

UART library

lib_qspi_io

QSPI library

lib_xud

XUD USB library

lib_i2s

I2S library

lib_mic_array

Microphone Array library

XCORE ® -VOICE Solutions$$$Build System User Guide$$$Targets$$$RTOS£££modules/rtos/doc/build_system_guide/targets.html#rtos

Several aliases are provided that specify a collection of RTOS libraries with similar functions. These composite target libraries provide a concise alternative to specifying all the individual targets that are commonly required.

Composite RTOS Libraries

Target

Description

rtos::freertos

All libraries used by most FreeRTOS applications

rtos::drivers::all

All RTOS Driver libraries

rtos::freertos_usb

All libraries to support development with TinyUSB

rtos::sw_services::general

Most commonly used RTOS software service libraries

rtos::iot

All IoT libraries

rtos::wifi

All WiFi libraries

These board support libraries simplify development with a specific board.

Board Support Libraries

Target

Description

rtos::bsp_config::xcore_ai_explorer

xcore.ai Explorer RTOS board support library

If you prefer, you can specify individual RTOS driver libraries.

Individual RTOS Driver Libraries

Target

Description

rtos::drivers::uart

UART RTOS driver library

rtos::drivers::i2c

I2C RTOS driver library

rtos::drivers::i2s

I2S RTOS driver library

rtos::drivers::spi

SPI RTOS driver library

rtos::drivers::qspi_io

QSPI RTOS driver library

rtos::drivers::mic_array

Microphone Array RTOS driver library

rtos::drivers::usb

USB RTOS driver library

rtos::drivers::dfu_image

RTOS DFU driver library

rtos::drivers::gpio

GPIO RTOS driver library

rtos::drivers::l2_cache

L2 Cache RTOS driver library

rtos::drivers::clock_control

Clock control RTOS driver library

rtos::drivers::trace

Trace RTOS driver library

rtos::drivers::swmem

SwMem RTOS driver library

rtos::drivers::wifi

WiFi RTOS driver library

rtos::drivers::intertile

Intertile RTOS driver library

rtos::drivers::rpc

Remote procedure call RTOS driver library

If you prefer, you can specify individual software service libraries.

Individual Software Service Libraries

Target

Description

rtos::sw_services::fatfs

FatFS library

rtos::sw_services::usb

USB library

rtos::sw_services::device_control

Device control library

rtos::sw_services::usb_device_control

USB device control library

rtos::sw_services::wifi_manager

WiFi manager library

rtos::sw_services::tls_support

TLS library

rtos::sw_services::dhcp

DHCP library

rtos::sw_services::json

JSON library

rtos::sw_services::http

HTTP library

rtos::sw_services::sntpd

SNTP daemon library

rtos::sw_services::mqtt

MQTT library

The following libraries for building host applications are also provided by the SDK.

Host (x86) Libraries

Target

Description

rtos::sw_services::device_control_host_usb

Host USB device control library

XCORE ® -VOICE Solutions$$$Build System User Guide$$$Macros£££modules/rtos/doc/build_system_guide/macros.html#macros

Several CMake macros and functions are provided to make building for XCORE easier. These macros are located in the file tools/cmake_utils/xmos_macros.cmake and are documented below.

To see what XTC Tools commands the macros and functions are running, add VERBOSE=1 to your build command line. For example:

make run_my_target VERBOSE=1

XCORE ® -VOICE Solutions$$$Build System User Guide$$$Macros$$$Common Macros£££modules/rtos/doc/build_system_guide/macros.html#common-macros

XCORE ® -VOICE Solutions$$$Build System User Guide$$$Macros$$$Common Macros$$$merge_binaries£££modules/rtos/doc/build_system_guide/macros.html#merge-binaries

merge_binaries combines multiple xcore applications into one by extracting a tile elf and recombining it into another binary. This is used in multitile RTOS applications to enable building unique instances of the FreeRTOS kernel and task sets on a per tile basis. This macro takes an output target name, a base target, a target containing a tile to merge, and the tile number to merge.

This macro can be called in two ways. The 4-argument version is for applications with only one node, so only the tile number needs to be specified.

# create target OUT by replacing tile number 0 in BASE with tile 0 in OTHER
merge_binaries(${OUT} ${BASE} ${OTHER} 0)

The 5-argument version is for multi-node applications. IMPORTANT: the node number is not the “Node Id” from the xn file, but the index of the node in the JTAGChain defined in the xn file.

# create target OUT by replacing tile 1 on node 0 in BASE with tile 1 on
# node 0 in OTHER
merge_binaries(${OUT} ${BASE} ${OTHER} 0 1)
XCORE ® -VOICE Solutions$$$Build System User Guide$$$Macros$$$Common Macros$$$create_run_target£££modules/rtos/doc/build_system_guide/macros.html#create-run-target

create_run_target creates a run target for <TARGET_NAME> with xscope output.

create_run_target(<TARGET_NAME>)

create_run_target allows you to run a binary with the following command instead of invoking xrun --xscope.

make run_my_target
XCORE ® -VOICE Solutions$$$Build System User Guide$$$Macros$$$Common Macros$$$create_debug_target£££modules/rtos/doc/build_system_guide/macros.html#create-debug-target

create_debug_target creates a debug target for <TARGET_NAME>.

create_debug_target(<TARGET_NAME>)

create_debug_target allows you to debug a binary with the following command instead of invoking xgdb. This target implicitly sets up the xscope debug interface as well.

make debug_my_target
XCORE ® -VOICE Solutions$$$Build System User Guide$$$Macros$$$Common Macros$$$create_filesystem_target£££modules/rtos/doc/build_system_guide/macros.html#create-filesystem-target

create_filesystem_target creates a filesystem file for <TARGET_NAME> using the files in the <FILESYSTEM_INPUT_DIR> directory. <IMAGE_SIZE> specifies the size (in bytes) of the filesystem. The filesystem output filename will end in _fat.fs. Optional argument <OPTIONAL_DEPENDS_TARGETS> can be used to specify other dependency targets, such as filesystem generators.

create_filesystem_target(<TARGET_NAME> <FILESYSTEM_INPUT_DIR> <IMAGE_SIZE> <OPTIONAL_DEPENDS_TARGETS>)
XCORE ® -VOICE Solutions$$$Build System User Guide$$$Macros$$$Common Macros$$$create_data_partition_directory£££modules/rtos/doc/build_system_guide/macros.html#create-data-partition-directory

create_data_partition_directory creates a directory populated with all components related to the data partition. The data partition output folder name will end in _data_partition. Optional argument <OPTIONAL_DEPENDS_TARGETS> can be used to specify other dependency targets.

create_data_partition_directory(<TARGET_NAME> <FILES_TO_COPY> <OPTIONAL_DEPENDS_TARGETS>)
XCORE ® -VOICE Solutions$$$Build System User Guide$$$Macros$$$Common Macros$$$create_flash_app_target£££modules/rtos/doc/build_system_guide/macros.html#create-flash-app-target

create_flash_app_target creates a flash target for <TARGET_NAME> with optional arguments <BOOT_PARTITION_SIZE>, <DATA_PARTITION_CONTENTS>, and <OPTIONAL_DEPENDS_TARGETS>. <BOOT_PARTITION_SIZE> specifies the size in bytes of the boot partition. <DATA_PARTITION_CONTENTS> specifies the optional binary contents of the data partition. <OPTIONAL_DEPENDS_TARGETS> specifies CMake targets that should be dependencies of the resulting create_flash_app_target target. This may be used to create recipes that generate the data partition contents.

create_flash_app_target(<TARGET_NAME> <BOOT_PARTITION_SIZE> <DATA_PARTITION_CONTENTS> <OPTIONAL_DEPENDS_TARGETS>)

create_flash_app_target allows you to flash a factory image binary and optional data partition with the following command instead of invoking xflash.

make flash_app_my_target

XCORE ® -VOICE Solutions$$$Build System User Guide$$$Macros$$$Less Common Macros£££modules/rtos/doc/build_system_guide/macros.html#less-common-macros

XCORE ® -VOICE Solutions$$$Build System User Guide$$$Macros$$$Less Common Macros$$$create_install_target£££modules/rtos/doc/build_system_guide/macros.html#create-install-target

create_install_target creates an install target for <TARGET_NAME>.

create_install_target(<TARGET_NAME>)

create_install_target will copy <TARGET_NAME>.xe to the ${PROJECT_SOURCE_DIR}/dist directory.

make install_my_target
XCORE ® -VOICE Solutions$$$Build System User Guide$$$Macros$$$Less Common Macros$$$create_run_xscope_to_file_target£££modules/rtos/doc/build_system_guide/macros.html#create-run-xscope-to-file-target

create_run_xscope_to_file_target creates a run target for <TARGET_NAME>. <XSCOPE_FILE> specifies the file to save to (no extension).

create_run_xscope_to_file_target(<TARGET_NAME> <XSCOPE_FILE>)

create_run_xscope_to_file_target allows you to run a binary with the following command instead of invoking xrun --xscope-file.

make run_xscope_to_file_my_target
XCORE ® -VOICE Solutions$$$Build System User Guide$$$Macros$$$Less Common Macros$$$create_upgrade_img_target£££modules/rtos/doc/build_system_guide/macros.html#create-upgrade-img-target

create_upgrade_img_target creates an xflash image upgrade target for a provided binary for use in DFU.

create_upgrade_img_target(<TARGET_NAME> <FACTORY_MAJOR_VER> <FACTORY_MINOR_VER>)
XCORE ® -VOICE Solutions$$$Build System User Guide$$$Macros$$$Less Common Macros$$$create_erase_all_target£££modules/rtos/doc/build_system_guide/macros.html#create-erase-all-target

create_erase_all_target creates an xflash erase all target for the <TARGET_FILEPATH> target XN file. The full filepath must be specified for the XN file.

create_erase_all_target(<TARGET_NAME> <TARGET_FILEPATH>)

create_erase_all_target allows you to erase flash with the following command instead of invoking xflash.

make erase_all_my_target
XCORE ® -VOICE Solutions$$$Build System User Guide$$$Macros$$$Less Common Macros$$$query_tools_version£££modules/rtos/doc/build_system_guide/macros.html#query-tools-version

query_tools_version populates the following CMake variables:

  • XTC_VERSION_MAJOR

  • XTC_VERSION_MINOR

  • XTC_VERSION_PATCH

query_tools_version()

XCORE ® -VOICE Solutions$$$Build System User Guide$$$Licenses£££modules/rtos/doc/shared/legal.html#licenses

XCORE ® -VOICE Solutions$$$Build System User Guide$$$Licenses$$$XMOS£££modules/rtos/doc/shared/legal.html#xmos

All original source code is licensed under the XMOS License.

XCORE ® -VOICE Solutions$$$Build System User Guide$$$Licenses$$$Third-Party£££modules/rtos/doc/shared/legal.html#third-party

Additional third party code is included under the following copyrights and licenses:

Third Party Module Copyrights & Licenses

Module

Copyright & License

Argtable3

Copyright (C) 1998-2001,2003-2011,2013 Stewart Heitmann, licensed under LICENSE

FatFS

Copyright (C) 2017 ChaN, licensed under a BSD-style license

FreeRTOS

Copyright (c) 2017 Amazon.com, Inc., licensed under the MIT License

HTTP Parser

Copyright (c) Joyent, Inc. and other Node contributors, licensed under the MIT license

JSMN JSON Parser

Copyright (c) 2010 Serge A. Zaitsev, licensed under the MIT license

Mbed TLS library

Copyright (c) 2006-2018 ARM Limited, licensed under the Apache License 2.0

Paho MQTT C/C++ client for Embedded platforms

Copyright (c) 2020 The TensorFlow Authors, licensed under the Apache License

TinyUSB

Copyright (c) 2018 hathach (tinyusb.org), licensed under the MIT license

XCORE ® -VOICE Solutions$$$RTOS Programming Guide£££modules/rtos/doc/programming_guide/index.html#rtos-programming-guide

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$XCORE Platform£££modules/rtos/doc/programming_guide/platform.html#xcore-platform

The xcore platform provides a range of powerful, flexible and economical crossover processors for use in wide-ranging applications. The xcore platform provides:

  • Fast compute

  • Flexibility

  • Economy

  • Scalability

  • Security

  • Fast time to market

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$XCORE Platform$$$Architecture & Hardware Guide£££modules/rtos/doc/programming_guide/platform.html#architecture-hardware-guide

The Architecture & Hardware Guide describes the multicore processors at the heart of the platform. Multiple xcore processors can themselves be “networked” together with seamless communications.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$XCORE Platform$$$Programming Guide£££modules/rtos/doc/programming_guide/platform.html#programming-guide

The Programming Guide describes how logical cores of an xcore processor can act independently to behave like highly responsive hardware peripherals, or can work as a team to apply all available CPU cycles onto a single compute task.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$XCORE Platform$$$XTC Tools£££modules/rtos/doc/programming_guide/platform.html#id3

The xcore processors are accompanied by the XTC Tools. As well as providing a powerful toolchain for application development, the toolkit assists with application deployment and upgrade.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$Tutorials£££modules/rtos/doc/programming_guide/tutorials/tutorials.html#tutorials

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$Tutorials$$$FreeRTOS Application Programming£££modules/rtos/doc/programming_guide/tutorials/application_programming.html#freertos-application-programming

This document is intended to help you become familiar with FreeRTOS application programming on xcore.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$Tutorials$$$FreeRTOS Application Programming$$$Rationale£££modules/rtos/doc/programming_guide/tutorials/application_programming.html#rationale

Traditionally, xcore multi-core processors have been programmed using the XC language. The XC language allows the programmer to statically place tasks on the available hardware cores and wire them together with channels to provide inter-process communication. The XC language also exposes “events,” which are unique to the xcore architecture and are a useful alternative to interrupts.

Using the XC language, it is possible to write dedicated application software with deterministic timing and very low latency between I/O and tasks.

While XC elegantly enables the intrinsic, unique capabilities of the xcore architecture, there often needs to be higher level application type software running alongside it. The programming model that makes the lower level deterministic software possible may not be best suited for many higher level parts of an application that do not require deterministic timing. Where strict real-time execution is not required, higher level abstractions can be used to manage finite hardware resources, and provide a more familiar programming environment.

A symmetric multiprocessing (SMP) real time operating system (RTOS) can be used to simplify xcore application designs, as well as to preserve the hard real-time benefits provided by the xcore architecture for the lower level software functions that require it.

This document assumes familiarity with real time operating systems in general. Familiarity with FreeRTOS specifically is not required, but will be helpful. For up-to-date documentation on FreeRTOS, see the documentation section on the FreeRTOS website.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$Tutorials$$$FreeRTOS Application Programming$$$SMP FreeRTOS£££modules/rtos/doc/programming_guide/tutorials/application_programming.html#smp-freertos

To support this new programming model for xcore, XMOS has extended the popular and free FreeRTOS kernel to support SMP. This allows the kernel’s scheduler to be started on any number of available xcore logical cores per tile, leaving the remaining cores free to support other program elements that combine to create complete systems. Once the scheduler is started, FreeRTOS threads are placed on cores dynamically at runtime, rather than statically at compile time. All the usual FreeRTOS rules for thread scheduling are followed, except that rather than only running the single highest priority thread that is ready at any given time, multiple threads may run simultaneously. The threads chosen to run are always the highest priority threads that are ready. When more threads of a single priority are ready to run than there are cores available, they are scheduled in a round-robin fashion. Dynamic scheduling allows FreeRTOS to optimize physical core usage based on priority and availability at runtime, opening up the potential for using tile-wide MIPS more efficiently than could be manually specified in a static compile-time setting.
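
The selection rule described above can be illustrated with a toy model: given a set of threads and a number of cores, mark the highest priority ready threads as running. This is a simplified sketch for intuition only, not the kernel's scheduler. The ex_ names are hypothetical, ties are broken by array order, and round-robin rotation among equal priorities is not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Toy thread descriptor. */
typedef struct {
    int priority;  /* larger number means higher priority, as in FreeRTOS */
    int ready;     /* nonzero when the thread is ready to run */
    int running;   /* output: nonzero when selected to run */
} ex_thread_t;

/* Mark up to num_cores ready threads as running, highest priority first.
 * Returns the number of threads selected. */
size_t ex_select_threads(ex_thread_t *threads, size_t count, size_t num_cores)
{
    assert(threads != NULL || count == 0);

    size_t selected = 0;
    for (size_t i = 0; i < count; i++) {
        threads[i].running = 0;
    }

    while (selected < num_cores) {
        ex_thread_t *best = NULL;
        for (size_t i = 0; i < count; i++) {
            ex_thread_t *t = &threads[i];
            if (t->ready && !t->running &&
                (best == NULL || t->priority > best->priority)) {
                best = t;
            }
        }
        if (best == NULL) {
            break;  /* fewer ready threads than cores */
        }
        best->running = 1;
        selected++;
    }
    return selected;
}
```

With a single core this reduces to the familiar rule of running only the highest priority ready thread; with more cores, the next-highest ready threads run alongside it.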

One of xcore’s primary strengths is its guarantee of deterministic behavior and timing. RTOS threads can also benefit from this determinism provided by the xcore architecture. An RTOS thread with interrupts disabled and a high enough priority behaves just like a bare-metal thread. An SMP RTOS kernel does not need to preempt a high priority thread because it has many other cores on which to schedule lower priority threads. Using an SMP RTOS allows developers to concentrate on the specific requirements of their application without worrying about what effect they might have on non-preemptable thread response times. Furthermore, modifying the program in the future is much easier because the developer does not have to worry about affecting existing responsiveness with changes in unrelated areas. The non-preemptable threads will not be affected by adding lower-priority functionality.

Another xcore strength is its performance. xcore.ai provides lightning fast general purpose compute, AI acceleration, powerful DSP and instantaneous I/O control. RTOS threads can also benefit from the performance provided by the xcore architecture, allowing an application developer to dynamically shift performance usage from one application feature to another.

The standard FreeRTOS kernel supports dynamic task priorities, while the FreeRTOS-SMP kernel adds the following additional APIs:

  • vTaskCoreAffinitySet

  • vTaskCoreAffinityGet

  • vTaskPreemptionDisable

  • vTaskPreemptionEnable

Together, these APIs enable a developer to take full advantage of xcore’s performance.
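
As an illustration of the core affinity APIs, FreeRTOS-SMP expresses affinity as a bitmask in which bit n allows the task to run on core n. The helper below is a hypothetical, standalone sketch that builds such a mask. The FreeRTOS call itself appears only in a comment so that the example compiles without the kernel headers.

```c
#include <assert.h>
#include <stdint.h>

/* Build a core affinity bitmask: bit n set means "may run on core n".
 * `cores` lists the allowed core indices. */
uint32_t ex_affinity_mask(const unsigned *cores, unsigned count)
{
    uint32_t mask = 0;

    for (unsigned i = 0; i < count; i++) {
        assert(cores[i] < 32);  /* must fit in the 32-bit mask */
        mask |= (uint32_t)1 << cores[i];
    }
    return mask;
}

/* In an application built with configUSE_CORE_AFFINITY enabled, the mask
 * would then be applied to a task, for example:
 *
 *     vTaskCoreAffinitySet(task_handle, ex_affinity_mask(cores, count));
 *
 * where task_handle is the handle of an existing task. */
```

Restricting a task to cores 1 and 2, for example, yields the mask 0x6.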

Some additional configuration options are also available to the FreeRTOS-SMP Kernel:

  • configNUM_CORES

  • configRUN_MULTIPLE_PRIORITIES

  • configUSE_CORE_AFFINITY

  • configUSE_TASK_PREEMPTION_DISABLE

See Symmetric Multiprocessing (SMP) with FreeRTOS for additional information on SMP support in the FreeRTOS kernel and SMP specific considerations.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$Tutorials$$$FreeRTOS Application Programming$$$AMP SMP FreeRTOS£££modules/rtos/doc/programming_guide/tutorials/application_programming.html#amp-smp-freertos

To further leverage the xcore hardware and the FreeRTOS programming model, XMOS provides support for asymmetric multiprocessing (AMP) per tile. Each XMOS chip contains at least two tiles, each of which has its own set of logical xcore cores, IO, memory space, and more. XMOS provides a build method and a variety of software drivers that allow an application to be created as an AMP system containing multiple SMP FreeRTOS kernels.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$Tutorials$$$FreeRTOS Application Programming$$$RTOS Drivers£££modules/rtos/doc/programming_guide/tutorials/application_programming.html#rtos-drivers

To help ease development of xcore applications using an SMP RTOS, XMOS provides several SMP RTOS compatible drivers. These include, but are not necessarily limited to:

  • Common I/O interfaces

    • GPIO

    • UART

    • I2C

    • I2S

    • PDM microphones

    • QSPI flash

    • SPI

    • USB

    • Clock control

  • xcore features

    • Intertile channel communication

    • Software defined memory

    • Software defined L2 Cache

  • External parts

    • Silicon Labs WF200 series WiFi transceiver

These drivers are all found in the RTOS framework under the path modules/rtos/modules/drivers.

Documentation on each of these drivers can be found under the RTOS Drivers section in the RTOS framework documentation pages.

It is worth noting that most of these drivers utilize a lightweight RTOS abstraction layer, meaning that they are not dependent on FreeRTOS. Conceivably they should work on any SMP RTOS, as long as an abstraction layer is available for it. This abstraction layer is found under the path modules/rtos/modules/osal. At the moment the only available SMP RTOS for xcore is the XMOS SMP FreeRTOS, but more may become available in the future.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$Tutorials$$$FreeRTOS Application Programming$$$Software Services£££modules/rtos/doc/programming_guide/tutorials/application_programming.html#software-services

The RTOS framework also includes some higher level RTOS compatible software services, some of which call the aforementioned drivers. These include, but are not necessarily limited to:

  • DHCP server

  • FAT filesystem

  • HTTP parser

  • JSON parser

  • MQTT client

  • SNTP client

  • TLS

  • USB stack

  • WiFi connection manager

Documentation on several software services can be found under the RTOS Services section in the RTOS framework documentation pages.

These services are all found in the RTOS framework under the path modules/rtos/modules/sw_services.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$Tutorials$$$RTOS Application Design£££modules/rtos/doc/programming_guide/tutorials/application_design.html#rtos-application-design

This document is intended to help you start your first FreeRTOS application on xcore. We assume you have read FreeRTOS Application Programming and that you are familiar with FreeRTOS.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$Tutorials$$$RTOS Application Design$$$RTOS Application Example£££modules/rtos/doc/programming_guide/tutorials/application_design.html#rtos-application-example

A fully functional example application can be found in the RTOS framework under the path examples/freertos/explorer_board. This application is a reference for how to use an RTOS driver or software service, and serves as an example of how to structure an SMP RTOS application for xcore. Additional code to initialize the SoC platform for this example is provided by a board support configuration library at modules/rtos/modules/board_support/XCORE-AI-EXPLORER_2V0/platform.

This example application runs two instances of SMP FreeRTOS, one on each of the processor’s two tiles. Because each tile has its own memory which is not shared between them, this can be viewed as a single asymmetric multiprocessing (AMP) system that comprises two SMP systems. A FreeRTOS thread that is created on one tile will never be scheduled to run on the other tile. Similarly, an RTOS object that is created on a tile, such as a queue, can only be accessed by threads and ISRs that run on that tile and never by code running on the other tile.

That said, the example application is programmed and built as a single coherent application, which will be familiar to programmers who have previously programmed for the xcore in the XC programming language. Data that must be shared between threads running on different tiles is sent via a channel using the RTOS intertile driver, which under the hood uses a streaming channel between the tiles.

Most of the I/O interface drivers in fact provide a mechanism to share driver instances between tiles that utilizes this intertile driver. For those familiar with XC programming, this can be viewed as a C alternative to XC interfaces.

For example, a SPI interface might be available on tile 0. Normally, initialization code that runs on tile 0 sets this interface up and then starts the driver. Without any further initialization, code that runs on tile 1 will be unable to access this interface directly, due both to not having direct access to tile 0’s memory, as well as not having direct access to tile 0’s ports. The drivers, however, provide some additional initialization functions that can be used by the application to share the instance on tile 0 with tile 1. After this initialization is done, code running on tile 1 may use the instance with the same driver API as tile 0, almost as if it was actually running on tile 0.

The example application referenced above, as well as the RTOS driver documentation, should be consulted to see exactly how to initialize and share driver instances. Additionally, not all IO is capable of being shared between tiles directly through the driver API due to timing constraints.

The RTOS framework provides the ON_TILE(t) preprocessor macro. Applications may use this macro to ensure that certain code is included only on a specific tile at compile time. In the example application, a single task is created on both tiles; it starts the drivers and creates the remaining application tasks. Although this task is written as a single function, parts of it are inside #if ON_TILE() blocks. For example, consider the following code snippet found inside the i2c_init() function:

#if ON_TILE(I2C_TILE_NO)
    rtos_intertile_t *client_intertile_ctx[1] = {intertile_ctx};
    rtos_i2c_master_init(
            i2c_master_ctx,
            PORT_I2C_SCL, 0, 0,
            PORT_I2C_SDA, 0, 0,
            0,
            100);

    rtos_i2c_master_rpc_host_init(
            i2c_master_ctx,
            &i2c_rpc_config,
            client_intertile_ctx,
            1);
#else
    rtos_i2c_master_rpc_client_init(
            i2c_master_ctx,
            &i2c_rpc_config,
            intertile_ctx);
#endif

When this function is compiled for tile I2C_TILE_NO, only the first block is included. When it is compiled for the other tile, only the second block is included. When the application is run, tile I2C_TILE_NO initializes the I2C master driver host, while the other tile initializes the I2C master driver client. Because the I2C driver instance is shared between the two tiles, I2C_TILE_NO may in fact be set to either zero or one, which demonstrates how driver instances may be shared between tiles.
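The effect of tile-conditional compilation can be sketched off-target as follows. This is only an illustration: DEMO_THIS_TILE, DEMO_ON_TILE() and DEMO_I2C_TILE_NO are hypothetical stand-ins, not the framework's own definitions, and the real ON_TILE() macro relies on the build system to identify the tile being compiled.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins (assumptions): the real framework supplies ON_TILE()
 * and the build system says which tile a translation unit is built for. */
#ifndef DEMO_THIS_TILE
#define DEMO_THIS_TILE 0              /* tile this file is being compiled for */
#endif
#define DEMO_ON_TILE(t) (DEMO_THIS_TILE == (t))

#define DEMO_I2C_TILE_NO 0            /* tile assumed to own the I2C pins */

/* Returns which half of the shared driver this tile would initialize.
 * Only one branch survives preprocessing, so the other costs no code space. */
static const char *demo_i2c_role(void)
{
#if DEMO_ON_TILE(DEMO_I2C_TILE_NO)
    return "host";      /* owns the hardware and serves remote requests */
#else
    return "client";    /* forwards calls to the host tile */
#endif
}
```

Building the same file with -DDEMO_THIS_TILE=1 would select the client branch instead, which mirrors how one source function yields different code on each tile.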

The RTOS framework provides a single XC file that provides the main() function. This provided main() function calls main_tile0() through main_tile3(), depending on the number of tiles that the application requires and the number of tiles provided by the target xcore processor. The application must provide each of these tile entry point functions. Each one is provided with up to three channel ends that are connected to each of the other tiles.

The example application provides both main_tile0() and main_tile1(). Each one calls a common initialization function that initializes all the drivers for the interfaces specific to its tile. These functions also call the initialization functions to share these driver instances between the tiles. These initialization functions are found in the platform/platform_init.c source file.

Each tile then creates the startup_task() task and starts the FreeRTOS scheduler. The startup_task() completes the driver instance sharing and then starts all of the driver instances. The driver startup functions are found in the platform/platform_start.c source file.

Consult the RTOS driver documentation for the details on what exactly each of the RTOS API functions called by this application does.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$Tutorials$$$RTOS Application Design$$$Board Support Configurations£££modules/rtos/doc/programming_guide/tutorials/application_design.html#board-support-configurations

xcore leverages its architecture to provide a flexible chip where many peripherals that are typically implemented in silicon are instead implemented in software. This allows a chip to be reconfigured to provide the specific IO required for a given application, resulting in a low-cost yet silicon-efficient solution. Board support configurations (bsp_configs) describe the hardware IO that exists on a given board. The bsp_configs provide the application programmer with an API to initialize and start the hardware configuration, as well as the supported RTOS driver contexts. The programming model in this FreeRTOS architecture is:

  • .xn files provide the mapping of ports, pins, and links

  • bsp_configs specify, set up, and start hardware IO and provide the application with RTOS driver contexts

  • applications use the bsp_config init/start code as well as RTOS driver contexts, similar to conventional microcontroller programming models.

To support any generic bsp_config, applications should call platform_init() before starting the scheduler, and then platform_start() after the scheduler is running and before any RTOS drivers are used.
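The required call order can be sketched on a host machine as follows. Everything here is a stand-in so that the ordering can be observed: the stub platform_init() takes an int rather than a chanend_t, demo_start_scheduler() merely runs the one task it is given, and use_drivers() and demo_main_tile() are hypothetical names. In a real application the two platform functions come from the bsp_config and the scheduler is started with FreeRTOS's vTaskStartScheduler().

```c
#include <assert.h>
#include <string.h>

/* Stand-ins (assumptions) that record the order in which they are called. */
static char call_log[64];

static void log_call(const char *name)
{
    assert(strlen(call_log) + strlen(name) + 2 <= sizeof(call_log));
    if (call_log[0] != '\0') {
        strcat(call_log, ",");
    }
    strcat(call_log, name);
}

static void platform_init(int other_tile_c) { (void)other_tile_c; log_call("platform_init"); }
static void platform_start(void)            { log_call("platform_start"); }
static void use_drivers(void)               { log_call("use_drivers"); }

/* First task to run: start the drivers, then let the application use them. */
static void startup_task(void *arg)
{
    (void)arg;
    platform_start();
    use_drivers();
}

/* Stand-in scheduler: simply runs the single task it was given. */
static void demo_start_scheduler(void (*task)(void *))
{
    log_call("scheduler");
    task(NULL);
}

/* Shape of a tile entry point: initialize, then hand over to the scheduler. */
static const char *demo_main_tile(int other_tile_c)
{
    call_log[0] = '\0';
    platform_init(other_tile_c);          /* before the scheduler starts */
    demo_start_scheduler(startup_task);   /* platform_start() runs in a task */
    return call_log;
}
```

Calling demo_main_tile() leaves the recorded order in call_log, with platform_init first and driver use last.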

The bsp_configs provided with the RTOS framework in modules/rtos/modules/bsp_config are an excellent starting point. They provide the most common peripheral drivers that are supported by the boards that support RTOS framework based applications. For advanced users, it is recommended that you copy one of these bsp_configs into your application project and customize it as needed.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$Tutorials$$$RTOS Application Design$$$Board Support Configurations$$$Creating Custom bsp_configs£££modules/rtos/doc/programming_guide/tutorials/application_design.html#creating-custom-bsp-configs

To enable hardware portability, a minimal bsp_config should contain the following:

custom_config/
  platform/
    driver_instances.c
    driver_instances.h
    platform_conf.h
    platform_init.c
    platform_init.h
    platform_start.c
  custom_config.cmake
  custom_config_xn_file.xn

custom_config.cmake provides the CMake target of the configuration. This target should link the required RTOS framework libraries to support the configuration it defines.

custom_config_xn_file.xn provides various hardware parameters including but not limited to the chip package, IO mapping, and network information.

platform_conf.h provides default configuration of all header defined configuration macros. These may be overridden by compile definitions or application headers.

driver_instances.h provides the declaration of all RTOS drivers in the configuration. It may define XCORE hardware resources, such as ports and clockblocks. It may also define tile placements.

driver_instances.c provides the definition of all RTOS drivers in the configuration.

platform_init.h provides the declarations of platform_init(chanend_t other_tile_c) and platform_start(void).

platform_init.c provides the initialization of all drivers defined in the configuration through the definition of platform_init(chanend_t other_tile_c). This code is run before the scheduler is started, so not all RTOS driver functionality and kernel objects are available to it.

platform_start.c starts all drivers defined in the configuration through the definition of platform_start(void). It may also perform other setup, such as configuring the app_pll or setting up an on-board DAC. This code is run once the kernel is running and is therefore subject to preemption and the other considerations of dynamically scheduled SMP programming.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$Tutorials$$$RTOS Application Design$$$Developing and Debugging Memory£££modules/rtos/doc/programming_guide/tutorials/application_design.html#developing-and-debugging-memory

The XTC Tools provide compile-time information to aid developers in creating and testing their applications.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$Tutorials$$$RTOS Application Design$$$Developing and Debugging Memory$$$Resource Usage£££modules/rtos/doc/programming_guide/tutorials/application_design.html#resource-usage

One of these features is the -report option, which displays a summary of resource usage. One of the outputs of this report is memory usage, split into the stack, code, and data requirements of the program. Unlike most XC applications, FreeRTOS makes heavy use of dynamic memory allocation. The FreeRTOS heap will appear as Data in the XTC Tools report. The heap size is determined by the compile time definition configTOTAL_HEAP_SIZE, which can be found in an application’s FreeRTOSConfig.h.

For AMP SMP FreeRTOS builds, which are created using the cmake macro merge_binaries(), there are actually multiple application builds, one per tile, which are then combined. While building a given AMP application, the console output will contain both of the individual tile build reports.

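As an example, consider building the example_freertos_explorer_board target. The console output contains two reports, the first from the tile0 build and the second from the tile1 build: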

Constraint check for tile[0]:
  Memory available:       524288,   used:      318252 .  OKAY
    (Stack: 5260, Code: 42314, Data: 270678)
Constraints checks PASSED WITH CAVEATS.
Constraint check for tile[1]:
  Memory available:       524288,   used:       4060 .  OKAY
    (Stack: 356, Code: 3146, Data: 558)
Constraints checks PASSED.

Constraint check for tile[0]:
  Memory available:       524288,   used:       4836 .  OKAY
    (Stack: 356, Code: 3802, Data: 678)
Constraints checks PASSED.
Constraint check for tile[1]:
  Memory available:       524288,   used:      319476 .  OKAY
    (Stack: 14740, Code: 30730, Data: 274006)
Constraints checks PASSED WITH CAVEATS.

In this example, the CMake file contains the command:

merge_binaries(example_freertos_explorer_board tile0_example_freertos_explorer_board tile1_example_freertos_explorer_board 1)

This means that the resource usage of the final application should be interpreted as:

Constraint check for tile[0]:
  Memory available:       524288,   used:      318252 .  OKAY
    (Stack: 5260, Code: 42314, Data: 270678)
Constraints checks PASSED WITH CAVEATS.
Constraint check for tile[1]:
  Memory available:       524288,   used:      319476 .  OKAY
    (Stack: 14740, Code: 30730, Data: 274006)
Constraints checks PASSED WITH CAVEATS.

This is because the tile 1 portion of the tile1 target build replaces the tile 1 portion of the tile0 target build.
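The figures in the merged report can be checked with a little arithmetic: the used value for each tile is the sum of its stack, code, and data figures, and the remainder of the 524288 bytes is free. The sketch below is illustrative only; the type and function names are hypothetical and the numbers are copied from the report above.

```c
#include <assert.h>
#include <stdint.h>

#define TILE_MEMORY_BYTES 524288u   /* "Memory available" in the report */

/* One tile's figures from a constraint check (hypothetical helper type). */
typedef struct {
    uint32_t stack;
    uint32_t code;
    uint32_t data;    /* includes the FreeRTOS heap */
} tile_report_t;

static uint32_t tile_used(tile_report_t r)
{
    return r.stack + r.code + r.data;
}

static uint32_t tile_free(tile_report_t r)
{
    uint32_t used = tile_used(r);
    assert(used <= TILE_MEMORY_BYTES);
    return TILE_MEMORY_BYTES - used;
}

/* Merged application: tile 0 from the tile0 build, tile 1 from the tile1 build. */
static const tile_report_t merged_tile0 = { 5260, 42314, 270678 };
static const tile_report_t merged_tile1 = { 14740, 30730, 274006 };
```

Because the heap is counted as Data, changing configTOTAL_HEAP_SIZE moves the data figure, and therefore the used figure, by the same amount.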

The XTC Tools also provide a method to examine the resource usage of a binary post build. This method will only work if used on the intermediate binaries.

$ xobjdump --resources tile0_example_freertos_explorer_board.xe
$ xobjdump --resources tile1_example_freertos_explorer_board.xe

Note: Because the resulting example_freertos_explorer_board.xe binary was created by merging into tile0_example_freertos_explorer_board.xe, the results of xobjdump --resources example_freertos_explorer_board.xe will be exactly the same as those of xobjdump --resources tile0_example_freertos_explorer_board.xe and will not account for the actual tile 1 requirements.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$Tutorials$$$RTOS Application Design$$$Building RTOS Applications£££modules/rtos/doc/programming_guide/tutorials/application_design.html#building-rtos-applications

Applications using the RTOS Framework are built using CMake. The RTOS framework provides many libraries, drivers and software services, all of which can be included by the application’s CMakeLists.txt file. The application’s CMakeLists can specify precisely which drivers and software services within the SDK should be included through the use of various CMake target aliases.

See the Build System Guide for more information on the build system.

See the Build System Guide - Targets for more information on the build system target aliases.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$Tutorials$$$RTOS Application DFU£££modules/rtos/doc/programming_guide/tutorials/application_dfu_usage.html#rtos-application-dfu

This document is intended to help you use the RTOS DFU driver and RTOS QSPI flash driver in an application.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$Tutorials$$$RTOS Application DFU$$$DFU Driver Overview£££modules/rtos/doc/programming_guide/tutorials/application_dfu_usage.html#dfu-driver-overview

This driver provides the application with the boot partition and data partition layout of the flash used by the second stage bootloader. The driver provides a subset of the functionality of libquadflash, enabling the application to use any transport method together with the RTOS QSPI flash driver to read the factory image, read/write a single upgrade image, and read/write the data partition.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$Tutorials$$$RTOS Application DFU$$$Reading the Factory Image£££modules/rtos/doc/programming_guide/tutorials/application_dfu_usage.html#reading-the-factory-image

To read back the factory image:

unsigned addr = rtos_dfu_image_get_factory_addr(dfu_image_ctx);
unsigned size = rtos_dfu_image_get_factory_size(dfu_image_ctx);

unsigned char *buf = pvPortMalloc(sizeof(unsigned char) * size);

rtos_qspi_flash_read(
      qspi_flash_ctx,
      (uint8_t *)buf,
      addr,
      size);

// buf now contains the factory image contents

It is advised to perform this operation in blocks, rather than reading the full image at once, to reduce memory usage. Once the buffer is populated from flash, it can be sent over the desired transport method, such as USB, I2C, etc.
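A block-wise read might be structured as in the sketch below. It is written against two hypothetical callback types so that it can be tried on a host: on target, a read_fn_t would wrap rtos_qspi_flash_read() and a send_fn_t would wrap the chosen transport. The 256-byte block size, fake_flash, fake_read() and fake_send() are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_BYTES 256u   /* assumed block size; tune to the RAM you can spare */

/* Hypothetical callback types (not part of the RTOS framework). */
typedef void (*read_fn_t)(void *ctx, uint8_t *dst, unsigned addr, size_t len);
typedef void (*send_fn_t)(void *ctx, const uint8_t *src, size_t len);

/* Streams [addr, addr + size) one block at a time; returns the block count. */
static unsigned stream_image(void *read_ctx, read_fn_t read_fn,
                             void *send_ctx, send_fn_t send_fn,
                             unsigned addr, size_t size)
{
    uint8_t block[BLOCK_BYTES];
    unsigned blocks = 0;
    size_t done = 0;

    assert(read_fn != NULL && send_fn != NULL);

    while (done < size) {
        size_t len = size - done;
        if (len > BLOCK_BYTES) {
            len = BLOCK_BYTES;          /* the last block may be shorter */
        }
        read_fn(read_ctx, block, addr + (unsigned)done, len);
        send_fn(send_ctx, block, len);
        done += len;
        blocks++;
    }
    return blocks;
}

/* In-memory stand-ins for flash and transport, for host-side experiments. */
static uint8_t fake_flash[1024];
static size_t sent_bytes;

static void fake_read(void *ctx, uint8_t *dst, unsigned addr, size_t len)
{
    (void)ctx;
    assert(addr + len <= sizeof(fake_flash));
    memcpy(dst, &fake_flash[addr], len);
}

static void fake_send(void *ctx, const uint8_t *src, size_t len)
{
    (void)ctx;
    (void)src;
    sent_bytes += len;
}
```

With this structure the peak buffer requirement is BLOCK_BYTES regardless of image size; for example, streaming 600 bytes takes two full blocks and one 88-byte block.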

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$Tutorials$$$RTOS Application DFU$$$Reading the Upgrade Image£££modules/rtos/doc/programming_guide/tutorials/application_dfu_usage.html#reading-the-upgrade-image

To read back the upgrade image:

unsigned addr = rtos_dfu_image_get_upgrade_addr(dfu_image_ctx);
unsigned size = rtos_dfu_image_get_upgrade_size(dfu_image_ctx);

unsigned char *buf = pvPortMalloc(sizeof(unsigned char) * size);

rtos_qspi_flash_read(
      qspi_flash_ctx,
      (uint8_t *)buf,
      addr,
      size);

// buf now contains the upgrade image contents

It is advised to perform this operation in blocks, rather than reading the full image at once, to reduce memory usage. Once the buffer is populated from flash, it can be sent over the desired transport method, such as USB, I2C, etc.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$Tutorials$$$RTOS Application DFU$$$Writing the Upgrade Image£££modules/rtos/doc/programming_guide/tutorials/application_dfu_usage.html#writing-the-upgrade-image

To overwrite the current upgrade image:

// Assuming buf contains the image data
// and size contains the size in bytes

unsigned addr = rtos_dfu_image_get_upgrade_addr(dfu_image_ctx);
unsigned data_partition_base_addr = rtos_dfu_image_get_data_partition_addr(dfu_image_ctx);
unsigned bytes_avail = data_partition_base_addr - addr;

size_t sector_size = rtos_qspi_flash_sector_size_get(qspi_flash_ctx);

if(size < bytes_avail) {
   unsigned char *tmp_buf = pvPortMalloc(sizeof(unsigned char) * sector_size);
   unsigned cur_offset = 0;
   do {
      // cur_offset is relative to addr, so the bytes remaining are size - cur_offset
      unsigned length = (size - cur_offset) >= sector_size ? sector_size : (size - cur_offset);
      rtos_qspi_flash_lock(qspi_flash_ctx);
      {
         rtos_qspi_flash_read(
                  qspi_flash_ctx,
                  tmp_buf,
                  addr + cur_offset,
                  sector_size);
         memcpy(tmp_buf, buf + cur_offset, length);
         rtos_qspi_flash_erase(
                  qspi_flash_ctx,
                  addr + cur_offset,
                  sector_size);
         rtos_qspi_flash_write(
                  qspi_flash_ctx,
                  (uint8_t *) tmp_buf,
                  addr + cur_offset,
                  sector_size);
      }
      rtos_qspi_flash_unlock(qspi_flash_ctx);
      cur_offset += length;
   } while(cur_offset < size);

   vPortFree(tmp_buf);
} else {
   rtos_printf("Insufficient space for upgrade image\n");
}

It is advised to perform this operation in blocks, rather than holding the full image in memory, to reduce memory usage. The buffer can be populated over the desired transport method, such as USB, I2C, etc.
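The per-iteration arithmetic of a sector-by-sector write can be sketched as below, assuming cur_offset counts bytes from the start of the image and that the image starts on a sector boundary. The helper names are hypothetical, and the 4096-byte sector used in the note that follows is only an example; the real value comes from rtos_qspi_flash_sector_size_get().

```c
#include <assert.h>
#include <stddef.h>

/* Bytes of new data that land in the sector beginning at cur_offset. */
static size_t chunk_len(size_t size, size_t cur_offset, size_t sector_size)
{
    assert(sector_size > 0 && cur_offset < size);
    size_t remaining = size - cur_offset;
    return remaining >= sector_size ? sector_size : remaining;
}

/* Number of sectors erased and rewritten for an image of size bytes. */
static size_t sectors_touched(size_t size, size_t sector_size)
{
    assert(sector_size > 0);
    return (size + sector_size - 1) / sector_size;   /* round up */
}
```

For a 10000-byte image and 4096-byte sectors, three sectors are rewritten and the final one receives 1808 new bytes. Reading each sector before erasing it is what preserves the remaining 2288 bytes of that last sector.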

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$Tutorials$$$RTOS Application DFU$$$Reading the Data Partition Image£££modules/rtos/doc/programming_guide/tutorials/application_dfu_usage.html#reading-the-data-partition-image

To read back the data partition image:

unsigned addr = rtos_dfu_image_get_data_partition_addr(dfu_image_ctx);
// The data partition runs from addr to the end of flash
unsigned size = rtos_qspi_flash_size_get(qspi_flash_ctx) - addr;

unsigned char *buf = pvPortMalloc(sizeof(unsigned char) * size);

rtos_qspi_flash_read(
      qspi_flash_ctx,
      (uint8_t *)buf,
      addr,
      size);

// buf now contains the data partition image contents

It is advised to perform this operation in blocks, rather than reading the full partition at once, to reduce memory usage. The data partition will likely be too large to read into SRAM in a single read operation. Once the buffer is populated from flash, it can be sent over the desired transport method, such as USB, I2C, etc.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$Tutorials$$$RTOS Application DFU$$$Writing the Data Partition Image£££modules/rtos/doc/programming_guide/tutorials/application_dfu_usage.html#writing-the-data-partition-image

To overwrite the current data partition image:

// Assuming buf contains the image data
// and size contains the size in bytes

unsigned addr = rtos_dfu_image_get_data_partition_addr(dfu_image_ctx);
unsigned end_addr = rtos_qspi_flash_size_get(qspi_flash_ctx);
unsigned bytes_avail = end_addr - addr;

size_t sector_size = rtos_qspi_flash_sector_size_get(qspi_flash_ctx);

if(size < bytes_avail) {
   unsigned char *tmp_buf = pvPortMalloc(sizeof(unsigned char) * sector_size);
   unsigned cur_offset = 0;
   do {
      // cur_offset is relative to addr, so the bytes remaining are size - cur_offset
      unsigned length = (size - cur_offset) >= sector_size ? sector_size : (size - cur_offset);
      rtos_qspi_flash_lock(qspi_flash_ctx);
      {
         rtos_qspi_flash_read(
                  qspi_flash_ctx,
                  tmp_buf,
                  addr + cur_offset,
                  sector_size);
         memcpy(tmp_buf, buf + cur_offset, length);
         rtos_qspi_flash_erase(
                  qspi_flash_ctx,
                  addr + cur_offset,
                  sector_size);
         rtos_qspi_flash_write(
                  qspi_flash_ctx,
                  (uint8_t *) tmp_buf,
                  addr + cur_offset,
                  sector_size);
      }
      rtos_qspi_flash_unlock(qspi_flash_ctx);
      cur_offset += length;
   } while(cur_offset < size);

   vPortFree(tmp_buf);
} else {
   rtos_printf("Insufficient space for data partition image\n");
}

It is advised to perform this operation in blocks, rather than holding the full image in memory, to reduce memory usage. The buffer can be populated over the desired transport method, such as USB, I2C, etc.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference£££modules/rtos/doc/programming_guide/reference/api.html#api-reference

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers£££modules/rtos/doc/programming_guide/reference/rtos_drivers/rtos_drivers.html#rtos-drivers

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O£££modules/rtos/doc/programming_guide/reference/rtos_drivers/rtos_drivers.html#i-o
XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$GPIO RTOS Driver£££modules/rtos/doc/programming_guide/reference/rtos_drivers/gpio.html#gpio-rtos-driver

This driver can be used to operate GPIO ports on xcore in an RTOS application.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$Initialization API£££modules/rtos/doc/programming_guide/reference/rtos_drivers/gpio.html#initialization-api

The following structures and functions are used to initialize and start a GPIO driver instance.

enum rtos_gpio_port_id_t

Enumerator type representing each available GPIO port.

To be used with the RTOS GPIO driver functions.

Values:

enumerator rtos_gpio_port_none
enumerator rtos_gpio_port_1A
enumerator rtos_gpio_port_1B
enumerator rtos_gpio_port_1C
enumerator rtos_gpio_port_1D
enumerator rtos_gpio_port_1E
enumerator rtos_gpio_port_1F
enumerator rtos_gpio_port_1G
enumerator rtos_gpio_port_1H
enumerator rtos_gpio_port_1I
enumerator rtos_gpio_port_1J
enumerator rtos_gpio_port_1K
enumerator rtos_gpio_port_1L
enumerator rtos_gpio_port_1M
enumerator rtos_gpio_port_1N
enumerator rtos_gpio_port_1O
enumerator rtos_gpio_port_1P
enumerator rtos_gpio_port_4A
enumerator rtos_gpio_port_4B
enumerator rtos_gpio_port_4C
enumerator rtos_gpio_port_4D
enumerator rtos_gpio_port_4E
enumerator rtos_gpio_port_4F
enumerator rtos_gpio_port_8A
enumerator rtos_gpio_port_8B
enumerator rtos_gpio_port_8C
enumerator rtos_gpio_port_8D
enumerator rtos_gpio_port_16A
enumerator rtos_gpio_port_16B
enumerator rtos_gpio_port_16C
enumerator rtos_gpio_port_16D
enumerator rtos_gpio_port_32A
enumerator rtos_gpio_port_32B
enumerator RTOS_GPIO_TOTAL_PORT_CNT

Total number of I/O ports

typedef struct rtos_gpio_struct rtos_gpio_t

Typedef to the RTOS GPIO driver instance struct.

typedef void (*rtos_gpio_isr_cb_t)(rtos_gpio_t *ctx, void *app_data, rtos_gpio_port_id_t port_id, uint32_t value)

Function pointer type for application provided RTOS GPIO interrupt callback functions.

These callback functions are called when there is a GPIO port interrupt.

Note

The value parameter is the latched value that triggered the interrupt, not the current value on the port.

Param ctx:

A pointer to the associated GPIO driver instance.

Param app_data:

A pointer to application specific data provided by the application. Used to share data between this callback function and the application.

Param port_id:

The GPIO port that triggered the interrupt.

Param value:

The value on the GPIO port that caused the interrupt.

inline rtos_gpio_port_id_t rtos_gpio_port(port_t p)

Helper function to convert an xcore I/O port resource ID to an RTOS GPIO driver port ID.

Parameters:
  • p – An xcore I/O port resource ID.

Returns:

the equivalent RTOS GPIO driver port ID.

void rtos_gpio_start(rtos_gpio_t *ctx)

Starts an RTOS GPIO driver instance. This must only be called by the tile that owns the driver instance. It may be called either before or after starting the RTOS, but must be called before any of the core GPIO driver functions are called with this instance.

rtos_gpio_init() must be called on this GPIO driver instance prior to calling this.

Parameters:
  • ctx – A pointer to the GPIO driver instance to start.

void rtos_gpio_init(rtos_gpio_t *ctx)

Initializes an RTOS GPIO driver instance. There should only be one per tile. This instance represents all the GPIO ports owned by the calling tile. This must only be called by the tile that owns the driver instance. It may be called either before or after starting the RTOS, but must be called before calling rtos_gpio_start() or any of the core GPIO driver functions with this instance.

Parameters:
  • ctx – A pointer to the GPIO driver instance to initialize.

RTOS_GPIO_ISR_CALLBACK_ATTR

This attribute must be specified on all RTOS GPIO interrupt callback functions provided by the application.

struct rtos_gpio_isr_info_t
#include <rtos_gpio.h>

Struct to hold interrupt state data for GPIO ports.

The members in this struct should not be accessed directly.

struct rtos_gpio_struct
#include <rtos_gpio.h>

Struct representing an RTOS GPIO driver instance.

The members in this struct should not be accessed directly.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$Core API£££modules/rtos/doc/programming_guide/reference/rtos_drivers/gpio.html#core-api

The following functions are the core GPIO driver functions that are used after it has been initialized and started.

inline void rtos_gpio_port_enable(rtos_gpio_t *ctx, rtos_gpio_port_id_t port_id)

Enables a GPIO port. This must be called on a port before using it with any other GPIO driver function.

Parameters:
  • ctx – A pointer to the GPIO driver instance to use.

  • port_id – The GPIO port to enable.

inline uint32_t rtos_gpio_port_in(rtos_gpio_t *ctx, rtos_gpio_port_id_t port_id)

Inputs the value present on a GPIO port’s pins.

Parameters:
  • ctx – A pointer to the GPIO driver instance to use.

  • port_id – The GPIO port to read from.

Returns:

the value on the port’s pins.

inline void rtos_gpio_port_out(rtos_gpio_t *ctx, rtos_gpio_port_id_t port_id, uint32_t value)

Outputs a value to a GPIO port’s pins.

Parameters:
  • ctx – A pointer to the GPIO driver instance to use.

  • port_id – The GPIO port to write to.

  • value – The value to write to the GPIO port.

inline void rtos_gpio_isr_callback_set(rtos_gpio_t *ctx, rtos_gpio_port_id_t port_id, rtos_gpio_isr_cb_t cb, void *app_data)

Sets the application callback function to be called when there is an interrupt on a GPIO port.

This must be called prior to enabling interrupts on port_id. It is also safe to be called while interrupts are enabled on it.

Parameters:
  • ctx – A pointer to the GPIO driver instance to use.

  • port_id – Interrupts triggered by this port will call the application callback function cb.

  • cb – The application callback function to call when there is an interrupt triggered by the port port_id.

  • app_data – A pointer to application specific data to pass to the application callback function cb.

inline void rtos_gpio_interrupt_enable(rtos_gpio_t *ctx, rtos_gpio_port_id_t port_id)

Enables interrupts on a GPIO port. Interrupts are triggered whenever the value on the port changes.

Parameters:
  • ctx – A pointer to the GPIO driver instance to use.

  • port_id – The GPIO port to enable interrupts on.

inline void rtos_gpio_interrupt_disable(rtos_gpio_t *ctx, rtos_gpio_port_id_t port_id)

Disables interrupts on a GPIO port.

Parameters:
  • ctx – A pointer to the GPIO driver instance to use.

  • port_id – The GPIO port to disable interrupts on.

inline void rtos_gpio_port_drive(rtos_gpio_t *ctx, rtos_gpio_port_id_t port_id)

Configures a port in drive mode. Output values will be driven on the pins. This is the default drive state of a port. This has the side effect of disabling the port’s internal pull-up and pull-down resistors.

Parameters:
  • ctx – A pointer to the GPIO driver instance to use.

  • port_id – The GPIO port to set to drive mode.

inline void rtos_gpio_port_drive_low(rtos_gpio_t *ctx, rtos_gpio_port_id_t port_id)

Configures a port in drive low mode. When the output value is 0 the pin is driven low, otherwise no value is driven. This has the side effect of enabling the port’s internal pull-up resistor.

Parameters:
  • ctx – A pointer to the GPIO driver instance to use.

  • port_id – The GPIO port to set to drive mode low.

inline void rtos_gpio_port_drive_high(rtos_gpio_t *ctx, rtos_gpio_port_id_t port_id)

Configures a port in drive high mode. When the output value is 1 the pin is driven high, otherwise no value is driven. This has the side effect of enabling the port’s internal pull-down resistor.

Parameters:
  • ctx – A pointer to the GPIO driver instance to use.

  • port_id – The GPIO port to set to drive mode high.
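As a rough illustration of the three drive modes described above, the following host-side sketch models when a pin is actively driven. It is not part of the driver: `demo_drive_mode_t`, `demo_pin_is_driven()` and `demo_open_drain_line_level()` are hypothetical names, and the second helper assumes a shared line where at most one other open-drain device may pull the line low.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the three drive modes; not driver types. */
typedef enum {
    DEMO_DRIVE,      /* models rtos_gpio_port_drive()      */
    DEMO_DRIVE_LOW,  /* models rtos_gpio_port_drive_low()  */
    DEMO_DRIVE_HIGH  /* models rtos_gpio_port_drive_high() */
} demo_drive_mode_t;

/* Returns true when the port actively drives the pin for this output bit. */
bool demo_pin_is_driven(demo_drive_mode_t mode, unsigned bit)
{
    assert(bit <= 1);
    switch (mode) {
    case DEMO_DRIVE:      return true;      /* both 0 and 1 are driven */
    case DEMO_DRIVE_LOW:  return bit == 0;  /* only 0 is driven        */
    case DEMO_DRIVE_HIGH: return bit == 1;  /* only 1 is driven        */
    }
    return false;
}

/* Level on a shared line when our pin is in drive low mode. If we do not
 * drive it, the pull-up wins unless the other device pulls the line low. */
unsigned demo_open_drain_line_level(unsigned our_bit, bool other_pulls_low)
{
    if (demo_pin_is_driven(DEMO_DRIVE_LOW, our_bit)) {
        return 0;
    }
    return other_pulls_low ? 0u : 1u;
}
```

In this model, writing 1 in drive low mode releases the line, which is why that mode suits wired-AND buses where several devices share one signal.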

inline void rtos_gpio_port_pull_none(rtos_gpio_t *ctx, rtos_gpio_port_id_t port_id)

Disables the port’s internal pull-up and pull-down resistors.

Parameters:
  • ctx – A pointer to the GPIO driver instance to use.

  • port_id – The GPIO port to set to pull none mode.

inline void rtos_gpio_port_pull_up(rtos_gpio_t *ctx, rtos_gpio_port_id_t port_id)

Enables the port’s internal pull-up resistor.

Parameters:
  • ctx – A pointer to the GPIO driver instance to use.

  • port_id – The GPIO port to set to pull up mode.

inline void rtos_gpio_port_pull_down(rtos_gpio_t *ctx, rtos_gpio_port_id_t port_id)

Enables the port’s internal pull-down resistor.

Parameters:
  • ctx – A pointer to the GPIO driver instance to use.

  • port_id – The GPIO port to set to pull down mode.

inline void rtos_gpio_write_control_word(rtos_gpio_t *ctx, rtos_gpio_port_id_t port_id, uint32_t value)

Configures the port control word value.

Parameters:
  • ctx – A pointer to the GPIO driver instance to use.

  • port_id – The GPIO port to modify.

  • value – The value to set the control word to.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$RPC Initialization API£££modules/rtos/doc/programming_guide/reference/rtos_drivers/gpio.html#rpc-initialization-api

The following functions may be used to share a GPIO driver instance with other xcore tiles. Tiles that the driver instance is shared with may call any of the core functions listed above.

void rtos_gpio_rpc_client_init(rtos_gpio_t *gpio_ctx, rtos_driver_rpc_t *rpc_config, rtos_intertile_t *host_intertile_ctx)

Initializes an RTOS GPIO driver instance on a client tile. This allows a tile that does not own the actual driver instance to use a driver instance on another tile. This will be called instead of rtos_gpio_init(). The host tile that owns the actual instance must simultaneously call rtos_gpio_rpc_host_init().

Parameters:
  • gpio_ctx – A pointer to the GPIO driver instance to initialize.

  • rpc_config – A pointer to an RPC config struct. This must have the same scope as gpio_ctx.

  • host_intertile_ctx – A pointer to the intertile driver instance to use for performing the communication between the client and host tiles. This must have the same scope as gpio_ctx.

void rtos_gpio_rpc_host_init(rtos_gpio_t *gpio_ctx, rtos_driver_rpc_t *rpc_config, rtos_intertile_t *client_intertile_ctx[], size_t remote_client_count)

Performs additional initialization on a GPIO driver instance to allow client tiles to use the GPIO driver instance. Each client tile that will use this instance must simultaneously call rtos_gpio_rpc_client_init().

Parameters:
  • gpio_ctx – A pointer to the GPIO driver instance to share with clients.

  • rpc_config – A pointer to an RPC config struct. This must have the same scope as gpio_ctx.

  • client_intertile_ctx – An array of pointers to the intertile driver instances to use for performing the communication between the host tile and each client tile. This must have the same scope as gpio_ctx.

  • remote_client_count – The number of client tiles to share this driver instance with.

void rtos_gpio_rpc_config(rtos_gpio_t *gpio_ctx, unsigned intertile_port, unsigned host_task_priority)

Configures the RPC for a GPIO driver instance. This must be called by both the host tile and all client tiles.

On the client tiles this must be called after calling rtos_gpio_rpc_client_init(). After calling this, the client tile may immediately begin to call the core GPIO functions on this driver instance. It does not need to wait for the host to call rtos_gpio_start().

On the host tile this must be called both after calling rtos_gpio_rpc_host_init() and before calling rtos_gpio_start().

Parameters:
  • gpio_ctx – A pointer to the GPIO driver instance to configure the RPC for.

  • intertile_port – The port number on the intertile channel to use for transferring the RPC requests and responses for this driver instance. This port must not be shared by any other functions. The port must be the same for the host and all its clients.

  • host_task_priority – The priority to use for the task on the host tile that handles RPC requests from the clients.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$I2C RTOS Driver£££modules/rtos/doc/programming_guide/reference/rtos_drivers/i2c/i2c.html#i2c-rtos-driver

This driver can be used to instantiate and control an I2C master or slave mode I/O interface on xcore in an RTOS application.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$I2C Master RTOS Driver£££modules/rtos/doc/programming_guide/reference/rtos_drivers/i2c/i2c_master.html#i2c-master-rtos-driver

This driver can be used to instantiate and control an I2C master I/O interface on xcore in an RTOS application.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$I2C Master Initialization API£££modules/rtos/doc/programming_guide/reference/rtos_drivers/i2c/i2c_master.html#i2c-master-initialization-api

The following structures and functions are used to initialize and start an I2C master driver instance.

typedef struct rtos_i2c_master_struct rtos_i2c_master_t

Typedef to the RTOS I2C master driver instance struct.

void rtos_i2c_master_start(rtos_i2c_master_t *i2c_master_ctx)

Starts an RTOS I2C master driver instance. This must only be called by the tile that owns the driver instance. It may be called either before or after starting the RTOS, but must be called before any of the core I2C master driver functions are called with this instance.

rtos_i2c_master_init() must be called on this I2C master driver instance prior to calling this.

Parameters:
  • i2c_master_ctx – A pointer to the I2C master driver instance to start.

void rtos_i2c_master_init(rtos_i2c_master_t *i2c_master_ctx, const port_t p_scl, const uint32_t scl_bit_position, const uint32_t scl_other_bits_mask, const port_t p_sda, const uint32_t sda_bit_position, const uint32_t sda_other_bits_mask, hwtimer_t tmr, const unsigned kbits_per_second)

Initializes an RTOS I2C master driver instance. This must only be called by the tile that owns the driver instance. It may be called either before or after starting the RTOS, but must be called before calling rtos_i2c_master_start() or any of the core I2C master driver functions with this instance.

Parameters:
  • i2c_master_ctx – A pointer to the I2C master driver instance to initialize.

  • p_scl – The port containing SCL. This may be either the same as or different than p_sda.

  • scl_bit_position – The bit number of the SCL line on the port p_scl.

  • scl_other_bits_mask – A value that is ORed into the port value driven to p_scl both when SCL is high and low. The bit representing SCL (as well as SDA if they share the same port) must be set to 0.

  • p_sda – The port containing SDA. This may be either the same as or different than p_scl.

  • sda_bit_position – The bit number of the SDA line on the port p_sda.

  • sda_other_bits_mask – A value that is ORed into the port value driven to p_sda both when SDA is high and low. The bit representing SDA (as well as SCL if they share the same port) must be set to 0.

  • tmr – This is unused and should be set to 0. This will be removed.

  • kbits_per_second – The speed of the I2C bus. The maximum value allowed is 400.
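The rule for scl_other_bits_mask and sda_other_bits_mask (the SCL bit, and the SDA bit too when both lines share a port, must be 0) can be sketched with two small helpers. These are hypothetical, host-side functions and not part of the driver; `desired_high_bits` is an assumed input naming the pins the application wants held high.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mask for a port that carries both SCL and SDA.
 * Clears both bus bits from the set of pins to be held high. */
uint32_t demo_i2c_mask_shared_port(uint32_t desired_high_bits,
                                   uint32_t scl_bit_position,
                                   uint32_t sda_bit_position)
{
    assert(scl_bit_position < 32 && sda_bit_position < 32);
    assert(scl_bit_position != sda_bit_position);
    const uint32_t bus_bits = (UINT32_C(1) << scl_bit_position) |
                              (UINT32_C(1) << sda_bit_position);
    return desired_high_bits & ~bus_bits;
}

/* Hypothetical helper: mask for a port that carries only one bus line.
 * Clears just that line's bit. */
uint32_t demo_i2c_mask_own_port(uint32_t desired_high_bits,
                                uint32_t line_bit_position)
{
    assert(line_bit_position < 32);
    return desired_high_bits & ~(UINT32_C(1) << line_bit_position);
}
```

For example, with SCL on bit 0 and SDA on bit 1 of a shared 4-bit port and the two remaining pins held high, `demo_i2c_mask_shared_port(0xF, 0, 1)` yields 0xC, a value that satisfies the constraint for both mask parameters.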

struct rtos_i2c_master_struct
#include <rtos_i2c_master.h>

Struct representing an RTOS I2C master driver instance.

The members in this struct should not be accessed directly.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$I2C Master Core API£££modules/rtos/doc/programming_guide/reference/rtos_drivers/i2c/i2c_master.html#i2c-master-core-api

The following functions are the core I2C driver functions that are used after it has been initialized and started.

inline i2c_res_t rtos_i2c_master_write(rtos_i2c_master_t *ctx, uint8_t device_addr, uint8_t buf[], size_t n, size_t *num_bytes_sent, int send_stop_bit)

Writes data to an I2C bus as a master.

Parameters:
  • ctx – A pointer to the I2C master driver instance to use.

  • device_addr – The address of the device to write to.

  • buf – The buffer containing data to write.

  • n – The number of bytes to write.

  • num_bytes_sent – The function will set this value to the number of bytes actually sent. On success, this will be equal to n but it will be less if the slave sends an early NACK on the bus and the transaction fails.

  • send_stop_bit – If this is non-zero then a stop bit will be sent on the bus after the transaction. This is usually required for normal operation. If this parameter is zero then the stop bit will be omitted. In this case, no other task can use the component until a stop bit has been sent.

Return values:
  • ``I2C_ACK`` – if the write was acknowledged by the device.

  • ``I2C_NACK`` – otherwise.

inline i2c_res_t rtos_i2c_master_read(rtos_i2c_master_t *ctx, uint8_t device_addr, uint8_t buf[], size_t n, int send_stop_bit)

Reads data from an I2C bus as a master.

Parameters:
  • ctx – A pointer to the I2C master driver instance to use.

  • device_addr – The address of the device to read from.

  • buf – The buffer to fill with data.

  • n – The number of bytes to read.

  • send_stop_bit – If this is non-zero then a stop bit will be sent on the bus after the transaction. This is usually required for normal operation. If this parameter is zero then the stop bit will be omitted. In this case, no other task can use the component until a stop bit has been sent.

Return values:
  • ``I2C_ACK`` – if the read was acknowledged by the device.

  • ``I2C_NACK`` – otherwise.

inline void rtos_i2c_master_stop_bit_send(rtos_i2c_master_t *ctx)

Send a stop bit to an I2C bus as a master.

This function will cause a stop bit to be sent on the bus. It should be used to complete/abort a transaction if the send_stop_bit argument was not set when calling the rtos_i2c_master_read() or rtos_i2c_master_write() functions.

Parameters:
  • ctx – A pointer to the I2C master driver instance to use.

inline i2c_regop_res_t rtos_i2c_master_reg_write(rtos_i2c_master_t *ctx, uint8_t device_addr, uint8_t reg_addr, uint8_t data)

Write to an 8-bit register on an I2C device.

This function writes to an 8-bit addressed, 8-bit register in an I2C device. The function writes the data by sending the register address followed by the register data to the device at the specified device address.

Parameters:
  • ctx – A pointer to the I2C master driver instance to use.

  • device_addr – The address of the device to write to.

  • reg_addr – The address of the register to write to.

  • data – The 8-bit value to write.

Return values:
  • ``I2C_REGOP_DEVICE_NACK`` – if the address is NACKed.

  • ``I2C_REGOP_INCOMPLETE`` – if not all data was ACKed.

  • ``I2C_REGOP_SUCCESS`` – on successful completion of the write.

inline i2c_regop_res_t rtos_i2c_master_reg_read(rtos_i2c_master_t *ctx, uint8_t device_addr, uint8_t reg_addr, uint8_t *data)

Reads from an 8-bit register on an I2C device.

This function reads from an 8-bit addressed, 8-bit register in an I2C device. The function reads the data by sending the register address and then reading the register data from the device at the specified device address.

Note that no stop bit is transmitted between the write and the read. The operation is performed as one transaction using a repeated start.

Parameters:
  • ctx – A pointer to the I2C master driver instance to use.

  • device_addr – The address of the device to read from.

  • reg_addr – The address of the register to read from.

  • data – A pointer to the byte to fill with data read from the register.

Return values:
  • ``I2C_REGOP_DEVICE_NACK`` – if the device NACKed.

  • ``I2C_REGOP_SUCCESS`` – on successful completion of the read.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$I2C Master RPC Initialization API£££modules/rtos/doc/programming_guide/reference/rtos_drivers/i2c/i2c_master.html#i2c-master-rpc-initialization-api

The following functions may be used to share an I2C master driver instance with other xcore tiles. Tiles that the driver instance is shared with may call any of the core functions listed above.

void rtos_i2c_master_rpc_client_init(rtos_i2c_master_t *i2c_master_ctx, rtos_driver_rpc_t *rpc_config, rtos_intertile_t *host_intertile_ctx)

Initializes an RTOS I2C master driver instance on a client tile. This allows a tile that does not own the actual driver instance to use a driver instance on another tile. This will be called instead of rtos_i2c_master_init(). The host tile that owns the actual instance must simultaneously call rtos_i2c_master_rpc_host_init().

Parameters:
  • i2c_master_ctx – A pointer to the I2C master driver instance to initialize.

  • rpc_config – A pointer to an RPC config struct. This must have the same scope as i2c_master_ctx.

  • host_intertile_ctx – A pointer to the intertile driver instance to use for performing the communication between the client and host tiles. This must have the same scope as i2c_master_ctx.

void rtos_i2c_master_rpc_host_init(rtos_i2c_master_t *i2c_master_ctx, rtos_driver_rpc_t *rpc_config, rtos_intertile_t *client_intertile_ctx[], size_t remote_client_count)

Performs additional initialization on an I2C master driver instance to allow client tiles to use the I2C master driver instance. Each client tile that will use this instance must simultaneously call rtos_i2c_master_rpc_client_init().

Parameters:
  • i2c_master_ctx – A pointer to the I2C master driver instance to share with clients.

  • rpc_config – A pointer to an RPC config struct. This must have the same scope as i2c_master_ctx.

  • client_intertile_ctx – An array of pointers to the intertile driver instances to use for performing the communication between the host tile and each client tile. This must have the same scope as i2c_master_ctx.

  • remote_client_count – The number of client tiles to share this driver instance with.

void rtos_i2c_master_rpc_config(rtos_i2c_master_t *i2c_master_ctx, unsigned intertile_port, unsigned host_task_priority)

Configures the RPC for an I2C master driver instance. This must be called by both the host tile and all client tiles.

On the client tiles this must be called after calling rtos_i2c_master_rpc_client_init(). After calling this, the client tile may immediately begin to call the core I2C master functions on this driver instance. It does not need to wait for the host to call rtos_i2c_master_start().

On the host tile this must be called both after calling rtos_i2c_master_rpc_host_init() and before calling rtos_i2c_master_start().

Parameters:
  • i2c_master_ctx – A pointer to the I2C master driver instance to configure the RPC for.

  • intertile_port – The port number on the intertile channel to use for transferring the RPC requests and responses for this driver instance. This port must not be shared by any other functions. The port must be the same for the host and all its clients.

  • host_task_priority – The priority to use for the task on the host tile that handles RPC requests from the clients.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$I2C Slave RTOS Driver£££modules/rtos/doc/programming_guide/reference/rtos_drivers/i2c/i2c_slave.html#i2c-slave-rtos-driver

This driver can be used to instantiate and control an I2C slave I/O interface on xcore in an RTOS application.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$I2C Slave API£££modules/rtos/doc/programming_guide/reference/rtos_drivers/i2c/i2c_slave.html#i2c-slave-api

The following structures and functions are used to initialize and start an I2C slave driver instance.

typedef struct rtos_i2c_slave_struct rtos_i2c_slave_t

Typedef to the RTOS I2C slave driver instance struct.

typedef void (*rtos_i2c_slave_start_cb_t)(rtos_i2c_slave_t *ctx, void *app_data)

Function pointer type for application provided RTOS I2C slave start callback functions.

These callback functions are optionally called by an I2C slave driver’s thread when it is first started. This gives the application a chance to perform startup initialization from within the driver’s thread.

Param ctx:

A pointer to the associated I2C slave driver instance.

Param app_data:

A pointer to application specific data provided by the application. Used to share data between this callback function and the application.

typedef void (*rtos_i2c_slave_rx_cb_t)(rtos_i2c_slave_t *ctx, void *app_data, uint8_t *data, size_t len)

Function pointer type for application provided RTOS I2C slave receive callback functions.

These callback functions are called when an I2C slave driver instance has received data from a master device.

Param ctx:

A pointer to the associated I2C slave driver instance.

Param app_data:

A pointer to application specific data provided by the application. Used to share data between this callback function and the application.

Param data:

A pointer to the data received from the master.

Param len:

The number of valid bytes in data.

typedef size_t (*rtos_i2c_slave_tx_start_cb_t)(rtos_i2c_slave_t *ctx, void *app_data, uint8_t **data)

Function pointer type for application provided RTOS I2C slave transmit start callback functions.

These callback functions are called when an I2C slave driver instance needs to transmit data to a master device. This callback must provide the data to transmit and the length.

Param ctx:

A pointer to the associated I2C slave driver instance.

Param app_data:

A pointer to application specific data provided by the application. Used to share data between this callback function and the application.

Param data:

A pointer to the data buffer to transmit to the master. The driver sets this to its internal data buffer, which has a size of RTOS_I2C_SLAVE_BUF_LEN, prior to calling this callback. This may be set to a different buffer by the callback. The callback must fill this buffer with the data to send to the master.

Return:

The number of bytes to transmit to the master from data. If the master reads more bytes than this, the driver will wrap around to the start of the buffer and send it again.
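The wrap-around behavior described for the return value can be pictured with a small host-side model. This is only a sketch of the documented behavior, not driver code: `demo_i2c_slave_tx_byte()` is a hypothetical name, and the model assumes the callback returned a non-zero length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: the byte the master would observe at the given
 * 0-based position within a single read transaction, when the transmit
 * start callback supplied `len` bytes in `data`.
 * Assumes len > 0; a zero length is not modeled here. */
uint8_t demo_i2c_slave_tx_byte(const uint8_t *data, size_t len,
                               size_t read_index)
{
    assert(data != NULL && len > 0);
    /* Reads past the end wrap back to the start of the buffer. */
    return data[read_index % len];
}
```

With a three-byte buffer, the fourth byte read by the master in this model is the first byte of the buffer again.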

typedef void (*rtos_i2c_slave_tx_done_cb_t)(rtos_i2c_slave_t *ctx, void *app_data, uint8_t *data, size_t len)

Function pointer type for application provided RTOS I2C slave transmit done callback functions.

These callback functions are optionally called when an I2C slave driver instance is done transmitting data to a master device. A pointer to the data sent and the actual number of bytes sent are provided to the callback.

The application may want to use this, for example, if the buffer that was sent was malloc’d. This callback can be used to free the buffer.

Param ctx:

A pointer to the associated I2C slave driver instance.

Param app_data:

A pointer to application specific data provided by the application. Used to share data between this callback function and the application.

Param data:

A pointer to the data transmitted to the master.

Param len:

The number of bytes transmitted to the master from data.

typedef void (*rtos_i2c_slave_rx_byte_check_cb_t)(rtos_i2c_slave_t *ctx, void *app_data, uint8_t data, i2c_slave_ack_t *cur_status)

Function pointer type for application provided function to check bytes received from master individually.

This callback function is called once per byte received from the master device.

The application may want to use this, for example, to check byte by byte and force a NACK for an unexpected payload.

The user provided functions must be marked with RTOS_I2C_SLAVE_RX_BYTE_CHECK_CALLBACK_ATTR.

Param ctx:

A pointer to the associated I2C slave driver instance.

Param app_data:

A pointer to application specific data provided by the application. Used to share data between this callback function and the application.

Param data:

A copy of the most recent byte of data transmitted from the master.

Param cur_status:

A pointer to the current ACK/NACK response for this byte. The application may change this to I2C_SLAVE_ACK or I2C_SLAVE_NACK. If cur_status is returned as an invalid value, the driver will implicitly NACK.
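A byte check callback of this shape might look like the sketch below. To keep it compilable off-target it uses stand-in types (`demo_i2c_slave_t`, `demo_i2c_slave_ack_t`) in place of rtos_i2c_slave_t and i2c_slave_ack_t, and it omits RTOS_I2C_SLAVE_RX_BYTE_CHECK_CALLBACK_ATTR, which a real callback would carry. The `demo_check_cfg_t` struct and the "reject values above a limit" policy are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in types so the sketch builds without the driver headers. */
typedef struct demo_i2c_slave demo_i2c_slave_t;
typedef enum { DEMO_I2C_SLAVE_ACK, DEMO_I2C_SLAVE_NACK } demo_i2c_slave_ack_t;

/* Hypothetical application data passed through app_data. */
typedef struct {
    uint8_t max_valid_value; /* bytes above this value are rejected */
} demo_check_cfg_t;

/* Called once per received byte. Forces a NACK for unexpected values and
 * otherwise leaves the driver's current ACK/NACK decision untouched. */
void demo_rx_byte_check(demo_i2c_slave_t *ctx, void *app_data,
                        uint8_t data, demo_i2c_slave_ack_t *cur_status)
{
    (void) ctx; /* unused in this sketch */
    const demo_check_cfg_t *cfg = (const demo_check_cfg_t *) app_data;
    assert(cfg != NULL && cur_status != NULL);

    if (data > cfg->max_valid_value) {
        *cur_status = DEMO_I2C_SLAVE_NACK;
    }
}
```

Leaving `*cur_status` unchanged for acceptable bytes keeps whatever response the driver had already chosen, so the callback only ever tightens the check.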

typedef void (*rtos_i2c_slave_write_addr_request_cb_t)(rtos_i2c_slave_t *ctx, void *app_data, i2c_slave_ack_t *cur_status)

Function pointer type for application provided function to alert the application that there is a write transaction incoming from the master.

This allows an application to NACK if it is not ready for handling write requests.

The user provided functions must be marked with RTOS_I2C_SLAVE_WRITE_ADDR_REQUEST_CALLBACK_ATTR.

Param ctx:

A pointer to the associated I2C slave driver instance.

Param app_data:

A pointer to application specific data provided by the application. Used to share data between this callback function and the application.

Param cur_status:

A pointer to the current ACK/NACK response for this byte. The application may change this to I2C_SLAVE_ACK or I2C_SLAVE_NACK. If cur_status is returned as an invalid value, the driver will implicitly NACK. By default the driver will implicitly ACK.

void rtos_i2c_slave_start(rtos_i2c_slave_t *i2c_slave_ctx, void *app_data, rtos_i2c_slave_start_cb_t start, rtos_i2c_slave_rx_cb_t rx, rtos_i2c_slave_tx_start_cb_t tx_start, rtos_i2c_slave_tx_done_cb_t tx_done, rtos_i2c_slave_rx_byte_check_cb_t rx_byte_check, rtos_i2c_slave_write_addr_request_cb_t write_addr_req, unsigned interrupt_core_id, unsigned priority)

Starts an RTOS I2C slave driver instance. This must only be called by the tile that owns the driver instance. It must be called after starting the RTOS from an RTOS thread.

rtos_i2c_slave_init() must be called on this I2C slave driver instance prior to calling this.

Parameters:
  • i2c_slave_ctx – A pointer to the I2C slave driver instance to start.

  • app_data – A pointer to application specific data to pass to the callback functions.

  • start – The callback function that is called when the driver’s thread starts. This is optional and may be NULL.

  • rx – The callback function to receive data from the bus master.

  • tx_start – The callback function to transmit data to the bus master.

  • tx_done – The callback function that is notified when transmits are complete. This is optional and may be NULL.

  • rx_byte_check – The callback function to check received bytes individually.

  • write_addr_req – The callback function that alerts the application to an incoming write request.

  • interrupt_core_id – The ID of the core on which to enable the I2C interrupt.

  • priority – The priority of the task that gets created by the driver to call the callback functions.

void rtos_i2c_slave_init(rtos_i2c_slave_t *i2c_slave_ctx, uint32_t io_core_mask, const port_t p_scl, const port_t p_sda, uint8_t device_addr)

Initializes an RTOS I2C slave driver instance. This must only be called by the tile that owns the driver instance. It should be called before starting the RTOS, and must be called before calling rtos_i2c_slave_start().

Parameters:
  • i2c_slave_ctx – A pointer to the I2C slave driver instance to initialize.

  • io_core_mask – A bitmask representing the cores on which the low level I2C I/O thread created by the driver is allowed to run. Bit 0 is core 0, bit 1 is core 1, etc.

  • p_scl – The port containing SCL. This must be a 1-bit port and different than p_sda.

  • p_sda – The port containing SDA. This must be a 1-bit port and different than p_scl.

  • device_addr – The 7-bit address of the slave device.
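The io_core_mask encoding (bit 0 is core 0, bit 1 is core 1, and so on) can be built from a list of core IDs as sketched here. `demo_core_mask()` is a hypothetical host-side helper, not a driver function, and it assumes core IDs below 32 so that they fit the 32-bit mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: builds a core bitmask from a list of core IDs.
 * Bit n of the result is set when core n appears in the list. */
uint32_t demo_core_mask(const unsigned *core_ids, size_t count)
{
    uint32_t mask = 0;
    for (size_t i = 0; i < count; i++) {
        assert(core_ids[i] < 32); /* must fit in a 32-bit mask */
        mask |= UINT32_C(1) << core_ids[i];
    }
    return mask;
}
```

Allowing the I/O thread on cores 1 and 2 gives a mask of 0x6 with this helper. An empty list gives 0, which would allow no cores and is unlikely to be a useful value to pass.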

RTOS_I2C_SLAVE_BUF_LEN

The maximum number of bytes that the RTOS I2C slave driver can receive from a master in a single write transaction.

RTOS_I2C_SLAVE_CALLBACK_ATTR

This attribute must be specified on all RTOS I2C slave callback functions provided by the application.

RTOS_I2C_SLAVE_RX_BYTE_CHECK_CALLBACK_ATTR

This attribute must be specified on all RTOS I2C slave rtos_i2c_slave_rx_byte_check_cb_t callback functions provided by the application.

RTOS_I2C_SLAVE_WRITE_ADDR_REQUEST_CALLBACK_ATTR

This attribute must be specified on all RTOS I2C slave rtos_i2c_slave_write_addr_request_cb_t callback functions provided by the application.

struct rtos_i2c_slave_struct
#include <rtos_i2c_slave.h>

Struct representing an RTOS I2C slave driver instance.

The members in this struct should not be accessed directly.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$I2S RTOS Driver£££modules/rtos/doc/programming_guide/reference/rtos_drivers/i2s/i2s.html#i2s-rtos-driver

This driver can be used to instantiate and control an I2S master or slave mode I/O interface on xcore in an RTOS application.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$Initialization API£££modules/rtos/doc/programming_guide/reference/rtos_drivers/i2s/i2s.html#initialization-api

The following structures and functions are used to initialize and start an I2S driver instance.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$I2S Master Initialization API£££modules/rtos/doc/programming_guide/reference/rtos_drivers/i2s/i2s_master.html#i2s-master-initialization-api

The following structures and functions are used to initialize and start an I2S master driver instance.

void rtos_i2s_master_init(rtos_i2s_t *i2s_ctx, uint32_t io_core_mask, port_t p_dout[], size_t num_out, port_t p_din[], size_t num_in, port_t p_bclk, port_t p_lrclk, port_t p_mclk, xclock_t bclk)

Initializes an RTOS I2S driver instance in master mode. This must only be called by the tile that owns the driver instance. It should be called before starting the RTOS, and must be called before calling rtos_i2s_start() or any of the core I2S driver functions with this instance.

Parameters:
  • i2s_ctx – A pointer to the I2S driver instance to initialize.

  • io_core_mask – A bitmask representing the cores on which the low level I2S I/O thread created by the driver is allowed to run. Bit 0 is core 0, bit 1 is core 1, etc.

  • p_dout – An array of data output ports.

  • num_out – The number of output data ports.

  • p_din – An array of data input ports.

  • num_in – The number of input data ports.

  • p_bclk – The bit clock output port.

  • p_lrclk – The word clock output port.

  • p_mclk – Input port which supplies the master clock.

  • bclk – A clock that will get configured for use with the bit clock.

void rtos_i2s_master_ext_clock_init(rtos_i2s_t *i2s_ctx, uint32_t io_core_mask, port_t p_dout[], size_t num_out, port_t p_din[], size_t num_in, port_t p_bclk, port_t p_lrclk, xclock_t bclk)

Initializes an RTOS I2S driver instance in master mode but that uses an externally generated bit clock. This must only be called by the tile that owns the driver instance. It should be called before starting the RTOS, and must be called before calling rtos_i2s_start() or any of the core I2S driver functions with this instance.

Parameters:
  • i2s_ctx – A pointer to the I2S driver instance to initialize.

  • io_core_mask – A bitmask representing the cores on which the low level I2S I/O thread created by the driver is allowed to run. Bit 0 is core 0, bit 1 is core 1, etc.

  • p_dout – An array of data output ports.

  • num_out – The number of output data ports.

  • p_din – An array of data input ports.

  • num_in – The number of input data ports.

  • p_bclk – The bit clock output port.

  • p_lrclk – The word clock output port.

  • bclk – A clock that is configured externally to be used as the bit clock.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$I2S Slave Initialization API£££modules/rtos/doc/programming_guide/reference/rtos_drivers/i2s/i2s_slave.html#i2s-slave-initialization-api

The following structures and functions are used to initialize and start an I2S slave driver instance.

void rtos_i2s_slave_init(rtos_i2s_t *i2s_ctx, uint32_t io_core_mask, port_t p_dout[], size_t num_out, port_t p_din[], size_t num_in, port_t p_bclk, port_t p_lrclk, xclock_t bclk)

Initializes an RTOS I2S driver instance in slave mode. This must only be called by the tile that owns the driver instance. It should be called before starting the RTOS, and must be called before calling rtos_i2s_start() or any of the core I2S driver functions with this instance.

Parameters:
  • i2s_ctx – A pointer to the I2S driver instance to initialize.

  • io_core_mask – A bitmask representing the cores on which the low level I2S I/O thread created by the driver is allowed to run. Bit 0 is core 0, bit 1 is core 1, etc.

  • p_dout – An array of data output ports.

  • num_out – The number of output data ports.

  • p_din – An array of data input ports.

  • num_in – The number of input data ports.

  • p_bclk – The bit clock input port.

  • p_lrclk – The word clock input port.

  • bclk – A clock that will get configured for use with the bit clock.

typedef struct rtos_i2s_struct rtos_i2s_t

Typedef to the RTOS I2S driver instance struct.

typedef size_t (*rtos_i2s_send_filter_cb_t)(rtos_i2s_t *ctx, void *app_data, int32_t *i2s_frame, size_t i2s_frame_size, int32_t *send_buf, size_t samples_available)

Function pointer type for application provided RTOS I2S send filter callback functions.

These callback functions are called when an I2S driver instance needs to output the next audio frame to its interface. By default, audio frames in the driver’s send buffer are output directly to its interface. However, this gives the application an opportunity to override this and provide filtering.

These functions must not block.

Param ctx:

A pointer to the associated I2S driver instance.

Param app_data:

A pointer to application specific data provided by the application. Used to share data between this callback function and the application.

Param i2s_frame:

A pointer to the buffer where the callback should write the next frame to send.

Param i2s_frame_size:

The number of samples that should be written to i2s_frame.

Param send_buf:

A pointer to the next frame in the driver’s send buffer. The callback should use this as the input to its filter.

Param samples_available:

The number of samples available in send_buf.

Return:

the number of samples read out of send_buf.
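A send filter with this signature could be sketched as follows. The example attenuates each sample and is only an illustration: `demo_i2s_t` stands in for rtos_i2s_t so the code builds off-target, app_data is assumed to point to an `unsigned` attenuation in bits, and outputting silence while consuming nothing on underrun is an assumed policy rather than documented driver behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for rtos_i2s_t so the sketch builds without driver headers. */
typedef struct demo_i2s demo_i2s_t;

/* Hypothetical send filter: writes one attenuated frame to i2s_frame and
 * returns how many samples were consumed from send_buf. Never blocks. */
size_t demo_send_filter(demo_i2s_t *ctx, void *app_data,
                        int32_t *i2s_frame, size_t i2s_frame_size,
                        int32_t *send_buf, size_t samples_available)
{
    (void) ctx; /* unused in this sketch */
    const unsigned *shift = (const unsigned *) app_data;
    assert(shift != NULL && *shift < 31);
    const int32_t divisor = (int32_t) 1 << *shift;

    if (samples_available < i2s_frame_size) {
        /* Assumed underrun policy: send silence, consume nothing. */
        for (size_t i = 0; i < i2s_frame_size; i++) {
            i2s_frame[i] = 0;
        }
        return 0;
    }

    /* Division is used instead of >> so negative samples behave portably. */
    for (size_t i = 0; i < i2s_frame_size; i++) {
        i2s_frame[i] = send_buf[i] / divisor;
    }
    return i2s_frame_size;
}
```

The return value tells the driver how far to advance in its send buffer, so a filter that resamples could legitimately return a count different from i2s_frame_size.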

typedef size_t (*rtos_i2s_receive_filter_cb_t)(rtos_i2s_t *ctx, void *app_data, int32_t *i2s_frame, size_t i2s_frame_size, int32_t *receive_buf, size_t sample_spaces_free)

Function pointer type for application provided RTOS I2S receive filter callback functions.

These callback functions are called when an I2S driver instance has received the next audio frame from its interface. By default, audio frames received from the driver’s interface are put directly into its receive buffer. However, this gives the application an opportunity to override this and provide filtering.

These functions must not block.

Param ctx:

A pointer to the associated I2S driver instance.

Param app_data:

A pointer to application specific data provided by the application. Used to share data between this callback function and the application.

Param i2s_frame:

A pointer to the buffer where the callback should read the next received frame from. The callback should use this as the input to its filter.

Param i2s_frame_size:

The number of samples that should be read from i2s_frame.

Param receive_buf:

A pointer to the next free frame in the driver’s receive buffer. The callback should write the output of its filter here.

Param sample_spaces_free:

The number of sample spaces free in receive_buf.

Return:

the number of samples written to receive_buf.
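
A minimal receive filter sketch follows. It copies each received frame into receive_buf unchanged and uses app_data to share simple statistics with the application. The names example_rx_stats_t and example_receive_filter are hypothetical, the forward declaration and the empty attribute macro are stand-ins for what rtos_i2s.h provides, and dropping a frame when there is not enough free space is an assumed policy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins so this sketch compiles alone; a real build includes rtos_i2s.h. */
typedef struct rtos_i2s_struct rtos_i2s_t;
#ifndef RTOS_I2S_APP_RECEIVE_FILTER_CALLBACK_ATTR
#define RTOS_I2S_APP_RECEIVE_FILTER_CALLBACK_ATTR
#endif

/* Hypothetical statistics shared with the application through app_data. */
typedef struct {
    unsigned frames_stored;
    unsigned frames_dropped;
} example_rx_stats_t;

RTOS_I2S_APP_RECEIVE_FILTER_CALLBACK_ATTR
size_t example_receive_filter(rtos_i2s_t *ctx, void *app_data,
                              int32_t *i2s_frame, size_t i2s_frame_size,
                              int32_t *receive_buf, size_t sample_spaces_free)
{
    example_rx_stats_t *stats = app_data;
    (void) ctx;
    assert(stats != NULL);

    if (sample_spaces_free < i2s_frame_size) {
        /* Assumed overflow policy: drop the whole frame, write nothing. */
        stats->frames_dropped++;
        return 0;
    }

    /* Pass-through "filter": i2s_frame is the input, receive_buf the output. */
    for (size_t i = 0; i < i2s_frame_size; i++) {
        receive_buf[i] = i2s_frame[i];
    }
    stats->frames_stored++;
    return i2s_frame_size; /* number of samples written to receive_buf */
}
```

In a real application the callback would be registered with rtos_i2s_receive_filter_cb_set(), and the application could read the counters from its own tasks.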

inline int rtos_i2s_mclk_bclk_ratio(const unsigned audio_clock_frequency, const unsigned sample_rate)

Helper function to calculate the MCLK/BCLK ratio given the audio clock frequency at the master clock pin and the desired sample rate.

Parameters:
  • audio_clock_frequency – The frequency of the audio clock at the port p_mclk.

  • sample_rate – The desired sample rate.

Returns:

the MCLK/BCLK ratio that should be provided to rtos_i2s_start().
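
The arithmetic behind such a ratio can be sketched as follows. This is not the library's implementation: it assumes 32-bit samples and two channels per frame, so that BCLK = sample_rate × 64. The name example_mclk_bclk_ratio and both macros are hypothetical.

```c
#include <assert.h>

/* Assumptions for this sketch: 32-bit samples, two channels per frame. */
#define EXAMPLE_BITS_PER_SAMPLE 32u
#define EXAMPLE_CHANS_PER_FRAME 2u

int example_mclk_bclk_ratio(unsigned audio_clock_frequency, unsigned sample_rate)
{
    assert(sample_rate != 0);
    /* Bit clock needed to move one full frame per sample period. */
    unsigned bclk = sample_rate * EXAMPLE_BITS_PER_SAMPLE * EXAMPLE_CHANS_PER_FRAME;
    return (int) (audio_clock_frequency / bclk);
}
```

Under these assumptions, a 24.576 MHz master clock at a 48 kHz sample rate gives 24576000 / (48000 × 64) = 8.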

inline void rtos_i2s_send_filter_cb_set(rtos_i2s_t *ctx, rtos_i2s_send_filter_cb_t send_filter_cb, void *send_filter_app_data)
inline void rtos_i2s_receive_filter_cb_set(rtos_i2s_t *ctx, rtos_i2s_receive_filter_cb_t receive_filter_cb, void *receive_filter_app_data)
void rtos_i2s_start(rtos_i2s_t *i2s_ctx, unsigned mclk_bclk_ratio, i2s_mode_t mode, size_t recv_buffer_size, size_t send_buffer_size, unsigned interrupt_core_id)

Starts an RTOS I2S driver instance. This must only be called by the tile that owns the driver instance. It must be called after starting the RTOS from an RTOS thread, and must be called before any of the core I2S driver functions are called with this instance.

One of rtos_i2s_master_init(), rtos_i2s_master_ext_clock_init(), or rtos_i2s_slave_init() must be called on this I2S driver instance prior to calling this.

Parameters:
  • i2s_ctx – A pointer to the I2S driver instance to start.

  • mclk_bclk_ratio – The master clock to bit clock ratio. This may be computed by the helper function rtos_i2s_mclk_bclk_ratio(). This is only used if the I2S instance was initialized with rtos_i2s_master_init(). Otherwise it is ignored.

  • mode – The mode of the LR clock. See i2s_mode_t.

  • recv_buffer_size – The size in frames of the input buffer. Each frame is two samples (left and right channels) per input port. For example, a size of two here when num_in is three would create a buffer that holds up to 12 samples.

  • send_buffer_size – The size in frames of the output buffer. Each frame is two samples (left and right channels) per output port. For example, a size of two here when num_out is three would create a buffer that holds up to 12 samples. Frames transmitted by rtos_i2s_tx() are stored in this buffer before they are sent out to the I2S interface.

  • interrupt_core_id – The ID of the core on which to enable the I2S interrupt.
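
The buffer sizing rule for recv_buffer_size and send_buffer_size can be checked with a small helper. The function names are hypothetical, and the byte count assumes int32_t samples (as used by rtos_i2s_rx() and rtos_i2s_tx()) and covers sample storage only, not any driver bookkeeping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Each frame holds a left and a right sample for every port. */
#define EXAMPLE_SAMPLES_PER_FRAME_PER_PORT 2u

/* Number of samples held by a buffer of `frames` frames across `num_ports` ports. */
size_t example_i2s_buffer_samples(size_t frames, size_t num_ports)
{
    assert(num_ports > 0);
    return frames * EXAMPLE_SAMPLES_PER_FRAME_PER_PORT * num_ports;
}

/* Sample storage in bytes, assuming int32_t samples. */
size_t example_i2s_buffer_bytes(size_t frames, size_t num_ports)
{
    return example_i2s_buffer_samples(frames, num_ports) * sizeof(int32_t);
}
```

For the case described above, two frames with three ports gives 2 × 2 × 3 = 12 samples, or 48 bytes of sample storage.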

RTOS_I2S_APP_SEND_FILTER_CALLBACK_ATTR

This attribute must be specified on all RTOS I2S send filter callback functions provided by the application.

RTOS_I2S_APP_RECEIVE_FILTER_CALLBACK_ATTR

This attribute must be specified on all RTOS I2S receive filter callback functions provided by the application.

struct rtos_i2s_struct
#include <rtos_i2s.h>

Struct representing an RTOS I2S driver instance.

The members in this struct should not be accessed directly.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$Core API£££modules/rtos/doc/programming_guide/reference/rtos_drivers/i2s/i2s_slave.html#core-api

The following functions are the core I2S driver functions that are used after it has been initialized and started.

inline size_t rtos_i2s_rx(rtos_i2s_t *ctx, int32_t *i2s_sample_buf, size_t frame_count, unsigned timeout)

Receives sample frames from the I2S interface.

This function will block until new frames are available.

Parameters:
  • ctx – A pointer to the I2S driver instance to use.

  • i2s_sample_buf – A buffer to copy the received sample frames into.

  • frame_count – The number of frames to receive from the buffer. This must be less than or equal to the size of the input buffer specified to rtos_i2s_start().

  • timeout – The maximum amount of time to wait for the requested number of frames to become available.

Returns:

The number of frames actually received into i2s_sample_buf.

inline size_t rtos_i2s_tx(rtos_i2s_t *ctx, int32_t *i2s_sample_buf, size_t frame_count, unsigned timeout)

Transmits sample frames out to the I2S interface.

The samples are stored into a buffer and are not necessarily sent out to the I2S interface before this function returns.

Parameters:
  • ctx – A pointer to the I2S driver instance to use.

  • i2s_sample_buf – A buffer containing the sample frames to transmit out to the I2S interface.

  • frame_count – The number of frames to transmit out from the buffer. This must be less than or equal to the size of the output buffer specified to rtos_i2s_start().

  • timeout – The maximum amount of time to wait for enough space in the send buffer to accept the frames to be transmitted.

Returns:

The number of frames actually stored into the buffer.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$RPC Initialization API£££modules/rtos/doc/programming_guide/reference/rtos_drivers/i2s/i2s_slave.html#rpc-initialization-api

The following functions may be used to share an I2S driver instance with other xcore tiles. Tiles that the driver instance is shared with may call any of the core functions listed above.

void rtos_i2s_rpc_client_init(rtos_i2s_t *i2s_ctx, rtos_driver_rpc_t *rpc_config, rtos_intertile_t *host_intertile_ctx)

Initializes an RTOS I2S driver instance on a client tile. This allows a tile that does not own the actual driver instance to use a driver instance on another tile. This will be called instead of one of the RTOS I2S init functions. The host tile that owns the actual instance must simultaneously call rtos_i2s_rpc_host_init().

Parameters:
  • i2s_ctx – A pointer to the I2S driver instance to initialize.

  • rpc_config – A pointer to an RPC config struct. This must have the same scope as i2s_ctx.

  • host_intertile_ctx – A pointer to the intertile driver instance to use for performing the communication between the client and host tiles. This must have the same scope as i2s_ctx.

void rtos_i2s_rpc_host_init(rtos_i2s_t *i2s_ctx, rtos_driver_rpc_t *rpc_config, rtos_intertile_t *client_intertile_ctx[], size_t remote_client_count)

Performs additional initialization on an I2S driver instance to allow client tiles to use the I2S driver instance. Each client tile that will use this instance must simultaneously call rtos_i2s_rpc_client_init().

Parameters:
  • i2s_ctx – A pointer to the I2S driver instance to share with clients.

  • rpc_config – A pointer to an RPC config struct. This must have the same scope as i2s_ctx.

  • client_intertile_ctx – An array of pointers to the intertile driver instances to use for performing the communication between the host tile and each client tile. This must have the same scope as i2s_ctx.

  • remote_client_count – The number of client tiles to share this driver instance with.

void rtos_i2s_rpc_config(rtos_i2s_t *i2s_ctx, unsigned intertile_port, unsigned host_task_priority)

Configures the RPC for an I2S driver instance. This must be called by both the host tile and all client tiles.

On the client tiles this must be called after calling rtos_i2s_rpc_client_init(). After calling this, the client tile may immediately begin to call the core I2S functions on this driver instance. It does not need to wait for the host to call rtos_i2s_start().

On the host tile this must be called both after calling rtos_i2s_rpc_host_init() and before calling rtos_i2s_start().

Parameters:
  • i2s_ctx – A pointer to the I2S driver instance to configure the RPC for.

  • intertile_port – The port number on the intertile channel to use for transferring the RPC requests and responses for this driver instance. This port must not be shared by any other functions. The port must be the same for the host and all its clients.

  • host_task_priority – The priority to use for the task on the host tile that handles RPC requests from the clients.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$Microphone Array RTOS Driver£££modules/rtos/doc/programming_guide/reference/rtos_drivers/mic_array.html#microphone-array-rtos-driver

This driver can be used to instantiate and control a dual DDR PDM microphone interface on xcore in an RTOS application.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$Initialization API£££modules/rtos/doc/programming_guide/reference/rtos_drivers/mic_array.html#initialization-api

The following structures and functions are used to initialize and start a microphone array driver instance.

enum rtos_mic_array_format_t

Typedef for the RTOS mic array driver audio format

Values:

enumerator RTOS_MIC_ARRAY_CHANNEL_SAMPLE
enumerator RTOS_MIC_ARRAY_SAMPLE_CHANNEL
enumerator RTOS_MIC_ARRAY_FORMAT_COUNT
typedef struct rtos_mic_array_struct rtos_mic_array_t

Typedef to the RTOS mic array driver instance struct.

void rtos_mic_array_start(rtos_mic_array_t *mic_array_ctx, size_t buffer_size, unsigned interrupt_core_id)

Starts an RTOS mic array driver instance. This must only be called by the tile that owns the driver instance. It must be called after starting the RTOS from an RTOS thread, and must be called before any of the core mic array driver functions are called with this instance.

rtos_mic_array_init() must be called on this mic array driver instance prior to calling this.

Parameters:
  • mic_array_ctx – A pointer to the mic array driver instance to start.

  • buffer_size – The size in frames of the input buffer. Each frame is two samples (one for each microphone) plus one sample per reference channel. This must be at least MIC_ARRAY_CONFIG_SAMPLES_PER_FRAME. Samples are pulled out of this buffer by the application by calling rtos_mic_array_rx().

  • interrupt_core_id – The ID of the core on which to enable the mic array interrupt.

void rtos_mic_array_init(rtos_mic_array_t *mic_array_ctx, uint32_t io_core_mask, rtos_mic_array_format_t format)

Initializes an RTOS mic array driver instance. This must only be called by the tile that owns the driver instance. It should be called before starting the RTOS, and must be called before calling rtos_mic_array_start() or any of the core mic array driver functions with this instance.

Parameters:
  • mic_array_ctx – A pointer to the mic array driver instance to initialize.

  • io_core_mask – A bitmask representing the cores on which the low level mic array I/O thread created by the driver is allowed to run. Bit 0 is core 0, bit 1 is core 1, etc.

  • format – Format of the output data

struct rtos_mic_array_struct
#include <rtos_mic_array.h>

Struct representing an RTOS mic array driver instance.

The members in this struct should not be accessed directly.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$Core API£££modules/rtos/doc/programming_guide/reference/rtos_drivers/mic_array.html#core-api

The following functions are the core microphone array driver functions that are used after it has been initialized and started.

inline size_t rtos_mic_array_rx(rtos_mic_array_t *ctx, int32_t **sample_buf, size_t frame_count, unsigned timeout)

Receives sample frames from the PDM mic array interface.

This function will block until new frames are available.

Parameters:
  • ctx – A pointer to the mic array driver instance to use.

  • sample_buf – A buffer to copy the received sample frames into.

  • frame_count – The number of frames to receive from the buffer. This must be less than or equal to the size of the buffer specified to rtos_mic_array_start() if in RTOS_MIC_ARRAY_SAMPLE_CHANNEL mode. This must be equal to MIC_ARRAY_CONFIG_SAMPLES_PER_FRAME if in RTOS_MIC_ARRAY_CHANNEL_SAMPLE mode.

  • timeout – The maximum amount of time to wait for the requested number of frames to become available.

Returns:

The number of frames actually received into sample_buf.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$RPC Initialization API£££modules/rtos/doc/programming_guide/reference/rtos_drivers/mic_array.html#rpc-initialization-api

The following functions may be used to share a microphone array driver instance with other xcore tiles. Tiles that the driver instance is shared with may call any of the core functions listed above.

void rtos_mic_array_rpc_client_init(rtos_mic_array_t *mic_array_ctx, rtos_driver_rpc_t *rpc_config, rtos_intertile_t *host_intertile_ctx)

Initializes an RTOS mic array driver instance on a client tile. This allows a tile that does not own the actual driver instance to use a driver instance on another tile. This will be called instead of rtos_mic_array_init(). The host tile that owns the actual instance must simultaneously call rtos_mic_array_rpc_host_init().

Parameters:
  • mic_array_ctx – A pointer to the mic array driver instance to initialize.

  • rpc_config – A pointer to an RPC config struct. This must have the same scope as mic_array_ctx.

  • host_intertile_ctx – A pointer to the intertile driver instance to use for performing the communication between the client and host tiles. This must have the same scope as mic_array_ctx.

void rtos_mic_array_rpc_host_init(rtos_mic_array_t *mic_array_ctx, rtos_driver_rpc_t *rpc_config, rtos_intertile_t *client_intertile_ctx[], size_t remote_client_count)

Performs additional initialization on a mic array driver instance to allow client tiles to use the mic array driver instance. Each client tile that will use this instance must simultaneously call rtos_mic_array_rpc_client_init().

Parameters:
  • mic_array_ctx – A pointer to the mic array driver instance to share with clients.

  • rpc_config – A pointer to an RPC config struct. This must have the same scope as mic_array_ctx.

  • client_intertile_ctx – An array of pointers to the intertile driver instances to use for performing the communication between the host tile and each client tile. This must have the same scope as mic_array_ctx.

  • remote_client_count – The number of client tiles to share this driver instance with.

void rtos_mic_array_rpc_config(rtos_mic_array_t *mic_array_ctx, unsigned intertile_port, unsigned host_task_priority)

Configures the RPC for a mic array driver instance. This must be called by both the host tile and all client tiles.

On the client tiles this must be called after calling rtos_mic_array_rpc_client_init(). After calling this, the client tile may immediately begin to call the core mic array functions on this driver instance. It does not need to wait for the host to call rtos_mic_array_start().

On the host tile this must be called both after calling rtos_mic_array_rpc_host_init() and before calling rtos_mic_array_start().

Parameters:
  • mic_array_ctx – A pointer to the mic array driver instance to configure the RPC for.

  • intertile_port – The port number on the intertile channel to use for transferring the RPC requests and responses for this driver instance. This port must not be shared by any other functions. The port must be the same for the host and all its clients.

  • host_task_priority – The priority to use for the task on the host tile that handles RPC requests from the clients.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$QSPI Flash RTOS Driver£££modules/rtos/doc/programming_guide/reference/rtos_drivers/qspi_flash.html#qspi-flash-rtos-driver

This driver can be used to instantiate and control a Quad SPI flash I/O interface on xcore in an RTOS application.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$Initialization API£££modules/rtos/doc/programming_guide/reference/rtos_drivers/qspi_flash.html#initialization-api

The following structures and functions are used to initialize and start a QSPI flash driver instance.

typedef struct rtos_qspi_flash_struct rtos_qspi_flash_t

Typedef to the RTOS QSPI flash driver instance struct.

void rtos_qspi_flash_start(rtos_qspi_flash_t *ctx, unsigned priority)

Starts an RTOS QSPI flash driver instance. This must only be called by the tile that owns the driver instance. It may be called either before or after starting the RTOS, but must be called before any of the core QSPI flash driver functions are called with this instance.

rtos_qspi_flash_init() must be called on this QSPI flash driver instance prior to calling this.

Parameters:
  • ctx – A pointer to the QSPI flash driver instance to start.

  • priority – The priority of the task that gets created by the driver to handle the QSPI flash interface.

void rtos_qspi_flash_op_core_affinity_set(rtos_qspi_flash_t *ctx, uint32_t op_core_mask)

Sets the core affinity for an RTOS QSPI flash driver instance. This must only be called by the tile that owns the driver instance. It may be called either before or after starting the RTOS, and should be called before any of the core QSPI flash driver functions are called with this instance.

Since interrupts are disabled during the QSPI transaction on the op thread, a core mask is provided to allow users to avoid collisions with application ISRs.

rtos_qspi_flash_start() must be called on this QSPI flash driver instance prior to calling this.

Parameters:
  • ctx – A pointer to the QSPI flash driver instance to configure.

  • op_core_mask – A bitmask representing the cores on which the QSPI I/O thread created by the driver is allowed to run. Bit 0 is core 0, bit 1 is core 1, etc.
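
A core mask of this form (bit 0 is core 0, bit 1 is core 1, and so on) can be built from a list of core IDs as sketched below. The helper name example_core_mask is hypothetical and is not part of the driver API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Builds a bitmask with one bit set for each core ID in core_ids. */
uint32_t example_core_mask(const unsigned *core_ids, size_t count)
{
    uint32_t mask = 0;
    for (size_t i = 0; i < count; i++) {
        assert(core_ids[i] < 32); /* must fit in a 32-bit mask */
        mask |= (uint32_t) 1 << core_ids[i];
    }
    return mask;
}
```

For example, allowing the thread to run only on cores 1 and 3 yields the mask 0x0A. The same encoding applies to the io_core_mask parameter of rtos_mic_array_init().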

void rtos_qspi_flash_init(rtos_qspi_flash_t *ctx, xclock_t clock_block, port_t cs_port, port_t sclk_port, port_t sio_port, fl_QuadDeviceSpec *spec)

Initializes an RTOS QSPI flash driver instance. This must only be called by the tile that owns the driver instance. It may be called either before or after starting the RTOS, but must be called before calling rtos_qspi_flash_start() or any of the core QSPI flash driver functions with this instance.

This function will initialize a flash driver using lib_quadflash for all operations.

Parameters:
  • ctx – A pointer to the QSPI flash driver instance to initialize.

  • clock_block – The clock block to use for the qspi_io interface.

  • cs_port – The chip select port. MUST be a 1-bit port.

  • sclk_port – The SCLK port. MUST be a 1-bit port.

  • sio_port – The SIO port. MUST be a 4-bit port.

  • spec – A pointer to the flash part specification. This may be set to NULL to use the XTC default.

void rtos_qspi_flash_fast_read_init(rtos_qspi_flash_t *ctx, xclock_t clock_block, port_t cs_port, port_t sclk_port, port_t sio_port, fl_QuadDeviceSpec *spec, qspi_fast_flash_read_transfer_mode_t read_mode, uint8_t read_divide, uint32_t calibration_pattern_addr)

Initializes an RTOS QSPI flash driver instance. This must only be called by the tile that owns the driver instance. It may be called either before or after starting the RTOS, but must be called before calling rtos_qspi_flash_start() or any of the core QSPI flash driver functions with this instance.

This function will initialize a flash driver using lib_quadflash for erase and writes, and lib_qspi_fast_read for reads. If calibration fails the driver will enable lib_quadflash for reads and allow the application to decide what to do about the failed calibration. The status of the calibration can be checked at runtime by calling rtos_qspi_flash_calibration_valid_get().

Parameters:
  • ctx – A pointer to the QSPI flash driver instance to initialize.

  • clock_block – The clock block to use for the qspi_io interface.

  • cs_port – The chip select port. MUST be a 1-bit port.

  • sclk_port – The SCLK port. MUST be a 1-bit port.

  • sio_port – The SIO port. MUST be a 4-bit port.

  • spec – A pointer to the flash part specification. This may be set to NULL to use the XTC default.

  • read_mode – The transfer mode to use for port reads. Invalid values will default to qspi_fast_flash_read_transfer_raw.

  • read_divide – The divisor to use for QSPI SCLK.

  • calibration_pattern_addr – The address of the default calibration pattern. This driver requires the default calibration pattern supplied with lib_qspi_fast_read and does not support custom patterns.

RTOS_QSPI_FLASH_READ_CHUNK_SIZE
struct rtos_qspi_flash_struct
#include <rtos_qspi_flash.h>

Struct representing an RTOS QSPI flash driver instance.

The members in this struct should not be accessed directly.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$Core API£££modules/rtos/doc/programming_guide/reference/rtos_drivers/qspi_flash.html#core-api

The following functions are the core QSPI flash driver functions that are used after it has been initialized and started.

inline void rtos_qspi_flash_lock(rtos_qspi_flash_t *ctx)

Obtains a lock for exclusive access to the QSPI flash. This allows a thread to perform a sequence of operations (such as read, modify, erase, write) without the risk of another thread issuing a command in the middle of the sequence and corrupting the data in the flash.

If only a single atomic operation needs to be performed, such as a read, it is not necessary to call this to obtain the lock first. Each individual operation obtains and releases the lock automatically so that they cannot run while another thread has the lock.

The lock MUST be released when it is no longer needed by calling rtos_qspi_flash_unlock().

Parameters:
  • ctx – A pointer to the QSPI flash driver instance to lock.

inline void rtos_qspi_flash_unlock(rtos_qspi_flash_t *ctx)

Releases a lock for exclusive access to the QSPI flash. The lock must have already been obtained by calling rtos_qspi_flash_lock().

Parameters:
  • ctx – A pointer to the QSPI flash driver instance to unlock.

inline void rtos_qspi_flash_read(rtos_qspi_flash_t *ctx, uint8_t *data, unsigned address, size_t len)

This reads data from the flash in quad I/O mode. All four lines are used to send the address and to read the data.

Parameters:
  • ctx – A pointer to the QSPI flash driver instance to use.

  • data – Pointer to the buffer to save the read data to.

  • address – The byte address in the flash to begin reading at. Only bits 23:0 contain the address. Bits 31:24 are ignored.

  • len – The number of bytes to read and save to data.
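
Because only bits 23:0 of the address are used, addresses at or above 16 MiB alias onto lower addresses. The sketch below shows the masking and a range check an application might perform before reading. All names are hypothetical, and the helpers are not part of the driver API.

```c
#include <assert.h>
#include <stddef.h>

/* Only bits 23:0 of the address are used; bits 31:24 are ignored. */
#define EXAMPLE_FLASH_ADDR_MASK 0x00FFFFFFu

/* The address the flash effectively sees. */
unsigned example_flash_effective_address(unsigned address)
{
    return address & EXAMPLE_FLASH_ADDR_MASK;
}

/* Returns 1 if [address, address + len) lies entirely within the 24-bit space. */
int example_flash_range_fits(unsigned address, size_t len)
{
    const size_t space = (size_t) EXAMPLE_FLASH_ADDR_MASK + 1u; /* 16 MiB */
    return address < space && len <= space - address;
}

/* Debug-build guard: traps on a range that would alias instead of reading it. */
unsigned example_checked_flash_address(unsigned address, size_t len)
{
    assert(example_flash_range_fits(address, len));
    return address;
}
```

For example, address 0x01000010 is treated the same as 0x000010.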

inline void rtos_qspi_flash_read_mode(rtos_qspi_flash_t *ctx, uint8_t *data, unsigned address, size_t len, qspi_fast_flash_read_transfer_mode_t mode)

This reads data from the flash in quad I/O mode. All four lines are used to send the address and to read the data.

Note: This only works with fast flash read and successful calibration. See rtos_qspi_flash_fast_read_init() versus rtos_qspi_flash_init().

If used with a setup that does not use fast flash read, this function will behave exactly the same as rtos_qspi_flash_read(), regardless of the value of mode.

Parameters:
  • ctx – A pointer to the QSPI flash driver instance to use.

  • data – Pointer to the buffer to save the read data to.

  • address – The byte address in the flash to begin reading at. Only bits 23:0 contain the address. Bits 31:24 are ignored.

  • len – The number of bytes to read and save to data.

  • mode – The transfer mode to use for this read operation.

int rtos_qspi_flash_read_ll(rtos_qspi_flash_t *ctx, uint8_t *data, unsigned address, size_t len)

This is a lower level version of rtos_qspi_flash_read() that is safe to call from within ISRs. If a task currently owns the flash lock, or if another core is actively doing a read with this function, then the read will not be performed and an error is returned. It is up to the application to determine what it should do in this situation and to avoid a potential deadlock.

This function may only be called on the same tile as the underlying peripheral.

This function uses the lib_quadflash API to perform the read. It is up to the application to ensure that XCORE resources are properly configured.

Note

It is not possible to call this from a task that currently owns the flash lock taken with rtos_qspi_flash_lock(). In general it is not advisable to call this from an RTOS task unless the small amount of overhead time that is introduced by rtos_qspi_flash_read() is unacceptable.

Parameters:
  • ctx – A pointer to the QSPI flash driver instance to use.

  • data – Pointer to the buffer to save the read data to.

  • address – The byte address in the flash to begin reading at. Only bits 23:0 contain the address. Bits 31:24 are ignored.

  • len – The number of bytes to read and save to data.

Return values:
  • 0 – if the flash was available and the read operation was performed.

  • -1 – if the flash was unavailable and the read could not be performed.

int rtos_qspi_flash_fast_read_ll(rtos_qspi_flash_t *ctx, uint8_t *data, unsigned address, size_t len)

This is a lower level version of rtos_qspi_flash_read() that is safe to call from within ISRs. If a task currently owns the flash lock, or if another core is actively doing a read with this function, then the read will not be performed and an error is returned. It is up to the application to determine what it should do in this situation and to avoid a potential deadlock.

This function may only be called on the same tile as the underlying peripheral.

This function uses the lib_qspi_fast_read API to perform the read. It is up to the application to ensure that XCORE resources are properly configured.

Note

It is not possible to call this from a task that currently owns the flash lock taken with rtos_qspi_flash_lock(). In general it is not advisable to call this from an RTOS task unless the small amount of overhead time that is introduced by rtos_qspi_flash_read() is unacceptable.

Parameters:
  • ctx – A pointer to the QSPI flash driver instance to use.

  • data – Pointer to the buffer to save the read data to.

  • address – The byte address in the flash to begin reading at. Only bits 23:0 contain the address. Bits 31:24 are ignored.

  • len – The number of bytes to read and save to data.

Return values:
  • 0 – if the flash was available and the read operation was performed.

  • -1 – if the flash was unavailable and the read could not be performed.

int rtos_qspi_flash_fast_read_mode_ll(rtos_qspi_flash_t *ctx, uint8_t *data, unsigned address, size_t len, qspi_fast_flash_read_transfer_mode_t mode)

This is a lower level version of rtos_qspi_flash_read_mode() that is safe to call from within ISRs. If a task currently owns the flash lock, or if another core is actively doing a read with this function, then the read will not be performed and an error is returned. It is up to the application to determine what it should do in this situation and to avoid a potential deadlock.

This function may only be called on the same tile as the underlying peripheral.

This function uses the lib_qspi_fast_read API to perform the read. It is up to the application to ensure that XCORE resources are properly configured.

Note

It is not possible to call this from a task that currently owns the flash lock taken with rtos_qspi_flash_lock(). In general it is not advisable to call this from an RTOS task unless the small amount of overhead time that is introduced by rtos_qspi_flash_read_mode() is unacceptable.

Parameters:
  • ctx – A pointer to the QSPI flash driver instance to use.

  • data – Pointer to the buffer to save the read data to.

  • address – The byte address in the flash to begin reading at. Only bits 23:0 contain the address. Bits 31:24 are ignored.

  • len – The number of bytes to read and save to data.

  • mode – The transfer mode to use for this read operation.

Return values:
  • 0 – if the flash was available and the read operation was performed.

  • -1 – if the flash was unavailable and the read could not be performed.

void rtos_qspi_flash_fast_read_setup_ll(rtos_qspi_flash_t *ctx)

This is a lower level function that enables the user to set up the ports for fast flash access.

This function may only be called on the same tile as the underlying peripheral.

Parameters:
  • ctx – A pointer to the QSPI flash driver instance to use.

void rtos_qspi_flash_fast_read_shutdown_ll(rtos_qspi_flash_t *ctx)

This is a lower level function that enables the user to shut down low level usage and resume normal QSPI thread operation.

This function may only be called on the same tile as the underlying peripheral.

Parameters:
  • ctx – A pointer to the QSPI flash driver instance to use.

inline void rtos_qspi_flash_write(rtos_qspi_flash_t *ctx, const uint8_t *data, unsigned address, size_t len)

This writes data to the QSPI flash. The standard page program command is sent and only SIO0 (MOSI) is used to send the address and data.

The driver handles sending the write enable command, as well as waiting for the write to complete.

This function may return before the write operation is complete, as the actual write operation is queued and executed by a thread created by the driver.

Note

This function does NOT erase the flash first. Erase operations must be explicitly requested by the application.

Parameters:
  • ctx – A pointer to the QSPI flash driver instance to use.

  • data – Pointer to the data to write to the flash.

  • address – The byte address in the flash to begin writing at. Only bits 23:0 contain the address. The byte in bits 31:24 is not sent.

  • len – The number of bytes to write to the flash.
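
The consequence of writing without erasing can be illustrated with a small in-memory model. This sketch does not use the driver API: it assumes typical NOR flash behavior, where an erase sets every bit in a sector to 1 and programming can only clear bits. All mock_* names, the two-sector flash size, and the single-sector restriction in mock_flash_update() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MOCK_SECTOR_SIZE 4096u
#define MOCK_FLASH_SIZE  (2u * MOCK_SECTOR_SIZE)

static uint8_t mock_flash[MOCK_FLASH_SIZE];

/* Erasing sets every bit in the sector containing `address` to 1. */
void mock_flash_erase_sector(unsigned address)
{
    unsigned start = address - (address % MOCK_SECTOR_SIZE);
    assert(start + MOCK_SECTOR_SIZE <= MOCK_FLASH_SIZE);
    memset(&mock_flash[start], 0xFF, MOCK_SECTOR_SIZE);
}

/* Programming can only clear bits (1 -> 0); it never sets them. */
void mock_flash_program(const uint8_t *data, unsigned address, size_t len)
{
    assert(address + len <= MOCK_FLASH_SIZE);
    for (size_t i = 0; i < len; i++) {
        mock_flash[address + i] &= data[i];
    }
}

void mock_flash_read(uint8_t *data, unsigned address, size_t len)
{
    assert(address + len <= MOCK_FLASH_SIZE);
    memcpy(data, &mock_flash[address], len);
}

/* Read-modify-erase-write of a range that lies within a single sector. */
void mock_flash_update(const uint8_t *data, unsigned address, size_t len)
{
    static uint8_t sector[MOCK_SECTOR_SIZE];
    unsigned start = address - (address % MOCK_SECTOR_SIZE);
    assert(address - start + len <= MOCK_SECTOR_SIZE);

    mock_flash_read(sector, start, MOCK_SECTOR_SIZE);    /* read   */
    memcpy(&sector[address - start], data, len);         /* modify */
    mock_flash_erase_sector(start);                      /* erase  */
    mock_flash_program(sector, start, MOCK_SECTOR_SIZE); /* write  */
}
```

With the real driver, a sequence like mock_flash_update() would be bracketed by rtos_qspi_flash_lock() and rtos_qspi_flash_unlock() so that no other thread can issue a command in the middle of it.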

inline void rtos_qspi_flash_erase(rtos_qspi_flash_t *ctx, unsigned address, size_t len)

This erases data from the QSPI flash. If the address range to erase spans multiple sectors, then all of these sectors will be erased by issuing multiple erase commands.

The driver handles sending the write enable command, as well as waiting for the erase to complete.

This function may return before the erase operation is complete, as the actual erase operation is queued and executed by a thread created by the driver.

Note

The smallest amount of data that can be erased is a 4k sector. This means that data outside the address range specified by address and len will be erased if the address range does not both begin and end at 4k sector boundaries.

Parameters:
  • ctx – A pointer to the QSPI flash driver instance to use.

  • address – The byte address to begin erasing. This does not need to begin at a sector boundary, but if it does not, note that the entire sector that contains this address will still be erased.

  • len – The minimum number of bytes to erase. If address + len - 1 does not correspond to the last address within a sector, note that the entire sector that contains this address will still be erased.
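
The range that is actually erased for a given address and len can be computed as sketched below. This assumes a fixed 4 KiB sector size, as described in the note above; a real application might query rtos_qspi_flash_sector_size_get() instead. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define EXAMPLE_SECTOR_SIZE 4096u

typedef struct {
    unsigned first;   /* first byte address erased */
    unsigned last;    /* last byte address erased (inclusive) */
    unsigned sectors; /* number of sectors erased */
} example_erase_span_t;

/* Expands [address, address + len) outward to 4 KiB sector boundaries. */
example_erase_span_t example_erase_span(unsigned address, size_t len)
{
    example_erase_span_t span;
    assert(len > 0);

    unsigned last_requested = address + (unsigned) len - 1u;

    /* Round the start down and the end up to sector boundaries. */
    span.first = address - (address % EXAMPLE_SECTOR_SIZE);
    span.last = last_requested - (last_requested % EXAMPLE_SECTOR_SIZE)
                + (EXAMPLE_SECTOR_SIZE - 1u);
    span.sectors = (span.last - span.first + 1u) / EXAMPLE_SECTOR_SIZE;
    return span;
}
```

For example, erasing 0x1000 bytes starting at 0x1800 touches two sectors and erases 0x1000 through 0x2FFF, so any data in 0x1000..0x17FF and 0x2800..0x2FFF that must survive has to be read out first and written back afterwards.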

inline size_t rtos_qspi_flash_size_get(rtos_qspi_flash_t *qspi_flash_ctx)

This gets the size in bytes of the flash chip.

Parameters:
  • qspi_flash_ctx – A pointer to the QSPI flash driver instance to query.

Returns:

the size in bytes of the flash chip.

inline size_t rtos_qspi_flash_page_size_get(rtos_qspi_flash_t *qspi_flash_ctx)

This gets the size in bytes of each page in the flash chip.

Parameters:
  • qspi_flash_ctx – A pointer to the QSPI flash driver instance to query.

Returns:

the size in bytes of the flash page.

inline size_t rtos_qspi_flash_page_count_get(rtos_qspi_flash_t *qspi_flash_ctx)

This gets the number of pages in the flash chip.

Parameters:
  • qspi_flash_ctx – A pointer to the QSPI flash driver instance to query.

Returns:

the number of pages in the flash chip.

inline size_t rtos_qspi_flash_sector_size_get(rtos_qspi_flash_t *qspi_flash_ctx)

This gets the sector size in bytes of the flash chip.

Parameters:
  • qspi_flash_ctx – A pointer to the QSPI flash driver instance to query.

Returns:

the size in bytes of the smallest sector.

inline unsigned rtos_qspi_flash_calibration_valid_get(rtos_qspi_flash_t *qspi_flash_ctx)

Gets the calibration valid flag, which indicates whether fast read calibration succeeded.

Parameters:
  • qspi_flash_ctx – A pointer to the QSPI flash driver instance to query.

Returns:

1 if calibration was successful, 0 otherwise.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$RPC Initialization API£££modules/rtos/doc/programming_guide/reference/rtos_drivers/qspi_flash.html#rpc-initialization-api

The following functions may be used to share a QSPI flash driver instance with other xcore tiles. Tiles that the driver instance is shared with may call any of the core functions listed above.

void rtos_qspi_flash_rpc_client_init(rtos_qspi_flash_t *qspi_flash_ctx, rtos_driver_rpc_t *rpc_config, rtos_intertile_t *host_intertile_ctx)

Initializes an RTOS QSPI flash driver instance on a client tile. This allows a tile that does not own the actual driver instance to use a driver instance on another tile. This will be called instead of rtos_qspi_flash_init(). The host tile that owns the actual instance must simultaneously call rtos_qspi_flash_rpc_host_init().

Parameters:
  • qspi_flash_ctx – A pointer to the QSPI flash driver instance to initialize.

  • rpc_config – A pointer to an RPC config struct. This must have the same scope as qspi_flash_ctx.

  • host_intertile_ctx – A pointer to the intertile driver instance to use for performing the communication between the client and host tiles. This must have the same scope as qspi_flash_ctx.

void rtos_qspi_flash_rpc_host_init(rtos_qspi_flash_t *qspi_flash_ctx, rtos_driver_rpc_t *rpc_config, rtos_intertile_t *client_intertile_ctx[], size_t remote_client_count)

Performs additional initialization on a QSPI flash driver instance to allow client tiles to use the QSPI flash driver instance. Each client tile that will use this instance must simultaneously call rtos_qspi_flash_rpc_client_init().

Parameters:
  • qspi_flash_ctx – A pointer to the QSPI flash driver instance to share with clients.

  • rpc_config – A pointer to an RPC config struct. This must have the same scope as qspi_flash_ctx.

  • client_intertile_ctx – An array of pointers to the intertile driver instances to use for performing the communication between the host tile and each client tile. This must have the same scope as qspi_flash_ctx.

  • remote_client_count – The number of client tiles to share this driver instance with.

void rtos_qspi_flash_rpc_config(rtos_qspi_flash_t *qspi_flash_ctx, unsigned intertile_port, unsigned host_task_priority)

Configures the RPC for a QSPI flash driver instance. This must be called by both the host tile and all client tiles.

On the client tiles this must be called after calling rtos_qspi_flash_rpc_client_init(). After calling this, the client tile may immediately begin to call the core QSPI flash functions on this driver instance. It does not need to wait for the host to call rtos_qspi_flash_start().

On the host tile this must be called both after calling rtos_qspi_flash_rpc_host_init() and before calling rtos_qspi_flash_start().

Parameters:
  • qspi_flash_ctx – A pointer to the QSPI flash driver instance to configure the RPC for.

  • intertile_port – The port number on the intertile channel to use for transferring the RPC requests and responses for this driver instance. This port must not be shared by any other functions. The port must be the same for the host and all its clients.

  • host_task_priority – The priority to use for the task on the host tile that handles RPC requests from the clients.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$SPI RTOS Driver£££modules/rtos/doc/programming_guide/reference/rtos_drivers/spi/spi.html#spi-rtos-driver

This driver can be used to instantiate and control a SPI master or slave mode I/O interface on xcore in an RTOS application.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$SPI Master RTOS Driver£££modules/rtos/doc/programming_guide/reference/rtos_drivers/spi/spi_master.html#spi-master-rtos-driver

This driver can be used to instantiate and control a SPI master I/O interface on xcore in an RTOS application.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$SPI Master Initialization API£££modules/rtos/doc/programming_guide/reference/rtos_drivers/spi/spi_master.html#spi-master-initialization-api

The following structures and functions are used to initialize and start a SPI master driver instance.

typedef struct rtos_spi_master_struct rtos_spi_master_t

Typedef to the RTOS SPI master driver instance struct.

typedef struct rtos_spi_master_device_struct rtos_spi_master_device_t

Typedef to the RTOS SPI device instance struct.

void rtos_spi_master_start(rtos_spi_master_t *spi_master_ctx, unsigned priority)

Starts an RTOS SPI master driver instance. This must only be called by the tile that owns the driver instance. It may be called either before or after starting the RTOS, but must be called before any of the core SPI master driver functions are called with this instance.

rtos_spi_master_init() must be called on this SPI master driver instance prior to calling this.

Parameters:
  • spi_master_ctx – A pointer to the SPI master driver instance to start.

  • priority – The priority of the task that gets created by the driver to handle the SPI master interface.

void rtos_spi_master_init(rtos_spi_master_t *bus_ctx, xclock_t clock_block, port_t cs_port, port_t sclk_port, port_t mosi_port, port_t miso_port)

Initializes an RTOS SPI master driver instance. This must only be called by the tile that owns the driver instance. It may be called either before or after starting the RTOS, but must be called before calling rtos_spi_master_start() or any of the core SPI master driver functions with this instance.

Parameters:
  • bus_ctx – A pointer to the SPI master driver instance to initialize.

  • clock_block – The clock block to use for the SPI master interface.

  • cs_port – The SPI interface’s chip select port. This may be a multi-bit port.

  • sclk_port – The SPI interface’s SCLK port. Must be a 1-bit port.

  • mosi_port – The SPI interface’s MOSI port. Must be a 1-bit port.

  • miso_port – The SPI interface’s MISO port. Must be a 1-bit port.

void rtos_spi_master_device_init(rtos_spi_master_device_t *dev_ctx, rtos_spi_master_t *bus_ctx, uint32_t cs_pin, int cpol, int cpha, spi_master_source_clock_t source_clock, uint32_t clock_divisor, spi_master_sample_delay_t miso_sample_delay, uint32_t miso_pad_delay, uint32_t cs_to_clk_delay_ticks, uint32_t clk_to_cs_delay_ticks, uint32_t cs_to_cs_delay_ticks)

Initialize a SPI device. Multiple SPI devices may be initialized per RTOS SPI master driver instance. Each must be on a unique pin of the interface’s chip select port. This must only be called by the tile that owns the driver instance. It may be called either before or after starting the RTOS, but must be called before calling rtos_spi_master_start() or any of the core SPI master driver functions with this instance.

Parameters:
  • dev_ctx – A pointer to the SPI device instance to initialize.

  • bus_ctx – A pointer to the SPI master driver instance to attach the device to.

  • cs_pin – The bit number of the chip select port that is connected to the device’s chip select pin.

  • cpol – The clock polarity required by the device.

  • cpha – The clock phase required by the device.

  • source_clock – The source clock to derive SCLK from. See spi_master_source_clock_t.

  • clock_divisor – The value to divide the source clock by. The frequency of SCLK will be set to:

    • (F_src) / (4 * clock_divisor) when clock_divisor > 0

    • (F_src) / (2) when clock_divisor = 0

    Where F_src is the frequency of the source clock.

  • miso_sample_delay – When to sample MISO. See spi_master_sample_delay_t.

  • miso_pad_delay – The number of core clock cycles to delay sampling the MISO pad during a transaction. This allows for more fine grained adjustment of sampling time. The value may be between 0 and 5.

  • cs_to_clk_delay_ticks – The minimum number of reference clock ticks between assertion of chip select and the first clock edge.

  • clk_to_cs_delay_ticks – The minimum number of reference clock ticks between the last clock edge and de-assertion of chip select.

  • cs_to_cs_delay_ticks – The minimum number of reference clock ticks between transactions, that is, between de-assertion of chip select at the end of one transaction and its re-assertion at the beginning of the next.
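
The clock_divisor rule above can be sketched as a pair of small helpers. The names sclk_hz and divisor_for_max_sclk are hypothetical and are not part of the driver API. The 100 MHz source frequency mentioned afterwards is only an assumed value, and no upper limit on clock_divisor is checked here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: SCLK frequency in Hz for a given source clock
 * frequency and clock_divisor, following the two cases listed above. */
static uint32_t sclk_hz(uint32_t f_src_hz, uint32_t clock_divisor)
{
    if (clock_divisor == 0) {
        return f_src_hz / 2u;
    }
    return f_src_hz / (4u * clock_divisor);
}

/* Hypothetical helper: smallest clock_divisor for which SCLK does not
 * exceed max_sclk_hz. */
static uint32_t divisor_for_max_sclk(uint32_t f_src_hz, uint32_t max_sclk_hz)
{
    assert(max_sclk_hz > 0);
    if (f_src_hz / 2u <= max_sclk_hz) {
        return 0; /* The undivided case is already slow enough. */
    }
    /* Ceiling of f_src / (4 * max), computed in 64 bits to avoid overflow. */
    uint64_t denom = 4ull * max_sclk_hz;
    return (uint32_t)((f_src_hz + denom - 1u) / denom);
}
```

With an assumed 100 MHz source clock, a device limited to 10 MHz would need a clock_divisor of 3 under this rule, giving an SCLK of roughly 8.33 MHz.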

struct rtos_spi_master_struct
#include <rtos_spi_master.h>

Struct representing an RTOS SPI master driver instance.

The members in this struct should not be accessed directly.

struct rtos_spi_master_device_struct
#include <rtos_spi_master.h>

Struct representing an RTOS SPI device instance.

The members in this struct should not be accessed directly.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$SPI Master Core API£££modules/rtos/doc/programming_guide/reference/rtos_drivers/spi/spi_master.html#spi-master-core-api

The following functions are the core SPI master driver functions that are used after it has been initialized and started.

inline void rtos_spi_master_transaction_start(rtos_spi_master_device_t *ctx)

Starts a transaction with the specified SPI device on a SPI bus. This leaves chip select asserted.

Note: When this is called, the servicer thread will be locked to the core that it executed on until rtos_spi_master_transaction_end() is called. This is because the underlying I/O software utilizes fast mode and high priority.

Parameters:
  • ctx – A pointer to the SPI device instance.

inline void rtos_spi_master_transfer(rtos_spi_master_device_t *ctx, uint8_t *data_out, uint8_t *data_in, size_t len)

Transfers data to and from the specified SPI device on a SPI bus. The transaction must already have been started by calling rtos_spi_master_transaction_start() on the same device instance. This may be called multiple times during a single transaction.

This function may return before the transfer is complete when data_in is NULL, as the actual transfer operation is queued and executed by a thread created by the driver.

Parameters:
  • ctx – A pointer to the SPI device instance.

  • data_out – Pointer to the data to transfer to the device. This may be NULL if there is no data to send.

  • data_in – Pointer to the buffer to save the received data to. This may be NULL if the received data is not needed.

  • len – The number of bytes to transfer in each direction. This number of bytes must be available in both the data_out and data_in buffers if they are not NULL.

inline void rtos_spi_master_delay_before_next_transfer(rtos_spi_master_device_t *ctx, uint32_t delay_ticks)

If there is a minimum amount of idle time that is required by the device between transfers within a single transaction, then this may be called between each transfer where a delay is required.

This function will return immediately. If the call for the next transfer happens before the minimum time specified has elapsed, the delay will occur then before the transfer begins.

Note

This must be called during a transaction, otherwise the behavior is unspecified.

Note

Technically the next transfer will occur no earlier than delay_ticks after this function is called, so this should be called immediately following a transfer, rather than immediately before the next.

Parameters:
  • ctx – A pointer to the SPI device instance.

  • delay_ticks – The number of reference clock ticks to delay.

inline void rtos_spi_master_transaction_end(rtos_spi_master_device_t *ctx)

Ends a transaction with the specified SPI device on a SPI bus. This leaves chip select de-asserted.

Parameters:
  • ctx – A pointer to the SPI device instance.
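
The call order for a single transaction might look like the sketch below, which writes a one-byte command, waits, and then reads a reply while chip select stays asserted. The helper read_reply, the delay of 100 ticks, and the logging stand-ins are all assumptions for illustration. The stand-in section exists only so the sketch builds off-target; on xcore it would be replaced by including rtos_spi_master.h, which provides the real types and functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* --- Stand-ins (NOT the real driver): they only record the call order. --- */
typedef struct rtos_spi_master_device_struct {
    char log[16];
    size_t n;
} rtos_spi_master_device_t;

static void log_call(rtos_spi_master_device_t *ctx, char c)
{
    if (ctx->n < sizeof(ctx->log) - 1) {
        ctx->log[ctx->n++] = c;
    }
}

static void rtos_spi_master_transaction_start(rtos_spi_master_device_t *ctx)
{
    log_call(ctx, 'S');
}

static void rtos_spi_master_transfer(rtos_spi_master_device_t *ctx,
                                     uint8_t *data_out, uint8_t *data_in,
                                     size_t len)
{
    (void)data_out;
    if (data_in != NULL) {
        memset(data_in, 0, len); /* Stand-in: pretend the device sent zeros. */
    }
    log_call(ctx, 'T');
}

static void rtos_spi_master_delay_before_next_transfer(
        rtos_spi_master_device_t *ctx, uint32_t delay_ticks)
{
    (void)delay_ticks;
    log_call(ctx, 'D');
}

static void rtos_spi_master_transaction_end(rtos_spi_master_device_t *ctx)
{
    log_call(ctx, 'E');
}
/* --- End of stand-ins. --- */

/* Hypothetical application helper: command write followed by a reply read. */
static void read_reply(rtos_spi_master_device_t *dev, uint8_t cmd,
                       uint8_t *reply, size_t reply_len)
{
    rtos_spi_master_transaction_start(dev);           /* CS asserted */
    rtos_spi_master_transfer(dev, &cmd, NULL, 1);     /* write only */
    /* Called right after the transfer, as the note above recommends. */
    rtos_spi_master_delay_before_next_transfer(dev, 100);
    rtos_spi_master_transfer(dev, NULL, reply, reply_len); /* read only */
    rtos_spi_master_transaction_end(dev);             /* CS de-asserted */
}
```

Because a write-only transfer may return before it completes, the sketch keeps cmd in scope until the later read, which supplies a non-NULL data_in, has returned.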

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$SPI Master RPC Initialization API£££modules/rtos/doc/programming_guide/reference/rtos_drivers/spi/spi_master.html#spi-master-rpc-initialization-api

The following functions may be used to share a SPI master driver instance with other xcore tiles. Tiles that the driver instance is shared with may call any of the core functions listed above.

void rtos_spi_master_rpc_client_init(rtos_spi_master_t *spi_master_ctx, rtos_spi_master_device_t *spi_device_ctx[], size_t spi_device_count, rtos_driver_rpc_t *rpc_config, rtos_intertile_t *host_intertile_ctx)

Initializes an RTOS SPI master driver instance on a client tile, as well as any number of SPI device instances. This allows a tile that does not own the actual driver instance to use a driver instance on another tile. This will be called instead of rtos_spi_master_init() and rtos_spi_master_device_init(). The host tile that owns the actual instances must simultaneously call rtos_spi_master_rpc_host_init().

Parameters:
  • spi_master_ctx – A pointer to the SPI master driver instance to initialize.

  • spi_device_ctx – An array of pointers to SPI device instances to initialize.

  • spi_device_count – The number of SPI device instances to initialize.

  • rpc_config – A pointer to an RPC config struct. This must have the same scope as spi_master_ctx.

  • host_intertile_ctx – A pointer to the intertile driver instance to use for performing the communication between the client and host tiles. This must have the same scope as spi_master_ctx.

void rtos_spi_master_rpc_host_init(rtos_spi_master_t *spi_master_ctx, rtos_spi_master_device_t *spi_device_ctx[], size_t spi_device_count, rtos_driver_rpc_t *rpc_config, rtos_intertile_t *client_intertile_ctx[], size_t remote_client_count)

Performs additional initialization on a SPI master driver instance to allow client tiles to use the SPI master driver instance. Each client tile that will use this instance must simultaneously call rtos_spi_master_rpc_client_init().

Parameters:
  • spi_master_ctx – A pointer to the SPI master driver instance to share with clients.

  • spi_device_ctx – An array of pointers to SPI device instances to share with clients.

  • spi_device_count – The number of SPI device instances to share.

  • rpc_config – A pointer to an RPC config struct. This must have the same scope as spi_master_ctx.

  • client_intertile_ctx – An array of pointers to the intertile driver instances to use for performing the communication between the host tile and each client tile. This must have the same scope as spi_master_ctx.

  • remote_client_count – The number of client tiles to share this driver instance with.

void rtos_spi_master_rpc_config(rtos_spi_master_t *spi_master_ctx, unsigned intertile_port, unsigned host_task_priority)

Configures the RPC for a SPI master driver instance. This must be called by both the host tile and all client tiles.

On the client tiles this must be called after calling rtos_spi_master_rpc_client_init(). After calling this, the client tile may immediately begin to call the core SPI master functions on this driver instance. It does not need to wait for the host to call rtos_spi_master_start().

On the host tile this must be called both after calling rtos_spi_master_rpc_host_init() and before calling rtos_spi_master_start().

Parameters:
  • spi_master_ctx – A pointer to the SPI master driver instance to configure the RPC for.

  • intertile_port – The port number on the intertile channel to use for transferring the RPC requests and responses for this driver instance. This port must not be shared by any other functions. The port must be the same for the host and all its clients.

  • host_task_priority – The priority to use for the task on the host tile that handles RPC requests from the clients.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$SPI Slave RTOS Driver£££modules/rtos/doc/programming_guide/reference/rtos_drivers/spi/spi_slave.html#spi-slave-rtos-driver

This driver can be used to instantiate and control a SPI slave I/O interface on xcore in an RTOS application.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$SPI Slave API£££modules/rtos/doc/programming_guide/reference/rtos_drivers/spi/spi_slave.html#spi-slave-api

The following structures and functions are used to initialize and start a SPI slave driver instance.

typedef struct rtos_spi_slave_struct rtos_spi_slave_t

Typedef to the RTOS SPI slave driver instance struct.

typedef void (*rtos_spi_slave_start_cb_t)(rtos_spi_slave_t *ctx, void *app_data)

Function pointer type for application provided RTOS SPI slave start callback functions.

These callback functions are optionally called by a SPI slave driver’s thread when it is first started. This gives the application a chance to perform startup initialization from within the driver’s thread. It is a good place for the first call to spi_slave_xfer_prepare().

Param ctx:

A pointer to the associated SPI slave driver instance.

Param app_data:

A pointer to application specific data provided by the application. Used to share data between this callback function and the application.

typedef void (*rtos_spi_slave_xfer_done_cb_t)(rtos_spi_slave_t *ctx, void *app_data)

Function pointer type for application provided RTOS SPI slave transfer done callback functions.

These callback functions are optionally called when a SPI slave driver instance is done transferring data with a master device.

An application can use this to be notified immediately when a transfer has completed. It can then call spi_slave_xfer_complete() with a timeout of 0 from within this callback to get the transfer results.

Param ctx:

A pointer to the associated SPI slave driver instance.

Param app_data:

A pointer to application specific data provided by the application. Used to share data between this callback function and the application.

typedef struct xfer_done_queue_item xfer_done_queue_item_t

Internally used struct representing a received data packet.

The members in this struct should not be accessed directly.

void spi_slave_xfer_prepare(rtos_spi_slave_t *ctx, void *rx_buf, size_t rx_buf_len, void *tx_buf, size_t tx_buf_len)

Prepares an RTOS SPI slave driver instance with buffers for subsequent transfers. Before this is called for the first time, any transfers initiated by a master device will result in all received data over MOSI being dropped, and all data sent over MISO being zeros.

This only needs to be called when the buffers need to be changed. If all transfers will use the same buffers, then this only needs to be called once during initialization.

If the application has not processed the previous transaction, the buffers will be held, and default buffers set by spi_slave_xfer_prepare_default_buffers() will be used if a new transaction starts.

Parameters:
  • ctx – A pointer to the SPI slave driver instance to use.

  • rx_buf – The buffer to receive data into for any subsequent transfers.

  • rx_buf_len – The length in bytes of rx_buf. If the master transfers more than this during a single transfer, then the bytes that do not fit within rx_buf will be lost.

  • tx_buf – The buffer to send data from for any subsequent transfers.

  • tx_buf_len – The length in bytes of tx_buf. If the master transfers more than this during a single transfer, zeros will be sent following the last byte of tx_buf.
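
The buffer length rules above can be modeled with a short sketch. This is a hypothetical model of the documented behavior, not driver code: the type xfer_outcome_t and the function model_xfer are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one transfer's outcome, per the rules above. */
typedef struct {
    size_t rx_stored;   /* bytes written to rx_buf */
    size_t rx_lost;     /* received bytes that did not fit in rx_buf */
    size_t tx_from_buf; /* bytes sent from tx_buf */
    size_t tx_zeros;    /* zero bytes sent after tx_buf was exhausted */
} xfer_outcome_t;

static size_t min_size(size_t a, size_t b)
{
    return a < b ? a : b;
}

/* master_len is the number of bytes the master clocks in one transfer. */
static xfer_outcome_t model_xfer(size_t master_len, size_t rx_buf_len,
                                 size_t tx_buf_len)
{
    xfer_outcome_t o;
    o.rx_stored   = min_size(master_len, rx_buf_len);
    o.rx_lost     = master_len - o.rx_stored;
    o.tx_from_buf = min_size(master_len, tx_buf_len);
    o.tx_zeros    = master_len - o.tx_from_buf;
    assert(o.rx_stored + o.rx_lost == master_len);
    return o;
}
```

For a 10-byte transfer with an 8-byte rx_buf and a 4-byte tx_buf, this model gives 8 bytes stored, 2 bytes lost, 4 bytes sent from tx_buf, and 6 zero bytes sent.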

void spi_slave_xfer_prepare_default_buffers(rtos_spi_slave_t *ctx, void *rx_buf, size_t rx_buf_len, void *tx_buf, size_t tx_buf_len)

Prepares an RTOS SPI slave driver instance with default buffers for subsequent transfers. Before this is called for the first time, any transfers initiated by a master device will result in all received data over MOSI being dropped, and all data sent over MISO being zeros.

This only needs to be called when the buffers need to be changed.

The default buffer will be used in the event that the application has not yet processed the previous transfer. This enables the application to have a default buffer to implement a sort of NACK over SPI in the event that the device was busy and had not yet finished handling the previous transaction before a new one started.

Parameters:
  • ctx – A pointer to the SPI slave driver instance to use.

  • rx_buf – The buffer to receive data into for any subsequent transfers.

  • rx_buf_len – The length in bytes of rx_buf. If the master transfers more than this during a single transfer, then the bytes that do not fit within rx_buf will be lost.

  • tx_buf – The buffer to send data from for any subsequent transfers.

  • tx_buf_len – The length in bytes of tx_buf. If the master transfers more than this during a single transfer, zeros will be sent following the last byte of tx_buf.

int spi_slave_xfer_complete(rtos_spi_slave_t *ctx, void **rx_buf, size_t *rx_len, void **tx_buf, size_t *tx_len, unsigned timeout)

Waits for a SPI transfer to complete. Returns either when the timeout is reached, or when a transfer completes, whichever comes first. If a transfer does complete, then the buffers and the number of bytes read from or written to them are returned via the parameters.

Note

The duration of this callback will affect the minimum duration between SPI transactions.

Parameters:
  • ctx – A pointer to the SPI slave driver instance to use.

  • rx_buf – The receive buffer used for the completed transfer. This is set by the function upon completion of a transfer.

  • rx_len – The number of bytes written to rx_buf. This is set by the function upon completion of a transfer.

  • tx_buf – The transmit buffer used for the completed transfer. This is set by the function upon completion of a transfer.

  • tx_len – The number of bytes sent from tx_buf. This is set by the function upon completion of a transfer.

  • timeout – The number of RTOS ticks to wait for the next transfer to complete. When called from within the “xfer_done” callback, this should be 0.

Return values:
  • 0 – if a transfer completed. All buffers and lengths are set in this case.

  • -1 – if no transfer completed before the timeout expired. No buffers or lengths are returned in this case.

void spi_slave_default_buf_xfer_ended_enable(rtos_spi_slave_t *ctx)

Sets the driver to use callbacks for all default transactions. This will result in transfers done with the default buffer generating xfer_done callbacks to the application. Default buffer transaction items must then be processed with spi_slave_xfer_complete().

Note

This is the default setting.

Parameters:
  • ctx – A pointer to the SPI slave driver instance to use.

void spi_slave_default_buf_xfer_ended_disable(rtos_spi_slave_t *ctx)

Sets the driver to drop all default transactions. Transfers done with the default buffer will then not generate xfer_done callbacks to the application, and default buffer transaction items will no longer need to be processed with spi_slave_xfer_complete().

Parameters:
  • ctx – A pointer to the SPI slave driver instance to use.

void rtos_spi_slave_start(rtos_spi_slave_t *spi_slave_ctx, void *app_data, rtos_spi_slave_start_cb_t start, rtos_spi_slave_xfer_done_cb_t xfer_done, unsigned interrupt_core_id, unsigned priority)

Starts an RTOS SPI slave driver instance. This must only be called by the tile that owns the driver instance. It must be called after starting the RTOS from an RTOS thread.

rtos_spi_slave_init() must be called on this SPI slave driver instance prior to calling this.

Parameters:
  • spi_slave_ctx – A pointer to the SPI slave driver instance to start.

  • app_data – A pointer to application specific data to pass to the callback functions.

  • start – The callback function that is called when the driver’s thread starts. This is optional and may be NULL.

  • xfer_done – The callback function that is notified when transfers are complete. This is optional and may be NULL.

  • interrupt_core_id – The ID of the core on which to enable the SPI interrupt. This core should not be shared with threads that disable interrupts for long periods of time, nor enable other interrupts.

  • priority – The priority of the task that gets created by the driver to call the callback functions. If both callback functions are NULL, then this is unused.

void rtos_spi_slave_init(rtos_spi_slave_t *spi_slave_ctx, uint32_t io_core_mask, xclock_t clock_block, int cpol, int cpha, port_t p_sclk, port_t p_mosi, port_t p_miso, port_t p_cs)

Initializes an RTOS SPI slave driver instance. This must only be called by the tile that owns the driver instance. It should be called before starting the RTOS, and must be called before calling rtos_spi_slave_start().

For timing parameters and maximum clock rate, refer to the underlying HIL IO API.

Parameters:
  • spi_slave_ctx – A pointer to the SPI slave driver instance to initialize.

  • io_core_mask – A bitmask representing the cores on which the low level SPI I/O thread created by the driver is allowed to run. Bit 0 is core 0, bit 1 is core 1, etc.

  • clock_block – The clock block to use for the SPI slave.

  • cpol – The clock polarity to use.

  • cpha – The clock phase to use.

  • p_sclk – The SPI slave’s SCLK port. Must be a 1-bit port.

  • p_mosi – The SPI slave’s MOSI port. Must be a 1-bit port.

  • p_miso – The SPI slave’s MISO port. Must be a 1-bit port.

  • p_cs – The SPI slave’s CS port. Must be a 1-bit port.
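
As a small illustration of the io_core_mask encoding described above, the following sketch builds a mask from a list of core IDs. The helper name core_mask is hypothetical and is not part of the driver API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: builds a core bitmask from a list of core IDs,
 * where bit 0 is core 0, bit 1 is core 1, and so on. */
static uint32_t core_mask(const unsigned *core_ids, unsigned count)
{
    uint32_t mask = 0;
    for (unsigned i = 0; i < count; i++) {
        assert(core_ids[i] < 32u); /* Must fit in a 32-bit mask. */
        mask |= (uint32_t)1 << core_ids[i];
    }
    return mask;
}
```

Allowing the I/O thread to run on cores 1 and 2 only would correspond to a mask of 0x6, which could then be passed as io_core_mask.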

RTOS_SPI_SLAVE_CALLBACK_ATTR

This attribute must be specified on all RTOS SPI slave callback functions provided by the application.

HIL_IO_SPI_SLAVE_HIGH_PRIO

Set SPI Slave thread to high priority.

HIL_IO_SPI_SLAVE_FAST_MODE

Set SPI Slave thread to run in fast mode.

struct xfer_done_queue_item
#include <rtos_spi_slave.h>

Internally used struct representing a received data packet.

The members in this struct should not be accessed directly.

struct rtos_spi_slave_struct
#include <rtos_spi_slave.h>

Struct representing an RTOS SPI slave driver instance.

The members in this struct should not be accessed directly.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$UART RTOS Driver£££modules/rtos/doc/programming_guide/reference/rtos_drivers/uart/uart.html#uart-rtos-driver

This driver can be used to instantiate and control a UART Rx or UART Tx I/O interface on xcore in an RTOS application.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$UART Tx RTOS Driver£££modules/rtos/doc/programming_guide/reference/rtos_drivers/uart/uart_tx.html#uart-tx-rtos-driver

This driver can be used to instantiate and control a UART Tx I/O interface on xcore in an RTOS application.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$UART Tx API£££modules/rtos/doc/programming_guide/reference/rtos_drivers/uart/uart_tx.html#uart-tx-api

The following structures and functions are used to initialize and start a UART Tx driver instance.

typedef struct rtos_uart_tx_struct rtos_uart_tx_t

Typedef to the RTOS UART tx driver instance struct.

inline void rtos_uart_tx_write(rtos_uart_tx_t *ctx, const uint8_t buf[], size_t n)

Writes data to an initialized and started UART instance. Unlike the UART rx, an xcore logical core is not reserved. The UART transmission is a function call, and the function blocks until the stop bit of the last byte to be transmitted has completed. Interrupts are masked during this time to avoid stretching of the waveform. Consequently, the tx consumes cycles from the caller thread.

Parameters:
  • ctx – A pointer to the UART Tx driver instance to use.

  • buf – The buffer containing data to write.

  • n – The number of bytes to write.
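
Because the write blocks the calling thread with interrupts masked, it can be useful to estimate how long a call will take. The sketch below is a rough estimate based on standard UART framing (one start bit, the data bits, an optional parity bit, and the stop bits). The helper name uart_tx_block_us is hypothetical, and any additional driver overhead is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: approximate time in microseconds needed to shift out
 * n bytes, assuming back-to-back frames and no driver overhead. */
static uint64_t uart_tx_block_us(size_t n, uint32_t baud_rate,
                                 unsigned num_data_bits, int has_parity,
                                 unsigned stop_bits)
{
    assert(baud_rate > 0);
    /* One start bit, the data bits, an optional parity bit, the stop bits. */
    unsigned bits_per_frame =
        1u + num_data_bits + (has_parity ? 1u : 0u) + stop_bits;
    return ((uint64_t)n * bits_per_frame * 1000000u) / baud_rate;
}
```

Under these assumptions, writing 16 bytes at 115200 baud with 8 data bits, no parity, and 1 stop bit would block the caller for roughly 1.4 ms.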

void rtos_uart_tx_init(rtos_uart_tx_t *ctx, const port_t tx_port, const uint32_t baud_rate, const uint8_t num_data_bits, const uart_parity_t parity, const uint8_t stop_bits, hwtimer_t tmr)

Initialises an RTOS UART tx driver instance. This must only be called by the tile that owns the driver instance. It may be called either before or after starting the RTOS, but must be called before calling rtos_uart_tx_start() or any of the core UART tx driver functions with this instance.

Parameters:
  • ctx – A pointer to the UART tx driver instance to initialise.

  • tx_port – The port containing the transmit pin.

  • baud_rate – The baud rate of the UART in bits per second.

  • num_data_bits – The number of data bits per frame sent.

  • parity – The type of parity used. See uart_parity_t above.

  • stop_bits – The number of stop bits asserted at the end of the frame.

  • tmr – The resource id of the timer to be used by the UART tx.

void rtos_uart_tx_start(rtos_uart_tx_t *ctx)

Starts an RTOS UART tx driver instance. This must only be called by the tile that owns the driver instance. It may be called either before or after starting the RTOS, but must be called before any of the core UART tx driver functions are called with this instance.

rtos_uart_tx_init() must be called on this UART tx driver instance prior to calling this.

Parameters:
  • ctx – A pointer to the UART tx driver instance to start.

struct rtos_uart_tx_struct
#include <rtos_uart_tx.h>

Struct representing an RTOS UART tx driver instance.

The members in this struct should not be accessed directly.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$UART Tx RPC Initialization API£££modules/rtos/doc/programming_guide/reference/rtos_drivers/uart/uart_tx.html#uart-tx-rpc-initialization-api

The following functions may be used to share a UART Tx driver instance with other xCORE tiles. Tiles that the driver instance is shared with may call any of the core functions listed above.

void rtos_uart_tx_rpc_client_init(rtos_uart_tx_t *uart_tx_ctx, rtos_driver_rpc_t *rpc_config, rtos_intertile_t *host_intertile_ctx)

Initializes an RTOS UART tx driver instance on a client tile. This allows a tile that does not own the actual driver instance to use a driver instance on another tile. This will be called instead of rtos_uart_tx_init(). The host tile that owns the actual instance must simultaneously call rtos_uart_tx_rpc_host_init().

Parameters:
  • uart_tx_ctx – A pointer to the UART tx driver instance to initialize.

  • rpc_config – A pointer to an RPC config struct. This must have the same scope as uart_tx_ctx.

  • host_intertile_ctx – A pointer to the intertile driver instance to use for performing the communication between the client and host tiles. This must have the same scope as uart_tx_ctx.

void rtos_uart_tx_rpc_host_init(rtos_uart_tx_t *uart_tx_ctx, rtos_driver_rpc_t *rpc_config, rtos_intertile_t *client_intertile_ctx[], size_t remote_client_count)

Performs additional initialization on a UART tx driver instance to allow client tiles to use the UART tx driver instance. Each client tile that will use this instance must simultaneously call rtos_uart_tx_rpc_client_init().

Parameters:
  • uart_tx_ctx – A pointer to the UART tx driver instance to share with clients.

  • rpc_config – A pointer to an RPC config struct. This must have the same scope as uart_tx_ctx.

  • client_intertile_ctx – An array of pointers to the intertile driver instances to use for performing the communication between the host tile and each client tile. This must have the same scope as uart_tx_ctx.

  • remote_client_count – The number of client tiles to share this driver instance with.

void rtos_uart_tx_rpc_config(rtos_uart_tx_t *uart_tx_ctx, unsigned intertile_port, unsigned host_task_priority)

Configures the RPC for a UART tx driver instance. This must be called by both the host tile and all client tiles.

On the client tiles this must be called after calling rtos_uart_tx_rpc_client_init(). After calling this, the client tile may immediately begin to call the core UART tx functions on this driver instance. It does not need to wait for the host to call rtos_uart_tx_start().

On the host tile this must be called both after calling rtos_uart_tx_rpc_host_init() and before calling rtos_uart_tx_start().

Parameters:
  • uart_tx_ctx – A pointer to the UART tx driver instance to configure the RPC for.

  • intertile_port – The port number on the intertile channel to use for transferring the RPC requests and responses for this driver instance. This port must not be shared by any other functions. The port must be the same for the host and all its clients.

  • host_task_priority – The priority to use for the task on the host tile that handles RPC requests from the clients.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$UART Rx RTOS Driver£££modules/rtos/doc/programming_guide/reference/rtos_drivers/uart/uart_rx.html#uart-rx-rtos-driver

This driver can be used to instantiate and control a UART Rx I/O interface on xcore in an RTOS application.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$UART Rx API£££modules/rtos/doc/programming_guide/reference/rtos_drivers/uart/uart_rx.html#uart-rx-api

The following structures and functions are used to initialize and start a UART Rx driver instance.

typedef struct rtos_uart_rx_struct rtos_uart_rx_t

Typedef to the RTOS UART rx driver instance struct.

typedef void (*rtos_uart_rx_started_cb_t)(rtos_uart_rx_t *ctx)

Function pointer type for application provided RTOS UART rx start callback functions.

This callback function is optionally (may be NULL) called by a UART rx driver’s thread when it is first started. This gives the application a chance to perform startup initialization from within the driver’s thread.

Param ctx:

A pointer to the associated UART rx driver instance.

typedef void (*rtos_uart_rx_complete_cb_t)(rtos_uart_rx_t *ctx)

Function pointer type for application provided RTOS UART rx receive callback function.

This callback function is called when a UART rx driver instance has received data to a specified depth. Use xStreamBufferReceive(rtos_uart_rx_ctx->isr_byte_buffer, … to read the bytes.

Param ctx:

A pointer to the associated UART rx driver instance.

typedef void (*rtos_uart_rx_error_t)(rtos_uart_rx_t *ctx, uint8_t err_flags)

Function pointer type for application provided RTOS UART rx error callback functions.

This callback function is optionally (may be NULL) called when a UART rx driver instance experiences an error in reception. The error types are defined in uart.h of the underlying HIL driver. For the RTOS rx driver they can be any of the following: UART_START_BIT_ERROR, UART_PARITY_ERROR, UART_FRAMING_ERROR, UART_OVERRUN_ERROR.

Param ctx:

A pointer to the associated UART rx driver instance.

Param err_flags:

An 8-bit word containing the error flags set during reception of the last frame. See rtos_uart_rx.h for the bit field definitions.

size_t rtos_uart_rx_read(rtos_uart_rx_t *uart_rx_ctx, uint8_t *buf, size_t n, rtos_osal_tick_t timeout)

Reads data from a UART Rx instance. It will read up to n bytes or timeout, whichever comes first.

Parameters:
  • uart_rx_ctx – A pointer to the UART Rx driver instance to use.

  • buf – The buffer to be written with the read UART bytes.

  • n – The maximum number of bytes to read into buf.

  • timeout – How long in ticks before the read operation should timeout.

Returns:

The number of bytes read.

void rtos_uart_rx_reset_buffer(rtos_uart_rx_t *uart_rx_ctx)

Resets the receive buffer. Clears the contents and sets the number of items to zero.

Parameters:
  • uart_rx_ctx – A pointer to the UART Rx driver instance to use.

void rtos_uart_rx_init(rtos_uart_rx_t *uart_rx_ctx, uint32_t io_core_mask, port_t rx_port, uint32_t baud_rate, uint8_t data_bits, uart_parity_t parity, uint8_t stop_bits, hwtimer_t tmr)

Initializes an RTOS UART rx driver instance. This must only be called by the tile that owns the driver instance. It should be called before starting the RTOS, and must be called before calling rtos_uart_rx_start(). Note that UART rx requires a whole logical core for the underlying HIL UART Rx instance.

Parameters:
  • uart_rx_ctx – A pointer to the UART rx driver instance to initialize.

  • io_core_mask – A bitmask representing the cores on which the low level UART Rx thread created by the driver is allowed to run. Bit 0 is core 0, bit 1 is core 1, etc.

  • rx_port – The port containing the receive pin

  • baud_rate – The baud rate of the UART in bits per second.

  • data_bits – The number of data bits per frame sent.

  • parity – The type of parity used. See uart_parity_t above.

  • stop_bits – The number of stop bits asserted at the end of the frame.

  • tmr – The resource id of the timer to be used by the UART Rx.
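
The io_core_mask layout described above (bit 0 is core 0, bit 1 is core 1, and so on) can be sketched with a small helper. This is a minimal illustration only: example_core_mask and its parameters are hypothetical names and are not part of the driver API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Builds a core bitmask in the form described for io_core_mask:
 * bit 0 selects core 0, bit 1 selects core 1, and so on.
 * The function name and parameters are illustrative, not driver API. */
uint32_t example_core_mask(const unsigned *core_ids, size_t count)
{
    uint32_t mask = 0;
    for (size_t i = 0; i < count; i++) {
        assert(core_ids[i] < 32); /* a uint32_t mask has 32 bit positions */
        mask |= (uint32_t)1 << core_ids[i];
    }
    return mask;
}
```

For example, allowing the thread to run on cores 1 and 3 gives a mask of 0x0A, which could then be passed as io_core_mask.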

void rtos_uart_rx_start(rtos_uart_rx_t *uart_rx_ctx, void *app_data, rtos_uart_rx_started_cb_t start, rtos_uart_rx_complete_cb_t rx_complete, rtos_uart_rx_error_t error, unsigned interrupt_core_id, unsigned priority, size_t app_rx_buff_size)

Starts an RTOS UART rx driver instance. This must only be called by the tile that owns the driver instance. It must be called after starting the RTOS and from an RTOS thread.

rtos_uart_rx_init() must be called on this UART rx driver instance prior to calling this.

Parameters:
  • uart_rx_ctx – A pointer to the UART rx driver instance to start.

  • app_data – A pointer to application specific data to pass to the callback functions available in rtos_uart_rx_struct.

  • start – The callback function that is called when the driver’s thread starts. This is optional and may be NULL.

  • rx_complete – The callback function to indicate data received by the UART.

  • error – The callback function called when a reception error has occurred.

  • interrupt_core_id – The ID of the core on which to enable the UART rx interrupt.

  • priority – The priority of the task that gets created by the driver to call the callback functions.

  • app_rx_buff_size – The size in bytes of the RTOS xstreambuffer used to buffer received words for the application.

UR_COMPLETE_CB_CODE

The callback code bit positions available for RTOS UART Rx.

UR_STARTED_CB_CODE
UR_START_BIT_ERR_CB_CODE
UR_PARITY_ERR_CB_CODE
UR_FRAMING_ERR_CB_CODE
UR_OVERRUN_ERR_CB_CODE
UR_COMPLETE_CB_FLAG

The callback code flag masks available for RTOS UART Rx.

UR_STARTED_CB_FLAG
UR_START_BIT_ERR_CB_FLAG
UR_PARITY_ERR_CB_FLAG
UR_FRAMING_ERR_CB_FLAG
UR_OVERRUN_ERR_CB_FLAG
RX_ERROR_FLAGS
RX_ALL_FLAGS
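
The relationship between the callback code bit positions and the flag masks can be sketched as follows. This is an illustration under stated assumptions: the EX_ names and the numeric bit positions are hypothetical, and the assumption that each flag is 1 shifted by its code, with the combined masks being the OR of the individual flags, should be checked against rtos_uart_rx.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions, chosen for illustration only.
 * The real values are defined in rtos_uart_rx.h. */
#define EX_COMPLETE_CB_CODE       0
#define EX_STARTED_CB_CODE        1
#define EX_START_BIT_ERR_CB_CODE  2
#define EX_PARITY_ERR_CB_CODE     3
#define EX_FRAMING_ERR_CB_CODE    4
#define EX_OVERRUN_ERR_CB_CODE    5

/* Assumption: each flag mask is a single bit at its code position. */
#define EX_COMPLETE_CB_FLAG       (1u << EX_COMPLETE_CB_CODE)
#define EX_STARTED_CB_FLAG        (1u << EX_STARTED_CB_CODE)
#define EX_START_BIT_ERR_CB_FLAG  (1u << EX_START_BIT_ERR_CB_CODE)
#define EX_PARITY_ERR_CB_FLAG     (1u << EX_PARITY_ERR_CB_CODE)
#define EX_FRAMING_ERR_CB_FLAG    (1u << EX_FRAMING_ERR_CB_CODE)
#define EX_OVERRUN_ERR_CB_FLAG    (1u << EX_OVERRUN_ERR_CB_CODE)

/* Assumption: the combined masks are the OR of the individual flags. */
#define EX_RX_ERROR_FLAGS (EX_START_BIT_ERR_CB_FLAG | EX_PARITY_ERR_CB_FLAG | \
                           EX_FRAMING_ERR_CB_FLAG | EX_OVERRUN_ERR_CB_FLAG)
#define EX_RX_ALL_FLAGS   (EX_COMPLETE_CB_FLAG | EX_STARTED_CB_FLAG | EX_RX_ERROR_FLAGS)

/* Converts a bit position (code) into its single-bit flag mask. */
uint32_t example_flag_for_code(unsigned code)
{
    assert(code < 32);
    return (uint32_t)1 << code;
}

/* Returns 1 if any error bit is set in a flags word, otherwise 0. */
int example_has_rx_error(uint32_t flags)
{
    return (flags & EX_RX_ERROR_FLAGS) != 0;
}
```

Keeping separate code and flag definitions lets the same event be used both as an index (the code) and as a bit in a notification word (the flag).
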
RTOS_UART_RX_BUF_LEN

The size of the byte buffer between the ISR and the app thread. It needs to be able to hold the bytes received until the app thread is able to service it. This is not the same as app_byte_buffer_size, which can be of any size and is specified by the user at device start. At 1Mbps a byte arrives every 10us, so 64B allows 640us for the app thread to respond. Note that the buffer is size n+1, as required by lib_uart.
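
The timing figure quoted above can be reproduced with a short calculation. This is a sketch that assumes a 10-bit frame (1 start bit, 8 data bits, 1 stop bit), which matches the stated 10us per byte at 1Mbps. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Time in microseconds for a UART running at baud_rate to fill buf_len
 * bytes, given bits_per_frame bits on the wire per byte
 * (for example 1 start + 8 data + 1 stop = 10). Rounds down. */
uint32_t example_buffer_fill_time_us(uint32_t baud_rate,
                                     uint32_t bits_per_frame,
                                     uint32_t buf_len)
{
    assert(baud_rate > 0);
    uint64_t total_bits = (uint64_t)bits_per_frame * buf_len;
    return (uint32_t)((total_bits * 1000000u) / baud_rate);
}
```

With baud_rate = 1000000, bits_per_frame = 10 and buf_len = 64 this evaluates to 640, the response window given in the description. Lower baud rates give the app thread proportionally longer to respond.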

RTOS_UART_RX_CALLBACK_ATTR

This attribute must be specified on all RTOS UART rx callback functions provided by the application to allow compiler stack calculation.

RTOS_UART_RX_CALL_ATTR

This attribute must be specified on all RTOS UART functions provided by the application to allow compiler stack calculation.

struct rtos_uart_rx_struct
#include <rtos_uart_rx.h>

Struct representing an RTOS UART rx driver instance.

The members in this struct should not be accessed directly.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$USB RTOS Driver£££modules/rtos/doc/programming_guide/reference/rtos_drivers/usb.html#usb-rtos-driver

This driver can be used to instantiate and control a USB device interface on xcore in an RTOS application.

Unlike most other xcore I/O interface RTOS drivers, only a single USB driver instance may be started. It also does not require an initialization step prior to starting the driver. This is due to an implementation detail in lib_xud, which is what the RTOS USB driver uses at its core.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$Driver API£££modules/rtos/doc/programming_guide/reference/rtos_drivers/usb.html#driver-api

The following structures and functions are used to start and use a USB driver instance.

RTOS_USB_OUT_EP

This is used to index into the second dimension of many of the RTOS USB driver’s endpoint arrays.

RTOS_USB_IN_EP
enum rtos_usb_packet_type_t

Values:

enumerator rtos_usb_data_packet
enumerator rtos_usb_setup_packet
enumerator rtos_usb_sof_packet
typedef struct rtos_usb_struct rtos_usb_t

Typedef to the RTOS USB driver instance struct.

typedef void (*rtos_usb_isr_cb_t)(rtos_usb_t *ctx, void *app_data, uint32_t ep_address, size_t xfer_len, rtos_usb_packet_type_t packet_type, XUD_Result_t res)

Function pointer type for application provided RTOS USB interrupt callback function.

This callback function is called when there is a USB transfer interrupt.

Param ctx:

A pointer to the associated USB driver instance.

Param app_data:

A pointer to application specific data provided by the application. Used to share data between this callback function and the application.

Param ep_address:

The address of the USB endpoint that the transfer has completed on.

Param xfer_len:

The length of the data transferred.

Param packet_type:

The type of packet transferred. See rtos_usb_packet_type_t.

Param res:

The result of the transfer. See XUD_Result_t.

int rtos_usb_endpoint_ready(rtos_usb_t *ctx, uint32_t endpoint_addr, unsigned timeout)

Checks to see if a particular endpoint is ready to use.

Parameters:
  • ctx – A pointer to the USB driver instance to use.

  • endpoint_addr – The address of the endpoint to check.

  • timeout – The maximum amount of time to wait for the endpoint to become ready before returning.

Return values:
  • XUD_RES_OKAY – if the endpoint is ready to use.

  • XUD_RES_ERR – if the endpoint is not ready to use.

XUD_Result_t rtos_usb_all_endpoints_ready(rtos_usb_t *ctx, unsigned timeout)

Checks to see if all endpoints are ready to use.

Parameters:
  • ctx – A pointer to the USB driver instance to use.

  • timeout – The maximum amount of time to wait for all endpoints to become ready before returning.

Return values:
  • XUD_RES_OKAY – if all the endpoints are ready to use.

  • XUD_RES_ERR – if not all the endpoints are ready to use.

XUD_Result_t rtos_usb_endpoint_transfer_start(rtos_usb_t *ctx, uint32_t endpoint_addr, uint8_t *buffer, size_t len, bool is_setup)

Requests a transfer on a USB endpoint. This function returns immediately. When the transfer is complete, the application’s ISR callback provided to rtos_usb_start() will be called.

Parameters:
  • ctx – A pointer to the USB driver instance to use.

  • endpoint_addr – The address of the endpoint to perform the transfer on.

  • buffer – A pointer to the buffer to transfer data into for OUT endpoints, or from for IN endpoints. For OUT endpoint, the buffer needs an additional +4 bytes of space, this additional data should not be reflected in the len parameter.

  • len – The maximum number of bytes to receive for OUT endpoints, or the actual number of bytes to send for IN endpoints.

  • is_setup – To be set when preparing for the transfer of a setup packet.

Return values:
  • XUD_RES_OKAY – if the transfer was requested successfully.

  • XUD_RES_RST – if the transfer was not requested and the USB bus needs to be reset. In this case, the application should reset the USB bus.
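
The buffer sizing rule for OUT endpoints (4 extra bytes that are not counted in len) can be sketched as below. The helper names are hypothetical, and heap allocation is used only to keep the sketch self-contained. An application may equally use a static buffer of the same size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Extra space required for OUT endpoint buffers, per the buffer
 * parameter description. The macro name is illustrative. */
#define EXAMPLE_OUT_EP_EXTRA_BYTES 4u

/* Returns the number of bytes to reserve for an OUT transfer whose
 * len argument (maximum payload) is len bytes. */
size_t example_out_buffer_size(size_t len)
{
    assert(len <= SIZE_MAX - EXAMPLE_OUT_EP_EXTRA_BYTES);
    return len + EXAMPLE_OUT_EP_EXTRA_BYTES;
}

/* Allocates an OUT buffer. The caller still passes len, not the
 * allocated size, as the transfer length, and frees the buffer. */
uint8_t *example_alloc_out_buffer(size_t len)
{
    return (uint8_t *)malloc(example_out_buffer_size(len));
}
```

For a 64-byte maximum transfer, 68 bytes are reserved while len remains 64.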

XUD_BusSpeed_t rtos_usb_endpoint_reset(rtos_usb_t *ctx, uint32_t endpoint_addr)

This function will complete a reset on an endpoint. The address of the endpoint to reset must be provided, and may be either direction (IN or OUT) endpoint. If there is an associated endpoint of the opposite direction, however, it will also be reset.

The return value should be inspected to find the new bus-speed.

Parameters:
  • endpoint_addr – IN or OUT endpoint address to reset.

Return values:
  • XUD_SPEED_HS – the host has accepted that this device can execute at high speed.

  • XUD_SPEED_FS – the device is running at full speed.

static inline XUD_Result_t rtos_usb_device_address_set(rtos_usb_t *ctx, uint32_t addr)

Sets the USB device’s bus address. This function must be called after a setDeviceAddress request is made by the host, and after the ZLP status is sent.

Parameters:
  • ctx – A pointer to the USB driver instance to use.

  • addr – The device address requested by the host.

static inline void rtos_usb_endpoint_state_reset(rtos_usb_t *ctx, uint32_t endpoint_addr)

Reset a USB endpoint’s state including data PID toggle.

Parameters:
  • ctx – A pointer to the USB driver instance to use.

  • endpoint_addr – The address of the endpoint to reset.

static inline void rtos_usb_endpoint_stall_set(rtos_usb_t *ctx, uint32_t endpoint_addr)

Stalls a USB endpoint. The stall is cleared automatically when a setup packet is received on the endpoint. Otherwise it can be cleared manually with rtos_usb_endpoint_stall_clear().

Parameters:
  • ctx – A pointer to the USB driver instance to use.

  • endpoint_addr – The address of the endpoint to stall.

static inline void rtos_usb_endpoint_stall_clear(rtos_usb_t *ctx, uint32_t endpoint_addr)

Clears the stall condition on a USB endpoint.

Parameters:
  • ctx – A pointer to the USB driver instance to use.

  • endpoint_addr – The address of the endpoint to clear the stall on.

void rtos_usb_enter_test_mode(rtos_usb_t *ctx, unsigned test_mode)

Calls the XUD function to enter the specified test mode.

Parameters:
  • ctx – A pointer to the USB driver instance to use.

  • test_mode – Desired test mode.

void rtos_usb_start(rtos_usb_t *ctx, size_t endpoint_count, XUD_EpType endpoint_out_type[], XUD_EpType endpoint_in_type[], XUD_BusSpeed_t speed, XUD_PwrConfig power_source, unsigned interrupt_core_id, int sof_interrupt_core_id)

Starts the USB driver instance’s low level USB I/O thread and enables its interrupts on the requested core. This must only be called by the tile that owns the driver instance. It must be called after starting the RTOS and from an RTOS thread.

rtos_usb_init() must be called on this USB driver instance prior to calling this.

Parameters:
  • ctx – A pointer to the USB driver instance to start.

  • endpoint_count

    The number of endpoints that will be used by the application. A single endpoint here includes both its IN and OUT endpoints. For example, if the application uses EP0_IN, EP0_OUT, EP1_IN, EP2_IN, EP2_OUT, EP3_OUT, then the endpoint count specified here should be 4 (endpoint 0 through endpoint 3) regardless of the lack of EP1_OUT and EP3_IN. If these two endpoints were used, the count would still be 4.

    If for whatever reason, the application needs to use a particular endpoint number, say only EP6 in addition to EP0, then the count here needs to be 7, even though endpoints 1 through 5 are unused. All unused endpoints must be marked as disabled in the two endpoint type lists, endpoint_out_type and endpoint_in_type.

  • endpoint_out_type – A list of the endpoint types for each output endpoint. Index 0 represents the type for EP0_OUT, and so on. See XUD_EpType in lib_xud. If the endpoint is unused, it must be set to XUD_EPTYPE_DIS.

  • endpoint_in_type – A list of the endpoint types for each input endpoint. Index 0 represents the type for EP0_IN, and so on. See XUD_EpType in lib_xud. If the endpoint is unused, it must be set to XUD_EPTYPE_DIS.

  • speed – The speed at which the bus should operate. Either XUD_SPEED_FS or XUD_SPEED_HS. See XUD_BusSpeed_t in lib_xud.

  • power_source – The source of the device’s power. Either bus powered (XUD_PWR_BUS) or self powered (XUD_PWR_SELF). See XUD_PwrConfig in lib_xud.

  • interrupt_core_id – The ID of the core on which to enable the USB interrupts.

  • sof_interrupt_core_id – The ID of the core on which to enable the SOF interrupt. Set to < 0 to disable the SOF interrupt if it is not needed.
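
The endpoint_count rule described above (highest endpoint number in use plus one, with every unused slot marked disabled) can be sketched as follows. To keep the sketch self-contained, ex_ep_type_t is a hypothetical stand-in for lib_xud's XUD_EpType, and the helper names are illustrative. A real application would fill the lists with the actual XUD_EpType values, using XUD_EPTYPE_DIS for unused entries.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for lib_xud's XUD_EpType so the sketch compiles on its own.
 * Only "disabled" versus "in use" matters here. */
typedef enum { EX_EPTYPE_DIS = 0, EX_EPTYPE_ENABLED } ex_ep_type_t;

/* Returns the endpoint count: the highest endpoint number used plus one. */
size_t example_endpoint_count(const unsigned *ep_numbers, size_t n)
{
    assert(n > 0);
    unsigned highest = 0;
    for (size_t i = 0; i < n; i++) {
        if (ep_numbers[i] > highest) {
            highest = ep_numbers[i];
        }
    }
    return (size_t)highest + 1;
}

/* Fills one type list of length count: the listed endpoint numbers are
 * marked in use and every other entry is marked disabled. */
void example_fill_types(ex_ep_type_t *types, size_t count,
                        const unsigned *used, size_t n)
{
    for (size_t i = 0; i < count; i++) {
        types[i] = EX_EPTYPE_DIS;
    }
    for (size_t i = 0; i < n; i++) {
        assert(used[i] < count);
        types[used[i]] = EX_EPTYPE_ENABLED;
    }
}
```

For an application using only EP0 and EP6, the count is 7 and entries 1 through 5 of each list are disabled, matching the second example in the endpoint_count description.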

void rtos_usb_init(rtos_usb_t *ctx, uint32_t io_core_mask, rtos_usb_isr_cb_t isr_cb, void *isr_app_data)

Initializes an RTOS USB driver instance. This must only be called by the tile that owns the driver instance. It should be called prior to starting the RTOS, and must be called before any of the core USB driver functions are called with this instance.

This will create an RTOS thread that runs lib_xud’s main loop. This thread is created with the highest priority and with preemption disabled.

Note

Due to implementation details of lib_xud, it is only possible to have one USB instance per application. Functionally this is not an issue, as no xcore chips have more than one USB interface.

Note

If using the Tiny USB stack, then this function should not be called directly by the application. The xcore device port for Tiny USB takes care of calling this, as well as all other USB driver functions.

Parameters:
  • ctx – A pointer to the USB driver instance to initialize.

  • io_core_mask – A bitmask representing the cores on which the low level USB I/O thread created by the driver is allowed to run. Bit 0 is core 0, bit 1 is core 1, etc.

  • isr_cb – The callback function for the driver to call when transfers are completed.

  • isr_app_data – A pointer to application specific data to pass to the application’s ISR callback function isr_cb.

XUD_Result_t rtos_usb_simple_transfer_complete(rtos_usb_t *ctx, uint32_t endpoint_addr, size_t *len, unsigned timeout)

This function may be called to wait for a transfer on a particular endpoint to complete. This requires that the USB instance was initialized with rtos_usb_simple_init().

Parameters:
  • ctx – A pointer to the USB driver instance to use.

  • endpoint_addr – The address of the endpoint to wait for.

  • len – The actual number of bytes transferred. For IN endpoints, this will be the same as the length requested by rtos_usb_endpoint_transfer_start(). For OUT endpoints, it may be less.

  • timeout – The maximum amount of time to wait for the transfer to complete before returning.

Return values:
  • XUD_RES_OKAY – if the transfer was completed successfully.

  • XUD_RES_RST – if the transfer was not able to complete and the USB bus needs to be reset. In this case, the application should reset the USB bus.

  • XUD_RES_ERR – if there was an unexpected error transferring the data.

void rtos_usb_simple_init(rtos_usb_t *ctx, uint32_t io_core_mask)

Initializes an RTOS USB driver instance. This must only be called by the tile that owns the driver instance. It should be called prior to starting the RTOS, and must be called before any of the core USB driver functions are called with this instance.

This initialization function may be used instead of rtos_usb_init() if the application is not using a USB stack. This allows application threads to wait for transfers to complete with the rtos_usb_simple_transfer_complete() function. The application cannot provide its own ISR callback when initialized with this function. This provides a similar programming interface as a traditional bare metal xcore application using lib_xud.

This will create an RTOS thread that runs lib_xud’s main loop. This thread is created with the highest priority and with preemption disabled.

Note

Due to implementation details of lib_xud, it is only possible to have one USB instance per application. Functionally this is not an issue, as no xcore chips have more than one USB interface.

Parameters:
  • ctx – A pointer to the USB driver instance to initialize.

  • io_core_mask – A bitmask representing the cores on which the low level USB I/O thread created by the driver is allowed to run. Bit 0 is core 0, bit 1 is core 1, etc.

RTOS_USB_ENDPOINT_COUNT_MAX

The maximum number of USB endpoint numbers supported by the RTOS USB driver.

RTOS_USB_ISR_CALLBACK_ATTR

This attribute must be specified on the RTOS USB interrupt callback function provided by the application.

struct rtos_usb_ep_xfer_info_t
#include <rtos_usb.h>

Struct to hold USB transfer state data per endpoint, used as the argument to the ISR.

The members in this struct should not be accessed directly.

struct rtos_usb_struct
#include <rtos_usb.h>

Struct representing an RTOS USB driver instance.

The members in this struct should not be accessed directly.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$Trace Driver£££modules/rtos/doc/programming_guide/reference/rtos_drivers/trace.html#trace-driver

This driver can be used to instantiate an xscope-based trace module in an RTOS application. The trace module currently supports both a demonstrative ASCII mode and Percepio’s Tracealyzer on FreeRTOS. Both modes depend on RTOS-specific hooks/macros to handle the majority of RTOS event recording and integration.

For general usage of the FreeRTOS trace functionality please refer to FreeRTOS’ documentation here: RTOS Trace Macros

For basic information on printf debugging using xscope please refer to the tools guide here: XSCOPE debugging

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$Trace Configuration£££modules/rtos/doc/programming_guide/reference/rtos_drivers/trace.html#trace-configuration

In order to use the trace driver module, the following common steps must be performed:

  1. Add rtos::drivers::trace as a linked library for the desired CMake target application.

  2. The target application’s compiler arguments must include the -fxscope option.

  3. The target application’s list of sources must include an .xscope file with the first probe specified as:

    <Probe name="freertos_trace" type="CONTINUOUS" datatype="NONE" units="NONE" enabled="true"/>
    
  4. Include xcore_trace.h at the end of the RTOS configuration file (i.e. FreeRTOSConfig.h).

  5. Enable both configUSE_TRACE_FACILITY and configGENERATE_RUN_TIME_STATS in FreeRTOSConfig.h.

  6. Continue reading the following sections based on which trace mode is to be used. Additional configuration steps are required.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$Tracealyzer Mode£££modules/rtos/doc/programming_guide/reference/rtos_drivers/trace.html#tracealyzer-mode

The trace driver supports Percepio’s Tracealyzer, a feature rich tool for working with trace files. This implementation supports Tracealyzer’s streaming mode; currently, snapshot mode is not supported. The current underlying trace recording implementation interfaces with the xscope_core_bytes API function (on Probe 0).

To select Tracealyzer as the trace module’s event recorder, the following must be set. This can be applied at the CMake project level:

#define USE_TRACE_MODE TRACE_MODE_TRACEALYZER_STREAMING

Note

xcore_trace.h contains the definition for these modes.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$Tracealyzer Initialization£££modules/rtos/doc/programming_guide/reference/rtos_drivers/trace.html#tracealyzer-initialization

In addition to the configuration steps outlined above, Percepio’s Tracealyzer streaming mode needs additional function calls to start recording trace data. In the most basic use-case, the following functions should be called on the XCORE tile that is to record trace data:

xTraceInitialize();
xTraceEnable(TRC_START);

Note

xTraceInitialize must be called before any RTOS interaction (before any traced objects are being interacted with). It is advisable to call it as soon as possible in the application.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$Tracealyzer Usage£££modules/rtos/doc/programming_guide/reference/rtos_drivers/trace.html#tracealyzer-usage

Percepio’s Tracealyzer C-unit outputs to a streamable file format called Percepio Streaming Format (PSF). The xscope2psf utility aids in the extraction of the PSF file from the underlying xscope communication (making it readily available on the host’s filesystem). This tool can be configured to read from a VCD (value change dump) file that is generated when specifying the xgdb option --xscope-file, or it can be configured as an xscope endpoint when specifying the --xscope-port <ip:port> option. The output of either configuration can be processed by the Tracealyzer graphical tool, either as a post-processing step or live.

Note

xscope2psf currently resides in a Tracealyzer example application here: example. This is likely to change in the future. Refer to either the README or the application’s help documentation for usage details.

Note

Currently, the only supported PSF Streaming target connection type is File System. Ensure this connection type is specified under Tracealyzer’s Recording Settings.

For general usage of Tracealyzer please refer to Percepio’s documentation here: Manual

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$ASCII Mode£££modules/rtos/doc/programming_guide/reference/rtos_drivers/trace.html#ascii-mode

The trace driver supports a basic ASCII mode that is primarily meant as an example for expanding support to other tracing tools/frameworks. In this mode, only the following FreeRTOS trace hooks are supported:

  • traceTASK_SWITCHED_IN

  • traceTASK_SWITCHED_OUT

This implementation will produce xscope logs for the RTOS task switching. The underlying xscope API xscope_core_bytes is used for communicating this information.

To select ASCII mode as the trace module’s event recorder, the following must be set. This can be applied at the CMake project level:

#define USE_TRACE_MODE TRACE_MODE_XSCOPE_ASCII

Note

xcore_trace.h contains the definition for these modes.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$ASCII Mode Initialization£££modules/rtos/doc/programming_guide/reference/rtos_drivers/trace.html#ascii-mode-initialization

No additional steps are required for ASCII mode to start recording trace events to xscope.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$I/O$$$ASCII Mode Usage£££modules/rtos/doc/programming_guide/reference/rtos_drivers/trace.html#ascii-mode-usage

To begin capturing ASCII mode traces, run xgdb with the --xscope-file option. Task switching events will be recorded to the specified VCD (value change dump) file.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$XCORE£££modules/rtos/doc/programming_guide/reference/rtos_drivers/trace.html#xcore
XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$XCORE$$$Clock Control RTOS Driver£££modules/rtos/doc/programming_guide/reference/rtos_drivers/clock_control.html#clock-control-rtos-driver

This driver can be used to control the reference, tile, and switch clock settings on xcore in an RTOS application.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$XCORE$$$Initialization API£££modules/rtos/doc/programming_guide/reference/rtos_drivers/clock_control.html#initialization-api

The following structures and functions are used to initialize and start a clock control driver instance.

typedef struct rtos_clock_control_struct rtos_clock_control_t

Typedef to the RTOS Clock Control driver instance struct.

void rtos_clock_control_start(rtos_clock_control_t *ctx)

Starts an RTOS clock control driver instance. This must only be called by the tile that owns the driver instance. It may be called either before or after starting the RTOS, but must be called before any of the core clock control driver functions are called with this instance.

rtos_clock_control_init() must be called on this clock control driver instance prior to calling this.

Parameters:
  • ctx – A pointer to the clock control driver instance to start.

void rtos_clock_control_init(rtos_clock_control_t *ctx)

Initializes an RTOS clock control driver instance. There should only be one per tile. This must only be called by the tile that owns the driver instance. It may be called either before or after starting the RTOS, but must be called before calling rtos_clock_control_start() or any of the core clock control driver functions with this instance.

Parameters:
  • ctx – A pointer to the clock control driver instance to initialize.

struct rtos_clock_control_struct
#include <rtos_clock_control.h>

Struct representing an RTOS clock control driver instance.

The members in this struct should not be accessed directly.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$XCORE$$$Core API£££modules/rtos/doc/programming_guide/reference/rtos_drivers/clock_control.html#core-api

The following functions are the core clock control driver functions that are used after it has been initialized and started.

inline void rtos_clock_control_set_ref_clk_div(rtos_clock_control_t *ctx, unsigned divider)

Sets the reference clock divider register value for the tile that owns this driver instance.

Parameters:
  • ctx – A pointer to the clock control driver instance to use.

  • divider – The value + 1 to write to XS1_SSWITCH_REF_CLK_DIVIDER_NUM

inline unsigned rtos_clock_control_get_ref_clk_div(rtos_clock_control_t *ctx)

Gets the reference clock divider register value for the tile that owns this driver instance.

Parameters:
  • ctx – A pointer to the clock control driver instance to use.

inline void rtos_clock_control_set_processor_clk_div(rtos_clock_control_t *ctx, unsigned divider)

Sets the tile clock divider register value for the tile that owns this driver instance.

Parameters:
  • ctx – A pointer to the clock control driver instance to use.

  • divider – The value + 1 to write to XS1_PSWITCH_PLL_CLK_DIVIDER_NUM

inline unsigned rtos_clock_control_get_processor_clk_div(rtos_clock_control_t *ctx)

Gets the tile clock divider register value for the tile that owns this driver instance.

Parameters:
  • ctx – A pointer to the clock control driver instance to use.

inline void rtos_clock_control_set_switch_clk_div(rtos_clock_control_t *ctx, unsigned divider)

Sets the switch clock divider register value for the tile that owns this driver instance.

Parameters:
  • ctx – A pointer to the clock control driver instance to use.

  • divider – The value + 1 to write to XS1_SSWITCH_CLK_DIVIDER_NUM

inline unsigned rtos_clock_control_get_switch_clk_div(rtos_clock_control_t *ctx)

Gets the switch clock divider register value for the tile that owns this driver instance.

Parameters:
  • ctx – A pointer to the clock control driver instance to use.

inline unsigned rtos_clock_control_get_ref_clock(rtos_clock_control_t *ctx)

Gets the calculated reference clock frequency for the tile that owns this driver instance.

Parameters:
  • ctx – A pointer to the clock control driver instance to use.

inline unsigned rtos_clock_control_get_processor_clock(rtos_clock_control_t *ctx)

Gets the calculated core clock frequency for the tile that owns this driver instance.

Parameters:
  • ctx – A pointer to the clock control driver instance to use.

inline unsigned rtos_clock_control_get_switch_clock(rtos_clock_control_t *ctx)

Gets the calculated switch clock frequency for the tile that owns this driver instance.

Parameters:
  • ctx – A pointer to the clock control driver instance to use.

Sets the intra token delay and inter token delay to the xlinks within an address range, inclusive, for the tile that owns this driver instance.

Parameters:
  • ctx – A pointer to the clock control driver instance to use.

  • start_addr – The starting link address

  • end_addr – The ending address

  • delay_intra – The intra token delay value

  • delay_inter – The inter token delay value

Resets the xlinks within an address range, inclusive for the tile that owns this driver instance.

Parameters:
  • ctx – A pointer to the clock control driver instance to use.

  • start_addr – The starting link address

  • end_addr – The ending address

inline void rtos_clock_control_set_node_pll_ratio(rtos_clock_control_t *ctx, unsigned pre_div, unsigned mul, unsigned post_div)

Sets the tile clock PLL control register value on the tile that owns this driver instance. The value set is calculated from the divider stage 1, multiplier stage, and divider stage 2 values provided.

VCO freq = fosc * (F + 1) / (2 * (R + 1))
VCO must be between 260MHz and 1.3GHz for XS2
Core freq = VCO / (OD + 1)

Refer to the xcore Clock Frequency Control document for more details.

Note: This function will not reset the chip and wait for the PLL to settle before re-enabling the chip to allow for large frequency jumps. This will cause a delay during settings.

Note: It is up to the application to ensure that it is safe to change the clock.

Parameters:
  • ctx – A pointer to the clock control driver instance to use.

  • pre_div – The value of R

  • mul – The value of F

  • post_div – The value of OD
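
As a worked illustration of the PLL relationship given above, the following sketch computes the VCO and core frequencies from R, F, and OD. The helper names and the 24 MHz oscillator used in the note below are assumptions made for this example; they are not part of the driver API.

```c
#include <stdint.h>

/* Hypothetical helpers, not part of the RTOS driver API. They evaluate the
 * relationship stated above:
 *   VCO  = fosc * (F + 1) / (2 * (R + 1))
 *   core = VCO / (OD + 1)
 * pre_div is R, mul is F, and post_div is OD. */
uint64_t example_vco_hz(uint64_t fosc_hz, unsigned pre_div, unsigned mul)
{
    return (fosc_hz * ((uint64_t)mul + 1u)) / (2u * ((uint64_t)pre_div + 1u));
}

uint64_t example_core_hz(uint64_t fosc_hz, unsigned pre_div,
                         unsigned mul, unsigned post_div)
{
    return example_vco_hz(fosc_hz, pre_div, mul) / ((uint64_t)post_div + 1u);
}
```

With an assumed 24 MHz oscillator, R = 0, F = 99, and OD = 1, the VCO evaluates to 1200 MHz, which lies inside the stated VCO range, and the core clock evaluates to 600 MHz.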

inline void rtos_clock_control_get_node_pll_ratio(rtos_clock_control_t *ctx, unsigned *pre_div, unsigned *mul, unsigned *post_div)

Gets the divider stage 1, multiplier stage, and divider stage 2 values from the tile clock PLL control register values on the tile that owns this driver instance.

Parameters:
  • ctx – A pointer to the clock control driver instance to use.

  • pre_div – A pointer to be populated with the value of R

  • mul – A pointer to be populated with the value of F

  • post_div – A pointer to be populated with the value of OD

inline void rtos_clock_control_get_local_lock(rtos_clock_control_t *ctx)

Gets the local lock for clock control on the tile that owns this driver instance. This is intended for applications to use to prevent clock changes around critical sections.

Parameters:
  • ctx – A pointer to the clock control driver instance to use.

inline void rtos_clock_control_release_local_lock(rtos_clock_control_t *ctx)

Releases the local lock for clock control on the tile that owns this driver instance.

Parameters:
  • ctx – A pointer to the clock control driver instance to use.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$XCORE$$$RPC Initialization API£££modules/rtos/doc/programming_guide/reference/rtos_drivers/clock_control.html#rpc-initialization-api

The following functions may be used to share a clock control driver instance with other xcore tiles. Tiles that the driver instance is shared with may call any of the core functions listed above.

void rtos_clock_control_rpc_client_init(rtos_clock_control_t *cc_ctx, rtos_driver_rpc_t *rpc_config, rtos_intertile_t *host_intertile_ctx)

Initializes an RTOS clock control driver instance on a client tile. This allows a tile that does not own the actual driver instance to use a driver instance on another tile. This will be called instead of rtos_clock_control_init(). The host tile that owns the actual instance must simultaneously call rtos_clock_control_rpc_host_init().

Parameters:
  • cc_ctx – A pointer to the clock control driver instance to initialize.

  • rpc_config – A pointer to an RPC config struct. This must have the same scope as cc_ctx.

  • host_intertile_ctx – A pointer to the intertile driver instance to use for performing the communication between the client and host tiles. This must have the same scope as cc_ctx.

void rtos_clock_control_rpc_host_init(rtos_clock_control_t *cc_ctx, rtos_driver_rpc_t *rpc_config, rtos_intertile_t *client_intertile_ctx[], size_t remote_client_count)

Performs additional initialization on a clock control driver instance to allow client tiles to use the clock control driver instance. Each client tile that will use this instance must simultaneously call rtos_clock_control_rpc_client_init().

Parameters:
  • cc_ctx – A pointer to the clock control driver instance to share with clients.

  • rpc_config – A pointer to an RPC config struct. This must have the same scope as cc_ctx.

  • client_intertile_ctx – An array of pointers to the intertile driver instances to use for performing the communication between the host tile and each client tile. This must have the same scope as cc_ctx.

  • remote_client_count – The number of client tiles to share this driver instance with.

void rtos_clock_control_rpc_config(rtos_clock_control_t *cc_ctx, unsigned intertile_port, unsigned host_task_priority)

Configures the RPC for a clock control driver instance. This must be called by both the host tile and all client tiles.

On the client tiles this must be called after calling rtos_clock_control_rpc_client_init(). After calling this, the client tile may immediately begin to call the core clock control functions on this driver instance. It does not need to wait for the host to call rtos_clock_control_start().

On the host tile this must be called both after calling rtos_clock_control_rpc_host_init() and before calling rtos_clock_control_start().

Parameters:
  • cc_ctx – A pointer to the clock control driver instance to configure the RPC for.

  • intertile_port – The port number on the intertile channel to use for transferring the RPC requests and responses for this driver instance. This port must not be shared by any other functions. The port must be the same for the host and all its clients.

  • host_task_priority – The priority to use for the task on the host tile that handles RPC requests from the clients.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$XCORE$$$Device Firmware Update RTOS Driver£££modules/rtos/doc/programming_guide/reference/rtos_drivers/dfu.html#device-firmware-update-rtos-driver

This driver can be used to instantiate and manipulate various flash partitions on xcore in an RTOS application.

For application usage refer to the tutorial RTOS Application DFU.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$XCORE$$$Initialization API£££modules/rtos/doc/programming_guide/reference/rtos_drivers/dfu.html#initialization-api

The following structures and functions are used to initialize and start a DFU driver instance.

void rtos_dfu_image_init(rtos_dfu_image_t *dfu_image_ctx, fl_QSPIPorts *qspi_ports, fl_QuadDeviceSpec *qspi_specs, unsigned int len)

Initializes an RTOS DFU image driver instance. This must be called before initializing the RTOS QSPI driver instance.

This will search the flash for program images via libquadflash and store them for application DFU use.

Parameters:
  • dfu_image_ctx – A pointer to the DFU image driver instance to initialize.

  • qspi_ports – A pointer to the fl_QSPIPorts context to determine which resources to use.

  • qspi_specs – A pointer to an array of fl_QuadDeviceSpec to try to connect to.

  • len – The number of fl_QuadDeviceSpec contained in qspi_specs

struct rtos_dfu_image_t
#include <rtos_dfu_image.h>

Struct representing an RTOS DFU image driver instance.

The members in this struct should not be accessed directly.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$XCORE$$$Core API£££modules/rtos/doc/programming_guide/reference/rtos_drivers/dfu.html#core-api

The following functions are the core DFU driver functions that are used after it has been initialized and started.

inline unsigned rtos_dfu_image_get_data_partition_addr(rtos_dfu_image_t *dfu_image_ctx)

Get the starting address of the data partition

Parameters:
  • dfu_image_ctx – A pointer to the DFU image driver instance to use.

Returns:

The byte address

inline unsigned rtos_dfu_image_get_factory_addr(rtos_dfu_image_t *dfu_image_ctx)

Get the starting address of the factory image

Parameters:
  • dfu_image_ctx – A pointer to the DFU image driver instance to use.

Returns:

The byte address

inline unsigned rtos_dfu_image_get_factory_size(rtos_dfu_image_t *dfu_image_ctx)

Get the size of the factory image

Parameters:
  • dfu_image_ctx – A pointer to the DFU image driver instance to use.

Returns:

The size in bytes

inline unsigned rtos_dfu_image_get_factory_version(rtos_dfu_image_t *dfu_image_ctx)

Get the version of the factory image

Parameters:
  • dfu_image_ctx – A pointer to the DFU image driver instance to use.

Returns:

The version

inline unsigned rtos_dfu_image_get_upgrade_addr(rtos_dfu_image_t *dfu_image_ctx)

Get the starting address of the upgrade image

Parameters:
  • dfu_image_ctx – A pointer to the DFU image driver instance to use.

Returns:

The byte address

inline unsigned rtos_dfu_image_get_upgrade_size(rtos_dfu_image_t *dfu_image_ctx)

Get the size of the upgrade image

Parameters:
  • dfu_image_ctx – A pointer to the DFU image driver instance to use.

Returns:

The size in bytes

inline unsigned rtos_dfu_image_get_upgrade_version(rtos_dfu_image_t *dfu_image_ctx)

Get the version of the upgrade image

Parameters:
  • dfu_image_ctx – A pointer to the DFU image driver instance to use.

Returns:

The version

void rtos_dfu_image_print_debug(rtos_dfu_image_t *dfu_image_ctx)

Print debug information

Parameters:
  • dfu_image_ctx – A pointer to the DFU image driver instance to use.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$XCORE$$$Intertile RTOS Driver£££modules/rtos/doc/programming_guide/reference/rtos_drivers/intertile.html#intertile-rtos-driver

This driver allows for communication between AMP RTOS instances running on different xcore tiles.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$XCORE$$$Initialization API£££modules/rtos/doc/programming_guide/reference/rtos_drivers/intertile.html#initialization-api

The following structures and functions are used to initialize and start an intertile driver instance.

void rtos_intertile_start(rtos_intertile_t *intertile_ctx)

Starts an RTOS intertile driver instance. It may be called either before or after starting the RTOS, but must be called before any of the core intertile driver functions are called with this instance.

rtos_intertile_init() must be called on this intertile driver instance prior to calling this.

Parameters:
  • intertile_ctx – A pointer to the intertile driver instance to start.

void rtos_intertile_init(rtos_intertile_t *intertile_ctx, chanend_t c)

Initializes an RTOS intertile driver instance. This must be called simultaneously on the two tiles establishing an intertile link. It may be called either before or after starting the RTOS, but must be called before calling rtos_intertile_start() or any of the core RTOS intertile functions with this instance.

This establishes a new streaming channel between the two tiles, using the provided non-streaming channel to bootstrap this.

Parameters:
  • intertile_ctx – A pointer to the intertile driver instance to initialize.

  • c – A channel end that is already allocated and connected to channel end on the tile with which to establish an intertile link. After this function returns, this channel end is no longer needed and may be deallocated or used for other purposes.

struct rtos_intertile_t
#include <rtos_intertile.h>

Struct representing an RTOS intertile driver instance.

The members in this struct should not be accessed directly.

struct rtos_intertile_address_t
#include <rtos_intertile.h>

Struct to hold an address to a remote function, consisting of both an intertile instance and a port number. Primarily used by the RPC mechanism in the RTOS drivers.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$XCORE$$$Core API£££modules/rtos/doc/programming_guide/reference/rtos_drivers/intertile.html#core-api

The following functions are the core intertile driver functions that are used after it has been initialized and started.

void rtos_intertile_tx_len(rtos_intertile_t *ctx, uint8_t port, size_t len)
size_t rtos_intertile_tx_data(rtos_intertile_t *ctx, void *data, size_t len)
void rtos_intertile_tx(rtos_intertile_t *ctx, uint8_t port, void *msg, size_t len)

Transmits data to an intertile link.

Parameters:
  • ctx – A pointer to the intertile driver instance to use.

  • port – The number of the port to send the data to. Only the thread listening on this particular port on the remote tile will receive this data.

  • msg – A pointer to the data buffer to transmit.

  • len – The number of bytes from the buffer to transmit.

size_t rtos_intertile_rx_len(rtos_intertile_t *ctx, uint8_t port, unsigned timeout)
size_t rtos_intertile_rx_data(rtos_intertile_t *ctx, void *data, size_t len)
size_t rtos_intertile_rx(rtos_intertile_t *ctx, uint8_t port, void **msg, unsigned timeout)

Receives data from an intertile link.

Note

The buffer returned via msg must be freed by the application using rtos_osal_free().

Note

It is important that no other thread listen on this port simultaneously. If this happens, it is undefined which one will receive the data, and it is possible for a resource exception to occur.

Parameters:
  • ctx – A pointer to the intertile driver instance to use.

  • port – The number of the port to listen for data on. Only data sent to this port by the remote tile will be received.

  • msg – A pointer to the received data is written to this pointer variable. This buffer is obtained from the heap and must be freed by the application using rtos_osal_free().

  • timeout – The amount of time to wait for data to become available.

Returns:

The number of bytes received.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$XCORE$$$L2 Cache RTOS Driver£££modules/rtos/doc/programming_guide/reference/rtos_drivers/l2_cache.html#l2-cache-rtos-driver

This driver can be used to instantiate a software defined L2 Cache for code and data.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$XCORE$$$Initialization API£££modules/rtos/doc/programming_guide/reference/rtos_drivers/l2_cache.html#initialization-api

The following structures and functions are used to initialize and start an L2 cache driver instance.

typedef struct rtos_l2_cache_struct rtos_l2_cache_t

Typedef to the RTOS l2 cache driver instance struct.

void rtos_l2_cache_start(rtos_l2_cache_t *ctx)

Starts the RTOS l2 cache memory driver.

void rtos_l2_cache_init(rtos_l2_cache_t *ctx, l2_cache_setup_fn setup_fn, l2_cache_thread_fn thread_fn, l2_cache_swmem_read_fn read_func, uint32_t io_core_mask, void *cache_buffer)

Initializes the l2 cache for use by the RTOS l2 cache memory driver.

The cache buffer must be dword aligned.

RTOS_L2_CACHE_DIRECT_MAP

Convenience macro that may be used to specify the direct map cache to rtos_l2_cache_init() in place of setup_fn and thread_fn.

RTOS_L2_CACHE_TWO_WAY_ASSOCIATIVE

Convenience macro that may be used to specify the two way associative cache to rtos_l2_cache_init() in place of setup_fn and thread_fn.

RTOS_L2_CACHE_BUFFER_WORDS_DIRECT_MAP

Convenience macro that may be used to specify the size of the cache buffer for a direct map cache. A pointer to the buffer of size RTOS_L2_CACHE_BUFFER_WORDS_DIRECT_MAP should be passed to the cache_buffer argument of rtos_l2_cache_init().

RTOS_L2_CACHE_BUFFER_WORDS_TWO_WAY

Convenience macro that may be used to specify the size of the cache buffer for a two way associative cache. A pointer to the buffer of size RTOS_L2_CACHE_BUFFER_WORDS_TWO_WAY should be passed to the cache_buffer argument of rtos_l2_cache_init().
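
The alignment requirement on the cache buffer can be sketched as follows. This is a minimal illustration that assumes "dword aligned" means aligned to a double word (8 bytes). The buffer size, the buffer name, and the helper function are hypothetical; a real application would size the buffer with one of the RTOS_L2_CACHE_BUFFER_WORDS_* macros from the driver header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical size, for illustration only. Real code would use
 * RTOS_L2_CACHE_BUFFER_WORDS_DIRECT_MAP or
 * RTOS_L2_CACHE_BUFFER_WORDS_TWO_WAY instead. */
#define EXAMPLE_CACHE_BUFFER_WORDS 1024u

/* Assumption: "dword aligned" means aligned to a double word (8 bytes). */
_Alignas(8) uint32_t example_cache_buffer[EXAMPLE_CACHE_BUFFER_WORDS];

/* Returns true when ptr sits on an 8-byte boundary. */
bool example_is_dword_aligned(const void *ptr)
{
    return ((uintptr_t)ptr % 8u) == 0u;
}
```

A check such as example_is_dword_aligned(example_cache_buffer) could be run once at startup, before the buffer is passed as the cache_buffer argument of rtos_l2_cache_init().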

struct rtos_l2_cache_struct
#include <rtos_l2_cache.h>

Struct representing an RTOS l2 cache driver instance.

The members in this struct should not be accessed directly.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Drivers$$$XCORE$$$Software Memory RTOS Driver£££modules/rtos/doc/programming_guide/reference/rtos_drivers/swmem.html#software-memory-rtos-driver

This driver allows for implementing application defined software memory in an RTOS.

bool rtos_swmem_read_request_isr(unsigned offset, uint32_t *buf)

Services a software memory read request from within the software memory fill interrupt handler. This function may be provided by the application when the software memory driver is initialized with the RTOS_SWMEM_READ_FLAG flag. If the application code to satisfy a fill request requires being run from within an RTOS thread, then rtos_swmem_read_request() should be used instead. Both this handler and rtos_swmem_read_request() may be used together. If the ISR handler is able to satisfy the request it should return true. If it is not, but the request can be satisfied from within rtos_swmem_read_request(), then it should return false.

Parameters:
  • offset – The byte offset into the software memory of the cache line that has had a cache miss.

  • buf – This function must fill this with SWMEM_EVICT_SIZE_WORDS words of data. Where this data comes from is up to the application. One example is from a flash memory.

Return values:
  • true – if the fill request was satisfied.

  • false – if the fill request was not satisfied. This requires that rtos_swmem_read_request() also be provided.

bool rtos_swmem_write_request_isr(unsigned offset, uint32_t dirty_mask, const uint32_t *buf)

Services a software memory write request from within the software memory fill interrupt handler. This function may be provided by the application when the software memory driver is initialized with the RTOS_SWMEM_WRITE_FLAG flag. If the application code to satisfy an evict request requires being run from within an RTOS thread, then rtos_swmem_write_request() should be used instead. Both this handler and rtos_swmem_write_request() may be used together. If the ISR handler is able to satisfy the request it should return true. If it is not, but the request can be satisfied from within rtos_swmem_write_request(), then it should return false.

Parameters:
  • offset – The byte offset into the software memory of the cache line that is being evicted.

  • dirty_mask – A bytewise dirty mask for the data in buf. The least significant bit corresponds to the lowest byte address in buf and each subsequent byte address corresponds to the next least significant bit.

  • buf – A pointer to a buffer containing SWMEM_EVICT_SIZE_WORDS words of data from the cache line being evicted. It is up to the application what it does with this data. One example is to write it to flash memory.

Return values:
  • true – if the evict request was satisfied.

  • false – if the evict request was not satisfied. This requires that rtos_swmem_write_request() also be provided.
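
The bytewise dirty mask described above can be applied as in the following sketch, which copies only the modified bytes of an evicted line into a backing store. The helper name and the line size of 8 words are assumptions: 8 words is chosen because a 32-bit bytewise mask then covers the line exactly. A real handler would use SWMEM_EVICT_SIZE_WORDS from the driver header.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: one evicted line is 8 words (32 bytes), so that a 32-bit
 * bytewise mask covers it exactly. Real code would use
 * SWMEM_EVICT_SIZE_WORDS instead of this hypothetical constant. */
#define EXAMPLE_EVICT_SIZE_WORDS 8u
#define EXAMPLE_EVICT_SIZE_BYTES (EXAMPLE_EVICT_SIZE_WORDS * sizeof(uint32_t))

/* Hypothetical helper: copies only the dirty bytes of an evicted line into
 * a backing store. Bit 0 of dirty_mask corresponds to the lowest byte
 * address in buf, bit 1 to the next byte address, and so on. */
void example_apply_dirty_bytes(uint8_t *backing, uint32_t dirty_mask,
                               const uint32_t *buf)
{
    const uint8_t *src = (const uint8_t *)buf;

    for (size_t i = 0; i < EXAMPLE_EVICT_SIZE_BYTES; i++) {
        if ((dirty_mask >> i) & 1u) {
            backing[i] = src[i];
        }
    }
}
```

Bytes whose mask bit is clear are left untouched in the backing store, which matters when the backing store holds data that the cache line never modified.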

void rtos_swmem_read_request(unsigned offset, uint32_t *buf)

Services a software memory read request from within the software memory RTOS thread. This function may be provided by the application when the software memory driver is initialized with the RTOS_SWMEM_READ_FLAG flag. If rtos_swmem_read_request_isr() is also implemented, then it will be called first. If it is unable to satisfy the request, then this handler will be called. See the description for rtos_swmem_read_request_isr().

Parameters:
  • offset – The byte offset into the software memory of the cache line that has had a cache miss.

  • buf – This function must fill this with SWMEM_EVICT_SIZE_WORDS words of data. Where this data comes from is up to the application. One example is from a flash memory.

void rtos_swmem_write_request(unsigned offset, uint32_t dirty_mask, const uint32_t *buf)

Services a software memory write request from within the software memory RTOS thread. This function may be provided by the application when the software memory driver is initialized with the RTOS_SWMEM_WRITE_FLAG flag. If rtos_swmem_write_request_isr() is also implemented, then it will be called first. If it is unable to satisfy the request, then this handler will be called. See the description for rtos_swmem_write_request_isr().

Parameters:
  • offset – The byte offset into the software memory of the cache line that is being evicted.

  • dirty_mask – A bytewise dirty mask for the data in buf. The least significant bit corresponds to the lowest byte address in buf and each subsequent byte address corresponds to the next least significant bit.

  • buf – A pointer to a buffer containing SWMEM_EVICT_SIZE_WORDS words of data from the cache line being evicted. It is up to the application what it does with this data. One example is to write it to flash memory.

void rtos_swmem_start(unsigned priority)

Starts the RTOS software memory driver.

Parameters:
  • priority – The priority of the task that gets created by the driver to service the software memory.

void rtos_swmem_init(uint32_t init_flags)

Initializes the software memory for use by the RTOS software memory driver.

Parameters:
  • init_flags – A bitfield consisting of initialization flags.

    • RTOS_SWMEM_READ_FLAG enables swmem reads.

    • RTOS_SWMEM_WRITE_FLAG enables swmem writes.

unsigned int rtos_swmem_offset_get()

Return the offset from XS1_SWMEM_BASE to the start of the software memory.

RTOS_SWMEM_READ_FLAG

Flag indicating that software memory reads should be enabled. This should probably always be set when using software memory.

RTOS_SWMEM_WRITE_FLAG

Flag indicating that software memory writes should be enabled. This will not always need to be set, especially if flash is backing the software memory and intended to be read only.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Services£££modules/rtos/doc/programming_guide/reference/rtos_services/rtos_services.html#rtos-services

Several RTOS software services are included to accelerate development of new applications.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Services$$$Device Control£££modules/rtos/doc/programming_guide/reference/rtos_services/device_control/index.html#device-control

The Device Control Service provides the ability to configure and control an XMOS device from a host over a number of transport layers. Features of the service include:

  • Simple read/write API

  • Fully acknowledged protocol

  • Supports multiple transports, including I2C and USB.

The table below shows combinations of host and transport mechanisms that are currently supported. Adding new transport layers and/or hosts is straightforward where the hardware supports it.

Supported Device Control Library Transports

Host                  | I2C | USB
----------------------|-----|----
PC / Windows          |     | Yes
PC / OSX              |     | Yes
Raspberry Pi / Linux  | Yes | Yes
xCORE                 | Yes |

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Services$$$Device Control$$$Device Control Shared API£££modules/rtos/doc/programming_guide/reference/rtos_services/device_control/device_control_shared.html#device-control-shared-api

The following structures and functions are common to the control instance on the xcore device and the host.

typedef uint8_t control_resid_t
typedef uint8_t control_cmd_t
typedef uint8_t control_version_t
typedef uint8_t control_status_t

These types are used in control functions to identify the resource ID, command, version, and status.

enum control_ret_t

This type enumerates the possible outcomes from a control transaction.

Values:

enumerator CONTROL_SUCCESS
enumerator CONTROL_REGISTRATION_FAILED
enumerator CONTROL_BAD_COMMAND
enumerator CONTROL_DATA_LENGTH_ERROR
enumerator CONTROL_OTHER_TRANSPORT_ERROR
enumerator CONTROL_BAD_RESOURCE
enumerator CONTROL_MALFORMED_PACKET
enumerator CONTROL_COMMAND_IGNORED_IN_DEVICE
enumerator CONTROL_ERROR
enumerator SERVICER_COMMAND_RETRY
enumerator SERVICER_WRONG_COMMAND_ID
enumerator SERVICER_WRONG_COMMAND_LEN
enumerator SERVICER_WRONG_PAYLOAD
enumerator SERVICER_QUEUE_FULL
enumerator SERVICER_SPECIAL_COMMAND_ALREADY_ONGOING
enumerator SERVICER_SPECIAL_COMMAND_BUFFER_OVERFLOW
enumerator SERVICER_RESOURCE_ERROR
enumerator SERVICER_SPECIAL_COMMAND_WRONG_ORDER
enumerator SERVICER_SPECIAL_COMMAND_BUF_SIZE_ERROR

enum control_direction_t

This type is used to inform the control library the direction of a control transfer from the transport layer.

Values:

enumerator CONTROL_HOST_TO_DEVICE
enumerator CONTROL_DEVICE_TO_HOST

CONTROL_VERSION

This is the version of the control protocol. It is used to check compatibility.

IS_CONTROL_CMD_READ(c)

Checks if the read bit is set in a command code.

Parameters:
  • c – [in] The command code to check.

Returns:

true if the read bit in the command is set; false if it is not.

CONTROL_CMD_SET_READ(c)

Sets the read bit on a command code

Parameters:
  • c – [inout] The command code to set the read bit on.

CONTROL_CMD_SET_WRITE(c)

Clears the read bit on a command code

Parameters:
  • c – [inout] The command code to clear the read bit on.
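
The effect of the three command code macros above can be sketched with plain functions. These are illustrative stand-ins, not the library's definitions. They assume the read bit is bit 7 of the 8-bit command code, which is consistent with this reference describing read commands as lying in the range 0x80 to 0xFF.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins, not the library's macros. Assumption: the read
 * bit is bit 7 of the 8-bit command code. */
#define EXAMPLE_CMD_READ_BIT 0x80u

/* Mirrors the intent of IS_CONTROL_CMD_READ(c). */
bool example_cmd_is_read(uint8_t cmd)
{
    return (cmd & EXAMPLE_CMD_READ_BIT) != 0u;
}

/* Mirrors the intent of CONTROL_CMD_SET_READ(c): sets the read bit. */
uint8_t example_cmd_set_read(uint8_t cmd)
{
    return (uint8_t)(cmd | EXAMPLE_CMD_READ_BIT);
}

/* Mirrors the intent of CONTROL_CMD_SET_WRITE(c): clears the read bit. */
uint8_t example_cmd_set_write(uint8_t cmd)
{
    return (uint8_t)(cmd & (uint8_t)~EXAMPLE_CMD_READ_BIT);
}
```

Under this assumption, command 0x05 is a write command and 0x85 is the read command with the same low seven bits.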

CONTROL_SPECIAL_RESID

This is the special resource ID owned by the control library. It can be used to check the version of the control protocol. Servicers may not register this resource ID.

CONTROL_MAX_RESOURCE_ID

The maximum resource ID. IDs greater than this cannot be registered.

CONTROL_GET_VERSION

The command to read the version of the control protocol. It must be sent to resource ID CONTROL_SPECIAL_RESID.

CONTROL_GET_LAST_COMMAND_STATUS

The command to read the return status of the last command. It must be sent to resource ID CONTROL_SPECIAL_RESID.

DEVICE_CONTROL_HOST_MODE

The mode value to use when initializing a device control instance that is on the same tile as its associated transport layer. These may be connected to device control instances on other tiles that have been initialized with DEVICE_CONTROL_CLIENT_MODE.

DEVICE_CONTROL_CLIENT_MODE

The mode value to use when initializing a device control instance that is not on the same tile as its associated transport layer. These must be connected to a device control instance on another tile that has been initialized with DEVICE_CONTROL_HOST_MODE.

DEVICE_CONTROL_CALLBACK_ATTR

This attribute must be specified on all device control command handler callback functions provided by the application.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Services$$$Device Control$$$Device Control XCORE API£££modules/rtos/doc/programming_guide/reference/rtos_services/device_control/device_control_xcore.html#device-control-xcore-api

The following structures and functions are used to initialize and start a control instance on the xcore device.

typedef control_ret_t (*device_control_read_cmd_cb_t)(control_resid_t resid, control_cmd_t cmd, uint8_t *payload, size_t payload_len, void *app_data)

Function pointer type for application provided device control read command handler callback functions.

Called by device_control_servicer_cmd_recv() when a read command is received from the transport layer. The command consists of a resource ID, command value, and a payload_len. This handler must respond with a payload of the requested length.

Param resid:

[in] Resource ID. Indicates which resource the command is intended for.

Param cmd:

[in] Command code. Note that this will be in the range 0x80 to 0xFF because bit 7 set indicates a read command.

Param payload:

[out] Payload bytes of length payload_len that will be sent back over the transport layer in response to this read command.

Param payload_len:

[in] Requested size of the payload in bytes.

Param app_data:

[inout] A pointer to application specific data provided to device_control_servicer_cmd_recv(). How and if this is used is entirely up to the application.

Return:

CONTROL_SUCCESS if the handling of the read data by the device was successful. An error code otherwise.
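
A read command handler of this shape might look like the sketch below. It uses stand-in types so that it compiles without the SDK headers: the uint8_t typedefs follow this reference, while the reduced return enum, its numeric values, the command code, and the idea of reporting a volume byte held in app_data are all hypothetical. A real handler would use the library's own types and carry the DEVICE_CONTROL_CALLBACK_ATTR attribute.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in types for illustration. The enum is a reduced, hypothetical
 * version of control_ret_t; its numeric values are assumptions. */
typedef uint8_t example_resid_t;
typedef uint8_t example_cmd_t;
typedef enum { EXAMPLE_SUCCESS = 0, EXAMPLE_BAD_COMMAND } example_ret_t;

/* Hypothetical command: bit 7 is set, so it is a read command. */
#define EXAMPLE_CMD_GET_VOLUME 0x81u

example_ret_t example_read_cmd(example_resid_t resid, example_cmd_t cmd,
                               uint8_t *payload, size_t payload_len,
                               void *app_data)
{
    const uint8_t *volume = app_data;
    (void)resid; /* a single resource is assumed in this sketch */

    if (cmd != EXAMPLE_CMD_GET_VOLUME || payload_len < 1 || volume == NULL) {
        return EXAMPLE_BAD_COMMAND;
    }

    payload[0] = *volume;
    /* Zero the remaining bytes so the full requested length is defined. */
    for (size_t i = 1; i < payload_len; i++) {
        payload[i] = 0;
    }
    return EXAMPLE_SUCCESS;
}
```

The handler fills exactly payload_len bytes because the description above requires a response of the requested length.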

typedef control_ret_t (*device_control_write_cmd_cb_t)(control_resid_t resid, control_cmd_t cmd, const uint8_t *payload, size_t payload_len, void *app_data)

Function pointer type for application provided device control write command handler callback functions.

Called by device_control_servicer_cmd_recv() when a write command is received from the transport layer. The command consists of a resource ID, command value, payload, and the payload’s length.

Param resid:

[in] Resource ID. Indicates which resource the command is intended for.

Param cmd:

[in] Command code. Note that this will be in the range 0x00 to 0x7F because bit 7 clear indicates a write command.

Param payload:

[in] Payload bytes of length payload_len.

Param payload_len:

[in] The number of bytes in payload.

Param app_data:

[inout] A pointer to application specific data provided to device_control_servicer_cmd_recv(). How and if this is used is entirely up to the application.

Return:

CONTROL_SUCCESS if the handling of the write data by the device was successful. An error code otherwise.

control_ret_t device_control_request(device_control_t *ctx, control_resid_t resid, control_cmd_t cmd, size_t payload_len)

Must be called by the transport layer when a new request is received.

Precisely how each of the three command parameters resid, cmd, and payload_len is received is specific to the transport layer and not defined by this library.

Parameters:
  • ctx – A pointer to the associated device control instance.

  • resid – The received resource ID.

  • cmd – The received command value.

  • payload_len – The length in bytes of the payload that will follow.

Return values:
  • CONTROL_SUCCESS – if resid has been registered by a servicer.

  • CONTROL_BAD_COMMAND – if resid has not been registered by a servicer.

control_ret_t device_control_payload_transfer(device_control_t *ctx, uint8_t *payload_buf, size_t *buf_size, control_direction_t direction)

Must be called by the transport layer either when it receives a payload, or when it requires a payload to transmit.

Parameters:
  • ctx – A pointer to the associated device control instance.

  • payload_buf – A pointer to the payload buffer.

  • buf_size – A pointer to a variable containing the size of payload_buf. When direction is CONTROL_HOST_TO_DEVICE, no more than this number of bytes will be read from it. When direction is CONTROL_DEVICE_TO_HOST, this will be updated to the number of bytes actually written to payload_buf.

  • direction – The direction of the payload transfer. This must be CONTROL_HOST_TO_DEVICE when a payload has already been received and is inside payload_buf. This must be CONTROL_DEVICE_TO_HOST when a payload needs to be written into payload_buf by device_control_payload_transfer() before sending it.

Returns:

CONTROL_SUCCESS if everything works and the command is successfully handled by a registered servicer. An error code otherwise.

void device_control_payload_transfer_bidir(device_control_t *ctx, uint8_t *rx_buf, const size_t rx_size, uint8_t *tx_buf, size_t *tx_size)

Must be called by the transport layer when it receives a payload and requires a payload to transmit, for example, in a SPI transfer. The error status returned by the servicer handling the command is updated in the first byte of the tx_buf.

Parameters:
  • ctx – A pointer to the associated device control instance.

  • rx_buf – A pointer to the receive payload buffer.

  • rx_size – The size of rx_buf. No more than this number of bytes will be read from it.

  • tx_buf – A pointer to the transmit payload buffer.

  • tx_size – A pointer to a variable containing the size of tx_buf. This will be updated to the number of bytes actually written to tx_buf.

control_ret_t device_control_servicer_cmd_recv(device_control_servicer_t *ctx, device_control_read_cmd_cb_t read_cmd_cb, device_control_write_cmd_cb_t write_cmd_cb, void *app_data, unsigned timeout)

This is called by servicers to wait for and receive any commands received by the transport layer that contain one of the resource IDs registered by the servicer. This is also responsible for responding to read commands.

Parameters:
  • ctx – A pointer to the device control servicer context to receive commands for.

  • read_cmd_cb – The callback function to handle read commands for all resource IDs associated with the given servicer.

  • write_cmd_cb – The callback function to handle write commands for all resource IDs associated with the given servicer.

  • app_data – A pointer to application specific data to pass along to the provided callback functions. How and if this is used is entirely up to the application.

  • timeout – The number of RTOS ticks to wait before returning if no command is received.

Return values:
  • CONTROL_SUCCESS – if a command is successfully received and responded to.

  • CONTROL_ERROR – if no command is received before the function times out, or if there was a problem communicating back to the transport layer thread.

control_ret_t device_control_resources_register(device_control_t *ctx, unsigned timeout)

This must be called on the tile that runs the transport layer for the device control instance, and has initialized it with DEVICE_CONTROL_HOST_MODE. This must be called after calling device_control_start() and before the transport layer is started. It is to be run simultaneously with device_control_servicer_register() from other threads on any tiles associated with the device control instance. The number of servicers that must register is specified by the servicer_count parameter of device_control_init().

Parameters:
  • ctx – A pointer to the device control instance to register resources for.

  • timeout – The amount of time in RTOS ticks to wait for all servicers to register their resource IDs with device_control_servicer_register().

Return values:
  • CONTROL_SUCCESS – if all servicers successfully register their resource IDs before the timeout.

  • CONTROL_REGISTRATION_FAILED – otherwise.

control_ret_t device_control_servicer_register(device_control_servicer_t *ctx, device_control_t *device_control_ctx[], size_t device_control_ctx_count, const control_resid_t resources[], size_t num_resources)

Registers a servicer for a device control instance. Each servicer is responsible for handling any number of resource IDs. All commands received from the transport layer will be forwarded to the servicer that has registered the resource ID that is found in the command.

Servicers may be registered on any tile that has initialized a device control instance. This must be called after calling device_control_start().

Parameters:
  • ctx – A pointer to the device control servicer context to initialize.

  • device_control_ctx – An array of pointers to the device control instances to register the servicer with.

  • device_control_ctx_count – The number of device control instances to register the servicer with.

  • resources – Array of resource IDs to associate with this servicer.

  • num_resources – The number of resource IDs within resources.

control_ret_t device_control_start(device_control_t *ctx, uint8_t intertile_port, unsigned priority)

Starts a device control instance. This must be called by all tiles that have called device_control_init(). It may be called either before or after starting the RTOS, but must be called before registering the resources and servicers for this instance.

device_control_init() must be called on this device control instance prior to calling this.

Parameters:
  • ctx – A pointer to the device control instance to start.

  • intertile_port – The port to use with all intertile instances associated with this device control instance. If this device control instance is only used by one tile then this is unused.

  • priority – The priority of the task that will be created if the device control instance was initialized with DEVICE_CONTROL_CLIENT_MODE. This is unused on the tiles where this has been initialized with DEVICE_CONTROL_HOST_MODE. This task is used to listen for commands for a resource ID registered by a servicer running on this tile, but received by the transport layer that is running on another.

control_ret_t device_control_init(device_control_t *ctx, int mode, size_t servicer_count, rtos_intertile_t *intertile_ctx[], size_t intertile_count)

Initializes a device control instance.

This must be called by the tile that runs the transport layer (I2C, USB, etc) for the device control instance, as well as all tiles that will register device control servicers for it. It may be called either before or after starting the RTOS, but must be called before calling device_control_start().

Parameters:
  • ctx – A pointer to the device control context to initialize.

  • mode – Set to DEVICE_CONTROL_HOST_MODE if the command transport layer is on the same tile. Set to DEVICE_CONTROL_CLIENT_MODE if the command transport layer is on another tile.

  • servicer_count – The number of servicers that will be associated with this device control instance.

  • intertile_ctx – An array of intertile contexts used to communicate with other tiles.

  • intertile_count – The number of intertile contexts in the intertile_ctx array.

    When mode is DEVICE_CONTROL_HOST_MODE, this may be 0 if there are no servicers on other tiles, up to one per device control instance that has been initialized with DEVICE_CONTROL_CLIENT_MODE on other tiles.

    When mode is DEVICE_CONTROL_CLIENT_MODE then this must be 1, and the intertile context must connect to a device control instance on another tile that has been initialized with DEVICE_CONTROL_HOST_MODE.

Returns:

CONTROL_SUCCESS if the initialization was successful. An error status otherwise.

struct device_control_t
#include <device_control.h>

Struct representing a device control instance.

The members in this struct should not be accessed directly.

struct device_control_client_t
#include <device_control.h>

A device_control_t pointer may be cast to a pointer to this structure type and used with the device control API, provided it is initialized with DEVICE_CONTROL_CLIENT_MODE. This is not necessary to do, but will save a small amount of memory.

struct device_control_servicer_t
#include <device_control.h>

Struct representing a device control servicer instance.

The members in this struct should not be accessed directly.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Services$$$Device Control$$$Device Control Host API£££modules/rtos/doc/programming_guide/reference/rtos_services/device_control/device_control_host_api.html#device-control-host-api

The following structures and functions are used to initialize and call a control instance on the host.

control_ret_t control_init_i2c(unsigned char i2c_slave_address)

Initialize the I2C host (master) interface

Parameters:
  • i2c_slave_address – I2C address of the slave (controlled device)

Returns:

Whether the initialization was successful or not

control_ret_t control_cleanup_i2c(void)

Shutdown the I2C host (master) interface connection

Returns:

Whether the shutdown was successful or not

control_ret_t control_init_usb(int vendor_id, int product_id, int interface_num)

Initialize the USB host interface

Parameters:
  • vendor_id – Vendor ID of controlled USB device

  • product_id – Product ID of controlled USB device

  • interface_num – USB Control interface number of controlled device

Returns:

Whether the initialization was successful or not

control_ret_t control_cleanup_usb(void)

Shutdown the USB host interface connection

Returns:

Whether the shutdown was successful or not

control_ret_t control_init_spi_pi(spi_mode_t spi_mode, bcm2835SPIClockDivider clock_divider, long intertransation_delay_ns)

Initialize the SPI host (master) interface for the Raspberry Pi

Parameters:
  • spi_mode – Mode that the SPI will run in

  • clock_divider – The amount to divide the Raspberry Pi’s clock by, e.g. BCM2835_SPI_CLOCK_DIVIDER_1024 gives a clock of ~122kHz on the RPI 2.

  • intertransaction_delay – Delay in nanoseconds that will be applied between each SPI transaction. This is implemented with nanosleep() from time.h.

Returns:

Whether the initialization was successful or not

control_ret_t control_cleanup_spi(void)

Shutdown the SPI host (master) interface connection

Returns:

Whether the shutdown was successful or not

control_ret_t control_query_version(control_version_t *version)

Checks to see that the version of control library in the device is the same as the host

Parameters:
  • version – Reference to control version variable that is set on this call

Returns:

Whether the checking of control library version was successful or not

control_ret_t control_write_command(control_resid_t resid, control_cmd_t cmd, const uint8_t payload[], size_t payload_len)

Request to write to controllable resource inside the device. The command consists of a resource ID, command and a byte payload of length payload_len.

Parameters:
  • resid – Resource ID. Indicates which resource the command is intended for

  • cmd – Command code. Note that this will be in the range 0x00 to 0x7F because bit 7 clear indicates a write command

  • payload – Array of bytes which constitutes the data payload

  • payload_len – Size of the payload in bytes

Returns:

Whether the write to the device was successful or not

control_ret_t control_read_command(control_resid_t resid, control_cmd_t cmd, uint8_t payload[], size_t payload_len)

Request to read from controllable resource inside the device. The command consists of a resource ID, command and a byte payload of length payload_len.

Parameters:
  • resid – Resource ID. Indicates which resource the command is intended for

  • cmd – Command code. Note that this will be in the range 0x80 to 0xFF because bit 7 set indicates a read command

  • payload – Array of bytes which constitutes the data payload

  • payload_len – Size of the payload in bytes

Returns:

Whether the read from the device was successful or not

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Services$$$Device Control$$$Command Transport Protocol£££modules/rtos/doc/programming_guide/reference/rtos_services/device_control/device_control_protocol.html#command-transport-protocol
XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Services$$$Device Control$$$Transport protocol for control parameters£££modules/rtos/doc/programming_guide/reference/rtos_services/device_control/device_control_protocol.html#transport-protocol-for-control-parameters

Control parameters are converted to an array of bytes in network byte order (big endian) before they’re sent over the transport protocol. For example, to set a control parameter to integer value 305419896 which corresponds to hex 0x12345678, the array of bytes sent over the transport protocol would be {0x12, 0x34, 0x56, 0x78}. Similarly, a 4 byte payload {0x00, 0x01, 0x23, 0x22} read over the transport protocol is interpreted as an integer value 0x00012322.
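
The byte ordering described above can be sketched in C as follows. This is a minimal illustration only; the helper names pack_u32_be and unpack_u32_be are hypothetical and are not part of the device control API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Serialize a 32-bit value into 4 bytes, most significant byte first. */
void pack_u32_be(uint32_t value, uint8_t out[4])
{
    assert(out != NULL);
    out[0] = (uint8_t)(value >> 24);
    out[1] = (uint8_t)(value >> 16);
    out[2] = (uint8_t)(value >> 8);
    out[3] = (uint8_t)value;
}

/* Rebuild a 32-bit value from 4 bytes received in network byte order. */
uint32_t unpack_u32_be(const uint8_t in[4])
{
    assert(in != NULL);
    return ((uint32_t)in[0] << 24) |
           ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |
            (uint32_t)in[3];
}
```

Packing 305419896 with this sketch produces the bytes {0x12, 0x34, 0x56, 0x78}, and unpacking {0x00, 0x01, 0x23, 0x22} gives 0x00012322, matching the examples in the text. Shifting explicitly keeps the result independent of the host's native byte order.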

In addition to the control parameter values, commands include Resource ID, Command ID and Payload Length fields that must be communicated from the host to the device. The Resource ID is an 8-bit identifier that identifies the resource within the device that the command is for. The Command ID is an 8-bit identifier used to identify a command for a resource in the device. The Payload Length is the length of the data in bytes that the host wants to write to the device or read from the device.

The payload length is interpreted differently for GET_ and SET_ commands. For SET_ commands, the payload length is simply the number of bytes of control parameters to write to the device. For example, the payload length for a SET_ command that sets a control parameter of type int32 would be 4. For GET_ commands, the payload length is 1 more than the number of bytes of control parameters to read from the device. For example, for a GET_ command that reads a parameter of type int32, the payload length would be 5. The one extra byte is used for status and is the first byte (payload[0]) of the payload received from the device. In the example above, payload[0] would be the status byte and payload[1]..payload[4] would be the 4 bytes that make up the value of the control parameter.
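
As a rough sketch of the length rule above, the helpers below compute the payload length for each command type and split a GET_ response for an int32 parameter into its status byte and value. The function names are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Payload length for a SET_ command: just the parameter bytes. */
size_t set_payload_len(size_t param_bytes)
{
    return param_bytes;
}

/* Payload length for a GET_ command: parameter bytes plus one status byte. */
size_t get_payload_len(size_t param_bytes)
{
    return param_bytes + 1;
}

/*
 * Split a GET_ response for an int32 parameter into status and value.
 * payload[0] is the status byte; payload[1]..payload[4] hold the value,
 * most significant byte first.
 */
uint8_t parse_get_int32(const uint8_t payload[5], int32_t *value)
{
    assert(payload != NULL && value != NULL);
    uint32_t raw = ((uint32_t)payload[1] << 24) |
                   ((uint32_t)payload[2] << 16) |
                   ((uint32_t)payload[3] << 8)  |
                    (uint32_t)payload[4];
    *value = (int32_t)raw;
    return payload[0];
}
```

For an int32 parameter, set_payload_len(sizeof(int32_t)) is 4 and get_payload_len(sizeof(int32_t)) is 5, as in the examples in the text.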

The table below lists the different values of the status byte and the action the user is expected to take for each status:

Values for returned status byte

Return code     Value   Description
ctrl_done       0       Read command successful. The payload bytes contain valid payload returned from the device
ctrl_wait       1       Read command not serviced. Retry until ctrl_done status returned
ctrl_invalid    3       Error in read command. Abort and debug

GET_ commands need the extra status byte since the device might not return the control parameter value immediately due to timing constraints. In that case the status byte is ctrl_wait, and the user is expected to retry the GET_ command until the status is returned as ctrl_done. The first GET_ command is placed in a queue and is serviced by the end of the 15 ms audio frame. Once the status byte indicates ctrl_done, the rest of the bytes in the payload indicate the control parameter value.
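
The retry behaviour can be sketched as a small loop around a transport call. Everything here is an assumption made for illustration: the get_once_fn hook, the max_attempts limit, the uppercase enum names, and the fake device stand in for whatever transport the host actually uses. Only the status values come from the table above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Status byte values from the table above (enum names are illustrative). */
enum { CTRL_DONE = 0, CTRL_WAIT = 1, CTRL_INVALID = 3 };

/* Hypothetical transport hook: issues one GET_ command and fills payload. */
typedef void (*get_once_fn)(uint8_t payload[], size_t payload_len, void *ctx);

/*
 * Retry a GET_ command while the device answers ctrl_wait.
 * Returns the final status byte, or CTRL_WAIT if max_attempts is exhausted.
 */
int get_with_retry(get_once_fn get_once, void *ctx,
                   uint8_t payload[], size_t payload_len,
                   unsigned max_attempts)
{
    assert(get_once != NULL && payload != NULL && payload_len >= 1);
    for (unsigned attempt = 0; attempt < max_attempts; attempt++) {
        get_once(payload, payload_len, ctx);
        if (payload[0] != CTRL_WAIT) {
            return payload[0];   /* ctrl_done or ctrl_invalid */
        }
        /* A real host would typically pause here before retrying. */
    }
    return CTRL_WAIT;
}

/* Fake device for illustration: answers ctrl_wait until *ctx reaches zero. */
void fake_get_once(uint8_t payload[], size_t payload_len, void *ctx)
{
    unsigned *waits_left = (unsigned *)ctx;
    if (*waits_left > 0) {
        (*waits_left)--;
        payload[0] = CTRL_WAIT;
        return;
    }
    payload[0] = CTRL_DONE;
    for (size_t i = 1; i < payload_len; i++) {
        payload[i] = (uint8_t)i;   /* dummy parameter bytes */
    }
}
```

Bounding the number of attempts is a design choice of this sketch, so that a host does not spin forever if the device never leaves ctrl_wait.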

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Services$$$Device Control$$$Transporting control parameters over I2C£££modules/rtos/doc/programming_guide/reference/rtos_services/device_control/device_control_protocol.html#transporting-control-parameters-over-i2c

This section describes the I2C command sequence when issuing read and write commands to the device.

The first byte sent over I2C after start contains the device address and information about whether this is an I2C read transaction or a write transaction. This byte is 0x58 for a write command or 0x59 for a read command. These values are derived by left shifting the device address (0x2c) by 1 and doing a bitwise OR of the resulting value with 0 for an I2C write and 1 for an I2C read.
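
The derivation of the address byte can be written as a one-line helper. The function name i2c_address_byte is hypothetical; it is shown only to make the shift and OR explicit.

```c
#include <assert.h>
#include <stdint.h>

/* Build the first I2C byte: 7-bit address in bits 7..1, R/W flag in bit 0. */
uint8_t i2c_address_byte(uint8_t addr7, int is_read)
{
    assert(addr7 < 0x80);   /* must fit in 7 bits */
    return (uint8_t)((addr7 << 1) | (is_read ? 1u : 0u));
}
```

With the device address 0x2c, i2c_address_byte(0x2c, 0) gives 0x58 and i2c_address_byte(0x2c, 1) gives 0x59.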

The byte sequence sent between I2C start and stop for SET_ commands is shown in the figure below.

../_images/set_byte_sequence.png

For GET_ commands, the I2C command sequence consists of a write command followed by a read command, with a repeated start between the two commands. The write command writes the resource ID, command ID and the expected data length to the device, and the read command reads the status byte followed by the rest of the payload that makes up the control parameter value. The figure below shows the I2C byte sequence sent and received for a GET_ command.

../_images/get_byte_sequence.png
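
The write phase of a GET_ transaction can be sketched as filling a three-byte request. This assumes the bytes are sent in the order the text lists them (resource ID, command ID, expected data length); the figure above is the authoritative layout, and the function name and macro are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GET_REQUEST_LEN 3   /* resource ID, command ID, expected data length */

/*
 * Fill the bytes written before the repeated start of a GET_ transaction.
 * Byte order is an assumption taken from the prose description.
 * Returns the number of bytes to write.
 */
size_t build_get_request(uint8_t resid, uint8_t cmd, uint8_t payload_len,
                         uint8_t out[GET_REQUEST_LEN])
{
    assert(out != NULL);
    assert(payload_len >= 1);   /* must leave room for the status byte */
    out[0] = resid;
    out[1] = cmd;
    out[2] = payload_len;
    return GET_REQUEST_LEN;
}
```

After writing these bytes, the host would issue the repeated start and read payload_len bytes, the first of which is the status byte.
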
XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Services$$$Device Control$$$Transporting control parameters over USB£££modules/rtos/doc/programming_guide/reference/rtos_services/device_control/device_control_protocol.html#transporting-control-parameters-over-usb

Use the vendor_id 0x20B1, product_id 0x0020 and interface number 0 to initialize for USB.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Services$$$Device Control$$$Floating point to fixed point (Q format) conversion£££modules/rtos/doc/programming_guide/reference/rtos_services/device_control/device_control_protocol.html#floating-point-to-fixed-point-q-format-conversion

Numbers with fractional parts can be represented as floating-point or fixed-point numbers. Floating point formats are widely used but carry performance overheads. Fixed point formats can improve system efficiency and are used extensively within the XVF3610. Fixed point numbers have the position of the decimal point fixed and this is indicated as a part of the format description.

In this document, Q format is used to describe fixed point number formats, with the representation given as Qm.n format where m is the number of bits reserved for the sign and integer part of the number and n is the number of bits reserved for the fractional part of the number. The position of the decimal point is a trade-off between the range of values supported and the resolution provided by the fractional bits.

The dynamic range of Qm.n format is $-2^{m-1}$ to $2^{m-1} - 2^{-n}$, with a resolution of $2^{-n}$.

To convert a floating-point format number to Qm.n format fixed-point number:

  • Multiply the floating-point number by $2^{n}$

  • Round the result to the nearest integer

  • The resulting integer number is the Qm.n fixed-point representation of the initial floating-point number

To convert a Qm.n fixed-point number to floating-point:

  • Divide the fixed-point number by $2^{n}$

  • The resulting decimal number is a floating-point representation of the fixed-point number.

Converting a number into fixed point format and then back to a floating point number may introduce an error of up to $\pm 2^{-(n+1)}$.

Example:

To represent a floating-point number 14.765467 in Q8.24 format, the equivalent fixed-point number would be $14.765467 \times 2^{24} = 247723429.2$, which rounds to 247723429.

To get back the floating-point number given the Q8.24 number 247723429, calculate $247723429 \div 2^{24}$ and get back the floating-point number as 14.76546699. The difference of 0.00000001 is within the error bound of $\pm 2^{-25}$, which is approximately ±0.00000003.
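
The conversion steps above can be sketched in C as follows. The function names are hypothetical, the value is assumed to fit in a 32-bit Qm.n word (no range check is performed), and rounding is to nearest with halves rounded away from zero.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a floating-point value to Qm.n: scale by 2^n, then round. */
int32_t float_to_q(double x, unsigned n)
{
    assert(n < 32);
    double scaled = x * (double)(1ULL << n);
    /* Round to nearest; the caller must ensure the result fits in 32 bits. */
    return (int32_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

/* Convert a Qm.n value back to floating point: divide by 2^n. */
double q_to_float(int32_t q, unsigned n)
{
    assert(n < 32);
    return (double)q / (double)(1ULL << n);
}
```

With n = 24, float_to_q(14.765467, 24) yields 247723429, and converting that value back with q_to_float differs from the original by less than $2^{-25}$.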

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Services$$$Concurrency Support£££modules/rtos/doc/programming_guide/reference/rtos_services/concurrency_support/concurrency_support.html#concurrency-support

The concurrency support software service contains a multiple reader single writer lock to support multithreaded applications that need to safely support shared access to a single hardware or software resource. This implementation supports either reader preferred or writer preferred locks.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Services$$$Concurrency Support$$$Concurrency Support API£££modules/rtos/doc/programming_guide/reference/rtos_services/concurrency_support/concurrency_support_api.html#concurrency-support-api

The following structures and functions are used to initialize a multiple reader single writer lock instance.

enum mrsw_lock_type_t

Values:

enumerator MRSW_READER_PREFERRED
enumerator MRSW_WRITER_PREFERRED
enumerator MRSW_COUNT
typedef struct mrsw_lock mrsw_lock_t

Struct representing an MRSW instance.

The members in this struct should not be accessed directly.

typedef struct read_pref_mrsw_lock read_pref_mrsw_lock_t

Struct representing a reader preferred MRSW lock.

The members in this struct should not be accessed directly.

typedef struct write_pref_mrsw_lock write_pref_mrsw_lock_t

Struct representing a writer preferred MRSW lock.

The members in this struct should not be accessed directly.

rtos_osal_status_t mrsw_lock_create(mrsw_lock_t *ctx, char *name, mrsw_lock_type_t type)

Create an MRSW lock

Parameters:
  • ctx – A pointer to an uninitialized lock context

  • name – An optional ASCII name

  • type – The type of lock

Returns:

RTOS_OSAL_SUCCESS on success

rtos_osal_status_t mrsw_lock_delete(mrsw_lock_t *ctx)

Destroy an MRSW lock

Note: This does not check if it is safe to delete locks

Parameters:
  • ctx – A pointer to the associated lock context

Returns:

RTOS_OSAL_SUCCESS on success, RTOS_OSAL_ERROR otherwise

struct mrsw_lock
#include <mrsw_lock.h>

Struct representing an MRSW instance.

The members in this struct should not be accessed directly.

struct read_pref_mrsw_lock
#include <mrsw_lock.h>

Struct representing a reader preferred MRSW lock.

The members in this struct should not be accessed directly.

struct write_pref_mrsw_lock
#include <mrsw_lock.h>

Struct representing a writer preferred MRSW lock.

The members in this struct should not be accessed directly.

The following functions are used to use a multiple reader single writer lock instance as a reader.

rtos_osal_status_t mrsw_lock_reader_get(mrsw_lock_t *ctx, unsigned timeout)

Attempt to acquire a lock as a reader.

Parameters:
  • ctx – A pointer to the associated lock context

  • timeout – A timeout before giving up

Returns:

RTOS_OSAL_SUCCESS on success, RTOS_OSAL_TIMEOUT on timeout, RTOS_OSAL_ERROR otherwise

rtos_osal_status_t mrsw_lock_reader_put(mrsw_lock_t *ctx)

Give an acquired lock as a reader.

Note: User must not give a lock they do not own.

Parameters:
  • ctx – A pointer to the associated lock context

Returns:

RTOS_OSAL_SUCCESS on success, RTOS_OSAL_ERROR otherwise

The following functions are used to use a multiple reader single writer lock instance as a writer.

rtos_osal_status_t mrsw_lock_writer_get(mrsw_lock_t *ctx, unsigned timeout)

Attempt to acquire a lock as a writer.

Parameters:
  • ctx – A pointer to the associated lock context

  • timeout – A timeout before giving up

Returns:

RTOS_OSAL_SUCCESS on success, RTOS_OSAL_TIMEOUT on timeout, RTOS_OSAL_ERROR otherwise

rtos_osal_status_t mrsw_lock_writer_put(mrsw_lock_t *ctx)

Give an acquired lock as a writer.

Note: User must not give a lock they do not own.

Parameters:
  • ctx – A pointer to the associated lock context

Returns:

RTOS_OSAL_SUCCESS on success, RTOS_OSAL_ERROR otherwise

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Services$$$Generic Pipeline£££modules/rtos/doc/programming_guide/reference/rtos_services/generic_pipeline/generic_pipeline.html#generic-pipeline

The generic pipeline service provides a generic construct to create multithreaded pipelines. This can be used to create a variety of sequential operations on data, such as an audio processing pipeline.

generic_pipeline_init() creates stage_count tasks. In the first stage, the application provided input function pointer is called. The data is then passed to the first stage function. After the first stage function, the data is passed by an RTOS queue to the subsequent stage function. Middle stage functions receive from the previous stage queue, call the stage function, and output to the next stage queue. The last stage function will receive from the previous stage queue, call the stage function, and then call the output function pointer.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Services$$$Generic Pipeline$$$Generic Pipeline Example£££modules/rtos/doc/programming_guide/reference/rtos_services/generic_pipeline/generic_pipeline_example.html#generic-pipeline-example

This code snippet is an example of creating a pipeline to consume a buffer.

Example generic pipeline use
static void *input_func(void *input_app_data)
{
    uint32_t* data = pvPortMalloc(100 * sizeof(uint32_t));

    /* Populate some dummy data */
    for(int i=0; i<100; i++)
    {
        data[i] = i;
    }

    return data;
}

static int output_func(void *data, void *output_app_data)
{
    /* The packet is the uint32_t buffer allocated by input_func() */
    uint32_t *values = (uint32_t *)data;

    /* Use data here */
    for(int i=0; i<100; i++)
    {
        rtos_printf("val[%d] = %u\n", i, (unsigned)values[i]);
    }

    return 1;   /* Return nonzero value for generic pipeline to implicitly free the packet */
}

static void stage0(void *data)
{
    /* Perform operation on data here*/
    ;
}

static void stage1(void *data)
{
    /* Perform operation on data here*/
    ;
}

static void stage2(void *data)
{
    /* Perform operation on data here*/
    ;
}
Example generic pipeline use
const pipeline_stage_t stages[] = {
    (pipeline_stage_t)stage0,
    (pipeline_stage_t)stage1,
    (pipeline_stage_t)stage2,
};

const configSTACK_DEPTH_TYPE stage_stack_sizes[] = {
    configMINIMAL_STACK_SIZE + RTOS_THREAD_STACK_SIZE(stage0) + RTOS_THREAD_STACK_SIZE(input_func),
    configMINIMAL_STACK_SIZE + RTOS_THREAD_STACK_SIZE(stage1),
    configMINIMAL_STACK_SIZE + RTOS_THREAD_STACK_SIZE(stage2) + RTOS_THREAD_STACK_SIZE(output_func),
};

generic_pipeline_init((pipeline_input_t)input_func,
                    (pipeline_output_t)output_func,
                    NULL,
                    NULL,
                    stages,
                    (const size_t*) stage_stack_sizes,
                    configMAX_PRIORITIES,
                    sizeof(stages) / sizeof(stages[0]));   /* stage_count: number of entries in stages[] */
XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$API Reference$$$RTOS Services$$$Generic Pipeline$$$Generic Pipeline API£££modules/rtos/doc/programming_guide/reference/rtos_services/generic_pipeline/generic_pipeline_api.html#generic-pipeline-api

The following structures and functions are used to initialize and start a generic pipeline instance.

typedef void *(*pipeline_input_t)(void *input_data)

Function pointer type for application provided generic pipeline input callback functions.

Called by the first generic_pipeline_stage() when the stage wants input data. This data pointer is provided to the first stage function to be processed.

Param input_data:

A pointer to application specific data

Return:

A frame pointer to be used by the pipeline stages

typedef int (*pipeline_output_t)(void *data, void *output_data)

Function pointer type for application provided generic pipeline output callback functions.

Called by the last generic_pipeline_stage() when the stage is done processing the data.

Param data:

A pointer to the processed data

Param output_data:

A pointer to application specific data

Return:

0 to take ownership of the data pointer; otherwise (nonzero), the generic pipeline frees the data internally

typedef void (*pipeline_stage_t)(void *data)

Function pointer type for application provided generic pipeline stage callback functions.

Called by each generic_pipeline_stage() after input data is received.

Param data:

A pointer to the data. This buffer is used for both input and output.

void generic_pipeline_init(const pipeline_input_t input, const pipeline_output_t output, void *const input_data, void *const output_data, const pipeline_stage_t *const stage_functions, const size_t *const stage_stack_word_sizes, const int pipeline_priority, const int stage_count)

Create a multistage generic pipeline.

This function will create a multistage pipeline, creating a task per stage and connecting them via queues. Each stage task follows the convention:

  • Get input data

  • Process data

  • Push output data

For the first stage, the input data are provided by the input callback. For the final stage, the output data are provided to the output callback.

Parameters:
  • input – A function pointer called to get input data

  • output – A function pointer called to give output data

  • input_data – A pointer to application specific data to pass to the input callback function

  • output_data – A pointer to application specific data to pass to the output callback function

  • stage_functions – An array of stage function pointers

  • stage_stack_word_sizes – The stack size of each stage. Note: The first stage must have enough stack for the stage function + input function. Likewise, the last stage must have enough stack for the stage function + output function.

  • pipeline_priority – The priority of all pipeline tasks

  • stage_count – The number of stages. The limit is 10 stages.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$FAQs£££modules/rtos/doc/programming_guide/faq.html#faqs

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$FAQs$$$What is the memory overhead of the FreeRTOS kernel?£££modules/rtos/doc/programming_guide/faq.html#what-is-the-memory-overhead-of-the-freertos-kernel

The FreeRTOS kernel can be configured to require as little as 9kB of RAM (per tile). In a typical application, expect the requirement to be closer to 16kB of RAM (per tile).

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$FAQs$$$How do I determine the number of words to allocate for use as a task’s stack?£££modules/rtos/doc/programming_guide/faq.html#how-do-i-determine-the-number-of-words-to-allocate-for-use-as-a-task-s-stack

Since tasks run within FreeRTOS, the RTOS stack requirement must be known at compile time. In FreeRTOS applications on most other microcontrollers, the general practice is to create a task with a large amount of stack, use the FreeRTOS stack debug functions to determine the worst case runtime usage of stack, and then adjust the stack memory value accordingly. The problem with this method is that the stack usage of any given thread varies greatly based on the functions that are called within it, so a change to the code or to the compiler optimization settings means the optimal task stack size has to be redetermined. As a result, many FreeRTOS applications are written in a way that wastes memory, by providing tasks with far more stack than they should need. Additionally, stack overflow bugs can remain hidden for a long time, and even when bugs do manifest, the source can be difficult to pinpoint.

The XTC Tools address this issue by creating a symbol that represents the maximum stack requirement of any function at compile time. Using the RTOS_THREAD_STACK_SIZE() macro for the stack words argument when creating a FreeRTOS task guarantees that the optimal stack requirement is used, provided that the function does not call through function pointers and cannot recurse infinitely.

xTaskCreate((TaskFunction_t) example_task,
            "example_task",
            RTOS_THREAD_STACK_SIZE(example_task),
            NULL,
            EXAMPLE_TASK_PRIORITY,
            NULL);

If function pointers are used within a thread, then the application programmer must annotate the code with the appropriate function pointer group attribute. For recursive functions, the only option is to specify the stack manually. See Appendix A - Guiding Stack Size Calculation in the XTC Tools documentation for more information.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$FAQs$$$Can I use xcore resources like channels, timers and hw_locks?£££modules/rtos/doc/programming_guide/faq.html#can-i-use-xcore-resources-like-channels-timers-and-hw-locks

You are free to use channels, ports, timers, etc… in your FreeRTOS applications. However, some considerations need to be made. The RTOS kernel knows about RTOS primitives. For example, if RTOS thread A attempts to take a semaphore, the kernel is free to schedule other tasks in thread A’s place while thread A is waiting for some other task to give the semaphore. The RTOS kernel does not know anything about xcore resources. For example, if RTOS thread A attempts to recv on a channel, the kernel is not free to schedule other tasks in its place while thread A is waiting for some other task to send to the other end of the channel. A developer should be aware that blocking calls on xcore resources will block a FreeRTOS thread. This may be OK as long as it is carefully considered in the application design. There are a variety of methods to handle the decoupling of xcore and RTOS resources. These can be best seen in the various RTOS drivers, which wrap the realtime IO hardware imitation layer.

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$Common Issues£££modules/rtos/doc/programming_guide/common_issues.html#common-issues

XCORE ® -VOICE Solutions$$$RTOS Programming Guide$$$Common Issues$$$Task Stack Space£££modules/rtos/doc/programming_guide/common_issues.html#task-stack-space

One easy mistake to make in FreeRTOS is not providing enough stack space for a created task. A vast number of questions exist online about how to select the FreeRTOS stack size, with the most common answer being to create the task with more than enough stack, force the worst case stack condition (not always trivial), and then use the FreeRTOS debug function uxTaskGetStackHighWaterMark() to determine how much you can decrease the stack. This method leaves plenty of room for error and must be done at runtime, and therefore on a build by build basis. The static analysis tools provided by the XTC Tools greatly simplify this process since they calculate the exact stack required for a given function call. The macro RTOS_THREAD_STACK_SIZE will return the nstackwords symbol for a given thread plus the additional space required for the kernel ISRs. Using this macro for every task creation will ensure that there is appropriate stack space for each thread, and thus no stack overflow.

xTaskCreate((TaskFunction_t) task_foo,
            "foo",
            RTOS_THREAD_STACK_SIZE(task_foo),
            NULL,
            configMAX_PRIORITIES-1,
            NULL);

XCORE ® -VOICE Solutions$$$Peripheral IO Programming Guide£££modules/io/doc/programming_guide/index.html#peripheral-io-programming-guide

XCORE ® -VOICE Solutions$$$Peripheral IO Programming Guide$$$Overview£££modules/io/doc/shared/introduction.html#overview

The peripheral IO framework is a collection of IO libraries written in C for xcore.ai. It includes software-defined peripherals for:

  • UART - transmit and receive

  • I2C - master and slave

  • I2S - master and slave and TDM slave Tx

  • SPI - master and slave

XCORE ® -VOICE Solutions$$$Peripheral IO Programming Guide$$$API Reference£££modules/io/doc/programming_guide/reference/api.html#api-reference

XCORE ® -VOICE Solutions$$$Peripheral IO Programming Guide$$$API Reference$$$UART Library£££modules/io/doc/programming_guide/reference/uart/uart.html#uart-library

This library provides a software-defined UART (universal asynchronous receiver transmitter) allowing you to communicate with other UART-enabled devices in your system. A UART is a communications interface using a single wire per direction, allowing either half- or full-duplex communication. The components in this library are controlled via C and behave as a UART transmitter and/or receiver peripheral.

Various configuration options are available including baud rate (individually settable per direction), number of data bits (between 5 and 8), parity (EVEN, ODD or NONE) and number of stop bits (1 or 2). The UART does not support flow control signals. Only a single 1b IO port per UART direction is needed.
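The configuration options above determine how many bit periods each frame occupies on the wire. The following is a minimal host-side sketch, not part of the UART library: the `sketch_` names and helper functions are hypothetical, and the arithmetic assumes the standard UART frame layout of one start bit, the data bits, an optional parity bit and the stop bits.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for the parity options (hypothetical names, not the library enum). */
typedef enum { SKETCH_PARITY_NONE, SKETCH_PARITY_EVEN, SKETCH_PARITY_ODD } sketch_parity_t;

/* Bit periods used by one frame: start + data + optional parity + stop. */
static unsigned frame_bits(unsigned data_bits, sketch_parity_t parity, unsigned stop_bits)
{
    assert(data_bits >= 5 && data_bits <= 8);
    assert(stop_bits == 1 || stop_bits == 2);
    return 1u + data_bits + (parity == SKETCH_PARITY_NONE ? 0u : 1u) + stop_bits;
}

/* Parity bit that makes the number of ones (data plus parity) even or odd.
   Only meaningful when parity is EVEN or ODD. */
static unsigned parity_bit(uint8_t data, unsigned data_bits, sketch_parity_t parity)
{
    unsigned ones = 0;
    for (unsigned i = 0; i < data_bits; i++) {
        ones += (data >> i) & 1u;
    }
    unsigned even_bit = ones & 1u; /* 1 when the data holds an odd number of ones */
    return (parity == SKETCH_PARITY_ODD) ? (even_bit ^ 1u) : even_bit;
}

/* Whole frames per second when frames are sent back to back with no idle time. */
static unsigned frames_per_second(unsigned baud, unsigned bits_per_frame)
{
    return baud / bits_per_frame;
}
```

For instance, 8 data bits with no parity and 1 stop bit gives a 10-bit frame, so a 115200 baud link carries at most 11520 such frames per second.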

The Tx UART supports up to 1152000 baud unbuffered and 576000 baud buffered with a 75MHz logical core. The Rx UART supports up to 700000 baud unbuffered and 422400 baud buffered with a 75MHz logical core. Proportionally higher rates are achievable using a higher logical core MHz.
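The proportional scaling described above can be estimated with simple arithmetic. This is a sketch only: `scaled_max_baud` is a hypothetical helper, it assumes the scaling is exactly linear in logical core MHz, and any result should be treated as an estimate to confirm on hardware.

```c
#include <assert.h>
#include <stdint.h>

/* The quoted maximum rates are for a 75 MHz logical core. */
#define SKETCH_REF_CORE_MHZ 75u

/* Estimate the maximum baud rate at another logical core speed,
   assuming purely linear scaling from the 75 MHz figure. */
static uint32_t scaled_max_baud(uint32_t max_baud_at_75mhz, uint32_t core_mhz)
{
    assert(core_mhz > 0);
    return (uint32_t)(((uint64_t)max_baud_at_75mhz * core_mhz) / SKETCH_REF_CORE_MHZ);
}
```

Under that assumption, the 422400 baud buffered Rx figure would scale to 563200 baud on a 100 MHz logical core.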

The UART receive supports standard error detection including START, PARITY and FRAMING errors. A callback mechanism is included to notify the user of these conditions.
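To illustrate what these three error conditions mean, the sketch below decodes one frame from line levels that are assumed to have been sampled at the middle of each bit. It runs on a host and is not the library's receiver: the `sketch_` names, the sample array layout and the single stop bit are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    SKETCH_RX_OK,
    SKETCH_START_BIT_ERROR, /* start bit was not low when sampled */
    SKETCH_PARITY_ERROR,    /* parity bit does not match the data */
    SKETCH_FRAMING_ERROR    /* stop bit was not high when sampled */
} sketch_rx_status_t;

/*
 * samples[0] is the start bit, followed by data_bits (LSB first),
 * one parity bit if parity_mode != 0, then one stop bit.
 * parity_mode: 0 = none, 1 = even, 2 = odd.
 */
static sketch_rx_status_t decode_frame(const uint8_t *samples, unsigned data_bits,
                                       int parity_mode, uint8_t *data_out)
{
    size_t idx = 0;
    if (samples[idx++] != 0) {
        return SKETCH_START_BIT_ERROR;
    }

    uint8_t data = 0;
    unsigned ones = 0;
    for (unsigned i = 0; i < data_bits; i++) {
        unsigned bit = samples[idx++] & 1u;
        data |= (uint8_t)(bit << i);
        ones += bit;
    }

    if (parity_mode != 0) {
        ones += samples[idx++] & 1u;
        unsigned want_odd = (parity_mode == 2) ? 1u : 0u;
        if ((ones & 1u) != want_odd) {
            return SKETCH_PARITY_ERROR;
        }
    }

    if (samples[idx] != 1) {
        return SKETCH_FRAMING_ERROR;
    }

    *data_out = data;
    return SKETCH_RX_OK;
}
```

In the library itself these conditions are reported through the error callback rather than a return value.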

The UART may be used in blocking mode, where the call to Tx/Rx does not return until the stop bit is complete. It may also be used in ISR/buffered mode, where the UART Rx and/or Tx operates in the background using a FIFO and callbacks to manage data flow and error conditions. Cycles are stolen from the logical core that set up the interrupt. In ISR/buffered mode, additional callbacks are supported indicating the UNDERRUN condition when the Tx buffer is empty and OVERRUN when the Rx buffer is full.

UART data wires

  • Tx – Transmit line controlled by UART Tx

  • Rx – Receive line controlled by UART Rx

All UART functions can be accessed via the uart.h header:

#include "uart.h"
XCORE ® -VOICE Solutions$$$Peripheral IO Programming Guide$$$API Reference$$$UART Library$$$UART Tx£££modules/io/doc/programming_guide/reference/uart/uart_tx.html#uart-tx
XCORE ® -VOICE Solutions$$$Peripheral IO Programming Guide$$$API Reference$$$UART Library$$$UART Tx$$$UART Tx Usage£££modules/io/doc/programming_guide/reference/uart/uart_tx.html#uart-tx-usage

The following code snippet demonstrates the basic blocking usage of a UART Tx device.

#include <xs1.h>
#include "uart.h"

void uart_tx_blocking_example(void){

    uart_tx_t uart;

    port_t p_uart_tx = XS1_PORT_1A;
    hwtimer_t tmr = hwtimer_alloc();

    uint8_t tx_data[4] = {0x01, 0x02, 0x04, 0x08};

    // Initialize the UART Tx
    uart_tx_blocking_init(&uart, p_uart_tx, 115200, 8, UART_PARITY_NONE, 1, tmr);

    // Transfer some data; each call returns once the stop bit has been sent
    for(size_t i = 0; i < sizeof(tx_data); i++){
       uart_tx(&uart, tx_data[i]);
    }
}
XCORE ® -VOICE Solutions$$$Peripheral IO Programming Guide$$$API Reference$$$UART Library$$$UART Tx$$$UART Tx Usage ISR/Buffered£££modules/io/doc/programming_guide/reference/uart/uart_tx.html#uart-tx-usage-isr-buffered

The following code snippet demonstrates the usage of a UART Tx device used in ISR/Buffered mode:

#include <xs1.h>
#include "uart.h"


// Called from the UART ISR when the Tx buffer becomes empty
HIL_UART_TX_CALLBACK_ATTR void tx_empty_callback(void *app_data){
    volatile int *tx_empty = (volatile int *)app_data;
    *tx_empty = 1;
}

// Named so that it does not clash with the library's uart_tx()
void uart_tx_buffered_example(void){

    uart_tx_t uart;
    port_t p_uart_tx = XS1_PORT_1A;
    hwtimer_t tmr = hwtimer_alloc();
    uint8_t buffer[64 + 1] = {0}; // Note buffer size plus one

    uint8_t tx_data[4] = {0x01, 0x02, 0x04, 0x08};
    volatile int tx_empty = 0;

    // Initialize the UART Tx
    uart_tx_init(&uart, p_uart_tx, 115200, 8, UART_PARITY_NONE, 1, tmr,
                 buffer, sizeof(buffer), tx_empty_callback, (void *)&tx_empty);

    // Transfer some data
    for(size_t i = 0; i < sizeof(tx_data); i++){
       uart_tx(&uart, tx_data[i]);
    }

    // Wait for it to complete
    while(!tx_empty);
}
XCORE ® -VOICE Solutions$$$Peripheral IO Programming Guide$$$API Reference$$$UART Library$$$UART Tx$$$UART Tx API£££modules/io/doc/programming_guide/reference/uart/uart_tx.html#uart-tx-api

The following structures and functions are used to initialize and start a UART Tx instance.

enum uart_parity

Enum type representing the different parity options.

Values:

enumerator UART_PARITY_NONE
enumerator UART_PARITY_EVEN
enumerator UART_PARITY_ODD

enum uart_callback_code_t

Enum type representing the callback error codes.

Values:

enumerator UART_RX_COMPLETE
enumerator UART_UNDERRUN_ERROR
enumerator UART_START_BIT_ERROR
enumerator UART_PARITY_ERROR
enumerator UART_FRAMING_ERROR
enumerator UART_OVERRUN_ERROR

enum uart_state_t

Enum type representing the different states for the UART logic.

Values:

enumerator UART_IDLE
enumerator UART_START
enumerator UART_DATA
enumerator UART_PARITY
enumerator UART_STOP

typedef enum uart_parity uart_parity_t

Enum type representing the different parity options.

void uart_tx_init(uart_tx_t *uart, port_t tx_port, uint32_t baud_rate, uint8_t data_bits, uart_parity_t parity, uint8_t stop_bits, hwtimer_t tmr, uint8_t *tx_buff, size_t buffer_size_plus_one, void (*uart_tx_empty_callback_fptr)(void *app_data), void *app_data)

Initializes a UART Tx I/O interface. Passing a valid buffer will enable buffered mode with ISR for use in bare-metal applications.

Parameters:
  • uart – The uart_tx_t context to initialise.

  • tx_port – The port used to transmit the UART frames.

  • baud_rate – The baud rate of the UART in bits per second.

  • data_bits – The number of data bits per frame sent.

  • parity – The type of parity used. See uart_parity_t above.

  • stop_bits – The number of stop bits asserted at the end of the frame.

  • tmr – The resource id of the timer to be used. Polling mode will be used if set to 0.

  • tx_buff – Pointer to a buffer. Optional. If set to zero the UART will run in blocking mode. If initialised to a valid buffer, the UART will be interrupt driven.

  • buffer_size_plus_one – Size of the buffer if enabled in tx_buff. Note that the buffer allocation and size argument must be one greater than needed. For example, buff[65] for a 64-byte buffer.

  • uart_tx_empty_callback_fptr – Callback function pointer for UART buffer empty in buffered mode.

  • app_data – A pointer to application specific data provided by the application. Used to share data between this callback function and the application.
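One common reason for a FIFO to need one slot more than its usable capacity is that keeping a slot permanently empty lets "full" and "empty" be told apart using only a read index and a write index. The sketch below shows that general technique on a host. It is an assumption about why buffer_size_plus_one exists, not the library's implementation, and every `sketch_`/`fifo_` name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t *buf;
    size_t size_plus_one; /* slots allocated; usable capacity is one less */
    size_t head;          /* next slot to write */
    size_t tail;          /* next slot to read */
} sketch_fifo_t;

static void fifo_init(sketch_fifo_t *f, uint8_t *buf, size_t size_plus_one)
{
    assert(size_plus_one >= 2);
    f->buf = buf;
    f->size_plus_one = size_plus_one;
    f->head = 0;
    f->tail = 0;
}

/* Empty when both indices are equal. */
static int fifo_is_empty(const sketch_fifo_t *f)
{
    return f->head == f->tail;
}

/* Full when advancing the write index would make it equal the read index. */
static int fifo_is_full(const sketch_fifo_t *f)
{
    return ((f->head + 1) % f->size_plus_one) == f->tail;
}

/* Returns 1 on success, 0 if the FIFO is full (an overrun-like condition). */
static int fifo_push(sketch_fifo_t *f, uint8_t byte)
{
    if (fifo_is_full(f)) {
        return 0;
    }
    f->buf[f->head] = byte;
    f->head = (f->head + 1) % f->size_plus_one;
    return 1;
}

/* Returns 1 on success, 0 if the FIFO is empty (an underrun-like condition). */
static int fifo_pop(sketch_fifo_t *f, uint8_t *byte)
{
    if (fifo_is_empty(f)) {
        return 0;
    }
    *byte = f->buf[f->tail];
    f->tail = (f->tail + 1) % f->size_plus_one;
    return 1;
}
```

With a 5-slot array, four bytes can be queued before a push is refused, which mirrors the buff[65] for a 64-byte buffer rule above.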

void uart_tx_blocking_init(uart_tx_t *uart, port_t tx_port, uint32_t baud_rate, uint8_t data_bits, uart_parity_t parity, uint8_t stop_bits, hwtimer_t tmr)

Initializes a UART Tx I/O interface. The API is hard wired to blocking mode where the call to uart_tx will return at the end of sending the stop bit.

Parameters:
  • uart – The uart_tx_t context to initialise.

  • tx_port – The port used to transmit the UART frames.

  • baud_rate – The baud rate of the UART in bits per second.

  • data_bits – The number of data bits per frame sent.

  • parity – The type of parity used. See uart_parity_t above.

  • stop_bits – The number of stop bits asserted at the end of the frame.

  • tmr – The resource id of the timer to be used. Polling mode will be used if set to 0.

void uart_tx(uart_tx_t *uart, uint8_t data)

Transmits a single UART frame with parameters as specified in uart_tx_init().

Parameters:
  • uart – The uart_tx_t context to use.

  • data – The word to transmit.

void uart_tx_deinit(uart_tx_t *uart)

De-initializes the specified UART Tx interface. This disables the port also. The timer, if used, needs to be freed by the application.

Parameters:
  • uart – The uart_tx_t context to de-initialise.

UART_START_BIT_ERROR_VAL

Define which sets the enum start point of RX errors. This is relied upon by the RTOS drivers and allows optimisation of error handling.

HIL_UART_TX_CALLBACK_ATTR

This attribute must be specified on the UART TX UNDERRUN callback function provided by the application. It ensures the correct stack usage is calculated.

HIL_UART_RX_CALLBACK_ATTR

This attribute must be specified on the UART Rx callback functions (both ERROR and Rx complete callbacks) provided by the application. It ensures the stack usage is correctly calculated.

struct uart_tx_t
#include <uart.h>

Struct to hold a UART Tx context.

The members in this struct should not be accessed directly. Use the API provided instead.

XCORE ® -VOICE Solutions$$$Peripheral IO Programming Guide$$$API Reference$$$UART Library$$$UART Rx£££modules/io/doc/programming_guide/reference/uart/uart_rx.html#uart-rx
XCORE ® -VOICE Solutions$$$Peripheral IO Programming Guide$$$API Reference$$$UART Library$$$UART Rx$$$UART Rx Usage£££modules/io/doc/programming_guide/reference/uart/uart_rx.html#uart-rx-usage

The following code snippet demonstrates the basic usage of a UART Rx device where the function call to Rx returns after the stop bit has been sampled. The function blocks until a complete byte has been received.

#include <xs1.h>
#include <print.h>
#include "uart.h"


// Reports receive errors; called by the UART Rx when an error is detected
HIL_UART_RX_CALLBACK_ATTR void rx_error_callback(uart_callback_code_t callback_code, void *app_data){
    switch(callback_code){
        case UART_START_BIT_ERROR:
            printstrln("UART_START_BIT_ERROR");
            break;
        case UART_PARITY_ERROR:
            printstrln("UART_PARITY_ERROR");
            break;
        case UART_FRAMING_ERROR:
            printstrln("UART_FRAMING_ERROR");
            break;
        case UART_OVERRUN_ERROR:
            printstrln("UART_OVERRUN_ERROR");
            break;
        case UART_UNDERRUN_ERROR:
            printstrln("UART_UNDERRUN_ERROR");
            break;
        default:
            printstr("Unexpected callback code: ");
            printintln(callback_code);
    }
}

// Named so that it does not clash with the library's uart_rx()
void uart_rx_blocking_example(void){

    uart_rx_t uart;

    port_t p_uart_rx = XS1_PORT_1B;
    hwtimer_t tmr = hwtimer_alloc();

    char test_rx[16];

    // Initialize the UART Rx
    uart_rx_blocking_init(  &uart, p_uart_rx, 115200, 8, UART_PARITY_NONE, 1, tmr,
                            rx_error_callback, &uart);

    // Receive some data; each call blocks until a complete byte has arrived
    for(size_t i = 0; i < sizeof(test_rx); i++){
       test_rx[i] = uart_rx(&uart);
    }
}
XCORE ® -VOICE Solutions$$$Peripheral IO Programming Guide$$$API Reference$$$UART Library$$$UART Rx$$$UART Rx Usage ISR/Buffered£££modules/io/doc/programming_guide/reference/uart/uart_rx.html#uart-rx-usage-isr-buffered

The following code snippet demonstrates the usage of a UART Rx device used in ISR/Buffered mode:

#include <xs1.h>
#include <print.h>
#include "uart.h"

#define NUM_RX_WORDS 16


// Reports receive errors; called by the UART Rx when an error is detected
HIL_UART_RX_CALLBACK_ATTR void rx_error_callback(uart_callback_code_t callback_code, void *app_data){
    switch(callback_code){
        case UART_START_BIT_ERROR:
            printstrln("UART_START_BIT_ERROR");
            break;
        case UART_PARITY_ERROR:
            printstrln("UART_PARITY_ERROR");
            break;
        case UART_FRAMING_ERROR:
            printstrln("UART_FRAMING_ERROR");
            break;
        case UART_OVERRUN_ERROR:
            printstrln("UART_OVERRUN_ERROR");
            break;
        case UART_UNDERRUN_ERROR:
            printstrln("UART_UNDERRUN_ERROR");
            break;
        default:
            printstr("Unexpected callback code: ");
            printintln(callback_code);
    }
}


// Called each time one word has been received into the buffer
HIL_UART_RX_CALLBACK_ATTR void rx_callback(void *app_data){
    volatile unsigned *bytes_received = (volatile unsigned *)app_data;
    *bytes_received += 1;
}

// Named so that it does not clash with the library's uart_rx()
void uart_rx_buffered_example(void){

    uart_rx_t uart;
    port_t p_uart_rx = XS1_PORT_1A;
    hwtimer_t tmr = hwtimer_alloc();
    uint8_t buffer[64 + 1] = {0}; // Note buffer size plus one

    volatile unsigned bytes_received = 0;

    // Initialize the UART Rx
    uart_rx_init(&uart, p_uart_rx, 115200, 8, UART_PARITY_NONE, 1, tmr,
                 buffer, sizeof(buffer), rx_callback, rx_error_callback,
                 (void *)&bytes_received);

    // Wait for 16 bytes of data
    while(bytes_received < NUM_RX_WORDS);

    // Get the data
    uint8_t test_rx[NUM_RX_WORDS];
    for(int i = 0; i < NUM_RX_WORDS; i++){
        test_rx[i] = uart_rx(&uart);
    }
}
XCORE ® -VOICE Solutions$$$Peripheral IO Programming Guide$$$API Reference$$$UART Library$$$UART Rx$$$UART Rx API£££modules/io/doc/programming_guide/reference/uart/uart_rx.html#uart-rx-api

The following structures and functions are used to initialize and start a UART Rx instance.

void uart_rx_init(uart_rx_t *uart, port_t rx_port, uint32_t baud_rate, uint8_t data_bits, uart_parity_t parity, uint8_t stop_bits, hwtimer_t tmr, uint8_t *rx_buff, size_t buffer_size_plus_one, void (*uart_rx_complete_callback_fptr)(void *app_data), void (*uart_rx_error_callback_fptr)(uart_callback_code_t callback_code, void *app_data), void *app_data)

Initializes a UART Rx I/O interface. Passing a valid buffer will enable buffered mode with ISR for use in bare-metal applications.

Parameters:
  • uart – The uart_rx_t context to initialise.

  • rx_port – The port used to receive the UART frames.

  • baud_rate – The baud rate of the UART in bits per second.

  • data_bits – The number of data bits per frame received.

  • parity – The type of parity used. See uart_parity_t above.

  • stop_bits – The number of stop bits asserted at the end of the frame.

  • tmr – The resource id of the timer to be used. Polling mode will be used if set to 0.

  • rx_buff – Pointer to a buffer. Optional. If set to zero the UART will run in blocking mode. If initialised to a valid buffer, the UART will be interrupt driven.

  • buffer_size_plus_one – Size of the buffer if enabled in rx_buff. Note that the buffer allocation and size argument must be one greater than needed. For example, buff[65] for a 64-byte buffer.

  • uart_rx_complete_callback_fptr – Callback function pointer for UART rx complete (one word) in buffered mode only. Optionally NULL.

  • uart_rx_error_callback_fptr – Callback function pointer for UART rx errors. The error is contained in cb_code in the uart_rx_t struct.

  • app_data – A pointer to application specific data provided by the application. Used to share data between this callback function and the application.

void uart_rx_blocking_init(uart_rx_t *uart, port_t rx_port, uint32_t baud_rate, uint8_t data_bits, uart_parity_t parity, uint8_t stop_bits, hwtimer_t tmr, void (*uart_rx_error_callback_fptr)(uart_callback_code_t callback_code, void *app_data), void *app_data)

Initializes a UART Rx I/O interface. This API is fixed to blocking mode, in which the call to uart_rx returns as soon as the stop bit has been sampled.

Parameters:
  • uart – The uart_rx_t context to initialise.

  • rx_port – The port used to receive the UART frames.

  • baud_rate – The baud rate of the UART in bits per second.

  • data_bits – The number of data bits per frame received.

  • parity – The type of parity used. See uart_parity_t above.

  • stop_bits – The number of stop bits asserted at the end of the frame.

  • tmr – The resource id of the timer to be used. Polling mode will be used if set to 0.

  • uart_rx_error_callback_fptr – Callback function pointer for UART rx errors. The error is contained in cb_code in the uart_rx_t struct.

  • app_data – A pointer to application specific data provided by the application. Used to share data between the error callback function and the application.

uint8_t uart_rx(uart_rx_t *uart)

Receives a single UART frame with parameters as specified in uart_rx_init().

Parameters:
  • uart – The uart_rx_t context to receive from.

Returns:

The word received in the UART frame. In buffered mode it gets the oldest received word.

void uart_rx_deinit(uart_rx_t *uart)

De-initializes the specified UART Rx interface. This disables the port also. The timer, if used, needs to be freed by the application.

Parameters:
  • uart – The uart_rx_t context to de-initialise.

struct uart_rx_t
#include <uart.h>

Struct to hold a UART Rx context.

The members in this struct should not be accessed directly. Use the API provided instead.

XCORE ® -VOICE Solutions$$$Peripheral IO Programming Guide$$$API Reference$$$I2C Library£££modules/io/doc/programming_guide/reference/i2c/i2c.html#i2c-library

A software defined I2C library that allows you to control an I2C bus via xcore ports. I2C is a two-wire hardware serial interface, first developed by Philips. The components in the library are controlled via C and can either act as I2C master or slave.

The library is compatible with multiple slave devices existing on the same bus. The I2C master component can be used by multiple tasks within the xcore device (each addressing the same or different slave devices).

The library can also be used to implement multiple I2C physical interfaces on a single xcore device simultaneously.

All signals are designed to comply with the timings in the I2C specification.

Note that the following optional parts of the I2C specification are not supported:

  • Multi-master arbitration

  • 10-bit slave addressing

  • General call addressing

  • Software reset

  • START byte

  • Device ID

  • Fast-mode Plus, High-speed mode, Ultra Fast-mode

I2C consists of two signals: a clock line (SCL) and a data line (SDA). Both these signals are open-drain and require external resistors to pull the line up if no device is driving the signal down. The correct value for the resistors can be found in the I2C specification.

All I2C functions can be accessed via the i2c.h header:

#include <i2c.h>
XCORE ® -VOICE Solutions$$$Peripheral IO Programming Guide$$$API Reference$$$I2C Library$$$I2C Master£££modules/io/doc/programming_guide/reference/i2c/i2c_master.html#i2c-master
XCORE ® -VOICE Solutions$$$Peripheral IO Programming Guide$$$API Reference$$$I2C Library$$$I2C Master$$$I2C Master Usage£££modules/io/doc/programming_guide/reference/i2c/i2c_master.html#i2c-master-usage

The following code snippet demonstrates the basic usage of an I2C master device.

#include <xs1.h>
#include "i2c.h"

void i2c_master_example(void){

    i2c_master_t i2c_ctx;

    port_t p_scl = XS1_PORT_1A;
    port_t p_sda = XS1_PORT_1B;

    uint8_t data[1] = {0x99};

    // Initialize the master: SCL and SDA on separate 1-bit ports, 100 kbit/s
    i2c_master_init(
                &i2c_ctx,
                p_scl, 0, 0,
                p_sda, 0, 0,
                100);

    // Write one byte to the device at address 0x33, then send a stop bit
    i2c_master_write(&i2c_ctx, 0x33, data, 1, NULL, 1);

    // Shutdown
    i2c_master_shutdown(&i2c_ctx);
}
XCORE ® -VOICE Solutions$$$Peripheral IO Programming Guide$$$API Reference$$$I2C Library$$$I2C Master$$$I2C Master API£££modules/io/doc/programming_guide/reference/i2c/i2c_master.html#i2c-master-api

The following structures and functions are used to initialize and start an I2C master instance.

enum i2c_res_t

Status codes for I2C master operations

Values:

enumerator I2C_NACK

The slave has NACKed the last byte.

enumerator I2C_ACK

The slave has ACKed the last byte.

enumerator I2C_STARTED

The requested I2C transaction has started.

enumerator I2C_NOT_STARTED

The requested I2C transaction could not start.

typedef struct i2c_master_struct i2c_master_t

Type representing an I2C master context

i2c_res_t i2c_master_write(i2c_master_t *ctx, uint8_t device_addr, uint8_t buf[], size_t n, size_t *num_bytes_sent, int send_stop_bit)

Writes data to an I2C bus as a master.

Parameters:
  • ctx – A pointer to the I2C master context to use.

  • device_addr – The address of the device to write to.

  • buf – The buffer containing data to write.

  • n – The number of bytes to write.

  • num_bytes_sent – The function will set this value to the number of bytes actually sent. On success, this will be equal to n but it will be less if the slave sends an early NACK on the bus and the transaction fails.

  • send_stop_bit – If this is non-zero then a stop bit will be sent on the bus after the transaction. This is usually required for normal operation. If this parameter is zero then no stop bit will be sent. In this case, no other task can use the component until a stop bit has been sent.

Returns:

I2C_ACK if the write was acknowledged by the device, I2C_NACK otherwise.

i2c_res_t i2c_master_read(i2c_master_t *ctx, uint8_t device_addr, uint8_t buf[], size_t n, int send_stop_bit)

Reads data from an I2C bus as a master.

Parameters:
  • ctx – A pointer to the I2C master context to use.

  • device_addr – The address of the device to read from.

  • buf – The buffer to fill with data.

  • n – The number of bytes to read.

  • send_stop_bit – If this is non-zero then a stop bit will be sent on the bus after the transaction. This is usually required for normal operation. If this parameter is zero then no stop bit will be sent. In this case, no other task can use the component until a stop bit has been sent.

Returns:

I2C_ACK if the read was acknowledged by the device, I2C_NACK otherwise.
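On the bus, a 7-bit I2C address is sent in the upper seven bits of the first byte after the start condition, with the read/write flag in bit 0. The host-side sketch below shows that packing. `i2c_address_byte` is a hypothetical helper, and it assumes device_addr is passed to the API as an unshifted 7-bit address, which the usage example (0x33) and the lack of 10-bit addressing support suggest but the text does not state outright.

```c
#include <assert.h>
#include <stdint.h>

/* Build the first byte of an I2C transaction from a 7-bit address.
   is_read: non-zero for a read, zero for a write. */
static uint8_t i2c_address_byte(uint8_t device_addr, int is_read)
{
    assert(device_addr <= 0x7F);
    return (uint8_t)((device_addr << 1) | (is_read ? 1u : 0u));
}
```

Under that assumption, a write to device 0x33 would appear as 0x66 on a bus analyser and a read as 0x67.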

void i2c_master_stop_bit_send(i2c_master_t *ctx)

Send a stop bit to an I2C bus as a master.

This function will cause a stop bit to be sent on the bus. It should be used to complete/abort a transaction if the send_stop_bit argument was not set when calling the i2c_master_read() or i2c_master_write() functions.

Parameters:
  • ctx – A pointer to the I2C master context to use.

void i2c_master_init(i2c_master_t *ctx, const port_t p_scl, const uint32_t scl_bit_position, const uint32_t scl_other_bits_mask, const port_t p_sda, const uint32_t sda_bit_position, const uint32_t sda_other_bits_mask, const unsigned kbits_per_second)

Implements an I2C master device on one or two single or multi-bit ports.

Parameters:
  • ctx – A pointer to the I2C master context to initialize.

  • p_scl – The port containing SCL. This may be either the same as or different than p_sda.

  • scl_bit_position – The bit number of the SCL line on the port p_scl.

  • scl_other_bits_mask – A value that is ORed into the port value driven to p_scl both when SCL is high and low. The bit representing SCL (as well as SDA if they share the same port) must be set to 0.

  • p_sda – The port containing SDA. This may be either the same as or different than p_scl.

  • sda_bit_position – The bit number of the SDA line on the port p_sda.

  • sda_other_bits_mask – A value that is ORed into the port value driven to p_sda both when SDA is high and low. The bit representing SDA (as well as SCL if they share the same port) must be set to 0.

  • kbits_per_second – The speed of the I2C bus. The maximum value allowed is 400.
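When SCL or SDA sits on a multi-bit port, the bit position and the "other bits" mask together determine the value written to the port. The sketch below shows one way the documented parameters could combine. It is an illustration only: `port_drive_value` is a hypothetical helper, and the real driver may behave differently, for example by releasing an open-drain line rather than driving it high.

```c
#include <assert.h>
#include <stdint.h>

/* Compose a port value from the line level, its bit position and the mask
   of other bits that must keep a fixed value on the same port. */
static uint32_t port_drive_value(unsigned line_level, uint32_t bit_position,
                                 uint32_t other_bits_mask)
{
    assert(bit_position < 32);
    /* As documented, the bit representing the line itself must be 0 in the mask. */
    assert((other_bits_mask & ((uint32_t)1 << bit_position)) == 0);
    return ((uint32_t)(line_level & 1u) << bit_position) | other_bits_mask;
}
```

For example, with the line on bit 2 and a mask of 0x8, the composed value is 0xC when the line is high and 0x8 when it is low.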

void i2c_master_shutdown(i2c_master_t *ctx)

Shuts down the I2C master device.

This function disables the ports associated with the I2C master and deallocates its timer if it was not provided by the application.

If subsequent reads or writes need to be performed, then i2c_master_init() must be called again first.

Parameters:
  • ctx – A pointer to the I2C master context to shut down.

struct i2c_master_struct
#include <i2c.h>

Struct to hold an I2C master context.

The members in this struct should not be accessed directly.

XCORE ® -VOICE Solutions$$$Peripheral IO Programming Guide$$$API Reference$$$I2C Library$$$I2C Slave£££modules/io/doc/programming_guide/reference/i2c/i2c_slave.html#i2c-slave
XCORE ® -VOICE Solutions$$$Peripheral IO Programming Guide$$$API Reference$$$I2C Library$$$I2C Slave$$$I2C Slave Usage£££modules/io/doc/programming_guide/reference/i2c/i2c_slave.html#i2c-slave-usage

The following code snippet demonstrates the basic usage of an I2C slave device.

#include <xs1.h>
#include "i2c.h"

port_t p_scl = XS1_PORT_1A;
port_t p_sda = XS1_PORT_1B;

// Setup callbacks
//  NOTE: See API or SDK examples for more on using the callbacks
i2c_callback_group_t i_i2c = {
     .ack_read_request = (ack_read_request_t) i2c_ack_read_req,
     .ack_write_request = (ack_write_request_t) i2c_ack_write_req,
     .master_requires_data = (master_requires_data_t) i2c_master_req_data,
     .master_sent_data = (master_sent_data_t) i2c_master_sent_data,
     .stop_bit = (stop_bit_t) i2c_stop_bit,
     .shutdown = (shutdown_t) i2c_shutdown,
     .app_data = NULL,
};

// Start the slave device in this thread
//  NOTE: You may wish to launch the slave device in a different thread.
//        See the XTC Tools documentation reference for lib_xcore.
i2c_slave(&i_i2c, p_scl, p_sda, 0x3c);
XCORE ® -VOICE Solutions$$$Peripheral IO Programming Guide$$$API Reference$$$I2C Library$$$I2C Slave$$$I2C Slave API£££modules/io/doc/programming_guide/reference/i2c/i2c_slave.html#i2c-slave-api

The following structures and functions are used to initialize and start an I2C slave instance.

enum i2c_slave_ack

I2C Slave Response

This type is used to describe the I2C slave response.

Values:

enumerator I2C_SLAVE_ACK

ACK to accept request

enumerator I2C_SLAVE_NACK

NACK to ignore request

typedef enum i2c_slave_ack i2c_slave_ack_t

I2C Slave Response

This type is used to describe the I2C slave response.

typedef i2c_slave_ack_t (*ack_read_request_t)(void *app_data)

The bus master has requested a read.

This callback function is called if the bus master requests a read from this slave device.

At this point the slave can choose to accept the request (and drive an ACK signal back to the master) or not (and drive a NACK signal).

Param app_data:

A pointer to application specific data provided by the application. Used to share data between the callback functions and the application.

Return:

The callback must return either I2C_SLAVE_ACK or I2C_SLAVE_NACK.

typedef i2c_slave_ack_t (*ack_write_request_t)(void *app_data)

The bus master has requested a write.

This callback function is called if the bus master requests a write to this slave device.

At this point the slave can choose to accept the request (and drive an ACK signal back to the master) or not (and drive a NACK signal).

Param app_data:

A pointer to application specific data provided by the application. Used to share data between the callback functions and the application.

Return:

The callback must return either I2C_SLAVE_ACK or I2C_SLAVE_NACK.

typedef uint8_t (*master_requires_data_t)(void *app_data)

The bus master requires data.

This callback function is called when the bus master requires data from this slave device.

Param app_data:

A pointer to application specific data provided by the application. Used to share data between the callback functions and the application.

Return:

A byte of data to send to the master.

typedef i2c_slave_ack_t (*master_sent_data_t)(void *app_data, uint8_t data)

The bus master has sent some data.

This callback function is called when the bus master has transferred a byte of data to this slave device.

Param app_data:

A pointer to application specific data provided by the application. Used to share data between the callback functions and the application.

Param data:

The byte of data received from the bus master.

Return:

The callback must return either I2C_SLAVE_ACK or I2C_SLAVE_NACK.

typedef void (*stop_bit_t)(void *app_data)

The bus master has sent a stop bit.

This callback function is called when a stop bit is sent by the bus master.

Param app_data:

A pointer to application specific data provided by the application. Used to share data between the callback functions and the application.

typedef int (*shutdown_t)(void *app_data)

Shuts down the I2C slave device.

This function can be used to stop the I2C slave task. It will disable the SCL and SDA ports and then return.

Param app_data:

A pointer to application specific data provided by the application. Used to share data between the callback functions and the application.

Return:

  • Non-zero if the I2C slave task should shut down.

  • Zero if the I2C slave task should continue running.

void i2c_slave(const i2c_callback_group_t *const i2c_cbg, port_t p_scl, port_t p_sda, uint8_t device_addr)

I2C slave task.

This function instantiates an I2C slave device.

Parameters:
  • i2c_cbg – The I2C callback group pointing to the application’s functions to use for initialization and getting and receiving frames. Also points to application specific data which will be shared between the callbacks.

  • p_scl – The SCL port of the I2C bus. This should be a 1 bit port. If not, the SCL pin must be at bit 0 and the other bits unused.

  • p_sda – The SDA port of the I2C bus. This should be a 1 bit port. If not, the SDA pin must be at bit 0 and the other bits unused.

  • device_addr – The address of the slave device.

I2C_CALLBACK_ATTR

This attribute must be specified on all I2C callback functions provided by the application.

struct i2c_callback_group_t
#include <i2c.h>

Callback group representing callback events that can occur during the operation of the I2C slave task. Must be initialized by the application prior to passing it to one of the I2C tasks.
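The callbacks cooperate through the shared app_data pointer. The host-side sketch below imitates a small register-file slave using functions with the same shapes as the documented callback typedefs. It does not use the library: the `sketch_` types, the `on_` function names and the "first written byte selects the register" protocol are all assumptions made for illustration. In a real application each callback would also carry I2C_CALLBACK_ATTR and be placed in an i2c_callback_group_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for the slave ACK/NACK response type. */
typedef enum { SKETCH_ACK, SKETCH_NACK } sketch_ack_t;

/* Application state shared between the callbacks through app_data. */
typedef struct {
    uint8_t regs[4];
    uint8_t reg_ptr;
    int have_ptr; /* 0 until the first byte of a write selects a register */
} sketch_regfile_t;

static sketch_ack_t on_ack_read_request(void *app_data)
{
    (void)app_data;
    return SKETCH_ACK;
}

static sketch_ack_t on_ack_write_request(void *app_data)
{
    /* A new write starts: expect a register index first. */
    ((sketch_regfile_t *)app_data)->have_ptr = 0;
    return SKETCH_ACK;
}

static sketch_ack_t on_master_sent_data(void *app_data, uint8_t data)
{
    sketch_regfile_t *rf = (sketch_regfile_t *)app_data;
    if (!rf->have_ptr) {
        if (data >= sizeof(rf->regs)) {
            return SKETCH_NACK; /* no such register */
        }
        rf->reg_ptr = data;
        rf->have_ptr = 1;
        return SKETCH_ACK;
    }
    if (rf->reg_ptr >= sizeof(rf->regs)) {
        return SKETCH_NACK; /* ran past the last register */
    }
    rf->regs[rf->reg_ptr++] = data;
    return SKETCH_ACK;
}

static uint8_t on_master_requires_data(void *app_data)
{
    sketch_regfile_t *rf = (sketch_regfile_t *)app_data;
    if (rf->reg_ptr >= sizeof(rf->regs)) {
        return 0xFF; /* nothing left to read */
    }
    return rf->regs[rf->reg_ptr++];
}
```

Calling the functions in the order a master would trigger them (write request, register index, data byte, then a read request) is enough to exercise the logic without any hardware.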

XCORE ® -VOICE Solutions$$$Peripheral IO Programming Guide$$$API Reference$$$I2C Library$$$I2C Registers£££modules/io/doc/programming_guide/reference/i2c/i2c_registers.html#i2c-registers
XCORE ® -VOICE Solutions$$$Peripheral IO Programming Guide$$$API Reference$$$I2C Library$$$I2C Registers$$$I2C Register API£££modules/io/doc/programming_guide/reference/i2c/i2c_registers.html#i2c-register-api

The following structures and functions are used to read and write I2C registers.

enum i2c_regop_res_t

This type is used by the supplementary I2C register read/write functions to report back on whether the operation was a success or not.

Values:

enumerator I2C_REGOP_SUCCESS

The operation was successful.

enumerator I2C_REGOP_DEVICE_NACK

The operation was NACKed when sending the device address, so either the device is missing or busy.

enumerator I2C_REGOP_INCOMPLETE

The operation was NACKed halfway through by the slave.

inline uint8_t read_reg(i2c_master_t *ctx, uint8_t device_addr, uint8_t reg, i2c_regop_res_t *result)

Read an 8-bit register on a slave device.

This function reads from an 8-bit addressed, 8-bit register in an I2C device. The function reads the data by sending the register address followed by reading the register data from the device at the specified device address.

Note

No stop bit is transmitted between the write and the read. The operation is performed as one transaction using a repeated start.

Parameters:
  • ctx – A pointer to the I2C master context to use.

  • device_addr – The address of the device to read from.

  • reg – The address of the register to read from.

  • result – Indicates whether the read completed successfully. Will be set to I2C_REGOP_DEVICE_NACK if the slave NACKed, and I2C_REGOP_SUCCESS on successful completion of the read.

Returns:

The value of the register.

inline uint8_t read_reg8_addr16(i2c_master_t *ctx, uint8_t device_addr, uint16_t reg, i2c_regop_res_t *result)

Read an 8-bit register on a slave device.

This function reads from a 16-bit addressed, 8-bit register in an I2C device. The function reads the data by sending the register address followed by reading the register data from the device at the specified device address.

Note

No stop bit is transmitted between the write and the read. The operation is performed as one transaction using a repeated start.

Parameters:
  • ctx – A pointer to the I2C master context to use.

  • device_addr – The address of the device to read from.

  • reg – The address of the register to read from.

  • result – Indicates whether the read completed successfully. Will be set to I2C_REGOP_DEVICE_NACK if the slave NACKed, and I2C_REGOP_SUCCESS on successful completion of the read.

Returns:

The value of the register.

inline uint16_t read_reg16_addr8(i2c_master_t *ctx, uint8_t device_addr, uint8_t reg, i2c_regop_res_t *result)

Read a 16-bit register on a slave device.

This function reads from an 8-bit addressed, 16-bit register in an I2C device. The function reads the data by sending the register address followed by reading the register data from the device at the specified device address.

Note

No stop bit is transmitted between the write and the read. The operation is performed as one transaction using a repeated start.

Parameters:
  • ctx – A pointer to the I2C master context to use.

  • device_addr – The address of the device to read from.

  • reg – The address of the register to read from.

  • result – Indicates whether the read completed successfully. Will be set to I2C_REGOP_DEVICE_NACK if the slave NACKed, and I2C_REGOP_SUCCESS on successful completion of the read.

Returns:

The value of the register.

inline uint16_t read_reg16(i2c_master_t *ctx, uint8_t device_addr, uint16_t reg, i2c_regop_res_t *result)

Read a 16-bit register on a slave device.

This function reads from a 16-bit addressed, 16-bit register in an I2C device. The function reads the data by sending the register address followed by reading the register data from the device at the specified device address.

Note

No stop bit is transmitted between the write and the read. The operation is performed as one transaction using a repeated start.

Parameters:
  • ctx – A pointer to the I2C master context to use.

  • device_addr – The address of the device to read from.

  • reg – The address of the register to read from.

  • result – Indicates whether the read completed successfully. Will be set to I2C_REGOP_DEVICE_NACK if the slave NACKed, and I2C_REGOP_SUCCESS on successful completion of the read.

Returns:

The value of the register.

inline i2c_regop_res_t write_reg(i2c_master_t *ctx, uint8_t device_addr, uint8_t reg, uint8_t data)

Write to an 8-bit register on an I2C device.

This function writes to an 8-bit addressed, 8-bit register in an I2C device. The function writes the data by sending the register address followed by the register data to the device at the specified device address.

Parameters:
  • ctx – A pointer to the I2C master context to use.

  • device_addr – The address of the device to write to.

  • reg – The address of the register to write to.

  • data – The 8-bit value to write.

Returns:
  • I2C_REGOP_DEVICE_NACK if the address is NACKed.

  • I2C_REGOP_INCOMPLETE if not all data was ACKed.

  • I2C_REGOP_SUCCESS on successful completion of the write.

inline i2c_regop_res_t write_reg8_addr16(i2c_master_t *ctx, uint8_t device_addr, uint16_t reg, uint8_t data)

Write to an 8-bit register on an I2C device.

This function writes to a 16-bit addressed, 8-bit register in an I2C device. The function writes the data by sending the register address followed by the register data to the device at the specified device address.

Parameters:
  • ctx – A pointer to the I2C master context to use.

  • device_addr – The address of the device to write to.

  • reg – The address of the register to write to.

  • data – The 8-bit value to write.

Returns:
  • I2C_REGOP_DEVICE_NACK if the address is NACKed.

  • I2C_REGOP_INCOMPLETE if not all data was ACKed.

  • I2C_REGOP_SUCCESS on successful completion of the write.

inline i2c_regop_res_t write_reg16_addr8(i2c_master_t *ctx, uint8_t device_addr, uint8_t reg, uint16_t data)

Write to a 16-bit register on an I2C device.

This function writes to an 8-bit addressed, 16-bit register in an I2C device. The function writes the data by sending the register address followed by the register data to the device at the specified device address.

Parameters:
  • ctx – A pointer to the I2C master context to use.

  • device_addr – The address of the device to write to.

  • reg – The address of the register to write to.

  • data – The 16-bit value to write.

Returns:
  • I2C_REGOP_DEVICE_NACK if the address is NACKed.

  • I2C_REGOP_INCOMPLETE if not all data was ACKed.

  • I2C_REGOP_SUCCESS on successful completion of the write.

inline i2c_regop_res_t write_reg16(i2c_master_t *ctx, uint8_t device_addr, uint16_t reg, uint16_t data)

Write to a 16-bit register on an I2C device.

This function writes to a 16-bit addressed, 16-bit register in an I2C device. The function writes the data by sending the register address followed by the register data to the device at the specified device address.

Parameters:
  • ctx – A pointer to the I2C master context to use.

  • device_addr – The address of the device to write to.

  • reg – The address of the register to write to.

  • data – The 16-bit value to write.

Returns:
  • I2C_REGOP_DEVICE_NACK if the address is NACKed.

  • I2C_REGOP_INCOMPLETE if not all data was ACKed.

  • I2C_REGOP_SUCCESS on successful completion of the write.
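A common pattern built on these helpers is a read-modify-write of a single register, with the result checked at each step. The following is a minimal sketch only. The i2c_master_t and i2c_regop_res_t definitions and the read_reg()/write_reg() bodies below are hypothetical stand-ins (a fake register file) so that the fragment compiles on its own; in an application they come from the I2C library header instead. The helper name i2c_set_bits() is also an assumption and is not part of the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the library definitions, for illustration only. */
typedef enum {
    I2C_REGOP_SUCCESS,
    I2C_REGOP_DEVICE_NACK,
    I2C_REGOP_INCOMPLETE
} i2c_regop_res_t;

typedef struct {
    uint8_t present_addr;   /* the only device address that ACKs */
    uint8_t regs[256];      /* fake register file of that device */
} i2c_master_t;

/* Mock with the documented read_reg() signature. */
static uint8_t read_reg(i2c_master_t *ctx, uint8_t device_addr, uint8_t reg,
                        i2c_regop_res_t *result)
{
    if (device_addr != ctx->present_addr) {
        *result = I2C_REGOP_DEVICE_NACK;
        return 0;
    }
    *result = I2C_REGOP_SUCCESS;
    return ctx->regs[reg];
}

/* Mock with the documented write_reg() signature. */
static i2c_regop_res_t write_reg(i2c_master_t *ctx, uint8_t device_addr,
                                 uint8_t reg, uint8_t data)
{
    if (device_addr != ctx->present_addr) {
        return I2C_REGOP_DEVICE_NACK;
    }
    ctx->regs[reg] = data;
    return I2C_REGOP_SUCCESS;
}

/* Hypothetical helper: set the bits in `mask` in an 8-bit register,
 * leaving the other bits untouched. */
static i2c_regop_res_t i2c_set_bits(i2c_master_t *ctx, uint8_t device_addr,
                                    uint8_t reg, uint8_t mask)
{
    assert(ctx != NULL);

    i2c_regop_res_t res;
    uint8_t value = read_reg(ctx, device_addr, reg, &res);
    if (res != I2C_REGOP_SUCCESS) {
        return res;   /* the returned value is not meaningful on failure */
    }
    return write_reg(ctx, device_addr, reg, (uint8_t)(value | mask));
}
```

Checking the read result before writing avoids storing a value derived from a failed read back to the device.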

XCORE ® -VOICE Solutions$$$Peripheral IO Programming Guide$$$API Reference$$$I2S Library£££modules/io/doc/programming_guide/reference/i2s/i2s.html#i2s-library

A software defined library that allows you to control an I2S (Inter-IC Sound) bus via xcore ports. I2S is a digital data streaming interface particularly appropriate for transmission of audio data. TDM is a special case of I2S which supports transport of more than two audio channels and is partially included in the library at this time. The components in the library are controlled via C and can act as I2S master, I2S slave or TDM slave.

Note

TDM is currently supported only as a TDM16 slave Tx component. Expansion of this library to support master or slave Rx is possible and can be done on request.

I2S is a protocol between two devices, one acting as master and the other as slave; these roles determine which device drives the clock lines. The protocol is made up of the four signals shown in I2S data wires.

I2S data wires

MCLK

Clock line, driven by external oscillator. This signal is optional.

BCLK

Bit clock. This is a fixed divide of the MCLK and is driven by the master.

LRCLK (or WCLK)

Word clock (or word select). This is driven by the master.

DATA

Data line, driven by one of the slave or master depending on the data direction. There may be several data lines in differing directions.
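Because BCLK is a fixed divide of MCLK, the divide ratio follows from the sample rate and the word size. The arithmetic can be sketched as below, assuming two channels (left and right) per data line and one bit clock per bit. The function names are illustrative and are not part of the library.

```c
#include <assert.h>
#include <stdint.h>

/* BCLK frequency for a stereo I2S data line: one bit clock per bit,
 * two words (left and right) per LRCLK period. */
static uint32_t i2s_bclk_hz(uint32_t sample_rate_hz, uint32_t bits_per_word)
{
    assert(bits_per_word >= 1u && bits_per_word <= 32u);
    return sample_rate_hz * 2u * bits_per_word;
}

/* Integer MCLK-to-BCLK divide ratio, or 0 if MCLK is not an exact
 * multiple of the required BCLK. */
static uint32_t i2s_mclk_bclk_ratio(uint32_t mclk_hz, uint32_t sample_rate_hz,
                                    uint32_t bits_per_word)
{
    uint32_t bclk_hz = i2s_bclk_hz(sample_rate_hz, bits_per_word);
    if (bclk_hz == 0u || (mclk_hz % bclk_hz) != 0u) {
        return 0u;
    }
    return mclk_hz / bclk_hz;
}
```

For example, a 24.576 MHz MCLK with 48 kHz, 32-bit stereo audio gives a 3.072 MHz BCLK and a divide ratio of 8.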

All I2S functions can be accessed via the i2s.h header:

#include "i2s.h"

TDM is a protocol between two devices similar to I2S, with one acting as master and the other as slave; these roles determine which device drives the clock lines. The protocol is made up of the four signals shown in TDM data wires.

TDM data wires

MCLK

Clock line, driven by external oscillator. This signal is optional.

BCLK

Bit clock. This is a fixed divide of the MCLK and is driven by the master.

FSYNCH

Frame synchronization. Toggles at the start of the TDM data frame. This is driven by the master.

DATA

Data line, driven by one of the slave or master depending on the data direction. There may be several data lines in differing directions.

Currently supported TDM functions can be accessed via the i2s_tdm_slave.h header:

#include "i2s_tdm_slave.h"
XCORE ® -VOICE Solutions$$$Peripheral IO Programming Guide$$$API Reference$$$I2S Library$$$I2S Common API£££modules/io/doc/programming_guide/reference/i2s/i2s_common.html#i2s-common-api
XCORE ® -VOICE Solutions$$$Peripheral IO Programming Guide$$$API Reference$$$I2S Library$$$I2S Common API$$$I2S Instances£££modules/io/doc/programming_guide/reference/i2s/i2s_common.html#i2s-instances

The macro I2S_DATA_WIDTH may be set as a compile flag (e.g. -DI2S_DATA_WIDTH=16) to alter the number of bits per word for both the I2S Master and I2S Slave components; this defaults to 32 bits per word. This value may be set to any value between 1 and 32. Correct operation of the I2S components has only currently been verified at 16 and 32 bits per word.

The following structures and functions are used by an I2S master or slave instance.

enum i2s_mode

I2S mode.

This type is used to describe the I2S mode.

Values:

enumerator I2S_MODE_I2S

The LR clock transitions ahead of the data by one bit clock.

enumerator I2S_MODE_LEFT_JUSTIFIED

The LR clock and data are phase aligned.

enum i2s_slave_bclk_polarity

I2S slave bit clock polarity.

Standard I2S polarity is positive: data and LR clock toggle on the falling edge of the bit clock and are sampled on its rising edge. Some masters use the opposite polarity.

Values:

enumerator I2S_SLAVE_SAMPLE_ON_BCLK_RISING

Toggle falling, sample rising (default if not set)

enumerator I2S_SLAVE_SAMPLE_ON_BCLK_FALLING

Toggle rising, sample falling

enum i2s_restart

Restart command type.

Restart commands that can be signalled to the I2S or TDM component.

Values:

enumerator I2S_NO_RESTART

Do not restart.

enumerator I2S_RESTART

Restart the bus (causes the I2S/TDM to stop and a new init callback to occur allowing reconfiguration of the BUS).

enumerator I2S_SHUTDOWN

Shutdown. This will cause the I2S/TDM component to exit.

typedef enum i2s_mode i2s_mode_t

I2S mode.

This type is used to describe the I2S mode.

typedef enum i2s_slave_bclk_polarity i2s_slave_bclk_polarity_t

I2S slave bit clock polarity.

Standard I2S polarity is positive: data and LR clock toggle on the falling edge of the bit clock and are sampled on its rising edge. Some masters use the opposite polarity.

typedef struct i2s_config i2s_config_t

I2S configuration structure.

This structure describes the configuration of an I2S bus.

typedef enum i2s_restart i2s_restart_t

Restart command type.

Restart commands that can be signalled to the I2S or TDM component.

typedef void (*i2s_init_t)(void *app_data, i2s_config_t *i2s_config)

I2S initialization event callback.

The I2S component will call this when it first initializes, either on first run or after a restart.

This will contain the TDM context when in TDM mode.

Param app_data:

Points to application specific data supplied by the application. May be used for context data specific to each I2S task instance.

Param i2s_config:

This structure is provided if the connected component drives an I2S bus. The members of the structure should be set to the required configuration. This is ignored when used in TDM mode.

typedef i2s_restart_t (*i2s_restart_check_t)(void *app_data)

I2S restart check callback.

This callback is called once per frame. The application must return the required restart behavior.

Param app_data:

Points to application specific data supplied by the application. May be used for context data specific to each I2S task instance.

Return:

The return value should be set to I2S_NO_RESTART, I2S_RESTART or I2S_SHUTDOWN.

typedef void (*i2s_receive_t)(void *app_data, size_t num_in, const int32_t *samples)

Receive an incoming frame of samples.

This callback will be called when a new frame of samples is read in by the I2S task.

Param app_data:

Points to application specific data supplied by the application. May be used for context data specific to each I2S task instance.

Param num_in:

The number of input channels contained within the array.

Param samples:

The samples data array as signed 32-bit values. The component may not have 32-bits of accuracy (for example, many I2S codecs are 24-bit), in which case the bottom bits will be arbitrary values.
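Where the connected codec provides fewer than 32 valid bits, the application may want to discard the arbitrary low bits before further processing. A minimal sketch, assuming the valid bits occupy the top of the 32-bit word (as the "bottom bits" wording suggests); the helper name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Keep the top `valid_bits` bits of a received sample and zero the rest. */
static int32_t i2s_keep_top_bits(int32_t sample, unsigned valid_bits)
{
    assert(valid_bits >= 1u && valid_bits <= 32u);
    if (valid_bits == 32u) {
        return sample;
    }
    /* e.g. valid_bits = 24 gives mask = 0xFFFFFF00 */
    uint32_t mask = ~((UINT32_C(1) << (32u - valid_bits)) - 1u);
    return (int32_t)((uint32_t)sample & mask);
}
```

A receive callback could apply this to each element of samples before storing the frame.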

typedef void (*i2s_send_t)(void *app_data, size_t num_out, int32_t *samples)

Request an outgoing frame of samples.

This callback will be called when the I2S task needs a new frame of samples.

Param app_data:

Points to application specific data supplied by the application. May be used for context data specific to each I2S task instance.

Param num_out:

The number of output channels contained within the array.

Param samples:

The samples data array as signed 32-bit values. The component may not have 32-bits of accuracy (for example, many I2S codecs are 24-bit), in which case the bottom bits will be arbitrary values.

I2S_MAX_DATALINES
I2S_CHANS_PER_FRAME
I2S_CALLBACK_ATTR

This attribute must be specified on all I2S callback functions provided by the application.

struct i2s_config
#include <i2s.h>

I2S configuration structure.

This structure describes the configuration of an I2S bus.

struct i2s_callback_group_t
#include <i2s.h>

Callback group representing callback events that can occur during the operation of the I2S task. Must be initialized by the application prior to passing it to one of the I2S tasks.
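As an illustration of how the callback types above fit together, the following sketch implements a simple loopback: each received frame is stored and played back on the next send request. It is a sketch under stated assumptions, not the library's own example. The I2S_CALLBACK_ATTR and i2s_restart_t definitions are stand-ins so that the fragment compiles on its own (a real application takes them from i2s.h), and loopback_state_t and the loopback_* function names are hypothetical. The init callback is omitted because the members of i2s_config_t are not listed in this section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for definitions normally provided by i2s.h. */
#define I2S_CALLBACK_ATTR
typedef enum { I2S_NO_RESTART, I2S_RESTART, I2S_SHUTDOWN } i2s_restart_t;

#define LOOPBACK_MAX_CHANS 8   /* hypothetical application limit */

typedef struct {
    int32_t frame[LOOPBACK_MAX_CHANS]; /* last received frame */
    size_t num_valid;                  /* channels stored in frame[] */
    volatile int stop_requested;       /* set by the application to shut down */
} loopback_state_t;

/* Matches i2s_receive_t: store the incoming frame. */
I2S_CALLBACK_ATTR
static void loopback_receive(void *app_data, size_t num_in, const int32_t *samples)
{
    assert(app_data != NULL);
    loopback_state_t *st = app_data;
    size_t n = (num_in < LOOPBACK_MAX_CHANS) ? num_in : LOOPBACK_MAX_CHANS;
    memcpy(st->frame, samples, n * sizeof(int32_t));
    st->num_valid = n;
}

/* Matches i2s_send_t: replay the stored frame, padding with silence. */
I2S_CALLBACK_ATTR
static void loopback_send(void *app_data, size_t num_out, int32_t *samples)
{
    assert(app_data != NULL);
    loopback_state_t *st = app_data;
    for (size_t i = 0; i < num_out; i++) {
        samples[i] = (i < st->num_valid) ? st->frame[i] : 0;
    }
}

/* Matches i2s_restart_check_t: called once per frame. */
I2S_CALLBACK_ATTR
static i2s_restart_t loopback_restart_check(void *app_data)
{
    assert(app_data != NULL);
    loopback_state_t *st = app_data;
    return st->stop_requested ? I2S_SHUTDOWN : I2S_NO_RESTART;
}
```

These functions would be assigned to the receive, send and restart_check members of an i2s_callback_group_t, with app_data pointing at a loopback_state_t instance.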

XCORE ® -VOICE Solutions$$$Peripheral IO Programming Guide$$$API Reference$$$I2S Library$$$I2S Common API$$$TDM Instances£££modules/io/doc/programming_guide/reference/i2s/i2s_common.html#tdm-instances

The following structures and functions are used by a TDM master or slave instance.

typedef void (*tdm_post_port_init_t)(void *i2s_tdm_ctx)

TDM post resource initialization event callback.

The TDM component will call this after it first initializes the ports. This gives the application the chance to make adjustments to port timing, which are often needed when clocking above 15 MHz.

Param i2s_tdm_ctx:

Points to i2s_tdm_ctx_t struct allowing the resources to be modified after they have been enabled and initialised.

I2S_TDM_MAX_POUT_CNT
I2S_TDM_MAX_PIN_CNT
I2S_TDM_MAX_CH_PER_FRAME
TDM_CALLBACK_ATTR

This attribute must be specified on the TDM callback function provided by the application.

struct i2s_tdm_ctx_t
#include <i2s_tdm_slave.h>

Struct to hold an I2S TDM context.

The members in this struct should not be accessed directly.

XCORE ® -VOICE Solutions$$$Peripheral IO Programming Guide$$$API Reference$$$I2S Library$$$I2S Master£££modules/io/doc/programming_guide/reference/i2s/i2s_master.html#i2s-master
XCORE ® -VOICE Solutions$$$Peripheral IO Programming Guide$$$API Reference$$$I2S Library$$$I2S Master$$$I2S Master Usage£££modules/io/doc/programming_guide/reference/i2s/i2s_master.html#i2s-master-usage

The following code snippet demonstrates the basic usage of an I2S master device.

#include <xs1.h>
#include "i2s.h"

port_t p_i2s_dout[1];
port_t p_bclk;
port_t p_lrclk;
port_t p_mclk;
xclock_t bclk;
i2s_callback_group_t i2s_cb_group;

// Setup ports and clocks
p_i2s_dout[0] = PORT_I2S_DAC_DATA;
p_bclk = PORT_I2S_BCLK;
p_lrclk = PORT_I2S_LRCLK;
p_mclk = PORT_MCLK_IN;
bclk = I2S_CLKBLK;

port_enable(p_mclk);
port_enable(p_bclk);
// NOTE:  p_lrclk does not need to be enabled by the caller

// Setup callbacks
//  NOTE: See API or SDK examples for more on using the callbacks
i2s_cb_group.init = (i2s_init_t) i2s_init;
i2s_cb_group.restart_check = (i2s_restart_check_t) i2s_restart_check;
i2s_cb_group.receive = (i2s_receive_t) i2s_receive;
i2s_cb_group.send = (i2s_send_t) i2s_send;
i2s_cb_group.app_data = NULL;

// Start the master device in this thread
//  NOTE: You may wish to launch the master device in a different thread.
//        See the XTC Tools documentation reference for lib_xcore.
i2s_master(&i2s_cb_group, p_i2s_dout, 1, NULL, 0, p_bclk, p_lrclk, p_mclk, bclk);
XCORE ® -VOICE Solutions$$$Peripheral IO Programming Guide$$$API Reference$$$I2S Library$$$I2S Master$$$I2S Master API£££modules/io/doc/programming_guide/reference/i2s/i2s_master.html#i2s-master-api

The following structures and functions are used to initialize and start an I2S master instance.

void i2s_master(const i2s_callback_group_t *const i2s_cbg, const port_t p_dout[], const size_t num_out, const port_t p_din[], const size_t num_in, const port_t p_bclk, const port_t p_lrclk, const port_t p_mclk, const xclock_t bclk)

I2S master task

This task performs I2S on the provided pins. It will perform callbacks over the i2s_callback_group_t callback group to get/receive frames of data from the application using this component.

The task performs I2S master so will drive the word clock and bit clock lines.

Parameters:
  • i2s_cbg – The I2S callback group pointing to the application’s functions to use for initialization and getting and receiving frames. Also points to application specific data which will be shared between the callbacks.

  • p_dout – An array of data output ports

  • num_out – The number of output data ports

  • p_din – An array of data input ports

  • num_in – The number of input data ports

  • p_bclk – The bit clock output port

  • p_lrclk – The word clock output port

  • p_mclk – Input port which supplies the master clock

  • bclk – A clock that will get configured for use with the bit clock

void i2s_master_external_clock(const i2s_callback_group_t *const i2s_cbg, const port_t p_dout[], const size_t num_out, const port_t p_din[], const size_t num_in, const port_t p_bclk, const port_t p_lrclk, const xclock_t bclk)

I2S master task

This task differs from i2s_master() in that bclk must already be configured to the BCLK frequency. Other than that, it is identical.

This task performs I2S on the provided pins. It will perform callbacks over the i2s_callback_group_t callback group to get/receive frames of data from the application using this component.

The task performs I2S master so will drive the word clock and bit clock lines.

Parameters:
  • i2s_cbg – The I2S callback group pointing to the application’s functions to use for initialization and getting and receiving frames. Also points to application specific data which will be shared between the callbacks.

  • p_dout – An array of data output ports

  • num_out – The number of output data ports

  • p_din – An array of data input ports

  • num_in – The number of input data ports

  • p_bclk – The bit clock output port

  • p_lrclk – The word clock output port

  • bclk – A clock that is configured externally to be used as the bit clock

XCORE ® -VOICE Solutions$$$Peripheral IO Programming Guide$$$API Reference$$$I2S Library$$$I2S Slave£££modules/io/doc/programming_guide/reference/i2s/i2s_slave.html#i2s-slave
XCORE ® -VOICE Solutions$$$Peripheral IO Programming Guide$$$API Reference$$$I2S Library$$$I2S Slave$$$I2S Slave Usage£££modules/io/doc/programming_guide/reference/i2s/i2s_slave.html#i2s-slave-usage

The following code snippet demonstrates the basic usage of an I2S slave device.

#include <xs1.h>
#include "i2s.h"

// Setup ports and clocks
port_t p_bclk  = XS1_PORT_1B;
port_t p_lrclk = XS1_PORT_1C;
port_t p_din [4] = {XS1_PORT_1D, XS1_PORT_1E, XS1_PORT_1F, XS1_PORT_1G};
port_t p_dout[4] = {XS1_PORT_1H, XS1_PORT_1I, XS1_PORT_1J, XS1_PORT_1K};
xclock_t bclk = XS1_CLKBLK_1;

port_enable(p_bclk);
// NOTE:  p_lrclk does not need to be enabled by the caller

// Setup callbacks
//  NOTE: See API or SDK examples for more on using the callbacks
i2s_callback_group_t i_i2s = {
         .init = (i2s_init_t) i2s_init,
         .restart_check = (i2s_restart_check_t) i2s_restart_check,
         .receive = (i2s_receive_t) i2s_receive,
         .send = (i2s_send_t) i2s_send,
         .app_data = NULL,
};

// Start the slave device in this thread
//  NOTE: You may wish to launch the slave device in a different thread.
//        See the XTC Tools documentation reference for lib_xcore.
i2s_slave(&i_i2s, p_dout, 4, p_din, 4, p_bclk, p_lrclk, bclk);
XCORE ® -VOICE Solutions$$$Peripheral IO Programming Guide$$$API Reference$$$I2S Library$$$I2S Slave$$$I2S Slave API£££modules/io/doc/programming_guide/reference/i2s/i2s_slave.html#i2s-slave-api

The following structures and functions are used to initialize and start an I2S slave instance.

void i2s_slave(const i2s_callback_group_t *const i2s_cbg, port_t p_dout[], const size_t num_out, port_t p_din[], const size_t num_in, port_t p_bclk, port_t p_lrclk, xclock_t bclk)

I2S slave task

This task performs I2S on the provided pins. It will perform callbacks over the i2s_callback_group_t callback group to get/receive data from the application using this component.

The component performs I2S slave so will expect the word clock and bit clock to be driven externally.

Parameters:
  • i2s_cbg – The I2S callback group pointing to the application’s functions to use for initialization and getting and receiving frames. Also points to application specific data which will be shared between the callbacks.

  • p_dout – An array of data output ports

  • num_out – The number of output data ports

  • p_din – An array of data input ports

  • num_in – The number of input data ports

  • p_bclk – The bit clock input port

  • p_lrclk – The word clock input port

  • bclk – A clock that will get configured for use with the bit clock

XCORE ® -VOICE Solutions$$$Peripheral IO Programming Guide$$$API Reference$$$I2S Library$$$TDM Slave£££modules/io/doc/programming_guide/reference/i2s/i2s_tdm_slave.html#tdm-slave
XCORE ® -VOICE Solutions$$$Peripheral IO Programming Guide$$$API Reference$$$I2S Library$$$TDM Slave$$$TDM Slave Tx Usage£££modules/io/doc/programming_guide/reference/i2s/i2s_tdm_slave.html#tdm-slave-tx-usage

The following code snippet demonstrates the basic usage of a TDM slave Tx device.

#include <xs1.h>
#include "i2s_tdm_slave.h"

// Setup ports and clocks
port_t p_bclk = XS1_PORT_1A;
port_t p_fsync = XS1_PORT_1B;
port_t p_dout = XS1_PORT_1C;

xclock_t clk_bclk = XS1_CLKBLK_1;

// Setup callbacks
// NOTE: See API or sln_voice examples for more on using the callbacks
i2s_tdm_ctx_t ctx;
i2s_callback_group_t i_i2s = {
        .init = (i2s_init_t) i2s_init,
        .restart_check = (i2s_restart_check_t) i2s_restart_check,
        .receive = NULL,
        .send = (i2s_send_t) i2s_send,
        .app_data = NULL,
};

// Initialize the TDM slave
i2s_tdm_slave_tx_16_init(
        &ctx,
        &i_i2s,
        p_dout,
        p_fsync,
        p_bclk,
        clk_bclk,
        0,
        I2S_SLAVE_SAMPLE_ON_BCLK_FALLING,
        NULL);

// Start the slave device in this thread
// NOTE: You may wish to launch the slave device in a different thread.
//       See the XTC Tools documentation reference for lib_xcore.
i2s_tdm_slave_tx_16_thread(&ctx);
XCORE ® -VOICE Solutions$$$Peripheral IO Programming Guide$$$API Reference$$$I2S Library$$$TDM Slave$$$TDM Slave Tx API£££modules/io/doc/programming_guide/reference/i2s/i2s_tdm_slave.html#tdm-slave-tx-api

The following structures and functions are used to initialize and start a TDM slave Tx instance.

void i2s_tdm_slave_tx_16_init(i2s_tdm_ctx_t *ctx, i2s_callback_group_t *i2s_cbg, port_t p_dout, port_t p_fsync, port_t p_bclk, xclock_t bclk, uint32_t tx_offset, i2s_slave_bclk_polarity_t slave_bclk_polarity, tdm_post_port_init_t tdm_post_port_init)

I2S TDM slave context initialization for 16 channel TX only with 1 output port, 32b word length, 32b channel length, and 16 channels per frame.

This prepares a context for I2S TDM slave on the provided pins.

The resulting context can be used with i2s_tdm_slave_tx_16_thread().

Parameters:
  • ctx – A pointer to the I2S TDM context to use.

  • i2s_cbg – The I2S callback group pointing to the application’s functions to use for initialization and getting and receiving frames. For TDM the app_data variable within this struct is NOT used.

  • p_dout – The data output port. MUST be a 1b port

  • p_fsync – The fsync input port. MUST be a 1b port

  • p_bclk – The bit clock input port. MUST be a 1b port

  • bclk – A clock that will get configured for use with the bit clock

  • tx_offset – The number of bclks from FSYNC transition to the MSB of Slot 0

  • slave_bclk_polarity – The polarity of bclk

  • tdm_post_port_init – Callback to be called just after resource init. Allows for modification of port timing for >15MHz clocks. Set to NULL if not needed.
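With 16 channels of 32 bits each, one TDM frame spans 512 bit clocks, and tx_offset shifts the position of slot 0 relative to the FSYNC transition. A minimal sketch of that arithmetic, assuming the 32-bit channel length and 16 channels per frame stated above (the macro and function names are illustrative and are not part of the library):

```c
#include <assert.h>
#include <stdint.h>

#define TDM16_CHANS_PER_FRAME 16u
#define TDM16_BITS_PER_CHAN   32u

/* Bit clock frequency required for a given frame (sample) rate. */
static uint32_t tdm16_bclk_hz(uint32_t frame_rate_hz)
{
    return frame_rate_hz * TDM16_CHANS_PER_FRAME * TDM16_BITS_PER_CHAN;
}

/* Number of bit clocks from the FSYNC transition to the MSB of `slot`. */
static uint32_t tdm16_slot_msb_offset(uint32_t tx_offset, uint32_t slot)
{
    assert(slot < TDM16_CHANS_PER_FRAME);
    return tx_offset + slot * TDM16_BITS_PER_CHAN;
}
```

At a 48 kHz frame rate this gives a 24.576 MHz bit clock, which is above 15 MHz and therefore in the range where the tdm_post_port_init callback may be needed to adjust port timing.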

void i2s_tdm_slave_tx_16_thread(i2s_tdm_ctx_t *ctx)

I2S TDM TX 16 ch slave task

This task performs I2S TDM slave on the provided context which was initialized with i2s_tdm_slave_tx_16_init(). It will perform callbacks over the i2s_callback_group_t callback group to get data from the application using this component.

This thread assumes 1 data output port, 32b word length, 32b channel length, and 16 channels per frame.

The component performs I2S TDM slave so will expect the fsync and bit clock to be driven externally.

Parameters:
  • ctx – A pointer to the I2S TDM context to use.

XCORE ® -VOICE Solutions$$$Peripheral IO Programming Guide$$$API Reference$$$SPI Library£££modules/io/doc/programming_guide/reference/spi/spi.html#spi-library

A software defined SPI (serial peripheral interface) library that allows you to control a SPI bus via the xcore GPIO hardware-response ports. SPI is a four-wire hardware bi-directional serial interface. The components in the library are controlled via C and can either act as SPI master or slave.

The SPI bus can be used by multiple tasks within the xcore device (each addressing the same or different slaves) and is compatible with other slave devices on the same bus.

The SPI protocol requires a clock, one or more slave selects and either one or two data wires.

SPI data wires

SCLK

Clock line, driven by the master

MOSI

Master Output, Slave Input data line, driven by the master

MISO

Master Input, Slave Output data line, driven by the slave

SS

Slave select line, driven by the master

All SPI functions can be accessed via the spi.h header:

#include <spi.h>
XCORE ® -VOICE Solutions$$$Peripheral IO Programming Guide$$$API Reference$$$SPI Library$$$SPI Master£££modules/io/doc/programming_guide/reference/spi/spi_master.html#spi-master
XCORE ® -VOICE Solutions$$$Peripheral IO Programming Guide$$$API Reference$$$SPI Library$$$SPI Master$$$SPI Master Usage£££modules/io/doc/programming_guide/reference/spi/spi_master.html#spi-master-usage

The following code snippet demonstrates the basic usage of an SPI master device.

#include <xs1.h>
#include "spi.h"

spi_master_t spi_ctx;
spi_master_device_t spi_dev;

port_t p_miso = XS1_PORT_1A;
port_t p_ss[1] = {XS1_PORT_1B};
port_t p_sclk = XS1_PORT_1C;
port_t p_mosi = XS1_PORT_1D;
xclock_t cb = XS1_CLKBLK_1;

uint8_t tx[4] = {0x01, 0x02, 0x04, 0x08};
uint8_t rx[4];

// Initialize the master device
spi_master_init(&spi_ctx, cb, p_ss[0], p_sclk, p_mosi, p_miso);
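spi_master_device_init(&spi_dev, &spi_ctx,
     0,                           // cs_pin: bit 0 of the 1-bit chip select port
     SPI_MODE_0,                  // cpol and cpha
     spi_master_source_clock_ref, // source_clock
     0,                           // clock_divisor
     spi_master_sample_delay_0,   // miso_sample_delay
     0, 0, 0, 0);                 // miso_pad_delay and the three delay tick counts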

// Transfer some data
spi_master_start_transaction(&spi_dev);
spi_master_transfer(&spi_dev, (uint8_t *)tx, (uint8_t *)rx, 4);
spi_master_end_transaction(&spi_dev);
XCORE ® -VOICE Solutions$$$Peripheral IO Programming Guide$$$API Reference$$$SPI Library$$$SPI Master$$$SPI Master API£££modules/io/doc/programming_guide/reference/spi/spi_master.html#spi-master-api

The following structures and functions are used to initialize and start an SPI master instance.

enum spi_master_sample_delay_t

Enum type representing the different options for the SPI master sample delay.

Values:

enumerator spi_master_sample_delay_0

Samples 1/2 clock cycle after output from device

enumerator spi_master_sample_delay_1

Samples 3/4 clock cycle after output from device

enumerator spi_master_sample_delay_2

Samples 1 clock cycle after output from device

enumerator spi_master_sample_delay_3

Samples 1 and 1/4 clock cycle after output from device

enumerator spi_master_sample_delay_4

Samples 1 and 1/2 clock cycle after output from device

enum spi_master_source_clock_t

Enum type used to set which of the two clock sources SCLK is derived from.

Values:

enumerator spi_master_source_clock_ref

SCLK is derived from the 100 MHz reference clock

enumerator spi_master_source_clock_xcore

SCLK is derived from the core clock

typedef void (*slave_transaction_started_t)(void *app_data, uint8_t **out_buf, size_t *outbuf_len, uint8_t **in_buf, size_t *inbuf_len)

Master has started a transaction

This callback function will be called when the SPI master has asserted this slave’s chip select.

The input and output buffer may be the same; however, partial byte/incomplete reads will result in out_buf bits being masked off due to a partial bit output.

Param app_data:

A pointer to application specific data provided by the application. Used to share data between the callback functions and the application.

Param out_buf:

The buffer to send to the master

Param outbuf_len:

The length in bytes of out_buf

Param in_buf:

The buffer to receive into from the master

Param inbuf_len:

The length in bytes of in_buf

typedef void (*slave_transaction_ended_t)(void *app_data, uint8_t **out_buf, size_t bytes_written, uint8_t **in_buf, size_t bytes_read, size_t read_bits)

Master has ended a transaction

This callback function will be called when the SPI master has de-asserted this slave’s chip select.

The value of bytes_read contains the number of full bytes that are in in_buf. When read_bits is greater than 0, the byte after the last full byte contains the partial bits read.

Param app_data:

A pointer to application specific data provided by the application. Used to share data between the callback functions and the application.

Param out_buf:

The buffer that had been provided to be sent to the master

Param bytes_written:

The length in bytes of out_buf that had been written

Param in_buf:

The buffer that had been provided to be received into from the master

Param bytes_read:

The length in bytes of in_buf that has been read in to

Param read_bits:

The number of bits read into the byte after the last full byte of in_buf; 0 if the transaction ended on a byte boundary.
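The relationship between bytes_read and read_bits can be sketched as follows. This is an illustration of the description above, not library code, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Total number of bits received in a transaction, from the values passed
 * to the transaction-ended callback. */
static size_t spi_slave_total_bits_read(size_t bytes_read, size_t read_bits)
{
    assert(read_bits < 8u);   /* read_bits counts a partial byte only */
    return bytes_read * 8u + read_bits;
}

/* Number of bytes of in_buf that hold received data, counting a trailing
 * partially filled byte. */
static size_t spi_slave_bytes_touched(size_t bytes_read, size_t read_bits)
{
    assert(read_bits < 8u);
    return bytes_read + ((read_bits > 0u) ? 1u : 0u);
}
```

For instance, three full bytes plus five further bits means 29 bits were received and the first four bytes of in_buf contain data.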

void spi_master_init(spi_master_t *spi, xclock_t clock_block, port_t cs_port, port_t sclk_port, port_t mosi_port, port_t miso_port)

Initializes a SPI master I/O interface.

Note: To guarantee timing in all situations, the SPI I/O interface implicitly sets the fast mode and high priority status register bits for the duration of SPI operations. This may reduce the MIPS of other threads based on overall system setup.

Parameters:
  • spi – The spi_master_t context to initialize.

  • clock_block – The clock block to use for the SPI master interface.

  • cs_port – The SPI interface’s chip select port. This may be a multi-bit port.

  • sclk_port – The SPI interface’s SCLK port. Must be a 1-bit port.

  • mosi_port – The SPI interface’s MOSI port. Must be a 1-bit port.

  • miso_port – The SPI interface’s MISO port. Must be a 1-bit port.

void spi_master_device_init(spi_master_device_t *dev, spi_master_t *spi, uint32_t cs_pin, int cpol, int cpha, spi_master_source_clock_t source_clock, uint32_t clock_divisor, spi_master_sample_delay_t miso_sample_delay, uint32_t miso_pad_delay, uint32_t cs_to_clk_delay_ticks, uint32_t clk_to_cs_delay_ticks, uint32_t cs_to_cs_delay_ticks)

Initialize a SPI device. Multiple SPI devices may be initialized per SPI interface. Each must be on a unique pin of the interface’s chip select port.

Parameters:
  • dev – The context representing the device to initialize.

  • spi – The context representing the SPI master interface that the device is connected to.

  • cs_pin – The bit number of the chip select port that is connected to the device’s chip select pin.

  • cpol – The clock polarity required by the device.

  • cpha – The clock phase required by the device.

  • source_clock – The source clock to derive SCLK from. See spi_master_source_clock_t.

  • clock_divisor – The value to divide the source clock by. The frequency of SCLK will be set to:

    • (F_src) / (4 * clock_divisor) when clock_divisor > 0

    • (F_src) / (2) when clock_divisor = 0

    where F_src is the frequency of the source clock.

  • miso_sample_delay – When to sample MISO. See spi_master_sample_delay_t.

  • miso_pad_delay – The number of core clock cycles to delay sampling the MISO pad during a transaction. This allows for more fine grained adjustment of sampling time. The value may be between 0 and 5.

  • cs_to_clk_delay_ticks – The minimum number of reference clock ticks between assertion of chip select and the first clock edge.

  • clk_to_cs_delay_ticks – The minimum number of reference clock ticks between the last clock edge and de-assertion of chip select.

  • cs_to_cs_delay_ticks – The minimum number of reference clock ticks between transactions, that is, between de-assertion of chip select at the end of one transaction and its re-assertion at the beginning of the next.
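The clock_divisor formula above can be turned into a small helper for choosing a divisor. This is a minimal sketch of the arithmetic only. The function names are hypothetical, and no upper limit that the hardware may place on clock_divisor is checked.

```c
#include <assert.h>
#include <stdint.h>

/* SCLK frequency produced by a given clock_divisor, per the formula above. */
static uint32_t spi_sclk_hz(uint32_t f_src_hz, uint32_t clock_divisor)
{
    return (clock_divisor > 0u) ? f_src_hz / (4u * clock_divisor)
                                : f_src_hz / 2u;
}

/* Smallest clock_divisor whose SCLK does not exceed max_sclk_hz. */
static uint32_t spi_divisor_for_max_sclk(uint32_t f_src_hz, uint32_t max_sclk_hz)
{
    assert(max_sclk_hz > 0u);

    /* clock_divisor = 0 gives F_src / 2, the fastest available SCLK. */
    if ((uint64_t)f_src_hz <= 2u * (uint64_t)max_sclk_hz) {
        return 0u;
    }
    /* Otherwise round F_src / (4 * max) up to the next integer. */
    uint64_t denom = 4u * (uint64_t)max_sclk_hz;
    return (uint32_t)(((uint64_t)f_src_hz + denom - 1u) / denom);
}
```

With the 100 MHz reference clock as the source, a clock_divisor of 0 (as used in the usage snippet) gives a 50 MHz SCLK, and a device limited to 10 MHz needs a clock_divisor of 3, which gives roughly 8.33 MHz.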

void spi_master_start_transaction(spi_master_device_t *dev)

Starts a SPI transaction with the specified SPI device. This leaves chip select asserted.

Parameters:
  • dev – The SPI device with which to start a transaction.

void spi_master_transfer(spi_master_device_t *dev, uint8_t *data_out, uint8_t *data_in, size_t len)

Transfers data to/from the specified SPI device. This may be called multiple times during a single transaction.

Parameters:
  • dev – The SPI device with which to transfer data.

  • data_out – Buffer containing the data to send to the device. May be NULL if no data needs to be sent.

  • data_in – Buffer to save the data received from the device. May be NULL if the data received is not needed.

  • len – The length in bytes of the data to transfer. Both buffers must be at least this large if not NULL.

inline void spi_master_delay_before_next_transfer(spi_master_device_t *dev, uint32_t delay_ticks)

Enforces a minimum delay between the time this is called and the next transfer. It must be called during a transaction. It returns immediately.

Parameters:
  • dev – The active SPI device.

  • delay_ticks – The number of reference clock ticks to delay.

void spi_master_end_transaction(spi_master_device_t *dev)

Ends a SPI transaction with the specified SPI device. This leaves chip select de-asserted.

Parameters:
  • dev – The SPI device with which to end a transaction.

void spi_master_deinit(spi_master_t *spi)

De-initializes the specified SPI master interface. This disables the ports and clock block.

Parameters:
  • spi – The context representing the SPI master interface to de-initialize.

SPI_MODE_0

Convenience macro that may be used to specify SPI Mode 0 to spi_master_device_init() or spi_slave() in place of cpol and cpha.

SPI_MODE_1

Convenience macro that may be used to specify SPI Mode 1 to spi_master_device_init() or spi_slave() in place of cpol and cpha.

SPI_MODE_2

Convenience macro that may be used to specify SPI Mode 2 to spi_master_device_init() or spi_slave() in place of cpol and cpha.

SPI_MODE_3

Convenience macro that may be used to specify SPI Mode 3 to spi_master_device_init() or spi_slave() in place of cpol and cpha.
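
The relationship between a mode number and the cpol and cpha values can be sketched as follows. This assumes the conventional SPI mode numbering (mode = 2 * CPOL + CPHA); the definitions of the SPI_MODE_x macros are not shown in this document, so check spi.h before relying on it. The type spi_mode_bits_t and the function spi_mode_to_bits() are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Hypothetical type for illustration only. */
typedef struct {
    int cpol; /* clock polarity */
    int cpha; /* clock phase */
} spi_mode_bits_t;

/* Splits a conventional SPI mode number (0..3) into CPOL and CPHA.
 * Assumption: mode = 2 * CPOL + CPHA, the usual convention. */
spi_mode_bits_t spi_mode_to_bits(int mode)
{
    assert(mode >= 0 && mode <= 3);
    spi_mode_bits_t bits = { (mode >> 1) & 1, mode & 1 };
    return bits;
}
```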

SPI_CALLBACK_ATTR

This attribute must be specified on all SPI callback functions provided by the application.

struct spi_master_t
#include <spi.h>

Struct to hold a SPI master context.

The members in this struct should not be accessed directly.

struct spi_master_device_t
#include <spi.h>

Struct type representing a SPI device connected to a SPI master interface.

The members in this struct should not be accessed directly.

XCORE ® -VOICE Solutions$$$Peripheral IO Programming Guide$$$API Reference$$$SPI Library$$$SPI Slave£££modules/io/doc/programming_guide/reference/spi/spi_slave.html#spi-slave
XCORE ® -VOICE Solutions$$$Peripheral IO Programming Guide$$$API Reference$$$SPI Library$$$SPI Slave$$$SPI Slave Usage£££modules/io/doc/programming_guide/reference/spi/spi_slave.html#spi-slave-usage

The following code snippet demonstrates the basic usage of an SPI slave device.

#include <xs1.h>
#include "spi.h"

// Setup callbacks
//  NOTE: See API or SDK examples for more on using the callbacks
spi_slave_callback_group_t spi_cbg = {
     .slave_transaction_started = (slave_transaction_started_t) start,
     .slave_transaction_ended = (slave_transaction_ended_t) end,
     .app_data = NULL
};

port_t p_miso = XS1_PORT_1A;
port_t p_cs   = XS1_PORT_1B;
port_t p_sclk = XS1_PORT_1C;
port_t p_mosi = XS1_PORT_1D;
xclock_t cb   = XS1_CLKBLK_1;

// Start the slave device in this thread
//  NOTE: You may wish to launch the slave device in a different thread.
//        See the XTC Tools documentation reference for lib_xcore.
spi_slave(&spi_cbg, p_sclk, p_mosi, p_miso, p_cs, cb, SPI_MODE_0);
XCORE ® -VOICE Solutions$$$Peripheral IO Programming Guide$$$API Reference$$$SPI Library$$$SPI Slave$$$SPI Slave API£££modules/io/doc/programming_guide/reference/spi/spi_slave.html#spi-slave-api

The following structures and functions are used to initialize and start an SPI slave instance.

void spi_slave(const spi_slave_callback_group_t *spi_cbg, port_t p_sclk, port_t p_mosi, port_t p_miso, port_t p_cs, xclock_t clk, int cpol, int cpha, uint32_t thread_mode)

Initializes a SPI slave.

The CS to first clock minimum delay, sometimes referred to as setup time, will vary based on the duration of the slave_transaction_started callback. This parameter will be application specific. To determine the typical value, time the duration of the slave_transaction_started callback, and add 2000ns as a safety factor. If slave_transaction_started has a non-deterministic runtime, perhaps due to waiting on an XCORE resource, then the application developer must decide an appropriate CS to first SCLK specification.

The minimum delay between consecutive transactions varies based on SPI mode, and if MISO is used.

MISO     | CPOL | CPHA | Min Delay (ns)
enabled  | 0    | 0    | 2270
enabled  | 0    | 1    | 2240
enabled  | 1    | 0    | 2240
enabled  | 1    | 1    | 2270
disabled | 0    | 0    | 440
disabled | 0    | 1    | 420
disabled | 1    | 0    | 430
disabled | 1    | 1    | 430

Note

Verified at 25000 kbps, with a 2000ns CS assertion to first clock in all modes.

Parameters:
  • spi_cbg – The spi_slave_callback_group_t context to use.

  • p_sclk – The SPI slave’s SCLK port. Must be a 1-bit port.

  • p_mosi – The SPI slave’s MOSI port. Must be a 1-bit port.

  • p_miso – The SPI slave’s MISO port. Must be a 1-bit port.

  • p_cs – The SPI slave’s CS port. Must be a 1-bit port.

  • clk – The clock block to use for the SPI slave.

  • cpol – The clock polarity to use.

  • cpha – The clock phase to use.
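
The timing guidance for spi_slave() can be captured in a small lookup, which may be convenient when budgeting delays on the SPI master side. This is only a sketch: the function names are hypothetical, the delay values are copied from the table above, and they were verified only under the conditions stated in the note (25000 kbps, 2000 ns CS assertion to first clock).

```c
#include <assert.h>
#include <stdint.h>

/* Minimum delay between consecutive transactions in ns, copied from the
 * table above. Indexed as [miso_enabled][cpol][cpha]. */
static const uint32_t spi_slave_min_delay_ns[2][2][2] = {
    { { 440u,  420u }, { 430u,  430u } },   /* MISO disabled */
    { { 2270u, 2240u }, { 2240u, 2270u } }, /* MISO enabled  */
};

/* Hypothetical helper: look up the minimum inter-transaction delay. */
uint32_t spi_slave_min_cs_to_cs_ns(int miso_enabled, int cpol, int cpha)
{
    assert((miso_enabled == 0 || miso_enabled == 1) &&
           (cpol == 0 || cpol == 1) && (cpha == 0 || cpha == 1));
    return spi_slave_min_delay_ns[miso_enabled][cpol][cpha];
}

/* Hypothetical helper: typical CS-to-first-clock setup time, taken as the
 * measured slave_transaction_started callback duration plus the 2000 ns
 * safety factor suggested above. */
uint32_t spi_slave_cs_to_sclk_ns(uint32_t callback_duration_ns)
{
    return callback_duration_ns + 2000u;
}
```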

struct spi_slave_callback_group_t
#include <spi.h>

Callback group representing callback events that can occur during the operation of the SPI slave task. Must be initialized by the application prior to passing it to one of the SPI slaves.

XCORE ® -VOICE Solutions$$$Peripheral IO Programming Guide$$$Licenses£££modules/io/doc/shared/legal.html#licenses

XCORE ® -VOICE Solutions$$$Peripheral IO Programming Guide$$$Licenses$$$XMOS£££modules/io/doc/shared/legal.html#xmos

All original source code is licensed under the XMOS License.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library£££modules/io/modules/mic_array/doc/rst/lib_mic_array.html#lib-mic-array-pdm-microphone-array-library

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Introduction£££modules/io/modules/mic_array/doc/rst/src/introduction.html#introduction

lib_mic_array is a library for interfacing with one or more PDM microphones on an XMOS device.

Version 5.0 of this library has been redesigned from scratch to make efficient usage of the XMOS XS3 architecture.

See Getting Started to get going.

Note

Version 5.0 does not currently support XS2 or XS1 devices. Please use version 4.5.0 if you need support for these devices: https://github.com/xmos/lib_mic_array/releases/tag/v4.5.0

Find the latest version of lib_mic_array on GitHub.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Overview£££modules/io/modules/mic_array/doc/rst/src/overview.html#overview

lib_mic_array is a library for capturing and processing PDM microphone data on xcore.ai devices.

PDM microphones are a kind of ‘digital microphone’ which captures audio data as a stream of 1-bit samples at a very high sample rate. The high sample rate PDM stream is captured by the device, filtered and decimated to a 32-bit PCM audio stream.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Overview$$$Capabilities£££modules/io/modules/mic_array/doc/rst/src/overview.html#capabilities

  • Both SDR (1 mic per pin) and DDR (2 mics per pin) microphone configurations are supported

  • Configurable clock divider allows user-selectable PDM sample clock frequency (3.072 MHz typical)

  • Configurable two-stage decimating FIR filter

    • First stage has fixed tap count of 256 and decimation factor of 32

    • Second stage has fully configurable tap count and decimation factor

    • Custom filter coefficients can be used for either stage

    • Reference filter with total decimation factor of 192 is provided (16 kHz output sample rate w/ 3.072 MHz PDM clock).

    • Filter generation scripts and examples are included to support 32 kHz and 48 kHz.

  • Supports 1-, 4- and 8-bit ports.

  • Supports 1 to 16 microphones

    • Includes ability to capture samples on a subset of a port’s pins (e.g. 3 PDM microphones may be used with a 4- or 8-bit port)

    • Also supports microphone channel index remapping

  • Optional DC offset elimination filter

  • Sample framing with user selectable frame size (down to single samples)

  • Most configurations require only a single hardware thread

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Overview$$$High-Level Process View£££modules/io/modules/mic_array/doc/rst/src/overview.html#high-level-process-view

This section gives a brief overview of the steps to process a PDM audio stream into a PCM audio stream. This section is concerned with the steady state behavior and does not describe any necessary initialization steps. The high level process view is depicted in the figure Mic Array High Level Process.

Figure: Mic Array High Level Process

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Overview$$$High-Level Process View$$$Execution Contexts£££modules/io/modules/mic_array/doc/rst/src/overview.html#execution-contexts

The mic array unit uses two different execution contexts. The first is the PDM rx service (“PDM rx”), which is responsible for reading PDM samples from the physical port, and has relatively little work to do, but also has a strict real-time constraint on reading port data in a timely manner. The second is the decimation thread, which is where all processing other than PDM capture is performed.

This two-context model relaxes the need for tight coupling and synchronization between PDM rx and the decimation thread, allowing significant flexibility in how samples are processed in the decimation thread.

PDM rx is typically run within an interrupt on the same hardware core as the decimation thread, but it can also be run as a separate thread in cases where many channels result in a high processing load.

Likewise, the decimators may be split into multiple parallel hardware threads in the case where the processing load exceeds the MIPS available in a single thread.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Overview$$$High-Level Process View$$$Step 1: PDM Capture£££modules/io/modules/mic_array/doc/rst/src/overview.html#step-1-pdm-capture

The PDM data signal is captured by the xcore.ai device’s port hardware. The port receiving the PDM signals buffers the received samples. Each time the port buffer is filled, PDM rx reads the received samples.

Samples are collected word-by-word and assembled into blocks. Each time a block has been filled, the block is transferred to the decimation thread where all remaining mic array processing takes place.

The size of PDM data blocks varies depending upon the configured number of microphone channels and the configured second stage decimator’s decimation factor. Each PDM data block will contain exactly enough PDM samples to produce one new mic array (multi-channel) output sample.
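
The block size can be worked out from the two decimation factors: one output sample needs 32 PDM samples per first stage output and K first stage outputs per channel. The sketch below assumes PDM samples are packed 32 to a word; the helper names are hypothetical and do not appear in lib_mic_array.

```c
#include <assert.h>
#include <stdint.h>

#define S1_DEC_FACTOR 32u /* fixed first stage decimation factor */

/* Hypothetical helper: number of 1-bit PDM samples in one PDM data block,
 * i.e. enough input for exactly one multi-channel output sample. */
uint32_t pdm_block_samples(uint32_t mic_count, uint32_t s2_dec_factor)
{
    assert(mic_count > 0 && s2_dec_factor > 0);
    return mic_count * S1_DEC_FACTOR * s2_dec_factor;
}

/* Hypothetical helper: the same block measured in 32-bit words, assuming
 * 32 PDM samples are packed into each word. */
uint32_t pdm_block_words(uint32_t mic_count, uint32_t s2_dec_factor)
{
    return pdm_block_samples(mic_count, s2_dec_factor) / 32u;
}
```

With 2 microphones and a second stage decimation factor of 6, a block holds 384 PDM samples, or 12 words.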

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Overview$$$High-Level Process View$$$Step 2: First Stage Decimation£££modules/io/modules/mic_array/doc/rst/src/overview.html#step-2-first-stage-decimation

The conversion from the high-sample-rate PDM stream to lower-sample-rate PCM stream involves two stages of decimating filters. After the decimation thread receives a block of PDM samples, the samples are filtered by the first stage decimator.

The first stage decimator has a fixed decimation factor of 32 and a fixed tap count of 256. An application is free to supply its own filter coefficients for the first stage decimator (using the fixed decimation factor and tap count), however this library also provides a reference filter for the first stage decimator that is recommended for most applications.

The first stage decimating filter is an FIR filter with 16-bit coefficients in which each input sample corresponds to a +1 or a -1 (typical for PDM signals). The output of the first stage decimator is a block of 32-bit PCM samples with a sample time 32 times longer than the PDM sample time.

See Decimator Stages for further details.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Overview$$$High-Level Process View$$$Step 3: Second Stage Decimation£££modules/io/modules/mic_array/doc/rst/src/overview.html#step-3-second-stage-decimation

The second stage decimator is a decimating FIR filter with a configurable decimation factor and tap count. Like the first stage decimator, this library provides a reference filter suitable for the second stage decimator. The supplied filter has a tap count of 65 and a decimation factor of 6.

The output of the first stage decimator is a block of N*K PCM values, where N is the number of microphones and K is the second stage decimation factor. This is just enough samples to produce one output sample from the second stage decimator.

The resulting sample is vector-valued (one element per channel) and has a sample time corresponding to 32*K PDM clock periods. Using the reference filters and a 3.072 MHz PDM clock, the output sample rate is 16 kHz.

See Decimator Stages for further details.
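
The sample rate arithmetic above can be checked with a short calculation. The function names below are hypothetical; the only inputs are the PDM clock frequency and the second stage decimation factor K.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: sample rate after the first stage decimator, which
 * always decimates by 32. */
uint32_t stage1_output_rate_hz(uint32_t pdm_freq_hz)
{
    return pdm_freq_hz / 32u;
}

/* Hypothetical helper: mic array output sample rate, one sample per
 * 32 * K PDM clock periods. */
uint32_t mic_array_output_rate_hz(uint32_t pdm_freq_hz, uint32_t s2_dec_factor)
{
    assert(s2_dec_factor > 0);
    return pdm_freq_hz / (32u * s2_dec_factor);
}
```

With a 3.072 MHz PDM clock, K = 6 gives the 16 kHz output stated above, and the same formula gives 32 kHz for K = 3 and 48 kHz for K = 2.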

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Overview$$$High-Level Process View$$$Step 4: Post-Processing£££modules/io/modules/mic_array/doc/rst/src/overview.html#step-4-post-processing

After second stage decimation, the resulting sample goes to post-processing where two (optional) post-processing steps are available.

The first is a simple IIR filter, called DC Offset Elimination, which seeks to ensure each output channel tends to approach zero mean. DC Offset Elimination can be disabled if not desired. See Sample Filters for further details.
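
To give a feel for how such a filter behaves, here is a generic first-order DC-blocking IIR filter of the form y[t] = R * y[t-1] + x[t] - x[t-1]. This is a sketch under assumptions: the filter structure, the value R = 252/256, and the names dc_blocker_t and dc_blocker_step() are illustrative and are not taken from this document, so refer to Sample Filters for the filter the library actually uses.

```c
#include <assert.h>
#include <stdint.h>

/* Per-channel filter state (hypothetical type for illustration). */
typedef struct {
    int64_t prev_x; /* previous input sample  */
    int64_t prev_y; /* previous output sample */
} dc_blocker_t;

/* One step of y[t] = R * y[t-1] + x[t] - x[t-1], with R = 252/256 (an
 * assumed, illustrative coefficient). The result is saturated to 32 bits. */
int32_t dc_blocker_step(dc_blocker_t *state, int32_t x)
{
    assert(state != 0);
    int64_t y = (state->prev_y * 252) / 256 + (int64_t)x - state->prev_x;
    if (y > INT32_MAX) y = INT32_MAX;
    if (y < INT32_MIN) y = INT32_MIN;
    state->prev_x = x;
    state->prev_y = y;
    return (int32_t)y;
}
```

Feeding a constant input into this filter produces an output that decays toward zero, which is the behavior described above.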

The second post-processing step is framing, where instead of signaling each sample of audio to subsequent processing stages one at a time, samples can be aggregated and transferred to subsequent processing stages as non-overlapping blocks. The size of each frame is configurable (down to 1 sample per frame, where framing is functionally disabled).

Finally, the sample or frame is transmitted over a channel from the mic array module to the next stage of the processing pipeline.
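
Framing can be pictured as filling a two-dimensional buffer one multi-channel sample at a time and handing it on once it is full. The sketch below is not the library's implementation: framer_t and framer_push() are hypothetical names, and the MIC_COUNT and SAMPLES_PER_FRAME values are chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MIC_COUNT         2 /* illustrative value */
#define SAMPLES_PER_FRAME 4 /* illustrative value */

/* Hypothetical framing state: one frame plus a fill counter. */
typedef struct {
    int32_t  frame[MIC_COUNT][SAMPLES_PER_FRAME];
    unsigned fill; /* number of samples already stored in the frame */
} framer_t;

/* Stores one multi-channel sample. Returns 1 when the frame has just been
 * completed (and is ready to be transferred), otherwise 0. Frames do not
 * overlap: the next sample starts a new frame. */
int framer_push(framer_t *f, const int32_t sample[MIC_COUNT])
{
    assert(f != 0 && f->fill < SAMPLES_PER_FRAME);
    for (unsigned ch = 0; ch < MIC_COUNT; ch++) {
        f->frame[ch][f->fill] = sample[ch];
    }
    f->fill++;
    if (f->fill == SAMPLES_PER_FRAME) {
        f->fill = 0;
        return 1;
    }
    return 0;
}
```

Setting SAMPLES_PER_FRAME to 1 makes every call return 1, which corresponds to framing being functionally disabled.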

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Overview$$$High-Level Process View$$$Extending/Modifying Mic Array Behavior£££modules/io/modules/mic_array/doc/rst/src/overview.html#extending-modifying-mic-array-behavior

At the core of lib_mic_array are several C++ class templates which are loosely coupled and intended to be easily overridden for modified behavior. The mic array unit itself is an object made by the composition of several smaller components which perform well-defined roles.

For example, modifying the mic array unit to use some mechanism other than a channel to move the audio frames out of the mic array is a matter of defining a small new class encapsulating just the modified transfer behavior, and then instantiating the mic array class template with the new class as the appropriate template parameter.

With that in mind, while most applications will have no need to modify the mic array behavior, the library is nevertheless designed to make doing so easy.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Getting Started£££modules/io/modules/mic_array/doc/rst/src/getting_started.html#getting-started

There are three models for how the mic array unit can be included in an application. The details of how to allocate, initialize and start the mic array will depend on the chosen model.

In order of increasing complexity, these are:

  • Vanilla Model - The simplest way to include the mic array. It is usually sufficient but offers comparatively little flexibility with respect to configuration and run-time control. Using this model (mostly) means modifying an application’s build scripts.

  • Prefab Model - This model involves a little more effort from the application developer, including writing a couple of C++ wrapper functions, but gives the application access to any of the defined prefab mic array components.

  • General Model - Any other case. This is necessary if an application wishes to use a customized mic array component.

The vanilla and prefab models for integrating the mic array into your application will be discussed in more detail below. The general model may involve customizing or extending the classes in lib_mic_array and is beyond the scope of this introduction.

Whichever model is chosen, the first step to integrate a mic array unit into an application is to identify the required hardware resources.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Getting Started$$$Identify Resources£££modules/io/modules/mic_array/doc/rst/src/getting_started.html#identify-resources

The key hardware resources to be identified are the ports and clock blocks that will be used by the mic array unit. The ports correspond to the physical pins on which clocks and sample data will be signaled. Clock blocks are a type of hardware resource which can be attached to ports to coordinate the presentation and capture of signals on physical pins.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Getting Started$$$Identify Resources$$$Clock Blocks£££modules/io/modules/mic_array/doc/rst/src/getting_started.html#clock-blocks

While clock blocks may be more abstract than ports, their implications for this library are actually simpler. First, the mic array unit will need a way of taking the audio master clock and dividing it to produce a PDM sample clock. This can be accomplished with a clock block. This will be the clock block which the API documentation refers to as “Clock A”.

Second, if (and only if) the PDM microphones are being used in a Dual Data Rate (DDR) configuration a second clock block will be required. In a DDR configuration 2 microphones share a physical pin for output sample data, where one signals on the rising edge of the PDM clock and the other signals on the falling edge. The second clock block required in a DDR configuration is referred to as “Clock B” in the API documentation.

Each tile on an xcore.ai device has 5 clock blocks available. In code, a clock block is identified by its resource ID; the IDs are given as the preprocessor macros XS1_CLKBLK_1 through XS1_CLKBLK_5.

Unlike ports, which are tied to specific physical pins, clock blocks are fungible. Your application is free to use any clock block that has not already been allocated for another purpose. The vanilla component model defaults to using XS1_CLKBLK_1 and XS1_CLKBLK_2.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Getting Started$$$Identify Resources$$$Ports£££modules/io/modules/mic_array/doc/rst/src/getting_started.html#ports

Three ports are needed for the mic array component. As mentioned above, ports are physically tied to specific device pins, and so the correct ports must be identified for correct behavior.

Note that while ports are physically tied to specific pins, this is not a 1-to-1 mapping. Each port has a port width (measured in bits) which is the number of pins which comprise the port. Further, the pin mappings for different ports overlap, with a single pin potentially belonging to multiple ports. When identifying the needed ports, take care that both the pin map (see the documentation for your xcore.ai package) and port width are correct.

The first port needed is a 1-bit port on which the audio master clock is received. In the documentation, this is usually referred to as p_mclk.

The second port needed is a 1-bit port on which the PDM clock will be signaled to the PDM mics. This port is referred to as p_pdm_clk.

The third port is that on which the PDM data is received. In an SDR configuration, the width of this port must be greater than or equal to the number of microphones. In a DDR configuration, twice this port width must be greater than or equal to the number of microphones. This port is referred to as p_pdm_mics.
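
The port width rule for p_pdm_mics can be expressed as a one-line check. The function name pdm_port_width_ok() is hypothetical; it only encodes the SDR and DDR conditions stated above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: true if a data port of the given width can carry
 * mic_count microphones. SDR allows one mic per pin, DDR two per pin. */
bool pdm_port_width_ok(unsigned port_width_bits, unsigned mic_count, bool ddr)
{
    assert(port_width_bits > 0 && mic_count > 0);
    unsigned capacity = ddr ? 2u * port_width_bits : port_width_bits;
    return capacity >= mic_count;
}
```

For instance, a 1-bit port is sufficient for 2 microphones only in a DDR configuration.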

XCore applications are typically compiled with an “XN” file (with a “.xn” file extension). An XN file is an XML document which describes some information about the device package as well as some other helpful board-related information. The identification of your ports may have already been done for you in your XN file. Following is a snippet from an XN file with mappings for the three ports described above:

...
<Tile Number="1" Reference="tile[1]">
  <!-- MIC related ports -->
  <Port Location="XS1_PORT_1G"  Name="PORT_PDM_CLK"/>
  <Port Location="XS1_PORT_1F"  Name="PORT_PDM_DATA"/>
  <!-- Audio ports -->
  <Port Location="XS1_PORT_1D"  Name="PORT_MCLK_IN_OUT"/>
  <Port Location="XS1_PORT_1C"  Name="PORT_I2S_BCLK"/>
  <Port Location="XS1_PORT_1B"  Name="PORT_I2S_LRCLK"/>
  <!-- Used for looping back clocks -->
  <Port Location="XS1_PORT_1N"  Name="PORT_NOT_IN_PACKAGE_1"/>
</Tile>
...

The first 3 ports listed, PORT_PDM_CLK, PORT_PDM_DATA and PORT_MCLK_IN_OUT are respectively p_pdm_clk, p_pdm_mics and p_mclk. The value in the Location attribute (e.g. XS1_PORT_1G) is the port name as you will find it in your package documentation.

In this case, either PORT_PDM_CLK or XS1_PORT_1G can be used in code to identify this port.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Getting Started$$$Identify Resources$$$Declaring Resources£££modules/io/modules/mic_array/doc/rst/src/getting_started.html#declaring-resources

Once the ports and clock blocks to be used have been identified, these resources can be represented in code using a pdm_rx_resources_t struct. The following is an example of declaring resources in a DDR configuration. See pdm_rx_resources_t, PDM_RX_RESOURCES_SDR() and PDM_RX_RESOURCES_DDR() for more details.

pdm_rx_resources_t pdm_res = PDM_RX_RESOURCES_DDR(
                                PORT_MCLK_IN_OUT,
                                PORT_PDM_CLK,
                                PORT_PDM_DATA,
                                XS1_CLKBLK_1,
                                XS1_CLKBLK_2);

Note that this is not necessary in applications using the vanilla model.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Getting Started$$$Identify Resources$$$Other Resources£££modules/io/modules/mic_array/doc/rst/src/getting_started.html#other-resources

In addition to ports and clock blocks, there are also several other hardware resource types used by lib_mic_array which are worth considering. Running out of any of these will preclude the mic array from running correctly (if at all).

  • Threads - At least one hardware thread is required to run the mic array component.

  • Compute - The mic array unit will require a fixed number of MIPS (millions of instructions per second) to perform the required processing. The exact requirement will depend on the configuration used.

  • Memory - The mic array requires a modest amount of memory for code and data (see Mic Array Resource Usage).

  • Chanends - At least 4 chanends must be available for signaling between threads/sub-components.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Getting Started$$$Vanilla Model£££modules/io/modules/mic_array/doc/rst/src/getting_started.html#vanilla-model

Mic array configuration with the vanilla model is achieved mostly through the application’s build system configuration.

The /etc/vanilla directory of the lib_mic_array repository contains a source file and a header file which are neither compiled with the library nor on its include path. Configuring the mic array using the vanilla model means adding those files to your application’s build (not the library target), and defining several compile options which tell it how to behave.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Getting Started$$$Vanilla Model$$$Vanilla - CMake Macro£££modules/io/modules/mic_array/doc/rst/src/getting_started.html#vanilla-cmake-macro

To simplify this further, a CMake macro called mic_array_vanilla_add() has been included with the build system.

mic_array_vanilla_add() takes several arguments:

  • TARGET_NAME - The name of the CMake application target that the vanilla mode source should be added to.

  • MCLK_FREQ - The frequency of the master audio clock, in Hz.

  • PDM_FREQ - The desired frequency of the PDM clock, in Hz.

  • MIC_COUNT - The number of microphone channels to be captured.

  • SAMPLES_PER_FRAME - The size of the audio frames produced by the mic array unit (frames will be 2 dimensional arrays with shape (MIC_COUNT,SAMPLES_PER_FRAME)).

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Getting Started$$$Vanilla Model$$$Vanilla - Optional Configuration£££modules/io/modules/mic_array/doc/rst/src/getting_started.html#vanilla-optional-configuration

Though not exposed by the mic_array_vanilla_add() macro, several additional configuration options are available when using the vanilla model. These are all configured by adding defines to the application target.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Getting Started$$$Vanilla Model$$$Vanilla - Initializing and Starting£££modules/io/modules/mic_array/doc/rst/src/getting_started.html#vanilla-initializing-and-starting

Once the configuration options have been chosen, initializing and starting the mic array at run-time is easily achieved. Two function calls are necessary, both of which are declared in mic_array_vanilla.h (which was added to your include path through your build configuration).

First, during application initialization, the function ma_vanilla_init(), which takes no arguments, must be called. This will configure the hardware resources and install the PDM rx service as an ISR, but will not actually start any threads or PDM capture.

Once any remaining application initialization is complete, PDM capture and processing is started by calling ma_vanilla_task(). ma_vanilla_task() is a blocking call which takes a single argument which is the chanend that will be used to transmit audio frames to subsequent stages of the processing pipeline. Usually the call to ma_vanilla_task() will be placed directly in a par {...} block along with other threads to be started on the tile.

Note

Both ma_vanilla_init() and ma_vanilla_task() must be called from the core which will host the decimation thread.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Getting Started$$$Prefab Model£££modules/io/modules/mic_array/doc/rst/src/getting_started.html#prefab-model

The lib_mic_array library has a C++ namespace mic_array::prefab which contains class templates for typical mic array setups using common sub-components. The templates in the mic_array::prefab namespace hide most of the complexity (and unneeded flexibility) from the application author, so they can focus only on pieces they care about.

Note

As of version 5.0.1, only one prefab class template, BasicMicArray, has been defined.

To configure the mic array using a prefab, you will need to add a C++ source file to your application. NB: This will end up looking a lot like the contents of mic_array_vanilla.cpp when you are through.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Getting Started$$$Prefab Model$$$Prefab - Declare Resources£££modules/io/modules/mic_array/doc/rst/src/getting_started.html#prefab-declare-resources

The example in this section will use 2 microphones in a DDR configuration with DC offset elimination enabled, and using 128-sample frames. The resource IDs used may differ from those required for your application.

pdm_res will be used to identify the ports and clocks which will be configured for PDM capture.

Within a C++ source file:

#include "mic_array/mic_array.h"
...
#define MIC_COUNT    2    // 2 mics
#define DCOE_ENABLE  true // DCOE on
#define FRAME_SIZE   128  // 128 samples per frame
...
pdm_rx_resources_t pdm_res = PDM_RX_RESOURCES_DDR(
                                PORT_MCLK_IN_OUT,
                                PORT_PDM_CLK,
                                PORT_PDM_DATA,
                                MIC_ARRAY_CLK1,
                                MIC_ARRAY_CLK2);
...
XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Getting Started$$$Prefab Model$$$Prefab - Allocate MicArray£££modules/io/modules/mic_array/doc/rst/src/getting_started.html#prefab-allocate-micarray

The C++ class template MicArray is central to the mic array unit in this library. The class templates defined in the mic_array::prefab namespace each derive from mic_array::MicArray.

Define and allocate the specific implementation of MicArray to be used.

...
// Using the full name of the class could become cumbersome. Using an alias.
using TMicArray = mic_array::prefab::BasicMicArray<
                      MIC_COUNT, FRAME_SIZE, DCOE_ENABLE>;
// Allocate mic array
TMicArray mics = TMicArray();
...

Now the mic array unit has been defined and allocated. The template parameters supplied (e.g. MIC_COUNT and FRAME_SIZE) are used to calculate the size of any data buffers required by the mic array, and so the mics object is self-contained, with all required buffers being statically allocated. Additionally, class templates will ultimately allow unused features to be optimized out at build time. For example, if DCOE is disabled, it will be optimized out at build time so that at run time it won’t even need to check whether DCOE is enabled.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Getting Started$$$Prefab Model$$$Prefab - Init and Start Functions£££modules/io/modules/mic_array/doc/rst/src/getting_started.html#prefab-init-and-start-functions

Now a couple of functions need to be implemented in your C++ file. In most cases these functions will need to be callable from C or XC, and so they should not be static, and they should be decorated with extern "C" (or the MA_C_API preprocessor macro provided by the library).

First, a function which initializes the MicArray object and configures the port and clock block resources. The documentation for BasicMicArray indicates any parts of the MicArray object that need to be initialized.

#define MCLK_FREQ   24576000
#define PDM_FREQ    3072000
...
MA_C_API
void app_init() {
  // Configure clocks and ports
  const unsigned mclk_div = mic_array_mclk_divider(MCLK_FREQ, PDM_FREQ);
  mic_array_resources_configure(&pdm_res, mclk_div);

  // Initialize the PDM rx service
  mics.PdmRx.Init(pdm_res.p_pdm_mics);
}
...

app_init() can be called from an XC main() during initialization.
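
For the frequencies used in app_init(), the master clock divider works out to 8. The sketch below assumes the divider is simply the integer ratio MCLK_FREQ / PDM_FREQ; example_mclk_divider() is a hypothetical stand-in written for illustration and does not reproduce the library's mic_array_mclk_divider().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in: divider needed to derive the PDM clock from the
 * audio master clock, assuming the ratio must be a whole number. */
uint32_t example_mclk_divider(uint32_t mclk_freq_hz, uint32_t pdm_freq_hz)
{
    assert(pdm_freq_hz > 0 && mclk_freq_hz % pdm_freq_hz == 0);
    return mclk_freq_hz / pdm_freq_hz;
}
```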

Assuming the PDM rx service is to be run as an ISR, a second function is used to actually start the mic array unit. This starts the PDM clock, installs the ISR and enters the decimator thread’s main loop.

MA_C_API
void app_mic_array_task(chanend_t c_audio_frames) {
  mics.SetOutputChannel(c_audio_frames);

  // Start the PDM clock
  mic_array_pdm_clock_start(&pdm_res);

  mics.InstallPdmRxISR();
  mics.UnmaskPdmRxISR();

  mics.ThreadEntry();
}

Now a call to app_mic_array_task() with the channel to send frames on can be placed inside a par {...} block to spawn the thread.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Decimator Stages£££modules/io/modules/mic_array/doc/rst/src/decimator_stages.html#decimator-stages

The mic array unit provided by this library uses a two-stage decimation process to convert a high sample rate stream of (1-bit) PDM samples into a lower sample rate stream of (32-bit) PCM samples.

Below is a Simplified Decimator Model.

../_images/decimator_stages.drawio.png

Simplified Decimator Model

The first stage filter is a decimating FIR filter with a fixed tap count (S1_TAP_COUNT) of 256 and a fixed decimation factor (S1_DEC_FACTOR) of 32.

The second stage decimator is a fully configurable FIR filter with tap count S2_TAP_COUNT and a decimation factor of S2_DEC_FACTOR (this can be 1).

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Decimator Stages$$$Decimator Stage 1£££modules/io/modules/mic_array/doc/rst/src/decimator_stages.html#decimator-stage-1

For the first stage decimating FIR filter, the actual filter coefficients used are configurable, so an application is free to use a custom first stage filter, as long as the tap count is 256. This library also provides coefficients for the first stage filter, whose filter characteristics are adequate for most applications.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Decimator Stages$$$Decimator Stage 1$$$Filter Implementation (Stage 1)£££modules/io/modules/mic_array/doc/rst/src/decimator_stages.html#filter-implementation-stage-1

The input to the first stage decimator (here called “Stream A”) is a stream of 1-bit PDM samples with a sample rate of PDM_FREQ. Rather than each PDM sample representing a value of 0 or 1, each PDM sample represents a value of either +1 or -1. Specifically, on-chip and in-memory, a bit value of 0 represents +1 and a bit value of 1 represents -1.

The output from the first stage decimator, Stream B, is a stream of 32-bit PCM samples with a sample rate of PDM_FREQ/S1_DEC_FACTOR = PDM_FREQ/32. For example, if PDM_FREQ is 3.072 MHz, then Stream B’s sample rate is 96.0 kHz.

The first stage filter is structured to make optimal use of the XCore XS3 vector processing unit (VPU), which can compute the dot product of a pair of 256-element 1-bit vectors in a single cycle. The first stage uses 256 16-bit coefficients for its filter taps.

The signature of the filter function is

int32_t fir_1x16_bit(uint32_t signal[8], uint32_t coeff_1[]);

Each time 32 PDM samples (1 word) become available for an audio channel, those samples are shifted into the 8-word (256-bit) filter state, and a call to fir_1x16_bit results in 1 Stream B sample element for that channel.

The actual implementation for the first stage filter can be found in src/fir_1x16_bit.S. Additional usage details can be found in api/etc/fir_1x16_bit.h.

Note that the 256 16-bit filter coefficients are not stored in memory as a standard coefficient array (i.e. int16_t filter[256] = {b[0], b[1], ... };). Rather, in order to take advantage of the VPU, the coefficients must be rearranged bit-by-bit into a block form suitable for VPU processing. See the section below on filter conversion if supplying a custom filter for stage 1.
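
To make the arithmetic of the first stage concrete, the following is a minimal scalar reference sketch, not the library's VPU implementation. It assumes a plain int16_t coefficient array and a hypothetical bit ordering (tap k reads bit k % 32 of state word k / 32, with word 0 holding the newest samples); the names shift_in_pdm_word and fir_1bit_reference are illustrative only and do not exist in lib_mic_array.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kStage1Taps = 256;               // fixed tap count
constexpr std::size_t kStateWords = kStage1Taps / 32;  // 8 words of state

// Shift one new word (32 PDM samples) into the filter state.
// Assumption for this sketch: word 0 holds the newest samples.
inline void shift_in_pdm_word(std::array<std::uint32_t, kStateWords>& state,
                              std::uint32_t new_word) {
  for (std::size_t i = kStateWords - 1; i > 0; --i) {
    state[i] = state[i - 1];
  }
  state[0] = new_word;
}

// Scalar reference: a bit value of 0 contributes +coef, 1 contributes -coef.
// Assumption: coefficients are in a plain array, not the VPU block layout.
inline std::int32_t fir_1bit_reference(
    const std::array<std::uint32_t, kStateWords>& state,
    const std::array<std::int16_t, kStage1Taps>& coef) {
  std::int32_t acc = 0;
  for (std::size_t k = 0; k < kStage1Taps; ++k) {
    const std::uint32_t bit = (state[k / 32] >> (k % 32)) & 1u;
    const std::int32_t c = static_cast<std::int32_t>(coef[k]);
    acc += bit ? -c : c;
  }
  return acc;
}
```

A model like this may be useful on a host machine for checking a custom coefficient set before it is converted to the block form the library expects.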

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Decimator Stages$$$Decimator Stage 1$$$Provided Filter (Stage 1)£££modules/io/modules/mic_array/doc/rst/src/decimator_stages.html#provided-filter-stage-1

This library provides filter coefficients that may be used with the first stage decimator. These coefficients are available in your application through the header mic_array/etc/filters_default.h as stage1_coef.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Decimator Stages$$$Decimator Stage 1$$$Provided Filter (Stage 1)$$$Filter Characteristics (Stage 1)£££modules/io/modules/mic_array/doc/rst/src/decimator_stages.html#filter-characteristics-stage-1

The plot below (First stage decimation filter freq response) shows the frequency response of the provided first stage decimation filter.

../_images/stage1_freq_response.png

First stage decimation filter freq response

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Decimator Stages$$$Decimator Stage 1$$$Filter Conversion Script£££modules/io/modules/mic_array/doc/rst/src/decimator_stages.html#filter-conversion-script

Taking a set of floating-point coefficients, quantizing them into 16-bit coefficients and ‘boggling’ them into the correct memory layout can be a tricky business. To simplify this process, this library provides a Python (3) script which does this process for you.

The script can be found in this repository at python/stage1.py.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Decimator Stages$$$Decimator Stage 2£££modules/io/modules/mic_array/doc/rst/src/decimator_stages.html#decimator-stage-2

An application is free to supply its own second stage filter. This library also provides a second stage filter whose characteristics are adequate for many or most applications.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Decimator Stages$$$Decimator Stage 2$$$Filter Implementation (Stage 2)£££modules/io/modules/mic_array/doc/rst/src/decimator_stages.html#filter-implementation-stage-2

The input to the second stage decimator (here called “Stream B”) is the stream of 32-bit PCM samples emitted from the first stage decimator with a sample rate of PDM_FREQ/32.

The output from the second stage decimator, Stream C, is a stream of 32-bit PCM samples with a sample rate of PDM_FREQ/(32*S2_DEC_FACTOR). For example, if PDM_FREQ is 3.072 MHz, and S2_DEC_FACTOR is 6, then Stream C’s sample rate (the sample rate received by the main application code) is

3.072 MHz / (32*6) = 16 kHz
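
The rate relationships between Streams A, B and C can be checked at compile time. The sketch below uses hypothetical helper names (stream_b_rate_hz, stream_c_rate_hz) that are not part of the library; only the fixed first stage decimation factor of 32 and the example frequencies come from the text above.

```cpp
#include <cstdint>

// Fixed decimation factor of the first stage (S1_DEC_FACTOR).
constexpr std::uint32_t kS1DecFactor = 32;

// Sample rate of Stream B (output of stage 1).
constexpr std::uint32_t stream_b_rate_hz(std::uint32_t pdm_freq_hz) {
  return pdm_freq_hz / kS1DecFactor;
}

// Sample rate of Stream C (output of stage 2).
constexpr std::uint32_t stream_c_rate_hz(std::uint32_t pdm_freq_hz,
                                         std::uint32_t s2_dec_factor) {
  return pdm_freq_hz / (kS1DecFactor * s2_dec_factor);
}

// 3.072 MHz PDM clock with the second stage factors discussed in this document.
static_assert(stream_b_rate_hz(3072000) == 96000, "Stream B is 96 kHz");
static_assert(stream_c_rate_hz(3072000, 6) == 16000, "factor 6 gives 16 kHz");
static_assert(stream_c_rate_hz(3072000, 3) == 32000, "factor 3 gives 32 kHz");
static_assert(stream_c_rate_hz(3072000, 2) == 48000, "factor 2 gives 48 kHz");
```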

The second stage filter uses the 32-bit FIR filter implementation from lib_xcore_math. See xs3_filter_fir_s32() in that library for more implementation details.
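
As a rough illustration of what a decimating FIR does, the following scalar sketch consumes DEC_FACTOR Stream B samples and emits one Stream C sample. It is not the lib_xcore_math implementation: the class name DecimatingFirSketch, the 64-bit accumulator and the way the right shift is applied are all assumptions made for this example.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of a decimating FIR filter.
template <std::size_t TAP_COUNT, std::size_t DEC_FACTOR>
class DecimatingFirSketch {
 public:
  DecimatingFirSketch(const std::array<std::int32_t, TAP_COUNT>& coef,
                      unsigned right_shift)
      : coef_(coef), shift_(right_shift) {}

  // Consume DEC_FACTOR input samples (oldest first), produce one output.
  std::int32_t process(const std::array<std::int32_t, DEC_FACTOR>& in) {
    for (std::int32_t x : in) {
      // Age the history by one sample and store the newest at index 0.
      for (std::size_t i = TAP_COUNT - 1; i > 0; --i) {
        state_[i] = state_[i - 1];
      }
      state_[0] = x;
    }
    // Assumption: coefficients are scaled so the sum fits in 64 bits.
    std::int64_t acc = 0;
    for (std::size_t k = 0; k < TAP_COUNT; ++k) {
      acc += static_cast<std::int64_t>(coef_[k]) * state_[k];
    }
    // Note: right-shifting a negative value is arithmetic on common
    // compilers, but is only guaranteed by the standard from C++20.
    return static_cast<std::int32_t>(acc >> shift_);
  }

 private:
  std::array<std::int32_t, TAP_COUNT> coef_;
  std::array<std::int32_t, TAP_COUNT> state_{};
  unsigned shift_;
};
```

Only one dot product is computed per DEC_FACTOR input samples, which is where the compute saving of a decimating filter comes from.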

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Decimator Stages$$$Decimator Stage 2$$$Provided Filter (Stage 2)£££modules/io/modules/mic_array/doc/rst/src/decimator_stages.html#provided-filter-stage-2

This library provides a filter suitable for the second stage decimator. It is available in your application through the header mic_array/etc/filters_default.h.

For the provided filter S2_TAP_COUNT = 65, and S2_DEC_FACTOR = 6.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Decimator Stages$$$Decimator Stage 2$$$Provided Filter (Stage 2)$$$Filter Characteristics (Stage 2)£££modules/io/modules/mic_array/doc/rst/src/decimator_stages.html#filter-characteristics-stage-2

The plot below (Second stage decimation filter freq response) shows the frequency response of the provided second stage decimation filter.

../_images/stage2_freq_response.png

Second stage decimation filter freq response

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Decimator Stages$$$Custom Filters£££modules/io/modules/mic_array/doc/rst/src/decimator_stages.html#custom-filters

Without writing a custom decimator implementation, the tap count and decimation factor for the first stage decimator are fixed to 256 and 32 respectively. These can be modified for the second stage, and the filter coefficients for both stages can be modified.

When using the C++ API to construct your application’s mic array component, the decimator’s metaparameters (tap count, decimation factor) are given as C++ template parameters for the decimator class template. Pointers to the coefficients are provided to the decimator when it is initialized.

To keep things simple, when using the vanilla API or when constructing the mic array component using BasicMicArray, it is assumed that the filter parameters will be those from stage1_fir_coef.c, stage2_fir_coef.c and filters_default.h. In this case it is recommended to simply change those files directly with the updated coefficients. Otherwise you may need to use the C++ API directly.

Note that both the first and second stage filters are implemented using fixed-point arithmetic which requires the coefficients to be presented in a particular format. The Python scripts stage1.py and stage2.py, provided with this library, can be used to help with this formatting. See the associated README for usage details.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Decimator Stages$$$Custom Filters$$$Configuring for 32 kHz or 48 kHz output£££modules/io/modules/mic_array/doc/rst/src/decimator_stages.html#configuring-for-32-khz-or-48-khz-output

Filter design scripts are provided to support higher output sampling rates than the default 16 kHz.

Both stage 1 and stage 2 need to be updated, because the first stage needs a higher cutoff frequency before samples are passed to the second stage decimator, which downsamples by three (32 kHz) or two (48 kHz).

From the command line, follow these instructions:

python filter_design/design_filter.py # generate the filter .pkl files
python stage1.py good_32k_filter_int.pkl # convert the .pkl file to a C style array for stage 1
python stage2.py good_32k_filter_int.pkl # convert the .pkl file to a C style array for stage 2

Note

Use good_48k_filter_int.pkl instead of good_32k_filter_int.pkl to support 48 kHz.

Next, copy the output from the last two scripts into a source file. This could be your mic_array.cpp file which launches the mic array tasks. It may look something like this:

#define MIC_ARRAY_32K_STAGE_1_TAP_COUNT 148
#define MIC_ARRAY_32K_STAGE_1_FILTER_WORD_COUNT 128
static const uint32_t WORD_ALIGNED stage1_32k_coefs[MIC_ARRAY_32K_STAGE_1_FILTER_WORD_COUNT]
{
    .... the coeffs
};

#define MIC_ARRAY_32K_STAGE_2_TAP_COUNT 96
static constexpr right_shift_t stage2_32k_shift = 3;

static const int32_t WORD_ALIGNED stage2_32k_coefs[MIC_ARRAY_32K_STAGE_2_TAP_COUNT] = {
    .... the coeffs
};

A new mic array type that references your new filter coefficients must now be declared. Again, this example is for 32 kHz output, since the second stage decimation factor is 3:

using TMicArray = mic_array::MicArray<mic_count,
    mic_array::TwoStageDecimator<mic_count,
                               3,
                               MIC_ARRAY_32K_STAGE_2_TAP_COUNT>,
    mic_array::StandardPdmRxService<MIC_ARRAY_CONFIG_MIC_IN_COUNT,
                                mic_count,
                                3>,
    typename std::conditional<MIC_ARRAY_CONFIG_USE_DC_ELIMINATION,
                                mic_array::DcoeSampleFilter<mic_count>,
                                mic_array::NopSampleFilter<mic_count>>::type,
    mic_array::FrameOutputHandler<mic_count,
                                MIC_ARRAY_CONFIG_SAMPLES_PER_FRAME,
                                mic_array::ChannelFrameTransmitter>>;

Next you need to change how you initialise and run the mic array task to reference your new mic array custom object. Normally the following code would be used in ma_init():

mics.Init();
mics.SetPort(pdm_res.p_pdm_mics);
mic_array_resources_configure(&pdm_res, MIC_ARRAY_CONFIG_MCLK_DIVIDER);
mic_array_pdm_clock_start(&pdm_res);

However, if you wish to use custom filters, the initialisation would look like this:

mics.Decimator.Init(stage1_32k_coefs, stage2_32k_coefs, stage2_32k_shift);
mics.PdmRx.Init(pdm_res.p_pdm_mics);
mic_array_resources_configure(&pdm_res, MIC_ARRAY_CONFIG_MCLK_DIVIDER);
mic_array_pdm_clock_start(&pdm_res);

Finally, the ma_task() function needs to be changed from the default way of calling:

mics.SetOutputChannel(c_frames_out);
mics.InstallPdmRxISR();
mics.UnmaskPdmRxISR();
mics.ThreadEntry();

to using the custom version of the object:

mics.OutputHandler.FrameTx.SetChannel(c_frames_out);
mics.PdmRx.InstallISR();
mics.PdmRx.UnmaskISR();
mics.ThreadEntry();

The increased sample rate will place a higher MIPS burden on the processor. The typical MIPS usage (see section Mic Array Resource Usage) is in the order of 11 MIPS per channel using a 16 kHz output decimator.

Increasing the output sample rate to 32 kHz using the same length filters will increase processor usage per channel to approximately 13 MIPS rising to 15.6 MIPS for 48 kHz.

Increasing the filter lengths to 148 and 96 for stages 1 and 2 respectively at 48 kHz will increase processor usage per channel to around 20 MIPS.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Decimator Stages$$$Custom Filters$$$Configuring for 32 kHz or 48 kHz output$$$Filter Characteristics for good_32k_filter_int.pkl£££modules/io/modules/mic_array/doc/rst/src/decimator_stages.html#filter-characteristics-for-good-32k-filter-int-pkl

The plot below indicates the frequency response of the first and second stages of the provided 32 kHz filters as well as the cascaded overall response. Note that the overall combined response provides a nice flat passband as shown in the good_32k_filter_int.pkl frequency response.

../_images/32k_freq_response.png

good_32k_filter_int.pkl frequency response

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Decimator Stages$$$Custom Filters$$$Configuring for 32 kHz or 48 kHz output$$$Filter Characteristics for good_48k_filter_int.pkl£££modules/io/modules/mic_array/doc/rst/src/decimator_stages.html#filter-characteristics-for-good-48k-filter-int-pkl

The plot below indicates the frequency response of the first and second stages of the provided 48 kHz filters as well as the cascaded overall response. Note that the overall combined response provides a nice flat passband as shown in the good_48k_filter_int.pkl frequency response.

../_images/48k_freq_response.png

good_48k_filter_int.pkl frequency response

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Sample Filters£££modules/io/modules/mic_array/doc/rst/src/sample_filters.html#sample-filters

Following the two-stage decimation procedure is an optional post-processing stage called the sample filter. This stage operates on each sample emitted by the second stage decimator, one at a time, before the samples are handed off for framing or transfer to the rest of the application’s audio pipeline.

Note

This is represented by the SampleFilter sub-component of the MicArray class template.

An application may implement its own sample filter in the form of a C++ class which implements the Filter() function as required by MicArray. See the implementation of DcoeSampleFilter for a simple example.
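
As a sketch of what such a class might look like, the example below defines a hypothetical sample filter that attenuates every channel by a power of two. The class name AttenuateSampleFilterSketch and the SHIFT parameter are invented for illustration, and the Filter() signature is inferred from how MicArray calls it on an int32_t array of MIC_COUNT elements; check DcoeSampleFilter in the library for the authoritative interface.

```cpp
#include <cstdint>

// Hypothetical sample filter: divides each channel's sample by 2^SHIFT.
template <unsigned MIC_COUNT, unsigned SHIFT>
class AttenuateSampleFilterSketch {
 public:
  // Called once per decimated sample vector; modifies the samples in place.
  void Filter(std::int32_t sample[MIC_COUNT]) {
    for (unsigned ch = 0; ch < MIC_COUNT; ++ch) {
      // Division (rather than >>) keeps the behavior well defined for
      // negative samples and rounds toward zero.
      sample[ch] /= (std::int32_t{1} << SHIFT);
    }
  }
};
```

Because the filter type is a template argument, a class like this carries no virtual-call overhead when used as the SampleFilter sub-component.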

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Sample Filters$$$DC Offset Elimination£££modules/io/modules/mic_array/doc/rst/src/sample_filters.html#dc-offset-elimination

The current version of this library provides a simple IIR filter called DC Offset Elimination (DCOE) that can be used as the sample filter. This is a high-pass filter meant to ensure that each audio channel will tend towards a mean sample value of zero.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Sample Filters$$$DC Offset Elimination$$$Enabling/Disabling DCOE£££modules/io/modules/mic_array/doc/rst/src/sample_filters.html#enabling-disabling-dcoe

Whether the DCOE filter is enabled by default and how to enable or disable it depends on which approach your project uses to include the mic array component in the application.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Sample Filters$$$DC Offset Elimination$$$Enabling/Disabling DCOE$$$Vanilla Model£££modules/io/modules/mic_array/doc/rst/src/sample_filters.html#vanilla-model

If your project uses the vanilla model (see Vanilla API) to include the mic array unit in your application, then DCOE is enabled by default. To disable DCOE your build script must add a compiler option to your application target that sets the MIC_ARRAY_CONFIG_USE_DC_ELIMINATION preprocessor macro to the value 0.

For example, in a typical application’s CMakeLists.txt, that may look like the following.

# Gather sources and create application target
# ...
# Add vanilla source to application build
mic_array_vanilla_add(my_app  ${MCLK_FREQ} ${PDM_FREQ}
                            ${MIC_COUNT} ${FRAME_SIZE} )
# ...
# Disable DCOE
target_compile_definitions(my_app
    PRIVATE MIC_ARRAY_CONFIG_USE_DC_ELIMINATION=0 )
XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Sample Filters$$$DC Offset Elimination$$$Enabling/Disabling DCOE$$$Prefab Model£££modules/io/modules/mic_array/doc/rst/src/sample_filters.html#prefab-model

If your project instantiates the BasicMicArray class template to include the mic array unit, DC offset elimination is enabled or disabled with the USE_DCOE boolean template parameter (there is no default).

template <unsigned MIC_COUNT, unsigned FRAME_SIZE, bool USE_DCOE>
    class BasicMicArray : public ...

The sample filter chosen is based on the USE_DCOE template parameter when the class template gets instantiated. If true, DcoeSampleFilter will be selected as the MicArray SampleFilter sub-component. Otherwise NopSampleFilter will be used.

Note

NopSampleFilter is a no-op filter – it does not modify the samples given to it and ultimately will be completely optimized out at compile time.

For example, in your application source:

#include "mic_array/mic_array.h"
...
// Controls whether DCOE is enabled
static constexpr bool enable_dcoe = true;
auto mics = mic_array::prefab::BasicMicArray<MICS, FRAME_SIZE, enable_dcoe>();
...
XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Sample Filters$$$DC Offset Elimination$$$Enabling/Disabling DCOE$$$General Model£££modules/io/modules/mic_array/doc/rst/src/sample_filters.html#general-model

If your project does not use either the vanilla or prefab models to include the mic array unit in your application, then precisely how the DCOE filter is included may depend on the specifics of your application. In general, however, the DCOE filter will be enabled by using DcoeSampleFilter as the TSampleFilter template parameter for the MicArray class template.

For example, sub-classing mic_array::MicArray as follows will enable DCOE for any MicArray implementation deriving from that sub-class.

#include "mic_array/cpp/MicArray.hpp"
using namespace mic_array;
...
template <unsigned MIC_COUNT, class TDecimator,
          class TPdmRx, class TOutputHandler>
class DcoeEnabledMicArray : public MicArray<MIC_COUNT, TDecimator, TPdmRx,
                                    DcoeSampleFilter<MIC_COUNT>, TOutputHandler>
{
  ...
};
XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Sample Filters$$$DC Offset Elimination$$$DCOE Filter Equation£££modules/io/modules/mic_array/doc/rst/src/sample_filters.html#dcoe-filter-equation

As mentioned above, the DCOE filter is a simple IIR filter given by the following equation, where x[t] and x[t-1] are the current and previous input sample values respectively, and y[t] and y[t-1] are the current and previous output sample values respectively.

R = 252.0 / 256.0
y[t] = R * y[t-1] + x[t] - x[t-1]
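
The equation above can be exercised on a host with a small floating-point model. This is only a sketch under stated assumptions: the library's DcoeSampleFilter uses fixed-point arithmetic, and the class name DcoeSketch and the zero initial state are choices made for this example.

```cpp
// R from the DCOE filter equation.
constexpr double kDcoeR = 252.0 / 256.0;

// Hypothetical single-channel floating-point model of the DCOE filter.
class DcoeSketch {
 public:
  // Computes y[t] = R * y[t-1] + x[t] - x[t-1] for one new input sample.
  double step(double x) {
    const double y = kDcoeR * prev_y_ + x - prev_x_;
    prev_x_ = x;
    prev_y_ = y;
    return y;
  }

 private:
  double prev_x_ = 0.0;  // x[t-1], assumed to start at zero
  double prev_y_ = 0.0;  // y[t-1], assumed to start at zero
};
```

Feeding a constant (pure DC) input into this model produces an output that decays geometrically by a factor of R per sample, which is the high-pass behavior described above.
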
XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Sample Filters$$$DC Offset Elimination$$$DCOE Filter Frequency Response£££modules/io/modules/mic_array/doc/rst/src/sample_filters.html#dcoe-filter-frequency-response

The plot below (DCOE filter frequency response) shows the frequency response of the DCOE filter.

../_images/dcoe_freq_response.png

DCOE filter frequency response

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Software Structure£££modules/io/modules/mic_array/doc/rst/src/software_structure.html#software-structure

The core of lib_mic_array is a set of C++ class templates representing the mic array unit and its sub-components.

The template parameters of these class templates are (mainly) used for two different purposes. Non-type template parameters are used to specify certain quantitative configuration values, such as the number of microphone channels or the second stage decimator tap count. Type template parameters, on the other hand, are used for configuring the behavior of sub-components.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Software Structure$$$High-Level View£££modules/io/modules/mic_array/doc/rst/src/software_structure.html#high-level-view

At the heart of the mic array API is the MicArray class template.

Note

All classes and class templates mentioned are in the mic_array C++ namespace unless otherwise specified. Additionally, this documentation may refer to class templates (e.g. MicArray) with unbound template parameters as “classes” when doing so is unlikely to lead to confusion.

The MicArray class template looks like the following:

template <unsigned MIC_COUNT,
          class TDecimator,
          class TPdmRx,
          class TSampleFilter,
          class TOutputHandler>
class MicArray;

Here the non-type template parameter MIC_COUNT indicates the number of microphone channels to be captured and processed by the mic array unit. Most of the class templates have this as a parameter.

A MicArray object comprises 4 sub-components:

| Member Field  | Component Class | Responsibility                                          |
|---------------|-----------------|---------------------------------------------------------|
| PdmRx         | TPdmRx          | Capturing PDM data from a port.                         |
| Decimator     | TDecimator      | 2-stage decimation on blocks of PDM data.               |
| SampleFilter  | TSampleFilter   | Post-processing of decimated samples.                   |
| OutputHandler | TOutputHandler  | Transferring audio data to subsequent pipeline stages.  |

Each of the MicArray sub-components has a type that is specified as a template parameter when the class template is instantiated. MicArray requires the class of each of its sub-components to implement a certain minimal interface. The MicArray object interacts with its sub-components using this interface.

Note

Abstract classes are not used to enforce this interface contract. Instead, the contract is enforced (at compile time) solely in how the MicArray object makes use of the sub-component.

The following diagram (Mic Array High Level Process) conceptually captures the flow of information through the MicArray sub-components.

../_images/high_level_process.drawio.png

Mic Array High Level Process

Note

MicArray does not enforce the use of an XCore port for collecting PDM samples or an XCore channel for transferring processed data. This is just the typical usage.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Software Structure$$$High-Level View$$$Mic Array / Decimator Thread£££modules/io/modules/mic_array/doc/rst/src/software_structure.html#mic-array-decimator-thread

Aside from aggregating its sub-components into a single logical entity, the MicArray class template also holds the high-level logic for capturing, processing and coordinating movement of the audio stream data.

The following code snippet is the implementation for the main mic array thread (or “decimation thread”; not to be confused with the optional PDM capture thread).

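template <unsigned MIC_COUNT, class TDecimator, class TPdmRx,
          class TSampleFilter, class TOutputHandler>
void mic_array::MicArray<MIC_COUNT,TDecimator,TPdmRx,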
                                  TSampleFilter,
                                  TOutputHandler>::ThreadEntry()
{
  int32_t sample_out[MIC_COUNT] = {0};

  while(1){
    uint32_t* pdm_samples = PdmRx.GetPdmBlock();
    Decimator.ProcessBlock(sample_out, pdm_samples);
    SampleFilter.Filter(sample_out);
    OutputHandler.OutputSample(sample_out);
  }
}

The thread loops forever, and on each iteration

  • Requests a block of PDM sample data from the PDM rx service. This is a blocking call which only returns once a complete block becomes available.

  • Passes the block of PDM sample data to the decimator to produce a single output sample.

  • Applies a post-processing filter to the sample data.

  • Passes the processed sample to the output handler to be transferred to the next stage of the processing pipeline. This may also be a blocking call, only returning once the data has been transferred.

Note that the MicArray object doesn’t care how these steps are actually implemented. For example, one output handler implementation may send samples one at a time over a channel. Another output handler implementation may collect samples into frames, and use a FreeRTOS queue to transfer the data to another thread.
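
To illustrate how loosely coupled these steps are, the sketch below wires four stand-in components into a miniature version of the loop that can run on a host. Everything here is hypothetical: MiniMicArray, CountingPdmRx, PassThroughDecimator, DoublingFilter and VectorOutput are not library classes, the loop is bounded rather than infinite, and a single channel is assumed.

```cpp
#include <cstdint>
#include <vector>

// Stand-in PDM source: each "block" is one word holding a running count.
struct CountingPdmRx {
  std::uint32_t block[1] = {0};
  std::uint32_t* GetPdmBlock() { ++block[0]; return block; }
};

// Stand-in decimator: copies the block word straight to the output sample.
struct PassThroughDecimator {
  void ProcessBlock(std::int32_t* sample_out, std::uint32_t* pdm_block) {
    sample_out[0] = static_cast<std::int32_t>(pdm_block[0]);
  }
};

// Stand-in sample filter: doubles the sample.
struct DoublingFilter {
  void Filter(std::int32_t* sample) { sample[0] *= 2; }
};

// Stand-in output handler: records samples instead of using a channel.
struct VectorOutput {
  std::vector<std::int32_t> sent;
  void OutputSample(std::int32_t* sample) { sent.push_back(sample[0]); }
};

// Miniature, single-channel analogue of the MicArray loop.
template <class TDecimator, class TPdmRx, class TSampleFilter,
          class TOutputHandler>
struct MiniMicArray {
  TPdmRx PdmRx;
  TDecimator Decimator;
  TSampleFilter SampleFilter;
  TOutputHandler OutputHandler;

  // Bounded version of the thread loop so it terminates on a host.
  void RunIterations(unsigned n) {
    std::int32_t sample_out[1] = {0};
    for (unsigned i = 0; i < n; ++i) {
      std::uint32_t* pdm_samples = PdmRx.GetPdmBlock();
      Decimator.ProcessBlock(sample_out, pdm_samples);
      SampleFilter.Filter(sample_out);
      OutputHandler.OutputSample(sample_out);
    }
  }
};
```

Swapping any one of the four types for another class with the same method names changes the behavior without touching the loop itself.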

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Software Structure$$$High-Level View$$$Curiously Recurring Template Pattern£££modules/io/modules/mic_array/doc/rst/src/software_structure.html#curiously-recurring-template-pattern

The C++ API of this library makes heavy use of the Curiously Recurring Template Pattern (CRTP).

Instead of providing flexibility through abstract classes or polymorphism, CRTP achieves flexibility through the use of class templates with type template parameters. As with derived classes and virtual methods, the CRTP template parameter must follow a contract with the class template where it implements one or more methods with specific names and signatures that the class template directly calls.

There are a couple of notable advantages of using CRTP over polymorphic behavior. With CRTP, flexibility does not generally come with the same run-time costs (in terms of both compute and memory) as polymorphic solutions. This is because the CRTP class template always knows, at compile time, the concrete type of any objects it uses. This avoids the need for run-time type information or virtual function tables, and allows compile-time optimizations which may not otherwise be available. This in turn allows many function calls to be inlined or, in some cases, entirely eliminated.

Additionally, while not strictly an example of CRTP, integer template parameters are also heavily used in class templates. The two main advantages of this are that it allows objects to encapsulate their own (statically allocated) memory, and that it allows the compiler to make compile time loop optimizations that it may not otherwise be able to make.

The downside to CRTP is that it tends to lead to highly verbose class type names, where templated classes end up with type arguments that are themselves templated classes with their own template parameters.
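
For readers unfamiliar with the pattern, here is a minimal, generic CRTP sketch that is unrelated to the library's own classes. The names AccumulatorBase, SquaringAccumulator and Transform are invented for this example; the point is only that the base class calls a method of the derived class through a static_cast, with no virtual functions involved.

```cpp
#include <cstdint>

// Base class template parameterized on the type deriving from it.
template <class SubType>
class AccumulatorBase {
 public:
  // Calls SubType::Transform() directly; the call is resolved at compile
  // time, so no virtual function table is needed.
  std::int32_t Push(std::int32_t x) {
    total_ += static_cast<SubType*>(this)->Transform(x);
    return total_;
  }

 private:
  std::int32_t total_ = 0;
};

// Derived class fulfils the contract by providing Transform().
class SquaringAccumulator : public AccumulatorBase<SquaringAccumulator> {
 public:
  std::int32_t Transform(std::int32_t x) { return x * x; }
};
```

If SquaringAccumulator omitted Transform(), the error would surface at compile time when Push() is instantiated, which mirrors how the interface contract is enforced for the MicArray sub-components.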

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Software Structure$$$High-Level View$$$Sub-Component Initialization£££modules/io/modules/mic_array/doc/rst/src/software_structure.html#sub-component-initialization

Each of MicArray’s sub-components may have implementation-specific configuration or initialization requirements. Each sub-component is a public member of MicArray (see table above). An application can access a sub-component directly to perform any type-specific initialization or other manipulation.

For example, the ChannelFrameTransmitter output handler class needs to know the chanend to be used for sending samples. This can be initialized on a MicArray object mics with mics.OutputHandler.SetChannel(c_sample_out).

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Software Structure$$$Sub-Components£££modules/io/modules/mic_array/doc/rst/src/software_structure.html#sub-components

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Software Structure$$$Sub-Components$$$PdmRx£££modules/io/modules/mic_array/doc/rst/src/software_structure.html#pdmrx

PdmRx, or the PDM rx service, is the MicArray sub-component responsible for capturing PDM sample data, assembling it into blocks, and passing it along so that it can be decimated.

The MicArray class requires only that PdmRx implement GetPdmBlock(), a blocking call that returns a pointer to a block of PDM data which is ready for further processing.

Generally speaking, PdmRx will derive from the PdmRxService class template. PdmRxService encapsulates the logic of using an xCore port for capturing PDM samples one word (32 bits) at a time, and managing two buffers where blocks of samples are collected. It also simplifies the logic of running PDM rx as either an interrupt or as a stand-alone thread.
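
The two-buffer scheme can be pictured with the following host-side sketch. It is not the PdmRxService implementation: the class name DoubleBufferSketch, the PushWord() method and the word ordering within a block are assumptions made purely for illustration, and no port, interrupt or thread handling is modelled.

```cpp
#include <array>
#include <cstdint>

// Hypothetical model of collecting PDM words into alternating blocks.
template <unsigned BLOCK_SIZE>
class DoubleBufferSketch {
 public:
  // Store one captured 32-bit word. Returns a pointer to the completed
  // block when it fills up, otherwise nullptr. The returned block stays
  // intact while the other buffer is being filled.
  const std::uint32_t* PushWord(std::uint32_t word) {
    buffers_[active_][fill_++] = word;
    if (fill_ < BLOCK_SIZE) {
      return nullptr;
    }
    const std::uint32_t* done = buffers_[active_].data();
    active_ ^= 1u;  // switch to the other buffer
    fill_ = 0;
    return done;
  }

 private:
  std::array<std::array<std::uint32_t, BLOCK_SIZE>, 2> buffers_{};
  unsigned active_ = 0;  // index of the buffer currently being filled
  unsigned fill_ = 0;    // number of words stored in the active buffer
};
```

The consumer has one full block period to process a completed block before its buffer is reused, which is the timing constraint that double buffering is meant to relax.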

PdmRxService has 2 template parameters. The first is the BLOCK_SIZE, which specifies the size of a PDM sample block (in words). The second, SubType, is the type of the sub-class being derived from PdmRxService. This is the CRTP (Curiously Recurring Template Pattern), which allows a base class to use polymorphic-like behaviors while ensuring that all types are known at compile-time, avoiding the drawbacks of using virtual functions.

There is currently one class template which derives from PdmRxService, called StandardPdmRxService. StandardPdmRxService uses a streaming channel to transfer PDM blocks to the decimator. It also provides methods for installing an optimized ISR for PDM capture.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Software Structure$$$Sub-Components$$$Decimator£££modules/io/modules/mic_array/doc/rst/src/software_structure.html#decimator

The Decimator sub-component encapsulates the logic of converting blocks of PDM samples into PCM samples. The TwoStageDecimator class is a decimator implementation that uses a pair of decimating FIR filters to accomplish this.

The first stage has a fixed tap count of 256 and a fixed decimation factor of 32. The second stage has a configurable tap count and decimation factor.

For more details, see Decimator Stages.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Software Structure$$$Sub-Components$$$SampleFilter£££modules/io/modules/mic_array/doc/rst/src/software_structure.html#samplefilter

The SampleFilter sub-component is used for post-processing samples emitted by the decimator. Two implementations for the sample filter sub-component are provided by this library.

The NopSampleFilter class can be used to effectively disable per-sample filtering on the output of the decimator. It does nothing to the samples presented to it, and so calls to it can be optimized out during compilation.

The DcoeSampleFilter class is used for applying the DC offset elimination filter to the decimator’s output. The DC offset elimination filter is meant to ensure the sample mean for each channel tends toward zero.

For more details, see Sample Filters.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Software Structure$$$Sub-Components$$$OutputHandler£££modules/io/modules/mic_array/doc/rst/src/software_structure.html#outputhandler

The OutputHandler sub-component is responsible for transferring processed sample data to subsequent processing stages.

There are two main considerations for output handlers. The first is whether audio data should be transferred sample-by-sample or as frames containing many samples. The second is the method of actually transferring the audio data.

The class ChannelSampleTransmitter sends samples one at a time to subsequent processing stages using an xCore channel.

The FrameOutputHandler class collects samples into frames, and uses a frame transmitter to send the frames once they’re ready.
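
The split between collecting samples and transmitting frames can be sketched as follows. The names FrameCollectorSketch, RecordingFrameTransmitter and OutputFrame, as well as the [channel][sample] frame layout, are assumptions for this example and should not be read as the library's actual interface.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical frame transmitter that records frames instead of sending
// them over a channel.
template <unsigned MIC_COUNT, unsigned SAMPLE_COUNT>
struct RecordingFrameTransmitter {
  using Frame = std::array<std::array<std::int32_t, SAMPLE_COUNT>, MIC_COUNT>;
  std::vector<Frame> frames;
  void OutputFrame(const Frame& frame) { frames.push_back(frame); }
};

// Hypothetical output handler: gathers samples into a frame and hands the
// frame to the transmitter once SAMPLE_COUNT samples have been collected.
template <unsigned MIC_COUNT, unsigned SAMPLE_COUNT,
          template <unsigned, unsigned> class TTransmitter>
class FrameCollectorSketch {
 public:
  TTransmitter<MIC_COUNT, SAMPLE_COUNT> FrameTx;

  void OutputSample(const std::int32_t sample[MIC_COUNT]) {
    for (unsigned ch = 0; ch < MIC_COUNT; ++ch) {
      frame_[ch][index_] = sample[ch];
    }
    if (++index_ == SAMPLE_COUNT) {
      FrameTx.OutputFrame(frame_);
      index_ = 0;
    }
  }

 private:
  typename TTransmitter<MIC_COUNT, SAMPLE_COUNT>::Frame frame_{};
  unsigned index_ = 0;  // next sample slot within the current frame
};
```

Keeping the transmitter as a separate type means the same collection logic could, in principle, feed a channel, a queue or a test recorder like the one shown here.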

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Software Structure$$$Sub-Components$$$Prefabs£££modules/io/modules/mic_array/doc/rst/src/software_structure.html#prefabs

One of the drawbacks to broad use of class templates is that concrete class names can unfortunately become excessively verbose and confusing. For example, the following is the fully qualified name of a (particular) concrete MicArray implementation:

mic_array::MicArray<2,
    mic_array::TwoStageDecimator<2,6,65>,
    mic_array::StandardPdmRxService<2,2,6>,
    mic_array::DcoeSampleFilter<2>,
    mic_array::FrameOutputHandler<2,256,
        mic_array::ChannelFrameTransmitter>>

This library also provides a C++ namespace mic_array::prefab which is intended to simplify construction of MicArray objects where common configurations are needed.

The BasicMicArray class template uses the most typical component implementations, where PDM rx can be run as an interrupt or as a stand-alone thread, and where audio frames are transmitted to subsequent processing stages using a channel.

To demonstrate how BasicMicArray simplifies this process, observe that the following MicArray type is behaviorally identical to the above:

mic_array::prefab::BasicMicArray<2,256,true>

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Mic Array Resource Usage£££modules/io/modules/mic_array/doc/rst/src/resource_usage.html#mic-array-resource-usage

The mic array unit requires several kinds of hardware resources, including ports, clock blocks, chanends, hardware threads, compute time (MIPS) and memory. Compared to previous versions of this library, the biggest advantage of the current version with respect to hardware resources is a greatly reduced compute requirement. This was made possible by the introduction of the VPU in the XMOS XS3 architecture. The VPU can perform, in a single instruction, certain operations which would take many instructions on previous architectures.

This page attempts to capture the requirements for each hardware type with relevant configurations.

Warning

The usage information below applies when the Vanilla API or prefab APIs are used. Resource usage in an application which uses custom mic array sub-components will depend crucially on the specifics of the customization.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Mic Array Resource Usage$$$Discrete Resources£££modules/io/modules/mic_array/doc/rst/src/resource_usage.html#discrete-resources

Resource     | Count
port         | 3
clock block  | 1 (SDR), 2 (DDR)
chanend      | 4
thread       | 1 (Vanilla), 1 or 2 (prefab)

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Mic Array Resource Usage$$$Discrete Resources$$$Ports£££modules/io/modules/mic_array/doc/rst/src/resource_usage.html#ports

In all configurations, the mic array unit requires 3 of the xcore.ai device’s hardware ports. Two of these ports (for the master audio clock and PDM clock) must be 1-bit ports. The third (PDM capture port) can be 1-, 4- or 8-bit, depending on the microphone count and SDR/DDR configuration.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Mic Array Resource Usage$$$Discrete Resources$$$Clock Blocks£££modules/io/modules/mic_array/doc/rst/src/resource_usage.html#clock-blocks

In applications which use an SDR microphone configuration, the mic array unit requires 1 of the xcore.ai device’s 5 clock blocks. This clock block is used both to generate the PDM clock from the master audio clock and as the PDM capture clock.

In applications which use a DDR microphone configuration, the mic array unit requires 2 of the xcore.ai device’s 5 clock blocks. One clock is used to generate the PDM clock from the master audio clock, and the other is used as the PDM capture clock (which must operate at different rates in a DDR configuration).

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Mic Array Resource Usage$$$Discrete Resources$$$Chanends£££modules/io/modules/mic_array/doc/rst/src/resource_usage.html#chanends

Chanends are a hardware resource which allow threads (possibly running on different tiles) to communicate over channels. The mic array unit requires 4 chanends. Two are used for communication between the PDM rx service and the decimation thread. Two more are needed for transferring completed frames from the mic array unit to other application components.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Mic Array Resource Usage$$$Discrete Resources$$$Threads£££modules/io/modules/mic_array/doc/rst/src/resource_usage.html#threads

The prefab API can run the PDM rx service either as a stand-alone thread or as an interrupt in another thread. The Vanilla API only supports running it as an interrupt. The Vanilla API requires only one hardware thread. The prefab API requires 1 thread if PDM rx is used in interrupt mode, and 2 if PDM rx is a stand-alone thread.

Running PDM rx as a stand-alone thread modestly reduces the mic array unit’s MIPS consumption by eliminating the context switch overhead of an interrupt. The cost of that is one hardware thread.

Note

When configured as an interrupt, PDM rx ISR is typically configured on the decimation thread, but this is not a strict requirement. The PDM rx interrupt can be configured for any thread on the same tile as the decimation thread. They must be on the same tile because shared memory is used between the two contexts.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Mic Array Resource Usage$$$Compute£££modules/io/modules/mic_array/doc/rst/src/resource_usage.html#compute

The compute requirement of the mic array unit depends strongly on the actual configuration being used. The compute requirement is expressed in millions of instructions per second (MIPS) and is approximately linearly related to many of the configuration parameters.

Each tile of an xcore.ai device has 8 hardware threads and a 5 stage pipeline. The exact calculation of how many MIPS are available to a thread is complicated, and is, in general, affected by both the number of threads being used and the work being done by each thread.

As a rule of thumb, however, the core scheduler will offer each thread a minimum of CORE_CLOCK_MHZ/8 million instruction issue slots per second (~MIPS), and no more than CORE_CLOCK_MHZ/5 million issue slots per second, where CORE_CLOCK_MHZ is the core CPU clock rate. With a core clock rate of 600 MHz, each thread should therefore expect at least 75 MIPS.
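The rule of thumb above can be written as a small helper. This is a sketch only: the function name is hypothetical, and the interpolation between the two bounds (each thread receiving the clock rate divided by the larger of 5 and the number of active threads) is a common simplification assumed here, not something stated by this page.

```cpp
#include <algorithm>

// Rule-of-thumb per-thread issue-slot budget, in MIPS, when `active_threads`
// hardware threads (1 to 8) are runnable on one tile. Assumes 8 hardware
// threads and a 5 stage pipeline per tile; real throughput also depends on
// the work each thread is doing.
constexpr double thread_mips_estimate(double core_clock_mhz,
                                      unsigned active_threads) {
  // With 5 or fewer active threads each thread is capped at clock/5;
  // beyond that, the issue slots are shared between the active threads.
  return core_clock_mhz / static_cast<double>(std::max(active_threads, 5u));
}

static_assert(thread_mips_estimate(600.0, 8) == 75.0, "lower bound at 600 MHz");
static_assert(thread_mips_estimate(600.0, 3) == 120.0, "upper bound at 600 MHz");
```

Comparing such an estimate against the measured MIPS figures gives a first indication of whether a given configuration fits within a single thread.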

The MIPS values in the table below are estimates obtained using the demo applications in demo/measure_mips.

PDM Freq  | S2DF | S2TC | PdmRx  | 1 mic MIPS | 2 mic MIPS | 4 mic MIPS | 8 mic MIPS
3.072 MHz | 6    | 65   | ISR    | 10.65      | 22.00      | 43.70      | N/A
3.072 MHz | 6    | 65   | Thread | 9.33       | 19.37      | 38.48      | 75.90
6.144 MHz | 6    | 65   | ISR    | 21.26      | 43.89      | TBD        | TBD
6.144 MHz | 6    | 65   | Thread | 18.66      | 38.73      | TBD        | TBD
3.072 MHz | 3    | 65   | ISR    | 12.90      | 26.44      | TBD        | TBD
3.072 MHz | 3    | 65   | Thread | 11.62      | 23.85      | TBD        | TBD
3.072 MHz | 6    | 130  | ISR    | 11.17      | 23.04      | TBD        | TBD
3.072 MHz | 6    | 130  | Thread | 9.86       | 20.42      | TBD        | TBD

PDM Freq

Frequency of the PDM clock.

S2DF

Stage 2 decimation factor. Output sample rate is (PDM Freq / (32 * S2DF)).

S2TC

Stage 2 tap count.

PdmRx

Whether PDM capture is done in a stand-alone thread or in an ISR.

Measurements indicate that enabling or disabling the DC offset removal filter has little effect on the MIPS usage. The selected frame size has only a slight negative correlation with MIPS usage.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Mic Array Resource Usage$$$Memory£££modules/io/modules/mic_array/doc/rst/src/resource_usage.html#memory

The memory cost of the mic array unit has three parts: code, stack and data. Code is the memory needed to store compiled instructions in RAM. Stack is the memory required to store intermediate results during function calls, and data is the memory used to store persistent objects, variables and constants.

The stack memory requirement is minimal. The code memory requirement depends on the particular configuration, but ranges from about 1600 bytes in a 1 mic configuration to about 2000 bytes in an 8 mic configuration.

The table below indicates the data size for various configurations. The value shown in the ‘Data Memory’ column is the sizeof() of the BasicMicArray that is instantiated. Not included in the table is the space allocated for the first and second stage filter coefficients: the first stage filter coefficients take a constant 523 bytes, and the second stage filter coefficients use 4*S2TC bytes, where S2TC is the stage 2 decimator tap count.

Mics | S2DF | S2TC | SPF | DCOE | Data Memory
1    | 6    | 65   | 16  | On   | 504 B
2    | 6    | 65   | 16  | On   | 968 B
4    | 6    | 65   | 16  | On   | 1888 B
8    | 6    | 65   | 16  | On   | 3728 B
1    | 6    | 130  | 16  | On   | 768 B
2    | 6    | 130  | 16  | On   | 1488 B
1    | 12   | 65   | 16  | On   | 576 B
2    | 12   | 65   | 16  | On   | 1112 B
1    | 6    | 65   | 160 | On   | 1080 B
2    | 6    | 65   | 160 | On   | 2120 B
1    | 6    | 65   | 16  | Off  | 496 B
2    | 6    | 65   | 16  | Off  | 948 B

S2DF

Stage 2 decimator’s decimation factor.

S2TC

Stage 2 decimator’s tap count.

SPF

Samples per frame in frames delivered by the mic array unit.

DCOE

DC Offset Elimination
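To estimate the total data footprint, the filter coefficient storage described above has to be added to the value from the ‘Data Memory’ column. The following sketch does that arithmetic; the function names are hypothetical, and it assumes the coefficients are stored once regardless of the microphone count.

```cpp
#include <cstddef>

// Bytes used by the first and second stage filter coefficients, which the
// 'Data Memory' column does not include.
constexpr std::size_t filter_coeff_bytes(std::size_t s2_tap_count) {
  constexpr std::size_t kStage1CoeffBytes = 523;  // constant, per the text
  return kStage1CoeffBytes + 4 * s2_tap_count;    // 4 bytes per stage 2 tap
}

// Data memory plus coefficients. `table_data_bytes` is a value read from the
// 'Data Memory' column for the matching configuration.
constexpr std::size_t total_data_bytes(std::size_t table_data_bytes,
                                       std::size_t s2_tap_count) {
  return table_data_bytes + filter_coeff_bytes(s2_tap_count);
}

static_assert(filter_coeff_bytes(65) == 783, "523 + 4*65");
static_assert(total_data_bytes(504, 65) == 1287, "1 mic, S2TC = 65");
```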

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Vanilla API£££modules/io/modules/mic_array/doc/rst/src/vanilla_api.html#vanilla-api

The Vanilla API is a small optional API which greatly simplifies the process of including a mic array unit in an xcore.ai application. Most applications that make use of a PDM mic array will not have complicated needs from the mic array software component beyond delivery of frames of audio data from a configurable set of microphones at a configurable rate. This API targets that majority of applications.

The prefab API requires the application developer to have at least some minimal understanding of the objects and classes associated with the mic array unit, and requires the developer to write some application-specific code to configure and start the mic array. The Vanilla API (which builds on top of the prefab model), by contrast, requires as few as two standard function calls, and instead moves the majority of the application logic into the application’s build project.

Note

Why “Vanilla”? “Vanilla” was originally meant as a generic placeholder name, but no better name was ever suggested.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Vanilla API$$$How It Works£££modules/io/modules/mic_array/doc/rst/src/vanilla_api.html#how-it-works

The Vanilla API comprises two code files, etc/vanilla/mic_array_vanilla.cpp and etc/vanilla/mic_array_vanilla.h which are not compiled as part of this library. Instead, if used, these are added to the application target’s build. To control configuration, the source file relies on a set of pre-processor macros (added via compile flags) which determine how the mic array unit will be instantiated.

The API is included in an application by using a CMake macro (mic_array_vanilla_add()) provided in this library. The macro updates the application’s sources, includes and compile definitions to include the API.

In the application code, two function calls are needed. First, ma_vanilla_init() is called to initialize the various mic array sub-components, preparing for capture of PDM data. Then, to start capture, the decimation thread is started with ma_vanilla_task() as its entry point. ma_vanilla_task() takes an XCore chanend as a parameter, which tells it where completed audio frames should be routed.

Note

The Vanilla API runs the PDM rx service as an interrupt in the decimation thread. To run it as a separate thread (for reduced total MIPS consumption) one of the lower-level APIs must be used.

As with the prefab API, audio frames are extracted from the mic array unit over a (non-streaming) channel using the ma_frame_rx() or ma_frame_rx_transpose() functions.

Note

The Vanilla API uses the default filters provided with this library, and does not currently provide a way to override them. To use custom filters, you must either use a lower-level API or modify the Vanilla API.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Vanilla API$$$Configuration£££modules/io/modules/mic_array/doc/rst/src/vanilla_api.html#configuration

Configuration with the Vanilla API is achieved through compile definitions. The required definitions are provided through the mic_array_vanilla_add() macro. There are several additional optional definitions.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Vanilla API$$$Configuration$$$mic_array_vanilla_add()£££modules/io/modules/mic_array/doc/rst/src/vanilla_api.html#mic-array-vanilla-add

mic_array_vanilla_add() is the CMake macro used to add the Vanilla API to an application.

macro( mic_array_vanilla_add
          TARGET_NAME
          MCLK_FREQ
          PDM_FREQ
          MIC_COUNT
          SAMPLES_PER_FRAME )

TARGET_NAME

The name of the application’s CMake target. It is the target the Vanilla API is added to.

MCLK_FREQ

The known frequency, in Hz, of the application’s master audio clock. A typical frequency is 24576000 Hz. Note that this parameter is not configuring the master audio clock. (Equivalent compile definition: MIC_ARRAY_CONFIG_MCLK_FREQ)

PDM_FREQ

The desired frequency, in Hz, of the PDM clock. MCLK_FREQ should be an integer multiple of PDM_FREQ, with the ratio MCLK_FREQ / PDM_FREQ between 1 and 510. (Equivalent compile definition: MIC_ARRAY_CONFIG_PDM_FREQ)

MIC_COUNT

The number of PDM microphone channels to be captured. This API supports values of 1 (SDR), 2 (DDR), 4 (SDR) and 8 (SDR/DDR). This value must match the configuration (SDR/DDR) and port width of the PDM capture port. That is, in an SDR port configuration, MIC_COUNT must equal the capture port width, and in DDR port configuration, MIC_COUNT must be twice the port width. (Equivalent compile definition: MIC_ARRAY_CONFIG_MIC_COUNT)

Note

This API does not support capturing only a subset of the capture port’s channels, e.g. capturing only 3 channels on a 4-bit port. To accomplish this the prefab API should be used.

Note

Though listed under Optional Configuration below, if the microphones are in a DDR configuration and MIC_COUNT is not 2, the application must also define MIC_ARRAY_CONFIG_USE_DDR.

SAMPLES_PER_FRAME

The number of samples (for each microphone channel) that will be delivered in each (non-overlapping) frame retrieved by ma_frame_rx(). A minimum value of 1 is supported, to deliver samples one at a time. The larger this value, the looser the real-time constraint on the thread receiving the mic array unit’s output (while also increasing the amount of audio data to be processed).
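The constraints on PDM_FREQ and MIC_COUNT described above can be checked before they are passed to the macro. The sketch below expresses them as compile-time predicates. All function names are hypothetical, and the PDM_FREQ check reflects one reading of the stated constraint (that MCLK_FREQ / PDM_FREQ is an integer between 1 and 510).

```cpp
// True if MCLK_FREQ / PDM_FREQ is an integer in the range [1, 510].
constexpr bool pdm_freq_is_valid(unsigned mclk_freq, unsigned pdm_freq) {
  return pdm_freq != 0 && mclk_freq % pdm_freq == 0 &&
         mclk_freq / pdm_freq >= 1 && mclk_freq / pdm_freq <= 510;
}

// Capture port width (in bits) implied by MIC_COUNT and the SDR/DDR setting:
// SDR uses one pin per microphone, DDR uses one pin per two microphones.
constexpr unsigned capture_port_width(unsigned mic_count, bool use_ddr) {
  return use_ddr ? mic_count / 2 : mic_count;
}

// Combinations listed as supported by the Vanilla API:
// 1 (SDR), 2 (DDR), 4 (SDR) and 8 (SDR or DDR).
constexpr bool mic_config_is_supported(unsigned mic_count, bool use_ddr) {
  return (mic_count == 1 && !use_ddr) || (mic_count == 2 && use_ddr) ||
         (mic_count == 4 && !use_ddr) || mic_count == 8;
}

static_assert(pdm_freq_is_valid(24576000, 3072000), "divider of 8");
static_assert(!pdm_freq_is_valid(24576000, 5000000), "not an integer ratio");
static_assert(capture_port_width(8, true) == 4, "8 DDR mics use a 4-bit port");
static_assert(mic_config_is_supported(2, true), "2 mics require DDR");
static_assert(!mic_config_is_supported(2, false), "2 SDR mics not listed");
```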

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Vanilla API$$$Configuration$$$Optional Configuration£££modules/io/modules/mic_array/doc/rst/src/vanilla_api.html#optional-configuration

These are configuration parameters that receive default values but can be optionally overridden by an application. These can be defined in your application’s CMakeLists.txt using CMake’s built-in target_compile_definitions() command.

MIC_ARRAY_CONFIG_USE_DDR

Indicates whether the microphones are arranged in an SDR (0) or DDR (1) configuration. An SDR configuration is one in which each port pin is connected to a single PDM microphone. A DDR configuration is one in which each port pin is connected to two PDM microphones. Defaults to 0 (SDR), unless MIC_ARRAY_CONFIG_MIC_COUNT is 2, in which case it defaults to 1 (DDR).

MIC_ARRAY_CONFIG_USE_DC_ELIMINATION

Indicates whether the DC offset elimination filter should be applied to the output of the decimator. Set to 0 to disable or 1 to enable. Defaults to 1 (filter on).

The next three parameters are the identifiers for hardware port resources used by the mic array unit. They can be specified as either the identifier listed in your device’s datasheet (e.g. XS1_PORT_1D) or as an alias for the port listed in your application’s XN file (e.g. PORT_MCLK_IN_OUT). For example:

...
<Tile Number="0" Reference="tile[0]">
...
  <Port Location="XS1_PORT_1D"  Name="PORT_MCLK_IN_OUT"/>
...
</Tile>
...

MIC_ARRAY_CONFIG_PORT_MCLK

Identifier of the 1-bit port on which the device is receiving the master audio clock. Defaults to PORT_MCLK_IN_OUT.

MIC_ARRAY_CONFIG_PORT_PDM_CLK

Identifier of the 1-bit port on which the device will signal the PDM clock to the microphones. Defaults to PORT_PDM_CLK.

MIC_ARRAY_CONFIG_PORT_PDM_DATA

Identifier of the port on which the device will capture PDM sample data. The port width of this port must match the MIC_COUNT parameter given to mic_array_vanilla_add() and the value of MIC_ARRAY_CONFIG_USE_DDR. Defaults to PORT_PDM_DATA.

The final two parameters indicate which clock block resource(s) should be used to generate the PDM clock and the capture clock. An xcore.ai device provides 5 hardware clock blocks for application use, identified as XS1_CLKBLK_1 through XS1_CLKBLK_5. The device’s clock blocks are interchangeable, but if another component of your application uses one of these defaults, you may need to change these parameters.

MIC_ARRAY_CONFIG_CLOCK_BLOCK_A

Clock block used as ‘clock A’ (see Getting Started). This clock block is used in both SDR and DDR configurations.

MIC_ARRAY_CONFIG_CLOCK_BLOCK_B

Clock block used as ‘clock B’ (see Getting Started). This clock block is only needed in DDR configurations and is ignored (not configured) in SDR configurations.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$Vanilla API$$$Configuration$$$Vanilla API with other Build Systems£££modules/io/modules/mic_array/doc/rst/src/vanilla_api.html#vanilla-api-with-other-build-systems

Using the Vanilla API with build systems other than CMake is simple.

  • Add the file etc/vanilla/mic_array_vanilla.cpp to the application’s source files.

  • Add etc/vanilla/ (relative to repository root) to the application include paths.

  • Add the compile definitions for the parameters listed in the previous sections (each parameter beginning with MIC_ARRAY_CONFIG_) to the compile options for mic_array_vanilla.cpp.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$API Reference£££modules/io/modules/mic_array/doc/rst/src/reference/api_index.html#api-reference

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$API Reference$$$C++ API Reference£££modules/io/modules/mic_array/doc/rst/src/reference/cpp/cpp_api.html#c-api-reference

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$API Reference$$$C++ API Reference$$$MicArray£££modules/io/modules/mic_array/doc/rst/src/reference/cpp/cpp_api.html#micarray
template<unsigned MIC_COUNT, class TDecimator, class TPdmRx, class TSampleFilter, class TOutputHandler>
class MicArray

Represents the microphone array component of an application.

Like many classes in this library, FrameOutputHandler uses the Curiously Recurring Template Pattern.

Template Parameters:
  • MIC_COUNT

    The number of microphones to be captured by the MicArray’s PdmRx component. For example, if using a 4-bit port to capture 6 microphone channels in a DDR configuration (because there are no 3 or 6 pin ports) MIC_COUNT should be 8, because that’s how many must be captured, even if two of them are stripped out before passing audio frames to subsequent application stages.

  • TDecimator – Type for the decimator. See Decimator.

  • TPdmRx – Type for the PDM rx service used. See PdmRx.

  • TSampleFilter – Type for the output filter used. See SampleFilter.

  • TOutputHandler – Type for the output handler used. See OutputHandler.

Public Functions

inline MicArray()

Construct a MicArray.

This constructor uses the default constructor for each of its components, PdmRx, Decimator, SampleFilter, and OutputHandler.

inline MicArray(TPdmRx pdm_rx, TSampleFilter sample_filter, TOutputHandler output_handler)

Construct a MicArray.

This constructor uses the default constructor for its Decimator component.

The remaining components are initialized with the supplied objects.

Parameters:
  • pdm_rx – The PDM rx object.

  • sample_filter – The SampleFilter object.

  • output_handler – The OutputHandler object.

inline MicArray(TPdmRx pdm_rx, TOutputHandler output_handler)

Construct a MicArray

This constructor uses the default constructor for its Decimator and SampleFilter components.

The remaining components are initialized with the supplied objects.

Parameters:
  • pdm_rx – The PDM rx object.

  • output_handler – The OutputHandler object.

void ThreadEntry()

Entry point for the decimation thread.

This function does not return. It loops indefinitely, collecting blocks of PDM data from PdmRx (which must have already been started), uses Decimator to filter and decimate the sample stream to the output sample rate, applies any post-processing with SampleFilter, and then delivers the stream of output samples through OutputHandler.

Public Members

TPdmRx PdmRx

The PDM rx service.

The template parameter TPdmRx is the concrete class implementing the microphone array’s PDM rx service, which is responsible for collecting PDM samples from a port and delivering them to the decimation thread.

TPdmRx is only required to implement one function, GetPdmBlock():

uint32_t* GetPdmBlock();

GetPdmBlock() returns a pointer to a block of PDM data, formatted as expected by the decimator. GetPdmBlock() is called from the decimator thread and is expected to block until a new full block of PDM data is available to be decimated.

For example, StandardPdmRxService::GetPdmBlock() waits to receive a pointer to a block of PDM data from a streaming channel. The pointer is sent from the PdmRx interrupt (or thread) when the block has been completed. This is used for capturing PDM data from a port.

TDecimator Decimator

The Decimator.

The template parameter TDecimator is the concrete class implementing the microphone array’s decimation procedure. TDecimator is only required to implement one function, ProcessBlock():

void ProcessBlock(
    int32_t sample_out[MIC_COUNT],
    uint32_t pdm_block[BLOCK_SIZE]);

ProcessBlock() takes a block of PDM samples via its pdm_block parameter, applies the appropriate decimation logic, and outputs a single (multi-channel) sample via its sample_out parameter. The size and formatting of the PDM block expected by the decimator depends on its particular implementation.

A concrete class based on the mic_array::TwoStageDecimator class template is used in the prefab::BasicMicArray prefab.

TSampleFilter SampleFilter

The output filter.

The template parameter TSampleFilter is the concrete class implementing the microphone array’s sample filter component. This component can be used to apply additional non-decimating, non-interpolating filtering of samples. TSampleFilter is only required to implement one function, Filter():

void Filter(int32_t sample[MIC_COUNT]);

Filter() takes a single (multi-channel) sample from the decimator component’s output and may update the sample in-place.

For example, a sample filter based on the DcoeSampleFilter class template applies a simple first-order IIR filter to the output of the decimator, in order to eliminate the DC component of the audio signals.

If no additional filtering is required, the NopSampleFilter class template can be used for TSampleFilter, which leaves the sample unmodified. In this case, it is expected that the call to NopSampleFilter::Filter() will ultimately get completely eliminated at build time. That way no additional run-time compute or memory costs need be introduced for the additional flexibility.

Even though TDecimator and TSampleFilter both (possibly) apply filtering, they are separate components of the MicArray because they are conceptually independent.

A concrete class based on either the DcoeSampleFilter class template or the NopSampleFilter class template is used in the prefab::BasicMicArray prefab, depending on the USE_DCOE parameter of that class template.
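To illustrate the single requirement placed on TSampleFilter, the sketch below shows two minimal classes that provide a Filter() member with the signature given above. Both class names are hypothetical and neither is part of the library; the second mirrors the idea of a no-op filter whose empty body an optimizing compiler can remove.

```cpp
#include <cstdint>

// Hypothetical sample filter that halves every channel, modifying the
// multi-channel sample in place.
template <unsigned MIC_COUNT>
class HalfGainSampleFilter {
 public:
  void Filter(int32_t sample[MIC_COUNT]) {
    for (unsigned ch = 0; ch < MIC_COUNT; ++ch) {
      sample[ch] /= 2;
    }
  }
};

// Hypothetical pass-through filter: leaves the sample unmodified.
template <unsigned MIC_COUNT>
class PassThroughSampleFilter {
 public:
  void Filter(int32_t sample[MIC_COUNT]) { (void)sample; }
};
```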

TOutputHandler OutputHandler

The output handler.

The template parameter TOutputHandler is the concrete class implementing the microphone array’s output handler component. After the PDM input stream has been decimated to the appropriate output sample rate, and after any post-processing of that output stream by the sample filter, the output samples must be delivered to another thread for any additional processing. It is the responsibility of this component to package and deliver audio samples to subsequent processing stages.

TOutputHandler is only required to implement one function, OutputSample():

void OutputSample(int32_t sample[MIC_COUNT]);

OutputSample() is called exactly once for each mic array output sample. OutputSample() may block if necessary until the subsequent processing stage is ready to receive new data. However, the decimator thread (in which OutputSample() is called) as a whole has a real-time constraint: it must be ready to pull the next block of PDM data when it becomes available.

A concrete class based on the FrameOutputHandler class template is used in the prefab::BasicMicArray prefab.
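The way the four components cooperate can be modeled on a host with mock objects. The sketch below is a simplified stand-in for the decimation loop, not the library's code: `run_decimation_loop` and all `Mock*` types are hypothetical, the loop runs a fixed number of iterations (the real thread entry point never returns), and the mocks only implement the member functions named in this section.

```cpp
#include <cstdint>

// Simplified model of one pass of the decimation loop, repeated
// `iterations` times: fetch a PDM block, decimate, filter, output.
template <unsigned MIC_COUNT, class TPdmRx, class TDecimator,
          class TSampleFilter, class TOutputHandler>
void run_decimation_loop(TPdmRx& pdm_rx, TDecimator& decimator,
                         TSampleFilter& filter, TOutputHandler& output,
                         unsigned iterations) {
  for (unsigned i = 0; i < iterations; ++i) {
    int32_t sample[MIC_COUNT] = {0};
    uint32_t* pdm_block = pdm_rx.GetPdmBlock();  // would block on real hardware
    decimator.ProcessBlock(sample, pdm_block);   // PDM block -> one sample
    filter.Filter(sample);                       // optional post-processing
    output.OutputSample(sample);                 // hand off downstream
  }
}

// Mock components for a single channel.
struct MockPdmRx {
  uint32_t block[1] = {0};
  uint32_t* GetPdmBlock() { ++block[0]; return block; }
};
struct MockDecimator {
  void ProcessBlock(int32_t sample_out[1], uint32_t pdm_block[1]) {
    sample_out[0] = static_cast<int32_t>(pdm_block[0]) * 10;
  }
};
struct MockFilter {
  void Filter(int32_t sample[1]) { sample[0] += 1; }
};
struct MockOutput {
  int32_t last = 0;
  unsigned calls = 0;
  void OutputSample(int32_t sample[1]) { last = sample[0]; ++calls; }
};
```

Because each component is a template parameter, any class providing the matching member function can be substituted without changing the loop.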

Public Static Attributes

static unsigned MicCount = MIC_COUNT

Number of microphone channels.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$API Reference$$$C++ API Reference$$$BasicMicArray£££modules/io/modules/mic_array/doc/rst/src/reference/cpp/cpp_api.html#basicmicarray
template<unsigned MIC_COUNT, unsigned FRAME_SIZE, bool USE_DCOE, unsigned MICS_IN = MIC_COUNT>
class BasicMicArray : public mic_array::MicArray<MIC_COUNT, TwoStageDecimator<MIC_COUNT, STAGE2_DEC_FACTOR, STAGE2_TAP_COUNT>, StandardPdmRxService<MIC_COUNT, MIC_COUNT, STAGE2_DEC_FACTOR>, std::conditional<USE_DCOE, DcoeSampleFilter<MIC_COUNT>, NopSampleFilter<MIC_COUNT>>::type, FrameOutputHandler<MIC_COUNT, FRAME_SIZE, ChannelFrameTransmitter>>

Class template for a typical bare-metal mic array unit.

This prefab is likely the right starting point for most applications.

With this prefab, the decimator will consume one device core, and the PDM rx service can be run either as an interrupt, or as an additional thread. Normally running as an interrupt is recommended.

For the first and second stage decimation filters, this prefab uses the coefficients provided with this library. The first stage uses a decimation factor of 32, and the second stage is configured to use a decimation factor of 6.

To get 16 kHz audio output from the BasicMicArray prefab, then, the PDM clock must be configured to 3.072 MHz (3.072 MHz / (32 * 6) = 16 kHz).
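The relationship between PDM clock, decimation factors and output sample rate can be captured in a one-line helper. This is a sketch with a hypothetical function name; it assumes the fixed stage 1 decimation factor of 32 stated above.

```cpp
// Output sample rate in Hz for a given PDM clock frequency and stage 2
// decimation factor, assuming a stage 1 decimation factor of 32.
constexpr unsigned output_sample_rate_hz(unsigned pdm_freq_hz,
                                         unsigned stage2_dec_factor) {
  constexpr unsigned kStage1DecFactor = 32;
  return pdm_freq_hz / (kStage1DecFactor * stage2_dec_factor);
}

static_assert(output_sample_rate_hz(3072000, 6) == 16000, "16 kHz");
static_assert(output_sample_rate_hz(6144000, 6) == 32000, "32 kHz");
static_assert(output_sample_rate_hz(3072000, 3) == 32000, "32 kHz");
```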

Sub-Components

Being derived from mic_array::MicArray, an instance of BasicMicArray has 4 sub-components responsible for different portions of the work being done. These sub-components are PdmRx, Decimator, SampleFilter and OutputHandler. See the documentation for MicArray for more details about these.

Template Parameters Details

The template parameter MIC_COUNT is the number of microphone channels to be processed and output.

The template parameter FRAME_SIZE is the number of samples in each output frame produced by the mic array. Frame data is communicated using the API found in mic_array/frame_transfer.h.

Typically ma_frame_rx() will be the right function to use in a receiving thread to retrieve audio frames. ma_frame_rx() receives audio frames with shape (MIC_COUNT,FRAME_SIZE), meaning that all samples corresponding to a given channel will end up in a contiguous block of memory. Instead of ma_frame_rx(), ma_frame_rx_transpose() can be used to swap the dimensions, resulting in the shape (FRAME_SIZE, MIC_COUNT).
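The difference between the two frame shapes can be shown with plain arrays. The sketch below does not call the library; `ChannelMajor`, `SampleMajor` and `transpose_frame` are hypothetical names used only to model the (MIC_COUNT, FRAME_SIZE) and (FRAME_SIZE, MIC_COUNT) layouts.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// (MIC_COUNT, FRAME_SIZE): all samples of one channel are contiguous.
template <std::size_t MIC_COUNT, std::size_t FRAME_SIZE>
using ChannelMajor = std::array<std::array<int32_t, FRAME_SIZE>, MIC_COUNT>;

// (FRAME_SIZE, MIC_COUNT): all channels of one sample time are contiguous.
template <std::size_t MIC_COUNT, std::size_t FRAME_SIZE>
using SampleMajor = std::array<std::array<int32_t, MIC_COUNT>, FRAME_SIZE>;

// Converts the channel-major layout into the sample-major layout.
template <std::size_t MIC_COUNT, std::size_t FRAME_SIZE>
SampleMajor<MIC_COUNT, FRAME_SIZE> transpose_frame(
    const ChannelMajor<MIC_COUNT, FRAME_SIZE>& in) {
  SampleMajor<MIC_COUNT, FRAME_SIZE> out{};
  for (std::size_t ch = 0; ch < MIC_COUNT; ++ch) {
    for (std::size_t s = 0; s < FRAME_SIZE; ++s) {
      out[s][ch] = in[ch][s];
    }
  }
  return out;
}
```

The channel-major layout suits per-channel processing such as filtering, while the sample-major layout matches interleaved formats commonly used by audio interfaces.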

Note that calls to ma_frame_rx() or ma_frame_rx_transpose() will block until a frame becomes available on the specified chanend.

If the receiving thread is not waiting to retrieve the audio frame from the mic array when it becomes available, the pipeline may back up and cause samples to be dropped. It is the responsibility of the application developer to ensure this does not happen.

The boolean template parameter USE_DCOE indicates whether the DC offset elimination filter should be applied to the output of the second stage decimator. DC offset elimination is an IIR filter intended to ensure audio samples on each channel tend towards zero-mean.

For more information about DC offset elimination, see Sample Filters.

If USE_DCOE is false, no further filtering of the second stage decimator’s output will occur.

The template parameter MICS_IN indicates the number of microphone channels to be captured by the PdmRx component of the mic array unit. This will often be the same as MIC_COUNT, but in some applications, MIC_COUNT microphones must be physically connected to an XCore port which is not MIC_COUNT (SDR) or MIC_COUNT/2 (DDR) bits wide.

In these cases, capturing the additional channels (likely not even physically connected to PDM microphones) is unavoidable, but further processing of the additional (junk) channels can be avoided by using MIC_COUNT < MICS_IN. The mapping which tells the mic array unit how to derive output channels from input channels can be configured during initialization by calling StandardPdmRxService::MapChannels() on the PdmRx sub-component of the BasicMicArray.

If the application uses an SDR microphone configuration (i.e. 1 microphone per port pin), then MICS_IN must be the same as the port width. If the application is running in a DDR microphone configuration, MICS_IN must be twice the port width. MICS_IN defaults to MIC_COUNT.
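Conceptually, the channel mapping selects MIC_COUNT output channels out of the MICS_IN captured channels. The sketch below models that selection on already-separated channel values. It is only a conceptual illustration: `select_channels` is a hypothetical function, and it does not show how or where the library applies the mapping internally.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// channel_map[k] is the index of the captured (input) channel that feeds
// output channel k. Captured channels not named in the map are dropped.
template <std::size_t MIC_COUNT, std::size_t MICS_IN>
std::array<int32_t, MIC_COUNT> select_channels(
    const std::array<int32_t, MICS_IN>& captured,
    const std::array<unsigned, MIC_COUNT>& channel_map) {
  static_assert(MIC_COUNT <= MICS_IN,
                "cannot output more channels than are captured");
  std::array<int32_t, MIC_COUNT> out{};
  for (std::size_t k = 0; k < MIC_COUNT; ++k) {
    out[k] = captured[channel_map[k]];
  }
  return out;
}
```

For example, with three microphones wired to pins 0, 1 and 3 of a 4-bit SDR port, a map of {0, 1, 3} would discard the unconnected channel 2.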

Allocation

Before a mic array unit can be started or initialized, it must be allocated.

Instances of BasicMicArray are self-contained with respect to memory, needing no external buffers to be supplied by the application. Allocating an instance is most easily accomplished by simply declaring the mic array unit. An example follows.

#include "mic_array/cpp/Prefab.hpp"
...
using AppMicArray = mic_array::prefab::BasicMicArray<MICS,SAMPS,DCOE>;
AppMicArray mics;

Here, mics is an allocated mic array unit. The example (and all that follow) assumes the macros used for template parameters are defined elsewhere.

Initialization

Before a mic array unit can be started, it must be initialized.

BasicMicArray reads PDM samples from an XCore port, and delivers frames of audio data over an XCore channel. To this end, an instance of BasicMicArray needs to be given the resource IDs of the port to be read and the chanend to transmit frames over. This can be accomplished in either of two ways.

If the resource IDs for the port and chanend are available as the mic array unit is being allocated, one option is to explicitly construct the BasicMicArray instance with the required resource IDs using the two-argument constructor:

using AppMicArray = mic_array::prefab::BasicMicArray<MICS,SAMPS,DCOE>;
AppMicArray mics(PORT_PDM_MICS, c_frames_out);

Otherwise (typically), these can be set using BasicMicArray::SetPort(port_t) and BasicMicArray::SetOutputChannel(chanend_t) to set the port and channel respectively.

AppMicArray mics;
...
void app_init(port_t p_pdm_mics, chanend_t c_frames_out)
{
 mics.SetPort(p_pdm_mics);
 mics.SetOutputChannel(c_frames_out);
}

Next, the ports and clock block(s) used by the PDM rx service need to be configured appropriately. This is not accomplished directly through the BasicMicArray object. Instead, a pdm_rx_resources_t struct representing these hardware resources is constructed and passed to mic_array_resources_configure(). See the documentation for pdm_rx_resources_t and mic_array_resources_configure() for more details.

Finally, if running BasicMicArray’s PDM rx service within an ISR, before the mic array unit can be started, the ISR must be installed. This is accomplished with a call to BasicMicArray::InstallPdmRxISR(). Installing the ISR will not unmask it.

Begin Processing (PDM rx ISR)

Once the mic array unit has been initialized, three steps are required to start it with the PDM rx service running as an ISR.

First, the PDM clock must be started. This is accomplished with a call to mic_array_pdm_clock_start(). The same pdm_rx_resources_t that was passed to mic_array_resources_configure() is given as an argument here.

Second, the PDM rx ISR that was installed during initialization must be unmasked. This is accomplished by calling BasicMicArray::UnmaskPdmRxISR() on the mic array unit.

Finally, the mic array processing thread must be started. The entry point for the mic array thread is BasicMicArray::ThreadEntry().

A typical pattern will include all three of these steps in a single function which wraps the mic array thread entry point.

AppMicArray mics;
pdm_rx_resources_t pdm_res;
...
MA_C_API  // alias for 'extern "C"'
void app_mic_array_task()
{
 mic_array_pdm_clock_start(&pdm_res);
 mics.UnmaskPdmRxISR();
 mics.ThreadEntry();
}

Using this pattern, app_mic_array_task() is a C-compatible function which can be called from a multi-tile main() in an XC file. Then, app_mic_array_task() is called directly from a par {...} block. For example,

main(){
 ...
 par {
   on tile[1]: {
     ... // Do initialization stuff
     
     par {
       app_mic_array_task();
       ...
       other_thread_on_tile1(); // other threads
     }
   }
 }
}

Begin Processing (PDM Rx Thread)

The procedure for running the mic array unit with the PDM rx component running as a stand-alone thread is much the same, with a couple of key differences.

When running PDM rx as a thread, no call to BasicMicArray::UnmaskPdmRxISR() is necessary. Instead, the application spawns a second thread (the first being the mic array processing thread) using BasicMicArray::PdmRxThreadEntry() as the entry point.

mic_array_pdm_clock_start() must still be called, but here the requirement is that it be called from the hardware thread on which the PDM rx component is running (which, of course, cannot be the mic array thread).

A typical application with a multi-tile XC main() will provide two C-compatible functions - one for each thread:

MA_C_API
void app_pdm_rx_task()
{
 mic_array_pdm_clock_start(&pdm_res);
 mics.PdmRxThreadEntry();
}

MA_C_API
void app_mic_array_task()
{
 mics.ThreadEntry();
}

Notice that app_mic_array_task() above is a thin wrapper for mics.ThreadEntry(). Unfortunately, because the type of mics is a C++ class, mics.ThreadEntry() cannot be called directly from an XC file (including the one containing main()). Further, because a C++ class template was used, this library cannot provide a generic C-compatible call wrapper for the methods on a MicArray object. This unfortunately means it is necessary in some cases to create a thin wrapper such as app_mic_array_task().

The threads are spawned from XC main using a par {...} block:

main(){
 ...
 par {
   on tile[1]: {
     ... // Do initialization stuff
     
     par {
       app_mic_array_task();
       app_pdm_rx_task();
       ...
       other_thread_on_tile1(); // other threads
     }
   }
 }
}

Real-Time Constraint

Once the PDM rx thread is launched or the PDM rx interrupt has been unmasked, PDM data will start being collected and reported to the decimator thread. The application then must start the decimator thread within one output sample time (i.e. sample time for the output of the second stage decimator) to avoid issues.

Once the mic array processing thread is running, the real-time constraint is active for the thread consuming the mic array unit’s output: it must be waiting to receive each audio frame within one frame time.
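
The two budgets above can be estimated from the decimation factors. The following is a minimal sketch, not part of the library: timing_budget() and TimingBudget are hypothetical names, and the 3.072 MHz PDM clock and 256-sample frame used in the usage note are assumptions chosen for illustration. The first stage factor of 32 is the fixed value used by this library.

```cpp
#include <cassert>

// Hypothetical helper type: time budgets implied by the decimation chain.
struct TimingBudget {
  double output_sample_rate_hz;  // PCM rate after both decimator stages
  double sample_time_us;         // budget for starting the decimator thread
  double frame_time_us;          // budget for the thread consuming frames
};

constexpr unsigned kStage1DecFactor = 32;  // fixed first stage decimation factor

inline TimingBudget timing_budget(double pdm_clock_hz,
                                  unsigned stage2_dec_factor,
                                  unsigned frame_size)
{
  assert(pdm_clock_hz > 0.0 && stage2_dec_factor > 0 && frame_size > 0);
  // Each output sample consumes 32 * stage2_dec_factor PDM samples per mic.
  const double rate = pdm_clock_hz / (kStage1DecFactor * stage2_dec_factor);
  const double sample_us = 1.0e6 / rate;
  return TimingBudget{rate, sample_us, sample_us * frame_size};
}
```

With an assumed 3.072 MHz PDM clock, a second stage factor of 6 and 256-sample frames, timing_budget(3072000.0, 6, 256) gives a 16 kHz output rate, 62.5 microseconds per output sample and 16 ms per frame.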

Examples

This library comes with examples which demonstrate how a mic array unit is used in an actual application. If you are encountering difficulties getting BasicMicArray to work, studying the provided examples may help.

Note

BasicMicArray::InstallPdmRxISR() installs the ISR on the hardware thread that calls the method. In most cases, installing it in the same thread as the decimator is the right choice.

Template Parameters:
  • MIC_COUNT – Number of microphone channels.

  • FRAME_SIZE – Number of samples in each output audio frame.

  • USE_DCOE – Whether DC offset elimination should be used.

Public Types

using TParent = MicArray<MIC_COUNT, TwoStageDecimator<MIC_COUNT, STAGE2_DEC_FACTOR, STAGE2_TAP_COUNT>, StandardPdmRxService<MICS_IN, MIC_COUNT, STAGE2_DEC_FACTOR>, typename std::conditional<USE_DCOE, DcoeSampleFilter<MIC_COUNT>, NopSampleFilter<MIC_COUNT>>::type, FrameOutputHandler<MIC_COUNT, FRAME_SIZE, ChannelFrameTransmitter>>

TParent is an alias for the class template from which this class template inherits.

Public Functions

inline constexpr BasicMicArray() noexcept

No-argument constructor.

This constructor allocates the mic array and nothing more.

Call BasicMicArray::Init() to initialize the decimator.

Subsequent calls to BasicMicArray::SetPort() and BasicMicArray::SetOutputChannel() will also be required before any processing begins.

void Init()

Initialize the decimator.

BasicMicArray(port_t p_pdm_mics, chanend_t c_frames_out)

Initializing constructor.

If the communication resources required by BasicMicArray are known at construction time, this constructor can be used to avoid further initialization steps.

This constructor does not install the ISR for PDM rx, and so that must be done separately if PDM rx is to be run in interrupt mode.

Parameters:
  • p_pdm_mics – Port with PDM microphones

  • c_frames_out – (non-streaming) chanend used to transmit frames.

void SetPort(port_t p_pdm_mics)

Set the PDM data port.

This function calls this->PdmRx.Init(p_pdm_mics).

This should be called during initialization.

Parameters:

p_pdm_mics – The port to receive PDM data on.

void SetOutputChannel(chanend_t c_frames_out)

Set the audio frame output channel.

This function calls this->OutputHandler.FrameTx.SetChannel(c_frames_out).

This must be set prior to entering the decimator task.

Parameters:

c_frames_out – The channel to send audio frames on.

void PdmRxThreadEntry()

Entry point for PDM rx thread.

This function calls this->PdmRx.ThreadEntry().

Note

This call does not return.

void InstallPdmRxISR()

Install the PDM rx ISR on the calling thread.

This function calls this->PdmRx.InstallISR().

void UnmaskPdmRxISR()

Unmask interrupts on the calling thread.

This function calls this->PdmRx.UnmaskISR().

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$API Reference$$$C++ API Reference$$$PdmRxService£££modules/io/modules/mic_array/doc/rst/src/reference/cpp/cpp_api.html#pdmrxservice
template<unsigned BLOCK_SIZE, class SubType>
class PdmRxService

Collects PDM sample data from a port.

Derivatives of this class template are intended to be used for the TPdmRx template parameter of MicArray, where it represents the MicArray::PdmRx component of the mic array.

An object derived from PdmRxService collects blocks of PDM samples from a port and makes them available to the decimation thread as the blocks are completed.

PdmRxService is a base class using CRTP. Subclasses extend PdmRxService providing themselves as the template parameter SubType.

This base class provides the logic for aggregating PDM data taken from a port into blocks, and a subclass is required to provide methods SubType::ReadPort(), SubType::SendBlock() and SubType::GetPdmBlock().

SubType::ReadPort() is responsible for reading 1 word of data from p_pdm_mics. See StandardPdmRxService::ReadPort() as an example.

SubType::SendBlock() is provided a block of PDM data as a pointer and is responsible for signaling that to the subsequent processing stage. See StandardPdmRxService::SendBlock() as an example.

ReadPort() and SendBlock() are used by PdmRxService itself (when running as a thread, rather than ISR).

SubType::GetPdmBlock() is responsible for receiving a block of PDM data from SubType::SendBlock() as a pointer, deinterleaving the buffer contents, and returning a pointer to the PDM data in the format expected by the mic array unit’s decimator component. See StandardPdmRxService::GetPdmBlock() as an example.

GetPdmBlock() is called by the decimation thread. The pair of functions, SendBlock() and GetPdmBlock() facilitate inter-thread communication, SendBlock() being called by the transmitting end of the communication channel, and GetPdmBlock() being called by the receiving end.
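
The CRTP arrangement described above can be sketched in isolation. This is a simplified illustration, not the library's implementation: BlockCollector and FakePdmRx are hypothetical names, the fake ReadPort() returns a counter instead of reading hardware, and the sketch does not model the word ordering or the inter-thread hand-off of the real service.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical CRTP base: gathers BLOCK_SIZE port words, then hands the
// completed block to the subclass.
template <unsigned BLOCK_SIZE, class SubType>
class BlockCollector {
 public:
  void ProcessNext() {
    // Static dispatch to the subclass; no virtual calls are involved.
    SubType& self = static_cast<SubType&>(*this);
    block_[filled_++] = self.ReadPort();
    if (filled_ == BLOCK_SIZE) {
      filled_ = 0;
      self.SendBlock(block_);
    }
  }

 private:
  uint32_t block_[BLOCK_SIZE] = {};
  unsigned filled_ = 0;
};

// Hypothetical subclass standing in for a hardware-backed service.
class FakePdmRx : public BlockCollector<4, FakePdmRx> {
 public:
  uint32_t ReadPort() { return next_word_++; }  // fake port data

  void SendBlock(const uint32_t* block) {
    assert(block != nullptr);
    sent.assign(block, block + 4);  // record the block instead of signaling
    ++blocks_sent;
  }

  std::vector<uint32_t> sent;
  unsigned blocks_sent = 0;

 private:
  uint32_t next_word_ = 100;
};
```

Because the subclass passes itself as SubType, the base class can call ReadPort() and SendBlock() without virtual dispatch, which is presumably why this pattern suits a per-word hot path.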

Template Parameters:
  • BLOCK_SIZE – Number of words of PDM data per block.

  • SubType – Subclass of PdmRxService actually being used.

Public Functions

void SetPort(port_t p_pdm_mics)

Set the port from which to collect PDM samples.

void ProcessNext()

Perform a port read and if a new block has completed, signal.

void ThreadEntry()

Entry point for PDM processing thread.

This function loops forever, calling ProcessNext() with each iteration.

Public Static Attributes

static unsigned BlockSize = BLOCK_SIZE

Number of words of PDM data per block.

Typically (e.g. TwoStageDecimator) BLOCK_SIZE will be exactly the number of words of PDM samples required to produce exactly one new output sample for the mic array unit’s output stream.

Once BlockSize words have been read into one of the block_data buffers, PDM rx will signal to the decimator thread that new PDM data is available for processing.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$API Reference$$$C++ API Reference$$$PdmRxService$$$StandardPdmRxService£££modules/io/modules/mic_array/doc/rst/src/reference/cpp/cpp_api.html#standardpdmrxservice
struct pdm_rx_isr_context_t

PDM rx interrupt configuration and context.

Public Members

port_t p_pdm_mics

Port on which PDM samples are received.

uint32_t *pdm_buffer[2]

Pointers to a pair of buffers used for storing captured PDM samples.

The buffers themselves are allocated by an instance of mic_array::PdmRxService. The idea is that while the PDM rx ISR is filling one buffer, the decimation thread is busy processing the contents of the other buffer. If the real-time constraint is maintained, the decimation thread will be finished with the contents of its buffer before the PDM rx ISR fills the other buffer. Once full, the PDM rx ISR does a double buffer pointer swap and hands the newly-filled buffer to the decimation thread.

unsigned phase

Tracks the completeness of the buffer currently being filled.

Each read of samples from p_pdm_mics gives one word of data. This variable tracks how many more port reads are required before the current buffer has been filled.

unsigned phase_reset

The number of words to read from p_pdm_mics to fill a buffer.
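
The interplay of pdm_buffer, phase and phase_reset can be modeled in plain C++. This is a sketch under stated assumptions rather than a transcription of pdm_rx_isr.S: DoubleBufferModel and on_port_word() are hypothetical names, and filling the buffer from the highest index downward is an assumption made so that words end up in reverse order of arrival.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the ISR's double-buffer bookkeeping.
// Note: copying this struct would leave pdm_buffer pointing into the source.
template <unsigned WORDS>
struct DoubleBufferModel {
  uint32_t storage[2][WORDS] = {};
  uint32_t* pdm_buffer[2] = {storage[0], storage[1]};  // [0] is being filled
  unsigned phase_reset = WORDS;  // port reads needed to fill a buffer
  unsigned phase = WORDS;        // port reads still needed for this buffer

  // Store one port word. Returns the completed buffer if this word filled
  // it, otherwise nullptr.
  uint32_t* on_port_word(uint32_t word) {
    assert(phase > 0);
    pdm_buffer[0][--phase] = word;  // assumption: fill from the top down
    if (phase != 0) return nullptr;
    // Buffer full: swap the two pointers and start over on the other buffer.
    uint32_t* full = pdm_buffer[0];
    pdm_buffer[0] = pdm_buffer[1];
    pdm_buffer[1] = full;
    phase = phase_reset;
    return full;
  }
};
```

In this model the returned pointer is what would be handed to the decimation thread, while subsequent port words land in the other buffer.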

chanend_t c_pdm_data

Streaming chanend the PDM rx ISR uses to signal the decimation thread that another buffer is full and ready to be processed.

The streaming channel itself is allocated by mic_array::StandardPdmRxService, which owns the other end of the channel.

unsigned credit

Used for detecting when the real-time constraint is violated by the decimation thread.

Each time the decimation thread is given a block of PDM data to process, credit is reset to 2. Each time the PDM rx ISR hands a block of PDM data to the decimation thread, this is decremented.

Deadlock Condition

mic_array::StandardPdmRxService uses a streaming channel to facilitate communication between the two execution contexts used by the mic array, the decimation thread and the PDM rx ISR. A streaming channel is used because it allows the contexts to operate asynchronously.

A channel has a 2 word buffer, and as long as there is room in the buffer, an OUT instruction putting a word (in this case, a pointer) into the channel is guaranteed not to block. This is important because the PDM rx ISR is typically configured on the same hardware thread as the decimation thread.

If a thread is blocked on an OUT instruction to a channel, in order to unblock the thread, an IN must be issued on the other end of that channel. But because the PDM rx ISR is blocked, it cannot hand control back to the decimation thread, which means the decimation thread can never issue an IN instruction to unblock the ISR. The result is a deadlock.

Unfortunately, there is no way for a thread to query a chanend to determine whether it will block if an OUT instruction is issued. That is why credit is used. Before issuing an OUT to c_pdm_data, the PDM rx ISR checks whether credit is non-zero. If so, the ISR issues the OUT instruction as normal and decrements credit.

If credit is zero, the default behavior of PDM rx ISR is to raise an exception (ET_ECALL). This reflects the idea that it is generally better if system-breaking errors loudly announce themselves (at least by default). If using mic_array::StandardPdmRxService, this behavior can be changed by passing false in a call to mic_array::StandardPdmRxService::AssertOnDroppedBlock(), which will allow blocks of PDM data to be silently dropped (while still avoiding a permanent deadlock).

unsigned missed_blocks

Controls and records anti-deadlock behavior.

If the PDM rx ISR finds that credit is 0 when it’s time to send a filled buffer to the decimation thread, it uses missed_blocks to control whether the PDM rx ISR should raise an exception or silently drop the block of PDM data.

If missed_blocks is -1 (its default value) an exception is raised. Otherwise missed_blocks is used to record the number of blocks that have been quietly dropped.
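
The credit and missed_blocks rules described above can be illustrated with a small host-side model. This is a sketch only: CreditModel, isr_send() and thread_receive() are hypothetical names, a std::deque stands in for the streaming channel, a C++ exception stands in for ET_ECALL, and the initial credit of 2 is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <stdexcept>

// Hypothetical model of the anti-deadlock bookkeeping.
struct CreditModel {
  std::deque<uint32_t*> channel;  // stands in for the streaming channel
  unsigned credit = 2;            // assumption: starts with full credit
  unsigned missed_blocks = static_cast<unsigned>(-1);  // -1: raise on overrun

  // PDM rx ISR side: called when a buffer has been filled.
  void isr_send(uint32_t* block) {
    if (credit > 0) {
      channel.push_back(block);  // the OUT is known not to block
      --credit;
    } else if (missed_blocks == static_cast<unsigned>(-1)) {
      throw std::runtime_error("PDM block dropped");  // stands in for ET_ECALL
    } else {
      ++missed_blocks;  // silently drop the block and count it
    }
  }

  // Decimation thread side: receiving a block restores the credit.
  uint32_t* thread_receive() {
    assert(!channel.empty());
    uint32_t* block = channel.front();
    channel.pop_front();
    credit = 2;
    return block;
  }
};
```

Setting missed_blocks to 0 in this model plays the role of calling AssertOnDroppedBlock(false): a third send without an intervening receive then increments the counter instead of raising.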

pdm_rx_isr_context_t pdm_rx_isr_context

Configuration and context of the PDM rx ISR when mic_array::StandardPdmRxService is used in interrupt mode.

pdm_rx_isr (pdm_rx_isr.S) directly allocates this object as configuration and state parameters required by that interrupt routine.

static inline void enable_pdm_rx_isr(const port_t p_pdm_mics)

Configure port to use pdm_rx_isr as an interrupt routine.

This function configures p_pdm_mics to use pdm_rx_isr as its interrupt vector and enables the interrupt on the current hardware thread.

This function does NOT unmask interrupts.

Parameters:

p_pdm_mics – Port resource to enable ISR on.

template<unsigned CHANNELS_IN, unsigned CHANNELS_OUT, unsigned SUBBLOCKS>
class StandardPdmRxService : public mic_array::PdmRxService<CHANNELS_IN * SUBBLOCKS, StandardPdmRxService<CHANNELS_IN, CHANNELS_OUT, SUBBLOCKS>>

PDM rx service which uses a streaming channel to send a block of data by pointer.

This class can run the PDM rx service either as a stand-alone thread or through an interrupt.

Inter-context Transfer

A streaming channel is used to transfer control of the PDM data block between execution contexts (i.e. thread->thread or ISR->thread).

The mic array unit receives blocks of PDM data from an instance of this class by calling GetPdmBlock(), which blocks until a new PDM block is available.

Layouts

The buffer transferred by SendBlock() contains CHANNELS_IN*SUBBLOCKS words of PDM data for CHANNELS_IN microphone channels. The words are stored in reverse order of arrival.

See mic_array::deinterleave_pdm_samples() for additional details on this format.

Within GetPdmBlock() (i.e. mic array thread) the PDM data block is deinterleaved and copied to another buffer in the format required by the decimator component, which is returned by GetPdmBlock(). This buffer contains samples for CHANNELS_OUT microphone channels.

Channel Filtering

In some cases an application may be required to capture more microphone channels than should actually be processed by subsequent processing stages (including the decimator component). For example, this may be the case if 4 microphone channels are desired but only an 8 bit wide port is physically available to capture the samples.

This class template has a parameter both for the number of channels to be captured by the port (CHANNELS_IN), as well as for the number of channels that are to be output for consumption by the MicArray’s decimator component (CHANNELS_OUT).

When the PDM microphones are in an SDR configuration, CHANNELS_IN must be the width (in bits) of the XCore port to which the microphones are physically connected. When in a DDR configuration, CHANNELS_IN must be twice the width (in bits) of the XCore port to which the microphones are physically connected.

CHANNELS_OUT is the number of microphone channels to be consumed by the mic array’s decimator component (i.e. must be the same as the MIC_COUNT template parameter of the decimator component). If all port pins are connected to microphones, this parameter will generally be the same as CHANNELS_IN.

Channel Index (Re-)Mapping

The input channel index of a microphone depends on the pin to which it is connected. Each pin connected to a port has a bit index for that port, given in the ‘Signal Description and GPIO’ section of your package’s datasheet.

Suppose an N-bit port is used to capture microphone data, and a microphone is connected to bit B of that port. In an SDR microphone configuration, the input channel index of that microphone is B, the same as the port bit index.

In a DDR configuration, that microphone will be on either input channel index B or B+N, depending on whether that microphone is configured for in-phase capture or out-of-phase capture.

Sometimes it may be desirable to re-order the microphone channel indices. This is likely the case, for example, when CHANNELS_IN > CHANNELS_OUT.

By default output channels are mapped from the input channels with the same index. If CHANNELS_IN > CHANNELS_OUT, this means that the input channels with the highest CHANNELS_IN-CHANNELS_OUT indices are dropped by default.

The MapChannel() and MapChannels() methods can be used to specify a non-default mapping from input channel indices to output channel indices. MapChannels() takes a pointer to a CHANNELS_OUT-element array specifying the input channel index for each output channel, while MapChannel() re-maps a single output channel.
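
The mapping rule (output channel k is taken from input channel map[k], identity by default) can be sketched as follows. ChannelMapModel, map_channel() and apply() are hypothetical names used only for this illustration; the sketch operates on one value per channel rather than on PDM blocks.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of input-to-output channel selection.
template <unsigned CHANNELS_IN, unsigned CHANNELS_OUT>
struct ChannelMapModel {
  static_assert(CHANNELS_OUT <= CHANNELS_IN, "cannot output more than captured");

  std::array<unsigned, CHANNELS_OUT> map;

  // Default: output channel k comes from input channel k, so the highest
  // CHANNELS_IN - CHANNELS_OUT input channels are dropped.
  ChannelMapModel() {
    for (unsigned k = 0; k < CHANNELS_OUT; ++k) map[k] = k;
  }

  void map_channel(unsigned out_channel, unsigned in_channel) {
    assert(out_channel < CHANNELS_OUT && in_channel < CHANNELS_IN);
    map[out_channel] = in_channel;
  }

  // Pick the mapped input value for each output channel.
  std::array<int32_t, CHANNELS_OUT> apply(
      const std::array<int32_t, CHANNELS_IN>& in) const {
    std::array<int32_t, CHANNELS_OUT> out{};
    for (unsigned k = 0; k < CHANNELS_OUT; ++k) out[k] = in[map[k]];
    return out;
  }
};
```

For instance, with 8 captured channels and 4 outputs, re-mapping output 0 to input 7 and output 3 to input 4 leaves outputs 1 and 2 on their default inputs.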

Template Parameters:
  • CHANNELS_IN – The number of microphone channels to be captured by the port.

  • CHANNELS_OUT – The number of microphone channels to be delivered by this StandardPdmRxService instance.

  • SUBBLOCKS – The number of 32-sample sub-blocks to be captured for each microphone channel.

Public Functions

uint32_t ReadPort()

Read a word of PDM data from the port.

Returns:

A uint32_t containing 32 PDM samples. If more than one microphone channel is captured, the samples from the channels will be interleaved together.

void SendBlock(uint32_t block[CHANNELS_IN * SUBBLOCKS])

Send a block of PDM data to a listener.

Parameters:

block – PDM data to send.

void Init(port_t p_pdm_mics)

Initialize this object with a channel and port.

Parameters:

p_pdm_mics – Port to receive PDM data on.

void MapChannels(unsigned map[CHANNELS_OUT])

Set the input-output mapping for all output channels.

By default, input channel index k maps to output channel index k.

This method overrides that behavior for all channels, re-mapping each output channel such that output channel k is derived from input channel map[k].

Note

Changing the channel mapping while the mic array unit is running is not recommended.

Parameters:

map – Array containing new channel map.

void MapChannel(unsigned out_channel, unsigned in_channel)

Set the input-output mapping for a single output channel.

By default, input channel index k maps to output channel index k.

This method overrides that behavior for a single output channel, configuring output channel out_channel to be derived from input channel in_channel.

Note

Changing the channel mapping while the mic array unit is running is not recommended.

Parameters:
  • out_channel – Output channel index to be re-mapped.

  • in_channel – New source channel index for out_channel.

void InstallISR()

Install ISR for PDM reception on the current core.

Note

This does not unmask interrupts.

void UnmaskISR()

Unmask interrupts on the current core.

uint32_t *GetPdmBlock()

Get a block of PDM data.

Because blocks of PDM samples are delivered by pointer, the caller must either copy the samples or finish processing them before the next block of samples is ready, or the data will be clobbered.

Note

This is a blocking call.

Returns:

Pointer to block of PDM data.

void AssertOnDroppedBlock(bool doAssert)

Set whether dropped PDM samples should cause an assertion.

If doAssert is set to true (default), the PDM rx ISR will raise an exception (ET_ECALL) if it is ready to deliver a PDM block to the mic array thread when the mic array thread is not ready to receive it. If false, dropped blocks can be tracked through pdm_rx_isr_context.missed_blocks.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$API Reference$$$C++ API Reference$$$TwoStageDecimator£££modules/io/modules/mic_array/doc/rst/src/reference/cpp/cpp_api.html#twostagedecimator
template<unsigned MIC_COUNT, unsigned S2_DEC_FACTOR, unsigned S2_TAP_COUNT>
class TwoStageDecimator

First and Second Stage Decimator.

This class template represents a two stage decimator which converts a stream of PDM samples to a lower sample rate stream of PCM samples.

Concrete implementations of this class template are meant to be used as the TDecimator template parameter in the MicArray class template.

Template Parameters:
  • MIC_COUNT – Number of microphone channels.

  • S2_DEC_FACTOR – Stage 2 decimation factor.

  • S2_TAP_COUNT – Stage 2 tap count.

Public Functions

void Init(const uint32_t *s1_filter_coef, const int32_t *s2_filter_coef, const right_shift_t s2_filter_shr)

Initialize the decimator.

Sets the stage 1 and 2 filter coefficients. The decimator must be initialized before any calls to ProcessBlock().

s1_filter_coef points to a block of coefficients for the first stage decimator. This library provides coefficients for the first stage decimator; see mic_array/etc/filters_default.h.

s2_filter_coef points to an array of coefficients for the second stage decimator. This library provides coefficients for the second stage decimator where the second stage decimation factor is 6; see mic_array/etc/filters_default.h.

s2_filter_shr is the final right-shift applied to the stage 2 filter’s accumulator prior to output. See lib_xcore_math’s documentation of filter_fir_s32_t for more details.

Parameters:
  • s1_filter_coef

    Stage 1 filter coefficients.

    This points to a block of coefficients for the first stage decimator. This library provides coefficients for the first stage decimator.

    See stage1_coef.

  • s2_filter_coef

    Stage 2 filter coefficients.

    This points to a block of coefficients for the second stage decimator. This library provides coefficients for the second stage decimator.

    See stage2_coef.

  • s2_filter_shr

    Stage 2 filter right-shift.

    This is the output shift used by the second stage decimator.

    See stage2_shr.

void ProcessBlock(int32_t sample_out[MIC_COUNT], uint32_t pdm_block[BLOCK_SIZE])

Process one block of PDM data.

Processes a block of PDM data to produce an output sample from the second stage decimator.

pdm_block contains exactly enough PDM samples to produce a single output sample from the second stage decimator. The layout of pdm_block should (effectively) be:

struct {
  struct {
    // lower word indices are older samples.
    // less significant bits in a word are older samples.
    uint32_t samples[S2_DEC_FACTOR];
  } microphone[MIC_COUNT]; // mic channels are in ascending order
} pdm_block;

A single output sample from the second stage decimator is computed and written to sample_out[].
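
To make the layout concrete, the following sketch reads a single PDM sample out of a block arranged as described above. pdm_sample() is a hypothetical accessor written for this illustration and is not part of the library.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical accessor: PDM sample t (0 = oldest) of microphone mic, for a
// block laid out as microphone[MIC_COUNT].samples[S2_DEC_FACTOR].
template <unsigned MIC_COUNT, unsigned S2_DEC_FACTOR>
inline unsigned pdm_sample(const uint32_t* pdm_block, unsigned mic, unsigned t)
{
  assert(mic < MIC_COUNT && t < 32 * S2_DEC_FACTOR);
  // Lower word indices hold older samples, so sample t is in word t / 32.
  const uint32_t word = pdm_block[mic * S2_DEC_FACTOR + t / 32];
  // Less significant bits are older, so sample t is bit t % 32.
  return (word >> (t % 32)) & 1u;
}
```

For example, with two microphones and a stage 2 decimation factor of 3, sample 33 of microphone 1 is bit 1 of word 4 of the block.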

Parameters:
  • sample_out – Output sample vector.

  • pdm_block – PDM data to be processed.

Public Members

unsigned DecimationFactor = S2_DEC_FACTOR

Stage 2 decimator decimation factor.

unsigned TapCount = S2_TAP_COUNT

Stage 2 decimator tap count.

const uint32_t *filter_coef

Pointer to filter coefficients for Stage 1

uint32_t pdm_history[MIC_COUNT][8]

Filter state (PDM history) for stage 1 filters.

filter_fir_s32_t filters[MIC_COUNT]

Stage 2 FIR filters

int32_t filter_state[MIC_COUNT][S2_TAP_COUNT] = {{0}}

Stage 2 filter state.

Public Static Attributes

static unsigned BLOCK_SIZE = MIC_COUNT * S2_DEC_FACTOR

Size of a block of PDM data in words.

static unsigned MicCount = MIC_COUNT

Number of microphone channels.

static const struct mic_array::TwoStageDecimator Stage2

Stage 2 decimator parameters

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$API Reference$$$C++ API Reference$$$SampleFilter£££modules/io/modules/mic_array/doc/rst/src/reference/cpp/cpp_api.html#samplefilter
XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$API Reference$$$C++ API Reference$$$SampleFilter$$$NopSampleFilter£££modules/io/modules/mic_array/doc/rst/src/reference/cpp/cpp_api.html#nopsamplefilter
template<unsigned MIC_COUNT>
class NopSampleFilter

SampleFilter which does nothing.

To be used as the TSampleFilter template parameter of MicArray when no post-decimation filtering is desired.

Calls to NopSampleFilter::Filter() are intended to be optimized out at compile time.

Template Parameters:

MIC_COUNT – Number of microphone channels.

Public Functions

inline void Filter(int32_t sample[MIC_COUNT])

Do nothing.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$API Reference$$$C++ API Reference$$$SampleFilter$$$DcoeSampleFilter£££modules/io/modules/mic_array/doc/rst/src/reference/cpp/cpp_api.html#dcoesamplefilter
template<unsigned MIC_COUNT>
class DcoeSampleFilter

Filter which applies DC Offset Elimination (DCOE).

To be used as the TSampleFilter template parameter of MicArray when DCOE is desired as post-processing after the decimation filter.

The filter is a simple first-order IIR filter which applies the following filter equation:

R = 252.0 / 256.0
y[t] = R * y[t-1] + x[t] - x[t-1]
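
A floating-point reference of this equation for a single channel shows how a constant (DC) input decays toward zero. This is a sketch for illustration: DcoeReference and settle() are hypothetical names, the zero initial state is an assumption, and the library's own filter operates on int32_t samples rather than doubles.

```cpp
#include <cassert>
#include <cmath>

// Floating-point reference of y[t] = R * y[t-1] + x[t] - x[t-1].
struct DcoeReference {
  double prev_x = 0.0;  // x[t-1], assumed zero initially
  double prev_y = 0.0;  // y[t-1], assumed zero initially

  double filter(double x) {
    const double R = 252.0 / 256.0;
    const double y = R * prev_y + x - prev_x;
    prev_x = x;
    prev_y = y;
    return y;
  }
};

// Hypothetical helper: output after feeding count copies of a constant.
inline double settle(DcoeReference& f, double dc, unsigned count)
{
  assert(count > 0);
  double y = 0.0;
  for (unsigned i = 0; i < count; ++i) y = f.filter(dc);
  return y;
}
```

With a constant input the difference x[t] - x[t-1] is zero after the first sample, so the output shrinks by a factor of R on every step.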

Template Parameters:

MIC_COUNT – Number of microphone channels.

Public Functions

void Init()

Initialize the filter states.

The filter states must be initialized prior to calls to Filter().

void Filter(int32_t sample[MIC_COUNT])

Apply DCOE filter on samples.

sample is an array of samples to be filtered, and is updated in-place.

The filter states must have been initialized with a call to Init() prior to calling this function.

Parameters:

sample – Samples to be filtered. Updated in-place.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$API Reference$$$C++ API Reference$$$OutputHandler£££modules/io/modules/mic_array/doc/rst/src/reference/cpp/cpp_api.html#outputhandler

An OutputHandler is a class which meets the requirements to be used as the TOutputHandler template parameter of the MicArray class template. The basic requirement is that it have a method:

void OutputSample(int32_t sample[MIC_COUNT]);

This method is how the mic array communicates its output with the rest of the application’s audio processing pipeline. MicArray calls this method once for each mic array output sample.

See MicArray::OutputHandler for more details.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$API Reference$$$C++ API Reference$$$OutputHandler$$$FrameOutputHandler£££modules/io/modules/mic_array/doc/rst/src/reference/cpp/cpp_api.html#frameoutputhandler
template<unsigned MIC_COUNT, unsigned SAMPLE_COUNT, template<unsigned, unsigned> class FrameTransmitter, unsigned FRAME_COUNT = 1>
class FrameOutputHandler

OutputHandler implementation which groups samples into non-overlapping multi-sample audio frames and sends entire frames to subsequent processing stages.

This class template can be used as an OutputHandler with the MicArray class template. See MicArray::OutputHandler.

Classes derived from this template collect samples into frames. A frame is a 2 dimensional array with one index corresponding to the audio channel and the other index corresponding to time step, e.g.:

int32_t frame[MIC_COUNT][SAMPLE_COUNT];

Each call to OutputSample() adds the sample to the current frame, and then iff the frame is full, uses its FrameTx component to transfer the frame of audio to subsequent processing stages. Only one of every SAMPLE_COUNT calls to OutputSample() results in an actual transmission to subsequent stages.

With FrameOutputHandler, the thread receiving the audio will generally need to know how many microphone channels and how many samples to expect per frame (although, strictly speaking, that depends upon the chosen FrameTransmitter implementation).
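
The framing behavior can be sketched with a stand-in transmitter. This is an illustration under assumptions, not the library's code: FramingModel and RecordingTransmitter are hypothetical names, and the transmitter simply records each frame in a vector instead of sending it over a channel.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical transmitter: records each frame, flattened as [mic][sample].
template <unsigned MIC_COUNT, unsigned SAMPLE_COUNT>
struct RecordingTransmitter {
  std::vector<std::vector<int32_t>> frames;

  void OutputFrame(int32_t frame[MIC_COUNT][SAMPLE_COUNT]) {
    std::vector<int32_t> copy;
    for (unsigned m = 0; m < MIC_COUNT; ++m)
      for (unsigned s = 0; s < SAMPLE_COUNT; ++s)
        copy.push_back(frame[m][s]);
    frames.push_back(copy);
  }
};

// Hypothetical model of the sample-to-frame grouping.
template <unsigned MIC_COUNT, unsigned SAMPLE_COUNT,
          template <unsigned, unsigned> class FrameTransmitter>
struct FramingModel {
  FrameTransmitter<MIC_COUNT, SAMPLE_COUNT> FrameTx;
  int32_t frame[MIC_COUNT][SAMPLE_COUNT] = {};
  unsigned filled = 0;  // samples already placed in the current frame

  void OutputSample(const int32_t sample[MIC_COUNT]) {
    assert(filled < SAMPLE_COUNT);
    for (unsigned m = 0; m < MIC_COUNT; ++m) frame[m][filled] = sample[m];
    // Only every SAMPLE_COUNT-th call transmits a frame.
    if (++filled == SAMPLE_COUNT) {
      filled = 0;
      FrameTx.OutputFrame(frame);
    }
  }
};
```

Swapping RecordingTransmitter for another class template with the same OutputFrame() method changes how frames leave the handler without touching the framing logic.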

Template Parameters:
  • MIC_COUNT

    The number of audio channels in each sample and each frame.

  • SAMPLE_COUNT – Number of samples per frame.

    The SAMPLE_COUNT template parameter is the number of samples assembled into each audio frame. Only completed frames are transmitted to subsequent processing stages. A SAMPLE_COUNT value of 1 effectively disables framing, transmitting one sample for each call made to OutputSample.

  • FrameTransmitter

    The concrete type of the FrameTx component of this class.

    Like many classes in this library, FrameOutputHandler uses the Curiously Recurring Template Pattern.

  • FRAME_COUNT

    The number of frame buffers an instance of FrameOutputHandler should cycle through. Unless audio frames are communicated with subsequent processing stages through shared memory, the default value of 1 is usually ideal.

Public Functions

inline FrameOutputHandler()

Construct new FrameOutputHandler.

The default no-argument constructor for FrameTransmitter is used to create FrameTx.

inline FrameOutputHandler(FrameTransmitter<MIC_COUNT, SAMPLE_COUNT> frame_tx)

Construct new FrameOutputHandler.

Uses the provided FrameTransmitter to send frames.

Parameters:

frame_tx – Frame transmitter for sending frames.

void OutputSample(int32_t sample[MIC_COUNT])

Add new sample to current frame and output frame if filled.

Parameters:

sample – Sample to be added to current frame.

Public Members

FrameTransmitter<MIC_COUNT, SAMPLE_COUNT> FrameTx

FrameTransmitter used to transmit frames to the next stage for processing.

FrameTransmitter is the CRTP type template parameter used in this class to control how frames of audio data are communicated with subsequent pipeline stages.

The type supplied for FrameTransmitter must be a class template with two integer template parameters, corresponding to this class’s MIC_COUNT and SAMPLE_COUNT template parameters respectively, indicating the shape of the frame object to be transmitted.

The FrameTransmitter type is required to implement a single method:

void OutputFrame(int32_t frame[MIC_COUNT][SAMPLE_COUNT]);

OutputFrame() is called once for each completed audio frame and is responsible for the details of how the frame’s data gets communicated to subsequent stages. For example, the ChannelFrameTransmitter class template uses an XCore channel to send samples to another thread (by value).

Alternative implementations might use shared memory or an RTOS queue to transmit the frame data, or might even use a port to signal the samples directly to an external DAC.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$API Reference$$$C++ API Reference$$$OutputHandler$$$ChannelFrameTransmitter£££modules/io/modules/mic_array/doc/rst/src/reference/cpp/cpp_api.html#channelframetransmitter
template<unsigned MIC_COUNT, unsigned SAMPLE_COUNT>
class ChannelFrameTransmitter

Frame transmitter which transmits frame over a channel.

This class template is meant for use as the FrameTransmitter template parameter of FrameOutputHandler.

When using this frame transmitter, frames are transmitted over a channel using the frame transfer API in mic_array/frame_transfer.h.

Usually, a call to ma_frame_rx() (with the other end of c_frame_out as argument) should be used to receive the frame on another thread.

If the receiving thread is not waiting to receive the frame when OutputFrame() is called, that method will block until the frame has been transmitted. In order to ensure there are no violations of the mic array’s real-time constraints, the receiver should be ready to receive a frame as soon as it becomes available.

Frames can be transmitted between tiles using this class.

Note

While OutputFrame() is blocking, it will not prevent the PDM rx interrupt from firing.

Template Parameters:
  • MIC_COUNT – Number of audio channels in each frame.

  • SAMPLE_COUNT – Number of samples per frame.

Public Functions

inline ChannelFrameTransmitter()

Construct a ChannelFrameTransmitter.

If this constructor is used, SetChannel() must be called to configure the channel over which frames are transmitted prior to any calls to OutputFrame().

inline ChannelFrameTransmitter(chanend_t c_frame_out)

Construct a ChannelFrameTransmitter.

The supplied value of c_frame_out must be a valid chanend.

Parameters:

c_frame_out – Chanend over which frames will be transmitted.

void SetChannel(chanend_t c_frame_out)

Set channel used for frame transfers.

The supplied value of c_frame_out must be a valid chanend.

Parameters:

c_frame_out – Chanend over which frames will be transmitted.

chanend_t GetChannel()

Get the chanend used for frame transfers.

Returns:

Channel to be used for frame transfers.

void OutputFrame(int32_t frame[MIC_COUNT][SAMPLE_COUNT])

Transmit the specified frame.

See ChannelFrameTransmitter for additional details.

Parameters:

frame – Frame to be transmitted.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$API Reference$$$C++ API Reference$$$Misc£££modules/io/modules/mic_array/doc/rst/src/reference/cpp/cpp_api.html#misc
template<unsigned MIC_COUNT>
void mic_array::deinterleave_pdm_samples(uint32_t *samples, unsigned s2_dec_factor)

Deinterleave the channels of a block of PDM data.

PDM samples received on a port are shifted into a 32-bit buffer in such a way that the samples for each microphone channel are all interleaved with one another. The first stage decimator, however, requires these to be separated.

samples must point to a buffer containing (MIC_COUNT*s2_dec_factor) words of PDM data. Because the decimation factor for the first stage decimator is a fixed value of 32, 32 PDM samples from each microphone are enough to produce one output sample (a MIC_COUNT-element vector) from the first stage decimator. 32*s2_dec_factor PDM samples for each of the MIC_COUNT microphone channels are then exactly what is required to produce a single output sample from the second stage decimator.

The PDM data will be deinterleaved in-place.

On input, the format of the buffer to which samples points is assumed to be such that the following function will extract (only) the kth sample for microphone channel n (where k is a time index, not a memory index):

Input Format

unsigned get_sample(uint32_t* samples, 
                    unsigned MIC_COUNT, unsigned s2_dec_factor, 
                    unsigned n, unsigned k)
{
  const unsigned end_word = MIC_COUNT * s2_dec_factor - 1; // chronologically first
  const unsigned samp_per_word = 32 / MIC_COUNT;
  const unsigned words_from_end = k / samp_per_word;
  const uint32_t word_val = samples[end_word-words_from_end];
  // each sample time occupies MIC_COUNT adjacent bits; channel n is the offset within that group
  const unsigned bit_offset = (k % samp_per_word) * MIC_COUNT + n;
  return (word_val >> bit_offset) & 1;
}

Here, the words of samples are stored in reverse order (older samples are at higher word indices), and within a word the oldest samples are the least significant bits. The LSb of a word is always microphone channel 0, and the MSb of a word is always microphone channel MIC_COUNT-1.

Upon return, the format of the buffer to which samples points will be such that the following function will extract (only) the kth sample for microphone channel n:

Output Format

unsigned get_sample(uint32_t* samples, 
                    unsigned MIC_COUNT, unsigned s2_dec_factor, 
                    unsigned n, unsigned k)
{
  const unsigned subblock = (s2_dec_factor-1)-(k/32);
  const unsigned word_val = samples[subblock * MIC_COUNT + n];
  return (word_val >> (k%32)) & 1;
}

Here, each word contains samples from only a single channel, with words at higher addresses containing older samples. samples[0] contains the newest samples for microphone channel 0, and samples[MIC_COUNT-1] contains the newest samples for microphone channel MIC_COUNT-1. samples[MIC_COUNT] contains the next-oldest set of samples for channel 0, and so on.

Template Parameters:

  • MIC_COUNT – Number of channels represented in PDM data. One of {1,2,4,8}.

Parameters:
  • samples – Pointer to block of PDM samples.

  • s2_dec_factor – Stage2 decimator decimation factor.
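The two buffer formats described above can be checked against each other on any host with a portable reference model. The following is a minimal sketch, not the library's optimized implementation. All demo_ names and DEMO_MAX_WORDS are hypothetical, and the input bit offset is assumed to be (k % samp_per_word) * MIC_COUNT + n, which is how the prose describes the layout (LSb is channel 0, oldest samples in the least significant bits).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_WORDS 64u /* hypothetical bound used only by this sketch */

/* Input (interleaved) format: bit for mic n at time index k. */
static unsigned demo_get_interleaved(const uint32_t *samples, unsigned mic_count,
                                     unsigned s2_dec_factor, unsigned n, unsigned k)
{
  const unsigned end_word = mic_count * s2_dec_factor - 1; /* oldest word */
  const unsigned samp_per_word = 32 / mic_count;
  const uint32_t word_val = samples[end_word - k / samp_per_word];
  const unsigned bit_offset = (k % samp_per_word) * mic_count + n; /* assumed */
  return (word_val >> bit_offset) & 1u;
}

/* Output (deinterleaved) format: one channel per word, older words higher. */
static unsigned demo_get_deinterleaved(const uint32_t *samples, unsigned mic_count,
                                       unsigned s2_dec_factor, unsigned n, unsigned k)
{
  const unsigned subblock = (s2_dec_factor - 1) - (k / 32);
  return (samples[subblock * mic_count + n] >> (k % 32)) & 1u;
}

/* Bit-by-bit reference deinterleave; writes the result back in place. */
static void demo_deinterleave(uint32_t *samples, unsigned mic_count,
                              unsigned s2_dec_factor)
{
  const unsigned words = mic_count * s2_dec_factor;
  uint32_t out[DEMO_MAX_WORDS] = {0};
  assert(words <= DEMO_MAX_WORDS);
  for (unsigned k = 0; k < 32 * s2_dec_factor; k++) {
    for (unsigned n = 0; n < mic_count; n++) {
      if (demo_get_interleaved(samples, mic_count, s2_dec_factor, n, k)) {
        const unsigned subblock = (s2_dec_factor - 1) - (k / 32);
        out[subblock * mic_count + n] |= (uint32_t)1 << (k % 32);
      }
    }
  }
  memcpy(samples, out, words * sizeof(uint32_t));
}
```

With a single microphone the two formats coincide, so this model leaves the buffer unchanged in that case.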

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$API Reference$$$C API Reference£££modules/io/modules/mic_array/doc/rst/src/reference/c/c_api.html#c-api-reference

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$API Reference$$$C API Reference$$$filters_default.h£££modules/io/modules/mic_array/doc/rst/src/reference/c/filters_default.html#filters-default-h

The filters described below are the first and second stage filters provided by this library which are used with the TwoStageDecimator class template by default.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$API Reference$$$C API Reference$$$filters_default.h$$$Stage 1 - PDM-to-PCM Decimating FIR Filter£££modules/io/modules/mic_array/doc/rst/src/reference/c/filters_default.html#stage-1-pdm-to-pcm-decimating-fir-filter
Decimation Factor:  32
Tap Count: 256

The first stage decimation FIR filter converts 1-bit PDM samples into 32-bit PCM samples and simultaneously decimates by a factor of 32.

A typical input PDM sample rate will be 3.072M samples/sec, thus the corresponding output sample rate will be 96k samples/sec.

The first stage filter uses 16-bit coefficients for its taps. Because this is a highly optimized filter targeting the VPU hardware, the first stage filter is presently restricted to using exactly 256 filter taps.

For more information about the example first stage filter supplied with the library, including frequency response and steps for using a custom first stage filter, see Decimator Stages.

STAGE1_DEC_FACTOR

Macro indicating Stage 1 Decimation Factor.

This is the ratio of input sample rate to output sample rate for the first filter stage.

Note

In version 5.0 of lib_mic_array, this value is fixed (even if you choose not to use the default filter coefficients).

STAGE1_TAP_COUNT

Macro indicating Stage 1 Filter Tap Count.

This is the number of filter taps in the first stage filter.

Note

In version 5.0 of lib_mic_array, this value is fixed (even if you choose not to use the default filter coefficients).

STAGE1_WORDS

Macro indicating Stage 1 Filter Word Count.

This is a helper macro to indicate the number of 32-bit words required to store the filter coefficients.

Note

Even though the coefficients are 16-bit, the related lib_mic_array structs and functions expect them to be contained in an array of uint32_t, rather than an array of int16_t. There are two reasons for this. The first is that the VPU instructions require loaded data to start at a word-aligned (0 mod 4) address. uint32_t allocated on the heap or stack are guaranteed by the compiler to be at word-aligned addresses. The second reason is to mitigate possible confusion regarding the arrangement of the filter coefficients in memory. Not only are the 16-bit coefficients not stored in order (e.g. b[0], b[1], b[2], ...), the bits of individual 16-bit coefficients are not stored together in memory. This is, again, due to the behavior of the VPU hardware.
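As a worked example of the word count, the sketch below computes how many 32-bit words are needed for a given number of 16-bit taps. It assumes the coefficients are packed with no padding. The DEMO_ names and demo_stage1_words() are hypothetical and are not the library's macros.

```c
#include <assert.h>
#include <stdint.h>

/* Values stated in this section; names are local to this sketch. */
#define DEMO_STAGE1_TAP_COUNT 256u
#define DEMO_STAGE1_COEF_BITS 16u

/* Number of 32-bit words needed to hold tap_count coefficients of coef_bits each. */
static unsigned demo_stage1_words(unsigned tap_count, unsigned coef_bits)
{
  const unsigned total_bits = tap_count * coef_bits;
  assert(total_bits % 32u == 0); /* sketch assumes an exact fit */
  return total_bits / 32u;
}
```

Under that assumption, 256 taps of 16 bits occupy 128 words.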

const uint32_t stage1_coef[STAGE1_WORDS]

Stage 1 PDM-to-PCM Decimation Filter Default Coefficients.

These are the default coefficients for the first stage filter.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$API Reference$$$C API Reference$$$Stage 2 - PCM Decimating FIR Filter£££modules/io/modules/mic_array/doc/rst/src/reference/c/filters_default.html#stage-2-pcm-decimating-fir-filter
Decimation Factor: (configurable)
Tap Count: (configurable)

The second stage decimation FIR filter filters and downsamples the 32-bit PCM output stream from the first stage filter into another 32-bit PCM stream with sample rate reduced by the stage 2 decimation factor.

A typical first stage output sample rate will be 96k samples/sec. A decimation factor of 6 (i.e. using the default stage 2 filter) then gives a second stage output sample rate of 16k samples/sec.
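The rate arithmetic for the two stages can be sketched as follows. demo_output_rate_hz() is a hypothetical helper for illustration only. The stage 1 factor of 32 and the stage 2 factor of 6 are the values quoted in this section.

```c
#include <assert.h>

/* Output sample rate after both decimation stages (integer Hz). */
static unsigned demo_output_rate_hz(unsigned pdm_rate_hz,
                                    unsigned s1_dec_factor,
                                    unsigned s2_dec_factor)
{
  assert(s1_dec_factor != 0 && s2_dec_factor != 0);
  return pdm_rate_hz / s1_dec_factor / s2_dec_factor;
}
```

For a 3.072 MHz PDM clock this gives 96000 after stage 1 alone (stage 2 factor of 1) and 16000 with the default stage 2 factor of 6.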

The second stage filter uses 32-bit coefficients for its taps. A complete description of the FIR implementation is outside the scope of this documentation, but it can be found in the `xs3_filter_fir_s32_t` documentation of lib_xcore_math.

In brief, the second stage filter coefficients are quantized to a Q1.30 fixed-point format with input samples treated as integers. The tap outputs are added into a 40-bit accumulator, and an output sample is produced by applying a rounding arithmetic right-shift to the accumulator and then clipping the result to the interval (INT32_MIN, INT32_MAX].
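The final step (rounding shift, then clip) can be modeled on a host as in the sketch below. This is an illustration under stated assumptions, not the library's implementation: rounding is assumed to be round-half-up, the clip range is assumed to be symmetric (-INT32_MAX to INT32_MAX), and the demo_ function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Rounding arithmetic right-shift, written without relying on '>>' of
   negative values (which is implementation-defined in C). */
static int64_t demo_round_shr(int64_t acc, unsigned shr)
{
  assert(shr < 40);
  if (shr == 0) return acc;
  const int64_t divisor = (int64_t)1 << shr;
  const int64_t biased = acc + (divisor >> 1); /* add half an output LSb */
  int64_t q = biased / divisor;                /* truncates toward zero */
  if (biased % divisor < 0) q -= 1;            /* correct to floor */
  return q;
}

/* Accumulator to 32-bit output sample, with assumed symmetric saturation. */
static int32_t demo_stage2_output(int64_t acc, unsigned shr)
{
  int64_t y = demo_round_shr(acc, shr);
  if (y > INT32_MAX) y = INT32_MAX;
  if (y < -(int64_t)INT32_MAX) y = -(int64_t)INT32_MAX;
  return (int32_t)y;
}
```

For example, a single tap with coefficient 0.5 in Q1.30 (1 << 29) applied to an input of 1000 accumulates 1000 << 29, and a shift of 30 then yields 500.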

For more information about the example second stage filter supplied with the library, including frequency response and steps for using a custom filter, see Decimator Stages.

STAGE2_DEC_FACTOR

Stage 2 Decimation Factor for default filter.

This is the ratio of input sample rate to output sample rate for the second filter stage.

While the second stage filter can be configured with a different decimation factor, this is the one used for the filter supplied with this library.

STAGE2_TAP_COUNT

Stage 2 Filter tap count for default filter.

This is the number of filter taps associated with the second stage filter supplied with this library.

const int32_t stage2_coef[STAGE2_TAP_COUNT]

Stage 2 Decimation Filter Default Coefficients.

These are the default coefficients for the second stage filter.

const right_shift_t stage2_shr

Stage 2 Decimation Filter Default Output Shift.

This is the non-negative, rounding, arithmetic right-shift applied to the 40-bit accumulator to produce an output sample.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$API Reference$$$C API Reference$$$pdm_resources.h£££modules/io/modules/mic_array/doc/rst/src/reference/c/pdm_resources.html#pdm-resources-h
struct pdm_rx_resources_t

Collection of resource IDs required for PDM capture.

This struct is a container for the IDs of the XCore hardware resources used by the mic array unit’s PdmRx component for capturing PDM data from a port.

An object of this type will be used for initializing and starting the mic array unit.

Public Members

port_t p_mclk

Resource ID of the 1-bit port on which the master audio clock signal is received.

The master audio clock will be divided by a clock block to produce the PDM sample clock.

This port will be configured as an input.

port_t p_pdm_clk

Resource ID of the 1-bit port through which the PDM sample clock is signaled.

The PDM sample clock is used by the PDM microphones to trigger sample conversion.

This port will be configured as an output.

port_t p_pdm_mics

Resource ID of the port on which PDM samples are received.

In an SDR configuration, the number of microphone channels is the width of this port. In a DDR configuration, the number of microphone channels is twice the width of this port.

This port will be configured as an input.

clock_t clock_a

Resource ID of the clock block used to derive the PDM clock from the master audio clock.

In SDR configurations this is also the PDM data capture clock.

clock_t clock_b

Resource ID of the clock block used only in DDR configurations to trigger reads of the PDM data.

If operating in an SDR configuration, clock_b is 0. A value of 0 is what indicates an SDR configuration is being used.

PDM_RX_RESOURCES_SDR(P_MCLK, P_PDM_CLK, P_PDM_MICS, CLOCK_A)

Construct a pdm_rx_resources_t for an SDR configuration.

pdm_rx_resources_t.clock_b is initialized to 0, indicating an SDR configuration.

Parameters:
  • P_MCLK – Master audio clock port resource ID.

  • P_PDM_CLK – PDM sample clock port resource ID.

  • P_PDM_MICS – PDM microphone data port resource ID.

  • CLOCK_A – PDM clock and capture clock block resource ID.

PDM_RX_RESOURCES_DDR(P_MCLK, P_PDM_CLK, P_PDM_MICS, CLOCK_A, CLOCK_B)

Construct a pdm_rx_resources_t for a DDR configuration.

Parameters:
  • P_MCLK – Master audio clock port resource ID.

  • P_PDM_CLK – PDM sample clock port resource ID.

  • P_PDM_MICS – PDM microphone data port resource ID.

  • CLOCK_A – Resource ID of the clock block that generates the PDM clock.

  • CLOCK_B – PDM capture clock block resource ID.
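The way clock_b distinguishes SDR from DDR can be illustrated with a host-side stand-in. The sketch below does not use the real xcore types or macros: demo_resource_t, demo_pdm_rx_resources_t, the DEMO_ macros and the numeric resource IDs are all hypothetical placeholders that mirror the fields described above.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t demo_resource_t; /* placeholder for port and clock block IDs */

typedef struct {
  demo_resource_t p_mclk;
  demo_resource_t p_pdm_clk;
  demo_resource_t p_pdm_mics;
  demo_resource_t clock_a;
  demo_resource_t clock_b; /* 0 indicates an SDR configuration */
} demo_pdm_rx_resources_t;

#define DEMO_PDM_RX_RESOURCES_SDR(P_MCLK, P_PDM_CLK, P_PDM_MICS, CLOCK_A) \
  { (P_MCLK), (P_PDM_CLK), (P_PDM_MICS), (CLOCK_A), 0 }

#define DEMO_PDM_RX_RESOURCES_DDR(P_MCLK, P_PDM_CLK, P_PDM_MICS, CLOCK_A, CLOCK_B) \
  { (P_MCLK), (P_PDM_CLK), (P_PDM_MICS), (CLOCK_A), (CLOCK_B) }

/* Non-zero clock_b means DDR. */
static int demo_is_ddr(const demo_pdm_rx_resources_t *res)
{
  return res->clock_b != 0;
}

/* Microphone channels: port width in SDR, twice the port width in DDR. */
static unsigned demo_mic_count(const demo_pdm_rx_resources_t *res,
                               unsigned port_width)
{
  return demo_is_ddr(res) ? 2u * port_width : port_width;
}
```

With a 1-bit data port, the SDR initializer corresponds to one microphone and the DDR initializer to two.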

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$API Reference$$$C API Reference$$$setup.h£££modules/io/modules/mic_array/doc/rst/src/reference/c/setup.html#setup-h
void mic_array_resources_configure(pdm_rx_resources_t *pdm_res, int divide)

Configure the hardware resources needed by the mic array.

Several hardware resources are needed to correctly run the mic array, including 3 ports and 1 or 2 clock blocks (depending on whether SDR or DDR mode is used). This function configures these resources for operation with the mic array.

The pdm_rx_resources_t struct is a container for identifying precisely these resources. All three ports are reset by this function; any existing port configuration will be clobbered.

The parameter divide is the ratio of the audio master clock to the desired PDM clock rate. For example, to generate a desired 3.072 MHz PDM clock from an audio master clock with frequency 24.576 MHz, a divide value of 8 is needed.

Divide can also be calculated from the master and PDM clock frequencies using mic_array_mclk_divider().

pdm_res->p_mclk is the resource ID for the 1-bit port on which the audio master clock is received. This function will enable this port and configure it as the source port for pdm_res->clock_a and for pdm_res->clock_b if operating in a DDR configuration.

pdm_res->clock_a is the resource ID for the first (in SDR configuration, the only) clock block required by the mic array. Clock A divides the audio master clock (by a factor of divide) to generate the PDM clock. This function enables it with the audio master clock as its source.

pdm_res->p_pdm_clk is the resource ID for the 1-bit port from which the PDM clock will be signaled to the microphones. This function enables it and configures Clock A as its source clock.

pdm_res->clock_b is the resource ID for a second clock block, which is only required by the mic array in a DDR configuration. In DDR mode, this function enables Clock B with the audio master clock as its source. The divider for Clock B is half of that for Clock A (so it runs at twice the frequency). In a DDR configuration Clock B is used as the PDM capture clock. In an SDR configuration, this field must be set to 0 (this is how SDR/DDR is determined).

pdm_res->p_pdm_mics is the resource ID for the port on which PDM data is received. This function enables it and configures it as a 32-bit buffered input. If operating in an SDR configuration, Clock A is used as the capture clock. If operating in a DDR configuration, Clock B is used as its capture clock.

This function only configures and does not start either Clock A or Clock B. A call to mic_array_pdm_clock_start() with pdm_res as the argument can be used to start the clock(s).

This function should be called during initialization, before any PDM data can be captured or processed.

Parameters:
  • pdm_res – The hardware resources used by the mic array.

  • divide – The divider to generate the PDM clock from the master clock.
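The clock relationships described above can be sketched numerically. demo_clock_plan_t and demo_clock_plan() are hypothetical names that model only the arithmetic (Clock A divides the master clock by divide; in DDR mode Clock B uses half that divider). No hardware is configured.

```c
#include <assert.h>

typedef struct {
  unsigned pdm_clock_hz;     /* Clock A output, sent to the microphones */
  unsigned capture_clock_hz; /* clock that triggers reads of the data port */
} demo_clock_plan_t;

static demo_clock_plan_t demo_clock_plan(unsigned mclk_hz, unsigned divide, int ddr)
{
  demo_clock_plan_t plan;
  assert(divide != 0);
  plan.pdm_clock_hz = mclk_hz / divide;
  if (ddr) {
    assert(divide % 2u == 0); /* sketch assumes the divider can be halved exactly */
    plan.capture_clock_hz = mclk_hz / (divide / 2u);
  } else {
    plan.capture_clock_hz = plan.pdm_clock_hz; /* Clock A doubles as capture clock */
  }
  return plan;
}
```

With a 24.576 MHz master clock and divide set to 8, the PDM clock is 3.072 MHz in both modes, and the DDR capture clock is 6.144 MHz.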

void mic_array_pdm_clock_start(pdm_rx_resources_t *pdm_res)

Start the PDM and capture clock(s).

This function starts Clock A, and if using a DDR configuration, Clock B.

mic_array_resources_configure() must have been called already to configure the resources indicated in pdm_res.

Clock A is the PDM clock. Starting Clock A will cause pdm_res->p_pdm_clk to begin strobing the PDM clock to the PDM microphones.

In an SDR configuration, Clock A is also the capture clock. In a DDR configuration, Clock B is the capture clock. In either case, the capture clock is also started, causing pdm_res->p_pdm_mics to begin storing PDM samples received on each period of the capture clock.

In DDR configuration, this function starts Clock B, waits for a rising edge, and then starts Clock A, ensuring that the rising edges of the two clocks are not in phase.

This function must be called prior to launching the decimator or PDM rx threads.

Warning

Once this function has been called, the port receiving PDM data will begin capturing samples. If the mic array unit is not started by the time the port buffer fills (after (32/mic_count) sample times), samples will begin to be dropped.

Parameters:
  • pdm_res – The hardware resources used by the mic array.
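To get a feel for the time budget in the warning above, the sketch below converts (32/mic_count) sample times into nanoseconds for a given PDM clock. demo_port_fill_time_ns() is a hypothetical helper, and it assumes one sample time equals one PDM clock period.

```c
#include <assert.h>
#include <stdint.h>

/* Time until a 32-bit port buffer fills, in whole nanoseconds (rounded down). */
static uint64_t demo_port_fill_time_ns(unsigned mic_count, unsigned pdm_clock_hz)
{
  assert(mic_count != 0 && mic_count <= 32u);
  assert(pdm_clock_hz != 0);
  const uint64_t sample_times = 32u / mic_count;
  return (sample_times * UINT64_C(1000000000)) / pdm_clock_hz;
}
```

At a 3.072 MHz PDM clock this works out to roughly 10.4 microseconds for one microphone and 5.2 microseconds for two, so the mic array unit should be started promptly after the clocks.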

static inline unsigned mic_array_mclk_divider(const unsigned master_clock_freq, const unsigned pdm_clock_freq)

Compute clock divider for PDM clock.

This is a convenience function which computes the required clock divider to derive a pdm_clock_freq Hz clock from a master_clock_freq Hz clock. This function is simple integer division.

Parameters:
  • master_clock_freq – The master audio clock frequency in Hz.

  • pdm_clock_freq – The desired PDM clock frequency in Hz.

Returns:

Required clock divider.
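Because the computation is described as simple integer division, a ratio that is not a whole number is truncated. The sketch below uses hypothetical stand-ins (demo_mclk_divider() and demo_divider_is_exact(), not the library function) to show the calculation and a check an application might add.

```c
#include <assert.h>

/* Stand-in for the divider computation: plain integer division. */
static unsigned demo_mclk_divider(unsigned master_clock_freq,
                                  unsigned pdm_clock_freq)
{
  assert(pdm_clock_freq != 0);
  return master_clock_freq / pdm_clock_freq;
}

/* Returns non-zero when the requested PDM clock divides the master clock exactly. */
static int demo_divider_is_exact(unsigned master_clock_freq,
                                 unsigned pdm_clock_freq)
{
  assert(pdm_clock_freq != 0);
  return (master_clock_freq % pdm_clock_freq) == 0;
}
```

A 24.576 MHz master clock and a 3.072 MHz PDM clock give a divider of exactly 8. Requesting 3.0 MHz also yields 8, so the actual PDM clock would still be 3.072 MHz.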

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$API Reference$$$C API Reference$$$frame_transfer.h£££modules/io/modules/mic_array/doc/rst/src/reference/c/frame_transfer.html#frame-transfer-h
void ma_frame_tx(const chanend_t c_frame_out, const int32_t frame[], const unsigned channel_count, const unsigned sample_count)

Transmit 32-bit PCM frame over a channel.

This function transmits the 32-bit PCM frame frame[] over the channel c_frame_out.

This is a blocking call which will wait for a receiver to accept the data from the channel. Typically this will be accomplished with a call to ma_frame_rx() or ma_frame_rx_transpose().

The receiver is not required to be on the same tile as the sender.

Note

Internally, a channel transaction is established to reduce the overhead of channel communication. If any custom functions are used to receive this frame in an application, they must wrap the channel reads in a (slave) channel transaction. See xcore/channel_transaction.h.

Warning

No protocol is used to ensure consistency between the frame layout of the transmitter and receiver. Disagreement about frame size will likely cause one side to block indefinitely. It is the responsibility of the application author to ensure consistency between transmitter and receiver.

Parameters:
  • c_frame_out – Channel over which to send frame.

  • frame – Frame to be transmitted.

  • channel_count – Number of channels represented in the frame.

  • sample_count – Number of samples represented in the frame.

void ma_frame_rx(int32_t frame[], const chanend_t c_frame_in, const unsigned channel_count, const unsigned sample_count)

Receive 32-bit PCM frame over a channel.

This function receives a PCM frame over c_frame_in. Normally, the frame will have been transmitted using ma_frame_tx(). The received frame is stored in frame[].

This is a blocking call which does not return until the frame has been fully received.

The sender is not required to be on the same tile as the receiver.

Note

Internally, a channel transaction is established to reduce the overhead of channel communication. This function may only be used to receive the frame if the transmitter has wrapped the channel writes in a (master) channel transaction. See xcore/channel_transaction.h.

Warning

No protocol is used to ensure consistency between the frame layout of the transmitter and receiver. Disagreement about frame size will likely cause one side to block indefinitely. It is the responsibility of the application author to ensure consistency between transmitter and receiver.

Parameters:
  • frame – Buffer to store received frame.

  • c_frame_in – Channel from which to receive frame.

  • channel_count – Number of channels represented in the frame.

  • sample_count – Number of samples represented in the frame.

void ma_frame_rx_transpose(int32_t frame[], const chanend_t c_frame_in, const unsigned channel_count, const unsigned sample_count)

Receive 32-bit PCM frame over a channel with transposed dimensions.

This function receives a PCM frame over c_frame_in. Normally, the frame will have been transmitted using ma_frame_tx(). The received frame is stored in frame[].

Unlike ma_frame_rx(), this function reorders the frame elements as they are received. ma_frame_tx() always transmits the frame elements in memory order. This function swaps the channel and sample axes so that if the transmitter frame has shape (CHANNEL, SAMPLE), the caller’s frame array will have shape (SAMPLE, CHANNEL).

This is a blocking call which does not return until the frame has been fully received.

The sender is not required to be on the same tile as the receiver.

Note

Internally, a channel transaction is established to reduce the overhead of channel communication. This function may only be used to receive the frame if the transmitter has wrapped the channel writes in a (master) channel transaction. See xcore/channel_transaction.h.

Warning

No protocol is used to ensure consistency between the frame layout of the transmitter and receiver. Disagreement about frame size will likely cause one side to block indefinitely. It is the responsibility of the application author to ensure consistency between transmitter and receiver.

Parameters:
  • frame – Buffer to store received frame.

  • c_frame_in – Channel from which to receive frame.

  • channel_count – Number of channels represented in the frame.

  • sample_count – Number of samples represented in the frame.
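The axis swap can be illustrated without any channel hardware. In the sketch below a plain array stands in for the word stream sent in memory order by the transmitter, and demo_rx_transpose() (a hypothetical name) stores each word at the transposed position. This models only the index mapping, not the channel transaction.

```c
#include <assert.h>
#include <stdint.h>

/* stream[] holds a (CHANNEL, SAMPLE) frame in memory order;
   frame_out[] receives it with shape (SAMPLE, CHANNEL). */
static void demo_rx_transpose(int32_t frame_out[], const int32_t stream[],
                              unsigned channel_count, unsigned sample_count)
{
  unsigned i = 0;
  for (unsigned ch = 0; ch < channel_count; ch++) {
    for (unsigned smp = 0; smp < sample_count; smp++) {
      frame_out[smp * channel_count + ch] = stream[i++];
    }
  }
}
```

For a transmitter frame {{1,2,3},{4,5,6}} with 2 channels and 3 samples, the receiver's array becomes {{1,4},{2,5},{3,6}}.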

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$API Reference$$$C API Reference$$$dc_elimination.h£££modules/io/modules/mic_array/doc/rst/src/reference/c/dc_elimination.html#dc-elimination-h
struct dcoe_chan_state_t

DC Offset Elimination (DCOE) State.

This is the required state information for a single channel to which the DC offset elimination filter is to be applied.

To apply the DC offset elimination filter to multiple channels simultaneously, an array of dcoe_chan_state_t should be used.

dcoe_state_init() is used once to initialize an array of state objects, and dcoe_filter() is used on each consecutive sample to apply the filter and get the resulting output sample.

DC offset elimination is an IIR filter. The state must persist between time steps.

Use in lib_mic_array

Typical users of lib_mic_array will not need to directly use this type or any functions which take it as a parameter.

The C++ class template mic_array::DcoeSampleFilter, if used in an application’s mic array unit, will allocate, initialize and apply the DCOE filter automatically.

With MicArray Prefabs

The MicArray prefab mic_array::prefab::BasicMicArray has a bool template parameter USE_DCOE which indicates whether the mic_array::DcoeSampleFilter should be used. If true, DCOE will be enabled.

With Vanilla API

When using the ‘vanilla’ API, DCOE is enabled by default. To disable DCOE when using this API, add a preprocessor definition to the compiler flags, setting MIC_ARRAY_CONFIG_USE_DC_ELIMINATION to 0.

Public Members

int64_t prev_y

Previous output sample value.

void dcoe_state_init(dcoe_chan_state_t state[], const unsigned chan_count)

Initialize DCOE states.

The DC offset elimination state needs to be initialized before the filter can be applied. This function initializes it.

For correct behavior, the state vector state must persist between audio samples and is supplied with each call to dcoe_filter().

Parameters:
  • state [in] – Array of dcoe_chan_state_t to be initialized.

  • chan_count [in] – Number of elements in state.

void dcoe_filter(int32_t new_output[], dcoe_chan_state_t state[], int32_t new_input[], const unsigned chan_count)

Apply DCOE filter.

Applies the DC offset elimination filter to get a new output sample and updates the filter state.

For correct behavior, this function should be called once per sample of the audio stream being filtered (here “sample” refers to a vector-valued quantity containing one element for each audio channel).

The index of each array (state, new_input and new_output) corresponds to the audio channel. The update associated with each audio channel is independent of each other audio channel.

The equation used for each channel is:

y[t] = R * y[t-1] + x[t] - x[t-1]

where t is the current sample time index, y[] is the output signal, x[] is the input signal, and R is (252.0/256).

To filter a sample in-place use the same array for both the new_input and new_output arguments.

Parameters:
  • new_output [out] – Array into which the output sample will be placed.

  • state [in] – DC offset elimination state vector.

  • new_input [in] – New input sample.

  • chan_count [in] – Number of channels to be processed.
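The filter equation can be tried out with a floating-point reference model. The sketch below is not the library's fixed-point implementation: demo_dcoe_state_t keeps both the previous input and previous output explicitly (the library state shown above holds only prev_y), and all demo_ names are hypothetical.

```c
#include <assert.h>

typedef struct {
  double prev_x; /* x[t-1] */
  double prev_y; /* y[t-1] */
} demo_dcoe_state_t;

static void demo_dcoe_init(demo_dcoe_state_t state[], unsigned chan_count)
{
  for (unsigned ch = 0; ch < chan_count; ch++) {
    state[ch].prev_x = 0.0;
    state[ch].prev_y = 0.0;
  }
}

/* y[t] = R * y[t-1] + x[t] - x[t-1], applied independently per channel. */
static void demo_dcoe_filter(double new_output[], demo_dcoe_state_t state[],
                             const double new_input[], unsigned chan_count)
{
  const double R = 252.0 / 256.0;
  assert(chan_count > 0);
  for (unsigned ch = 0; ch < chan_count; ch++) {
    const double x = new_input[ch]; /* read first so in-place use is safe */
    const double y = R * state[ch].prev_y + x - state[ch].prev_x;
    state[ch].prev_x = x;
    state[ch].prev_y = y;
    new_output[ch] = y;
  }
}
```

Feeding a constant (pure DC) input shows the intended behavior: the first output equals the input step, and subsequent outputs decay toward zero by a factor of R per sample.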

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$API Reference$$$C API Reference$$$util.h£££modules/io/modules/mic_array/doc/rst/src/reference/c/util.html#util-h
void deinterleave2(uint32_t*)

Perform deinterleaving for a 2-microphone subblock.

Assembly function.

Deinterleave the samples for 1 subblock of 2 microphones. Argument points to a 2 word buffer.

void deinterleave4(uint32_t*)

Perform deinterleaving for a 4-microphone subblock.

Assembly function.

Deinterleave the samples for 1 subblock of 4 microphones. Argument points to a 4 word buffer.

void deinterleave8(uint32_t*)

Perform deinterleaving for an 8-microphone subblock.

Assembly function.

Deinterleave the samples for 1 subblock of 8 microphones. Argument points to an 8 word buffer.

void deinterleave16(uint32_t*)

Perform deinterleaving for a 16-microphone subblock.

Assembly function.

Deinterleave the samples for 1 subblock of 16 microphones. Argument points to a 16 word buffer.

XCORE ® -VOICE Solutions$$$lib_mic_array: PDM microphone array library$$$API Reference$$$C API Reference$$$mic_array_vanilla.h£££modules/io/modules/mic_array/doc/rst/src/reference/c/mic_array_vanilla.html#mic-array-vanilla-h
void ma_vanilla_init()

Initializes the mic array module. (Vanilla API only)

Initializes the contexts for the decimator thread and configures the clocks and ports for PDM reception.

After calling this, the PDM clock is active and signaling, but the PDM rx service (ISR) has not yet been activated, so received PDM samples are ignored. The real-time condition is not yet active.

Parameters:
  • pdm_res – Hardware resources required by the mic array module.

void ma_vanilla_task(chanend_t c_frames_out)

Entry point for decimator thread and PDM rx. (Vanilla API only)

This function sets up and activates the PDM rx service in ISR mode, and then immediately begins executing the decimator.

After calling this the real-time condition is active, meaning there must be another thread waiting to pull frames from the other end of c_frames_out as they become available.

Parameters:
  • c_frames_out – (Non-streaming) Channel over which to send processed frames of audio.

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/lib_xcore_math.html#lib-xcore-math-xcore-optimised-math

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$Introduction£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/introduction.html#introduction

lib_xcore_math is a library of optimised math functions for taking advantage of the vector processing unit (VPU) of the XMOS XS3 architecture (i.e. xcore.ai). Included in the library are functions for block floating-point arithmetic, fast Fourier transforms, linear algebra, discrete cosine transforms, linear filtering and more.

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$Introduction$$$Repository structure£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/introduction.html#repository-structure

  • /lib_xcore_math/

    • api/ - Headers containing the public API.

    • script/ - Scripts used for source generation.

    • src/ - Library source code.

  • /doc/ - documentation source.

  • /examples/ - Example applications.

  • /tests/ - Unit test projects.

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$Introduction$$$API structure£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/introduction.html#api-structure

This library is organised around several sub-APIs. These APIs collect the provided operations into coherent groups based on the kind of operation or the types of object being acted upon.

The current APIs are:

  • Block Floating-Point Vector API

  • Vector/Array API

  • Scalar API

  • Linear Filtering API

  • Fast Fourier Transform API

  • Discrete Cosine Transform API

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$Introduction$$$Using lib_xcore_math£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/introduction.html#using-lib-xcore-math

lib_xcore_math is intended to be used with XCommon CMake, the XMOS application build and dependency management system.

lib_xcore_math can be compiled for both x86 platforms and XS3 based processors.

On x86 platforms you can develop DSP algorithms and test them for functional correctness; this is an optional step before porting the library to an xcore device.

To use this module, include lib_xcore_math in the application’s APP_DEPENDENT_MODULES list and include the xcore_math.h header file.

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$Getting Started£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/getting_started.html#getting-started

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$Getting Started$$$Overview£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/getting_started.html#overview

lib_xcore_math is a library containing efficient implementations of various mathematical operations that may be required in an embedded application. In particular, this library is geared towards operations which work on vectors or arrays of data, including vectorized arithmetic, linear filtering, and fast Fourier transforms.

This library comprises several sub-APIs. Grouping of operations into sub-APIs is a matter of conceptual convenience. In general, functions from a given API share a common prefix indicating which API the function comes from, or the type of object on which it acts. Additionally, there is some interdependence between these APIs.

These APIs are:

  • Block floating-point (BFP) API – High-level API providing operations on BFP vectors. See Block Floating-Point background for an introduction to block floating-point. These functions manage the exponents and headroom of input and output BFP vectors to avoid overflow and underflow conditions.

  • Vector/Array API – Lower-level API which is used heavily by the BFP API. As such, the operations available in this API are similar to those in the BFP API, but the user will have to manage exponents and headroom on their own. Many of these routines are implemented directly in optimized assembly to use the hardware as efficiently as possible.

  • Scalar API – Provides various operations on scalar objects. In particular, these operations focus on simple arithmetic operations applied to non-IEEE 754 floating-point objects, as well as optimized operations which are applied to IEEE 754 floats.

  • Filtering API – Provides access to linear filtering operations, including 16- and 32-bit FIR filters and 32-bit biquad filters.

  • Fast Fourier Transform (FFT) API – Provides both low-level and block floating-point FFT implementations. Optimized FFT implementations are provided for real signals, pairs of real signals, and for complex signals.

  • Discrete Cosine Transform (DCT) API – Provides functions which implement the type-II (‘forward’) and type-III (‘inverse’) DCT for a variety of block lengths. Also provides a fast 8x8 two dimensional forward and inverse DCT.

All APIs are accessed by including the single header file:

#include "xcore_math.h"

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$Getting Started$$$Usage£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/getting_started.html#usage

The following sections are intended to give the reader a general sense of how to use the API.

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$Getting Started$$$Usage$$$BFP API£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/getting_started.html#bfp-api

In the BFP API the BFP vectors are C structures such as bfp_s16_t, bfp_s32_t, or bfp_complex_s32_t, backed by a memory buffer. These objects contain a pointer to the data carrying the content (mantissas) of the vector, as well as information about the length, headroom and exponent of the BFP vector.

Below is the definition of bfp_s32_t from xmath/types.h.

C_TYPE
typedef struct {
    /** Pointer to the underlying element buffer.*/
    int32_t* data;
    /** Exponent associated with the vector. */
    exponent_t exp;
    /** Current headroom in the ``data[]`` */
    headroom_t hr;
    /** Current size of ``data[]``, expressed in elements */
    unsigned length;
    /** BFP vector flags. Users should not normally modify these manually. */
    bfp_flags_e flags;
} bfp_s32_t;

The 32-bit BFP functions take bfp_s32_t pointers as input and output parameters.

Functions in the BFP API generally are prefixed with bfp_. More specifically, functions where the ‘main’ operands are 32-bit BFP vectors are prefixed with bfp_s32_, whereas functions where the ‘main’ operands are complex 16-bit BFP vectors are prefixed with bfp_complex_s16_, and so on for the other BFP vector types.

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$Getting Started$$$Usage$$$BFP API$$$Initializing BFP Vectors£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/getting_started.html#initializing-bfp-vectors

Before calling these functions, the BFP vectors represented by the arguments must be initialized. For bfp_s32_t this is accomplished with bfp_s32_init(). Initialization requires that a buffer of sufficient size be provided to store the mantissa vector, as well as an initial exponent. If the first usage of a BFP vector is as an output, then the exponent will not matter, but the object must still be initialized before use. Additionally, the headroom of the vector may be computed upon initialization; otherwise it is set to 0.

Here is an example of a 32-bit BFP vector being initialized.

#define LEN (20)

//The object representing the BFP vector
bfp_s32_t bfp_vect;

// buffer backing bfp_vect
int32_t data_buffer[LEN];
for(int i = 0; i < LEN; i++) data_buffer[i] = i;

// The initial exponent associated with bfp_vect
exponent_t initial_exponent = 0;

// If non-zero, `bfp_s32_init()` will compute headroom currently present in data_buffer.
// Otherwise, headroom is initialized to 0 (which is always safe but may not be optimal)
unsigned calculate_headroom = 1;

// Initialize the vector object
bfp_s32_init(&bfp_vect, data_buffer, initial_exponent, LEN, calculate_headroom);

// Go do stuff with bfp_vect
...

Once initialized, the exponent and mantissas of the vector can be accessed by bfp_vect.exp and bfp_vect.data[] respectively, with the logical (floating-point) value of element k being given by \(\mathtt{bfp\_vect.data[k]}\cdot2^{\mathtt{bfp\_vect.exp}}\).
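
As a rough illustration of that formula, the helper below converts one element of a BFP vector to a double. This is a sketch only: demo_scale_pow2() and demo_bfp_element() are hypothetical names that are not part of lib_xcore_math, and the mantissa buffer and exponent are passed as plain values so that the snippet does not depend on xcore_math.h.

```c
#include <stdint.h>

// Hypothetical helper (not part of lib_xcore_math): returns x * 2^p.
// ldexp() from <math.h> computes the same thing.
double demo_scale_pow2(double x, int p)
{
    for (; p > 0; p--) x *= 2.0;
    for (; p < 0; p++) x *= 0.5;
    return x;
}

// Hypothetical helper: logical value of element k, i.e. data[k] * 2^exp.
double demo_bfp_element(const int32_t data[], int exp, unsigned k)
{
    return demo_scale_pow2((double) data[k], exp);
}
```

With an initialized vector, this would be called as demo_bfp_element(bfp_vect.data, bfp_vect.exp, k).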

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$Getting Started$$$Usage$$$BFP API$$$BFP Arithmetic Functions£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/getting_started.html#bfp-arithmetic-functions

The following snippet shows a function foo() which takes 3 BFP vectors, a, b and c, as arguments. It multiplies together a and b element-wise, and then subtracts c from the product. In this example both operations are performed in-place on a. (See bfp_s32_mul() and bfp_s32_sub() for more information about those functions)

void foo(bfp_s32_t* a, const bfp_s32_t* b, const bfp_s32_t* c)
{
    // Multiply together a and b, updating a with the result.
    bfp_s32_mul(a, a, b);

    // Subtract c from the product, again updating a with the result.
    bfp_s32_sub(a, a, c);
}

The caller of foo() can then access the results through a. Note that the pointer a->data was not modified during this call.

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$Getting Started$$$Usage$$$Vector API£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/getting_started.html#vector-api

The functions in the lower-level vector API are optimized for performance. They do very little to protect the user from mangling their data by arithmetic saturation/overflows or underflows (although they do provide the means to prevent this).

Functions in the vector API are generally prefixed with vect_. For example, functions which operate primarily on 16-bit vectors are prefixed with vect_s16_.

Some functions are prefixed with chunk_ instead of vect_. A “chunk” is just a vector with a fixed memory footprint (currently 32 bytes, or 8 32-bit elements) meant to match the width of the architecture’s vector registers.

As an example of a function from the vector API, see vect_s32_mul() (from vect_s32.h), which multiplies together two int32_t vectors element by element.

C_API
headroom_t vect_s32_mul(
    int32_t a[],
    const int32_t b[],
    const int32_t c[],
    const unsigned length,
    const right_shift_t b_shr,
    const right_shift_t c_shr);

This function takes two int32_t arrays, b and c, as inputs and one int32_t array, a, as output (in the case of vect_s32_mul(), it is safe to have a point to the same buffer as b or c, computing the result in-place). length indicates the number of elements in each array. The final two parameters, b_shr and c_shr, are the arithmetic right-shifts applied to each element of b and c before they are multiplied together.

Why the right-shifts? In the case of 32-bit multiplication, the largest possible product is \(2^{62}\), which will not fit in the 32-bit output vector. Applying positive arithmetic right-shifts to the input vectors reduces the largest possible product. So, the shifts are there to manage the headroom/size of the resulting product in order to maximize precision while avoiding overflow or saturation.

Contrast this with vect_s16_mul():

C_API
headroom_t vect_s16_mul(
    int16_t a[],
    const int16_t b[],
    const int16_t c[],
    const unsigned length,
    const right_shift_t a_shr);

The parameters are similar here, but instead of b_shr and c_shr, there is only an a_shr. Here the arithmetic right-shift a_shr is applied to the products of b and c. The shift is also treated as unsigned, so it can only be used to reduce the size of the product.

Shifts like those in these two examples are very common in the vector API, as they are the main mechanism for managing exponents and headroom. Whether the shifts are applied to inputs, outputs, both, or only one input will depend on a number of factors. In the case of vect_s32_mul() they are applied to inputs because the XS3 VPU includes a compulsory (hardware) right-shift of 30 bits on all products of 32-bit numbers, and so often inputs may need to be left-shifted (negative shift) in order to avoid underflows. In the case of vect_s16_mul(), this is unnecessary because no compulsory shift is included in 16-bit multiply-accumulates.
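
The effect of these shifts can be seen in a simplified scalar model of a single element. The sketch below is not the library's implementation: the demo_ functions are hypothetical, rounding and saturation are deliberately ignored, and it assumes the compiler implements >> on negative values as an arithmetic shift.

```c
#include <stdint.h>

// Hypothetical scalar model (not library code) of a signed shift:
// a positive shr shifts right, a negative shr shifts left.
// Assumes >> on negative values is an arithmetic shift and that the
// caller stays within the available headroom.
int32_t demo_shr_s32(int32_t x, int shr)
{
    if (shr >= 0) return x >> shr;
    return (int32_t) ((int64_t) x * ((int64_t) 1 << -shr));
}

// Model of one element of a 32-bit multiply: shift the inputs, multiply,
// then apply the fixed 30-bit right-shift described above.
int32_t demo_mul_s32(int32_t b, int32_t c, int b_shr, int c_shr)
{
    int64_t p = (int64_t) demo_shr_s32(b, b_shr) * demo_shr_s32(c, c_shr);
    return (int32_t) (p >> 30);
}

// Model of one element of a 16-bit multiply: multiply, then shift the
// product right by a_shr (non-negative only).
int16_t demo_mul_s16(int16_t b, int16_t c, unsigned a_shr)
{
    int32_t p = (int32_t) b * c;
    return (int16_t) (p >> a_shr);
}
```

In this model, multiplying two mantissas of 1000 with both shifts set to 0 underflows to 0, because the product is smaller than \(2^{30}\). Left-shifting each input by 15 bits (b_shr = c_shr = -15) yields 1000000 instead.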

Both vect_s32_mul() and vect_s16_mul() return the headroom of the output vector a.

Functions in the vector API are in many cases closely tied to the instruction set architecture for XS3. As such, if more efficient algorithms are found to perform an operation these low-level API functions are more likely to change in future versions.

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$Block Floating-Point background£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/bfp_background.html#block-floating-point-background

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$Block Floating-Point background$$$Block Floating-Point vectors£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/bfp_background.html#block-floating-point-vectors

A standard (IEEE) floating-point object can exist either as a scalar, e.g.

//Single IEEE floating-point variable
float foo;

or as a vector, e.g.

//Array of IEEE floating-point variables
float foo[20];

Standard floating-point values carry both a mantissa \(m\) and an exponent \(p\), such that the logical value represented by such a variable is \(m\cdot2^p\). When you have a vector of standard floating-point values, each element of the vector carries its own mantissa and its own exponent: \(m[k]\cdot2^{p[k]}\).

../_images/bfp_bg_fig1.png

By contrast, block floating-point objects have a vector of mantissas \(\bar{m}\) which all share the same exponent \(p\), such that the logical value of the element at index \(k\) is \(m[k]\cdot2^p\).

struct {
    // Array of mantissas
    int32_t mant[20];
    // Shared exponent
    int32_t exp;
} bfp_vect;
../_images/bfp_bg_fig2.png

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$Block Floating-Point background$$$Headroom£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/bfp_background.html#headroom

With a given exponent, \(p\), the largest value that can be represented by a 32-bit BFP vector is given by a maximal mantissa (\(2^{31}-1\)), for a logical value of \((2^{31}-1)\cdot2^p\). The smallest non-zero value that an element can represent is \(1\cdot2^p\).

Because all elements must share a single exponent, in order to avoid overflow or saturation of the largest magnitude values, the exponent of a BFP vector is constrained by the element with the largest (logical) value. The drawback to this is that when the elements of a BFP vector represent a large dynamic range – that is, where the largest magnitude element is many, many times larger than the smallest (non-zero) magnitude element – the smaller magnitude elements effectively have fewer bits of precision.

Consider a 2-element BFP vector intended to carry the values \(2^{20}\) and \(255 \cdot 2^{-10}\). One way this vector can be represented is to use an exponent of \(0\).

struct {
    int32_t mant[2];
    int32_t exp;
} vect = { { (1<<20), (0xFF >> 10) }, 0 };
../_images/bfp_bg_fig3.png

In the diagram above, the fractional bits (shown in red text) are discarded, as the mantissa is only 32 bits. Then, with \(0\) as the exponent, mant[1] underflows to \(0\). Meanwhile, the 11 most significant bits of mant[0] are all zeros.

The headroom of a signed integer is the number of redundant leading sign bits. Equivalently, it is the number of bits that a mantissa can be left-shifted without losing any information. In the diagram, the bits corresponding to headroom are shown in green text. Here mant[0] has 10 bits of headroom and mant[1] has a full 32 bits of headroom. (mant[0] does not have 11 bits of headroom because in two’s complement the MSb serves as a sign bit). The headroom for a BFP vector is the minimum of headroom amongst each of its elements; in this case, 10 bits.

If we remove headroom from one mantissa of a BFP vector, all other mantissas must shift by the same number of bits, and the vector’s exponent must be adjusted accordingly. A left-shift of one bit corresponds to reducing the exponent by 1, because a single bit left-shift corresponds to multiplication by 2.
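
The definition of headroom can be sketched in portable C as follows. These helpers are hypothetical and are not how the library computes headroom. A zero mantissa is reported as having a full 32 bits, following the convention used in the text above.

```c
#include <stdint.h>

// Hypothetical helper (not the library's implementation): number of
// redundant leading sign bits in x. Zero is reported as a full 32 bits.
unsigned demo_headroom_s32(int32_t x)
{
    if (x == 0) return 32;
    // Flip negative values so that only leading zeros need counting.
    uint32_t u = (x < 0) ? ~(uint32_t) x : (uint32_t) x;
    unsigned hr = 0;
    // Start just below the sign bit (bit 31) and stop at the first set bit.
    for (int bit = 30; bit >= 0 && ((u >> bit) & 1u) == 0; bit--) hr++;
    return hr;
}

// Headroom of a vector: the minimum headroom over its elements.
unsigned demo_vect_headroom_s32(const int32_t x[], unsigned length)
{
    unsigned hr = 32;
    for (unsigned k = 0; k < length; k++) {
        unsigned h = demo_headroom_s32(x[k]);
        if (h < hr) hr = h;
    }
    return hr;
}
```

For the two-element vector discussed above, with mantissas (1<<20) and 0, the vector headroom computed this way is 10.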

In this case, if we remove 10 bits of headroom and subtract 10 from the exponent we get the following:

struct {
    int32_t mant[2];
    int32_t exp;
} vect = { { (1<<30), (0xFF >> 0) }, -10 };
../_images/bfp_bg_fig4.png

Now, no information is lost in either element. One of the main goals of BFP arithmetic is to keep the headroom in BFP vectors to the minimum necessary (equivalently, keeping the exponent as small as possible). That allows for maximum effective precision of the elements in the vector.
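
The two representations above can be reproduced with a small quantization sketch. The function demo_mantissa() is hypothetical. It assumes non-negative inputs, an exponent magnitude below 63, and a result that fits in 32 bits.

```c
#include <stdint.h>

// Hypothetical helper: mantissa of `value` for a given shared exponent,
// i.e. value * 2^(-exp) with the fractional bits discarded.
int32_t demo_mantissa(double value, int exp)
{
    double scaled = (exp <= 0)
        ? value * (double) ((uint64_t) 1 << -exp)
        : value / (double) ((uint64_t) 1 << exp);
    return (int32_t) scaled;  // conversion truncates toward zero
}
```

With an exponent of 0, the values \(2^{20}\) and \(255 \cdot 2^{-10}\) quantize to the mantissas (1<<20) and 0. With an exponent of -10 they quantize to (1<<30) and 0xFF, so nothing is lost.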

Note that the headroom of a vector also tells you something about the size of the largest magnitude mantissa in the vector. That information (in conjunction with exponents) can be used to determine the largest possible output of an operation without having to look at the mantissas.

For this reason, the BFP vectors in lib_xcore_math carry a field which tracks their current headroom. The functions in the BFP API use this property to make determinations about how best to preserve precision.

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/reference_index.html#api-reference

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$XMath Types£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/types.html#xmath-types

Each of the main operand types used in this library has a short-hand which is used as a prefix in the naming of API operations. The following tables can be used for reference.

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$XMath Types$$$Common Vector Types£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/types.html#common-vector-types

The following table indicates the types and abbreviations associated with various common vector types.

Common Vector Types

| Prefix | Object Type | Notes |
|---|---|---|
| vect_s32 | int32_t[] | Raw vector of signed 32-bit integers. |
| vect_s16 | int16_t[] | Raw vector of signed 16-bit integers. |
| vect_s8 | int8_t[] | Raw vector of signed 8-bit integers. |
| vect_complex_s32 | complex_s32_t[] | Raw vector of complex 32-bit integers. |
| vect_complex_s16 | (int16_t[], int16_t[]) | Complex 16-bit vectors are usually represented as a pair of 16-bit vectors. This is an optimization due to the word-alignment requirement when loading data into the VPU’s vector registers. |
| chunk_s32 | int32_t[8] | A ‘chunk’ is a fixed size vector corresponding to the size of the VPU vector registers. |
| vect_qXX | int32_t[] | When used in an API function name, the XX will be an actual number (e.g. vect_q30_exp_small()) indicating the fixed-point interpretation used by that function. |
| vect_f32 | float[] | Raw vector of standard IEEE float. |
| vect_float_s32 | float_s32_t[] | Vector of non-standard 32-bit floating-point scalars. |
| bfp_s32 | bfp_s32_t | Block floating-point vector containing 32-bit mantissas. |
| bfp_s16 | bfp_s16_t | Block floating-point vector containing 16-bit mantissas. |
| bfp_complex_s32 | bfp_complex_s32_t | Block floating-point vector containing complex 32-bit mantissas. |
| bfp_complex_s16 | bfp_complex_s16_t | Block floating-point vector containing complex 16-bit mantissas. |

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$XMath Types$$$Common Scalar Types£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/types.html#common-scalar-types

The following table indicates the types and abbreviations associated with various common scalar types.

Common Scalar Types

| Prefix | Object Type | Notes |
|---|---|---|
| s32 | int32_t | 32-bit signed integer. May be a simple integer, a fixed-point value or the mantissa of a floating-point value. |
| s16 | int16_t | 16-bit signed integer. May be a simple integer, a fixed-point value or the mantissa of a floating-point value. |
| s8 | int8_t | 8-bit signed integer. May be a simple integer, a fixed-point value or the mantissa of a floating-point value. |
| complex_s32 | complex_s32_t | Signed complex integer with 32-bit real and 32-bit imaginary parts. |
| complex_s16 | complex_s16_t | Signed complex integer with 16-bit real and 16-bit imaginary parts. |
| float_s64 | float_s64_t | Non-standard floating-point scalar with exponent and 64-bit mantissa. |
| float_s32 | float_s32_t | Non-standard floating-point scalar with exponent and 32-bit mantissa. |
| qXX | int32_t | 32-bit fixed-point value with XX fractional bits (i.e. exponent of -XX). |
| f32 | float | Standard IEEE 754 single-precision float. |
| f64 | double | Standard IEEE 754 double-precision float. |
| float_complex_s64 | float_complex_s64_t | Floating-point value with exponent and complex mantissa with 64-bit real and imaginary parts. |
| float_complex_s32 | float_complex_s32_t | Floating-point value with exponent and complex mantissa with 32-bit real and imaginary parts. |
| float_complex_s16 | float_complex_s16_t | Floating-point value with exponent and complex mantissa with 16-bit real and imaginary parts. |
| N/A | exponent_t | Represents an exponent \(p\) as in \(2^p\). Unless otherwise specified, exponents are always assumed to have a base of \(2\). |
| N/A | headroom_t | The headroom of a scalar or vector. See Headroom for more information. |
| N/A | right_shift_t | Represents a rightward bit-shift of a certain number of bits. Care should be taken, as sometimes this is treated as unsigned. |
| N/A | left_shift_t | Represents a leftward bit-shift of a certain number of bits. Care should be taken, as sometimes this is treated as unsigned. |

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$XMath Types$$$Block Floating-Point Types£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/types.html#block-floating-point-types
group type_bfp

Enums

enum bfp_flags_e

(Opaque) Flags field for BFP vectors.

Warning

Users should not manually modify fields of this type, as it is intended to be opaque.

Values:

enumerator BFP_FLAG_DYNAMIC

Indicates that BFP vector’s mantissa buffer(s) were allocated dynamically

This flag lets the bfp_*_dealloc() functions know whether the mantissa vector must be free()ed.

struct bfp_s32_t
#include <types.h>

A block floating-point vector of 32-bit elements.

Initialized with the bfp_s32_init() function.

The logical quantity represented by each element of this vector is: data[i]*2^(exp) where the multiplication and exponentiation are using real (non-modular) arithmetic.

The BFP API keeps the hr field up-to-date with the current headroom of data[] so as to minimize precision loss as elements become small.

Public Members

int32_t *data

Pointer to the underlying element buffer.

exponent_t exp

Exponent associated with the vector.

headroom_t hr

Current headroom in the data[]

unsigned length

Current size of data[], expressed in elements

bfp_flags_e flags

BFP vector flags. Users should not normally modify these manually.

struct bfp_s16_t
#include <types.h>

A block floating-point vector of 16-bit elements.

Initialized with the bfp_s16_init() function.

The logical quantity represented by each element of this vector is: data[i] * 2^(exp) where the multiplication and exponentiation are using real (non-modular) arithmetic.

The BFP API keeps the hr field up-to-date with the current headroom of data[] so as to minimize precision loss as elements become small.

Public Members

int16_t *data

Pointer to the underlying element buffer.

exponent_t exp

Exponent associated with the vector.

headroom_t hr

Current headroom in the data[]

unsigned length

Current size of data[], expressed in elements

bfp_flags_e flags

BFP vector flags. Users should not normally modify these manually.

struct bfp_complex_s32_t
#include <types.h>

A block floating-point vector of complex 32-bit elements.

Initialized with the bfp_complex_s32_init() function.

The logical quantity represented by each element of this vector is: data[k].re * 2^(exp) + i * data[k].im * 2^(exp) where the multiplication and exponentiation are using real (non-modular) arithmetic, and i is sqrt(-1)

The BFP API keeps the hr field up-to-date with the current headroom of data[] so as to minimize precision loss as elements become small.

Public Members

complex_s32_t *data

Pointer to the underlying element buffer.

exponent_t exp

Exponent associated with the vector.

headroom_t hr

Current headroom in the data[]

unsigned length

Current size of data[], expressed in elements

bfp_flags_e flags

BFP vector flags. Users should not normally modify these manually.

struct bfp_complex_s16_t
#include <types.h>

A block floating-point vector of complex 16-bit elements.

Initialized with the bfp_complex_s16_init() function.

The logical quantity represented by each element of this vector is: real[k] * 2^(exp) + i * imag[k] * 2^(exp) where the multiplication and exponentiation are using real (non-modular) arithmetic, and i is sqrt(-1)

The BFP API keeps the hr field up-to-date with the current headroom of real[] and imag[] so as to minimize precision loss as elements become small.

Public Members

int16_t *real

Pointer to the underlying buffer holding the real parts of the elements.

int16_t *imag

Pointer to the underlying buffer holding the imaginary parts of the elements.

exponent_t exp

Exponent associated with the vector.

headroom_t hr

Current headroom in real[] and imag[]

unsigned length

Current size of real[] and of imag[], expressed in elements

bfp_flags_e flags

BFP vector flags. Users should not normally modify these manually.

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$XMath Types$$$Scalar Types (Integer)£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/types.html#scalar-types-integer
group type_scalar_int

Typedefs

typedef int exponent_t

An exponent.

Many places in this API make use of integers representing the exponent associated with some floating-point value or block floating-point vector.

For a floating-point value \(x \cdot 2^p\), \(p\) is the exponent, and may usually be positive or negative.

typedef unsigned headroom_t

Headroom of some integer or integer array.

Represents the headroom of a signed or unsigned integer, complex integer or channel pair, or the headroom of the mantissa array of a block floating-point vector.

typedef int right_shift_t

A rightwards arithmetic bit-shift.

Represents a right bit-shift to be applied to an integer. May be signed or unsigned, depending on context. If signed, negative values represent leftward bit-shifts.

See also

left_shift_t

typedef int left_shift_t

A leftwards arithmetic bit-shift.

Represents a left bit-shift to be applied to an integer. May be signed or unsigned, depending on context. If signed, negative values represent rightward bit-shifts.

See also

right_shift_t

struct complex_s64_t
#include <types.h>

A complex number with a 64-bit real part and 64-bit imaginary part.

Public Members

int64_t re

Real Part.

int64_t im

Imaginary Part.

struct complex_s32_t
#include <types.h>

A complex number with a 32-bit real part and 32-bit imaginary part.

Public Members

int32_t re

Real Part.

int32_t im

Imaginary Part.

struct complex_s16_t
#include <types.h>

A complex number with a 16-bit real part and 16-bit imaginary part.

Public Members

int16_t re

Real Part.

int16_t im

Imaginary Part.

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$XMath Types$$$Scalar Types (Floating-Point)£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/types.html#scalar-types-floating-point
group type_scalar_float
struct float_s32_t
#include <types.h>

A floating-point scalar with a 32-bit mantissa.

Represents a (non-standard) floating-point value given by \( M \cdot 2^{x} \), where \(M\) is the 32-bit mantissa mant, and \(x\) is the exponent exp.

To convert a float_s32_t to a standard IEEE754 single-precision floating-point value (which may result in a loss of precision):

float to_ieee_float(float_s32_t x) {
    return ldexpf(x.mant, x.exp);
}

Public Members

int32_t mant

32-bit mantissa

exponent_t exp

exponent

struct float_s64_t
#include <types.h>

A floating-point scalar with a 64-bit mantissa.

Represents a (non-standard) floating-point value given by \( M \cdot 2^{x} \), where \(M\) is the 64-bit mantissa mant, and \(x\) is the exponent exp.

To convert a float_s64_t to a standard IEEE754 double-precision floating-point value (which may result in a loss of precision):

double to_ieee_float(float_s64_t x) {
    return ldexp(x.mant, x.exp);
}

Public Members

int64_t mant

64-bit mantissa

exponent_t exp

exponent

struct float_complex_s16_t
#include <types.h>

A complex floating-point scalar with a complex 16-bit mantissa.

Represents a (non-standard) complex floating-point value given by \( (A + j\cdot B) \cdot 2^{x} \), where \(A\) is mant.re, the 16-bit real part of the mantissa, \(B\) is mant.im, the 16-bit imaginary part of the mantissa, and \(x\) is the exponent exp.

Public Members

complex_s16_t mant

complex 16-bit mantissa

exponent_t exp

exponent

struct float_complex_s32_t
#include <types.h>

A complex floating-point scalar with a complex 32-bit mantissa.

Represents a (non-standard) complex floating-point value given by \( (A + j\cdot B) \cdot 2^{x} \), where \(A\) is mant.re, the 32-bit real part of the mantissa, \(B\) is mant.im, the 32-bit imaginary part of the mantissa, and \(x\) is the exponent exp.

Public Members

complex_s32_t mant

complex 32-bit mantissa

exponent_t exp

exponent

struct float_complex_s64_t
#include <types.h>

A complex floating-point scalar with a complex 64-bit mantissa.

Represents a (non-standard) complex floating-point value given by \( (A + j\cdot B) \cdot 2^{x} \), where \(A\) is mant.re, the 64-bit real part of the mantissa, \(B\) is mant.im, the 64-bit imaginary part of the mantissa, and \(x\) is the exponent exp.

Public Members

complex_s64_t mant

complex 64-bit mantissa

exponent_t exp

exponent

struct complex_float_t
#include <types.h>

A complex number with a single-precision floating-point real part and a single-precision floating-point imaginary part.

Public Members

float re

Real Part.

float im

Imaginary Part.

struct complex_double_t
#include <types.h>

A complex number with a double-precision floating-point real part and a double-precision floating-point imaginary part.

Public Members

double re

Real Part.

double im

Imaginary Part.

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$XMath Types$$$Scalar Types (Fixed-Point)£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/types.html#scalar-types-fixed-point
group type_scalar_fixed

Typedefs

typedef int32_t q1_31

Q1.31 (Signed) Fixed-point value.

Represents a signed, 32-bit, real, fixed-point value with 31 fractional bits (i.e. an implicit exponent of \(-31\)).

Capable of representing values in the range \(\left[-1.0, 1.0\right)\)

typedef int32_t q2_30

Q2.30 (Signed) Fixed-point value.

Represents a signed, 32-bit, real, fixed-point value with 30 fractional bits (i.e. an implicit exponent of \(-30\)).

Capable of representing values in the range \(\left[-2.0, 2.0\right)\)

typedef int32_t q4_28

Q4.28 (Signed) Fixed-point value.

Represents a signed, 32-bit, real, fixed-point value with 28 fractional bits (i.e. an implicit exponent of \(-28\)).

Capable of representing values in the range \(\left[-8.0, 8.0\right)\)

typedef int32_t q8_24

Q8.24 (Signed) Fixed-point value.

Represents a signed, 32-bit, real, fixed-point value with 24 fractional bits (i.e. an implicit exponent of \(-24\)).

Capable of representing values in the range \(\left[-128.0, 128.0\right)\)

typedef uint32_t uq0_32

UQ0.32 (Unsigned) Fixed-point value.

Represents an unsigned, 32-bit, real, fixed-point value with 32 fractional bits (i.e. an implicit exponent of \(-32\)).

Capable of representing values in the range \(\left[0, 1.0\right)\)

typedef uint32_t uq1_31

UQ1.31 (Unsigned) Fixed-point value.

Represents an unsigned, 32-bit, real, fixed-point value with 31 fractional bits (i.e. an implicit exponent of \(-31\)).

Capable of representing values in the range \(\left[0, 2.0\right)\)

typedef uint32_t uq2_30

UQ2.30 (Unsigned) Fixed-point value.

Represents an unsigned, 32-bit, real, fixed-point value with 30 fractional bits (i.e. an implicit exponent of \(-30\)).

Capable of representing values in the range \(\left[0, 4.0\right)\)

typedef uint32_t uq4_28

UQ4.28 (Unsigned) Fixed-point value.

Represents an unsigned, 32-bit, real, fixed-point value with 28 fractional bits (i.e. an implicit exponent of \(-28\)).

Capable of representing values in the range \(\left[0, 16.0\right)\)

typedef uint32_t uq8_24

UQ8.24 (Unsigned) Fixed-point value.

Represents an unsigned, 32-bit, real, fixed-point value with 24 fractional bits (i.e. an implicit exponent of \(-24\)).

Capable of representing values in the range \(\left[0, 256.0\right)\)
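
The relationship between these fixed-point types and their real values can be sketched as below. The demo_ helpers are hypothetical and are not part of the library. frac_bits is the number of fractional bits (for example 31 for Q1.31 or 24 for Q8.24) and is assumed to be at most 32.

```c
#include <stdint.h>

// Signed fixed-point to double: q * 2^(-frac_bits).
double demo_q_to_double(int32_t q, unsigned frac_bits)
{
    return (double) q / (double) ((uint64_t) 1 << frac_bits);
}

// Double to signed fixed-point, truncating toward zero. The caller must
// ensure that x lies within the range of the chosen format.
int32_t demo_double_to_q(double x, unsigned frac_bits)
{
    return (int32_t) (x * (double) ((uint64_t) 1 << frac_bits));
}

// Unsigned fixed-point to double: uq * 2^(-frac_bits).
double demo_uq_to_double(uint32_t uq, unsigned frac_bits)
{
    return (double) uq / (double) ((uint64_t) 1 << frac_bits);
}
```

For example, 0.5 in Q1.31 is the integer (1<<30), and the same integer interpreted as Q2.30 is 1.0.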

typedef q1_31 sbrad_t

Specialized angular unit used by this library.

‘sbrad’ is a kind of modified binary radian (hence ‘brad’) which takes into account the symmetries of \(\sin(\theta)\).

Use radians_to_sbrads() to convert from radians to sbrad_t.

typedef q8_24 radian_q24_t

Angle measurement in radians using a Q8.24 representation.

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$XMath Types$$$Misc Types£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/types.html#misc-types
group type_misc
struct split_acc_s32_t
#include <types.h>

Holds a set of sixteen 32-bit accumulators in the XS3 VPU’s internal format.

The XS3 VPU stores 32-bit accumulators with the most significant 16 bits stored in one 256-bit vector register (called vD), and the least significant 16 bits stored in another 256-bit register (called vR). This struct reflects that internal format, and is occasionally used to store intermediate results.

Note

vR is unsigned. This reflects the fact that a signed 32-bit integer 0xSTUVWXYZ is always exactly 0x0000WXYZ larger than 0xSTUV0000. To combine the upper and lower 16 bits of an accumulator, use (((int32_t)vD[k]) << 16) + vR[k].

Public Members

int16_t vD[16]

Most significant 16 bits of accumulators.

uint16_t vR[16]

Least significant 16 bits of accumulators.
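
The recombination described in the note can be sketched as follows. demo_split_acc_s32_t is a stand-in that mirrors the layout of split_acc_s32_t so the snippet compiles without xcore_math.h, and the two functions are hypothetical helpers, not library API.

```c
#include <stdint.h>

// Stand-in mirroring the layout of split_acc_s32_t.
typedef struct {
    int16_t  vD[16];   // most significant 16 bits (signed)
    uint16_t vR[16];   // least significant 16 bits (unsigned)
} demo_split_acc_s32_t;

// Recombine accumulator k into a single 32-bit value. Multiplying by 65536
// is used instead of << 16 because left-shifting a negative value is
// undefined behaviour in ISO C.
int32_t demo_merge_acc(const demo_split_acc_s32_t* acc, unsigned k)
{
    return (int32_t) acc->vD[k] * 65536 + (int32_t) acc->vR[k];
}

// Split a 32-bit value into its two halves and store them at index k.
void demo_split_acc(demo_split_acc_s32_t* acc, unsigned k, int32_t value)
{
    acc->vR[k] = (uint16_t) ((uint32_t) value & 0xFFFFu);
    acc->vD[k] = (int16_t) ((value - (int32_t) acc->vR[k]) / 65536);
}
```

Because vR is unsigned, the value -1 is stored as vD = -1 and vR = 0xFFFF, and merging gives -65536 + 65535 = -1.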

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Block Floating-Point API£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/bfp/bfp_index.html#block-floating-point-api

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Block Floating-Point API$$$BFP API quick reference£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/bfp/bfp_quickref.html#bfp-api-quick-reference

The tables below list the functions of the block floating-point API. The “EW” column indicates whether the operation acts element-wise.

The “Signature” column is a hint that conveys how many (conceptual) inputs and outputs the operation has, and their dimensionality.

The functions themselves will typically take more arguments than these signatures indicate. Check the function’s full documentation to get more detailed information.

The following symbols are used in the signatures:

| Symbol | Description |
|---|---|
| \(\mathbb{S}\) | A scalar input or output value. |
| \(\mathbb{V}\) | A vector-valued input or output. |
| \(\mathbb{M}\) | A matrix-valued input or output. |
| \(\varnothing\) | Placeholder indicating no input or output. |

For example, the operation signature \((\mathbb{V \times V \times S}) \to \mathbb{V}\) indicates the operation takes two vector inputs and a scalar input, and the output is a vector.

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Block Floating-Point API$$$BFP API quick reference$$$32-Bit BFP API quick reference£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/bfp/bfp_quickref.html#bit-bfp-api-quick-reference

32-Bit BFP API - quick reference

| Function | EW | Signature | Brief |
|---|---|---|---|
| bfp_s32_init() | | \((\mathbb{V \times S}) \to \mathbb{V}\) | Initialize (static) |
| bfp_s32_alloc() | | \(\varnothing \to \mathbb{V}\) | Initialize (dynamic) |
| bfp_s32_dealloc() | | \(\mathbb{V} \to \mathbb{\varnothing}\) | Deinitialize |
| bfp_s32_set() | x | \((\mathbb{V \times S}) \to \mathbb{V}\) | Set All Elements |
| bfp_s32_use_exponent() | | \((\mathbb{V \times S}) \to \mathbb{V}\) | Force Exponent |
| bfp_s32_headroom() | | \(\mathbb{V} \to \mathbb{S}\) | Get Headroom |
| bfp_s32_shl() | x | \((\mathbb{V \times S}) \to \mathbb{V}\) | Shift Mantissas |
| bfp_s32_add() | x | \((\mathbb{V \times V}) \to \mathbb{V}\) | Add Vector |
| bfp_s32_add_scalar() | | \((\mathbb{V \times S}) \to \mathbb{V}\) | Add Scalar |
| bfp_s32_sub() | x | \((\mathbb{V \times V}) \to \mathbb{V}\) | Subtract Vector |
| bfp_s32_mul() | x | \((\mathbb{V \times V}) \to \mathbb{V}\) | Multiply Vector |
| bfp_s32_macc() | x | \((\mathbb{V \times V \times V}) \to \mathbb{V}\) | Multiply-Accumulate |
| bfp_s32_nmacc() | x | \((\mathbb{V \times V \times V}) \to \mathbb{V}\) | Negated Multiply-Accumulate |
| bfp_s32_scale() | | \((\mathbb{V \times S}) \to \mathbb{V}\) | Multiply Scalar |
| bfp_s32_abs() | x | \(\mathbb{V} \to \mathbb{V}\) | Absolute Values |
| bfp_s32_sum() | | \(\mathbb{V} \to \mathbb{S}\) | Sum Elements |
| bfp_s32_dot() | | \((\mathbb{V \times V}) \to \mathbb{S}\) | Inner Product |
| bfp_s32_clip() | x | \((\mathbb{V \times S \times S}) \to \mathbb{V}\) | Clip Bounds |
| bfp_s32_rect() | x | \(\mathbb{V} \to \mathbb{V}\) | Rectify Elements |
| bfp_s32_to_bfp_s16() | | \(\mathbb{V} \to \mathbb{V}\) | Convert to 16-bit |
| bfp_s32_sqrt() | x | \(\mathbb{V} \to \mathbb{V}\) | Square Root |
| bfp_s32_inverse() | x | \(\mathbb{V} \to \mathbb{V}\) | Multiplicative Inverse |
| bfp_s32_abs_sum() | | \(\mathbb{V} \to \mathbb{S}\) | Absolute Sum Elements |
| bfp_s32_mean() | | \(\mathbb{V} \to \mathbb{S}\) | Vector Mean Value |
| bfp_s32_energy() | | \(\mathbb{V} \to \mathbb{S}\) | Vector Energy |
| bfp_s32_rms() | | \(\mathbb{V} \to \mathbb{S}\) | Vector RMS Value |
| bfp_s32_max() | | \(\mathbb{V} \to \mathbb{S}\) | Vector Max Element |
| bfp_s32_min() | | \(\mathbb{V} \to \mathbb{S}\) | Vector Min Element |
| bfp_s32_max_elementwise() | x | \((\mathbb{V \times V}) \to \mathbb{V}\) | Elementwise Max |
| bfp_s32_min_elementwise() | x | \((\mathbb{V \times V}) \to \mathbb{V}\) | Elementwise Min |
| bfp_s32_argmax() | | \(\mathbb{V} \to \mathbb{S}\) | Max Element Index |
| bfp_s32_argmin() | | \(\mathbb{V} \to \mathbb{S}\) | Min Element Index |
| bfp_s32_convolve_valid() | | \((\mathbb{V \times V}) \to \mathbb{V}\) | Convolve With Kernel (Valid mode) |
| bfp_s32_convolve_same() | | \((\mathbb{V \times V}) \to \mathbb{V}\) | Convolve With Kernel (Same mode) |

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Block Floating-Point API$$$BFP API quick reference$$$16-Bit BFP API quick reference£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/bfp/bfp_quickref.html#id1

16-Bit BFP API - quick reference

| Function | EW | Signature | Brief |
|---|---|---|---|
| bfp_s16_init() | | \((\mathbb{V \times S}) \to \mathbb{V}\) | Initialize (static) |
| bfp_s16_alloc() | | \(\varnothing \to \mathbb{V}\) | Initialize (dynamic) |
| bfp_s16_dealloc() | | \(\mathbb{V} \to \mathbb{\varnothing}\) | Deinitialize |
| bfp_s16_set() | x | \((\mathbb{V \times S}) \to \mathbb{V}\) | Set All Elements |
| bfp_s16_use_exponent() | | \((\mathbb{V \times S}) \to \mathbb{V}\) | Force Exponent |
| bfp_s16_headroom() | | \(\mathbb{V} \to \mathbb{S}\) | Get Headroom |
| bfp_s16_shl() | x | \((\mathbb{V \times S}) \to \mathbb{V}\) | Shift Mantissas |
| bfp_s16_add() | x | \((\mathbb{V \times V}) \to \mathbb{V}\) | Add Vector |
| bfp_s16_add_scalar() | | \((\mathbb{V \times S}) \to \mathbb{V}\) | Add Scalar |
| bfp_s16_sub() | x | \((\mathbb{V \times V}) \to \mathbb{V}\) | Subtract Vector |
| bfp_s16_mul() | x | \((\mathbb{V \times V}) \to \mathbb{V}\) | Multiply Vector |
| bfp_s16_macc() | x | \((\mathbb{V \times V \times V}) \to \mathbb{V}\) | Multiply-Accumulate |
| bfp_s16_nmacc() | x | \((\mathbb{V \times V \times V}) \to \mathbb{V}\) | Negated Multiply-Accumulate |
| bfp_s16_scale() | | \((\mathbb{V \times S}) \to \mathbb{V}\) | Multiply Scalar |
| bfp_s16_abs() | x | \(\mathbb{V} \to \mathbb{V}\) | Absolute Values |
| bfp_s16_sum() | | \(\mathbb{V} \to \mathbb{S}\) | Sum Elements |
| bfp_s16_dot() | | \((\mathbb{V \times V}) \to \mathbb{S}\) | Inner Product |
| bfp_s16_clip() | x | \((\mathbb{V \times S \times S}) \to \mathbb{V}\) | Clip Bounds |
| bfp_s16_rect() | x | \(\mathbb{V} \to \mathbb{V}\) | Rectify Elements |
| bfp_s16_to_bfp_s32() | x | \(\mathbb{V} \to \mathbb{V}\) | Convert to 32-bit |
| bfp_s16_sqrt() | x | \(\mathbb{V} \to \mathbb{V}\) | Square Root |
| bfp_s16_inverse() | x | \(\mathbb{V} \to \mathbb{V}\) | Multiplicative Inverse |
| bfp_s16_abs_sum() | | \(\mathbb{V} \to \mathbb{S}\) | Absolute Sum Elements |
| bfp_s16_mean() | | \(\mathbb{V} \to \mathbb{S}\) | Vector Mean Value |
| bfp_s16_energy() | | \(\mathbb{V} \to \mathbb{S}\) | Vector Energy |
| bfp_s16_rms() | | \(\mathbb{V} \to \mathbb{S}\) | Vector RMS Value |
| bfp_s16_max() | | \(\mathbb{V} \to \mathbb{S}\) | Vector Max Element |
| bfp_s16_min() | | \(\mathbb{V} \to \mathbb{S}\) | Vector Min Element |
| bfp_s16_max_elementwise() | x | \((\mathbb{V \times V}) \to \mathbb{V}\) | Elementwise Max |
| bfp_s16_min_elementwise() | x | \((\mathbb{V \times V}) \to \mathbb{V}\) | Elementwise Min |
| bfp_s16_argmax() | | \(\mathbb{V} \to \mathbb{S}\) | Max Element Index |
| bfp_s16_argmin() | | \(\mathbb{V} \to \mathbb{S}\) | Min Element Index |
| bfp_s16_accumulate() | x | \((\mathbb{V \times V}) \to \mathbb{V}\) | Elementwise Accumulate |

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Block Floating-Point API$$$BFP API quick reference$$$Complex 32-bit BFP API quick reference£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/bfp/bfp_quickref.html#complex-32-bit-bfp-api-quick-reference

Complex 32-Bit BFP API - quick reference

| Function | EW | Signature | Brief |
|---|---|---|---|
| bfp_complex_s32_init() | | \((\mathbb{V \times S}) \to \mathbb{V}\) | Initialize (static) |
| bfp_complex_s32_alloc() | | \(\varnothing \to \mathbb{V}\) | Initialize (dynamic) |
| bfp_complex_s32_dealloc() | | \(\mathbb{V} \to \mathbb{\varnothing}\) | Deinitialize |
| bfp_complex_s32_set() | x | \((\mathbb{V \times S}) \to \mathbb{V}\) | Set All Elements |
| bfp_complex_s32_use_exponent() | | \((\mathbb{V \times S}) \to \mathbb{V}\) | Force Exponent |
| bfp_complex_s32_headroom() | | \(\mathbb{V} \to \mathbb{S}\) | Get Headroom |
| bfp_complex_s32_shl() | x | \((\mathbb{V \times S}) \to \mathbb{V}\) | Shift Mantissas |
| bfp_complex_s32_real_mul() | x | \((\mathbb{V \times V}) \to \mathbb{V}\) | Real Vector Multiply |
| bfp_complex_s32_mul() | x | \((\mathbb{V \times V}) \to \mathbb{V}\) | Complex Vector Multiply |
| bfp_complex_s32_conj_mul() | x | \((\mathbb{V \times V}) \to \mathbb{V}\) | Complex Vector Conjugate Multiply |
| bfp_complex_s32_macc() | x | \((\mathbb{V \times V \times V}) \to \mathbb{V}\) | Complex Vector Multiply-Accumulate |
| bfp_complex_s32_nmacc() | x | \((\mathbb{V \times V \times V}) \to \mathbb{V}\) | Complex Vector Negated Multiply-Accumulate |
| bfp_complex_s32_conj_macc() | x | \((\mathbb{V \times V \times V}) \to \mathbb{V}\) | Complex Vector Conjugate Multiply-Accumulate |
| bfp_complex_s32_conj_nmacc() | x | \((\mathbb{V \times V \times V}) \to \mathbb{V}\) | Complex Vector Negated Conjugate Multiply-Accumulate |
| bfp_complex_s32_real_scale() | | \((\mathbb{V \times S}) \to \mathbb{V}\) | Real Scalar Multiply |
| bfp_complex_s32_scale() | | \((\mathbb{V \times S}) \to \mathbb{V}\) | Complex Scalar Multiply |
| bfp_complex_s32_add() | x | \((\mathbb{V \times V}) \to \mathbb{V}\) | Complex Vector Add |
| bfp_complex_s32_add_scalar() | | \((\mathbb{V \times S}) \to \mathbb{V}\) | Complex Scalar Add |
| bfp_complex_s32_sub() | | \((\mathbb{V \times V}) \to \mathbb{V}\) | Complex Vector Subtract |
| bfp_complex_s32_to_bfp_complex_s16() | x | \(\mathbb{V} \to \mathbb{V}\) | Convert to 16-bit |
| bfp_complex_s32_squared_mag() | x | \(\mathbb{V} \to \mathbb{V}\) | Squared Magnitude |
| bfp_complex_s32_mag() | x | \(\mathbb{V} \to \mathbb{V}\) | Magnitude |
| bfp_complex_s32_sum() | | \(\mathbb{V} \to \mathbb{S}\) | Vector Sum |
| bfp_complex_s32_conjugate() | x | \(\mathbb{V} \to \mathbb{V}\) | Complex Conjugate |
| bfp_complex_s32_energy() | | \(\mathbb{V} \to \mathbb{S}\) | Vector Energy |
| bfp_complex_s32_make() | x | \((\mathbb{V \times V}) \to \mathbb{V}\) | Construct Complex From Real and Imaginary |
| bfp_complex_s32_real_part() | x | \(\mathbb{V} \to \mathbb{V}\) | Real Part |
| bfp_complex_s32_imag_part() | x | \(\mathbb{V} \to \mathbb{V}\) | Imaginary Part |

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Block Floating-Point API$$$BFP API quick reference$$$Complex 16-bit BFP API quick reference£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/bfp/bfp_quickref.html#complex-16-bit-bfp-api-quick-reference

Complex 16-Bit BFP API - quick reference

| Function | EW | Signature | Brief |
|---|---|---|---|
| bfp_complex_s16_init() | | \((\mathbb{V \times S}) \to \mathbb{V}\) | Initialize (static) |
| bfp_complex_s16_alloc() | | \(\varnothing \to \mathbb{V}\) | Initialize (dynamic) |
| bfp_complex_s16_dealloc() | | \(\mathbb{V} \to \mathbb{\varnothing}\) | Deinitialize |
| bfp_complex_s16_set() | x | \((\mathbb{V \times S}) \to \mathbb{V}\) | Set All Elements |
| bfp_complex_s16_use_exponent() | | \((\mathbb{V \times S}) \to \mathbb{V}\) | Force Exponent |
| bfp_complex_s16_headroom() | | \(\mathbb{V} \to \mathbb{S}\) | Get Headroom |
| bfp_complex_s16_shl() | x | \((\mathbb{V \times S}) \to \mathbb{V}\) | Shift Mantissas |
| bfp_complex_s16_real_mul() | x | \((\mathbb{V \times V}) \to \mathbb{V}\) | Real Vector Multiply |
| bfp_complex_s16_mul() | x | \((\mathbb{V \times V}) \to \mathbb{V}\) | Complex Vector Multiply |
| bfp_complex_s16_conj_mul() | x | \((\mathbb{V \times V}) \to \mathbb{V}\) | Complex Vector Conjugate Multiply |
| bfp_complex_s16_macc() | x | \((\mathbb{V \times V \times V}) \to \mathbb{V}\) | Complex Vector Multiply-Accumulate |
| bfp_complex_s16_nmacc() | x | \((\mathbb{V \times V \times V}) \to \mathbb{V}\) | Complex Vector Negated Multiply-Accumulate |
| bfp_complex_s16_conj_macc() | x | \((\mathbb{V \times V \times V}) \to \mathbb{V}\) | Complex Vector Conjugate Multiply-Accumulate |
| bfp_complex_s16_conj_nmacc() | x | \((\mathbb{V \times V \times V}) \to \mathbb{V}\) | Complex Vector Negated Conjugate Multiply-Accumulate |
| bfp_complex_s16_real_scale() | | \((\mathbb{V \times S}) \to \mathbb{V}\) | Real Scalar Multiply |
| bfp_complex_s16_scale() | | \((\mathbb{V \times S}) \to \mathbb{V}\) | Complex Scalar Multiply |
| bfp_complex_s16_add() | x | \((\mathbb{V \times V}) \to \mathbb{V}\) | Complex Vector Add |
| bfp_complex_s16_add_scalar() | | \((\mathbb{V \times S}) \to \mathbb{V}\) | Complex Scalar Add |
| bfp_complex_s16_sub() | | \((\mathbb{V \times V}) \to \mathbb{V}\) | Complex Vector Subtract |
| bfp_complex_s16_to_bfp_complex_s32() | x | \(\mathbb{V} \to \mathbb{V}\) | Convert to 32-bit |
| bfp_complex_s16_squared_mag() | x | \(\mathbb{V} \to \mathbb{V}\) | Squared Magnitude |
| bfp_complex_s16_sum() | | \(\mathbb{V} \to \mathbb{S}\) | Vector Sum |
| bfp_complex_s16_mag() | x | \(\mathbb{V} \to \mathbb{V}\) | Magnitude |
| bfp_complex_s16_conjugate() | x | \(\mathbb{V} \to \mathbb{V}\) | Complex Conjugate |
| bfp_complex_s16_energy() | | \(\mathbb{V} \to \mathbb{S}\) | Vector Energy |

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Block Floating-Point API$$$16-bit Block Floating-Point API£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/bfp/bfp_s16.html#bit-block-floating-point-api
group bfp_s16_api

Functions

void bfp_s16_init(bfp_s16_t *a, int16_t *data, const exponent_t exp, const unsigned length, const unsigned calc_hr)

Initialize a 16-bit BFP vector.

This function initializes each of the fields of BFP vector a.

data points to the memory buffer used to store elements of the vector, so it must be at least length * 2 bytes long, and must begin at a word-aligned address.

exp is the exponent assigned to the BFP vector. The logical value associated with the kth element of the vector after initialization is \( data_k \cdot 2^{exp} \).

If calc_hr is false, a->hr is initialized to 0. Otherwise, the headroom of the BFP vector is calculated and used to initialize a->hr.

Parameters:
  • a[out] BFP vector to initialize

  • data[in] int16_t buffer used to back a

  • exp[in] Exponent of BFP vector

  • length[in] Number of elements in the BFP vector

  • calc_hr[in] Boolean indicating whether the HR of the BFP vector should be calculated
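
As a rough illustration of the encoding described above, the following sketch computes the logical value \( data_k \cdot 2^{exp} \) of an element. The model_bfp_s16_t struct and the model_* functions are hypothetical stand-ins written for this example and are not part of lib_xcore_math; only the field names data, exp, hr and length mirror those mentioned in this section.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 16-bit BFP vector (not the library's bfp_s16_t). */
typedef struct {
    int16_t *data;    /* mantissa buffer, one int16_t per element */
    int exp;          /* exponent shared by every element */
    unsigned hr;      /* headroom of the mantissas */
    unsigned length;  /* number of elements */
} model_bfp_s16_t;

/* Multiply x by 2^e using exact doublings/halvings (no libm needed). */
static double model_scale_pow2(double x, int e)
{
    for (; e > 0; e--) x *= 2.0;
    for (; e < 0; e++) x *= 0.5;
    return x;
}

/* Logical value of element k: data[k] * 2^exp. */
static double model_logical_value(const model_bfp_s16_t *v, unsigned k)
{
    assert(k < v->length);
    return model_scale_pow2((double)v->data[k], v->exp);
}
```

With exp set to -15, for instance, a mantissa of 16384 represents the logical value 0.5.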

bfp_s16_t bfp_s16_alloc(const unsigned length)

Dynamically allocate a 16-bit BFP vector from the heap.

If allocation was unsuccessful, the data field of the returned vector will be NULL, and the length field will be zero. Otherwise, data will point to the allocated memory and the length field will be the user-specified length. The length argument must not be zero.

Neither the BFP exponent, headroom, nor the elements of the allocated mantissa vector are set by this function. To set the BFP vector elements to a known value, use bfp_s16_set() on the returned BFP vector.

BFP vectors allocated using this function must be deallocated using bfp_s16_dealloc() to avoid a memory leak.

To initialize a BFP vector using static memory allocation, use bfp_s16_init() instead.

Note

Dynamic allocation of BFP vectors relies on allocation from the heap, and offers no guarantees about the execution time. Use of this function in any time-critical section of code is highly discouraged.

Parameters:
  • length[in] The length of the BFP vector to be allocated (in elements)

Returns:

16-bit BFP vector

void bfp_s16_dealloc(bfp_s16_t *vector)

Deallocate a 16-bit BFP vector allocated by bfp_s16_alloc().

Use this function to free the heap memory allocated by bfp_s16_alloc().

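BFP vectors whose mantissa buffer was (successfully) dynamically allocated have a flag set which indicates as much. This function can safely be called on any bfp_s16_t which has not had its flags or data manually manipulated, including:

  • A bfp_s16_t resulting from a successful call to bfp_s16_alloc()

  • A bfp_s16_t resulting from an unsuccessful call to bfp_s16_alloc()

  • A bfp_s16_t initialized with a call to bfp_s16_init()

In the latter two cases, this function does nothing. In the former, the data, length and flags fields of vector are cleared to zero.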

See also

bfp_s16_alloc

Parameters:
  • vector[in] BFP vector to be deallocated.

void bfp_s16_set(bfp_s16_t *a, const int16_t b, const exponent_t exp)

Set all elements of a 16-bit BFP vector to a specified value.

The exponent of a is set to exp, and each element’s mantissa is set to b.

After performing this operation, all elements will represent the same value \(b \cdot 2^{exp}\).

a must have been initialized (see bfp_s16_init()).

Parameters:
  • a[out] BFP vector to update

  • b[in] New value each mantissa is set to

  • exp[in] New exponent for the BFP vector

headroom_t bfp_s16_headroom(bfp_s16_t *b)

Get the headroom of a 16-bit BFP vector.

The headroom of a vector is the number of bits its elements can be left-shifted without losing any information. It conveys information about the range of values that vector may contain, which is useful for determining how best to preserve precision in potentially lossy block floating-point operations.

In a BFP context, headroom applies to mantissas only, not exponents.

In particular, if the 16-bit mantissa vector \(\bar x\) has \(N\) bits of headroom, then for any element \(x_k\) of \(\bar x\)

\(-2^{15-N} \le x_k < 2^{15-N}\)

And for any element \(X_k = x_k \cdot 2^{x\_exp}\) of a 16-bit BFP vector \(\bar X\)

\(-2^{15 + x\_exp - N} \le X_k < 2^{15 + x\_exp - N} \)

This function determines the headroom of b, updates b->hr with that value, and then returns b->hr.

Parameters:
  • b[inout] BFP vector to get the headroom of

Returns:

Headroom of BFP vector b
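
The inequality above can be turned into a small reference model. The sketch below is plain C written for illustration only; model_hr_s16() and model_vect_hr_s16() are hypothetical names, not library functions, and the library's own implementation is not shown here. It finds the largest \(N\) for which \(-2^{15-N} \le x_k < 2^{15-N}\) holds, and takes the minimum over all mantissas.

```c
#include <assert.h>
#include <stdint.h>

/* Headroom of one 16-bit mantissa: the largest N (0..15) such that
   -2^(15-N) <= x < 2^(15-N). */
static unsigned model_hr_s16(int16_t x)
{
    unsigned n = 0;
    int32_t bound = 1 << 14;  /* 2^(15-(n+1)) for n = 0 */
    while (n < 15 && x >= -bound && x < bound) {
        n++;
        bound >>= 1;
    }
    return n;
}

/* Headroom of a mantissa vector: the minimum headroom of its elements. */
static unsigned model_vect_hr_s16(const int16_t *x, unsigned length)
{
    assert(x != 0 || length == 0);
    unsigned hr = 15;
    for (unsigned k = 0; k < length; k++) {
        unsigned h = model_hr_s16(x[k]);
        if (h < hr) hr = h;
    }
    return hr;
}
```

For the mantissas {100, -3, 2000}, the element 2000 limits the vector to 4 bits of headroom, since \(2000 < 2^{11}\) but \(2000 \ge 2^{10}\).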

void bfp_s16_use_exponent(bfp_s16_t *a, const exponent_t exp)

Modify a 16-bit BFP vector to use a specified exponent.

This function forces BFP vector \(\bar A\) to use a specified exponent. The mantissa vector \(\bar a\) will be bit-shifted left or right to compensate for the changed exponent.

This function can be used, for example, before calling a fixed-point arithmetic function to ensure the underlying mantissa vector has the needed Q-format. As another example, this may be useful when communicating with peripheral devices (e.g. via I2S) that require sample data to be in a specified format.

Note that this sets the current encoding, and does not fix the exponent permanently (i.e. subsequent operations may change the exponent as usual).

If the required fixed-point Q-format is QX.Y, where Y is the number of fractional bits in the resulting mantissas, then the associated exponent (and value for parameter exp) is -Y.

a points to input BFP vector \(\bar A\), with mantissa vector \(\bar a\) and exponent \(a\_exp\). a is updated in place to produce resulting BFP vector \(\bar{\tilde{A}}\) with mantissa vector \(\bar{\tilde{a}}\) and exponent \(\tilde{a}\_exp\).

exp is \(\tilde{a}\_exp\), the required exponent. \(\Delta{}p = \tilde{a}\_exp - a\_exp\) is the required change in exponent.

If \(\Delta{}p = 0\), the BFP vector is left unmodified.

If \(\Delta{}p > 0\), the required exponent is larger than the current exponent and an arithmetic right-shift of \(\Delta{}p\) bits is applied to the mantissas \(\bar a\). When applying a right-shift, precision may be lost by discarding the \(\Delta{}p\) least significant bits.

If \(\Delta{}p < 0\), the required exponent is smaller than the current exponent and a left-shift of \(-\Delta{}p\) bits is applied to the mantissas \(\bar a\). When left-shifting, saturation logic will be applied such that any element that can’t be represented exactly with the new exponent will saturate to the 16-bit saturation bounds.

The exponent and headroom of a are updated by this function.

Operation Performed

\[\begin{split}\begin{aligned} & \Delta{}p = \tilde{a}\_exp - a\_exp \\ & \tilde{a_k} \leftarrow sat_{16}( a_k \cdot 2^{-\Delta{}p} ) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{A} \text{ (in elements) } \end{aligned}\end{split}\]

Parameters:
  • a[inout] Input BFP vector \(\bar A\) / Output BFP vector \(\bar{\tilde{A}}\)

  • exp[in] The required exponent, \(\tilde{a}\_exp\)
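
The three cases for \(\Delta{}p\) can be sketched as a plain C reference model. This is an illustration under stated assumptions, not the library's implementation: model_sat16() and model_use_exponent() are hypothetical names, the mantissas and exponent are passed separately rather than through a bfp_s16_t, and the saturation bounds are assumed to be the symmetric range \([-(2^{15}-1), 2^{15}-1]\).

```c
#include <assert.h>
#include <stdint.h>

/* Saturate to the symmetric 16-bit range [-32767, 32767] (assumed bounds). */
static int16_t model_sat16(int64_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < -INT16_MAX) return (int16_t)-INT16_MAX;
    return (int16_t)x;
}

/* Re-encode mantissas so they use new_exp; *exp is updated to new_exp. */
static void model_use_exponent(int16_t *mant, unsigned length,
                               int *exp, int new_exp)
{
    assert(mant != 0 && exp != 0);
    int delta_p = new_exp - *exp;
    if (delta_p == 0) return;  /* vector is left unmodified */

    /* Shifts beyond 16 bits cannot change the outcome for 16-bit data. */
    int shift = delta_p < 0 ? -delta_p : delta_p;
    if (shift > 16) shift = 16;

    for (unsigned k = 0; k < length; k++) {
        int64_t m = mant[k];
        if (delta_p > 0) {
            /* floor(m / 2^shift), written without shifting a negative value */
            int64_t round_up = ((int64_t)1 << shift) - 1;
            m = (m >= 0) ? (m >> shift) : -((-m + round_up) >> shift);
        } else {
            m = m * ((int64_t)1 << shift);
        }
        mant[k] = model_sat16(m);
    }
    *exp = new_exp;
}
```

Moving the mantissas {1000, -1000, 3} from exponent -10 to exponent -8 discards two low bits of each, while moving -20000 from exponent -10 to -12 cannot be represented and saturates.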

void bfp_s16_shl(bfp_s16_t *a, const bfp_s16_t *b, const left_shift_t b_shl)

Apply a left-shift to the mantissas of a 16-bit BFP vector.

Each mantissa of input BFP vector \(\bar B\) is left-shifted b_shl bits and stored in the corresponding element of output BFP vector \(\bar A\).

This operation can be used to add or remove headroom from a BFP vector.

b_shl is the number of bits that each mantissa will be left-shifted. This shift is signed and arithmetic, so negative values for b_shl will right-shift the mantissas.

a and b must have been initialized (see bfp_s16_init()), and must be the same length.

This operation can be performed safely in-place on b.

Note that this operation bypasses the logic protecting the caller from saturation or underflows. Output values saturate to the symmetric 16-bit range (the open interval \((-2^{15}, 2^{15})\)). To avoid saturation, b_shl should be no greater than the headroom of b (b->hr).

Operation Performed

\[\begin{split}\begin{aligned} & a_k \leftarrow sat_{16}( \lfloor b_k \cdot 2^{b\_shl} \rfloor ) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \\ & \qquad\text{ and } b_k \text{ and } a_k \text{ are the } k\text{th mantissas from } \bar{B}\text{ and } \bar{A}\text{ respectively} \end{aligned}\end{split}\]

Parameters:
  • a[out] Output BFP vector \(\bar A\)

  • b[in] Input BFP vector \(\bar B\)

  • b_shl[in] Signed arithmetic left-shift to be applied to mantissas of \(\bar B\).

void bfp_s16_add(bfp_s16_t *a, const bfp_s16_t *b, const bfp_s16_t *c)

Add two 16-bit BFP vectors together.

Add together two input BFP vectors \(\bar B\) and \(\bar C\) and store the result in BFP vector \(\bar A\).

a, b and c must have been initialized (see bfp_s16_init()), and must be the same length.

This operation can be performed safely in-place on b or c.

Operation Performed

\[\begin{aligned} \bar{A} \leftarrow \bar{B} + \bar{C} \end{aligned}\]

Parameters:
  • a[out] Output BFP vector \(\bar A\)

  • b[in] Input BFP vector \(\bar B\)

  • c[in] Input BFP vector \(\bar C\)

void bfp_s16_add_scalar(bfp_s16_t *a, const bfp_s16_t *b, const float c)

Add a scalar to a 16-bit BFP vector.

Add a real scalar \(c\) to input BFP vector \(\bar B\) and store the result in BFP vector \(\bar A\).

a and b must have been initialized (see bfp_s16_init()), and must be the same length.

This operation can be performed safely in-place on b.

Operation Performed

\[\begin{split}\begin{aligned} & \bar{A} \leftarrow \bar{B} + c \\ \end{aligned}\end{split}\]

Parameters:
  • a[out] Output BFP vector \(\bar A\)

  • b[in] Input BFP vector \(\bar B\)

  • c[in] Input scalar \(c\)

void bfp_s16_sub(bfp_s16_t *a, const bfp_s16_t *b, const bfp_s16_t *c)

Subtract one 16-bit BFP vector from another.

Subtract input BFP vector \(\bar C\) from input BFP vector \(\bar B\) and store the result in BFP vector \(\bar A\).

a, b and c must have been initialized (see bfp_s16_init()), and must be the same length.

This operation can be performed safely in-place on b or c.

Operation Performed

\[\begin{aligned} \bar{A} \leftarrow \bar{B} - \bar{C} \end{aligned}\]

Parameters:
  • a[out] Output BFP vector \(\bar A\)

  • b[in] Input BFP vector \(\bar B\)

  • c[in] Input BFP vector \(\bar C\)

void bfp_s16_mul(bfp_s16_t *a, const bfp_s16_t *b, const bfp_s16_t *c)

Multiply one 16-bit BFP vector by another element-wise.

Multiply each element of input BFP vector \(\bar B\) by the corresponding element of input BFP vector \(\bar C\) and store the results in output BFP vector \(\bar A\).

a, b and c must have been initialized (see bfp_s16_init()), and must be the same length.

This operation can be performed safely in-place on b or c.

Operation Performed

\[\begin{split}\begin{aligned} & A_k \leftarrow B_k \cdot C_k \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \end{aligned}\end{split}\]

Parameters:
  • a[out] Output BFP vector \(\bar A\)

  • b[in] Input BFP vector \(\bar B\)

  • c[in] Input BFP vector \(\bar C\)

void bfp_s16_macc(bfp_s16_t *acc, const bfp_s16_t *b, const bfp_s16_t *c)

Multiply one 16-bit BFP vector by another element-wise and add the result to a third vector.

Operation Performed

\[\begin{split}\begin{aligned} & A_k \leftarrow A_k + B_k \cdot C_k \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \end{aligned}\end{split}\]

Parameters:
  • acc[inout] Input/Output accumulator BFP vector \(\bar A\)

  • b[in] Input BFP vector \(\bar B\)

  • c[in] Input BFP vector \(\bar C\)

void bfp_s16_nmacc(bfp_s16_t *acc, const bfp_s16_t *b, const bfp_s16_t *c)

Multiply one 16-bit BFP vector by another element-wise and subtract the result from a third vector.

Operation Performed

\[\begin{split}\begin{aligned} & A_k \leftarrow A_k - B_k \cdot C_k \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \end{aligned}\end{split}\]

Parameters:
  • acc[inout] Input/Output accumulator BFP vector \(\bar A\)

  • b[in] Input BFP vector \(\bar B\)

  • c[in] Input BFP vector \(\bar C\)

void bfp_s16_scale(bfp_s16_t *a, const bfp_s16_t *b, const float alpha)

Multiply a 16-bit BFP vector by a scalar.

Multiply input BFP vector \(\bar B\) by scalar \(\alpha \cdot 2^{\alpha\_exp}\) and store the result in output BFP vector \(\bar A\).

a and b must have been initialized (see bfp_s16_init()), and must be the same length.

alpha is a floating-point value representing the scalar \(\alpha \cdot 2^{\alpha\_exp}\), where \(\alpha\) is its mantissa and \(\alpha\_exp\) is its exponent.

This operation can be performed safely in-place on b.

Operation Performed

\[\begin{aligned} \bar{A} \leftarrow \bar{B} \cdot \left(\alpha \cdot 2^{\alpha\_exp}\right) \end{aligned}\]

Parameters:
  • a[out] Output BFP vector \(\bar A\)

  • b[in] Input BFP vector \(\bar B\)

  • alpha[in] Scalar by which \(\bar B\) is multiplied

void bfp_s16_abs(bfp_s16_t *a, const bfp_s16_t *b)

Get the absolute values of elements of a 16-bit BFP vector.

Compute the absolute value of each element \(B_k\) of input BFP vector \(\bar B\) and store the results in output BFP vector \(\bar A\).

a and b must have been initialized (see bfp_s16_init()), and must be the same length.

This operation can be performed safely in-place on b.

Operation Performed

\[\begin{split}\begin{aligned} & A_k \leftarrow \left| B_k \right| \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{aligned}\end{split}\]

Parameters:
  • a[out] Output BFP vector \(\bar A\)

  • b[in] Input BFP vector \(\bar B\)

float_s32_t bfp_s16_sum(const bfp_s16_t *b)

Sum the elements of a 16-bit BFP vector.

Sum the elements of input BFP vector \(\bar B\) to get a result \(A = a \cdot 2^{a\_exp}\), which is returned. The returned value has a 32-bit mantissa.

b must have been initialized (see bfp_s16_init()).

Operation Performed

\[\begin{split}\begin{aligned} & A \leftarrow \sum_{k=0}^{N-1} \left( B_k \right) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{aligned}\end{split}\]

Parameters:
  • b[in] Input BFP vector \(\bar B\)

Returns:

\(A\), the sum of elements of \(\bar B\)

float_s64_t bfp_s16_dot(const bfp_s16_t *b, const bfp_s16_t *c)

Compute the inner product of two 16-bit BFP vectors.

Adds together the element-wise products of input BFP vectors \(\bar B\) and \(\bar C\) for a result \(A = a \cdot 2^{a\_exp}\), where \(a\) is the 64-bit mantissa of the result and \(a\_exp\) is its associated exponent. \(A\) is returned.

b and c must have been initialized (see bfp_s16_init()), and must be the same length.

Operation Performed

\[\begin{split}\begin{aligned} & a \cdot 2^{a\_exp} \leftarrow \sum_{k=0}^{N-1} \left( B_k \cdot C_k \right) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \end{aligned}\end{split}\]

Parameters:
  • b[in] Input BFP vector \(\bar B\)

  • c[in] Input BFP vector \(\bar C\)

Returns:

\(A\), the inner product of vectors \(\bar B\) and \(\bar C\)
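
Because every element of \(\bar B\) shares one exponent and every element of \(\bar C\) shares another, the inner product reduces to an integer sum of mantissa products with a single combined exponent. The sketch below models that arithmetic in plain C. The names model_float_s64_t and model_dot_s16() are hypothetical, and the library may choose a different (but equivalent in value) mantissa and exponent for its result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: value = mant * 2^exp. */
typedef struct {
    int64_t mant;
    int exp;
} model_float_s64_t;

/* Inner product of two mantissa vectors with exponents b_exp and c_exp.
   Each product b[k]*c[k] carries the exponent b_exp + c_exp. */
static model_float_s64_t model_dot_s16(const int16_t *b, int b_exp,
                                       const int16_t *c, int c_exp,
                                       unsigned length)
{
    assert((b != 0 && c != 0) || length == 0);
    model_float_s64_t a = { 0, b_exp + c_exp };
    for (unsigned k = 0; k < length; k++) {
        a.mant += (int64_t)b[k] * (int64_t)c[k];
    }
    return a;
}
```

With mantissas {1, 2, 3} at exponent -1 and {4, 5, 6} at exponent -2, the model yields a mantissa of 32 with exponent -3, i.e. the value 4.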

void bfp_s16_clip(bfp_s16_t *a, const bfp_s16_t *b, const int16_t lower_bound, const int16_t upper_bound, const int bound_exp)

Clamp the elements of a 16-bit BFP vector to a specified range.

Each element \(A_k\) of output BFP vector \(\bar A\) is set to the corresponding element \(B_k\) of input BFP vector \(\bar B\) if it is in the range \( [ L \cdot 2^{bound\_exp}, U \cdot 2^{bound\_exp} ] \), otherwise it is set to the nearest value inside that range.

a and b must have been initialized (see bfp_s16_init()), and must be the same length.

This operation can be performed safely in-place on b.

Operation Performed

\[\begin{split}\begin{aligned} & A_k \leftarrow \begin{cases} L \cdot 2^{bound\_exp} & B_k < L \cdot 2^{bound\_exp} \\ U \cdot 2^{bound\_exp} & B_k > U \cdot 2^{bound\_exp} \\ B_k & otherwise \end{cases} \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{aligned}\end{split}\]

Parameters:
  • a[out] Output BFP vector \(\bar A\)

  • b[in] Input BFP vector \(\bar B\)

  • lower_bound[in] Mantissa of the lower clipping bound, \(L\)

  • upper_bound[in] Mantissa of the upper clipping bound, \(U\)

  • bound_exp[in] Shared exponent of the clipping bounds
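
The piecewise definition above can be checked with a small model that works on logical values. This is a sketch for illustration only: model_clip() and model_scale_pow2() are hypothetical helpers, the output is written as doubles rather than as a BFP vector, and the input mantissas and exponent are passed separately.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply x by 2^e using exact doublings/halvings (no libm needed). */
static double model_scale_pow2(double x, int e)
{
    for (; e > 0; e--) x *= 2.0;
    for (; e < 0; e++) x *= 0.5;
    return x;
}

/* Clamp logical values B_k = b[k] * 2^b_exp to the range
   [lower_bound * 2^bound_exp, upper_bound * 2^bound_exp]. */
static void model_clip(double *a, const int16_t *b, int b_exp, unsigned length,
                       int16_t lower_bound, int16_t upper_bound, int bound_exp)
{
    assert(lower_bound <= upper_bound);
    const double lo = model_scale_pow2(lower_bound, bound_exp);
    const double hi = model_scale_pow2(upper_bound, bound_exp);
    for (unsigned k = 0; k < length; k++) {
        double v = model_scale_pow2(b[k], b_exp);
        a[k] = (v < lo) ? lo : (v > hi) ? hi : v;
    }
}
```

With lower_bound = -2, upper_bound = 3 and bound_exp = 4, the clipping range is \([-32, 48]\), so the logical values 100, -100 and 10 become 48, -32 and 10.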

void bfp_s16_rect(bfp_s16_t *a, const bfp_s16_t *b)

Rectify a 16-bit BFP vector.

Each element \(A_k\) of output BFP vector \(\bar A\) is set to the corresponding element \(B_k\) of input BFP vector \(\bar B\) if it is non-negative, otherwise it is set to \(0\).

a and b must have been initialized (see bfp_s16_init()), and must be the same length.

This operation can be performed safely in-place on b.

Operation Performed

\[\begin{split}\begin{aligned} & A_k \leftarrow \begin{cases} 0 & B_k < 0 \\ B_k & otherwise \end{cases} \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{aligned}\end{split}\]

Parameters:
  • a[out] Output BFP vector \(\bar A\)

  • b[in] Input BFP vector \(\bar B\)

void bfp_s16_to_bfp_s32(bfp_s32_t *a, const bfp_s16_t *b)

Convert a 16-bit BFP vector into a 32-bit BFP vector.

Increases the bit-depth of each 16-bit element \(B_k\) of input BFP vector \(\bar B\) to 32 bits, and stores the 32-bit result in the corresponding element \(A_k\) of output BFP vector \(\bar A\).

a and b must have been initialized (see bfp_s16_init() and bfp_s32_init()), and must be the same length.

Operation Performed

\[\begin{split}\begin{aligned} & A_k \overset{32-bit}{\longleftarrow} B_k \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{aligned}\end{split}\]

Parameters:
  • a[out] Output BFP vector \(\bar A\)

  • b[in] Input BFP vector \(\bar B\)

void bfp_s16_sqrt(bfp_s16_t *a, const bfp_s16_t *b)

Get the square roots of elements of a 16-bit BFP vector.

Computes the square root of each element \(B_k\) of input BFP vector \(\bar B\) and stores the results in output BFP vector \(\bar A\).

a and b must have been initialized (see bfp_s16_init()), and must be the same length.

This operation can be performed safely in-place on b.

Operation Performed

\[\begin{split}\begin{aligned} & A_k \leftarrow \sqrt{B_k} \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{aligned}\end{split}\]

Notes

  • Only the XMATH_BFP_SQRT_DEPTH_S16 (see xmath_conf.h) most significant bits of each result are computed.

  • This function only computes real roots. For any \(B_k < 0\), the corresponding output \(A_k\) is set to \(0\).

Parameters:
  • a[out] Output BFP vector \(\bar A\)

  • b[in] Input BFP vector \(\bar B\)

void bfp_s16_inverse(bfp_s16_t *a, const bfp_s16_t *b)

Get the inverses of elements of a 16-bit BFP vector.

Computes the inverse of each element \(B_k\) of input BFP vector \(\bar B\) and stores the results in output BFP vector \(\bar A\).

a and b must have been initialized (see bfp_s16_init()), and must be the same length.

This operation can be performed safely in-place on b.

Operation Performed

\[\begin{split}\begin{aligned} & A_k \leftarrow B_k^{-1} \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{aligned}\end{split}\]

Parameters:
  • a[out] Output BFP vector \(\bar A\)

  • b[in] Input BFP vector \(\bar B\)

float_s32_t bfp_s16_abs_sum(const bfp_s16_t *b)

Sum the absolute values of elements of a 16-bit BFP vector.

Sum the absolute values of elements of input BFP vector \(\bar B\) for a result \(A = a \cdot 2^{a\_exp}\), where \(a\) is a 32-bit mantissa and \(a\_exp\) is its associated exponent. \(A\) is returned.

b must have been initialized (see bfp_s16_init()).

Operation Performed

\[\begin{split}\begin{aligned} & A \leftarrow \sum_{k=0}^{N-1} \left| B_k \right| \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{aligned}\end{split}\]

Parameters:
  • b[in] Input BFP vector \(\bar B\)

Returns:

\(A\), the sum of absolute values of elements of \(\bar B\)

float bfp_s16_mean(const bfp_s16_t *b)

Get the mean value of a 16-bit BFP vector.

Computes \(A = a \cdot 2^{a\_exp}\), the mean value of elements of input BFP vector \(\bar B\), where \(a\) is the 16-bit mantissa of the result, and \(a\_exp\) is its associated exponent. \(A\) is returned.

b must have been initialized (see bfp_s16_init()).

Operation Performed

\[\begin{split}\begin{aligned} & A \leftarrow \frac{1}{N} \sum_{k=0}^{N-1} \left( B_k \right) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{aligned}\end{split}\]

Parameters:
  • b[in] Input BFP vector \(\bar B\)

Returns:

\(A\), the mean value of \(\bar B\)’s elements

float_s64_t bfp_s16_energy(const bfp_s16_t *b)

Get the energy (sum of squared elements) of a 16-bit BFP vector.

Computes \(A = a \cdot 2^{a\_exp}\), the sum of squares of elements of input BFP vector \(\bar B\), where \(a\) is the 64-bit mantissa of the result, and \(a\_exp\) is its associated exponent. \(A\) is returned.

b must have been initialized (see bfp_s16_init()).

Operation Performed

\[\begin{split}\begin{aligned} & A \leftarrow \sum_{k=0}^{N-1} \left( B_k^2 \right) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{aligned}\end{split}\]

Parameters:
  • b[in] Input BFP vector \(\bar B\)

Returns:

\(A\), \(\bar B\)’s energy

float_s32_t bfp_s16_rms(const bfp_s16_t *b)

Get the RMS value of elements of a 16-bit BFP vector.

Computes \(A = a \cdot 2^{a\_exp}\), the RMS value of elements of input BFP vector \(\bar B\), where \(a\) is the 32-bit mantissa of the result, and \(a\_exp\) is its associated exponent. \(A\) is returned.

The RMS (root-mean-square) value of a vector is the square root of the mean of the squares of the vector’s elements.

b must have been initialized (see bfp_s16_init()).

Operation Performed

\[\begin{split}\begin{aligned} & A \leftarrow \sqrt{\frac{1}{N}\sum_{k=0}^{N-1} \left( B_k^2 \right) } \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{aligned}\end{split}\]

Parameters:
  • b[in] Input BFP vector \(\bar B\)

Returns:

\(A\), the RMS value of \(\bar B\)’s elements
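
As a quick check of the formula above, consider a two-element vector with logical values \(B_0 = 3\) and \(B_1 = 4\) (values chosen purely for illustration):

\[\begin{aligned} A = \sqrt{\frac{1}{2}\left(3^2 + 4^2\right)} = \sqrt{12.5} \approx 3.536 \end{aligned}\]

By comparison, the energy of the same vector (see bfp_s16_energy()) is \(3^2 + 4^2 = 25\), which omits both the division by \(N\) and the square root.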

float bfp_s16_max(const bfp_s16_t *b)

Get the maximum value of a 16-bit BFP vector.

Finds \(A\), the maximum value among elements of input BFP vector \(\bar B\). \(A\) is returned by this function.

b must have been initialized (see bfp_s16_init()).

Operation Performed

\[\begin{split}\begin{aligned} & A \leftarrow max\left(B_0, B_1, ..., B_{N-1} \right) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{aligned}\end{split}\]

Parameters:
  • b[in] Input vector

Returns:

\(A\), the value of \(\bar B\)’s maximum element

void bfp_s16_max_elementwise(bfp_s16_t *a, const bfp_s16_t *b, const bfp_s16_t *c)

Get the element-wise maximum of two 16-bit BFP vectors.

Each element of output vector \(\bar A\) is set to the maximum of the corresponding elements in the input vectors \(\bar B\) and \(\bar C\).

a, b and c must have been initialized (see bfp_s16_init()), and must be the same length.

This operation can be performed safely in-place on b, but not on c.

Operation Performed

\[\begin{split}\begin{aligned} & A_k \leftarrow max(B_k, C_k) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \end{aligned}\end{split}\]

Parameters:
  • a – Output BFP vector \(\bar A\)

  • b – Input BFP vector \(\bar B\)

  • c – Input BFP vector \(\bar C\)

float bfp_s16_min(const bfp_s16_t *b)

Get the minimum value of a 16-bit BFP vector.

Finds \(A\), the minimum value among elements of input BFP vector \(\bar B\). \(A\) is returned by this function.

b must have been initialized (see bfp_s16_init()).

Operation Performed

\[\begin{split}\begin{aligned} & A \leftarrow min\left(B_0, B_1, ..., B_{N-1} \right) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{aligned}\end{split}\]

Parameters:
  • b[in] Input vector

Returns:

\(A\), the value of \(\bar B\)’s minimum element

void bfp_s16_min_elementwise(bfp_s16_t *a, const bfp_s16_t *b, const bfp_s16_t *c)

Get the element-wise minimum of two 16-bit BFP vectors.

Each element of output vector \(\bar A\) is set to the minimum of the corresponding elements in the input vectors \(\bar B\) and \(\bar C\).

a, b and c must have been initialized (see bfp_s16_init()), and must be the same length.

This operation can be performed safely in-place on b, but not on c.

Operation Performed

\[\begin{split}\begin{aligned} & A_k \leftarrow min(B_k, C_k) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \end{aligned}\end{split}\]

Parameters:
  • a – Output BFP vector \(\bar A\)

  • b – Input BFP vector \(\bar B\)

  • c – Input BFP vector \(\bar C\)

unsigned bfp_s16_argmax(const bfp_s16_t *b)

Get the index of the maximum value of a 16-bit BFP vector.

Finds \(a\), the index of the maximum value among the elements of input BFP vector \(\bar B\). \(a\) is returned by this function.

If i is the value returned, then the maximum value in \(\bar B\) is ldexp(b->data[i], b->exp).

Operation Performed

\[\begin{split}\begin{aligned} & a \leftarrow argmax_k\left(b_k\right) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{aligned}\end{split}\]

Notes

  • If there is a tie for maximum value, the lowest tying index is returned.

Parameters:
  • b[in] Input vector

Returns:

\(a\), the index of the maximum value from \(\bar B\)
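The tie-breaking rule and the ldexp() relationship described above can be illustrated with a small stand-alone model. The type demo_bfp16_t and the functions demo_argmax16() and demo_value16() are hypothetical and exist only for this sketch; they are not part of the library.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Hypothetical stand-in for a 16-bit BFP vector; not the library's bfp_s16_t. */
typedef struct {
    const int16_t *data;
    int exp;
    unsigned length;
} demo_bfp16_t;

/* Index of the largest mantissa. All elements share one exponent,
 * so comparing mantissas is enough. */
unsigned demo_argmax16(const demo_bfp16_t *b)
{
    assert(b->length > 0);
    unsigned best = 0;
    for (unsigned k = 1; k < b->length; k++) {
        /* Strict '>' keeps the lowest index when there is a tie. */
        if (b->data[k] > b->data[best]) {
            best = k;
        }
    }
    return best;
}

/* Logical value of element i: mantissa * 2^exp. */
double demo_value16(const demo_bfp16_t *b, unsigned i)
{
    return ldexp((double)b->data[i], b->exp);
}
```

With mantissas {3, 40, -7, 40} and exponent -3, the sketch returns index 1 (the first of the two tying elements), whose logical value is 40 · 2⁻³ = 5.0.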

unsigned bfp_s16_argmin(const bfp_s16_t *b)

Get the index of the minimum value of a 16-bit BFP vector.

Finds \(a\), the index of the minimum value among the elements of input BFP vector \(\bar B\). \(a\) is returned by this function.

If i is the value returned, then the minimum value in \(\bar B\) is ldexp(b->data[i], b->exp).

Operation Performed

\[\begin{split}\begin{aligned} & a \leftarrow argmin_k\left(b_k\right) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{aligned}\end{split}\]

Notes

  • If there is a tie for minimum value, the lowest tying index is returned.

Parameters:
  • b[in] Input vector

Returns:

\(a\), the index of the minimum value from \(\bar B\)

headroom_t bfp_s16_accumulate(split_acc_s32_t a[], const exponent_t a_exp, const bfp_s16_t *b)

Accumulate a 16-bit BFP vector into a 32-bit accumulator vector.

This function is used for efficiently accumulating a series of 16-bit BFP vectors into a 32-bit vector. Each call to this function adds a BFP vector \(\bar B\) into the persistent 32-bit accumulator vector \(\bar A\).

Eventually the value of \(\bar A\) will be needed for something other than simple accumulation, which requires converting from the XS3-native split accumulator representation given by the split_acc_s32_t struct, into a standard vector of int32_t. This can be accomplished using vect_s32_merge_accs(). From there, the int32_t vector can be dropped to a 16-bit vector with vect_s32_to_vect_s16() if needed.

Note, in order for this operation to work, \(\mathtt{b\_exp} - \mathtt{a\_exp}\) must be no greater than \(14\).

Operation Performed

\[\begin{aligned} \bar{A} \leftarrow \bar{A} + \bar{B} \end{aligned}\]

Proper use of this function requires some book-keeping on the part of the caller. In particular, the caller is responsible for tracking the exponent and monitoring the headroom of the accumulator vector \(\bar A\).

Usage

To begin a sequence of accumulation, start by clearing the contents of \(\bar A\) to all zeros. Then, an appropriate exponent for \(\bar A\) must be chosen. The only hard constraint is that the accumulator exponent, \(\mathtt{a\_exp}\), must be within \(14\) of \(\bar B\)’s exponent, \(\mathtt{b\_exp}\). If \(\mathtt{b\_exp}\) is unknown, the caller may choose to wait until the first \(\bar B\) is available before initializing \(\mathtt{a\_exp}\).

As vectors are accumulated into \(\bar A\) with multiple calls to this function, it becomes possible for \(\bar A\) to saturate for some element. Each call to this function returns the headroom of \(\bar A\) (note: no more than 15 bits of headroom will be reported). If \(\bar A\) has at least 1 bit of headroom, then a call to this function is guaranteed not to saturate.

The larger \(\mathtt{a\_exp}\) is compared to each \(\mathtt{b\_exp}\), the more 16-bit vectors can be accumulated before saturation becomes possible (and by virtue of that, the more efficiently accumulation can take place). On the other hand, as long as \(\mathtt{a\_exp} \le \mathtt{b\_exp}\), there is no precision loss during accumulation. It is the responsibility of the caller to manage this trade-off.

If and when this function reports that \(\bar A\) has 0 headroom, if further accumulation is needed, the caller can handle this by increasing \(\mathtt{a\_exp}\). Increasing \(\mathtt{a\_exp}\) will require that the contents of the mantissa vector \(\bar a\) be right-shifted to avoid corrupting the value of \(\bar A\), making room for further accumulation in the process. Shifting the split accumulators can be accomplished with a call to vect_split_acc_s32_shr().

Finally, when accumulation is complete or the accumulator values must be used elsewhere, the split accumulator vector can be converted to a simple int32_t vector with a call to vect_s32_merge_accs().

Parameters:
  • a[inout] Mantissas of accumulator vector \(\bar A\)

  • a_exp[in] Exponent of accumulator vector \(\bar A\)

  • b[in] Input vector \(\bar B\)

Returns:

Headroom of \(\bar A\) (up to 15 bits)
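The caller-side book-keeping described under Usage can be modelled in portable C. The sketch below is an assumption-laden illustration, not the library's code: a plain int32_t array stands in for the split_acc_s32_t representation, demo_acc_t, demo_accumulate() and demo_make_room() are hypothetical names, the lower limit on the exponent difference is the sketch's own, and right-shifts are modelled with division that rounds toward zero.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_LEN 4

/* Hypothetical plain accumulator; the library uses split_acc_s32_t instead. */
typedef struct {
    int32_t mant[DEMO_LEN];
    int exp;
} demo_acc_t;

/* Bits x can be left-shifted while staying inside the int32_t range. */
static unsigned demo_hr_s32(int32_t x)
{
    unsigned hr = 0;
    while (hr < 31 && x >= -(INT32_C(1) << 30) && x < (INT32_C(1) << 30)) {
        x *= 2;
        hr++;
    }
    return hr;
}

/* Add B (mantissas b_mant, exponent b_exp) into acc.
 * Returns the accumulator headroom, capped at 15 bits. */
unsigned demo_accumulate(demo_acc_t *acc, const int16_t b_mant[DEMO_LEN], int b_exp)
{
    const int shl = b_exp - acc->exp;
    /* Upper bound comes from the text; the lower bound is this sketch's own. */
    assert(shl <= 14 && shl >= -15);
    unsigned hr = 15;
    for (unsigned k = 0; k < DEMO_LEN; k++) {
        /* Align B's mantissa to the accumulator exponent. */
        const int32_t term = (shl >= 0)
            ? (int32_t)b_mant[k] * (INT32_C(1) << shl)
            : (int32_t)b_mant[k] / (INT32_C(1) << -shl);
        const int64_t sum = (int64_t)acc->mant[k] + term;
        assert(sum >= INT32_MIN && sum <= INT32_MAX);   /* caller kept headroom */
        acc->mant[k] = (int32_t)sum;
        const unsigned h = demo_hr_s32(acc->mant[k]);
        if (h < hr) {
            hr = h;
        }
    }
    return hr;
}

/* Raise the accumulator exponent by shr, shifting mantissas right to compensate. */
void demo_make_room(demo_acc_t *acc, unsigned shr)
{
    assert(shr < 31);
    for (unsigned k = 0; k < DEMO_LEN; k++) {
        acc->mant[k] /= (INT32_C(1) << shr);
    }
    acc->exp += (int)shr;
}
```

A caller following the pattern above would check the value returned by each accumulate step and, once it reaches 0, call the make-room step before accumulating again. With the real API, that role is played by vect_split_acc_s32_shr() together with an update of a_exp.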

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Block Floating-Point API$$$32-bit Block Floating-Point API£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/bfp/bfp_s32.html#bit-block-floating-point-api
group bfp_s32_api

Functions

void bfp_s32_init(bfp_s32_t *a, int32_t *data, const exponent_t exp, const unsigned length, const unsigned calc_hr)

Initialize a 32-bit BFP vector.

This function initializes each of the fields of BFP vector a.

data points to the memory buffer used to store elements of the vector, so it must be at least length * 4 bytes long, and must begin at a word-aligned address.

exp is the exponent assigned to the BFP vector. The logical value associated with the kth element of the vector after initialization is \( data_k \cdot 2^{exp} \).

If calc_hr is false, a->hr is initialized to 0. Otherwise, the headroom of the BFP vector is calculated and used to initialize a->hr.

Parameters:
  • a[out] BFP vector to initialize

  • data[in] int32_t buffer used to back a

  • exp[in] Exponent of BFP vector

  • length[in] Number of elements in the BFP vector

  • calc_hr[in] Boolean indicating whether the HR of the BFP vector should be calculated

bfp_s32_t bfp_s32_alloc(unsigned length)

Dynamically allocate a 32-bit BFP vector from the heap.

If allocation was unsuccessful, the data field of the returned vector will be NULL, and the length field will be zero. Otherwise, data will point to the allocated memory and the length field will be the user-specified length. The length argument must not be zero.

Neither the BFP exponent, headroom, nor the elements of the allocated mantissa vector are set by this function. To set the BFP vector elements to a known value, use bfp_s32_set() on the returned BFP vector.

BFP vectors allocated using this function must be deallocated using bfp_s32_dealloc() to avoid a memory leak.

To initialize a BFP vector using static memory allocation, use bfp_s32_init() instead.

Note

This function always allocates an extra 2 elements so that bfp_fft_unpack_mono() can safely be used, but these two elements will NOT be reflected in the returned vector length.

Note

Dynamic allocation of BFP vectors relies on allocation from the heap, and offers no guarantees about the execution time. Use of this function in any time-critical section of code is highly discouraged.

Parameters:
  • length[in] The length of the BFP vector to be allocated (in elements)

Returns:

32-bit BFP vector

void bfp_s32_dealloc(bfp_s32_t *vector)

Deallocate a 32-bit BFP vector allocated by bfp_s32_alloc().

Use this function to free the heap memory allocated by bfp_s32_alloc().

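BFP vectors whose mantissa buffer was (successfully) dynamically allocated have a flag set which indicates as much. This function can safely be called on any bfp_s32_t which has not had its flags or data manually manipulated, including:

  • bfp_s32_t resulting from a successful call to bfp_s32_alloc()

  • bfp_s32_t resulting from an unsuccessful call to bfp_s32_alloc()

  • bfp_s32_t initialized with a call to bfp_s32_init()

In the latter two cases, this function does nothing. In the first case, the data, length and flags fields of vector are cleared to zero.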

See also

bfp_s32_alloc

Parameters:
  • vector[in] BFP vector to be deallocated.

void bfp_s32_set(bfp_s32_t *a, const int32_t b, const exponent_t exp)

Set all elements of a 32-bit BFP vector to a specified value.

The exponent of a is set to exp, and each element’s mantissa is set to b.

After performing this operation, all elements will represent the same value \(b \cdot 2^{exp}\).

a must have been initialized (see bfp_s32_init()).

Parameters:
  • a[out] BFP vector to update

  • b[in] New value each mantissa is set to

  • exp[in] New exponent for the BFP vector

void bfp_s32_use_exponent(bfp_s32_t *a, const exponent_t exp)

Modify a 32-bit BFP vector to use a specified exponent.

This function forces BFP vector \(\bar A\) to use a specified exponent. The mantissa vector \(\bar a\) will be bit-shifted left or right to compensate for the changed exponent.

This function can be used, for example, before calling a fixed-point arithmetic function to ensure the underlying mantissa vector has the needed Q-format. As another example, this may be useful when communicating with peripheral devices (e.g. via I2S) that require sample data to be in a specified format.

Note that this sets the current encoding, and does not fix the exponent permanently (i.e. subsequent operations may change the exponent as usual).

If the required fixed-point Q-format is QX.Y, where Y is the number of fractional bits in the resulting mantissas, then the associated exponent (and value for parameter exp) is -Y.

a points to input BFP vector \(\bar A\), with mantissa vector \(\bar a\) and exponent \(a\_exp\). a is updated in place to produce resulting BFP vector \(\tilde{A}\) with mantissa vector \(\tilde{a}\) and exponent \(\tilde{a}\_exp\).

exp is \(\tilde{a}\_exp\), the required exponent. \(\Delta{}p = \tilde{a}\_exp - a\_exp\) is the required change in exponent.

If \(\Delta{}p = 0\), the BFP vector is left unmodified.

If \(\Delta{}p > 0\), the required exponent is larger than the current exponent and an arithmetic right-shift of \(\Delta{}p\) bits is applied to the mantissas \(\bar a\). When applying a right-shift, precision may be lost by discarding the \(\Delta{}p\) least significant bits.

If \(\Delta{}p < 0\), the required exponent is smaller than the current exponent and a left-shift of \(-\Delta{}p\) bits is applied to the mantissas \(\bar a\). When left-shifting, saturation logic will be applied such that any element that can’t be represented exactly with the new exponent will saturate to the 32-bit saturation bounds.

The exponent and headroom of a are updated by this function.

Operation Performed

\[\begin{split}\begin{aligned} & \Delta{}p = \tilde{a}\_exp - a\_exp \\ & \tilde{a_k} \leftarrow sat_{32}( a_k \cdot 2^{-\Delta{}p} ) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{A} \text{ (in elements) } \end{aligned}\end{split}\]

Parameters:
  • a[inout] Input BFP vector \(\bar A\) / Output BFP vector \(\tilde{A}\)

  • exp[in] The required exponent, \(\tilde{a}\_exp\)
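The three cases for \(\Delta{}p\) can be traced with a per-element model. This is a sketch under stated assumptions, not the library's implementation: demo_use_exponent() is a hypothetical name, the right-shift is modelled as floor division, and saturation is assumed to clamp to the symmetric bounds ±(2³¹ − 1).

```c
#include <assert.h>
#include <stdint.h>

/* Re-encode one mantissa from exponent old_exp to exponent new_exp. */
int32_t demo_use_exponent(int32_t mant, int old_exp, int new_exp)
{
    int dp = new_exp - old_exp;              /* required change in exponent */
    if (dp == 0) {
        return mant;                         /* nothing to do */
    }
    if (dp > 0) {
        /* Larger exponent: arithmetic right-shift, modelled as floor division.
         * The dp least significant bits are discarded. */
        if (dp > 31) {
            dp = 31;
        }
        const int64_t div = INT64_C(1) << dp;
        int64_t q = mant / div;
        if (mant % div < 0) {
            q -= 1;
        }
        return (int32_t)q;
    }
    /* Smaller exponent: left-shift by -dp bits, saturating on overflow. */
    int shl = -dp;
    if (shl > 31) {
        shl = 31;
    }
    const int64_t wide = (int64_t)mant * (INT64_C(1) << shl);
    if (wide > INT32_MAX) {
        return INT32_MAX;
    }
    if (wide < -INT32_MAX) {
        return -INT32_MAX;                   /* assumed symmetric lower bound */
    }
    return (int32_t)wide;
}
```

As a Q-format example, the value 1.0 held with 16 fractional bits (mantissa 0x00010000, exponent -16) re-encoded to 24 fractional bits (exponent -24) becomes mantissa 0x01000000, with no loss of precision.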

headroom_t bfp_s32_headroom(bfp_s32_t *b)

Get the headroom of a 32-bit BFP vector.

The headroom of a vector is the number of bits its elements can be left-shifted without losing any information. It conveys information about the range of values that vector may contain, which is useful for determining how best to preserve precision in potentially lossy block floating-point operations.

In a BFP context, headroom applies to mantissas only, not exponents.

In particular, if the 32-bit mantissa vector \(\bar x\) has \(N\) bits of headroom, then for any element \(x_k\) of \(\bar x\)

\(-2^{31-N} \le x_k < 2^{31-N}\)

And for any element \(X_k = x_k \cdot 2^{x\_exp}\) of a BFP vector \(\bar X\)

\(-2^{31 + x\_exp - N} \le X_k < 2^{31 + x\_exp - N} \)

This function determines the headroom of b, updates b->hr with that value, and then returns b->hr.

Parameters:
  • b – BFP vector to get the headroom of

Returns:

Headroom of BFP vector b
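To make the headroom definition concrete, the following sketch computes it for a plain int32_t array by counting how far each mantissa can be doubled without leaving the 32-bit range. The function names are hypothetical, and the value reported for an all-zero vector (31 here) is a choice of this sketch rather than a statement about the library.

```c
#include <assert.h>
#include <stdint.h>

/* Bits a single mantissa can be left-shifted while staying in int32_t range. */
static unsigned demo_hr_s32(int32_t x)
{
    unsigned hr = 0;
    while (hr < 31 && x >= -(INT32_C(1) << 30) && x < (INT32_C(1) << 30)) {
        x *= 2;
        hr++;
    }
    return hr;
}

/* Headroom of a vector: the smallest headroom among its elements. */
unsigned demo_vect_headroom(const int32_t mant[], unsigned length)
{
    unsigned hr = 31;
    for (unsigned k = 0; k < length; k++) {
        const unsigned h = demo_hr_s32(mant[k]);
        if (h < hr) {
            hr = h;
        }
    }
    return hr;
}
```

For the mantissas {1000, -3, 2²⁰} the sketch reports 10 bits of headroom, set by the largest element, and every element then satisfies \(-2^{21} \le x_k < 2^{21}\) as the inequality above requires.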

void bfp_s32_shl(bfp_s32_t *a, const bfp_s32_t *b, const left_shift_t b_shl)

Apply a left-shift to the mantissas of a 32-bit BFP vector.

Each mantissa of input BFP vector \(\bar B\) is left-shifted b_shl bits and stored in the corresponding element of output BFP vector \(\bar A\).

This operation can be used to add or remove headroom from a BFP vector.

b_shl is the number of bits that each mantissa will be left-shifted. This shift is signed and arithmetic, so negative values for b_shl will right-shift the mantissas.

a and b must have been initialized (see bfp_s32_init()), and must be the same length.

This operation can be performed safely in-place on b.

Note that this operation bypasses the logic protecting the caller from saturation or underflows. Output values saturate to the symmetric 32-bit range (the open interval \((-2^{31}, 2^{31})\)). To avoid saturation, b_shl should be no greater than the headroom of b (b->hr).

Operation Performed

\[\begin{split}\begin{aligned} & a_k \leftarrow sat_{32}( \lfloor b_k \cdot 2^{b\_shl} \rfloor ) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \\ & \qquad\text{ and } b_k \text{ and } a_k \text{ are the } k\text{th mantissas from } \bar{B}\text{ and } \bar{A}\text{ respectively} \end{aligned}\end{split}\]

Parameters:
  • a[out] Output BFP vector \(\bar A\)

  • b[in] Input BFP vector \(\bar B\)

  • b_shl[in] Signed arithmetic left-shift to be applied to mantissas of \(\bar B\).

void bfp_s32_add(bfp_s32_t *a, const bfp_s32_t *b, const bfp_s32_t *c)

Add two 32-bit BFP vectors together.

Add together two input BFP vectors \(\bar B\) and \(\bar C\) and store the result in BFP vector \(\bar A\).

a, b and c must have been initialized (see bfp_s32_init()), and must be the same length.

This operation can be performed safely in-place on b or c.

Operation Performed

\[\begin{aligned} \bar{A} \leftarrow \bar{B} + \bar{C} \end{aligned}\]

Parameters:
  • a[out] Output BFP vector \(\bar A\)

  • b[in] Input BFP vector \(\bar B\)

  • c[in] Input BFP vector \(\bar C\)

void bfp_s32_add_scalar(bfp_s32_t *a, const bfp_s32_t *b, const float_s32_t c)

Add a scalar to a 32-bit BFP vector.

Add a real scalar \(c\) to input BFP vector \(\bar B\) and store the result in BFP vector \(\bar A\).

a and b must have been initialized (see bfp_s32_init()), and must be the same length.

This operation can be performed safely in-place on b.

Operation Performed

\[\begin{aligned} \bar{A} \leftarrow \bar{B} + c \end{aligned}\]

Parameters:
  • a[out] Output BFP vector \(\bar A\)

  • b[in] Input BFP vector \(\bar B\)

  • c[in] Input scalar \(c\)

void bfp_s32_sub(bfp_s32_t *a, const bfp_s32_t *b, const bfp_s32_t *c)

Subtract one 32-bit BFP vector from another.

Subtract input BFP vector \(\bar C\) from input BFP vector \(\bar B\) and store the result in BFP vector \(\bar A\).

a, b and c must have been initialized (see bfp_s32_init()), and must be the same length.

This operation can be performed safely in-place on b or c.

Operation Performed

\[\begin{aligned} \bar{A} \leftarrow \bar{B} - \bar{C} \end{aligned}\]

Parameters:
  • a[out] Output BFP vector \(\bar A\)

  • b[in] Input BFP vector \(\bar B\)

  • c[in] Input BFP vector \(\bar C\)

void bfp_s32_mul(bfp_s32_t *a, const bfp_s32_t *b, const bfp_s32_t *c)

Multiply one 32-bit BFP vector by another element-wise.

Multiply each element of input BFP vector \(\bar B\) by the corresponding element of input BFP vector \(\bar C\) and store the results in output BFP vector \(\bar A\).

a, b and c must have been initialized (see bfp_s32_init()), and must be the same length.

This operation can be performed safely in-place on b or c.

Operation Performed

\[\begin{split}\begin{aligned} & A_k \leftarrow B_k \cdot C_k \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \end{aligned}\end{split}\]

Parameters:
  • a – Output BFP vector \(\bar A\)

  • b – Input BFP vector \(\bar B\)

  • c – Input BFP vector \(\bar C\)

void bfp_s32_macc(bfp_s32_t *acc, const bfp_s32_t *b, const bfp_s32_t *c)

Multiply one 32-bit BFP vector by another element-wise and add the result to a third vector.

Operation Performed

\[\begin{split}\begin{aligned} & A_k \leftarrow A_k + B_k \cdot C_k \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \end{aligned}\end{split}\]

Parameters:
  • acc[inout] Input/Output accumulator BFP vector \(\bar A\)

  • b[in] Input BFP vector \(\bar B\)

  • c[in] Input BFP vector \(\bar C\)

void bfp_s32_nmacc(bfp_s32_t *acc, const bfp_s32_t *b, const bfp_s32_t *c)

Multiply one 32-bit BFP vector by another element-wise and subtract the result from a third vector.

Operation Performed

\[\begin{split}\begin{aligned} & A_k \leftarrow A_k - B_k \cdot C_k \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \end{aligned}\end{split}\]

Parameters:
  • acc[inout] Input/Output accumulator BFP vector \(\bar A\)

  • b[in] Input BFP vector \(\bar B\)

  • c[in] Input BFP vector \(\bar C\)

void bfp_s32_scale(bfp_s32_t *a, const bfp_s32_t *b, const float_s32_t alpha)

Multiply a 32-bit BFP vector by a scalar.

Multiply input BFP vector \(\bar B\) by scalar \(\alpha \cdot 2^{\alpha\_exp}\) and store the result in output BFP vector \(\bar A\).

a and b must have been initialized (see bfp_s32_init()), and must be the same length.

alpha represents the scalar \(\alpha \cdot 2^{\alpha\_exp}\), where \(\alpha\) is alpha.mant and \(\alpha\_exp\) is alpha.exp.

This operation can be performed safely in-place on b.

Operation Performed

\[\begin{aligned} \bar{A} \leftarrow \bar{B} \cdot \left(\alpha \cdot 2^{\alpha\_exp}\right) \end{aligned}\]

Parameters:
  • a[out] Output BFP vector \(\bar A\)

  • b[in] Input BFP vector \(\bar B\)

  • alpha[in] Scalar by which \(\bar B\) is multiplied

void bfp_s32_abs(bfp_s32_t *a, const bfp_s32_t *b)

Get the absolute values of elements of a 32-bit BFP vector.

Compute the absolute value of each element \(B_k\) of input BFP vector \(\bar B\) and store the results in output BFP vector \(\bar A\).

a and b must have been initialized (see bfp_s32_init()), and must be the same length.

This operation can be performed safely in-place on b.

Operation Performed

\[\begin{split}\begin{aligned} & A_k \leftarrow \left| B_k \right| \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{aligned}\end{split}\]

Parameters:
  • a[out] Output BFP vector \(\bar A\)

  • b[in] Input BFP vector \(\bar B\)

float_s64_t bfp_s32_sum(const bfp_s32_t *b)

Sum the elements of a 32-bit BFP vector.

Sum the elements of input BFP vector \(\bar B\) to get a result \(A = a \cdot 2^{a\_exp}\), which is returned. The returned value has a 64-bit mantissa.

b must have been initialized (see bfp_s32_init()).

Operation Performed

\[\begin{split}\begin{aligned} & A \leftarrow \sum_{k=0}^{N-1} \left( B_k \right) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{aligned}\end{split}\]

Parameters:
  • b[in] Input BFP vector \(\bar B\)

Returns:

\(A\), the sum of elements of \(\bar B\)

float_s64_t bfp_s32_dot(const bfp_s32_t *b, const bfp_s32_t *c)

Compute the inner product of two 32-bit BFP vectors.

Adds together the element-wise products of input BFP vectors \(\bar B\) and \(\bar C\) for a result \(A = a \cdot 2^{a\_exp}\), where \(a\) is the 64-bit mantissa of the result and \(a\_exp\) is its associated exponent. \(A\) is returned.

b and c must have been initialized (see bfp_s32_init()), and must be the same length.

Operation Performed

\[\begin{split}\begin{aligned} & a \cdot 2^{a\_exp} \leftarrow \sum_{k=0}^{N-1} \left( B_k \cdot C_k \right) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \end{aligned}\end{split}\]

Parameters:
  • b[in] Input BFP vector \(\bar B\)

  • c[in] Input BFP vector \(\bar C\)

Returns:

\(A\), the inner product of vectors \(\bar B\) and \(\bar C\)
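The relationship between the mantissas, the exponents and the 64-bit result can be sketched as follows. This is a simplified model with hypothetical names (demo_float_s64_t, demo_dot_s32()); it assumes the mantissas are small enough that the 64-bit sum cannot overflow, whereas the library is expected to manage headroom itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit mantissa/exponent pair. */
typedef struct {
    int64_t mant;
    int exp;
} demo_float_s64_t;

/* Inner product of B (b, b_exp) and C (c, c_exp), both of the given length. */
demo_float_s64_t demo_dot_s32(const int32_t b[], int b_exp,
                              const int32_t c[], int c_exp, unsigned length)
{
    /* (b_k * 2^b_exp) * (c_k * 2^c_exp) = (b_k * c_k) * 2^(b_exp + c_exp),
     * so the mantissa products share a single exponent. */
    demo_float_s64_t a = { 0, b_exp + c_exp };
    for (unsigned k = 0; k < length; k++) {
        a.mant += (int64_t)b[k] * c[k];
    }
    return a;
}
```

For mantissas {1, 2, 3} with exponent -2 and {4, 5, 6} with exponent -3, the sketch gives mantissa 32 and exponent -5, which is the logical value 1.0.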

void bfp_s32_clip(bfp_s32_t *a, const bfp_s32_t *b, const int32_t lower_bound, const int32_t upper_bound, const int bound_exp)

Clamp the elements of a 32-bit BFP vector to a specified range.

Each element \(A_k\) of output BFP vector \(\bar A\) is set to the corresponding element \(B_k\) of input BFP vector \(\bar B\) if it is in the range \( [ L \cdot 2^{bound\_exp}, U \cdot 2^{bound\_exp} ] \), otherwise it is set to the nearest value inside that range.

a and b must have been initialized (see bfp_s32_init()), and must be the same length.

This operation can be performed safely in-place on b.

Operation Performed

\[\begin{split}\begin{aligned} & A_k \leftarrow \begin{cases} L \cdot 2^{bound\_exp} & B_k < L \cdot 2^{bound\_exp} \\ U \cdot 2^{bound\_exp} & B_k > U \cdot 2^{bound\_exp} \\ B_k & otherwise \end{cases} \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{aligned}\end{split}\]

Parameters:
  • a[out] Output BFP vector \(\bar A\)

  • b[in] Input BFP vector \(\bar B\)

  • lower_bound[in] Mantissa of the lower clipping bound, \(L\)

  • upper_bound[in] Mantissa of the upper clipping bound, \(U\)

  • bound_exp[in] Shared exponent of the clipping bounds
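Because the clipping bounds carry their own exponent, the comparison has to be made between logical values, not raw mantissas. The sketch below shows this using double arithmetic. demo_clip() is a hypothetical helper that writes logical values to a double array; the library function instead produces a BFP vector.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Clamp each logical value b_mant[k] * 2^b_exp to the range
 * [lower_bound * 2^bound_exp, upper_bound * 2^bound_exp]. */
void demo_clip(double out[], const int32_t b_mant[], int b_exp, unsigned length,
               int32_t lower_bound, int32_t upper_bound, int bound_exp)
{
    assert(lower_bound <= upper_bound);
    const double lo = ldexp((double)lower_bound, bound_exp);
    const double hi = ldexp((double)upper_bound, bound_exp);
    for (unsigned k = 0; k < length; k++) {
        const double v = ldexp((double)b_mant[k], b_exp);
        out[k] = (v < lo) ? lo : (v > hi) ? hi : v;
    }
}
```

With mantissas {-100, 5, 100} at exponent -2 (logical values -25, 1.25, 25) and bounds L = -1, U = 2 at exponent 3 (logical range [-8, 16]), the outputs are -8, 1.25 and 16.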

void bfp_s32_rect(bfp_s32_t *a, const bfp_s32_t *b)

Rectify a 32-bit BFP vector.

Each element \(A_k\) of output BFP vector \(\bar A\) is set to the corresponding element \(B_k\) of input BFP vector \(\bar B\) if it is non-negative, otherwise it is set to \(0\).

a and b must have been initialized (see bfp_s32_init()), and must be the same length.

This operation can be performed safely in-place on b.

Operation Performed

\[\begin{split}\begin{aligned} & A_k \leftarrow \begin{cases} 0 & B_k < 0 \\ B_k & otherwise \end{cases} \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{aligned}\end{split}\]

Parameters:
  • a[out] Output BFP vector \(\bar A\)

  • b[in] Input BFP vector \(\bar B\)

void bfp_s32_to_bfp_s16(bfp_s16_t *a, const bfp_s32_t *b)

Convert a 32-bit BFP vector into a 16-bit BFP vector.

Reduces the bit-depth of each 32-bit element \(B_k\) of input BFP vector \(\bar B\) to 16 bits, and stores the 16-bit result in the corresponding element \(A_k\) of output BFP vector \(\bar A\).

a and b must have been initialized (see bfp_s32_init() and bfp_s16_init()), and must be the same length.

As much precision as possible will be retained.

Operation Performed

\[\begin{split}\begin{aligned} & A_k \overset{16-bit}{\longleftarrow} B_k \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{aligned}\end{split}\]

Parameters:
  • a[out] Output BFP vector \(\bar A\)

  • b[in] Input BFP vector \(\bar B\)

void bfp_s32_sqrt(bfp_s32_t *a, const bfp_s32_t *b)

Get the square roots of elements of a 32-bit BFP vector.

Computes the square root of each element \(B_k\) of input BFP vector \(\bar B\) and stores the results in output BFP vector \(\bar A\).

a and b must have been initialized (see bfp_s32_init()), and must be the same length.

This operation can be performed safely in-place on b.

Operation Performed

\[\begin{split}\begin{aligned} & A_k \leftarrow \sqrt{B_k} \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{aligned}\end{split}\]

Notes

  • Only the XMATH_BFP_SQRT_DEPTH_S32 (see xmath_conf.h) most significant bits of each result are computed.

  • This function only computes real roots. For any \(B_k < 0\), the corresponding output \(A_k\) is set to \(0\).

Parameters:
  • a[out] Output BFP vector \(\bar A\)

  • b[in] Input BFP vector \(\bar B\)

void bfp_s32_inverse(bfp_s32_t *a, const bfp_s32_t *b)

Get the inverses of elements of a 32-bit BFP vector.

Computes the inverse of each element \(B_k\) of input BFP vector \(\bar B\) and stores the results in output BFP vector \(\bar A\).

a and b must have been initialized (see bfp_s32_init()), and must be the same length.

This operation can be performed safely in-place on b.

Operation Performed

\[\begin{split}\begin{aligned} & A_k \leftarrow B_k^{-1} \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{aligned}\end{split}\]

Parameters:
  • a[out] Output BFP vector \(\bar A\)

  • b[in] Input BFP vector \(\bar B\)

float_s64_t bfp_s32_abs_sum(const bfp_s32_t *b)

Sum the absolute values of elements of a 32-bit BFP vector.

Sum the absolute values of elements of input BFP vector \(\bar B\) for a result \(A = a \cdot 2^{a\_exp}\), where \(a\) is a 64-bit mantissa and \(a\_exp\) is its associated exponent. \(A\) is returned.

b must have been initialized (see bfp_s32_init()).

Operation Performed

\[\begin{split}\begin{aligned} & A \leftarrow \sum_{k=0}^{N-1} \left| B_k \right| \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{aligned}\end{split}\]

Parameters:
  • b[in] Input BFP vector \(\bar B\)

Returns:

\(A\), the sum of absolute values of elements of \(\bar B\)

float_s32_t bfp_s32_mean(const bfp_s32_t *b)

Get the mean value of a 32-bit BFP vector.

Computes \(A = a \cdot 2^{a\_exp}\), the mean value of elements of input BFP vector \(\bar B\), where \(a\) is the 32-bit mantissa of the result, and \(a\_exp\) is its associated exponent. \(A\) is returned.

b must have been initialized (see bfp_s32_init()).

Operation Performed

\[\begin{split}\begin{aligned} & A \leftarrow \frac{1}{N} \sum_{k=0}^{N-1} \left( B_k \right) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{aligned}\end{split}\]

Parameters:
  • b[in] Input BFP vector \(\bar B\)

Returns:

\(A\), the mean value of \(\bar B\)’s elements

float_s64_t bfp_s32_energy(const bfp_s32_t *b)

Get the energy (sum of squares of elements) of a 32-bit BFP vector.

Computes \(A = a \cdot 2^{a\_exp}\), the sum of squares of elements of input BFP vector \(\bar B\), where \(a\) is the 64-bit mantissa of the result, and \(a\_exp\) is its associated exponent. \(A\) is returned.

b must have been initialized (see bfp_s32_init()).

Operation Performed

\[\begin{split}\begin{aligned} & A \leftarrow \sum_{k=0}^{N-1} \left( B_k^2 \right) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{aligned}\end{split}\]

Parameters:
  • b[in] Input BFP vector \(\bar B\)

Returns:

\(A\), \(\bar B\)’s energy

float_s32_t bfp_s32_rms(const bfp_s32_t *b)

Get the RMS value of elements of a 32-bit BFP vector.

Computes \(A = a \cdot 2^{a\_exp}\), the RMS value of elements of input BFP vector \(\bar B\), where \(a\) is the 32-bit mantissa of the result, and \(a\_exp\) is its associated exponent. \(A\) is returned.

The RMS (root-mean-square) value of a vector is the square root of the mean of the squares of the vector’s elements.

b must have been initialized (see bfp_s32_init()).

Operation Performed

\[\begin{split}\begin{aligned} & A \leftarrow \sqrt{\frac{1}{N}\sum_{k=0}^{N-1} \left( B_k^2 \right) } \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{aligned}\end{split}\]

Parameters:
  • b[in] Input BFP vector \(\bar B\)

Returns:

\(A\), the RMS value of \(\bar B\)’s elements

float_s32_t bfp_s32_max(const bfp_s32_t *b)

Get the maximum value of a 32-bit BFP vector.

Finds \(A\), the maximum value among elements of input BFP vector \(\bar B\). \(A\) is returned by this function.

b must have been initialized (see bfp_s32_init()).

Operation Performed

\[\begin{split}\begin{aligned} & A \leftarrow max\left(B_0, B_1, ..., B_{N-1} \right) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{aligned}\end{split}\]

Parameters:
  • b[in] Input vector

Returns:

\(A\), the value of \(\bar B\)’s maximum element

void bfp_s32_max_elementwise(bfp_s32_t *a, const bfp_s32_t *b, const bfp_s32_t *c)

Get the element-wise maximum of two 32-bit BFP vectors.

Each element of output vector \(\bar A\) is set to the maximum of the corresponding elements in the input vectors \(\bar B\) and \(\bar C\).

a, b and c must have been initialized (see bfp_s32_init()), and must be the same length.

This operation can be performed safely in-place on b, but not on c.

Operation Performed

\[\begin{split}\begin{aligned} & A_k \leftarrow max(B_k, C_k) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \end{aligned}\end{split}\]

Parameters:
  • a – Output BFP vector \(\bar A\)

  • b – Input BFP vector \(\bar B\)

  • c – Input BFP vector \(\bar C\)

float_s32_t bfp_s32_min(const bfp_s32_t *b)

Get the minimum value of a 32-bit BFP vector.

Finds \(A\), the minimum value among elements of input BFP vector \(\bar B\). \(A\) is returned by this function.

b must have been initialized (see bfp_s32_init()).

Operation Performed

\[\begin{split}\begin{aligned} & A \leftarrow min\left(B_0, B_1, ..., B_{N-1} \right) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{aligned}\end{split}\]

Parameters:
  • b[in] Input vector

Returns:

\(A\), the value of \(\bar B\)’s minimum element

void bfp_s32_min_elementwise(bfp_s32_t *a, const bfp_s32_t *b, const bfp_s32_t *c)

Get the element-wise minimum of two 32-bit BFP vectors.

Each element of output vector \(\bar A\) is set to the minimum of the corresponding elements in the input vectors \(\bar B\) and \(\bar C\).

a, b and c must have been initialized (see bfp_s32_init()), and must be the same length.

This operation can be performed safely in-place on b, but not on c.

Operation Performed

\[\begin{split}\begin{aligned} & A_k \leftarrow min(B_k, C_k) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \end{aligned}\end{split}\]

Parameters:
  • a – Output BFP vector \(\bar A\)

  • b – Input BFP vector \(\bar B\)

  • c – Input BFP vector \(\bar C\)

unsigned bfp_s32_argmax(const bfp_s32_t *b)

Get the index of the maximum value of a 32-bit BFP vector.

Finds \(a\), the index of the maximum value among the elements of input BFP vector \(\bar B\). \(a\) is returned by this function.

If i is the value returned, then the maximum value in \(\bar B\) is ldexp(b->data[i], b->exp).

Operation Performed

\[\begin{split}\begin{aligned} & a \leftarrow argmax_k\left(b_k\right) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{aligned}\end{split}\]

Notes

  • If there is a tie for maximum value, the lowest tying index is returned.

Parameters:
  • b[in] Input vector

Returns:

\(a\), the index of the maximum value from \(\bar B\)
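
The relationship between the returned index and the logical maximum can be sketched in plain C. This is a reference model only, not the library's implementation; the names ref_argmax_s32 and ref_element_value are hypothetical, and the mantissa array and exponent are passed directly rather than through a bfp_s32_t.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* Reference model: index of the largest mantissa.
   Ties resolve to the lowest index, as the Notes above describe. */
unsigned ref_argmax_s32(const int32_t *data, unsigned length)
{
    assert(data != NULL && length > 0);
    unsigned best = 0;
    for (unsigned k = 1; k < length; k++) {
        /* Strict '>' keeps the earliest index when values tie. */
        if (data[k] > data[best]) {
            best = k;
        }
    }
    return best;
}

/* Logical value of element i: the mantissa scaled by 2^exponent. */
double ref_element_value(const int32_t *data, unsigned i, int exponent)
{
    return ldexp((double)data[i], exponent);
}
```

Because all elements share one exponent, comparing mantissas alone is enough to order the logical values.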

unsigned bfp_s32_argmin(const bfp_s32_t *b)

Get the index of the minimum value of a 32-bit BFP vector.

Finds \(a\), the index of the minimum value among the elements of input BFP vector \(\bar B\). \(a\) is returned by this function.

If i is the value returned, then the minimum value in \(\bar B\) is ldexp(b->data[i], b->exp).

Operation Performed

\[\begin{split}\begin{aligned} & a \leftarrow argmin_k\left(b_k\right) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{aligned}\end{split}\]

Notes

  • If there is a tie for minimum value, the lowest tying index is returned.

Parameters:
  • b[in] Input vector

Returns:

\(a\), the index of the minimum value from \(\bar B\)

void bfp_s32_convolve_valid(bfp_s32_t *y, const bfp_s32_t *x, const int32_t b_q30[], const unsigned b_length)

Convolve a 32-bit BFP vector with a short convolution kernel (“valid” mode).

Input BFP vector \(\bar X\) is convolved with a short fixed-point convolution kernel \(\bar b\) to produce output BFP vector \(\bar Y\). In other words, this function applies the \(K\)-tap FIR filter with coefficients given by \(\bar b\) to the input signal \(\bar X\). The convolution is “valid” in the sense that no output elements are emitted where the filter taps extend beyond the bounds of the input vector, resulting in an output vector \(\bar Y\) with fewer elements.

The maximum number of filter taps \(K\) supported by this function is \(7\).

y is the output vector \(\bar Y\). If input \(\bar X\) has \(N\) elements, and the filter has \(K\) coefficients, then \(\bar Y\) has \(N-2P\) elements, where \(P = \lfloor K / 2 \rfloor\).

x is the input vector \(\bar X\) with length \(N\).

b_q30[] is the vector \(\bar b\) of filter coefficients. The coefficients of \(\bar b\) are encoded in a Q2.30 fixed-point format. The effective value of the \(i\)th coefficient is then \(b_i \cdot 2^{-30}\).

b_length is the length \(K\) of \(\bar b\) in elements (i.e. the number of filter taps). b_length must be one of \( \{ 1, 3, 5, 7 \} \).

Operation Performed

\[\begin{split}\begin{aligned} & Y_k \leftarrow \sum_{l=0}^{K-1} (X_{(k+l)} \cdot b_l \cdot 2^{-30} ) \\ & \qquad\text{ for }k\in 0\ ...\ (N-2P-1) \\ & \qquad\text{ where }P = \lfloor K/2 \rfloor \end{aligned}\end{split}\]

Parameters:
  • y[out] Output BFP vector \(\bar Y\)

  • x[in] Input BFP vector \(\bar X\)

  • b_q30[in] Convolution kernel \(\bar b\)

  • b_length[in] The number of elements \(K\) in \(\bar b\)
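
As a rough illustration of the index arithmetic, the following plain-C sketch applies a Q2.30 kernel to a mantissa array in "valid" mode. It is a reference model under stated assumptions, not the library's implementation: ref_convolve_valid is a hypothetical name, exponent and headroom handling are omitted, the result is truncated rather than rounded, and the input is assumed to have enough headroom for the 64-bit accumulator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain-C model of "valid" convolution on mantissas.
   Returns the output length N - 2P, where P = floor(K / 2). */
unsigned ref_convolve_valid(int32_t *y, const int32_t *x, unsigned n,
                            const int32_t b_q30[], unsigned b_length)
{
    assert(y != NULL && x != NULL && b_q30 != NULL);
    assert(b_length == 1 || b_length == 3 || b_length == 5 || b_length == 7);
    assert(n >= b_length);

    const unsigned p = b_length / 2;
    const unsigned out_len = n - 2 * p;

    for (unsigned k = 0; k < out_len; k++) {
        /* Assumes x has a few bits of headroom so the sum fits in 64 bits. */
        int64_t acc = 0;
        for (unsigned l = 0; l < b_length; l++) {
            acc += (int64_t)x[k + l] * b_q30[l];
        }
        /* Remove the 30 fractional bits of the Q2.30 coefficients. */
        y[k] = (int32_t)(acc / ((int64_t)1 << 30));
    }
    return out_len;
}
```

With a 3-tap kernel whose coefficients are all 1.0 (that is, 1 << 30), a 5-element input yields 3 outputs, each the sum of three neighbouring inputs.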

void bfp_s32_convolve_same(bfp_s32_t *y, const bfp_s32_t *x, const int32_t b_q30[], const unsigned b_length, const pad_mode_e padding_mode)

Convolve a 32-bit BFP vector with a short convolution kernel (“same” mode).

Input BFP vector \(\bar X\) is convolved with a short fixed-point convolution kernel \(\bar b\) to produce output BFP vector \(\bar Y\). In other words, this function applies the \(K\)-tap FIR filter with coefficients given by \(\bar b\) to the input signal \(\bar X\). The convolution mode is “same” in that the input vector is effectively padded such that the input and output vectors are the same length. The padding behavior is one of those given by pad_mode_e.

The maximum number of filter taps \(K\) supported by this function is \(7\).

y and x are the output and input BFP vectors \(\bar Y\) and \(\bar X\) respectively.

b_q30[] is the vector \(\bar b\) of filter coefficients. The coefficients of \(\bar b\) are encoded in a Q2.30 fixed-point format. The effective value of the \(i\)th coefficient is then \(b_i \cdot 2^{-30}\).

b_length is the length \(K\) of \(\bar b\) in elements (i.e. the number of filter taps). b_length must be one of \( \{ 1, 3, 5, 7 \} \).

padding_mode is one of the values from the pad_mode_e enumeration. The padding mode indicates the filter input values for filter taps that have extended beyond the bounds of the input vector \(\bar X\). See pad_mode_e for a list of supported padding modes and associated behaviors.

Operation Performed

\[\begin{split}\begin{aligned} & \tilde{x}_i = \begin{cases} \text{determined by padding mode} & i < 0 \\ \text{determined by padding mode} & i \ge N \\ x_i & \text{otherwise} \end{cases} \\ & y_k \leftarrow \sum_{l=0}^{K-1} (\tilde{x}_{(k+l-P)} \cdot b_l \cdot 2^{-30} ) \\ & \qquad\text{ for }k\in 0\ ...\ (N-1) \\ & \qquad\text{ where }P = \lfloor K/2 \rfloor \end{aligned}\end{split}\]

Note

Unlike bfp_s32_convolve_valid(), this operation cannot be performed safely in-place on x.

Parameters:
  • y[out] Output BFP vector \(\bar Y\)

  • x[in] Input BFP vector \(\bar X\)

  • b_q30[in] Convolution kernel \(\bar b\)

  • b_length[in] The number of elements \(K\) in \(\bar b\)

  • padding_mode[in] The padding mode to be applied at signal boundaries
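
The effect of padding on the index arithmetic can be sketched in plain C. This model assumes zero padding purely for illustration; which modes pad_mode_e actually offers is not restated here. ref_convolve_same_zero is a hypothetical name, exponent and headroom handling are omitted, and the result is truncated rather than rounded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain-C model of "same" convolution with zero padding (an assumed choice).
   The output has the same length n as the input. */
void ref_convolve_same_zero(int32_t *y, const int32_t *x, unsigned n,
                            const int32_t b_q30[], unsigned b_length)
{
    assert(y != NULL && x != NULL && b_q30 != NULL);
    assert(b_length == 1 || b_length == 3 || b_length == 5 || b_length == 7);
    assert(y != x); /* not safe in-place: outputs would overwrite needed inputs */

    const int p = (int)(b_length / 2);

    for (int k = 0; k < (int)n; k++) {
        int64_t acc = 0;
        for (int l = 0; l < (int)b_length; l++) {
            const int i = k + l - p;
            /* Taps that fall outside the input read a padded value (zero here). */
            const int32_t xi = (i < 0 || i >= (int)n) ? 0 : x[i];
            acc += (int64_t)xi * b_q30[l];
        }
        y[k] = (int32_t)(acc / ((int64_t)1 << 30));
    }
}
```

The y != x check mirrors the note above: output element k is written while later outputs still need the original x[k].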

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Block Floating-Point API$$$Complex 16-bit Block Floating-Point API£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/bfp/bfp_complex_s16.html#complex-16-bit-block-floating-point-api
group bfp_complex_s16_api

Functions

void bfp_complex_s16_init(bfp_complex_s16_t *a, int16_t *real_data, int16_t *imag_data, const exponent_t exp, const unsigned length, const unsigned calc_hr)

Initialize a complex 16-bit BFP vector.

This function initializes each of the fields of BFP vector a.

Unlike complex 32-bit BFP vectors (bfp_complex_s32_t), for the sake of various optimizations the real and imaginary parts of elements’ mantissas are stored in separate memory buffers.

real_data points to the memory buffer used to store the real part of each mantissa. It must be at least length * 2 bytes long, and must begin at a word-aligned address.

imag_data points to the memory buffer used to store the imaginary part of each mantissa. It must be at least length * 2 bytes long, and must begin at a word-aligned address.

exp is the exponent assigned to the BFP vector. The logical value associated with the kth element of the vector after initialization is \( data_k \cdot 2^{exp} \).

If calc_hr is false, a->hr is initialized to 0. Otherwise, the headroom of the BFP vector is calculated and used to initialize a->hr.

Parameters:
  • a[out] BFP vector to initialize

  • real_data[in] int16_t buffer used to back the real part of a

  • imag_data[in] int16_t buffer used to back the imaginary part of a

  • exp[in] Exponent of BFP vector

  • length[in] Number of elements in BFP vector

  • calc_hr[in] Boolean indicating whether the HR of the BFP vector should be calculated
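
The buffer requirements above (at least length * 2 bytes each, word-aligned) can be checked before initialization. The helpers below are hypothetical and not part of the library; they assume a 4-byte word, as on the 32-bit xcore architecture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if p sits on a 4-byte (word) boundary. */
int is_word_aligned(const void *p)
{
    return ((uintptr_t)p & 0x3u) == 0;
}

/* Hypothetical helper: minimum size in bytes of each mantissa buffer. */
unsigned min_buffer_bytes_s16(unsigned length)
{
    return length * (unsigned)sizeof(int16_t);
}

/* Hypothetical pre-flight check for the separate real and imaginary buffers. */
int buffers_ok_s16(const int16_t *real_data, const int16_t *imag_data)
{
    assert(real_data != NULL && imag_data != NULL);
    return is_word_aligned(real_data) && is_word_aligned(imag_data);
}
```

A buffer that starts one int16_t past a word boundary fails the check, which is the case the alignment requirement rules out.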

bfp_complex_s16_t bfp_complex_s16_alloc(const unsigned length)

Dynamically allocate a complex 16-bit BFP vector from the heap.

If allocation was unsuccessful, the real and imag fields of the returned vector will be NULL, and the length field will be zero. Otherwise, real and imag will point to the allocated memory and the length field will be the user-specified length. The length argument must not be zero.

This function allocates a single block of memory for both the real and imaginary parts of the BFP vector. Because all BFP functions require the mantissa buffers to begin at a word-aligned address, if length is odd, this function will allocate an extra int16_t element for the buffer.

Neither the BFP exponent, headroom, nor the elements of the allocated mantissa vector are set by this function. To set the BFP vector elements to a known value, use bfp_complex_s16_set() on the returned BFP vector.

BFP vectors allocated using this function must be deallocated using bfp_complex_s16_dealloc() to avoid a memory leak.

To initialize a BFP vector using static memory allocation, use bfp_complex_s16_init() instead.

Note

Dynamic allocation of BFP vectors relies on allocation from the heap, and offers no guarantees about the execution time. Use of this function in any time-critical section of code is highly discouraged.

Parameters:
  • length[in] The length of the BFP vector to be allocated (in elements)

Returns:

Complex 16-bit BFP vector

void bfp_complex_s16_dealloc(bfp_complex_s16_t *vector)

Deallocate a complex 16-bit BFP vector allocated by bfp_complex_s16_alloc().

Use this function to free the heap memory allocated by bfp_complex_s16_alloc().

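BFP vectors whose mantissa buffer was (successfully) dynamically allocated have a flag set which indicates as much. This function can safely be called on any bfp_complex_s16_t which has not had its flags or real manually manipulated, including:

  • bfp_complex_s16_t resulting from a successful call to bfp_complex_s16_alloc()

  • bfp_complex_s16_t resulting from an unsuccessful call to bfp_complex_s16_alloc()

  • bfp_complex_s16_t initialized with a call to bfp_complex_s16_init()

In the latter two cases, this function does nothing. In the former, the real, imag, length and flags fields of vector are cleared to zero.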

Parameters:
  • vector[in] BFP vector to be deallocated.

void bfp_complex_s16_set(bfp_complex_s16_t *a, const complex_s16_t b, const exponent_t exp)

Set all elements of a complex 16-bit BFP vector to a specified value.

The exponent of a is set to exp, and each element’s mantissa is set to b.

After performing this operation, all elements will represent the same value \(b \cdot 2^{exp}\).

a must have been initialized (see bfp_complex_s16_init()).

Parameters:
  • a[out] BFP vector to update

  • b[in] New value each complex mantissa is set to

  • exp[in] New exponent for the BFP vector

void bfp_complex_s16_use_exponent(bfp_complex_s16_t *a, const exponent_t exp)

Modify a complex 16-bit BFP vector to use a specified exponent.

This function forces complex BFP vector \(\bar A\) to use a specified exponent. The mantissa vector \(\bar a\) will be bit-shifted left or right to compensate for the changed exponent.

This function can be used, for example, before calling a fixed-point arithmetic function to ensure the underlying mantissa vector has the needed Q-format. As another example, this may be useful when communicating with peripheral devices (e.g. via I2S) that require sample data to be in a specified format.

Note that this sets the current encoding, and does not fix the exponent permanently (i.e. subsequent operations may change the exponent as usual).

If the required fixed-point Q-format is QX.Y, where Y is the number of fractional bits in the resulting mantissas, then the associated exponent (and value for parameter exp) is -Y.

a points to input BFP vector \(\bar A\), with complex mantissa vector \(\bar a\) and exponent \(a\_exp\). a is updated in place to produce resulting BFP vector \(\bar{\tilde{A}}\) with complex mantissa vector \(\bar{\tilde{a}}\) and exponent \(\tilde{a}\_exp\).

exp is \(\tilde{a}\_exp\), the required exponent. \(\Delta{}p = \tilde{a}\_exp - a\_exp\) is the required change in exponent.

If \(\Delta{}p = 0\), the BFP vector is left unmodified.

If \(\Delta{}p > 0\), the required exponent is larger than the current exponent and an arithmetic right-shift of \(\Delta{}p\) bits is applied to the mantissas \(\bar a\). When applying a right-shift, precision may be lost by discarding the \(\Delta{}p\) least significant bits.

If \(\Delta{}p < 0\), the required exponent is smaller than the current exponent and a left-shift of \(-\Delta{}p\) bits is applied to the mantissas \(\bar a\). When left-shifting, saturation logic will be applied such that any element that can’t be represented exactly with the new exponent will saturate to the 16-bit saturation bounds.

The exponent and headroom of a are updated by this function.

Operation Performed

\[\begin{split}\begin{aligned} & \Delta{}p = \tilde{a}\_exp - a\_exp \\ & \tilde{a_k} \leftarrow sat_{16}( a_k \cdot 2^{-\Delta{}p} ) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{A} \text{ (in elements) } \end{aligned}\end{split}\]

Parameters:
  • a[inout] Input BFP vector \(\bar A\) / Output BFP vector \(\bar{\tilde{A}}\)

  • exp[in] The required exponent, \(\tilde{a}\_exp\)
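
The three cases for \(\Delta{}p\) can be sketched as a plain-C reference model. This is illustrative only and not the library's implementation: ref_use_exponent_c16 and its helpers are hypothetical names, the real and imaginary mantissa arrays are passed directly, headroom is not tracked, and the saturation bounds are assumed to be the symmetric range -32767..32767.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Saturate to the symmetric 16-bit range (assumed bounds: -32767..32767). */
static int16_t sat16(int64_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < -INT16_MAX) return -INT16_MAX;
    return (int16_t)v;
}

/* Re-encode one mantissa for an exponent change of delta = new_exp - old_exp. */
static int16_t reencode(int16_t m, int delta)
{
    if (delta > 0) {
        /* Larger exponent: floor-divide by 2^delta (arithmetic right shift). */
        const int s = (delta > 16) ? 16 : delta;
        const int32_t v = m;
        return (int16_t)((v >= 0) ? (v >> s) : -((-v + ((1 << s) - 1)) >> s));
    }
    if (delta < 0) {
        /* Smaller exponent: multiply by 2^(-delta), saturating on overflow. */
        const int s = (-delta > 16) ? 16 : -delta;
        return sat16((int64_t)m * ((int64_t)1 << s));
    }
    return m; /* delta == 0: unchanged */
}

/* Apply the same exponent change to the real and imaginary parts. */
void ref_use_exponent_c16(int16_t *re, int16_t *im, unsigned n,
                          int *exp, int new_exp)
{
    assert(re != NULL && im != NULL && exp != NULL);
    const int delta = new_exp - *exp;
    for (unsigned k = 0; k < n; k++) {
        re[k] = reencode(re[k], delta);
        im[k] = reencode(im[k], delta);
    }
    *exp = new_exp;
}
```

For a Q-format target such as Q1.15, the caller would pass new_exp = -15, following the rule that the exponent is the negated number of fractional bits.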

headroom_t bfp_complex_s16_headroom(bfp_complex_s16_t *b)

Get the headroom of a complex 16-bit BFP vector.

The headroom of a complex vector is the number of bits that the real and imaginary parts of each of its elements can be left-shifted without losing any information. It conveys information about the range of values that vector may contain, which is useful for determining how best to preserve precision in potentially lossy block floating-point operations.

In a BFP context, headroom applies to mantissas only, not exponents.

In particular, if the complex 16-bit mantissa vector \(\bar x\) has \(N\) bits of headroom, then for any element \(x_k\) of \(\bar x\)

\(-2^{15-N} \le Re\{x_k\} < 2^{15-N}\)

and

\(-2^{15-N} \le Im\{x_k\} < 2^{15-N}\)

And for any element \(X_k = x_k \cdot 2^{x\_exp}\) of a complex BFP vector \(\bar X\)

\(-2^{15 + x\_exp - N} \le Re\{X_k\} < 2^{15 + x\_exp - N} \)

and

\(-2^{15 + x\_exp - N} \le Im\{X_k\} < 2^{15 + x\_exp - N} \)

This function determines the headroom of b, updates b->hr with that value, and then returns b->hr.

Parameters:
  • b – complex BFP vector to get the headroom of

Returns:

Headroom of complex BFP vector b
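
The inequalities above translate directly into a plain-C reference model. This is a sketch, not the library's implementation: ref_complex_s16_headroom is a hypothetical name, the mantissa arrays are passed directly, and the value reported for an all-zero vector (15) is an assumption of this model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Headroom of one 16-bit value: the largest N in 0..15 such that
   -2^(15-N) <= x < 2^(15-N). */
static unsigned hr_s16(int16_t x)
{
    unsigned n = 15;
    while (n > 0 && !(x >= -(1 << (15 - n)) && x < (1 << (15 - n)))) {
        n--;
    }
    return n;
}

/* Headroom of a complex vector: the minimum over all real and imaginary parts. */
unsigned ref_complex_s16_headroom(const int16_t *re, const int16_t *im,
                                  unsigned length)
{
    assert(re != NULL && im != NULL);
    unsigned hr = 15;
    for (unsigned k = 0; k < length; k++) {
        const unsigned hr_re = hr_s16(re[k]);
        const unsigned hr_im = hr_s16(im[k]);
        if (hr_re < hr) hr = hr_re;
        if (hr_im < hr) hr = hr_im;
    }
    return hr;
}
```

The element with the largest magnitude in either part determines the result, since it is the first to overflow under a left shift.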

void bfp_complex_s16_shl(bfp_complex_s16_t *a, const bfp_complex_s16_t *b, const left_shift_t b_shl)

Apply a left-shift to the mantissas of a complex 16-bit BFP vector.

Each complex mantissa of input BFP vector \(\bar B\) is left-shifted b_shl bits and stored in the corresponding element of output BFP vector \(\bar A\).

This operation can be used to add or remove headroom from a BFP vector.

b_shl is the number of bits that the real and imaginary parts of each mantissa will be left-shifted. This shift is signed and arithmetic, so negative values for b_shl will right-shift the mantissas.

a and b must have been initialized (see bfp_complex_s16_init()), and must be the same length.

This operation can be performed safely in-place on b.

Note that this operation bypasses the logic protecting the caller from saturation or underflows. Output values saturate to the symmetric 16-bit range (the open interval \((-2^{15}, 2^{15})\)). To avoid saturation, b_shl should be no greater than the headroom of b (b->hr).

Operation Performed

\[\begin{split}\begin{aligned} & Re\{a_k\} \leftarrow sat_{16}( \lfloor Re\{b_k\} \cdot 2^{b\_shl} \rfloor ) \\ & Im\{a_k\} \leftarrow sat_{16}( \lfloor Im\{b_k\} \cdot 2^{b\_shl} \rfloor ) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \\ & \qquad\text{ and } b_k \text{ and } a_k \text{ are the } k\text{th mantissas from } \bar{B}\text{ and } \bar{A}\text{ respectively} \end{aligned}\end{split}\]

Parameters:
  • a[out] Complex output BFP vector \(\bar A\)

  • b[in] Complex input BFP vector \(\bar B\)

  • b_shl[in] Signed arithmetic left-shift to be applied to mantissas of \(\bar B\).

void bfp_complex_s16_real_mul(bfp_complex_s16_t *a, const bfp_complex_s16_t *b, const bfp_s16_t *c)

Multiply a complex 16-bit BFP vector element-wise by a real 16-bit BFP vector.

Each complex output element \(A_k\) of complex output BFP vector \(\bar A\) is set to the complex product of \(B_k\) and \(C_k\), the corresponding elements of complex input BFP vector \(\bar B\) and real input BFP vector \(\bar C\) respectively.

a, b and c must have been initialized (see bfp_complex_s16_init() and bfp_s16_init()), and must be the same length.

This operation can be performed safely in-place on b.

Operation Performed

\[\begin{split}\begin{aligned} & A_k \leftarrow B_k \cdot C_k \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \end{aligned}\end{split}\]

Parameters:
  • a[out] Output complex BFP vector \(\bar A\)

  • b[in] Input complex BFP vector \(\bar B\)

  • c[in] Input real BFP vector \(\bar C\)

void bfp_complex_s16_mul(bfp_complex_s16_t *a, const bfp_complex_s16_t *b, const bfp_complex_s16_t *c)

Multiply one complex 16-bit BFP vector element-wise by another.

Each complex output element \(A_k\) of complex output BFP vector \(\bar A\) is set to the complex product of \(B_k\) and \(C_k\), the corresponding elements of complex input BFP vectors \(\bar B\) and \(\bar C\) respectively.

a, b and c must have been initialized (see bfp_complex_s16_init()), and must be the same length.

This operation can be performed safely in-place on b or c.

Operation Performed

\[\begin{split}\begin{aligned} & A_k \leftarrow B_k \cdot C_k \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \end{aligned}\end{split}\]

Parameters:
  • a[out] Output complex BFP vector \(\bar A\)

  • b[in] Input complex BFP vector \(\bar B\)

  • c[in] Input complex BFP vector \(\bar C\)

void bfp_complex_s16_conj_mul(bfp_complex_s16_t *a, const bfp_complex_s16_t *b, const bfp_complex_s16_t *c)

Multiply one complex 16-bit BFP vector element-wise by the complex conjugate of another.

Each complex output element \(A_k\) of complex output BFP vector \(\bar A\) is set to the complex product of \(B_k\), the corresponding element of complex input BFP vector \(\bar B\), and \((C_k)^*\), the complex conjugate of the corresponding element of complex input BFP vector \(\bar C\).

Operation Performed

\[\begin{split}\begin{aligned} & A_k \leftarrow B_k \cdot (C_k)^* \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \\ & \qquad\text{and } (C_k)^* \text{ is the complex conjugate of } C_k \end{aligned}\end{split}\]

Parameters:
  • a[out] Output complex BFP vector \(\bar A\)

  • b[in] Input complex BFP vector \(\bar B\)

  • c[in] Input complex BFP vector \(\bar C\)
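
The conjugate product can be written out on the mantissas as a plain-C reference model. This is a sketch only: ref_conj_mul is a hypothetical name, the products are kept at full precision in 64-bit outputs, and the rescaling to 16-bit mantissas that a BFP result would need is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Full-precision mantissa products for A_k = B_k * conj(C_k).
   Returns the exponent of the products, b_exp + c_exp. */
int ref_conj_mul(int64_t *a_re, int64_t *a_im,
                 const int16_t *b_re, const int16_t *b_im, int b_exp,
                 const int16_t *c_re, const int16_t *c_im, int c_exp,
                 unsigned length)
{
    assert(a_re != NULL && a_im != NULL);
    assert(b_re != NULL && b_im != NULL && c_re != NULL && c_im != NULL);
    for (unsigned k = 0; k < length; k++) {
        const int64_t br = b_re[k], bi = b_im[k];
        const int64_t cr = c_re[k], ci = c_im[k];
        /* (br + j*bi) * (cr - j*ci) */
        a_re[k] = br * cr + bi * ci;
        a_im[k] = bi * cr - br * ci;
    }
    return b_exp + c_exp;
}
```

For example, with B = 1 + 2j and C = 3 + 4j the product B * conj(C) is 11 + 2j.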

void bfp_complex_s16_macc(bfp_complex_s16_t *acc, const bfp_complex_s16_t *b, const bfp_complex_s16_t *c)

Multiply one complex 16-bit BFP vector by another element-wise and add the result to a third vector.

Operation Performed

\[\begin{split}\begin{aligned} & A_k \leftarrow A_k + (B_k \cdot C_k) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \end{aligned}\end{split}\]

Parameters:
  • acc[inout] Input/Output accumulator complex BFP vector \(\bar A\)

  • b[in] Input complex BFP vector \(\bar B\)

  • c[in] Input complex BFP vector \(\bar C\)

void bfp_complex_s16_nmacc(bfp_complex_s16_t *acc, const bfp_complex_s16_t *b, const bfp_complex_s16_t *c)

Multiply one complex 16-bit BFP vector by another element-wise and subtract the result from a third vector.

Operation Performed

\[\begin{split}\begin{aligned} & A_k \leftarrow A_k - (B_k \cdot C_k) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \end{aligned}\end{split}\]

Parameters:
  • acc[inout] Input/Output accumulator complex BFP vector \(\bar A\)

  • b[in] Input complex BFP vector \(\bar B\)

  • c[in] Input complex BFP vector \(\bar C\)

void bfp_complex_s16_conj_macc(bfp_complex_s16_t *acc, const bfp_complex_s16_t *b, const bfp_complex_s16_t *c)

Multiply one complex 16-bit BFP vector by the complex conjugate of another element-wise and add the result to a third vector.

Operation Performed

\[\begin{split}\begin{aligned} & A_k \leftarrow A_k + (B_k \cdot C_k^*) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \\ & \qquad\text{and } (C_k)^* \text{ is the complex conjugate of } C_k \end{aligned}\end{split}\]

Parameters:
  • acc[inout] Input/Output accumulator complex BFP vector \(\bar A\)

  • b[in] Input complex BFP vector \(\bar B\)

  • c[in] Input complex BFP vector \(\bar C\)

void bfp_complex_s16_conj_nmacc(bfp_complex_s16_t *acc, const bfp_complex_s16_t *b, const bfp_complex_s16_t *c)

Multiply one complex 16-bit BFP vector by the complex conjugate of another element-wise and subtract the result from a third vector.

Operation Performed

\[\begin{split}\begin{aligned} & A_k \leftarrow A_k - (B_k \cdot C_k^*) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \\ & \qquad\text{and } (C_k)^* \text{ is the complex conjugate of } C_k \end{aligned}\end{split}\]

Parameters:
  • acc[inout] Input/Output accumulator complex BFP vector \(\bar A\)

  • b[in] Input complex BFP vector \(\bar B\)

  • c[in] Input complex BFP vector \(\bar C\)

void bfp_complex_s16_real_scale(bfp_complex_s16_t *a, const bfp_complex_s16_t *b, const float alpha)

Multiply a complex 16-bit BFP vector by a real scalar.

Each complex output element \(A_k\) of complex output BFP vector \(\bar A\) is set to the complex product of \(B_k\), the corresponding element of complex input BFP vector \(\bar B\), and real scalar \(\alpha\cdot 2^{\alpha\_exp}\), where \(\alpha\) and \(\alpha\_exp\) are the mantissa and exponent respectively of parameter alpha.

a and b must have been initialized (see bfp_complex_s16_init()), and must be the same length.

This operation can be performed safely in-place on b.

Operation Performed

\[\begin{aligned} \bar{A} \leftarrow \bar{B} \cdot \left( \alpha \cdot 2^{\alpha\_exp} \right) \end{aligned}\]

Parameters:
  • a[out] Output complex BFP vector \(\bar A\)

  • b[in] Input complex BFP vector \(\bar B\)

  • alpha[in] Real scalar by which \(\bar B\) is multiplied

void bfp_complex_s16_scale(bfp_complex_s16_t *a, const bfp_complex_s16_t *b, const float_complex_s16_t alpha)

Multiply a complex 16-bit BFP vector by a complex scalar.

Each complex output element \(A_k\) of complex output BFP vector \(\bar A\) is set to the complex product of \(B_k\), the corresponding element of complex input BFP vector \(\bar B\), and complex scalar \(\alpha\cdot 2^{\alpha\_exp}\), where \(\alpha\) and \(\alpha\_exp\) are the complex mantissa and exponent respectively of parameter alpha.

a and b must have been initialized (see bfp_complex_s16_init()), and must be the same length.

This operation can be performed safely in-place on b.

Operation Performed

\[\begin{aligned} \bar{A} \leftarrow \bar{B} \cdot \left( \alpha \cdot 2^{\alpha\_exp} \right) \end{aligned}\]

Parameters:
  • a[out] Output complex BFP vector \(\bar A\)

  • b[in] Input complex BFP vector \(\bar B\)

  • alpha[in] Complex scalar by which \(\bar B\) is multiplied

void bfp_complex_s16_add(bfp_complex_s16_t *a, const bfp_complex_s16_t *b, const bfp_complex_s16_t *c)

Add one complex 16-bit BFP vector to another.

Each complex output element \(A_k\) of complex output BFP vector \(\bar A\) is set to the sum of \(B_k\) and \(C_k\), the corresponding elements of complex input BFP vectors \(\bar B\) and \(\bar C\) respectively.

a, b and c must have been initialized (see bfp_complex_s16_init()), and must be the same length.

This operation can be performed safely in-place on b or c.

Operation Performed

\[\begin{aligned} \bar{A} \leftarrow \bar{B} + \bar{C} \end{aligned}\]

Parameters:
  • a[out] Output complex BFP vector \(\bar A\)

  • b[in] Input complex BFP vector \(\bar B\)

  • c[in] Input complex BFP vector \(\bar C\)

void bfp_complex_s16_add_scalar(bfp_complex_s16_t *a, const bfp_complex_s16_t *b, const float_complex_s16_t c)

Add a complex scalar to a complex 16-bit BFP vector.

Add a complex scalar \(c\) to input BFP vector \(\bar B\) and store the result in BFP vector \(\bar A\).

a and b must have been initialized (see bfp_complex_s16_init()), and must be the same length.

This operation can be performed safely in-place on b.

Operation Performed

\[\begin{aligned} \bar{A} \leftarrow \bar{B} + c \end{aligned}\]

Parameters:
  • a[out] Output complex BFP vector \(\bar A\)

  • b[in] Input complex BFP vector \(\bar B\)

  • c[in] Input complex scalar \(c\)

void bfp_complex_s16_sub(bfp_complex_s16_t *a, const bfp_complex_s16_t *b, const bfp_complex_s16_t *c)

Subtract one complex 16-bit BFP vector from another.

Each complex output element \(A_k\) of complex output BFP vector \(\bar A\) is set to the difference between \(B_k\) and \(C_k\), the corresponding elements of complex input BFP vectors \(\bar B\) and \(\bar C\) respectively.

a, b and c must have been initialized (see bfp_complex_s16_init()), and must be the same length.

This operation can be performed safely in-place on b or c.

Operation Performed

\[\begin{aligned} \bar{A} \leftarrow \bar{B} - \bar{C} \end{aligned}\]

Parameters:
  • a[out] Output complex BFP vector \(\bar A\)

  • b[in] Input complex BFP vector \(\bar B\)

  • c[in] Input complex BFP vector \(\bar C\)

void bfp_complex_s16_to_bfp_complex_s32(bfp_complex_s32_t *a, const bfp_complex_s16_t *b)

Convert a complex 16-bit BFP vector to a complex 32-bit BFP vector.

Each complex 32-bit output element \(A_k\) of complex output BFP vector \(\bar A\) is set to the value of \(B_k\), the corresponding element of complex 16-bit input BFP vector \(\bar B\), sign-extended to 32 bits.

a and b must have been initialized (see bfp_complex_s32_init() and bfp_complex_s16_init()), and must be the same length.

Operation Performed

\[\begin{split}\begin{aligned} & A_k \overset{32-bit}{\longleftarrow} B_k \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{aligned}\end{split}\]

Parameters:
  • a[out] Output complex 32-bit BFP vector \(\bar A\)

  • b[in] Input complex 16-bit BFP vector \(\bar B\)

void bfp_complex_s16_squared_mag(bfp_s16_t *a, const bfp_complex_s16_t *b)

Get the squared magnitude of each element of a complex 16-bit BFP vector.

Each element \(A_k\) of real output BFP vector \(\bar A\) is set to the squared magnitude of \(B_k\), the corresponding element of complex input BFP vector \(\bar B\).

a and b must have been initialized (see bfp_s16_init() and bfp_complex_s16_init()), and must be the same length.

Operation Performed

\[\begin{split}\begin{aligned} & A_k \leftarrow B_k \cdot (B_k)^* \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \\ & \qquad\text{ and } (B_k)^* \text{ is the complex conjugate of } B_k \end{aligned}\end{split}\]

Parameters:
  • a[out] Output real BFP vector \(\bar A\)

  • b[in] Input complex BFP vector \(\bar B\)

void bfp_complex_s16_mag(bfp_s16_t *a, const bfp_complex_s16_t *b)

Get the magnitude of each element of a complex 16-bit BFP vector.

Each element \(A_k\) of real output BFP vector \(\bar A\) is set to the magnitude of \(B_k\), the corresponding element of complex input BFP vector \(\bar B\).

a and b must have been initialized (see bfp_s16_init() and bfp_complex_s16_init()), and must be the same length.

Operation Performed

\[\begin{split}\begin{aligned} & A_k \leftarrow \left| B_k \right| \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{aligned}\end{split}\]

Parameters:
  • a[out] Output real BFP vector \(\bar A\)

  • b[in] Input complex BFP vector \(\bar B\)

float_complex_s32_t bfp_complex_s16_sum(const bfp_complex_s16_t *b)

Get the sum of elements of a complex 16-bit BFP vector.

The elements of complex input BFP vector \(\bar B\) are summed together. The result is a complex 32-bit floating-point scalar \(a\), which is returned.

b must have been initialized (see bfp_complex_s16_init()).

Operation Performed

\[\begin{split}\begin{aligned} & a \leftarrow \sum_{k=0}^{N-1} \left( b_k \cdot 2^{B\_exp} \right) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{aligned}\end{split}\]

Parameters:
  • b[in] Input complex BFP vector \(\bar B\)

Returns:

\(a\), the sum of vector \(\bar B\)’s elements

void bfp_complex_s16_conjugate(bfp_complex_s16_t *a, const bfp_complex_s16_t *b)

Get the complex conjugate of each element of a complex 16-bit BFP vector.

Each element \(A_k\) of complex output BFP vector \(\bar A\) is set to the complex conjugate of \(B_k\), the corresponding element of complex input BFP vector \(\bar B\).

Operation Performed

\[\begin{split}\begin{aligned} & A_k \leftarrow B_k^* \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \\ & \qquad\text{and } B_k^* \text{ is the complex conjugate of } B_k \end{aligned}\end{split}\]

Parameters:
  • a[out] Output complex BFP vector \(\bar A\)

  • b[in] Input complex BFP vector \(\bar B\)

float_s64_t bfp_complex_s16_energy(const bfp_complex_s16_t *b)

Get the energy of a complex 16-bit BFP vector.

The energy of a complex 16-bit BFP vector here is the sum of the squared magnitudes of each of the vector’s elements.

Operation Performed

\[\begin{split}\begin{aligned} & a \leftarrow \sum_{k=0}^{N-1} \left( \left|b_k \cdot 2^{B\_exp}\right|^2 \right) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{aligned}\end{split}\]

Parameters:
  • b[in] Input complex BFP vector \(\bar B\)

Returns:

\(a\), the energy of vector \(\bar B\)
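
Since \(\left|b_k \cdot 2^{B\_exp}\right|^2 = \left|b_k\right|^2 \cdot 2^{2 \cdot B\_exp}\), the energy can be accumulated on the mantissas and paired with a doubled exponent. The plain-C sketch below illustrates this; it is not the library's implementation, and ref_float_s64 and ref_complex_s16_energy are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit mantissa/exponent pair. */
typedef struct {
    int64_t mant;
    int exp;
} ref_float_s64;

/* Sum of squared magnitudes; the result exponent is twice the vector's. */
ref_float_s64 ref_complex_s16_energy(const int16_t *re, const int16_t *im,
                                     int exp, unsigned length)
{
    assert(re != NULL && im != NULL);
    ref_float_s64 e = { 0, 2 * exp };
    for (unsigned k = 0; k < length; k++) {
        /* |b_k|^2 = re^2 + im^2, widened so the products cannot overflow. */
        e.mant += (int64_t)re[k] * re[k] + (int64_t)im[k] * im[k];
    }
    return e;
}
```

Each term is at most \(2^{31}\), so a 64-bit accumulator has ample room for any vector length that fits in an unsigned argument.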

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Block Floating-Point API$$$Complex 32-bit Block Floating-Point API£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/bfp/bfp_complex_s32.html#complex-32-bit-block-floating-point-api
group bfp_complex_s32_api

Functions

void bfp_complex_s32_init(bfp_complex_s32_t *a, complex_s32_t *data, const exponent_t exp, const unsigned length, const unsigned calc_hr)

Initialize a 32-bit complex BFP vector.

This function initializes each of the fields of a.

Unlike bfp_complex_s16_t, complex 32-bit BFP vectors use a single buffer to store the real and imaginary parts of each mantissa, such that the imaginary part of element k follows the real part of element k in memory. data points to the memory buffer used to store elements of the vector, and must be at least length * 8 bytes long.

exp is the exponent assigned to the BFP vector. The logical value associated with the kth complex element of the vector after initialization will be \( \left(data_{2k} + i\cdot data_{2k+1} \right)\cdot2^{exp} \).

If calc_hr is false, a->hr is initialized to 0. Otherwise, the headroom of the BFP vector is calculated and used to initialize a->hr.

Parameters:
  • a[out] BFP vector struct to initialize

  • data[in] complex_s32_t buffer used to back a

  • exp[in] Exponent of BFP vector

  • length[in] Number of elements in BFP vector

  • calc_hr[in] Boolean indicating whether the HR of the BFP vector should be calculated
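
The interleaved layout described above (real part of element k at flat index 2k, imaginary part at 2k+1) can be illustrated with a small plain-C sketch. The struct ref_complex_s32 is a hypothetical mirror of an 8-byte complex element, not the library's type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of an interleaved complex element: real part first. */
typedef struct {
    int32_t re;
    int32_t im;
} ref_complex_s32;

/* Read element k from a flat int32_t view of the buffer:
   flat[2k] is the real part, flat[2k + 1] the imaginary part. */
ref_complex_s32 ref_element_from_flat(const int32_t *flat, unsigned k)
{
    assert(flat != NULL);
    ref_complex_s32 z = { flat[2 * k], flat[2 * k + 1] };
    return z;
}
```

Each element occupies two 32-bit words, which is where the length * 8 byte minimum for the data buffer comes from.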

bfp_complex_s32_t bfp_complex_s32_alloc(const unsigned length)

Dynamically allocate a complex 32-bit BFP vector from the heap.

If allocation was unsuccessful, the data field of the returned vector will be NULL, and the length field will be zero. Otherwise, data will point to the allocated memory and the length field will be the user-specified length. The length argument must not be zero.

Neither the BFP exponent, headroom, nor the elements of the allocated mantissa vector are set by this function. To set the BFP vector elements to a known value, use bfp_complex_s32_set() on the returned BFP vector.

BFP vectors allocated using this function must be deallocated using bfp_complex_s32_dealloc() to avoid a memory leak.

To initialize a BFP vector using static memory allocation, use bfp_complex_s32_init() instead.

Note

Dynamic allocation of BFP vectors relies on allocation from the heap, and offers no guarantees about the execution time. Use of this function in any time-critical section of code is highly discouraged.

Parameters:
  • length[in] The length of the BFP vector to be allocated (in elements)

Returns:

Complex 32-bit BFP vector

void bfp_complex_s32_dealloc(bfp_complex_s32_t *vector)

Deallocate a complex 32-bit BFP vector allocated by bfp_complex_s32_alloc().

Use this function to free the heap memory allocated by bfp_complex_s32_alloc().

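BFP vectors whose mantissa buffer was (successfully) dynamically allocated have a flag set which indicates as much. This function can safely be called on any bfp_complex_s32_t which has not had its flags or data manually manipulated, including:

  • a bfp_complex_s32_t resulting from a successful call to bfp_complex_s32_alloc()

  • a bfp_complex_s32_t resulting from an unsuccessful call to bfp_complex_s32_alloc()

  • a bfp_complex_s32_t initialized with a call to bfp_complex_s32_init()

In the latter two cases, this function does nothing. In the former, the data, length and flags fields of vector are cleared to zero.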

Parameters:
  • vector[in] BFP vector to be deallocated.
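
The allocate/deallocate contract described above can be modelled in a few lines. The sketch below is an assumption-laden stand-in, not the library implementation: demo_vec_t, DEMO_FLAG_DYNAMIC, demo_vec_alloc and demo_vec_dealloc are hypothetical names, and the flag layout is invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a complex BFP vector; field names are illustrative. */
typedef struct {
    int32_t *data;    /* interleaved re/im mantissas, 8 bytes per element */
    unsigned length;  /* number of complex elements */
    unsigned flags;   /* bit 0 set when data came from the heap */
} demo_vec_t;

#define DEMO_FLAG_DYNAMIC 1u

/* On failure, data is NULL and length is zero. */
static demo_vec_t demo_vec_alloc(unsigned length)
{
    demo_vec_t v = { NULL, 0, 0 };
    assert(length != 0);  /* the length argument must not be zero */
    int32_t *buf = malloc((size_t)length * 2 * sizeof(int32_t));
    if (buf != NULL) {
        v.data = buf;
        v.length = length;
        v.flags = DEMO_FLAG_DYNAMIC;
    }
    return v;
}

/* Frees only buffers that were marked as dynamically allocated. */
static void demo_vec_dealloc(demo_vec_t *v)
{
    if ((v->flags & DEMO_FLAG_DYNAMIC) == 0) {
        return;  /* statically backed or failed allocation: do nothing */
    }
    free(v->data);
    v->data = NULL;
    v->length = 0;
    v->flags = 0;
}
```

Because the model keys the free on the flag rather than on the pointer, calling demo_vec_dealloc on a statically backed vector leaves it untouched, which mirrors the behaviour documented for the real function.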

void bfp_complex_s32_set(bfp_complex_s32_t *a, const complex_s32_t b, const exponent_t exp)

Set all elements of a complex 32-bit BFP vector to a specified value.

The exponent of a is set to exp, and each element’s mantissa is set to b.

After performing this operation, all elements will represent the same value \(b \cdot 2^{exp}\).

a must have been initialized (see bfp_complex_s32_init()).

Parameters:
  • a[out] BFP vector to update

  • b[in] New value each complex mantissa is set to

  • exp[in] New exponent for the BFP vector

void bfp_complex_s32_use_exponent(bfp_complex_s32_t *a, const exponent_t exp)

Modify a complex 32-bit BFP vector to use a specified exponent.

This function forces complex BFP vector \(\bar A\) to use a specified exponent. The mantissa vector \(\bar a\) will be bit-shifted left or right to compensate for the changed exponent.

This function can be used, for example, before calling a fixed-point arithmetic function to ensure the underlying mantissa vector has the needed Q-format. As another example, this may be useful when communicating with peripheral devices (e.g. via I2S) that require sample data to be in a specified format.

Note that this sets the current encoding, and does not fix the exponent permanently (i.e. subsequent operations may change the exponent as usual).

If the required fixed-point Q-format is QX.Y, where Y is the number of fractional bits in the resulting mantissas, then the associated exponent (and value for parameter exp) is -Y.

a points to input BFP vector \(\bar A\), with complex mantissa vector \(\bar a\) and exponent \(a\_exp\). a is updated in place to produce resulting BFP vector \( \tilde{A} \) with complex mantissa vector \( \tilde{a} \) and exponent \(\tilde{a}\_exp\).

exp is \(\tilde{a}\_exp\), the required exponent. \(\Delta{}p = \tilde{a}\_exp - a\_exp\) is the required change in exponent.

If \(\Delta{}p = 0\), the BFP vector is left unmodified.

If \(\Delta{}p > 0\), the required exponent is larger than the current exponent and an arithmetic right-shift of \(\Delta{}p\) bits is applied to the mantissas \(\bar a\). When applying a right-shift, precision may be lost by discarding the \(\Delta{}p\) least significant bits.

If \(\Delta{}p < 0\), the required exponent is smaller than the current exponent and a left-shift of \(-\Delta{}p\) bits is applied to the mantissas \(\bar a\). When left-shifting, saturation logic will be applied such that any element that can’t be represented exactly with the new exponent will saturate to the 32-bit saturation bounds.

The exponent and headroom of a are updated by this function.

Operation Performed

\[\begin{split}\begin{aligned} & \Delta{}p = \tilde{a}\_exp - a\_exp \\ & \tilde{a_k} \leftarrow sat_{32}( a_k \cdot 2^{-\Delta{}p} ) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{A} \text{ (in elements) } \end{aligned}\end{split}\]

Parameters:
  • a[inout] Input BFP vector \(\bar A\) / Output BFP vector \(\tilde{A}\)

  • exp[in] The required exponent, \(\tilde{a}\_exp\)
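
The shift-and-saturate behaviour can be sketched as a plain C model. This is not the library's implementation: demo_rescale and demo_use_exponent are hypothetical names, the mantissas are held in a flat interleaved int32_t buffer, and the symmetric saturation bounds of plus or minus (2^31 - 1) are an assumption. Headroom tracking is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: apply sat32(mant * 2^(-delta_p)) to one mantissa. */
static int32_t demo_rescale(int32_t mant, int delta_p)
{
    int64_t v = mant;
    if (delta_p > 0) {
        /* Larger exponent: arithmetic right-shift, low bits are discarded. */
        const int s = (delta_p > 31) ? 31 : delta_p;
        const int64_t d = (int64_t)1 << s;
        /* Floor division emulates an arithmetic shift for negative values. */
        v = (v >= 0) ? v / d : -((-v + d - 1) / d);
    } else if (delta_p < 0) {
        /* Smaller exponent: left-shift by -delta_p bits, then saturate. */
        const int s = (-delta_p > 31) ? 31 : -delta_p;
        v = v * ((int64_t)1 << s);
        if (v > INT32_MAX)  v = INT32_MAX;
        if (v < -INT32_MAX) v = -INT32_MAX;  /* assumed symmetric bound */
    }
    return (int32_t)v;
}

/* Hypothetical model: force `length` complex elements to use new_exp. */
static void demo_use_exponent(int32_t *data, unsigned length,
                              int *exp, int new_exp)
{
    assert(data != NULL && exp != NULL);
    const int delta_p = new_exp - *exp;
    for (unsigned i = 0; i < 2 * length; i++) {
        data[i] = demo_rescale(data[i], delta_p);
    }
    *exp = new_exp;
}
```

For example, a vector with exponent -20 that must be presented in a format with 24 fractional bits would be passed new_exp = -24, which left-shifts every mantissa by 4 bits.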

headroom_t bfp_complex_s32_headroom(bfp_complex_s32_t *b)

Get the headroom of a complex 32-bit BFP vector.

The headroom of a complex vector is the number of bits that the real and imaginary parts of each of its elements can be left-shifted without losing any information. It conveys information about the range of values that vector may contain, which is useful for determining how best to preserve precision in potentially lossy block floating-point operations.

In a BFP context, headroom applies to mantissas only, not exponents.

In particular, if the complex 32-bit mantissa vector \(\bar x\) has \(N\) bits of headroom, then for any element \(x_k\) of \(\bar x\)

\(-2^{31-N} \le Re\{x_k\} < 2^{31-N}\)

and

\(-2^{31-N} \le Im\{x_k\} < 2^{31-N}\)

And for any element \(X_k = x_k \cdot 2^{x\_exp}\) of a complex BFP vector \(\bar X\)

\(-2^{31 + x\_exp - N} \le Re\{X_k\} < 2^{31 + x\_exp - N} \)

and

\(-2^{31 + x\_exp - N} \le Im\{X_k\} < 2^{31 + x\_exp - N} \)

This function determines the headroom of b, updates b->hr with that value, and then returns b->hr.

Parameters:
  • b – complex BFP vector to get the headroom of

Returns:

Headroom of complex BFP vector b
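
The bounds above translate directly into a reference computation. The following is a minimal sketch under the assumption that headroom is capped at 31 bits; demo_hr_s32 and demo_complex_headroom are hypothetical names and the code is a portable model, not the optimised library routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: largest N (capped at 31) such that
 * -2^(31-N) <= x < 2^(31-N). */
static unsigned demo_hr_s32(int32_t x)
{
    unsigned n = 0;
    while (n < 31) {
        /* Bound that x must respect to have n + 1 bits of headroom. */
        const int64_t bound = (int64_t)1 << (30 - n);
        if (x < -bound || x >= bound) {
            break;
        }
        n++;
    }
    return n;
}

/* Hypothetical model: the headroom of an interleaved complex vector is the
 * minimum headroom over all real and imaginary parts. */
static unsigned demo_complex_headroom(const int32_t *data, unsigned length)
{
    assert(data != NULL && length > 0);
    unsigned hr = 31;
    for (unsigned i = 0; i < 2 * length; i++) {
        const unsigned h = demo_hr_s32(data[i]);
        if (h < hr) {
            hr = h;
        }
    }
    return hr;
}
```

A single large real or imaginary part therefore limits the headroom of the whole vector, which is why one outlier reduces the precision available to every other element.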

void bfp_complex_s32_shl(bfp_complex_s32_t *a, const bfp_complex_s32_t *b, const left_shift_t b_shl)

Apply a left-shift to the mantissas of a complex 32-bit BFP vector.

Each complex mantissa of input BFP vector \(\bar B\) is left-shifted b_shl bits and stored in the corresponding element of output BFP vector \(\bar A\).

This operation can be used to add or remove headroom from a BFP vector.

b_shl is the number of bits that the real and imaginary parts of each mantissa will be left-shifted. This shift is signed and arithmetic, so negative values for b_shl will right-shift the mantissas.

a and b must have been initialized (see bfp_complex_s32_init()), and must be the same length.

This operation can be performed safely in-place on b.

Note that this operation bypasses the logic protecting the caller from saturation or underflows. Output values saturate to the symmetric 32-bit range (the open interval \((-2^{31}, 2^{31})\)). To avoid saturation, b_shl should be no greater than the headroom of b (b->hr).

Operation Performed

\[\begin{split}\begin{aligned} & Re\{a_k\} \leftarrow sat_{32}( \lfloor Re\{b_k\} \cdot 2^{b\_shl} \rfloor ) \\ & Im\{a_k\} \leftarrow sat_{32}( \lfloor Im\{b_k\} \cdot 2^{b\_shl} \rfloor ) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \\ & \qquad\text{ and } b_k \text{ and } a_k \text{ are the } k\text{th mantissas from } \bar{B}\text{ and } \bar{A}\text{ respectively} \end{aligned}\end{split}\]

Parameters:
  • a[out] Complex output BFP vector \(\bar A\)

  • b[in] Complex input BFP vector \(\bar B\)

  • b_shl[in] Signed arithmetic left-shift to be applied to mantissas of \(\bar B\).

void bfp_complex_s32_real_mul(bfp_complex_s32_t *a, const bfp_complex_s32_t *b, const bfp_s32_t *c)

Multiply a complex 32-bit BFP vector element-wise by a real 32-bit BFP vector.

Each complex output element \(A_k\) of complex output BFP vector \(\bar A\) is set to the complex product of \(B_k\) and \(C_k\), the corresponding elements of complex input BFP vector \(\bar B\) and real input BFP vector \(\bar C\) respectively.

a, b and c must have been initialized (see bfp_complex_s32_init() and bfp_s32_init()), and must be the same length.

This operation can be performed safely in-place on b.

Operation Performed

\[\begin{split}\begin{aligned} & A_k \leftarrow B_k \cdot C_k \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \end{aligned}\end{split}\]

Parameters:
  • a[out] Output complex BFP vector \(\bar A\)

  • b[in] Input complex BFP vector \(\bar B\)

  • c[in] Input real BFP vector \(\bar C\)

void bfp_complex_s32_mul(bfp_complex_s32_t *a, const bfp_complex_s32_t *b, const bfp_complex_s32_t *c)

Multiply one complex 32-bit BFP vector element-wise by another.

Each complex output element \(A_k\) of complex output BFP vector \(\bar A\) is set to the complex product of \(B_k\) and \(C_k\), the corresponding elements of complex input BFP vectors \(\bar B\) and \(\bar C\) respectively.

a, b and c must have been initialized (see bfp_complex_s32_init()), and must be the same length.

This operation can be performed safely in-place on b or c.

Operation Performed

\[\begin{split}\begin{aligned} & A_k \leftarrow B_k \cdot C_k \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \end{aligned}\end{split}\]

Parameters:
  • a[out] Output complex BFP vector \(\bar A\)

  • b[in] Input complex BFP vector \(\bar B\)

  • c[in] Input complex BFP vector \(\bar C\)

void bfp_complex_s32_conj_mul(bfp_complex_s32_t *a, const bfp_complex_s32_t *b, const bfp_complex_s32_t *c)

Multiply one complex 32-bit BFP vector element-wise by the complex conjugate of another.

Each complex output element \(A_k\) of complex output BFP vector \(\bar A\) is set to the complex product of \(B_k\), the corresponding element of complex input BFP vectors \(\bar B\), and \((C_k)^*\), the complex conjugate of the corresponding element of complex input BFP vector \(\bar C\).

a, b and c must have been initialized (see bfp_complex_s32_init()), and must be the same length.

This operation can be performed safely in-place on b or c.

Operation Performed

\[\begin{split}\begin{aligned} & A_k \leftarrow B_k \cdot (C_k)^* \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \\ & \qquad\text{and } (C_k)^* \text{ is the complex conjugate of } C_k \end{aligned}\end{split}\]

Parameters:
  • a[out] Output complex BFP vector \(\bar A\)

  • b[in] Input complex BFP vector \(\bar B\)

  • c[in] Input complex BFP vector \(\bar C\)
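
When checking results from the element-wise multiply functions, a double-precision reference can be handy. The sketch below assumes interleaved mantissa buffers and uses hypothetical names throughout (demo_cplx_t, demo_pow2, demo_value, demo_conj_mul); it evaluates the logical values rather than reproducing the library's fixed-point arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical double-precision complex value used only for checking. */
typedef struct { double re, im; } demo_cplx_t;

/* Hypothetical helper: multiply v by 2^e without depending on libm. */
static double demo_pow2(double v, int e)
{
    while (e > 0) { v *= 2.0; e--; }
    while (e < 0) { v *= 0.5; e++; }
    return v;
}

/* Logical value of element k of an interleaved mantissa buffer. */
static demo_cplx_t demo_value(const int32_t *data, unsigned k, int exp)
{
    assert(data != NULL);
    demo_cplx_t z;
    z.re = demo_pow2((double)data[2 * k], exp);
    z.im = demo_pow2((double)data[2 * k + 1], exp);
    return z;
}

/* Reference for A_k = B_k * conj(C_k):
 * (br + i*bi)(cr - i*ci) = (br*cr + bi*ci) + i*(bi*cr - br*ci). */
static demo_cplx_t demo_conj_mul(demo_cplx_t b, demo_cplx_t c)
{
    demo_cplx_t a;
    a.re = b.re * c.re + b.im * c.im;
    a.im = b.im * c.re - b.re * c.im;
    return a;
}
```

With B = 1 + 2i (mantissas {4, 8}, exponent -2) and C = 3 - i (mantissas {6, -2}, exponent -1), the reference yields 1 + 7i.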

void bfp_complex_s32_macc(bfp_complex_s32_t *acc, const bfp_complex_s32_t *b, const bfp_complex_s32_t *c)

Multiply one complex 32-bit BFP vector by another element-wise and add the result to a third vector.

Operation Performed

\[\begin{split}\begin{aligned} & A_k \leftarrow A_k + (B_k \cdot C_k) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \end{aligned}\end{split}\]

Parameters:
  • acc[inout] Input/Output accumulator complex BFP vector \(\bar A\)

  • b[in] Input complex BFP vector \(\bar B\)

  • c[in] Input complex BFP vector \(\bar C\)

void bfp_complex_s32_nmacc(bfp_complex_s32_t *acc, const bfp_complex_s32_t *b, const bfp_complex_s32_t *c)

Multiply one complex 32-bit BFP vector by another element-wise and subtract the result from a third vector.

Operation Performed

\[\begin{split}\begin{aligned} & A_k \leftarrow A_k - (B_k \cdot C_k) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \end{aligned}\end{split}\]

Parameters:
  • acc[inout] Input/Output accumulator complex BFP vector \(\bar A\)

  • b[in] Input complex BFP vector \(\bar B\)

  • c[in] Input complex BFP vector \(\bar C\)

void bfp_complex_s32_conj_macc(bfp_complex_s32_t *acc, const bfp_complex_s32_t *b, const bfp_complex_s32_t *c)

Multiply one complex 32-bit BFP vector by the complex conjugate of another element-wise and add the result to a third vector.

Operation Performed

\[\begin{split}\begin{aligned} & A_k \leftarrow A_k + (B_k \cdot C_k^*) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \\ & \qquad\text{and } (C_k)^* \text{ is the complex conjugate of } C_k \end{aligned}\end{split}\]

Parameters:
  • acc[inout] Input/Output accumulator complex BFP vector \(\bar A\)

  • b[in] Input complex BFP vector \(\bar B\)

  • c[in] Input complex BFP vector \(\bar C\)

void bfp_complex_s32_conj_nmacc(bfp_complex_s32_t *acc, const bfp_complex_s32_t *b, const bfp_complex_s32_t *c)

Multiply one complex 32-bit BFP vector by the complex conjugate of another element-wise and subtract the result from a third vector.

Operation Performed

\[\begin{split}\begin{aligned} & A_k \leftarrow A_k - (B_k \cdot C_k^*) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \\ & \qquad\text{and } (C_k)^* \text{ is the complex conjugate of } C_k \end{aligned}\end{split}\]

Parameters:
  • acc[inout] Input/Output accumulator complex BFP vector \(\bar A\)

  • b[in] Input complex BFP vector \(\bar B\)

  • c[in] Input complex BFP vector \(\bar C\)

void bfp_complex_s32_real_scale(bfp_complex_s32_t *a, const bfp_complex_s32_t *b, const float_s32_t alpha)

Multiply a complex 32-bit BFP vector by a real scalar.

Each complex output element \(A_k\) of complex output BFP vector \(\bar A\) is set to the complex product of \(B_k\), the corresponding element of complex input BFP vector \(\bar B\), and real scalar \(\alpha\cdot 2^{\alpha\_exp}\), where \(\alpha\) and \(\alpha\_exp\) are the mantissa and exponent respectively of parameter alpha.

a and b must have been initialized (see bfp_complex_s32_init()), and must be the same length.

This operation can be performed safely in-place on b.

Operation Performed

\[\begin{aligned} \bar{A} \leftarrow \bar{B} \cdot \left( \alpha \cdot 2^{\alpha\_exp} \right) \end{aligned}\]

Parameters:
  • a[out] Output complex BFP vector \(\bar A\)

  • b[in] Input complex BFP vector \(\bar B\)

  • alpha[in] Real scalar by which \(\bar B\) is multiplied
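
One way to picture the scaling is as a 64-bit mantissa product followed by a fixed right-shift that the output exponent absorbs. The sketch below is an illustration under stated assumptions, not the library's algorithm: demo_float_s32_t and demo_real_scale are hypothetical names, the shift of 30 bits is fixed rather than chosen from headroom, and results saturate symmetrically.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a real scalar stored as mant * 2^exp. */
typedef struct { int32_t mant; int exp; } demo_float_s32_t;

/* Hypothetical model: scale interleaved complex mantissas b by alpha.
 * Returns the exponent of the result; a may alias b. */
static int demo_real_scale(int32_t *a, const int32_t *b, unsigned length,
                           int b_exp, demo_float_s32_t alpha)
{
    assert(a != NULL && b != NULL);
    const int shr = 30;  /* assumed fixed shift */
    for (unsigned i = 0; i < 2 * length; i++) {
        /* 64-bit product, then drop 30 bits (truncating toward zero). */
        int64_t p = ((int64_t)b[i] * alpha.mant) / ((int64_t)1 << shr);
        if (p > INT32_MAX)  p = INT32_MAX;
        if (p < -INT32_MAX) p = -INT32_MAX;
        a[i] = (int32_t)p;
    }
    /* The shift is compensated in the exponent of the result. */
    return b_exp + alpha.exp + shr;
}
```

For instance, the scalar 0.5 can be encoded as mantissa 2^30 with exponent -31. Scaling the element 1.0 - 2.0i (mantissas {2^20, -2^21}, exponent -20) then gives mantissas {2^20, -2^21} with exponent -21, that is 0.5 - 1.0i.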

void bfp_complex_s32_scale(bfp_complex_s32_t *a, const bfp_complex_s32_t *b, const float_complex_s32_t alpha)

Multiply a complex 32-bit BFP vector by a complex scalar.

Each complex output element \(A_k\) of complex output BFP vector \(\bar A\) is set to the complex product of \(B_k\), the corresponding element of complex input BFP vector \(\bar B\), and complex scalar \(\alpha\cdot 2^{\alpha\_exp}\), where \(\alpha\) and \(\alpha\_exp\) are the complex mantissa and exponent respectively of parameter alpha.

a and b must have been initialized (see bfp_complex_s32_init()), and must be the same length.

This operation can be performed safely in-place on b.

Operation Performed

\[\begin{aligned} \bar{A} \leftarrow \bar{B} \cdot \left( \alpha \cdot 2^{\alpha\_exp} \right) \end{aligned}\]

Parameters:
  • a[out] Output complex BFP vector \(\bar A\)

  • b[in] Input complex BFP vector \(\bar B\)

  • alpha[in] Complex scalar by which \(\bar B\) is multiplied

void bfp_complex_s32_add(bfp_complex_s32_t *a, const bfp_complex_s32_t *b, const bfp_complex_s32_t *c)

Add one complex 32-bit BFP vector to another.

Each complex output element \(A_k\) of complex output BFP vector \(\bar A\) is set to the sum of \(B_k\) and \(C_k\), the corresponding elements of complex input BFP vectors \(\bar B\) and \(\bar C\) respectively.

a, b and c must have been initialized (see bfp_complex_s32_init()), and must be the same length.

This operation can be performed safely in-place on b or c.

Operation Performed

\[\begin{aligned} \bar{A} \leftarrow \bar{B} + \bar{C} \end{aligned}\]

Parameters:
  • a[out] Output complex BFP vector \(\bar A\)

  • b[in] Input complex BFP vector \(\bar B\)

  • c[in] Input complex BFP vector \(\bar C\)

void bfp_complex_s32_add_scalar(bfp_complex_s32_t *a, const bfp_complex_s32_t *b, const float_complex_s32_t c)

Add a complex scalar to a complex 32-bit BFP vector.

Add a complex scalar \(c\) to input BFP vector \(\bar B\) and store the result in BFP vector \(\bar A\).

a and b must have been initialized (see bfp_complex_s32_init()), and must be the same length.

This operation can be performed safely in-place on b.

Operation Performed

\[\begin{aligned} \bar{A} \leftarrow \bar{B} + c \end{aligned}\]

Parameters:
  • a[out] Output complex BFP vector \(\bar A\)

  • b[in] Input complex BFP vector \(\bar B\)

  • c[in] Input complex scalar \(c\)

void bfp_complex_s32_sub(bfp_complex_s32_t *a, const bfp_complex_s32_t *b, const bfp_complex_s32_t *c)

Subtract one complex 32-bit BFP vector from another.

Each complex output element \(A_k\) of complex output BFP vector \(\bar A\) is set to the difference between \(B_k\) and \(C_k\), the corresponding elements of complex input BFP vectors \(\bar B\) and \(\bar C\) respectively.

a, b and c must have been initialized (see bfp_complex_s32_init()), and must be the same length.

This operation can be performed safely in-place on b or c.

Operation Performed

\[\begin{aligned} \bar{A} \leftarrow \bar{B} - \bar{C} \end{aligned}\]

Parameters:
  • a[out] Output complex BFP vector \(\bar A\)

  • b[in] Input complex BFP vector \(\bar B\)

  • c[in] Input complex BFP vector \(\bar C\)

void bfp_complex_s32_to_bfp_complex_s16(bfp_complex_s16_t *a, const bfp_complex_s32_t *b)

Convert a complex 32-bit BFP vector to a complex 16-bit BFP vector.

Each complex 16-bit output element \(A_k\) of complex output BFP vector \(\bar A\) is set to the value of \(B_k\), the corresponding element of complex 32-bit input BFP vector \(\bar B\), with its bit-depth reduced to 16 bits.

a and b must have been initialized (see bfp_complex_s16_init() and bfp_complex_s32_init()), and must be the same length.

This function preserves as much precision as possible.

Operation Performed

\[\begin{split}\begin{aligned} & A_k \overset{16-bit}{\longleftarrow} B_k \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{aligned}\end{split}\]

Parameters:
  • a[out] Output complex 16-bit BFP vector \(\bar A\)

  • b[in] Input complex 32-bit BFP vector \(\bar B\)
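
A plausible way to preserve precision when narrowing is to use the headroom of the 32-bit vector to decide how far to shift, so that the largest mantissa just fills 16 bits. The sketch below assumes that strategy; it is not taken from the library source. demo_hr_s32 and demo_to_s16 are hypothetical names, and the 16-bit output is kept interleaved for brevity, whereas bfp_complex_s16_t stores real and imaginary parts in separate buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: headroom of one 32-bit value (0..31). */
static unsigned demo_hr_s32(int32_t x)
{
    unsigned n = 0;
    while (n < 31) {
        const int64_t bound = (int64_t)1 << (30 - n);
        if (x < -bound || x >= bound) break;
        n++;
    }
    return n;
}

/* Hypothetical model: narrow interleaved 32-bit mantissas to 16 bits.
 * Returns the exponent of the 16-bit result. */
static int demo_to_s16(int16_t *a, const int32_t *b, unsigned length, int b_exp)
{
    assert(a != NULL && b != NULL);
    unsigned hr = 31;
    for (unsigned i = 0; i < 2 * length; i++) {
        const unsigned h = demo_hr_s32(b[i]);
        if (h < hr) hr = h;
    }
    /* Positive shr discards low bits; negative shr uses spare headroom. */
    const int shr = 16 - (int)hr;
    for (unsigned i = 0; i < 2 * length; i++) {
        if (shr >= 0) {
            a[i] = (int16_t)(b[i] / ((int32_t)1 << shr));  /* truncates */
        } else {
            a[i] = (int16_t)(b[i] * ((int32_t)1 << -shr));
        }
    }
    return b_exp + shr;
}
```

The returned exponent compensates for the shift, so the logical values are unchanged apart from the bits lost to truncation.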

void bfp_complex_s32_squared_mag(bfp_s32_t *a, const bfp_complex_s32_t *b)

Get the squared magnitude of each element of a complex 32-bit BFP vector.

Each element \(A_k\) of real output BFP vector \(\bar A\) is set to the squared magnitude of \(B_k\), the corresponding element of complex input BFP vector \(\bar B\).

a and b must have been initialized (see bfp_s32_init() and bfp_complex_s32_init()), and must be the same length.

Operation Performed

\[\begin{split}\begin{aligned} & A_k \leftarrow B_k \cdot (B_k)^* \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \\ & \qquad\text{ and } (B_k)^* \text{ is the complex conjugate of } B_k \end{aligned}\end{split}\]

Parameters:
  • a[out] Output real BFP vector \(\bar A\)

  • b[in] Input complex BFP vector \(\bar B\)

void bfp_complex_s32_mag(bfp_s32_t *a, const bfp_complex_s32_t *b)

Get the magnitude of each element of a complex 32-bit BFP vector.

Each element \(A_k\) of real output BFP vector \(\bar A\) is set to the magnitude of \(B_k\), the corresponding element of complex input BFP vector \(\bar B\).

a and b must have been initialized (see bfp_s32_init() and bfp_complex_s32_init()), and must be the same length.

Operation Performed

\[\begin{split}\begin{aligned} & A_k \leftarrow \left| B_k \right| \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{aligned}\end{split}\]

Parameters:
  • a[out] Output real BFP vector \(\bar A\)

  • b[in] Input complex BFP vector \(\bar B\)

float_complex_s64_t bfp_complex_s32_sum(const bfp_complex_s32_t *b)

Get the sum of elements of a complex 32-bit BFP vector.

The elements of complex input BFP vector \(\bar B\) are summed together. The result is a complex 64-bit floating-point scalar \(a\), which is returned.

b must have been initialized (see bfp_complex_s32_init()).

Operation Performed

\[\begin{split}\begin{aligned} & a \leftarrow \sum_{k=0}^{N-1} \left( b_k \cdot 2^{B\_exp} \right) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{aligned}\end{split}\]

Parameters:
  • b[in] Input complex BFP vector \(\bar B\)

Returns:

\(a\), the sum of vector \(\bar B\)’s elements
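
Since every element shares the exponent \(B\_exp\), the sum reduces to adding mantissas in a wide accumulator. The sketch below illustrates that idea with hypothetical names (demo_float_complex_s64_t, demo_sum) and an interleaved mantissa buffer; any exponent adjustment the library might apply to the result is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a complex scalar with 64-bit mantissas. */
typedef struct { int64_t re, im; int exp; } demo_float_complex_s64_t;

/* Hypothetical model: sum the mantissas and keep the vector's exponent. */
static demo_float_complex_s64_t demo_sum(const int32_t *data,
                                         unsigned length, int exp)
{
    assert(data != NULL);
    demo_float_complex_s64_t a = { 0, 0, exp };
    for (unsigned k = 0; k < length; k++) {
        a.re += data[2 * k];      /* real parts */
        a.im += data[2 * k + 1];  /* imaginary parts */
    }
    return a;
}
```

The 64-bit accumulators leave 32 bits of growth above any single mantissa, which is why the result type is wider than the vector's elements.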

void bfp_complex_s32_conjugate(bfp_complex_s32_t *a, const bfp_complex_s32_t *b)

Get the complex conjugate of each element of a complex 32-bit BFP vector.

Each element \(A_k\) of complex output BFP vector \(\bar A\) is set to the complex conjugate of \(B_k\), the corresponding element of complex input BFP vector \(\bar B\).

Operation Performed

\[\begin{split}\begin{aligned} & A_k \leftarrow B_k^* \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \\ & \qquad\text{and } B_k^* \text{ is the complex conjugate of } B_k \end{aligned}\end{split}\]

Parameters:
  • a[out] Output complex BFP vector \(\bar A\)

  • b[in] Input complex BFP vector \(\bar B\)

float_s64_t bfp_complex_s32_energy(const bfp_complex_s32_t *b)

Get the energy of a complex 32-bit BFP vector.

The energy of a complex 32-bit BFP vector here is the sum of the squared magnitudes of each of the vector’s elements.

Operation Performed

\[\begin{split}\begin{aligned} & a \leftarrow \sum_{k=0}^{N-1} \left( \left|b_k \cdot 2^{B\_exp}\right|^2 \right) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{aligned}\end{split}\]

Parameters:
  • b[in] Input complex BFP vector \(\bar B\)

Returns:

\(a\), the energy of vector \(\bar B\)
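
A double-precision reference for the energy follows from \(|b_k \cdot 2^{B\_exp}|^2 = |b_k|^2 \cdot 2^{2 \cdot B\_exp}\). The sketch below is intended only for checking results; demo_pow2 and demo_energy are hypothetical names and the mantissas are assumed to be interleaved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: multiply v by 2^e without depending on libm. */
static double demo_pow2(double v, int e)
{
    while (e > 0) { v *= 2.0; e--; }
    while (e < 0) { v *= 0.5; e++; }
    return v;
}

/* Hypothetical reference: sum of squared magnitudes of all elements. */
static double demo_energy(const int32_t *data, unsigned length, int exp)
{
    assert(data != NULL);
    double acc = 0.0;
    for (unsigned i = 0; i < 2 * length; i++) {
        const double m = (double)data[i];
        acc += m * m;  /* re^2 and im^2 both contribute */
    }
    /* Squaring doubles the exponent. */
    return demo_pow2(acc, 2 * exp);
}
```

For mantissas {3, 4, 0, -2} with exponent -1 (the values 1.5 + 2i and -i), the energy is 6.25 + 1 = 7.25.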

void bfp_complex_s32_make(bfp_complex_s32_t *a, const bfp_s32_t *b, const bfp_s32_t *c)

Create complex 32-bit BFP vector from real and imaginary parts.

Create a complex 32-bit BFP vector as the sum of a real vector \(\bar B\) and imaginary vector \(\bar{C} i\).

a, b and c must have been initialized (see bfp_complex_s32_init() and bfp_s32_init()), and must be the same length. &a->data[0] must be a double-word-aligned address.

Operation Performed

\[\begin{aligned} & \bar{A} \leftarrow \bar{B} + \bar{C} i \end{aligned}\]

Parameters:
  • a[out] Complex BFP output vector \(\bar A\)

  • b[in] Real BFP input vector \(\bar B\)

  • c[in] Real BFP input vector \(\bar C\)
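
Because the two real inputs may carry different exponents, their mantissas have to be brought to a common exponent before they can be interleaved. The sketch below shows the simplest possible approach, which is an assumption rather than the library's method: it adopts the larger of the two exponents and right-shifts the other input, ignoring headroom. demo_shr and demo_make are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: arithmetic right-shift by s >= 0 bits (floor). */
static int32_t demo_shr(int32_t v, int s)
{
    if (s > 31) s = 31;
    const int64_t d = (int64_t)1 << s;
    const int64_t w = v;
    return (int32_t)((w >= 0) ? w / d : -((-w + d - 1) / d));
}

/* Hypothetical model: a = b + i*c with interleaved output mantissas.
 * Returns the exponent shared by the real and imaginary parts of a. */
static int demo_make(int32_t *a, const int32_t *b, int b_exp,
                     const int32_t *c, int c_exp, unsigned length)
{
    assert(a != NULL && b != NULL && c != NULL);
    const int a_exp = (b_exp > c_exp) ? b_exp : c_exp;
    for (unsigned k = 0; k < length; k++) {
        a[2 * k]     = demo_shr(b[k], a_exp - b_exp);  /* real part */
        a[2 * k + 1] = demo_shr(c[k], a_exp - c_exp);  /* imaginary part */
    }
    return a_exp;
}
```

This simplification can discard low-order bits of the input with the smaller exponent; a production implementation would be expected to use the inputs' headroom to limit that loss.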

void bfp_complex_s32_real_part(bfp_s32_t *a, const bfp_complex_s32_t *b)

Extract the real part of a complex 32-bit BFP vector.

This function populates the real 32-bit BFP vector \(\bar A\) with the real part of complex 32-bit BFP vector \(\bar B\).

&b->data[0] must be a double-word-aligned address.

Operation Performed

\[\begin{aligned} & \bar{A} \leftarrow {Real}\{\bar{B}\} \end{aligned}\]

Parameters:
  • a[out] Real BFP output vector \(\bar A\)

  • b[in] Complex BFP input vector \(\bar B\)

void bfp_complex_s32_imag_part(bfp_s32_t *a, const bfp_complex_s32_t *b)

Extract the imaginary part of a complex 32-bit BFP vector.

This function populates the real 32-bit BFP vector \(\bar A\) with the imaginary part of complex 32-bit BFP vector \(\bar B\).

&b->data[0] must be a double-word-aligned address.

Operation Performed

\[\begin{aligned} & \bar{A} \leftarrow {Imag}\{\bar{B}\} \end{aligned}\]

Parameters:
  • a[out] Real BFP output vector \(\bar A\)

  • b[in] Complex BFP input vector \(\bar B\)

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Discrete Cosine Transform API£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/dct/dct_index.html#discrete-cosine-transform-api

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Discrete Cosine Transform API$$$DCT API quick reference£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/dct/dct_index.html#dct-api-quick-reference

Note: The forward DCTs are Type-II. The inverse of the Type-II DCT is the Type-III DCT, so Type-II and Type-III are supported here.

DCT functions - quick reference

| Brief | Forward Function | Inverse Function |
|---|---|---|
| 6-point DCT | dct6_forward() | dct6_inverse() |
| 8-point DCT | dct8_forward() | dct8_inverse() |
| 12-point DCT | dct12_forward() | dct12_inverse() |
| 16-point DCT | dct16_forward() | dct16_inverse() |
| 24-point DCT | dct24_forward() | dct24_inverse() |
| 32-point DCT | dct32_forward() | dct32_inverse() |
| 48-point DCT | dct48_forward() | dct48_inverse() |
| 64-point DCT | dct64_forward() | dct64_inverse() |
| 8-by-8 2-dimensional DCT | dct8x8_forward() | dct8x8_inverse() |

group dct_api

Functions

void dct6_forward(int32_t y[6], const int32_t x[6])

6-point 32-bit forward DCT.

This function performs a 6-point forward type-II DCT on input vector \(\bar x\), and populates output vector \(\bar y\) with the result. To avoid possible overflow or saturation, output \(\bar y\) is scaled down by a factor of \(2^4\) (see dct6_exp).

This operation may be safely performed in-place if x and y point to the same vector.

x and y must point to 8-byte-aligned addresses.

Operation Performed

\[\begin{split}\begin{aligned} & y_k \leftarrow \frac{1}{2^4} \left( 2\sum_{n=0}^{N-1} x_n \cos\left( k\pi \frac{2n+1}{2N} \right) \right) \\ & \qquad\text{for } k = 0,1,\dots,(N-1) \\ & \qquad\text{with } N = 6 \\ \end{aligned}\end{split}\]

Parameters:
  • y[out] Output vector \(\bar y\)

  • x[in] Input vector \(\bar x\)

Throws ET_LOAD_STORE:

Raised if `x` or `y` is not double word-aligned (See Note: Vector Alignment)
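
The formula above can be evaluated directly as a floating-point reference for checking the fixed-point output. The sketch below is such a reference for N = 6 only, with hypothetical names (demo_cos_q, demo_cos12, demo_dct6_forward). It returns doubles rather than int32_t, uses a small cosine table so that it does not depend on libm, and makes no attempt to mirror the optimised implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_N 6

/* cos(m * pi / 12) for m = 0..6 (one quarter period). */
static const double demo_cos_q[7] = {
    1.0, 0.9659258262890683, 0.8660254037844386, 0.7071067811865476,
    0.5, 0.2588190451025208, 0.0
};

/* Hypothetical helper: cos(m * pi / 12) for any non-negative integer m. */
static double demo_cos12(unsigned m)
{
    m %= 24;                 /* one full period is 24 steps */
    if (m > 12) m = 24 - m;  /* cos(2*pi - t) = cos(t) */
    return (m > 6) ? -demo_cos_q[12 - m] : demo_cos_q[m];  /* cos(pi - t) = -cos(t) */
}

/* Reference: y_k = 2^-4 * 2 * sum_n x_n * cos(k*pi*(2n+1)/(2N)), N = 6. */
static void demo_dct6_forward(double y[DEMO_N], const int32_t x[DEMO_N])
{
    assert(x != NULL && y != NULL);
    for (unsigned k = 0; k < DEMO_N; k++) {
        double acc = 0.0;
        for (unsigned n = 0; n < DEMO_N; n++) {
            /* k*pi*(2n+1)/12 is k*(2n+1) steps of pi/12. */
            acc += (double)x[n] * demo_cos12(k * (2 * n + 1));
        }
        y[k] = 2.0 * acc / 16.0;
    }
}
```

A constant input of 1024 produces y_0 = 2 * 6 * 1024 / 16 = 768 and values that are zero to within rounding error for every other k.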

void dct8_forward(int32_t y[8], const int32_t x[8])

8-point 32-bit forward DCT.

This function performs an 8-point forward type-II DCT on input vector \(\bar x\), and populates output vector \(\bar y\) with the result. To avoid possible overflow or saturation, output \(\bar y\) is scaled down by a factor of \(2^4\) (see dct8_exp).

This operation may be safely performed in-place if x and y point to the same vector.

x and y must point to 8-byte-aligned addresses.

Operation Performed

\[\begin{split}\begin{aligned} & y_k \leftarrow \frac{1}{2^4} \left( 2\sum_{n=0}^{N-1} x_n \cos\left( k\pi \frac{2n+1}{2N} \right) \right) \\ & \qquad\text{for } k = 0,1,\dots,(N-1) \\ & \qquad\text{with } N = 8 \\ \end{aligned}\end{split}\]

Parameters:
  • y[out] Output vector \(\bar y\)

  • x[in] Input vector \(\bar x\)

Throws ET_LOAD_STORE:

Raised if `x` or `y` is not double word-aligned (See Note: Vector Alignment)
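
The alignment requirement can be met in portable C11 by declaring the buffers with alignas. The sketch below shows one way to do that and to check a pointer before passing it on; demo_x, demo_y, demo_is_dword_aligned and demo_require_aligned are hypothetical names, and the library may provide its own alignment helpers that are not shown here.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Buffers declared on 8-byte (double-word) boundaries using C11 alignas. */
static alignas(8) int32_t demo_x[8];
static alignas(8) int32_t demo_y[8];

/* Hypothetical helper: non-zero when p lies on an 8-byte boundary. */
static int demo_is_dword_aligned(const void *p)
{
    return ((uintptr_t)p % 8u) == 0;
}

/* Hypothetical debug guard to call before handing a buffer to a DCT. */
static void demo_require_aligned(const void *p)
{
    assert(demo_is_dword_aligned(p));
}
```

A pointer into the middle of an aligned int32_t array, such as &demo_x[1], is only 4-byte aligned and would fail this check.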

void dct12_forward(int32_t y[12], const int32_t x[12])

12-point 32-bit forward DCT.

This function performs a 12-point forward type-II DCT on input vector \(\bar x\), and populates output vector \(\bar y\) with the result. To avoid possible overflow or saturation, output \(\bar y\) is scaled down by a factor of \(2^7\) (see dct12_exp).

This operation may be safely performed in-place if x and y point to the same vector.

x and y must point to 8-byte-aligned addresses.

Operation Performed

\[\begin{split}\begin{aligned} & y_k \leftarrow \frac{1}{2^7} \left( 2\sum_{n=0}^{N-1} x_n \cos\left( k\pi \frac{2n+1}{2N} \right) \right) \\ & \qquad\text{for } k = 0,1,\dots,(N-1) \\ & \qquad\text{with } N = 12 \\ \end{aligned}\end{split}\]

Parameters:
  • y[out] Output vector \(\bar y\)

  • x[in] Input vector \(\bar x\)

Throws ET_LOAD_STORE:

Raised if `x` or `y` is not double word-aligned (See Note: Vector Alignment)

void dct16_forward(int32_t y[16], const int32_t x[16])

16-point 32-bit forward DCT.

This function performs a 16-point forward type-II DCT on input vector \(\bar x\), and populates output vector \(\bar y\) with the result. To avoid possible overflow or saturation, output \(\bar y\) is scaled down by a factor of \(2^7\) (see dct16_exp).

This operation may be safely performed in-place if x and y point to the same vector.

x and y must point to 8-byte-aligned addresses.

Operation Performed

\[\begin{split}\begin{aligned} & y_k \leftarrow \frac{1}{2^7} \left( 2\sum_{n=0}^{N-1} x_n \cos\left( k\pi \frac{2n+1}{2N} \right) \right) \\ & \qquad\text{for } k = 0,1,\dots,(N-1) \\ & \qquad\text{with } N = 16 \\ \end{aligned}\end{split}\]

Parameters:
  • y[out] Output vector \(\bar y\)

  • x[in] Input vector \(\bar x\)

Throws ET_LOAD_STORE:

Raised if `x` or `y` is not double word-aligned (See Note: Vector Alignment)

void dct24_forward(int32_t y[24], const int32_t x[24])

24-point 32-bit forward DCT.

This function performs a 24-point forward type-II DCT on input vector \(\bar x\), and populates output vector \(\bar y\) with the result. To avoid possible overflow or saturation, output \(\bar y\) is scaled down by a factor of \(2^{10}\) (see dct24_exp).

This operation may be safely performed in-place if x and y point to the same vector.

x and y must point to 8-byte-aligned addresses.

Operation Performed

\[\begin{split}\begin{aligned} & y_k \leftarrow \frac{1}{2^{10}} \left( 2\sum_{n=0}^{N-1} x_n \cos\left( k\pi \frac{2n+1}{2N} \right) \right) \\ & \qquad\text{for } k = 0,1,\dots,(N-1) \\ & \qquad\text{with } N = 24 \\ \end{aligned}\end{split}\]

Parameters:
  • y[out] Output vector \(\bar y\)

  • x[in] Input vector \(\bar x\)

Throws ET_LOAD_STORE:

Raised if `x` or `y` is not double word-aligned (See Note: Vector Alignment)

void dct32_forward(int32_t y[32], const int32_t x[32])

32-point 32-bit forward DCT.

This function performs a 32-point forward type-II DCT on input vector \(\bar x\), and populates output vector \(\bar y\) with the result. To avoid possible overflow or saturation, output \(\bar y\) is scaled down by a factor of \(2^{10}\) (see dct32_exp).

This operation may be safely performed in-place if x and y point to the same vector.

x and y must point to 8-byte-aligned addresses.

Operation Performed

\[\begin{split}\begin{aligned} & y_k \leftarrow \frac{1}{2^{10}} \left( 2\sum_{n=0}^{N-1} x_n \cos\left( k\pi \frac{2n+1}{2N} \right) \right) \\ & \qquad\text{for } k = 0,1,\dots,(N-1) \\ & \qquad\text{with } N = 32 \\ \end{aligned}\end{split}\]

Parameters:
  • y[out] Output vector \(\bar y\)

  • x[in] Input vector \(\bar x\)

Throws ET_LOAD_STORE:

Raised if `x` or `y` is not double word-aligned (See Note: Vector Alignment)

void dct48_forward(int32_t y[48], const int32_t x[48])

48-point 32-bit forward DCT.

This function performs a 48-point forward type-II DCT on input vector \(\bar x\), and populates output vector \(\bar y\) with the result. To avoid possible overflow or saturation, output \(\bar y\) is scaled down by a factor of \(2^{13}\) (see dct48_exp).

This operation may be safely performed in-place if x and y point to the same vector.

x and y must point to 8-byte-aligned addresses.

Operation Performed

\[\begin{split}\begin{aligned} & y_k \leftarrow \frac{1}{2^{13}} \left( 2\sum_{n=0}^{N-1} x_n \cos\left( k\pi \frac{2n+1}{2N} \right) \right) \\ & \qquad\text{for } k = 0,1,\dots,(N-1) \\ & \qquad\text{with } N = 48 \\ \end{aligned}\end{split}\]

Parameters:
  • y[out] Output vector \(\bar y\)

  • x[in] Input vector \(\bar x\)

Throws ET_LOAD_STORE:

Raised if `x` or `y` is not double word-aligned (See Note: Vector Alignment)

void dct64_forward(int32_t y[64], const int32_t x[64])

64-point 32-bit forward DCT.

This function performs a 64-point forward type-II DCT on input vector \(\bar x\), and populates output vector \(\bar y\) with the result. To avoid possible overflow or saturation, output \(\bar y\) is scaled down by a factor of \(2^{13}\) (see dct64_exp).

This operation may be safely performed in-place if x and y point to the same vector.

x and y must point to 8-byte-aligned addresses.

Operation Performed

\[\begin{split}\begin{aligned} & y_k \leftarrow \frac{1}{2^{13}} \left( 2\sum_{n=0}^{N-1} x_n \cos\left( k\pi \frac{2n+1}{2N} \right) \right) \\ & \qquad\text{for } k = 0,1,\dots,(N-1) \\ & \qquad\text{with } N = 64 \\ \end{aligned}\end{split}\]

Parameters:
  • y[out] Output vector \(\bar y\)

  • x[in] Input vector \(\bar x\)

Throws ET_LOAD_STORE:

Raised if `x` or `y` is not double word-aligned (See Note: Vector Alignment)

void dct6_inverse(int32_t y[6], const int32_t x[6])

6-point 32-bit inverse DCT.

This function performs a 6-point inverse DCT (same as type-III DCT) on input vector \(\bar x\), and populates output vector \(\bar y\) with the result.

This operation may be safely performed in-place if x and y point to the same vector.

x and y must point to 8-byte-aligned addresses.

Operation Performed

\[\begin{split}\begin{aligned} & y_k \leftarrow \frac{1}{N} \left( \frac{x_0}{2} + \ \sum_{n=1}^{N-1} x_n \cos\left( n\pi \frac{2k+1}{2N} \right) \right) \\ & \qquad\text{for } k = 0,1,\dots,(N-1) \\ & \qquad\text{with } N = 6 \\ \end{aligned}\end{split}\]

Parameters:
  • y[out] Output vector \(\bar y\)

  • x[in] Input vector \(\bar x\)

Throws ET_LOAD_STORE:

Raised if `x` or `y` is not double word-aligned (See Note: Vector Alignment)

void dct8_inverse(int32_t y[8], const int32_t x[8])

8-point 32-bit inverse DCT.

This function performs an 8-point inverse DCT (same as type-III DCT) on input vector \(\bar x\), and populates output vector \(\bar y\) with the result.

This operation may be safely performed in-place if x and y point to the same vector.

x and y must point to 8-byte-aligned addresses.

Operation Performed

\[\begin{split}\begin{aligned} & y_k \leftarrow \frac{1}{N} \left( \frac{x_0}{2} + \ \sum_{n=1}^{N-1} x_n \cos\left( n\pi \frac{2k+1}{2N} \right) \right) \\ & \qquad\text{for } k = 0,1,\dots,(N-1) \\ & \qquad\text{with } N = 8 \\ \end{aligned}\end{split}\]

Parameters:
  • y[out] Output vector \(\bar y\)

  • x[in] Input vector \(\bar x\)

Throws ET_LOAD_STORE:

Raised if `x` or `y` is not double word-aligned (See Note: Vector Alignment)

void dct12_inverse(int32_t y[12], const int32_t x[12])

12-point 32-bit inverse DCT.

This function performs a 12-point inverse DCT (same as type-III DCT) on input vector \(\bar x\), and populates output vector \(\bar y\) with the result.

This operation may be safely performed in-place if x and y point to the same vector.

x and y must point to 8-byte-aligned addresses.

Operation Performed

\[\begin{split}\begin{aligned} & y_k \leftarrow \frac{1}{N} \left( \frac{x_0}{2} + \ \sum_{n=1}^{N-1} x_n \cos\left( n\pi \frac{2k+1}{2N} \right) \right) \\ & \qquad\text{for } k = 0,1,\dots,(N-1) \\ & \qquad\text{with } N = 12 \\ \end{aligned}\end{split}\]

Parameters:
  • y[out] Output vector \(\bar y\)

  • x[in] Input vector \(\bar x\)

Throws ET_LOAD_STORE:

Raised if `x` or `y` is not double word-aligned (See Note: Vector Alignment)

void dct16_inverse(int32_t y[16], const int32_t x[16])

16-point 32-bit inverse DCT.

This function performs a 16-point inverse DCT (same as type-III DCT) on input vector \(\bar x\), and populates output vector \(\bar y\) with the result.

This operation may be safely performed in-place if x and y point to the same vector.

x and y must point to 8-byte-aligned addresses.

Operation Performed

\[\begin{split}\begin{aligned} & y_k \leftarrow \frac{1}{N} \left( \frac{x_0}{2} + \ \sum_{n=1}^{N-1} x_n \cos\left( n\pi \frac{2k+1}{2N} \right) \right) \\ & \qquad\text{for } k = 0,1,\dots,(N-1) \\ & \qquad\text{with } N = 16 \\ \end{aligned}\end{split}\]

Parameters:
  • y[out] Output vector \(\bar y\)

  • x[in] Input vector \(\bar x\)

Throws ET_LOAD_STORE:

Raised if `x` or `y` is not double word-aligned (See Note: Vector Alignment)

void dct24_inverse(int32_t y[24], const int32_t x[24])

24-point 32-bit inverse DCT.

This function performs a 24-point inverse DCT (same as type-III DCT) on input vector \(\bar x\), and populates output vector \(\bar y\) with the result.

This operation may be safely performed in-place if x and y point to the same vector.

x and y must point to 8-byte-aligned addresses.

Operation Performed

\[\begin{split}\begin{aligned} & y_k \leftarrow \frac{1}{N} \left( \frac{x_0}{2} + \ \sum_{n=1}^{N-1} x_n \cos\left( n\pi \frac{2k+1}{2N} \right) \right) \\ & \qquad\text{for } k = 0,1,\dots,(N-1) \\ & \qquad\text{with } N = 24 \\ \end{aligned}\end{split}\]

Parameters:
  • y[out] Output vector \(\bar y\)

  • x[in] Input vector \(\bar x\)

Throws ET_LOAD_STORE:

Raised if `x` or `y` is not double word-aligned (See Note: Vector Alignment)

void dct32_inverse(int32_t y[32], const int32_t x[32])

32-point 32-bit inverse DCT.

This function performs a 32-point inverse DCT (same as type-III DCT) on input vector \(\bar x\), and populates output vector \(\bar y\) with the result.

This operation may be safely performed in-place if x and y point to the same vector.

x and y must point to 8-byte-aligned addresses.

Operation Performed

\[\begin{split}\begin{aligned} & y_k \leftarrow \frac{1}{N} \left( \frac{x_0}{2} + \ \sum_{n=1}^{N-1} x_n \cos\left( n\pi \frac{2k+1}{2N} \right) \right) \\ & \qquad\text{for } k = 0,1,\dots,(N-1) \\ & \qquad\text{with } N = 32 \\ \end{aligned}\end{split}\]

Parameters:
  • y[out] Output vector \(\bar y\)

  • x[in] Input vector \(\bar x\)

Throws ET_LOAD_STORE:

Raised if `x` or `y` is not double word-aligned (See Note: Vector Alignment)

void dct48_inverse(int32_t y[48], const int32_t x[48])

48-point 32-bit inverse DCT.

This function performs a 48-point inverse DCT (same as type-III DCT) on input vector \(\bar x\), and populates output vector \(\bar y\) with the result.

This operation may be safely performed in-place if x and y point to the same vector.

x and y must point to 8-byte-aligned addresses.

Operation Performed

\[\begin{split}\begin{aligned} & y_k \leftarrow \frac{1}{N} \left( \frac{x_0}{2} + \ \sum_{n=1}^{N-1} x_n \cos\left( n\pi \frac{2k+1}{2N} \right) \right) \\ & \qquad\text{for } k = 0,1,\dots,(N-1) \\ & \qquad\text{with } N = 48 \\ \end{aligned}\end{split}\]

Parameters:
  • y[out] Output vector \(\bar y\)

  • x[in] Input vector \(\bar x\)

Throws ET_LOAD_STORE:

Raised if `x` or `y` is not double word-aligned (See Note: Vector Alignment)

void dct64_inverse(int32_t y[64], const int32_t x[64])

64-point 32-bit inverse DCT.

This function performs a 64-point inverse DCT (same as type-III DCT) on input vector \(\bar x\), and populates output vector \(\bar y\) with the result.

This operation may be safely performed in-place if x and y point to the same vector.

x and y must point to 8-byte-aligned addresses.

Operation Performed

\[\begin{split}\begin{aligned} & y_k \leftarrow \frac{1}{N} \left( \frac{x_0}{2} + \ \sum_{n=1}^{N-1} x_n \cos\left( n\pi \frac{2k+1}{2N} \right) \right) \\ & \qquad\text{for } k = 0,1,\dots,(N-1) \\ & \qquad\text{with } N = 64 \\ \end{aligned}\end{split}\]

Parameters:
  • y[out] Output vector \(\bar y\)

  • x[in] Input vector \(\bar x\)

Throws ET_LOAD_STORE:

Raised if `x` or `y` is not double word-aligned (See Note: Vector Alignment)

headroom_t dct8x8_forward(int8_t y[8][8], const int8_t x[8][8], const right_shift_t sat)

8-by-8 2D 8-bit forward DCT.

This function performs a 2-dimensional 8-by-8 type-II DCT on 8-bit input tensor \(\bar x\) (with elements \(x_{rc}\)). Output tensor \(\bar y\) (with elements \(y_{rc}\)) is populated with the result.

This 2D DCT is performed by first applying a 1D 8-point DCT across each row of \(\bar x\), and then applying a 1D 8-point DCT to each column of that intermediate tensor.

The output is scaled by a factor of \(2^{-\mathtt{sat}-8}\). With \(\mathtt{sat}=0\) this scaling is just enough to avoid any possible saturation. If saturation is considered acceptable, or known a priori to not be possible, negative values for \(\mathtt{sat}\) can be used to increase precision on the output.

This operation may be safely performed in-place if x and y point to the same vector.

x and y must point to 8-byte-aligned addresses.

Operation Performed

\[\begin{split}\begin{aligned} & y_{rc} \leftarrow \frac{4 \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} \left( \ x_{mn} \cos\left( c\pi\frac{2n+1}{2N} \right) \cos\left(r\pi\frac{2m+1}{2N} \right)\right)}{2^{\mathtt{sat}+8}}\\ & \\ & \qquad\text{for } r,c \in \{0,1,\dots,(N-1)\} \\ & \qquad\text{with } N = 8 \\ \end{aligned}\end{split}\]

Parameters:
  • y[out] Output vector \(\bar y\)

  • x[in] Input vector \(\bar x\)

  • sat[in] Additional output scaling exponent.

Throws ET_LOAD_STORE:

Raised if `x` or `y` is not double word-aligned (See Note: Vector Alignment)

headroom_t dct8x8_inverse(int8_t y[8][8], const int8_t x[8][8], const right_shift_t sat)

8-by-8 2D 8-bit inverse DCT.

This function performs a 2-dimensional 8-by-8 type-III (inverse) DCT on 8-bit input tensor \(\bar x\) (with elements \(x_{rc}\)). Output tensor \(\bar y\) (with elements \(y_{rc}\)) is populated with the result.

This 2D inverse DCT is performed by first applying a 1D 8-point inverse DCT across each row of \(\bar x\), and then applying a 1D 8-point inverse DCT to each column of that intermediate tensor.

The output is scaled by a factor of \(2^{-\mathtt{sat}}\). With \(\mathtt{sat}=0\) this scaling is just enough to avoid any possible saturation. If saturation is considered acceptable, or known a priori to not be possible, negative values for \(\mathtt{sat}\) can be used to increase precision on the output.

This operation may be safely performed in-place if x and y point to the same vector.

x and y must point to 8-byte-aligned addresses.

Operation Performed

\[\begin{split}\begin{aligned} & y_{rc} \leftarrow \frac{ \frac{1}{N^2} \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} \left( \ x_{mn} \cos\left( n\pi\frac{2c+1}{2N} \right) \cos\left(m\pi\frac{2r+1}{2N} \right)\right)}{2^{\mathtt{sat}}}\\ & \\ & \qquad\text{for } r,c \in \{0,1,\dots,(N-1)\} \\ & \qquad\text{with } N = 8 \\ \end{aligned}\end{split}\]

Parameters:
  • y[out] Output vector \(\bar y\)

  • x[in] Input vector \(\bar x\)

  • sat[in] Additional output scaling exponent.

Throws ET_LOAD_STORE:

Raised if `x` or `y` is not double word-aligned (See Note: Vector Alignment)

Variables

static const exponent_t dct6_exp = 4

Scaling exponent associated with dct6_forward()

Let \(\bar x\) be the input to dct6_forward() and \(\bar y\) the output. If \(x\_exp\) and \(y\_exp\) are the exponents associated with \(\bar x\) and \(\bar y\) respectively, then the following relation holds: \(y\_exp = x\_exp + dct6\_exp\)

static const exponent_t dct8_exp = 4

Scaling exponent associated with dct8_forward()

Let \(\bar x\) be the input to dct8_forward() and \(\bar y\) the output. If \(x\_exp\) and \(y\_exp\) are the exponents associated with \(\bar x\) and \(\bar y\) respectively, then the following relation holds: \(y\_exp = x\_exp + dct8\_exp\)

static const exponent_t dct12_exp = 7

Scaling exponent associated with dct12_forward()

Let \(\bar x\) be the input to dct12_forward() and \(\bar y\) the output. If \(x\_exp\) and \(y\_exp\) are the exponents associated with \(\bar x\) and \(\bar y\) respectively, then the following relation holds: \(y\_exp = x\_exp + dct12\_exp\)

static const exponent_t dct16_exp = 7

Scaling exponent associated with dct16_forward()

Let \(\bar x\) be the input to dct16_forward() and \(\bar y\) the output. If \(x\_exp\) and \(y\_exp\) are the exponents associated with \(\bar x\) and \(\bar y\) respectively, then the following relation holds: \(y\_exp = x\_exp + dct16\_exp\)

static const exponent_t dct24_exp = 10

Scaling exponent associated with dct24_forward()

Let \(\bar x\) be the input to dct24_forward() and \(\bar y\) the output. If \(x\_exp\) and \(y\_exp\) are the exponents associated with \(\bar x\) and \(\bar y\) respectively, then the following relation holds: \(y\_exp = x\_exp + dct24\_exp\)

static const exponent_t dct32_exp = 10

Scaling exponent associated with dct32_forward()

Let \(\bar x\) be the input to dct32_forward() and \(\bar y\) the output. If \(x\_exp\) and \(y\_exp\) are the exponents associated with \(\bar x\) and \(\bar y\) respectively, then the following relation holds: \(y\_exp = x\_exp + dct32\_exp\)

static const exponent_t dct48_exp = 13

Scaling exponent associated with dct48_forward()

Let \(\bar x\) be the input to dct48_forward() and \(\bar y\) the output. If \(x\_exp\) and \(y\_exp\) are the exponents associated with \(\bar x\) and \(\bar y\) respectively, then the following relation holds: \(y\_exp = x\_exp + dct48\_exp\)

static const exponent_t dct64_exp = 13

Scaling exponent associated with dct64_forward()

Let \(\bar x\) be the input to dct64_forward() and \(\bar y\) the output. If \(x\_exp\) and \(y\_exp\) are the exponents associated with \(\bar x\) and \(\bar y\) respectively, then the following relation holds: \(y\_exp = x\_exp + dct64\_exp\)
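
The exponent relation can be made concrete with a little bookkeeping code. The sketch below assumes that a block floating-point element represents the value mantissa * 2^exp; the helper names and the `REF_DCT64_EXP` constant (copied from the value listed above) are hypothetical stand-ins, not library symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Value copied from the listing above; hypothetical stand-in for dct64_exp. */
enum { REF_DCT64_EXP = 13 };

/* y_exp = x_exp + dctN_exp */
int ref_dct_output_exp(int x_exp, int dct_exp)
{
    assert(dct_exp >= 0);   /* all exponents listed above are positive */
    return x_exp + dct_exp;
}

/* Convert one BFP element (mantissa * 2^exp) to a double, without libm. */
double ref_bfp_to_double(int32_t mantissa, int exp)
{
    double v = (double)mantissa;
    for (; exp > 0; exp--) v *= 2.0;
    for (; exp < 0; exp++) v *= 0.5;
    return v;
}
```

As a worked case: a 64-point constant input with mantissa 2^20 and exponent -20 represents 1.0. The scaled forward DCT yields the DC mantissa 2*64*2^20/2^13 = 16384, and with the output exponent -20 + 13 = -7 this represents 128.0, the unscaled DC value 2*64*1.0.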

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Fast Fourier Transform API£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/fft/fft_index.html#fast-fourier-transform-api

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Fast Fourier Transform API$$$FFT API quick reference£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/fft/fft_index.html#fft-api-quick-reference

FFT functions - quick reference

Brief | Forward Function | Inverse Function
BFP FFT on single real signal | bfp_fft_forward_mono() | bfp_fft_inverse_mono()
BFP FFT on single complex signal | bfp_fft_forward_complex() | bfp_fft_inverse_complex()
BFP FFT on pair of real signals | bfp_fft_forward_stereo() | bfp_fft_inverse_stereo()
BFP spectrum packing | bfp_fft_unpack_mono() | bfp_fft_pack_mono()
Low-level decimation-in-time FFT | fft_dit_forward() | fft_dit_inverse()
Low-level decimation-in-frequency FFT | fft_dif_forward() | fft_dif_inverse()
FFT on real signal of float | fft_f32_forward() | fft_f32_inverse()

group fft_api

Functions

bfp_complex_s32_t *bfp_fft_forward_mono(bfp_s32_t *x)

Performs a forward real Discrete Fourier Transform on a real 32-bit sequence.

Performs an \(N\)-point forward real DFT on the real 32-bit BFP vector x, where \(N\) is x->length. The operation is performed in-place, resulting in an \(N/2\)-element complex 32-bit BFP vector.

Operation Performed

\[\begin{split}\begin{aligned} & X[f] = \sum_{n=0}^{N-1} \left( x[n]\cdot e^{-j2\pi fn/N} \right) \\ & \text{ for } 0 \le f \le N/2 \end{aligned}\end{split}\]

where \(x[n]\) is the BFP vector initially represented by x, and \(X[f]\) is the DFT of \(x[n]\) represented by the returned pointer.

The exponent, headroom, length and data contents of x are all updated by this function, though x->data will continue to point to the same address.

x->length must be a power of 2, and must be no larger than (1<<MAX_DIT_FFT_LOG2).

This function returns a bfp_complex_s32_t pointer. This points to the same address as x. This is intended as a convenience for user code.

Upon completion, the spectrum data is encoded in x->data as specified for real DFTs in spectrum_packing. That is, x->data[f] for 1 <= f < (x->length) represent \(X[f]\) for \(1 \le f < (N/2)\) and x->data[0] represents \(X[0] + j X[N/2]\).

Example

// Initialize time domain data with samples.
int32_t buffer[N] = { ... };
bfp_s32_t samples;
bfp_s32_init(&samples, buffer, 0, N, 1);
// Perform the forward DFT
{
    bfp_complex_s32_t* spectrum = bfp_fft_forward_mono(&samples);
    // `samples` should no longer be used.
    // Operate on frequency domain data using `spectrum`
    ...
    // Perform the inverse DFT to go back to time domain
    bfp_fft_inverse_mono(spectrum); // returns (bfp_s32_t*) which is the address of `samples`
}
// Use `samples` again to use new time domain data. 
...

Parameters:
  • x[inout] The BFP vector \(x[n]\) to be DFTed.

Returns:

Address of input BFP vector x, cast as bfp_complex_s32_t*.

bfp_s32_t *bfp_fft_inverse_mono(bfp_complex_s32_t *x)

Performs an inverse real Discrete Fourier Transform on a complex 32-bit sequence.

Performs an \(N\)-point inverse real DFT on the complex 32-bit BFP vector x, where \(N\) is 2*x->length. The operation is performed in-place, resulting in an \(N\)-element real 32-bit BFP vector.

Operation Performed

\[\begin{split}\begin{aligned} & x[n] = \frac{1}{N}\sum_{f=0}^{N/2} \left( X[f]\cdot e^{j2\pi fn/N} \right) \\ & \text{ for } 0 \le n < N \end{aligned}\end{split}\]

where \(X[f]\) is the BFP vector initially represented by x, and \(x[n]\) is the IDFT of \(X[f]\) represented by the returned pointer.

The exponent, headroom, length and data contents of x are all updated by this function, though x->data will continue to point to the same address.

x->length must be a power of 2, and must be no larger than (1<<(MAX_DIT_FFT_LOG2-1)).

This function returns a bfp_s32_t pointer. This points to the same address as x. This is intended as a convenience for user code.

When calling, the spectrum data must be encoded in x->data as specified for real DFTs in spectrum_packing. That is, x->data[f] for 1 <= f < (x->length) represent \(X[f]\) for \(1 \le f < N/2\), and x->data[0] represents \(X[0] + j X[N/2]\).

Example

// Initialize time domain data with samples.
int32_t buffer[N] = { ... };
bfp_s32_t samples;
bfp_s32_init(&samples, buffer, 0, N, 1);
// Perform the forward DFT
{
    bfp_complex_s32_t* spectrum = bfp_fft_forward_mono(&samples);
    // `samples` should no longer be used.
    // Operate on frequency domain data using `spectrum`
    ...
    // Perform the inverse DFT to go back to time domain
    bfp_fft_inverse_mono(spectrum); // returns (bfp_s32_t*) which is the address of `samples`
}
// Use `samples` again to use new time domain data. 
...

Parameters:
  • x[inout] The BFP vector \(X[f]\) to be IDFTed.

Returns:

Address of input BFP vector x, cast as bfp_s32_t*.

void bfp_fft_forward_complex(bfp_complex_s32_t *x)

Performs a forward complex Discrete Fourier Transform on a complex 32-bit sequence.

Performs an \(N\)-point forward complex DFT on the complex 32-bit BFP vector x, where \(N\) is x->length. The operation is performed in-place.

Operation Performed

\[\begin{split}\begin{aligned} & X[f] = \sum_{n=0}^{N-1} \left( x[n]\cdot e^{-j2\pi fn/N} \right) \\ & \text{ for } 0 \le f < N \end{aligned}\end{split}\]

where \(x[n]\) is the BFP vector initially represented by x, and \(X[f]\) is the DFT of \(x[n]\), also represented by x upon completion.

The exponent, headroom and data contents of x are updated by this function. x->data will continue to point to the same address.

x->length ( \(N\)) must be a power of 2, and must be no larger than (1<<MAX_DIT_FFT_LOG2).

Upon completion, the spectrum data is encoded in x as specified in spectrum_packing. That is, x->data[f] for 0 <= f < (x->length) represent \(X[f]\) for \(0 \le f < N\).

Example

// Initialize complex time domain data with samples.
complex_s32_t buffer[N] = { ... };
bfp_complex_s32_t vector;
bfp_complex_s32_init(&vector, buffer, 0, N, 1);
// Perform the forward DFT
bfp_fft_forward_complex(&vector);
// Operate on frequency domain data
...
// Perform the inverse DFT to go back to time domain
bfp_fft_inverse_complex(&vector);
// `vector` contains (complex) time-domain data again
...

Parameters:
  • x[inout] The BFP vector \(x[n]\) to be DFTed.

void bfp_fft_inverse_complex(bfp_complex_s32_t *x)

Performs an inverse complex Discrete Fourier Transform on a complex 32-bit sequence.

Performs an \(N\)-point inverse complex DFT on the complex 32-bit BFP vector x, where \(N\) is x->length. The operation is performed in-place.

Operation Performed

\[\begin{split}\begin{aligned} & x[n] = \frac{1}{N}\sum_{f=0}^{N-1} \left( X[f]\cdot e^{j2\pi fn/N} \right) \\ & \text{ for } 0 \le n < N \end{aligned}\end{split}\]

where \(X[f]\) is the BFP vector initially represented by x, and \(x[n]\) is the IDFT of \(X[f]\), also represented by x upon completion.

The exponent, headroom and data contents of x are updated by this function. x->data will continue to point to the same address.

x->length must be a power of 2, and must be no larger than (1<<MAX_DIT_FFT_LOG2).

The data initially encoded in x are interpreted as specified in spectrum_packing. That is, x->data[f] for 0 <= f < (x->length) represent \(X[f]\) for \(0 \le f < N\).

Example

// Initialize complex time domain data with samples.
complex_s32_t buffer[N] = { ... };
bfp_complex_s32_t vector;
bfp_complex_s32_init(&vector, buffer, 0, N, 1);
// Perform the forward DFT
bfp_fft_forward_complex(&vector);
// Operate on frequency domain data
...
// Perform the inverse DFT to go back to time domain
bfp_fft_inverse_complex(&vector);
// `vector` contains (complex) time-domain data again
...

Parameters:
  • x[inout] The BFP vector \(X[f]\) to be IDFTed.

void bfp_fft_forward_stereo(bfp_s32_t *a, bfp_s32_t *b, complex_s32_t scratch[])

Performs a forward real Discrete Fourier Transform on a pair of real 32-bit sequences.

Performs an \(N\)-point forward real DFT on the real 32-bit BFP vectors \(\bar a\) and \(\bar b\), where \(N\) is a->length (which must equal b->length). The resulting spectra, \(\bar A\) and \(\bar B\), are placed in a and b. Each spectrum is an \(N/2\)-element complex 32-bit BFP vector. To access the spectra, the pointers a and b should be cast to bfp_complex_s32_t* following a call to this function.

Operation Performed

\[\begin{split}\begin{aligned} & A[f] = \sum_{n=0}^{N-1} \left( a[n]\cdot e^{-j2\pi fn/N} \right) \text{ for } 0 \le f \le N/2 \\ & B[f] = \sum_{n=0}^{N-1} \left( b[n]\cdot e^{-j2\pi fn/N} \right) \text{ for } 0 \le f \le N/2 \end{aligned}\end{split}\]

where \(a[n]\) and \(b[n]\) are the two time-domain sequences represented by input BFP vectors a and b, and \(A[f]\) and \(B[f]\) are the DFT of \(a[n]\) and \(b[n]\) respectively.

a->length ( \(N\)) must be equal to b->length, must be a power of 2, and must be no larger than (1<<MAX_DIT_FFT_LOG2).

The parameters a and b are used as both inputs and outputs. To access the result of the FFT, a and b should be cast to bfp_complex_s32_t*. The structs’ metadata (e.g. exp, hr, length) are updated by this function to reflect this change of interpretation. The bfp_s32_t references should be considered corrupted after this call (at least until bfp_fft_inverse_stereo() is called).

The spectrum data is encoded in a->data and b->data as specified for real DFTs in spectrum_packing. That is, a->data[f] for 1 <= f < (a->length) represent \(A[f]\) for \(1 \le f < (N/2)\) and a->data[0] represents \(A[0] + j A[N/2]\). Likewise for the encoding of b->data.

This function requires a scratch buffer large enough to contain \(N\) complex_s32_t elements.

Deprecated:
Example

// Initialize time domain data with samples.
int32_t bufferA[N] = { ... };
int32_t bufferB[N] = { ... };
complex_s32_t scratch[N]; // scratch buffer -- contents don't matter
bfp_s32_t channel_A, channel_B;
bfp_s32_init(&channel_A, bufferA, 0, N, 1);
bfp_s32_init(&channel_B, bufferB, 0, N, 1);

// Perform the forward DFT
bfp_fft_forward_stereo(&channel_A, &channel_B, scratch);

// channel_A and channel_B should now be considered clobbered as the structs are now 
// effectively bfp_complex_s32_t
bfp_complex_s32_t* chanA = (bfp_complex_s32_t*) &channel_A;
bfp_complex_s32_t* chanB = (bfp_complex_s32_t*) &channel_B;

// Operate on frequency domain data using `chanA` and `chanB`
...
// Perform the inverse DFT to go back to time domain
bfp_fft_inverse_stereo(chanA, chanB, scratch); // chanA and chanB are already pointers

// Use channel_A and channel_B again to use new time domain data. 
...

Note

Use of this function is not currently recommended. It functions correctly, but a recent change in this library’s API (namely, dropping support for channel-pair vectors) means this function is no more computationally efficient than calling bfp_fft_forward_mono() on each input vector separately. Additionally, this function currently requires a scratch buffer, whereas the mono FFT does not.

Parameters:
  • a[inout] [Input] Time-domain BFP vector \(\bar a\). [Output] Frequency domain BFP vector \(\bar A\)

  • b[inout] [Input] Time-domain BFP vector \(\bar b\). [Output] Frequency domain BFP vector \(\bar B\)

  • scratch – Scratch buffer of at least a->length complex_s32_t elements

void bfp_fft_inverse_stereo(bfp_complex_s32_t *A_fft, bfp_complex_s32_t *B_fft, complex_s32_t scratch[])

Performs an inverse real Discrete Fourier Transform on a pair of complex 32-bit sequences.

Performs an \(N\)-point inverse real DFT on the 32-bit complex BFP vectors \(\bar A\) and \(\bar B\) (A_fft and B_fft respectively), where \(N\) is 2*A_fft->length. The resulting real signals, \(\bar a\) and \(\bar b\), are placed in A_fft and B_fft. Each time-domain result is an \(N\)-element real 32-bit BFP vector. To access the time-domain results, the pointers A_fft and B_fft should be cast to bfp_s32_t* following a call to this function.

Operation Performed

\[\begin{split}\begin{aligned} & a[n] = \frac{1}{N}\sum_{f=0}^{N/2-1} \left( A[f]\cdot e^{j2\pi fn/N} \right) \text{ for } 0 \le n < N \\ & b[n] = \frac{1}{N}\sum_{f=0}^{N/2-1} \left( B[f]\cdot e^{j2\pi fn/N} \right) \text{ for } 0 \le n < N \end{aligned}\end{split}\]

where \(A[f]\) and \(B[f]\) are the frequency spectra represented by BFP vectors A_fft and B_fft, and \(a[n]\) and \(b[n]\) are the IDFT of \(A[f]\) and \(B[f]\).

A_fft->length must be a power of 2, and must be no larger than (1<<(MAX_DIT_FFT_LOG2-1)).

The parameters A_fft and B_fft are used as both inputs and outputs. To access the result of the IFFT, A_fft and B_fft should be cast to bfp_s32_t*. The structs’ metadata (e.g. exp, hr, length) are updated by this function to reflect this change of interpretation. The bfp_complex_s32_t references should be considered corrupted after this call.

The spectrum data encoded in A_fft->data and B_fft->data are interpreted as specified for real DFTs in spectrum_packing. That is, A_fft->data[f] for 1 <= f < (A_fft->length) represent \(A[f]\) for \(1 \le f < (N/2)\) and A_fft->data[0] represents \(A[0] + j A[N/2]\). Likewise for the encoding of B_fft->data.

This function requires a scratch buffer large enough to contain 2*A_fft->length complex_s32_t elements.

Deprecated:
Example

// Initialize time domain data with samples.
int32_t bufferA[N] = { ... };
int32_t bufferB[N] = { ... };
complex_s32_t scratch[N]; // scratch buffer -- contents don't matter
bfp_s32_t channel_A, channel_B;
bfp_s32_init(&channel_A, bufferA, 0, N, 1);
bfp_s32_init(&channel_B, bufferB, 0, N, 1);

// Perform the forward DFT
bfp_fft_forward_stereo(&channel_A, &channel_B, scratch);

// channel_A and channel_B should now be considered clobbered as the structs are now 
// effectively bfp_complex_s32_t
bfp_complex_s32_t* chanA = (bfp_complex_s32_t*) &channel_A;
bfp_complex_s32_t* chanB = (bfp_complex_s32_t*) &channel_B;

// Operate on frequency domain data using `chanA` and `chanB`
...
// Perform the inverse DFT to go back to time domain
bfp_fft_inverse_stereo(chanA, chanB, scratch); // chanA and chanB are already pointers

// Use channel_A and channel_B again to use new time domain data. 
...

Note

Use of this function is not currently recommended. It functions correctly, but a recent change in this library’s API (namely, dropping support for channel-pair vectors) means this function is no more computationally efficient than calling bfp_fft_inverse_mono() on each input vector separately. Additionally, this function currently requires a scratch buffer, whereas the mono IFFT does not.

Parameters:
  • A_fft[inout] [Input] Freq-domain BFP vector \(\bar A\). [Output] Time domain BFP vector \(\bar a\)

  • B_fft[inout] [Input] Freq-domain BFP vector \(\bar B\). [Output] Time domain BFP vector \(\bar b\)

  • scratch – Scratch buffer of at least 2*A_fft->length complex_s32_t elements

void bfp_fft_unpack_mono(bfp_complex_s32_t *x)

Unpack the spectrum resulting from bfp_fft_forward_mono().

The DFT of a real signal is periodic with period FFT_N (the FFT length) and has a complex conjugate symmetry about index 0. These two properties guarantee that the imaginary part of both the DC component (index 0) and the Nyquist component (index FFT_N/2) of the spectrum are zero. To compute the forward FFT in-place, bfp_fft_forward_mono() packs the real part of the Nyquist rate component of the output spectrum into the imaginary part of the DC component.

This may be undesirable when operating on the signal’s complex spectrum. Use this function to unpack the Nyquist component. This function will also adjust the BFP vector’s length to reflect this unpacking.

NOTE: If you intend to unpack the spectrum using this function, the buffer backing the time-domain BFP vector must hold FFT_N+2 int32_t elements rather than FFT_N, but the two extra elements must NOT be counted in the time-domain BFP vector’s length field.

Operation Performed

\[\begin{split}\begin{aligned} & \mathrm{Re}\{x_{N/2}\} && \leftarrow \mathrm{Im}\{x_0\} \\ & \mathrm{Im}\{x_0\} && \leftarrow 0 \\ & \mathrm{Im}\{x_{N/2}\} && \leftarrow 0 \\ & x.length && \leftarrow x.length + 1 \end{aligned}\end{split}\]

NOTE: Before bfp_fft_inverse_mono() may be applied, bfp_fft_pack_mono() must be called, as the inverse FFT expects the data to be packed.

Parameters:
  • x[inout] The spectrum to be unpacked

void bfp_fft_pack_mono(bfp_complex_s32_t *x)

Pack the spectrum resulting from bfp_fft_unpack_mono().

This function applies the reverse process of bfp_fft_unpack_mono(), to prepare it for an inverse FFT using bfp_fft_inverse_mono().

Parameters:
  • x[inout] The spectrum to be packed
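
The unpack and pack steps amount to moving one value and adjusting the length. The sketch below illustrates them on simplified, hypothetical types (`ref_complex_s32_t`, `ref_spectrum_t`) that carry only the data pointer and length; the library's real BFP struct also holds an exponent and headroom, which are not modelled here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the library's types. */
typedef struct { int32_t re, im; } ref_complex_s32_t;
typedef struct { ref_complex_s32_t *data; unsigned length; } ref_spectrum_t;

/* Move the Nyquist bin out of Im{X[0]} into its own element at index N/2. */
void ref_unpack_mono(ref_spectrum_t *x)
{
    const unsigned half = x->length;      /* N/2 packed bins on entry */
    assert(half >= 1);
    x->data[half].re = x->data[0].im;     /* needs one spare complex element */
    x->data[half].im = 0;
    x->data[0].im = 0;
    x->length = half + 1;
}

/* Reverse of ref_unpack_mono(): fold the Nyquist bin back into Im{X[0]}. */
void ref_pack_mono(ref_spectrum_t *x)
{
    assert(x->length >= 2);
    const unsigned half = x->length - 1;
    x->data[0].im = x->data[half].re;
    x->length = half;
}
```

The spare complex element written by `ref_unpack_mono` corresponds to the two extra int32_t elements that the note above asks the caller to reserve in the buffer.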

void fft_dit_forward(complex_s32_t x[], const unsigned N, headroom_t *hr, exponent_t *exp)

Compute a forward DFT using the decimation-in-time FFT algorithm.

This function computes the N-point forward DFT of a complex input signal using the decimation-in-time FFT algorithm. The result is computed in-place.

Operation Performed

\[\begin{split}\begin{aligned} & X[f] = \frac{1}{2^{\alpha}} \sum_{n=0}^{N-1} \left( x[n]\cdot e^{-j2\pi fn/N} \right) \\ & \text{ for } 0 \le f < N \end{aligned}\end{split}\]

x[] is interpreted to be a block floating-point vector with shared exponent *exp and with *hr bits of headroom initially in x[]. During computation, this function monitors the headroom of the data and compensates to avoid overflows and underflows by bit-shifting the data up or down as appropriate. In the equation above, \(\alpha\) represents the (net) number of bits that the data was right-shifted by.

Upon completion, *hr is updated with the final headroom in x[], and the exponent *exp is incremented by \(\alpha\).

Note

In order to guarantee that saturation will not occur, x[] must have an initial headroom of at least 2 bits.

Parameters:
  • x[inout] The N-element complex input vector to be transformed.

  • N[in] The size of the DFT to be performed.

  • hr[inout] Pointer to the initial headroom in x[].

  • exp[inout] Pointer to the initial exponent associated with x[].

Throws ET_LOAD_STORE:

Raised if `x` is not word-aligned (See Note: Vector Alignment)
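
The 2-bit headroom requirement in the note above can be met by right-shifting the data and raising the exponent before the call. The sketch below assumes that headroom means the number of redundant leading sign bits of a 32-bit value, and it treats the input as a flat int32_t array (a complex vector of N elements can be viewed as 2N such values). All function names are hypothetical; this is not how the library itself manages headroom.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Headroom of one value: number of redundant leading sign bits (0..31). */
unsigned ref_headroom_s32(int32_t v)
{
    /* Negative values are complemented so only leading zeros need counting. */
    uint32_t u = (v < 0) ? ~(uint32_t)v : (uint32_t)v;
    unsigned hr = 0;
    for (int bit = 30; bit >= 0 && ((u >> bit) & 1u) == 0u; bit--)
        hr++;
    return hr;
}

/* Headroom of a vector: the smallest headroom of any element. */
unsigned ref_vector_headroom(const int32_t x[], size_t len)
{
    unsigned hr = 31;
    for (size_t i = 0; i < len; i++) {
        const unsigned h = ref_headroom_s32(x[i]);
        if (h < hr) hr = h;
    }
    return hr;
}

/* Scale x[] down until it has at least min_hr bits of headroom, raising *exp
 * so that mantissa * 2^exp is (approximately) preserved. Returns the headroom. */
unsigned ref_ensure_headroom(int32_t x[], size_t len, int *exp, unsigned min_hr)
{
    assert(len > 0 && min_hr <= 30);
    unsigned hr = ref_vector_headroom(x, len);
    if (hr < min_hr) {
        const unsigned shift = min_hr - hr;
        for (size_t i = 0; i < len; i++)
            x[i] /= (int32_t)1 << shift;   /* truncating divide by 2^shift */
        *exp += (int)shift;
        hr = ref_vector_headroom(x, len);
    }
    return hr;
}
```

Calling `ref_ensure_headroom(x, len, &exp, 2)` on a full-scale vector divides the data by 4 and adds 2 to the exponent, after which the headroom precondition holds.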

void fft_dit_inverse(complex_s32_t x[], const unsigned N, headroom_t *hr, exponent_t *exp)

Compute an inverse DFT using the decimation-in-time IFFT algorithm.

This function computes the N-point inverse DFT of a complex spectrum using the decimation-in-time IFFT algorithm. The result is computed in-place.

Operation Performed

\[\begin{split}\begin{aligned} & x[n] = \frac{1}{2^{\alpha}} \sum_{f=0}^{N-1} \left( X[f]\cdot e^{j2\pi fn/N} \right) \\ & \text{ for } 0 \le n < N \end{aligned}\end{split}\]

x[] is interpreted to be a block floating-point vector with shared exponent *exp and with *hr bits of headroom initially in x[]. During computation, this function monitors the headroom of the data and compensates to avoid overflows and underflows by bit-shifting the data up or down as appropriate. In the equation above, \(\alpha\) represents the (net) number of bits that the data was right-shifted by.

Upon completion, *hr is updated with the final headroom in x[], and the exponent *exp is incremented by \(\alpha - \log_2(N)\).

Note

In order to guarantee that saturation will not occur, x[] must have an initial headroom of at least 2 bits.

Parameters:
  • x[inout] The N-element complex input vector to be transformed.

  • N[in] The size of the inverse DFT to be performed.

  • hr[inout] Pointer to the initial headroom in x[].

  • exp[inout] Pointer to the initial exponent associated with x[].

Throws ET_LOAD_STORE:

Raised if `x` is not word-aligned (See Note: Vector Alignment)
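
The exponent updates stated for the forward and inverse transforms differ only by the \(\log_2(N)\) term that absorbs the 1/N normalisation. A minimal sketch of that bookkeeping follows; `ref_log2_pow2` and `ref_fft_exp_after` are hypothetical helpers, and the value of alpha is simply supplied by the caller here, whereas the library determines it during the transform.

```c
#include <assert.h>

/* log2 of a power of two, e.g. 256 -> 8. */
unsigned ref_log2_pow2(unsigned n)
{
    assert(n != 0 && (n & (n - 1)) == 0);
    unsigned l = 0;
    while (n > 1) {
        n >>= 1;
        l++;
    }
    return l;
}

/* Exponent after an N-point transform whose data was right-shifted by a net
 * alpha bits: forward adds alpha, inverse adds alpha - log2(N). */
int ref_fft_exp_after(int exp, int alpha, unsigned n, int inverse)
{
    return inverse ? exp + alpha - (int)ref_log2_pow2(n)
                   : exp + alpha;
}
```

For example, with an initial exponent of -31, alpha = 3 and N = 256, the forward transform leaves the exponent at -28 while the inverse leaves it at -36.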

void fft_dif_forward(complex_s32_t x[], const unsigned N, headroom_t *hr, exponent_t *exp)

Compute a forward DFT using the decimation-in-frequency FFT algorithm.

This function computes the N-point forward DFT of a complex input signal using the decimation-in-frequency FFT algorithm. The result is computed in-place.

Operation Performed

\[\begin{split}\begin{aligned} & X[f] = \frac{1}{2^{\alpha}} \sum_{n=0}^{N-1} \left( x[n]\cdot e^{-j2\pi fn/N} \right) \\ & \text{ for } 0 \le f < N \end{aligned}\end{split}\]

x[] is interpreted to be a block floating-point vector with shared exponent *exp and with *hr bits of headroom initially in x[]. During computation, this function monitors the headroom of the data and compensates to avoid overflows and underflows by bit-shifting the data up or down as appropriate. In the equation above, \(\alpha\) represents the (net) number of bits that the data was right-shifted by.

Upon completion, *hr is updated with the final headroom in x[], and the exponent *exp is incremented by \(\alpha\).

Note

In order to guarantee that saturation will not occur, x[] must have an initial headroom of at least 2 bits.

Parameters:
  • x[inout] The N-element complex input vector to be transformed.

  • N[in] The size of the DFT to be performed.

  • hr[inout] Pointer to the initial headroom in x[].

  • exp[inout] Pointer to the initial exponent associated with x[].

Throws ET_LOAD_STORE:

Raised if `x` is not word-aligned (See Note: Vector Alignment)
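The 2-bit headroom requirement in the note above can be met by the caller before invoking the transform. The following is a minimal sketch in portable C, not library code: `cplx_s32`, `asr32`, `headroom_s32` and `ensure_headroom` are hypothetical names introduced for illustration, and `cplx_s32` only mirrors the real/imaginary layout assumed here for `complex_s32_t`. It measures the headroom of a complex vector and, if needed, shifts the data right while incrementing the exponent so that the represented values are preserved (up to the discarded low bits).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for complex_s32_t: assumed to hold re and im parts. */
typedef struct { int32_t re; int32_t im; } cplx_s32;

/* Portable arithmetic right shift (floor division by 2^s), s in 0..31. */
static int32_t asr32(int32_t v, unsigned s)
{
    assert(s <= 31);
    int64_t w = v;
    return (int32_t)(w >= 0 ? (w >> s) : -((-w - 1) >> s) - 1);
}

/* Headroom of one value: number of redundant leading sign bits (0..31). */
static unsigned headroom_s32(int32_t v)
{
    uint32_t m = (uint32_t)(v < 0 ? ~v : v);   /* fold negatives onto non-negatives */
    unsigned hr = 0;
    for (int bit = 30; bit >= 0 && !((m >> bit) & 1u); --bit)
        ++hr;
    return hr;
}

/* Ensure at least min_hr bits of headroom in x[0..n-1].
 * Returns the resulting headroom and updates *exp to compensate. */
static unsigned ensure_headroom(cplx_s32 x[], size_t n, int *exp, unsigned min_hr)
{
    unsigned hr = 31;
    for (size_t i = 0; i < n; ++i) {
        unsigned a = headroom_s32(x[i].re), b = headroom_s32(x[i].im);
        if (a < hr) hr = a;
        if (b < hr) hr = b;
    }
    if (hr >= min_hr)
        return hr;                              /* nothing to do */

    unsigned s = min_hr - hr;                   /* bits to shift out */
    for (size_t i = 0; i < n; ++i) {
        x[i].re = asr32(x[i].re, s);
        x[i].im = asr32(x[i].im, s);
    }
    *exp += (int)s;                             /* mantissa * 2^exp stays the same */
    return min_hr;
}
```

The returned headroom and the updated exponent are the values that would then be passed through `hr` and `exp`.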

void fft_dif_inverse(complex_s32_t x[], const unsigned N, headroom_t *hr, exponent_t *exp)

Compute an inverse DFT using the decimation-in-frequency IFFT algorithm.

This function computes the N-point inverse DFT of a complex spectrum using the decimation-in-frequency IFFT algorithm. The result is computed in-place.

Operation Performed

\[\begin{split}\begin{aligned} & x[n] = \frac{1}{2^{\alpha}} \sum_{f=0}^{N-1} \left( X[f]\cdot e^{j2\pi fn/N} \right) \\ & \text{ for } 0 \le n < N \end{aligned}\end{split}\]

x[] is interpreted to be a block floating-point vector with shared exponent *exp and with *hr bits of headroom initially in x[]. During computation, this function monitors the headroom of the data and compensates to avoid overflows and underflows by bit-shifting the data up or down as appropriate. In the equation above, \(\alpha\) represents the (net) number of bits that the data was right-shifted by.

Upon completion, *hr is updated with the final headroom in x[], and the exponent *exp is incremented by \(\alpha - log_2(N)\).

Note

In order to guarantee that saturation will not occur, x[] must have an initial headroom of at least 2 bits.

Parameters:
  • x[inout] The N-element complex input vector to be transformed.

  • N[in] The size of the inverse DFT to be performed.

  • hr[inout] Pointer to the initial headroom in x[].

  • exp[inout] Pointer to the initial exponent associated with x[].

Throws ET_LOAD_STORE:

Raised if `x` is not word-aligned (See Note: Vector Alignment)
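The exponent bookkeeping described for the forward and inverse transforms can be written out explicitly. The sketch below is plain C with hypothetical helper names (`log2_pow2`, `fft_forward_exp`, `fft_inverse_exp`); it does not call the library and only restates the update rules above, assuming N is a power of two: the forward transform adds \(\alpha\) to the exponent, while the inverse transform adds \(\alpha - \log_2(N)\).

```c
#include <assert.h>

/* log2 of a power of two (hypothetical helper; asserts n is a power of two). */
static int log2_pow2(unsigned n)
{
    assert(n != 0 && (n & (n - 1)) == 0);
    int k = 0;
    while (n > 1) { n >>= 1; ++k; }
    return k;
}

/* Exponent after a forward FFT: incremented by alpha, the net right-shift applied. */
static int fft_forward_exp(int exp_in, int alpha)
{
    return exp_in + alpha;
}

/* Exponent after an inverse FFT: incremented by alpha - log2(N). */
static int fft_inverse_exp(int exp_in, int alpha, unsigned n)
{
    return exp_in + alpha - log2_pow2(n);
}
```

For example, a 256-point inverse transform that shifted the data right by a net 3 bits moves an input exponent of -31 to -31 + 3 - 8 = -36.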

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Filtering API£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/filter/filter_index.html#filtering-api

Filtering API - quick reference

Filter          Function                       Brief
32-bit FIR      filter_fir_s32_init()          Initialize filter
32-bit FIR      filter_fir_s32_add_sample()    Add sample (without computing output)
32-bit FIR      filter_fir_s32()               Process next sample
16-bit FIR      filter_fir_s16_init()          Initialize filter
16-bit FIR      filter_fir_s16_add_sample()    Add sample (without computing output)
16-bit FIR      filter_fir_s16()               Process next sample
32-bit Biquad   filter_biquad_s32()            Process next sample (single block)
32-bit Biquad   filter_biquads_s32()           Process next sample (multi block)

group filter_api

Functions

void filter_fir_s32_init(filter_fir_s32_t *filter, int32_t *sample_buffer, const unsigned tap_count, const int32_t *coefficients, const right_shift_t shift)

Initialize a 32-bit FIR filter.

Before filter_fir_s32() or filter_fir_s32_add_sample() can be used on a filter it must be initialized with a call to this function.

sample_buffer and coefficients must be at least 4 * tap_count bytes long, and aligned to a 4-byte (word) boundary.

See filter_fir_s32_t for more information about 32-bit FIR filters and their operation.

See also

filter_fir_s32_t

Parameters:
  • filter[out] Filter struct to be initialized

  • sample_buffer[in] Buffer used by the filter to contain state information. Must be at least tap_count elements long

  • tap_count[in] Order of the FIR filter; number of filter taps

  • coefficients[in] Array containing filter coefficients.

  • shift[in] Unsigned arithmetic right-shift applied to accumulator to get filter output sample

void filter_fir_s32_add_sample(filter_fir_s32_t *filter, const int32_t new_sample)

Add a new input sample to a 32-bit FIR filter without processing an output sample.

This function adds a new input sample to filter’s state without computing a new output sample. This is a constant-time operation and can be used to quickly pre-load a filter with sample data.

See filter_fir_s32_t for more information about FIR filters and their operation.

See also

filter_fir_s32_t

Parameters:
  • filter[inout] Filter struct to have the sample added

  • new_sample[in] Sample to be added to filter’s history

int32_t filter_fir_s32(filter_fir_s32_t *filter, const int32_t new_sample)

This function implements a Finite Impulse Response (FIR) filter.

The new input sample new_sample is added to this filter’s state, and a new output sample is computed and returned as specified in filter_fir_s32_t.

With a large number of filter taps, this function takes approximately 3 thread cycles per 8 filter taps.

See also

filter_fir_s32_t

Parameters:
  • filter[inout] Filter to be processed

  • new_sample[in] New input sample to be processed by filter

Returns:

Next filtered output sample

void filter_fir_s16_init(filter_fir_s16_t *filter, int16_t *sample_buffer, const unsigned tap_count, const int16_t *coefficients, const right_shift_t shift)

Initialize a 16-bit FIR filter.

Before filter_fir_s16() or filter_fir_s16_add_sample() can be used on a filter it must be initialized with a call to this function.

sample_buffer and coefficients must be at least 2 * tap_count bytes long, and aligned to a 4-byte (word) boundary.

See filter_fir_s16_t for more information about 16-bit FIR filters and their operation.

See also

filter_fir_s16_t

Parameters:
  • filter[out] Filter struct to be initialized

  • sample_buffer[in] Buffer used by the filter to contain state information. Must be at least tap_count elements long

  • tap_count[in] Order of the FIR filter; number of filter taps

  • coefficients[in] Array containing filter coefficients

  • shift[in] Unsigned arithmetic right-shift applied to accumulator to get filter output sample

void filter_fir_s16_add_sample(filter_fir_s16_t *filter, const int16_t new_sample)

Add a new input sample to a 16-bit FIR filter without processing an output sample.

This function adds a new input sample to filter’s state without computing a new output sample.

See filter_fir_s16_t for more information about FIR filters and their operation.

See also

filter_fir_s16_t

Parameters:
  • filter[inout] Filter struct to have the sample added

  • new_sample[in] Sample to be added to filter’s history

int16_t filter_fir_s16(filter_fir_s16_t *filter, const int16_t new_sample)

This function implements a Finite Impulse Response (FIR) filter.

The new input sample new_sample is added to this filter’s state, and a new output sample is computed and returned as specified in filter_fir_s16_t.

With a large number of filter taps, this function takes approximately 3 thread cycles per 16 filter taps.

See also

filter_fir_s16_t

Parameters:
  • filter[inout] Filter to be processed

  • new_sample[in] New input sample to be processed by filter

Returns:

Next filtered output sample

int32_t filter_biquad_s32(filter_biquad_s32_t *filter, const int32_t new_sample)

This function implements a 32-bit Biquad filter.

The new input sample new_sample is added to this filter’s state, and a new output sample is computed and returned as specified in filter_biquad_s32_t.

This function processes a single filter block containing (up to) 8 biquad filter sections. For biquad filters containing 2 or more filter blocks (more than 8 biquad filter sections), see filter_biquads_s32().

Note

When the result exceeds the 32-bit range, the output will overflow.

Parameters:
  • filter[inout] Filter to be processed

  • new_sample[in] New input sample to be processed by filter

Returns:

Next filtered output sample

int32_t filter_biquad_sat_s32(filter_biquad_s32_t *filter, const int32_t new_sample)

This function implements a 32-bit Biquad filter with saturation.

Works the same as filter_biquad_s32(), but saturates the output to the symmetric 32-bit range at the cost of several compute cycles. The cost will depend on the number of biquads and the target architecture.

See also

filter_biquad_s32_t, filter_biquad_s32, filter_biquads_sat_s32

Parameters:
  • filter[inout] Filter to be processed

  • new_sample[in] New input sample to be processed by filter

Returns:

Next filtered output sample

int32_t filter_biquads_s32(filter_biquad_s32_t biquads[], const unsigned block_count, const int32_t new_sample)

This function implements a 32-bit Biquad filter.

The new input sample new_sample is added to this filter’s state, and a new output sample is computed and returned as specified in filter_biquad_s32_t.

This function processes one or more filter blocks, with each block containing up to 8 biquad filter sections.

Note

When the result exceeds the 32-bit range, the output will overflow.

Parameters:
  • biquads[inout] Filter blocks to be processed

  • block_count[in] Number of filter blocks in biquads

  • new_sample[in] New input sample to be processed by filter

Returns:

Next filtered output sample
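Since each filter block holds up to 8 biquad sections, the number of blocks needed for a cascade follows from a ceiling division. A minimal sketch (the helper name `biquad_block_count` and the constant are illustrative, not part of the library):

```c
/* Sections per filter block, as stated above. */
#define BIQUAD_SECTIONS_PER_BLOCK 8u

/* Number of filter blocks needed to hold `sections` biquad sections. */
static unsigned biquad_block_count(unsigned sections)
{
    return (sections + BIQUAD_SECTIONS_PER_BLOCK - 1u) / BIQUAD_SECTIONS_PER_BLOCK;
}
```

A cascade of up to 8 sections fits in one block and can use filter_biquad_s32(); 9 to 16 sections need two blocks, processed with filter_biquads_s32() and block_count set to 2.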

struct filter_fir_s32_t
#include <filter.h>

32-bit Discrete-Time Finite Impulse Response (FIR) Filter

Todo:

Move most of this information out to higher-level documentation

Filter Model

This struct represents an N-tap 32-bit discrete-time FIR Filter.

At each time step, the FIR filter consumes a single 32-bit input sample and produces a single 32-bit output sample.

To process a new input sample and compute a new output sample, use filter_fir_s32(). To add a new input sample to the filter without computing a new output sample, use filter_fir_s32_add_sample().

An N-tap FIR filter contains N 32-bit coefficients (pointed to by coef) and N words of state data (pointed to by state). The state data is a vector of the N most recent input samples. When processing a new input sample at time step t, x[t] is the new input sample, x[t-1] is the previous input sample, and so on, up to x[t-(N-1)], which is the oldest input considered when computing the new output sample (see note 1 below). The coefficients form a vector b[], where b[k] is the coefficient by which the input sample x[t-k] is multiplied. There is an additional parameter shift which scales the output as described below. Both the coefficients and shift are considered to be constants which do not change after initialization (although nothing should break if they are changed to new valid values).

At time step t, the output sample y[t] is computed based on the inner product (i.e. sum of element-wise products) of the coefficients and state data as follows (a more detailed description is below):

acc = x[t-0] * b[0] + x[t-1] * b[1] + x[t-2] * b[2] + ... + x[t-(N-1)] * b[N-1]
y[t] = acc >> shift

Importantly, all three of the operators above (addition, multiplication and the rightwards bit-shift) have slightly idiosyncratic meanings.

The products have a built-in rounding arithmetic right-shift of 30 bits, where ties round toward positive infinity. This is a hardware feature which allows for longer filters (larger N) without sacrificing coefficient precision. These element-wise products accumulate into 8 40-bit accumulators, which saturate the sums at symmetric 40-bit bounds (see saturation). The order in which the taps are accumulated is unspecified (see note 2 below).

After each tap has been accumulated, the 8 accumulators are then added together to get a 64-bit penultimate result (with 43 useful bits). Finally, an unsigned rounding arithmetic right-shift of shift bits is applied to the 64-bit sum, and the final result is saturated to the symmetric 32-bit range (-INT32_MAX to INT32_MAX inclusive).

Below is a more detailed description of the operations performed (not including the saturation logic applied by the accumulators).

\[\begin{split} & y[t] = sat_{32} \left( round \left( \left( \sum_{k=0}^{N-1} round(x[t-k] \cdot b[k] \cdot 2^{-30}) \right) \cdot 2^{-shift} \right) \right) \\ & \qquad\text{where } sat_{32}() \text{ saturates to } \pm(2^{31}-1) \\ & \qquad\text{ and } round() \text{ rounds to the nearest integer, with ties rounding towards } +\!\infty \end{split}\]
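The equation above can be expressed as a small reference model in portable C. This is a sketch for reasoning about scaling and rounding, not the library's implementation: it uses a single 64-bit accumulator, so, like the equation, it omits the 40-bit accumulator saturation. The names `round_shift`, `sat32` and `fir_s32_model` are hypothetical, and `x_hist[k]` is assumed to hold x[t-k].

```c
#include <assert.h>
#include <stdint.h>

/* Rounding arithmetic right-shift by s bits; ties round toward +infinity.
 * Written with division so it does not rely on >> of negative values. */
static int64_t round_shift(int64_t v, unsigned s)
{
    assert(s < 63);
    if (s == 0)
        return v;
    int64_t d = (int64_t)1 << s;
    int64_t biased = v + (d >> 1);
    int64_t q = biased / d;
    if (biased % d < 0)
        --q;                          /* turn truncation into floor */
    return q;
}

/* Saturate to the symmetric 32-bit range [-INT32_MAX, INT32_MAX]. */
static int32_t sat32(int64_t v)
{
    if (v >  INT32_MAX) return  INT32_MAX;
    if (v < -INT32_MAX) return -INT32_MAX;
    return (int32_t)v;
}

/* Reference model of one output sample.
 * x_hist[k] holds x[t-k]; b[k] is the matching coefficient. */
static int32_t fir_s32_model(const int32_t x_hist[], const int32_t b[],
                             unsigned n, unsigned shift)
{
    int64_t acc = 0;
    for (unsigned k = 0; k < n; ++k)
        acc += round_shift((int64_t)x_hist[k] * b[k], 30);   /* per-tap 30-bit shift */
    return sat32(round_shift(acc, shift));
}
```

With four coefficients of \(2^{28}\) (0.25 when read with 30 fractional bits) and shift = 0, the model returns the average of the four most recent samples.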

Operations

Initialize: A filter_fir_s32_t filter is initialized with a call to filter_fir_s32_init(). The caller supplies information about the filter, including the number of taps and pointers to the coefficients and a state buffer. It is typically recommended that the state buffer be cleared to all 0s before initializing.

Add Sample: To add a new input sample without computing a new output sample, use filter_fir_s32_add_sample(). This is a constant-time operation which does not depend on the number of filter taps. This may be useful in some situations, for example, to quickly pre-load the filter’s state buffer with multiple samples, without incurring the cost of computing an output with each added sample.

Process Sample: To process a new input sample and produce a new output sample, use filter_fir_s32().

Fields

After initialization via filter_fir_s32_init(), the contents of the filter_fir_s32_t struct are considered to be opaque, and may change between major versions. In general, user code should not need to access its members.

num_taps is the order of the filter, or the number of taps. It is also the (minimum) size of the buffers to which coef and state point, in elements (where each element is 4 bytes). The time required to process an input sample and produce an output sample is approximately linear in num_taps (see Performance below).

head is the index into state at which the next sample will be added.

shift is the unsigned arithmetic rounding saturating right-shift applied to the internal accumulator to get the final output.

coef is a pointer to a buffer (supplied by the user at initialization) containing the tap coefficients. The coefficients are stored in forward order, with lower indices corresponding to newer samples. coef[0], then, corresponds to b[0], coef[1] to b[1], and so on. None of the functions which operate on filter_fir_s32_t structs in this library will modify the contents of the buffer to which coef points. This buffer must be at least num_taps words long.

state is a pointer to a buffer (supplied by the user at initialization) containing the state data, a history of the num_taps most recent input samples. state is used in a circular fashion, with head indicating the index at which the next sample will be inserted.

Performance

More work remains to fully characterize the time performance of this FIR filter, but asymptotically (i.e. with a large number of filter taps) processing a new input sample to produce a new output sample takes approximately 3 thread cycles per 8 filter taps.

That assumes that both the coefficients (pointed to by coef) and state buffer (pointed to by state) are stored directly in SRAM.

Coefficient Scaling

Suppose you’re starting with a floating-point FIR filter model with coefficients B[k] which operates on a sequence of 32-bit integer input samples x[t] to get a result Y[t] where

Y[t] = x[t-0] * B[0] + x[t-1] * B[1] + ... + x[t-(N-1)] * B[N-1]

Because of the 30-bit right-shift and the right-shift of the final accumulator by shift bits, the coefficients b[k] to use with this library can be thought of as fixed-point values with 30 + shift fractional bits.

The floating-point coefficients B[k] can then be naively converted to fixed-point coefficients b[k] as follows:

shift = 0
b[k] = (int32_t) round(ldexp(B[k], 30))

After this, any further doubling of the coefficients can be compensated for without changing the overall gain by incrementing shift.

To maximize precision, you’ll typically want shift to be as large as possible, subject to the constraint that the worst case under consideration saturates neither the internal accumulator (which, for safety, should generally be assumed to be 42 bits) nor the final 32-bit output once shift is applied.

The specifics depend on several factors, such as your filter’s gain and the statistics of the sequence x[t] (e.g. any headroom x[t] is known a priori to have).
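As a rough illustration of the naive conversion above, the sketch below quantizes floating-point coefficients B[k] to b[k] = round(B[k] * 2^(30 + shift)) and picks the largest shift, up to a caller-supplied cap, for which every coefficient still fits in an int32_t. This addresses only coefficient representability: it assumes nothing about the input statistics and does not check accumulator or output saturation, which still have to be verified for the filter at hand. The names `pow2`, `quantize_one` and `fir_s32_quantize`, and the `max_shift` parameter, are hypothetical.

```c
#include <stdint.h>

/* 2^e as a double for small non-negative e (stands in for ldexp(1.0, e)). */
static double pow2(unsigned e)
{
    double p = 1.0;
    while (e--)
        p *= 2.0;
    return p;
}

/* Round an already-scaled value to int32_t, halves away from zero (as C's
 * round() does), clamping to the symmetric range [-INT32_MAX, INT32_MAX]. */
static int32_t quantize_one(double v)
{
    if (v >=  2147483647.0) return  INT32_MAX;
    if (v <= -2147483647.0) return -INT32_MAX;
    return (int32_t)(v >= 0.0 ? v + 0.5 : v - 0.5);
}

/* Quantize B[0..n-1] into b[0..n-1] and return the chosen shift.
 * Picks the largest shift <= max_shift for which no coefficient is clamped;
 * falls back to shift = 0 (with clamping) if none fits. */
static unsigned fir_s32_quantize(int32_t b[], const double B[], unsigned n,
                                 unsigned max_shift)
{
    unsigned shift = max_shift;
    while (shift > 0) {
        double scale = pow2(30u + shift);
        unsigned k = 0;
        while (k < n && B[k] * scale < 2147483647.5 && B[k] * scale > -2147483647.5)
            ++k;
        if (k == n)
            break;                    /* every coefficient fits at this shift */
        --shift;
    }
    double scale = pow2(30u + shift);
    for (unsigned k = 0; k < n; ++k)
        b[k] = quantize_one(B[k] * scale);
    return shift;
}
```

For a coefficient of 1/256 and a cap of 7, this yields shift = 7 and b[k] = \(2^{29}\).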

Filter Conversion

This library includes a python script which converts existing floating-point FIR filter coefficients into a suitable representation and generates code for easily initializing and executing the filter. See Note: Digital Filter Conversion for more.

Usage Example


#define N       256                     // Tap count
#define B_VAL   ldexp(1.0/N, 30+7)      // Value for (all) coefficients

const int32_t b[N] =                    // The filter coefficients
{ B_VAL, B_VAL, B_VAL, ..., B_VAL };
const right_shift_t shift = 7;          // The (unsigned) right-shift applied to the final accumulator
int32_t state_buff[N] = { 0 };          // Filter state buffer, initialized to 0's
filter_fir_s32_t filter;            // The filter struct

#define SAMPLE_COUNT    1024
int32_t x[SAMPLE_COUNT] = { ... };      // Some sequence of input samples

// Initialize
filter_fir_s32_init(&filter, state_buff, N, b, shift);

// Just add the first 64 without processing output samples. (not necessary)
for(unsigned i = 0; i < 64; i++)
    filter_fir_s32_add_sample(&filter, x[i]);

// Process the rest, generating a sequence of filtered output samples
int32_t y[SAMPLE_COUNT] = { 0 };        //Output samples (first 64 never get updated here)
for(unsigned i = 64; i < SAMPLE_COUNT; i++)
    y[i] = filter_fir_s32(&filter, x[i]);

// Do something with output sequence
...

This example creates a simple 256-tap filter which averages the most recent 256 samples.

Each b[k] is \(2^{29}\), and the final accumulator is right-shifted 7 bits. In the worst case, all input samples are \(-2^{31}\). In that case, the final accumulator value is \( 256 \cdot (2^{29} \cdot -2^{31} \cdot 2^{-30}) = -2^{38} \), well below the saturation limit of the accumulator. After shift is applied, that becomes \(-2^{38} \cdot 2^{-7} = -2^{31}\). Finally, the 32-bit symmetric saturation logic is applied, making the final output value \(-2^{31}+1\).

Notes

  1. state is a circular buffer, and so the index of x[t] within state changes with each input sample. The state field of this struct is considered to be opaque; its exact usage may change between versions.

  2. Ordinarily integer sums are associative, so the order in which elements are added does not affect the final result. The sum that the FIR filters use, however, is saturating, with the saturation logic being applied throughout the sum. This saturation is a hard non-linearity and is not associative. The details of exactly when each tap is accumulated and into which accumulator are complicated and subject to change. It is best to construct a filter such that no ordering of the taps will saturate the accumulators.
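Note 2 can be made concrete with a modeled 40-bit saturating accumulator. The sketch below is illustrative only (`ACC40_MAX` and `sat_add40` are hypothetical names, and the hardware's actual accumulation order is unspecified); it shows that regrouping the same three terms changes the result once saturation is hit.

```c
#include <stdint.h>

/* Symmetric 40-bit saturation bound, 2^39 - 1. */
#define ACC40_MAX ((int64_t)0x7FFFFFFFFFLL)

/* Saturating addition on a modeled 40-bit accumulator. */
static int64_t sat_add40(int64_t a, int64_t b)
{
    int64_t s = a + b;                 /* cannot overflow int64 for 40-bit operands */
    if (s >  ACC40_MAX) return  ACC40_MAX;
    if (s < -ACC40_MAX) return -ACC40_MAX;
    return s;
}
```

With a = ACC40_MAX, b = 1 and c = -1, evaluating (a + b) + c saturates first and gives ACC40_MAX - 1, whereas a + (b + c) gives ACC40_MAX.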

struct filter_fir_s16_t
#include <filter.h>

16-bit Discrete-Time Finite Impulse Response (FIR) Filter

Filter Model

This struct represents an N-tap 16-bit discrete-time FIR Filter.

At each time step, the FIR filter consumes a single 16-bit input sample and produces a single 16-bit output sample.

To process a new input sample and compute a new output sample, use filter_fir_s16(). To add a new input sample to the filter without computing a new output sample, use filter_fir_s16_add_sample().

An N-tap FIR filter contains N 16-bit coefficients (pointed to by coef) and N int16_ts of state data (pointed to by state). The state data is a vector of the N most recent input samples. When processing a new input sample at time step t, x[t] is the new input sample, x[t-1] is the previous input sample, and so on, up to x[t-(N-1)], which is the oldest input considered when computing the new output sample (see note 1 below). The coefficients form a vector b[], where b[k] is the coefficient by which the input sample x[t-k] is multiplied. There is an additional parameter shift which scales the output as described below. Both the coefficients and shift are considered to be constants which do not change after initialization (although nothing should break if they are changed to new valid values).

At time step t, the output sample y[t] is computed based on the inner product (i.e. sum of element-wise products) of the coefficients and state data as follows (a more detailed description is below):

acc = x[t-0] * b[0] + x[t-1] * b[1] + x[t-2] * b[2] + ... + x[t-(N-1)] * b[N-1]
y[t] = acc >> shift

Unlike the 32-bit FIR filters (see filter_fir_s32_t), the products x[t-k] * b[k] are the raw 32-bit products of the 16-bit elements. These element-wise products accumulate into a 32-bit accumulator which saturates the sums at symmetric 32-bit bounds (see saturation).

After all taps have been accumulated, a rounding arithmetic right-shift of shift bits is applied to the 32-bit sum, and the final result is saturated to the symmetric 16-bit range (the open interval \((-2^{15}, 2^{15})\)).

Below is a more detailed description of the operations performed (not including the saturation logic applied by the accumulators).

\[\begin{split} & y[t] = sat_{16} \left( round \left( \left( \sum_{k=0}^{N-1} round(x[t-k] \cdot b[k]) \right) \cdot 2^{-shift} \right) \right) \\ & \qquad\text{where } sat_{16}() \text{ saturates to } \pm(2^{15}-1) \\ & \qquad\text{ and } round() \text{ rounds to the nearest integer, with ties rounding towards } +\!\infty \end{split}\]
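A portable reference model of the 16-bit arithmetic may help when checking coefficient scaling by hand. The sketch below is not the library's implementation. Unlike the equation, it also models the accumulator saturation, under the assumption that taps are accumulated in index order with the symmetric 32-bit saturation applied after every tap; the real accumulation order may differ. `sat_sym`, `round_shift` and `fir_s16_model` are hypothetical names, and `x_hist[k]` is assumed to hold x[t-k].

```c
#include <assert.h>
#include <stdint.h>

/* Saturate v to the symmetric range [-limit, limit]. */
static int64_t sat_sym(int64_t v, int64_t limit)
{
    if (v >  limit) return  limit;
    if (v < -limit) return -limit;
    return v;
}

/* Rounding arithmetic right-shift by s bits; ties round toward +infinity.
 * Division-based so it does not rely on >> of negative values. */
static int64_t round_shift(int64_t v, unsigned s)
{
    assert(s < 63);
    if (s == 0)
        return v;
    int64_t d = (int64_t)1 << s;
    int64_t biased = v + (d >> 1);
    int64_t q = biased / d;
    if (biased % d < 0)
        --q;                          /* turn truncation into floor */
    return q;
}

/* Reference model of one 16-bit FIR output sample. */
static int16_t fir_s16_model(const int16_t x_hist[], const int16_t b[],
                             unsigned n, unsigned shift)
{
    int64_t acc = 0;
    for (unsigned k = 0; k < n; ++k) {
        int32_t p = (int32_t)x_hist[k] * (int32_t)b[k];   /* raw 32-bit product */
        acc = sat_sym(acc + p, INT32_MAX);                /* modeled 32-bit accumulator */
    }
    return (int16_t)sat_sym(round_shift(acc, shift), INT16_MAX);
}
```

For instance, two coefficients of 16384 with shift = 15 give each tap a gain of 0.5, so inputs 1000 and -2000 produce -500.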

Operations

Initialize: A filter_fir_s16_t filter is initialized with a call to filter_fir_s16_init(). The caller supplies information about the filter, including the number of taps and pointers to the coefficients and a state buffer. It is typically recommended that the state buffer be cleared to all 0s before initializing.

Add Sample: To add a new input sample without computing a new output sample, use filter_fir_s16_add_sample(). Unlike filter_fir_s32_add_sample(), this is not a constant-time operation, and does depend on the number of filter taps. Nevertheless, this is faster than computing output samples, and may be useful in some situations, for example, to more quickly pre-load the filter’s state buffer with multiple samples, without incurring the cost of computing an output with each added sample.

Process Sample: To process a new input sample and produce a new output sample, use filter_fir_s16().

Fields

After initialization via filter_fir_s16_init(), the contents of the filter_fir_s16_t struct are considered to be opaque, and may change between major versions. In general, user code should not need to access its members.

num_taps is the order of the filter, or the number of taps. It is also the (minimum) size of the buffers to which coef and state point, in elements (where each element is 2 bytes). The time required to process an input sample and produce an output sample is approximately linear in num_taps (see Performance below).

shift is the unsigned arithmetic rounding saturating right-shift applied to the internal accumulator to get the final output.

coef is a pointer to a buffer (supplied by the user at initialization) containing the tap coefficients. The coefficients are stored in forward order, with lower indices corresponding to newer samples. coef[0], then, corresponds to b[0], coef[1] to b[1], and so on. None of the functions which operate on filter_fir_s16_t structs in this library will modify the contents of the buffer to which coef points. This buffer must be at least num_taps elements long, and must begin at a word-aligned address.

state is a pointer to a buffer (supplied by the user at initialization) containing the state data, a history of the num_taps most recent input samples. state must begin at a word-aligned address.

Coefficient Scaling

Filter Conversion

This library includes a python script which converts existing floating-point FIR filter coefficients into a suitable representation and generates code for easily initializing and executing the filter. See Note: Digital Filter Conversion for more.

Todo:
Usage Example

struct filter_biquad_s32_t
#include <filter.h>

A biquad filter block.

Contains the coefficient and state information for a cascade of up to 8 biquad filter sections.

To process a new input sample, filter_biquad_s32() or filter_biquad_sat_s32() can be used with a pointer to one of these structs.

For longer cascades, an array of filter_biquad_s32_t structs can be used with filter_biquads_s32() or filter_biquads_sat_s32().

Filter Conversion

This library includes a python script which converts existing floating-point cascaded biquad filter coefficients into a suitable representation and generates code for easily initializing and executing the filter. See Note: Digital Filter Conversion for more.

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Scalar API£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/scalar/scalar_index.html#scalar-api

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Scalar API$$$Scalar API quick reference£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/scalar/scalar_quickref.html#scalar-api-quick-reference
XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Scalar API$$$Scalar API quick reference$$$Scalar type conversion£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/scalar/scalar_quickref.html#scalar-type-conversion

Scalar type conversion

Function                    Type In                Type Out
f32_unpack()                float                  int32_t, exponent_t
f32_unpack_s16()            float                  int16_t, exponent_t
f32_to_float_s32()          float                  float_s32_t
f64_to_float_s32()          double                 float_s32_t
float_s32_to_float_s64()    float_s32_t            float_s64_t
float_s32_to_float()        float_s32_t            float
float_s32_to_double()       float_s32_t            double
s16_to_s32()                int16_t, exponent_t    int32_t, exponent_t
s32_to_s16()                int32_t, exponent_t    int16_t, exponent_t
s64_to_s32()                int64_t, exponent_t    int32_t, exponent_t
s32_to_f32()                int32_t, exponent_t    float
radians_to_sbrads()         radian_q24_t           sbrad_t
s32_to_chunk_s32()          int32_t                int32_t[8]
float_s64_to_float_s32()    float_s64_t            float_s32_t

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Scalar API$$$Scalar API quick reference$$$Fixed-point scalar ops£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/scalar/scalar_quickref.html#fixed-point-scalar-ops
Fixed-point scalar ops

Function               Input Depth    Fractional Bits    Brief
s16_inverse()          16             0                  \(x^{-1}\)
s32_inverse()          32             0                  \(x^{-1}\)
sbrad_sin()            32             31                 \(\sin(x)\)
sbrad_tan()            32             31                 \(\tan(x)\)
q24_sin()              32             24                 \(\sin(x)\)
q24_cos()              32             24                 \(\cos(x)\)
q24_tan()              32             24                 \(\tan(x)\)
q30_exp_small()        32             30                 \(\exp(x)\)
q24_logistic()         32             24                 \(\frac{1}{1+e^{-x}}\)
q24_logistic_fast()    32             24                 \(\frac{1}{1+e^{-x}}\)
q30_powers()           32             30                 \((0,x,x^2,x^3,\dots)\)
u32_ceil_log2()        32             0                  \(\lceil\log_2(x)\rceil\)

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Scalar API$$$Scalar API quick reference$$$IEEE 754 float ops£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/scalar/scalar_quickref.html#ieee-754-float-ops
IEEE 754 float ops

Function              Brief
f32_sin()             \(\sin(x)\)
f32_cos()             \(\cos(x)\)
f32_log2()            \(\log_2(x)\)
f32_power_series()    Evaluate Power Series
f32_normA()           Normalized Form A

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Scalar API$$$Scalar API quick reference$$$Non-standard scalar float ops£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/scalar/scalar_quickref.html#non-standard-scalar-float-ops
Non-standard scalar float ops

Function             Brief
float_s32_mul()      \(x \times y\)
float_s32_add()      \(x + y\)
float_s32_sub()      \(x - y\)
float_s32_div()      \(\frac{x}{y}\)
float_s32_abs()      \(\left|x\right|\)
float_s32_gt()       \(x > y\)
float_s32_gte()      \(x \ge y\)
float_s32_ema()      \(\alpha x + (1 - \alpha) y\)
float_s32_sqrt()     \(\sqrt{x}\)
float_s32_exp()      \(\exp(x)\)
s16_mul()            \(x \times y\)
s32_sqrt()           \(\sqrt{x}\)
s32_mul()            \(x \times y\)
s32_odd_powers()     \(x, x^3, x^5, x^7, \dots\)

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Scalar API$$$Scalar API quick reference$$$Non-standard complex scalar float ops£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/scalar/scalar_quickref.html#non-standard-complex-scalar-float-ops
Non-standard complex scalar float ops

Function                   Brief
float_complex_s16_mul()    \(x \times y\)
float_complex_s16_add()    \(x + y\)
float_complex_s16_sub()    \(x - y\)
float_complex_s32_mul()    \(x \times y\)
float_complex_s32_add()    \(x + y\)
float_complex_s32_sub()    \(x - y\)

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Scalar API$$$16-bit scalar API£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/scalar/scalar_s16.html#bit-scalar-api
group scalar_s16_api

Functions

int32_t s16_to_s32(exponent_t *a_exp, const int16_t b, const exponent_t b_exp, const unsigned remove_hr)

Convert a 16-bit floating-point scalar to a 32-bit floating-point scalar.

Converts a 16-bit floating-point scalar, represented by the 16-bit mantissa b and exponent b_exp, into a 32-bit floating-point scalar, represented by the 32-bit returned mantissa and output exponent a_exp.

remove_hr, if nonzero, indicates that the output mantissa should have no headroom. Otherwise, the output mantissa will be the same as the input mantissa.

Parameters:
  • a_exp[out] Output exponent

  • b[in] 16-bit input mantissa

  • b_exp[in] Input exponent

  • remove_hr[in] Whether to remove headroom in output

Returns:

32-bit output mantissa
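The effect of remove_hr can be illustrated with a small model in portable C. This is a sketch based only on the description above, not the library's code: `headroom_s32` and `s16_to_s32_model` are hypothetical names, a plain `int` stands in for exponent_t, and the handling of a zero mantissa (treated here as having 31 bits of headroom) is an assumption.

```c
#include <stdint.h>

/* Headroom of a 32-bit value: number of redundant leading sign bits (0..31). */
static unsigned headroom_s32(int32_t v)
{
    uint32_t m = (uint32_t)(v < 0 ? ~v : v);   /* fold negatives onto non-negatives */
    unsigned hr = 0;
    for (int bit = 30; bit >= 0 && !((m >> bit) & 1u); --bit)
        ++hr;
    return hr;
}

/* Model of the conversion: returns the 32-bit mantissa and writes *a_exp so
 * that mantissa * 2^(*a_exp) equals b * 2^b_exp. */
static int32_t s16_to_s32_model(int *a_exp, int16_t b, int b_exp, unsigned remove_hr)
{
    int32_t a = b;                                  /* sign-extend the mantissa */
    unsigned hr = remove_hr ? headroom_s32(a) : 0;  /* bits to shift left */
    *a_exp = b_exp - (int)hr;
    /* Multiply instead of << to avoid left-shifting a negative value. */
    return (int32_t)((int64_t)a * ((int64_t)1 << hr));
}
```

With b = 3 and b_exp = -4, the model returns 3 with exponent -4 when remove_hr is zero, and \(3 \cdot 2^{29}\) with exponent -33 otherwise; both represent the same value.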

int16_t s16_inverse(exponent_t *a_exp, const int16_t b)

Compute the inverse of a 16-bit integer.

b represents the integer \(b\). a and a_exp together represent the result \(a \cdot 2^{a\_exp}\).

Operation Performed

\[\begin{aligned} a \cdot 2^{a\_exp} \leftarrow \frac{1}{b} \end{aligned}\]

Parameters:
  • a_exp[out] Output exponent \(a\_exp\)

  • b[in] Input integer \(b\)

Returns:

Output mantissa \(a\)
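To see how an integer reciprocal can be expressed as a mantissa and exponent, consider the following model in portable C. It is a sketch under stated assumptions, not the library's algorithm: it picks the largest power-of-two scale for which the rounded mantissa still fits in 16 bits, and the library's exact choice of exponent and rounding may differ. `s16_inverse_model` is a hypothetical name and a plain `int` stands in for exponent_t.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a 16-bit reciprocal: returns a and writes *a_exp such that
 * a * 2^(*a_exp) approximates 1 / b. */
static int16_t s16_inverse_model(int *a_exp, int16_t b)
{
    assert(b != 0);                                   /* 1/0 is undefined */
    int64_t mag = b < 0 ? -(int64_t)b : (int64_t)b;   /* |b|, safe for INT16_MIN */
    for (int k = 30; k >= 0; --k) {
        int64_t a = (((int64_t)1 << k) + mag / 2) / mag;   /* round(2^k / |b|) */
        if (a <= INT16_MAX) {
            *a_exp = -k;
            return (int16_t)(b < 0 ? -a : a);
        }
    }
    /* Not reached: at k = 0 the rounded quotient is at most 1. */
    *a_exp = 0;
    return 0;
}
```

For b = 4 the model yields a = 16384 with a_exp = -16, which is exactly 0.25.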

int16_t s16_mul(exponent_t *a_exp, const int16_t b, const int16_t c, const exponent_t b_exp, const exponent_t c_exp)

Compute the product of two 16-bit floating-point scalars.

a and a_exp together represent the result \(a \cdot 2^{a\_exp}\).

b and b_exp together represent the input \(b \cdot 2^{b\_exp}\).

c and c_exp together represent the input \(c \cdot 2^{c\_exp}\).

Operation Performed

\[\begin{aligned} a \cdot 2^{a\_exp} \leftarrow \left( b\cdot 2^{b\_exp} \right) \cdot \left( c\cdot 2^{c\_exp} \right) \end{aligned}\]

Parameters:
  • a_exp[out] Output exponent \(a\_exp\)

  • b[in] First input mantissa \(b\)

  • c[in] Second input mantissa \(c\)

  • b_exp[in] First input exponent \(b\_exp\)

  • c_exp[in] Second input exponent \(c\_exp\)

Returns:

Output mantissa \(a\)

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Scalar API$$$32-bit scalar API£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/scalar/scalar_s32.html#bit-scalar-api
group scalar_s32_api

Defines

S32_SQRT_MAX_DEPTH

Maximum bit-depth to calculate with s32_sqrt().

Functions

float s32_to_f32(const int32_t mantissa, const exponent_t exp)

Pack a floating point value into an IEEE 754 single-precision float.

The value returned is the nearest representable approximation to \( m \cdot 2^{p} \) where \(m\) is mantissa and \(p\) is exp.

Example

// Pack -12345678 * 2^{-13} into a float
int32_t mant = -12345678;
exponent_t exp = -13;
float val = s32_to_f32(mant, exp);

printf("%e <-- %ld * 2^(%d)\n", val, mant, exp);

Note

This operation may result in a loss of precision.

Parameters:
  • mantissa[in] Mantissa of value to be packed

  • exp[in] Exponent of value to be packed

Returns:

float representation of input value

int16_t s32_to_s16(exponent_t *a_exp, const int32_t b, const exponent_t b_exp)

Convert a 32-bit floating-point scalar to a 16-bit floating-point scalar.

Converts a 32-bit floating-point scalar, represented by the 32-bit mantissa b and exponent b_exp, into a 16-bit floating-point scalar, represented by the 16-bit returned mantissa and output exponent a_exp.

Parameters:
  • a_exp[out] Output exponent

  • b[in] 32-bit input mantissa

  • b_exp[in] Input exponent

Returns:

16-bit output mantissa

int32_t s32_sqrt(exponent_t *a_exp, const int32_t b, const exponent_t b_exp, const unsigned depth)

Compute the square root of a 32-bit floating-point scalar.

b and b_exp together represent the input \(b \cdot 2^{b\_exp}\). Likewise, a and a_exp together represent the result \(a \cdot 2^{a\_exp}\).

depth indicates the number of MSb’s which will be calculated. Smaller values here will execute more quickly at the cost of reduced precision. The maximum valid value for depth is S32_SQRT_MAX_DEPTH.

Operation Performed

\[\begin{aligned} a \cdot 2^{a\_exp} \leftarrow \sqrt{\left( b \cdot 2^{b\_exp} \right)} \end{aligned}\]

Parameters:
  • a_exp[out] Output exponent \(a\_exp\)

  • b[in] Input mantissa \(b\)

  • b_exp[in] Input exponent \(b\_exp\)

  • depth[in] Number of most significant bits to calculate

Returns:

Output mantissa \(a\)
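
The effect of depth can be illustrated with a bit-by-bit square root in plain C. This is a minimal sketch, not the library's implementation: ref_s32_sqrt and REF_SQRT_MAX_DEPTH are hypothetical names, exponent_t is modelled as int, and the normalisation chosen here is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical limit for this sketch; not the library's S32_SQRT_MAX_DEPTH. */
#define REF_SQRT_MAX_DEPTH 31u

/* Illustrative reference only: returns mantissa a and writes *a_exp so that
 * a * 2^(*a_exp) approximates sqrt(b * 2^b_exp). Only the `depth` most
 * significant bits of a are resolved. Requires b >= 0. */
static int32_t ref_s32_sqrt(int *a_exp, int32_t b, int b_exp, unsigned depth)
{
    assert(b >= 0);
    assert(depth >= 1 && depth <= REF_SQRT_MAX_DEPTH);

    if (b == 0) {
        *a_exp = 0;
        return 0;
    }

    uint64_t v = (uint64_t)b;
    int e = b_exp;

    /* Make the exponent even so that it can be halved exactly. */
    if (e % 2 != 0) {
        v <<= 1;
        e -= 1;
    }

    /* Normalise: scale v into [2^60, 2^62) in steps of 2^2. */
    while (v < ((uint64_t)1 << 60)) {
        v <<= 2;
        e -= 2;
    }

    /* Resolve the root one bit at a time, most significant bit first. */
    uint32_t r = 0;
    for (unsigned i = 0; i < depth; i++) {
        uint32_t trial = r | ((uint32_t)1 << (30 - i));
        if ((uint64_t)trial * trial <= v) {
            r = trial;
        }
    }

    *a_exp = e / 2;
    return (int32_t)r;
}
```

With b = 2, b_exp = 0 and depth = 31 the sketch resolves all 31 mantissa bits; with depth = 8 the low 23 bits of the mantissa stay zero, which is the precision-for-speed trade described above.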

int32_t s32_inverse(exponent_t *a_exp, const int32_t b)

Compute the inverse of a 32-bit integer.

b represents the integer \(b\). a and a_exp together represent the result \(a \cdot 2^{a\_exp}\).

Operation Performed

\[\begin{aligned} a \cdot 2^{a\_exp} \leftarrow \frac{1}{b} \end{aligned}\]

If \(b\) is the mantissa of a fixed- or floating-point value with an implicit or explicit exponent \(b\_exp\), then

Fixed- or Floating-point

\( \begin{aligned} \frac{1}{b \cdot 2^{b\_exp}} &= \frac{1}{b} \cdot 2^{-b\_exp} \\ &= a \cdot 2^{a\_exp} \cdot 2^{-b\_exp} \\ &= a \cdot 2^{a\_exp - b\_exp} \end{aligned} \)

and so \(b\_exp\) should be subtracted from the output exponent \(a\_exp\).

Parameters:
  • a_exp[out] Output exponent \(a\_exp\)

  • b[in] Input integer \(b\)

Returns:

Output mantissa \(a\)
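
The exponent adjustment described above can be sketched in plain C. The helper names ref_s32_inverse and ref_inverse_with_exp are hypothetical, exponent_t is modelled as int, and the mantissa scaling is an assumption made for illustration rather than the library's actual behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative reference only (not the library implementation): returns a
 * and writes *a_exp such that a * 2^(*a_exp) approximates 1 / b. */
static int32_t ref_s32_inverse(int *a_exp, int32_t b)
{
    assert(b != 0);

    /* Count the bits needed for |b|, using 64 bits so INT32_MIN is safe. */
    int64_t mag = (b < 0) ? -(int64_t)b : (int64_t)b;
    int bits = 0;
    while ((mag >> bits) != 0) {
        bits++;
    }

    /* Scale the numerator so the quotient magnitude lies in (2^29, 2^30]. */
    int scale = 29 + bits;
    *a_exp = -scale;
    return (int32_t)((((int64_t)1) << scale) / b);
}

/* Inverse of the value b * 2^b_exp: b_exp is subtracted from the exponent
 * produced for 1 / b. */
static int32_t ref_inverse_with_exp(int *a_exp, int32_t b, int b_exp)
{
    int32_t a = ref_s32_inverse(a_exp, b);
    *a_exp -= b_exp;
    return a;
}
```

For example, inverting \(4 \cdot 2^{-2}\) (that is, 1.0) yields an output exponent 2 greater than inverting the plain integer 4, because \(b\_exp = -2\) is subtracted.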

int32_t s32_mul(exponent_t *a_exp, const int32_t b, const int32_t c, const exponent_t b_exp, const exponent_t c_exp)

Compute the product of two 32-bit floating-point scalars.

a and a_exp together represent the result \(a \cdot 2^{a\_exp}\).

b and b_exp together represent the input \(b \cdot 2^{b\_exp}\).

c and c_exp together represent the input \(c \cdot 2^{c\_exp}\).

Operation Performed

\[\begin{aligned} a \cdot 2^{a\_exp} \leftarrow \left( b\cdot 2^{b\_exp} \right) \cdot \left( c\cdot 2^{c\_exp} \right) \end{aligned}\]

Parameters:
  • a_exp[out] Output exponent \(a\_exp\)

  • b[in] First input mantissa \(b\)

  • c[in] Second input mantissa \(c\)

  • b_exp[in] First input exponent \(b\_exp\)

  • c_exp[in] Second input exponent \(c\_exp\)

Returns:

Output mantissa \(a\)
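
As a rough illustration of the operation above, the mantissas are multiplied and the exponents are added. The sketch below uses the hypothetical name ref_s32_mul, models exponent_t as int, and picks a simple truncating renormalisation; the library's rounding and headroom handling may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative reference only: a * 2^(*a_exp) ~= (b * 2^b_exp) * (c * 2^c_exp). */
static int32_t ref_s32_mul(int *a_exp, int32_t b, int32_t c, int b_exp, int c_exp)
{
    assert(a_exp != NULL);

    int64_t p = (int64_t)b * (int64_t)c; /* exact 64-bit product of mantissas */
    int e = b_exp + c_exp;               /* exponents add */

    /* Halve the product (truncating toward zero) until it fits in 32 bits,
     * bumping the exponent each time so the represented value is preserved. */
    while (p > INT32_MAX || p < INT32_MIN) {
        p /= 2;
        e += 1;
    }

    *a_exp = e;
    return (int32_t)p;
}
```

Multiplying \(2^{30} \cdot 2^{-30}\) by itself gives a 64-bit product of \(2^{60}\), which the loop reduces back to the mantissa \(2^{30}\) with exponent \(-30\).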

sbrad_t radians_to_sbrads(const radian_q24_t theta)

Convert angle from radians to a modified binary representation.

Some trig functions, such as sbrad_sin(), rather than taking an angle specified in radians (e.g. radian_q24_t), require their argument to be a modified representation of the angle, as an sbrad_t. The modified binary representation takes into account various properties of the \(sin(\theta)\) function to simplify certain operations.

For any angle \(\theta\) there is a unique angle \(\alpha\) where \(-1\le\alpha\le1\) and \(sin(\frac{\pi}{2}\alpha) = sin(\theta)\). This function essentially just maps the input angle \(\theta\) onto the corresponding angle \(\alpha\) in that region and returns the result in a Q1.31 format.

In this library, the unit of the resulting angle \(\alpha\) is referred to as an ‘sbrad’. ‘brad’ because \(\alpha\) is a kind of binary angular measurement, and ‘s’ because the symmetries of \(sin(\theta)\) are what’s being accounted for.

Parameters:
  • theta[in] Input angle \(\theta\), in radians (Q8.24)

Returns:

Output angle \(\alpha\), in sbrads
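
The mapping from \(\theta\) to \(\alpha\) can be sketched in double precision using the periodicity and reflection symmetry of sine. The function ref_radians_to_alpha and the constant REF_PI are hypothetical names for this illustration; the library function works on fixed-point values instead, and its exact output scaling is not reproduced here.

```c
#include <assert.h>

#define REF_PI 3.14159265358979323846

/* Illustrative only: maps theta (radians) onto alpha in [-1, 1] such that
 * sin((pi/2) * alpha) == sin(theta). */
static double ref_radians_to_alpha(double theta)
{
    /* The sketch wraps by repeated subtraction, so keep the input modest. */
    assert(theta > -1000.0 && theta < 1000.0);

    /* Periodicity: sin(theta) == sin(theta + 2*pi*k). Wrap into [-pi, pi]. */
    while (theta > REF_PI) {
        theta -= 2.0 * REF_PI;
    }
    while (theta < -REF_PI) {
        theta += 2.0 * REF_PI;
    }

    /* Reflection: sin(theta) == sin(pi - theta). Fold into [-pi/2, pi/2]. */
    if (theta > REF_PI / 2.0) {
        theta = REF_PI - theta;
    } else if (theta < -REF_PI / 2.0) {
        theta = -REF_PI - theta;
    }

    /* Express the folded angle in units of pi/2. */
    return theta / (REF_PI / 2.0);
}
```

For instance, \(\theta = \frac{3\pi}{4}\) folds to \(\frac{\pi}{4}\) and so maps to \(\alpha = 0.5\).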

q2_30 sbrad_sin(const sbrad_t theta)

Compute the sine of the specified angle.

This function computes \(sin(\frac{\pi}{2}\theta)\), returning the result in Q2.30 format.

The input angle \(\theta\) must be expressed in sbrads (sbrad_t), and must represent a value between \(\pm 0.5\) (inclusive) (as a Q1.31).

Operation Performed

\[\begin{aligned} & sin(\frac{\pi}{2}\theta) \end{aligned}\]

Parameters:
  • theta[in] Input angle \(\theta\), in sbrads

Returns:

Sine of the specified angle in Q2.30 format.

q2_30 sbrad_tan(const sbrad_t theta)

Compute the tangent of the specified angle.

This function computes \(tan(\frac{\pi}{2}\theta)\), returning the result in Q2.30 format.

The input angle \(\theta\) must be expressed in sbrads (sbrad_t), and must represent a value between \(\pm 0.25\) (inclusive) (as a Q1.31).

Operation Performed

\[\begin{aligned} & tan(\frac{\pi}{2}\theta) \end{aligned}\]

Parameters:
  • theta[in] Input angle \(\theta\), in sbrads

Returns:

Tangent of the specified angle in Q2.30 format.

q2_30 q24_sin(const radian_q24_t theta)

Compute the sine of the specified angle.

This function computes \(sin(\theta)\), returning the result in Q2.30 format.

Operation Performed

\[\begin{aligned} & sin(\theta) \end{aligned}\]

Parameters:
  • theta[in] Input angle \(\theta\), in radians (Q8.24)

Returns:

\(sin(\theta)\) as a Q2.30

q2_30 q24_cos(const radian_q24_t theta)

Compute the cosine of the specified angle.

This function computes \(cos(\theta)\), returning the result in Q2.30 format.

Operation Performed

\[\begin{aligned} & cos(\theta) \end{aligned}\]

Parameters:
  • theta[in] Input angle \(\theta\), in radians (Q8.24)

Returns:

\(cos(\theta)\) as a Q2.30

float_s32_t q24_tan(const radian_q24_t theta)

Compute the tangent of the specified angle.

This function computes \(tan(\theta)\). The result is returned as a float_s32_t containing a mantissa and exponent.

The value of \(tan(\theta)\) is considered undefined where \(\theta=\frac{\pi}{2}+k\pi\) for any integer \(k\). An exception will be raised if \(\theta\) meets this condition.

Operation Performed

\[\begin{aligned} & tan(\theta) \end{aligned}\]

Parameters:
  • theta[in] Input angle \(\theta\), in radians (Q8.24)

Throws ET_ARITHMETIC:

Raised if \(tan(\theta)\) is undefined.

Returns:

\(tan(\theta)\) as a float_s32_t

q2_30 q30_exp_small(const q2_30 x)

Compute \(e^x\) for Q2.30 value near \(0\).

This function computes \(e^x\) where \(x\) is a fixed-point value with 30 fractional bits.

This function implements \(e^x\) using a truncated power series, and is only intended to be used for inputs in the range \(-0.5 \le x \le 0.5\).

The output is also in the Q2.30 format.

For the range \(-0.5 \le x \le 0.5\), the maximum observed error (compared to exp(double) from math.h) was 2 (which corresponds to \(2^{-29}\)).

For the range \(-1.0 \le x \le 1.0\), the corresponding maximum observed error was 324, or approximately \(2^{-21}\).

To compute \(e^x\) for \(x\) outside of \(\left[-0.5, 0.5\right]\), use float_s32_exp().

Operation Performed

\[\begin{aligned} & y \leftarrow e^x \end{aligned}\]

Parameters:
  • x[in] Input value \(x\)

Returns:

\(y\)
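
To make the truncated power series concrete, the sketch below evaluates \(\sum_k x^k / k!\) with every term held in Q2.30. The name ref_q30_exp_small and the choice of how many terms to keep are assumptions for illustration; the library's term count, rounding and error figures are not reproduced by this code.

```c
#include <assert.h>
#include <stdint.h>

#define Q30_ONE ((int64_t)1 << 30)

/* Illustrative only: e^x for a Q2.30 input x in [-0.5, 0.5], computed as the
 * first `terms` terms of the power series, all in Q2.30 fixed point. */
static int32_t ref_q30_exp_small(int32_t x, unsigned terms)
{
    assert(terms >= 1);
    assert(x >= -(Q30_ONE / 2) && x <= Q30_ONE / 2);

    int64_t term = Q30_ONE; /* x^0 / 0! = 1.0 */
    int64_t sum = term;

    for (unsigned k = 1; k < terms; k++) {
        /* term_k = term_{k-1} * x / k, rescaled back to Q2.30.
         * Integer division truncates toward zero. */
        term = (term * x) / Q30_ONE / (int64_t)k;
        sum += term;
    }

    return (int32_t)sum;
}
```

An error of 2 in the returned integer corresponds to \(2 \cdot 2^{-30} = 2^{-29}\) in real terms, which is how the error figures above translate between the two scales.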

q8_24 q24_logistic(const q8_24 x)

Evaluate the logistic function at the specified point.

This function computes the value of the logistic function \(y =\frac{1}{1+e^{-x}}\). This is a sigmoidal curve bounded below by \(y = 0\) and above by \(y = 1\).

The input \(x\) and output \(y\) are both Q8.24 fixed-point values.

If speed is greatly preferred to precision, q24_logistic_fast() can be used instead.

Operation Performed

\[\begin{aligned} & y \leftarrow \frac{1}{1+e^{-x}} \end{aligned}\]

Parameters:
  • x[in] Input value \(x\)

Returns:

\(y\)

q8_24 q24_logistic_fast(const q8_24 x)

Evaluate the logistic function at the specified point.

This function computes the value of the logistic function \(y =\frac{1}{1+e^{-x}}\). This is a sigmoidal curve bounded below by \(y = 0\) and above by \(y = 1\).

The input \(x\) and output \(y\) are both Q8.24 fixed-point values.

This implementation trades off precision for speed, approximating results in a piece-wise linear manner. If a precise result is desired, q24_logistic() should be used instead.

Operation Performed

\[\begin{aligned} & y \leftarrow \frac{1}{1+e^{-x}} \end{aligned}\]

Parameters:
  • x[in] Input value \(x\)

Returns:

\(y\)

void s32_to_chunk_s32(int32_t a[VPU_INT32_EPV], int32_t b)

Broadcast an integer to a vector chunk.

This function broadcasts the input \(b\) to the 8 elements of \(\bar a\).

Operation Performed

\[\begin{aligned} & a_k \leftarrow b \end{aligned}\]

Parameters:
  • a[out] Output chunk \(\bar a\)

  • b[in] Input value \(b\)

Throws ET_LOAD_STORE:

Raised if `a` is not double word-aligned (See Note: Vector Alignment)

void q30_powers(q2_30 a[], const q2_30 b, const unsigned N)

Get the first \(N\) powers of \(b\).

This function computes the first \(N\) powers (starting with \(0\)) of the Q2.30 input \(b\). The results are output as \(\bar a\), also in Q2.30 format.

Operation Performed

\[\begin{split}\begin{aligned} & a_0 \leftarrow 2^{30} = \mathtt{Q30(1.0)} \\ & a_k \leftarrow round\left(\frac{a_{k-1}\cdot b}{2^{30}}\right) \\ & \qquad\text{for }k \in \{1..N-1\} \end{aligned}\end{split}\]

Parameters:
  • a[out] Output \(\bar a\)

  • b[in] Input \(b\)

  • N[in] Number of elements of \(\bar a\) to compute
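
The recurrence above can be written out in plain C as follows. This is a sketch under assumptions: ref_q30_powers and round_div_q30 are hypothetical names, q2_30 is modelled as int32_t, and rounding ties upward is a guess at what \(round()\) means here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define Q30_ONE ((int64_t)1 << 30)

/* Divide a 64-bit product by 2^30, rounding to nearest (ties round up).
 * Floor division is written out so negative values behave predictably. */
static int32_t round_div_q30(int64_t p)
{
    p += Q30_ONE / 2;
    int64_t q = p / Q30_ONE;
    if ((p % Q30_ONE) < 0) {
        q -= 1;
    }
    return (int32_t)q;
}

/* Illustrative only: a[k] = b^k in Q2.30 for k = 0 .. N-1. */
static void ref_q30_powers(int32_t a[], int32_t b, unsigned N)
{
    assert(a != NULL || N == 0);
    if (N == 0) {
        return;
    }

    a[0] = (int32_t)Q30_ONE; /* b^0 = 1.0 */
    for (unsigned k = 1; k < N; k++) {
        a[k] = round_div_q30((int64_t)a[k - 1] * b);
    }
}
```

With \(b = 0.5\) (the integer \(2^{29}\)), the first four outputs are \(2^{30}\), \(2^{29}\), \(2^{28}\) and \(2^{27}\).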

void s32_odd_powers(int32_t a[], const int32_t b, const unsigned count, const right_shift_t shr)

Fill vector with odd powers of \(b\).

This function populates the elements of output vector \(\bar a\) with the odd powers of input \(b\). The first count odd powers of \(b\) are output. The highest power output will be \(2\cdot\mathtt{count}-1\).

The 64-bit product of each multiplication is right-shifted by shr bits and truncated to the 32 least significant bits. If \(b\) is a fixed-point value with shr fractional bits, then each \(a_k\) will have the same Q-format as input \(b\). shr must be non-negative.

This function neither rounds nor saturates results. It is up to the user to ensure overflows are avoided.

Typical use-case is computing a power series of a function with odd symmetry.

Operation Performed

\[\begin{split}\begin{aligned} & b_{sqr} = \frac{b^2}{2^{\mathtt{shr}}} \\ & a_0 \leftarrow b \\ & a_k \leftarrow \frac{a_{k-1} \cdot b_{sqr}}{2^{\mathtt{shr}}} \\ & \qquad\text{for } k \in \{1, 2, 3, ..., \mathtt{count} - 1\} \end{aligned}\end{split}\]

Parameters:
  • a[out] Output vector \(\bar a\)

  • b[in] Input \(b\)

  • count[in] Number of elements to output.

  • shr[in] Number of bits to right-shift 64-bit products.
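
A plain-C rendering of this recurrence might look like the sketch below. The name ref_s32_odd_powers is hypothetical, right_shift_t is modelled as unsigned, and the code assumes the compiler performs arithmetic right shifts on negative values and wraps on narrowing casts (true of GCC and Clang).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative only: a[k] = b^(2k+1), with each 64-bit product right-shifted
 * by shr bits and then truncated to its 32 least significant bits.
 * No rounding or saturation is applied. */
static void ref_s32_odd_powers(int32_t a[], int32_t b, unsigned count, unsigned shr)
{
    assert(a != NULL || count == 0);
    assert(shr < 63);
    if (count == 0) {
        return;
    }

    /* b squared, brought back to the Q-format of b. */
    int32_t b_sqr = (int32_t)(((int64_t)b * b) >> shr);

    a[0] = b;
    for (unsigned k = 1; k < count; k++) {
        a[k] = (int32_t)(((int64_t)a[k - 1] * b_sqr) >> shr);
    }
}
```

With \(b = 0.5\) in Q2.30 and shr = 30, the outputs are \(0.5\), \(0.5^3\) and \(0.5^5\), still in Q2.30.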

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Scalar API$$$Scalar IEEE 754 float API£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/scalar/scalar_f32.html#scalar-ieee-754-float-api
group scalar_f32_api

Functions

void f32_unpack(int32_t *mantissa, exponent_t *exp, const float input)

Unpack an IEEE 754 single-precision float into a 32-bit mantissa and exponent.

Example

// Unpack 1.52345246 * 10^(-5)
float val = 1.52345246e-5;
int32_t mant;
exponent_t exp;
f32_unpack(&mant, &exp, val);

printf("%ld * 2^(%d) <-- %e\n", mant, exp, val);

Parameters:
  • mantissa[out] Unpacked output mantissa

  • exp[out] Unpacked output exponent

  • input[in] Float value to be unpacked

void f32_unpack_s16(int16_t *mantissa, exponent_t *exp, const float input)

Unpack an IEEE 754 single-precision float into a 16-bit mantissa and exponent.

Example

// Unpack 1.52345246 * 10^(-5)
float val = 1.52345246e-5;
int16_t mant;
exponent_t exp;
f32_unpack_s16(&mant, &exp, val);

printf("%d * 2^(%d) <-- %e\n", mant, exp, val);

Note

This operation may result in a loss of precision.

Parameters:
  • mantissa[out] Unpacked output mantissa

  • exp[out] Unpacked output exponent

  • input[in] Float value to be unpacked

float_s32_t f32_to_float_s32(const float x)

Convert an IEEE754 float to a float_s32_t.

Parameters:
  • x[in] Input value

Throws ET_ARITHMETIC:

Raised if `x` is infinite or NaN

Returns:

float_s32_t representation of x

float_s32_t f64_to_float_s32(const double x)

Convert an IEEE754 double to a float_s32_t.

Note

This operation may result in precision loss.

Parameters:
  • x[in] Input value

Throws ET_ARITHMETIC:

Raised if `x` is infinite or NaN

Returns:

float_s32_t representation of x

float f32_sin(const float theta)

Get the sine of a specified angle.

Computes \(sin(\theta)\) using the power series expansion of \(sin()\) truncated to 8 terms.

This implementation is meant to make optimal use of the XS3 floating-point unit.

Parameters:
  • theta[in] Angle \(\theta\) to compute the sine of (in radians)

Throws ET_ARITHMETIC:

Raised if \(\theta\) is infinite or NaN

Returns:

Sine of the angle \(\theta\)

float f32_cos(const float theta)

Get the cosine of a specified angle.

Computes \(cos(\theta) = sin(\theta+\frac{\pi}{2})\) using the power series expansion of \(sin()\) truncated to 8 terms.

This implementation is meant to make optimal use of the XS3 floating-point unit.

Parameters:
  • theta[in] Angle \(\theta\) to compute the cosine of (in radians)

Throws ET_ARITHMETIC:

Raised if \(\theta\) is infinite or NaN

Returns:

Cosine of the angle \(\theta\)

float f32_log2(const float x)

Get the base-2 logarithm of the specified value.

This function computes \(log_2(x)\) using the power series expansion of \(log_2()\) truncated to 11 terms.

Parameters:
  • x[in] Input value \(x\) to get the logarithm of.

Throws ET_ARITHMETIC:

Raised if \(x\) is infinite or NaN

Returns:

\(log_2(x)\)

float f32_power_series(const float x, const float b[], const unsigned N)

Compute power series summation using specified coefficients.

This function is used to compute the sum of terms in a power series, truncated to \(N\) terms, starting with the \(x^0\) term.

b is an \(N\)-element vector of coefficients \(\bar b\) which are multiplied by the corresponding powers of \(x\).

\(N\) is the length of \(\bar b\) and number of terms to sum together.

Operation Performed

\[\begin{aligned} & a \leftarrow \sum_{k=0}^{N-1}\left( x^k \cdot b_k \right) \end{aligned}\]

Parameters:
  • x[in] Input value \(x\).

  • b[in] Vector of coefficients \(\bar b\).

  • N[in] Number of power series terms to sum.

Throws ET_ARITHMETIC:

Raised if \(x\) or any element of \(\bar b\) is infinite or NaN.

Returns:

\(a\), the sum of the first \(N\) power series terms.
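
One common way to evaluate such a sum is Horner's method, sketched below. This is a swapped-in technique chosen for illustration, with the hypothetical name ref_f32_power_series; the source does not say how the library orders its multiplications and additions, and the exception behaviour is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: returns b[0] + b[1]*x + ... + b[N-1]*x^(N-1),
 * evaluated from the highest-order coefficient down (Horner's method). */
static float ref_f32_power_series(float x, const float b[], unsigned N)
{
    assert(b != NULL || N == 0);

    float acc = 0.0f;
    for (unsigned k = N; k-- > 0;) {
        acc = acc * x + b[k];
    }
    return acc;
}
```

For coefficients {1, 1, 0.5} and \(x = 2\), the result is \(1 + 2 + 0.5 \cdot 4 = 5\).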

float f32_normA(exponent_t *p, const float x)

Get a representation of the input \(x\) in normalized form A.

This function is used internally to transform a float value into a representation required for certain purposes.

In particular, this function behaves much like frexpf(), where it is guaranteed that the returned value \(a\) is either \(0\) or that \(0.5 \le \left| a \right| < 1.0\), and the output exponent \(p\) is such that \(x = a \cdot 2^{p}\).

In anticipation that future work may require alternative “normalized” representations, this form is being defined here as form A.

Parameters:
  • p[out] Output exponent \(p\)

  • x[in] Input value \(x\)

Throws ET_ARITHMETIC:

Raised if \(x\) is infinite or NaN.

Returns:

\(a\) in normalized form A.
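
The normalisation property can be demonstrated without any library support by repeatedly halving or doubling the input. The name ref_f32_normA is hypothetical and exponent_t is modelled as int; this loop is only a sketch of the contract described above, not how the library computes it.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: returns a and writes *p such that x == a * 2^(*p),
 * with a == 0 or 0.5 <= |a| < 1.0. */
static float ref_f32_normA(int *p, float x)
{
    assert(p != NULL);
    /* x - x is 0 for finite x and NaN for infinities and NaNs. */
    assert(x - x == 0.0f);

    *p = 0;
    if (x == 0.0f) {
        return 0.0f;
    }

    float a = x;
    /* Scaling by powers of two is exact in binary floating point. */
    while (a >= 1.0f || a <= -1.0f) {
        a *= 0.5f;
        (*p)++;
    }
    while (a < 0.5f && a > -0.5f) {
        a *= 2.0f;
        (*p)--;
    }
    return a;
}
```

For \(x = 6\) this yields \(a = 0.75\) and \(p = 3\), since \(0.75 \cdot 2^3 = 6\).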

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Scalar API$$$32-bit scalar float API£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/scalar/scalar_float_s32.html#bit-scalar-float-api
group float_s32_api

Functions

float_s64_t float_s32_to_float_s64(const float_s32_t x)

Convert a float_s32_t to a float_s64_t.

Parameters:
  • x[in] Input value

Returns:

float_s64_t representation of x

float float_s32_to_float(const float_s32_t x)

Convert a float_s32_t to an IEEE754 float.

Parameters:
  • x[in] Input value

Returns:

float representation of x

double float_s32_to_double(const float_s32_t x)

Convert a float_s32_t to an IEEE754 double.

Parameters:
  • x[in] Input value

Returns:

double representation of x

float_s32_t float_s32_mul(const float_s32_t x, const float_s32_t y)

Multiply two float_s32_t together.

The inputs \(x\) and \(y\) are multiplied together for a result \(a\), which is returned.

Operation Performed

\[\begin{aligned} & a \leftarrow x \cdot y \end{aligned}\]

Parameters:
  • x[in] Input operand \(x\)

  • y[in] Input operand \(y\)

Returns:

The product of \(x\) and \(y\)

float_s32_t float_s32_add(const float_s32_t x, const float_s32_t y)

Add two float_s32_t together.

The inputs \(x\) and \(y\) are added together for a result \(a\), which is returned.

Operation Performed

\[\begin{aligned} & a \leftarrow x + y \end{aligned}\]

Parameters:
  • x[in] Input operand \(x\)

  • y[in] Input operand \(y\)

Returns:

The sum of \(x\) and \(y\)
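
Adding two mantissa-exponent values requires aligning their exponents first. The sketch below shows one way to do that with 64-bit intermediates. The struct ref_float_s32_t is a hypothetical stand-in for float_s32_t, and the alignment and truncation choices are assumptions rather than the library's behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for float_s32_t: value = mant * 2^exp. */
typedef struct {
    int32_t mant;
    int exp;
} ref_float_s32_t;

/* Illustrative only: returns x + y in the same representation. */
static ref_float_s32_t ref_float_s32_add(ref_float_s32_t x, ref_float_s32_t y)
{
    if (x.mant == 0) return y;
    if (y.mant == 0) return x;

    /* Arrange for x to hold the larger exponent. */
    if (x.exp < y.exp) {
        ref_float_s32_t t = x;
        x = y;
        y = t;
    }
    int diff = x.exp - y.exp;
    assert(diff >= 0);

    /* Align: scale x up by at most 31 bits; drop low bits of y for the rest. */
    int up = (diff < 31) ? diff : 31;
    int down = diff - up;
    int64_t xm = (int64_t)x.mant * ((int64_t)1 << up);
    int64_t ym = (down < 32) ? (int64_t)y.mant / ((int64_t)1 << down) : 0;

    int64_t sum = xm + ym;
    int e = x.exp - up;

    /* Renormalise so the mantissa fits in 32 bits again. */
    while (sum > INT32_MAX || sum < INT32_MIN) {
        sum /= 2;
        e += 1;
    }

    ref_float_s32_t a = { (int32_t)sum, e };
    return a;
}
```

For example, \(3 \cdot 2^0 + 1 \cdot 2^1\) aligns to exponent 0 and gives the mantissa 5 with exponent 0.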

float_s32_t float_s32_sub(const float_s32_t x, const float_s32_t y)

Subtract one float_s32_t from another.

The input \(y\) is subtracted from the input \(x\) for a result \(a\), which is returned.

Operation Performed

\[\begin{aligned} & a \leftarrow x - y \end{aligned}\]

Parameters:
  • x[in] Input operand \(x\)

  • y[in] Input operand \(y\)

Returns:

The difference of \(x\) and \(y\)

float_s32_t float_s32_div(const float_s32_t x, const float_s32_t y)

Divide one float_s32_t by another.

The input \(x\) is divided by the input \(y\) for a result \(a\), which is returned.

Operation Performed

\[\begin{aligned} & a \leftarrow \frac{x}{y} \end{aligned}\]

Parameters:
  • x[in] Input operand \(x\)

  • y[in] Input operand \(y\)

Throws ET_ARITHMETIC:

Raised if \(y\) is \(0\)

Returns:

The result of \(x / y\)

float_s32_t float_s32_abs(const float_s32_t x)

Get the absolute value of a float_s32_t.

\(a\), the absolute value of \(x\), is returned.

Operation Performed

\[\begin{aligned} & a \leftarrow \left| x \right| \end{aligned}\]

Parameters:
  • x[in] Input operand \(x\)

Returns:

The absolute value of \(x\)

unsigned float_s32_gt(const float_s32_t x, const float_s32_t y)

Determine whether one float_s32_t is greater than another.

The inputs \(x\) and \(y\) are compared. The result \(a\) is true iff \(x\) is greater than \(y\) and false otherwise. \(a\) is returned.

Operation Performed

\[\begin{split}\begin{aligned} & a \leftarrow \begin{cases} 1 & x > y \\ 0 & otherwise \end{cases} \end{aligned}\end{split}\]

Parameters:
  • x[in] Input operand \(x\)

  • y[in] Input operand \(y\)

Returns:

1 iff \(x > y\); 0 otherwise

unsigned float_s32_gte(const float_s32_t x, const float_s32_t y)

Determine whether one float_s32_t is greater than or equal to another.

The inputs \(x\) and \(y\) are compared. The result \(a\) is true iff \(x\) is greater than or equal to \(y\) and false otherwise. \(a\) is returned.

Operation Performed

\[\begin{split}\begin{aligned} & a \leftarrow \begin{cases} 1 & x \geq y \\ 0 & otherwise \end{cases} \end{aligned}\end{split}\]

Parameters:
  • x[in] Input operand \(x\)

  • y[in] Input operand \(y\)

Returns:

1 iff \(x \geq y\); 0 otherwise

float_s32_t float_s32_ema(const float_s32_t x, const float_s32_t y, const uq2_30 coef)

Update an exponential moving average.

This function updates an exponential moving average by applying a single new sample. \(x\) is taken as the previous EMA state, with \(y\) as the new sample. The EMA coefficient \(\alpha\) is applied to the term including \(x\).

coef is a fixed-point value in a UQ2.30 format (i.e. has an implied exponent of \(-30\)), and should be in the range \(0 \leq \alpha \leq 1\).

Operation Performed

\[\begin{aligned} & a \leftarrow \alpha \cdot x + (1 - \alpha) \cdot y \end{aligned}\]

Parameters:
  • x[in] Input operand \(x\)

  • y[in] Input operand \(y\)

  • coef[in] EMA coefficient \(\alpha\) encoded in UQ2.30 format

Returns:

The new EMA state
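
The update rule and the UQ2.30 encoding of the coefficient can be illustrated in double precision. The name ref_ema is hypothetical, and plain doubles stand in for float_s32_t here purely to keep the sketch short.

```c
#include <assert.h>
#include <stdint.h>

#define UQ30_ONE ((uint32_t)1 << 30) /* 1.0 in UQ2.30 */

/* Illustrative only: a = alpha * x + (1 - alpha) * y, where alpha is given
 * as a UQ2.30 integer (implied exponent of -30). */
static double ref_ema(double x, double y, uint32_t coef)
{
    assert(coef <= UQ30_ONE); /* alpha must lie in [0, 1] */

    double alpha = (double)coef / (double)UQ30_ONE;
    return alpha * x + (1.0 - alpha) * y;
}
```

With \(\alpha = 0.75\) (the integer \(3 \cdot 2^{28}\)), a previous state of 4 and a new sample of 8, the updated state is \(0.75 \cdot 4 + 0.25 \cdot 8 = 5\).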

float_s32_t float_s32_sqrt(const float_s32_t x)

Get the square root of a float_s32_t.

This function computes the square root of \(x\). The result, \(a\), is returned.

The precision with which \(a\) is computed is configurable via the XMATH_BFP_SQRT_DEPTH_S32 configuration parameter. It indicates the number of most significant bits to be calculated.

Operation Performed

\[\begin{aligned} & a \leftarrow \sqrt{x} \end{aligned}\]

Warning

\(x\) must be non-negative to get a correct result.

Parameters:
  • x[in] Input operand \(x\)

Returns:

The square root of \(x\)

float_s32_t float_s32_exp(const float_s32_t x)

Compute \(e^x\).

This function computes \(e^x\) for real input \(x\).

If \(x\) is known to be in the interval \(\left[-0.5,0.5\right]\), q30_exp_small() (which is used internally by this function) may be used instead for a speed boost.

Operation Performed

\[\begin{aligned} & y \leftarrow e^x \end{aligned}\]

Parameters:
  • x[in] Input \(x\)

Returns:

\(y\)

float_s32_t float_s64_to_float_s32(const float_s64_t x)

Convert a float_s64_t to a float_s32_t.

Note

This operation may result in precision loss.

Parameters:
  • x[in] Input value

Returns:

float_s32_t representation of x

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Scalar API$$$16-bit complex scalar floating-point API£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/scalar/scalar_float_complex_s16.html#bit-complex-scalar-floating-point-api
group float_complex_s16_api

Functions

float_complex_s16_t float_complex_s16_mul(const float_complex_s16_t x, const float_complex_s16_t y)

Multiply two float_complex_s16_t together.

The inputs \(x\) and \(y\) are multiplied together (using complex multiplication) for a result \(a\), which is returned.

Operation Performed

\[\begin{aligned} & a \leftarrow x \cdot y \end{aligned}\]

Parameters:
  • x[in] Input operand \(x\)

  • y[in] Input operand \(y\)

Returns:

\(a\), the complex product of \(x\) and \(y\)

float_complex_s16_t float_complex_s16_add(const float_complex_s16_t x, const float_complex_s16_t y)

Add two float_complex_s16_t together.

The inputs \(x\) and \(y\) are added together for a result \(a\), which is returned.

Operation Performed

\[\begin{aligned} & a \leftarrow x + y \end{aligned}\]

Parameters:
  • x[in] Input operand \(x\)

  • y[in] Input operand \(y\)

Returns:

\(a\), the sum of \(x\) and \(y\)

float_complex_s16_t float_complex_s16_sub(const float_complex_s16_t x, const float_complex_s16_t y)

Subtract one float_complex_s16_t from another.

The input \(y\) is subtracted from the input \(x\) for a result \(a\), which is returned.

Operation Performed

\[\begin{aligned} & a \leftarrow x - y \end{aligned}\]

Parameters:
  • x[in] Input operand \(x\)

  • y[in] Input operand \(y\)

Returns:

\(a\), the difference of \(x\) and \(y\)

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Scalar API$$$32-bit complex scalar floating-point API£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/scalar/scalar_float_complex_s32.html#bit-complex-scalar-floating-point-api
group float_complex_s32_api

Functions

float_complex_s32_t float_complex_s32_mul(const float_complex_s32_t x, const float_complex_s32_t y)

Multiply two float_complex_s32_t together.

The inputs \(x\) and \(y\) are multiplied together (using complex multiplication) for a result \(a\), which is returned.

Operation Performed

\[\begin{aligned} & a \leftarrow x \cdot y \end{aligned}\]

Parameters:
  • x[in] Input operand \(x\)

  • y[in] Input operand \(y\)

Returns:

\(a\), the complex product of \(x\) and \(y\)
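
Complex multiplication of two mantissa-exponent values can be sketched as below. The struct ref_float_complex_s32_t is a hypothetical stand-in (a shared exponent for the real and imaginary mantissas is assumed), and the truncating renormalisation is chosen for simplicity rather than taken from the library.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in: value = (re + i*im) * 2^exp. */
typedef struct {
    int32_t re;
    int32_t im;
    int exp;
} ref_float_complex_s32_t;

/* Illustrative only: returns the complex product x * y. */
static ref_float_complex_s32_t ref_float_complex_s32_mul(ref_float_complex_s32_t x,
                                                         ref_float_complex_s32_t y)
{
    /* Excluding INT32_MIN keeps the 64-bit sums below from overflowing. */
    assert(x.re > INT32_MIN && x.im > INT32_MIN);
    assert(y.re > INT32_MIN && y.im > INT32_MIN);

    /* (xr + i*xi)(yr + i*yi) = (xr*yr - xi*yi) + i*(xr*yi + xi*yr) */
    int64_t re = (int64_t)x.re * y.re - (int64_t)x.im * y.im;
    int64_t im = (int64_t)x.re * y.im + (int64_t)x.im * y.re;
    int e = x.exp + y.exp;

    /* Halve both parts together until each fits in 32 bits. */
    while (re > INT32_MAX || re < INT32_MIN || im > INT32_MAX || im < INT32_MIN) {
        re /= 2;
        im /= 2;
        e += 1;
    }

    ref_float_complex_s32_t a = { (int32_t)re, (int32_t)im, e };
    return a;
}
```

For example, \((1 + 2i) \cdot (3 + 4i) = -5 + 10i\), with the output exponent equal to the sum of the input exponents.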

float_complex_s32_t float_complex_s32_add(const float_complex_s32_t x, const float_complex_s32_t y)

Add two float_complex_s32_t together.

The inputs \(x\) and \(y\) are added together for a result \(a\), which is returned.

Operation Performed

\[\begin{aligned} & a \leftarrow x + y \end{aligned}\]

Parameters:
  • x[in] Input operand \(x\)

  • y[in] Input operand \(y\)

Returns:

\(a\), the sum of \(x\) and \(y\)

float_complex_s32_t float_complex_s32_sub(const float_complex_s32_t x, const float_complex_s32_t y)

Subtract one float_complex_s32_t from another.

The input \(y\) is subtracted from the input \(x\) for a result \(a\), which is returned.

Operation Performed

\[\begin{aligned} & a \leftarrow x - y \end{aligned}\]

Parameters:
  • x[in] Input operand \(x\)

  • y[in] Input operand \(y\)

Returns:

\(a\), the difference of \(x\) and \(y\)

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Scalar API$$$Miscellaneous scalar API£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/scalar/scalar_misc.html#miscellaneous-scalar-api
group scalar_misc_api

Functions

static inline unsigned u32_ceil_log2(unsigned N)

Get the size of a 32-bit unsigned number.

This function reports the size of the number as \(a\), the number of bits required to store unsigned integer \(N\). This is equivalent to \( ceil\left(log_2\left(N\right)\right) \).

N is the input \(N\).

Operation Performed

\[\begin{split}\begin{aligned} a \leftarrow \begin{cases} 0 & N = 0 \\ \lceil log_2\left( N \right) \rceil & otherwise \end{cases} \end{aligned}\end{split}\]

Parameters:
  • N[in] Number to get the size of

Returns:

Number of bits \(a\) required to store \(N\)
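
The formula under Operation Performed can be evaluated with integer operations only, as in the sketch below. The name ref_u32_ceil_log2 is hypothetical, and the loop is an illustration of the formula rather than the library's implementation.

```c
#include <assert.h>

/* Illustrative only: returns 0 for N == 0, otherwise ceil(log2(N)). */
static unsigned ref_u32_ceil_log2(unsigned N)
{
    if (N <= 1) {
        return 0;
    }

    /* ceil(log2(N)) equals the bit length of N - 1 for N >= 2. */
    unsigned a = 0;
    unsigned v = N - 1;
    while (v != 0) {
        v >>= 1;
        a++;
    }

    assert(a <= 8 * sizeof(unsigned));
    return a;
}
```

Under this formula, \(N = 8\) gives 3 and \(N = 9\) gives 4.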

int32_t s64_to_s32(exponent_t *a_exp, const int64_t b, const exponent_t b_exp)

Convert a 64-bit floating-point scalar to a 32-bit floating-point scalar.

Converts a 64-bit floating-point scalar, represented by the 64-bit mantissa b and exponent b_exp, into a 32-bit floating-point scalar, represented by the 32-bit returned mantissa and output exponent a_exp.

Parameters:
  • a_exp[out] Output exponent

  • b[in] 64-bit input mantissa

  • b_exp[in] Input exponent

Returns:

32-bit output mantissa

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Vector API£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/vect/vect_index.html#vector-api

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Vector API$$$Vector API quick reference£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/vect/vect_quickref.html#vector-api-quick-reference

The tables below list the functions of the vector API. The “EW” column indicates whether the operation acts element-wise.

The “Signature” column is intended as a hint which quickly conveys the kind of the conceptual inputs to and outputs from the operation. The signatures are only intended to convey how many (conceptual) inputs and outputs there are, and their dimensionality.

The functions themselves will typically take more arguments than these signatures indicate. For example, most functions take vector lengths as input, and many take shift values which are used to control growth of element bit-depth. Check the function’s full documentation to get more detailed information.

The following symbols are used in the signatures:

| Symbol | Description |
| --- | --- |
| \(\mathbb{S}\) | A scalar input or output value. |
| \(\mathbb{V}\) | A vector-valued input or output. |
| \(\mathbb{M}\) | A matrix-valued input or output. |
| \(\varnothing\) | Placeholder indicating no input or output. |

For example, the operation signature \((\mathbb{V \times V \times S}) \to \mathbb{V}\) indicates the operation takes two vector inputs and a scalar input, and the output is a vector.

32-bit Vector Ops

| Function | EW | Signature |
| --- | --- | --- |
| vect_s32_copy() |  | \(\mathbb{V}\) \(\to \mathbb{V}\) |
| vect_s32_abs() | x | \(\mathbb{V}\) \(\to \mathbb{V}\) |
| vect_s32_abs_sum() |  | \(\mathbb{V}\) \(\to \mathbb{S}\) |
| vect_s32_add() | x | \((\mathbb{V \times V})\) \(\to \mathbb{V}\) |
| vect_s32_add_scalar() | x | \((\mathbb{V \times S})\) \(\to \mathbb{V}\) |
| vect_s32_argmax() |  | \(\mathbb{V}\) \(\to \mathbb{S}\) |
| vect_s32_argmin() |  | \(\mathbb{V}\) \(\to \mathbb{S}\) |
| vect_s32_clip() | x | \((\mathbb{V \times S \times S})\) \(\to \mathbb{V}\) |
| vect_s32_dot() |  | \((\mathbb{V \times V})\) \(\to \mathbb{S}\) |
| vect_s32_energy() |  | \(\mathbb{V}\) \(\to \mathbb{S}\) |
| vect_s32_headroom() |  | \(\mathbb{V}\) \(\to \mathbb{S}\) |
| vect_s32_inverse() | x | \(\mathbb{V}\) \(\to \mathbb{V}\) |
| vect_s32_max() |  | \(\mathbb{V}\) \(\to \mathbb{S}\) |
| vect_s32_max_elementwise() | x | \((\mathbb{V \times V})\) \(\to \mathbb{V}\) |
| vect_s32_min() |  | \(\mathbb{V}\) \(\to \mathbb{S}\) |
| vect_s32_min_elementwise() | x | \((\mathbb{V \times V})\) \(\to \mathbb{V}\) |
| vect_s32_mul() | x | \((\mathbb{V \times V})\) \(\to \mathbb{V}\) |
| vect_s32_macc() | x | \((\mathbb{V \times V \times V})\) \(\to \mathbb{V}\) |
| vect_s32_nmacc() | x | \((\mathbb{V \times V \times V})\) \(\to \mathbb{V}\) |
| vect_s32_rect() | x | \(\mathbb{V}\) \(\to \mathbb{V}\) |
| vect_s32_scale() | x | \((\mathbb{V \times S})\) \(\to \mathbb{V}\) |
| vect_s32_set() | x | \(\mathbb{S}\) \(\to \mathbb{V}\) |
| vect_s32_shl() | x | \((\mathbb{V \times S})\) \(\to \mathbb{V}\) |
| vect_s32_shr() | x | \((\mathbb{V \times S})\) \(\to \mathbb{V}\) |
| vect_s32_sqrt() | x | \(\mathbb{V}\) \(\to \mathbb{V}\) |
| vect_s32_sub() | x | \((\mathbb{V \times V})\) \(\to \mathbb{V}\) |
| vect_s32_sum() |  | \(\mathbb{V}\) \(\to \mathbb{S}\) |
| vect_s32_zip() |  | \((\mathbb{V \times V})\) \(\to \mathbb{V}\) |
| vect_s32_unzip() |  | \(\mathbb{V}\) \(\to (\mathbb{V \times V})\) |
| vect_s32_convolve_valid() |  | \((\mathbb{V \times V})\) \(\to \mathbb{V}\) |
| vect_s32_convolve_same() |  | \((\mathbb{V \times V})\) \(\to \mathbb{V}\) |
| vect_s32_log_base() | x | \((\mathbb{V \times S})\) \(\to \mathbb{V}\) |
| vect_s32_log() | x | \(\mathbb{V}\) \(\to \mathbb{V}\) |
| vect_s32_log2() | x | \(\mathbb{V}\) \(\to \mathbb{V}\) |
| vect_s32_log10() | x | \(\mathbb{V}\) \(\to \mathbb{V}\) |
| chunk_s32_dot() |  | \((\mathbb{V \times V})\) \(\to \mathbb{S}\) |
| chunk_s32_log() | x | \(\mathbb{V}\) \(\to \mathbb{V}\) |

16-bit Vector Ops

| Function | EW | Signature |
| --- | --- | --- |
| vect_s16_abs() | x | \(\mathbb{V}\) \(\to \mathbb{V}\) |
| vect_s16_abs_sum() |  | \(\mathbb{V}\) \(\to \mathbb{S}\) |
| vect_s16_add() | x | \((\mathbb{V \times V})\) \(\to \mathbb{V}\) |
| vect_s16_add_scalar() | x | \((\mathbb{V \times S})\) \(\to \mathbb{V}\) |
| vect_s16_argmax() |  | \(\mathbb{V}\) \(\to \mathbb{S}\) |
| vect_s16_argmin() |  | \(\mathbb{V}\) \(\to \mathbb{S}\) |
| vect_s16_clip() | x | \((\mathbb{V \times S \times S})\) \(\to \mathbb{V}\) |
| vect_s16_dot() |  | \((\mathbb{V \times V})\) \(\to \mathbb{S}\) |
| vect_s16_energy() |  | \(\mathbb{V}\) \(\to \mathbb{S}\) |
| vect_s16_headroom() |  | \(\mathbb{V}\) \(\to \mathbb{S}\) |
| vect_s16_inverse() | x | \(\mathbb{V}\) \(\to \mathbb{V}\) |
| vect_s16_max() |  | \(\mathbb{V}\) \(\to \mathbb{S}\) |
| vect_s16_max_elementwise() | x | \((\mathbb{V \times V})\) \(\to \mathbb{V}\) |
| vect_s16_min() |  | \(\mathbb{V}\) \(\to \mathbb{S}\) |
| vect_s16_min_elementwise() | x | \((\mathbb{V \times V})\) \(\to \mathbb{V}\) |
| vect_s16_mul() | x | \((\mathbb{V \times V})\) \(\to \mathbb{V}\) |
| vect_s16_macc() | x | \((\mathbb{V \times V \times V})\) \(\to \mathbb{V}\) |
| vect_s16_nmacc() | x | \((\mathbb{V \times V \times V})\) \(\to \mathbb{V}\) |
| vect_s16_rect() | x | \(\mathbb{V}\) \(\to \mathbb{V}\) |
| vect_s16_scale() | x | \((\mathbb{V \times S})\) \(\to \mathbb{V}\) |
| vect_s16_set() | x | \(\mathbb{S}\) \(\to \mathbb{V}\) |
| vect_s16_shl() | x | \((\mathbb{V \times S})\) \(\to \mathbb{V}\) |
| vect_s16_shr() | x | \((\mathbb{V \times S})\) \(\to \mathbb{V}\) |
| vect_s16_sqrt() | x | \(\mathbb{V}\) \(\to \mathbb{V}\) |
| vect_s16_sub() | x | \((\mathbb{V \times V})\) \(\to \mathbb{V}\) |
| vect_s16_sum() |  | \(\mathbb{V}\) \(\to \mathbb{S}\) |
| vect_s16_extract_high_byte() | x | \(\mathbb{V}\) \(\to \mathbb{V}\) |
| vect_s16_extract_low_byte() | x | \(\mathbb{V}\) \(\to \mathbb{V}\) |

8-bit Vector Ops

| Function | EW | Signature | Brief |
| --- | --- | --- | --- |
| vect_s8_is_negative() | x | \(\mathbb{V}\) \(\to \mathbb{V}\) | Identify negative elements |

32-bit Complex Vector Ops

| Function | EW | Signature |
| --- | --- | --- |
| vect_complex_s32_add() | x | \((\mathbb{V \times V})\) \(\to \mathbb{V}\) |
| vect_complex_s32_add_scalar() | x | \((\mathbb{V \times S})\) \(\to \mathbb{V}\) |
| vect_complex_s32_conj_macc() | x | \((\mathbb{V \times V \times V})\) \(\to \mathbb{V}\) |
| vect_complex_s32_conj_mul() | x | \((\mathbb{V \times V})\) \(\to \mathbb{V}\) |
| vect_complex_s32_conj_nmacc() | x | \((\mathbb{V \times V \times V})\) \(\to \mathbb{V}\) |
| vect_complex_s32_conjugate() | x | \(\mathbb{V}\) \(\to \mathbb{V}\) |
| vect_complex_s32_headroom() |  | \(\mathbb{V}\) \(\to \mathbb{S}\) |
| vect_complex_s32_macc() | x | \((\mathbb{V \times V \times V})\) \(\to \mathbb{V}\) |
| vect_complex_s32_mag() | x | \(\mathbb{V}\) \(\to \mathbb{V}\) |
| vect_complex_s32_mul() | x | \((\mathbb{V \times V})\) \(\to \mathbb{V}\) |
| vect_complex_s32_nmacc() | x | \((\mathbb{V \times V \times V})\) \(\to \mathbb{V}\) |
| vect_complex_s32_real_mul() | x | \((\mathbb{V \times V})\) \(\to \mathbb{V}\) |
| vect_complex_s32_real_scale() | x | \((\mathbb{V \times S})\) \(\to \mathbb{V}\) |
| vect_complex_s32_scale() | x | \((\mathbb{V \times S})\) \(\to \mathbb{V}\) |
| vect_complex_s32_set() | x | \(\mathbb{S}\) \(\to \mathbb{V}\) |
| vect_complex_s32_shl() | x | \((\mathbb{V \times S})\) \(\to \mathbb{V}\) |
| vect_complex_s32_shr() | x | \((\mathbb{V \times S})\) \(\to \mathbb{V}\) |
| vect_complex_s32_squared_mag() | x | \(\mathbb{V}\) \(\to \mathbb{V}\) |
| vect_complex_s32_sub() | x | \((\mathbb{V \times V})\) \(\to \mathbb{V}\) |
| vect_complex_s32_sum() |  | \(\mathbb{V}\) \(\to \mathbb{S}\) |
| vect_complex_s32_tail_reverse() |  | \(\mathbb{V}\) \(\to \mathbb{V}\) |

16-bit Complex Vector Ops

| Function | EW | Signature |
|---|---|---|
| vect_complex_s16_add() | x | \((\mathbb{V \times V}) \to \mathbb{V}\) |
| vect_complex_s16_add_scalar() | x | \((\mathbb{V \times S}) \to \mathbb{V}\) |
| vect_complex_s16_conj_mul() | x | \((\mathbb{V \times V}) \to \mathbb{V}\) |
| vect_complex_s16_conj_macc() | x | \((\mathbb{V \times V \times V}) \to \mathbb{V}\) |
| vect_complex_s16_conj_nmacc() | x | \((\mathbb{V \times V \times V}) \to \mathbb{V}\) |
| vect_complex_s16_headroom() |  | \(\mathbb{V} \to \mathbb{S}\) |
| vect_complex_s16_macc() | x | \((\mathbb{V \times V \times V}) \to \mathbb{V}\) |
| vect_complex_s16_mag() | x | \(\mathbb{V} \to \mathbb{V}\) |
| vect_complex_s16_mul() | x | \((\mathbb{V \times V}) \to \mathbb{V}\) |
| vect_complex_s16_nmacc() | x | \((\mathbb{V \times V \times V}) \to \mathbb{V}\) |
| vect_complex_s16_real_mul() | x | \((\mathbb{V \times V}) \to \mathbb{V}\) |
| vect_complex_s16_real_scale() | x | \((\mathbb{V \times S}) \to \mathbb{V}\) |
| vect_complex_s16_scale() | x | \((\mathbb{V \times S}) \to \mathbb{V}\) |
| vect_complex_s16_set() | x | \(\mathbb{S} \to \mathbb{V}\) |
| vect_complex_s16_shl() | x | \((\mathbb{V \times S}) \to \mathbb{V}\) |
| vect_complex_s16_shr() | x | \((\mathbb{V \times S}) \to \mathbb{V}\) |
| vect_complex_s16_squared_mag() | x | \(\mathbb{V} \to \mathbb{V}\) |
| vect_complex_s16_sub() | x | \((\mathbb{V \times V}) \to \mathbb{V}\) |
| vect_complex_s16_sum() |  | \(\mathbb{V} \to \mathbb{S}\) |

Fixed-Point Vector Ops

| Function | EW | Signature |
|---|---|---|
| vect_q30_power_series() | x | \((\mathbb{V \times V}) \to \mathbb{V}\) |
| vect_q30_exp_small() | x | \(\mathbb{V} \to \mathbb{V}\) |
| chunk_q30_power_series() | x | \((\mathbb{V \times V}) \to \mathbb{V}\) |
| chunk_q30_exp_small() | x | \(\mathbb{V} \to \mathbb{V}\) |

Floating-Point Vector Ops

| Function | EW | Signature |
|---|---|---|
| vect_f32_max_exponent() |  | \(\mathbb{V} \to \mathbb{S}\) |
| vect_f32_dot() |  | \((\mathbb{V \times V}) \to \mathbb{S}\) |
| vect_f32_add() | x | \((\mathbb{V \times V}) \to \mathbb{V}\) |
| vect_float_s32_log_base() | x | \((\mathbb{V \times S}) \to \mathbb{V}\) |
| vect_float_s32_log() | x | \(\mathbb{V} \to \mathbb{V}\) |
| vect_float_s32_log2() | x | \(\mathbb{V} \to \mathbb{V}\) |
| vect_float_s32_log10() | x | \(\mathbb{V} \to \mathbb{V}\) |
| chunk_float_s32_log() | x | \(\mathbb{V} \to \mathbb{V}\) |
| vect_complex_f32_add() | x | \((\mathbb{V \times V}) \to \mathbb{V}\) |
| vect_complex_f32_mul() | x | \((\mathbb{V \times V}) \to \mathbb{V}\) |
| vect_complex_f32_conj_mul() | x | \((\mathbb{V \times V}) \to \mathbb{V}\) |
| vect_complex_f32_macc() | x | \((\mathbb{V \times V \times V}) \to \mathbb{V}\) |
| vect_complex_f32_conj_macc() | x | \((\mathbb{V \times V \times V}) \to \mathbb{V}\) |

Note that several of the functions below take vectors of the split_acc_s32_t type. This is a 32-bit vector type used for accumulating results of 8- or 16-bit operations in a manner optimized for the XS3 VPU.

Other Vector Ops

| Function | EW | Signature |
|---|---|---|
| vect_split_acc_s32_shr() | x | \((\mathbb{V \times S}) \to \mathbb{V}\) |
| vect_s32_merge_accs() | x | \(\mathbb{V} \to \mathbb{V}\) |
| vect_s32_split_accs() | x | \(\mathbb{V} \to \mathbb{V}\) |
| chunk_s16_accumulate() | x | \(\mathbb{V} \to \mathbb{V}\) |
| mat_mul_s8_x_s8_yield_s32() |  | \((\mathbb{M \times V}) \to \mathbb{V}\) |
| mat_mul_s8_x_s16_yield_s32() |  | \((\mathbb{M \times V}) \to \mathbb{V}\) |

Vector Type Conversion Ops

| Function | Input Array Element Type | Output Array Element Type |
|---|---|---|
| vect_s16_to_vect_s32() | int16_t | int32_t |
| vect_s32_to_vect_s16() | int32_t | int16_t |
| vect_s32_to_vect_f32() | int32_t | float |
| vect_f32_to_vect_s32() | float | int32_t |
| vect_complex_s16_to_vect_complex_s32() | complex_s16_t | complex_s32_t |
| vect_complex_s32_to_vect_complex_s16() | complex_s32_t | complex_s16_t |

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Vector API$$$8-bit vector API£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/vect/vect_s8.html#bit-vector-api
group vect_s8_api

Functions

void vect_s8_is_negative(int8_t a[], const int8_t b[], const unsigned length)

Determine whether each element of a signed 8-bit input vector is negative.

Each element \(a_k\) of 8-bit output vector \(\bar a\) is set to 1 if the corresponding element \(b_k\) of 8-bit input vector \(\bar b\) is negative, and is set to 0 otherwise.

a[] represents the 8-bit output vector \(\bar a\), with the element a[k] representing \(a_k\).

b[] represents the 8-bit input vector \(\bar b\), with the element b[k] representing \(b_k\).

length is the number of elements in a[] and b[].

Operation Performed

\[\begin{split}\begin{aligned} & a_k \leftarrow \begin{cases} 1 & b_k < 0 \\ 0 & otherwise\end{cases} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Parameters:
  • a[out] Output vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • length[in] Number of elements in \(\bar a\) and \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `a` or `b` is not word-aligned (See Note: Vector Alignment)
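
As a rough illustration of the operation above, the following portable C sketch models the result with a plain loop. The names is_negative_ref and is_word_aligned are hypothetical helpers, not part of lib_xcore_math, and the alignment check assumes a 4-byte word. The library routine itself is VPU-optimised and is not reproduced here.

```c
#include <stdint.h>

/* Hypothetical helper: nonzero if p sits on a 4-byte (word) boundary. */
static int is_word_aligned(const void *p)
{
    return ((uintptr_t)p % 4u) == 0;
}

/* Hypothetical scalar model: a[k] = 1 if b[k] < 0, otherwise 0. */
static void is_negative_ref(int8_t a[], const int8_t b[], unsigned length)
{
    for (unsigned k = 0; k < length; k++) {
        a[k] = (b[k] < 0) ? 1 : 0;
    }
}
```

A caller could use a check such as is_word_aligned() in debug builds to catch buffers that would otherwise raise ET_LOAD_STORE.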

void mat_mul_s8_x_s8_yield_s32(split_acc_s32_t accumulators[], const int8_t matrix[], const int8_t input_vect[], const unsigned M_rows, const unsigned N_cols)

Multiply-accumulate an 8-bit matrix by an 8-bit vector into 32-bit accumulators.

This function multiplies an 8-bit \(M \times N\) matrix \(\bar W\) by an 8-bit \(N\)-element column vector \(\bar v\) and adds it to the 32-bit accumulator vector \(\bar a\).

accumulators is the output vector \(\bar a\) to which the product \(\bar W\times\bar v\) is accumulated. Note that the accumulators are encoded in a format native to the xcore VPU. To initialize the accumulator vector to zeros, just zero the memory.

matrix is the matrix \(\bar W\).

input_vect is the vector \(\bar v\).

matrix and input_vect must both begin at word-aligned offsets.

M_rows and N_cols are the dimensions \(M\) and \(N\) of matrix \(\bar W\). \(M\) must be a multiple of 16, and \(N\) must be a multiple of 32.

The result of this multiplication is exact, so long as saturation does not occur.

Parameters:
  • accumulators[inout] The accumulator vector \(\bar a\)

  • matrix[in] The weight matrix \(\bar W\)

  • input_vect[in] The input vector \(\bar v\)

  • M_rows[in] The number of rows \(M\) in matrix \(\bar W\)

  • N_cols[in] The number of columns \(N\) in matrix \(\bar W\)

Throws ET_LOAD_STORE:

Raised if `matrix` or `input_vect` is not word-aligned (See Note: Vector Alignment)
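
The arithmetic described above can be sketched in portable C as follows. This is only a model under stated assumptions: the matrix is taken to be stored row-major, and the accumulators are plain int32_t values rather than the VPU-native split_acc_s32_t encoding used by the library. The name mat_mul_s8_ref is hypothetical, and saturation is not modelled.

```c
#include <stdint.h>

/*
 * Hypothetical scalar model: acc[r] += sum over c of W[r][c] * v[c].
 * Assumes a row-major matrix and plain int32_t accumulators
 * (the library uses the VPU-native split_acc_s32_t format instead).
 */
static void mat_mul_s8_ref(int32_t acc[], const int8_t matrix[],
                           const int8_t input_vect[],
                           unsigned M_rows, unsigned N_cols)
{
    for (unsigned r = 0; r < M_rows; r++) {
        int32_t sum = acc[r];                       /* accumulate onto existing value */
        for (unsigned c = 0; c < N_cols; c++) {
            sum += (int32_t)matrix[r * N_cols + c] * (int32_t)input_vect[c];
        }
        acc[r] = sum;
    }
}
```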

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Vector API$$$16-bit vector API£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/vect/vect_s16.html#bit-vector-api
group vect_s16_api

Functions

headroom_t vect_s16_abs(int16_t a[], const int16_t b[], const unsigned length)

Compute the element-wise absolute value of a 16-bit vector.

a[] and b[] represent the 16-bit vectors \(\bar a\) and \(\bar b\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[].

length is the number of elements in each of the vectors.

Operation Performed

\[\begin{split}\begin{aligned} & a_k \leftarrow sat_{16}(\left| b_k \right|) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) are the mantissas of BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the output vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp\).

Parameters:
  • a[out] Output vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `a` or `b` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of the output vector \(\bar a\).

int32_t vect_s16_abs_sum(const int16_t b[], const unsigned length)

Compute the sum of the absolute values of elements of a 16-bit vector.

b[] represents the 16-bit vector \(\bar b\). b[] must begin at a word-aligned address.

length is the number of elements in \(\bar b\).

Operation Performed

\[\begin{aligned} a \leftarrow \sum_{k=0}^{length-1} \left| b_k \right| \end{aligned}\]

Block Floating-Point

If \(\bar b\) are the mantissas of BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the returned value \(a\) is the 32-bit mantissa of floating-point value \(a \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp\).

Parameters:
  • b[in] Input vector \(\bar b\)

  • length[in] Number of elements in \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `b` is not word-aligned (See Note: Vector Alignment)

Returns:

The 32-bit sum \(a\)

headroom_t vect_s16_add(int16_t a[], const int16_t b[], const int16_t c[], const unsigned length, const right_shift_t b_shr, const right_shift_t c_shr)

Add one 16-bit BFP vector to another.

a[], b[] and c[] represent the 16-bit vectors \(\bar a\), \(\bar b\) and \(\bar c\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[] or c[].

length is the number of elements in each of the vectors.

b_shr and c_shr are the signed arithmetic right-shifts applied to each element of \(\bar b\) and \(\bar c\) respectively.

Operation Performed

\[\begin{split}\begin{aligned} & b_k' = sat_{16}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & c_k' = sat_{16}(\lfloor c_k \cdot 2^{-c\_shr} \rfloor) \\ & a_k \leftarrow sat_{16}\!\left( b_k' + c_k' \right) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) and \(\bar c\) are the mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\).

In this case, \(b\_shr\) and \(c\_shr\) must be chosen so that \(a\_exp = b\_exp + b\_shr = c\_exp + c\_shr\). Adding or subtracting mantissas only makes sense if they are associated with the same exponent.

The function vect_s16_add_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).

Parameters:
  • a[out] Output vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • c[in] Input vector \(\bar c\)

  • length[in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)

  • b_shr[in] Right-shift applied to \(\bar b\)

  • c_shr[in] Right-shift applied to \(\bar c\)

Throws ET_LOAD_STORE:

Raised if `a`, `b` or `c` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of the output vector \(\bar a\).
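
The exponent constraint \(a\_exp = b\_exp + b\_shr = c\_exp + c\_shr\) can be illustrated with a small portable model. The helpers below (sat16, floor_shr, add_ref, naive_add_prepare) are hypothetical and are not library functions. The shift choice is deliberately naive: it ignores headroom, unlike vect_s16_add_prepare(), and the saturation bounds are assumed to be the full int16_t range.

```c
#include <stdint.h>

/* Clamp to the int16_t range (assumed saturation bounds). */
static int16_t sat16(int64_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* floor(v * 2^-shr); a negative shr is a left shift. */
static int64_t floor_shr(int64_t v, int shr)
{
    if (shr > 31) shr = 31;
    if (shr < -31) shr = -31;
    if (shr < 0) return v * ((int64_t)1 << -shr);
    int64_t d = (int64_t)1 << shr;
    int64_t q = v / d;
    if (v % d != 0 && v < 0) q -= 1;   /* round toward minus infinity */
    return q;
}

/* Hypothetical scalar model of the element-wise add. */
static void add_ref(int16_t a[], const int16_t b[], const int16_t c[],
                    unsigned length, int b_shr, int c_shr)
{
    for (unsigned k = 0; k < length; k++) {
        int16_t bk = sat16(floor_shr(b[k], b_shr));
        int16_t ck = sat16(floor_shr(c[k], c_shr));
        a[k] = sat16((int64_t)bk + ck);
    }
}

/* Naive shift choice (assumption): output exponent one above the larger input. */
static void naive_add_prepare(int *a_exp, int *b_shr, int *c_shr,
                              int b_exp, int c_exp)
{
    *a_exp = (b_exp > c_exp ? b_exp : c_exp) + 1;
    *b_shr = *a_exp - b_exp;
    *c_shr = *a_exp - c_exp;
}
```

Because this choice discards low-order bits of both inputs, it loses precision that a headroom-aware choice of shifts could retain.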

headroom_t vect_s16_add_scalar(int16_t a[], const int16_t b[], const int16_t c, const unsigned length, const right_shift_t b_shr)

Add a scalar to a 16-bit vector.

a[] and b[] represent the 16-bit mantissa vectors \(\bar a\) and \(\bar b\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[].

c is the scalar \(c\) to be added to each element of \(\bar b\).

length is the number of elements in each of the vectors.

b_shr is the signed arithmetic right-shift applied to each element of \(\bar b\).

Operation Performed

\[\begin{split}\begin{aligned} & b_k' = sat_{16}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & a_k \leftarrow sat_{16}\!\left( b_k' + c \right) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If elements of \(\bar b\) are the mantissas of BFP vector \( \bar{b} \cdot 2^{b\_exp} \), and \(c\) is the mantissa of floating-point value \(c \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\).

In this case, \(b\_shr\) and \(c\_shr\) must be chosen so that \(a\_exp = b\_exp + b\_shr = c\_exp + c\_shr\). Adding or subtracting mantissas only makes sense if they are associated with the same exponent.

The function vect_s16_add_scalar_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).

Note that \(c\_shr\) is an output of vect_s16_add_scalar_prepare(), but is not a parameter to this function. The \(c\_shr\) produced by vect_s16_add_scalar_prepare() is to be applied by the user, and the result passed as input c.

Parameters:
  • a[out] Output vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • c[in] Input scalar \(c\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

  • b_shr[in] Right-shift applied to \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `a` or `b` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of the output vector \(\bar a\).

unsigned vect_s16_argmax(const int16_t b[], const unsigned length)

Obtain the array index of the maximum element of a 16-bit vector.

b[] represents the 16-bit input vector \(\bar b\). It must begin at a word-aligned address.

length is the number of elements in \(\bar b\).

Operation Performed

\[\begin{split}\begin{aligned} & a \leftarrow argmax_k\{ b_k \} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Parameters:
  • b[in] Input vector \(\bar b\)

  • length[in] Number of elements in \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `b` is not word-aligned (See Note: Vector Alignment)

Returns:

\(a\), the index of the maximum element of vector \(\bar b\). If there is a tie for the maximum value, the lowest tying index is returned.

unsigned vect_s16_argmin(const int16_t b[], const unsigned length)

Obtain the array index of the minimum element of a 16-bit vector.

b[] represents the 16-bit input vector \(\bar b\). It must begin at a word-aligned address.

length is the number of elements in \(\bar b\).

Operation Performed

\[\begin{split}\begin{aligned} & a \leftarrow argmin_k\{ b_k \} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Parameters:
  • b[in] Input vector \(\bar b\)

  • length[in] Number of elements in \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `b` is not word-aligned (See Note: Vector Alignment)

Returns:

\(a\), the index of the minimum element of vector \(\bar b\). If there is a tie for the minimum value, the lowest tying index is returned.
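
The tie-breaking rule for vect_s16_argmax() and vect_s16_argmin() (lowest tying index wins) can be modelled with a strict comparison, as in this sketch. The names argmax_ref and argmin_ref are hypothetical, and both assume length is at least 1.

```c
#include <stdint.h>

/* Hypothetical scalar model of argmax; assumes length >= 1. */
static unsigned argmax_ref(const int16_t b[], unsigned length)
{
    unsigned best = 0;
    for (unsigned k = 1; k < length; k++) {
        if (b[k] > b[best]) best = k;   /* strict '>' keeps the lowest tying index */
    }
    return best;
}

/* Hypothetical scalar model of argmin; assumes length >= 1. */
static unsigned argmin_ref(const int16_t b[], unsigned length)
{
    unsigned best = 0;
    for (unsigned k = 1; k < length; k++) {
        if (b[k] < b[best]) best = k;   /* strict '<' keeps the lowest tying index */
    }
    return best;
}
```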

headroom_t vect_s16_clip(int16_t a[], const int16_t b[], const unsigned length, const int16_t lower_bound, const int16_t upper_bound, const right_shift_t b_shr)

Clamp the elements of a 16-bit vector to a specified range.

a[] and b[] represent the 16-bit vectors \(\bar a\) and \(\bar b\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[].

length is the number of elements in each of the vectors.

lower_bound and upper_bound are the lower and upper bounds of the clipping range respectively. These bounds are checked for each element of \(\bar b\) only after b_shr is applied.

b_shr is the signed arithmetic right-shift applied to elements of \(\bar b\) before being compared to the upper and lower bounds.

If \(\bar b\) are the mantissas for a BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the exponent \(a\_exp\) of the output BFP vector \(\bar{a} \cdot 2^{a\_exp}\) is given by \(a\_exp = b\_exp + b\_shr\).

Operation Performed

\[\begin{split}\begin{aligned} & b_k' \leftarrow sat_{16}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & a_k \leftarrow \begin{cases} lower\_bound & b_k' \le lower\_bound \\ upper\_bound & b_k' \ge upper\_bound \\ b_k' & otherwise \end{cases} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) are the mantissas of BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the output vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + b\_shr\).

Parameters:
  • a[out] Output vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

  • lower_bound[in] Lower bound of clipping range

  • upper_bound[in] Upper bound of clipping range

  • b_shr[in] Arithmetic right-shift applied to elements of \(\bar b\) prior to clipping

Throws ET_LOAD_STORE:

Raised if `a` or `b` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of output vector \(\bar a\)
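
The order of operations in vect_s16_clip() (shift first, then compare against the bounds) can be modelled as below. This is a sketch with hypothetical helper names (sat16, floor_shr, clip_ref); it assumes saturation to the full int16_t range and does not compute the returned headroom.

```c
#include <stdint.h>

/* Clamp to the int16_t range (assumed saturation bounds). */
static int16_t sat16(int64_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* floor(v * 2^-shr); a negative shr is a left shift. */
static int64_t floor_shr(int64_t v, int shr)
{
    if (shr > 31) shr = 31;
    if (shr < -31) shr = -31;
    if (shr < 0) return v * ((int64_t)1 << -shr);
    int64_t d = (int64_t)1 << shr;
    int64_t q = v / d;
    if (v % d != 0 && v < 0) q -= 1;   /* round toward minus infinity */
    return q;
}

/* Hypothetical scalar model: bounds are applied to the shifted value b_k'. */
static void clip_ref(int16_t a[], const int16_t b[], unsigned length,
                     int16_t lower_bound, int16_t upper_bound, int b_shr)
{
    for (unsigned k = 0; k < length; k++) {
        int16_t v = sat16(floor_shr(b[k], b_shr));
        if (v <= lower_bound) v = lower_bound;
        else if (v >= upper_bound) v = upper_bound;
        a[k] = v;                        /* safe when a and b are the same array */
    }
}
```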

int64_t vect_s16_dot(const int16_t b[], const int16_t c[], const unsigned length)

Compute the inner product of two 16-bit vectors.

b[] and c[] represent the 16-bit vectors \(\bar b\) and \(\bar c\) respectively. Each must begin at a word-aligned address.

length is the number of elements in each of the vectors.

Operation Performed

\[\begin{aligned} a \leftarrow \sum_{k=0}^{length-1}\left( b_k \cdot c_k \right) \end{aligned}\]

Block Floating-Point

If \(\bar b\) and \(\bar c\) are the mantissas of the BFP vectors \( \bar{b} \cdot 2^{b\_exp}\) and \(\bar{c}\cdot 2^{c\_exp}\), then result \(a\) is the mantissa of the result \(a \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + c\_exp\).

If needed, the bit-depth of \(a\) can then be reduced to 16 or 32 bits to get a new result \(a' \cdot 2^{a\_exp'}\) where \(a' = a \cdot 2^{-a\_shr}\) and \(a\_exp' = a\_exp + a\_shr\).

Notes

The sum \(a\) is accumulated simultaneously into 16 48-bit accumulators which are summed together at the final step. So long as length is less than roughly 2 million, no overflow or saturation of the resulting sum is possible.

Parameters:
  • b[in] Input vector \(\bar b\)

  • c[in] Input vector \(\bar c\)

  • length[in] Number of elements in vectors \(\bar b\) and \(\bar c\)

Throws ET_LOAD_STORE:

Raised if `b` or `c` is not word-aligned (See Note: Vector Alignment)

Returns:

\(a\), the inner product of vectors \(\bar b\) and \(\bar c\).
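
The bit-depth reduction mentioned above (\(a' = a \cdot 2^{-a\_shr}\), \(a\_exp' = a\_exp + a\_shr\)) might look like the following in user code. Both functions are hypothetical illustrations: dot_ref is a plain scalar model of the inner product, and reduce_to_s32 picks the smallest shift that makes the value fit in 32 bits, rounding toward zero.

```c
#include <stdint.h>

/* Hypothetical scalar model of the 16-bit inner product with a 64-bit result. */
static int64_t dot_ref(const int16_t b[], const int16_t c[], unsigned length)
{
    int64_t a = 0;
    for (unsigned k = 0; k < length; k++) {
        a += (int64_t)b[k] * c[k];
    }
    return a;
}

/*
 * Hypothetical helper: shrink a 64-bit mantissa to 32 bits.
 * Writes the applied shift to *a_shr, so the caller can form
 * a_exp' = a_exp + a_shr. Halving rounds toward zero in this sketch.
 */
static int32_t reduce_to_s32(int64_t a, int *a_shr)
{
    int shr = 0;
    while (a > INT32_MAX || a < INT32_MIN) {
        a /= 2;
        shr++;
    }
    *a_shr = shr;
    return (int32_t)a;
}
```

With input exponents \(b\_exp\) and \(c\_exp\), the caller would start from \(a\_exp = b\_exp + c\_exp\) and then add the shift reported by the helper.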

int32_t vect_s16_energy(const int16_t b[], const unsigned length, const right_shift_t b_shr)

Calculate the energy (sum of squares of elements) of a 16-bit vector.

b[] represents the 16-bit vector \(\bar b\). b[] must begin at a word-aligned address.

length is the number of elements in \(\bar b\).

b_shr is the signed arithmetic right-shift applied to elements of \(\bar b\). b_shr should be chosen to avoid the possibility of saturation. See the note below.

Operation Performed

\[\begin{split}\begin{aligned} & b_k' \leftarrow sat_{16}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & a \leftarrow \sum_{k=0}^{length-1} (b_k')^2 \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) are the mantissas of the BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the floating-point result is \(a \cdot 2^{a\_exp}\), where the 32-bit mantissa \(a\) is returned by this function, and \(a\_exp = 2 \cdot (b\_exp + b\_shr) \).

Additional Details

If \(\bar b\) has \(b\_hr\) bits of headroom, then each product \((b_k')^2\) can be a maximum of \( 2^{30 - 2 \cdot (b\_hr + b\_shr)}\). So long as length is less than \(2^{1 + 2\cdot (b\_hr + b\_shr)}\), saturation of the 32-bit sum should not be possible. Each increase of \(b\_shr\) by \(1\) quadruples the number of elements that can be summed without risk of overflow.

If the caller’s mantissa vector is longer than that, the full result can be found by calling this function multiple times for partial results on sub-sequences of the input, and adding the results in user code.

In many situations the caller may have a priori knowledge that saturation is impossible (or very nearly so), in which case this guideline may be disregarded. However, such situations are application-specific and are well beyond the scope of this documentation, and as such are left to the user’s discretion.

Parameters:
  • b[in] Input vector \(\bar b\)

  • length[in] Number of elements in \(\bar b\)

  • b_shr[in] Right-shift applied to \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `b` is not word-aligned (See Note: Vector Alignment)

Returns:

32-bit mantissa of vector \(\bar b\)’s energy
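
The advice above about long vectors (compute partial energies on sub-sequences and add them in user code) could be followed roughly as in this sketch. All names are hypothetical. energy_ref is a scalar stand-in for a single call and is assumed to saturate at INT32_MAX; energy_chunked sums the partial results in a 64-bit total.

```c
#include <stdint.h>

/* Clamp to the int16_t range (assumed saturation bounds). */
static int16_t sat16(int64_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* floor(v * 2^-shr); a negative shr is a left shift. */
static int64_t floor_shr(int64_t v, int shr)
{
    if (shr > 31) shr = 31;
    if (shr < -31) shr = -31;
    if (shr < 0) return v * ((int64_t)1 << -shr);
    int64_t d = (int64_t)1 << shr;
    int64_t q = v / d;
    if (v % d != 0 && v < 0) q -= 1;   /* round toward minus infinity */
    return q;
}

/* Hypothetical stand-in for one call: 32-bit sum of squares, saturating (assumed). */
static int32_t energy_ref(const int16_t b[], unsigned length, int b_shr)
{
    int64_t a = 0;
    for (unsigned k = 0; k < length; k++) {
        int64_t v = sat16(floor_shr(b[k], b_shr));
        a += v * v;
    }
    return (a > INT32_MAX) ? INT32_MAX : (int32_t)a;
}

/* Sum partial energies of chunk_len-element pieces; chunk_len must be >= 1. */
static int64_t energy_chunked(const int16_t b[], unsigned length, int b_shr,
                              unsigned chunk_len)
{
    int64_t total = 0;
    for (unsigned start = 0; start < length; start += chunk_len) {
        unsigned n = (length - start < chunk_len) ? (length - start) : chunk_len;
        total += energy_ref(&b[start], n, b_shr);
    }
    return total;
}
```

When the real library function is used in place of energy_ref, an even chunk_len keeps each sub-sequence of int16_t elements word-aligned, provided the vector itself is word-aligned.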

headroom_t vect_s16_headroom(const int16_t b[], const unsigned length)

Calculate the headroom of a 16-bit vector.

The headroom of an N-bit integer is the number of bits that the integer’s value may be left-shifted without any information being lost. Equivalently, it is one less than the number of leading sign bits.

The headroom of an int16_t array is the minimum of the headroom of each of its int16_t elements.

This function efficiently traverses the elements of b[] to determine its headroom.

b[] represents the 16-bit vector \(\bar b\). b[] must begin at a word-aligned address.

length is the number of elements in b[].

Operation Performed

\[\begin{aligned} a \leftarrow min\!\{ HR_{16}\left(b_0\right), HR_{16}\left(b_1\right), ..., HR_{16}\left(b_{length-1}\right) \} \end{aligned}\]

Parameters:
  • b[in] Input vector \(\bar b\)

  • length[in] The number of elements in vector \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `b` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of vector \(\bar b\)
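
The definition of headroom given above (one less than the number of leading sign bits, minimised over the vector) can be checked with a small portable model. The names hr16 and headroom_ref are hypothetical. The sketch returns 15 for an empty vector, which is an assumption of this example rather than documented behaviour.

```c
#include <stdint.h>

/* Headroom of one int16_t: one less than its number of leading sign bits. */
static unsigned hr16(int16_t x)
{
    uint16_t u = (uint16_t)x;
    if (x < 0) u = (uint16_t)~u;        /* turn leading one bits into zeros */
    unsigned hr = 0;
    /* Count zero bits below the sign bit, starting from bit 14. */
    for (int bit = 14; bit >= 0 && ((u >> bit) & 1u) == 0; bit--) {
        hr++;
    }
    return hr;
}

/* Hypothetical scalar model: minimum element headroom over the vector. */
static unsigned headroom_ref(const int16_t b[], unsigned length)
{
    unsigned hr = 15;                   /* largest possible 16-bit headroom */
    for (unsigned k = 0; k < length; k++) {
        unsigned h = hr16(b[k]);
        if (h < hr) hr = h;
    }
    return hr;
}
```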

void vect_s16_inverse(int16_t a[], const int16_t b[], const unsigned length, const unsigned scale)

Compute the inverse of elements of a 16-bit vector.

a[] and b[] represent the 16-bit mantissa vectors \(\bar a\) and \(\bar b\) respectively. This operation can be performed safely in-place on b[].

length is the number of elements in each of the vectors.

scale is a scaling parameter used to maximize the precision of the result.

Operation Performed

\[\begin{split}\begin{aligned} & a_k \leftarrow \lfloor\frac{2^{scale}}{b_k}\rfloor \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) are the mantissas of BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = scale - b\_exp\).

The function vect_s16_inverse_prepare() can be used to obtain values for \(a\_exp\) and \(scale\).

Parameters:
  • a[out] Output vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

  • scale[in] Scale factor applied to dividend when computing inverse

Returns:

Headroom of output vector \(\bar a\)

int16_t vect_s16_max(const int16_t b[], const unsigned length)

Find the maximum value in a 16-bit vector.

b[] represents the 16-bit vector \(\bar b\). It must begin at a word-aligned address.

length is the number of elements in \(\bar b\).

Operation Performed

\[\begin{aligned} a \leftarrow max\{ b_0, b_1, ..., b_{length-1} \} \end{aligned}\]

Block Floating-Point

If \(\bar b\) are the mantissas of BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the returned value \(a\) is the 16-bit mantissa of floating-point value \(a \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp\).

Parameters:
  • b[in] Input vector \(\bar b\)

  • length[in] Number of elements in \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `b` is not word-aligned (See Note: Vector Alignment)

Returns:

Maximum value from \(\bar b\)

headroom_t vect_s16_max_elementwise(int16_t a[], const int16_t b[], const int16_t c[], const unsigned length, const right_shift_t b_shr, const right_shift_t c_shr)

Get the element-wise maximum of two 16-bit vectors.

a[], b[] and c[] represent the 16-bit mantissa vectors \(\bar a\), \(\bar b\) and \(\bar c\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[], but not on c[].

length is the number of elements in each of the vectors.

b_shr and c_shr are the signed arithmetic right-shifts applied to each element of \(\bar b\) and \(\bar c\) respectively.

Operation Performed

\[\begin{split}\begin{aligned} & b_k' \leftarrow sat_{16}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & c_k' \leftarrow sat_{16}(\lfloor c_k \cdot 2^{-c\_shr} \rfloor) \\ & a_k \leftarrow max(b_k', c_k') \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) and \(\bar c\) are the mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + b\_shr = c\_exp + c\_shr\).

The function vect_2vec_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).

Warning

For correct operation, this function requires at least 1 bit of headroom in each mantissa vector after the shifts have been applied.

Parameters:
  • a[out] Output vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • c[in] Input vector \(\bar c\)

  • length[in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)

  • b_shr[in] Right-shift applied to \(\bar b\)

  • c_shr[in] Right-shift applied to \(\bar c\)

Throws ET_LOAD_STORE:

Raised if `a`, `b` or `c` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of vector \(\bar a\)

int16_t vect_s16_min(const int16_t b[], const unsigned length)

Find the minimum value in a 16-bit vector.

b[] represents the 16-bit vector \(\bar b\). It must begin at a word-aligned address.

length is the number of elements in \(\bar b\).

Operation Performed

\[\begin{aligned} a \leftarrow min\{ b_0, b_1, ..., b_{length-1} \} \end{aligned}\]

Block Floating-Point

If \(\bar b\) are the mantissas of BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the returned value \(a\) is the 16-bit mantissa of floating-point value \(a \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp\).

Parameters:
  • b[in] Input vector \(\bar b\)

  • length[in] Number of elements in \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `b` is not word-aligned (See Note: Vector Alignment)

Returns:

Minimum value from \(\bar b\)

headroom_t vect_s16_min_elementwise(int16_t a[], const int16_t b[], const int16_t c[], const unsigned length, const right_shift_t b_shr, const right_shift_t c_shr)

Get the element-wise minimum of two 16-bit vectors.

a[], b[] and c[] represent the 16-bit mantissa vectors \(\bar a\), \(\bar b\) and \(\bar c\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[], but not on c[].

length is the number of elements in each of the vectors.

b_shr and c_shr are the signed arithmetic right-shifts applied to each element of \(\bar b\) and \(\bar c\) respectively.

Operation Performed

\[\begin{split}\begin{aligned} & b_k' \leftarrow sat_{16}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & c_k' \leftarrow sat_{16}(\lfloor c_k \cdot 2^{-c\_shr} \rfloor) \\ & a_k \leftarrow min(b_k', c_k') \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) and \(\bar c\) are the mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + b\_shr = c\_exp + c\_shr\).

The function vect_2vec_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).

Warning

For correct operation, this function requires at least 1 bit of headroom in each mantissa vector after the shifts have been applied.

Parameters:
  • a[out] Output vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • c[in] Input vector \(\bar c\)

  • length[in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)

  • b_shr[in] Right-shift applied to \(\bar b\)

  • c_shr[in] Right-shift applied to \(\bar c\)

Throws ET_LOAD_STORE:

Raised if `a`, `b` or `c` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of vector \(\bar a\)

headroom_t vect_s16_macc(int16_t acc[], const int16_t b[], const int16_t c[], const unsigned length, const right_shift_t acc_shr, const right_shift_t bc_sat)

Multiply one 16-bit vector element-wise by another, and add the result to an accumulator.

acc[] represents the 16-bit accumulator mantissa vector \(\bar a\). Each \(a_k\) is acc[k].

b[] and c[] represent the 16-bit input mantissa vectors \(\bar b\) and \(\bar c\), where each \(b_k\) is b[k] and each \(c_k\) is c[k].

Each of the input vectors must begin at a word-aligned address.

length is the number of elements in each of the vectors.

acc_shr is the signed arithmetic right-shift applied to the accumulators \(a_k\) prior to accumulation.

bc_sat is the unsigned arithmetic right-shift applied to the product of \(b_k\) and \(c_k\) before accumulation.

Operation Performed

\[\begin{split}\begin{aligned} & v_k \leftarrow round( sat_{16}( b_k \cdot c_k \cdot 2^{-bc\_sat} ) ) \\ & \hat{a}_k \leftarrow sat_{16}( a_k \cdot 2^{-acc\_shr} ) \\ & a_k \leftarrow sat_{16}( \hat{a}_k + v_k ) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If inputs \(\bar b\) and \(\bar c\) are the mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), and input \(\bar a\) is the accumulator BFP vector \(\bar{a} \cdot 2^{a\_exp}\), then the output values of \(\bar a\) have the exponent \(a\_exp + acc\_shr\).

For accumulation to make sense mathematically, \(bc\_sat\) must be chosen such that \( a\_exp + acc\_shr = b\_exp + c\_exp + bc\_sat \).

The function vect_s16_macc_prepare() can be used to obtain values for \(a\_exp\), \(acc\_shr\) and \(bc\_sat\) based on the input exponents \(a\_exp\), \(b\_exp\) and \(c\_exp\) and the input headrooms \(a\_hr\), \(b\_hr\) and \(c\_hr\).

Parameters:
  • acc[inout] Accumulator \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • c[in] Input vector \(\bar c\)

  • length[in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)

  • acc_shr[in] Signed arithmetic right-shift applied to accumulator elements.

  • bc_sat[in] Unsigned arithmetic right-shift applied to the products of elements \(b_k\) and \(c_k\)

Throws ET_LOAD_STORE:

Raised if `acc`, `b` or `c` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of the output vector \(\bar a\)
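
The three steps of the multiply-accumulate operation can be modelled in portable C as below. This is a sketch with hypothetical names (sat16, floor_shr, macc_ref). It assumes that the product is rounded to nearest with ties rounded upward, and that saturation uses the full int16_t range; neither detail is stated in this section, so the hardware result may differ in edge cases. The returned headroom is not computed.

```c
#include <stdint.h>

/* Clamp to the int16_t range (assumed saturation bounds). */
static int16_t sat16(int64_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* floor(v * 2^-shr); a negative shr is a left shift. */
static int64_t floor_shr(int64_t v, int shr)
{
    if (shr > 31) shr = 31;
    if (shr < -31) shr = -31;
    if (shr < 0) return v * ((int64_t)1 << -shr);
    int64_t d = (int64_t)1 << shr;
    int64_t q = v / d;
    if (v % d != 0 && v < 0) q -= 1;   /* round toward minus infinity */
    return q;
}

/* Hypothetical scalar model of acc[k] <- sat16(shifted acc[k] + rounded product). */
static void macc_ref(int16_t acc[], const int16_t b[], const int16_t c[],
                     unsigned length, int acc_shr, unsigned bc_sat)
{
    int s = (bc_sat > 31u) ? 31 : (int)bc_sat;
    for (unsigned k = 0; k < length; k++) {
        int64_t p = (int64_t)b[k] * c[k];
        if (s > 0) p += (int64_t)1 << (s - 1);   /* rounding offset (assumed mode) */
        int16_t v = sat16(floor_shr(p, s));
        int16_t a_hat = sat16(floor_shr(acc[k], acc_shr));
        acc[k] = sat16((int64_t)a_hat + v);
    }
}
```

For the result to be meaningful as a BFP vector, the shifts passed in must satisfy \( a\_exp + acc\_shr = b\_exp + c\_exp + bc\_sat \), as stated above.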

headroom_t vect_s16_nmacc(int16_t acc[], const int16_t b[], const int16_t c[], const unsigned length, const right_shift_t acc_shr, const right_shift_t bc_sat)

Multiply one 16-bit vector element-wise by another, and subtract the result from an accumulator.

acc[] represents the 16-bit accumulator mantissa vector \(\bar a\). Each \(a_k\) is acc[k].

b[] and c[] represent the 16-bit input mantissa vectors \(\bar b\) and \(\bar c\), where each \(b_k\) is b[k] and each \(c_k\) is c[k].

Each of the input vectors must begin at a word-aligned address.

length is the number of elements in each of the vectors.

acc_shr is the signed arithmetic right-shift applied to the accumulators \(a_k\) prior to accumulation.

bc_sat is the unsigned arithmetic right-shift applied to the product of \(b_k\) and \(c_k\) before accumulation.

Operation Performed

\[\begin{split}\begin{aligned} & v_k \leftarrow round( sat_{16}( b_k \cdot c_k \cdot 2^{-bc\_sat} ) ) \\ & \hat{a}_k \leftarrow sat_{16}( a_k \cdot 2^{-acc\_shr} ) \\ & a_k \leftarrow sat_{16}( \hat{a}_k - v_k ) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If inputs \(\bar b\) and \(\bar c\) are the mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), and input \(\bar a\) is the accumulator BFP vector \(\bar{a} \cdot 2^{a\_exp}\), then the output values of \(\bar a\) have the exponent \(a\_exp + acc\_shr\).

For accumulation to make sense mathematically, \(bc\_sat\) must be chosen such that \( a\_exp + acc\_shr = b\_exp + c\_exp + bc\_sat \).

The function vect_complex_s16_nmacc_prepare() can be used to obtain values for \(a\_exp\), \(acc\_shr\) and \(bc\_sat\) based on the input exponents \(a\_exp\), \(b\_exp\) and \(c\_exp\) and the input headrooms \(a\_hr\), \(b\_hr\) and \(c\_hr\).

Parameters:
  • acc[inout] Accumulator \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • c[in] Input vector \(\bar c\)

  • length[in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)

  • acc_shr[in] Signed arithmetic right-shift applied to accumulator elements.

  • bc_sat[in] Unsigned arithmetic right-shift applied to the products of elements \(b_k\) and \(c_k\)

Throws ET_LOAD_STORE:

Raised if `acc`, `b` or `c` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of the output vector \(\bar a\)
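
As a worked instance of the exponent constraint above, the following minimal sketch solves \( a\_exp + acc\_shr = b\_exp + c\_exp + bc\_sat \) for \(bc\_sat\). The helper name pick_bc_sat is hypothetical and not part of lib_xcore_math. It ignores headroom entirely, so it only illustrates the arithmetic and is not a substitute for the prepare function mentioned above.

```c
#include <assert.h>

/* Hypothetical helper: solve
 *   a_exp + acc_shr = b_exp + c_exp + bc_sat
 * for bc_sat. Returns -1 when the result would be negative, because
 * bc_sat is an unsigned shift and cannot express a left-shift. */
static int pick_bc_sat(int a_exp, int acc_shr, int b_exp, int c_exp)
{
    int bc_sat = (a_exp + acc_shr) - (b_exp + c_exp);
    /* The constraint holds by construction; the assert documents it. */
    assert(a_exp + acc_shr == b_exp + c_exp + bc_sat);
    return (bc_sat >= 0) ? bc_sat : -1;
}
```

For example, with \(a\_exp = -15\), \(acc\_shr = 1\) and \(b\_exp = c\_exp = -14\), the product exponent is \(-28\) and the accumulator exponent after shifting is \(-14\), so \(bc\_sat = 14\).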

headroom_t vect_s16_mul(int16_t a[], const int16_t b[], const int16_t c[], const unsigned length, const right_shift_t a_shr)

Multiply two 16-bit vectors together element-wise.

a[], b[] and c[] represent the 16-bit vectors \(\bar a\), \(\bar b\) and \(\bar c\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[] or c[].

length is the number of elements in each of the vectors.

a_shr is an unsigned arithmetic right-shift applied to the 32-bit accumulators holding the penultimate results.

Operation Performed

\[\begin{split}\begin{aligned} & a_k' \leftarrow b_k \cdot c_k \\ & a_k \leftarrow sat_{16}(round(a_k' \cdot 2^{-a\_shr})) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) and \(\bar c\) are the mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + c\_exp + a\_shr\).

The function vect_s16_mul_prepare() can be used to obtain values for \(a\_exp\) and \(a\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).

Parameters:
  • a[out] Output vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • c[in] Input vector \(\bar c\)

  • length[in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)

  • a_shr[in] Right-shift applied to 32-bit products

Throws ET_LOAD_STORE:

Raised if `a`, `b` or `c` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of vector \(\bar a\)
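
The operation above can be modelled in portable C as follows. This is a sketch for reasoning about results off-target, not the library implementation. It assumes symmetric 16-bit saturation to [-32767, 32767] and round-half-up rounding; the names sat16, round_shr and model_s16_mul are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed symmetric saturation to [-32767, 32767]. */
static int16_t sat16(int64_t x)
{
    if (x > 32767)  return 32767;
    if (x < -32767) return -32767;
    return (int16_t)x;
}

/* floor(x / 2^shr + 0.5), written without shifting negative values. */
static int64_t round_shr(int64_t x, unsigned shr)
{
    if (shr == 0) return x;
    int64_t d = (int64_t)1 << shr;
    int64_t y = x + d / 2;       /* add half an LSB, then floor */
    int64_t q = y / d;           /* C division truncates toward zero */
    if ((y % d) < 0) q -= 1;     /* turn truncation into floor */
    return q;
}

/* a_k = sat16(round(b_k * c_k * 2^(-a_shr))) */
static void model_s16_mul(int16_t a[], const int16_t b[], const int16_t c[],
                          unsigned length, unsigned a_shr)
{
    assert(a_shr < 32);
    for (unsigned k = 0; k < length; k++) {
        int32_t p = (int32_t)b[k] * c[k];   /* full 32-bit product */
        a[k] = sat16(round_shr(p, a_shr));
    }
}
```

With \(b\_exp = c\_exp = -14\) and \(a\_shr = 14\), the output exponent is \(a\_exp = -14 - 14 + 14 = -14\), so multiplying two mantissas of 0x4000 (the value 1.0) yields 0x4000 again.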

headroom_t vect_s16_rect(int16_t a[], const int16_t b[], const unsigned length)

Rectify the elements of a 16-bit vector.

Rectification ensures that all outputs are non-negative, changing negative values to 0.

a[] and b[] represent the 16-bit vectors \(\bar a\) and \(\bar b\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[].

length is the number of elements in each of the vectors.

Each output element a[k] is set to the value of the corresponding input element b[k] if it is positive, and a[k] is set to zero otherwise.

Operation Performed

\[\begin{split}\begin{aligned} & a_k \leftarrow \begin{cases} b_k & b_k > 0 \\ 0 & b_k \leq 0\end{cases} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) are the mantissas of BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the output vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp\).

Parameters:
  • a[out] Output vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `a` or `b` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of the output vector \(\bar a\).

headroom_t vect_s16_scale(int16_t a[], const int16_t b[], const unsigned length, const int16_t c, const right_shift_t a_shr)

Multiply a 16-bit vector by a 16-bit scalar.

a[] and b[] represent the 16-bit vectors \(\bar a\) and \(\bar b\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[].

length is the number of elements in each of the vectors.

c is the 16-bit scalar \(c\) by which elements of \(\bar b\) are multiplied.

a_shr is an unsigned arithmetic right-shift applied to the 32-bit accumulators holding the penultimate results.

Operation Performed

\[\begin{split}\begin{aligned} & a_k' \leftarrow b_k \cdot c \\ & a_k \leftarrow sat_{16}(round(a_k' \cdot 2^{-a\_shr})) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) are the mantissas of a BFP vector \( \bar{b} \cdot 2^{b\_exp} \) and \(c\) is the mantissa of floating-point value \(c \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + c\_exp + a\_shr\).

The function vect_s16_scale_prepare() can be used to obtain values for \(a\_exp\) and \(a\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).

Parameters:
  • a[out] Output vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • c[in] Input scalar \(c\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

  • a_shr[in] Right-shift applied to 32-bit products

Throws ET_LOAD_STORE:

Raised if `a` or `b` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of vector \(\bar a\)

void vect_s16_set(int16_t a[], const int16_t b, const unsigned length)

Set all elements of a 16-bit vector to the specified value.

a[] represents the 16-bit vector \(\bar a\). It must begin at a word-aligned address.

b is the value elements of \(\bar a\) are set to.

length is the number of elements in a[].

Operation Performed

\[\begin{split}\begin{aligned} & a_k \leftarrow b \\ & \qquad\text{for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(b\) is the mantissa of floating-point value \(b \cdot 2^{b\_exp}\), then the output vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp\).

Parameters:
  • a[out] Output vector \(\bar a\)

  • b[in] Input value \(b\)

  • length[in] Number of elements in vector \(\bar a\)

Throws ET_LOAD_STORE:

Raised if `a` is not word-aligned (See Note: Vector Alignment)

headroom_t vect_s16_shl(int16_t a[], const int16_t b[], const unsigned length, const left_shift_t b_shl)

Left-shift the elements of a 16-bit vector by a specified number of bits.

a[] and b[] represent the 16-bit vectors \(\bar a\) and \(\bar b\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[].

length is the number of elements in vectors \(\bar a\) and \(\bar b\).

b_shl is the signed arithmetic left-shift applied to each element of \(\bar b\).

Operation Performed

\[\begin{split}\begin{aligned} & a_k \leftarrow sat_{16}(\lfloor b_k \cdot 2^{b\_shl} \rfloor) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) are the mantissas of a BFP vector \( \bar{b} \cdot 2^{b\_exp} \), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(\bar{a} = \bar{b} \cdot 2^{b\_shl}\) and \(a\_exp = b\_exp\).

Parameters:
  • a[out] Output vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

  • b_shl[in] Arithmetic left-shift applied to elements of \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `a` or `b` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of output vector \(\bar a\)

headroom_t vect_s16_shr(int16_t a[], const int16_t b[], const unsigned length, const right_shift_t b_shr)

Right-shift the elements of a 16-bit vector by a specified number of bits.

a[] and b[] represent the 16-bit vectors \(\bar a\) and \(\bar b\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[].

length is the number of elements in vectors \(\bar a\) and \(\bar b\).

b_shr is the signed arithmetic right-shift applied to each element of \(\bar b\).

Operation Performed

\[\begin{split}\begin{aligned} & a_k \leftarrow sat_{16}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) are the mantissas of a BFP vector \( \bar{b} \cdot 2^{b\_exp} \), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(\bar{a} = \bar{b} \cdot 2^{-b\_shr}\) and \(a\_exp = b\_exp\).

Parameters:
  • a[out] Output vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

  • b_shr[in] Arithmetic right-shift applied to elements of \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `a` or `b` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of output vector \(\bar a\)
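
The signed shift used by vect_s16_shl() and vect_s16_shr() can be modelled for a single element as below. By the formulas above, a left-shift of b_shl is the same as a right-shift of -b_shl. The function name is hypothetical, and symmetric saturation to [-32767, 32767] is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* a = sat16(floor(b * 2^(-b_shr))); a negative b_shr is a left-shift. */
static int16_t model_s16_shr_elem(int16_t b, int b_shr)
{
    assert(b_shr > -16 && b_shr < 16);
    int32_t x = b;
    if (b_shr >= 0) {
        int32_t d = (int32_t)1 << b_shr;
        int32_t q = x / d;              /* truncates toward zero */
        if ((x % d) < 0) q -= 1;        /* convert truncation to floor */
        x = q;
    } else {
        x = x * ((int32_t)1 << (-b_shr));  /* multiply instead of shifting negatives */
    }
    if (x > 32767)  x = 32767;
    if (x < -32767) x = -32767;
    return (int16_t)x;
}
```

Because the shift floors rather than truncating toward zero, model_s16_shr_elem(-5, 1) gives -3, while a left-shift that overflows, such as model_s16_shr_elem(0x4000, -1), saturates to 32767.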

headroom_t vect_s16_sqrt(int16_t a[], const int16_t b[], const unsigned length, const right_shift_t b_shr, const unsigned depth)

Compute the square roots of elements of a 16-bit vector.

a[] and b[] represent the 16-bit vectors \(\bar a\) and \(\bar b\) respectively. Each vector must begin at a word-aligned address. This operation can be performed safely in-place on b[].

length is the number of elements in each of the vectors.

b_shr is the signed arithmetic right-shift applied to elements of \(\bar b\).

depth is the number of most significant bits to calculate of each \(a_k\). For example, a depth value of 8 will only compute the 8 most significant bits of the result, with the remaining bits as 0. The maximum value for this parameter is VECT_SQRT_S16_MAX_DEPTH (15). The time cost of this operation is approximately proportional to the number of bits computed.

Operation Performed

\[\begin{split}\begin{aligned} & b_k' \leftarrow sat_{16}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & a_k \leftarrow \begin{cases} \sqrt{ b_k' } & b_k' \geq 0 \\ 0 & otherwise\end{cases} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \\ & \qquad\text{ where } \sqrt{\cdot} \text{ computes the most significant } depth \text{ bits of the square root.} \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) are the mantissas of BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = (b\_exp + b\_shr - 14)/2\).

Note that because exponents must be integers, that means \(b\_exp + b\_shr\) must be even.

The function vect_s16_sqrt_prepare() can be used to obtain values for \(a\_exp\) and \(b\_shr\) based on the input exponent \(b\_exp\) and headroom \(b\_hr\).

Notes

  • This function assumes roots are real. Negative input elements will result in corresponding outputs of 0.

Parameters:
  • a[out] Output vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

  • b_shr[in] Right-shift applied to \(\bar b\)

  • depth[in] Number of bits of each output value to compute

Throws ET_LOAD_STORE:

Raised if `a` or `b` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of output vector \(\bar a\)
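
The relation \(a\_exp = (b\_exp + b\_shr - 14)/2\) implies that each output mantissa equals \(\sqrt{b_k' \cdot 2^{14}}\). The sketch below models one element under that reading. The bit-serial search from bit 14 downwards is an assumption about how depth truncates the result, and the function name is hypothetical; it is not the library's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Model of one output element: the `depth` most significant bits of
 * floor(sqrt(b_shifted * 2^14)), decided one bit at a time from bit 14
 * down. Only bits 14..0 exist in this model, so depths above 15 have
 * no further effect here. */
static int16_t model_s16_sqrt_elem(int16_t b_shifted, unsigned depth)
{
    assert(depth > 0);
    if (b_shifted < 0) return 0;              /* negative inputs give 0 */
    uint32_t target = (uint32_t)b_shifted << 14;
    uint32_t root = 0;
    for (unsigned i = 0; i < depth && i < 15; i++) {
        uint32_t trial = root | (1u << (14 - i));
        if (trial * trial <= target) root = trial;   /* keep the bit if it fits */
    }
    return (int16_t)root;
}
```

Viewing the mantissas with 14 fractional bits, 0x4000 represents 1.0 and its modelled root is 0x4000; 0x1000 represents 0.25 and its root is 0x2000 (0.5). With depth set to 1, only bit 14 is tested, so the root of 0x1000 comes out as 0.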

headroom_t vect_s16_sub(int16_t a[], const int16_t b[], const int16_t c[], const unsigned length, const right_shift_t b_shr, const right_shift_t c_shr)

Subtract one 16-bit BFP vector from another.

a[], b[] and c[] represent the 16-bit vectors \(\bar a\), \(\bar b\) and \(\bar c\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[] or c[].

length is the number of elements in each of the vectors.

b_shr and c_shr are the signed arithmetic right-shifts applied to each element of \(\bar b\) and \(\bar c\) respectively.

Operation Performed

\[\begin{split}\begin{aligned} & b_k' = sat_{16}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & c_k' = sat_{16}(\lfloor c_k \cdot 2^{-c\_shr} \rfloor) \\ & a_k \leftarrow sat_{16}\!\left( b_k' - c_k' \right) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) and \(\bar c\) are the mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\).

In this case, \(b\_shr\) and \(c\_shr\) must be chosen so that \(a\_exp = b\_exp + b\_shr = c\_exp + c\_shr\). Adding or subtracting mantissas only makes sense if they are associated with the same exponent.

The function vect_s16_sub_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).

Parameters:
  • a[out] Output vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • c[in] Input vector \(\bar c\)

  • length[in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)

  • b_shr[in] Right-shift applied to \(\bar b\)

  • c_shr[in] Right-shift applied to \(\bar c\)

Throws ET_LOAD_STORE:

Raised if `a`, `b` or `c` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of the output vector \(\bar a\).
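
The requirement \(a\_exp = b\_exp + b\_shr = c\_exp + c\_shr\) can be illustrated with a small sketch that picks the larger input exponent as \(a\_exp\). The type and function names are hypothetical. Unlike the prepare function mentioned above, this sketch ignores headroom, so it neither recovers unused precision nor reserves room for a result that grows by one bit.

```c
#include <assert.h>

/* Hypothetical result type for the sketch. */
typedef struct { int a_exp; int b_shr; int c_shr; } model_shifts_t;

/* Align two mantissa vectors to the larger of their exponents. */
static model_shifts_t align_exponents(int b_exp, int c_exp)
{
    model_shifts_t s;
    s.a_exp = (b_exp > c_exp) ? b_exp : c_exp;
    s.b_shr = s.a_exp - b_exp;   /* >= 0: right-shift of b mantissas */
    s.c_shr = s.a_exp - c_exp;   /* >= 0: right-shift of c mantissas */
    assert(b_exp + s.b_shr == c_exp + s.c_shr);
    return s;
}
```

For \(b\_exp = -10\) and \(c\_exp = -14\) this gives \(a\_exp = -10\), \(b\_shr = 0\) and \(c\_shr = 4\): the mantissas of \(\bar c\) lose their four least significant bits so that both inputs share one exponent.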

int32_t vect_s16_sum(const int16_t b[], const unsigned length)

Get the sum of elements of a 16-bit vector.

b[] represents the 16-bit vector \(\bar b\). b[] must begin at a word-aligned address.

length is the number of elements in \(\bar b\).

Operation Performed

\[\begin{aligned} a \leftarrow \sum_{k=0}^{length-1} b_k \end{aligned}\]

Block Floating-Point

If \(\bar b\) are the mantissas of BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the returned value \(a\) is the 32-bit mantissa of floating-point value \(a \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp\).

Parameters:
  • b[in] Input vector \(\bar b\)

  • length[in] Number of elements in \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `b` is not word-aligned (See Note: Vector Alignment)

Returns:

The 32-bit sum \(a\)

unsigned chunk_s16_accumulate(split_acc_s32_t *acc, const int16_t b[VPU_INT16_EPV], const right_shift_t b_shr, const unsigned vpu_ctrl)

Accumulate a 16-bit vector chunk into a 32-bit accumulator chunk.

16-bit vector chunk \(\bar b\) is shifted and accumulated into 32-bit accumulator vector chunk \(\bar a\) (acc). This function is used for efficiently accumulating multiple (possibly many) 16-bit vectors together.

The accumulator vector \(\bar a\) stores its elements across two 16-bit vector chunks, which corresponds to how the accumulators are stored internally across VPU registers vD and vR. See split_acc_s32_t for details about the accumulator structure.

The signed arithmetic right-shift b_shr is applied to \(\bar b\) prior to being accumulated into \(\bar a\). When \(\bar b\) and \(\bar a\) are the mantissas of block floating-point vectors, using b_shr allows those vectors to have different exponents. This is also important when this function is called periodically and each \(\bar b\) may have a different exponent.

b_shr must meet the condition -14 <= b_shr <= 14 or the behavior of this function is undefined.

Operation Performed

\[\begin{aligned} & a_k \leftarrow a_k + \lfloor b_k \cdot 2^{-\mathtt{b\_shr}} \rfloor \end{aligned}\]

VPU Control Value

The input vpu_ctrl tracks the VPU’s control register state during accumulation. In particular, it is used for keeping track of the headroom of the accumulator vector \(\bar a\). When beginning a sequence of accumulation calls, the value passed in should be initialized to VPU_INT16_CTRL_INIT. On completion, this function returns the updated VPU control register state, which should be passed in as vpu_ctrl on the next accumulation call.

The idea is that each call to this function processes only a single ‘chunk’ (in 16-bit mode, a 16-element block) at a time, but the caller usually wants to know the headroom of a whole vector, which may comprise many such chunks. So vpu_ctrl is a value which persists through each of these calls to track the whole vector.

Once all chunks have been accumulated, the VPU_INT16_HEADROOM_FROM_CTRL() macro can be used to get the headroom of the accumulator vector. Note that this will produce a maximum value of 15.

Accumulating Many Values

If many vector chunks \(\bar b\) are accumulated into the same accumulators (when using block floating-point, it may be only a few accumulations if the exponent associated with \(\bar b\) is significantly larger than that associated with \(\bar a\)), saturation becomes possible.

When saturation is possible, the user must monitor the headroom of \(\bar a\) (using the returned value and VPU_INT16_HEADROOM_FROM_CTRL()) to detect when there is no further headroom. As long as there is at least 1 bit of headroom, a call to this function cannot saturate.

Typically, when using block floating-point, this will be handled by:

  • Converting \(\bar a\) to a standard vector of int32_t using vect_s32_merge_accs()

  • Right-shifting the values of \(\bar a\) using vect_s32_shr()

  • Incrementing the exponent associated with \(\bar a\) by the number of bits right-shifted

  • Converting \(\bar a\) back into the split accumulator format using vect_s32_split_accs()

When accumulating, setting b_shr to the exponent associated with \(\bar a\) minus the exponent associated with \(\bar b\) will automatically adjust for the new exponent of \(\bar a\).

Parameters:
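  • acc[inout] Split 32-bit accumulator chunk \(\bar a\)

  • b[in] 16-bit input vector chunk \(\bar b\)

  • b_shr[in] Signed arithmetic right-shift applied to elements of \(\bar b\)

  • vpu_ctrl[in] VPU control register state from the previous accumulation call (VPU_INT16_CTRL_INIT for the first call)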

Throws ET_LOAD_STORE:

Raised if `acc` or `b` is not word-aligned (See Note: Vector Alignment)

Returns:

Current state of VPU control register.
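
The renormalisation steps for block floating-point accumulation can be sketched in portable C as below. This is a model for reasoning about exponents only: it keeps the accumulators as plain int32_t values instead of the split format, does no headroom tracking through vpu_ctrl, and every name in it is hypothetical. The shift for each chunk is chosen so that the chunk's exponent plus b_shr equals the accumulator's exponent.

```c
#include <assert.h>
#include <stdint.h>

#define CHUNK_LEN 16   /* elements per 16-bit chunk */

/* Plain-C stand-in for the accumulator chunk and its shared exponent. */
typedef struct {
    int32_t acc[CHUNK_LEN];
    int exp;
} model_acc_t;

/* floor(x * 2^(-shr)); a negative shr is a left-shift. */
static int32_t shr_floor(int32_t x, int shr)
{
    if (shr < 0) return x * ((int32_t)1 << -shr);
    int32_t d = (int32_t)1 << shr;
    int32_t q = x / d;                  /* truncates toward zero */
    return ((x % d) < 0) ? q - 1 : q;   /* convert to floor */
}

/* Accumulate one chunk whose mantissas carry exponent b_exp. */
static void model_accumulate(model_acc_t *a, const int16_t b[CHUNK_LEN], int b_exp)
{
    int b_shr = a->exp - b_exp;              /* b_exp + b_shr == a->exp */
    assert(b_shr >= -14 && b_shr <= 14);     /* range required for b_shr */
    for (int k = 0; k < CHUNK_LEN; k++)
        a->acc[k] += shr_floor(b[k], b_shr);
}

/* Renormalise: drop `shr` low bits and raise the exponent to match. */
static void model_renormalise(model_acc_t *a, int shr)
{
    assert(shr >= 0 && shr < 31);
    for (int k = 0; k < CHUNK_LEN; k++)
        a->acc[k] = shr_floor(a->acc[k], shr);
    a->exp += shr;
}
```

After model_renormalise() raises the accumulator exponent by 2, the next model_accumulate() call with the same b_exp computes a b_shr that is larger by 2, so the represented sum stays consistent without any change to the caller's chunk data.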

void vect_s16_to_vect_s32(int32_t a[], const int16_t b[], const unsigned length)

Convert a 16-bit vector to a 32-bit vector.

a[] represents the 32-bit output vector \(\bar a\).

b[] represents the 16-bit input vector \(\bar b\).

Each vector must begin at a word-aligned address.

length is the number of elements in each of the vectors.

Operation Performed

\[\begin{split}\begin{aligned} & a_k \leftarrow b_k \cdot 2^{8} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) are the mantissas of BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the resulting vector \(\bar a\) are the 32-bit mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\). If \(a\_exp = b\_exp - 8\), then this operation has effectively not changed the values represented.

Notes

  • The multiplication by \(2^8\) is an artifact of the VPU’s behavior. It turns out to be significantly more efficient to include the factor of \(2^8\). If this is unwanted, vect_s32_shr() can be used with a b_shr value of 8 to remove the scaling afterwards.

  • The headroom of output vector \(\bar a\) is not returned by this function. The headroom of the output is always 8 bits greater than the headroom of the input.

Parameters:
  • a[out] 32-bit output vector \(\bar a\)

  • b[in] 16-bit input vector \(\bar b\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `a` or `b` is not word-aligned (See Note: Vector Alignment)
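
The conversion and its exponent bookkeeping can be modelled as follows. The function name is hypothetical, and the sketch returns the output exponent purely for illustration; the library function itself returns nothing.

```c
#include <assert.h>
#include <stdint.h>

/* a_k = b_k * 2^8. Returns the exponent a_exp = b_exp - 8 under which
 * the 32-bit mantissas represent the same values as the inputs. */
static int model_s16_to_s32(int32_t a[], const int16_t b[],
                            unsigned length, int b_exp)
{
    assert(a && b);
    for (unsigned k = 0; k < length; k++)
        a[k] = (int32_t)b[k] * 256;   /* multiply rather than shift negatives */
    return b_exp - 8;
}
```

For instance, a 16-bit mantissa of 1 with exponent -15 becomes a 32-bit mantissa of 256 with exponent -23, and both represent \(2^{-15}\).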

void vect_s16_extract_high_byte(int8_t a[], const int16_t b[], const unsigned len)

Extract an 8-bit vector containing the most significant byte of a 16-bit vector.

This is a utility function used, for example, in optimizing mixed-width products. The most significant byte of each element is extracted (without rounding or saturation) and inserted into the output vector.

Parameters:
  • a[out] 8-bit output vector \(\bar a\)

  • b[in] 16-bit input vector \(\bar b\)

  • len[in] The number of elements in \(\bar a\) and \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `a` or `b` is not word-aligned (See Note: Vector Alignment)

void vect_s16_extract_low_byte(int8_t a[], const int16_t b[], const unsigned len)

Extract an 8-bit vector containing the least significant byte of a 16-bit vector.

This is a utility function used, for example, in optimizing mixed-width products. The least significant byte of each element is extracted (without rounding or saturation) and inserted into the output vector.

Parameters:
  • a[out] 8-bit output vector \(\bar a\)

  • b[in] 16-bit input vector \(\bar b\)

  • len[in] The number of elements in \(\bar a\) and \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `a` or `b` is not word-aligned (See Note: Vector Alignment)
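
The two extraction functions split each 16-bit element into its bytes. A minimal single-element model, assuming two's-complement values and using a hypothetical function name, is:

```c
#include <assert.h>
#include <stdint.h>

/* Split b so that b == high * 256 + (low byte taken as 0..255).
 * Uses arithmetic only, avoiding shifts of negative values. */
static void model_split_bytes(int16_t b, int8_t *high, int8_t *low)
{
    int lo_u = ((b % 256) + 256) % 256;   /* low byte as 0..255 */
    int hi   = (b - lo_u) / 256;          /* exact division, -128..127 */
    assert(hi >= -128 && hi <= 127);
    *high = (int8_t)hi;
    *low  = (int8_t)((lo_u >= 128) ? lo_u - 256 : lo_u);  /* reinterpret as signed */
}
```

Note that the low byte is reinterpreted as a signed 8-bit value, so 0x00FF splits into a high byte of 0 and a low byte of -1.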

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Vector API$$$32-bit vector API£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/vect/vect_s32.html#bit-vector-api
group vect_s32_api

Defines

VECT_SQRT_S32_MAX_DEPTH

Maximum bit-depth that can be calculated by vect_s32_sqrt().

See also

vect_s32_sqrt

Enums

enum pad_mode_e

Supported padding modes for convolutions in “same” mode.

Values:

enumerator PAD_MODE_REFLECT

Vector is reflected at its boundaries, such that

\( \tilde{x}_i = \begin{cases} x_{-i} & i < 0 \\ x_{2N - 2 - i} & i \ge N \\ x_i & otherwise \end{cases} \)

For example, if the length \(N\) of input vector \(\bar x\) is \(7\) and the order \(K\) of the filter is \(5\), then

\( \bar{x} = [x_0, x_1, x_2, x_3, x_4, x_5, x_6] \)

\( \tilde{x} = [x_2, x_1, x_0, x_1, x_2, x_3, x_4, x_5, x_6, x_5, x_4] \)

Note that by convention the first element of \(\tilde{x}\) is considered to be at index \(-P\), where \(P = \lfloor K/2 \rfloor\).

enumerator PAD_MODE_EXTEND

Vector is padded using the value of the bounding elements.

\( \tilde{x}_i = \begin{cases} x_{0} & i < 0 \\ x_{N-1} & i \ge N \\ x_i & otherwise \end{cases} \)

For example, if the length \(N\) of input vector \(\bar x\) is \(7\) and the order \(K\) of the filter is \(5\), then

\( \bar{x} = [x_0, x_1, x_2, x_3, x_4, x_5, x_6] \)

\( \tilde{x} = [x_0, x_0, x_0, x_1, x_2, x_3, x_4, x_5, x_6, x_6, x_6] \)

Note that by convention the first element of \(\tilde{x}\) is considered to be at index \(-P\), where \(P = \lfloor K/2 \rfloor\).

enumerator PAD_MODE_ZERO

Vector is padded with zeroes.

\( \tilde{x}_i = \begin{cases} 0 & i < 0 \\ 0 & i \ge N \\ x_i & otherwise \end{cases} \)

For example, if the length \(N\) of input vector \(\bar x\) is \(7\) and the order \(K\) of the filter is \(5\), then

\( \bar{x} = [x_0, x_1, x_2, x_3, x_4, x_5, x_6] \)

\( \tilde{x} = [0, 0, x_0, x_1, x_2, x_3, x_4, x_5, x_6, 0, 0] \)

Note that by convention the first element of \(\tilde{x}\) is considered to be at index \(-P\), where \(P = \lfloor K/2 \rfloor\).
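
The three padding rules can be expressed as a mapping from a padded index \(i\) to an index into \(\bar x\). The sketch below uses its own enum and function names (all hypothetical) so that it compiles on its own. The reflect rule is only meaningful while the padding on each side is shorter than the vector.

```c
#include <assert.h>

/* Local stand-ins for the padding modes described above. */
typedef enum {
    MODEL_PAD_REFLECT,
    MODEL_PAD_EXTEND,
    MODEL_PAD_ZERO
} model_pad_mode_t;

/* Map padded index i (may be < 0 or >= n) to an index into x[].
 * Returns -1 where the padded value is a zero rather than an element. */
static int model_pad_index(int i, int n, model_pad_mode_t mode)
{
    assert(n > 0);
    if (i >= 0 && i < n) return i;          /* inside the vector */
    switch (mode) {
    case MODEL_PAD_REFLECT: return (i < 0) ? -i : 2 * n - 2 - i;
    case MODEL_PAD_EXTEND:  return (i < 0) ? 0 : n - 1;
    default:                return -1;      /* MODEL_PAD_ZERO */
    }
}
```

With \(N = 7\) and \(K = 5\) (so \(P = 2\)), evaluating the mapping for \(i = -2\ ...\ 8\) reproduces the \(\tilde{x}\) sequences listed for each mode.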

Functions

headroom_t vect_s32_copy(int32_t a[], const int32_t b[], const unsigned length)

Copy one 32-bit vector to another.

This function is effectively a constrained version of memcpy.

With the constraints below met, this function should be modestly faster than memcpy.

a[] is the output vector to which elements are copied.

b[] is the input vector from which elements are copied.

a and b each must begin at a word-aligned address.

length is the number of elements to be copied. length must be a multiple of 8.

Operation Performed

\[\begin{split}\begin{aligned} & a_k \leftarrow b_k \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Parameters:
  • a[out] Output vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • length[in] Number of elements in \(\bar a\) and \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `a` or `b` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of output vector \(\bar a\)

headroom_t vect_s32_abs(int32_t a[], const int32_t b[], const unsigned length)

Compute the element-wise absolute value of a 32-bit vector.

a[] and b[] represent the 32-bit vectors \(\bar a\) and \(\bar b\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[].

length is the number of elements in each of the vectors.

Operation Performed

\[\begin{split}\begin{aligned} & a_k \leftarrow sat_{32}(\left| b_k \right|) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) are the mantissas of BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the output vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp\).

Parameters:
  • a[out] Output vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `a` or `b` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of the output vector \(\bar a\).

int64_t vect_s32_abs_sum(const int32_t b[], const unsigned length)

Compute the sum of the absolute values of elements of a 32-bit vector.

b[] represents the 32-bit mantissa vector \(\bar b\). b[] must begin at a word-aligned address.

length is the number of elements in \(\bar b\).

Operation Performed

\[\begin{aligned} \sum_{k=0}^{length-1} sat_{32}(\left| b_k \right|) \end{aligned}\]

Block Floating-Point

If \(\bar b\) are the mantissas of BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the returned value \(a\) is the 64-bit mantissa of floating-point value \(a \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp\).

Additional Details

Internally the sum accumulates into 8 separate 40-bit accumulators. These accumulators apply symmetric 40-bit saturation logic (with bounds \(\pm (2^{39}-1)\)) with each added element. At the end, the 8 accumulators are summed together into the 64-bit value \(a\) which is returned by this function. No saturation logic is applied at this final step.

Because symmetric 32-bit saturation logic is applied when computing the absolute value, in the corner case where each element is INT32_MIN, each of the 8 accumulators can accumulate \(256\) elements before saturation is possible. Therefore, with \(b\_hr\) bits of headroom, no saturation of intermediate results is possible with fewer than \(2^{11 + b\_hr}\) elements in \(\bar b\).

If the length of \(\bar b\) is greater than \(2^{11 + b\_hr}\), the sum can be computed piece-wise in several calls to this function, with the partial results summed in user code.

Parameters:
  • b[in] Input vector \(\bar b\)

  • length[in] Number of elements in \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `b` is not word-aligned (See Note: Vector Alignment)

Returns:

The 64-bit sum \(a\)
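
The piece-wise approach for long vectors can be sketched as follows. So that the example runs off-target, model_abs_sum() is a plain-C stand-in for vect_s32_abs_sum(); both function names are hypothetical, and on xcore the library call would take the stand-in's place.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the library call: sum of |b_k| with symmetric 32-bit
 * saturation of each absolute value. */
static int64_t model_abs_sum(const int32_t b[], unsigned length)
{
    int64_t sum = 0;
    for (unsigned k = 0; k < length; k++) {
        int64_t v = b[k];
        if (v < 0) v = -v;
        if (v > INT32_MAX) v = INT32_MAX;   /* |INT32_MIN| saturates */
        sum += v;
    }
    return sum;
}

/* Sum |b_k| in pieces of at most 2^(11 + b_hr) elements, adding the
 * partial results in user code. */
static int64_t abs_sum_piecewise(const int32_t b[], unsigned length, unsigned b_hr)
{
    assert(b_hr < 20);                      /* keep the shift well-defined */
    const unsigned piece = 1u << (11 + b_hr);
    int64_t total = 0;
    for (unsigned start = 0; start < length; start += piece) {
        unsigned n = length - start;
        if (n > piece) n = piece;
        total += model_abs_sum(&b[start], n);
    }
    return total;
}
```

With \(b\_hr = 0\) a 5000-element vector is processed as pieces of 2048, 2048 and 904 elements. Each piece starts at an int32_t element, so the word-alignment requirement is met.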

headroom_t vect_s32_add(int32_t a[], const int32_t b[], const int32_t c[], const unsigned length, const right_shift_t b_shr, const right_shift_t c_shr)

Add together two 32-bit vectors.

a[], b[] and c[] represent the 32-bit mantissa vectors \(\bar a\), \(\bar b\) and \(\bar c\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[] or c[].

length is the number of elements in each of the vectors.

b_shr and c_shr are the signed arithmetic right-shifts applied to each element of \(\bar b\) and \(\bar c\) respectively.

Operation Performed

\[\begin{split}\begin{aligned} & b_k' = sat_{32}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & c_k' = sat_{32}(\lfloor c_k \cdot 2^{-c\_shr} \rfloor) \\ & a_k \leftarrow sat_{32}\!\left( b_k' + c_k' \right) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) and \(\bar c\) are the mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\).

In this case, \(b\_shr\) and \(c\_shr\) must be chosen so that \(a\_exp = b\_exp + b\_shr = c\_exp + c\_shr\). Adding or subtracting mantissas only makes sense if they are associated with the same exponent.

The function vect_s32_add_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).

Parameters:
  • a[out] Output vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • c[in] Input vector \(\bar c\)

  • length[in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)

  • b_shr[in] Right-shift applied to \(\bar b\)

  • c_shr[in] Right-shift applied to \(\bar c\)

Throws ET_LOAD_STORE:

Raised if `a`, `b` or `c` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of the output vector \(\bar a\).

headroom_t vect_s32_add_scalar(int32_t a[], const int32_t b[], const int32_t c, const unsigned length, const right_shift_t b_shr)

Add a scalar to a 32-bit vector.

a[] and b[] represent the 32-bit mantissa vectors \(\bar a\) and \(\bar b\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[].

c is the scalar \(c\) to be added to each element of \(\bar b\).

length is the number of elements in each of the vectors.

b_shr is the signed arithmetic right-shift applied to each element of \(\bar b\).

Operation Performed

\[\begin{split}\begin{aligned} & b_k' = sat_{32}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & a_k \leftarrow sat_{32}\!\left( b_k' + c \right) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If elements of \(\bar b\) are the mantissas of BFP vector \( \bar{b} \cdot 2^{b\_exp} \), and \(c\) is the mantissa of floating-point value \(c \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\).

In this case, \(b\_shr\) and \(c\_shr\) must be chosen so that \(a\_exp = b\_exp + b\_shr = c\_exp + c\_shr\). Adding or subtracting mantissas only makes sense if they are associated with the same exponent.

The function vect_s32_add_scalar_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).

Note that \(c\_shr\) is an output of vect_s32_add_scalar_prepare(), but is not a parameter to this function. The \(c\_shr\) produced by vect_s32_add_scalar_prepare() is to be applied by the user, and the result passed as input c.

Parameters:
  • a[out] Output vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • c[in] Input scalar \(c\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

  • b_shr[in] Right-shift applied to \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `a` or `b` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of the output vector \(\bar a\).

unsigned vect_s32_argmax(const int32_t b[], const unsigned length)

Obtain the array index of the maximum element of a 32-bit vector.

b[] represents the 32-bit input vector \(\bar b\). It must begin at a word-aligned address.

length is the number of elements in \(\bar b\).

Operation Performed

\[\begin{split}\begin{aligned} & a \leftarrow argmax_k\{ b_k \} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Parameters:
  • b[in] Input vector \(\bar b\)

  • length[in] Number of elements in \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `b` is not word-aligned (See Note: Vector Alignment)

Returns:

\(a\), the index of the maximum element of vector \(\bar b\). If there is a tie for the maximum value, the lowest tying index is returned.

unsigned vect_s32_argmin(const int32_t b[], const unsigned length)

Obtain the array index of the minimum element of a 32-bit vector.

b[] represents the 32-bit input vector \(\bar b\). It must begin at a word-aligned address.

length is the number of elements in \(\bar b\).

Operation Performed

\[\begin{split}\begin{aligned} & a \leftarrow argmin_k\{ b_k \} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Parameters:
  • b[in] Input vector \(\bar b\)

  • length[in] Number of elements in \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `b` is not word-aligned (See Note: Vector Alignment)

Returns:

\(a\), the index of the minimum element of vector \(\bar b\). If there is a tie for the minimum value, the lowest tying index is returned.
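
The tie-breaking rule for vect_s32_argmax() and vect_s32_argmin() (lowest tying index wins) can be illustrated with a scalar reference model. The functions `ref_argmax()` and `ref_argmin()` below are hypothetical names for this sketch and say nothing about how the library implements the search.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model: index of the largest element. */
static unsigned ref_argmax(const int32_t b[], unsigned length) {
    assert(length > 0);
    unsigned best = 0;
    for (unsigned k = 1; k < length; k++) {
        if (b[k] > b[best]) best = k;  /* strict '>' keeps the lowest tying index */
    }
    return best;
}

/* Hypothetical reference model: index of the smallest element. */
static unsigned ref_argmin(const int32_t b[], unsigned length) {
    assert(length > 0);
    unsigned best = 0;
    for (unsigned k = 1; k < length; k++) {
        if (b[k] < b[best]) best = k;  /* strict '<' keeps the lowest tying index */
    }
    return best;
}
```

For the vector {3, 9, -2, 9, -2}, this model returns index 1 for the maximum and index 2 for the minimum, even though both values occur again later in the vector.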

headroom_t vect_s32_clip(int32_t a[], const int32_t b[], const unsigned length, const int32_t lower_bound, const int32_t upper_bound, const right_shift_t b_shr)

Clamp the elements of a 32-bit vector to a specified range.

a[] and b[] represent the 32-bit vectors \(\bar a\) and \(\bar b\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[].

length is the number of elements in each of the vectors.

lower_bound and upper_bound are the lower and upper bounds of the clipping range respectively. These bounds are checked for each element of \(\bar b\) only after b_shr is applied.

b_shr is the signed arithmetic right-shift applied to elements of \(\bar b\) before being compared to the upper and lower bounds.

If \(\bar b\) are the mantissas for a BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the exponent \(a\_exp\) of the output BFP vector \(\bar{a} \cdot 2^{a\_exp}\) is given by \(a\_exp = b\_exp + b\_shr\).

Operation Performed

\[\begin{split}\begin{aligned} & b_k' \leftarrow sat_{32}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & a_k \leftarrow \begin{cases} lower\_bound & b_k' \le lower\_bound \\ upper\_bound & b_k' \ge upper\_bound \\ b_k' & otherwise \end{cases} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) are the mantissas of BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the output vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + b\_shr\).

Parameters:
  • a[out] Output vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

  • lower_bound[in] Lower bound of clipping range

  • upper_bound[in] Upper bound of clipping range

  • b_shr[in] Arithmetic right-shift applied to elements of \(\bar b\) prior to clipping

Throws ET_LOAD_STORE:

Raised if `a` or `b` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of output vector \(\bar a\)
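
The ordering of the two steps (shift first, then compare against the bounds) can be sketched with a scalar reference model. `ref_clip()` and its helpers are hypothetical names used for illustration only; the symmetric saturation bounds are an assumption, and the headroom return value is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: sat_32 clamps to the symmetric range +/-(2^31 - 1). */
static int32_t sat32(int64_t x) {
    if (x > INT32_MAX) return INT32_MAX;
    if (x < -INT32_MAX) return -INT32_MAX;
    return (int32_t)x;
}

/* floor(x * 2^-shr), saturated; a negative shr acts as a left shift.
 * Relies on >> being an arithmetic shift for negative values. */
static int32_t shr_sat32(int32_t x, int shr) {
    int64_t wide = x;
    if (shr >= 0) {
        wide >>= (shr > 31) ? 31 : shr;
    } else {
        int shl = (shr < -31) ? 31 : -shr;
        wide *= ((int64_t)1 << shl);
    }
    return sat32(wide);
}

/* Hypothetical model: bounds are compared against the shifted value b_k'. */
static void ref_clip(int32_t a[], const int32_t b[], unsigned length,
                     int32_t lower_bound, int32_t upper_bound, int b_shr) {
    assert(lower_bound <= upper_bound);
    for (unsigned k = 0; k < length; k++) {
        int32_t v = shr_sat32(b[k], b_shr);
        if (v <= lower_bound) v = lower_bound;
        else if (v >= upper_bound) v = upper_bound;
        a[k] = v;
    }
}
```

With b = {-400, 40, 400, 4}, b_shr = 2 and bounds [-50, 50], the shifted values are {-100, 10, 100, 1}, so the model produces {-50, 10, 50, 1}. The bounds are therefore expressed at the output exponent \(a\_exp\), not at \(b\_exp\).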

int64_t vect_s32_dot(const int32_t b[], const int32_t c[], const unsigned length, const right_shift_t b_shr, const right_shift_t c_shr)

Compute the inner product between two 32-bit vectors.

b[] and c[] represent the 32-bit mantissa vectors \(\bar b\) and \(\bar c\) respectively. Each must begin at a word-aligned address.

length is the number of elements in each of the vectors.

b_shr and c_shr are the signed arithmetic right-shifts applied to each element of \(\bar b\) and \(\bar c\) respectively.

Operation Performed

\[\begin{split}\begin{aligned} & b_k' \leftarrow sat_{32}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & c_k' \leftarrow sat_{32}(\lfloor c_k \cdot 2^{-c\_shr} \rfloor) \\ & a \leftarrow \sum_{k=0}^{length-1}\left(round( b_k' \cdot c_k' \cdot 2^{-30} ) \right) \\ & \qquad\text{where } a \text{ is returned} \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) and \(\bar c\) are the mantissas of the BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c}\cdot 2^{c\_exp}\), then result \(a\) is the 64-bit mantissa of the result \(a \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + c\_exp + b\_shr + c\_shr + 30\).

If needed, the bit-depth of \(a\) can then be reduced to 32 bits to get a new result \(a' \cdot 2^{a\_exp'}\) where \(a' = a \cdot 2^{-a\_shr}\) and \(a\_exp' = a\_exp + a\_shr\).

The function vect_s32_dot_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).

Additional Details

The 30-bit rounding right-shift applied to each of the 64-bit products \(b_k \cdot c_k\) is a feature of the hardware and cannot be avoided. As such, if the input vectors \(\bar b\) and \(\bar c\) together have too much headroom (i.e. \(b\_hr + c\_hr\)), the sum may effectively vanish. To avoid this situation, negative values of b_shr and c_shr may be used (with the stipulation that \(b\_shr \ge -b\_hr\) and \(c\_shr \ge -c\_hr\) if saturation of \(b_k'\) and \(c_k'\) is to be avoided). The less headroom \(b_k'\) and \(c_k'\) have, the greater the precision of the final result.

Internally, each product \((b_k' \cdot c_k' \cdot 2^{-30})\) accumulates into one of eight 40-bit accumulators (all of which are used simultaneously). The accumulators apply symmetric 40-bit saturation logic (with bounds \(\approx \pm 2^{39}\)) with each value added. The saturating arithmetic employed is not associative and no indication is given if saturation occurs at an intermediate step. To avoid saturation errors, length should be no greater than \(2^{10+b\_hr+c\_hr}\), where \(b\_hr\) and \(c\_hr\) are the headroom of \(\bar b\) and \(\bar c\) respectively.

If the caller’s mantissa vectors are longer than that, the full inner product can be found by calling this function multiple times for partial inner products on sub-sequences of the input vectors, and adding the results in user code.

In many situations the caller may have a priori knowledge that saturation is impossible (or very nearly so), in which case this guideline may be disregarded. However, such situations are application-specific and are well beyond the scope of this documentation, and as such are left to the user’s discretion.

Parameters:
  • b[in] Input vector \(\bar b\)

  • c[in] Input vector \(\bar c\)

  • length[in] Number of elements in vectors \(\bar b\) and \(\bar c\)

  • b_shr[in] Right-shift applied to \(\bar b\)

  • c_shr[in] Right-shift applied to \(\bar c\)

Throws ET_LOAD_STORE:

Raised if `b` or `c` is not word-aligned (See Note: Vector Alignment)

Returns:

The inner product of vectors \(\bar b\) and \(\bar c\), scaled as indicated above.
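
The effect of the fixed 30-bit shift on inputs with a lot of headroom can be demonstrated with a scalar reference model. `ref_dot()` is a hypothetical helper for this sketch. It assumes round-half-up rounding and symmetric 32-bit saturation, and it does not model the eight 40-bit accumulators or their saturation.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: sat_32 clamps to the symmetric range +/-(2^31 - 1). */
static int32_t sat32(int64_t x) {
    if (x > INT32_MAX) return INT32_MAX;
    if (x < -INT32_MAX) return -INT32_MAX;
    return (int32_t)x;
}

/* floor(x * 2^-shr), saturated; a negative shr acts as a left shift.
 * Relies on >> being an arithmetic shift for negative values. */
static int32_t shr_sat32(int32_t x, int shr) {
    int64_t wide = x;
    if (shr >= 0) {
        wide >>= (shr > 31) ? 31 : shr;
    } else {
        int shl = (shr < -31) ? 31 : -shr;
        wide *= ((int64_t)1 << shl);
    }
    return sat32(wide);
}

/* Hypothetical model: sum of round(b_k' * c_k' * 2^-30) in a 64-bit total. */
static int64_t ref_dot(const int32_t b[], const int32_t c[], unsigned length,
                       int b_shr, int c_shr) {
    assert(b && c);
    int64_t acc = 0;
    for (unsigned k = 0; k < length; k++) {
        int64_t p = (int64_t)shr_sat32(b[k], b_shr) * shr_sat32(c[k], c_shr);
        acc += (p + ((int64_t)1 << 29)) >> 30;  /* round(p * 2^-30), half up */
    }
    return acc;
}
```

With two elements of value \(2^{10}\) in each vector (20 bits of headroom each), zero shifts give a result of 0 because every product rounds away. Using b_shr = c_shr = -10 gives 2048 with \(a\_exp = b\_exp + c\_exp + 10\), which is the exact inner product \(2 \cdot 2^{20}\) at exponent \(b\_exp + c\_exp\).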

int64_t vect_s32_energy(const int32_t b[], const unsigned length, const right_shift_t b_shr)

Calculate the energy (sum of squares of elements) of a 32-bit vector.

b[] represents the 32-bit mantissa vector \(\bar b\). b[] must begin at a word-aligned address.

length is the number of elements in \(\bar b\).

b_shr is the signed arithmetic right-shift applied to elements of \(\bar b\).

Operation Performed

\[\begin{split}\begin{aligned} & b_k' \leftarrow sat_{32}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & a \leftarrow \sum_{k=0}^{length-1} round((b_k')^2 \cdot 2^{-30}) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) are the mantissas of the BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the floating-point result is \(a \cdot 2^{a\_exp}\), where the 64-bit mantissa \(a\) is returned by this function, and \(a\_exp = 30 + 2 \cdot (b\_exp + b\_shr) \).

The function vect_s32_energy_prepare() can be used to obtain values for \(a\_exp\) and \(b\_shr\) based on the input exponent \(b\_exp\) and the input headroom \(b\_hr\).

Additional Details

The 30-bit rounding right-shift applied to each of the 64-bit products \((b_k')^2\) is a feature of the hardware and cannot be avoided. As such, if the input vector \(\bar b\) has too much headroom (i.e. \(2\cdot b\_hr\)), the sum may effectively vanish. To avoid this situation, negative values of b_shr may be used (with the stipulation that \(b\_shr \ge -b\_hr\) if saturation of \(b_k'\) is to be avoided). The less headroom \(b_k'\) has, the greater the precision of the final result.

Internally, each product \((b_k')^2 \cdot 2^{-30}\) accumulates into one of eight 40-bit accumulators (which are all used simultaneously) which apply symmetric 40-bit saturation logic (with bounds \(\approx 2^{39}\)) with each value added. The saturating arithmetic employed is not associative and no indication is given if saturation occurs at an intermediate step. To avoid saturation errors, length should be no greater than \(2^{10+2\cdot b\_hr}\), where \(b\_hr\) is the headroom of \(\bar b\).

If the caller’s mantissa vector is longer than that, the full result can be found by calling this function multiple times for partial results on sub-sequences of the input, and adding the results in user code.

In many situations the caller may have a priori knowledge that saturation is impossible (or very nearly so), in which case this guideline may be disregarded. However, such situations are application-specific and are well beyond the scope of this documentation, and as such are left to the user’s discretion.

Parameters:
  • b[in] Input vector \(\bar b\)

  • length[in] Number of elements in \(\bar b\)

  • b_shr[in] Right-shift applied to \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `b` is not word-aligned (See Note: Vector Alignment)

Returns:

64-bit mantissa of vector \(\bar b\)’s energy
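
The length guideline and the suggested workaround of summing partial results can be sketched as follows. `max_safe_length()`, `ref_energy()` and `chunked_energy()` are hypothetical helpers for this sketch. `ref_energy()` merely stands in for one call with b_shr = 0, assumes round-half-up rounding, and does not model accumulator saturation.

```c
#include <assert.h>
#include <stdint.h>

/* Guideline from the text: length <= 2^(10 + 2*b_hr). */
static uint64_t max_safe_length(unsigned b_hr) {
    unsigned e = 10u + 2u * b_hr;
    return (e >= 63u) ? UINT64_MAX : ((uint64_t)1 << e);
}

/* Hypothetical stand-in for a single energy call with b_shr = 0:
 * sum of round(b_k^2 * 2^-30). */
static int64_t ref_energy(const int32_t b[], unsigned length) {
    int64_t acc = 0;
    for (unsigned k = 0; k < length; k++) {
        int64_t sq = (int64_t)b[k] * b[k];
        acc += (sq + ((int64_t)1 << 29)) >> 30;
    }
    return acc;
}

/* Split the vector into chunks no longer than the guideline allows and
 * add the 64-bit partial results in user code. */
static int64_t chunked_energy(const int32_t b[], unsigned length, unsigned b_hr) {
    assert(b || length == 0);
    uint64_t limit = max_safe_length(b_hr);
    int64_t total = 0;
    unsigned done = 0;
    while (done < length) {
        unsigned n = length - done;
        if ((uint64_t)n > limit) n = (unsigned)limit;
        total += ref_energy(&b[done], n);
        done += n;
    }
    return total;
}
```

All partial results share the same exponent \(a\_exp\) as long as the same b_shr is used for every chunk, which is what makes the plain 64-bit addition valid.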

headroom_t vect_s32_headroom(const int32_t x[], const unsigned length)

Calculate the headroom of a 32-bit vector.

The headroom of an N-bit integer is the number of bits that the integer’s value may be left-shifted without any information being lost. Equivalently, it is one less than the number of leading sign bits.

The headroom of an int32_t array is the minimum of the headroom of each of its int32_t elements.

This function efficiently traverses the elements of x[] to determine its headroom.

x[] represents the 32-bit vector \(\bar x\). x[] must begin at a word-aligned address.

length is the number of elements in x[].

Operation Performed

\[\begin{aligned} min\!\{ HR_{32}\left(x_0\right), HR_{32}\left(x_1\right), ..., HR_{32}\left(x_{length-1}\right) \} \end{aligned}\]

Parameters:
  • x[in] Input vector \(\bar x\)

  • length[in] The number of elements in x[]

Throws ET_LOAD_STORE:

Raised if `x` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of vector \(\bar x\)
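
The definition of headroom given above (one less than the number of leading sign bits, minimized over the vector) can be written out as a scalar reference model. `hr32()` and `ref_headroom()` are hypothetical names for this sketch and are not the library's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Headroom of one int32_t: number of leading sign bits, minus one. */
static unsigned hr32(int32_t x) {
    uint32_t u = (uint32_t)x;
    if (x < 0) u = ~u;  /* after inversion, sign bits appear as leading zeros */
    unsigned n = 0;
    /* Scan from bit 30 downwards for the first bit that differs from the sign. */
    while (n < 31 && !(u & (0x40000000u >> n))) n++;
    return n;
}

/* Headroom of a vector: the minimum headroom of its elements. */
static unsigned ref_headroom(const int32_t x[], unsigned length) {
    assert(x || length == 0);
    unsigned hr = 31;
    for (unsigned k = 0; k < length; k++) {
        unsigned h = hr32(x[k]);
        if (h < hr) hr = h;
    }
    return hr;
}
```

For example, the vector {65536, -3, 1048576} has element headrooms of 14, 29 and 10, so its headroom under this model is 10.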

headroom_t vect_s32_inverse(int32_t a[], const int32_t b[], const unsigned length, const unsigned scale)

Compute the inverse of elements of a 32-bit vector.

a[] and b[] represent the 32-bit mantissa vectors \(\bar a\) and \(\bar b\) respectively. Each vector must begin at a word-aligned address. This operation can be performed safely in-place on b[].

length is the number of elements in each of the vectors.

scale is a scaling parameter used to maximize the precision of the result.

Operation Performed

\[\begin{split}\begin{aligned} & a_k \leftarrow \lfloor\frac{2^{scale}}{b_k}\rfloor \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

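If \(\bar b\) are the mantissas of BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = -scale - b\_exp\). This follows from \(\frac{1}{b_k \cdot 2^{b\_exp}} = \frac{2^{scale}}{b_k} \cdot 2^{-scale - b\_exp}\).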

The function vect_s32_inverse_prepare() can be used to obtain values for \(a\_exp\) and \(scale\).

Parameters:
  • a[out] Output vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

  • scale[in] Scale factor applied to dividend when computing inverse

Throws ET_LOAD_STORE:

Raised if `a` or `b` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of output vector \(\bar a\)

int32_t vect_s32_max(const int32_t b[], const unsigned length)

Find the maximum value in a 32-bit vector.

b[] represents the 32-bit vector \(\bar b\). It must begin at a word-aligned address.

length is the number of elements in \(\bar b\).

Operation Performed

\[\begin{aligned} a \leftarrow max\{ b_0, b_1, ..., b_{length-1} \} \end{aligned}\]

Block Floating-Point

If \(\bar b\) are the mantissas of BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the returned value \(a\) is the 32-bit mantissa of floating-point value \(a \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp\).

Parameters:
  • b[in] Input vector \(\bar b\)

  • length[in] Number of elements in \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `b` is not word-aligned (See Note: Vector Alignment)

Returns:

Maximum value from \(\bar b\)

headroom_t vect_s32_max_elementwise(int32_t a[], const int32_t b[], const int32_t c[], const unsigned length, const right_shift_t b_shr, const right_shift_t c_shr)

Get the element-wise maximum of two 32-bit vectors.

a[], b[] and c[] represent the 32-bit mantissa vectors \(\bar a\), \(\bar b\) and \(\bar c\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[], but not on c[].

length is the number of elements in each of the vectors.

b_shr and c_shr are the signed arithmetic right-shifts applied to each element of \(\bar b\) and \(\bar c\) respectively.

Operation Performed

\[\begin{split}\begin{aligned} & b_k' \leftarrow sat_{32}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & c_k' \leftarrow sat_{32}(\lfloor c_k \cdot 2^{-c\_shr} \rfloor) \\ & a_k \leftarrow max(b_k', c_k') \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) and \(\bar c\) are the mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + b\_shr = c\_exp + c\_shr\).

The function vect_2vec_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).

Warning

For correct operation, this function requires at least 1 bit of headroom in each mantissa vector after the shifts have been applied.

Parameters:
  • a[out] Output vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • c[in] Input vector \(\bar c\)

  • length[in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)

  • b_shr[in] Right-shift applied to \(\bar b\)

  • c_shr[in] Right-shift applied to \(\bar c\)

Throws ET_LOAD_STORE:

Raised if `a`, `b` or `c` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of vector \(\bar a\)

int32_t vect_s32_min(const int32_t b[], const unsigned length)

Find the minimum value in a 32-bit vector.

b[] represents the 32-bit vector \(\bar b\). It must begin at a word-aligned address.

length is the number of elements in \(\bar b\).

Operation Performed

\[\begin{aligned} a \leftarrow min\{ b_0, b_1, ..., b_{length-1} \} \end{aligned}\]

Block Floating-Point

If \(\bar b\) are the mantissas of BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the returned value \(a\) is the 32-bit mantissa of floating-point value \(a \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp\).

Parameters:
  • b[in] Input vector \(\bar b\)

  • length[in] Number of elements in \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `b` is not word-aligned (See Note: Vector Alignment)

Returns:

Minimum value from \(\bar b\)

headroom_t vect_s32_min_elementwise(int32_t a[], const int32_t b[], const int32_t c[], const unsigned length, const right_shift_t b_shr, const right_shift_t c_shr)

Get the element-wise minimum of two 32-bit vectors.

a[], b[] and c[] represent the 32-bit mantissa vectors \(\bar a\), \(\bar b\) and \(\bar c\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[], but not on c[].

length is the number of elements in each of the vectors.

b_shr and c_shr are the signed arithmetic right-shifts applied to each element of \(\bar b\) and \(\bar c\) respectively.

Operation Performed

\[\begin{split}\begin{aligned} & b_k' \leftarrow sat_{32}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & c_k' \leftarrow sat_{32}(\lfloor c_k \cdot 2^{-c\_shr} \rfloor) \\ & a_k \leftarrow min(b_k', c_k') \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) and \(\bar c\) are the mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + b\_shr = c\_exp + c\_shr\).

The function vect_2vec_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).

Warning

For correct operation, this function requires at least 1 bit of headroom in each mantissa vector after the shifts have been applied.

Parameters:
  • a[out] Output vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • c[in] Input vector \(\bar c\)

  • length[in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)

  • b_shr[in] Right-shift applied to \(\bar b\)

  • c_shr[in] Right-shift applied to \(\bar c\)

Throws ET_LOAD_STORE:

Raised if `a`, `b` or `c` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of vector \(\bar a\)

headroom_t vect_s32_mul(int32_t a[], const int32_t b[], const int32_t c[], const unsigned length, const right_shift_t b_shr, const right_shift_t c_shr)

Multiply one 32-bit vector element-wise by another.

a[], b[] and c[] represent the 32-bit mantissa vectors \(\bar a\), \(\bar b\) and \(\bar c\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[] or c[].

length is the number of elements in each of the vectors.

b_shr and c_shr are the signed arithmetic right-shifts applied to each element of \(\bar b\) and \(\bar c\) respectively.

Operation Performed

\[\begin{split}\begin{aligned} & b_k' \leftarrow sat_{32}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & c_k' \leftarrow sat_{32}(\lfloor c_k \cdot 2^{-c\_shr} \rfloor) \\ & a_k \leftarrow sat_{32}(round(b_k' \cdot c_k' \cdot 2^{-30})) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) and \(\bar c\) are the mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + c\_exp + b\_shr + c\_shr + 30\).

The function vect_s32_mul_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).

Parameters:
  • a[out] Output vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • c[in] Input vector \(\bar c\)

  • length[in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)

  • b_shr[in] Right-shift applied to \(\bar b\)

  • c_shr[in] Right-shift applied to \(\bar c\)

Throws ET_LOAD_STORE:

Raised if `a`, `b` or `c` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of vector \(\bar a\)
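
The fixed \(2^{-30}\) scaling and the resulting output exponent can be sketched with a scalar reference model. `ref_mul()` and `ref_mul_exp()` are hypothetical helpers for this sketch. Round-half-up rounding and symmetric saturation bounds are assumptions, and the headroom return value is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: sat_32 clamps to the symmetric range +/-(2^31 - 1). */
static int32_t sat32(int64_t x) {
    if (x > INT32_MAX) return INT32_MAX;
    if (x < -INT32_MAX) return -INT32_MAX;
    return (int32_t)x;
}

/* floor(x * 2^-shr), saturated; a negative shr acts as a left shift.
 * Relies on >> being an arithmetic shift for negative values. */
static int32_t shr_sat32(int32_t x, int shr) {
    int64_t wide = x;
    if (shr >= 0) {
        wide >>= (shr > 31) ? 31 : shr;
    } else {
        int shl = (shr < -31) ? 31 : -shr;
        wide *= ((int64_t)1 << shl);
    }
    return sat32(wide);
}

/* Hypothetical model: a_k = sat32(round(b_k' * c_k' * 2^-30)). */
static void ref_mul(int32_t a[], const int32_t b[], const int32_t c[],
                    unsigned length, int b_shr, int c_shr) {
    assert(a && b && c);
    for (unsigned k = 0; k < length; k++) {
        int64_t p = (int64_t)shr_sat32(b[k], b_shr) * shr_sat32(c[k], c_shr);
        a[k] = sat32((p + ((int64_t)1 << 29)) >> 30);
    }
}

/* Output exponent as stated in the Block Floating-Point paragraph. */
static int ref_mul_exp(int b_exp, int c_exp, int b_shr, int c_shr) {
    return b_exp + c_exp + b_shr + c_shr + 30;
}
```

For instance, with \(b\_exp = c\_exp = -30\) and zero shifts, the mantissas \(2^{30}\) (1.0) and \(2^{29}\) (0.5) multiply to \(2^{29}\) with \(a\_exp = -30\), which is 0.5 as expected.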

headroom_t vect_s32_macc(int32_t acc[], const int32_t b[], const int32_t c[], const unsigned length, const right_shift_t acc_shr, const right_shift_t b_shr, const right_shift_t c_shr)

Multiply one 32-bit vector element-wise by another, and add the result to an accumulator.

acc[] represents the 32-bit accumulator mantissa vector \(\bar a\). Each \(a_k\) is acc[k].

b[] and c[] represent the 32-bit input mantissa vectors \(\bar b\) and \(\bar c\), where each \(b_k\) is b[k] and each \(c_k\) is c[k].

Each of the input vectors must begin at a word-aligned address.

length is the number of elements in each of the vectors.

acc_shr, b_shr and c_shr are the signed arithmetic right-shifts applied to input elements \(a_k\), \(b_k\) and \(c_k\).

Operation Performed

\[\begin{split}\begin{aligned} & \tilde{b}_k \leftarrow sat_{32}( b_k \cdot 2^{-b\_shr} ) \\ & \tilde{c}_k \leftarrow sat_{32}( c_k \cdot 2^{-c\_shr} ) \\ & \tilde{a}_k \leftarrow sat_{32}( a_k \cdot 2^{-acc\_shr} ) \\ & v_k \leftarrow round( sat_{32}( \tilde{b}_k \cdot \tilde{c}_k \cdot 2^{-30} ) ) \\ & a_k \leftarrow sat_{32}( \tilde{a}_k + v_k ) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If inputs \(\bar b\) and \(\bar c\) are the mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), and input \(\bar a\) is the accumulator BFP vector \(\bar{a} \cdot 2^{a\_exp}\), then the output values of \(\bar a\) have the exponent \(a\_exp + acc\_shr\).

For accumulation to make sense mathematically, \(acc\_shr\), \(b\_shr\) and \(c\_shr\) must be chosen such that \( a\_exp + acc\_shr = b\_exp + c\_exp + b\_shr + c\_shr + 30 \).

The function vect_s32_macc_prepare() can be used to obtain values for the output exponent, \(acc\_shr\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(a\_exp\), \(b\_exp\) and \(c\_exp\) and the input headrooms \(a\_hr\), \(b\_hr\) and \(c\_hr\).

Parameters:
  • acc[inout] Accumulator \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • c[in] Input vector \(\bar c\)

  • length[in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)

  • acc_shr[in] Signed arithmetic right-shift applied to accumulator elements.

  • b_shr[in] Signed arithmetic right-shift applied to elements of \(\bar b\)

  • c_shr[in] Signed arithmetic right-shift applied to elements of \(\bar c\)

Throws ET_LOAD_STORE:

Raised if `acc`, `b` or `c` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of the output vector \(\bar a\)

headroom_t vect_s32_nmacc(int32_t acc[], const int32_t b[], const int32_t c[], const unsigned length, const right_shift_t acc_shr, const right_shift_t b_shr, const right_shift_t c_shr)

Multiply one 32-bit vector element-wise by another, and subtract the result from an accumulator.

acc[] represents the 32-bit accumulator mantissa vector \(\bar a\). Each \(a_k\) is acc[k].

b[] and c[] represent the 32-bit input mantissa vectors \(\bar b\) and \(\bar c\), where each \(b_k\) is b[k] and each \(c_k\) is c[k].

Each of the input vectors must begin at a word-aligned address.

length is the number of elements in each of the vectors.

acc_shr, b_shr and c_shr are the signed arithmetic right-shifts applied to input elements \(a_k\), \(b_k\) and \(c_k\).

Operation Performed

\[\begin{split}\begin{aligned} & \tilde{b}_k \leftarrow sat_{32}( b_k \cdot 2^{-b\_shr} ) \\ & \tilde{c}_k \leftarrow sat_{32}( c_k \cdot 2^{-c\_shr} ) \\ & \tilde{a}_k \leftarrow sat_{32}( a_k \cdot 2^{-acc\_shr} ) \\ & v_k \leftarrow round( sat_{32}( \tilde{b}_k \cdot \tilde{c}_k \cdot 2^{-30} ) ) \\ & a_k \leftarrow sat_{32}( \tilde{a}_k - v_k ) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If inputs \(\bar b\) and \(\bar c\) are the mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), and input \(\bar a\) is the accumulator BFP vector \(\bar{a} \cdot 2^{a\_exp}\), then the output values of \(\bar a\) have the exponent \(a\_exp + acc\_shr\).

For accumulation to make sense mathematically, \(acc\_shr\), \(b\_shr\) and \(c\_shr\) must be chosen such that \( a\_exp + acc\_shr = b\_exp + c\_exp + b\_shr + c\_shr + 30 \).

The function vect_s32_nmacc_prepare() can be used to obtain values for the output exponent, \(acc\_shr\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(a\_exp\), \(b\_exp\) and \(c\_exp\) and the input headrooms \(a\_hr\), \(b\_hr\) and \(c\_hr\).

Parameters:
  • acc[inout] Accumulator \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • c[in] Input vector \(\bar c\)

  • length[in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)

  • acc_shr[in] Signed arithmetic right-shift applied to accumulator elements.

  • b_shr[in] Signed arithmetic right-shift applied to elements of \(\bar b\)

  • c_shr[in] Signed arithmetic right-shift applied to elements of \(\bar c\)

Throws ET_LOAD_STORE:

Raised if `acc`, `b` or `c` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of the output vector \(\bar a\)

headroom_t vect_s32_rect(int32_t a[], const int32_t b[], const unsigned length)

Rectify the elements of a 32-bit vector.

a[] and b[] represent the 32-bit mantissa vectors \(\bar a\) and \(\bar b\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[].

length is the number of elements in each of the vectors.

Operation Performed

\[\begin{split}\begin{aligned} & a_k \leftarrow \begin{cases} b_k & b_k > 0 \\ 0 & b_k \leq 0 \end{cases} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) are the mantissas of BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the output vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp\).

Parameters:
  • a[out] Output vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `a` or `b` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of the output vector \(\bar a\)

headroom_t vect_s32_scale(int32_t a[], const int32_t b[], const unsigned length, const int32_t c, const right_shift_t b_shr, const right_shift_t c_shr)

Multiply a 32-bit vector by a scalar.

a[] and b[] represent the 32-bit mantissa vectors \(\bar a\) and \(\bar b\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[].

length is the number of elements in each of the vectors.

c is the 32-bit scalar \(c\) by which each element of \(\bar b\) is multiplied.

b_shr and c_shr are the signed arithmetic right-shifts applied to each element of \(\bar b\) and to \(c\).

Operation Performed

\[\begin{split}\begin{aligned} & b_k' \leftarrow sat_{32}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & c' \leftarrow sat_{32}(\lfloor c \cdot 2^{-c\_shr} \rfloor) \\ & a_k \leftarrow sat_{32}(round(c' \cdot b_k' \cdot 2^{-30})) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) are the mantissas of a BFP vector \( \bar{b} \cdot 2^{b\_exp} \) and \(c\) is the mantissa of floating-point value \(c \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + c\_exp + b\_shr + c\_shr + 30\).

The function vect_s32_scale_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).

Parameters:
  • a[out] Output vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

  • c[in] Scalar to be multiplied by elements of \(\bar b\)

  • b_shr[in] Right-shift applied to \(\bar b\)

  • c_shr[in] Right-shift applied to \(c\)

Throws ET_LOAD_STORE:

Raised if `a` or `b` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of output vector \(\bar a\)

void vect_s32_set(int32_t a[], const int32_t b, const unsigned length)

Set all elements of a 32-bit vector to the specified value.

a[] represents the 32-bit output vector \(\bar a\). a[] must begin at a word-aligned address.

b is the new value to set each element of \(\bar a\) to.

Operation Performed

\[\begin{split}\begin{aligned} & a_k \leftarrow b \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(b\) is the mantissa of floating-point value \(b \cdot 2^{b\_exp}\), then the output vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp\).

Parameters:
  • a[out] Output vector \(\bar a\)

  • b[in] New value for the elements of \(\bar a\)

  • length[in] Number of elements in \(\bar a\)

Throws ET_LOAD_STORE:

Raised if `a` is not word-aligned (See Note: Vector Alignment)

headroom_t vect_s32_shl(int32_t a[], const int32_t b[], const unsigned length, const left_shift_t b_shl)

Left-shift the elements of a 32-bit vector by a specified number of bits.

a[] and b[] represent the 32-bit vectors \(\bar a\) and \(\bar b\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[].

length is the number of elements in vectors \(\bar a\) and \(\bar b\).

b_shl is the signed arithmetic left-shift applied to each element of \(\bar b\).

Operation Performed

\[\begin{split}\begin{aligned} & a_k \leftarrow sat_{32}(\lfloor b_k \cdot 2^{b\_shl} \rfloor) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) are the mantissas of a BFP vector \( \bar{b} \cdot 2^{b\_exp} \), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(\bar{a} = \bar{b} \cdot 2^{b\_shl}\) and \(a\_exp = b\_exp\).

Parameters:
  • a[out] Output vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

  • b_shl[in] Arithmetic left-shift applied to elements of \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `a` or `b` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of output vector \(\bar a\)

headroom_t vect_s32_shr(int32_t a[], const int32_t b[], const unsigned length, const right_shift_t b_shr)

Right-shift the elements of a 32-bit vector by a specified number of bits.

a[] and b[] represent the 32-bit vectors \(\bar a\) and \(\bar b\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[].

length is the number of elements in vectors \(\bar a\) and \(\bar b\).

b_shr is the signed arithmetic right-shift applied to each element of \(\bar b\).

Operation Performed

\[\begin{split}\begin{aligned} & a_k \leftarrow sat_{32}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) are the mantissas of a BFP vector \( \bar{b} \cdot 2^{b\_exp} \), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(\bar{a} = \bar{b} \cdot 2^{-b\_shr}\) and \(a\_exp = b\_exp\).

Parameters:
  • a[out] Output vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

  • b_shr[in] Arithmetic right-shift applied to elements of \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `a` or `b` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of output vector \(\bar a\)
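
Comparing the Operation Performed formulas of vect_s32_shl() and vect_s32_shr() shows that a left-shift by \(n\) bits is the same operation as a right-shift by \(-n\) bits. The scalar reference model below illustrates this together with the floor and saturation behaviour. `ref_shr()` and `ref_shl()` are hypothetical helpers, the symmetric saturation bounds are an assumption, and the headroom return value is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: sat_32 clamps to the symmetric range +/-(2^31 - 1). */
static int32_t sat32(int64_t x) {
    if (x > INT32_MAX) return INT32_MAX;
    if (x < -INT32_MAX) return -INT32_MAX;
    return (int32_t)x;
}

/* floor(x * 2^-shr), saturated; a negative shr acts as a left shift.
 * Relies on >> being an arithmetic shift for negative values. */
static int32_t shr_sat32(int32_t x, int shr) {
    int64_t wide = x;
    if (shr >= 0) {
        wide >>= (shr > 31) ? 31 : shr;
    } else {
        int shl = (shr < -31) ? 31 : -shr;
        wide *= ((int64_t)1 << shl);
    }
    return sat32(wide);
}

/* Hypothetical model of the right-shift; safe to run in-place (a == b). */
static void ref_shr(int32_t a[], const int32_t b[], unsigned length, int b_shr) {
    assert(a && b);
    for (unsigned k = 0; k < length; k++) {
        a[k] = shr_sat32(b[k], b_shr);
    }
}

/* Hypothetical model of the left-shift, expressed as a negated right-shift. */
static void ref_shl(int32_t a[], const int32_t b[], unsigned length, int b_shl) {
    ref_shr(a, b, length, -b_shl);
}
```

Under this model, right-shifting {-7, 3, \(2^{30}\)} by one bit gives {-4, 1, \(2^{29}\)} (note the floor on the negative element), while left-shifting the same vector by one bit gives {-14, 6, \(2^{31}-1\)}, with the last element saturated.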

headroom_t vect_s32_sqrt(int32_t a[], const int32_t b[], const unsigned length, const right_shift_t b_shr, const unsigned depth)

Compute the square root of elements of a 32-bit vector.

a[] and b[] represent the 32-bit mantissa vectors \(\bar a\) and \(\bar b\) respectively. Each vector must begin at a word-aligned address. This operation can be performed safely in-place on b[].

length is the number of elements in each of the vectors.

b_shr is the signed arithmetic right-shift applied to elements of \(\bar b\).

depth is the number of most significant bits of each \(a_k\) to calculate. For example, a depth value of 8 computes only the 8 most significant bits of the result, leaving the remaining lower-order bits as 0. The maximum value for this parameter is VECT_SQRT_S32_MAX_DEPTH (31). The time cost of this operation is approximately proportional to the number of bits computed.

Operation Performed

\[\begin{split}\begin{aligned} & b_k' \leftarrow sat_{32}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & a_k \leftarrow \sqrt{ b_k' } \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \\ & \qquad\text{ where } sqrt() \text{ computes the first } depth \text{ bits of the square root.} \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) are the mantissas of BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = (b\_exp + b\_shr - 30)/2\).

Note that because exponents must be integers, \(b\_exp + b\_shr\) must be even.

The function vect_s32_sqrt_prepare() can be used to obtain values for \(a\_exp\) and \(b\_shr\) based on the input exponent \(b\_exp\) and headroom \(b\_hr\).

Parameters:
  • a[out] Output vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

  • b_shr[in] Right-shift applied to \(\bar b\)

  • depth[in] Number of bits of each output value to compute

Throws ET_LOAD_STORE:

Raised if `a` or `b` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of output vector \(\bar a\)
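
The exponent relation \(a\_exp = (b\_exp + b\_shr - 30)/2\) implies that each output mantissa satisfies \(a_k \approx \sqrt{b_k' \cdot 2^{30}}\). The plain-C model below is a minimal sketch of that relationship and of the effect of depth, not the library's implementation. The names ref_sqrt_s32 and ref_sqrt_exp are hypothetical, and returning 0 for non-positive inputs is an assumption made only for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model: computes the `depth` most significant bits of
 * sqrt(b * 2^30), one bit per iteration, starting at bit 30. */
int32_t ref_sqrt_s32(int32_t b, unsigned depth)
{
    assert(depth <= 31);            /* mirrors VECT_SQRT_S32_MAX_DEPTH */
    if (b <= 0) return 0;           /* assumption: behaviour not specified here */

    const int64_t target = (int64_t)b << 30;
    int32_t a = 0;
    for (unsigned i = 0; i < depth; i++) {
        const int32_t trial = a | (int32_t)(INT32_C(1) << (30 - i));
        if ((int64_t)trial * trial <= target)
            a = trial;              /* keep the bit if the square still fits */
    }
    return a;
}

/* Output exponent for a given input exponent and shift; the sum must be even. */
int ref_sqrt_exp(int b_exp, int b_shr)
{
    assert(((b_exp + b_shr) % 2) == 0);
    return (b_exp + b_shr - 30) / 2;
}
```

For instance, the mantissa 0x40000000 with exponent -30 represents 1.0. With b_shr = 0 the model yields the mantissa 0x40000000 and exponent -30, which is again 1.0. Because each iteration resolves one bit, the loop count illustrates why the cost grows with depth.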

headroom_t vect_s32_sub(int32_t a[], const int32_t b[], const int32_t c[], const unsigned length, const right_shift_t b_shr, const right_shift_t c_shr)

Subtract one 32-bit vector from another.

a[], b[] and c[] represent the 32-bit mantissa vectors \(\bar a\), \(\bar b\) and \(\bar c\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[] or c[].

length is the number of elements in each of the vectors.

b_shr and c_shr are the signed arithmetic right-shifts applied to each element of \(\bar b\) and \(\bar c\) respectively.

Operation Performed

\[\begin{split}\begin{aligned} & b_k' = sat_{32}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & c_k' = sat_{32}(\lfloor c_k \cdot 2^{-c\_shr} \rfloor) \\ & a_k \leftarrow sat_{32}\!\left( b_k' - c_k' \right) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) and \(\bar c\) are the mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\).

In this case, \(b\_shr\) and \(c\_shr\) must be chosen so that \(a\_exp = b\_exp + b\_shr = c\_exp + c\_shr\). Adding or subtracting mantissas only makes sense if they are associated with the same exponent.

The function vect_s32_sub_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).

Parameters:
  • a[out] Output vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • c[in] Input vector \(\bar c\)

  • length[in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)

  • b_shr[in] Right-shift applied to \(\bar b\)

  • c_shr[in] Right-shift applied to \(\bar c\)

Throws ET_LOAD_STORE:

Raised if `a`, `b` or `c` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of output vector \(\bar a\)
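
The operation documented for vect_s32_sub() can be modelled in portable C as follows. This is a reference sketch for checking expectations off-target, not the optimized library routine. ref_vect_s32_sub, sat32 and shr_sat32 are hypothetical names, and the symmetric saturation bounds of \(\pm(2^{31}-1)\) are an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Saturate to 32 bits. Symmetric bounds are an assumption of this sketch. */
static int32_t sat32(int64_t v)
{
    if (v > INT32_MAX)  return INT32_MAX;
    if (v < -INT32_MAX) return -INT32_MAX;
    return (int32_t)v;
}

/* floor(x * 2^-shr) with saturation; a negative shr is a left shift. */
static int32_t shr_sat32(int32_t x, int shr)
{
    if (shr > 31)  shr = 31;    /* result is already 0 or -1 */
    if (shr < -31) shr = -31;   /* any non-zero x saturates */
    if (shr < 0)
        return sat32((int64_t)x * ((int64_t)1 << -shr));
    const int64_t d = (int64_t)1 << shr;
    int64_t q = x / d;
    if ((x % d) != 0 && x < 0) q--;     /* round toward -infinity */
    return sat32(q);
}

/* a_k = sat32(b_k' - c_k'), reading b[k] and c[k] before writing a[k]. */
void ref_vect_s32_sub(int32_t a[], const int32_t b[], const int32_t c[],
                      unsigned length, int b_shr, int c_shr)
{
    for (unsigned k = 0; k < length; k++) {
        const int64_t bk = shr_sat32(b[k], b_shr);
        const int64_t ck = shr_sat32(c[k], c_shr);
        a[k] = sat32(bk - ck);
    }
}
```

As a worked case, take \(b = 3 \cdot 2^{0}\) and \(c = 4 \cdot 2^{-2}\). Choosing \(a\_exp = 0\) requires b_shr = 0 and c_shr = 2, giving \(c' = 1\) and \(a = 2\), which is \(3 - 1\) as expected.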

int64_t vect_s32_sum(const int32_t b[], const unsigned length)

Sum the elements of a 32-bit vector.

b[] represents the 32-bit mantissa vector \(\bar b\). b[] must begin at a word-aligned address.

length is the number of elements in \(\bar b\).

Operation Performed

\[\begin{aligned} a \leftarrow \sum_{k=0}^{length-1} b_k \end{aligned}\]

Block Floating-Point

If \(\bar b\) are the mantissas of BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the returned value \(a\) is the 64-bit mantissa of floating-point value \(a \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp\).

Additional Details

Internally, each element accumulates into one of eight 40-bit accumulators (which are all used simultaneously) which apply symmetric 40-bit saturation logic (with bounds \(\approx 2^{39}\)) with each value added. The saturating arithmetic employed is not associative and no indication is given if saturation occurs at an intermediate step. To avoid the possibility of saturation errors, length should be no greater than \(2^{11+b\_hr}\), where \(b\_hr\) is the headroom of \(\bar b\).

If the caller’s mantissa vector is longer than that, the full result can be found by calling this function multiple times for partial results on sub-sequences of the input, and adding the results in user code.

In many situations the caller may have a priori knowledge that saturation is impossible (or very nearly so), in which case this guideline may be disregarded. However, such situations are application-specific and are well beyond the scope of this documentation, and as such are left to the user’s discretion.

Parameters:
  • b[in] Input vector \(\bar b\)

  • length[in] Number of elements in vector \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `b` is not word-aligned (See Note: Vector Alignment)

Returns:

64-bit mantissa of the sum, \(a\).
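
The chunking strategy described under Additional Details can be sketched as below. The helper names (max_safe_sum_length, sum_in_chunks, ref_sum) are hypothetical. The summing routine is passed as a function pointer whose type matches the vect_s32_sum() signature above, so that the sketch can be exercised off-target with the plain-C stand-in ref_sum.

```c
#include <assert.h>
#include <stdint.h>

/* Same signature as vect_s32_sum(), so the library function can be passed. */
typedef int64_t (*sum_fn_t)(const int32_t b[], const unsigned length);

/* Plain-C stand-in for off-target testing (hypothetical). */
int64_t ref_sum(const int32_t b[], const unsigned length)
{
    int64_t acc = 0;
    for (unsigned k = 0; k < length; k++) acc += b[k];
    return acc;
}

/* Largest length for which saturation cannot occur: 2^(11 + b_hr). */
unsigned max_safe_sum_length(unsigned b_hr)
{
    const unsigned shift = 11 + b_hr;
    return (shift >= 31) ? (1u << 31) : (1u << shift);
}

/* Sum a long vector as partial sums over sub-sequences of safe length. */
int64_t sum_in_chunks(const int32_t b[], unsigned length, unsigned b_hr,
                      sum_fn_t sum_fn)
{
    assert(sum_fn != 0);
    const unsigned chunk = max_safe_sum_length(b_hr);
    int64_t total = 0;
    for (unsigned start = 0; start < length; ) {
        const unsigned n = (length - start < chunk) ? (length - start) : chunk;
        total += sum_fn(&b[start], n);   /* partial result for this chunk */
        start += n;
    }
    return total;
}
```

Each chunk starts a whole number of int32_t elements after b[0], so a word-aligned b[] keeps every sub-sequence word-aligned.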

void vect_s32_zip(complex_s32_t a[], const int32_t b[], const int32_t c[], const unsigned length, const right_shift_t b_shr, const right_shift_t c_shr)

Interleave the elements of two vectors into a single vector.

Elements of 32-bit input vectors \(\bar b\) and \(\bar c\) are interleaved into 32-bit output vector \(\bar a\). Each element of \(\bar b\) has a right-shift of \(b\_shr\) applied, and each element of \(\bar c\) has a right-shift of \(c\_shr\) applied.

Alternatively (and equivalently), this function can be conceived of as taking two real vectors \(\bar b\) and \(\bar c\) and forming a new complex vector \(\bar a\) where \(\bar{a} = \bar{b} + i\cdot\bar{c}\).

If vectors \(\bar b\) and \(\bar c\) each have \(N\) elements, then the resulting \(\bar a\) will have either \(2N\) int32_t elements or (equivalently) \(N\) complex_s32_t elements (and must have space for such).

Each element \(b_k\) of \(\bar b\) will end up as element \(a_{2k}\) of \(\bar a\) (with the bit-shift applied). Each element \(c_k\) will end up as element \(a_{2k+1}\) of \(\bar a\).

a[] is the output vector \(\bar a\).

b[] and c[] are the input vectors \(\bar b\) and \(\bar c\) respectively.

a, b and c must each begin at a double word-aligned (8 byte) address (see DWORD_ALIGNED).

length is the number \(N\) of int32_t elements in \(\bar b\) and \(\bar c\).

b_shr is the signed arithmetic right-shift applied to elements of \(\bar b\).

c_shr is the signed arithmetic right-shift applied to elements of \(\bar c\).

Operation Performed

\[\begin{split}\begin{aligned} & Re\{a_{k}\} \leftarrow sat_{32}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & Im\{a_{k}\} \leftarrow sat_{32}(\lfloor c_k \cdot 2^{-c\_shr} \rfloor) \\ & \qquad\text{ for }k\in 0\ ...\ (N-1) \end{aligned}\end{split}\]

Parameters:
  • a[out] Output vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • c[in] Input vector \(\bar c\)

  • length[in] Number of elements \(N\) in vectors \(\bar a\), \(\bar b\) and \(\bar c\)

  • b_shr[in] Signed arithmetic right-shift applied to elements of \(\bar b\)

  • c_shr[in] Signed arithmetic right-shift applied to elements of \(\bar c\)

Throws ET_LOAD_STORE:

Raised if `a`, `b` or `c` is not double word-aligned (See Note: Vector Alignment)
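
The interleaving performed by vect_s32_zip() can be pictured with the following plain-C model. It is only a sketch: ref_complex_s32_t is a stand-in for complex_s32_t with assumed field names, ref_zip is a hypothetical name, and the model covers non-negative shifts without saturation.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for complex_s32_t; the field names here are assumptions. */
typedef struct { int32_t re; int32_t im; } ref_complex_s32_t;

/* floor(x * 2^-shr) for 0 <= shr <= 31 */
static int32_t floor_shr(int32_t x, unsigned shr)
{
    const int64_t d = (int64_t)1 << shr;
    int64_t q = x / d;
    if ((x % d) != 0 && x < 0) q--;     /* round toward -infinity */
    return (int32_t)q;
}

/* b_k lands in the real part (a_{2k}), c_k in the imaginary part (a_{2k+1}). */
void ref_zip(ref_complex_s32_t a[], const int32_t b[], const int32_t c[],
             unsigned length, unsigned b_shr, unsigned c_shr)
{
    assert(b_shr <= 31 && c_shr <= 31);  /* sketch covers right-shifts only */
    for (unsigned k = 0; k < length; k++) {
        a[k].re = floor_shr(b[k], b_shr);
        a[k].im = floor_shr(c[k], c_shr);
    }
}
```

With N input elements per vector, the output occupies 2N int32_t words, which is why a[] must have space for N complex elements.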

void vect_s32_unzip(int32_t a[], int32_t b[], const complex_s32_t c[], const unsigned length)

Deinterleave the real and imaginary parts of a complex 32-bit vector into two separate vectors.

Complex 32-bit input vector \(\bar c\) has its real and imaginary parts (which correspond to the even and odd-indexed elements, if reinterpreted as an int32_t array) split apart to create real 32-bit output vectors \(\bar a\) and \(\bar b\), such that \(\bar{a} = Re\{\bar{c}\}\) and \(\bar{b} = Im\{\bar{c}\}\).

a[] and b[] are the real output vectors \(\bar a\) and \(\bar b\) which receive the real and imaginary parts respectively of \(\bar c\). a and b must each begin at a word-aligned address.

c[] is the complex input vector \(\bar c\). c must begin at a double word-aligned address.

length is the number \(N\) of int32_t elements in \(\bar a\) and \(\bar b\) and the number of complex_s32_t in \(\bar c\).

Operation Performed

\[\begin{split}\begin{aligned} & a_k = Re\{c_k\} \\ & b_k = Im\{c_k\} \\ & \qquad\text{ for }k\in 0\ ...\ (N-1) \end{aligned}\end{split}\]

Parameters:
  • a[out] Output vector \(\bar a\)

  • b[out] Output vector \(\bar b\)

  • c[in] Input vector \(\bar c\)

  • length[in] The number of elements \(N\) in vectors \(\bar a\), \(\bar b\) and \(\bar c\)

Throws ET_LOAD_STORE:

Raised if `a` or `b` is not word-aligned (See Note: Vector Alignment)

Throws ET_LOAD_STORE:

Raised if `c` is not double word-aligned (See Note: Vector Alignment)

headroom_t vect_s32_convolve_valid(int32_t y[], const int32_t x[], const int32_t b_q30[], const unsigned x_length, const unsigned b_length)

Convolve a 32-bit vector with a short kernel.

32-bit input vector \(\bar x\) is convolved with a short fixed-point kernel \(\bar b\) to produce 32-bit output vector \(\bar y\). In other words, this function applies the \(K\)th-order FIR filter with coefficients given by \(\bar b\) to the input signal \(\bar x\). The convolution is “valid” in the sense that no output elements are emitted where the filter taps extend beyond the bounds of the input vector, resulting in an output vector \(\bar y\) with fewer elements.

The maximum filter order \(K\) supported by this function is \(7\).

y[] is the output vector \(\bar y\). If input \(\bar x\) has \(N\) elements, and the filter has \(K\) elements, then \(\bar y\) has \(N-2P\) elements, where \(P = \lfloor K / 2 \rfloor\).

x[] is the input vector \(\bar x\) with length \(N\).

b_q30[] is the vector \(\bar b\) of filter coefficients. The coefficients of \(\bar b\) are encoded in a Q2.30 fixed-point format. The effective value of the \(i\)th coefficient is then \(b_i \cdot 2^{-30}\).

x_length is the length \(N\) of \(\bar x\) in elements.

b_length is the length \(K\) of \(\bar b\) in elements (i.e. the number of filter taps). b_length must be one of \( \{ 1, 3, 5, 7 \} \).

Operation Performed

\[\begin{split}\begin{aligned} & y_k \leftarrow \sum_{l=0}^{K-1} (x_{(k+l)} \cdot b_l \cdot 2^{-30} ) \\ & \qquad\text{ for }k\in 0\ ...\ (N-2P-1) \\ & \qquad\text{ where }P = \lfloor K/2 \rfloor \end{aligned}\end{split}\]

Additional Details

To avoid the possibility of saturating any output elements, \(\bar b\) may be constrained such that \( \sum_{i=0}^{K-1} \left|b_i\right| \leq 2^{30} \).

This operation can be applied safely in-place on x[].

Parameters:
  • y[out] Output vector \(\bar y\)

  • x[in] Input vector \(\bar x\)

  • b_q30[in] Filter coefficient vector \(\bar b\)

  • x_length[in] The number of elements \(N\) in vector \(\bar x\)

  • b_length[in] The number of elements \(K\) in \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `x` or `y` or `b_q30` is not word-aligned (See Note: Vector Alignment)
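
A portable model of the "valid" convolution is sketched below to make the index ranges and the Q2.30 scaling concrete. ref_convolve_valid is a hypothetical name. The sketch assumes the coefficient constraint \(\sum |b_i| \le 2^{30}\) from Additional Details (so neither the 64-bit accumulator nor the output overflows) and rounds toward negative infinity, which is also an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Writes N - 2P outputs to y[] and returns that count. */
unsigned ref_convolve_valid(int32_t y[], const int32_t x[],
                            const int32_t b_q30[],
                            unsigned x_length, unsigned b_length)
{
    assert(b_length == 1 || b_length == 3 || b_length == 5 || b_length == 7);
    assert(x_length >= b_length);
    const unsigned y_length = x_length - 2 * (b_length / 2);
    const int64_t one_q30 = INT64_C(1) << 30;

    for (unsigned k = 0; k < y_length; k++) {
        int64_t acc = 0;
        for (unsigned l = 0; l < b_length; l++)
            acc += (int64_t)x[k + l] * b_q30[l];
        /* Scale by 2^-30, rounding toward -infinity (assumption). */
        int64_t q = acc / one_q30;
        if ((acc % one_q30) != 0 && acc < 0) q--;
        y[k] = (int32_t)q;   /* no saturation: relies on sum |b_i| <= 2^30 */
    }
    return y_length;
}
```

With the 3-tap kernel (0.25, 0.5, 0.25) and the input {4, 8, 12, 16, 20}, the model produces the three outputs {8, 12, 16}. Output k only reads x[k] and later elements, which is consistent with the operation being safe in-place on x[].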

headroom_t vect_s32_convolve_same(int32_t y[], const int32_t x[], const int32_t b_q30[], const unsigned x_length, const unsigned b_length, const pad_mode_e padding_mode)

Convolve a 32-bit vector with a short kernel.

32-bit input vector \(\bar x\) is convolved with a short fixed-point kernel \(\bar b\) to produce 32-bit output vector \(\bar y\). In other words, this function applies the \(K\)th-order FIR filter with coefficients given by \(\bar b\) to the input signal \(\bar x\). The convolution mode is “same” in that the input vector is effectively padded such that the input and output vectors are the same length. The padding behavior is one of those given by pad_mode_e.

The maximum filter order \(K\) supported by this function is \(7\).

y[] and x[] are the output and input vectors \(\bar y\) and \(\bar x\) respectively.

b_q30[] is the vector \(\bar b\) of filter coefficients. The coefficients of \(\bar b\) are encoded in a Q2.30 fixed-point format. The effective value of the \(i\)th coefficient is then \(b_i \cdot 2^{-30}\).

x_length is the length \(N\) of \(\bar x\) and \(\bar y\) in elements.

b_length is the length \(K\) of \(\bar b\) in elements (i.e. the number of filter taps). b_length must be one of \( \{ 1, 3, 5, 7 \} \).

padding_mode is one of the values from the pad_mode_e enumeration. The padding mode indicates the filter input values for filter taps that have extended beyond the bounds of the input vector \(\bar x\). See pad_mode_e for a list of supported padding modes and associated behaviors.

Operation Performed

\[\begin{split}\begin{aligned} & \tilde{x}_i = \begin{cases} \text{determined by padding mode} & i < 0 \\ \text{determined by padding mode} & i \ge N \\ x_i & otherwise \end{cases} \\ & y_k \leftarrow \sum_{l=0}^{K-1} (\tilde{x}_{(k+l-P)} \cdot b_l \cdot 2^{-30} ) \\ & \qquad\text{ for }k\in 0\ ...\ (N-1) \\ & \qquad\text{ where }P = \lfloor K/2 \rfloor \end{aligned}\end{split}\]

Additional Details

To avoid the possibility of saturating any output elements, \(\bar b\) may be constrained such that \( \sum_{i=0}^{K-1} \left|b_i\right| \leq 2^{30} \).

Note

Unlike vect_s32_convolve_valid(), this operation cannot be performed safely in-place on x[]

Parameters:
  • y[out] Output vector \(\bar y\)

  • x[in] Input vector \(\bar x\)

  • b_q30[in] Filter coefficient vector \(\bar b\)

  • x_length[in] The number of elements \(N\) in vector \(\bar x\)

  • b_length[in] The number of elements \(K\) in \(\bar b\)

  • padding_mode[in] The padding mode to be applied at signal boundaries

Throws ET_LOAD_STORE:

Raised if `x` or `y` or `b_q30` is not word-aligned (See Note: Vector Alignment)

void vect_s32_merge_accs(int32_t a[], const split_acc_s32_t b[], const unsigned length)

Merge a vector of split 32-bit accumulators into a vector of int32_t’s.

Convert a vector of split_acc_s32_t into a vector of int32_t. This is useful when a function (e.g. mat_mul_s8_x_s8_yield_s32) outputs a vector of accumulators in the XS3 VPU’s native split 32-bit format, which has the upper half of each accumulator in the first 32 bytes and the lower half in the following 32 bytes.

This function is most efficient (in terms of cycles/accumulator) when length is a multiple of 16. In any case, length will be rounded up such that a multiple of 16 accumulators will always be merged.

This function can safely merge accumulators in-place.

Parameters:
  • a[out] Output vector of int32_t

  • b[in] Input vector of split_acc_s32_t

  • length[in] Number of accumulators to merge

Throws ET_LOAD_STORE:

Raised if `b` or `a` is not word-aligned (See Note: Vector Alignment)

void vect_s32_split_accs(split_acc_s32_t a[], const int32_t b[], const unsigned length)

Split a vector of int32_t’s into a vector of split_acc_s32_t.

Convert a vector of int32_t into a vector of split_acc_s32_t, the native format for the XS3 VPU’s 32-bit accumulators. This is useful when a function (e.g. mat_mul_s8_x_s8_yield_s32) takes in a vector of accumulators in that native format.

This function is most efficient (in terms of cycles/accumulator) when length is a multiple of 16. In any case, length will be rounded up such that a multiple of 16 accumulators will always be split.

This function can safely split accumulators in-place.

Parameters:
  • a[out] Output vector of split_acc_s32_t

  • b[in] Input vector of int32_t

  • length[in] Number of accumulators to split

Throws ET_LOAD_STORE:

Raised if `b` or `a` is not word-aligned (See Note: Vector Alignment)
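
The split accumulator layout used by vect_s32_merge_accs() and vect_s32_split_accs() can be illustrated with a plain-C model of one 16-accumulator chunk. ref_split_acc_t is a stand-in for split_acc_s32_t with hypothetical field names, and the functions model the data layout only, not the library's in-place implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ACCS_PER_CHUNK 16   /* 32 bytes of upper halves + 32 bytes of lower halves */

/* Stand-in for split_acc_s32_t; field names are hypothetical. */
typedef struct {
    uint16_t upper[ACCS_PER_CHUNK];
    uint16_t lower[ACCS_PER_CHUNK];
} ref_split_acc_t;

/* Number of accumulators actually processed when `length` is rounded up. */
unsigned padded_acc_count(unsigned length)
{
    return (length + (ACCS_PER_CHUNK - 1)) & ~(unsigned)(ACCS_PER_CHUNK - 1);
}

void ref_split(ref_split_acc_t *a, const int32_t b[ACCS_PER_CHUNK])
{
    for (unsigned k = 0; k < ACCS_PER_CHUNK; k++) {
        uint32_t u;
        memcpy(&u, &b[k], sizeof u);          /* reinterpret the bit pattern */
        a->upper[k] = (uint16_t)(u >> 16);
        a->lower[k] = (uint16_t)(u & 0xFFFFu);
    }
}

void ref_merge(int32_t a[ACCS_PER_CHUNK], const ref_split_acc_t *b)
{
    for (unsigned k = 0; k < ACCS_PER_CHUNK; k++) {
        const uint32_t u = ((uint32_t)b->upper[k] << 16) | b->lower[k];
        memcpy(&a[k], &u, sizeof u);
    }
}
```

Because length is rounded up to a multiple of 16, it seems prudent to size both buffers for padded_acc_count(length) accumulators, so that the rounded-up portion stays within memory the caller owns.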

void vect_split_acc_s32_shr(split_acc_s32_t a[], const unsigned length, const right_shift_t shr)

Apply a right-shift to the elements of a 32-bit split accumulator vector.

This function may be used in conjunction with chunk_s16_accumulate() or bfp_s16_accumulate() to avoid saturation of accumulators.

This function updates \(\bar a\) in-place.

Parameters:
  • a[inout] Accumulator vector \(\bar a\)

  • length[in] Number of elements of \(\bar a\)

  • shr[in] Number of bits to right-shift the elements of \(\bar a\)

Throws ET_LOAD_STORE:

Raised if `a` is not double-word-aligned (See Note: Vector Alignment)

void vect_q30_power_series(int32_t a[], const q2_30 b[], const int32_t c[], const unsigned term_count, const unsigned length)

Compute a power series sum on a vector of Q2.30 values.

This function is used to compute a power series summation on a vector \(\bar b\). \(\bar b\) contains Q2.30 values. \(\bar c\) is a vector containing coefficients to be multiplied by powers of \(\bar b\), and may have any associated exponent. The output is vector \(\bar a\) and has the same exponent as \(\bar c\).

c[] is an array with shape (term_count, VPU_INT32_EPV), where the second axis contains the same value replicated across all VPU_INT32_EPV elements. That is, c[k][i] = c[k][j] for i and j in 0..(VPU_INT32_EPV-1). This is for performance reasons. (For the purpose of this explanation, \(\bar c\) is considered to be single-dimensional, without redundancy.)

Operation Performed

\[\begin{split}\begin{aligned} & b_{k,0} = 2^{30} \\ & b_{k,i} = round\left(\frac{b_{k,i-1}\cdot{}b_k}{2^{30}}\right) \\ & \qquad\text{for }i \in {1..(N-1)} \\ & a_k \leftarrow \sum_{i=0}^{N-1} round\left( \frac{b_{k,i}\cdot c_i}{2^{30}} \right) \\ & \qquad\text{for }k \in {0..\mathtt{length}-1} \end{aligned}\end{split}\]

Parameters:
  • a[out] Output vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • c[in] Coefficient vector \(\bar c\)

  • term_count[in] Number of power series terms, \(N\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)
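
The recurrence in Operation Performed and the replicated layout of c[] can be checked against a portable model such as the one below. This is a sketch under stated assumptions: REF_EPV stands in for VPU_INT32_EPV and is assumed to be 8, the tie-breaking rule of round() is assumed, no saturation is modelled, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define REF_EPV 8   /* assumed value of VPU_INT32_EPV */

/* round(p / 2^30), ties toward +infinity (the tie rule is an assumption) */
static int32_t round_shr30(int64_t p)
{
    const int64_t d = INT64_C(1) << 30;
    const int64_t v = p + (d >> 1);
    int64_t q = v / d;
    if ((v % d) != 0 && v < 0) q--;
    return (int32_t)q;
}

/* Build the (term_count, REF_EPV) table with each coefficient replicated. */
void fill_coef_table(int32_t table[], const int32_t coef[], unsigned term_count)
{
    for (unsigned i = 0; i < term_count; i++)
        for (unsigned j = 0; j < REF_EPV; j++)
            table[i * REF_EPV + j] = coef[i];
}

/* a_k = sum_i round(b_k^i * c_i), with powers of b_k kept in Q2.30. */
void ref_power_series(int32_t a[], const int32_t b[], const int32_t c[],
                      unsigned term_count, unsigned length)
{
    for (unsigned k = 0; k < length; k++) {
        int64_t sum = 0;
        int32_t pow_q30 = INT32_C(1) << 30;           /* b_k^0 = 1.0 */
        for (unsigned i = 0; i < term_count; i++) {
            if (i > 0)
                pow_q30 = round_shr30((int64_t)pow_q30 * b[k]);
            sum += round_shr30((int64_t)pow_q30 * c[i * REF_EPV]);
        }
        a[k] = (int32_t)sum;    /* sketch: no saturation handling */
    }
}
```

For \(b_k = 0.5\) (that is, \(2^{29}\) in Q2.30) and three coefficients all equal to \(2^{20}\), the model gives \(2^{20} + 2^{19} + 2^{18}\), in the same exponent as the coefficients.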

void vect_float_s32_log_base(q8_24 a[], const float_s32_t b[], const q2_30 inv_ln_base_q30, const unsigned length)

Compute the logarithm (in the specified base) of a vector of float_s32_t.

This function computes the logarithm of a vector \(\bar b\) of float_s32_t values. The base of the computed logarithm is given by parameter inv_ln_base_q30. The result is written to output \(\bar a\), a vector of Q8.24 values.

If the desired base is \(D\), then inv_ln_base_q30, represented here by \(R\), should be \(\mathtt{Q30}\left(\frac{1}{ln\left(D\right)}\right)\). That is: the inverse of the natural logarithm of the desired base, expressed as a Q2.30 value. Typically the desired base is known at compile time, so this value will usually be a precomputed constant.

The resulting \(a_k\) for \(b_k \le 0\) is undefined.

Operation Performed

\[\begin{split}\begin{aligned} & a_k \leftarrow log_{D}\left(b_k\right) \\ & \qquad\text{for }k \in {0..\mathtt{length}-1} \end{aligned}\end{split}\]

Parameters:
  • a[out] Output Q8.24 vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • inv_ln_base_q30[in] Coefficient \(R\) converting from natural log to desired base \(D\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `b` or `a` is not double word-aligned (See Note: Vector Alignment)
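
The role of inv_ln_base_q30 follows from the change-of-base identity \(log_D(x) = ln(x) \cdot \frac{1}{ln(D)}\). The sketch below shows precomputed constants for bases 2 and 10 and applies the identity to a Q8.24 natural logarithm. The macro and function names are hypothetical, and the model illustrates the arithmetic only; it is not taken from the library source.

```c
#include <assert.h>
#include <stdint.h>

/* Q2.30 encodings of 1/ln(D): round(2^30 / ln(2)) and round(2^30 / ln(10)). */
#define INV_LN2_Q30   INT32_C(0x5C551D95)
#define INV_LN10_Q30  INT32_C(0x1BCB7B15)

/* Convert a natural logarithm in Q8.24 to base D, given R = Q30(1/ln(D)). */
int32_t ref_change_base_q24(int32_t ln_q24, int32_t inv_ln_base_q30)
{
    const int64_t p = (int64_t)ln_q24 * inv_ln_base_q30;  /* 54 fractional bits */
    const int64_t d = INT64_C(1) << 30;
    const int64_t v = p + (d >> 1);                       /* round to nearest */
    int64_t q = v / d;
    if ((v % d) != 0 && v < 0) q--;
    return (int32_t)q;   /* sketch: no saturation to the Q8.24 range */
}
```

For example, \(ln(8) \approx 2.0794\) is about 34887240 in Q8.24. Multiplying by INV_LN2_Q30 gives a value very close to \(3 \cdot 2^{24}\), the Q8.24 encoding of \(log_2(8) = 3\). Note that Q2.30 cannot represent \(1/ln(D)\) when its magnitude is 2 or more, which happens for bases close to 1.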

void vect_float_s32_log(q8_24 a[], const float_s32_t b[], const unsigned length)

Compute the natural logarithm of a vector of float_s32_t.

This function computes the natural logarithm of a vector \(\bar b\) of float_s32_t values. The result is written to output \(\bar a\), a vector of Q8.24 values.

The resulting \(a_k\) for \(b_k \le 0\) is undefined.

Operation Performed

\[\begin{split}\begin{aligned} & a_k \leftarrow ln\left(b_k\right) \\ & \qquad\text{for }k \in {0..\mathtt{length}-1} \end{aligned}\end{split}\]

Parameters:
  • a[out] Output Q8.24 vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `b` or `a` is not double word-aligned (See Note: Vector Alignment)

void vect_float_s32_log2(q8_24 a[], const float_s32_t b[], const unsigned length)

Compute the base 2 logarithm of a vector of float_s32_t.

This function computes the base 2 logarithm of a vector \(\bar b\) of float_s32_t values. The result is written to output \(\bar a\), a vector of Q8.24 values.

The resulting \(a_k\) for \(b_k \le 0\) is undefined.

Operation Performed

\[\begin{split}\begin{aligned} & a_k \leftarrow log_2\left(b_k\right) \\ & \qquad\text{for }k \in {0..\mathtt{length}-1} \end{aligned}\end{split}\]

Parameters:
  • a[out] Output Q8.24 vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `b` or `a` is not double word-aligned (See Note: Vector Alignment)

void vect_float_s32_log10(q8_24 a[], const float_s32_t b[], const unsigned length)

Compute the base 10 logarithm of a vector of float_s32_t.

This function computes the base 10 logarithm of a vector \(\bar b\) of float_s32_t values. The result is written to output \(\bar a\), a vector of Q8.24 values.

The resulting \(a_k\) for \(b_k \le 0\) is undefined.

Operation Performed

\[\begin{split}\begin{aligned} & a_k \leftarrow log_{10}\left(b_k\right) \\ & \qquad\text{for }k \in {0..\mathtt{length}-1} \end{aligned}\end{split}\]

Parameters:
  • a[out] Output Q8.24 vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `b` or `a` is not double word-aligned (See Note: Vector Alignment)

void vect_s32_log_base(q8_24 a[], const int32_t b[], const exponent_t b_exp, const q2_30 inv_ln_base_q30, const unsigned length)

Compute the logarithm (in the specified base) of a block floating-point vector.

This function computes the logarithm of the block floating-point vector \(\bar{b}\cdot 2^{b\_exp}\). The base of the computed logarithm is given by parameter inv_ln_base_q30. The result is written to output \(\bar a\), a vector of Q8.24 values.

If the desired base is \(D\), then inv_ln_base_q30, represented here by \(R\), should be \(\mathtt{Q30}\left(\frac{1}{ln\left(D\right)}\right)\). That is: the inverse of the natural logarithm of the desired base, expressed as a Q2.30 value. Typically the desired base is known at compile time, so this value will usually be a precomputed constant.

The resulting \(a_k\) for \(b_k \le 0\) is undefined.

Operation Performed

\[\begin{split}\begin{aligned} & a_k \leftarrow log_{D}\left(b_k\cdot 2^{b\_exp}\right) \\ & \qquad\text{for }k \in {0..\mathtt{length}-1} \end{aligned}\end{split}\]

Parameters:
  • a[out] Output Q8.24 vector \(\bar a\)

  • b[in] Input mantissa vector \(\bar b\)

  • b_exp[in] Exponent associated with \(\bar b\)

  • inv_ln_base_q30[in] Coefficient \(R\) converting from natural log to desired base \(D\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `b` or `a` is not double word-aligned (See Note: Vector Alignment)

void vect_s32_log(q8_24 a[], const int32_t b[], const exponent_t b_exp, const unsigned length)

Compute the natural logarithm of a block floating-point vector.

This function computes the natural logarithm of the block floating-point vector \(\bar{b}\cdot 2^{b\_exp}\). The result is written to output \(\bar a\), a vector of Q8.24 values.

The resulting \(a_k\) for \(b_k \le 0\) is undefined.

Operation Performed

\[\begin{split}\begin{aligned} & a_k \leftarrow ln\left(b_k\cdot 2^{b\_exp}\right) \\ & \qquad\text{for }k \in {0..\mathtt{length}-1} \end{aligned}\end{split}\]

Parameters:
  • a[out] Output Q8.24 vector \(\bar a\)

  • b[in] Input mantissa vector \(\bar b\)

  • b_exp[in] Exponent associated with \(\bar b\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `b` or `a` is not double word-aligned (See Note: Vector Alignment)

void vect_s32_log2(q8_24 a[], const int32_t b[], const exponent_t b_exp, const unsigned length)

Compute the base 2 logarithm of a block floating-point vector.

This function computes the base 2 logarithm of the block floating-point vector \(\bar{b}\cdot 2^{b\_exp}\). The result is written to output \(\bar a\), a vector of Q8.24 values.

The resulting \(a_k\) for \(b_k \le 0\) is undefined.

Operation Performed

\[\begin{split}\begin{aligned} & a_k \leftarrow log_2\left(b_k\cdot 2^{b\_exp}\right) \\ & \qquad\text{for }k \in {0..\mathtt{length}-1} \end{aligned}\end{split}\]

Parameters:
  • a[out] Output Q8.24 vector \(\bar a\)

  • b[in] Input mantissa vector \(\bar b\)

  • b_exp[in] Exponent associated with \(\bar b\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `b` or `a` is not double word-aligned (See Note: Vector Alignment)

void vect_s32_log10(q8_24 a[], const int32_t b[], const exponent_t b_exp, const unsigned length)

Compute the base 10 logarithm of a block floating-point vector.

This function computes the base 10 logarithm of the block floating-point vector \(\bar{b}\cdot 2^{b\_exp}\). The result is written to output \(\bar a\), a vector of Q8.24 values.

The resulting \(a_k\) for \(b_k \le 0\) is undefined.

Operation Performed

\[\begin{split}\begin{aligned} & a_k \leftarrow log_{10}\left(b_k\cdot 2^{b\_exp}\right) \\ & \qquad\text{for }k \in {0..\mathtt{length}-1} \end{aligned}\end{split}\]

Parameters:
  • a[out] Output Q8.24 vector \(\bar a\)

  • b[in] Input mantissa vector \(\bar b\)

  • b_exp[in] Exponent associated with \(\bar b\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `b` or `a` is not double word-aligned (See Note: Vector Alignment)

void vect_q30_exp_small(q2_30 a[], const q2_30 b[], const unsigned length)

Compute \(e^x\) for Q2.30 value near \(0\).

This function computes \(e^{b_k \cdot 2^{-30}}\) for each \(b_k\) in input vector \(\bar b\). The results are placed in output vector \(\bar a\) as Q2.30 values.

This function is meant to compute \(e^x\) for values of \(x\) in the interval \( \left[-0.5, 0.5\right] \). The error grows quickly outside of this range.

Operation Performed

\[\begin{split}\begin{aligned} & a_k \leftarrow e^{b_k \cdot 2^{-30}} \cdot 2^{30} \\ & \qquad\text{for }k \in {0..(\mathtt{length}-1)} \end{aligned}\end{split}\]

Parameters:
  • a[out] Output vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)
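
Preparing inputs for vect_q30_exp_small() involves encoding real values as Q2.30 and checking that they fall in the recommended interval. The helpers below are a minimal sketch with hypothetical names (to_q30, from_q30, in_exp_small_range). They use only multiplication and division by \(2^{30}\), and to_q30 deliberately accepts only \([-1, 1]\), which comfortably covers the \([-0.5, 0.5]\) input interval.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define Q30_ONE 1073741824.0   /* 2^30 */

/* Encode a real value in [-1, 1] as Q2.30, rounding to nearest. */
int32_t to_q30(double x)
{
    assert(x >= -1.0 && x <= 1.0);
    const double scaled = x * Q30_ONE;
    return (int32_t)(scaled + (scaled >= 0 ? 0.5 : -0.5));
}

/* Decode a Q2.30 value to a double. */
double from_q30(int32_t q)
{
    return (double)q / Q30_ONE;
}

/* True if q lies in [-0.5, 0.5], where the error is stated to be small. */
bool in_exp_small_range(int32_t q)
{
    const int32_t half = INT32_C(1) << 29;   /* 0.5 in Q2.30 */
    return q >= -half && q <= half;
}
```

In this encoding an input of 0 corresponds to \(e^0 = 1.0\), whose Q2.30 representation is \(2^{30}\). Outputs over the recommended interval range from about 0.61 to 1.65, all of which fit in Q2.30.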

void vect_s32_to_vect_s16(int16_t a[], const int32_t b[], const unsigned length, const right_shift_t b_shr)

Convert a 32-bit vector to a 16-bit vector.

This function converts a 32-bit mantissa vector \(\bar b\) into a 16-bit mantissa vector \(\bar a\). Conceptually, the output BFP vector \(\bar{a}\cdot 2^{a\_exp}\) represents the same values as the input BFP vector \(\bar{b}\cdot 2^{b\_exp}\), only with a reduced bit-depth.

In most cases \(b\_shr\) should be \(16 - b\_hr\), where \(b\_hr\) is the headroom of the 32-bit input mantissa vector \(\bar b\).

The output exponent \(a\_exp\) will be given by

\( a\_exp = b\_exp + b\_shr \)

Parameter Details

a[] represents the 16-bit output mantissa vector \(\bar a\).

b[] represents the 32-bit input mantissa vector \(\bar b\).

a[] and b[] must each begin at a word-aligned address.

length is the number of elements in each of the vectors.

b_shr is the signed arithmetic right-shift applied to elements of \(\bar b\).

Operation Performed

\[\begin{split}\begin{aligned} & a_k \leftarrow sat_{16}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) are the 32-bit mantissas of a BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the resulting vector \(\bar a\) are the 16-bit mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + b\_shr\).

Parameters:
  • a[out] Output vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

  • b_shr[in] Right-shift applied to \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `a` or `b` is not word-aligned (See Note: Vector Alignment)
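
The guidance that \(b\_shr\) should usually be \(16 - b\_hr\) can be tried out with the portable model below. It is a sketch only. The names are hypothetical, headroom is taken to mean the number of bits a value can be left-shifted without overflow, the symmetric 16-bit saturation bound is an assumption, and only shifts in the range 0 to 31 are handled.

```c
#include <assert.h>
#include <stdint.h>

/* Headroom of one value: how far it can be left-shifted without overflow. */
static unsigned hr_s32(int32_t x)
{
    unsigned hr = 0;
    while (hr < 31) {
        const int64_t s = (int64_t)x * (INT64_C(1) << (hr + 1));
        if (s > INT32_MAX || s < INT32_MIN) break;
        hr++;
    }
    return hr;
}

/* Headroom of a vector: the minimum over its elements. */
unsigned ref_headroom(const int32_t b[], unsigned length)
{
    unsigned hr = 31;
    for (unsigned k = 0; k < length; k++) {
        const unsigned h = hr_s32(b[k]);
        if (h < hr) hr = h;
    }
    return hr;
}

/* a_k = sat16(floor(b_k * 2^-b_shr)); this sketch handles b_shr in 0..31. */
void ref_s32_to_s16(int16_t a[], const int32_t b[], unsigned length,
                    unsigned b_shr)
{
    assert(b_shr <= 31);
    const int64_t d = INT64_C(1) << b_shr;
    for (unsigned k = 0; k < length; k++) {
        int64_t q = b[k] / d;
        if ((b[k] % d) != 0 && b[k] < 0) q--;   /* round toward -infinity */
        if (q > INT16_MAX)  q = INT16_MAX;
        if (q < -INT16_MAX) q = -INT16_MAX;     /* symmetric bound: assumption */
        a[k] = (int16_t)q;
    }
}
```

For the input {0x00010000, -0x8000} the headroom is 14, so b_shr = 2 and the output is {0x4000, -0x2000}. If the input exponent were -20, the output exponent would be -18, and both vectors would represent the same values.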

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Vector API$$$32-bit IEEE 754 float API£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/vect/vect_f32.html#bit-ieee-754-float-api
group vect_f32_api

Functions

complex_float_t *fft_f32_forward(float x[], const unsigned fft_length)

Perform forward FFT on a vector of IEEE754 floats.

This function takes real input vector \(\bar x\) and performs a forward FFT on the signal in-place to get output vector \(\bar{X} = FFT\{\bar{x}\}\). This implementation is accelerated by converting the IEEE754 float vector into a block floating-point representation to compute the FFT. The resulting BFP spectrum is then converted back to IEEE754 single-precision floats. The operation is performed in-place on x[].

See bfp_fft_forward_mono() for the details of the FFT.

Whereas the input x[] is an array of fft_length float elements, the output (placed in x[]) is an array of fft_length/2 complex_float_t elements, so the input should be cast after calling this.

const unsigned FFT_N = 512;
float time_series[FFT_N] = { ... };
fft_f32_forward(time_series, FFT_N);
complex_float_t* freq_spectrum = (complex_float_t*) &time_series[0];
const unsigned FREQ_BINS = FFT_N/2;
// e.g.   freq_spectrum[FREQ_BINS-1].re

x[] must begin at a double-word-aligned address.

Operation Performed

\[\begin{aligned} & \bar{X} \leftarrow FFT\{\bar{x}\} \end{aligned}\]

Parameters:
  • x[inout] Input vector \(\bar x\)

  • fft_length[in] The length of \(\bar x\)

Throws ET_LOAD_STORE:

Raised if `x` is not double-word-aligned (See Note: Vector Alignment)

Returns:

Pointer to frequency-domain spectrum (i.e. ((complex_float_t*) &x[0]))

float *fft_f32_inverse(complex_float_t X[], const unsigned fft_length)

Perform inverse FFT on a vector of complex_float_t.

This function takes complex input vector \(\bar X\) and performs an inverse real FFT on the spectrum in-place to get output vector \(\bar{x} = IFFT\{\bar{X}\}\). This implementation is accelerated by converting the IEEE754 float vector into a block floating-point representation to compute the IFFT. The resulting BFP signal is then converted back to IEEE754 single-precision floats. The operation is performed in-place on X[].

See bfp_fft_inverse_mono() for the details of the IFFT.

Input X[] is an array of fft_length/2 complex_float_t elements. The output (placed in X[]) is an array of fft_length float elements.

const unsigned FFT_N = 512;
complex_float_t freq_spectrum[FFT_N/2] = { ... };
fft_f32_inverse(freq_spectrum, FFT_N);
float* time_series = (float*) &freq_spectrum[0];

X[] must begin at a double-word-aligned address.

Parameters:
  • X[inout] Input vector \(\bar X\)

  • fft_length[in] The FFT length. Twice the element count of \(\bar X\).

Throws ET_LOAD_STORE:

Raised if `X` is not double-word-aligned (See Note: Vector Alignment)

Returns:

Pointer to time-domain signal (i.e. ((float*) &X[0]))

exponent_t vect_f32_max_exponent(const float b[], const unsigned length)

Get the maximum (32-bit BFP) exponent from a vector of IEEE754 floats.

This function is used to determine the BFP exponent to use when converting a vector of IEEE754 single-precision floats into a 32-bit BFP vector.

The exponent returned, if used with vect_f32_to_vect_s32(), is the one which will result in no headroom in the BFP vector, that is, the minimum permissible exponent for the BFP vector. The minimum permissible exponent is derived from the maximum exponent found in the float elements themselves.

More specifically, the FSEXP instruction is used on each element to determine its exponent. The value returned is the maximum exponent given by the FSEXP instruction plus 30.

b[] must begin at a double-word-aligned address.

Note

If required, when converting to a 32-bit BFP vector, additional headroom can be included by adding the amount of required headroom to the exponent returned by this function.

Parameters:
  • b[in] Input vector of IEEE754 single-precision floats \(\bar b\)

  • length[in] Number of elements in \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `b` is not double-word-aligned (See Note: Vector Alignment)

Throws ET_ARITHMETIC:

Raised if any element of `b` is infinite or not-a-number.

Returns:

Exponent used for converting to 32-bit BFP vector.
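The idea of deriving a shared exponent from the largest element can be sketched in portable C. This is a hedged illustration only: `sketch_f32_max_exponent` is a hypothetical stand-in built on the standard `frexpf()`, it does not use or model the FSEXP instruction, and it assumes every element is finite rather than raising an exception.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical portable stand-in for illustration; not the library routine.
 * Returns an exponent a_exp for which every b[k] / 2^a_exp has magnitude
 * below 2^31, so it fits a signed 32-bit mantissa. Assumes finite inputs. */
int sketch_f32_max_exponent(const float b[], size_t length)
{
    int max_e = 0;
    int seen_nonzero = 0;
    for (size_t k = 0; k < length; k++) {
        int e;
        if (b[k] == 0.0f) continue;      /* zero places no constraint */
        (void) frexpf(b[k], &e);         /* |b[k]| = m * 2^e, 0.5 <= m < 1 */
        if (!seen_nonzero || e > max_e) max_e = e;
        seen_nonzero = 1;
    }
    /* |b[k]| < 2^max_e, so |b[k]| / 2^(max_e - 31) < 2^31. */
    return seen_nonzero ? max_e - 31 : 0;
}
```

Adding a positive amount to the returned exponent leaves that many extra bits of headroom in the converted mantissas, as the note above describes.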

void vect_f32_to_vect_s32(int32_t a[], const float b[], const unsigned length, const exponent_t a_exp)

Convert a vector of IEEE754 single-precision floats into a 32-bit BFP vector.

This function converts a vector of IEEE754 single-precision floats \(\bar b\) into the mantissa vector \(\bar a\) of a 32-bit BFP vector, given BFP vector exponent \(a\_exp\). Conceptually, the elements of output vector \(\bar{a} \cdot 2^{a\_exp}\) represent the same values as those of the input vector.

Because the output exponent \(a\_exp\) is shared by all elements of the output vector, even though the output vector has 32-bit mantissas, precision may be lost on some elements if the exponents of the input elements \(b_k\) span a wide range.

The function vect_f32_max_exponent() can be used to determine the value for \(a\_exp\) which minimizes headroom of the output vector.

Operation Performed

\[\begin{split}\begin{aligned} & a_k \leftarrow round(\frac{b_k}{2^{a\_exp}}) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Parameter Details

a[] represents the 32-bit output mantissa vector \(\bar a\).

b[] represents the IEEE754 float input vector \(\bar b\).

a[] and b[] must each begin at a double-word-aligned address.

b[] can be safely updated in-place.

length is the number of elements in each of the vectors.

a_exp is the exponent associated with the output vector \(\bar a\).

Parameters:
  • a[out] Output vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

  • a_exp[in] Exponent \(a\_exp\) of output vector \(\bar a\)

Throws ET_LOAD_STORE:

Raised if `a` or `b` is not double-word-aligned (See Note: Vector Alignment)

Throws ET_ARITHMETIC:

Raised if any element of `b` is infinite or not-a-number.
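The conversion can be sketched in portable C as follows. This is an illustration under stated assumptions rather than the library implementation: `sketch_f32_to_s32` is a hypothetical name, ties are rounded away from zero (the library's tie rule is not stated here), and the caller is assumed to supply an exponent large enough to avoid overflow.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference for a_k = round(b_k / 2^a_exp); not the library routine. */
void sketch_f32_to_s32(int32_t a[], const float b[], size_t length, int a_exp)
{
    for (size_t k = 0; k < length; k++) {
        /* b_k / 2^a_exp, computed in double so the scaling is exact. */
        double scaled = ldexp((double) b[k], -a_exp);
        /* Round to nearest, ties away from zero (an assumed tie rule). */
        double rounded = (scaled < 0.0) ? scaled - 0.5 : scaled + 0.5;
        /* The caller must pick a_exp large enough to avoid overflow. */
        assert(rounded > -2147483649.0 && rounded < 2147483648.0);
        a[k] = (int32_t) rounded;   /* truncation completes the rounding */
    }
}
```

With a_exp = -29, the inputs 1.0, -3.5 and 0.25 map to the mantissas 2^29, -3.5 * 2^29 and 2^27, so multiplying each mantissa by 2^-29 recovers the original values exactly.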

float vect_f32_dot(const float b[], const float c[], const unsigned length)

Compute the inner product of two IEEE754 float vectors.

This function takes two vectors of IEEE754 single-precision floats and computes their inner product (the sum of the elementwise products). The FMACC instruction is used, granting full precision in the addition.

The inner product \(a\) is returned.

Operation Performed

\[\begin{aligned} & a \leftarrow \sum_{k=0}^{length-1} ( b_k \cdot c_k ) \end{aligned}\]

Parameters:
  • b[in] Input vector \(\bar b\)

  • c[in] Input vector \(\bar c\)

  • length[in] Number of elements in vectors \(\bar b\) and \(\bar c\)

Returns:

The inner product
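For reference, the operation above corresponds to the following plain C loop. `sketch_f32_dot` is a hypothetical name, and because it uses separate multiply and add steps its rounding may differ slightly from a fused multiply-accumulate.

```c
#include <stddef.h>

/* Hypothetical scalar reference for the inner product; not the library routine. */
float sketch_f32_dot(const float b[], const float c[], size_t length)
{
    float acc = 0.0f;
    for (size_t k = 0; k < length; k++) {
        acc += b[k] * c[k];   /* accumulate the elementwise product */
    }
    return acc;
}
```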

void vect_f32_add(float a[], const float b[], const float c[], const unsigned length)

Adds together two IEEE754 float vectors.

This function takes two vectors of IEEE754 single-precision floats and computes the element-wise sum of the two vectors.

a[] is the output vector \(\bar a\) into which results are placed.

b[] and c[] are the input vectors \(\bar b\) and \(\bar c\) respectively.

a, b and c each must begin at a double-word-aligned address.

This operation can be performed safely in-place on b[] or c[].

Operation Performed

\[\begin{split}\begin{aligned} & a_k \gets b_k + c_k \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Parameters:
  • a[out] Output vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • c[in] Input vector \(\bar c\)

  • length[in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)

Throws ET_LOAD_STORE:

Raised if `a`, `b` or `c` is not double-word-aligned (See Note: Vector Alignment)

void vect_complex_f32_add(complex_float_t a[], const complex_float_t b[], const complex_float_t c[], const unsigned length)

Adds together two complex IEEE754 float vectors.

This function takes two vectors \(\bar b\) and \(\bar c\) of complex IEEE754 single-precision floats and computes the element-wise sum of the two vectors.

a[] is the output vector \(\bar a\) into which results are placed.

b[] and c[] are the complex input vectors \(\bar b\) and \(\bar c\) respectively.

a, b and c each must begin at a double-word-aligned address.

This operation can be performed safely in-place on b[] or c[].

Operation Performed

\[\begin{split}\begin{aligned} & a_k \gets b_k + c_k \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Parameters:
  • a[out] Output vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • c[in] Input vector \(\bar c\)

  • length[in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)

Throws ET_LOAD_STORE:

Raised if `a`, `b` or `c` is not double-word-aligned (See Note: Vector Alignment)

void vect_complex_f32_mul(complex_float_t a[], const complex_float_t b[], const complex_float_t c[], const unsigned length)

Multiplies together two complex IEEE754 float vectors.

This function takes two complex float vectors \(\bar b\) and \(\bar c\) as inputs. Each output element \(a_k\) is computed as \(b_k\) multiplied by \(c_k\) (using complex multiplication).

a[] is the output vector \(\bar a\) into which results are placed.

b[] and c[] are the complex input vectors \(\bar b\) and \(\bar c\) respectively.

a, b and c each must begin at a double-word-aligned address.

This operation can be performed safely in-place on b[] or c[].

Operation Performed

\[\begin{split}\begin{aligned} & a_k \gets b_k \cdot c_k \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Parameters:
  • a[out] Output vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • c[in] Input vector \(\bar c\)

  • length[in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)

Throws ET_LOAD_STORE:

Raised if `a`, `b` or `c` is not double-word-aligned (See Note: Vector Alignment)

void vect_complex_f32_conj_mul(complex_float_t a[], const complex_float_t b[], const complex_float_t c[], const unsigned length)

Conjugate multiplies together two complex IEEE754 float vectors.

This function takes two complex float vectors \(\bar b\) and \(\bar c\) as inputs. Each output element \(a_k\) is computed as \(b_k\) multiplied by the complex conjugate of \(c_k\) (using complex multiplication).

a[] is the output vector \(\bar a\) into which results are placed.

b[] and c[] are the complex input vectors \(\bar b\) and \(\bar c\) respectively.

a, b and c each must begin at a double-word-aligned address.

This operation can be performed safely in-place on b[] or c[].

Operation Performed

\[\begin{split}\begin{aligned} & a_k \gets b_k \cdot (c_k^*) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Parameters:
  • a[out] Output vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • c[in] Input vector \(\bar c\)

  • length[in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)

Throws ET_LOAD_STORE:

Raised if `a`, `b` or `c` is not double-word-aligned (See Note: Vector Alignment)
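A scalar reference for the conjugate multiplication can be sketched as below. The type `cf32_t` and the function `sketch_complex_f32_conj_mul` are hypothetical stand-ins introduced for illustration. Both output parts are computed into temporaries before being stored, which is what makes in-place use on b[] or c[] safe in this sketch.

```c
#include <stddef.h>

typedef struct { float re; float im; } cf32_t;  /* hypothetical stand-in type */

/* Hypothetical reference for a_k = b_k * conj(c_k); not the library routine. */
void sketch_complex_f32_conj_mul(cf32_t a[], const cf32_t b[],
                                 const cf32_t c[], size_t length)
{
    for (size_t k = 0; k < length; k++) {
        /* (br + j*bi) * (cr - j*ci) */
        const float re = b[k].re * c[k].re + b[k].im * c[k].im;
        const float im = b[k].im * c[k].re - b[k].re * c[k].im;
        a[k].re = re;
        a[k].im = im;
    }
}
```

For example, (1 + 2j) multiplied by the conjugate of (3 + 4j) gives 11 + 2j.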

void vect_complex_f32_macc(complex_float_t a[], const complex_float_t b[], const complex_float_t c[], const unsigned length)

Adds the product of two complex IEEE754 float vectors to a third float vector.

This function takes three complex float vectors \(\bar a\), \(\bar b\) and \(\bar c\) as inputs. Each output element \(a_k\) is computed as input \(a_k\) plus \(b_k\) multiplied by \(c_k\).

a[] is accumulator vector \(\bar a\), serving as both input and output.

b[] and c[] are the complex input vectors \(\bar b\) and \(\bar c\) respectively.

a, b and c each must begin at a double-word-aligned address.

Operation Performed

\[\begin{split}\begin{aligned} & a_k \gets a_k + b_k \cdot c_k \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Parameters:
  • a[inout] Input/Output accumulator vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • c[in] Input vector \(\bar c\)

  • length[in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)

Throws ET_LOAD_STORE:

Raised if `a`, `b` or `c` is not double-word-aligned (See Note: Vector Alignment)
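The multiply-accumulate can likewise be written as a short scalar reference. As before, `cf32_t` and `sketch_complex_f32_macc` are hypothetical names used only for illustration.

```c
#include <stddef.h>

typedef struct { float re; float im; } cf32_t;  /* hypothetical stand-in type */

/* Hypothetical reference for a_k = a_k + b_k * c_k; not the library routine. */
void sketch_complex_f32_macc(cf32_t a[], const cf32_t b[],
                             const cf32_t c[], size_t length)
{
    for (size_t k = 0; k < length; k++) {
        /* (br + j*bi) * (cr + j*ci) */
        const float re = b[k].re * c[k].re - b[k].im * c[k].im;
        const float im = b[k].re * c[k].im + b[k].im * c[k].re;
        a[k].re += re;
        a[k].im += im;
    }
}
```

Starting from an accumulator of 1 + 1j, adding (1 + 2j)(3 + 4j) = -5 + 10j leaves -4 + 11j.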

void vect_complex_f32_conj_macc(complex_float_t a[], const complex_float_t b[], const complex_float_t c[], const unsigned length)

Adds the product of one complex IEEE754 float vector and the complex conjugate of another to a third float vector.

This function takes three complex float vectors \(\bar a\), \(\bar b\) and \(\bar c\) as inputs. Each output element \(a_k\) is computed as input \(a_k\) plus \(b_k\) multiplied by the complex conjugate of \(c_k\).

a[] is accumulator vector \(\bar a\), serving as both input and output.

b[] and c[] are the complex input vectors \(\bar b\) and \(\bar c\) respectively.

a, b and c each must begin at a double-word-aligned address.

Operation Performed

\[\begin{split}\begin{aligned} & a_k \gets a_k + b_k \cdot (c_k^*) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Parameters:
  • a[inout] Input/Output accumulator vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • c[in] Input vector \(\bar c\)

  • length[in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)

Throws ET_LOAD_STORE:

Raised if `a`, `b` or `c` is not double-word-aligned (See Note: Vector Alignment)

void vect_s32_to_vect_f32(float a[], const int32_t b[], const unsigned length, const exponent_t b_exp)

Convert a 32-bit BFP vector into a vector of IEEE754 single-precision floats.

This function converts a 32-bit mantissa vector and exponent \(\bar b \cdot 2^{b\_exp}\) into a vector of 32-bit IEEE754 single-precision floating-point elements \(\bar a\). Conceptually, the elements of output vector \(\bar a\) represent the same values as those of the input vector.

Because IEEE754 single-precision floats hold fewer mantissa bits, this operation may result in a loss of precision for some elements.

Operation Performed

\[\begin{split}\begin{aligned} & a_k \leftarrow b_k \cdot 2^{b\_exp} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Parameter Details

a[] represents the output IEEE754 float vector \(\bar a\).

b[] represents the 32-bit input mantissa vector \(\bar b\).

a[] and b[] must each begin at a double-word-aligned address.

b[] can be safely updated in-place.

length is the number of elements in each of the vectors.

b_exp is the exponent associated with the input vector \(\bar b\).

Parameters:
  • a[out] Output vector \(\bar a\)

  • b[in] Input vector \(\bar b\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

  • b_exp[in] Exponent \(b\_exp\) of input vector \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `a` or `b` is not double-word-aligned (See Note: Vector Alignment)
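The reverse conversion can be sketched with the standard `ldexpf()`. `sketch_s32_to_f32` is a hypothetical name; the cast to float is where a 32-bit mantissa is rounded to the 24 significant bits a single-precision float can hold.

```c
#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference for a_k = b_k * 2^b_exp; not the library routine. */
void sketch_s32_to_f32(float a[], const int32_t b[], size_t length, int b_exp)
{
    for (size_t k = 0; k < length; k++) {
        /* The cast may round: a float carries only 24 significant bits. */
        a[k] = ldexpf((float) b[k], b_exp);
    }
}
```

A mantissa such as 2^24 + 1 cannot be represented exactly in single precision, so it comes back as a neighbouring representable value; this is the precision loss mentioned above.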

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Vector API$$$Complex 16-bit vector API£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/vect/vect_complex_s16.html#complex-16-bit-vector-api
group vect_complex_s16_api

Functions

headroom_t vect_complex_s16_add(int16_t a_real[], int16_t a_imag[], const int16_t b_real[], const int16_t b_imag[], const int16_t c_real[], const int16_t c_imag[], const unsigned length, const right_shift_t b_shr, const right_shift_t c_shr)

Add one complex 16-bit vector to another.

a_real[] and a_imag[] together represent the complex 16-bit output mantissa vector \(\bar a\). Each \(Re\{a_k\}\) is a_real[k], and each \(Im\{a_k\}\) is a_imag[k].

b_real[] and b_imag[] together represent the complex 16-bit input mantissa vector \(\bar b\). Each \(Re\{b_k\}\) is b_real[k], and each \(Im\{b_k\}\) is b_imag[k].

c_real[] and c_imag[] together represent the complex 16-bit input mantissa vector \(\bar c\). Each \(Re\{c_k\}\) is c_real[k], and each \(Im\{c_k\}\) is c_imag[k].

Each of the input vectors must begin at a word-aligned address. This operation can be performed safely in-place on inputs b_real[], b_imag[], c_real[] and c_imag[].

length is the number of elements in each of the vectors.

b_shr and c_shr are the signed arithmetic right-shifts applied to each element of \(\bar b\) and \(\bar c\) respectively.

Operation Performed

\[\begin{split}\begin{aligned} & b_k' \leftarrow sat_{16}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & c_k' \leftarrow sat_{16}(\lfloor c_k \cdot 2^{-c\_shr} \rfloor) \\ & Re\{a_k\} \leftarrow Re\{b_k'\} + Re\{c_k'\} \\ & Im\{a_k\} \leftarrow Im\{b_k'\} + Im\{c_k'\} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) and \(\bar c\) are the complex 16-bit mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the complex 16-bit mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\).

In this case, \(b\_shr\) and \(c\_shr\) must be chosen so that \(a\_exp = b\_exp + b\_shr = c\_exp + c\_shr\). Adding or subtracting mantissas only makes sense if they are associated with the same exponent.

The function vect_complex_s16_add_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).
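One way to pick shifts that satisfy this constraint is sketched below. This is an assumption-laden illustration, not the algorithm used by vect_complex_s16_add_prepare(): the helper `sketch_add_prepare` is hypothetical, and it simply aligns both inputs to a common exponent while reserving one bit for the carry of the addition.

```c
/* Hypothetical helper; not the library's vect_complex_s16_add_prepare(). */
void sketch_add_prepare(int *a_exp, int *b_shr, int *c_shr,
                        int b_exp, int c_exp,
                        unsigned b_hr, unsigned c_hr)
{
    /* Smallest exponent each input could take once its headroom is removed. */
    const int b_min_exp = b_exp - (int) b_hr;
    const int c_min_exp = c_exp - (int) c_hr;
    /* One extra bit leaves room for the carry produced by the addition. */
    *a_exp = ((b_min_exp > c_min_exp) ? b_min_exp : c_min_exp) + 1;
    /* Both shifts follow from a_exp = b_exp + b_shr = c_exp + c_shr. */
    *b_shr = *a_exp - b_exp;
    *c_shr = *a_exp - c_exp;
}
```

For b_exp = -10 with 2 bits of headroom and c_exp = -12 with none, the sketch yields a_exp = -11, b_shr = -1 and c_shr = 1, and both sides of the constraint evaluate to -11.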

Parameters:
  • a_real[out] Real part of complex output vector \(\bar a\)

  • a_imag[out] Imaginary part of complex output vector \(\bar a\)

  • b_real[in] Real part of complex input vector \(\bar b\)

  • b_imag[in] Imaginary part of complex input vector \(\bar b\)

  • c_real[in] Real part of complex input vector \(\bar c\)

  • c_imag[in] Imaginary part of complex input vector \(\bar c\)

  • length[in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)

  • b_shr[in] Right-shift applied to \(\bar b\)

  • c_shr[in] Right-shift applied to \(\bar c\)

Throws ET_LOAD_STORE:

Raised if `a_real`, `a_imag`, `b_real`, `b_imag`, `c_real` or `c_imag` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of output vector \(\bar a\).
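The shift-then-add operation can be sketched in portable C. The names `sat16`, `shr_sat16` and `sketch_complex_s16_add` are hypothetical, the sketch does not compute the returned headroom, and it makes two assumptions that the text above does not settle: saturation uses the full int16_t range, and the final sum is itself saturated.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp to the int16_t range (assumed saturation bounds). */
static int16_t sat16(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t) x;
}

/* sat16(floor(x * 2^-shr)); a negative shr is a left shift. */
static int16_t shr_sat16(int16_t x, int shr)
{
    int32_t v = x;
    if (shr > 16) shr = 16;     /* larger shifts give the same result */
    if (shr < -16) shr = -16;
    if (shr >= 0) {
        /* Floor division without relying on signed >> of negative values. */
        v = (v >= 0) ? (v >> shr) : -(((-v) + (1 << shr) - 1) >> shr);
    } else {
        v = v * (1 << -shr);
    }
    return sat16(v);
}

/* Hypothetical reference for the operation above; not the library routine. */
void sketch_complex_s16_add(int16_t a_real[], int16_t a_imag[],
                            const int16_t b_real[], const int16_t b_imag[],
                            const int16_t c_real[], const int16_t c_imag[],
                            size_t length, int b_shr, int c_shr)
{
    for (size_t k = 0; k < length; k++) {
        const int32_t re = shr_sat16(b_real[k], b_shr) + shr_sat16(c_real[k], c_shr);
        const int32_t im = shr_sat16(b_imag[k], b_shr) + shr_sat16(c_imag[k], c_shr);
        a_real[k] = sat16(re);
        a_imag[k] = sat16(im);
    }
}
```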

headroom_t vect_complex_s16_add_scalar(int16_t a_real[], int16_t a_imag[], const int16_t b_real[], const int16_t b_imag[], const complex_s16_t c, const unsigned length, const right_shift_t b_shr)

Add a scalar to a complex 16-bit vector.

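a_real[] and a_imag[] together represent the complex 16-bit output mantissa vector \(\bar a\), and b_real[] and b_imag[] together represent the complex 16-bit input mantissa vector \(\bar b\). Each must begin at a word-aligned address. This operation can be performed safely in-place on b_real[] and b_imag[].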

c is the complex scalar \(c\) to be added to each element of \(\bar b\).

length is the number of elements in each of the vectors.

b_shr is the signed arithmetic right-shift applied to each element of \(\bar b\).

Operation Performed

\[\begin{split}\begin{aligned} & b_k' \leftarrow sat_{16}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & Re\{a_k\} \leftarrow Re\{b_k'\} + Re\{c\} \\ & Im\{a_k\} \leftarrow Im\{b_k'\} + Im\{c\} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If elements of \(\bar b\) are the complex mantissas of BFP vector \( \bar{b} \cdot 2^{b\_exp}\), and \(c\) is the mantissa of floating-point value \(c \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\).

In this case, \(b\_shr\) and \(c\_shr\) must be chosen so that \(a\_exp = b\_exp + b\_shr = c\_exp + c\_shr\). Adding or subtracting mantissas only makes sense if they are associated with the same exponent.

The function vect_complex_s16_add_scalar_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).

Note that \(c\_shr\) is an output of vect_complex_s16_add_scalar_prepare(), but is not a parameter to this function. The \(c\_shr\) produced by vect_complex_s16_add_scalar_prepare() is to be applied by the user, and the result passed as input c.

Parameters:
  • a_real[out] Real part of complex output vector \(\bar a\)

  • a_imag[out] Imaginary part of complex output vector \(\bar a\)

  • b_real[in] Real part of complex input vector \(\bar b\)

  • b_imag[in] Imaginary part of complex input vector \(\bar b\)

  • c[in] Complex input scalar \(c\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

  • b_shr[in] Right-shift applied to \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `a_real`, `a_imag`, `b_real` or `b_imag` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of output vector \(\bar a\).

headroom_t vect_complex_s16_conj_mul(int16_t a_real[], int16_t a_imag[], const int16_t b_real[], const int16_t b_imag[], const int16_t c_real[], const int16_t c_imag[], const unsigned length, const right_shift_t a_shr)

Multiply one complex 16-bit vector element-wise by the complex conjugate of another.

a_real[] and a_imag[] together represent the complex 16-bit output mantissa vector \(\bar a\). Each \(Re\{a_k\}\) is a_real[k], and each \(Im\{a_k\}\) is a_imag[k].

b_real[] and b_imag[] together represent the complex 16-bit input mantissa vector \(\bar b\). Each \(Re\{b_k\}\) is b_real[k], and each \(Im\{b_k\}\) is b_imag[k].

c_real[] and c_imag[] together represent the complex 16-bit input mantissa vector \(\bar c\). Each \(Re\{c_k\}\) is c_real[k], and each \(Im\{c_k\}\) is c_imag[k].

Each of the input vectors must begin at a word-aligned address. This operation can be performed safely in-place on inputs b_real[], b_imag[], c_real[] and c_imag[].

length is the number of elements in each of the vectors.

a_shr is the unsigned arithmetic right-shift applied to the 32-bit accumulators holding the penultimate results.

Operation Performed

\[\begin{split}\begin{aligned} & v_k \leftarrow Re\{b_k\} \cdot Re\{c_k\} + Im\{b_k\} \cdot Im\{c_k\} \\ & s_k \leftarrow Im\{b_k\} \cdot Re\{c_k\} - Re\{b_k\} \cdot Im\{c_k\} \\ & Re\{a_k\} \leftarrow round( sat_{16}( v_k \cdot 2^{-a\_shr} ) ) \\ & Im\{a_k\} \leftarrow round( sat_{16}( s_k \cdot 2^{-a\_shr} ) ) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) and \(\bar c\) are the complex 16-bit mantissas of BFP vectors \(\bar{b} \cdot 2^{b\_exp}\) and \(\bar{c} \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the complex 16-bit mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + c\_exp + a\_shr\).

The function vect_complex_s16_mul_prepare() can be used to obtain values for \(a\_exp\) and \(a\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).

Parameters:
  • a_real[out] Real part of complex output vector \(\bar a\)

  • a_imag[out] Imaginary part of complex output vector \(\bar a\)

  • b_real[in] Real part of complex input vector \(\bar b\)

  • b_imag[in] Imaginary part of complex input vector \(\bar b\)

  • c_real[in] Real part of complex input vector \(\bar c\)

  • c_imag[in] Imaginary part of complex input vector \(\bar c\)

  • length[in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)

  • a_shr[in] Right-shift applied to 32-bit intermediate results.

Throws ET_LOAD_STORE:

Raised if `a_real`, `a_imag`, `b_real`, `b_imag`, `c_real` or `c_imag` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of the output vector \(\bar a\)
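A scalar reference for the conjugate multiplication with its output shift is sketched below. The names are hypothetical and the headroom return value is not modelled. The sketch assumes round-to-nearest is applied to the shifted accumulator before saturating to the full int16_t range; the exact ordering and saturation bounds of the hardware may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp to the int16_t range (assumed saturation bounds). */
static int16_t sat16(int64_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t) x;
}

/* round(x * 2^-shr): add half an LSB, then take the floor. */
static int64_t round_shr(int64_t x, unsigned shr)
{
    if (shr == 0) return x;
    if (shr > 62) shr = 62;                       /* keep the shifts defined */
    x += (int64_t) 1 << (shr - 1);
    const int64_t mask = ((int64_t) 1 << shr) - 1;
    return (x >= 0) ? (x >> shr) : -((-x + mask) >> shr);
}

/* Hypothetical reference for a_k = b_k * conj(c_k); not the library routine. */
void sketch_complex_s16_conj_mul(int16_t a_real[], int16_t a_imag[],
                                 const int16_t b_real[], const int16_t b_imag[],
                                 const int16_t c_real[], const int16_t c_imag[],
                                 size_t length, unsigned a_shr)
{
    for (size_t k = 0; k < length; k++) {
        /* Wide accumulators so the products cannot overflow in the sketch. */
        const int64_t v = (int64_t) b_real[k] * c_real[k] + (int64_t) b_imag[k] * c_imag[k];
        const int64_t s = (int64_t) b_imag[k] * c_real[k] - (int64_t) b_real[k] * c_imag[k];
        a_real[k] = sat16(round_shr(v, a_shr));
        a_imag[k] = sat16(round_shr(s, a_shr));
    }
}
```

With b = 1000 + 2000j, c = 3000 + 4000j and a_shr = 10, the accumulators are 11,000,000 and 2,000,000, which shift and round to 10742 and 1953.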

headroom_t vect_complex_s16_headroom(const int16_t b_real[], const int16_t b_imag[], const unsigned length)

Calculate the headroom of a complex 16-bit array.

The headroom of an N-bit integer is the number of bits that the integer’s value may be left-shifted without any information being lost. Equivalently, it is one less than the number of leading sign bits.

The headroom of a complex_s16_t struct is the minimum of the headroom of each of its 16-bit fields, re and im.

The headroom of a complex_s16_t array is the minimum of the headroom of each of its complex_s16_t elements.

This function efficiently traverses the elements of \(\bar b\) to determine its headroom.

b_real[] and b_imag[] together represent the complex 16-bit input mantissa vector \(\bar b\).

length is the number of elements in b_real[] and b_imag[].

Operation Performed

\[\begin{aligned} min\!\{ HR_{16}\left(b_0\right), HR_{16}\left(b_1\right), ..., HR_{16}\left(b_{length-1}\right) \} \end{aligned}\]

Parameters:
  • b_real[in] Real part of complex input vector \(\bar b\)

  • b_imag[in] Imaginary part of complex input vector \(\bar b\)

  • length[in] Number of elements in \(\bar b\)

Returns:

Headroom of vector \(\bar b\)
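The headroom definition above can be made concrete with a small scalar sketch. `sketch_hr16` and `sketch_complex_s16_headroom` are hypothetical names; the loop counts how many times a value can be doubled while still fitting in 16 bits, which equals one less than the number of leading sign bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Headroom of one 16-bit value: left shifts possible without losing information. */
unsigned sketch_hr16(int16_t x)
{
    int32_t v = x;
    unsigned hr = 0;
    /* Keep doubling while the result still fits in 16 bits (at most 15 times). */
    while (hr < 15 && v * 2 >= INT16_MIN && v * 2 <= INT16_MAX) {
        v *= 2;
        hr++;
    }
    return hr;
}

/* Hypothetical reference: minimum headroom over all real and imaginary parts. */
unsigned sketch_complex_s16_headroom(const int16_t b_real[],
                                     const int16_t b_imag[], size_t length)
{
    unsigned hr = 15;   /* headroom of an all-zero vector under this definition */
    for (size_t k = 0; k < length; k++) {
        const unsigned hr_re = sketch_hr16(b_real[k]);
        const unsigned hr_im = sketch_hr16(b_imag[k]);
        if (hr_re < hr) hr = hr_re;
        if (hr_im < hr) hr = hr_im;
    }
    return hr;
}
```

For the elements 256 + 3j and -1 + 0j the individual headrooms are 6, 13, 15 and 15, so the vector headroom is 6.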

headroom_t vect_complex_s16_mag(int16_t a[], const int16_t b_real[], const int16_t b_imag[], const unsigned length, const right_shift_t b_shr, const int16_t *rot_table, const unsigned table_rows)

Compute the magnitude of each element of a complex 16-bit vector.

a[] represents the real 16-bit output mantissa vector \(\bar a\).

b_real[] and b_imag[] together represent the complex 16-bit input mantissa vector \(\bar b\). Each \(Re\{b_k\}\) is b_real[k], and each \(Im\{b_k\}\) is b_imag[k].

Each of the input vectors must begin at a word-aligned address. This operation can be performed safely in-place on inputs b_real[] or b_imag[].

length is the number of elements in each of the vectors.

b_shr is the signed arithmetic right-shift applied to elements of \(\bar b\).

rot_table must point to a pre-computed table of complex vectors used in calculating the magnitudes. table_rows is the number of rows in the table. This library is distributed with a default version of the required rotation table. The following symbols can be used to refer to it in user code:

const extern unsigned rot_table16_rows;
const extern complex_s16_t rot_table16[30][4];

Faster computation (with reduced precision) can be achieved by generating a smaller version of the table. A Python script is provided to generate this table.

Operation Performed

\[\begin{split}\begin{aligned} & v_k \leftarrow b_k \cdot 2^{-b\_shr} \\ & a_k \leftarrow \sqrt { {\left( Re\{v_k\} \right)}^2 + {\left( Im\{v_k\} \right)}^2 } \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) are the complex 16-bit mantissas of a BFP vector \( \bar{b} \cdot 2^{b\_exp} \), then the resulting vector \(\bar a\) are the real 16-bit mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + b\_shr\).

The function vect_complex_s16_mag_prepare() can be used to obtain values for \(a\_exp\) and \(b\_shr\) based on the input exponent \(b\_exp\) and headroom \(b\_hr\).

Parameters:
  • a[out] Real output vector \(\bar a\)

  • b_real[in] Real part of complex input vector \(\bar b\)

  • b_imag[in] Imaginary part of complex input vector \(\bar b\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

  • b_shr[in] Right-shift applied to \(\bar b\)

  • rot_table[in] Pre-computed rotation table required for calculating magnitudes

  • table_rows[in] Number of rows in rot_table

Throws ET_LOAD_STORE:

Raised if `a`, `b_real` or `b_imag` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of the output vector \(\bar a\).

headroom_t vect_complex_s16_macc(int16_t acc_real[], int16_t acc_imag[], const int16_t b_real[], const int16_t b_imag[], const int16_t c_real[], const int16_t c_imag[], const unsigned length, const right_shift_t acc_shr, const right_shift_t bc_sat)

Multiply one complex 16-bit vector element-wise by another, and add the result to an accumulator.

acc_real[] and acc_imag[] together represent the complex 16-bit accumulator mantissa vector \(\bar a\). Each \(Re\{a_k\}\) is acc_real[k], and each \(Im\{a_k\}\) is acc_imag[k].

b_real[] and b_imag[] together represent the complex 16-bit input mantissa vector \(\bar b\). Each \(Re\{b_k\}\) is b_real[k], and each \(Im\{b_k\}\) is b_imag[k].

c_real[] and c_imag[] together represent the complex 16-bit input mantissa vector \(\bar c\). Each \(Re\{c_k\}\) is c_real[k], and each \(Im\{c_k\}\) is c_imag[k].

Each of the input vectors must begin at a word-aligned address.

length is the number of elements in each of the vectors.

acc_shr is the signed arithmetic right-shift applied to the accumulators \(a_k\).

bc_sat is the unsigned arithmetic right-shift applied to the product of \(b_k\) and \(c_k\) before being added to the accumulator.

Operation Performed

\[\begin{split}\begin{aligned} & v_k \leftarrow Re\{b_k\} \cdot Re\{c_k\} - Im\{b_k\} \cdot Im\{c_k\} \\ & s_k \leftarrow Im\{b_k\} \cdot Re\{c_k\} + Re\{b_k\} \cdot Im\{c_k\} \\ & \hat{a}_k \leftarrow sat_{16}( a_k \cdot 2^{-acc\_shr} ) \\ & Re\{a_k\} \leftarrow sat_{16}( Re\{\hat{a}_k\} + round( sat_{16}( v_k \cdot 2^{-bc\_sat} ) ) ) \\ & Im\{a_k\} \leftarrow sat_{16}( Im\{\hat{a}_k\} + round( sat_{16}( s_k \cdot 2^{-bc\_sat} ) ) ) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If inputs \(\bar b\) and \(\bar c\) are the mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), and input \(\bar a\) is the accumulator BFP vector \(\bar{a} \cdot 2^{a\_exp}\), then the output values of \(\bar a\) have the exponent \(a\_exp + acc\_shr\).

For accumulation to make sense mathematically, \(bc\_sat\) must be chosen such that \( a\_exp + acc\_shr = b\_exp + c\_exp + bc\_sat \).

The function vect_complex_s16_macc_prepare() can be used to obtain values for \(a\_exp\), \(acc\_shr\) and \(bc\_sat\) based on the input exponents \(a\_exp\), \(b\_exp\) and \(c\_exp\) and the input headrooms \(a\_hr\), \(b\_hr\) and \(c\_hr\).

Parameters:
  • acc_real[inout] Real part of complex accumulator \(\bar a\)

  • acc_imag[inout] Imaginary part of complex accumulator \(\bar a\)

  • b_real[in] Real part of complex input vector \(\bar b\)

  • b_imag[in] Imaginary part of complex input vector \(\bar b\)

  • c_real[in] Real part of complex input vector \(\bar c\)

  • c_imag[in] Imaginary part of complex input vector \(\bar c\)

  • length[in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)

  • acc_shr[in] Signed arithmetic right-shift applied to accumulator elements.

  • bc_sat[in] Unsigned arithmetic right-shift applied to the products of elements \(b_k\) and \(c_k\)

Throws ET_LOAD_STORE:

Raised if `acc_real`, `acc_imag`, `b_real`, `b_imag`, `c_real` or `c_imag` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of the output vector \(\bar a\)
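The multiply-accumulate steps can be followed in a scalar sketch. All names are hypothetical, the headroom return value is not modelled, and several details are assumptions rather than statements about the library: the accumulator shift uses a floor, the product shift rounds to nearest before saturating, and saturation uses the full int16_t range.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp to the int16_t range (assumed saturation bounds). */
static int16_t sat16(int64_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t) x;
}

/* floor(x * 2^-shr); a negative shr is a left shift. */
static int64_t floor_shr(int64_t x, int shr)
{
    if (shr > 62) shr = 62;      /* keep the shifts defined */
    if (shr < -16) shr = -16;
    if (shr <= 0) return x * ((int64_t) 1 << -shr);
    const int64_t mask = ((int64_t) 1 << shr) - 1;
    return (x >= 0) ? (x >> shr) : -((-x + mask) >> shr);
}

/* Round to nearest by adding half an LSB before the floor. */
static int64_t round_shr(int64_t x, unsigned shr)
{
    if (shr == 0) return x;
    if (shr > 62) shr = 62;
    return floor_shr(x + ((int64_t) 1 << (shr - 1)), (int) shr);
}

/* Hypothetical reference for a_k = a_k + b_k * c_k; not the library routine. */
void sketch_complex_s16_macc(int16_t acc_real[], int16_t acc_imag[],
                             const int16_t b_real[], const int16_t b_imag[],
                             const int16_t c_real[], const int16_t c_imag[],
                             size_t length, int acc_shr, unsigned bc_sat)
{
    for (size_t k = 0; k < length; k++) {
        const int64_t v = (int64_t) b_real[k] * c_real[k] - (int64_t) b_imag[k] * c_imag[k];
        const int64_t s = (int64_t) b_imag[k] * c_real[k] + (int64_t) b_real[k] * c_imag[k];
        /* Shift the accumulator first so both terms share one exponent. */
        const int16_t a_re = sat16(floor_shr(acc_real[k], acc_shr));
        const int16_t a_im = sat16(floor_shr(acc_imag[k], acc_shr));
        acc_real[k] = sat16((int64_t) a_re + sat16(round_shr(v, bc_sat)));
        acc_imag[k] = sat16((int64_t) a_im + sat16(round_shr(s, bc_sat)));
    }
}
```

With an accumulator of 100 + 50j, b = 1000 + 2000j, c = 3000 + 4000j, acc_shr = 1 and bc_sat = 10, the shifted accumulator is 50 + 25j and the shifted product is -4883 + 9766j, giving -4833 + 9791j.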

headroom_t vect_complex_s16_nmacc(int16_t acc_real[], int16_t acc_imag[], const int16_t b_real[], const int16_t b_imag[], const int16_t c_real[], const int16_t c_imag[], const unsigned length, const right_shift_t acc_shr, const right_shift_t bc_sat)

Multiply one complex 16-bit vector element-wise by another, and subtract the result from an accumulator.

acc_real[] and acc_imag[] together represent the complex 16-bit accumulator mantissa vector \(\bar a\). Each \(Re\{a_k\}\) is acc_real[k], and each \(Im\{a_k\}\) is acc_imag[k].

b_real[] and b_imag[] together represent the complex 16-bit input mantissa vector \(\bar b\). Each \(Re\{b_k\}\) is b_real[k], and each \(Im\{b_k\}\) is b_imag[k].

c_real[] and c_imag[] together represent the complex 16-bit input mantissa vector \(\bar c\). Each \(Re\{c_k\}\) is c_real[k], and each \(Im\{c_k\}\) is c_imag[k].

Each of the input vectors must begin at a word-aligned address.

length is the number of elements in each of the vectors.

acc_shr is the signed arithmetic right-shift applied to the accumulators \(a_k\).

bc_sat is the unsigned arithmetic right-shift applied to the product of \(b_k\) and \(c_k\) before being subtracted from the accumulator.

Operation Performed

\[\begin{split}\begin{aligned} & v_k \leftarrow Re\{b_k\} \cdot Re\{c_k\} - Im\{b_k\} \cdot Im\{c_k\} \\ & s_k \leftarrow Im\{b_k\} \cdot Re\{c_k\} + Re\{b_k\} \cdot Im\{c_k\} \\ & \hat{a}_k \leftarrow sat_{16}( a_k \cdot 2^{-acc\_shr} ) \\ & Re\{a_k\} \leftarrow sat_{16}( Re\{\hat{a}_k\} - round( sat_{16}( v_k \cdot 2^{-bc\_sat} ) ) ) \\ & Im\{a_k\} \leftarrow sat_{16}( Im\{\hat{a}_k\} - round( sat_{16}( s_k \cdot 2^{-bc\_sat} ) ) ) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If inputs \(\bar b\) and \(\bar c\) are the mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), and input \(\bar a\) is the accumulator BFP vector \(\bar{a} \cdot 2^{a\_exp}\), then the output values of \(\bar a\) have the exponent \(a\_exp + acc\_shr\).

For accumulation to make sense mathematically, \(bc\_sat\) must be chosen such that \( a\_exp + acc\_shr = b\_exp + c\_exp + bc\_sat \).

The function vect_complex_s16_nmacc_prepare() can be used to obtain values for \(a\_exp\), \(acc\_shr\) and \(bc\_sat\) based on the input exponents \(a\_exp\), \(b\_exp\) and \(c\_exp\) and the input headrooms \(a\_hr\), \(b\_hr\) and \(c\_hr\).

Parameters:
  • acc_real[inout] Real part of complex accumulator \(\bar a\)

  • acc_imag[inout] Imaginary part of complex accumulator \(\bar a\)

  • b_real[in] Real part of complex input vector \(\bar b\)

  • b_imag[in] Imaginary part of complex input vector \(\bar b\)

  • c_real[in] Real part of complex input vector \(\bar c\)

  • c_imag[in] Imaginary part of complex input vector \(\bar c\)

  • length[in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)

  • acc_shr[in] Signed arithmetic right-shift applied to accumulator elements.

  • bc_sat[in] Unsigned arithmetic right-shift applied to the products of elements \(b_k\) and \(c_k\)

Throws ET_LOAD_STORE:

Raised if `acc_real`, `acc_imag`, `b_real`, `b_imag`, `c_real` or `c_imag` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of the output vector \(\bar a\)

headroom_t vect_complex_s16_conj_macc(int16_t acc_real[], int16_t acc_imag[], const int16_t b_real[], const int16_t b_imag[], const int16_t c_real[], const int16_t c_imag[], const unsigned length, const right_shift_t acc_shr, const right_shift_t bc_sat)

Multiply one complex 16-bit vector element-wise by the complex conjugate of another, and add the result to an accumulator.

acc_real[] and acc_imag[] together represent the complex 16-bit accumulator mantissa vector \(\bar a\). Each \(Re\{a_k\}\) is acc_real[k], and each \(Im\{a_k\}\) is acc_imag[k].

b_real[] and b_imag[] together represent the complex 16-bit input mantissa vector \(\bar b\). Each \(Re\{b_k\}\) is b_real[k], and each \(Im\{b_k\}\) is b_imag[k].

c_real[] and c_imag[] together represent the complex 16-bit input mantissa vector \(\bar c\). Each \(Re\{c_k\}\) is c_real[k], and each \(Im\{c_k\}\) is c_imag[k].

Each of the input vectors must begin at a word-aligned address.

length is the number of elements in each of the vectors.

acc_shr is the signed arithmetic right-shift applied to the accumulators \(a_k\).

bc_sat is the unsigned arithmetic right-shift applied to the product of \(b_k\) and \(c_k^*\) before being added to the accumulator.

Operation Performed

\[\begin{split}\begin{aligned} & v_k \leftarrow Re\{b_k\} \cdot Re\{c_k\} + Im\{b_k\} \cdot Im\{c_k\} \\ & s_k \leftarrow Im\{b_k\} \cdot Re\{c_k\} - Re\{b_k\} \cdot Im\{c_k\} \\ & \hat{a}_k \leftarrow sat_{16}( a_k \cdot 2^{-acc\_shr} ) \\ & Re\{a_k\} \leftarrow sat_{16}( Re\{\hat{a}_k\} + round( sat_{16}( v_k \cdot 2^{-bc\_sat} ) ) ) \\ & Im\{a_k\} \leftarrow sat_{16}( Im\{\hat{a}_k\} + round( sat_{16}( s_k \cdot 2^{-bc\_sat} ) ) ) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If inputs \(\bar b\) and \(\bar c\) are the mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), and input \(\bar a\) is the accumulator BFP vector \(\bar{a} \cdot 2^{a\_exp}\), then the output values of \(\bar a\) have the exponent \(a\_exp + acc\_shr\).

For accumulation to make sense mathematically, \(bc\_sat\) must be chosen such that \( a\_exp + acc\_shr = b\_exp + c\_exp + bc\_sat \).

The function vect_complex_s16_macc_prepare() can be used to obtain values for \(a\_exp\), \(acc\_shr\) and \(bc\_sat\) based on the input exponents \(a\_exp\), \(b\_exp\) and \(c\_exp\) and the input headrooms \(a\_hr\), \(b\_hr\) and \(c\_hr\).

Parameters:
  • acc_real[inout] Real part of complex accumulator \(\bar a\)

  • acc_imag[inout] Imaginary part of complex accumulator \(\bar a\)

  • b_real[in] Real part of complex input vector \(\bar b\)

  • b_imag[in] Imaginary part of complex input vector \(\bar b\)

  • c_real[in] Real part of complex input vector \(\bar c\)

  • c_imag[in] Imaginary part of complex input vector \(\bar c\)

  • length[in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)

  • acc_shr[in] Signed arithmetic right-shift applied to accumulator elements.

  • bc_sat[in] Unsigned arithmetic right-shift applied to the products of elements \(b_k\) and \(c_k^*\)

Throws ET_LOAD_STORE:

Raised if `acc_real`, `acc_imag`, `b_real`, `b_imag`, `c_real` or `c_imag` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of the output vector \(\bar a\)
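The operation above can be traced with a portable scalar model. This is only a sketch, not the library's implementation: the helper names (`ref_sat16`, `ref_shr_floor`, `ref_shr_round`, `ref_conj_macc`) are hypothetical, and the symmetric saturation bounds, the floor applied to the accumulator shift, and the round-half-up behaviour are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp to the symmetric 16-bit range (saturation bounds are an assumption). */
static int16_t ref_sat16(int64_t x) {
    if (x > INT16_MAX) return INT16_MAX;
    if (x < -INT16_MAX) return (int16_t)(-INT16_MAX);
    return (int16_t)x;
}

/* x * 2^-shr, floored for shr >= 0; an exact left shift for shr < 0. */
static int64_t ref_shr_floor(int64_t x, int shr) {
    assert(shr > -32 && shr < 32);
    if (shr < 0) return x * ((int64_t)1 << -shr);
    int64_t d = (int64_t)1 << shr;
    int64_t q = x / d;
    return (x % d < 0) ? q - 1 : q;   /* C division truncates; fix up to floor */
}

/* x * 2^-shr rounded to nearest, ties toward +infinity (assumed), shr >= 0. */
static int64_t ref_shr_round(int64_t x, int shr) {
    assert(shr >= 0 && shr < 32);
    int64_t half = (shr > 0) ? ((int64_t)1 << (shr - 1)) : 0;
    return ref_shr_floor(x + half, shr);
}

/* a[k] <- sat16(a[k] * 2^-acc_shr) + round(b[k] * conj(c[k]) * 2^-bc_sat) */
static void ref_conj_macc(int16_t acc_real[], int16_t acc_imag[],
                          const int16_t b_real[], const int16_t b_imag[],
                          const int16_t c_real[], const int16_t c_imag[],
                          unsigned length, int acc_shr, int bc_sat) {
    for (unsigned k = 0; k < length; k++) {
        int64_t v = (int64_t)b_real[k] * c_real[k] + (int64_t)b_imag[k] * c_imag[k];
        int64_t s = (int64_t)b_imag[k] * c_real[k] - (int64_t)b_real[k] * c_imag[k];
        int64_t a_re = ref_sat16(ref_shr_floor(acc_real[k], acc_shr));
        int64_t a_im = ref_sat16(ref_shr_floor(acc_imag[k], acc_shr));
        acc_real[k] = ref_sat16(a_re + ref_sat16(ref_shr_round(v, bc_sat)));
        acc_imag[k] = ref_sat16(a_im + ref_sat16(ref_shr_round(s, bc_sat)));
    }
}
```

For example, with \(b = 3 + 4i\) and \(c = 1 + 2i\) the product \(b \cdot c^*\) is \(11 - 2i\), so an accumulator holding \(100 + 50i\) becomes \(111 + 48i\) when both shifts are zero.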

headroom_t vect_complex_s16_conj_nmacc(int16_t acc_real[], int16_t acc_imag[], const int16_t b_real[], const int16_t b_imag[], const int16_t c_real[], const int16_t c_imag[], const unsigned length, const right_shift_t acc_shr, const right_shift_t bc_sat)

Multiply one complex 16-bit vector element-wise by the complex conjugate of another, and subtract the result from an accumulator.

acc_real[] and acc_imag[] together represent the complex 16-bit accumulator mantissa vector \(\bar a\). Each \(Re\{a_k\}\) is acc_real[k], and each \(Im\{a_k\}\) is acc_imag[k].

b_real[] and b_imag[] together represent the complex 16-bit input mantissa vector \(\bar b\). Each \(Re\{b_k\}\) is b_real[k], and each \(Im\{b_k\}\) is b_imag[k].

c_real[] and c_imag[] together represent the complex 16-bit input mantissa vector \(\bar c\). Each \(Re\{c_k\}\) is c_real[k], and each \(Im\{c_k\}\) is c_imag[k].

Each of the input vectors must begin at a word-aligned address.

length is the number of elements in each of the vectors.

acc_shr is the signed arithmetic right-shift applied to the accumulators \(a_k\).

bc_sat is the unsigned arithmetic right-shift applied to the product of \(b_k\) and \(c_k^*\) before being subtracted from the accumulator.

Operation Performed

\[\begin{split}\begin{aligned} & v_k \leftarrow Re\{b_k\} \cdot Re\{c_k\} + Im\{b_k\} \cdot Im\{c_k\} \\ & s_k \leftarrow Im\{b_k\} \cdot Re\{c_k\} - Re\{b_k\} \cdot Im\{c_k\} \\ & \hat{a}_k \leftarrow sat_{16}( a_k \cdot 2^{-acc\_shr} ) \\ & Re\{a_k\} \leftarrow sat_{16}( Re\{\hat{a}_k\} - round( sat_{16}( v_k \cdot 2^{-bc\_sat} ) ) ) \\ & Im\{a_k\} \leftarrow sat_{16}( Im\{\hat{a}_k\} - round( sat_{16}( s_k \cdot 2^{-bc\_sat} ) ) ) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If inputs \(\bar b\) and \(\bar c\) are the mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), and input \(\bar a\) is the accumulator BFP vector \(\bar{a} \cdot 2^{a\_exp}\), then the output values of \(\bar a\) have the exponent \(a\_exp + acc\_shr\).

For accumulation to make sense mathematically, \(bc\_sat\) must be chosen such that \( a\_exp + acc\_shr = b\_exp + c\_exp + bc\_sat \).

The function vect_complex_s16_macc_prepare() can be used to obtain values for \(a\_exp\), \(acc\_shr\) and \(bc\_sat\) based on the input exponents \(a\_exp\), \(b\_exp\) and \(c\_exp\) and the input headrooms \(a\_hr\), \(b\_hr\) and \(c\_hr\).

Parameters:
  • acc_real[inout] Real part of complex accumulator \(\bar a\)

  • acc_imag[inout] Imaginary part of complex accumulator \(\bar a\)

  • b_real[in] Real part of complex input vector \(\bar b\)

  • b_imag[in] Imaginary part of complex input vector \(\bar b\)

  • c_real[in] Real part of complex input vector \(\bar c\)

  • c_imag[in] Imaginary part of complex input vector \(\bar c\)

  • length[in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)

  • acc_shr[in] Signed arithmetic right-shift applied to accumulator elements.

  • bc_sat[in] Unsigned arithmetic right-shift applied to the products of elements \(b_k\) and \(c_k^*\)

Throws ET_LOAD_STORE:

Raised if `acc_real`, `acc_imag`, `b_real`, `b_imag`, `c_real` or `c_imag` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of the output vector \(\bar a\)
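The exponent constraint \(a\_exp + acc\_shr = b\_exp + c\_exp + bc\_sat\) shared by both accumulate variants can be solved for \(bc\_sat\) once \(acc\_shr\) has been chosen. The helpers below are hypothetical and only do the exponent arithmetic; unlike vect_complex_s16_macc_prepare(), this sketch does not take headroom into account.

```c
#include <assert.h>

/* Solve a_exp + acc_shr = b_exp + c_exp + bc_sat for bc_sat. */
static int ref_macc_bc_sat(int a_exp, int acc_shr, int b_exp, int c_exp) {
    int bc_sat = (a_exp + acc_shr) - (b_exp + c_exp);
    assert(bc_sat >= 0);   /* bc_sat is documented as an unsigned shift */
    return bc_sat;
}

/* Exponent associated with the accumulator after the call. */
static int ref_macc_out_exp(int a_exp, int acc_shr) {
    return a_exp + acc_shr;
}
```

With \(a\_exp = -20\), \(acc\_shr = 1\) and \(b\_exp = c\_exp = -15\), the products carry exponent \(-30\) and must be shifted right by 11 bits to meet the accumulator at exponent \(-19\).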

headroom_t vect_complex_s16_mul(int16_t a_real[], int16_t a_imag[], const int16_t b_real[], const int16_t b_imag[], const int16_t c_real[], const int16_t c_imag[], const unsigned length, const right_shift_t a_shr)

Multiply one complex 16-bit vector element-wise by another.

a_real[] and a_imag[] together represent the complex 16-bit output mantissa vector \(\bar a\). Each \(Re\{a_k\}\) is a_real[k], and each \(Im\{a_k\}\) is a_imag[k].

b_real[] and b_imag[] together represent the complex 16-bit input mantissa vector \(\bar b\). Each \(Re\{b_k\}\) is b_real[k], and each \(Im\{b_k\}\) is b_imag[k].

c_real[] and c_imag[] together represent the complex 16-bit input mantissa vector \(\bar c\). Each \(Re\{c_k\}\) is c_real[k], and each \(Im\{c_k\}\) is c_imag[k].

Each of the input vectors must begin at a word-aligned address. This operation can be performed safely in-place on inputs b_real[], b_imag[], c_real[] and c_imag[].

length is the number of elements in each of the vectors.

a_shr is the unsigned arithmetic right-shift applied to the 32-bit accumulators holding intermediate results.

Operation Performed

\[\begin{split}\begin{aligned} & v_k \leftarrow Re\{b_k\} \cdot Re\{c_k\} - Im\{b_k\} \cdot Im\{c_k\} \\ & s_k \leftarrow Im\{b_k\} \cdot Re\{c_k\} + Re\{b_k\} \cdot Im\{c_k\} \\ & Re\{a_k\} \leftarrow round( sat_{16}( v_k \cdot 2^{-a\_shr} ) ) \\ & Im\{a_k\} \leftarrow round( sat_{16}( s_k \cdot 2^{-a\_shr} ) ) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) and \(\bar c\) are the complex 16-bit mantissas of BFP vectors \(\bar{b} \cdot 2^{b\_exp}\) and \(\bar{c} \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the complex 16-bit mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + c\_exp + a\_shr\).

The function vect_complex_s16_mul_prepare() can be used to obtain values for \(a\_exp\) and \(a\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).

Parameters:
  • a_real[out] Real part of complex output vector \(\bar a\)

  • a_imag[out] Imaginary part of complex output vector \(\bar a\)

  • b_real[in] Real part of complex input vector \(\bar b\)

  • b_imag[in] Imaginary part of complex input vector \(\bar b\)

  • c_real[in] Real part of complex input vector \(\bar c\)

  • c_imag[in] Imaginary part of complex input vector \(\bar c\)

  • length[in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)

  • a_shr[in] Right-shift applied to 32-bit intermediate results.

Throws ET_LOAD_STORE:

Raised if `a_real`, `a_imag`, `b_real`, `b_imag`, `c_real` or `c_imag` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of the output vector \(\bar a\)
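A scalar model of the element-wise complex multiply is sketched below. The names (`ref_sat16`, `ref_shr_round`, `ref_complex_mul`) are hypothetical, and the symmetric saturation bounds and round-half-up mode are assumptions; the sketch reads every input of an element before writing its output, which is what makes in-place use safe here.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp to the symmetric 16-bit range (saturation bounds are an assumption). */
static int16_t ref_sat16(int64_t x) {
    if (x > INT16_MAX) return INT16_MAX;
    if (x < -INT16_MAX) return (int16_t)(-INT16_MAX);
    return (int16_t)x;
}

/* x * 2^-shr rounded to nearest, ties toward +infinity (assumed rounding mode). */
static int64_t ref_shr_round(int64_t x, int shr) {
    assert(shr >= 0 && shr < 32);      /* a_shr is documented as unsigned */
    if (shr == 0) return x;
    int64_t d = (int64_t)1 << shr;
    int64_t y = x + d / 2;
    int64_t q = y / d;
    return (y % d < 0) ? q - 1 : q;    /* floor division for negative y */
}

static void ref_complex_mul(int16_t a_real[], int16_t a_imag[],
                            const int16_t b_real[], const int16_t b_imag[],
                            const int16_t c_real[], const int16_t c_imag[],
                            unsigned length, int a_shr) {
    for (unsigned k = 0; k < length; k++) {
        /* Read all inputs first so the outputs may alias b or c. */
        int64_t br = b_real[k], bi = b_imag[k];
        int64_t cr = c_real[k], ci = c_imag[k];
        int64_t v = br * cr - bi * ci;   /* real part of b * c */
        int64_t s = bi * cr + br * ci;   /* imaginary part of b * c */
        a_real[k] = ref_sat16(ref_shr_round(v, a_shr));
        a_imag[k] = ref_sat16(ref_shr_round(s, a_shr));
    }
}
```

With `a_shr = 0` the product of two mantissas of 16384 saturates, whereas `a_shr = 14` brings the same product back to 16384; this is the trade-off that the prepare function is meant to resolve.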

headroom_t vect_complex_s16_real_mul(int16_t a_real[], int16_t a_imag[], const int16_t b_real[], const int16_t b_imag[], const int16_t c_real[], const unsigned length, const right_shift_t a_shr)

Multiply a complex 16-bit vector element-wise by a real 16-bit vector.

a_real[] and a_imag[] together represent the complex 16-bit output mantissa vector \(\bar a\). Each \(Re\{a_k\}\) is a_real[k], and each \(Im\{a_k\}\) is a_imag[k].

b_real[] and b_imag[] together represent the complex 16-bit input mantissa vector \(\bar b\). Each \(Re\{b_k\}\) is b_real[k], and each \(Im\{b_k\}\) is b_imag[k].

c_real[] represents the real 16-bit input mantissa vector \(\bar c\).

Each of the input vectors must begin at a word-aligned address. This operation can be performed safely in-place on inputs b_real[], b_imag[] and c_real[].

length is the number of elements in each of the vectors.

a_shr is the unsigned arithmetic right-shift applied to the 32-bit accumulators holding the penultimate results.

Operation Performed

\[\begin{split}\begin{aligned} & v_k \leftarrow Re\{b_k\} \cdot c_k \\ & s_k \leftarrow Im\{b_k\} \cdot c_k \\ & Re\{a_k\} \leftarrow round( sat_{16}( v_k \cdot 2^{-a\_shr} ) ) \\ & Im\{a_k\} \leftarrow round( sat_{16}( s_k \cdot 2^{-a\_shr} ) ) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) are the complex 16-bit mantissas of a BFP vector \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar c\) are the real 16-bit mantissas of BFP vector \(\bar{c} \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the complex 16-bit mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + c\_exp + a\_shr\).

The function vect_s16_real_mul_prepare() can be used to obtain values for \(a\_exp\) and \(a\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).

Parameters:
  • a_real[out] Real part of complex output vector \(\bar a\)

  • a_imag[out] Imaginary part of complex output vector \(\bar a\)

  • b_real[in] Real part of complex input vector \(\bar b\)

  • b_imag[in] Imaginary part of complex input vector \(\bar b\)

  • c_real[in] Real input vector \(\bar c\)

  • length[in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)

  • a_shr[in] Right-shift applied to 32-bit intermediate results.

Throws ET_LOAD_STORE:

Raised if `a_real`, `a_imag`, `b_real`, `b_imag` or `c_real` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of the output vector \(\bar a\).

headroom_t vect_complex_s16_real_scale(int16_t a_real[], int16_t a_imag[], const int16_t b_real[], const int16_t b_imag[], const int16_t c, const unsigned length, const right_shift_t a_shr)

Multiply a complex 16-bit vector by a real scalar.

a_real[] and a_imag[] together represent the complex 16-bit output mantissa vector \(\bar a\). Each \(Re\{a_k\}\) is a_real[k], and each \(Im\{a_k\}\) is a_imag[k].

b_real[] and b_imag[] together represent the complex 16-bit input mantissa vector \(\bar b\). Each \(Re\{b_k\}\) is b_real[k], and each \(Im\{b_k\}\) is b_imag[k].

Each of the input vectors must begin at a word-aligned address. This operation can be performed safely in-place on inputs b_real[] and b_imag[].

c is the real 16-bit input mantissa \(c\).

length is the number of elements in each of the vectors.

a_shr is an unsigned arithmetic right-shift applied to the 32-bit accumulators holding the penultimate results.

Operation Performed

\[\begin{split}\begin{aligned} & v_k \leftarrow Re\{b_k\} \cdot c \\ & s_k \leftarrow Im\{b_k\} \cdot c \\ & Re\{a_k\} \leftarrow round( sat_{16}( v_k \cdot 2^{-a\_shr} ) ) \\ & Im\{a_k\} \leftarrow round( sat_{16}( s_k \cdot 2^{-a\_shr} ) ) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) are the complex 16-bit mantissas of a BFP vector \( \bar{b} \cdot 2^{b\_exp} \) and \(c\) is the real 16-bit mantissa of floating-point value \(c \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + c\_exp + a\_shr\).

The function vect_complex_s16_real_scale_prepare() can be used to obtain values for \(a\_exp\) and \(a\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).

Parameters:
  • a_real[out] Real part of complex output vector \(\bar a\)

  • a_imag[out] Imaginary part of complex output vector \(\bar a\)

  • b_real[in] Real part of complex input vector \(\bar b\)

  • b_imag[in] Imaginary part of complex input vector \(\bar b\)

  • c[in] Real input scalar \(c\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

  • a_shr[in] Right-shift applied to 32-bit intermediate results.

Throws ET_LOAD_STORE:

Raised if `a_real`, `a_imag`, `b_real` or `b_imag` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of the output vector \(\bar a\).
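The exponent rule \(a\_exp = b\_exp + c\_exp + a\_shr\) is easiest to see with concrete numbers. The sketch below is a scalar model with hypothetical names (`ref_complex_real_scale` and its helpers); passing the exponents in and returning \(a\_exp\) is a convenience of this example, not the library's signature, and the saturation bounds and rounding mode are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp to the symmetric 16-bit range (saturation bounds are an assumption). */
static int16_t ref_sat16(int64_t x) {
    if (x > INT16_MAX) return INT16_MAX;
    if (x < -INT16_MAX) return (int16_t)(-INT16_MAX);
    return (int16_t)x;
}

/* x * 2^-shr rounded to nearest, ties toward +infinity (assumed rounding mode). */
static int64_t ref_shr_round(int64_t x, int shr) {
    assert(shr >= 0 && shr < 32);
    if (shr == 0) return x;
    int64_t d = (int64_t)1 << shr;
    int64_t y = x + d / 2;
    int64_t q = y / d;
    return (y % d < 0) ? q - 1 : q;
}

/* Scale complex vector b by the real scalar c; returns the output exponent. */
static int ref_complex_real_scale(int16_t a_real[], int16_t a_imag[],
                                  const int16_t b_real[], const int16_t b_imag[],
                                  int16_t c, unsigned length, int a_shr,
                                  int b_exp, int c_exp) {
    for (unsigned k = 0; k < length; k++) {
        a_real[k] = ref_sat16(ref_shr_round((int64_t)b_real[k] * c, a_shr));
        a_imag[k] = ref_sat16(ref_shr_round((int64_t)b_imag[k] * c, a_shr));
    }
    return b_exp + c_exp + a_shr;
}
```

Taking \(b\_exp = c\_exp = -15\), a mantissa of 16384 represents 0.5. Scaling \(0.5\) by \(0.5\) with `a_shr = 14` yields mantissa 16384 at exponent \(-16\), which is 0.25 as expected.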

headroom_t vect_complex_s16_scale(int16_t a_real[], int16_t a_imag[], const int16_t b_real[], const int16_t b_imag[], const int16_t c_real, const int16_t c_imag, const unsigned length, const right_shift_t a_shr)

Multiply a complex 16-bit vector by a complex 16-bit scalar.

a_real[] and a_imag[] together represent the complex 16-bit output mantissa vector \(\bar a\). Each \(Re\{a_k\}\) is a_real[k], and each \(Im\{a_k\}\) is a_imag[k].

b_real[] and b_imag[] together represent the complex 16-bit input mantissa vector \(\bar b\). Each \(Re\{b_k\}\) is b_real[k], and each \(Im\{b_k\}\) is b_imag[k].

Each of the input vectors must begin at a word-aligned address. This operation can be performed safely in-place on inputs b_real[] and b_imag[].

c_real and c_imag are the real and imaginary parts of the complex 16-bit input mantissa \(c\).

length is the number of elements in each of the vectors.

a_shr is the unsigned arithmetic right-shift applied to the 32-bit accumulators holding the penultimate results.

Operation Performed

\[\begin{split}\begin{aligned} & v_k \leftarrow Re\{b_k\} \cdot Re\{c\} - Im\{b_k\} \cdot Im\{c\} \\ & s_k \leftarrow Im\{b_k\} \cdot Re\{c\} + Re\{b_k\} \cdot Im\{c\} \\ & Re\{a_k\} \leftarrow round( sat_{16}( v_k \cdot 2^{-a\_shr} ) ) \\ & Im\{a_k\} \leftarrow round( sat_{16}( s_k \cdot 2^{-a\_shr} ) ) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) are the complex 16-bit mantissas of a BFP vector \( \bar{b} \cdot 2^{b\_exp} \) and \(c\) is the complex 16-bit mantissa of floating-point value \(c \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + c\_exp + a\_shr\).

The function vect_complex_s16_scale_prepare() can be used to obtain values for \(a\_exp\) and \(a\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).

Parameters:
  • a_real[out] Real part of complex output vector \(\bar a\)

  • a_imag[out] Imaginary part of complex output vector \(\bar a\)

  • b_real[in] Real part of complex input vector \(\bar b\)

  • b_imag[in] Imaginary part of complex input vector \(\bar b\)

  • c_real[in] Real part of complex input scalar \(c\)

  • c_imag[in] Imaginary part of complex input scalar \(c\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

  • a_shr[in] Right-shift applied to 32-bit intermediate results

Throws ET_LOAD_STORE:

Raised if `a_real`, `a_imag`, `b_real` or `b_imag` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of the output vector \(\bar a\).

void vect_complex_s16_set(int16_t a_real[], int16_t a_imag[], const int16_t b_real, const int16_t b_imag, const unsigned length)

Set each element of a complex 16-bit vector to a specified value.

a_real[] and a_imag[] together represent the complex 16-bit output mantissa vector \(\bar a\). Each \(Re\{a_k\}\) is a_real[k], and each \(Im\{a_k\}\) is a_imag[k]. Each must begin at a word-aligned address.

b_real and b_imag are the real and imaginary parts of the complex 16-bit input mantissa \(b\). Each a_real[k] will be set to b_real. Each a_imag[k] will be set to b_imag.

length is the number of elements in a_real[] and a_imag[].

Operation Performed

\[\begin{split}\begin{aligned} & Re\{a_k\} \leftarrow Re\{b\} \\ & Im\{a_k\} \leftarrow Im\{b\} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(b\) is the mantissa of floating-point value \(b \cdot 2^{b\_exp}\), then the output vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp\).

Parameters:
  • a_real[out] Real part of complex output vector \(\bar a\)

  • a_imag[out] Imaginary part of complex output vector \(\bar a\)

  • b_real[in] Real part of complex input scalar \(b\)

  • b_imag[in] Imaginary part of complex input scalar \(b\)

  • length[in] Number of elements in vector \(\bar a\)

Throws ET_LOAD_STORE:

Raised if `a_real` or `a_imag` is not word-aligned (See Note: Vector Alignment)

headroom_t vect_complex_s16_shl(int16_t a_real[], int16_t a_imag[], const int16_t b_real[], const int16_t b_imag[], const unsigned length, const left_shift_t b_shl)

Left-shift each element of a complex 16-bit vector by a specified number of bits.

a_real[] and a_imag[] together represent the complex 16-bit output mantissa vector \(\bar a\). Each \(Re\{a_k\}\) is a_real[k], and each \(Im\{a_k\}\) is a_imag[k].

b_real[] and b_imag[] together represent the complex 16-bit input mantissa vector \(\bar b\). Each \(Re\{b_k\}\) is b_real[k], and each \(Im\{b_k\}\) is b_imag[k].

Each of the input vectors must begin at a word-aligned address. This operation can be performed safely in-place on inputs b_real[] and b_imag[].

length is the number of elements in \(\bar a\) and \(\bar b\).

b_shl is the signed arithmetic left-shift applied to each element of \(\bar b\).

Operation Performed

\[\begin{split}\begin{aligned} & Re\{a_k\} \leftarrow sat_{16}(\lfloor Re\{b_k\} \cdot 2^{b\_shl} \rfloor) \\ & Im\{a_k\} \leftarrow sat_{16}(\lfloor Im\{b_k\} \cdot 2^{b\_shl} \rfloor) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) are the complex 16-bit mantissas of a BFP vector \( \bar{b} \cdot 2^{b\_exp} \), then the resulting vector \(\bar a\) are the complex 16-bit mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(\bar{a} = \bar{b} \cdot 2^{b\_shl}\) and \(a\_exp = b\_exp\).

Parameters:
  • a_real[out] Real part of complex output vector \(\bar a\)

  • a_imag[out] Imaginary part of complex output vector \(\bar a\)

  • b_real[in] Real part of complex input vector \(\bar b\)

  • b_imag[in] Imaginary part of complex input vector \(\bar b\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

  • b_shl[in] Left-shift applied to \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `a_real`, `a_imag`, `b_real` or `b_imag` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of the output vector \(\bar a\)

headroom_t vect_complex_s16_shr(int16_t a_real[], int16_t a_imag[], const int16_t b_real[], const int16_t b_imag[], const unsigned length, const right_shift_t b_shr)

Right-shift each element of a complex 16-bit vector by a specified number of bits.

a_real[] and a_imag[] together represent the complex 16-bit output mantissa vector \(\bar a\). Each \(Re\{a_k\}\) is a_real[k], and each \(Im\{a_k\}\) is a_imag[k].

b_real[] and b_imag[] together represent the complex 16-bit input mantissa vector \(\bar b\). Each \(Re\{b_k\}\) is b_real[k], and each \(Im\{b_k\}\) is b_imag[k].

Each of the input vectors must begin at a word-aligned address. This operation can be performed safely in-place on inputs b_real[] and b_imag[].

length is the number of elements in \(\bar a\) and \(\bar b\).

b_shr is the signed arithmetic right-shift applied to each element of \(\bar b\).

Operation Performed

\[\begin{split}\begin{aligned} & Re\{a_k\} \leftarrow sat_{16}(\lfloor Re\{b_k\} \cdot 2^{-b\_shr} \rfloor) \\ & Im\{a_k\} \leftarrow sat_{16}(\lfloor Im\{b_k\} \cdot 2^{-b\_shr} \rfloor) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) are the complex 16-bit mantissas of a BFP vector \( \bar{b} \cdot 2^{b\_exp} \), then the resulting vector \(\bar a\) are the complex 16-bit mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(\bar{a} = \bar{b} \cdot 2^{-b\_shr}\) and \(a\_exp = b\_exp\).

Parameters:
  • a_real[out] Real part of complex output vector \(\bar a\)

  • a_imag[out] Imaginary part of complex output vector \(\bar a\)

  • b_real[in] Real part of complex input vector \(\bar b\)

  • b_imag[in] Imaginary part of complex input vector \(\bar b\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

  • b_shr[in] Right-shift applied to \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `a_real`, `a_imag`, `b_real` or `b_imag` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of the output vector \(\bar a\)
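Because `b_shr` is signed, a negative value acts as a left shift, and it is then that saturation can occur. The scalar model below illustrates both directions; the function names are hypothetical and the symmetric saturation bounds are an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp to the symmetric 16-bit range (saturation bounds are an assumption). */
static int16_t ref_sat16(int64_t x) {
    if (x > INT16_MAX) return INT16_MAX;
    if (x < -INT16_MAX) return (int16_t)(-INT16_MAX);
    return (int16_t)x;
}

/* x * 2^-shr, floored for shr >= 0; an exact left shift for shr < 0. */
static int64_t ref_shr_floor(int64_t x, int shr) {
    assert(shr > -32 && shr < 32);
    if (shr < 0) return x * ((int64_t)1 << -shr);
    int64_t d = (int64_t)1 << shr;
    int64_t q = x / d;
    return (x % d < 0) ? q - 1 : q;   /* C division truncates; fix up to floor */
}

/* a[k] <- sat16(floor(b[k] * 2^-b_shr)) for real and imaginary parts. */
static void ref_complex_shr(int16_t a_real[], int16_t a_imag[],
                            const int16_t b_real[], const int16_t b_imag[],
                            unsigned length, int b_shr) {
    for (unsigned k = 0; k < length; k++) {
        a_real[k] = ref_sat16(ref_shr_floor(b_real[k], b_shr));
        a_imag[k] = ref_sat16(ref_shr_floor(b_imag[k], b_shr));
    }
}
```

Note that the floor rounds negative values toward \(-\infty\): shifting \(-3\) right by one bit gives \(-2\), not \(-1\).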

headroom_t vect_complex_s16_squared_mag(int16_t a[], const int16_t b_real[], const int16_t b_imag[], const unsigned length, const right_shift_t a_shr)

Get the squared magnitudes of elements of a complex 16-bit vector.

a[] represents the real 16-bit output mantissa vector \(\bar a\).

b_real[] and b_imag[] together represent the complex 16-bit input mantissa vector \(\bar b\). Each \(Re\{b_k\}\) is b_real[k], and each \(Im\{b_k\}\) is b_imag[k].

Each of the input vectors must begin at a word-aligned address.

length is the number of elements in each of the vectors.

a_shr is the unsigned arithmetic right-shift applied to the 32-bit accumulators holding the penultimate results.

Operation Performed

\[\begin{split}\begin{aligned} & a_k \leftarrow ((Re\{b_k\})^2 + (Im\{b_k\})^2)\cdot 2^{-a\_shr} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) are the complex 16-bit mantissas of a BFP vector \( \bar{b} \cdot 2^{b\_exp} \), then the resulting vector \(\bar a\) are the real 16-bit mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = 2 \cdot b\_exp + a\_shr\).

The function vect_complex_s16_squared_mag_prepare() can be used to obtain values for \(a\_exp\) and \(a\_shr\) based on the input exponent \(b\_exp\) and headroom \(b\_hr\).
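As a rough illustration of the operation and exponent rule above, the following scalar model computes squared magnitudes in a wide accumulator before shifting. The names are hypothetical; the rounding and saturation applied to the shifted result are assumptions, since the formula above does not spell them out.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp to the symmetric 16-bit range (saturation bounds are an assumption). */
static int16_t ref_sat16(int64_t x) {
    if (x > INT16_MAX) return INT16_MAX;
    if (x < -INT16_MAX) return (int16_t)(-INT16_MAX);
    return (int16_t)x;
}

/* x * 2^-shr rounded to nearest, ties toward +infinity (assumed rounding mode). */
static int64_t ref_shr_round(int64_t x, int shr) {
    assert(shr >= 0 && shr < 32);
    if (shr == 0) return x;
    int64_t d = (int64_t)1 << shr;
    int64_t y = x + d / 2;
    int64_t q = y / d;
    return (y % d < 0) ? q - 1 : q;
}

/* a[k] <- (Re{b[k]}^2 + Im{b[k]}^2) * 2^-a_shr */
static void ref_complex_squared_mag(int16_t a[],
                                    const int16_t b_real[], const int16_t b_imag[],
                                    unsigned length, int a_shr) {
    for (unsigned k = 0; k < length; k++) {
        int64_t re = b_real[k], im = b_imag[k];
        a[k] = ref_sat16(ref_shr_round(re * re + im * im, a_shr));
    }
}

/* Output exponent for a given input exponent and shift. */
static int ref_squared_mag_exp(int b_exp, int a_shr) {
    return 2 * b_exp + a_shr;
}
```

For \(b = 3 + 4i\) and `a_shr = 0` the result is 25. For \(b = 16384 + 16384i\) the sum of squares is \(2^{29}\), which needs `a_shr = 15` to fit back into 16 bits as 16384.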

Parameters:
  • a[out] Real output vector \(\bar a\)

  • b_real[in] Real part of complex input vector \(\bar b\)

  • b_imag[in] Imaginary part of complex input vector \(\bar b\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

  • a_shr[in] Right-shift applied to 32-bit intermediate results

Throws ET_LOAD_STORE:

Raised if `a`, `b_real` or `b_imag` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of the output vector \(\bar a\)

headroom_t vect_complex_s16_sub(int16_t a_real[], int16_t a_imag[], const int16_t b_real[], const int16_t b_imag[], const int16_t c_real[], const int16_t c_imag[], const unsigned length, const right_shift_t b_shr, const right_shift_t c_shr)

Subtract one complex 16-bit vector from another.

a_real[] and a_imag[] together represent the complex 16-bit output mantissa vector \(\bar a\). Each \(Re\{a_k\}\) is a_real[k], and each \(Im\{a_k\}\) is a_imag[k].

b_real[] and b_imag[] together represent the complex 16-bit input mantissa vector \(\bar b\). Each \(Re\{b_k\}\) is b_real[k], and each \(Im\{b_k\}\) is b_imag[k].

c_real[] and c_imag[] together represent the complex 16-bit input mantissa vector \(\bar c\). Each \(Re\{c_k\}\) is c_real[k], and each \(Im\{c_k\}\) is c_imag[k].

Each of the input vectors must begin at a word-aligned address. This operation can be performed safely in-place on inputs b_real[], b_imag[], c_real[] and c_imag[].

length is the number of elements in each of the vectors.

b_shr and c_shr are the signed arithmetic right-shifts applied to each element of \(\bar b\) and \(\bar c\) respectively.

Operation Performed

\[\begin{split}\begin{aligned} & b_k' \leftarrow sat_{16}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & c_k' \leftarrow sat_{16}(\lfloor c_k \cdot 2^{-c\_shr} \rfloor) \\ & Re\{a_k\} \leftarrow Re\{b_k'\} - Re\{c_k'\} \\ & Im\{a_k\} \leftarrow Im\{b_k'\} - Im\{c_k'\} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) and \(\bar c\) are the complex 16-bit mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the complex 16-bit mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\).

In this case, \(b\_shr\) and \(c\_shr\) must be chosen so that \(a\_exp = b\_exp + b\_shr = c\_exp + c\_shr\). Adding or subtracting mantissas only makes sense if they are associated with the same exponent.

The function vect_complex_s16_sub_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).

Parameters:
  • a_real[out] Real part of complex output vector \(\bar a\)

  • a_imag[out] Imaginary part of complex output vector \(\bar a\)

  • b_real[in] Real part of complex input vector \(\bar b\)

  • b_imag[in] Imaginary part of complex input vector \(\bar b\)

  • c_real[in] Real part of complex input vector \(\bar c\)

  • c_imag[in] Imaginary part of complex input vector \(\bar c\)

  • length[in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)

  • b_shr[in] Right-shift applied to \(\bar b\)

  • c_shr[in] Right-shift applied to \(\bar c\)

Throws ET_LOAD_STORE:

Raised if `a_real`, `a_imag`, `b_real`, `b_imag`, `c_real` or `c_imag` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of output vector \(\bar a\).
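The requirement \(a\_exp = b\_exp + b\_shr = c\_exp + c\_shr\) means each input is shifted onto a common exponent before subtracting. The sketch below picks the shifts for a chosen \(a\_exp\) and applies them in a scalar model. All names are hypothetical, headroom is ignored (unlike vect_complex_s16_sub_prepare()), and the saturation of the final difference is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp to the symmetric 16-bit range (saturation bounds are an assumption). */
static int16_t ref_sat16(int64_t x) {
    if (x > INT16_MAX) return INT16_MAX;
    if (x < -INT16_MAX) return (int16_t)(-INT16_MAX);
    return (int16_t)x;
}

/* x * 2^-shr, floored for shr >= 0; an exact left shift for shr < 0. */
static int64_t ref_shr_floor(int64_t x, int shr) {
    assert(shr > -32 && shr < 32);
    if (shr < 0) return x * ((int64_t)1 << -shr);
    int64_t d = (int64_t)1 << shr;
    int64_t q = x / d;
    return (x % d < 0) ? q - 1 : q;
}

/* Right-shift that moves a mantissa from exponent in_exp to exponent out_exp. */
static int ref_align_shr(int in_exp, int out_exp) {
    return out_exp - in_exp;
}

static void ref_complex_sub(int16_t a_real[], int16_t a_imag[],
                            const int16_t b_real[], const int16_t b_imag[],
                            const int16_t c_real[], const int16_t c_imag[],
                            unsigned length, int b_shr, int c_shr) {
    for (unsigned k = 0; k < length; k++) {
        int64_t br = ref_sat16(ref_shr_floor(b_real[k], b_shr));
        int64_t bi = ref_sat16(ref_shr_floor(b_imag[k], b_shr));
        int64_t cr = ref_sat16(ref_shr_floor(c_real[k], c_shr));
        int64_t ci = ref_sat16(ref_shr_floor(c_imag[k], c_shr));
        a_real[k] = ref_sat16(br - cr);
        a_imag[k] = ref_sat16(bi - ci);
    }
}
```

With \(b\_exp = -10\), \(c\_exp = -12\) and a target \(a\_exp = -10\), the shifts are `b_shr = 0` and `c_shr = 2`, so a mantissa of 200 in \(\bar c\) is first reduced to 50 before being subtracted.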

complex_s32_t vect_complex_s16_sum(const int16_t b_real[], const int16_t b_imag[], const unsigned length)

Get the sum of elements of a complex 16-bit vector.

b_real[] and b_imag[] together represent the complex 16-bit input mantissa vector \(\bar b\), and must both begin at a word-aligned address. Each \(Re\{b_k\}\) is b_real[k], and each \(Im\{b_k\}\) is b_imag[k].

length is the number of elements in \(\bar b\).

Operation Performed

\[\begin{split}\begin{aligned} & Re\{a\} \leftarrow \sum_{k=0}^{length-1} \left( Re\{b_k\} \right) \\ & Im\{a\} \leftarrow \sum_{k=0}^{length-1} \left( Im\{b_k\} \right) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) are the mantissas of BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the returned value \(a\) is the complex 32-bit mantissa of floating-point value \(a \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp\).

Parameters:
  • b_real[in] Real part of complex input vector \(\bar b\)

  • b_imag[in] Imaginary part of complex input vector \(\bar b\)

  • length[in] Number of elements in vector \(\bar b\).

Throws ET_LOAD_STORE:

Raised if `b_real` or `b_imag` is not word-aligned (See Note: Vector Alignment)

Returns:

\(a\), the 32-bit complex sum of elements in \(\bar b\).
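A scalar model of the sum is shown below. The struct `ref_complex_s32_t` is a hypothetical stand-in with the same two-field shape as a complex 32-bit value, and `ref_complex_s16_sum` is not the library routine. The length bound in the assertion is a property of this sketch: summing at most 65536 values of magnitude up to \(2^{15}\) cannot exceed the 32-bit range.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a complex value with 32-bit parts. */
typedef struct {
    int32_t re;
    int32_t im;
} ref_complex_s32_t;

/* Sum real and imaginary parts separately into 32-bit results. */
static ref_complex_s32_t ref_complex_s16_sum(const int16_t b_real[],
                                             const int16_t b_imag[],
                                             unsigned length) {
    assert(length <= 65536u);   /* guarantees the totals fit in int32_t */
    int64_t re = 0, im = 0;
    for (unsigned k = 0; k < length; k++) {
        re += b_real[k];
        im += b_imag[k];
    }
    ref_complex_s32_t a = { (int32_t)re, (int32_t)im };
    return a;
}
```

The wider result type is what allows the exponent to stay unchanged: two mantissas of 32767 sum to 65534, which no longer fits in 16 bits but needs no shift in 32 bits.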

void vect_complex_s16_to_vect_complex_s32(complex_s32_t a[], const int16_t b_real[], const int16_t b_imag[], const unsigned length)

Convert a complex 16-bit vector into a complex 32-bit vector.

a[] represents the complex 32-bit output vector \(\bar a\). It must begin at a double word (8-byte) aligned address.

b_real[] and b_imag[] together represent the complex 16-bit input mantissa vector \(\bar b\). Each \(Re\{b_k\}\) is b_real[k], and each \(Im\{b_k\}\) is b_imag[k].

length is the number of elements in each of the vectors.

Operation Performed

\[\begin{split}\begin{aligned} & Re\{a_k\} \leftarrow Re\{b_k\} \\ & Im\{a_k\} \leftarrow Im\{b_k\} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) are the complex 16-bit mantissas of a BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the resulting vector \(\bar a\) are the complex 32-bit mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp\).

Notes

  • The headroom of output vector \(\bar a\) is not returned by this function. The headroom of the output is always 16 bits greater than the headroom of the input.

Parameters:
  • a[out] Complex output vector \(\bar a\).

  • b_real[in] Real part of complex input vector \(\bar b\).

  • b_imag[in] Imaginary part of complex input vector \(\bar b\).

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `a` is not double-word-aligned (See Note: Vector Alignment)
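The note that the output gains 16 bits of headroom can be checked with a small model. Everything here is hypothetical: `ref_complex_s32_t` stands in for the interleaved complex 32-bit element, and `ref_headroom` counts how many times a value can be doubled without leaving the signed range of the given width, which is this sketch's working definition of headroom.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a complex value with 32-bit parts. */
typedef struct {
    int32_t re;
    int32_t im;
} ref_complex_s32_t;

/* Copy split 16-bit real/imaginary arrays into interleaved 32-bit elements. */
static void ref_s16_to_s32(ref_complex_s32_t a[],
                           const int16_t b_real[], const int16_t b_imag[],
                           unsigned length) {
    for (unsigned k = 0; k < length; k++) {
        a[k].re = b_real[k];
        a[k].im = b_imag[k];
    }
}

/* Number of doublings x survives within a signed integer of 'bits' bits. */
static unsigned ref_headroom(int64_t x, unsigned bits) {
    assert(bits == 16 || bits == 32);
    int64_t lo = -((int64_t)1 << (bits - 1));
    int64_t hi = ((int64_t)1 << (bits - 1)) - 1;
    unsigned hr = 0;
    while (hr < bits - 1 && x * 2 >= lo && x * 2 <= hi) {
        x *= 2;
        hr++;
    }
    return hr;
}
```

The mantissa values are unchanged by the conversion, so the exponent is unchanged too; only the container grows, and with it the headroom.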

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Vector API$$$Complex 32-bit vector API£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/vect/vect_complex_s32.html#complex-32-bit-vector-api
group vect_complex_s32_api

Functions

headroom_t vect_complex_s32_add(complex_s32_t a[], const complex_s32_t b[], const complex_s32_t c[], const unsigned length, const right_shift_t b_shr, const right_shift_t c_shr)

Add one complex 32-bit vector to another.

a[], b[] and c[] represent the complex 32-bit mantissa vectors \(\bar a\), \(\bar b\) and \(\bar c\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[] or c[].

length is the number of elements in each of the vectors.

b_shr and c_shr are the signed arithmetic right-shifts applied to each element of \(\bar b\) and \(\bar c\) respectively.

Operation Performed

\[\begin{split}\begin{aligned} & b_k' \leftarrow sat_{32}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & c_k' \leftarrow sat_{32}(\lfloor c_k \cdot 2^{-c\_shr} \rfloor) \\ & Re\{a_k\} \leftarrow Re\{b_k'\} + Re\{c_k'\} \\ & Im\{a_k\} \leftarrow Im\{b_k'\} + Im\{c_k'\} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) and \(\bar c\) are the complex 32-bit mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the complex 32-bit mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\).

In this case, \(b\_shr\) and \(c\_shr\) must be chosen so that \(a\_exp = b\_exp + b\_shr = c\_exp + c\_shr\). Adding or subtracting mantissas only makes sense if they are associated with the same exponent.

The function vect_complex_s32_add_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).

Parameters:
  • a[out] Complex output vector \(\bar a\)

  • b[in] Complex input vector \(\bar b\)

  • c[in] Complex input vector \(\bar c\)

  • length[in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)

  • b_shr[in] Right-shift applied to \(\bar b\)

  • c_shr[in] Right-shift applied to \(\bar c\)

Throws ET_LOAD_STORE:

Raised if `a`, `b` or `c` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of output vector \(\bar a\).

headroom_t vect_complex_s32_add_scalar(complex_s32_t a[], const complex_s32_t b[], const complex_s32_t c, const unsigned length, const right_shift_t b_shr)

Add a scalar to a complex 32-bit vector.

a[] and b[] represent the complex 32-bit mantissa vectors \(\bar a\) and \(\bar b\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[].

c is the complex scalar \(c\) to be added to each element of \(\bar b\).

length is the number of elements in each of the vectors.

b_shr is the signed arithmetic right-shift applied to each element of \(\bar b\).

Operation Performed

\[\begin{split}\begin{aligned} & b_k' \leftarrow sat_{32}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & Re\{a_k\} \leftarrow Re\{b_k'\} + Re\{c\} \\ & Im\{a_k\} \leftarrow Im\{b_k'\} + Im\{c\} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If elements of \(\bar b\) are the complex mantissas of BFP vector \( \bar{b} \cdot 2^{b\_exp}\), and \(c\) is the mantissa of floating-point value \(c \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\).

In this case, \(b\_shr\) and \(c\_shr\) must be chosen so that \(a\_exp = b\_exp + b\_shr = c\_exp + c\_shr\). Adding or subtracting mantissas only makes sense if they are associated with the same exponent.

The function vect_complex_s32_add_scalar_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).

Note that \(c\_shr\) is an output of vect_complex_s32_add_scalar_prepare(), but is not a parameter to this function. The \(c\_shr\) produced by vect_complex_s32_add_scalar_prepare() is to be applied by the user, and the result passed as input c.
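
Because \(c\_shr\) is not a parameter, the caller shifts the scalar before the call. The following is a minimal sketch of that step in portable C, not library code. The names ref_complex_s32_t, ref_sat32, ref_ashr32 and ref_shift_scalar are hypothetical stand-ins. Clamping to the full int32_t range and rounding toward negative infinity are assumptions of this sketch.

```c
#include <stdint.h>

/* Hypothetical stand-in for the library's complex_s32_t (fields re and im). */
typedef struct { int32_t re; int32_t im; } ref_complex_s32_t;

/* Clamp a 64-bit value to the int32_t range (assumed saturation bounds). */
static int32_t ref_sat32(int64_t v) {
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* Signed arithmetic right-shift. A positive shr divides by 2^shr and rounds
   toward negative infinity; a negative shr shifts left with saturation. */
static int32_t ref_ashr32(int32_t x, int shr) {
    if (shr >= 32) return (x < 0) ? -1 : 0;
    if (shr >= 0) {
        int64_t d = (int64_t)1 << shr;
        int64_t q = x / d;
        if (x % d != 0 && x < 0) q -= 1;   /* floor instead of truncate */
        return (int32_t)q;
    }
    if (x == 0) return 0;
    if (shr <= -32) return (x < 0) ? INT32_MIN : INT32_MAX;
    return ref_sat32((int64_t)x * ((int64_t)1 << -shr));
}

/* Apply c_shr to both parts of the scalar c; the result is what would be
   passed as the argument c of the add-scalar call. */
ref_complex_s32_t ref_shift_scalar(ref_complex_s32_t c, int c_shr) {
    ref_complex_s32_t out = { ref_ashr32(c.re, c_shr), ref_ashr32(c.im, c_shr) };
    return out;
}
```

With this helper, a scalar mantissa and the \(c\_shr\) value obtained from the prepare step are combined once, outside the vector loop.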

Parameters:
  • a[out] Complex output vector \(\bar a\)

  • b[in] Complex input vector \(\bar b\)

  • c[in] Complex input scalar \(c\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

  • b_shr[in] Right-shift applied to \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `a` or `b` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of output vector \(\bar a\).

headroom_t vect_complex_s32_conj_mul(complex_s32_t a[], const complex_s32_t b[], const complex_s32_t c[], const unsigned length, const right_shift_t b_shr, const right_shift_t c_shr)

Multiply one complex 32-bit vector element-wise by the complex conjugate of another.

a[], b[] and c[] represent the complex 32-bit mantissa vectors \(\bar a\), \(\bar b\) and \(\bar c\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[] or c[].

length is the number of elements in each of the vectors.

b_shr and c_shr are the signed arithmetic right-shifts applied to each element of \(\bar b\) and \(\bar c\) respectively.

Operation Performed

\[\begin{split}\begin{aligned} & b_k' \leftarrow sat_{32}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & c_k' \leftarrow sat_{32}(\lfloor c_k \cdot 2^{-c\_shr} \rfloor) \\ & Re\{a_k\} \leftarrow \left( Re\{b_k'\} \cdot Re\{c_k'\} + Im\{b_k'\} \cdot Im\{c_k'\} \right) \cdot 2^{-30} \\ & Im\{a_k\} \leftarrow \left( Im\{b_k'\} \cdot Re\{c_k'\} - Re\{b_k'\} \cdot Im\{c_k'\} \right) \cdot 2^{-30} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]
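
The per-element arithmetic above can be sketched in portable C as a host-side reference model. This is not the library implementation. The names ref_complex_s32_t, ref_sat32, ref_shr30 and ref_conj_mul are hypothetical. The sketch assumes the inputs have already been shifted by b_shr and c_shr, and that each has at least one bit of headroom so the 64-bit sums cannot overflow. Its rounding (toward negative infinity) and saturation bounds are assumptions too.

```c
#include <stdint.h>

/* Hypothetical stand-in for the library's complex_s32_t (fields re and im). */
typedef struct { int32_t re; int32_t im; } ref_complex_s32_t;

/* Clamp a 64-bit value to the int32_t range (assumed saturation bounds). */
static int32_t ref_sat32(int64_t v) {
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* Multiply by 2^-30, rounding toward negative infinity. */
static int64_t ref_shr30(int64_t v) {
    int64_t d = (int64_t)1 << 30;
    int64_t q = v / d;
    if (v % d != 0 && v < 0) q -= 1;
    return q;
}

/* a = b * conj(c) * 2^-30 for one pair of already-shifted elements. */
ref_complex_s32_t ref_conj_mul(ref_complex_s32_t b, ref_complex_s32_t c) {
    int64_t re = (int64_t)b.re * c.re + (int64_t)b.im * c.im;
    int64_t im = (int64_t)b.im * c.re - (int64_t)b.re * c.im;
    ref_complex_s32_t a = { ref_sat32(ref_shr30(re)), ref_sat32(ref_shr30(im)) };
    return a;
}
```

For example, treating the mantissas as Q30 values, multiplying 1.0 by the conjugate of j yields -j, which is the sign pattern the formula for \(Im\{a_k\}\) encodes.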

Block Floating-Point

If \(\bar b\) and \(\bar c\) are the complex 32-bit mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the complex 32-bit mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + c\_exp + b\_shr + c\_shr\).

The function vect_complex_s32_conj_mul_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).

Parameters:
  • a[out] Complex output vector \(\bar a\)

  • b[in] Complex input vector \(\bar b\)

  • c[in] Complex input vector \(\bar c\)

  • length[in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)

  • b_shr[in] Right-shift applied to elements of \(\bar b\).

  • c_shr[in] Right-shift applied to elements of \(\bar c\).

Throws ET_LOAD_STORE:

Raised if `a`, `b` or `c` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of the output vector \(\bar a\)

headroom_t vect_complex_s32_headroom(const complex_s32_t x[], const unsigned length)

Calculate the headroom of a complex 32-bit array.

The headroom of an N-bit integer is the number of bits that the integer’s value may be left-shifted without any information being lost. Equivalently, it is one less than the number of leading sign bits.

The headroom of a complex_s32_t struct is the minimum of the headroom of each of its 32-bit fields, re and im.

The headroom of a complex_s32_t array is the minimum of the headroom of each of its complex_s32_t elements.

This function efficiently traverses the elements of \(\bar x\) to determine its headroom.

x[] represents the complex 32-bit vector \(\bar x\). x[] must begin at a word-aligned address.

length is the number of elements in x[].

Operation Performed

\[\begin{aligned} min\!\{ HR_{32}\left(x_0\right), HR_{32}\left(x_1\right), ..., HR_{32}\left(x_{length-1}\right) \} \end{aligned}\]
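
The definitions above can be written out as a plain C reference, useful for checking results on a host. This is a sketch, not the library's optimized routine. The names ref_complex_s32_t, ref_hr32 and ref_vect_complex_s32_headroom are hypothetical, and returning 31 for an empty vector is a choice made only for this sketch.

```c
#include <stdint.h>

/* Hypothetical stand-in for the library's complex_s32_t (fields re and im). */
typedef struct { int32_t re; int32_t im; } ref_complex_s32_t;

/* Headroom of one 32-bit integer: leading sign bits minus one. */
static unsigned ref_hr32(int32_t x) {
    uint32_t u = (uint32_t)x;
    if (x < 0) u = ~u;               /* leading zeros of u == leading sign bits of x */
    unsigned sign_bits = 0;
    while (sign_bits < 32 && (u & (UINT32_C(0x80000000) >> sign_bits)) == 0) {
        sign_bits++;
    }
    return sign_bits - 1;            /* sign_bits is always at least 1 here */
}

/* Headroom of a complex vector: minimum over the re and im fields of every element. */
unsigned ref_vect_complex_s32_headroom(const ref_complex_s32_t x[], unsigned length) {
    unsigned hr = 31;                /* largest headroom a 32-bit value can have */
    for (unsigned k = 0; k < length; k++) {
        unsigned hr_re = ref_hr32(x[k].re);
        unsigned hr_im = ref_hr32(x[k].im);
        if (hr_re < hr) hr = hr_re;
        if (hr_im < hr) hr = hr_im;
    }
    return hr;
}
```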

Parameters:
  • x[in] Complex input vector \(\bar x\)

  • length[in] Number of elements in \(\bar x\)

Throws ET_LOAD_STORE:

Raised if `x` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of vector \(\bar x\)

headroom_t vect_complex_s32_macc(complex_s32_t acc[], const complex_s32_t b[], const complex_s32_t c[], const unsigned length, const right_shift_t acc_shr, const right_shift_t b_shr, const right_shift_t c_shr)

Multiply one complex 32-bit vector element-wise by another, and add the result to an accumulator.

acc[] represents the complex 32-bit accumulator mantissa vector \(\bar a\). Each \(a_k\) is acc[k].

b[] and c[] represent the complex 32-bit input mantissa vectors \(\bar b\) and \(\bar c\), where each \(b_k\) is b[k] and each \(c_k\) is c[k].

Each of the input vectors must begin at a word-aligned address.

length is the number of elements in each of the vectors.

acc_shr, b_shr and c_shr are the signed arithmetic right-shifts applied to input elements \(a_k\), \(b_k\) and \(c_k\).

Operation Performed

\[\begin{split}\begin{aligned} & \tilde{b}_k \leftarrow sat_{32}( b_k \cdot 2^{-b\_shr} ) \\ & \tilde{c}_k \leftarrow sat_{32}( c_k \cdot 2^{-c\_shr} ) \\ & \tilde{a}_k \leftarrow sat_{32}( a_k \cdot 2^{-acc\_shr} ) \\ & v_k \leftarrow round( sat_{32}( ( Re\{\tilde{b}_k\} \cdot Re\{\tilde{c}_k\} - Im\{\tilde{b}_k\} \cdot Im\{\tilde{c}_k\} ) \cdot 2^{-30}) ) \\ & s_k \leftarrow round( sat_{32}( ( Im\{\tilde{b}_k\} \cdot Re\{\tilde{c}_k\} + Re\{\tilde{b}_k\} \cdot Im\{\tilde{c}_k\} ) \cdot 2^{-30}) ) \\ & Re\{a_k\} \leftarrow sat_{32}( Re\{\tilde{a}_k\} + v_k ) \\ & Im\{a_k\} \leftarrow sat_{32}( Im\{\tilde{a}_k\} + s_k ) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If inputs \(\bar b\) and \(\bar c\) are the mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), and input \(\bar a\) is the accumulator BFP vector \(\bar{a} \cdot 2^{a\_exp}\), then the output values of \(\bar a\) have the exponent \(2^{a\_exp + acc\_shr}\).

For accumulation to make sense mathematically, \(acc\_shr\), \(b\_shr\) and \(c\_shr\) must be chosen such that \( a\_exp + acc\_shr = b\_exp + c\_exp + b\_shr + c\_shr \).

The function vect_complex_s32_macc_prepare() can be used to obtain values for \(a\_exp\), \(acc\_shr\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(a\_exp\), \(b\_exp\) and \(c\_exp\) and the input headrooms \(a\_hr\), \(b\_hr\) and \(c\_hr\).

Parameters:
  • acc[inout] Complex accumulator \(\bar a\)

  • b[in] Complex input vector \(\bar b\)

  • c[in] Complex input vector \(\bar c\)

  • length[in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)

  • acc_shr[in] Signed arithmetic right-shift applied to accumulator elements.

  • b_shr[in] Signed arithmetic right-shift applied to elements of \(\bar b\)

  • c_shr[in] Signed arithmetic right-shift applied to elements of \(\bar c\)

Throws ET_LOAD_STORE:

Raised if `acc`, `b` or `c` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of the output vector \(\bar a\)

headroom_t vect_complex_s32_nmacc(complex_s32_t acc[], const complex_s32_t b[], const complex_s32_t c[], const unsigned length, const right_shift_t acc_shr, const right_shift_t b_shr, const right_shift_t c_shr)

Multiply one complex 32-bit vector element-wise by another, and subtract the result from an accumulator.

acc[] represents the complex 32-bit accumulator mantissa vector \(\bar a\). Each \(a_k\) is acc[k].

b[] and c[] represent the complex 32-bit input mantissa vectors \(\bar b\) and \(\bar c\), where each \(b_k\) is b[k] and each \(c_k\) is c[k].

Each of the input vectors must begin at a word-aligned address.

length is the number of elements in each of the vectors.

acc_shr, b_shr and c_shr are the signed arithmetic right-shifts applied to input elements \(a_k\), \(b_k\) and \(c_k\).

Operation Performed

\[\begin{split}\begin{aligned} & \tilde{b}_k \leftarrow sat_{32}( b_k \cdot 2^{-b\_shr} ) \\ & \tilde{c}_k \leftarrow sat_{32}( c_k \cdot 2^{-c\_shr} ) \\ & \tilde{a}_k \leftarrow sat_{32}( a_k \cdot 2^{-acc\_shr} ) \\ & v_k \leftarrow round( sat_{32}( ( Re\{\tilde{b}_k\} \cdot Re\{\tilde{c}_k\} - Im\{\tilde{b}_k\} \cdot Im\{\tilde{c}_k\} ) \cdot 2^{-30}) ) \\ & s_k \leftarrow round( sat_{32}( ( Im\{\tilde{b}_k\} \cdot Re\{\tilde{c}_k\} + Re\{\tilde{b}_k\} \cdot Im\{\tilde{c}_k\} ) \cdot 2^{-30}) ) \\ & Re\{a_k\} \leftarrow sat_{32}( Re\{\tilde{a}_k\} - v_k ) \\ & Im\{a_k\} \leftarrow sat_{32}( Im\{\tilde{a}_k\} - s_k ) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If inputs \(\bar b\) and \(\bar c\) are the mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), and input \(\bar a\) is the accumulator BFP vector \(\bar{a} \cdot 2^{a\_exp}\), then the output values of \(\bar a\) have the exponent \(2^{a\_exp + acc\_shr}\).

For accumulation to make sense mathematically, \(acc\_shr\), \(b\_shr\) and \(c\_shr\) must be chosen such that \( a\_exp + acc\_shr = b\_exp + c\_exp + b\_shr + c\_shr \).

The function vect_complex_s32_macc_prepare() can be used to obtain values for \(a\_exp\), \(acc\_shr\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(a\_exp\), \(b\_exp\) and \(c\_exp\) and the input headrooms \(a\_hr\), \(b\_hr\) and \(c\_hr\).

Parameters:
  • acc[inout] Complex accumulator \(\bar a\)

  • b[in] Complex input vector \(\bar b\)

  • c[in] Complex input vector \(\bar c\)

  • length[in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)

  • acc_shr[in] Signed arithmetic right-shift applied to accumulator elements.

  • b_shr[in] Signed arithmetic right-shift applied to elements of \(\bar b\)

  • c_shr[in] Signed arithmetic right-shift applied to elements of \(\bar c\)

Throws ET_LOAD_STORE:

Raised if `acc`, `b` or `c` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of the output vector \(\bar a\)

headroom_t vect_complex_s32_conj_macc(complex_s32_t acc[], const complex_s32_t b[], const complex_s32_t c[], const unsigned length, const right_shift_t acc_shr, const right_shift_t b_shr, const right_shift_t c_shr)

Multiply one complex 32-bit vector element-wise by the complex conjugate of another, and add the result to an accumulator.

acc[] represents the complex 32-bit accumulator mantissa vector \(\bar a\). Each \(a_k\) is acc[k].

b[] and c[] represent the complex 32-bit input mantissa vectors \(\bar b\) and \(\bar c\), where each \(b_k\) is b[k] and each \(c_k\) is c[k].

Each of the input vectors must begin at a word-aligned address.

length is the number of elements in each of the vectors.

acc_shr, b_shr and c_shr are the signed arithmetic right-shifts applied to input elements \(a_k\), \(b_k\) and \(c_k\).

Operation Performed

\[\begin{split}\begin{aligned} & \tilde{b}_k \leftarrow sat_{32}( b_k \cdot 2^{-b\_shr} ) \\ & \tilde{c}_k \leftarrow sat_{32}( c_k \cdot 2^{-c\_shr} ) \\ & \tilde{a}_k \leftarrow sat_{32}( a_k \cdot 2^{-acc\_shr} ) \\ & v_k \leftarrow round( sat_{32}( ( Re\{\tilde{b}_k\} \cdot Re\{\tilde{c}_k\} + Im\{\tilde{b}_k\} \cdot Im\{\tilde{c}_k\} ) \cdot 2^{-30}) ) \\ & s_k \leftarrow round( sat_{32}( ( Im\{\tilde{b}_k\} \cdot Re\{\tilde{c}_k\} - Re\{\tilde{b}_k\} \cdot Im\{\tilde{c}_k\} ) \cdot 2^{-30}) ) \\ & Re\{a_k\} \leftarrow sat_{32}( Re\{\tilde{a}_k\} + v_k ) \\ & Im\{a_k\} \leftarrow sat_{32}( Im\{\tilde{a}_k\} + s_k ) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If inputs \(\bar b\) and \(\bar c\) are the mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), and input \(\bar a\) is the accumulator BFP vector \(\bar{a} \cdot 2^{a\_exp}\), then the output values of \(\bar a\) have the exponent \(2^{a\_exp + acc\_shr}\).

For accumulation to make sense mathematically, \(acc\_shr\), \(b\_shr\) and \(c\_shr\) must be chosen such that \( a\_exp + acc\_shr = b\_exp + c\_exp + b\_shr + c\_shr \).

The function vect_complex_s32_conj_macc_prepare() can be used to obtain values for \(a\_exp\), \(acc\_shr\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(a\_exp\), \(b\_exp\) and \(c\_exp\) and the input headrooms \(a\_hr\), \(b\_hr\) and \(c\_hr\).

Parameters:
  • acc[inout] Complex accumulator \(\bar a\)

  • b[in] Complex input vector \(\bar b\)

  • c[in] Complex input vector \(\bar c\)

  • length[in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)

  • acc_shr[in] Signed arithmetic right-shift applied to accumulator elements.

  • b_shr[in] Signed arithmetic right-shift applied to elements of \(\bar b\)

  • c_shr[in] Signed arithmetic right-shift applied to elements of \(\bar c\)

Throws ET_LOAD_STORE:

Raised if `acc`, `b` or `c` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of the output vector \(\bar a\)

headroom_t vect_complex_s32_conj_nmacc(complex_s32_t acc[], const complex_s32_t b[], const complex_s32_t c[], const unsigned length, const right_shift_t acc_shr, const right_shift_t b_shr, const right_shift_t c_shr)

Multiply one complex 32-bit vector element-wise by the complex conjugate of another, and subtract the result from an accumulator.

acc[] represents the complex 32-bit accumulator mantissa vector \(\bar a\). Each \(a_k\) is acc[k].

b[] and c[] represent the complex 32-bit input mantissa vectors \(\bar b\) and \(\bar c\), where each \(b_k\) is b[k] and each \(c_k\) is c[k].

Each of the input vectors must begin at a word-aligned address.

length is the number of elements in each of the vectors.

acc_shr, b_shr and c_shr are the signed arithmetic right-shifts applied to input elements \(a_k\), \(b_k\) and \(c_k\).

Operation Performed

\[\begin{split}\begin{aligned} & \tilde{b}_k \leftarrow sat_{32}( b_k \cdot 2^{-b\_shr} ) \\ & \tilde{c}_k \leftarrow sat_{32}( c_k \cdot 2^{-c\_shr} ) \\ & \tilde{a}_k \leftarrow sat_{32}( a_k \cdot 2^{-acc\_shr} ) \\ & v_k \leftarrow round( sat_{32}( ( Re\{\tilde{b}_k\} \cdot Re\{\tilde{c}_k\} + Im\{\tilde{b}_k\} \cdot Im\{\tilde{c}_k\} ) \cdot 2^{-30}) ) \\ & s_k \leftarrow round( sat_{32}( ( Im\{\tilde{b}_k\} \cdot Re\{\tilde{c}_k\} - Re\{\tilde{b}_k\} \cdot Im\{\tilde{c}_k\} ) \cdot 2^{-30}) ) \\ & Re\{a_k\} \leftarrow sat_{32}( Re\{\tilde{a}_k\} - v_k ) \\ & Im\{a_k\} \leftarrow sat_{32}( Im\{\tilde{a}_k\} - s_k ) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If inputs \(\bar b\) and \(\bar c\) are the mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), and input \(\bar a\) is the accumulator BFP vector \(\bar{a} \cdot 2^{a\_exp}\), then the output values of \(\bar a\) have the exponent \(2^{a\_exp + acc\_shr}\).

For accumulation to make sense mathematically, \(acc\_shr\), \(b\_shr\) and \(c\_shr\) must be chosen such that \( a\_exp + acc\_shr = b\_exp + c\_exp + b\_shr + c\_shr \).

The function vect_complex_s32_conj_nmacc_prepare() can be used to obtain values for \(a\_exp\), \(acc\_shr\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(a\_exp\), \(b\_exp\) and \(c\_exp\) and the input headrooms \(a\_hr\), \(b\_hr\) and \(c\_hr\).

Parameters:
  • acc[inout] Complex accumulator \(\bar a\)

  • b[in] Complex input vector \(\bar b\)

  • c[in] Complex input vector \(\bar c\)

  • length[in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)

  • acc_shr[in] Signed arithmetic right-shift applied to accumulator elements.

  • b_shr[in] Signed arithmetic right-shift applied to elements of \(\bar b\)

  • c_shr[in] Signed arithmetic right-shift applied to elements of \(\bar c\)

Throws ET_LOAD_STORE:

Raised if `acc`, `b` or `c` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of the output vector \(\bar a\)

headroom_t vect_complex_s32_mag(int32_t a[], const complex_s32_t b[], const unsigned length, const right_shift_t b_shr, const complex_s32_t *rot_table, const unsigned table_rows)

Compute the magnitude of each element of a complex 32-bit vector.

a[] represents the real 32-bit output mantissa vector \(\bar a\).

b[] represents the complex 32-bit input mantissa vector \(\bar b\).

a[] and b[] must each begin at a word-aligned address.

length is the number of elements in each of the vectors.

b_shr is the signed arithmetic right-shift applied to elements of \(\bar b\).

rot_table must point to a pre-computed table of complex vectors used in calculating the magnitudes. table_rows is the number of rows in the table. This library is distributed with a default version of the required rotation table. The following symbols can be used to refer to it in user code:

const extern unsigned rot_table32_rows;
const extern complex_s32_t rot_table32[30][4];

Faster computation (with reduced precision) can be achieved by generating a smaller version of the table. A Python script is provided to generate this table.

Operation Performed

\[\begin{split}\begin{aligned} & v_k \leftarrow b_k \cdot 2^{-b\_shr} \\ & a_k \leftarrow \sqrt { {\left( Re\{v_k\} \right)}^2 + {\left( Im\{v_k\} \right)}^2 } \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]
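
For checking results on a host, the ideal operation above can be modelled with exact integer arithmetic. The sketch below does not use a rotation table and is not how the library computes magnitudes, so library output should only be expected to agree approximately. The names ref_complex_s32_t, ref_isqrt64, ref_floor_shr and ref_complex_mag are hypothetical. Restricting b_shr to 0..31, rounding the shift toward negative infinity, and taking the floor of the square root are assumptions of this sketch.

```c
#include <stdint.h>

/* Hypothetical stand-in for the library's complex_s32_t (fields re and im). */
typedef struct { int32_t re; int32_t im; } ref_complex_s32_t;

/* Floor of the square root of a 64-bit unsigned integer (bitwise method). */
static uint32_t ref_isqrt64(uint64_t n) {
    uint64_t root = 0;
    uint64_t bit = (uint64_t)1 << 62;
    while (bit > n) bit >>= 2;
    while (bit != 0) {
        if (n >= root + bit) {
            n -= root + bit;
            root = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
        bit >>= 2;
    }
    return (uint32_t)root;
}

/* x * 2^-shr rounded toward negative infinity, for shr in 0..31. */
static int64_t ref_floor_shr(int32_t x, unsigned shr) {
    int64_t d = (int64_t)1 << shr;
    int64_t q = x / d;
    if (x % d != 0 && x < 0) q -= 1;
    return q;
}

/* Magnitude of one shifted element, clamped to the int32_t range. */
int32_t ref_complex_mag(ref_complex_s32_t b, unsigned b_shr) {
    int64_t re = ref_floor_shr(b.re, b_shr);
    int64_t im = ref_floor_shr(b.im, b_shr);
    uint64_t sum = (uint64_t)(re * re) + (uint64_t)(im * im);
    uint32_t mag = ref_isqrt64(sum);
    return (mag > (uint32_t)INT32_MAX) ? INT32_MAX : (int32_t)mag;
}
```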

Block Floating-Point

If \(\bar b\) are the complex 32-bit mantissas of a BFP vector \( \bar{b} \cdot 2^{b\_exp} \), then the resulting vector \(\bar a\) are the real 32-bit mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + b\_shr\).

The function vect_complex_s32_mag_prepare() can be used to obtain values for \(a\_exp\) and \(b\_shr\) based on the input exponent \(b\_exp\) and headroom \(b\_hr\).

Parameters:
  • a[out] Real output vector \(\bar a\)

  • b[in] Complex input vector \(\bar b\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

  • b_shr[in] Right-shift applied to \(\bar b\)

  • rot_table[in] Pre-computed rotation table required for calculating magnitudes

  • table_rows[in] Number of rows in rot_table

Throws ET_LOAD_STORE:

Raised if `a` or `b` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of the output vector \(\bar a\).

headroom_t vect_complex_s32_mul(complex_s32_t a[], const complex_s32_t b[], const complex_s32_t c[], const unsigned length, const right_shift_t b_shr, const right_shift_t c_shr)

Multiply one complex 32-bit vector element-wise by another.

a[], b[] and c[] represent the complex 32-bit mantissa vectors \(\bar a\), \(\bar b\) and \(\bar c\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[] or c[].

length is the number of elements in each of the vectors.

b_shr and c_shr are the signed arithmetic right-shifts applied to each element of \(\bar b\) and \(\bar c\) respectively.

Operation Performed

\[\begin{split}\begin{aligned} & b_k' \leftarrow sat_{32}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & c_k' \leftarrow sat_{32}(\lfloor c_k \cdot 2^{-c\_shr} \rfloor) \\ & Re\{a_k\} \leftarrow \left( Re\{b_k'\} \cdot Re\{c_k'\} - Im\{b_k'\} \cdot Im\{c_k'\} \right) \cdot 2^{-30} \\ & Im\{a_k\} \leftarrow \left( Im\{b_k'\} \cdot Re\{c_k'\} + Re\{b_k'\} \cdot Im\{c_k'\} \right) \cdot 2^{-30} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) and \(\bar c\) are the complex 32-bit mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the complex 32-bit mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + c\_exp + b\_shr + c\_shr\).

The function vect_complex_s32_mul_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).

Parameters:
  • a[out] Complex output vector \(\bar a\)

  • b[in] Complex input vector \(\bar b\)

  • c[in] Complex input vector \(\bar c\)

  • length[in] Number of elements in vectors \(\bar a\), \(\bar b\), and \(\bar c\)

  • b_shr[in] Right-shift applied to \(\bar b\)

  • c_shr[in] Right-shift applied to \(\bar c\)

Throws ET_LOAD_STORE:

Raised if `a`, `b` or `c` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of the output vector \(\bar a\)

headroom_t vect_complex_s32_real_mul(complex_s32_t a[], const complex_s32_t b[], const int32_t c[], const unsigned length, const right_shift_t b_shr, const right_shift_t c_shr)

Multiply a complex 32-bit vector element-wise by a real 32-bit vector.

a[] and b[] represent the complex 32-bit mantissa vectors \(\bar a\) and \(\bar b\) respectively.

c[] represents the real 32-bit mantissa vector \(\bar c\).

a[], b[], and c[] each must begin at a word-aligned address. This operation can be performed safely in-place on b[].

length is the number of elements in each of the vectors.

b_shr and c_shr are the signed arithmetic right-shifts applied to each element of \(\bar b\) and \(\bar c\) respectively.

Operation Performed

\[\begin{split}\begin{aligned} & b_k' \leftarrow sat_{32}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & c_k' \leftarrow sat_{32}(\lfloor c_k \cdot 2^{-c\_shr} \rfloor) \\ & Re\{a_k\} \leftarrow \left( Re\{b_k'\} \cdot c_k' \right) \cdot 2^{-30} \\ & Im\{a_k\} \leftarrow \left( Im\{b_k'\} \cdot c_k' \right) \cdot 2^{-30} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]
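
A host-side reference for the last two steps of this operation might look like the following sketch. It is not library code. The names ref_complex_s32_t, ref_sat32, ref_shr30 and ref_real_mul are hypothetical. The inputs are assumed to be already shifted by b_shr and c_shr. Rounding toward negative infinity and clamping to the full int32_t range are assumptions of this sketch.

```c
#include <stdint.h>

/* Hypothetical stand-in for the library's complex_s32_t (fields re and im). */
typedef struct { int32_t re; int32_t im; } ref_complex_s32_t;

/* Clamp a 64-bit value to the int32_t range (assumed saturation bounds). */
static int32_t ref_sat32(int64_t v) {
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* Multiply by 2^-30, rounding toward negative infinity. */
static int64_t ref_shr30(int64_t v) {
    int64_t d = (int64_t)1 << 30;
    int64_t q = v / d;
    if (v % d != 0 && v < 0) q -= 1;
    return q;
}

/* a = b * c * 2^-30, where b is complex and c is real. */
ref_complex_s32_t ref_real_mul(ref_complex_s32_t b, int32_t c) {
    ref_complex_s32_t a = {
        ref_sat32(ref_shr30((int64_t)b.re * c)),
        ref_sat32(ref_shr30((int64_t)b.im * c))
    };
    return a;
}
```

Read as Q30 values, scaling (1.0 - 0.5j) by 0.5 gives (0.5 - 0.25j), which matches the real and imaginary parts being scaled independently.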

Block Floating-Point

If \(\bar b\) are the complex 32-bit mantissas of a BFP vector \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar c\) are the real 32-bit mantissas of a BFP vector \(\bar{c} \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the complex 32-bit mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + c\_exp + b\_shr + c\_shr\).

The function vect_complex_s32_real_mul_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).

Parameters:
  • a[out] Complex output vector \(\bar a\).

  • b[in] Complex input vector \(\bar b\).

  • c[in] Real input vector \(\bar c\).

  • length[in] Number of elements in vectors \(\bar a\), \(\bar b\), and \(\bar c\).

  • b_shr[in] Right-shift applied to \(\bar b\).

  • c_shr[in] Right-shift applied to \(\bar c\).

Throws ET_LOAD_STORE:

Raised if `a`, `b` or `c` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of the output vector \(\bar a\).

headroom_t vect_complex_s32_real_scale(complex_s32_t a[], const complex_s32_t b[], const int32_t c, const unsigned length, const right_shift_t b_shr, const right_shift_t c_shr)

Multiply a complex 32-bit vector by a real scalar.

a[] and b[] represent the complex 32-bit mantissa vectors \(\bar a\) and \(\bar b\) respectively.

c represents the real 32-bit scale factor \(c\).

a[] and b[] each must begin at a word-aligned address. This operation can be performed safely in-place on b[].

length is the number of elements in each of the vectors.

b_shr and c_shr are the signed arithmetic right-shifts applied to each element of \(\bar b\) and to \(c\).

Operation Performed

\[\begin{split}\begin{aligned} & b_k' \leftarrow sat_{32}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & Re\{a_k\} \leftarrow Re\{b_k'\} \cdot c \\ & Im\{a_k\} \leftarrow Im\{b_k'\} \cdot c \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) are the complex 32-bit mantissas of a BFP vector \( \bar{b} \cdot 2^{b\_exp} \) and \(c\) is the real 32-bit mantissa of floating-point value \(c \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the complex 32-bit mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + c\_exp + b\_shr + c\_shr\).

The function vect_complex_s32_real_scale_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).

Parameters:
  • a[out] Complex output vector \(\bar a\)

  • b[in] Complex input vector \(\bar b\)

  • c[in] Real scale factor \(c\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

  • b_shr[in] Right-shift applied to \(\bar b\)

  • c_shr[in] Right-shift applied to \(c\)

Throws ET_LOAD_STORE:

Raised if `a` or `b` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of the output vector \(\bar a\).

headroom_t vect_complex_s32_scale(complex_s32_t a[], const complex_s32_t b[], const int32_t c_real, const int32_t c_imag, const unsigned length, const right_shift_t b_shr, const right_shift_t c_shr)

Multiply a complex 32-bit vector by a complex 32-bit scalar.

a[] and b[] represent the complex 32-bit mantissa vectors \(\bar a\) and \(\bar b\) respectively.

c_real and c_imag are the real and imaginary parts of the complex 32-bit scale factor \(c\).

a[] and b[] each must begin at a word-aligned address. This operation can be performed safely in-place on b[].

length is the number of elements in each of the vectors.

b_shr and c_shr are the signed arithmetic right-shifts applied to each element of \(\bar b\) and to \(c\).

Operation Performed

\[\begin{split}\begin{aligned} & b_k' \leftarrow sat_{32}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & Re\{a_k\} \leftarrow \left( Re\{b_k'\} \cdot Re\{c\} - Im\{b_k'\} \cdot Im\{c\} \right) \cdot 2^{-30} \\ & Im\{a_k\} \leftarrow \left( Re\{b_k'\} \cdot Im\{c\} + Im\{b_k'\} \cdot Re\{c\} \right) \cdot 2^{-30} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) are the complex 32-bit mantissas of a BFP vector \( \bar{b} \cdot 2^{b\_exp} \) and \(c\) is the complex 32-bit mantissa of floating-point value \(c \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + c\_exp + b\_shr + c\_shr\).

The function vect_complex_s32_mul_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).

Parameters:
  • a[out] Complex output vector \(\bar a\).

  • b[in] Complex input vector \(\bar b\).

  • c_real[in] Real part of \(c\)

  • c_imag[in] Imaginary part of \(c\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\).

  • b_shr[in] Right-shift applied to \(\bar b\).

  • c_shr[in] Right-shift applied to \(c\).

Throws ET_LOAD_STORE:

Raised if `a` or `b` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of the output vector \(\bar a\).

void vect_complex_s32_set(complex_s32_t a[], const int32_t b_real, const int32_t b_imag, const unsigned length)

Set each element of a complex 32-bit vector to a specified value.

a[] represents a complex 32-bit vector \(\bar a\). a[] must begin at a word-aligned address.

b_real and b_imag are the real and imaginary parts to which each element will be set.

length is the number of elements in a[].

Operation Performed

\[\begin{split}\begin{aligned} & a_k \leftarrow b\_real + j\cdot b\_imag \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \\ & \qquad\text{ where } j^2 = -1 \end{aligned}\end{split}\]

Block Floating-Point

If \(b\) is the mantissa of floating-point value \(b \cdot 2^{b\_exp}\), then the output vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp\).

Parameters:
  • a[out] Complex output vector \(\bar a\)

  • b_real[in] Value to set real part of elements of \(\bar a\) to

  • b_imag[in] Value to set imaginary part of elements of \(\bar a\) to

  • length[in] Number of elements in \(\bar a\)

Throws ET_LOAD_STORE:

Raised if `a` is not word-aligned (See Note: Vector Alignment)

headroom_t vect_complex_s32_shl(complex_s32_t a[], const complex_s32_t b[], const unsigned length, const left_shift_t b_shl)

Left-shift each element of a complex 32-bit vector by a specified number of bits.

a[] and b[] represent the complex 32-bit mantissa vectors \(\bar a\) and \(\bar b\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[].

length is the number of elements in \(\bar a\) and \(\bar b\).

b_shl is the signed arithmetic left-shift applied to each element of \(\bar b\).

Operation Performed

\[\begin{split}\begin{aligned} & Re\{a_k\} \leftarrow sat_{32}(\lfloor Re\{b_k\} \cdot 2^{b\_shl} \rfloor) \\ & Im\{a_k\} \leftarrow sat_{32}(\lfloor Im\{b_k\} \cdot 2^{b\_shl} \rfloor) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) are the complex 32-bit mantissas of a BFP vector \( \bar{b} \cdot 2^{b\_exp} \), then the resulting vector \(\bar a\) are the complex 32-bit mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(\bar{a} = \bar{b} \cdot 2^{b\_shl}\) and \(a\_exp = b\_exp\).

Parameters:
  • a[out] Complex output vector \(\bar a\)

  • b[in] Complex input vector \(\bar b\)

  • length[in] Number of elements in vector \(\bar b\)

  • b_shl[in] Left-shift applied to \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `a` or `b` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of the output vector \(\bar a\)

headroom_t vect_complex_s32_shr(complex_s32_t a[], const complex_s32_t b[], const unsigned length, const right_shift_t b_shr)

Right-shift each element of a complex 32-bit vector by a specified number of bits.

a[] and b[] represent the complex 32-bit mantissa vectors \(\bar a\) and \(\bar b\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[].

length is the number of elements in \(\bar a\) and \(\bar b\).

b_shr is the signed arithmetic right-shift applied to each element of \(\bar b\).

Operation Performed

\[\begin{split}\begin{aligned} & Re\{a_k\} \leftarrow sat_{32}(\lfloor Re\{b_k\} \cdot 2^{-b\_shr} \rfloor) \\ & Im\{a_k\} \leftarrow sat_{32}(\lfloor Im\{b_k\} \cdot 2^{-b\_shr} \rfloor) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) are the complex 32-bit mantissas of a BFP vector \( \bar{b} \cdot 2^{b\_exp} \), then the resulting vector \(\bar a\) are the complex 32-bit mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(\bar{a} = \bar{b} \cdot 2^{-b\_shr}\) and \(a\_exp = b\_exp\).

Parameters:
  • a[out] Complex output vector \(\bar a\)

  • b[in] Complex input vector \(\bar b\)

  • length[in] Number of elements in vector \(\bar b\)

  • b_shr[in] Right-shift applied to \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `a` or `b` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of the output vector \(\bar a\)

headroom_t vect_complex_s32_squared_mag(int32_t a[], const complex_s32_t b[], const unsigned length, const right_shift_t b_shr)

Computes the squared magnitudes of elements of a complex 32-bit vector.

a[] represents the real 32-bit mantissa vector \(\bar a\). b[] represents the complex 32-bit mantissa vector \(\bar b\). Each must begin at a word-aligned address.

length is the number of elements in each of the vectors.

b_shr is the signed arithmetic right-shift applied to each element of \(\bar b\).

Operation Performed

\[\begin{split}\begin{aligned} & b_k' \leftarrow sat_{32}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & a_k \leftarrow ((Re\{b_k'\})^2 + (Im\{b_k'\})^2)\cdot 2^{-30} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]
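
The second line of this operation can be modelled on a host with 64-bit arithmetic, as in the sketch below. It is not the library implementation. The names ref_complex_s32_t and ref_squared_mag are hypothetical. The input is assumed to be already shifted by b_shr. Truncating the scaled result and clamping it to INT32_MAX are assumptions of this sketch.

```c
#include <stdint.h>

/* Hypothetical stand-in for the library's complex_s32_t (fields re and im). */
typedef struct { int32_t re; int32_t im; } ref_complex_s32_t;

/* (re^2 + im^2) * 2^-30 for one already-shifted element. */
int32_t ref_squared_mag(ref_complex_s32_t b) {
    int64_t re = b.re;
    int64_t im = b.im;
    /* Each square is at most 2^62, so the unsigned sum cannot overflow. */
    uint64_t sum = (uint64_t)(re * re) + (uint64_t)(im * im);
    uint64_t scaled = sum >> 30;
    return (scaled > (uint64_t)INT32_MAX) ? INT32_MAX : (int32_t)scaled;
}
```

An element with both parts near full scale overflows the 32-bit result in this model, which illustrates why a suitable b_shr matters before squaring.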

Block Floating-Point

If \(\bar b\) are the complex 32-bit mantissas of a BFP vector \( \bar{b} \cdot 2^{b\_exp} \), then the resulting vector \(\bar a\) are the real 32-bit mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = 2 \cdot (b\_exp + b\_shr)\).

The function vect_complex_s32_squared_mag_prepare() can be used to obtain values for \(a\_exp\) and \(b\_shr\) based on the input exponent \(b\_exp\) and headroom \(b\_hr\).

Parameters:
  • a[out] Real output vector \(\bar a\)

  • b[in] Complex input vector \(\bar b\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

  • b_shr[in] Right-shift applied to \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `a` is not double word-aligned or `b` is not word-aligned (See Note: Vector Alignment)
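
As a rough illustration of the operation described above, the following plain-C reference model computes a single output element. It is a sketch under stated assumptions, not the library's implementation: `cplx32_t`, `shr_floor`, `sat32` and `squared_mag_model` are hypothetical names, the final scaling by \(2^{-30}\) truncates (the rounding used by the hardware is not stated here), and the result is returned as a 64-bit value rather than being narrowed to 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a complex 32-bit mantissa (not the library type). */
typedef struct { int32_t re; int32_t im; } cplx32_t;

/* floor(x * 2^-shr); a negative shr is a left shift. Written with division
 * so it does not rely on right-shifting negative values. */
static int64_t shr_floor(int64_t x, int shr)
{
    assert(shr > -31 && shr < 62);
    if (shr < 0) return x * ((int64_t)1 << -shr);
    int64_t d = (int64_t)1 << shr;
    int64_t q = x / d;
    if (x % d != 0 && x < 0) q -= 1;
    return q;
}

/* Saturate to the signed 32-bit range. */
static int32_t sat32(int64_t x)
{
    if (x > INT32_MAX) return INT32_MAX;
    if (x < INT32_MIN) return INT32_MIN;
    return (int32_t)x;
}

/* a_k = (Re{b_k'}^2 + Im{b_k'}^2) * 2^-30,
 * with b_k' = sat32(floor(b_k * 2^-b_shr)). */
static int64_t squared_mag_model(cplx32_t b, int b_shr)
{
    int64_t re = sat32(shr_floor(b.re, b_shr));
    int64_t im = sat32(shr_floor(b.im, b_shr));
    /* Each square is at most 2^62, so the sum fits in an unsigned 64-bit value. */
    uint64_t sum = (uint64_t)(re * re) + (uint64_t)(im * im);
    return (int64_t)(sum >> 30);
}
```

For example, the mantissa pair (3·2^15, 4·2^15) with b_shr = 0 gives 9·2^30 + 16·2^30 = 25·2^30, which scales to 25.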

headroom_t vect_complex_s32_sub(complex_s32_t a[], const complex_s32_t b[], const complex_s32_t c[], const unsigned length, const right_shift_t b_shr, const right_shift_t c_shr)

Subtract one complex 32-bit vector from another.

a[], b[] and c[] represent the complex 32-bit mantissa vectors \(\bar a\), \(\bar b\) and \(\bar c\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[] or c[].

length is the number of elements in each of the vectors.

b_shr and c_shr are the signed arithmetic right-shifts applied to each element of \(\bar b\) and \(\bar c\) respectively.

Operation Performed

\[\begin{split}\begin{aligned} & b_k' \leftarrow sat_{32}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & c_k' \leftarrow sat_{32}(\lfloor c_k \cdot 2^{-c\_shr} \rfloor) \\ & Re\{a_k\} \leftarrow Re\{b_k'\} - Re\{c_k'\} \\ & Im\{a_k\} \leftarrow Im\{b_k'\} - Im\{c_k'\} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) and \(\bar c\) are the complex 32-bit mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the complex 32-bit mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\).

In this case, \(b\_shr\) and \(c\_shr\) must be chosen so that \(a\_exp = b\_exp + b\_shr = c\_exp + c\_shr\). Adding or subtracting mantissas only makes sense if they are associated with the same exponent.

The function vect_complex_s32_sub_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).

Parameters:
  • a[out] Complex output vector \(\bar a\)

  • b[in] Complex input vector \(\bar b\)

  • c[in] Complex input vector \(\bar c\)

  • length[in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)

  • b_shr[in] Right-shift applied to \(\bar b\)

  • c_shr[in] Right-shift applied to \(\bar c\)

Throws ET_LOAD_STORE:

Raised if `a`, `b` or `c` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of output vector \(\bar a\).
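
The shift-then-subtract step can be sketched for a single element in plain C. This is an illustrative model only: `cplx32_t`, `shr_sat32` and `complex_sub_model` are hypothetical names, and clamping the final difference to 32 bits is an assumption, since the formula above does not show it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a complex 32-bit mantissa (not the library type). */
typedef struct { int32_t re; int32_t im; } cplx32_t;

/* sat32(floor(x * 2^-shr)); a negative shr is a left shift. */
static int32_t shr_sat32(int32_t x, int shr)
{
    assert(shr > -32 && shr < 32);
    int64_t v = x;
    if (shr < 0) {
        v *= (int64_t)1 << -shr;
    } else {
        int64_t d = (int64_t)1 << shr;
        int64_t q = v / d;
        if (v % d != 0 && v < 0) q -= 1;
        v = q;
    }
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* Clamp a 64-bit intermediate to 32 bits (assumption, see lead-in). */
static int32_t clamp32(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* a = b' - c', where b' and c' are the shifted, saturated inputs. */
static cplx32_t complex_sub_model(cplx32_t b, cplx32_t c, int b_shr, int c_shr)
{
    cplx32_t a;
    a.re = clamp32((int64_t)shr_sat32(b.re, b_shr) - shr_sat32(c.re, c_shr));
    a.im = clamp32((int64_t)shr_sat32(b.im, b_shr) - shr_sat32(c.im, c_shr));
    return a;
}
```

With b = (100, -40) at b_exp = -3 and c = (8, 8) at c_exp = -1, choosing b_shr = 1 and c_shr = -1 puts both inputs at a_exp = -2. The model then yields a = (34, -36), i.e. 8.5 - 9i, which matches (12.5 - 5i) - (4 + 4i).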

void vect_complex_s32_sum(complex_s64_t *a, const complex_s32_t b[], const unsigned length, const right_shift_t b_shr)

Compute the sum of elements of a complex 32-bit vector.

a is the complex 64-bit mantissa of the resulting sum.

b[] represents the complex 32-bit mantissa vector \(\bar b\). b[] must begin at a word-aligned address.

length is the number of elements in \(\bar b\).

b_shr is the unsigned arithmetic right-shift applied to each element of \(\bar b\). b_shr cannot be negative.

Operation Performed

\[\begin{split}\begin{aligned} & b_k' \leftarrow b_k \cdot 2^{-b\_shr} \\ & Re\{a\} \leftarrow \sum_{k=0}^{length-1} \left( Re\{b_k'\} \right) \\ & Im\{a\} \leftarrow \sum_{k=0}^{length-1} \left( Im\{b_k'\} \right) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) are the mantissas of BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then \(a\) is the complex 64-bit mantissa of floating-point value \(a \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + b\_shr\).

The function vect_complex_s32_sum_prepare() can be used to obtain values for \(a\_exp\) and \(b\_shr\) based on the input exponent \(b\_exp\) and the input headroom \(b\_hr\).

Additional Details

Internally the sum accumulates into four separate complex 40-bit accumulators. These accumulators apply symmetric 40-bit saturation logic (with bounds \(\pm(2^{39}-1)\)) with each added element. At the end, the 4 accumulators are summed together into the 64-bit fields of a. No saturation logic is applied at this final step.

In the most extreme case, each \(b_k\) may be \(-2^{31}\). \(256\) of these added into the same accumulator is \(-2^{39}\) which would saturate to \(-2^{39}+1\), introducing 1 LSb of error (which may or may not be acceptable given a particular circumstance). The final result for each part then may be as large as \(4\cdot(-2^{39}+1) = -2^{41}+4 \), each fitting into a 42-bit signed integer.

Parameters:
  • a[out] Complex sum \(a\)

  • b[in] Complex input vector \(\bar b\).

  • length[in] Number of elements in vector \(\bar b\).

  • b_shr[in] Right-shift applied to \(\bar b\).

Throws ET_LOAD_STORE:

Raised if `b` is not word-aligned (See Note: Vector Alignment)
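
The saturation arithmetic in the extreme case described under Additional Details can be checked with a small plain-C model of one accumulator. The names `acc40_add` and `acc40_fill` are hypothetical, and the model covers only a single accumulator; how elements are distributed across the four accumulators is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Largest magnitude held by a symmetric 40-bit accumulator: 2^39 - 1. */
#define ACC40_MAX ((((int64_t)1) << 39) - 1)

/* Add one element to an accumulator with symmetric 40-bit saturation. */
static int64_t acc40_add(int64_t acc, int32_t x)
{
    assert(acc >= -ACC40_MAX && acc <= ACC40_MAX);
    int64_t s = acc + x;
    if (s > ACC40_MAX) return ACC40_MAX;
    if (s < -ACC40_MAX) return -ACC40_MAX;
    return s;
}

/* Accumulate n copies of x into a single accumulator. */
static int64_t acc40_fill(int32_t x, unsigned n)
{
    int64_t acc = 0;
    for (unsigned i = 0; i < n; i++) {
        acc = acc40_add(acc, x);
    }
    return acc;
}
```

Filling one accumulator with 255 copies of \(-2^{31}\) stays exact, while the 256th copy saturates to \(-2^{39}+1\). Four such accumulators summed without saturation give \(-2^{41}+4\).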

void vect_complex_s32_tail_reverse(complex_s32_t x[], const unsigned length)

Reverses the order of the tail of a complex 32-bit vector.

Reverses the order of elements in the tail of the complex 32-bit vector \(\bar x\). The tail of \(\bar x\), in this context, is all elements of \(\bar x\) except for \(x_0\). In other words, the first element \(x_0\) remains where it is, and the remaining \(length-1\) elements are rearranged to have their order reversed.

This function is used when performing a forward or inverse FFT on a single sequence of real values (i.e. the mono FFT), and operates in-place on x[].

Parameter Details

x[] represents the complex 32-bit vector \(\bar x\), which is both an input to and an output of this function. x[] must begin at a word-aligned address.

length is the number of elements in \(\bar x\).

Operation Performed

\[\begin{split}\begin{aligned} & x_0 \leftarrow x_0 \\ & x_k \leftarrow x_{length - k} \\ & \qquad\text{ for }k\in 1\ ...\ (length-1) \end{aligned}\end{split}\]

Parameters:
  • x[inout] Complex vector to have its tail reversed.

  • length[in] Number of elements in \(\bar x\)

Throws ET_LOAD_STORE:

Raised if `x` is not word-aligned (See Note: Vector Alignment)
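
A plain-C model of the tail reversal is sketched below for reference. `cplx32_t` and `tail_reverse_model` are hypothetical names; the sketch shows only the reordering and makes no claim about how the library implements it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a complex 32-bit element (not the library type). */
typedef struct { int32_t re; int32_t im; } cplx32_t;

/* Leaves x[0] in place and reverses x[1] .. x[length-1] in place,
 * so that the new x[k] is the old x[length - k] for k >= 1. */
static void tail_reverse_model(cplx32_t x[], unsigned length)
{
    assert(x != NULL);
    if (length < 3) return;  /* a tail of 0 or 1 elements is unchanged */
    unsigned lo = 1;
    unsigned hi = length - 1;
    while (lo < hi) {
        cplx32_t tmp = x[lo];
        x[lo] = x[hi];
        x[hi] = tmp;
        lo++;
        hi--;
    }
}
```

For a 5-element vector with real parts 0, 1, 2, 3, 4 the result has real parts 0, 4, 3, 2, 1.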

headroom_t vect_complex_s32_conjugate(complex_s32_t a[], const complex_s32_t b[], const unsigned length)

Get the complex conjugate of a complex 32-bit vector.

The complex conjugate of a complex scalar \(z = x + yi\) is \(z^* = x - yi\). This function computes the complex conjugate of each element of \(\bar b\) (negates the imaginary part of each element) and places the result in \(\bar a\).

a[] is the complex 32-bit output vector \(\bar a\).

b[] is the complex 32-bit input vector \(\bar b\).

Both a and b must point to word-aligned addresses.

length is the number of elements in \(\bar a\) and \(\bar b\).

Operation Performed

\[\begin{split}\begin{aligned} & Re\{a_k\} \leftarrow Re\{b_k\} \\ & Im\{a_k\} \leftarrow - Im\{b_k\} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Parameters:
  • a[out] Complex 32-bit output vector \(\bar a\)

  • b[in] Complex 32-bit input vector \(\bar b\)

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `a` or `b` is not word-aligned (See Note: Vector Alignment)

Returns:

Headroom of the output vector \(\bar a\).

void vect_complex_s32_to_vect_complex_s16(int16_t a_real[], int16_t a_imag[], const complex_s32_t b[], const unsigned length, const right_shift_t b_shr)

Convert a complex 32-bit vector into a complex 16-bit vector.

This function converts a complex 32-bit mantissa vector \(\bar b\) into a complex 16-bit mantissa vector \(\bar a\). Conceptually, the output BFP vector \(\bar{a}\cdot 2^{a\_exp}\) represents the same value as the input BFP vector \(\bar{b}\cdot 2^{b\_exp}\), only with a reduced bit-depth.

In most cases \(b\_shr\) should be \(16 - b\_hr\), where \(b\_hr\) is the headroom of the 32-bit input mantissa vector \(\bar b\). The output exponent \(a\_exp\) will then be given by

\( a\_exp = b\_exp + b\_shr \)

Parameter Details

a_real[] and a_imag[] together represent the complex 16-bit output mantissa vector \(\bar a\), with the real part of each \(a_k\) going in a_real[] and the imaginary part going in a_imag[].

b[] represents the complex 32-bit mantissa vector \(\bar b\).

a_real[], a_imag[] and b[] must each begin at a word-aligned address.

length is the number of elements in each of the vectors.

b_shr is the signed arithmetic right-shift applied to elements of \(\bar b\).

Operation Performed

\[\begin{split}\begin{aligned} & b_k' \leftarrow sat_{16}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & Re\{a_k\} \leftarrow Re\{b_k'\} \\ & Im\{a_k\} \leftarrow Im\{b_k'\} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{aligned}\end{split}\]

Block Floating-Point

If \(\bar b\) are the complex 32-bit mantissas of a BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the resulting vector \(\bar a\) are the complex 16-bit mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + b\_shr\).

Parameters:
  • a_real[out] Real part of complex output vector \(\bar a\).

  • a_imag[out] Imaginary part of complex output vector \(\bar a\).

  • b[in] Complex input vector \(\bar b\).

  • length[in] Number of elements in vectors \(\bar a\) and \(\bar b\)

  • b_shr[in] Right-shift applied to \(\bar b\).

Throws ET_LOAD_STORE:

Raised if `a_real`, `a_imag` or `b` is not word-aligned (See Note: Vector Alignment)
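
The choice \(b\_shr = 16 - b\_hr\) can be illustrated with a plain-C sketch for a single mantissa. It assumes headroom means the number of bits a value can be shifted left without losing information; `headroom_s32_model` and `shr_sat16` are hypothetical helpers, not library functions.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bits x can be shifted left without losing information
 * (assumed meaning of headroom; capped at 31). */
static unsigned headroom_s32_model(int32_t x)
{
    unsigned hr = 0;
    const int32_t lim = (int32_t)1 << 30;
    /* x can be doubled safely while it lies in [-2^30, 2^30 - 1]. */
    while (hr < 31 && x >= -lim && x <= lim - 1) {
        x *= 2;
        hr++;
    }
    return hr;
}

/* sat16(floor(x * 2^-shr)); a negative shr is a left shift. */
static int16_t shr_sat16(int32_t x, int shr)
{
    assert(shr > -32 && shr < 32);
    int64_t v = x;
    if (shr < 0) {
        v *= (int64_t)1 << -shr;
    } else {
        int64_t d = (int64_t)1 << shr;
        int64_t q = v / d;
        if (v % d != 0 && v < 0) q -= 1;
        v = q;
    }
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}
```

The mantissa 0x00012345 has 14 bits of headroom under this definition, so b_shr = 16 - 14 = 2 and the 16-bit mantissa is 0x48D1 (18641). The output exponent is then b_exp + 2. For a whole vector, b_hr would be the smallest headroom over all real and imaginary parts.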

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Vector API$$$Mixed-precision vector API£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/vect/vect_mixed.html#mixed-precision-vector-api
group vect_mixed_api

Functions

void mat_mul_s8_x_s16_yield_s32(int32_t output[], const int8_t matrix[], const int16_t input_vect[], const unsigned M_rows, const unsigned N_cols, int8_t scratch[])

Multiply an 8-bit matrix by a 16-bit vector for a 32-bit result vector.

This function multiplies an 8-bit \(M \times N\) matrix \(\bar W\) by a 16-bit \(N\)-element column vector \(\bar v\) and returns the result as a 32-bit \(M\)-element vector \(\bar a\).

output is the output vector \(\bar a\).

matrix is the matrix \(\bar W\).

input_vect is the vector \(\bar v\).

matrix and input_vect must both begin at word-aligned addresses.

M_rows and N_cols are the dimensions \(M\) and \(N\) of matrix \(\bar W\). \(M\) must be a multiple of 16, and \(N\) must be a multiple of 32.

scratch is a pointer to a word-aligned buffer that this function may use to store intermediate results. This buffer must be at least \(N\) bytes long.

The result of this multiplication is exact, so long as saturation does not occur.

Parameters:
  • output[inout] The output vector \(\bar a\)

  • matrix[in] The weight matrix \(\bar W\)

  • input_vect[in] The input vector \(\bar v\)

  • M_rows[in] The number of rows \(M\) in matrix \(\bar W\)

  • N_cols[in] The number of columns \(N\) in matrix \(\bar W\)

  • scratch[in] Scratch buffer required by this function.

Throws ET_LOAD_STORE:

Raised if `matrix` or `input_vect` is not word-aligned (See Note: Vector Alignment)
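
Because the constraints above are easy to get wrong, a caller might check them before invoking the function. The helper below is a hypothetical sketch, not part of the library; it assumes a word is 4 bytes and simply encodes the documented requirements on dimensions, scratch size and alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumes a 4-byte word. */
static int is_word_aligned(const void *p)
{
    return ((uintptr_t)p % 4u) == 0;
}

/* Returns 1 if the documented argument constraints hold, 0 otherwise:
 *  - M_rows is a multiple of 16 and N_cols is a multiple of 32
 *  - the scratch buffer holds at least N_cols bytes
 *  - matrix, input_vect and scratch are word-aligned */
static int mat_mul_args_ok(const int8_t *matrix, const int16_t *input_vect,
                           unsigned M_rows, unsigned N_cols,
                           const int8_t *scratch, size_t scratch_bytes)
{
    if (M_rows % 16u != 0 || N_cols % 32u != 0) return 0;
    if (scratch_bytes < N_cols) return 0;
    return is_word_aligned(matrix)
        && is_word_aligned(input_vect)
        && is_word_aligned(scratch);
}
```

A debug build could wrap the call in `assert(mat_mul_args_ok(...))` so that a bad dimension is caught before it turns into a hardware exception or silent misbehavior.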

unsigned vect_sXX_add_scalar(int32_t a[], const int32_t b[], const unsigned length_bytes, const int32_t c, const int32_t d, const right_shift_t b_shr, const unsigned mode_bits)

Add a scalar to a vector.

Add a scalar to a vector. This works for 8, 16 or 32 bits, real or complex.

length_bytes is the total number of bytes to be output. So, for 16-bit vectors, length_bytes is twice the number of elements, whereas for complex 32-bit vectors, length_bytes is 8 times the number of elements.

c and d are the values that populate the internal buffer to be added to the input vector as follows: Internally an 8 word (32 byte) buffer is allocated (on the stack). Even-indexed words are populated with c and odd-indexed words are populated with d. For real vectors, c and d should be the same value; d exists so that this same function can also work for complex 32-bit vectors. This also means that for 16-bit vectors, the value to be added needs to be duplicated in both the higher 2 bytes and lower 2 bytes of the word.

mode_bits should be 0x0000 for 32-bit mode, 0x0100 for 16-bit mode or 0x0200 for 8-bit mode.
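
The argument conventions above can be sketched with a few small helpers. These are hypothetical illustrations (the names `pack_s16_scalar`, `length_bytes_for` and the `MODE_*` constants are not library identifiers); they only encode the duplication rule for 16-bit scalars, the byte-count rule and the mode values listed above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* mode_bits values as listed above (constant names are hypothetical). */
enum { MODE_S32 = 0x0000, MODE_S16 = 0x0100, MODE_S8 = 0x0200 };

/* Duplicate a 16-bit scalar into the upper and lower halves of a word,
 * as required for c and d when adding to a 16-bit vector. */
static int32_t pack_s16_scalar(int16_t v)
{
    uint32_t u = (uint16_t)v;
    u |= u << 16;
    int32_t w;
    memcpy(&w, &u, sizeof w);  /* reinterpret the bit pattern as signed */
    return w;
}

/* length_bytes for n elements of a given element size in bytes,
 * e.g. 2 for real 16-bit elements or 8 for complex 32-bit elements. */
static unsigned length_bytes_for(unsigned n_elements, unsigned bytes_per_element)
{
    assert(bytes_per_element > 0);
    return n_elements * bytes_per_element;
}
```

For a real 16-bit vector of 10 elements and a scalar of 0x1234, this gives c = d = 0x12341234, length_bytes = 20 and mode_bits = 0x0100.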

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Vector API$$$16-bit vector prepare functions£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/vect/vect_s16_prepare.html#bit-vector-prepare-functions
group vect_s16_prepare_api

Defines

vect_s16_add_prepare

Obtain the output exponent and shifts required for a call to vect_s16_add().

The logic for computing the shifts and exponents of vect_s16_add() is identical to that for vect_s32_add().

This macro is provided as a convenience to developers and to make the code more readable.

vect_s16_add_scalar_prepare

Obtain the output exponent and shifts required for a call to vect_s16_add_scalar().

The logic for computing the shifts and exponents of vect_s16_add_scalar() is identical to that for vect_s32_add().

This macro is provided as a convenience to developers and to make the code more readable.

vect_s16_nmacc_prepare

Obtain the output exponent and shifts required for a call to vect_s16_nmacc().

The logic for computing the shifts and exponents of vect_s16_nmacc() is identical to that for vect_s16_macc_prepare().

This macro is provided as a convenience to developers and to make the code more readable.

vect_s16_sub_prepare

Obtain the output exponent and shifts required for a call to vect_s16_sub().

The logic for computing the shifts and exponents of vect_s16_sub() is identical to that for vect_s32_add().

This macro is provided as a convenience to developers and to make the code more readable.

Functions

void vect_s16_clip_prepare(exponent_t *a_exp, right_shift_t *b_shr, int16_t *lower_bound, int16_t *upper_bound, const exponent_t b_exp, const exponent_t bound_exp, const headroom_t b_hr)

Obtain the output exponent, input shift and modified bounds used by vect_s16_clip().

This function is used in conjunction with vect_s16_clip() to bound the elements of a 16-bit BFP vector to a specified range.

This function computes a_exp, b_shr, lower_bound and upper_bound.

a_exp is the exponent associated with the 16-bit mantissa vector \(\bar a\) computed by vect_s16_clip().

b_shr is the shift parameter required by vect_s16_clip() to achieve the output exponent a_exp.

lower_bound and upper_bound are the 16-bit mantissas which indicate the lower and upper clipping bounds respectively. The values are modified by this function, and the resulting values should be passed along to vect_s16_clip().

b_exp is the exponent associated with the input mantissa vector \(\bar b\).

bound_exp is the exponent associated with both bound mantissas, lower_bound and upper_bound.

b_hr is the headroom of \(\bar b\). If unknown, it can be obtained using vect_s16_headroom(). Alternatively, the value 0 can always be safely used (but may result in reduced precision).

See also

vect_s16_clip

Parameters:
  • a_exp[out] Exponent associated with output mantissa vector \(\bar a\)

  • b_shr[out] Signed arithmetic right-shift for \(\bar b\) used by vect_s16_clip()

  • lower_bound[inout] Lower bound of clipping range

  • upper_bound[inout] Upper bound of clipping range

  • b_exp[in] Exponent associated with input mantissa vector \(\bar b\)

  • bound_exp[in] Exponent associated with clipping bounds lower_bound and upper_bound

  • b_hr[in] Headroom of input mantissa vector \(\bar b\)

void vect_s16_inverse_prepare(exponent_t *a_exp, unsigned *scale, const int16_t b[], const exponent_t b_exp, const unsigned length)

Obtain the output exponent and scaling parameter used by vect_s16_inverse().

This function is used in conjunction with vect_s16_inverse() to compute the inverse of elements of a 16-bit BFP vector.

This function computes a_exp and scale.

a_exp is the exponent associated with output mantissa vector \(\bar a\), and must be chosen to avoid overflow in the smallest element of the input vector, which when inverted becomes the largest output element. To maximize precision, this function chooses a_exp to be the smallest exponent known to avoid saturation. The a_exp chosen by this function is derived from the exponent and smallest element of the input vector.

scale is a scaling parameter used by vect_s16_inverse() to achieve the chosen output exponent.

b[] is the input mantissa vector \(\bar b\).

b_exp is the exponent associated with the input mantissa vector \(\bar b\).

length is the number of elements in \(\bar b\).

See also

vect_s16_inverse

Parameters:
  • a_exp[out] Exponent of output vector \(\bar a\)

  • scale[out] Scale factor to be applied when computing inverse

  • b[in] Input vector \(\bar b\)

  • b_exp[in] Exponent of \(\bar b\)

  • length[in] Number of elements in vector \(\bar b\)

void vect_s16_macc_prepare(exponent_t *new_acc_exp, right_shift_t *acc_shr, right_shift_t *bc_sat, const exponent_t acc_exp, const exponent_t b_exp, const exponent_t c_exp, const headroom_t acc_hr, const headroom_t b_hr, const headroom_t c_hr)

Obtain the output exponent and shifts needed by vect_s16_macc().

This function is used in conjunction with vect_s16_macc() to perform an element-wise multiply-accumulate of 16-bit BFP vectors.

This function computes new_acc_exp and acc_shr and bc_sat, which are selected to maximize precision in the resulting accumulator vector without causing saturation of final or intermediate values. Normally the caller will pass these outputs to their corresponding inputs of vect_s16_macc().

acc_exp is the exponent associated with the accumulator mantissa vector \(\bar a\) prior to the operation, whereas new_acc_exp is the exponent corresponding to the updated accumulator vector.

b_exp and c_exp are the exponents associated with the input mantissa vectors \(\bar b\) and \(\bar c\) respectively.

acc_hr, b_hr and c_hr are the headrooms of \(\bar a\), \(\bar b\) and \(\bar c\) respectively. If the headroom of any of these vectors is unknown, it can be obtained by calling vect_s16_headroom(). Alternatively, the value 0 can always be safely used (but may result in reduced precision).

Adjusting Output Exponents

If a specific output exponent desired_exp is needed for the result (e.g. for emulating fixed-point arithmetic), the acc_shr and bc_sat produced by this function can be adjusted according to the following:

// Presumed to be set somewhere
exponent_t acc_exp, b_exp, c_exp;
headroom_t acc_hr, b_hr, c_hr;
exponent_t desired_exp;

...

// Call prepare
right_shift_t acc_shr, bc_sat;
vect_s16_macc_prepare(&acc_exp, &acc_shr, &bc_sat, 
                          acc_exp, b_exp, c_exp,
                          acc_hr, b_hr, c_hr);

// Modify results
right_shift_t mant_shr = desired_exp - acc_exp;
acc_exp += mant_shr;
acc_shr += mant_shr;
bc_sat  += mant_shr;

// acc_shr and bc_sat may now be used in a call to vect_s16_macc() 

When applying the above adjustment, the following conditions should be maintained:

  • bc_sat >= 0 (bc_sat is an unsigned right-shift)

  • acc_shr > -acc_hr (Shifting any further left may cause saturation)

It is up to the user to ensure any such modification does not result in saturation or unacceptable loss of precision.
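
The adjustment and its two conditions can be combined into a small check. The helper below is a hypothetical sketch that uses plain `int` in place of exponent_t, right_shift_t and headroom_t; it applies the same arithmetic as the snippet above and reports whether the listed conditions still hold. It does not detect loss of precision.

```c
#include <assert.h>
#include <stddef.h>

/* Moves the prepared exponent to desired_exp by adjusting acc_shr and
 * bc_sat, then returns 1 if both conditions hold:
 *   bc_sat >= 0  and  acc_shr > -acc_hr
 * Returns 0 otherwise (the outputs are still updated). */
static int macc_adjust_model(int desired_exp, int acc_hr,
                             int *acc_exp, int *acc_shr, int *bc_sat)
{
    assert(acc_exp != NULL && acc_shr != NULL && bc_sat != NULL);
    int mant_shr = desired_exp - *acc_exp;
    *acc_exp += mant_shr;
    *acc_shr += mant_shr;
    *bc_sat  += mant_shr;
    return (*bc_sat >= 0) && (*acc_shr > -acc_hr);
}
```

For instance, starting from acc_exp = -20, acc_shr = 1 and bc_sat = 3, moving to desired_exp = -18 keeps both conditions satisfied, whereas moving to desired_exp = -24 drives bc_sat negative and the check fails.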

See also

vect_s16_macc

Parameters:
  • new_acc_exp[out] Exponent associated with output mantissa vector \(\bar a\) (after macc)

  • acc_shr[out] Signed arithmetic right-shift used for \(\bar a\) in vect_s16_macc()

  • bc_sat[out] Unsigned arithmetic right-shift applied to the product of elements \(b_k\) and \(c_k\) in vect_s16_macc()

  • acc_exp[in] Exponent associated with input mantissa vector \(\bar a\) (before macc)

  • b_exp[in] Exponent associated with input mantissa vector \(\bar b\)

  • c_exp[in] Exponent associated with input mantissa vector \(\bar c\)

  • acc_hr[in] Headroom of input mantissa vector \(\bar a\) (before macc)

  • b_hr[in] Headroom of input mantissa vector \(\bar b\)

  • c_hr[in] Headroom of input mantissa vector \(\bar c\)

void vect_s16_mul_prepare(exponent_t *a_exp, right_shift_t *a_shr, const exponent_t b_exp, const exponent_t c_exp, const headroom_t b_hr, const headroom_t c_hr)

Obtain the output exponent and output shift used by vect_s16_mul().

This function is used in conjunction with vect_s16_mul() to perform an element-wise multiplication of two 16-bit BFP vectors.

This function computes a_exp and a_shr.

a_exp is the exponent associated with mantissa vector \(\bar a\), and must be chosen to be large enough to avoid overflow when elements of \(\bar a\) are computed. To maximize precision, this function chooses a_exp to be the smallest exponent known to avoid saturation (see exception below). The a_exp chosen by this function is derived from the exponents and headrooms associated with the input vectors.

a_shr is an arithmetic right-shift applied by vect_s16_mul() to the 32-bit products of input elements to achieve the chosen output exponent a_exp.

b_exp and c_exp are the exponents associated with the input mantissa vectors \(\bar b\) and \(\bar c\) respectively.

b_hr and c_hr are the headrooms of \(\bar b\) and \(\bar c\) respectively. If the headroom of \(\bar b\) or \(\bar c\) is unknown, it can be obtained by calling vect_s16_headroom(). Alternatively, the value 0 can always be safely used (but may result in reduced precision).

Adjusting Output Exponents

If a specific output exponent desired_exp is needed for the result (e.g. for emulating fixed-point arithmetic), the a_shr produced by this function can be adjusted according to the following:

exponent_t a_exp;
right_shift_t a_shr;
vect_s16_mul_prepare(&a_exp, &a_shr, b_exp, c_exp, b_hr, c_hr);
exponent_t desired_exp = ...; // Value known a priori
a_shr = a_shr + (desired_exp - a_exp);
a_exp = desired_exp;

When applying the above adjustment, the following conditions should be maintained:

  • a_shr >= 0

Be aware that using a smaller value than strictly necessary for a_shr can result in saturation, and using larger values may result in unnecessary underflows or loss of precision.

Notes

  • Using the outputs of this function, an output mantissa which would otherwise be INT16_MIN will instead saturate to -INT16_MAX. This is due to the symmetric saturation logic employed by the VPU and is a hardware feature. This is a corner case which is usually unlikely and results in 1 LSb of error when it occurs.

See also

vect_s16_mul

Parameters:
  • a_exp[out] Exponent of output elements of vect_s16_mul()

  • a_shr[out] Right-shift supplied to vect_s16_mul()

  • b_exp[in] Exponent associated with \(\bar b\)

  • c_exp[in] Exponent associated with \(\bar c\)

  • b_hr[in] Headroom of \(\bar b\)

  • c_hr[in] Headroom of \(\bar c\)

void vect_s16_scale_prepare(exponent_t *a_exp, right_shift_t *a_shr, const exponent_t b_exp, const exponent_t c_exp, const headroom_t b_hr, const headroom_t c_hr)

Obtain the output exponent and output shift used by vect_s16_scale().

This function is used in conjunction with vect_s16_scale() to perform multiplication of a 16-bit BFP vector \(\bar{b} \cdot 2^{b\_exp}\) by a 16-bit scalar \(c \cdot 2^{c\_exp}\). The result is another 16-bit BFP vector \(\bar{a} \cdot 2^{a\_exp}\).

This function computes a_exp and a_shr.

a_exp is the exponent associated with mantissa vector \(\bar a\), and must be chosen to be large enough to avoid overflow when elements of \(\bar a\) are computed. To maximize precision, this function chooses a_exp to be the smallest exponent known to avoid saturation (see exception below). The a_exp chosen by this function is derived from the exponents and headrooms associated with the inputs.

a_shr is an arithmetic right-shift applied by vect_s16_scale() to the 32-bit products of input elements to achieve the chosen output exponent a_exp.

b_exp and c_exp are the exponents associated with \(\bar b\) and \(c\) respectively.

b_hr and c_hr are the headrooms of \(\bar b\) and \(c\) respectively. If the headroom of \(\bar b\) or \(c\) is unknown, it can be obtained by calling vect_s16_headroom(). Alternatively, the value 0 can always be safely used (but may result in reduced precision).

Adjusting Output Exponents

If a specific output exponent desired_exp is needed for the result (e.g. for emulating fixed-point arithmetic), the a_shr produced by this function can be adjusted according to the following:

exponent_t a_exp;
right_shift_t a_shr;
vect_s16_scale_prepare(&a_exp, &a_shr, b_exp, c_exp, b_hr, c_hr);
exponent_t desired_exp = ...; // Value known a priori
a_shr = a_shr + (desired_exp - a_exp);
a_exp = desired_exp;

When applying the above adjustment, the following conditions should be maintained:

  • a_shr >= 0

Be aware that using a smaller value than strictly necessary for a_shr can result in saturation, and using larger values may result in unnecessary underflows or loss of precision.

Notes

  • Using the outputs of this function, an output mantissa which would otherwise be INT16_MIN will instead saturate to -INT16_MAX. This is due to the symmetric saturation logic employed by the VPU and is a hardware feature. This is a corner case which is usually unlikely and results in 1 LSb of error when it occurs.

See also

vect_s16_scale

Parameters:
  • a_exp[out] Exponent of output elements of vect_s16_scale()

  • a_shr[out] Right-shift supplied to vect_s16_scale()

  • b_exp[in] Exponent associated with \(\bar b\)

  • c_exp[in] Exponent associated with \(c\)

  • b_hr[in] Headroom of \(\bar b\)

  • c_hr[in] Headroom of \(c\)

void vect_s16_sqrt_prepare(exponent_t *a_exp, right_shift_t *b_shr, const exponent_t b_exp, const right_shift_t b_hr)

Obtain the output exponent and shift parameter used by vect_s16_sqrt().

This function is used in conjunction with vect_s16_sqrt() to compute the square root of elements of a 16-bit BFP vector.

This function computes a_exp and b_shr.

a_exp is the exponent associated with output mantissa vector \(\bar a\), and should be chosen to maximize the precision of the results. To that end, this function chooses a_exp to be the smallest exponent known to avoid saturation of the resulting mantissa vector \(\bar a\). It is derived from the exponent and headroom of the input BFP vector.

b_shr is the shift parameter required by vect_s16_sqrt() to achieve the chosen output exponent a_exp.

b_exp is the exponent associated with the input mantissa vector \(\bar b\).

b_hr is the headroom of \(\bar b\). If it is unknown, it can be obtained using vect_s16_headroom(). Alternatively, the value 0 can always be safely used (but may result in reduced precision).

Adjusting Output Exponents

If a specific output exponent desired_exp is needed for the result (e.g. for emulating fixed-point arithmetic), the b_shr produced by this function can be adjusted according to the following:

exponent_t a_exp;
right_shift_t b_shr;
vect_s16_sqrt_prepare(&a_exp, &b_shr, b_exp, b_hr);
exponent_t desired_exp = ...; // Value known a priori
b_shr = b_shr + (desired_exp - a_exp);
a_exp = desired_exp;

When applying the above adjustment, the following condition should be maintained:

  • b_hr + b_shr >= 0

Be aware that using smaller values than strictly necessary for b_shr can result in saturation, and using larger values may result in unnecessary underflows or loss of precision.

Also, if a larger exponent is used than necessary, a larger depth parameter (see vect_s16_sqrt()) will be required to achieve the same precision, as the results are computed bit by bit, starting with the most significant bit.

See also

vect_s16_sqrt

Parameters:
  • a_exp[out] Exponent of outputs of vect_s16_sqrt()

  • b_shr[out] Right-shift to be applied to elements of \(\bar b\)

  • b_exp[in] Exponent of \(\bar b\)

  • b_hr[in] Headroom of \(\bar b\)

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Vector API$$$32-bit vector prepare functions£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/vect/vect_s32_prepare.html#bit-vector-prepare-functions
group vect_s32_prepare_api

Defines

vect_s32_add_scalar_prepare

Obtain the output exponent and shifts required for a call to vect_s32_add_scalar().

The logic for computing the shifts and exponents of vect_s32_add_scalar() is identical to that for vect_s32_add().

This macro is provided as a convenience to developers and to make the code more readable.

vect_s32_nmacc_prepare

Obtain the output exponent and shifts required for a call to vect_s32_nmacc().

The logic for computing the shifts and exponents of vect_s32_nmacc() is identical to that for vect_s32_macc_prepare().

This macro is provided as a convenience to developers and to make the code more readable.

vect_s32_scale_prepare

Obtain the output exponent and shifts required for a call to vect_s32_scale().

The logic for computing the shifts and exponents of vect_s32_scale() is identical to that for vect_s32_mul().

This macro is provided as a convenience to developers and to make the code more readable.

vect_s32_sub_prepare

Obtain the output exponent and shifts required for a call to vect_s32_sub().

The logic for computing the shifts and exponents of vect_s32_sub() is identical to that for vect_s32_add().

This macro is provided as a convenience to developers and to make the code more readable.

Functions

void vect_s32_add_prepare(exponent_t *a_exp, right_shift_t *b_shr, right_shift_t *c_shr, const exponent_t b_exp, const exponent_t c_exp, const headroom_t b_hr, const headroom_t c_hr)

Obtain the output exponent and input shifts to add or subtract two 16- or 32-bit BFP vectors.

The block floating-point functions in this library which add or subtract vectors are of the general form:

\( \bar{a} \cdot 2^{a\_exp} = \bar{b}\cdot 2^{b\_exp} \pm \bar{c}\cdot 2^{c\_exp} \)

\(\bar b\) and \(\bar c\) are the input mantissa vectors with exponents \(b\_exp\) and \(c\_exp\), which are shared by each element of their respective vectors. \(\bar a\) is the output mantissa vector with exponent \(a\_exp\). Two additional properties, \(b\_hr\) and \(c\_hr\), which are the headroom of mantissa vectors \(\bar b\) and \(\bar c\) respectively, are required by this function.

In order to avoid any overflows in the output mantissas, the output exponent \(a\_exp\) must be chosen such that the largest (in the sense of absolute value) possible output mantissa will fit into the allotted space (e.g. 32 bits for vect_s32_add()). Once \(a\_exp\) is chosen, the input bit-shifts \(b\_shr\) and \(c\_shr\) are calculated to achieve that resulting exponent.

This function chooses \(a\_exp\) to be the minimum exponent known to avoid overflows, given the input exponents ( \(b\_exp\) and \(c\_exp\)) and input headroom ( \(b\_hr\) and \(c\_hr\)).

This function is used to calculate the output exponent and input bit-shifts for the vector addition and subtraction functions, including vect_s16_add(), vect_s16_sub(), vect_s32_add() and vect_s32_sub().

Adjusting Output Exponents

If a specific output exponent desired_exp is needed for the result (e.g. for emulating fixed-point arithmetic), the b_shr and c_shr produced by this function can be adjusted according to the following:

exponent_t desired_exp = ...; // Value known a priori
right_shift_t new_b_shr = b_shr + (desired_exp - a_exp);
right_shift_t new_c_shr = c_shr + (desired_exp - a_exp);

When applying the above adjustment, the following conditions should be maintained:

  • b_hr + b_shr >= 0

  • c_hr + c_shr >= 0

Be aware that using smaller values than strictly necessary for b_shr and c_shr can result in saturation, and using larger values may result in unnecessary underflows or loss of precision.

Notes

  • If \(b\_hr\) or \(c\_hr\) are unknown, they can be calculated using the appropriate headroom function (e.g. vect_complex_s16_headroom() for complex 16-bit vectors) or the value 0 can always be safely used (but may result in reduced precision).

Parameters:
  • a_exp[out] Output exponent associated with output mantissa vector \(\bar a\)

  • b_shr[out] Signed arithmetic right-shift to be applied to elements of \(\bar b\). Used by the function which computes the output mantissas \(\bar a\)

  • c_shr[out] Signed arithmetic right-shift to be applied to elements of \(\bar c\). Used by the function which computes the output mantissas \(\bar a\)

  • b_exp[in] Exponent of BFP vector \(\bar b\)

  • c_exp[in] Exponent of BFP vector \(\bar c\)

  • b_hr[in] Headroom of BFP vector \(\bar b\)

  • c_hr[in] Headroom of BFP vector \(\bar c\)

void vect_s32_clip_prepare(exponent_t *a_exp, right_shift_t *b_shr, int32_t *lower_bound, int32_t *upper_bound, const exponent_t b_exp, const exponent_t bound_exp, const headroom_t b_hr)

Obtain the output exponent, input shift and modified bounds used by vect_s32_clip().

This function is used in conjunction with vect_s32_clip() to bound the elements of a 32-bit BFP vector to a specified range.

This function computes a_exp, b_shr, lower_bound and upper_bound.

a_exp is the exponent associated with the 32-bit mantissa vector \(\bar a\) computed by vect_s32_clip().

b_shr is the shift parameter required by vect_s32_clip() to achieve the output exponent a_exp.

lower_bound and upper_bound are the 32-bit mantissas which indicate the lower and upper clipping bounds respectively. The values are modified by this function, and the resulting values should be passed along to vect_s32_clip().

b_exp is the exponent associated with the input mantissa vector \(\bar b\).

bound_exp is the exponent associated with both bound mantissas, lower_bound and upper_bound.

b_hr is the headroom of \(\bar b\). If unknown, it can be obtained using vect_s32_headroom(). Alternatively, the value 0 can always be safely used (but may result in reduced precision).

See also

vect_s32_clip

Parameters:
  • a_exp[out] Exponent associated with output mantissa vector \(\bar a\)

  • b_shr[out] Signed arithmetic right-shift for \(\bar b\) used by vect_s32_clip()

  • lower_bound[inout] Lower bound of clipping range

  • upper_bound[inout] Upper bound of clipping range

  • b_exp[in] Exponent associated with input mantissa vector \(\bar b\)

  • bound_exp[in] Exponent associated with clipping bounds lower_bound and upper_bound

  • b_hr[in] Headroom of input mantissa vector \(\bar b\)

void vect_s32_dot_prepare(exponent_t *a_exp, right_shift_t *b_shr, right_shift_t *c_shr, const exponent_t b_exp, const exponent_t c_exp, const headroom_t b_hr, const headroom_t c_hr, const unsigned length)

Obtain the output exponent and input shift used by vect_s32_dot().

This function is used in conjunction with vect_s32_dot() to compute the inner product of two 32-bit BFP vectors.

This function computes a_exp, b_shr and c_shr.

a_exp is the exponent associated with the 64-bit mantissa \(a\) returned by vect_s32_dot(), and must be chosen to be large enough to avoid saturation when \(a\) is computed. To maximize precision, this function chooses a_exp to be the smallest exponent known to avoid saturation (see exception below). The a_exp chosen by this function is derived from the exponents and headrooms associated with the input vectors.

b_shr and c_shr are the shift parameters required by vect_s32_dot() to achieve the chosen output exponent a_exp.

b_exp and c_exp are the exponents associated with the input mantissa vectors \(\bar b\) and \(\bar c\) respectively.

b_hr and c_hr are the headroom of \(\bar b\) and \(\bar c\) respectively. If either is unknown, it can be obtained using vect_s32_headroom(). Alternatively, the value 0 can always be safely used (but may result in reduced precision).

length is the number of elements in the input mantissa vectors \(\bar b\) and \(\bar c\).

Adjusting Output Exponents

If a specific output exponent desired_exp is needed for the result (e.g. for emulating fixed-point arithmetic), the b_shr and c_shr produced by this function can be adjusted according to the following:

exponent_t desired_exp = ...; // Value known a priori
right_shift_t new_b_shr = b_shr + (desired_exp - a_exp);
right_shift_t new_c_shr = c_shr + (desired_exp - a_exp);

When applying the above adjustment, the following conditions should be maintained:

  • b_hr + b_shr >= 0

  • c_hr + c_shr >= 0

Be aware that using smaller values than strictly necessary for b_shr or c_shr can result in saturation, and using larger values may result in unnecessary underflows or loss of precision.

See also

vect_s32_dot

Parameters:
  • a_exp[out] Exponent associated with output mantissa \(a\)

  • b_shr[out] Signed arithmetic right-shift for \(\bar b\) used by vect_s32_dot()

  • c_shr[out] Signed arithmetic right-shift for \(\bar c\) used by vect_s32_dot()

  • b_exp[in] Exponent associated with input mantissa vector \(\bar b\)

  • c_exp[in] Exponent associated with input mantissa vector \(\bar c\)

  • b_hr[in] Headroom of input mantissa vector \(\bar b\)

  • c_hr[in] Headroom of input mantissa vector \(\bar c\)

  • length[in] Number of elements in vectors \(\bar b\) and \(\bar c\)

void vect_s32_energy_prepare(exponent_t *a_exp, right_shift_t *b_shr, const unsigned length, const exponent_t b_exp, const headroom_t b_hr)

Obtain the output exponent and input shift used by vect_s32_energy().

This function is used in conjunction with vect_s32_energy() to compute the inner product of a 32-bit BFP vector with itself.

This function computes a_exp and b_shr.

a_exp is the exponent associated with the 64-bit mantissa \(a\) returned by vect_s32_energy(), and must be chosen to be large enough to avoid saturation when \(a\) is computed. To maximize precision, this function chooses a_exp to be the smallest exponent known to avoid saturation (see exception below). The a_exp chosen by this function is derived from the exponent and headroom associated with the input vector.

b_shr is the shift parameter required by vect_s32_energy() to achieve the chosen output exponent a_exp.

b_exp is the exponent associated with the input mantissa vector \(\bar b\).

b_hr is the headroom of \(\bar b\). If it is unknown, it can be obtained using vect_s32_headroom(). Alternatively, the value 0 can always be safely used (but may result in reduced precision).

length is the number of elements in the input mantissa vector \(\bar b\).
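As an illustration of what the b_hr argument represents, the following is a minimal sketch of a headroom calculation in portable C. It assumes that headroom means the number of bits by which every mantissa can be left-shifted without overflowing a signed 32-bit value, and that an all-zero input reports 31 bits. The helper names sketch_headroom_s32() and sketch_vect_headroom_s32() are hypothetical and are not part of the library; in real code vect_s32_headroom() should be used instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a library function): headroom of one 32-bit
 * mantissa, i.e. how many bits it can be left-shifted without overflow. */
static unsigned sketch_headroom_s32(int32_t x)
{
    /* Map negative values onto non-negative ones with the same number of
     * redundant sign bits, so only leading zeros need to be counted. */
    uint32_t u = (x < 0) ? ~(uint32_t)x : (uint32_t)x;
    unsigned hr = 0;

    /* Shift left until bit 30 (the bit just below the sign bit) is set.
     * Assumed convention: zero (and -1) report 31 bits of headroom. */
    while (hr < 31 && (u & 0x40000000u) == 0) {
        u <<= 1;
        hr++;
    }
    return hr;
}

/* Hypothetical helper: the headroom of a vector is the smallest headroom
 * of any of its elements. */
static unsigned sketch_vect_headroom_s32(const int32_t b[], size_t length)
{
    assert(b != NULL || length == 0);

    unsigned hr = 31;
    for (size_t k = 0; k < length; k++) {
        unsigned h = sketch_headroom_s32(b[k]);
        if (h < hr) {
            hr = h;
        }
    }
    return hr;
}
```

Passing a headroom smaller than the true value (such as 0) is always safe under this definition, because it only makes the chosen exponent more conservative.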

Adjusting Output Exponents

If a specific output exponent desired_exp is needed for the result (e.g. for emulating fixed-point arithmetic), the b_shr produced by this function can be adjusted according to the following:

exponent_t desired_exp = ...; // Value known a priori
right_shift_t new_b_shr = b_shr + (desired_exp - a_exp);

When applying the above adjustment, the following condition should be maintained:

  • b_hr + b_shr >= 0

Be aware that using smaller values than strictly necessary for b_shr can result in saturation, and using larger values may result in unnecessary underflows or loss of precision.

See also

vect_s32_energy

Parameters:
  • a_exp[out] Exponent of outputs of vect_s32_energy()

  • b_shr[out] Right-shift to be applied to elements of \(\bar b\)

  • length[in] Number of elements in vector \(\bar b\)

  • b_exp[in] Exponent of vector \(\bar b\)

  • b_hr[in] Headroom of vector \(\bar b\)

void vect_s32_inverse_prepare(exponent_t *a_exp, unsigned *scale, const int32_t b[], const exponent_t b_exp, const unsigned length)

Obtain the output exponent and scale used by vect_s32_inverse().

This function is used in conjunction with vect_s32_inverse() to compute the inverse of elements of a 32-bit BFP vector.

This function computes a_exp and scale.

a_exp is the exponent associated with output mantissa vector \(\bar a\), and must be chosen to avoid overflow in the smallest element of the input vector, which when inverted becomes the largest output element. To maximize precision, this function chooses a_exp to be the smallest exponent known to avoid saturation. The a_exp chosen by this function is derived from the exponent and smallest element of the input vector.

scale is a scaling parameter used by vect_s32_inverse() to achieve the chosen output exponent.

b[] is the input mantissa vector \(\bar b\).

b_exp is the exponent associated with the input mantissa vector \(\bar b\).

length is the number of elements in \(\bar b\).

See also

vect_s32_inverse

Parameters:
  • a_exp[out] Exponent of output vector \(\bar a\)

  • scale[out] Scale factor to be applied when computing inverse

  • b[in] Input vector \(\bar b\)

  • b_exp[in] Exponent of \(\bar b\)

  • length[in] Number of elements in vector \(\bar b\)

void vect_s32_macc_prepare(exponent_t *new_acc_exp, right_shift_t *acc_shr, right_shift_t *b_shr, right_shift_t *c_shr, const exponent_t acc_exp, const exponent_t b_exp, const exponent_t c_exp, const headroom_t acc_hr, const headroom_t b_hr, const headroom_t c_hr)

Obtain the output exponent and shifts needed by vect_s32_macc().

This function is used in conjunction with vect_s32_macc() to perform an element-wise multiply-accumulate of 32-bit BFP vectors.

This function computes new_acc_exp, acc_shr, b_shr and c_shr, which are selected to maximize precision in the resulting accumulator vector without causing saturation of final or intermediate values. Normally the caller will pass these outputs to their corresponding inputs of vect_s32_macc().

acc_exp is the exponent associated with the accumulator mantissa vector \(\bar a\) prior to the operation, whereas new_acc_exp is the exponent corresponding to the updated accumulator vector.

b_exp and c_exp are the exponents associated with the input mantissa vectors \(\bar b\) and \(\bar c\) respectively.

acc_hr, b_hr and c_hr are the headrooms of \(\bar a\), \(\bar b\) and \(\bar c\) respectively. If the headroom of any of these vectors is unknown, it can be obtained by calling vect_s32_headroom(). Alternatively, the value 0 can always be safely used (but may result in reduced precision).

Adjusting Output Exponents

If a specific output exponent desired_exp is needed for the result (e.g. for emulating fixed-point arithmetic), the acc_shr, b_shr and c_shr produced by this function can be adjusted according to the following:

// Presumed to be set somewhere
exponent_t acc_exp, b_exp, c_exp;
headroom_t acc_hr, b_hr, c_hr;
exponent_t desired_exp;

...

// Call prepare
right_shift_t acc_shr, b_shr, c_shr;
vect_s32_macc_prepare(&acc_exp, &acc_shr, &b_shr, &c_shr, 
                          acc_exp, b_exp, c_exp,
                          acc_hr, b_hr, c_hr);

// Modify results
right_shift_t mant_shr = desired_exp - acc_exp;
acc_exp += mant_shr;
acc_shr += mant_shr;
b_shr  += mant_shr;
c_shr  += mant_shr;

// acc_shr, b_shr and c_shr may now be used in a call to vect_s32_macc() 

When applying the above adjustment, the following conditions should be maintained:

  • acc_shr > -acc_hr (Shifting any further left may cause saturation)

  • b_shr >= -b_hr (Shifting any further left may cause saturation)

  • c_shr >= -c_hr (Shifting any further left may cause saturation)

It is up to the user to ensure any such modification does not result in saturation or unacceptable loss of precision.
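A minimal sketch of checking adjusted shifts before they are passed to vect_s32_macc() is given here. It tests the three conditions acc_shr > -acc_hr, b_shr >= -b_hr and c_shr >= -c_hr. The helper name sketch_macc_shifts_ok() is hypothetical, and plain int is used in place of right_shift_t and headroom_t so that the block compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not a library function): returns true when the
 * adjusted shifts do not shift any operand further left than its headroom
 * allows. Headroom is taken as a signed int here; if it is held in an
 * unsigned type, convert it to a signed int before negating it. */
static bool sketch_macc_shifts_ok(int acc_shr, int b_shr, int c_shr,
                                  int acc_hr, int b_hr, int c_hr)
{
    /* Headroom is never negative. */
    assert(acc_hr >= 0 && b_hr >= 0 && c_hr >= 0);

    return (acc_shr > -acc_hr)   /* strict inequality for the accumulator */
        && (b_shr >= -b_hr)
        && (c_shr >= -c_hr);
}
```

If the check fails, the requested desired_exp is too small for the current operands, and a larger exponent has to be chosen.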

See also

vect_s32_macc

Parameters:
  • new_acc_exp[out] Exponent associated with output mantissa vector \(\bar a\) (after macc)

  • acc_shr[out] Signed arithmetic right-shift used for \(\bar a\) in vect_s32_macc()

  • b_shr[out] Signed arithmetic right-shift used for \(\bar b\) in vect_s32_macc()

  • c_shr[out] Signed arithmetic right-shift used for \(\bar c\) in vect_s32_macc()

  • acc_exp[in] Exponent associated with input mantissa vector \(\bar a\) (before macc)

  • b_exp[in] Exponent associated with input mantissa vector \(\bar b\)

  • c_exp[in] Exponent associated with input mantissa vector \(\bar c\)

  • acc_hr[in] Headroom of input mantissa vector \(\bar a\) (before macc)

  • b_hr[in] Headroom of input mantissa vector \(\bar b\)

  • c_hr[in] Headroom of input mantissa vector \(\bar c\)

void vect_s32_mul_prepare(exponent_t *a_exp, right_shift_t *b_shr, right_shift_t *c_shr, const exponent_t b_exp, const exponent_t c_exp, const headroom_t b_hr, const headroom_t c_hr)

Obtain the output exponent and input shifts used by vect_s32_mul().

This function is used in conjunction with vect_s32_mul() to perform an element-wise multiplication of two 32-bit BFP vectors.

This function computes a_exp, b_shr, c_shr.

a_exp is the exponent associated with mantissa vector \(\bar a\), and must be chosen to be large enough to avoid overflow when elements of \(\bar a\) are computed. To maximize precision, this function chooses a_exp to be the smallest exponent known to avoid saturation (see exception below). The a_exp chosen by this function is derived from the exponents and headrooms associated with the input vectors.

b_shr and c_shr are the shift parameters required by vect_s32_mul() to achieve the chosen output exponent a_exp.

b_exp and c_exp are the exponents associated with the input mantissa vectors \(\bar b\) and \(\bar c\) respectively.

b_hr and c_hr are the headroom of \(\bar b\) and \(\bar c\) respectively. If the headroom of \(\bar b\) or \(\bar c\) is unknown, they can be obtained by calling vect_s32_headroom(). Alternatively, the value 0 can always be safely used (but may result in reduced precision).

Adjusting Output Exponents

If a specific output exponent desired_exp is needed for the result (e.g. for emulating fixed-point arithmetic), the b_shr and c_shr produced by this function can be adjusted according to the following:

exponent_t desired_exp = ...; // Value known a priori
right_shift_t new_b_shr = b_shr + (desired_exp - a_exp);
right_shift_t new_c_shr = c_shr + (desired_exp - a_exp);

When applying the above adjustment, the following conditions should be maintained:

  • b_hr + b_shr >= 0

  • c_hr + c_shr >= 0

Be aware that using smaller values than strictly necessary for b_shr and c_shr can result in saturation, and using larger values may result in unnecessary underflows or loss of precision.

Notes

  • Using the outputs of this function, an output mantissa which would otherwise be INT32_MIN will instead saturate to -INT32_MAX. This is due to the symmetric saturation logic employed by the VPU and is a hardware feature. This is a corner case which is usually unlikely and results in 1 LSb of error when it occurs.
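The symmetric saturation described in the note can be sketched in portable C as follows. This is an illustration of the behaviour only, not the VPU implementation, and the helper name sketch_sat_s32_symmetric() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a library function): clamp a wide intermediate
 * result to the symmetric 32-bit range [-INT32_MAX, INT32_MAX]. */
static int32_t sketch_sat_s32_symmetric(int64_t x)
{
    if (x > INT32_MAX) {
        return INT32_MAX;
    }
    if (x < -INT32_MAX) {
        /* This branch also catches x == INT32_MIN, which is why a result
         * of INT32_MIN comes out as -INT32_MAX (an error of 1 LSb). */
        return -INT32_MAX;
    }
    return (int32_t)x;
}
```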

See also

vect_s32_mul

Parameters:
  • a_exp[out] Exponent of output elements of vect_s32_mul()

  • b_shr[out] Right-shift to be applied to elements of \(\bar b\)

  • c_shr[out] Right-shift to be applied to elements of \(\bar c\)

  • b_exp[in] Exponent of \(\bar b\)

  • c_exp[in] Exponent of \(\bar c\)

  • b_hr[in] Headroom of \(\bar b\)

  • c_hr[in] Headroom of \(\bar c\)

void vect_s32_sqrt_prepare(exponent_t *a_exp, right_shift_t *b_shr, const exponent_t b_exp, const right_shift_t b_hr)

Obtain the output exponent and shift parameter used by vect_s32_sqrt().

This function is used in conjunction with vect_s32_sqrt() to compute the square root of elements of a 32-bit BFP vector.

This function computes a_exp and b_shr.

a_exp is the exponent associated with output mantissa vector \(\bar a\), and should be chosen to maximize the precision of the results. To that end, this function chooses a_exp to be the smallest exponent known to avoid saturation of the resulting mantissa vector \(\bar a\). It is derived from the exponent and headroom of the input BFP vector.

b_shr is the shift parameter required by vect_s32_sqrt() to achieve the chosen output exponent a_exp.

b_exp is the exponent associated with the input mantissa vector \(\bar b\).

b_hr is the headroom of \(\bar b\). If it is unknown, it can be obtained using vect_s32_headroom(). Alternatively, the value 0 can always be safely used (but may result in reduced precision).

Adjusting Output Exponents

If a specific output exponent desired_exp is needed for the result (e.g. for emulating fixed-point arithmetic), the b_shr produced by this function can be adjusted according to the following:

exponent_t a_exp;
right_shift_t b_shr;
vect_s32_sqrt_prepare(&a_exp, &b_shr, b_exp, b_hr);
exponent_t desired_exp = ...; // Value known a priori
b_shr = b_shr + (desired_exp - a_exp);
a_exp = desired_exp;

When applying the above adjustment, the following condition should be maintained:

  • b_hr + b_shr >= 0

Be aware that using smaller values than strictly necessary for b_shr can result in saturation, and using larger values may result in unnecessary underflows or loss of precision.

Also, if a larger exponent is used than necessary, a larger depth parameter (see vect_s32_sqrt()) will be required to achieve the same precision, as the results are computed bit by bit, starting with the most significant bit.

See also

vect_s32_sqrt

Parameters:
  • a_exp[out] Exponent of outputs of vect_s32_sqrt()

  • b_shr[out] Right-shift to be applied to elements of \(\bar b\)

  • b_exp[in] Exponent of vector \(\bar b\)

  • b_hr[in] Headroom of vector \(\bar b\)

void vect_2vec_prepare(exponent_t *a_exp, right_shift_t *b_shr, right_shift_t *c_shr, const exponent_t b_exp, const exponent_t c_exp, const headroom_t b_hr, const headroom_t c_hr, const headroom_t extra_operand_hr)

Obtain the output exponent and input shifts required to perform a binary add-like operation.

This function computes the output exponent and input shifts required for BFP operations which take two vectors as input, where the operation is “add-like”.

Here, “add-like” operations are loosely defined as those which require input vectors to share an exponent before their mantissas can be meaningfully used to perform that operation.

For example, consider adding \( 3 \cdot 2^{x} + 4 \cdot 2^{y} \). If \(x = y\), then the mantissas can be added directly to get a meaningful result \( (3+4) \cdot 2^{x} \). If \(x \ne y\) however, adding the mantissas together is meaningless. Before the mantissas can be added in this case, one or both of the input mantissas must be shifted so that the representations correspond to the same exponent. Likewise, similar logic applies to binary comparisons.

This is in contrast to a “multiply-like” operation, which does not have this same requirement (e.g. \(a \cdot 2^x \cdot b \cdot 2^y = ab \cdot 2^{x+y}\), regardless of whether \(x=y\)).
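The difference between the two kinds of operation can be sketched in C on single mantissas. The helper names are hypothetical, plain int stands in for exponent_t and right_shift_t, and no saturation or rounding is modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: signed arithmetic right-shift, where a negative
 * shr means a left-shift. Right-shifting a negative value with >> is
 * implementation-defined in C; it is arithmetic on typical compilers. */
static int32_t sketch_shr_s32(int32_t x, int shr)
{
    assert(shr > -32 && shr < 32);
    if (shr >= 0) {
        return x >> shr;
    }
    /* Left-shift via a 64-bit multiply; overflow is not handled here. */
    return (int32_t)((int64_t)x * ((int64_t)1 << -shr));
}

/* Add-like: both mantissas are first brought to the common exponent
 * a_exp, and only then added. */
static int32_t sketch_add_mant(int32_t b, int b_exp,
                               int32_t c, int c_exp, int a_exp)
{
    int b_shr = a_exp - b_exp;
    int c_shr = a_exp - c_exp;
    return sketch_shr_s32(b, b_shr) + sketch_shr_s32(c, c_shr);
}

/* Multiply-like: the mantissas are multiplied directly and the exponents
 * simply add, whether or not they were equal. */
static int64_t sketch_mul_mant(int32_t b, int b_exp,
                               int32_t c, int c_exp, int *a_exp)
{
    *a_exp = b_exp + c_exp;
    return (int64_t)b * c;
}
```

For \(3 \cdot 2^{2} + 4 \cdot 2^{4}\), choosing a_exp = 2 gives the exact mantissa 19 (that is, 76), whereas choosing a_exp = 4 shifts the first mantissa down to zero and gives 4 (that is, 64). This is the underflow that results from using a larger output exponent than necessary.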

For a general operation like:

\( \bar{a} \cdot 2^{a\_exp} = \bar{b}\cdot 2^{b\_exp} \oplus \bar{c}\cdot 2^{c\_exp} \)

\(\bar b\) and \(\bar c\) are the input mantissa vectors with exponents \(b\_exp\) and \(c\_exp\), which are shared by each element of their respective vectors. \(\bar a\) is the output mantissa vector with exponent \(a\_exp\). Two additional properties, \(b\_hr\) and \(c\_hr\), which are the headroom of mantissa vectors \(\bar b\) and \(\bar c\) respectively, are required by this function.

In addition to \(a\_exp\), this function computes \(b\_shr\) and \(c\_shr\), signed arithmetic right-shifts applied to the mantissa vectors \(\bar b\) and \(\bar c\) so that the add-like \(\oplus\) operation can be applied.

This function chooses \(a\_exp\) to be the minimum exponent which can be used to express both \(\bar B\) and \(\bar C\) without saturation of their mantissas, and which leaves both \(\bar b\) and \(\bar c\) with at least extra_operand_hr bits of headroom. The shifts \(b\_shr\) and \(c\_shr\) are derived from \(a\_exp\) using \(b\_exp\) and \(c\_exp\).
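One way to read the rule above is sketched here in C. This is a sketch that follows the prose description only; the library's actual implementation may differ. The name sketch_2vec_prepare() is hypothetical, and plain int stands in for exponent_t, right_shift_t and headroom_t.

```c
#include <assert.h>

/* Hypothetical sketch (not the library function): choose the smallest
 * output exponent that can represent both inputs without saturation while
 * leaving extra_operand_hr bits of headroom, then derive the shifts. */
static void sketch_2vec_prepare(int *a_exp, int *b_shr, int *c_shr,
                                int b_exp, int c_exp,
                                int b_hr, int c_hr,
                                int extra_operand_hr)
{
    assert(b_hr >= 0 && c_hr >= 0 && extra_operand_hr >= 0);

    /* Smallest exponent at which each input still fits, i.e. the exponent
     * it would have if all of its headroom were removed. */
    int b_min_exp = b_exp - b_hr;
    int c_min_exp = c_exp - c_hr;

    /* The output exponent must satisfy both inputs, so take the larger,
     * then reserve the requested extra headroom. */
    int min_exp = (b_min_exp > c_min_exp) ? b_min_exp : c_min_exp;
    *a_exp = min_exp + extra_operand_hr;

    /* Signed right-shifts that move each input to the common exponent. */
    *b_shr = *a_exp - b_exp;
    *c_shr = *a_exp - c_exp;
}
```

With b_exp = -10, b_hr = 3, c_exp = -12, c_hr = 0 and extra_operand_hr = 1, this yields a_exp = -11, b_shr = -1 and c_shr = 1, which leaves the shifted inputs with 2 bits and 1 bit of headroom respectively.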

Adjusting Output Exponents

If a specific output exponent desired_exp is needed for the result (e.g. for emulating fixed-point arithmetic), the b_shr and c_shr produced by this function can be adjusted according to the following:

exponent_t desired_exp = ...; // Value known a priori
right_shift_t new_b_shr = b_shr + (desired_exp - a_exp);
right_shift_t new_c_shr = c_shr + (desired_exp - a_exp);

When applying the above adjustment, the following conditions should be maintained:

  • b_hr + b_shr >= 0

  • c_hr + c_shr >= 0

Be aware that using smaller values than strictly necessary for b_shr and c_shr can result in saturation, and using larger values may result in unnecessary underflows or loss of precision.
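The adjustment and its conditions can be combined into one step, as in the following sketch. The helper name sketch_adjust_shifts() is hypothetical, plain int stands in for the library types, and the conditions are assumed to apply to the adjusted shift values.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not a library function): move the outputs of a
 * prepare call to desired_exp. Returns false and leaves the outputs
 * untouched if an input would have to be shifted left by more than its
 * headroom, which could cause saturation. */
static bool sketch_adjust_shifts(int *b_shr, int *c_shr, int *a_exp,
                                 int desired_exp, int b_hr, int c_hr)
{
    assert(b_hr >= 0 && c_hr >= 0);

    int delta = desired_exp - *a_exp;
    int new_b_shr = *b_shr + delta;
    int new_c_shr = *c_shr + delta;

    /* Conditions from the text: b_hr + b_shr >= 0 and c_hr + c_shr >= 0 */
    if (b_hr + new_b_shr < 0 || c_hr + new_c_shr < 0) {
        return false;
    }

    *b_shr = new_b_shr;
    *c_shr = new_c_shr;
    *a_exp = desired_exp;
    return true;
}
```

A caller that receives false can either accept the exponent chosen by the prepare function or pick a larger desired_exp.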

Notes

  • If \(b\_hr\) or \(c\_hr\) are unknown, they can be calculated using the appropriate headroom function (e.g. vect_complex_s16_headroom() for complex 16-bit vectors) or the value 0 can always be safely used (but may result in reduced precision).

Parameters:
  • a_exp[out] Output exponent associated with output mantissa vector \(\bar a\)

  • b_shr[out] Signed arithmetic right-shift to be applied to elements of \(\bar b\). Used by the function which computes the output mantissas \(\bar a\)

  • c_shr[out] Signed arithmetic right-shift to be applied to elements of \(\bar c\). Used by the function which computes the output mantissas \(\bar a\)

  • b_exp[in] Exponent of BFP vector \(\bar b\)

  • c_exp[in] Exponent of BFP vector \(\bar c\)

  • b_hr[in] Headroom of BFP vector \(\bar b\)

  • c_hr[in] Headroom of BFP vector \(\bar c\)

  • extra_operand_hr[in] The minimum amount of headroom that will be left in the mantissa vectors following the arithmetic right-shift, as required by some operations.

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Vector API$$$16-Bit complex vector prepare functions£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/vect/vect_complex_s16_prepare.html#bit-vomplex-vector-prepare-functions
group vect_complex_s16_prepare_api

Defines

vect_complex_s16_add_prepare

Obtain the output exponent and shifts required for a call to vect_complex_s16_add().

The logic for computing the shifts and exponents of vect_complex_s16_add() is identical to that for vect_s32_add().

This macro is provided as a convenience to developers and to make the code more readable.

vect_complex_s16_add_scalar_prepare

Obtain the output exponent and shifts required for a call to vect_complex_s16_add_scalar().

The logic for computing the shifts and exponents of vect_complex_s16_add_scalar() is identical to that for vect_s32_add().

This macro is provided as a convenience to developers and to make the code more readable.

vect_complex_s16_conj_mul_prepare

Obtain the output exponent and shifts required for a call to vect_complex_s16_conj_mul().

The logic for computing the shifts and exponents of vect_complex_s16_conj_mul() is identical to that for vect_complex_s16_mul().

This macro is provided as a convenience to developers and to make the code more readable.

vect_complex_s16_nmacc_prepare

Obtain the output exponent and shifts required for a call to vect_complex_s16_nmacc().

The logic for computing the shifts and exponents of vect_complex_s16_nmacc() is identical to that for vect_complex_s16_macc().

This macro is provided as a convenience to developers and to make the code more readable.

vect_complex_s16_conj_macc_prepare

Obtain the output exponent and shifts required for a call to vect_complex_s16_conj_macc().

The logic for computing the shifts and exponents of vect_complex_s16_conj_macc() is identical to that for vect_complex_s16_macc().

This macro is provided as a convenience to developers and to make the code more readable.

vect_complex_s16_conj_nmacc_prepare

Obtain the output exponent and shifts required for a call to vect_complex_s16_conj_nmacc().

The logic for computing the shifts and exponents of vect_complex_s16_conj_nmacc() is identical to that for vect_complex_s16_macc().

This macro is provided as a convenience to developers and to make the code more readable.

vect_complex_s16_mag_prepare

Obtain the output exponent and shifts required for a call to vect_complex_s16_mag().

The logic for computing the shifts and exponents of vect_complex_s16_mag() is identical to that for vect_complex_s32_mag().

This macro is provided as a convenience to developers and to make the code more readable.

vect_complex_s16_real_scale_prepare

Obtain the output exponent and shifts required for a call to vect_complex_s16_real_scale().

The logic for computing the shifts and exponents of vect_complex_s16_real_scale() is identical to that for vect_s32_scale().

This macro is provided as a convenience to developers and to make the code more readable.

vect_complex_s16_scale_prepare

Obtain the output exponent and shifts required for a call to vect_complex_s16_scale().

The logic for computing the shifts and exponents of vect_complex_s16_scale() is identical to that for vect_complex_s32_mul().

This macro is provided as a convenience to developers and to make the code more readable.

vect_complex_s16_sub_prepare

Obtain the output exponent and shifts required for a call to vect_complex_s16_sub().

The logic for computing the shifts and exponents of vect_complex_s16_sub() is identical to that for vect_s32_add().

This macro is provided as a convenience to developers and to make the code more readable.

Functions

void vect_complex_s16_macc_prepare(exponent_t *new_acc_exp, right_shift_t *acc_shr, right_shift_t *bc_sat, const exponent_t acc_exp, const exponent_t b_exp, const exponent_t c_exp, const headroom_t acc_hr, const headroom_t b_hr, const headroom_t c_hr)

Obtain the output exponent and shifts needed by vect_complex_s16_macc().

This function is used in conjunction with vect_complex_s16_macc() to perform an element-wise multiply-accumulate of complex 16-bit BFP vectors.

This function computes new_acc_exp, acc_shr and bc_sat, which are selected to maximize precision in the resulting accumulator vector without causing saturation of final or intermediate values. Normally the caller will pass these outputs to their corresponding inputs of vect_complex_s16_macc().

acc_exp is the exponent associated with the accumulator mantissa vector \(\bar a\) prior to the operation, whereas new_acc_exp is the exponent corresponding to the updated accumulator vector.

b_exp and c_exp are the exponents associated with the complex input mantissa vectors \(\bar b\) and \(\bar c\) respectively.

acc_hr, b_hr and c_hr are the headrooms of \(\bar a\), \(\bar b\) and \(\bar c\) respectively. If the headroom of any of these vectors is unknown, it can be obtained by calling vect_complex_s16_headroom(). Alternatively, the value 0 can always be safely used (but may result in reduced precision).

Adjusting Output Exponents

If a specific output exponent desired_exp is needed for the result (e.g. for emulating fixed-point arithmetic), the acc_shr and bc_sat produced by this function can be adjusted according to the following:

// Presumed to be set somewhere
exponent_t acc_exp, b_exp, c_exp;
headroom_t acc_hr, b_hr, c_hr;
exponent_t desired_exp;

...

// Call prepare
right_shift_t acc_shr, bc_sat;
vect_complex_s16_macc_prepare(&acc_exp, &acc_shr, &bc_sat, 
                                  acc_exp, b_exp, c_exp,
                                  acc_hr, b_hr, c_hr);

// Modify results
right_shift_t mant_shr = desired_exp - acc_exp;
acc_exp += mant_shr;
acc_shr += mant_shr;
bc_sat  += mant_shr;

// acc_shr and bc_sat may now be used in a call to vect_complex_s16_macc() 

When applying the above adjustment, the following conditions should be maintained:

  • bc_sat >= 0 (bc_sat is an unsigned right-shift)

  • acc_shr > -acc_hr (Shifting any further left may cause saturation)

It is up to the user to ensure any such modification does not result in saturation or unacceptable loss of precision.

Parameters:
  • new_acc_exp[out] Exponent associated with output mantissa vector \(\bar a\) (after macc)

  • acc_shr[out] Signed arithmetic right-shift used for \(\bar a\) in vect_complex_s16_macc()

  • bc_sat[out] Unsigned arithmetic right-shift applied to the product of elements \(b_k\) and \(c_k\) in vect_complex_s16_macc()

  • acc_exp[in] Exponent associated with input mantissa vector \(\bar a\) (before macc)

  • b_exp[in] Exponent associated with input mantissa vector \(\bar b\)

  • c_exp[in] Exponent associated with input mantissa vector \(\bar c\)

  • acc_hr[in] Headroom of input mantissa vector \(\bar a\) (before macc)

  • b_hr[in] Headroom of input mantissa vector \(\bar b\)

  • c_hr[in] Headroom of input mantissa vector \(\bar c\)

void vect_complex_s16_mul_prepare(exponent_t *a_exp, right_shift_t *a_shr, const exponent_t b_exp, const exponent_t c_exp, const headroom_t b_hr, const headroom_t c_hr)

Obtain the output exponent and output shift used by vect_complex_s16_mul() and vect_complex_s16_conj_mul().

This function is used in conjunction with vect_complex_s16_mul() to perform a complex element-wise multiplication of two complex 16-bit BFP vectors.

This function computes a_exp and a_shr.

a_exp is the exponent associated with mantissa vector \(\bar a\), and must be chosen to be large enough to avoid overflow when elements of \(\bar a\) are computed. To maximize precision, this function chooses a_exp to be the smallest exponent known to avoid saturation (see exception below). The a_exp chosen by this function is derived from the exponents and headrooms associated with the input vectors.

a_shr is the shift parameter required by vect_complex_s16_mul() to achieve the chosen output exponent a_exp.

b_exp and c_exp are the exponents associated with the input mantissa vectors \(\bar b\) and \(\bar c\) respectively.

b_hr and c_hr are the headroom of \(\bar b\) and \(\bar c\) respectively. If the headroom of \(\bar b\) or \(\bar c\) is unknown, they can be obtained by calling vect_complex_s16_headroom(). Alternatively, the value 0 can always be safely used (but may result in reduced precision).

Adjusting Output Exponents

If a specific output exponent desired_exp is needed for the result (e.g. for emulating fixed-point arithmetic), the a_shr produced by this function can be adjusted according to the following:

exponent_t desired_exp = ...; // Value known a priori
right_shift_t new_a_shr = a_shr + (desired_exp - a_exp);

When applying the above adjustment, the following condition should be maintained:

  • new_a_shr >= 0

Be aware that using smaller values than strictly necessary for a_shr can result in saturation, and using larger values may result in unnecessary underflows or loss of precision.

Notes

  • Using the outputs of this function, an output mantissa which would otherwise be INT16_MIN will instead saturate to -INT16_MAX. This is due to the symmetric saturation logic employed by the VPU and is a hardware feature. This is a corner case which is usually unlikely and results in 1 LSb of error when it occurs.

Parameters:
  • a_exp[out] Exponent associated with output mantissa vector \(\bar a\)

  • a_shr[out] Unsigned arithmetic right-shift for \(\bar a\) used by vect_complex_s16_mul()

  • b_exp[in] Exponent associated with input mantissa vector \(\bar b\)

  • c_exp[in] Exponent associated with input mantissa vector \(\bar c\)

  • b_hr[in] Headroom of input mantissa vector \(\bar b\)

  • c_hr[in] Headroom of input mantissa vector \(\bar c\)

void vect_complex_s16_real_mul_prepare(exponent_t *a_exp, right_shift_t *a_shr, const exponent_t b_exp, const exponent_t c_exp, const headroom_t b_hr, const headroom_t c_hr)

Obtain the output exponent and output shift used by vect_complex_s16_real_mul().

This function is used in conjunction with vect_complex_s16_real_mul() to perform a complex element-wise multiplication of a complex 16-bit BFP vector by a real 16-bit vector.

This function computes a_exp and a_shr.

a_exp is the exponent associated with mantissa vector \(\bar a\), and must be chosen to be large enough to avoid overflow when elements of \(\bar a\) are computed. To maximize precision, this function chooses a_exp to be the smallest exponent known to avoid saturation (see exception below). The a_exp chosen by this function is derived from the exponents and headrooms associated with the input vectors.

a_shr is the shift parameter required by vect_complex_s16_real_mul() to achieve the chosen output exponent a_exp.

b_exp and c_exp are the exponents associated with the input mantissa vectors \(\bar b\) and \(\bar c\) respectively.

b_hr and c_hr are the headroom of \(\bar b\) and \(\bar c\) respectively. If the headroom of \(\bar b\) or \(\bar c\) is unknown, they can be obtained by calling vect_complex_s16_headroom(). Alternatively, the value 0 can always be safely used (but may result in reduced precision).

Adjusting Output Exponents

If a specific output exponent desired_exp is needed for the result (e.g. for emulating fixed-point arithmetic), the a_shr produced by this function can be adjusted according to the following:

exponent_t desired_exp = ...; // Value known a priori
right_shift_t new_a_shr = a_shr + (desired_exp - a_exp);

When applying the above adjustment, the following condition should be maintained:

  • new_a_shr >= 0

Be aware that using smaller values than strictly necessary for a_shr can result in saturation, and using larger values may result in unnecessary underflows or loss of precision.

Notes

  • Using the outputs of this function, an output mantissa which would otherwise be INT16_MIN will instead saturate to -INT16_MAX. This is due to the symmetric saturation logic employed by the VPU and is a hardware feature. This is a corner case which is usually unlikely and results in 1 LSb of error when it occurs.

Parameters:
  • a_exp[out] Exponent associated with output mantissa vector \(\bar a\)

  • a_shr[out] Unsigned arithmetic right-shift for \(\bar a\) used by vect_complex_s16_real_mul()

  • b_exp[in] Exponent associated with input mantissa vector \(\bar b\)

  • c_exp[in] Exponent associated with input mantissa vector \(\bar c\)

  • b_hr[in] Headroom of input mantissa vector \(\bar b\)

  • c_hr[in] Headroom of input mantissa vector \(\bar c\)
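The output-exponent adjustment described above can be sketched as a small helper. This is only an illustration under stated assumptions, not library code: the two typedefs are local stand-ins for the library's exponent_t and right_shift_t (assumed here to be plain int), and adjust_output_shr() is a hypothetical name. The helper refuses the adjustment when it would violate new_a_shr >= 0.

```c
#include <assert.h>

/* Local stand-ins so the sketch compiles on its own (assumed to be int). */
typedef int exponent_t;
typedef int right_shift_t;

/* Hypothetical helper: retarget the outputs of a *_prepare() call so the
 * result uses desired_exp. Returns 1 on success, or 0 (leaving the outputs
 * untouched) if the adjusted shift would become negative. */
static int adjust_output_shr(exponent_t *a_exp, right_shift_t *a_shr,
                             exponent_t desired_exp)
{
    assert(a_exp != 0 && a_shr != 0);
    right_shift_t new_a_shr = *a_shr + (desired_exp - *a_exp);
    if (new_a_shr < 0)
        return 0;              /* would violate new_a_shr >= 0 */
    *a_shr = new_a_shr;
    *a_exp = desired_exp;
    return 1;
}
```

In use, the helper would be called between the prepare function and vect_complex_s16_real_mul(), falling back to the unadjusted values when it returns 0.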

void vect_complex_s16_squared_mag_prepare(exponent_t *a_exp, right_shift_t *a_shr, const exponent_t b_exp, const headroom_t b_hr)

Obtain the output exponent and input shift used by vect_complex_s16_squared_mag().

This function is used in conjunction with vect_complex_s16_squared_mag() to compute the squared magnitude of each element of a complex 16-bit BFP vector.

This function computes a_exp and a_shr.

a_exp is the exponent associated with mantissa vector \(\bar a\), and is chosen to maximize precision when elements of \(\bar a\) are computed. The a_exp chosen by this function is derived from the exponent and headroom associated with the input vector.

a_shr is the shift parameter required by vect_complex_s16_squared_mag() to achieve the chosen output exponent a_exp.

b_exp is the exponent associated with the input mantissa vector \(\bar b\).

b_hr is the headroom of \(\bar b\). If the headroom of \(\bar b\) is unknown it can be calculated using vect_complex_s16_headroom(). Alternatively, the value 0 can always be safely used (but may result in reduced precision).

Adjusting Output Exponents

If a specific output exponent desired_exp is needed for the result (e.g. for emulating fixed-point arithmetic), the a_shr produced by this function can be adjusted according to the following:

exponent_t a_exp;
right_shift_t a_shr;
vect_complex_s16_squared_mag_prepare(&a_exp, &a_shr, b_exp, b_hr);
exponent_t desired_exp = ...; // Value known a priori
a_shr = a_shr + (desired_exp - a_exp);
a_exp = desired_exp;

When applying the above adjustment, the following condition should be maintained:

  • a_shr >= 0

Using larger values than strictly necessary for a_shr may result in unnecessary underflows or loss of precision.

Parameters:
  • a_exp[out] Output exponent associated with output mantissa vector \(\bar a\)

  • a_shr[out] Unsigned arithmetic right-shift for \(\bar a\) used by vect_complex_s16_squared_mag()

  • b_exp[in] Exponent associated with input mantissa vector \(\bar b\)

  • b_hr[in] Headroom of input mantissa vector \(\bar b\)

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Vector API$$$32-Bit complex vector prepare functions£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/vect/vect_complex_s32_prepare.html#bit-complex-vector-prepare-functions
group vect_complex_s32_prepare_api

Defines

vect_complex_s32_add_prepare

Obtain the output exponent and shifts required for a call to vect_complex_s32_add().

The logic for computing the shifts and exponents of vect_complex_s32_add() is identical to that for vect_s32_add().

This macro is provided as a convenience to developers and to make the code more readable.

vect_complex_s32_add_scalar_prepare

Obtain the output exponent and shifts required for a call to vect_complex_s32_add_scalar().

The logic for computing the shifts and exponents of vect_complex_s32_add_scalar() is identical to that for vect_s32_add().

This macro is provided as a convenience to developers and to make the code more readable.

vect_complex_s32_conj_mul_prepare

Obtain the output exponent and shifts required for a call to vect_complex_s32_conj_mul().

The logic for computing the shifts and exponents of vect_complex_s32_conj_mul() is identical to that for vect_complex_s32_mul().

This macro is provided as a convenience to developers and to make the code more readable.

vect_complex_s32_nmacc_prepare

Obtain the output exponent and shifts required for a call to vect_complex_s32_nmacc().

The logic for computing the shifts and exponents of vect_complex_s32_nmacc() is identical to that for vect_complex_s32_macc_prepare().

This macro is provided as a convenience to developers and to make the code more readable.

vect_complex_s32_conj_macc_prepare

Obtain the output exponent and shifts required for a call to vect_complex_s32_conj_macc().

The logic for computing the shifts and exponents of vect_complex_s32_conj_macc() is identical to that for vect_complex_s32_macc_prepare().

This macro is provided as a convenience to developers and to make the code more readable.

vect_complex_s32_conj_nmacc_prepare

Obtain the output exponent and shifts required for a call to vect_complex_s32_conj_nmacc().

The logic for computing the shifts and exponents of vect_complex_s32_conj_nmacc() is identical to that for vect_complex_s32_macc_prepare().

This macro is provided as a convenience to developers and to make the code more readable.

vect_complex_s32_real_scale_prepare

Obtain the output exponent and shifts required for a call to vect_complex_s32_real_scale().

The logic for computing the shifts and exponents of vect_complex_s32_real_scale() is identical to that for vect_s32_mul().

This macro is provided as a convenience to developers and to make the code more readable.

vect_complex_s32_sub_prepare

Obtain the output exponent and shifts required for a call to vect_complex_s32_sub().

The logic for computing the shifts and exponents of vect_complex_s32_sub() is identical to that for vect_s32_add().

This macro is provided as a convenience to developers and to make the code more readable.

Functions

void vect_complex_s32_macc_prepare(exponent_t *new_acc_exp, right_shift_t *acc_shr, right_shift_t *b_shr, right_shift_t *c_shr, const exponent_t acc_exp, const exponent_t b_exp, const exponent_t c_exp, const exponent_t acc_hr, const headroom_t b_hr, const headroom_t c_hr)

Obtain the output exponent and shifts needed by vect_complex_s32_macc().

This function is used in conjunction with vect_complex_s32_macc() to perform an element-wise multiply-accumulate of complex 32-bit BFP vectors.

This function computes new_acc_exp, acc_shr, b_shr and c_shr, which are selected to maximize precision in the resulting accumulator vector without causing saturation of final or intermediate values. Normally the caller will pass these outputs to their corresponding inputs of vect_complex_s32_macc().

acc_exp is the exponent associated with the accumulator mantissa vector \(\bar a\) prior to the operation, whereas new_acc_exp is the exponent corresponding to the updated accumulator vector.

b_exp and c_exp are the exponents associated with the complex input mantissa vectors \(\bar b\) and \(\bar c\) respectively.

acc_hr, b_hr and c_hr are the headrooms of \(\bar a\), \(\bar b\) and \(\bar c\) respectively. If the headroom of any of these vectors is unknown, it can be obtained by calling vect_complex_s32_headroom(). Alternatively, the value 0 can always be safely used (but may result in reduced precision).

Adjusting Output Exponents

If a specific output exponent desired_exp is needed for the result (e.g. for emulating fixed-point arithmetic), the acc_shr, b_shr and c_shr produced by this function can be adjusted according to the following:

// Presumed to be set somewhere
exponent_t acc_exp, b_exp, c_exp;
headroom_t acc_hr, b_hr, c_hr;
exponent_t desired_exp;

...

// Call prepare
right_shift_t acc_shr, b_shr, c_shr;
vect_complex_s32_macc_prepare(&acc_exp, &acc_shr, &b_shr, &c_shr, 
                                  acc_exp, b_exp, c_exp,
                                  acc_hr, b_hr, c_hr);

// Modify results
right_shift_t mant_shr = desired_exp - acc_exp;
acc_exp += mant_shr;
acc_shr += mant_shr;
b_shr  += mant_shr;
c_shr  += mant_shr;

// acc_shr, b_shr and c_shr may now be used in a call to vect_complex_s32_macc() 

When applying the above adjustment, the following conditions should be maintained:

  • acc_shr > -acc_hr (Shifting any further left may cause saturation)

  • b_shr >= -b_hr (Shifting any further left may cause saturation)

  • c_shr >= -c_hr (Shifting any further left may cause saturation)

It is up to the user to ensure any such modification does not result in saturation or unacceptable loss of precision.

Parameters:
  • new_acc_exp[out] Exponent associated with output mantissa vector \(\bar a\) (after macc)

  • acc_shr[out] Signed arithmetic right-shift used for \(\bar a\) in vect_complex_s32_macc()

  • b_shr[out] Signed arithmetic right-shift used for \(\bar b\) in vect_complex_s32_macc()

  • c_shr[out] Signed arithmetic right-shift used for \(\bar c\) in vect_complex_s32_macc()

  • acc_exp[in] Exponent associated with input mantissa vector \(\bar a\) (before macc)

  • b_exp[in] Exponent associated with input mantissa vector \(\bar b\)

  • c_exp[in] Exponent associated with input mantissa vector \(\bar c\)

  • acc_hr[in] Headroom of input mantissa vector \(\bar a\) (before macc)

  • b_hr[in] Headroom of input mantissa vector \(\bar b\)

  • c_hr[in] Headroom of input mantissa vector \(\bar c\)
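Since the caller is responsible for keeping the adjusted shifts within bounds, the three conditions listed above can be checked before calling vect_complex_s32_macc(). The following is a minimal sketch under stated assumptions: the typedefs are local stand-ins for the library types (assumed to be int and unsigned), and macc_shifts_ok() is a hypothetical helper, not part of the library.

```c
#include <assert.h>

/* Local stand-ins so the sketch compiles on its own (assumed widths). */
typedef int right_shift_t;
typedef unsigned headroom_t;

/* Hypothetical helper: returns 1 if the (possibly adjusted) shifts satisfy
 *   acc_shr >  -acc_hr
 *   b_shr   >= -b_hr
 *   c_shr   >= -c_hr
 * and 0 otherwise. A negative right-shift is a left-shift, which may
 * saturate once it exceeds the available headroom. */
static int macc_shifts_ok(right_shift_t acc_shr, right_shift_t b_shr,
                          right_shift_t c_shr, headroom_t acc_hr,
                          headroom_t b_hr, headroom_t c_hr)
{
    return acc_shr >  -(int)acc_hr
        && b_shr   >= -(int)b_hr
        && c_shr   >= -(int)c_hr;
}
```

A caller could wrap the check in assert() during development, or fall back to the unmodified outputs of the prepare function when it fails.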

void vect_complex_s32_mag_prepare(exponent_t *a_exp, right_shift_t *b_shr, const exponent_t b_exp, const headroom_t b_hr)

Obtain the output exponent and input shift used by vect_complex_s32_mag() and vect_complex_s16_mag().

This function is used in conjunction with vect_complex_s32_mag() to compute the magnitude of each element of a complex 32-bit BFP vector.

This function computes a_exp and b_shr.

a_exp is the exponent associated with mantissa vector \(\bar a\), and is chosen to maximize precision when elements of \(\bar a\) are computed. The a_exp chosen by this function is derived from the exponent and headroom associated with the input vector.

b_shr is the shift parameter required by vect_complex_s32_mag() to achieve the chosen output exponent a_exp.

b_exp is the exponent associated with the input mantissa vector \(\bar b\).

b_hr is the headroom of \(\bar b\). If the headroom of \(\bar b\) is unknown it can be calculated using vect_complex_s32_headroom(). Alternatively, the value 0 can always be safely used (but may result in reduced precision).

Adjusting Output Exponents

If a specific output exponent desired_exp is needed for the result (e.g. for emulating fixed-point arithmetic), the b_shr produced by this function can be adjusted according to the following:

exponent_t desired_exp = ...; // Value known a priori
right_shift_t new_b_shr = b_shr + (desired_exp - a_exp);

When applying the above adjustment, the following condition should be maintained:

  • b_hr + b_shr >= 0

Using larger values than strictly necessary for b_shr may result in unnecessary underflows or loss of precision.

Parameters:
  • a_exp[out] Output exponent associated with output mantissa vector \(\bar a\)

  • b_shr[out] Signed arithmetic right-shift for \(\bar b\) used by vect_complex_s32_mag()

  • b_exp[in] Exponent associated with input mantissa vector \(\bar b\)

  • b_hr[in] Headroom of input mantissa vector \(\bar b\)
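For the prepare functions that produce a signed input shift, the adjustment differs from the unsigned case in that a negative b_shr (a left shift) is acceptable as long as it does not exceed the headroom. A minimal sketch of that check follows. The typedefs are local stand-ins for the library types (assumed to be int and unsigned) and retarget_input_shr() is a hypothetical name used only for illustration.

```c
#include <assert.h>

/* Local stand-ins so the sketch compiles on its own (assumed widths). */
typedef int exponent_t;
typedef int right_shift_t;
typedef unsigned headroom_t;

/* Hypothetical helper: adjust b_shr so the output exponent becomes
 * desired_exp. Returns 1 and updates the outputs when
 * b_hr + new_b_shr >= 0; returns 0 and changes nothing otherwise. */
static int retarget_input_shr(exponent_t *a_exp, right_shift_t *b_shr,
                              headroom_t b_hr, exponent_t desired_exp)
{
    assert(a_exp != 0 && b_shr != 0);
    right_shift_t new_b_shr = *b_shr + (desired_exp - *a_exp);
    if ((int)b_hr + new_b_shr < 0)
        return 0;   /* left shift larger than the headroom may saturate */
    *b_shr = new_b_shr;
    *a_exp = desired_exp;
    return 1;
}
```

Lowering the exponent by two with three bits of headroom succeeds (b_shr becomes -2), while lowering it further than the headroom allows is rejected.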

void vect_complex_s32_mul_prepare(exponent_t *a_exp, right_shift_t *b_shr, right_shift_t *c_shr, const exponent_t b_exp, const exponent_t c_exp, const headroom_t b_hr, const headroom_t c_hr)

Obtain the output exponent and input shifts used by vect_complex_s32_mul() and vect_complex_s32_conj_mul().

This function is used in conjunction with vect_complex_s32_mul() to perform a complex element-wise multiplication of two complex 32-bit BFP vectors.

This function computes a_exp, b_shr and c_shr.

a_exp is the exponent associated with mantissa vector \(\bar a\), and must be chosen to be large enough to avoid overflow when elements of \(\bar a\) are computed. To maximize precision, this function chooses a_exp to be the smallest exponent known to avoid saturation (see exception below). The a_exp chosen by this function is derived from the exponents and headrooms associated with the input vectors.

b_shr and c_shr are the shift parameters required by vect_complex_s32_mul() to achieve the chosen output exponent a_exp.

b_exp and c_exp are the exponents associated with the input mantissa vectors \(\bar b\) and \(\bar c\) respectively.

b_hr and c_hr are the headroom of \(\bar b\) and \(\bar c\) respectively. If the headroom of \(\bar b\) or \(\bar c\) is unknown, they can be obtained by calling vect_complex_s32_headroom(). Alternatively, the value 0 can always be safely used (but may result in reduced precision).

Adjusting Output Exponents

If a specific output exponent desired_exp is needed for the result (e.g. for emulating fixed-point arithmetic), the b_shr and c_shr produced by this function can be adjusted according to the following:

exponent_t desired_exp = ...; // Value known a priori
right_shift_t new_b_shr = b_shr + (desired_exp - a_exp);
right_shift_t new_c_shr = c_shr + (desired_exp - a_exp);

When applying the above adjustment, the following conditions should be maintained:

  • b_hr + b_shr >= 0

  • c_hr + c_shr >= 0

Be aware that using smaller values than strictly necessary for b_shr and c_shr can result in saturation, and using larger values may result in unnecessary underflows or loss of precision.

Notes

  • Using the outputs of this function, an output mantissa which would otherwise be INT32_MIN will instead saturate to -INT32_MAX. This is due to the symmetric saturation logic employed by the VPU and is a hardware feature. This is a corner case which is usually unlikely and results in 1 LSb of error when it occurs.

Parameters:
  • a_exp[out] Exponent associated with output mantissa vector \(\bar a\)

  • b_shr[out] Signed arithmetic right-shift for \(\bar b\) used by vect_complex_s32_mul()

  • c_shr[out] Signed arithmetic right-shift for \(\bar c\) used by vect_complex_s32_mul()

  • b_exp[in] Exponent associated with input mantissa vector \(\bar b\)

  • c_exp[in] Exponent associated with input mantissa vector \(\bar c\)

  • b_hr[in] Headroom of input mantissa vector \(\bar b\)

  • c_hr[in] Headroom of input mantissa vector \(\bar c\)

void vect_complex_s32_real_mul_prepare(exponent_t *a_exp, right_shift_t *b_shr, right_shift_t *c_shr, const exponent_t b_exp, const exponent_t c_exp, const headroom_t b_hr, const headroom_t c_hr)

Obtain the output exponent and input shifts used by vect_complex_s32_real_mul().

This function is used in conjunction with vect_complex_s32_real_mul() to perform the element-wise multiplication of a complex 32-bit BFP vector by a real 32-bit BFP vector.

This function computes a_exp, b_shr and c_shr.

a_exp is the exponent associated with mantissa vector \(\bar a\), and must be chosen to be large enough to avoid overflow when elements of \(\bar a\) are computed. To maximize precision, this function chooses a_exp to be the smallest exponent known to avoid saturation (see exception below). The a_exp chosen by this function is derived from the exponents and headrooms associated with the input vectors.

b_shr and c_shr are the shift parameters required by vect_complex_s32_real_mul() to achieve the chosen output exponent a_exp.

b_exp and c_exp are the exponents associated with the input mantissa vectors \(\bar b\) and \(\bar c\) respectively.

b_hr and c_hr are the headroom of \(\bar b\) and \(\bar c\) respectively. If the headroom of \(\bar b\) or \(\bar c\) is unknown, they can be obtained by calling vect_complex_s32_headroom(). Alternatively, the value 0 can always be safely used (but may result in reduced precision).

Adjusting Output Exponents

If a specific output exponent desired_exp is needed for the result (e.g. for emulating fixed-point arithmetic), the b_shr and c_shr produced by this function can be adjusted according to the following:

exponent_t desired_exp = ...; // Value known a priori
right_shift_t new_b_shr = b_shr + (desired_exp - a_exp);
right_shift_t new_c_shr = c_shr + (desired_exp - a_exp);

When applying the above adjustment, the following conditions should be maintained:

  • b_hr + b_shr >= 0

  • c_hr + c_shr >= 0

Be aware that using smaller values than strictly necessary for b_shr and c_shr can result in saturation, and using larger values may result in unnecessary underflows or loss of precision.

Notes

  • Using the outputs of this function, an output mantissa which would otherwise be INT32_MIN will instead saturate to -INT32_MAX. This is due to the symmetric saturation logic employed by the VPU and is a hardware feature. This is a corner case which is usually unlikely and results in 1 LSb of error when it occurs.

Parameters:
  • a_exp[out] Output exponent associated with \(\bar a\)

  • b_shr[out] Signed arithmetic right-shift for \(\bar b\) used by vect_complex_s32_real_mul()

  • c_shr[out] Signed arithmetic right-shift for \(\bar c\) used by vect_complex_s32_real_mul()

  • b_exp[in] Exponent associated with \(\bar b\)

  • c_exp[in] Exponent associated with \(\bar c\)

  • b_hr[in] Headroom of mantissa vector \(\bar b\)

  • c_hr[in] Headroom of mantissa vector \(\bar c\)

void vect_complex_s32_scale_prepare(exponent_t *a_exp, right_shift_t *b_shr, right_shift_t *c_shr, const exponent_t b_exp, const exponent_t c_exp, const headroom_t b_hr, const headroom_t c_hr)

Obtain the output exponent and input shifts used by vect_complex_s32_scale().

This function is used in conjunction with vect_complex_s32_scale() to perform a complex multiplication of a complex 32-bit BFP vector by a complex 32-bit scalar.

This function computes a_exp, b_shr and c_shr.

a_exp is the exponent associated with mantissa vector \(\bar a\), and must be chosen to be large enough to avoid overflow when elements of \(\bar a\) are computed. To maximize precision, this function chooses a_exp to be the smallest exponent known to avoid saturation (see exception below). The a_exp chosen by this function is derived from the exponents and headrooms associated with the input vectors.

b_shr and c_shr are the shift parameters required by vect_complex_s32_scale() to achieve the chosen output exponent a_exp.

b_exp and c_exp are the exponents associated with the input mantissa vectors \(\bar b\) and \(\bar c\) respectively.

b_hr and c_hr are the headroom of \(\bar b\) and \(\bar c\) respectively. If the headroom of \(\bar b\) or \(\bar c\) is unknown, they can be obtained by calling vect_complex_s32_headroom(). Alternatively, the value 0 can always be safely used (but may result in reduced precision).

Adjusting Output Exponents

If a specific output exponent desired_exp is needed for the result (e.g. for emulating fixed-point arithmetic), the b_shr and c_shr produced by this function can be adjusted according to the following:

exponent_t desired_exp = ...; // Value known a priori
right_shift_t new_b_shr = b_shr + (desired_exp - a_exp);
right_shift_t new_c_shr = c_shr + (desired_exp - a_exp);

When applying the above adjustment, the following conditions should be maintained:

  • b_hr + b_shr >= 0

  • c_hr + c_shr >= 0

Be aware that using smaller values than strictly necessary for b_shr and c_shr can result in saturation, and using larger values may result in unnecessary underflows or loss of precision.

Notes

  • Using the outputs of this function, an output mantissa which would otherwise be INT32_MIN will instead saturate to -INT32_MAX. This is due to the symmetric saturation logic employed by the VPU and is a hardware feature. This is a corner case which is usually unlikely and results in 1 LSb of error when it occurs.

Parameters:
  • a_exp[out] Exponent associated with output mantissa vector \(\bar a\)

  • b_shr[out] Signed arithmetic right-shift for \(\bar b\) used by vect_complex_s32_scale()

  • c_shr[out] Signed arithmetic right-shift for \(\bar c\) used by vect_complex_s32_scale()

  • b_exp[in] Exponent associated with input mantissa vector \(\bar b\)

  • c_exp[in] Exponent associated with input mantissa vector \(\bar c\)

  • b_hr[in] Headroom of input mantissa vector \(\bar b\)

  • c_hr[in] Headroom of input mantissa vector \(\bar c\)

void vect_complex_s32_squared_mag_prepare(exponent_t *a_exp, right_shift_t *b_shr, const exponent_t b_exp, const headroom_t b_hr)

Obtain the output exponent and input shift used by vect_complex_s32_squared_mag().

This function is used in conjunction with vect_complex_s32_squared_mag() to compute the squared magnitude of each element of a complex 32-bit BFP vector.

This function computes a_exp and b_shr.

a_exp is the exponent associated with mantissa vector \(\bar a\), and is chosen to maximize precision when elements of \(\bar a\) are computed. The a_exp chosen by this function is derived from the exponent and headroom associated with the input vector.

b_shr is the shift parameter required by vect_complex_s32_squared_mag() to achieve the chosen output exponent a_exp.

b_exp is the exponent associated with the input mantissa vector \(\bar b\).

b_hr is the headroom of \(\bar b\). If the headroom of \(\bar b\) is unknown it can be calculated using vect_complex_s32_headroom(). Alternatively, the value 0 can always be safely used (but may result in reduced precision).

Adjusting Output Exponents

If a specific output exponent desired_exp is needed for the result (e.g. for emulating fixed-point arithmetic), the b_shr produced by this function can be adjusted according to the following:

exponent_t desired_exp = ...; // Value known a priori
right_shift_t new_b_shr = b_shr + (desired_exp - a_exp);

When applying the above adjustment, the following condition should be maintained:

  • b_hr + b_shr >= 0

Using larger values than strictly necessary for b_shr may result in unnecessary underflows or loss of precision.

Parameters:
  • a_exp[out] Output exponent associated with output mantissa vector \(\bar a\)

  • b_shr[out] Signed arithmetic right-shift for \(\bar b\) used by vect_complex_s32_squared_mag()

  • b_exp[in] Exponent associated with input mantissa vector \(\bar b\)

  • b_hr[in] Headroom of input mantissa vector \(\bar b\)

void vect_complex_s32_sum_prepare(exponent_t *a_exp, right_shift_t *b_shr, const exponent_t b_exp, const headroom_t b_hr, const unsigned length)

Obtain the output exponent and input shift used by vect_complex_s32_sum().

This function is used in conjunction with vect_complex_s32_sum() to compute the sum of elements of a complex 32-bit BFP vector.

This function computes a_exp and b_shr.

a_exp is the exponent associated with the 64-bit mantissa \(a\) returned by vect_complex_s32_sum(), and must be chosen to be large enough to avoid saturation when \(a\) is computed. To maximize precision, this function chooses a_exp to be the smallest exponent known to avoid saturation (see exception below). The a_exp chosen by this function is derived from the exponent and headroom associated with the input vector.

b_shr is the shift parameter required by vect_complex_s32_sum() to achieve the chosen output exponent a_exp.

b_exp is the exponent associated with the input mantissa vector \(\bar b\).

b_hr is the headroom of \(\bar b\). If the headroom of \(\bar b\) is unknown it can be calculated using vect_complex_s32_headroom(). Alternatively, the value 0 can always be safely used (but may result in reduced precision).

length is the number of elements in the input mantissa vector \(\bar b\).

Adjusting Output Exponents

If a specific output exponent desired_exp is needed for the result (e.g. for emulating fixed-point arithmetic), the b_shr produced by this function can be adjusted according to the following:

exponent_t desired_exp = ...; // Value known a priori
right_shift_t new_b_shr = b_shr + (desired_exp - a_exp);

When applying the above adjustment, the following condition should be maintained:

  • b_hr + b_shr >= 0

Be aware that using smaller values than strictly necessary for b_shr can result in saturation, and using larger values may result in unnecessary underflows or loss of precision.

Parameters:
  • a_exp[out] Exponent associated with output mantissa \(a\)

  • b_shr[out] Signed arithmetic right-shift for \(\bar b\) used by vect_complex_s32_sum()

  • b_exp[in] Exponent associated with input mantissa vector \(\bar b\)

  • b_hr[in] Headroom of input mantissa vector \(\bar b\)

  • length[in] Number of elements in \(\bar b\)

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Vector API$$$32-Bit Vector Chunk (8-Element) API£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/vect/chunk_s32.html#bit-vector-chunk-8-element-api
group chunk32_api

Functions

int32_t chunk_s32_dot(const int32_t b[VPU_INT32_EPV], const q2_30 c[VPU_INT32_EPV])

Compute the inner product between two vector chunks.

This function computes the inner product of two vector chunks, \(\bar b\) and \(\bar c\).

Conceptually, elements of \(\bar b\) may have any number of fractional bits (int, fixed-point, mantissas of a BFP vector) so long as they’re all the same. Elements of \(\bar c\) are Q2.30 fixed-point values. Given that, the returned value \(a\) will have the same number of fractional bits as \(\bar b\).

Only the lowest 32 bits of the sum \(a\) are returned.

Operation Performed

\[\begin{aligned} & a \leftarrow \sum_{k=0}^{\mathtt{VPU\_INT32\_EPV}-1} \left( round\left( \frac{b_k\cdot{}c_k}{2^{30}} \right) \right) \end{aligned}\]

Parameters:
  • b[in] Input chunk \(\bar b\)

  • c[in] Input chunk \(\bar c\)

Returns:

\(a\)
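The operation above can be modelled in portable C for reference or host-side testing. This is a sketch, not the library implementation: sketch_chunk_dot() and SKETCH_EPV are hypothetical names, the chunk size of 8 is taken from the title of this section, and the rounding mode (add half, then arithmetic right shift) is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_EPV 8   /* assumed value of VPU_INT32_EPV */

/* Reference model of the inner product: each product is rounded back to
 * the fractional-bit count of b before being accumulated. Relies on >>
 * being an arithmetic shift for negative values, as on mainstream
 * compilers. */
static int32_t sketch_chunk_dot(const int32_t b[SKETCH_EPV],
                                const int32_t c[SKETCH_EPV])
{
    int64_t acc = 0;
    for (int k = 0; k < SKETCH_EPV; k++) {
        int64_t prod = (int64_t)b[k] * c[k];
        acc += (prod + ((int64_t)1 << 29)) >> 30;  /* round(b_k*c_k / 2^30) */
    }
    return (int32_t)(uint32_t)acc;   /* only the lowest 32 bits are kept */
}
```

With every c_k set to 1.0 in Q2.30 (1 << 30), the model reduces to a plain sum of the elements of b.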

void chunk_s32_log(q8_24 a[VPU_INT32_EPV], const int32_t b[VPU_INT32_EPV], const exponent_t b_exp)

Compute the natural log of a vector chunk of 32-bit values.

This function computes the natural logarithm of each of the 8 elements in vector chunk \(\bar b\). The result is returned as an 8-element chunk \(\bar a\) of Q8.24 values.

b_exp is the exponent associated with elements of \(\bar b\).

Any input \(b_k \le 0\) will result in a corresponding output \(a_k = \mathtt{INT32\_MIN}\).

Operation Performed

\[\begin{split}\begin{aligned} & a_k \leftarrow \ \begin{cases} log(b_k\cdot{}2^{\mathtt{b\_exp}}) & b_k > 0 \\ \mathtt{INT32\_MIN} & \text{otherwise} \\ \end{cases} \\ & \qquad\text{for }k \in {0..\mathtt{VPU\_INT32\_EPV}-1} \end{aligned}\end{split}\]

Parameters:
  • a[out] Output vector chunk \(\bar a\)

  • b[in] Input vector chunk \(\bar b\)

  • b_exp[in] Exponent associated with \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `b` or `a` is not double word-aligned (See Note: Vector Alignment)

void chunk_float_s32_log(q8_24 a[VPU_INT32_EPV], const float_s32_t b[VPU_INT32_EPV])

Compute the natural log of a vector chunk of float_s32_t.

This function computes the natural logarithm of each of the VPU_INT32_EPV elements in vector chunk \(\bar b\). The result is returned as an 8-element chunk \(\bar a\) of Q8.24 values.

Any input \(b_k \le 0\) will result in a corresponding output \(a_k = \mathtt{INT32\_MIN}\).

Operation Performed

\[\begin{split}\begin{aligned} & a_k \leftarrow \ \begin{cases} log(b_k) & b_k > 0 \\ \mathtt{INT32\_MIN} & \text{otherwise} \\ \end{cases} \\ & \qquad\text{for }k \in {0..\mathtt{VPU\_INT32\_EPV}-1} \end{aligned}\end{split}\]

Parameters:
  • a[out] Output vector chunk \(\bar a\)

  • b[in] Input vector chunk \(\bar b\)

Throws ET_LOAD_STORE:

Raised if `b` or `a` is not double word-aligned (See Note: Vector Alignment)

void chunk_q30_power_series(int32_t a[VPU_INT32_EPV], const q2_30 b[VPU_INT32_EPV], const int32_t c[], const unsigned term_count)

Compute a power series on a vector chunk of Q2.30 values.

This function is used to compute a power series summation on a vector chunk (VPU_INT32_EPV-element vector) \(\bar b\). \(\bar b\) contains Q2.30 values. \(\bar c\) is a vector containing coefficients to be multiplied by powers of \(\bar b\), and may have any associated exponent. The output is vector chunk \(\bar a\) and has the same exponent as \(\bar c\).

c[] is an array with shape (term_count, VPU_INT32_EPV), where the second axis contains the same value replicated across all VPU_INT32_EPV elements. That is, c[k][i] = c[k][j] for i and j in 0..(VPU_INT32_EPV-1). This is for performance reasons. (For the purpose of this explanation, \(\bar c\) is considered to be single-dimensional, without redundancy.)

Operation Performed

\[\begin{split}\begin{aligned} & b_{k,0} = 2^{30} \\ & b_{k,i} = round\left(\frac{b_{k,i-1}\cdot{}b_k}{2^{30}}\right) \\ & \qquad\text{for }i \in {1..(N-1)} \\ & a_k \leftarrow \sum_{i=0}^{N-1} round\left( \frac{b_{k,i}\cdot c_i}{2^{30}} \right) \\ & \qquad\text{for }k \in {0..\mathtt{VPU\_INT32\_EPV}-1} \end{aligned}\end{split}\]

Parameters:
  • a[out] Output vector chunk \(\bar a\)

  • b[in] Input vector chunk \(\bar b\)

  • c[in] Coefficient vector \(\bar c\)

  • term_count[in] Number of power series terms, \(N\)
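To make the replicated coefficient layout and the recurrence for the powers of \(b_k\) concrete, here is a plain-C reference model of the operation above. It is a sketch under stated assumptions: the names sketch_power_series(), sketch_round_shr30() and SKETCH_EPV are hypothetical, the chunk size is assumed to be 8, rounding is modelled as add-half-then-shift, and no saturation is modelled.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_EPV 8   /* assumed value of VPU_INT32_EPV */

/* round(x / 2^30); the rounding mode is an assumption of this sketch. */
static int64_t sketch_round_shr30(int64_t x)
{
    return (x + ((int64_t)1 << 29)) >> 30;
}

/* Reference model of the power series. c is laid out as
 * (term_count, SKETCH_EPV), each coefficient replicated across a row. */
static void sketch_power_series(int32_t a[SKETCH_EPV],
                                const int32_t b[SKETCH_EPV],
                                const int32_t c[], unsigned term_count)
{
    for (int k = 0; k < SKETCH_EPV; k++) {
        int64_t pow_b = (int64_t)1 << 30;          /* b_{k,0} = 2^30 */
        int64_t sum = 0;
        for (unsigned i = 0; i < term_count; i++) {
            sum += sketch_round_shr30(pow_b * c[i * SKETCH_EPV + k]);
            pow_b = sketch_round_shr30(pow_b * b[k]);  /* b_{k,i+1} */
        }
        a[k] = (int32_t)sum;   /* no saturation is modelled here */
    }
}
```

For example, with \(b_k = 0.5\) and three coefficients of 1.0 (all in Q2.30), each output is \(1 + 0.5 + 0.25 = 1.75\) in the coefficients' format.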

void chunk_q30_exp_small(q2_30 a[VPU_INT32_EPV], const q2_30 b[VPU_INT32_EPV])

Compute \(e^b\) on a vector chunk of Q2.30 values.

This function computes \(e^{b_k}\) for each element of a vector chunk (VPU_INT32_EPV-element vector) \(\bar b\) of Q2.30 values near \(0\). The result is computed using the power series approximation of \(e^x\) near zero. It is recommended that this function only be used for \( -0.5 \le b_k\cdot{}2^{-30} \le 0.5\).

The output vector chunk \(\bar a\) is also in a Q2.30 format.

Operation Performed

\[\begin{split}\begin{aligned} & a_k \leftarrow e^{b_k\cdot{}2^{-30}} \\ & \qquad\text{for }k \in {0..\mathtt{VPU\_INT32\_EPV}-1} \end{aligned}\end{split}\]

Parameters:
  • a[out] Output vector chunk \(\bar a\)

  • b[in] Input vector chunk \(\bar b\)

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Q-format macros£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/q_format.html#q-format-macros

group qfmt_macros

Defines

F(N)

Convert fixed-point value to double-precision float.

This macro is meant to allow for parameterized access to the more specific conversion macros, such as F8(), F24(), F31() and so on. Being parameterized allows the user to specify the Q-format (fractional bit count) using another macro. For example:

#define X_FRAC_BITS   24
int32_t x = ...;
...
// Convert x to double
double dbl_x = F(X_FRAC_BITS)(x);

Q(N)

Convert floating-point value to fixed-point value.

This macro is meant to allow for parameterized access to the more specific conversion macros, such as Q8(), Q24(), Q31() and so on. Being parameterized allows the user to specify the Q-format (fractional bit count) using another macro. For example:

#include <math.h>
...
#define PI_FRAC_BITS   24
int32_t x = Q(PI_FRAC_BITS)(M_PI);
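To show what such conversions compute, and how a parameterized macro can accept another macro as its argument, the following sketch defines look-alike helpers. Everything prefixed with sketch_ or SKETCH_ is hypothetical and is not the library's definition. The rounding (half away from zero) and the two-level token pasting are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical conversion helpers (not the library's macros). */
static inline int32_t sketch_to_q(double f, unsigned frac_bits)
{
    assert(frac_bits < 32);
    double scaled = f * (double)((int64_t)1 << frac_bits);
    /* round half away from zero, then truncate */
    return (int32_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

static inline double sketch_from_q(int32_t x, unsigned frac_bits)
{
    assert(frac_bits < 32);
    return (double)x / (double)((int64_t)1 << frac_bits);
}

/* Specific macros for one format, analogous in spirit to Q24() and F24(). */
#define SKETCH_Q24(f) sketch_to_q((f), 24)
#define SKETCH_F24(x) sketch_from_q((x), 24)

/* Parameterized access: N is expanded first, then pasted onto the prefix,
 * so N may itself be a macro such as SKETCH_FRAC_BITS. */
#define SKETCH_PASTE(prefix, n) prefix##n
#define SKETCH_Q(N) SKETCH_PASTE(SKETCH_Q, N)
#define SKETCH_F(N) SKETCH_PASTE(SKETCH_F, N)

#define SKETCH_FRAC_BITS 24
```

With these definitions, SKETCH_Q(SKETCH_FRAC_BITS)(1.5) expands to SKETCH_Q24(1.5) and evaluates to 1.5 scaled by 2^24, while SKETCH_F(SKETCH_FRAC_BITS)(x) performs the inverse conversion.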

Q31(f)

Convert double value to Q1.31 fixed-point value, with rounding.

Q30(f)

Convert double value to Q2.30 fixed-point value, with rounding.

Q29(f)

Convert double value to Q3.29 fixed-point value, with rounding.

Q28(f)

Convert double value to Q4.28 fixed-point value, with rounding.

Q27(f)

Convert double value to Q5.27 fixed-point value, with rounding.

Q26(f)

Convert double value to Q6.26 fixed-point value, with rounding.

Q25(f)

Convert double value to Q7.25 fixed-point value, with rounding.

Q24(f)

Convert double value to Q8.24 fixed-point value, with rounding.

Q23(f)

Convert double value to Q9.23 fixed-point value, with rounding.

Q22(f)

Convert double value to Q10.22 fixed-point value, with rounding.

Q21(f)

Convert double value to Q11.21 fixed-point value, with rounding.

Q20(f)

Convert double value to Q12.20 fixed-point value, with rounding.

Q19(f)

Convert double value to Q13.19 fixed-point value, with rounding.

Q18(f)

Convert double value to Q14.18 fixed-point value, with rounding.

Q17(f)

Convert double value to Q15.17 fixed-point value, with rounding.

Q16(f)

Convert double value to Q16.16 fixed-point value, with rounding.

Q15(f)

Convert double value to Q17.15 fixed-point value, with rounding.

Q14(f)

Convert double value to Q18.14 fixed-point value, with rounding.

Q13(f)

Convert double value to Q19.13 fixed-point value, with rounding.

Q12(f)

Convert double value to Q20.12 fixed-point value, with rounding.

Q11(f)

Convert double value to Q21.11 fixed-point value, with rounding.

Q10(f)

Convert double value to Q22.10 fixed-point value, with rounding.

Q9(f)

Convert double value to Q23.9 fixed-point value, with rounding.

Q8(f)

Convert double value to Q24.8 fixed-point value, with rounding.

F31(x)

Convert Q1.31 fixed-point value to double value.

F30(x)

Convert Q2.30 fixed-point value to double value.

F29(x)

Convert Q3.29 fixed-point value to double value.

F28(x)

Convert Q4.28 fixed-point value to double value.

F27(x)

Convert Q5.27 fixed-point value to double value.

F26(x)

Convert Q6.26 fixed-point value to double value.

F25(x)

Convert Q7.25 fixed-point value to double value.

F24(x)

Convert Q8.24 fixed-point value to double value.

F23(x)

Convert Q9.23 fixed-point value to double value.

F22(x)

Convert Q10.22 fixed-point value to double value.

F21(x)

Convert Q11.21 fixed-point value to double value.

F20(x)

Convert Q12.20 fixed-point value to double value.

F19(x)

Convert Q13.19 fixed-point value to double value.

F18(x)

Convert Q14.18 fixed-point value to double value.

F17(x)

Convert Q15.17 fixed-point value to double value.

F16(x)

Convert Q16.16 fixed-point value to double value.

F15(x)

Convert Q17.15 fixed-point value to double value.

F14(x)

Convert Q18.14 fixed-point value to double value.

F13(x)

Convert Q19.13 fixed-point value to double value.

F12(x)

Convert Q20.12 fixed-point value to double value.

F11(x)

Convert Q21.11 fixed-point value to double value.

F10(x)

Convert Q22.10 fixed-point value to double value.

F9(x)

Convert Q23.9 fixed-point value to double value.

F8(x)

Convert Q24.8 fixed-point value to double value.

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Util functions and macros£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/utils.html#util-functions-and-macros

group util_macros

Defines

MAX(A, B)

Takes the greater of arguments A and B, preferring A on equality.

Note

This is not safe from multiple evaluation of arguments.

Parameters:
  • A[in] First input

  • B[in] Second input

Returns:

Maximum of the inputs.

MIN(A, B)

Takes the lesser of arguments A and B, preferring A on equality.

Note

This is not safe from multiple evaluation of arguments.

Parameters:
  • A[in] First input

  • B[in] Second input

Returns:

Minimum of the inputs.
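
The multiple-evaluation caveat for MAX() and MIN() can be illustrated with a hypothetical macro of the same general shape. DEMO_MAX and the helper functions below are assumptions made for this sketch and are not taken from the library.

```c
#include <assert.h>

/* Hypothetical macro in which each argument appears twice in the
   expansion, which is the situation the note warns about. */
#define DEMO_MAX(A, B) (((A) >= (B)) ? (A) : (B))

static int demo_calls = 0;

/* Side-effecting helper: counts how many times it is evaluated. */
static int demo_next_value(void) { return ++demo_calls; }

/* Passing an expression with side effects evaluates it twice here:
   once in the comparison and once when the result is selected. */
int demo_unsafe_call_count(void) {
  demo_calls = 0;
  (void) DEMO_MAX(demo_next_value(), 0);
  return demo_calls;
}

/* Evaluating into a local variable first avoids the repeated call. */
int demo_safe_call_count(void) {
  demo_calls = 0;
  int v = demo_next_value();
  (void) DEMO_MAX(v, 0);
  return demo_calls;
}

void demo_max_usage(void) {
  assert(demo_unsafe_call_count() == 2);
  assert(demo_safe_call_count() == 1);
}
```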

CLS_S8(X)

Count leading sign bits of an int8_t.

Parameters:
  • X[in] Input

Returns:

Leading sign bits of X

CLS_S16(X)

Count leading sign bits of an int16_t.

Parameters:
  • X[in] Input

Returns:

Leading sign bits of X

CLS_S32(X)

Count leading sign bits of an int32_t.

Parameters:
  • X[in] Input

Returns:

Leading sign bits of X

CLS_S64(X)

Count leading sign bits of an int64_t.

Parameters:
  • X[in] Input

Returns:

Leading sign bits of X

CLS_C16(X)

Count leading sign bits of a complex_s16_t.

The number of leading sign bits for a complex integer is defined as the minimum of the number of leading sign bits for its real part and for its imaginary part.

Parameters:
  • X[in] Input

Returns:

Leading sign bits of X

CLS_C32(X)

Count leading sign bits of a complex_s32_t.

The number of leading sign bits for a complex integer is defined as the minimum of the number of leading sign bits for its real part and for its imaginary part.

Parameters:
  • X[in] Input

Returns:

Leading sign bits of X

HR_S64(X)

Get the headroom of an int64_t.

Parameters:
  • X[in] Input

Returns:

Headroom of X

HR_S32(X)

Get the headroom of an int32_t.

Parameters:
  • X[in] Input

Returns:

Headroom of X

HR_S16(X)

Get the headroom of an int16_t.

Parameters:
  • X[in] Input

Returns:

Headroom of X

HR_S8(X)

Get the headroom of an int8_t.

Parameters:
  • X[in] Input

Returns:

Headroom of X

HR_C32(X)

Get the headroom of a complex_s32_t.

The headroom of a complex N-bit integer is the minimum of the headroom of each of its N-bit real and imaginary parts.

Parameters:
  • X[in] Input

Returns:

Headroom of X

HR_C16(X)

Get the headroom of a complex_s16_t.

The headroom of a complex N-bit integer is the minimum of the headroom of each of its N-bit real and imaginary parts.

Parameters:
  • X[in] Input

Returns:

Headroom of X
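
As a portable illustration of what these macros compute, the following sketch counts leading sign bits with a plain loop and derives headroom from the count. The demo_* names and the struct are hypothetical, and treating headroom as one less than the leading sign bit count (the redundant sign bits) is an assumption of this sketch. The complex case takes the minimum over the real and imaginary parts, as described above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical complex type with the same field names as complex_s32_t. */
typedef struct { int32_t re; int32_t im; } demo_complex_s32_t;

/* Count the most-significant bits equal to the sign bit (1 to 32). */
unsigned demo_cls_s32(int32_t x) {
  const uint32_t u = (uint32_t) x;
  const uint32_t sign = u >> 31;
  unsigned count = 0;
  for (int bit = 31; bit >= 0; bit--) {
    if (((u >> bit) & 1u) != sign) break;
    count++;
  }
  return count;
}

/* Assumption: headroom is the number of redundant leading sign bits. */
unsigned demo_hr_s32(int32_t x) { return demo_cls_s32(x) - 1; }

/* Complex headroom: the minimum over the real and imaginary parts. */
unsigned demo_hr_c32(demo_complex_s32_t x) {
  const unsigned hr_re = demo_hr_s32(x.re);
  const unsigned hr_im = demo_hr_s32(x.im);
  return (hr_re < hr_im) ? hr_re : hr_im;
}

void demo_headroom_usage(void) {
  assert(demo_cls_s32(1) == 31);
  assert(demo_hr_s32(1) == 30);
}
```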

Functions

void xs3_memcpy(void *dst, const void *src, unsigned bytes)

VPU-based memcpy implementation.

Same as standard memcpy() except for an extra constraint that both dst and src must be word-aligned addresses.

Parameters:
  • dst[out] Destination address

  • src[in] Source address

  • bytes[in] Number of bytes to copy

static inline unsigned cls(const int32_t a)

Count leading sign bits of int32_t.

This function returns the number of most-significant bits in a which are equal to its sign bit.

Note

This is the total number of leading sign bits, not redundant leading sign bits.

Parameters:
  • a[in] Input value

Returns:

Number of leading sign bits of a

static inline unsigned n_bitrev(const unsigned index, const unsigned bits)

Reverse the bits of an integer.

This function takes an integer index and reverses its bits least-significant bits to form a new integer, which is returned. All more significant bits are ignored.

This is useful for algorithms, such as the FFT, whose implementation requires reordering of elements by reversing the bits of the indices.

Parameters:
  • index[in] Input value

  • bits[in] The number of least-significant bits to reverse.

Returns:

The bits LSb’s of index reversed.
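
A portable sketch of the same operation is shown below. demo_bitrev() is a hypothetical name and a plain loop, not the library's implementation, which may use a dedicated instruction.

```c
#include <assert.h>

/* Reverse the `bits` least-significant bits of `index`.
   More significant bits of `index` are ignored. */
unsigned demo_bitrev(unsigned index, unsigned bits) {
  assert(bits <= 8 * sizeof(unsigned));
  unsigned result = 0;
  for (unsigned i = 0; i < bits; i++) {
    /* Shift the next low bit of index into the low end of result */
    result = (result << 1) | (index & 1u);
    index >>= 1;
  }
  return result;
}

void demo_bitrev_usage(void) {
  /* For an 8-point FFT (3 index bits), element 1 (001b) swaps with 4 (100b) */
  assert(demo_bitrev(1, 3) == 4);
  /* 110b reversed is 011b */
  assert(demo_bitrev(6, 3) == 3);
}
```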

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Library Configuration£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/config_options.html#library-configuration

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Library Configuration$$$Configuration Options£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/config_options.html#configuration-options
group config_options

Defines

XMATH_BFP_DEBUG_CHECK_LENGTHS

Indicates whether the BFP functions should check vector lengths for errors.

Iff true, BFP functions will check (assert()) to ensure that each BFP vector argument does not violate any length constraints. Most often this simply ensures that, where BFP functions take multiple vectors as parameters, each of the vectors has the same length.

Defaults to false (0).

XMATH_BFP_SQRT_DEPTH_S16

The number of most significant bits which are computed by bfp_s16_sqrt().

The function bfp_s16_sqrt() computes results one bit at a time, starting with bit 14 (the second-to-most significant bit). Because this is a relatively expensive operation, it may be desirable to trade off precision of results for a speed-up.

The time cost of bfp_s16_sqrt() is approximately linear with respect to the depth.

Defaults to VECT_SQRT_S16_MAX_DEPTH (15)

See also

bfp_s16_sqrt
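
The trade-off between depth and precision can be illustrated with a generic bit-serial integer square root. This is a textbook technique shown under the assumption that it resembles the behavior described above. It is not the library's implementation, and demo_isqrt_depth() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Compute the top `depth` bits of floor(sqrt(x)), one bit per iteration,
   starting at bit 14. Bits below the chosen depth are left as zero. */
uint32_t demo_isqrt_depth(uint32_t x, unsigned depth) {
  assert(depth <= 15);
  assert(x < (1u << 30));  /* keeps the result within 15 bits */
  uint32_t result = 0;
  for (unsigned i = 0; i < depth; i++) {
    /* Tentatively set the next bit and keep it if the square still fits */
    const uint32_t candidate = result | (1u << (14 - i));
    if (candidate * candidate <= x) result = candidate;
  }
  return result;
}

void demo_sqrt_depth_usage(void) {
  /* Full depth gives the exact integer square root of 10^8 */
  assert(demo_isqrt_depth(100000000u, 15) == 10000);
  /* Fewer iterations cost less but leave the low bits unresolved */
  assert(demo_isqrt_depth(100000000u, 8) == 9984);
}
```

Each iteration resolves exactly one result bit, which is why the cost grows roughly linearly with the depth.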

XMATH_BFP_SQRT_DEPTH_S32

The number of most significant bits which are computed by bfp_s32_sqrt().

The function bfp_s32_sqrt() computes results one bit at a time, starting with bit 30 (the second-to-most significant bit). Because this is a relatively expensive operation, it may be desirable to trade off precision of results for a speed-up.

The time cost of bfp_s32_sqrt() is approximately linear with respect to the depth.

Defaults to VECT_SQRT_S32_MAX_DEPTH (31)

See also

bfp_s32_sqrt

XMATH_MALLOC

Function used to dynamically allocate memory.

This function is used to dynamically allocate memory. Defaults to malloc. Must have the same signature as malloc().

See also

XMATH_FREE

XMATH_FREE

Function used to free dynamically allocated memory.

This function is used to deallocate dynamically allocated memory. Defaults to free. Must have the same signature as free().

See also

XMATH_MALLOC
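
As a sketch of replacement functions with the required signatures, the following pair wraps the standard allocator and counts live allocations. The demo_* names are hypothetical. How the two options are set to point at such functions (for instance through compiler definitions) depends on the build setup and is an assumption here, not something this document specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static size_t demo_live_allocations = 0;

/* Same signature as malloc(): counts successful allocations. */
void *demo_counting_malloc(size_t size) {
  void *p = malloc(size);
  if (p != NULL) demo_live_allocations++;
  return p;
}

/* Same signature as free(): counts releases of non-null pointers. */
void demo_counting_free(void *ptr) {
  if (ptr != NULL) demo_live_allocations--;
  free(ptr);
}

size_t demo_live_count(void) { return demo_live_allocations; }

void demo_allocator_usage(void) {
  void *p = demo_counting_malloc(64);
  assert(p != NULL);
  assert(demo_live_count() == 1);
  demo_counting_free(p);
  assert(demo_live_count() == 0);
}
```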

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Library Notes£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/notes.html#library-notes

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Library Notes$$$Note: Vector Alignment£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/notes.html#note-vector-alignment
page note_vector_alignment

This library makes use of the XMOS architecture’s vector processing unit (VPU). All loads and stores to and from the XS3 VPU have the requirement that the loaded/stored addresses must be aligned to a 4-byte boundary (word-aligned).

In the current version of the API, this leads to the requirement that most API functions require vectors (or the data backing a BFP vector) to begin at word-aligned addresses. Vectors are not required, however, to have a size (in bytes) that is a multiple of 4.

Writing Alignment-safe Code

The alignment requirement is ultimately always on the data that backs a vector. For the low-level API, that is the pointers passed to the functions themselves. For the high-level API, that is the memory to which the data field (or the real and imag fields in the case of bfp_complex_s16_t) points, specified when the BFP vector is initialized.

Arrays of type int32_t and complex_s32_t will normally be guaranteed to be word-aligned by the compiler. However, if the user manually specifies the beginning of an int32_t array, as in the following..

uint8_t byte_buffer[100];
int32_t* integer_array = (int32_t*) &byte_buffer[1];

.. the vector may not be word-aligned. It is the responsibility of the user to ensure proper alignment of data.

For int16_t arrays, the compiler does not by default guarantee that the array starts on a word-aligned address. To force word-alignment on arrays of this type, use __attribute__((aligned (4))) in the variable definition, as in the following.

int16_t __attribute__((aligned (4))) data[100];

Occasionally, 8-byte (double word) alignment is required. In this case, neither int32_t nor int16_t is necessarily guaranteed to align as required. Similar to the above, this can be hinted to the compiler as in the following.

int32_t __attribute__((aligned (8))) data[100];

This library also provides the macros WORD_ALIGNED and DWORD_ALIGNED which force 4- and 8-byte alignment respectively as above.
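
A defensive check can catch misaligned buffers before they reach the API. The helper below is a hypothetical sketch, not part of the library. It assumes that converting a pointer to uintptr_t preserves the low address bits, which holds on common platforms but is implementation-defined in C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return non-zero if ptr is a multiple of `alignment` (a power of two). */
int demo_is_aligned(const void *ptr, size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return ((uintptr_t) ptr & (alignment - 1)) == 0;
}

/* Force word alignment on a 16-bit array, as described above. */
int16_t __attribute__((aligned (4))) demo_samples[100];

void demo_alignment_usage(void) {
  /* The attribute guarantees the array starts on a 4-byte boundary */
  assert(demo_is_aligned(demo_samples, 4));
  /* An address one byte into the array cannot be word-aligned */
  const uint8_t *bytes = (const uint8_t *) demo_samples;
  assert(!demo_is_aligned(bytes + 1, 4));
}
```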

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Library Notes$$$Note: Symmetrically Saturating Arithmetic£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/notes.html#note-symmetrically-saturating-arithmetic
page note_symmetric_saturation

With ordinary integer arithmetic the block floating-point logic chooses exponents and operand shifts to prevent integer overflow with worst-case input values. However, the XS3 VPU uses symmetrically saturating integer arithmetic.

Saturating arithmetic is that where partial results of the applied operation use a bit depth greater than the output bit depth, and values that can’t be properly expressed with the output bit depth are set to the nearest expressible value.

For example, in ordinary C integer arithmetic, a function which multiplies two 32-bit integers may internally compute the full 64-bit product and then clamp values to the range (INT32_MIN, INT32_MAX) before returning a 32-bit result.

Symmetrically saturating arithmetic also includes the property that the lower bound of the expressible range is the negative of the upper bound of the expressible range.

One of the major troubles with non-saturating integer arithmetic is that, in a two's complement encoding, there exists a non-zero integer value \(x\) (e.g. INT16_MIN in 16-bit two's complement arithmetic) for which \(-1 \cdot x = x\). Serious arithmetic errors can result when this case is not accounted for.

One of the results of symmetric saturation, on the other hand, is that there is a corner case where (using the same exponent and shift logic as non-saturating arithmetic) saturation may occur for a particular combination of input mantissas. The corner case is different for different operations.

When the corner case occurs, the minimum (and largest magnitude) value of the resulting vector is 1 LSb greater than its ideal value (e.g. -0x3FFF instead of -0x4000 for 16-bit arithmetic). The error in this output element’s mantissa is then 1 LSb, or \(2^p\), where \(p\) is the exponent of the resulting BFP vector.

Of course, the very nature of BFP arithmetic routinely involves errors of this magnitude.
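
The difference between wrapping and symmetrically saturating behavior can be seen in a scalar 16-bit sketch. The demo_* functions are hypothetical and only model the arithmetic in portable C. They do not use the VPU.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a wider intermediate to the symmetric range [-INT16_MAX, INT16_MAX]. */
int16_t demo_sat16_symmetric(int32_t v) {
  if (v > INT16_MAX) return INT16_MAX;
  if (v < -INT16_MAX) return (int16_t) -INT16_MAX;
  return (int16_t) v;
}

/* Negation modulo 2^16, as in non-saturating two's complement arithmetic. */
int16_t demo_neg16_wrapping(int16_t x) {
  const uint16_t u = (uint16_t) (0u - (uint16_t) x);
  return (u >= 0x8000u) ? (int16_t) ((int32_t) u - 65536) : (int16_t) u;
}

/* Negation computed at a wider bit depth, then symmetrically saturated. */
int16_t demo_neg16_symmetric(int16_t x) {
  return demo_sat16_symmetric(-(int32_t) x);
}

void demo_saturation_usage(void) {
  /* Wrapping: -1 * INT16_MIN yields INT16_MIN again */
  assert(demo_neg16_wrapping(INT16_MIN) == INT16_MIN);
  /* Symmetric saturation: the result clamps to INT16_MAX instead */
  assert(demo_neg16_symmetric(INT16_MIN) == INT16_MAX);
}
```

With the symmetric range, INT16_MIN itself is never produced as a result, so negating any result is always exact.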

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Library Notes$$$Note: Spectrum Packing£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/notes.html#note-spectrum-packing
page note_spectrum_packing

In its general form, the \(N\)-point Discrete Fourier Transform is an operation applied to a complex \(N\)-point signal \(x[n]\) to produce a complex spectrum \(X[f]\). Any spectrum \(X[f]\) which is the result of an \(N\)-point DFT has the property that \(X[f+N] = X[f]\). Thus, the complete representation of the \(N\)-point DFT of \(x[n]\) requires \(N\) complex elements.

Complex DFT and IDFT

In this library, when performing a complex DFT (e.g. using fft_bfp_forward_complex()), the spectral representation that results is a straightforward mapping:

X[f] \(\longleftarrow X[f]\) for \(0 \le f < N\)

where X is an \(N\)-element array of complex_s32_t, where the real part of \(X[f]\) is in X[f].re and the imaginary part in X[f].im.

Likewise, when performing an \(N\)-point complex inverse DFT, that is also the representation that is expected.

Real DFT and IDFT

Oftentimes we instead wish to compute the DFT of real signals. In addition to the periodicity property ( \(X[f+N] = X[f]\)), the DFT of a real signal also has a complex conjugate symmetry such that \(X[-f] = X^*[f]\), where \(X^*[f]\) is the complex conjugate of \(X[f]\). This symmetry makes it redundant (and thus undesirable) to store such symmetric pairs of elements. Exploiting it would allow us to get away with only explicitly storing \(X[f]\) for \(0 \le f \le N/2\) in \((N/2)+1\) complex elements.

Unfortunately, using such a representation has the undesirable property that the DFT of an \(N\)-point real signal cannot be computed in-place, as the representation requires more memory than we started with.

However, if we take the periodicity and complex conjugate symmetry properties together:

\[\begin{split} & X[0] = X^*[0] \rightarrow Imag\{X[0]\} = 0 \\ & X[-(N/2) + N] = X[N/2] \\ & X[-N/2] = X^*[N/2] \rightarrow X[N/2] = X^*[N/2] \rightarrow Imag \{ X[N/2] \} = 0 \end{split}\]

Because both \(X[0]\) and \(X[N/2]\) are guaranteed to be real, we can recover the benefit of in-place computation in our representation by packing the real part of \(X[N/2]\) into the imaginary part of \(X[0]\).

Therefore, the functions in this library that produce the spectra of real signals (such as fft_bfp_forward_mono() and fft_bfp_forward_stereo()) will pack the spectra in a slightly less straight-forward manner (as compared with the complex DFTs):

X[f] \(\longleftarrow X[f]\) for \(1 \le f < N/2\)

X[0] \(\longleftarrow X[0] + j X[N/2]\)

where X is an \(N/2\)-element array of complex_s32_t.

Likewise, this is the encoding expected when computing the \(N\)-point inverse DFT, such as by fft_bfp_inverse_mono() or fft_bfp_inverse_stereo().

Note

One additional note, when performing a stereo DFT or inverse DFT, so as to preserve the in-place computation of the result, the spectra of the two signals will be encoded into adjacent blocks of memory, with the second spectrum (i.e. associated with ‘channel b’) occupying the higher memory address.
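
The packed encoding of a single real spectrum can be expanded into the \((N/2)+1\) explicit bins with a few lines of C. The struct and function below are hypothetical sketches that mirror the re and im fields of complex_s32_t. They are not library functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical complex type with the same field names as complex_s32_t. */
typedef struct { int32_t re; int32_t im; } demo_complex_s32_t;

/* Expand a packed N-point real spectrum (N/2 elements) into N/2 + 1 bins.
   packed[0].re holds X[0] and packed[0].im holds the real-valued X[N/2]. */
void demo_unpack_real_spectrum(demo_complex_s32_t *full,
                               const demo_complex_s32_t *packed,
                               unsigned n) {
  assert(n >= 2 && (n % 2) == 0);
  const unsigned half = n / 2;
  full[0].re = packed[0].re;     /* DC bin, purely real */
  full[0].im = 0;
  full[half].re = packed[0].im;  /* Nyquist bin, purely real */
  full[half].im = 0;
  for (unsigned f = 1; f < half; f++) {
    full[f] = packed[f];         /* bins 1 .. N/2 - 1 are stored directly */
  }
}
```

The output needs one more complex element than the packed input, which is the reason the packed form is what allows the transform to run in-place.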

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Library Notes$$$Note: Library FFT Length Support£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/notes.html#note-library-fft-length-support
page fft_length_support

When computing DFTs this library relies on one or both of a pair of look-up tables which contain portions of the Discrete Fourier Transform matrix. Longer FFT lengths require larger look-up tables. When building using CMake, the maximum FFT length can be specified as a CMake option, and these tables are auto-generated at build time.

If not using CMake, you can manually generate these files using a Python script included with the library. The script is located at lib_xcore_math/script/gen_fft_table.py. If generated manually, you must add the generated .c file as a source, and the directory containing xmath_fft_lut.h must be added as an include directory when compiling the library’s files.

Note that the header file must be named xmath_fft_lut.h as it is included via #include "xmath_fft_lut.h".

By default the tables contain the coefficients necessary to perform forward or inverse DFTs of up to 1024 points. If larger DFTs are required, or if the maximum required DFT size is known to be less than 1024 points, the MAX_FFT_LEN_LOG2 CMake option can be modified from its default value of 10.

The two look-up tables correspond to the decimation-in-time and decimation-in-frequency FFT algorithms, and the run-time symbols for the tables are xmath_dit_fft_lut and xmath_dif_fft_lut respectively. For a maximum FFT length of \(N\), each table contains \(N-4\) complex 32-bit values, with a size of \(8\cdot (N-4)\) bytes each.
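
To estimate the memory cost of a given setting, the size formula can be evaluated directly. The helper below is a hypothetical sketch that assumes \(N\) is the maximum FFT length, that is \(2\) raised to the power of the MAX_FFT_LEN_LOG2 value.

```c
#include <assert.h>
#include <stddef.h>

/* Size in bytes of one look-up table: 8 * (N - 4), with N = 2^log2.
   Assumption: N is the maximum supported FFT length. */
size_t demo_fft_lut_bytes(unsigned max_fft_len_log2) {
  assert(max_fft_len_log2 >= 2);
  assert(max_fft_len_log2 < 8 * sizeof(size_t) - 4);
  const size_t n = (size_t) 1 << max_fft_len_log2;
  return 8 * (n - 4);
}

void demo_fft_lut_usage(void) {
  /* Default of 10 (1024 points): 8 * 1020 bytes per table */
  assert(demo_fft_lut_bytes(10) == 8160);
  /* 14 (16384 points): 8 * 16380 bytes per table */
  assert(demo_fft_lut_bytes(14) == 131040);
}
```

If both the decimation-in-time and decimation-in-frequency tables are present, the total is twice this figure.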

To manually regenerate the tables for a maximum FFT length of \(16384\) ( \(=2^{14}\)), supporting only the decimation-in-time algorithm, for example, use the following:

python lib_xcore_math/script/gen_fft_table.py --dit --max_fft_log2 14

Use the --help flag with gen_fft_table.py for a more detailed description of its syntax and parameters.

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$API Reference$$$Library Notes$$$Note: Digital Filter Conversion£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/reference/notes.html#note-digital-filter-conversion
page filter_conversion

This library supports optimized implementations of 16- and 32-bit FIR filters, as well as cascaded 32-bit biquad filters. Each of these filter implementations requires that the filter coefficients be represented in a compatible form.

To assist with that, several Python scripts are distributed with this library which can be used to convert existing floating-point filter coefficients into code that is easily callable from within an xCore application.

Each script reads in floating-point filter coefficients from a file and computes a new representation for the filter with coefficients which attempt to maximize precision and are compatible with the lib_xcore_math filtering API.

Each script outputs two files which can be included in your own xCore application. The first output is a C source (.c) file containing the computed filter parameters and several function definitions for initializing and executing the generated filter. The second output is a C header (.h) file which can be #included into your own application to give access to those functions.

Additionally, each script also takes a user-provided filter name as an input parameter. The output files (as well as the function names within) include the filter name so that more than one filter can be generated and executed using this mechanism.

As an example, take the following command to generate a 32-bit FIR filter:

python lib_xcore_math/script/gen_fir_filter_s32.py MyFilter filter_coefs.txt

This command creates a filter named “MyFilter”, with coefficients taken from a file filter_coefs.txt. Two output files will be generated, MyFilter.c and MyFilter.h. Including MyFilter.h provides access to 3 functions, MyFilter_init(), MyFilter_add_sample(), and MyFilter() which correspond to the library functions filter_fir_s32_init(), filter_fir_s32_add_sample() and filter_fir_s32() respectively.

Use the --help flag with the scripts for more detailed descriptions of inputs and other options.

Filter Type      Script
32-bit FIR       lib_xcore_math/script/gen_fir_filter_s32.py
16-bit FIR       lib_xcore_math/script/gen_fir_filter_s16.py
32-bit Biquad    lib_xcore_math/script/gen_biquad_filter_s32.py

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$Example Applications£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/examples.html#example-applications

Several example applications are offered to demonstrate use of the lib_xcore_math APIs through simple code examples.

  • app_bfp_demo - Demonstration of the block floating-point arithmetic API

  • app_vect_demo - Demonstration of the low-level vectorized arithmetic API

  • app_fft_demo - Demonstration of the Fast Fourier Transform API

  • app_filter_demo - Demonstration of the filtering API

This section assumes you have downloaded and installed the XMOS XTC tools (see README for required version). Installation instructions can be found here.

Particular attention should be paid to the section Installation of required third-party tools.

The application examples use the xcommon-cmake build system as bundled with the XTC tools.

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$Example Applications$$$Building Examples£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/examples.html#building-examples

To build the applications, from an XTC command prompt run the following commands in the lib_xcore_math/examples directory:

cmake -B build -G "Unix Makefiles"
xmake -C build

Individual examples can be built using a command similar to the following:

xmake -C build EXAMPLE_NAME

where EXAMPLE_NAME is the example to build.

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$Example Applications$$$Running Examples£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/examples.html#running-examples

Once built, the example EXAMPLE_NAME can be run on the XK-EVK-XU316 board using the following command:

xrun --xscope examples/EXAMPLE_NAME/bin/EXAMPLE_NAME.xe

For instance, to run the app_bfp_demo example, use:

xrun --xscope examples/app_bfp_demo/bin/app_bfp_demo.xe

To run the example using the xcore simulator instead, use:

xsim examples/EXAMPLE_NAME/bin/EXAMPLE_NAME.xe

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$Example Applications$$$app_bfp_demo£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/examples.html#app-bfp-demo

The purpose of this example application is to demonstrate how the arithmetic functions of lib_xcore_math’s block floating-point API may be used.

In it, three 32-bit BFP vectors are allocated, initialized and filled with random data. Then several BFP operations are applied using those vectors as inputs and/or outputs.

The example only demonstrates the real 32-bit arithmetic BFP functions (that is, functions with names bfp_s32_*). The real 16-bit (bfp_s16_*), complex 32-bit (bfp_complex_s32_*) and complex 16-bit (bfp_complex_s16_*) functions all use similar naming conventions.

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$Example Applications$$$app_vect_demo£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/examples.html#app-vect-demo

The purpose of this example application is to demonstrate how the arithmetic functions of lib_xcore_math’s lower-level vector API may be used.

In general, the low-level arithmetic API consists of the functions in this library whose names begin with vect_*, such as vect_s32_mul() for element-wise multiplication of 32-bit vectors, and vect_complex_s16_scale() for multiplying a complex 16-bit vector by a complex scalar.

We assume that where the low-level API is being used it is because some behavior other than the default behavior of the high-level block floating-point API is required. Given that, rather than showcasing the breadth of operations available, this example examines first how to achieve comparable behavior to the BFP API, and then ways in which that behavior can be modified.

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$Example Applications$$$app_fft_demo£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/examples.html#app-fft-demo

The purpose of this example application is to demonstrate how the FFT functions of lib_xcore_math’s block floating-point API may be used.

In this example we demonstrate each of the offered forward and inverse FFTs of the BFP API.

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$Example Applications$$$app_filter_demo£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/examples.html#app-filter-demo

The purpose of this example application is to demonstrate how the functions of lib_xcore_math’s filtering vector API may be used.

The filtering API currently supports three different filter types:

  • 32-bit FIR Filter

  • 16-bit FIR Filter

  • 32-bit Biquad Filter

This example application presents simple demonstrations of how to use each of these filter types.

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$Unit tests£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/tests.html#unit-tests

This project uses XCommon CMake to build the unit tests in a similar fashion to the examples.

Unit tests target the XK-EVK-XU316 board and x86 platforms. All unit tests are located in the /tests/ directory.

All unit tests and examples are built and executed in a similar manner. The following shows how to do this with the BFP unit tests.

XCORE ® -VOICE Solutions$$$lib_xcore_math: xcore optimised math$$$Unit tests$$$BFP unit tests£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/tests.html#bfp-unit-tests

This application runs unit tests for the various 16- and 32-bit BFP vectorized arithmetic functions. This application is located at /tests/bfp_tests/.

To build the test, from an XTC command prompt run the following commands in the lib_xcore_math/tests/bfp_tests directory:

cmake -B build -G "Unix Makefiles"
xmake -C build

To execute the BFP unit tests on the XK-EVK-XU316, use the following (after ensuring that the hardware is connected and drivers properly installed):

xrun --xscope bin/bfp_tests.xe

Or, to run the unit tests in the software simulator:

xsim bin/bfp_tests.xe

Warning

Running the unit tests in the simulator may be very slow.

To execute the BFP unit tests built for an x86 host platform, configure the build using the BUILD_NATIVE option:

cmake -B build_x86 -G "Unix Makefiles" -D BUILD_NATIVE=TRUE
xmake -C build_x86

On Linux and macOS, run the tests as follows:

bin/bfp_tests/bfp_tests -v

and on Windows:

bin\bfp_tests\bfp_tests.exe -v

where -v is an optional argument to increase verbosity.

XCORE ® -VOICE Solutions$$$Indices and tables£££modules/core/modules/xcore_math/lib_xcore_math/doc/rst/src/tests.html#indices-and-tables