Library Notes
Note: Vector Alignment
This library utilizes the XMOS XS3 architecture's vector processing unit (VPU). All loads and stores to and from the XS3 VPU require that the loaded/stored addresses be aligned to a 4-byte boundary (word-aligned).
In the current version of the API, this means that most API functions require vectors (or the data backing a BFP vector) to begin at word-aligned addresses. Vectors are not, however, required to have a size (in bytes) that is a multiple of 4.
The alignment requirement is ultimately always on the data that backs a vector. For the low-level API, that is the pointers passed to the functions themselves. For the high-level API, that is the memory to which the data field (or the real and imag fields in the case of bfp_complex_s16_t) points, specified when the BFP vector is initialized.
Arrays of type int32_t and complex_s32_t will normally be guaranteed to be word-aligned by the compiler. However, if the user manually specifies the beginning of an int32_t array, as in the following:
uint8_t byte_buffer[100];
int32_t* integer_array = (int32_t*) &byte_buffer[1];
…it is the responsibility of the user to ensure proper alignment of the data.
For int16_t arrays, the compiler does not by default guarantee that the array starts at a word-aligned address. To force word-alignment on arrays of this type, use __attribute__((aligned(4))) in the variable definition, as in the following.
int16_t __attribute__((aligned(4))) data[100];
Occasionally, 8-byte (double word) alignment is required. In this case, neither int32_t nor int16_t arrays are necessarily guaranteed to be aligned as required. As above, this can be enforced with a compiler attribute, as in the following.
int32_t __attribute__((aligned(8))) data[100];
Note: Symmetrically Saturating Arithmetic
With ordinary integer arithmetic, the block floating-point logic chooses exponents and operand shifts to prevent integer overflow for worst-case input values. However, the XS3 VPU utilizes symmetrically saturating integer arithmetic.
In saturating arithmetic, partial results of the applied operation use a bit depth greater than the output bit depth, and values that cannot be expressed in the output bit depth are set to the nearest expressible value.
For example, a function which multiplies two 32-bit integers may internally compute the full 64-bit product and then clamp values to the range [INT32_MIN, INT32_MAX] before returning a 32-bit result.
Symmetrically saturating arithmetic also includes the property that the lower bound of the expressible range is the negative of the upper bound of the expressible range.
One of the major troubles with non-saturating integer arithmetic is that in a two's complement encoding, there exists a non-zero integer value \(x\) (e.g. INT16_MIN in 16-bit two's complement arithmetic) for which \(-1 \cdot x = x\). Serious arithmetic errors can result when this case is not accounted for.
A consequence of symmetric saturation, on the other hand, is a corner case in which (using the same exponent and shift logic as non-saturating arithmetic) saturation may occur for a particular combination of input mantissas. The corner case differs between operations.
When the corner case occurs, the minimum (and largest-magnitude) value of the resulting vector is 1 LSb greater than its ideal value (e.g. -0x3FFF instead of -0x4000 for 16-bit arithmetic). The error in this output element's mantissa is then 1 LSb, or \(2^p\), where \(p\) is the exponent of the resulting BFP vector.
Of course, the very nature of BFP arithmetic routinely involves errors of this magnitude.
Note: Spectrum Packing
In its general form, the \(N\)-point Discrete Fourier Transform is an operation applied to a complex \(N\)-point signal \(x[n]\) to produce a complex spectrum \(X[f]\). Any spectrum \(X[f]\) which is the result of an \(N\)-point DFT has the property that \(X[f+N] = X[f]\). Thus, the complete representation of the \(N\)-point DFT of \(x[n]\) requires \(N\) complex elements.
In this library, when performing a complex DFT (e.g. using bfp_fft_forward_complex()), the resulting spectral representation is a straightforward mapping:

X[f] \(\longleftarrow X[f]\) for \(0 \le f \lt N\)

where X is an \(N\)-element array of complex_s32_t, with the real part of \(X[f]\) in X[f].re and the imaginary part in X[f].im.
Likewise, when performing an \(N\)-point complex inverse DFT, this is also the representation that is expected.
Oftentimes we instead wish to compute the DFT of real signals. In addition to the periodicity property (\(X[f+N] = X[f]\)), the DFT of a real signal also has a complex conjugate symmetry such that \(X[-f] = X^*[f]\), where \(X^*[f]\) is the complex conjugate of \(X[f]\). This symmetry makes it redundant (and thus undesirable) to store such symmetric pairs of elements, allowing us to get away with explicitly storing only \(X[f]\) for \(0 \le f \le N/2\), in \((N/2)+1\) complex elements.
Unfortunately, such a representation has the undesirable property that the DFT of an \(N\)-point real signal cannot be computed in-place, as the \((N/2)+1\) complex elements require more memory than we started with.
However, if we take the periodicity and complex conjugate symmetry properties together, then \(X[N-f] = X[-f] = X^*[f]\). Setting \(f = 0\) and \(f = N/2\) gives \(X[0] = X^*[0]\) and \(X[N/2] = X^*[N/2]\), so both \(X[0]\) and \(X[N/2]\) are guaranteed to be real. We can therefore recover the benefit of in-place computation in our representation by packing the real part of \(X[N/2]\) into the imaginary part of \(X[0]\).
Therefore, the functions in this library that produce the spectra of real signals (such as bfp_fft_forward_mono() and bfp_fft_forward_stereo()) pack the spectra in a slightly less straightforward manner (as compared with the complex DFTs):

X[f] \(\longleftarrow X[f]\) for \(1 \le f \lt N/2\)

X[0] \(\longleftarrow X[0] + j X[N/2]\)

where X is an \((N/2)\)-element array of complex_s32_t.
Likewise, this is the encoding expected when computing the \(N\)-point inverse DFT, such as by bfp_fft_inverse_mono() or bfp_fft_inverse_stereo().
Note
One additional note: when performing a stereo DFT or inverse DFT, so as to preserve in-place computation of the result, the spectra of the two signals are encoded into adjacent blocks of memory, with the second spectrum (i.e. that associated with 'channel b') occupying the higher memory address.
Note: Library FFT Length Support
When computing DFTs, this library relies on one or both of a pair of lookup tables which contain portions of the Discrete Fourier Transform matrix. Longer FFT lengths require larger lookup tables. When building with CMake, the maximum FFT length can be specified as a CMake option, and these tables are auto-generated at build time.
If not using CMake, you can manually generate these files using a Python script included with the library, located at lib_xs3_math/script/gen_fft_table.py. If generated manually, you must add the generated .c file as a source, and the directory containing xs3_fft_lut.h must be added as an include directory when compiling the library's files. Note that the header file must be named xs3_fft_lut.h, as it is included via #include "xs3_fft_lut.h".
By default the tables contain the coefficients necessary to perform forward or inverse DFTs of up to 1024 points. If larger DFTs are required, or if the maximum required DFT size is known to be less than 1024 points, the MAX_FFT_LEN_LOG2 CMake option can be changed from its default value of 10.
The two lookup tables correspond to the decimation-in-time and decimation-in-frequency FFT algorithms; the run-time symbols for the tables are xs3_dit_fft_lut and xs3_dif_fft_lut respectively. Each table contains \(N/4\) complex 32-bit values, for a size of \(8\cdot (N/4)\) bytes each. With the default maximum FFT length of \(N = 1024\), for example, each table holds 256 complex values and occupies 2048 bytes.
To manually regenerate the tables for a maximum FFT length of \(16384\) (\(=2^{14}\)), supporting only the decimation-in-time algorithm, for example, use the following:
python lib_xs3_math/script/gen_fft_table.py --dit --max_fft_log2 14
Use the --help flag with gen_fft_table.py for a more detailed description of its syntax and parameters.
Note: Digital Filter Conversion
This library supports optimized implementations of 16- and 32-bit FIR filters, as well as cascaded 32-bit biquad filters. Each of these filter implementations requires that the filter coefficients be represented in a compatible form.
To assist with this, several Python scripts are distributed with this library which can be used to convert existing floating-point filter coefficients into code which is easily callable from within an xCore application.
Each script reads floating-point filter coefficients from a file and computes a new representation for the filter, with coefficients which attempt to maximize precision and are compatible with the lib_xs3_math filtering API.
Each script outputs two files which can be included in your own xCore application. The first is a C source (.c) file containing the computed filter parameters and several function definitions for initializing and executing the generated filter. The second is a C header (.h) file which can be #included into your own application to give access to those functions.
Additionally, each script takes a user-provided filter name as an input parameter. The output files (as well as the function names within) include the filter name, so that more than one filter can be generated and executed using this mechanism.
As an example, take the following command to generate a 32bit FIR filter:
python lib_xs3_math/script/gen_fir_filter_s32.py MyFilter filter_coefs.txt
This command creates a filter named "MyFilter", with coefficients taken from the file filter_coefs.txt. Two output files will be generated, MyFilter.c and MyFilter.h. Including MyFilter.h provides access to three functions, MyFilter_init(), MyFilter_add_sample() and MyFilter(), which correspond to the library functions xs3_filter_fir_s32_init(), xs3_filter_fir_s32_add_sample() and xs3_filter_fir_s32() respectively.
Use the --help flag with the scripts for more detailed descriptions of their inputs and other options.
Filter Type     Script
32-bit FIR      gen_fir_filter_s32.py
16-bit FIR
32-bit Biquad