32-Bit Block Floating-Point API

group bfp_s32_api

Functions

void bfp_s32_init(bfp_s32_t *a, int32_t *data, const exponent_t exp, const unsigned length, const unsigned calc_hr)

Initialize a 32-bit BFP vector.

This function initializes each of the fields of BFP vector a.

data points to the memory buffer used to store elements of the vector, so it must be at least length * 4 bytes long, and must begin at a word-aligned address.

exp is the exponent assigned to the BFP vector. The logical value associated with the kth element of the vector after initialization is \( data_k \cdot 2^{exp} \).

If calc_hr is false, a->hr is initialized to 0. Otherwise, the headroom of the BFP vector is calculated and used to initialize a->hr.

Parameters:

a – [out] BFP vector to initialize
data – [in] int32_t buffer used to back a
exp – [in] Exponent of BFP vector
length – [in] Number of elements in the BFP vector
calc_hr – [in] Boolean indicating whether the HR of the BFP vector should be calculated
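As a rough illustration of the fields described above, the following plain-C sketch models initialization with a stand-in struct. The names demo_bfp_s32_t, demo_init() and demo_value() are hypothetical and are not part of the library; only the field names data, exp, hr and length come from this page. Headroom calculation is left out, so hr is set to 0 as in the calc_hr == false case.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical stand-in for bfp_s32_t; the real layout may differ. */
typedef struct {
    int32_t *data;    /* mantissa buffer (length * 4 bytes, word-aligned) */
    int exp;          /* exponent shared by all elements */
    unsigned hr;      /* headroom of the mantissa vector */
    unsigned length;  /* number of elements */
} demo_bfp_s32_t;

/* Model of initialization with calc_hr == 0: the fields are simply assigned. */
static void demo_init(demo_bfp_s32_t *a, int32_t *data, int exp, unsigned length)
{
    a->data = data;
    a->exp = exp;
    a->hr = 0;
    a->length = length;
}

/* Logical value of element k: data[k] * 2^exp. */
static double demo_value(const demo_bfp_s32_t *a, unsigned k)
{
    return ldexp((double)a->data[k], a->exp);
}
```

With mantissas {8, -4, 2, 1} and exp = -2, this model gives the logical values 2.0, -1.0, 0.5 and 0.25.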

bfp_s32_t bfp_s32_alloc(unsigned length)

Dynamically allocate a 32-bit BFP vector from the heap.

If allocation was unsuccessful, the data field of the returned vector will be NULL, and the length field will be zero. Otherwise, data will point to the allocated memory and the length field will be the user-specified length. The length argument must not be zero.

Neither the BFP exponent, headroom, nor the elements of the allocated mantissa vector are set by this function. To set the BFP vector elements to a known value, use bfp_s32_set() on the returned BFP vector.

BFP vectors allocated using this function must be deallocated using bfp_s32_dealloc() to avoid a memory leak.

To initialize a BFP vector using static memory allocation, use bfp_s32_init() instead.

See also

bfp_s32_alloc

Parameters:

vector – [in] BFP vector to be deallocated.
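The allocation contract described above (data is NULL and length is zero on failure) can be modelled in plain C as follows. This is only a sketch: demo_bfp_s32_t, demo_alloc() and demo_dealloc() are hypothetical names, malloc()/free() stand in for whatever allocator the library uses, and resetting the fields on deallocation is a choice made for this example.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for bfp_s32_t; the real layout may differ. */
typedef struct {
    int32_t *data;
    int exp;
    unsigned hr;
    unsigned length;
} demo_bfp_s32_t;

/* On failure: data == NULL and length == 0. On success: length is as requested.
 * exp and hr are zeroed here only to keep the model deterministic. */
static demo_bfp_s32_t demo_alloc(unsigned length)
{
    demo_bfp_s32_t v = { NULL, 0, 0, 0 };
    v.data = malloc((size_t)length * sizeof(int32_t));
    if (v.data != NULL)
        v.length = length;
    return v;
}

/* Release the buffer; every successful demo_alloc() needs one matching call. */
static void demo_dealloc(demo_bfp_s32_t *v)
{
    free(v->data);
    v->data = NULL;
    v->length = 0;
}
```

A caller would check the data field against NULL before using the vector, and release it once it is no longer needed.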

void bfp_s32_set(bfp_s32_t *a, const int32_t b, const exponent_t exp)

Set all elements of a 32-bit BFP vector to a specified value.

The exponent of a is set to exp, and each element’s mantissa is set to b.

After performing this operation, all elements will represent the same value \(b \cdot 2^{exp}\).

a must have been initialized (see bfp_s32_init()).

Parameters:

a – [out] BFP vector to update
b – [in] New value each mantissa is set to
exp – [in] New exponent for the BFP vector

void bfp_s32_use_exponent(bfp_s32_t *a, const exponent_t exp)

Modify a 32-bit BFP vector to use a specified exponent.

This function forces BFP vector \(\bar A\) to use a specified exponent. The mantissa vector \(\bar a\) will be bit-shifted left or right to compensate for the changed exponent.

This function can be used, for example, before calling a fixed-point arithmetic function to ensure the underlying mantissa vector has the needed Q-format. As another example, this may be useful when communicating with peripheral devices (e.g. via I2S) that require sample data to be in a specified format.

Note that this sets the current encoding, and does not fix the exponent permanently (i.e. subsequent operations may change the exponent as usual).

If the required fixed-point Q-format is QX.Y, where Y is the number of fractional bits in the resulting mantissas, then the associated exponent (and value for parameter exp) is -Y.

a points to input BFP vector \(\bar A\), with mantissa vector \(\bar a\) and exponent \(a\_exp\). a is updated in place to produce resulting BFP vector \(\tilde{A}\) with mantissa vector \(\tilde{a}\) and exponent \(\tilde{a}\_exp\).

exp is \(\tilde{a}\_exp\), the required exponent. \(\Delta{}p = \tilde{a}\_exp - a\_exp\) is the required change in exponent.

If \(\Delta{}p = 0\), the BFP vector is left unmodified.

If \(\Delta{}p > 0\), the required exponent is larger than the current exponent and an arithmetic right-shift of \(\Delta{}p\) bits is applied to the mantissas \(\bar a\). When applying a right-shift, precision may be lost by discarding the \(\Delta{}p\) least significant bits.

If \(\Delta{}p < 0\), the required exponent is smaller than the current exponent and a left-shift of \(-\Delta{}p\) bits is applied to the mantissas \(\bar a\). When left-shifting, saturation logic will be applied such that any element that can’t be represented exactly with the new exponent will saturate to the 32-bit saturation bounds.

The exponent and headroom of a are updated by this function.

Operation Performed:

\[\begin{split}\begin{flalign*} & \Delta{}p = \tilde{a}\_exp - a\_exp \\ & \tilde{a_k} \leftarrow sat_{32}( a_k \cdot 2^{-\Delta{}p} ) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{A} \text{ (in elements) } && \end{flalign*}\end{split}\]

Parameters:

a – [inout] Input BFP vector \(\bar A\) / Output BFP vector \(\tilde{A}\)
exp – [in] The required exponent, \(\tilde{a}\_exp\)
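The shift-and-saturate behavior described above can be sketched in plain C. This is a model, not the library's implementation: demo_rescale() and demo_use_exponent() are hypothetical names, the saturation bounds are assumed to be the symmetric range ±(2^31 − 1), and right-shifts are assumed to round toward negative infinity.

```c
#include <stdint.h>

/* Assumed symmetric 32-bit saturation bounds. */
#define DEMO_SAT_MAX  INT32_MAX
#define DEMO_SAT_MIN  (-INT32_MAX)

/* Compute sat32(floor(x * 2^(-delta_p))) for one mantissa. */
static int32_t demo_rescale(int32_t x, int delta_p)
{
    int64_t v = x;
    if (delta_p > 0) {
        /* Larger exponent: right-shift, discarding the low bits. */
        unsigned s = delta_p > 31 ? 31u : (unsigned)delta_p;
        int64_t step = (int64_t)1 << s;
        v = (v >= 0) ? v / step : -((-v + step - 1) / step);
    } else if (delta_p < 0) {
        /* Smaller exponent: left-shift by -delta_p bits, then saturate. */
        unsigned s = -delta_p > 31 ? 31u : (unsigned)(-delta_p);
        v *= (int64_t)1 << s;
        if (v > DEMO_SAT_MAX) v = DEMO_SAT_MAX;
        if (v < DEMO_SAT_MIN) v = DEMO_SAT_MIN;
    }
    return (int32_t)v;
}

/* Re-encode n mantissas from exponent *exp to new_exp and update *exp. */
static void demo_use_exponent(int32_t *data, unsigned n, int *exp, int new_exp)
{
    int delta_p = new_exp - *exp;
    for (unsigned k = 0; k < n; k++)
        data[k] = demo_rescale(data[k], delta_p);
    *exp = new_exp;
}
```

For example, to obtain mantissas in a Q8.24 format, new_exp would be -24.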

headroom_t bfp_s32_headroom(bfp_s32_t *b)

Get the headroom of a 32-bit BFP vector.

The headroom of a vector is the number of bits its elements can be left-shifted without losing any information. It conveys information about the range of values that vector may contain, which is useful for determining how best to preserve precision in potentially lossy block floating-point operations.

In a BFP context, headroom applies to mantissas only, not exponents.

In particular, if the 32-bit mantissa vector \(\bar x\) has \(N\) bits of headroom, then for any element \(x_k\) of \(\bar x\)

\(-2^{31-N} \le x_k < 2^{31-N}\)

And for any element \(X_k = x_k \cdot 2^{x\_exp}\) of a BFP vector \(\bar X\)

\(-2^{31 + x\_exp - N} \le X_k < 2^{31 + x\_exp - N} \)

This function determines the headroom of b, updates b->hr with that value, and then returns b->hr.

Parameters:

b – BFP vector to get the headroom of

Returns:

Headroom of BFP vector b
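The inequality above can be turned into a small reference calculation. The sketch below is plain C and is not how the library computes headroom; demo_headroom_s32() and demo_vect_headroom() are hypothetical names, and capping the result at 31 bits (for zero and -1) is an assumption of this example.

```c
#include <stdint.h>

/* Largest N (capped at 31) such that -2^(31-N) <= x < 2^(31-N). */
static unsigned demo_headroom_s32(int32_t x)
{
    unsigned hr = 0;
    int64_t bound = (int64_t)1 << 30;  /* 2^(31 - (hr + 1)) */
    while (hr < 31 && x >= -bound && x < bound) {
        hr++;
        bound >>= 1;
    }
    return hr;
}

/* Headroom of a vector: the minimum headroom over its elements. */
static unsigned demo_vect_headroom(const int32_t *data, unsigned n)
{
    unsigned hr = 31;
    for (unsigned k = 0; k < n; k++) {
        unsigned h = demo_headroom_s32(data[k]);
        if (h < hr)
            hr = h;
    }
    return hr;
}
```

For the mantissas {256, -3, 1000}, the element 1000 is the limiting one and the model reports 21 bits of headroom.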

void bfp_s32_shl(bfp_s32_t *a, const bfp_s32_t *b, const left_shift_t b_shl)

Apply a left-shift to the mantissas of a 32-bit BFP vector.

Each mantissa of input BFP vector \(\bar B\) is left-shifted b_shl bits and stored in the corresponding element of output BFP vector \(\bar A\).

This operation can be used to add or remove headroom from a BFP vector.

b_shl is the number of bits that each mantissa will be left-shifted. This shift is signed and arithmetic, so negative values for b_shl will right-shift the mantissas.

a and b must have been initialized (see bfp_s32_init()), and must be the same length.

This operation can be performed safely in-place on b.

Note that this operation bypasses the logic protecting the caller from saturation or underflows. Output values saturate to the symmetric 32-bit range (the open interval \((-2^{31}, 2^{31})\)). To avoid saturation, b_shl should be no greater than the headroom of b (b->hr).

Operation Performed:

\[\begin{split}\begin{flalign*} & a_k \leftarrow sat_{32}( \lfloor b_k \cdot 2^{b\_shl} \rfloor ) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \\ & \qquad\text{ and } b_k \text{ and } a_k \text{ are the } k\text{th mantissas from } \bar{B}\text{ and } \bar{A}\text{ respectively} && \end{flalign*}\end{split}\]

Parameters:

a – [out] Output BFP vector \(\bar A\)
b – [in] Input BFP vector \(\bar B\)
b_shl – [in] Signed arithmetic left-shift to be applied to mantissas of \(\bar B\).

void bfp_s32_add(bfp_s32_t *a, const bfp_s32_t *b, const bfp_s32_t *c)

Add two 32-bit BFP vectors together.

Add together two input BFP vectors \(\bar B\) and \(\bar C\) and store the result in BFP vector \(\bar A\).

a, b and c must have been initialized (see bfp_s32_init()), and must be the same length.

This operation can be performed safely in-place on b or c.

Operation Performed:

\[\begin{flalign*} \bar{A} \leftarrow \bar{B} + \bar{C} && \end{flalign*}\]

Parameters:

a – [out] Output BFP vector \(\bar A\)
b – [in] Input BFP vector \(\bar B\)
c – [in] Input BFP vector \(\bar C\)

void bfp_s32_add_scalar(bfp_s32_t *a, const bfp_s32_t *b, const float_s32_t c)

Add a scalar to a 32-bit BFP vector.

Add a real scalar \(c\) to input BFP vector \(\bar B\) and store the result in BFP vector \(\bar A\).

a and b must have been initialized (see bfp_s32_init()), and must be the same length.

This operation can be performed safely in-place on b.

Operation Performed:

\[\begin{flalign*} \bar{A} \leftarrow \bar{B} + c && \end{flalign*}\]

Parameters:

a – [out] Output BFP vector \(\bar A\)
b – [in] Input BFP vector \(\bar B\)
c – [in] Input scalar \(c\)

void bfp_s32_sub(bfp_s32_t *a, const bfp_s32_t *b, const bfp_s32_t *c)

Subtract one 32-bit BFP vector from another.

Subtract input BFP vector \(\bar C\) from input BFP vector \(\bar B\) and store the result in BFP vector \(\bar A\).

a, b and c must have been initialized (see bfp_s32_init()), and must be the same length.

This operation can be performed safely in-place on b or c.

Operation Performed:

\[\begin{flalign*} \bar{A} \leftarrow \bar{B} - \bar{C} && \end{flalign*}\]

Parameters:

a – [out] Output BFP vector \(\bar A\)
b – [in] Input BFP vector \(\bar B\)
c – [in] Input BFP vector \(\bar C\)

void bfp_s32_mul(bfp_s32_t *a, const bfp_s32_t *b, const bfp_s32_t *c)

Multiply one 32-bit BFP vector by another element-wise.

Multiply each element of input BFP vector \(\bar B\) by the corresponding element of input BFP vector \(\bar C\) and store the results in output BFP vector \(\bar A\).

a, b and c must have been initialized (see bfp_s32_init()), and must be the same length.

This operation can be performed safely in-place on b or c.

Operation Performed:

\[\begin{split}\begin{flalign*} & A_k \leftarrow B_k \cdot C_k \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} && \end{flalign*}\end{split}\]

Parameters:

a – [out] Output BFP vector \(\bar A\)
b – [in] Input BFP vector \(\bar B\)
c – [in] Input BFP vector \(\bar C\)

void bfp_s32_macc(bfp_s32_t *acc, const bfp_s32_t *b, const bfp_s32_t *c)

Multiply one 32-bit BFP vector by another element-wise and add the result to a third vector.

Operation Performed:

\[\begin{split}\begin{flalign*} & A_k \leftarrow A_k + B_k \cdot C_k \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} && \end{flalign*}\end{split}\]

Parameters:

acc – [inout] Input/Output accumulator BFP vector \(\bar A\)
b – [in] Input BFP vector \(\bar B\)
c – [in] Input BFP vector \(\bar C\)

void bfp_s32_nmacc(bfp_s32_t *acc, const bfp_s32_t *b, const bfp_s32_t *c)

Multiply one 32-bit BFP vector by another element-wise and subtract the result from a third vector.

Operation Performed:

\[\begin{split}\begin{flalign*} & A_k \leftarrow A_k - B_k \cdot C_k \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} && \end{flalign*}\end{split}\]

Parameters:

acc – [inout] Input/Output accumulator BFP vector \(\bar A\)
b – [in] Input BFP vector \(\bar B\)
c – [in] Input BFP vector \(\bar C\)

void bfp_s32_scale(bfp_s32_t *a, const bfp_s32_t *b, const float_s32_t alpha)

Multiply a 32-bit BFP vector by a scalar.

Multiply input BFP vector \(\bar B\) by scalar \(\alpha \cdot 2^{\alpha\_exp}\) and store the result in output BFP vector \(\bar A\).

a and b must have been initialized (see bfp_s32_init()), and must be the same length.

alpha represents the scalar \(\alpha \cdot 2^{\alpha\_exp}\), where \(\alpha\) is alpha.mant and \(\alpha\_exp\) is alpha.exp.

This operation can be performed safely in-place on b.

Operation Performed:

\[\begin{flalign*} \bar{A} \leftarrow \bar{B} \cdot \left(\alpha \cdot 2^{\alpha\_exp}\right) && \end{flalign*}\]

Parameters:

a – [out] Output BFP vector \(\bar A\)
b – [in] Input BFP vector \(\bar B\)
alpha – [in] Scalar by which \(\bar B\) is multiplied

void bfp_s32_abs(bfp_s32_t *a, const bfp_s32_t *b)

Get the absolute values of elements of a 32-bit BFP vector.

Compute the absolute value of each element \(B_k\) of input BFP vector \(\bar B\) and store the results in output BFP vector \(\bar A\).

a and b must have been initialized (see bfp_s32_init()), and must be the same length.

This operation can be performed safely in-place on b.

Operation Performed:

\[\begin{split}\begin{flalign*} & A_k \leftarrow \left| B_k \right| \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} && \end{flalign*}\end{split}\]

Parameters:

a – [out] Output BFP vector \(\bar A\)
b – [in] Input BFP vector \(\bar B\)

float_s64_t bfp_s32_sum(const bfp_s32_t *b)

Sum the elements of a 32-bit BFP vector.

Sum the elements of input BFP vector \(\bar B\) to get a result \(A = a \cdot 2^{a\_exp}\), which is returned. The returned value has a 64-bit mantissa.

b must have been initialized (see bfp_s32_init()).

Operation Performed:

\[\begin{split}\begin{flalign*} & A \leftarrow \sum_{k=0}^{N-1} \left( B_k \right) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} && \end{flalign*}\end{split}\]

Parameters:

b – [in] Input BFP vector \(\bar B\)

Returns:

\(A\), the sum of elements of \(\bar B\)
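One way to see why the result carries a 64-bit mantissa: adding 32-bit mantissas in a 64-bit accumulator cannot overflow for any vector length that fits in an unsigned, and the sum keeps the exponent of the input. The sketch below is a plain-C model under that view; demo_float_s64_t and demo_sum() are hypothetical names, and the library may normalize its result differently.

```c
#include <stdint.h>

/* Hypothetical stand-in for a scalar with a 64-bit mantissa. */
typedef struct {
    int64_t mant;
    int exp;
} demo_float_s64_t;

/* Sum n mantissas that share exponent exp; the result is mant * 2^exp. */
static demo_float_s64_t demo_sum(const int32_t *data, unsigned n, int exp)
{
    demo_float_s64_t r = { 0, exp };
    for (unsigned k = 0; k < n; k++)
        r.mant += data[k];
    return r;
}
```

Summing {INT32_MAX, INT32_MAX, 2} gives a mantissa of 2^32, which no longer fits in 32 bits.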

float_s64_t bfp_s32_dot(const bfp_s32_t *b, const bfp_s32_t *c)

Compute the inner product of two 32-bit BFP vectors.

Adds together the element-wise products of input BFP vectors \(\bar B\) and \(\bar C\) for a result \(A = a \cdot 2^{a\_exp}\), where \(a\) is the 64-bit mantissa of the result and \(a\_exp\) is its associated exponent. \(A\) is returned.

b and c must have been initialized (see bfp_s32_init()), and must be the same length.

Operation Performed:

\[\begin{split}\begin{flalign*} & a \cdot 2^{a\_exp} \leftarrow \sum_{k=0}^{N-1} \left( B_k \cdot C_k \right) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} && \end{flalign*}\end{split}\]

Parameters:

b – [in] Input BFP vector \(\bar B\)
c – [in] Input BFP vector \(\bar C\)

Returns:

\(A\), the inner product of vectors \(\bar B\) and \(\bar C\)

void bfp_s32_clip(bfp_s32_t *a, const bfp_s32_t *b, const int32_t lower_bound, const int32_t upper_bound, const int bound_exp)

Clamp the elements of a 32-bit BFP vector to a specified range.

Each element \(A_k\) of output BFP vector \(\bar A\) is set to the corresponding element \(B_k\) of input BFP vector \(\bar B\) if it is in the range \( [ L \cdot 2^{bound\_exp}, U \cdot 2^{bound\_exp} ] \), otherwise it is set to the nearest value inside that range.

a and b must have been initialized (see bfp_s32_init()), and must be the same length.

This operation can be performed safely in-place on b.

Operation Performed:

\[\begin{split}\begin{flalign*} & A_k \leftarrow \begin{cases} L \cdot 2^{bound\_exp} & B_k < L \cdot 2^{bound\_exp} \\ U \cdot 2^{bound\_exp} & B_k > U \cdot 2^{bound\_exp} \\ B_k & otherwise \end{cases} \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} && \end{flalign*}\end{split}\]

Parameters:

a – [out] Output BFP vector \(\bar A\)
b – [in] Input BFP vector \(\bar B\)
lower_bound – [in] Mantissa of the lower clipping bound, \(L\)
upper_bound – [in] Mantissa of the upper clipping bound, \(U\)
bound_exp – [in] Shared exponent of the clipping bounds
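The clipping rule can be checked against a floating-point reference that works on logical values. The sketch below is such a reference in plain C, not the library's fixed-point implementation; demo_clip() is a hypothetical name and writes its results as doubles.

```c
#include <math.h>
#include <stdint.h>

/* Clamp each logical value b[k] * 2^b_exp to
 * [lower * 2^bound_exp, upper * 2^bound_exp] and store it in out[k]. */
static void demo_clip(double *out, const int32_t *b, unsigned n, int b_exp,
                      int32_t lower, int32_t upper, int bound_exp)
{
    double lo = ldexp((double)lower, bound_exp);
    double hi = ldexp((double)upper, bound_exp);
    for (unsigned k = 0; k < n; k++) {
        double v = ldexp((double)b[k], b_exp);
        if (v < lo)
            v = lo;
        else if (v > hi)
            v = hi;
        out[k] = v;
    }
}
```

With lower = -1, upper = 3 and bound_exp = 2, the range is [-4, 12], so the logical values 25, -25 and 5 map to 12, -4 and 5.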

void bfp_s32_rect(bfp_s32_t *a, const bfp_s32_t *b)

Rectify a 32-bit BFP vector.

Each element \(A_k\) of output BFP vector \(\bar A\) is set to the corresponding element \(B_k\) of input BFP vector \(\bar B\) if it is non-negative, otherwise it is set to \(0\).

a and b must have been initialized (see bfp_s32_init()), and must be the same length.

This operation can be performed safely in-place on b.

Operation Performed:

\[\begin{split}\begin{flalign*} & A_k \leftarrow \begin{cases} 0 & B_k < 0 \\ B_k & otherwise \end{cases} \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} && \end{flalign*}\end{split}\]

Parameters:

a – [out] Output BFP vector \(\bar A\)
b – [in] Input BFP vector \(\bar B\)

void bfp_s32_to_bfp_s16(bfp_s16_t *a, const bfp_s32_t *b)

Convert a 32-bit BFP vector into a 16-bit BFP vector.

Reduces the bit-depth of each 32-bit element \(B_k\) of input BFP vector \(\bar B\) to 16 bits, and stores the 16-bit result in the corresponding element \(A_k\) of output BFP vector \(\bar A\).

a and b must have been initialized (see bfp_s32_init() and bfp_s16_init()), and must be the same length.

As much precision as possible will be retained.

Operation Performed:

\[\begin{split}\begin{flalign*} & A_k \overset{16-bit}{\longleftarrow} B_k \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} && \end{flalign*}\end{split}\]

Parameters:

a – [out] Output BFP vector \(\bar A\)
b – [in] Input BFP vector \(\bar B\)
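One plausible way to keep as much precision as possible is to shift each mantissa so that the vector's largest element just fits in 16 bits, and to adjust the exponent by the same amount. The plain-C sketch below assumes that strategy; it is not taken from the library source. demo_to_s16() is a hypothetical name, in_hr is the 32-bit headroom of the input (0 to 31), and dropped bits are rounded toward negative infinity.

```c
#include <stdint.h>

/* Convert n 32-bit mantissas with exponent in_exp and headroom in_hr
 * into 16-bit mantissas; *out_exp receives the new exponent. */
static void demo_to_s16(int16_t *out, int *out_exp, const int32_t *in,
                        unsigned n, int in_exp, unsigned in_hr)
{
    int shr = 16 - (int)in_hr;  /* right-shift needed; negative means left-shift */
    for (unsigned k = 0; k < n; k++) {
        int64_t v = in[k];
        if (shr >= 0) {
            int64_t step = (int64_t)1 << shr;
            v = (v >= 0) ? v / step : -((-v + step - 1) / step);
        } else {
            v *= (int64_t)1 << -shr;
        }
        out[k] = (int16_t)v;
    }
    *out_exp = in_exp + shr;
}
```

For input mantissas {0x00123456, -0x00100000} with 10 bits of headroom, the shift is 6 bits, giving {0x48D1, -16384} and an exponent 6 larger than the input's.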

void bfp_s32_sqrt(bfp_s32_t *a, const bfp_s32_t *b)

Get the square roots of elements of a 32-bit BFP vector.

Computes the square root of each element \(B_k\) of input BFP vector \(\bar B\) and stores the results in output BFP vector \(\bar A\).

a and b must have been initialized (see bfp_s32_init()), and must be the same length.

This operation can be performed safely in-place on b.

Operation Performed:

\[\begin{split}\begin{flalign*} & A_k \leftarrow \sqrt{B_k} \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} && \end{flalign*}\end{split}\]

Notes

Only the XMATH_BFP_SQRT_DEPTH_S32 (see xmath_conf.h) most significant bits of each result are computed.
This function only computes real roots. For any \(B_k < 0\), the corresponding output \(A_k\) is set to \(0\).

Parameters:

a – [out] Output BFP vector \(\bar A\)
b – [in] Input BFP vector \(\bar B\)

void bfp_s32_inverse(bfp_s32_t *a, const bfp_s32_t *b)

Get the inverses of elements of a 32-bit BFP vector.

Computes the inverse of each element \(B_k\) of input BFP vector \(\bar B\) and stores the results in output BFP vector \(\bar A\).

a and b must have been initialized (see bfp_s32_init()), and must be the same length.

This operation can be performed safely in-place on b.

Operation Performed:

\[\begin{split}\begin{flalign*} & A_k \leftarrow B_k^{-1} \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} && \end{flalign*}\end{split}\]

Parameters:

a – [out] Output BFP vector \(\bar A\)
b – [in] Input BFP vector \(\bar B\)

float_s64_t bfp_s32_abs_sum(const bfp_s32_t *b)

Sum the absolute values of elements of a 32-bit BFP vector.

Sum the absolute values of elements of input BFP vector \(\bar B\) for a result \(A = a \cdot 2^{a\_exp}\), where \(a\) is a 64-bit mantissa and \(a\_exp\) is its associated exponent. \(A\) is returned.

b must have been initialized (see bfp_s32_init()).

Operation Performed:

\[\begin{split}\begin{flalign*} & A \leftarrow \sum_{k=0}^{N-1} \left| B_k \right| \\ & \qquad\text{where } N \text{ is the length of } \bar{B} && \end{flalign*}\end{split}\]

Parameters:

b – [in] Input BFP vector \(\bar B\)

Returns:

\(A\), the sum of absolute values of elements of \(\bar B\)

float_s32_t bfp_s32_mean(const bfp_s32_t *b)

Get the mean value of a 32-bit BFP vector.

Computes \(A = a \cdot 2^{a\_exp}\), the mean value of elements of input BFP vector \(\bar B\), where \(a\) is the 32-bit mantissa of the result, and \(a\_exp\) is its associated exponent. \(A\) is returned.

b must have been initialized (see bfp_s32_init()).

Operation Performed:

\[\begin{split}\begin{flalign*} & A \leftarrow \frac{1}{N} \sum_{k=0}^{N-1} \left( B_k \right) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} && \end{flalign*}\end{split}\]

Parameters:

b – [in] Input BFP vector \(\bar B\)

Returns:

\(A\), the mean value of \(\bar B\)’s elements

float_s64_t bfp_s32_energy(const bfp_s32_t *b)

Get the energy (sum of squares of elements) of a 32-bit BFP vector.

Computes \(A = a \cdot 2^{a\_exp}\), the sum of squares of elements of input BFP vector \(\bar B\), where \(a\) is the 64-bit mantissa of the result, and \(a\_exp\) is its associated exponent. \(A\) is returned.

b must have been initialized (see bfp_s32_init()).

Operation Performed:

\[\begin{split}\begin{flalign*} & A \leftarrow \sum_{k=0}^{N-1} \left( B_k^2 \right) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} && \end{flalign*}\end{split}\]

Parameters:

b – [in] Input BFP vector \(\bar B\)

Returns:

\(A\), \(\bar B\)’s energy

float_s32_t bfp_s32_rms(const bfp_s32_t *b)

Get the RMS value of elements of a 32-bit BFP vector.

Computes \(A = a \cdot 2^{a\_exp}\), the RMS value of elements of input BFP vector \(\bar B\), where \(a\) is the 32-bit mantissa of the result, and \(a\_exp\) is its associated exponent. \(A\) is returned.

The RMS (root-mean-square) value of a vector is the square root of the mean of the squares of the vector’s elements.

b must have been initialized (see bfp_s32_init()).

Operation Performed:

\[\begin{split}\begin{flalign*} & A \leftarrow \sqrt{\frac{1}{N}\sum_{k=0}^{N-1} \left( B_k^2 \right) } \\ & \qquad\text{where } N \text{ is the length of } \bar{B} && \end{flalign*}\end{split}\]

Parameters:

b – [in] Input BFP vector \(\bar B\)

Returns:

\(A\), the RMS value of \(\bar B\)’s elements
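As a worked example of the operation above (the numbers are chosen here for illustration), take a vector with mantissas \((2, -14, 10, 10)\) and exponent \(-1\), so that its logical elements are \((1, -7, 5, 5)\):

\[\begin{split}\begin{flalign*} & \bar{B} = (1, -7, 5, 5), \qquad N = 4 \\ & A = \sqrt{\frac{1}{4}\left( 1^2 + (-7)^2 + 5^2 + 5^2 \right)} = \sqrt{\frac{100}{4}} = \sqrt{25} = 5 && \end{flalign*}\end{split}\]

The energy of the same vector is \(100\), so the RMS value is the square root of the energy divided by \(N\).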

float_s32_t bfp_s32_max(const bfp_s32_t *b)

Get the maximum value of a 32-bit BFP vector.

Finds \(A\), the maximum value among elements of input BFP vector \(\bar B\). \(A\) is returned by this function.

b must have been initialized (see bfp_s32_init()).

Operation Performed:

\[\begin{split}\begin{flalign*} & A \leftarrow max\left(B_0, B_1, ..., B_{N-1} \right) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} && \end{flalign*}\end{split}\]

Parameters:

b – [in] Input vector

Returns:

\(A\), the value of \(\bar B\)’s maximum element

void bfp_s32_max_elementwise(bfp_s32_t *a, const bfp_s32_t *b, const bfp_s32_t *c)

Get the element-wise maximum of two 32-bit BFP vectors.

Each element of output vector \(\bar A\) is set to the maximum of the corresponding elements in the input vectors \(\bar B\) and \(\bar C\).

a, b and c must have been initialized (see bfp_s32_init()), and must be the same length.

This operation can be performed safely in-place on b, but not on c.

Operation Performed:

\[\begin{split}\begin{flalign*} & A_k \leftarrow max(B_k, C_k) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} && \end{flalign*}\end{split}\]

Parameters:

a – [out] Output BFP vector \(\bar A\)
b – [in] Input BFP vector \(\bar B\)
c – [in] Input BFP vector \(\bar C\)

float_s32_t bfp_s32_min(const bfp_s32_t *b)

Get the minimum value of a 32-bit BFP vector.

Finds \(A\), the minimum value among elements of input BFP vector \(\bar B\). \(A\) is returned by this function.

b must have been initialized (see bfp_s32_init()).

Operation Performed:

\[\begin{split}\begin{flalign*} & A \leftarrow min\left(B_0, B_1, ..., B_{N-1} \right) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} && \end{flalign*}\end{split}\]

Parameters:

b – [in] Input vector

Returns:

\(A\), the value of \(\bar B\)’s minimum element

void bfp_s32_min_elementwise(bfp_s32_t *a, const bfp_s32_t *b, const bfp_s32_t *c)

Get the element-wise minimum of two 32-bit BFP vectors.

Each element of output vector \(\bar A\) is set to the minimum of the corresponding elements in the input vectors \(\bar B\) and \(\bar C\).

a, b and c must have been initialized (see bfp_s32_init()), and must be the same length.

This operation can be performed safely in-place on b, but not on c.

Operation Performed:

\[\begin{split}\begin{flalign*} & A_k \leftarrow min(B_k, C_k) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} && \end{flalign*}\end{split}\]

Parameters:

a – [out] Output BFP vector \(\bar A\)
b – [in] Input BFP vector \(\bar B\)
c – [in] Input BFP vector \(\bar C\)

unsigned bfp_s32_argmax(const bfp_s32_t *b)

Get the index of the maximum value of a 32-bit BFP vector.

Finds \(a\), the index of the maximum value among the elements of input BFP vector \(\bar B\). \(a\) is returned by this function.

If i is the value returned, then the maximum value in \(\bar B\) is ldexp(b->data[i], b->exp).

Operation Performed:

\[\begin{split}\begin{flalign*} & a \leftarrow argmax_k\left(b_k\right) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} && \end{flalign*}\end{split}\]

Notes

If there is a tie for maximum value, the lowest tying index is returned.

Parameters:

b – [in] Input vector

Returns:

\(a\), the index of the maximum value from \(\bar B\)
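Because all elements share one exponent, comparing mantissas is enough to find the maximum. The plain-C sketch below models that, including the tie-breaking rule from the note above; demo_argmax() is a hypothetical name and assumes n is at least 1.

```c
#include <stdint.h>

/* Index of the largest mantissa; ties resolve to the lowest index. */
static unsigned demo_argmax(const int32_t *data, unsigned n)
{
    unsigned best = 0;
    for (unsigned k = 1; k < n; k++) {
        /* Strict '>' keeps the earliest index when values tie. */
        if (data[k] > data[best])
            best = k;
    }
    return best;
}
```

For the mantissas {3, 9, -2, 9} the model returns index 1, not 3.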

unsigned bfp_s32_argmin(const bfp_s32_t *b)

Get the index of the minimum value of a 32-bit BFP vector.

Finds \(a\), the index of the minimum value among the elements of input BFP vector \(\bar B\). \(a\) is returned by this function.

If i is the value returned, then the minimum value in \(\bar B\) is ldexp(b->data[i], b->exp).

Operation Performed:

\[\begin{split}\begin{flalign*} & a \leftarrow argmin_k\left(b_k\right) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} && \end{flalign*}\end{split}\]

Notes

If there is a tie for minimum value, the lowest tying index is returned.

Parameters:

b – [in] Input vector

Returns:

\(a\), the index of the minimum value from \(\bar B\)

void bfp_s32_convolve_valid(bfp_s32_t *y, const bfp_s32_t *x, const int32_t b_q30[], const unsigned b_length)

Convolve a 32-bit BFP vector with a short convolution kernel (“valid” mode).

Input BFP vector \(\bar X\) is convolved with a short fixed-point convolution kernel \(\bar b\) to produce output BFP vector \(\bar Y\). In other words, this function applies the \(K\)-tap FIR filter with coefficients given by \(\bar b\) to the input signal \(\bar X\). The convolution is “valid” in the sense that no output elements are emitted where the filter taps extend beyond the bounds of the input vector, resulting in an output vector \(\bar Y\) with fewer elements.

The maximum number of filter taps \(K\) supported by this function is \(7\).

y is the output vector \(\bar Y\). If input \(\bar X\) has \(N\) elements, and the filter has \(K\) coefficients, then \(\bar Y\) has \(N-2P\) elements, where \(P = \lfloor K / 2 \rfloor\).

x is the input vector \(\bar X\) with length \(N\) (in elements).

b_q30[] is the vector \(\bar b\) of filter coefficients. The coefficients of \(\bar b\) are encoded in a Q2.30 fixed-point format. The effective value of the \(i\)th coefficient is then \(b_i \cdot 2^{-30}\).

b_length is the length \(K\) of \(\bar b\) in elements (i.e. the number of filter taps). b_length must be one of \( \{ 1, 3, 5, 7 \} \).

Operation Performed:

\[\begin{split}\begin{flalign*} & Y_k \leftarrow \sum_{l=0}^{K-1} (X_{(k+l)} \cdot b_l \cdot 2^{-30} ) \\ & \qquad\text{ for }k\in 0\ ...\ (N-2P-1) \\ & \qquad\text{ where }P = \lfloor K/2 \rfloor && \end{flalign*}\end{split}\]

Parameters:

y – [out] Output BFP vector \(\bar Y\)
x – [in] Input BFP vector \(\bar X\)
b_q30 – [in] Convolution kernel \(\bar b\)
b_length – [in] The number of elements \(K\) in \(\bar b\)
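The “valid” convolution can be checked against a floating-point reference that works on logical values. The plain-C sketch below is such a reference, not the library's fixed-point implementation; demo_convolve_valid() is a hypothetical name, x and y hold logical values as doubles, and k_len is assumed to be odd as required above.

```c
#include <math.h>
#include <stdint.h>

/* Writes n - 2*(k_len/2) outputs to y and returns that count
 * (0 if the input is shorter than the kernel). */
static unsigned demo_convolve_valid(double *y, const double *x, unsigned n,
                                    const int32_t b_q30[], unsigned k_len)
{
    unsigned p = k_len / 2;
    if (n < k_len)
        return 0;
    unsigned out_len = n - 2 * p;
    for (unsigned k = 0; k < out_len; k++) {
        double acc = 0.0;
        for (unsigned l = 0; l < k_len; l++) {
            /* Each Q2.30 coefficient has the value b_q30[l] * 2^-30. */
            acc += x[k + l] * ldexp((double)b_q30[l], -30);
        }
        y[k] = acc;
    }
    return out_len;
}
```

With x = {1, 2, 3, 4, 5} and the 3-tap kernel (0.25, 0.5, 0.25), encoded as {1 << 28, 1 << 29, 1 << 28}, the reference yields the three outputs 2, 3 and 4.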

void bfp_s32_convolve_same(bfp_s32_t *y, const bfp_s32_t *x, const int32_t b_q30[], const unsigned b_length, const pad_mode_e padding_mode)

Convolve a 32-bit BFP vector with a short convolution kernel (“same” mode).

Input BFP vector \(\bar X\) is convolved with a short fixed-point convolution kernel \(\bar b\) to produce output BFP vector \(\bar Y\). In other words, this function applies the \(K\)-tap FIR filter with coefficients given by \(\bar b\) to the input signal \(\bar X\). The convolution mode is “same” in that the input vector is effectively padded such that the input and output vectors are the same length. The padding behavior is one of those given by pad_mode_e.

The maximum number of filter taps \(K\) supported by this function is \(7\).

y and x are the output and input BFP vectors \(\bar Y\) and \(\bar X\) respectively.

b_length is the length \(K\) of \(\bar b\) in elements (i.e. the number of filter taps). b_length must be one of \( \{ 1, 3, 5, 7 \} \).

padding_mode is one of the values from the pad_mode_e enumeration. The padding mode indicates the filter input values for filter taps that have extended beyond the bounds of the input vector \(\bar X\). See pad_mode_e for a list of supported padding modes and associated behaviors.

Operation Performed:

\[\begin{split}\begin{flalign*} & \tilde{x}_i = \begin{cases} \text{determined by padding mode} & i < 0 \\ \text{determined by padding mode} & i \ge N \\ x_i & otherwise \end{cases} \\ & y_k \leftarrow \sum_{l=0}^{K-1} (\tilde{x}_{(k+l-P)} \cdot b_l \cdot 2^{-30} ) \\ & \qquad\text{ for }k\in 0\ ...\ (N-1) \\ & \qquad\text{ where }P = \lfloor K/2 \rfloor && \end{flalign*}\end{split}\]

Note

Unlike bfp_s32_convolve_valid(), this operation cannot be performed safely in-place on x.

Parameters:

y – [out] Output BFP vector \(\bar Y\)
x – [in] Input BFP vector \(\bar X\)
b_q30 – [in] Convolution kernel \(\bar b\)
b_length – [in] The number of elements \(K\) in \(\bar b\)
padding_mode – [in] The padding mode to be applied at signal boundaries
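To illustrate how padding makes the output the same length as the input, the plain-C sketch below computes a floating-point reference with out-of-range inputs treated as zero. Zero padding is assumed here purely for illustration and is not claimed to match any particular pad_mode_e value; demo_convolve_same_zero() is a hypothetical name, and x and y hold logical values as doubles.

```c
#include <math.h>
#include <stdint.h>

/* Writes n outputs to y; taps that fall outside x read as 0.0. */
static void demo_convolve_same_zero(double *y, const double *x, unsigned n,
                                    const int32_t b_q30[], unsigned k_len)
{
    int p = (int)(k_len / 2);
    for (unsigned k = 0; k < n; k++) {
        double acc = 0.0;
        for (unsigned l = 0; l < k_len; l++) {
            int i = (int)k + (int)l - p;  /* index into the padded input */
            double xi = (i < 0 || i >= (int)n) ? 0.0 : x[i];
            acc += xi * ldexp((double)b_q30[l], -30);
        }
        y[k] = acc;
    }
}
```

With x = {1, 2, 3, 4, 5} and the 3-tap kernel (0.25, 0.5, 0.25), the interior outputs 2, 3 and 4 match the “valid” result, while the two edge outputs (1 and 3.5) depend on the padding.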