# 16-Bit Block Floating-Point Functions

void bfp_complex_s16_set(bfp_complex_s16_t *a, const complex_s16_t b, const exponent_t exp)

Set all elements of a complex 16-bit BFP vector to a specified value.

The exponent of a is set to exp, and each element’s mantissa is set to b.

After performing this operation, all elements will represent the same value $$b \cdot 2^{exp}$$.

a must have been initialized (see bfp_complex_s16_init()).

Parameters
• a[out] BFP vector to update

• b[in] New value each complex mantissa is set to

• exp[in] New exponent for the BFP vector

void bfp_complex_s16_use_exponent(bfp_complex_s16_t *a, const exponent_t exp)

Modify a complex 16-bit BFP vector to use a specified exponent.

This function forces complex BFP vector $$\bar A$$ to use a specified exponent. The mantissa vector $$\bar a$$ will be bit-shifted left or right to compensate for the changed exponent.

This function can be used, for example, before calling a fixed-point arithmetic function to ensure the underlying mantissa vector has the needed Q-format. As another example, this may be useful when communicating with peripheral devices (e.g. via I2S) that require sample data to be in a specified format.

Note that this sets the current encoding, and does not fix the exponent permanently (i.e. subsequent operations may change the exponent as usual).

If the required fixed-point Q-format is QX.Y, where Y is the number of fractional bits in the resulting mantissas, then the associated exponent (and value for parameter exp) is -Y.

a points to input BFP vector $$\bar A$$, with complex mantissa vector $$\bar a$$ and exponent $$a\_exp$$. a is updated in place to produce resulting BFP vector $$\bar{\tilde{A}}$$ with complex mantissa vector $$\bar{\tilde{a}}$$ and exponent $$\tilde{a}\_exp$$.

exp is $$\tilde{a}\_exp$$, the required exponent. $$\Delta{}p = \tilde{a}\_exp - a\_exp$$ is the required change in exponent.

If $$\Delta{}p = 0$$, the BFP vector is left unmodified.

If $$\Delta{}p > 0$$, the required exponent is larger than the current exponent and an arithmetic right-shift of $$\Delta{}p$$ bits is applied to the mantissas $$\bar a$$. When applying a right-shift, precision may be lost by discarding the $$\Delta{}p$$ least significant bits.

If $$\Delta{}p < 0$$, the required exponent is smaller than the current exponent and a left-shift of $$-\Delta{}p$$ bits is applied to the mantissas $$\bar a$$. When left-shifting, saturation logic will be applied such that any element that can’t be represented exactly with the new exponent will saturate to the 16-bit saturation bounds.

The exponent and headroom of a are updated by this function.

Operation Performed:

\begin{split}\begin{align*} & \Delta{}p = \tilde{a}\_exp - a\_exp \\ & \tilde{a_k} \leftarrow sat_{16}( a_k \cdot 2^{-\Delta{}p} ) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{A} \text{ (in elements) } \end{align*}\end{split}

Parameters
• a[inout] Input BFP vector $$\bar A$$ / Output BFP vector $$\bar{\tilde{A}}$$

• exp[in] The required exponent, $$\tilde{a}\_exp$$
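The shift-and-saturate behavior described above can be illustrated with a small standalone model. This is a sketch, not the library's implementation: `sat16`, `floor_shr` and `use_exponent_sketch` are hypothetical helpers that work on a plain `int16_t` array, and the limit on the exponent change is an assumption made to keep intermediates within 32 bits. For a complex vector the same loop would be applied to the real and imaginary buffers alike.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp to the symmetric 16-bit range, i.e. -2^15 < v < 2^15. */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX)  return INT16_MAX;
    if (v < -INT16_MAX) return -INT16_MAX;
    return (int16_t)v;
}

/* Floor division by 2^s: a portable stand-in for an arithmetic right shift. */
static int32_t floor_shr(int32_t v, unsigned s)
{
    int32_t d = (int32_t)1 << s;
    int32_t q = v / d;
    if (v % d < 0) q -= 1;
    return q;
}

/* Hypothetical model: re-encode n mantissas from cur_exp to new_exp in place.
   Returns the exponent that now applies to the mantissas. */
static int use_exponent_sketch(int16_t *mant, size_t n, int cur_exp, int new_exp)
{
    int delta_p = new_exp - cur_exp;
    assert(delta_p > -16 && delta_p < 16); /* assumption: keeps values in int32_t */

    for (size_t k = 0; k < n; k++) {
        int32_t v = mant[k];
        if (delta_p > 0)
            v = floor_shr(v, (unsigned)delta_p);  /* low bits are discarded */
        else if (delta_p < 0)
            v = v * ((int32_t)1 << -delta_p);     /* may exceed 16 bits */
        mant[k] = sat16(v);
    }
    return new_exp;
}
```

Raising the exponent from -8 to -6 in this model divides every mantissa by 4 and rounds toward negative infinity, while lowering it from 0 to -2 multiplies by 4 and clamps anything that no longer fits.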

Get the headroom of a complex 16-bit BFP vector.

The headroom of a complex vector is the number of bits that the real and imaginary parts of each of its elements can be left-shifted without losing any information. It conveys information about the range of values that vector may contain, which is useful for determining how best to preserve precision in potentially lossy block floating-point operations.

In a BFP context, headroom applies to mantissas only, not exponents.

In particular, if the complex 16-bit mantissa vector $$\bar x$$ has $$N$$ bits of headroom, then for any element $$x_k$$ of $$\bar x$$

$$-2^{15-N} \le Re\{x_k\} \lt 2^{15-N}$$

and

$$-2^{15-N} \le Im\{x_k\} \lt 2^{15-N}$$

And for any element $$X_k = x_k \cdot 2^{x\_exp}$$ of a complex BFP vector $$\bar X$$

$$-2^{15 + x\_exp - N} \le Re\{X_k\} \lt 2^{15 + x\_exp - N}$$

and

$$-2^{15 + x\_exp - N} \le Im\{X_k\} \lt 2^{15 + x\_exp - N}$$

This function determines the headroom of b, updates b->hr with that value, and then returns b->hr.

Parameters

b – complex BFP vector to get the headroom of

Returns

Headroom of complex BFP vector b
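The bounds above translate directly into a search for the largest $$N$$ that every real and imaginary part satisfies. The following is a minimal sketch under stated assumptions: `headroom_s16` and `complex_headroom_sketch` are hypothetical names, the mantissas are passed as two plain `int16_t` arrays, and headroom is capped at 15 bits (an assumption for the all-zero case). It does not show how the library computes or caches the value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Largest N in 0..15 such that -2^(15-N) <= x < 2^(15-N). */
static unsigned headroom_s16(int16_t x)
{
    unsigned hr = 0;
    while (hr < 15) {
        int32_t bound = (int32_t)1 << (14 - hr); /* 2^(15 - (hr + 1)) */
        if (x < -bound || x >= bound)
            break;
        hr++;
    }
    return hr;
}

/* Headroom of a complex vector: the minimum over all real and imaginary parts. */
static unsigned complex_headroom_sketch(const int16_t *re, const int16_t *im, size_t n)
{
    assert(n > 0);
    unsigned hr = 15;
    for (size_t k = 0; k < n; k++) {
        unsigned hr_re = headroom_s16(re[k]);
        unsigned hr_im = headroom_s16(im[k]);
        if (hr_re < hr) hr = hr_re;
        if (hr_im < hr) hr = hr_im;
    }
    return hr;
}
```

The element with the largest magnitude decides the result: a single imaginary part of 4000 limits the whole vector to 3 bits of headroom, because 4000 lies in $$[-2^{12}, 2^{12})$$ but not in $$[-2^{11}, 2^{11})$$.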

void bfp_complex_s16_shl(bfp_complex_s16_t *a, const bfp_complex_s16_t *b, const left_shift_t b_shl)

Apply a left-shift to the mantissas of a complex 16-bit BFP vector.

Each complex mantissa of input BFP vector $$\bar B$$ is left-shifted b_shl bits and stored in the corresponding element of output BFP vector $$\bar A$$.

This operation can be used to add or remove headroom from a BFP vector.

b_shl is the number of bits that the real and imaginary parts of each mantissa will be left-shifted. This shift is signed and arithmetic, so negative values for b_shl will right-shift the mantissas.

a and b must have been initialized (see bfp_complex_s16_init()), and must be the same length.

This operation can be performed safely in-place on b.

Note that this operation bypasses the logic protecting the caller from saturation or underflows. Output values saturate to the symmetric 16-bit range ( $$-2^{15} \lt x \lt 2^{15}$$, where $$x$$ is the real or imaginary part of an output mantissa). To avoid saturation, b_shl should be no greater than the headroom of b (b->hr).

Operation Performed:

\begin{split}\begin{align*} & Re\{a_k\} \leftarrow sat_{16}( \lfloor Re\{b_k\} \cdot 2^{b\_shl} \rfloor ) \\ & Im\{a_k\} \leftarrow sat_{16}( \lfloor Im\{b_k\} \cdot 2^{b\_shl} \rfloor ) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \\ & \qquad\text{ and } b_k \text{ and } a_k \text{ are the } k\text{th mantissas from } \bar{B}\text{ and } \bar{A}\text{ respectively} \end{align*}\end{split}

Parameters
• a[out] Complex output BFP vector $$\bar A$$

• b[in] Complex input BFP vector $$\bar B$$

• b_shl[in] Signed arithmetic left-shift to be applied to mantissas of $$\bar B$$.
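The interaction between the signed shift, the floor in the formula above and the saturation bounds can be seen on a single mantissa part. The helper below, `shl_sat16`, is hypothetical and only models the arithmetic of the formula; it would be applied to the real and imaginary part of every element. The restriction on the shift range is an assumption made to keep the intermediate value within 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of sat16(floor(x * 2^shl)) for one mantissa part. */
static int16_t shl_sat16(int16_t x, int shl)
{
    assert(shl > -16 && shl < 16); /* assumption: keeps v inside int32_t */

    int32_t v = x;
    if (shl >= 0) {
        v = v * ((int32_t)1 << shl);
    } else {
        /* Negative shl is a right shift; round toward negative infinity. */
        int32_t d = (int32_t)1 << -shl;
        int32_t q = v / d;
        if (v % d < 0) q -= 1;
        v = q;
    }

    /* Saturate to the symmetric range -2^15 < v < 2^15. */
    if (v > INT16_MAX)  v = INT16_MAX;
    if (v < -INT16_MAX) v = -INT16_MAX;
    return (int16_t)v;
}
```

A mantissa of 1000 has 5 bits of headroom, so a shift of 3 is exact (8000), whereas a shift of 6 would need 64000 and is clamped to 32767.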

void bfp_complex_s16_real_mul(bfp_complex_s16_t *a, const bfp_complex_s16_t *b, const bfp_s16_t *c)

Multiply a complex 16-bit BFP vector element-wise by a real 16-bit BFP vector.

Each complex output element $$A_k$$ of complex output BFP vector $$\bar A$$ is set to the complex product of $$B_k$$ and $$C_k$$, the corresponding elements of complex input BFP vector $$\bar B$$ and real input BFP vector $$\bar C$$ respectively.

a, b and c must have been initialized (see bfp_complex_s16_init() and bfp_s16_init()), and must be the same length.

This operation can be performed safely in-place on b.

Operation Performed:

\begin{split}\begin{align*} & A_k \leftarrow B_k \cdot C_k \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \end{align*}\end{split}

Parameters
• a[out] Output complex BFP vector $$\bar A$$

• b[in] Input complex BFP vector $$\bar B$$

• c[in] Input real BFP vector $$\bar C$$

void bfp_complex_s16_mul(bfp_complex_s16_t *a, const bfp_complex_s16_t *b, const bfp_complex_s16_t *c)

Multiply one complex 16-bit BFP vector element-wise by another.

Each complex output element $$A_k$$ of complex output BFP vector $$\bar A$$ is set to the complex product of $$B_k$$ and $$C_k$$, the corresponding elements of complex input BFP vectors $$\bar B$$ and $$\bar C$$ respectively.

a, b and c must have been initialized (see bfp_complex_s16_init()), and must be the same length.

This operation can be performed safely in-place on b or c.

Operation Performed:

\begin{split}\begin{align*} & A_k \leftarrow B_k \cdot C_k \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \end{align*}\end{split}

Parameters
• a[out] Output complex BFP vector $$\bar A$$

• b[in] Input complex BFP vector $$\bar B$$

• c[in] Input complex BFP vector $$\bar C$$
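In terms of mantissas and exponents, $$B_k \cdot C_k = (b_k \cdot c_k) \cdot 2^{b\_exp + c\_exp}$$, so an element-wise complex product multiplies the mantissas and adds the exponents. The sketch below models only that step on split real/imaginary buffers. `complex_mul_sketch` is a hypothetical helper, and it keeps the products in 64-bit integers rather than renormalizing them to 16-bit mantissas, which the actual function must do in some form not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: a[k] = b[k] * c[k] on split real/imaginary mantissas.
   Returns the exponent shared by all output mantissas. */
static int complex_mul_sketch(int64_t *a_re, int64_t *a_im,
                              const int16_t *b_re, const int16_t *b_im, int b_exp,
                              const int16_t *c_re, const int16_t *c_im, int c_exp,
                              size_t n)
{
    assert(a_re && a_im && b_re && b_im && c_re && c_im);

    for (size_t k = 0; k < n; k++) {
        int64_t br = b_re[k], bi = b_im[k];
        int64_t cr = c_re[k], ci = c_im[k];
        a_re[k] = br * cr - bi * ci; /* Re{b*c} */
        a_im[k] = br * ci + bi * cr; /* Im{b*c} */
    }
    return b_exp + c_exp;
}
```

For example, mantissas $$1 + 2i$$ and $$3 + 4i$$ with exponents -2 and -3 give the mantissa $$-5 + 10i$$ with exponent -5.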

void bfp_complex_s16_conj_mul(bfp_complex_s16_t *a, const bfp_complex_s16_t *b, const bfp_complex_s16_t *c)

Multiply one complex 16-bit BFP vector element-wise by the complex conjugate of another.

Each complex output element $$A_k$$ of complex output BFP vector $$\bar A$$ is set to the complex product of $$B_k$$, the corresponding element of complex input BFP vector $$\bar B$$, and $$(C_k)^*$$, the complex conjugate of the corresponding element of complex input BFP vector $$\bar C$$.

Operation Performed:

\begin{split}\begin{align*} & A_k \leftarrow B_k \cdot (C_k)^* \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \\ & \qquad\text{and } (C_k)^* \text{ is the complex conjugate of } C_k \end{align*}\end{split}

Parameters
• a[out] Output complex BFP vector $$\bar A$$

• b[in] Input complex BFP vector $$\bar B$$

• c[in] Input complex BFP vector $$\bar C$$
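Conjugating $$C_k$$ only flips the sign of its imaginary part, which changes two signs in the usual complex product. The sketch below shows the resulting mantissa arithmetic for one element. `cplx32_sketch_t` and `conj_mul_mant` are hypothetical names, and the inputs are assumed to lie in the symmetric 16-bit range so that both results fit in 32 bits. Exponent handling is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit complex mantissa used only for this illustration. */
typedef struct {
    int32_t re;
    int32_t im;
} cplx32_sketch_t;

/* Mantissa of b * conj(c) for one element. */
static cplx32_sketch_t conj_mul_mant(int16_t b_re, int16_t b_im,
                                     int16_t c_re, int16_t c_im)
{
    /* Assumption: symmetric-range inputs, so neither sum overflows int32_t. */
    assert(b_re > INT16_MIN && b_im > INT16_MIN);
    assert(c_re > INT16_MIN && c_im > INT16_MIN);

    cplx32_sketch_t p;
    p.re = (int32_t)b_re * c_re + (int32_t)b_im * c_im;
    p.im = (int32_t)b_im * c_re - (int32_t)b_re * c_im;
    return p;
}
```

Multiplying an element by its own conjugate yields its squared magnitude with a zero imaginary part, for example $$(3 + 4i)(3 - 4i) = 25$$.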

void bfp_complex_s16_macc(bfp_complex_s16_t *acc, const bfp_complex_s16_t *b, const bfp_complex_s16_t *c)

Multiply one complex 16-bit BFP vector by another element-wise and add the result to a third vector.

Operation Performed:

\begin{split}\begin{align*} & A_k \leftarrow A_k + (B_k \cdot C_k) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \end{align*}\end{split}

Parameters
• acc[inout] Input/Output accumulator complex BFP vector $$\bar A$$

• b[in] Input complex BFP vector $$\bar B$$

• c[in] Input complex BFP vector $$\bar C$$

void bfp_complex_s16_nmacc(bfp_complex_s16_t *acc, const bfp_complex_s16_t *b, const bfp_complex_s16_t *c)

Multiply one complex 16-bit BFP vector by another element-wise and subtract the result from a third vector.

Operation Performed:

\begin{split}\begin{align*} & A_k \leftarrow A_k - (B_k \cdot C_k) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \end{align*}\end{split}

Parameters
• acc[inout] Input/Output accumulator complex BFP vector $$\bar A$$

• b[in] Input complex BFP vector $$\bar B$$

• c[in] Input complex BFP vector $$\bar C$$

void bfp_complex_s16_conj_macc(bfp_complex_s16_t *acc, const bfp_complex_s16_t *b, const bfp_complex_s16_t *c)

Multiply one complex 16-bit BFP vector by the complex conjugate of another element-wise and add the result to a third vector.

Operation Performed:

\begin{split}\begin{align*} & A_k \leftarrow A_k + (B_k \cdot C_k^*) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \\ & \qquad\text{and } (C_k)^* \text{ is the complex conjugate of } C_k \end{align*}\end{split}

Parameters
• acc[inout] Input/Output accumulator complex BFP vector $$\bar A$$

• b[in] Input complex BFP vector $$\bar B$$

• c[in] Input complex BFP vector $$\bar C$$

void bfp_complex_s16_conj_nmacc(bfp_complex_s16_t *acc, const bfp_complex_s16_t *b, const bfp_complex_s16_t *c)

Multiply one complex 16-bit BFP vector by the complex conjugate of another element-wise and subtract the result from a third vector.

Operation Performed:

\begin{split}\begin{align*} & A_k \leftarrow A_k - (B_k \cdot C_k^*) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \\ & \qquad\text{and } (C_k)^* \text{ is the complex conjugate of } C_k \end{align*}\end{split}

Parameters
• acc[inout] Input/Output accumulator complex BFP vector $$\bar A$$

• b[in] Input complex BFP vector $$\bar B$$

• c[in] Input complex BFP vector $$\bar C$$

void bfp_complex_s16_real_scale(bfp_complex_s16_t *a, const bfp_complex_s16_t *b, const float alpha)

Multiply a complex 16-bit BFP vector by a real scalar.

Each complex output element $$A_k$$ of complex output BFP vector $$\bar A$$ is set to the complex product of $$B_k$$, the corresponding element of complex input BFP vector $$\bar B$$, and real scalar $$\alpha\cdot 2^{\alpha\_exp}$$, where $$\alpha$$ and $$\alpha\_exp$$ are the mantissa and exponent respectively of parameter alpha.

a and b must have been initialized (see bfp_complex_s16_init()), and must be the same length.

This operation can be performed safely in-place on b.

Operation Performed:

\begin{align*} \bar{A} \leftarrow \bar{B} \cdot \left( \alpha \cdot 2^{\alpha\_exp} \right) \end{align*}

Parameters
• a[out] Output complex BFP vector $$\bar A$$

• b[in] Input complex BFP vector $$\bar B$$

• alpha[in] Real scalar by which $$\bar B$$ is multiplied

void bfp_complex_s16_scale(bfp_complex_s16_t *a, const bfp_complex_s16_t *b, const float_complex_s16_t alpha)

Multiply a complex 16-bit BFP vector by a complex scalar.

Each complex output element $$A_k$$ of complex output BFP vector $$\bar A$$ is set to the complex product of $$B_k$$, the corresponding element of complex input BFP vector $$\bar B$$, and complex scalar $$\alpha\cdot 2^{\alpha\_exp}$$, where $$\alpha$$ and $$\alpha\_exp$$ are the complex mantissa and exponent respectively of parameter alpha.

a and b must have been initialized (see bfp_complex_s16_init()), and must be the same length.

This operation can be performed safely in-place on b.

Operation Performed:

\begin{align*} \bar{A} \leftarrow \bar{B} \cdot \left( \alpha \cdot 2^{\alpha\_exp} \right) \end{align*}

Parameters
• a[out] Output complex BFP vector $$\bar A$$

• b[in] Input complex BFP vector $$\bar B$$

• alpha[in] Complex scalar by which $$\bar B$$ is multiplied

void bfp_complex_s16_add(bfp_complex_s16_t *a, const bfp_complex_s16_t *b, const bfp_complex_s16_t *c)

Add one complex 16-bit BFP vector to another.

Each complex output element $$A_k$$ of complex output BFP vector $$\bar A$$ is set to the sum of $$B_k$$ and $$C_k$$, the corresponding elements of complex input BFP vectors $$\bar B$$ and $$\bar C$$ respectively.

a, b and c must have been initialized (see bfp_complex_s16_init()), and must be the same length.

This operation can be performed safely in-place on b or c.

Operation Performed:

\begin{align*} \bar{A} \leftarrow \bar{B} + \bar{C} \end{align*}

Parameters
• a[out] Output complex BFP vector $$\bar A$$

• b[in] Input complex BFP vector $$\bar B$$

• c[in] Input complex BFP vector $$\bar C$$

void bfp_complex_s16_add_scalar(bfp_complex_s16_t *a, const bfp_complex_s16_t *b, const float_complex_s16_t c)

Add a complex scalar to a complex 16-bit BFP vector.

Add a complex scalar $$c$$ to input BFP vector $$\bar B$$ and store the result in BFP vector $$\bar A$$.

a and b must have been initialized (see bfp_complex_s16_init()), and must be the same length.

This operation can be performed safely in-place on b.

Operation Performed:

\begin{align*} \bar{A} \leftarrow \bar{B} + c \end{align*}

Parameters
• a[out] Output complex BFP vector $$\bar A$$

• b[in] Input complex BFP vector $$\bar B$$

• c[in] Input complex scalar $$c$$

void bfp_complex_s16_sub(bfp_complex_s16_t *a, const bfp_complex_s16_t *b, const bfp_complex_s16_t *c)

Subtract one complex 16-bit BFP vector from another.

Each complex output element $$A_k$$ of complex output BFP vector $$\bar A$$ is set to the difference between $$B_k$$ and $$C_k$$, the corresponding elements of complex input BFP vectors $$\bar B$$ and $$\bar C$$ respectively.

a, b and c must have been initialized (see bfp_complex_s16_init()), and must be the same length.

This operation can be performed safely in-place on b or c.

Operation Performed:

\begin{align*} \bar{A} \leftarrow \bar{B} - \bar{C} \end{align*}

Parameters
• a[out] Output complex BFP vector $$\bar A$$

• b[in] Input complex BFP vector $$\bar B$$

• c[in] Input complex BFP vector $$\bar C$$

void bfp_complex_s16_to_complex_s32(bfp_complex_s32_t *a, const bfp_complex_s16_t *b)

Convert a complex 16-bit BFP vector to a complex 32-bit BFP vector.

Each complex 32-bit output element $$A_k$$ of complex output BFP vector $$\bar A$$ is set to the value of $$B_k$$, the corresponding element of complex 16-bit input BFP vector $$\bar B$$, sign-extended to 32 bits.

a and b must have been initialized (see bfp_complex_s32_init() and bfp_complex_s16_init()), and must be the same length.

Operation Performed:

\begin{split}\begin{align*} & A_k \overset{32-bit}{\longleftarrow} B_k \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{align*}\end{split}

Parameters
• a[out] Output complex 32-bit BFP vector $$\bar A$$

• b[in] Input complex 16-bit BFP vector $$\bar B$$

void bfp_complex_s16_squared_mag(bfp_s16_t *a, const bfp_complex_s16_t *b)

Get the squared magnitude of each element of a complex 16-bit BFP vector.

Each element $$A_k$$ of real output BFP vector $$\bar A$$ is set to the squared magnitude of $$B_k$$, the corresponding element of complex input BFP vector $$\bar B$$.

a and b must have been initialized (see bfp_s16_init() and bfp_complex_s16_init()), and must be the same length.

Operation Performed:

\begin{split}\begin{align*} & A_k \leftarrow B_k \cdot (B_k)^* \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \\ & \qquad\text{ and } (B_k)^* \text{ is the complex conjugate of } B_k \end{align*}\end{split}

Parameters
• a[out] Output real BFP vector $$\bar A$$

• b[in] Input complex BFP vector $$\bar B$$

void bfp_complex_s16_mag(bfp_s16_t *a, const bfp_complex_s16_t *b)

Get the magnitude of each element of a complex 16-bit BFP vector.

Each element $$A_k$$ of real output BFP vector $$\bar A$$ is set to the magnitude of $$B_k$$, the corresponding element of complex input BFP vector $$\bar B$$.

a and b must have been initialized (see bfp_s16_init() and bfp_complex_s16_init()), and must be the same length.

Operation Performed:

\begin{split}\begin{align*} & A_k \leftarrow \left| B_k \right| \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{align*}\end{split}

Parameters
• a[out] Output real BFP vector $$\bar A$$

• b[in] Input complex BFP vector $$\bar B$$

float_complex_s32_t bfp_complex_s16_sum(const bfp_complex_s16_t *b)

Get the sum of elements of a complex 16-bit BFP vector.

The elements of complex input BFP vector $$\bar B$$ are summed together. The result is a complex 32-bit floating-point scalar $$a$$, which is returned.

b must have been initialized (see bfp_complex_s16_init()).

Operation Performed:

\begin{split}\begin{align*} & a \leftarrow \sum_{k=0}^{N-1} \left( b_k \cdot 2^{B\_exp} \right) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{align*}\end{split}

Parameters

b[in] Input complex BFP vector $$\bar B$$

Returns

$$a$$, the sum of vector $$\bar B$$’s elements
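Because every element shares the exponent $$B\_exp$$, the sum reduces to adding the mantissas in a wider accumulator and attaching the same exponent to the result. The sketch below illustrates this. `complex_sum_sketch_t` and `complex_sum_sketch` are hypothetical stand-ins, not the library's `float_complex_s32_t` or its implementation, and the length limit is an assumption that keeps the 32-bit accumulators from overflowing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: 32-bit complex mantissa plus an exponent. */
typedef struct {
    int32_t re;
    int32_t im;
    int exp;
} complex_sum_sketch_t;

/* Sum split real/imaginary 16-bit mantissas that share the exponent exp. */
static complex_sum_sketch_t complex_sum_sketch(const int16_t *re, const int16_t *im,
                                               int exp, size_t n)
{
    assert(n <= 65535); /* assumption: 65535 * 2^15 still fits in int32_t */

    complex_sum_sketch_t s = {0, 0, exp};
    for (size_t k = 0; k < n; k++) {
        s.re += re[k];
        s.im += im[k];
    }
    return s;
}
```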

void bfp_complex_s16_conjugate(bfp_complex_s16_t *a, const bfp_complex_s16_t *b)

Get the complex conjugate of each element of a complex 16-bit BFP vector.

Each element $$A_k$$ of complex output BFP vector $$\bar A$$ is set to the complex conjugate of $$B_k$$, the corresponding element of complex input BFP vector $$\bar B$$.

Operation Performed:

\begin{split}\begin{align*} & A_k \leftarrow B_k^* \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \\ & \qquad\text{and } B_k^* \text{ is the complex conjugate of } B_k \end{align*}\end{split}

Parameters
• a[out] Output complex BFP vector $$\bar A$$

• b[in] Input complex BFP vector $$\bar B$$

float_s64_t bfp_complex_s16_energy(const bfp_complex_s16_t *b)

Get the energy of a complex 16-bit BFP vector.

The energy of a complex 16-bit BFP vector here is the sum of the squared magnitudes of each of the vector’s elements.

Operation Performed:

\begin{split}\begin{align*} & a \leftarrow \sum_{k=0}^{N-1} \left( \left|b_k \cdot 2^{B\_exp}\right|^2 \right) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{align*}\end{split}

Parameters

b[in] Input complex BFP vector $$\bar B$$

Returns

$$a$$, the energy of vector $$\bar B$$
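Since $$\left|b_k \cdot 2^{B\_exp}\right|^2 = \left|b_k\right|^2 \cdot 2^{2 \cdot B\_exp}$$, the energy can be accumulated on the mantissas alone, with the exponent doubled once at the end. The following is a minimal sketch of that idea. `energy_sketch_t` and `complex_energy_sketch` are hypothetical names rather than the library's `float_s64_t` or its implementation, and the result is not normalized.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: 64-bit mantissa plus an exponent. */
typedef struct {
    int64_t mant;
    int exp;
} energy_sketch_t;

/* Sum of squared magnitudes of split real/imaginary 16-bit mantissas. */
static energy_sketch_t complex_energy_sketch(const int16_t *re, const int16_t *im,
                                             int exp, size_t n)
{
    assert(re && im);

    energy_sketch_t e = {0, 2 * exp}; /* squaring doubles the exponent */
    for (size_t k = 0; k < n; k++) {
        int64_t r = re[k];
        int64_t i = im[k];
        e.mant += r * r + i * i;
    }
    return e;
}
```

With mantissas $$3 + 4i$$ and $$1 - 2i$$ and exponent -3, the model yields a mantissa of 30 with exponent -6.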

void bfp_s16_init(bfp_s16_t *a, int16_t *data, const exponent_t exp, const unsigned length, const unsigned calc_hr)

Initialize a 16-bit BFP vector.

This function initializes each of the fields of BFP vector a.

data points to the memory buffer used to store elements of the vector, so it must be at least length * 2 bytes long, and must begin at a word-aligned address.

exp is the exponent assigned to the BFP vector. The logical value associated with the kth element of the vector after initialization is $$data_k \cdot 2^{exp}$$.

If calc_hr is false, a->hr is initialized to 0. Otherwise, the headroom of the BFP vector is calculated and used to initialize a->hr.

Parameters
• a[out] BFP vector to initialize

• data[in] int16_t buffer used to back a

• exp[in] Exponent of BFP vector

• length[in] Number of elements in the BFP vector

• calc_hr[in] Boolean indicating whether the HR of the BFP vector should be calculated

void bfp_complex_s16_init(bfp_complex_s16_t *a, int16_t *real_data, int16_t *imag_data, const exponent_t exp, const unsigned length, const unsigned calc_hr)

Initialize a complex 16-bit BFP vector.

This function initializes each of the fields of BFP vector a.

Unlike complex 32-bit BFP vectors (bfp_complex_s32_t), for the sake of various optimizations the real and imaginary parts of elements’ mantissas are stored in separate memory buffers.

real_data points to the memory buffer used to store the real part of each mantissa. It must be at least length * 2 bytes long, and must begin at a word-aligned address.

imag_data points to the memory buffer used to store the imaginary part of each mantissa. It must be at least length * 2 bytes long, and must begin at a word-aligned address.

exp is the exponent assigned to the BFP vector. The logical value associated with the kth element of the vector after initialization is $$(real\_data_k + i \cdot imag\_data_k) \cdot 2^{exp}$$, where $$i$$ is the imaginary unit.

If calc_hr is false, a->hr is initialized to 0. Otherwise, the headroom of the BFP vector is calculated and used to initialize a->hr.

Parameters
• a[out] BFP vector to initialize

• real_data[in] int16_t buffer used to back the real part of a

• imag_data[in] int16_t buffer used to back the imaginary part of a

• exp[in] Exponent of BFP vector

• length[in] Number of elements in BFP vector

• calc_hr[in] Boolean indicating whether the HR of the BFP vector should be calculated
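The size and alignment requirements on real_data and imag_data can be met with statically allocated buffers. The sketch below assumes C11 and a 4-byte word, neither of which is stated above. `LEN`, `real_buf`, `imag_buf` and `is_word_aligned` are hypothetical names, and the call to the initialization function itself is not shown.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define LEN 5 /* hypothetical vector length, odd on purpose */

/* Separate mantissa buffers for the real and imaginary parts.
   Assumption: a word is 4 bytes, so each buffer is aligned to 4. */
static alignas(4) int16_t real_buf[LEN];
static alignas(4) int16_t imag_buf[LEN];

/* Each buffer must hold at least length * 2 bytes. */
static_assert(sizeof(real_buf) >= LEN * 2, "real buffer too small");
static_assert(sizeof(imag_buf) >= LEN * 2, "imaginary buffer too small");

/* Run-time check that a pointer sits on a 4-byte boundary. */
static int is_word_aligned(const void *p)
{
    return ((uintptr_t)p % 4u) == 0;
}
```

Without the explicit alignment, an `int16_t` array is only guaranteed 2-byte alignment, so an odd-length real buffer could leave the imaginary buffer that follows it misaligned.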

bfp_s16_t bfp_s16_alloc(const unsigned length)

Dynamically allocate a 16-bit BFP vector from the heap.

If allocation was unsuccessful, the data field of the returned vector will be NULL, and the length field will be zero. Otherwise, data will point to the allocated memory and the length field will be the user-specified length. The length argument must not be zero.

Neither the BFP exponent, headroom, nor the elements of the allocated mantissa vector are set by this function. To set the BFP vector elements to a known value, use bfp_s16_set() on the returned BFP vector.

BFP vectors allocated using this function must be deallocated using bfp_s16_dealloc() to avoid a memory leak.

To initialize a BFP vector using static memory allocation, use bfp_s16_init() instead.

Note

Dynamic allocation of BFP vectors relies on allocation from the heap, and offers no guarantees about the execution time. Use of this function in any time-critical section of code is highly discouraged.

Parameters

length[in] The length of the BFP vector to be allocated (in elements)

Returns

16-bit BFP vector

bfp_complex_s16_t bfp_complex_s16_alloc(const unsigned length)

Dynamically allocate a complex 16-bit BFP vector from the heap.

If allocation was unsuccessful, the real and imag fields of the returned vector will be NULL, and the length field will be zero. Otherwise, real and imag will point to the allocated memory and the length field will be the user-specified length. The length argument must not be zero.

This function allocates a single block of memory for both the real and imaginary parts of the BFP vector. Because all BFP functions require the mantissa buffers to begin at a word-aligned address, if length is odd, this function will allocate an extra int16_t element for the buffer.

Neither the BFP exponent, headroom, nor the elements of the allocated mantissa vector are set by this function. To set the BFP vector elements to a known value, use bfp_complex_s16_set() on the returned BFP vector.

BFP vectors allocated using this function must be deallocated using bfp_complex_s16_dealloc() to avoid a memory leak.

To initialize a BFP vector using static memory allocation, use bfp_complex_s16_init() instead.

Note

Dynamic allocation of BFP vectors relies on allocation from the heap, and offers no guarantees about the execution time. Use of this function in any time-critical section of code is highly discouraged.

Parameters

length[in] The length of the BFP vector to be allocated (in elements)

Returns

Complex 16-bit BFP vector

void bfp_s16_dealloc(bfp_s16_t *vector)

Deallocate a 16-bit BFP vector allocated by bfp_s16_alloc().

Use this function to free the heap memory allocated by bfp_s16_alloc().

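BFP vectors whose mantissa buffer was (successfully) dynamically allocated have a flag set which indicates as much. This function can safely be called on any bfp_s16_t which has not had its flags or data manually manipulated, including:

• A bfp_s16_t resulting from a successful call to bfp_s16_alloc()

• A bfp_s16_t resulting from an unsuccessful call to bfp_s16_alloc()

• A bfp_s16_t initialized with a call to bfp_s16_init()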

In the latter two cases, this function does nothing. In the former, the data, length and flags fields of vector are cleared to zero.

Parameters

vector[in] BFP vector to be deallocated.

void bfp_complex_s16_dealloc(bfp_complex_s16_t *vector)

Deallocate a complex 16-bit BFP vector allocated by bfp_complex_s16_alloc().

Use this function to free the heap memory allocated by bfp_complex_s16_alloc().

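BFP vectors whose mantissa buffer was (successfully) dynamically allocated have a flag set which indicates as much. This function can safely be called on any bfp_complex_s16_t which has not had its flags or real manually manipulated, including:

• A bfp_complex_s16_t resulting from a successful call to bfp_complex_s16_alloc()

• A bfp_complex_s16_t resulting from an unsuccessful call to bfp_complex_s16_alloc()

• A bfp_complex_s16_t initialized with a call to bfp_complex_s16_init()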

In the latter two cases, this function does nothing. In the former, the real, imag, length and flags fields of vector are cleared to zero.

Parameters

vector[in] BFP vector to be deallocated.

void bfp_s16_set(bfp_s16_t *a, const int16_t b, const exponent_t exp)

Set all elements of a 16-bit BFP vector to a specified value.

The exponent of a is set to exp, and each element’s mantissa is set to b.

After performing this operation, all elements will represent the same value $$b \cdot 2^{exp}$$.

a must have been initialized (see bfp_s16_init()).

Parameters
• a[out] BFP vector to update

• b[in] New value each mantissa is set to

• exp[in] New exponent for the BFP vector

Get the headroom of a 16-bit BFP vector.

The headroom of a vector is the number of bits its elements can be left-shifted without losing any information. It conveys information about the range of values that vector may contain, which is useful for determining how best to preserve precision in potentially lossy block floating-point operations.

In a BFP context, headroom applies to mantissas only, not exponents.

In particular, if the 16-bit mantissa vector $$\bar x$$ has $$N$$ bits of headroom, then for any element $$x_k$$ of $$\bar x$$

$$-2^{15-N} \le x_k \lt 2^{15-N}$$

And for any element $$X_k = x_k \cdot 2^{x\_exp}$$ of a BFP vector $$\bar X$$

$$-2^{15 + x\_exp - N} \le X_k \lt 2^{15 + x\_exp - N}$$

This function determines the headroom of b, updates b->hr with that value, and then returns b->hr.

Parameters

b – BFP vector to get the headroom of

Returns

Headroom of BFP vector b

void bfp_s16_use_exponent(bfp_s16_t *a, const exponent_t exp)

Modify a 16-bit BFP vector to use a specified exponent.

This function forces BFP vector $$\bar A$$ to use a specified exponent. The mantissa vector $$\bar a$$ will be bit-shifted left or right to compensate for the changed exponent.

This function can be used, for example, before calling a fixed-point arithmetic function to ensure the underlying mantissa vector has the needed Q-format. As another example, this may be useful when communicating with peripheral devices (e.g. via I2S) that require sample data to be in a specified format.

Note that this sets the current encoding, and does not fix the exponent permanently (i.e. subsequent operations may change the exponent as usual).

If the required fixed-point Q-format is QX.Y, where Y is the number of fractional bits in the resulting mantissas, then the associated exponent (and value for parameter exp) is -Y.

a points to input BFP vector $$\bar A$$, with mantissa vector $$\bar a$$ and exponent $$a\_exp$$. a is updated in place to produce resulting BFP vector $$\bar{\tilde{A}}$$ with mantissa vector $$\bar{\tilde{a}}$$ and exponent $$\tilde{a}\_exp$$.

exp is $$\tilde{a}\_exp$$, the required exponent. $$\Delta{}p = \tilde{a}\_exp - a\_exp$$ is the required change in exponent.

If $$\Delta{}p = 0$$, the BFP vector is left unmodified.

If $$\Delta{}p > 0$$, the required exponent is larger than the current exponent and an arithmetic right-shift of $$\Delta{}p$$ bits is applied to the mantissas $$\bar a$$. When applying a right-shift, precision may be lost by discarding the $$\Delta{}p$$ least significant bits.

If $$\Delta{}p < 0$$, the required exponent is smaller than the current exponent and a left-shift of $$-\Delta{}p$$ bits is applied to the mantissas $$\bar a$$. When left-shifting, saturation logic will be applied such that any element that can’t be represented exactly with the new exponent will saturate to the 16-bit saturation bounds.

The exponent and headroom of a are updated by this function.

Operation Performed:

\begin{split}\begin{align*} & \Delta{}p = \tilde{a}\_exp - a\_exp \\ & \tilde{a_k} \leftarrow sat_{16}( a_k \cdot 2^{-\Delta{}p} ) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{A} \text{ (in elements) } \end{align*}\end{split}

Parameters
• a[inout] Input BFP vector $$\bar A$$ / Output BFP vector $$\bar{\tilde{A}}$$

• exp[in] The required exponent, $$\tilde{a}\_exp$$
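The relationship between a QX.Y format and the exponent passed as exp can be checked numerically. In the sketch below, `q_format_exponent` and `mantissa_value` are hypothetical helpers written for this illustration: the first returns -Y for a given number of fractional bits, the second evaluates $$a_k \cdot 2^{exp}$$ as a `double` without relying on `<math.h>`.

```c
#include <assert.h>
#include <stdint.h>

/* Exponent to request for a QX.Y format with y fractional bits. */
static int q_format_exponent(unsigned y)
{
    assert(y <= 15); /* a 16-bit mantissa has at most 15 fractional bits */
    return -(int)y;
}

/* Logical value mant * 2^exp of a single mantissa. */
static double mantissa_value(int16_t mant, int exp)
{
    double v = mant;
    for (; exp > 0; exp--) v *= 2.0;
    for (; exp < 0; exp++) v /= 2.0;
    return v;
}
```

In Q1.14, for instance, the exponent is -14 and a mantissa of 8192 represents 0.5.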

void bfp_s16_shl(bfp_s16_t *a, const bfp_s16_t *b, const left_shift_t b_shl)

Apply a left-shift to the mantissas of a 16-bit BFP vector.

Each mantissa of input BFP vector $$\bar B$$ is left-shifted b_shl bits and stored in the corresponding element of output BFP vector $$\bar A$$.

This operation can be used to add or remove headroom from a BFP vector.

b_shl is the number of bits that each mantissa will be left-shifted. This shift is signed and arithmetic, so negative values for b_shl will right-shift the mantissas.

a and b must have been initialized (see bfp_s16_init()), and must be the same length.

This operation can be performed safely in-place on b.

Note that this operation bypasses the logic protecting the caller from saturation or underflows. Output values saturate to the symmetric 16-bit range ( $$-2^{15} \lt a_k \lt 2^{15}$$ ). To avoid saturation, b_shl should be no greater than the headroom of b (b->hr).

Operation Performed:

\begin{split}\begin{align*} & a_k \leftarrow sat_{16}( \lfloor b_k \cdot 2^{b\_shl} \rfloor ) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \\ & \qquad\text{ and } b_k \text{ and } a_k \text{ are the } k\text{th mantissas from } \bar{B}\text{ and } \bar{A}\text{ respectively} \end{align*}\end{split}

Parameters
• a[out] Output BFP vector $$\bar A$$

• b[in] Input BFP vector $$\bar B$$

• b_shl[in] Signed arithmetic left-shift to be applied to mantissas of $$\bar B$$.

void bfp_s16_add(bfp_s16_t *a, const bfp_s16_t *b, const bfp_s16_t *c)

Add two 16-bit BFP vectors together.

Add together two input BFP vectors $$\bar B$$ and $$\bar C$$ and store the result in BFP vector $$\bar A$$.

a, b and c must have been initialized (see bfp_s16_init()), and must be the same length.

This operation can be performed safely in-place on b or c.

Operation Performed:

\begin{align*} \bar{A} \leftarrow \bar{B} + \bar{C} \end{align*}

Parameters
• a[out] Output BFP vector $$\bar A$$

• b[in] Input BFP vector $$\bar B$$

• c[in] Input BFP vector $$\bar C$$
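
When the two inputs have different exponents, their mantissas cannot simply be added. The sketch below shows one way to think about a single output element; it is not the library's implementation. model_add_element is a hypothetical helper that aligns both mantissas to the smaller exponent and keeps the sum in 32 bits, whereas the library must also fit the result back into 16-bit mantissas with a shared exponent.

```c
#include <stdint.h>

// Hypothetical model of one element of A = B + C.
// Both mantissas are brought to the smaller of the two exponents, so no
// bits are discarded, and added in 32-bit arithmetic.
// Assumption: |b_exp - c_exp| <= 15, so the aligned mantissas fit in 32 bits.
static int32_t model_add_element(int16_t b, int b_exp,
                                 int16_t c, int c_exp,
                                 int *a_exp)
{
    int e = (b_exp < c_exp) ? b_exp : c_exp;
    int32_t bm = (int32_t)b * ((int32_t)1 << (b_exp - e));
    int32_t cm = (int32_t)c * ((int32_t)1 << (c_exp - e));
    *a_exp = e;          // exponent associated with the returned mantissa
    return bm + cm;
}
```

For instance, adding 3 (mantissa 3, exponent 0) to 0.5 (mantissa 1, exponent -1) gives mantissa 7 with exponent -1, which represents 3.5.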

void bfp_s16_add_scalar(bfp_s16_t *a, const bfp_s16_t *b, const float c)

Add a scalar to a 16-bit BFP vector.

Add a real scalar $$c$$ to input BFP vector $$\bar B$$ and store the result in BFP vector $$\bar A$$.

a and b must have been initialized (see bfp_s16_init()), and must be the same length.

This operation can be performed safely in-place on b.

Operation Performed:

\begin{align*} \bar{A} \leftarrow \bar{B} + c \end{align*}

Parameters
• a[out] Output BFP vector $$\bar A$$

• b[in] Input BFP vector $$\bar B$$

• c[in] Input scalar $$c$$

void bfp_s16_sub(bfp_s16_t *a, const bfp_s16_t *b, const bfp_s16_t *c)

Subtract one 16-bit BFP vector from another.

Subtract input BFP vector $$\bar C$$ from input BFP vector $$\bar B$$ and store the result in BFP vector $$\bar A$$.

a, b and c must have been initialized (see bfp_s16_init()), and must be the same length.

This operation can be performed safely in-place on b or c.

Operation Performed:

\begin{align*} \bar{A} \leftarrow \bar{B} - \bar{C} \end{align*}

Parameters
• a[out] Output BFP vector $$\bar A$$

• b[in] Input BFP vector $$\bar B$$

• c[in] Input BFP vector $$\bar C$$

void bfp_s16_mul(bfp_s16_t *a, const bfp_s16_t *b, const bfp_s16_t *c)

Multiply one 16-bit BFP vector by another element-wise.

Multiply each element of input BFP vector $$\bar B$$ by the corresponding element of input BFP vector $$\bar C$$ and store the results in output BFP vector $$\bar A$$.

a, b and c must have been initialized (see bfp_s16_init()), and must be the same length.

This operation can be performed safely in-place on b or c.

Operation Performed:

\begin{split}\begin{align*} & A_k \leftarrow B_k \cdot C_k \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \end{align*}\end{split}

Parameters
• a[out] Output BFP vector $$\bar A$$

• b[in] Input BFP vector $$\bar B$$

• c[in] Input BFP vector $$\bar C$$

void bfp_s16_macc(bfp_s16_t *acc, const bfp_s16_t *b, const bfp_s16_t *c)

Multiply one 16-bit BFP vector by another element-wise and add the result to a third vector.

Operation Performed:

\begin{split}\begin{align*} & A_k \leftarrow A_k + B_k \cdot C_k \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \end{align*}\end{split}

Parameters
• acc[inout] Input/Output accumulator BFP vector $$\bar A$$

• b[in] Input BFP vector $$\bar B$$

• c[in] Input BFP vector $$\bar C$$

void bfp_s16_nmacc(bfp_s16_t *acc, const bfp_s16_t *b, const bfp_s16_t *c)

Multiply one 16-bit BFP vector by another element-wise and subtract the result from a third vector.

Operation Performed:

\begin{split}\begin{align*} & A_k \leftarrow A_k - B_k \cdot C_k \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \end{align*}\end{split}

Parameters
• acc[inout] Input/Output accumulator BFP vector $$\bar A$$

• b[in] Input BFP vector $$\bar B$$

• c[in] Input BFP vector $$\bar C$$

void bfp_s16_scale(bfp_s16_t *a, const bfp_s16_t *b, const float alpha)

Multiply a 16-bit BFP vector by a scalar.

Multiply input BFP vector $$\bar B$$ by scalar $$\alpha \cdot 2^{\alpha\_exp}$$ and store the result in output BFP vector $$\bar A$$.

a and b must have been initialized (see bfp_s16_init()), and must be the same length.

alpha represents the scalar $$\alpha \cdot 2^{\alpha\_exp}$$, where $$\alpha$$ is alpha.mant and $$\alpha\_exp$$ is alpha.exp.

This operation can be performed safely in-place on b.

Operation Performed:

\begin{align*} \bar{A} \leftarrow \bar{B} \cdot \left(\alpha \cdot 2^{\alpha\_exp}\right) \end{align*}

Parameters
• a[out] Output BFP vector $$\bar A$$

• b[in] Input BFP vector $$\bar B$$

• alpha[in] Scalar by which $$\bar B$$ is multiplied

void bfp_s16_abs(bfp_s16_t *a, const bfp_s16_t *b)

Get the absolute values of elements of a 16-bit BFP vector.

Compute the absolute value of each element $$B_k$$ of input BFP vector $$\bar B$$ and store the results in output BFP vector $$\bar A$$.

a and b must have been initialized (see bfp_s16_init()), and must be the same length.

This operation can be performed safely in-place on b.

Operation Performed:

\begin{split}\begin{align*} A_k \leftarrow \left| B_k \right| \\ \qquad\text{for } k \in 0\ ...\ (N-1) \\ \qquad\text{where } N \text{ is the length of } \bar{B} \end{align*}\end{split}

Parameters
• a[out] Output BFP vector $$\bar A$$

• b[in] Input BFP vector $$\bar B$$

float_s32_t bfp_s16_sum(const bfp_s16_t *b)

Sum the elements of a 16-bit BFP vector.

Sum the elements of input BFP vector $$\bar B$$ to get a result $$A = a \cdot 2^{a\_exp}$$, which is returned. The returned value has a 32-bit mantissa.

b must have been initialized (see bfp_s16_init()).

Operation Performed:

\begin{split}\begin{align*} & A \leftarrow \sum_{k=0}^{N-1} \left( B_k \right) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{align*}\end{split}

Parameters

b[in] Input BFP vector $$\bar B$$

Returns

$$A$$, the sum of elements of $$\bar B$$

float_s64_t bfp_s16_dot(const bfp_s16_t *b, const bfp_s16_t *c)

Compute the inner product of two 16-bit BFP vectors.

Adds together the element-wise products of input BFP vectors $$\bar B$$ and $$\bar C$$ for a result $$A = a \cdot 2^{a\_exp}$$, where $$a$$ is the 64-bit mantissa of the result and $$a\_exp$$ is its associated exponent. $$A$$ is returned.

b and c must have been initialized (see bfp_s16_init()), and must be the same length.

Operation Performed:

\begin{split}\begin{align*} & a \cdot 2^{a\_exp} \leftarrow \sum_{k=0}^{N-1} \left( B_k \cdot C_k \right) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \end{align*}\end{split}

Parameters
• b[in] Input BFP vector $$\bar B$$

• c[in] Input BFP vector $$\bar C$$

Returns

$$A$$, the inner product of vectors $$\bar B$$ and $$\bar C$$
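
The formula above can be read as a sum of mantissa products carrying the sum of the two input exponents. The following is a minimal sketch under that reading, not the library's implementation: model_s64_t and model_dot_s16 are hypothetical names, and the library may return a different (equivalent) mantissa and exponent pair for the same value.

```c
#include <stdint.h>
#include <stddef.h>

// Hypothetical result type for this sketch: value = mant * 2^exp.
typedef struct {
    int64_t mant;
    int exp;
} model_s64_t;

// Hypothetical model of the inner product of two mantissa vectors that
// have exponents b_exp and c_exp respectively.
static model_s64_t model_dot_s16(const int16_t *b, int b_exp,
                                 const int16_t *c, int c_exp,
                                 size_t n)
{
    // Each product b[k] * c[k] has exponent b_exp + c_exp.
    model_s64_t a = { 0, b_exp + c_exp };
    for (size_t k = 0; k < n; k++) {
        a.mant += (int64_t)b[k] * c[k];
    }
    return a;
}
```

With mantissas {1, 2, 3} at exponent -2 and {4, 5, 6} at exponent -3, the model yields mantissa 32 and exponent -5.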

void bfp_s16_clip(bfp_s16_t *a, const bfp_s16_t *b, const int16_t lower_bound, const int16_t upper_bound, const int bound_exp)

Clamp the elements of a 16-bit BFP vector to a specified range.

Each element $$A_k$$ of output BFP vector $$\bar A$$ is set to the corresponding element $$B_k$$ of input BFP vector $$\bar B$$ if it is in the range $$[ L \cdot 2^{bound\_exp}, U \cdot 2^{bound\_exp} ]$$, where $$L$$ is lower_bound and $$U$$ is upper_bound; otherwise it is set to the nearest value inside that range.

a and b must have been initialized (see bfp_s16_init()), and must be the same length.

This operation can be performed safely in-place on b.

Operation Performed:

\begin{split}\begin{align*} & A_k \leftarrow \begin{cases} L \cdot 2^{bound\_exp} & B_k \lt L \cdot 2^{bound\_exp} \\ U \cdot 2^{bound\_exp} & B_k \gt U \cdot 2^{bound\_exp} \\ B_k & \text{otherwise} \end{cases} \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{align*}\end{split}

Parameters
• a[out] Output BFP vector $$\bar A$$

• b[in] Input BFP vector $$\bar B$$

• lower_bound[in] Mantissa of the lower clipping bound, $$L$$

• upper_bound[in] Mantissa of the upper clipping bound, $$U$$

• bound_exp[in] Shared exponent of the clipping bounds
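
Because the bounds are given as mantissas with a shared exponent, the clipping range is expressed in real values rather than raw mantissas. The sketch below models this for a single element using doubles. It is an illustration only: model_scale2 and model_clip are hypothetical helpers, and the library operates on mantissas rather than floating-point values.

```c
#include <stdint.h>

// Hypothetical helper: returns m * 2^e without relying on <math.h>.
static double model_scale2(int32_t m, int e)
{
    double v = (double)m;
    for (; e > 0; e--) v *= 2.0;
    for (; e < 0; e++) v *= 0.5;
    return v;
}

// Hypothetical model of clipping one real value b to the range
// [lower_bound * 2^bound_exp, upper_bound * 2^bound_exp].
static double model_clip(double b, int16_t lower_bound,
                         int16_t upper_bound, int bound_exp)
{
    double lo = model_scale2(lower_bound, bound_exp);
    double hi = model_scale2(upper_bound, bound_exp);
    if (b < lo) return lo;
    if (b > hi) return hi;
    return b;
}
```

With lower_bound = -4, upper_bound = 4 and bound_exp = 1, the range is [-8, 8], so 10.0 clips to 8.0 and 3.0 is unchanged.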

void bfp_s16_rect(bfp_s16_t *a, const bfp_s16_t *b)

Rectify a 16-bit BFP vector.

Each element $$A_k$$ of output BFP vector $$\bar A$$ is set to the corresponding element $$B_k$$ of input BFP vector $$\bar B$$ if it is non-negative, otherwise it is set to $$0$$.

a and b must have been initialized (see bfp_s16_init()), and must be the same length.

This operation can be performed safely in-place on b.

Operation Performed:

\begin{split}\begin{align*} & A_k \leftarrow \begin{cases} 0 & B_k \lt 0 \\ B_k & \text{otherwise} \end{cases} \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{align*}\end{split}

Parameters
• a[out] Output BFP vector $$\bar A$$

• b[in] Input BFP vector $$\bar B$$

void bfp_s16_to_s32(bfp_s32_t *a, const bfp_s16_t *b)

Convert a 16-bit BFP vector into a 32-bit BFP vector.

Increases the bit-depth of each 16-bit element $$B_k$$ of input BFP vector $$\bar B$$ to 32 bits, and stores the 32-bit result in the corresponding element $$A_k$$ of output BFP vector $$\bar A$$.

a and b must have been initialized (see bfp_s16_init() and bfp_s32_init()), and must be the same length.

Operation Performed:

\begin{split}\begin{align*} & A_k \overset{32-bit}{\longleftarrow} B_k \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{align*}\end{split}

Parameters
• a[out] Output BFP vector $$\bar A$$

• b[in] Input BFP vector $$\bar B$$

void bfp_s16_sqrt(bfp_s16_t *a, const bfp_s16_t *b)

Get the square roots of elements of a 16-bit BFP vector.

Computes the square root of each element $$B_k$$ of input BFP vector $$\bar B$$ and stores the results in output BFP vector $$\bar A$$.

a and b must have been initialized (see bfp_s16_init()), and must be the same length.

This operation can be performed safely in-place on b.

Operation Performed:

\begin{split}\begin{align*} & A_k \leftarrow \sqrt{B_k} \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{align*}\end{split}

Notes

• Only the XS3_BFP_SQRT_DEPTH_S16 (see xs3_math_conf.h) most significant bits of each result are computed.

• This function only computes real roots. For any $$B_k \lt 0$$, the corresponding output $$A_k$$ is set to $$0$$.

Parameters
• a[out] Output BFP vector $$\bar A$$

• b[in] Input BFP vector $$\bar B$$
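
The two notes for this function (a limited number of result bits, and zero for negative inputs) can be illustrated with a digit-by-digit integer square root that stops early. This is a sketch under assumptions, not the library's algorithm: model_sqrt_bits is a hypothetical helper, and its depth argument only plays the role that XS3_BFP_SQRT_DEPTH_S16 plays in the note.

```c
#include <stdint.h>

// Hypothetical model: digit-by-digit square root of a 32-bit integer that
// resolves only the `depth` most significant of the 16 possible result bits.
// Negative (and zero) inputs yield 0.
static uint16_t model_sqrt_bits(int32_t x, unsigned depth)
{
    if (x <= 0) return 0;
    uint32_t v = (uint32_t)x;
    uint32_t result = 0;
    for (int bit = 15; bit >= 0 && depth > 0; bit--, depth--) {
        // Tentatively set this bit; keep it if the square does not exceed v.
        uint32_t trial = result | ((uint32_t)1 << bit);
        if ((uint64_t)trial * trial <= v) result = trial;
    }
    return (uint16_t)result;
}
```

In this model, model_sqrt_bits(144, 16) gives 12, while a depth of 13 leaves the three lowest result bits unresolved and gives 8. This suggests why a reduced depth trades precision for less work per element.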

void bfp_s16_inverse(bfp_s16_t *a, const bfp_s16_t *b)

Get the inverses of elements of a 16-bit BFP vector.

Computes the multiplicative inverse (reciprocal) of each element $$B_k$$ of input BFP vector $$\bar B$$ and stores the results in output BFP vector $$\bar A$$.

a and b must have been initialized (see bfp_s16_init()), and must be the same length.

This operation can be performed safely in-place on b.

Operation Performed:

\begin{split}\begin{align*} & A_k \leftarrow B_k^{-1} \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{align*}\end{split}

Parameters
• a[out] Output BFP vector $$\bar A$$

• b[in] Input BFP vector $$\bar B$$

float_s32_t bfp_s16_abs_sum(const bfp_s16_t *b)

Sum the absolute values of elements of a 16-bit BFP vector.

Sum the absolute values of elements of input BFP vector $$\bar B$$ for a result $$A = a \cdot 2^{a\_exp}$$, where $$a$$ is a 32-bit mantissa and $$a\_exp$$ is its associated exponent. $$A$$ is returned.

b must have been initialized (see bfp_s16_init()).

Operation Performed:

\begin{split}\begin{align*} & A \leftarrow \sum_{k=0}^{N-1} \left| B_k \right| \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{align*}\end{split}

Parameters

b[in] Input BFP vector $$\bar B$$

Returns

$$A$$, the sum of absolute values of elements of $$\bar B$$

float bfp_s16_mean(const bfp_s16_t *b)

Get the mean value of a 16-bit BFP vector.

Computes $$A = a \cdot 2^{a\_exp}$$, the mean value of elements of input BFP vector $$\bar B$$, where $$a$$ is the 16-bit mantissa of the result, and $$a\_exp$$ is its associated exponent. $$A$$ is returned.

b must have been initialized (see bfp_s16_init()).

Operation Performed:

\begin{split}\begin{align*} & A \leftarrow \frac{1}{N} \sum_{k=0}^{N-1} \left( B_k \right) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{align*}\end{split}

Parameters

b[in] Input BFP vector $$\bar B$$

Returns

$$A$$, the mean value of $$\bar B$$’s elements

float_s64_t bfp_s16_energy(const bfp_s16_t *b)

Get the energy (sum of squares of elements) of a 16-bit BFP vector.

Computes $$A = a \cdot 2^{a\_exp}$$, the sum of squares of elements of input BFP vector $$\bar B$$, where $$a$$ is the 64-bit mantissa of the result, and $$a\_exp$$ is its associated exponent. $$A$$ is returned.

b must have been initialized (see bfp_s16_init()).

Operation Performed:

\begin{split}\begin{align*} & A \leftarrow \sum_{k=0}^{N-1} \left( B_k^2 \right) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{align*}\end{split}

Parameters

b[in] Input BFP vector $$\bar B$$

Returns

$$A$$, $$\bar B$$’s energy

float_s32_t bfp_s16_rms(const bfp_s16_t *b)

Get the RMS value of elements of a 16-bit BFP vector.

Computes $$A = a \cdot 2^{a\_exp}$$, the RMS value of elements of input BFP vector $$\bar B$$, where $$a$$ is the 32-bit mantissa of the result, and $$a\_exp$$ is its associated exponent. $$A$$ is returned.

The RMS (root-mean-square) value of a vector is the square root of the mean of the squares of the vector’s elements.

b must have been initialized (see bfp_s16_init()).

Operation Performed:

\begin{split}\begin{align*} & A \leftarrow \sqrt{\frac{1}{N}\sum_{k=0}^{N-1} \left( B_k^2 \right) } \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{align*}\end{split}

Parameters

b[in] Input BFP vector $$\bar B$$

Returns

$$A$$, the RMS value of $$\bar B$$’s elements
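
As the formula for this function shows, the RMS divides the sum of squares (the quantity returned by bfp_s16_energy()) by $$N$$ before taking the root. The sketch below makes that relationship concrete on raw mantissas. Both helpers are hypothetical and not part of the library, and the final square root is deliberately left out.

```c
#include <stdint.h>
#include <stddef.h>

// Hypothetical model: sum of squared mantissas.
// If the mantissas have exponent b_exp, this sum has exponent 2 * b_exp.
static int64_t model_energy_mant(const int16_t *b, size_t n)
{
    int64_t acc = 0;
    for (size_t k = 0; k < n; k++) {
        acc += (int64_t)b[k] * b[k];
    }
    return acc;
}

// Hypothetical model: mean of squared mantissas (exponent 2 * b_exp).
// The RMS mantissa is the square root of this value, with exponent b_exp.
// Assumes n >= 1.
static double model_mean_square_mant(const int16_t *b, size_t n)
{
    return (double)model_energy_mant(b, n) / (double)n;
}
```

For mantissas {3, 4}, the sum of squares is 25 and the mean square is 12.5, so the RMS mantissa is the square root of 12.5 (about 3.54), not the square root of 25.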

float bfp_s16_max(const bfp_s16_t *b)

Get the maximum value of a 16-bit BFP vector.

Finds $$A$$, the maximum value among elements of input BFP vector $$\bar B$$. $$A$$ is returned by this function.

b must have been initialized (see bfp_s16_init()).

Operation Performed:

\begin{split}\begin{align*} & A \leftarrow max\left(B_0, B_1, ..., B_{N-1} \right) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{align*}\end{split}

Parameters

b[in] Input BFP vector $$\bar B$$

Returns

$$A$$, the value of $$\bar B$$’s maximum element

void bfp_s16_max_elementwise(bfp_s16_t *a, const bfp_s16_t *b, const bfp_s16_t *c)

Get the element-wise maximum of two 16-bit BFP vectors.

Each element of output vector $$\bar A$$ is set to the maximum of the corresponding elements in the input vectors $$\bar B$$ and $$\bar C$$.

a, b and c must have been initialized (see bfp_s16_init()), and must be the same length.

This operation can be performed safely in-place on b, but not on c.

Operation Performed:

\begin{split}\begin{align*} & A_k \leftarrow max(B_k, C_k) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \end{align*}\end{split}

Parameters
• a[out] Output BFP vector $$\bar A$$

• b[in] Input BFP vector $$\bar B$$

• c[in] Input BFP vector $$\bar C$$

float bfp_s16_min(const bfp_s16_t *b)

Get the minimum value of a 16-bit BFP vector.

Finds $$A$$, the minimum value among elements of input BFP vector $$\bar B$$. $$A$$ is returned by this function.

b must have been initialized (see bfp_s16_init()).

Operation Performed:

\begin{split}\begin{align*} & A \leftarrow min\left(B_0, B_1, ..., B_{N-1} \right) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{align*}\end{split}

Parameters

b[in] Input BFP vector $$\bar B$$

Returns

$$A$$, the value of $$\bar B$$’s minimum element

void bfp_s16_min_elementwise(bfp_s16_t *a, const bfp_s16_t *b, const bfp_s16_t *c)

Get the element-wise minimum of two 16-bit BFP vectors.

Each element of output vector $$\bar A$$ is set to the minimum of the corresponding elements in the input vectors $$\bar B$$ and $$\bar C$$.

a, b and c must have been initialized (see bfp_s16_init()), and must be the same length.

This operation can be performed safely in-place on b, but not on c.

Operation Performed:

\begin{split}\begin{align*} & A_k \leftarrow min(B_k, C_k) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \end{align*}\end{split}

Parameters
• a[out] Output BFP vector $$\bar A$$

• b[in] Input BFP vector $$\bar B$$

• c[in] Input BFP vector $$\bar C$$

unsigned bfp_s16_argmax(const bfp_s16_t *b)

Get the index of the maximum value of a 16-bit BFP vector.

Finds $$a$$, the index of the maximum value among the elements of input BFP vector $$\bar B$$. $$a$$ is returned by this function.

If i is the value returned, then the maximum value in $$\bar B$$ is ldexp(b->data[i], b->exp).

Operation Performed:

\begin{split}\begin{align*} & a \leftarrow argmax_k\left(b_k\right) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{align*}\end{split}

Notes

• If there is a tie for maximum value, the lowest tying index is returned.

Parameters

b[in] Input BFP vector $$\bar B$$

Returns

$$a$$, the index of the maximum value from $$\bar B$$
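
Since all elements share one exponent, the largest element is simply the one with the largest mantissa. The tie-breaking rule in the notes for this function can be modelled as follows. This is a sketch on a raw mantissa array: model_argmax_s16 is a hypothetical helper, not the library function.

```c
#include <stdint.h>
#include <stddef.h>

// Hypothetical model: index of the largest mantissa.
// The strict '>' comparison means the lowest index wins a tie.
// Assumes n >= 1.
static unsigned model_argmax_s16(const int16_t *data, size_t n)
{
    unsigned best = 0;
    for (size_t k = 1; k < n; k++) {
        if (data[k] > data[best]) {
            best = (unsigned)k;
        }
    }
    return best;
}
```

For mantissas {1, 5, 5, 2} the model returns index 1, the first of the two tying maxima.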

unsigned bfp_s16_argmin(const bfp_s16_t *b)

Get the index of the minimum value of a 16-bit BFP vector.

Finds $$a$$, the index of the minimum value among the elements of input BFP vector $$\bar B$$. $$a$$ is returned by this function.

If i is the value returned, then the minimum value in $$\bar B$$ is ldexp(b->data[i], b->exp).

Operation Performed:

\begin{split}\begin{align*} & a \leftarrow argmin_k\left(b_k\right) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{align*}\end{split}

Notes

• If there is a tie for minimum value, the lowest tying index is returned.

Parameters

b[in] Input BFP vector $$\bar B$$

Returns

$$a$$, the index of the minimum value from $$\bar B$$