32-Bit Block Floating-Point Functions
-
void bfp_complex_s32_set(bfp_complex_s32_t *a, const complex_s32_t b, const exponent_t exp)
Set all elements of a complex 32-bit BFP vector to a specified value.
The exponent of `a` is set to `exp`, and each element’s mantissa is set to `b`.
After performing this operation, all elements will represent the same value \(b \cdot 2^{exp}\).
`a` must have been initialized (see bfp_complex_s32_init()).
- Parameters
a – [out] BFP vector to update
b – [in] New value each complex mantissa is set to
exp – [in] New exponent for the BFP vector
-
void bfp_complex_s32_use_exponent(bfp_complex_s32_t *a, const exponent_t exp)
Modify a complex 32-bit BFP vector to use a specified exponent.
This function forces complex BFP vector \(\bar A\) to use a specified exponent. The mantissa vector \(\bar a\) will be bit-shifted left or right to compensate for the changed exponent.
This function can be used, for example, before calling a fixed-point arithmetic function to ensure the underlying mantissa vector has the needed Q-format. As another example, this may be useful when communicating with peripheral devices (e.g. via I2S) that require sample data to be in a specified format.
Note that this sets the current encoding, and does not fix the exponent permanently (i.e. subsequent operations may change the exponent as usual).
If the required fixed-point Q-format is `QX.Y`, where `Y` is the number of fractional bits in the resulting mantissas, then the associated exponent (and value for parameter `exp`) is `-Y`.
`a` points to input BFP vector \(\bar A\), with complex mantissa vector \(\bar a\) and exponent \(a\_exp\). `a` is updated in place to produce resulting BFP vector \( \tilde{A} \) with complex mantissa vector \( \tilde{a} \) and exponent \(\tilde{a}\_exp\).
`exp` is \(\tilde{a}\_exp\), the required exponent. \(\Delta{}p = \tilde{a}\_exp - a\_exp\) is the required change in exponent.
If \(\Delta{}p = 0\), the BFP vector is left unmodified.
If \(\Delta{}p > 0\), the required exponent is larger than the current exponent and an arithmetic right-shift of \(\Delta{}p\) bits is applied to the mantissas \(\bar a\). When applying a right-shift, precision may be lost by discarding the \(\Delta{}p\) least significant bits.
If \(\Delta{}p < 0\), the required exponent is smaller than the current exponent and a left-shift of \(-\Delta{}p\) bits is applied to the mantissas \(\bar a\). When left-shifting, saturation logic will be applied such that any element that can’t be represented exactly with the new exponent will saturate to the 32-bit saturation bounds.
The exponent and headroom of `a` are updated by this function.
- Operation Performed:
- \[\begin{split}\begin{align*} & \Delta{}p = \tilde{a}\_exp - a\_exp \\ & \tilde{a_k} \leftarrow sat_{32}( a_k \cdot 2^{-\Delta{}p} ) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{A} \text{ (in elements) } \end{align*}\end{split}\]
- Parameters
a – [inout] Input BFP vector \(\bar A\) / Output BFP vector \(\tilde{A}\)
exp – [in] The required exponent, \(\tilde{a}\_exp\)
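The shift rules above can be sketched in plain C. This is a minimal illustration under stated assumptions, not the library's implementation: `sketch_use_exponent` and its arguments are hypothetical names, the mantissas are treated as a flat `int32_t` array (a complex vector would pass its real and imaginary parts alike), and saturation is assumed to be symmetric at ±(2^31 - 1).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a library function): re-encode n mantissas that
 * currently use exponent cur_exp so that they use exponent new_exp. */
static void sketch_use_exponent(int32_t *mant, size_t n, int cur_exp, int new_exp)
{
    int dp = new_exp - cur_exp;              /* delta_p = new exponent - old exponent */
    if (dp == 0) return;                     /* nothing to do */
    if (dp > 31)  dp = 31;                   /* larger shifts give the same result */
    if (dp < -31) dp = -31;

    for (size_t k = 0; k < n; k++) {
        int64_t v = mant[k];
        if (dp > 0) {
            /* Arithmetic right shift, written so it is portable for v < 0. */
            v = (v < 0) ? ~((~v) >> dp) : (v >> dp);
        } else {
            /* Left shift by -dp, done as a multiply to avoid shifting a negative value. */
            v *= ((int64_t)1 << -dp);
        }
        /* Assumed symmetric 32-bit saturation bounds. */
        if (v > INT32_MAX)  v = INT32_MAX;
        if (v < -INT32_MAX) v = -INT32_MAX;
        mant[k] = (int32_t)v;
    }
}
```

For instance, mantissas stored with exponent -20 that must be handed to a consumer expecting Q8.24 data would be re-encoded with `new_exp = -24`, which is a left shift of 4 bits.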
-
headroom_t bfp_complex_s32_headroom(bfp_complex_s32_t *b)
Get the headroom of a complex 32-bit BFP vector.
The headroom of a complex vector is the number of bits that the real and imaginary parts of each of its elements can be left-shifted without losing any information. It conveys information about the range of values that vector may contain, which is useful for determining how best to preserve precision in potentially lossy block floating-point operations.
In a BFP context, headroom applies to mantissas only, not exponents.
In particular, if the complex 32-bit mantissa vector \(\bar x\) has \(N\) bits of headroom, then for any element \(x_k\) of \(\bar x\)
\(-2^{31-N} \le Re\{x_k\} \lt 2^{31-N}\)
and
\(-2^{31-N} \le Im\{x_k\} \lt 2^{31-N}\)
And for any element \(X_k = x_k \cdot 2^{x\_exp}\) of a complex BFP vector \(\bar X\)
\(-2^{31 + x\_exp - N} \le Re\{X_k\} \lt 2^{31 + x\_exp - N} \)
and
\(-2^{31 + x\_exp - N} \le Im\{X_k\} \lt 2^{31 + x\_exp - N} \)
This function determines the headroom of `b`, updates `b->hr` with that value, and then returns `b->hr`.
- Parameters
b – complex BFP vector to get the headroom of
- Returns
Headroom of complex BFP vector `b`
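The bounds given for headroom can be turned into a short computation: the headroom of one 32-bit value is its count of redundant leading sign bits, and the headroom of a complex vector is the minimum over every real and imaginary part. The sketch below assumes exactly that reading; `sketch_complex_s32_t`, `sketch_hr_s32` and `sketch_complex_headroom` are hypothetical names, and the library may well compute the value differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the library's complex_s32_t. */
typedef struct { int32_t re; int32_t im; } sketch_complex_s32_t;

/* Headroom N of one value: the largest N with -2^(31-N) <= x < 2^(31-N). */
static unsigned sketch_hr_s32(int32_t x)
{
    /* Map negative values onto non-negative ones with the same headroom. */
    uint32_t u = (x < 0) ? ~(uint32_t)x : (uint32_t)x;
    unsigned hr = 0;
    /* Count zero bits below the sign bit, starting from bit 30. */
    for (uint32_t bit = UINT32_C(1) << 30; bit != 0 && (u & bit) == 0; bit >>= 1)
        hr++;
    return hr;
}

/* Headroom of a complex vector: minimum over all real and imaginary parts. */
static unsigned sketch_complex_headroom(const sketch_complex_s32_t *x, size_t n)
{
    unsigned hr = 31;                        /* headroom of an all-zero vector */
    for (size_t k = 0; k < n; k++) {
        unsigned r = sketch_hr_s32(x[k].re);
        unsigned i = sketch_hr_s32(x[k].im);
        if (r < hr) hr = r;
        if (i < hr) hr = i;
    }
    return hr;
}
```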
-
void bfp_complex_s32_shl(bfp_complex_s32_t *a, const bfp_complex_s32_t *b, const left_shift_t b_shl)
Apply a left-shift to the mantissas of a complex 32-bit BFP vector.
Each complex mantissa of input BFP vector \(\bar B\) is left-shifted `b_shl` bits and stored in the corresponding element of output BFP vector \(\bar A\).
This operation can be used to add or remove headroom from a BFP vector.
`b_shl` is the number of bits that the real and imaginary parts of each mantissa will be left-shifted. This shift is signed and arithmetic, so negative values for `b_shl` will right-shift the mantissas.
`a` and `b` must have been initialized (see bfp_complex_s32_init()), and must be the same length.
This operation can be performed safely in-place on `b`.
Note that this operation bypasses the logic protecting the caller from saturation or underflows. Output values saturate to the symmetric 32-bit range (the open interval \((-2^{31}, 2^{31})\)). To avoid saturation, `b_shl` should be no greater than the headroom of `b` (`b->hr`).
- Operation Performed:
- \[\begin{split}\begin{align*} & Re\{a_k\} \leftarrow sat_{32}( \lfloor Re\{b_k\} \cdot 2^{b\_shl} \rfloor ) \\ & Im\{a_k\} \leftarrow sat_{32}( \lfloor Im\{b_k\} \cdot 2^{b\_shl} \rfloor ) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \\ & \qquad\text{ and } b_k \text{ and } a_k \text{ are the } k\text{th mantissas from } \bar{B}\text{ and } \bar{A}\text{ respectively} \end{align*}\end{split}\]
- Parameters
a – [out] Complex output BFP vector \(\bar A\)
b – [in] Complex input BFP vector \(\bar B\)
b_shl – [in] Signed arithmetic left-shift to be applied to mantissas of \(\bar B\).
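The per-part operation \(sat_{32}( \lfloor x \cdot 2^{b\_shl} \rfloor )\) can be illustrated for a single mantissa part as follows. `sketch_shl_sat32` is a hypothetical helper written for this illustration only; it assumes the symmetric saturation bounds ±(2^31 - 1) described above and would be applied to the real and imaginary parts independently.

```c
#include <stdint.h>

/* Hypothetical helper: signed shift of one 32-bit mantissa part.
 * shl > 0 shifts left with saturation, shl < 0 shifts right (floor). */
static int32_t sketch_shl_sat32(int32_t x, int shl)
{
    int64_t v = x;

    /* Shifts beyond 31 bits give the same result as a 31-bit shift. */
    if (shl > 31)  shl = 31;
    if (shl < -31) shl = -31;

    if (shl >= 0) {
        v *= ((int64_t)1 << shl);            /* multiply instead of shifting a negative value */
    } else {
        int shr = -shl;
        v = (v < 0) ? ~((~v) >> shr) : (v >> shr);   /* portable arithmetic right shift */
    }

    if (v > INT32_MAX)  return INT32_MAX;    /* assumed symmetric bounds */
    if (v < -INT32_MAX) return -INT32_MAX;
    return (int32_t)v;
}
```

A value such as `0x20000000` has one bit of headroom, so a shift of 1 is exact while a shift of 2 saturates, which is the reason for keeping `b_shl` no greater than `b->hr`.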
-
void bfp_complex_s32_real_mul(bfp_complex_s32_t *a, const bfp_complex_s32_t *b, const bfp_s32_t *c)
Multiply a complex 32-bit BFP vector element-wise by a real 32-bit BFP vector.
Each complex output element \(A_k\) of complex output BFP vector \(\bar A\) is set to the complex product of \(B_k\) and \(C_k\), the corresponding elements of complex input BFP vector \(\bar B\) and real input BFP vector \(\bar C\) respectively.
`a`, `b` and `c` must have been initialized (see bfp_complex_s32_init() and bfp_s32_init()), and must be the same length.
This operation can be performed safely in-place on `b`.
- Operation Performed:
- \[\begin{split}\begin{align*} & A_k \leftarrow B_k \cdot C_k \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \end{align*}\end{split}\]
- Parameters
a – [out] Output complex BFP vector \(\bar A\)
b – [in] Input complex BFP vector \(\bar B\)
c – [in] Input real BFP vector \(\bar C\)
-
void bfp_complex_s32_mul(bfp_complex_s32_t *a, const bfp_complex_s32_t *b, const bfp_complex_s32_t *c)
Multiply one complex 32-bit BFP vector element-wise by another.
Each complex output element \(A_k\) of complex output BFP vector \(\bar A\) is set to the complex product of \(B_k\) and \(C_k\), the corresponding elements of complex input BFP vectors \(\bar B\) and \(\bar C\) respectively.
`a`, `b` and `c` must have been initialized (see bfp_complex_s32_init()), and must be the same length.
This operation can be performed safely in-place on `b` or `c`.
- Operation Performed:
- \[\begin{split}\begin{align*} & A_k \leftarrow B_k \cdot C_k \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \end{align*}\end{split}\]
- Parameters
a – [out] Output complex BFP vector \(\bar A\)
b – [in] Input complex BFP vector \(\bar B\)
c – [in] Input complex BFP vector \(\bar C\)
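Written out on mantissas, the product \(B_k \cdot C_k\) expands to \((b_{re} c_{re} - b_{im} c_{im}) + i\,(b_{re} c_{im} + b_{im} c_{re})\), scaled by \(2^{B\_exp + C\_exp}\). The sketch below computes that raw product at full 64-bit width. The type and function names are hypothetical, at least one input is assumed to have a bit of headroom so the 64-bit sums cannot overflow, and the step in which a real implementation reduces the result back to 32-bit mantissas is deliberately left out.

```c
#include <stdint.h>

/* Hypothetical stand-ins for illustration only. */
typedef struct { int32_t re; int32_t im; } sketch_complex_s32_t;
typedef struct { int64_t re; int64_t im; } sketch_complex_s64_t;

/* Raw complex product of two mantissas.
 * Assumes at least one operand has >= 1 bit of headroom.
 * The exponent of the result is b_exp + c_exp. */
static sketch_complex_s64_t sketch_complex_mul(sketch_complex_s32_t b,
                                               sketch_complex_s32_t c)
{
    sketch_complex_s64_t a;
    a.re = (int64_t)b.re * c.re - (int64_t)b.im * c.im;
    a.im = (int64_t)b.re * c.im + (int64_t)b.im * c.re;
    return a;
}
```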
-
void bfp_complex_s32_conj_mul(bfp_complex_s32_t *a, const bfp_complex_s32_t *b, const bfp_complex_s32_t *c)
Multiply one complex 32-bit BFP vector element-wise by the complex conjugate of another.
Each complex output element \(A_k\) of complex output BFP vector \(\bar A\) is set to the complex product of \(B_k\), the corresponding element of complex input BFP vector \(\bar B\), and \((C_k)^*\), the complex conjugate of the corresponding element of complex input BFP vector \(\bar C\).
`a`, `b` and `c` must have been initialized (see bfp_complex_s32_init()), and must be the same length.
This operation can be performed safely in-place on `b` or `c`.
- Operation Performed:
- \[\begin{split}\begin{align*} & A_k \leftarrow B_k \cdot (C_k)^* \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \\ & \qquad\text{and } (C_k)^* \text{ is the complex conjugate of } C_k \end{align*}\end{split}\]
- Parameters
a – [out] Output complex BFP vector \(\bar A\)
b – [in] Input complex BFP vector \(\bar B\)
c – [in] Input complex BFP vector \(\bar C\)
-
void bfp_complex_s32_macc(bfp_complex_s32_t *acc, const bfp_complex_s32_t *b, const bfp_complex_s32_t *c)
Multiply one complex 32-bit BFP vector by another element-wise and add the result to a third vector.
- Operation Performed:
- \[\begin{split}\begin{align*} & A_k \leftarrow A_k + (B_k \cdot C_k) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \end{align*}\end{split}\]
- Parameters
acc – [inout] Input/Output accumulator complex BFP vector \(\bar A\)
b – [in] Input complex BFP vector \(\bar B\)
c – [in] Input complex BFP vector \(\bar C\)
-
void bfp_complex_s32_nmacc(bfp_complex_s32_t *acc, const bfp_complex_s32_t *b, const bfp_complex_s32_t *c)
Multiply one complex 32-bit BFP vector by another element-wise and subtract the result from a third vector.
- Operation Performed:
- \[\begin{split}\begin{align*} & A_k \leftarrow A_k - (B_k \cdot C_k) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \end{align*}\end{split}\]
- Parameters
acc – [inout] Input/Output accumulator complex BFP vector \(\bar A\)
b – [in] Input complex BFP vector \(\bar B\)
c – [in] Input complex BFP vector \(\bar C\)
-
void bfp_complex_s32_conj_macc(bfp_complex_s32_t *acc, const bfp_complex_s32_t *b, const bfp_complex_s32_t *c)
Multiply one complex 32-bit BFP vector by the complex conjugate of another element-wise and add the result to a third vector.
- Operation Performed:
- \[\begin{split}\begin{align*} & A_k \leftarrow A_k + (B_k \cdot C_k^*) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \\ & \qquad\text{and } (C_k)^* \text{ is the complex conjugate of } C_k \end{align*}\end{split}\]
- Parameters
acc – [inout] Input/Output accumulator complex BFP vector \(\bar A\)
b – [in] Input complex BFP vector \(\bar B\)
c – [in] Input complex BFP vector \(\bar C\)
-
void bfp_complex_s32_conj_nmacc(bfp_complex_s32_t *acc, const bfp_complex_s32_t *b, const bfp_complex_s32_t *c)
Multiply one complex 32-bit BFP vector by the complex conjugate of another element-wise and subtract the result from a third vector.
- Operation Performed:
- \[\begin{split}\begin{align*} & A_k \leftarrow A_k - (B_k \cdot C_k^*) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \\ & \qquad\text{and } (C_k)^* \text{ is the complex conjugate of } C_k \end{align*}\end{split}\]
- Parameters
acc – [inout] Input/Output accumulator complex BFP vector \(\bar A\)
b – [in] Input complex BFP vector \(\bar B\)
c – [in] Input complex BFP vector \(\bar C\)
-
void bfp_complex_s32_real_scale(bfp_complex_s32_t *a, const bfp_complex_s32_t *b, const float_s32_t alpha)
Multiply a complex 32-bit BFP vector by a real scalar.
Each complex output element \(A_k\) of complex output BFP vector \(\bar A\) is set to the complex product of \(B_k\), the corresponding element of complex input BFP vector \(\bar B\), and real scalar \(\alpha\cdot 2^{\alpha\_exp}\), where \(\alpha\) and \(\alpha\_exp\) are the mantissa and exponent respectively of parameter `alpha`.
`a` and `b` must have been initialized (see bfp_complex_s32_init()), and must be the same length.
This operation can be performed safely in-place on `b`.
- Operation Performed:
- \[\begin{align*} \bar{A} \leftarrow \bar{B} \cdot \left( \alpha \cdot 2^{\alpha\_exp} \right) \end{align*}\]
- Parameters
a – [out] Output complex BFP vector \(\bar A\)
b – [in] Input complex BFP vector \(\bar B\)
alpha – [in] Real scalar by which \(\bar B\) is multiplied
-
void bfp_complex_s32_scale(bfp_complex_s32_t *a, const bfp_complex_s32_t *b, const float_complex_s32_t alpha)
Multiply a complex 32-bit BFP vector by a complex scalar.
Each complex output element \(A_k\) of complex output BFP vector \(\bar A\) is set to the complex product of \(B_k\), the corresponding element of complex input BFP vector \(\bar B\), and complex scalar \(\alpha\cdot 2^{\alpha\_exp}\), where \(\alpha\) and \(\alpha\_exp\) are the complex mantissa and exponent respectively of parameter `alpha`.
`a` and `b` must have been initialized (see bfp_complex_s32_init()), and must be the same length.
This operation can be performed safely in-place on `b`.
- Operation Performed:
- \[\begin{align*} \bar{A} \leftarrow \bar{B} \cdot \left( \alpha \cdot 2^{\alpha\_exp} \right) \end{align*}\]
- Parameters
a – [out] Output complex BFP vector \(\bar A\)
b – [in] Input complex BFP vector \(\bar B\)
alpha – [in] Complex scalar by which \(\bar B\) is multiplied
-
void bfp_complex_s32_add(bfp_complex_s32_t *a, const bfp_complex_s32_t *b, const bfp_complex_s32_t *c)
Add one complex 32-bit BFP vector to another.
Each complex output element \(A_k\) of complex output BFP vector \(\bar A\) is set to the sum of \(B_k\) and \(C_k\), the corresponding elements of complex input BFP vectors \(\bar B\) and \(\bar C\) respectively.
`a`, `b` and `c` must have been initialized (see bfp_complex_s32_init()), and must be the same length.
This operation can be performed safely in-place on `b` or `c`.
- Operation Performed:
- \[\begin{align*} \bar{A} \leftarrow \bar{B} + \bar{C} \end{align*}\]
- Parameters
a – [out] Output complex BFP vector \(\bar A\)
b – [in] Input complex BFP vector \(\bar B\)
c – [in] Input complex BFP vector \(\bar C\)
-
void bfp_complex_s32_add_scalar(bfp_complex_s32_t *a, const bfp_complex_s32_t *b, const float_complex_s32_t c)
Add a complex scalar to a complex 32-bit BFP vector.
Add a complex scalar \(c\) to input BFP vector \(\bar B\) and store the result in BFP vector \(\bar A\).
`a` and `b` must have been initialized (see bfp_complex_s32_init()), and must be the same length.
This operation can be performed safely in-place on `b`.
- Operation Performed:
- \[\begin{align*} \bar{A} \leftarrow \bar{B} + c \end{align*}\]
- Parameters
a – [out] Output complex BFP vector \(\bar A\)
b – [in] Input complex BFP vector \(\bar B\)
c – [in] Input complex scalar \(c\)
-
void bfp_complex_s32_sub(bfp_complex_s32_t *a, const bfp_complex_s32_t *b, const bfp_complex_s32_t *c)
Subtract one complex 32-bit BFP vector from another.
Each complex output element \(A_k\) of complex output BFP vector \(\bar A\) is set to the difference between \(B_k\) and \(C_k\), the corresponding elements of complex input BFP vectors \(\bar B\) and \(\bar C\) respectively.
`a`, `b` and `c` must have been initialized (see bfp_complex_s32_init()), and must be the same length.
This operation can be performed safely in-place on `b` or `c`.
- Operation Performed:
- \[\begin{align*} \bar{A} \leftarrow \bar{B} - \bar{C} \end{align*}\]
- Parameters
a – [out] Output complex BFP vector \(\bar A\)
b – [in] Input complex BFP vector \(\bar B\)
c – [in] Input complex BFP vector \(\bar C\)
-
void bfp_complex_s32_to_complex_s16(bfp_complex_s16_t *a, const bfp_complex_s32_t *b)
Convert a complex 32-bit BFP vector to a complex 16-bit BFP vector.
Each complex 16-bit output element \(A_k\) of complex output BFP vector \(\bar A\) is set to the value of \(B_k\), the corresponding element of complex 32-bit input BFP vector \(\bar B\), with its bit-depth reduced to 16 bits.
`a` and `b` must have been initialized (see bfp_complex_s16_init() and bfp_complex_s32_init()), and must be the same length.
This function preserves as much precision as possible.
- Operation Performed:
- \[\begin{split}\begin{align*} & A_k \overset{16-bit}{\longleftarrow} B_k \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{align*}\end{split}\]
- Parameters
a – [out] Output complex 16-bit BFP vector \(\bar A\)
b – [in] Input complex 32-bit BFP vector \(\bar B\)
-
void bfp_complex_s32_squared_mag(bfp_s32_t *a, const bfp_complex_s32_t *b)
Get the squared magnitude of each element of a complex 32-bit BFP vector.
Each element \(A_k\) of real output BFP vector \(\bar A\) is set to the squared magnitude of \(B_k\), the corresponding element of complex input BFP vector \(\bar B\).
`a` and `b` must have been initialized (see bfp_s32_init() and bfp_complex_s32_init()), and must be the same length.
- Operation Performed:
- \[\begin{split}\begin{align*} & A_k \leftarrow B_k \cdot (B_k)^* \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \\ & \qquad\text{ and } (B_k)^* \text{ is the complex conjugate of } B_k \end{align*}\end{split}\]
- Parameters
a – [out] Output real BFP vector \(\bar A\)
b – [in] Input complex BFP vector \(\bar B\)
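Because \(B_k \cdot (B_k)^*\) is purely real, the squared magnitude of one mantissa reduces to \(Re\{b_k\}^2 + Im\{b_k\}^2\), with exponent \(2 \cdot B\_exp\). The following sketch shows that arithmetic at full width. `sketch_squared_mag` and its stand-in type are hypothetical, and the reduction of the result to a 32-bit real mantissa, which the output type implies, is not shown.

```c
#include <stdint.h>

/* Hypothetical stand-in for the library's complex_s32_t. */
typedef struct { int32_t re; int32_t im; } sketch_complex_s32_t;

/* Squared magnitude of one complex mantissa at full precision.
 * The result is at most 2^63, so it is returned as an unsigned 64-bit value.
 * The exponent of the result is twice the exponent of the input. */
static uint64_t sketch_squared_mag(sketch_complex_s32_t b)
{
    uint64_t re2 = (uint64_t)((int64_t)b.re * b.re);   /* each square is <= 2^62 */
    uint64_t im2 = (uint64_t)((int64_t)b.im * b.im);
    return re2 + im2;
}
```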
-
void bfp_complex_s32_mag(bfp_s32_t *a, const bfp_complex_s32_t *b)
Get the magnitude of each element of a complex 32-bit BFP vector.
Each element \(A_k\) of real output BFP vector \(\bar A\) is set to the magnitude of \(B_k\), the corresponding element of complex input BFP vector \(\bar B\).
`a` and `b` must have been initialized (see bfp_s32_init() and bfp_complex_s32_init()), and must be the same length.
- Operation Performed:
- \[\begin{split}\begin{align*} & A_k \leftarrow \left| B_k \right| \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{align*}\end{split}\]
- Parameters
a – [out] Output real BFP vector \(\bar A\)
b – [in] Input complex BFP vector \(\bar B\)
-
float_complex_s64_t bfp_complex_s32_sum(const bfp_complex_s32_t *b)
Get the sum of elements of a complex 32-bit BFP vector.
The elements of complex input BFP vector \(\bar B\) are summed together. The result is a complex 64-bit floating-point scalar \(a\), which is returned.
`b` must have been initialized (see bfp_complex_s32_init()).
- Operation Performed:
- \[\begin{split}\begin{align*} & a \leftarrow \sum_{k=0}^{N-1} \left( b_k \cdot 2^{B\_exp} \right) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{align*}\end{split}\]
- Parameters
b – [in] Input complex BFP vector \(\bar B\)
- Returns
\(a\), the sum of vector \(\bar B\)’s elements
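Since every element shares the exponent \(B\_exp\), the factor \(2^{B\_exp}\) can be pulled out of the sum, leaving a plain sum of mantissas. A minimal sketch of that idea follows, with hypothetical type and function names. It accumulates into 64-bit integers, which cannot overflow for any vector shorter than 2^32 elements, and says nothing about how the library itself forms its 64-bit result.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only. */
typedef struct { int32_t re; int32_t im; } sketch_complex_s32_t;
typedef struct { int64_t re; int64_t im; } sketch_complex_s64_t;

/* Sum the mantissas of a complex vector.
 * The logical sum is (result.re + i*result.im) * 2^B_exp. */
static sketch_complex_s64_t sketch_complex_sum(const sketch_complex_s32_t *b, size_t n)
{
    sketch_complex_s64_t a = {0, 0};
    for (size_t k = 0; k < n; k++) {
        a.re += b[k].re;
        a.im += b[k].im;
    }
    return a;
}
```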
-
void bfp_complex_s32_conjugate(bfp_complex_s32_t *a, const bfp_complex_s32_t *b)
Get the complex conjugate of each element of a complex 32-bit BFP vector.
Each element \(A_k\) of complex output BFP vector \(\bar A\) is set to the complex conjugate of \(B_k\), the corresponding element of complex input BFP vector \(\bar B\).
- Operation Performed:
- \[\begin{split}\begin{align*} & A_k \leftarrow B_k^* \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \\ & \qquad\text{and } B_k^* \text{ is the complex conjugate of } B_k \end{align*}\end{split}\]
- Parameters
a – [out] Output complex BFP vector \(\bar A\)
b – [in] Input complex BFP vector \(\bar B\)
-
float_s64_t bfp_complex_s32_energy(const bfp_complex_s32_t *b)
Get the energy of a complex 32-bit BFP vector.
The energy of a complex 32-bit BFP vector here is the sum of the squared magnitudes of each of the vector’s elements.
- Operation Performed:
- \[\begin{split}\begin{align*} & a \leftarrow \sum_{k=0}^{N-1} \left( \left|b_k \cdot 2^{B\_exp}\right|^2 \right) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{align*}\end{split}\]
- Parameters
b – [in] Input complex BFP vector \(\bar B\)
- Returns
\(a\), the energy of vector \(\bar B\)
-
void bfp_s32_init(bfp_s32_t *a, int32_t *data, const exponent_t exp, const unsigned length, const unsigned calc_hr)
Initialize a 32-bit BFP vector.
This function initializes each of the fields of BFP vector `a`.
`data` points to the memory buffer used to store elements of the vector, so it must be at least `length * 4` bytes long, and must begin at a word-aligned address.
`exp` is the exponent assigned to the BFP vector. The logical value associated with the `k`th element of the vector after initialization is \( data_k \cdot 2^{exp} \).
If `calc_hr` is false, `a->hr` is initialized to 0. Otherwise, the headroom of the BFP vector is calculated and used to initialize `a->hr`.
- Parameters
a – [out] BFP vector to initialize
data – [in] `int32_t` buffer used to back `a`
exp – [in] Exponent of BFP vector
length – [in] Number of elements in the BFP vector
calc_hr – [in] Boolean indicating whether the HR of the BFP vector should be calculated
-
void bfp_complex_s32_init(bfp_complex_s32_t *a, complex_s32_t *data, const exponent_t exp, const unsigned length, const unsigned calc_hr)
Initialize a 32-bit complex BFP vector.
This function initializes each of the fields of `a`.
Unlike `bfp_complex_s16_t`, complex 32-bit BFP vectors use a single buffer to store the real and imaginary parts of each mantissa, such that the imaginary part of element `k` follows the real part of element `k` in memory.
`data` points to the memory buffer used to store elements of the vector, and must be at least `length * 8` bytes long.
`exp` is the exponent assigned to the BFP vector. The logical value associated with the `k`th complex element of the vector after initialization will be \( \left(data_{2k} + i\cdot data_{2k+1} \right)\cdot2^{exp} \).
If `calc_hr` is false, `a->hr` is initialized to 0. Otherwise, the headroom of the BFP vector is calculated and used to initialize `a->hr`.
- Parameters
a – [out] BFP vector struct to initialize
data – [in] `complex_s32_t` buffer used to back `a`
exp – [in] Exponent of BFP vector
length – [in] Number of elements in BFP vector
calc_hr – [in] Boolean indicating whether the HR of the BFP vector should be calculated
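The interleaved layout described above, where \(data_{2k}\) is the real part and \(data_{2k+1}\) the imaginary part of element `k`, can be checked with a small stand-alone sketch. `sketch_complex_s32_t` is a hypothetical stand-in that assumes the library's `complex_s32_t` is a pair of `int32_t` fields in real, imaginary order, and `sketch_flatten` is an illustrative helper rather than a library call.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the library's complex_s32_t (assumed field order). */
typedef struct { int32_t re; int32_t im; } sketch_complex_s32_t;

/* The documented buffer size of length * 8 bytes implies no padding. */
_Static_assert(sizeof(sketch_complex_s32_t) == 8, "unexpected padding");

/* Copy n complex elements into a flat array of 2*n int32_t values, so that
 * flat[2k] is the real part and flat[2k+1] the imaginary part of element k. */
static void sketch_flatten(const sketch_complex_s32_t *data, size_t n, int32_t *flat)
{
    memcpy(flat, data, n * sizeof *data);
}
```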
-
bfp_s32_t bfp_s32_alloc(unsigned length)
Dynamically allocate a 32-bit BFP vector from the heap.
If allocation was unsuccessful, the `data` field of the returned vector will be NULL, and the `length` field will be zero. Otherwise, `data` will point to the allocated memory and the `length` field will be the user-specified length. The `length` argument must not be zero.
Neither the BFP exponent, headroom, nor the elements of the allocated mantissa vector are set by this function. To set the BFP vector elements to a known value, use bfp_s32_set() on the returned BFP vector.
BFP vectors allocated using this function must be deallocated using bfp_s32_dealloc() to avoid a memory leak.
To initialize a BFP vector using static memory allocation, use bfp_s32_init() instead.
Note
This function always allocates an extra 2 elements so that `bfp_fft_unpack_mono()` can safely be used, but these two elements will NOT be reflected in the returned vector length.
Note
Dynamic allocation of BFP vectors relies on allocation from the heap, and offers no guarantees about the execution time. Use of this function in any time-critical section of code is highly discouraged.
- Parameters
length – [in] The length of the BFP vector to be allocated (in elements)
- Returns
32-bit BFP vector
-
bfp_complex_s32_t bfp_complex_s32_alloc(const unsigned length)
Dynamically allocate a complex 32-bit BFP vector from the heap.
If allocation was unsuccessful, the `data` field of the returned vector will be NULL, and the `length` field will be zero. Otherwise, `data` will point to the allocated memory and the `length` field will be the user-specified length. The `length` argument must not be zero.
Neither the BFP exponent, headroom, nor the elements of the allocated mantissa vector are set by this function. To set the BFP vector elements to a known value, use bfp_complex_s32_set() on the returned BFP vector.
BFP vectors allocated using this function must be deallocated using bfp_complex_s32_dealloc() to avoid a memory leak.
To initialize a BFP vector using static memory allocation, use bfp_complex_s32_init() instead.
Note
Dynamic allocation of BFP vectors relies on allocation from the heap, and offers no guarantees about the execution time. Use of this function in any time-critical section of code is highly discouraged.
- Parameters
length – [in] The length of the BFP vector to be allocated (in elements)
- Returns
Complex 32-bit BFP vector
-
void bfp_s32_dealloc(bfp_s32_t *vector)
Deallocate a 32-bit BFP vector allocated by bfp_s32_alloc().
Use this function to free the heap memory allocated by bfp_s32_alloc().
BFP vectors whose mantissa buffer was (successfully) dynamically allocated have a flag set which indicates as much. This function can safely be called on any bfp_s32_t which has not had its `flags` or `data` manually manipulated, including:
bfp_s32_t resulting from a successful call to bfp_s32_alloc()
bfp_s32_t resulting from an unsuccessful call to bfp_s32_alloc()
bfp_s32_t initialized with a call to bfp_s32_init()
In the latter two cases, this function does nothing. In the first case, the `data`, `length` and `flags` fields of `vector` are cleared to zero.
- Parameters
vector – [in] BFP vector to be deallocated.
-
void bfp_complex_s32_dealloc(bfp_complex_s32_t *vector)
Deallocate a complex 32-bit BFP vector allocated by bfp_complex_s32_alloc().
Use this function to free the heap memory allocated by bfp_complex_s32_alloc().
BFP vectors whose mantissa buffer was (successfully) dynamically allocated have a flag set which indicates as much. This function can safely be called on any bfp_complex_s32_t which has not had its `flags` or `data` manually manipulated, including:
bfp_complex_s32_t resulting from a successful call to bfp_complex_s32_alloc()
bfp_complex_s32_t resulting from an unsuccessful call to bfp_complex_s32_alloc()
bfp_complex_s32_t initialized with a call to bfp_complex_s32_init()
In the latter two cases, this function does nothing. In the first case, the `data`, `length` and `flags` fields of `vector` are cleared to zero.
- Parameters
vector – [in] BFP vector to be deallocated.
-
void bfp_s32_set(bfp_s32_t *a, const int32_t b, const exponent_t exp)
Set all elements of a 32-bit BFP vector to a specified value.
The exponent of `a` is set to `exp`, and each element’s mantissa is set to `b`.
After performing this operation, all elements will represent the same value \(b \cdot 2^{exp}\).
`a` must have been initialized (see bfp_s32_init()).
- Parameters
a – [out] BFP vector to update
b – [in] New value each mantissa is set to
exp – [in] New exponent for the BFP vector
-
void bfp_s32_use_exponent(bfp_s32_t *a, const exponent_t exp)
Modify a 32-bit BFP vector to use a specified exponent.
This function forces BFP vector \(\bar A\) to use a specified exponent. The mantissa vector \(\bar a\) will be bit-shifted left or right to compensate for the changed exponent.
This function can be used, for example, before calling a fixed-point arithmetic function to ensure the underlying mantissa vector has the needed Q-format. As another example, this may be useful when communicating with peripheral devices (e.g. via I2S) that require sample data to be in a specified format.
Note that this sets the current encoding, and does not fix the exponent permanently (i.e. subsequent operations may change the exponent as usual).
If the required fixed-point Q-format is `QX.Y`, where `Y` is the number of fractional bits in the resulting mantissas, then the associated exponent (and value for parameter `exp`) is `-Y`.
`a` points to input BFP vector \(\bar A\), with mantissa vector \(\bar a\) and exponent \(a\_exp\). `a` is updated in place to produce resulting BFP vector \(\tilde{A}\) with mantissa vector \(\tilde{a}\) and exponent \(\tilde{a}\_exp\).
`exp` is \(\tilde{a}\_exp\), the required exponent. \(\Delta{}p = \tilde{a}\_exp - a\_exp\) is the required change in exponent.
If \(\Delta{}p = 0\), the BFP vector is left unmodified.
If \(\Delta{}p > 0\), the required exponent is larger than the current exponent and an arithmetic right-shift of \(\Delta{}p\) bits is applied to the mantissas \(\bar a\). When applying a right-shift, precision may be lost by discarding the \(\Delta{}p\) least significant bits.
If \(\Delta{}p < 0\), the required exponent is smaller than the current exponent and a left-shift of \(-\Delta{}p\) bits is applied to the mantissas \(\bar a\). When left-shifting, saturation logic will be applied such that any element that can’t be represented exactly with the new exponent will saturate to the 32-bit saturation bounds.
The exponent and headroom of `a` are updated by this function.
- Operation Performed:
- \[\begin{split}\begin{align*} & \Delta{}p = \tilde{a}\_exp - a\_exp \\ & \tilde{a_k} \leftarrow sat_{32}( a_k \cdot 2^{-\Delta{}p} ) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{A} \text{ (in elements) } \end{align*}\end{split}\]
- Parameters
a – [inout] Input BFP vector \(\bar A\) / Output BFP vector \(\tilde{A}\)
exp – [in] The required exponent, \(\tilde{a}\_exp\)
-
headroom_t bfp_s32_headroom(bfp_s32_t *b)
Get the headroom of a 32-bit BFP vector.
The headroom of a vector is the number of bits its elements can be left-shifted without losing any information. It conveys information about the range of values that vector may contain, which is useful for determining how best to preserve precision in potentially lossy block floating-point operations.
In a BFP context, headroom applies to mantissas only, not exponents.
In particular, if the 32-bit mantissa vector \(\bar x\) has \(N\) bits of headroom, then for any element \(x_k\) of \(\bar x\)
\(-2^{31-N} \le x_k \lt 2^{31-N}\)
And for any element \(X_k = x_k \cdot 2^{x\_exp}\) of a BFP vector \(\bar X\)
\(-2^{31 + x\_exp - N} \le X_k \lt 2^{31 + x\_exp - N} \)
This function determines the headroom of `b`, updates `b->hr` with that value, and then returns `b->hr`.
- Parameters
b – BFP vector to get the headroom of
- Returns
Headroom of BFP vector `b`
-
void bfp_s32_shl(bfp_s32_t *a, const bfp_s32_t *b, const left_shift_t b_shl)
Apply a left-shift to the mantissas of a 32-bit BFP vector.
Each mantissa of input BFP vector \(\bar B\) is left-shifted `b_shl` bits and stored in the corresponding element of output BFP vector \(\bar A\).
This operation can be used to add or remove headroom from a BFP vector.
`b_shl` is the number of bits that each mantissa will be left-shifted. This shift is signed and arithmetic, so negative values for `b_shl` will right-shift the mantissas.
`a` and `b` must have been initialized (see bfp_s32_init()), and must be the same length.
This operation can be performed safely in-place on `b`.
Note that this operation bypasses the logic protecting the caller from saturation or underflows. Output values saturate to the symmetric 32-bit range (the open interval \((-2^{31}, 2^{31})\)). To avoid saturation, `b_shl` should be no greater than the headroom of `b` (`b->hr`).
- Operation Performed:
- \[\begin{split}\begin{align*} & a_k \leftarrow sat_{32}( \lfloor b_k \cdot 2^{b\_shl} \rfloor ) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \\ & \qquad\text{ and } b_k \text{ and } a_k \text{ are the } k\text{th mantissas from } \bar{B}\text{ and } \bar{A}\text{ respectively} \end{align*}\end{split}\]
- Parameters
a – [out] Output BFP vector \(\bar A\)
b – [in] Input BFP vector \(\bar B\)
b_shl – [in] Signed arithmetic left-shift to be applied to mantissas of \(\bar B\).
-
void bfp_s32_add(bfp_s32_t *a, const bfp_s32_t *b, const bfp_s32_t *c)
Add two 32-bit BFP vectors together.
Add together two input BFP vectors \(\bar B\) and \(\bar C\) and store the result in BFP vector \(\bar A\).
a, b and c must have been initialized (see bfp_s32_init()), and must be the same length.
This operation can be performed safely in-place on b or c.
- Operation Performed:
- \[\begin{align*} \bar{A} \leftarrow \bar{B} + \bar{C} \end{align*}\]
- Parameters
a – [out] Output BFP vector \(\bar A\)
b – [in] Input BFP vector \(\bar B\)
c – [in] Input BFP vector \(\bar C\)
-
void bfp_s32_add_scalar(bfp_s32_t *a, const bfp_s32_t *b, const float_s32_t c)¶
Add a scalar to a 32-bit BFP vector.
Add a real scalar \(c\) to input BFP vector \(\bar B\) and store the result in BFP vector \(\bar A\).
a and b must have been initialized (see bfp_s32_init()), and must be the same length.
This operation can be performed safely in-place on b.
- Operation Performed:
- \[\begin{align*} \bar{A} \leftarrow \bar{B} + c \end{align*}\]
- Parameters
a – [out] Output BFP vector \(\bar A\)
b – [in] Input BFP vector \(\bar B\)
c – [in] Input scalar \(c\)
-
void bfp_s32_sub(bfp_s32_t *a, const bfp_s32_t *b, const bfp_s32_t *c)¶
Subtract one 32-bit BFP vector from another.
Subtract input BFP vector \(\bar C\) from input BFP vector \(\bar B\) and store the result in BFP vector \(\bar A\).
a, b and c must have been initialized (see bfp_s32_init()), and must be the same length.
This operation can be performed safely in-place on b or c.
- Operation Performed:
- \[\begin{align*} \bar{A} \leftarrow \bar{B} - \bar{C} \end{align*}\]
- Parameters
a – [out] Output BFP vector \(\bar A\)
b – [in] Input BFP vector \(\bar B\)
c – [in] Input BFP vector \(\bar C\)
-
void bfp_s32_mul(bfp_s32_t *a, const bfp_s32_t *b, const bfp_s32_t *c)¶
Multiply one 32-bit BFP vector by another element-wise.
Multiply each element of input BFP vector \(\bar B\) by the corresponding element of input BFP vector \(\bar C\) and store the results in output BFP vector \(\bar A\).
a, b and c must have been initialized (see bfp_s32_init()), and must be the same length.
This operation can be performed safely in-place on b or c.
- Operation Performed:
- \[\begin{split}\begin{align*} & A_k \leftarrow B_k \cdot C_k \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \end{align*}\end{split}\]
- Parameters
a – [out] Output BFP vector \(\bar A\)
b – [in] Input BFP vector \(\bar B\)
c – [in] Input BFP vector \(\bar C\)
-
void bfp_s32_macc(bfp_s32_t *acc, const bfp_s32_t *b, const bfp_s32_t *c)¶
Multiply one 32-bit BFP vector by another element-wise and add the result to a third vector.
- Operation Performed:
- \[\begin{split}\begin{align*} & A_k \leftarrow A_k + B_k \cdot C_k \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \end{align*}\end{split}\]
- Parameters
acc – [inout] Input/Output accumulator BFP vector \(\bar A\)
b – [in] Input BFP vector \(\bar B\)
c – [in] Input BFP vector \(\bar C\)
-
void bfp_s32_nmacc(bfp_s32_t *acc, const bfp_s32_t *b, const bfp_s32_t *c)¶
Multiply one 32-bit BFP vector by another element-wise and subtract the result from a third vector.
- Operation Performed:
- \[\begin{split}\begin{align*} & A_k \leftarrow A_k - B_k \cdot C_k \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \end{align*}\end{split}\]
- Parameters
acc – [inout] Input/Output accumulator BFP vector \(\bar A\)
b – [in] Input BFP vector \(\bar B\)
c – [in] Input BFP vector \(\bar C\)
-
void bfp_s32_scale(bfp_s32_t *a, const bfp_s32_t *b, const float_s32_t alpha)¶
Multiply a 32-bit BFP vector by a scalar.
Multiply input BFP vector \(\bar B\) by scalar \(\alpha \cdot 2^{\alpha\_exp}\) and store the result in output BFP vector \(\bar A\).
a and b must have been initialized (see bfp_s32_init()), and must be the same length.
alpha represents the scalar \(\alpha \cdot 2^{\alpha\_exp}\), where \(\alpha\) is alpha.mant and \(\alpha\_exp\) is alpha.exp.
This operation can be performed safely in-place on b.
- Operation Performed:
- \[\begin{align*} \bar{A} \leftarrow \bar{B} \cdot \left(\alpha \cdot 2^{\alpha\_exp}\right) \end{align*}\]
- Parameters
a – [out] Output BFP vector \(\bar A\)
b – [in] Input BFP vector \(\bar B\)
alpha – [in] Scalar by which \(\bar B\) is multiplied
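To make the mantissa/exponent encoding of the scalar concrete, the logical value of one scaled element can be modeled with doubles. This is an illustrative sketch only: `scalar_model_t` and `scaled_value` are hypothetical names, and only the field names mant and exp are taken from the description above.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical stand-in for a scalar with a 32-bit mantissa and an
   exponent, representing the value mant * 2^exp. */
typedef struct {
    int32_t mant;
    int exp;
} scalar_model_t;

/* Logical value of one output element:
   (b_k * 2^b_exp) * (alpha.mant * 2^alpha.exp). */
static double scaled_value(int32_t b_k, int b_exp, scalar_model_t alpha)
{
    return ldexp((double)b_k, b_exp) * ldexp((double)alpha.mant, alpha.exp);
}
```

For example, a mantissa of 4 with exponent -2 (the value 1.0) scaled by a scalar with mant = 3 and exp = -1 (the value 1.5) gives 1.5.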
-
void bfp_s32_abs(bfp_s32_t *a, const bfp_s32_t *b)¶
Get the absolute values of elements of a 32-bit BFP vector.
Compute the absolute value of each element \(B_k\) of input BFP vector \(\bar B\) and store the results in output BFP vector \(\bar A\).
a and b must have been initialized (see bfp_s32_init()), and must be the same length.
This operation can be performed safely in-place on b.
- Operation Performed:
- \[\begin{split}\begin{align*} & A_k \leftarrow \left| B_k \right| \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{align*}\end{split}\]
- Parameters
a – [out] Output BFP vector \(\bar A\)
b – [in] Input BFP vector \(\bar B\)
-
float_s64_t bfp_s32_sum(const bfp_s32_t *b)¶
Sum the elements of a 32-bit BFP vector.
Sum the elements of input BFP vector \(\bar B\) to get a result \(A = a \cdot 2^{a\_exp}\), which is returned. The returned value has a 64-bit mantissa.
b must have been initialized (see bfp_s32_init()).
- Operation Performed:
- \[\begin{split}\begin{align*} & A \leftarrow \sum_{k=0}^{N-1} \left( B_k \right) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{align*}\end{split}\]
- Parameters
b – [in] Input BFP vector \(\bar B\)
- Returns
\(A\), the sum of elements of \(\bar B\)
-
float_s64_t bfp_s32_dot(const bfp_s32_t *b, const bfp_s32_t *c)¶
Compute the inner product of two 32-bit BFP vectors.
Adds together the element-wise products of input BFP vectors \(\bar B\) and \(\bar C\) for a result \(A = a \cdot 2^{a\_exp}\), where \(a\) is the 64-bit mantissa of the result and \(a\_exp\) is its associated exponent. \(A\) is returned.
b and c must have been initialized (see bfp_s32_init()), and must be the same length.
- Operation Performed:
- \[\begin{split}\begin{align*} & a \cdot 2^{a\_exp} \leftarrow \sum_{k=0}^{N-1} \left( B_k \cdot C_k \right) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \end{align*}\end{split}\]
- Parameters
b – [in] Input BFP vector \(\bar B\)
c – [in] Input BFP vector \(\bar C\)
- Returns
\(A\), the inner product of vectors \(\bar B\) and \(\bar C\)
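The inner product of the logical values can be sketched with doubles as a reference for checking results. This is not how the library computes it (the function returns a 64-bit mantissa and an exponent, not a double); `dot_model` is a hypothetical name and the two exponents are passed explicitly instead of through the BFP structs.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model: sum over k of
   (b_k * 2^b_exp) * (c_k * 2^c_exp). */
static double dot_model(const int32_t *b, int b_exp,
                        const int32_t *c, int c_exp,
                        unsigned length)
{
    assert(b != NULL && c != NULL);
    double acc = 0.0;
    for (unsigned k = 0; k < length; k++) {
        acc += ldexp((double)b[k], b_exp) * ldexp((double)c[k], c_exp);
    }
    return acc;
}
```

With mantissas {1, 2, 3} at exponent -1 and {4, 5, 6} at exponent 1, the logical vectors are {0.5, 1, 1.5} and {8, 10, 12}, and the model returns 32.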
-
void bfp_s32_clip(bfp_s32_t *a, const bfp_s32_t *b, const int32_t lower_bound, const int32_t upper_bound, const int bound_exp)¶
Clamp the elements of a 32-bit BFP vector to a specified range.
Each element \(A_k\) of output BFP vector \(\bar A\) is set to the corresponding element \(B_k\) of input BFP vector \(\bar B\) if it is in the range \( [ L \cdot 2^{bound\_exp}, U \cdot 2^{bound\_exp} ] \), otherwise it is set to the nearest value inside that range.
a and b must have been initialized (see bfp_s32_init()), and must be the same length.
This operation can be performed safely in-place on b.
- Operation Performed:
- \[\begin{split}\begin{align*} & A_k \leftarrow \begin{cases} & L \cdot 2^{bound\_exp} & B_k \lt L \cdot 2^{bound\_exp} \\ & U \cdot 2^{bound\_exp} & B_k \gt U \cdot 2^{bound\_exp} \\ & B_k & otherwise & \end{cases} \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{align*}\end{split}\]
- Parameters
a – [out] Output BFP vector \(\bar A\)
b – [in] Input BFP vector \(\bar B\)
lower_bound – [in] Mantissa of the lower clipping bound, \(L\)
upper_bound – [in] Mantissa of the upper clipping bound, \(U\)
bound_exp – [in] Shared exponent of the clipping bounds
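Because the clipping bounds carry their own exponent, it can help to see the comparison on logical values. The following sketch models one element with doubles; `clip_model` is a hypothetical name, and the library itself works on mantissas rather than doubles.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Hypothetical model of one clipped element. The bounds are
   L * 2^bound_exp and U * 2^bound_exp; the input is b_k * 2^b_exp. */
static double clip_model(int32_t b_k, int b_exp,
                         int32_t lower_bound, int32_t upper_bound,
                         int bound_exp)
{
    const double v  = ldexp((double)b_k, b_exp);
    const double lo = ldexp((double)lower_bound, bound_exp);
    const double hi = ldexp((double)upper_bound, bound_exp);
    assert(lo <= hi);
    if (v < lo) return lo;
    if (v > hi) return hi;
    return v;
}
```

With lower_bound = -1, upper_bound = 1 and bound_exp = 1, the clipping range is [-2, 2], so an input of 10 at exponent -2 (the value 2.5) is clipped to 2.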
-
void bfp_s32_rect(bfp_s32_t *a, const bfp_s32_t *b)¶
Rectify a 32-bit BFP vector.
Each element \(A_k\) of output BFP vector \(\bar A\) is set to the corresponding element \(B_k\) of input BFP vector \(\bar B\) if it is non-negative, otherwise it is set to \(0\).
a and b must have been initialized (see bfp_s32_init()), and must be the same length.
This operation can be performed safely in-place on b.
- Operation Performed:
- \[\begin{split}\begin{align*} & A_k \leftarrow \begin{cases} & 0 & B_k \lt 0 \\ & B_k & otherwise & \end{cases} \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{align*}\end{split}\]
- Parameters
a – [out] Output BFP vector \(\bar A\)
b – [in] Input BFP vector \(\bar B\)
-
void bfp_s32_to_s16(bfp_s16_t *a, const bfp_s32_t *b)¶
Convert a 32-bit BFP vector into a 16-bit BFP vector.
Reduces the bit-depth of each 32-bit element \(B_k\) of input BFP vector \(\bar B\) to 16 bits, and stores the 16-bit result in the corresponding element \(A_k\) of output BFP vector \(\bar A\).
a and b must have been initialized (see bfp_s32_init() and bfp_s16_init()), and must be the same length.
As much precision as possible will be retained.
- Operation Performed:
- \[\begin{split}\begin{align*} & A_k \overset{16-bit}{\longleftarrow} B_k \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{align*}\end{split}\]
- Parameters
a – [out] Output BFP vector \(\bar A\)
b – [in] Input BFP vector \(\bar B\)
-
void bfp_s32_sqrt(bfp_s32_t *a, const bfp_s32_t *b)¶
Get the square roots of elements of a 32-bit BFP vector.
Computes the square root of each element \(B_k\) of input BFP vector \(\bar B\) and stores the results in output BFP vector \(\bar A\).
a and b must have been initialized (see bfp_s32_init()), and must be the same length.
This operation can be performed safely in-place on b.
- Operation Performed:
- \[\begin{split}\begin{align*} & A_k \leftarrow \sqrt{B_k} \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{align*}\end{split}\]
- Notes
Only the XS3_BFP_SQRT_DEPTH_S32 (see xs3_math_conf.h) most significant bits of each result are computed.
This function only computes real roots. For any \(B_k \lt 0\), the corresponding output \(A_k\) is set to \(0\).
- Parameters
a – [out] Output BFP vector \(\bar A\)
b – [in] Input BFP vector \(\bar B\)
-
void bfp_s32_inverse(bfp_s32_t *a, const bfp_s32_t *b)¶
Get the inverses of elements of a 32-bit BFP vector.
Computes the inverse of each element \(B_k\) of input BFP vector \(\bar B\) and stores the results in output BFP vector \(\bar A\).
a and b must have been initialized (see bfp_s32_init()), and must be the same length.
This operation can be performed safely in-place on b.
- Operation Performed:
- \[\begin{split}\begin{align*} & A_k \leftarrow B_k^{-1} \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{align*}\end{split}\]
- Parameters
a – [out] Output BFP vector \(\bar A\)
b – [in] Input BFP vector \(\bar B\)
-
float_s64_t bfp_s32_abs_sum(const bfp_s32_t *b)¶
Sum the absolute values of elements of a 32-bit BFP vector.
Sum the absolute values of elements of input BFP vector \(\bar B\) for a result \(A = a \cdot 2^{a\_exp}\), where \(a\) is a 64-bit mantissa and \(a\_exp\) is its associated exponent. \(A\) is returned.
b must have been initialized (see bfp_s32_init()).
- Operation Performed:
- \[\begin{split}\begin{align*} & A \leftarrow \sum_{k=0}^{N-1} \left| B_k \right| \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{align*}\end{split}\]
- Parameters
b – [in] Input BFP vector \(\bar B\)
- Returns
\(A\), the sum of absolute values of elements of \(\bar B\)
-
float_s32_t bfp_s32_mean(const bfp_s32_t *b)¶
Get the mean value of a 32-bit BFP vector.
Computes \(A = a \cdot 2^{a\_exp}\), the mean value of elements of input BFP vector \(\bar B\), where \(a\) is the 32-bit mantissa of the result, and \(a\_exp\) is its associated exponent. \(A\) is returned.
b must have been initialized (see bfp_s32_init()).
- Operation Performed:
- \[\begin{split}\begin{align*} & A \leftarrow \frac{1}{N} \sum_{k=0}^{N-1} \left( B_k \right) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{align*}\end{split}\]
- Parameters
b – [in] Input BFP vector \(\bar B\)
- Returns
\(A\), the mean value of \(\bar B\)’s elements
-
float_s64_t bfp_s32_energy(const bfp_s32_t *b)¶
Get the energy (sum of squares of elements) of a 32-bit BFP vector.
Computes \(A = a \cdot 2^{a\_exp}\), the sum of squares of elements of input BFP vector \(\bar B\), where \(a\) is the 64-bit mantissa of the result, and \(a\_exp\) is its associated exponent. \(A\) is returned.
b must have been initialized (see bfp_s32_init()).
- Operation Performed:
- \[\begin{split}\begin{align*} & A \leftarrow \sum_{k=0}^{N-1} \left( B_k^2 \right) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{align*}\end{split}\]
- Parameters
b – [in] Input BFP vector \(\bar B\)
- Returns
\(A\), \(\bar B\)’s energy
-
float_s32_t bfp_s32_rms(const bfp_s32_t *b)¶
Get the RMS value of elements of a 32-bit BFP vector.
Computes \(A = a \cdot 2^{a\_exp}\), the RMS value of elements of input BFP vector \(\bar B\), where \(a\) is the 32-bit mantissa of the result, and \(a\_exp\) is its associated exponent. \(A\) is returned.
The RMS (root-mean-square) value of a vector is the square root of the mean of the squares of the vector’s elements.
b must have been initialized (see bfp_s32_init()).
- Operation Performed:
- \[\begin{split}\begin{align*} & A \leftarrow \sqrt{\frac{1}{N}\sum_{k=0}^{N-1} \left( B_k^2 \right) } \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{align*}\end{split}\]
- Parameters
b – [in] Input BFP vector \(\bar B\)
- Returns
\(A\), the RMS value of \(\bar B\)’s elements
-
float_s32_t bfp_s32_max(const bfp_s32_t *b)¶
Get the maximum value of a 32-bit BFP vector.
Finds \(A\), the maximum value among elements of input BFP vector \(\bar B\). \(A\) is returned by this function.
b must have been initialized (see bfp_s32_init()).
- Operation Performed:
- \[\begin{split}\begin{align*} & A \leftarrow max\left(B_0, B_1, ..., B_{N-1} \right) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{align*}\end{split}\]
- Parameters
b – [in] Input vector
- Returns
\(A\), the value of \(\bar B\)’s maximum element
-
void bfp_s32_max_elementwise(bfp_s32_t *a, const bfp_s32_t *b, const bfp_s32_t *c)¶
Get the element-wise maximum of two 32-bit BFP vectors.
Each element of output vector \(\bar A\) is set to the maximum of the corresponding elements in the input vectors \(\bar B\) and \(\bar C\).
a, b and c must have been initialized (see bfp_s32_init()), and must be the same length.
This operation can be performed safely in-place on b, but not on c.
- Operation Performed:
- \[\begin{split}\begin{align*} & A_k \leftarrow max(B_k, C_k) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \end{align*}\end{split}\]
- Parameters
a – [out] Output BFP vector \(\bar A\)
b – [in] Input BFP vector \(\bar B\)
c – [in] Input BFP vector \(\bar C\)
-
float_s32_t bfp_s32_min(const bfp_s32_t *b)¶
Get the minimum value of a 32-bit BFP vector.
Finds \(A\), the minimum value among elements of input BFP vector \(\bar B\). \(A\) is returned by this function.
b must have been initialized (see bfp_s32_init()).
- Operation Performed:
- \[\begin{split}\begin{align*} & A \leftarrow min\left(B_0, B_1, ..., B_{N-1} \right) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{align*}\end{split}\]
- Parameters
b – [in] Input vector
- Returns
\(A\), the value of \(\bar B\)’s minimum element
-
void bfp_s32_min_elementwise(bfp_s32_t *a, const bfp_s32_t *b, const bfp_s32_t *c)¶
Get the element-wise minimum of two 32-bit BFP vectors.
Each element of output vector \(\bar A\) is set to the minimum of the corresponding elements in the input vectors \(\bar B\) and \(\bar C\).
a, b and c must have been initialized (see bfp_s32_init()), and must be the same length.
This operation can be performed safely in-place on b, but not on c.
- Operation Performed:
- \[\begin{split}\begin{align*} & A_k \leftarrow min(B_k, C_k) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B}\text{ and }\bar{C} \end{align*}\end{split}\]
- Parameters
a – [out] Output BFP vector \(\bar A\)
b – [in] Input BFP vector \(\bar B\)
c – [in] Input BFP vector \(\bar C\)
-
unsigned bfp_s32_argmax(const bfp_s32_t *b)¶
Get the index of the maximum value of a 32-bit BFP vector.
Finds \(a\), the index of the maximum value among the elements of input BFP vector \(\bar B\). \(a\) is returned by this function.
If i is the value returned, then the maximum value in \(\bar B\) is ldexp(b->data[i], b->exp).
- Operation Performed:
- \[\begin{split}\begin{align*} & a \leftarrow argmax_k\left(b_k\right) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{align*}\end{split}\]
- Notes
If there is a tie for maximum value, the lowest tying index is returned.
- Parameters
b – [in] Input vector
- Returns
\(a\), the index of the maximum value from \(\bar B\)
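Since all elements share one exponent, the search can be modeled on the mantissas alone, with the strict comparison giving the lowest-index tie-break described in the notes. This is a sketch and not the library's implementation; `argmax_model` and `element_value` are hypothetical names operating on a plain mantissa array.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: index of the largest mantissa. The strict '>'
   keeps the lowest index when several elements tie. */
static unsigned argmax_model(const int32_t *data, unsigned length)
{
    assert(data != NULL && length > 0);
    unsigned best = 0;
    for (unsigned k = 1; k < length; k++) {
        if (data[k] > data[best]) best = k;
    }
    return best;
}

/* Logical value of element i, i.e. data[i] * 2^exp. */
static double element_value(const int32_t *data, int exp, unsigned i)
{
    return ldexp((double)data[i], exp);
}
```

For mantissas {3, 7, 7, -1} the model returns index 1, and with an exponent of -1 the corresponding logical value is 3.5.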
-
unsigned bfp_s32_argmin(const bfp_s32_t *b)¶
Get the index of the minimum value of a 32-bit BFP vector.
Finds \(a\), the index of the minimum value among the elements of input BFP vector \(\bar B\). \(a\) is returned by this function.
If i is the value returned, then the minimum value in \(\bar B\) is ldexp(b->data[i], b->exp).
- Operation Performed:
- \[\begin{split}\begin{align*} & a \leftarrow argmin_k\left(b_k\right) \\ & \qquad\text{for } k \in 0\ ...\ (N-1) \\ & \qquad\text{where } N \text{ is the length of } \bar{B} \end{align*}\end{split}\]
- Notes
If there is a tie for minimum value, the lowest tying index is returned.
- Parameters
b – [in] Input vector
- Returns
\(a\), the index of the minimum value from \(\bar B\)
-
void bfp_s32_convolve_valid(bfp_s32_t *y, const bfp_s32_t *x, const int32_t b_q30[], const unsigned b_length)¶
Convolve a 32-bit BFP vector with a short convolution kernel (“valid” mode).
Input BFP vector \(\bar X\) is convolved with a short fixed-point convolution kernel \(\bar b\) to produce output BFP vector \(\bar Y\). In other words, this function applies the \(K\)-tap FIR filter with coefficients given by \(\bar b\) to the input signal \(\bar X\). The convolution is “valid” in the sense that no output elements are emitted where the filter taps extend beyond the bounds of the input vector, resulting in an output vector \(\bar Y\) with fewer elements.
The maximum number of filter taps \(K\) supported by this function is \(7\).
y is the output vector \(\bar Y\). If input \(\bar X\) has \(N\) elements, and the filter has \(K\) coefficients, then \(\bar Y\) has \(N-2P\) elements, where \(P = \lfloor K / 2 \rfloor\).
x is the input vector \(\bar X\) with length \(N\).
b_q30[] is the vector \(\bar b\) of filter coefficients. The coefficients of \(\bar b\) are encoded in a Q2.30 fixed-point format. The effective value of the \(i\)th coefficient is then \(b_i \cdot 2^{-30}\).
b_length is the length \(K\) of \(\bar b\) in elements (i.e. the number of filter taps). b_length must be one of \( \{ 1, 3, 5, 7 \} \).
- Operation Performed:
- \[\begin{split}\begin{align*} & Y_k \leftarrow \sum_{l=0}^{K-1} (X_{(k+l)} \cdot b_l \cdot 2^{-30} ) \\ & \qquad\text{ for }k\in 0\ ...\ (N-2P-1) \\ & \qquad\text{ where }P = \lfloor K/2 \rfloor \end{align*}\end{split}\]
- Parameters
y – [out] Output BFP vector \(\bar Y\)
x – [in] Input BFP vector \(\bar X\)
b_q30 – [in] Convolution kernel \(\bar b\)
b_length – [in] The number of elements \(K\) in \(\bar b\)
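The “valid” sum and the resulting output length can be modeled with doubles as a reference. This sketch follows the sum as written above (no kernel reversal) and is not the library's implementation; `valid_length` and `convolve_valid_at` are hypothetical names, and the input exponent is passed explicitly.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Number of output elements: N - 2P with P = floor(K / 2). */
static unsigned valid_length(unsigned n, unsigned k_taps)
{
    assert(k_taps == 1 || k_taps == 3 || k_taps == 5 || k_taps == 7);
    assert(n >= k_taps);
    return n - 2 * (k_taps / 2);
}

/* Hypothetical model of output element k:
   sum over l of (x[k + l] * 2^x_exp) * (b_q30[l] * 2^-30). */
static double convolve_valid_at(const int32_t *x, int x_exp, unsigned n,
                                const int32_t *b_q30, unsigned k_taps,
                                unsigned k)
{
    assert(k < valid_length(n, k_taps));
    double acc = 0.0;
    for (unsigned l = 0; l < k_taps; l++) {
        acc += ldexp((double)x[k + l], x_exp) * ldexp((double)b_q30[l], -30);
    }
    return acc;
}
```

With a 5-element input {1, 2, 3, 4, 5} and the Q2.30 kernel {0x10000000, 0x20000000, 0x10000000} (that is, 0.25, 0.5, 0.25), the model yields 3 outputs, the first of which is 2.0.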
-
void bfp_s32_convolve_same(bfp_s32_t *y, const bfp_s32_t *x, const int32_t b_q30[], const unsigned b_length, const pad_mode_e padding_mode)¶
Convolve a 32-bit BFP vector with a short convolution kernel (“same” mode).
Input BFP vector \(\bar X\) is convolved with a short fixed-point convolution kernel \(\bar b\) to produce output BFP vector \(\bar Y\). In other words, this function applies the \(K\)-tap FIR filter with coefficients given by \(\bar b\) to the input signal \(\bar X\). The convolution mode is “same” in that the input vector is effectively padded such that the input and output vectors are the same length. The padding behavior is one of those given by pad_mode_e.
The maximum number of filter taps \(K\) supported by this function is \(7\).
y and x are the output and input BFP vectors \(\bar Y\) and \(\bar X\) respectively.
b_q30[] is the vector \(\bar b\) of filter coefficients. The coefficients of \(\bar b\) are encoded in a Q2.30 fixed-point format. The effective value of the \(i\)th coefficient is then \(b_i \cdot 2^{-30}\).
b_length is the length \(K\) of \(\bar b\) in elements (i.e. the number of filter taps). b_length must be one of \( \{ 1, 3, 5, 7 \} \).
padding_mode is one of the values from the pad_mode_e enumeration. The padding mode indicates the filter input values for filter taps that have extended beyond the bounds of the input vector \(\bar X\). See pad_mode_e for a list of supported padding modes and associated behaviors.
- Operation Performed:
- \[\begin{split}\begin{align*} & \tilde{x}_i = \begin{cases} \text{determined by padding mode} & i \lt 0 \\ \text{determined by padding mode} & i \ge N \\ x_i & otherwise \end{cases} \\ & y_k \leftarrow \sum_{l=0}^{K-1} (\tilde{x}_{(k+l-P)} \cdot b_l \cdot 2^{-30} ) \\ & \qquad\text{ for }k\in 0\ ...\ (N-1) \\ & \qquad\text{ where }P = \lfloor K/2 \rfloor \end{align*}\end{split}\]
Note
Unlike bfp_s32_convolve_valid(), this operation cannot be performed safely in-place on x.
- Parameters
y – [out] Output BFP vector \(\bar Y\)
x – [in] Input BFP vector \(\bar X\)
b_q30 – [in] Convolution kernel \(\bar b\)
b_length – [in] The number of elements \(K\) in \(\bar b\)
padding_mode – [in] The padding mode to be applied at signal boundaries
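The padded sum can be sketched for one particular padding behavior. The model below assumes out-of-range inputs are treated as zero, which is only an illustrative choice; the padding behaviors actually available are those listed under pad_mode_e. `convolve_same_zero_at` is a hypothetical name and the code is a reference model in doubles, not the library's implementation.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Hypothetical model of output element k with zero padding (an
   assumption): sum over l of x~[k + l - P] * (b_q30[l] * 2^-30),
   where x~[i] is 0 outside 0..n-1 and P = floor(K / 2). */
static double convolve_same_zero_at(const int32_t *x, int x_exp, unsigned n,
                                    const int32_t *b_q30, unsigned k_taps,
                                    unsigned k)
{
    assert(k_taps == 1 || k_taps == 3 || k_taps == 5 || k_taps == 7);
    assert(k < n);
    const int p = (int)(k_taps / 2);
    double acc = 0.0;
    for (unsigned l = 0; l < k_taps; l++) {
        const int i = (int)k + (int)l - p;
        const double xi = (i < 0 || i >= (int)n)
                              ? 0.0
                              : ldexp((double)x[i], x_exp);
        acc += xi * ldexp((double)b_q30[l], -30);
    }
    return acc;
}
```

The output index k runs over all N input positions, so a 3-element input produces 3 outputs; with input {1, 2, 3} and kernel values 0.25, 0.5, 0.25 the model gives 1.0, 2.0 and 2.0.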