XS3 32-Bit Vector Functions

enum pad_mode_e
Supported padding modes for convolutions in “same” mode.
Values:

enumerator PAD_MODE_REFLECT
Vector is reflected at its boundaries, such that
\( \tilde{x}_i = \begin{cases} x_{-i} & i \lt 0 \\ x_{2N - 2 - i} & i \ge N \\ x_i & \text{otherwise} \end{cases} \)
For example, if the length \(N\) of input vector \(\bar x\) is \(7\) and the order \(K\) of the filter is \(5\), then
\( \bar{x} = [x_0, x_1, x_2, x_3, x_4, x_5, x_6] \)
\( \tilde{x} = [x_2, x_1, x_0, x_1, x_2, x_3, x_4, x_5, x_6, x_5, x_4] \)
Note that by convention the first element of \(\tilde{x}\) is considered to be at index \(-P\), where \(P = \lfloor K/2 \rfloor\).

enumerator PAD_MODE_EXTEND
Vector is padded using the value of the bounding elements.
\( \tilde{x}_i = \begin{cases} x_{0} & i \lt 0 \\ x_{N-1} & i \ge N \\ x_i & \text{otherwise} \end{cases} \)
For example, if the length \(N\) of input vector \(\bar x\) is \(7\) and the order \(K\) of the filter is \(5\), then
\( \bar{x} = [x_0, x_1, x_2, x_3, x_4, x_5, x_6] \)
\( \tilde{x} = [x_0, x_0, x_0, x_1, x_2, x_3, x_4, x_5, x_6, x_6, x_6] \)
Note that by convention the first element of \(\tilde{x}\) is considered to be at index \(-P\), where \(P = \lfloor K/2 \rfloor\).

enumerator PAD_MODE_ZERO
Vector is padded with zeroes.
\( \tilde{x}_i = \begin{cases} 0 & i \lt 0 \\ 0 & i \ge N \\ x_i & \text{otherwise} \end{cases} \)
For example, if the length \(N\) of input vector \(\bar x\) is \(7\) and the order \(K\) of the filter is \(5\), then
\( \bar{x} = [x_0, x_1, x_2, x_3, x_4, x_5, x_6] \)
\( \tilde{x} = [0, 0, x_0, x_1, x_2, x_3, x_4, x_5, x_6, 0, 0] \)
Note that by convention the first element of \(\tilde{x}\) is considered to be at index \(-P\), where \(P = \lfloor K/2 \rfloor\).
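The three padding rules can be checked with a small index-mapping sketch. It only reproduces the case definitions above for integer samples; the enum and the functions padded_element and pad_vector are hypothetical names, not part of the library, and this is not the library's convolution code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the three pad_mode_e enumerators. */
typedef enum { SKETCH_PAD_REFLECT, SKETCH_PAD_EXTEND, SKETCH_PAD_ZERO } sketch_pad_mode_t;

/* Return x~_i for an index i in [-P, N + P - 1], following the cases above. */
static int32_t padded_element(const int32_t x[], int N, int i, sketch_pad_mode_t mode)
{
    if (i >= 0 && i < N)
        return x[i];                                   /* "otherwise" case: x_i */
    switch (mode) {
    case SKETCH_PAD_REFLECT: return (i < 0) ? x[-i] : x[2 * N - 2 - i];
    case SKETCH_PAD_EXTEND:  return (i < 0) ? x[0] : x[N - 1];
    default:                 return 0;                 /* SKETCH_PAD_ZERO */
    }
}

/* Write x~_{-P} .. x~_{N+P-1} into out[0 .. N + 2P - 1], where P = floor(K / 2). */
static void pad_vector(int32_t out[], const int32_t x[], int N, int K, sketch_pad_mode_t mode)
{
    int P = K / 2;
    /* Reflection only stays inside x[] when P <= N - 1. */
    assert(mode != SKETCH_PAD_REFLECT || P < N);
    for (int i = -P; i < N + P; ++i)
        out[i + P] = padded_element(x, N, i, mode);   /* x~_{-P} lands at out[0] */
}
```

With \(N = 7\) and \(K = 5\), out[] holds eleven values that match the three example vectors \(\tilde{x}\) given above.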


headroom_t xs3_vect_complex_s32_add(complex_s32_t a[], const complex_s32_t b[], const complex_s32_t c[], const unsigned length, const right_shift_t b_shr, const right_shift_t c_shr)
Add one complex 32-bit vector to another.
a[], b[] and c[] represent the complex 32-bit mantissa vectors \(\bar a\), \(\bar b\) and \(\bar c\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[] or c[].
length is the number of elements in each of the vectors.
b_shr and c_shr are the signed arithmetic right-shifts applied to each element of \(\bar b\) and \(\bar c\) respectively.
 Operation Performed:
 \[\begin{split}\begin{align*} & b_k' \leftarrow sat_{32}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & c_k' \leftarrow sat_{32}(\lfloor c_k \cdot 2^{-c\_shr} \rfloor) \\ & Re\{a_k\} \leftarrow Re\{b_k'\} + Re\{c_k'\} \\ & Im\{a_k\} \leftarrow Im\{b_k'\} + Im\{c_k'\} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
 Block Floating-Point

If \(\bar b\) and \(\bar c\) are the complex 32-bit mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) holds the complex 32-bit mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\).
In this case, \(b\_shr\) and \(c\_shr\) must be chosen so that \(a\_exp = b\_exp + b\_shr = c\_exp + c\_shr\). Adding or subtracting mantissas only makes sense if they are associated with the same exponent.
The function xs3_vect_complex_s32_add_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).
 Parameters
a – [out] Complex output vector \(\bar a\)
b – [in] Complex input vector \(\bar b\)
c – [in] Complex input vector \(\bar c\)
length – [in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)
b_shr – [in] Right-shift applied to \(\bar b\)
c_shr – [in] Right-shift applied to \(\bar c\)
 Throws
ET_LOAD_STORE – Raised if a, b or c is not word-aligned (See Note: Vector Alignment)
 Returns
Headroom of output vector \(\bar a\).
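For checking results on a host machine, the operation above can be modelled in portable C. This is a sketch, not the library's implementation: cplx32_t, sat32, shr_sat32 and ref_complex_s32_add are hypothetical names, \(sat_{32}\) is assumed to clamp symmetrically to ±(2^31 - 1), the final sum is assumed to saturate as well, and the returned headroom is not computed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for complex_s32_t (fields re and im). */
typedef struct { int32_t re; int32_t im; } cplx32_t;

/* sat_32, assumed here to clamp symmetrically to +/-(2^31 - 1). */
static int32_t sat32(int64_t x)
{
    if (x > INT32_MAX) return INT32_MAX;
    if (x < -INT32_MAX) return -INT32_MAX;
    return (int32_t)x;
}

/* sat_32(floor(x * 2^-shr)); a negative shr shifts left instead. */
static int32_t shr_sat32(int32_t x, int shr)
{
    int64_t v = x;
    if (shr >= 0) {
        if (shr > 63) shr = 63;
        /* Floor division by 2^shr without right-shifting a negative value. */
        v = (v >= 0) ? (v >> shr) : -((-v - 1) >> shr) - 1;
    } else {
        int shl = (shr < -31) ? 31 : -shr;  /* larger shifts saturate any nonzero x anyway */
        v *= (int64_t)1 << shl;
    }
    return sat32(v);
}

/* Host-side model of the element-wise complex add (hypothetical name). */
static void ref_complex_s32_add(cplx32_t a[], const cplx32_t b[], const cplx32_t c[],
                                unsigned length, int b_shr, int c_shr)
{
    assert(length == 0 || (a && b && c));
    for (unsigned k = 0; k < length; ++k) {
        /* Both inputs are read before a[k] is written, so a == b or a == c is safe. */
        int32_t br = shr_sat32(b[k].re, b_shr), bi = shr_sat32(b[k].im, b_shr);
        int32_t cr = shr_sat32(c[k].re, c_shr), ci = shr_sat32(c[k].im, c_shr);
        /* Assumption: the sum is saturated as well; the formula above does not say. */
        a[k].re = sat32((int64_t)br + cr);
        a[k].im = sat32((int64_t)bi + ci);
    }
}
```

For example, with b_shr = 1 and c_shr = -2, the inputs 100 - 100j and 8 + 3j become 50 - 50j and 32 + 12j, so the output element is 82 - 38j.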

headroom_t xs3_vect_complex_s32_add_scalar(complex_s32_t a[], const complex_s32_t b[], const complex_s32_t c, const unsigned length, const right_shift_t b_shr)
Add a scalar to a complex 32-bit vector.
a[] and b[] represent the complex 32-bit mantissa vectors \(\bar a\) and \(\bar b\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[].
c is the complex scalar \(c\) to be added to each element of \(\bar b\).
length is the number of elements in each of the vectors.
b_shr is the signed arithmetic right-shift applied to each element of \(\bar b\).
 Operation Performed:
 \[\begin{split}\begin{align*} & b_k' \leftarrow sat_{32}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & Re\{a_k\} \leftarrow Re\{b_k'\} + Re\{c\} \\ & Im\{a_k\} \leftarrow Im\{b_k'\} + Im\{c\} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
 Block Floating-Point

If the elements of \(\bar b\) are the complex mantissas of BFP vector \( \bar{b} \cdot 2^{b\_exp}\), and \(c\) is the mantissa of floating-point value \(c \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) holds the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\).
In this case, \(b\_shr\) and \(c\_shr\) must be chosen so that \(a\_exp = b\_exp + b\_shr = c\_exp + c\_shr\). Adding or subtracting mantissas only makes sense if they are associated with the same exponent.
The function xs3_vect_complex_s32_add_scalar_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).
Note that \(c\_shr\) is an output of xs3_vect_complex_s32_add_scalar_prepare(), but is not a parameter to this function. The \(c\_shr\) produced by xs3_vect_complex_s32_add_scalar_prepare() is to be applied by the user, and the result passed as input c.
 Parameters
a – [out] Complex output vector \(\bar a\)
b – [in] Complex input vector \(\bar b\)
c – [in] Complex input scalar \(c\)
length – [in] Number of elements in vectors \(\bar a\) and \(\bar b\)
b_shr – [in] Right-shift applied to \(\bar b\)
 Throws
ET_LOAD_STORE – Raised if a or b is not word-aligned (See Note: Vector Alignment)
 Returns
Headroom of output vector \(\bar a\).
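The note about \(c\_shr\) can be illustrated with a small host-side model: the caller shifts the scalar once, then each element of \(\bar b\) is shifted by b_shr and the scalar is added. apply_c_shr and ref_complex_s32_add_scalar are hypothetical helpers; symmetric saturation and a saturated final sum are assumptions, and the real function's details may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for complex_s32_t (fields re and im). */
typedef struct { int32_t re; int32_t im; } cplx32_t;

/* sat_32, assumed here to clamp symmetrically to +/-(2^31 - 1). */
static int32_t sat32(int64_t x)
{
    if (x > INT32_MAX) return INT32_MAX;
    if (x < -INT32_MAX) return -INT32_MAX;
    return (int32_t)x;
}

/* sat_32(floor(x * 2^-shr)); a negative shr shifts left instead. */
static int32_t shr_sat32(int32_t x, int shr)
{
    int64_t v = x;
    if (shr >= 0) {
        if (shr > 63) shr = 63;
        v = (v >= 0) ? (v >> shr) : -((-v - 1) >> shr) - 1;  /* floor division */
    } else {
        int shl = (shr < -31) ? 31 : -shr;
        v *= (int64_t)1 << shl;
    }
    return sat32(v);
}

/* The user-side step: apply c_shr (from the prepare function) to the scalar. */
static cplx32_t apply_c_shr(cplx32_t c, int c_shr)
{
    cplx32_t r = { shr_sat32(c.re, c_shr), shr_sat32(c.im, c_shr) };
    return r;
}

/* Host-side model of the add-scalar operation (hypothetical name). */
static void ref_complex_s32_add_scalar(cplx32_t a[], const cplx32_t b[], cplx32_t c,
                                       unsigned length, int b_shr)
{
    assert(length == 0 || (a && b));
    for (unsigned k = 0; k < length; ++k) {
        /* Assumption: the sum is saturated; the formula above does not say. */
        a[k].re = sat32((int64_t)shr_sat32(b[k].re, b_shr) + c.re);
        a[k].im = sat32((int64_t)shr_sat32(b[k].im, b_shr) + c.im);
    }
}
```

Keeping the scalar shift outside the per-element loop mirrors the text: \(c\_shr\) is applied once by the caller rather than passed to the function.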

headroom_t xs3_vect_complex_s32_conj_mul(complex_s32_t a[], const complex_s32_t b[], const complex_s32_t c[], const unsigned length, const right_shift_t b_shr, const right_shift_t c_shr)
Multiply one complex 32-bit vector element-wise by the complex conjugate of another.
a[], b[] and c[] represent the complex 32-bit mantissa vectors \(\bar a\), \(\bar b\) and \(\bar c\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[] or c[].
length is the number of elements in each of the vectors.
b_shr and c_shr are the signed arithmetic right-shifts applied to each element of \(\bar b\) and \(\bar c\) respectively.
 Operation Performed:
 \[\begin{split}\begin{align*} & b_k' \leftarrow sat_{32}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & c_k' \leftarrow sat_{32}(\lfloor c_k \cdot 2^{-c\_shr} \rfloor) \\ & Re\{a_k\} \leftarrow \left( Re\{b_k'\} \cdot Re\{c_k'\} + Im\{b_k'\} \cdot Im\{c_k'\} \right) \cdot 2^{-30} \\ & Im\{a_k\} \leftarrow \left( Im\{b_k'\} \cdot Re\{c_k'\} - Re\{b_k'\} \cdot Im\{c_k'\} \right) \cdot 2^{-30} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
 Block Floating-Point

If \(\bar b\) and \(\bar c\) are the complex 32-bit mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) holds the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + c\_exp + b\_shr + c\_shr\).
The function xs3_vect_complex_s32_conj_mul_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).
 Parameters
a – [out] Complex output vector \(\bar a\)
b – [in] Complex input vector \(\bar b\)
c – [in] Complex input vector \(\bar c\)
length – [in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)
b_shr – [in] Right-shift applied to elements of \(\bar b\).
c_shr – [in] Right-shift applied to elements of \(\bar c\).
 Throws
ET_LOAD_STORE – Raised if a, b or c is not word-aligned (See Note: Vector Alignment)
 Returns
Headroom of the output vector \(\bar a\)
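The conjugate product can be modelled on a host with 64-bit intermediates. The formula above does not state a rounding mode, so this sketch assumes round-half-up before saturating the \(2^{-30}\) scaling; cplx32_t, round_shr30_sat32 and ref_complex_s32_conj_mul are hypothetical names, and the returned headroom is not computed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for complex_s32_t (fields re and im). */
typedef struct { int32_t re; int32_t im; } cplx32_t;

/* sat_32, assumed here to clamp symmetrically to +/-(2^31 - 1). */
static int32_t sat32(int64_t x)
{
    if (x > INT32_MAX) return INT32_MAX;
    if (x < -INT32_MAX) return -INT32_MAX;
    return (int32_t)x;
}

/* sat_32(floor(x * 2^-shr)); a negative shr shifts left instead. */
static int32_t shr_sat32(int32_t x, int shr)
{
    int64_t v = x;
    if (shr >= 0) {
        if (shr > 63) shr = 63;
        v = (v >= 0) ? (v >> shr) : -((-v - 1) >> shr) - 1;  /* floor division */
    } else {
        int shl = (shr < -31) ? 31 : -shr;
        v *= (int64_t)1 << shl;
    }
    return sat32(v);
}

/* x * 2^-30 rounded half-up, then saturated (rounding mode is an assumption). */
static int32_t round_shr30_sat32(int64_t x)
{
    int64_t r = x + ((int64_t)1 << 29);
    r = (r >= 0) ? (r >> 30) : -((-r - 1) >> 30) - 1;
    return sat32(r);
}

/* Host-side model of b * conj(c), element by element (hypothetical name). */
static void ref_complex_s32_conj_mul(cplx32_t a[], const cplx32_t b[], const cplx32_t c[],
                                     unsigned length, int b_shr, int c_shr)
{
    assert(length == 0 || (a && b && c));
    for (unsigned k = 0; k < length; ++k) {
        int64_t br = shr_sat32(b[k].re, b_shr), bi = shr_sat32(b[k].im, b_shr);
        int64_t cr = shr_sat32(c[k].re, c_shr), ci = shr_sat32(c[k].im, c_shr);
        /* b * conj(c) = (br*cr + bi*ci) + j(bi*cr - br*ci); fits in int64_t */
        a[k].re = round_shr30_sat32(br * cr + bi * ci);
        a[k].im = round_shr30_sat32(bi * cr - br * ci);
    }
}
```

With \(2^{30}\) standing for 1.0, the inputs \(1 + j\) and \(j\) give \((1 + j)(-j) = 1 - j\), which the sketch returns as re = 2^30, im = -2^30.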

headroom_t xs3_vect_complex_s32_headroom(const complex_s32_t x[], const unsigned length)
Calculate the headroom of a complex 32-bit array.
The headroom of an N-bit integer is the number of bits that the integer’s value may be left-shifted without any information being lost. Equivalently, it is one less than the number of leading sign bits.
The headroom of a complex_s32_t struct is the minimum of the headroom of each of its 32-bit fields, re and im.
The headroom of a complex_s32_t array is the minimum of the headroom of each of its complex_s32_t elements.
This function efficiently traverses the elements of \(\bar x\) to determine its headroom.
x[] represents the complex 32-bit vector \(\bar x\). x[] must begin at a word-aligned address.
length is the number of elements in x[].
 Operation Performed:
 \[\begin{align*} min\!\{ HR_{32}\left(x_0\right), HR_{32}\left(x_1\right), ..., HR_{32}\left(x_{length-1}\right) \} \end{align*}\]
 Parameters
x – [in] Complex input vector \(\bar x\)
length – [in] Number of elements in \(\bar x\)
 Throws
ET_LOAD_STORE – Raised if x is not word-aligned (See Note: Vector Alignment)
 Returns
Headroom of vector \(\bar x\)
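The headroom definition translates directly into a bit-counting loop. This sketch counts how many bits below the sign bit are equal to it; hr32 and ref_complex_s32_headroom are hypothetical names, and returning 31 for an all-zero or empty vector is this sketch's choice rather than documented behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for complex_s32_t (fields re and im). */
typedef struct { int32_t re; int32_t im; } cplx32_t;

/* HR_32(x): number of bits below the sign bit that equal the sign bit. */
static unsigned hr32(int32_t x)
{
    uint32_t u = (uint32_t)x;
    uint32_t sign = u >> 31;
    unsigned hr = 0;
    for (int bit = 30; bit >= 0 && ((u >> bit) & 1u) == sign; --bit)
        hr++;
    return hr;
}

/* Minimum headroom over the re and im fields of every element. */
static unsigned ref_complex_s32_headroom(const cplx32_t x[], unsigned length)
{
    assert(length == 0 || x);
    unsigned hr = 31;   /* headroom of zero; also what this sketch returns for length == 0 */
    for (unsigned k = 0; k < length; ++k) {
        unsigned h_re = hr32(x[k].re), h_im = hr32(x[k].im);
        if (h_re < hr) hr = h_re;
        if (h_im < hr) hr = h_im;
    }
    return hr;
}
```

For example, 1 and -2 each have 30 bits of headroom, 0 and -1 have 31, and INT32_MAX has none.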

headroom_t xs3_vect_complex_s32_macc(complex_s32_t acc[], const complex_s32_t b[], const complex_s32_t c[], const unsigned length, const right_shift_t acc_shr, const right_shift_t b_shr, const right_shift_t c_shr)
Multiply one complex 32-bit vector element-wise by another, and add the result to an accumulator.
acc[] represents the complex 32-bit accumulator mantissa vector \(\bar a\). Each \(a_k\) is acc[k].
b[] and c[] represent the complex 32-bit input mantissa vectors \(\bar b\) and \(\bar c\), where each \(b_k\) is b[k] and each \(c_k\) is c[k].
Each of the input vectors must begin at a word-aligned address.
length is the number of elements in each of the vectors.
acc_shr, b_shr and c_shr are the signed arithmetic right-shifts applied to input elements \(a_k\), \(b_k\) and \(c_k\).
 Operation Performed:
 \[\begin{split}\begin{align*} & \tilde{b}_k \leftarrow sat_{32}( b_k \cdot 2^{-b\_shr} ) \\ & \tilde{c}_k \leftarrow sat_{32}( c_k \cdot 2^{-c\_shr} ) \\ & \tilde{a}_k \leftarrow sat_{32}( a_k \cdot 2^{-acc\_shr} ) \\ & v_k \leftarrow round( sat_{32}( ( Re\{\tilde{b}_k\} \cdot Re\{\tilde{c}_k\} - Im\{\tilde{b}_k\} \cdot Im\{\tilde{c}_k\} ) \cdot 2^{-30}) ) \\ & s_k \leftarrow round( sat_{32}( ( Im\{\tilde{b}_k\} \cdot Re\{\tilde{c}_k\} + Re\{\tilde{b}_k\} \cdot Im\{\tilde{c}_k\} ) \cdot 2^{-30}) ) \\ & Re\{a_k\} \leftarrow sat_{32}( Re\{\tilde{a}_k\} + v_k ) \\ & Im\{a_k\} \leftarrow sat_{32}( Im\{\tilde{a}_k\} + s_k ) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
 Block Floating-Point

If inputs \(\bar b\) and \(\bar c\) are the mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), and input \(\bar a\) is the accumulator BFP vector \(\bar{a} \cdot 2^{a\_exp}\), then the output values of \(\bar a\) have the exponent \(a\_exp + acc\_shr\).
For accumulation to make sense mathematically, \(acc\_shr\), \(b\_shr\) and \(c\_shr\) must be chosen such that \( a\_exp + acc\_shr = b\_exp + c\_exp + b\_shr + c\_shr \).
The function xs3_vect_complex_s32_macc_prepare() can be used to obtain values for \(a\_exp\), \(acc\_shr\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(a\_exp\), \(b\_exp\) and \(c\_exp\) and the input headrooms \(a\_hr\), \(b\_hr\) and \(c\_hr\).
 Parameters
acc – [inout] Complex accumulator \(\bar a\)
b – [in] Complex input vector \(\bar b\)
c – [in] Complex input vector \(\bar c\)
length – [in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)
acc_shr – [in] Signed arithmetic right-shift applied to accumulator elements.
b_shr – [in] Signed arithmetic right-shift applied to elements of \(\bar b\)
c_shr – [in] Signed arithmetic right-shift applied to elements of \(\bar c\)
 Throws
ET_LOAD_STORE – Raised if acc, b or c is not word-aligned (See Note: Vector Alignment)
 Returns
Headroom of the output vector \(\bar a\)

headroom_t xs3_vect_complex_s32_nmacc(complex_s32_t acc[], const complex_s32_t b[], const complex_s32_t c[], const unsigned length, const right_shift_t acc_shr, const right_shift_t b_shr, const right_shift_t c_shr)
Multiply one complex 32-bit vector element-wise by another, and subtract the result from an accumulator.
acc[] represents the complex 32-bit accumulator mantissa vector \(\bar a\). Each \(a_k\) is acc[k].
b[] and c[] represent the complex 32-bit input mantissa vectors \(\bar b\) and \(\bar c\), where each \(b_k\) is b[k] and each \(c_k\) is c[k].
Each of the input vectors must begin at a word-aligned address.
length is the number of elements in each of the vectors.
acc_shr, b_shr and c_shr are the signed arithmetic right-shifts applied to input elements \(a_k\), \(b_k\) and \(c_k\).
 Operation Performed:
 \[\begin{split}\begin{align*} & \tilde{b}_k \leftarrow sat_{32}( b_k \cdot 2^{-b\_shr} ) \\ & \tilde{c}_k \leftarrow sat_{32}( c_k \cdot 2^{-c\_shr} ) \\ & \tilde{a}_k \leftarrow sat_{32}( a_k \cdot 2^{-acc\_shr} ) \\ & v_k \leftarrow round( sat_{32}( ( Re\{\tilde{b}_k\} \cdot Re\{\tilde{c}_k\} - Im\{\tilde{b}_k\} \cdot Im\{\tilde{c}_k\} ) \cdot 2^{-30}) ) \\ & s_k \leftarrow round( sat_{32}( ( Im\{\tilde{b}_k\} \cdot Re\{\tilde{c}_k\} + Re\{\tilde{b}_k\} \cdot Im\{\tilde{c}_k\} ) \cdot 2^{-30}) ) \\ & Re\{a_k\} \leftarrow sat_{32}( Re\{\tilde{a}_k\} - v_k ) \\ & Im\{a_k\} \leftarrow sat_{32}( Im\{\tilde{a}_k\} - s_k ) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
 Block Floating-Point

If inputs \(\bar b\) and \(\bar c\) are the mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), and input \(\bar a\) is the accumulator BFP vector \(\bar{a} \cdot 2^{a\_exp}\), then the output values of \(\bar a\) have the exponent \(a\_exp + acc\_shr\).
For accumulation to make sense mathematically, \(acc\_shr\), \(b\_shr\) and \(c\_shr\) must be chosen such that \( a\_exp + acc\_shr = b\_exp + c\_exp + b\_shr + c\_shr \).
The function xs3_vect_complex_s32_macc_prepare() can be used to obtain values for \(a\_exp\), \(acc\_shr\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(a\_exp\), \(b\_exp\) and \(c\_exp\) and the input headrooms \(a\_hr\), \(b\_hr\) and \(c\_hr\).
 Parameters
acc – [inout] Complex accumulator \(\bar a\)
b – [in] Complex input vector \(\bar b\)
c – [in] Complex input vector \(\bar c\)
length – [in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)
acc_shr – [in] Signed arithmetic right-shift applied to accumulator elements.
b_shr – [in] Signed arithmetic right-shift applied to elements of \(\bar b\)
c_shr – [in] Signed arithmetic right-shift applied to elements of \(\bar c\)
 Throws
ET_LOAD_STORE – Raised if acc, b or c is not word-aligned (See Note: Vector Alignment)
 Returns
Headroom of the output vector \(\bar a\)

headroom_t xs3_vect_complex_s32_conj_macc(complex_s32_t acc[], const complex_s32_t b[], const complex_s32_t c[], const unsigned length, const right_shift_t acc_shr, const right_shift_t b_shr, const right_shift_t c_shr)
Multiply one complex 32-bit vector element-wise by the complex conjugate of another, and add the result to an accumulator.
acc[] represents the complex 32-bit accumulator mantissa vector \(\bar a\). Each \(a_k\) is acc[k].
b[] and c[] represent the complex 32-bit input mantissa vectors \(\bar b\) and \(\bar c\), where each \(b_k\) is b[k] and each \(c_k\) is c[k].
Each of the input vectors must begin at a word-aligned address.
length is the number of elements in each of the vectors.
acc_shr, b_shr and c_shr are the signed arithmetic right-shifts applied to input elements \(a_k\), \(b_k\) and \(c_k\).
 Operation Performed:
 \[\begin{split}\begin{align*} & \tilde{b}_k \leftarrow sat_{32}( b_k \cdot 2^{-b\_shr} ) \\ & \tilde{c}_k \leftarrow sat_{32}( c_k \cdot 2^{-c\_shr} ) \\ & \tilde{a}_k \leftarrow sat_{32}( a_k \cdot 2^{-acc\_shr} ) \\ & v_k \leftarrow round( sat_{32}( ( Re\{\tilde{b}_k\} \cdot Re\{\tilde{c}_k\} + Im\{\tilde{b}_k\} \cdot Im\{\tilde{c}_k\} ) \cdot 2^{-30}) ) \\ & s_k \leftarrow round( sat_{32}( ( Im\{\tilde{b}_k\} \cdot Re\{\tilde{c}_k\} - Re\{\tilde{b}_k\} \cdot Im\{\tilde{c}_k\} ) \cdot 2^{-30}) ) \\ & Re\{a_k\} \leftarrow sat_{32}( Re\{\tilde{a}_k\} + v_k ) \\ & Im\{a_k\} \leftarrow sat_{32}( Im\{\tilde{a}_k\} + s_k ) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
 Block Floating-Point

If inputs \(\bar b\) and \(\bar c\) are the mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), and input \(\bar a\) is the accumulator BFP vector \(\bar{a} \cdot 2^{a\_exp}\), then the output values of \(\bar a\) have the exponent \(a\_exp + acc\_shr\).
For accumulation to make sense mathematically, \(acc\_shr\), \(b\_shr\) and \(c\_shr\) must be chosen such that \( a\_exp + acc\_shr = b\_exp + c\_exp + b\_shr + c\_shr \).
The function xs3_vect_complex_s32_conj_macc_prepare() can be used to obtain values for \(a\_exp\), \(acc\_shr\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(a\_exp\), \(b\_exp\) and \(c\_exp\) and the input headrooms \(a\_hr\), \(b\_hr\) and \(c\_hr\).
 Parameters
acc – [inout] Complex accumulator \(\bar a\)
b – [in] Complex input vector \(\bar b\)
c – [in] Complex input vector \(\bar c\)
length – [in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)
acc_shr – [in] Signed arithmetic right-shift applied to accumulator elements.
b_shr – [in] Signed arithmetic right-shift applied to elements of \(\bar b\)
c_shr – [in] Signed arithmetic right-shift applied to elements of \(\bar c\)
 Throws
ET_LOAD_STORE – Raised if acc, b or c is not word-aligned (See Note: Vector Alignment)
 Returns
Headroom of the output vector \(\bar a\)

headroom_t xs3_vect_complex_s32_conj_nmacc(complex_s32_t acc[], const complex_s32_t b[], const complex_s32_t c[], const unsigned length, const right_shift_t acc_shr, const right_shift_t b_shr, const right_shift_t c_shr)
Multiply one complex 32-bit vector element-wise by the complex conjugate of another, and subtract the result from an accumulator.
acc[] represents the complex 32-bit accumulator mantissa vector \(\bar a\). Each \(a_k\) is acc[k].
b[] and c[] represent the complex 32-bit input mantissa vectors \(\bar b\) and \(\bar c\), where each \(b_k\) is b[k] and each \(c_k\) is c[k].
Each of the input vectors must begin at a word-aligned address.
length is the number of elements in each of the vectors.
acc_shr, b_shr and c_shr are the signed arithmetic right-shifts applied to input elements \(a_k\), \(b_k\) and \(c_k\).
 Operation Performed:
 \[\begin{split}\begin{align*} & \tilde{b}_k \leftarrow sat_{32}( b_k \cdot 2^{-b\_shr} ) \\ & \tilde{c}_k \leftarrow sat_{32}( c_k \cdot 2^{-c\_shr} ) \\ & \tilde{a}_k \leftarrow sat_{32}( a_k \cdot 2^{-acc\_shr} ) \\ & v_k \leftarrow round( sat_{32}( ( Re\{\tilde{b}_k\} \cdot Re\{\tilde{c}_k\} + Im\{\tilde{b}_k\} \cdot Im\{\tilde{c}_k\} ) \cdot 2^{-30}) ) \\ & s_k \leftarrow round( sat_{32}( ( Im\{\tilde{b}_k\} \cdot Re\{\tilde{c}_k\} - Re\{\tilde{b}_k\} \cdot Im\{\tilde{c}_k\} ) \cdot 2^{-30}) ) \\ & Re\{a_k\} \leftarrow sat_{32}( Re\{\tilde{a}_k\} - v_k ) \\ & Im\{a_k\} \leftarrow sat_{32}( Im\{\tilde{a}_k\} - s_k ) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
 Block Floating-Point

If inputs \(\bar b\) and \(\bar c\) are the mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), and input \(\bar a\) is the accumulator BFP vector \(\bar{a} \cdot 2^{a\_exp}\), then the output values of \(\bar a\) have the exponent \(a\_exp + acc\_shr\).
For accumulation to make sense mathematically, \(acc\_shr\), \(b\_shr\) and \(c\_shr\) must be chosen such that \( a\_exp + acc\_shr = b\_exp + c\_exp + b\_shr + c\_shr \).
The function xs3_vect_complex_s32_conj_nmacc_prepare() can be used to obtain values for \(a\_exp\), \(acc\_shr\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(a\_exp\), \(b\_exp\) and \(c\_exp\) and the input headrooms \(a\_hr\), \(b\_hr\) and \(c\_hr\).
 Parameters
acc – [inout] Complex accumulator \(\bar a\)
b – [in] Complex input vector \(\bar b\)
c – [in] Complex input vector \(\bar c\)
length – [in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)
acc_shr – [in] Signed arithmetic right-shift applied to accumulator elements.
b_shr – [in] Signed arithmetic right-shift applied to elements of \(\bar b\)
c_shr – [in] Signed arithmetic right-shift applied to elements of \(\bar c\)
 Throws
ET_LOAD_STORE – Raised if acc, b or c is not word-aligned (See Note: Vector Alignment)
 Returns
Headroom of the output vector \(\bar a\)
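The four multiply-accumulate variants (macc, nmacc, conj_macc and conj_nmacc) differ only in whether \(\bar c\) is conjugated and whether the product is added to or subtracted from the accumulator, so one host-side model can cover all of them. This sketch follows the formulas in this group; the function name and the conj_c/subtract flags are hypothetical, and round-half-up rounding with symmetric saturation is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for complex_s32_t (fields re and im). */
typedef struct { int32_t re; int32_t im; } cplx32_t;

/* sat_32, assumed here to clamp symmetrically to +/-(2^31 - 1). */
static int32_t sat32(int64_t x)
{
    if (x > INT32_MAX) return INT32_MAX;
    if (x < -INT32_MAX) return -INT32_MAX;
    return (int32_t)x;
}

/* sat_32(floor(x * 2^-shr)); a negative shr shifts left instead. */
static int32_t shr_sat32(int32_t x, int shr)
{
    int64_t v = x;
    if (shr >= 0) {
        if (shr > 63) shr = 63;
        v = (v >= 0) ? (v >> shr) : -((-v - 1) >> shr) - 1;  /* floor division */
    } else {
        int shl = (shr < -31) ? 31 : -shr;
        v *= (int64_t)1 << shl;
    }
    return sat32(v);
}

/* x * 2^-30 rounded half-up, then saturated (rounding mode is an assumption). */
static int32_t round_shr30_sat32(int64_t x)
{
    int64_t r = x + ((int64_t)1 << 29);
    r = (r >= 0) ? (r >> 30) : -((-r - 1) >> 30) - 1;
    return sat32(r);
}

/* conj_c != 0 uses conj(c_k); subtract != 0 subtracts the product (nmacc forms). */
static void ref_complex_s32_macc_family(cplx32_t acc[], const cplx32_t b[], const cplx32_t c[],
                                        unsigned length, int acc_shr, int b_shr, int c_shr,
                                        int conj_c, int subtract)
{
    assert(length == 0 || (acc && b && c));
    for (unsigned k = 0; k < length; ++k) {
        int64_t br = shr_sat32(b[k].re, b_shr), bi = shr_sat32(b[k].im, b_shr);
        int64_t cr = shr_sat32(c[k].re, c_shr), ci = shr_sat32(c[k].im, c_shr);
        int64_t ar = shr_sat32(acc[k].re, acc_shr), ai = shr_sat32(acc[k].im, acc_shr);
        if (conj_c)
            ci = -ci;                                      /* conj(c_k) */
        int64_t v = round_shr30_sat32(br * cr - bi * ci);  /* real part of product */
        int64_t s = round_shr30_sat32(bi * cr + br * ci);  /* imaginary part */
        if (subtract) { v = -v; s = -s; }
        acc[k].re = sat32(ar + v);
        acc[k].im = sat32(ai + s);
    }
}
```

Negating \(Im\{\tilde{c}_k\}\) turns the macc expressions for \(v_k\) and \(s_k\) into the conj_macc ones, which is why a single flag is enough here.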

headroom_t xs3_vect_complex_s32_mag(int32_t a[], const complex_s32_t b[], const unsigned length, const right_shift_t b_shr, const complex_s32_t *rot_table, const unsigned table_rows)
Compute the magnitude of each element of a complex 32-bit vector.
a[] represents the real 32-bit output mantissa vector \(\bar a\). b[] represents the complex 32-bit input mantissa vector \(\bar b\). a[] and b[] must each begin at a word-aligned address.
length is the number of elements in each of the vectors.
b_shr is the signed arithmetic right-shift applied to elements of \(\bar b\).
rot_table must point to a precomputed table of complex vectors used in calculating the magnitudes. table_rows is the number of rows in the table.
This library is distributed with a default version of the required rotation table. The following symbols can be used to refer to it in user code:
const extern unsigned rot_table32_rows;
const extern complex_s32_t rot_table32[30][4];
Faster computation (with reduced precision) can be achieved by generating a smaller version of the table. A Python script is provided to generate this table.
 Todo:
Point to documentation page on generating this table.
 Operation Performed:
 \[\begin{split}\begin{align*} & v_k \leftarrow b_k \cdot 2^{-b\_shr} \\ & a_k \leftarrow \sqrt { {\left( Re\{v_k\} \right)}^2 + {\left( Im\{v_k\} \right)}^2 } \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
 Block Floating-Point

If \(\bar b\) holds the complex 32-bit mantissas of a BFP vector \( \bar{b} \cdot 2^{b\_exp} \), then the resulting vector \(\bar a\) holds the real 32-bit mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + b\_shr\).
The function xs3_vect_complex_s32_mag_prepare() can be used to obtain values for \(a\_exp\) and \(b\_shr\) based on the input exponent \(b\_exp\) and headroom \(b\_hr\).
 Parameters
a – [out] Real output vector \(\bar a\)
b – [in] Complex input vector \(\bar b\)
length – [in] Number of elements in vectors \(\bar a\) and \(\bar b\)
b_shr – [in] Right-shift applied to \(\bar b\)
rot_table – [in] Precomputed rotation table required for calculating magnitudes
table_rows – [in] Number of rows in rot_table
 Throws
ET_LOAD_STORE – Raised if a or b is not word-aligned (See Note: Vector Alignment)
 Returns
Headroom of the output vector \(\bar a\).
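When checking the accuracy of a reduced rotation table, an exact reference magnitude is useful. This sketch does not use rot_table at all: it computes an integer square root of \(Re\{v_k\}^2 + Im\{v_k\}^2\), so it models the result, not the library's table-driven algorithm. Flooring and saturating the shifted inputs, truncating the square root, and clamping to INT32_MAX are assumptions; cplx32_t, isqrt_u64 and ref_complex_s32_mag are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for complex_s32_t (fields re and im). */
typedef struct { int32_t re; int32_t im; } cplx32_t;

/* sat_32, assumed here to clamp symmetrically to +/-(2^31 - 1). */
static int32_t sat32(int64_t x)
{
    if (x > INT32_MAX) return INT32_MAX;
    if (x < -INT32_MAX) return -INT32_MAX;
    return (int32_t)x;
}

/* sat_32(floor(x * 2^-shr)); a negative shr shifts left instead. */
static int32_t shr_sat32(int32_t x, int shr)
{
    int64_t v = x;
    if (shr >= 0) {
        if (shr > 63) shr = 63;
        v = (v >= 0) ? (v >> shr) : -((-v - 1) >> shr) - 1;  /* floor division */
    } else {
        int shl = (shr < -31) ? 31 : -shr;
        v *= (int64_t)1 << shl;
    }
    return sat32(v);
}

/* floor(sqrt(n)) by the bit-by-bit method (no floating point, no libm). */
static uint64_t isqrt_u64(uint64_t n)
{
    uint64_t res = 0, bit = (uint64_t)1 << 62;
    while (bit > n)
        bit >>= 2;
    while (bit != 0) {
        if (n >= res + bit) {
            n -= res + bit;
            res = (res >> 1) + bit;
        } else {
            res >>= 1;
        }
        bit >>= 2;
    }
    return res;
}

/* Exact-magnitude reference for checking results (hypothetical name). */
static void ref_complex_s32_mag(int32_t a[], const cplx32_t b[], unsigned length, int b_shr)
{
    assert(length == 0 || (a && b));
    for (unsigned k = 0; k < length; ++k) {
        int64_t re = shr_sat32(b[k].re, b_shr);
        int64_t im = shr_sat32(b[k].im, b_shr);
        uint64_t m = isqrt_u64((uint64_t)(re * re) + (uint64_t)(im * im));
        a[k] = (m > INT32_MAX) ? INT32_MAX : (int32_t)m;  /* |b_k| may exceed INT32_MAX */
    }
}
```

For example, 3 + 4j and -6 + 8j give 5 and 10, and 12 + 16j with b_shr = 2 gives 5.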

headroom_t xs3_vect_complex_s32_mul(complex_s32_t a[], const complex_s32_t b[], const complex_s32_t c[], const unsigned length, const right_shift_t b_shr, const right_shift_t c_shr)
Multiply one complex 32-bit vector element-wise by another.
a[], b[] and c[] represent the complex 32-bit mantissa vectors \(\bar a\), \(\bar b\) and \(\bar c\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[] or c[].
length is the number of elements in each of the vectors.
b_shr and c_shr are the signed arithmetic right-shifts applied to each element of \(\bar b\) and \(\bar c\) respectively.
 Operation Performed:
 \[\begin{split}\begin{align*} & b_k' \leftarrow sat_{32}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & c_k' \leftarrow sat_{32}(\lfloor c_k \cdot 2^{-c\_shr} \rfloor) \\ & Re\{a_k\} \leftarrow \left( Re\{b_k'\} \cdot Re\{c_k'\} - Im\{b_k'\} \cdot Im\{c_k'\} \right) \cdot 2^{-30} \\ & Im\{a_k\} \leftarrow \left( Im\{b_k'\} \cdot Re\{c_k'\} + Re\{b_k'\} \cdot Im\{c_k'\} \right) \cdot 2^{-30} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
 Block Floating-Point

If \(\bar b\) and \(\bar c\) hold the complex 32-bit mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) holds the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + c\_exp + b\_shr + c\_shr\).
The function xs3_vect_complex_s32_mul_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).
 Parameters
a – [out] Complex output vector \(\bar a\)
b – [in] Complex input vector \(\bar b\)
c – [in] Complex input vector \(\bar c\)
length – [in] Number of elements in vectors \(\bar a\), \(\bar b\), and \(\bar c\)
b_shr – [in] Right-shift applied to \(\bar b\)
c_shr – [in] Right-shift applied to \(\bar c\)
 Throws
ET_LOAD_STORE – Raised if a, b or c is not word-aligned (See Note: Vector Alignment)
 Returns
Headroom of the output vector \(\bar a\)
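A host-side model of the element-wise complex product follows the formula above with 64-bit intermediates. As in the other sketches in this section, round-half-up before saturation is an assumption, and cplx32_t and ref_complex_s32_mul are hypothetical names rather than library API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for complex_s32_t (fields re and im). */
typedef struct { int32_t re; int32_t im; } cplx32_t;

/* sat_32, assumed here to clamp symmetrically to +/-(2^31 - 1). */
static int32_t sat32(int64_t x)
{
    if (x > INT32_MAX) return INT32_MAX;
    if (x < -INT32_MAX) return -INT32_MAX;
    return (int32_t)x;
}

/* sat_32(floor(x * 2^-shr)); a negative shr shifts left instead. */
static int32_t shr_sat32(int32_t x, int shr)
{
    int64_t v = x;
    if (shr >= 0) {
        if (shr > 63) shr = 63;
        v = (v >= 0) ? (v >> shr) : -((-v - 1) >> shr) - 1;  /* floor division */
    } else {
        int shl = (shr < -31) ? 31 : -shr;
        v *= (int64_t)1 << shl;
    }
    return sat32(v);
}

/* x * 2^-30 rounded half-up, then saturated (rounding mode is an assumption). */
static int32_t round_shr30_sat32(int64_t x)
{
    int64_t r = x + ((int64_t)1 << 29);
    r = (r >= 0) ? (r >> 30) : -((-r - 1) >> 30) - 1;
    return sat32(r);
}

/* Host-side model of a_k = b_k * c_k (hypothetical name). */
static void ref_complex_s32_mul(cplx32_t a[], const cplx32_t b[], const cplx32_t c[],
                                unsigned length, int b_shr, int c_shr)
{
    assert(length == 0 || (a && b && c));
    for (unsigned k = 0; k < length; ++k) {
        int64_t br = shr_sat32(b[k].re, b_shr), bi = shr_sat32(b[k].im, b_shr);
        int64_t cr = shr_sat32(c[k].re, c_shr), ci = shr_sat32(c[k].im, c_shr);
        /* b * c = (br*cr - bi*ci) + j(bi*cr + br*ci) */
        a[k].re = round_shr30_sat32(br * cr - bi * ci);
        a[k].im = round_shr30_sat32(bi * cr + br * ci);
    }
}
```

With \(2^{30}\) standing for 1.0, \((1 + j) \cdot j = -1 + j\), and multiplying any element by 1.0 leaves it unchanged.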

headroom_t xs3_vect_complex_s32_real_mul(complex_s32_t a[], const complex_s32_t b[], const int32_t c[], const unsigned length, const right_shift_t b_shr, const right_shift_t c_shr)
Multiply a complex 32-bit vector element-wise by a real 32-bit vector.
a[] and b[] represent the complex 32-bit mantissa vectors \(\bar a\) and \(\bar b\) respectively. c[] represents the real 32-bit mantissa vector \(\bar c\). a[], b[] and c[] must each begin at a word-aligned address. This operation can be performed safely in-place on b[].
length is the number of elements in each of the vectors.
b_shr and c_shr are the signed arithmetic right-shifts applied to each element of \(\bar b\) and \(\bar c\) respectively.
 Operation Performed:
 \[\begin{split}\begin{align*} & b_k' \leftarrow sat_{32}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & c_k' \leftarrow sat_{32}(\lfloor c_k \cdot 2^{-c\_shr} \rfloor) \\ & Re\{a_k\} \leftarrow \left( Re\{b_k'\} \cdot c_k' \right) \cdot 2^{-30} \\ & Im\{a_k\} \leftarrow \left( Im\{b_k'\} \cdot c_k' \right) \cdot 2^{-30} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
 Block Floating-Point

If \(\bar b\) holds the complex 32-bit mantissas of BFP vector \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar c\) holds the real 32-bit mantissas of BFP vector \(\bar{c} \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) holds the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + c\_exp + b\_shr + c\_shr\).
The function xs3_vect_complex_s32_real_mul_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).
 Parameters
a – [out] Complex output vector \(\bar a\).
b – [in] Complex input vector \(\bar b\).
c – [in] Real input vector \(\bar c\).
length – [in] Number of elements in vectors \(\bar a\), \(\bar b\), and \(\bar c\).
b_shr – [in] Right-shift applied to \(\bar b\).
c_shr – [in] Right-shift applied to \(\bar c\).
 Throws
ET_LOAD_STORE – Raised if a, b or c is not word-aligned (See Note: Vector Alignment)
 Returns
Headroom of the output vector \(\bar a\).
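Because \(\bar c\) is real, each output element is just the complex input with both parts scaled by the same \(c_k'\). The sketch below models that on a host; round-half-up before saturation is assumed, and cplx32_t and ref_complex_s32_real_mul are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for complex_s32_t (fields re and im). */
typedef struct { int32_t re; int32_t im; } cplx32_t;

/* sat_32, assumed here to clamp symmetrically to +/-(2^31 - 1). */
static int32_t sat32(int64_t x)
{
    if (x > INT32_MAX) return INT32_MAX;
    if (x < -INT32_MAX) return -INT32_MAX;
    return (int32_t)x;
}

/* sat_32(floor(x * 2^-shr)); a negative shr shifts left instead. */
static int32_t shr_sat32(int32_t x, int shr)
{
    int64_t v = x;
    if (shr >= 0) {
        if (shr > 63) shr = 63;
        v = (v >= 0) ? (v >> shr) : -((-v - 1) >> shr) - 1;  /* floor division */
    } else {
        int shl = (shr < -31) ? 31 : -shr;
        v *= (int64_t)1 << shl;
    }
    return sat32(v);
}

/* x * 2^-30 rounded half-up, then saturated (rounding mode is an assumption). */
static int32_t round_shr30_sat32(int64_t x)
{
    int64_t r = x + ((int64_t)1 << 29);
    r = (r >= 0) ? (r >> 30) : -((-r - 1) >> 30) - 1;
    return sat32(r);
}

/* Host-side model of a_k = b_k * c_k with real c_k (hypothetical name). */
static void ref_complex_s32_real_mul(cplx32_t a[], const cplx32_t b[], const int32_t c[],
                                     unsigned length, int b_shr, int c_shr)
{
    assert(length == 0 || (a && b && c));
    for (unsigned k = 0; k < length; ++k) {
        int64_t br = shr_sat32(b[k].re, b_shr), bi = shr_sat32(b[k].im, b_shr);
        int64_t ck = shr_sat32(c[k], c_shr);
        a[k].re = round_shr30_sat32(br * ck);   /* both parts use the same real factor */
        a[k].im = round_shr30_sat32(bi * ck);
    }
}
```

For example, with \(2^{30}\) standing for 1.0, c = 1.0 and c_shr = 1 halve the input, so 1 - j becomes 0.5 - 0.5j.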

headroom_t xs3_vect_complex_s32_real_scale(complex_s32_t a[], const complex_s32_t b[], const int32_t c, const unsigned length, const right_shift_t b_shr, const right_shift_t c_shr)
Multiply a complex 32-bit vector by a real scalar.
a[] and b[] represent the complex 32-bit mantissa vectors \(\bar a\) and \(\bar b\) respectively. c represents the real 32-bit scale factor \(c\). a[] and b[] must each begin at a word-aligned address. This operation can be performed safely in-place on b[].
length is the number of elements in each of the vectors.
b_shr and c_shr are the signed arithmetic right-shifts applied to each element of \(\bar b\) and to \(c\).
 Operation Performed:
 \[\begin{split}\begin{align*} & b_k' \leftarrow sat_{32}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & Re\{a_k\} \leftarrow Re\{b_k'\} \cdot c \\ & Im\{a_k\} \leftarrow Im\{b_k'\} \cdot c \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
 Block Floating-Point

If \(\bar b\) holds the complex 32-bit mantissas of BFP vector \( \bar{b} \cdot 2^{b\_exp} \) and \(c\) is the real 32-bit mantissa of floating-point value \(c \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) holds the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + c\_exp + b\_shr + c\_shr\).
The function xs3_vect_complex_s32_real_scale_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).
 Parameters
a – [out] Complex output vector \(\bar a\)
b – [in] Complex input vector \(\bar b\)
c – [in] Real input scalar \(c\)
length – [in] Number of elements in vectors \(\bar a\) and \(\bar b\)
b_shr – [in] Right-shift applied to \(\bar b\)
c_shr – [in] Right-shift applied to \(c\)
 Throws
ET_LOAD_STORE – Raised if a or b is not word-aligned (See Note: Vector Alignment)
 Returns
Headroom of the output vector \(\bar a\).

headroom_t xs3_vect_complex_s32_scale(complex_s32_t a[], const complex_s32_t b[], const int32_t c_real, const int32_t c_imag, const unsigned length, const right_shift_t b_shr, const right_shift_t c_shr)
Multiply a complex 32-bit vector by a complex 32-bit scalar.
a[] and b[] represent the complex 32-bit mantissa vectors \(\bar a\) and \(\bar b\) respectively. c_real and c_imag are the real and imaginary parts of the complex 32-bit scale factor \(c\). a[] and b[] must each begin at a word-aligned address. This operation can be performed safely in-place on b[].
length is the number of elements in each of the vectors.
b_shr and c_shr are the signed arithmetic right-shifts applied to each element of \(\bar b\) and to \(c\).
 Operation Performed:
 \[\begin{split}\begin{align*} & b_k' \leftarrow sat_{32}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & Re\{a_k\} \leftarrow \left( Re\{b_k'\} \cdot Re\{c\} - Im\{b_k'\} \cdot Im\{c\} \right) \cdot 2^{-30} \\ & Im\{a_k\} \leftarrow \left( Re\{b_k'\} \cdot Im\{c\} + Im\{b_k'\} \cdot Re\{c\} \right) \cdot 2^{-30} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
 Block Floating-Point

If \(\bar b\) holds the complex 32-bit mantissas of a BFP vector \( \bar{b} \cdot 2^{b\_exp} \) and \(c\) is the complex 32-bit mantissa of floating-point value \(c \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) holds the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + c\_exp + b\_shr + c\_shr\).
The function xs3_vect_complex_s32_mul_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).
 Parameters
a – [out] Complex output vector \(\bar a\).
b – [in] Complex input vector \(\bar b\).
c_real – [in] Real part of \(c\)
c_imag – [in] Imaginary part of \(c\)
length – [in] Number of elements in vectors \(\bar a\) and \(\bar b\).
b_shr – [in] Right-shift applied to \(\bar b\).
c_shr – [in] Right-shift applied to \(c\).
 Throws
ET_LOAD_STORE – Raised if a or b is not word-aligned (See Note: Vector Alignment)
 Returns
Headroom of the output vector \(\bar a\).
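A host-side model of scaling by a complex scalar is sketched below. The formula above does not show \(c\_shr\), but the parameter description says it is applied to \(c\), so this sketch applies it once to both c_real and c_imag; that reading, round-half-up rounding, and symmetric saturation are assumptions. cplx32_t and ref_complex_s32_scale are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for complex_s32_t (fields re and im). */
typedef struct { int32_t re; int32_t im; } cplx32_t;

/* sat_32, assumed here to clamp symmetrically to +/-(2^31 - 1). */
static int32_t sat32(int64_t x)
{
    if (x > INT32_MAX) return INT32_MAX;
    if (x < -INT32_MAX) return -INT32_MAX;
    return (int32_t)x;
}

/* sat_32(floor(x * 2^-shr)); a negative shr shifts left instead. */
static int32_t shr_sat32(int32_t x, int shr)
{
    int64_t v = x;
    if (shr >= 0) {
        if (shr > 63) shr = 63;
        v = (v >= 0) ? (v >> shr) : -((-v - 1) >> shr) - 1;  /* floor division */
    } else {
        int shl = (shr < -31) ? 31 : -shr;
        v *= (int64_t)1 << shl;
    }
    return sat32(v);
}

/* x * 2^-30 rounded half-up, then saturated (rounding mode is an assumption). */
static int32_t round_shr30_sat32(int64_t x)
{
    int64_t r = x + ((int64_t)1 << 29);
    r = (r >= 0) ? (r >> 30) : -((-r - 1) >> 30) - 1;
    return sat32(r);
}

/* Host-side model of a_k = b_k * c for a complex scalar c (hypothetical name). */
static void ref_complex_s32_scale(cplx32_t a[], const cplx32_t b[], int32_t c_real,
                                  int32_t c_imag, unsigned length, int b_shr, int c_shr)
{
    assert(length == 0 || (a && b));
    /* Assumption: c_shr is applied once to both parts of c. */
    int64_t cr = shr_sat32(c_real, c_shr), ci = shr_sat32(c_imag, c_shr);
    for (unsigned k = 0; k < length; ++k) {
        int64_t br = shr_sat32(b[k].re, b_shr), bi = shr_sat32(b[k].im, b_shr);
        a[k].re = round_shr30_sat32(br * cr - bi * ci);
        a[k].im = round_shr30_sat32(br * ci + bi * cr);
    }
}
```

With \(2^{30}\) standing for 1.0, scaling \(1 + j\) by \(j\) gives \(-1 + j\); with c_shr = 1 the scalar becomes \(0.5j\) and the result halves.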

void xs3_vect_complex_s32_set(complex_s32_t a[], const int32_t b_real, const int32_t b_imag, const unsigned length)
Set each element of a complex 32-bit vector to a specified value.
a[] represents a complex 32-bit vector \(\bar a\). a[] must begin at a word-aligned address.
b_real and b_imag are the real and imaginary parts to which each element will be set.
length is the number of elements in a[].
 Operation Performed:
 \[\begin{split}\begin{align*} & a_k \leftarrow b\_real + j\cdot b\_imag \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \\ & \qquad\text{ where } j^2 = -1 \end{align*}\end{split}\]
 Block Floating-Point

If \(b\) is the mantissa of floatingpoint value \(b \cdot 2^{b\_exp}\), then the output vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp\).
 Parameters
a – [out] Complex output vector \(\bar a\)
b_real – [in] Value to set real part of elements of \(\bar a\) to
b_imag – [in] Value to set imaginary part of elements of \(\bar a\) to
length – [in] Number of elements in \(\bar a\)
 Throws
ET_LOAD_STORE – Raised if a is not word-aligned (See Note: Vector Alignment)

headroom_t xs3_vect_complex_s32_shl(complex_s32_t a[], const complex_s32_t b[], const unsigned length, const left_shift_t b_shl)
Left-shift each element of a complex 32-bit vector by a specified number of bits.
a[] and b[] represent the complex 32-bit mantissa vectors \(\bar a\) and \(\bar b\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[]. length is the number of elements in \(\bar a\) and \(\bar b\). b_shl is the signed arithmetic left-shift applied to each element of \(\bar b\).
 Operation Performed:
 \[\begin{split}\begin{align*} & Re\{a_k\} \leftarrow sat_{32}(\lfloor Re\{b_k\} \cdot 2^{b\_shl} \rfloor) \\ & Im\{a_k\} \leftarrow sat_{32}(\lfloor Im\{b_k\} \cdot 2^{b\_shl} \rfloor) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
 Block Floating-Point

If \(\bar b\) are the complex 32-bit mantissas of a BFP vector \( \bar{b} \cdot 2^{b\_exp} \), then the resulting vector \(\bar a\) are the complex 32-bit mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(\bar{a} = \bar{b} \cdot 2^{b\_shl}\) and \(a\_exp = b\_exp\).
 Parameters
a – [out] Complex output vector \(\bar a\)
b – [in] Complex input vector \(\bar b\)
length – [in] Number of elements in vector \(\bar b\)
b_shl – [in] Left-shift applied to \(\bar b\)
 Throws
ET_LOAD_STORE – Raised if a or b is not word-aligned (See Note: Vector Alignment)
 Returns
Headroom of the output vector \(\bar a\)

headroom_t xs3_vect_complex_s32_shr(complex_s32_t a[], const complex_s32_t b[], const unsigned length, const right_shift_t b_shr)
Right-shift each element of a complex 32-bit vector by a specified number of bits.
a[] and b[] represent the complex 32-bit mantissa vectors \(\bar a\) and \(\bar b\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[]. length is the number of elements in \(\bar a\) and \(\bar b\). b_shr is the signed arithmetic right-shift applied to each element of \(\bar b\).
 Operation Performed:
 \[\begin{split}\begin{align*} & Re\{a_k\} \leftarrow sat_{32}(\lfloor Re\{b_k\} \cdot 2^{-b\_shr} \rfloor) \\ & Im\{a_k\} \leftarrow sat_{32}(\lfloor Im\{b_k\} \cdot 2^{-b\_shr} \rfloor) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
 Block Floating-Point

If \(\bar b\) are the complex 32-bit mantissas of a BFP vector \( \bar{b} \cdot 2^{b\_exp} \), then the resulting vector \(\bar a\) are the complex 32-bit mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(\bar{a} = \bar{b} \cdot 2^{-b\_shr}\) and \(a\_exp = b\_exp\).
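For host-side testing it can help to model the per-element arithmetic in plain C. The sketch below is a hypothetical reference model, not part of this API: the names ref_shr_s32() and sat32() are illustrative, and it assumes that \(sat_{32}\) clamps symmetrically to \(\pm(2^{31}-1)\) and that the floor is taken exactly as written in the formula above.

```c
#include <stdint.h>

/* Hypothetical model of sat_32(). Assumption: symmetric clamping to
 * +/-(2^31 - 1), so INT32_MIN is never produced. */
static int32_t sat32(int64_t x)
{
    if (x > INT32_MAX) return INT32_MAX;
    if (x < -INT32_MAX) return -INT32_MAX;
    return (int32_t) x;
}

/* Hypothetical reference model of sat_32(floor(x * 2^-shr)). */
int32_t ref_shr_s32(int32_t x, int shr)
{
    if (shr >= 0) {
        /* Shifts beyond 31 give the same floor result (0 or -1). */
        if (shr > 31) shr = 31;
        int64_t d = (int64_t) 1 << shr;
        int64_t q = (int64_t) x / d;          /* C division truncates toward zero */
        if ((int64_t) x % d != 0 && x < 0)
            q -= 1;                           /* adjust truncation to floor */
        return sat32(q);
    }
    /* Negative shr is a left shift. Any non-zero x saturates by 32 bits,
     * and |x| * 2^32 still fits in int64_t. */
    if (shr < -32) shr = -32;
    return sat32((int64_t) x * ((int64_t) 1 << -shr));
}
```

For a complex element the model is applied to the real and imaginary parts independently. Per the two formulas, a negative shr reproduces the left-shift behaviour of xs3_vect_complex_s32_shl() with \(b\_shl = -shr\).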
 Parameters
a – [out] Complex output vector \(\bar a\)
b – [in] Complex input vector \(\bar b\)
length – [in] Number of elements in vector \(\bar b\)
b_shr – [in] Right-shift applied to \(\bar b\)
 Throws
ET_LOAD_STORE – Raised if a or b is not word-aligned (See Note: Vector Alignment)
 Returns
Headroom of the output vector \(\bar a\)

headroom_t xs3_vect_complex_s32_squared_mag(int32_t a[], const complex_s32_t b[], const unsigned length, const right_shift_t b_shr)
Computes the squared magnitudes of elements of a complex 32-bit vector.
a[] represents the real 32-bit mantissa vector \(\bar a\). b[] represents the complex 32-bit mantissa vector \(\bar b\). Each must begin at a word-aligned address. length is the number of elements in each of the vectors. b_shr is the signed arithmetic right-shift applied to each element of \(\bar b\).
 Operation Performed:
 \[\begin{split}\begin{align*} & b_k' \leftarrow sat_{32}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & a_k \leftarrow ((Re\{b_k'\})^2 + (Im\{b_k'\})^2)\cdot 2^{-30} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
 Block Floating-Point

If \(\bar b\) are the complex 32-bit mantissas of a BFP vector \( \bar{b} \cdot 2^{b\_exp} \), then the resulting vector \(\bar a\) are the real 32-bit mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = 2 \cdot (b\_exp + b\_shr)\).
The function xs3_vect_complex_s32_squared_mag_prepare() can be used to obtain values for \(a\_exp\) and \(b\_shr\) based on the input exponent \(b\_exp\) and headroom \(b\_hr\).
 Parameters
a – [out] Real output vector \(\bar a\)
b – [in] Complex input vector \(\bar b\)
length – [in] Number of elements in vectors \(\bar a\) and \(\bar b\)
b_shr – [in] Right-shift applied to \(\bar b\)
 Throws
ET_LOAD_STORE – Raised if a is not double word-aligned or b is not word-aligned (See Note: Vector Alignment)

headroom_t xs3_vect_complex_s32_sub(complex_s32_t a[], const complex_s32_t b[], const complex_s32_t c[], const unsigned length, const right_shift_t b_shr, const right_shift_t c_shr)
Subtract one complex 32-bit vector from another.
a[], b[] and c[] represent the complex 32-bit mantissa vectors \(\bar a\), \(\bar b\) and \(\bar c\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[] or c[]. length is the number of elements in each of the vectors. b_shr and c_shr are the signed arithmetic right-shifts applied to each element of \(\bar b\) and \(\bar c\) respectively.
 Operation Performed:
 \[\begin{split}\begin{align*} & b_k' \leftarrow sat_{32}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & c_k' \leftarrow sat_{32}(\lfloor c_k \cdot 2^{-c\_shr} \rfloor) \\ & Re\{a_k\} \leftarrow Re\{b_k'\} - Re\{c_k'\} \\ & Im\{a_k\} \leftarrow Im\{b_k'\} - Im\{c_k'\} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
 Block Floating-Point

If \(\bar b\) and \(\bar c\) are the complex 32-bit mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the complex 32-bit mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\).
In this case, \(b\_shr\) and \(c\_shr\) must be chosen so that \(a\_exp = b\_exp + b\_shr = c\_exp + c\_shr\). Adding or subtracting mantissas only makes sense if they are associated with the same exponent.
The function xs3_vect_complex_s32_sub_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).
 Parameters
a – [out] Complex output vector \(\bar a\)
b – [in] Complex input vector \(\bar b\)
c – [in] Complex input vector \(\bar c\)
length – [in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)
b_shr – [in] Right-shift applied to \(\bar b\)
c_shr – [in] Right-shift applied to \(\bar c\)
 Throws
ET_LOAD_STORE – Raised if a, b or c is not word-aligned (See Note: Vector Alignment)
 Returns
Headroom of output vector \(\bar a\).

void xs3_vect_complex_s32_sum(complex_s64_t *a, const complex_s32_t b[], const unsigned length, const right_shift_t b_shr)
Compute the sum of elements of a complex 32-bit vector.
a is the complex 64-bit mantissa of the resulting sum. b[] represents the complex 32-bit mantissa vector \(\bar b\). b[] must begin at a word-aligned address. length is the number of elements in \(\bar b\). b_shr is the unsigned arithmetic right-shift applied to each element of \(\bar b\). b_shr cannot be negative.
 Operation Performed:
 \[\begin{split}\begin{align*} & b_k' \leftarrow b_k \cdot 2^{-b\_shr} \\ & Re\{a\} \leftarrow \sum_{k=0}^{length-1} \left( Re\{b_k'\} \right) \\ & Im\{a\} \leftarrow \sum_{k=0}^{length-1} \left( Im\{b_k'\} \right) \end{align*}\end{split}\]
 Block Floating-Point

If \(\bar b\) are the mantissas of BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then \(a\) is the complex 64-bit mantissa of floating-point value \(a \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + b\_shr\).
The function xs3_vect_complex_s32_sum_prepare() can be used to obtain values for \(a\_exp\) and \(b\_shr\) based on the input exponent \(b\_exp\) and the input headroom \(b\_hr\).
 Additional Details

Internally the sum accumulates into four separate complex 40-bit accumulators. These accumulators apply symmetric 40-bit saturation logic (with bounds \(\pm (2^{39}-1)\)) with each added element. At the end, the 4 accumulators are summed together into the 64-bit fields of a. No saturation logic is applied at this final step.
In the most extreme case, each \(b_k\) may be \(-2^{31}\). \(256\) of these added into the same accumulator give \(-2^{39}\), which would saturate to \(-2^{39}+1\), introducing 1 LSb of error (which may or may not be acceptable in a particular circumstance). The final result for each part may then be as large in magnitude as \(4\cdot(-2^{39}+1) = -2^{41}+4\), which fits into a 42-bit signed integer.
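The accumulator behaviour described above can be modelled on a host to see where the 1 LSb error comes from. This is a hypothetical sketch of one part (real or imaginary) of the sum with \(b\_shr = 0\), not the library's implementation; in particular, adding element k to accumulator k mod 4 is an assumption, since the text does not say how elements are distributed among the accumulators.

```c
#include <stdint.h>

/* Bound of one symmetric 40-bit accumulator: +/-(2^39 - 1). */
#define ACC40_MAX ((((int64_t) 1) << 39) - 1)

/* Adds x to acc and clamps the result to +/-ACC40_MAX. */
static int64_t sat40_add(int64_t acc, int32_t x)
{
    int64_t s = acc + x;   /* |acc| <= 2^39 - 1, so this cannot overflow int64_t */
    if (s > ACC40_MAX) return ACC40_MAX;
    if (s < -ACC40_MAX) return -ACC40_MAX;
    return s;
}

/* Hypothetical model of one part (real or imaginary) of the sum.
 * Assumption: element k is added to accumulator k % 4. */
int64_t ref_sum_part(const int32_t b[], unsigned length)
{
    int64_t acc[4] = {0, 0, 0, 0};
    for (unsigned k = 0; k < length; k++)
        acc[k % 4] = sat40_add(acc[k % 4], b[k]);
    /* Final combination in 64 bits, with no saturation. */
    return acc[0] + acc[1] + acc[2] + acc[3];
}
```

With 1024 copies of INT32_MIN, each modelled accumulator receives 256 of them and clamps at \(-(2^{39}-1)\), so the model returns \(-2^{41}+4\) rather than the exact \(-2^{41}\).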
 Parameters
a – [out] Complex sum \(a\)
b – [in] Complex input vector \(\bar b\).
length – [in] Number of elements in vector \(\bar b\).
b_shr – [in] Right-shift applied to \(\bar b\).
 Throws
ET_LOAD_STORE – Raised if b is not word-aligned (See Note: Vector Alignment)

void xs3_vect_complex_s32_tail_reverse(complex_s32_t x[], const unsigned length)
Reverses the order of the tail of a complex 32-bit vector.
Reverses the order of elements in the tail of the complex 32-bit vector \(\bar x\). The tail of \(\bar x\), in this context, is all elements of \(\bar x\) except for \(x_0\). In other words, the first element \(x_0\) remains where it is, and the remaining \(length-1\) elements are rearranged to have their order reversed.
This function is used when performing a forward or inverse FFT on a single sequence of real values (i.e. the mono FFT), and operates in-place on x[].
 Parameter Details

x[] represents the complex 32-bit vector \(\bar x\), which is both an input to and an output of this function. x[] must begin at a word-aligned address. length is the number of elements in \(\bar x\).
 Operation Performed:
 \[\begin{split}\begin{align*} & x_0 \leftarrow x_0 \\ & x_k \leftarrow x_{length - k} \\ & \qquad\text{ for }k\in 1\ ...\ (length-1) \end{align*}\end{split}\]
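The index mapping \(x_k \leftarrow x_{length-k}\) amounts to reversing the sub-array x[1] .. x[length-1] in place. The C sketch below illustrates this on a host; cplx_s32 is a hypothetical stand-in for complex_s32_t, and ref_tail_reverse() is an illustrative name rather than a library function.

```c
#include <stdint.h>

/* Hypothetical stand-in for complex_s32_t (illustration only). */
typedef struct { int32_t re; int32_t im; } cplx_s32;

/* Reference model of x_k <- x_{length-k} for k in 1 .. length-1. */
void ref_tail_reverse(cplx_s32 x[], unsigned length)
{
    if (length < 3)
        return;                     /* a tail of 0 or 1 elements is unchanged */
    unsigned i = 1, j = length - 1;
    while (i < j) {                 /* swap from both ends toward the middle */
        cplx_s32 tmp = x[i];
        x[i] = x[j];
        x[j] = tmp;
        i++;
        j--;
    }
}
```

For example, with length 5 the vector \([x_0, x_1, x_2, x_3, x_4]\) becomes \([x_0, x_4, x_3, x_2, x_1]\).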
 Parameters
x – [inout] Complex vector to have its tail reversed.
length – [in] Number of elements in \(\bar x\)
 Throws
ET_LOAD_STORE – Raised if x is not word-aligned (See Note: Vector Alignment)

headroom_t xs3_vect_complex_s32_conjugate(complex_s32_t a[], const complex_s32_t b[], const unsigned length)
Get the complex conjugate of a complex 32-bit vector.
The complex conjugate of a complex scalar \(z = x + yi\) is \(z^* = x - yi\). This function computes the complex conjugate of each element of \(\bar b\) (negates the imaginary part of each element) and places the result in \(\bar a\).
a[] is the complex 32-bit output vector \(\bar a\). b[] is the complex 32-bit input vector \(\bar b\). Both a and b must point to word-aligned addresses. length is the number of elements in \(\bar a\) and \(\bar b\).
 Operation Performed:
 \[\begin{split}\begin{align*} & Re\{a_k\} \leftarrow Re\{b_k\} \\ & Im\{a_k\} \leftarrow - Im\{b_k\} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
 Parameters
a – [out] Complex 32-bit output vector \(\bar a\)
b – [in] Complex 32-bit input vector \(\bar b\)
length – [in] Number of elements in vectors \(\bar a\) and \(\bar b\)
 Throws
ET_LOAD_STORE – Raised if a or b is not word-aligned (See Note: Vector Alignment)
 Returns
Headroom of the output vector \(\bar a\).

headroom_t xs3_vect_s32_copy(int32_t a[], const int32_t b[], const unsigned length)
Copy one 32-bit vector to another.
This function is effectively a constrained version of memcpy. With the constraints below met, this function should be modestly faster than memcpy.
a[] is the output vector to which elements are copied. b[] is the input vector from which elements are copied. a and b each must begin at a word-aligned address. length is the number of elements to be copied. length must be a multiple of 8.
 Operation Performed:
 \[\begin{split}\begin{align*} & a_k \leftarrow b_k \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
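A caller that cannot guarantee these constraints statically might check them at run time and fall back to memcpy otherwise. The predicate below is a hypothetical helper, not part of this API, and it assumes that "word-aligned" means 4-byte aligned on this 32-bit architecture.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical run-time check of the documented constraints:
 * word-aligned a and b (a word is assumed to be 4 bytes here)
 * and a length that is a multiple of 8. */
bool copy_constraints_met(const void *a, const void *b, unsigned length)
{
    return ((uintptr_t) a % 4u == 0) &&
           ((uintptr_t) b % 4u == 0) &&
           (length % 8u == 0);
}
```

When the check fails, memcpy(a, b, length * sizeof(int32_t)) gives the same copy, although it does not report the headroom of \(\bar a\).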
 Parameters
a – [out] Output vector \(\bar a\)
b – [in] Input vector \(\bar b\)
length – [in] Number of elements in \(\bar a\) and \(\bar b\)
 Throws
ET_LOAD_STORE – Raised if a or b is not word-aligned (See Note: Vector Alignment)
 Returns
Headroom of output vector \(\bar a\)

headroom_t xs3_vect_s32_abs(int32_t a[], const int32_t b[], const unsigned length)
Compute the element-wise absolute value of a 32-bit vector.
a[] and b[] represent the 32-bit vectors \(\bar a\) and \(\bar b\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[]. length is the number of elements in each of the vectors.
 Operation Performed:
 \[\begin{split}\begin{align*} & a_k \leftarrow sat_{32}(\left| b_k \right|) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
 Block Floating-Point

If \(\bar b\) are the mantissas of BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the output vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp\).
 Parameters
a – [out] Output vector \(\bar a\)
b – [in] Input vector \(\bar b\)
length – [in] Number of elements in vectors \(\bar a\) and \(\bar b\)
 Throws
ET_LOAD_STORE – Raised if a or b is not word-aligned (See Note: Vector Alignment)
 Returns
Headroom of the output vector \(\bar a\).

int64_t xs3_vect_s32_abs_sum(const int32_t b[], const unsigned length)
Compute the sum of the absolute values of elements of a 32-bit vector.
b[] represents the 32-bit mantissa vector \(\bar b\). b[] must begin at a word-aligned address. length is the number of elements in \(\bar b\).
 Operation Performed:
 \[\begin{align*} \sum_{k=0}^{length-1} sat_{32}(\left| b_k \right|) \end{align*}\]
 Block Floating-Point

If \(\bar b\) are the mantissas of BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the returned value \(a\) is the 64-bit mantissa of floating-point value \(a \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp\).
 Additional Details

Internally the sum accumulates into 8 separate 40-bit accumulators. These accumulators apply symmetric 40-bit saturation logic (with bounds \(\pm (2^{39}-1)\)) with each added element. At the end, the 8 accumulators are summed together into the 64-bit value \(a\) which is returned by this function. No saturation logic is applied at this final step.
Because symmetric 32-bit saturation logic is applied when computing the absolute value, in the corner case where each element is INT32_MIN, each of the 8 accumulators can accumulate \(256\) elements before saturation is possible. Therefore, with \(b\_hr\) bits of headroom, no saturation of intermediate results is possible with fewer than \(2^{11 + b\_hr}\) elements in \(\bar b\).
If the length of \(\bar b\) is greater than \(2^{11 + b\_hr}\), the sum can be computed piecewise in several calls to this function, with the partial results summed in user code.
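The piecewise approach described above might be sketched as follows. abs_sum_chunked() and ref_abs_sum() are hypothetical names; the function pointer lets the same loop drive xs3_vect_s32_abs_sum() on the device or a plain 64-bit host model (which has no 40-bit accumulators) in tests. The chunk limit \(2^{11 + b\_hr}\) is taken from the text above, with b_hr being the headroom of the whole vector; a sub-vector never has less headroom than the whole vector, so the bound holds for every chunk.

```c
#include <stdint.h>

/* Any function with the shape of xs3_vect_s32_abs_sum() can be passed in. */
typedef int64_t (*abs_sum_fn)(const int32_t b[], const unsigned length);

/* Hypothetical host model: plain 64-bit sum of sat_32(|b_k|),
 * assuming symmetric saturation (|INT32_MIN| becomes INT32_MAX). */
int64_t ref_abs_sum(const int32_t b[], const unsigned length)
{
    int64_t total = 0;
    for (unsigned k = 0; k < length; k++) {
        int64_t v = (b[k] < 0) ? -(int64_t) b[k] : (int64_t) b[k];
        total += (v > INT32_MAX) ? INT32_MAX : v;
    }
    return total;
}

/* Sums |b_k| in chunks of at most 2^(11 + b_hr) elements and adds the
 * partial sums in 64 bits in user code. */
int64_t abs_sum_chunked(abs_sum_fn abs_sum, const int32_t b[],
                        unsigned length, unsigned b_hr)
{
    /* Cap the exponent so the shift is well defined; 2^31 elements per
     * chunk already exceeds any realistic vector. */
    unsigned e = 11u + ((b_hr > 20u) ? 20u : b_hr);
    uint64_t limit = (uint64_t) 1 << e;
    int64_t total = 0;
    while (length > 0) {
        unsigned n = ((uint64_t) length > limit) ? (unsigned) limit : length;
        total += abs_sum(b, n);
        b += n;            /* int32_t steps keep each chunk word-aligned */
        length -= n;
    }
    return total;
}
```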
 Parameters
b – [in] Input vector \(\bar b\)
length – [in] Number of elements in \(\bar b\)
 Throws
ET_LOAD_STORE – Raised if b is not word-aligned (See Note: Vector Alignment)
 Returns
The 64-bit sum \(a\)

headroom_t xs3_vect_s32_add(int32_t a[], const int32_t b[], const int32_t c[], const unsigned length, const right_shift_t b_shr, const right_shift_t c_shr)
Add together two 32-bit vectors.
a[], b[] and c[] represent the 32-bit mantissa vectors \(\bar a\), \(\bar b\) and \(\bar c\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[] or c[]. length is the number of elements in each of the vectors. b_shr and c_shr are the signed arithmetic right-shifts applied to each element of \(\bar b\) and \(\bar c\) respectively.
 Operation Performed:
 \[\begin{split}\begin{align*} & b_k' = sat_{32}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & c_k' = sat_{32}(\lfloor c_k \cdot 2^{-c\_shr} \rfloor) \\ & a_k \leftarrow sat_{32}\!\left( b_k' + c_k' \right) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
 Block Floating-Point

If \(\bar b\) and \(\bar c\) are the mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\).
In this case, \(b\_shr\) and \(c\_shr\) must be chosen so that \(a\_exp = b\_exp + b\_shr = c\_exp + c\_shr\). Adding or subtracting mantissas only makes sense if they are associated with the same exponent.
The function xs3_vect_s32_add_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).
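The alignment constraint above can be satisfied in more than one way. The sketch below shows one plausible strategy; add_prepare_sketch() is a hypothetical name, and its policy (take the larger of the two fully normalized exponents, plus one bit for the carry of the addition) is an assumption, not a description of what xs3_vect_s32_add_prepare() actually does.

```c
/* Hypothetical exponent-alignment helper; not the library's implementation. */
void add_prepare_sketch(int *a_exp, int *b_shr, int *c_shr,
                        int b_exp, int c_exp, unsigned b_hr, unsigned c_hr)
{
    /* Smallest exponent each input can reach by left-shifting out its headroom. */
    int b_min = b_exp - (int) b_hr;
    int c_min = c_exp - (int) c_hr;
    /* The common exponent must be reachable by both inputs without
     * saturation; one extra bit leaves room for the carry of b_k' + c_k'. */
    int e = ((b_min > c_min) ? b_min : c_min) + 1;
    *a_exp = e;
    *b_shr = e - b_exp;   /* so that b_exp + b_shr == a_exp */
    *c_shr = e - c_exp;   /* so that c_exp + c_shr == a_exp */
}
```

With this policy the input that limits the exponent is left-shifted until it has exactly 1 bit of headroom, while the other input may be right-shifted and lose some low-order bits.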
 Parameters
a – [out] Output vector \(\bar a\)
b – [in] Input vector \(\bar b\)
c – [in] Input vector \(\bar c\)
length – [in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)
b_shr – [in] Right-shift applied to \(\bar b\)
c_shr – [in] Right-shift applied to \(\bar c\)
 Throws
ET_LOAD_STORE – Raised if a, b or c is not word-aligned (See Note: Vector Alignment)
 Returns
Headroom of the output vector \(\bar a\).

headroom_t xs3_vect_s32_add_scalar(int32_t a[], const int32_t b[], const int32_t c, const unsigned length, const right_shift_t b_shr)
Add a scalar to a 32-bit vector.
a[] and b[] represent the 32-bit mantissa vectors \(\bar a\) and \(\bar b\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[]. c is the scalar \(c\) to be added to each element of \(\bar b\). length is the number of elements in each of the vectors. b_shr is the signed arithmetic right-shift applied to each element of \(\bar b\).
 Operation Performed:
 \[\begin{split}\begin{align*} & b_k' = sat_{32}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & a_k \leftarrow sat_{32}\!\left( b_k' + c \right) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
 Block Floating-Point

If elements of \(\bar b\) are the mantissas of BFP vector \( \bar{b} \cdot 2^{b\_exp} \), and \(c\) is the mantissa of floating-point value \(c \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\).
In this case, \(b\_shr\) and \(c\_shr\) must be chosen so that \(a\_exp = b\_exp + b\_shr = c\_exp + c\_shr\). Adding or subtracting mantissas only makes sense if they are associated with the same exponent.
The function xs3_vect_s32_add_scalar_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).
Note that \(c\_shr\) is an output of xs3_vect_s32_add_scalar_prepare(), but is not a parameter to this function. The \(c\_shr\) produced by xs3_vect_s32_add_scalar_prepare() is to be applied by the user, and the result passed as input c.
 Parameters
a – [out] Output vector \(\bar a\)
b – [in] Input vector \(\bar b\)
c – [in] Input scalar \(c\)
length – [in] Number of elements in vectors \(\bar a\) and \(\bar b\)
b_shr – [in] Right-shift applied to \(\bar b\)
 Throws
ET_LOAD_STORE – Raised if a or b is not word-aligned (See Note: Vector Alignment)
 Returns
Headroom of the output vector \(\bar a\).

unsigned xs3_vect_s32_argmin(const int32_t b[], const unsigned length)
Obtain the array index of the minimum element of a 32-bit vector.
b[] represents the 32-bit input vector \(\bar b\). It must begin at a word-aligned address. length is the number of elements in \(\bar b\).
 Operation Performed:
 \[\begin{split}\begin{align*} & a \leftarrow argmin_k\{ b_k \} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
 Parameters
b – [in] Input vector \(\bar b\)
length – [in] Number of elements in \(\bar b\)
 Throws
ET_LOAD_STORE – Raised if b is not word-aligned (See Note: Vector Alignment)
 Returns
\(a\), the index of the minimum element of vector \(\bar b\). If there is a tie for the minimum value, the lowest tying index is returned.
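A host-side reference model that follows the tie-breaking rule stated under Returns could look like this; ref_argmin_s32() is a hypothetical name, and the sketch assumes length is at least 1 (it returns 0 for an empty vector).

```c
#include <stdint.h>

/* Hypothetical reference model of the argmin operation above. */
unsigned ref_argmin_s32(const int32_t b[], unsigned length)
{
    unsigned best = 0;
    for (unsigned k = 1; k < length; k++) {
        if (b[k] < b[best])   /* strict '<' keeps the lowest tying index */
            best = k;
    }
    return best;
}
```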

headroom_t xs3_vect_s32_clip(int32_t a[], const int32_t b[], const unsigned length, const int32_t lower_bound, const int32_t upper_bound, const right_shift_t b_shr)
Clamp the elements of a 32-bit vector to a specified range.
a[] and b[] represent the 32-bit vectors \(\bar a\) and \(\bar b\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[]. length is the number of elements in each of the vectors. lower_bound and upper_bound are the lower and upper bounds of the clipping range respectively. These bounds are checked for each element of \(\bar b\) only after b_shr is applied. b_shr is the signed arithmetic right-shift applied to elements of \(\bar b\) before they are compared to the upper and lower bounds.
If \(\bar b\) are the mantissas for a BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the exponent \(a\_exp\) of the output BFP vector \(\bar{a} \cdot 2^{a\_exp}\) is given by \(a\_exp = b\_exp + b\_shr\).
 Operation Performed:
 \[\begin{split}\begin{align*} & b_k' \leftarrow sat_{32}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & a_k \leftarrow \begin{cases} lower\_bound & b_k' \le lower\_bound \\ upper\_bound & b_k' \ge upper\_bound \\ b_k' & otherwise \end{cases} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
 Block Floating-Point

If \(\bar b\) are the mantissas of BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the output vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + b\_shr\).
 Parameters
a – [out] Output vector \(\bar a\)
b – [in] Input vector \(\bar b\)
length – [in] Number of elements in vectors \(\bar a\) and \(\bar b\)
lower_bound – [in] Lower bound of clipping range
upper_bound – [in] Upper bound of clipping range
b_shr – [in] Arithmetic right-shift applied to elements of \(\bar b\) prior to clipping
 Throws
ET_LOAD_STORE – Raised if a or b is not word-aligned (See Note: Vector Alignment)
 Returns
Headroom of output vector \(\bar a\)

int64_t xs3_vect_s32_dot(const int32_t b[], const int32_t c[], const unsigned length, const right_shift_t b_shr, const right_shift_t c_shr)
Compute the inner product between two 32-bit vectors.
b[] and c[] represent the 32-bit mantissa vectors \(\bar b\) and \(\bar c\) respectively. Each must begin at a word-aligned address. length is the number of elements in each of the vectors. b_shr and c_shr are the signed arithmetic right-shifts applied to each element of \(\bar b\) and \(\bar c\) respectively.
 Operation Performed:
 \[\begin{split}\begin{align*} & b_k' \leftarrow sat_{32}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & c_k' \leftarrow sat_{32}(\lfloor c_k \cdot 2^{-c\_shr} \rfloor) \\ & a \leftarrow \sum_{k=0}^{length-1}\left(round( b_k' \cdot c_k' \cdot 2^{-30} ) \right) \\ & \qquad\text{where } a \text{ is returned} \end{align*}\end{split}\]
 Block Floating-Point

If \(\bar b\) and \(\bar c\) are the mantissas of the BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c}\cdot 2^{c\_exp}\), then the result \(a\) is the 64-bit mantissa of the result \(a \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + c\_exp + b\_shr + c\_shr + 30\).
If needed, the bit-depth of \(a\) can then be reduced to 32 bits to get a new result \(a' \cdot 2^{a\_exp'}\) where \(a' = a \cdot 2^{-a\_shr}\) and \(a\_exp' = a\_exp + a\_shr\).
The function xs3_vect_s32_dot_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).
 Additional Details

The 30-bit rounding right-shift applied to each of the 64-bit products \(b_k' \cdot c_k'\) is a feature of the hardware and cannot be avoided. As such, if the input vectors \(\bar b\) and \(\bar c\) together have too much headroom (i.e. \(b\_hr + c\_hr\) is large), the sum may effectively vanish. To avoid this situation, negative values of b_shr and c_shr may be used (with the stipulation that \(b\_shr \ge -b\_hr\) and \(c\_shr \ge -c\_hr\) if saturation of \(b_k'\) and \(c_k'\) is to be avoided). The less headroom \(b_k'\) and \(c_k'\) have, the greater the precision of the final result.
Internally, each product \((b_k' \cdot c_k' \cdot 2^{-30})\) accumulates into one of eight 40-bit accumulators (which are all used simultaneously), which apply symmetric 40-bit saturation logic (with bounds \(\approx 2^{39}\)) with each value added. The saturating arithmetic employed is not associative and no indication is given if saturation occurs at an intermediate step. To avoid saturation errors, length should be no greater than \(2^{10+b\_hr+c\_hr}\), where \(b\_hr\) and \(c\_hr\) are the headroom of \(\bar b\) and \(\bar c\) respectively.
If the caller’s mantissa vectors are longer than that, the full inner product can be found by calling this function multiple times for partial inner products on subsequences of the input vectors, and adding the results in user code.
In many situations the caller may have a priori knowledge that saturation is impossible (or very nearly so), in which case this guideline may be disregarded. However, such situations are application-specific and are well beyond the scope of this documentation, and as such are left to the user’s discretion.
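The length limit above can be checked with a worst-case count. Assuming \(b\_shr = c\_shr = 0\), so that \(b\_hr\) and \(c\_hr\) also describe \(\bar b'\) and \(\bar c'\), and treating \(2^{39}\) as the accumulator bound:
 \[\begin{split}\begin{align*} & \left|b_k'\right| \le 2^{31-b\_hr}, \qquad \left|c_k'\right| \le 2^{31-c\_hr} \\ & \left|b_k' \cdot c_k' \cdot 2^{-30}\right| \le 2^{32-b\_hr-c\_hr} \\ & \text{terms per accumulator: } 2^{39} / 2^{32-b\_hr-c\_hr} = 2^{7+b\_hr+c\_hr} \\ & \text{elements over eight accumulators: } 8 \cdot 2^{7+b\_hr+c\_hr} = 2^{10+b\_hr+c\_hr} \end{align*}\end{split}\]
Setting \(c_k' = b_k'\) (so that \(c\_hr = b\_hr\)) reproduces the \(2^{10+2\cdot b\_hr}\) limit given for xs3_vect_s32_energy().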
 Parameters
b – [in] Input vector \(\bar b\)
c – [in] Input vector \(\bar c\)
length – [in] Number of elements in vectors \(\bar b\) and \(\bar c\)
b_shr – [in] Right-shift applied to \(\bar b\)
c_shr – [in] Right-shift applied to \(\bar c\)
 Throws
ET_LOAD_STORE – Raised if b or c is not word-aligned (See Note: Vector Alignment)
 Returns
The inner product of vectors \(\bar b\) and \(\bar c\), scaled as indicated above.

int64_t xs3_vect_s32_energy(const int32_t b[], const unsigned length, const right_shift_t b_shr)
Calculate the energy (sum of squares of elements) of a 32-bit vector.
b[] represents the 32-bit mantissa vector \(\bar b\). b[] must begin at a word-aligned address. length is the number of elements in \(\bar b\). b_shr is the signed arithmetic right-shift applied to elements of \(\bar b\).
 Operation Performed:
 \[\begin{split}\begin{align*} & b_k' \leftarrow sat_{32}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & a \leftarrow \sum_{k=0}^{length-1} round((b_k')^2 \cdot 2^{-30}) \end{align*}\end{split}\]
 Block Floating-Point

If \(\bar b\) are the mantissas of the BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the floating-point result is \(a \cdot 2^{a\_exp}\), where the 64-bit mantissa \(a\) is returned by this function, and \(a\_exp = 30 + 2 \cdot (b\_exp + b\_shr) \).
The function xs3_vect_s32_energy_prepare() can be used to obtain values for \(a\_exp\) and \(b\_shr\) based on the input exponent \(b\_exp\) and the input headroom \(b\_hr\).
 Additional Details

The 30-bit rounding right-shift applied to each of the 64-bit products \((b_k')^2\) is a feature of the hardware and cannot be avoided. As such, if the input vector \(\bar b\) has too much headroom (i.e. \(2\cdot b\_hr\) is large), the sum may effectively vanish. To avoid this situation, negative values of b_shr may be used (with the stipulation that \(b\_shr \ge -b\_hr\) if saturation of \(b_k'\) is to be avoided). The less headroom \(b_k'\) has, the greater the precision of the final result.
Internally, each product \((b_k')^2 \cdot 2^{-30}\) accumulates into one of eight 40-bit accumulators (which are all used simultaneously), which apply symmetric 40-bit saturation logic (with bounds \(\approx 2^{39}\)) with each value added. The saturating arithmetic employed is not associative and no indication is given if saturation occurs at an intermediate step. To avoid saturation errors, length should be no greater than \(2^{10+2\cdot b\_hr}\), where \(b\_hr\) is the headroom of \(\bar b\).
If the caller’s mantissa vector is longer than that, the full result can be found by calling this function multiple times for partial results on subsequences of the input, and adding the results in user code.
In many situations the caller may have a priori knowledge that saturation is impossible (or very nearly so), in which case this guideline may be disregarded. However, such situations are application-specific and are well beyond the scope of this documentation, and as such are left to the user’s discretion.
 Parameters
b – [in] Input vector \(\bar b\)
length – [in] Number of elements in \(\bar b\)
b_shr – [in] Right-shift applied to \(\bar b\)
 Throws
ET_LOAD_STORE – Raised if b is not word-aligned (See Note: Vector Alignment)
 Returns
64-bit mantissa of vector \(\bar b\)’s energy

headroom_t xs3_vect_s32_headroom(const int32_t x[], const unsigned length)
Calculate the headroom of a 32-bit vector.
The headroom of an N-bit integer is the number of bits that the integer’s value may be left-shifted without any information being lost. Equivalently, it is one less than the number of leading sign bits.
The headroom of an int32_t array is the minimum of the headroom of each of its int32_t elements.
This function efficiently traverses the elements of x[] to determine its headroom. x[] represents the 32-bit vector \(\bar x\). x[] must begin at a word-aligned address. length is the number of elements in x[].
 Operation Performed:
 \[\begin{align*} min\!\{ HR_{32}\left(x_0\right), HR_{32}\left(x_1\right), ..., HR_{32}\left(x_{length-1}\right) \} \end{align*}\]
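The definition of \(HR_{32}\) above can be written out directly as a host-side reference model. ref_hr32() and ref_vect_hr32() are hypothetical names; the sketch counts leading sign bits with a simple loop rather than any particular instruction, and returning 31 for an empty vector is a choice made here, not documented behaviour.

```c
#include <stdint.h>

/* Hypothetical reference model of HR_32(x): leading sign bits minus one. */
unsigned ref_hr32(int32_t x)
{
    /* Invert negative values so that leading sign bits become leading zeros. */
    uint32_t u = (x < 0) ? ~(uint32_t) x : (uint32_t) x;
    unsigned n = 0;
    /* Count zeros below the sign-bit position (bit 31). */
    for (uint32_t mask = UINT32_C(1) << 30; mask != 0 && !(u & mask); mask >>= 1)
        n++;
    return n;
}

/* Vector headroom: minimum element headroom (31 for an empty vector here). */
unsigned ref_vect_hr32(const int32_t x[], unsigned length)
{
    unsigned hr = 31;   /* largest possible headroom of an int32_t (0 or -1) */
    for (unsigned k = 0; k < length; k++) {
        unsigned h = ref_hr32(x[k]);
        if (h < hr) hr = h;
    }
    return hr;
}
```

For example, \(-256\) has 24 leading sign bits and therefore 23 bits of headroom: \(-256 \cdot 2^{23} = -2^{31}\) still fits in 32 bits, but one more left-shift would not.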
 Parameters
x – [in] Input vector \(\bar x\)
length – [in] The number of elements in x[]
 Throws
ET_LOAD_STORE – Raised if x is not word-aligned (See Note: Vector Alignment)
 Returns
Headroom of vector \(\bar x\)

headroom_t xs3_vect_s32_inverse(int32_t a[], const int32_t b[], const unsigned length, const unsigned scale)
Compute the inverse of elements of a 32-bit vector.
a[] and b[] represent the 32-bit mantissa vectors \(\bar a\) and \(\bar b\) respectively. Each vector must begin at a word-aligned address. This operation can be performed safely in-place on b[]. length is the number of elements in each of the vectors. scale is a scaling parameter used to maximize the precision of the result.
 Operation Performed:
 \[\begin{split}\begin{align*} & a_k \leftarrow \lfloor\frac{2^{scale}}{b_k}\rfloor \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
 Block Floating-Point

If \(\bar b\) are the mantissas of BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = -scale - b\_exp\).
The function xs3_vect_s32_inverse_prepare() can be used to obtain values for \(a\_exp\) and \(scale\).
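One element of the operation, together with the exponent rule above, can be modelled as follows. ref_inverse_s32() is a hypothetical name, and the sketch assumes \(b_k \ne 0\), \(scale \le 62\), and a scale chosen so that the quotient fits in 32 bits. For example, \(b = 4\) with \(b\_exp = 0\) and \(scale = 30\) gives \(a = 2^{28}\) and \(a\_exp = -30\), i.e. the value 0.25.

```c
#include <stdint.h>

/* Hypothetical reference model of one element of the inverse:
 * a = floor(2^scale / b), with a_exp = -scale - b_exp.
 * Assumes b != 0, scale <= 62 and a quotient that fits in 32 bits. */
int32_t ref_inverse_s32(int32_t b, unsigned scale, int b_exp, int *a_exp)
{
    int64_t num = (int64_t) 1 << scale;
    int64_t q = num / b;                 /* C division truncates toward zero */
    if (num % b != 0 && b < 0)
        q -= 1;                          /* num > 0, so only negative b needs flooring */
    *a_exp = -(int) scale - b_exp;
    return (int32_t) q;
}
```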
 Parameters
a – [out] Output vector \(\bar a\)
b – [in] Input vector \(\bar b\)
length – [in] Number of elements in vectors \(\bar a\) and \(\bar b\)
scale – [in] Scale factor applied to dividend when computing inverse
 Throws
ET_LOAD_STORE – Raised if a or b is not word-aligned (See Note: Vector Alignment)
 Returns
Headroom of output vector \(\bar a\)

int32_t xs3_vect_s32_max(const int32_t b[], const unsigned length)
Find the maximum value in a 32-bit vector.
b[] represents the 32-bit vector \(\bar b\). It must begin at a word-aligned address. length is the number of elements in \(\bar b\).
 Operation Performed:
 \[\begin{align*} max\{ b_0, b_1, ..., b_{length-1} \} \end{align*}\]
 Block Floating-Point

If \(\bar b\) are the mantissas of BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the returned value \(a\) is the 32-bit mantissa of floating-point value \(a \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp\).
 Parameters
b – [in] Input vector \(\bar b\)
length – [in] Number of elements in \(\bar b\)
 Throws
ET_LOAD_STORE – Raised if b is not word-aligned (See Note: Vector Alignment)
 Returns
Maximum value from \(\bar b\)

headroom_t xs3_vect_s32_max_elementwise(int32_t a[], const int32_t b[], const int32_t c[], const unsigned length, const right_shift_t b_shr, const right_shift_t c_shr)
Get the element-wise maximum of two 32-bit vectors.
a[], b[] and c[] represent the 32-bit mantissa vectors \(\bar a\), \(\bar b\) and \(\bar c\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[], but not on c[]. length is the number of elements in each of the vectors. b_shr and c_shr are the signed arithmetic right-shifts applied to each element of \(\bar b\) and \(\bar c\) respectively.
 Operation Performed:
 \[\begin{split}\begin{align*} & b_k' \leftarrow sat_{32}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & c_k' \leftarrow sat_{32}(\lfloor c_k \cdot 2^{-c\_shr} \rfloor) \\ & a_k \leftarrow max(b_k', c_k') \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
 Block Floating-Point

If \(\bar b\) and \(\bar c\) are the mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + b\_shr = c\_exp + c\_shr\).
The function xs3_vect_2vec_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).
Warning
For correct operation, this function requires at least 1 bit of headroom in each mantissa vector after the shifts have been applied.
 Parameters
a – [out] Output vector \(\bar a\)
b – [in] Input vector \(\bar b\)
c – [in] Input vector \(\bar c\)
length – [in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)
b_shr – [in] Right-shift applied to \(\bar b\)
c_shr – [in] Right-shift applied to \(\bar c\)
 Throws
ET_LOAD_STORE – Raised if a, b or c is not word-aligned (See Note: Vector Alignment)
 Returns
Headroom of vector \(\bar a\)
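The shift-and-saturate notation used above can be modelled in plain C. This sketch assumes \(sat_{32}\) clamps symmetrically to \(\pm(2^{31}-1)\) and that a negative shift count means a left shift; the names shr_sat32() and ref_vect_s32_max_elementwise() are hypothetical, and unlike xs3_vect_s32_max_elementwise() the model does not compute the headroom of the result.

```c
#include <assert.h>
#include <stdint.h>

/* sat32(floor(v * 2^-shr)), assuming symmetric saturation to +/-(2^31 - 1).
   A negative shr is a left shift. */
static int32_t shr_sat32(int32_t v, int shr) {
    int64_t x = v;
    if (shr < 0) {
        x *= (int64_t) 1 << (-shr > 32 ? 32 : -shr);
    } else {
        int64_t p = (int64_t) 1 << (shr > 40 ? 40 : shr);
        int64_t q = x / p;                       /* truncates toward zero */
        x = (x % p != 0 && x < 0) ? q - 1 : q;   /* adjust to floor */
    }
    if (x > INT32_MAX) return INT32_MAX;
    if (x < -INT32_MAX) return -INT32_MAX;
    return (int32_t) x;
}

/* Hypothetical reference model of the element-wise maximum. */
void ref_vect_s32_max_elementwise(int32_t a[], const int32_t b[],
                                  const int32_t c[], unsigned length,
                                  int b_shr, int c_shr) {
    for (unsigned k = 0; k < length; k++) {
        int32_t bk = shr_sat32(b[k], b_shr);
        int32_t ck = shr_sat32(c[k], c_shr);
        a[k] = (bk > ck) ? bk : ck;
    }
}
```

Note that the right-shift floors toward \(-\infty\), so \(-5\) shifted right by one bit becomes \(-3\), not \(-2\).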

int32_t xs3_vect_s32_min(const int32_t b[], const unsigned length)¶
Find the minimum value in a 32-bit vector.
b[] represents the 32-bit vector \(\bar b\). It must begin at a word-aligned address.
length is the number of elements in \(\bar b\). Operation Performed:
 \[\begin{align*} min\{ b_0, b_1, ..., b_{length-1} \} \end{align*}\]
 Block Floating-Point

If \(\bar b\) are the mantissas of BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the returned value \(a\) is the 32-bit mantissa of floating-point value \(a \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp\).
 Parameters
b – [in] Input vector \(\bar b\)
length – [in] Number of elements in \(\bar b\)
 Throws
ET_LOAD_STORE – Raised if b is not word-aligned (See Note: Vector Alignment)
 Returns
Minimum value from \(\bar b\)

headroom_t xs3_vect_s32_min_elementwise(int32_t a[], const int32_t b[], const int32_t c[], const unsigned length, const right_shift_t b_shr, const right_shift_t c_shr)¶
Get the element-wise minimum of two 32-bit vectors.
a[], b[] and c[] represent the 32-bit mantissa vectors \(\bar a\), \(\bar b\) and \(\bar c\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[], but not on c[].
length is the number of elements in each of the vectors.
b_shr and c_shr are the signed arithmetic right-shifts applied to each element of \(\bar b\) and \(\bar c\) respectively. Operation Performed:
 \[\begin{split}\begin{align*} & b_k' \leftarrow sat_{32}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & c_k' \leftarrow sat_{32}(\lfloor c_k \cdot 2^{-c\_shr} \rfloor) \\ & a_k \leftarrow min(b_k', c_k') \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
 Block Floating-Point

If \(\bar b\) and \(\bar c\) are the mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + b\_shr = c\_exp + c\_shr\).
The function xs3_vect_2vec_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).
Warning
For correct operation, this function requires at least 1 bit of headroom in each mantissa vector after the shifts have been applied.
 Parameters
a – [out] Output vector \(\bar a\)
b – [in] Input vector \(\bar b\)
c – [in] Input vector \(\bar c\)
length – [in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)
b_shr – [in] Right-shift applied to \(\bar b\)
c_shr – [in] Right-shift applied to \(\bar c\)
 Throws
ET_LOAD_STORE – Raised if a, b or c is not word-aligned (See Note: Vector Alignment)
 Returns
Headroom of vector \(\bar a\)

headroom_t xs3_vect_s32_mul(int32_t a[], const int32_t b[], const int32_t c[], const unsigned length, const right_shift_t b_shr, const right_shift_t c_shr)¶
Multiply one 32-bit vector element-wise by another.
a[], b[] and c[] represent the 32-bit mantissa vectors \(\bar a\), \(\bar b\) and \(\bar c\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[] or c[].
length is the number of elements in each of the vectors.
b_shr and c_shr are the signed arithmetic right-shifts applied to each element of \(\bar b\) and \(\bar c\) respectively. Operation Performed:
 \[\begin{split}\begin{align*} & b_k' \leftarrow sat_{32}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & c_k' \leftarrow sat_{32}(\lfloor c_k \cdot 2^{-c\_shr} \rfloor) \\ & a_k \leftarrow sat_{32}(round(b_k' \cdot c_k' \cdot 2^{-30})) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
 Block Floating-Point

If \(\bar b\) and \(\bar c\) are the mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + c\_exp + b\_shr + c\_shr + 30\).
The function xs3_vect_s32_mul_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).
 Parameters
a – [out] Output vector \(\bar a\)
b – [in] Input vector \(\bar b\)
c – [in] Input vector \(\bar c\)
length – [in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)
b_shr – [in] Right-shift applied to \(\bar b\)
c_shr – [in] Right-shift applied to \(\bar c\)
 Throws
ET_LOAD_STORE – Raised if a, b or c is not word-aligned (See Note: Vector Alignment)
 Returns
Headroom of vector \(\bar a\)
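The role of the \(2^{-30}\) factor can be seen in a plain C model of the element-wise product. This is a sketch under stated assumptions, not the library's code: \(sat_{32}\) is taken to clamp symmetrically to \(\pm(2^{31}-1)\), \(round()\) is taken to round half up, and the helper and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed symmetric saturation to +/-(2^31 - 1). */
static int32_t sat32(int64_t v) {
    if (v > INT32_MAX) return INT32_MAX;
    if (v < -INT32_MAX) return -INT32_MAX;
    return (int32_t) v;
}

/* sat32(floor(v * 2^-shr)); a negative shr is a left shift. */
static int32_t shr_sat32(int32_t v, int shr) {
    int64_t x = v;
    if (shr < 0)
        return sat32(x * ((int64_t) 1 << (-shr > 32 ? 32 : -shr)));
    int64_t p = (int64_t) 1 << (shr > 40 ? 40 : shr);
    int64_t q = x / p;                                 /* truncates toward zero */
    return sat32((x % p != 0 && x < 0) ? q - 1 : q);   /* floor */
}

/* round(v * 2^-30), assumed round-half-up: floor(v / 2^30 + 1/2). */
static int64_t round_q30(int64_t v) {
    const int64_t p = (int64_t) 1 << 30;
    int64_t t = v + p / 2;
    int64_t q = t / p;
    return (t % p != 0 && t < 0) ? q - 1 : q;
}

/* Hypothetical reference model: a_k = sat32(round(b_k' * c_k' * 2^-30)). */
void ref_vect_s32_mul(int32_t a[], const int32_t b[], const int32_t c[],
                      unsigned length, int b_shr, int c_shr) {
    for (unsigned k = 0; k < length; k++) {
        /* |b_k' * c_k'| < 2^62, so the product fits in int64_t. */
        int64_t prod = (int64_t) shr_sat32(b[k], b_shr) * shr_sat32(c[k], c_shr);
        a[k] = sat32(round_q30(prod));
    }
}
```

For instance, if \(b\_exp = c\_exp = -30\) and both shifts are zero, the formula above gives \(a\_exp = -30\), so multiplying by a mantissa of \(2^{30}\) (i.e. \(1.0\)) leaves the other operand's mantissa unchanged.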

headroom_t xs3_vect_s32_macc(int32_t acc[], const int32_t b[], const int32_t c[], const unsigned length, const right_shift_t acc_shr, const right_shift_t b_shr, const right_shift_t c_shr)¶
Multiply one 32-bit vector element-wise by another, and add the result to an accumulator.
acc[] represents the 32-bit accumulator mantissa vector \(\bar a\). Each \(a_k\) is acc[k].
b[] and c[] represent the 32-bit input mantissa vectors \(\bar b\) and \(\bar c\), where each \(b_k\) is b[k] and each \(c_k\) is c[k].
Each of the input vectors must begin at a word-aligned address.
length is the number of elements in each of the vectors.
acc_shr, b_shr and c_shr are the signed arithmetic right-shifts applied to input elements \(a_k\), \(b_k\) and \(c_k\). Operation Performed:
 \[\begin{split}\begin{align*} & \tilde{b}_k \leftarrow sat_{32}( b_k \cdot 2^{-b\_shr} ) \\ & \tilde{c}_k \leftarrow sat_{32}( c_k \cdot 2^{-c\_shr} ) \\ & \tilde{a}_k \leftarrow sat_{32}( a_k \cdot 2^{-acc\_shr} ) \\ & v_k \leftarrow round( sat_{32}( \tilde{b}_k \cdot \tilde{c}_k \cdot 2^{-30} ) ) \\ & a_k \leftarrow sat_{32}( \tilde{a}_k + v_k ) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
 Block Floating-Point

If inputs \(\bar b\) and \(\bar c\) are the mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), and input \(\bar a\) is the accumulator BFP vector \(\bar{a} \cdot 2^{a\_exp}\), then the output values of \(\bar a\) have the exponent \(a\_exp + acc\_shr\).
For accumulation to make sense mathematically, \(acc\_shr\), \(b\_shr\) and \(c\_shr\) must be chosen such that \( a\_exp + acc\_shr = b\_exp + c\_exp + b\_shr + c\_shr + 30 \).
The function xs3_vect_complex_s32_macc_prepare() can be used to obtain values for the new accumulator exponent, \(acc\_shr\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(a\_exp\), \(b\_exp\) and \(c\_exp\) and the input headrooms \(a\_hr\), \(b\_hr\) and \(c\_hr\).
 Parameters
acc – [inout] Accumulator \(\bar a\)
b – [in] Input vector \(\bar b\)
c – [in] Input vector \(\bar c\)
length – [in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)
acc_shr – [in] Signed arithmetic right-shift applied to accumulator elements.
b_shr – [in] Signed arithmetic right-shift applied to elements of \(\bar b\)
c_shr – [in] Signed arithmetic right-shift applied to elements of \(\bar c\)
 Throws
ET_LOAD_STORE – Raised if acc, b or c is not word-aligned (See Note: Vector Alignment)
 Returns
Headroom of the output vector \(\bar a\)
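The following C sketch models one multiply-accumulate step per element, following the operation listed above. It assumes symmetric saturation to \(\pm(2^{31}-1)\) and round-half-up for \(round()\); ref_vect_s32_macc() and its helpers are hypothetical names. A model of xs3_vect_s32_nmacc() would differ only in subtracting \(v_k\) instead of adding it. The exponent bookkeeping still applies: the product term carries exponent \(b\_exp + c\_exp + b\_shr + c\_shr + 30\), which must match \(a\_exp + acc\_shr\).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed symmetric saturation to +/-(2^31 - 1). */
static int32_t sat32(int64_t v) {
    if (v > INT32_MAX) return INT32_MAX;
    if (v < -INT32_MAX) return -INT32_MAX;
    return (int32_t) v;
}

/* sat32(floor(v * 2^-shr)); a negative shr is a left shift. */
static int32_t shr_sat32(int32_t v, int shr) {
    int64_t x = v;
    if (shr < 0)
        return sat32(x * ((int64_t) 1 << (-shr > 32 ? 32 : -shr)));
    int64_t p = (int64_t) 1 << (shr > 40 ? 40 : shr);
    int64_t q = x / p;
    return sat32((x % p != 0 && x < 0) ? q - 1 : q);   /* floor */
}

/* round(v * 2^-30), assumed round-half-up. */
static int64_t round_q30(int64_t v) {
    const int64_t p = (int64_t) 1 << 30;
    int64_t t = v + p / 2;
    int64_t q = t / p;
    return (t % p != 0 && t < 0) ? q - 1 : q;
}

/* Hypothetical reference model of the multiply-accumulate. */
void ref_vect_s32_macc(int32_t acc[], const int32_t b[], const int32_t c[],
                       unsigned length, int acc_shr, int b_shr, int c_shr) {
    for (unsigned k = 0; k < length; k++) {
        int32_t a_t = shr_sat32(acc[k], acc_shr);
        int64_t prod = (int64_t) shr_sat32(b[k], b_shr) * shr_sat32(c[k], c_shr);
        int32_t v = sat32(round_q30(prod));
        acc[k] = sat32((int64_t) a_t + v);
    }
}
```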

headroom_t xs3_vect_s32_nmacc(int32_t acc[], const int32_t b[], const int32_t c[], const unsigned length, const right_shift_t acc_shr, const right_shift_t b_shr, const right_shift_t c_shr)¶
Multiply one 32-bit vector element-wise by another, and subtract the result from an accumulator.
acc[] represents the 32-bit accumulator mantissa vector \(\bar a\). Each \(a_k\) is acc[k].
b[] and c[] represent the 32-bit input mantissa vectors \(\bar b\) and \(\bar c\), where each \(b_k\) is b[k] and each \(c_k\) is c[k].
Each of the input vectors must begin at a word-aligned address.
length is the number of elements in each of the vectors.
acc_shr, b_shr and c_shr are the signed arithmetic right-shifts applied to input elements \(a_k\), \(b_k\) and \(c_k\). Operation Performed:
 \[\begin{split}\begin{align*} & \tilde{b}_k \leftarrow sat_{32}( b_k \cdot 2^{-b\_shr} ) \\ & \tilde{c}_k \leftarrow sat_{32}( c_k \cdot 2^{-c\_shr} ) \\ & \tilde{a}_k \leftarrow sat_{32}( a_k \cdot 2^{-acc\_shr} ) \\ & v_k \leftarrow round( sat_{32}( \tilde{b}_k \cdot \tilde{c}_k \cdot 2^{-30} ) ) \\ & a_k \leftarrow sat_{32}( \tilde{a}_k - v_k ) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
 Block Floating-Point

If inputs \(\bar b\) and \(\bar c\) are the mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), and input \(\bar a\) is the accumulator BFP vector \(\bar{a} \cdot 2^{a\_exp}\), then the output values of \(\bar a\) have the exponent \(a\_exp + acc\_shr\).
For accumulation to make sense mathematically, \(acc\_shr\), \(b\_shr\) and \(c\_shr\) must be chosen such that \( a\_exp + acc\_shr = b\_exp + c\_exp + b\_shr + c\_shr + 30 \).
The function xs3_vect_complex_s32_macc_prepare() can be used to obtain values for the new accumulator exponent, \(acc\_shr\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(a\_exp\), \(b\_exp\) and \(c\_exp\) and the input headrooms \(a\_hr\), \(b\_hr\) and \(c\_hr\).
 Parameters
acc – [inout] Accumulator \(\bar a\)
b – [in] Input vector \(\bar b\)
c – [in] Input vector \(\bar c\)
length – [in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)
acc_shr – [in] Signed arithmetic right-shift applied to accumulator elements.
b_shr – [in] Signed arithmetic right-shift applied to elements of \(\bar b\)
c_shr – [in] Signed arithmetic right-shift applied to elements of \(\bar c\)
 Throws
ET_LOAD_STORE – Raised if acc, b or c is not word-aligned (See Note: Vector Alignment)
 Returns
Headroom of the output vector \(\bar a\)

headroom_t xs3_vect_s32_rect(int32_t a[], const int32_t b[], const unsigned length)¶
Rectify the elements of a 32-bit vector.
a[] and b[] represent the 32-bit mantissa vectors \(\bar a\) and \(\bar b\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[].
length is the number of elements in each of the vectors. Operation Performed:
 \[\begin{split}\begin{align*} & a_k \leftarrow \begin{cases} b_k & b_k \gt 0 \\ 0 & b_k \leq 0 \end{cases} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
 Block Floating-Point

If \(\bar b\) are the mantissas of BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the output vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp\).
 Parameters
a – [out] Output vector \(\bar a\)
b – [in] Input vector \(\bar b\)
length – [in] Number of elements in vectors \(\bar a\) and \(\bar b\)
 Throws
ET_LOAD_STORE – Raised if a or b is not word-aligned (See Note: Vector Alignment)
 Returns
Headroom of the output vector \(\bar a\)

headroom_t xs3_vect_s32_scale(int32_t a[], const int32_t b[], const unsigned length, const int32_t c, const right_shift_t b_shr, const right_shift_t c_shr)¶
Multiply a 32-bit vector by a scalar.
a[] and b[] represent the 32-bit mantissa vectors \(\bar a\) and \(\bar b\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[].
length is the number of elements in each of the vectors.
c is the 32-bit scalar \(c\) by which each element of \(\bar b\) is multiplied.
b_shr and c_shr are the signed arithmetic right-shifts applied to each element of \(\bar b\) and to \(c\) respectively. Operation Performed:
 \[\begin{split}\begin{align*} & b_k' \leftarrow sat_{32}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & c' \leftarrow sat_{32}(\lfloor c \cdot 2^{-c\_shr} \rfloor) \\ & a_k \leftarrow sat_{32}(round(c' \cdot b_k' \cdot 2^{-30})) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
 Block Floating-Point

If \(\bar b\) are the mantissas of a BFP vector \( \bar{b} \cdot 2^{b\_exp} \) and \(c\) is the mantissa of floating-point value \(c \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + c\_exp + b\_shr + c\_shr + 30\).
The function xs3_vect_s32_scale_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).
 Parameters
a – [out] Output vector \(\bar a\)
b – [in] Input vector \(\bar b\)
length – [in] Number of elements in vectors \(\bar a\) and \(\bar b\)
c – [in] Scalar to be multiplied by elements of \(\bar b\)
b_shr – [in] Right-shift applied to \(\bar b\)
c_shr – [in] Right-shift applied to \(c\)
 Throws
ET_LOAD_STORE – Raised if a or b is not word-aligned (See Note: Vector Alignment)
 Returns
Headroom of output vector \(\bar a\)

void xs3_vect_s32_set(int32_t a[], const int32_t b, const unsigned length)¶
Set all elements of a 32-bit vector to the specified value.
a[] represents the 32-bit output vector \(\bar a\). a[] must begin at a word-aligned address.
b is the new value to set each element of \(\bar a\) to.
length is the number of elements in \(\bar a\). Operation Performed:
 \[\begin{split}\begin{align*} & a_k \leftarrow b \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
 Block Floating-Point

If \(b\) is the mantissa of floating-point value \(b \cdot 2^{b\_exp}\), then the output vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp\).
 Parameters
a – [out] Output vector \(\bar a\)
b – [in] New value for the elements of \(\bar a\)
length – [in] Number of elements in \(\bar a\)
 Throws
ET_LOAD_STORE – Raised if a is not word-aligned (See Note: Vector Alignment)

headroom_t xs3_vect_s32_shl(int32_t a[], const int32_t b[], const unsigned length, const left_shift_t b_shl)¶
Left-shift the elements of a 32-bit vector by a specified number of bits.
a[] and b[] represent the 32-bit vectors \(\bar a\) and \(\bar b\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[].
length is the number of elements in vectors \(\bar a\) and \(\bar b\).
b_shl is the signed arithmetic left-shift applied to each element of \(\bar b\). Operation Performed:
 \[\begin{split}\begin{align*} & a_k \leftarrow sat_{32}(\lfloor b_k \cdot 2^{b\_shl} \rfloor) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
 Block Floating-Point

If \(\bar b\) are the mantissas of a BFP vector \( \bar{b} \cdot 2^{b\_exp} \), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(\bar{a} = \bar{b} \cdot 2^{b\_shl}\) and \(a\_exp = b\_exp\).
 Parameters
a – [out] Output vector \(\bar a\)
b – [in] Input vector \(\bar b\)
length – [in] Number of elements in vectors \(\bar a\) and \(\bar b\)
b_shl – [in] Arithmetic left-shift applied to elements of \(\bar b\)
 Throws
ET_LOAD_STORE – Raised if a or b is not word-aligned (See Note: Vector Alignment)
 Returns
Headroom of output vector \(\bar a\)

headroom_t xs3_vect_s32_shr(int32_t a[], const int32_t b[], const unsigned length, const right_shift_t b_shr)¶
Right-shift the elements of a 32-bit vector by a specified number of bits.
a[] and b[] represent the 32-bit vectors \(\bar a\) and \(\bar b\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[].
length is the number of elements in vectors \(\bar a\) and \(\bar b\).
b_shr is the signed arithmetic right-shift applied to each element of \(\bar b\). Operation Performed:
 \[\begin{split}\begin{align*} & a_k \leftarrow sat_{32}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
 Block Floating-Point

If \(\bar b\) are the mantissas of a BFP vector \( \bar{b} \cdot 2^{b\_exp} \), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(\bar{a} = \bar{b} \cdot 2^{-b\_shr}\) and \(a\_exp = b\_exp\).
 Parameters
a – [out] Output vector \(\bar a\)
b – [in] Input vector \(\bar b\)
length – [in] Number of elements in vectors \(\bar a\) and \(\bar b\)
b_shr – [in] Arithmetic right-shift applied to elements of \(\bar b\)
 Throws
ET_LOAD_STORE – Raised if a or b is not word-aligned (See Note: Vector Alignment)
 Returns
Headroom of output vector \(\bar a\)
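Because the shift counts are signed, xs3_vect_s32_shl() and xs3_vect_s32_shr() are mirror images of each other. The C sketch below models both under the assumption of symmetric saturation to \(\pm(2^{31}-1)\); the function names are hypothetical. It shows that right-shifts floor toward \(-\infty\) and that left-shifts saturate rather than wrap.

```c
#include <assert.h>
#include <stdint.h>

/* sat32(floor(v * 2^-shr)), assuming symmetric saturation to +/-(2^31 - 1).
   A negative shr is a left shift. */
static int32_t shr_sat32(int32_t v, int shr) {
    int64_t x = v;
    if (shr < 0) {
        x *= (int64_t) 1 << (-shr > 32 ? 32 : -shr);
    } else {
        int64_t p = (int64_t) 1 << (shr > 40 ? 40 : shr);
        int64_t q = x / p;                       /* truncates toward zero */
        x = (x % p != 0 && x < 0) ? q - 1 : q;   /* adjust to floor */
    }
    if (x > INT32_MAX) return INT32_MAX;
    if (x < -INT32_MAX) return -INT32_MAX;
    return (int32_t) x;
}

/* Hypothetical model of a_k = sat32(floor(b_k * 2^-b_shr)). */
void ref_vect_s32_shr(int32_t a[], const int32_t b[], unsigned length, int b_shr) {
    for (unsigned k = 0; k < length; k++)
        a[k] = shr_sat32(b[k], b_shr);
}

/* A left-shift by b_shl is modelled as a right-shift by -b_shl. */
void ref_vect_s32_shl(int32_t a[], const int32_t b[], unsigned length, int b_shl) {
    ref_vect_s32_shr(a, b, length, -b_shl);
}
```

For example, left-shifting 0x40000000 by one bit yields INT32_MAX in this model instead of wrapping to a negative value.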

headroom_t xs3_vect_s32_sqrt(int32_t a[], const int32_t b[], const unsigned length, const right_shift_t b_shr, const unsigned depth)¶
Compute the square root of elements of a 32-bit vector.
a[] and b[] represent the 32-bit mantissa vectors \(\bar a\) and \(\bar b\) respectively. Each vector must begin at a word-aligned address. This operation can be performed safely in-place on b[].
length is the number of elements in each of the vectors.
b_shr is the signed arithmetic right-shift applied to elements of \(\bar b\).
depth is the number of most significant bits of each \(a_k\) to calculate. For example, a depth value of 8 will only compute the 8 most significant bits of the result, with the remaining 3 bytes as 0. The maximum value for this parameter is XS3_VECT_SQRT_S32_MAX_DEPTH (31). The time cost of this operation is approximately proportional to the number of bits computed. Operation Performed:
 \[\begin{split}\begin{align*} & b_k' \leftarrow sat_{32}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & a_k \leftarrow \sqrt{ b_k' } \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \\ & \qquad\text{ where } sqrt() \text{ computes the first } depth \text{ bits of the square root.} \end{align*}\end{split}\]
 Block Floating-Point

If \(\bar b\) are the mantissas of BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = (b\_exp + b\_shr - 30)/2\).
Because exponents must be integers, \(b\_exp + b\_shr\) must be even.
The function xs3_vect_s32_sqrt_prepare() can be used to obtain values for \(a\_exp\) and \(b\_shr\) based on the input exponent \(b\_exp\) and headroom \(b\_hr\).
 Parameters
a – [out] Output vector \(\bar a\)
b – [in] Input vector \(\bar b\)
length – [in] Number of elements in vectors \(\bar a\) and \(\bar b\)
b_shr – [in] Right-shift applied to \(\bar b\)
depth – [in] Number of bits of each output value to compute
 Throws
ET_LOAD_STORE – Raised if a or b is not word-aligned (See Note: Vector Alignment)
 Returns
Headroom of output vector \(\bar a\)
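The even-exponent requirement can be met by adjusting \(b\_shr\) by one bit. The helper below only illustrates that bookkeeping; its name and the choice to round \(b\_shr\) upward (shifting one extra bit right, which may drop a bit of precision but cannot cause saturation) are assumptions. xs3_vect_s32_sqrt_prepare() is the documented way to obtain these values, and it also takes the headroom into account.

```c
#include <assert.h>

/* Hypothetical helper: make b_exp + b_shr even, then return
   a_exp = (b_exp + b_shr - 30) / 2. */
static int sqrt_output_exp(int b_exp, int *b_shr) {
    assert(b_shr != 0);
    if ((b_exp + *b_shr) % 2 != 0)
        *b_shr += 1;                    /* one extra right-shift */
    return (b_exp + *b_shr - 30) / 2;   /* exact: the numerator is even */
}
```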

headroom_t xs3_vect_s32_sub(int32_t a[], const int32_t b[], const int32_t c[], const unsigned length, const right_shift_t b_shr, const right_shift_t c_shr)¶
Subtract one 32-bit vector from another.
a[], b[] and c[] represent the 32-bit mantissa vectors \(\bar a\), \(\bar b\) and \(\bar c\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[] or c[].
length is the number of elements in each of the vectors.
b_shr and c_shr are the signed arithmetic right-shifts applied to each element of \(\bar b\) and \(\bar c\) respectively. Operation Performed:
 \[\begin{split}\begin{align*} & b_k' = sat_{32}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & c_k' = sat_{32}(\lfloor c_k \cdot 2^{-c\_shr} \rfloor) \\ & a_k \leftarrow sat_{32}\!\left( b_k' - c_k' \right) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
 Block Floating-Point

If \(\bar b\) and \(\bar c\) are the mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\).
In this case, \(b\_shr\) and \(c\_shr\) must be chosen so that \(a\_exp = b\_exp + b\_shr = c\_exp + c\_shr\). Adding or subtracting mantissas only makes sense if they are associated with the same exponent.
The function xs3_vect_s32_sub_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).
 Parameters
a – [out] Output vector \(\bar a\)
b – [in] Input vector \(\bar b\)
c – [in] Input vector \(\bar c\)
length – [in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)
b_shr – [in] Right-shift applied to \(\bar b\)
c_shr – [in] Right-shift applied to \(\bar c\)
 Throws
ET_LOAD_STORE – Raised if a, b or c is not word-aligned (See Note: Vector Alignment)
 Returns
Headroom of output vector \(\bar a\)
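The exponent-alignment constraint can be illustrated with one simple strategy: shift each input as far left as its headroom allows, take the larger of the two resulting exponents, and add one spare bit so the difference cannot overflow. This is a hypothetical sketch (headroom is taken to mean the number of redundant leading sign bits), not necessarily what xs3_vect_s32_sub_prepare() computes.

```c
#include <assert.h>

/* Hypothetical result type for the sketch below. */
typedef struct { int a_exp; int b_shr; int c_shr; } ref_align_t;

/* Choose shifts satisfying a_exp = b_exp + b_shr = c_exp + c_shr,
   leaving one spare bit of headroom for the difference. */
static ref_align_t ref_align_exponents(int b_exp, unsigned b_hr,
                                       int c_exp, unsigned c_hr) {
    int b_min = b_exp - (int) b_hr;   /* exponent after using all of b's headroom */
    int c_min = c_exp - (int) c_hr;
    ref_align_t r;
    r.a_exp = (b_min > c_min ? b_min : c_min) + 1;   /* +1: spare bit */
    r.b_shr = r.a_exp - b_exp;
    r.c_shr = r.a_exp - c_exp;
    return r;
}
```

For example, with \(b = 2^{28}\) at \(b\_exp = -30\) (value \(0.25\), headroom 2) and \(c = 2^{30}\) at \(c\_exp = -28\) (value \(4.0\), headroom 0), this gives \(a\_exp = -27\), \(b\_shr = 3\) and \(c\_shr = 1\); the shifted mantissas are \(2^{25}\) and \(2^{29}\), and their difference at exponent \(-27\) is \(-3.75\).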

int64_t xs3_vect_s32_sum(const int32_t b[], const unsigned length)¶
Sum the elements of a 32-bit vector.
b[] represents the 32-bit mantissa vector \(\bar b\). b[] must begin at a word-aligned address.
length is the number of elements in \(\bar b\). Operation Performed:
 \[\begin{align*} a \leftarrow \sum_{k=0}^{length-1} b_k \end{align*}\]
 Block Floating-Point

If \(\bar b\) are the mantissas of BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the returned value \(a\) is the 64-bit mantissa of floating-point value \(a \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp\).
 Additional Details

Internally, each element accumulates into one of eight 40-bit accumulators (which are all used simultaneously), each of which applies symmetric 40-bit saturation logic (with bounds \(\approx \pm 2^{39}\)) as each value is added. The saturating arithmetic employed is not associative, and no indication is given if saturation occurs at an intermediate step. To avoid the possibility of saturation errors, length should be no greater than \(2^{11+b\_hr}\), where \(b\_hr\) is the headroom of \(\bar b\). If the caller’s mantissa vector is longer than that, the full result can be found by calling this function multiple times for partial results on subsequences of the input, and adding the results in user code.
In many situations the caller may have a priori knowledge that saturation is impossible (or very nearly so), in which case this guideline may be disregarded. However, such situations are application-specific and beyond the scope of this documentation, and are left to the user’s discretion.
 Parameters
b – [in] Input vector \(\bar b\)
length – [in] Number of elements in vector \(\bar b\)
 Throws
ET_LOAD_STORE – Raised if b is not word-aligned (See Note: Vector Alignment)
 Returns
64-bit mantissa of the sum, \(a\).

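The \(2^{11+b\_hr}\) guideline follows from the accumulator width. An element with \(b\_hr\) bits of headroom satisfies \(|b_k| \le 2^{31-b\_hr}\), and if the elements are spread evenly over the eight accumulators (an assumption about the internal scheduling), each accumulator receives \(length/8\) of them, so it stays within \(2^{39}\) as long as \(length/8 \cdot 2^{31-b\_hr} \le 2^{39}\), i.e. \(length \le 2^{11+b\_hr}\). The sketch below applies the chunking approach described above. chunked_sum() and plain_sum() are hypothetical names; plain_sum() only stands in for xs3_vect_s32_sum() so the example runs off-target, and on device the real function (whose prototype matches sum_fn_t) would be passed instead.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the documented prototype of xs3_vect_s32_sum(). */
typedef int64_t (*sum_fn_t)(const int32_t b[], const unsigned length);

/* Longest run satisfying length <= 2^(11 + b_hr), capped at 2^31. */
static unsigned max_safe_sum_length(unsigned b_hr) {
    unsigned e = 11 + b_hr;
    return (e >= 31) ? (1u << 31) : (1u << e);
}

/* Sum in safe-length chunks and add the 64-bit partial results in user code. */
static int64_t chunked_sum(sum_fn_t sum_fn, const int32_t b[],
                           unsigned length, unsigned b_hr) {
    const unsigned chunk = max_safe_sum_length(b_hr);
    int64_t total = 0;
    while (length > 0) {
        unsigned n = (length < chunk) ? length : chunk;
        total += sum_fn(b, n);   /* b + n stays word-aligned for int32_t data */
        b += n;
        length -= n;
    }
    return total;
}

/* Plain C stand-in used only so this sketch runs off-target. */
static int64_t plain_sum(const int32_t b[], const unsigned length) {
    int64_t s = 0;
    for (unsigned k = 0; k < length; k++)
        s += b[k];
    return s;
}
```
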
void xs3_vect_s32_zip(complex_s32_t a[], const int32_t b[], const int32_t c[], const unsigned length, const right_shift_t b_shr, const right_shift_t c_shr)¶
Interleave the elements of two vectors into a single vector.
Elements of 32-bit input vectors \(\bar b\) and \(\bar c\) are interleaved into 32-bit output vector \(\bar a\). Each element of \(\bar b\) has a right-shift of \(b\_shr\) applied, and each element of \(\bar c\) has a right-shift of \(c\_shr\) applied.
Alternatively (and equivalently), this function can be conceived of as taking two real vectors \(\bar b\) and \(\bar c\) and forming a new complex vector \(\bar a\) where \(\bar{a} = \bar{b} + i\cdot\bar{c}\).
If vectors \(\bar b\) and \(\bar c\) each have \(N\) elements, then the resulting \(\bar a\) will have either \(2N\) int32_t elements or (equivalently) \(N\) complex_s32_t elements (and must have space for such). Viewing \(\bar a\) as an int32_t array, each element \(b_k\) of \(\bar b\) ends up as element \(a_{2k}\) and each element \(c_k\) of \(\bar c\) ends up as element \(a_{2k+1}\), with the respective shifts applied.
a[] is the output vector \(\bar a\).
b[] and c[] are the input vectors \(\bar b\) and \(\bar c\) respectively.
a, b and c must each begin at a double word-aligned (8 byte) address (see DWORD_ALIGNED).
length is the number \(N\) of int32_t elements in \(\bar b\) and \(\bar c\).
b_shr is the signed arithmetic right-shift applied to elements of \(\bar b\).
c_shr is the signed arithmetic right-shift applied to elements of \(\bar c\). Operation Performed:
 \[\begin{split}\begin{align*} & Re\{a_k\} \leftarrow sat_{32}( b_k \cdot 2^{-b\_shr} ) \\ & Im\{a_k\} \leftarrow sat_{32}( c_k \cdot 2^{-c\_shr} ) \\ & \qquad\text{ for }k\in 0\ ...\ (N-1) \end{align*}\end{split}\]
 Parameters
a – [out] Output vector \(\bar a\)
b – [in] Input vector \(\bar b\)
c – [in] Input vector \(\bar c\)
length – [in] Number of elements \(N\) in vectors \(\bar a\), \(\bar b\) and \(\bar c\)
b_shr – [in] Signed arithmetic right-shift applied to elements of \(\bar b\)
c_shr – [in] Signed arithmetic right-shift applied to elements of \(\bar c\)
 Throws
ET_LOAD_STORE – Raised if a, b or c is not double word-aligned (See Note: Vector Alignment)

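The interleaving can be modelled in plain C with a two-field struct whose real part precedes its imaginary part. Here ref_complex_s32_t is a hypothetical stand-in for complex_s32_t, and the shifts are assumed to floor and saturate symmetrically like the other functions in this section.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for complex_s32_t: real part first, then imaginary. */
typedef struct { int32_t re; int32_t im; } ref_complex_s32_t;

/* sat32(floor(v * 2^-shr)), assuming symmetric saturation to +/-(2^31 - 1). */
static int32_t shr_sat32(int32_t v, int shr) {
    int64_t x = v;
    if (shr < 0) {
        x *= (int64_t) 1 << (-shr > 32 ? 32 : -shr);
    } else {
        int64_t p = (int64_t) 1 << (shr > 40 ? 40 : shr);
        int64_t q = x / p;
        x = (x % p != 0 && x < 0) ? q - 1 : q;   /* floor */
    }
    if (x > INT32_MAX) return INT32_MAX;
    if (x < -INT32_MAX) return -INT32_MAX;
    return (int32_t) x;
}

/* Hypothetical reference model of the zip operation. */
void ref_vect_s32_zip(ref_complex_s32_t a[], const int32_t b[], const int32_t c[],
                      unsigned length, int b_shr, int c_shr) {
    for (unsigned k = 0; k < length; k++) {
        a[k].re = shr_sat32(b[k], b_shr);   /* a_{2k} in the int32_t view */
        a[k].im = shr_sat32(c[k], c_shr);   /* a_{2k+1} in the int32_t view */
    }
}
```
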
void xs3_vect_s32_unzip(int32_t a[], int32_t b[], const complex_s32_t c[], const unsigned length)¶
Deinterleave the real and imaginary parts of a complex 32-bit vector into two separate vectors.
Complex 32-bit input vector \(\bar c\) has its real and imaginary parts (which correspond to the even- and odd-indexed elements, if reinterpreted as an int32_t array) split apart to create real 32-bit output vectors \(\bar a\) and \(\bar b\), such that \(\bar{a} = Re\{\bar{c}\}\) and \(\bar{b} = Im\{\bar{c}\}\).
a[] and b[] are the real output vectors \(\bar a\) and \(\bar b\) which receive the real and imaginary parts respectively of \(\bar c\). a and b must each begin at a word-aligned address.
c[] is the complex input vector \(\bar c\). c must begin at a double word-aligned address.
length is the number \(N\) of int32_t elements in \(\bar a\) and \(\bar b\) and the number of complex_s32_t elements in \(\bar c\). Operation Performed:
 \[\begin{split}\begin{align*} & a_k = Re\{c_k\} \\ & b_k = Im\{c_k\} \\ & \qquad\text{ for }k\in 0\ ...\ (N-1) \end{align*}\end{split}\]
 Parameters
a – [out] Output vector \(\bar a\)
b – [out] Output vector \(\bar b\)
c – [in] Input vector \(\bar c\)
length – [in] The number of elements \(N\) in vectors \(\bar a\), \(\bar b\) and \(\bar c\)
 Throws
ET_LOAD_STORE – Raised if a or b is not word-aligned (See Note: Vector Alignment)
ET_LOAD_STORE – Raised if c is not double word-aligned (See Note: Vector Alignment)

headroom_t xs3_vect_s32_convolve_valid(int32_t y[], const int32_t x[], const int32_t b_q30[], const unsigned x_length, const unsigned b_length)¶
Convolve a 32-bit vector with a short kernel.
32-bit input vector \(\bar x\) is convolved with a short fixed-point kernel \(\bar b\) to produce 32-bit output vector \(\bar y\). In other words, this function applies the \(K\)th-order FIR filter with coefficients given by \(\bar b\) to the input signal \(\bar x\). The convolution is “valid” in the sense that no output elements are emitted where the filter taps extend beyond the bounds of the input vector, resulting in an output vector \(\bar y\) with fewer elements.
The maximum filter order \(K\) supported by this function is \(7\).
y[] is the output vector \(\bar y\). If input \(\bar x\) has \(N\) elements, and the filter has \(K\) elements, then \(\bar y\) has \(N-2P\) elements, where \(P = \lfloor K / 2 \rfloor\).
x[] is the input vector \(\bar x\) with length \(N\).
b_q30[] is the vector \(\bar b\) of filter coefficients. The coefficients of \(\bar b\) are encoded in a Q2.30 fixed-point format. The effective value of the \(i\)th coefficient is then \(b_i \cdot 2^{-30}\).
x_length is the length \(N\) of \(\bar x\) in elements.
b_length is the length \(K\) of \(\bar b\) in elements (i.e. the number of filter taps). b_length must be one of \( \{ 1, 3, 5, 7 \} \). Operation Performed:
 \[\begin{split}\begin{align*} & y_k \leftarrow \sum_{l=0}^{K-1} (x_{(k+l)} \cdot b_l \cdot 2^{-30} ) \\ & \qquad\text{ for }k\in 0\ ...\ (N-2P-1) \\ & \qquad\text{ where }P = \lfloor K/2 \rfloor \end{align*}\end{split}\]
 Additional Details

To avoid the possibility of saturating any output elements, \(\bar b\) may be constrained such that \( \sum_{i=0}^{K-1} \left|b_i\right| \leq 2^{30} \).
This operation can be applied safely in-place on x[].
 Parameters
y – [out] Output vector \(\bar y\)
x – [in] Input vector \(\bar x\)
b_q30 – [in] Filter coefficient vector \(\bar b\)
x_length – [in] The number of elements \(N\) in vector \(\bar x\)
b_length – [in] The number of elements \(K\) in \(\bar b\)
 Throws
ET_LOAD_STORE – Raised if x, y or b_q30 is not word-aligned (See Note: Vector Alignment)

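The index arithmetic of the valid mode, in particular that \(\bar y\) has \(N-2P\) elements, can be checked against a plain C model. This sketch is hypothetical: it assumes symmetric saturation and round-half-up of the final \(2^{-30}\) scaling (the documentation does not specify the rounding), and it returns the output length rather than the headroom returned by xs3_vect_s32_convolve_valid(). Each product is split into a floored part and a non-negative remainder so the exact sum cannot overflow int64_t.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed symmetric saturation to +/-(2^31 - 1). */
static int32_t sat32(int64_t v) {
    if (v > INT32_MAX) return INT32_MAX;
    if (v < -INT32_MAX) return -INT32_MAX;
    return (int32_t) v;
}

/* Hypothetical model of a "valid" convolution with Q2.30 taps.
   Returns the number of outputs written, N - 2P. */
unsigned ref_vect_s32_convolve_valid(int32_t y[], const int32_t x[],
                                     const int32_t b_q30[],
                                     unsigned x_length, unsigned b_length) {
    assert(b_length == 1 || b_length == 3 || b_length == 5 || b_length == 7);
    const unsigned P = b_length / 2;
    assert(x_length >= 2 * P);
    const unsigned out_len = x_length - 2 * P;
    const int64_t one = (int64_t) 1 << 30;
    for (unsigned k = 0; k < out_len; k++) {
        int64_t whole = 0, frac = 0;   /* exact sum = whole * 2^30 + frac */
        for (unsigned l = 0; l < b_length; l++) {
            int64_t p = (int64_t) x[k + l] * b_q30[l];
            int64_t q = p / one;
            if (p % one != 0 && p < 0) q -= 1;   /* floor(p / 2^30) */
            whole += q;
            frac += p - q * one;                 /* in [0, 2^30) */
        }
        y[k] = sat32(whole + (frac + one / 2) / one);   /* round half up */
    }
    return out_len;
}
```

Because output \(y_k\) only reads \(x_k\) through \(x_{k+2P}\), writing \(y_k\) over \(x_k\) never destroys an input that a later output still needs, which is consistent with the in-place note above.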
headroom_t xs3_vect_s32_convolve_same(int32_t y[], const int32_t x[], const int32_t b_q30[], const unsigned x_length, const unsigned b_length, const pad_mode_e padding_mode)¶
Convolve a 32-bit vector with a short kernel.
32-bit input vector \(\bar x\) is convolved with a short fixed-point kernel \(\bar b\) to produce 32-bit output vector \(\bar y\). In other words, this function applies the \(K\)th-order FIR filter with coefficients given by \(\bar b\) to the input signal \(\bar x\). The convolution mode is “same” in that the input vector is effectively padded such that the input and output vectors are the same length. The padding behavior is one of those given by pad_mode_e.
The maximum filter order \(K\) supported by this function is \(7\).
y[] and x[] are the output and input vectors \(\bar y\) and \(\bar x\) respectively.
b_q30[] is the vector \(\bar b\) of filter coefficients. The coefficients of \(\bar b\) are encoded in a Q2.30 fixed-point format. The effective value of the \(i\)th coefficient is then \(b_i \cdot 2^{-30}\).
x_length is the length \(N\) of \(\bar x\) and \(\bar y\) in elements.
b_length is the length \(K\) of \(\bar b\) in elements (i.e. the number of filter taps). b_length must be one of \( \{ 1, 3, 5, 7 \} \).
padding_mode is one of the values from the pad_mode_e enumeration. The padding mode indicates the filter input values for filter taps that have extended beyond the bounds of the input vector \(\bar x\). See pad_mode_e for a list of supported padding modes and associated behaviors. Operation Performed:
 \[\begin{split}\begin{align*} & \tilde{x}_i = \begin{cases} \text{determined by padding mode} & i \lt 0 \\ \text{determined by padding mode} & i \ge N \\ x_i & otherwise \end{cases} \\ & y_k \leftarrow \sum_{l=0}^{K-1} (\tilde{x}_{(k+l-P)} \cdot b_l \cdot 2^{-30} ) \\ & \qquad\text{ for }k\in 0\ ...\ (N-1) \\ & \qquad\text{ where }P = \lfloor K/2 \rfloor \end{align*}\end{split}\]
 Additional Details

To avoid the possibility of saturating any output elements, \(\bar b\) may be constrained such that \( \sum_{i=0}^{K-1} \left|b_i\right| \leq 2^{30} \).
Note
Unlike xs3_vect_s32_convolve_valid(), this operation cannot be performed safely in-place on x[].
 Parameters
y – [out] Output vector \(\bar y\)
x – [in] Input vector \(\bar x\)
b_q30 – [in] Filter coefficient vector \(\bar b\)
x_length – [in] The number of elements \(N\) in vector \(\bar x\)
b_length – [in] The number of elements \(K\) in \(\bar b\)
padding_mode – [in] The padding mode to be applied at signal boundaries
 Throws
ET_LOAD_STORE – Raised if x, y or b_q30 is not word-aligned (See Note: Vector Alignment)

void xs3_vect_s32_merge_accs(int32_t a[], const xs3_split_acc_s32_t b[], const unsigned length)¶
Merge a vector of split 32-bit accumulators into a vector of int32_t’s.
Convert a vector of xs3_split_acc_s32_t into a vector of int32_t. This is useful when a function (e.g. xs3_mat_mul_s8_x_s8_yield_s32) outputs a vector of accumulators in the XS3 VPU’s native split 32-bit format, which has the upper half of each accumulator in the first 32 bytes and the lower half in the following 32 bytes.
This function is most efficient (in terms of cycles/accumulator) when length is a multiple of 16. In any case, length will be rounded up such that a multiple of 16 accumulators will always be merged.
This function can safely merge accumulators in-place.
 Parameters
a – [out] Output vector of int32_t
b – [in] Input vector of xs3_split_acc_s32_t
length – [in] Number of accumulators to merge
 Throws
ET_LOAD_STORE – Raised if b or a is not word-aligned (See Note: Vector Alignment)
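The merge can be modelled in portable C to make the split layout concrete. This is a minimal sketch, assuming a hypothetical block type split_acc_block_t that follows the layout described above (16 upper halves in the first 32 bytes, 16 lower halves in the next 32 bytes). It is not claimed to be the definition of xs3_split_acc_s32_t, and unlike the library function it assumes a and b do not overlap.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one block of 16 split accumulators. */
typedef struct {
    int16_t  hi[16];   /* upper 16 bits of each accumulator (first 32 bytes) */
    uint16_t lo[16];   /* lower 16 bits of each accumulator (next 32 bytes)  */
} split_acc_block_t;

_Static_assert(sizeof(split_acc_block_t) == 64, "two 32-byte halves");

/* Sketch of the merge: a[] must hold length rounded up to a multiple of 16. */
static void merge_accs_model(int32_t a[], const split_acc_block_t b[], unsigned length)
{
    const unsigned blocks = (length + 15u) / 16u;
    for (unsigned blk = 0; blk < blocks; blk++) {
        for (unsigned i = 0; i < 16; i++) {
            uint32_t bits = ((uint32_t)(uint16_t)b[blk].hi[i] << 16) | b[blk].lo[i];
            a[16 * blk + i] = (int32_t)bits;  /* two's-complement reinterpretation */
        }
    }
}
```

Because the whole block is always processed, merging 3 accumulators still writes 16 outputs, consistent with the rounding-up behaviour described above.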

void xs3_vect_s32_split_accs(xs3_split_acc_s32_t a[], const int32_t b[], const unsigned length)¶
Split a vector of int32_t’s into a vector of xs3_split_acc_s32_t.
Convert a vector of int32_t into a vector of xs3_split_acc_s32_t, the native format for the XS3 VPU’s 32-bit accumulators. This is useful when a function (e.g. xs3_mat_mul_s8_x_s8_yield_s32) takes in a vector of accumulators in that native format.
This function is most efficient (in terms of cycles/accumulator) when length is a multiple of 16. In any case, length will be rounded up such that a multiple of 16 accumulators will always be split.
This function can safely split accumulators in-place.
 Parameters
a – [out] Output vector of xs3_split_acc_s32_t
b – [in] Input vector of int32_t
length – [in] Number of accumulators to split
 Throws
ET_LOAD_STORE – Raised if b or a is not word-aligned (See Note: Vector Alignment)
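The inverse operation can be sketched the same way. This assumes the same hypothetical split_acc_block_t layout (upper halves first, lower halves second) rather than the library's actual type, and requires b[] to hold length rounded up to a multiple of 16 elements.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one block of 16 split accumulators. */
typedef struct {
    int16_t  hi[16];   /* upper 16 bits, signed   */
    uint16_t lo[16];   /* lower 16 bits, unsigned */
} split_acc_block_t;

/* Sketch of the split: each int32_t is cut into its upper and lower 16 bits. */
static void split_accs_model(split_acc_block_t a[], const int32_t b[], unsigned length)
{
    const unsigned blocks = (length + 15u) / 16u;
    for (unsigned blk = 0; blk < blocks; blk++) {
        for (unsigned i = 0; i < 16; i++) {
            uint32_t bits = (uint32_t)b[16 * blk + i];
            a[blk].hi[i] = (int16_t)(bits >> 16);      /* two's-complement narrowing */
            a[blk].lo[i] = (uint16_t)(bits & 0xFFFFu);
        }
    }
}
```

Splitting and then merging with the model shown for xs3_vect_s32_merge_accs() returns the original values, which makes the pair convenient for round-trip tests.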
XS3 32-Bit Prepare Functions¶

void xs3_vect_complex_s32_macc_prepare(exponent_t *new_acc_exp, right_shift_t *acc_shr, right_shift_t *b_shr, right_shift_t *c_shr, const exponent_t acc_exp, const exponent_t b_exp, const exponent_t c_exp, const exponent_t acc_hr, const headroom_t b_hr, const headroom_t c_hr)¶
Obtain the output exponent and shifts needed by xs3_vect_complex_s32_macc().
This function is used in conjunction with xs3_vect_complex_s32_macc() to perform an element-wise multiply-accumulate of complex 32-bit BFP vectors.
This function computes new_acc_exp, acc_shr, b_shr and c_shr, which are selected to maximize precision in the resulting accumulator vector without causing saturation of final or intermediate values. Normally the caller will pass these outputs to their corresponding inputs of xs3_vect_complex_s32_macc().
acc_exp is the exponent associated with the accumulator mantissa vector \(\bar a\) prior to the operation, whereas new_acc_exp is the exponent corresponding to the updated accumulator vector.
b_exp and c_exp are the exponents associated with the complex input mantissa vectors \(\bar b\) and \(\bar c\) respectively.
acc_hr, b_hr and c_hr are the headrooms of \(\bar a\), \(\bar b\) and \(\bar c\) respectively. If the headroom of any of these vectors is unknown, it can be obtained by calling xs3_vect_complex_s32_headroom(). Alternatively, the value 0 can always be safely used (but may result in reduced precision).
Adjusting Output Exponents

If a specific output exponent desired_exp is needed for the result (e.g. for emulating fixed-point arithmetic), the acc_shr, b_shr and c_shr produced by this function can be adjusted according to the following:

    // Presumed to be set somewhere
    exponent_t acc_exp, b_exp, c_exp;
    headroom_t acc_hr, b_hr, c_hr;
    exponent_t desired_exp;
    ...
    // Call prepare
    right_shift_t acc_shr, b_shr, c_shr;
    xs3_vect_complex_s32_macc_prepare(&acc_exp, &acc_shr, &b_shr, &c_shr,
                                      acc_exp, b_exp, c_exp,
                                      acc_hr, b_hr, c_hr);
    // Modify results
    right_shift_t mant_shr = desired_exp - acc_exp;
    acc_exp += mant_shr;
    acc_shr += mant_shr;
    // The product's exponent depends on b_shr + c_shr, so only one of them is adjusted
    b_shr += mant_shr;
    // acc_shr, b_shr and c_shr may now be used in a call to xs3_vect_complex_s32_macc()
When applying the above adjustment, the following conditions should be maintained:
acc_shr >= -acc_hr (Shifting any further left may cause saturation)
b_shr >= -b_hr (Shifting any further left may cause saturation)
c_shr >= -c_hr (Shifting any further left may cause saturation)
It is up to the user to ensure any such modification does not result in saturation or unacceptable loss of precision.
 Parameters
new_acc_exp – [out] Exponent associated with output mantissa vector \(\bar a\) (after macc)
acc_shr – [out] Signed arithmetic right-shift used for \(\bar a\) in xs3_vect_complex_s32_macc()
b_shr – [out] Signed arithmetic right-shift used for \(\bar b\) in xs3_vect_complex_s32_macc()
c_shr – [out] Signed arithmetic right-shift used for \(\bar c\) in xs3_vect_complex_s32_macc()
acc_exp – [in] Exponent associated with input mantissa vector \(\bar a\) (before macc)
b_exp – [in] Exponent associated with input mantissa vector \(\bar b\)
c_exp – [in] Exponent associated with input mantissa vector \(\bar c\)
acc_hr – [in] Headroom of input mantissa vector \(\bar a\) (before macc)
b_hr – [in] Headroom of input mantissa vector \(\bar b\)
c_hr – [in] Headroom of input mantissa vector \(\bar c\)
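When validating the output of the prepare/macc pair, a double-precision oracle for what the BFP operation approximates is useful. The sketch below is a minimal example with hypothetical types (cmant_t, cdouble_t) and helpers (pow2i(), macc_reference()); none of them are part of the library. To compare against a library result, decode each output mantissa as \(a_k \cdot 2^{new\_acc\_exp}\) and allow for rounding error.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical complex mantissa pair and complex double, local to this sketch. */
typedef struct { int32_t re, im; } cmant_t;
typedef struct { double re, im; } cdouble_t;

/* 2^e by repeated doubling or halving (exact for moderate e, no libm needed). */
static double pow2i(int e)
{
    double r = 1.0;
    for (; e > 0; e--) r *= 2.0;
    for (; e < 0; e++) r *= 0.5;
    return r;
}

/* Double-precision reference for one element of acc <- acc + b * c,
   where each operand is a complex mantissa paired with its exponent. */
static cdouble_t macc_reference(cmant_t acc, int acc_exp, cmant_t b, int b_exp,
                                cmant_t c, int c_exp)
{
    const double sa = pow2i(acc_exp), sb = pow2i(b_exp), sc = pow2i(c_exp);
    const double br = b.re * sb, bi = b.im * sb;
    const double cr = c.re * sc, ci = c.im * sc;
    cdouble_t r = {
        acc.re * sa + (br * cr - bi * ci),   /* real part of acc + b*c      */
        acc.im * sa + (br * ci + bi * cr),   /* imaginary part of acc + b*c */
    };
    return r;
}
```

For example, with acc = 1 + 2i, b = (6 + 8i)·2^-1 = 3 + 4i and c = 5 + 6i, the reference result is -8 + 40i.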

void xs3_vect_complex_s32_mag_prepare(exponent_t *a_exp, right_shift_t *b_shr, const exponent_t b_exp, const headroom_t b_hr)¶
Obtain the output exponent and input shift used by xs3_vect_complex_s32_mag() and xs3_vect_complex_s16_mag().
This function is used in conjunction with xs3_vect_complex_s32_mag() to compute the magnitude of each element of a complex 32-bit BFP vector.
This function computes a_exp and b_shr.
a_exp is the exponent associated with mantissa vector \(\bar a\), and is chosen to maximize precision when elements of \(\bar a\) are computed. The a_exp chosen by this function is derived from the exponent and headroom associated with the input vector.
b_shr is the shift parameter required by xs3_vect_complex_s32_mag() to achieve the chosen output exponent a_exp.
b_exp is the exponent associated with the input mantissa vector \(\bar b\).
b_hr is the headroom of \(\bar b\). If the headroom of \(\bar b\) is unknown it can be calculated using xs3_vect_complex_s32_headroom(). Alternatively, the value 0 can always be safely used (but may result in reduced precision).
Adjusting Output Exponents

If a specific output exponent desired_exp is needed for the result (e.g. for emulating fixed-point arithmetic), the b_shr produced by this function can be adjusted according to the following:

    exponent_t desired_exp = ...; // Value known a priori
    right_shift_t new_b_shr = b_shr + (desired_exp - a_exp);
When applying the above adjustment, the following condition should be maintained:
b_hr + b_shr >= 0
Using larger values than strictly necessary for b_shr may result in unnecessary underflows or loss of precision.
 Parameters
a_exp – [out] Output exponent associated with output mantissa vector \(\bar a\)
b_shr – [out] Signed arithmetic right-shift for \(\bar b\) used by xs3_vect_complex_s32_mag()
b_exp – [in] Exponent associated with input mantissa vector \(\bar b\)
b_hr – [in] Headroom of input mantissa vector \(\bar b\)
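The shift outputs of these prepare functions are signed arithmetic right-shifts, so a negative value means a left shift. The sketch below uses a hypothetical helper, shr_s32(), to show why the condition b_hr + b_shr >= 0 matters: a left shift by at most the headroom is exact, while shifting one bit further can saturate. The saturation to [-INT32_MAX, INT32_MAX] is an assumption modelled on the symmetric saturation described for the VPU, and right shifts of negative values are assumed to be arithmetic (as with GCC and Clang).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: apply a signed arithmetic right-shift to a 32-bit mantissa.
   shr > 0 shifts right; shr < 0 shifts left with symmetric saturation. */
static int32_t shr_s32(int32_t x, int shr)
{
    if (shr > 31)
        shr = 31;                       /* x >> 31 already yields 0 or -1 */
    if (shr >= 0)
        return x >> shr;

    int left = (shr < -31) ? 31 : -shr; /* any non-zero x saturates by 31 bits */
    int64_t y = (int64_t)x * ((int64_t)1 << left);
    if (y > INT32_MAX)  return INT32_MAX;
    if (y < -INT32_MAX) return -INT32_MAX;
    return (int32_t)y;
}
```

For instance, 0x00FF0000 has a headroom of 7 bits: shr_s32(0x00FF0000, -7) gives 0x7F800000 exactly, whereas shr_s32(0x00FF0000, -8) saturates to INT32_MAX.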

void xs3_vect_complex_s32_mul_prepare(exponent_t *a_exp, right_shift_t *b_shr, right_shift_t *c_shr, const exponent_t b_exp, const exponent_t c_exp, const headroom_t b_hr, const headroom_t c_hr)¶
Obtain the output exponent and input shifts used by xs3_vect_complex_s32_mul() and xs3_vect_complex_s32_conj_mul().
This function is used in conjunction with xs3_vect_complex_s32_mul() to perform a complex element-wise multiplication of two complex 32-bit BFP vectors.
This function computes a_exp, b_shr and c_shr.
a_exp is the exponent associated with mantissa vector \(\bar a\), and must be chosen to be large enough to avoid overflow when elements of \(\bar a\) are computed. To maximize precision, this function chooses a_exp to be the smallest exponent known to avoid saturation (see exception below). The a_exp chosen by this function is derived from the exponents and headrooms associated with the input vectors.
b_shr and c_shr are the shift parameters required by xs3_vect_complex_s32_mul() to achieve the chosen output exponent a_exp.
b_exp and c_exp are the exponents associated with the input mantissa vectors \(\bar b\) and \(\bar c\) respectively.
b_hr and c_hr are the headrooms of \(\bar b\) and \(\bar c\) respectively. If the headroom of \(\bar b\) or \(\bar c\) is unknown, it can be obtained by calling xs3_vect_complex_s32_headroom(). Alternatively, the value 0 can always be safely used (but may result in reduced precision).
Adjusting Output Exponents

If a specific output exponent desired_exp is needed for the result (e.g. for emulating fixed-point arithmetic), the b_shr and c_shr produced by this function can be adjusted. Because both shifts are applied to factors of a product, the output exponent moves by the sum of their changes, so the adjustment is applied to only one of them:

    exponent_t desired_exp = ...; // Value known a priori
    right_shift_t new_b_shr = b_shr + (desired_exp - a_exp);
    right_shift_t new_c_shr = c_shr; // Or adjust c_shr instead, leaving b_shr unchanged
When applying the above adjustment, the following conditions should be maintained:
b_hr + b_shr >= 0
c_hr + c_shr >= 0
Be aware that using smaller values than strictly necessary for b_shr and c_shr can result in saturation, and using larger values may result in unnecessary underflows or loss of precision.
 Notes

Using the outputs of this function, an output mantissa which would otherwise be INT32_MIN will instead saturate to -INT32_MAX. This is due to the symmetric saturation logic employed by the VPU and is a hardware feature. This is a corner case which is usually unlikely and results in 1 LSb of error when it occurs.
 Parameters
a_exp – [out] Exponent associated with output mantissa vector \(\bar a\)
b_shr – [out] Signed arithmetic right-shift for \(\bar b\) used by xs3_vect_complex_s32_mul()
c_shr – [out] Signed arithmetic right-shift for \(\bar c\) used by xs3_vect_complex_s32_mul()
b_exp – [in] Exponent associated with input mantissa vector \(\bar b\)
c_exp – [in] Exponent associated with input mantissa vector \(\bar c\)
b_hr – [in] Headroom of input mantissa vector \(\bar b\)
c_hr – [in] Headroom of input mantissa vector \(\bar c\)
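The corner case in the note above can be illustrated with a small model of symmetric saturation. This is a sketch of the behaviour the note describes, not the VPU's implementation; sat_sym_s32() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of symmetric 32-bit saturation: results are clamped to
   [-INT32_MAX, INT32_MAX], so INT32_MIN itself is never produced. */
static int32_t sat_sym_s32(int64_t x)
{
    if (x > INT32_MAX)  return INT32_MAX;
    if (x < -INT32_MAX) return -INT32_MAX;
    return (int32_t)x;
}
```

A result that would be exactly INT32_MIN comes out as -INT32_MAX, one LSb away, while all other in-range values pass through unchanged.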

void xs3_vect_complex_s32_real_mul_prepare(exponent_t *a_exp, right_shift_t *b_shr, right_shift_t *c_shr, const exponent_t b_exp, const exponent_t c_exp, const headroom_t b_hr, const headroom_t c_hr)¶
Obtain the output exponent and input shifts used by xs3_vect_complex_s32_real_mul().
This function is used in conjunction with xs3_vect_complex_s32_real_mul() to perform the element-wise multiplication of a complex 32-bit BFP vector by a real 32-bit BFP vector.
This function computes a_exp, b_shr and c_shr.
a_exp is the exponent associated with mantissa vector \(\bar a\), and must be chosen to be large enough to avoid overflow when elements of \(\bar a\) are computed. To maximize precision, this function chooses a_exp to be the smallest exponent known to avoid saturation (see exception below). The a_exp chosen by this function is derived from the exponents and headrooms associated with the input vectors.
b_shr and c_shr are the shift parameters required by xs3_vect_complex_s32_real_mul() to achieve the chosen output exponent a_exp.
b_exp and c_exp are the exponents associated with the input mantissa vectors \(\bar b\) and \(\bar c\) respectively.
b_hr and c_hr are the headrooms of \(\bar b\) and \(\bar c\) respectively. If the headroom of \(\bar b\) or \(\bar c\) is unknown, it can be obtained by calling xs3_vect_complex_s32_headroom() or xs3_vect_s32_headroom() respectively. Alternatively, the value 0 can always be safely used (but may result in reduced precision).
Adjusting Output Exponents

If a specific output exponent desired_exp is needed for the result (e.g. for emulating fixed-point arithmetic), the b_shr and c_shr produced by this function can be adjusted. Because both shifts are applied to factors of a product, the output exponent moves by the sum of their changes, so the adjustment is applied to only one of them:

    exponent_t desired_exp = ...; // Value known a priori
    right_shift_t new_b_shr = b_shr + (desired_exp - a_exp);
    right_shift_t new_c_shr = c_shr; // Or adjust c_shr instead, leaving b_shr unchanged
When applying the above adjustment, the following conditions should be maintained:
b_hr + b_shr >= 0
c_hr + c_shr >= 0
Be aware that using smaller values than strictly necessary for b_shr and c_shr can result in saturation, and using larger values may result in unnecessary underflows or loss of precision.
 Notes

Using the outputs of this function, an output mantissa which would otherwise be INT32_MIN will instead saturate to -INT32_MAX. This is due to the symmetric saturation logic employed by the VPU and is a hardware feature. This is a corner case which is usually unlikely and results in 1 LSb of error when it occurs.
 Parameters
a_exp – [out] Output exponent associated with \(\bar a\)
b_shr – [out] Signed arithmetic right-shift for \(\bar b\) used by xs3_vect_complex_s32_real_mul()
c_shr – [out] Signed arithmetic right-shift for \(\bar c\) used by xs3_vect_complex_s32_real_mul()
b_exp – [in] Exponent associated with \(\bar b\)
c_exp – [in] Exponent associated with \(\bar c\)
b_hr – [in] Headroom of mantissa vector \(\bar b\)
c_hr – [in] Headroom of mantissa vector \(\bar c\)
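The exponent bookkeeping behind the multiplicative prepare functions can be checked with a small model. In the sketch below (hypothetical names; the fixed product shift prod_shr stands in for whatever scaling the library applies internally, and no rounding or saturation is modelled), each of b_shr, c_shr and prod_shr adds directly to the output exponent. Raising only b_shr by one raises the output exponent by one; raising both b_shr and c_shr by one raises it by two.

```c
#include <assert.h>
#include <stdint.h>

/* 2^e by repeated doubling or halving (exact for moderate e). */
static double pow2i(int e)
{
    double r = 1.0;
    for (; e > 0; e--) r *= 2.0;
    for (; e < 0; e++) r *= 0.5;
    return r;
}

/* Hypothetical model of one real product: inputs shifted right by b_shr and c_shr
   (non-negative here), then the 64-bit product shifted right by prod_shr. */
static int32_t mul_model(int32_t b, int32_t c, int b_shr, int c_shr, int prod_shr)
{
    int64_t p = (int64_t)(b >> b_shr) * (c >> c_shr);
    return (int32_t)(p >> prod_shr);
}

/* Exponent of mul_model()'s result: every exponent and shift adds directly. */
static int mul_model_exp(int b_exp, int c_exp, int b_shr, int c_shr, int prod_shr)
{
    return b_exp + c_exp + b_shr + c_shr + prod_shr;
}
```

With b = 3·2^20 and c = 5·2^20 (both with exponent -20, i.e. the values 3 and 5), shifts (2, 2, 30) give mantissa 960 at exponent -6, which decodes to 15.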

void xs3_vect_complex_s32_scale_prepare(exponent_t *a_exp, right_shift_t *b_shr, right_shift_t *c_shr, const exponent_t b_exp, const exponent_t c_exp, const headroom_t b_hr, const headroom_t c_hr)¶
Obtain the output exponent and input shifts used by xs3_vect_complex_s32_scale().
This function is used in conjunction with xs3_vect_complex_s32_scale() to perform a complex multiplication of a complex 32-bit BFP vector by a complex 32-bit scalar.
This function computes a_exp, b_shr and c_shr.
a_exp is the exponent associated with mantissa vector \(\bar a\), and must be chosen to be large enough to avoid overflow when elements of \(\bar a\) are computed. To maximize precision, this function chooses a_exp to be the smallest exponent known to avoid saturation (see exception below). The a_exp chosen by this function is derived from the exponents and headrooms associated with the input vectors.
b_shr and c_shr are the shift parameters required by xs3_vect_complex_s32_scale() to achieve the chosen output exponent a_exp.
b_exp and c_exp are the exponents associated with the input mantissa vectors \(\bar b\) and \(\bar c\) respectively.
b_hr and c_hr are the headrooms of \(\bar b\) and \(\bar c\) respectively. If the headroom of \(\bar b\) or \(\bar c\) is unknown, it can be obtained by calling xs3_vect_complex_s32_headroom(). Alternatively, the value 0 can always be safely used (but may result in reduced precision).
Adjusting Output Exponents

If a specific output exponent desired_exp is needed for the result (e.g. for emulating fixed-point arithmetic), the b_shr and c_shr produced by this function can be adjusted. Because both shifts are applied to factors of a product, the output exponent moves by the sum of their changes, so the adjustment is applied to only one of them:

    exponent_t desired_exp = ...; // Value known a priori
    right_shift_t new_b_shr = b_shr + (desired_exp - a_exp);
    right_shift_t new_c_shr = c_shr; // Or adjust c_shr instead, leaving b_shr unchanged
When applying the above adjustment, the following conditions should be maintained:
b_hr + b_shr >= 0
c_hr + c_shr >= 0
Be aware that using smaller values than strictly necessary for b_shr and c_shr can result in saturation, and using larger values may result in unnecessary underflows or loss of precision.
 Notes

Using the outputs of this function, an output mantissa which would otherwise be INT32_MIN will instead saturate to -INT32_MAX. This is due to the symmetric saturation logic employed by the VPU and is a hardware feature. This is a corner case which is usually unlikely and results in 1 LSb of error when it occurs.
 Parameters
a_exp – [out] Exponent associated with output mantissa vector \(\bar a\)
b_shr – [out] Signed arithmetic right-shift for \(\bar b\) used by xs3_vect_complex_s32_scale()
c_shr – [out] Signed arithmetic right-shift for \(\bar c\) used by xs3_vect_complex_s32_scale()
b_exp – [in] Exponent associated with input mantissa vector \(\bar b\)
c_exp – [in] Exponent associated with input mantissa vector \(\bar c\)
b_hr – [in] Headroom of input mantissa vector \(\bar b\)
c_hr – [in] Headroom of input mantissa vector \(\bar c\)

void xs3_vect_complex_s32_squared_mag_prepare(exponent_t *a_exp, right_shift_t *b_shr, const exponent_t b_exp, const headroom_t b_hr)¶
Obtain the output exponent and input shift used by xs3_vect_complex_s32_squared_mag().
This function is used in conjunction with xs3_vect_complex_s32_squared_mag() to compute the squared magnitude of each element of a complex 32-bit BFP vector.
This function computes a_exp and b_shr.
a_exp is the exponent associated with mantissa vector \(\bar a\), and is chosen to maximize precision when elements of \(\bar a\) are computed. The a_exp chosen by this function is derived from the exponent and headroom associated with the input vector.
b_shr is the shift parameter required by xs3_vect_complex_s32_squared_mag() to achieve the chosen output exponent a_exp.
b_exp is the exponent associated with the input mantissa vector \(\bar b\).
b_hr is the headroom of \(\bar b\). If the headroom of \(\bar b\) is unknown it can be calculated using xs3_vect_complex_s32_headroom(). Alternatively, the value 0 can always be safely used (but may result in reduced precision).
Adjusting Output Exponents

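If a specific output exponent desired_exp is needed for the result (e.g. for emulating fixed-point arithmetic), the b_shr produced by this function can be adjusted. Because the output is quadratic in the input elements, changing b_shr by s changes the output exponent by 2s, so desired_exp can only be reached exactly when desired_exp - a_exp is even:

    exponent_t desired_exp = ...; // Value known a priori (desired_exp - a_exp must be even)
    right_shift_t new_b_shr = b_shr + (desired_exp - a_exp) / 2;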
When applying the above adjustment, the following condition should be maintained:
b_hr + b_shr >= 0
Using larger values than strictly necessary for b_shr may result in unnecessary underflows or loss of precision.
 Parameters
a_exp – [out] Output exponent associated with output mantissa vector \(\bar a\)
b_shr – [out] Signed arithmetic right-shift for \(\bar b\) used by xs3_vect_complex_s32_squared_mag()
b_exp – [in] Exponent associated with input mantissa vector \(\bar b\)
b_hr – [in] Headroom of input mantissa vector \(\bar b\)

void xs3_vect_complex_s32_sum_prepare(exponent_t *a_exp, right_shift_t *b_shr, const exponent_t b_exp, const headroom_t b_hr, const unsigned length)¶
Obtain the output exponent and input shift used by xs3_vect_complex_s32_sum().
This function is used in conjunction with xs3_vect_complex_s32_sum() to compute the sum of elements of a complex 32-bit BFP vector.
This function computes a_exp and b_shr.
a_exp is the exponent associated with the 64-bit mantissa \(a\) returned by xs3_vect_complex_s32_sum(), and must be chosen to be large enough to avoid saturation when \(a\) is computed. To maximize precision, this function chooses a_exp to be the smallest exponent known to avoid saturation. The a_exp chosen by this function is derived from the exponent and headroom associated with the input vector.
b_shr is the shift parameter required by xs3_vect_complex_s32_sum() to achieve the chosen output exponent a_exp.
b_exp is the exponent associated with the input mantissa vector \(\bar b\).
b_hr is the headroom of \(\bar b\). If the headroom of \(\bar b\) is unknown it can be calculated using xs3_vect_complex_s32_headroom(). Alternatively, the value 0 can always be safely used (but may result in reduced precision).
length is the number of elements in the input mantissa vector \(\bar b\).
Adjusting Output Exponents

If a specific output exponent desired_exp is needed for the result (e.g. for emulating fixed-point arithmetic), the b_shr produced by this function can be adjusted according to the following:

    exponent_t desired_exp = ...; // Value known a priori
    right_shift_t new_b_shr = b_shr + (desired_exp - a_exp);
When applying the above adjustment, the following condition should be maintained:
b_hr + b_shr >= 0
Be aware that using smaller values than strictly necessary for b_shr can result in saturation, and using larger values may result in unnecessary underflows or loss of precision.
 Parameters
a_exp – [out] Exponent associated with output mantissa \(a\)
b_shr – [out] Signed arithmetic right-shift for \(\bar b\) used by xs3_vect_complex_s32_sum()
b_exp – [in] Exponent associated with input mantissa vector \(\bar b\)
b_hr – [in] Headroom of input mantissa vector \(\bar b\)
length – [in] Number of elements in \(\bar b\)

void xs3_vect_s32_add_prepare(exponent_t *a_exp, right_shift_t *b_shr, right_shift_t *c_shr, const exponent_t b_exp, const exponent_t c_exp, const headroom_t b_hr, const headroom_t c_hr)¶
Obtain the output exponent and input shifts to add or subtract two 16- or 32-bit BFP vectors.
The block floatingpoint functions in this library which add or subtract vectors are of the general form:
\( \bar{a} \cdot 2^{a\_exp} = \bar{b}\cdot 2^{b\_exp} \pm \bar{c}\cdot 2^{c\_exp} \)
\(\bar b\) and \(\bar c\) are the input mantissa vectors with exponents \(b\_exp\) and \(c\_exp\), which are shared by each element of their respective vectors. \(\bar a\) is the output mantissa vector with exponent \(a\_exp\). Two additional properties, \(b\_hr\) and \(c\_hr\), which are the headroom of mantissa vectors \(\bar b\) and \(\bar c\) respectively, are required by this function.
In order to avoid any overflows in the output mantissas, the output exponent \(a\_exp\) must be chosen such that the largest (in the sense of absolute value) possible output mantissa will fit into the allotted space (e.g. 32 bits for xs3_vect_s32_add()). Once \(a\_exp\) is chosen, the input bit-shifts \(b\_shr\) and \(c\_shr\) are calculated to achieve that resulting exponent.
This function chooses \(a\_exp\) to be the minimum exponent known to avoid overflows, given the input exponents ( \(b\_exp\) and \(c\_exp\)) and input headroom ( \(b\_hr\) and \(c\_hr\)).
This function is used to calculate the output exponent and input bit-shifts for each of the functions listed under See also.
 Adjusting Output Exponents

If a specific output exponent desired_exp is needed for the result (e.g. for emulating fixed-point arithmetic), the b_shr and c_shr produced by this function can be adjusted according to the following:

    exponent_t desired_exp = ...; // Value known a priori
    right_shift_t new_b_shr = b_shr + (desired_exp - a_exp);
    right_shift_t new_c_shr = c_shr + (desired_exp - a_exp);
When applying the above adjustment, the following conditions should be maintained:
b_hr + b_shr >= 0
c_hr + c_shr >= 0
Be aware that using smaller values than strictly necessary for b_shr and c_shr can result in saturation, and using larger values may result in unnecessary underflows or loss of precision.
 Notes

If \(b\_hr\) or \(c\_hr\) are unknown, they can be calculated using the appropriate headroom function (e.g. xs3_vect_complex_s16_headroom() for complex 16-bit vectors), or the value 0 can always be safely used (but may result in reduced precision).
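For reference, the headroom of a 32-bit mantissa is the number of redundant leading sign bits, i.e. how far it can be shifted left without overflowing. The portable sketch below uses hypothetical names and is not the library's headroom function; the headroom of a vector is the minimum over its elements (for a complex vector, over both real and imaginary parts).

```c
#include <assert.h>
#include <stdint.h>

/* Headroom of one int32: redundant leading sign bits (0 for INT32_MAX and INT32_MIN). */
static unsigned s32_headroom(int32_t x)
{
    /* For negative x, invert so that leading sign bits become leading zeros. */
    uint32_t u = (x < 0) ? ~(uint32_t)x : (uint32_t)x;
    unsigned hr = 0;
    for (uint32_t mask = UINT32_C(1) << 30; mask != 0 && !(u & mask); mask >>= 1)
        hr++;
    return hr;
}

/* Headroom of a vector: the smallest headroom of any element. */
static unsigned vect_s32_headroom_sketch(const int32_t v[], unsigned length)
{
    unsigned hr = 31;
    for (unsigned i = 0; i < length; i++) {
        unsigned h = s32_headroom(v[i]);
        if (h < hr) hr = h;
    }
    return hr;
}
```

Using 0 instead of the true headroom is always safe because it is the smallest value this function can return.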
See also
xs3_vect_s16_add, xs3_vect_s32_add, xs3_vect_s16_sub, xs3_vect_s32_sub, xs3_vect_complex_s16_add, xs3_vect_complex_s32_add, xs3_vect_complex_s16_sub, xs3_vect_complex_s32_sub
 Parameters
a_exp – [out] Output exponent associated with output mantissa vector \(\bar a\)
b_shr – [out] Signed arithmetic right-shift to be applied to elements of \(\bar b\). Used by the function which computes the output mantissas \(\bar a\)
c_shr – [out] Signed arithmetic right-shift to be applied to elements of \(\bar c\). Used by the function which computes the output mantissas \(\bar a\)
b_exp – [in] Exponent of BFP vector \(\bar b\)
c_exp – [in] Exponent of BFP vector \(\bar c\)
b_hr – [in] Headroom of BFP vector \(\bar b\)
c_hr – [in] Headroom of BFP vector \(\bar c\)
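The choice of output exponent described for this function can be made concrete with a sketch. The rule below is an assumption derived from the bounds described above, not necessarily the library's exact algorithm, and add_prepare_sketch() is a hypothetical name. A mantissa with headroom hr is at most \(2^{31-hr}\) in magnitude, so each input's value is bounded by \(2^{31 + exp - hr}\); the sum is at most twice the larger bound, and one extra bit of exponent makes it fit in an int32 mantissa.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: choose a_exp from exponents and headrooms, then derive
   the right-shifts that move b and c to that exponent. */
static void add_prepare_sketch(int *a_exp, int *b_shr, int *c_shr,
                               int b_exp, int c_exp, int b_hr, int c_hr)
{
    const int b_top = b_exp - b_hr;   /* |b value| <= 2^(31 + b_top) */
    const int c_top = c_exp - c_hr;   /* |c value| <= 2^(31 + c_top) */
    *a_exp = ((b_top > c_top) ? b_top : c_top) + 1;
    *b_shr = *a_exp - b_exp;          /* satisfies b_hr + b_shr >= 1 */
    *c_shr = *a_exp - c_exp;          /* satisfies c_hr + c_shr >= 1 */
}
```

For b_exp = -10, b_hr = 3, c_exp = -8 and c_hr = 0, the sketch yields a_exp = -7, b_shr = 3 and c_shr = 1; the adjustment described above then adds the same offset to both shifts, which is valid here because addition is linear in each input.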

void xs3_vect_s32_clip_prepare(exponent_t *a_exp, right_shift_t *b_shr, int32_t *lower_bound, int32_t *upper_bound, const exponent_t b_exp, const exponent_t bound_exp, const headroom_t b_hr)¶
Obtain the output exponent, input shift and modified bounds used by xs3_vect_s32_clip().
This function is used in conjunction with xs3_vect_s32_clip() to bound the elements of a 32-bit BFP vector to a specified range.
This function computes a_exp, b_shr, lower_bound and upper_bound.
a_exp is the exponent associated with the 32-bit mantissa vector \(\bar a\) computed by xs3_vect_s32_clip().
b_shr is the shift parameter required by xs3_vect_s32_clip() to achieve the output exponent a_exp.
lower_bound and upper_bound are the 32-bit mantissas which indicate the lower and upper clipping bounds respectively. The values are modified by this function, and the resulting values should be passed along to xs3_vect_s32_clip().
b_exp is the exponent associated with the input mantissa vector \(\bar b\).
bound_exp is the exponent associated with the bound mantissas lower_bound and upper_bound.
b_hr is the headroom of \(\bar b\). If unknown, it can be obtained using xs3_vect_s32_headroom(). Alternatively, the value 0 can always be safely used (but may result in reduced precision).
 Parameters
a_exp – [out] Exponent associated with output mantissa vector \(\bar a\)
b_shr – [out] Signed arithmetic right-shift for \(\bar b\) used by xs3_vect_s32_clip()
lower_bound – [inout] Lower bound of clipping range
upper_bound – [inout] Upper bound of clipping range
b_exp – [in] Exponent associated with input mantissa vector \(\bar b\)
bound_exp – [in] Exponent associated with clipping bounds lower_bound and upper_bound
b_hr – [in] Headroom of input mantissa vector \(\bar b\)

void xs3_vect_s32_dot_prepare(exponent_t *a_exp, right_shift_t *b_shr, right_shift_t *c_shr, const exponent_t b_exp, const exponent_t c_exp, const headroom_t b_hr, const headroom_t c_hr, const unsigned length)¶
Obtain the output exponent and input shift used by xs3_vect_s32_dot().
This function is used in conjunction with xs3_vect_s32_dot() to compute the inner product of two 32-bit BFP vectors.
This function computes a_exp, b_shr and c_shr.
a_exp is the exponent associated with the 64-bit mantissa \(a\) returned by xs3_vect_s32_dot(), and must be chosen to be large enough to avoid saturation when \(a\) is computed. To maximize precision, this function chooses a_exp to be the smallest exponent known to avoid saturation. The a_exp chosen by this function is derived from the exponents and headrooms associated with the input vectors.
b_shr and c_shr are the shift parameters required by xs3_vect_s32_dot() to achieve the chosen output exponent a_exp.
b_exp and c_exp are the exponents associated with the input mantissa vectors \(\bar b\) and \(\bar c\) respectively.
b_hr and c_hr are the headrooms of \(\bar b\) and \(\bar c\) respectively. If either is unknown, it can be obtained using xs3_vect_s32_headroom(). Alternatively, the value 0 can always be safely used (but may result in reduced precision).
length is the number of elements in the input mantissa vectors \(\bar b\) and \(\bar c\).
Adjusting Output Exponents

If a specific output exponent desired_exp is needed for the result (e.g. for emulating fixed-point arithmetic), the b_shr and c_shr produced by this function can be adjusted. Because both shifts are applied to factors of each product, the output exponent moves by the sum of their changes, so the adjustment is applied to only one of them:

    exponent_t desired_exp = ...; // Value known a priori
    right_shift_t new_b_shr = b_shr + (desired_exp - a_exp);
    right_shift_t new_c_shr = c_shr; // Or adjust c_shr instead, leaving b_shr unchanged
When applying the above adjustment, the following conditions should be maintained:
b_hr + b_shr >= 0
c_hr + c_shr >= 0
Be aware that using smaller values than strictly necessary for b_shr or c_shr can result in saturation, and using larger values may result in unnecessary underflows or loss of precision.
 Parameters
a_exp – [out] Exponent associated with output mantissa \(a\)
b_shr – [out] Signed arithmetic right-shift for \(\bar b\) used by xs3_vect_s32_dot()
c_shr – [out] Signed arithmetic right-shift for \(\bar c\) used by xs3_vect_s32_dot()
b_exp – [in] Exponent associated with input mantissa vector \(\bar b\)
c_exp – [in] Exponent associated with input mantissa vector \(\bar c\)
b_hr – [in] Headroom of input mantissa vector \(\bar b\)
c_hr – [in] Headroom of input mantissa vector \(\bar c\)
length – [in] Number of elements in vectors \(\bar b\) and \(\bar c\)

void xs3_vect_s32_energy_prepare(exponent_t *a_exp, right_shift_t *b_shr, const unsigned length, const exponent_t b_exp, const headroom_t b_hr)¶
Obtain the output exponent and input shift used by xs3_vect_s32_energy().
This function is used in conjunction with xs3_vect_s32_energy() to compute the inner product of a 32-bit BFP vector with itself.
This function computes a_exp and b_shr.
a_exp is the exponent associated with the 64-bit mantissa \(a\) returned by xs3_vect_s32_energy(), and must be chosen to be large enough to avoid saturation when \(a\) is computed. To maximize precision, this function chooses a_exp to be the smallest exponent known to avoid saturation. The a_exp chosen by this function is derived from the exponent and headroom associated with the input vector.
b_shr is the shift parameter required by xs3_vect_s32_energy() to achieve the chosen output exponent a_exp.
b_exp is the exponent associated with the input mantissa vector \(\bar b\).
b_hr is the headroom of \(\bar b\). If it is unknown, it can be obtained using xs3_vect_s32_headroom(). Alternatively, the value 0 can always be safely used (but may result in reduced precision).
length is the number of elements in the input mantissa vector \(\bar b\).
Adjusting Output Exponents

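If a specific output exponent desired_exp is needed for the result (e.g. for emulating fixed-point arithmetic), the b_shr produced by this function can be adjusted. Because the energy is quadratic in the input elements, changing b_shr by s changes the output exponent by 2s, so desired_exp can only be reached exactly when desired_exp - a_exp is even:

    exponent_t desired_exp = ...; // Value known a priori (desired_exp - a_exp must be even)
    right_shift_t new_b_shr = b_shr + (desired_exp - a_exp) / 2;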
When applying the above adjustment, the following condition should be maintained:
b_hr + b_shr >= 0
Be aware that using smaller values than strictly necessary for b_shr can result in saturation, and using larger values may result in unnecessary underflows or loss of precision.
 Parameters
a_exp – [out] Exponent of outputs of xs3_vect_s32_energy()
b_shr – [out] Right-shift to be applied to elements of \(\bar b\)
length – [in] Number of elements in vector \(\bar b\)
b_exp – [in] Exponent of vector \(\bar b\)
b_hr – [in] Headroom of vector \(\bar b\)
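Because the energy is a sum of squares, the shift applied to each element counts twice in the output exponent. The sketch below uses hypothetical names; sum_shr stands in for any fixed shift of the 64-bit result, no rounding or saturation is modelled, and b_shr is assumed large enough that the sum fits in 64 bits. It shows that raising b_shr by one raises the output exponent by two while the represented value stays the same.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: sum of (b_k >> b_shr)^2, then shifted right by sum_shr. */
static int64_t energy_model(const int32_t b[], unsigned length, int b_shr, int sum_shr)
{
    int64_t acc = 0;
    for (unsigned i = 0; i < length; i++) {
        int64_t v = b[i] >> b_shr;
        acc += v * v;
    }
    return acc >> sum_shr;
}

/* Exponent of energy_model()'s result: b_exp and b_shr each count twice. */
static int energy_model_exp(int b_exp, int b_shr, int sum_shr)
{
    return 2 * (b_exp + b_shr) + sum_shr;
}
```

With b = {3·2^10, 4·2^10} at exponent -10 (the values 3 and 4), b_shr = 2 gives 25·2^16 at exponent -16, and b_shr = 3 gives 25·2^14 at exponent -14; both represent 25.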

void xs3_vect_s32_inverse_prepare(exponent_t *a_exp, unsigned *scale, const int32_t b[], const exponent_t b_exp, const unsigned length)¶
Obtain the output exponent and scale used by xs3_vect_s32_inverse().
This function is used in conjunction with xs3_vect_s32_inverse() to compute the inverse of elements of a 32-bit BFP vector.
This function computes a_exp and scale.
a_exp is the exponent associated with output mantissa vector \(\bar a\), and must be chosen to avoid overflow in the smallest-magnitude element of the input vector, which when inverted becomes the largest output element. To maximize precision, this function chooses a_exp to be the smallest exponent known to avoid saturation. The a_exp chosen by this function is derived from the exponent and smallest-magnitude element of the input vector.
scale is a scaling parameter used by xs3_vect_s32_inverse() to achieve the chosen output exponent.
b[] is the input mantissa vector \(\bar b\).
b_exp is the exponent associated with the input mantissa vector \(\bar b\).
length is the number of elements in \(\bar b\).
 Parameters
a_exp – [out] Exponent of output vector \(\bar a\)
scale – [out] Scale factor to be applied when computing inverse
b – [in] Input vector \(\bar b\)
b_exp – [in] Exponent of \(\bar b\)
length – [in] Number of elements in vector \(\bar b\)
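To illustrate why the smallest-magnitude element drives the choice of a_exp, here is a sketch of one way to pick an exponent and compute inverses. It is an assumption-based illustration, not the library's algorithm: the fixed numerator \(2^Q\) is loosely analogous to scale, but the library's actual scale computation and rounding may differ, and inverse_sketch() is a hypothetical name. Choosing \(Q = 30 + \lfloor \log_2 \min_k |b_k| \rfloor\) bounds every output magnitude by \(2^{30}\), and the output exponent is then \(-Q - b\_exp\).

```c
#include <assert.h>
#include <stdint.h>

/* floor(log2(v)) for v > 0. */
static int floor_log2_u32(uint32_t v)
{
    int r = -1;
    while (v) { v >>= 1; r++; }
    return r;
}

/* Hypothetical sketch: a_k = 2^Q / b_k with Q chosen from the smallest |b_k|. */
static void inverse_sketch(int32_t a[], int *a_exp, const int32_t b[], int b_exp,
                           unsigned length)
{
    uint32_t min_mag = UINT32_MAX;
    for (unsigned i = 0; i < length; i++) {
        assert(b[i] != 0);                          /* inverse of zero is undefined */
        uint32_t mag = (b[i] < 0) ? 0u - (uint32_t)b[i] : (uint32_t)b[i];
        if (mag < min_mag) min_mag = mag;
    }
    const int Q = 30 + floor_log2_u32(min_mag);     /* |2^Q / b_k| <= 2^30 */
    for (unsigned i = 0; i < length; i++)
        a[i] = (int32_t)((INT64_C(1) << Q) / b[i]); /* C division truncates toward zero */
    *a_exp = -Q - b_exp;
}
```

For b = {4, -8, 2} with b_exp = 0, the smallest magnitude is 2, so Q = 31 and the outputs are 2^29, -2^28 and 2^30 at exponent -31, i.e. 0.25, -0.125 and 0.5.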

void xs3_vect_s32_macc_prepare(exponent_t *new_acc_exp, right_shift_t *acc_shr, right_shift_t *b_shr, right_shift_t *c_shr, const exponent_t acc_exp, const exponent_t b_exp, const exponent_t c_exp, const headroom_t acc_hr, const headroom_t b_hr, const headroom_t c_hr)¶
Obtain the output exponent and shifts needed by xs3_vect_s32_macc().
This function is used in conjunction with xs3_vect_s32_macc() to perform an elementwise multiplyaccumlate of 32bit BFP vectors.
This function computes new_acc_exp, acc_shr, b_shr and c_shr, which are selected to maximize precision in the resulting accumulator vector without causing saturation of final or intermediate values. Normally the caller will pass these outputs to their corresponding inputs of xs3_vect_s32_macc().
acc_exp is the exponent associated with the accumulator mantissa vector \(\bar a\) prior to the operation, whereas new_acc_exp is the exponent corresponding to the updated accumulator vector.
b_exp and c_exp are the exponents associated with the input mantissa vectors \(\bar b\) and \(\bar c\) respectively.
acc_hr, b_hr and c_hr are the headrooms of \(\bar a\), \(\bar b\) and \(\bar c\) respectively. If the headroom of any of these vectors is unknown, it can be obtained by calling xs3_vect_s32_headroom(). Alternatively, the value 0 can always be safely used (but may result in reduced precision).
Adjusting Output Exponents

If a specific output exponent desired_exp is needed for the result (e.g. for emulating fixed-point arithmetic), the acc_shr, b_shr and c_shr produced by this function can be adjusted according to the following:
    // Presumed to be set somewhere
    exponent_t acc_exp, b_exp, c_exp;
    headroom_t acc_hr, b_hr, c_hr;
    exponent_t desired_exp;
    ...
    // Call prepare
    right_shift_t acc_shr, b_shr, c_shr;
    xs3_vect_s32_macc_prepare(&acc_exp, &acc_shr, &b_shr, &c_shr,
                              acc_exp, b_exp, c_exp,
                              acc_hr, b_hr, c_hr);
    // Modify results
    right_shift_t mant_shr = desired_exp - acc_exp;
    acc_exp += mant_shr;
    acc_shr += mant_shr;
    b_shr += mant_shr;
    c_shr += mant_shr;
    // acc_shr, b_shr and c_shr may now be used in a call to xs3_vect_s32_macc()
When applying the above adjustment, the following conditions should be maintained:
acc_shr >= -acc_hr (Shifting any further left may cause saturation)
b_shr >= -b_hr (Shifting any further left may cause saturation)
c_shr >= -c_hr (Shifting any further left may cause saturation)
It is up to the user to ensure any such modification does not result in saturation or unacceptable loss of precision.
See also
 Parameters
new_acc_exp – [out] Exponent associated with output mantissa vector \(\bar a\) (after macc)
acc_shr – [out] Signed arithmetic right-shift used for \(\bar a\) in xs3_vect_s32_macc()
b_shr – [out] Signed arithmetic right-shift used for \(\bar b\) in xs3_vect_s32_macc()
c_shr – [out] Signed arithmetic right-shift used for \(\bar c\) in xs3_vect_s32_macc()
acc_exp – [in] Exponent associated with input mantissa vector \(\bar a\) (before macc)
b_exp – [in] Exponent associated with input mantissa vector \(\bar b\)
c_exp – [in] Exponent associated with input mantissa vector \(\bar c\)
acc_hr – [in] Headroom of input mantissa vector \(\bar a\) (before macc)
b_hr – [in] Headroom of input mantissa vector \(\bar b\)
c_hr – [in] Headroom of input mantissa vector \(\bar c\)
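The shift parameters above are signed arithmetic right-shifts, where a negative value shifts left. The sketch below models such a shift with generic saturation to show why a shift should not go further left than the headroom allows (for example acc_shr >= -acc_hr). sketch_shr_sat() is a hypothetical helper, not a lib_xs3_math API, and it does not claim to reproduce the VPU's exact saturation or rounding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: apply a signed arithmetic right-shift to x.
 * A negative shr shifts left; results outside int32_t are clamped. */
int32_t sketch_shr_sat(int32_t x, int shr)
{
    int64_t v = x;
    if (shr >= 0) {
        /* Shifting by 31 or more only replicates the sign bit. */
        if (shr > 31)
            shr = 31;
        /* >> on a negative value is implementation-defined in C, but is an
         * arithmetic (sign-extending) shift on mainstream compilers. */
        return (int32_t)(v >> shr);
    }
    /* Clamping the left shift to 31 does not change the saturated result
     * for any x, and keeps the product within int64_t. */
    if (shr < -31)
        shr = -31;
    v *= (int64_t)1 << -shr; /* multiply instead of << to avoid UB on negatives */
    if (v > INT32_MAX)
        return INT32_MAX;
    if (v < INT32_MIN)
        return INT32_MIN;
    return (int32_t)v;
}
```

For example, 0x0FFFFFFF has 3 bits of headroom: a shift of -3 yields 0x7FFFFFF8 exactly, while a shift of -4 saturates to INT32_MAX.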

void xs3_vect_s32_mul_prepare(exponent_t *a_exp, right_shift_t *b_shr, right_shift_t *c_shr, const exponent_t b_exp, const exponent_t c_exp, const headroom_t b_hr, const headroom_t c_hr)¶
Obtain the output exponent and input shifts used by xs3_vect_s32_mul().
This function is used in conjunction with xs3_vect_s32_mul() to perform an element-wise multiplication of two 32-bit BFP vectors.
This function computes a_exp, b_shr and c_shr.
a_exp is the exponent associated with mantissa vector \(\bar a\), and must be chosen to be large enough to avoid overflow when elements of \(\bar a\) are computed. To maximize precision, this function chooses a_exp to be the smallest exponent known to avoid saturation (see exception below). The a_exp chosen by this function is derived from the exponents and headrooms associated with the input vectors.
b_shr and c_shr are the shift parameters required by xs3_vect_s32_mul() to achieve the chosen output exponent a_exp.
b_exp and c_exp are the exponents associated with the input mantissa vectors \(\bar b\) and \(\bar c\) respectively.
b_hr and c_hr are the headrooms of \(\bar b\) and \(\bar c\) respectively. If either headroom is unknown, it can be obtained by calling xs3_vect_s32_headroom(). Alternatively, the value 0 can always be safely used (but may result in reduced precision).
Adjusting Output Exponents

If a specific output exponent desired_exp is needed for the result (e.g. for emulating fixed-point arithmetic), the b_shr and c_shr produced by this function can be adjusted according to the following:
    exponent_t desired_exp = ...; // Value known a priori
    right_shift_t new_b_shr = b_shr + (desired_exp - a_exp);
    right_shift_t new_c_shr = c_shr + (desired_exp - a_exp);
When applying the above adjustment, the following conditions should be maintained:
b_hr + b_shr >= 0
c_hr + c_shr >= 0
Be aware that using smaller values than strictly necessary for b_shr and c_shr can result in saturation, and using larger values may result in unnecessary underflows or loss of precision.
 Notes

Using the outputs of this function, an output mantissa which would otherwise be INT32_MIN will instead saturate to -INT32_MAX. This is due to the symmetric saturation logic employed by the VPU and is a hardware feature. This corner case is usually unlikely, and results in 1 LSb of error when it occurs.
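As a small model of the symmetric saturation described in the note, the hypothetical function below clamps a wide intermediate result to [-INT32_MAX, INT32_MAX]. It only illustrates the 1 LSb discrepancy for INT32_MIN and does not claim to reproduce the VPU's internal arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of symmetric saturation: results are clamped to
 * [-INT32_MAX, INT32_MAX], so INT32_MIN itself is never produced. */
int32_t sketch_sat_sym_s32(int64_t x)
{
    if (x > INT32_MAX)
        return INT32_MAX;
    if (x < -INT32_MAX)
        return -INT32_MAX;
    return (int32_t)x;
}
```

A value that would be exactly INT32_MIN comes out as -INT32_MAX, i.e. INT32_MIN + 1, which is the 1 LSb error mentioned above.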
See also
 Parameters
a_exp – [out] Exponent of output elements of xs3_vect_s32_mul()
b_shr – [out] Right-shift to be applied to elements of \(\bar b\)
c_shr – [out] Right-shift to be applied to elements of \(\bar c\)
b_exp – [in] Exponent of \(\bar b\)
c_exp – [in] Exponent of \(\bar c\)
b_hr – [in] Headroom of \(\bar b\)
c_hr – [in] Headroom of \(\bar c\)
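Headroom here means the number of redundant leading sign bits of a mantissa, i.e. how far it can be shifted left without overflow; the headroom of a vector is the minimum over its elements. The sketch below computes it portably. It is a hypothetical stand-in for xs3_vect_s32_headroom(), not its source; GCC also offers __builtin_clrsb() for the per-element count. Because a true headroom is never negative, passing 0 is always safe but can waste precision.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: count the redundant leading sign bits of x. */
unsigned sketch_s32_headroom(int32_t x)
{
    /* For negative x, ~x is non-negative and has the same number of leading
     * sign bits, so one loop handles both signs. */
    uint32_t u = (uint32_t)(x < 0 ? ~x : x);
    unsigned n = 0;
    /* Count zero bits from bit 30 downward (bit 31 is the sign bit itself). */
    while (n < 31 && !(u & (UINT32_C(1) << (30 - n))))
        n++;
    return n;
}

/* Hypothetical sketch: the headroom of a vector is the minimum over elements. */
unsigned sketch_vect_s32_headroom(const int32_t b[], unsigned length)
{
    unsigned hr = 31; /* an all-zero (or empty) vector has the maximum headroom */
    for (unsigned i = 0; i < length; i++) {
        unsigned h = sketch_s32_headroom(b[i]);
        if (h < hr)
            hr = h;
    }
    return hr;
}
```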

void xs3_vect_s32_sqrt_prepare(exponent_t *a_exp, right_shift_t *b_shr, const exponent_t b_exp, const right_shift_t b_hr)¶
Obtain the output exponent and shift parameter used by xs3_vect_s32_sqrt().
This function is used in conjunction with xs3_vect_s32_sqrt() to compute the square root of elements of a 32-bit BFP vector.
This function computes a_exp and b_shr.
a_exp is the exponent associated with output mantissa vector \(\bar a\), and should be chosen to maximize the precision of the results. To that end, this function chooses a_exp to be the smallest exponent known to avoid saturation of the resulting mantissa vector \(\bar a\). It is derived from the exponent and headroom of the input BFP vector.
b_shr is the shift parameter required by xs3_vect_s32_sqrt() to achieve the chosen output exponent a_exp.
b_exp is the exponent associated with the input mantissa vector \(\bar b\).
b_hr is the headroom of \(\bar b\). If it is unknown, it can be obtained using xs3_vect_s32_headroom(). Alternatively, the value 0 can always be safely used (but may result in reduced precision).
Adjusting Output Exponents

If a specific output exponent desired_exp is needed for the result (e.g. for emulating fixed-point arithmetic), the b_shr produced by this function can be adjusted according to the following:
    exponent_t a_exp;
    right_shift_t b_shr;
    xs3_vect_s32_sqrt_prepare(&a_exp, &b_shr, b_exp, b_hr);
    exponent_t desired_exp = ...; // Value known a priori
    b_shr = b_shr + (desired_exp - a_exp);
    a_exp = desired_exp;
When applying the above adjustment, the following condition should be maintained:
b_hr + b_shr >= 0
Be aware that using smaller values than strictly necessary for b_shr can result in saturation, and using larger values may result in unnecessary underflows or loss of precision.
Also, if a larger exponent is used than necessary, a larger depth parameter (see xs3_vect_s32_sqrt()) will be required to achieve the same precision, as the results are computed bit by bit, starting with the most significant bit.
See also
 Parameters
a_exp – [out] Exponent of outputs of xs3_vect_s32_sqrt()
b_shr – [out] Right-shift to be applied to elements of \(\bar b\)
b_exp – [in] Exponent of \(\bar b\)
b_hr – [in] Headroom of \(\bar b\)
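The remark about depth refers to results being computed bit by bit from the most significant bit. The sketch below is a textbook greedy integer square root that only determines the top depth bits of a 32-bit result. sketch_isqrt_bits() and its depth argument are hypothetical and only mirror the idea; the library's actual algorithm may differ. If the true result has leading zero bits (as happens when the output exponent is larger than necessary), the first iterations are spent confirming those zeros, so more depth is needed for the same precision.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: determine floor(sqrt(n)) one result bit at a time,
 * starting from the most significant of 32 possible result bits, and stop
 * after `depth` bits. Bits that are never visited remain zero. */
uint32_t sketch_isqrt_bits(uint64_t n, unsigned depth)
{
    assert(depth <= 32);
    uint32_t r = 0;
    for (unsigned i = 0; i < depth; i++) {
        uint32_t cand = r | (UINT32_C(1) << (31 - i));
        /* Keep the bit if the candidate does not overshoot sqrt(n). */
        if ((uint64_t)cand * cand <= n)
            r = cand;
    }
    return r;
}
```

For example, sketch_isqrt_bits(100, 32) returns 10, while sketch_isqrt_bits(100, 16) returns 0, because all 16 visited bits lie above the result.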

void xs3_vect_2vec_prepare(exponent_t *a_exp, right_shift_t *b_shr, right_shift_t *c_shr, const exponent_t b_exp, const exponent_t c_exp, const headroom_t b_hr, const headroom_t c_hr, const headroom_t extra_operand_hr)¶
Obtain the output exponent and input shifts required to perform a binary add-like operation.
This function computes the output exponent and input shifts required for BFP operations which take two vectors as input, where the operation is “add-like”.
Here, “add-like” operations are loosely defined as those which require input vectors to share an exponent before their mantissas can be meaningfully used to perform that operation.
For example, consider adding \( 3 \cdot 2^{x} + 4 \cdot 2^{y} \). If \(x = y\), then the mantissas can be added directly to get a meaningful result \( (3+4) \cdot 2^{x} \). If \(x \ne y\) however, adding the mantissas together is meaningless. Before the mantissas can be added in this case, one or both of the input mantissas must be shifted so that the representations correspond to the same exponent. The same logic applies to binary comparisons.
This is in contrast to a “multiply-like” operation, which does not have this same requirement (e.g. \(a \cdot 2^x \cdot b \cdot 2^y = ab \cdot 2^{x+y}\), regardless of whether \(x=y\)).
For a general operation like:
\( \bar{a} \cdot 2^{a\_exp} = \bar{b}\cdot 2^{b\_exp} \oplus \bar{c}\cdot 2^{c\_exp} \)
\(\bar b\) and \(\bar c\) are the input mantissa vectors with exponents \(b\_exp\) and \(c\_exp\), which are shared by each element of their respective vectors. \(\bar a\) is the output mantissa vector with exponent \(a\_exp\). Two additional properties, \(b\_hr\) and \(c\_hr\), which are the headroom of mantissa vectors \(\bar b\) and \(\bar c\) respectively, are required by this function.
In addition to \(a\_exp\), this function computes \(b\_shr\) and \(c\_shr\), signed arithmetic right-shifts applied to the mantissa vectors \(\bar b\) and \(\bar c\) so that the add-like \(\oplus\) operation can be applied.
This function chooses \(a\_exp\) to be the minimum exponent which can be used to express both input BFP vectors \(\bar B = \bar b \cdot 2^{b\_exp}\) and \(\bar C = \bar c \cdot 2^{c\_exp}\) without saturation of their mantissas, and which leaves both shifted mantissa vectors with at least extra_operand_hr bits of headroom. The shifts \(b\_shr\) and \(c\_shr\) are derived from \(a\_exp\) using \(b\_exp\) and \(c\_exp\).
Adjusting Output Exponents

If a specific output exponent desired_exp is needed for the result (e.g. for emulating fixed-point arithmetic), the b_shr and c_shr produced by this function can be adjusted according to the following:
    exponent_t desired_exp = ...; // Value known a priori
    right_shift_t new_b_shr = b_shr + (desired_exp - a_exp);
    right_shift_t new_c_shr = c_shr + (desired_exp - a_exp);
When applying the above adjustment, the following conditions should be maintained:
b_hr + b_shr >= 0
c_hr + c_shr >= 0
Be aware that using smaller values than strictly necessary for b_shr and c_shr can result in saturation, and using larger values may result in unnecessary underflows or loss of precision.
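The adjustment above can be wrapped in a small checked helper. The hypothetical sketch_adjust_to_exponent() below shifts both inputs by the same amount, so the shared output exponent of an add-like operation moves by exactly that amount, and it refuses any adjustment that would break b_hr + b_shr >= 0 or c_hr + c_shr >= 0. Plain int stands in for exponent_t, right_shift_t and headroom_t; this is not a lib_xs3_math API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: retarget the shifts from a two-vector prepare call to
 * a fixed output exponent. Returns false (leaving inputs unchanged) if the
 * new shifts would move a mantissa further left than its headroom allows. */
bool sketch_adjust_to_exponent(int *b_shr, int *c_shr, int *a_exp,
                               int desired_exp, int b_hr, int c_hr)
{
    int delta = desired_exp - *a_exp;
    int new_b_shr = *b_shr + delta;
    int new_c_shr = *c_shr + delta;
    if (b_hr + new_b_shr < 0 || c_hr + new_c_shr < 0)
        return false; /* would saturate */
    *b_shr = new_b_shr;
    *c_shr = new_c_shr;
    *a_exp = desired_exp;
    return true;
}
```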
 Notes

If \(b\_hr\) or \(c\_hr\) are unknown, they can be calculated using the appropriate headroom function (e.g. xs3_vect_complex_s16_headroom() for complex 16-bit vectors). Alternatively, the value 0 can always be safely used (but may result in reduced precision).
 Parameters
a_exp – [out] Output exponent associated with output mantissa vector \(\bar a\)
b_shr – [out] Signed arithmetic right-shift to be applied to elements of \(\bar b\). Used by the function which computes the output mantissas \(\bar a\)
c_shr – [out] Signed arithmetic right-shift to be applied to elements of \(\bar c\). Used by the function which computes the output mantissas \(\bar a\)
b_exp – [in] Exponent of BFP vector \(\bar b\)
c_exp – [in] Exponent of BFP vector \(\bar c\)
b_hr – [in] Headroom of BFP vector \(\bar b\)
c_hr – [in] Headroom of BFP vector \(\bar c\)
extra_operand_hr – [in] The minimum amount of headroom that will be left in the mantissa vectors following the arithmetic right-shift, as required by some operations.
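The rule for choosing \(a\_exp\) can be written out directly. Expressing \(\bar b\) at exponent \(a\_exp\) means shifting its mantissas right by \(a\_exp - b\_exp\); to keep extra_operand_hr bits of headroom, that shift may go at most \(b\_hr - extra\_operand\_hr\) bits to the left, which gives \(a\_exp \ge b\_exp - b\_hr + extra\_operand\_hr\), and likewise for \(\bar c\). The smallest exponent satisfying both bounds is the larger of the two. The sketch below is a hypothetical re-derivation from this description (sketch_2vec_prepare() is not the library source), with plain int standing in for the library's exponent, shift and headroom types.

```c
#include <assert.h>

/* Hypothetical sketch of the add-like exponent rule described above. */
void sketch_2vec_prepare(int *a_exp, int *b_shr, int *c_shr,
                         int b_exp, int c_exp,
                         int b_hr, int c_hr, int extra_operand_hr)
{
    assert(b_hr >= 0 && c_hr >= 0 && extra_operand_hr >= 0);
    /* Smallest exponent at which each input can be represented after a
     * left shift of at most its headroom. */
    int b_min = b_exp - b_hr;
    int c_min = c_exp - c_hr;
    *a_exp = (b_min > c_min ? b_min : c_min) + extra_operand_hr;
    /* Shifting right by (a_exp - x_exp) re-expresses x at exponent a_exp. */
    *b_shr = *a_exp - b_exp;
    *c_shr = *a_exp - c_exp;
}
```

Taking x = 0 and y = 2 in the earlier example, \(3 \cdot 2^{0}\) has headroom 29 and \(4 \cdot 2^{2}\) has headroom 28. Assuming one extra bit of headroom is reserved for the carry of an addition, the sketch yields a_exp = -25, b_shr = -25 and c_shr = -27; the aligned mantissas 3 << 25 and 4 << 27 sum to 19 << 25, i.e. \(3 + 16 = 19\).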

xs3_vect_complex_s32_add_prepare¶
Obtain the output exponent and shifts required for a call to xs3_vect_complex_s32_add().
The logic for computing the shifts and exponents of xs3_vect_complex_s32_add() is identical to that for xs3_vect_s32_add().
This macro is provided as a convenience to developers and to make the code more coherent.
See also

xs3_vect_complex_s32_add_scalar_prepare¶
Obtain the output exponent and shifts required for a call to xs3_vect_complex_s32_add_scalar().
The logic for computing the shifts and exponents of xs3_vect_complex_s32_add_scalar() is identical to that for xs3_vect_s32_add().
This macro is provided as a convenience to developers and to make the code more readable.
See also

xs3_vect_complex_s32_conj_mul_prepare¶
Obtain the output exponent and shifts required for a call to xs3_vect_complex_s32_conj_mul().
The logic for computing the shifts and exponents of xs3_vect_complex_s32_conj_mul() is identical to that for xs3_vect_complex_s32_mul().
This macro is provided as a convenience to developers and to make the code more readable.
See also

xs3_vect_complex_s32_nmacc_prepare¶
Obtain the output exponent and shifts required for a call to xs3_vect_complex_s32_nmacc().
The logic for computing the shifts and exponents of xs3_vect_complex_s32_nmacc() is identical to that implemented by xs3_vect_complex_s32_macc_prepare().
This macro is provided as a convenience to developers and to make the code more readable.

xs3_vect_complex_s32_conj_macc_prepare¶
Obtain the output exponent and shifts required for a call to xs3_vect_complex_s32_conj_macc().
The logic for computing the shifts and exponents of xs3_vect_complex_s32_conj_macc() is identical to that implemented by xs3_vect_complex_s32_macc_prepare().
This macro is provided as a convenience to developers and to make the code more readable.

xs3_vect_complex_s32_conj_nmacc_prepare¶
Obtain the output exponent and shifts required for a call to xs3_vect_complex_s32_conj_nmacc().
The logic for computing the shifts and exponents of xs3_vect_complex_s32_conj_nmacc() is identical to that implemented by xs3_vect_complex_s32_macc_prepare().
This macro is provided as a convenience to developers and to make the code more readable.

xs3_vect_complex_s32_real_scale_prepare¶
Obtain the output exponent and shifts required for a call to xs3_vect_complex_s32_real_scale().
The logic for computing the shifts and exponents of xs3_vect_complex_s32_real_scale() is identical to that for xs3_vect_s32_mul().
This macro is provided as a convenience to developers and to make the code more readable.
See also

xs3_vect_complex_s32_sub_prepare¶
Obtain the output exponent and shifts required for a call to xs3_vect_complex_s32_sub().
The logic for computing the shifts and exponents of xs3_vect_complex_s32_sub() is identical to that for xs3_vect_s32_add().
This macro is provided as a convenience to developers and to make the code more readable.
See also

xs3_vect_s32_add_scalar_prepare¶
Obtain the output exponent and shifts required for a call to xs3_vect_s32_add_scalar().
The logic for computing the shifts and exponents of xs3_vect_s32_add_scalar() is identical to that for xs3_vect_s32_add().
This macro is provided as a convenience to developers and to make the code more readable.
See also

xs3_vect_s32_nmacc_prepare¶
Obtain the output exponent and shifts required for a call to xs3_vect_s32_nmacc().
The logic for computing the shifts and exponents of xs3_vect_s32_nmacc() is identical to that implemented by xs3_vect_s32_macc_prepare().
This macro is provided as a convenience to developers and to make the code more readable.

xs3_vect_s32_scale_prepare¶
Obtain the output exponent and shifts required for a call to xs3_vect_s32_scale().
The logic for computing the shifts and exponents of xs3_vect_s32_scale() is identical to that for xs3_vect_s32_mul().
This macro is provided as a convenience to developers and to make the code more readable.
See also

xs3_vect_s32_sub_prepare¶
Obtain the output exponent and shifts required for a call to xs3_vect_s32_sub().
The logic for computing the shifts and exponents of xs3_vect_s32_sub() is identical to that for xs3_vect_s32_add().
This macro is provided as a convenience to developers and to make the code more readable.
See also