XS3 16-Bit Vector Functions
-
headroom_t xs3_vect_complex_s16_add(int16_t a_real[], int16_t a_imag[], const int16_t b_real[], const int16_t b_imag[], const int16_t c_real[], const int16_t c_imag[], const unsigned length, const right_shift_t b_shr, const right_shift_t c_shr)
Add one complex 16-bit vector to another.
a_real[] and a_imag[] together represent the complex 16-bit output mantissa vector \(\bar a\). Each \(Re\{a_k\}\) is a_real[k], and each \(Im\{a_k\}\) is a_imag[k].
b_real[] and b_imag[] together represent the complex 16-bit input mantissa vector \(\bar b\). Each \(Re\{b_k\}\) is b_real[k], and each \(Im\{b_k\}\) is b_imag[k].
c_real[] and c_imag[] together represent the complex 16-bit input mantissa vector \(\bar c\). Each \(Re\{c_k\}\) is c_real[k], and each \(Im\{c_k\}\) is c_imag[k].
Each of the input vectors must begin at a word-aligned address. This operation can be performed safely in-place on inputs b_real[], b_imag[], c_real[] and c_imag[].
length is the number of elements in each of the vectors.
b_shr and c_shr are the signed arithmetic right-shifts applied to each element of \(\bar b\) and \(\bar c\) respectively.
- Operation Performed:
- \[\begin{split}\begin{align*} & b_k' \leftarrow sat_{16}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & c_k' \leftarrow sat_{16}(\lfloor c_k \cdot 2^{-c\_shr} \rfloor) \\ & Re\{a_k\} \leftarrow Re\{b_k'\} + Re\{c_k'\} \\ & Im\{a_k\} \leftarrow Im\{b_k'\} + Im\{c_k'\} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
- Block Floating-Point
-
If \(\bar b\) and \(\bar c\) are the complex 16-bit mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) holds the complex 16-bit mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\).
In this case, \(b\_shr\) and \(c\_shr\) must be chosen so that \(a\_exp = b\_exp + b\_shr = c\_exp + c\_shr\). Adding or subtracting mantissas only makes sense if they are associated with the same exponent.
The function xs3_vect_complex_s16_add_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).
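As a worked illustration with made-up numbers (not taken from the library), suppose \(b\_exp = -12\) and \(c\_exp = -10\), and that the output exponent is chosen as \(a\_exp = -10\). The shifts then follow from the exponent constraint, and a pair of real-valued mantissas representing 1.0 and 0.5 sums to 1.5:
\[\begin{split}\begin{align*} & b\_shr = a\_exp - b\_exp = 2, \qquad c\_shr = a\_exp - c\_exp = 0 \\ & b_k = 4096 \;\Rightarrow\; b_k' = \lfloor 4096 \cdot 2^{-2} \rfloor = 1024 \\ & c_k = 512 \;\Rightarrow\; c_k' = \lfloor 512 \cdot 2^{0} \rfloor = 512 \\ & a_k = 1024 + 512 = 1536, \qquad 1536 \cdot 2^{-10} = 1.5 \end{align*}\end{split}\]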
- Parameters
a_real – [out] Real part of complex output vector \(\bar a\)
a_imag – [out] Imaginary part of complex output vector \(\bar a\)
b_real – [in] Real part of complex input vector \(\bar b\)
b_imag – [in] Imaginary part of complex input vector \(\bar b\)
c_real – [in] Real part of complex input vector \(\bar c\)
c_imag – [in] Imaginary part of complex input vector \(\bar c\)
length – [in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)
b_shr – [in] Right-shift applied to \(\bar b\)
c_shr – [in] Right-shift applied to \(\bar c\)
- Throws
ET_LOAD_STORE – Raised if a_real, a_imag, b_real, b_imag, c_real or c_imag is not word-aligned (See Note: Vector Alignment)
- Returns
Headroom of output vector \(\bar a\).
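The operation above can be modelled in portable C. The following is a minimal reference sketch, not the library's implementation: the names ref_complex_s16_add, sat16, shr_sat16 and hr16 are hypothetical, the headroom is returned as a plain unsigned, sat16 is assumed to clamp to the full int16_t range, and the final sums are saturated as well, which the formula does not state.

```c
#include <stdint.h>

/* Hypothetical helper: clamp to the int16_t range (assumed saturation bounds). */
static int16_t sat16(int32_t x) {
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Hypothetical helper: sat16(floor(x * 2^-shr)); a negative shr is a left shift. */
static int16_t shr_sat16(int16_t x, int shr) {
    if (shr >= 0) {
        if (shr > 15) shr = 15;            /* larger shifts give the same result */
        int32_t d = (int32_t)1 << shr;
        int32_t q = x / d;
        if (x % d != 0 && x < 0) q -= 1;   /* C division truncates; adjust to floor */
        return (int16_t)q;
    }
    int shl = (shr < -15) ? 15 : -shr;     /* larger shifts saturate identically */
    return sat16((int32_t)x * ((int32_t)1 << shl));
}

/* Headroom of one 16-bit value: number of leading sign bits minus one. */
static unsigned hr16(int16_t x) {
    uint16_t u = (uint16_t)(x < 0 ? ~x : x);
    unsigned hr = 15;
    while (u != 0) { u >>= 1; hr--; }
    return hr;
}

/* Reference model of the complex add; returns the headroom of the output. */
unsigned ref_complex_s16_add(int16_t a_real[], int16_t a_imag[],
                             const int16_t b_real[], const int16_t b_imag[],
                             const int16_t c_real[], const int16_t c_imag[],
                             unsigned length, int b_shr, int c_shr) {
    unsigned hr = 15;
    for (unsigned k = 0; k < length; k++) {
        int32_t re = (int32_t)shr_sat16(b_real[k], b_shr) + shr_sat16(c_real[k], c_shr);
        int32_t im = (int32_t)shr_sat16(b_imag[k], b_shr) + shr_sat16(c_imag[k], c_shr);
        a_real[k] = sat16(re);             /* saturating the sum is an assumption */
        a_imag[k] = sat16(im);
        unsigned h = hr16(a_real[k]);
        if (h < hr) hr = h;
        h = hr16(a_imag[k]);
        if (h < hr) hr = h;
    }
    return hr;
}
```

Each element is read before its output is written, so the sketch also tolerates the in-place use described above.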
-
headroom_t xs3_vect_complex_s16_add_scalar(int16_t a_real[], int16_t a_imag[], const int16_t b_real[], const int16_t b_imag[], const complex_s16_t c, const unsigned length, const right_shift_t b_shr)
Add a scalar to a complex 16-bit vector.
a_real[] and a_imag[] together represent the complex 16-bit output mantissa vector \(\bar a\), and b_real[] and b_imag[] together represent the complex 16-bit input mantissa vector \(\bar b\). Each must begin at a word-aligned address. This operation can be performed safely in-place on b_real[] and b_imag[].
c is the complex scalar \(c\) to be added to each element of \(\bar b\).
length is the number of elements in each of the vectors.
b_shr is the signed arithmetic right-shift applied to each element of \(\bar b\).
- Operation Performed:
- \[\begin{split}\begin{align*} & b_k' \leftarrow sat_{16}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & Re\{a_k\} \leftarrow Re\{b_k'\} + Re\{c\} \\ & Im\{a_k\} \leftarrow Im\{b_k'\} + Im\{c\} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
- Block Floating-Point
-
If elements of \(\bar b\) are the complex mantissas of BFP vector \( \bar{b} \cdot 2^{b\_exp}\), and \(c\) is the mantissa of floating-point value \(c \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) holds the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\).
In this case, \(b\_shr\) and \(c\_shr\) must be chosen so that \(a\_exp = b\_exp + b\_shr = c\_exp + c\_shr\). Adding or subtracting mantissas only makes sense if they are associated with the same exponent.
The function xs3_vect_complex_s16_add_scalar_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).
Note that \(c\_shr\) is an output of xs3_vect_complex_s16_add_scalar_prepare(), but is not a parameter to this function. The \(c\_shr\) produced by xs3_vect_complex_s16_add_scalar_prepare() is to be applied by the user, and the result passed as input c.
- Parameters
a_real – [out] Real part of complex output vector \(\bar a\)
a_imag – [out] Imaginary part of complex output vector \(\bar a\)
b_real – [in] Real part of complex input vector \(\bar b\)
b_imag – [in] Imaginary part of complex input vector \(\bar b\)
c – [in] Complex input scalar \(c\)
length – [in] Number of elements in vectors \(\bar a\) and \(\bar b\)
b_shr – [in] Right-shift applied to \(\bar b\)
- Throws
ET_LOAD_STORE – Raised if a_real, a_imag, b_real or b_imag is not word-aligned (See Note: Vector Alignment)
- Returns
Headroom of output vector \(\bar a\).
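Because \(c\_shr\) has to be applied by the caller, a small helper can do the shift before the call. This is a minimal sketch under stated assumptions: shift_scalar, sat16 and shr_sat16 are hypothetical names, the struct below is a local stand-in for the library's complex_s16_t (fields re and im), and rounding the scalar down (floor) with saturation to the int16_t range is an assumption chosen to mirror the treatment of \(\bar b\).

```c
#include <stdint.h>

/* Local stand-in for the library's complex_s16_t (fields re and im). */
typedef struct { int16_t re; int16_t im; } complex_s16_t;

/* Hypothetical helper: clamp to the int16_t range (assumed saturation bounds). */
static int16_t sat16(int32_t x) {
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Hypothetical helper: sat16(floor(x * 2^-shr)); a negative shr is a left shift. */
static int16_t shr_sat16(int16_t x, int shr) {
    if (shr >= 0) {
        if (shr > 15) shr = 15;            /* larger shifts give the same result */
        int32_t d = (int32_t)1 << shr;
        int32_t q = x / d;
        if (x % d != 0 && x < 0) q -= 1;   /* C division truncates; adjust to floor */
        return (int16_t)q;
    }
    int shl = (shr < -15) ? 15 : -shr;     /* larger shifts saturate identically */
    return sat16((int32_t)x * ((int32_t)1 << shl));
}

/* Apply the c_shr obtained from the prepare step to the scalar c. */
complex_s16_t shift_scalar(complex_s16_t c, int c_shr) {
    complex_s16_t r;
    r.re = shr_sat16(c.re, c_shr);
    r.im = shr_sat16(c.im, c_shr);
    return r;
}
```

The shifted value would then be passed as c to xs3_vect_complex_s16_add_scalar(), together with the b_shr from the same prepare call.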
-
headroom_t xs3_vect_complex_s16_conj_mul(int16_t a_real[], int16_t a_imag[], const int16_t b_real[], const int16_t b_imag[], const int16_t c_real[], const int16_t c_imag[], const unsigned length, const right_shift_t a_shr)
Multiply one complex 16-bit vector element-wise by the complex conjugate of another.
a_real[] and a_imag[] together represent the complex 16-bit output mantissa vector \(\bar a\). Each \(Re\{a_k\}\) is a_real[k], and each \(Im\{a_k\}\) is a_imag[k].
b_real[] and b_imag[] together represent the complex 16-bit input mantissa vector \(\bar b\). Each \(Re\{b_k\}\) is b_real[k], and each \(Im\{b_k\}\) is b_imag[k].
c_real[] and c_imag[] together represent the complex 16-bit input mantissa vector \(\bar c\). Each \(Re\{c_k\}\) is c_real[k], and each \(Im\{c_k\}\) is c_imag[k].
Each of the input vectors must begin at a word-aligned address. This operation can be performed safely in-place on inputs b_real[], b_imag[], c_real[] and c_imag[].
length is the number of elements in each of the vectors.
a_shr is the unsigned arithmetic right-shift applied to the 32-bit accumulators holding the penultimate results.
- Operation Performed:
- \[\begin{split}\begin{align*} & v_k \leftarrow Re\{b_k\} \cdot Re\{c_k\} + Im\{b_k\} \cdot Im\{c_k\} \\ & s_k \leftarrow Im\{b_k\} \cdot Re\{c_k\} - Re\{b_k\} \cdot Im\{c_k\} \\ & Re\{a_k\} \leftarrow round( sat_{16}( v_k \cdot 2^{-a\_shr} ) ) \\ & Im\{a_k\} \leftarrow round( sat_{16}( s_k \cdot 2^{-a\_shr} ) ) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
- Block Floating-Point
If \(\bar b\) and \(\bar c\) are the complex 16-bit mantissas of BFP vectors \(\bar{b} \cdot 2^{b\_exp}\) and \(\bar{c} \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) holds the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + c\_exp + a\_shr\).
The function xs3_vect_complex_s16_mul_prepare() can be used to obtain values for \(a\_exp\) and \(a\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).
- Parameters
a_real – [out] Real part of complex output vector \(\bar a\)
a_imag – [out] Imaginary part of complex output vector \(\bar a\)
b_real – [in] Real part of complex input vector \(\bar b\)
b_imag – [in] Imaginary part of complex input vector \(\bar b\)
c_real – [in] Real part of complex input vector \(\bar c\)
c_imag – [in] Imaginary part of complex input vector \(\bar c\)
length – [in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)
a_shr – [in] Right-shift applied to 32-bit intermediate results.
- Throws
ET_LOAD_STORE – Raised if a_real, a_imag, b_real, b_imag, c_real or c_imag is not word-aligned (See Note: Vector Alignment)
- Returns
Headroom of the output vector \(\bar a\)
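The conjugate multiply can be sketched in portable C as follows. This is a reference model under stated assumptions, not the library's implementation: ref_complex_s16_conj_mul, sat16, round_shr and hr16 are hypothetical names, rounding is assumed to be to nearest with ties toward positive infinity, saturation is assumed to clamp to the full int16_t range, and 64-bit intermediates stand in for the 32-bit accumulators so that the sketch cannot overflow.

```c
#include <stdint.h>

/* Hypothetical helper: clamp to the int16_t range (assumed saturation bounds). */
static int16_t sat16(int64_t x) {
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Rounding right-shift: nearest, ties toward +infinity (assumed rounding mode). */
static int64_t round_shr(int64_t v, unsigned shr) {
    if (shr == 0) return v;
    if (shr > 62) shr = 62;                /* guard against undefined shift widths */
    int64_t d = (int64_t)1 << shr;
    int64_t t = v + d / 2;
    int64_t q = t / d;
    if (t % d != 0 && t < 0) q -= 1;       /* C division truncates; adjust to floor */
    return q;
}

/* Headroom of one 16-bit value: number of leading sign bits minus one. */
static unsigned hr16(int16_t x) {
    uint16_t u = (uint16_t)(x < 0 ? ~x : x);
    unsigned hr = 15;
    while (u != 0) { u >>= 1; hr--; }
    return hr;
}

/* a_k = b_k * conj(c_k), shifted by a_shr; returns the headroom of the output. */
unsigned ref_complex_s16_conj_mul(int16_t a_real[], int16_t a_imag[],
                                  const int16_t b_real[], const int16_t b_imag[],
                                  const int16_t c_real[], const int16_t c_imag[],
                                  unsigned length, unsigned a_shr) {
    unsigned hr = 15;
    for (unsigned k = 0; k < length; k++) {
        int64_t v = (int64_t)b_real[k] * c_real[k] + (int64_t)b_imag[k] * c_imag[k];
        int64_t s = (int64_t)b_imag[k] * c_real[k] - (int64_t)b_real[k] * c_imag[k];
        a_real[k] = sat16(round_shr(v, a_shr));
        a_imag[k] = sat16(round_shr(s, a_shr));
        unsigned h = hr16(a_real[k]);
        if (h < hr) hr = h;
        h = hr16(a_imag[k]);
        if (h < hr) hr = h;
    }
    return hr;
}
```

For instance, with b = 300 + 400i, c = 100 - 200i and a_shr = 4, the sketch produces -3125 + 6250i.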
-
headroom_t xs3_vect_complex_s16_headroom(const int16_t b_real[], const int16_t b_imag[], const unsigned length)
Calculate the headroom of a complex 16-bit array.
The headroom of an N-bit integer is the number of bits that the integer’s value may be left-shifted without any information being lost. Equivalently, it is one less than the number of leading sign bits.
The headroom of a complex_s16_t struct is the minimum of the headroom of each of its 16-bit fields, re and im.
The headroom of a complex_s16_t array is the minimum of the headroom of each of its complex_s16_t elements.
This function efficiently traverses the elements of \(\bar b\) to determine its headroom.
b_real[] and b_imag[] together represent the complex 16-bit input mantissa vector \(\bar b\).
length is the number of elements in b_real[] and b_imag[].
- Operation Performed:
- \[\begin{align*} min\!\{ HR_{16}\left(b_0\right), HR_{16}\left(b_1\right), ..., HR_{16}\left(b_{length-1}\right) \} \end{align*}\]
- Parameters
b_real – [in] Real part of complex input vector \(\bar b\)
b_imag – [in] Imaginary part of complex input vector \(\bar b\)
length – [in] Number of elements in \(\bar b\)
- Returns
Headroom of vector \(\bar b\)
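The definition of headroom given above translates directly into a short loop. The sketch below is a plain C illustration of that definition, not the library's optimized routine; hr16 and ref_complex_s16_headroom are hypothetical names, and the result is returned as a plain unsigned.

```c
#include <stdint.h>

/* Headroom of one 16-bit value: number of leading sign bits minus one.
 * For a negative value, its bitwise complement has the same sign-bit count. */
static unsigned hr16(int16_t x) {
    uint16_t u = (uint16_t)(x < 0 ? ~x : x);
    unsigned hr = 15;                      /* headroom of 0 and of -1 */
    while (u != 0) { u >>= 1; hr--; }
    return hr;
}

/* Minimum headroom over the real and imaginary parts of every element. */
unsigned ref_complex_s16_headroom(const int16_t b_real[], const int16_t b_imag[],
                                  unsigned length) {
    unsigned hr = 15;
    for (unsigned k = 0; k < length; k++) {
        unsigned h = hr16(b_real[k]);
        if (h < hr) hr = h;
        h = hr16(b_imag[k]);
        if (h < hr) hr = h;
    }
    return hr;
}
```

A value such as 0x3FFF has two leading sign bits and therefore a headroom of 1, while 0x4000 has a headroom of 0.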
-
headroom_t xs3_vect_complex_s16_mag(int16_t a[], const int16_t b_real[], const int16_t b_imag[], const unsigned length, const right_shift_t b_shr, const int16_t *rot_table, const unsigned table_rows)
Compute the magnitude of each element of a complex 16-bit vector.
a[] represents the real 16-bit output mantissa vector \(\bar a\).
b_real[] and b_imag[] together represent the complex 16-bit input mantissa vector \(\bar b\). Each \(Re\{b_k\}\) is b_real[k], and each \(Im\{b_k\}\) is b_imag[k].
Each of the input vectors must begin at a word-aligned address. This operation can be performed safely in-place on inputs b_real[] or b_imag[].
length is the number of elements in each of the vectors.
b_shr is the signed arithmetic right-shift applied to elements of \(\bar b\).
rot_table must point to a pre-computed table of complex vectors used in calculating the magnitudes.
table_rows is the number of rows in the table.
This library is distributed with a default version of the required rotation table. The following symbols can be used to refer to it in user code:
const extern unsigned rot_table16_rows;
const extern complex_s16_t rot_table16[30][4];
Faster computation (with reduced precision) can be achieved by generating a smaller version of the table. A Python script is provided to generate this table.
- Todo:
Point to documentation page on generating this table.
- Operation Performed:
- \[\begin{split}\begin{align*} & v_k \leftarrow b_k \cdot 2^{-b\_shr} \\ & a_k \leftarrow \sqrt { {\left( Re\{v_k\} \right)}^2 + {\left( Im\{v_k\} \right)}^2 } \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
- Block Floating-Point
-
If \(\bar b\) are the complex 16-bit mantissas of a BFP vector \( \bar{b} \cdot 2^{b\_exp} \), then the resulting vector \(\bar a\) holds the real 16-bit mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + b\_shr\).
The function xs3_vect_complex_s16_mag_prepare() can be used to obtain values for \(a\_exp\) and \(b\_shr\) based on the input exponent \(b\_exp\) and headroom \(b\_hr\).
- Parameters
a – [out] Real output vector \(\bar a\)
b_real – [in] Real part of complex input vector \(\bar b\)
b_imag – [in] Imaginary part of complex input vector \(\bar b\)
length – [in] Number of elements in vectors \(\bar a\) and \(\bar b\)
b_shr – [in] Right-shift applied to \(\bar b\)
rot_table – [in] Pre-computed rotation table required for calculating magnitudes
table_rows – [in] Number of rows in rot_table
- Throws
ET_LOAD_STORE – Raised if a, b_real or b_imag is not word-aligned (See Note: Vector Alignment)
- Returns
Headroom of the output vector \(\bar a\).
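For checking results on a host machine, the magnitude can be computed directly. The sketch below deliberately swaps in an integer square root in place of the rotation table, so it models the formula rather than the library's algorithm and will not reproduce its exact rounding. All names (ref_complex_s16_mag, isqrt32, shr_sat16, sat16, hr16) are hypothetical; flooring the shifted inputs, truncating the square root and saturating the result to INT16_MAX are assumptions.

```c
#include <stdint.h>

/* Hypothetical helper: clamp to the int16_t range (assumed saturation bounds). */
static int16_t sat16(int32_t x) {
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Hypothetical helper: sat16(floor(x * 2^-shr)); a negative shr is a left shift. */
static int16_t shr_sat16(int16_t x, int shr) {
    if (shr >= 0) {
        if (shr > 15) shr = 15;            /* larger shifts give the same result */
        int32_t d = (int32_t)1 << shr;
        int32_t q = x / d;
        if (x % d != 0 && x < 0) q -= 1;   /* C division truncates; adjust to floor */
        return (int16_t)q;
    }
    int shl = (shr < -15) ? 15 : -shr;     /* larger shifts saturate identically */
    return sat16((int32_t)x * ((int32_t)1 << shl));
}

/* Floor of the square root, computed digit by digit in base 4. */
static uint32_t isqrt32(uint32_t n) {
    uint32_t r = 0;
    uint32_t bit = (uint32_t)1 << 30;
    while (bit > n) bit >>= 2;
    while (bit != 0) {
        if (n >= r + bit) { n -= r + bit; r = (r >> 1) + bit; }
        else r >>= 1;
        bit >>= 2;
    }
    return r;
}

/* Headroom of one 16-bit value: number of leading sign bits minus one. */
static unsigned hr16(int16_t x) {
    uint16_t u = (uint16_t)(x < 0 ? ~x : x);
    unsigned hr = 15;
    while (u != 0) { u >>= 1; hr--; }
    return hr;
}

/* a_k = sqrt(Re{v_k}^2 + Im{v_k}^2) with v_k = b_k * 2^-b_shr; returns headroom. */
unsigned ref_complex_s16_mag(int16_t a[], const int16_t b_real[],
                             const int16_t b_imag[], unsigned length, int b_shr) {
    unsigned hr = 15;
    for (unsigned k = 0; k < length; k++) {
        int32_t re = shr_sat16(b_real[k], b_shr);
        int32_t im = shr_sat16(b_imag[k], b_shr);
        uint32_t m = isqrt32((uint32_t)(re * re) + (uint32_t)(im * im));
        a[k] = (m > (uint32_t)INT16_MAX) ? INT16_MAX : (int16_t)m;
        unsigned h = hr16(a[k]);
        if (h < hr) hr = h;
    }
    return hr;
}
```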
-
headroom_t xs3_vect_complex_s16_macc(int16_t acc_real[], int16_t acc_imag[], const int16_t b_real[], const int16_t b_imag[], const int16_t c_real[], const int16_t c_imag[], const unsigned length, const right_shift_t acc_shr, const right_shift_t bc_sat)
Multiply one complex 16-bit vector element-wise by another, and add the result to an accumulator.
acc_real[] and acc_imag[] together represent the complex 16-bit accumulator mantissa vector \(\bar a\). Each \(Re\{a_k\}\) is acc_real[k], and each \(Im\{a_k\}\) is acc_imag[k].
b_real[] and b_imag[] together represent the complex 16-bit input mantissa vector \(\bar b\). Each \(Re\{b_k\}\) is b_real[k], and each \(Im\{b_k\}\) is b_imag[k].
c_real[] and c_imag[] together represent the complex 16-bit input mantissa vector \(\bar c\). Each \(Re\{c_k\}\) is c_real[k], and each \(Im\{c_k\}\) is c_imag[k].
Each of the input vectors must begin at a word-aligned address.
length is the number of elements in each of the vectors.
acc_shr is the signed arithmetic right-shift applied to the accumulators \(a_k\).
bc_sat is the unsigned arithmetic right-shift applied to the product of \(b_k\) and \(c_k\) before being added to the accumulator.
- Operation Performed:
- \[\begin{split}\begin{align*} & v_k \leftarrow Re\{b_k\} \cdot Re\{c_k\} - Im\{b_k\} \cdot Im\{c_k\} \\ & s_k \leftarrow Im\{b_k\} \cdot Re\{c_k\} + Re\{b_k\} \cdot Im\{c_k\} \\ & \hat{a}_k \leftarrow sat_{16}( a_k \cdot 2^{-acc\_shr} ) \\ & Re\{a_k\} \leftarrow sat_{16}( Re\{\hat{a}_k\} + round( sat_{16}( v_k \cdot 2^{-bc\_sat} ) ) ) \\ & Im\{a_k\} \leftarrow sat_{16}( Im\{\hat{a}_k\} + round( sat_{16}( s_k \cdot 2^{-bc\_sat} ) ) ) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
- Block Floating-Point
-
If inputs \(\bar b\) and \(\bar c\) are the mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), and input \(\bar a\) is the accumulator BFP vector \(\bar{a} \cdot 2^{a\_exp}\), then the output values of \(\bar a\) have the exponent \(a\_exp + acc\_shr\).
For accumulation to make sense mathematically, \(bc\_sat\) must be chosen such that \( a\_exp + acc\_shr = b\_exp + c\_exp + bc\_sat \).
The function xs3_vect_complex_s16_macc_prepare() can be used to obtain values for \(a\_exp\), \(acc\_shr\) and \(bc\_sat\) based on the input exponents \(a\_exp\), \(b\_exp\) and \(c\_exp\) and the input headrooms \(a\_hr\), \(b\_hr\) and \(c\_hr\).
- Parameters
acc_real – [inout] Real part of complex accumulator \(\bar a\)
acc_imag – [inout] Imaginary part of complex accumulator \(\bar a\)
b_real – [in] Real part of complex input vector \(\bar b\)
b_imag – [in] Imaginary part of complex input vector \(\bar b\)
c_real – [in] Real part of complex input vector \(\bar c\)
c_imag – [in] Imaginary part of complex input vector \(\bar c\)
length – [in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)
acc_shr – [in] Signed arithmetic right-shift applied to accumulator elements.
bc_sat – [in] Unsigned arithmetic right-shift applied to the products of elements \(b_k\) and \(c_k\)
- Throws
ET_LOAD_STORE – Raised if acc_real, acc_imag, b_real, b_imag, c_real or c_imag is not word-aligned (See Note: Vector Alignment)
- Returns
Headroom of the output vector \(\bar a\)
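The multiply-accumulate steps can be sketched in portable C as shown below. This is a reference model under stated assumptions, not the library's implementation: all function names are hypothetical, the accumulator shift is assumed to round down (floor), the product shift is assumed to round to nearest with ties toward positive infinity, saturation is assumed to clamp to the full int16_t range, and 64-bit intermediates are used so the sketch cannot overflow.

```c
#include <stdint.h>

/* Hypothetical helper: clamp to the int16_t range (assumed saturation bounds). */
static int16_t sat16(int64_t x) {
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* floor(v / 2^shr) for shr <= 62, without right-shifting negative values. */
static int64_t floor_shr(int64_t v, unsigned shr) {
    int64_t d = (int64_t)1 << shr;
    int64_t q = v / d;
    if (v % d != 0 && v < 0) q -= 1;
    return q;
}

/* Rounding right-shift: nearest, ties toward +infinity (assumed rounding mode). */
static int64_t round_shr(int64_t v, unsigned shr) {
    if (shr == 0) return v;
    if (shr > 62) shr = 62;                /* guard against undefined shift widths */
    return floor_shr(v + ((int64_t)1 << (shr - 1)), shr);
}

/* sat16(floor(x * 2^-shr)); a negative shr is a left shift. */
static int16_t shr_sat16(int16_t x, int shr) {
    if (shr >= 0) return sat16(floor_shr(x, shr > 15 ? 15u : (unsigned)shr));
    return sat16((int64_t)x * ((int64_t)1 << (shr < -15 ? 15 : -shr)));
}

/* Headroom of one 16-bit value: number of leading sign bits minus one. */
static unsigned hr16(int16_t x) {
    uint16_t u = (uint16_t)(x < 0 ? ~x : x);
    unsigned hr = 15;
    while (u != 0) { u >>= 1; hr--; }
    return hr;
}

/* acc_k = sat16(shifted acc_k + shifted b_k * c_k); returns the output headroom. */
unsigned ref_complex_s16_macc(int16_t acc_real[], int16_t acc_imag[],
                              const int16_t b_real[], const int16_t b_imag[],
                              const int16_t c_real[], const int16_t c_imag[],
                              unsigned length, int acc_shr, unsigned bc_sat) {
    unsigned hr = 15;
    for (unsigned k = 0; k < length; k++) {
        int64_t v = (int64_t)b_real[k] * c_real[k] - (int64_t)b_imag[k] * c_imag[k];
        int64_t s = (int64_t)b_imag[k] * c_real[k] + (int64_t)b_real[k] * c_imag[k];
        int64_t re = (int64_t)shr_sat16(acc_real[k], acc_shr) + sat16(round_shr(v, bc_sat));
        int64_t im = (int64_t)shr_sat16(acc_imag[k], acc_shr) + sat16(round_shr(s, bc_sat));
        acc_real[k] = sat16(re);
        acc_imag[k] = sat16(im);
        unsigned h = hr16(acc_real[k]);
        if (h < hr) hr = h;
        h = hr16(acc_imag[k]);
        if (h < hr) hr = h;
    }
    return hr;
}
```

For example, an accumulator element 1000 - 1000i with b = 300 + 400i, c = 100 - 200i, acc_shr = 1 and bc_sat = 4 becomes 7375 - 1750i in this sketch.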
-
headroom_t xs3_vect_complex_s16_nmacc(int16_t acc_real[], int16_t acc_imag[], const int16_t b_real[], const int16_t b_imag[], const int16_t c_real[], const int16_t c_imag[], const unsigned length, const right_shift_t acc_shr, const right_shift_t bc_sat)
Multiply one complex 16-bit vector element-wise by another, and subtract the result from an accumulator.
acc_real[] and acc_imag[] together represent the complex 16-bit accumulator mantissa vector \(\bar a\). Each \(Re\{a_k\}\) is acc_real[k], and each \(Im\{a_k\}\) is acc_imag[k].
b_real[] and b_imag[] together represent the complex 16-bit input mantissa vector \(\bar b\). Each \(Re\{b_k\}\) is b_real[k], and each \(Im\{b_k\}\) is b_imag[k].
c_real[] and c_imag[] together represent the complex 16-bit input mantissa vector \(\bar c\). Each \(Re\{c_k\}\) is c_real[k], and each \(Im\{c_k\}\) is c_imag[k].
Each of the input vectors must begin at a word-aligned address.
length is the number of elements in each of the vectors.
acc_shr is the signed arithmetic right-shift applied to the accumulators \(a_k\).
bc_sat is the unsigned arithmetic right-shift applied to the product of \(b_k\) and \(c_k\) before being subtracted from the accumulator.
- Operation Performed:
- \[\begin{split}\begin{align*} & v_k \leftarrow Re\{b_k\} \cdot Re\{c_k\} - Im\{b_k\} \cdot Im\{c_k\} \\ & s_k \leftarrow Im\{b_k\} \cdot Re\{c_k\} + Re\{b_k\} \cdot Im\{c_k\} \\ & \hat{a}_k \leftarrow sat_{16}( a_k \cdot 2^{-acc\_shr} ) \\ & Re\{a_k\} \leftarrow sat_{16}( Re\{\hat{a}_k\} - round( sat_{16}( v_k \cdot 2^{-bc\_sat} ) ) ) \\ & Im\{a_k\} \leftarrow sat_{16}( Im\{\hat{a}_k\} - round( sat_{16}( s_k \cdot 2^{-bc\_sat} ) ) ) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
- Block Floating-Point
-
If inputs \(\bar b\) and \(\bar c\) are the mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), and input \(\bar a\) is the accumulator BFP vector \(\bar{a} \cdot 2^{a\_exp}\), then the output values of \(\bar a\) have the exponent \(a\_exp + acc\_shr\).
For accumulation to make sense mathematically, \(bc\_sat\) must be chosen such that \( a\_exp + acc\_shr = b\_exp + c\_exp + bc\_sat \).
The function xs3_vect_complex_s16_nmacc_prepare() can be used to obtain values for \(a\_exp\), \(acc\_shr\) and \(bc\_sat\) based on the input exponents \(a\_exp\), \(b\_exp\) and \(c\_exp\) and the input headrooms \(a\_hr\), \(b\_hr\) and \(c\_hr\).
- Parameters
acc_real – [inout] Real part of complex accumulator \(\bar a\)
acc_imag – [inout] Imaginary part of complex accumulator \(\bar a\)
b_real – [in] Real part of complex input vector \(\bar b\)
b_imag – [in] Imaginary part of complex input vector \(\bar b\)
c_real – [in] Real part of complex input vector \(\bar c\)
c_imag – [in] Imaginary part of complex input vector \(\bar c\)
length – [in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)
acc_shr – [in] Signed arithmetic right-shift applied to accumulator elements.
bc_sat – [in] Unsigned arithmetic right-shift applied to the products of elements \(b_k\) and \(c_k\)
- Throws
ET_LOAD_STORE – Raised if acc_real, acc_imag, b_real, b_imag, c_real or c_imag is not word-aligned (See Note: Vector Alignment)
- Returns
Headroom of the output vector \(\bar a\)
-
headroom_t xs3_vect_complex_s16_conj_macc(int16_t acc_real[], int16_t acc_imag[], const int16_t b_real[], const int16_t b_imag[], const int16_t c_real[], const int16_t c_imag[], const unsigned length, const right_shift_t acc_shr, const right_shift_t bc_sat)
Multiply one complex 16-bit vector element-wise by the complex conjugate of another, and add the result to an accumulator.
acc_real[] and acc_imag[] together represent the complex 16-bit accumulator mantissa vector \(\bar a\). Each \(Re\{a_k\}\) is acc_real[k], and each \(Im\{a_k\}\) is acc_imag[k].
b_real[] and b_imag[] together represent the complex 16-bit input mantissa vector \(\bar b\). Each \(Re\{b_k\}\) is b_real[k], and each \(Im\{b_k\}\) is b_imag[k].
c_real[] and c_imag[] together represent the complex 16-bit input mantissa vector \(\bar c\). Each \(Re\{c_k\}\) is c_real[k], and each \(Im\{c_k\}\) is c_imag[k].
Each of the input vectors must begin at a word-aligned address.
length is the number of elements in each of the vectors.
acc_shr is the signed arithmetic right-shift applied to the accumulators \(a_k\).
bc_sat is the unsigned arithmetic right-shift applied to the product of \(b_k\) and \(c_k^*\) before being added to the accumulator.
- Operation Performed:
- \[\begin{split}\begin{align*} & v_k \leftarrow Re\{b_k\} \cdot Re\{c_k\} + Im\{b_k\} \cdot Im\{c_k\} \\ & s_k \leftarrow Im\{b_k\} \cdot Re\{c_k\} - Re\{b_k\} \cdot Im\{c_k\} \\ & \hat{a}_k \leftarrow sat_{16}( a_k \cdot 2^{-acc\_shr} ) \\ & Re\{a_k\} \leftarrow sat_{16}( Re\{\hat{a}_k\} + round( sat_{16}( v_k \cdot 2^{-bc\_sat} ) ) ) \\ & Im\{a_k\} \leftarrow sat_{16}( Im\{\hat{a}_k\} + round( sat_{16}( s_k \cdot 2^{-bc\_sat} ) ) ) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
- Block Floating-Point
-
If inputs \(\bar b\) and \(\bar c\) are the mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), and input \(\bar a\) is the accumulator BFP vector \(\bar{a} \cdot 2^{a\_exp}\), then the output values of \(\bar a\) have the exponent \(a\_exp + acc\_shr\).
For accumulation to make sense mathematically, \(bc\_sat\) must be chosen such that \( a\_exp + acc\_shr = b\_exp + c\_exp + bc\_sat \).
The function xs3_vect_complex_s16_macc_prepare() can be used to obtain values for \(a\_exp\), \(acc\_shr\) and \(bc\_sat\) based on the input exponents \(a\_exp\), \(b\_exp\) and \(c\_exp\) and the input headrooms \(a\_hr\), \(b\_hr\) and \(c\_hr\).
- Parameters
acc_real – [inout] Real part of complex accumulator \(\bar a\)
acc_imag – [inout] Imaginary part of complex accumulator \(\bar a\)
b_real – [in] Real part of complex input vector \(\bar b\)
b_imag – [in] Imaginary part of complex input vector \(\bar b\)
c_real – [in] Real part of complex input vector \(\bar c\)
c_imag – [in] Imaginary part of complex input vector \(\bar c\)
length – [in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)
acc_shr – [in] Signed arithmetic right-shift applied to accumulator elements.
bc_sat – [in] Unsigned arithmetic right-shift applied to the products of elements \(b_k\) and \(c_k^*\)
- Throws
ET_LOAD_STORE – Raised if acc_real, acc_imag, b_real, b_imag, c_real or c_imag is not word-aligned (See Note: Vector Alignment)
- Returns
Headroom of the output vector \(\bar a\)
-
headroom_t xs3_vect_complex_s16_conj_nmacc(int16_t acc_real[], int16_t acc_imag[], const int16_t b_real[], const int16_t b_imag[], const int16_t c_real[], const int16_t c_imag[], const unsigned length, const right_shift_t acc_shr, const right_shift_t bc_sat)
Multiply one complex 16-bit vector element-wise by the complex conjugate of another, and subtract the result from an accumulator.
acc_real[] and acc_imag[] together represent the complex 16-bit accumulator mantissa vector \(\bar a\). Each \(Re\{a_k\}\) is acc_real[k], and each \(Im\{a_k\}\) is acc_imag[k].
b_real[] and b_imag[] together represent the complex 16-bit input mantissa vector \(\bar b\). Each \(Re\{b_k\}\) is b_real[k], and each \(Im\{b_k\}\) is b_imag[k].
c_real[] and c_imag[] together represent the complex 16-bit input mantissa vector \(\bar c\). Each \(Re\{c_k\}\) is c_real[k], and each \(Im\{c_k\}\) is c_imag[k].
Each of the input vectors must begin at a word-aligned address.
length is the number of elements in each of the vectors.
acc_shr is the signed arithmetic right-shift applied to the accumulators \(a_k\).
bc_sat is the unsigned arithmetic right-shift applied to the product of \(b_k\) and \(c_k^*\) before being subtracted from the accumulator.
- Operation Performed:
- \[\begin{split}\begin{align*} & v_k \leftarrow Re\{b_k\} \cdot Re\{c_k\} + Im\{b_k\} \cdot Im\{c_k\} \\ & s_k \leftarrow Im\{b_k\} \cdot Re\{c_k\} - Re\{b_k\} \cdot Im\{c_k\} \\ & \hat{a}_k \leftarrow sat_{16}( a_k \cdot 2^{-acc\_shr} ) \\ & Re\{a_k\} \leftarrow sat_{16}( Re\{\hat{a}_k\} - round( sat_{16}( v_k \cdot 2^{-bc\_sat} ) ) ) \\ & Im\{a_k\} \leftarrow sat_{16}( Im\{\hat{a}_k\} - round( sat_{16}( s_k \cdot 2^{-bc\_sat} ) ) ) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
- Block Floating-Point
-
If inputs \(\bar b\) and \(\bar c\) are the mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), and input \(\bar a\) is the accumulator BFP vector \(\bar{a} \cdot 2^{a\_exp}\), then the output values of \(\bar a\) have the exponent \(a\_exp + acc\_shr\).
For accumulation to make sense mathematically, \(bc\_sat\) must be chosen such that \( a\_exp + acc\_shr = b\_exp + c\_exp + bc\_sat \).
The function xs3_vect_complex_s16_macc_prepare() can be used to obtain values for \(a\_exp\), \(acc\_shr\) and \(bc\_sat\) based on the input exponents \(a\_exp\), \(b\_exp\) and \(c\_exp\) and the input headrooms \(a\_hr\), \(b\_hr\) and \(c\_hr\).
- Parameters
acc_real – [inout] Real part of complex accumulator \(\bar a\)
acc_imag – [inout] Imaginary part of complex accumulator \(\bar a\)
b_real – [in] Real part of complex input vector \(\bar b\)
b_imag – [in] Imaginary part of complex input vector \(\bar b\)
c_real – [in] Real part of complex input vector \(\bar c\)
c_imag – [in] Imaginary part of complex input vector \(\bar c\)
length – [in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)
acc_shr – [in] Signed arithmetic right-shift applied to accumulator elements.
bc_sat – [in] Unsigned arithmetic right-shift applied to the products of elements \(b_k\) and \(c_k^*\)
- Throws
ET_LOAD_STORE – Raised if acc_real, acc_imag, b_real, b_imag, c_real or c_imag is not word-aligned (See Note: Vector Alignment)
- Returns
Headroom of the output vector \(\bar a\)
-
headroom_t xs3_vect_complex_s16_mul(int16_t a_real[], int16_t a_imag[], const int16_t b_real[], const int16_t b_imag[], const int16_t c_real[], const int16_t c_imag[], const unsigned length, const right_shift_t a_shr)
Multiply one complex 16-bit vector element-wise by another.
a_real[] and a_imag[] together represent the complex 16-bit output mantissa vector \(\bar a\). Each \(Re\{a_k\}\) is a_real[k], and each \(Im\{a_k\}\) is a_imag[k].
b_real[] and b_imag[] together represent the complex 16-bit input mantissa vector \(\bar b\). Each \(Re\{b_k\}\) is b_real[k], and each \(Im\{b_k\}\) is b_imag[k].
c_real[] and c_imag[] together represent the complex 16-bit input mantissa vector \(\bar c\). Each \(Re\{c_k\}\) is c_real[k], and each \(Im\{c_k\}\) is c_imag[k].
Each of the input vectors must begin at a word-aligned address. This operation can be performed safely in-place on inputs b_real[], b_imag[], c_real[] and c_imag[].
length is the number of elements in each of the vectors.
a_shr is the unsigned arithmetic right-shift applied to the 32-bit accumulators holding intermediate results.
- Operation Performed:
- \[\begin{split}\begin{align*} & v_k \leftarrow Re\{b_k\} \cdot Re\{c_k\} - Im\{b_k\} \cdot Im\{c_k\} \\ & s_k \leftarrow Im\{b_k\} \cdot Re\{c_k\} + Re\{b_k\} \cdot Im\{c_k\} \\ & Re\{a_k\} \leftarrow round( sat_{16}( v_k \cdot 2^{-a\_shr} ) ) \\ & Im\{a_k\} \leftarrow round( sat_{16}( s_k \cdot 2^{-a\_shr} ) ) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
- Block Floating-Point
If \(\bar b\) and \(\bar c\) are the complex 16-bit mantissas of BFP vectors \(\bar{b} \cdot 2^{b\_exp}\) and \(\bar{c} \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) holds the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + c\_exp + a\_shr\).
The function xs3_vect_complex_s16_mul_prepare() can be used to obtain values for \(a\_exp\) and \(a\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).
- Parameters
a_real – [out] Real part of complex output vector \(\bar a\)
a_imag – [out] Imaginary part of complex output vector \(\bar a\)
b_real – [in] Real part of complex input vector \(\bar b\)
b_imag – [in] Imaginary part of complex input vector \(\bar b\)
c_real – [in] Real part of complex input vector \(\bar c\)
c_imag – [in] Imaginary part of complex input vector \(\bar c\)
length – [in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)
a_shr – [in] Right-shift applied to 32-bit intermediate results.
- Throws
ET_LOAD_STORE – Raised if a_real, a_imag, b_real, b_imag, c_real or c_imag is not word-aligned (See Note: Vector Alignment)
- Returns
Headroom of the output vector \(\bar a\)
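As a worked illustration with made-up numbers (not taken from the library), take a single element \(b_k = 300 + 400i\) with \(b\_exp = -10\), \(c_k = 100 - 200i\) with \(c\_exp = -8\), and \(a\_shr = 4\). Both shifted results are exact here, so no rounding or saturation occurs:
\[\begin{split}\begin{align*} & v_k = 300 \cdot 100 - 400 \cdot (-200) = 110000 \\ & s_k = 400 \cdot 100 + 300 \cdot (-200) = -20000 \\ & Re\{a_k\} = 110000 \cdot 2^{-4} = 6875, \qquad Im\{a_k\} = -20000 \cdot 2^{-4} = -1250 \\ & a\_exp = b\_exp + c\_exp + a\_shr = -10 - 8 + 4 = -14 \end{align*}\end{split}\]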
-
headroom_t xs3_vect_complex_s16_real_mul(int16_t a_real[], int16_t a_imag[], const int16_t b_real[], const int16_t b_imag[], const int16_t c_real[], const unsigned length, const right_shift_t a_shr)
Multiply a complex 16-bit vector element-wise by a real 16-bit vector.
a_real[] and a_imag[] together represent the complex 16-bit output mantissa vector \(\bar a\). Each \(Re\{a_k\}\) is a_real[k], and each \(Im\{a_k\}\) is a_imag[k].
b_real[] and b_imag[] together represent the complex 16-bit input mantissa vector \(\bar b\). Each \(Re\{b_k\}\) is b_real[k], and each \(Im\{b_k\}\) is b_imag[k].
c_real[] represents the real 16-bit input mantissa vector \(\bar c\).
Each of the input vectors must begin at a word-aligned address. This operation can be performed safely in-place on inputs b_real[], b_imag[] and c_real[].
length is the number of elements in each of the vectors.
a_shr is the unsigned arithmetic right-shift applied to the 32-bit accumulators holding the penultimate results.
- Operation Performed:
- \[\begin{split}\begin{align*} & v_k \leftarrow Re\{b_k\} \cdot c_k \\ & s_k \leftarrow Im\{b_k\} \cdot c_k \\ & Re\{a_k\} \leftarrow round( sat_{16}( v_k \cdot 2^{-a\_shr} ) ) \\ & Im\{a_k\} \leftarrow round( sat_{16}( s_k \cdot 2^{-a\_shr} ) ) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
- Block Floating-Point
-
If \(\bar b\) are the complex 16-bit mantissas of a BFP vector \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar c\) are the real 16-bit mantissas of a BFP vector \(\bar{c} \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) holds the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + c\_exp + a\_shr\).
The function xs3_vect_s16_real_mul_prepare() can be used to obtain values for \(a\_exp\) and \(a\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).
- Parameters
a_real – [out] Real part of complex output vector \(\bar a\)
a_imag – [out] Imaginary part of complex output vector \(\bar a\)
b_real – [in] Real part of complex input vector \(\bar b\)
b_imag – [in] Imaginary part of complex input vector \(\bar b\)
c_real – [in] Real input vector \(\bar c\)
length – [in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)
a_shr – [in] Right-shift applied to 32-bit intermediate results.
- Throws
ET_LOAD_STORE – Raised if a_real, a_imag, b_real, b_imag or c_real is not word-aligned (See Note: Vector Alignment)
- Returns
Headroom of the output vector \(\bar a\).
-
headroom_t xs3_vect_complex_s16_real_scale(int16_t a_real[], int16_t a_imag[], const int16_t b_real[], const int16_t b_imag[], const int16_t c, const unsigned length, const right_shift_t a_shr)
Multiply a complex 16-bit vector by a real scalar.
a_real[] and a_imag[] together represent the complex 16-bit output mantissa vector \(\bar a\). Each \(Re\{a_k\}\) is a_real[k], and each \(Im\{a_k\}\) is a_imag[k].
b_real[] and b_imag[] together represent the complex 16-bit input mantissa vector \(\bar b\). Each \(Re\{b_k\}\) is b_real[k], and each \(Im\{b_k\}\) is b_imag[k].
Each of the input vectors must begin at a word-aligned address. This operation can be performed safely in-place on inputs b_real[] and b_imag[].
c is the real 16-bit input mantissa \(c\).
length is the number of elements in each of the vectors.
a_shr is the unsigned arithmetic right-shift applied to the 32-bit accumulators holding the penultimate results.
- Operation Performed:
- \[\begin{split}\begin{align*} & v_k \leftarrow Re\{b_k\} \cdot c \\ & s_k \leftarrow Im\{b_k\} \cdot c \\ & Re\{a_k\} \leftarrow round( sat_{16}( v_k \cdot 2^{-a\_shr} ) ) \\ & Im\{a_k\} \leftarrow round( sat_{16}( s_k \cdot 2^{-a\_shr} ) ) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
- Block Floating-Point
-
If \(\bar b\) are the complex 16-bit mantissas of a BFP vector \( \bar{b} \cdot 2^{b\_exp} \) and \(c\) is the real 16-bit mantissa of floating-point value \(c \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) holds the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + c\_exp + a\_shr\).
The function xs3_vect_complex_s16_real_scale_prepare() can be used to obtain values for \(a\_exp\) and \(a\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).
- Parameters
a_real – [out] Real part of complex output vector \(\bar a\)
a_imag – [out] Imaginary part of complex output vector \(\bar a\)
b_real – [in] Real part of complex input vector \(\bar b\)
b_imag – [in] Imaginary part of complex input vector \(\bar b\)
c – [in] Real input scalar \(c\)
length – [in] Number of elements in vectors \(\bar a\) and \(\bar b\)
a_shr – [in] Right-shift applied to 32-bit intermediate results.
- Throws
ET_LOAD_STORE – Raised if a_real, a_imag, b_real or b_imag is not word-aligned (See Note: Vector Alignment)
- Returns
Headroom of the output vector \(\bar a\).
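The operation above can be sketched in portable C. This is only a reference model for reasoning about results, not the library's implementation: the names `ref_round_shr`, `ref_sat16` and `ref_complex_s16_real_scale` are hypothetical, rounding is assumed to be round-half-up applied before saturation, saturation is assumed to use the full `int16_t` range, and only non-negative `a_shr` is handled.

```c
#include <stdint.h>

/* floor((x + 2^(shr-1)) / 2^shr): round-half-up right-shift. Written with
   division so it does not depend on '>>' behaviour for negative values. */
static int64_t ref_round_shr(int64_t x, unsigned shr)
{
    if (shr == 0) return x;
    const int64_t d = (int64_t)1 << shr;
    const int64_t y = x + (d >> 1);
    int64_t q = y / d;
    if (y % d < 0) q -= 1;          /* turn truncation into floor */
    return q;
}

/* Saturate to the int16_t range (assumed bounds for this sketch). */
static int16_t ref_sat16(int64_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Reference model: multiply each complex element of b by the real scalar c. */
void ref_complex_s16_real_scale(int16_t a_real[], int16_t a_imag[],
                                const int16_t b_real[], const int16_t b_imag[],
                                int16_t c, unsigned length, unsigned a_shr)
{
    for (unsigned k = 0; k < length; k++) {
        const int32_t v = (int32_t)b_real[k] * c;   /* 32-bit intermediate */
        const int32_t s = (int32_t)b_imag[k] * c;
        a_real[k] = ref_sat16(ref_round_shr(v, a_shr));
        a_imag[k] = ref_sat16(ref_round_shr(s, a_shr));
    }
}
```

With `c = 16384` and `a_shr = 15`, the model halves every element, which corresponds to multiplying by 0.5 when `c` is read as a Q15 value.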
-
headroom_t xs3_vect_complex_s16_scale(int16_t a_real[], int16_t a_imag[], const int16_t b_real[], const int16_t b_imag[], const int16_t c_real, const int16_t c_imag, const unsigned length, const right_shift_t a_shr)¶
Multiply a complex 16-bit vector by a complex 16-bit scalar.
a_real[] and a_imag[] together represent the complex 16-bit output mantissa vector \(\bar a\). Each \(Re\{a_k\}\) is a_real[k], and each \(Im\{a_k\}\) is a_imag[k].
b_real[] and b_imag[] together represent the complex 16-bit input mantissa vector \(\bar b\). Each \(Re\{b_k\}\) is b_real[k], and each \(Im\{b_k\}\) is b_imag[k].
Each of the input vectors must begin at a word-aligned address. This operation can be performed safely in-place on inputs b_real[] and b_imag[].
c_real and c_imag are the real and imaginary parts of the complex 16-bit input mantissa \(c\).
length is the number of elements in each of the vectors.
a_shr is the unsigned arithmetic right-shift applied to the 32-bit accumulators holding the penultimate results.
- Operation Performed:
- \[\begin{split}\begin{align*} & v_k \leftarrow Re\{b_k\} \cdot Re\{c\} - Im\{b_k\} \cdot Im\{c\} \\ & s_k \leftarrow Im\{b_k\} \cdot Re\{c\} + Re\{b_k\} \cdot Im\{c\} \\ & Re\{a_k\} \leftarrow round( sat_{16}( v_k \cdot 2^{-a\_shr} ) ) \\ & Im\{a_k\} \leftarrow round( sat_{16}( s_k \cdot 2^{-a\_shr} ) ) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
- Block Floating-Point
-
If \(\bar b\) are the complex 16-bit mantissas of a BFP vector \( \bar{b} \cdot 2^{b\_exp} \) and \(c\) is the complex 16-bit mantissa of floating-point value \(c \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + c\_exp + a\_shr\).
The function xs3_vect_complex_s16_scale_prepare() can be used to obtain values for \(a\_exp\) and \(a\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).
- Parameters
a_real – [out] Real part of complex output vector \(\bar a\)
a_imag – [out] Imaginary part of complex output vector \(\bar a\)
b_real – [in] Real part of complex input vector \(\bar b\)
b_imag – [in] Imaginary part of complex input vector \(\bar b\)
c_real – [in] Real part of complex input scalar \(c\)
c_imag – [in] Imaginary part of complex input scalar \(c\)
length – [in] Number of elements in vectors \(\bar a\) and \(\bar b\)
a_shr – [in] Right-shift applied to 32-bit intermediate results
- Throws
ET_LOAD_STORE – Raised if a_real, a_imag, b_real or b_imag is not word-aligned (See Note: Vector Alignment)
- Returns
Headroom of the output vector \(\bar a\).
-
void xs3_vect_complex_s16_set(int16_t a_real[], int16_t a_imag[], const int16_t b_real, const int16_t b_imag, const unsigned length)¶
Set each element of a complex 16-bit vector to a specified value.
a_real[] and a_imag[] together represent the complex 16-bit output mantissa vector \(\bar a\). Each \(Re\{a_k\}\) is a_real[k], and each \(Im\{a_k\}\) is a_imag[k]. Each must begin at a word-aligned address.
b_real and b_imag are the real and imaginary parts of the complex 16-bit input mantissa \(b\). Each a_real[k] will be set to b_real. Each a_imag[k] will be set to b_imag.
length is the number of elements in a_real[] and a_imag[].
- Operation Performed:
- \[\begin{split}\begin{align*} & Re\{a_k\} \leftarrow Re\{b\} \\ & Im\{a_k\} \leftarrow Im\{b\} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
- Block Floating-Point
-
If \(b\) is the mantissa of floating-point value \(b \cdot 2^{b\_exp}\), then the output vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp\).
- Parameters
a_real – [out] Real part of complex output vector \(\bar a\)
a_imag – [out] Imaginary part of complex output vector \(\bar a\)
b_real – [in] Real part of complex input scalar \(b\)
b_imag – [in] Imaginary part of complex input scalar \(b\)
length – [in] Number of elements in vector \(\bar a\)
- Throws
ET_LOAD_STORE – Raised if a_real or a_imag is not word-aligned (See Note: Vector Alignment)
-
headroom_t xs3_vect_complex_s16_shl(int16_t a_real[], int16_t a_imag[], const int16_t b_real[], const int16_t b_imag[], const unsigned length, const left_shift_t b_shl)¶
Left-shift each element of a complex 16-bit vector by a specified number of bits.
a_real[] and a_imag[] together represent the complex 16-bit output mantissa vector \(\bar a\). Each \(Re\{a_k\}\) is a_real[k], and each \(Im\{a_k\}\) is a_imag[k].
b_real[] and b_imag[] together represent the complex 16-bit input mantissa vector \(\bar b\). Each \(Re\{b_k\}\) is b_real[k], and each \(Im\{b_k\}\) is b_imag[k].
Each of the input vectors must begin at a word-aligned address. This operation can be performed safely in-place on inputs b_real[] and b_imag[].
length is the number of elements in \(\bar a\) and \(\bar b\).
b_shl is the signed arithmetic left-shift applied to each element of \(\bar b\).
- Operation Performed:
- \[\begin{split}\begin{align*} & Re\{a_k\} \leftarrow sat_{16}(\lfloor Re\{b_k\} \cdot 2^{b\_shl} \rfloor) \\ & Im\{a_k\} \leftarrow sat_{16}(\lfloor Im\{b_k\} \cdot 2^{b\_shl} \rfloor) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
- Block Floating-Point
-
If \(\bar b\) are the complex 16-bit mantissas of a BFP vector \( \bar{b} \cdot 2^{b\_exp} \), then the resulting vector \(\bar a\) are the complex 16-bit mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(\bar{a} = \bar{b} \cdot 2^{b\_shl}\) and \(a\_exp = b\_exp\).
- Parameters
a_real – [out] Real part of complex output vector \(\bar a\)
a_imag – [out] Imaginary part of complex output vector \(\bar a\)
b_real – [in] Real part of complex input vector \(\bar b\)
b_imag – [in] Imaginary part of complex input vector \(\bar b\)
length – [in] Number of elements in vectors \(\bar a\) and \(\bar b\)
b_shl – [in] Left-shift applied to \(\bar b\)
- Throws
ET_LOAD_STORE – Raised if a_real, a_imag, b_real or b_imag is not word-aligned (See Note: Vector Alignment)
- Returns
Headroom of the output vector \(\bar a\)
-
headroom_t xs3_vect_complex_s16_shr(int16_t a_real[], int16_t a_imag[], const int16_t b_real[], const int16_t b_imag[], const unsigned length, const right_shift_t b_shr)¶
Right-shift each element of a complex 16-bit vector by a specified number of bits.
a_real[] and a_imag[] together represent the complex 16-bit output mantissa vector \(\bar a\). Each \(Re\{a_k\}\) is a_real[k], and each \(Im\{a_k\}\) is a_imag[k].
b_real[] and b_imag[] together represent the complex 16-bit input mantissa vector \(\bar b\). Each \(Re\{b_k\}\) is b_real[k], and each \(Im\{b_k\}\) is b_imag[k].
Each of the input vectors must begin at a word-aligned address. This operation can be performed safely in-place on inputs b_real[] and b_imag[].
length is the number of elements in \(\bar a\) and \(\bar b\).
b_shr is the signed arithmetic right-shift applied to each element of \(\bar b\).
- Operation Performed:
- \[\begin{split}\begin{align*} & Re\{a_k\} \leftarrow sat_{16}(\lfloor Re\{b_k\} \cdot 2^{-b\_shr} \rfloor) \\ & Im\{a_k\} \leftarrow sat_{16}(\lfloor Im\{b_k\} \cdot 2^{-b\_shr} \rfloor) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
- Block Floating-Point
-
If \(\bar b\) are the complex 16-bit mantissas of a BFP vector \( \bar{b} \cdot 2^{b\_exp} \), then the resulting vector \(\bar a\) are the complex 16-bit mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(\bar{a} = \bar{b} \cdot 2^{-b\_shr}\) and \(a\_exp = b\_exp\).
- Parameters
a_real – [out] Real part of complex output vector \(\bar a\)
a_imag – [out] Imaginary part of complex output vector \(\bar a\)
b_real – [in] Real part of complex input vector \(\bar b\)
b_imag – [in] Imaginary part of complex input vector \(\bar b\)
length – [in] Number of elements in vectors \(\bar a\) and \(\bar b\)
b_shr – [in] Right-shift applied to \(\bar b\)
- Throws
ET_LOAD_STORE – Raised if a_real, a_imag, b_real or b_imag is not word-aligned (See Note: Vector Alignment)
- Returns
Headroom of the output vector \(\bar a\)
-
headroom_t xs3_vect_complex_s16_squared_mag(int16_t a[], const int16_t b_real[], const int16_t b_imag[], const unsigned length, const right_shift_t a_shr)¶
Get the squared magnitudes of elements of a complex 16-bit vector.
a[] represents the real 16-bit output mantissa vector \(\bar a\).
b_real[] and b_imag[] together represent the complex 16-bit input mantissa vector \(\bar b\). Each \(Re\{b_k\}\) is b_real[k], and each \(Im\{b_k\}\) is b_imag[k].
Each of the input vectors must begin at a word-aligned address.
length is the number of elements in each of the vectors.
a_shr is the unsigned arithmetic right-shift applied to the 32-bit accumulators holding the penultimate results.
- Operation Performed:
- \[\begin{split}\begin{align*} & a_k \leftarrow ((Re\{b_k\})^2 + (Im\{b_k\})^2)\cdot 2^{-a\_shr} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
- Block Floating-Point
-
If \(\bar b\) are the complex 16-bit mantissas of a BFP vector \( \bar{b} \cdot 2^{b\_exp} \), then the resulting vector \(\bar a\) are the real 16-bit mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = 2 \cdot b\_exp + a\_shr\).
The function xs3_vect_complex_s16_squared_mag_prepare() can be used to obtain values for \(a\_exp\) and \(a\_shr\) based on the input exponent \(b\_exp\) and headroom \(b\_hr\).
- Parameters
a – [out] Real output vector \(\bar a\)
b_real – [in] Real part of complex input vector \(\bar b\)
b_imag – [in] Imaginary part of complex input vector \(\bar b\)
length – [in] Number of elements in vectors \(\bar a\) and \(\bar b\)
a_shr – [in] Right-shift applied to 32-bit intermediate results
- Throws
ET_LOAD_STORE – Raised if a, b_real or b_imag is not word-aligned (See Note: Vector Alignment)
-
headroom_t xs3_vect_complex_s16_sub(int16_t a_real[], int16_t a_imag[], const int16_t b_real[], const int16_t b_imag[], const int16_t c_real[], const int16_t c_imag[], const unsigned length, const right_shift_t b_shr, const right_shift_t c_shr)¶
Subtract one complex 16-bit vector from another.
a_real[] and a_imag[] together represent the complex 16-bit output mantissa vector \(\bar a\). Each \(Re\{a_k\}\) is a_real[k], and each \(Im\{a_k\}\) is a_imag[k].
b_real[] and b_imag[] together represent the complex 16-bit input mantissa vector \(\bar b\). Each \(Re\{b_k\}\) is b_real[k], and each \(Im\{b_k\}\) is b_imag[k].
c_real[] and c_imag[] together represent the complex 16-bit input mantissa vector \(\bar c\). Each \(Re\{c_k\}\) is c_real[k], and each \(Im\{c_k\}\) is c_imag[k].
Each of the input vectors must begin at a word-aligned address. This operation can be performed safely in-place on inputs b_real[], b_imag[], c_real[] and c_imag[].
length is the number of elements in each of the vectors.
b_shr and c_shr are the signed arithmetic right-shifts applied to each element of \(\bar b\) and \(\bar c\) respectively.
- Operation Performed:
- \[\begin{split}\begin{align*} & b_k' \leftarrow sat_{16}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & c_k' \leftarrow sat_{16}(\lfloor c_k \cdot 2^{-c\_shr} \rfloor) \\ & Re\{a_k\} \leftarrow Re\{b_k'\} - Re\{c_k'\} \\ & Im\{a_k\} \leftarrow Im\{b_k'\} - Im\{c_k'\} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
- Block Floating-Point
-
If \(\bar b\) and \(\bar c\) are the complex 16-bit mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the complex 16-bit mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\).
In this case, \(b\_shr\) and \(c\_shr\) must be chosen so that \(a\_exp = b\_exp + b\_shr = c\_exp + c\_shr\). Adding or subtracting mantissas only makes sense if they are associated with the same exponent.
The function xs3_vect_complex_s16_sub_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).
- Parameters
a_real – [out] Real part of complex output vector \(\bar a\)
a_imag – [out] Imaginary part of complex output vector \(\bar a\)
b_real – [in] Real part of complex input vector \(\bar b\)
b_imag – [in] Imaginary part of complex input vector \(\bar b\)
c_real – [in] Real part of complex input vector \(\bar c\)
c_imag – [in] Imaginary part of complex input vector \(\bar c\)
length – [in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)
b_shr – [in] Right-shift applied to \(\bar b\)
c_shr – [in] Right-shift applied to \(\bar c\)
- Throws
ET_LOAD_STORE – Raised if a_real, a_imag, b_real, b_imag, c_real or c_imag is not word-aligned (See Note: Vector Alignment)
- Returns
Headroom of output vector \(\bar a\).
-
complex_s32_t xs3_vect_complex_s16_sum(const int16_t b_real[], const int16_t b_imag[], const unsigned length)¶
Get the sum of elements of a complex 16-bit vector.
b_real[] and b_imag[] together represent the complex 16-bit input mantissa vector \(\bar b\), and must both begin at a word-aligned address. Each \(Re\{b_k\}\) is b_real[k], and each \(Im\{b_k\}\) is b_imag[k].
length is the number of elements in \(\bar b\).
- Operation Performed:
- \[\begin{split}\begin{align*} & Re\{a\} \leftarrow \sum_{k=0}^{length-1} \left( Re\{b_k\} \right) \\ & Im\{a\} \leftarrow \sum_{k=0}^{length-1} \left( Im\{b_k\} \right) \end{align*}\end{split}\]
- Block Floating-Point
-
If \(\bar b\) are the mantissas of BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the returned value \(a\) is the complex 32-bit mantissa of floating-point value \(a \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp\).
- Parameters
b_real – [in] Real part of complex input vector \(\bar b\)
b_imag – [in] Imaginary part of complex input vector \(\bar b\)
length – [in] Number of elements in vector \(\bar b\).
- Throws
ET_LOAD_STORE – Raised if b_real or b_imag is not word-aligned (See Note: Vector Alignment)
- Returns
\(a\), the 32-bit complex sum of elements in \(\bar b\).
-
headroom_t xs3_vect_s16_abs(int16_t a[], const int16_t b[], const unsigned length)¶
Compute the element-wise absolute value of a 16-bit vector.
a[] and b[] represent the 16-bit vectors \(\bar a\) and \(\bar b\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[].
length is the number of elements in each of the vectors.
- Operation Performed:
- \[\begin{split}\begin{align*} & a_k \leftarrow sat_{16}(\left| b_k \right|) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
- Block Floating-Point
-
If \(\bar b\) are the mantissas of BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the output vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp\).
- Parameters
a – [out] Output vector \(\bar a\)
b – [in] Input vector \(\bar b\)
length – [in] Number of elements in vectors \(\bar a\) and \(\bar b\)
- Throws
ET_LOAD_STORE – Raised if a or b is not word-aligned (See Note: Vector Alignment)
- Returns
Headroom of the output vector \(\bar a\).
-
int32_t xs3_vect_s16_abs_sum(const int16_t b[], const unsigned length)¶
Compute the sum of the absolute values of elements of a 16-bit vector.
b[] represents the 16-bit vector \(\bar b\). b[] must begin at a word-aligned address.
length is the number of elements in \(\bar b\).
- Operation Performed:
- \[\begin{align*} a \leftarrow \sum_{k=0}^{length-1} \left| b_k \right| \end{align*}\]
- Block Floating-Point
-
If \(\bar b\) are the mantissas of BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the returned value \(a\) is the 32-bit mantissa of floating-point value \(a \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp\).
- Parameters
b – [in] Input vector \(\bar b\)
length – [in] Number of elements in \(\bar b\)
- Throws
ET_LOAD_STORE – Raised if b is not word-aligned (See Note: Vector Alignment)
- Returns
The 32-bit sum \(a\)
-
headroom_t xs3_vect_s16_add(int16_t a[], const int16_t b[], const int16_t c[], const unsigned length, const right_shift_t b_shr, const right_shift_t c_shr)¶
Add one 16-bit BFP vector to another.
a[], b[] and c[] represent the 16-bit vectors \(\bar a\), \(\bar b\) and \(\bar c\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[] or c[].
length is the number of elements in each of the vectors.
b_shr and c_shr are the signed arithmetic right-shifts applied to each element of \(\bar b\) and \(\bar c\) respectively.
- Operation Performed:
- \[\begin{split}\begin{align*} & b_k' = sat_{16}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & c_k' = sat_{16}(\lfloor c_k \cdot 2^{-c\_shr} \rfloor) \\ & a_k \leftarrow sat_{16}\!\left( b_k' + c_k' \right) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
- Block Floating-Point
-
If \(\bar b\) and \(\bar c\) are the mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\).
In this case, \(b\_shr\) and \(c\_shr\) must be chosen so that \(a\_exp = b\_exp + b\_shr = c\_exp + c\_shr\). Adding or subtracting mantissas only makes sense if they are associated with the same exponent.
The function xs3_vect_s16_add_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).
- Parameters
a – [out] Output vector \(\bar a\)
b – [in] Input vector \(\bar b\)
c – [in] Input vector \(\bar c\)
length – [in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)
b_shr – [in] Right-shift applied to \(\bar b\)
c_shr – [in] Right-shift applied to \(\bar c\)
- Throws
ET_LOAD_STORE – Raised if a, b or c is not word-aligned (See Note: Vector Alignment)
- Returns
Headroom of the output vector \(\bar a\).
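The exponent constraint \(a\_exp = b\_exp + b\_shr = c\_exp + c\_shr\) can be illustrated with a small reference model. Everything below is a hypothetical sketch: `ref_add_shifts` simply picks an output exponent one above the larger input exponent and ignores headroom, so it is deliberately cruder than xs3_vect_s16_add_prepare(), and `ref_s16_add` assumes saturation to the full `int16_t` range.

```c
#include <stdint.h>

/* sat16(floor(x * 2^-shr)); a negative shr acts as a left-shift. */
static int16_t ref_shr_sat16(int16_t x, int shr)
{
    int64_t v = x;
    if (shr >= 0) {
        if (shr > 16) shr = 16;             /* larger shifts give the same result */
        const int64_t d = (int64_t)1 << shr;
        int64_t q = v / d;
        if (v % d < 0) q -= 1;              /* floor for negative values */
        v = q;
    } else {
        int shl = -shr;
        if (shl > 16) shl = 16;             /* anything non-zero saturates anyway */
        v *= (int64_t)1 << shl;
    }
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Hypothetical shift selection: both inputs are brought to a common exponent
   one above the larger input exponent, leaving a spare bit for the sum. */
void ref_add_shifts(int *a_exp, int *b_shr, int *c_shr, int b_exp, int c_exp)
{
    *a_exp = ((b_exp > c_exp) ? b_exp : c_exp) + 1;
    *b_shr = *a_exp - b_exp;
    *c_shr = *a_exp - c_exp;
}

/* Reference model of the element-wise saturating add. */
void ref_s16_add(int16_t a[], const int16_t b[], const int16_t c[],
                 unsigned length, int b_shr, int c_shr)
{
    for (unsigned k = 0; k < length; k++) {
        const int32_t sum = (int32_t)ref_shr_sat16(b[k], b_shr)
                          + (int32_t)ref_shr_sat16(c[k], c_shr);
        a[k] = (sum > INT16_MAX) ? INT16_MAX
             : (sum < INT16_MIN) ? INT16_MIN : (int16_t)sum;
    }
}
```

Because this sketch ignores headroom, it shifts away more low-order bits than necessary when the inputs have headroom to spare; accounting for \(b\_hr\) and \(c\_hr\) is what allows the shifts to be smaller.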
-
headroom_t xs3_vect_s16_add_scalar(int16_t a[], const int16_t b[], const int16_t c, const unsigned length, const right_shift_t b_shr)¶
Add a scalar to a 16-bit vector.
a[] and b[] represent the 16-bit mantissa vectors \(\bar a\) and \(\bar b\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[].
c is the scalar \(c\) to be added to each element of \(\bar b\).
length is the number of elements in each of the vectors.
b_shr is the signed arithmetic right-shift applied to each element of \(\bar b\).
- Operation Performed:
- \[\begin{split}\begin{align*} & b_k' = sat_{16}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & a_k \leftarrow sat_{16}\!\left( b_k' + c \right) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
- Block Floating-Point
-
If elements of \(\bar b\) are the mantissas of BFP vector \( \bar{b} \cdot 2^{b\_exp} \), and \(c\) is the mantissa of floating-point value \(c \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\).
In this case, \(b\_shr\) and \(c\_shr\) must be chosen so that \(a\_exp = b\_exp + b\_shr = c\_exp + c\_shr\). Adding or subtracting mantissas only makes sense if they are associated with the same exponent.
The function xs3_vect_s16_add_scalar_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).
Note that \(c\_shr\) is an output of xs3_vect_s16_add_scalar_prepare(), but is not a parameter to this function. The \(c\_shr\) produced by xs3_vect_s16_add_scalar_prepare() is to be applied by the user, and the result passed as input c.
- Parameters
a – [out] Output vector \(\bar a\)
b – [in] Input vector \(\bar b\)
c – [in] Input scalar \(c\)
length – [in] Number of elements in vectors \(\bar a\) and \(\bar b\)
b_shr – [in] Right-shift applied to \(\bar b\)
- Throws
ET_LOAD_STORE – Raised if a or b is not word-aligned (See Note: Vector Alignment)
- Returns
Headroom of the output vector \(\bar a\).
-
unsigned xs3_vect_s16_argmax(const int16_t b[], const unsigned length)¶
Obtain the array index of the maximum element of a 16-bit vector.
b[] represents the 16-bit input vector \(\bar b\). It must begin at a word-aligned address.
length is the number of elements in \(\bar b\).
- Operation Performed:
- \[\begin{split}\begin{align*} & a \leftarrow argmax_k\{ b_k \} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
- Parameters
b – [in] Input vector \(\bar b\)
length – [in] Number of elements in \(\bar b\)
- Throws
ET_LOAD_STORE – Raised if b is not word-aligned (See Note: Vector Alignment)
- Returns
\(a\), the index of the maximum element of vector \(\bar b\). If there is a tie for the maximum value, the lowest tying index is returned.
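The tie-breaking rule can be made concrete with a scalar reference model. The function name `ref_s16_argmax` is hypothetical and the loop is only a sketch of the documented behaviour (it assumes `length` is at least 1), not the vectorised implementation.

```c
#include <stdint.h>

/* Reference model: index of the maximum element, lowest index wins on ties. */
unsigned ref_s16_argmax(const int16_t b[], unsigned length)
{
    unsigned best = 0;
    for (unsigned k = 1; k < length; k++) {
        /* Strict '>' keeps the earliest index when values are equal. */
        if (b[k] > b[best]) best = k;
    }
    return best;
}
```

Using a strict comparison is what yields the lowest tying index; a `>=` comparison would return the highest one instead.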
-
unsigned xs3_vect_s16_argmin(const int16_t b[], const unsigned length)¶
Obtain the array index of the minimum element of a 16-bit vector.
b[] represents the 16-bit input vector \(\bar b\). It must begin at a word-aligned address.
length is the number of elements in \(\bar b\).
- Operation Performed:
- \[\begin{split}\begin{align*} & a \leftarrow argmin_k\{ b_k \} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
- Parameters
b – [in] Input vector \(\bar b\)
length – [in] Number of elements in \(\bar b\)
- Throws
ET_LOAD_STORE – Raised if b is not word-aligned (See Note: Vector Alignment)
- Returns
\(a\), the index of the minimum element of vector \(\bar b\). If there is a tie for the minimum value, the lowest tying index is returned.
-
headroom_t xs3_vect_s16_clip(int16_t a[], const int16_t b[], const unsigned length, const int16_t lower_bound, const int16_t upper_bound, const right_shift_t b_shr)¶
Clamp the elements of a 16-bit vector to a specified range.
a[] and b[] represent the 16-bit vectors \(\bar a\) and \(\bar b\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[].
length is the number of elements in each of the vectors.
lower_bound and upper_bound are the lower and upper bounds of the clipping range respectively. These bounds are checked for each element of \(\bar b\) only after b_shr is applied.
b_shr is the signed arithmetic right-shift applied to elements of \(\bar b\) before being compared to the upper and lower bounds.
If \(\bar b\) are the mantissas for a BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the exponent \(a\_exp\) of the output BFP vector \(\bar{a} \cdot 2^{a\_exp}\) is given by \(a\_exp = b\_exp + b\_shr\).
- Operation Performed:
- \[\begin{split}\begin{align*} & b_k' \leftarrow sat_{16}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & a_k \leftarrow \begin{cases} lower\_bound & b_k' \le lower\_bound \\ upper\_bound & b_k' \ge upper\_bound \\ b_k' & otherwise \end{cases} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
- Block Floating-Point
-
If \(\bar b\) are the mantissas of BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the output vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + b\_shr\).
- Parameters
a – [out] Output vector \(\bar a\)
b – [in] Input vector \(\bar b\)
length – [in] Number of elements in vectors \(\bar a\) and \(\bar b\)
lower_bound – [in] Lower bound of clipping range
upper_bound – [in] Upper bound of clipping range
b_shr – [in] Arithmetic right-shift applied to elements of \(\bar b\) prior to clipping
- Throws
ET_LOAD_STORE – Raised if a or b is not word-aligned (See Note: Vector Alignment)
- Returns
Headroom of output vector \(\bar a\)
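The order of operations (shift first, then compare against the bounds) can be sketched as follows. `ref_s16_clip` is a hypothetical reference model that handles only non-negative `b_shr` and does not compute the returned headroom.

```c
#include <stdint.h>

/* Reference model: right-shift each element (floor), then clamp to
   [lower_bound, upper_bound]. Only b_shr >= 0 is handled in this sketch. */
void ref_s16_clip(int16_t a[], const int16_t b[], unsigned length,
                  int16_t lower_bound, int16_t upper_bound, unsigned b_shr)
{
    const int32_t d = (int32_t)1 << (b_shr > 16 ? 16 : b_shr);
    for (unsigned k = 0; k < length; k++) {
        int32_t v = b[k] / d;
        if (b[k] % d < 0) v -= 1;           /* floor for negative values */
        if (v <= lower_bound) v = lower_bound;
        else if (v >= upper_bound) v = upper_bound;
        a[k] = (int16_t)v;
    }
}
```

Since the bounds are compared against the shifted values, they are expressed relative to the output exponent \(a\_exp = b\_exp + b\_shr\), not the input exponent.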
-
int64_t xs3_vect_s16_dot(const int16_t b[], const int16_t c[], const unsigned length)¶
Compute the inner product of two 16-bit vectors.
b[] and c[] represent the 16-bit vectors \(\bar b\) and \(\bar c\) respectively. Each must begin at a word-aligned address.
length is the number of elements in each of the vectors.
- Todo:
I don’t think there are currently any functions in this library to perform this bit-depth reduction in a user-friendly way.
- Operation Performed:
- \[\begin{align*} a \leftarrow \sum_{k=0}^{length-1}\left( b_k \cdot c_k \right) \end{align*}\]
- Block Floating-Point
-
If \(\bar b\) and \(\bar c\) are the mantissas of the BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c}\cdot 2^{c\_exp}\), then result \(a\) is the mantissa of the result \(a \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + c\_exp\).
If needed, the bit-depth of \(a\) can then be reduced to 16 or 32 bits to get a new result \(a' \cdot 2^{a\_exp'}\) where \(a' = a \cdot 2^{-a\_shr}\) and \(a\_exp' = a\_exp + a\_shr\).
- Notes
-
The sum \(a\) is accumulated simultaneously into 16 48-bit accumulators which are summed together at the final step. So long as
length
is less than roughly 2 million, no overflow or saturation of the resulting sum is possible.
- Parameters
b – [in] Input vector \(\bar b\)
c – [in] Input vector \(\bar c\)
length – [in] Number of elements in vectors \(\bar b\) and \(\bar c\)
- Throws
ET_LOAD_STORE – Raised if b or c is not word-aligned (See Note: Vector Alignment)
- Returns
\(a\), the inner product of vectors \(\bar b\) and \(\bar c\).
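The bit-depth reduction described above, \(a' = a \cdot 2^{-a\_shr}\) with \(a\_exp' = a\_exp + a\_shr\), might be done in user code along these lines. This is a sketch only: `ref_float_s32_t`, `ref_s16_dot` and `ref_reduce_to_s32` are hypothetical names, and the reduction picks the smallest non-negative \(a\_shr\) for which the shifted value fits in 32 bits.

```c
#include <stdint.h>

/* Hypothetical 32-bit mantissa/exponent pair used only by this sketch. */
typedef struct {
    int32_t mant;
    int exp;
} ref_float_s32_t;

/* Reference model of the 64-bit inner product. */
int64_t ref_s16_dot(const int16_t b[], const int16_t c[], unsigned length)
{
    int64_t acc = 0;
    for (unsigned k = 0; k < length; k++)
        acc += (int64_t)b[k] * c[k];
    return acc;
}

/* Reduce a 64-bit mantissa to 32 bits, adjusting the exponent so that
   mant * 2^exp approximates a * 2^a_exp (low-order bits are floored away). */
ref_float_s32_t ref_reduce_to_s32(int64_t a, int a_exp)
{
    ref_float_s32_t r;
    int a_shr = 0;
    while (a > INT32_MAX || a < INT32_MIN) {
        int64_t q = a / 2;
        if (a % 2 < 0) q -= 1;              /* floor division by 2 */
        a = q;
        a_shr++;
    }
    r.mant = (int32_t)a;
    r.exp = a_exp + a_shr;
    return r;
}
```

For BFP inputs the caller would pass `a_exp = b_exp + c_exp`; values that already fit in 32 bits are returned unchanged with `a_shr = 0`.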
-
int32_t xs3_vect_s16_energy(const int16_t b[], const unsigned length, const right_shift_t b_shr)¶
Calculate the energy (sum of squares of elements) of a 16-bit vector.
b[] represents the 16-bit vector \(\bar b\). b[] must begin at a word-aligned address.
length is the number of elements in \(\bar b\).
b_shr is the signed arithmetic right-shift applied to elements of \(\bar b\). b_shr should be chosen to avoid the possibility of saturation. See the note below.
- Operation Performed:
- \[\begin{split}\begin{align*} & b_k' \leftarrow sat_{16}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & a \leftarrow \sum_{k=0}^{length-1} (b_k')^2 \end{align*}\end{split}\]
- Block Floating-Point
-
If \(\bar b\) are the mantissas of the BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then floating-point result is \(a \cdot 2^{a\_exp}\), where the 32-bit mantissa \(a\) is returned by this function, and \(a\_exp = 2 \cdot (b\_exp + b\_shr) \).
- Additional Details
-
If \(\bar b\) has \(b\_hr\) bits of headroom, then each product \((b_k')^2\) can be a maximum of \( 2^{30 - 2 \cdot (b\_hr + b\_shr)}\). So long as length is less than \(2^{1 + 2\cdot (b\_hr + b\_shr)}\), the 32-bit sum cannot overflow or saturate. Each increase of \(b\_shr\) by \(1\) quadruples the number of elements that can be summed without risk of overflow.
If the caller’s mantissa vector is longer than that, the full result can be found by calling this function multiple times for partial results on sub-sequences of the input, and adding the results in user code.
In many situations the caller may have a priori knowledge that saturation is impossible (or very nearly so), in which case this guideline may be disregarded. However, such situations are application-specific and are well beyond the scope of this documentation, and as such are left to the user’s discretion.
- Parameters
b – [in] Input vector \(\bar b\)
length – [in] Number of elements in \(\bar b\)
b_shr – [in] Right-shift applied to \(\bar b\)
- Throws
ET_LOAD_STORE – Raised if b is not word-aligned (See Note: Vector Alignment)
- Returns
32-bit mantissa of vector \(\bar b\)’s energy
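A worst-case check on `length` can be derived from the per-product bound \(2^{30 - 2 \cdot (b\_hr + b\_shr)}\) given above. The helper below is a hypothetical sketch, assuming the only requirement is that the worst-case sum fits in the `int32_t` return value; it is conservative, since real data rarely reaches the bound for every element.

```c
#include <stdint.h>

/* Returns 1 if 'length' worst-case squares are guaranteed to fit in an
   int32_t sum, given the headroom and right-shift of the input vector. */
int ref_energy_fits_s32(unsigned length, unsigned b_hr, unsigned b_shr)
{
    unsigned s = b_hr + b_shr;
    if (s > 15) s = 15;                     /* squares cannot drop below 1 */
    /* Largest possible (b_k')^2 for this headroom and shift. */
    const int64_t max_sq = (int64_t)1 << (30 - 2 * s);
    return (int64_t)length * max_sq <= INT32_MAX;
}
```

A caller could increase `b_shr` until the check passes, or split a long vector into chunks that pass and add the partial results in user code.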
-
headroom_t xs3_vect_s16_headroom(const int16_t b[], const unsigned length)¶
Calculate the headroom of a 16-bit vector.
The headroom of an N-bit integer is the number of bits that the integer’s value may be left-shifted without any information being lost. Equivalently, it is one less than the number of leading sign bits.
The headroom of an int16_t array is the minimum of the headroom of each of its int16_t elements.
This function efficiently traverses the elements of b[] to determine its headroom.
b[] represents the 16-bit vector \(\bar b\). b[] must begin at a word-aligned address.
length is the number of elements in b[].
- Operation Performed:
- \[\begin{align*} a \leftarrow min\!\{ HR_{16}\left(b_0\right), HR_{16}\left(b_1\right), ..., HR_{16}\left(b_{length-1}\right) \} \end{align*}\]
- Parameters
b – [in] Input vector \(\bar b\)
length – [in] The number of elements in vector \(\bar b\)
- Throws
ET_LOAD_STORE – Raised if b is not word-aligned (See Note: Vector Alignment)
- Returns
Headroom of vector \(\bar b\)
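The definition of headroom (one less than the number of leading sign bits) can be checked with a scalar reference model. `ref_hr_s16` and `ref_s16_headroom` are hypothetical names for this sketch; the model returns 15 for an empty vector, which is a choice of the sketch rather than documented behaviour.

```c
#include <stdint.h>

/* Headroom of one 16-bit value: leading sign bits minus one. */
unsigned ref_hr_s16(int16_t x)
{
    /* For negative x, ~x is non-negative with the same number of
       leading sign bits, so both cases reduce to counting leading zeros. */
    uint16_t u = (x < 0) ? (uint16_t)~x : (uint16_t)x;
    unsigned hr = 15;
    while (u != 0) {
        u >>= 1;
        hr--;
    }
    return hr;
}

/* Headroom of a vector: the minimum headroom over its elements. */
unsigned ref_s16_headroom(const int16_t b[], unsigned length)
{
    unsigned hr = 15;
    for (unsigned k = 0; k < length; k++) {
        const unsigned h = ref_hr_s16(b[k]);
        if (h < hr) hr = h;
    }
    return hr;
}
```

For instance, 300 needs 9 magnitude bits, so it can be left-shifted 6 bits before reaching the sign bit, and a vector containing it has at most 6 bits of headroom.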
-
void xs3_vect_s16_inverse(int16_t a[], const int16_t b[], const unsigned length, const unsigned scale)¶
Compute the inverse of elements of a 16-bit vector.
a[] and b[] represent the 16-bit mantissa vectors \(\bar a\) and \(\bar b\) respectively. This operation can be performed safely in-place on b[].
length is the number of elements in each of the vectors.
scale is a scaling parameter used to maximize the precision of the result.
- Operation Performed:
- \[\begin{split}\begin{align*} & a_k \leftarrow \lfloor\frac{2^{scale}}{b_k}\rfloor \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
- Block Floating-Point
-
If \(\bar b\) are the mantissas of BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = scale - b\_exp\).
The function xs3_vect_s16_inverse_prepare() can be used to obtain values for \(a\_exp\) and \(scale\).
- Parameters
a – [out] Output vector \(\bar a\)
b – [in] Input vector \(\bar b\)
length – [in] Number of elements in vectors \(\bar a\) and \(\bar b\)
scale – [in] Scale factor applied to dividend when computing inverse
- Returns
Headroom of output vector \(\bar a\)
-
int16_t xs3_vect_s16_max(const int16_t b[], const unsigned length)¶
Find the maximum value in a 16-bit vector.
b[] represents the 16-bit vector \(\bar b\). It must begin at a word-aligned address.
length is the number of elements in \(\bar b\).
- Operation Performed:
- \[\begin{align*} a \leftarrow max\{ b_0, b_1, ..., b_{length-1} \} \end{align*}\]
- Block Floating-Point
-
If \(\bar b\) are the mantissas of BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the returned value \(a\) is the 16-bit mantissa of floating-point value \(a \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp\).
- Parameters
b – [in] Input vector \(\bar b\)
length – [in] Number of elements in \(\bar b\)
- Throws
ET_LOAD_STORE – Raised if b is not word-aligned (See Note: Vector Alignment)
- Returns
Maximum value from \(\bar b\)
-
headroom_t xs3_vect_s16_max_elementwise(int16_t a[], const int16_t b[], const int16_t c[], const unsigned length, const right_shift_t b_shr, const right_shift_t c_shr)¶
Get the element-wise maximum of two 16-bit vectors.
a[], b[] and c[] represent the 16-bit mantissa vectors \(\bar a\), \(\bar b\) and \(\bar c\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[], but not on c[].
length is the number of elements in each of the vectors.
b_shr and c_shr are the signed arithmetic right-shifts applied to each element of \(\bar b\) and \(\bar c\) respectively.
- Operation Performed:
- \[\begin{split}\begin{align*} & b_k' \leftarrow sat_{16}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & c_k' \leftarrow sat_{16}(\lfloor c_k \cdot 2^{-c\_shr} \rfloor) \\ & a_k \leftarrow max(b_k', c_k') \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
- Block Floating-Point
-
If \(\bar b\) and \(\bar c\) are the mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + b\_shr = c\_exp + c\_shr\).
The function xs3_vect_2vec_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).
Warning
For correct operation, this function requires at least 1 bit of headroom in each mantissa vector after the shifts have been applied.
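The headroom requirement in the warning can be checked before calling the function. The sketch below is illustrative only: the typedefs are stand-ins for the library's types, the helper names are hypothetical, and it assumes that a right-shift of shr bits adds shr bits of headroom while a left-shift (negative shr) removes them.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the library's typedefs (assumed: signed shift, unsigned headroom). */
typedef int right_shift_t;
typedef unsigned headroom_t;

/* Estimated headroom after the shift (assumption: each bit of right-shift
 * adds one bit of headroom, each bit of left-shift removes one). */
static int headroom_after_shift(headroom_t hr, right_shift_t shr) {
    return (int)hr + shr;  /* cast first: negating or mixing unsigned values wraps */
}

/* True if both shifted inputs keep at least 1 bit of headroom. */
static bool minmax_shifts_ok(headroom_t b_hr, right_shift_t b_shr,
                             headroom_t c_hr, right_shift_t c_shr) {
    assert(b_hr <= 16 && c_hr <= 16);  /* sanity bound for 16-bit mantissas */
    return headroom_after_shift(b_hr, b_shr) >= 1
        && headroom_after_shift(c_hr, c_shr) >= 1;
}
```

For example, inputs with 0 bits of headroom need a right-shift of at least 1 bit before the comparison.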
- Parameters
a – [out] Output vector \(\bar a\)
b – [in] Input vector \(\bar b\)
c – [in] Input vector \(\bar c\)
length – [in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)
b_shr – [in] Right-shift applied to \(\bar b\)
c_shr – [in] Right-shift applied to \(\bar c\)
- Throws
ET_LOAD_STORE – Raised if a, b or c is not word-aligned (See Note: Vector Alignment)
- Returns
Headroom of vector \(\bar a\)
-
int16_t xs3_vect_s16_min(const int16_t b[], const unsigned length)¶
Find the minimum value in a 16-bit vector.
b[] represents the 16-bit vector \(\bar b\). It must begin at a word-aligned address.
length is the number of elements in \(\bar b\).
- Operation Performed:
- \[\begin{align*} a \leftarrow min\{ b_0, b_1, ..., b_{length-1} \} \end{align*}\]
- Block Floating-Point
-
If \(\bar b\) are the mantissas of BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the returned value \(a\) is the 16-bit mantissa of floating-point value \(a \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp\).
- Parameters
b – [in] Input vector \(\bar b\)
length – [in] Number of elements in \(\bar b\)
- Throws
ET_LOAD_STORE – Raised if b is not word-aligned (See Note: Vector Alignment)
- Returns
Minimum value from \(\bar b\)
-
headroom_t xs3_vect_s16_min_elementwise(int16_t a[], const int16_t b[], const int16_t c[], const unsigned length, const right_shift_t b_shr, const right_shift_t c_shr)¶
Get the element-wise minimum of two 16-bit vectors.
a[], b[] and c[] represent the 16-bit mantissa vectors \(\bar a\), \(\bar b\) and \(\bar c\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[], but not on c[].
length is the number of elements in each of the vectors.
b_shr and c_shr are the signed arithmetic right-shifts applied to each element of \(\bar b\) and \(\bar c\) respectively.
- Operation Performed:
- \[\begin{split}\begin{align*} & b_k' \leftarrow sat_{16}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & c_k' \leftarrow sat_{16}(\lfloor c_k \cdot 2^{-c\_shr} \rfloor) \\ & a_k \leftarrow min(b_k', c_k') \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
- Block Floating-Point
-
If \(\bar b\) and \(\bar c\) are the mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + b\_shr = c\_exp + c\_shr\).
The function xs3_vect_2vec_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).
Warning
For correct operation, this function requires at least 1 bit of headroom in each mantissa vector after the shifts have been applied.
- Parameters
a – [out] Output vector \(\bar a\)
b – [in] Input vector \(\bar b\)
c – [in] Input vector \(\bar c\)
length – [in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)
b_shr – [in] Right-shift applied to \(\bar b\)
c_shr – [in] Right-shift applied to \(\bar c\)
- Throws
ET_LOAD_STORE – Raised if a, b or c is not word-aligned (See Note: Vector Alignment)
- Returns
Headroom of vector \(\bar a\)
-
headroom_t xs3_vect_s16_macc(int16_t acc[], const int16_t b[], const int16_t c[], const unsigned length, const right_shift_t acc_shr, const right_shift_t bc_sat)¶
Multiply one 16-bit vector element-wise by another, and add the result to an accumulator.
acc[] represents the 16-bit accumulator mantissa vector \(\bar a\). Each \(a_k\) is acc[k].
b[] and c[] represent the 16-bit input mantissa vectors \(\bar b\) and \(\bar c\), where each \(b_k\) is b[k] and each \(c_k\) is c[k].
Each of the input vectors must begin at a word-aligned address.
length is the number of elements in each of the vectors.
acc_shr is the signed arithmetic right-shift applied to the accumulators \(a_k\) prior to accumulation.
bc_sat is the unsigned arithmetic right-shift applied to the product of \(b_k\) and \(c_k\) before accumulation.
- Operation Performed:
- \[\begin{split}\begin{align*} & v_k \leftarrow round( sat_{16}( b_k \cdot c_k \cdot 2^{-bc\_sat} ) ) \\ & \hat{a}_k \leftarrow sat_{16}( a_k \cdot 2^{-acc\_shr} ) \\ & a_k \leftarrow sat_{16}( \hat{a}_k + v_k ) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
- Block Floating-Point
-
If inputs \(\bar b\) and \(\bar c\) are the mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), and input \(\bar a\) is the accumulator BFP vector \(\bar{a} \cdot 2^{a\_exp}\), then the output values of \(\bar a\) have the exponent \(2^{a\_exp + acc\_shr}\).
For accumulation to make sense mathematically, \(bc\_sat\) must be chosen such that \( a\_exp + acc\_shr = b\_exp + c\_exp + bc\_sat \).
The function xs3_vect_complex_s16_macc_prepare() can be used to obtain values for \(a\_exp\), \(acc\_shr\) and \(bc\_sat\) based on the input exponents \(a\_exp\), \(b\_exp\) and \(c\_exp\) and the input headrooms \(a\_hr\), \(b\_hr\) and \(c\_hr\).
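The per-element operation above can be modelled with a scalar sketch. This is not the library's implementation: the helper names (sat16, round_shr, floor_shr, macc_ref) are hypothetical, saturation is assumed to be symmetric (clamping to ±INT16_MAX), rounding is assumed to be round-half-up, and the accumulator shift is assumed to floor.

```c
#include <assert.h>
#include <stdint.h>

/* Symmetric 16-bit saturation (assumption: clamps to +/-INT16_MAX). */
static int16_t sat16(int64_t x) {
    if (x > INT16_MAX) return INT16_MAX;
    if (x < -INT16_MAX) return -INT16_MAX;
    return (int16_t)x;
}

/* x * 2^-shr rounded to nearest, ties toward +infinity; requires shr >= 0. */
static int64_t round_shr(int64_t x, int shr) {
    assert(shr >= 0 && shr < 32);
    if (shr == 0) return x;
    int64_t div = (int64_t)1 << shr;
    int64_t t = x + div / 2;
    int64_t q = t / div;
    if (t % div < 0) q -= 1;  /* C division truncates; adjust to floor */
    return q;
}

/* floor(x * 2^-shr); a negative shr is a left-shift. */
static int64_t floor_shr(int64_t x, int shr) {
    assert(shr > -16 && shr < 16);
    if (shr < 0) return x * ((int64_t)1 << -shr);
    int64_t div = (int64_t)1 << shr;
    int64_t q = x / div;
    if (x % div < 0) q -= 1;
    return q;
}

/* One element of the multiply-accumulate: a <- sat16(sat16(a >> acc_shr) + v). */
static int16_t macc_ref(int16_t a, int16_t b, int16_t c, int acc_shr, int bc_sat) {
    int16_t v = sat16(round_shr((int64_t)b * c, bc_sat));
    int16_t a_hat = sat16(floor_shr(a, acc_shr));
    return sat16((int64_t)a_hat + v);
}
```

With a = 1000, b = c = 256, acc_shr = 0 and bc_sat = 4, the product 65536 is shifted down to 4096 and the accumulator becomes 5096. The subtracting variant differs only in the sign of v in the last step.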
- Parameters
acc – [inout] Accumulator \(\bar a\)
b – [in] Input vector \(\bar b\)
c – [in] Input vector \(\bar c\)
length – [in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)
acc_shr – [in] Signed arithmetic right-shift applied to accumulator elements.
bc_sat – [in] Unsigned arithmetic right-shift applied to the products of elements \(b_k\) and \(c_k\)
- Throws
ET_LOAD_STORE – Raised if acc, b or c is not word-aligned (See Note: Vector Alignment)
- Returns
Headroom of the output vector \(\bar a\)
-
headroom_t xs3_vect_s16_nmacc(int16_t acc[], const int16_t b[], const int16_t c[], const unsigned length, const right_shift_t acc_shr, const right_shift_t bc_sat)¶
Multiply one 16-bit vector element-wise by another, and subtract the result from an accumulator.
acc[] represents the 16-bit accumulator mantissa vector \(\bar a\). Each \(a_k\) is acc[k].
b[] and c[] represent the 16-bit input mantissa vectors \(\bar b\) and \(\bar c\), where each \(b_k\) is b[k] and each \(c_k\) is c[k].
Each of the input vectors must begin at a word-aligned address.
length is the number of elements in each of the vectors.
acc_shr is the signed arithmetic right-shift applied to the accumulators \(a_k\) prior to accumulation.
bc_sat is the unsigned arithmetic right-shift applied to the product of \(b_k\) and \(c_k\) before accumulation.
- Operation Performed:
- \[\begin{split}\begin{align*} & v_k \leftarrow round( sat_{16}( b_k \cdot c_k \cdot 2^{-bc\_sat} ) ) \\ & \hat{a}_k \leftarrow sat_{16}( a_k \cdot 2^{-acc\_shr} ) \\ & a_k \leftarrow sat_{16}( \hat{a}_k - v_k ) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
- Block Floating-Point
-
If inputs \(\bar b\) and \(\bar c\) are the mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), and input \(\bar a\) is the accumulator BFP vector \(\bar{a} \cdot 2^{a\_exp}\), then the output values of \(\bar a\) have the exponent \(2^{a\_exp + acc\_shr}\).
For accumulation to make sense mathematically, \(bc\_sat\) must be chosen such that \( a\_exp + acc\_shr = b\_exp + c\_exp + bc\_sat \).
The function xs3_vect_complex_s16_nmacc_prepare() can be used to obtain values for \(a\_exp\), \(acc\_shr\) and \(bc\_sat\) based on the input exponents \(a\_exp\), \(b\_exp\) and \(c\_exp\) and the input headrooms \(a\_hr\), \(b\_hr\) and \(c\_hr\).
- Parameters
acc – [inout] Accumulator \(\bar a\)
b – [in] Input vector \(\bar b\)
c – [in] Input vector \(\bar c\)
length – [in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)
acc_shr – [in] Signed arithmetic right-shift applied to accumulator elements.
bc_sat – [in] Unsigned arithmetic right-shift applied to the products of elements \(b_k\) and \(c_k\)
- Throws
ET_LOAD_STORE – Raised if acc, b or c is not word-aligned (See Note: Vector Alignment)
- Returns
Headroom of the output vector \(\bar a\)
-
headroom_t xs3_vect_s16_mul(int16_t a[], const int16_t b[], const int16_t c[], const unsigned length, const right_shift_t a_shr)¶
Multiply two 16-bit vectors together element-wise.
a[], b[] and c[] represent the 16-bit vectors \(\bar a\), \(\bar b\) and \(\bar c\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[] or c[].
length is the number of elements in each of the vectors.
a_shr is an unsigned arithmetic right-shift applied to the 32-bit accumulators holding the penultimate results.
- Operation Performed:
- \[\begin{split}\begin{align*} & a_k' \leftarrow b_k \cdot c_k \\ & a_k \leftarrow sat_{16}(round(a_k' \cdot 2^{-a\_shr})) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
- Block Floating-Point
-
If \(\bar b\) and \(\bar c\) are the mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + c\_exp + a\_shr\).
The function xs3_vect_s16_mul_prepare() can be used to obtain values for \(a\_exp\) and \(a\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).
- Parameters
a – [out] Output vector \(\bar a\)
b – [in] Input vector \(\bar b\)
c – [in] Input vector \(\bar c\)
length – [in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)
a_shr – [in] Right-shift applied to 32-bit products
- Throws
ET_LOAD_STORE – Raised if a, b or c is not word-aligned (See Note: Vector Alignment)
- Returns
Headroom of vector \(\bar a\)
-
headroom_t xs3_vect_s16_rect(int16_t a[], const int16_t b[], const unsigned length)¶
Rectify the elements of a 16-bit vector.
Rectification ensures that all outputs are non-negative, changing negative values to 0.
a[] and b[] represent the 16-bit vectors \(\bar a\) and \(\bar b\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[].
length is the number of elements in each of the vectors.
Each output element a[k] is set to the value of the corresponding input element b[k] if it is positive, and a[k] is set to zero otherwise.
- Operation Performed:
- \[\begin{split}\begin{align*} & a_k \leftarrow \begin{cases} b_k & b_k \gt 0 \\ 0 & b_k \leq 0\end{cases} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
- Block Floating-Point
-
If \(\bar b\) are the mantissas of BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the output vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp\).
- Parameters
a – [out] Output vector \(\bar a\)
b – [in] Input vector \(\bar b\)
length – [in] Number of elements in vectors \(\bar a\) and \(\bar b\)
- Throws
ET_LOAD_STORE – Raised if a or b is not word-aligned (See Note: Vector Alignment)
- Returns
Headroom of the output vector \(\bar a\).
-
headroom_t xs3_vect_s16_scale(int16_t a[], const int16_t b[], const unsigned length, const int16_t c, const right_shift_t a_shr)¶
Multiply a 16-bit vector by a 16-bit scalar.
a[] and b[] represent the 16-bit vectors \(\bar a\) and \(\bar b\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[].
length is the number of elements in each of the vectors.
c is the 16-bit scalar \(c\) by which elements of \(\bar b\) are multiplied.
a_shr is an unsigned arithmetic right-shift applied to the 32-bit accumulators holding the penultimate results.
- Operation Performed:
- \[\begin{split}\begin{align*} & a_k' \leftarrow b_k \cdot c \\ & a_k \leftarrow sat_{16}(round(a_k' \cdot 2^{-a\_shr})) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
- Block Floating-Point
-
If \(\bar b\) are the mantissas of a BFP vector \( \bar{b} \cdot 2^{b\_exp} \) and \(c\) is the mantissa of floating-point value \(c \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp + c\_exp + a\_shr\).
The function xs3_vect_s16_scale_prepare() can be used to obtain values for \(a\_exp\) and \(a\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).
- Parameters
a – [out] Output vector \(\bar a\)
b – [in] Input vector \(\bar b\)
c – [in] Scalar \(c\) by which elements of \(\bar b\) are multiplied
length – [in] Number of elements in vectors \(\bar a\) and \(\bar b\)
a_shr – [in] Right-shift applied to 32-bit products
- Throws
ET_LOAD_STORE – Raised if a or b is not word-aligned (See Note: Vector Alignment)
- Returns
Headroom of vector \(\bar a\)
-
void xs3_vect_s16_set(int16_t a[], const int16_t b, const unsigned length)¶
Set all elements of a 16-bit vector to the specified value.
a[] represents the 16-bit vector \(\bar a\). It must begin at a word-aligned address.
b is the value elements of \(\bar a\) are set to.
length is the number of elements in a[].
- Operation Performed:
- \[\begin{split}\begin{align*} & a_k \leftarrow b \\ & \qquad\text{for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
- Block Floating-Point
-
If \(b\) is the mantissa of floating-point value \(b \cdot 2^{b\_exp}\), then the output vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp\).
- Parameters
a – [out] Output vector \(\bar a\)
b – [in] Input value \(b\)
length – [in] Number of elements in vector \(\bar a\)
- Throws
ET_LOAD_STORE – Raised if a is not word-aligned (See Note: Vector Alignment)
-
headroom_t xs3_vect_s16_shl(int16_t a[], const int16_t b[], const unsigned length, const left_shift_t b_shl)¶
Left-shift the elements of a 16-bit vector by a specified number of bits.
a[] and b[] represent the 16-bit vectors \(\bar a\) and \(\bar b\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[].
length is the number of elements in vectors \(\bar a\) and \(\bar b\).
b_shl is the signed arithmetic left-shift applied to each element of \(\bar b\).
- Operation Performed:
- \[\begin{split}\begin{align*} & a_k \leftarrow sat_{16}(\lfloor b_k \cdot 2^{b\_shl} \rfloor) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
- Block Floating-Point
-
If \(\bar b\) are the mantissas of a BFP vector \( \bar{b} \cdot 2^{b\_exp} \), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(\bar{a} = \bar{b} \cdot 2^{b\_shl}\) and \(a\_exp = b\_exp\).
- Parameters
a – [out] Output vector \(\bar a\)
b – [in] Input vector \(\bar b\)
length – [in] Number of elements in vectors \(\bar a\) and \(\bar b\)
b_shl – [in] Arithmetic left-shift applied to elements of \(\bar b\)
- Throws
ET_LOAD_STORE – Raised if a or b is not word-aligned (See Note: Vector Alignment)
- Returns
Headroom of output vector \(\bar a\)
-
headroom_t xs3_vect_s16_shr(int16_t a[], const int16_t b[], const unsigned length, const right_shift_t b_shr)¶
Right-shift the elements of a 16-bit vector by a specified number of bits.
a[] and b[] represent the 16-bit vectors \(\bar a\) and \(\bar b\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[].
length is the number of elements in vectors \(\bar a\) and \(\bar b\).
b_shr is the signed arithmetic right-shift applied to each element of \(\bar b\).
- Operation Performed:
- \[\begin{split}\begin{align*} & a_k \leftarrow sat_{16}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
- Block Floating-Point
-
If \(\bar b\) are the mantissas of a BFP vector \( \bar{b} \cdot 2^{b\_exp} \), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(\bar{a} = \bar{b} \cdot 2^{-b\_shr}\) and \(a\_exp = b\_exp\).
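A portable scalar model of the shift and of the returned headroom may help when testing code that calls this function. It is a sketch under assumptions: the names are hypothetical, saturation is assumed symmetric (±INT16_MAX), and headroom is assumed to be the number of redundant leading sign bits, taken as the minimum over all elements.

```c
#include <assert.h>
#include <stdint.h>

/* Symmetric 16-bit saturation (assumption: clamps to +/-INT16_MAX). */
static int16_t sat16(int32_t x) {
    if (x > INT16_MAX) return INT16_MAX;
    if (x < -INT16_MAX) return -INT16_MAX;
    return (int16_t)x;
}

/* Headroom of one value: redundant leading sign bits (assumed definition). */
static unsigned hr_s16(int16_t x) {
    int32_t v = (x < 0) ? ~(int32_t)x : x;  /* fold negatives onto non-negatives */
    unsigned hr = 15;
    while (v != 0) { v >>= 1; hr--; }
    return hr;
}

/* Model of a_k = sat16(floor(b_k * 2^-b_shr)); returns the headroom of a[]. */
static unsigned vect_s16_shr_ref(int16_t a[], const int16_t b[],
                                 unsigned length, int b_shr) {
    assert(b_shr > -16 && b_shr < 16);
    unsigned hr = 15;
    for (unsigned k = 0; k < length; k++) {
        int32_t x = b[k];
        if (b_shr < 0) {
            x *= (1 << -b_shr);        /* negative right-shift is a left-shift */
        } else {
            int32_t div = 1 << b_shr;
            int32_t q = x / div;
            if (x % div < 0) q -= 1;   /* floor rather than truncate */
            x = q;
        }
        a[k] = sat16(x);
        unsigned h = hr_s16(a[k]);
        if (h < hr) hr = h;
    }
    return hr;
}
```

Because the exponent is unchanged, a right-shift by 2 bits divides the represented values by 4, and a left-shift that exceeds the available headroom saturates.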
- Parameters
a – [out] Output vector \(\bar a\)
b – [in] Input vector \(\bar b\)
length – [in] Number of elements in vectors \(\bar a\) and \(\bar b\)
b_shr – [in] Arithmetic right-shift applied to elements of \(\bar b\)
- Throws
ET_LOAD_STORE – Raised if a or b is not word-aligned (See Note: Vector Alignment)
- Returns
Headroom of output vector \(\bar a\)
-
headroom_t xs3_vect_s16_sqrt(int16_t a[], const int16_t b[], const unsigned length, const right_shift_t b_shr, const unsigned depth)¶
Compute the square roots of elements of a 16-bit vector.
a[] and b[] represent the 16-bit vectors \(\bar a\) and \(\bar b\) respectively. Each vector must begin at a word-aligned address. This operation can be performed safely in-place on b[].
length is the number of elements in each of the vectors.
b_shr is the signed arithmetic right-shift applied to elements of \(\bar b\).
depth is the number of most significant bits to calculate of each \(a_k\). For example, a depth value of 8 will only compute the 8 most significant bits of the result, with the remaining byte as 0. The maximum value for this parameter is XS3_VECT_SQRT_S16_MAX_DEPTH (31). The time cost of this operation is approximately proportional to the number of bits computed.
- Operation Performed:
- \[\begin{split}\begin{align*} & b_k' \leftarrow sat_{16}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & a_k \leftarrow \begin{cases} \sqrt{ b_k' } & b_k' \geq 0 \\ 0 & otherwise\end{cases} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \\ & \qquad\text{ where } \sqrt{\cdot} \text{ computes the most significant } depth \text{ bits of the square root.} \end{align*}\end{split}\]
- Block Floating-Point
-
If \(\bar b\) are the mantissas of BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\), where \(a\_exp = (b\_exp + b\_shr - 14)/2\).
Note that because exponents must be integers, that means \(b\_exp + b\_shr\) must be even.
The function xs3_vect_s16_sqrt_prepare() can be used to obtain values for \(a\_exp\) and \(b\_shr\) based on the input exponent \(b\_exp\) and headroom \(b\_hr\).
- Notes
This function assumes roots are real. Negative input elements will result in corresponding outputs of 0.
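The exponent relation \(a\_exp = (b\_exp + b\_shr - 14)/2\) implies \(a_k^2 \approx b_k' \cdot 2^{14}\). One way to picture the depth parameter is a bit-by-bit square root that decides one output bit per iteration, starting from bit 14. The sketch below is a hypothetical model under that assumption, not the library's algorithm; it caps depth at 15 because a non-negative 16-bit result has only 15 magnitude bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: computes the `depth` most significant bits of
 * a = sqrt(b' * 2^14), where b' is the already-shifted input mantissa. */
static int16_t sqrt_s16_ref(int16_t b_shifted, unsigned depth) {
    assert(depth >= 1 && depth <= 15);
    if (b_shifted < 0) return 0;                 /* roots are assumed real */
    int32_t target = (int32_t)b_shifted << 14;   /* a^2 should not exceed this */
    int32_t a = 0;
    for (unsigned i = 0; i < depth; i++) {
        int32_t candidate = a | ((int32_t)1 << (14 - i));
        if (candidate * candidate <= target) a = candidate;  /* keep the bit */
    }
    return (int16_t)a;
}
```

For \(b_k' = 4096\) the full result is 8192, but with depth = 1 only bit 14 is tested and the output is 0, which illustrates the precision cost of a small depth.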
- Parameters
a – [out] Output vector \(\bar a\)
b – [in] Input vector \(\bar b\)
length – [in] Number of elements in vectors \(\bar a\) and \(\bar b\)
b_shr – [in] Right-shift applied to \(\bar b\)
depth – [in] Number of bits of each output value to compute
- Throws
ET_LOAD_STORE – Raised if a or b is not word-aligned (See Note: Vector Alignment)
- Returns
Headroom of output vector \(\bar a\)
-
headroom_t xs3_vect_s16_sub(int16_t a[], const int16_t b[], const int16_t c[], const unsigned length, const right_shift_t b_shr, const right_shift_t c_shr)¶
Subtract one 16-bit BFP vector from another.
a[], b[] and c[] represent the 16-bit vectors \(\bar a\), \(\bar b\) and \(\bar c\) respectively. Each must begin at a word-aligned address. This operation can be performed safely in-place on b[] or c[].
length is the number of elements in each of the vectors.
b_shr and c_shr are the signed arithmetic right-shifts applied to each element of \(\bar b\) and \(\bar c\) respectively.
- Operation Performed:
- \[\begin{split}\begin{align*} & b_k' \leftarrow sat_{16}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & c_k' \leftarrow sat_{16}(\lfloor c_k \cdot 2^{-c\_shr} \rfloor) \\ & a_k \leftarrow sat_{16}\!\left( b_k' - c_k' \right) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}\]
- Block Floating-Point
-
If \(\bar b\) and \(\bar c\) are the mantissas of BFP vectors \( \bar{b} \cdot 2^{b\_exp} \) and \(\bar{c} \cdot 2^{c\_exp}\), then the resulting vector \(\bar a\) are the mantissas of BFP vector \(\bar{a} \cdot 2^{a\_exp}\).
In this case, \(b\_shr\) and \(c\_shr\) must be chosen so that \(a\_exp = b\_exp + b\_shr = c\_exp + c\_shr\). Adding or subtracting mantissas only makes sense if they are associated with the same exponent.
The function xs3_vect_s16_sub_prepare() can be used to obtain values for \(a\_exp\), \(b\_shr\) and \(c\_shr\) based on the input exponents \(b\_exp\) and \(c\_exp\) and the input headrooms \(b\_hr\) and \(c\_hr\).
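To make the exponent constraint concrete, the sketch below derives the shifts for a caller-chosen output exponent and applies them to one element. The typedefs are stand-ins for the library's types, the function names are hypothetical, and saturation is assumed to be symmetric; the prepare function would normally choose a_exp for you.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the library's typedefs (assumed to be plain ints). */
typedef int exponent_t;
typedef int right_shift_t;

/* Shifts that satisfy a_exp = b_exp + b_shr = c_exp + c_shr. */
static void sub_shifts_for_exp(exponent_t a_exp, exponent_t b_exp, exponent_t c_exp,
                               right_shift_t *b_shr, right_shift_t *c_shr) {
    assert(b_shr && c_shr);
    *b_shr = a_exp - b_exp;
    *c_shr = a_exp - c_exp;
}

/* Symmetric 16-bit saturation (assumption: clamps to +/-INT16_MAX). */
static int16_t sat16(int32_t x) {
    if (x > INT16_MAX) return INT16_MAX;
    if (x < -INT16_MAX) return -INT16_MAX;
    return (int16_t)x;
}

/* floor(x * 2^-shr); a negative shr is a left-shift. */
static int32_t floor_shr(int32_t x, right_shift_t shr) {
    assert(shr > -16 && shr < 16);
    if (shr < 0) return x * (1 << -shr);
    int32_t div = 1 << shr;
    int32_t q = x / div;
    if (x % div < 0) q -= 1;
    return q;
}

/* One element of the subtraction described above. */
static int16_t sub_ref(int16_t b, int16_t c, right_shift_t b_shr, right_shift_t c_shr) {
    int16_t bp = sat16(floor_shr(b, b_shr));
    int16_t cp = sat16(floor_shr(c, c_shr));
    return sat16((int32_t)bp - cp);
}
```

With \(b = 1000 \cdot 2^{-10}\) and \(c = 2000 \cdot 2^{-12}\), choosing \(a\_exp = -10\) gives b_shr = 0 and c_shr = 2, so the mantissa result is 1000 - 500 = 500, i.e. \(500 \cdot 2^{-10}\).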
- Parameters
a – [out] Output vector \(\bar a\)
b – [in] Input vector \(\bar b\)
c – [in] Input vector \(\bar c\)
length – [in] Number of elements in vectors \(\bar a\), \(\bar b\) and \(\bar c\)
b_shr – [in] Right-shift applied to \(\bar b\)
c_shr – [in] Right-shift applied to \(\bar c\)
- Throws
ET_LOAD_STORE – Raised if a, b or c is not word-aligned (See Note: Vector Alignment)
- Returns
Headroom of the output vector \(\bar a\).
-
int32_t xs3_vect_s16_sum(const int16_t b[], const unsigned length)¶
Get the sum of elements of a 16-bit vector.
b[] represents the 16-bit vector \(\bar b\). b[] must begin at a word-aligned address.
length is the number of elements in \(\bar b\).
- Operation Performed:
- \[\begin{align*} a \leftarrow \sum_{k=0}^{length-1} b_k \end{align*}\]
- Block Floating-Point
-
If \(\bar b\) are the mantissas of BFP vector \(\bar{b} \cdot 2^{b\_exp}\), then the returned value \(a\) is the 32-bit mantissa of floating-point value \(a \cdot 2^{a\_exp}\), where \(a\_exp = b\_exp\).
- Parameters
b – [in] Input vector \(\bar b\)
length – [in] Number of elements in \(\bar b\)
- Throws
ET_LOAD_STORE – Raised if b is not word-aligned (See Note: Vector Alignment)
- Returns
The 32-bit sum \(a\)
XS3 16-Bit Prepare Functions¶
-
void xs3_vect_complex_s16_macc_prepare(exponent_t *new_acc_exp, right_shift_t *acc_shr, right_shift_t *bc_sat, const exponent_t acc_exp, const exponent_t b_exp, const exponent_t c_exp, const headroom_t acc_hr, const headroom_t b_hr, const headroom_t c_hr)¶
Obtain the output exponent and shifts needed by xs3_vect_complex_s16_macc().
This function is used in conjunction with xs3_vect_complex_s16_macc() to perform an element-wise multiply-accumulate of complex 16-bit BFP vectors.
This function computes new_acc_exp, acc_shr and bc_sat, which are selected to maximize precision in the resulting accumulator vector without causing saturation of final or intermediate values. Normally the caller will pass these outputs to their corresponding inputs of xs3_vect_complex_s16_macc().
acc_exp is the exponent associated with the accumulator mantissa vector \(\bar a\) prior to the operation, whereas new_acc_exp is the exponent corresponding to the updated accumulator vector.
b_exp and c_exp are the exponents associated with the complex input mantissa vectors \(\bar b\) and \(\bar c\) respectively.
acc_hr, b_hr and c_hr are the headrooms of \(\bar a\), \(\bar b\) and \(\bar c\) respectively. If the headroom of any of these vectors is unknown, it can be obtained by calling xs3_vect_complex_s16_headroom(). Alternatively, the value 0 can always be safely used (but may result in reduced precision).
- Adjusting Output Exponents
-
If a specific output exponent desired_exp is needed for the result (e.g. for emulating fixed-point arithmetic), the acc_shr and bc_sat produced by this function can be adjusted according to the following:
    // Presumed to be set somewhere
    exponent_t acc_exp, b_exp, c_exp;
    headroom_t acc_hr, b_hr, c_hr;
    exponent_t desired_exp;
    ...
    // Call prepare
    right_shift_t acc_shr, bc_sat;
    xs3_vect_complex_s16_macc_prepare(&acc_exp, &acc_shr, &bc_sat,
                                      acc_exp, b_exp, c_exp,
                                      acc_hr, b_hr, c_hr);
    // Modify results
    right_shift_t mant_shr = desired_exp - acc_exp;
    acc_exp += mant_shr;
    acc_shr += mant_shr;
    bc_sat += mant_shr;
    // acc_shr and bc_sat may now be used in a call to xs3_vect_complex_s16_macc()
When applying the above adjustment, the following conditions should be maintained:
- bc_sat >= 0 (bc_sat is an unsigned right-shift)
- acc_shr > -acc_hr (Shifting any further left may cause saturation)
It is up to the user to ensure any such modification does not result in saturation or unacceptable loss of precision.
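The adjustment and its two conditions can be wrapped in a small checked helper. This is a sketch only: the typedefs stand in for the library's types, adjust_macc_shifts is a hypothetical name, and the helper leaves its arguments untouched when either condition would be violated.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the library's typedefs (assumed: signed exponent and shift,
 * unsigned headroom). */
typedef int exponent_t;
typedef int right_shift_t;
typedef unsigned headroom_t;

/* Move the outputs of the prepare step to desired_exp.
 * Returns false, changing nothing, if bc_sat would become negative or
 * acc_shr would shift further left than the accumulator headroom allows. */
static bool adjust_macc_shifts(exponent_t desired_exp, exponent_t *acc_exp,
                               right_shift_t *acc_shr, right_shift_t *bc_sat,
                               headroom_t acc_hr) {
    assert(acc_exp && acc_shr && bc_sat);
    right_shift_t mant_shr = desired_exp - *acc_exp;
    right_shift_t new_acc_shr = *acc_shr + mant_shr;
    right_shift_t new_bc_sat = *bc_sat + mant_shr;
    if (new_bc_sat < 0) return false;
    /* Cast before negating: -acc_hr on an unsigned type would wrap. */
    if (new_acc_shr <= -(int)acc_hr) return false;
    *acc_exp = desired_exp;
    *acc_shr = new_acc_shr;
    *bc_sat = new_bc_sat;
    return true;
}
```

A caller would invoke this right after the prepare function and fall back to the unadjusted values (or handle the error) when it returns false.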
- Parameters
new_acc_exp – [out] Exponent associated with output mantissa vector \(\bar a\) (after macc)
acc_shr – [out] Signed arithmetic right-shift used for \(\bar a\) in xs3_vect_complex_s16_macc()
bc_sat – [out] Unsigned arithmetic right-shift applied to the product of elements \(b_k\) and \(c_k\) in xs3_vect_complex_s16_macc()
acc_exp – [in] Exponent associated with input mantissa vector \(\bar a\) (before macc)
b_exp – [in] Exponent associated with input mantissa vector \(\bar b\)
c_exp – [in] Exponent associated with input mantissa vector \(\bar c\)
acc_hr – [in] Headroom of input mantissa vector \(\bar a\) (before macc)
b_hr – [in] Headroom of input mantissa vector \(\bar b\)
c_hr – [in] Headroom of input mantissa vector \(\bar c\)
-
void xs3_vect_complex_s16_mul_prepare(exponent_t *a_exp, right_shift_t *a_shr, const exponent_t b_exp, const exponent_t c_exp, const headroom_t b_hr, const headroom_t c_hr)¶
Obtain the output exponent and output shift used by xs3_vect_complex_s16_mul() and xs3_vect_complex_s16_conj_mul().
This function is used in conjunction with xs3_vect_complex_s16_mul() to perform a complex element-wise multiplication of two complex 16-bit BFP vectors.
This function computes a_exp and a_shr.
a_exp is the exponent associated with mantissa vector \(\bar a\), and must be chosen to be large enough to avoid overflow when elements of \(\bar a\) are computed. To maximize precision, this function chooses a_exp to be the smallest exponent known to avoid saturation (see exception below). The a_exp chosen by this function is derived from the exponents and headrooms associated with the input vectors.
a_shr is the shift parameter required by xs3_vect_complex_s16_mul() to achieve the chosen output exponent a_exp.
b_exp and c_exp are the exponents associated with the input mantissa vectors \(\bar b\) and \(\bar c\) respectively.
b_hr and c_hr are the headrooms of \(\bar b\) and \(\bar c\) respectively. If the headroom of \(\bar b\) or \(\bar c\) is unknown, it can be obtained by calling xs3_vect_complex_s16_headroom(). Alternatively, the value 0 can always be safely used (but may result in reduced precision).
- Adjusting Output Exponents
-
If a specific output exponent desired_exp is needed for the result (e.g. for emulating fixed-point arithmetic), the a_shr produced by this function can be adjusted according to the following:
    exponent_t desired_exp = ...; // Value known a priori
    right_shift_t new_a_shr = a_shr + (desired_exp - a_exp);
When applying the above adjustment, the following condition should be maintained:
- new_a_shr >= 0
Be aware that using smaller values than strictly necessary for a_shr can result in saturation, and using larger values may result in unnecessary underflows or loss of precision.
- Notes
-
Using the outputs of this function, an output mantissa which would otherwise be INT16_MIN will instead saturate to -INT16_MAX. This is due to the symmetric saturation logic employed by the VPU and is a hardware feature. This is a corner case which is usually unlikely and results in 1 LSb of error when it occurs.
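The corner case can be reproduced with a scalar model of a shifted, saturated product. For simplicity the sketch uses a real product rather than a complex one; the function names are hypothetical and round-half-up rounding is an assumption, while the symmetric clamp follows the note above.

```c
#include <assert.h>
#include <stdint.h>

/* Symmetric saturation as described in the note: the most negative
 * representable result is -INT16_MAX, not INT16_MIN. */
static int16_t sat16_symmetric(int64_t x) {
    if (x > INT16_MAX) return INT16_MAX;
    if (x < -INT16_MAX) return -INT16_MAX;
    return (int16_t)x;
}

/* sat16(round(b * c * 2^-a_shr)) for one element; requires a_shr >= 0. */
static int16_t mul_ref(int16_t b, int16_t c, int a_shr) {
    assert(a_shr >= 0 && a_shr < 32);
    int64_t p = (int64_t)b * c;
    if (a_shr > 0) {
        int64_t div = (int64_t)1 << a_shr;
        int64_t t = p + div / 2;      /* round to nearest, ties toward +infinity */
        int64_t q = t / div;
        if (t % div < 0) q -= 1;      /* C division truncates; adjust to floor */
        p = q;
    }
    return sat16_symmetric(p);
}
```

Here -32768 times 16384 shifted right by 14 bits would be exactly INT16_MIN, but the model returns -32767, which is the 1 LSb error the note describes.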
- Parameters
a_exp – [out] Exponent associated with output mantissa vector \(\bar a\)
a_shr – [out] Unsigned arithmetic right-shift for \(\bar a\) used by xs3_vect_complex_s16_mul()
b_exp – [in] Exponent associated with input mantissa vector \(\bar b\)
c_exp – [in] Exponent associated with input mantissa vector \(\bar c\)
b_hr – [in] Headroom of input mantissa vector \(\bar b\)
c_hr – [in] Headroom of input mantissa vector \(\bar c\)
-
void xs3_vect_complex_s16_real_mul_prepare(exponent_t *a_exp, right_shift_t *a_shr, const exponent_t b_exp, const exponent_t c_exp, const headroom_t b_hr, const headroom_t c_hr)¶
Obtain the output exponent and output shift used by xs3_vect_complex_s16_real_mul().
This function is used in conjunction with xs3_vect_complex_s16_real_mul() to perform a complex element-wise multiplication of a complex 16-bit BFP vector by a real 16-bit vector.
This function computes a_exp and a_shr.
a_exp is the exponent associated with mantissa vector \(\bar a\), and must be chosen to be large enough to avoid overflow when elements of \(\bar a\) are computed. To maximize precision, this function chooses a_exp to be the smallest exponent known to avoid saturation (see exception below). The a_exp chosen by this function is derived from the exponents and headrooms associated with the input vectors.
a_shr is the shift parameter required by xs3_vect_complex_s16_real_mul() to achieve the chosen output exponent a_exp.
b_exp and c_exp are the exponents associated with the input mantissa vectors \(\bar b\) and \(\bar c\) respectively.
b_hr and c_hr are the headrooms of \(\bar b\) and \(\bar c\) respectively. If the headroom of \(\bar b\) or \(\bar c\) is unknown, it can be obtained by calling xs3_vect_complex_s16_headroom(). Alternatively, the value 0 can always be safely used (but may result in reduced precision).
- Adjusting Output Exponents
-
If a specific output exponent desired_exp is needed for the result (e.g. for emulating fixed-point arithmetic), the a_shr produced by this function can be adjusted according to the following:
    exponent_t desired_exp = ...; // Value known a priori
    right_shift_t new_a_shr = a_shr + (desired_exp - a_exp);
When applying the above adjustment, the following condition should be maintained:
- new_a_shr >= 0
Be aware that using smaller values than strictly necessary for a_shr can result in saturation, and using larger values may result in unnecessary underflows or loss of precision.
- Notes
-
Using the outputs of this function, an output mantissa which would otherwise be INT16_MIN will instead saturate to -INT16_MAX. This is due to the symmetric saturation logic employed by the VPU and is a hardware feature. This is a corner case which is usually unlikely and results in 1 LSb of error when it occurs.
- Parameters
a_exp – [out] Exponent associated with output mantissa vector \(\bar a\)
a_shr – [out] Unsigned arithmetic right-shift for \(\bar a\) used by xs3_vect_complex_s16_real_mul()
b_exp – [in] Exponent associated with input mantissa vector \(\bar b\)
c_exp – [in] Exponent associated with input mantissa vector \(\bar c\)
b_hr – [in] Headroom of input mantissa vector \(\bar b\)
c_hr – [in] Headroom of input mantissa vector \(\bar c\)
-
void xs3_vect_complex_s16_squared_mag_prepare(exponent_t *a_exp, right_shift_t *a_shr, const exponent_t b_exp, const headroom_t b_hr)¶
Obtain the output exponent and input shift used by xs3_vect_complex_s16_squared_mag().
This function is used in conjunction with xs3_vect_complex_s16_squared_mag() to compute the squared magnitude of each element of a complex 16-bit BFP vector.
This function computes a_exp and a_shr.
a_exp is the exponent associated with mantissa vector \(\bar a\), and is chosen to maximize precision when elements of \(\bar a\) are computed. The a_exp chosen by this function is derived from the exponent and headroom associated with the input vector.
a_shr is the shift parameter required by xs3_vect_complex_s16_squared_mag() to achieve the chosen output exponent a_exp.
b_exp is the exponent associated with the input mantissa vector \(\bar b\).
b_hr is the headroom of \(\bar b\). If the headroom of \(\bar b\) is unknown it can be calculated using xs3_vect_complex_s16_headroom(). Alternatively, the value 0 can always be safely used (but may result in reduced precision).
- Adjusting Output Exponents
-
If a specific output exponent desired_exp is needed for the result (e.g. for emulating fixed-point arithmetic), the a_shr produced by this function can be adjusted according to the following:
    exponent_t a_exp;
    right_shift_t a_shr;
    xs3_vect_complex_s16_squared_mag_prepare(&a_exp, &a_shr, b_exp, b_hr);
    exponent_t desired_exp = ...; // Value known a priori
    a_shr = a_shr + (desired_exp - a_exp);
    a_exp = desired_exp;
When applying the above adjustment, the following condition should be maintained:
    a_shr >= 0
Using larger values than strictly necessary for a_shr may result in unnecessary underflows or loss of precision.
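The exponent adjustment can be wrapped in a small helper that refuses a desired exponent which would force a negative shift. This is a minimal sketch and not part of the library: the typedefs are stand-ins assumed to be plain signed integers, and adjust_output_exp() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Stand-in typedefs (assumption: both are plain signed integers).
typedef int exponent_t;
typedef int right_shift_t;

// Hypothetical helper: retarget the outputs of a *_prepare() call to
// desired_exp. Returns false and leaves both outputs untouched if the
// adjusted shift would become negative.
static bool adjust_output_exp(exponent_t *a_exp, right_shift_t *a_shr,
                              exponent_t desired_exp)
{
    assert(a_exp != NULL && a_shr != NULL);

    right_shift_t new_shr = *a_shr + (desired_exp - *a_exp);
    if (new_shr < 0) {
        return false;   // would violate a_shr >= 0
    }
    *a_shr = new_shr;
    *a_exp = desired_exp;
    return true;
}
```

A caller would run the prepare function first, then call the helper and fall back to the unadjusted outputs when it returns false.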
See also
- Parameters
a_exp – [out] Output exponent associated with output mantissa vector \(\bar a\)
a_shr – [out] Unsigned arithmetic right-shift for \(\bar a\) used by xs3_vect_complex_s16_squared_mag()
b_exp – [in] Exponent associated with input mantissa vector \(\bar b\)
b_hr – [in] Headroom of input mantissa vector \(\bar b\)
-
void xs3_vect_s16_clip_prepare(exponent_t *a_exp, right_shift_t *b_shr, int16_t *lower_bound, int16_t *upper_bound, const exponent_t b_exp, const exponent_t bound_exp, const headroom_t b_hr)¶
Obtain the output exponent, input shift and modified bounds used by xs3_vect_s16_clip().
This function is used in conjunction with xs3_vect_s16_clip() to bound the elements of a 16-bit BFP vector to a specified range.
This function computes a_exp, b_shr, lower_bound and upper_bound.
a_exp is the exponent associated with the 16-bit mantissa vector \(\bar a\) computed by xs3_vect_s16_clip().
b_shr is the shift parameter required by xs3_vect_s16_clip() to achieve the output exponent a_exp.
lower_bound and upper_bound are the 16-bit mantissas which indicate the lower and upper clipping bounds respectively. The values are modified by this function, and the resulting values should be passed along to xs3_vect_s16_clip().
b_exp is the exponent associated with the input mantissa vector \(\bar b\).
bound_exp is the exponent associated with the bound mantissas lower_bound and upper_bound.
b_hr is the headroom of \(\bar b\). If unknown, it can be obtained using xs3_vect_s16_headroom(). Alternatively, the value 0 can always be safely used (but may result in reduced precision).
See also
- Parameters
a_exp – [out] Exponent associated with output mantissa vector \(\bar a\)
b_shr – [out] Signed arithmetic right-shift for \(\bar b\) used by xs3_vect_s16_clip()
lower_bound – [inout] Lower bound of clipping range
upper_bound – [inout] Upper bound of clipping range
b_exp – [in] Exponent associated with input mantissa vector \(\bar b\)
bound_exp – [in] Exponent associated with clipping bounds lower_bound and upper_bound
b_hr – [in] Headroom of input mantissa vector \(\bar b\)
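Because lower_bound and upper_bound arrive with exponent bound_exp but must be compared against mantissas at the output exponent, they have to be re-expressed at that exponent. The sketch below shows one plausible way to do this with saturation. It is an illustration only and not the library's implementation: rescale_s16() and rescale_bounds() are hypothetical names, and exponent_t is a stand-in typedef.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int exponent_t;  // stand-in typedef (assumption)

// Re-express mantissa x, given with exponent from_exp, at exponent to_exp,
// saturating the result to the int16_t range.
static int16_t rescale_s16(int16_t x, exponent_t from_exp, exponent_t to_exp)
{
    int shl = from_exp - to_exp;   // > 0: mantissa grows, < 0: it shrinks
    int32_t v = x;
    if (shl > 0) {
        if (shl > 16) shl = 16;    // any nonzero x saturates beyond this anyway
        v *= ((int32_t)1 << shl);
    } else if (shl < 0) {
        int shr = (-shl > 15) ? 15 : -shl;
        v >>= shr;                 // assumes arithmetic shift of negative values
    }
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

// Hypothetical helper: bring both clipping bounds to exponent a_exp in place.
static void rescale_bounds(int16_t *lower_bound, int16_t *upper_bound,
                           exponent_t bound_exp, exponent_t a_exp)
{
    assert(lower_bound != NULL && upper_bound != NULL);
    *lower_bound = rescale_s16(*lower_bound, bound_exp, a_exp);
    *upper_bound = rescale_s16(*upper_bound, bound_exp, a_exp);
}
```

A bound that saturates here lies outside the representable range at the output exponent, so clipping against it has no effect on that side.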
-
void xs3_vect_s16_inverse_prepare(exponent_t *a_exp, unsigned *scale, const int16_t b[], const exponent_t b_exp, const unsigned length)¶
Obtain the output exponent and scaling parameter used by xs3_vect_s16_inverse().
This function is used in conjunction with xs3_vect_s16_inverse() to compute the inverse of elements of a 16-bit BFP vector.
This function computes a_exp and scale.
a_exp is the exponent associated with output mantissa vector \(\bar a\), and must be chosen to avoid overflow in the smallest element of the input vector, which when inverted becomes the largest output element. To maximize precision, this function chooses a_exp to be the smallest exponent known to avoid saturation. The a_exp chosen by this function is derived from the exponent and smallest element of the input vector.
scale is a scaling parameter used by xs3_vect_s16_inverse() to achieve the chosen output exponent.
b[] is the input mantissa vector \(\bar b\).
b_exp is the exponent associated with the input mantissa vector \(\bar b\).
length is the number of elements in \(\bar b\).
- Todo:
In lib_dsp, the inverse function has a floor, which prevents tiny values from completely dominating the output behavior. Perhaps I should include that?
See also
- Parameters
a_exp – [out] Exponent of output vector \(\bar a\)
scale – [out] Scale factor to be applied when computing inverse
b – [in] Input vector \(\bar b\)
b_exp – [in] Exponent of \(\bar b\)
length – [in] Number of elements in vector \(\bar b\)
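To see why the smallest input element determines the output exponent, write the inverse of \(b_k \cdot 2^{b\_exp}\) as \((2^{scale} / b_k) \cdot 2^{-scale - b\_exp}\). The sketch below picks the largest scale for which the quotient of the smallest-magnitude element still fits in 16 bits. It is an illustration under stated assumptions and not the library's algorithm: inverse_scale_sketch() is a hypothetical name, exponent_t is a stand-in typedef, and zero elements are simply skipped because the text above does not describe how they are treated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int exponent_t;  // stand-in typedef (assumption)

// Hypothetical sketch: returns the chosen scale and writes the matching
// output exponent to *a_exp.
static unsigned inverse_scale_sketch(exponent_t *a_exp, const int16_t b[],
                                     exponent_t b_exp, unsigned length)
{
    assert(a_exp != NULL && b != NULL);

    // Find the smallest nonzero magnitude; 0 means "none found yet".
    int32_t min_abs = 0;
    for (unsigned k = 0; k < length; k++) {
        int32_t mag = (b[k] < 0) ? -(int32_t)b[k] : (int32_t)b[k];
        if (mag != 0 && (min_abs == 0 || mag < min_abs)) {
            min_abs = mag;
        }
    }
    if (min_abs == 0) {
        min_abs = 1;  // all-zero input: nothing sensible to invert
    }

    // Grow scale while the largest quotient 2^scale / min_abs still fits.
    unsigned scale = 0;
    while (scale < 30 &&
           (((int64_t)1 << (scale + 1)) / min_abs) <= INT16_MAX) {
        scale++;
    }

    *a_exp = -(exponent_t)scale - b_exp;
    return scale;
}
```

For mantissas {4, -100, 25} with b_exp = -8 the sketch selects scale = 16 and a_exp = -8, so the element 4 (value 2^-6) inverts to mantissa 16384 (value 64).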
-
void xs3_vect_s16_macc_prepare(exponent_t *new_acc_exp, right_shift_t *acc_shr, right_shift_t *bc_sat, const exponent_t acc_exp, const exponent_t b_exp, const exponent_t c_exp, const headroom_t acc_hr, const headroom_t b_hr, const headroom_t c_hr)¶
Obtain the output exponent and shifts needed by xs3_vect_s16_macc().
This function is used in conjunction with xs3_vect_s16_macc() to perform an element-wise multiply-accumulate of 16-bit BFP vectors.
This function computes new_acc_exp, acc_shr and bc_sat, which are selected to maximize precision in the resulting accumulator vector without causing saturation of final or intermediate values. Normally the caller will pass these outputs to their corresponding inputs of xs3_vect_s16_macc().
acc_exp is the exponent associated with the accumulator mantissa vector \(\bar a\) prior to the operation, whereas new_acc_exp is the exponent corresponding to the updated accumulator vector.
b_exp and c_exp are the exponents associated with the input mantissa vectors \(\bar b\) and \(\bar c\) respectively.
acc_hr, b_hr and c_hr are the headrooms of \(\bar a\), \(\bar b\) and \(\bar c\) respectively. If the headroom of any of these vectors is unknown, it can be obtained by calling xs3_vect_s16_headroom(). Alternatively, the value 0 can always be safely used (but may result in reduced precision).
- Adjusting Output Exponents
-
If a specific output exponent desired_exp is needed for the result (e.g. for emulating fixed-point arithmetic), the acc_shr and bc_sat produced by this function can be adjusted according to the following:
    // Presumed to be set somewhere
    exponent_t acc_exp, b_exp, c_exp;
    headroom_t acc_hr, b_hr, c_hr;
    exponent_t desired_exp;
    ...
    // Call prepare
    right_shift_t acc_shr, bc_sat;
    xs3_vect_s16_macc_prepare(&acc_exp, &acc_shr, &bc_sat, acc_exp, b_exp, c_exp, acc_hr, b_hr, c_hr);
    // Modify results
    right_shift_t mant_shr = desired_exp - acc_exp;
    acc_exp += mant_shr;
    acc_shr += mant_shr;
    bc_sat += mant_shr;
    // acc_shr and bc_sat may now be used in a call to xs3_vect_s16_macc()
When applying the above adjustment, the following conditions should be maintained:
    bc_sat >= 0 (bc_sat is an unsigned right-shift)
    acc_shr > -acc_hr (Shifting any further left may cause saturation)
It is up to the user to ensure any such modification does not result in saturation or unacceptable loss of precision.
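The two conditions can be checked together with the requirement that the shifted accumulator and the shifted products end up at the same exponent. The sketch below assumes that the accumulator lands at acc_exp + acc_shr and the products at b_exp + c_exp + bc_sat, which is consistent with the adjustment adding the same offset to all three quantities but is not stated explicitly here. macc_shifts_ok() is a hypothetical name and the typedefs are stand-ins.

```c
#include <stdbool.h>

// Stand-in typedefs (assumptions about the underlying integer types).
typedef int exponent_t;
typedef int right_shift_t;
typedef unsigned headroom_t;

// Hypothetical check: returns true if the shifts satisfy both documented
// conditions and place accumulator and products at new_acc_exp.
static bool macc_shifts_ok(exponent_t acc_exp, exponent_t b_exp,
                           exponent_t c_exp, headroom_t acc_hr,
                           exponent_t new_acc_exp, right_shift_t acc_shr,
                           right_shift_t bc_sat)
{
    if (bc_sat < 0) {
        return false;                         // bc_sat is an unsigned shift
    }
    if (acc_shr <= -(right_shift_t)acc_hr) {
        return false;                         // too far left: may saturate
    }
    // Both addends must share the output exponent (assumed relationship).
    return (acc_exp + acc_shr == new_acc_exp) &&
           (b_exp + c_exp + bc_sat == new_acc_exp);
}
```

Running such a check after a manual adjustment catches a desired_exp that is too small for the accumulator's headroom before any mantissas are touched.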
See also
- Parameters
new_acc_exp – [out] Exponent associated with output mantissa vector \(\bar a\) (after macc)
acc_shr – [out] Signed arithmetic right-shift used for \(\bar a\) in xs3_vect_s16_macc()
bc_sat – [out] Unsigned arithmetic right-shift applied to the product of elements \(b_k\) and \(c_k\) in xs3_vect_s16_macc()
acc_exp – [in] Exponent associated with input mantissa vector \(\bar a\) (before macc)
b_exp – [in] Exponent associated with input mantissa vector \(\bar b\)
c_exp – [in] Exponent associated with input mantissa vector \(\bar c\)
acc_hr – [in] Headroom of input mantissa vector \(\bar a\) (before macc)
b_hr – [in] Headroom of input mantissa vector \(\bar b\)
c_hr – [in] Headroom of input mantissa vector \(\bar c\)
-
void xs3_vect_s16_mul_prepare(exponent_t *a_exp, right_shift_t *a_shr, const exponent_t b_exp, const exponent_t c_exp, const headroom_t b_hr, const headroom_t c_hr)¶
Obtain the output exponent and output shift used by xs3_vect_s16_mul().
This function is used in conjunction with xs3_vect_s16_mul() to perform an element-wise multiplication of two 16-bit BFP vectors.
This function computes a_exp and a_shr.
a_exp is the exponent associated with mantissa vector \(\bar a\), and must be chosen to be large enough to avoid overflow when elements of \(\bar a\) are computed. To maximize precision, this function chooses a_exp to be the smallest exponent known to avoid saturation (see exception below). The a_exp chosen by this function is derived from the exponents and headrooms associated with the input vectors.
a_shr is an arithmetic right-shift applied by xs3_vect_s16_mul() to the 32-bit products of input elements to achieve the chosen output exponent a_exp.
b_exp and c_exp are the exponents associated with the input mantissa vectors \(\bar b\) and \(\bar c\) respectively.
b_hr and c_hr are the headrooms of \(\bar b\) and \(\bar c\) respectively. If the headroom of \(\bar b\) or \(\bar c\) is unknown, it can be obtained by calling xs3_vect_s16_headroom(). Alternatively, the value 0 can always be safely used (but may result in reduced precision).
- Adjusting Output Exponents
-
If a specific output exponent desired_exp is needed for the result (e.g. for emulating fixed-point arithmetic), the a_shr produced by this function can be adjusted according to the following:
    exponent_t a_exp;
    right_shift_t a_shr;
    xs3_vect_s16_mul_prepare(&a_exp, &a_shr, b_exp, c_exp, b_hr, c_hr);
    exponent_t desired_exp = ...; // Value known a priori
    a_shr = a_shr + (desired_exp - a_exp);
    a_exp = desired_exp;
When applying the above adjustment, the following condition should be maintained:
    a_shr >= 0
Be aware that using a smaller value than strictly necessary for a_shr can result in saturation, and using larger values may result in unnecessary underflows or loss of precision.
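The link between a_shr and a_exp follows from the arithmetic: the 32-bit product of two mantissas carries exponent b_exp + c_exp, and each bit of right-shift raises that exponent by one. The sketch below works this through for a single element. It is an illustration only: the function names are hypothetical, the typedefs are stand-ins, and the shift truncates, whereas the rounding used by the real function is not described here.

```c
#include <assert.h>
#include <stdint.h>

// Stand-in typedefs (assumption: both are plain signed integers).
typedef int exponent_t;
typedef int right_shift_t;

// Exponent of the output mantissa after shifting the product by a_shr.
static exponent_t mul_output_exp(exponent_t b_exp, exponent_t c_exp,
                                 right_shift_t a_shr)
{
    return b_exp + c_exp + a_shr;
}

// Output mantissa for one element pair (truncating, no saturation step).
static int32_t mul_mantissa(int16_t b, int16_t c, right_shift_t a_shr)
{
    assert(a_shr >= 0 && a_shr < 32);
    int32_t product = (int32_t)b * (int32_t)c;  // cannot overflow 32 bits
    return product >> a_shr;  // assumes arithmetic shift of negative values
}
```

With b = 12000 and c = 9000, both at exponent -14, a shift of 12 gives mantissa 26367 at exponent -16. Retargeting to exponent -15 adds one to the shift and halves the mantissa to 13183, losing one bit of precision.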
- Notes
-
Using the outputs of this function, an output mantissa which would otherwise be INT16_MIN will instead saturate to -INT16_MAX. This is due to the symmetric saturation logic employed by the VPU and is a hardware feature. This is a corner case which is usually unlikely and results in 1 LSb of error when it occurs.
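Symmetric saturation can be modelled in plain C as a clamp to the range [-INT16_MAX, INT16_MAX]. The function below is a hypothetical software model for illustration, not the hardware's or the library's code.

```c
#include <stdint.h>

// Hypothetical model of symmetric 16-bit saturation: the most negative
// representable result is -INT16_MAX (-32767) rather than INT16_MIN.
static int16_t sat16_symmetric(int32_t x)
{
    if (x > INT16_MAX) {
        return INT16_MAX;
    }
    if (x < -INT16_MAX) {
        return -INT16_MAX;
    }
    return (int16_t)x;
}
```

Under this model INT16_MIN (-32768) maps to -32767, which is the 1 LSb error described above, while every other in-range value passes through unchanged.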
See also
- Parameters
a_exp – [out] Exponent of output elements of xs3_vect_s16_mul()
a_shr – [out] Right-shift supplied to xs3_vect_s16_mul()
b_exp – [in] Exponent associated with \(\bar b\)
c_exp – [in] Exponent associated with \(\bar c\)
b_hr – [in] Headroom of \(\bar b\)
c_hr – [in] Headroom of \(\bar c\)
-
void xs3_vect_s16_scale_prepare(exponent_t *a_exp, right_shift_t *a_shr, const exponent_t b_exp, const exponent_t c_exp, const headroom_t b_hr, const headroom_t c_hr)¶
Obtain the output exponent and output shift used by xs3_vect_s16_scale().
This function is used in conjunction with xs3_vect_s16_scale() to perform multiplication of a 16-bit BFP vector \(\bar{b} \cdot 2^{b\_exp}\) by a 16-bit scalar \(c \cdot 2^{c\_exp}\). The result is another 16-bit BFP vector \(\bar{a} \cdot 2^{a\_exp}\).
This function computes a_exp and a_shr.
a_exp is the exponent associated with mantissa vector \(\bar a\), and must be chosen to be large enough to avoid overflow when elements of \(\bar a\) are computed. To maximize precision, this function chooses a_exp to be the smallest exponent known to avoid saturation (see exception below). The a_exp chosen by this function is derived from the exponents and headrooms associated with the inputs.
a_shr is an arithmetic right-shift applied by xs3_vect_s16_scale() to the 32-bit products of input elements to achieve the chosen output exponent a_exp.
b_exp and c_exp are the exponents associated with \(\bar b\) and \(c\) respectively.
b_hr and c_hr are the headrooms of \(\bar b\) and \(c\) respectively. If the headroom of \(\bar b\) or \(c\) is unknown, it can be obtained by calling xs3_vect_s16_headroom(). Alternatively, the value 0 can always be safely used (but may result in reduced precision).
- Adjusting Output Exponents
-
If a specific output exponent desired_exp is needed for the result (e.g. for emulating fixed-point arithmetic), the a_shr produced by this function can be adjusted according to the following:
    exponent_t a_exp;
    right_shift_t a_shr;
    xs3_vect_s16_scale_prepare(&a_exp, &a_shr, b_exp, c_exp, b_hr, c_hr);
    exponent_t desired_exp = ...; // Value known a priori
    a_shr = a_shr + (desired_exp - a_exp);
    a_exp = desired_exp;
When applying the above adjustment, the following condition should be maintained:
    a_shr >= 0
Be aware that using a smaller value than strictly necessary for a_shr can result in saturation, and using larger values may result in unnecessary underflows or loss of precision.
- Notes
-
Using the outputs of this function, an output mantissa which would otherwise be INT16_MIN will instead saturate to -INT16_MAX. This is due to the symmetric saturation logic employed by the VPU and is a hardware feature. This is a corner case which is usually unlikely and results in 1 LSb of error when it occurs.
See also
- Parameters
a_exp – [out] Exponent of output elements of xs3_vect_s16_scale()
a_shr – [out] Right-shift supplied to xs3_vect_s16_scale()
b_exp – [in] Exponent associated with \(\bar b\)
c_exp – [in] Exponent associated with \(c\)
b_hr – [in] Headroom of \(\bar b\)
c_hr – [in] Headroom of \(c\)
-
void xs3_vect_s16_sqrt_prepare(exponent_t *a_exp, right_shift_t *b_shr, const exponent_t b_exp, const right_shift_t b_hr)¶
Obtain the output exponent and shift parameter used by xs3_vect_s16_sqrt().
This function is used in conjunction with xs3_vect_s16_sqrt() to compute the square root of elements of a 16-bit BFP vector.
This function computes a_exp and b_shr.
a_exp is the exponent associated with output mantissa vector \(\bar a\), and should be chosen to maximize the precision of the results. To that end, this function chooses a_exp to be the smallest exponent known to avoid saturation of the resulting mantissa vector \(\bar a\). It is derived from the exponent and headroom of the input BFP vector.
b_shr is the shift parameter required by xs3_vect_s16_sqrt() to achieve the chosen output exponent a_exp.
b_exp is the exponent associated with the input mantissa vector \(\bar b\).
b_hr is the headroom of \(\bar b\). If it is unknown, it can be obtained using xs3_vect_s16_headroom(). Alternatively, the value 0 can always be safely used (but may result in reduced precision).
- Adjusting Output Exponents
-
If a specific output exponent desired_exp is needed for the result (e.g. for emulating fixed-point arithmetic), the b_shr produced by this function can be adjusted according to the following:
    exponent_t a_exp;
    right_shift_t b_shr;
    xs3_vect_s16_sqrt_prepare(&a_exp, &b_shr, b_exp, b_hr);
    exponent_t desired_exp = ...; // Value known a priori
    b_shr = b_shr + (desired_exp - a_exp);
    a_exp = desired_exp;
When applying the above adjustment, the following condition should be maintained:
    b_hr + b_shr >= 0
Be aware that using smaller values than strictly necessary for b_shr can result in saturation, and using larger values may result in unnecessary underflows or loss of precision.
Also, if a larger exponent is used than necessary, a larger depth parameter (see xs3_vect_s16_sqrt()) will be required to achieve the same precision, as the results are computed bit by bit, starting with the most significant bit.
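The effect of depth can be seen in a generic bit-by-bit integer square root. The sketch below is the textbook method of testing one result bit at a time from the most significant bit downwards. It is not the library's implementation: sqrt_bitwise() is a hypothetical name, and its depth argument only mirrors the role described above.

```c
#include <stdint.h>

// Hypothetical sketch: approximates floor(sqrt(x)) by resolving only the
// `depth` most significant bits of a 16-bit result (depth is capped at 16).
static uint16_t sqrt_bitwise(uint32_t x, unsigned depth)
{
    uint16_t result = 0;
    for (unsigned i = 0; i < depth && i < 16; i++) {
        uint16_t bit = (uint16_t)(1u << (15 - i));
        uint32_t trial = (uint32_t)result | bit;
        // Keep the bit only if the squared trial does not exceed x.
        // trial <= 65535, so trial * trial fits in 32 bits.
        if (trial * trial <= x) {
            result = (uint16_t)trial;
        }
    }
    return result;
}
```

For x = 1000000 a depth of 16 yields the exact root 1000, while a depth of 8 stops after the upper eight bits and yields 768. A mantissa that occupies fewer leading bits, as happens when the exponent is larger than necessary, therefore needs more iterations before its significant bits are resolved.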
See also
- Parameters
a_exp – [out] Exponent of outputs of xs3_vect_s16_sqrt()
b_shr – [out] Right-shift to be applied to elements of \(\bar b\)
b_exp – [in] Exponent of \(\bar b\)
b_hr – [in] Headroom of \(\bar b\)
-
xs3_vect_complex_s16_add_prepare¶
Obtain the output exponent and shifts required for a call to xs3_vect_complex_s16_add().
The logic for computing the shifts and exponents of xs3_vect_complex_s16_add() is identical to that for xs3_vect_s32_add().
This macro is provided as a convenience to developers and to make the code more readable.
See also
-
xs3_vect_complex_s16_add_scalar_prepare¶
Obtain the output exponent and shifts required for a call to xs3_vect_complex_s16_add_scalar().
The logic for computing the shifts and exponents of xs3_vect_complex_s16_add_scalar() is identical to that for xs3_vect_s32_add().
This macro is provided as a convenience to developers and to make the code more readable.
See also
-
xs3_vect_complex_s16_conj_mul_prepare¶
Obtain the output exponent and shifts required for a call to xs3_vect_complex_s16_conj_mul().
The logic for computing the shifts and exponents of xs3_vect_complex_s16_conj_mul() is identical to that for xs3_vect_complex_s16_mul().
This macro is provided as a convenience to developers and to make the code more readable.
See also
-
xs3_vect_complex_s16_nmacc_prepare¶
Obtain the output exponent and shifts required for a call to xs3_vect_complex_s16_nmacc().
The logic for computing the shifts and exponents of xs3_vect_complex_s16_nmacc() is identical to that for xs3_vect_complex_s16_macc().
This macro is provided as a convenience to developers and to make the code more readable.
-
xs3_vect_complex_s16_conj_macc_prepare¶
Obtain the output exponent and shifts required for a call to xs3_vect_complex_s16_conj_macc().
The logic for computing the shifts and exponents of xs3_vect_complex_s16_conj_macc() is identical to that for xs3_vect_complex_s16_macc().
This macro is provided as a convenience to developers and to make the code more readable.
-
xs3_vect_complex_s16_conj_nmacc_prepare¶
Obtain the output exponent and shifts required for a call to xs3_vect_complex_s16_conj_nmacc().
The logic for computing the shifts and exponents of xs3_vect_complex_s16_conj_nmacc() is identical to that for xs3_vect_complex_s16_macc().
This macro is provided as a convenience to developers and to make the code more readable.
-
xs3_vect_complex_s16_mag_prepare¶
Obtain the output exponent and shifts required for a call to xs3_vect_complex_s16_mag().
The logic for computing the shifts and exponents of xs3_vect_complex_s16_mag() is identical to that for xs3_vect_complex_s32_mag().
This macro is provided as a convenience to developers and to make the code more readable.
See also
-
xs3_vect_complex_s16_real_scale_prepare¶
Obtain the output exponent and shifts required for a call to xs3_vect_complex_s16_real_scale().
The logic for computing the shifts and exponents of xs3_vect_complex_s16_real_scale() is identical to that for xs3_vect_s32_scale().
This macro is provided as a convenience to developers and to make the code more readable.
See also
-
xs3_vect_complex_s16_scale_prepare¶
Obtain the output exponent and shifts required for a call to xs3_vect_complex_s16_scale().
The logic for computing the shifts and exponents of xs3_vect_complex_s16_scale() is identical to that for xs3_vect_complex_s32_mul().
This macro is provided as a convenience to developers and to make the code more readable.
See also
-
xs3_vect_complex_s16_sub_prepare¶
Obtain the output exponent and shifts required for a call to xs3_vect_complex_s16_sub().
The logic for computing the shifts and exponents of xs3_vect_complex_s16_sub() is identical to that for xs3_vect_s32_add().
This macro is provided as a convenience to developers and to make the code more readable.
See also
-
xs3_vect_s16_add_prepare¶
Obtain the output exponent and shifts required for a call to xs3_vect_s16_add().
The logic for computing the shifts and exponents of xs3_vect_s16_add() is identical to that for xs3_vect_s32_add().
This macro is provided as a convenience to developers and to make the code more readable.
See also
-
xs3_vect_s16_add_scalar_prepare¶
Obtain the output exponent and shifts required for a call to xs3_vect_s16_add_scalar().
The logic for computing the shifts and exponents of xs3_vect_s16_add_scalar() is identical to that for xs3_vect_s32_add().
This macro is provided as a convenience to developers and to make the code more readable.
See also
-
xs3_vect_s16_nmacc_prepare¶
Obtain the output exponent and shifts required for a call to xs3_vect_s16_nmacc().
The logic for computing the shifts and exponents of xs3_vect_s16_nmacc() is identical to that for xs3_vect_s16_macc().
This macro is provided as a convenience to developers and to make the code more readable.
-
xs3_vect_s16_sub_prepare¶
Obtain the output exponent and shifts required for a call to xs3_vect_s16_sub().
The logic for computing the shifts and exponents of xs3_vect_s16_sub() is identical to that for xs3_vect_s32_add().
This macro is provided as a convenience to developers and to make the code more readable.
See also