# XS3 Mixed-Depth Vector Functions

void xs3_vect_s32_to_s16(int16_t a[], const int32_t b[], const unsigned length, const right_shift_t b_shr)

Convert a 32-bit vector to a 16-bit vector.

This function converts a 32-bit mantissa vector $$\bar b$$ into a 16-bit mantissa vector $$\bar a$$. Conceptually, the output BFP vector $$\bar{a}\cdot 2^{a\_exp}$$ represents the same values as the input BFP vector $$\bar{b}\cdot 2^{b\_exp}$$, only with a reduced bit-depth.

In most cases $$b\_shr$$ should be $$16 - b\_hr$$, where $$b\_hr$$ is the headroom of the 32-bit input mantissa vector $$\bar b$$.

The output exponent $$a\_exp$$ will be given by

$$a\_exp = b\_exp + b\_shr$$

Parameter Details

a[] represents the 16-bit output mantissa vector $$\bar a$$.

b[] represents the 32-bit input mantissa vector $$\bar b$$.

a[] and b[] must each begin at a word-aligned address.

length is the number of elements in each of the vectors.

b_shr is the signed arithmetic right-shift applied to elements of $$\bar b$$.

Operation Performed:

\begin{split}\begin{align*} & a_k \leftarrow sat_{16}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}

Block Floating-Point

If $$\bar b$$ are the 32-bit mantissas of a BFP vector $$\bar{b} \cdot 2^{b\_exp}$$, then the resulting vector $$\bar a$$ are the 16-bit mantissas of BFP vector $$\bar{a} \cdot 2^{a\_exp}$$, where $$a\_exp = b\_exp + b\_shr$$.
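The operation and the choice of $$b\_shr$$ described above can be sketched as a portable reference model. This is not the library's implementation. The `ref_` names are hypothetical, `right_shift_t` is assumed to be a plain `int`, headroom is taken to mean the number of redundant leading sign bits, and saturation to the full `int16_t` range is an assumption (the hardware's bounds may differ).

```c
#include <assert.h>
#include <stdint.h>

/* Headroom of one 32-bit value: redundant leading sign bits (0..31). */
static unsigned ref_hr_s32(int32_t x)
{
    /* A negative x has the same headroom as its bitwise complement. */
    uint32_t u = (x < 0) ? ~(uint32_t)x : (uint32_t)x;
    unsigned hr = 0;
    while (hr < 31 && !(u & (UINT32_C(1) << (30 - hr)))) hr++;
    return hr;
}

/* Headroom of a vector: the minimum headroom over its elements. */
static unsigned ref_vect_hr_s32(const int32_t b[], unsigned length)
{
    unsigned hr = 31;
    for (unsigned k = 0; k < length; k++) {
        unsigned h = ref_hr_s32(b[k]);
        if (h < hr) hr = h;
    }
    return hr;
}

/* floor(x * 2^-shr) in 64 bits; a negative shr acts as a left shift. */
static int64_t ref_shr_floor(int32_t x, int shr)
{
    if (shr > 31) shr = 31;     /* larger shifts give the same result */
    if (shr < -31) shr = -31;   /* any nonzero value saturates anyway */
    if (shr < 0) return (int64_t)x * ((int64_t)1 << -shr);
    int64_t d = (int64_t)1 << shr;
    int64_t q = x / d;          /* truncates toward zero */
    if (x % d < 0) q--;         /* adjust to floor for negative x */
    return q;
}

/* Saturate to the int16_t range (assumed bounds). */
static int16_t ref_sat16(int64_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* a_k <- sat16(floor(b_k * 2^-b_shr)) for k in 0 .. length-1 */
void ref_vect_s32_to_s16(int16_t a[], const int32_t b[],
                         unsigned length, int b_shr)
{
    assert(a && b);
    for (unsigned k = 0; k < length; k++)
        a[k] = ref_sat16(ref_shr_floor(b[k], b_shr));
}
```

With this model, a caller would compute `b_shr = 16 - ref_vect_hr_s32(b, length)` and then use `a_exp = b_exp + b_shr` as the exponent of the 16-bit result.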

Parameters
• a[out] Output vector $$\bar a$$

• b[in] Input vector $$\bar b$$

• length[in] Number of elements in vectors $$\bar a$$ and $$\bar b$$

• b_shr[in] Right-shift applied to $$\bar b$$

Throws

ET_LOAD_STORE – Raised if a or b is not word-aligned (See Note: Vector Alignment)

void xs3_vect_s16_to_s32(int32_t a[], const int16_t b[], const unsigned length)

Convert a 16-bit vector to a 32-bit vector.

a[] represents the 32-bit output vector $$\bar a$$.

b[] represents the 16-bit input vector $$\bar b$$.

Each vector must begin at a word-aligned address.

length is the number of elements in each of the vectors.

Operation Performed:

\begin{split}\begin{align*} & a_k \leftarrow b_k \cdot 2^{8} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}

Block Floating-Point

If $$\bar b$$ are the mantissas of BFP vector $$\bar{b} \cdot 2^{b\_exp}$$, then the resulting vector $$\bar a$$ are the 32-bit mantissas of BFP vector $$\bar{a} \cdot 2^{a\_exp}$$. If $$a\_exp = b\_exp - 8$$, then this operation has effectively not changed the values represented.

Notes

• The multiplication by $$2^8$$ is an artifact of the VPU’s behavior. It turns out to be significantly more efficient to include the factor of $$2^8$$. If this is unwanted, xs3_vect_s32_shr() can be used with a b_shr value of 8 to remove the scaling afterwards.

• The headroom of output vector $$\bar a$$ is not returned by this function. The headroom of the output is always 8 bits greater than the headroom of the input.
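As an illustration of the scaling described above, the following is a minimal reference model of the documented operation, not the VPU implementation. The `ref_` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* a_k <- b_k * 2^8 for k in 0 .. length-1 */
void ref_vect_s16_to_s32(int32_t a[], const int16_t b[], unsigned length)
{
    assert(a && b);
    for (unsigned k = 0; k < length; k++)
        a[k] = (int32_t)b[k] * 256;   /* cannot overflow 32 bits */
}

/* Output exponent that leaves the represented values unchanged. */
int ref_s16_to_s32_exp(int b_exp)
{
    return b_exp - 8;
}
```

Because every element is multiplied by $$2^8$$, pairing the output with the exponent `b_exp - 8` cancels the scaling.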

Parameters
• a[out] 32-bit output vector $$\bar a$$

• b[in] 16-bit input vector $$\bar b$$

• length[in] Number of elements in vectors $$\bar a$$ and $$\bar b$$

Throws

ET_LOAD_STORE – Raised if a or b is not word-aligned (See Note: Vector Alignment)

void xs3_vect_complex_s32_to_complex_s16(int16_t a_real[], int16_t a_imag[], const complex_s32_t b[], const unsigned length, const right_shift_t b_shr)

Convert a complex 32-bit vector into a complex 16-bit vector.

This function converts a complex 32-bit mantissa vector $$\bar b$$ into a complex 16-bit mantissa vector $$\bar a$$. Conceptually, the output BFP vector $$\bar{a}\cdot 2^{a\_exp}$$ represents the same value as the input BFP vector $$\bar{b}\cdot 2^{b\_exp}$$, only with a reduced bit-depth.

In most cases $$b\_shr$$ should be $$16 - b\_hr$$, where $$b\_hr$$ is the headroom of the 32-bit input mantissa vector $$\bar b$$. The output exponent $$a\_exp$$ will then be given by

$$a\_exp = b\_exp + b\_shr$$

Parameter Details

a_real[] and a_imag[] together represent the complex 16-bit output mantissa vector $$\bar a$$, with the real part of each $$a_k$$ going in a_real[] and the imaginary part going in a_imag[].

b[] represents the complex 32-bit mantissa vector $$\bar b$$.

a_real[], a_imag[] and b[] must each begin at a word-aligned address.

length is the number of elements in each of the vectors.

b_shr is the signed arithmetic right-shift applied to elements of $$\bar b$$.

Operation Performed:

\begin{split}\begin{align*} & b_k' \leftarrow sat_{16}(\lfloor b_k \cdot 2^{-b\_shr} \rfloor) \\ & Re\{a_k\} \leftarrow Re\{b_k'\} \\ & Im\{a_k\} \leftarrow Im\{b_k'\} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}

Block Floating-Point

If $$\bar b$$ are the complex 32-bit mantissas of a BFP vector $$\bar{b} \cdot 2^{b\_exp}$$, then the resulting vector $$\bar a$$ are the complex 16-bit mantissas of BFP vector $$\bar{a} \cdot 2^{a\_exp}$$, where $$a\_exp = b\_exp + b\_shr$$.
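The split real/imaginary output layout can be sketched with a portable reference model. It is a sketch under assumptions: `ref_complex_s32_t` is a hypothetical stand-in for `complex_s32_t` (assumed here to hold a real and an imaginary 32-bit field), `right_shift_t` is assumed to be `int`, and the saturation bounds are assumed to be the full `int16_t` range.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for complex_s32_t. */
typedef struct { int32_t re; int32_t im; } ref_complex_s32_t;

/* sat16(floor(x * 2^-shr)); a negative shr acts as a left shift. */
static int16_t ref_shr_sat16(int32_t x, int shr)
{
    int64_t v;
    if (shr > 31) shr = 31;
    if (shr < -31) shr = -31;
    if (shr < 0) {
        v = (int64_t)x * ((int64_t)1 << -shr);
    } else {
        int64_t d = (int64_t)1 << shr;
        v = x / d;               /* truncates toward zero */
        if (x % d < 0) v--;      /* adjust to floor */
    }
    if (v > INT16_MAX) v = INT16_MAX;
    if (v < INT16_MIN) v = INT16_MIN;
    return (int16_t)v;
}

/* Shift and saturate each part, then write it to its own output array. */
void ref_vect_complex_s32_to_complex_s16(int16_t a_real[], int16_t a_imag[],
                                         const ref_complex_s32_t b[],
                                         unsigned length, int b_shr)
{
    assert(a_real && a_imag && b);
    for (unsigned k = 0; k < length; k++) {
        a_real[k] = ref_shr_sat16(b[k].re, b_shr);
        a_imag[k] = ref_shr_sat16(b[k].im, b_shr);
    }
}
```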

Parameters
• a_real[out] Real part of complex output vector $$\bar a$$.

• a_imag[out] Imaginary part of complex output vector $$\bar a$$.

• b[in] Complex input vector $$\bar b$$.

• length[in] Number of elements in vectors $$\bar a$$ and $$\bar b$$

• b_shr[in] Right-shift applied to $$\bar b$$.

Throws

ET_LOAD_STORE – Raised if a_real, a_imag or b is not word-aligned (See Note: Vector Alignment)

void xs3_vect_complex_s16_to_complex_s32(complex_s32_t a[], const int16_t b_real[], const int16_t b_imag[], const unsigned length)

Convert a complex 16-bit vector into a complex 32-bit vector.

a[] represents the complex 32-bit output vector $$\bar a$$. It must begin at a double word (8-byte) aligned address.

b_real[] and b_imag[] together represent the complex 16-bit input mantissa vector $$\bar b$$. Each $$Re\{b_k\}$$ is b_real[k], and each $$Im\{b_k\}$$ is b_imag[k].

length is the number of elements in each of the vectors.

Operation Performed:

\begin{split}\begin{align*} & Re\{a_k\} \leftarrow Re\{b_k\} \\ & Im\{a_k\} \leftarrow Im\{b_k\} \\ & \qquad\text{ for }k\in 0\ ...\ (length-1) \end{align*}\end{split}

Block Floating-Point

If $$\bar b$$ are the complex 16-bit mantissas of a BFP vector $$\bar{b} \cdot 2^{b\_exp}$$, then the resulting vector $$\bar a$$ are the complex 32-bit mantissas of BFP vector $$\bar{a} \cdot 2^{a\_exp}$$, where $$a\_exp = b\_exp$$.

Notes

• The headroom of output vector $$\bar a$$ is not returned by this function. The headroom of the output is always 16 bits greater than the headroom of the input.
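A minimal reference model of this interleaving, for illustration only. `ref_complex_s32_t` is a hypothetical stand-in for `complex_s32_t`, assumed to hold a real and an imaginary 32-bit field.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for complex_s32_t. */
typedef struct { int32_t re; int32_t im; } ref_complex_s32_t;

/* Re{a_k} <- Re{b_k}, Im{a_k} <- Im{b_k}; values are sign-extended only. */
void ref_vect_complex_s16_to_complex_s32(ref_complex_s32_t a[],
                                         const int16_t b_real[],
                                         const int16_t b_imag[],
                                         unsigned length)
{
    assert(a && b_real && b_imag);
    for (unsigned k = 0; k < length; k++) {
        a[k].re = b_real[k];
        a[k].im = b_imag[k];
    }
}
```

Since no scaling is applied, the exponent carries over unchanged and each 32-bit mantissa gains exactly 16 bits of headroom.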

Parameters
• a[out] Complex output vector $$\bar a$$.

• b_real[in] Real part of complex input vector $$\bar b$$.

• b_imag[in] Imaginary part of complex input vector $$\bar b$$.

• length[in] Number of elements in vectors $$\bar a$$ and $$\bar b$$

Throws

ET_LOAD_STORE – Raised if a is not double-word-aligned (See Note: Vector Alignment)

void xs3_vect_s16_extract_high_byte(int8_t a[], const int16_t b[], const unsigned len)

Extract an 8-bit vector containing the most significant byte of a 16-bit vector.

This is a utility function used, for example, in optimizing mixed-width products. The most significant byte of each element is extracted (without rounding or saturation) and inserted into the output vector.
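The extraction can be sketched in portable C as follows. This is a reference model with a hypothetical name, not the library routine. It computes the high byte as $$\lfloor b_k / 2^8 \rfloor$$ to avoid relying on implementation-defined right shifts of negative values.

```c
#include <assert.h>
#include <stdint.h>

/* a_k <- most significant byte of b_k, interpreted as a signed 8-bit value. */
void ref_vect_s16_extract_high_byte(int8_t a[], const int16_t b[], unsigned len)
{
    assert(a && b);
    for (unsigned k = 0; k < len; k++) {
        int hi = b[k] / 256;          /* truncates toward zero */
        if (b[k] % 256 < 0) hi--;     /* adjust to floor: result in -128..127 */
        a[k] = (int8_t)hi;
    }
}
```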

Parameters
• a[out] 8-bit output vector $$\bar a$$

• b[in] 16-bit input vector $$\bar b$$

• len[in] The number of elements in $$\bar a$$ and $$\bar b$$

Throws

ET_LOAD_STORE – Raised if a or b is not word-aligned (See Note: Vector Alignment)

void xs3_vect_s16_extract_low_byte(int8_t a[], const int16_t b[], const unsigned len)

Extract an 8-bit vector containing the least significant byte of a 16-bit vector.

This is a utility function used, for example, in optimizing mixed-width products. The least significant byte of each element is extracted (without rounding or saturation) and inserted into the output vector.
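A matching reference sketch for the low byte, again with a hypothetical name. The low 8 bits are reinterpreted as a signed value, so a low byte of 0x80 or above comes out negative.

```c
#include <assert.h>
#include <stdint.h>

/* a_k <- least significant byte of b_k, reinterpreted as signed 8-bit. */
void ref_vect_s16_extract_low_byte(int8_t a[], const int16_t b[], unsigned len)
{
    assert(a && b);
    for (unsigned k = 0; k < len; k++) {
        unsigned lo = (uint16_t)b[k] & 0xFFu;            /* 0..255 */
        a[k] = (int8_t)(lo < 128 ? (int)lo : (int)lo - 256);
    }
}
```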

Parameters
• a[out] 8-bit output vector $$\bar a$$

• b[in] 16-bit input vector $$\bar b$$

• len[in] The number of elements in $$\bar a$$ and $$\bar b$$

Throws

ET_LOAD_STORE – Raised if a or b is not word-aligned (See Note: Vector Alignment)

void xs3_mat_mul_s8_x_s8_yield_s32(xs3_split_acc_s32_t accumulators[], const int8_t matrix[], const int8_t input_vect[], const unsigned M_rows, const unsigned N_cols)

Multiply-accumulate an 8-bit matrix by an 8-bit vector into 32-bit accumulators.

This function multiplies an 8-bit $$M \times N$$ matrix $$\bar W$$ by an 8-bit $$N$$-element column vector $$\bar v$$ and adds it to the 32-bit accumulator vector $$\bar a$$.

accumulators is the output vector $$\bar a$$ to which the product $$\bar W\times\bar v$$ is accumulated. Note that the accumulators are encoded in a format native to the XS3 VPU. To initialize the accumulator vector to zeros, just zero the memory.

matrix is the matrix $$\bar W$$.

input_vect is the vector $$\bar v$$.

matrix and input_vect must both begin at word-aligned addresses.

M_rows and N_cols are the dimensions $$M$$ and $$N$$ of matrix $$\bar W$$. $$M$$ must be a multiple of 16, and $$N$$ must be a multiple of 32.

The result of this multiplication is exact, so long as saturation does not occur.
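The arithmetic of the multiply-accumulate can be illustrated with a plain reference model. Several things here are assumptions rather than documented behavior: the accumulators are ordinary `int32_t` values (the VPU-native `xs3_split_acc_s32_t` encoding is not modeled), the matrix is stored row-major, and saturation clamps to the `int32_t` range. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* acc[m] += sum over n of W[m][n] * v[n], with W assumed row-major. */
void ref_mat_mul_s8_x_s8(int32_t acc[], const int8_t matrix[],
                         const int8_t input_vect[],
                         unsigned M_rows, unsigned N_cols)
{
    /* Dimension constraints stated in the documentation. */
    assert(M_rows % 16 == 0 && N_cols % 32 == 0);
    for (unsigned m = 0; m < M_rows; m++) {
        int64_t sum = acc[m];
        for (unsigned n = 0; n < N_cols; n++)
            sum += (int32_t)matrix[m * N_cols + n] * input_vect[n];
        /* Assumed saturation bounds. */
        if (sum > INT32_MAX) sum = INT32_MAX;
        if (sum < INT32_MIN) sum = INT32_MIN;
        acc[m] = (int32_t)sum;
    }
}
```

Calling the model twice on the same zero-initialized accumulators doubles the result, which is the accumulate behavior the description refers to.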

Parameters
• accumulators[inout] The accumulator vector $$\bar a$$

• matrix[in] The weight matrix $$\bar W$$

• input_vect[in] The input vector $$\bar v$$

• M_rows[in] The number of rows $$M$$ in matrix $$\bar W$$

• N_cols[in] The number of columns $$N$$ in matrix $$\bar W$$

Throws

ET_LOAD_STORE – Raised if matrix or input_vect is not word-aligned (See Note: Vector Alignment)

void xs3_mat_mul_s8_x_s16_yield_s32(int32_t output[], const int8_t matrix[], const int16_t input_vect[], const unsigned M_rows, const unsigned N_cols, int8_t scratch[])

Multiply an 8-bit matrix by a 16-bit vector for a 32-bit result vector.

This function multiplies an 8-bit $$M \times N$$ matrix $$\bar W$$ by a 16-bit $$N$$-element column vector $$\bar v$$ and returns the result as a 32-bit $$M$$-element vector $$\bar a$$.

output is the output vector $$\bar a$$.

matrix is the matrix $$\bar W$$.

input_vect is the vector $$\bar v$$.

matrix and input_vect must both begin at word-aligned addresses.

M_rows and N_cols are the dimensions $$M$$ and $$N$$ of matrix $$\bar W$$. $$M$$ must be a multiple of 16, and $$N$$ must be a multiple of 32.

scratch is a pointer to a word-aligned buffer that this function may use to store intermediate results. This buffer must be at least $$N$$ bytes long.

The result of this multiplication is exact, so long as saturation does not occur.
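The product can be illustrated with two reference models that should agree: a direct sum, and a sum built from the high and low bytes of each 16-bit input using $$v_n = 2^8 \cdot hi_n + lo_n$$ with $$lo_n \in [0, 255]$$. The second form only demonstrates an arithmetic identity. It is not a description of how the library computes the result or uses scratch. Row-major matrix layout, the saturation bounds, and all `ref_` names are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a 64-bit sum to the int32_t range (assumed saturation bounds). */
static int32_t ref_sat32(int64_t x)
{
    if (x > INT32_MAX) return INT32_MAX;
    if (x < INT32_MIN) return INT32_MIN;
    return (int32_t)x;
}

/* Direct form: output[m] = sum over n of W[m][n] * v[n]. */
void ref_mat_mul_s8_x_s16(int32_t output[], const int8_t matrix[],
                          const int16_t input_vect[],
                          unsigned M_rows, unsigned N_cols)
{
    assert(M_rows % 16 == 0 && N_cols % 32 == 0);
    for (unsigned m = 0; m < M_rows; m++) {
        int64_t sum = 0;
        for (unsigned n = 0; n < N_cols; n++)
            sum += (int64_t)matrix[m * N_cols + n] * input_vect[n];
        output[m] = ref_sat32(sum);
    }
}

/* Byte-plane form: 256 * (W x hi) + (W x lo). */
void ref_mat_mul_s8_x_s16_split(int32_t output[], const int8_t matrix[],
                                const int16_t input_vect[],
                                unsigned M_rows, unsigned N_cols)
{
    assert(M_rows % 16 == 0 && N_cols % 32 == 0);
    for (unsigned m = 0; m < M_rows; m++) {
        int64_t hi_sum = 0, lo_sum = 0;
        for (unsigned n = 0; n < N_cols; n++) {
            int v = input_vect[n];
            int hi = v / 256;
            if (v % 256 < 0) hi--;        /* floor(v / 256), -128..127 */
            int lo = v - hi * 256;        /* 0..255 */
            hi_sum += (int64_t)matrix[m * N_cols + n] * hi;
            lo_sum += (int64_t)matrix[m * N_cols + n] * lo;
        }
        output[m] = ref_sat32(256 * hi_sum + lo_sum);
    }
}
```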

Parameters
• output[inout] The output vector $$\bar a$$

• matrix[in] The weight matrix $$\bar W$$

• input_vect[in] The input vector $$\bar v$$

• M_rows[in] The number of rows $$M$$ in matrix $$\bar W$$

• N_cols[in] The number of columns $$N$$ in matrix $$\bar W$$

• scratch[in] Scratch buffer required by this function.

Throws

ET_LOAD_STORE – Raised if matrix or input_vect is not word-aligned (See Note: Vector Alignment)

unsigned xs3_vect_sXX_add_scalar(int32_t a[], const int32_t b[], const unsigned length_bytes, const int32_t c, const int32_t d, const right_shift_t b_shr, const unsigned mode_bits)

Add a scalar to a vector.

This works for 8-, 16- or 32-bit vectors, real or complex.

length_bytes is the total number of bytes to be output. So, for 16-bit vectors, length_bytes is twice the number of elements, whereas for complex 32-bit vectors, length_bytes is 8 times the number of elements.

c and d are the values that populate the internal buffer to be added to the input vector. Internally, an 8-word (32-byte) buffer is allocated on the stack. Even-indexed words are populated with c and odd-indexed words are populated with d. For real vectors, c and d should be the same value; d exists so that this same function can also work for complex 32-bit vectors. This also means that for 16-bit vectors, the value to be added needs to be duplicated in both the upper 2 bytes and the lower 2 bytes of the word.

mode_bits should be 0x0000 for 32-bit mode, 0x0100 for 16-bit mode or 0x0200 for 8-bit mode.
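The argument preparation described above can be sketched with a few helpers. All `ref_` and `REF_` names are hypothetical. The buffer-filling helper only mirrors the documented layout of the internal buffer; callers do not build that buffer themselves.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* mode_bits values as listed in the documentation. */
enum { REF_MODE_S32 = 0x0000, REF_MODE_S16 = 0x0100, REF_MODE_S8 = 0x0200 };

/* Duplicate a 16-bit scalar into both halves of a 32-bit word. */
static int32_t ref_pack_s16(int16_t x)
{
    uint16_t h = (uint16_t)x;
    uint32_t u = ((uint32_t)h << 16) | h;
    int32_t w;
    memcpy(&w, &u, sizeof w);   /* reinterpret the bits without UB */
    return w;
}

/* Documented buffer layout: even-indexed words get c, odd-indexed get d. */
static void ref_fill_scalar_buffer(int32_t buf[8], int32_t c, int32_t d)
{
    for (unsigned i = 0; i < 8; i++)
        buf[i] = (i % 2 == 0) ? c : d;
}

/* length_bytes: 2 per element for 16-bit, 8 per element for complex 32-bit. */
static unsigned ref_length_bytes(unsigned elements, unsigned bytes_per_element)
{
    return elements * bytes_per_element;
}

/* Hypothetical call adding the 16-bit scalar s to a 10-element vector:
 *   int32_t w = ref_pack_s16(s);
 *   xs3_vect_sXX_add_scalar((int32_t*)a, (const int32_t*)b,
 *                           ref_length_bytes(10, 2), w, w, b_shr, REF_MODE_S16);
 */
```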