32-Bit Vector Chunk (8-Element) API

group chunk32_api


int32_t chunk_s32_dot(const int32_t b[VPU_INT32_EPV], const q2_30 c[VPU_INT32_EPV])

Compute the inner product between two vector chunks.

This function computes the inner product of two vector chunks, \(\bar b\) and \(\bar c\).

Conceptually, elements of \(\bar b\) may have any number of fractional bits (integers, fixed-point values, mantissas of a BFP vector) so long as every element uses the same number. Elements of \(\bar c\) are Q2.30 fixed-point values. Given that, the returned value \(a\) will have the same number of fractional bits as \(\bar b\).

Only the lowest 32 bits of the sum \(a\) are returned.

Operation Performed

\[\begin{aligned} & a \leftarrow \sum_{k=0}^{\mathtt{VPU\_INT32\_EPV}-1} \left( round\left( \frac{b_k\cdot{}c_k}{2^{30}} \right) \right) \end{aligned}\]

  • b[in] Input chunk \(\bar b\)

  • c[in] Input chunk \(\bar c\)
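The operation above can be modelled in portable C for checking results off-target. This is a minimal sketch, not the library's implementation: the names `REF_EPV`, `ref_round_shr30` and `ref_chunk_s32_dot` are hypothetical, `REF_EPV` stands in for VPU_INT32_EPV, and rounding half-way cases toward positive infinity is an assumption about what \(round()\) means here.

```c
#include <stdint.h>

/* Hypothetical stand-in for VPU_INT32_EPV (8 elements per chunk). */
#define REF_EPV 8

/* Divide by 2^30, rounding half-way cases toward +infinity
   (assumed rounding mode; the hardware may differ). */
static int64_t ref_round_shr30(int64_t x)
{
    const int64_t d = (int64_t)1 << 30;
    int64_t p = x + (d >> 1);
    int64_t q = p / d;
    if (p % d < 0)
        q -= 1;              /* turn C's truncating division into a floor */
    return q;
}

/* Portable model of the documented inner product. */
int32_t ref_chunk_s32_dot(const int32_t b[REF_EPV], const int32_t c[REF_EPV])
{
    int64_t acc = 0;
    for (int k = 0; k < REF_EPV; k++)
        acc += ref_round_shr30((int64_t)b[k] * c[k]);
    /* Keep only the lowest 32 bits of the sum. The final conversion to
       int32_t is implementation-defined when the value does not fit,
       but wraps on common compilers. */
    return (int32_t)(uint32_t)acc;
}
```

With every \(c_k\) set to 0x40000000 (1.0 in Q2.30), the model returns the plain sum of the elements of \(\bar b\), which is a convenient sanity check.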



void chunk_s32_log(q8_24 a[VPU_INT32_EPV], const int32_t b[VPU_INT32_EPV], const exponent_t b_exp)

Compute the natural log of a vector chunk of 32-bit values.

This function computes the natural logarithm of each of the 8 elements in vector chunk \(\bar b\). The result is returned as an 8-element chunk \(\bar a\) of Q8.24 values.

b_exp is the exponent associated with elements of \(\bar b\).

Any input \(b_k \le 0\) will result in a corresponding output \(a_k = \mathtt{INT32\_MIN}\).

Operation Performed

\[\begin{split}\begin{aligned} & a_k \leftarrow \ \begin{cases} log(b_k\cdot{}2^{\mathtt{b\_exp}}) & b_k > 0 \\ \mathtt{INT32\_MIN} & \text{otherwise} \\ \end{cases} \\ & \qquad\text{for }k \in {0..\mathtt{VPU\_INT32\_EPV}-1} \end{aligned}\end{split}\]

  • a[out] Output vector chunk \(\bar a\)

  • b[in] Input vector chunk \(\bar b\)

  • b_exp[in] Exponent associated with \(\bar b\)


An exception is raised if b or a is not double-word-aligned (see Note: Vector Alignment).
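When consuming the Q8.24 outputs, the INT32_MIN sentinel has to be separated from ordinary values, since it would otherwise decode as -128.0. The following is a minimal sketch of that decoding step; the helper name `q8_24_log_to_double` is hypothetical and not part of the API.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helper: convert one Q8.24 output element to a double.
   Returns false (leaving *out untouched) for the INT32_MIN sentinel,
   which the documentation assigns to inputs b_k <= 0. */
bool q8_24_log_to_double(int32_t a, double *out)
{
    if (a == INT32_MIN)
        return false;
    *out = (double)a / 16777216.0;   /* divide by 2^24 */
    return true;
}
```

For example, an output element of 0x01000000 decodes to 1.0, meaning the corresponding input was approximately \(e\).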

void chunk_float_s32_log(q8_24 a[VPU_INT32_EPV], const float_s32_t b[VPU_INT32_EPV])

Compute the natural log of a vector chunk of float_s32_t.

This function computes the natural logarithm of each of the VPU_INT32_EPV elements in vector chunk \(\bar b\). The result is returned as an 8-element chunk \(\bar a\) of Q8.24 values.

Any input \(b_k \le 0\) will result in a corresponding output \(a_k = \mathtt{INT32\_MIN}\).

Operation Performed

\[\begin{split}\begin{aligned} & a_k \leftarrow \ \begin{cases} log(b_k) & b_k > 0 \\ \mathtt{INT32\_MIN} & \text{otherwise} \\ \end{cases} \\ & \qquad\text{for }k \in {0..\mathtt{VPU\_INT32\_EPV}-1} \end{aligned}\end{split}\]

  • a[out] Output vector chunk \(\bar a\)

  • b[in] Input vector chunk \(\bar b\)


An exception is raised if b or a is not double-word-aligned (see Note: Vector Alignment).
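One way to satisfy the alignment requirement is to request it explicitly when declaring chunk buffers. The sketch below uses the C11 `_Alignas` specifier and assumes a double word is 8 bytes; the names `ALIGN_EPV`, `is_dword_aligned`, `log_in` and `log_out` are hypothetical, and the library may offer its own alignment macro, which should be preferred where available.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical stand-in for VPU_INT32_EPV. */
#define ALIGN_EPV 8

/* True if p sits on an 8-byte (double-word) boundary. */
bool is_dword_aligned(const void *p)
{
    return ((uintptr_t)p % 8u) == 0;
}

/* C11 _Alignas requests 8-byte alignment for the chunk buffers. */
_Alignas(8) int32_t log_in[ALIGN_EPV];
_Alignas(8) int32_t log_out[ALIGN_EPV];
```

A check such as `is_dword_aligned(log_in)` can be placed in a debug assertion before calling the log functions, so that a misaligned buffer is caught before it reaches the vector unit.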

void chunk_q30_power_series(int32_t a[VPU_INT32_EPV], const q2_30 b[VPU_INT32_EPV], const int32_t c[], const unsigned term_count)

Compute a power series on a vector chunk of Q2.30 values.

This function computes a power series summation on a vector chunk (VPU_INT32_EPV-element vector) \(\bar b\), which contains Q2.30 values. \(\bar c\) is a vector of coefficients to be multiplied by powers of \(\bar b\), and may have any associated exponent. The output is vector chunk \(\bar a\), which has the same exponent as \(\bar c\).

c[] is an array with shape (term_count, VPU_INT32_EPV), where the second axis contains the same value replicated across all VPU_INT32_EPV elements. That is, c[k][i] = c[k][j] for i and j in 0..(VPU_INT32_EPV-1). This is for performance reasons. (For the purpose of this explanation, \(\bar c\) is considered to be single-dimensional, without redundancy.)

Operation Performed

\[\begin{split}\begin{aligned} & b_{k,0} = 2^{30} \\ & b_{k,i} = round\left(\frac{b_{k,i-1}\cdot{}b_k}{2^{30}}\right) \\ & \qquad\text{for }i \in {1..(N-1)} \\ & a_k \leftarrow \sum_{i=0}^{N-1} round\left( \frac{b_{k,i}\cdot c_i}{2^{30}} \right) \\ & \qquad\text{for }k \in {0..\mathtt{VPU\_INT32\_EPV}-1} \end{aligned}\end{split}\]

  • a[out] Output vector chunk \(\bar a\)

  • b[in] Input vector chunk \(\bar b\)

  • c[in] Coefficient vector \(\bar c\)

  • term_count[in] Number of power series terms, \(N\)
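The replicated coefficient layout and the recurrence above can be sketched in portable C as follows. This is a reference model under stated assumptions rather than the library's implementation: all names (`PS_EPV`, `ps_round_shr30`, `ps_replicate_coefs`, `ref_q30_power_series`) are hypothetical, half-way cases are assumed to round toward positive infinity, and inputs are assumed to satisfy \(|b_k\cdot{}2^{-30}| \le 1\) so that the running power stays in range.

```c
#include <stdint.h>

/* Hypothetical stand-in for VPU_INT32_EPV. */
#define PS_EPV 8

/* Divide by 2^30, rounding half-way cases toward +infinity (assumed). */
static int64_t ps_round_shr30(int64_t x)
{
    const int64_t d = (int64_t)1 << 30;
    int64_t p = x + (d >> 1);
    int64_t q = p / d;
    if (p % d < 0)
        q -= 1;              /* turn C's truncating division into a floor */
    return q;
}

/* Build the (term_count, PS_EPV) table by replicating each coefficient
   across all elements of its row. */
void ps_replicate_coefs(int32_t *table, const int32_t *coef, unsigned term_count)
{
    for (unsigned i = 0; i < term_count; i++)
        for (unsigned k = 0; k < PS_EPV; k++)
            table[i * PS_EPV + k] = coef[i];
}

/* Portable model of the documented power series. */
void ref_q30_power_series(int32_t a[PS_EPV], const int32_t b[PS_EPV],
                          const int32_t c[], unsigned term_count)
{
    for (unsigned k = 0; k < PS_EPV; k++) {
        int64_t bpow = (int64_t)1 << 30;     /* b_{k,0} = 1.0 in Q2.30 */
        int64_t acc = 0;
        for (unsigned i = 0; i < term_count; i++) {
            acc += ps_round_shr30(bpow * c[i * PS_EPV + k]);
            bpow = ps_round_shr30(bpow * b[k]);   /* next power of b_k */
        }
        a[k] = (int32_t)acc;   /* assumes the sum fits in 32 bits */
    }
}
```

For instance, with coefficients 1, 1 and 0.5 stored with 30 fractional bits (the series \(1 + x + x^2/2\)) and \(b_k = 0.5\), the model yields 1.625 in the same format.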

void chunk_q30_exp_small(q2_30 a[VPU_INT32_EPV], const q2_30 b[VPU_INT32_EPV])

Compute \(e^b\) on a vector chunk of Q2.30 values.

This function computes \(e^{b_k}\) for each element of a vector chunk (VPU_INT32_EPV-element vector) \(\bar b\) of Q2.30 values near \(0\). The result is computed using the power series approximation of \(e^x\) near zero. It is recommended that this function only be used for \( -0.5 \le b_k\cdot{}2^{-30} \le 0.5\).

The output vector chunk \(\bar a\) is also in a Q2.30 format.

Operation Performed

\[\begin{split}\begin{aligned} & a_k \leftarrow e^{b_k\cdot{}2^{-30}} \\ & \qquad\text{for }k \in {0..\mathtt{VPU\_INT32\_EPV}-1} \end{aligned}\end{split}\]

  • a[out] Output vector chunk \(\bar a\)

  • b[in] Input vector chunk \(\bar b\)
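Because the recommended input range is stated in real units while the function takes raw Q2.30 integers, a small conversion and range check can help when preparing inputs. The sketch below is illustrative only; `q30_from_double` and `q30_exp_small_in_range` are hypothetical helpers, not part of the API.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helper: convert a real value to Q2.30, rounding to nearest.
   The caller must keep x within [-2, 2), the representable Q2.30 range. */
int32_t q30_from_double(double x)
{
    double scaled = x * 1073741824.0;    /* multiply by 2^30 */
    return (int32_t)(scaled + (scaled >= 0 ? 0.5 : -0.5));
}

/* True if b lies in the recommended range -0.5 <= b * 2^-30 <= 0.5. */
bool q30_exp_small_in_range(int32_t b)
{
    const int32_t half = (int32_t)1 << 29;   /* 0.5 in Q2.30 */
    return b >= -half && b <= half;
}
```

Within the recommended range the result also stays representable: \(e^{0.5} \approx 1.649\), which is below the Q2.30 upper limit of 2.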