# XS3 Scalar Functions

float xs3_pack_float(const int32_t mantissa, const exponent_t exp)

Pack a floating point value into an IEEE 754 single-precision float.

The value returned is the nearest representable approximation to $$m \cdot 2^{p}$$ where $$m$$ is mantissa and $$p$$ is exp.

Example

// Pack -12345678 * 2^{-13} into a float
int32_t mant = -12345678;
exponent_t exp = -13;
float val = xs3_pack_float(mant, exp);

printf("%e <-- %ld * 2^(%d)\n", val, (long) mant, exp);


Note

This operation may result in a loss of precision.

Parameters
• mantissa[in] Mantissa of value to be packed

• exp[in] Exponent of value to be packed

Returns

float representation of input value
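
To make the relationship $$m \cdot 2^{p}$$ concrete, here is a minimal portable sketch that computes the same value in standard C. `pack_float_sketch` is a hypothetical helper, not part of the library, and `exponent_t` is assumed to be a plain `int`. The library's own rounding and range handling may differ.

```c
#include <stdint.h>

typedef int exponent_t; /* assumption: exponent_t is a plain int */

/* Hypothetical reference helper: returns mantissa * 2^exp as a float. */
static float pack_float_sketch(int32_t mantissa, exponent_t exp)
{
    /* The cast rounds when the mantissa needs more than 24 significant bits;
       this is where precision can be lost. */
    float val = (float) mantissa;

    /* Scaling by a power of two is exact while the result stays in the
       normal float range. */
    for (; exp > 0; exp--) val *= 2.0f;
    for (; exp < 0; exp++) val *= 0.5f;
    return val;
}
```

The loops keep the sketch free of library dependencies; `ldexpf()` from `<math.h>` performs the same scaling in a single call.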

void xs3_unpack_float(int32_t *mantissa, exponent_t *exp, const float input)

Unpack an IEEE 754 single-precision float into a 32-bit mantissa and exponent.

Example

// Unpack 1.52345246 * 10^(-5)
float val = 1.52345246e-5;
int32_t mant;
exponent_t exp;
xs3_unpack_float(&mant, &exp, val);

printf("%ld * 2^(%d) <-- %e\n", (long) mant, exp, val);


Parameters
• mantissa[out] Unpacked output mantissa

• exp[out] Unpacked output exponent

• input[in] Float value to be unpacked
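
The reverse direction can be sketched in portable C as well. The helper below is hypothetical and assumes a finite input and that `exponent_t` is a plain `int`. It normalises the mantissa so that its magnitude lies in $$[2^{30}, 2^{31})$$; the scaling that `xs3_unpack_float()` itself chooses is not stated here and may differ.

```c
#include <stdint.h>

typedef int exponent_t; /* assumption: exponent_t is a plain int */

/* Hypothetical sketch: split a finite float into mantissa * 2^exp. */
static void unpack_float_sketch(int32_t *mantissa, exponent_t *exp, float input)
{
    exponent_t e = 0;

    if (input == 0.0f) {
        *mantissa = 0;
        *exp = 0;
        return;
    }

    /* Scale up until the magnitude is at least 2^30. */
    while (input < 1073741824.0f && input > -1073741824.0f) {
        input *= 2.0f;
        e--;
    }
    /* Scale down until the magnitude is below 2^31. */
    while (input >= 2147483648.0f || input <= -2147483648.0f) {
        input *= 0.5f;
        e++;
    }

    /* A float this large is already an integer, so the cast is exact. */
    *mantissa = (int32_t) input;
    *exp = e;
}
```

Because a single-precision float carries only 24 significant bits, a 32-bit mantissa can hold the value without loss in this sketch.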

void xs3_unpack_float_s16(int16_t *mantissa, exponent_t *exp, const float input)

Unpack an IEEE 754 single-precision float into a 16-bit mantissa and exponent.

Example

// Unpack 1.52345246 * 10^(-5)
float val = 1.52345246e-5;
int16_t mant;
exponent_t exp;
xs3_unpack_float_s16(&mant, &exp, val);

printf("%d * 2^(%d) <-- %e\n", mant, exp, val);


Note

This operation may result in a loss of precision.

Parameters
• mantissa[out] Unpacked output mantissa

• exp[out] Unpacked output exponent

• input[in] Float value to be unpacked

int32_t xs3_scalar_s64_to_s32(exponent_t *a_exp, const int64_t b, const exponent_t b_exp)

Convert a 64-bit floating-point scalar to a 32-bit floating-point scalar.

Converts a 64-bit floating-point scalar, represented by the 64-bit mantissa b and exponent b_exp, into a 32-bit floating-point scalar, represented by the 32-bit returned mantissa and output exponent a_exp.

Parameters
• a_exp[out] Output exponent

• b[in] 64-bit input mantissa

• b_exp[in] Input exponent

Returns

32-bit output mantissa
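
One way to picture the conversion is as a right shift of the mantissa compensated by an increase of the exponent. The sketch below is a hypothetical illustration, not the library's implementation. It assumes `exponent_t` is a plain `int` and only shifts down; whether the library also shifts small mantissas up to remove headroom is not stated here.

```c
#include <stdint.h>

typedef int exponent_t; /* assumption: exponent_t is a plain int */

/* Hypothetical sketch: reduce a 64-bit mantissa to 32 bits. */
static int32_t s64_to_s32_sketch(exponent_t *a_exp, int64_t b, exponent_t b_exp)
{
    exponent_t shift = 0;

    /* Halve (truncating toward zero) until the value fits in an int32_t.
       Each halving discards one low bit and is paid back in the exponent. */
    while (b > INT32_MAX || b < INT32_MIN) {
        b /= 2;
        shift++;
    }

    *a_exp = b_exp + shift;
    return (int32_t) b;
}
```

Mantissas that already fit in 32 bits pass through unchanged, so precision is lost only when low-order bits have to be discarded.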

int16_t xs3_scalar_s32_to_s16(exponent_t *a_exp, const int32_t b, const exponent_t b_exp)

Convert a 32-bit floating-point scalar to a 16-bit floating-point scalar.

Converts a 32-bit floating-point scalar, represented by the 32-bit mantissa b and exponent b_exp, into a 16-bit floating-point scalar, represented by the 16-bit returned mantissa and output exponent a_exp.

Parameters
• a_exp[out] Output exponent

• b[in] 32-bit input mantissa

• b_exp[in] Input exponent

Returns

16-bit output mantissa

int32_t xs3_scalar_s16_to_s32(exponent_t *a_exp, const int16_t b, const exponent_t b_exp, const unsigned remove_hr)

Convert a 16-bit floating-point scalar to a 32-bit floating-point scalar.

Converts a 16-bit floating-point scalar, represented by the 16-bit mantissa b and exponent b_exp, into a 32-bit floating-point scalar, represented by the 32-bit returned mantissa and output exponent a_exp.

remove_hr, if nonzero, indicates that the output mantissa should have no headroom. Otherwise, the output mantissa will be the same as the input mantissa.

Parameters
• a_exp[out] Output exponent

• b[in] 16-bit input mantissa

• b_exp[in] Input exponent

• remove_hr[in] Whether to remove headroom in output

Returns

32-bit output mantissa
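
The effect of `remove_hr` can be illustrated with a short sketch. Headroom is taken here to mean the number of redundant leading sign bits, so removing it amounts to doubling the mantissa and decrementing the exponent until no further doubling fits. The helper name is hypothetical and `exponent_t` is assumed to be a plain `int`.

```c
#include <stdint.h>

typedef int exponent_t; /* assumption: exponent_t is a plain int */

/* Hypothetical sketch: widen a 16-bit mantissa, optionally removing headroom. */
static int32_t s16_to_s32_sketch(exponent_t *a_exp, int16_t b,
                                 exponent_t b_exp, unsigned remove_hr)
{
    const int32_t limit = INT32_C(1) << 30;
    int32_t a = b; /* sign-extend to 32 bits */
    exponent_t e = b_exp;

    if (remove_hr && a != 0) {
        /* While a is in [-2^30, 2^30), doubling it cannot overflow. */
        while (a >= -limit && a < limit) {
            a *= 2;
            e--;
        }
    }

    *a_exp = e;
    return a;
}
```

With `remove_hr` equal to zero the mantissa and exponent are returned unchanged, which matches the behaviour described above.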

int32_t xs3_s32_sqrt(exponent_t *a_exp, const int32_t b, const exponent_t b_exp, const unsigned depth)

Compute the square root of a 32-bit floating-point scalar.

b and b_exp together represent the input $$b \cdot 2^{b\_exp}$$. Likewise, a and a_exp together represent the result $$a \cdot 2^{a\_exp}$$.

depth is the number of most significant bits of the result to compute. Smaller values execute more quickly at the cost of reduced precision. The maximum valid value for depth is XS3_S32_SQRT_MAX_DEPTH.

Operation Performed:

\begin{align*} a \cdot 2^{a\_exp} \leftarrow \sqrt{\left( b \cdot 2^{b\_exp} \right)} \end{align*}

Parameters
• a_exp[out] Output exponent $$a\_exp$$

• b[in] Input mantissa $$b$$

• b_exp[in] Input exponent $$b\_exp$$

• depth[in] Number of most significant bits to calculate

Returns

Output mantissa $$a$$
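
The role of `depth` can be illustrated with a bit-by-bit square root. The sketch below is a hypothetical portable illustration, not the library's algorithm. It assumes a positive mantissa, that `exponent_t` is a plain `int`, and a sketch-local depth limit of 31. It first makes the exponent even, so that $$\sqrt{b \cdot 2^{p}} = \sqrt{b} \cdot 2^{p/2}$$, and then decides one result bit per iteration, starting from the most significant.

```c
#include <stdint.h>

typedef int exponent_t; /* assumption: exponent_t is a plain int */

#define SQRT_SKETCH_MAX_DEPTH 31u /* hypothetical limit for this sketch only */

/* Hypothetical sketch: square root of b * 2^b_exp for b > 0. */
static int32_t sqrt_sketch(exponent_t *a_exp, int32_t b, exponent_t b_exp,
                           unsigned depth)
{
    if (b <= 0) { /* non-positive inputs are outside this sketch */
        *a_exp = 0;
        return 0;
    }
    if (depth > SQRT_SKETCH_MAX_DEPTH) depth = SQRT_SKETCH_MAX_DEPTH;

    /* Normalise the mantissa and make the exponent even. */
    uint32_t m = (uint32_t) b;
    while (m < (UINT32_C(1) << 29)) { m <<= 1; b_exp--; }
    if (b_exp % 2 != 0) { m <<= 1; b_exp--; }

    /* Find the largest a with a*a <= m * 2^30, one bit at a time. */
    uint64_t target = (uint64_t) m << 30;
    uint32_t a = 0;
    for (unsigned i = 0; i < depth; i++) {
        uint32_t trial = a | (UINT32_C(1) << (30 - i));
        if ((uint64_t) trial * trial <= target) a = trial;
    }

    *a_exp = (b_exp - 30) / 2; /* b_exp - 30 is even at this point */
    return (int32_t) a;
}
```

Each loop iteration fixes one more bit of the result, which is why a smaller `depth` trades precision for speed in this sketch.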

int32_t xs3_s32_inverse(exponent_t *a_exp, const int32_t b)

Compute the multiplicative inverse (reciprocal) of a 32-bit integer.

b represents the integer $$b$$. a and a_exp together represent the result $$a \cdot 2^{a\_exp}$$.

Operation Performed:

\begin{align*} a \cdot 2^{a\_exp} \leftarrow \frac{1}{b} \end{align*}

Parameters
• a_exp[out] Output exponent $$a\_exp$$

• b[in] Input integer $$b$$

Returns

Output mantissa $$a$$
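
As an illustration of how a reciprocal can be expressed as a mantissa and exponent, the sketch below divides a power of two by $$b$$. It is a hypothetical helper, not the library's implementation. It assumes `exponent_t` is a plain `int`, and it returns zero for $$b = 0$$ purely as a placeholder, since the behaviour of the library for that input is not stated here.

```c
#include <stdint.h>

typedef int exponent_t; /* assumption: exponent_t is a plain int */

/* Hypothetical sketch: a * 2^a_exp approximates 1 / b. */
static int32_t inverse_sketch(exponent_t *a_exp, int32_t b)
{
    if (b == 0) { /* 1/0 is undefined; placeholder result for the sketch */
        *a_exp = 0;
        return 0;
    }

    /* n = floor(log2(|b|)), computed in 64 bits so that INT32_MIN is safe. */
    int64_t mag = b < 0 ? -(int64_t) b : (int64_t) b;
    unsigned n = 0;
    while ((mag >> (n + 1)) != 0) n++;

    /* 2^(30+n) / b has magnitude in (2^29, 2^30], so it fits in an int32_t. */
    int64_t scale = INT64_C(1) << (30 + n);
    *a_exp = -(exponent_t) (30 + n);
    return (int32_t) (scale / b); /* truncates toward zero */
}
```

Choosing the power of two from the size of $$b$$ keeps nearly all 32 bits of the mantissa significant, whatever the magnitude of the input.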

int16_t xs3_s16_inverse(exponent_t *a_exp, const int16_t b)

Compute the multiplicative inverse (reciprocal) of a 16-bit integer.

b represents the integer $$b$$. a and a_exp together represent the result $$a \cdot 2^{a\_exp}$$.

Operation Performed:

\begin{align*} a \cdot 2^{a\_exp} \leftarrow \frac{1}{b} \end{align*}

Parameters
• a_exp[out] Output exponent $$a\_exp$$

• b[in] Input integer $$b$$

Returns

Output mantissa $$a$$

int16_t xs3_s16_mul(exponent_t *a_exp, const int16_t b, const int16_t c, const exponent_t b_exp, const exponent_t c_exp)

Compute the product of two 16-bit floating-point scalars.

a and a_exp together represent the result $$a \cdot 2^{a\_exp}$$.

b and b_exp together represent the first input $$b \cdot 2^{b\_exp}$$.

c and c_exp together represent the second input $$c \cdot 2^{c\_exp}$$.

Operation Performed:

\begin{align*} a \cdot 2^{a\_exp} \leftarrow \left( b\cdot 2^{b\_exp} \right) \cdot \left( c\cdot 2^{c\_exp} \right) \end{align*}

Parameters
• a_exp[out] Output exponent $$a\_exp$$

• b[in] First input mantissa $$b$$

• c[in] Second input mantissa $$c$$

• b_exp[in] First input exponent $$b\_exp$$

• c_exp[in] Second input exponent $$c\_exp$$

Returns

Output mantissa $$a$$

int32_t xs3_s32_mul(exponent_t *a_exp, const int32_t b, const int32_t c, const exponent_t b_exp, const exponent_t c_exp)

Compute the product of two 32-bit floating-point scalars.

a and a_exp together represent the result $$a \cdot 2^{a\_exp}$$.

b and b_exp together represent the first input $$b \cdot 2^{b\_exp}$$.

c and c_exp together represent the second input $$c \cdot 2^{c\_exp}$$.

Operation Performed:

\begin{align*} a \cdot 2^{a\_exp} \leftarrow \left( b\cdot 2^{b\_exp} \right) \cdot \left( c\cdot 2^{c\_exp} \right) \end{align*}

Parameters
• a_exp[out] Output exponent $$a\_exp$$

• b[in] First input mantissa $$b$$

• c[in] Second input mantissa $$c$$

• b_exp[in] First input exponent $$b\_exp$$

• c_exp[in] Second input exponent $$c\_exp$$

Returns

Output mantissa $$a$$
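
The arithmetic behind the product can be sketched as follows: the mantissas multiply, the exponents add, and the 64-bit product is then scaled back into 32 bits. This is a hypothetical illustration that assumes `exponent_t` is a plain `int`; the library's rounding and its choice of output headroom may differ.

```c
#include <stdint.h>

typedef int exponent_t; /* assumption: exponent_t is a plain int */

/* Hypothetical sketch: (b * 2^b_exp) * (c * 2^c_exp) as a * 2^a_exp. */
static int32_t s32_mul_sketch(exponent_t *a_exp, int32_t b, int32_t c,
                              exponent_t b_exp, exponent_t c_exp)
{
    /* The full product of two 32-bit values always fits in 64 bits. */
    int64_t p = (int64_t) b * (int64_t) c;
    exponent_t e = b_exp + c_exp;

    /* Halve (truncating toward zero) until the mantissa fits in 32 bits,
       compensating in the exponent. */
    while (p > INT32_MAX || p < INT32_MIN) {
        p /= 2;
        e++;
    }

    *a_exp = e;
    return (int32_t) p;
}
```

For example, $$(3 \cdot 2^{1}) \cdot (5 \cdot 2^{-2})$$ yields the mantissa 15 with exponent -1, i.e. 7.5.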

static inline unsigned ceil_log2(unsigned N)

Get the size of a 32-bit unsigned number.

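This function returns $$a = \lceil \log_2\left(N\right) \rceil$$, with $$a = 0$$ for $$N = 0$$. This is the number of bits needed to represent the $$N$$ distinct values $$0$$ to $$N-1$$. When $$N$$ is an exact power of two, it is one less than the number of bits needed to store $$N$$ itself.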

N is the input $$N$$.

Operation Performed:

\begin{align*} a \leftarrow \begin{cases} 0 & N = 0 \\ \lceil \log_2\left( N \right) \rceil & \text{otherwise} \end{cases} \end{align*}

Parameters
• N[in] Number to get the size of

Returns

$$a$$, the ceiling of $$\log_2\left(N\right)$$, or 0 if $$N = 0$$
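
The operation above can be checked against a small portable reference. `ceil_log2_sketch` is a hypothetical name chosen to avoid clashing with the library function. It counts the bits of $$N - 1$$, which equals $$\lceil \log_2\left(N\right) \rceil$$ for every $$N \geq 1$$.

```c
/* Hypothetical portable reference for ceil(log2(N)), with 0 for N == 0. */
static unsigned ceil_log2_sketch(unsigned N)
{
    if (N == 0) return 0;

    /* The bit length of N - 1 is the smallest a such that 2^a >= N. */
    unsigned v = N - 1;
    unsigned bits = 0;
    while (v != 0) {
        bits++;
        v >>= 1;
    }
    return bits;
}
```

Inputs 4 and 5 show the behaviour around a power of two: the first gives 2 and the second gives 3.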

XS3_S32_SQRT_MAX_DEPTH

Maximum bit-depth to calculate with xs3_s32_sqrt().