XS3 Scalar Functions
-
float xs3_pack_float(const int32_t mantissa, const exponent_t exp)
Pack a floating point value into an IEEE 754 single-precision float.
The value returned is the nearest representable approximation to \( m \cdot 2^{p} \) where \(m\) is mantissa and \(p\) is exp.
- Example
// Pack -12345678 * 2^{-13} into a float
int32_t mant = -12345678;
exponent_t exp = -13;
float val = xs3_pack_float(mant, exp);
// Cast so that %ld matches the argument type on every platform
printf("%e <-- %ld * 2^(%d)\n", val, (long) mant, exp);
Note
This operation may result in a loss of precision.
- Parameters
mantissa – [in] Mantissa of value to be packed
exp – [in] Exponent of value to be packed
- Returns
float representation of input value
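The arithmetic described above can be modelled portably as follows. This is a minimal sketch, not the library's implementation: `ref_pack_float` is a hypothetical name, `exponent_t` is assumed to behave like `int`, and the default round-to-nearest mode is assumed for the final conversion.

```c
#include <stdint.h>

/* Hypothetical reference model of the packing arithmetic; this is not the
 * library's implementation. exponent_t is assumed to behave like int. */
static float ref_pack_float(int32_t mantissa, int exp)
{
    double v = (double) mantissa;        /* exact: a double holds any int32_t */
    for (; exp > 0; exp--) v *= 2.0;     /* scaling by powers of two is exact */
    for (; exp < 0; exp++) v *= 0.5;     /* (barring double overflow/underflow) */
    return (float) v;                    /* the only rounding step */
}
```

A mantissa wider than 24 significant bits cannot be held exactly in a single-precision float, which is the loss of precision the note refers to. In this model, `ref_pack_float(16777217, 0)` yields `16777216.0f`.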
-
void xs3_unpack_float(int32_t *mantissa, exponent_t *exp, const float input)
Unpack an IEEE 754 single-precision float into a 32-bit mantissa and exponent.
- Example
// Unpack 1.52345246 * 10^(-5)
float val = 1.52345246e-5f;
int32_t mant;
exponent_t exp;
xs3_unpack_float(&mant, &exp, val);
// Cast so that %ld matches the argument type on every platform
printf("%ld * 2^(%d) <-- %e\n", (long) mant, exp, val);
- Parameters
mantissa – [out] Unpacked output mantissa
exp – [out] Unpacked output exponent
input – [in] Float value to be unpacked
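To illustrate what a mantissa and exponent pair for a float looks like, the sketch below splits an IEEE 754 single-precision value into its 24-bit significand and a power-of-two exponent. `ref_unpack_float` is a hypothetical name, `exponent_t` is assumed to behave like `int`, and the library may normalise the mantissa differently (for example, by using more of the 32 available bits).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical reference model: mantissa * 2^exp equals the input exactly.
 * Infinities and NaNs are not handled. Assumes float is IEEE 754 binary32. */
static void ref_unpack_float(int32_t *mantissa, int *exp, float input)
{
    uint32_t bits;
    memcpy(&bits, &input, sizeof bits);          /* reinterpret the bit pattern */

    int32_t frac = (int32_t) (bits & 0x7FFFFF);  /* 23 stored fraction bits */
    int biased   = (int) ((bits >> 23) & 0xFF);  /* 8-bit biased exponent */

    if (biased != 0) {
        frac |= 0x800000;                        /* normal: implicit leading 1 */
    } else {
        biased = 1;                              /* zero or subnormal: no implicit 1 */
    }

    *mantissa = (bits >> 31) ? -frac : frac;     /* apply the sign bit */
    *exp = biased - 127 - 23;                    /* remove bias, account for fraction width */
}
```

For an input of `1.5f` this model produces a mantissa of 12582912 and an exponent of -23.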
-
void xs3_unpack_float_s16(int16_t *mantissa, exponent_t *exp, const float input)
Unpack an IEEE 754 single-precision float into a 16-bit mantissa and exponent.
- Example
// Unpack 1.52345246 * 10^(-5)
float val = 1.52345246e-5f;
int16_t mant;
exponent_t exp;
xs3_unpack_float_s16(&mant, &exp, val);
// An int16_t argument is promoted to int, so %d is the matching specifier
printf("%d * 2^(%d) <-- %e\n", mant, exp, val);
Note
This operation may result in a loss of precision.
- Parameters
mantissa – [out] Unpacked output mantissa
exp – [out] Unpacked output exponent
input – [in] Float value to be unpacked
-
int32_t xs3_scalar_s64_to_s32(exponent_t *a_exp, const int64_t b, const exponent_t b_exp)
Convert a 64-bit floating-point scalar to a 32-bit floating-point scalar.
Converts a 64-bit floating-point scalar, represented by the 64-bit mantissa b and exponent b_exp, into a 32-bit floating-point scalar, represented by the 32-bit returned mantissa and output exponent a_exp.
- Parameters
a_exp – [out] Output exponent
b – [in] 64-bit input mantissa
b_exp – [in] Input exponent
- Returns
32-bit output mantissa
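One way to picture this conversion is as a right shift of the mantissa paired with a matching increase of the exponent. The sketch below is an assumption-laden model rather than the library's code: `ref_s64_to_s32` is a hypothetical name, `exponent_t` is taken to be `int`, and the rounding behaviour (truncation toward zero here) may differ from the real function.

```c
#include <stdint.h>

/* Hypothetical reference model: halve the mantissa until it fits in 32 bits,
 * raising the exponent by one for each halving so the value is preserved
 * (up to truncation). */
static int32_t ref_s64_to_s32(int *a_exp, int64_t b, int b_exp)
{
    int shr = 0;
    while (b > INT32_MAX || b < INT32_MIN) {
        b /= 2;     /* truncates toward zero; the library may round differently */
        shr++;
    }
    *a_exp = b_exp + shr;
    return (int32_t) b;
}
```

In this model a mantissa that already fits in 32 bits is returned unchanged along with its original exponent.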
-
int16_t xs3_scalar_s32_to_s16(exponent_t *a_exp, const int32_t b, const exponent_t b_exp)
Convert a 32-bit floating-point scalar to a 16-bit floating-point scalar.
Converts a 32-bit floating-point scalar, represented by the 32-bit mantissa b and exponent b_exp, into a 16-bit floating-point scalar, represented by the 16-bit returned mantissa and output exponent a_exp.
- Parameters
a_exp – [out] Output exponent
b – [in] 32-bit input mantissa
b_exp – [in] Input exponent
- Returns
16-bit output mantissa
-
int32_t xs3_scalar_s16_to_s32(exponent_t *a_exp, const int16_t b, const exponent_t b_exp, const unsigned remove_hr)
Convert a 16-bit floating-point scalar to a 32-bit floating-point scalar.
Converts a 16-bit floating-point scalar, represented by the 16-bit mantissa b and exponent b_exp, into a 32-bit floating-point scalar, represented by the 32-bit returned mantissa and output exponent a_exp.
remove_hr, if nonzero, indicates that the output mantissa should have no headroom. Otherwise, the output mantissa will be the same as the input mantissa.
- Parameters
a_exp – [out] Output exponent
b – [in] 16-bit input mantissa
b_exp – [in] Input exponent
remove_hr – [in] Whether to remove headroom in output
- Returns
32-bit output mantissa
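The effect of `remove_hr` can be sketched as below, taking headroom to mean the number of redundant leading sign bits in the mantissa. `ref_s16_to_s32` is a hypothetical name, `exponent_t` is assumed to behave like `int`, and the handling of a zero mantissa (left unchanged here) is a guess rather than documented behaviour.

```c
#include <stdint.h>

/* Hypothetical reference model: widen the mantissa and, if requested, double
 * it until no redundant sign bits remain, lowering the exponent to match. */
static int32_t ref_s16_to_s32(int *a_exp, int16_t b, int b_exp, unsigned remove_hr)
{
    int32_t a = b;
    *a_exp = b_exp;
    if (remove_hr && a != 0) {
        /* While a is in [-2^30, 2^30), doubling it cannot overflow int32_t. */
        while (a < 0x40000000 && a >= -0x40000000) {
            a *= 2;
            (*a_exp)--;
        }
    }
    return a;
}
```

With `remove_hr` set, a mantissa of 1 with exponent 0 becomes 0x40000000 with exponent -30 in this model; the represented value is unchanged.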
-
int32_t xs3_s32_sqrt(exponent_t *a_exp, const int32_t b, const exponent_t b_exp, const unsigned depth)
Compute the square root of a 32-bit floating-point scalar.
b and b_exp together represent the input \(b \cdot 2^{b\_exp}\). Likewise, a and a_exp together represent the result \(a \cdot 2^{a\_exp}\), where a is the returned mantissa.
depth indicates the number of most significant bits which will be calculated. Smaller values here will execute more quickly at the cost of reduced precision. The maximum valid value for depth is XS3_S32_SQRT_MAX_DEPTH.
- Operation Performed:
\[\begin{align*} a \cdot 2^{a\_exp} \leftarrow \sqrt{\left( b \cdot 2^{b\_exp} \right)} \end{align*}\]
- Parameters
a_exp – [out] Output exponent \(a\_exp\)
b – [in] Input mantissa \(b\)
b_exp – [in] Input exponent \(b\_exp\)
depth – [in] Number of most significant bits to calculate
- Returns
Output mantissa \(a\)
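The trade-off controlled by depth can be illustrated with a bit-by-bit square root that resolves the result from the most significant bit downward. This is only a sketch of one plausible reading: `ref_s32_sqrt` and `REF_SQRT_MAX_DEPTH` are hypothetical names, the value 31 is an assumption, `exponent_t` is taken to be `int`, and the library's actual algorithm and output scaling may differ.

```c
#include <stdint.h>

#define REF_SQRT_MAX_DEPTH 31   /* hypothetical stand-in for XS3_S32_SQRT_MAX_DEPTH */

/* Hypothetical reference model of a depth-limited square root. */
static int32_t ref_s32_sqrt(int *a_exp, int32_t b, int b_exp, unsigned depth)
{
    if (b <= 0) { *a_exp = 0; return 0; }   /* sketch only handles positive inputs */
    if (depth > REF_SQRT_MAX_DEPTH) depth = REF_SQRT_MAX_DEPTH;

    /* Normalise b so that bit 30 is set, tracking the exponent. */
    while (b < 0x40000000) { b *= 2; b_exp--; }

    /* Widen to 64 bits, choosing the shift so the exponent ends up even. */
    int shl = ((b_exp - 30) % 2 != 0) ? 31 : 30;
    uint64_t x = (uint64_t) b << shl;
    *a_exp = (b_exp - shl) / 2;             /* halving an even exponent is exact */

    /* Resolve result bits from the top down; only `depth` bits are tried. */
    uint32_t a = 0;
    for (unsigned i = 0; i < depth; i++) {
        uint32_t trial = a | ((uint32_t) 1 << (30 - i));
        if ((uint64_t) trial * trial <= x) a = trial;
    }
    return (int32_t) a;
}
```

Each loop iteration settles one more bit of the result, so a smaller depth does less work and leaves the low-order bits of the mantissa at zero.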
-
int32_t xs3_s32_inverse(exponent_t *a_exp, const int32_t b)
Compute the inverse of a 32-bit integer.
b represents the integer \(b\). a and a_exp together represent the result \(a \cdot 2^{a\_exp}\), where a is the returned mantissa.
- Operation Performed:
\[\begin{align*} a \cdot 2^{a\_exp} \leftarrow \frac{1}{b} \end{align*}\]
- Parameters
a_exp – [out] Output exponent \(a\_exp\)
b – [in] Input integer \(b\)
- Returns
Output mantissa \(a\)
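Because \(1/b\) is a fraction for any \(|b| > 1\), the result has to be expressed with a negative exponent. The sketch below shows one way to choose that exponent so the mantissa keeps about 30 significant bits. `ref_s32_inverse` is a hypothetical name, `exponent_t` is assumed to behave like `int`, and the scaling, the rounding and the treatment of \(b = 0\) are assumptions rather than the library's documented behaviour.

```c
#include <stdint.h>

/* Hypothetical reference model: a * 2^a_exp approximates 1/b. */
static int32_t ref_s32_inverse(int *a_exp, int32_t b)
{
    if (b == 0) { *a_exp = 0; return 0; }    /* 1/0 is undefined; sketch returns 0 */

    int64_t mag = (b < 0) ? -(int64_t) b : (int64_t) b;
    int n = 0;                               /* bit length of |b| */
    while ((mag >> n) != 0) n++;

    int k = 29 + n;                          /* 2^k / |b| lies in (2^29, 2^30] */
    *a_exp = -k;
    return (int32_t) (((int64_t) 1 << k) / b);   /* truncating division */
}
```

For \(b = 3\) this model returns a mantissa of 715827882 with exponent -31, which is approximately 0.3333.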
-
int16_t xs3_s16_inverse(exponent_t *a_exp, const int16_t b)
Compute the inverse of a 16-bit integer.
b represents the integer \(b\). a and a_exp together represent the result \(a \cdot 2^{a\_exp}\), where a is the returned mantissa.
- Operation Performed:
\[\begin{align*} a \cdot 2^{a\_exp} \leftarrow \frac{1}{b} \end{align*}\]
- Parameters
a_exp – [out] Output exponent \(a\_exp\)
b – [in] Input integer \(b\)
- Returns
Output mantissa \(a\)
-
int16_t xs3_s16_mul(exponent_t *a_exp, const int16_t b, const int16_t c, const exponent_t b_exp, const exponent_t c_exp)
Compute the product of two 16-bit floating-point scalars.
a and a_exp together represent the result \(a \cdot 2^{a\_exp}\), where a is the returned mantissa.
b and b_exp together represent the first input \(b \cdot 2^{b\_exp}\).
c and c_exp together represent the second input \(c \cdot 2^{c\_exp}\).
- Operation Performed:
\[\begin{align*} a \cdot 2^{a\_exp} \leftarrow \left( b\cdot 2^{b\_exp} \right) \cdot \left( c\cdot 2^{c\_exp} \right) \end{align*}\]
- Parameters
a_exp – [out] Output exponent \(a\_exp\)
b – [in] First input mantissa \(b\)
c – [in] Second input mantissa \(c\)
b_exp – [in] First input exponent \(b\_exp\)
c_exp – [in] Second input exponent \(c\_exp\)
- Returns
Output mantissa \(a\)
-
int32_t xs3_s32_mul(exponent_t *a_exp, const int32_t b, const int32_t c, const exponent_t b_exp, const exponent_t c_exp)
Compute the product of two 32-bit floating-point scalars.
a and a_exp together represent the result \(a \cdot 2^{a\_exp}\), where a is the returned mantissa.
b and b_exp together represent the first input \(b \cdot 2^{b\_exp}\).
c and c_exp together represent the second input \(c \cdot 2^{c\_exp}\).
- Operation Performed:
\[\begin{align*} a \cdot 2^{a\_exp} \leftarrow \left( b\cdot 2^{b\_exp} \right) \cdot \left( c\cdot 2^{c\_exp} \right) \end{align*}\]
- Parameters
a_exp – [out] Output exponent \(a\_exp\)
b – [in] First input mantissa \(b\)
c – [in] Second input mantissa \(c\)
b_exp – [in] First input exponent \(b\_exp\)
c_exp – [in] Second input exponent \(c\_exp\)
- Returns
Output mantissa \(a\)
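The operation amounts to multiplying the mantissas, adding the exponents, and then rescaling so that the product fits the output width. A minimal sketch under stated assumptions: `ref_s32_mul` is a hypothetical name, `exponent_t` is taken to be `int`, and the truncating rescale below stands in for whatever rounding and scaling the library actually applies.

```c
#include <stdint.h>

/* Hypothetical reference model of a mantissa/exponent multiply. */
static int32_t ref_s32_mul(int *a_exp, int32_t b, int32_t c, int b_exp, int c_exp)
{
    int64_t p = (int64_t) b * c;             /* exact 64-bit product of the mantissas */
    int shr = 0;
    while (p > INT32_MAX || p < INT32_MIN) {
        p /= 2;                              /* truncating; rounding is an open choice */
        shr++;
    }
    *a_exp = b_exp + c_exp + shr;            /* exponents add; compensate for the rescale */
    return (int32_t) p;
}
```

For example, multiplying 0x40000000 with exponent -30 (the value 1.0) by 0x60000000 with exponent -30 (the value 1.5) gives 0x60000000 with exponent -30 in this model.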
-
static inline unsigned ceil_log2(unsigned N)
Get the size of a 32-bit unsigned number.
This function reports the size of the number as \(a = \lceil \log_2\left(N\right) \rceil\), which is the number of bits required to represent \(N\) distinct values (that is, any unsigned integer from 0 to \(N - 1\)). The result is defined as 0 when \(N = 0\).
N is the input \(N\).
- Operation Performed:
\[\begin{align*} a \leftarrow \begin{cases} 0 & N = 0 \\ \lceil \log_2\left( N \right) \rceil & \text{otherwise} \end{cases} \end{align*}\]
- Parameters
N – [in] Number to get the size of
- Returns
\(a = \lceil \log_2\left(N\right) \rceil\), or 0 when \(N = 0\)
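The formula above can be checked against a straightforward portable model. `ref_ceil_log2` is a hypothetical name used only for illustration; the library's inline function is likely implemented differently (for instance, with a bit-counting instruction).

```c
#include <stdint.h>

/* Hypothetical reference model: the smallest a such that 2^a >= N,
 * with the result defined as 0 for N == 0. */
static unsigned ref_ceil_log2(unsigned N)
{
    unsigned a = 0;
    while (a < 32 && ((uint64_t) 1 << a) < N) a++;
    return a;
}
```

Exact powers of two map to their exponent (4 gives 2), while anything just above moves up to the next integer (5 gives 3).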
-
XS3_S32_SQRT_MAX_DEPTH
Maximum bit-depth to calculate with xs3_s32_sqrt().