Getting Started
Overview
lib_xs3_math has a two-layer API model. The upper layer is a block floating-point (BFP) API in which many details of the operations being performed are hidden from the user. The lower layer, called the low-level API, stays much closer to the metal and requires some care to avoid conditions such as arithmetic saturation or underflow. The BFP API calls the low-level API, which is where the bulk of the work is actually performed.
BFP API
In the high-level API, BFP vectors are represented by C structures such as bfp_s16_t, bfp_s32_t, or bfp_complex_s32_t, each backed by a memory buffer. These objects contain a pointer to the data carrying the content (mantissas) of the vector, as well as information about the length, headroom and exponent of the BFP vector.
Below is the definition of bfp_s32_t from xs3_math_types.h.
C_TYPE
typedef struct {
    /** Pointer to the underlying element buffer.*/
    int32_t* data;
    /** Exponent associated with the vector. */
    exponent_t exp;
    /** Current headroom in the ``data[]`` */
    headroom_t hr;
    /** Current size of ``data[]``, expressed in elements */
    unsigned length;
    /** BFP vector flags. Users should not normally modify these manually. */
    bfp_flags_e flags;
} bfp_s32_t;
The functions in 32-Bit Block Floating-Point Functions take bfp_s32_t references as input and output parameters.
Functions in the BFP API are generally prefixed with bfp_.
Initializing BFP Vectors
Before calling these functions, the BFP vectors represented by the arguments must be initialized. For bfp_s32_t this is accomplished with bfp_s32_init().
#define LEN (20)
// The object representing the BFP vector
bfp_s32_t bfp_vect;
// Buffer backing bfp_vect
int32_t data_buffer[LEN];
for(int i = 0; i < LEN; i++) data_buffer[i] = i;
// The initial exponent associated with bfp_vect
exponent_t initial_exponent = 0;
// If non-zero, ``bfp_s32_init()`` will compute the headroom currently present in data_buffer.
// Otherwise, headroom is initialized to 0 (which is always safe but may not be optimal)
unsigned calculate_headroom = 1;
// Initialize the vector object
bfp_s32_init(&bfp_vect, data_buffer, initial_exponent, LEN, calculate_headroom);
// Go do stuff with bfp_vect
...
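The calculate_headroom flag is easier to reason about with a concrete model of headroom. The following is a minimal portable sketch, assuming headroom means the number of bits by which a mantissa can be left-shifted without losing information (its redundant leading sign bits), and that a vector's headroom is the minimum over its elements. The helpers toy_hr_s32() and toy_vect_hr_s32() are hypothetical names for illustration and are not part of lib_xs3_math; treating zero as having 31 bits of headroom is also an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of lib_xs3_math): headroom of one 32-bit
 * mantissa, modeled as its number of redundant leading sign bits.
 * Assumption: zero is treated as having 31 bits of headroom. */
static unsigned toy_hr_s32(int32_t x)
{
    /* Fold negative values onto non-negative ones; ~x has the same number
     * of redundant sign bits as x. */
    uint32_t u = (x < 0) ? ~(uint32_t)x : (uint32_t)x;
    unsigned hr = 0;
    /* Count zero bits from bit 30 downward, stopping at the first set bit. */
    for (int bit = 30; bit >= 0 && ((u >> bit) & 1u) == 0; bit--) {
        hr++;
    }
    return hr;
}

/* Hypothetical helper: headroom of a vector is the smallest headroom of
 * any of its elements. */
static unsigned toy_vect_hr_s32(const int32_t x[], unsigned length)
{
    assert(length > 0);
    unsigned hr = 31;
    for (unsigned k = 0; k < length; k++) {
        unsigned h = toy_hr_s32(x[k]);
        if (h < hr) hr = h;
    }
    return hr;
}
```

Under this model, the data_buffer above (whose largest element is 19) has 26 bits of headroom, whereas skipping the calculation would record the conservative value 0.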
Once initialized, the exponent and mantissas of the vector can be accessed through bfp_vect.exp and bfp_vect.data[] respectively, with the logical (floating-point) value of element k given by ldexp(bfp_vect.data[k], bfp_vect.exp).
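As an illustration of the mantissa-and-exponent relationship, here is a small self-contained sketch. It uses a stand-in structure, toy_bfp_s32_t, which is hypothetical and mirrors only the data, exp and length fields of bfp_s32_t so that the example compiles without the library headers. The accessor toy_bfp_s32_get() is likewise an assumption made for this example.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Hypothetical stand-in for bfp_s32_t, reduced to the fields needed here. */
typedef struct {
    int32_t *data;    /* mantissas */
    int exp;          /* shared exponent */
    unsigned length;  /* number of elements */
} toy_bfp_s32_t;

/* Logical value of element k: mantissa * 2^exp. */
static double toy_bfp_s32_get(const toy_bfp_s32_t *v, unsigned k)
{
    assert(k < v->length);
    return ldexp((double)v->data[k], v->exp);
}
```

For example, with mantissas {3, -8, 1024} and an exponent of -2, the logical values are 0.75, -2.0 and 256.0.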
BFP Arithmetic Functions
The following snippet shows a function foo() which takes 3 BFP vectors, a, b and c, as arguments. It multiplies a and b together element-wise, and then subtracts c from the product. In this example both operations are performed in-place on a. (See bfp_s32_mul() and bfp_s32_sub() for more information about those functions.)
void foo(bfp_s32_t* a, const bfp_s32_t* b, const bfp_s32_t* c)
{
    // Multiply together a and b, updating a with the result.
    bfp_s32_mul(a, a, b);
    // Subtract c from the product, again updating a with the result.
    bfp_s32_sub(a, a, c);
}
The caller of foo() can then access the results through a. Note that the pointer a->data was not modified during this call.
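To give a feel for the bookkeeping that a BFP operation hides, the sketch below models an in-place element-wise multiply in portable C. It is a toy model under stated assumptions, not the library's algorithm: toy_bfp_s32_t and toy_bfp_s32_mul() are hypothetical names, the shift is chosen by scanning the 64-bit products rather than from stored headroom, and scaling truncates toward zero. It does show the two properties described above: the mantissa buffer is reused, and the exponent is adjusted to keep the logical values correct.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for bfp_s32_t, reduced to the fields needed here. */
typedef struct {
    int32_t *data;
    int exp;
    unsigned length;
} toy_bfp_s32_t;

/* Toy in-place element-wise multiply: a <- a * b. */
static void toy_bfp_s32_mul(toy_bfp_s32_t *a, const toy_bfp_s32_t *b)
{
    assert(a->length == b->length);

    /* Pass 1: find the largest product magnitude using 64-bit intermediates. */
    int64_t max_mag = 0;
    for (unsigned k = 0; k < a->length; k++) {
        int64_t p = (int64_t)a->data[k] * b->data[k];
        if (p < 0) p = -p;
        if (p > max_mag) max_mag = p;
    }

    /* Choose the smallest right-shift that makes every product fit in int32_t. */
    int shr = 0;
    while ((max_mag >> shr) > INT32_MAX) shr++;

    /* Pass 2: scale each product (division truncates toward zero) and store
     * it back into a's existing buffer. */
    const int64_t scale = (int64_t)1 << shr;
    for (unsigned k = 0; k < a->length; k++) {
        int64_t p = (int64_t)a->data[k] * b->data[k];
        a->data[k] = (int32_t)(p / scale);
    }

    /* Mantissas were divided by 2^shr, so the exponent grows by shr. */
    a->exp += b->exp + shr;
}
```

With a holding {1.0, -1.0} and b holding {1.0, 0.5}, both stored with exponent -20, this model leaves a->data pointing at the same buffer and sets a->exp to -30, with mantissas representing 1.0 and -0.5.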
Low-level API
The functions in the low-level API are optimized for performance. They do very little to protect the user from mangling their data by arithmetic saturation/overflows or underflows. Functions in the low-level API are generally prefixed with xs3_.
As an example of a function from the low-level API, see xs3_vect_s32_mul() from xs3_vect_s32.h, which multiplies together two int32_t vectors element by element.
C_API
headroom_t xs3_vect_s32_mul(
    int32_t a[],
    const int32_t b[],
    const int32_t c[],
    const unsigned length,
    const right_shift_t b_shr,
    const right_shift_t c_shr);
This function takes two int32_t arrays, b and c, as inputs and one int32_t array, a, as output. length indicates the number of elements in each array. The final two parameters, b_shr and c_shr, are the arithmetic right-shifts applied to each element of b and c before they are multiplied together.
Why the right-shifts? This reflects details of the XS3 instructions which target the VPU. With the XS3 VPU, multiplications of 32-bit numbers always include a compulsory (rounding) right-shift by 30 bits. So, to multiply two vectors element-wise with managed precision, the inputs must be shifted before multiplication to ensure the results are scaled as desired.
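The arithmetic described above can be modeled in portable C as follows. This is only a sketch of the behaviour, not the library's VPU implementation: toy_vect_s32_mul() and toy_asr64() are hypothetical names, the shifts are assumed to be non-negative, rounding is assumed to be round-half-up, and the saturation bounds are assumed to be the full int32_t range. Unlike the real function, the sketch does not return the output headroom.

```c
#include <assert.h>
#include <stdint.h>

/* Portable arithmetic right-shift (floor division by 2^n) for 64-bit values. */
static int64_t toy_asr64(int64_t x, unsigned n)
{
    return (x >= 0) ? (x >> n) : ~((~x) >> n);
}

/* Hypothetical model of a 32-bit element-wise multiply:
 *   a[k] = sat32( round( (b[k] >> b_shr) * (c[k] >> c_shr) * 2^-30 ) )  */
static void toy_vect_s32_mul(int32_t a[], const int32_t b[], const int32_t c[],
                             unsigned length, int b_shr, int c_shr)
{
    /* Assumption of this sketch: only non-negative shifts are modeled. */
    assert(b_shr >= 0 && b_shr < 32 && c_shr >= 0 && c_shr < 32);

    for (unsigned k = 0; k < length; k++) {
        /* Pre-shift the inputs to control the scale of the product. */
        int64_t bk = toy_asr64(b[k], (unsigned)b_shr);
        int64_t ck = toy_asr64(c[k], (unsigned)c_shr);

        /* 64-bit product, then the compulsory rounding right-shift by 30 bits. */
        int64_t p = toy_asr64(bk * ck + ((int64_t)1 << 29), 30);

        /* Saturate to the 32-bit range (bounds are an assumption here). */
        if (p > INT32_MAX) p = INT32_MAX;
        if (p < INT32_MIN) p = INT32_MIN;
        a[k] = (int32_t)p;
    }
}
```

In this model, multiplying 1 << 30 by 1 << 30 with no pre-shift yields 1 << 30, while multiplying the small mantissas 3 and 5 yields 0. The second case is the kind of underflow the low-level API leaves the caller to manage by keeping input headroom small.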
Contrast this with xs3_vect_s16_mul():
C_API
headroom_t xs3_vect_s16_mul(
    int16_t a[],
    const int16_t b[],
    const int16_t c[],
    const unsigned length,
    const right_shift_t a_shr);
The parameters are similar here, but instead of b_shr and c_shr, there's only an a_shr. This reflects the fact that products of 16-bit numbers can be accumulated without a compulsory right-shift, and so there is no risk of losing information by multiplying. Instead, a single right-shift can be applied to the 32-bit product to correctly scale the result.
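The 16-bit case can be modeled in the same spirit. The sketch below assumes a non-negative a_shr, round-half-up rounding, and saturation to the full int16_t range; toy_vect_s16_mul() and toy_asr64() are hypothetical names, and the real function additionally returns the output headroom.

```c
#include <assert.h>
#include <stdint.h>

/* Portable arithmetic right-shift (floor division by 2^n) for 64-bit values. */
static int64_t toy_asr64(int64_t x, unsigned n)
{
    return (x >= 0) ? (x >> n) : ~((~x) >> n);
}

/* Hypothetical model of a 16-bit element-wise multiply:
 *   a[k] = sat16( round( (b[k] * c[k]) * 2^-a_shr ) )  */
static void toy_vect_s16_mul(int16_t a[], const int16_t b[], const int16_t c[],
                             unsigned length, int a_shr)
{
    /* Assumption of this sketch: only non-negative shifts are modeled. */
    assert(a_shr >= 0 && a_shr < 32);

    for (unsigned k = 0; k < length; k++) {
        /* The exact product of two 16-bit values fits in 32 bits, so no
         * information is lost before the output shift is applied. */
        int64_t p = (int64_t)b[k] * c[k];

        /* Apply the single rounding right-shift to the full-precision product. */
        if (a_shr > 0) {
            p = toy_asr64(p + ((int64_t)1 << (a_shr - 1)), (unsigned)a_shr);
        }

        /* Saturate to the 16-bit range (bounds are an assumption here). */
        if (p > INT16_MAX) p = INT16_MAX;
        if (p < INT16_MIN) p = INT16_MIN;
        a[k] = (int16_t)p;
    }
}
```

In this model, 300 * 200 with a_shr = 4 gives 3750, whereas with a_shr = 0 the product 60000 does not fit in 16 bits and saturates. Choosing a_shr is therefore the caller's responsibility.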
Both xs3_vect_s32_mul() and xs3_vect_s16_mul() return the headroom of the output vector a.
Functions in the low-level API are in many cases closely tied to the instruction set architecture for XS3. As such, these functions are more likely to change when more efficient algorithms are found to perform an operation.