XCORE SDK
XCORE Software Development Kit
|
32-bit Discrete-Time Finite Impulse Response (FIR) Filter More...
#include <xs3_filters.h>
Public Attributes | |
unsigned | num_taps |
unsigned | head |
right_shift_t | shift |
int32_t * | coef |
int32_t * | state |
32-bit Discrete-Time Finite Impulse Response (FIR) Filter
This struct represents an N-tap 32-bit discrete-time FIR Filter.
At each time step, the FIR filter consumes a single 32-bit input sample and produces a single 32-bit output sample.
To process a new input sample and compute a new output sample, use xs3_filter_fir_s32(). To add a new input sample to the filter without computing a new output sample, use xs3_filter_fir_s32_add_sample().
An N
-tap FIR filter contains N
32-bit cofficients (pointed to by coef
) and N
words of state data (pointed to by state
. The state data is a vector of the N
most recent input samples. When processing a new input sample at time step t
, x[t]
is the new input sample, x[t-1]
is the previous input sample, and so on, up to x[t-(N-1)]
, which is the oldest input considered when computing the new output sample (see note 1 below). The coefficients form a vector b[]
, where b[k]
is the coefficient by which the k
th oldest input sample is multiplied. There is an additional parameter shift
which scales the output as described below. Both the coefficients and shift
are considered to be constants which do not change after initialization (although nothing should break if they are changed to new valid values).
At time step t
, the output sample y[t]
is computed based on the inner product (i.e. sum of element-wise products) of the coefficients and state data as follows (a more detailed description is below):
acc = x[t-0] * b[0] + x[t-1] * b[1] + x[t-2] * b[2] + ... + x[t-(N-1)] * b[N-1] y[t] = acc >> shift
Importantly, all three of the operators above (addition, multiplication and the rightwards bit-shift) have slightly ideosyncratic meanings.
The products have a built-in rounding arithmetic right-shift of 30 bits, where ties round toward positive infinity. This is a hardware feature which allows for longer filters (larger N
) without sacrificing coefficient precision. These element-wise products accumulate into 8 40-bit accumulators saturate the sums at symmetric 40-bit bounds (see Symmetrically Saturating Arithmetic). The order in which the taps are accumulated is unspecified (see note 2 below).
After each tap has been accumulated, the 8 accumulators are then added together to get a 64-bit penultimate result (with 43 useful bits). Finally, an unsigned rounding arithmetic right-shift of shift
bits is applied to the 64-bit sum, and the final result is saturated to the symmetric 32-bit range (-INT32_MAX
to INT32_MAX
inclusive).
Below is a more detailed description of the operations performed (not including the saturation logic applied by the accumulators).
\[ & y[t] = sat_{32} \left( round \left( \left( \sum_{k=0}^{N-1} round(x[t-k] \cdot b[k] \cdot 2^{-30}) \right) \cdot 2^{-shift} \right) \right) \\ & \qquad\text{where } sat_{32}() \text{ saturates to } \pm(2^{31}-1) \\ & \qquad\text{ and } round() \text{ rounds to the nearest integer, with ties rounding towards } +\!\infty \]
Initialize: A xs3_filter_fir_s32_t
filter is initialized with a call to xs3_filter_fir_s32_init(). The caller supplies information about the filter, including the number of taps and pointers the coefficients and a state buffer. It is typically recommended that the state buffer be cleared to all 0
s before initializing.
Add Sample: To add a new input sample without computing a new output sample, use xs3_filter_fir_s32_add_sample(). This is a constant-time operation which does not depend on the number of filter taps. This may be useful in some situations, for example, to quickly pre-load the filter's state buffer with multiple samples, without incurring the cost of computing an output with each added sample.
Process Sample: To process a new input sample and produce a new output sample, use xs3_filter_fir_s32().
After initialization via xs3_filter_fir_s32_init(), the contents of the xs3_filter_fir_s32_t
struct are considered to be opaque, and may change between major versions. In general, user code should not need to access its members.
num_taps
is the order of the filter, or the number of taps. It is also the (minimum) size of the buffers to which coef
and state
point, in elements (where each element is 4 bytes). The time required to process an input sample and produce an output sample is approximately linear in num_taps
(see Performance below).
head
is the index into state
at which the next sample will be added.
shift
is the unsigned arithmetic rounding saturating right-shift applied to internal accumulator to get a final output.
coef
is a pointer to a buffer (supplied by the user at initialization) containing the tap coefficients. The coefficients are stored in forward order, with lower indices corresponding to newer samples. coef[0]
, then, corresponds to b[0]
, coef[1]
to b[1]
, and so on. None of the functions which operate on xs3_filter_fir_s32_t
structs in this library will modify the contents of the buffer to which coef
points. This buffer must be at least num_taps
words long.
state
is a pointer to a buffer (supplied by the user at initialization) containing the state data – a history of the num_taps
most recent input samples. state
is used in a circular fashion with head
indicating the index at which the next sample will be inserted.
More work remains to fully characterize the time performance of this FIR filter, but asymptotically (i.e. with a large number of filter taps) processing a new input sample to produce a new output sample takes approximately 3 thread cycles per 8 filter taps.
That assumes that both the coefficients (pointed to by coef
) and state buffer (pointed to by state
) are stored directly in SRAM.
If the function takes S * num_taps + V
thread cycles, what is V
?
When there are fewer than M
taps, it is more efficient to just use a C implementation of an FIR filter. What is M
?
Brief explanation of how thread cycles correspond to actual time.
Suppose you're starting with a floating-point FIR filter model with coefficients B[k]
which operates on a sequence of 32-bit integer input samples x[t]
to get a result Y[t]
where
Y[t] = x[t-0] * B[0] + x[t-1] * B[1] + ... + x[t-(N-1)] * B[N-1]
Because of the 30-bit right-shift and the right-shift of the final accumulator by shift
bits, the coefficients b[k]
to use with this library can be thought of as fixed-point values with 30 + shift
fractional bits.
The floating-point coefficients B[k]
can then be naively converted to fixed-point coefficients b[k]
shift = 0 b[k] = (int32_t) round(ldexp(B[k], 30)
After this, any further doubling of the coefficients can be compensated for without changing the overall gain by incrementing shift
.
To maximize precision, you'll typically want shift
to be as large as possible while in the worst case to be considered neither saturates the internal accumulator (which, for safety, should generally be assumed to be 42 bits), nor saturates the final 32-bit output when shift
is applied.
The details of this depend on various details, such as your filter's gain and the statistics of the sequence x[t]
(e.g. any headroom x[t]
is known a priori to have).
This library includes a python script which converts existing floating-point FIR filter coefficients into a suitable representation and generates code for easily initializing and executing the filter. See Note: Digital Filter Conversion for more.
This example creates a simple 256-tap filter which averages the most recent 256 samples.
Each b[k]
is \(2^{29}\), and the final accumulator is right-shifted 7 bits. In the worst case, all input samples are \(-2^{31}\). In that case, the final accumulator value is \( 256 \cdot (2^{29} \cdot -2^{31} \cdot 2^{-30}) = -2^{38} \), well below the saturation limit of the accumulator. After shift
is applied, that becomes \(-2^{38} \cdot 2^{-7} = -2^{31}\). Finally, the 32-bit symmetric saturation logic is applied, making the final output value \(-2^{31}+1\).
state
is a circular buffer, and so the index of x[t]
within state
changes with each input sample. The state
field of this struct is considered to be opaque – its exact usage may change between versions.int32_t* xs3_filter_fir_s32_t::coef |
Pointer to a buffer containing the filter coefficients.
unsigned xs3_filter_fir_s32_t::head |
Index into the state
buffer where the next new sample will be placed.
unsigned xs3_filter_fir_s32_t::num_taps |
The number of taps in the FIR filter.
This is also the number of elements in state
and coef
right_shift_t xs3_filter_fir_s32_t::shift |
Unsigned arithmetic rounding right-shift applied to accumulator when computing filter output.
int32_t* xs3_filter_fir_s32_t::state |
Pointer to a circular buffer containing the previous input samples.