Resource usage

The mic array unit requires several kinds of hardware resources, including ports, clock blocks, chanends, hardware threads, compute time (MIPS) and memory.

This page summarizes the requirements for each resource type across the relevant configurations.

Warning

The usage information below applies when the default usage model is used. Resource usage in an application which uses custom mic array sub-components will depend crucially on the specifics of the customization.

Discrete Resources

===========  ================
Resource     Count
===========  ================
port         3
clock block  1 (SDR), 2 (DDR)
chanend      4
thread       1 or 2
===========  ================

Ports

In all configurations, the mic array unit requires 3 of the xcore.ai device’s hardware ports. Two of these ports (for the master audio clock and PDM clock) must be 1-bit ports. The third (PDM capture port) can be 1-, 4- or 8-bit, depending on the microphone count and SDR/DDR configuration.
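As a rough planning aid, the choice of capture port width can be sketched as below. This is not part of the mic array API. The function name `capture_port_width` is hypothetical, and the pin mapping is an assumption: one data pin per microphone in SDR, and one data pin shared by two microphones in DDR. Check the pin assignment of the actual board before relying on it.

```python
def capture_port_width(mic_count: int, ddr: bool) -> int:
    """Return the narrowest capture port width (in bits) with enough data pins.

    Assumption (not stated in the text above): SDR uses one data pin per
    microphone, while DDR shares each data pin between two microphones.
    """
    if mic_count < 1:
        raise ValueError("mic_count must be at least 1")
    # Round up so that an odd microphone count in DDR still gets a pin.
    pins_needed = (mic_count + 1) // 2 if ddr else mic_count
    for width in (1, 4, 8):  # the port widths named in the text above
        if pins_needed <= width:
            return width
    raise ValueError(f"{pins_needed} data pins do not fit in an 8-bit port")
```

Under these assumptions, two microphones need a 4-bit port in SDR but only a 1-bit port in DDR.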

Clock Blocks

In applications which use an SDR microphone configuration, the mic array unit requires 1 of the xcore.ai device’s 5 clock blocks. This clock block is used both to generate the PDM clock from the master audio clock and as the PDM capture clock.

In applications which use a DDR microphone configuration, the mic array unit requires 2 of the xcore.ai device’s 5 clock blocks. One clock block is used to generate the PDM clock from the master audio clock, and the other is used as the PDM capture clock. Two are needed because these clocks must run at different rates in a DDR configuration.

Chanends

Chanends are a hardware resource that allows threads (possibly running on different tiles) to communicate over channels. The mic array unit requires 4 chanends. Two are used for communication between the PDM rx service and the decimation thread. The other two are used to transfer completed frames from the mic array unit to other application components.

Threads

The PDM rx service can run either as a stand-alone thread or as an interrupt within the decimator thread. Accordingly, the mic array unit requires one thread when the PDM rx service runs in interrupt mode, and two threads when it runs as a stand-alone thread.

Running PDM rx as a stand-alone thread modestly reduces the mic array unit’s MIPS consumption by eliminating the context switch overhead of an interrupt, at the cost of one additional hardware thread.

Note

When configured as an interrupt, the PDM rx ISR is typically configured on the decimation thread, but this is not a strict requirement. The PDM rx interrupt can be configured for any thread on the same tile as the decimation thread. They must be on the same tile because the two contexts communicate through shared memory.
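The discrete resource counts described above can be collected into a small planning helper. This is a minimal sketch for budgeting purposes only. `DiscreteResources` and `default_model_resources` are hypothetical names, not part of the mic array API, and the numbers are simply those quoted in this section for the default usage model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscreteResources:
    ports: int
    clock_blocks: int
    chanends: int
    threads: int


def default_model_resources(ddr: bool, pdm_rx_isr: bool) -> DiscreteResources:
    """Discrete resources used by the mic array unit (default usage model)."""
    return DiscreteResources(
        ports=3,                          # master clock, PDM clock, PDM capture
        clock_blocks=2 if ddr else 1,     # DDR needs a separate capture clock
        chanends=4,                       # 2 for PDM rx, 2 for frame transfer
        threads=1 if pdm_rx_isr else 2,   # ISR mode shares the decimator thread
    )
```

For example, `default_model_resources(ddr=True, pdm_rx_isr=False)` reports 3 ports, 2 clock blocks, 4 chanends and 2 threads.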

Compute

The compute requirement of the mic array unit depends strongly on the actual configuration being used. The compute requirement is expressed in millions of instructions per second (MIPS) and is approximately linearly related to many of the configuration parameters.

Each tile of an xcore.ai device has 8 hardware threads and a 5-stage pipeline. The exact number of MIPS available to a thread is complicated to calculate and, in general, depends on both the number of threads in use and the work being done by each thread.

As a rule of thumb, however, the core scheduler will offer each thread a minimum of CORE_CLOCK_MHZ/8 millions of instruction issue slots per second (~MIPS), and no more than CORE_CLOCK_MHZ/5 millions of issue slots per second, where CORE_CLOCK_MHZ is the core CPU clock rate (specified as SystemFrequency in the XN file). With a core clock rate of 600 MHz, each thread should therefore expect at least 75 MIPS (600/8) and at most 120 MIPS (600/5).
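The rule of thumb above is simple arithmetic, and can be sketched as follows. The function name `thread_mips_bounds` is made up for illustration. The divisors 8 and 5 are the thread count and pipeline depth quoted above.

```python
def thread_mips_bounds(core_clock_mhz: float) -> tuple[float, float]:
    """Return (minimum, maximum) approximate MIPS available to one thread.

    Rule of thumb only: the floor assumes all 8 hardware threads are
    active, the ceiling is set by the 5-stage pipeline.
    """
    return core_clock_mhz / 8, core_clock_mhz / 5


low, high = thread_mips_bounds(600)
print(f"each thread gets between {low} and {high} MIPS")
```

For a 600 MHz core clock this gives the 75 MIPS floor quoted above and a ceiling of 120 MIPS.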

Table Estimated MIPS XS3 (per configuration) shows the mic array MIPS, measured by profiling an application that includes the mic array. Table Estimated MIPS VX4 (per configuration) shows the same for the vx4 architecture.

The application used to generate the MIPS numbers runs the default mic array API (so the decimator runs in a single hardware thread) with all defines set to their default values as listed in Configuration defines (mic_array_conf_default.h), except for MIC_ARRAY_CONFIG_MIC_COUNT and MIC_ARRAY_CONFIG_USE_PDM_ISR. These two, along with the output sampling rate, are varied to build the different configurations that are profiled.

Table 1 Estimated MIPS XS3 (per configuration)

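=========  ======  =======================  ======
Mic count  PDM RX  Output sample rate (Hz)  MIPS
=========  ======  =======================  ======
1          ISR     16000                    13.810
1          ISR     32000                    16.849
1          ISR     48000                    20.873
1          THREAD  16000                    12.514
1          THREAD  32000                    15.409
1          THREAD  48000                    19.290
2          ISR     16000                    28.685
2          ISR     32000                    34.013
2          ISR     48000                    41.358
2          THREAD  16000                    26.142
2          THREAD  32000                    31.421
2          THREAD  48000                    38.670
=========  ======  =======================  ======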

Table 2 Estimated MIPS VX4 (per configuration)

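=========  ======  =======================  ======
Mic count  PDM RX  Output sample rate (Hz)  MIPS
=========  ======  =======================  ======
1          THREAD  16000                    11.616
1          THREAD  32000                    14.000
1          THREAD  48000                    17.368
2          THREAD  16000                    23.692
2          THREAD  32000                    27.964
2          THREAD  48000                    34.204
=========  ======  =======================  ======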

Note

The MIPS numbers scale approximately linearly with the number of microphones. Although the tables list values only for the 1- and 2-mic configurations, these results can be extrapolated to estimate MIPS for configurations with a higher microphone count. If a given configuration cannot be accommodated within a single hardware thread, the decimator can be split across multiple threads to distribute the compute load. This approach is not supported by the default mic array API. An example of a custom multi-threaded decimator implementation can be found in app_par_decimator.
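One way to carry out the extrapolation is a straight line through the 1- and 2-mic measurements. The sketch below uses the XS3 figures for ISR mode, where PDM rx and the decimator share one thread. The helper names are hypothetical, the linear model is only the approximation stated above, and the results are estimates rather than measurements.

```python
# XS3, PDM rx in ISR mode: output sample rate (Hz) -> (1-mic MIPS, 2-mic MIPS).
XS3_ISR_MIPS = {
    16000: (13.810, 28.685),
    32000: (16.849, 34.013),
    48000: (20.873, 41.358),
}


def estimate_mips(sample_rate_hz: int, mic_count: int) -> float:
    """Linearly extrapolate MIPS from the measured 1- and 2-mic values."""
    mips_1mic, mips_2mic = XS3_ISR_MIPS[sample_rate_hz]
    per_extra_mic = mips_2mic - mips_1mic
    return mips_1mic + per_extra_mic * (mic_count - 1)


def within_guaranteed_budget(mips: float, core_clock_mhz: float = 600.0) -> bool:
    """Compare against the guaranteed per-thread floor (CORE_CLOCK_MHZ / 8)."""
    return mips <= core_clock_mhz / 8


for rate in XS3_ISR_MIPS:
    estimate = estimate_mips(rate, mic_count=4)
    print(rate, round(estimate, 1), within_guaranteed_budget(estimate))
```

With these numbers, four microphones at 16 kHz come out at roughly 58.4 MIPS, inside the 75 MIPS floor at 600 MHz, while four microphones at 48 kHz come out at roughly 82.3 MIPS. That exceeds the guaranteed floor, so it would either depend on fewer than 8 threads being active or call for splitting the decimator across threads.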

Note

In the vx4 configuration, running PDM RX in ISR mode is currently not supported, so the application will use at least two threads for running the mic array unit.

Memory

The memory cost of the mic array unit has three parts: code, stack and data. Code is the memory needed to store compiled instructions in RAM. Stack is the memory required to store intermediate results during function calls, and data is the memory used to store persistent objects, variables and constants.

Table Memory usage (in bytes) reports the memory usage of two minimal applications that include the mic array: one using the default mic array API and another using a custom configuration created by instantiating a MicArray object.

Each application is built for both 1 and 2 microphones with a 16 kHz output sample rate, with the other compile-time parameters set to their default values as defined in Configuration defines (mic_array_conf_default.h).

Memory for higher microphone counts can be extrapolated from the 1- and 2-mic numbers.

With the default API, memory usage does not change with the sampling rate, because the default API compiles in all the decimation filters included in the library. For custom usage, the data memory varies with the sampling rate, depending on which decimation filters are included.

Table 3 Memory usage (in bytes)

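============  =================  =====  =====  =====  ====
Config        Available on tile  Used   Stack  Code   Data
============  =================  =====  =====  =====  ====
1mic_custom   524288             13308  588    8694   4026
1mic_default  524288             17236  652    9578   7006
2mic_custom   524288             14892  604    9078   5210
2mic_default  524288             20508  668    11450  8390
============  =================  =====  =====  =====  ====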

Note

The default API requires more memory than the custom configuration. The additional code memory comes from the wrapper and abstraction code included in the default API. The increased data memory results from the inclusion of filter coefficients for all filters provided by the library, whereas the custom build includes only the coefficients required by the specific MicArray instance.

Note

A minimal empty application (no mic array) occupies 4652 bytes, or about 4.5 KiB (Stack: 356 B, Code: 3754 B, Data: 542 B). Subtract this baseline from the reported totals to isolate mic array overhead.
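The baseline subtraction can be done per memory region as well as on the totals. The sketch below copies the figures from the memory usage table and the empty-application baseline. The helper name `mic_array_overhead` is made up for illustration.

```python
# Empty application (no mic array), in bytes.
BASELINE = {"stack": 356, "code": 3754, "data": 542}

# Figures from the memory usage table, in bytes.
MEMORY_USAGE = {
    "1mic_custom": {"stack": 588, "code": 8694, "data": 4026},
    "1mic_default": {"stack": 652, "code": 9578, "data": 7006},
    "2mic_custom": {"stack": 604, "code": 9078, "data": 5210},
    "2mic_default": {"stack": 668, "code": 11450, "data": 8390},
}


def mic_array_overhead(config: str) -> dict[str, int]:
    """Bytes attributable to the mic array, per region, for one config."""
    usage = MEMORY_USAGE[config]
    return {region: usage[region] - BASELINE[region] for region in BASELINE}


for config in MEMORY_USAGE:
    overhead = mic_array_overhead(config)
    print(config, overhead, "total:", sum(overhead.values()))
```

For 1mic_custom this leaves 8656 bytes attributable to the mic array, and for 2mic_default 15856 bytes.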