Energy consumption is a hot topic in the world of voice-enabled AIoT devices. With good reason.

Voice shows the fastest adoption of any consumer technology ever[1]. At the current rate of growth, a further 1.5 billion voice-enabled devices will arrive in our homes in 2025, bringing the estimated total to 5 billion units in use worldwide.

Imagine all these devices powered up and hanging on to our every keyword. At a very rough estimate, those devices will consume 65 terawatt-hours (TWh) of electricity[2] a year, simply by being always on, listening for a keyword. That’s roughly 90% of the annual output of the world’s largest nuclear power plant[3]. It’s not sustainable. Intelligent IoT systems should enable us to consume less, not more.
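As a quick sanity check on that estimate, the short sketch below works backwards from the two figures quoted above (65 TWh a year, 5 billion devices) to the average power each device would have to draw. It uses only those quoted numbers; the variable and function names are ours, chosen for illustration.

```python
# Back-of-envelope check using only the figures quoted in the article.
HOURS_PER_YEAR = 24 * 365        # 8760 hours, ignoring leap years

FLEET_ENERGY_TWH = 65.0          # quoted yearly consumption of all devices
DEVICES = 5e9                    # quoted number of units in use
SHARE_OF_PLANT = 0.90            # quoted share of the plant's annual output


def per_device_kwh_per_year(fleet_twh: float, devices: float) -> float:
    """Yearly energy per device in kWh (1 TWh = 1e9 kWh)."""
    return fleet_twh * 1e9 / devices


def per_device_watts(fleet_twh: float, devices: float) -> float:
    """Average continuous power per device in watts."""
    kwh = per_device_kwh_per_year(fleet_twh, devices)
    return kwh * 1000 / HOURS_PER_YEAR


kwh = per_device_kwh_per_year(FLEET_ENERGY_TWH, DEVICES)
watts = per_device_watts(FLEET_ENERGY_TWH, DEVICES)
implied_plant_twh = FLEET_ENERGY_TWH / SHARE_OF_PLANT

print(f"{kwh:.1f} kWh per device per year")            # 13.0
print(f"{watts:.2f} W average draw per device")        # 1.48
print(f"{implied_plant_twh:.1f} TWh implied plant output per year")  # 72.2
```

In other words, the headline figure corresponds to each device drawing roughly one and a half watts around the clock.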

As voice becomes a mainstream requirement and the focus moves inexorably towards contextual, conversational interfaces, we’re also seeing a shift in the semiconductor industry, with increasing innovation in (and demand for) energy-efficient solutions.

Today’s architecture is energy hungry.

Many different algorithms, using significant processing power, are required to capture and extract a clean voice signal from a noisy environment. To date, Digital Signal Processors (DSPs) and Application Processors (APs) have been the solution of choice for far-field voice-enabled products.

Whilst DSPs are a better, purpose-designed architecture for real-time processing such as voice detection and isolation, APs are almost always embedded in today’s systems – to support the control and communications processing required for a connected AIoT device. These systems meet the functional requirements of the application, but they are power-hungry and always on – in the case of voice systems, listening for keywords. While the incremental running cost for the individual user is low (and likely tolerable), the overall burden on our planet is high. 

Is ‘always-on’ the answer? 

The smart speaker is a familiar presence in our homes. But what do we want it to do when there’s no one in the room? Nothing. We want it to do nothing. By logical extension, how much energy do we expect it to consume when it’s not in use? None.

We don’t need “always on”; we need “always ready”. We need our smart speaker to be ready only when a human voice is detected. Extending the “always ready” concept to other AIoT devices, we need our smart TV to ‘wake’ into active standby only when we’re present in the room, and not, for example, when the family pet strolls through the space. We might want our kitchen appliances to wake in the presence of an adult, but not when a child is alone in the room. We might want our cars to become active only when an identified individual is present.

If we want to determine when a device needs to ‘wake’ into a ready state, we need to gather real-time contextual data from the surrounding space, using sensors such as microphones, radar and cameras. We then need an intelligent processor within the smart product to process the multiple data streams concurrently and determine when to wake the system (allowing for how much notice it needs to reach its ready state). The challenge is how to embed this kind of intelligence without compromising the energy consumption of the device.
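To make the “always ready” idea concrete, here is a minimal sketch of a wake decision for the examples above (smart speaker, smart TV, kitchen appliance). Everything in it is hypothetical: the `Occupant` categories, the `SensorSnapshot` fields and the `WakePolicy` rules are illustrative names, and it assumes some upstream sensor processing has already worked out who is in the room and whether a voice is active.

```python
from dataclasses import dataclass
from enum import Enum


class Occupant(Enum):
    """Hypothetical categories a presence sensor might report."""
    PET = "pet"
    CHILD = "child"
    ADULT = "adult"


@dataclass(frozen=True)
class SensorSnapshot:
    """Fused view of the room at one instant (illustrative fields)."""
    occupants: frozenset = frozenset()
    voice_activity: bool = False


@dataclass(frozen=True)
class WakePolicy:
    """Who a product wakes for, and whether it also needs to hear a voice."""
    wake_for: frozenset
    requires_voice: bool = False


def should_wake(policy: WakePolicy, snapshot: SensorSnapshot) -> bool:
    """Wake only if a qualifying occupant is present (and speaking, if required)."""
    if not (policy.wake_for & snapshot.occupants):
        return False
    if policy.requires_voice and not snapshot.voice_activity:
        return False
    return True


HUMANS = frozenset({Occupant.ADULT, Occupant.CHILD})
SMART_SPEAKER = WakePolicy(wake_for=HUMANS, requires_voice=True)
SMART_TV = WakePolicy(wake_for=HUMANS)
KITCHEN_APPLIANCE = WakePolicy(wake_for=frozenset({Occupant.ADULT}))

pet_only = SensorSnapshot(occupants=frozenset({Occupant.PET}))
child_alone = SensorSnapshot(occupants=frozenset({Occupant.CHILD}))
adult_speaking = SensorSnapshot(
    occupants=frozenset({Occupant.ADULT}), voice_activity=True
)

print(should_wake(SMART_TV, pet_only))              # False
print(should_wake(KITCHEN_APPLIANCE, child_alone))  # False
print(should_wake(SMART_SPEAKER, adult_speaking))   # True
```

The decision itself is trivial; the hard part, as noted above, is producing the snapshot from several sensor streams without spending the energy the policy is meant to save.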

Is there a solution? 

Producing a “zero-energy” AIoT device requires broad changes, with innovations to the sensors, the compute and the communications. 

The sensors must deliver enough information for the system to infer when it needs to be mission-ready. They must either monitor the environment constantly for events (see microphones with built-in acoustic activity detectors, such as the Vesper VM3011 and Knowles AISonic SmartMics), or scan the environment periodically, e.g. to detect a human presence using radar or a camera.

The ideal compute resource should consume zero power whilst inactive but be ready to switch to an active state as soon as it is needed (and deliver billions of operations per second). This “wake time” is application- and architecture-specific. For a voice interface, the processor must wake within around 5 milliseconds to support keyword detection. This is around 200x faster than an Application Processor can boot.
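The two figures above imply an Application Processor boot time of around one second. The sketch below works that out and, as a rough illustration of why wake time matters for voice, counts how much audio arrives while the processor is still waking. The 16 kHz sample rate is our assumption for illustration, not a figure from this article.

```python
WAKE_BUDGET_MS = 5        # from the text: wake within around 5 milliseconds
AP_SPEED_RATIO = 200      # from the text: around 200x faster than an AP boot
SAMPLE_RATE_HZ = 16_000   # ASSUMPTION: illustrative speech sample rate


def samples_elapsed(duration_ms: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Audio samples that arrive during duration_ms of wake-up time."""
    return sample_rate_hz * duration_ms // 1000


implied_ap_boot_ms = WAKE_BUDGET_MS * AP_SPEED_RATIO

print(implied_ap_boot_ms)                   # 1000 (about one second)
print(samples_elapsed(WAKE_BUDGET_MS))      # 80
print(samples_elapsed(implied_ap_boot_ms))  # 16000
```

Under that assumption, a 5 ms wake means only a sliver of audio has to be buffered elsewhere or lost, whereas a one-second boot would miss most of a spoken keyword.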

The combination of “zero-power” sleep and “instantaneous” wake into a mode that provides billions of operations per second requires a new class of processor – one that can deliver the substantial performance demanded by AIoT workloads (including all of the control and communications processing), that consumes microwatts when sleeping, and that can wake into a fully operational mode in milliseconds. NXP has been thinking in that direction with their i.MX RT series of Crossover Processors. Here at XMOS, we’ve designed our latest xcore device to handle exactly that kind of challenge. The all-new xcore will be revealed very soon.
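A simple duty-cycle model shows why microwatt sleep matters. The sketch below compares a device that draws full power all day with one that sleeps whenever it is not in use. All three input numbers are assumptions chosen for illustration (1.5 W when active, 100 microwatts asleep, one hour of use per day); none of them is a measured figure for any real product.

```python
def average_power_w(sleep_w: float, active_w: float, active_fraction: float) -> float:
    """Time-weighted average power of a device that is either asleep or active."""
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be between 0 and 1")
    return sleep_w * (1.0 - active_fraction) + active_w * active_fraction


ACTIVE_W = 1.5              # ASSUMPTION: draw while fully awake
SLEEP_W = 100e-6            # ASSUMPTION: 100 microwatts while asleep
ACTIVE_FRACTION = 1 / 24    # ASSUMPTION: one hour of real use per day

always_on_w = average_power_w(ACTIVE_W, ACTIVE_W, ACTIVE_FRACTION)
always_ready_w = average_power_w(SLEEP_W, ACTIVE_W, ACTIVE_FRACTION)
saving_ratio = always_on_w / always_ready_w

print(f"always on:    {always_on_w:.4f} W")
print(f"always ready: {always_ready_w:.4f} W")
print(f"ratio:        {saving_ratio:.1f}x")
```

With these assumed numbers the sleeping device averages about 24 times less power, and almost all of what remains is spent during genuine use rather than waiting.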

Finally, we come to connectivity. WiFi is currently the go-to connectivity solution for the smart home, but it’s not synonymous with low power in its active modes and it’s slow to reconnect to devices that have been powered down. The rule of thumb for a human-machine interface is that responses should appear within a quarter of a second. Historically, this has meant that the WiFi system must be fully active, which has led to an impression that WiFi cannot be used in many battery-powered applications. This challenge has been addressed by major players in the industry, and recent innovations from Dialog and Silicon Labs allow IoT devices to remain connected in modes that consume microwatts.

A change is coming.

We’re not going to see “zero-energy” AIoT devices just yet, but important innovations are emerging; some of those highlighted here result in a standby power state that’s three orders of magnitude more efficient than the current norm. This is real, meaningful progress. It reduces the collective energy consumption of all those keyword spotters to the output of a modest-sized mobile generator instead of a state-of-the-art nuclear power station.
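For a sense of scale, the sketch below applies a three-orders-of-magnitude reduction to the 65 TWh and 5 billion device figures quoted earlier in this article. It assumes, as the estimate itself does, that the whole 65 TWh is standby consumption; the names are ours.

```python
HOURS_PER_YEAR = 24 * 365     # 8760 hours, ignoring leap years

FLEET_ENERGY_TWH = 65.0       # quoted yearly standby consumption
DEVICES = 5e9                 # quoted number of units in use
IMPROVEMENT = 1000            # "three orders of magnitude"

# 1 TWh = 1000 GWh, so the reduced yearly total in GWh is:
reduced_gwh = FLEET_ENERGY_TWH * 1000 / IMPROVEMENT

# Average continuous generation needed for the whole fleet, in megawatts
# (1 GWh = 1000 MWh; MWh per hour = MW).
fleet_avg_megawatts = reduced_gwh * 1000 / HOURS_PER_YEAR

# Average standby draw per device, in milliwatts (1 GWh = 1e9 Wh).
per_device_milliwatts = reduced_gwh * 1e9 / DEVICES / HOURS_PER_YEAR * 1000

print(f"{reduced_gwh:.0f} GWh per year")                     # 65
print(f"{fleet_avg_megawatts:.1f} MW for the whole fleet")   # 7.4
print(f"{per_device_milliwatts:.2f} mW per device")          # 1.48
```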

Similarly, whilst we’re not going to see truly instantaneous wake, lightning-fast system activation times are already within our grasp – designers just need to think a little differently about how they architect their systems to leverage these capabilities.

At XMOS, we’re developing a new xcore AIoT processor that will deliver intelligence at the edge and help manufacturers address the low-energy challenge. Standby for an announcement soon! 


Visit www.xcore.ai to register your interest

[1] Accenture: Reshape to Relevance
[2] Based on 2019 power requirements for an Amazon Dot or Google Home
[3] Tokyo Electric Power Co.’s (TEPCO) Kashiwazaki-Kariwa plant in Japan
