artificial intelligence

The role of women in the rise of AI

We spoke with Natalie Powell (ChannelNewsAsia) at MWC19 on the “Role of Women in the rise of Artificial Intelligence”.

Mark Lippett on how XMOS has made itself heard.

Don’t miss Mark’s interview in the January edition of South West Business Insider, where he outlines how XMOS’ intelligent speaker technology is set to change the way we live our daily lives …

Talking the Talk. PwC talks to Mark Lippett about the future of voice

To keep XMOS compelling we have to anticipate where the market is going and what our clients need. In the short term that means delivering an even richer user experience whilst driving costs down. We have a strong position in a massive market. Our challenge – and our opportunity – is to stay focussed on…

Yukai and NTT DOCOMO launch the first development kit “codama” for NTT DOCOMO “AI Agent API” using XMOS voice technology.

Bristol/Japan 3rd December 2018

XMOS, a leading supplier of voice and audio solutions, is proud to be involved in this new voice interface development kit, “codama”, jointly developed by Yukai Engineering and NTT DOCOMO.

“codama” base board and ‘codama’ microphone array board

The “codama” voice interface development kit includes the XMOS VocalFusion XVF3100 voice processor; it enables AI engineers and developers to build a voice interface with NTT DOCOMO “AI Agent API” into their products. This technology brings people the freedom to control their electronic devices and access a wide range of IoT services by simply using their voice – wherever they are in the room and whatever is happening around them.

The XVF3100 delivers outstanding voice capture accuracy over long distances; its barge-in capability means the user can interrupt any music playing on the consumer device, even when it’s playing loudly and the wake-word is spoken softly. Additionally, the XVF3100 uses beamforming to capture the direction of arrival of the voice command and track the person speaking as they move around the room. Sophisticated algorithms deliver acoustic echo cancellation, dereverberation and noise suppression, to capture the voice of interest and clean up the signal for onward transmission to the NTT DOCOMO “AI Agent API”.

Finally, Yukai have added an excellent personalisation feature that lets people create their own wake-word on Yukai Engineering’s website and download it to their device.

Mark Lippett, CEO of XMOS, said today, “We’re very proud of this collaboration with the talented team at Yukai Engineering and NTT DOCOMO. NTT DOCOMO ‘AI Agent API’ opens up a world of AI possibilities across electronics and robotics, giving people an altogether richer, more exciting experience – and now people will be able to access that through voice control, from anywhere in the room.”

Mr. Yoshikazu Akinaga, Manager of Innovation Management Department at NTT DOCOMO, said, “NTT DOCOMO developed AI Agent API to ultimately provide end users with all-new speech AI experiences based on highly natural AI-supported communication. Today, we are very excited to introduce a new speech development kit integrated with XMOS’s class-leading far-field voice interface technology, working seamlessly with Raspberry Pi and NTT DOCOMO AI Agent API. This kit enables everybody to develop a speech and dialog service ‘device’ easily. We are very grateful to the Yukai Engineering and XMOS teams for their collaboration in taking this major step forward towards achieving our goal. We are expecting all users to enjoy new speech AI experiences.”

The “codama” voice interface development kit will be sold from 21st December 2018 via Yukai Engineering, MACNICA and Amazon. You can pre-order from 3rd December 2018 (the pre-order link can be found on Yukai Engineering’s website).

The kit will be demonstrated at DOCOMO Open House 2018, Tokyo Big Sight, 6th December to 7th December 2018.

Key features

  • Small form factor development kit: 64 mm (W) x 55 mm (L) x 20 mm (H, including pin headers)
  • Sample programs for NTT DOCOMO AI Agent API (available via Yukai Engineering)
  • Works with Raspberry Pi
  • Linear microphone array: 4 x Infineon XENSIV™ IM69D130
  • Wake-word engine by Sensory
  • Full duplex Acoustic Echo Cancellation (AEC)
  • Barge-in
  • Adaptive beamformer
  • Dereverberation
  • Noise suppression
  • Automatic Gain Control (AGC)
  • Direction of arrival (DOA) indication

About XMOS:

XMOS is a leading supplier of voice and audio solutions to the consumer electronics market. Unique silicon architecture and highly differentiated software positions XMOS at the interface between voice processing, biometrics and artificial intelligence. For more information, please visit


About NTT DOCOMO

NTT DOCOMO, Japan’s leading mobile operator with over 74 million subscriptions, is one of the world’s foremost contributors to 3G, 4G and 5G mobile network technologies. Beyond core communications services, DOCOMO is challenging new frontiers in collaboration with a growing number of entities (“+d” partners), creating exciting and convenient value-added services that change the way people live and work. Under a medium-term plan toward 2020 and beyond, DOCOMO is pioneering a leading-edge 5G network to facilitate innovative services that will amaze and inspire customers beyond their expectations. DOCOMO is listed on the Tokyo stock exchange (9437).

About Yukai Engineering

Yukai Engineering is a start-up comprised of designers and engineers who are passionate about making our daily lives more fun and fulfilling with robotics technology. Their award-winning products include Necomimi (selected for Time magazine’s “The World’s 50 Best Innovations 2011” list), and BOCCO, a communication robot that keeps family members in touch with each other. For more information, please visit


Japanese language press release available on request from

Skyworth unveils ‘always-on AI TV’

Chinese TV manufacturer Skyworth has unveiled an ‘always-on AI TV’ at its new product strategy event. Skyworth has also joined forces with voice and audio processors specialist XMOS to develop the AI voice chip for its TVs.

Giving voice to the elderly

Voice-enabled technologies will transform the health and happiness of the elderly.

The UN predicts a 56% rise in the number of people aged 60 years or over, taking us from over 900 million in 2015 to nearly 1.5 billion in 2030. The world’s population is changing. Our demographic is aging. And this could well be the defining issue of our time. An aging population creates a burden on health systems and individual households. Family members, clinicians, and assisted care providers will need a new generation of technology platforms to help them stay informed, coordinated, and most importantly, connected.

The social care system is facing a mountain of challenges and it can’t cope with a sustained upswing in the number of senior people and adults living with chronic illnesses.

Whether living at home or in an assisted facility, help may come from an unexpected source – technology. Speech recognition and voice-enabled devices make technology accessible to all. There’s no need to tap a keyboard or figure out how to work the remote control; you simply talk to the device from across the room. A voice-controlled device can empower a formerly ‘dis-empowered’ user. It can ease pressure on caregivers, becoming a companion and digital assistant. Of course, it’s not a replacement for human interaction, but rather a meaningful addition.

How can voice-enabled assistants make a difference?

Voice-enabled assistants such as Amazon’s Alexa, Google’s Assistant, Apple’s Siri, and Microsoft’s Cortana are at the forefront of society’s screenless future. Thanks to rapid advancements in voice technology and natural language processing (NLP), these virtual assistants are far better equipped to understand human speech and respond accurately (and in real time). They can perform all sorts of tasks, including playing music on demand, calling friends or prompting you to take your medication at a certain time.

This can make a big difference to those living with a chronic illness, anyone who has limited mobility, and the elderly. It can make tasks easier and create a sense of companionship. More importantly, it helps people regain control, from simple actions such as lighting the room and adjusting the temperature, to things that are critical to our wellbeing, such as controlling access to our home and calling for emergency assistance when needed.

How does this work in the real world?

The team at Pillohealth have come up with a ground-breaking, in-home 24/7 companion robot – Pillo – which combines voice control, facial recognition and artificial intelligence to provide personalised digital healthcare. The technology can also provide 24-hour care and companionship, entertainment on the go, and reminders of when the patient needs to eat, sleep or move, all of which help the person to live independently.

The device acts as a secure pill dispenser, offers video check-ins with caregivers, can quickly and reliably identify valuable healthcare insights and can send data to healthcare professionals. But it offers much more than that to the user. They don’t need to learn how to use Pillo; they can just talk to it – and the companion robot is on hand to tune into their favourite radio station, answer a question, manage their calendar and give them handy reminders.

It all adds together to help the user enjoy a more independent life and could help to ease pressure on a stretched healthcare system. (see

Companionship is equally important

Happiness comes when the ‘assistant’ becomes something more akin to a ‘companion’. A study by US company Brookdale Senior Living explores technologies that can help the older generation stay independent for longer.

A team set out to determine whether reciting Shakespeare with a robot could increase engagement and lessen symptoms of depression. A two-foot tall robot called Nao recited the first 12 lines of Shakespeare’s Sonnet 18, “Shall I compare thee to a summer’s day?” and then prompted seniors to recite the last two lines.

Those who interacted with Nao experienced significant decreases in depression and significant increases in engagement over time. This shows that voice-enabled assistants can be more than just a virtual caregiver: at times they can be a companion who is available 24/7.

Voice can provide both practical and emotional benefits

Over time, advancements in artificial intelligence will improve voice-enabled assistants. Learning more about the user with each interaction, they will move from reactive responses to a more relevant, engaging conversation. They will become an integral part of the consumer ecosystem, seamlessly integrating across all devices and platforms to become a natural, digital companion.

Crucially, if technology is controlled by voice, it becomes accessible to everyone. There’s no need to learn how to use it; you just talk to it. A number of studies have shown that talking makes us feel happier, so it’s easy to see how voice-enabled technology could transform life for the elderly in ways that are both practical and emotional. And given our aging demographic, this feels like a very good thing.

Want to develop a voice enabled device that can hear across the room?

You’ll need the right acoustic echo cancellation (AEC) solution.

If you’re designing a voice-enabled product for the smart home that includes a loudspeaker, you’ll need to remove the acoustic echo it generates so that you can interrupt the audio stream (barge-in) and give a voice command, such as adjusting the volume, while the device is playing.

Mono or stereo?

For products such as security solutions or kitchen appliances, and many smart speakers, mono-AEC is usually the right tool for the job. But if you’re designing products that output true stereo audio, for example TVs, soundbars and media streamers, then you’ll need stereo-AEC to secure the best performance available. Here’s why …

Acoustic echo cancellation explained

Acoustic echo cancellation is a digital signal processing technique for removing echo that originates from a loudspeaker. Within a device, there’s a direct path between the loudspeaker and microphones. There’s also an indirect path between the two, because the audio signal reflects off the walls and other surfaces before it reaches the microphone. Put simply, you’ll get a reflection off the ceiling, floor, each wall and every solid object in the room. These reflections are known as indirect acoustic echo and they’re picked up at different times by the microphone, depending on the length of path from the loudspeaker to the microphone.

If we look at the waveform of a sound played through the loudspeaker, the original sound can usually be identified at the beginning, and the waveform then tails off as the energy decays through successive reflections.
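To make the timing concrete, here is a minimal Python sketch that converts path lengths into arrival delays at the microphone. The path lengths, the 343 m/s speed of sound and the 16 kHz sample rate are illustrative assumptions, not figures from any XMOS product.

```python
SPEED_OF_SOUND_M_S = 343.0   # assumed: dry air at roughly 20 °C
SAMPLE_RATE_HZ = 16_000      # assumed: a common rate for voice capture

def echo_delay_samples(path_length_m, sample_rate_hz=SAMPLE_RATE_HZ):
    """Return the delay, in whole samples, for sound travelling path_length_m."""
    delay_s = path_length_m / SPEED_OF_SOUND_M_S
    return round(delay_s * sample_rate_hz)

# Hypothetical paths from loudspeaker to microphone, in metres.
paths_m = {
    "direct": 0.10,
    "via table top": 0.60,
    "via ceiling": 4.00,
    "via far wall": 7.00,
}

delays = {name: echo_delay_samples(length) for name, length in paths_m.items()}
for name, samples in delays.items():
    print(f"{name:>14}: {samples} samples ({1000 * samples / SAMPLE_RATE_HZ:.2f} ms)")
```

Longer paths arrive later, which is why the echo canceller has to model a tail of reflections rather than a single delayed copy of the loudspeaker signal.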

To support barge-in and capture a clear voice stream to send to your automatic speech recognition service (ASR), you need to remove as much echo from the captured microphone signal as possible.

It’s not possible to remove 100% of the echo because the time needed to capture the signal and separate out all of the echo would lead to a delayed response, and the user experience demands that this all happens in real time. So in practice, you’re looking to target an “acceptable” level of echo cancellation that allows the ASR to respond accurately.

Types of acoustic echo cancellers

Echo cancellers are categorised by the number of loudspeaker reference channels supported. Common configurations are either: mono – 1-channel, or true stereo – 2-channel. Another configuration – pseudo-stereo – behaves in a very similar way to mono, but has some significant performance issues when challenged with true stereo audio output.


Mono-AEC

Mono-AEC uses a single reference signal based on the audio input and applies it to the output, which can be one or more loudspeakers.

The Digital Signal Processor uses the reference signal to calculate indirect echo based on the time it takes the reflections to reach the microphone.

Where signal processing has been used to give the impression of a stereo system from a mono signal (e.g. by adjusting the signal pan and volume and outputting to two or more speakers), the calculation remains based on the single reference signal and the position of the loudspeakers relative to the microphone.
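As an illustration of how a single reference signal can be used to estimate and subtract echo, here is a minimal sketch of a normalised least-mean-squares (NLMS) adaptive filter in plain Python. NLMS is a common textbook technique; the article does not say which algorithm XMOS devices use, and the simulated echo path, filter length and step size below are all assumptions chosen for the example.

```python
import math
import random

def nlms_echo_canceller(reference, mic, taps=16, step=0.5, eps=1e-6):
    """Subtract an adaptive estimate of the echo from the microphone signal.

    reference: samples sent to the loudspeaker (the single mono reference)
    mic:       samples captured by the microphone (echo plus any speech)
    Returns the residual signal after cancellation.
    """
    weights = [0.0] * taps          # estimated loudspeaker-to-microphone echo path
    history = [0.0] * taps          # most recent reference samples, newest first
    residual = []
    for x, d in zip(reference, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate       # what is left after removing the estimate
        power = sum(h * h for h in history) + eps
        gain = step * e / power     # normalise the update by the reference power
        weights = [w + gain * h for w, h in zip(weights, history)]
        residual.append(e)
    return residual

def power_db(signal):
    """Average power of a signal in decibels (with a small floor to avoid log(0))."""
    return 10 * math.log10(sum(s * s for s in signal) / len(signal) + 1e-12)

# Simulated room (assumption): a direct path plus two weaker, later reflections.
random.seed(0)
echo_path = [0.0, 0.6, 0.0, 0.0, 0.3, 0.0, 0.0, 0.0, -0.15]
reference = [random.uniform(-1, 1) for _ in range(4000)]
mic = [
    sum(g * reference[n - k] for k, g in enumerate(echo_path) if n - k >= 0)
    for n in range(len(reference))
]

residual = nlms_echo_canceller(reference, mic)
tail = slice(-1000, None)   # measure after the filter has had time to adapt
reduction_db = power_db(mic[tail]) - power_db(residual[tail])
print(f"Echo reduced by about {reduction_db:.1f} dB in the last 1000 samples")
```

This simulation contains no speech and no background noise, so the filter converges almost perfectly. In a real room, the echo tail is longer than the filter and the user may be talking, so the achievable reduction is far lower.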

True Stereo-AEC

True stereo-AEC uses two separate reference signals based on the two-channel input.

Each reference signal is used to cancel the echo from its corresponding loudspeaker output.

True stereo-AEC requires almost twice the computational resources of a mono solution, and it requires very low latency within the system to keep all the echo cancellation synchronized within the required thresholds.
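A rough back-of-the-envelope sketch shows why a second reference channel roughly doubles the work. It assumes a time-domain adaptive FIR filter per reference channel, a 100 ms echo tail and a 16 kHz sample rate; these are illustrative assumptions, and real products may use frequency-domain methods with very different costs.

```python
def adaptive_filter_macs_per_second(reference_channels, echo_tail_ms, sample_rate_hz, mics=1):
    """Rough multiply-accumulate count for time-domain adaptive FIR echo cancellation."""
    taps = int(sample_rate_hz * echo_tail_ms / 1000)   # filter length covering the echo tail
    macs_per_sample = 2 * taps                          # one pass to filter, one to update weights
    return reference_channels * mics * macs_per_sample * sample_rate_hz

mono = adaptive_filter_macs_per_second(1, echo_tail_ms=100, sample_rate_hz=16_000)
stereo = adaptive_filter_macs_per_second(2, echo_tail_ms=100, sample_rate_hz=16_000)
print(f"mono:   {mono / 1e6:.1f} million MACs/s")
print(f"stereo: {stereo / 1e6:.1f} million MACs/s ({stereo / mono:.1f}x)")
```

In this simplified model the adaptive filters double exactly. Any processing that does not depend on the number of reference channels stays the same, which is one plausible reading of "almost twice" for the system as a whole.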


Pseudo-stereo AEC

A pseudo-stereo solution is similar to a mono-AEC configuration; it outputs the two audio streams to separate speakers but uses a single reference signal that is a mix of the two inputs.

The mixed reference signal is then applied to each loudspeaker output.

Problems arise when the mixed signal differs significantly from the two output channels, for example a loud track on one loudspeaker and a quiet one on the other, so that the mixed reference signal is not representative of either input signal.

In the example above, the amplitude of the reference signal is significantly larger than the output for Input A. This causes the signal to be drowned out, leading to a very low signal-to-noise ratio for the voice capture process. With Input B, there is not enough AEC when the input is loud, which causes increased artefacts in the captured voice stream and a higher likelihood of inaccurate word recognition.
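The mismatch can be illustrated numerically. The sketch below uses two hypothetical test signals, a quiet Input A and a loud Input B, and assumes the pseudo-stereo reference is a simple average of the two; the levels and the mixing rule are assumptions for illustration only.

```python
def rms(signal):
    """Root-mean-square level of a list of samples."""
    return (sum(s * s for s in signal) / len(signal)) ** 0.5

# Hypothetical stereo programme: Input A is quiet, Input B is loud.
input_a = [0.05, -0.05] * 50                 # amplitude 0.05
input_b = [0.80, 0.80, -0.80, -0.80] * 25    # different content, amplitude 0.80

# Pseudo-stereo: one reference formed by mixing the two inputs (assumed: simple average).
mixed_reference = [(a + b) / 2 for a, b in zip(input_a, input_b)]

level_a, level_b, level_mix = rms(input_a), rms(input_b), rms(mixed_reference)
print(f"Input A level:         {level_a:.3f}")
print(f"Input B level:         {level_b:.3f}")
print(f"Mixed reference level: {level_mix:.3f}")
print(f"Reference is {level_mix / level_a:.1f}x Input A and {level_mix / level_b:.2f}x Input B")
```

With these numbers the single reference is several times larger than the quiet channel and only about half the level of the loud one, so it describes neither loudspeaker output well.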

Choosing the right acoustic echo cancellation solution

The starting point is to decide which acoustic echo canceller you need for your microphone array and audio subsystem.

Using a mono-AEC algorithm with a true stereo device will only work if both channels are very similar. If your stereo product uses the full capabilities of stereo audio with spatial soundscape and dramatic volume changes, then the only solution is one that supports true stereo-AEC.

For devices like smart speakers, where the required range of output is more limited, a pseudo-stereo solution may be a good choice. And for things like kitchen appliances, where high quality audio isn’t required, mono-AEC is ideal.
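The guidance above can be summarised as a small decision helper. This is only a sketch of the rules of thumb in this article; the function name, parameters and categories are hypothetical and do not correspond to any XMOS tool or API.

```python
from enum import Enum

class Aec(Enum):
    MONO = "mono-AEC"
    PSEUDO_STEREO = "pseudo-stereo AEC"
    TRUE_STEREO = "true stereo-AEC"

def recommend_aec(output_channels, channels_differ_significantly):
    """Map a product's audio output to the AEC type suggested in the text."""
    if output_channels <= 1:
        return Aec.MONO
    if channels_differ_significantly:
        return Aec.TRUE_STEREO      # spatial soundscapes, dramatic volume changes
    return Aec.PSEUDO_STEREO        # two speakers, but similar content on each

print(recommend_aec(1, False).value)   # e.g. a kitchen appliance
print(recommend_aec(2, False).value)   # e.g. a smart speaker with limited stereo range
print(recommend_aec(2, True).value)    # e.g. a TV or soundbar
```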

XMOS has a range of solutions to fit whatever product you’re developing. Our XVF3000 series with mono-AEC is ideal for smart panels and smart speaker developers, while our XVF3500 series with two-channel stereo-AEC delivers outstanding performance for smart TVs, soundbars and other products that play back true stereo output.

by Huw Geddes