Beginner's Guide to Automated Voice App Testing


This article was originally published on Botium’s blog on February 3, 2021, prior to Cyara’s acquisition of Botium.

This guide suggests best practices, infrastructure, and tools to ensure your voice app continues to deliver outstanding user experience.

Questions when testing voice apps

Application of the suggested practices helps answer the questions:

  • Is my voice app following the designed conversation flow? Is the conversation flow working as expected?
  • How does my voice app work under real-life conditions? Does it handle low audio quality? Does it handle slow network connections?
  • Is my voice app available 24×7, or are there any interruptions in service?


The Art of Challenging Chatbots

Testing chatbots, especially voice-enabled ones, poses different challenges than testing apps with a graphical user interface. A graphical user interface restricts the possible user interactions to the controls it offers; with natural language, the number of possible user inputs is limitless. Additionally, voice input brings even more variables to take into account: the individual nuances of each voice, the quality of the microphone, the background noise surrounding the speaker, and more. By contrast, when testing a graphical user interface, a button click is always perceived the same way by the application, regardless of who actually clicked it.

The platforms behind powerful voice applications are still evolving and are subject to constant improvements, which means that developers have to rely on components that they do not own and over which they have limited influence.


Testing the Voice Conversation Flow

The open-source product Botium provides you with all the tools required for implementing a comprehensive, holistic test strategy for your voice apps. You can read about Botium and the background on testing conversation flow in the official Botium documentation.

We will use Bring! Shopping List as an example of a voice app to test. It is published as an Alexa Skill, and we can use the Botium Connector for Amazon Alexa with AVS to simulate voice input and output with Botium.

For details about the presented steps and tools please take a look at the Botium Wiki!


Record Test Cases

The quickest way to get started is to use the Live Chat in Botium Box to record your own voice with your microphone. You can immediately see and listen to the response of your voice app.

Depending on the technology of your voice app, the response is shown as text, as audio, or as both.

You can save the conversation as a test case and make some changes afterward.

  • Refining input and output text and audio
  • Using wildcard matching or utterance lists instead of full text
  • Adding test steps or asserters
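To illustrate why wildcards and utterance lists make recorded test cases less brittle, here is a minimal Python sketch. It is not Botium's implementation: the `*` wildcard syntax and the helper names `matches_expected` and `matches_any` are assumptions made up for this example.

```python
import re
from typing import Sequence


def matches_expected(expected: str, actual: str) -> bool:
    """Return True if `actual` matches `expected`; `*` stands for any text.

    Hypothetical helper: the `*` syntax is an assumption for illustration.
    """
    # Escape the literal parts, then join them with ".*" for each wildcard.
    parts = [re.escape(part) for part in expected.split("*")]
    pattern = ".*".join(parts)
    flags = re.IGNORECASE | re.DOTALL
    return re.fullmatch(pattern, actual.strip(), flags=flags) is not None


def matches_any(utterances: Sequence[str], actual: str) -> bool:
    """Utterance list: the response passes if it matches any listed variant."""
    return any(matches_expected(u, actual) for u in utterances)


if __name__ == "__main__":
    # The wildcard absorbs the item name, so one test step covers many items.
    print(matches_expected("okay * is on your list", "Okay milk is on your list"))
    # An utterance list accepts several equivalent greetings.
    print(matches_any(["hello*", "hi*"], "Hi there, what can I add?"))
```

The trade-off is the usual one: the looser the expectation, the fewer false alarms, but also the fewer regressions the test can catch.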


Synthesize Test Cases with Text-To-Speech

Instead of (or in addition to) recording your own voice for the test cases, you may decide to use synthesized voice samples. Botium has its own Text-To-Speech and Speech-To-Text platform, Botium Speech Processing, based on the best open source and cloud engines available.

Test cases now show plain text instead of audio input.


Eliminating Flakiness — Homophone Mappings

A typical problem when testing voice apps is that audio transcriptions, especially for low-quality audio, can be rather unstable. Test automation usually relies on hard facts (fixed text assertions), so unstable transcriptions lead to flaky test results.

In this example, instead of the expected "okay milch ist auf deiner liste" (German for "okay, milk is on your list"), the transcription says "okay milch is auf seiner liste". A difference this small, a single character in a word, is enough to make a test case fail.


Botium provides the option to specify homophone mappings to deal with audio snippets that are often misinterpreted by the Speech-To-Text engine.

Test cases use these mappings to qualify a transcription result as passed or failed.
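The idea behind homophone mappings can be sketched in a few lines of Python. This is an illustration only, not Botium's configuration format or code: the `HOMOPHONES` table and the function names are hypothetical, and the mapping entries are taken from the misheard words in the example above.

```python
import re

# Hypothetical mapping: canonical word -> variants the Speech-To-Text
# engine is assumed to produce for it.
HOMOPHONES = {
    "ist": ["is"],
    "deiner": ["seiner"],
}


def normalize(text, homophones):
    """Lower-case the text and replace known variants by their canonical word."""
    variant_to_canonical = {
        variant: canonical
        for canonical, variants in homophones.items()
        for variant in variants
    }
    words = re.findall(r"\w+", text.lower())
    return " ".join(variant_to_canonical.get(word, word) for word in words)


def transcription_matches(expected, transcription, homophones=HOMOPHONES):
    """Compare expected text and transcription after homophone normalization."""
    return normalize(expected, homophones) == normalize(transcription, homophones)


if __name__ == "__main__":
    expected = "okay milch ist auf deiner liste"
    heard = "okay milch is auf seiner liste"
    print(transcription_matches(expected, heard))  # mapped variants are accepted
```

Both sides are normalized before comparison, so the assertion stays a strict equality check and only the explicitly listed variants are tolerated.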


Testing Real-Life Scenarios

Using your own microphone in front of your laptop might be a good starting point, but in real life voice apps are used differently: on smartphones, on home automation or entertainment devices like Alexa or Google Home, or in a car. To come up with meaningful end-to-end test cases for these scenarios, you will have to make your test data resemble them.

  • Add background noise on various levels
  • Turn the volume up or down
  • Simulate various levels of distance
  • Simulate technical restrictions like GSM phone line or low bandwidth
  • Simulate otherwise bad audio quality like interruptions or various levels of silence
  • … you name it …


In Botium Box you can apply various effects for simulating real-life usage scenarios to your own clean recordings or synthesized audio samples.
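To give a feel for what such effects do to the audio data, the following Python sketch applies three of them (volume change, background noise, a short dropout) to a list of samples in the range -1.0 to 1.0. It is a simplified illustration using only the standard library, not the processing Botium Box performs; all function names, the 16 kHz sample rate, and the use of white noise as "background noise" are assumptions.

```python
import math
import random


def sine_wave(freq_hz, duration_s, sample_rate=16000):
    """Generate a clean test tone as a stand-in for a clean recording."""
    n = int(duration_s * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def rms(samples):
    """Root mean square level of the signal."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def clip(value):
    """Keep a sample inside the valid range [-1.0, 1.0]."""
    return max(-1.0, min(1.0, value))


def change_volume(samples, gain_db):
    """Scale the amplitude by a gain in decibels (negative = quieter)."""
    factor = 10 ** (gain_db / 20)
    return [clip(s * factor) for s in samples]


def add_background_noise(samples, snr_db, seed=0):
    """Mix in white noise at the given signal-to-noise ratio in decibels."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in samples]
    # Scale the noise so that rms(signal) / rms(noise) corresponds to snr_db.
    scale = rms(samples) / (10 ** (snr_db / 20)) / rms(noise)
    return [clip(s + n * scale) for s, n in zip(samples, noise)]


def drop_out(samples, start_s, duration_s, sample_rate=16000):
    """Replace a stretch of audio with silence to simulate an interruption."""
    start = min(len(samples), int(start_s * sample_rate))
    end = min(len(samples), start + int(duration_s * sample_rate))
    return samples[:start] + [0.0] * (end - start) + samples[end:]


if __name__ == "__main__":
    clean = sine_wave(440, 0.1)
    degraded = drop_out(add_background_noise(change_volume(clean, -6), 10), 0.02, 0.01)
    print(len(clean), len(degraded))
```

Chaining the effects, as in the last lines, turns one clean sample into a family of degraded variants, each of which can be fed to the same test case.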


Continuous Monitoring

The recipe for ensuring the availability of your voice app is actually rather simple — all you need is:

  • smoke test for checking basic behavior (for instance, just sending a simple hello to the voice app and listening for a response)
  • scheduler to run the smoke test every few minutes
  • notification mechanism to inform you in case of failures


With Botium Box, everything you need comes out of the box.
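For readers who want to see how the three ingredients fit together, here is a minimal Python sketch of a smoke test, a scheduler loop, and a notification hook. It does not use any Botium API: `send` and `notify` are hypothetical callables you would supply yourself (for example, a function that talks to your voice app and one that posts to a chat channel), and the five-minute interval is an arbitrary assumption.

```python
import time
from typing import Callable, Optional


def run_smoke_test(send: Callable[[str], str]) -> bool:
    """Send a simple hello and pass if any non-empty response comes back."""
    try:
        response = send("hello")
    except Exception:
        # Any error while talking to the voice app counts as a failure.
        return False
    return bool(response and response.strip())


def monitor(
    send: Callable[[str], str],
    notify: Callable[[str], None],
    interval_s: float = 300.0,
    max_runs: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run the smoke test on a fixed interval and notify on every failure.

    Returns the number of failed runs. With max_runs=None it loops forever.
    """
    failures = 0
    runs = 0
    while max_runs is None or runs < max_runs:
        if not run_smoke_test(send):
            failures += 1
            notify("Smoke test failed: voice app did not respond to 'hello'")
        runs += 1
        if max_runs is None or runs < max_runs:
            sleep(interval_s)
    return failures


if __name__ == "__main__":
    # Stand-in voice app that always answers; replace with a real client.
    failed = monitor(lambda text: "hi there", print, interval_s=0.1, max_runs=3)
    print("failed runs:", failed)
```

Passing `sleep` and `max_runs` as parameters keeps the loop testable without waiting for real time to pass.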



Now that you know what is needed for automated testing of your voice app, you can give Botium Box a try, or you can stick with the free and open-source Botium Core.

  • Record your own voice or use a synthesized voice
  • Apply audio effects for real-life simulation
  • Test the conversation flow with Botium

