Assessing Quality of Chatbot Training Phrases for Watson, Dialogflow and Everything Else


This article was originally published on Botium’s blog on September 30, 2020, prior to Cyara’s acquisition of Botium.

This article shows you how to analyze and evaluate the quality of the training phrases for your chatbot intents with Botium. The goal of this analysis is to avoid confusing the NLU agent with training phrases that are irrelevant to the intent they are assigned to, or that are more relevant to other intents.

Botium first generates semantic embeddings of the training phrases with the TensorFlow Hub Universal Sentence Encoder module and visualizes them on a 2D map. From the pairwise similarity of the training phrases, it computes how far apart different intents are on average (separation) and how similar the phrases within each intent are on average (cohesion). Because both measures are based on similarity in the embedding space, they help you identify training phrases that might confuse your chatbot.
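One plausible way to write these measures down is shown below. This is a sketch under stated assumptions, not Botium's documented formulas: it assumes cosine similarity between embedding vectors, and it assumes "distance" means one minus similarity. Here I and J are two intents, p_1, ..., p_n are the embeddings of the n training phrases of I, and |I| is the number of phrases in I.

```latex
\operatorname{sim}(u, v) = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}

\operatorname{cohesion}(I) = \frac{2}{n(n-1)} \sum_{a < b} \operatorname{sim}(p_a, p_b)

\operatorname{separation}(I, J) = \frac{1}{|I|\,|J|} \sum_{p \in I} \sum_{q \in J} \bigl(1 - \operatorname{sim}(p, q)\bigr)
```

Cohesion averages over the n(n-1)/2 unordered pairs inside one intent, while separation averages over all |I|·|J| pairs that take one phrase from each intent.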


Downloading Chatbot Training Phrases to Botium

All data science projects start with slicing and dicing data. Botium includes a Test Case Wizard that downloads the training phrases from your chatbot provider with a single click: IBM Watson, Google Dialogflow, SAP Conversational AI, and more (see the Botium Wiki for a list of supported chatbot engines). Alternatively, you can import your training phrases from one of the supported file formats, including JSON, YAML, Excel, and more (see the Botium Wiki for details).


Analyze Training Phrases

Next, navigate to the NLP Training Quality section and launch the analysis job, which runs in the background. Depending on the size of your training data, this can take a few minutes.


Similarity Visualization

The semantic similarity of the training phrases is now visualized on a 2D map: the closer two points are, the more semantically similar the phrases. Hovering over a data point shows the details for that training phrase. You can show and hide the training phrases of individual intents by selecting or deselecting the intents one by one.
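The article does not say which projection Botium uses to squeeze the embeddings onto a 2D map. As a stand-in, the sketch below uses principal component analysis (PCA), computed with plain power iteration so that it needs no third-party packages. The function names are hypothetical; in practice a library implementation (for example scikit-learn's PCA) would be the usual choice, and t-SNE or UMAP are common alternatives for embedding maps.

```python
import math
import random


def _normalize(v):
    """Scale v to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v


def _cov_times(centered, v):
    """Compute (X^T X) v for the centered data matrix X without forming X^T X."""
    xv = [sum(a * b for a, b in zip(row, v)) for row in centered]
    return [sum(row[k] * s for row, s in zip(centered, xv)) for k in range(len(v))]


def _principal_axis(centered, found, iterations, rng):
    """Power iteration for the leading eigenvector of X^T X, restricted to
    the subspace orthogonal to the axes already found."""
    v = _normalize([rng.uniform(-1.0, 1.0) for _ in centered[0]])
    for _ in range(iterations):
        v = _cov_times(centered, v)
        for axis in found:  # Gram-Schmidt step against earlier axes
            dot = sum(a * b for a, b in zip(v, axis))
            v = [a - dot * b for a, b in zip(v, axis)]
        v = _normalize(v)
    return v


def project_2d(vectors, iterations=100, seed=0):
    """Return one (x, y) point per input vector: its coordinates on the
    first two principal axes of the (mean-centered) data."""
    n = len(vectors)
    mean = [sum(col) / n for col in zip(*vectors)]
    centered = [[x - m for x, m in zip(v, mean)] for v in vectors]
    rng = random.Random(seed)
    first = _principal_axis(centered, [], iterations, rng)
    second = _principal_axis(centered, [first], iterations, rng)
    return [
        (sum(a * b for a, b in zip(row, first)), sum(a * b for a, b in zip(row, second)))
        for row in centered
    ]
```

You would pass in the embeddings of all training phrases and plot the returned points colored by intent. Like any linear projection, PCA can place phrases close together on the map that are further apart in the full embedding space, which is one more reason to confirm suspicions in the similarity tables.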

Hint: Showing the training phrases of all intents on one map at once can be overwhelming. To get the most out of it, first check the following sections for similarities, and then show only the intents in question on the map.


Utterance Similarity

Training phrases in different intents that have a high similarity value can confuse the NLU engine and lead it to route the user input to the wrong intent.
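A minimal sketch of such a similarity check, assuming cosine similarity and a hypothetical cut-off of 0.8 (Botium's actual threshold and table layout are not described here). Each phrase is given as an (intent, phrase, vector) tuple; in a real run the vectors would be sentence embeddings, while the test uses toy 2-dimensional vectors.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_cross_intent_pairs(embedded, threshold=0.8):
    """embedded: list of (intent, phrase, vector) tuples.

    Returns (similarity, intent_a, phrase_a, intent_b, phrase_b) tuples for
    phrases that belong to *different* intents and whose similarity is at
    least `threshold` (a hypothetical cut-off), most similar pair first.
    """
    hits = []
    for (ia, pa, va), (ib, pb, vb) in combinations(embedded, 2):
        if ia == ib:
            continue  # pairs inside one intent are cohesion, not confusion
        sim = cosine(va, vb)
        if sim >= threshold:
            hits.append((sim, ia, pa, ib, pb))
    hits.sort(key=lambda h: h[0], reverse=True)
    return hits
```

Comparing every pair is quadratic in the number of phrases, which is fine for typical intent sets but worth keeping in mind for very large training data.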


Intent Separation

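For each pair of intents, this section shows the average distance between the training phrases of the first intent and the training phrases of the second. The lower the separation, the more likely the two intents are to be confused.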
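The separation of two intents can be sketched as follows, assuming that distance means one minus cosine similarity (an assumption; the article does not define the distance). `all_separations` is a hypothetical helper that scores every pair of intents and lists the least separated pairs first, which are the ones worth reviewing.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def intent_separation(vectors_a, vectors_b):
    """Average distance (1 - cosine similarity) over every pair that combines
    one phrase of intent A with one phrase of intent B.
    Both intents must have at least one phrase."""
    total = sum(1.0 - cosine(u, v) for u in vectors_a for v in vectors_b)
    return total / (len(vectors_a) * len(vectors_b))


def all_separations(intents):
    """intents: dict of intent name -> list of phrase vectors.
    Returns (separation, intent_a, intent_b) for every unordered pair of
    intents, lowest separation (most at risk of confusion) first."""
    scored = [
        (intent_separation(intents[a], intents[b]), a, b)
        for a, b in combinations(intents, 2)
    ]
    return sorted(scored)
```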


Intent Cohesion

Cohesion is the average similarity between each pair of training phrases within the same intent, computed separately for each intent. The higher an intent's cohesion value, the better its training phrases fit together.
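A minimal sketch of the cohesion computation, assuming cosine similarity over all unordered pairs of phrases in one intent. `rank_by_cohesion` is a hypothetical helper that lists the least cohesive intents first; intents with fewer than two phrases are skipped because they have no pairs to average.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def intent_cohesion(vectors):
    """Average cosine similarity over all unordered pairs of phrase vectors
    in one intent, or None if the intent has fewer than two phrases."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return None
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def rank_by_cohesion(intents):
    """intents: dict of intent name -> list of phrase vectors.
    Returns (cohesion, intent) tuples, least cohesive intent first."""
    scores = [(intent_cohesion(vectors), name) for name, vectors in intents.items()]
    return sorted((score, name) for score, name in scores if score is not None)
```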


Improve Chatbot Training Phrases

To improve the quality of the training phrases for your intents, consider the following approaches:

  • Find the phrases in different intents with high similarity in the Utterance Similarity table, and change or remove them
  • For intents with low cohesion, add more meaningful training phrases
  • For intent pairs with low separation, review the training phrases of both intents
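To find candidates for the first step, one complementary check (not a feature the article attributes to Botium) is to flag phrases that are, on average, more similar to another intent's phrases than to the other phrases of their own intent. The sketch below assumes cosine similarity, and `closer_to_other_intent` is a hypothetical helper that takes a dict of intent name to a list of (phrase, vector) pairs.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def closer_to_other_intent(intents):
    """intents: dict of intent name -> list of (phrase, vector) pairs.

    Returns (phrase, own_intent, closer_intent) for every phrase whose average
    similarity to some other intent exceeds its average similarity to the
    remaining phrases of its own intent.
    """
    flagged = []
    for own, items in intents.items():
        for idx, (phrase, vec) in enumerate(items):
            peers = [v for j, (_, v) in enumerate(items) if j != idx]
            if not peers:
                continue  # a single-phrase intent gives no in-intent baseline
            best_other = None
            best_score = sum(cosine(vec, v) for v in peers) / len(peers)
            for other, other_items in intents.items():
                if other == own or not other_items:
                    continue
                score = sum(cosine(vec, v) for _, v in other_items) / len(other_items)
                if score > best_score:
                    best_other, best_score = other, score
            if best_other is not None:
                flagged.append((phrase, own, best_other))
    return flagged
```

A flagged phrase is not necessarily wrong, but it is a good candidate to reword, move to the other intent, or remove.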


Botium provides tools for these steps; see the Botium Wiki for details.