The Path to DevOps: Reducing the Cost of Failure


Part of our work in the Domain Consulting team is advising Cyara customers on their Agile/DevOps journeys. Through this process, we've identified a path to DevOps in the contact center that I'd like to explore in more detail.

In the first post of this series, I talked about putting in the guard rails with design-driven assurance, and in the second, about accelerating the delivery of value. In this post, I'll cover best practices for reducing the cost of failure: building solutions with testing and monitoring in mind, automating deployment and configuration, and using test data in production.


The High Cost of Failure 

It may seem obvious, but failures in the contact center can cause:

  • Lower NPS (and eNPS): bad or sub-par experiences can reduce customer satisfaction and lead to lower NPS
  • Increased Turnover: unstable environments can cause stress for employees (contact center agents, support staff, etc.) and thereby increase turnover
  • Increased AHT (Average Handle Time): longer AHT can result in higher costs per transaction
  • Missed Revenue: if customers can’t self-serve or can’t reach an organization, that organization is probably missing out on revenue

There are five important steps organizations can take to reduce the cost of failure in software development projects.

1. Identify Issues Proactively

Two complementary elements go into proactively identifying issues:

  • Outside-in monitoring: the type of monitoring offered by Cyara or some application performance monitoring (APM) tools, such as New Relic or AppDynamics (which monitor online and mobile applications), and
  • Inside-out monitoring: traditional network monitoring from the bottom up.

We recommend doing both types of monitoring; organizations should continue to do traditional inside-out/bottom-up monitoring wherever possible. Of course, there are situations, such as when you have a cloud contact center platform or cloud IVR, where you may not be able to monitor directly from the inside out. But if you can do it to complement your outside-in monitoring, you should.
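To show how the two views complement each other, here is a minimal Python sketch that combines them into one coarse status. The class, field names, and status strings are hypothetical and are not part of any Cyara or APM product; `None` stands for "no visibility", as with a cloud platform you cannot monitor internally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MonitoringView:
    # Hypothetical inputs: True = healthy, False = failing, None = no visibility.
    outside_in_ok: Optional[bool]  # e.g. a synthetic customer journey passed
    inside_out_ok: Optional[bool]  # e.g. infrastructure health checks passed


def classify(view: MonitoringView) -> str:
    """Combine the outside-in and inside-out views into one coarse status."""
    if view.outside_in_ok is None and view.inside_out_ok is None:
        return "no visibility"
    if view.outside_in_ok is False and view.inside_out_ok is False:
        return "customer-impacting, component fault known"
    if view.outside_in_ok is False:
        # Inside-out is healthy or unavailable (e.g. a cloud IVR).
        return "customer-impacting, cause not visible internally"
    if view.inside_out_ok is False:
        return "internal fault, no customer impact observed"
    return "no issue detected"


# A cloud IVR where only outside-in monitoring is available:
cloud_status = classify(MonitoringView(outside_in_ok=False, inside_out_ok=None))
# A failed internal component that customers do not notice:
internal_status = classify(MonitoringView(outside_in_ok=True, inside_out_ok=False))
```

Neither view alone distinguishes all of these cases, which is why the two are worth running together where you can.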

2. Prioritize Issues Based on Customer Impact

Prioritizing issues based on customer impact is driven largely by outside-in Cyara monitoring (or APM tools). When an issue is identified, organizations can use that outside-in view to understand how it affects customers and prioritize it accordingly. For example, there may be a situation where a backup router fails completely. This is obviously an issue, but does it impact customers? Most likely not. However, if a slow database means a customer's bank balance request is not completed at all, or the request times out and sends calls to agents, that is obviously customer-impacting. Of the two, this is the issue that should receive higher priority. Essentially, it's about giving the ops team visibility into what they need to prioritize based on the impact to customers and consequently to the business.
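As a rough illustration of this kind of prioritization, the Python sketch below ranks the two example issues by a simple impact score. The `Issue` fields, the scoring formula, and the numbers are all assumptions made up for the example; a real score would come from your own outside-in monitoring data.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    name: str
    journeys_failing: int    # customer journeys failing in outside-in tests (hypothetical metric)
    customers_per_hour: int  # estimated customers using those journeys (hypothetical metric)


def customer_impact(issue: Issue) -> int:
    """Illustrative score: failing journeys multiplied by the traffic they carry."""
    return issue.journeys_failing * issue.customers_per_hour


def prioritize(issues: list[Issue]) -> list[Issue]:
    """Order issues so the most customer-impacting one comes first."""
    return sorted(issues, key=customer_impact, reverse=True)


issues = [
    Issue("backup router down", journeys_failing=0, customers_per_hour=0),
    Issue("slow database, balance requests timing out", journeys_failing=1, customers_per_hour=400),
]
ranked = prioritize(issues)
```

With these made-up figures, the slow database ranks ahead of the failed backup router, even though the router failure is the more complete outage from an infrastructure point of view.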

3. Use Monitoring Information to Get to the Root Cause Faster

Organizations can use monitoring to understand which services are up and which are down, but monitoring can also help drill down to the root cause of an issue. Faster access to precise and objective information is key. In my experience, what takes the most time when an incident is open is figuring out what the actual issue is. There may be a situation where contact center agents are saying they can't log in. Maybe they can't log in to their CTI, to Microsoft, or to their soft phone. They don't have the knowledge to diagnose why they can't log in; all they know is that they can't. It is important to put together a monitoring solution that gets the organization to an answer more quickly in this kind of scenario.
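One way to picture this is a set of small checks, one per dependency an agent needs in order to log in, so that "agents can't log in" turns into a named failing component. The Python sketch below is only an illustration: the check names are hypothetical, and the lambdas stand in for real probes that you would have to write for your own systems.

```python
from typing import Callable, Dict, List


def failing_dependencies(checks: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run every check and return the names of the dependencies that fail."""
    failing = []
    for name, check in checks.items():
        try:
            ok = check()
        except Exception:
            ok = False  # a check that raises is treated as a failure
        if not ok:
            failing.append(name)
    return failing


# Placeholder checks; each lambda stands in for a real probe.
checks = {
    "CTI login": lambda: True,
    "Microsoft sign-in": lambda: False,
    "soft phone registration": lambda: True,
}
failed = failing_dependencies(checks)
```

In this made-up run only the Microsoft sign-in check fails, which narrows the investigation before anyone has to interview agents.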

4. Use Automation to Replace Faulty Component(s)

Once the source of the problem has been identified, the next step is to swap out that element as quickly as possible. For example, imagine one of the IVRs is experiencing an issue, such as a problem with how the application server is handling requests. That IVR can be taken out of rotation. As I discussed in Part 2 of this series, a replacement IVR component can then be rebuilt in a completely automated fashion and brought into the pool of resources in minutes. Once the replacement IVR is in place, there is an opportunity to go back and analyze the original problem. This provides the ability to diagnose without impacting customers, and means less stress for the support team(s).
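The sequence described above (take the faulty node out of rotation, rebuild a replacement automatically, return capacity to the pool, diagnose offline) can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the pool is just a list of node names, and `rebuild` is a hypothetical hook standing in for whatever automated build pipeline you use.

```python
from typing import Callable, List, Tuple


def replace_faulty(
    pool: List[str], faulty: str, rebuild: Callable[[], str]
) -> Tuple[List[str], str]:
    """Return the new active pool and the node set aside for diagnosis."""
    if faulty not in pool:
        raise ValueError(f"{faulty} is not in the pool")
    active = [node for node in pool if node != faulty]  # 1. take out of rotation
    replacement = rebuild()                             # 2. automated rebuild (hypothetical hook)
    active.append(replacement)                          # 3. restore capacity
    return active, faulty                               # 4. faulty node kept for offline analysis


pool = ["ivr-1", "ivr-2", "ivr-3"]
active, quarantined = replace_faulty(pool, "ivr-2", rebuild=lambda: "ivr-4")
```

The faulty node is returned rather than discarded so that the original problem can still be investigated after customers are back on healthy capacity.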

5. Emphasize Reviews and Continuous Improvement

This is more of a process element. Ask yourself: how can we build a test into our innovation lifecycle process to avoid issues arising in the first place? Every time there is an incident, the teams should sit down and examine whether they proactively identified the issue. If they didn't, what kinds of monitoring or testing can they put in place to proactively identify the issue next time? And if they did, was there anything they could have done to identify the issue earlier, or a way to get to the root cause faster? This then becomes another feed into an organization's development and testing processes. Understanding what needs to change in the development process allows for better monitoring and earlier identification of issues.

Cyara's Domain Consulting team can help organizations improve their CX development and delivery with tailored advice and solutions. To learn more about how Domain Consulting can help you, contact us, or get in touch with your Account Executive.
