A while back I talked about what MTTR and MTTD are and why they matter in software delivery and across an organization. Today I’ll go a bit more in depth and talk about how to reduce incident resolution time at your company. I’ll start with some general basics, but this article assumes you already understand MTTR and are able to measure it. If you aren’t there yet, read my previous article to get an idea, or I can write another one. Let’s get into it!

How do you reduce MTTR (mean time to resolution)?
Have you ever worked somewhere that deployed once a quarter? I have. It sucks and it’s super risky. On the other hand, I’ve been at places where we pushed to production over 1,000 times a week. The usual defense of the big quarterly release is “But we have 75 people on the call and they’re all paying attention.” Yeah, OK. I’ve been on these calls and I’ve heard people sleeping. Midnight calls suck, and having sleep-deprived people manually deploy large amounts of code with a lot of steps is RIPE for error. Mistakes happen, “short deployments” turn into hours, and you get delayed even further.