April 4, 2019 • 39 Minutes

Building Resilient Systems with Thai Wood

Thai Wood, editor of Resilience Roundup, joins me to discuss on-call, incident management, and the latest in systems resiliency research. We get into incident command structures, how to make it work on a small team, on-call in the Emergency Medical Services (EMS) world and the parallels to DevOps, and a bunch of fun stuff with the academic research on resiliency.

About the Guest

Thai helps teams build more resilient systems and improve their ability to effectively respond to incidents. A former EMT, he applies his experience managing emergency situations to the software industry. He writes about resilience engineering each week at ResilienceRoundup.com

Transcript

Mike Julian: This is the Real World DevOps podcast, and I'm your host, Mike Julian. I'm setting out to meet the most interesting people doing awesome work in the world of DevOps. From the creators of your favorite tools to the organizers of amazing conferences, from the authors of great books to fantastic public speakers. I want to introduce you to the most interesting people I can find.

Ah, crash reporting, the oft-forgotten piece of a solid monitoring strategy. If you struggle to replicate bugs or track down elusive performance issues you're hearing about from your users, you should check out Raygun. Whether you're responsible for web or mobile applications, Raygun makes it pretty easy to find and diagnose problems in minutes instead of what you usually do, which, if you're anything like me, is ask the nearest person, “Hey, is the app slow for you?” and get a blank stare back because hey, this is Starbucks, and who's the weird guy asking questions about mobile app performance? Anyways, Raygun, my personal thanks to them for helping to make this podcast possible. You can check out their free trial today by going to Raygun.com.

Mike Julian: Hi folks, I'm Mike Julian, your host for the Real World DevOps podcast. My guest this week is Thai Wood. He's an internal tools specialist at Fastly and the editor of Resilience Roundup, a fantastic newsletter that I'm a huge fan of. And perhaps really interesting is he's a former EMS professional. For those that don't know, EMS is where most of the Ops world keeps looking to figure out how we can improve our own on-call and incident management procedures. So, welcome to the show, Thai.

Thai Wood: Hey Mike, thanks for having me.

Mike Julian: So for those that really have no idea or frame of reference for what EMS is, what is it?

Thai Wood: EMS is sort of the broad umbrella of what happens anytime someone has a medical emergency, usually starts with someone calling 911. Someone gets sick or injured, they call 911 and the people who show up in whatever capacity are EMS workers and EMS professionals.

Mike Julian: So, would this also be like firefighters, police, doctors, nurses? Everyone's included in that or?

Thai Wood: Yeah, typically it's a big group of people just because who shows up actually, in the US strangely depends on where you live, what state you're in, what your county rules are, and a lot of things like that. So, here in Las Vegas, for example, we actually have three different fire departments, some of which run their own ambulance services, some of them do not. So it can also depend on what area of town you're in.

Mike Julian: Yes. San Francisco is the same way. The San Francisco Fire Department actually, I guess, consumed the San Francisco Emergency Management into themselves, so there's no such thing in San Francisco County as separate EMS. So that's why you hear so many fire trucks everywhere. It's because most of the incidents are not for fires, the city is not always burning down, it's just someone's always hurt. So they send a fire truck and police and more fire trucks. Where I'm from in Knoxville, Tennessee, we actually had separate EMS, which was like they had the same vehicles as the ambulances except they were green instead of red, which was kind of cool. So yeah, that's fun.

So one of the things I really find interesting about EMS is when you look at people responding to incidents, and I'm using incident in a very broad, vague way here, someone gets hurt. And the people responding have just like the most cool, collected nature I've ever seen. And it's always like two people show up and everyone knows exactly what's going on. Meanwhile, the people who have called are freaking out, but the EMS personnel are calm and collected. How do you even get that way when someone's bleeding on the sidewalk, but EMS is perfectly cool about it.

Thai Wood: I think that's a really good question. It's actually something that I feel is missing in a lot of software, which is just a lot of it is experience. The hundredth time that you've been on a similar call, the thousandth time, of course like anything we habituate to it, and it gets easier. A lot of places with EMS, you have an opportunity to practice these things, and to get better at them. Oftentimes, that might be that you do ride-alongs even before you are certified. So, you're already becoming immersed in this world. You have an opportunity to do some clinical hours at hospitals, for example, sometimes, and you get to just be in these different situations. And of course, the first, maybe the fifth, maybe even the 10th, you feel the same way. There's also of course a culture where a lot of times what you're seeing is not in fact the truth.

Mike Julian: Okay. Tell me more about that.

Thai Wood: Well, depending on what it is that you're walking into, I or others might have an internal dialogue that say, “Oh, no, I have not seen this before.” But, we're not doing either of us a service by letting that dictate our outward response. If I'm letting that change my behavior or letting it allow me to be distracted or unfocused, I'm not helping you nor am I helping myself be more effective or be more effective in helping you.

Mike Julian: Yeah, that makes sense. I mean, I've definitely had that situation when I've been on call. You walk in and it's something like, well, everything's exploded, I have absolutely no idea what's going on. But as the senior engineer, everyone's looking to you, expecting that you've seen this before. So you just kind of put on that veneer of, “Nope, I've got this,” while screaming internally.

Thai Wood: Absolutely. And I think that, depending on why it's done, it's actually a good thing. In software, I do tend to question it a little bit. I think it should be okay to say, “I don't know this thing. I haven't seen it before, but I'm going to figure it out,” whereas with … Because you're probably with personnel that you know at least somewhat if you're on call, right? Network operations staff, people you've at least talked to before, which is not necessarily the case with emergency management in the physical realm. These are strangers and you don't know how they're going to react. So your best service to them is to just keep that cool.

Mike Julian: Right. Yeah. It's just really hard to do sometimes or, most the time for me. Man, I do not miss being on call at all. I never liked the idea of having to present that coolness that everyone's looking for.

Thai Wood: I definitely understand that. It's a very visceral, physical experience. And I know a lot of us can habituate, good and bad, to some of this stuff. We know that noise, whatever our PagerDuty alert noise is, we know that noise, right? Or even if you put it on vibrate, the sound of that phone vibrating against your nightstand, you know the difference in what it sounds like to vibrate on your nightstand versus like your kitchen table. It embeds in your brain and there's a very physical, visceral response. And I think we don't give enough credit to that in technology and software, that people are experiencing this. Unlike EMS, where there is a lot of understanding and at least a lot of the companies oftentimes will have staff psychologists, training, things like that to help you deal with this, whereas in software a lot of times people are just like, it's software. But it ignores the human side of this response that we can't help but encounter.

I actually saw a study once, I don't recall the details, but they'd done something like hooked up a cop to an EKG and they put him in kind of like a simulated car, or maybe it was a real car. But they got him all thinking he's on duty and all this, and they get him just sitting around waiting for a call to come in. And they're watching his vitals. And they trigger his console and so he gets this call. And of course he knows it's simulated because he's wearing these EKG wires, but immediately they watch his heart rate skyrocket to about 200 beats per minute, just instantly, just this very visceral response. I don't think it takes being a cop to have that same physical response. We still have the same neural architecture, right?

Mike Julian: Man, my last job that I was on call, when I left the job I was waking up in the middle of the night to nothing. The phone was not going off but I thought it was going off. And inevitably like an hour after I woke up, it would go off. So that really messes with your head.

Thai Wood: Yep, absolutely.

Mike Julian: Here's a very important question, what's your on-call ringtone?

Thai Wood: So for the moment, as we are speaking this right now, I do not have one because I am not currently in rotation for the moment. Typically, I try to change mine sometimes. It's just a thing that I experiment with.

Mike Julian: That's a good idea.

Thai Wood: Yeah, just so I don't habituate to one too badly. It just depends on what I'm shooting for. There were periods where, if I was concerned about missing it because I was spending time with family or I was traveling somewhere where I was away from my routines, and I would want that noise to catch me and help me get grounded in that, I might fall back on something normal, usually just a series of beeps, not too loud. But otherwise, I might just pick something random. I think at one point, it was Vivaldi's Spring.

Mike Julian: I must have really screwed myself because I was using Strong Bad's “The System Is Down.”

Thai Wood: I will admit that I can't help but think of that in my head oftentimes.

Mike Julian: Yeah, I totally see that it was a bad idea in hindsight. So EMS, on-call, as soon as we talk about EMS, it's hard to separate that whole idea with on-call but there's another aspect to it that I find really interesting, which is the incident management portion. I kind of alluded to when I started laying out the scenario I think of when I think EMS. But there are well defined roles in EMS, like when someone responds to an incident, there are certain people doing certain things and they know why they're doing those things and what's expected of them. And this is incident command. I know you and I have discussed this many times over coffee and drinks. But there's something there about a standard incident command structure that you and I spoke of. Could you tell us more about that?

Thai Wood: Yeah, absolutely. So, nowadays, especially in like a post-9/11 world, incident command in most emergency services typically references the National Incident Management System, which uses ICS, the Incident Command System, which is this whole thing set up by FEMA. It started in like the 70s because they were having trouble managing a bunch of fires. Well, you know, it's tough to manage this stuff if you don't have everyone on the same page. Who would have thought?

Mike Julian: Right.

Thai Wood: So eventually it became this national standard of how do different agencies work together? What are the structures they form? One of the interesting things about it is that in EMS you probably aren't thinking about it day to day. You're not showing up to a car accident and going, “As I get out of the ambulance, I am now the incident commander.” You know that if you were to work with a larger group or another agency that you would become the system, but there is this established role both between you and your partner or you and the other responders, you and, as it grows, other agencies, that I think is really helpful, because you know that for the most part, people are going to want this information from you or this is your role. And that plays down all the way even into, for the most part, if you're rolling someone, who is the one that counts off? Who is the one that decides and says, “One, two, three”? It goes all the way down so that everyone can know what to expect, and that provides this common ground for people to be able to work together.

Mike Julian: Okay. So, that sounds like it's really only useful at like large teams though, like when there's a lot of people moving. Is there value in that when say my team's three people?

Thai Wood: There definitely is, I'm glad you asked that, because even though we're not keeping it always in the front of our head, again, I'm not getting out saying, “I'm this,” you are that person. Even the way the system is defined, you actually have essentially instantiated an incident command by responding to the scene. And knowing that, and knowing who's maybe first on scene and is acting in that role, can help, I'd say, even as early as two or three people. I mean, you already know between you and your partner.

But if a cop rolls up maybe from the next county, they already have some idea of probably how to insert themselves into the situation. They have at least some notion of what they might ask of you and what they might not, knowing that you are a form of incident commander or the first person on scene, they might avoid asking you, “Hey, can you go fill out this paperwork?” “No, I'm busy.” There's a lot of things that they're able to just skip over because of that. And I think just having the role as well is valuable to responders themselves because especially in software, I think that people get put on-call with all sorts of diverse backgrounds. And very rarely is there more code that they could have learned or something to make them better responders, right?

Mike Julian: Yeah.

Thai Wood: Being a better software engineer, being a better Ops person doesn't always make you a better responder. But learning about and then participating in incident management structure can help make you a better responder and feel more prepared. I think there's a gap between when people, they're great at their jobs, they're good at their code. They're operating the infrastructure. But that doesn't always translate to, “I know what to do when the pager goes off.”

Mike Julian: Yeah. Let's dig into that a bit. So, say I'm an Ops engineer on a small team. I've got three or four people on my team. My team is pretty good at doing Ops. But incidents are challenging. How can we get better? What are some concrete steps we can take?

Thai Wood: So, the number one thing I recommend to everyone is just try to stay calm. It will be difficult, and that's okay. It's not a personal failure or anything like that. Just try to stay calm. And as we used to say in EMS, don't become part of the emergency.

Mike Julian: I get what you're saying.

Thai Wood: Yeah. If you're running around all over the place, at least in the physical world, and I'm speeding to an accident scene and I get hurt, well, now they have to send two people. It makes you less effective. So I'd say just take a moment, try to remain calm. That's not easy for everyone. That ability differs through a lot of people, and that's okay. It's like the number one thing I always tell people. It's also important for you to be able to have that space to actually be effective. After that, I would say just defining some of those roles that we've talked about, incident command and what does that mean to your team. You don't always have to follow this big federal guideline, but having something that you and your small team agree on in advance, so that in the moment everyone knows their role instead of trying to figure it out at the worst possible time.

Mike Julian: Yeah. In fact, I would say trying to adopt the federal guidelines in a small team or even right out of the gate, no matter the size of your team, would probably be detrimental to your entire effort, trying to take on so much all at once. It's all new. So you're going to screw it up, and that's expected and that's fine. But to say, we now have 10 different roles. And by the way, we've never formalized incident management here before, I'm sure that's going to go over great. How would you expect that to go while we were running software?

Thai Wood: Yeah, absolutely.

Mike Julian: It seems to me that you should probably start with a couple roles. And to me, the thing that I found the most valuable, and I would love to hear your feedback on it, is the first role I've always wanted to implement anywhere I start doing formalized incident management is communication liaison. I don't care about anything else except for that, because that frees up people to work on the incident and defines who is doing the communication.

Thai Wood: Yeah, that's really helpful. Knowing who's doing communication and what communication is expected, I found, as you have, very effective. Also, asking that question allows you to go, “Wait, why are we giving updates to our CFO when we're maybe not able to train her or not able to do anything in this moment or why is he or she asking for updates every five minutes when this is a 15 minute process?” Asking some of these questions about roles also helps you reconsider what your knee jerk response might be. Well, people are popping into the channel, I'm just going to answer my status. Saying “Well, if we have someone in charge of communication, not only are they answering it but then we get to define what is it that they answer? What communication is it that they provide?”

Mike Julian: Yeah, absolutely. So when you start overhauling incident management, is there a different role that you'd like to go for first or do you also fall in line, go for communications?

Thai Wood: Usually, I like to set up some form of incident commander or a notion of someone being first on scene, and then followed by communication. I typically find that in addition to having a role, the practice of communication and defining how it is that we, if I'm working with them, how we as a team, communicate. And often that's things like closed loop communication. If I tell you something, then I'm going to expect that maybe you're going to repeat it back to make sure we're on the same page, or you're going to acknowledge it. And then that way, I know that if you don't, you didn't hear me, you're very focused doing the thing you're doing and as you should be. But just techniques and frameworks like that, in addition to the roles, really with an incident command role, and deciding who's doing communication, and then working on how it is that you communicate, I think gets a lot of teams a huge leap forward from where they start.

Mike Julian: Yeah. One of the big worries or anxieties I've seen people have with the incident commander role is the first on scene is incident commander, but that doesn't mean you're always incident commander. Someone else could come on scene and you can hand that off, and that's fine. So the idea that if I'm on call, and I'm the first ops person to respond to an incident, and now I'm running the incident, what if I'm a junior engineer? I'm kind of terrified of that whole thing that I'm directing these people who are way better than I am? But that's actually fine because the role of the incident commander is not to be the best at solving the problem, it's to understand who's doing what, and make sure that everyone's on track.

Thai Wood: Absolutely. I was on call for a very big E-commerce organization, you drive past their stores, over the holiday, and that was something that I've noticed in this area as well is just having someone to coordinate. I mean, in this case, I think we capped just over 50 people on a particular bridge.

Mike Julian: That's a big bridge.

Thai Wood: But yeah, that scale, there's a lot of things that really don't have to do with your seniority. Jane says, “I'm going to go investigate this thing.” And then 10 minutes later, no one's heard from Jane. Is she still on the bridge? Did she get disconnected? Is she making great headway? Is she seeing amazing things that might be revealing to us? Having someone, again, managing that incident is able to say, “Hey, Jane, can you go look at that, please, and come back to me in 10 minutes, or give us an update in 10 minutes, and then we'll go from there.” Or, someone might pipe up and say, “I'm actually really stuck here.” Well, with a group of 50, you tend to get the bystander effect, which is that in large groups, people don't individually tend to take action. Having an incident commander role allows that person to overcome that a bit and say, “John, I heard that you're stuck on this. What is it that you need from the group?” And then that can help bridge some of that gap.

Mike Julian: Right. So one of the things that … So I wrote a little bit about incident command in my book, and the thing that I found most valuable for determining who is an incident commander in small teams when they're first adopting this sort of stuff is to intentionally not allow managers to be incident commanders, because then you end up with these blurry lines of like, is this the manager telling me to do this or else, or is it because they're in their role of incident commander? So actually I really like having managers, like a manager of a team, be communication liaison. I mean, managers are really good at communication generally, and they know all the players already and can more politically massage a hard message. But having them be incident commander, it sets up some weird incentives to me.

Thai Wood: That's interesting. I think it can, depending on the team culture for sure. I do find that some managers are much better at being that communications role than anyone else tends to be, and at least off the bat without more training. I do like it when even if it's in a simulated incident, a war game, a tabletop scenario of some sort, that managers do participate in incident command, at least in those areas, just so that they can retain appreciation of what the job is.

Mike Julian: Yeah. Agreed. So, I want to completely switch gears here. You're the editor of the Resilience Roundup, which I keep calling Resilience Weekly, but it's called Resilience Roundup, which is what, like resilienceroundup.com?

Thai Wood: Yes.

Mike Julian: Yeah, there you go, resilienceroundup.com, fantastic newsletter. What I love about it is that it's not just a link roundup. It's actually significantly more than that. You have a unique take on things. Tell me more about that.

Thai Wood: So I started this just by talking to some folks. And I'd already, prior to this, started seeing a big overlap in a lot of different fields, like my past experience, and then learning somewhat about how NASA does things, how pilots do things, and seeing a lot of overlap in things that we could learn in software. I had a chance to go to Paul and Mary's really great REdeploy Conference and talk to a lot of people there and hear their different takes on resilience. And there were so many resources. I walked away with a lot of people saying, “You should read this, or you should read this, you should read this.” And I think I filled up like half of one of those little notebooks of not even just notes, but just title and author, title and author, title and author all the way down. And I got home and I went through these, and some of them were 500 page books. Some of them were 30 page papers and man, this is just a huge kind of uphill battle to get through some of this.

And so I had this thought that well, if I'm going to do it, why don't I share it so that maybe the next person doesn't have to? But also I don't want to keep them from forming their own conclusions. So that shaped the format, where I will try to give you a good, not really summary, but my take on it, how I think it can be useful, what I see as some actionable takeaways, in about 10 minutes' reading time. But if you want to, I purposely pick articles that are accessible, not behind a paywall, so if you want to dig into that 20, 30, 40 page paper, you absolutely can. But it all started with this idea of if I'm going to read this and I'm going to form these conclusions and I'm going to do this, I really want to be able to help others do it as well.

Mike Julian: Yeah. The prospect of reading a couple dozen academic papers on resiliency or incident command or any of these things, it's just daunting to me. But I've read a few, and there's some fantastic stuff to be had from it, but it's just so hard to pull it out. And then once you do get it all pulled out and you get concise points, then you have to figure out how am I actually going to apply this to ops and software. So I'm really glad that someone's doing it. Resilience lately has kind of felt like the new buzzword. It seems like resilience is the new reliability. But I think that's wrong. That doesn't feel correct at least. I know you have a lot to say about that topic since you are now reading so many papers, all the papers. So resilience, resilience at a conceptual level, what is it?

Thai Wood: So I think you're right that, at least in language, it does seem to be trending toward a lot of the processes that have made other things buzzwords. But as a concept, resilience engineering comes from a bunch of disciplines, like human factors research and cognitive systems engineering and is sort of this label that had been developed partly from looking at biological ecosystems and all these different things. And has essentially come to … I mean, what most people mean when they talk about it is looking at systems and where they can continue to adapt and respond. So not just say, we might think of reliability as my engine in my car, those pistons are going to go for 300,000 miles if we're lucky, and they're very reliable. But if you were maybe to put them in a different case, that is not the normal operation envelope, how does it respond? And I think the term for this in the research tends to be adaptive capacity. Does it still retain some capacity to be able to respond to whatever the different situation is?

Mike Julian: I've been trying to figure out where John Allspaw got his company name from and well, there you go, Adaptive Capacity Labs. I had no idea. So, it is pretty buzzwordy at the moment. Hopefully that will improve in the coming years, but we'll see, because observability is at about the same point. I think perhaps really interesting is observability being from like 1960s control theory. Resiliency is also about as old; we've got research going back to the 60s, if not earlier.

Thai Wood: Yeah, absolutely. It wasn't always called that at the time, but as different disciplines developed and started to see overlap, again, biology is a big point in this. David Woods has a great paper. David Woods has a lot of great papers, but in particular, he has one that he put out recently, and in it, he talks a lot about biological systems, and how some of our ideas of system performance are drawn from and reflected in biology. So as a result, yeah, some of this research is pretty old, and it's still relevant, whether that's accident research or certain things about human cognition. We're not changing as humans that quickly. Even though the tools that we're interacting with are potentially changing very quickly, we as the operators are not really. We're still facing those same human either limitations or … The good thing about being human is that we can have these intuitions and adapt to these scenarios.

Mike Julian: So let's talk about resilience. Why is the ops in the software world suddenly talking so much about this? What's the point of it? Why is it suddenly so interesting or so valuable?

Thai Wood: So I think it is most valuable primarily because of the question that it asks about adaptive capacity, which is that a machine just sitting there in a rack, blinking at you, itself does not have adaptive capacity. Looking at the world of ops and software and incident response through this lens of resilience helps us realize that people are often still, and always have been, very key to how these things work. Whether or not the systems fail, and how often, and whether or not we're able to support them. Humans are the key in that, in this industry where we, I think, are kind of seeing a pushback from an era of saying, “Well, we'll just automate everything away. We'll just take people out of the loop. And eventually AI will just fix it.” I think that this is kind of a natural pushback as well to a feeling that a lot of us have experienced, like, wait, that's not really how it's working. And the research has actually looked at this and started to support it for at least a couple of decades, that actually adding automation makes things harder on humans.

And I think we've just reached a point with such complex systems being so accessible, because at certain points in time, we wouldn't have the large number of complex systems that we have, at least from an internet point of view. So we're building more and more complex systems. And there was a period where we were seeing more and more of, “We'll just automate things and it'll be fine.” And so I think that culminates in this inclination to say, “Well wait, how can we keep having this ability to adapt? How can we encourage it? How can we find it? How can we learn more about it?” And fortunately, these researchers have been trying to answer this question for decades, looking at pilots, NASA, firefighters, all these different things, and just using this window, wherever they can find it, to try and extract these things for us.

Mike Julian: I saw a take on this a while back, just a short quip that complex systems are working because of the humans not in spite of the humans.

Thai Wood: Yes, absolutely. And the research does, for the most part, bear that out. There is no amount of automation, at least as we speak today that can really fix these problems. As often gets quoted as well is that, it's not surprising when it fails, it's surprising that it works at all. And that's because of, as we all know, of the people behind the scenes just day in day out doing their normal work in kind of the trenches, keeping all these things running.

Mike Julian: I think that the quote you just referenced there came from a fantastic paper from … I first read it in the Web Ops book, years ago, How Complex Systems Fail, that was what it's called. I forget the author. Who wrote that? Do you know?

Thai Wood: Yes, and I really think everyone should read it.

Mike Julian: It is a wonderful paper. It's available freely online.

Thai Wood: It is. It is a short paper, even though it's been printed in books. It's by Richard Cook. I strongly recommend most of his research, but-

Mike Julian: I mean, it's like six pages long. It's pretty quick read.

Thai Wood: Yeah, How Complex Systems Fail is just a list format. So it's a really great intro to a lot of this stuff. And I think as ops and software people, you can't get through very far of that without nodding your head. I don't think I've ever seen a single person who works in these areas who doesn't read this and you either hear, “Ah, yes,” or you just see them nod their head.

Mike Julian: Yeah. Where I first learned about it, where I first read it, it was in John Allspaw's Web Ops, Web Ops something, I forget what the book title was. But Web Ops is a really great book and it's pretty old now but, surprisingly, is still applicable. And one of the things is, the only paper in there was Richard Cook's paper. And yeah, every time I go through it I'm like, “Yep, that's software in a nutshell.”

Thai Wood: Yep. It's actually printed out just to my right over in my office as a reminder that I do revisit sometimes. It's just such a great summary. And I think it's easy to forget some of the points as we focus on these different areas, so I just have it out to remind myself.

Mike Julian: So we often see resiliency talked about in like right alongside chaos engineering. What's all that about? Why are the two going hand in hand in conversations?

Thai Wood: So I think most of that is because I see it, at least as chaos engineering is built on a lot of the same principles of resilience. Chaos engineering is, and their tools are a kind of response to the things that resilience engineering is teaching us or is … It's kind of a subset of that same world? And because of-

Mike Julian: So we could call it like applied resiliency?

Thai Wood: Right. Absolutely. So, if the research tells us that we are able to build nowadays, systems that are so complex that we cannot predict all the system interactions, we can look at individual components in a system and try to assess if they're safe. But that doesn't prevent us from having system accidents where there are interactions between components we could not have predicted. And the systems we're building are so complex, the answer isn't to get better at predicting, because we can't, so I think chaos engineering is an answer to that. Well, if we can't predict it, what if we just cause it and watch what happens? Now we don't have to predict it.

Mike Julian: I love that take on it. I never really considered it that way, but you're completely right. Well, Thai, it's been fantastic talking with you. Where can people find out more about you and your work?

Thai Wood: As you said, resilienceroundup.com. I have past issues and articles there. People can sign up, and every Monday I'll send you something to read in this area.

Mike Julian: All right. Well, thank you so much. And to all our listeners, thank you for listening to the Real World DevOps Podcast. If you want to stay up to date on the latest episodes, you can find us at realworlddevops.com and on iTunes, Google Play, or wherever it is you get your podcasts. I'll see you in the next episode.

2019 Duckbill Group, LLC