Probably the most common question I received when I told people I was writing a book about monitoring was, “Have you read James Turnbull’s book?”
I’m putting that to rest with a delightful conversation with James Turnbull on a variety of topics, including which of his own books is his favorite, some not-so-subtle digs at Kubernetes, and why James thinks DevOps is dead.
About James Turnbull
James Turnbull is originally from Australia but now lives in Brooklyn, NY. He likes wine, food, and cooking (in that order) and tattoos, books, and cats (in no particular order).
He is a CTO in residence and leads startup advocacy at Microsoft. Prior to Microsoft, he was the founding CTO at Empatico. Before that, James was CTO at Kickstarter, VP of Engineering at Venmo, and in leadership roles at Docker and Puppet. He also had a long career in enterprise, working in banking, biotech, and e-commerce. James also chairs the O'Reilly Velocity conference series. In lieu of sleep, James has written eleven technical books, largely on infrastructure topics.
Mike Julian: This is the Real World DevOps podcast and I'm your host, Mike Julian. I'm setting out to meet the most interesting people doing awesome work in the world of DevOps, from the creators of your favorite tools to the organizers of amazing conferences, or the authors of great books to fantastic public speakers. I want to introduce you to the most interesting people I can find.
This episode is sponsored by the lovely folks at InfluxData. If you're listening to this podcast, you're probably also interested in better monitoring tools and that's where Influx comes in. Personally, I'm a huge fan of their products and I often recommend them to my own clients. You're probably familiar with their time series database InfluxDB, but you may not be as familiar with their other tools: Telegraf for metrics collection from systems, Chronograf for visualization and Kapacitor for real-time streaming. All of these are available as open source and as a hosted SaaS solution. You can check all of it out at influxdata.com. My thanks to InfluxData for helping make this podcast possible.
Hi folks I'm Mike Julian, your host for the Real World DevOps podcast. My guest this week is James Turnbull. You probably know James from his seeming inability to stop writing technical books such as Monitoring with Prometheus, The Art of Monitoring, The Terraform book and like a bajillion others. He has also worked for some pretty neat companies too, like Puppet, Kickstarter and Venmo and now he works at Microsoft leading a team as CTO-in-residence. Welcome to the show, James.
James Turnbull: Hi Mike.
Mike Julian: I'm really curious like what is a CTO in residence?
James Turnbull: I guess my primary mission is to make Microsoft relevant to start ups much the same way that Microsoft is shaping its relevancy towards the open source community. We're also interested in looking at other audiences that we've traditionally not been involved with and so it's just one of those.
Mike Julian: Gotcha. And you're just leading a team of people that are focused on that sort of stuff?
James Turnbull: Yeah, so most of my team is people who've come from startups, and particularly from engineering management leadership roles in startups. One of my colleagues is ... was the CTO of SwiftKey and another one, fairly famous, Duncan Davidson, who wrote Tomcat and Ant and has been around engineering management for a long time, and folks like that who really are here to help startups understand a bit more about how to grow and scale. And I think some of the big challenges startups have are actually not technology related at all. They're really about, you know, how do I build a recruiting process? You know, I had 10 engineers last week, I have 100 this week. How do we structure the team? So we've sort of brought together a group of folks who have fairly deep experience in those sort of problems for startups and have sort of a deep empathy for the startup community.
Mike Julian: Yeah that's quite the task ahead of you.
James Turnbull: Yeah. Look, I think, I mean, Microsoft has traditionally been known as an enterprise software company. You know, a lot of startups are not sure of our relevance to them. I think increasingly we're seeing traction, for two reasons. One is that obviously Azure is one of our focuses, and the cloud platform in there. And that platform is looking more broadly at not just enterprise audiences but other groups. And secondly, a lot of startups ... Microsoft's deep in the middle of most of their customers. So particularly if you're a B2B-based startup, something like that, and you're, you know, you're trying to sell into enterprise business, Microsoft has been doing that for 30 years. They have all the connections and account managers and sales folks and, you know, multimillion dollar relationships with some of the people you want as customers. We can provide you with A, some of those connections, but also a lot of advice and expertise about how to sell to those customers.
James Turnbull: And having worked at both Empatico and Docker, you know, a large part of my job was attempting to sell, you know, as a small startup, as a fairly early employee at both, into big companies. You know, you can't walk in the door of a Wall Street financial firm if you're a 30 person startup in Portland, Oregon without having a pretty credible story. So I'm happy to sort of help startups do some of that messaging and understand how to have some of those conversations.
Mike Julian: Yeah. It's one of the interesting things about my own company is my clients are all these large companies too and I'm a two person company, but selling into a very large company is not ... it's nothing like selling it to a small company. Everything works differently. People think about their jobs differently.
James Turnbull: Yeah.
Mike Julian: Yeah. I think maybe my most favorite thing of everything you just said is this isn't your daddy's Microsoft. The Microsoft we all grew to know and hate is not today's Microsoft at all. Not by a stretch and that's just absolutely incredible to see that turnaround.
James Turnbull: Yeah. I got a LinkedIn request from somebody yesterday and the message said, you know, I've read a bunch of your books and, you know, I've used a bunch of different sorts of things you've worked on. I was really surprised to see you at Microsoft. And I was like, okay, this could end badly in the next couple of sentences, because I've certainly had a few people of my generation who remember the bad old days and “Linux is a cancer” and things like that. And he finished with, you know, it's really interesting to see companies grow and change. And I was like, wow, okay, that's a ... I thought that was going to go really badly but I think it's a fairly accurate reflection.
Microsoft is aware of the fact that, pragmatically, that was not a good business position to be in. The world is changing. It's moving towards the cloud, you know, the stacks in people's companies, the way they manage things, the infrastructure, the software, you know, things are changing. And I hesitate to say this conclusively, but I think open source won, you know, for certain values of "won", given recent sort of events, discussions about large corporates and their contribution to open source, but as a technology choice, it's pretty clear to me that open source won. And I'm kind of a bit smug about that to be honest.
Mike Julian: Speaking of your books I would just straight up say you're the one that got me into monitoring and you kind of did this unknowingly, like we hadn't met until a couple of years ago, but in 2006 I guess it was, you released a book called Pro Nagios 2.0 and at the time I was working for a very small private school and someone said, hey, we've got like these couple hundred printers and they keep going offline and like, you know, we should probably know when that happens. So I'm like, well I don't know how to solve this problem. So I started googling around and find this thing, Nagios and then find, oh hey, there's a book on it. So I bought the book and like that ... I learned about monitoring that day. Like that's ... Pro Nagios 2.0 is what got me into that and this whole time like it kind of started my career, which was really cool. So thank you for that.
James Turnbull: You're welcome. As I said to you before we started, I feel like apologizing too because I can't remember anything that's in the book and I think that it's probably acting as a monitor stand for a lot of folks, and I'm pretty sure that my ideas about monitoring were very embryonic, but I really appreciate that. That's always exciting to hear, when someone actually is like, this was really helpful. Because none of us are ever going to be John Grisham, right? We aren't in this for the money, and those conversations where someone reached out and said, I read your book, it really helps, or even, I read your book, it didn't help and here's why, I'm like, that's awesome. I'm glad somebody got, you know, got something out of it and had some feedback. And so yeah, if anyone who's listening has ever had the urge to tell me what's wrong with things or what went well in the book, my email is easy to find, feel free to shoot me an email. Always happy to chat.
Mike Julian: Yeah, absolutely. Like one of the things that authors don't get very often is feedback, positive or negative. I really expected to get a lot of hate mail for the stuff I wrote in my book and it just didn't happen. I was very disappointed.
James Turnbull: I actually think that ... I was thinking about your book the other day and I, when we exchanged some emails and I think your timing was very good. I think people are waking up to the fact that monitoring was evolving and I think that what you had to say was not only very timely, but for me it solidified a bunch of different ideas that I had that I'd previously sort of, you know, I could talk about in abstract or you know, in the solid sort of way. I think the first couple I would strongly recommend people should read the first couple of chapters of your book.
Mike Julian: Those were my favorite to write.
James Turnbull: Yeah, they are one of the better summaries I've read of, I guess, more modern monitoring.
Mike Julian: Well, thank you for that.
James Turnbull: Yeah well it's a topic that I think you, I and about 200 people care about but we all do.
Mike Julian: When I was ... I'd tell people, like, hey, I am writing this book on monitoring, and their first response was almost invariably, have you read The Art of Monitoring?
James Turnbull: Oh dear.
Mike Julian: Do you really think I would set out to write a book without having read every book on monitoring there is? Like that I was somehow unaware of one of the most popular books out there, so I'm like, you know what? I'm going to have him ... I'm going to have James do a blurb on the back of the book and solve that problem forever.
James Turnbull: That was a good plan. I was looking through my Amazon history the other day when I was doing my taxes and I can tell when I'm working on a new topic 'cause I have literally bought every single book on that topic. Not just writing a book, but I decided the other day that I should ... I gave Rust a stab last year and I didn't get a chance to do anything with it and I thought I'd give it a stab again and I thought, oh, I'll buy a couple of books and see what they're like, and so I can see the pattern of here's some Rust books, and then a few years ago here's some Go books, and a few years before that here's every book about monitoring, which is not actually a large portfolio but there's enough around.
Mike Julian: It is much smaller than people would think.
James Turnbull: I bought a bunch of books around the same time on systems theory and stuff like that because I was struggling to find adequate ways to talk about monitoring as a construct, and I realized that the maturity of the vast majority of the conversations out there, you know, on any technical topic, there's sort of a spectrum. At the very bottom of the pile, a Hacker News comment thread is like the worst case scenario, and then there's like a few Stack Overflow answers, and then there's maybe a detailed blog post that's going to explain how to use something, and then maybe there's somebody having an opinion about design or the language aspect of some language, and then there's like a computer sciency, somebody's-thought-about-things document. And monitoring is very heavily stuck at the blog post end of that spectrum.
Mike Julian: Yeah, I completely agree. Like when I was trying to find higher level thoughts on it, they're just not there. The conversation, I think the level of conversation has started to shift in the past couple of years and that's awesome. Like, I really want to update my book now because of all the stuff around observability coming out has changed the conversation dramatically. One of the interesting things is I never even used the word observability anywhere in my book.
Mike Julian: Like it just wasn't ... people didn't ... people weren't talking about that way so I didn't talk about it that way either.
James Turnbull: Yeah, I was having this conversation with Baron Schwartz, who makes some software for database observability, database monitoring, and Baron is super smart and very much more of a computer sciencey person than I am. I realized that there was a bunch of stuff in his way of thinking that, you know, he was one of the handful of people that had taken commentary about monitoring and observability further than just, you know, scratching the surface. I thoroughly recommend a couple of short things he's written and his blog posts, which are really interesting reading. I learned stuff that was more high level and useful and overarching than I had previously seen.
Mike Julian: So I want to shift topics a little bit. You and I were talking before we started recording about DevOps and I will start off with a very provocative statement. DevOps is dead. What do you think?
James Turnbull: I think I agree. I was involved in the very early days. I was trying to look before at when I wrote my first blog post on DevOps and I think it's like 2008 or 2009, and I think I went to the second DevOpsDays. I didn't go to the first. I think probably, and I take some responsibility for this because I worked for a company that sold a DevOps tool, but I think the first time a marketing person A, categorized DevOps as being about tools and B, used it as a somewhat abstract rallying cry, a marketing rallying cry rather than a cultural statement, that's when the first knife was sort of stuck into the entity, as it were, and I think, yeah, I think I would agree now.
Mike Julian: Tell me more about that. Like what do you mean by that knife going in? Like is it really that marketing killed DevOps?
James Turnbull: I mean, to be honest about marketing, I think it's probably a factor. To me, DevOps was almost nothing about tools; tools to me were enablers for folks doing DevOps things. To me, the big thing about DevOps and the thing that really struck me when I first started thinking about the concept is that I've been doing engineering things for 25 years now. I feel really old, and a significant part of the scars that I have from those experiences are from being on either side of the conversation, being the developer of a bit of software or the operator of a bit of software, where I've been in conflict with the other party because, you know, we either didn't talk about how they built it or they didn't understand the environment that they were deploying it into.
And most of those conversations happened at three o'clock in the morning on a conference call where a vendor is screaming at us because some mission critical piece of infrastructure is down and they're losing money. And to me that was ... that's been a really ... that was a really scarring experience. And to me, DevOps was about solving that problem. It was about having conversations with the people we work with and going, you're building this thing, here's an idea of what it looks like in production. You know, and by the way, can we make sure that we do this, this, and this to ensure that we care about security and monitoring and, you know, backup and recovery or whatever it happens to be, and, you know, create that sort of bridge between those two disciplines, which really hasn't existed for most of my career.
Mike Julian: Yeah, completely agree with all that. What do you think about SRE? Like has that changed things too? To me I feel like SRE is also kind of a marketing label.
James Turnbull: Yeah. I mean I know a lot of people out at the Google SRE organization and I deeply respect the work that they've done. Yeah and it's definitely ... there are definitely a lot of solid ideas in like the SRE book is an example of ... I actually ... it was very ... responses to the SRE book were very polarizing, let's put it that way. It was very … I quite liked ...
Mike Julian: That's a very polite way of putting it.
James Turnbull: I quite liked it and I thought it was really useful. What I'm really sad about was that it wasn't published in 2005.
Mike Julian: Yeah.
James Turnbull: When it would have been actually life changing to a bunch of people. A bunch of us who worked in the high volume, high value web facing world. It's a solid ... the SRE program at Google is a solid platform. Not everything applies to everybody, you know, the classic refrain of, you know, you're not Google. I think that needs to be reemphasized a few times. Not everybody has Borg. Like there's definitely a flavor to it, but I can't deny the fact that a part of the reason it was released was definitely as a marketing aid to Google's recruiting in the SRE organization, and there's nothing fundamentally wrong with that, but it needs to be acknowledged as one of the origins of that movement.
Mike Julian: I observed a conversation happen recently in a Slack where someone released a series of articles, fantastic articles, and they were referring to an important measurement as a KPI and someone responded just like, why didn't you call it an SLI? It's like, well, because an SLI is something that essentially Google came up with. We've been using the term KPI to mean an important metric for, I don't know, decades and the term SLI is less than 10 years old.
James Turnbull: Yeah. I mean but, you know, this is one of those things like the ... every generation reinvents the past, right? You know, I'm going to say something controversial here. I look at the way Kubernetes is configured and I look at the sea of YAML files that I'm expected to poke my way through. There's still some tooling around that and I'm like, did we learn nothing from the horror that was configuration files? I mean it could be worse. I was having the argument the other day, it could be worse, it could be XML, but I'm like, so I could be stabbed in the front and the back. But it feels very ... it feels weird to me that we, you know, there's a bunch of lessons we haven't learnt and a bunch of things that we have reinvented the wheel about.
So, you know, I kind of ... I'm not really fussed about the terminology people use. I'm not even fussed about sort of recognizing that there's a past history there, except to hopefully learn from it, as long as people take it on board and go, you know, in the case of SLIs and KPIs, it's like you have a customer, they have a measure of how successful they are, you know, that should mirror your measurement of the functionality of your infrastructure or the thing you look after for them. And if they have that sentiment, I don't care what they call it, you know, an SLA, an SLI or a KPI. Yeah, I think debating about that is a funny one.
Mike Julian: So you mentioned that the sea of configuration files and Kubernetes, which is absolutely wild. Like yes, it's like we didn't learn anything at all. That brings me to do we even need DevOps anymore? Like on one hand we're still making the mistakes that we used to make, on the other things are very different than they used to be.
James Turnbull: Yeah. I think that there is no yes or no answer to that statement. I think it's a bit more nuanced. Obviously there's a bunch of things we haven't learnt and I, you know, I overheard a conversation at a KubeCon a couple of years ago where two fairly ... I would say 20 something looking engineers were talking about the fact that it'd be so much easier if there was some sort of templating system for configuration. It would make it so much easier if we could build templates and stuff. And I was ... I have no hair anymore, so I wasn't pulling my hair out, but I was mentally doing it. I thought, don't say anything, James, you'll look like an old fart. Like just turn around and walk away, go to the bar, have a quiet drink. But, so yeah, definitely we need to ... we should learn from the things that came before us to make the experience of the people maintaining these systems at least as good as if not better than the experiences we had.
But that being said, Kubernetes is an example of how far up the stack we've moved. You know, back in the day I spent a lot of time worrying about Linux kernel modules and package management systems and iptables and stuff like that. To a large extent those are not skills that are relevant to a lot of contemporary engineers who are working on, say, container based systems, because that's all a black box to them. It's taken care of under the covers, for good or ill; you know, they're running a Kube cluster on top of a machine that they may not even maintain or may not even know anything about. So perhaps some of the problems we had in the past might not exist anymore, perhaps. I don't know, I've ... never been a huge fan of black boxes either, so.
Mike Julian: It seems to me that we've ... we have moved some problems around, like some of the problems are still there, we just don't see them anymore, or like we've made them someone else's responsibility. Like with, say, Amazon or Azure, pick your cloud provider of choice, I don't have to care about the network anymore, except I kind of do, but it's an entire black box to me, so when something is kind of hinky, I can't really do anything about it anyways.
James Turnbull: Yeah.
Mike Julian: So there's that whole discipline of network engineering where a lot of systems people were also amateur network engineers, and are not anymore.
James Turnbull: Yeah, I think there ... I mean the argument the cloud providers make, and I think it's a reasonable one, is that economies of scale apply not just to cost, they apply to, you know, stability and availability and, you know, the ... in the vast majority of cases the 80/20 rule applies and you don't need to care about the fabric between your infrastructure. In the cases where you do, like, let's say I'm a high frequency trader or something like that where I care about every picosecond between me and the pipe out the building and me and the trading floor, yeah, maybe you're not running in the cloud, right? Maybe you're running on, you know, custom built high-performance machines with incredibly tuned kernels and network stacks. Does everybody else, you know, need that? Probably not, but you're right, it does make debugging more complex and potentially problematic.
Mike Julian: So I think all that is pretty interesting. And there's also something you mentioned before we started recording about Puppet, Chef, config management in general being significantly less relevant than it used to be. I remember that at my last full time job, most of the work I did was writing Chef and Puppet, like it was a whole lot of orchestration and config management of like how do we build a system, and now, like, as far as I can tell, no one's really caring about that anymore. Like we have ... the problems have moved further up the stack.
James Turnbull: Yeah, I agree, and I've been saying that for a few years, and I think if I look at some of the work that's come out of some companies, and, you know, HashiCorp too, to some extent. The important thing about configuration management is the lessons learnt about configuration management.
Mike Julian: You know, right.
James Turnbull: And the fact that the abstraction has moved up the technology stack should be like, you know, how do we apply the lessons we learnt managing infrastructure level components to managing application and service level components. Orchestration is not a solved problem by any stretch of the imagination and things like microservices make things considerably more complex.
Mike Julian: Yep.
James Turnbull: You know, obviously they're very flexible in many ways, but, you know, all of a sudden you have 300 little services that talk to each other via various ports and require different levels of security and AAA, you know, and require different pieces of configuration. Like, this is a non-trivial problem and, guess what, we've actually solved some of these non-trivial problems before, so some of these companies hopefully will reinvent themselves to be in that space, and I see a little bit of that happening now. We'll just see who survives, I guess.
Mike Julian: Right. I have a few last questions for you. Of the bajillion books you've written, which ones have been your favorite? Like what was the most interesting one to write?
James Turnbull: I think it's probably The Art of Monitoring. There's a lot wrong with that book and a good amount ...
Mike Julian: As always.
James Turnbull: I went into an obsessively deep hole and I wrote a 700 page book, which is very focused on technology, using a technology stack to articulate what is effectively a change in thinking. And I did that because ... everyone I had a conversation with, I was like, who's going to buy a book about the theory of monitoring, but people might buy a book that has like configuration files and technology and shows you how to do things, maybe that'll work better. And of course I realized that I would have written a much shorter book, and probably not have spent a year and a half of my life buried in complex configurations, if I had written a theory of monitoring book, and it might've been quite timely.
So yeah, I think it's probably my favorite book, but it's also my least favorite one too. And there's definitely ... there were some terrible ... I brushed over some topics that I probably should have covered in more detail and there's a couple of ... I recently found a calculation error in one of my graphs that a Russian PhD student pointed out to me and I was like, huh. It's not a big miscalculation, but it's enough that I felt embarrassed, and I went back to this guy with, I'd done a calculation wrong and an R formula wrong. But yeah, so there's moments where I'm like, how many people saw that and thought, what an idiot. So that's never good.
Mike Julian: I had that happen recently with mine. My book is being translated into Japanese right now, which is super cool and the Japanese translators are thorough.
James Turnbull: Okay.
Mike Julian: They have found so many errors and so many, like, typos, but some of them are calculation errors. They don't fundamentally change what I'm talking about or the illustration, but it's one of those like, huh, I really did screw up an average calculation.
James Turnbull: Yeah, I did that too… but yes, this Russian PhD student, he was hilarious. He's like, I just don't understand how you got this number. And it was ... there's no spreadsheet. There's no R formulas in there. There's just like a graph and he's obviously smart enough to look at the graph and go, that's wrong. And he was very polite about it, but he genuinely thought I might've discovered a new branch of math, as opposed to me making a terrible mistake, which I thought was flattering and horrifying at the same time.
Mike Julian: Absolutely. So I noticed there's ... seemingly, there's a trend with how you write books. To me looking from the outside, it's you start writing on a topic right about the time it hits mainstream. Whether that's true or not, that's how it seems to be. So what's ... what can we look forward to next from you? Like are you thinking about writing any new books?
James Turnbull: I am contemplating it. I had ... I've had a long dry spell of just writing sort of bits and pieces for internally. I'm writing a bunch of content for Microsoft right now. I'm thinking about writing something again, something technical. I feel like maybe service mesh is probably somewhere in this space that I'm interested in, but I don't see anything in there yet that sort of resonates with me as a solution I want to write about. But I think that's the space I'm going to watch. I would love to write a book about startup engineering practices and about [inaudible 00:29:59].
Mike Julian: That could be fun.
James Turnbull: But I think that I've been beaten to it. Camille Fournier wrote The Manager's Path, which is, to me, every time I read it I'm like, I can't do any better than this. This is an awesome book.
Mike Julian: It is a very good book.
James Turnbull: And so I feel like that position has been taken, but I've had some thoughts about like the startup and things like that. Like I think there's definitely ... we're definitely in a different era and there's definitely some lessons learned and you know, particularly things around topics like work life balance and ethics and diversity and inclusion where an update to some of the seminal ideas about startup, the way startups work might be welcomed.
Mike Julian: Yeah. All that sounds great to me. Where can people find out more about you and your work?
James Turnbull: Probably the easiest place is Twitter. I'm one of the dying generation that uses Twitter, so my Twitter handle is @kartar, and if you're interested in my books, turnbull.press is my grandiose imprint, and that lists all of the books and the topics and so forth, and that's probably the easiest way to find me.
Mike Julian: James, thank you so much for coming on the show. It's been a pleasure to chat with you.
James Turnbull: You too. Thanks so much for having me.
Mike Julian: And thank you to everyone else listening to the Real World DevOps podcast. If you want to stay up to date on the latest episodes, you can find us at RealWorldDevOps.com and on iTunes, Google Play or wherever it is you get your podcasts. I'll see you in the next episode.
Announcer: This has been a HumblePod production. Stay humble.