City-Scale Observability with Andrew Rodgers
One of the most interesting conversations I’ve had was the day I learned about Andrew Rodgers using Graphite to monitor a million-dollar furnace in a manufacturing plant. Since then, he’s gone on to start a company that uses the same observability tooling you know and love (Influx, Kafka, Cassandra, Grafana, etc.) to solve observability challenges in the physical world, such as tracking energy consumption in the hundreds of government buildings in Washington, D.C. If you’re at all interested in unique uses of software tooling, this episode is a fun one.
About Andrew Rodgers
Andrew leads technical strategy and architecture development for ACE IoT Solutions. Andrew also leads the development of technical and research strategy at The Enterprise Center, a non-profit focused on developing the innovation ecosystem in Chattanooga, TN. When not bringing his extensive professional experience in Industrial Control Systems, Critical Infrastructure Controls, and Network Engineering to his professional endeavors, he can most commonly be found with a camera in his hand. A deep passion for photography takes him off the beaten path the world over, and serves as a convenient excuse for a variety of other means for enjoying nature, including hiking, biking, and most board sports. Andrew loves sharing his travels and photography, and keeps an instagram account updated with his most recent adventures.
Mike: This is the Real World DevOps podcast, and I'm your host Mike Julian. I'm setting out to meet the most interesting people doing awesome work in the world of DevOps. From the creators of your favorite tools to the organizers of amazing conferences, from authors of great books to fantastic public speakers. I want to introduce you to the most interesting people I can find.
Mike: This episode is sponsored by the lovely folks at InfluxData. If you're listening to this podcast, you're probably also interested in better monitoring tools, and that's where Influx comes in. Personally, I'm a huge fan of their products, and I often recommend them to my own clients. You're probably familiar with their time series database InfluxDB, but you may not be as familiar with their other tools: Telegraf for metrics collection from systems, Chronograf for visualization, and Kapacitor for real-time streaming. All of these are available as open-source and as a hosted SaaS solution. You can check all of it out at influxdata.com. My thanks to InfluxData for helping make this podcast possible.
Mike: Hi folks, I'm Mike Julian, your host for the Real World DevOps podcast, and my guest this week is a friend of mine, Andrew Rodgers. He's an expert in industrial control systems and co-founder of ACE IoT Solutions, where he helps companies with improving visibility in their operations and energy systems. Welcome to the show.
Mike: Now you live in one of my favorite cities ever, which is Chattanooga, Tennessee. I can hear the Chattanooga fans in the background just, "Yeah, this is awesome." So one of the coolest things that I think there is in Chattanooga, aside from just the gorgeous weather and great food, is the low cost municipal internet.
Andrew: Yeah, it's pretty fantastic. It's certainly part of the reason why I moved to Chattanooga in the first place, and I think that it's had that effect on a lot of people over the years. So we have, sort of, an outsized technical community here based on the fact that it's easy to support a remote workforce when you have gigabit or 10 gigabit Internet available ubiquitously across the community.
Mike: That was one of the things I never expected, because there are actually some pretty significant companies based out of Chattanooga entirely as a result of this highly available municipal Internet. People might know of Bellhops, which is based there.
Andrew: Yeah. So that project or that company started, based on another company in Chattanooga exiting and the founders starting a fund and looking for startups across the southeast that could benefit from the available broadband here. And you know it's a pretty big company now, well over 100 folks in their technical workforce, and they continue to you know grow and support the community. It's certainly been a really interesting success story for Chattanooga.
Mike: You worked for a while as, I think, a technologist in residence for one of the startup incubators in the area.
Andrew: So yeah, Chattanooga has a lot of really unique resources. You talked about one of those, the fiber broadband available over a 600 square mile area. And one of the other things that I think, is actually due to the fiber but also just due to the type of community Chattanooga is, and the effort and interest in working collaboratively, is a 501c3 nonprofit organization focused on entrepreneurship, and helping startups both at a scale of coming up and building a high growth startup, but also mom-and-pop shops, and helping them grow and build sustainable businesses or get to the next step in their business plan. They did, as part of when the fiber was first launched in 2009, a group of folks in the community came together to say, "Hey, how do we get the most out of this? This is an awesome asset. We know that the future of our economy is going to be based on growing businesses in the area. How can we use this new asset to support that, not just in luring a big company into town, and sort of the traditional economic development scenario, but growing companies locally."
Andrew: And so one of the things they decided to try to do is launch a startup accelerator. It wasn't a venture funded accelerator, it was funded by this nonprofit. And it was focused on finding the companies around the country who had ideas that were only viable when ubiquitous broadband was available. Now, that brought a lot of really interesting folks to Chattanooga. And, you know, we did it for about four years, I think. GIGTANK still exists today. It's changed a little bit, happens every summer. But yeah, one of the challenges with GIGTANK was that you bring in a company, and you get a group of really great mentors from the business community here in Chattanooga. You surround them with professional services companies who are eager to support growing a business in Chattanooga. Then you tell them that, you know, you're trying to build a high growth startup, and your total addressable market is, you know, a million people in the US that actually have access to this high speed connectivity.
Andrew: So what tended to happen, unfortunately, is the businesses were viable, but the reliance on the broadband wasn't. This was back... I was involved heavily in 2012, 2013, 2014. And now finally, in 2019, we're sitting here, and you see fiber deployments happening around the country. Talk of 5G is running thick right now, and so those businesses actually have a lot more viability. But what did happen was, the businesses came for the fiber and kind of stuck around for Chattanooga. So we ended up accruing an amazing tech talent pool, really interesting entrepreneurs who have gone on to focus on, you know, other things. We even took a little stint at additive manufacturing because we saw that, sort of, digital manufacturing gave you the same opportunity that, sort of, was the basis for deploying the fiber in the first place, which was making a smart grid that was truly smart, and making the trade of moving photons instead of electrons to achieve the same sorts of quality of life improvements, et cetera, that rural electrification had in the early 20th century.
Andrew: We saw that same thing happening in manufacturing, that with digital manufacturing, the capacity and need to move data, as much or more than moving materials, is really important. It's an important use case for the fiber infrastructure, and it's part of the story of why companies like Volkswagen, several Volkswagen major suppliers, have all moved to Chattanooga, is that ability to support a digital manufacturing environment. We actually took a year of GIGTANK and focused on additive manufacturing, and the digital manufacturing up out of that.
Andrew: One of the real success stories out of that whole program is a company that 3D prints really unique architecture using some of the world's largest 3D printers that they've built, designed, engineered in-house. It's called Branch Technologies, it's really cool. They were pioneers of the technique of printing in free space. Instead of laying up layers like you expect a normal FDM 3D printer to do, they actually print and stretch plastic in mid-air, and it's really wild to watch, but it's really incredible to see the results.
Mike: You started to talk about smart grids here, and this leads me to think about smart cities. I read, I want to say it's a video or an academic paper or something like this, where Chattanooga has embedded sensors in roadways and streetlights and all this stuff to measure various atmospheric conditions and environmental conditions. And alongside that, they have something that I think is pretty unique, at least within the US: an open data policy for the government. Any data that government produces is automatically open. Rather than having to, say, request it, you get access to it, which has led to some pretty interesting possibilities for things.
Andrew: Yeah. It's really fascinating. Chattanooga was a fairly early adopter of the open data movement, open data policies. I think we adopted the open data policy we have in 2016, and it doesn't say that all data is immediately open, but it does sort of set out the principles by which government operates, that you should seek to keep data open when possible. There is certain privacy-sensitive data that can't be opened, unfortunately. But sometimes the aggregation of that data can be opened, or some sort of anonymized format of it. You know, I've talked to city leaders across the country about what the open data movement brought... There's a little bit of reduction in cost because you don't have to process FOIA requests anymore, Freedom of Information Act requests. But there's also this internal organizational friction around sharing data between the departments, and that is really expensive.
Andrew: Every time Jane has to email Jill to get that copy of that spreadsheet again, that's time lost for both Jane and Jill. You know, we see this in enterprises everywhere. These sort of silos, and trying to... When workflows cross silos, there's always some resistance there. You can put a tiny pipe between the two silos, and, you know, that's where Jane and Jill are emailing each other once a week, or you can just try to break some of those down. And having an open data policy and having an open data portal, the pipelines, the ETL processes that you have to put in place to enable that, that actually turns out just helping you move data between your existing departments, allowing access so people can just do their job faster and more efficiently. That's been an incredible win for Chattanooga and other cities that have adopted that policy.
Andrew: In fact, Chattanooga, just this year, invested in an additional platform, specifically for internal data. It was funny: for the data that is sensitive and couldn't be shared on the open data platform, they still didn't have tooling to support the same sort of streamlined data flows. So they actually invested in letting that happen across the enterprise, even for data sources that aren't necessarily public.
Mike: I was watching this, I was reading this paper I mentioned a bit ago that, due to a lot of this environmental data being accessible, it led to, I want to say some climatologists decided to start pulling in this data and started modeling, "What would happen if there was, say, a major fire in this one particular area and wind conditions were as such. How do we evacuate people?" Because they had all this data accessible to them, they came up with a way to evacuate the city along certain streets in real-time based on the weather conditions at that time. Where the plumes were going, all this sort of stuff. I'm like, "Wow, this is super cool." Yeah, that was really cool.
Mike: You mentioned, a bit ago, that working on the smart grid stuff, and I know that you've started doing a lot of this work with smart cities. I imagine that a lot of this is more than just open data policies. There's probably a lot more that goes into it.
Andrew: Yeah, so, for sure. I think there's a transition happening right now, especially in municipal government IT operations. Even now, most of the open data platforms and systems that you see deployed in cities, 90% of them, are still pretty much focused around batch data. So it's enterprise data, that is, billing system-of-record data. It's transactional, it's the sort of data you expect in the general business logic of operating the large enterprise that most cities are. Given all the sensors that are being deployed now for air quality, for roadway conditions, traffic signal integrations, all that data is streaming. And that's where you see some real interesting development happening, and certainly stuff we're focused on, not just in the city of Chattanooga but with the University of Tennessee in Chattanooga as well, helping sort of build out and pilot some technologies to handle this data. It's a lot more data than cities have had to deal with in the past, and to be valuable, it needs to be available in real-time or near real-time.
Andrew: And that sort of streaming system looks very similar to what we expect in industrial systems; it looks very similar to what our company does with building energy systems. You end up with being able to deploy... I think what's really interesting in the cross-tie with your audience, working in the more general technology operations world, is that the same systems that are powering logging, and time series metrics, and all these, sort of, at-scale IT systems start to become relevant when you're dealing with all the traffic signal states in the city every 100 milliseconds. That's where, I think, there's a lot of really interesting work to be done. I think that's something that Chattanooga is, kind of, putting a stake in the ground to be a leader in that space, working between the university and the city itself.
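To put rough numbers on the "every 100 milliseconds" remark, here is a minimal Python sketch. It is not how any city's system actually works; the function names and the change-only encoding are assumptions made purely for illustration. It shows how quickly fixed-interval sampling adds up, and one common way to shrink a slowly changing signal (storing only the moments the state changes).

```python
# Illustrative only: names and the change-only encoding are assumptions,
# not details taken from the conversation.

SAMPLE_INTERVAL_MS = 100            # the interval mentioned in the episode
MS_PER_DAY = 24 * 60 * 60 * 1000

_UNSET = object()                   # sentinel so that None is a valid state


def samples_per_day(signal_count, interval_ms=SAMPLE_INTERVAL_MS):
    """Number of raw samples produced per day by `signal_count` signals."""
    return signal_count * (MS_PER_DAY // interval_ms)


def state_changes(samples):
    """Keep only the samples where the state differs from the previous one.

    `samples` is an iterable of (timestamp_ms, state) pairs in time order.
    """
    changes = []
    previous = _UNSET
    for timestamp_ms, state in samples:
        if state != previous:
            changes.append((timestamp_ms, state))
            previous = state
    return changes
```

A single signal sampled every 100 ms yields 864,000 samples a day, so even a few hundred signals land in the hundreds of millions of points daily, which is the scale at which IT-style time series tooling starts to pay off.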
Mike: You and I were having some coffee a couple of years ago when you first started working on this stuff. You kind of mapped out these architecture systems that you were building, and it was like Cassandra and Kafka, and this real-time streaming system. There was a time series database behind it, and Grafana on the front, and I'm like, "Oh, that's cool. You're building a monitoring system?" "Well, no, but yes." I'm like, "What are you monitoring? What are you building this for?" You're like, "Oh, it's a furnace, in a manufacturing facility." And I'm like, "Sorry, what?" What's the most interesting thing about all of this is, we've kind of been alluding to it this whole conversation, but to say it explicitly, the stuff that you're using, the tools and the approaches that you're using to work with smart cities and work with industrial control systems and monitoring a building, is the same technologies that we're using in DevOps to do monitoring of servers and applications. It's the same sort of stuff.
Andrew: Absolutely. I mean, I think, and I actually have to credit you with this, we met many, many years ago, and I was kind of coming out of an industrial control critical infrastructure traditional approach where it's all vertically integrated, vendor-driven solution engineering or not engineering. Depends on how you-
Mike: As the case may be.
Andrew: As the case may be. And you know talking about some of the systems that you were using at the time, which we look at now and think where... It's amazing how fast nine or eight or nine years is in this industry.
Mike: Yeah, I'm pretty sure that we were talking about how hot Graphite was.
Andrew: Yeah, I'm pretty sure. I'm pretty sure. Yeah. I was like, "Oh, you mean there's something different than RRDtool? What?" So yeah. You know, we were using... At the time I was working in an industry where SQL Server was where they put time series metrics. So that works for some things, but it turns out that, when somebody has to... When you crash the on-prem server every time you try to look at more than a week of data, or all of those sorts of issues you have trying to use relational databases for massive time series systems. We talked about it, and we talked about some solutions, and I actually implemented some of that stuff, and they said, "Hey, you should really go to this conference that I've found. It's really cool. It's called Monitorama." And I've been back every year since, and all I tell everybody when I go, it's like, "I just come here, learn what you all are doing, and steal it and rip it off, and apply it to real systems."
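One reason dedicated time series tools cope with "more than a week of data" better than a naive relational table is that they roll raw points up into coarser buckets, so a long-range query reads far fewer rows. The sketch below shows that idea in plain Python; it is a simplified illustration under assumed names (`rollup`, `bucket_seconds`), not the actual implementation used by RRDtool, Graphite, or any other product.

```python
from collections import defaultdict
from statistics import mean


def rollup(points, bucket_seconds):
    """Average raw (unix_timestamp, value) points into fixed-width buckets.

    Returns a list of (bucket_start, average) pairs sorted by bucket start.
    Hypothetical helper for illustration only.
    """
    buckets = defaultdict(list)
    for timestamp, value in points:
        # Align each timestamp to the start of its bucket.
        bucket_start = timestamp - (timestamp % bucket_seconds)
        buckets[bucket_start].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]
```

For example, rolling 30-second samples into 60-second buckets halves the number of points a dashboard has to fetch, and rolling them into hourly buckets cuts it by a factor of 120.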
Andrew: The parallels are very real. I remember, actually, a really striking thing for me was at the first Monitorama I attended, the keynote speaker was talking about all the lean manufacturing principles, and how that could be applied to DevOps. And I realized where this overlay was and kind of how the tables had turned. So manufacturing had done a really, really good job of defining the principles, and really engineering out the processes by which this stuff could be done, but they had failed on the technology implementation because that's not what they were, and that's not where their expertise was. So when you took those principles and put them in the hands of systems engineers and software engineers who are building these things for software companies, boy, I mean, they could implement things that we in the manufacturing space could only dream about having.
Andrew: It's sort of a circular system where some of the principles that were developed 50, 60 years ago around process management, lean process management, got pushed off into the information technology space. But then they got encoded and codified in these tools that are so much better than what the industry had. And now, you know, a big part of what I do on a day-to-day basis is feeding those tools back to the real world environment.
Mike: You were helping me out with a project I was working on when I worked for Oak Ridge National Lab, where I had a bunch of solar panels and we were collecting information from them, metrics about performance of the panels, like how much sun there is that day, and then the amount of power generated. We were shoving all this into Graphite. It's the same system I was shoving all of my operating system and application metrics into. But for the industrial controls person that I was working with, this was just completely blowing their mind that this was even a possibility.
Andrew: Right. Right. Yeah-
Mike: I think it's absolutely incredible.
Andrew: Yeah. I think so. I will say, in that cycle, and especially five years ago, four years ago, I think even more so, those tools weren't immediately applicable back to this space because, in general, in manufacturing you need systems of record. And time series systems for operations monitoring don't tend to have those same sorts of consistency demands. I think what's really interesting is I've actually seen, over the last five years, a bigger and bigger push toward having strongly consistent monitoring systems that can give definite answers, because people are building business value systems on top of monitoring, which is taking it... Again, it's just helping close some of those gaps, which is great for me, so keep doing it.
Mike: All right. You gave a talk at, I want to say, one of the Grafana conferences in New York.
Andrew: It was in Amsterdam. GrafanaCon in-
Mike: There we go.
Andrew: Amsterdam. Yeah. Yeah.
Mike: About monitoring at building scale or city scale. Is that what-
Andrew: Yeah. Monitoring buildings at city scale. Yeah, I've been involved-
Mike: I think this gets to what you're currently working on, doesn't it?
Andrew: Yeah, absolutely. So I've been doing sort of consulting work for many years, as we've been talking about, in manufacturing, in applying some of these monitoring technologies back into the manufacturing space, and where things... NSF has a really weird term for this, National Science Foundation, but I actually think it sums this stuff up better than any other term I've heard, which is cyber-physical systems. That tends to be, my career has always been in cyber-physical systems. I didn't know what it was when I started my career. I didn't know what it was until about five years ago, but that's what they are, and at any time-
Mike: Can you define that for us?
Andrew: Yeah, yeah. So, you know, a cyber-physical system is any time a software-defined system touches something that is hardware-defined, and is interacting with the real world in a way that is not sending bits over a wire. That sums up-
Mike: Oh yeah. How about an example there just so we're clear on what we're talking about?
Andrew: That's a very general term. The example could be anything, I mean, a traffic light that is connected to ethernet and sending data back to a central database, that's a cyber-physical system. Your thermostat, your Nest Thermostat is a cyber-physical system. To some degree, maybe your mouth is a cyber-physical system. But at the end of the day, it all gets down to something that happens in the real world, is turned into bits or something that's happening in bits is turned into something in the real world. That's really where my career and, especially, my consultancy has centered on. And I got involved, brought in actually, to help bring some expertise in IT operations to a project with the City of Washington DC, deploying a large scale monitoring system for their building operations.
Andrew: So they had buildings across the city. They have a $40 billion real estate portfolio, and they spend about 100 million dollars a year on energy, so it's a big application. What they would find is they would make a strategic investment in a building that was performing poorly, which they could see because they pay the utility bills, and they would improve it by 20 or 30%. But, month by month after they made that capital investment, they would see the building slide back towards its original benchmark, and they didn't have any insight as to why, because all they had was how much money they were paying for electricity. They had no insight into, well, what equipment is running? What set points are being applied? Who's turning on what, when? Which spaces are occupied? They scoped out this project to really define how they could collect all that operations information and provide it in an accessible way, and visualize it in an accessible way for their staff from the top down.
Andrew: They wanted their enterprise executives to be able to look and see, "Okay, how is this building being used?" They also wanted their technicians out in the field to say, "This doesn't look right. What's going on? I need to go investigate X." And so we started building this system. It's operational now. It has about 60 of the 400 buildings connected that DC owns, and it's full deployment in about 20 of those buildings where they've actually kind of done all the accessory work to make sure that they understand what the data they're getting is. With it, they're saving around a million and a half dollars a year on energy, and most importantly, where they're making capital investments, they're seeing retention of the savings they gain. That project, sort of, demonstrated to me and helped me understand a lot about that space and what that space needed, and ended up starting a company focused on building out a cloud solution that provides that service to other large portfolio owners or even individual building owners.
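The "slide back towards its original benchmark" problem can be made concrete with a small calculation. The Python sketch below is only an illustration: the function names, the 50% threshold, and all the kWh figures in the example are hypothetical, and a real analysis would also normalize for weather and occupancy. It expresses each month's consumption as the fraction of the original retrofit savings that is still being realized.

```python
def savings_retained(pre_retrofit_kwh, post_retrofit_kwh, current_kwh):
    """Fraction of the original savings still realized (1.0 = all, 0.0 = none).

    All inputs are monthly energy use in kWh. Hypothetical helper; it ignores
    weather and occupancy, which a real analysis would have to account for.
    """
    original_savings = pre_retrofit_kwh - post_retrofit_kwh
    if original_savings <= 0:
        raise ValueError("retrofit must have reduced consumption")
    return (pre_retrofit_kwh - current_kwh) / original_savings


def months_below_threshold(pre_retrofit_kwh, post_retrofit_kwh,
                           monthly_kwh, threshold=0.5):
    """Indexes of months where less than `threshold` of the savings remain."""
    return [
        index
        for index, kwh in enumerate(monthly_kwh)
        if savings_retained(pre_retrofit_kwh, post_retrofit_kwh, kwh) < threshold
    ]
```

With made-up numbers, a building that went from 100,000 kWh to 75,000 kWh a month and has since crept back up to 90,000 kWh is retaining only 40% of its savings. The utility bill alone can show that drift, but not its cause, which is why the equipment, set point, and occupancy data matter.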
Mike: That's pretty cool. I imagine you probably used a lot of the same tricks and technologies that we were talking about earlier to do all that?
Andrew: Yeah, absolutely. When I got involved in that project, the city of Washington DC had been engaged with the Federal Department of Energy, and specifically working with Pacific Northwest National Lab on a software platform called VOLTTRON. My business partner refuses to introduce it without saying, "Unfortunately called VOLTTRON." With two T's, just to avoid trademark issues. But it was a software platform that had been written by National Lab researchers to enable them to examine interesting and novel ways to control buildings, to control what they call distributed energy resources, which is everything from your server UPS, to the air conditioner, to hot water heaters, to the solar panels on your roof, to dedicated battery storage devices. And it's a pretty robust platform, but it was built by researchers, which... I think you have some experience with systems built by researchers-
Mike: I do-
Andrew: Which you may or may not have PTSD from.
Mike: Plenty of that. Unfortunately, it's weird working for a national lab because all of my experience is stuff I can't talk about.
Andrew: Well. Yeah.
Mike: Department of Energy. Yeah.
Andrew: Yeah. Anyway, so what they really needed was, sort of, someone with a view of the same things we've been talking about, which are, what are the technologies that are being applied and developed in the broader technology ecosystem, and how can those benefit and enhance what we've got here? One of those was back to sort of the same challenge that I've found myself facing time and time again of, how do you store time series data at scale efficiently, and aggregate it easily, and provide robust analytics quickly? We did a lot of work on retooling and moving some things around in the platform to enable a little bit more effective, efficient deployment, and help them kind of justify to DOE moving the whole project to the Eclipse Foundation. Now, VOLTTRON is a project in the Eclipse Foundation's portfolio.
Andrew: We are one of the only commercial companies out there offering services around VOLTTRON. We use it to support our cloud service offering, which is a basic building instrumentation tool. But we also support other companies who are doing interesting things with the platform, and support their use cases and help them develop robust technology processes around it.
Mike: Did I hear right? Before we were talking, before we started recording, you mentioned something about storing this data through... Oh, shoot, I forget what it was. I think Kafka and S3?
Andrew: Yeah. I mean, I think, this whole moving... Big data, we hear, and have heard, way too often, and I refuse to let anybody I work with call any data, big data. I just say medium data and then they like hem and haw then they shut up-
Mike: Of course.
Andrew: But there's been a real transition, and I think I touched on this with the smart city use case. When big data kind of became a buzz word, it was all about, okay, we have been collecting transaction data or we've been collecting core business logic data for 50 years. Now we have these vast repositories of data that are in 20 different formats. How do we get all this into a data lake or whatever platform or technology you want to use to get it where you can actually get value from it. But now it's, we have all that data, and we need to actually join it with streaming data that's coming in real-time to get value from it. That's a lot different challenge.
Andrew: But obviously, stream processing is a big deal. It's taken off. Kafka obviously is a technology that's getting used a lot, it's getting a lot of traction. Confluent seems to be doing really well, it's exciting. Pulsar is another technology, but this also gets back to the system of record data. So when I first heard about Kafka, it was being used to sort of multiplex monitoring data in a way where, maybe, you weren't that careful about consistency, and it wasn't a big deal if your consumer indexes got moved around a little bit, et cetera. But now we're seeing really robust frameworks for processing that data, getting it into objects in something like S3 where you can move it easily, you can query across it, you can pre-aggregate it. And so, yeah, we're working a lot with those kinds of data systems now, and getting real-time streaming data that's available in the Grafana dashboard, but also for your data science teams to use with the existing data you may have in your enterprise.
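As a rough picture of what "getting it into objects" can look like, the Python sketch below groups streamed readings into time-partitioned batches, each of which could then be written out as one object. It deliberately uses no Kafka or S3 client; the record fields (`building_id`, `ts`), the key layout, and the function names are all assumptions chosen for illustration, not the layout ACE IoT or anyone else actually uses.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_key(building_id, unix_ts):
    """Build an hourly, UTC-based object key prefix for one reading.

    The `building=/date=/hour=` layout is a hypothetical convention.
    """
    moment = datetime.fromtimestamp(unix_ts, tz=timezone.utc)
    return f"building={building_id}/date={moment:%Y-%m-%d}/hour={moment:%H}"


def batch_by_partition(records):
    """Group records (dicts with 'building_id' and 'ts') by partition key."""
    batches = defaultdict(list)
    for record in records:
        key = partition_key(record["building_id"], record["ts"])
        batches[key].append(record)
    return dict(batches)
```

Partitioning by building and hour means a query for one building over one afternoon only has to touch a handful of objects, and each batch is a natural unit for pre-aggregation before the data science team ever sees it.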
Mike: Man, all this is so incredibly interesting. I absolutely love the crossover here. The comment you made earlier about the manufacturing world really came up with solid principles but weren't equipped to do the execution, to do the implementation on the software side. Coming from my background of... I'm not from manufacturing world, I'm from the systems world. To me, I've always looked up to manufacturing as, "They've got their shit together," but really, we're doing the implementation and now you're taking it all back, which is awesome. Everyone wins.
Andrew: Yeah. I mean, don't get me wrong. I think manufacturing has done an incredible job, especially if you look at the mechanical systems that they developed these processes for. Obviously, I mean... The defect rate in a modern vehicle, if you really just sit down and think about it, is mind blowing. The fact that there's only a few recalls a year, those sort of things. But I'm also the kind of person who sits down every once in a while and is like, "Man, 90 years ago the idea of flying across the country would have been completely strange to anyone." You tried to say, "Oh well, you could..." You're like, "I'm going to do on Friday and be in Portland in four hours from across the country." That was completely strange. And so I think, it's not to, kind of, crap on manufacturing-
Mike: Of course not-
Andrew: They've done incredible, incredible work. And if you look at those principles that people like Deming worked on, they're still incredibly relevant, which is kind of wild, but-
Mike: What I love about all this is that the principles and implementations that each of us reason are being applied in ways that most people don't even think about. I never would have considered that you would be using the same technologies that I use to do completely different work.
Andrew: Yeah. I mean, I... To be perfectly frank, you know I didn't either until I met you. So I mean, I might owe a lot of my career to you. This is...
Mike: Thank you-
Andrew: Kind of fun, kind of coming back full circle as well.
Mike: Yeah. Well, this has been wonderful. Thank you so much for taking the time to chat.
Mike: Where can people find out more about you and your work?
Andrew: Aceiotsolutions.com, we've got a blog that we try to keep updated with information about some of the projects we're working on, about some of the technologies we're using. Please follow us there.
Mike: All right then, and to everyone else listening, thank you for listening to the Real World DevOps podcast. If you want to stay up to date on the latest episodes, you can find us at realworlddevops.com and on iTunes, Google Play or wherever it is you get your podcasts. I'll see you in the next episode.
Speaker 3: This has been a HumblePod production. Stay humble.
2019 Duckbill Group, LLC