Image Deploy Process and Infrastructure as Code [DevOps Lunch Jan 19 2021]

Mar 23, 2021 17:42 · 8799 words · 42 minute read

Hello, I’m Rob Hirschfeld, CEO and co-founder of RackN, and the DevOps Lunch of January 19 was all about a couple of topics. We talked about image-based deployments, which we’re going to have a follow-up on: the challenges of image-based deployments and how things are changing in that part of infrastructure management. Then, about 30 minutes in, we pivoted to Infrastructure as Code and talked about the collaboration, transparency, and empathy that are necessary to build good organizational hygiene for Infrastructure as Code. It was a really thoughtful conversation that pulled together a lot of our other threads.

I think you’ll find it highly, highly interesting. Please join us.

00:43 - These are designed to be casual conversations over lunch on Tuesdays about DevOps in the industry. the2030.cloud is the place for all those details. Thanks, and enjoy the conversations.

00:56 - Um, this image build thing keeps coming up more and more.

01:02 - Over the last couple of weeks. Josh, I see you shaking your head; are you hearing the same thing? I’ve been hearing the same conversation for years. Yeah, exactly. I know. That’s exactly it.

01:15 - I mean, yeah. It’s the same desire. The only thing that changes is the complexity of the environments where people want that desired outcome, meaning it’s going up. It’s not hard, that’s fine. You just need a process. It is hard, though. No, no. What are we talking about, servers or endpoints? It doesn’t matter. I mean, the key thing that I’ve seen is that the biggest barrier people run into usually isn’t a technical one. Yes, it’s that you have to acknowledge that you might never be able to get to 100%, depending on the dependencies you have or the requirements of whatever you need included in that image.

And if you start with that and accept that, then you’re already off to a great start. That’s where I’ve been successful. The number of times I’ve seen people say, hey, we need to be able to provision this image against whatever the platform is, and it needs to be exactly the same every single time and have every single thing that the person needs, like...

02:33 - Okay, start a little bit smaller first. Yes.

02:39 - Because, again, if you can at least start with everybody getting the same operating system with the same installed dependencies, that already gets you really, really far. And then, Josh, when you say that, as a person who has led organizations, my head goes, hmm, sounds like we’ve allowed technology for technology’s sake to win over the way we should do business and how we should have consistency. I think a bit about enterprise architecture: have consistency per department or per instance, and have consistent desktop images based on job duties and tasks.

That’s a step in the right direction. I’m just saying, everybody adding their own stuff in, and the problem of different versions of the same thing, is the Comanche issue.

03:43 - With the Comanche, right? One of the most amazing helicopters ever developed. And then, as they were getting the development going, people kept adding requirements and expectations to it, and those design changes changed its performance characteristics, to the point that they took something that was stable and capable and applied requirements to get it to do things it was never built to do, because they wanted to attach that capability to that product.

We fall into that same fallacy with the technologies we adopt today. Kubernetes as the Bradley? Right, exactly. Yeah, no.

04:25 - And this goes back to recent conversations about images. Rob, I know, I feel like we need to have a whole session on this topic. But going back to the idea: the key to success with images is to keep it simple. Keep them as bare-bones as possible. Don’t build all these image types. I just went through a couple of years of trying to reshape people’s methodology of thinking, oh, I need to have an image for a web server.

I need to have an image for a database server. I need to have an image for this, versus: I need one image that’s stable, and then I bolt on what is required using Ansible, or whatever your config management is, by saying it needs to look like this and applying that to the base image. Yeah. And the thing is, we’re talking about images, but the reality is it’s about process, right? It’s process. Yes.
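The "one stable base image, then bolt on the role" approach can be sketched as data. This is a hypothetical Python model, not Ansible or any real config-management tool: the `BaseImage` and `Role` types, the `desired_state` helper, the image name, and the package lists are all assumptions made for illustration.

```python
# Hypothetical sketch: one stable base image plus role-specific layers,
# instead of a separate image per role. Names and packages are illustrative.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BaseImage:
    name: str
    version: str
    packages: frozenset  # what every host gets, baked into the image


@dataclass(frozen=True)
class Role:
    name: str
    packages: frozenset = field(default_factory=frozenset)  # bolted on after boot
    services: tuple = ()


def desired_state(base: BaseImage, role: Role) -> dict:
    """Combine the shared base image with one role layer into a host's desired state."""
    return {
        "image": f"{base.name}:{base.version}",  # same image for every role
        "packages": sorted(base.packages | role.packages),
        "services": list(role.services),
    }


BASE = BaseImage("ubuntu-base", "2021.01.19", frozenset({"openssh-server", "curl"}))
WEB = Role("web", frozenset({"nginx"}), ("nginx",))
DB = Role("db", frozenset({"postgresql"}), ("postgresql",))

if __name__ == "__main__":
    for role in (WEB, DB):
        print(role.name, desired_state(BASE, role))
```

The design point is that the web and database hosts share the same image tag; only the layer applied after boot differs, so there is one image to patch and rebuild instead of one per role.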

05:25 - Yep. I mean, if you can’t whiteboard it, or if you can’t describe it succinctly, you can’t image it. Exactly. And that’s also the difference.

05:35 - There’s also the difference between managing images 10 years ago, where a golden image was still practical, versus now, where you’ve got updates potentially every day. Yep.

05:47 - Potentially every hour? Yeah, potentially. And that’s the other thing: we did all of our image builds in pipelines, so new images were stamped out on a cadence.

06:01 - And then you had that process. So for VMware people, that means being able to inject an image into vCenter, have it tagged as latest, and have an incremental rollback process as well.

06:16 - That way, if I inject an issue, I can easily go back. And I have artifacts along the way that document what that build was. So if Rob deploys a web server and has issues, and we know what that artifact is, we can go back, look at the build, and understand what caused that issue. But was he able to say exactly what changed? Exactly, exactly.
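The build-tag-rollback cycle described here can be sketched as a small catalog. This is a hypothetical Python model of the pipeline idea only; it is not the vCenter API, and `ImageBuild`, `ImageCatalog`, and the fields recorded per build are assumptions chosen to show how a "latest" tag, kept artifacts, and a "what changed?" diff fit together.

```python
# Hypothetical image catalog: versioned builds, a "latest" tag, and rollback.
# This models the pipeline idea only; it is not the vCenter API.
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ImageBuild:
    version: str
    base_iso: str
    packages: tuple  # recorded per build so an incident can be traced back to it


class ImageCatalog:
    def __init__(self) -> None:
        self._builds: list[ImageBuild] = []  # every build is kept as an artifact
        self._latest: int | None = None  # index of the build tagged "latest"

    def publish(self, build: ImageBuild) -> None:
        """Record a new build and tag it as latest."""
        self._builds.append(build)
        self._latest = len(self._builds) - 1

    def latest(self) -> ImageBuild:
        if self._latest is None:
            raise LookupError("no images published yet")
        return self._builds[self._latest]

    def rollback(self) -> ImageBuild:
        """Move "latest" back one build; the bad build stays on record."""
        if self._latest is None or self._latest == 0:
            raise LookupError("nothing to roll back to")
        self._latest -= 1
        return self.latest()

    @staticmethod
    def diff(old: ImageBuild, new: ImageBuild) -> dict:
        """Answer "what changed?" between two recorded builds."""
        return {
            "added": sorted(set(new.packages) - set(old.packages)),
            "removed": sorted(set(old.packages) - set(new.packages)),
            "base_changed": old.base_iso != new.base_iso,
        }
```

Rolling back only moves the tag, so the problematic build is still available for the kind of after-the-fact investigation the speakers describe.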

06:41 - Nothing changed. Josh, you know that. But at least it’s in code and pipelines, and everybody can see what is being done. And then you blame the network? Exactly.

06:52 - Exactly. And we’re going full circle as well with container images. Yes.

06:59 - We’re taking the old-school images that we used to have and paring them down to just what we need to run the stuff. Exactly.

07:11 - Interesting. Yeah, containers were better designed from that perspective to embrace that process, but it is very similar.

07:18 - From a post-config and boot process perspective. One of the things I was noting is that Ubuntu 20.04 actually eliminated net boot. Yes, they did slim things down. They decided they wouldn’t do preseed anymore, and you can’t net boot it.

07:34 - You have to image deploy it. Yep. And it’s a pain even going through Packer. I know there are a lot of open issues with Packer, and it’s interesting you brought that up, Rob, because this has been going on for months from a Packer perspective: getting a consistent deployment of Ubuntu 20.04. For whatever reason, it works for me, and I don’t know why. I’ve tried in different environments, and everywhere I’ve been successful getting 20.04 deployed: vSphere and vCenter using Packer, Proxmox, KVM (which, obviously, Proxmox is KVM for the most part), Vagrant, VirtualBox, VMware Workstation, and Fusion, all successful along the way.

08:21 - Others are getting stuck with Packer and I don’t understand why.

08:25 - But I’ve tried on different machines. I’ve tried on Linux, I’ve tried on Mac, and I’ve got consistent results.

08:33 - So I have two questions for you, Larry. One is, you’ve got the repo out there.

08:42 - But there are a lot of Larry Smiths out there. How do we know which one is yours? The original Larry? Just kidding.

08:52 - I mean, I’ve worked for a guy named Larry Smith. Yeah, exactly. Doing object-oriented databases. That was me too.

09:06 - It was a junior. Junior. Mine is mrlesmithjr; I use that on everything. So I’m the original, I’m the OG.

09:19 - Okay, and then I can’t remember what the other one was. But it was about Oh, it was.

09:24 - If Ubuntu is now fucked up, pardon my French, what would be the recommended Linux to run on a laptop that’s going to have pretty much everything running in VMs or containers?

09:37 - They’re all... Ubuntu is fine.

09:43 - 20.04 is fine for a laptop. That’s what I use.

09:46 - And it’s perfectly legit, stable, it works, it’s well supported. It also works really well acting as a server, with remote storage and other things like that.

10:00 - But from a desktop perspective, it’s fine. The thing we’re complaining about is that preseed is gone. So for us, it affects how we install, right? Yeah.

10:14 - For an individual, it doesn’t matter at all; you do your USB install, whatever.

10:21 - Yeah. What they did was, they’re all-in on cloud, so they eliminated that; it now uses cloud-init and curtin. Wow. We use curtin also, which is... you can get my team a cup of coffee and hear an hour’s worth of ranting about curtin.

10:42 - And I still want to hear that. You actually need to... I need to bring him in and have that conversation, because when we were doing image deploy stuff over the holidays, we got ESX image deploy working, which is a huge leap forward from how you normally install VMware. But yeah, curtin trips people up from an image-building perspective.

11:12 - Yeah, one of the interesting things I saw... Arch, we’ve done... Yeah, Arch, I’ve done some Arch things. With Ubuntu, there was the thing about selling your data to Amazon a couple of years ago.

11:24 - It wasn’t selling, exactly. What they did was they had what they call the lens, basically their search, with Amazon search automatically enabled. So your searches on the desktop went to Amazon. They had a big backlash for that. Justified backlash.

11:51 - Of that. And I once tried installing Arch Linux. I thought, you know, it’d be a good minimal Linux distribution. And it’s like, partition your drive. Yes. And I’m like, okay, I know how to partition a drive. Seriously, I’ve been doing this for 20 years. But no, I’m not going to repartition my laptop just to install it.

12:15 - Exactly. I’ll share. I’ll share another link here, guys.

12:22 - Some Packer stuff. Yeah, this stuff’s fun. I mean, it’s fun when you have nothing else to do, right.

12:29 - I mean, I’ll take this any day over Windows, put it that way. Can we do Windows? Yeah, I did it in Windows, and I have to, and it’s painful as hell: all the different iterations of autounattend and all the other crap. In fact, Josh, you’ll appreciate this: I had a conversation about RIS, Remote Installation Services, or something like that. It was used back in the day, remember, with PE and all of that, going way, way back, and all of the process around building those images.

13:09 - We had a conversation just recently where the company actually still uses that as a build process.

13:19 - I was like, wow. You’re shaking your head like I did. I’m like, why? There are much, much more efficient ways of doing this today.

13:26 - So essentially, we don’t do Windows through net boot at all. The only way we support Windows installs is through an image deploy.

13:39 - Because the net boot process and licensing together are so painful. What would happen is people would have to go through multiple boot cycles. We could do it; it was just so painful. So we just said we don’t support that: if you want to install Windows, do an image deploy. That’s basically where we are with Ubuntu 20.04 now. And, you know, the fact is image deploy is a better process.

14:09 - It’s just as fast. It’s faster. I mean, I’ve done the whole PXE/TFTP boot stuff for years, and I wrote about doing it from a VMware perspective years ago, and nine times out of ten it was usually faster over the wire, doing real-time inline PXE boot builds. Josh, this isn’t new, right? Exactly, UCS. Yeah, and it’s fantastic. We actually had a point at SolidFire where we had toyed with the idea, and we actually had customers doing this: they would take SolidFire nodes from a cluster and switch them from being SolidFire nodes in a cluster to PXE booting them, to use them for compute only, with the data set frozen.

15:04 - And based on changes in capacity requirements, they could nix it as a SolidFire node and automatically bring it back online as a compute node for their compute cluster. We didn’t really encourage this, but at the same time, they’re not changing the system. They’re just booting to a remote image, running it, killing it, then putting it back up and letting the system reestablish data integrity.

That’s an interesting thing that people chose to do. I wish I’d known about that; I could have put it to some use. We had SolidFire in production at one point.

15:49 - I mean, it was really strange to go. Yeah, yeah.

15:53 - I remember having SolidFire. Josh, I think you and I got connected, well, maybe before that, but you were at SolidFire around 2014, 2015, I think. Dude, we got SolidFire and used it. So much fun.

16:10 - Good old days. That’s right. All right, Rob, what rabbit hole are we going down now?

16:22 - Infrastructure as Code is a bad idea. I do think we need to change the name of it, though. Okay, but first I wanted to point out that the Boeing giant rocket test fire, the one that’s supposed to get us to the moon, needed four minutes of firing and only got one minute, 15 seconds.

16:42 - They’ve traced the shutdown to software. Gee, who’d have thunk? This is a test they were doing in Mississippi. Yep. That was the engine test? Yep. That’s why they do an extended one. Well, I hate to say it, but it’s Boeing, and I suspect who was in charge of this engine.

17:11 - What, somebody else? Post-737 MAX?

17:16 - They’ve given up on aerospace engineering discipline, and they’re just drinking whatever Kool-Aid they want to make things go fast and agile. Well, agile isn’t particularly good for mission-critical, can’t-change-it-at-the-drop-of-a-hat stuff. Yeah, I mean, you’re not going to be posting updates every day to the operating system of a satellite or a rocket to Mars. So just delete a spaceship and redeploy it.

17:57 - As someone who holds a degree in aerospace engineering and has co-authored a paper on applying agile development principles to, you know, similar infrastructure types:

18:12 - I’ll say that the problem with the SLS isn’t a software development problem. It’s the desire to reuse existing, proven, tested platforms in ways they weren’t intended to be used.

18:31 - I mean, it’s kind of on theme with other things we talked about today. Yeah.

18:35 - Because the only requirement they really received in transitioning from the Shuttle program to this program is that you have to keep all the same contractors employed. What? Oh, yeah. That was NASA, through, as you might expect, our congressional leaders. Part of the reason SpaceX has been able to iterate so much quicker than, say, NASA and its programs is that NASA, like Boeing, has these distributed workforces, and the distribution is mainly done for political reasons, or through adhering to political processes, oversight, and regulations.

So while that test fire, you know, that Stennis fire, happened in Mississippi for a reason, it’s because someone in the political spectrum made sure that it did. It’s not just because there was a Senator Stennis. No, there’s no accident there. I’ve been there; I used to live in New Orleans. That is a NASA facility in the middle of frickin’ nowhere. Shelby, Shelby. I was on the space station program in 1989.

And we were working on various aspects. At some point, pretty much everything stopped because there was a need for West Virginia to have part of the space station contract. NASA figured out there was software that this one company in West Virginia was capable of writing. So the contract was put out there, and the competition occurred. And it turns out that another company moved two employees to West Virginia, filed the RFP response from West Virginia, and beat out the West Virginia company.

And nobody knew what to do. Everything got put on hold again, because West Virginia still didn’t have a piece of the space station contracts. Back then, we had the same problem with the heat distribution: it was made in one place, and it had to be redesigned because some other state’s companies needed to have it instead of us. So we actually had to give them the design so they could make it without putting the system too far behind.

21:27 - It’s interesting, because I want to offer a counterpoint. I was doing a one-on-one Cloud 2030 podcast interview with, some of y’all might know him, Ian Rae at CloudOps in Canada.

21:42 - And one of the things that was interesting here is the balance between efficiency and, you know, basically spreading out innovation to smaller companies and things like that. We do get into this trap of rushing it and doing things really efficiently versus longer-term thinking. It was easy, and my instinctual reaction is, yeah, they should not do that.

It’s like, well, wait a second. Actually, congregating all of that knowledge into one place maybe doesn’t make sense, and we need to rethink that. Because where we went from there was: all right, if how fast you can build infrastructure is your number one criterion for getting infrastructure, then yeah, it’s all going to go into Amazon really, really fast, because they’re just building servers and infrastructure faster.

So just get over it. And I understand that answer, but there’s a part of me that’s worried about the trend line.

22:51 - So, yeah, I want us to have big rockets, and I want to build really fast and cheap. That’s important. But there’s a part of me that’s like, maybe we’re creating a race where it’s not the right thing to do? I don’t know. Yeah, that was going to be my second point to add, actually, along those lines: it’s less about how these things are built and where things are sourced.

And whether it’s centralized or distributed. It’s how you respond when you need to respond quickly. That goes with the space program, as an example. One of the most impressive things to me has been seeing how SpaceX is capable of identifying a problem with the rocket during a test and having a patch or fix done in a matter of hours or days. We don’t see that in these larger, distributed, contracted environments.

We don’t see it in that way, because there’s a lot more friction in those processes. And when we talk, I’m talking, please give me a minute. Okay, thank you. And when we see these processes applied to, for instance, fighter jets, right, they use and apply agile principles for the F-35 fighter jet.

24:08 - And their point is not that they feel the need to update the software on the jet every day. They don’t do that. The F-22, or rather the F-35, has three versions of software on it at all times, so they can transition, test, and validate on the fly, and have a redundant, well-established version they can drop to if they need to. But their intent is to ensure that if they identify an issue, they can get a fix out as quickly as possible to keep everybody operationally ready.

24:50 - How quickly can they reboot to the safe version? I don’t remember specifics, but it’s pretty quick. I don’t think they try to do it in the air, but I’m confident that they have. Okay, that was what I was wondering, because that thing doesn’t fly without software. Yeah, I mean, what if the failsafe is actually the failure? Oh, yes. And they can do it and still salvage the bird if something goes wrong by failing over to the reliable version.

Well, I suspect... I mean, the Space Shuttle had multiple parallel control systems, right? So if you were building a fighter plane, you’d do exactly the same thing, because a stray bullet could take out a control system.

25:47 - Yeah, but there are also things on those fighter planes that I’ve found fascinating. I had a friend who used to do computational fluid dynamics, and they are doing dynamically configurable inlets and outlets on those jets to keep them flying.

26:06 - Yeah. It’s just amazing, but it’s also scary as hell. Well, let me make it a little less scary, right? One of the key components of fly-by-wire systems, where you don’t have cables and hydraulics controlled physically by the stick but it’s electronic, is that those systems work as a subsystem or overlying system. Those systems (hi, Mom) are actually treated more like APIs.

Right? So the software I was describing earlier is controlling software, but it’s executing against the software stack that is the fly-by-wire, right? It’s the adjustment layer on top of a subsystem. So the flight control system, the actual mechanical control that is executed across the wire, can be controlled by a separate piece of software, and so they can be independent in that way.

So it’s very similar to what we do with application and systems engineering.

27:20 - And that’s the thing I try to explain to people, particularly with complex systems: those models don’t change a whole lot. They just get applied in different ways. And then people think they’re going to make some grandiose change in how their systems work. It’s still the same thing.

27:39 - It’s a new label. Yeah. But Josh, you know, we just reinvented a whole new way of doing things, come on.

27:46 - Yeah, I know, because rocketry is very good. Exactly. We didn’t invent it, so it can’t be very good.

27:57 - Man, you bunch of boomers. This is actually interesting. I’d love Larry to tell me about Infrastructure as Code, and I would take us back to that, if y’all want.

28:10 - And just as a side note, part of the point of these lunches is the ability for us to sit down and talk about anything. I have topics we could bring in, but I actually like the connecting and talking and geeking out as a thing.

28:30 - And we still need to do our book club.

28:33 - I’m horrible about staying on topic. But we talked a lot about Infrastructure as Code before the Thanksgiving timeframe.

28:44 - And I had a revelation over the holidays about how all that stuff boils down. We had a grid, we had a matrix, and we had these key principles for Infrastructure as Code. And I thought about it all, and this is related to the rocket stuff: it’s junk, in that the purpose of Infrastructure as Code is to collaborate. Yes.

29:10 - And all of the principles, everything we talk about, like repos, state, infrastructure, and everything else, which I still believe is core to the tech... if you’re not figuring out how to collaborate around those things first...

29:27 - That’s why you’re doing it. The purpose is not to automate systems, right? You could automate systems with batch files, and Kubernetes can be just hard-coding everything in Go, which will work. But the point here is that you’re trying to create systems that allow you to collaborate around the automation. So actually, what is it, Rob? Yeah, it’s because all those software geeks think software is documentation.

So it’s really infrastructure as documentation.

30:00 - But you have to do it in code because the software geeks don’t know how to do it. Yeah, I think there’s more to it than that. Take the idea that I’m putting my automation in source code control: hey, I want it in source code control. Why? Well, the reason you do it is because you’re actually collaborating around what gets released into your infrastructure, and having a person be able to review it and check it.

30:28 - Right. So there are all sorts of benefits, but they all come back... I started connecting them all back to collaboration, right? When you start talking about why a CIO should care that they’re going to implement Infrastructure as Code processes, it’s like, well, because at the end of the day, your teams need to collaborate better around your infrastructure, and they need to be able to get help from external parties around your infrastructure.

And that’s fine, but I’d like to argue that collaboration is actually derivative.

30:59 - Really, transparency is the key. That’s what I was going to say, yes. Because, you know, I’ve done data center automation in the past, and it is: this executes, things happen, and then an outcome occurs. Particularly in the infrastructure space, where people utilize infrastructure without understanding infrastructure, it’s akin to magic. There’s no transparency or awareness of how things got from point A to point Z.

And Infrastructure as Code provides an opportunity for complete transparency on what is happening and how you get from point A to point Z. As a result, you’re then able to collaborate on how you improve those processes, or on how you explain and hold yourselves accountable for the outcomes of those processes. And I’m glad you said that, because I was actually going to go there as well; that’s one of the things I see too: it is about transparency.

It’s about being transparent on what’s going on. So there’s accountability.

32:07 - And at the same time, kind of like Josh was saying, it’s about getting collaboration. The goal is to get others to buy into what’s going on, but also to contribute to it. Yeah, but the problem still is that those who don’t want to collaborate still want to hold what they do near and dear to their hearts, because they feel what they’re doing is more important, and they don’t want to work toward the overall objective.

You know what I’m saying? So they don’t want... go ahead.

32:38 - Yeah, I was going to say, from Keith’s perspective, it’s, you know, what if my guy gets hit by a bus? Exactly. From the CEO perspective, so many buses. This is a valid point: there is a knowledge transfer component there. You write your Infrastructure as Code so that you enable someone with less knowledge than yourself to reproduce what you just deployed, and then attempt to understand it without having you sitting there looking over their shoulder.

33:19 - I still go back to the collaboration piece. I want to reuse other people’s code; I want to bring in other pieces, right? Because... That’s right. I totally agree.

33:31 - Transparency is a huge thing, right? I’ve been a big fan of de-magicking infrastructure. At least when you write automation, you need to be able to watch the gears turn; that’s always a test for me.

33:44 - And yet, it can be very transparent code, but you can’t share it, right? I’ve watched... you know, we’ve been talking about Terraform or Ansible, and nobody can pick up somebody else’s playbooks and really reuse them, let alone parts of them.

34:03 - And so, yeah, you could collaborate, like, oh yeah, we’ve put a Terraform plan into git and my team can use it. But nobody else can reuse what I’ve done out of that plan. And if somebody changes the stuff I depend on, then it breaks what I did, right? Oh my God: you terraform apply a new thing, it pulls down a new provider, the provider changed syntax somewhere, and that broke your whole infrastructure. I ran into that. We still aren’t there. You know, until we have all this stuff documented in repos, we can’t have reuse.

But we still don’t have the standardization, the process, the discipline to actually get to reuse. There’s just so much code out there that was used once and done, when the next level will be to sit there and go, oh, this is applicable to these other places.
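The provider-upgrade breakage described above is commonly guarded against by pinning versions; Terraform, for example, accepts pessimistic constraints written like `~> 3.2` in a `required_providers` block. The Python below is a hypothetical, simplified re-implementation of that constraint rule for illustration only. It is not Terraform’s code, it ignores pre-release tags, and the `pins`/`resolved` dictionaries and provider names are assumptions.

```python
# Hypothetical, simplified re-implementation of a pessimistic ("~>") version
# constraint check, for illustration; this is not Terraform's own code.
from __future__ import annotations


def parse(version: str) -> tuple[int, ...]:
    """Turn "3.2.1" into (3, 2, 1); pre-release tags are not handled."""
    return tuple(int(part) for part in version.split("."))


def satisfies_pessimistic(version: str, constraint: str) -> bool:
    """True if version meets "~> X.Y" (>= X.Y, < X+1.0) or "~> X.Y.Z" (>= X.Y.Z, < X.Y+1.0)."""
    floor = parse(constraint.removeprefix("~>").strip())
    if len(floor) < 2:
        raise ValueError("use at least two components, e.g. '~> 3.2'")
    v = parse(version)
    # Only the rightmost component may grow: bump the one before it for the ceiling.
    ceiling = floor[:-2] + (floor[-2] + 1,)
    padded = v + (0,) * (len(floor) - len(v))  # "3" compares like "3.0"
    return padded >= floor and padded[: len(ceiling)] < ceiling


def unexpected_upgrades(pins: dict[str, str], resolved: dict[str, str]) -> list[str]:
    """Names of dependencies whose resolved version falls outside its pin."""
    return sorted(
        name
        for name, version in resolved.items()
        if name in pins and not satisfies_pessimistic(version, pins[name])
    )
```

With a pin of `{"aws": "~> 3.2"}`, a run that resolved `aws` to `4.0.0` would be flagged, so a hypothetical pre-apply check could stop before a major-version syntax change reaches the infrastructure.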

35:11 - Let’s get everything working on the same thing. And then you have to connect it all back in, so that there is one source of truth. That’s part of the issue.

35:23 - Infrastructure as Code, like anything else, still needs to find a way back to one source of truth. And that just drives me nuts.

35:31 - Yeah. If you don’t have a feedback cycle, you don’t have reuse, you have borrowing. Exactly, yep. Or competing, right? Yep. And I was going to bring this up: Keith was part of a large operation with me. I always go back to this engagement that Keith has been involved in. One of the things that Keith and I tried to push was collaboration, basically building that whole universe where people shared within the environment.

36:04 - There was a lot of pushback, because certain individuals or teams did not want to participate in that in the enterprise, right? Keith, do you want to add anything to that from your perspective? Yeah. I’m listening to this, and I’ve had this conversation for years. This is ill-equipped organizational design.

36:29 - And what I mean by that is, Infrastructure as Code, and I think this is why it’s gotten to where people talk about changing the name, same thing with DevOps, where people say it doesn’t fit anymore, it’s not inclusive enough, it’s not broad enough. All of this is because these are techniques to achieve an end. I always draw us back to the question: what is it you’re trying to achieve as an organization, and how do you design an organizational construct to achieve that goal?

And you can use Infrastructure as Code, you can use iterative development, you can use waterfall; I don’t care what methodology you use to achieve that goal.

37:08 - But the construct of organization should be things like open transparency, celebrate your failures.

37:16 - We are peers; work as peers, as a collective team, with reuse of work. That means you are constructing work in a way that can be picked up by someone else, as long as they understand the technology being used. It should be done in a consistent way that makes sense. So we have documentation, back to Rocky’s point, that says: this is how we do this thing. This is how we create playbooks. This is how playbooks will function. This is how we handle idempotency among playbooks within the pipeline.
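Idempotency here means a playbook step can run any number of times and converge on the same result. A minimal Python sketch of the idea, assuming a plain-text config file; `ensure_line` is a hypothetical helper, not an Ansible API, although Ansible’s `lineinfile` module works in a similar spirit by declaring an end state rather than an action.

```python
# Hypothetical idempotent step, similar in spirit to Ansible's lineinfile module:
# it declares an end state ("this line is present") rather than an action.
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Make sure `line` is in the file; return True only if the file changed."""
    text = path.read_text() if path.exists() else ""
    if line in text.splitlines():
        return False  # already in the desired state, so do nothing
    if text and not text.endswith("\n"):
        text += "\n"  # don't glue the new line onto an unterminated last line
    path.write_text(text + line + "\n")
    return True
```

Running the step twice leaves the file unchanged after the first run, and the returned boolean plays the role of a "changed" report, which keeps what actually happened visible in the pipeline.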

And to what Larry was indicating, these are some of the issues we had trouble with. People love Infrastructure as Code because it’s like, oh, I’m a hardware engineer, but I get to do coding. Yay, I’m now a developer. But one of the hardest things that we tried to do... one of the things that we tried to do was institute the concept of... no, I’m sorry, let me back up.

38:08 - Because one of the things you ended up having was these system engineers who would basically go, okay, let me go figure it out, do my little stuff, and then I’ll write the code. And one of Larry’s pet peeves, from an automation perspective, is: no, you write it all the way through, you step through your code, hello. And then when you find the right solution, it’s already there. We had one guy who, every time he went through a rebuild, had to re-figure out how he had solved the problem before.

38:48 - That’s because he would not understand or adopt the automation principles we’ve talked about. So to me, and you guys know I say this all the time, we as engineers are our own worst enemy. We have to step back and recognize that Infrastructure as Code is not an organizational construct. It’s a method of achieving a goal using a technique, or tool sets, or a combination of techniques, in order to accelerate getting to the end point.

The idea is to have an organizational construct (transparency, teamwork, celebrating your failures, collaboration, all those things) that will allow Infrastructure as Code, or DevOps as a total principle, to excel.

39:35 - And Josh, I think you want to add something there? Well, yeah, because I spent a lot of years automating primarily with PowerShell. I have a private repo with thousands of scripts that are useless. I can go back now and look at them, and I might be able to pull out one clever bit that is repeatable and reusable.

40:06 - The challenge being, and I’ll mention this because it’s an extension of what Keith is talking about: while I was at Cisco, I would write automation scripts or tool scripts to help automate some of our most common maintenance tasks. Now, most of those maintenance tasks were performed by a team in India. These were phenomenal administrators; they did great work. And I would write the script, test it, and validate that it worked.

And I would submit it to them for their maintenance window, which started at, like, 10 p.m. Eastern time. And around 11 o’clock, they would call me and tell me it was okay for me to run the script now.

41:00 - After you just gave it to them? And I’m like, no, this is for you to run. Oh, yes, we’re not going to do that, because we don’t want to mess anything up. It’s like, no, it’s tested and validated; it works. What I learned is that I needed to bridge the gap between their understanding and the capabilities of the automation. So basically, I needed to provide a prompt that all it said was: you are about to execute this maintenance on this target; would you like to proceed? Yes.
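A confirmation gate like the one described can be very small. The sketch below is in Python and assumes hypothetical names (`run_maintenance`, `restart_ntp`, `switch-01`); PowerShell, which Josh was using, has its own built-in confirmation support through `SupportsShouldProcess` and `-Confirm`, so this only illustrates the idea of wrapping a tested script in an operator prompt.

```python
from typing import Callable


def confirm(action: str, target: str, ask: Callable[[str], str] = input) -> bool:
    """Ask the operator to approve `action` on `target`; only 'y' or 'yes' proceeds."""
    answer = ask(f"You are about to execute {action} on {target}. "
                 "Would you like to proceed? [y/N] ")
    return answer.strip().lower() in ("y", "yes")


def run_maintenance(target: str, task: Callable[[str], None],
                    ask: Callable[[str], str] = input) -> bool:
    """Run an already-tested maintenance task, but only after explicit confirmation."""
    if not confirm(task.__name__, target, ask):
        print(f"Skipped {task.__name__} on {target}: operator declined.")
        return False
    task(target)
    return True


def restart_ntp(target: str) -> None:
    # Hypothetical maintenance step; a real script would do the actual work here.
    print(f"Restarting NTP on {target}...")


if __name__ == "__main__":
    run_maintenance("switch-01", restart_ntp)
```

The `ask` parameter defaults to `input` for interactive use; passing a different function makes the prompt testable without a terminal.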

That’s all it needed: validation. They needed that to feel like it worked within the framework. And, over time, several of the folks got more comfortable. But when we think about things like Infrastructure as Code, we need to remember that, to Keith’s point, we have to have very specific, defined outcomes that everybody agrees we’re moving towards. And to garner that kind of collaboration and consistency, we need to recognize that, given the level of skill, knowledge, and awareness it takes to get to a uniform Infrastructure as Code framework, not everybody is going to get there.

And the biggest example of this that I share when I present and talk about automation is this: if you’re not sure about that, just watch anyone who does a demonstration of their automation tool, or whatever it is. They’ll go into the console and execute it, and then they’ll pull up a GUI to prove that it worked. You should never do that.

42:38 - Yep. And you bring up a valid point, because I went through a series of ACI automation projects from a Cisco perspective. Talking about automation from the ground up, completely deploying a full fabric all the way through, the first thing that people outside the automation team wanted was that web interface. They wanted the validation that it was working, rather than watching it actually work and hearing, we’ve already put this into pipelines.

We know this works. We know the results; we’ve tested and validated. They still had to have that validation. But the other thing I want to add... I don’t see it as a problem. Go ahead, Larry. But I mean, it’s one of those things; maybe the scenario is a little bit different. But it’s one of those things where, for those of us who have done automation, I can’t log into a vCenter interface and navigate it.

I can’t log into an ACI fabric and navigate it. I can automate the hell out of it, though, because I know all of the API endpoints and all the different things I need to do to get to the goal, right? And the goal isn’t just that single thing. It’s the collaborative thing across the board, the holistic view. But the thing I wanted to add before I hand it back to you, Rob: you made a comment about not being able to reuse playbooks and things like that.

44:09 - What we saw, and again this goes back to Keith, is that too many of, let’s just say, us, when we get into automation, only look at the thing in front of us, right? We don’t look at the bigger picture: how we develop automation skills, how we develop the automation tasks, what they look like long term. So I only look at: I need to do this thing. I don’t care about anybody else outside of that.

And I don’t care what it looks like six months from now. Where I think we need to get better is this: back to Josh’s point, we’re not doing anything different than what we’ve done manually for 20-plus years. We’re just able to automate those things now. And we know what it takes to get from here to there. We need to bridge those gaps.

45:00 - Just to make things more portable and usable, with less friction across the board. This, to me, comes back to the fact that a lot of the tools we use were designed to solve individual problems on individual systems. They weren’t designed with collaboration, to Josh’s point, or with transparency, to my point.

45:18 - Yeah, collaboration, right? Yep. Those were added after the fact.

45:24 - They weren’t first. Transparency and collaboration, I think, are critical to building good automation systems, and they support the organizational design. Some of what you were talking about, to me, is empathy.

45:39 - And I saw an SREcon talk that was really good about empathy in automation.

And Josh, like your thing about: before something starts, I want it to treat me, you know, ask me a question, acknowledge my anxiety before the thing runs. Or, hey, I set up automation and something big is happening; having a UX that has empathy with me and my anxieties, as the operator, so I can see that it happened? I don’t think that is wrong.

46:10 - But there’s a dimension in a lot of these automation systems that we don’t acknowledge: perspective. Who is the user versus who is the developer? If it’s an operator, you need to actually design with the operator’s perspective in mind. If you’re the one who wrote it, then you’re going to use the console. But if it’s an operator, you want these extra cues, because you didn’t write it.

So you need some sort of communication.

46:52 - It’s an individual tool versus an organizational tool; an organizational tool has to have more cues in it. So Rob, I’m glad you put that empathy down.

47:08 - And I would like to extend empathy, the way it has been described here, beyond the warm and fuzzy, right? Empathy is not just, you know, let me take, quote, unquote, feelings into account. I’m not talking about that, right? Generally, if you’re managing a very large, broad software development lifecycle internally, I’d like people to regard empathy as also meaning: what is the specific build that I am part of? So forget infrastructure; take just a large software development cycle, anything where I’m building a component within the software, right? If I’m responsible for a library, a self-contained library, empathy means knowing how it interacts with the larger software code.

47:54 - As long as all the contributors to each library know where it falls in the larger software build.

48:03 - I regard that as empathy as well. Now, people try to do this with software contracts: what they’re building, what component they’re building. But this is where this anxiety happens as is. It helps if people understand where the component they’re building sits in the entire large software stack, and what the dependencies are (not necessarily the exact dependencies).

Because that becomes very hard in a large software process, right? Just knowing where the potential dependencies are, I regard that as empathy as well, right? If you look at how we do cloud builds now, somebody who is perhaps doing an upgrade, or would like to incorporate a new IAM build in their Terraform provider, needs to have this empathy: where exactly are all the secrets that are managed within it? What are the touch points involved? What other components could it perhaps interface with? This is where it helps.

So, before I go on and proceed: hey, I’ve got to talk to my team as well, because this is where the failures will happen. If I go ahead and change what my tokens are, or if I incorporate new tokens within my serverless build, I’ve got to make sure I actually talk to you guys as well. Yeah, I love that as well. It’s so important. So, absolutely.

49:34 - And this is where you say collaboration really, really matters. Yes, collaboration matters, right? If you look at that whole Möbius strip of a DevOps lifecycle that people talk about, each person is responsible. Each team is responsible for different parts of that lifecycle, while there are lots of other little circular loops that go on in each build process. You have to know what you’re taking in and what you’re doing,

50:00 - and what you’re impacting in that whole DevOps lifecycle. It’s a lot easier for me to understand that, because for people who have managed software development lifecycles, very complex software builds, it’s intuitive. So I don’t necessarily like this demarcation, where empathy is just: are you an operator, versus are you a database build guy, or a database performance guy? All of that can collapse if you regard it all as a large software development build. There could be components to it, sure, that are very hardware-centric, hardware-focused.

But if you start abstracting even the hardware builds, regard them as just software builds; they’re objects in the environment. As you’re building software, you ask: what am I doing when I change this? Who are the touch points? Now, you’re going to be very hard-pressed to find one PM, one product or program manager, who will understand all of this and take on the responsibility of saying, oh, I’ll manage all these different iterations that are going on.

And I have to make sure that all of them match. Unless you have this comfort level between every person who’s participating in it, you don’t get that freedom. And Josh, to your point about maintenance scripts, I saw something very similar, but in reverse. When I was at EMC, we had our software build team in India, and somebody higher up decided to take the dev part of the build team in India and just bring it over to China.

The China team had just been doing automation, unit tests, regression tests, and all of a sudden they were responsible for freaking development.

51:47 - So it was trial by fire: how do you get them into a development mindset, when all they knew was how to run unit test scripts? That’s it. And now they were burdened with: oh yeah, we’re not giving you shell scripts; we’re telling you what to actually write. You’re not writing automation scripts anymore.

52:08 - To bring it full circle with Infrastructure as Code. Yep. I think the biggest challenge with Infrastructure as Code, and Rob touched on this earlier, and I know Larry made reference to this, really comes down to a lack of empathy from the infrastructure providers in understanding the ramifications of the changes they make to their code, to their APIs. I tell you, one of the biggest selling points I ever had (and it was funny, because some of the product managers at SolidFire were like, why are you talking about this?) was the fact that we had side-by-side APIs. Every API version for a SolidFire system existed in code and was accessible

52:52 - in perpetuity, every version of the code up to head. So all I had to do was tell my strongest automation supporters for SolidFire: just hard-code to the version that you have. Then it doesn’t matter; you can update the SolidFire system. I’m using SolidFire as an example, but this is an example of how you can be empathetic to consumers of Infrastructure as Code: if they just hard-code to the version they’ve validated and tested, you can still provide all the updates to the software, indefinitely, without them changing their infrastructure automation.
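The side-by-side API idea can be sketched as a service that keeps every released version callable while automation pins the version it was tested against. The Python below is a toy model under that assumption; `VersionedApi`, `PinnedClient`, `CreateVolume`, and the `dedupe` field are all hypothetical and do not describe SolidFire’s actual API.

```python
from typing import Callable, Dict

Handler = Callable[[dict], dict]


class VersionedApi:
    """Hypothetical appliance API that keeps every released version callable."""

    def __init__(self) -> None:
        self._versions: Dict[str, Dict[str, Handler]] = {}

    def register(self, version: str, method: str, handler: Handler) -> None:
        self._versions.setdefault(version, {})[method] = handler

    def call(self, version: str, method: str, params: dict) -> dict:
        try:
            handler = self._versions[version][method]
        except KeyError:
            raise ValueError(f"{method} is not available in API version {version}") from None
        return handler(params)


def create_volume_v1(params: dict) -> dict:
    # v1 response shape: name and size only.
    return {"name": params["name"], "size_gb": params["size_gb"]}


def create_volume_v2(params: dict) -> dict:
    # v2 adds a field; v1 callers never see it because v1 is still served.
    return {"name": params["name"], "size_gb": params["size_gb"], "dedupe": True}


class PinnedClient:
    """Automation hard-coded to the API version it was validated against."""

    def __init__(self, api: VersionedApi, version: str) -> None:
        self.api = api
        self.version = version

    def create_volume(self, name: str, size_gb: int) -> dict:
        return self.api.call(self.version, "CreateVolume", {"name": name, "size_gb": size_gb})


def demo() -> tuple:
    api = VersionedApi()
    api.register("1.0", "CreateVolume", create_volume_v1)
    client = PinnedClient(api, "1.0")
    before = client.create_volume("db01", 100)

    # The vendor ships an upgrade: 2.0 is added, and 1.0 keeps working.
    api.register("2.0", "CreateVolume", create_volume_v2)
    after = client.create_volume("db01", 100)
    latest = PinnedClient(api, "2.0").create_volume("db01", 100)
    return before, after, latest
```

In `demo()`, the pinned client gets identical responses before and after the upgrade, while anyone who opts into 2.0 sees the new field. The cost of this design falls on the provider, which has to keep serving old versions.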

We don’t see that very often. Instead, we say these changes to your automation are gatekeepers to you being able to take advantage of whatever the latest and greatest features are. In our case, we released an update that improved efficiency by something like 25%. That’s huge; that made millions of dollars for some people. But if you weren’t able to upgrade your automation framework, you couldn’t take advantage of that.

Yep, that’s awful. It’s horrible. So, a good design principle that I’ve started to use: yes, Terraform gives you providers, so you can have specific providers. But if people don’t have an appetite for that, they say, no, we don’t want multiple providers, because how do we manage all this? And this is part of where the education comes in. Even in the build push, you can start pushing out feature flags and telling people: hey, these are what will become relevant to you.

So give them some comfort level, month to month, a quarter ahead. They start seeing these flags switched off, but as a reminder they know what they’re working toward, so for the next build they know exactly how the other part of the code is going to use it. The flags aren’t active, but they still show up as feature flags, for any new developer they bring on.

Even that person, out of curiosity, should be asking: hey, I know it’s switched off.

55:00 - What is this? What are we building? This is where empathy comes in, right? Always have some kind of forewarning that this is what we are all building toward. And, oh, by the way, make sure you hard-code those versions of providers and Ansible versions, and all of those. Just go there.
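Shipping a feature dark, so that it is visible in the build but switched off, can be sketched as a small flag registry. The Python below assumes hypothetical names throughout (`FlagRegistry`, `ship_dark`, `new_token_format`, `auth_header`) and borrows the token-change scenario mentioned earlier only as an illustration; it is not any particular feature-flag product’s API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FeatureFlag:
    name: str
    enabled: bool
    note: str  # forewarning: what this flag will change, and roughly when


@dataclass
class FlagRegistry:
    flags: dict[str, FeatureFlag] = field(default_factory=dict)

    def ship_dark(self, name: str, note: str) -> None:
        """Ship a flag in the build, switched off, so everyone can see it coming."""
        self.flags[name] = FeatureFlag(name, False, note)

    def enable(self, name: str) -> None:
        flag = self.flags[name]
        self.flags[name] = FeatureFlag(flag.name, True, flag.note)

    def is_enabled(self, name: str) -> bool:
        flag = self.flags.get(name)
        return flag is not None and flag.enabled

    def upcoming(self) -> list[str]:
        """Flags that are visible but not yet active: the 'what is this?' list."""
        return sorted(f"{f.name}: {f.note}" for f in self.flags.values() if not f.enabled)


def auth_header(flags: FlagRegistry, token: str) -> dict[str, str]:
    # Hypothetical token change: the new header format already exists in the
    # code, but it only takes effect once the flag is switched on.
    if flags.is_enabled("new_token_format"):
        return {"Authorization": f"Bearer {token}"}
    return {"X-Auth-Token": token}


if __name__ == "__main__":
    flags = FlagRegistry()
    flags.ship_dark("new_token_format", "switches to bearer tokens next quarter")
    print(flags.upcoming())
    print(auth_header(flags, "abc123"))
```

The `upcoming()` list is the forewarning: a new developer reading it can ask what each switched-off flag is for before it goes live.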

55:18 - All right, all right. We’re way over, and I need us to wrap it up so I can stay on schedule.

55:28 - I actually love how this conversation turned, because I think we’ve gone through a really meaningful Infrastructure as Code conversation. And I really appreciate everybody’s thoughtfulness on this. This is what it’s really, really about, and that’s how we have to explain things. So.