Running Kafka in Kubernetes

Nov 27, 2020 08:36 · 4619 words · 22 minute read · jmx · kubernetes

My name is Ned Beveridge, and I'll be sharing with you how we run and manage Kafka clusters in Kubernetes at Amadeus. Amadeus is a provider of IT services for the travel industry, so if you came here by plane and stayed in a hotel, you might be using our services without knowing it. We're celebrating 30 years this year, which makes us much younger than our biggest competitor, which is a Texan company. We have been running Kubernetes since its inception, so for three years, which is basically 30 years in Kubernetes years, on our own premises and in public clouds. I'm a solution architect at Amadeus, helping our business units develop new platforms or migrate existing ones. In 30 years we have accumulated quite a number of applications, and today's migrations are almost always towards Kubernetes.

We use Kafka, both outside Kubernetes and next to Kubernetes. We have actually used it since we started collecting logs and events, and we have been installing it using Puppet, which is an experience we don't like to repeat. Recently we had the idea to start using Kafka for more functional things, and we are building a streaming platform: a number of operational events from the airlines, things like bookings or boardings, go into the platform, a whole set of microservices processes them in a pipeline, and certain actions are executed at the end. We use Kafka as the underlying messaging infrastructure.

This is supposed to be an advanced session, so I guess you all know Kafka, but to be on the same page: what is Kafka? It's a streaming platform. You have a cluster of servers, and the servers, called brokers, store streams of records in topics. Topics are split into partitions, which are spread over all the brokers, which allows for horizontal scalability: you can add new brokers and new partitions and accept more traffic. You can replicate those partitions for higher availability: if one broker dies, you have a backup on another. And you have producers and consumers as clients of this platform.

So, can we run it in Kubernetes? When you create a normal ReplicaSet or Deployment in Kubernetes, you get pods whose names end in a random-ish extension, which does not fit well with Kafka. In a Kafka cluster, each broker has its own unique identity: it has an ID, but it also has to have its own unique network address, so that brokers can talk among themselves and clients can talk to brokers. Brokers also need persistence to store their partitioned log files. In addition, you need another thing next to the Kafka cluster: a ZooKeeper cluster, because Kafka stores a lot of metadata inside ZooKeeper. And that is basically exactly the same problem again: another cluster with its own identity and persistence requirements.

Luckily, we now have StatefulSets in Kubernetes. Actually, when we started, those were called PetSets; they have evolved, and they will soon be exiting beta. So what is a StatefulSet? Unlike traditional pods, if we can call something three years old traditional, StatefulSet pods have a stable pod identity: they'll be called pod-0, pod-1, pod-2. You also need to create a headless service, which acts as a kind of subdomain within your namespace, so your pod's address will be pod-0.service-name.namespace. StatefulSets provide stable storage, and they provide ordered startup and shutdown: pods start as 0, then 1, then 2, and shut down in reverse order. And since recently, there are rolling updates.

So to run Kafka and ZooKeeper we need two StatefulSets. At Amadeus, when we run Kafka and ZooKeeper deployments, we always run one Kafka cluster with one associated ZooKeeper cluster. Those of you who operate Kafka know you can share a ZooKeeper ensemble across several clusters; that's not what we're doing, we deploy them one-to-one.
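As a sketch, a minimal headless Service plus StatefulSet pair for the brokers could look like this; the names, image, and port are illustrative, not our actual manifests:

```yaml
# Headless service: clusterIP None, so each pod gets a stable DNS name,
# e.g. kafka-0.kafka.<namespace>.svc.cluster.local
apiVersion: v1
kind: Service
metadata:
  name: kafka
spec:
  clusterIP: None
  selector:
    app: kafka
  ports:
    - port: 9092
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
spec:
  serviceName: kafka        # ties pod DNS names to the headless service
  replicas: 3               # pods are named kafka-0, kafka-1, kafka-2
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
        - name: kafka
          image: example-registry/kafka:latest   # hypothetical image
          ports:
            - containerPort: 9092
```

The ZooKeeper StatefulSet follows the same pattern with its own headless service.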
In addition, there is a discovery service. Unlike the headless service, which doesn't have a cluster IP, the discovery service has a cluster IP, and it allows clients to simply say "I want to connect to Kafka". It bootstraps them: you don't have to tell them "go to kafka-0", you tell them "go to kafka", the connection lands on one of the brokers, and the client then learns about the full cluster.

Of course, if you want to deploy things in Kubernetes, the first thing you need is containers, and you need your descriptors. And then, to facilitate replicating deployments and doing them several times, you need things like charts. There are a number of projects out there on GitHub; I think most are inspired by the kubernetes-kafka project, and the ones in bold are inspired by how we operate inside Amadeus: a chart and an operator.

So I'll try to do a small demo; I hope the demo gods are on my side today. It worked on the plane coming over here. The first thing I will do is deploy a producer and a consumer, and of course, because I haven't deployed a Kafka cluster, they will probably fail royally. And they do fail: there's no Kafka there, there's nothing in the cluster. So let's delete those two, and I'll now deploy the cluster. At Amadeus we actually run OpenShift as our Kubernetes distribution, so we use OpenShift templates to deploy; here I'll be using Helm charts. Helm is a package manager for Kubernetes, and it's a great way to reproduce your deployments, or to have multiple deployments in one environment if you want. So here I deploy my Kafka cluster, which takes just a few seconds, and it will deploy all the elements necessary: several StatefulSets and the services that are needed.

So let's have a look. First we have our StatefulSets up there, two of them, Kafka and ZooKeeper; I requested three replicas of each. And we have the services I was mentioning just before: headless services with no cluster IP, and cluster-IP services for the discovery. There are a few other things deployed here, but I will talk about those later.

While it spins up, let me share a few of the practices we use at Amadeus when deploying Kafka. Kafka's performance is pretty much disk-I/O and network bound, so we want to land Kafka brokers on instances with good disk performance, which basically means SSDs. We use node selectors for this: all the brokers have node selectors so that they are deployed on nodes which have a label like disk: fast or something similar. But there is something else we want: we don't want all our Kafka brokers to land on the same node with this label, because if we lose that node, we lose the whole cluster. So we use anti-affinity, a Kubernetes feature which lets us say: take the pods which have this specific label and spread them across some topology key. Here we use the hostname key, so we are basically telling Kubernetes: when you deploy these Kafka brokers, please spread them across different machines. We use the preferred variant; you can also enforce it and say they have to be on different machines.

Another thing Kafka needs is persistent storage. When you use a StatefulSet, you can use volume claim templates, where you specify the kind of volume claim you want, and each pod gets exactly the same type of volume claim: if you have six pods, you will have six different volume claims. And for those of you reading the slides instead of listening to me, it's written there that we do things completely differently, because the common wisdom when running Kafka is that you want to keep your logs, so you want persistent volumes: a volume gets provisioned, it's attached to your pod, it stores your data, and if your pod dies, you get it back again.
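The node selector and anti-affinity rules described above might look roughly like this inside the broker pod template (the disk: fast label is the example from the talk; the rest is an assumed shape, not our exact manifest):

```yaml
spec:
  template:
    spec:
      nodeSelector:
        disk: fast                     # only nodes labelled as having fast (SSD) disks
      affinity:
        podAntiAffinity:
          # "preferred": spread brokers across hostnames, but still schedule
          # if that is impossible; use requiredDuringScheduling... to enforce it
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchLabels:
                    app: kafka
```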
In our particular case, we are building a streaming platform with pretty strict SLAs: from the moment an event comes in, all the actions should be taken within a few seconds, at most a few minutes. So we can keep the amount of data on the brokers fairly limited, and we want high performance. We could use hostPath, but that's not good security-wise. So what we actually do is, for each pod in the set, we simply say: use an emptyDir volume. It's a local disk, it's an SSD, it's fairly fast. If the container crashes, we get the same emptyDir back and everything is fine. If the node crashes, we lose it. But we are running Kafka, and one of the selling points of Kafka is replication: you have a copy of your partitions replicated elsewhere. So when the pod is spun up on a different node, it will eventually get back in sync with the current leaders and be able to serve again. Of course, for this to work you have to have enough brokers and enough replicas: if you have a five-broker cluster and two replicas, you can only afford to lose one broker; if you lose two, you might be in trouble.

What's coming soon in Kubernetes, it's actually in alpha already, is local persistent volumes: volumes which live on a local machine, with the pod then only scheduled on that particular machine. It's something we'll be looking into in the future; honestly, the current behavior with emptyDir has been sufficient for our use cases.

You also have to monitor what's going on. There are several approaches to monitoring Kafka; you can use the Kafka scripts, for example. What we actually do at Amadeus is use a TCP socket check: we open sockets on the Kafka brokers, because that's what tells us a broker is running and accepting connections.
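Both choices, emptyDir storage and the TCP-socket health check, fit in the broker pod template; here is a minimal sketch, with illustrative paths and image:

```yaml
spec:
  template:
    spec:
      containers:
        - name: kafka
          image: example-registry/kafka:latest   # hypothetical image
          readinessProbe:
            tcpSocket:
              port: 9092        # "accepting connections" is the health signal
          volumeMounts:
            - name: data
              mountPath: /var/lib/kafka   # illustrative log.dirs location
      volumes:
        - name: data
          emptyDir: {}          # node-local disk; lost if the node dies,
                                # recovered through Kafka replication
```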
We also use Prometheus and JMX. We repackage the containers ourselves, deriving from what has been done in the fabric8 project, which for everything JVM-based automatically exposes Prometheus and JMX endpoints. So we can have nice dashboards, and if the operators need to do something, they can connect directly to the pod and look at the JMX beans.

Before diving into operators, let's continue with our demo. Okay, I'm having a little trouble here; let's deploy our consumer and producer again and see what happens. Oh, it's not working. Well, actually it is connected: it's no longer the big Java exception we saw before. It's connected, but we are running in a multi-tenant environment. We have microservices published there by dozens of teams, publishing independently, and we want to control who connects to which cluster. We don't want "okay, you just point at kafka:9092 and you're connected to the cluster and you start publishing"; we want to identify our clients. So as part of our process, before I installed anything, we create separate secrets, and there are secrets published for each of the Kafka clusters which allow clients to connect; there's a JAAS file inside. So let's deploy the secured version. It has to spin up, and eventually, hopefully... yes, it's there. And it's not running; it's still failing, but with a different error: an unknown topic. Because if you want to run things with Kafka, you need to create topics.

There are basically two ways you could do that. You can say anyone can create topics, which basically leads to anarchy, where you have hundreds of topics. Or you may use the Kafka scripts to do it, so either there is a person typing the commands, or you have Ansible or whatever tool you use to create those topics.
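The per-cluster client secret with a JAAS file inside could be shaped like this; the secret name, the SASL/PLAIN mechanism, and the credentials are assumptions, since the talk doesn't give the exact details:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: kafka-client-credentials      # hypothetical name, one per Kafka cluster
type: Opaque
stringData:
  kafka_client_jaas.conf: |
    KafkaClient {
      org.apache.kafka.common.security.plain.PlainLoginModule required
      username="demo-client"
      password="not-a-real-password";
    };
```

A client pod would mount this secret and point java.security.auth.login.config at the mounted file.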
Which brings us to the subject of operators. So what is an operator? It's a pattern of transposing the domain knowledge of SRE, operations, or release teams into executable code, to automate behavior. Based on some kind of descriptor, you describe what you want to have, and the tooling does it for you. We use a couple of operators: some are open source, like the Prometheus operator; the ones in blue are the ones we have written ourselves, some of which are open-sourced already, like the workflow operator, and some will be, like the Kafka cluster operator.

One of the first things operators do is provision clusters; that's what the Prometheus operator does, for example, and it's what our cluster operator does among other things. That said, as you saw, I can fairly easily provision clusters using Helm charts or OpenShift templates, or with the approaches shown in today's keynotes. And once the platform is up and Kafka is running, for us it stays in place: it will be there for a fairly long time. We can scale it up if there's a problem, and that's usually what we need. Scaling down, evacuation when you have to upgrade nodes, and upgrades themselves are tricky; some of these things work thanks to the Kafka applications themselves, and I'll talk about that a little at the end.

But then you have the question of topics. As we deploy dozens of microservices, we need dozens, and even more, topics to be created. We want the topics present in the target environment when we deploy the microservices, so they can start running immediately. We want to delete them when a microservice is no longer there, particularly because topics can be rearranged for different customers. We want the same behavior across environments, in development, in QA, in production, but also across different production clusters, as we may have clusters running on our premises or in the public cloud. And we want to be able to react to things like "this cluster is running out of disk space: let's reduce the retention time for Kafka".

We also want to deliver all of this as code. When developers finish their coding, they do their pull request, a container is built, and from the description of their project we generate the deployment YAML file and this descriptor of the topic, which then goes into Kubernetes. There is then an operator process in Kubernetes which watches it and applies it.

So what does a topic look like? Well, for us it's a ConfigMap; it might soon be a custom resource. It's a ConfigMap which basically says: I want a topic with this name, which is the name of the ConfigMap, with this partition count, with this replication factor, and, if someone knows how to configure things in more detail, these properties. Whenever this ConfigMap is created in Kubernetes, the operator managing the cluster will create a topic for it, and whenever the ConfigMap is deleted, it will delete the topic. It's exactly the same behavior as the service catalog's provision and deprovision; we intentionally mapped it that way, because we are moving towards offering IT services internally through a service catalog.

So let's do it: let's create this ConfigMap. It's there, and let's have a look at what's going on. We have a long list here... oh, and it's working. As soon as the ConfigMap was there, the topic was created, the publisher started working, and the consumer started consuming. I can show you what happened inside the operator: there's a log which basically says "I've seen that you have this ConfigMap and I've created a topic for it".

So that's fine, and it allows us to manage all the topics, but we want to go a step further. We are conscious about security, and we want to control access to the topics. We don't want a developer hard-coding a topic name in the code so that a deployed microservice suddenly starts publishing to a topic where it shouldn't.
So we are using access control, where each deployment comes with annotations saying: listen, I'm consuming this topic, and I'm publishing to this topic (and it can be multiple topics). What the operator does is monitor these deployments; it will pick a random user, assign it to the deployment, create a secret, and assign the rights in Kafka to use it. So let's do this part of the demo, deploying the ACLs. If we look at the operator logs, we immediately see that the operator has seen that there's a new deployment called kafka-producer and has assigned a user for it, and it does the same thing for the kafka-consumer a bit lower. And if you look at the secrets, you can see we now have credentials created for our producer, and here, credentials created for the consumer. These credentials contain users which, in this particular case, can only read from one topic for the consumer, and only publish to one topic for the producer. In the previous case, where we deployed the credentials directly, they could publish to anything. Okay, I'm a little ahead of time, so that's good.

Kafka upgrades. This is the thing we have wanted to build since the inception of the Kafka operator. During our work on this, I think there were three or four Kafka releases that required upgrades, and each of them had a slightly different scenario, which doesn't really look like something that can be easily automated. For example, if they change the inter-broker protocol, you might need to upgrade using the old protocol first: you roll out one upgrade, then you make the configuration change, then you roll a second upgrade. Hopefully, now that they are at 1.0, those things will happen less. Or maybe they
change the storage format, in which case you might have to update the consumers first, then go on to the servers. I had this idea: don't upgrade, but recreate the cluster instead. Basically, you have one cluster running the old version, you create a new cluster with the new version, they run next to each other, you can publish to both clusters, and then you switch. It's a kind of blue-green deployment. It comes with its own set of problems, and actually I wouldn't suggest it anymore; we'll be looking into automating the first approach, finding a way to do it simply, in the future.

And performance. As I said, Kafka's performance is basically dominated by disk I/O; that's what we have experienced, so have a good disk. We are opting for SSDs, but with a good disk, maybe even network storage would work. The second factor is the network, and it's almost never CPU or memory; those stay fairly low even for throughputs like 100k messages per second.

There are a couple of things which appear strange when doing tests. Sometimes a Kafka broker and a ZooKeeper node land on the same Kubernetes node, which reduces network traffic quite a lot, because they are talking to each other locally. And sometimes clients land on the same node: you might see that you have 20 instances of a microservice pod, and two of them have this super high performance while all the others lag behind; they landed on the same node as the Kafka brokers and are talking locally. These are things to be ready for in a cloud environment: not everything always behaves the same.

And I think that would be it for the presentation. If you have any questions, feel free to ask.

Q: Is this Kafka cluster zone-aware? How are you handling HA?
A: Our Kubernetes clusters are spread across our infrastructure, and we basically rely on the fact that they will be split across different zones in our data center. The Kafka clusters themselves are not zone-aware; there's nothing like rack awareness. We operate our own data center, and while we would deploy to other clouds, we primarily run in our own data center.

Q: Do you have any constraints on the number of persistent volumes you create per node?

A: As I said, when we run Kafka, we run it with emptyDir, so we don't create persistent volumes for it. We are not using persistent volumes much at the moment; we use them for some of the monitoring tools, but we haven't put any specific constraints in place.

Q: How do you handle upgrades of the Kafka operator itself?

A: The operator is completely stateless, so it's upgraded out of band; it can be updated at any moment in time. And all the updates it does, it does by delta, like Kubernetes itself: it checks what the Kafka cluster looks like, what it has in Kubernetes, and applies the delta.

Q: Does the Kafka operator listen to changes?

A: Yes, it's watching changes.

Q: Do you provide a way to get to Kafka from outside the cluster, meaning clients outside of the cluster?

A: No. If someone has a solution for that, we're very interested.

Q: Did you write the operator with the Java client for Kubernetes?

A: Yes, with the Java client for Kubernetes and the Java Kafka admin client.

Q: Confluent suggests running Kafka clusters on bare metal for performance reasons. What do you think about that?

A: Our approach is that we tend to run as much as possible in VMs, and then run it in Kubernetes. Maybe we'll be running Kubernetes on bare metal eventually, but currently we're running on VMs. We had exactly the same suggestion from Confluent to run on bare metal, but we went for VMs.

Q: Why is the headless service needed?

A: It's needed for the StatefulSet; it's by design, that's how StatefulSets behave. You need to provide the headless service, and then the pods get a DNS name which is pod-name.headless-service.

Q: What is your typical cluster size?

A: In this platform, our typical cluster would be five brokers.

Q: Is the order of deployment that ZooKeeper has to be up and running before you deploy Kafka?

A: We don't enforce the order of deployment. Actually, let's just do a kubectl get pods... you can see that kafka-0 restarted once: it was restarting because it said there was no ZooKeeper, so it crashed and restarted again.

Q: So you're letting Kafka crash and recover until ZooKeeper is up and running?

A: Yes.

Q: You put up the link to your operator on GitHub. Do you have any other custom code that you use to run it, or is that everything that is needed?

A: No, it's not exactly what we run in-house, if that was your question.

Q: Would you be able to describe any of the customizations that you had to do?

A: We have a few customizations, mostly around security, but it's pretty much a similar thing. Also, in-house we are less up to date with recent versions of Kubernetes and OpenShift compared to the published one, so there are no restrictions on running the published version. We are actually in talks with some open-source players to make this really open source; if that doesn't lead anywhere, we will be open-sourcing it from the Amadeus side. But if there is to be a community, we think there should be someone who is more into this kind of business.

Q: What kind of storage do you use when you run it in the public cloud?

A: As I said, we don't run this with persistent volumes. In the public cloud we use instances which have SSDs, even on public clouds.

Q: Can you describe your general strategy for scaling down?

A: The general strategy for scaling down would be a lot of manual work: moving partitions and replicas, and then scaling down. One thing is that most of the problems we experienced were due to humans scaling down without taking the steps they had to take beforehand, so it would be really great if that could be automated.

No more questions? Well, thank you very much.