Orchestrating the Orchestration


Introduction

Docker with docker-compose is an amazing combination for local development and testing, but it’s not something you would simply use on its own when running containers in production, especially in distributed environments. We want to automate as much as possible so that everything can be done with one simple command in Slack. (And yes, ChatOps is quite an efficient way of working with your environments.) When we’re working with (really) big environments, it’s no longer enough to have deployment scripts that SSH to each node, or tools like Capistrano.

So, what can we do about inefficient deployments and, more broadly, about production environments as a whole?

Enter Kubernetes

After poking around a couple of other solutions, such as swarm mode in Docker, Kubernetes felt like the more mature, optimisable and scalable option. And although it has its upsides, it comes with a lot of quirks.

In order to get Kubernetes to qualify for production you need to do a couple of things yourself.

  • Plan its deployment (Terraform, scripts, etc.)
  • Configure monitoring tools (Heapster, InfluxDB, Grafana)
  • Configure logging tools (Fluentd, Elasticsearch, Kibana)
  • Configure RBAC access control
  • Set up autoscaling
  • Figure out updating

If you only deploy Kubernetes by itself you basically get a really fancy environment which can do a lot for you, but it will be insecure, prone to failure and tedious to manage.

Now, the question is - weren’t orchestration tools supposed to make it easy for us to work with distributed production environments? To be honest, that’s pretty idealistic. In its current state Kubernetes requires manual maintenance, and although that does not necessarily mean it is complicated, why can’t we automate it?

Personal experience

At the company I currently work for, we ended up creating a step-by-step plan for upgrading Kubernetes. It basically consists of:

  1. SSH-ing to every node and updating the running kubelet
  2. Updating the cloud-config of the worker autoscaling group in AWS and replacing the master nodes one by one with the new cloud-config

As you can see, this can be a tedious process. It requires SSH-ing to nodes that run production workloads and updating a lot of configuration, which is already quite a risk. The last thing we want to do is SSH to nodes and manually update them.
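
To make the tedium of step 1 concrete, here is a rough Python sketch that only prints the commands such an upgrade would involve, one node at a time. It is not our actual procedure: the node names, the `core` SSH user, the version string and the `/opt/bin/update-kubelet` script are all hypothetical placeholders, and draining each node before touching it (and uncordoning it afterwards) is a precaution I am assuming here, not something the plan above spells out.

```python
import shlex


def upgrade_plan(nodes, kubelet_version, ssh_user="core"):
    """Return the shell commands for a one-node-at-a-time kubelet upgrade.

    This is a dry run: nothing is executed, the commands are only listed.
    """
    plan = []
    for node in nodes:
        # Hypothetical remote step: how the kubelet is actually updated
        # depends entirely on how your nodes were provisioned.
        remote = (
            f"sudo /opt/bin/update-kubelet {kubelet_version} "
            "&& sudo systemctl restart kubelet"
        )
        plan += [
            # Assumed precaution: move workloads off the node first.
            f"kubectl drain {shlex.quote(node)} --ignore-daemonsets",
            f"ssh {ssh_user}@{shlex.quote(node)} {shlex.quote(remote)}",
            # Let the scheduler place pods on the node again.
            f"kubectl uncordon {shlex.quote(node)}",
        ]
    return plan


if __name__ == "__main__":
    for command in upgrade_plan(["worker-1", "worker-2"], "v1.8.4"):
        print(command)
```

Even in this stripped-down form that is three commands per node, and it still leaves out step 2 entirely.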

Tectonic by CoreOS

Tectonic is actually a pretty nifty solution for managing Kubernetes. You can use it to deploy a Kubernetes cluster with Terraform (yes, Tectonic uses Terraform), update it, manage the nodes and also work with Kubernetes itself - manage deployments, services, etc.

The software comes with CLI and GUI installers for deploying Kubernetes and a handy dashboard for further management of the running cluster.

What I personally think is really nice is the built-in Prometheus monitoring for your cluster, so you don’t have to worry about the whole Heapster and InfluxDB setup. If you have ever had to deal with setting up RBAC authorisation in Kubernetes, you’ll be happy to hear that you can manage that via Tectonic too.

Orchestrating the orchestration

Now, my question is - why isn’t Kubernetes shipped with Tectonic by default? A whole extra layer of orchestration has been added for the purpose of orchestrating our orchestration tool. It is kinda funny for me to see that so many systems exist on top of each other when it can all be boiled down to one. I think it would be amazing if Kubernetes just shipped with Tectonic by default and we would not have to worry about all of the management.

For anyone that would like to start using Kubernetes in production - take a look at Tectonic. 😃