Containers have been around for quite a few years now. We can trace the concept back to 1979 and the introduction of the chroot system call, but it wasn't until BSD Jails, Solaris Zones and LXC arrived in 2000, 2004 and 2008 respectively that the technology started to mature. Zones in particular became incredibly stable very early on, offering a very high level of isolation and performance and proving capable of supporting multi-tenant systems.
With the rise of VMware and IaaS providers like AWS, container technologies took a back seat as the masses embraced cloud computing. Containers weren't fully able to satisfy the demands of ephemeral, dynamically scaling systems. In more recent years, however, Docker has revitalised interest in the technology by introducing the idea of application containers, along with a powerful set of tools and infrastructure for maintaining container images.
Expanding the benefits beyond performance and resource utilisation gains, Docker improved standardisation, configuration management and portability, meaning containers are fast becoming the next hot technology (if they're not already). However, they still present some challenges in the cloud. Monitoring, orchestration (e.g. automated scheduling and auto-scaling) and service discovery in particular are an additional burden.
To solve these issues, whole new solutions have arisen. Technologies like Kubernetes and Mesos have been developed to handle the orchestration and scheduling of containers. Although they introduce some issues of their own (complexity; automation, customisation and deployment across varying applications and cloud providers; engineer up-skilling; support and upgrade dependencies), they are powerful technologies and do a good job.
There are many well-written articles (download the linked document from Rancher's article) and quick set-up guides for these systems, so I'm going to step aside from them and look at an alternative method - one which utilises a system already incumbent in most environments, plus a couple of very simple caretaker scripts. We call it the Concierge Paradigm. So if you've struggled with some of the additional challenges that running containers in the cloud brings, or you're just interested in different points of view, read on - I'm sure you'll find this interesting.
The Concierge Paradigm idea was born over time, but it initially came from the challenges we faced in working out how to operate scalable, enterprise-grade application containers. This meant monitoring, logging, deploying, scaling, load-balancing and service discovery, both on-premise and in AWS. We were running some Docker tasks in ECS and a Kubernetes and CoreOS cluster on-premise, but we hadn't chosen these by researching different options - they were pretty much the first options we came across. Having felt the difficulties of that decision, we started to look out to the wider community for other options. This is when I came across the Autopilot Pattern, developed by some clever cookies at Joyent.
Joyent had been running a scalable, container-native infrastructure for years before almost anyone else, so it wasn't a surprise that they came up with a simple and elegant solution to container orchestration. The general premise of the Autopilot Pattern is that instead of maintaining complex frameworks to manage your containers, you strip the infrastructure down to its most essential component - service discovery (Consul, Etcd) - then keep the orchestration logic with the application code. This of course passes the responsibility back to the developer, but who knows better than they do how the application should run? Furthermore, the logic can be kept in the same source code repository as the application code rather than in some other external system. Below is a workflow diagram for Containerpilot (previously Containerbuddy), a small piece of code that runs inside the container and is the only other component needed in the pattern.
Personally, I found the idea beautifully simple and very interesting for our particular problems, but it has many other benefits beyond its simplicity, including things like:
Further to managing the lifecycle of the application, Containerpilot also has the ability to run coprocesses. Although Docker recommends running a single service inside a container, I'm yet to find a really compelling argument for literally running one process as opposed to a single service with supporting processes. For example, a coprocess can be used for running a monitoring agent (Containerpilot has the ability to open up a telemetry endpoint, but there are a number of benefits to running an agent, which I will elaborate on later). It's here that the Concierge Paradigm began to develop.
In our bare-metal, VMware and AWS VM implementations we used a combination of Zabbix, Splunk (although we deprecated this ages ago due to costs and experience), Papertrail, Elastic Stack (ELK) and Librato, each with a particular function and at a different stage of life. Logging was handled through Docker log drivers, and we were moving to pumping everything into Elastic Stack (Papertrail offered a quick integration but was limited). Librato, similarly, had a much nicer user experience than Zabbix; it just wasn't as functional, nor incumbent, and above all it was missing the event management features that Zabbix does so well.
On the Zabbix side we ran the Zabbix Agent on all of our hosts, which took responsibility for collecting metrics from the OS, applications and any frameworks. Being a Java house, JMX and OS metrics were covered natively by Zabbix. However, as a company we'd made a decision not to open up JMX interfaces (for the Zabbix JMX Gateway) in AWS and chose to use the Dropwizard framework to push metrics to Librato. This led to a problem with capturing such metrics in Zabbix, because the delivered object was not one Zabbix understood. So we chose to use Dropwizard Metrics' AdminServlet, and used the agent to query this locally and reformat the output into a JSON object that Zabbix did understand. This allowed us to discover metrics easily by simply querying the endpoint, then capture each metric we wanted in a similar manner using curl from the Zabbix Agent.
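To illustrate the reformatting step, here's a minimal sketch in Python. The AdminServlet URL, port and metric layout are assumptions for illustration, not our exact production code:

```python
# Sketch: query Dropwizard's AdminServlet locally and flatten the "timers"
# section of its JSON into flat key/value pairs that Zabbix understands.
# The URL and port are assumptions for illustration.
import json
import urllib.request

METRICS_URL = "http://localhost:8081/metrics"

def fetch_metrics(url=METRICS_URL):
    """Pull the full metrics snapshot from the AdminServlet."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def flatten_timers(metrics):
    """Turn {"timers": {"my-api.response-time": {"max": 1200, ...}}}
    into {"timer[my-api.response-time.max]": 1200, ...}."""
    flat = {}
    for name, stats in metrics.get("timers", {}).items():
        for stat, value in stats.items():
            flat["timer[{}.{}]".format(name, stat)] = value
    return flat

if __name__ == "__main__":
    print(json.dumps(flatten_timers(fetch_metrics()), indent=2))
```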
This method worked okay for a while, but because each metric was queried separately a lot of post-processing had to happen, and as we scaled down the size of VMs in AWS and increased the number of metrics, we hit a threshold that significantly impacted CPU utilisation. This meant we couldn't capture as many metrics as we wanted, nor as often as we wanted, without seriously hammering the resources of the host. In some instances we were even forcing the application cluster to scale up due to high CPU utilisation. Later on, another problem arose: because there was a time gap between collecting each metric, we sometimes had a bad windowing issue when trying to calculate derived metrics. For example, we would see spikes of 300% 5XX HTTP response errors because the last value we captured for 5XX was more recent than the total request count metric (particularly when clients were aggressively retrying). This issue wasn't noticed in our Librato push model because a single snapshot object of all metrics was sent to Librato, with the timestamp taken from the time of delivery. It also meant that our containerised applications would be impacted even more seriously if we followed the same approach, so we had to look at other options.
One of the things we did was create our own Zabbix Metrics library for Dropwizard Metrics, written by Manish Kumar with Long Chen. There were other libraries available, but they didn't solve discovery well, and we felt this was important for automation. However, we still needed to be able to monitor container resources, underlying Docker host VMs, middleware, non-Java applications, off-the-shelf products, plus anything else we felt needed to be covered.
From research, it seemed that the Docker monitoring problem was best solved by either running an agent of some sort on the Docker host VM or running a privileged container with access to the Docker socket. I don't much like either approach, partly for the security aspects, but also because you may not always have access to the underlying machine. Enter Autopilot mode and the Concierge Paradigm.
It's probably clear by now that the first thing we did was include a monitoring agent inside the container, under the management of Containerpilot. By doing this, functionality already built into the agent allowed us to immediately answer a number of questions:
With previous performance issues in mind, the next step was to work out an efficient way to monitor applications in this environment, without the machine-gun approach of scraping metrics individually impacting the limited resources available inside a container. It was comparing this machine-gun style of metric shots with the push model - a single, accurate shot to Librato - that started me wondering whether the two could be combined.
Named after the famous British sniper rifle, the Enfield Method's main aim (pun intended) was to be the most accurate (comparing data or deriving metrics from a single shot containing all metrics returns more accurate results), the most effective (one shot does the job of many), easily deployable on different operations (able to return the same result wherever it's placed) and able to gain immediate feedback on whether we hit the target. This contrasts with the machine gun (common pull method) approach, where it takes many shots to hit all metrics (so not very accurate under comparison); or the shotgun (push method) approach, where one shot probably does the job but getting immediate feedback on a successful operation is unlikely (at distance... there's no long-range sight on a shotgun... alright, just humour me).
We accomplished our Enfield Method goal by taking advantage of the Zabbix Agent's ability to run a script on the host (although it may be more effective to create a module, writing the script in Python makes the method more agnostic to the monitoring system being used; plus there is this library for embedding a Python interpreter as a module, and numerous Zabbix Sender libraries (e.g. py-zabbix), which should improve performance). This script serves only two purposes: discovering which metrics are available, and collecting and delivering all of their current values in a single shot.
In step one, the discover_timers() function in our example script (see below) is run locally by the Zabbix Agent in active mode (i.e. after querying the server to discover what it should monitor, the agent runs autonomously, collecting metrics and sending them back to the server without the server instigating any action), which subsequently returns a list of available metrics to the Zabbix Server. The server then uses the list to create a set of trapper item keys that values can be assigned to later, in step two. Using an active agent is predominantly a push-based method, but in this case the agent still has to pull the data from the application endpoint.
In step two, the agent runs in passive mode (i.e. it waits until the server requests something before performing any action) and, upon receiving a request from the server, runs the get_metric() function of the script. This function queries the same endpoint, but this time captures the key-value pairs of metric-name:current-value from the application and pushes them, all in one go, to the Zabbix Server. The server then traps the object and assigns each value to the corresponding trapper item key created in step one. The original request still needs to receive a value, so the script then returns either the time taken to collect and deliver the metrics or, if delivery failed, an error value.
The Zabbix Server then records this as another metric - one to which we can add triggers so that we can be alerted to possible performance issues or network partitioning, or understand changes to the state of running containers. This step implements both pull- and push-based methods.
The concierge_courier.py script is so called because it takes responsibility for fulfilling a request from a client to personally collect and return a manifest of metrics:
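A minimal sketch of what that discovery side could look like (the endpoint URL is an assumption; the {"data": [...]} structure is Zabbix's standard low-level discovery format):

```python
# Sketch of the discovery half of concierge_courier.py: return a Zabbix
# low-level discovery (LLD) object listing every timer the application exposes.
# The metrics endpoint is an assumed example.
import json
import urllib.request

METRICS_URL = "http://localhost:8081/metrics"

def discover_timers():
    """Return the manifest of available timers in Zabbix LLD format."""
    with urllib.request.urlopen(METRICS_URL) as resp:
        metrics = json.load(resp)
    return json.dumps({
        "data": [{"{#TIMER}": name} for name in metrics.get("timers", {})]
    })

if __name__ == "__main__":
    print(discover_timers())
```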
Or it takes a request from a client to post an important document of data to an agreed delivery point and confirm back to the client that the package was received successfully. For example:
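Here's a sketch of that delivery side, using the py-zabbix sender library mentioned earlier. The endpoint URL and Zabbix server name are assumptions, and error handling is trimmed for brevity:

```python
# Sketch of the delivery half: capture all current values in one shot,
# push them via the zabbix_sender protocol, and return either the time
# taken to deliver or an error value (-1) on failure.
import json
import socket
import time
import urllib.request

from pyzabbix import ZabbixMetric, ZabbixSender  # pip install py-zabbix

METRICS_URL = "http://localhost:8081/metrics"  # assumed endpoint
ZABBIX_SERVER = "zabbix.example.com"           # assumed server name

def get_metric():
    start = time.time()
    with urllib.request.urlopen(METRICS_URL) as resp:
        metrics = json.load(resp)
    host = socket.getfqdn()
    packet = [
        ZabbixMetric(host, "timer[{}.{}]".format(name, stat), value)
        for name, stats in metrics.get("timers", {}).items()
        for stat, value in stats.items()
    ]
    result = ZabbixSender(zabbix_server=ZABBIX_SERVER).send(packet)
    return time.time() - start if result.failed == 0 else -1

if __name__ == "__main__":
    print(get_metric())
```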
In Zabbix, we use this as follows:
Discovery check:
What we've done here is create a low-level discovery rule that parses the values returned by a special built-in Zabbix function (or, in our case, our custom script concierge_courier.py), filters them, then makes the names of the things in that list available, as a macro called {#TIMER}, to be interpreted by our item prototypes when creating our metric key names.
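Wiring the script into the agent could look something like this in zabbix_agentd.conf (the key names and path are illustrative assumptions, not our exact configuration):

```
# Hypothetical keys: one for discovery, one for the single-shot delivery
UserParameter=concierge.discovery,/usr/local/bin/concierge_courier.py discover
UserParameter=concierge.courier,/usr/local/bin/concierge_courier.py deliver
```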
Some item prototypes:
Item prototypes are basically blueprints for dynamically creating metrics based on a given list. In our script, get_metric_record() creates a set of key-value pairs, where "timer[{#TIMER}.max]": 1200 would be expanded to something like "timer[my-awesome-api.response-time.max]": 1200. For every iteration of {#TIMER} that has a max value, a new trapper item key will be created on the server, e.g. "timer[the-best-api.response-time.max]": 800. Likewise for .count, .p95, .p75, etc.
The static items:
NB. In Zabbix terms, the type (circled in the image above) of Zabbix Agent is a pull-based method, whereas Zabbix Agent (active) is a push-based method (although it can be a combination of both).
Static items are ones which are created manually (although generally templated) rather than automatically from some dynamically generated object. Here we have three.
Triggers on the static items:
Zabbix provides a set of functions which we can use to test various conditions against our data and record events. I'll quickly explain what's happening here. We will generate an event of some specified severity, and maybe perform some automated action, when:
Now, we've gained some further advantages:
Although in theory this method should be much more efficient, our previous experiences meant we wanted to prove the performance gains by testing the collection of a very large volume of metrics with as few resources given to the container as possible. In our lab, this was 128MB RAM and 20% of a CPU core.
To keep things simple, metrics were collected from a file rather than by running and querying an application endpoint. This actually stressed the test even further because we were performing disk IO. We also wrote to disk again and passed this file, for the test, to Zabbix Sender. Clearly there are some optimisations we could make, but this does highlight performance under restrictive conditions.
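For reference, that bulk delivery can be done with zabbix_sender's input-file mode; something like the following, where the server name and file path are assumed examples:

```
# Bulk-load a file of pre-collected key/value metrics in one connection
zabbix_sender -z zabbix.example.com -i /tmp/concierge_metrics.txt
```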
The top-left graph shows the time taken to deliver the metrics (the value returned in step two). All other stats should be self-explanatory and, as you can see, even under these extreme conditions the impact is pretty low. Such performance means we can start considering reducing the collection interval to capture metrics at a faster rate. Doing so gives us a couple of further advantages:
With further potential optimisations in mind, I was happy enough with these results, so moved on to the next part of the Concierge Paradigm, scheduling.
The next part of the Concierge Paradigm concerns how we coordinate the scaling of our services. To do this we need to know the state of each service. As we know, each container in the Concierge Paradigm runs a Zabbix Agent as a coprocess. By doing this we can configure it to auto-register each instance with the Zabbix Server without having to make any changes to the application. This is a simple function, implemented by configuring an auto-registration action based on the hostname, some host metadata, or both. In this example I'm using the FQDN (set by configuring Docker's DNS_DOMAIN parameter) to pass the application name (e.g. consul) as a sub-domain. This can then be used to assign the relevant templates and include the container in the appropriate host groups. For example, the host in the graphs above registered via the following action:
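In outline, the action boils down to something like this (the host group and template names are illustrative assumptions):

```
# Hypothetical auto-registration action, as configured in the Zabbix UI
Condition:   Host name like "consul."           # sub-domain set via DNS_DOMAIN
Operations:  Add host
             Add to host groups: Consul Nodes
             Link to templates:  Template OS Linux, Template App Concierge
```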
Whilst we attach this monitored host to various templates that will monitor things like standard OS metrics, the Concierge Paradigm requires only a few static items to be configured manually, and they should all be templated so will only need creating once. How you organise your template schema is up to you, but this is all that is needed:
As mentioned earlier, the container.state trapper item is set to catch container state messages. In the Concierge Paradigm, we accomplish this using Containerpilot tasks and the postStop handler. The purpose of this check is to maintain a heartbeat and a running/termination state for each container in our environment. In our example we simply do the following in containerpilot.json:
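Something along these lines, as a sketch: it assumes a Containerpilot v2-style config, a zabbix_sender binary in the image, a Consul endpoint at consul:8500 and a hypothetical Zabbix server at zabbix.example.com (JSON doesn't allow comments, so all of those names are assumptions declared here):

```json
{
  "consul": "consul:8500",
  "coprocesses": [
    {
      "command": ["/usr/sbin/zabbix_agentd", "-f"],
      "restarts": "unlimited"
    }
  ],
  "tasks": [
    {
      "name": "heartbeat",
      "command": ["zabbix_sender", "-z", "zabbix.example.com",
                  "-s", "{{ .HOSTNAME }}", "-k", "container.state", "-o", "1"],
      "frequency": "30s"
    }
  ],
  "postStop": ["zabbix_sender", "-z", "zabbix.example.com",
               "-s", "{{ .HOSTNAME }}", "-k", "container.state", "-o", "0"]
}
```

The task pushes a 1 every 30 seconds as the heartbeat, while the postStop handler pushes a 0 as the container exits gracefully.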
By configuring both container.state and agent.ping - one pushed and one pulled - we can gain a fair amount of confidence about whether a container is still running, has exited gracefully or has exited catastrophically. Here, container.state acts as our main heartbeat and, because it sends a termination value (0), we can react immediately to a graceful exit. However, since this is a push-based method, should some catastrophic failure prevent the exited message from being sent, we would have to wait for some pre-set period to expire. We can reduce this period by adding a combined TTL across both container.state and our pull-method metric, agent.ping (see the previous triggers section). We can then schedule some action, like stopping monitoring of the instance and archiving its data (just in case we want to perform some post-mortem analysis later), without having to wait for the full TTL to expire. Basically we're saying: we haven't heard from you in a while, so let's call your address and see if we get a response from the concierge. If we do, we'll give you a bit more time to respond personally, but if not, we'll assume you've left altogether.
Let's have a look at how we maintain this information. We start by creating a service definition for our application - in Zabbix, this is as simple as creating another host without a specific IP address and with only derived metric items (although you could also give it the VIP address of some load balancer, should you want to aggregate some additional service checks). In the example below, our "service" is Consul, and we aggregate all the metrics from each node in the {$SERVICE_HOSTGROUP} host group, where the {$SERVICE_HOSTGROUP} macro is set to "Consul Nodes" (one of the host groups we add each instance to when it auto-registers).
What we're doing here is creating a set of aggregated metrics from across the "Consul Nodes" host group: three which sum (grpsum[]) the last recorded values from each instance in the group, and two which average (grpavg[]) the last result. The containers::running and agents::responding items are both used for understanding the state of running applications in the cluster, and their values are passed to our scheduling script to work out the target number of instances required when we want to scale up or down.
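As a rough illustration, the item keys on the service host could look like this, using Zabbix's aggregate check syntax (the exact underlying item keys are assumptions):

```
# Hypothetical aggregate items on the Consul "service host"
containers::running    grpsum["Consul Nodes","container.state",last,0]
agents::responding     grpsum["Consul Nodes","agent.ping",last,0]
cluster average load   grpavg["Consul Nodes","system.cpu.load[percpu,avg1]",last,0]
```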
Auto-scaling profiles usually require setting conditions on one or more metrics; upon matching a condition, some action is performed - usually adding or removing instances. Most event management systems have had this feature built in for many years. In Zabbix this is done by putting triggers on our metrics which, when fired, perform some actions. For example:
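In sketch form, the two triggers could be expressed like this (Zabbix trigger expression syntax, with a dummy service host named consul; the thresholds and time windows are pure assumptions):

```
# High load sustained for 10 minutes -> scale up
{consul:grpavg["Consul Nodes","system.cpu.load[percpu,avg1]",last,0].min(600)}>2
# Low load sustained for 10 minutes -> scale down
{consul:grpavg["Consul Nodes","system.cpu.load[percpu,avg1]",last,0].max(600)}<0.5
```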
On the metric tracking the average load across the cluster (system.cpu.load[]), we've put two triggers: one for when load is high and one for when load is low. We can then use these trigger points to know when we need to scale up and when we need to scale down. And because event management systems have evolved this facility over many years, we usually have the ability to create highly detailed conditions, such as:
We then need to add some actions and link them to our triggers and our scheduling script (ignore *UNKNOWN* - this appears because we like to include the last value in our trigger names and alert messages, but when templated there is no value to interpolate).
Here we have two actions: one on low load for scaling down and one on high load for scaling up. Each of these actions runs when the linked trigger fires, and the operation it runs is our remote script (below).
So when the trigger fires, we run our custom script from the server (there is no agent interaction with a dummy "service host") and pass it a scale_up command and some variables (the {INVENTORY.DEPLOYMENT.STATUS} macro is automatically set in the inventory by the SERVICES::aggregate:Service::Consul::containers:running item and will be the current number of running containers for this service; {HOST.HOST} will pass the service name - consul - which is our dummy host name).
The Concierge, acting in the role of concierge_scheduler.sh, takes a request from its client to manage their business schedule: when some particular resource gets low, order some more; and when something is no longer being used, get rid of it. Below is a simple example of what the script could look like:
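This is a hypothetical sketch, assuming a docker-compose deployment and illustrative instance bounds; it takes the action, the current instance count and the service name described above:

```bash
#!/usr/bin/env bash
# concierge_scheduler.sh - hypothetical sketch. Takes an action, the current
# number of running instances (from {INVENTORY.DEPLOYMENT.STATUS}) and the
# service name (from {HOST.HOST}), then tells Docker the new target number.
set -euo pipefail

ACTION="${1:?usage: $0 scale_up|scale_down <running> <service>}"
RUNNING="${2:?current instance count required}"
SERVICE="${3:?service name required}"

MIN_INSTANCES=2   # assumed lower bound
MAX_INSTANCES=10  # assumed upper bound

case "$ACTION" in
  scale_up)   TARGET=$((RUNNING + 1)) ;;
  scale_down) TARGET=$((RUNNING - 1)) ;;
  *) echo "unknown action: $ACTION" >&2; exit 1 ;;
esac

# Clamp the target so a flapping trigger can't scale us to zero or to the moon.
(( TARGET < MIN_INSTANCES )) && TARGET=$MIN_INSTANCES
(( TARGET > MAX_INSTANCES )) && TARGET=$MAX_INSTANCES

# Docker needs an absolute target, not a +1/-1 delta.
docker-compose -p "$SERVICE" scale "$SERVICE=$TARGET"
```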
With container infrastructures like Joyent's Triton, which manage the placement of containers and allow containers to be first-class citizens on the host and network, simply running docker-compose will be fine. However, when running containers on other infrastructures you may need to perform a little extra work to set up Docker Engine in Swarm Mode and scale services using docker service.
As with anything new, there were a number of issues along the way that weren't considered initially - things like how to tell Docker what the target number of instances should be when scaling up and down (because you don't tell it to +1). We needed to find a way to extract the current number of running instances from the DB and pass it to the scheduler script so it could work out what number to run. Overall though, the results have been really exciting, because with this paradigm we're not really doing a great deal more than we were already doing, yet we're gaining quite a lot. Let's summarise some of the gains:
We will create a GitHub repository with Ansible playbooks for spinning up an example environment on your laptop with all the necessary basic prerequisites. Until then, on the Zabbix server side you'll just need to perform a few extra steps manually. If using Zabbix, you'll need to create:
If you have any questions, you can contact us directly or on LinkedIn and Twitter.