MQTTParty

Karma · Spread the WiFi — The Official Karma Blog · Apr 3, 2016
Karma’s mission is to provide our users with a WiFi connection anytime and anywhere, which is why we built Karma Go. We’ve previously written about the challenges we encountered while manufacturing the hardware and getting the device certified, but we haven’t told you much about the software behind Go yet.

In this technical article, we’ll explain how Karma Go connects to Karma’s backend infrastructure using the MQTT protocol, a custom-built MQTT message broker, and a proxy server to allow our backend applications to keep using HTTP for internal communication.

What is MQTT, and why do we use it?

Both Karma Go and Karma Classic periodically collect status information (e.g. battery level, signal strength) and relay that to our backend. Additionally, a Karma device needs information from our backend to determine if a connected user can access the internet from that device.

As our CTO Stefan explained previously, Karma Classic uses HTTP + TLS to connect to our backend, and status information is sent from Karma Classic in small packets. We felt we could improve on this in both technical architecture and data overhead, so we came up with a solution that would scale more easily and consume less bandwidth.

Enter MQTT. This lightweight messaging protocol is aimed squarely at situations where bandwidth and battery power are at a premium; it's a perfect fit for Karma Go. To replace the high-overhead JSON packets, we switched to protocol buffers, a binary data format that we wrap in MQTT messages.
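
As an illustration of the idea, here is a minimal sketch in Go: a status report encoded as a protocol buffer and published as the payload of an MQTT message via the Eclipse Paho client. The broker address, topic name, and field numbers are illustrative assumptions, not our actual schema, and the wire format is hand-rolled with protowire so the example stays self-contained; in practice the encoding comes from code generated from a .proto file.

```go
package main

import (
	"log"
	"time"

	mqtt "github.com/eclipse/paho.mqtt.golang"
	"google.golang.org/protobuf/encoding/protowire"
)

func main() {
	// Encode {battery_level: 87, timestamp: now} in protobuf wire format:
	// a handful of bytes, versus dozens for the equivalent JSON document.
	payload := protowire.AppendTag(nil, 1, protowire.VarintType) // field 1: battery_level
	payload = protowire.AppendVarint(payload, 87)
	payload = protowire.AppendTag(payload, 2, protowire.VarintType) // field 2: timestamp
	payload = protowire.AppendVarint(payload, uint64(time.Now().Unix()))

	opts := mqtt.NewClientOptions().
		AddBroker("tcp://localhost:1883"). // illustrative broker address
		SetClientID("karma-go-0001")
	client := mqtt.NewClient(opts)
	if t := client.Connect(); t.Wait() && t.Error() != nil {
		log.Fatal(t.Error())
	}
	// QoS 1 makes the broker acknowledge receipt of the status report.
	client.Publish("devices/karma-go-0001/status", 1, false, payload).Wait()
}
```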

The final step was to find a proper MQTT server that would meet our demands for scalability and availability. We wanted the MQTT server to be as stateless as possible, so it wouldn’t matter to which server a Karma Go connects. We wanted there to be a pool of MQTT servers to properly distribute load and to be resilient against one or more MQTT servers failing. Also, we still wanted our backend services (mostly written in Ruby) to be able to communicate with Karma Go over HTTP, so we wouldn’t have to rewrite all of them to use MQTT. Finally, we wanted an open source solution that would allow us to dig into the source code.

Unfortunately, there were no off-the-shelf MQTT servers available that met all of these demands, so we did what we always do: we wrote one ourselves!

It’s an MQTTParty and you’re all invited

By the end of 2014 we started building this highly available, scalable MQTT server, and we chose the Go programming language to build it in. Not because that matches so well with the name of our new flagship product — though it's a nice coincidence — but mostly because of its crazy performance and built-in concurrency primitives, both of which are crucial for a high-performance message broker. We also like how concise and complete the standard library is, and how there are very few surprises when writing applications in Go. The fact that a Go application compiles into a single, statically linked binary sealed the deal, as we wanted something that was easy and fast to deploy on a potentially high number of servers.
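
To give a flavor of why those concurrency primitives matter, here is a minimal sketch of the goroutine-per-connection pattern that makes Go a natural fit for a broker like this; the port and buffer size are arbitrary:

```go
package main

import (
	"log"
	"net"
)

func main() {
	ln, err := net.Listen("tcp", ":1883") // the standard MQTT port
	if err != nil {
		log.Fatal(err)
	}
	for {
		conn, err := ln.Accept()
		if err != nil {
			log.Print(err)
			continue
		}
		// One goroutine per device connection; goroutines start with only a
		// few kilobytes of stack, so tens of thousands are no problem.
		go handle(conn)
	}
}

func handle(conn net.Conn) {
	defer conn.Close()
	buf := make([]byte, 4096)
	for {
		n, err := conn.Read(buf)
		if err != nil {
			return // connection closed or failed
		}
		_ = buf[:n] // a real broker would parse MQTT packets here
	}
}
```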

We named our server MQTTParty, an homage to the popular HTTParty library for Ruby.

Highly available: balancing load

One of the things we missed most in open source MQTT brokers at the end of 2014 was the ability to run multiple instances of the broker in a clustered form. We wanted our MQTT broker to be highly available from the start.

Ideally, a Karma Go would connect to a server in the MQTTParty cluster, without that server having or requiring any prior knowledge of that particular Karma Go. The MQTTParty servers should be replaceable without losing state. To achieve this, we needed a load balancer and a way to store state outside of the MQTTParty cluster.

Load balancing Karma Go connections poses some interesting extra challenges: when a Karma Go connects to MQTTParty, it opens a TCP connection that should stay open until the device shuts down or goes into standby mode. This can take hours, so the load balancer has to be smart enough to balance these long-lived connections over the available MQTTParty servers.

Usually when doing load balancing (e.g. for a website, like the one you're visiting now) you're dealing with short-lived, stateless HTTP connections that just need to end up on a backend server as fast as possible. The backend server does not connect back to you through the load balancer, but directly over the internet. A round-robin algorithm usually works fine here.

In our case, however, we need long-lived, bidirectional, stateful communication through the load balancer. A round-robin algorithm doesn't work here, because it leads to an unbalanced backend: some servers end up with many connected clients, others with just a few.

Because we want the load to be balanced as evenly as possible, the load balancer should take into account the number of established connections on each backend server and allocate new connections to the server with the fewest.

Apart from a better balance in the number of connections to each backend server, this also means that when an outage occurs (e.g. one backend server dies), the affected clients are spread evenly over the remaining servers when they reconnect, keeping the additional load on each of them as low as possible.

We normally use Amazon Web Services' Elastic Load Balancer (ELB) for all our load balancing needs, but in this case it wasn't sufficient. Even though ELB's Cross-Zone Load Balancing should balance long-lived connections properly, our performance tests still showed an imbalance in the number of established connections to MQTTParty servers over time.

So we decided to go with HAProxy as our load balancer instead, using its leastconn balancing algorithm. We use the PROXY protocol together with the send-proxy directive to allow the backend MQTTParty servers to identify the IP address of clients connecting through the load balancer.
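
A minimal haproxy.cfg along these lines might look like the following sketch; the server names, addresses, and timeouts are illustrative, not our actual topology:

```
defaults
    mode tcp
    timeout connect 5s
    # Generous timeouts, because device connections stay open for hours.
    timeout client  1h
    timeout server  1h

frontend mqtt_in
    bind *:1883
    default_backend mqttparty

backend mqttparty
    # Route each new connection to the server with the fewest established
    # connections, and pass the client's real IP via the PROXY protocol.
    balance leastconn
    server mqttparty1 10.0.0.11:1883 check send-proxy
    server mqttparty2 10.0.0.12:1883 check send-proxy
    server mqttparty3 10.0.0.13:1883 check send-proxy
```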

HAProxy has proven to be very reliable even under high load, and it keeps a perfect balance in the amount of established connections to each MQTTParty server.

Scalable: stateless servers and MQTTProxy

To make the MQTTParty cluster as scalable as possible, we keep the individual servers as stateless as we can. When a Karma Go connects to an MQTTParty server, that server stores its own identifier together with the Karma Go's unique identifier in Redis. Each time the Karma Go sends a heartbeat to MQTTParty, this identification tuple is refreshed in Redis.
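
A small sketch of that bookkeeping, assuming a Redis client like go-redis; the key naming scheme and TTL are illustrative assumptions:

```go
package main

import (
	"context"
	"time"

	"github.com/redis/go-redis/v9"
)

// recordHeartbeat stores which MQTTParty server a device is currently
// connected to. The TTL is set a bit longer than the heartbeat interval,
// so entries for devices that silently disappear expire on their own.
func recordHeartbeat(ctx context.Context, rdb *redis.Client, deviceID, serverID string) error {
	return rdb.Set(ctx, "device:"+deviceID+":server", serverID, 5*time.Minute).Err()
}

func main() {
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
	if err := recordHeartbeat(context.Background(), rdb, "karma-go-0001", "mqttparty1"); err != nil {
		panic(err)
	}
}
```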

This information is used by a custom-written MQTT/HTTP proxy server — dubbed MQTTProxy — that acts as a mediator between our backend applications and the Karma Go devices connected to MQTTParty.

When a backend application needs to send a message to a Karma Go (for example, to inform it of updated firmware), it sends the message to MQTTProxy over HTTP. MQTTProxy then uses the association information from Redis to determine whether the Karma Go is currently connected to an MQTTParty server. If so, it forwards the message to the relevant MQTTParty server over HTTP, which in turn delivers the message to the Karma Go over MQTT.
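
Here is a sketch of that forwarding logic; the HTTP paths, query parameter, and Redis key scheme are illustrative assumptions, not MQTTProxy's actual API:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"net/http"

	"github.com/redis/go-redis/v9"
)

type proxy struct{ rdb *redis.Client }

func (p *proxy) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	deviceID := r.URL.Query().Get("device_id")

	// Which MQTTParty server holds this device's connection, if any?
	server, err := p.rdb.Get(r.Context(), "device:"+deviceID+":server").Result()
	if err == redis.Nil {
		http.Error(w, "device not connected", http.StatusNotFound)
		return
	} else if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}

	// Relay the message to that MQTTParty node over HTTP; the node then
	// delivers it to the device over MQTT.
	body, _ := io.ReadAll(r.Body)
	resp, err := http.Post(
		fmt.Sprintf("http://%s/devices/%s/messages", server, deviceID),
		"application/octet-stream", bytes.NewReader(body))
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()
	w.WriteHeader(resp.StatusCode)
}

func main() {
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
	log.Fatal(http.ListenAndServe(":8080", &proxy{rdb: rdb}))
}
```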

Kicking the tires

All of this looks nice on paper, but we needed to make sure our setup could actually handle a high number of Karma Go devices connecting to it. That's why we implemented an elaborate "fake Karma Go" client and used it to set up tens of thousands of simultaneous connections to a running MQTTParty cluster, with each client sending periodic MQTT messages to simulate real-life usage.
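
A load-test harness along those lines can be sketched in a few lines of Go; the broker address, publish interval, and client count below are illustrative:

```go
package main

import (
	"fmt"
	"time"

	mqtt "github.com/eclipse/paho.mqtt.golang"
)

// fakeClient connects to the broker and publishes a status message at a
// fixed interval, mimicking an idle-but-online Karma Go.
func fakeClient(id int) {
	opts := mqtt.NewClientOptions().
		AddBroker("tcp://localhost:1883").
		SetClientID(fmt.Sprintf("fake-karma-go-%05d", id))
	c := mqtt.NewClient(opts)
	if t := c.Connect(); t.Wait() && t.Error() != nil {
		return
	}
	topic := fmt.Sprintf("devices/fake-%05d/status", id)
	for range time.Tick(30 * time.Second) {
		c.Publish(topic, 1, false, []byte("status")).Wait()
	}
}

func main() {
	for i := 0; i < 10000; i++ {
		go fakeClient(i)
	}
	select {} // run until interrupted
}
```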

Using Prometheus we captured a dozen performance metrics from these test clients and from the MQTTParty nodes, and graphed them with Grafana. This allowed us to identify and solve bottlenecks early on, and gave us confidence that we can actually run this stack in production with Karma Go.
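
As a sketch of how such metrics can be exposed, here is a minimal example using the Prometheus Go client; the metric name is an illustrative assumption:

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// A gauge tracking how many MQTT clients are currently connected; the
// broker would call Inc() on connect and Dec() on disconnect.
var connectedClients = prometheus.NewGauge(prometheus.GaugeOpts{
	Name: "mqttparty_connected_clients",
	Help: "Number of currently connected MQTT clients.",
})

func main() {
	prometheus.MustRegister(connectedClients)
	// Prometheus scrapes this endpoint periodically; Grafana then graphs
	// the resulting time series.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9090", nil))
}
```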

Conclusion

We are dedicated to providing our customers with a Karma Go that works anywhere, anytime, and MQTTParty will help us do that. We're looking forward to telling you more about our software stack in future posts. And as always, feel free to leave a comment if you have any questions.

Originally published at blog.yourkarma.com.
