Adventures in Go land: high performance custom HTTP(S) service

The problem

While working on a side project I was confronted with the problem of serving a JSON object to potentially tens of thousands of users representing the state of the app (think you are following the results of an election live and you want data to be updated every second).

The web service is really simple, it grabs the payload from the data store (disk/Redis/Mongo), uploads it every second, and serves the content accordingly.

Initially I implemented this using Python and Flask, I must admit that I didn't try too hard to get the data in an async fashion since it wasn't obvious how to do this with Flask. Using twisted could have been a better choice.

Initially I attempted to use Varnish as a front-line HTTP cache with promising results, however using HTTPS became a requirement pretty soon (due to CORS policies) so I had to discard Varnish rather quickly.

Go's bare HTTP server performance


Go_gopher_color_logo_250x249[1]However I took the opportuninty to teach myself Go which I knew it had pretty good support for concurrent operations and that the Google guys already have a success story serving static data with it. Boy have I been surprised.

Take the simplest form of an HTTP server that serves a common JSON payload for all the requests:
Gist link

Note that it reads the data just once from a json file in the disk. No updates every second, for the sake of showing how well Go can do as a static HTTP server, these are the performance results using httpress  on my ThinkPad T430s (i7-3520M 2.90GHz/8GB of RAM):

$ ./httpress -n 100000 -c 1000 -t 6 -k http://localhost:8080
TOTALS:  1000 connect, 100000 requests, 100000 success, 0 fail, 1000 (1000) real concurrency
TRAFFIC: 4195 avg bytes, 109 avg overhead, 419532658 bytes, 10900000 overhead
TIMING:  3.752 seconds, 26646 rps, 112007 kbps, 37.5 ms avg req time

That's 26646 request per second with 1000 concurrent keep-alive connections in 6 parallel threads without doing anything special, I must say that I'm impressed that the Go developers have put so much effort in their built-in HTTP server implementation.

Go's HTTPS performance


Now that Go started to look as a viable option, let's see how it behaves as a HTTPS server. Here's the link to the modifications to turn the previous example into an HTTPS service, note that you have to generate the certificates.

TOTALS:  1000 connect, 100000 requests, 99862 success, 138 fail, 862 (862) real concurrency
TRAFFIC: 3994 avg bytes, 109 avg overhead, 398848828 bytes, 10884958 overhead
TIMING:  22.621 seconds, 4414 rps, 17688 kbps, 226.5 ms avg req time

Damn you secure webs! That's a big downgrade in performance… 4k rps now. This made me realize that this new security scheme for cross site content implemented in browsers is going to kill a lot of trees. At this point I wanted to investigate further for possibilities boost this up.

Revisiting the front-line cache idea: Cherokee

My suspicion is that the TLS handling inside Go's net/http has some chances to be improved, but I didn't want to get my hands dirty there, so I started to think about off-the-shelf alternatives.

I stumbled upon Cherokee's front-line cache by reverse HTTP feature which is really easy to setup thanks to the cherokee-admin interface. This means that I run my original plain HTTP Go implementation as-is behind Cherokee so that it handles the TLS by itself.
These are the results:

TOTALS:  1523 connect, 100000 requests, 99511 success, 489 fail, 1000 (981) real concurrency
TRAFFIC: 8635 avg bytes, 142 avg overhead, 859277485 bytes, 14131208 overhead
TIMING:  12.975 seconds, 7669 rps, 65735 kbps, 130.4 ms avg req time

There's quite an improvement in latency, though the requests per second improvement while worth the while is not as big as I would have expected.


And these are my findings so far, I want to try to configure Cherokee so that it only talks to the backend server once every second and cache the requests that way, haven't figured that out yet.

As per my experience with Go, it has been mostly pleasant, it's nice to have a mix of low level and high level features in the language, though it has some things that are quite confusing for a C/Python/Vala guy like myself. Eventually I will post some notes about the synchronization primitives I've used in the implementation of the service that gets the data from Redis every second.