Adventures in Go land: high performance custom HTTP(S) service

The problem

While working on a side project I was confronted with the problem of serving a JSON object representing the state of the app to potentially tens of thousands of users (imagine following the results of an election live and wanting the data to be updated every second).

The web service is really simple: it grabs the payload from the data store (disk/Redis/Mongo), updates it every second, and serves the content accordingly.

Initially I implemented this using Python and Flask. I must admit that I didn't try too hard to fetch the data in an async fashion, since it wasn't obvious how to do this with Flask. Using Twisted could have been a better choice.

I also attempted to use Varnish as a front-line HTTP cache, with promising results. However, HTTPS became a requirement pretty soon (due to CORS policies), so I had to discard Varnish rather quickly.

Go's bare HTTP server performance

 

However, I took the opportunity to teach myself Go, which I knew had pretty good support for concurrent operations, and the Google guys already have a success story serving static data with it. Boy, have I been surprised.

Take the simplest form of an HTTP server that serves a common JSON payload for all the requests:

https://gist.github.com/aruiz/8671230.js
Gist link
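
For readers who cannot open the gist, here is a minimal sketch of what such a server could look like. It is not the gist's code: the file name payload.json and the helper names loadPayload and payloadHandler are assumptions made for illustration, and the port matches the one used in the benchmark.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"os"
)

// loadPayload reads the JSON file once and checks that it holds valid JSON.
func loadPayload(path string) ([]byte, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	if !json.Valid(data) {
		return nil, fmt.Errorf("%s does not contain valid JSON", path)
	}
	return data, nil
}

// payloadHandler serves the same in-memory bytes for every request.
func payloadHandler(payload []byte) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		w.Write(payload)
	})
}

func main() {
	// "payload.json" is a placeholder file name (an assumption).
	payload, err := loadPayload("payload.json")
	if err != nil {
		log.Fatal(err)
	}
	http.Handle("/", payloadHandler(payload))
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

Since the payload is loaded before the server starts and never modified afterwards, the handler needs no locking.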

Note that it reads the data just once from a JSON file on disk, with no updates every second, for the sake of showing how well Go can do as a static HTTP server. These are the performance results using httpress on my ThinkPad T430s (i7-3520M 2.90GHz/8GB of RAM):

$ ./httpress -n 100000 -c 1000 -t 6 -k http://localhost:8080
TOTALS:  1000 connect, 100000 requests, 100000 success, 0 fail, 1000 (1000) real concurrency
TRAFFIC: 4195 avg bytes, 109 avg overhead, 419532658 bytes, 10900000 overhead
TIMING:  3.752 seconds, 26646 rps, 112007 kbps, 37.5 ms avg req time

That's 26646 requests per second with 1000 concurrent keep-alive connections in 6 parallel threads, without doing anything special. I must say that I'm impressed that the Go developers have put so much effort into their built-in HTTP server implementation.

Go's HTTPS performance

 

Now that Go has started to look like a viable option, let's see how it behaves as an HTTPS server. Here's the link to the modifications to turn the previous example into an HTTPS service. Note that you have to generate the certificates.
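
As a rough idea of what the change involves, the sketch below swaps the listen call for http.ListenAndServeTLS. It is an illustration rather than the linked modifications: the handler, the placeholder payload, port 8443 and the file names cert.pem and key.pem are all assumptions.

```go
package main

import (
	"log"
	"net/http"
)

// jsonHandler serves a fixed payload; it stands in for the handler
// of the plain HTTP version.
func jsonHandler(payload []byte) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		w.Write(payload)
	})
}

func main() {
	payload := []byte(`{"status":"ok"}`) // placeholder payload (an assumption)
	http.Handle("/", jsonHandler(payload))

	// cert.pem and key.pem are placeholder names for the certificate and
	// private key you generated; the port is an assumption as well.
	log.Fatal(http.ListenAndServeTLS(":8443", "cert.pem", "key.pem", nil))
}
```

For local testing, a self-signed pair can be produced with the generate_cert.go program that ships in the Go source tree under crypto/tls, or with openssl.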

TOTALS:  1000 connect, 100000 requests, 99862 success, 138 fail, 862 (862) real concurrency
TRAFFIC: 3994 avg bytes, 109 avg overhead, 398848828 bytes, 10884958 overhead
TIMING:  22.621 seconds, 4414 rps, 17688 kbps, 226.5 ms avg req time

Damn you secure webs! That's a big downgrade in performance… 4k rps now. This made me realize that this new security scheme for cross site content implemented in browsers is going to kill a lot of trees. At this point I wanted to investigate further for ways to boost this.

Revisiting the front-line cache idea: Cherokee

My suspicion is that the TLS handling inside Go's net/http has some room for improvement, but I didn't want to get my hands dirty there, so I started to think about off-the-shelf alternatives.

I stumbled upon Cherokee's front-line cache by reverse HTTP feature, which is really easy to set up thanks to the cherokee-admin interface. This means that I run my original plain HTTP Go implementation as-is behind Cherokee, and Cherokee handles the TLS by itself.
These are the results:

TOTALS:  1523 connect, 100000 requests, 99511 success, 489 fail, 1000 (981) real concurrency
TRAFFIC: 8635 avg bytes, 142 avg overhead, 859277485 bytes, 14131208 overhead
TIMING:  12.975 seconds, 7669 rps, 65735 kbps, 130.4 ms avg req time

There's quite an improvement in latency, though the improvement in requests per second, while worthwhile, is not as big as I would have expected.
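
To put the three runs side by side, the sketch below recomputes the ratios from the figures reported above. It rests on one assumption: that httpress's average request time is roughly the number of concurrent connections divided by the throughput, which is what the reported numbers appear to follow. The function names are made up for the example.

```go
package main

import "fmt"

// avgRequestTimeMs estimates the mean time per request in milliseconds
// when `concurrency` connections are kept busy at a throughput of `rps`.
func avgRequestTimeMs(concurrency int, rps float64) float64 {
	return float64(concurrency) / rps * 1000
}

// speedup returns how many times faster `rps` is than `baseline`.
func speedup(rps, baseline float64) float64 {
	return rps / baseline
}

func main() {
	// Requests per second as reported by httpress in the three runs.
	runs := []struct {
		name string
		rps  float64
	}{
		{"plain HTTP", 26646},
		{"Go HTTPS", 4414},
		{"Cherokee + Go", 7669},
	}
	for _, r := range runs {
		fmt.Printf("%-14s %6.0f rps  %6.1f ms  %.2fx vs Go HTTPS\n",
			r.name, r.rps, avgRequestTimeMs(1000, r.rps), speedup(r.rps, 4414))
	}
}
```

Under that assumption the latency and throughput figures move together, so the Cherokee setup comes out at roughly 1.7 times the Go HTTPS numbers on both counts.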

Conclusions

And these are my findings so far. I want to try to configure Cherokee so that it only talks to the backend server once every second and caches the requests that way, but I haven't figured that out yet.

As for my experience with Go, it has been mostly pleasant. It's nice to have a mix of low level and high level features in the language, though it has some things that are quite confusing for a C/Python/Vala guy like myself. Eventually I will post some notes about the synchronization primitives I've used in the implementation of the service that gets the data from Redis every second.
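
Until those notes appear, here is one common way to structure such a service, sketched with a sync.RWMutex and a time.Ticker. This is not the implementation described in the post: payloadCache, refresh and fetchFromStore are hypothetical names, and fetchFromStore merely stands in for the Redis read.

```go
package main

import (
	"log"
	"net/http"
	"sync"
	"time"
)

// payloadCache holds the latest payload. The RWMutex lets many requests
// read concurrently while the refresher occasionally swaps the slice.
type payloadCache struct {
	mu   sync.RWMutex
	data []byte
}

// Get returns the current payload. Callers must not modify the slice;
// Set always replaces it rather than mutating it in place.
func (c *payloadCache) Get() []byte {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.data
}

// Set replaces the current payload.
func (c *payloadCache) Set(data []byte) {
	c.mu.Lock()
	c.data = data
	c.mu.Unlock()
}

// refresh calls fetch on every tick and stores the result. On error it
// keeps serving the previous payload.
func (c *payloadCache) refresh(interval time.Duration, fetch func() ([]byte, error)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for range ticker.C {
		data, err := fetch()
		if err != nil {
			log.Printf("refresh failed, keeping previous payload: %v", err)
			continue
		}
		c.Set(data)
	}
}

// fetchFromStore is a hypothetical stand-in for the data store read.
func fetchFromStore() ([]byte, error) {
	return []byte(`{"updated":"` + time.Now().Format(time.RFC3339) + `"}`), nil
}

func main() {
	cache := &payloadCache{data: []byte(`{}`)}
	go cache.refresh(time.Second, fetchFromStore)

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		w.Write(cache.Get())
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

With this layout the data store sees one read per second no matter how many clients connect, which is the property the service needs.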

On embracing the web

HTML is not the Holy Grail

There were a few talks at GUADEC around the topic of the web, trying to figure out ways to build bridges to the web world from the GNOME community and technologies.

Some, most notably Luis Villa, argued that HTML/CSS/JS was the way to go. However, I think that this is grabbing the stick from the wrong end in many ways.

The interesting parts of the web are not its front end technologies as such (though the fact that they are standards and that you can assume everybody has a browser is). The interesting bit of the Web as a platform is the ability to syndicate, publish and aggregate content.

I actually think that the so-called fact that people love HTML+CSS+JS is sort of a myth. The reason there are so many people doing stuff with it is not because it is a great technology, it's because:

  • You can reach a huge user base deploying a web app
  • It has a gentle learning curve

People go through the pain of building apps with these technologies because it's worth the pain, and actually, for a lot of applications it's the better option. But it's still a pain.

However, suggesting that this is how GNOME should move ahead is, in my opinion, not the fastest path to providing a great user experience, which is what GNOME is all about.

It's all about the data!

In my opinion, what we really fail at is providing tools to create rich user experiences for data driven applications and, more specifically, ways to feed data from the web. This has a lot to do with how poor our platform is when it comes to ways to talk HTTP. libsoup, for example, is not such a great API for application developers, for many reasons.

Then there's Gtk+'s lack of proper views for large datasets, and GtkTreeModel is not necessarily a general purpose data model API. This is why, by the way, we developed libmodel at Codethink and created a GtkTreeModel wrapper around it.

I think the ones really pioneering in this field are the Intel guys with libsocialweb and Adrien Bustany on online providers for Tracker. But we still miss the "glue" for our great front end technologies (Gtk+/Clutter/MX) so that application developers can put together apps consuming and pushing online data real quick.

The browser is nice and all but…

There is a reason why Google, one of the main pushers of web technologies, still has Java based apps on the Android platform. There is a reason Flash is not going away and Silverlight and JavaFX are here to stay as well. The closer you are to the hardware, the better the user experience can be. The quicker you can put together your apps, the better.

Pushing the boundaries of HTML is a nice thing and I'm happy to see Flash, Silverlight and JavaFX going away as substitutes for content that could be deployed as web content in the first place, but innovation and design by committee are not really good friends. We need a platform that can move as quickly as hardware does, as much as we need a web platform that can cherry pick the innovations.

Opportunities for collaboration, our friends from Mozilla

There is, however, a huge opportunity for the GNOME community. If we start taking steps towards a better toolchain for data driven applications, I think building bridges with the Mozilla community can be a major win. I know what you're thinking: Gecko. No, I actually think WebKit is the way to go as our rendering engine; Gecko is there to follow Firefox's agenda. Fair enough.

There's a space in which building bridges with the Mozilla community can be an even bigger win for both ends: the web services space. Mozilla is creating amazing web services and tools: Firefox Sync, Bespin, Contacts.

GNOME seriously lacks a community of people dedicated to building web services around the platform, and Mozilla has that sort of focus. Together, I think we can join forces to solve this ongoing problem of closed source web services, and all the privacy concerns around them, by building a truly rich and open ecosystem of server and client side technologies.

Some pending browser breakthroughs

Loads of stuff has been going on recently in <video> tag land. Google has made a bold move to push openness into the web, though they added Adobe into the mix, which inspires mixed feelings in me. All in all, good news: competition is back on track after 10 years of Microsoft stagnancy in this field. I wish there were more corporations whose business model wasn't based on restricting competition through twisted uses of Copyright, Intellectual Property and business practices.

However, I would like to enumerate a few things that should be exposed or improved in the major browsers soon if they want to accelerate web applications further.

Webcam Access

I think this is actually the last thing we need to get rid of Flash at this point, and it's relatively straightforward to implement. I don't even think that a standard needs to be proposed before implementing it. There's a lot of engine specific stuff in CSS and it is not a big pain, since most people rely on jQuery and other cross browser libraries. The only challenge here is the security model, but Flash solved that long ago.

RDF Storage

When I saw a proposal to use SQL as one of the storage models for the browser something died inside me. I understand where the proposal is coming from and why it seems to make sense. Most web developers are familiar with SQL.

I think SQL is sort of alright on the server side, as you can always expose data any way you want, but on the client side you'll end up with a bunch of data silos for every site and you'll lose a lot of data along the way. An RDF/SPARQL model is the natural storage model for the web, though in my opinion some specific purpose APIs should be added for contacts, location and multimedia storage.

Obviously I'm a bit biased here since Codethink has been the major pusher for an RDF datastore on GNOME through our involvement in the Tracker project.

Smart Card Certification

Smart Cards are becoming widely used. In some countries, like Spain, the official ID card is an actual Smart Card with a digital certificate that can be used to sign documents (no biometric crap or anything like the crappy Labour proposal in the UK).
Although this can sort of be configured already in some browsers, the setup is rather hard. Some projects like Tractis.com could really use some improvements in ease of use.

Contact Support

I believe the Mozilla guys are already working on this area. What I would actually love to see is a tag where you can specify a contact detail like this:

<contact href="phone://004400000"/>

Same for Skype, Facebook, XMPP, etc., plus a JavaScript API to access phone features such as making a phone call, adding an entry to the address book, sending an SMS… Maybe RDFa instead of a new tag would do it as well. The point is that there should be a common way to define a contact in the document so that the client can do smarter things with it.

These are the main things I would add. Although I would also like to see more widespread support for location and touch based events, I think the items listed above could actually bring a significant amount of useful and innovative apps, both online and offline.

GNOME TV

Seems that my recent efforts to promote the GNOME platform are paying off. As I write this, 755 people have seen the Vala kick start tutorial and I have received loads of positive comments.

As a response to that success there are three things I've been doing. First, setting up Twitter and identi.ca microblogging accounts for GTK+: follow them at gtktoolkit@identica and gtktoolkit@twitter (credit goes to Javier Jardon for the idea and for co-maintaining the accounts).

Second, setting up a Vimeo channel. There are many reasons I'm using Vimeo: first, you can have channels for free; second, it provides the best quality video wise, which is quite important when you showcase code writing; third, it provides HTML5 (though not through Theora); and most importantly, you can download the original file if you're logged in (which I'll make sure in the future will be .OGG). If you have a GNOME related video on Vimeo just poke me.

I'm already putting some pieces together for the next video. As a teaser, it'll use one of these new shiny dynamic languages recently added to the GNOME stack.

PS: I'm very excited about the ongoing work and activity happening in the GNOME UX Hackfest at the Canonical offices in London.

Happy hacking!