Managing request queuing with Rails on Heroku

Summary: Heroku’s random routing algorithm causes significant problems for high-traffic, single-threaded Rails apps. This post describes how we’ve configured our Unicorn dynos to reduce the impact of the router. We set the Unicorn backlog to extremely low values to force the router to retry requests until it finds a dyno that’s not loaded. The post also presents a wishlist for Heroku of things that would help further address the problem.

At Think Through Math, we run an e-learning platform designed to help students become successful at math. We run a Rails stack hosted on Heroku. Over the past six months, our usage skyrocketed thanks to a string of big customer wins and a compelling solution. As our usage grew, performance started to suffer for reasons we couldn’t explain. We monitor our application with New Relic. It showed us a fairly flat app server response time, but as usage grew during the day, the end user response time grew dramatically. All of the growth was in the section of the graph marked “network”. We also saw waves of dyno restarts along with H12 timeouts.

At the end of every browser using our application is a math student. Many of them have struggled with math for years. They’re conditioned to think they are bad at it; that math is too hard for them. When our app is slow or times out that’s just another reason for the student to lose confidence. So we took these performance issues very seriously, and we felt powerless to stop them since we couldn’t figure out the cause of the elevated response times.

This post from Rap Genius illustrated the problem we were seeing. This post on Timelike covers the same topic, but goes in depth on the math behind routing algorithm efficiencies. Math always gets to us. Heroku has created a blog post recommending Unicorn as a way to minimize the inefficiency in random routing. For high-traffic apps, the default configuration they provide won’t have much impact, since dynos are severely memory-limited. Each dyno can only run a small number of worker processes, so it is still at risk of taking on more requests than it can handle in a reasonable fashion, and the default Unicorn settings are not optimal as a solution to request queuing.

Over the course of a couple of months of research, tweaking, and performance monitoring, we’ve finally gotten a handle on our request queuing in Heroku. It’s not eliminated, and it probably won’t be as long as we are using a single-threaded Ruby implementation. However, we have been able to minimize our end user response time and virtually eliminate H12s, even at peak throughput, with some specific Unicorn configuration settings. We’re writing this post not to attack or defend Heroku’s routing implementation. Our goal is to describe our strategy to minimize request queuing, developed with invaluable guidance and research from Unicon:

Get visibility into your request queue depth

If you use New Relic (and you definitely should if you are running a high-traffic app), grab their latest gem (since the issue in this post primarily affects Rails, we’ll link directly to the relevant information). Once you have this gem fielded you will see the impact of request queuing on your app. For us, the picture was pretty grim.

Do the dirty work of query optimization.

Start by optimizing long-running queries, per Heroku recommendation. This topic is worth a blog post itself (so we wrote one).

Dramatically reduce the Unicorn backlog.

The default backlog for Unicorn is 1024. With a queue depth that high, there’s no penalty for a random routing algorithm in sending traffic to an overloaded dyno. We’re currently set at 16, since Unicorn uses Linux sockets and will round up any number below 16 to 16. According to Heroku documentation, if the router sends a request to a dyno with a full backlog, it will retry the request. The retry will also be random, but with an extremely short backlog setting it’s more likely that the request will end up in a short queue. There will be some overhead in the retry (and that’s not shown anywhere, even in New Relic, at the moment), but our experience has been that request queuing ends up taking 2-3x more time on the server than processing, so a little retrying shouldn’t make things worse overall. A warning, though: after 10 attempts, if the router hasn’t been able to find a free dyno, it will give up and throw an H21 error. You can see whether this is happening using whatever log drain add-on you have set up in Heroku. It will take some experimentation, but the goal here is to set the backlog low enough that you minimize queuing time, but not so low that you throw H21s.
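To make the trade-off concrete, here is a rough sketch of the behaviour described above. It is a toy model we made up for illustration, not Heroku's router: the method name, the in-memory array of queue depths, and the injectable random source are all assumptions. The only things taken from the text are the random pick, the retry on a full backlog, and the limit of 10 attempts before an H21.

```ruby
# Toy model of the retry behaviour described above. This is NOT Heroku's
# router code; names and structure are hypothetical.
MAX_ATTEMPTS = 10 # retry limit quoted in the text

# queue_depths: current number of queued requests on each dyno (mutated in place)
# backlog:      the Unicorn backlog, i.e. the most requests a dyno will queue
# rng:          injectable random source so runs can be made repeatable
# Returns the index of the dyno that accepted the request, or :h21 if
# every attempt landed on a dyno whose queue was already full.
def route_request(queue_depths, backlog, rng: Random.new)
  MAX_ATTEMPTS.times do
    dyno = rng.rand(queue_depths.length) # random pick, as in random routing
    next unless queue_depths[dyno] < backlog

    queue_depths[dyno] += 1
    return dyno
  end
  :h21
end

# Example: ten dynos, two of them already saturated at a backlog of 16.
depths = [16, 16, 0, 3, 1, 0, 2, 5, 0, 4]
result = route_request(depths, 16)
puts "request went to: #{result.inspect}"
```

In this model, a backlog of 1024 means the `queue_depths[dyno] < backlog` check almost never fails, so a request stays on whichever dyno was picked first, loaded or not. A low backlog makes the check fail on saturated dynos and pushes the request elsewhere, at the cost of an H21 if every attempt misses.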

Changing the Unicorn backlog setting for Heroku is similar to what is needed for setting Unicorn up with Nginx. In ./config/unicorn.rb, add a listen directive with the port and then specify the backlog number:

listen ENV['PORT'], :backlog => 200

The example above sets the backlog to 200, so a request is rerouted once the queue is full. Rather than a socket or a hard-coded port, though, you need to read the port from the Heroku PORT environment variable. You may want different values in staging, sandbox, and production environments, so we highly recommend adding a second environment variable for the backlog size, with a default:

listen ENV['PORT'], :backlog => Integer(ENV['UNICORN_BACKLOG'] || 200)

That way you can alter the backlog via a heroku config command rather than doing a deploy.

heroku config:set UNICORN_BACKLOG=16 -a <app name>

This has the added benefit of not relying on the Rails environment name for configuration should you be running more than the standard “development”, “test” and “production” environments.
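If you want the environment-driven setting to be a little more defensive, one option is a small helper like the sketch below. The helper name and constants are our own invention for illustration; the floor of 16 simply mirrors the rounding behaviour described earlier, so that the value you configure matches the value you actually get.

```ruby
# Hypothetical helper for config/unicorn.rb; the method name is ours.
MIN_BACKLOG = 16      # floor described earlier in this post
DEFAULT_BACKLOG = 200 # same default as the example above

# Reads UNICORN_BACKLOG from the given environment (ENV by default),
# falls back to the default when it is unset, and never returns less
# than MIN_BACKLOG. A non-numeric value raises ArgumentError at boot.
def unicorn_backlog(env = ENV)
  requested = Integer(env.fetch('UNICORN_BACKLOG', DEFAULT_BACKLOG))
  [requested, MIN_BACKLOG].max
end

# In config/unicorn.rb this would be used as:
#   listen ENV['PORT'], :backlog => unicorn_backlog
```

Using `Integer(...)` rather than `to_i` is deliberate: a typo in the config value fails loudly at startup instead of silently becoming 0.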

What else we would do

Despite all of our optimizations, we still see a fair bit of request queuing. Part of this definitely stems from the single-threaded nature of the stack we have chosen. We’ve started research on switching to a multi-threaded Ruby stack. In the short term, we can think of a few ways Heroku could reduce this problem.

From following the conversations on the web, we understand the challenge in implementing and managing an ‘intelligent’ routing mesh at the scale Heroku is working with. One option would be to segment high-traffic apps onto a separate mesh, one that could reasonably use a routing algorithm such as least-connections. That would carry real cost and significant engineering effort, no doubt. We imagine it would need to be part of a package specifically aimed at Heroku’s higher-end tier of customers.

An alternative that would seem to be a lower-effort solution would be to offer a dyno with increased memory: 1024MB, 2048MB, or even 4096MB. With a 4GB dyno we could run 16 workers per dyno. Since Unicorn’s master process manages queuing once the request is on the dyno, this would likely be dramatically more efficient overall. We would gladly pay 8x per dyno-hour for one 4GB dyno vs eight 512MB dynos, since we would need far fewer of them overall, and our performance would improve at the same overall infrastructure footprint. Everything else about the dyno model could stay the same.

Tech Blog

The technology team behind Think Through Math
