Tech Blog

The technology team behind Think Through Math

Introducing status_site: A Starting Point for a status.* Subdomain

Summary: Many high-traffic web apps use an http://status.* site to communicate platform availability notices. We needed one, so we built it. We think this is a general need, so we open-sourced ours at http://github.com/thinkthroughmath/status_site/.

There’s a lot that goes into running a high-traffic production web app, especially one that grows quickly. Over the past several months we’ve needed to schedule small amounts of downtime to run migrations and scripts and to reorganize and denormalize data. We generally schedule this work off-hours, since our traffic slows significantly outside of school hours, but we wanted a way to notify interested customers about downtime or issues without emailing our entire user base or relying on in-app messaging.

We surveyed how big-name companies handle this issue. Most commonly we saw a combination of a custom web app and a Twitter feed dedicated to platform status. We searched for an existing repo to use as a base and didn’t find anything, so we built a version that tries to follow the best practices of other status sites. Because contributing to OSS is important to us, we’ve open-sourced our baseline at http://github.com/thinkthroughmath/status_site/.

The initial version provides an admin login that can create, read, update, and delete (CRUD) issues (a problem with system performance or an outage) and maintenance events (planned downtime), and it can display system metrics from New Relic if applicable. status_site supports subscriptions to Issues and Maintenance updates via email and RSS. Lastly, status_site provides a JavaScript snippet that you can embed in your main app to automatically display upcoming planned maintenance events.

Other things we’ve thought would be cool to add:

* Twitter integration
* timeline visualization
* automatic calculation of uptime based on outages
* tie issue creation to New Relic alerts
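To make the uptime idea concrete, here is a minimal sketch of how uptime could be derived from recorded outages. It is written in Python purely for illustration and is not part of status_site. The function name `uptime_percent` and the representation of an outage as a `(start, end)` pair are assumptions, not the project's actual model.

```python
from datetime import datetime


def uptime_percent(outages, window_start, window_end):
    """Return the percentage of the window not covered by any outage.

    `outages` is a list of (start, end) datetime pairs (a hypothetical
    shape). Overlapping outages are merged so shared downtime is
    counted only once.
    """
    total = (window_end - window_start).total_seconds()
    if total <= 0:
        raise ValueError("window_end must be after window_start")

    # Clip each outage to the window and drop the ones entirely outside it.
    clipped = sorted(
        (max(start, window_start), min(end, window_end))
        for start, end in outages
        if end > window_start and start < window_end
    )

    down = 0.0
    current_start = current_end = None
    for start, end in clipped:
        if current_end is None or start > current_end:
            # Gap before this outage: close out the previous merged interval.
            if current_end is not None:
                down += (current_end - current_start).total_seconds()
            current_start, current_end = start, end
        else:
            # Overlaps the interval being merged: extend it.
            current_end = max(current_end, end)
    if current_end is not None:
        down += (current_end - current_start).total_seconds()

    return 100.0 * (total - down) / total


# Example: a 30-day window with 120 minutes of distinct downtime.
window = (datetime(2013, 1, 1), datetime(2013, 1, 31))
sample_outages = [
    (datetime(2013, 1, 5, 2, 0), datetime(2013, 1, 5, 3, 0)),
    (datetime(2013, 1, 5, 2, 30), datetime(2013, 1, 5, 3, 30)),  # overlaps the first
    (datetime(2013, 1, 20, 1, 0), datetime(2013, 1, 20, 1, 30)),
]
print(round(uptime_percent(sample_outages, *window), 2))  # 99.72
```

Merging overlapping intervals matters if two issues are logged for the same incident; otherwise the shared minutes would be subtracted twice. Whether planned maintenance should count against uptime is a policy choice this sketch leaves to the caller.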

What do you think should be added? Feel free to open an issue in the repo, or fork it and send a pull request. Get involved, and let us know how you decide to use it!
