• Server Check.in Architecture and Dashboard Improvements - Summer 2014
  • Over the past few months, we have made a few improvements to Server pages, along with some major underlying infrastructure changes that have dramatically reduced false positives.


    More information about server outages


    For every server in your account, the 'Outages' section of the server's main dashboard page shows which check server reported each outage and the reason for it (a 'status code'). This should help you determine whether your site is having trouble only in a particular geographical region, or is hitting a specific issue (such as DNS trouble, or a page returning a specific non-200 HTTP code).
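
    As a rough illustration of how this data can be read, the sketch below tallies outage records by reporting check server and by status code. The record shape, field names, server names, and status values are all hypothetical; they are not Server Check.in's actual data format.

```python
from collections import Counter

# Hypothetical outage records; field names and values are illustrative only.
outages = [
    {"reported_by": "check-eu-1", "status": "dns_failure"},
    {"reported_by": "check-eu-1", "status": "dns_failure"},
    {"reported_by": "check-us-2", "status": "http_503"},
]


def summarize(records):
    """Count outages per reporting check server and per status code."""
    by_server = Counter(r["reported_by"] for r in records)
    by_status = Counter(r["status"] for r in records)
    return by_server, by_status


by_server, by_status = summarize(outages)
```

    If most outages come from one check server, the trouble is probably regional; if one status code dominates across servers, the site itself likely has a specific issue.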


    Hopefully this extra data will be of use! We're working on integrating it into email communications, and possibly SMS messages as well, so be on the lookout for that (or not, if you have 100% uptime)!


    General architecture improvements


    Some users were reporting periods of frequent false positives, where a server would reportedly go down and up many times in a short period. This was understandably very annoying, and it was also extremely hard to reproduce, since the problem came from network connection issues between check servers (which are spread around the world, across different hosting providers and networks!).


    We have modified Server Check.in's check architecture so that outages are now confirmed by redundant checks from check servers in different areas than the check server that originally reported the outage, instead of by only one or two servers (potentially in the same region).
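
    The idea can be sketched as follows. This is a minimal illustration, not Server Check.in's actual implementation: the function name, the `region` field, the `probe` callback, and the threshold of two confirmations are all assumptions made for the example.

```python
def confirm_outage(reporter_region, check_servers, probe, required=2):
    """Return True if enough check servers outside the reporter's region
    also fail to reach the monitored site.

    probe(server) is a hypothetical callback that returns True when the
    monitored site is reachable from that check server.
    """
    confirmations = 0
    for server in check_servers:
        # Skip servers that share the reporter's region: they may share
        # the same network problem and would not be independent evidence.
        if server["region"] == reporter_region:
            continue
        if not probe(server):
            confirmations += 1
            if confirmations >= required:
                return True
    return False


# Hypothetical check servers for the usage example.
servers = [
    {"name": "eu-1", "region": "eu"},
    {"name": "eu-2", "region": "eu"},
    {"name": "us-1", "region": "us"},
    {"name": "ap-1", "region": "ap"},
]

# Regional network trouble: only the EU check servers cannot reach the site.
regional_blip = confirm_outage("eu", servers, lambda s: s["region"] != "eu")

# Real outage: no check server can reach the site.
real_outage = confirm_outage("eu", servers, lambda s: False)
```

    In the first case no server outside the reporting region confirms the failure, so it is not treated as an outage; in the second, two servers in other regions confirm it.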


    So far, it's been over two weeks since we deployed this improvement, and reports of false positives have decreased to zero!


    As part of this work, we also did extra load testing to ensure that Server Check.in can continue to scale quickly as our user accounts grow. The original version of Server Check.in could only have supported a few hundred customers at 10-minute checks, but the current architecture can support at least a thousand customers with 1-minute checks!
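
    To put those figures in perspective, here is a back-of-the-envelope calculation of check throughput. It assumes, purely for illustration, that "a few hundred" means 300 customers and that each customer monitors one server; neither number comes from our load tests.

```python
def checks_per_minute(customers, interval_minutes, servers_per_customer=1):
    """Average number of checks the system must run each minute."""
    return customers * servers_per_customer / interval_minutes


# Assumed figures: 300 customers stands in for "a few hundred".
original = checks_per_minute(300, 10)  # 30.0 checks per minute
current = checks_per_minute(1000, 1)   # 1000.0 checks per minute
increase = current / original          # roughly a 33x increase
```

    Under these assumptions, the current architecture handles roughly 33 times as many checks per minute as the original one could.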


    Thanks for using Server Check.in, and please contact support if you have any issues or questions!