Downtime Alerts: An Ideal Custom Setup
04 Jul 2019
Ryan Glass, lead developer at Downtime Monkey and director of Big Toe Web Design, gives us the recipe for his ideal downtime alert setup...
Back in 2017, you originally developed Downtime Monkey as an in-house tool before making it available to the public. Do you still use it?
Yes! At Big Toe we use it for all our website monitoring.
As well as building sites, we manage the hosting of our clients' websites and we monitor these to make sure that they are up and running round the clock.
How important is uptime to your clients?
That depends on the site/client. We maintain a range of sites - from simple static sites for small businesses, to e-commerce sites and more complex web apps.
We have three different 'tiers' of hosting and use a different hosting provider for each 'tier'. I assume that any client who pays for mid-range or high-end hosting considers uptime vital.
Do you use the same monitoring setup for all websites?
No - we set a different alert schedule for each tier:
For sites with budget shared hosting we set email and Slack alerts to be sent if the site stays down for 3 minutes, and SMS alerts if the site remains down for 10 minutes.
For sites using the high-end shared hosting there is a 99.9% uptime guarantee so we set email and Slack alerts to be sent if the site stays down for 2 minutes, and SMS alerts if the site remains down for 5 minutes.
For sites on VPSs and dedicated servers the host boasts 100% uptime, although in reality it's a tiny bit short of this, so we schedule email, Slack and SMS alerts to be sent if the site stays down for 1 minute.
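For readers who think in code, the three schedules above can be written down as a small lookup table. This is only an illustrative sketch: in Downtime Monkey the delays are set through the settings pages, and the names used here (`AlertSchedule`, `SCHEDULES`, `channels_due` and the tier keys) are hypothetical, not part of any real API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertSchedule:
    """Minutes a site must stay down before each channel is alerted."""
    email_after_min: int
    slack_after_min: int
    sms_after_min: int


# Delays taken from the three hosting tiers described above.
# The tier keys are made-up labels for this sketch.
SCHEDULES = {
    "budget_shared": AlertSchedule(email_after_min=3, slack_after_min=3, sms_after_min=10),
    "high_end_shared": AlertSchedule(email_after_min=2, slack_after_min=2, sms_after_min=5),
    "vps_dedicated": AlertSchedule(email_after_min=1, slack_after_min=1, sms_after_min=1),
}


def channels_due(tier: str, minutes_down: int) -> list[str]:
    """Return the channels whose delay has been reached for this tier."""
    schedule = SCHEDULES[tier]
    delays = {
        "email": schedule.email_after_min,
        "slack": schedule.slack_after_min,
        "sms": schedule.sms_after_min,
    }
    return [channel for channel, delay in delays.items() if minutes_down >= delay]
```

With this table, a budget site that has been down for 3 minutes triggers email and Slack only, while a VPS site that has been down for 1 minute triggers all three channels.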
Why the longer alert delay for SMS compared to email and Slack?
That's down to lifestyle!
During working hours I'm at my desk most of the time so I'll usually see emails and Slack alerts as they come through. I set these alerts on a short delay so I'll get notified quickly if a site goes down.
However, if I'm in a meeting, or it's the evening or the weekend, I'll have emails turned off but I will still get text messages. I don't want to be interrupted unless it's really important, hence the longer delay on SMS alerts.
Why do you duplicate email and Slack alerts if you set them to the same schedule?
There are a couple of reasons for this...
First, Slack alerts go to our team channel so if I miss an alert for some reason, say I'm completely off-grid, Jayne should have it covered.
Second, it is a good idea to have duplication when it comes to alerts. Email clients occasionally have problems and the Slack servers have, on rare occasions, gone down. However, the chance of both these things happening at the same time is incredibly low.
What other features do you use in your alert setup?
Rate limiting of alerts is useful for us...
We monitor websites for 30-40 clients and if a server goes down we could have 20 sites go down at once.
For email alerts this isn't a problem but I don't want 20 text messages to come through at the same time!
So I set the SMS rate limit to 3 alerts per hour - if I get 3 messages about sites on the same server going down I can be pretty confident that the server is down. To confirm this I'll log into Downtime Monkey and check the real-time display. Once the problem is fixed I'll usually keep one eye on the real-time display for the rest of the hour.
I also rate limit Slack alerts but not so harshly - the Slack limit is set at 20 alerts per hour.
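To make the effect of those limits concrete, here is a minimal sketch of a per-channel rate limiter. It assumes a sliding one-hour window, which may not be how Downtime Monkey counts the hour internally, and the `HourlyRateLimiter` class is a hypothetical name used only for illustration.

```python
from collections import deque


class HourlyRateLimiter:
    """Allow at most `max_alerts` alerts in any rolling window."""

    def __init__(self, max_alerts: int, window_seconds: float = 3600.0):
        self.max_alerts = max_alerts
        self.window_seconds = window_seconds
        self._sent = deque()  # timestamps (seconds) of alerts already sent

    def allow(self, now: float) -> bool:
        """Return True and record the alert if it fits within the limit."""
        # Forget alerts that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_alerts:
            self._sent.append(now)
            return True
        return False


# A server hosting 20 monitored sites goes down: 20 alerts fire at once.
sms = HourlyRateLimiter(max_alerts=3)
slack = HourlyRateLimiter(max_alerts=20)
sms_sent = sum(sms.allow(now=0.0) for _ in range(20))
slack_sent = sum(slack.allow(now=0.0) for _ in range(20))
```

In this scenario only 3 text messages get through while all 20 Slack alerts are delivered, which matches the intent: the phone stays quiet and the team channel still shows the full picture.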
Anything else alert related?
We use content (keyword) monitoring across the board too. If the content of a site changes unexpectedly I want to know!
The most common cause of this is a database problem, in which case the page just shows errors on a white screen.
Another cause of content change is that the site has been hacked and the attacker has replaced the content with their own.
Thankfully, both of these problems are rare but if they do happen, we want to know immediately so all content monitoring alerts are sent instantly.
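The idea behind content monitoring can be sketched in a few lines: the page is fetched as usual, then checked for a keyword that should always be present. This is a simplified illustration rather than Downtime Monkey's actual implementation. The function name `check_page` and the three result labels are hypothetical, and the sketch assumes the broken page still returns a successful status code, which is the case a plain uptime check would miss.

```python
def check_page(status_code: int, body: str, keyword: str) -> str:
    """Classify one fetch of a monitored page.

    Returns "down" for a non-2xx response, "content_changed" when the
    page loads but the expected keyword is missing, and "ok" otherwise.
    """
    if not 200 <= status_code < 300:
        return "down"
    # Case-insensitive match so "Welcome" and "welcome" both count.
    if keyword.lower() not in body.lower():
        return "content_changed"
    return "ok"


# A database failure that still serves a page: errors on a white screen.
db_error_page = "<html><body>Error establishing a database connection</body></html>"
healthy_page = "<html><body><h1>Big Toe Web Design</h1></body></html>"
```

Here `check_page(200, db_error_page, "Big Toe Web Design")` reports a content change even though the server answered normally, and the same goes for a defaced page where an attacker has swapped out the content.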