Lockdown Bugfixes & Midnight Coding

14 May 2020

It's been a strange few months here in Edinburgh. Thankfully Downtime Monkey has been largely unaffected by the lockdown, quietly continuing to monitor websites while the world shuts down.

Coding from home has been challenging with kids off school and nurseries closed. However, in the silent twilight hours after everyone has gone to bed, we've been developing improvements and fixing bugs.

Here are the details...


Refactor Monitoring Scripts for Efficiency & Readability

Downtime Monkey's original monitoring scripts were written nearly 3 years ago. Since then, we've added a bunch of new features: a global network of servers, response time monitoring, Slack alerts, repeat alerts and rate limiting of alerts.

These new features were bolted onto the scripts and, although they worked well, the scripts became increasingly complicated, making it difficult to add further features.

Things have also been scaling up. We now have over 1000 users (thank you!) and over 200,000 downtimes were recorded in the last 90 days. Inefficiencies in monitoring scripts that previously had little effect now have the potential to cause increased load on the server.

We could always throw CPUs and RAM at the problem, but these resources are expensive. Instead we bit the bullet and completely refactored the main monitoring scripts for efficiency and simplicity. It was a lot of work for what seems like nothing new but we're now in a position to grow efficiently and add new features easily... so watch this space.

Reminders For Expiring Cards

When a Pro customer is signed up for auto-renewal and their credit card expires, the payment can fail and the customer can lose their Pro account.

Obviously we don't want this to happen so we've set up automated emails to remind customers when a card is due to expire.
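For the curious, here is a rough sketch of how a check like this might look. It is not our production code: the function names and the 30 day notice period are assumptions made up for illustration. It treats a card as valid through the last day of its expiry month.

```python
import calendar
from datetime import date


def card_expiry_date(exp_month: int, exp_year: int) -> date:
    """Return the last day of the card's expiry month."""
    last_day = calendar.monthrange(exp_year, exp_month)[1]
    return date(exp_year, exp_month, last_day)


def needs_reminder(exp_month: int, exp_year: int, today: date,
                   notice_days: int = 30) -> bool:
    """True if the card expires within `notice_days` of `today`.

    `notice_days` is a hypothetical setting, not a documented value.
    Cards that have already expired return False.
    """
    days_left = (card_expiry_date(exp_month, exp_year) - today).days
    return 0 <= days_left <= notice_days
```

A daily scheduled job could run a check like this over every auto-renewing account and queue a reminder email for each match.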

Corner Case Bugfix for Repeat Slack Alerts

A rare bug was found relating to Slack repeat alerts under very specific conditions.

If a user set up a repeat alert to Slack and then deleted the monitor after the website went down, but before turning Slack alerts off, the repeat alert would keep sending for as long as the site remained down.

When this occurred, the customer needed to contact support to stop the alerts. This is now fixed: all repeat alerts are removed when the monitor is deleted.
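The idea behind the fix can be shown with a small in-memory sketch. The class and method names here are hypothetical and this is not our actual code or schema. The point is simply that deleting a monitor also clears any repeat alerts attached to it, so nothing is left behind to keep firing.

```python
class AlertStore:
    """Toy in-memory store, used only to illustrate the cleanup."""

    def __init__(self):
        self.monitors = {}       # monitor_id -> monitored URL
        self.repeat_alerts = {}  # monitor_id -> list of alert channels

    def add_monitor(self, monitor_id, url):
        self.monitors[monitor_id] = url

    def add_repeat_alert(self, monitor_id, channel):
        self.repeat_alerts.setdefault(monitor_id, []).append(channel)

    def delete_monitor(self, monitor_id):
        # Remove the monitor AND its pending repeat alerts together,
        # so no orphaned alert survives the deletion.
        self.monitors.pop(monitor_id, None)
        self.repeat_alerts.pop(monitor_id, None)

    def pending_repeat_alerts(self):
        """List (monitor_id, channel) pairs that would still be sent."""
        return [(m, c) for m, chans in self.repeat_alerts.items() for c in chans]
```

In a relational database the same effect is often achieved with a cascading delete on the foreign key that links alerts to monitors.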

Rate Limiting: Queued Slack Alerts

One of the side effects of the global lockdown is that many more people are connecting remotely and working from home. This seems to have caused an increase in downtimes, and even mass outages when major hosting providers run into server difficulties.

At times like these, Downtime Monkey usually sees a spike in alerts, with hundreds of alerts being sent out simultaneously. Email and SMS infrastructure deals with this quite well, but the Slack API struggles when the number of alerts peaks.

To mitigate this, we tweaked our processing of Slack alerts: instead of being sent out simultaneously, alerts are now queued and sent with a tiny pause between each one.
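As a minimal sketch of the queue-and-pause approach (not our production code): the function name, the `send` callback and the 0.1 second default pause are all assumptions chosen for illustration. Alerts are taken from the queue in order, with a short sleep between sends.

```python
import time
from collections import deque


def send_queued(alerts, send, pause_seconds=0.1, sleep=time.sleep):
    """Send alerts one at a time, pausing briefly between each.

    `send` is whatever callable delivers a single alert (hypothetical).
    `sleep` can be swapped out in tests. Returns the number of alerts sent.
    """
    queue = deque(alerts)
    sent = 0
    while queue:
        send(queue.popleft())
        sent += 1
        if queue:  # no need to pause after the final alert
            sleep(pause_seconds)
    return sent
```

Spacing the sends out turns a burst of hundreds of simultaneous requests into a steady trickle, at the cost of the last alert in a large batch arriving slightly later.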

A big thank you to everyone who uses Downtime Monkey and an even bigger thank you to the Pro Plan customers - we couldn't do it without you!

