Server(s) Monitoring 101

Published on 20 October 2020 by Florian Gabon


The goal of this article is to give you guidance on how to start monitoring a small infrastructure; cluster monitoring and more complex topics are not covered here. We also assume you are running instances on a cloud provider such as Amazon Web Services, Azure, Google Cloud, OVH or Scaleway, so we will not cover monitoring of hardware such as switches, routers or firewalls.

💡 Why you should start your monitoring journey as early as possible

Effective monitoring starts while your infrastructure is still small. Here are 3 reasons why you should begin now:

  • Infrastructure uptime and performance spare you complaints and frustration from customers and early adopters. This matters for your brand image, because your community has a major impact on the growth of your company. Reducing downtime increases the perceived value of your product.
  • Get usage metrics to drive your growth. Even if your infrastructure is small today, you expect it to grow over the coming months or years, so you need to be able to scale it when more customers arrive. The best way to steer this growth is to keep an overview of your infrastructure through dashboards and metrics.
  • Don't lose time manually inspecting servers: focus on what matters for your business, your apps and your customers. As a technical team you should put your effort into your product: if you spend more than an hour a day on monitoring, you're wasting your time.

In conclusion, whatever your sector of activity, monitoring your infrastructure will improve your efficiency and help your whole infrastructure run in a more secure and sustainable way.

🏁 How to start your monitoring journey?

To properly monitor your infrastructure, divide your monitoring journey into several stages:

  • Monitor your website/application from different places in the world: it's important to know, from a customer's point of view, whether and how well your infrastructure is working.
  • Check the local availability of your server and its components (DB, web server, cache, etc.). With those details, when an issue occurs, you will know where to investigate and what to fix.
  • Decide what kind of metrics you should get from your infrastructure. A well-known answer is the RED methodology, introduced a few years ago by Tom Wilkie (a former Google engineer) and inspired by the "golden signals" Google uses to monitor its own services. RED stands for:

    • Rate (R): The number of requests per second.
    • Errors (E): The number of failed requests.
    • Duration (D): The amount of time to process a request.

Apply the RED methodology to your services (DB, web server) and collect server usage metrics (CPU, memory, network) alongside it. This gives you an overview of your system and lets you correlate server statistics with actual usage.
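
To make the three RED metrics more concrete, here is a minimal sketch in Python that computes them over one time window. The `Request` record, its field names and the sample values are assumptions made for illustration; in practice the data would come from your web server or application logs, and counting only 5xx statuses as errors is one possible convention among others.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical record of one handled request (not a real log format).
    timestamp: float    # seconds since the start of the window
    status: int         # HTTP status code
    duration_ms: float  # time spent processing the request


def red_metrics(requests, window_seconds):
    """Summarize one time window as Rate, Errors and Duration."""
    durations = [r.duration_ms for r in requests]
    return {
        # Rate: requests per second over the window.
        "rate_per_s": len(requests) / window_seconds,
        # Errors: requests answered with a server error (assumed: 5xx).
        "errors": sum(1 for r in requests if r.status >= 500),
        # Duration: median processing time, 0.0 for an empty window.
        "median_duration_ms": statistics.median(durations) if durations else 0.0,
    }


# Made-up sample: four requests observed during a 4-second window.
sample = [
    Request(0.5, 200, 12.0),
    Request(1.2, 200, 15.0),
    Request(2.8, 500, 230.0),
    Request(3.1, 200, 11.0),
]
summary = red_metrics(sample, window_seconds=4)
print(summary)
```

The median is used here rather than the average so that a single slow request (230 ms in the sample) does not hide the typical response time; a high percentile would be another reasonable choice.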

  • Configure alerts on the metrics most relevant to your business. This makes you more reactive: you are notified when something goes wrong instead of waiting for a customer to call or for someone to look at the dashboard.
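
As an illustration of what an alert rule can look like, here is a small Python sketch that fires only when a metric stays above a threshold for several consecutive samples, which avoids being notified for a single spike. The function name `should_alert`, the threshold and the sample values are hypothetical; a real monitoring tool would provide its own way to express such a rule.

```python
def should_alert(samples, threshold, min_consecutive):
    """Return True when the last `min_consecutive` samples all exceed `threshold`."""
    if min_consecutive <= 0 or len(samples) < min_consecutive:
        return False
    return all(value > threshold for value in samples[-min_consecutive:])


# Made-up error-rate samples (percent of failed requests), one per minute.
error_rate = [0.5, 0.7, 6.2, 7.9, 8.4]

# The last 3 samples are above 5 %, so the alert fires.
fires_after_3 = should_alert(error_rate, threshold=5.0, min_consecutive=3)
# The last 4 samples include 0.7 %, so a stricter rule does not fire yet.
fires_after_4 = should_alert(error_rate, threshold=5.0, min_consecutive=4)
print(fires_after_3, fires_after_4)
```

Requiring more consecutive samples makes the alert less noisy but slower to fire, so the right value depends on how quickly you need to react.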

All the steps above are important to get a first view of how your infrastructure is behaving. We know this takes time and energy, because your job is not to monitor: it's to build your apps. Make your monitoring solution redundant and resilient (it should not break if your primary infrastructure breaks). This comes out of the box with a SaaS solution 😉.
If you don't have a Bleemeo account yet, start monitoring your infrastructure today in 30s.