Prometheus and Spring Boot Health Checks

While setting up alerting for Spring Boot services with Prometheus, I discovered the synthetic “up” time series, which is great for checking whether the monitoring system can reach my service instances. Useful as that is, I also wanted to alert on the health status of my instances, as reported by /actuator/health. Unfortunately, Spring Boot’s /actuator/prometheus endpoint exposes nothing that I could use for this.

After some pondering, I decided to expose my own “health” time series from Spring Boot. With Micrometer, this is quite easy: all I have to do is register a Gauge meter that fetches the health status from the Actuator’s HealthEndpoint bean when sampled.
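A minimal sketch of such a registration, consistent with the corrected one-liner quoted in the comments below, might look like this. The class name HealthMetricsConfiguration, the bean method name healthGauge, and the body of healthToCode are assumptions made for illustration; the sketch also assumes that Spring Boot Actuator and Micrometer are on the classpath.

```java
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.boot.actuate.autoconfigure.metrics.MeterRegistryCustomizer;
import org.springframework.boot.actuate.health.HealthEndpoint;
import org.springframework.boot.actuate.health.Status;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Hypothetical class name; any @Configuration class would do.
@Configuration
public class HealthMetricsConfiguration {

    // Registers a gauge named "health" that is backed by the HealthEndpoint bean.
    // The lambda is stored by the registry and evaluated whenever the gauge is read.
    @Bean
    MeterRegistryCustomizer<MeterRegistry> healthGauge(HealthEndpoint healthEndpoint) {
        return registry -> registry.gauge("health", healthEndpoint, ep -> healthToCode(ep));
    }

    // Assumed mapping: Status.UP becomes 1, every other status becomes 0.
    private static double healthToCode(HealthEndpoint ep) {
        Status status = ep.health().getStatus();
        return Status.UP.equals(status) ? 1.0 : 0.0;
    }
}
```

The three-argument gauge(...) overload takes a state object and a function rather than a number, so the value follows the health endpoint instead of being fixed at registration time. Micrometer keeps only a weak reference to the state object, which is harmless here because the Spring context keeps the HealthEndpoint bean alive.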

In this example, I simply map Status.UP to 1 and everything else to 0, but you can easily define your own convention that covers Status.OUT_OF_SERVICE, Status.UNKNOWN, and any custom codes you may have.
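A richer convention could be sketched as follows. The class name HealthStatusCodes, the method toGaugeValue, and the numeric values are purely illustrative assumptions, not a standard; the strings are the codes that Status.getCode() returns for the built-in statuses, and "MAINTENANCE" stands in for a hypothetical custom code.

```java
public class HealthStatusCodes {

    /**
     * Maps an Actuator status code such as "UP" to a number for a gauge.
     * The numeric values are an illustrative convention only.
     */
    public static double toGaugeValue(String statusCode) {
        if (statusCode == null) {
            return 0.0;
        }
        switch (statusCode) {
            case "UP":
                return 3.0;
            case "OUT_OF_SERVICE":
                return 2.0;
            case "DOWN":
                return 1.0;
            default:
                // "UNKNOWN" and any custom codes fall through to 0.
                return 0.0;
        }
    }

    public static void main(String[] args) {
        String[] codes = {"UP", "OUT_OF_SERVICE", "DOWN", "UNKNOWN", "MAINTENANCE"};
        for (String code : codes) {
            System.out.println(code + " -> " + toGaugeValue(code));
        }
    }
}
```

Inside the gauge function, the code could be obtained with healthEndpoint.health().getStatus().getCode(). Keeping distinct numbers per status makes it possible to alert differently on, say, DOWN and OUT_OF_SERVICE.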


5 Responses to Prometheus and Spring Boot Health Checks

  1. Holger says:

    Hi Matthias,
    thanks for that article! I am currently looking for something similar, so it was quite helpful for me.

    After startup of the Spring Boot application, the prometheus endpoint exposes the status correctly. However, when I disconnect my database from the application, the status is not updated, although /health recognizes the missing connection.
    I am asking myself how the gauge is updated in your example… According to the documentation at https://micrometer.io/docs/concepts#_manually_incrementing_decrementing_a_gauge you should not use primitive numbers, because the gauge then never changes (1).
    So, could you let me know if this code is really working and the status is updated correctly in your case?

    Kind regards
    Holger

    (1) “Attempting to construct a gauge with a primitive number or one of its java.lang object forms is always incorrect. These numbers are immutable, and thus the gauge cannot ever be changed. Attempting to “re-register” the gauge with a different number won’t work, as the registry only maintains one meter for each unique combination of name and tags.”

    • Matthias says:

      Hi Holger! That is weird. I have published the prototype that the article is based on here, so you can have a look: https://github.com/mafr/prometheus-health

      When I start it up, the health check is “UP” and that’s visible in the prometheus endpoint. Then I change the status to “DOWN” by sending a POST request to “/down” and that’s immediately reflected in both the health and prometheus actuator endpoints.

      • Holger says:

        Hi Matthias, thanks for your quick response and the link to your repo. With that I saw that there’s a small typo in the code of your blog post, and now it is working perfectly.
        return registry -> registry.gauge(“healthalthEndpoint, ep -> healthToCode(ep));
        should be
        return registry -> registry.gauge("health", healthEndpoint, ep -> healthToCode(ep));

        (I corrected it in a wrong way ^^)

        Best
        Holger

  2. GB says:

    Thanks for the post. I was unable to make it work unless I did “return registry -> registry.gauge("healthEndpoint", healthToCode(healthEndpoint));”
