When setting up alerting for Spring Boot services with Prometheus, I discovered the synthetic “up” time series, which is great for checking whether the monitoring system can reach my service instances. However, I also wanted to alert on the health status of my instances, as reported by /actuator/health. Unfortunately, there is nothing in Spring Boot’s /actuator/prometheus endpoint that I could use.
After some pondering, I decided to expose my own “health” time series from Spring Boot. With Micrometer, this is quite easy; all I have to do is register a Gauge meter that fetches the health status from the Actuator’s HealthEndpoint bean when sampled:
    package de.mafr.demo.prometheus;

    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.actuate.autoconfigure.metrics.MeterRegistryCustomizer;
    import org.springframework.boot.actuate.health.HealthEndpoint;
    import org.springframework.boot.actuate.health.Status;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import org.springframework.context.annotation.Bean;
    import io.micrometer.prometheus.PrometheusMeterRegistry;

    @SpringBootApplication
    public class PrometheusDemoApplication {

        @Autowired
        private HealthEndpoint healthEndpoint;

        // Registers a "health" gauge. The function is re-evaluated against the
        // HealthEndpoint bean every time the registry samples the gauge.
        @Bean
        MeterRegistryCustomizer<PrometheusMeterRegistry> prometheusHealthCheck() {
            return registry -> registry.gauge("health", healthEndpoint, ep -> healthToCode(ep));
        }

        // Maps Status.UP to 1 and every other status to 0.
        private static int healthToCode(HealthEndpoint ep) {
            Status status = ep.health().getStatus();
            return status.equals(Status.UP) ? 1 : 0;
        }

        public static void main(String[] args) {
            SpringApplication.run(PrometheusDemoApplication.class, args);
        }
    }
In this example, I simply map Status.UP to 1 and everything else to 0, but you can easily define your own convention that covers Status.OUT_OF_SERVICE, Status.UNKNOWN, and any custom codes you may have.
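As a rough sketch of such a convention, the mapping could be keyed on the status code string (what Status.getCode() returns). The class name HealthCodes and the specific numbers below are assumptions chosen for illustration, not a Spring Boot or Prometheus standard; anything not listed, including custom codes, falls back to 0 here.

```java
import java.util.Map;

// Hypothetical helper: maps health status code strings to gauge values.
final class HealthCodes {

    // Assumed convention; pick whatever numbers suit your alerting rules.
    private static final Map<String, Integer> CODES = Map.of(
            "UP", 1,
            "DOWN", 0,
            "OUT_OF_SERVICE", -1,
            "UNKNOWN", -2);

    // Unlisted (for example custom) status codes are treated like DOWN.
    static int toCode(String statusCode) {
        return CODES.getOrDefault(statusCode, 0);
    }

    public static void main(String[] args) {
        System.out.println(toCode("UP"));
        System.out.println(toCode("OUT_OF_SERVICE"));
        System.out.println(toCode("MY_CUSTOM_STATE"));
    }
}
```

Inside healthToCode, this would presumably be called as HealthCodes.toCode(ep.health().getStatus().getCode()).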
Hi Matthias,
thanks for that article! Currently I am looking for something similar, so it was quite helpful for me.
After startup of the Spring Boot application, the prometheus endpoint exposes the status correctly. However, when I disconnect my database from the application, the status is not updated, although /health recognizes the missing connection.
I am asking myself how the gauge is updated in your example… Reading https://micrometer.io/docs/concepts#_manually_incrementing_decrementing_a_gauge, the documentation says that you should not use primitive numbers because the gauge is then never changed (1)
So, could you let me know if this code is really working and the status is updated correctly in your case?
Kind regards
Holger
(1) “Attempting to construct a gauge with a primitive number or one of its java.lang object forms is always incorrect. These numbers are immutable, and thus the gauge cannot ever be changed. Attempting to “re-register” the gauge with a different number won’t work, as the registry only maintains one meter for each unique combination of name and tags.”
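The distinction the quoted documentation draws can be sketched without Micrometer. SampledGauge below is a hypothetical stand-in written only for illustration (it is not Micrometer's implementation), and an AtomicBoolean plays the role of the HealthEndpoint. A gauge built from a live object plus a function sees later changes; a gauge built from a number computed up front does not.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.ToDoubleFunction;

// Hypothetical stand-in for a gauge: holds a source object and a function,
// and re-evaluates the function on every read (i.e. on every scrape).
final class SampledGauge<T> {
    private final T source;
    private final ToDoubleFunction<T> valueFunction;

    SampledGauge(T source, ToDoubleFunction<T> valueFunction) {
        this.source = source;
        this.valueFunction = valueFunction;
    }

    double value() {
        return valueFunction.applyAsDouble(source);
    }
}

final class GaugeSamplingDemo {

    static double toCode(AtomicBoolean healthy) {
        return healthy.get() ? 1 : 0;
    }

    public static void main(String[] args) {
        AtomicBoolean healthy = new AtomicBoolean(true); // stands in for HealthEndpoint

        // Object + function: the status is looked up again on each read.
        SampledGauge<AtomicBoolean> lazy =
                new SampledGauge<>(healthy, GaugeSamplingDemo::toCode);

        // Number computed once: the gauge only ever sees this immutable Double.
        SampledGauge<Double> eager =
                new SampledGauge<>(Double.valueOf(toCode(healthy)), Double::doubleValue);

        healthy.set(false); // the service becomes unhealthy

        System.out.println(lazy.value());  // prints 0.0
        System.out.println(eager.value()); // prints 1.0, the stale startup value
    }
}
```

The article's registry.gauge("health", healthEndpoint, ep -> healthToCode(ep)) corresponds to the first form, which is why its value follows the health status.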
Hi Holger! That is weird. I have published the prototype that the article is based on here, so you can have a look: https://github.com/mafr/prometheus-health
When I start it up, the health check is “UP” and that’s visible in the prometheus endpoint. Then I change the status to “DOWN” by sending a POST request to “/down” and that’s immediately reflected in both the health and prometheus actuator endpoints.
Hi Matthias, thanks for your quick response and the link to your repo. With that I saw that there’s a small typo in the code of your blog post, and now it is working perfectly.
return registry -> registry.gauge(“healthalthEndpoint, ep -> healthToCode(ep));
should be
return registry -> registry.gauge("health", healthEndpoint, ep -> healthToCode(ep));
(I had corrected it the wrong way ^^)
Best
Holger
Thanks for the post. I was unable to make it work unless I did “return registry -> registry.gauge(“healthEndpoint”, healthToCode(healthEndpoint));”
Sorry about this. The mistake was in an earlier version of the article which I corrected, but the fix still doesn’t show up – I guess there’s a WordPress caching issue. I have now switched to an embedded Gist for the code sample. If you don’t see the Gist, this is it: https://gist.github.com/mafr/cf352528e155d19f301f30ab575030fd
Where is the “registry” variable coming from?
Sorry, just figured it out. Thanks for this solution Matthias! It saved a lot of time.
ok so how do I test this is implemented correctly? what endpoint do I hit?
nvm I created a separate class and forgot to make the application class include it