Java Microservices - Health Check API

Introduction

In a microservices architecture, we have to make sure each service instance can handle requests. A service may be up (healthy), or it may be down for an unknown reason. Without detection, unhealthy instances can still receive traffic, degrade performance, or fail unpredictably. This is where the Health Check API pattern comes in: a dedicated HTTP endpoint (e.g., GET /health) that actively verifies service viability. Infrastructure (load balancers, orchestrators) and monitoring tools use it to identify healthy instances and to take the necessary action when an instance is unhealthy.

Why You Need a Health Check API

Traffic Control

Load balancers and service registries rely on health status to stop routing to unhealthy instances.

Automated Monitoring & Alerts

Monitoring tools poll health-check endpoints to trigger alerts or spin up new containers when services fail.

Deployment Safety

Health-checks guard against premature traffic to newly deployed instances that haven't fully initialized.

Anatomy of a Health Check API

Endpoint URL

Common patterns−

/health − general status
/health/live or /healthz − liveness (is the process alive?)
/health/ready − readiness (can it serve requests?)
/health/started − startup (fully initialized) (tutorialspoint.com, openliberty.io)

HTTP Method & Status Codes

Use GET
200 OK if healthy; 503 Service Unavailable (or 500) if unhealthy
Avoid caching− include headers like Cache-Control: no-cache
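The rules above can be sketched with the HTTP server that ships with the JDK. This is a minimal illustration, not a production setup: the class name HealthEndpoint, its methods, and the healthy hook are hypothetical names introduced here, and the hook stands in for whatever real checks the service runs.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.function.BooleanSupplier;

class HealthEndpoint {

    // 200 OK when healthy, 503 Service Unavailable otherwise
    static int statusCode(boolean up) {
        return up ? 200 : 503;
    }

    // Smallest useful JSON body: just the overall status
    static String body(boolean up) {
        return "{\"status\":\"" + (up ? "UP" : "DOWN") + "\"}";
    }

    // Starts a server exposing GET /health; `healthy` is a hypothetical hook
    // for the real checks. Passing port 0 lets the OS pick a free port.
    static HttpServer start(int port, BooleanSupplier healthy) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/health", exchange -> handle(exchange, healthy));
        server.start();
        return server;
    }

    private static void handle(HttpExchange exchange, BooleanSupplier healthy) throws IOException {
        try {
            if (!"GET".equals(exchange.getRequestMethod())) {
                exchange.getResponseHeaders().set("Allow", "GET");
                exchange.sendResponseHeaders(405, -1); // -1 means no response body
                return;
            }
            boolean up = healthy.getAsBoolean();
            byte[] payload = body(up).getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.getResponseHeaders().set("Cache-Control", "no-cache");
            exchange.sendResponseHeaders(statusCode(up), payload.length);
            exchange.getResponseBody().write(payload);
        } finally {
            exchange.close();
        }
    }
}
```

Calling HealthEndpoint.start(8080, () -> true) would serve the endpoint at http://localhost:8080/health; the port is only an example.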

Payload Structure

A lightweight JSON response listing each check and its result

Example

{
   "status": "DOWN",
   "checks": [
      { "name": "db", "status": "UP", "responseTimeMs": 34 },
      { "name": "cache", "status": "DOWN", "error": "ConnectionTimeout" }
   ]
}
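A payload of this shape could be produced along the following lines. This is a sketch assuming Java 17 or later (for records); HealthReport and Check are hypothetical names, check names are assumed to be plain identifiers that need no JSON escaping, and the optional fields from the example (responseTimeMs, error) are left out for brevity. One common aggregation rule, assumed here, reports the overall status as UP only when every check is UP.

```java
import java.util.List;
import java.util.stream.Collectors;

class HealthReport {

    // One named check and its outcome
    record Check(String name, boolean up) {
        String toJson() {
            return "{\"name\":\"" + name + "\",\"status\":\"" + (up ? "UP" : "DOWN") + "\"}";
        }
    }

    // Assumed rule: the service is UP only if all checks are UP
    static boolean allUp(List<Check> checks) {
        return checks.stream().allMatch(Check::up);
    }

    // Builds the aggregated JSON document by hand to stay dependency-free
    static String toJson(List<Check> checks) {
        String items = checks.stream()
                .map(Check::toJson)
                .collect(Collectors.joining(","));
        return "{\"status\":\"" + (allUp(checks) ? "UP" : "DOWN") + "\",\"checks\":[" + items + "]}";
    }
}
```

In a real service a JSON library would normally replace the string concatenation; it is used here only to keep the sketch self-contained.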

What to Check

Divide checks into −

Process Health

Is the service running?
Is the event loop or thread pool responsive?

Resource Health

Disk space, CPU, memory, thread availability.

Dependencies

Databases, caches, messaging systems, external APIs.
Ping downstream services or open DB connections.

Application Logic

Basic app-level operations, e.g., can a user log in, is the configuration valid.

Best practice− Keep individual checks fast and non-blocking.
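One way to keep a check from blocking the health endpoint is to run it asynchronously with a deadline. The sketch below uses CompletableFuture from the JDK (Java 9 or later for completeOnTimeout); TimedCheck and run are hypothetical names, and treating a timeout or an exception as "down" is an assumption, not the only reasonable policy.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

class TimedCheck {

    // Runs `check` on the common pool and waits at most `timeoutMillis`.
    // A timeout or an exception is reported as false (assumed policy).
    // Note: the underlying task is not cancelled; it keeps running in the background.
    static boolean run(BooleanSupplier check, long timeoutMillis) {
        return CompletableFuture.supplyAsync(check::getAsBoolean)
                .completeOnTimeout(false, timeoutMillis, TimeUnit.MILLISECONDS)
                .exceptionally(error -> false)
                .join();
    }
}
```

A database ping, for example, could be wrapped as TimedCheck.run(() -> pingDatabase(), 200), where pingDatabase is a placeholder for the real check.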

Types of Health Checks

Liveness

Simple− is the service process alive?
Used by Kubernetes to restart frozen or crashed containers.

Readiness

Can the service respond to traffic?
Checks dependency availability, connection pools, and app readiness.
Prevents routing to incompletely initialized services.

Startup

Determines when the service is fully initialized.
Prevents readiness/liveness failures during boot.

Composite

Aggregate liveness and readiness for simplified monitoring.
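The relationship between the four types can be illustrated with a small state holder. ProbeState and its method names are hypothetical, and the rules encoded here (liveness is trivially true once the code runs, readiness requires startup plus dependencies, composite requires both) are one reasonable reading of the descriptions above rather than a fixed standard.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

class ProbeState {
    private final AtomicBoolean started = new AtomicBoolean(false);
    private final BooleanSupplier dependenciesUp; // hypothetical hook for dependency checks

    ProbeState(BooleanSupplier dependenciesUp) {
        this.dependenciesUp = dependenciesUp;
    }

    // Call once initialization (caches warmed, migrations done, ...) has finished
    void markStarted() {
        started.set(true);
    }

    // Liveness: if this method can run, the process is alive
    boolean live() {
        return true;
    }

    // Startup: has initialization completed?
    boolean startedUp() {
        return started.get();
    }

    // Readiness: initialized and dependencies reachable
    boolean ready() {
        return started.get() && dependenciesUp.getAsBoolean();
    }

    // Composite: a single verdict combining liveness and readiness
    boolean composite() {
        return live() && ready();
    }
}
```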

Implementation Strategies

Frameworks & Tooling

Spring Boot Actuator (/actuator/health)
MicroProfile Health for Java− /health, /health/live, /health/ready
Open Liberty built-in health support

Custom Implementation

Set up REST endpoints; run checks with a timeout and return aggregated JSON and an HTTP status code.
Use circuit breakers or caching for expensive dependency checks.
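Caching an expensive dependency check might look like the sketch below. CachedCheck is a hypothetical name, and the clock is injected only so the behavior can be tested. The time-to-live should stay short (a few seconds at most) so the endpoint still reflects near-real-time state.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

class CachedCheck implements BooleanSupplier {
    private final BooleanSupplier delegate; // the expensive check
    private final long ttlNanos;
    private final LongSupplier clock;       // nanosecond clock, injectable for tests
    private boolean hasValue;
    private boolean lastResult;
    private long lastRunNanos;

    CachedCheck(BooleanSupplier delegate, Duration ttl) {
        this(delegate, ttl.toNanos(), System::nanoTime);
    }

    CachedCheck(BooleanSupplier delegate, long ttlNanos, LongSupplier clock) {
        this.delegate = delegate;
        this.ttlNanos = ttlNanos;
        this.clock = clock;
    }

    // Re-runs the delegate only when no result exists or the cached one has expired
    @Override
    public synchronized boolean getAsBoolean() {
        long now = clock.getAsLong();
        if (!hasValue || now - lastRunNanos >= ttlNanos) {
            lastResult = delegate.getAsBoolean();
            lastRunNanos = now;
            hasValue = true;
        }
        return lastResult;
    }
}
```

The synchronized method keeps concurrent probes from running the expensive check at the same time; a circuit breaker would go further and stop calling a dependency that keeps failing.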

Integration with Infrastructure

Register the startup, liveness, and readiness URLs with Kubernetes, AWS ALB, Consul, or Istio
Configure polling intervals and thresholds
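Thresholds usually mean that a single failed probe does not flip an instance to unhealthy, and a single success does not bring it back. The logic can be sketched as follows; ProbeThreshold and its parameter names are hypothetical and only loosely modeled on the failure and success thresholds that orchestrators and load balancers expose.

```java
class ProbeThreshold {
    private final int failureThreshold; // consecutive failures before marking unhealthy
    private final int successThreshold; // consecutive successes before marking healthy again
    private int failures;
    private int successes;
    private boolean healthy = true;

    ProbeThreshold(int failureThreshold, int successThreshold) {
        this.failureThreshold = failureThreshold;
        this.successThreshold = successThreshold;
    }

    // Records one probe result and returns the current verdict
    synchronized boolean record(boolean probeSucceeded) {
        if (probeSucceeded) {
            failures = 0;
            successes++;
            if (!healthy && successes >= successThreshold) {
                healthy = true;
            }
        } else {
            successes = 0;
            failures++;
            if (healthy && failures >= failureThreshold) {
                healthy = false;
            }
        }
        return healthy;
    }
}
```

With a polling interval of 10 seconds and a failure threshold of 3, for instance, an instance would be taken out of rotation roughly 30 seconds after it starts failing; the numbers are illustrative.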

Best Practices

Keep It Lean

Avoid overly broad, slow checks
Load balancers need quick binary decisions.

Automate & Monitor

Poll health endpoints frequently (e.g. every 30 seconds)
Set alerts on app status or check failure

Pitfalls to Avoid

Confusing with Ping− A simple ping says nothing about deeper dependencies.
Heavy Checks in Liveness− Slow or dependency-heavy liveness checks can time out and trigger unnecessary restarts.
Caching Responses− Health endpoints must reflect real-time state.
Insufficient Timeout− Health endpoint shouldn't hang on slow resources.
Unprotected Endpoints− Detailed health output exposes system internals; restrict access to it.
Unnamed Checks− Use descriptive names and timestamps in responses.
Polling Too Infrequently− Hourly checks may miss rapid failures.

Code Samples

Spring Boot + Actuator

In your Spring Boot application, add the following dependency to the pom.xml file−

<dependency>
   <groupId>org.springframework.boot</groupId>
   <artifactId>spring-boot-starter-actuator</artifactId>
   <version>3.5.3</version>
</dependency>

In your application.yml, add the following snippet−

management:
  endpoints:
    web:
      exposure:
        include: health,info
  health:
    db:
      enabled: true

After running the application, go to http://localhost:8080/actuator/health to see the health status of the application; http://localhost:8080/actuator lists the exposed endpoints.

Infrastructure Integration

Kubernetes

livenessProbe− /health/live restarts dead containers
readinessProbe− /health/ready gates traffic until healthy

Cloud Load Balancers & Service Meshes

Use health endpoints for routing decisions

API Gateways (e.g. APISIX)

Perform active and passive health checks.

Monitoring & Alerting

Tools like Prometheus can probe health endpoints (for example, through its Blackbox exporter)
Send alerts on status changes

Real World Patterns

Banking Scenario

The login, transfer, and billing microservices each expose health checks. If the transfer service fails, routing shifts away from it, alerts fire, and automatic recovery kicks in.

Container Ecosystem

Two-tier health-check strategy−

Liveness probe = fast ping
Readiness probe = full dependency checks.

Health Check in Observability

The Health Check API is part of a broader observability stack−

Logs
Distributed tracing
Metrics
Exception tracking

Ideally, health endpoints feed into dashboards, triggers, and alert systems to detect anomalies early.

When Health Check Isn't Enough

If your system relies on caching, message queues, bulk operations, or multi-step transactions, deeper observability is needed, such as distributed tracing, APM, and golden-path tests. Health checks still remain a crucial first line of defense.

Summary

Health Check API provides real-time insight into service availability.
Supports traffic routing, orchestration, and alerting.
Use separate liveness, readiness, and startup endpoints.
Ensure lightweight, fast, secure, and well-logged checks.
Avoid caching, overloading, and slow feedback.
Combine with broader observability tools for maximum resilience.

The Health Check API may appear simple, but it's foundational. The systems that consume it (load balancers, orchestrators, and alert platforms) depend on it to keep a microservice ecosystem autonomous and resilient. When done right, it significantly enhances reliability and maintainability.
