Background and Architectural Context
Micronaut's Design Principles
Micronaut emphasizes AOT bean processing, compile-time dependency injection, and configuration metadata generation. By avoiding runtime reflection, it achieves low memory usage and fast startup compared to frameworks that generate proxies and metadata at runtime. The framework runs atop Netty by default for HTTP I/O, supports both reactive and blocking programming models, and integrates with Micrometer, OpenTelemetry, and multiple data stacks (JDBC, R2DBC, JPA, NoSQL, messaging).
Implications for Troubleshooting
- Reflection avoidance: Many failures that manifest as "NoSuchMethod" or serialization errors stem from missing compile-time introspection or native-image reachability configuration.
- Netty event loops: Blocking a loop thread causes cascading latencies. Correctly marking blocking operations is critical.
- Configuration binding: Properties are resolved from layered sources (YAML, env, system props, k8s). Binding precedence mistakes yield environment-specific behavior.
- Observability: Minimal overhead means you must explicitly enable metrics, tracing, and debug logs to see what's happening.
Architecture Deep Dive
Bean Introspection and DI
Micronaut generates bean definitions and introspection metadata at compile time via annotation processors. If classes are not annotated or are left out of annotation processing, features like validation, serialization, or config binding can fail at runtime, sometimes only on specific code paths.
HTTP Runtime on Netty
Micronaut HTTP leverages Netty's non-blocking event loops for connection handling. CPU-intensive or blocking work must be shifted to appropriate executors, or annotated to signal blocking semantics, to prevent event-loop starvation and p99 latency spikes.
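The offload boundary described above can be sketched with plain `java.util.concurrent`, without Micronaut or Netty on the classpath. This is only an illustration: `OffloadSketch`, the thread names, and the 50 ms sleep are hypothetical stand-ins for an event loop, a blocking executor, and a blocking call.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for the event loop / blocking executor split.
final class OffloadSketch {
    // A single thread plays the role of one event loop.
    static final ExecutorService EVENT_LOOP =
            Executors.newSingleThreadExecutor(r -> daemon(r, "event-loop-0"));
    // A small fixed pool that is allowed to block.
    static final ExecutorService BLOCKING_POOL =
            Executors.newFixedThreadPool(4, r -> daemon(r, "blocking-worker"));

    static Thread daemon(Runnable r, String name) {
        Thread t = new Thread(r, name);
        t.setDaemon(true); // do not keep the JVM alive for the sketch
        return t;
    }

    // Simulated blocking call (JDBC, file I/O, cloud SDK).
    // Returns the name of the thread that executed it.
    static String blockingLookup() {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return Thread.currentThread().getName();
    }

    // The request is accepted on the loop thread, but the blocking part
    // is handed to the pool, so the loop is free for the next request.
    static CompletableFuture<String> handleRequest() {
        return CompletableFuture
                .supplyAsync(() -> "accepted", EVENT_LOOP)
                .thenApplyAsync(ignored -> blockingLookup(), BLOCKING_POOL);
    }
}
```

Calling `OffloadSketch.handleRequest().join()` returns the name of a pool thread rather than the loop thread, which is the property a thread dump should confirm in a real service.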
Native Image and AOT
Building with GraalVM Native Image removes the JVM JIT warmup tax and reduces memory, but it hardens classpath and reflection assumptions. Reachability metadata and substitutions become part of your troubleshooting toolkit.
Common Failure Modes and Root Causes
1. Latency Spikes and Throughput Collapse Under Load
- Cause: Blocking calls (JDBC, file I/O, cloud SDKs) executed on Netty event loops.
- Symptom: p95/p99 latency jumps and timeouts while CPU utilization stays low and worker pools sit idle, because requests queue behind blocked event-loop threads.
- Corollary: Bursts that coincide with GC pauses or missing backpressure are amplified, and request queues can grow without bound.
2. Intermittent 5xx Due to Mis-scoped Beans
- Cause: Stateful logic in @Singleton with mutable fields accessed concurrently.
- Symptom: Race conditions, sporadic NPEs, or cross-request contamination when load increases.
3. Serialization and Validation Failures
- Cause: Missing introspection for DTOs or use of Jackson features that rely on reflection without proper configuration.
- Symptom: "Cannot deserialize", "No serializer found", or Bean Validation not triggering for records/POJOs.
4. Native Image Build Breakages
- Cause: Reflection-dependent libraries, dynamic proxies, or resources not included in the image.
- Symptom: Build-time analysis errors, or runtime "ClassNotFound" in native binary but not on JVM.
5. Configuration Drift Across Environments
- Cause: Conflicting property sources (YAML vs env), wrong placeholder resolution, mistyped @ConfigurationProperties.
- Symptom: Feature works locally but fails in CI or Kubernetes; properties appear unset at runtime.
6. Connection Pool Exhaustion
- Cause: HTTP client or datasource pools undersized relative to concurrency; leaks due to unclosed responses or ResultSets.
- Symptom: Growing latency, 503 from upstreams, pool timeout exceptions, thread dumps showing threads blocked on borrow.
7. Reactive Backpressure Violations
- Cause: Mixing blocking repositories with reactive controllers without scheduling boundaries; unbounded Flux or Flowable.
- Symptom: Memory growth, GC thrash, sporadic RejectedExecutionException, or OOM during bursts.
Diagnostics and Verification
1. Enable Targeted Logging
# application.yml
logger:
  levels:
    io.micronaut.http.client: DEBUG
    io.micronaut.http.server: DEBUG
    io.netty: INFO
    io.micronaut.context.condition: TRACE  # bean conditions
    io.micronaut.inject: TRACE             # DI wiring
TRACE on context.condition and inject helps uncover bean replacement, conditional activation, and missing qualifiers causing ambiguous injection errors.
2. Inspect Thread Usage
# jcmd or jstack
jcmd $PID Thread.print
# Look for Netty event loop threads stuck in blocking I/O
# and executor pools starved or oversized
Correlate thread states with request logs and metrics to confirm event-loop blocking versus downstream slowness.
3. Activate Micrometer Metrics
# application.yml
micronaut:
  metrics:
    enabled: true
    export:
      prometheus:
        enabled: true
# Micronaut management endpoints are configured under the top-level "endpoints" key
endpoints:
  all:
    enabled: true
  prometheus:
    enabled: true
Scrape latency histograms, connection pool gauges, and executor queue depths. Watch http.server.requests p99 and Netty event loop utilization.
4. Verify Configuration Resolution
# application.yml
endpoints:
  env:
    enabled: true
    sensitive: false
Use the /env endpoint to inspect property sources and effective values. Expose it without authentication only in non-production environments and keep it secured everywhere else. Conflicts surface here immediately.
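To make the idea of layered sources concrete, here is a minimal sketch of precedence resolution in plain Java. It is a simplification, not Micronaut's actual algorithm: `LayeredConfig` is a hypothetical helper, and the key normalization (lower-casing and turning underscores into dots) covers only the simplest case of what the framework does with environment variables.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper illustrating "later source wins" resolution.
final class LayeredConfig {
    // Simplified normalization: CATALOG_CLIENTS_PRICE_URL -> catalog.clients.price.url
    static String normalizeEnvKey(String envKey) {
        return envKey.toLowerCase(Locale.ROOT).replace('_', '.');
    }

    // File properties are loaded first; environment variables override them.
    static Map<String, String> resolve(Map<String, String> fileProps,
                                       Map<String, String> env) {
        Map<String, String> effective = new TreeMap<>(fileProps);
        env.forEach((key, value) -> effective.put(normalizeEnvKey(key), value));
        return effective;
    }
}
```

With a file value of `http://localhost:8081` for `catalog.clients.price.url` and an environment variable `CATALOG_CLIENTS_PRICE_URL`, the resolved map holds the environment value, which is the kind of silent override the /env endpoint helps expose.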
5. Bean Introspection Checks
// UserDto.java
import io.micronaut.core.annotation.Introspected;
@Introspected
public record UserDto(String id, String email) {}
If using Micronaut Serialization instead of Jackson, ensure the module is on the classpath and classes are annotated or discovered. For Jackson, ensure modules (Java Time, JDK8) are loaded.
6. Native Image Dry-Run
# GraalVM native-image agent on JVM run
java -agentlib:native-image-agent=config-output-dir=build/native/agent -jar app.jar
# Exercise endpoints, then build native
native-image -H:ConfigurationFileDirectories=build/native/agent ...
The agent records reflection, proxy, and resource usage to avert analysis-time surprises.
Pitfalls and How to Recognize Them
Blocking Work on Event Loops
Symptoms resemble long GC pauses that coincide with traffic spikes, but the root cause is thread starvation. Netty loops must not wait on JDBC, cloud SDKs, or the filesystem.
Ambiguous or Missing Qualifiers
Multiple beans of the same type without @Named or a custom qualifier cause wiring to flip based on classpath order or conditional activation, leading to environment-only failures.
Improper Bean Scope
Placing mutable caches or per-request state in @Singleton creates race conditions. Conversely, excessive @Prototype increases GC churn and startup time.
Mixing Serialization Stacks
Switching between Jackson and Micronaut Serialization in different modules can break polymorphic handling or custom serializers. Choose one per service unless you clearly isolate data boundaries.
Configuration Property Drift
Typos in @ConfigurationProperties classes are silent by default if fields are nullable. Use @EachProperty and validation annotations to fail fast on bad config.
Step-by-Step Fixes
1. Prevent Event-Loop Starvation
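// Controller: mark blocking endpoints
import io.micronaut.http.HttpResponse;
import io.micronaut.http.annotation.Controller;
import io.micronaut.http.annotation.Get;
import io.micronaut.http.server.types.files.StreamedFile;
import io.micronaut.scheduling.TaskExecutors;
import io.micronaut.scheduling.annotation.ExecuteOn;
@Controller("/files")
public class FileController {
    @ExecuteOn(TaskExecutors.BLOCKING)
    @Get("/download/{id}")
    public HttpResponse<StreamedFile> download(String id) {
        // blocking I/O here is safe on the blocking pool
        // (building and returning the StreamedFile response is omitted)
    }
}
# application.yml
micronaut:
  executors:
    io:
      type: fixed
      nThreads: 64
    blocking:
      type: fixed
      nThreads: 64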
Default pools may be conservative for enterprise loads. Size blocking pools based on expected concurrency and downstream latencies. Audit all controllers and clients: anything that might block belongs off the event loop.
2. Stabilize Bean Scopes and Qualifiers
// Use explicit qualifiers
@Singleton
@Named("primaryPayment")
class PrimaryPaymentService implements PaymentService { ... }
@Singleton
@Named("fallbackPayment")
class FallbackPaymentService implements PaymentService { ... }
@Singleton
class Checkout {
    private final PaymentService svc;
    // Constructor injection selects the bean by qualifier
    Checkout(@Named("primaryPayment") PaymentService svc) {
        this.svc = svc;
    }
    ...
}
Make service lifetimes explicit and avoid mutable state in singletons unless guarded. Use @Prototype only for objects that truly need per-injection lifecycle.
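As an illustration of "guarded" mutable state, the sketch below shares one instance across many threads, which is what a singleton bean experiences under load. `RequestCounter` is a hypothetical class and uses only the JDK; `LongAdder` is one reasonable choice for a contended counter, not the only one.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical singleton-style service: one instance serves every request thread.
final class RequestCounter {
    // LongAdder tolerates concurrent increments; a plain long field would lose updates.
    private final LongAdder handled = new LongAdder();

    void recordRequest() {
        handled.increment();
    }

    long total() {
        return handled.sum();
    }

    // Hammers one shared instance from several threads and returns the final count.
    static long simulate(int threads, int requestsPerThread) {
        RequestCounter shared = new RequestCounter();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < requestsPerThread; i++) {
                    shared.recordRequest();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return shared.total();
    }
}
```

Replacing the `LongAdder` with an unsynchronized `long` field in the same harness is a quick way to reproduce lost updates locally before they show up as cross-request contamination in production.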
3. Fix Serialization and Validation
// DTOs with compile-time introspection
@Introspected
public class OrderRequest {
    @NotBlank String sku;
    @Min(1) int quantity;
    // getters/setters
}
# application.yml: pick one serialization stack
micronaut:
  serialization:
    jackson:
      enabled: true
# or
#   serialization:
#     json:
#       enabled: true
Enable Bean Validation integration and return 400 errors on invalid payloads to catch client issues early.
4. Right-Size HTTP Client and Datasource Pools
# application.yml
micronaut:
  http:
    services:
      payment:
        url: https://payments.internal
        pool:
          enabled: true
          max-connections: 200
datasources:
  default:
    url: jdbc:postgresql://db/service
    driverClassName: org.postgresql.Driver
    maximum-pool-size: 50
    minimum-idle: 10
    leak-detection-threshold: 60000
Use Micrometer to watch pool utilization; ensure clients are closed or reused. Leaks show up as steadily increasing in-use connections without corresponding throughput gains.
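The leak signature described above can be reproduced with a toy pool. The following is a sketch under stated assumptions: `BoundedPoolSketch` is a hypothetical class built on a JDK `Semaphore`, and it models only borrow, release, and the in-use gauge, not a real HTTP or JDBC pool.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical pool model: permits stand in for connections.
final class BoundedPoolSketch {
    private final int maxConnections;
    private final Semaphore permits;

    BoundedPoolSketch(int maxConnections) {
        this.maxConnections = maxConnections;
        this.permits = new Semaphore(maxConnections);
    }

    // Returns true if a connection was borrowed within the timeout.
    boolean borrow(long timeoutMs) {
        try {
            return permits.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Callers that forget this are the "leak": in-use never drops.
    void release() {
        permits.release();
    }

    // The gauge worth exporting: connections currently checked out.
    int inUse() {
        return maxConnections - permits.availablePermits();
    }
}
```

With a pool of two, two borrows without a release leave `inUse()` at 2 and make the third borrow time out, which mirrors the pool timeout exceptions and flat throughput seen during a real leak.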
5. Enforce Backpressure and Reactive Boundaries
// Controller returning reactive types
@Controller("/stream")
class StreamController {
    private final ReactiveService svc;
    StreamController(ReactiveService svc) {
        this.svc = svc;
    }
    @Get(produces = MediaType.APPLICATION_JSON_STREAM)
    Publisher<Event> events() {
        return Flux.from(svc.events())
                .limitRate(512)
                .onBackpressureBuffer(1024);
    }
}
When bridging blocking data stores to reactive controllers, use Schedulers.boundedElastic() or Micronaut executors to offload, and cap buffers to avoid unbounded memory growth.
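What "capping the buffer" means can be shown without a reactive library. The sketch below is an assumption-laden model, not Reactor's implementation: `BoundedBuffer` is a hypothetical class in which the producer is refused once capacity is reached and the consumer pulls at most `demand` items at a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;

// Hypothetical bounded buffer between a fast producer and a slower consumer.
final class BoundedBuffer<T> {
    private final ArrayBlockingQueue<T> queue;
    private long rejected;

    BoundedBuffer(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Producer side: never blocks and never grows past capacity.
    synchronized boolean offer(T item) {
        boolean accepted = queue.offer(item);
        if (!accepted) {
            rejected++; // surface overflow instead of hiding it in heap growth
        }
        return accepted;
    }

    // Consumer side: take at most `demand` items, in arrival order.
    List<T> drain(int demand) {
        List<T> out = new ArrayList<>();
        queue.drainTo(out, demand);
        return out;
    }

    synchronized long rejected() {
        return rejected;
    }

    int size() {
        return queue.size();
    }
}
```

The design choice is to make overflow an explicit, countable event. Whether to reject, drop the oldest item, or slow the producer depends on the stream's semantics, but an unbounded queue defers that decision until the heap makes it for you.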
6. Make Configuration Binding Robust
// Strongly-typed config with validation
@EachProperty("catalog.clients")
@Introspected
public class ClientConfig {
    @NotBlank private String url;
    @Positive private int timeoutMs = 2000;
    // getters/setters
}
# application.yml
catalog:
  clients:
    price:
      url: https://price.internal
      timeout-ms: 1500
Use @EachProperty for maps of configs and add Bean Validation to fail fast in CI when properties are missing or malformed.
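The fail-fast behavior can be pictured with a plain Java record that validates itself on construction. This is a sketch of the principle only, not how Micronaut binds or validates configuration: `ClientSettings` and `fromProperties` are hypothetical names, and the 2000 ms default mirrors the example above.

```java
import java.util.Map;

// Hypothetical typed config: invalid values cannot exist as an instance.
record ClientSettings(String url, int timeoutMs) {
    ClientSettings {
        if (url == null || url.isBlank()) {
            throw new IllegalArgumentException("url must not be blank");
        }
        if (timeoutMs <= 0) {
            throw new IllegalArgumentException("timeoutMs must be positive");
        }
    }

    // Builds settings from flat properties under the given prefix,
    // falling back to the default timeout when the key is absent.
    static ClientSettings fromProperties(Map<String, String> props, String prefix) {
        String url = props.get(prefix + ".url");
        String timeout = props.getOrDefault(prefix + ".timeout-ms", "2000");
        return new ClientSettings(url, Integer.parseInt(timeout));
    }
}
```

A mistyped key such as `catalog.clients.price.ulr` leaves `url` unset and fails at construction time, so the error appears when the application boots in CI rather than on the first request in production.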
7. Harden Native Image Builds
# Run with agent to capture reflection usage
java -agentlib:native-image-agent=config-output-dir=build/ni -jar app.jar
# Exercise endpoints, then
native-image -H:ConfigurationFileDirectories=build/ni -jar app.jar
// resource-config.json example
{
  "resources": {
    "includes": [
      { "pattern": "application.*\\.yml" }
    ]
  }
}
Include JSON/YAML, SQL migrations, and service descriptors. For proxy-heavy libraries, add proxy-config.json, or prefer Micronaut-native clients that avoid dynamic proxies.
8. Guard External Calls with Resilience
// Retry + circuit breaker
@Singleton
class BillingClient {
    private final HttpClient client;
    BillingClient(@Client("billing") HttpClient client) {
        this.client = client;
    }
    // Micronaut's @CircuitBreaker is a variation of @Retryable, so a separate
    // @Retryable is not needed; attempts, delay, and reset are its members.
    @CircuitBreaker(attempts = "3", delay = "100ms", reset = "5s")
    public String charge(String id) {
        return client.toBlocking().retrieve(HttpRequest.POST("/charge", id));
    }
}
Instrument retries and breakers with Micrometer tags. Without guardrails, transient upstream failures propagate as mass timeouts and pool exhaustion.
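For readers who want to see what a breaker does underneath the annotation, here is a minimal state machine in plain Java. It is a teaching sketch with hypothetical names (`SimpleCircuitBreaker`, an injected clock, a consecutive-failure threshold) and does not reproduce Micronaut's interceptor or its exact opening rules.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical breaker: opens after N consecutive failures, probes after a reset delay.
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long resetMillis;
    private final LongSupplier clock; // injected so behavior is testable without sleeping
    private State state = State.CLOSED;
    private int consecutiveFailures;
    private long openedAt;

    SimpleCircuitBreaker(int failureThreshold, long resetMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.resetMillis = resetMillis;
        this.clock = clock;
    }

    // An open breaker becomes half-open once the reset delay has elapsed.
    synchronized State state() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= resetMillis) {
            state = State.HALF_OPEN;
        }
        return state;
    }

    // Simplification: the lock is held during the call, which serializes callers.
    synchronized <T> T call(Supplier<T> action, T fallback) {
        if (state() == State.OPEN) {
            return fallback; // fail fast, do not touch the upstream
        }
        try {
            T result = action.get();
            consecutiveFailures = 0;
            state = State.CLOSED;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;
                openedAt = clock.getAsLong();
            }
            return fallback;
        }
    }
}
```

The important property is that an open breaker returns immediately without invoking the upstream call, which is what keeps a failing dependency from consuming pool connections and threads.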
9. Observability: Tracing and Correlation
# application.yml (OpenTelemetry)
otel:
  traces:
    exporter: otlp
    sampler: parentbased_always_on
micronaut:
  tracing:
    enabled: true
Propagate trace context across HTTP clients and messaging to stitch cross-service timelines. Correlate slow spans with executor pools and DB queries.
10. Graceful Shutdown and Draining
# application.yml
micronaut:
  server:
    shutdown:
      graceful: true
      quiet-period: 5s
      timeout: 30s
Allow in-flight requests to complete and downstream pools to release resources, preventing connection resets during rolling updates.
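The same drain-then-stop sequence applies to any executor the application owns. The sketch below uses only the JDK; `DrainSketch` is a hypothetical helper and the timeout is whatever budget the rollout allows.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: stop intake, wait for in-flight work, then force-stop.
final class DrainSketch {
    // Returns true if all in-flight tasks finished within the timeout.
    static boolean drain(ExecutorService pool, long timeoutMs) {
        pool.shutdown(); // reject new tasks, keep running the accepted ones
        try {
            if (pool.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS)) {
                return true;
            }
            pool.shutdownNow(); // budget exhausted: interrupt what is left
            return false;
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A `false` return is worth logging and alerting on, since it means requests were cut off and the shutdown budget is shorter than the slowest in-flight work.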
Advanced Diagnostics Playbook
Thread Dump Forensics
Capture three thread dumps at 5-second intervals during an incident. If event-loop threads sit in the same blocking frame across dumps (blocking reads under sun.nio.ch or JDBC driver calls, as opposed to the idle selector wait), you've located a blocking boundary violation. If blocking pools are saturated while loops are idle, examine queue sizes and offload boundaries.
Heap and Allocation Profiling
Use JFR to profile allocation pressure. Spikes around serialization indicate DTO copying or a misconfigured JSON parser. Reuse buffers where safe, and prefer Micronaut Serialization for lower overhead when you control both ends.
HTTP Wire Logs
Enable DEBUG on io.micronaut.http.client to verify connection reuse and redirect handling. If every request establishes a new TLS session, tune the client pool and SSL session cache.
Database Slow Query Hunting
Correlate http.server.requests p99 with JDBC metrics and database slow query logs. If time is spent server-side, add indexes or reduce N+1 with batch fetch. If the client waits on the pool, raise maximum-pool-size temporarily and watch saturation; a small increase can remove head-of-line blocking.
Performance Optimizations and Patterns
Cold Start Minimization
- Trim classpath: exclude unused starters; each adds beans and configuration evaluation cost.
- Eagerly initialize only what's needed: avoid premature @Context beans that trigger heavy clients at startup.
- For serverless, consider native image; for JVM, enable tiered compilation and CDS archives.
JSON Fast Path
- Prefer Micronaut Serialization for DTOs under your control; it uses compile-time codecs and reduces reflection.
- For Jackson, register Afterburner and JavaTime modules, and avoid polymorphic deserialization on hot paths.
Connection Economics
- Match HTTP client pool size to expected concurrency and upstream limits; avoid "thundering herd" retries.
- Use keep-alive and tune read/write timeouts per upstream latency distributions.
Netty and Executors
- Keep event loops small (per-core) and predictable; move variability to worker pools.
- Bound queues on blocking pools; measure queue time as a first-class SLI.
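Bounding the queue and measuring queue time can both be done with a stock `ThreadPoolExecutor`. The following is a sketch, assuming a hypothetical wrapper named `MeasuredPool`. In a real service the wait would be recorded in a Micrometer timer rather than a single max value.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical wrapper: fixed threads, bounded queue, queue-time tracking.
final class MeasuredPool {
    private final ThreadPoolExecutor executor;
    private final AtomicLong maxQueueNanos = new AtomicLong();
    private final AtomicLong completed = new AtomicLong();

    MeasuredPool(int threads, int queueCapacity) {
        this.executor = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.AbortPolicy()); // reject instead of queueing forever
    }

    // Returns false when all threads are busy and the queue is full.
    boolean trySubmit(Runnable task) {
        long enqueuedAt = System.nanoTime();
        try {
            executor.execute(() -> {
                // Time between submission and the moment a worker picked the task up.
                maxQueueNanos.accumulateAndGet(System.nanoTime() - enqueuedAt, Math::max);
                try {
                    task.run();
                } finally {
                    completed.incrementAndGet();
                }
            });
            return true;
        } catch (RejectedExecutionException e) {
            return false;
        }
    }

    long maxQueueNanos() {
        return maxQueueNanos.get();
    }

    long completed() {
        return completed.get();
    }

    // Stops intake and waits for accepted tasks; true if they finished in time.
    boolean shutdownAndWait(long timeoutMs) {
        executor.shutdown();
        try {
            return executor.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With one thread and a queue of one, a third submission is rejected while the first task is still running. That rejection is the signal to shed load or resize, and the recorded queue time shows how close the pool was to that point beforehand.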
Data Access
- For JPA, prefer fetch joins for aggregates; mark read-only transactions to skip flush costs.
- For R2DBC, ensure schedulers are used when bridging to blocking libraries; avoid implicit blocking at drivers.
Long-Term Fixes and Governance
Configuration Policy
Enforce typed config classes with validation, document precedence, and prohibit raw string access in application code. Add contract tests that boot the app with stage-like property sets to catch drift early.
Module Boundaries
Define clear interfaces between web, domain, data, and integration modules. Keep serialization models at the edge and domain models internal to prevent accidental coupling and costly DTO churn.
Operational SLOs
Track p95/p99 latencies, error rates, and pool utilizations. Tie release gates to SLO guardrails; if regressions exceed error budgets, auto-create rollback and incident tickets.
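A release gate of this kind reduces to a percentile comparison. The sketch below is one possible formulation, not a prescribed policy: `LatencyGate` is a hypothetical class, it uses the nearest-rank percentile definition, and the allowed regression ratio is a parameter you would set from your error budget.

```java
import java.util.Arrays;

// Hypothetical release gate comparing candidate p99 against a baseline.
final class LatencyGate {
    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static long percentile(long[] samplesMs, double p) {
        if (samplesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    // Passes when candidate p99 is within the allowed regression over the baseline,
    // for example 0.10 for a 10 percent budget.
    static boolean passes(long[] baselineMs, long[] candidateMs, double maxRegressionRatio) {
        double limit = percentile(baselineMs, 99) * (1.0 + maxRegressionRatio);
        return percentile(candidateMs, 99) <= limit;
    }
}
```

In practice the samples would come from the metrics backend per endpoint, and the gate would run against a stage deployment before promotion.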
Concrete Troubleshooting Scenarios
Scenario A: Spiky Latency After a Library Upgrade
Symptom: p99 doubled; CPU low; GC normal. Hypothesis: a new SDK added blocking DNS or file I/O on event loops. Action: enable DEBUG on HTTP client and Netty; capture thread dumps; annotate controllers with @ExecuteOn; move SDK calls to the blocking executor; verify via profiler.
Scenario B: Native Image Works Locally, Fails in CI
Symptom: "UnsupportedFeatureError" in CI binary only. Hypothesis: missing resource or reflection config not captured locally. Action: run agent in CI smoke tests; ensure tests hit all endpoints; collect configs into the build; parameterize image build to include resource-config.json, reflect-config.json, proxy-config.json.
Scenario C: Intermittent 503 from Upstream
Symptom: bursts of 503 with coincident pool exhaustion. Hypothesis: retries amplify load; client pool too small. Action: cap retries with jitter, increase max-connections, add circuit breaker; instrument success vs retry rate; coordinate limits with upstream team.
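"Cap retries with jitter" can be made concrete with a small backoff helper. This is a sketch using the common full-jitter scheme, which is an assumption here rather than something the scenario prescribes. `BackoffSketch`, the injected `sleeper`, and the base and cap values are all hypothetical.

```java
import java.util.Random;
import java.util.function.LongConsumer;
import java.util.function.Supplier;

// Hypothetical retry helper with capped exponential backoff and full jitter.
final class BackoffSketch {
    // Uniform random delay in [0, min(capMs, baseMs * 2^attempt)].
    static long delayMs(int attempt, long baseMs, long capMs, Random rng) {
        long exponential = baseMs << Math.min(attempt, 20); // clamp the shift to avoid overflow
        long ceiling = Math.min(capMs, exponential);
        return (long) (rng.nextDouble() * (ceiling + 1));
    }

    // Runs the action up to maxAttempts times, waiting between failures.
    // The sleeper is injected so callers decide how to wait (and tests need not sleep).
    static <T> T retry(Supplier<T> action, int maxAttempts, long baseMs, long capMs,
                       Random rng, LongConsumer sleeper) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxAttempts - 1) {
                    sleeper.accept(delayMs(attempt, baseMs, capMs, rng));
                }
            }
        }
        throw last; // attempts exhausted: surface the final failure
    }
}
```

Randomizing the delay spreads retries from many clients over time, so a brief upstream outage is not followed by a synchronized wave of retries hitting the same pool.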
Best Practices Checklist
- Audit every controller for blocking behavior; use @ExecuteOn and size executors prudently.
- Adopt one serialization stack per service and add @Introspected to DTOs.
- Validate configuration with @EachProperty + Bean Validation; test precedence in CI.
- Instrument Micrometer metrics and OpenTelemetry tracing; budget overhead explicitly.
- Right-size HTTP and DB pools; monitor saturation, queue time, and error codes.
- Automate native-image agent runs to update configs continuously as dependencies change.
- Document bean scopes and qualifiers; avoid mutable state in singletons.
Conclusion
Micronaut's strengths (AOT DI, low overhead, and Netty-based I/O) enable exceptional performance at scale, but they demand architectural discipline. Most production incidents trace back to a handful of patterns: blocking event-loop code, fragile configuration, ambiguous bean wiring, and insufficient observability. By adopting explicit offload boundaries, typed and validated configuration, consistent serialization, and robust metrics and tracing, you can keep p99 latencies within target and resource usage predictable even under bursty workloads. Combine these technical fixes with governance (SLOs, config policies, and module boundaries) to keep large Micronaut estates resilient as teams and features grow.
FAQs
1. How do I choose between Jackson and Micronaut Serialization?
Prefer Micronaut Serialization when you control both producer and consumer and want minimal overhead; it leverages compile-time codecs. Choose Jackson for ecosystem breadth (polymorphism, modules) but budget for reflection and configuration complexity.
2. What's the safest way to integrate blocking JDBC with reactive controllers?
Keep controllers reactive for streaming but offload blocking DB calls to a bounded executor using @ExecuteOn or reactive bridges. Cap buffers and apply backpressure operators to prevent memory growth under burst traffic.
3. Why does my configuration work locally but not in Kubernetes?
Environment variables override YAML keys and may change casing or separators. Inspect the /env endpoint (secured) to see precedence and ensure your @ConfigurationProperties classes match the resolved names exactly.
4. How can I reduce native-image surprises after dependency upgrades?
Automate a CI job that runs the native-image agent during smoke tests and commits updated configs. Prefer libraries with published reachability metadata and avoid dynamic proxies where Micronaut clients exist.
5. What metrics should gate a Micronaut release?
Track p95/p99 per endpoint, error rate, client and datasource pool saturation, executor queue time, and GC pause time. Block releases if any exceed budgeted thresholds compared to the previous baseline.