Background: Why Clojure Troubleshooting Feels Different
Clojure sits at the intersection of dynamic development and the JVM's deterministic runtime. Runtime dynamism—reloading code, macro expansion, late binding—helps teams ship faster, but it also shifts many failures from compile-time to run-time. Troubleshooting must therefore combine JVM profiling with Clojure-aware techniques: inspecting Vars, analyzing lazy sequences, surfacing reflection, and auditing concurrency primitives for semantic, not just mechanical, correctness.
Enterprise Contexts Where Issues Emerge
- High-throughput services using Ring/Jetty or Pedestal with JSON/Transit payloads.
- Data pipelines using core.async, Manifold, or Onyx with Kafka and JDBC backends.
- Event-driven systems with transducers and streaming joins.
- Low-latency components in trading or recommendations using primitive math and custom interop.
Architectural Implications
The JVM Substrate
Clojure compiles to JVM bytecode. The garbage collector, JIT compilation, and classloader behavior govern latency, memory, and warm-up. A misunderstanding of JVM ergonomics (e.g., container memory limits or GC tuning) often masquerades as "Clojure is slow" when the root cause is the runtime.
Immutability and Persistent Data Structures
Clojure's persistent vectors, maps, and sets offer structural sharing. They reduce copying costs but can inadvertently retain large object graphs if long-lived references to old versions are kept. Retention rather than allocation becomes the primary leak vector.
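The retention pattern can be sketched in a few lines (names hypothetical): any long-lived reference to superseded versions pins the object graphs they share.

```clojure
;; Hypothetical sketch: every superseded version stays reachable through
;; `history`, so the structure shared by old maps can never be collected.
(def history (atom []))

(defn transition! [state f]
  (swap! history conj state) ; retention: old versions accumulate forever
  (f state))

;; Bound the lifetime instead, e.g. keep only the most recent versions:
(defn bounded-transition! [state f]
  (swap! history #(vec (take-last 100 (conj % state))))
  (f state))
```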
Lazy Evaluation
Sequences are lazy by default. Laziness saves work but can defer I/O and exceptions, amplify memory retention, and cause head-of-line blocking if not realized in controlled scopes. Troubleshooting requires pinpointing where evaluation actually occurs.
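A REPL-sized illustration of deferred failure, assuming nothing else realizes the sequence: the exception surfaces at realization, not at definition.

```clojure
;; Defining the seq throws nothing: map is lazy, so no division runs here.
(def ratios (map #(/ 100 %) [4 2 0]))

;; Only realization triggers the divide-by-zero, possibly far away in
;; time and stack from this definition:
;; (doall ratios) ; throws ArithmeticException
```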
Interop and Reflection
Dynamic calls to Java without type hints trigger reflection, which slows hot paths and obscures stack traces. In tight loops this becomes catastrophic, especially with boxing/unboxing of primitives.
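For instance, a method call on an unhinted argument compiles to a reflective lookup; one type hint restores a direct call (illustrative snippet):

```clojure
(set! *warn-on-reflection* true)

;; Emits a reflection warning: the compiler cannot resolve .length
;; without knowing the type of s.
(defn len-slow [s]
  (.length s))

;; Direct virtual call, no reflection.
(defn len-fast [^String s]
  (.length s))
```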
Builds, Classpath, and AOT
Leiningen or tools.deps manage dependencies, but transitive conflicts, AOT mismatches, and shaded artifacts can break prod-only paths. Overzealous AOT inflates images and introduces classloader surprises in REPL-based deployments.
Diagnostics: A Systematic Workflow
1) Reproduce the Symptom and Bound the Blast Radius
Capture a minimal repro or failing request signature (headers, body size, downstream calls). Confirm whether the regression is tied to data shape, traffic volume, or environment (JDK version, GC, kernel).
2) JVM Health Checks First
- Thread dumps: jstack to spot deadlocks, blocked threads, or runaway pools.
- Heap/GC: jcmd GC.heap_info, jstat -gcutil, or Flight Recorder to observe allocation/retention.
- Async-profiler/clj-async-profiler: flame graphs for CPU/alloc hotspots.
jcmd <PID> Thread.print
jcmd <PID> GC.heap_info
jcmd <PID> JFR.start name=clj-record settings=profile filename=/tmp/rec.jfr duration=120s
# then analyze in JMC
3) Turn on Clojure-Specific Signals
- Reflection warnings globally:
(set! *warn-on-reflection* true)
;; or, when AOT-compiling through clojure.lang.Compile:
;; -Dclojure.compile.warn-on-reflection=true
- Spec instrumentation in non-prod to catch shape errors early.
- Enable structured logs around lazily-computed pipelines to see evaluation boundaries.
4) Confirm Classpath and Dependency State
Dump the classpath and dependency graph. Look for duplicate jars, mixed Scala/Guava versions, or different Jackson modules:
# tools.deps
clj -Stree
# Leiningen
lein deps :tree
5) Observe the REPL/Reloading Lifecycle
Long-lived processes that hot-reload can accumulate stale singletons, closed executors, or swapped-out Vars held by captured closures. Force a clean restart to differentiate reload bugs from logic errors.
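One frequent reload trap: a closure that captured a function value keeps the pre-reload version alive, while var indirection with #' resolves the current definition on every call (handler is assumed to be an existing request handler).

```clojure
;; Captures the value of handler at definition time; redefining handler
;; during a reload does not update this map.
(def routes-stale {:get handler})

;; Captures the Var itself; invoking it dereferences the current
;; binding, so reloads take effect immediately.
(def routes-live {:get #'handler})
```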
Common Symptoms Mapped to Root Causes
High CPU Under Load with "mostly idle" business logic
- Reflection in hot loops.
- Boxed arithmetic with primitive-heavy workloads.
- Inefficient seq ops (e.g., concat inside a loop) rather than transducers or into.
Growing Heap / GC Thrash
- Retained lazy seq heads (unrealized tails referencing large inputs).
- Unbounded caches/memoization maps without eviction.
- Leaky JDBC result sets or streams not closed.
Intermittent Timeouts or "500" Spikes
- core.async go blocks calling blocking I/O.
- Executor starvation due to blocking ops on a small pool.
- Downstream backpressure ignored; retries amplifying load.
Prod-only ClassNotFound or MethodNotFound
- AOT mismatch across modules or stale compiled classes on classpath.
- Shaded dependency conflicts, especially JSON/Jackson, logging, or HTTP clients.
Step-by-Step Fixes
Eliminate Reflection and Boxing
Add precise type hints and use primitive ops in tight loops. Inspect compilation with *warn-on-reflection* enabled and benchmark again.
(set! *warn-on-reflection* true)

(defn dot ^long [^longs a ^longs b]
  (let [len (alength a)]
    (loop [i 0 acc 0]
      (if (< i len)
        (recur (inc i)
               (unchecked-add acc (unchecked-multiply (aget a i) (aget b i))))
        acc))))
Prefer unchecked-* ops where overflow semantics are acceptable. For string building on hot paths, use StringBuilder via interop.
Control Laziness and Prevent Retention
Realize sequences at clear boundaries; never hold onto the head when the tail references large inputs.
;; WRONG: returns lazy seq; upstream collection is retained
(defn lines-from [rdr]
  (line-seq rdr))

;; RIGHT: realize within scope, then close
(defn slurp-lines [f]
  (with-open [r (clojure.java.io/reader f)]
    (doall (line-seq r))))

;; Force realization when side effects are the point
(dorun (map println big-seq))
When building pipelines, prefer transducers to avoid intermediate collections.
(def xform (comp (filter even?) (map #(* % %))))
(transduce xform + (range 1000000))
core.async: Prevent Deadlocks and Starvation
go blocks are for non-blocking ops. Use thread for blocking I/O, or wrap blocking work in dedicated executors.
(require '[clojure.core.async :as a])

;; WRONG: blocking call inside go
(a/go (slurp "http://..."))

;; RIGHT: offload blocking work to a real thread
(a/<!! (a/thread (slurp "http://...")))
Make channel buffers explicit and bounded; layer backpressure with timeouts and alts.
JDBC/next.jdbc: Close Everything
Connection leaks devastate pools. Prefer next.jdbc, scope connections with with-open, and keep result sets narrow.
(require '[next.jdbc :as jdbc])

(def ds (jdbc/get-datasource {:dbtype "postgres" :dbname "app"}))

(with-open [conn (jdbc/get-connection ds)]
  (jdbc/execute! conn ["select id from t where ts > ?" cutoff]))
Tune pool sizes for your concurrency model; match thread pools to HikariCP max size to prevent thundering herds and contention.
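As a sketch (sizing and option names illustrative), next.jdbc can construct a HikariCP pool directly via next.jdbc.connection/->pool:

```clojure
(require '[next.jdbc.connection :as connection])
(import 'com.zaxxer.hikari.HikariDataSource)

;; Illustrative sizing: align maximumPoolSize with the worker pool that
;; actually issues queries, not with peak request concurrency.
(def ^HikariDataSource pooled-ds
  (connection/->pool HikariDataSource
                     {:dbtype "postgres"
                      :dbname "app"
                      :maximumPoolSize 10}))
```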
Logging: Structured and Bounded
Adopt structured logging (JSON) with request correlation IDs. Avoid logging full payloads in hot paths or secrets. For bursty logs, enable asynchronous appenders with bounded queues.
Dependency Hygiene and Classpath Control
Pin versions, use pedantic mode, and minimize duplicate jars.
;; deps.edn
{:deps {org.clojure/clojure {:mvn/version "1.11.1"}}
 :mvn/repos {"central" {:url "https://repo1.maven.org/maven2/"}}
 :aliases {:prod {:jvm-opts ["-Dclojure.compile.warn-on-reflection=true"]}}}

;; Leiningen equivalent of "pedantic mode" (project.clj):
;; :pedantic? :abort
When conflicts arise, use :override-deps or exclusions, then validate with integration tests.
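For example, an alias can force a single Jackson version across the tree (coordinates and version illustrative); re-check with clj -Stree afterwards:

```clojure
;; deps.edn alias (version illustrative)
{:aliases
 {:fix-jackson
  {:override-deps
   {com.fasterxml.jackson.core/jackson-databind {:mvn/version "2.15.2"}}}}}
```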
AOT Only Where Needed
Compile just entrypoints (the namespace holding -main) to stabilize startup without freezing dynamic loading everywhere.
;; Leiningen (project.clj)
:aot [my.app.core]

;; tools.build snippet
(compile-clj {:basis basis :ns-compile '[my.app.core]})
Container and GC Tuning
Right-size the JVM in Kubernetes. Use container-aware flags and modern GCs for better tail latencies.
JAVA_TOOL_OPTIONS="-XX:+UseG1GC -XX:MaxGCPauseMillis=200 \
  -XX:+UseStringDeduplication \
  -XX:+UseContainerSupport \
  -XX:MaxRAMPercentage=70.0 -XX:InitialRAMPercentage=70.0"
For ultra-low-latency services on recent JDKs, consider Shenandoah or ZGC. Validate with realistic load, not microbenchmarks.
REPL/Redeploy Hygiene
Hot-reload safely by isolating states behind protocols and components (e.g., Integrant, Component). Provide start/stop hooks; on reload, rewire dependencies rather than mutating singletons in-place.
(defprotocol ServerLife
  (start! [this])
  (stop! [this]))

(defrecord JettyServer [inst port]
  ServerLife
  (start! [this]
    (assoc this :inst (jetty/run-jetty handler {:port port :join? false})))
  (stop! [this]
    (some-> inst .stop)))
Safeguard "eval" and EDN Reading
Never feed untrusted input to read-string or eval. Use clojure.edn/read-string with readers disabled, and sanitize payloads.
(require '[clojure.edn :as edn])

(edn/read-string {:readers {}
                  :default (fn [_tag _value] nil)}
                 safe-edn)
Time and Scheduling
Prefer java.time interop. Freeze "now" in tests via dependency injection to avoid flakiness. In prod, set the container TZ explicitly and avoid JVM default changes between images.
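Freezing "now" with an injected java.time.Clock looks like this (names illustrative):

```clojure
(import '(java.time Clock Instant ZoneOffset))

;; Time-dependent logic takes the clock as an argument.
(defn expired? [^Clock clock ^Instant deadline]
  (.isAfter (Instant/now clock) deadline))

;; Production passes (Clock/systemUTC); tests pass a fixed clock,
;; making assertions deterministic.
(def test-clock
  (Clock/fixed (Instant/parse "2024-01-01T00:00:00Z") ZoneOffset/UTC))
```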
Performance Playbook
Data Transformations
Use transducers for streaming pipelines to avoid intermediate allocations. For hash-heavy workloads, upgrade to Clojure versions with map performance improvements and consider array-map for very small maps in hot paths.
HTTP and JSON
Favor non-blocking clients where end-to-end is async; otherwise, bound the worker pool. Use JSON libraries that stream (Jackson) and avoid keywordizing arbitrary keys to limit interning.
Parallelism
Reducers and pmap can help on CPU-bound tasks, but measure: task granularity and GC costs often dominate. Tie pool sizes to CPU cores and downstream capacity.
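Because pmap parallelizes per element, coarsening granularity with partition-all often pays for itself (a sketch; chunk size is workload-dependent):

```clojure
;; Per-element pmap on tiny tasks drowns in coordination overhead.
;; Chunking first yields fewer, larger tasks that amortize it.
(defn parallel-sum [xs]
  (->> (partition-all 10000 xs)
       (pmap #(reduce + %))
       (reduce +)))
```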
Caching
Use core.cache or Caffeine via interop with explicit TTL and max-size. Guard caches with circuit breakers to avoid stampedes on expiry.
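A bounded TTL cache with core.cache's wrapped API might look like this (TTL and load-user-from-db are illustrative):

```clojure
(require '[clojure.core.cache.wrapped :as cache])

;; Entries expire after 60 s; lookup-or-miss computes each key at most
;; once per expiry window, unlike memoize, which grows without bound.
(def user-cache (cache/ttl-cache-factory {} :ttl 60000))

(defn fetch-user [id]
  (cache/lookup-or-miss user-cache id load-user-from-db))
```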
Observability Patterns
Metrics
Expose RED/USE metrics per route and dependency: request rate, errors, latency percentiles, thread/queue utilization. Tag by tenant and feature flag.
Tracing
Propagate trace IDs through async boundaries (core.async channels, futures). Wrap entry/exit with tracing spans; export to your collector.
Logging
Log decision points, not just failures. For lazy pipelines, log sizes at boundaries to prevent silent bloat.
Pitfalls and Anti-Patterns
- Mixing go blocks with blocking I/O; leads to starvation.
- Leaning on global mutable state; reload and tests become flaky.
- Unbounded memoize causing heap growth.
- Over-AOT'ing, breaking dynamic loading and hot-swap.
- Ignoring reflection warnings; death by a thousand micro-stalls.
- Overusing macros where a function suffices; harder stack traces and refactors.
End-to-End Runbook: Latency Regression After a Minor Release
1) Snapshot Baselines
Compare p50/p95/p99, GC pauses, and CPU before/after. Confirm no infra drift (JDK image, container limits).
2) Enable Reflection Warnings and Rebuild
Scan build logs for hot-path namespaces. Add type hints and primitive ops. Re-deploy; re-measure.
3) Heap/Alloc Profiling
Use an async-profiler allocation flame graph to find hotspots. Replace concat with into or transducers; stop building intermediate vectors in loops.
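The concat-in-a-loop anti-pattern versus an eager rewrite (expand is a hypothetical per-group expansion function):

```clojure
;; BAD: reduce over concat builds a deep tower of lazy thunks; realizing
;; it allocates heavily and can even overflow the stack.
(defn gather-slow [groups]
  (reduce (fn [acc g] (concat acc (expand g))) [] groups))

;; GOOD: into with a mapcat transducer performs one eager pass.
(defn gather-fast [groups]
  (into [] (mapcat expand) groups))
```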
4) Check Dependencies
Run clj -Stree. If a JSON or HTTP client upgraded, verify its defaults (timeouts, buffer sizes). Pin back or tune.
5) Verify Thread Pools
Look for starvation: queued tasks and saturated pools. Decouple CPU-bound from I/O-bound pools; set explicit sizes.
6) Roll Out Canary
Use per-endpoint feature flags and traffic splitting to validate improvements safely.
Security and Compliance Considerations
- Never expose nREPL on untrusted networks; gate behind SSH or remove in prod.
- Scrub PII from logs; adopt schema-based redaction.
- Lock dependency versions; review transitive licenses for audits.
Sample Diagnostics Snippets
Detecting Reflection at Build Time
clj -M -e "(binding [*warn-on-reflection* true] (compile 'my.app.core))"
Finding Retained Lazy Heads
;; BAD: returns a lazy seq captured by a long-lived var
(defn pipeline [xs]
  (map expensive xs))

;; GOOD: realize and bound the lifetime
(defn pipeline! [xs]
  (into [] (map expensive) xs))
Backpressure with core.async
(def in (a/chan 100))
(def out (a/chan 100))

(a/pipeline-blocking 8 out (map enrich) in)

;; timeouts to prevent infinite waits
(a/alt!!
  out ([v] v)
  (a/timeout 5000) :timeout)
Long-Term Solutions and Governance
Performance Budgets and Contracts
Define budgets for heap, allocation rate, and tail latency per service. Enforce in CI with smoke-load tests and fail builds that regress beyond thresholds.
Dependency and Build Governance
Centralize approved library versions; run pedantic checks in CI. For polyglot repos, standardize JDK version and GC flags to avoid environment drift.
Operational Playbooks
Codify "jstack-jfr-gc" triage steps, common classpath conflicts, and "known bad" dependency combos. Put examples in runbooks with copy-paste commands.
Training and Code Review Checklists
- Reflection audit for new modules.
- Lazy-to-strict boundaries documented.
- core.async usage reviewed for blocking calls.
- Resource handling: with-open everywhere.
Conclusion
Clojure's power lies in simplicity and excellent leverage of the JVM, but those same strengths create unique failure modes in enterprise-scale systems. Effective troubleshooting blends JVM-level profiling with Clojure-specific practices: eliminating reflection, bounding laziness, taming async semantics, and governing builds. The fastest path to reliability is institutionalizing these patterns—instrumentation-first design, dependency hygiene, and disciplined concurrency—so incidents become rare, brief, and easy to root-cause.
FAQs
1. How do I know if reflection is hurting performance?
Turn on *warn-on-reflection* in builds and profile with async-profiler. If warnings coincide with hot stack frames and allocation spikes, add type hints and primitives; re-measure to confirm wins.
2. What's the safest way to handle lazy sequences in services?
Realize at module boundaries, return realized collections to callers, and never store lazy heads in long-lived state. Prefer transducers for streaming transforms to avoid intermediate structures.
4. When should I use go versus thread in core.async?
Use go for non-blocking channel ops and pure computation that yields often; use thread (or a dedicated executor) for blocking I/O. Making blocking calls inside go starves the dispatch pool.
4. Should I AOT compile my entire app?
No. AOT only entry namespaces to improve startup and packaging. Over-AOT'ing reduces dynamism, complicates classloading, and increases the risk of prod-only linkage errors.
5. How do I prevent memory leaks with immutable data?
Leaks are usually "retention leaks": holding references to old versions or lazy seq heads. Bound lifetimes, clear caches with size/TTL, avoid global state, and ensure I/O resources are closed promptly.