Background: Why Rake's Incremental Model Breaks at Scale
How Rake Decides What to Rebuild
Rake's core heuristic is simple: a file task is 'needed' when the output's modification time (mtime) is older than any prerequisite. In code, this is surfaced via the task's timestamp and needed? checks. When the environment is a single machine with a coherent filesystem, this is reliable. At scale, however, distributed CI runners, container mounts, and networked filesystems can distort mtimes in subtle ways.
- Timestamp precision: Some filesystems round to 1s (or worse). Fast build steps may write inputs and outputs within the same second, making mtimes equal, so outputs are treated as up-to-date when they are not.
- Clock skew: If writer and reader nodes disagree on time, an artifact produced on Node A can appear older than its prerequisites on Node B, triggering infinite rebuilds or, inversely, skipped rebuilds.
- Copy semantics: Tools that preserve source mtimes (e.g., cp -p, certain archive extractors) can yield outputs with earlier mtimes than their inputs.
- Overlay and remote filesystems: Docker overlay filesystems and NFS/SMB can delay mtime updates or reorder visibility under concurrency, confusing Rake's dependency checks.
Why This Matters to Architects
In large mono-repos and microservice fleets, small errors in incremental rebuild logic snowball into cost and reliability problems. CI pipelines that oscillate between miss and hit on caches waste compute. Release candidates built from stale artifacts breach compliance and reproducibility guarantees. The net effect is longer feedback loops, flaky builds, and hidden operational risk.
Architecture: Where Rake Meets Distributed Reality
Pipeline Topologies That Amplify the Problem
- Ephemeral CI runners: Each job starts on a fresh VM/container. Restored caches bring files with historical mtimes; if the runner's clock is ahead/behind, comparisons misfire.
- Remote caches and artifact stores: Artifact downloads may preserve original mtimes or apply current time inconsistently, causing false positives or negatives in incremental checks.
- Polyglot builds: Rake orchestrates tools in Rust/Go/Node/Java. Some toolchains write temp files then rename (atomic write; fresh mtime). Others preserve input mtimes for reproducibility, defying Rake's expectations.
- Container bind mounts: Host↔container mount options can change timestamp update behavior, especially under high concurrency.
The Subtle Role of 'Equal mtimes'
On many filesystems the write granularity is 1s. When prerequisite and output share the same mtime to the second, a naive < comparison treats the output as fresh. Rake historically treats equal timestamps as 'not needed'. This is correct for slow builds, but wrong for modern, sub-second steps.
Diagnostics: Proving You Have a Timestamp/Skew Problem
1) Instrument Rake's View of Time
Enable tracing and print task timestamps and prerequisites at decision time. This reveals whether Rake is skipping work due to equal or inverted mtimes.
# Rakefile snippet to trace timestamps
require 'rake'

module TraceNeeded
  # Print timestamps as-is: missing files yield Rake's sentinel timestamps,
  # which are not Time objects and do not respond to #utc.
  def needed?
    n = super
    if Rake.application.options.trace
      prereqs = prerequisite_tasks.map { |t| [t.name, t.timestamp] }
      $stderr.puts %(NEEDED?=#{n} TASK=#{name} TS=#{timestamp} PREREQS=#{prereqs.inspect})
    end
    n
  end
end

Rake::FileTask.prepend(TraceNeeded)
Run with:
rake --trace build:all
2) Inspect Filesystem Precision and Clock Skew
Compare host and container clocks, then check file mtime resolution and monotonicity.
# On each node or container
date -u
ruby -e 'f = ARGV.first; File.write(f, "x"); puts File.mtime(f).strftime("%F %T.%N")' tmp.touch
stat tmp.touch   # Check displayed precision
If mtimes end with .000000000 despite rapid writes, you have coarse granularity. If the clocks reported by two runners (compare date -u, or date +%s%3N for millisecond precision) disagree by more than a few milliseconds, clock skew is likely undermining correctness.
3) Reproduce Equal-mtime Skips
Create a fast file task whose output is written within the same second as its input.
# Rakefile
file 'out.txt' => ['in.txt'] do
  sh 'cp in.txt out.txt'   # output mirrors input so staleness is observable
end

task :demo do
  sh 'printf seed > in.txt'
  Rake::Task['out.txt'].invoke
  puts File.read('out.txt')        # "seed"
  sh 'printf seed2 > in.txt'       # modified within the same second
  Rake::Task['out.txt'].reenable
  Rake::Task['out.txt'].invoke
  puts File.read('out.txt')        # Might still print "seed" if mtimes are equal
end
If the second invoke prints the prior content, the equal-mtime case is confirmed.
4) Check Copy/Extract Semantics
Some CI restore steps preserve archival mtimes. Detect this by comparing content digests with mtimes.
shasum -a 256 artifacts/*.tar.gz
for f in artifacts/*; do
  echo "$f - $(stat -c %y "$f")"
done
5) Parallel Hazards
With -j, two tasks must not write the same target. Add guards that fail fast upon concurrent writes.
# Guard: fail if target is already being built
BUILD_LOCK = Mutex.new
BUILDING = {}

module NoDoubleBuild
  def invoke(*a)
    BUILD_LOCK.synchronize do
      raise "Concurrent build of #{name}" if BUILDING[name]
      BUILDING[name] = true
    end
    super
  ensure
    BUILD_LOCK.synchronize { BUILDING.delete(name) }
  end
end

Rake::FileTask.prepend(NoDoubleBuild)
Common Pitfalls When Attempting Fixes
- Blind 'touch' everywhere: Forcing newer mtimes masks the root cause and can cause rebuild storms whenever a cache is restored.
- Relying on sleep: Adding sleep 1 to force mtime differences 'works' until a slower filesystem or clock skew reintroduces flakiness. It also hurts performance.
- Preserving mtimes on outputs: Using cp -p or archivers that keep source times can invert dependency order.
- Assuming time zones matter: Rake compares epoch seconds; time zones have no bearing. Chasing TZ config is a red herring.
- Ignoring Docker mount modes: On some platforms, mount consistency flags change update propagation timing, affecting visibility of fresh mtimes.
Step-by-Step Fixes
1) Enforce Clock Discipline Across Runners
Adopt a robust time sync strategy (e.g., chrony) on every host used for builds. In containerized CI, expose host time sync or run a time sync sidecar. Validate with a preflight job that fails if skew > 50ms.
# CI preflight (bash)
S=$(date +%s%3N)
echo "UTC ms: $S"
# Optionally hit a trusted NTP-aware time service in your infra and compare
# Fail if |host - reference| > 50ms
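A hedged Ruby sketch of such a preflight, assuming chrony is installed on the runner: it parses the offset reported by chronyc tracking and fails the job when the offset exceeds the 50ms budget.

# Preflight sketch: abort when the chrony-reported offset exceeds the budget.
SKEW_BUDGET_SECONDS = 0.050

tracking = `chronyc tracking`
offset = tracking[/^System time\s*:\s*([\d.]+) seconds/, 1]

abort 'chronyc reported no offset; is chronyd running?' unless offset
abort "Clock skew #{offset}s exceeds #{SKEW_BUDGET_SECONDS}s budget" if offset.to_f > SKEW_BUDGET_SECONDS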
2) Favor Atomic Writes to Produce Fresh mtimes
Tools should write to a temporary file and then rename it into place. This guarantees a new inode with a fresh mtime even on coarse filesystems.
# Ruby helper for atomic writes
def atomic_write(path, contents)
  tmp = "#{path}.tmp-#{Process.pid}-#{rand(1_000_000)}"
  File.open(tmp, 'wb') { |f| f.write(contents) }
  File.rename(tmp, path)
end
3) Stop Depending on mtimes Alone: Content Signatures
The most durable fix is to extend Rake's needed? check to include content hashes of prerequisites, caching signatures in sidecar files. If any prerequisite's digest changes, rebuild; otherwise, treat the target as fresh even if mtimes wobble.
# Gemfile: digest is stdlib, no gem entry needed; ensure Ruby ≥ 2.5 for stable APIs
# Rakefile: Content-aware FileTask
require 'digest'

class DigestingFileTask < Rake::FileTask
  SIG_EXT = '.sig'

  def signature_path
    name + SIG_EXT
  end

  def prereq_signatures
    prerequisite_tasks.map do |t|
      if File.exist?(t.name)
        [t.name, Digest::SHA256.file(t.name).hexdigest]
      else
        [t.name, '']
      end
    end
  end

  def write_signature!
    atomic_write(signature_path, prereq_signatures.map { |n, h| "#{n} #{h}\n" }.join)
  end

  def stored_signature
    return {} unless File.exist?(signature_path)
    File.read(signature_path).lines.map { |l| n, h = l.split; [n, h || ''] }.to_h
  end

  def needed?
    return true unless File.exist?(name)
    current  = prereq_signatures.to_h
    previous = stored_signature
    return true if previous.empty?
    current != previous # Hash delta decides
  end
end

# Accepts the same hash syntax as `file`, e.g. digest_file 'out' => ['in'] do ... end
def digest_file(*args, &block)
  t = DigestingFileTask.define_task(*args, &block)
  t.enhance { t.write_signature! }
  t
end

# Usage
digest_file 'dist/app.bundle' => ['src/a.rb', 'src/b.rb'] do
  sh 'ruby build.rb'
end
This pattern preserves fast incremental builds and resists skew, overlay delay, and equal-mtime problems. It also makes cache keys explicit and auditable.
4) Normalize Archive and Copy Semantics
Ensure that extraction and copying steps produce new mtimes for outputs. Prefer content-addressed artifact paths or explicitly 'touch' outputs only after successful writes, not inputs. Do not preserve source mtimes for build outputs.
# Avoid cp -p when copying into build outputs
cp src/app.min.js dist/app.min.js
# Or write via atomic helper to guarantee fresh mtime
5) Make Parallelism Safe
With rake -j, ban shared output targets and detect accidental overlap early. Prefer per-target temp directories to avoid interleaving of partial files from multiple workers.
# Pattern: unique temp dir per target
rule /^dist\/.+\.o$/ => [proc { |t| t.sub(/\.o$/, '.c') }] do |t|
  tmp = "build/tmp/#{File.basename(t.name)}-#{Process.pid}"
  sh "mkdir -p #{tmp}"
  sh "cc -c #{t.source} -o #{tmp}/out.o"
  File.rename("#{tmp}/out.o", t.name)
end
6) Make 'phony' Truly Phony
Phony tasks do not map to files and thus should not be conflated with file targets. Use them as orchestration only; never reuse a file path as both a file task and a phony task name.
task :build # phony wrapper

file 'dist/app.bundle' => SOURCES do
  sh 'ruby build.rb'
end

task :build => 'dist/app.bundle'
7) Stabilize Inputs in Polyglot Pipelines
When Rake orchestrates other build systems, emit deterministic outputs: atomic writes, content hashing, and a manifest of input digests. Have Rake depend on the manifest, not solely on file mtimes.
# Other tool emits manifest.json with SHA256 for inputs
file 'dist/manifest.json' => INPUTS do
  sh 'node build.js'
end

digest_file 'dist/app.bundle' => ['dist/manifest.json'] do
  # Copy from tool output only if manifest changed
  sh 'cp build/out/app.bundle dist/app.bundle'
end
8) Harden CI: Cold Start and Cache Restore
On cache restore, normalize mtimes to 'now' for outputs only, or (better) store a sidecar signature file to drive rebuild decisions. Signatures prevent restored outputs from being treated as fresh when their content no longer matches the inputs.
# After restoring cache in CI
find dist -type f -exec touch {} \;   # Normalize output mtimes
# Better: rely on .sig files as in DigestingFileTask
9) JRuby vs MRI Considerations
JRuby may surface different filesystem timestamp precision via Java NIO; verify precision in your runtime and align all runners to the same Ruby implementation to minimize cross-run variance. When mixing Ruby versions, pin the version per pipeline stage and record it in artifacts.
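A quick probe (the file name is illustrative) prints the mtime precision the current runtime observes; run it in the preflight on each runner to compare implementations.

# Print the runtime and the mtime precision it reports; trailing zeros in the
# fractional part suggest coarse filesystem granularity.
probe = 'mtime_precision_probe.tmp'
File.write(probe, 'x')
puts "#{RUBY_ENGINE} #{RUBY_VERSION}: #{File.mtime(probe).strftime('%F %T.%N')}"
File.delete(probe)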
10) Observability for Build Correctness
Expose metrics: number of tasks executed, cache hit/miss counts, and reasons for rebuilds. Persist the decision log for a small window (e.g., last 50 builds) to track regression.
# Minimal event bus for decisions
require 'json'

DECISIONS = []

module DecisionLog
  def needed?
    n = super
    mtime = File.exist?(name) ? File.mtime(name).to_i : nil # missing outputs have no mtime
    DECISIONS << { name: name, needed: n, ts: mtime }
    n
  end
end

Rake::FileTask.prepend(DecisionLog)

at_exit do
  File.write('.build_decisions.json', JSON.pretty_generate(DECISIONS))
end
Deep Dive: Designing a Deterministic Rake Build
Goal: Hermetic, Hash-driven Incrementalism
Modern build systems moved from time to content. You can emulate this in Rake without abandoning your investment. The pattern: compute a digest for every input, combine into a target signature, rebuild only when the signature changes, and publish the signature along with the artifact for cacheability and audit.
Implementation Skeleton
# signature.rb
require 'digest'

def file_digest(path)
  return '' unless File.exist?(path)
  Digest::SHA256.file(path).hexdigest
end

def signature_for(paths)
  digests = paths.map { |p| [p, file_digest(p)] }
  Digest::SHA256.hexdigest(digests.map { |p, h| "#{p}=#{h}" }.join('; '))
end
# Rakefile (integrating signature)
require_relative 'signature'

def signed_target(target, inputs)
  sig = "#{target}.sig"

  file target => inputs do
    sh 'ruby build.rb'
    File.write(sig, signature_for(inputs))
  end

  file sig => inputs do
    File.write(sig, signature_for(inputs))
  end

  task :verify => sig do
    current  = signature_for(inputs)
    recorded = File.read(sig) rescue ''
    abort 'stale build' unless current == recorded
  end
end

signed_target 'dist/app.bundle', FileList['src/**/*.rb']
This approach is fast for small inputs and robust for large ones if you scope digests to coarse-grained manifests (e.g., a list of packages) rather than every file for every target.
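A hedged sketch of that coarse-grained approach (the packages/ layout is an assumption; it reuses signature_for from signature.rb above): each package gets one manifest whose digest stands in for all of its files.

# One manifest per package: targets depend on the manifest instead of every file.
require_relative 'signature'

FileList['packages/*'].each do |pkg|
  manifest = "#{pkg}/.inputs.sig"
  file manifest => FileList["#{pkg}/**/*.rb"] do |t|
    File.write(t.name, signature_for(t.prerequisites))
  end
end

Downstream targets then depend on the package manifests, so digest work scales with the number of packages rather than the number of files.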
Best Practices Checklist
- Pin Ruby runtime per pipeline and verify filesystem timestamp precision at startup.
- Use atomic writes for all generated files; never cp -p into the build outputs.
- Replace pure mtime logic with content signatures for critical targets.
- Fail the build if clock skew exceeds a strict threshold.
- Separate 'phony' orchestration from file targets; never reuse names.
- Guarantee unique output paths per parallel job to avoid races.
- Record and publish decision logs and signatures alongside artifacts.
- Normalize restored caches: either rewrite mtimes or rely solely on signature files.
- Document and version the build environment (OS, Ruby, filesystem type, container base).
- Continuously load-test the build graph with synthetic changes to validate incremental behavior, as in the probe sketch below.
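A hedged sketch of such a probe, assuming the DecisionLog instrumentation shown earlier is installed; the source path and target name are illustrative.

# Make a synthetic change, run the build, and assert that the expected target
# was rebuilt according to the decision log.
require 'json'

probe = 'src/a.rb'
File.write(probe, File.read(probe) + "\n# probe #{Time.now.to_i}")

system('rake dist/app.bundle') or abort 'build failed'

decisions = JSON.parse(File.read('.build_decisions.json'))
rebuilt = decisions.select { |d| d['needed'] }.map { |d| d['name'] }
abort 'incremental build ignored the synthetic change' unless rebuilt.include?('dist/app.bundle')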
Case Study: Flaky CI After Migrating to Containers
Symptoms
A team moved from VM-based CI to containerized runners backed by a networked filesystem. After migration, Rake builds randomly rebuilt large subgraphs or skipped necessary steps, depending on which runner executed which stage.
Findings
- Overlay filesystem reported 1s mtime granularity; tasks writing outputs within the same second as inputs were treated as up-to-date.
- Cache restores preserved mtimes of outputs from prior runs.
- Two runner groups had 200ms clock skew due to misconfigured time sync.
Fixes Applied
- Enabled chrony across all nodes; preflight failed builds when skew > 50ms.
- Adopted atomic writes for tool outputs; removed all cp -p usages.
- Implemented DigestingFileTask for top-level artifacts only (kept standard file tasks for leaf steps).
- Normalized artifact mtimes on cache restore, then relied on signature files for correctness.
Outcome
Cache hit rate improved from ~40% to ~92%, build times dropped by 35%, and 'stale artifact' incidents went to zero over the next quarter. The decision logs gave auditors a tamper-evident trail of why each artifact was rebuilt.
Operational Playbooks
When Builds Randomly Re-run (Phantom Rebuilds)
- Collect decision logs or enable --trace with timestamp instrumentation.
- Compare mtimes of top offending targets and their prerequisites; look for equal or inverted times.
- Check cache restore logic for preserved mtimes on outputs.
- Verify runner clock sync; repair and re-test.
- Introduce signature checks on the hot path targets.
When Rebuilds Are Skipped but Outputs Are Stale
- Reproduce with a fast edit to a prerequisite and inspect mtimes to the second.
- Ensure outputs are written via atomic rename to guarantee fresh mtimes.
- Replace mtime logic with content signatures; audit that signature files update.
- Audit third-party tools for 'preserve times' flags or behaviors.
Hardening Parallel Builds
- Static analysis: no two tasks produce the same path.
- Use per-target temp dirs and atomic rename into final location.
- Guard against double invocation with a global 'building' registry during a job.
Security and Compliance Considerations
Deterministic builds are not just about speed; they are foundational to supply-chain integrity. Signature files let you verify that an artifact corresponds to a precise set of inputs. Combined with checksums and SBOMs generated post-build, they provide an evidentiary trail for audits. Avoid 'touch'-based fixes that can accidentally advance mtimes without content change, creating opportunities for confusion or manipulation.
Conclusion
Rake's timestamp-based incremental model is elegant but brittle in distributed, containerized, and high-parallelism environments. The path to reliable, fast builds is threefold: stabilize time (clock sync and atomic writes), eliminate filesystem precision pitfalls (avoid preserving mtimes, ensure fresh outputs), and upgrade the decision rule from time to content (signatures and manifests). With these measures, architects can retain Rake's simplicity while achieving the determinism and correctness demanded by modern enterprise delivery pipelines.
FAQs
1. Can I fix equal-mtime issues by setting a global 'timestamp granularity' in Rake?
No. Rake does not expose a global granularity knob. The reliable fix is to ensure outputs receive fresh mtimes via atomic writes, or to augment needed? with content signatures so equal mtimes no longer matter.
2. Does using JRuby eliminate filesystem precision problems?
Not necessarily. JRuby reports times via the JVM and may show higher precision, but the underlying filesystem granularity still governs correctness. You must address atomic writes and signature-based decisions regardless of Ruby implementation.
3. Will content hashing slow my builds?
It depends on input size. For many targets, hashing a handful of inputs is negligible compared to compilation. For very large input sets, hash coarse manifests or per-package digests instead of every file to keep overhead small.
4. Are 'touch' steps ever appropriate?
Yes, for marking completion of multi-step workflows where the stamp file is the canonical target. Even then, write the stamp after producing outputs and prefer signatures so the stamp reflects actual content change rather than mere time change.
5. When should we switch from Rake to a content-addressed build system?
If your build graph spans thousands of targets with heavy cross-language tooling, or you require remote execution and global cache, a system designed around content digests may be more cost-effective. Until then, layering signature-aware tasks into Rake can deliver most of the determinism benefits with minimal migration cost.