Background: Why Rake's Incremental Model Breaks at Scale

How Rake Decides What to Rebuild

Rake's core heuristic is simple: a file task is 'needed' when the output is missing or its modification time (mtime) is older than that of any prerequisite. In code, this is surfaced via the task's timestamp and needed? checks. On a single machine with a coherent filesystem, this is reliable. At scale, however, distributed CI runners, container mounts, and networked filesystems can distort mtimes in subtle ways:

  • Timestamp precision: Some filesystems round to 1s (or worse). Fast build steps may write inputs and outputs within the same second, so their mtimes compare equal and the output looks 'up-to-date' when it is not.
  • Clock skew: If writer and reader nodes disagree on time, an artifact produced on Node A can appear older than its prerequisites on Node B, triggering infinite rebuilds or, inversely, skipped rebuilds.
  • Copy semantics: Tools that preserve source mtimes (e.g., cp -p, certain archive extractors) can yield outputs with earlier mtimes than their inputs.
  • Overlay and remote filesystems: Docker overlay filesystems and NFS/SMB can delay mtime updates or reorder visibility under concurrency, confusing Rake's dependency checks.
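
In simplified form, the decision is 'rebuild when the output is missing or any prerequisite is strictly newer'. The sketch below approximates that comparison for illustration; it is not Rake's actual source, but it shows why equal mtimes count as fresh.

# Simplified illustration of the mtime heuristic (approximation, not Rake's source)
def roughly_out_of_date?(target, prerequisites)
  return true unless File.exist?(target)   # missing output always rebuilds
  target_mtime = File.mtime(target)
  prerequisites.any? do |prereq|
    File.mtime(prereq) > target_mtime      # strictly newer; equal mtimes look fresh
  end
end
# e.g. roughly_out_of_date?('out.txt', ['in.txt'])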

Why This Matters to Architects

In large monorepos and microservice fleets, small errors in incremental rebuild logic snowball into cost and reliability problems. CI pipelines whose caches oscillate between hits and misses waste compute. Release candidates built from stale artifacts breach compliance and reproducibility guarantees. The net effect is longer feedback loops, flaky builds, and hidden operational risk.

Architecture: Where Rake Meets Distributed Reality

Pipeline Topologies That Amplify the Problem

  • Ephemeral CI runners: Each job starts on a fresh VM/container. Restored caches bring files with historical mtimes; if the runner's clock is ahead/behind, comparisons misfire.
  • Remote caches and artifact stores: Artifact downloads may preserve original mtimes or apply current time inconsistently, causing false positives or negatives in incremental checks.
  • Polyglot builds: Rake orchestrates tools in Rust/Go/Node/Java. Some toolchains write temp files then rename (atomic write; fresh mtime). Others preserve input mtimes for reproducibility, defying Rake's expectations.
  • Container bind mounts: Host↔container mount options can change timestamp update behavior, especially under high concurrency.

The Subtle Role of 'Equal mtimes'

On many filesystems the mtime granularity is 1s. When a prerequisite and an output share the same mtime to the second, the strictly-newer comparison finds nothing newer and treats the output as fresh; Rake historically treats equal timestamps as 'not needed'. That is harmless for slow builds but wrong for modern, sub-second steps.

Diagnostics: Proving You Have a Timestamp/Skew Problem

1) Instrument Rake's View of Time

Enable tracing and print task timestamps and prerequisites at decision time. This reveals whether Rake is skipping work due to equal or inverted mtimes.

# Rakefile snippet to trace timestamps
require 'rake'
module TraceNeeded
  def needed?
    n = super
    if Rake.application.options.trace
      # timestamp can be a Time or Rake's missing-file sentinel, so avoid Time-only
      # methods like .utc and let inspect render whatever value is present
      prereqs = prerequisite_tasks.map { |t| [t.name, t.timestamp] }
      $stderr.puts %(NEEDED?=#{n} TASK=#{name} TS=#{timestamp.inspect} PREREQS=#{prereqs.inspect})
    end
    n
  end
end
Rake::FileTask.prepend(TraceNeeded)

Run with:

rake --trace build:all

2) Inspect Filesystem Precision and Clock Skew

Compare host and container clocks, then check file mtime resolution and monotonicity.

# On each node or container
date -u
ruby -e 'f = ARGV.first; File.write(f, "x"); puts File.mtime(f).strftime("%F %T.%N")' tmp.touch
stat tmp.touch # Check displayed precision

If mtimes end with .000000000 despite rapid writes, you have coarse granularity. If two runners differ in date -u beyond a few milliseconds, clock skew is likely undermining correctness.

3) Reproduce Equal-mtime Skips

Create a fast rule where the output is written in the same second.

# Rakefile
file 'out.txt' => ['in.txt'] do
  sh 'printf updated > out.txt'
end
task :demo do
  sh 'printf seed > in.txt'
  Rake::Task['out.txt'].invoke
  puts 'first done'
  sh 'printf seed2 > in.txt'
  Rake::Task['out.txt'].reenable
  Rake::Task['out.txt'].invoke
  puts File.read('out.txt') # Might still be "updated" if equal mtimes
end

If the second invoke prints the prior content, the equal-mtime case is confirmed.

4) Check Copy/Extract Semantics

Some CI restore steps preserve archival mtimes. Detect this by comparing content digests with mtimes.

shasum -a 256 artifacts/*.tar.gz
for f in artifacts/*; do
  echo "$f - $(stat -c %y "$f")"
done

5) Parallel Hazards

With -j, two tasks must not write the same target. Add guards that fail fast upon concurrent writes.

# Guard: fail if target is already being built
BUILD_LOCK = Mutex.new
BUILDING = {}
module NoDoubleBuild
  def invoke(*args)
    BUILD_LOCK.synchronize do
      raise "Concurrent build of #{name}" if BUILDING[name]
      BUILDING[name] = true
    end
    begin
      super
    ensure
      # clear the flag only after we set it; running ensure when the guard raised
      # would otherwise delete the entry owned by the in-flight build
      BUILD_LOCK.synchronize { BUILDING.delete(name) }
    end
  end
end
Rake::FileTask.prepend(NoDoubleBuild)

Common Pitfalls When Attempting Fixes

  • Blind 'touch' everywhere: Forcing newer mtimes masks the root cause and can cause rebuild storms whenever a cache is restored.
  • Relying on sleep: Adding sleep 1 to force mtime differences 'works' until a slower filesystem or clock skew reintroduces flakiness. It also hurts performance.
  • Preserving mtimes on outputs: Using cp -p or archivers that keep source times can invert dependency order.
  • Assuming time zones matter: Rake compares epoch seconds; time zones have no bearing. Chasing TZ config is a red herring.
  • Ignoring Docker mount modes: On some platforms, mount consistency flags change update propagation timing, affecting visibility of fresh mtimes.

Step-by-Step Fixes

1) Enforce Clock Discipline Across Runners

Adopt a robust time sync strategy (e.g., chrony) on every host used for builds. In containerized CI, expose host time sync or run a time sync sidecar. Validate with a preflight job that fails if skew > 50ms.

# CI preflight (bash)
S=$(date +%s%3N)
echo "UTC ms: $S"
# Optionally hit a trusted NTP-aware time service in your infra and compare
# Fail if |host - reference| > 50ms
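
As a concrete version of that preflight, the sketch below compares the runner's clock against an internal reference over HTTP. The endpoint URL and its plain-text millisecond response are assumptions for illustration; substitute whatever NTP-aware service your infrastructure provides.

# Preflight clock-skew check (sketch); the reference endpoint is hypothetical
require 'net/http'
require 'uri'

MAX_SKEW_MS = 50
uri = URI('https://time.internal.example/epoch_ms') # assumed to return Unix time in milliseconds as plain text

before    = (Time.now.to_f * 1000).round
reference = Net::HTTP.get(uri).to_i
after     = (Time.now.to_f * 1000).round

local = (before + after) / 2 # crude midpoint to offset request latency
skew  = (local - reference).abs
puts "local=#{local}ms reference=#{reference}ms skew=#{skew}ms"
abort "Clock skew #{skew}ms exceeds #{MAX_SKEW_MS}ms" if skew > MAX_SKEW_MS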

2) Favor Atomic Writes to Produce Fresh mtimes

Tools should write to a temporary file in the target's directory and then rename it into place. Readers never observe a partially written output, and the output's mtime reflects the actual write rather than being inherited from a source file. Note that on a 1s-granularity filesystem a fresh mtime can still land in the same second as an input, which is one more reason to pair atomic writes with the signature approach below.

# Ruby helper for atomic writes: write a sibling temp file, then rename into place
def atomic_write(path, contents)
  tmp = "#{path}.tmp-#{Process.pid}-#{rand(1_000_000)}" # same directory, so rename stays on one filesystem
  File.open(tmp, 'wb') { |f| f.write(contents) }
  File.rename(tmp, path) # atomic replace on POSIX filesystems
end

3) Stop Depending on mtimes Alone: Content Signatures

The most durable fix is to extend Rake's needed? check to include content hashes of prerequisites, caching signatures in sidecar files. If any prerequisite's digest changes, rebuild; otherwise, treat the target as fresh even if mtimes wobble.

# No Gemfile entry needed: digest ships with the Ruby standard library

# Rakefile: Content-aware FileTask
require 'digest'
class DigestingFileTask < Rake::FileTask
  SIG_EXT = '.sig'
  def signature_path
    name + SIG_EXT
  end
  def prereq_signatures
    prerequisite_tasks.map do |t|
      if File.exist?(t.name)
        [t.name, Digest::SHA256.file(t.name).hexdigest]
      else
        [t.name, '']
      end
    end
  end
  def write_signature!
    atomic_write(signature_path, prereq_signatures.map { |n,h| "#{n} #{h}\n" }.join)
  end
  def stored_signature
    return {} unless File.exist?(signature_path)
    File.read(signature_path).lines.map { |l| n, h = l.split; [n, h.to_s] }.to_h
  end
  def needed?
    return true unless File.exist?(name)
    current = prereq_signatures.to_h
    previous = stored_signature
    return true if previous.empty?
    current != previous # Hash delta decides
  end
end

def digest_file(*args, &block)
  # Accepts the same argument forms as Rake's file() helper, e.g. 'target' => [prereqs]
  t = DigestingFileTask.define_task(*args, &block)
  t.enhance { t.write_signature! } # record prerequisite digests after a successful build
  t
end

# Usage
digest_file 'dist/app.bundle' => ['src/a.rb', 'src/b.rb'] do
  sh 'ruby build.rb'
end

This pattern preserves fast incremental builds and resists skew, overlay delay, and equal-mtime problems. It also makes cache keys explicit and auditable.

4) Normalize Archive and Copy Semantics

Ensure that extraction and copying steps produce new mtimes for outputs. Prefer content-addressed artifact paths or explicitly 'touch' outputs only after successful writes, not inputs. Do not preserve source mtimes for build outputs.

# Avoid cp -p when copying into build outputs
cp src/app.min.js dist/app.min.js
# Or write via atomic helper to guarantee fresh mtime
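
The same rule applies to Ruby-driven copy steps: FileUtils.cp only preserves source times when explicitly asked to, so a plain copy keeps outputs on fresh mtimes (paths below are illustrative).

# Copying into build outputs from Ruby: the default gives the destination a fresh mtime
require 'fileutils'
FileUtils.mkdir_p('dist')
FileUtils.cp('src/app.min.js', 'dist/app.min.js')                   # fresh mtime on the output
# FileUtils.cp('src/app.min.js', 'dist/app.min.js', preserve: true) # avoid: inherits the source mtime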

5) Make Parallelism Safe

With rake -j, ban shared output targets and detect accidental overlap early. Prefer per-target temp directories to avoid interleaving of partial files from multiple workers.

# Pattern: unique temp dir per target
rule /^dist\/.+\.o$/ => [proc { |t| t.sub(/\.o$/, '.c') }] do |t|
  tmp = "build/tmp/#{File.basename(t.name)}-#{Process.pid}"
  sh "mkdir -p #{tmp}"
  sh "cc -c #{t.source} -o #{tmp}/out.o"
  File.rename("#{tmp}/out.o", t.name) # atomic move; requires tmp and the target dir on the same filesystem
end

6) Make 'phony' Truly Phony

Phony tasks do not map to files and thus should not be conflated with file targets. Use them as orchestration only; never reuse a file path as both a file task and a phony task name.

SOURCES = FileList['src/**/*.rb'] # example input list
task :build # phony wrapper
file 'dist/app.bundle' => SOURCES do
  sh 'ruby build.rb'
end
task :build => 'dist/app.bundle'

7) Stabilize Inputs in Polyglot Pipelines

When Rake orchestrates other build systems, emit deterministic outputs: atomic writes, content hashing, and a manifest of input digests. Have Rake depend on the manifest, not solely on file mtimes.

# Other tool emits manifest.json with SHA256 digests for its inputs
INPUTS = FileList['src/**/*.ts'] # example input list consumed by the external tool
file 'dist/manifest.json' => INPUTS do
  sh 'node build.js'
end
digest_file 'dist/app.bundle' => ['dist/manifest.json'] do
  # Copy from tool output only if manifest changed
  sh 'cp build/out/app.bundle dist/app.bundle'
end

8) Harden CI: Cold Start and Cache Restore

On cache restore, do not trust restored mtimes. Either normalize output mtimes and let sidecar signature files decide what actually needs rebuilding, or rely solely on the signatures (as in DigestingFileTask) for restored targets. Otherwise a restored output can look 'fresh' even though its content no longer matches the current inputs.

# After restoring cache in CI
find dist -type f -exec touch {} \; # Normalize output mtimes
# Better: rely on .sig files as in DigestingFileTask

9) JRuby vs MRI Considerations

JRuby may surface different filesystem timestamp precision via Java NIO; verify precision in your runtime and align all runners to the same Ruby implementation to minimize cross-run variance. When mixing Ruby versions, pin the version per pipeline stage and record it in artifacts.
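
One way to make that verification routine is a small preflight task that reports the mtime precision the current runtime actually observes. A minimal sketch, assuming a scratch directory tmp/ is acceptable:

# Preflight: report observable mtime precision for this Ruby runtime and filesystem
task :check_mtime_precision do
  require 'fileutils'
  FileUtils.mkdir_p('tmp')
  probe = 'tmp/mtime_probe'
  File.write(probe, 'x')
  mtime = File.mtime(probe)
  puts "#{RUBY_ENGINE} #{RUBY_VERSION}: #{mtime.strftime('%F %T.%N')}"
  warn 'WARNING: whole-second mtimes observed; equal-mtime skips are possible' if mtime.nsec.zero?
end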

10) Observability for Build Correctness

Expose metrics: number of tasks executed, cache hit/miss counts, and reasons for rebuilds. Persist the decision log for a small window (e.g., last 50 builds) to track regression.

# Minimal event bus for decisions
require 'json'

DECISIONS = []
DECISIONS_LOCK = Mutex.new # Array#<< needs a lock to stay consistent under rake -j
module DecisionLog
  def needed?
    n = super
    mtime = File.exist?(name) ? File.mtime(name).to_i : nil # timestamp may be a non-Time sentinel for missing files
    DECISIONS_LOCK.synchronize { DECISIONS << { name: name, needed: n, ts: mtime } }
    n
  end
end
Rake::FileTask.prepend(DecisionLog)
at_exit do
  File.write('.build_decisions.json', JSON.pretty_generate(DECISIONS))
end

Deep Dive: Designing a Deterministic Rake Build

Goal: Hermetic, Hash-driven Incrementalism

Modern build systems moved from time to content. You can emulate this in Rake without abandoning your investment. The pattern: compute a digest for every input, combine into a target signature, rebuild only when the signature changes, and publish the signature along with the artifact for cacheability and audit.

Implementation Skeleton

# signature.rb
require 'digest'
def file_digest(path)
  return '' unless File.exist?(path)
  Digest::SHA256.file(path).hexdigest
end
def signature_for(paths)
  digests = paths.map { |p| [p, file_digest(p)] }
  Digest::SHA256.hexdigest(digests.map { |p,h| "#{p}=#{h}" }.join('; '))
end
# Rakefile (integrating signature)
require_relative 'signature'
def signed_target(target, inputs)
  sig = "#{target}.sig"
  file target => inputs do
    sh 'ruby build.rb'
    File.write(sig, signature_for(inputs))
  end
  file sig => inputs do
    File.write(sig, signature_for(inputs))
  end
  task :verify => sig do
    current = signature_for(inputs)
    recorded = File.read(sig) rescue ''
    abort 'stale build' unless current == recorded
  end
end

signed_target 'dist/app.bundle', FileList['src/**/*.rb']

This approach is fast for small inputs and robust for large ones if you scope digests to coarse-grained manifests (e.g., a list of packages) rather than every file for every target.
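
For example, a platform-level target can be keyed off per-package manifests instead of every source file; the package layout and lockfile names below are hypothetical.

# Coarse-grained scoping: digest per-package lockfiles instead of every source file
signed_target 'dist/platform.bundle',
              FileList['packages/*/Gemfile.lock', 'packages/*/package-lock.json']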

Best Practices Checklist

  • Pin Ruby runtime per pipeline and verify filesystem timestamp precision at startup.
  • Use atomic writes for all generated files, never cp -p into the build outputs.
  • Replace pure mtime logic with content signatures for critical targets.
  • Fail the build if clock skew exceeds a strict threshold.
  • Separate 'phony' orchestration from file targets; never reuse names.
  • Guarantee unique output paths per parallel job to avoid races.
  • Record and publish decision logs and signatures alongside artifacts.
  • Normalize restored caches: either rewrite mtimes or rely solely on signature files.
  • Document and version the build environment (OS, Ruby, filesystem type, container base).
  • Continuously load-test the build graph with synthetic changes to validate incremental behavior.

Case Study: Flaky CI After Migrating to Containers

Symptoms

A team moved from VM-based CI to containerized runners backed by a networked filesystem. After migration, Rake builds randomly rebuilt large subgraphs or skipped necessary steps, depending on which runner executed which stage.

Findings

  • Overlay filesystem reported 1s mtime granularity; tasks writing outputs within the same second as inputs were treated as up-to-date.
  • Cache restores preserved mtimes of outputs from prior runs.
  • Two runner groups had 200ms clock skew due to misconfigured time sync.

Fixes Applied

  • Enabled chrony across all nodes; preflight failed builds when skew > 50ms.
  • Adopted atomic writes for tool outputs; removed all cp -p usages.
  • Implemented DigestingFileTask for top-level artifacts only (kept standard file tasks for leaf steps).
  • Normalized artifact mtimes on cache restore, then relied on signature files for correctness.

Outcome

Cache hit rate improved from ~40% to ~92%, build times dropped by 35%, and 'stale artifact' incidents went to zero over the next quarter. The decision logs gave auditors a tamper-evident trail of why each artifact was rebuilt.

Operational Playbooks

When Builds Randomly Re-run (Phantom Rebuilds)

  1. Collect decision logs or enable --trace with timestamp instrumentation.
  2. Compare mtimes of top offending targets and their prerequisites; look for equal or inverted times.
  3. Check cache restore logic for preserved mtimes on outputs.
  4. Verify runner clock sync; repair and re-test.
  5. Introduce signature checks on the hot path targets.

When Rebuilds Are Skipped but Outputs Are Stale

  1. Reproduce with a fast edit to a prerequisite and inspect mtimes to the second.
  2. Ensure outputs are written via atomic rename to guarantee fresh mtimes.
  3. Replace mtime logic with content signatures; audit that signature files update.
  4. Audit third-party tools for 'preserve times' flags or behaviors.

Hardening Parallel Builds

  1. Static analysis: no two tasks produce the same path.
  2. Use per-target temp dirs and atomic rename into final location.
  3. Guard against double invocation with a global 'building' registry during a job.

Security and Compliance Considerations

Deterministic builds are not just about speed; they are foundational to supply-chain integrity. Signature files let you verify that an artifact corresponds to a precise set of inputs. Combined with checksums and SBOMs generated post-build, they provide an evidentiary trail for audits. Avoid 'touch'-based fixes that can accidentally advance mtimes without content change, creating opportunities for confusion or manipulation.

Conclusion

Rake's timestamp-based incremental model is elegant but brittle in distributed, containerized, and high-parallelism environments. The path to reliable, fast builds is threefold: stabilize time (clock sync and atomic writes), eliminate filesystem precision pitfalls (avoid preserving mtimes, ensure fresh outputs), and upgrade the decision rule from time to content (signatures and manifests). With these measures, architects can retain Rake's simplicity while achieving the determinism and correctness demanded by modern enterprise delivery pipelines.

FAQs

1. Can I fix equal-mtime issues by setting a global 'timestamp granularity' in Rake?

No. Rake does not expose a global granularity knob. The reliable fix is to ensure outputs receive fresh mtimes via atomic writes or to augment needed? with content signatures so equal mtimes no longer matter.

2. Does using JRuby eliminate filesystem precision problems?

Not necessarily. JRuby reports times via the JVM and may show higher precision, but the underlying filesystem granularity still governs correctness. You must address atomic writes and signature-based decisions regardless of Ruby implementation.

3. Will content hashing slow my builds?

It depends on input size. For many targets, hashing a handful of inputs is negligible compared to compilation. For very large input sets, hash coarse manifests or per-package digests instead of every file to keep overhead small.

4. Are 'touch' steps ever appropriate?

Yes, for marking completion of multi-step workflows where the stamp file is the canonical target. Even then, write the stamp after producing outputs and prefer signatures so the stamp reflects actual content change rather than mere time change.
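
A minimal sketch of that stamp pattern, reusing signature_for from the skeleton above (the paths and compile_assets.rb script are illustrative):

# Stamp-file pattern: the stamp is the canonical target, written only after the real
# outputs exist, and it records a content signature rather than a bare timestamp
file 'tmp/assets.stamp' => FileList['assets/**/*.scss'] do |t|
  mkdir_p File.dirname(t.name)
  sh 'ruby compile_assets.rb'                        # produce the real outputs first
  File.write(t.name, signature_for(t.prerequisites)) # stamp reflects content, not time
end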

5. When should we switch from Rake to a content-addressed build system?

If your build graph spans thousands of targets with heavy cross-language tooling, or you require remote execution and global cache, a system designed around content digests may be more cost-effective. Until then, layering signature-aware tasks into Rake can deliver most of the determinism benefits with minimal migration cost.