Background and Architectural Context
Mercurial fundamentals at scale
Mercurial stores history in revlogs under the .hg/store directory. The changelog, the manifest, and each tracked file are stored as revlogs made of deltas and periodic full snapshots. Modern repository formats use RevlogNG, generaldelta, and sidedata to speed up operations on deep histories. At enterprise scale, design choices such as server protocol, network topology, filesystem semantics, and CI fanout shape performance and failure modes.
Key concepts that drive troubleshooting
- Phases: draft, public, secret. Phase transitions determine what can be rewritten. Misunderstood phases create hidden or immutable changesets that complicate merges.
- Obsolescence and the evolve model: obsolete markers track rewritten history. Inconsistent marker propagation produces confusing pruned or orphan states.
- Named branches vs bookmarks: named branches are permanent lineage labels; bookmarks are movable pointers. Mixing them without policy leads to diverging tips.
- Working copy dirstate: a compact index of tracked paths, mtimes, and states. Corruption or skew here causes spurious status output and failed merges.
- Transaction journals: Mercurial commits and pulls are transactional. Crash recovery uses the journal files to roll back an interrupted transaction; stale journals indicate interrupted writes.
Typical Failure Modes and Why They Matter
1) Divergent heads and unresolved branch tips
Multiple heads on a named branch or bookmark often arise from concurrent pushes, partial rewrites, or CI-generated commits. Divergence increases merge risk, build time, and cognitive load for reviewers.
2) Hidden or unexpected changesets
Phases and obsolescence can hide changesets from common commands. Engineers “lose” a branch or rewrite commits locally, push with partial marker propagation, and then discover missing or pruned revisions on peers.
3) Slow clone, pull, or push in WAN or monorepo contexts
Large histories, binary blobs, low-bandwidth links, and legacy wire protocol settings cause I/O amplification. CI farms suffer when every job reclones and re-resolves identical deltas.
4) Working-copy corruption after crashes or anti-virus interference
Unclean shutdowns or external processes that lock or quarantine files inside .hg can damage dirstate or journals. Symptoms include inconsistent hg status output, failed updates, and inexplicable conflicts.
5) Lock contention and server-side saturation
Concurrent hooks, pretxnchangegroup checks, or repack operations hold locks while CI flood-pushes. Result: timeouts, aborted transactions, and frustrated developers.
6) Subrepo and largefiles drift
Subrepositories pin specific revisions; careless updates or shallow synchronization create mismatches. The largefiles extension can produce missing standins or content mismatches if caches are not synchronized.
Architecture-Driven Diagnostics
Repository format and requirements
Confirm format and enabled features to understand performance ceilings and tool compatibility.
    cat .hg/requires
    # Look for entries like: fncache, dotencode, generaldelta, treemanifest, sparserevlog
Outdated formats limit delta strategies and might prevent adopting newer optimizations.
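As a rough illustration of this check, the sketch below reports which expected features are absent from a requires file. The function name missing_requirements and the particular features passed to it are assumptions made for the example; they are not part of Mercurial.

```shell
# Print each expected feature that is absent from a Mercurial requires file.
# Usage: missing_requirements <requires-file> <feature>...
# (missing_requirements is a hypothetical helper, not an hg command.)
missing_requirements() (
    requires_file=$1
    shift
    for feature in "$@"; do
        # -x matches the whole line, -F treats the feature name literally
        grep -qxF -- "$feature" "$requires_file" || printf '%s\n' "$feature"
    done
)
```

For example, missing_requirements .hg/requires generaldelta sparserevlog prints nothing when both features are enabled, and prints the name of each feature that is not listed.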
Wire protocol, transport, and caching
Measure protocol overhead and server configuration. Ensure SSH multiplexing, HTTP keep-alive, and server-side bundle caching are configured appropriately.
    hg paths
    hg showconfig ui.ssh
    # For SSH: configure ControlMaster and ControlPersist in ~/.ssh/config
Filesystem and OS nuances
Windows path length, case sensitivity, and antivirus scans are frequent contributors to performance issues. On Linux, ext4/XFS mount options, noatime, and barrier settings can subtly affect throughput for CI runners.
Deep-Dive Troubleshooting Playbook
Identify and explain multiple heads
Start with a branch-level inventory.
    hg heads -t
    hg heads default
    hg log -r "head()" --template "{node|short} {branch}\n"
If heads exploded recently, diff push windows and CI activity. Head proliferation typically follows parallel pushes to the same named branch or divergent bookmarks.
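To turn the inventory into a quick report, a small filter can count open heads per named branch. This is a minimal sketch: branches_with_multiple_heads is a hypothetical helper name, and it assumes its input contains one branch name per line, one line per open head.

```shell
# Read one branch name per line on stdin (one line per open head) and print
# "<count> <branch>" for every branch that has more than one head.
# (branches_with_multiple_heads is a hypothetical helper, not an hg command.)
branches_with_multiple_heads() {
    awk '{ n[$0]++ } END { for (b in n) if (n[b] > 1) print n[b], b }' | sort -k2
}
```

A typical invocation would be hg log -r "head() and not closed()" --template "{branch}\n" | branches_with_multiple_heads; empty output means every open branch has a single head.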
Locate hidden, obsolete, or pruned changesets
Reveal what phases or obsolescence have concealed.
    hg log --hidden -r "hidden()"
    hg log --hidden -r "obsolete()"
    hg log -r "orphan() or phasedivergent() or contentdivergent()"
    hg phase -r .
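If the current commit is draft locally but public on the server, the local phase is stale, and rewriting the commit now would leave a phase-divergent successor once phases synchronize. That mismatch points to phase propagation issues or a misconfigured server hook.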
Verify transaction health and crash residue
Check for stale journals or incomplete transactions, especially after power loss or interrupted pulls.
    ls .hg/store/undo* .hg/store/journal* 2>/dev/null
    hg recover
hg recover rolls back an interrupted transaction using its journal. If recovery is needed frequently, investigate storage or process termination patterns.
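A journal also exists while a healthy transaction is in flight, so file age is a useful hint for telling crash residue from normal activity. The sketch below lists journal files older than a threshold. The name stale_journals and the 30-minute default are assumptions for illustration, and the -maxdepth and -mmin options rely on GNU or BSD find.

```shell
# List transaction journal files under <repo>/.hg/store that were last
# modified more than <minutes> minutes ago (default: 30).
# (stale_journals is a hypothetical helper; the threshold is arbitrary.)
stale_journals() (
    repo=$1
    minutes=${2:-30}
    find "$repo/.hg/store" -maxdepth 1 -name 'journal*' -mmin +"$minutes" 2>/dev/null
)
```

Running stale_journals /srv/repo 60 on a server, for instance, would print only journals that have sat untouched for over an hour, which are reasonable candidates for hg recover.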
Measure clone/pull bottlenecks
Separate network from server generation time by using bundles and logging.
    # Server: generate a stream clone bundle if supported
    # (bundlespec names vary by version; see "hg help bundlespec")
    hg bundle --all --type "none-v2;stream=v2" /tmp/repo.bundle
    # Client: test local clone duration
    time hg clone /tmp/repo.bundle repo-test
    # Compare to network clone
    time hg clone ssh://server/repo repo-net
If local bundle clones are fast but network clones crawl, focus on latency, protocol settings, or server CPU. If both are slow, inspect store layout and delta chains.
Detect dirstate or working-copy anomalies
When hg status reports phantom changes or merges fail with inexplicable conflicts, examine dirstate, timestamps, and case collisions.
    hg debugstate | head
    hg status -v
    hg debugpathcomplete some/path
    hg purge --all --print
If anti-virus or backup agents touch .hg, add exclusions and retest.
Audit server hooks and CI
Misbehaving hooks stall pushes or create inconsistent phases. CI robots that rewrite history without publishing obsolescence markers create orphaned histories.
    hg showconfig hooks
    # Look for pretxnchangegroup, prepushkey, changegroup
Root Causes and Corrective Actions
Cause: concurrent pushes creating divergent heads
Developers and CI push to the same named branch, each with different tips. Bookmarks are not exchanged consistently.
Fix:
- Adopt a single-writer policy per named branch. Route all writes through a gatekeeper service that merges and pushes.
- Switch to bookmark-centric workflows for short-lived lines, keeping named branches for long-lived release lines.
- Enable a server-side pretxnchangegroup hook to reject new heads on protected branches.
    # Example: reject multiple heads on default
    # (hooks.rejectmultipleheads is a site-provided module, not part of Mercurial)
    [hooks]
    pretxnchangegroup.rejectheads = python:hooks.rejectmultipleheads
    [hooks.rejectmultipleheads]
    branches = default, release
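The configuration above points at an in-process Python hook whose body is not shown. As an alternative sketch using an external shell hook instead, the function below fails when any protected branch would end up with more than one open head. The name reject_multiple_heads, the HG override variable, and the branch list are assumptions for illustration; the sketch relies on external pretxnchangegroup hooks being able to see the incoming changesets before the transaction commits.

```shell
# Fail (non-zero status) if any of the given named branches has more than
# one open head. Meant to be called from an external pretxnchangegroup hook.
# HG may be set to point at a specific hg executable.
# (reject_multiple_heads is a hypothetical helper, not an hg command.)
reject_multiple_heads() (
    status=0
    for branch in "$@"; do
        count=$("${HG:-hg}" log -r "head() and branch('$branch') and not closed()" \
            --template '{node|short}\n' | wc -l | tr -d ' ')
        if [ "$count" -gt 1 ]; then
            echo "rejected: branch '$branch' would have $count heads" >&2
            status=1
        fi
    done
    exit "$status"
)
```

A hook script that sources this function and ends with reject_multiple_heads default release could then be registered under [hooks] in place of the python: entry. A non-zero exit aborts the incoming transaction.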
Cause: hidden or obsolete changesets confuse merges
Phase drift or missing obsolescence markers cause developers to see different DAGs. People attempt merges with incomplete knowledge.
Fix:
- Standardize phase policies: server publishes to public on integration; developers work in draft phases only.
- Propagate obsolescence markers by enabling marker exchange on both clients and the server, and by ensuring evolve is installed across teams.
- Expose hidden revisions temporarily to repair history, then restore policy.
    hg phase --public -r REV
    hg debugobsolete OLDNODE NEWNODE   # full 40-character hashes
    hg log --hidden -r "hidden()"
    hg obslog -r REV                   # requires the evolve extension
Cause: slow clones due to WAN latency and large histories
Long delta chains and repeated negotiation across slow links multiply round-trips. CI amplifies pain with fresh clones per job.
Fix:
- Use stream clones or pre-generated bundles on the server; cache them on CI workers.
- Enable HTTP/2 or persistent SSH multiplexing; raise the server's max-connections limit.
- Adopt narrow and sparse checkouts for large monorepos.
    # Server: weekly full bundle for CI cache
    hg -R /srv/repo bundle --all --type "none-v2;stream=v2" /srv/cache/repo-full.hg
    # Client: use the cached bundle first
    hg clone /srv/cache/repo-full.hg repo
    hg -R repo pull -u ssh://server/repo
    # Narrow clone (requires extension)
    hg clone --narrow ssh://server/repo repo-narrow -r default --include path:src/
Cause: working-copy corruption or stale locks after crash
Interrupted operations leave journals and locks behind. Anti-virus quarantines files inside .hg.
Fix:
- Run hg recover to roll back the interrupted transaction; delete stale lock files carefully.
- Exclude .hg from anti-virus and backup software; relocate repo to a stable local filesystem.
- Rebuild the working copy if dirstate is inconsistent.
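    hg recover
    # Only after confirming that no hg process is still running:
    rm -f .hg/store/lock .hg/wlock 2>/dev/null
    # Discards uncommitted changes in the working copy
    hg update -C .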
Cause: lock contention from long-running hooks or repacks
Server hooks perform heavy checks (lint, scan, monorepo guardians) during changegroup transactions. Meanwhile CI pushes pile up.
Fix:
- Move heavy validation to asynchronous post-receive pipelines that can revert via backout if needed.
- Tune hook timeouts and reduce scope with targeted checks using revsets.
- Schedule repack/maintenance windows; deploy read replicas for CI pulls.
    # Example: limit hook scope to new revisions only
    # (inside a changegroup hook, HG_NODE and HG_NODE_LAST bound the incoming changesets)
    hg log -r "$HG_NODE:$HG_NODE_LAST"
Cause: subrepo and largefiles drift
Subrepositories pin revisions by value; inconsistent updates cause “works on my machine” builds. Largefiles standins diverge if content store is incomplete.
Fix:
- Gate pushes that modify .hgsub and .hgsubstate without coordinated updates across repos.
- Mirror largefiles stores and verify presence during CI bootstrap.
- Prefer content-addressable artifact stores for big binaries; prune largefiles usage where feasible.
    hg showconfig subrepos subpaths
    hg verify --large
    hg status --large
Step-by-Step Repair Procedures
Repair multiple heads on a named branch
Pick a canonical tip, merge the rest, and publish once.
    # List heads
    hg heads default
    # Choose a base and merge sequentially
    hg update HEAD1
    hg merge HEAD2
    hg commit -m "Merge head2 into default"
    # Repeat as needed, then push
    hg push
To prevent recurrence, enforce hooks that reject new heads and route writes through a single integration bot.
Unhide and reconcile obsolete revisions
When changes appear missing, check phases and obsolescence; convert necessary commits to public carefully.
    hg log --hidden -r "hidden()"
    hg phase --public -r REV
    hg pull
    hg evolve --all   # if evolve is enabled
If markers were never propagated, reconstruct them with debugobsolete and evolve to a stable DAG.
Recover from working-copy or dirstate damage
Always try recovery before destructive operations.
    # Save uncommitted work first; update -C discards it
    hg diff --git > /tmp/patch.diff
    hg recover
    hg update -C .
    # If still broken, rebuild from a clean clone and reapply local changes
    hg clone ssh://server/repo clean
    cd clean
    hg import --no-commit /tmp/patch.diff
    hg commit -m "Reapply local work"
Stabilize CI with bundle caches
Cut clone times by shipping prebuilt bundles to runners and layering incremental pulls on top.
    # Nightly on server
    hg -R /srv/repo bundle --all --type "none-v2;stream=v2" /srv/cache/repo-$(date +%F).hg
    ln -sf /srv/cache/repo-$(date +%F).hg /srv/cache/repo-latest.hg
    # CI job prolog
    hg clone /srv/cache/repo-latest.hg .
    hg pull -u ssh://server/repo
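A CI prolog is more robust if it tolerates a missing or unreadable cache. The sketch below mirrors the job prolog above but falls back to a network clone when the bundle is unavailable. The name bootstrap_clone and the HG override are assumptions for illustration, and the sketch assumes the cached file is a bundle that hg clone can read directly.

```shell
# Clone from a cached bundle when it is readable, otherwise fall back to a
# full network clone; then pull whatever landed after the bundle was built.
# Usage: bootstrap_clone <bundle> <remote-url> <dest>
# (bootstrap_clone is a hypothetical helper, not an hg command.)
bootstrap_clone() (
    bundle=$1
    remote=$2
    dest=$3
    hg=${HG:-hg}
    if [ -r "$bundle" ]; then
        "$hg" clone "$bundle" "$dest" || exit 1
    else
        echo "bundle $bundle not readable, cloning from $remote" >&2
        "$hg" clone "$remote" "$dest" || exit 1
    fi
    # Pull explicitly from the remote: after a bundle clone, the default
    # path points at the bundle file rather than the server.
    "$hg" -R "$dest" pull -u "$remote"
)
```

A job could call bootstrap_clone /srv/cache/repo-latest.hg ssh://server/repo work. After a fallback network clone the final pull is redundant but harmless.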
Resolve subrepo mismatches
Audit subrepo state and update atomically across repos.
    hg status --subrepos
    cat .hgsub .hgsubstate
    hg pull -u
    hg update -C
Lock policy: changes to .hgsub require review by repo owners and synchronized releases.
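One inexpensive gate for such reviews is to confirm that every subrepo declared in .hgsub has a pinned revision in .hgsubstate. The sketch below does that with awk. The name unpinned_subrepos is hypothetical, and the sketch assumes the usual file layouts: "path = source" lines in .hgsub and "revision path" lines in .hgsubstate.

```shell
# Print every subrepo path declared in .hgsub that has no pinned revision
# in .hgsubstate. Usage: unpinned_subrepos <hgsub-file> <hgsubstate-file>
# (unpinned_subrepos is a hypothetical helper, not an hg command.)
unpinned_subrepos() {
    awk '
        # First file argument (.hgsubstate): "<revision> <path>"
        FILENAME == ARGV[1] { sub(/^[^ ]+ /, ""); pinned[$0] = 1; next }
        # Second file argument (.hgsub): skip comments and blank lines
        /^[ \t]*#/ || /^[ \t]*$/ { next }
        {
            path = $0
            sub(/[ \t]*=.*$/, "", path)   # drop " = <source>"
            sub(/^[ \t]+/, "", path)      # drop leading whitespace
            if (!(path in pinned)) print path
        }
    ' "$2" "$1"
}
```

Called as unpinned_subrepos .hgsub .hgsubstate from the repository root, it prints nothing when the two files agree, so a CI step can fail whenever the output is non-empty.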
Performance Engineering and Tuning
Store layout and maintenance
Modernize to formats that reduce delta depth and accelerate lookups.
    # Confirm generaldelta and treemanifest if applicable
    cat .hg/requires
    # Run verify periodically on mirrors
    hg verify
Schedule maintenance to repack or compact stores during off-peak hours; snapshot after verify to create known-good restore points.
Networking and protocol considerations
Use persistent connections, SSH control masters, and HTTP keep-alive. Align server CPU and I/O capacity with concurrent CI traffic.
    # ~/.ssh/config
    Host hg-server
        HostName hg.company.internal
        User hg
        ControlMaster auto
        ControlPath ~/.ssh/cm-%r@%h:%p
        ControlPersist 600
Client-side caches and sparse workflows
Adopt narrow clones for teams that only need subtrees, and leverage hg share to reuse local stores between working copies.
    # Share a store between sandboxes
    hg share /repos/monorepo /work/monorepo-dev
    hg share /repos/monorepo /work/monorepo-experiment
    # Sparse/narrow example
    hg clone --narrow ssh://hg-server/monorepo app-only -r default --include path:services/app
Revsets for targeted operations
Use revsets to limit command scope and avoid scanning the full DAG.
    hg log -r "branch(default) and date('-7')"
    hg grep -r "::tip and not obsolete()" funcName
    hg status --rev "ancestor(., default)" --rev .
Windows-specific tuning
Enable fncache and dotencode, avoid long paths, and disable real-time scanning of .hg. Prefer NTFS with short paths and exclude repo directories from Defender.
Governance, Workflows, and Policy
Named branches for releases, bookmarks for feature work
Named branches create long-lived lines with clear ancestry, ideal for releases and sustained maintenance. Bookmarks serve fast-moving feature work and are easy to delete or move. Codify rules to prevent push-time surprises.
Phases policy
Public history is immutable; draft history may be rewritten. Make the server the authority for public transitions, and ensure CI never publishes by accident.
    # on developer machines
    [phases]
    publish = False
    # on server: publish = True
Hook strategy
Hooks should be fast, deterministic, and idempotent. Reject new heads on protected branches, validate metadata, and defer heavy scanning to asynchronous pipelines.
    [hooks]
    pretxnchangegroup.reject_heads = python:hooks.rejectmultipleheads
    pretxnchangegroup.branch_policy = python:hooks.enforcebranchpolicy
Backouts over force-pushes
When mistakes land in public history, backout creates an explicit corrective commit that preserves auditability. Force-push equivalents undermine reproducibility and confuse replicas.
Observability and Forensics
Server logs and changegroup tracing
Enable structured logs on servers and correlate with CI job IDs. Track changegroup size, head count deltas, and hook timings.
Client tracing
Use HGPLAIN=1 and --debug for reproducible output; capture timings with time and add --traceback to get stack traces during failures.
    HGPLAIN=1 hg pull --debug
    HGPLAIN=1 hg push --debug
Integrity checks
Run hg verify regularly on mirrors and before backups. Verify largefiles stores for completeness; reconcile missing blobs promptly.
    hg verify
    hg verify --large
Disaster Recovery and Safe Restore
Bundle-first backups
Bundles capture a portable snapshot of changesets and are independent of filesystem peculiarities. Store recurring full and incremental bundles offsite.
    # Full bundle
    hg -R /srv/repo bundle --all /backups/repo-full-$(date +%F).hg
    # Incremental since last tag
    hg -R /srv/repo bundle --base "last(tag())" /backups/repo-incr-$(date +%F).hg
Cold restore procedure
Stand up a fresh server, restore from the latest verified bundle, then reopen for incremental pulls.
    hg init /srv/repo
    hg -R /srv/repo unbundle /backups/repo-full-YYYY-MM-DD.hg
    hg -R /srv/repo unbundle /backups/repo-incr-YYYY-MM-DD.hg
    hg -R /srv/repo verify
Post-incident reconciliation
After restore, compare tips, heads, and phases with surviving clones. Recreate missing bookmarks and reapply protected branch policies before accepting new pushes.
Advanced Pitfalls and Mitigations
Case-collision files on case-insensitive filesystems
Two paths differing only by case cause silent chaos on Windows or macOS default mounts. Add pretxnchangegroup checks that reject such changes.
    # Coarse filter: changesets on default that touch paths containing uppercase letters
    hg log -r "file('re:[A-Z]') and branch(default)"
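A revset can narrow down candidate changesets, but detecting an actual collision means comparing paths after case folding. The sketch below groups paths that differ only by letter case. The name find_case_collisions is hypothetical, and awk's tolower handles ASCII reliably but may not fold every non-ASCII character, so treat it as a first-pass check.

```shell
# Read repository paths on stdin and print each group of paths that differ
# only by letter case, one group per line, members separated by tabs.
# (find_case_collisions is a hypothetical helper, not an hg command.)
find_case_collisions() {
    awk '{
        key = tolower($0)
        if (count[key]++) group[key] = group[key] "\t" $0
        else group[key] = $0
    }
    END {
        for (key in count)
            if (count[key] > 1) print group[key]
    }' | sort
}
```

For example, hg files -r tip | find_case_collisions lists colliding paths in the tip revision; a server-side check could run the same filter against incoming revisions and reject the push when the output is non-empty.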
Timestamp skew and dirstate confusion
Mercurial relies on mtimes for fast status checks. Skewed clocks force expensive scans or stale metadata. Normalize NTP and enforce stable clocks on CI workers.
Partial obsolescence marker propagation
Without consistent marker exchange, evolve shows pruned or orphan lines only on some clients. Make evolve a managed, versioned dependency and require minimum versions.
Binary bloat in history
Big binaries inflate clone and pull sizes. Move to artifact repositories; prune or quarantine legacy blobs via narrow history where policy allows.
Best Practices Checklist
Repository and server
- Adopt modern store formats and periodically run hg verify.
- Generate and cache stream bundles for CI and remote sites.
- Harden server hooks to prohibit new heads on protected branches.
- Collect metrics on clone time, changegroup size, and head counts.
Workflow and policy
- Public history is append-only; backout rather than rewrite.
- Use bookmarks for short-lived work; named branches for releases.
- Standardize phases; developers set publish=False locally.
- Gate subrepo changes and largefiles usage.
Operations and reliability
- Exempt .hg from antivirus and backup agents.
- Schedule maintenance windows for heavy server tasks.
- Automate disaster recovery with bundle-based backups and restore drills.
- Document recovery procedures and keep runbooks current.
Conclusion
Mercurial’s design delivers integrity and performance, but large-scale deployments magnify subtle behaviors around phases, obsolescence, locks, and storage. The most effective fixes blend process and technology: enforce smart branch policies, stabilize phases, cache and stream history to hungry CI fleets, and make integrity checks routine. Treat the repository as critical infrastructure with observability, runbooks, and disaster recovery rehearsals. With these practices, you transform recurring fire drills into predictable, low-risk operations while preserving a clean, comprehensible history for your teams.
FAQs
1. How do phases actually prevent accidental history rewrites?
Public changesets are immutable by policy; commands like rebase and histedit refuse to rewrite them. Enforcing server-side phase transitions ensures that once integrated, history becomes stable, reducing accidental divergence.
2. When should we choose named branches over bookmarks in Mercurial?
Use named branches for long-lived release lines that require maintenance over time. Bookmarks fit short-lived or ephemeral feature work where rebasing and rapid iteration are common without polluting the permanent branch namespace.
3. What’s the safest way to speed up clones for CI without sacrificing integrity?
Publish verified stream bundles on the server and clone from those artifacts, then pull increments from the canonical remote. This approach minimizes network round-trips while preserving cryptographic integrity and auditability.
4. How do we recover if the working copy becomes inconsistent after a crash?
Export local diffs first, because a clean update discards them. Then run hg recover and force a clean update with hg update -C. If inconsistencies persist, reclone from a trusted remote and reimport the saved diffs to ensure a pristine dirstate.
5. Can we safely use evolve and obsolescence in a heterogeneous toolchain?
Yes, but only with disciplined version management and marker propagation policies. Require minimum client versions, test marker exchange in staging, and verify that CI and bots participate fully in the obsolescence protocol.