Background and Context
What "async-signal-unsafe" actually means
When a signal arrives, the kernel can divert execution into the handler at an arbitrary instruction in whichever thread receives it, potentially while that thread holds locks or is inside a non-reentrant library routine. POSIX defines a small set of functions that are async-signal-safe; if a handler interrupts an unsafe function and then calls any function outside that set, the behavior is undefined. Commonly unsafe calls include malloc, free, printf, pthread_mutex_lock, dlopen/dlsym, and many time-formatting functions such as localtime and strftime. Violations may appear to "work" under light load but tend to deadlock or corrupt state eventually in long-running services.
Why this matters in enterprise environments
Large services frequently use signals for lifecycle (SIGTERM), watchdogs (SIGALRM), hot reload (SIGHUP), crash handling (SIGSEGV/SIGBUS), CPU sampling (SIGPROF), and external control (SIGUSR1/2). Teams also layer complex logging, metrics, and tracing around those handlers. If handler code allocates memory, writes to buffered streams, or touches non-reentrant subsystems, it can deadlock the whole process or corrupt global state. The result is sporadic outages, inconsistent core dumps, and difficult-to-reproduce "Heisenbugs".
Typical symptoms
- Process hangs that coincide with log rotation or shutdown signals
- Core dumps missing expected stack frames or showing corrupted malloc arenas
- Deadlocks involving libpthread or libc internals shortly after SIGTERM/SIGHUP
- High CPU from a spin inside a handler attempting to acquire a locked mutex
- Occasional "double free or corruption" after a crash handler "prints diagnostics"
Architectural Implications
Signals, threads, and allocators
In multi-threaded programs, a process-directed signal can be delivered to any thread that does not block it, while synchronous faults such as SIGSEGV are delivered to the thread that caused them. If a signal fires while the target thread is within malloc or free, and the handler calls malloc/free again (directly or indirectly through logging), allocator internal locks may deadlock. Different allocators (glibc ptmalloc, jemalloc, tcmalloc) have distinct locking strategies, but all are unsafe to re-enter from handlers.
Buffered I/O and stdio locks
Calling printf, fprintf, or puts in a handler attempts to take stdio locks and may flush or allocate buffers. If the interrupted thread already holds the stream's lock, the handler can block forever; where the implementation makes that lock recursive, the handler instead re-enters a half-updated buffer and can corrupt stream state. Even seemingly harmless formatting (e.g., snprintf) can allocate or invoke locale/memory routines.
Dynamic linking and loader state
Attempting dlopen, dlsym, or even obtaining backtraces that touch unwinder state (libunwind, backtrace()) is typically unsafe in a signal context. The dynamic loader might be in a transient state when interrupted.
Process lifecycle and supervision
Enterprise platforms rely on systemd, Kubernetes, or custom supervisors. Signals are used extensively: graceful termination (SIGTERM), forced kill (SIGKILL), hot reloads (SIGHUP), and watchdog timers. An unsafe handler breaks graceful semantics and may cause timeouts, cascading restarts, or data loss during shutdown.
Foundations: The minimal-safe signal pattern
Design tenets
- Handlers must be minimal, returning quickly.
- Only call POSIX async-signal-safe functions (e.g., write, _exit, kill, sigaction, sigqueue).
- Never call malloc, stdio, pthread_mutex_lock, or perform complex parsing.
- Communicate with a dedicated thread via a pipe/eventfd/signalfd rather than doing work in the handler.
- Use sigaltstack for fatal signals to survive stack overflows.
Safe communication via a pipe
A classic, portable pattern is a self-pipe: the handler writes a byte to a non-blocking pipe; the main event loop or a dedicated thread waits on the read end with select, poll, or epoll and performs the heavy work in normal thread context.

    #include <signal.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <errno.h>
    #include <stdatomic.h>

    static int sigpipe_fds[2] = {-1, -1};
    static _Atomic int got_sigterm = 0;

    static int set_nonblock(int fd) {
        int flags = fcntl(fd, F_GETFL, 0);
        if (flags == -1) return -1;
        return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
    }

    static void term_handler(int signo) {
        int saved_errno = errno;   /* write() may clobber errno */
        (void)signo;
        got_sigterm = 1;
        char c = 1;
        /* async-signal-safe: write */
        (void)write(sigpipe_fds[1], &c, 1);
        errno = saved_errno;       /* restore for the interrupted code */
    }

    int install_handlers(void) {
        if (pipe(sigpipe_fds) == -1) return -1;
        if (set_nonblock(sigpipe_fds[0]) == -1) return -1;
        if (set_nonblock(sigpipe_fds[1]) == -1) return -1;
        struct sigaction sa = {0};
        sa.sa_handler = term_handler;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = SA_RESTART;
        if (sigaction(SIGTERM, &sa, NULL) == -1) return -1;
        return 0;
    }

    /* elsewhere: poll/epoll/select on sigpipe_fds[0] and act on got_sigterm */
Diagnostic Process
1) Reproduce under controlled delivery
Use kill -TERM and kill -HUP while the service is under load. Combine with strace -f -ttT -e trace=signal,write,read,futex to observe handler activity and verify that only async-signal-safe syscalls appear between signal delivery and the handler's return (rt_sigreturn). If you see futex calls in that window, typically from libc locks, it is a red flag for unsafe code.
2) Inspect core dumps for allocator lock contention
Enable core dumps and use gdb to inspect backtraces of all threads. Look for frames inside malloc/free/_int_malloc (glibc) or equivalent in jemalloc/tcmalloc. If a handler interrupted the allocator and then attempted to allocate, you will often find one thread inside a signal trampoline with the next frame in write (safe) or worse, in fprintf/vsnprintf (unsafe). Multiple threads parked in futex wait states within libc can indicate deadlocks.
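Enabling core dumps from inside the service can be sketched as follows. This is a minimal sketch, assuming a Linux or other POSIX system; the function name enable_core_dumps is hypothetical. It only raises the soft limit to the hard limit, so whether a core file is actually written still depends on the hard limit, the kernel's core_pattern, and any supervisor settings.

```c
#include <sys/resource.h>

/* Hypothetical helper: raise the soft core-file size limit to the hard limit.
 * Returns 0 on success, -1 on failure (errno is set by getrlimit/setrlimit). */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) == -1)
        return -1;
    rl.rlim_cur = rl.rlim_max;  /* an unprivileged process cannot exceed the hard limit */
    return setrlimit(RLIMIT_CORE, &rl);
}
```

Calling this once at startup, before worker threads exist, keeps the behavior predictable; if the hard limit is 0, the call succeeds but no core will be produced.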
3) Audit handlers with static and dynamic analysis
- Search code for sigaction, signal, and handler functions. Verify that handlers only call async-signal-safe functions.
- Use compiler instrumentation: -fsanitize=address, -fsanitize=thread to catch races outside the handler path; while sanitizers are not signal-safe themselves, they can expose suspicious usage when a handler triggers code that should not run.
- Review link maps and nm output if your handler references nontrivial symbols (e.g., logging library entry points).
4) Trace delivery and masking
Confirm which thread receives signals. In multithreaded systems, set a mask with pthread_sigmask to block signals in worker threads and create a dedicated "signal thread" using sigwaitinfo or signalfd on Linux. This confines complex behavior to a safe context and removes randomness from delivery.
5) Validate alt signal stacks for fatal signals
Memory corruption and stack overflow can prevent handlers from running if they share the default stack. Use sigaltstack to register an alternate stack for SIGSEGV/SIGBUS/SIGFPE handlers that need to do minimal crash reporting. Make sure any stack you use is mapped and writeable early at startup.
Common Pitfalls
Logging inside handlers
Using printf, fprintf, syslog, or C++ stream wrappers from a handler is unsafe. Even simple string formatting can acquire locks or allocate memory. Prefer write(STDERR_FILENO, ...) with preformatted, fixed-size buffers, or better, signal a logging thread through a pipe.
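When a handler has to include a number (for example the signal number) in its message, the formatting can be done by hand into a fixed buffer instead of through snprintf. The following is a minimal sketch; fmt_ulong and safe_log_ulong are hypothetical names, and the only library call made is write.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: format v as decimal into buf (capacity cap) without
 * stdio or allocation. Returns the number of characters written, or 0 if buf
 * is too small. No terminating NUL is added; callers pass the length on. */
size_t fmt_ulong(char *buf, size_t cap, unsigned long v)
{
    char tmp[3 * sizeof(unsigned long)];  /* enough digits for any unsigned long */
    size_t n = 0;

    do {                                  /* digits come out least significant first */
        tmp[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);

    if (n > cap)
        return 0;
    for (size_t i = 0; i < n; i++)        /* reverse into the caller's buffer */
        buf[i] = tmp[n - 1 - i];
    return n;
}

/* Hypothetical helper: emit "<prefix><value>\n" to fd with a single write().
 * Best effort: the result of write() is ignored. */
void safe_log_ulong(int fd, const char *prefix, size_t prefix_len, unsigned long v)
{
    char line[96];
    size_t n = 0;

    if (prefix_len > sizeof line - 32)    /* leave room for digits and newline */
        prefix_len = sizeof line - 32;
    for (size_t i = 0; i < prefix_len; i++)
        line[n++] = prefix[i];
    n += fmt_ulong(line + n, sizeof line - n - 1, v);
    line[n++] = '\n';

    ssize_t rc = write(fd, line, n);
    (void)rc;
}
```

Inside a real handler, errno should be saved on entry and restored on exit, because write can overwrite the value the interrupted code was about to inspect.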
Calling malloc/free or functions that allocate
Any function that might allocate (directly or indirectly) is unsafe. That includes backtrace libraries, high-level time formatting, JSON builders, or metrics emission. If you cannot prove it is on the POSIX async-signal-safe list, assume it is unsafe.
Mutexes and critical sections
Handlers that attempt to acquire a pthread_mutex_t or pthread_rwlock_t can deadlock if the interrupted context was holding the same lock. Spinlocks are not safe either; you may spin forever if the interrupted context was inside the critical section.
Relying on SA_RESTART to "fix" I/O patterns
SA_RESTART causes some syscalls to restart automatically after handler return, but it does not make handler code safe. In fact, SA_RESTART can hide EINTR handling bugs and complicate debugging. Design your I/O to handle EINTR gracefully regardless.
Crash handlers that do too much
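It is tempting to collect a backtrace, flush logs, and upload diagnostics in a SIGSEGV handler. Most of that is unsafe. A robust pattern is to write a minimal marker to a pipe or to stderr and then terminate immediately. Note that _exit ends the process without a core dump; if a supervising process is expected to analyze a core dump asynchronously, restore the default disposition and re-raise the signal instead so that the kernel writes one.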
Step-by-Step Fixes
1) Centralize signal ownership with a signal thread
Block all relevant signals in worker threads using pthread_sigmask, then dedicate one thread to synchronously wait for signals via sigwaitinfo. This moves handling logic to a normal thread context where you can use locks, allocators, and logging safely.
    #define _GNU_SOURCE
    #include <signal.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    static void *signal_thread_fn(void *arg) {
        (void)arg;
        sigset_t set;
        sigemptyset(&set);
        sigaddset(&set, SIGTERM);
        sigaddset(&set, SIGHUP);
        sigaddset(&set, SIGUSR1);
        for (;;) {
            siginfo_t si;
            int sig = sigwaitinfo(&set, &si);
            if (sig == -1) continue;   /* e.g., EINTR: wait again */
            /* normal thread context: safe to use stdio, locks, allocation */
            if (sig == SIGTERM) {
                /* perform graceful shutdown */
                printf("Received SIGTERM, shutting down...\n");
                break;
            } else if (sig == SIGHUP) {
                printf("Reload requested via SIGHUP\n");
            } else if (sig == SIGUSR1) {
                printf("Diagnostics requested via SIGUSR1\n");
            }
        }
        return NULL;
    }

    void install_signal_thread(void) {
        sigset_t set;
        sigemptyset(&set);
        sigaddset(&set, SIGTERM);
        sigaddset(&set, SIGHUP);
        sigaddset(&set, SIGUSR1);
        /* block in all threads before creating workers */
        pthread_sigmask(SIG_BLOCK, &set, NULL);
        pthread_t tid;
        pthread_create(&tid, NULL, signal_thread_fn, NULL);
    }
2) Use signalfd on Linux for integration with event loops
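signalfd delivers signals as fixed-size struct signalfd_siginfo records on a file descriptor that can be polled like a socket. The signals must be blocked first so that they stay pending instead of running a handler. This simplifies integration with epoll/select-based reactors and avoids installing traditional handlers for non-fatal signals.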
    #include <sys/signalfd.h>
    #include <sys/epoll.h>
    #include <signal.h>
    #include <unistd.h>

    int setup_signalfd(void) {
        sigset_t mask;
        sigemptyset(&mask);
        sigaddset(&mask, SIGTERM);
        sigaddset(&mask, SIGHUP);
        pthread_sigmask(SIG_BLOCK, &mask, NULL);
        int sfd = signalfd(-1, &mask, SFD_NONBLOCK | SFD_CLOEXEC);
        return sfd;
    }
3) Implement minimal fatal-signal handling with sigaltstack
For SIGSEGV/SIGBUS/SIGFPE/SIGILL, do the bare minimum: write a short, fixed message and call _exit. If you must dump a backtrace, prefer invoking a separate helper process via fork (still risky) or rely on core dumps and external tooling (e.g., systemd-coredump); because _exit suppresses the core dump, a handler that relies on core dumps should restore the default disposition and re-raise the signal instead. Install an alternate stack so the handler runs even when the main stack is invalid.

    #include <signal.h>
    #include <unistd.h>

    static char altstack_mem[64 * 1024];

    static void fatal_handler(int signo, siginfo_t *si, void *ctx) {
        (void)si;
        (void)ctx;
        /* Fixed, preformatted message: avoid stdio, malloc */
        static const char msg[] = "Fatal signal\n";
        /* best effort; ignore errors */
        (void)write(STDERR_FILENO, msg, sizeof(msg) - 1);
        /* _exit terminates without running atexit handlers or dumping core */
        _exit(128 + signo);
    }

    void install_fatal_handlers(void) {
        stack_t ss = {0};
        ss.ss_sp = altstack_mem;
        ss.ss_size = sizeof(altstack_mem);
        ss.ss_flags = 0;
        sigaltstack(&ss, NULL);

        struct sigaction sa = {0};
        sa.sa_sigaction = fatal_handler;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
        sigaction(SIGSEGV, &sa, NULL);
        sigaction(SIGBUS, &sa, NULL);
        sigaction(SIGFPE, &sa, NULL);
        sigaction(SIGILL, &sa, NULL);
    }
4) Replace unsafe logging paths
Remove all stdio-based logging from handlers. If you must log synchronously, pre-allocate a ring buffer at startup and only use write with fixed offsets in the handler. Better yet, send a single byte to a pipe and let the logging thread emit a structured message.
    #include <unistd.h>

    /* Handler-side: */
    static int log_pipe[2];   /* created with pipe() at startup */

    static void hup_handler(int signo) {
        (void)signo;
        char b = 1;
        (void)write(log_pipe[1], &b, 1);
    }

    /* Logger thread: assumes a blocking read end */
    void *logger_thread(void *arg) {
        (void)arg;
        for (;;) {
            char b;
            if (read(log_pipe[0], &b, 1) <= 0) continue;
            /* now safe to log */
        }
    }
5) Harden EINTR handling and restart logic
Make all I/O paths resilient to EINTR. Wrap syscalls with retry loops unless semantics demand interruption (e.g., timeouts). If you depend on SA_RESTART, document it explicitly and test without it to ensure robustness.
    #include <errno.h>
    #include <unistd.h>

    /* read() that retries when interrupted by a signal */
    ssize_t r_read(int fd, void *buf, size_t n) {
        for (;;) {
            ssize_t r = read(fd, buf, n);
            if (r >= 0) return r;
            if (errno == EINTR) continue;
            return -1;
        }
    }

    /* write() all n bytes, retrying on EINTR and short writes */
    ssize_t r_write(int fd, const void *buf, size_t n) {
        const char *p = buf;
        size_t left = n;
        while (left) {
            ssize_t r = write(fd, p, left);
            if (r > 0) { p += r; left -= (size_t)r; continue; }
            if (r == -1 && errno == EINTR) continue;
            return -1;
        }
        return (ssize_t)n;
    }
6) Test with fault injection and stress tools
Integrate tests that bombard the process with signals while running high load. Combine stress-ng or internal load generators with kill -USR1/-HUP/-TERM storms. Verify that logs contain a stable sequence of "signal received" events and that no deadlocks occur.
Deep Dive: Interactions that amplify risk
Allocators and reentrancy
glibc's ptmalloc and alternative allocators use internal mutexes and per-thread caches. If a handler interrupts a critical section and then allocates, it can deadlock on the same lock or corrupt internal lists. Even if your handler appears to avoid allocation, hidden allocations can occur within locale handling, DNS resolution, and logging frameworks.
Backtrace collection pitfalls
backtrace() from execinfo.h may allocate and is not guaranteed async-signal-safe. On some systems it works "well enough"; on others, it deadlocks. Prefer external core dumps (/proc/sys/kernel/core_pattern, systemd-coredump) and offline symbolization with addr2line or eu-addr2line. If you insist on in-process stack capture, consider forking a child in the handler that execs an external helper to inspect the parent. Between fork and exec/_exit the child may only call async-signal-safe functions, which rules out calling backtrace() in the child itself.
Realtime signals and queuing
Realtime signals (SIGRTMIN..SIGRTMAX) queue with payloads and can overwhelm your process if not serviced quickly. If handlers perform expensive work, the queue backs up and latency spikes. Use sigqueue judiciously and prefer eventfd or pipes for high-rate signaling.
setuid binaries and security constraints
In privileged programs the stakes are higher: an attacker who can time signal delivery may be able to turn async-signal-unsafe handler behavior into memory corruption, so such bugs can be exploitable rather than merely destabilizing. Keep handlers minimal and avoid exposing reentrancy windows that could be leveraged by malicious inputs.
Systematic Audit Checklist
Source review
- Enumerate all sigaction/signal sites and documented signals.
- For each handler, list every function call. Cross-check against the POSIX async-signal-safe list.
- Replace non-compliant calls with pipe/signalfd notifications.
Runtime verification
- Run under strace with signal filters; ensure only expected syscalls occur inside handler windows.
- Use perf or bcc/eBPF tooling to sample when handlers fire and observe contention.
- Capture core dumps for any hang or crash and inspect for libc/allocator lock owner/waiter relationships.
Operations readiness
- Document signal behavior for SREs (which signals do what, how long shutdown should take).
- Ensure systemd or orchestrator timeouts reflect real shutdown durations.
- Provide debugging switches to disable noncritical signals in emergencies.
Patterns to Adopt
Self-pipe for delivery fan-in
One pipe can represent many signals by writing distinct bytes or small structs. Parsing occurs in the main loop. This harmonizes with reactor frameworks (libevent, libuv) and avoids handler complexity.
Dedicated signal thread
Blocking signals everywhere and using one thread with sigwaitinfo is often the cleanest design for portable services. It keeps the rest of the code unaware of asynchronous delivery, improving reasoning and testability.
signalfd for Linux-centric stacks
If your platform target is Linux-only, signalfd integrates best with epoll. It also renders many handler-related pitfalls irrelevant because the "handler" path becomes straightforward file I/O in a normal thread.
Minimal fatal crash path
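For SIGSEGV and other fatal conditions: write a short message, snapshot a tiny amount of state if you must (e.g., a numeric code), then terminate. Use _exit when no core dump is needed; when relying on core dumps for full diagnostics, restore the default disposition and re-raise the signal, since _exit does not produce a core.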
Anti-Patterns to Eliminate
Logging libraries from handlers
Even libraries that claim to be "async" often allocate or lock. Unless explicitly designed for signal safety (rare), remove them from handlers.
Complex reload logic in SIGHUP handlers
Re-parsing configuration in the handler tends to allocate, use stdio, and take locks. Move reload logic to a signal thread or main loop triggered by a pipe.
Calling pthread APIs from handlers
pthread_create, pthread_mutex_lock, and friends are not async-signal-safe. Avoid entirely.
Performance Considerations
Latency and spikes
Handlers should add near-zero latency. Any nontrivial work risks tail-latency spikes if it hits contention or preemption at unfortunate times. Moving work out of the handler flattens latency distributions.
Throughput and signal storms
In high-throughput systems, signals can arrive faster than they are consumed. Pipes/signalfd and dedicated threads scale better than per-signal heavy handlers. For periodic jobs (e.g., watchdogs), consider timerfd rather than SIGALRM.
NUMA and scheduler interactions
Handlers can interrupt hot paths on any CPU, perturbing cache locality. Centralized handling reduces cross-core noise and can improve cache behavior in hot loops.
Observability: getting the right signals about signals
Lightweight counters
Use volatile sig_atomic_t or lock-free C11 atomics to track counts of received signals inside handlers; atomic types that are not lock-free may take an internal lock and are therefore not safe in a handler. Export the counts periodically from a normal thread. Avoid updating complex metrics frameworks directly in handlers.

    #include <stdatomic.h>

    static _Atomic unsigned long sigterm_count = 0;

    static void term_handler(int s) {
        (void)s;
        atomic_fetch_add_explicit(&sigterm_count, 1, memory_order_relaxed);
    }

    /* elsewhere: read sigterm_count and publish via normal metrics path */
Tracing
Consider eBPF uprobes or perf-events to trace sigaction and signal delivery in staging. Keep in mind that tracing frameworks themselves may interact with signal delivery; validate overhead.
Case Study: Hang on shutdown due to unsafe logging
Symptom
A service occasionally hung during graceful shutdown. "kill -TERM" would log "Shutting down", then freeze. Core showed one thread inside fprintf and another holding a stdio lock within a hot path.
Root cause
The SIGTERM handler called fprintf to print a message. When the signal was delivered to a thread that was itself interrupted inside fprintf on the same FILE*, the handler deadlocked on the stdio lock.
Fix
Replaced handler with self-pipe notification and moved logging to the main loop. Added a signal thread to coordinate shutdown and ensure idempotent cleanup. No further hangs under stress.
Step-by-Step Migration Plan
1) Inventory and classify signals
List all signals your process receives and categorize them as "control" (TERM/HUP/USR1) vs "fatal" (SEGV/BUS/FPE/ILL) vs "periodic" (ALRM/PROF). Define appropriate handling strategies for each.
2) Introduce a signal thread and block signals elsewhere
Adopt pthread_sigmask globally during startup and create a single handler thread. Integrate with your event loop or orchestrator semantics.
3) Replace handler code with pipe/eventfd signaling
Where handlers existed, keep only a minimal write to a pipe. Migrate any real logic to the main loop or signal thread.
4) Add altstack for fatal signals, minimize work
Install sigaltstack and ensure the fatal handler does not allocate or lock. Use core dumps for full diagnostics.
5) Harden EINTR and shutdown paths
Audit all blocking syscalls for EINTR safety. Write tests that deliver signals during I/O to verify robustness.
6) Document operational semantics
Publish runbooks explaining how signals are handled, expected shutdown timing, and safe ways to trigger reloads or diagnostics.
Best Practices
Design principles
- Handlers do not allocate, lock, or perform I/O beyond write.
- All complex work happens in normal thread context.
- Prefer signalfd (Linux) or sigwaitinfo for control signals.
- Use sigaltstack for fatal signals and exit quickly.
- Keep configuration reloads and shutdown cleanups out of handlers.
Tooling and verification
- strace to verify safe syscalls in handlers
- gdb on core dumps to detect allocator/stdio lock deadlocks
- Stress tests with signal storms
- eBPF/perf to observe delivery and latency impact
Team practices
- Code reviews require checking handler safety against the POSIX list
- Runbooks document signals and timeouts
- CI pipeline runs signal-interrupt tests alongside normal suites
Conclusion
Async-signal-safety is one of the least discussed yet most pernicious sources of instability in large C services. Production outages often stem from well-intentioned but unsafe logging, allocation, or synchronization inside signal handlers. The durable fix is architectural: ensure handlers are minimal and delegate substantial work to normal thread contexts via pipes, signalfd, or a dedicated signal thread. Combine that with alt stacks for fatal signals, robust EINTR handling, and disciplined reviews. With these patterns, enterprises can eliminate signal-induced deadlocks and crashes, delivering systems that shut down cleanly, reload configuration safely, and produce reliable diagnostics when it matters most.
FAQs
1. Can I safely log from a signal handler using write()?
Yes, write is async-signal-safe when used on a valid, open file descriptor. Keep messages short, preformatted, and avoid dynamic formatting or buffer growth. Prefer signaling a logging thread to emit structured logs.
2. Are backtrace() and dladdr() safe to call in a handler?
No. They may allocate, take locks, or traverse loader state. Rely on core dumps for full stack traces, or offload diagnostics to a dedicated thread triggered by a pipe or signalfd event.
3. How do I test for handler safety systematically?
Combine code review against the POSIX async-signal-safe list with strace under signal stress. Ensure only safe syscalls appear during handler windows and that no handlers call library functions outside the list.
4. Is SA_RESTART a substitute for proper EINTR handling?
No. SA_RESTART helps some syscalls but does not make unsafe handler code safe. Design your I/O to tolerate EINTR explicitly and keep handlers minimal.
5. When should I choose signalfd over sigwaitinfo?
On Linux, use signalfd when integrating with epoll-based event loops so signals behave like normal file descriptors. Use sigwaitinfo when you prefer a portable, thread-based waiting model without extra file descriptors.