MultiHub Forum

Full Version: Seeking strategies to debug intermittent segfaults under heavy load in multi-threade
I'm working on a C++ application that's been running fine for months, but after a recent update to a third-party library, I'm now getting a sporadic segmentation fault that only occurs under heavy load. The core dump points to a memory address inside the new library, but I can't tell if it's a bug in their code or if my application is somehow corrupting the heap before the call. What's the best strategy to isolate the root cause? Should I focus on using Valgrind or AddressSanitizer, and are there specific techniques for debugging these kinds of intermittent faults in a multi-threaded environment where the stack trace isn't always reliable?
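Plan of attack: reproduce the crash under controlled conditions, then use sanitizers to pin down the fault. A practical sequence would be: (1) rebuild your application with AddressSanitizer and UndefinedBehaviorSanitizer (-fsanitize=address,undefined) plus -g and -fno-omit-frame-pointer, (2) run a stress test that mimics the heavy load, (3) if you suspect data races, make a separate build with ThreadSanitizer (-fsanitize=thread), since TSan cannot be combined with ASan in the same binary, (4) if it still crashes, capture a core dump and inspect it with gdb, using "thread apply all bt" to see every thread's stack rather than only the faulting one, (5) if ASan flags nothing but the fault persists, run Valgrind Memcheck as a secondary check on the suspect subsystem. Memcheck also covers the prebuilt library code that ASan cannot instrument unless you recompile it, but it runs threads one at a time and is much slower, so the changed timing may hide the fault.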
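For step (2), a stress harness can be sketched roughly like this. It is only a minimal sketch: run_stress and the work callback are made-up names for illustration, and the callback stands in for whichever call into the third-party library shows up in your core dump. All threads are released at the same moment to maximize contention.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical harness (not part of any library): calls `work` from
// `num_threads` threads, `iterations` times per thread, and returns the
// number of calls that completed. `work` receives (thread index, iteration).
//
// Possible builds, assuming GCC or Clang and a file named stress.cpp:
//   g++ -std=c++17 -g -O1 -fno-omit-frame-pointer -fsanitize=address,undefined stress.cpp -pthread
//   g++ -std=c++17 -g -O1 -fno-omit-frame-pointer -fsanitize=thread stress.cpp -pthread
std::size_t run_stress(unsigned num_threads, std::size_t iterations,
                       const std::function<void(unsigned, std::size_t)>& work) {
    assert(num_threads > 0);
    std::atomic<bool> go{false};            // start gate shared by all workers
    std::atomic<std::size_t> completed{0};  // calls that returned normally
    std::vector<std::thread> threads;
    threads.reserve(num_threads);

    for (unsigned t = 0; t < num_threads; ++t) {
        threads.emplace_back([&, t] {
            // Spin until every worker has been created, so they start together.
            while (!go.load(std::memory_order_acquire)) {
                std::this_thread::yield();
            }
            for (std::size_t i = 0; i < iterations; ++i) {
                work(t, i);  // replace with the suspect library call
                completed.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }

    go.store(true, std::memory_order_release);  // open the gate
    for (auto& th : threads) {
        th.join();
    }
    return completed.load();
}
```

You would call it from main with something like run_stress(16, 100000, [](unsigned, std::size_t) { /* call into the library here */ }); and raise the thread count past your core count if the crash needs oversubscription to show up. If the fault only appears with certain inputs, feed the callback recorded production requests instead of synthetic data.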