Mac OS X Sierra and Ruby debugging: An unhappy marriage
In my new role I've found myself having to debug Ruby native gems again. Debugging C code is normally a pretty mundane task involving lots of poking around with
gdb, and one I've done many times before in a previous life. However, my employer's chosen development machines are MacBooks and I recently updated mine to OS X 10.12.2 (Sierra). With this new Mac OS version things started to unravel pretty quickly.
Developing on Mac OS X for deployment on Linux is usually quite transparent thanks to
Homebrew. Typically, the steps to debug a native Ruby gem look like this:
crubyfrom source in debug mode (not strictly necessary but I find it helpful when learning about Ruby's APIs since you get info about args, locals, etc):
brew install openssl libyaml libffi CXXFLAGS="-O0 -g3" CFLAGS="-O0 -g3" ./configure --disable-install-doc --with-openssl-dir="$(brew --prefix openssl)" --prefix=/Users/Stefan/bin/ruby make -j install
- Build your gem in debug mode too (this can be done by editing
debugflagsappropriately, see here for more helpful tips):
CONFIG['optflags'] = '-O0 -g3'
Followed by building:
rake clean && rake build
- Start your test under the debugger:
gdb --args /Users/Stefan/bin/ruby/bin/ruby -I"lib" -I"/Users/Stefan/bin/gems/gems/rake-10.5.0/lib" "/Users/Stefan/bin/gems/gems/rake-10.5.0/lib/rake/rake_test_loader.rb" "test/test_my_gem.rb"
Unfortunately it turns out the OS X Sierra has multiple known issue with
Homebrew currently cherry-picks an upstream patch submitted on November 9th, 2016 to even have
gdb start up. I went through the dance, including codesigning (
codesign -fs gdb_codesign `which gdb`). I was then met with:
warning: unhandled dyld version (15)
info shared command did not work and my gem was not being picked up. Some investigation showed that Apple has updated dyld to version 15, which is not yet supported by
gdb. I bumped the
DYLD_VERSION_MAX as had been done for version 14 and built my own gdb from the latest source:
CXXFLAGS="-Wno-c++11-narrowing -O0 -g3" CFLAGS="-Wno-c++11-narrowing -O0 -g3" ./configure --enable-debug --disable-build-warnings --prefix=/Users/Stefan/Desktop/gdb-patched-bin
This time I got a bit further:
Cannot insert breakpoint -1 Cannot access memory at address 0xe85d
Investigation showed that
0xe85d is the offset of the
gdb_image_notifier function in
gdb transparently set a breakpoint on this function in order to intercept dynamic libraries being loaded. After some poking around for a
readelf equivalent for Mac OS' Mach-O format I came upon
otool and borrowed another laptop running Mac OS X 10.11.6 (El Capitan) for comparison purposes:
Mac OS X 10.11.6 (El Capitan):
otool -l /usr/lib/dyld Mach header magic cputype cpusubtype caps filetype ncmds sizeofcmds flags 0xfeedfacf 16777223 3 0x00 7 14 1696 0x00000085 Load command 0 cmd LC_SEGMENT_64 cmdsize 552 segname __TEXT vmaddr 0x00007fff5fc00000 vmsize 0x0000000000038000 fileoff 0 filesize 229376 maxprot 0x00000007 initprot 0x00000005 nsects 6 flags 0x0
Mac OS X 10.12.2 (Sierra):
otool -l /usr/lib/dyld /usr/lib/dyld: Mach header magic cputype cpusubtype caps filetype ncmds sizeofcmds flags 0xfeedfacf 16777223 3 0x00 7 14 1696 0x00000085 Load command 0 cmd LC_SEGMENT_64 cmdsize 552 segname __TEXT vmaddr 0x0000000000000000 vmsize 0x000000000003e000 fileoff 0 filesize 253952 maxprot 0x00000007 initprot 0x00000005 nsects 6 flags 0x0
It became apparent that
vmaddr has changed and that
gdb is now attempting to set breakpoints at a raw offset instead of the final relocated value. This theory is corroborated by the output of the Mac
Process: gdb  Path: /Users/Stefan/Desktop/gdb-patched-bin/bin/gdb Load Address: 0x100000000 ... ==== Non-writable regions for process 91820 REGION TYPE START - END [ VSIZE] PRT/MAX SHRMOD REGION DETAIL __TEXT 0000000100000000-000000010064e000 [ 6456K] r-x/rwx SM=COW /Users/Stefan/Desktop/gdb-patched-bin/bin/gdb __LINKEDIT 000000010077a000-0000000100a91000 [ 3164K] r--/rwx SM=COW /Users/Stefan/Desktop/gdb-patched-bin/bin/gdb __TEXT 0000000100a91000-0000000100acf000 [ 248K] r-x/rwx SM=COW /usr/lib/dyld __LINKEDIT 0000000100b06000-0000000100b1c000 [ 88K] r--/rwx SM=COW /usr/lib/dyld ...
Some digging indicates this change my have to do with ASLR:
Slide can be computed by the difference between running __TEXT vmaddr and the one read from disk.
I opened a bug against
gdb and hacked up a crappy fix. While
gdb now starts up properly and
info shared behaves as expected I was still unable to set breakpoints in my gem.
Error in re-setting breakpoint 1: Cannot access memory at address 0x3fe0
I added some more debug printf lines to my
gdb and found that some object files seem to have the wrong memory addresses embedded in their
DWARF2 debug info at
Base addr 0x100001060 for object file: /Users/Stefan/src/ruby/main.o
Base addr 0x3fe0 for object file: /Users/Stefan/src/my_gem/tmp/x86_64-darwin16/my_gem/2.3.3/my_gem.o
This finding indicates that there may be something more fundamentally wrong about relocations on Sierra and
gdb 7.12. I decided to give
lldb a try and found that all my forked children in the test would fail instantly. I enabled core dumps with
ulimit -c unlimited and found the the following:
lldb `which ruby` -c /cores/core.83458 (lldb) target create "/Users/Stefan/bin/ruby/bin/ruby" --core "/cores/core.83458" (lldb) bt * thread #1: tid = 0x0000, 0x000000010040885e dyld`gdb_image_notifier(dyld_image_mode, unsigned int, dyld_image_info const*) + 1, stop reason = signal SIGSTOP * frame #0: 0x000000010040885e dyld`gdb_image_notifier(dyld_image_mode, unsigned int, dyld_image_info const*) + 1 (lldb) register read General Purpose Registers: rax = 0x0000000100438000 dyld`dyld_all_image_infos rbx = 0x0000000000000001 rcx = 0x0000000066a314de rdx = 0x00007fff5fbf07c0 rdi = 0x0000000000000000 rsi = 0x0000000000000001 rbp = 0x00007fff5fbf05b0 rsp = 0x00007fff5fbf0588 r8 = 0x000000000000000f r9 = 0x0000000000000000 r10 = 0x000000000000002c r11 = 0x0000000000000206 r12 = 0x00007fff5fbf07c0 r13 = 0x0000000000000014 r14 = 0x00007fff5fbf07c0 r15 = 0x00001ed1d63963b8 rip = 0x000000010040885e dyld`gdb_image_notifier(dyld_image_mode, unsigned int, dyld_image_info const*) + 1 rflags = 0x0000000000000246 cs = 0x000000000000002b fs = 0x0000000000000000 gs = 0x0000000000000000
gdb as well (with the semi-broken patch I made) and got a similar core dump. (In fact worse, since the
0x0 and Ruby's own signal handler would attempt to walk backwards to find a stack trace and would proceed to
SIGSEGV - I'm still not sure why that happened, but
gdb seemed to die in the same place and with the same
A bit of searching came upon an interesting post:
lldb sets a breakpoint on gdb_image_notifier() in order to catch libraries being loaded ... apparently the debugger did not attach to the new process but incorrectly left its breakpoint active there. When the breakpoint is hit again without the debugger waiting to handle it, the process crashes.
Disassembling some instructions in the core file shows a very suspicious INT 3 which would definitely cause the crash outside a debugger:
(lldb) disassemble --start-address 0x000000010040885e-0x1 -c 100 dyld`gdb_image_notifier: 0x10040885d <+0>: int3 -> 0x10040885e <+1>: movq %rsp, %rbp ...
So while this sounds like a plausible theory, I'm not sure how this mechanism would ever work since
lldb does not support
follow-fork-mode child and in fact
gdb only supports it on Linux:
On some systems, gdb provides support for debugging programs that create additional processes using the fork or vfork functions. On gnu/Linux platforms, this feature is supported with kernel version 2.5.46 and later.
Unfortunately this situation is completely broken and leaves us in a really bad state.
cpython) is a runtime that relies on a single global lock for executing bytecodes in its interpreter loop. While this design vastly simplifies the VM implementation, it also means the concurrency model is process rather than thread oriented (contrast with the JVM).
fork is an integral part of any reasonably complex Ruby program, and not being able to debug forked children easily on Mac OS disqualifies it as a serious development platform for Ruby users.
I believe there are currently multiple compounding issues:
dyld 15and Mac OS 10.12+ have modified their relocation behaviour and
gdbin its current state cannot debug shared libraries on it anymore. This is a huge problem that I partially overcame with a shitty patch.
- For reasons I do not understand at this time the
DWARF2debug info for some objects also contains offsets which
gdbdoes not relocate appropriately, so even with the patch you may not be able to debug shared libraries properly unless you do more hacks to offset the bad locations appropriately.
- To my knowledge no native debugger supports
set detach-on-fork onon Mac OS and the alternatives are pretty lacklustre.
gdbseem to crash in
dyld's notification mechanism when children are forked. I do not recall this behaviour happening in El Capitan (though I don't have easy access to a machine running it to check). I do not yet have a minimal test case, and a trivial "Hello World" style
requireprogram did not exhibit this behaviour.
At this point I've gone way too far down the tooling rabbit hole and will instead set up a Virtualbox VM with Linux so I can use the tooling I know and expect to work. I'll be monitoring the upstream bug report for any comments from
Should you have any ideas please do leave a comment below. Thanks for reading!