Debugging transported core dumps

When a core dump is moved to a machine other than the one where it was produced (a "transported core dump"), debuggers (dbx, gdb, windbg or SA) do not always open the dump successfully. This is due to kernel and library (shared object or DLL) mismatches between the core dump machine and the debugger machine.

On most platforms, core dumps do not contain text (a.k.a. code) pages. These pages have to be read from the executable and the shared objects (or DLLs). It is therefore important to have matching executable and shared object files on the debugger machine.
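
One way to check that the files on the debugger machine match those on the core dump machine is to compare file digests. The following is a minimal sketch, assuming you can produce a digest listing on each machine and bring the two listings together. The function names are hypothetical and are not part of any debugger.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def digest_listing(paths):
    """Map each path to its digest; paths that are not regular files map to None."""
    return {str(p): (file_digest(p) if Path(p).is_file() else None) for p in paths}


def mismatches(core_listing, debugger_listing):
    """Return the paths whose digest on the debugger machine differs from the core machine's."""
    return sorted(
        path
        for path, digest in core_listing.items()
        if debugger_listing.get(path) != digest
    )
```

Running `digest_listing` over the same list of paths on both machines and passing the two results to `mismatches` gives the files that would need to be copied over.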

Solaris transported core dumps

Debuggers on Solaris (and Linux) use two additional shared objects, rtld_db.so and libthread_db.so. rtld_db.so is used to read information on shared objects from the core dump. libthread_db.so is used to get information on threads from the core dump. rtld_db.so evolves along with rtld.so (the runtime linker library) and libthread_db.so evolves along with libthread.so (the user-land multithreading library). Hence, the debugger machine must have the right versions of rtld_db.so and libthread_db.so to open the core dump successfully. More details on these debugger libraries can be found in the Solaris Linkers and Libraries Guide - 817-1984.

Solaris SA against transported core dumps

With transported core dumps, you may get "rtld_db failures" or "libthread_db failures", or SA may throw some other error (for example, that a hotspot symbol is missing) when opening the core dump. The environment variable LIBSAPROC_DEBUG may be set to any value to debug such scenarios. With this variable set, SA prints many messages to standard error, which can be useful for further debugging. SA on Solaris uses the libproc.so library. This library also prints debug messages when the environment variable LIBPROC_DEBUG is set, but setting LIBSAPROC_DEBUG results in LIBPROC_DEBUG being set as well.
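
Because the debug messages go to standard error, it can help to capture them in a file. The following is a minimal sketch, assuming SA is started from a wrapper script. The function name and the `command` argument are hypothetical; `command` stands for whatever argument list you normally use to start SA and is not prescribed here.

```python
import os
import subprocess


def run_with_sa_debug(command, log_path, base_env=None):
    """Run `command` with LIBSAPROC_DEBUG set and save its standard error.

    `command` is supplied by the caller (hypothetical placeholder for the
    usual SA launch command); nothing about its form is assumed here.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Per the text above, any value enables the debug messages.
    env["LIBSAPROC_DEBUG"] = "1"
    with open(log_path, "wb") as log:
        result = subprocess.run(command, env=env, stderr=log)
    return result.returncode
```

The log file can then be searched for the rtld_db or libthread_db messages mentioned above.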

The best way to debug a transported core dump is to match the debugger machine to the core dump machine, i.e., to have the same kernel and libthread patch level on both machines. mdb (the Solaris modular debugger) may be used to find the kernel patch level of the core dump machine, and the debugger machine may then be brought to the same level.

If the matching machine is "far off" in your network, then

However, it may not be feasible to find a matching machine to debug on. If so, you can copy all application shared objects (and libthread_db.so, if needed) from the core dump machine into a directory on your debugger machine, say, /export/applibs. Now, set the SA_ALTROOT environment variable to point to the /export/applibs directory. Note that /export/applibs should either contain the matching full paths of the libraries, i.e., /usr/lib/libthread_db.so from the core machine should be under the /export/applibs/usr/lib directory, /usr/java/jre/lib/sparc/client/libjvm.so from the core machine should be under /export/applibs/usr/java/jre/lib/sparc/client, and so on, or /export/applibs should just contain libthread_db.so, libjvm.so etc. directly.
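
The two accepted directory layouts can be sketched as a small staging helper. This is only an illustration, assuming the libraries have already been fetched from the core dump machine into some local location. The function names and the `keep_full_path` flag are hypothetical.

```python
import shutil
from pathlib import Path, PurePosixPath


def altroot_destination(altroot, original_path, keep_full_path=True):
    """Return where a library from the core machine belongs under the alt root."""
    original = PurePosixPath(original_path)
    if keep_full_path:
        # Mirror the full path, dropping the leading "/":
        # /usr/lib/libthread_db.so -> <altroot>/usr/lib/libthread_db.so
        parts = original.parts[1:] if original.is_absolute() else original.parts
        return Path(altroot).joinpath(*parts)
    # Flat layout: only the file name, directly under the alt root.
    return Path(altroot) / original.name


def stage_library(altroot, copied_file, original_path, keep_full_path=True):
    """Copy a file fetched from the core machine into the alt root layout."""
    dest = altroot_destination(altroot, original_path, keep_full_path)
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(copied_file, dest)
    return dest
```

For example, `altroot_destination("/export/applibs", "/usr/lib/libthread_db.so")` gives `/export/applibs/usr/lib/libthread_db.so`, and the same call with `keep_full_path=False` gives `/export/applibs/libthread_db.so`.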

Support for transported core dumps is not built into the standard version of libproc.so. You need to set the LD_LIBRARY_PATH environment variable to point to the path of a specially built version of libproc.so. Note that this version of libproc.so has a special symbol to support transported core dump debugging. In the future, this feature may be built into the standard libproc.so; if that happens, this step (setting LD_LIBRARY_PATH) can be skipped.
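
When setting LD_LIBRARY_PATH, it is usually safer to prepend the new directory than to overwrite the variable. The following is a minimal sketch of that, assuming SA is launched from a wrapper that builds its own environment. The function name and the example directory are hypothetical.

```python
import os


def with_library_dir(libproc_dir, base_env=None):
    """Return a copy of the environment with `libproc_dir` first in LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH", "")
    # Entries are colon-separated; the first entry is searched first.
    env["LD_LIBRARY_PATH"] = libproc_dir + (":" + existing if existing else "")
    return env
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when starting SA, for example `with_library_dir("/opt/sa/lib")`, where `/opt/sa/lib` is a placeholder for wherever the specially built libproc.so lives.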

Ignoring libthread_db.so failures

If you are okay with missing thread-related information, you can set the SA_IGNORE_THREADDB environment variable to any value. With this set, SA ignores libthread_db failures, and you won't be able to get any thread-related information, but you will still be able to use SA and get other information.

Linux SA against transported core dumps

On Linux, SA parses the core and shared library ELF files. SA does not use libthread_db.so or rtld_db.so for core dump debugging (although libthread_db.so is used for live process debugging). But you may still face problems with transported core dumps, because matching shared objects may not be in the path(s) specified in the core dump file. To work around this, you can set the environment variable SA_ALTROOT to the directory where the shared libraries are kept. The semantics of this environment variable are the same as on Solaris (see above).
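
Before starting SA, it can be useful to check that every library path recorded in the core dump has a counterpart under the alternate root. The sketch below illustrates the two layouts described for Solaris. It is not SA's actual lookup code, and the function names are hypothetical.

```python
from pathlib import Path, PurePosixPath


def candidate_paths(altroot, recorded_path):
    """List where a library recorded in the core dump could sit under the alt root."""
    recorded = PurePosixPath(recorded_path)
    parts = recorded.parts[1:] if recorded.is_absolute() else recorded.parts
    full = Path(altroot).joinpath(*parts)   # mirrored full path
    flat = Path(altroot) / recorded.name    # file placed directly in the alt root
    return [full] if full == flat else [full, flat]


def find_library(altroot, recorded_path):
    """Return the first candidate that exists as a file, or None when nothing matches."""
    for candidate in candidate_paths(altroot, recorded_path):
        if candidate.is_file():
            return candidate
    return None
```

Any recorded path for which `find_library` returns `None` points at a shared object that still has to be copied from the core dump machine.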
