Thu, 05 Jun 2014 15:26:51 -0700
Merge
<html>
<head>
<title>
Debugging transported core dumps
</title>
</head>
<body>
<h1>Debugging transported core dumps</h1>

<p>
When a core dump is moved to a machine other than the one where it was
produced (a "transported core dump"), debuggers (dbx, gdb, windbg or SA) do not
always open the dump successfully. This is caused by mismatches in the kernel and
libraries (shared objects or DLLs) between the core dump machine and the debugger machine.
</p>

<p>
On most platforms, core dumps do not contain text (a.k.a. code) pages.
These pages have to be read from the executable and shared objects (or DLLs).
Therefore it is important to have matching executable and shared object
files on the debugger machine.
</p>
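
<p>
To see which pages a debugger will have to fetch from the executable and shared objects,
you can inspect the program headers of the core file. The Python sketch below is a
hypothetical helper (not part of SA or any debugger). It assumes a 64-bit little-endian
ELF core (for example from Linux or Solaris on x64; SPARC cores are big-endian and are not
handled) and treats a PT_LOAD segment whose size in the file is smaller than its size in
memory as "not dumped". This is a heuristic: which mappings are actually written depends on
settings such as coreadm(1M) on Solaris or /proc/PID/coredump_filter on Linux.
</p>

```python
# Hypothetical helper: list PT_LOAD segments of an ELF64 little-endian core
# file whose contents were not (fully) written to the core.

PT_LOAD = 1
ET_CORE = 4


def read_uint(data, offset, size):
    """Read an unsigned little-endian integer of `size` bytes at `offset`."""
    return int.from_bytes(data[offset:offset + size], "little")


def segments_missing_from_core(data):
    """Return (vaddr, filesz, memsz) for each PT_LOAD segment whose
    file size is smaller than its memory size."""
    if data[:4] != b"\x7fELF" or data[4:6] != b"\x02\x01":
        raise ValueError("this sketch only handles ELF64 little-endian files")
    if read_uint(data, 16, 2) != ET_CORE:
        raise ValueError("not an ELF core file")
    phoff = read_uint(data, 32, 8)       # e_phoff
    phentsize = read_uint(data, 54, 2)   # e_phentsize
    phnum = read_uint(data, 56, 2)       # e_phnum

    missing = []
    for i in range(phnum):
        ph = phoff + i * phentsize
        p_type = read_uint(data, ph, 4)
        p_vaddr = read_uint(data, ph + 16, 8)
        p_filesz = read_uint(data, ph + 32, 8)
        p_memsz = read_uint(data, ph + 40, 8)
        # Pages beyond p_filesz are not in the core; a debugger has to read
        # them (typically text pages) from the executable or shared objects.
        if p_type == PT_LOAD and p_memsz > p_filesz:
            missing.append((p_vaddr, p_filesz, p_memsz))
    return missing
```

<p>
Each returned address range must be backed by a matching file on the debugger machine;
if the file there differs from the one on the core dump machine, the debugger reads wrong code.
</p>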

<h3>Solaris transported core dumps</h3>

<p>
Debuggers on Solaris (and Linux) use two additional shared objects,
<b>rtld_db.so</b> and <b>libthread_db.so</b>. rtld_db.so is used to
read information about shared objects from the core dump, and libthread_db.so
is used to get information about threads from the core dump. rtld_db.so
evolves along with rtld.so (the runtime linker library), and libthread_db.so
evolves along with libthread.so (the user-land multithreading library).
Hence, the debugger machine must have the right versions of rtld_db.so and
libthread_db.so to open the core dump successfully. More details on
these debugger libraries can be found in the
<a href="http://docs.sun.com/app/docs/doc/817-1984/">
Solaris Linkers and Libraries Guide - 817-1984</a>.
</p>

<h3>Solaris SA against transported core dumps</h3>

<p>
With transported core dumps, you may get "rtld_db failures" or
"libthread_db failures", or SA may throw some other error
(for example, that a HotSpot symbol is missing) when opening the core dump.
The environment variable <b>LIBSAPROC_DEBUG</b> may be set to any value
to debug such scenarios. With this environment variable set, SA prints many
messages to standard error, which can be useful for further debugging.
SA on Solaris uses the <b>libproc.so</b> library, which also
prints debug messages when the environment variable <b>LIBPROC_DEBUG</b> is set.
Setting LIBSAPROC_DEBUG also sets LIBPROC_DEBUG, so you do not need to set both.
</p>
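
<p>
Because the debug messages go to standard error, it can help to capture them for later
reading. The Python sketch below uses a hypothetical helper, run_with_sa_debug, that runs
whatever command you normally use to start SA on the core dump with LIBSAPROC_DEBUG set and
returns what the command wrote to standard error. The value "1" is arbitrary, since any
value enables the messages.
</p>

```python
import os
import subprocess


def run_with_sa_debug(command):
    """Run `command` (a list of program arguments) with LIBSAPROC_DEBUG set
    and return everything it wrote to standard error."""
    env = dict(os.environ)
    # Any value enables SA's debug messages; per the text above, SA then
    # also enables libproc.so's messages (LIBPROC_DEBUG) itself.
    env["LIBSAPROC_DEBUG"] = "1"
    result = subprocess.run(command, env=env, stderr=subprocess.PIPE, text=True)
    return result.stderr
```

<p>
The returned text can then be saved to a file and searched for the rtld_db or
libthread_db messages that explain the failure.
</p>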
<p>
The best way to debug a transported core dump is to match the
debugger machine to the core dump machine, i.e., to have the same kernel
and libthread patch level on both machines. mdb (the Solaris modular
debugger) can be used to find the kernel patch level of the core dump
machine, and the debugger machine can then be brought to the same level.
</p>
<p>
If the matching machine is "far off" in your network, you can
</p>
<ul>
<li>log in to it with rlogin and use <a href="clhsdb.html">CLHSDB - SA command line HSDB interface</a> there, or</li>
<li>use SA remote debugging to debug the core dump on that machine remotely.</li>
</ul>

<p>
However, it may not be feasible to find a matching machine to debug on.
If so, copy all application shared objects (and libthread_db.so, if needed) from the core dump
machine into a directory on your debugger machine, say, /export/applibs, and set the <b>SA_ALTROOT</b>
environment variable to point to the /export/applibs directory. /export/applibs should either
mirror the full paths of the libraries, i.e., /usr/lib/libthread_db.so from the core dump
machine should be under /export/applibs/usr/lib, /usr/java/jre/lib/sparc/client/libjvm.so
from the core dump machine should be under /export/applibs/usr/java/jre/lib/sparc/client, and so on,
or /export/applibs should contain libthread_db.so, libjvm.so, etc. directly.
</p>
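
<p>
The lookup described above can be sketched in Python as follows. This is a hypothetical
helper, not SA's implementation (SA does this in its own native code): given the SA_ALTROOT
directory and a library path recorded in the core dump, it first tries the full path under
the alternate root and then the bare file name directly under it. Trying the full-path
layout first is an assumption of this sketch.
</p>

```python
import os


def resolve_with_altroot(altroot, core_path):
    """Return the copy of `core_path` under `altroot`, or None if absent.

    Tries the two layouts described above, in order:
      1. altroot + full path, e.g. /export/applibs/usr/lib/libthread_db.so
      2. altroot + file name, e.g. /export/applibs/libthread_db.so
    """
    candidates = [
        os.path.join(altroot, core_path.lstrip("/")),
        os.path.join(altroot, os.path.basename(core_path)),
    ]
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None
```

<p>
For example, with /usr/lib/libthread_db.so copied to /export/applibs/usr/lib, the call
resolve_with_altroot("/export/applibs", "/usr/lib/libthread_db.so") returns that copy;
a library copied to neither location yields None, which corresponds to the case where SA
cannot find a matching shared object.
</p>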

<p>
Support for transported core dumps is <b>not</b> built into the standard version of libproc.so. You need to
set the <b>LD_LIBRARY_PATH</b> environment variable to point to the directory containing a specially built
version of libproc.so. This version of libproc.so has a special symbol to support transported core dump
debugging. If this feature is built into the standard libproc.so in the future, this step (setting
LD_LIBRARY_PATH) can be skipped.
</p>

<h3>Ignoring libthread_db.so failures</h3>
<p>
If you can do without thread-related information, you can set the
<b>SA_IGNORE_THREADDB</b> environment variable to any value. With this
set, SA ignores libthread_db failures. You will not be able to get any
thread-related information, but you can still use SA to get
other information.
</p>
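
<p>
Putting the Solaris settings together, a script that launches SA on a transported core
might prepare its environment as in the Python sketch below. The helper name
transported_core_env and the directory /opt/libproc (standing in for wherever the specially
built libproc.so lives) are hypothetical. The sketch prepends that directory to any existing
LD_LIBRARY_PATH rather than replacing it, so other library paths the launcher relies on keep
working.
</p>

```python
import os


def transported_core_env(base_env, altroot, libproc_dir=None, ignore_threaddb=False):
    """Return a copy of base_env prepared for opening a transported core with SA."""
    env = dict(base_env)
    # Directory holding the libraries copied from the core dump machine.
    env["SA_ALTROOT"] = altroot
    if libproc_dir is not None:
        # Put the specially built libproc.so ahead of the system one.
        old = env.get("LD_LIBRARY_PATH")
        env["LD_LIBRARY_PATH"] = libproc_dir + os.pathsep + old if old else libproc_dir
    if ignore_threaddb:
        # Any value works; SA then skips libthread_db and loses thread information.
        env["SA_IGNORE_THREADDB"] = "1"
    return env
```

<p>
For example, transported_core_env(os.environ, "/export/applibs", "/opt/libproc") returns a
dictionary that can be passed as the env argument of subprocess.run when starting SA.
</p>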

<h3>Linux SA against transported core dumps</h3>
<p>
On Linux, SA parses the core and shared library ELF files itself. SA <b>does not</b> use
libthread_db.so or rtld_db.so for core dump debugging (although
libthread_db.so is used for live process debugging). However, you
may still face problems with transported core dumps, because matching shared
objects may not be at the path(s) recorded in the core dump file. To
work around this, set the environment variable <b>SA_ALTROOT</b>
to the directory where the shared libraries are kept. This environment variable
has the same semantics as on Solaris (see above).
</p>
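
<p>
To decide which files to copy into the SA_ALTROOT directory, it helps to know which paths
the core dump refers to. On Linux kernels 3.7 and later, core dumps typically contain an
NT_FILE note that lists every file mapped into the process. The Python sketch below is a
hypothetical helper, independent of SA (which locates shared objects its own way); it reads
that note from a 64-bit little-endian core file and returns the distinct paths.
</p>

```python
import os

PT_NOTE = 4
NT_FILE = 0x46494C45  # "FILE"


def read_uint(data, offset, size):
    """Read an unsigned little-endian integer of `size` bytes at `offset`."""
    return int.from_bytes(data[offset:offset + size], "little")


def align4(n):
    """Round n up to a multiple of 4 (note name and desc padding)."""
    return (n + 3) // 4 * 4


def mapped_files(data):
    """Return the sorted, distinct file paths listed in NT_FILE notes."""
    if data[:4] != b"\x7fELF" or data[4:6] != b"\x02\x01":
        raise ValueError("this sketch only handles ELF64 little-endian files")
    phoff = read_uint(data, 32, 8)
    phentsize = read_uint(data, 54, 2)
    phnum = read_uint(data, 56, 2)

    paths = set()
    for i in range(phnum):
        ph = phoff + i * phentsize
        if read_uint(data, ph, 4) != PT_NOTE:
            continue
        off = read_uint(data, ph + 8, 8)          # p_offset
        end = off + read_uint(data, ph + 32, 8)   # p_offset + p_filesz
        # Each note: namesz, descsz, type (4 bytes each), then padded name and desc.
        while end - off >= 12:
            namesz = read_uint(data, off, 4)
            descsz = read_uint(data, off + 4, 4)
            n_type = read_uint(data, off + 8, 4)
            desc = off + 12 + align4(namesz)
            if n_type == NT_FILE:
                # desc: count, page_size, count * (start, end, file_ofs), then
                # count NUL-terminated file names (integers are 8 bytes here).
                count = read_uint(data, desc, 8)
                names = data[desc + 16 + count * 24:desc + descsz].split(b"\0")
                paths.update(os.fsdecode(name) for name in names[:count])
            off = desc + align4(descsz)
    return sorted(paths)
```

<p>
Each returned path can then be copied from the core dump machine into SA_ALTROOT, either
under its full path or directly into the directory.
</p>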

</body>
</html>