agent/doc/transported_core.html

changeset 0
f90c822e73f8
     1.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     1.2 +++ b/agent/doc/transported_core.html	Wed Apr 27 01:25:04 2016 +0800
     1.3 @@ -0,0 +1,110 @@
     1.4 +<html>
     1.5 +<head>
     1.6 +<title>
     1.7 +Debugging transported core dumps
     1.8 +</title>
     1.9 +</head>
    1.10 +<body>
    1.11 +<h1>Debugging transported core dumps</h1>
    1.12 +
    1.13 +<p>
    1.14 +When a core dump is moved to a machine different from the one where it was
    1.15 +produced (a "transported core dump"), debuggers (dbx, gdb, windbg or SA) do not
    1.16 +always open the dump successfully. This is due to kernel or library (shared
    1.17 +object or DLL) mismatches between the core dump machine and the debugger machine.
    1.18 +</p>
    1.19 +
    1.20 +<p>
    1.21 +On most platforms, core dumps do not contain text (a.k.a. code) pages.
    1.22 +These pages must be read from the executable and shared objects (or DLLs).
    1.23 +Therefore it is important to have matching executable and shared object
    1.24 +files on the debugger machine.
    1.25 +</p>
    1.26 +
    1.27 +<h3>Solaris transported core dumps</h3>
    1.28 +
    1.29 +<p>
    1.30 +Debuggers on Solaris (and Linux) use two additional shared objects,
    1.31 +<b>rtld_db.so</b> and <b>libthread_db.so</b>. rtld_db.so is used to
    1.32 +read information on shared objects from the core dump. libthread_db.so
    1.33 +is used to get information on threads from the core dump. rtld_db.so
    1.34 +evolves along with rtld.so (the runtime linker library) and libthread_db.so
    1.35 +evolves along with libthread.so (the userland multithreading library).
    1.36 +Hence, the debugger machine should have the right versions of rtld_db.so and
    1.37 +libthread_db.so to open the core dump successfully. More details on
    1.38 +these debugger libraries can be found in the
    1.39 +<a href="http://docs.sun.com/app/docs/doc/817-1984/">
    1.40 +Solaris Linkers and Libraries Guide - 817-1984</a>.
    1.41 +</p>
    1.42 +
    1.43 +<h3>Solaris SA against transported core dumps</h3>
    1.44 +
    1.45 +<p>
    1.46 +With transported core dumps, you may get "rtld_db failures" or
    1.47 +"libthread_db failures", or SA may just throw some other error
    1.48 +(for example, that a HotSpot symbol is missing) when opening the core dump.
    1.49 +The environment variable <b>LIBSAPROC_DEBUG</b> may be set to any value
    1.50 +to debug such scenarios. With this variable set, SA prints many
    1.51 +messages to standard error, which can be useful for further debugging.
    1.52 +SA on Solaris uses the <b>libproc.so</b> library. This library also
    1.53 +prints debug messages when the variable <b>LIBPROC_DEBUG</b> is set, but
    1.54 +setting LIBSAPROC_DEBUG results in LIBPROC_DEBUG being set as well.
    1.55 +</p>
    1.56 +<p>
    1.57 +The best way to debug a transported core dump is to match the
    1.58 +debugger machine to the core dump machine, i.e., to have the same kernel
    1.59 +and libthread patch level on both machines. mdb (the Solaris modular
    1.60 +debugger) may be used to find the kernel patch level of the core dump
    1.61 +machine, and the debugger machine may then be brought to the same level.
    1.62 +</p>
    1.63 +<p>
    1.64 +If the matching machine is "far off" in your network, then
    1.65 +<ul>
    1.66 +<li>consider using rlogin and <a href="clhsdb.html">CLHSDB - SA command line HSDB interface</a>, or
    1.67 +<li>use SA remote debugging and debug the core on the core machine remotely.
    1.68 +</ul>
    1.69 +</p>
    1.70 +
    1.71 +<p>
    1.72 +However, it may not be feasible to find a matching machine to debug on.
    1.73 +If so, you can copy all application shared objects (and libthread_db.so, if needed) from the core dump
    1.74 +machine into a directory on your debugger machine, say, /export/applibs. Then set the <b>SA_ALTROOT</b>
    1.75 +environment variable to point to the /export/applibs directory. Note that /export/applibs should either
    1.76 +contain the libraries under their matching full paths, i.e., /usr/lib/libthread_db.so from the core
    1.77 +machine should be under the /export/applibs/usr/lib directory, /usr/java/jre/lib/sparc/client/libjvm.so
    1.78 +from the core machine should be under /export/applibs/usr/java/jre/lib/sparc/client, and so on, or /export/applibs
    1.79 +should just contain libthread_db.so, libjvm.so, etc. directly.
    1.80 +</p>
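<p>
The two directory layouts described above can be sketched as a small staging helper.
This is only an illustration, not part of SA: the function names <b>altroot_target</b> and
<b>stage_library</b> and the <b>keep_full_path</b> flag are hypothetical, and the sketch assumes the
library has already been fetched from the core dump machine to a local file.
</p>

```python
import shutil
from pathlib import Path, PurePosixPath


def altroot_target(alt_root, core_path, keep_full_path=True):
    """Return where a library from the core machine belongs under alt_root.

    core_path is the path the library had on the core dump machine.
    (Hypothetical helper for illustration; not an SA API.)
    """
    core = PurePosixPath(core_path)
    if keep_full_path:
        # Mirror the full path: /usr/lib/x.so -> <alt_root>/usr/lib/x.so
        parts = core.parts[1:] if core.is_absolute() else core.parts
        return Path(alt_root).joinpath(*parts)
    # Flat layout: only the file name is kept directly under alt_root.
    return Path(alt_root) / core.name


def stage_library(alt_root, fetched_copy, core_path, keep_full_path=True):
    """Copy an already-fetched library file into the alternate root."""
    target = altroot_target(alt_root, core_path, keep_full_path)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(fetched_copy, target)
    return target
```

<p>
For example, staging /usr/lib/libthread_db.so with the full-path layout under /export/applibs
places it at /export/applibs/usr/lib/libthread_db.so, while the flat layout places it at
/export/applibs/libthread_db.so. SA_ALTROOT would then be set to /export/applibs.
</p>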
    1.81 +
    1.82 +<p>
    1.83 +Support for transported core dumps is <b>not</b> built into the standard version of libproc.so. You need to
    1.84 +set the <b>LD_LIBRARY_PATH</b> environment variable to point to the path of a specially built version of libproc.so.
    1.85 +Note that this version of libproc.so has a special symbol to support transported core dump debugging.
    1.86 +In the future, this feature may be built into the standard libproc.so; if that happens, this step (of
    1.87 +setting LD_LIBRARY_PATH) can be skipped.
    1.88 +</p>
    1.89 +
    1.90 +<h3>Ignoring libthread_db.so failures</h3>
    1.91 +<p>
    1.92 +If you are okay with missing thread-related information, you can set the
    1.93 +<b>SA_IGNORE_THREADDB</b> environment variable to any value. With this
    1.94 +set, SA ignores libthread_db failures, so you won't be able to get any
    1.95 +thread-related information, but you will still be able to use SA to get
    1.96 +other information.
    1.97 +</p>
    1.98 +
    1.99 +<h3>Linux SA against transported core dumps</h3>
   1.100 +<p>
   1.101 +On Linux, SA parses the core and shared library ELF files. SA <b>does not</b> use
   1.102 +libthread_db.so or rtld_db.so for core dump debugging (although
   1.103 +libthread_db.so is used for live process debugging). However, you
   1.104 +may still face problems with transported core dumps, because matching shared
   1.105 +objects may not be in the path(s) specified in the core dump file. To
   1.106 +work around this, you can set the environment variable <b>SA_ALTROOT</b>
   1.107 +to the directory where the shared libraries are kept. The semantics of
   1.108 +this variable are the same as on Solaris (see above).
   1.109 +</p>
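<p>
To make the lookup concrete, here is a minimal sketch of how a library path recorded in a core
file could be resolved against SA_ALTROOT, trying the mirrored full path first and the bare file
name second. It is an assumption-laden illustration, not SA's actual lookup code: the function
name <b>resolve_library</b> and the order of the two attempts are hypothetical.
</p>

```python
import os
from pathlib import Path, PurePosixPath


def resolve_library(recorded_path, alt_root=None):
    """Pick a local file to read for a library path recorded in the core.

    Hypothetical illustration of the SA_ALTROOT layouts; not SA's own code.
    Returns None when an alternate root is set but holds no matching file.
    """
    if alt_root is None:
        alt_root = os.environ.get("SA_ALTROOT")
    if not alt_root:
        # No alternate root configured: use the path exactly as recorded.
        return Path(recorded_path)

    recorded = PurePosixPath(recorded_path)
    parts = recorded.parts[1:] if recorded.is_absolute() else recorded.parts
    candidates = [
        Path(alt_root).joinpath(*parts),  # full-path layout
        Path(alt_root) / recorded.name,   # flat layout
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

<p>
A None result in this sketch corresponds to the case where the matching shared object was not
copied into the alternate root at all, which is worth checking before suspecting anything else.
</p>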
   1.110 +
   1.111 +
   1.112 +</body>
   1.113 +</html>
