OBPSYM SETUP

We are trying to install the obpsym module, which was provided by CTE for the watchdog reset problem (CTE escalation). We can install obpsym with add_drv; however, boot -r hangs after obpsym is installed.

System configuration: SC2000 / Solaris 2.4 (KJP-43)

The reproduce procedure is as follows:

    # cp obpsym /kernel/misc
    # add_drv /kernel/misc/obpsym

Add set obpdebug = 1 to /etc/system:

    *BEGIN: info for watchdog resets
    forceload: misc/obpsym
    set nopanicdebug=1
    *END: info for watchdog resets

Then halt and reconfigure:

    # halt
    ok boot -r

INFODOC ID: 12031

SYNOPSIS: Capturing system hangs and crashes on Solaris 2.X

DETAIL DESCRIPTION:

Collecting System Crash Dump Images On Sun Solaris 2.X Systems

+----------------------+
|  Panic() & Savecore  |
+----------------------+

When a Solaris 2.X system panics, the panic() routine writes an image of system memory to the dump device. This image is delimited by short dump records, one at each end of the dump image.

When the system reboots, /etc/init.d/sysetup is run. This script can be used to call the savecore utility. By default, the section of Bourne shell code that calls savecore is commented out; the system administrator must uncomment it.

When run, savecore examines the dump device. If it finds the two short dump records and a valid system crash dump image appears to exist, savecore reads the image and writes it into a disk file in a specified directory. Savecore also puts a copy of the kernel namelist into this directory.

+---------------------------------+
|  Dump Device Disk Requirements  |
+---------------------------------+

The panic() routine is fairly primitive. It may not know about volume managers or other advanced disk management techniques and subsystems. Panic() can only write to one dump device: the primary swap device, that is, the first swap device listed in /etc/vfstab.

Crash dumps vary in size based on the memory configuration of the system and how much of that memory was in use.
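The sketch below shows one way to check the dump device from a shell before relying on it. It is written for a POSIX sh (it uses $(( )) arithmetic, which the original Bourne /bin/sh lacks), and the function names primary_swap and dump_device_ok are hypothetical. It assumes that "2.0 gb" in the size rule below means 2^31 bytes (4194304 sectors of 512 bytes), and that you supply the slice's sector count yourself, for example from the Sector Count column of prtvtoc output, together with an estimate of the dump size in kilobytes.

```shell
#!/bin/sh
# Sketch: locate the crash dump device and sanity-check its size.
# Written for a POSIX sh; the function names are hypothetical.

# Print the first swap entry in a vfstab-format file (default /etc/vfstab).
# Field 1 is the device to mount; field 4 is the FS type ("swap" for swap).
# Comment lines (starting with #) are skipped.
primary_swap() {
    awk '$1 !~ /^#/ && $4 == "swap" { print $1; exit }' "${1:-/etc/vfstab}"
}

# Succeed if a swap slice of SECTORS 512-byte sectors can hold a dump of
# DUMP_KB kilobytes without exceeding 2.0 gb.
# ASSUMPTION: 2.0 gb = 2^31 bytes = 4194304 sectors; this ceiling applies
# to 32-bit systems only.
dump_device_ok() {
    sectors=$1
    dump_kb=$2
    if [ "$sectors" -gt 4194304 ]; then
        echo "swap slice is larger than 2.0 gb" >&2
        return 1
    fi
    # Two 512-byte sectors per kilobyte.
    if [ $((sectors / 2)) -lt "$dump_kb" ]; then
        echo "swap slice is too small for a ${dump_kb} KB dump" >&2
        return 1
    fi
    return 0
}
```

For example, dump_device_ok 2097152 512000 checks a 1 gb slice against a dump estimate of 500 mb. On 64-bit systems the 2.0 gb ceiling does not apply, so only the second test is meaningful there.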
Crash dumps that use the entire allowed 2 gb primary swap partition have been seen on large systems, and on 64-bit Solaris 7 even larger crash dumps are sometimes compressed to fit into a 2 gb swap area. Individual workstations tend to have much smaller crash dumps, often less than 50 mb in size.

The primary swap device (disk partition) must be large enough to hold the system crash dump image and, except on 64-bit systems, must not be even ONE BYTE larger than 2.0 gb, not even as a result of rounding by the partition or format commands. See SRDB 6467.

+------------------------------+
|  Savecore Disk Requirements  |
+------------------------------+

Savecore is called from /etc/init.d/sysetup (which is hard-linked to /etc/rc2.d/S20sysetup) with one argument: the name of the directory where the dump image is to be stored.

The specified savecore directory must be on a filesystem with enough free disk space to hold the system crash dump image. Remember that the image can be quite large at times.

If you are concerned about savecore taking too much space in the filesystem, create a file named minfree in the directory where savecore is to save the files, and place a single number in it. This number specifies the minimum free space (in kilobytes) that must be available in the filesystem for a dump to be saved.

+-----------------------+
|  /etc/init.d/sysetup  |
+-----------------------+

By default, on Solaris 2.x releases other than Solaris 7, the last few lines of the sysetup script read:

    ##
    ## Default is to not do a savecore
    ##
    #if [ ! -d /var/crash/`uname -n` ]
    #then mkdir -p /var/crash/`uname -n`
    #fi
    #       echo 'checking for crash dump...\c '
    #savecore /var/crash/`uname -n`
    #       echo ''

For Solaris 7, see the dumpadm man page (man dumpadm) for savecore information.

To enable savecore, the system administrator needs to uncomment all of these lines. The result should look like this:

    #
    # Default is to not do a savecore
    #
    if [ ! -d /var/crash/`uname -n` ]
    then mkdir -p /var/crash/`uname -n`
    fi
            echo 'checking for crash dump...\c '
    savecore /var/crash/`uname -n`
            echo ''

If /var is part of the root filesystem, chances are very good that this filesystem is not roomy enough to be used for crash dumps. It will therefore often be necessary to customize three of these lines. For example:

    #
    # Default is to not do a savecore
    #
    if [ ! -d /bigdisk/crashes/`uname -n` ]          <--- 1
    then mkdir -p /bigdisk/crashes/`uname -n`        <--- 2
    fi
            echo 'checking for crash dump...\c '
    savecore -v /bigdisk/crashes/`uname -n`          <--- 3
            echo ''

`uname -n` makes the system hostname part of the savecore directory name. Alternatively, savecore can be called without the hostname. For example:

    savecore -v /home8/my_panics

Note also that savecore has a -v option, which produces more verbose output.

+------------------------------+
|  Testing The Savecore Setup  |
+------------------------------+

Intentionally crashing a system is not recommended. However, there are occasions when it is required. If you are the system administrator or system owner and must force your system to crash in order to test your savecore setup, do the following:

  1) Back up all of your data. System crashes can result in non-recoverable and catastrophic loss of data.

  2) Gracefully halt your system using 'halt' or 'init 0'.

  3) At the ok boot PROM prompt, enter:

         ok sync

     Your system should start panicking at this time. You should see "dumping" messages.

  4) Next, the system will attempt to reboot. During this process you should see some savecore messages.

  5) Once the system has rebooted, look in your savecore directory to see whether system crash dump files are there. They will be named "unix.#" and "vmcore.#", where # is the crash number. There should also be a "bounds" file, which contains the next crash number for savecore to use.
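The sketch below shows two checks on the savecore directory, one to run before a test crash and one after it. It is written for a POSIX sh, and the function names minfree_kb, room_for_dump and latest_dump are hypothetical. room_for_dump assumes that savecore saves a dump only if the free space minus the minfree value is at least the space the dump needs, which is roughly how the BSD-derived savecore behaves; treat that rule as an assumption for Solaris 2.x. latest_dump relies only on the file naming described in step 5.

```shell
#!/bin/sh
# Sketch: check a savecore directory before and after a test crash.
# Written for a POSIX sh; the function names are hypothetical.
# ASSUMPTION: savecore saves a dump only if (free space - minfree) is at
# least the dump size, as in the BSD-derived savecore. Sizes are in KB.

# Print the value in DIR/minfree, or 0 when the file is absent.
minfree_kb() {
    if [ -f "$1/minfree" ]; then
        cat "$1/minfree"
    else
        echo 0
    fi
}

# room_for_dump DIR FREE_KB DUMP_KB
# Succeed if DIR, with FREE_KB kilobytes free, has room for a dump of
# DUMP_KB kilobytes (include the kernel namelist copy savecore also writes).
room_for_dump() {
    reserve=$(minfree_kb "$1")
    reserve=${reserve:-0}          # treat an empty minfree file as 0
    [ $(($2 - reserve)) -ge "$3" ]
}

# latest_dump DIR
# bounds holds the NEXT crash number, so the newest dump is bounds - 1.
# Print "unix.N vmcore.N" if both files exist; fail otherwise.
latest_dump() {
    [ -f "$1/bounds" ] || return 1
    n=$(( $(cat "$1/bounds") - 1 ))
    [ -f "$1/unix.$n" ] && [ -f "$1/vmcore.$n" ] || return 1
    echo "unix.$n vmcore.$n"
}
```

FREE_KB can be read from the avail column of df -k for the savecore directory; before a crash, the dump size is only an estimate.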
+----------------------------------+
|  Converting A Hang Into A Panic  |
+----------------------------------+

Hung systems are the most difficult to debug. Fortunately, a hang can sometimes be converted into a panic, producing an image of memory that can be analyzed later. This is *NOT* always possible, however.

Before trying to panic a hung system, make sure the system really is hung:

  1) Are *ALL* of the users affected by the hang?
  2) Can you ping the system?
  3) Can you remotely log into the hung system?
  4) Can root log in on the console?

If you are sure the whole system is hung, try to force a panic: press L1-A on the console keyboard to reach the ok boot PROM prompt, then follow steps 3 through 5 of the savecore test described earlier.

If L1-A doesn't result in a boot PROM prompt, try disconnecting and reconnecting the console keyboard. Use this only as a last resort, when you really need a crash dump, because this step can occasionally cause hardware problems. (In general, you should never disconnect hardware that is powered up.)

If you cannot force a panic, you will have to power cycle the system and let it reboot normally. Note that as soon as you remove power from the system, the contents of memory are lost forever! Forcing a panic *AFTER* power cycling will produce a system crash dump that does *not* contain any evidence of why the system hung earlier.

+-------------------------------------------+
|  What To Do With System Crash Dump Files  |
+-------------------------------------------+

Once you have successfully collected a system crash dump image, you have two possible courses of action:

  1) Call SunService for assistance (see Infodoc 14230).
  2) Analyze the crash dump files on your own (see Infodocs 12936 and 13039).

For additional information about crash dump analysis, refer to the book "Panic! UNIX System Crash Dump Analysis" by Chris Drake and Kimberley Brown, ISBN 0-13-149386-8. Panic! is available through SunExpress, SunSoft Press, and Prentice Hall.
See also:

  srdb 6660      savecore reports: savecore: /dev/dump: No such device
  srdb 6467      savecore is enabled, but a coredump is not produced
  srdb 14172     How come a system corefile was created when the system did not crash?
  infodoc 6332   how to enable savecore in Solaris 2.x
  faqs 1563      How to save a system crash dump
  faqs 1611      How to save a system crash dump
  faqs 2220      How to setup a tipline on a x86 2.5.1 system for kadb
  srdb 10170     To save crashdump when machine panics at kadb prompt
  srdb 17314     How to retrieve a crash dump from a SunScreen SPF-200
  infodoc 11816  How to force crashes on Solaris X86 machines
  srdb 16646     No suitable partition from swapvol to set as the dump device
  infodoc 13981  Solaris 2.3 Patch Report Update
  infodoc 15484  Limiting the size of a panic dump under Solaris 2.5.1
  infodoc 15553  Forcing a core dump on an x86 system
  infodoc 17152  watchdog FAQ

PRODUCT AREA: Kernel
PRODUCT: crash
SUNOS RELEASE: Solaris 2.x
HARDWARE: any