================================================================================
How do I use q4 to pre-process a dump that HP needs to read

DocId: OZBEKBRC00000611     Updated: 10/11/04 11:46:00 AM

PROBLEM

Prior to Opening a Call with HP, I'd like to know the best way to
pre-process my crash dump so HP can troubleshoot it.

RESOLUTION

ITRC DOCUMENT ID: OZBEKBRC00000611

USING Q4 TO ANALYZE SYSTEM DUMP FILES
(For HPUX 10.10-11.23 systems)

[NOTE: These guidelines are for North American customers. Other locales
may use a different procedure for preprocessing dump files]

=================================================================
WHAT IS Q4 ?
=================================================================

If HP-UX crashes, system firmware saves critical O/S state info in RAM
to the swap LVOL or a dump device, then reboots the system where the
O/S copies the dump to a file system directory. The
/usr/contrib/bin/q4 utility is used to investigate the dump.

"q4" usage methods vary depending on the versions of both the O/S and
q4. Please process the dump using the Q4 steps that follow. Step 6
helps the user interpret the Q4 results and what to do next.

==============================================================
STEP 1 ============= WHERE IS THE DUMP? ===================
==============================================================

1.1  /var/adm/crash/ is the default destination directory for dumps.
     If a directory is not specified in one of the following boot-time
     parameters, do so now.

        11.X:  /etc/rc.config.d/savecrash  -  SAVECRASH_DIR=
        10.X:  /etc/rc.config.d/savecore   -  SAVECORE_DIR=

     NOTE: If "/var: file system full" occurs during the dump save,
     uncompress or q4 processes in step 4 or 5, use a file system that
     can accommodate 2x the sum of the GZIPed dump files. Update
     /etc/rc.config.d/save*

1.2  Determine if a recent crash.N (11.X) or core.N (10.X) directory
     exists in the dump directory:

        # ll /var/adm/crash/c*        (dump directory)

     NOTE: "N" increments with each new dump.
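
     Because "N" increments with each dump, the most recent dump is the
     crash.N or core.N directory with the highest number. The following
     is a minimal POSIX shell sketch of that lookup, not part of the HP
     procedure. The function name latest_dump_dir is hypothetical, and
     the sketch assumes the dump directories follow the crash.N / core.N
     naming described in step 1.2.

```shell
# Hypothetical helper (not an HP-supplied tool): print the crash.N or
# core.N directory with the highest N under the given dump directory.
# Defaults to /var/adm/crash, the default destination named in step 1.1.
latest_dump_dir() {
    dump_root=${1:-/var/adm/crash}
    best=""
    best_n=-1
    for d in "$dump_root"/crash.* "$dump_root"/core.*; do
        [ -d "$d" ] || continue            # skip unmatched globs and plain files
        n=${d##*.}                         # text after the last dot, e.g. "10"
        case $n in
            ''|*[!0-9]*) continue ;;       # ignore suffixes that are not numbers
        esac
        if [ "$n" -gt "$best_n" ]; then    # numeric compare, so 10 beats 9
            best_n=$n
            best=$d
        fi
    done
    [ -n "$best" ] && printf '%s\n' "$best"
}
```

     Example use: cd "$(latest_dump_dir)" changes to the newest dump
     directory, or fails if none exists. The comparison is numeric on
     purpose, since a plain directory listing sorts crash.10 before
     crash.9.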

1.3  If the system dump is not at the expected path, attempt to save it
     using one of these commands:

        11.X:  # savecrash -vr
        10.X:  # savecore -vr

     If the command results in "invalid dump header", a valid dump does
     not exist in the swap/dump device. (Swapping may have occurred.)

1.4  /etc/shutdownlog and /var/adm/crash/c*/INDEX contain a useful
     crash "panic" statement. If /etc/shutdownlog does not exist:

        # touch /etc/shutdownlog

==============================================================
STEP 2 ===== WHAT VERSION OF Q4 IS LOADED? ================
==============================================================

2.1  Determine which version of q4 is loaded:

        # swlist -l product | grep -i q4

     The O/S comes with one of the following versions:

        OS-Core.Q4  B.10.20  HP-UX Crash Dump Debugger for PA-RISC systems
        OS-Core.Q4  B.11.00  HP-UX Crash Dump Debugger for PA-RISC systems

     If one of the following patched versions is listed, proceed to
     step 3:

        10.20                 11.00                 11.11
        PHCO_28068 (latest)   PHCO_28069 (latest)   PHCO_28067 (latest)

        Earlier patches: PHCO_20261, PHCO_26075, PHCO_25723, PHCO_20262

2.2  If q4 is not loaded, or the dump was forced after a system hang,
     swinstall the appropriate patched version. Loading a patched
     version will NOT cause a system reboot. Installation instructions
     accompany the patch.

     Links to each O/S version of "Patched Q4" (login required):

        Link to the 11.22 patch for Q4
        Link to the 11.11 patch for Q4
        Link to the 11.00 patch for Q4
        Link to the 11.20 patch for Q4

     NOTE: the webpage shows the most recent version.

2.3  If q4 is not loaded and a patch version cannot be downloaded, load
     the original version from the CD media...

     Mount the INSTALL media and verify a matching version of Q4 is
     available:

        # swlist -l product -s / | grep Q4
        OS-Core.Q4  B.10.10  HP-UX Crash Dump Debugger for PA-RISC systems
                      ^^^^^ -matches the O/S

     Install it:

        # swinstall -vs / OS-Core.Q4

==============================================================
STEP 3 ===== CD TO THE DUMP DIRECTORY =====================
==============================================================

NOTE: csh (c-shell) will cause errors with q4. Use sh-posix.

IMPORTANT! cd to the dump directory. If using a patched q4, then skip
to step 5; otherwise continue with step 4.

   Example:  # cd /var/adm/crash/crash.0
       OR    # cd /var/adm/crash/core.0

==============================================================
STEP 4 ========= IF USING UNPATCHED Q4 ====================
==============================================================

4.1  If vmunix is still compressed (vmunix.gz), unpack it:

        # /usr/contrib/bin/gunzip vmunix.gz

4.2  Prepare the dump tools ...

     For 10.20 through 11.11, type:

        # /usr/contrib/bin/q4prep -p

     For 11.20 and newer, type:

        # /usr/contrib/Q4/bin/q4prep -p

     For 10.10, type the following commands:

        # uncompress /usr/contrib/lib/Q4Lib.tar.Z
              (ignore the error if this was done previously)
        # tar -xf /usr/contrib/lib/Q4Lib.tar
              (output placed in the current directory)
        # cp q4lib/sample.q4rc.pl ~/.q4rc.pl
              (note the use of a tilde and the letter "l", not the digit 1)
        # /usr/contrib/bin/q4pxdb vmunix
              (ignore an "already preprocessed" complaint)

4.3  Type:

        # q4 -p .        (note the "dot" at the end of the command)

     Then:

        q4> trace event 0 > trace.out
        q4> include analyze.pl    (last char. is letter "l", not the digit 1)
        q4> run Analyze AU >> ana.out            (ctrl-c allows interrupt)
        q4> include whathappened.pl
        q4> run WhatHappened -HANG > what.out    (ctrl-c allows interrupt)
        q4> exit

     If a "file system full" message occurs, follow the NOTE in step
     1.1, then redo the failed command and proceed.

     Skip to step 6.

==============================================================
STEP 5 ============ IF USING PATCHED Q4 =====================
==============================================================

5.1  Type:

        # . /usr/contrib/Q4/bin/set_env

     Note the 'dot' at the beginning of the command.

5.2  Type:

        # ll vmunix.gz

     If the file is not found, go to step 5.3.
     If the file is found, type:

        # gunzip vmunix.gz

5.3  Type:

        # /usr/contrib/Q4/bin/q4pxdb vmunix    (ignore "unnecessary" message)
        # /usr/contrib/Q4/bin/q4 -p .
              (note the "dot" at the end of the command)

     If a "file system full" message occurs, follow the NOTE in step
     1.1, then redo the failed command and proceed.

5.4  At the q4> prompt, type these 'run' commands:

        q4> trace event 0 > trace.out
        q4> run Analyze AU > ana.out
        q4> run WhatHappened -HANG > what.out

     NOTE: The last two commands may run for several minutes. CTRL-c
     can interrupt them.

5.5  Type:

        q4> exit

==============================================================
STEP 6 =========== REVIEW AND SEND DATA ===================
==============================================================

6.1  HPUX uses the acronym HPMC (High Priority Machine Check) to denote
     a hardware failure. Determine if the crash was due to an HPMC ...

     Type:

        # grep HPMC ana.out trace.out

     If either of the following lines appears, open a hardware repair
     case for the system:

        "crash event was an HPMC"
        "Crash Event 0 (HPMC, struct crash_event_table_struct..."

     The OnlineDiag software bundle captures HPMC chassis codes in
     /var/tombstones/ts* files. If available, email the 'ts' file
     created after the "dumptime" listed in the dump INDEX file
     (e.g. ts99).

     If an HPMC did not occur, proceed to the next step.

6.2  Skip the remainder of this step if the following lines are not
     found in ana.out:

        MC/ServiceGuard: Unable to maintain contact with cmcld daemon.
        Performing TOC to ensure data integrity.

     If these statements occur, determine the NODE_TIMEOUT value:

        # cmviewconf | grep "node timeout"

     If the value returned is 2 seconds, it's a good bet the value is
     too low, causing the crash. When the kernel is too busy to send a
     Serviceguard heartbeat packet to the other nodes within the
     NODE_TIMEOUT period, the other nodes re-form the cluster and
     'orphan' this node, causing a TOC/reboot.

     The fix is to set NODE_TIMEOUT to 8 seconds in the cluster ASCII
     file and, with the cluster down, apply the file with
     'cmapplyconf -C'. See document UXSGLVKBAN00000010 for details of
     this issue.

     If this is the case - stop here.

6.3  Generate a patch list:

        # /usr/sbin/swlist -l product PH* > patchlist.out

6.4  Zip and Email the following files:

        ana.out
        patchlist.out
        trace.out
        what.out
        /etc/shutdownlog
        /var/tombstones/ts99           (if an HPMC was detected)
        /var/adm/syslog/OLDsyslog.log  (if the dump was due to a hang)

        To:      hpcu@atl.hp.com
        Cc:      support@atl.hp.com
        Subject:

     EMAIL RECOMMENDATIONS:
       - DO NOT send this data to the engineer's personal address.
       - Send files as attachments when possible.
       - Send fresh messages, not replies.
       - Mail size must be < 2MB. Anything greater will be denied.

     After emailing the data, please notify HP that dump email has been
     sent for action (via callback or ITRC note).

     If e-mail or ftp is not available, tar up the dump (relative
     pathing please) and send the tape to:

        Barry Britt                 Acceptable tape formats:
        M/S ALF01-3T20                DDS1 - DDS3
        5555 Windward Parkway         CD-media (containing dump files)
        Alpharetta GA 30004

--------------------------------------------------------
http://docs.hp.com
http://itrc.hp.com
================================================================================
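
The 2MB mail limit in step 6.4 can be checked before sending. The following is a minimal POSIX shell sketch, not part of the HP procedure. The function name mail_size_ok and the archive name dumpinfo.tar are hypothetical, "2MB" is assumed to mean 2 x 1024 x 1024 bytes, and tar plus gzip is assumed as the way to "zip" the files.

```shell
# Hypothetical helper (not an HP-supplied tool): succeed only when the
# file named in $1 is smaller than the 2MB mail limit from step 6.4.
# Assumption: "2MB" is read as 2 * 1024 * 1024 bytes.
mail_size_ok() {
    [ -f "$1" ] || return 2                # no such file: report an error
    limit=$((2 * 1024 * 1024))
    size=$(($(wc -c < "$1")))              # arithmetic strips wc's padding
    [ "$size" -lt "$limit" ]
}

# Example use, run from the dump directory (dumpinfo.tar is an assumed name):
#   tar -cf dumpinfo.tar ana.out patchlist.out trace.out what.out
#   gzip dumpinfo.tar
#   mail_size_ok dumpinfo.tar.gz || echo "archive is 2MB or larger"
```

If the check fails, one option is to send the larger files, such as ana.out, in separate messages.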