################################################################################
#Configuring Kdump on Linux
#
#Covers: RHEL 5,6,7
#
#Version 2017.03.12.0001
################################################################################

################################################################################
#TOC
	#Need/issue
	#Caveats
	#When a kdump is activated
	#KDUMP config: READMEs
	#KDUMP config: BEGIN
	#Caveats
	#Managing a crash/hang
	#Crash commands
	#Miscellaneous
	#Bibliography
################################################################################


################################################################################
#Need/issue
################################################################################
The kdump procedure

The received warning means that the kdump operation might fail and that the
crashkernel parameter should be configured correctly. The kdump procedure
works as follows:

	Note: Not reserving enough memory for the kdump kernel can lead to the
	      kdump operation failing.

1 - The normal kernel is booted with crashkernel=... as a kernel option,
	reserving some memory for the kdump kernel. The memory reserved by the
	crashkernel parameter is not available to the normal kernel during regular
	operation. It is reserved for later use by the kdump kernel.
2 - The system panics.
3 - The kdump kernel is booted using kexec. It uses the memory area that was
	reserved with the crashkernel parameter.
4 - The normal kernel's memory is captured into a vmcore.
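
Step 1 can be checked on a running system by looking for the crashkernel=
option on the kernel command line. The sketch below is only an illustration:
crashkernel_arg is a hypothetical helper name, not part of kexec-tools.

```shell
# crashkernel_arg CMDLINE
# Print the value of the crashkernel= option found in CMDLINE.
# Returns non-zero if the option is absent, i.e. no memory was requested.
crashkernel_arg() (
    set -f                      # do not glob-expand words while splitting
    for word in $1; do
        case "$word" in
            crashkernel=*)
                echo "${word#crashkernel=}"
                return 0
                ;;
        esac
    done
    return 1
)
```

Typical use would be: crashkernel_arg "$(cat /proc/cmdline)"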

################################################################################
#When a kdump is activated
################################################################################

	There are several parameters that control under which circumstances kdump
	is activated. kdump can be activated when

	- system hang is detected through the Non-Maskable Interrupt (NMI)
		Watchdog mechanism.

		This mechanism is enabled through the nmi_watchdog=1 kernel
		parameter. Refer to "What is NMI and what can I use it for?"
		for details.

	- hardware NMI button is pressed.

		This mechanism is enabled by setting the sysctl
		kernel.unknown_nmi_panic=1 .

	- the out-of-memory killer (oom-killer) would otherwise be triggered.
		This can be configured by setting the sysctl vm.panic_on_oom=1

	- "unrecovered" NMI has occurred.

		This mechanism is enabled by setting the sysctl
		kernel.panic_on_unrecovered_nmi=1 . The following
		kernel warning messages are associated with
		"unrecovered" NMIs:

			Uhhuh. NMI received for unknown reason
			*hexnumber* on CPU *CPUnumber*.
			Do you have a strange power saving mode enabled?
			Dazed and confused, but trying to continue

	Under many circumstances it is advisable to enable several tunables
	from the above list. For example, to catch hangs it is advisable to
	enable kernel.unknown_nmi_panic, kernel.softlockup_panic, and also
	nmi_watchdog=1. This increases the likelihood that a vmcore results
	from an event that an administrator is not directly monitoring at
	the time.
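
	The hang-related example above could be captured as sysctl settings
	roughly as follows. This is a sketch only: panic_sysctls is a
	hypothetical helper name, and nmi_watchdog=1 is a boot parameter
	that still has to be added to the kernel command line separately.

```shell
# panic_sysctls
# Print the two sysctl settings named in the hang example above, in the
# "key = value" form used by /etc/sysctl.conf.
panic_sysctls() {
    cat <<'EOF'
kernel.unknown_nmi_panic = 1
kernel.softlockup_panic = 1
EOF
}
```

	The output could be appended to /etc/sysctl.conf and loaded with
	sysctl -p.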

################################################################################
#Caveats
################################################################################

	Update to latest patches
		Many patches affect whether kdump runs correctly

	If clustered
		Fencing needs to allow enough time to get a vmcore

	If on HP hardware
		ASR needs to be disabled on systems with large memory

	Other items
		If you are dumping to local storage and use the hpsa storage
		module, you may run into difficulty capturing a core. In that
		event, please ensure you are on the latest kexec-tools package.

		To output a list of configured dump locations, run the following
		grep command:

			grep -E \
				"path|raw|nfs|ssh|ext4|ext3|ext2|minix|btrfs|xfs|auto" \
				/etc/kdump.conf \
				| grep -v '^#'


		Console frame-buffers and X are not properly supported. On a system
		typically run with something like "vga=791" in the kernel config line
		or with X running, console video will be garbled when a kernel is
		booted via kexec. The kdump kernel should still be able to capture a
		dump, and when the system reboots, video should be restored to
		normal.

		debug_mem_level is a new parameter as of RHEL6.3. It turns on
		debug/verbose output of kdump scripts regarding free/used memory at
		various points of execution. A higher level means more debugging
		output.

		If unable to obtain a kernel dump but the machine can be rebooted,
		consider checking the system's RAM.  
			RPM - memtest86+
			CMD - memtest-setup

	#----------------------------------------------------------------------
	Issue

 		Kdump fails/hangs on HP BL460c G7 using P220i/P410i controller with
 		the following message on console:
 			hpsa 0000:05:00.0: hpsa0: <0x233b> @ IRQ 105 using DAC
 			INFO: task insmod:276 blocked for more than 120 seconds.

	Environment

		Red Hat Enterprise Linux 6
		HP BL460c-G7, Controller P220i, Firmware 1.29
		HP BL460c-G7, Controller P220i, Firmware 3.04
		HP BL460c G7, Controller P410i, Firmware 3.52
		HP BL460c G7, Controller P410i, Firmware 5.06

	Resolution

	Firmware updates
	For HP BL460c G7, Controller P410i, Firmware should be >= 5.70
	For HP BL460c-G7, Controller P220i, Firmware should be >= 3.04

	Make sure kernel parameter "noapic" is not included or passed via 
	the KDUMP_COMMANDLINE_APPEND directive

	File /etc/sysconfig/kdump
		KDUMP_COMMANDLINE_APPEND="irqpoll nr_cpus=1 reset_devices \
			cgroup_disable=memory mce=off"

		grep KDUMP_COMMANDLINE_APPEND /etc/sysconfig/kdump | grep noapic

	If noapic **IS** present in the kdump command line, remove it, save the
	file and then rebuild kdump
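
	The grep check above can be wrapped into a small test. This is a
	sketch only: has_noapic is a hypothetical helper name, and it inspects
	just the physical line that starts with KDUMP_COMMANDLINE_APPEND=, so
	a value continued with a backslash onto the next line is not seen.

```shell
# has_noapic FILE
# Succeed if the KDUMP_COMMANDLINE_APPEND line in FILE contains "noapic"
# as a whole word. FILE would normally be /etc/sysconfig/kdump.
has_noapic() {
    grep '^[[:space:]]*KDUMP_COMMANDLINE_APPEND=' "$1" | grep -qw 'noapic'
}
```

	Example: has_noapic /etc/sysconfig/kdump && echo "remove noapic"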

	#----------------------------------------------------------------------
	Issue

 		Certain HP ProLiant servers may still be unable to generate a crash dump
 		(vmcore file) even if Kdump service is configured correctly.

 	Environment

 		Red Hat Enterprise Linux 5
 		Red Hat Enterprise Linux 6
 		HP ProLiant server, certain models.

	Resolution

	There are a number of issues which should be addressed in order to generate a
	crash dump (vmcore file) on certain models of HP ProLiant servers even after
	the Kdump service is configured correctly. For each of the solutions below,
	double-check whether it is applicable to your exact server model.
	
	Intel-based servers have certain issues with crashdump when intel_iommu
	kernel parameter is not set to off. See the related article "Cannot collect
	a vmcore with kdump while Intel IOMMU is turned enabled" and Red Hat
	Bugzilla 719237 for the details.
	
	KDump may fail to store the crashdump file on a local drive with cciss
	RAID-controller running older firmware versions. See the article kdump fail
	to dump core file to cciss target running firmware versions lower than v5.06
	for the details. Note the comment from HP Support stating that certain
	issues can be resolved with firmware version 5.70 or later.
	
	Kdump may fail on HP ProLiant servers using the hpsa driver for local
	drives. See the article Why does kdump fail on HP system using the 'hpsa'
	driver for storage in Red Hat Enterprise Linux 6? for the details.
	
	HP ProLiant servers with a large amount of RAM, 384G for example, may be
	hitting the issue described in the following HP Advisory HP ProLiant Servers
	with a Large Memory Configuration - Linux kdump May Not Collect a Dump if
	the hp-asrd Service and hpwdt Are Enabled When a Panic Occurs
	
	HP ProLiant server may be hitting the issue described in the following HP
	Advisory: "Red Hat Enterprise Linux 6.2 ... - Linux Kernel May Be Tainted on
	HP ProLiant Servers Configured With an Intel Xeon E5-2600-Series Processor"
	
	HP ProLiant server with a large amount of RAM may simply not have enough
	space on the Kdump target volume to save the crashdump. Configure the
	core_collector option in /etc/kdump.conf to compress the vmcore file so its
	size fits the space available on the partition that it is written to. For
	details on the core_collector option, see the section "Sizing Local Dump
	Targets" of the solution.
	
	Setting crashkernel=auto may not reserve enough memory for the crash kernel
	if a certain number of 3rd-party modules are used. This means that the OOM
	killer can wake up and kill processes while running the crash kernel. In
	this case more memory might have to be reserved with a crashkernel kernel
	parameter. So, if a test crashdump fails it is a good strategy to verify if
	it works with crashkernel=256M@0M or even with crashkernel=768M@0M. If it
	still fails, do further debugging of the memory requirements using the
	debug_mem_level option in /etc/kdump.conf. See more details on this in the
	articles How should the crashkernel parameter be configured for using kdump
	on RHEL6? and kdump memory usage improvements included in Red Hat Enterprise
	Linux 6.2.
	
	Kdump is properly configured and works the first time a crash happens, but
	fails to work on subsequent crashes. See "Why Kdump Validation Fails When
	Invoking Crash Dump Using HP Integrated Lights-Out (iLO) ?"
	
	If Kdump still fails to generate a crashdump, a full console output from
	the moment the crashdump was initiated is needed. Configure kernel log
	redirection to the COM port and save the COM port data according to the
	article "How to setup virtual serial console for a HP system with iLo?".

	#----------------------------------------------------------------------
	Issue

		Kdump Sending Dump File to Root Filesystem Even Though It Should Be Sent
		to Another Server via SSH
		Kdump fails to dump to NFS server 

	Environment

		Red Hat Enterprise Linux 5
		kexec-tools-1.102pre-154

	Resolution

	Update to kexec-tools-1.102pre-161.el5 (from RHBA-2013-0012) or later.
	This addresses an issue tracked through Red Hat private bugzilla #802928.
	Make sure the /etc/sysconfig/network-scripts/ifcfg-bond* files contain
	the line:

		BOOTPROTO=static

	Then force the system to build a new kdump initrd:
	
		# touch /etc/kdump.conf
		# service kdump restart

	Root Cause

		A change in the way kdump handles bonding network devices prevents
		network devices from being configured correctly if they have static IP
		addresses, but are not marked as static devices.

	#----------------------------------------------------------------------
	 Issue

	 	fence_kdump doesn't work when using a bonding device for heartbeat
	 	kdump cannot save vmcore via interface bond1
	 	If the system has more than one bonding device and the kdump target
	 	network file server is reached through bondX (not the first bond
	 	device, bond0), saving the vmcore may fail because the kdump kernel
	 	cannot bring up any interface other than bond0

	 Environment

	 	Red Hat Enterprise Linux 6
	 	Red Hat Enterprise Linux 5
	 	bonding
	 	kdump

	 Resolution

	 Fixed in erratum RHBA-2013:0281-1 (private bug 859824).
	 Alternatively, as a workaround when there are 2 bonding devices in the
	 system, add the line below to /etc/kdump.conf. The max_bonds value must
	 always match the number of bonding devices on the system.

	 	options bonding max_bonds=2

	 Root Cause

	 The max_bonds parameter specifies the number of bonding devices to
	 create for this instance of the bonding driver. E.g., if max_bonds is 3,
	 and the bonding driver is not already loaded, then bond0, bond1 and
	 bond2 will be created. The default value is 1.

	 Diagnostic Steps

	 Check to see whether your heartbeat network is on a bond device other
	 than bond0

	#----------------------------------------------------------------------
	 Issue

	 	When using kdump over NFS with a target specified as hostname, the
	 	resolving of the IP address does not work during startup.
	 	unable to mount NFS during a kdump

	 Environment

	 	Red Hat Enterprise Linux 5
	 	Red Hat Enterprise Linux 6
	 	kdump
	 	bnx2 driver

	 Resolution

	 The bnx2 driver takes some time to initialize. Adding a delay to the
	 kdump configuration will fix the issue:

	 Append link_delay 60 to /etc/kdump.conf

	 Rebuild the kdump initrd (service kdump restart)

	 Root Cause

	 The network card (bnx2) needs some time to initialize.

	 Diagnostic Steps

	 Ensure that all configuration in /etc/kdump.conf is correct
	 Touch a file under /nfs/location to ensure that it's writable
	 Watch a kdump in progress: if kdump has started but no files are being
	 transferred to the server, the issue may be network related.

	#----------------------------------------------------------------------
	Issue

		On system with BCM5718 the kdump to remote host using ssh fails 3 
		out of 5 tries because no link is detected.  
		The following messages are displayed when it fails:

			mapping eth0 to eth0
			Saving to remote location root@192.168.110.23
			lost connection
			Attempting to enter user-space to capture vmcore

		Please note that:
		The normal kernel always works.
		The problem doesn't happen when testing a NetXtreme BCM5709 NIC, which
		however uses the bnx2 driver instead of tg3.
		Using a small statically compiled application to set autoneg on after
		bringing the interface UP, to force a PHY reset, didn't help.
		Passing acpi=off didn't help.
		Adding "link_delay 120" still fails 1 out of 5 tries:

			eth0 Link Up.  Waiting 120 Seconds
			Continuing
			Saving to remote location root@XXX.XXX.XXX.XXX
			lost connection
			...

			Shutting down interface eth0:  tg3: eth0: Link is up at 1000 Mbps,
			full duplex.
			tg3: eth0: Flow control is on for TX and on for RX

		If the interface used by kdump is NOT brought UP in the normal kernel,
		then it works in 100% of the attempts.
		Looping over ifup eth0; ifdown eth0; in the normal kernel always gets
		a link established.

	Environment

		Red Hat Enterprise Linux 5.6
		Red Hat Enterprise Linux 6.1

	Resolution

	In RHEL5, update to kernel-2.6.18-348.el5 (from RHBA-2013-0006) or later.
	In RHEL6, update to kernel-2.6.32-279.el6 (from RHSA-2012-0862) or later.

	Root Cause

	The kdump kernel maintains the configuration of MSI-X interrupts as created
	by the crashed kernel but enables only one CPU in the new environment.
	Previously, this caused the tg3 driver to abort MSI-X setup which caused
	interrupt delivery to fail. Consequently, the link became unavailable and
	any attempt to dump a core file to a remote host failed. With this
	update, the tg3 driver has been modified to enforce single-vector MSI-X 
	interrupt mode by disabling the multi-vector interrupt mode for tg3 in the 
	kdump kernel. The NIC is now brought up as expected and kdump can 
	successfully dump a core file to the remote host in this scenario.

	Diagnostic Steps

	The problem was confirmed with 2.6.18-238.5.1.el5, the latest RHEL5
	kernel at the time.
	Check if the interface works when it has not been initialized in the normal
	kernel.
	Check if the interface works when restarted a few times in the normal kernel.
	A small program was provided to issue a PHY reset (turn autoneg on) to see
	if the link is negotiated. (A first test with 5 seconds seemed not enough,
	increasing to 20s..)

	#----------------------------------------------------------------------
	Issue

	 	Kdump fails on a KVM guest if balloon memory is involved
	 	no vmcore is dumped due to an OOM-killer storm within Kdump context

	Environment

		Red Hat Enterprise Linux 6 KVM guest
		kdump attempting to dump a vmcore

	Resolution

	Update kexec-tools package to the following errata: RHBA-2013-0281

	As a workaround, one can issue the following commands at a root shell
	session:

		i.  echo "blacklist virtio_balloon" >> /etc/kdump.conf
		ii. touch /etc/kdump.conf && /etc/init.d/kdump restart
	
	
################################################################################
#KDUMP config: READMEs
################################################################################

	#######################################
	#Time needed to get a dump
	#######################################

	  Dumping time depends on the options that are used for its configuration.
	  Below are some of the factors which should be taken into consideration.
	
		Storage speed
		Memory speed
		Data Compression used
		Dump filter level
		Network if dump target is remote storage

	  An estimate can be made by dumping the whole memory to disk and measuring
	  the time needed; that time can be taken as a probable value for dumping
	  the vmcore. It is only a generic value, however: no specific table gives
	  time statistics.


	#######################################
	#Dump Level - Pages to filter
	#######################################

      #Uses a BIT MASK
 	  	#makedumpfile [-d DL]:

      #Specify the type of unnecessary page for analysis.
      #Pages of the specified type are not copied to DUMPFILE. The page type
      #marked in the following table is excluded. A user can specify multiple
      #page types by setting the sum of each page type for Dump_Level (DL).
      #The maximum of Dump_Level is 31.
      #Note that Dump_Level for Xen dump filtering is 0 or 1.

	  # DL of 0  - gets ALL pages
	  # DL of 1  - gets all pages EXCEPT zero pages
	  # DL of 17 - exclude Zero pages (1) and Free pages (16) = 17
	  # DL of 31 - gets ONLY active pages (excludes all other types)

      #       |         cache    cache
      # Dump  |  zero   without  with     user    free
      # Level |  page   private  private  data    page
      # ------+---------------------------------------
      #    0  |
      #    1  |   X
      #    2  |           X
      #    4  |           X        X
      #    8  |                            X
      #   16  |                                    X
      #   31  |   X       X        X       X       X

	  RH commonly shows 31 (only active pages, nothing else)
	  Matt suggests     17,31 (no free and no zero pages :OR: only active)
	                    if there is "not enough room" at level 17, it goes
	                    to level 31 and retries
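
	  The bit mask in the table above can be decoded with a few lines of
	  shell. This is a sketch only: excluded_pages and the short page type
	  labels it prints are made up for illustration and are not
	  makedumpfile output.

```shell
# excluded_pages LEVEL
# Print the page types that makedumpfile -d LEVEL leaves out of the dump,
# following the bit mask table above. Bit 4 also drops cache pages without
# private data, as the table row for level 4 shows.
excluded_pages() (
    level=$1
    out=""
    [ $(( level & 1 ))  -ne 0 ] && out="$out zero"
    [ $(( level & 6 ))  -ne 0 ] && out="$out cache"
    [ $(( level & 4 ))  -ne 0 ] && out="$out cache-private"
    [ $(( level & 8 ))  -ne 0 ] && out="$out user"
    [ $(( level & 16 )) -ne 0 ] && out="$out free"
    echo "${out# }"
)
```

	  For example, excluded_pages 17 prints "zero free", matching the
	  "DL of 17" line above.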

	#######################################
	#Size crashkernel value
	#######################################

	  Matt suggests:    crashkernel=auto

	  	Otherwise, if we ever grow/change memory on a system, we have to
		remember to change this.
		Use of auto is recommended in RHEL7.
		Starting with RHEL6.2 kernels crashkernel=auto should be used.
			But there are caveats with memory size (see notes below).
			Before that, size values must be calculated.

		Use (with the chosen value in $crashkernel_para):
			grubby --update-kernel=DEFAULT --args="crashkernel=$crashkernel_para"

		###################
		#RHEL 5
		###################
		crashkernel=memory@offset

		+---------------------------------------+
		| RAM       | crashkernel | crashkernel |
		| size      | memory      | offset      |
		|-----------+-------------+-------------|
		|  0 - 2G   | 128M        | 16M         | 
		| 2G - 6G   | 256M        | 24M         | 
		| 6G - 8G   | 512M        | 16M         |
		| 8G - 24G  | 768M        | 32M         |
		+---------------------------------------+
		For RAM size greater than 24G:

		Try a crashkernel memory of 768M with an offset of 32M, which
		looks like 768M@32M.
		If you get an Out-Of-Memory error message, then try increasing
		the crashkernel memory to 896M.
		Depending on your system BIOS memory layout, you may need to alter
		the offset.

		A complete procedure to correctly determine the maximum possible size
		and precise offset is located at:

		"How to properly calculate the crashkernel setting?"

		Additional Notes

		Always test to ensure that the kdump service starts correctly and
		that the system is able to correctly dump by initiating a test.

		The offset for the kdump memory reservation (crashkernel=X@Y) must be
		specified in RHEL5. Not specifying offset (crashkernel=X) is not a
		valid configuration under RHEL5, although it is valid under RHEL6.

		kdump fails to initialise with crashkernel=1024M@16M on RHEL5 kernels
		earlier than 2.6.18-274.el5

		RHEL6's kdump is more memory-efficient than RHEL5's. It is likely
		more memory will need to be assigned on RHEL5 than on the same system
		running RHEL6.

		For settings of kdump on other version of Red Hat Enterprise Linux,
		please refer to: How should the crashkernel parameter be configured
		for using kdump on Red Hat Enterprise Linux?
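
		The RHEL 5 sizing table above can be written as a lookup. This is
		a sketch only: rhel5_crashkernel is a hypothetical helper, and
		assigning a boundary value (for example exactly 2G) to the lower
		row is an assumption, since the table ranges overlap at their
		edges. Above 24G the returned 768M@32M is just the suggested
		starting point.

```shell
# rhel5_crashkernel RAM_GB
# Map installed RAM (whole gigabytes) to the crashkernel=memory@offset
# value from the RHEL 5 table above.
rhel5_crashkernel() {
    if   [ "$1" -le 2 ]; then echo "128M@16M"
    elif [ "$1" -le 6 ]; then echo "256M@24M"
    elif [ "$1" -le 8 ]; then echo "512M@16M"
    else                      echo "768M@32M"
    fi
}
```

		Example: crashkernel_para=$(rhel5_crashkernel 16)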

		###################
		#RHEL 6
		###################
		Configuring crashkernel on RHEL6.0 and RHEL6.1 kernels

		The warning

			Your running kernel is using more than 70% of the amount of
			space you reserved for kdump, you should consider increasing
			your crashkernel reservation

		is printed by code in the script /etc/init.d/kdump.

		The involved code first reads the Slab value from /proc/meminfo.
		Slab is the in-kernel data structures cache. Its value depends on
		the total amount of RAM present in the system as well as on other
		factors; it is not constant and can change during operation of the
		server. If the Slab value is bigger than 70% of the memory that was
		reserved with the crashkernel parameter, the warning is printed.

		Some mappings of RAM and appropriate crashkernel values:
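		+------------------------------------------------+
		| RAM       | crashkernel | RAM / crashkernel    |
		| size      | parameter   | factor               |
		|-----------+-------------+----------------------|
		| > 0GB     | 128MB       | 15                   |
		| > 2GB     | 256MB       | 23                   |
		| > 6GB     | 512MB       | 15                   |
		| > 8GB     | 768MB       | 31                   |
		+------------------------------------------------+
		The last column contains the RAM/crashkernel factor.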

		The table is covered by the following crashkernel configuration:

			crashkernel=0M-2G:128M,2G-6G:256M,6G-8G:512M,8G-:768M

		For servers with more RAM it is recommended to compute the
		crashkernel parameter using the factors that have been observed so
		far: a factor of 15 stays on the safe side (maybe wasting memory),
		and a factor of 20 should also work. Please also note that the
		maximum amount of memory that should be reserved here is 896M, as
		outlined in (private) bz580843.
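
		To see which reservation the range syntax above selects for a
		given machine, the list can be evaluated by hand or with a sketch
		like the one below. to_mb and pick_crashkernel are hypothetical
		helper names; the matching rule (start <= RAM < end, with an empty
		end meaning no upper limit) is an assumption about how the kernel
		reads the ranges, and only M and G suffixes are handled.

```shell
# to_mb SIZE
# Convert a size with an M or G suffix (for example 128M or 2G) to megabytes.
to_mb() {
    case "$1" in
        *G) echo "$(( ${1%G} * 1024 ))" ;;
        *M) echo "${1%M}" ;;
        *)  return 1 ;;
    esac
}

# pick_crashkernel RANGES RAM_MB
# Print the size that a range list such as
# 0M-2G:128M,2G-6G:256M,6G-8G:512M,8G-:768M selects for RAM_MB megabytes.
pick_crashkernel() (
    ram_mb=$2
    IFS=,                       # split the list on commas
    set -f                      # no glob expansion while splitting
    for entry in $1; do
        range=${entry%%:*}
        size=${entry#*:}
        start_mb=$(to_mb "${range%%-*}") || return 1
        end=${range#*-}
        if [ "$ram_mb" -lt "$start_mb" ]; then
            continue
        fi
        if [ -z "$end" ] || [ "$ram_mb" -lt "$(to_mb "$end")" ]; then
            echo "$size"
            return 0
        fi
    done
    return 1
)
```

		Example: pick_crashkernel "0M-2G:128M,2G-6G:256M,6G-8G:512M,8G-:768M" 4096
		selects the 2G-6G range.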

		Configuring crashkernel on RHEL6.2 (and later) kernels

		Starting with RHEL6.2 kernels crashkernel=auto should be used. The
		kernel will automatically reserve an appropriate amount of memory for
		the kdump kernel.

		Keep in mind that this is a best-effort memory reservation and might
		not meet the needs of all systems (especially configurations with
		lots of IO cards and loaded drivers). So always make sure that the
		memory reserved by crashkernel=auto is sufficient for the target
		machine by testing kdump. If it is not, reserve more memory with the
		syntax crashkernel=XM (X is the amount of memory to reserve in
		megabytes).

		Additionally some improvements have been made in the RHEL6.2 kernel
		which have reduced the overall memory requirements of kdump. For more
		details refer to article kdump memory usage improvements included in
		Red Hat Enterprise Linux 6.2.

		The amount of memory reserved for the kdump kernel can be estimated
		with the following scheme:

			base memory to be reserved = 128MB
			an additional 64MB is added for each TB of physical RAM present
			in the system

		So, for example, if a system has 1TB of memory, 192MB (128MB + 64MB)
		will be reserved.

		Note: It is recommended to verify that kdump is working on all
		systems after installation of all applications. The memory reserved
		by crashkernel=auto takes only typical RHEL configurations into
		account. If 3rd party modules are used, more memory might have to be
		reserved. Thus, if a test dump fails, it is a good strategy to verify
		whether it works with crashkernel=768M@0M and, if it does, to debug
		the memory requirements further using the debug_mem_level option in
		/etc/kdump.conf. Until a test dump works without failure, kdump
		should not be considered properly configured.

		Note: Prior to the 6.3GA release, crashkernel=auto will only reserve
		memory on systems with 4GB or more physical memory. If the system has
		less than 4GB of memory the memory must be reserved by explicitly
		requesting the reservation size, for example: crashkernel=128M. Since
		the 6.3GA release (kernel-2.6.32-279.el6), this limit has been
		lowered to 2GB.

		Note: Some environments still require manual configuration of the
		crashkernel option, for example if dumps to very large local
		filesystems are performed. Please refer to kdump fails with large
		ext4 file system because fsck.ext4 gets OOM-killed for details.

		Further information

		If you are experiencing problems with your crashkernel setting, see
		"How to properly size and position the crashkernel?"
		For settings of kdump on other versions of Red Hat Enterprise Linux,
		please refer to:
		"How should the crashkernel parameter be configured for using kdump on
		Red Hat Enterprise Linux?"

		Root Cause

		A number of improvements related to crashkernel=auto and memory
		requirements of kdump have been made in the RHEL6.2 kernel.

		Diagnostic Steps

		The method used (pre-6.2) to calculate the approximate amount of RAM
		the normal kernel is using (from /etc/init.d/kdump):

			KMEMINUSE=`awk '/Slab:.*/ {print $2}' /proc/meminfo`

		Question: Is it possible to find out how much memory was reserved for
		the kdump kernel?
		Answer: This is available when executing cat /proc/cmdline. Even when
		the kernel was started with crashkernel=auto, /proc/cmdline will
		contain the computed value that got reserved. To verify that
		crashkernel=auto was really used, the contents of /var/log/dmesg can
		be used.

			cat /proc/cmdline
			cat /sys/kernel/kexec_crash_size

		Question: I found out that 'sync; echo 3 > /proc/sys/vm/drop_caches'
		frees up Slab, can I use this regularly and then use a lower value
		for 'crashkernel'?
		Answer: This is not recommended. This command drops filesystem
		caches. When processes request data after its execution, the data
		has to be read from disk/block devices again, resulting in degraded
		system performance.

		Question: I set up kdump on my system. When kdump is triggered, the
		kdump kernel is not loaded completely.
		Answer: Are 3rd party drivers in use on the system, changing memory
		requirements? Does the system successfully kdump when
		crashkernel=768M@0M is used, or a different manual allocation that is
		bigger than the amount of memory that crashkernel=auto did reserve
		for the crash kernel? If this is the case, then the
		debug_mem_level option in /etc/kdump.conf can be used to find out
		the required amount of memory, and the memory that has to be
		reserved for the crashkernel can be cut down.
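
		The value in /sys/kernel/kexec_crash_size from the first answer is
		given in bytes. A small sketch to print it in megabytes follows;
		reserved_mb is a hypothetical helper name, and the optional FILE
		argument exists only so the function can be tried on a copy.

```shell
# reserved_mb [FILE]
# Print the crash kernel reservation in megabytes. FILE defaults to
# /sys/kernel/kexec_crash_size, which holds the size in bytes.
reserved_mb() {
    bytes=$(cat "${1:-/sys/kernel/kexec_crash_size}") || return 1
    echo "$(( bytes / 1024 / 1024 ))"
}
```

		A result of 0 means that no memory is currently reserved for the
		kdump kernel.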

		###################
		#RHEL 7
		###################
		Starting with RHEL7 kernels crashkernel=auto should be used. The
		kernel will automatically reserve an appropriate amount of memory for
		the kdump kernel.

		Keep in mind that this is a best-effort memory reservation and might
		not meet the needs of all systems (especially configurations with
		lots of IO cards and loaded drivers). So always make sure that the
		memory reserved by crashkernel=auto is sufficient for the target
		machine by testing kdump. If it is not, reserve more memory with the
		syntax crashkernel=XM (X is the amount of memory to reserve in
		megabytes).

		The amount of memory reserved for the kdump kernel can be estimated
		with the following scheme:

			base memory to be reserved = 160MB
			an additional 2 bits are added for every 4 KB of physical RAM
			present in the system

		So, for example, if a system has 1TB of memory, 224 MB is the
		minimum (160 MB + 64 MB).

		Note: It is recommended to verify that kdump is working on all
		systems after installation of all applications. The memory reserved
		by crashkernel=auto takes only typical RHEL configurations into
		account. If 3rd party modules are used, more memory might have to be
		reserved. Thus, if a test dump fails, it is a good strategy to verify
		whether it works with crashkernel=768M@0M and, if it does, to debug
		the memory requirements further using the debug_mem_level option in
		/etc/kdump.conf. Until a test dump works without failure, kdump
		should not be considered properly configured.
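
		The RHEL 7 estimate described above (160 MB plus 2 bits per 4 KB
		of RAM) works out as follows. This is a sketch of the arithmetic
		only: rhel7_auto_estimate_mb is a hypothetical helper, it takes
		RAM in whole gigabytes, and it rounds the per-page part down to
		whole megabytes.

```shell
# rhel7_auto_estimate_mb RAM_GB
# Estimate the crashkernel=auto reservation in MB from the scheme above.
rhel7_auto_estimate_mb() {
    pages=$(( $1 * 1024 * 1024 / 4 ))             # 4 KB pages in RAM_GB
    extra_mb=$(( pages * 2 / 8 / 1024 / 1024 ))   # 2 bits per page, in MB
    echo "$(( 160 + extra_mb ))"
}
```

		For 1TB (1024 GB) this gives 268435456 pages, 64 MB of extra
		memory, and a total of 224 MB, matching the example in the text.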

		Note: RHEL7 with crashkernel=auto will only reserve memory on systems
		with 2GB or more physical memory. If the system has less than 2GB of
		memory the memory must be reserved by explicitly requesting the
		reservation size, for example: crashkernel=128M.

		Note: Some environments still require manual configuration of the
		crashkernel option, for example if dumps to very large local
		filesystems are performed. Please refer to kdump fails with large
		ext4 file system because fsck.ext4 gets OOM-killed for details.

		Further information

		RHEL7 product documentation: Kernel Crash Dump Guide
		RHEL7 product documentation: Kernel Crash Dump Guide, kdump memory
		requirements
		"How should the crashkernel parameter be configured for using kdump on
		Red Hat Enterprise Linux?"

		Root Cause

		A number of improvements related to crashkernel=auto and memory
		requirements of kdump have been made in RHEL7.


	#######################################################
	#What is the SysRq Facility and how do I use it?
	#######################################################

		#The SysRq facility is one of the best (and sometimes the only) ways
		#to determine what a machine is really doing. When triggered, SysRq
		#will send a signal requesting some diagnostic information to the
		#operating system kernel. This is most useful when a system appears
		#to be "hung", and for diagnosing elusive, transient, kernel-related
		#problems.
	
		#What is the "Magic" SysRq key?
		#According to the Linux kernel documentation:
		#It is a 'magical' key combo you can hit to which the kernel will
		#respond regardless of whatever else it is doing, even if the
		#console is unresponsive.
	
		#How do I enable and disable the SysRq key?
		#For security reasons, Red Hat Enterprise Linux disables the SysRq
		#key by default. To enable it, run:
	
			echo 1 > /proc/sys/kernel/sysrq
	
		#And to disable it again:
	
			echo 0 > /proc/sys/kernel/sysrq
	
	
		#To enable it permanently, set the kernel.sysrq value in
		#/etc/sysctl.conf to 1. This will cause it to be enabled on reboot.
	
			 # grep sysrq /etc/sysctl.conf
			 kernel.sysrq = 1
	
		#Since enabling SysRq gives someone with physical console access
		#extra abilities, it is recommended to disable it when not
		#troubleshooting a problem or to ensure that physical console access
		#is properly secured.
	
		#How do I trigger a SysRq event?
	
		#There are several ways to trigger a SysRq event. On most
		#architectures SysRq events can be triggered from the console with
		#the following key combination:
	
			  Alt+PrintScreen+[CommandKey]
	
		#For instance, to tell the kernel to dump memory info (command key
		#"m"), you would hold down the Alt and Print Screen keys, and then
		#hit the m key.
	
		#Note that this will not work from an X Window System screen. You
		#should first change to a text virtual terminal. Hit Ctrl+Alt+F1 to
		#switch to the first virtual console prior to hitting the SysRq key
		#combination.
	
		#On a serial console, you can achieve the same effect by sending a
		#Break signal to the console and then hitting the command key within
		#5 seconds. This also works for virtual serial console access
		#through an out-of-band service processor or remote console like HP
		#iLO, Sun ILOM and IBM RSA. Refer to service processor specific
		#documentation for details on how to send a Break signal; for
		#example, "How to trigger SysRq over an HP iLO Virtual Serial Port (VSP)".
	
		#If you have a root shell on the machine (and the system is
		#responding enough for you to do so), you can also write the command
		#key character to the /proc/sysrq-trigger file. This is useful for
		#triggering this info when you are not on the system console or for
		#triggering it from scripts.
	
			  echo 'm' > /proc/sysrq-trigger
	
		#This method has the additional benefit of working even when
		#kernel.sysrq is set to 0.
	
		#When I trigger a SysRq event that generates output, where does it go?
		#When a SysRq command is triggered, the kernel will print out the
		#information to the kernel ring buffer and to the system console.
		#This information is normally logged via syslog to /var/log/messages.
	
		#Unfortunately, when dealing with machines that are extremely
		#unresponsive, syslogd is often unable to log these events. In these
		#situations, provisioning a serial console is often recommended for
		#collecting the data.
	
		#What sort of SysRq events can be triggered?
		#There are several SysRq events that can be triggered once the SysRq
		#facility is enabled. These vary somewhat between kernel versions,
		#but there are a few that are commonly used:
	
			  -m - dump information about memory allocation
			  -t - dump thread state information
			  -p - dump current CPU registers and flags
			  -c - intentionally crash the system (useful for forcing a disk or
			  		netdump)
			  -s - immediately sync all mounted filesystems
			  -u - immediately remount all filesystems read-only
			  -b - immediately reboot the machine
			  -o - immediately power off the machine (if configured and
			  		supported)
			  -f - start the Out Of Memory Killer (OOM)
			  -w - dump tasks that are in uninterruptible (blocked) state -
			  		[Introduced with kernel 2.6.32]
	
		#Before using the SysRq facility, please consult with your vendors
		#as third party applications may be impacted.
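
		#A minimal sketch of a guarded way to send the command keys listed
		#above from a root shell. The function name sysrq_send and the
		#SYSRQ_FORCE variable are hypothetical, not part of any package. It
		#refuses the disruptive keys (c, b, o) unless explicitly forced and
		#rejects keys that are not in the list above.

```shell
# sysrq_send KEY [TRIGGER_FILE]
# Writes KEY to the SysRq trigger file (default /proc/sysrq-trigger).
# Keys c (crash), b (reboot) and o (power off) need SYSRQ_FORCE=1.
sysrq_send() {
    key=$1
    trigger=${2:-/proc/sysrq-trigger}
    case $key in
        c|b|o)
            if [ "${SYSRQ_FORCE:-0}" != 1 ]; then
                echo "refusing '$key': set SYSRQ_FORCE=1 to allow it" >&2
                return 1
            fi
            ;;
        m|t|p|s|u|f|w)
            ;;
        *)
            echo "unknown SysRq key '$key'" >&2
            return 2
            ;;
    esac
    printf '%s\n' "$key" > "$trigger"
}
```

		#Example (as root): sysrq_send m
		#To force a panic on purpose: SYSRQ_FORCE=1 sysrq_send c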
	
################################################################################
#KDUMP config: BEGIN
################################################################################
	
	#######################################
	# Some manpages
	#######################################

		# man makedumpfile
		# man makedumpfile.conf
		# man crash
	
	#######################################
	# Areas needing changes
	#######################################

		#need pkg: kexec-tools
		#grub.conf: add kernel option
		#kdump.conf: add/modify
		#sysctl.conf: add entries
		#turn on: chkconfig kdump on
		#reboot server to enable changes to system

	#######################################
	# 1 - Local (raw or fs) or NFS or SSH
	#######################################

		#######################################
		#PRE - all methods
		#######################################
		rpm -q kexec-tools || yum -y install kexec-tools

			#If ppc64|s390x -> yum -y install kernel-kdump

		#######################################
		#Local
		#######################################
			#######################################
			#Local - Filesystem (preferred)
			#######################################
			#Disk(s) should be equal to memory, but, if you use the right options
			#it will only use somewhere from 3-10GB for the active pages

			#We will be using /crash_dir for our area (shared with mondo ISOs)
	
			# FS - should already be there
					MYDISK=/dev/mapper/mpathdX
					NEWSIZE=xG

					pvcreate $MYDISK [$MYDISK ...]
					vgcreate vgcrash $MYDISK [$MYDISK ...]
					lvcreate -n lvcrash -L $NEWSIZE vgcrash
					mkfs -t ext3 /dev/mapper/vgcrash-lvcrash

				#get UUID
				blkid /dev/mapper/vgcrash-lvcrash | tee /tmp/myuuid

			vi /etc/kdump.conf
				#path /var/crash
				#path /

				path /crash_dir

				ext3 UUID=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
					:r /tmp/myuuid

				core_collector makedumpfile -c --message-level 1 -d 17,31

				default reboot
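
			#A minimal sketch that builds the filesystem line for kdump.conf
			#from the blkid output saved in /tmp/myuuid, instead of pasting
			#the UUID by hand. The function name kdump_fs_line is
			#hypothetical; it assumes blkid's usual output format of
			#DEVICE: UUID="..." TYPE="...".

```shell
# kdump_fs_line FSTYPE BLKID_FILE
# Prints "FSTYPE UUID=<uuid>" using the first UUID found in BLKID_FILE.
kdump_fs_line() {
    fstype=$1
    uuid=$(sed -n 's/.* UUID="\([^"]*\)".*/\1/p' "$2" | head -n 1)
    if [ -z "$uuid" ]; then
        echo "no UUID found in $2" >&2
        return 1
    fi
    printf '%s UUID=%s\n' "$fstype" "$uuid"
}
```

			#Example: kdump_fs_line ext3 /tmp/myuuid
			#Review the printed line before adding it to /etc/kdump.conf.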

			#######################################
			#Local - raw
			#######################################
			#Disk should be equal to memory, but, if you use the right options
			#it will only use somewhere from 3-10GB for the active pages
	
			#This extra disk could be temporary.  Added for a duration that
			#the issue is happening, until it is solved, then removed.

			# raw use, no FS
				NEWSIZE=XXXG
				MYDISK=/dev/mapper/mpathdX
				pvcreate $MYDISK [$MYDISK ...]
				vgcreate vgcrash $MYDISK [$MYDISK ...]
				lvcreate -n lvrawcrash -L $NEWSIZE vgcrash

				#a raw target has no filesystem, so there is no UUID to look
				#up; just confirm that the LV exists
				lvs vgcrash

			vi /etc/kdump.conf
				path /
				raw /dev/mapper/vgcrash-lvrawcrash
				core_collector makedumpfile -c --message-level 1 -d 17,31
				default reboot

		#######################################
		#NFS
		#######################################
			#setup normal export rules for the client's access, 
			# root write needed
			#
			# old: net
			# new: nfs, nfs4
			#

			vi /etc/kdump.conf
				nfs NFSSERVER:EXPORTED-MTPT
				#example:  nfs sever.com:/crashcores
					#In the NFSMOUNT directory, kdump will create a subdir of: 
					#       ./var/crash/%HOSTIP-%DATE
				core_collector makedumpfile -c --message-level 1 -d 17,31
				default reboot


				#
				# Example output from a kdump on NFS mounts
				#
				#sever.com:/unixteam/ISO  108G   97G  5.8G  95% /nfs/matt
				#ls -lRh /nfs/matt/var
				#/nfs/matt/var:
				#total 4.0K
				#drwxr-xr-x 4 nobody nobody 4.0K Aug 28 15:47 crash
				#
				#/nfs/matt/var/crash:
				# ...  10.131.164.40-2015-08-28-19:47:47
				# ...  10.131.248.220-2015-08-28-14:57:14
				#
				#/nfs/matt/var/crash/10.131.164.40-2015-08-28-19:47:47:
				#total 200M
				#-rw------- 1 nobody nobody 199M Aug 28 15:51 vmcore
				#-rw-r--r-- 1 nobody nobody  77K Aug 28 15:47 vmcore-dmesg.txt
				#
				#/nfs/matt/var/crash/10.131.248.220-2015-08-28-14:57:14:
				#total 1.1G
				#-rw------- 1 nobody nobody 1.1G Aug 28 15:17 vmcore
				#-rw-r--r-- 1 nobody nobody  64K Aug 28 14:57 vmcore-dmesg.txt

		#######################################
		#SSH
		#######################################
			#have enough room on other server
			#have account there
			#have ssh keys
			#
			# old: net
			# new: ssh
			#

			vi /etc/kdump.conf
				path /var/crash
				ssh USER@SERVER
					#vmcore is copied to USER@SERVER:/var/crash/%HOST-%DATE/
				core_collector makedumpfile -c --message-level 1 -d 17,31
				default reboot

	#########################################
	# 2 - KDump-Helper ..or.. GUI ..or.. CLI
	#########################################

		#
		# the example here uses a RAW local disk
		# adjust those references if you use a different dump target
		#

		#######################################
		#  KDump-Helper
		#######################################
		# Get from internal location 
		#  OR from access.redhat.com/labs/kdumphelper
		#     which asks you config questions and gives you an output of:
		#       a script to run to change things, 
		#       or discrete files to put in place
			kdumpconfig.sh

		#######################################
		#  CLI
		#######################################
		#Edit kdump.conf, enable/set the following
			#vi /etc/kdump.conf
			#should have been done in previous step 1
		
		#Update grub.conf
			#Make sure you have "crashkernel=auto" at the end of the default
			#kernel line

			grep crashkernel /boot/grub/grub.conf

			#we are using auto; some older versions and some situations
			#have issues with it, but it gives the best success rate and
			#consistency

			grubby --update-kernel=DEFAULT --args=crashkernel=auto

			grep crashkernel /boot/grub/grub.conf
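
			#The greps above only show what is configured for the next boot.
			#A minimal sketch to see what the running kernel was actually
			#booted with; the function name crashkernel_arg is hypothetical.

```shell
# crashkernel_arg [CMDLINE_FILE]
# Prints the value of the first crashkernel= option found on the kernel
# command line (default: /proc/cmdline of the running kernel).
# Prints nothing when the option is absent.
crashkernel_arg() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" |
        sed -n 's/^crashkernel=//p' |
        head -n 1
}
```

			#Example:
			#  [ -n "$(crashkernel_arg)" ] || echo "crashkernel= not active yet"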

		#Update /etc/sysctl.conf
			SYSCTL_CONF=/etc/sysctl.conf

			#comment out an existing line value
			sed -i 's/^kernel.sysrq/#kernel.sysrq/g' $SYSCTL_CONF

			echo 'kernel.sysrq=1' \
					>> $SYSCTL_CONF

			sed -i 's/^kernel.unknown_nmi_panic/#kernel.unknown_nmi_panic/g' \
						$SYSCTL_CONF

			echo 'kernel.unknown_nmi_panic=1' \
					>> $SYSCTL_CONF

			sysctl -p
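
			#The sed/echo pairs above append a new line on every run. A
			#minimal sketch of a re-runnable alternative that rewrites an
			#existing entry in place and appends only when the key is
			#missing. The function name set_sysctl_conf is hypothetical, and
			#it assumes simple values (no "|" or "&" characters).

```shell
# set_sysctl_conf KEY VALUE [FILE]
# Ensures FILE (default /etc/sysctl.conf) holds exactly "KEY = VALUE" for
# every uncommented line that sets KEY; appends the line if KEY is absent.
set_sysctl_conf() {
    key=$1
    value=$2
    conf=${3:-/etc/sysctl.conf}
    # Escape the dots so the key is matched literally in the patterns below.
    pat=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -q "^[[:space:]]*$pat[[:space:]]*=" "$conf"; then
        tmp=$(mktemp) || return 1
        # Write back with cat so the file keeps its owner and permissions.
        sed "s|^[[:space:]]*$pat[[:space:]]*=.*|$key = $value|" "$conf" > "$tmp" &&
            cat "$tmp" > "$conf"
        rc=$?
        rm -f "$tmp"
        return $rc
    else
        printf '%s = %s\n' "$key" "$value" >> "$conf"
    fi
}
```

			#Example:
			#  set_sysctl_conf kernel.sysrq 1
			#  set_sysctl_conf kernel.unknown_nmi_panic 1
			#  sysctl -p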

		#######################################
		#  GUI
		#######################################
		#lots of pkgs - dependencies
			yum -y install system-config-kdump.noarch xterm
	
		#get X setup to display where you have a server
			export DISPLAY=X.X.X.X:0

		system-config-kdump

	#######################################
	#POST - all methods
	#######################################
	chkconfig boot.kdump on
	service   boot.kdump start
	chkconfig kdump on
	service   kdump start     #will fail, but ignore, need to reboot

		#IGNORE Error
		#     Warning: There might not be enough space to save a vmcore.
		#     The size of UUID=xxxxxxx should be greater than xxxxxx kilo bytes.
		#UNLESS you need a FULL core of all pages, in which case your disk 
		#     needs to be at least equal to the size of memory.
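
		#A minimal sketch for the FULL core case: compare total memory with
		#the free space in the dump directory. The function names
		#mem_total_kb, free_kb and fits_full_core are hypothetical.

```shell
# mem_total_kb [MEMINFO_FILE]
# Prints MemTotal in kB (default source: /proc/meminfo).
mem_total_kb() {
    awk '/^MemTotal:/ {print $2}' "${1:-/proc/meminfo}"
}

# free_kb DIR
# Prints the kB available on the filesystem that holds DIR.
free_kb() {
    df -Pk "$1" | awk 'NR==2 {print $4}'
}

# fits_full_core DIR [MEMINFO_FILE]
# Succeeds when DIR has at least MemTotal kB available.
fits_full_core() {
    need=$(mem_total_kb "$2")
    have=$(free_kb "$1")
    [ "$have" -ge "$need" ]
}
```

		#Example:
		#  fits_full_core /crash_dir || echo "not enough room for a full core"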

	#NOTE: reboot the system IF it has not been rebooted since the
	#      crashkernel= option was added; otherwise restarting kdump
	#      rebuilds the initrd file and a reboot is not needed

	service kdump status

#######################################
#KDUMP config: END
#######################################

################################################################################
#Managing a crash/hang
################################################################################

	###################
	#VM
	###################
	#You have an option for systems that hang, to use:
	#	/usr/bin/vmss2core
	#You will need to create a snapshot of the VM guest
	#See access.redhat.com/solutions/411653

	###################
	#FORCING A CRASH
	###################
		
		#
		#if in a cluster, to ensure a "quick crash"
		#
		# echo "exit 1" > /tmp/kdump_pre; chmod 755 /tmp/kdump_pre
		#
		# vi /etc/kdump.conf
		#    kdump_pre /tmp/kdump_pre
		#    default halt
		#
		# service kdump restart
		#
		
	#######	
	#######	 crash panic crash panic
	####### *** !!!  WARNING - system will panic immediately !!!  *** #######
	#######	
	#######	
		#####  echo 'c' > /proc/sysrq-trigger
	#######	
	#######	
	####### *** !!!  WARNING - system will panic immediately !!!  *** #######
	#######	 crash panic crash panic
	#######	

	###################
	#See if you have a crash file, needs to be non-zero
	###################
	ls -l /var/crash/*/vmcore
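
	#A minimal sketch that scripts the same check: report each vmcore
	#under the crash root as OK (non-zero size) or EMPTY. The function
	#name check_vmcores is hypothetical.

```shell
# check_vmcores [CRASH_ROOT]
# Lists every CRASH_ROOT/*/vmcore (default /var/crash) with its state.
# Returns 0 if at least one non-empty vmcore exists, 1 otherwise.
check_vmcores() {
    root=${1:-/var/crash}
    found=1
    for core in "$root"/*/vmcore; do
        # An unmatched glob stays literal, so skip names that do not exist.
        [ -e "$core" ] || continue
        if [ -s "$core" ]; then
            echo "OK    $core"
            found=0
        else
            echo "EMPTY $core"
        fi
    done
    return $found
}
```

	#Example: check_vmcores /crash_dir || echo "no usable vmcore found"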

################################################################################
#Crash commands
################################################################################

	###################
	#Interactive 
	###################

	#Need file: /usr/lib/debug/lib/modules/<release><flavor>/vmlinux
	ls /usr/lib/debug/lib/modules/$(uname -r)/vmlinux
	    #ls /lib/modules/2.6.32-504.1.3.el6.x86_64.debug

		#If not, install from the Red Hat site for your particular kernel:
		#
		yum -y install \
			kernel-debug \
			kernel-debug-devel

	yum -y install crash crash-devel

	debuginfo-install kernel

	##############################
	#Interactive on LIVE kernel
	##############################
	crash

	##############################
	#Work on vmcore from a panic
	##############################

	#CRASHPATH=/var/crash
	CRASHPATH=/crash_dir/crash

	#
	#IF path is a local disk: 127.0.0.1-YYYY-MM-DD-HH:MM:SS
	#IF nfs or ssh:           %HOSTIP-YYYY-MM-DD-HH:MM:SS
	#
	CRASHDIR=$CRASHPATH/SOME_DIR_FILENAME_STRING
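
	#A minimal sketch for picking the most recent dump directory instead
	#of typing its name. The function name newest_crash_dir is
	#hypothetical; it assumes the directory modification time reflects
	#the time of the dump.

```shell
# newest_crash_dir [CRASH_ROOT]
# Prints the most recently modified subdirectory of CRASH_ROOT
# (default /var/crash); returns 1 if there is none.
newest_crash_dir() {
    root=${1:-/var/crash}
    newest=$(ls -1dt "$root"/*/ 2>/dev/null | head -n 1)
    [ -n "$newest" ] || return 1
    # Drop the trailing slash left over from the glob.
    printf '%s\n' "${newest%/}"
}
```

	#Example: CRASHDIR=$(newest_crash_dir "$CRASHPATH")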

		##############################
		#Interactive on a vmcore
		##############################
		crash -x \
				/usr/lib/debug/lib/modules/$(uname -r)/vmlinux \
					$CRASHDIR/vmcore
	
			set hex
			sys > crash_data.log
			bt -a >> crash_data.log
			mod >> crash_data.log
			ps >> crash_data.log
			foreach UN bt >> crash_data.log
			kmem -i >> crash_data.log
			net -a >> crash_data.log
			dev -d >> crash_data.log
			log >> crash_data.log
			quit
	
		##############################
		#Non-interactive on a vmcore
		#technique used: shell here document
		##############################
	
		crash -x \
				/usr/lib/debug/lib/modules/$(uname -r)/vmlinux \
				$CRASHDIR/vmcore << EOF
set hex
sys > crash_data.log
bt -a >> crash_data.log
mod >> crash_data.log
ps >> crash_data.log
foreach UN bt >> crash_data.log
kmem -i >> crash_data.log
net -a >> crash_data.log
dev -d >> crash_data.log
log >> crash_data.log
quit
EOF
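
	#A minimal sketch that generates the same command list with a
	#selectable log file, so several vmcores can each get their own
	#report. The function name write_crash_cmds is hypothetical.

```shell
# write_crash_cmds [LOGFILE]
# Prints the crash commands used above, redirecting their output to
# LOGFILE (default crash_data.log). The first command truncates the
# log; the rest append to it.
write_crash_cmds() {
    out=${1:-crash_data.log}
    echo "set hex"
    echo "sys > $out"
    for cmd in "bt -a" "mod" "ps" "foreach UN bt" "kmem -i" "net -a" "dev -d" "log"; do
        echo "$cmd >> $out"
    done
    echo "quit"
}
```

	#Example (pipes the list to crash in place of the here document):
	#  write_crash_cmds host1.log | crash -x \
	#      /usr/lib/debug/lib/modules/$(uname -r)/vmlinux $CRASHDIR/vmcore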

################################################################################
#Miscellaneous
################################################################################

	#Filter out pages from an existing "more full" vmcore

		#get only the active pages from a larger/fuller vmcore
		# -c = compress
		# -d = filter out pages, see other area in this doc
		makedumpfile -c -d 31 <vmcore> <output file>
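
		#The -d value is a bitmask of page types to leave out (see the
		#dump level table in man makedumpfile). A minimal sketch that
		#decodes a level into the page types it excludes; the function
		#name decode_dump_level and the short labels are hypothetical.

```shell
# decode_dump_level LEVEL
# Prints the page types excluded by a makedumpfile dump level (0-31):
#   1 = zero pages, 2 = cache pages, 4 = private cache pages,
#   8 = user data pages, 16 = free pages
decode_dump_level() {
    level=$1
    case $level in
        ''|*[!0-9]*)
            echo "dump level must be a number from 0 to 31" >&2
            return 1
            ;;
    esac
    if [ "$level" -gt 31 ]; then
        echo "dump level must be a number from 0 to 31" >&2
        return 1
    fi
    out=
    if [ $((level & 1)) -ne 0 ];  then out="$out zero"; fi
    if [ $((level & 2)) -ne 0 ];  then out="$out cache"; fi
    if [ $((level & 4)) -ne 0 ];  then out="$out cache-private"; fi
    if [ $((level & 8)) -ne 0 ];  then out="$out user"; fi
    if [ $((level & 16)) -ne 0 ]; then out="$out free"; fi
    # Strip the leading space added by the first match.
    echo "${out# }"
}
```

		#Example: decode_dump_level 17   prints "zero free"
		#         decode_dump_level 31   prints all five page types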


################################################################################
#Bibliography
################################################################################

-Tool:  https://access.redhat.com/labs/kerneloopsanalyzer/

-Main doc: https://access.redhat.com/solutions/6038

-Factors_that_can_affect_vmcore_generation_while_using_kdump._-_Red_Hat_Customer_Portal.pdf
-How_can_I_use_crash_to_send_Red_Hat_some_vmcore_pre-analysis_information_before_or_while_uploading_the_vmcore_image__-_Red_Hat_Customer_Portal.pdf
-How_should_the_crashkernel_parameter_be_configured_for_using_kdump_on_Red_Hat_Enterprise_Linux__-_Red_Hat_Customer_Portal.pdf
-How_should_the_crashkernel_parameter_be_configured_for_using_kdump_on_Red_Hat_Enterprise_Linux_5__-_Red_Hat_Customer_Portal.pdf
-How_should_the_crashkernel_parameter_be_configured_for_using_kdump_on_RHEL6__-_Red_Hat_Customer_Portal.pdf
-How_should_the_crashkernel_parameter_be_configured_for_using_kdump_on_RHEL7__-_Red_Hat_Customer_Portal.pdf
-How_to_capture_a_vmcore_of_hung_Red_Hat_Enterprise_Linux_VMware®_guest_system_using_VMware®__vmss2core__tool___-_Red_Hat_Customer_Portal.pdf
-How_to_troubleshoot_kernel_crashes_hangs_or_reboots_with_kdump_on_Red_Hat_Enterprise_Linux_-_Red_Hat_Customer_Portal.pdf
-What_is_the_SysRq_Facility_and_how_do_I_use_it__-_Red_Hat_Customer_Portal.pdf

################################################################################
#EOF
################################################################################