INFODOC ID: 13420

SYNOPSIS: SPARCstorage Array and Volume Manager Support Document

DETAIL DESCRIPTION:

==========================================================================

              SPARC Storage Array and Veritas Software

TABLE OF CONTENTS

1.0 About the SSA
    1.1 Terms to be Aware of
2.0 Debugging Techniques
    2.1 Diagnostic Switch on the Array Controller
    2.2 POST Errors
        2.2.1 OBP Info
    2.3 Messages File Information
        2.3.1 Boot Cycle and SSA Error Recovery
        2.3.2 Reasons for OFFLINE/ONLINE Messages
            2.3.2.1 Common Causes for these messages
    2.4 COMMANDS to Use and What to Look For
        2.4.1 SSAADM
        2.4.2 PRTVTOC
        2.4.3 Veritas GUI Views
        2.4.4 VX Commands
        2.4.5 VXDISK
        2.4.6 VXPRINT
        2.4.7 VXPRIVUTIL
        2.4.8 VXSTAT
        2.4.9 VXINFO
    2.5 Understanding Plex States
    2.6 Understanding Volume States
    2.7 System Does Not See the SSA
    2.8 System Cannot Run the vxconfigd, Volume Manager won't start
    2.9 Miscellaneous Statistical Commands
3.0 Quick Fixes, How-To's and Theory of Operations
    3.1 Bootable Disk Encapsulation and Mirroring - How-to
    3.1a How to 'unroot' your Bootable Disk
    3.2 Volume Manager is not seeing a disk device any longer
    3.3 Software and Firmware Differences, What You Need to Know
    3.4 Hot Spares How They Work
    3.5 WWN (World Wide Number) and How to Change it
    3.5a WWN How to Use the New One
    3.6 Use of 'vxdiskadm' Utility
    3.7 Remove/Replace a Disk
    3.8 Boot Issues
    3.9 SSA fast_write feature versus PrestoServe
        3.9.1 Prestoserve with host-based RAID (SDS/VxVM) products
    3.10 RAID-5 Information
    3.11 ROOTDG Recovery
    3.12 Loss of Disk Group Configuration Information; How to Save the Info
    3.13 IOSTAT Output; How to find the listed "ssd" device it reports
    3.14 Dual Hosted SSAs and Simulation of a "failover" on dual-host
    3.15 Moving an SSA to another system.
    3.16 Fail-Over Simulation for testing hot spares with Mirrors and Raid-5
4.0 FAQ's
5.0 Patch information
    5.1 SPARCstorage Array, Veritas Volume Manager PATCHLIST
6.0 Bugs and RFE's - Known Problems
    6.1 Downloading SSA FW and Resetting SSA has created some malfunctioning
        controllers
    6.2 Prestoserve Causes Corruption with SSA, Volume Manager and RAID5
    6.3 VXVA Core Dumps when Reconnecting a Disk
    6.4 Documentation Error with Prestoserve and Volume Manager Booting
        Single-User
7.0 All other References Available
    7.1 FRU part numbers
    7.2 Documentation Part Numbers
8.0 Supportability from NASC
9.0 Additional Support Information
    9.1 Veritas tech Bulletin: Prestoserve and Veritas Volume configuration
10.0 WARRANTY Information
11.0 SSA Support Matrix (Jan. 1997)
    11.1 SPARCstorage Array Software Configuration Guide (Version 4.1 1996)

===============================================================================

              SPARC Storage Array and Veritas Software

The focus of this document is information related to the SPARCstorage Array
(SSA) and the use of the Veritas Volume Manager software. DiskSuite users
may find the SSA information very beneficial, but should ignore all
references to the Volume Manager (VM) and instead go to their DiskSuite
manuals for information.

1.0 About the SPARC Storage Array

The SSA (SPARCstorage Array) is a 'disk farm'. It can consist of up to 36
disk devices, depending on the model; the "100" series can contain up to 30
disk devices, and the "200" series can contain up to 36 disk devices. There
is a controller interface that communicates with the host system via a fiber
optic cable. To take advantage of the wide bandwidth of this link between
the SSA and the host, we have placed 6 SCSI controllers on the array
controller board. This device gives you a huge storage capacity in a small
space.

1.1 Some TERMS to be aware of:

disk                Physical or logical (virtual) disk device.
DiskSuite/ODS/SDS   Sun GUI based utility to allow "virtual" devices and
                    ease of administering many devices.
encapsulate         Place (device) under control of the Veritas Volume
                    Manager software application.
plex                An ordered collection of subdisks which are used to
                    build virtual devices.
pln                 SPARCstorage Array controller board.
RAID                Redundant array of independent disks.
soc                 Host adapter for the fiber channel cable to an SSA.
SSA                 Physical device that houses multiple disk devices.
SSA drivers         Software drivers for the SSA box and disks.
subdisk             A part of a disk. In Veritas Volume Manager this is a
                    logical or virtual sub-section of a disk.
volume              A virtual device that can be accessed by a host system.
Volume Manager, vx, vxvm, vxva
                    Veritas Volume Manager software - a GUI based utility
                    used for ease of administering many disk devices. It
                    allows the use of "virtual" devices comprised of
                    multiple or single physical disk devices.

------------------------------------------------------------------------------

2.0 Debugging Techniques

2.1. Diag button on the array controller card.

This will give you some hardware diagnostics on the array controller.

2.2. POST (Power On Self Test) error codes:

    CODE:        MEANING:                ACTION:
    01           LCD Failure             Replace fan tray
    08           Fan failure             Replace fan tray
    09           Power supply failure    Replace power supply
    30           Battery failure         Replace battery module
    All others   Controller failure      Replace controller
    PMF          Firmware failure        Replace controller

(With 3.x firmware, once the disks are all spun up, what shows up on the
front panel is all you get.)

2.2.1 OBP Info (Open Boot Prom)

There is not much you can tell from the Prom prompt where an SSA is
concerned. The 'show-devs' command displays the soc and pln. The output
depends on what version of FCode is running on your host adapter board:

    1.18    the OBP uses "dummy" WWN addresses.
    1.33    only shows the 'soc@n,n' - this is desirable for booting!

To determine which FCode version(s) you have, use the following procedure:

    ok setenv fcode-debug? true
    ok reset
    ok show-devs
    (show-devs output will look something like)
    .
    .
    /iommu@0,10000000/sbus@0,10001000/le@1,c00000
    /iommu@0,10000000/sbus@0,10001000/SUNW,soc@0,0
    /iommu@0,10000000/sbus@0,10001000/ledma@4,8400010
    /iommu@0,10000000/sbus@0,10001000/SUNW,bpp@4,c800000
    /iommu@0,10000000/sbus@0,10001000/espdma@4,8400000
    /iommu@0,10000000/sbus@0,10001000/SUNW,DBRIe@2,10000/mmcodec
    /iommu@0,10000000/sbus@0,10001000/SUNW,soc@0,0/SUNW,pln@a0000800,201cac11
    /iommu@0,10000000/sbus@0,10001000/SUNW,soc@0,0/SUNW,pln@a0000800,201cac11/SUNW,ssd

Find the lines for the 'soc' board. To determine the FCode version for any
'soc' card, go to that card and issue the sccsid command:

    ok cd /iommu@0,10000000/sbus@0,10001000/SUNW,soc@0,0
    ok sccsid type
       returns: 1.33 95/04/19
    ok device-end

You can use this to look at all the 'soc' boards you have. When you are
done, set the variable back:

    ok setenv fcode-debug? false
    ok reset
    ok

2.3. MESSAGES FILE:

The device name is actually a pseudo name used by the kernel. We reference a
disk slice to mount as a filesystem by saying:

    mount [options] /dev/dsk/c0t3d0s0

If you do a long listing of '/dev/dsk/c0...' you will notice that the
entries are all links into a much longer directory structure. This
(eventually) points to the physical device. To be able to understand the
messages information, you must first understand the structure of the
addressing of the device(s). (Your system's directory structure may be
different; this is only an example. They will all begin in the '/devices'
directory, however.)

Here is the first disk in 'our' SSA:

c1t0d0s0 ->
/devices/iommu@f,.../sbus@f,.../SUNW,soc@1,0/SUNW,pln@a0000000,7537b7/ssd@0,0:a
         ^^^^^^^^^^^                 *** &                |||| ||||||     +,+

     ^^^^^^ = CPU
     ***    = SSA sbus host adapter card (i.e.: first is c1...)
     &      = sbus slot plugged into
     |||||| = 12 digit WWN of the SSA controller board
              (8 digits to right of comma)
     +,+    = target scsi, disk address (i.e.:...t0d0)

(On a multi-processor system, the 'iommu' will be 'io-unit' and this will
indicate which CPU in the system.)

The information we would be interested in from this is which CPU, which
sbus host adapter, which SSA, and then maybe which disk. Each listing in the
messages file contains the above information.

When the system boots up, there are many lines indicating whether or not the
SSA has come up. A normal boot cycle shows an online message and a login
message for the fiber channel. If either one of these is missing, the SSA
probably is not online.

Example: (NOTE this has been formatted to fit 80 column output)

Jan  8 16:35:19 unix unix: SUNW,soc1 at sbus0: SBus slot 1 0x0 and SBus slot 1
  0x10000 and SBus slot 1 0x20000 SBus level 3 sparc ipl 5
Jan  8 16:35:21 unix unix: ID[SUNWssa.soc.link.6010] soc1: port 0: Fibre
  Channel is ONLINE
Jan  8 16:35:21 unix unix: ID[SUNWssa.soc.login.6010] soc1: Fibre Channel
  login succeeded
Jan  8 16:35:21 unix unix: ID[SUNWssa.soc.link.1010] soc1: message: SSA100
  V2.4 (092995) Fri Sep 29 16:20:32 1995
Jan  8 16:35:21 unix unix: SUNW,pln4 at SUNW,soc1: soc_port 0
Jan  8 16:35:21 unix unix: SUNW,pln4 is
  /iommu@f,e0000000/sbus@f,e0001000/SUNW,soc@1,0/SUNW,pln@a0000000,7537b7
Jan  8 16:35:21 unix unix: ssd90 at SUNW,pln4: target 0 lun 0
Jan  8 16:35:21 unix unix: ssd90 is
  /iommu@f,e0000000/sbus@f,e0001000/SUNW,soc@1,0/SUNW,pln@a0000000,7537b7/ssd@0,0
[... messages about all the disks follow ...]

Notice the first line in this group. It contains information about the
'soc1' board; this one is connected in sbus slot 1. Next there is the
appropriate ONLINE message followed by a successful login message. Next
there is a line with the model and firmware version of this controller.
This one happens to be running firmware version 2.4. The last lines are the
array controller reporting that it is connected to port 0 on the sbus board
and is communicating with the system.

[**NOTE:** The sbus is a dual-ported board, but we do not recommend use of
the second port at this time. If there are two FC/OMs (fiber modules) on one
sbus card, the system usually will have IO contention problems on that card.
Our system "bus" does not have enough 'power' to drive two fiber channels on
one card. If configured this way, you will see many OFFLINE/ONLINE messages.
Future architectures may not have this problem, but the SPARC processor
architecture does not have enough power.]

Look for all of the above. If there is no 'fibre channel ONLINE' message,
then the SSA has a serious communication problem with the system. This is
usually a hardware problem with the fiber channel.

2.3.1 BOOT CYCLE and SSA ERROR Recovery.

The SSA always goes through these steps, in the following order, each time
the soc gets a reset:

    1. Wait for ONLINE.
    2. Issue the 'login'.
    3. Wait for 'login' response.
       TEST for response
         - timeout?      if YES, force OFFLINE.
         - other error?  if YES, go to 2.
    4. Allow the pln to send commands.
       TEST
         - command not completed?  if YES, do 'timeout recovery' and
                                   force OFFLINE.
         - other errors?           if YES, go to 4.
                                   (These get printed in the messages file)

    (While in OFFLINE state, wait for ONLINE.....step 1.)

Notice that the pln prints messages for the errors it gets. However, no
retryable messages are printed, so you see only real errors in the messages
file.

2.3.2 Reasons for the OFFLINE/ONLINE Messages.

There are many reasons for these messages to appear on your console and in
your messages file. The most common cause is peak system load: the system is
so busy that it doesn't wait for the SSA to respond, or it can't service the
SSA fast enough. These will show up once in a while, so you may see some (a
few) entries, maybe even a few times per day.
It is the frequency and duration of these that may indicate a problem.
Otherwise this is normal. You know there is a problem when these messages go
on and on for a couple of minutes or longer, all throughout the day.

If you look at the 'error recovery' cycle steps listed above, you can see
where these messages come from: the link is OFFLINE, so it immediately goes
into waiting for an ONLINE; then, if it goes ONLINE and has a problem, it
goes back to a forced OFFLINE and waits again.

Why does this happen? The system issues a command to the array, and expects
it to respond within a specific time. If the system does not get a response
within the timeframe (about 60 to 70 nanoseconds), it forces the controller
OFFLINE due to a timeout on response to the command issued. While a command
is timing out, all other commands are still being processed! There is one
exception to this: a database can also time out (Oracle, for example, can
'crash' if its command is timed out) and its process may die.

2.3.2.1 Common CAUSES of the OFFLINE problem:

In order of the most likely cause of the majority of Offline/Online
messages:

    Fiber channel hardware.
      - The FC/OM (optical module) needs to be a '-03' (or higher) at the
        end of the part number. The lower revisions had problems. Check
        both ends of the cable.
      - Fiber cable could be dirty, or have a loose connection. (Be sure to
        always use the end caps to protect the cable if it is being moved.)
      - Fiber cable being bent beyond what it should be, or being broken by
        someone standing or stepping on it.
    The Array Controller card.
      - Firmware version is a big issue here. If running a current version
        of firmware then there is a possibility of a faulty board.
    Software drivers and/or firmware
    System IO load balance.
      - We have had a couple of instances where the system was running too
        many memory intensive applications, causing the SSA to have to
        'wait' for CPU time.
        This can be 'fine-tuned' by distributing the IO loading in larger
        systems, or maybe even adding enough memory for all the applications
        to run (without stepping on each other). [For assistance in this
        area, please contact your local sales representative.]

How does one know which of the above is the source of the problem? How can
you tell whether these messages are coming from software, firmware, or the
Fiber channel hardware?

Check the system's Messages.

In order to troubleshoot the problem further, you must look at the messages.
What precedes the first OFFLINE, or subsequent OFFLINE messages? This will
tell you where the most likely source of the problem really is.

First, what device is reporting the error? Is it the soc, or the pln? This
will begin to point to the source of the problem. The soc is the Fiber
channel/handler, and the pln is the SSA driver. If you see messages relating
to 'soc' there is a good chance the problem is in the fiber channel
hardware. If there are 'pln' messages, then it is more likely not the fiber
channel, but elsewhere. The pln is the driver software that talks to the
SSA, so based on the actual messages, you should be able to find the source
of the problem.

Below is a short 'table' of the most likely sources of 'offline/online'
messages. (They are ordered top (most likely) to bottom (least likely) per
category.)

[NOTE: This is a guide based on what we know at this time; this should not
imply these are the only possibilities.]
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Messages information:               Suspect source of problem:
                                    (ordered most likely to least likely)
------------------------            ---------------------------

Offline/Online - "plain"            Hardware;
                                       Fiber cable
  (without any other associated        soc**
   message indications)                SSA controller

Timeout Recovery -
  Timeout recovery being invoked    Usually software (75% it is);
                                       SSA firmware
                                       SSA driver
                                    SSA hardware

Transport Error -                   For all of these:
  Transport error: incomplete          Bad disk drives
                   reset            Software;
                   timeout             ssd
                   data_ovr            ssa/pln driver
                                       SSA firmware
                                       kernel

Transport Rejected -                Hardware;
                                       Fiber cable or soc**

Media errors -                      Disk drives

If any messages with SYS_NOTICE -   SSA firmware

[ NOTE ** denotes FC/OMs also as a possibility ]
---------------------------------------------------------------------------
---------------------------------------------------------------------------

One of the most frustrating messages is the 'Timeout Recovery' message. This
one in particular needs to be examined a bit more closely. Check for any
other messages, such as disk related errors. If there are any, use those to
determine the most likely cause of the problem. Here is one example of what
these may look like:

16:17:21 unix: WARNING: /io-unit@f,e1200000/sbi@0,0/SUNW,soc@0,0/SUNW,pln@a0000000,78ad31 (SUNW,pln0):
16:17:21 unix: Timeout recovery being invoked...
16:17:21 unix: ID[SUNWssa.soc.link.5010] soc0: port 0: Fibre Channel is OFFLINE
16:17:22 unix: ID[SUNWssa.soc.link.6010] soc0: port 0: Fibre Channel is ONLINE
16:17:22 unix: ID[SUNWssa.soc.login.6010] soc0: Fibre Channel login succeeded
16:17:22 unix: ID[SUNWssa.soc.link.1010] soc0: message: SSA110 V3.9

When we get Timeout Recovery messages, it means that there was a timeout on
a command sent to the SSA.
In the example above, this is the only message; there are no other messages
from the SSA or disks, only this 'Timeout Recovery' message followed by the
recovery procedure messages (the subsequent offline and online/login
messages).

The software will attempt a recovery by flushing out the transport and
reconnecting. This is what causes the following offline/online sequence. It
is trying to do the operation again. If a hardware problem on the controller
allows the link to become established (online and login succeed) while the
commands all time out, the controller will just continually 'timeout' and go
through the recovery offline/online sequence over and over again. If this is
all there is in the messages file, and it all comes from one SSA, then the
most likely suspect is the controller board itself. You might also see this
with one of the isp chips (scsi controller chips) on the board, but then you
will also have other messages relating to those addresses.

Of course, a combination of software and hardware may still be the cause of
problems. The best you can do is to get the software to the most current
levels (including disk firmware levels); from there, most remaining problems
are likely to be hardware related. Basically, try to rule out one or the
other based on versions and messages.

EXAMPLES:

Here are some examples of what you might see in a messages file relating to
offline/online sequences. See if you can figure out the source of the
problems.

[ I have stripped out the date and system name for space savings. ]

----------------------------------------------------
#1)
07:25:08 unix: WARNING: /io-unit@f,e1200000/sbi@0,0/SUNW,soc@2,0/SUNW,pln@a0000000,740f05 (SUNW,pln2):
07:25:08 unix: Timeout recovery being invoked...
07:25:08 unix: ID[SUNWssa.soc.link.5010] soc1: port 0: Fibre Channel is OFFLINE
07:25:09 unix: ID[SUNWssa.soc.link.6010] soc1: port 0: Fibre Channel is ONLINE
07:25:09 unix: ID[SUNWssa.soc.login.6010] soc1: Fibre Channel login succeeded
07:25:09 unix: ID[SUNWssa.soc.link.1010] soc1: message: SSA100 V3.6 (031896) Mon Mar 18 19:57:51 1996
07:29:28 unix: WARNING: /io-unit@f,e1200000/sbi@0,0/SUNW,soc@2,0/SUNW,pln@a0000000,740f05 (SUNW,pln2):
07:29:28 unix: Timeout recovery being invoked...
07:29:28 unix: ID[SUNWssa.soc.link.5010] soc1: port 0: Fibre Channel is OFFLINE
07:29:29 unix: ID[SUNWssa.soc.link.6010] soc1: port 0: Fibre Channel is ONLINE
07:29:29 unix: ID[SUNWssa.soc.login.6010] soc1: Fibre Channel login succeeded
07:29:29 unix: ID[SUNWssa.soc.link.1010] soc1: message: SSA100 V3.6 (031896) Mon Mar 18 19:57:51 1996
07:30:08 unix: ID[SUNWssa.soc.link.5010] soc1: port 0: Fibre Channel is OFFLINE
07:31:08 unix: WARNING: /io-unit@f,e1200000/sbi@0,0/SUNW,soc@2,0/SUNW,pln@a0000000,740f05/ssd@3,1 (ssd145):
07:31:08 unix: Transport error: Fibre Channel
07:31:08 unix: Offline
07:31:08 unix: WARNING: /io-unit@f,e1200000/sbi@0,0/SUNW,soc@2,0/SUNW,pln@a0000000,740f05/ssd@3,1 (ssd145):
07:31:08 unix: requeue of command fails (ffffff
07:31:08 unix: fe)
07:31:09 unix: NOTICE: vxvm:vxio: Disk c3t3d1s2: Unexpected status on close: 0
07:31:09 unix: WARNING: /io-unit@f,e1200000/sbi@0,0/SUNW,soc@2,0/SUNW,pln@a0000000,740f05/ssd@3,1 (ssd145):
07:31:09 unix: transport rejected (-2)
07:31:09 unix: WARNING: /io-unit@f,e1200000/sbi@0,0/SUNW,soc@2,0/SUNW,pln@a0000000,740f05/ssd@3,1 (ssd145):
07:31:09 unix: transport rejected (-2)
07:31:09 unix: WARNING: /io-unit@f,e1200000/sbi@0,0/SUNW,soc@2,0/SUNW,pln@a0000000,740f05/ssd@3,1 (ssd145):
07:31:09 unix: transport rejected (-2)
07:31:09 unix: WARNING: /io-unit@f,e1200000/sbi@0,0/SUNW,soc@2,0/SUNW,pln@a0000000,740f05/ssd@3,2 (ssd146):
07:31:09 unix: transport rejected (-2)
07:31:09 unix: NOTICE: vxvm:vxio: Disk c3t3d2s2: Unexpected status on close: 0
07:31:09 unix: WARNING: /io-unit@f,e1200000/sbi@0,0/SUNW,soc@2,0/SUNW,pln@a0000000,740f05/ssd@3,3 (ssd147):
07:31:09 unix: transport rejected (-2)
----------------------------------------------------

For #1, if you decided that the fiber cable is the most likely suspect, you
win the prize!

----------------------------------------------------
#2)
02:07:29 unix: WARNING: /io-unit@f,e0200000/sbi@0,0/SUNW,soc@2,0/SUNW,pln@a0000000,78be6e (SUNW,pln1):
02:07:29 unix: Timeout recovery being invoked...
02:07:37 unix: WARNING: /io-unit@f,e0200000/sbi@0,0/SUNW,soc@2,0/SUNW,pln@a0000000,78be6e (SUNW,pln1):
02:07:37 unix: Timeout recovery failed, resetting
02:07:37 unix: ID[SUNWssa.soc.driver.1010] soc0: host adapter fw date code: Wed Jan 17 20:34:59 1996
02:07:37 unix:
02:07:37 unix: ID[SUNWssa.soc.link.6010] soc0: port 0: Fibre Channel is ONLINE
02:07:37 unix: ID[SUNWssa.soc.link.5010] soc0: port 0: Fibre Channel is OFFLINE
02:07:37 unix: ID[SUNWssa.soc.link.6010] soc0: port 0: Fibre Channel is ONLINE
02:07:37 unix: ID[SUNWssa.soc.link.5010] soc0: port 0: Fibre Channel is OFFLINE
02:07:38 unix: ID[SUNWssa.soc.link.6010] soc0: port 0: Fibre Channel is ONLINE
02:07:38 unix: ID[SUNWssa.soc.link.5010] soc0: port 0: Fibre Channel is OFFLINE
02:07:38 unix: ID[SUNWssa.soc.link.6010] soc0: port 0: Fibre Channel is ONLINE
02:07:38 unix: ID[SUNWssa.soc.link.5010] soc0: port 0: Fibre Channel is OFFLINE
02:07:39 unix: ID[SUNWssa.soc.link.6010] soc0: port 0: Fibre Channel is ONLINE
02:07:39 unix: ID[SUNWssa.soc.link.5010] soc0: port 0: Fibre Channel is OFFLINE
02:07:39 unix: ID[SUNWssa.soc.link.6010] soc0: port 0: Fibre Channel is ONLINE
02:07:39 unix: ID[SUNWssa.soc.link.5010] soc0: port 0: Fibre Channel is OFFLINE
02:07:40 unix: ID[SUNWssa.soc.link.6010] soc0: port 0: Fibre Channel is ONLINE
02:07:40 unix: ID[SUNWssa.soc.link.5010] soc0: port 0: Fibre Channel is OFFLINE
02:07:40 unix: ID[SUNWssa.soc.link.6010] soc0: port 0: Fibre Channel is ONLINE
02:07:40 unix: ID[SUNWssa.soc.link.5010] soc0: port 0: Fibre Channel is OFFLINE
02:07:41 unix: ID[SUNWssa.soc.link.6010] soc0: port 0: Fibre Channel is ONLINE
02:07:41 unix: ID[SUNWssa.soc.link.5010] soc0: port 0: Fibre Channel is OFFLINE
02:07:41 unix: ID[SUNWssa.soc.link.6010] soc0: port 0: Fibre Channel is ONLINE
02:07:42 unix: ID[SUNWssa.soc.login.6010] soc0: Fibre Channel login succeeded
02:07:42 unix: ID[SUNWssa.soc.link.1010] soc0: message: SSA110 V3.6 (031896) Mon Mar 18 19:57:51 1996
02:09:28 unix: WARNING: /io-unit@f,e0200000/sbi@0,0/SUNW,soc@2,0/SUNW,pln@a0000000,78be6e (SUNW,pln1):
02:09:28 unix: Timeout recovery being invoked...
02:09:28 unix: ID[SUNWssa.soc.link.5010] soc0: port 0: Fibre Channel is OFFLINE
02:09:29 unix: ID[SUNWssa.soc.link.6010] soc0: port 0: Fibre Channel is ONLINE
02:09:29 unix: ID[SUNWssa.soc.link.5010] soc0: port 0: Fibre Channel is OFFLINE
02:09:29 unix: ID[SUNWssa.soc.link.6010] soc0: port 0: Fibre Channel is ONLINE
02:09:29 unix: ID[SUNWssa.soc.link.5010] soc0: port 0: Fibre Channel is OFFLINE
02:09:30 unix: ID[SUNWssa.soc.link.6010] soc0: port 0: Fibre Channel is ONLINE
02:09:30 unix: ID[SUNWssa.soc.link.5010] soc0: port 0: Fibre Channel is OFFLINE
02:09:30 unix: ID[SUNWssa.soc.link.6010] soc0: port 0: Fibre Channel is ONLINE
02:09:30 unix: ID[SUNWssa.soc.link.5010] soc0: port 0: Fibre Channel is OFFLINE
02:09:31 unix: ID[SUNWssa.soc.link.6010] soc0: port 0: Fibre Channel is ONLINE
02:10:32 unix: WARNING: /io-unit@f,e0200000/sbi@0,0/SUNW,soc@2,0/SUNW,pln@a0000000,78be6e/ssd@5,1 (ssd41):
02:10:32 unix: Error for command 'write(10)' Err
02:10:32 unix: or Level: Retryable
02:10:32 unix: WARNING: /io-unit@f,e0200000/sbi@0,0/SUNW,soc@2,0/SUNW,pln@a0000000,78be6e/ssd@0,0 (ssd5):
02:10:32 unix: Transport error: Fibre Channel Of
02:10:32 unix: Requested Block 1705952, Error Block: 1705952
02:10:32 unix: fline
02:10:33 unix: WARNING: /io-unit@f,e0200000/sbi@0,0/SUNW,soc@2,0/SUNW,pln@a0000000,78be6e/ssd@0,0 (ssd5):
02:10:33 unix: Transport error: Fibre Channel Of
02:10:33 unix: Sense Key: Hardware Error
02:10:33 unix: fline
02:10:33 unix: WARNING: /io-unit@f,e0200000/sbi@0,0/SUNW,soc@2,0/SUNW,pln@a0000000,78be6e/ssd@0,0 (ssd5):
02:10:33 unix: Transport error: Fibre Channel Of
02:10:33 unix: WARNING: /io-unit@f,e0200000/sbi@0,0/SUNW,soc@2,0/SUNW,pln@a0000000,78be6e/ssd@5,1 (ssd41):
02:10:33 unix: requeue of command fails (fffffff
02:10:33 unix: fline
02:10:33 unix: e)
02:10:33 unix: WARNING: /io-unit@f,e0200000/sbi@0,0/SUNW,soc@2,0/SUNW,pln@a0000000,78be6e/ssd@0,0 (ssd5):
02:10:33 unix: Transport error: Fibre Channel Of
02:10:33 unix: fline
02:10:33 unix: WARNING: /io-unit@f,e0200000/sbi@0,0/SUNW,soc@2,0/SUNW,pln@a0000000,78be6e/ssd@0,0 (ssd5):
02:10:33 unix: Transport error: Fibre Channel Of
02:10:33 unix: fline
02:10:33 unix: WARNING: /io-unit@f,e0200000/sbi@0,0/SUNW,soc@2,0/SUNW,pln@a0000000,78be6e/ssd@0,1 (ssd6):
02:10:33 unix: Error for command 'write(10)' Erro
02:10:33 unix: WARNING: /io-unit@f,e0200000/sbi@0,0/SUNW,soc@2,0/SUNW,pln@a0000000,78be6e/ssd@0,0 (ssd5):
02:10:33 unix: Transport error: Fibre Channel Of
02:10:33 unix: r Level: Retryable
02:10:33 unix: fline
02:10:33 unix: Requested Block 570688, Error Block: 570688
----------------------------------------------------

Now, #2 is a bit more difficult, and not unlike what might be seen on a real
system. If we analyze this, based on the information given about
offline/online messages, we come up with more than one possible source of
the problem. Let's take a look together.

We begin with 'timeout recovery being invoked', which is usually caused by a
software problem: either firmware or the ssa/pln driver. Next, we see a
whole bunch of "plain" offline/online messages, which normally indicate a
fiber cable problem; the cable might not be seated well. Finally we end up
with disk errors: one is a retryable write; the rest are just reporting
'transport error' fiber offline with a hardware sense error.
So, in this one example we now have three possibilities: a bad or loose
fiber cable to the ssa; bad firmware running in the ssa; bad disk(s). [This
messages file continued the same patterns of message output over and over,
listing almost every disk device in the array.]

Based on the evidence here, I would first try the fiber cable, because it is
the easiest, and because if the cable connection is not good, the
communications will not be correct. If this did not clean up the problem, I
would then try a firmware download. Normally doing these two things to a
system like this one should eliminate most of the extraneous messages. Then
all that would be left, most likely, would be one or two bad disk devices.

If the messages still persist after changing out the fiber cable, the
firmware, and maybe a disk or two, we still have the balance of the fiber
channel connection hardware, the SSA driver software package, and the kernel
to consider.

I wanted to use this example to show that it can be a combination of things,
but normally they are inter-related.

----------------------------------------------------
----------------------------------------------------

2.4 COMMANDS to use, and what to look for.

There are many commands available to assist you in debugging a problem in
your SSA.

2.4.1 SSAADM:

    ssaadm (or ssacli)

This command can give you information about the SSA.
Example of displaying controller information:

# ssaadm display c1

          SPARCstorage Array Configuration
          (ssaadm version: 1.10 95/11/27)

Controller path:/devices/iommu@f,e0000000/sbus@f,e0001000/SUNW,soc@1,0/SUNW,pln@a0000000,7537b7:ctlr

                        DEVICE STATUS
          TRAY 1          TRAY 2          TRAY 3
slot
1         Drive: 0,0      Drive: 2,0      Drive: 4,0
2         NO SELECT       NO SELECT       NO SELECT
3         NO SELECT       NO SELECT       NO SELECT
4         NO SELECT       NO SELECT       NO SELECT
5         NO SELECT       NO SELECT       NO SELECT
6         Drive: 1,0      Drive: 3,0      Drive: 5,0
7         Drive: 1,1      Drive: 3,1      Drive: 5,1
8         Drive: 1,2      Drive: 3,2      Drive: 5,2
9         Drive: 1,3      Drive: 3,3      Drive: 5,3
10        Drive: 1,4      Drive: 3,4      Drive: 5,4

                        CONTROLLER STATUS
Vendor:        SUN
Product ID:    SSA100
Product Rev:   1.0
Firmware Rev:  2.4
Serial Num:    0000007537B7
Accumulate Performance Statistics: Enabled

You can also use this command to give you some performance information. Just
use a -p option after the display:

# ssaadm display -p cN

Other things available here are enabling or disabling the 'fast_write' NVRAM
caching, and starting, stopping, or reserving disks, trays, etc. (Use
extreme caution when using these types of options!) Refer to section 3.3 for
more information on enable/disable of fast_writes.

2.4.2 PRTVTOC:

This command will give you information about a disk. This can be very
useful, especially when running the Veritas Volume Manager software and
having a root disk encapsulated for use with this software.
    prtvtoc /dev/rdsk/s2

An example of a normal Solaris bootable (root) disk:

# prtvtoc /dev/rdsk/c0t3d0s2
* /dev/rdsk/c0t3d0s2 partition map
*
* Dimensions:
*     512 bytes/sector
*      72 sectors/track
*      14 tracks/cylinder
*    1008 sectors/cylinder
*    2038 cylinders
*    2036 accessible cylinders
*
* Flags:
*   1: unmountable
*  10: read-only
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0      2    00          0     51408     51407   /
       1      3    01      51408    132048    183455
       2      5    00          0   2052288   2052287
       4      7    00     183456    226800    410255   /var
       5      6    00     410256    719712   1129967   /opt
       6      4    00    1129968    819504   1949471   /usr
       7      8    00    1949472    102816   2052287   /export

An example of a bootable (root) encapsulated disk:

# prtvtoc /dev/rdsk/c0t3d0s2
* /dev/rdsk/c0t3d0s2 partition map
*
* Dimensions:
*     512 bytes/sector
*      72 sectors/track
*      14 tracks/cylinder
*    1008 sectors/cylinder
*    2038 cylinders
*    2036 accessible cylinders
*
* Flags:
*   1: unmountable
*  10: read-only
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0      2    00          0     51408     51407   /
       1      3    01      51408    132048    183455
       2      5    00          0   2052288   2052287
       3     14    01       2016   2050272   2052287
       4      7    00     183456    226800    410255   /var
       5      6    00     410256    719712   1129967   /opt
       6      4    00    1129968    819504   1949471   /usr
       7     15    01          0      2016      2015

An example of an encapsulated root disk MIRROR:

# prtvtoc /dev/rdsk/c0t1d0s2
* /dev/rdsk/c0t1d0s2 partition map
*
* Dimensions:
*     512 bytes/sector
*      72 sectors/track
*      14 tracks/cylinder
*    1008 sectors/cylinder
*    2038 cylinders
*    2036 accessible cylinders
*
* Flags:
*   1: unmountable
*  10: read-only
*
* Unallocated space:
*       First     Sector    Last
*       Sector     Count    Sector
*     2052288 4293328288    413279
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0      2    00     413280   1639008   2052287   /
       2      5    00          0   2052288   2052287
       3     15    01          0      2016      2015
       4     14    01       2016   2050272   2052287

Note that some slices have unusual "tags" of 14 and 15. These belong to the
Veritas software.
The tag of 15 is the Private region that the Volume Manager uses, and the
tag of 14 is the Public region that it uses (which is the rest of the disk).
Only disks under control of the Volume Manager software will have these
tags.

The interesting fact about the mirror disk is that there is no underlying
'unix' slicing information. This disk must have the Veritas software
running. Yes, it is a bootable device, but the Veritas software MUST start
up, or the system will not boot up all the way to multi-user mode.

2.4.3 VERITAS GUI VIEWS:

The GUI is used for summary status information. You can tell at a glance
whether or not a raid-5 or mirrored volume is completely 'up'. You can tell
which volumes are running and which are not. It shows you which physical
disks have which 'logical' disk names.

There are various Views available to you. One shows everything; one shows
disks that are not under control of this software; one shows a graphic
representation of the SSA device; the rest are the disk groups that you have
set up for use.

2.4.4 VX COMMANDS:

The Veritas Volume Manager software has quite a few handy commands to get
information about what is happening on the disks. This is nice, because we
cannot always use the GUI.

    vxdisk; vxprint; vxprivutil

2.4.5 The vxdisk command can show you what device belongs to what disk group
and its current status.
# vxdisk list
DEVICE       TYPE      DISK         GROUP        STATUS
c0t1d0s2     sliced    -            -            online
c0t2d0s2     sliced    -            -            error
c0t3d0s2     sliced    -            -            error
c1t0d0s2     sliced    disk12       rootdg       online
c1t1d0s2     sliced    disk04       rootdg       online
c1t1d1s2     sliced    disk01       rootdg       online
c1t1d2s2     sliced    -            -            online
c1t1d3s2     sliced    -            -            online
c1t1d4s2     sliced    -            -            online
c1t2d0s2     sliced    disk08       rootdg       online
c1t3d0s2     sliced    disk09       rootdg       online
c1t3d1s2     sliced    -            -            online
c1t3d3s2     sliced    -            -            offline
c1t3d4s2     sliced    -            -            online
c1t4d0s2     sliced    -            -            error
c1t5d0s2     sliced    -            -            error
c1t5d1s2     sliced    -            -            error
c1t5d2s2     sliced    -            -            error
c1t5d3s2     sliced    -            -            error
c1t5d4s2     sliced    -            -            error
-            -         disk03       rootdg       was:c1t3d3s2
                       ^^^^^^       ^^^^^^       ^^^^^^^^^^^^

NOTICE that this tells you disk03 was device c1t3d3(s2) and was in the rootdg disk group. The status information about the SSA devices through Volume Manager can tell you what is happening very quickly. You might use this information to bring this device back under Volume Manager control.

Here is another use of this list command. This lists more details about each disk:

# vxdisk -s list

Disk:    c0t3d0s2
type:    sliced
flags:   online error private autoconfig
error:   Disk is not usable

Disk:    c1t0d0s2
type:    sliced
flags:   online ready private autoconfig autoimport imported
diskid:  821159273.1577.unix
dgname:  rootdg
dgid:    820868326.1025.unix
hostid:  unix

Disk:    c1t1d0s2
type:    sliced
flags:   online ready private autoconfig autoimport imported
diskid:  820868338.1091.unix
dgname:  rootdg
dgid:    820868326.1025.unix
hostid:  unix

The interesting pieces of information from this command are the "dgid" and "hostid". These can be used to recreate missing disk groups, or add existing rootdg disks to a newly created rootdg disk group!

2.4.6 The vxprint command gives you status information, but shows information about volumes and how they are configured.

# vxprint -ht

This is the most common form of this command that we use. It gives all the basic information about the current configuration and states of plexes.
Disk group: rootdg

DG NAME         NCONFIG      NLOG     MINORS   GROUP-ID
DM NAME         DEVICE       TYPE     PRIVLEN  PUBLEN   STATE
V  NAME         USETYPE      KSTATE   STATE    LENGTH   READPOL   PREFPLEX
PL NAME         VOLUME       KSTATE   STATE    LENGTH   LAYOUT    NCOL/WID MODE
SD NAME         PLEX         DISK     DISKOFFS LENGTH   [COL/]OFF DEVICE   MODE

dg rootdg       default      default  0        820868326.1025.unix

dm disk01       c1t1d1s2     sliced   2015     2050272  -
dm disk02       c1t1d2s2     sliced   2015     2050272  -
dm disk03       c1t3d3s2     sliced   2015     2050272  -
dm disk04       c1t1d0s2     sliced   2015     2050272  -
dm disk05       c1t3d0s2     sliced   2015     2050272  -
dm disk06       c1t3d1s2     sliced   2015     2050272  -
dm disk07       c1t3d2s2     sliced   2015     2050272  -
dm disk08       c1t2d0s2     sliced   2015     2050272  -
dm disk12       c1t0d0s2     sliced   2015     2050272  -

pl pl-01        -            DISABLED -        0        STRIPE    4/409600 RW

v  vol01        fsgen        ENABLED  ACTIVE   6144000  SELECT    vol01-01
pl vol01-01     vol01        ENABLED  ACTIVE   6144768  STRIPE    2/128    RW
sd disk12-01    vol01-01     disk12   0        2050272  0/0       c1t0d0   ENA
sd disk01-01    vol01-01     disk01   0        1022112  0/2050272 c1t1d1   ENA
sd disk04-01    vol01-01     disk04   0        2050272  1/0       c1t1d0   ENA
sd disk08-01    vol01-01     disk08   0        1022112  1/2050272 c1t2d0   ENA
pl vol01-02     vol01        ENABLED  TEMPRMSD 6144768  STRIPE    2/128    WO
sd disk02-01    vol01-02     disk02   0        1022112  0/0       c1t1d2   ENA
sd disk03-01    vol01-02     disk03   0        2050272  0/1022112 c1t3d3s2 ENA
sd disk05-01    vol01-02     disk05   0        1022112  1/0       c1t3d0   ENA
sd disk07-01    vol01-02     disk07   0        2050272  1/1022112 c1t3d2   ENA

v  vol02        fsgen        ENABLED  ACTIVE   409600   SELECT    -
pl vol02-01     vol02        ENABLED  ACTIVE   410256   CONCAT    -        RW
sd disk06-01    vol02-01     disk06   0        410256   0         c1t3d1   ENA

Notice plex 'vol01-02' has a state of: TEMPRMSD. This is the mirror resync in progress.

# vxprint -l

This command shows you in order all the disk groups, disks, subdisks, plexes and volumes.
Group:    rootdg
info:     dgid=820868326.1025.unix
copies:   nconfig=default nlog=default
minors:   >= 0

Disk:     disk01
info:     diskid=821319546.2125.unix
assoc:    device=c1t1d1s2 type=sliced
flags:    autoconfig
device:   pubpath=/dev/dsk/c1t1d1s4 privpath=/dev/dsk/c1t1d1s3
devinfo:  publen=2050272 privlen=2015

Disk:     disk02
info:     diskid=821319456.2121.unix
assoc:    device=c1t1d2s2 type=sliced
flags:    autoconfig
device:   pubpath=/dev/dsk/c1t1d2s4 privpath=/dev/dsk/c1t1d2s3
devinfo:  publen=2050272 privlen=2015
...
...
Subdisk:  disk01-01
info:     disk=disk01 offset=0 len=1022112
assoc:    vol=vol01 plex=vol01-01 (column=0 offset=2050272)
flags:    enabled busy
device:   device=c1t1d1s2 path=/dev/dsk/c1t1d1s4 diskdev=192/812

Subdisk:  disk02-01
info:     disk=disk02 offset=0 len=1022112
assoc:    vol=vol01 plex=vol01-02 (column=0 offset=0)
flags:    enabled busy
device:   device=c1t1d2s2 path=/dev/dsk/c1t1d2s4 diskdev=192/820
...
...
Plex:     vol01-01
info:     len=6144768
type:     layout=STRIPE columns=2 width=128
state:    state=ACTIVE kernel=ENABLED io=read-write
assoc:    vol=vol01 sd=disk12-01,disk01-01,disk04-01,disk08-01
flags:    busy complete

Plex:     vol01-02
info:     len=6144768
type:     layout=STRIPE columns=2 width=128
state:    state=TEMPRMSD kernel=ENABLED io=write-only
assoc:    vol=vol01 sd=disk02-01,disk03-01,disk05-01,disk07-01
flags:    busy
utils:    t0=ATT

Plex:     vol02-01
info:     len=410256
type:     layout=CONCAT
state:    state=ACTIVE kernel=ENABLED io=read-write
assoc:    vol=vol02 sd=disk06-01
flags:    complete
...
...
Volume:   vol01
info:     len=6144000
type:     usetype=fsgen
state:    state=ACTIVE kernel=ENABLED
assoc:    plexes=vol01-01,vol01-02
policies: read=SELECT (prefer vol01-01) exceptions=GEN_DET_SPARSE
flags:    open writeback
logging:  type=REGION loglen=0 serial=0/0 (disabled)
device:   minor=5 bdev=115/5 cdev=115/5 path=/dev/vx/dsk/rootdg/vol01
perms:    user=root group=root mode=0600
utils:    t0=ATT1

Volume:   vol02
info:     len=409600
type:     usetype=fsgen
state:    state=ACTIVE kernel=ENABLED
assoc:    plexes=vol02-01
policies: read=SELECT (round-robin) exceptions=GEN_DET_SPARSE
flags:    closed writeback
logging:  type=REGION loglen=0 serial=0/0 (disabled)
device:   minor=6 bdev=115/6 cdev=115/6 path=/dev/vx/dsk/rootdg/vol02
perms:    user=root group=root mode=0600

Some interesting information would be the id lines from the devices and also the flags from the plexes and volumes. Notice that vol01 (which is a mirror doing a resync operation at the moment) has a flag of 'open writeback', while vol02's flag is 'closed writeback'. vol02 is not in use at this time, while vol01 is mounted and is in the process of resyncing.

2.4.7 Use the vxprivutil command only with EXTREME CAUTION. It can be used to test the readability of the Private Region on any disk, and to gather some information when you cannot get it from any other source.

# vxprivutil /dev/rdsk/c..t..d..s2

(Below are two separate command entry outputs; notice the host names.)
diskid:  821159273.1577.unix
group:   name=rootdg id=820868326.1025.unix
flags:   private autoimport
hostid:  unix
version: 2.1
iosize:  512
public:  slice=4 offset=0 len=2050272
private: slice=3 offset=1 len=2015
update:  time: 821462653 seqno: 0.13
headers: 0 248
configs: count=1 len=1456
logs:    count=1 len=220

diskid:  821297570.1085.twodotfive
group:   name=rootdg id=821297558.1025.twodotfive
flags:   private autoimport
hostid:  twodotfive
version: 2.1
iosize:  512
public:  slice=4 offset=0 len=2050272
private: slice=3 offset=1 len=2015
update:  time: 821297573 seqno: 0.5
headers: 0 248
configs: count=1 len=1456
logs:    count=1 len=220

The most interesting pieces of information from this command are the hostid and the group id. Have you ever had a problem getting to a disk, or a disk group? This may help determine why, or assist in being able to get to it.

2.4.8 Use the vxstat command to retrieve or reset statistical information on any VM object (volume, plex, disk, subdisk).

2.4.9 The vxinfo command prints the accessibility and usability of a volume. It prints out a one-line summary of each volume.

      bigvol       fsgen    Startable
      vol2         fsgen    Startable
      brokenvol    gen      Unstartable

The vxinfo utility reports the following conditions for volumes:

Startable         A vxvol startall operation would likely succeed in starting the volume.

Unstartable       The volume is not started and either is not correctly configured or doesn't meet the prerequisites for automatic startup (with volume startup) because of errors or other conditions.

Started           The volume has been started and can be used.

Started Unusable  The volume has been started but is not operationally accessible. This condition may result from errors that have occurred since the volume was started, or may be a result of administrative actions, such as vxdg -k rmdisk.

2.5 Understanding plex States

The plex is the stuff volumes are made of. The state of a plex may help you to determine what is happening with a volume.
The Volume Manager software can use this information to:

   - Determine whether or not volume contents are initialized to a known state.
   - Decide whether a plex has a valid copy of volume contents or not.
   - Track whether or not the plex was in active use at the time of a system failure.
   - Monitor operations on plex(es).

There is also a kernel state associated with every plex. This state determines the accessibility of the plex. There are three kernel states:

   Disabled   Offline, cannot be accessed.
   Detached   Maintenance, plex ops and io are accepted, but not acted on.
   Enabled    On-line, fully accessible.

[**NOTE** No user intervention is required with the kernel state of a plex. This is maintained internally in the software.]

A plex that is associated with a volume always has one of the following plex states:

   Empty      Set at creation time.
   Clean      Contains a consistent copy of volume contents, and an operation has disabled the volume. (No recovery required.)
   Active     Normal volume IO; or was active at time of system crash.
   Stale      IO error; or question of completeness of volume contents. (Volume must be recovered.)
   Offline    Result of manual offline operation on the plex.
   Temp       Operations that are not truly atomic. (resyncs...)
   Temprm     Same as Temp, but will be automatically removed at end of operation.
   IOfail     Active plex failure; will not recover at volume start time.

When a volume is started, the plex state cycles. For example, when the system is shut down normally, the plexes that are active are marked clean. When the system boots up, all clean plexes become active plexes. If a crash occurs while a plex is active, the software looks for the most up-to-date plex, marks that one as active, and marks the others in that volume as stale, meaning they must be recovered, or copied again. All stale plexes at boot, or volume start time, are recovered and then marked active when the recovery has completed.

The state of a plex can tell you a lot, if you know what has recently happened on a system.
(Did it crash, or not?)

To modify the state of a plex, use the vxmend command. You can modify the states to force operations.

# vxmend fix

2.6 Understanding volume States

Volume states differ between Raid-5 and other configurations: non Raid-5 volumes have four states, and Raid-5 volumes have five.

Non Raid-5 volumes:

   Clean      Volume is not started, plexes are synchronized.
   Active     Volume has been started, or was operational at boot.
   Empty      Volume is not initialized.
   Sync       Volume is in process of recovery, or was at boot time.

Raid-5 volumes:

   Clean      Volume is not started, and parity is good; stripes are consistent.
   Active     Volume has been started, or was operational at boot; if kernel state is disabled, parity may not be good.
   Empty      Volume is not initialized.
   Sync       Volume is undergoing resync of parity, or was at boot time.
   Needsync   Volume will require parity resynchronization at next start time.

Use the vxvol command to change the state of a volume. This can be helpful when a volume will not start up, or if you wish to start a volume and not invoke automatic recovery.

# vxvol init ....

2.7 System Does Not See the SSA

There are a few things that can cause this to happen.

1) Bad fiber connection. Possible bad components; bad FC/OM module; bad cable; bad boards.

2) Wrong patch level or firmware level for this OS level.

   **Solaris 2.3 MUST have minimum kernel patch 101318-54.
   **Solaris 2.4 MUST have minimum kernel patch 101945-27.

   Refer to Patch Information (see Section 5.0), and "Configuration Matrix" information in References (see Section 7.0)

3) Missing driver software packages for the SSA. To check for these packages, run a 'pkginfo | grep storage'

      SUNWssadv
      SUNWssahd
      SUNWssaop

   If these are missing, install them. For instance, the 2.1 version of Volume Manager cdrom does not have any Solaris 2.4 SSA device driver software on it. You must load these drivers from the release 3/95 Solaris 2.4 cdrom, or from the version 2.0 SSA/Volume Manager cdrom.

4) System has not been configured for the addition of the array.
   - A reconfigure boot has not been done. Bring the system down and issue a 'boot -r' command.

   - The device tree has not built correctly. No devices are in the /dev/dsk area, or the /dev/rdsk area.

      1) Check for an entry of the soc with the new array's address in the /devices/io.../sbus... directory structure.
      2) If there is an entry here, remove the entire directory for this soc only.
      3) Do a reconfiguration boot (boot -r).

2.8 The System Cannot Run the vxconfigd, Volume Manager won't start.

Two things can leave a file in the /etc/vx directory that tells the daemon the software has not been installed, so it should not start up:

1) Running of vxinstall and breaking out of it before completion.
2) Installation of Volume Manager packages.

To recover, remove the file:

      # cd /etc/vx/reconfig.d/state.d
      # rm install-db

3) Reboot, or bring daemon up manually: (Refer to User's Guide Page E-38 for Version 2.x; Refer to Administrator's Guide Page B-23 for version 1.x)

      # vxiod set 10              (** NOTE: use a 2 for version 1.x)
      # vxconfigd -m disable
      # vxdctl init
      # vxdctl enable

If the daemon does not start up automatically and there are no error messages reported from vxconfigd, try using the manual startup procedure above. If there are error messages being reported, refer to appendix C in the version 2.0 User's Guide for the exact error, and follow steps accordingly. (Version 1.x should refer to appendix A.)

------------------------------------------------------------------------------

3.0 Quick Fixes, How-to's and Theory of Operations

3.1 Bootable Disk Encapsulation and Mirroring - How-to

(NOTE: if NOT planning to mirror to same size disk device, encapsulation of bootable disk is not recommended by Sun Service Technical Support.)

Encapsulation of a disk means bringing it in under control of the Volume Manager software. It does not disturb data, but just creates an 'overlay' on the disk so that the Volume Manager software has access to the device.
This mostly comes into play on our systems with the bootable system disk. By placing the boot disk under control of this software, we can mirror it, giving us a backup copy to work or run from. This is very desirable in many system locations. It increases the up-time of the system.

There are, however, some precautions that should be understood before one goes blindly into root encapsulation. There are a few rules that should be followed (possibly), depending on the version of Volume Manager software you are running.

1) The VM must have 'room' to operate on this device. (2 slices) If this disk has all 7 usable slices in use, you will not be able to encapsulate it. It needs two slices unused, one for the private region, one for the public region.

2) It is preferable to give the VM the space it will need for the private region. If you do not give it space, it will take what it needs from the end of swap. This may be fine, if you have plenty of swap space to hold both a kernel core dump and the actual private region info. There may be a problem if you must get a core dump, and do not have that much space. The core dump may overwrite the private region, and if this disk is the only disk in the rootdg disk group, the VM software will not operate at all.

If possible you can give the VM enough room at the very end of the disk. This means making the last used slice a cylinder or two shorter in length. This means re-labeling the disk. (You will then have to restore the filesystems that were on this disk, so before you begin do a full backup of each filesystem.) Also, if the last used cylinder on the disk is 'full' of data, if you take space, it might leave a hole in the filesystem.

Getting the boot disk set up correctly for encapsulation can be a very tedious and time-consuming operation. Be careful of what you are about to do.

By default this area needs to be 1024 sectors; but we bound on full cylinders.
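The rounding to full cylinders can be sketched as a ceiling division. This is an illustration only: `cyls_needed` is a hypothetical helper, not part of Solaris or the Volume Manager, and the 1008 sectors/cylinder figure is taken from the prtvtoc examples in section 2.4.2.

```shell
# cyls_needed SECTORS SECTORS_PER_CYL
# Hypothetical helper: whole cylinders needed to hold SECTORS
# (integer ceiling division).
cyls_needed() {
    expr \( "$1" + "$2" - 1 \) / "$2"
}

# A 1024-sector private region on the 1008 sectors/cylinder disk
# shown in the prtvtoc examples:
cyls_needed 1024 1008    # prints 2
```

A disk with 1024 or more sectors per cylinder would come out at one cylinder instead, which is why the count depends on the disk geometry.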
Because of this rounding to full cylinders, the private region will need 1 or 2 cylinders, depending on the footprint (size) of the disk.

If slice 6 is the last used slice, this one needs to be made shorter. It is a good idea to verify that you can take this space from the slice BEFORE you begin. If the filesystem on this slice is near 90% full, you may want to rethink taking space from here. Keep in mind that when you shorten the length of this slice, you will lose the data that was there.

Let us suppose that we have the /var filesystem on slice 6 of our boot disk, and that this is the last used slice on our disk. This filesystem is only 67% full, so we have plenty of space at the end of the filesystem. This being the case, let us proceed. (For this example, I will use 2 cylinders.)

We must shorten this filesystem. We do this with the format utility.

system # format
   (select our disk number to work with)
   .......gives us a listing of format options.....

   (we must partition)
format> partition
   ....gives us a listing of partition options.....

   (Let's first look at our slicing)
partition> print
   ....gives us a view of our slicing information.....may look like

Current partition table (original):
Total disk cylinders available: 2036 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders        Size            Blocks
  0       root    wm       0 -   40       20.18MB    (41/0/0)
  1       swap    wu      41 -  171       64.48MB    (131/0/0)
  2     backup    wm       0 - 2035     1002.09MB    (2036/0/0)
  3 unassigned    wm       0               0         (0/0/0)
  4 unassigned    wm       0               0         (0/0/0)
  5        usr    wm     172 - 1151      482.34MB    (980/0/0)
  6        var    wm    1152 - 2035      435.10MB    (884/0/0)
  7          -    wm       0        0      0         (0/0/0)

partition>
   (now we select the slice we will modify, number 6)
partition> 6
   .....gives us slice information....
Enter partition id tag[home]:                          (carriage return here)
Enter partition permission flags[wm]:                  (carriage return here)
Enter new starting cyl[1152]:                          (carriage return here)
Enter partition size[891084b, 884c, 435.10mb]: 882c

   (then we check it again, then label)
partition> print
   ....gives us a view of our slicing information.....may look like

Current partition table (original):
Total disk cylinders available: 2036 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders        Size            Blocks
  0       root    wm       0 -   40       20.18MB    (41/0/0)
  1       swap    wu      41 -  171       64.48MB    (131/0/0)
  2     backup    wm       0 - 2035     1002.09MB    (2036/0/0)
  3 unassigned    wm       0               0         (0/0/0)
  4 unassigned    wm       0               0         (0/0/0)
  5        usr    wm     172 - 1151      482.34MB    (980/0/0)
  6        var    wm    1152 - 2033      433.06MB    (882/0/0)
  7          -    wm       0        0      0         (0/0/0)

partition> label
   ........

Notice that the ending cylinder number is now 2033, not 2035. This gives the VM software plenty of room to work.

3) Use the 'vxdiskadm' utility to encapsulate the boot device. Use option number 2 for encapsulation.

4) To mirror, be sure you have a disk the same size as the bootable disk. It must not be sliced up. Only the backup slice (slice 2) should be set up. Again using 'vxdiskadm', select option 6 to mirror.

3.1a How to 'unroot' your Bootable Disk

To remove the Volume Manager control of your bootable disk, run the 'vxunroot' script in the /usr/lib/vxvm/bin (or the /etc/vx/bin) directory:

      # /usr/lib/vxvm/bin/vxunroot

Then run the 'vxedit' command to remove the references to the 'rootvol' and 'swapvol', etc. volumes that were created on the bootable disk:

      # vxedit -fr rm rootvol
      # vxedit -fr rm swapvol
      # vxedit -fr rm usrvol

3.2 Volume Manager is not seeing a disk device any longer

This can happen because you have replaced a disk device, or possibly because the system was booted before a device in the SSA was ready. In either case, it is a simple matter to 'ping' the device(s) and bring them back 'on-line'.
There are two ways to accomplish this:

   [**With EXTREME caution, on a quiesced system only]

   **use 'drvconfig'      -- if the system has booted without all disks ready
     use 'vxdctl enable'  -- if you have replaced a disk device, and volume manager cannot find the new device.

(I prefer the use of the vxdctl command, unless the disk has been flagged as off-line to the system.) These commands find what's out there, and act accordingly.

The correct way to replace a failed device in the Volume Manager is to tell it that you are replacing a component before you physically remove it. Use 'vxdiskadm' option 4 to remove a component for replacement. (Very often this procedure is not followed, so the VM cannot find the disk. You will get messages informing you that the device is not part of VM if you try to remove it, or that it already is part of VM if you try to add it.) Once the VM software has been notified that the device is there, it is a simple matter to bring it back into use.

When you find that you have replaced a disk without first telling the VM software that you were going to do this, my recommended procedure is:

      Run vxdctl enable
      Run 'vxdiskadm' and select option 5 to replace the failed component.

3.3 Software and Firmware Differences, What You Need to Know

(Refer to section 7.0 for a copy of a support matrix)

We began with two Operating Systems (OS), so we had two versions of firmware that corresponded with the OS. It began to get a bit confusing, as we had many versions of SSA/Volume Manager, firmware and OS levels. We have a matrix available for use that explains what goes with what. I will list out the different versions:

      OS             SSA drivers      Volume Manager
      ------         -------------    ----------------
      Solaris 2.3    1.x, 2.x         1.3, 2.x
      Solaris 2.4    2.0, 2.1         2.0, 2.1, 2.1.1
      Solaris 2.5    2.1              2.1.1

There have been a few revisions of firmware developed over the life of the SSA. There are many reasons for so many revisions, which I will not go into in this forum.
Suffice it to say that each revision had a purpose, and for the most part was better than its predecessor. We recommend that the latest revision be downloaded into all SSAs. We have found historically that the latest revs of the firmware will repair or prevent many problems. (If you have a question regarding any version of firmware, please do not hesitate to contact a Solution Center and ask about it.)

We are getting better, as now there are firmware revisions coming out that are compatible with any OS*, and any version of SSA driver. The version numbers began following the SSA driver level versions.

      For version 1.3, the firmware versions are 1.xx
      For version 2.0 or 2.1, the firmware versions are 2.xx

The new version will be 3.x; these versions are compatible with version 1.x* and all version 2.xx of the SSA drivers.

*(At the time of the writing of this document, version 3.3 firmware is not yet available for Solaris 2.3 OS. Any controllers that contain this version are not compatible and cannot be used.)

There were some issues related to the age of the controller board and the ability to 'uprev' or 'backrev' the firmware. The oldest boards will not allow some versions of uprev'ed firmware to download. The newer boards will not allow some older versions of firmware to be downloaded. Yes, this was a bit of a confusing problem! The good news is that the current boards should work anywhere.

We have found that placing the NVRAM fastwrite tables into a known state before any downloading of firmware will enable the firmware to load correctly.

Also, all the instructions indicate a reset of the SSA as a required step. This makes the SSA use the newly downloaded firmware. The instructions may tell you to use the power switch to power cycle the unit, to reset it, but there is a better way: use the reset switch. On the Array controller board, there is a reset switch. This just 'rereads' the firmware. It is located between the DIN connector and the "Diag" switch.
It is large enough to use a pencil eraser to push it, and sits inside a raised circle. If there was a problem with the firmware, and the unit is power cycled, you might see a failure (PMF displayed) and you might lose the connection to the host. When this happens, the array controller board must be replaced.

Here is some interesting information:

SSA controller boards 501-2080-09 or higher and 501-2651-xx were shipped with a newer SCSI controller chip from a second source that makes them incompatible with older SSA firmware versions.

On an SSA with Solaris 2.3 and an sbus with 1.33 Fcode version, the sbus may not recognize the SSA if the SSA driver patch levels are not high enough.

The NVRAM procedure places the NVRAM fastwrite tables into a known state and saves them properly. This allows the SSA firmware to create the correct checksum for the SSA firmware just loaded, thus eliminating the PMF ERROR.

The NVRAM Procedure is:

a) Issue the appropriate command to enable the SSA Fastwrite option:

   (i)  Solaris 2.4: Hardware 3/95 and later
           /opt/SUNWssa/bin/ssaadm fast_write -s -e X
        where X is any disk drive

   (ii) All other Solaris versions
           /opt/SUNWssa/bin/ssacli -s -e fast_write X
        where X is any disk drive

b) Issue the appropriate command to disable the SSA Fastwrite option:

   (i)  Solaris 2.4: 3/95 and later
           /opt/SUNWssa/bin/ssaadm fast_write -s -d X
        where X is any disk drive.

   (ii) All other Solaris versions
           /opt/SUNWssa/bin/ssacli -s -d fast_write X
        where X is any disk drive.

NOTE: It is NOT required to apply the above commands against all drives; ONLY any ONE disk drive.

CAUTION: Do not interrupt the download procedure once it has begun. If it has to be interrupted, DO NOT RESET the controller. For example, if you notice that you are loading in the wrong version, just let it complete, then go ahead and load down the correct one. When the correct one is completed, then reset the SSA.
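Steps a) and b) of the NVRAM procedure can be sketched as a dry-run helper that only prints the two command lines for review; it executes nothing. The function name `nvram_cmds` and its two arguments are hypothetical. The command lines themselves are copied from the procedure above, and X stands for any one disk drive, as in the text.

```shell
# nvram_cmds TOOL DISK
# Hypothetical helper: PRINTS the enable/disable pair of the NVRAM
# procedure; it does not run either command.
#   TOOL = ssaadm   Solaris 2.4 Hardware 3/95 and later
#   TOOL = ssacli   all other Solaris versions
#   DISK = any ONE disk drive in the array
nvram_cmds() {
    tool=$1
    disk=$2
    case "$tool" in
    ssaadm)
        echo "/opt/SUNWssa/bin/ssaadm fast_write -s -e $disk"   # step a) enable
        echo "/opt/SUNWssa/bin/ssaadm fast_write -s -d $disk"   # step b) disable
        ;;
    ssacli)
        echo "/opt/SUNWssa/bin/ssacli -s -e fast_write $disk"   # step a) enable
        echo "/opt/SUNWssa/bin/ssacli -s -d fast_write $disk"   # step b) disable
        ;;
    *)
        echo "usage: nvram_cmds ssaadm|ssacli DISK" >&2
        return 1
        ;;
    esac
}

nvram_cmds ssaadm X
```

Printing rather than executing keeps the order (enable first, then disable) visible before anything touches the array.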
3.4 Hot Spares How They Work (Volume Manager)

A hot spare is a disk device that is kept 'in reserve' for use by the software only. It is used in place of a disk that fails, but only if that disk is part of a mirror or raid-5 volume type. Keep in mind that a normal striped volume or a simple (concatenated) volume does not have any ability to recreate missing data, so a hot spare will not be able to operate with these volume types. If there is a disk failure with one of these, the data must be restored from a backup source.

The software will use the remaining information from a volume to re-create the missing data from a failed disk. In the case of a raid-5 volume, the remaining components are XOR'd with the parity to yield the missing data. In the case of a mirror, it is simply copied over from the mirrored side.

The use of hot spare and Volume Manager has raised some controversy. Once one looks at the way it actually operates, its use will be better understood.

A hot spare will not be placed into use unless the entire disk fails. What this means is, if there is a failure in a subdisk, the hot spare may not necessarily come into play. When there is a failure on a subdisk, the VM will try a write operation in its private region of that device. If this operation succeeds, it assumes the disk is 'good' and will not use a hot spare. If this operation fails, it assumes the whole disk has failed, and will use a hot spare.

WHY? Why does it have to have a whole disk fail? The answer is very simple. With this software we are able to create many volumes out of many components, in many configurations. We can have striped and raid-5 and simple volumes all in the same disk group. We can place these separate volume types on the same disk. A hot spare replacing that disk may cause many volumes to go offline. Veritas decided that they would only replace a disk if they have a problem writing to both the private and public regions. Otherwise it is left as is.

Here is an example.
Let's say we have a few volume types on one disk: a simple volume subdisk, a striped volume subdisk and a mirrored volume subdisk. A read failure occurs on the mirror subdisk. We have hot spare on. Think about what would happen if this hot spare replaces this disk. As the hot spare is put in place of the original disk, the simple and striped volumes go down! The hot spare disk has no mechanism to recover their data. Those two volumes would have to be re-created and restored from a backup source.

The VM software has an extremely high limit on the number of subdisks that can be on one disk.

3.5 WWN (World Wide Number) and How to Change it

The World Wide Number is a unique number. It was intended to remain unique per SSA, due to the future possibility of being able to access them directly on a network. It is like the IP address for use on the Internet. Since this isn't that future (yet), there are times when it may be necessary to change this inside the SSA.

This address has its last four numbers displayed on the LCD front panel of the SSA. It is contained in a prom on the Array Controller. If the Array Controller board is replaced, there will be a new WWN for that SSA. Since this is effectively the address that is used by the system when the device trees are created, this SSA may have "disappeared" from your system.

How do you find the original address (WWN)?

   One way would be to pull it from the messages file (find the boot cycle's Fiber Channel ONLINE message).
   One way would be to find it in the /devices directory structures.
   One way would be to follow the link in the /dev/dsk directory:

# ls -l /dev/dsk/c2t0d0s2
/dev/dsk/c2t0d0s2 ->../../devices/io-unit@f,e1200000/sbi@0,0/
        SUNW,soc@1,0/SUNW,pln@b00008000,5438af/sd@0,0:c
                                   ^^^^ ^^^^^^

The WWN is a twelve (12) digit number. It is represented in a strange way. The numbers to the right of the comma must be padded with zeros to fill 8 digits. The upper four digits are left of the comma.
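The padding rule can be sketched as a small shell function. This is an illustration only: `pln_to_wwn` is a hypothetical helper, and it assumes the `SUNW,pln@` component always has the form shown in the `ls -l` output, with the upper four WWN digits being the last four characters left of the comma.

```shell
# pln_to_wwn PATH
# Hypothetical helper: derive the 12-digit WWN from a /devices path
# that contains "SUNW,pln@<left>,<right>".
# Assumptions: the upper four digits are the last four characters of
# <left>; <right> is zero-padded on the left to eight digits.
# Prints nothing if PATH has no SUNW,pln@ component.
pln_to_wwn() {
    echo "$1" |
    sed -n 's/.*SUNW,pln@\([0-9a-fA-F]*\),\([0-9a-fA-F]*\).*/\1 \2/p' |
    awk '{
        upper = substr($1, length($1) - 3)              # last four characters
        lower = $2
        while (length(lower) < 8) lower = "0" lower     # pad to eight digits
        print upper lower
    }'
}

pln_to_wwn '/devices/io-unit@f,e1200000/sbi@0,0/SUNW,soc@1,0/SUNW,pln@b00008000,5438af/sd@0,0:c'
# prints 8000005438af
```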
In the example above, the WWN is 8000005438af, not 0080005438af.

The choice then is whether to use the new address or the old address.

To use the new address, remove the old address information and do a reconfigure boot (to place the new one where the old one was i.e.: cNtNdN) This means removing the listing in the /devices/io.../s.... (Possibly, also removing the controller links in /dev/dsk and /dev/rdsk.) The subsequent reconfiguration boot should reconstruct a device tree now pointing to the new controller address in place of the original.

To change the WWN on the Array Controller, use the following commands:

   Solaris 2.3:
      # ssacli -s -w download cN

   Solaris 2.4 and above:
      # ssaadm download -w cN

Reset the SSA (use the reset switch).

3.5a WWN How to Use the New One, instead of changing it.

You can use the new WWN that is on a new controller card by removing the references to the original in the device tree on the system, before doing a reconfiguration boot.

Remove the "SUNW,soc@N,N" directory and contents in the '/devices/io.....' directory structure of the device tree. Also, remove all entries in /dev/dsk and /dev/rdsk for this controller's disks. (For example all the entries for c5)

Since the location of the Host Adapter Sbus card has not changed, the 'boot -r' will simply rebuild that portion of that controller with the NEW address of the array board (pln), and create the devices in /dev/dsk and /dev/rdsk. The VM software will use the same devices, since that will not have changed.

3.6 Using the 'vxdiskadm' Utility

Many administrative operations can be performed through the use of this utility.
# vxdiskadm

Volume Manager Support Operations
Menu: VolumeManager/Disk

 1      Add or initialize one or more disks
 2      Encapsulate one or more disks
 3      Remove a disk
 4      Remove a disk for replacement
 5      Replace a failed or removed disk
 6      Mirror volumes on a disk
 7      Move volumes from a disk
 8      Enable access to (import) a disk group
 9      Remove access to (deport) a disk group
 10     Enable (online) a disk device
 11     Disable (offline) a disk device
 12     Mark a disk as a hot-spare for a disk group
 13     Turn off the hot-spare flag on a disk
 list   List disk information

 ?      Display help about menu
 ??     Display help about the menuing system
 q      Exit from menus

Select an operation to perform:

3.7 How to Remove or Replace a Disk

When a disk is failing, or has failed, replace the disk. Since the VM software keeps its configuration information in the private regions on all disks, you must notify the software of the change that is about to take place. Otherwise what happens is the software keeps looking for the original disk that really isn't there, and will not allow you to add the new one in its place.

The command to use for this operation is:

      # vxdiskadm

When you are ready to replace the disk device, first tell the VM software that you are going to replace the disk. Once the disk has been replaced, simply tell the VM software that you have completed replacement.

(**NOTE: if a raid-5 or mirror component, the recovery will begin automatically following this procedure. Other volume types must be restored from a backup source.)

Example:

# vxdiskadm

Begin with option 4 to remove for replacement.

Remove a disk for replacement
Menu: VolumeManager/Disk/RemoveForReplace

  Use this menu operation to remove a physical disk from a disk group, while retaining the disk name. This changes the state for the disk name to a "removed" disk. If there are any initialized disks that are not part of a disk group, you will be given the option of using one of these disks as a replacement.

disk name [,list,q,?]
Enter the disk name, like disk12, or sybase03. Other questions will be
asked to confirm the operation you are about to perform.

Replace the disk, then select option 5 to replace a failed disk:

  Replace a failed or removed disk
  Menu: VolumeManager/Disk/ReplaceDisk

  Use this menu operation to specify a replacement disk for a disk that you
  removed with the "Remove a disk for replacement" menu operation, or that
  failed during use. You will be prompted for a disk name to replace and a
  disk device to use as a replacement. You can choose an uninitialized
  disk, in which case the disk will be initialized, or you can choose a
  disk that you have already initialized using the Add or initialize a disk
  menu operation.

  Select a removed or failed disk [<disk>,list,q,?]

Input the disk name that you removed (i.e.: disk12). The access name will
be listed for the replacement device:

  The following devices are available as replacements:

        c1t5d2s2

  You can choose one of these disks to replace disk16.
  Choose "none" to initialize another disk to replace disk16.

  Choose a device, or select "none" [<device>,none,q,?] (default: c1t5d2s2)

Then continue to answer the questions verifying the operation.

3.8 Boot Issues

One of the most frustrating occurrences with a system is when it will not
boot up. This problem may be compounded if the boot disk is encapsulated
for use by the Volume Manager. Of course if it is encapsulated, it should
be mirrored. If the primary boot disk will not boot up, then the mirror
boot disk should allow a boot cycle to complete and bring the system up.

If the system will not boot up from either disk device, the alternative is
to remove the references to Volume Manager for the system's filesystems and
attempt to boot without it. (Refer to section 2.4.2 for prtvtoc
information.) Most systems are capable of booting up without needing the
Volume Manager software to operate.
If the underlying filesystem slicing still resides on the primary bootable
disk, the following procedure can be followed. (If not, then the recovery
procedure in the User's Guide should be followed, beginning with either
restoration of the filesystems, or re-installation of the OS. version 2.x:
Page E-34)

Boot the system from your OS cdrom in single user mode:

        ok> boot cdrom -s

Run a filesystem check (fsck) on the root slice of the disk; then mount it
on the /a mount point on the cdrom, and "cd" to /a/etc.

Perform one of the following procedures:

A) Replace the current vfstab file with the original copy from before the
   VM software was installed. Then comment out any entry in the system file
   (/a/etc/system while booted from the cdrom) that refers to a 'bootdev'
   (remember to use a '*' as the comment symbol).

   When the 'vxinstall' program was run, it copied your original vfstab
   file to a file called:

        vfstab.prevm

   You can move your current file to another filename, and copy this
   ".prevm" file as vfstab. Then simply unmount the filesystem from the
   cdrom and boot from your boot disk. (Subsequent boot issues will not be
   discussed here.)

        1) # mv vfstab vfstab.org
        2) # cp vfstab.prevm vfstab
        3) # cd /
        4) # umount /a
        5) # halt
           ok> boot

   If this '.prevm' file does not exist, you must hand-edit your current
   vfstab file. ***CAUTION*** copy it first to another filename.

B) Hand-edit the vfstab file to remove all references to "/dev/vx" devices
   to mount:

   1) Replace the entries for the bootable filesystems with the direct
      slices to mount.

      #device            device              mount    FS     fsck  mount    mount
      #to mount          to fsck             point    type   pass  at boot  options
      #
      #/dev/dsk/c1d0s2   /dev/rdsk/c1d0s2    /usr     ufs    1     yes      -
      /proc              -                   /proc    proc   -     no       -
      fd                 -                   /dev/fd  fd     -     no       -
      swap               -                   /tmp     tmpfs  -     yes      -
      /dev/dsk/c0t3d0s0  /dev/rdsk/c0t3d0s0  /        ufs    1     no       -
      /dev/dsk/c0t3d0s6  /dev/rdsk/c0t3d0s6  /usr     ufs    1     no       -
      /dev/dsk/c0t3d0s3  /dev/rdsk/c0t3d0s3  /var     ufs    1     no       -
      etc.....

   2) # cd /
   3) # umount /a
   4) # halt
      ok> boot

(Other boot issues are not discussed here.)
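Before hand-editing vfstab as in procedure B, it can help to list exactly
which entries still point at Volume Manager devices. The following is only
a minimal sketch, not part of the documented procedure: the function name
list_vx_entries is hypothetical, and it assumes that Volume Manager entries
are the ones whose first field begins with /dev/vx/, as described above.

```shell
#!/bin/sh
# Sketch: show the vfstab lines that still mount Volume Manager devices.
# list_vx_entries is a hypothetical helper name, not a VxVM or Solaris tool.
# Argument 1: path to a vfstab file (for example /a/etc/vfstab when the
# root slice is mounted on /a from the cdrom).
list_vx_entries() {
    # Skip comment lines; print "line-number: entry" for every entry whose
    # "device to mount" field lives under /dev/vx/.
    awk '$1 !~ /^#/ && $1 ~ /^\/dev\/vx\// { print NR ": " $0 }' "$1"
}
```

With the root slice mounted on /a, it could be called as:

        # list_vx_entries /a/etc/vfstab

Each line it prints is an entry that must be replaced with the direct slice
before the system can boot without the Volume Manager.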
3.9 SSA fast_write feature versus PrestoServe

You can use either the SSA fast_write capability, or Prestoserve, to cache
write data. You can even use both at the same time, although it is not
cost-effective. Presto + SSA fast_write is slightly faster than either
alone, but the differential is not worth the price.

Generally speaking, the SSA fast_write feature is more general, is more
scalable (since it's in every SSA, not just in the host), and is safely
multi-ported. SSA fast_write is therefore the preferred solution where
available.

3.9.1 Prestoserve with host-based RAID (SDS/VxVM) products

If using Prestoserve with either Solstice DiskSuite (SDS), or SPARCstorage
Array Volume Manager (VxVM), then there is some prework required.

With VxVM, you must follow the various product notes on this topic (a copy
of which is in INFODOC 13492) with respect to the volumes/filesystems which
may be prestoized, and the order in which Prestoserve should be started
with respect to those products.

See also the SDS 4.0 jumbo-patch README or relevant product note, for
similar information for SDS.

3.10 RAID-5 Information

One of the questions we are most often asked in regard to Raid-5
configuration is "How can I tell how much overhead the parity will take up
on my volume? I must know how big a volume I can create."

There are a couple of things you can do here. One is to use the 'vxassist'
command to help you determine how much space is available in a disk group
for use with a Raid-5 configuration.

vxassist has an option, undocumented in the manual page but documented in
the help output, that can be used to find out the maximum size in K that a
volume can be made in a given diskgroup.

  # vxassist help usage
  vxvm:vxassist: INFO: vxassist - Perform simple volume administration
  vxvm:vxassist: INFO: Usage: vxassist [-g diskgroup] [-U usetype]
                              [-d file] [-nbf] keyword arg...
  Recognized keywords:
        make volume-name len [attrs...]
        mirror volume-name [attrs...]
        addlog volume-name [attrs...]
        move volume-name [attrs...]
        growto volume-name new-length [attrs...]
        growby volume-name length-change [attrs...]
        shrinkto volume-name new-length [attrs...]
        shrinkby volume-name length-change [attrs...]
        snapstart volume-name [attrs...]
        snapwait volume-name
        snapshot volume-name snapshot-name
        [-p] maxsize [attrs...]
        [-p] maxgrow volume-name [attrs...]

For example, to find out what size we can use for a raid-5 volume, use the
following command:

  # vxassist -p -g datadg maxsize layout=raid5 max_nraid5column=4

The result is listed in 'k'.

Another way is to use the 'rule-of-thumb' that the parity takes up about
1/#-of-columns of the total space in the volume. So the formula is:

  totaldiskspace - (1/#ofcols * totaldiskspace) = volume size

  4000m - (1/4 * 4000m) = 4000m - 1000m = 3000m (or about 3gb)

In our SSA device, when the volume is contained within one SSA, the maximum
number of columns used would be 6, and a raid-5 volume needs at least 3
columns. So the parity overhead should fall between 1/6 and 1/3 of the
total. Subtract this amount from the total, and the balance should be close
to the size that may be used for the volume, and still fit.

3.11 ROOTDG Recovery (Refer. SRDB 12072, 11136)

When the rootdg is not able to run, the vxconfigd will not start; when this
happens, the Volume Manager software will not run at all. Most screen
messages will report a vxconfigd: error or a VxVM error, and most will
point to the fact that a disk group is not correct or usable.

To recover, either re-encapsulate the bootable disk, or use a single slice
from a disk that is not being used to hold the rootdg disk group. Once the
decision has been made, follow the corresponding procedure below.

If the only rootdg disks were the bootable disk and its mirror, you should
re-encapsulate the bootable disk. You can use the vxinstall utility to do
this, but please use it with caution.
Use the "custom" installation; when 'c0' disks are listed, tell vxinstall
to do them individually; select 'encapsulation' for the bootable disk only;
select 'leave these disks alone' for all the rest on 'c0'. For all other
controllers that are listed, tell vxinstall to 'leave these disks alone'.
DO NOT BREAK OUT OF THIS UTILITY (i.e.: control-C). This will place the
bootable disk in rootdg; from here the daemons can be restarted manually
(see below, or refer to Appendix E-38).

To use a slice of a disk to hold the rootdg disk group, use the following
procedure. (Be sure that you have a slice that can be used, one that
contains no data. If need be, use the format utility to create one slice to
be used, approximately 1024 sectors in length: one or two cylinders, based
on the size of the disk.)

Example: (we are using slice 7 of one of the disks on our system)

  1. Disable transactions:
        # vxconfigd -m disable
        # ps -ef | grep vxconfigd
            root    58     1 80 10:08:39 ?        0:01 vxconfigd -m disable
            root   520   328  4 10:35:09 pts/0    0:00 grep vxconfigd

  2. Initialize the database:
        # vxdctl init

  3. Make a new rootdg group:
        # vxdg init rootdg

  4. Add a simple slice:
        # vxdctl add disk c0t1d0s7
        vxvm:vxdctl: WARNING: Device c0t1d0s7: Not currently in the
        configuration
        (Note: this warning is normal)

  5. Add disk records:
        # vxdisk -f init c0t1d0s7

  6. Add the disk name to the rootdg disk group:
        # vxdg adddisk c0t1d0s7

  7. Enable transactions:
        # vxdctl enable

(Note: You might need to bring the daemons up with the complete startup
procedure:)

        # vxiod set 10
        # vxconfigd -m disable     (this should be done already...)
        # vxdctl init
        # vxdctl enable

3.12 Loss of Disk Group Configuration Information; How to Save the
     Information (Refer to SRDB 12006)

The Volume Manager keeps all of its configuration information in the
private region on all disks under its control. If the configuration
information on disk cannot be read by the VM software, that disk group will
not be available for use.
This can be devastating to a system, and may require days' worth of
restoration of data from tape. The private region is not an area of a disk
that can be 'backed up' in the conventional manner for restoration if the
need arises. So we are often asked what can be done to 'keep' this
information for use "in case".

There is a way to get copies of this information in a format that the VM
software can use if needed. Basically you must run two commands out to two
separate output files, then keep a copy of these files handy.

Get a copy of the existing devices:

        # vxdisk list > <filename>

Get a copy of the configuration in a format that can be used. (This must be
run for each disk group, and must be run again after any configuration
changes are made, to get the current info.)

        # vxprint -g <diskgroup> -hmvps > <filename>

In the event of a disk group configuration problem, the outputs of the two
commands above can be used to recreate it.

  1. Use the vxdisk list file to re-initialize the disks. Be sure to name
     the correct devices as the correct disk names (for the disk group),
     etc. Make sure when you have finished that the current vxdisk listing
     looks identical to the original file.

  2. Feed the saved output of the vxprint command into the vxmake command.
     This will recreate the configuration records in the private regions of
     the disks in this disk group.

        # vxmake -d <filename>

3.13 IOSTAT Output; How to find the listed "ssd" device it reports

The iostat command reports the ssd number. This is the 'instance' number of
the device. To find the actual device, grep the /etc/path_to_inst file for
that instance number (128 in this example):

        # grep 128 /etc/path_to_inst

3.14 Dual-Hosted SSAs and Simulation of a "failover" on Dual-Host

This is strictly a manual operation on normal Solaris systems. Normal
Veritas software is not designed for true Dual-host connections. The
Solaris OS is also not designed for Dual-host configurations. Neither piece
of software has any mechanism to verify what is where, or who is up, and
who owns what.
(This is true even with third-party software running, like FirstWatch or
OpenVision.) In VM software, the rootdg cannot be moved from one system to
another system; so to use two hosts with one SSA box, there must be other
diskgroups involved.

Ground rules for Dual-Host configurations:

  1) You MUST have disk groups other than rootdg, preferably with no SSA
     disks in rootdg for either (any) system.
  2) Only one system can `own' a disk group at any given time.
  3) No system cross-mirroring is allowed, since dg's have exclusive
     ownership.
  4) It is best to keep the controller configurations the same on both
     systems; this way device names remain the same.
  5) If one system dies, the ONLY way a fail-over will occur is in a manual
     operation. There is no automatic fail-over mechanism in the normal VM
     software, nor in Solaris 2.x. This means that diskgroups must be
     forcibly imported onto the other system.

To run a 'fail-over' test:

  1. Set up the two systems with all the software and current patches.
  2. On the first system (System-A), configure the SSA and Volume Manager
     software.
  3. Make some test volumes with data, etc. Bring this system down.
  4. Move the SSA connection to System-B, if necessary.
  5. On the second system (System-B), run a reconfiguration boot (boot -r)
     to allow the system to configure for the SSA devices.

     Remove the /etc/vx/reconfig.d/state.d/install-db file.

     (Refer to Appendix E-38 of the SSA User's Guide.) Manually start the
     daemons, specifying the System-A hostname to get the rootdg up on
     System-B:

        # vxiod set 10
        # vxconfigd -m disable
        # vxdctl init <hostname>       (in our case: System-A)
        # vxdctl enable

     Import all other disk groups. The Volume Manager should be completely
     up and running now.

3.15 Moving an SSA to another system.
     (see above procedure in section 3.14.)

1) The system does not have any SSA devices on it currently:

   Do a reconfiguration boot, so that this system can 'see' this SSA.
   When moving an SSA to another host, the Volume Manager must be manually
   started specifying the original host name. If this host name is unknown,
   you can get the information out of the private region with the following
   command:

        # /etc/vx/diag.d/vxprivutil list /dev/rdsk/cXtYdZs2

   Look for the hostid information. (example: hostid: unix)

   Once you have the original host name, you can use the four step
   procedure above (refer to Appendix E-38 in the SSA User's Guide).

   If this SSA is going to stay on this system, you may want to make the
   host name the current host name on all the disks. To do this, use the
   'hostid' option to the daemon control command:

        # vxdctl hostid <hostname>

2) The system has SSA devices on it currently:

   When moving an SSA from one system onto a system that currently has SSAs
   on it, you must remember that there is already a rootdg. If there is a
   rootdg on the 'new' SSA, you will have to import it as a new disk group.

   Remember to do a reconfiguration boot in order for the system to 'see'
   this SSA.

   For all disk groups other than rootdg, use an import command to bring
   them online with this system.

   For the rootdg disk group, import it under a new disk group name.
   Obtain the id string of the diskgroup before removal, if possible, with
   this command:

        # vxdisk -s list

   Otherwise, use the vxprivutil command to scan a disk, and find the
   'dgid' information there.

        # vxprivutil scan /dev/rdsk/c4t3d0s2

   Scan a disk that is listed as part of 'rootdg' and find the line that
   begins with "dgid: ". This is the id string for the diskgroup.

        [Example: dgid: 832724095.1025.systemb]

   Record this information for use after moving the SSA onto the new
   system.
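   If the vxprivutil output is saved or piped, the id string can be picked
   out mechanically rather than by eye. This is only a sketch: the helper
   name extract_dgid is hypothetical, and it assumes nothing about the
   output beyond what is stated above, namely that the wanted line begins
   with "dgid: " followed by the id string.

```shell
#!/bin/sh
# Sketch: print the disk group id found in vxprivutil output read on stdin.
# extract_dgid is a hypothetical helper name, not a Volume Manager command.
extract_dgid() {
    # Take the first line whose first field is exactly "dgid:" and print
    # the field that follows it (the id string), then stop reading.
    awk '$1 == "dgid:" { print $2; exit }'
}
```

   For example, using the scan command shown above:

        # vxprivutil scan /dev/rdsk/c4t3d0s2 | extract_dgid

   The string it prints is the one to record before the SSA is moved.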
   To bring in this SSA's rootdg configuration, use the following command
   to import it, giving a new disk group name:

        # vxdg -n <newname> import <dgid>

        [Example: vxdg -n sysbdg import 832724095.1025.systemb]

3.16 Fail-Over Simulation for testing hot spares with Mirrors and Raid-5

There is a correct way and an incorrect way to test this fail-over
capability inside a model 1xx desktop SSA. The incorrect way is to pull out
a tray.

The best way to test this is to set up your 'volume' made up of a disk in
each of the three trays (minimum), and set up a hot spare disk somewhere.
Create a filesystem on the volume, and mount it.

  1 - Begin a lengthy IO session to the mounted filesystem.
  2 - Use the format utility to remove the slice info and relabel a disk:

        format --> select disk from volume
               --> partition
               --> make slices 3 and 4 begin at 0 and go for 0 length
               --> label the disk

By removing the references to both the public and private regions on the
disk, you cause the VM software to assume a full-disk failure, and it will
begin to bring the Hot Spare disk into place.

------------------------------------------------------------------------------
4.0 FAQ's

Please contact your local Sun office for a list of FAQ's.

------------------------------------------------------------------------------
5.0 Patches

5.1 SPARCstorage Array, Veritas Volume Manager PATCHLIST
    (Please refer to the SunSolve system or CDrom for the current version.)
Solaris 2.3   VM 1.3
           101765  SunOS 5.3: disks program supports only 16 drives per controller
           102198  vxva 1.3: patch to fix known problems with 1.3 vxva
           102199  vxvm 1.3: patch to fix known problems with 1.3 vxvm
           102368  SPARCstorage Array 1.0: bug fixes for firmware 1.9 on SSA softw
           102408  SPARCstorage Array 1.0: Jumbo patch for SSA drivers

Solaris 2.3   VM 2.0
           101765  SunOS 5.3: disks program supports only 16 drives per controller
           102301  Volume Manager 2.0: log replay problems with raid5 write entrie
           102400  SPARCstorage Array 2.0: Jumbo patch for SSA drivers

Solaris 2.3   VM 2.1
           101765  SunOS 5.3: disks program supports only 16 drives per controller
           102403  Volume Manager 2.1: Volume Manager Visual Administrator Fixes
           102465  SPARCstorage Array 2.1: Jumbo patch for SSA

Solaris 2.4   VM 2.0
           102283  SunOS 5.4: disks program supports only 32 drives per controller
           102446  SunOS 5.4: format fix
           102301  Volume Manager 2.0: log replay problems with raid5 write entrie
  *HW1194  102347  SPARCstorage Array 2.0: Jumbo patch for SSA
  **HW395  102432  SPARCstorage Array HW395: Jumbo patch for SSA drivers
  ***      103290  SPARCstorage Array 2.0: SSA Jumbo patch for Solaris 2.4 11/94,HW395

Solaris 2.4   VM 2.1
           102283  SunOS 5.4: disks program supports only 32 drives per controller
           102446  SunOS 5.4: format fix
           102403  Volume Manager 2.1: Volume Manager Visual Administrator Fixes
  *HW1194  102347  SPARCstorage Array 2.0: Jumbo patch for SSA
  **HW395  102432  SPARCstorage Array HW395: Jumbo patch for SSA drivers
  ***      103290  SPARCstorage Array 2.0: SSA Jumbo patch for Solaris 2.4 11/94, HW395

Solaris 2.5   VM 2.1.1
           103017  SPARCstorage Array Solaris 2.5: Point patch for SSA

------------------------------------------------------------------------------
6.0 Bugs & RFE's - Known Problems

6.1 VXVA Core Dumps when Reconnecting a Disk

  Bug Id:           1242519
  Category:         pluto
  Subcategory:      vxvm_va
  State:            dispatched
  Release summary:  2.1.1
  Synopsis:         VXVA core dumps when reconnecting a disk that vxconfigd
                    doesn't see

  Description:
    Solaris 2.5, vxva 2.1.1 on a sun4d 2000 with a dual-hosted SSA; vxva
    core dumps. A disk goes bad. Not knowing there is a disk failure,
    systemA is rebooted; because the disk is unreadable, it is not in
    vxconfigd's configuration on systemA. SystemB has not been rebooted, so
    it still sees the disk. The disk is replaced and initialized from
    systemB. Back on systemA, an attempt is made through vxva to run
    advanced-ops > diskgroup > reconnect. vxva core dumps.

  Workaround:
    Go to SystemB and reinitialize with 'vxdctl disable' followed by
    'vxdctl enable'. After that, the "advanced-ops > diskgroup > reconnect"
    operation works without error.

------------------------------------------------------------------------------
7.0 All Other References Available

7.1 FRU part numbers

Explanation of the Model numbering: XXX = Type of SSA; CPU type; disks

  Model 100 = table-top box;  Sparc chip;  .5gb disks
        101 = table-top box;  Sparc chip;  1.5gb disks
        102 = table-top box;  Sparc chip;  2.1gb disks
        112 = table-top box;  Swift chip;  2.1gb disks
        200 = Rack mount box; Sparc chip;
        210 = Rack mount box; Swift chip;

FRUs Description                                      Part Number
----------------------------------------------------------------------------
MODEL (10x):
  Front Panel Assy.                                   540-2382-xx
  Fan Tray                                            540-2573-xx
  Chassis Enclosure Plug Cover                        330-1589-xx
  Chassis enclosure                                   340-2670-xx
  Side panels (2)                                     330-1470-xx
  Foot (4)                                            330-1590-xx
  Top/Bottom cover (2)                                330-1469-xx
  Subsystem backplane                                 501-2029-xx
  Array Controller (without Battery or FC/OM)         501-2080-xx
  Power Supply                                        540-2465-xx
  Disk Tray                                           540-2245-xx

MODELs 11x and 21x: (Pluto-II)
  Array Controller 11x (with Battery and FC/OM)       501-2982-02
  Array Controller 21x (with Battery and FC/OM)       501-3024-02
  Array Controller 11x (without)                      501-2872-03
  Array Controller 21x (without)                      501-3021-03

7.2 Documentation Part Numbers

  Part Number:   Document:
  ------------   ---------
  Version 2.x:
  802-2041-10    SPARCstorage Array Configuration Guide
  802-2042-10    SPARCstorage Array User's Guide
  801-2205-12    SPARCstorage Array Installation Manual
  801-2206-12    SPARCstorage Array Service Manual
  ==========================================================================
  Version 1.x:
  801-7838-11    SPARCstorage Array Product Note
  801-2204-11    SPARCstorage Array User's Guide
  802-1242-10    SPARCstorage Volume Manager System Administrators Guide
  801-6530-10    SPARCstorage Array Configuration Guide
  801-2205-11    SPARCstorage Array Installation Manual
  801-2206-11    SPARCstorage Array Service Manual
  801-2207-10    Disk Drive Installation Manual for the SPARCstorage Array
  801-6313-10    Fibre Channel SBus Card Installation Manual
  801-6326-10    Fibre Channel Optical Module Installation Manual
  801-6306-11    Fibre Optic Cable Product Note
  801-7103-10    SPARCstorage Array Regulatory Compliance Manual
  801-7173-10    Power Cord selection Product Note

7.3 White Paper Locations:

  Sun internal: all white papers reside on the ftp site on newstop, or on
  the web page:
        http://www.corp/prodmktg/pme/newstop/whitepapers

  Veritas home page:
        http://www.veritas.com

------------------------------------------------------------------------------
8.0 Supportability statement clarifying what is supported, what is not
    supported.
General Summary for Support of the SPARCstorage Array and VM Software

  * Answer specific questions and assist in troubleshooting specific
    installation issues. Answer specific configuration questions.
    {Will refer a customer to a service provider for installation and
    configuration issues too detailed for telephone support or requiring
    hand-holding assistance.}

  * Answer questions regarding product usage.
    {Refer customers to training or to a service provider for issues too
    detailed for telephone support or requiring hand-holding assistance.}

  * Assist customer in debugging and troubleshooting problems specific to
    SUN systems and SUN products.
    {We cannot guarantee a solution for problems with non-Sun products, and
    will generally refer a customer to a service provider for these
    issues.}

  * File bug reports and escalate issues to engineering for product defects
    to provide a fix or workaround. We may require the customer to supply a
    code sample of 100 lines or less of a reproducible case of the defect.
    {Warranty calls will not be escalated through CTE.}

  * For performance and third party issues, will provide customer with
    general how-to-debug information, and refer them to an appropriate
    service provider.

The service providers are not limited to, but may include: ITOPS, Sun
Integration, Local Sun office (SSE), Third party equipment or software
Vendor, Sun Education Services, or NASC T&M assistance.

------------------------------------------------------------------------------
9.0 Additional support information

  Veritas    415-335-8000
             800-258-8649

  Raid-5 configuration information:
        http://www.sun.com/sunworldonline/swol-09-1995/swol-09-raid5.html

9.1 Veritas Tech Note 8889 (1995) Using Prestoserve with Volume Manager

Prestoserve is a Sun Microsystems product designed to accelerate
performance of filesystems, particularly when used on a server for NFS
advertised filesystems. This is accomplished via the use of NVRAM hardware
and the Prestoserve drivers.
The hardware provides a fast, non-volatile solid-state writeback cache that
can cause writes to a disk device to be returned to the user as completed
before the data reaches the disk.

This mechanism can be configured to work below VxVM as a direct replacement
for the disk device that VxVM uses. This approach presents no particular
problems for VxVM, which remains unaware of the underlying cache device. In
the event of a failure of the NVRAM devices, however, it is possible to
lose data since the disks backing the NVRAM may not be up-to-date.

Prestoserve can, however, be configured to run above VxVM in such a way
that VxVM replaces the disks that Prestoserve controls. In this situation,
VxVM has a number of problems to address.

One problem is Prestoserve's use of disk devices. Some applications
(including Prestoserve) maintain device numbers between reboots. VxVM
attempts to maintain device numbers between reboots, but if a different
combination of disk groups is imported it is possible for a conflict of
minor numbers to be detected. In this case, the later import will have
conflicting devices renumbered to a new minor number range. The GA load of
this product will provide a new mechanism for setting minor number ranges
for a disk group, which will provide the user a mechanism to reliably avoid
this problem.

The danger of VxVM changing its device numbers on a reboot following a
system failure is that Prestoserve may flush its dirty buffers to the wrong
volume devices. This can have destructive results.

To avoid this problem, either use Prestoserve only for volumes that are in
the rootdg disk group, or strictly control the set of disk groups imported
on different hosts.

Unfortunately there is, as yet, no mechanism to enforce this ordering for
any disk groups that are automatically imported.
However, if all disk groups have at least one configuration copy that is
readable and if no collisions are detected between the disk groups, this
operation will work.

Another problem is with the start up of Prestoserve. Following a system
failure, the Prestoserve drivers will flush all outstanding dirty buffers
to disk. If this flush request occurs before the VxVM drivers have been
loaded into the kernel and before the volume devices can be started and
made available for use, then Prestoserve's attempts at flushing to the
volumes will fail.

Warning: This problem could lead to data loss.

To prevent this situation, it is recommended that the order of the starting
of Prestoserve with respect to the volumes be altered so that Prestoserve
starts after the volumes have been started. To achieve this result perform
the following steps:

  1. Edit the /etc/system file. Add the line:

        exclude: drv/pr

     This keeps the Prestoserve driver from being loaded (and starting its
     flush operation) during the normal boot sequence, so that it can be
     loaded after the volume devices have been started.

  2. Edit the /etc/init.d/vxvm-startup2 file and add the following lines to
     the end of the file:

        modload /kernel/drv/pr
        presto -p > /dev/null

     This will cause a load of the Prestoserve driver following the start
     of the VxVM daemon.

------------------------------------------------------------------------------
10.0 WARRANTY

WARRANTY: if we include all this info in product shipments, and explain
that warranty will only receive this info as help, it may greatly reduce
our warranty calls. Other than what would be here, they can phone their
local Sun office for further assistance.

===============================================================================
11.0 SSA Support Matrix (Jan. 1997)

Official SSA Software/Firmware Configuration Matrix   Rev 0.9.12  15 Jan 97

The following table shows the most recent levels of OS, software and
firmware for the SSA.

Table 1:

  Solaris       SSA Software (b)     SSA Firmware   SOCHA Fcode
  -----------   ------------------   ------------   ---------------------
  2.3 (c)       103351-02 (e)        3.6 (i)        1.18/1.33/1.52 (a)(h)
                103479-02 (e)(f)
  2.4 (a)(d)    103290-04            3.9            1.18/1.33/1.52 (a)(h)
  2.5           103017-05            3.9            1.18/1.33/1.52 (a)(h)
  2.5.1         103766-02            3.9            1.18/1.33/1.52 (a)(h)

The following table shows the OS and the most recent levels of disk
management software for the SSA.

Table 2:

  Solaris    Vm CD Release   Vm Patch    Solstice DiskSuite   SDS Patch (j)
  --------   -------------   ---------   ------------------   -------------
  2.3 (c)    2.1             102403-04   4.0                  102580-13
             2.1.1           NONE        4.1                  103421-01 (g)
             2.3             NONE
  2.4        2.1             102403-04   4.0                  102580-13
             2.1.1           NONE        4.1                  103421-01 (g)
             2.3             NONE
  2.5        2.1.1           103367-03   4.0                  102580-13
             2.3             NONE        4.1                  103421-01 (g)
  2.5.1      2.1.1           103367-03   4.0                  102580-13
             2.3             NONE        4.1                  103421-01 (g)

  (a) Neither Solaris 2.3, Solaris 2.4 11/94 nor the 1.18 Fcode supports
      booting from the SSA.
  (b) 101765-02, for Solaris 2.3 and SSA2x0 with more than 32 disks.
      102283-01, for Solaris 2.4 and SSA2x0 with more than 32 disks.
      102446-01, sd/ssd disk format patch for Solaris 2.4.
  (c) Does not support FASTWRITE.
  (d) Solaris 2.4 11/94 and 2.4 3/95 now use the same SSA Jumbo Patch.
  (e) The two patches 103351-02 and 103479-01 are the Solaris 2.3
      equivalent of the Solaris 2.4 patch 103290-02 and Solaris 2.5 patch
      103017-04. 103351-02 contains the PLN driver, SOC driver, and 3.6
      firmware, while 103479-01 contains the SD driver. It is recommended
      that they be implemented together.
  (f) This is an SD point patch and will be replaced by a Solaris 2.3 patch
      in the future.
  (g) Patch 103421-01 is a diagnostic utility to check for damaged parity
      on SDS RAID5 volumes. Patch applies only to SDS 4.0.
  (h) The 1.52 Fcode presently is only available by the replacement of the
      present SOCHA SBus card with the latest SOCHA SBus card
      (501-2069-09).
  (i) A patch for Solaris 2.3 support with the SSA 3.9 firmware has not
      been produced and tested.
  (j) Only SDS 4.0 has patches; there are none for 4.1 yet.

===============================================================================
11.1 SPARCstorage Array Software Configuration Guide (1996)
________________________________________________________________________

SPARCstorage Array Software Configuration Matrix                  Page 1
Version 4.1, 29 February 1996

INTRODUCTION

This document has two parts: a functionality table to help you decide which
version of Solaris will best meet your SPARCstorage Array (SSA) needs, and
four different matrices of supported SSA software configurations, one for
each supported version of Solaris. Once you have used the functionality
table below to choose your preferred version of Solaris, you can go
straight to the pages indicated. It is not necessary to refer to the other
matrices in the document. We hope you find the document useful.
________________________________________________________________________

SPARCstorage Array Functionality Support Table

INSTRUCTIONS:

  1. Choose the functionality you need.
  2. Find the version of Solaris 2.x which supports that functionality
     (newer is better).
  3. Look up the specific SSA Software Configuration Matrix for the desired
     Solaris 2.x version on the page number indicated in the table below.
                       |Solaris 2.5|Solaris 2.4|Solaris 2.4|Solaris 2.3|
Functionality          | HW 11/95  |  HW 3/95  | HW 11/94  |           |
-----------------------|-----------|-----------|-----------|-----------|
Matrix page numbers    |     2     |    3,4    |    5,6    |    7,8    |
-----------------------|-----------|-----------|-----------|-----------|
SSA Model 101,102,200  |    Yes    |    Yes    |    Yes    |    Yes    |
-----------------------|-----------|-----------|-----------|-----------|
SSA Model 112,210      |    Yes    |    Yes    |    Yes    |   Yes**   |
-----------------------|-----------|-----------|-----------|-----------|
RAID 5                 |    Yes    |    Yes    |    Yes    |    Yes    |
-----------------------|-----------|-----------|-----------|-----------|
NVRAM fast-write       |    Yes    |    Yes    |    Yes    |    No     |
-----------------------|-----------|-----------|-----------|-----------|
Bootability            |    Yes    |    Yes    |    No     |    No     |
-----------------------|-----------|-----------|-----------|-----------|
Kernel Asynch. I/O     |    Yes    |    Yes    |    Yes    |    No     |
-----------------------|-----------|-----------|-----------|-----------|
SunSoft Solstice       |    Yes    |    Yes    |    Yes    |   Yes*    |
Disksuite (SDS) 4.0    |           |           |           |           |
-----------------------|-----------|-----------|-----------|-----------|
Veritas Volume Manager |    Yes    |    Yes    |    Yes    |    Yes    |
(VxVM) 2.1.1           |           |           |           |           |
-----------------------|-----------|-----------|-----------|-----------|
Veritas VxVM 2.1       |    No     |    Yes    |    Yes    |    Yes    |
-----------------------|-----------|-----------|-----------|-----------|

  *  Some features not supported.
  ** Requires patch due to be released 31 March 1996.
________________________________________________________________________

SPARCstorage Array Software Configuration Matrix                  Page 2
for Solaris 2.5 Hardware: 11/95
Version 4.1, 29 February 1996

GENERAL INFORMATION:

As a general rule of thumb, we strongly recommend you get the latest jumbo
patches to ensure you have correct functionality support, plus the latest
bug fixes.

REQUIRED COMPONENTS:

    Solaris 2.5 Hardware: 11/95
    patch 103017-02 or later

OPTIONAL COMPONENTS:

    Solstice DiskSuite (SDS) 4.0
        patch 102580-xx for SDS 4.0
    Veritas Volume Manager (VxVM) 2.1.1

CONFIGURATION MATRIX

SSA System Software is part of Solaris 2.5 Hardware: 11/95, including
support for Models 101/102/112/200/210.

For SSA installation/upgrade instructions, see:
 - SMCC SPARC(tm) Hardware Platform Guide Solaris(tm) 2.5 (802-3697-10)
 - SPARCstorage Array Software and Volume Manager 2.1.1 Product Note
   (802-5314-10) (if using VxVM)

Any combination of software, firmware, and FCode not shown in the table
is not supported.

   ssa    |   FC/S   |   Model   | Model |         |         |           |
 firmware |  FCode   |101,102,200|112,210|   ODS   |   SDS   |   VxVM    |
----------|----------|-----------|-------|---------|---------|-----------|
 3.4 (e)  | 1.18 (a) |     Y     |   Y   | 3.0 (b) | 4.0 (c) | 2.1.1 (d) |
 3.4 (e)  | 1.33     |     Y     |   Y   | 3.0 (b) | 4.0 (c) | 2.1.1 (d) |
----------|----------|-----------|-------|---------|---------|-----------|
 2.4      | 1.18 (a) |     Y     |   N   | 3.0 (b) | 4.0 (c) | 2.1.1 (d) |
 2.4      | 1.33     |     Y     |   N   | 3.0 (b) | 4.0 (c) | 2.1.1 (d) |
----------|----------|-----------|-------|---------|---------|-----------|

NOTES:

a. Does not support bootability; see the SSA install/upgrade
   instructions to determine the FC/S FCode rev, and how to upgrade it,
   if necessary.
b. Does not support KAIO or RAID 5.
c. Requires patch 102580-01 or later to support KAIO.
d. VxVM 1.x, 2.0 and 2.1 are not supported on Solaris 2.5.
e. Requires patch 103017-02 or later.
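
The rule that any combination not shown in the table is unsupported can
be expressed as a simple lookup. The following Python is a minimal
sketch for the Solaris 2.5 Hardware: 11/95 matrix only. It checks the
ssafirmware, FC/S FCode and array model columns and reports notes a and
e; the ODS, SDS and VxVM columns are identical in every row and are left
out. The names SOLARIS_25_MATRIX and check_solaris_25 are hypothetical
and chosen for illustration.

```python
# (ssafirmware, FC/S FCode) -> True if Models 112/210 are marked "Y".
# Models 101/102/200 are marked "Y" in every row of the matrix.
SOLARIS_25_MATRIX = {
    ("3.4", "1.18"): True,
    ("3.4", "1.33"): True,
    ("2.4", "1.18"): False,
    ("2.4", "1.33"): False,
}

MODELS_ALL_ROWS = {"101", "102", "200"}
MODELS_NEWER = {"112", "210"}


def check_solaris_25(firmware, fcode, model):
    """Return (supported, notes) for one array on Solaris 2.5 HW 11/95."""
    key = (firmware, fcode)
    if key not in SOLARIS_25_MATRIX:
        return False, ["combination not shown in the matrix"]
    if model in MODELS_NEWER:
        if not SOLARIS_25_MATRIX[key]:
            return False, ["Models 112/210 are marked N for this row"]
    elif model not in MODELS_ALL_ROWS:
        return False, ["model not listed in the matrix"]

    notes = []
    if fcode == "1.18":
        notes.append("note a: FCode 1.18 does not support bootability")
    if firmware == "3.4":
        notes.append("note e: requires patch 103017-02 or later")
    return True, notes


if __name__ == "__main__":
    for args in [("3.4", "1.33", "112"), ("2.4", "1.18", "210")]:
        print(args, check_solaris_25(*args))
```

The same structure could hold the Solaris 2.4 and 2.3 matrices, but
their rows and notes differ and would need to be transcribed separately.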

________________________________________________________________________
SPARCstorage Array Software Configuration Matrix                  Page 3
for Solaris 2.4 Hardware: 3/95
Version 4.1, 29 February 1996

GENERAL INFORMATION:

SSA System Software is part of Solaris 2.4 Hardware: 3/95.

For SSA installation/upgrade instructions, see:
 - 2.4 HW 3/95 Hardware Platform Guide (802-2966-10)
 - SPARCstorage Array Product Note (802-2043-10)
 - SPARCstorage Array Software and Volume Manager 2.1.1 Product Note
   (802-5314-10) (if using VxVM 2.1.1)
 - SPARCstorage Array Software and Volume Manager 2.1 Product Note
   (804-4996-10) (only if using VxVM 2.1)

As a general rule of thumb, we strongly recommend you get the latest
jumbo patches to ensure you have correct functionality support, plus
the latest bug fixes.

REQUIRED COMPONENTS:

    Solaris 2.4 Hardware: 3/95
        (Solaris CD, ignore the SSA patches on the Updates CD)
    SSA jumbo patch 102432-08 (or later) for Solaris 2.4 HW 3/95
    patch 102446-01 (format pgm confused by sd and ssd)
    patch 102283-01 (disks pgm support for more than 32 disks,
        for Model 200/210)

OPTIONAL COMPONENTS:

    Solstice DiskSuite (SDS) 4.0
        patch 102580-xx for SDS 4.0
    Online DiskSuite (ODS) 3.0
    Veritas Volume Manager (VxVM) 2.1.1
    Veritas Volume Manager (VxVM) 2.1
        patch 102403-xx for VxVM 2.1

(Software configuration matrix on next page.)

________________________________________________________________________
SPARCstorage Array Software Configuration Matrix                  Page 4
for Solaris 2.4 Hardware: 3/95
Version 4.1, 29 February 1996

Any combination of software, firmware, and FCode not shown in the table
is not supported.

   ssa   |   FC/S   |   Model   | Model |         |         |          |
 firmware|  FCode   |101,102,200|112,210|   ODS   |   SDS   |   VxVM   |
---------|----------|-----------|-------|---------|---------|----------|
 3.4 (a) | 1.18 (c) |     Y     |   Y   | 3.0 (d) | 4.0 (e) | 2.1.1    |
 3.4 (a) | 1.33     |     Y     |   Y   | 3.0 (d) | 4.0 (e) | 2.1.1    |
---------|----------|-----------|-------|---------|---------|----------|
 2.4 (b) | 1.18 (c) |     Y     |   N   | 3.0 (d) | 4.0 (e) | 2.1.1    |
 2.4 (b) | 1.33     |     Y     |   N   | 3.0 (d) | 4.0 (e) | 2.1.1    |
---------|----------|-----------|-------|---------|---------|----------|
 3.4 (a) | 1.18 (c) |     Y     |   N   |         |         | 2.1 (f)  |
 3.4 (a) | 1.33     |     Y     |   N   |         |         | 2.1 (f)  |
---------|----------|-----------|-------|---------|---------|----------|
 2.4 (b) | 1.18 (c) |     Y     |   N   |         |         | 2.1 (f)  |
 2.4 (b) | 1.33     |     Y     |   N   |         |         | 2.1 (f)  |
---------|----------|-----------|-------|---------|---------|----------|

a. ssafirmware 3.4 is on patch 102432-10 (or later). See patch README
   to determine current ssafirmware rev level and how to download new
   ssafirmware, if necessary.
b. ssafirmware 2.4 is on patch 102432-08. See patch README to determine
   current ssafirmware rev level and how to download new ssafirmware,
   if necessary.
c. Does not support bootability.
d. Does not support KAIO or RAID 5.
e. Requires patch 102580-01 or later to support KAIO.
f. Requires patch 102403-01 or later to support KAIO.

________________________________________________________________________
SPARCstorage Array Software Configuration Matrix                  Page 5
for Solaris 2.4 Hardware: 11/94
Version 4.1, 29 February 1996

GENERAL INFORMATION:

SPARCstorage Array System Software is not part of Solaris 2.4
Hardware: 11/94, nor is it part of any currently supported unbundled
software release. Therefore we recommend that only SPARCstorage Array
upgrade customers already using Solaris 2.4 Hardware: 11/94 remain on
this release. We strongly recommend that customers with new
installations use Solaris 2.4 Hardware: 3/95 instead.

For SSA upgrade instructions, see:
 - SPARCstorage Array Product Note (802-2043-10)
 - SPARCstorage Array Software and Volume Manager 2.1.1 Product Note
   (802-5314-10) (if using VxVM 2.1.1)
 - SPARCstorage Array Software and Volume Manager 2.1 Product Note
   (804-4996-10) (only if using VxVM 2.1)

As a general rule of thumb, we strongly recommend you get the latest
jumbo patches to ensure you have correct functionality support, plus
the latest bug fixes.

REQUIRED COMPONENTS:

    Solaris 2.4 Hardware: 11/94
        (Solaris CD, no SSA software on the Updates CD)
    SSA jumbo patch 102347-08 (or later) for Solaris 2.4 HW 11/94
    patch 102446-01 (format pgm confused by sd and ssd)
    patch 102283-01 (disks pgm support for more than 32 disks,
        for Model 200/210)

OPTIONAL COMPONENTS:

    Solstice DiskSuite (SDS) 4.0
        patch 102580-xx for SDS 4.0
    Online DiskSuite (ODS) 3.0
    Veritas Volume Manager (VxVM) 2.1.1
    Veritas Volume Manager (VxVM) 2.1
        patch 102403-xx for VxVM 2.1

(Software configuration matrix on next page.)

________________________________________________________________________
SPARCstorage Array Software Configuration Matrix                  Page 6
for Solaris 2.4 Hardware: 11/94
Version 4.1, 29 February 1996

Any combination of software, firmware, and FCode not shown in the table
is not supported.

   ssa   |   FC/S   |   Model   | Model |         |         |          |
 firmware|  FCode   |101,102,200|112,210|   ODS   |   SDS   |   VxVM   |
---------|----------|-----------|-------|---------|---------|----------|
 3.4 (a) | 1.18 (c) |     Y     |   Y   | 3.0 (d) | 4.0 (e) | 2.1.1    |
 3.4 (a) | 1.33 (c) |     Y     |   Y   | 3.0 (d) | 4.0 (e) | 2.1.1    |
---------|----------|-----------|-------|---------|---------|----------|
 2.4 (b) | 1.18 (c) |     Y     |   N   | 3.0 (d) | 4.0 (e) | 2.1.1    |
 2.4 (b) | 1.33 (c) |     Y     |   N   | 3.0 (d) | 4.0 (e) | 2.1.1    |
---------|----------|-----------|-------|---------|---------|----------|
 3.4 (a) | 1.18 (c) |     Y     |   N   |         |         | 2.1 (f)  |
 3.4 (a) | 1.33 (c) |     Y     |   N   |         |         | 2.1 (f)  |
---------|----------|-----------|-------|---------|---------|----------|
 2.4 (b) | 1.18 (c) |     Y     |   N   |         |         | 2.1 (f)  |
 2.4 (b) | 1.33 (c) |     Y     |   N   |         |         | 2.1 (f)  |
---------|----------|-----------|-------|---------|---------|----------|

a. ssafirmware 3.4 is on patch 102347-10 (or later). See patch README
   to determine current ssafirmware rev level and how to download new
   ssafirmware, if necessary.
b. ssafirmware 2.4 is on patch 102347-08. See patch README to determine
   current ssafirmware rev level and how to download new ssafirmware,
   if necessary.
c. Bootability is not supported on Solaris 2.4 HW 11/94 regardless of
   the FCode used.
d. Does not support KAIO or RAID 5.
e. Requires patch 102580-01 or later to support KAIO.
f. Requires patch 102403-01 or later to support KAIO.

________________________________________________________________________
SPARCstorage Array Software Configuration Matrix                  Page 7
for Solaris 2.3
Version 4.1, 29 February 1996

GENERAL INFORMATION:

SPARCstorage Array system software for Solaris 2.3 is part of the
unbundled SPARCstorage Array Software and Volume Manager product. You
will need this product in order to install SPARCstorage Array system
software, even if you do not wish to use the optional Volume Manager.
SPARCstorage Array system software is bundled as part of newer Solaris
releases.

For SSA installation/upgrade instructions, see:
 - SPARCstorage Array Product Note (802-2043-10)
 - SPARCstorage Array Software and Volume Manager 2.1.1 Product Note
   (802-5314-10)
   or
   SPARCstorage Array Software and Volume Manager 2.1 Product Note
   (804-4996-10)

As a general rule of thumb, we strongly recommend that you get the
latest jumbo patches to ensure you have correct functionality support,
plus the latest bug fixes.

We strongly recommend you use Solaris 2.4!!!

REQUIRED COMPONENTS:

    Solaris 2.3
        kernel jumbo -54, -68, or later
    SPARCstorage Array Software and Volume Manager 2.1.1 (or 2.1)
        102465-03 (or later) for SPARCstorage software 2.1.1 (or 2.1)
    patch 101765-02 (disks pgm support for more than 32 disks,
        for Model 200/210)

OPTIONAL COMPONENTS:

    Solstice DiskSuite (SDS) 4.0
        patch 102580-xx for SDS 4.0
    Online DiskSuite (ODS) 3.0
    Veritas Volume Manager (VxVM) 2.1.1
    Veritas Volume Manager (VxVM) 2.1
        patch 102403-xx for VxVM 2.1

(Software configuration matrix on next page.)

________________________________________________________________________
SPARCstorage Array Software Configuration Matrix                  Page 8
for Solaris 2.3
Version 4.1, 29 February 1996

Any combination of software, firmware, and FCode not shown in the table
is not supported.

   ssa   |   FC/S   |   Model   | Model |         |         |          |
 firmware|  FCode   |101,102,200|112,210|   ODS   |   SDS   |   VxVM   |
---------|----------|-----------|-------|---------|---------|----------|
 3.4 (a) | 1.18 (c) |     Y     |   Y   | 3.0 (d) | 4.0 (e) | 2.1.1 (e)|
 3.4 (a) | 1.33 (c) |     Y     |   Y   | 3.0 (d) | 4.0 (e) | 2.1.1 (e)|
---------|----------|-----------|-------|---------|---------|----------|
 1.12(b) | 1.18 (c) |     Y     |   N   | 3.0 (d) | 4.0 (e) | 2.1.1 (e)|
 1.12(b) | 1.33 (c) |     Y     |   N   | 3.0 (d) | 4.0 (e) | 2.1.1 (e)|
---------|----------|-----------|-------|---------|---------|----------|
 3.4 (a) | 1.18 (c) |     Y     |   N   |         |         | 2.1 (e)  |
 3.4 (a) | 1.33 (c) |     Y     |   N   |         |         | 2.1 (e)  |
---------|----------|-----------|-------|---------|---------|----------|
 1.12(b) | 1.18 (c) |     Y     |   N   |         |         | 2.1 (e)  |
 1.12(b) | 1.33 (c) |     Y     |   N   |         |         | 2.1 (e)  |
---------|----------|-----------|-------|---------|---------|----------|

a. ssafirmware 3.4 will be on a future revision of patch 102465, due
   31 March 1996.
b. ssafirmware 1.12 is on patch 102465-02. See patch README to
   determine current ssafirmware rev level and how to download new
   ssafirmware, if necessary.
c. Bootability is not supported on Solaris 2.3 regardless of the FCode
   used.
d. Does not support KAIO or RAID 5.
e. KAIO not supported on Solaris 2.3.

===============================================================================

SOLUTION SUMMARY:

PRODUCT AREA: SunOS Unbundled
PRODUCT: Veritas Volume Manager
SUNOS RELEASE: any
HARDWARE: SPARCstorage Array