INFODOC ID: 13420

SYNOPSIS: SPARCstorage Array and Volume Manager Support Document
DETAIL DESCRIPTION:

 
==========================================================================
		SPARC Storage Array and Veritas Software

			TABLE OF CONTENTS

1.0  About the SSA
1.1	Terms to be Aware of

2.0  Debugging Techniques
2.1  Diagnostic Switch on the Array Controller
2.2  POST Errors
2.2.1	OBP Info
2.3  Messages File Information
2.3.1	Boot Cycle and SSA Error Recovery
2.3.2	Reasons for OFFLINE/ONLINE Messages
2.3.2.1   Common Causes for these messages
2.4  COMMANDS to Use and What to Look For
2.4.1   SSAADM
2.4.2   PRTVTOC
2.4.3   Veritas GUI Views
2.4.4   VX Commands
2.4.5   VXDISK
2.4.6   VXPRINT
2.4.7   VXPRIVUTIL
2.4.8   VXSTAT
2.4.9   VXINFO
2.5  Understanding Plex States
2.6  Understanding Volume States  
2.7  System Does Not See the SSA
2.8  System Cannot Run the vxconfigd, Volume Manager won't start
2.9  Miscellaneous Statistical Commands

3.0  Quick Fixes, How-To's and Theory of Operations
3.1  Bootable Disk Encapsulation and Mirroring - How-to
3.1a  How to 'unroot' your Bootable Disk
3.2  Volume Manager is not seeing a disk device any longer
3.3  Software and Firmware Differences, What You Need to Know
3.4  Hot Spares How They Work
3.5  WWN (World Wide Number) and How to Change it
3.5a  WWN How to Use the New One
3.6  Use of 'vxdiskadm' Utility
3.7  Remove/Replace a Disk
3.8  Boot Issues
3.9  SSA fast_write feature versus PrestoServe
3.9.1  Prestoserve with host-based RAID (SDS/VxVM) products
3.10  RAID-5 Information
3.11  ROOTDG Recovery
3.12  Loss of Disk Group Configuration Information; How to Save the Info
3.13  IOSTAT Output; How to find the listed "ssd" device it reports
3.14  Dual Hosted SSAs and Simulation of a "failover" on dual-host
3.15  Moving an SSA to another system.
3.16  Fail-Over Simulation for testing hot spares with Mirrors and Raid-5

4.0  FAQ's

5.0  Patch information
5.1  SPARCstorage Array, Veritas Volume Manager PATCHLIST

6.0  Bugs and RFE's - Known Problems
6.1  Downloading SSA FW and Resetting SSA has created some malfunctioning 
       controllers
6.2  Prestoserve Causes Corruption with SSA, Volume Manager and RAID5
6.3  VXVA Core Dumps when Reconnecting a Disk
6.4  Documentation Error with Prestoserve and Volume Manager Booting Single-User


7.0  All other References Available
7.1  FRU part numbers
7.2  Documentation Part Numbers


8.0  Supportability from NASC
 
9.0  Additional Support Information
9.1  Veritas tech Bulletin: Prestoserve and Veritas Volume configuration

10.0  WARRANTY Information

11.0   SSA Support Matrix (Jan. 1997)
11.1  SPARCstorage Array Software Configuration Guide (Version 4.1 1996)


===============================================================================

		SPARC Storage Array and Veritas Software

The focus of this document is information related to the
SPARCstorage Array (SSA) and the use of the Veritas 
Volume Manager software.  DiskSuite users may find the SSA information 
very beneficial, but should ignore all references to the Volume Manager (VM)
and instead refer to their DiskSuite manuals.

1.0  	About the SPARC Storage Array

The SSA (SPARCstorage Array) is a 'disk farm'.  It can hold up to 36 disk
devices, depending on the model: the "100" series can contain up to 30 disk
devices, and the "200" series can contain up to 36 disk devices.

A controller interface communicates with the host system over a fiber
optic cable.  To take advantage of this high-bandwidth link between the SSA
and the host, we have placed 6 SCSI controllers on the array controller
board.

This device gives you a huge storage capacity in a small space.  


1.1	Some TERMS to be aware of:

disk		Physical or logical (virtual) disk device.

DiskSuite/ODS/SDS
		Sun GUI based utility to allow "virtual" devices and ease of
		administering many devices.

encapsulate	Place (device) under control of the Veritas Volume Manager 
		software application.

plex		An ordered collection of subdisks which are used to build 
		virtual devices.

pln		SparcStorage Array controller board.	

RAID		Redundant array of independent disks.

soc		Host adapter for the fiber channel cable to an SSA.

SSA		Physical device that houses multiple disk devices.

SSA drivers	Software drivers for the SSA box and disks.

subdisk		A part of a disk.  In Veritas Volume Manager this is a logical 
		or virtual sub-section of a disk.

volume		A virtual device that can be accessed by a host system.

Volume Manager, vx, vxvm, vxva
		Veritas Volume Manager software - a GUI based utility used for
		ease of administering many disk devices.  It allows the use of
		"virtual" devices comprised of multiple or single physical
		disk devices.

------------------------------------------------------------------------------

2.0   Debugging Techniques 

2.1.  Diag button on the array controller card.
      This will give you some hardware diagnostics on the array controller.
2.2.  POST (Power On Self Test) error codes:
	CODE:		MEANING:		ACTION:
	01		LCD Failure		Replace fan tray
	08		Fan failure		Replace fan tray
	09		Power supply failure	Replace power supply
	30		Battery failure		Replace battery module
	All others	Controller failure	Replace controller
	PMF		Firmware failure	Replace controller

(With 3.x firmware, once all the disks are spun up, what shows on the front
panel is all the information you get.)
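The POST table above can also be kept as a small lookup, for example in a site script that records what an operator reads off the front panel. A minimal Python sketch follows; the names POST_CODES, DEFAULT and post_action() are hypothetical, and any code not listed falls through to the table's "All others" row.

```python
# Hypothetical lookup built from the SSA POST error-code table above.
POST_CODES = {
    "01": ("LCD failure", "Replace fan tray"),
    "08": ("Fan failure", "Replace fan tray"),
    "09": ("Power supply failure", "Replace power supply"),
    "30": ("Battery failure", "Replace battery module"),
    "PMF": ("Firmware failure", "Replace controller"),
}

# The table's "All others" row.
DEFAULT = ("Controller failure", "Replace controller")


def post_action(code):
    """Return (meaning, action) for a front-panel POST code."""
    return POST_CODES.get(code.strip().upper(), DEFAULT)


if __name__ == "__main__":
    for code in ("08", "30", "pmf", "42"):
        meaning, action = post_action(code)
        print(f"{code:>4}: {meaning} -> {action}")
```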

2.2.1  OBP Info (Open Boot Prom)

There is not much you can tell from the Prom prompt where an SSA is concerned.
The 'show-devs' command displays the soc and pln.

What is displayed depends on the version of FCode running on your host
adapter (soc) board:
	1.18: the OBP uses "dummy" WWN addresses.
	1.33: only shows the 'soc@n,n' - this is desirable for booting!

To determine which FCode version(s) you have, use the following procedure:
	ok  setenv fcode-debug? true
	ok  reset
	ok  show-devs

(show-devs output will look something like)
.
.
/iommu@0,10000000/sbus@0,10001000/le@1,c00000
/iommu@0,10000000/sbus@0,10001000/SUNW,soc@0,0
/iommu@0,10000000/sbus@0,10001000/ledma@4,8400010
/iommu@0,10000000/sbus@0,10001000/SUNW,bpp@4,c800000
/iommu@0,10000000/sbus@0,10001000/espdma@4,8400000
/iommu@0,10000000/sbus@0,10001000/SUNW,DBRIe@2,10000/mmcodec
/iommu@0,10000000/sbus@0,10001000/SUNW,soc@0,0/SUNW,pln@a0000800,201cac11
/iommu@0,10000000/sbus@0,10001000/SUNW,soc@0,0/SUNW,pln@a0000800,201cac11/SUNW,ssd

Find the lines for the 'soc' board.  To determine the FCode version of any
'soc' card, cd to that card's node and issue the sccsid command:
	ok  cd <short-path-to-the-soc>
	    (for example:  cd /iommu@0,10000000/sbus@0,10001000/SUNW,soc@0,0)
	ok  sccsid type
returns <version> <date>, for example:  1.33 95/04/19
	ok  device-end

You can use this to look at all the 'soc' boards you have.
When you are done, set the variable back:
	ok  setenv fcode-debug? false
	ok  reset
	ok
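On a host with several 'soc' boards, picking the soc nodes out of a long show-devs listing by eye is error-prone. If you have the show-devs output captured as text (for example from a serial console log, which is an assumption here), a Python sketch like the following filters out the soc nodes and builds the cd / sccsid type / device-end sequence for each one. soc_paths() and sccsid_commands() are hypothetical helpers, not OBP words.

```python
import re

# Matches a show-devs line whose last node is a SUNW,soc host adapter.
SOC_LINE = re.compile(r".*/SUNW,soc@[0-9a-f]+,[0-9a-f]+")


def soc_paths(show_devs_output):
    """Return every soc host-adapter path found in 'show-devs' output."""
    return [line.strip() for line in show_devs_output.splitlines()
            if SOC_LINE.fullmatch(line.strip())]


def sccsid_commands(paths):
    """Build the OBP command sequence from the procedure above for each soc path."""
    cmds = []
    for path in paths:
        cmds += [f"cd {path}", "sccsid type", "device-end"]
    return cmds


if __name__ == "__main__":
    sample = """\
/iommu@0,10000000/sbus@0,10001000/le@1,c00000
/iommu@0,10000000/sbus@0,10001000/SUNW,soc@0,0
/iommu@0,10000000/sbus@0,10001000/SUNW,soc@0,0/SUNW,pln@a0000800,201cac11
"""
    for cmd in sccsid_commands(soc_paths(sample)):
        print("ok ", cmd)
```

Child nodes such as the pln and ssd entries are skipped because the pattern must end at the soc unit address.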


2.3. MESSAGES FILE:

The device name is actually a pseudo name used by the kernel.  We reference a 
disk slice to mount as a filesystem by saying:

	mount [options] /dev/dsk/c0t3d0s0 <mountpoint>
 
If you do a long listing of '/dev/dsk/c0...' you will notice that they are
all links to a much longer directory structure.  This actually points to the 
physical device (eventually).  

To be able to understand the messages information, you must first 
understand the structure of the addressing of the device(s).
(Your system's directory structure may be different, this is only an example.  
They will all begin in the '/devices' directory, however.)

Here is the first disk in 'our' SSA:
c1t0d0s0 -> /devices/iommu@f,.../sbus@f,.../SUNW,soc@1,0/SUNW,pln@a0000000,7537b7/ssd@0,0:a
                     ^^^^^^^^^^^                 *** &                |||| ||||||     +,+

^^^^^^  = CPU
***   	= SSA sbus host adapter card  (i.e.: first is c1...)
&	= sbus slot plugged into
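||||||	= WWN of the SSA controller board: the digits to the right of the
	  comma are the low-order digits of its 12-digit WWN, with leading
	  zeros dropped (7537b7 here; 'ssaadm display' reports the same value
	  as Serial Num 0000007537B7)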
+,+	= target scsi, disk address   (i.e.:...t0d0)

(On a multi-processor system, the 'iommu' will be 'io-unit' and this will 
indicate which CPU in the system.)

The information we would be interested in from this would be which CPU,
which sbus host-adapter, which SSA; then maybe which disk.

Each listing in the messages file contains the above information.
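As a worked illustration of the path breakdown above, this Python sketch splits an SSA path into its I/O node, SBus slot, controller WWN and (if present) disk target and LUN. It assumes the path alone is passed in (not the whole messages line) and that node names look like the examples in this section; decode_ssa_path() is a hypothetical helper. The WWN is zero-padded to 12 digits so it can be compared with the Serial Num that 'ssaadm display' reports.

```python
import re

# Hypothetical helper: pull the fields described above out of an SSA path
# taken from a /dev/dsk link or from a messages-file line.
SSA_PATH = re.compile(
    r"SUNW,soc@(?P<slot>[0-9a-f]+),[0-9a-f]+"              # soc card; first number = SBus slot
    r"/SUNW,pln@[0-9a-f]+,(?P<wwn>[0-9a-f]+)"              # pln; digits after comma = WWN
    r"(?:/ssd@(?P<target>[0-9a-f]+),(?P<lun>[0-9a-f]+))?"  # optional disk: target,lun
)


def decode_ssa_path(path):
    m = SSA_PATH.search(path)
    if m is None:
        raise ValueError(f"no soc/pln component in {path!r}")
    nodes = [n for n in path.split("/") if n and n != "devices"]
    info = {
        "io_node": nodes[0],                       # iommu@... or io-unit@...
        "sbus_slot": int(m.group("slot"), 16),
        "wwn": m.group("wwn").upper().zfill(12),   # 12-digit form, as ssaadm shows it
    }
    if m.group("target") is not None:
        info["target"] = int(m.group("target"), 16)  # the tN in cXtNdM
        info["lun"] = int(m.group("lun"), 16)        # the dM in cXtNdM
    return info


if __name__ == "__main__":
    p = ("/devices/iommu@f,e0000000/sbus@f,e0001000/SUNW,soc@1,0/"
         "SUNW,pln@a0000000,7537b7/ssd@0,0:a")
    print(decode_ssa_path(p))
```

For the first disk in 'our' SSA, the result is SBus slot 1, WWN 0000007537B7, target 0 and LUN 0, which is the t0d0 in c1t0d0s0.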

When the system boots up, there are many lines indicating whether or not 
the SSA has come up.  A normal boot cycle reflects an online
message and a login message for the fiber channel.  If either one of 
these is missing, the SSA probably is not online.
 
Example: (NOTE this has been formatted to fit 80 column output)
Jan  8 16:35:19 unix unix: SUNW,soc1 at sbus0: SBus slot 1 0x0 and SBus slot 1
 0x10000 and SBus slot 1 0x20000 SBus level 3 sparc ipl 5
Jan  8 16:35:21 unix unix: ID[SUNWssa.soc.link.6010] soc1: port 0: Fibre 
Channel is ONLINE 
Jan  8 16:35:21 unix unix: ID[SUNWssa.soc.login.6010] soc1: Fibre Channel 
login succeeded 
Jan  8 16:35:21 unix unix: ID[SUNWssa.soc.link.1010] soc1: message:  SSA100 
V2.4 (092995) Fri Sep 29 16:20:32 1995   
Jan  8 16:35:21 unix unix: SUNW,pln4 at SUNW,soc1: soc_port 0
Jan  8 16:35:21 unix unix: SUNW,pln4 is 
/iommu@f,e0000000/sbus@f,e0001000/SUNW,soc@1,0/SUNW,pln@a0000000,7537b7
Jan  8 16:35:21 unix unix: ssd90 at SUNW,pln4: target 0 lun 0
Jan  8 16:35:21 unix unix: ssd90 is 
/iommu@f,e0000000/sbus@f,e0001000/SUNW,soc@1,0/SUNW,pln@a0000000,7537b7/ssd@0,0
[... messages about all the disks follows...]

Notice the first line in this group.  It contains information about the 
'soc1' board, which is plugged into SBus slot 1.
Next there is the expected ONLINE message, followed by a successful login 
message.
Next there is a line with the model and firmware version of this controller;
this one happens to be running firmware version 2.4.
The last lines show the array controller reporting that it is connected to 
port 0 on the SBus card and is communicating with the system.  
[**NOTE:**
The SBus host adapter (soc) card is dual-ported, but we do not recommend
using the second port at this time.  If there are two FC/OMs (fiber optic
modules) on one SBus card, the system will usually have IO contention
problems on that card: our system bus does not have enough 'power'
(bandwidth) to drive two fiber channels on one card.  If configured this
way, you will see many OFFLINE/ONLINE messages.  Future architectures may
not have this problem, but the current SPARC architecture does not have
enough power.]

Look for all of the above.  If there is no 'fibre 
channel ONLINE' message, then the SSA has a serious communication problem 
with the system.  This is usually a hardware problem with the fiber channel.
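The check described above (an ONLINE message and a login message for every soc) is easy to automate when there are many arrays. A minimal Python sketch, assuming the message text matches the example above and that each message sits on one line in the real messages file (the example is wrapped only to fit 80 columns); boot_check() is hypothetical.

```python
import re

SOC = re.compile(r"\b(soc\d+)\b")
ONLINE = re.compile(r"\b(soc\d+): port \d+: Fibre Channel is ONLINE")
LOGIN = re.compile(r"\b(soc\d+): Fibre Channel login succeeded")


def boot_check(lines):
    """Map each soc instance mentioned in lines to its ONLINE/login status."""
    seen, online, login = set(), set(), set()
    for line in lines:
        seen.update(SOC.findall(line))
        m = ONLINE.search(line)
        if m:
            online.add(m.group(1))
        m = LOGIN.search(line)
        if m:
            login.add(m.group(1))
    return {soc: {"online": soc in online, "login": soc in login}
            for soc in sorted(seen)}


if __name__ == "__main__":
    import sys
    # Hypothetical usage: python boot_check.py /var/adm/messages
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as f:
            for soc, status in boot_check(f).items():
                if not (status["online"] and status["login"]):
                    print(f"{soc}: ONLINE or login message missing; SSA probably not online")
```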


2.3.1  BOOT CYCLE and SSA ERROR Recovery.

The SSA always goes through these steps in the following order each time the 
soc gets a reset:
	1.  Wait for ONLINE.
	2.  Issue the 'login'.
	3.  Wait for 'login' response.
	    TEST for response - timeout? if YES force OFFLINE.
		              - other error?  if YES go to 2.
	4.  Allow the pln to send commands.
	    TEST - command not completed? if YES, do 'timeout recovery'  
			                  and force OFFLINE.
		 - other errors? if YES, go to 4.  
	       (These get printed in the messages file)
    (While in OFFLINE state, wait for ONLINE.....step 1.)

Notice that the pln prints messages for the errors it gets; however,
retryable errors are not printed, so you see only real errors in the
messages file.
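The recovery cycle above behaves like a small state machine. The Python sketch below models it so the sequence of OFFLINE/ONLINE/login messages can be reasoned about. It is a model of the steps as described here, not the soc or pln driver code, and the event names (online, login_ok, cmd_timeout, and so on) are invented for illustration.

```python
from enum import Enum, auto


class Link(Enum):
    WAIT_ONLINE = auto()   # step 1 (also the OFFLINE state)
    WAIT_LOGIN = auto()    # steps 2-3: login issued, waiting for the response
    RUNNING = auto()       # step 4: the pln may send commands


def replay(events):
    """Replay hypothetical link events through the cycle; return (final state, log)."""
    state, log = Link.WAIT_ONLINE, []
    for ev in events:
        if ev == "offline":                       # link dropped in any state
            log.append("OFFLINE")
            state = Link.WAIT_ONLINE
        elif state is Link.WAIT_ONLINE and ev == "online":
            log.append("ONLINE, login issued")
            state = Link.WAIT_LOGIN
        elif state is Link.WAIT_LOGIN:
            if ev == "login_ok":
                log.append("login succeeded")
                state = Link.RUNNING
            elif ev == "login_timeout":
                log.append("login timed out, forced OFFLINE")
                state = Link.WAIT_ONLINE
            elif ev == "login_error":
                log.append("login error, login reissued")        # back to step 2
        elif state is Link.RUNNING:
            if ev == "cmd_timeout":
                log.append("timeout recovery, forced OFFLINE")
                state = Link.WAIT_ONLINE
            elif ev == "cmd_error":
                log.append("command error printed to messages")  # back to step 4
            # "cmd_ok" leaves the link RUNNING
    return state, log
```

Replaying online, login_ok, cmd_ok, cmd_timeout, online, login_error, login_ok ends in RUNNING after one forced OFFLINE, which is the same shape as the 'Timeout recovery being invoked' followed by OFFLINE, ONLINE and login messages in the examples later in this section.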


2.3.2  Reasons for the OFFLINE/ONLINE Messages.

There are many reasons for these messages to appear on your console, and in 
your messages file.

The most common cause is peak system load: the system is so busy that it
doesn't wait for the SSA to respond, or it can't service the SSA fast
enough.
These messages will show up once in a while, so you may see a few entries,
maybe even a few times per day.  That by itself is normal; it is the
frequency and duration of these messages that may indicate a problem.

You know there is a problem when these messages go on and on for a couple 
of minutes or longer, throughout the day.
If you look at the 'error recovery' cycle steps listed above, you can see 
where these messages come from: once the link is OFFLINE, it immediately
starts waiting for an ONLINE; if it comes ONLINE and then has a problem, it
is forced OFFLINE again and goes back to waiting.
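Since it is frequency and duration that separate a real problem from normal peak-load noise, one way to check is to group OFFLINE messages into bursts. In this Python sketch, messages for the same soc less than 'gap' seconds apart belong to one burst, and bursts lasting at least 'min_duration' seconds are reported. The 60-second gap and 120-second ("a couple of minutes") thresholds are assumptions to tune, the timestamps are assumed to fall within a single day, and offline_bursts() is hypothetical.

```python
import re

TIME = re.compile(r"\b(\d\d):(\d\d):(\d\d)\b")
OFFLINE = re.compile(r"\b(soc\d+): port \d+: Fibre Channel is OFFLINE")


def offline_bursts(lines, gap=60, min_duration=120):
    """Return (soc, start_s, end_s, count) for OFFLINE bursts lasting >= min_duration."""
    times = {}
    for line in lines:
        soc, stamp = OFFLINE.search(line), TIME.search(line)
        if soc and stamp:
            h, m, s = map(int, stamp.groups())
            times.setdefault(soc.group(1), []).append(h * 3600 + m * 60 + s)

    bursts = []
    for name, stamps in sorted(times.items()):
        stamps.sort()
        start = prev = stamps[0]
        count = 0
        for t in stamps:
            if t - prev > gap:                    # quiet period: close the burst
                if prev - start >= min_duration:
                    bursts.append((name, start, prev, count))
                start, count = t, 0
            prev = t
            count += 1
        if prev - start >= min_duration:          # close the last burst
            bursts.append((name, start, prev, count))
    return bursts
```

An occasional isolated OFFLINE never forms a long burst, so it is not reported; a soc that keeps dropping for minutes at a time is.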

Why does this happen?
The system issues a command to the array and expects a response within a
specific time.  If the system does not get a response within that time
(about 60 to 70 seconds), it forces the controller OFFLINE because the
command it issued timed out.

While one command is timing out, all other commands are still being
processed!  There is one exception: a database can also time out (Oracle,
for example, can 'crash' if its command times out), and its process may
die.


  2.3.2.1  Common CAUSES of the OFFLINE problem:

In order of the most likely cause of the majority of Offline/Online
messages:

	Fiber channel hardware.
	   -The FC/OM (optical module) needs to be a '-03'(or higher) at the 
		end of the part number.  The lower revisions had problems.
		Check both ends of the cable.
	   -Fiber cable could be dirty, or have a loose connection.  
		(Be sure to always use the end caps to protect the cable if
		it is being moved.)
	   -Fiber cable bent too sharply, or broken by someone standing
		or stepping on it.
	 
	The Array Controller card.
	   -Firmware version is a big issue here.  If the controller is
		already running a current version of firmware, there is a
		possibility of a faulty board.

	Software drivers and/or firmware

	System IO load balance.
	   -We have had a couple of instances where the system was running
		too many memory-intensive applications, causing the SSA to
		'wait' for CPU time.  This can be 'fine-tuned' by distributing
		the IO load in larger systems, or even by adding enough memory
		for all the applications to run without stepping on each
		other.  [For assistance in this area, please contact your
		local sales representative.]

How does one know which of the above is the source of the problem?
How can you tell whether these messages are coming from software,
firmware, or the Fiber channel hardware?

Check the system's Messages.

In order to be able to troubleshoot the problem further, you must look
at the messages.  What is preceding the first OFFLINE, or subsequent
OFFLINE messages?  This will tell you where the most likely source
of the problem really is.

First, which device is reporting the error: the soc or the pln?
This begins to point to the source of the problem.  The soc is the
fiber channel host adapter/handler, and the pln is the SSA driver.  If
you see messages relating to 'soc', there is a good chance the problem
is in the fiber channel hardware.  If there are 'pln' messages, then it
is more likely not the fiber channel, but something elsewhere.  The pln
is the driver software that talks to the SSA, so based on the actual
messages, you should be able to find the source of the problem.

Below is a short table of the most likely sources of 'offline/online'
messages, ordered from most likely (top) to least likely (bottom) within
each category.  [NOTE: This is a guide based on what we know at this time;
it does not imply these are the only possibilities.]

--------------------------------------------------------------------------
--------------------------------------------------------------------------

Message information:                   Suspect source of problem
                                       (most likely to least likely):
-----------------------------------    ----------------------------------
Offline/Online - "plain"               Hardware:
  (without any other associated          Fiber cable
  message indications)                   soc**
                                         SSA controller

Timeout Recovery -                     Usually software (75% of cases):
  "Timeout recovery being invoked"       SSA firmware
                                         SSA driver
                                         SSA hardware

Transport Error -                      For all of these:
  "Transport error:"  incomplete          Bad disk drives
                      reset            Software:
                      timeout            ssd
                      data_ovr           ssa/pln driver
                                         SSA firmware
                                         kernel

Transport Rejected                     Hardware:
                                         Fiber cable or soc**

Media errors                           Disk drives

Any messages with SYS_NOTICE           SSA firmware

        [ NOTE: ** denotes that the FC/OMs are also a possibility. ]

---------------------------------------------------------------------------
---------------------------------------------------------------------------
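For a first pass over a large messages file, the table can be applied mechanically. This Python sketch (classify() and its rule list are hypothetical, transcribed from the table) reports which message categories appear and their suspects, and treats offline/online as "plain" only when nothing else accompanies it. It is no substitute for reading the messages in order, as the examples below show.

```python
import re

# (category, pattern, suspects ordered most to least likely), from the table above.
RULES = [
    ("timeout recovery", re.compile(r"Timeout recovery", re.I),
     ["SSA firmware", "SSA driver", "SSA hardware"]),
    ("transport error", re.compile(r"Transport error", re.I),
     ["disk drives", "ssd driver", "ssa/pln driver", "SSA firmware", "kernel"]),
    ("transport rejected", re.compile(r"transport rejected", re.I),
     ["fiber cable", "soc or FC/OM"]),
    ("media error", re.compile(r"Media Error", re.I),
     ["disk drives"]),
    ("SYS_NOTICE", re.compile(r"SYS_NOTICE"),
     ["SSA firmware"]),
]
PLAIN = ("plain offline/online", re.compile(r"Fibre Channel is (?:OFFLINE|ONLINE)"),
         ["fiber cable", "soc or FC/OM", "SSA controller"])


def classify(lines):
    """Return {category: suspects} for every message category present in lines."""
    lines = list(lines)
    result = {}
    for name, pattern, suspects in RULES:
        if any(pattern.search(line) for line in lines):
            result[name] = suspects
    # Offline/online only counts as "plain" when nothing else accompanies it.
    if not result and any(PLAIN[1].search(line) for line in lines):
        result[PLAIN[0]] = PLAIN[2]
    return result
```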

One of the most frustrating messages is the 'Timeout Recovery' message.
This one in particular needs to be examined a bit more closely.  Check
for any other messages, such as disk-related errors.  If there are any,
use those to determine the most likely cause of the problem.

Here is one example of what these may look like:

 16:17:21 unix: WARNING: /io-unit@f,e1200000/sbi@0,0/SUNW,soc@0,0/
SUNW,pln@a0000000,78ad31 (SUNW,pln0):
 16:17:21 unix:  Timeout recovery being invoked...
 16:17:21 unix: ID[SUNWssa.soc.link.5010] soc0: port 0: Fibre Channel
is OFFLINE
 16:17:22 unix: ID[SUNWssa.soc.link.6010] soc0: port 0: Fibre Channel
is ONLINE
 16:17:22 unix: ID[SUNWssa.soc.login.6010] soc0: Fibre Channel login 
succeeded
 16:17:22 unix: ID[SUNWssa.soc.link.1010] soc0: message:  SSA110 V3.9 

When we get Timeout Recovery messages, it means that there was a timeout
on a command sent to the SSA.  In the example above, this is the only 
message; there are no other messages from the SSA or disks, only this
'Timeout Recovery' message followed by the recovery procedure messages
(the subsequent offline and online/login messages).

The software will attempt a recovery by flushing out the transport and
reconnecting.  This is what causes the following offline/online sequence.
It is trying to do the operation again.

If the controller has a hardware problem that still lets the link be
established (ONLINE and login succeed) but all commands time out, the
link will just keep timing out and going through the recovery
offline/online sequence over and over again.  If this is all there is in
the messages file, and it all comes from one SSA, the most likely suspect
is the controller board itself.  You might also see this with one of the
isp chips (SCSI controllers) on the board, but then you will also see
other messages relating to those addresses.
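The rule of thumb in the previous paragraph (timeout recoveries from a single SSA, with no other disk-level messages, point at the controller board) can be sketched in Python as below. It assumes each WARNING line carries the full device path on one line, as in the messages file itself, and attributes each 'Timeout recovery being invoked' line to the most recent WARNING path; timeout_pattern() and controller_suspect() are hypothetical.

```python
import re

WARNING = re.compile(r"WARNING: (\S+)")
PLN = re.compile(r"SUNW,pln@[0-9a-f]+,([0-9a-f]+)")


def timeout_pattern(lines):
    """Return (WWNs of plns with timeout recovery, whether any disk warnings appeared)."""
    last_path, plns, disk_warnings = None, set(), False
    for line in lines:
        w = WARNING.search(line)
        if w:
            last_path = w.group(1)
            if "/ssd@" in last_path:          # a disk-level (ssd) warning
                disk_warnings = True
        elif "Timeout recovery being invoked" in line and last_path:
            p = PLN.search(last_path)
            if p:
                plns.add(p.group(1))
    return plns, disk_warnings


def controller_suspect(lines):
    """True when timeout recoveries all come from one SSA with no disk warnings."""
    plns, disk_warnings = timeout_pattern(lines)
    return len(plns) == 1 and not disk_warnings
```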

Of course, a combination of software and hardware may still be the
cause of problems.  The best you can do is to bring the software up to
the most current levels (including disk firmware levels); after that,
most remaining problems are likely to be hardware related.  Basically,
try to rule out one or the other based on versions and messages.
	
EXAMPLES:
Here are some examples of what you might see in a messages file 
relating to offline/online sequences.  See if you can figure out the
source of the problems.

[ I have stripped out the date and system name for space savings. ]
       ----------------------------------------------------
#1)
 07:25:08  unix: WARNING: /io-unit@f,e1200000/sbi@0,0/SUNW,soc@2,0/SUNW,pln@a0000000,740f05 (SUNW,pln2):
 07:25:08  unix:  Timeout recovery being invoked...
 07:25:08  unix: ID[SUNWssa.soc.link.5010] soc1: port 0: Fibre Channel is OFFLINE
 07:25:09  unix: ID[SUNWssa.soc.link.6010] soc1: port 0: Fibre Channel is ONLINE
 07:25:09  unix: ID[SUNWssa.soc.login.6010] soc1: Fibre Channel login succeeded
 07:25:09  unix: ID[SUNWssa.soc.link.1010] soc1: message:  SSA100 V3.6 (031896) Mon Mar 18 19:57:51 1996
 07:29:28  unix: WARNING: /io-unit@f,e1200000/sbi@0,0/SUNW,soc@2,0/SUNW,pln@a0000000,740f05 (SUNW,pln2):
 07:29:28  unix:  Timeout recovery being invoked...
 07:29:28  unix: ID[SUNWssa.soc.link.5010] soc1: port 0: Fibre Channel is OFFLINE
 07:29:29  unix: ID[SUNWssa.soc.link.6010] soc1: port 0: Fibre Channel is ONLINE
 07:29:29  unix: ID[SUNWssa.soc.login.6010] soc1: Fibre Channel login succeeded
 07:29:29  unix: ID[SUNWssa.soc.link.1010] soc1: message:  SSA100 V3.6 (031896) Mon Mar 18 19:57:51 1996
 07:30:08  unix: ID[SUNWssa.soc.link.5010] soc1: port 0: Fibre Channel is OFFLINE
 07:31:08  unix: WARNING: /io-unit@f,e1200000/sbi@0,0/SUNW,soc@2,0/SUNW,pln@a0000000,740f05/ssd@3,1 (ssd145):
 07:31:08  unix:  Transport error:  Fibre Channel
 07:31:08  unix: Offline
 07:31:08  unix: WARNING: /io-unit@f,e1200000/sbi@0,0/SUNW,soc@2,0/SUNW,pln@a0000000,740f05/ssd@3,1 (ssd145):
 07:31:08  unix:  requeue of command fails (ffffff
 07:31:08  unix: fe)
 07:31:09  unix: NOTICE: vxvm:vxio: Disk c3t3d1s2: Unexpected status on close: 0
 07:31:09  unix: WARNING: /io-unit@f,e1200000/sbi@0,0/SUNW,soc@2,0/SUNW,pln@a0000000,740f05/ssd@3,1 (ssd145):
 07:31:09  unix:  transport rejected (-2)
 07:31:09  unix: WARNING: /io-unit@f,e1200000/sbi@0,0/SUNW,soc@2,0/SUNW,pln@a0000000,740f05/ssd@3,1 (ssd145):
 07:31:09  unix:  transport rejected (-2)
 07:31:09  unix: WARNING: /io-unit@f,e1200000/sbi@0,0/SUNW,soc@2,0/SUNW,pln@a0000000,740f05/ssd@3,1 (ssd145):
 07:31:09  unix:  transport rejected (-2)
 07:31:09  unix: WARNING: /io-unit@f,e1200000/sbi@0,0/SUNW,soc@2,0/SUNW,pln@a0000000,740f05/ssd@3,2 (ssd146):
 07:31:09  unix:  transport rejected (-2)
 07:31:09  unix: NOTICE: vxvm:vxio: Disk c3t3d2s2: Unexpected status on close: 0
 07:31:09  unix: WARNING: /io-unit@f,e1200000/sbi@0,0/SUNW,soc@2,0/SUNW,pln@a0000000,740f05/ssd@3,3 (ssd147):
 07:31:09  unix:  transport rejected (-2)
       ----------------------------------------------------
For #1, if you decided that the fiber cable is the most likely suspect,
you win the prize!
       ----------------------------------------------------
#2)
 02:07:29  unix: WARNING: /io-unit@f,e0200000/sbi@0,0/SUNW,soc@2,0/SUNW,pln@a0000000,78be6e (SUNW,pln1):  
 02:07:29  unix:  Timeout recovery being invoked...  
 02:07:37  unix: WARNING: /io-unit@f,e0200000/sbi@0,0/SUNW,soc@2,0/SUNW,pln@a0000000,78be6e (SUNW,pln1):  
 02:07:37  unix:  Timeout recovery failed, resetting  
 02:07:37  unix: ID[SUNWssa.soc.driver.1010] soc0: host adapter fw date code: Wed Jan 17 20:34:59 1996  
 02:07:37  unix:  
 02:07:37  unix: ID[SUNWssa.soc.link.6010] soc0: port 0: Fibre Channel is ONLINE 
 02:07:37  unix: ID[SUNWssa.soc.link.5010] soc0: port 0: Fibre Channel is OFFLINE 
 02:07:37  unix: ID[SUNWssa.soc.link.6010] soc0: port 0: Fibre Channel is ONLINE 
 02:07:37  unix: ID[SUNWssa.soc.link.5010] soc0: port 0: Fibre Channel is OFFLINE 
 02:07:38  unix: ID[SUNWssa.soc.link.6010] soc0: port 0: Fibre Channel is ONLINE 
 02:07:38  unix: ID[SUNWssa.soc.link.5010] soc0: port 0: Fibre Channel is OFFLINE 
 02:07:38  unix: ID[SUNWssa.soc.link.6010] soc0: port 0: Fibre Channel is ONLINE 
 02:07:38  unix: ID[SUNWssa.soc.link.5010] soc0: port 0: Fibre Channel is OFFLINE 
 02:07:39  unix: ID[SUNWssa.soc.link.6010] soc0: port 0: Fibre Channel is ONLINE 
 02:07:39  unix: ID[SUNWssa.soc.link.5010] soc0: port 0: Fibre Channel is OFFLINE 
 02:07:39  unix: ID[SUNWssa.soc.link.6010] soc0: port 0: Fibre Channel is ONLINE 
 02:07:39  unix: ID[SUNWssa.soc.link.5010] soc0: port 0: Fibre Channel is OFFLINE 
 02:07:40  unix: ID[SUNWssa.soc.link.6010] soc0: port 0: Fibre Channel is ONLINE 
 02:07:40  unix: ID[SUNWssa.soc.link.5010] soc0: port 0: Fibre Channel is OFFLINE 
 02:07:40  unix: ID[SUNWssa.soc.link.6010] soc0: port 0: Fibre Channel is ONLINE 
 02:07:40  unix: ID[SUNWssa.soc.link.5010] soc0: port 0: Fibre Channel is OFFLINE 
 02:07:41  unix: ID[SUNWssa.soc.link.6010] soc0: port 0: Fibre Channel is ONLINE 
 02:07:41  unix: ID[SUNWssa.soc.link.5010] soc0: port 0: Fibre Channel is OFFLINE 
 02:07:41  unix: ID[SUNWssa.soc.link.6010] soc0: port 0: Fibre Channel is ONLINE 
 02:07:42  unix: ID[SUNWssa.soc.login.6010] soc0: Fibre Channel login succeeded 
 02:07:42  unix: ID[SUNWssa.soc.link.1010] soc0: message:  SSA110 V3.6 (031896) Mon Mar 18 19:57:51 1996  
 02:09:28  unix: WARNING: /io-unit@f,e0200000/sbi@0,0/SUNW,soc@2,0/SUNW,pln@a0000000,78be6e (SUNW,pln1):  
 02:09:28  unix:  Timeout recovery being invoked...  
 02:09:28  unix: ID[SUNWssa.soc.link.5010] soc0: port 0: Fibre Channel is OFFLINE 
 02:09:29  unix: ID[SUNWssa.soc.link.6010] soc0: port 0: Fibre Channel is ONLINE 
 02:09:29  unix: ID[SUNWssa.soc.link.5010] soc0: port 0: Fibre Channel is OFFLINE 
 02:09:29  unix: ID[SUNWssa.soc.link.6010] soc0: port 0: Fibre Channel is ONLINE 
 02:09:29  unix: ID[SUNWssa.soc.link.5010] soc0: port 0: Fibre Channel is OFFLINE 
 02:09:30  unix: ID[SUNWssa.soc.link.6010] soc0: port 0: Fibre Channel is ONLINE 
 02:09:30  unix: ID[SUNWssa.soc.link.5010] soc0: port 0: Fibre Channel is OFFLINE 
 02:09:30  unix: ID[SUNWssa.soc.link.6010] soc0: port 0: Fibre Channel is ONLINE 
 02:09:30  unix: ID[SUNWssa.soc.link.5010] soc0: port 0: Fibre Channel is OFFLINE 
 02:09:31  unix: ID[SUNWssa.soc.link.6010] soc0: port 0: Fibre Channel is ONLINE 
 02:10:32  unix: WARNING: /io-unit@f,e0200000/sbi@0,0/SUNW,soc@2,0/SUNW,pln@a0000000,78be6e/ssd@5,1 (ssd41):  
 02:10:32  unix:  Error for command 'write(10)' Err
 02:10:32  unix: or Level: Retryable
 02:10:32  unix: WARNING: /io-unit@f,e0200000/sbi@0,0/SUNW,soc@2,0/SUNW,pln@a0000000,78be6e/ssd@0,0 (ssd5):  
 02:10:32  unix:  Transport error:  Fibre Channel Of
 02:10:32  unix:  Requested Block 1705952, Error Block: 1705952 
 02:10:32  unix: fline
 02:10:33  unix: WARNING: /io-unit@f,e0200000/sbi@0,0/SUNW,soc@2,0/SUNW,pln@a0000000,78be6e/ssd@0,0 (ssd5):  
 02:10:33  unix:  Transport error:  Fibre Channel Of
 02:10:33  unix:  Sense Key: Hardware Error 
 02:10:33  unix: fline
 02:10:33  unix: WARNING: /io-unit@f,e0200000/sbi@0,0/SUNW,soc@2,0/SUNW,pln@a0000000,78be6e/ssd@0,0 (ssd5):  
 02:10:33  unix:  Transport error:  Fibre Channel Of
 02:10:33  unix: WARNING: /io-unit@f,e0200000/sbi@0,0/SUNW,soc@2,0/SUNW,pln@a0000000,78be6e/ssd@5,1 (ssd41):  
 02:10:33  unix:  requeue of command fails (fffffff
 02:10:33  unix: fline
 02:10:33  unix: e)  
 02:10:33  unix: WARNING: /io-unit@f,e0200000/sbi@0,0/SUNW,soc@2,0/SUNW,pln@a0000000,78be6e/ssd@0,0 (ssd5):  
 02:10:33  unix:  Transport error:  Fibre Channel Of
 02:10:33  unix: fline
 02:10:33  unix: WARNING: /io-unit@f,e0200000/sbi@0,0/SUNW,soc@2,0/SUNW,pln@a0000000,78be6e/ssd@0,0 (ssd5):  
 02:10:33  unix:  Transport error:  Fibre Channel Of
 02:10:33  unix: fline
 02:10:33  unix: WARNING: /io-unit@f,e0200000/sbi@0,0/SUNW,soc@2,0/SUNW,pln@a0000000,78be6e/ssd@0,1 (ssd6):  
 02:10:33  unix:  Error for command 'write(10)' Erro
 02:10:33  unix: WARNING: /io-unit@f,e0200000/sbi@0,0/SUNW,soc@2,0/SUNW,pln@a0000000,78be6e/ssd@0,0 (ssd5):  
 02:10:33  unix:  Transport error:  Fibre Channel Of
 02:10:33  unix: r Level: Retryable
 02:10:33  unix: fline
 02:10:33  unix:  Requested Block 570688, Error Block: 570688 
       ----------------------------------------------------
Now, #2 is a bit more difficult, and not unlike what might be seen 
on a system.  If we analyze this, based on the information given
about offline/online messages, we come up with more than one possible
source of the problem.   Let's take a look together.

We begin with 'timeout recovery being invoked', which is usually caused
by a software problem: either firmware or the ssa/pln driver.
Next, we see a whole bunch of "plain" offline/online messages, which
normally point to a fiber cable problem; it might not be seated well.
Finally we end up with disk errors: one is a retryable write, and the
rest are just reporting 'transport error' (fibre channel offline) with a
hardware error sense key.

So, in this one example we now have three possibilities: bad or loose
fiber cable to the ssa; bad firmware running in the ssa; bad disk(s).
[This messages file continued the same patterns of message output 
over and over, listing almost every disk device in the array.]

Based on the evidence here, I would first try the fiber cable, because
it is the easiest, and because if the cable connection is not good,
the communications will not be correct.  If this did not clean up the
problem, I would then try a firmware download.  Normally, doing these
two things on a system like this one should eliminate most of the
extraneous messages.  Then, most likely, all that would be left is one
or two bad disk devices.

If the messages still persist after changing out the fiber cable, the
firmware, and maybe a disk or two, we still have the balance of the fiber
channel hardware, the SSA driver software package, and the kernel to
consider.

I wanted to use this example to show that it can be a combination of
things, but normally they are inter-related.

       ----------------------------------------------------
       ----------------------------------------------------


2.4 	COMMANDS to use, and what to look for.

There are many commands available to assist you in debugging a problem in 
your SSA.

  2.4.1  SSAADM:
	 ssaadm (or ssacli): this command gives you information about the 
         SSA.

Example of displaying controller information:
# ssaadm display c1

                    SPARCstorage Array Configuration
                     (ssaadm version: 1.10 95/11/27)
Controller path:/devices/iommu@f,e0000000/sbus@f,e0001000/SUNW,soc@1,0/SUNW,pln@a0000000,7537b7:ctlr
                          DEVICE STATUS
      TRAY 1                 TRAY 2                 TRAY 3
slot
1     Drive: 0,0             Drive: 2,0             Drive: 4,0        
2     NO SELECT              NO SELECT              NO SELECT         
3     NO SELECT              NO SELECT              NO SELECT         
4     NO SELECT              NO SELECT              NO SELECT         
5     NO SELECT              NO SELECT              NO SELECT         
6     Drive: 1,0             Drive: 3,0             Drive: 5,0        
7     Drive: 1,1             Drive: 3,1             Drive: 5,1        
8     Drive: 1,2             Drive: 3,2             Drive: 5,2        
9     Drive: 1,3             Drive: 3,3             Drive: 5,3        
10    Drive: 1,4             Drive: 3,4             Drive: 5,4        

                          CONTROLLER STATUS
Vendor:        SUN     
Product ID:    SSA100          
Product Rev:   1.0 
Firmware Rev:  2.4 
Serial Num:    0000007537B7
Accumulate Performance Statistics: Enabled
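The slot table lists each installed drive as target,disk. Combined with the controller number given to 'ssaadm display', that yields the cNtNdN names used elsewhere on the system. A minimal Python sketch, assuming the 'Drive: t,d' format shown above; drives_from_display() is hypothetical.

```python
import re

DRIVE = re.compile(r"Drive: (\d+),(\d+)")


def drives_from_display(display, controller):
    """Return cXtYdZ names for each 'Drive: t,d' cell in 'ssaadm display' output."""
    pairs = {(int(t), int(d)) for t, d in DRIVE.findall(display)}
    return [f"c{controller}t{t}d{d}" for t, d in sorted(pairs)]
```

For the display above (controller c1) it would return 18 names, from c1t0d0 through c1t5d4; slots shown as NO SELECT contribute nothing.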


You can also use this command to give you some performance information.
Just use a -p option after the display:   # ssaadm display -p cN

Other options let you enable or disable 'fast_write' NVRAM caching, and
start, stop, or reserve disks, trays, etc.  (Use extreme caution with
these types of options!)  Refer to section 3.3 for more information on
enabling/disabling fast_writes.


  2.4.2  PRTVTOC:

This command will give you information about a disk.  This can be very 
useful, especially when running the Veritas Volume Manager software and 
having a root disk encapsulated for use with this software.

	prtvtoc /dev/rdsk/<disk>s2

An example of a normal Solaris bootable (root) disk:

# prtvtoc /dev/rdsk/c0t3d0s2
* /dev/rdsk/c0t3d0s2 partition map
*
* Dimensions:
*     512 bytes/sector
*      72 sectors/track
*      14 tracks/cylinder
*    1008 sectors/cylinder
*    2038 cylinders
*    2036 accessible cylinders
*
* Flags:
*   1: unmountable
*  10: read-only
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0      2    00          0     51408     51407   /
       1      3    01      51408    132048    183455
       2      5    00          0   2052288   2052287
       4      7    00     183456    226800    410255   /var
       5      6    00     410256    719712   1129967   /opt
       6      4    00    1129968    819504   1949471   /usr
       7      8    00    1949472    102816   2052287   /export

An example of a bootable (root) encapsulated disk:

# prtvtoc /dev/rdsk/c0t3d0s2
* /dev/rdsk/c0t3d0s2 partition map
*
* Dimensions:
*     512 bytes/sector
*      72 sectors/track
*      14 tracks/cylinder
*    1008 sectors/cylinder
*    2038 cylinders
*    2036 accessible cylinders
*
* Flags:
*   1: unmountable
*  10: read-only
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0      2    00          0     51408     51407   /
       1      3    01      51408    132048    183455
       2      5    00          0   2052288   2052287
       3     14	   01       2016   2050272   2052287
       4      7    00     183456    226800    410255   /var
       5      6    00     410256    719712   1129967   /opt
       6      4    00    1129968    819504   1949471   /usr
       7     15    01          0      2016      2015


An example of an encapsulated root disk MIRROR:

# prtvtoc /dev/rdsk/c0t1d0s2
* /dev/rdsk/c0t1d0s2 partition map
*
* Dimensions:
*     512 bytes/sector
*      72 sectors/track
*      14 tracks/cylinder
*    1008 sectors/cylinder
*    2038 cylinders
*    2036 accessible cylinders
*
* Flags:
*   1: unmountable
*  10: read-only
*
* Unallocated space:
*	First     Sector    Last
*	Sector     Count    Sector 
*     2052288 4293328288    413279
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0      2    00     413280   1639008   2052287	/
       2      5    00          0   2052288   2052287
       3     15    01          0      2016      2015
       4     14    01       2016   2050272   2052287

Note that some slices have unusual "tags" of 14 and 15.  These are written
by the Veritas software.  The tag of 15 marks the Private region that the
Volume Manager uses, and the tag of 14 marks the Public region that it uses
(which is the rest of the disk).

Only disks under control of the Volume Manager software will have these tags.
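A quick way to spot Volume Manager slices is to filter the prtvtoc output for
these two tags.  The following is a minimal sketch in Bourne shell and awk;
the function name vx_slices is made up for this example, and it assumes the
column layout shown in the examples above (partition, tag, flags, first
sector, sector count, last sector, optional mount point).

```sh
# vx_slices (hypothetical helper): read prtvtoc output on stdin and report
# the slices that carry the Volume Manager tags
# (14 = public region, 15 = private region).
# Assumes the column layout shown above: partition, tag, flags,
# first sector, sector count, last sector, optional mount point.
vx_slices() {
    awk '
        /^\*/  { next }                 # skip the "*" header/comment lines
        NF < 6 { next }                 # skip blank or short lines
        $2 == 14 { print "slice " $1 ": public region, " $5 " sectors" }
        $2 == 15 { print "slice " $1 ": private region, " $5 " sectors" }
    '
}
```

For example, `prtvtoc /dev/rdsk/c0t3d0s2 | vx_slices` on the encapsulated root
disk above would report slice 3 as the public region (2050272 sectors) and
slice 7 as the private region (2016 sectors).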

The interesting fact about the mirror disk is that there is no underlying
'unix' slicing information.  This disk must have the Veritas software running.
Yes, it is a bootable device, but the Veritas software MUST start up, or the 
system will not boot up all the way to multi-user mode.


  2.4.3  VERITAS GUI VIEWS:

The GUI is used for summary status information.  You can tell at a glance
whether a raid-5 or mirrored volume is completely 'up'.  You can tell which
volumes are running and which are not.  It also shows which physical disks
have which 'logical' disk names.

There are various Views available to you.  One shows everything; one shows 
disks that are not under control of this software; one shows a graphic 
representation of the SSA device; the rest are the disk groups that you have 
set up for use.


  2.4.4  VX COMMANDS:

The Veritas Volume Manager software has quite a few handy commands to get
information about what is happening on the disks.  This is nice, because we 
cannot always use the GUI.

	vxdisk; vxprint; vxprivutil

  2.4.5  The vxdisk command can show you what device belongs to what disk 
	 group and its current status.
	
	# vxdisk list

DEVICE       TYPE      DISK         GROUP        STATUS
c0t1d0s2     sliced    -            -            online
c0t2d0s2     sliced    -            -            error
c0t3d0s2     sliced    -            -            error
c1t0d0s2     sliced    disk12       rootdg       online
c1t1d0s2     sliced    disk04       rootdg       online
c1t1d1s2     sliced    disk01       rootdg       online
c1t1d2s2     sliced    -            -            online
c1t1d3s2     sliced    -            -            online
c1t1d4s2     sliced    -            -            online
c1t2d0s2     sliced    disk08       rootdg       online
c1t3d0s2     sliced    disk09       rootdg       online
c1t3d1s2     sliced    -            -            online
c1t3d3s2     sliced    -            -            offline
c1t3d4s2     sliced    -            -            online
c1t4d0s2     sliced    -            -            error
c1t5d0s2     sliced    -            -            error
c1t5d1s2     sliced    -            -            error
c1t5d2s2     sliced    -            -            error
c1t5d3s2     sliced    -            -            error
c1t5d4s2     sliced    -            -            error
-            -         disk03       rootdg       <some status> was:c1t3d3s2
                       ^^^^^^       ^^^^^^                     ^^^^^^^^^^^^
NOTICE that this tells you disk03 was device c1t3d3(s2) and was in the rootdg 
disk group.

The status information about the SSA devices reported through Volume Manager
can tell you very quickly what is happening.  You might use this information
to bring a device back under Volume Manager control.
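When the list is long, it helps to filter out the healthy entries.  This is a
minimal sketch, assuming the five-column layout shown above; vx_problem_disks
is a hypothetical helper, not a Volume Manager command.  It prints devices
whose STATUS is not 'online', plus disk media records that have lost their
device (a '-' in the DEVICE column, with the old device in the 'was:' field).

```sh
# vx_problem_disks (hypothetical helper): read "vxdisk list" output on
# stdin and print every device whose STATUS column is not "online",
# plus any disk media record that has lost its device ("-" in DEVICE).
vx_problem_disks() {
    awk '
        NR == 1 { next }                  # skip the column header line
        NF < 5  { next }                  # skip blank or short lines
        $1 == "-" { print "lost:    " $3 " (group " $4 ") " $NF; next }
        $5 != "online" { print "problem: " $1 " status=" $5 }
    '
}
```

Usage: `vxdisk list | vx_problem_disks`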


Here is another use of this list command.  This lists more details about 
each disk:

	# vxdisk -s list

Disk:   c0t3d0s2
type:   sliced
flags:  online error private autoconfig
error:  Disk is not usable

Disk:   c1t0d0s2
type:   sliced
flags:  online ready private autoconfig autoimport imported
diskid: 821159273.1577.unix
dgname: rootdg
dgid:   820868326.1025.unix
hostid: unix

Disk:   c1t1d0s2
type:   sliced
flags:  online ready private autoconfig autoimport imported
diskid: 820868338.1091.unix
dgname: rootdg
dgid:   820868326.1025.unix
hostid: unix

The interesting pieces of information from this command are the "dgid" and 
"hostid".  This can be used to recreate missing disk groups, or add existing 
rootdg disks to a newly created rootdg disk group!
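To collect those ids for every disk at once, the per-disk records can be
flattened into one line each.  A minimal sketch, assuming the "keyword: value"
layout shown above; vx_disk_owners is a hypothetical name.  Disks with no
Volume Manager header (such as c0t3d0s2 above) print "-" in each field.

```sh
# vx_disk_owners (hypothetical helper): read "vxdisk -s list" output on
# stdin and print one line per disk: device, disk group name,
# disk group id and host id ("-" when the field is not present).
vx_disk_owners() {
    awk '
        function emit() {
            if (disk != "") print disk, dg, dgid, host
        }
        $1 == "Disk:"   { emit(); disk = $2; dg = dgid = host = "-" }
        $1 == "dgname:" { dg = $2 }
        $1 == "dgid:"   { dgid = $2 }
        $1 == "hostid:" { host = $2 }
        END { emit() }                    # print the last record
    '
}
```

Usage: `vxdisk -s list | vx_disk_owners`.  Disks that belong to a different
host or disk group then stand out when the lines are compared.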


  2.4.6  The vxprint command also gives you status information, but it 
	 focuses on volumes and how they are configured.

	# vxprint -ht

This is the most common form of this command that we use.  It gives all the 
basic information about the current configuration and states of plexes.

Disk group: rootdg

DG NAME         NCONFIG      NLOG     MINORS   GROUP-ID
DM NAME         DEVICE       TYPE     PRIVLEN  PUBLEN   STATE
V  NAME         USETYPE      KSTATE   STATE    LENGTH   READPOL   PREFPLEX
PL NAME         VOLUME       KSTATE   STATE    LENGTH   LAYOUT    NCOL/WID MODE
SD NAME         PLEX         DISK     DISKOFFS LENGTH   [COL/]OFF DEVICE   MODE

dg rootdg       default      default  0        820868326.1025.unix

dm disk01       c1t1d1s2     sliced   2015     2050272  -
dm disk02       c1t1d2s2     sliced   2015     2050272  -
dm disk03       c1t3d3s2     sliced   2015     2050272  -
dm disk04       c1t1d0s2     sliced   2015     2050272  -
dm disk05       c1t3d0s2     sliced   2015     2050272  -
dm disk06       c1t3d1s2     sliced   2015     2050272  -
dm disk07       c1t3d2s2     sliced   2015     2050272  -
dm disk08       c1t2d0s2     sliced   2015     2050272  -
dm disk12       c1t0d0s2     sliced   2015     2050272  -

pl pl-01        -            DISABLED -        0        STRIPE    4/409600 RW

v  vol01        fsgen        ENABLED  ACTIVE   6144000  SELECT    vol01-01
pl vol01-01     vol01        ENABLED  ACTIVE   6144768  STRIPE    2/128    RW
sd disk12-01    vol01-01     disk12   0        2050272  0/0       c1t0d0   ENA
sd disk01-01    vol01-01     disk01   0        1022112  0/2050272 c1t1d1   ENA
sd disk04-01    vol01-01     disk04   0        2050272  1/0       c1t1d0   ENA
sd disk08-01    vol01-01     disk08   0        1022112  1/2050272 c1t2d0   ENA
pl vol01-02     vol01        ENABLED  TEMPRMSD 6144768  STRIPE    2/128    WO
sd disk02-01    vol01-02     disk02   0        1022112  0/0       c1t1d2   ENA
sd disk03-01    vol01-02     disk03   0        2050272  0/1022112 c1t3d3s2 ENA
sd disk05-01    vol01-02     disk05   0        1022112  1/0       c1t3d0   ENA
sd disk07-01    vol01-02     disk07   0        2050272  1/1022112 c1t3d2   ENA

v  vol02        fsgen        ENABLED  ACTIVE   409600   SELECT    -
pl vol02-01     vol02        ENABLED  ACTIVE   410256   CONCAT    -        RW
sd disk06-01    vol02-01     disk06   0        410256   0         c1t3d1   ENA

Notice plex 'vol01-02' has a state of TEMPRMSD and a mode of WO (write-only).
This is the mirror resync in progress: the new plex is being written to, but
is not yet used for reads.
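On a large configuration, the plexes that need attention can be pulled out of
the vxprint -ht listing.  A minimal sketch, assuming the field order given on
the PL header line above; vx_plex_check is a hypothetical helper.  Any plex
that is not both ENABLED and ACTIVE is printed, so on the example above it
would report the dissociated plex pl-01 and the resyncing plex vol01-02.

```sh
# vx_plex_check (hypothetical helper): read "vxprint -ht" output on stdin
# and print every plex record ("pl" lines) that is not both kernel-ENABLED
# and ACTIVE, using the column order from the PL header line:
#   pl NAME VOLUME KSTATE STATE LENGTH LAYOUT NCOL/WID MODE
vx_plex_check() {
    awk '
        $1 == "pl" && ($4 != "ENABLED" || $5 != "ACTIVE") {
            print $2 " volume=" $3 " kstate=" $4 " state=" $5 " mode=" $NF
        }
    '
}
```

Usage: `vxprint -ht | vx_plex_check`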


	# vxprint -l

This command shows you in order all the disk groups, disks, subdisks, 
plexes and volumes.

Group:    rootdg
info:     dgid=820868326.1025.unix
copies:   nconfig=default nlog=default
minors:   >= 0

Disk:     disk01
info:     diskid=821319546.2125.unix
assoc:    device=c1t1d1s2 type=sliced
flags:    autoconfig
device:   pubpath=/dev/dsk/c1t1d1s4 privpath=/dev/dsk/c1t1d1s3
devinfo:  publen=2050272 privlen=2015

Disk:     disk02
info:     diskid=821319456.2121.unix
assoc:    device=c1t1d2s2 type=sliced
flags:    autoconfig
device:   pubpath=/dev/dsk/c1t1d2s4 privpath=/dev/dsk/c1t1d2s3
devinfo:  publen=2050272 privlen=2015
...
...
Subdisk:  disk01-01
info:     disk=disk01 offset=0 len=1022112
assoc:    vol=vol01 plex=vol01-01 (column=0 offset=2050272)
flags:    enabled busy
device:   device=c1t1d1s2 path=/dev/dsk/c1t1d1s4 diskdev=192/812

Subdisk:  disk02-01
info:     disk=disk02 offset=0 len=1022112
assoc:    vol=vol01 plex=vol01-02 (column=0 offset=0)
flags:    enabled busy
device:   device=c1t1d2s2 path=/dev/dsk/c1t1d2s4 diskdev=192/820
...
...
Plex:     vol01-01
info:     len=6144768
type:     layout=STRIPE columns=2 width=128
state:    state=ACTIVE kernel=ENABLED io=read-write
assoc:    vol=vol01 sd=disk12-01,disk01-01,disk04-01,disk08-01
flags:    busy complete

Plex:     vol01-02
info:     len=6144768
type:     layout=STRIPE columns=2 width=128
state:    state=TEMPRMSD kernel=ENABLED io=write-only
assoc:    vol=vol01 sd=disk02-01,disk03-01,disk05-01,disk07-01
flags:    busy
utils:    t0=ATT

Plex:     vol02-01
info:     len=410256
type:     layout=CONCAT
state:    state=ACTIVE kernel=ENABLED io=read-write
assoc:    vol=vol02 sd=disk06-01
flags:    complete
...
...
Volume:   vol01
info:     len=6144000
type:     usetype=fsgen
state:    state=ACTIVE kernel=ENABLED
assoc:    plexes=vol01-01,vol01-02
policies: read=SELECT (prefer vol01-01) exceptions=GEN_DET_SPARSE
flags:    open writeback
logging:  type=REGION loglen=0 serial=0/0 (disabled)
device:   minor=5 bdev=115/5 cdev=115/5 path=/dev/vx/dsk/rootdg/vol01
perms:    user=root group=root mode=0600
utils:    t0=ATT1

Volume:   vol02
info:     len=409600
type:     usetype=fsgen
state:    state=ACTIVE kernel=ENABLED
assoc:    plexes=vol02-01
policies: read=SELECT (round-robin) exceptions=GEN_DET_SPARSE
flags:    closed writeback
logging:  type=REGION loglen=0 serial=0/0 (disabled)
device:   minor=6 bdev=115/6 cdev=115/6 path=/dev/vx/dsk/rootdg/vol02
perms:    user=root group=root mode=0600


Some interesting information would be the id lines from the devices and 
also the flags from the plexes and volumes.

Notice that vol01 (which is a mirror doing a resync operation at the moment) 
has a flag of 'open writeback', while vol02's flag is 'closed writeback'.  
vol02 is not in use at this time, while vol01 is mounted and is in the
process of resyncing.
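The open/closed flag can be checked for every volume at once.  A minimal
sketch, assuming the record layout of vxprint -l shown above (a "Volume:"
line followed later by a "flags:" line); vx_volume_usage is a hypothetical
name.  On the example above it would print "vol01 open" and "vol02 closed".

```sh
# vx_volume_usage (hypothetical helper): read "vxprint -l" output on stdin
# and, for each "Volume:" record, print the volume name and the first word
# of its flags line ("open" = in use, "closed" = not in use).
vx_volume_usage() {
    awk '
        # any other record header ends the current volume record
        $1 == "Group:" || $1 == "Disk:" || $1 == "Subdisk:" || $1 == "Plex:" { vol = "" }
        $1 == "Volume:" { vol = $2; next }
        vol != "" && $1 == "flags:" { print vol, $2; vol = "" }
    '
}
```

Usage: `vxprint -l | vx_volume_usage`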


  2.4.7  Use the vxprivutil command only with EXTREME CAUTION. It
	can be used to test the readability of the Private Region on any 
	disk, and to gather some information when you cannot get it from 
	any other source.

	# vxprivutil /dev/rdsk/c..t..d..s2
(Below are two separate command entry outputs; notice the host names.)

diskid:  821159273.1577.unix
group:   name=rootdg id=820868326.1025.unix
flags:   private autoimport
hostid:  unix
version: 2.1
iosize:  512
public:  slice=4 offset=0 len=2050272
private: slice=3 offset=1 len=2015
update:  time: 821462653  seqno: 0.13
headers: 0 248
configs: count=1 len=1456
logs:    count=1 len=220

diskid:  821297570.1085.twodotfive
group:   name=rootdg id=821297558.1025.twodotfive
flags:   private autoimport
hostid:  twodotfive
version: 2.1
iosize:  512
public:  slice=4 offset=0 len=2050272
private: slice=3 offset=1 len=2015
update:  time: 821297573  seqno: 0.5
headers: 0 248
configs: count=1 len=1456
logs:    count=1 len=220

The most interesting pieces of information from this command are the hostid 
and the group id.  

Have you ever had a problem getting to a disk, or a disk group?  This may 
help determine why, or assist in being able to get to it.
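For example, a disk whose private region carries another host's hostid (such
as the 'twodotfive' disk above, seen from host 'unix') may not be imported
automatically.  The sketch below compares the hostid in vxprivutil output with
a host name you supply.  It assumes the output layout shown above, and it
assumes the Volume Manager hostid matches the system's node name (as with
'unix' and 'twodotfive' above), which may not hold on every system;
vx_priv_owner is a hypothetical helper.

```sh
# vx_priv_owner (hypothetical helper): read vxprivutil output on stdin and
# compare the hostid recorded in the private region with the host name
# passed as $1.  Prints the disk group name and id, the hostid, and
# "MISMATCH" when the disk was last owned by a different host.
# Assumption: the Volume Manager hostid equals the host name you pass in.
vx_priv_owner() {
    awk -v me="$1" '
        $1 == "hostid:" { host = $2 }
        $1 == "group:"  { split($2, nm, "="); split($3, gid, "="); dg = nm[2]; dgid = gid[2] }
        END {
            status = (host == me) ? "ok" : "MISMATCH"
            print "group=" dg " id=" dgid " hostid=" host " " status
        }
    '
}
```

Usage: pipe the vxprivutil output shown above into `vx_priv_owner "$(uname -n)"`.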


  2.4.8  Use vxstat command to retrieve or reset statistical information on 
	 any VM object (volume, plex, disk, subdisk).



  2.4.9  The vxinfo command prints the accessibility and usability of a 
	 volume. It prints out a one-line summary of each volume.

          bigvol         fsgen    Startable
          vol2           fsgen    Startable
          brokenvol      gen      Unstartable



     The vxinfo utility  reports  the  following  conditions  for
     volumes:

     Startable      A  vxvol  startall  operation  would   likely
                    succeed in starting the volume.

     Unstartable    The volume is not started and either  is  not
                    correctly  configured  or  doesn't  meet  the
                    prerequisites  for  automatic  startup  (with
                    volume  startup)  because  of errors or other
                    conditions.

     Started        The volume has been started and can be used.

     Started Unusable
                    The volume has been started but is not
                    operationally accessible.  This condition may
                    result from errors that have occurred since the
                    volume was started, or may be a result of
                    administrative actions, such as vxdg -k rmdisk.
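When a system has many volumes, the ones that need attention can be picked
out of the vxinfo summary.  A minimal sketch, assuming the volume / usage type
/ condition layout shown above; vx_unusable_volumes is a hypothetical name.

```sh
# vx_unusable_volumes (hypothetical helper): read vxinfo output on stdin
# (volume, usage type, condition) and print the volumes reported as
# "Unstartable" or "Started Unusable".
vx_unusable_volumes() {
    awk '
        $3 == "Unstartable"                 { print $1 ": " $3 }
        $3 == "Started" && $4 == "Unusable" { print $1 ": " $3 " " $4 }
    '
}
```

Usage: `vxinfo | vx_unusable_volumes`.  On the sample above it would print
only "brokenvol: Unstartable".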


2.5  Understanding plex States

The plex is the stuff volumes are made of.  The state of a plex may help 
you to determine what is happening with a volume.

The Volume Manager software can use this information to:
 - Determine whether or not volume contents are initialized to a known state.
 - Decide whether a plex has a valid copy of volume contents or not.
 - Track whether or not the plex was in active use at the time of a system 
	failure.
 - Monitor operations on plex(es).

There is a kernel state associated with all plexes also.  This state
will determine the accessibility of the plex.  There are three kernel states:
	Disabled	Offline, cannot be accessed.
	Detached	Maintenance, plex ops and io are accepted, but not 
			   acted on.
	Enabled		On-line, fully accessible.
[**NOTE**
	No user intervention is required with the kernel state of a plex.
	This is maintained internally in the software.]

A plex that is associated with a volume always has one of the 
following plex states:
	Empty	Set at creation time.
	Clean	Contains a consistent copy of volume contents, and
		   an operation has disabled the volume. (No recovery required.)
	Active	Normal volume IO; or was active at time of system crash.
	Stale	IO error; or question of completeness of volume contents.
		   (Volume must be recovered.)
	Offline	Result of manual offline operation on the plex.
	Temp	Operations that are not truly atomic. (resyncs...)
	Temprm	Same as Temp, but will be automatically removed at end of 
		   operation.
	IOfail	Active plex failure; will not recover at volume start time.

When a volume is started, the plex state cycles.  For example, when the system
is shut down normally, the plexes that are active are marked clean.  
When the system boots up, all clean plexes become active plexes.  
If a crash occurs while a plex is active, the software looks for the most
up-to-date plex, marks that one as active, and marks the others in that
volume as stale, meaning they must be recovered (copied again).

All stale plexes at boot, or volume start time, are recovered then marked 
active when the recovery has completed.
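The cycle described above can be summarized as a small table of transitions.
This is a simplified model written in shell for illustration only; it is not
how the Volume Manager is implemented, the function and event names are
hypothetical, and any other state/event pair (OFFLINE, IOFAIL, TEMP and so on)
is simply left unchanged.

```sh
# plex_next_state (hypothetical, simplified model, not Volume Manager code):
#   $1 = current plex state
#   $2 = event: "shutdown"      clean system shutdown
#               "start"         volume start after a clean shutdown
#               "crash-newest"  start after a crash, most up-to-date plex
#               "crash-older"   start after a crash, any other active plex
#               "recovered"     a stale plex has finished being copied
# Prints the resulting plex state.
plex_next_state() {
    case "$1:$2" in
        ACTIVE:shutdown)     echo CLEAN ;;
        CLEAN:start)         echo ACTIVE ;;
        ACTIVE:crash-newest) echo ACTIVE ;;
        ACTIVE:crash-older)  echo STALE ;;
        STALE:recovered)     echo ACTIVE ;;
        *)                   echo "$1" ;;   # unchanged in this simplified model
    esac
}
```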

The state of a plex can tell you a lot, if you know what has recently happened
on a system.  (Did it crash, or not?)

To modify the state of a plex, use the vxmend command.  You can modify the 
states to force operations.
	# vxmend fix <state> <plex>

2.6  Understanding volume States

Volume states differ between Raid-5 and other configurations.  Non Raid-5
volumes have four states; Raid-5 volumes add a fifth (Needsync).

Non Raid-5 volumes:
	Clean	Volume is not started, plexes are synchronized.
	Active	Volume has been started, or was operational at boot.
	Empty	Volume is not initialized.
	Sync	Volume is in process of recovery, or was at boot time.

Raid-5 volumes:
	Clean	Volume is not started, and parity is good; stripes are 
	   	   consistent.
	Active	Volume has been started, or was operational at boot; if 
		   kernel state is disabled, parity may not be good.
	Empty	Volume is not initialized.
 	Sync	Volume is undergoing resync of parity, or was at boot time.
	Needsync	Volume will require parity resynchronization at next 
			   start time.

Use the vxvol command to change the state of a volume.  This can be helpful 
when a volume will not start up, or if you wish to start a volume and not 
invoke automatic recovery.
	# vxvol init <state> <volume>....

2.7  System Does Not See the SSA

There are a few things that can cause this to happen.

1) Bad fiber connection.
	Possible bad components; bad FC/OM module; bad cable; bad boards.

2) Wrong patch level or firmware level for this OS level.
**Solaris 2.3 MUST have minimum kernel patch 101318-54.
**Solaris 2.4 MUST have minimum kernel patch 101945-27.
	Refer to Patch Information (see Section 5.0),
	and "Configuration Matrix" information in References (see Section 7.0)

3) Missing driver software packages for the SSA.  To check for these packages,
run a 'pkginfo | grep storage'
	SUNWssadv
	SUNWssahd
	SUNWssaop
   If these are missing, install them.  

  	For instance, the 2.1 version of Volume Manager cdrom does not
	have any Solaris 2.4 SSA device driver software on it.  You must
	load these drivers from the release 3/95 Solaris 2.4 cdrom, or from 
	the version 2.0 SSA/Volume Manager cdrom.

4) System has not been configured for the addition of the array.
  - A reconfigure boot has not been done.
	Bring the system down and issue a 'boot -r' command.

  - The device tree has not been built correctly.  No devices are in the
  /dev/dsk area, or the /dev/rdsk area.
	1) Check for an entry of the soc with the new array's address in the
	/devices/io.../sbus...  directory structure.  

	2) If there is an entry here, remove the entire directory for 
	this soc only.

	3) do a reconfiguration boot (boot -r).
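Items 2 and 3 above can be checked from the command line.  The sketch below
assumes that 'pkginfo' prints the package name in its second column and that
each 'showrev -p' line begins "Patch: NNNNNN-RR"; both functions are
hypothetical helpers, not Solaris commands.

```sh
# ssa_missing_pkgs (hypothetical helper): read "pkginfo" output on stdin
# (category, package name, description) and print any of the three SSA
# packages listed above that are not installed.
ssa_missing_pkgs() {
    awk '
        { seen[$2] = 1 }
        END {
            n = split("SUNWssadv SUNWssahd SUNWssaop", want, " ")
            for (k = 1; k <= n; k++)
                if (!(want[k] in seen)) print want[k]
        }
    '
}

# patch_at_least (hypothetical helper): read "showrev -p" output on stdin
# and succeed (exit status 0) when patch $1 is installed at revision $2
# or higher.  Assumes each line starts "Patch: NNNNNN-RR ...".
patch_at_least() {
    awk -v id="$1" -v min="$2" '
        $1 == "Patch:" {
            split($2, p, "-")
            if (p[1] == id && p[2] + 0 >= min + 0) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

Usage: `pkginfo | ssa_missing_pkgs` prints nothing when all three packages are
installed; on Solaris 2.4, `showrev -p | patch_at_least 101945 27 && echo ok`
checks the minimum kernel patch listed above.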


2.8  The System Cannot Run the vxconfigd, Volume Manager won't start.

Two things can leave a file (install-db) under the /etc/vx directory that
tells the daemon the software has not been installed, so it does not start:

  - Running vxinstall and breaking out of it before it completes.
  - Installing the Volume Manager packages.

To clear this condition:

  1) Remove the flag file:
	# cd /etc/vx/reconfig.d/state.d
	# rm install-db
	
  2) Reboot, or bring the daemon up manually:
(Refer to User's Guide Page E-38 for Version 2.x; Refer to Administrator's
Guide Page B-23 for version 1.x)
	# vxiod set 10  (** NOTE: use a 2 for version 1.x)
	# vxconfigd -m disable
	# vxdctl init
	# vxdctl enable
If the daemon does not start up automatically and vxconfigd reports no error
messages, try using the manual startup procedure above.

If there are error messages being reported, refer to appendix C in the version 
2.0 User's Guide for the exact error, and follow steps accordingly.
(Version 1.x should refer to appendix A.)
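Before trying the manual startup, it is worth confirming whether the
install-db file is still there.  A minimal sketch; vx_install_db_check is a
hypothetical helper, and the optional alternate root argument (for checking a
disk mounted elsewhere, such as under /a) is an addition for illustration.

```sh
# vx_install_db_check (hypothetical helper): report whether the install-db
# flag file described above is present.  $1 is an optional alternate root
# directory (default "/").  Returns 1 if the file is present, 0 if absent.
vx_install_db_check() {
    root=${1:-/}
    flag="${root%/}/etc/vx/reconfig.d/state.d/install-db"
    if [ -f "$flag" ]; then
        echo "present: $flag (vxconfigd will not start until it is removed)"
        return 1
    fi
    echo "absent: $flag"
    return 0
}
```

Run with no arguments, it checks the running system.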
------------------------------------------------------------------------------

3.0  	Quick Fixes, How-to's and Theory of Operations

3.1  Bootable Disk Encapsulation and Mirroring - How-to

(NOTE: if NOT planning to mirror to same size disk device, encapsulation
of bootable disk is not recommended by Sun Service Technical Support.)

Encapsulation of a disk means bringing it in under control of the Volume 
Manager software.  It does not disturb data, but just creates an 'overlay' 
on the disk so that the Volume Manager software has access to the device.

This mostly comes into play on our systems with the bootable system disk.
By placing the boot disk under control of this software, we can mirror it, 
giving us a backup copy to work or run from.  This is very desirable in
many system locations.  It increases the up-time of the system.

There are, however, some precautions that should be understood before one 
goes blindly into root encapsulation.  Depending on the version of Volume
Manager software you are running, some or all of the following rules
should be followed.

1) The VM must have 'room' to operate on this device (2 slices).
If this disk has all 7 usable slices in use, you will not be able to 
encapsulate it.  It needs two unused slices, one for the private region
and one for the public region.

2) It is preferable to give the VM the space it will need for the private 
region.  If you do not give it space, it will take what it needs from the
end of swap.  This may be fine, if you have plenty of swap space to hold 
both a kernel core dump and the actual private region info.  There may be
a problem if you must get a core dump, and do not have that much space.
The core dump may overwrite the private region, and if this disk is the only 
disk in the rootdg disk group, the VM software will not operate at all.

If possible, give the VM enough room at the very end of the disk.
This means making the last used slice a cylinder or two shorter, which
means re-labeling the disk.  (You will then have to restore the 
filesystems that were on this disk, so before you begin do a full backup of 
each filesystem.)  Also, if the last used cylinders on the disk are 'full' of
data, taking space from them might leave a hole in the filesystem.
Getting the boot disk set up correctly for encapsulation can be a very 
tedious and time-consuming operation.  Be careful of what you are
about to do.

By default this area needs to be 1024 sectors, but space is allocated in
whole cylinders.  This means that it will need 1 or 2 cylinders depending on
the footprint (size) of the disk.  If slice 6 is the last used slice, this
one needs to be made shorter.  It is a good idea to verify that you can take
this space from the slice BEFORE you begin.  If the filesystem on this slice
is near 90% full, you may want to rethink taking space from here.  Keep in
mind that when you shorten the length of this slice, you will lose the data
that was there.
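The rounding to whole cylinders is a ceiling division.  The sketch below
assumes the 1024-sector default from the text and takes the sectors per
cylinder from the prtvtoc "Dimensions" section; cyls_for_private is a
hypothetical name.  For the disks shown in this document (1008
sectors/cylinder) it gives 2 cylinders, which matches the example that follows.

```sh
# cyls_for_private (hypothetical helper): number of whole cylinders needed
# to hold the private region.
#   $1 = sectors per cylinder (from prtvtoc "Dimensions" or format)
#   $2 = sectors needed (optional, 1024 by default per the text above)
cyls_for_private() {
    spc=$1
    need=${2:-1024}
    echo $(( (need + spc - 1) / spc ))    # integer division rounded up
}
```

For example, `cyls_for_private 1008` prints 2, while a disk with 1512 sectors
per cylinder would need only 1.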

Let us suppose that we have the /var filesystem on slice 6 of our boot disk, 
and that this is the last used slice on our disk.  This filesystem is only
67% full, so we have plenty of space at the end of the filesystem.  This
being the case, let us proceed.  (For this example, I will use 2 cylinders.)

We must shorten this filesystem.  We do this with the format utility.
system # format
(select our disk number to work with)
.......gives us a listing of format options.....
(we must partition)
format> partition
....gives us a listing of partition options.....
(Let's first look at our slicing)
partition> print
....gives us a view of our slicing information.....may look like

Current partition table (original):
Total disk cylinders available: 2036 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders        Size       Blocks
  0       root    wm       0 -   40       20.18MB    (41/0/0)
  1       swap    wu      41 -  171       64.48MB    (131/0/0)
  2     backup    wm       0 - 2035     1002.09MB    (2036/0/0)
  3 unassigned    wm       0               0         (0/0/0)
  4 unassigned    wm       0               0         (0/0/0)
  5        usr    wm     172 - 1151      482.34MB    (980/0/0)
  6        var    wm    1152 - 2035      435.09MB    (884/0/0)
  7          -    wm       0      0        0         (0/0/0)

partition>
(now we select the slice we will modify, number 6)
partition> 6
.....gives us slice information....
Enter partition id tag[home]: 		(carriage return here)
Enter partition permission flags[wm]:   (carriage return here)
Enter new starting cyl[1152]: 		(carriage return here)
Enter partition size[891072b, 884c, 435.09mb]: 882c
(then we check it again, then label) 
partition> print
....gives us a view of our slicing information.....may look like

Current partition table (original):
Total disk cylinders available: 2036 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders        Size       Blocks
  0       root    wm       0 -   40       20.18MB    (41/0/0)
  1       swap    wu      41 -  171       64.48MB    (131/0/0)
  2     backup    wm       0 - 2035     1002.09MB    (2036/0/0)
  3 unassigned    wm       0               0         (0/0/0)
  4 unassigned    wm       0               0         (0/0/0)
  5        usr    wm     172 - 1151      482.34MB    (980/0/0)
  6        var    wm    1152 - 2033      434.11MB    (882/0/0)
  7          -    wm       0      0        0         (0/0/0)
partition> label
........

Notice that the ending cylinder number is now 2033, not 2035.  This
gives the VM software plenty of room to work.

3) Use the 'vxdiskadm' utility to encapsulate the boot device.
Use option number 2 for encapsulation.

4) To mirror, be sure you have a disk the same size as the 
bootable disk. It must not be sliced up.  Only the backup 
slice (slice 2) should be set up.
Again using 'vxdiskadm', select option 6 to mirror.


3.1a  How to 'unroot' your Bootable Disk

To remove the Volume Manager control of your bootable disk, 
run the 'vxunroot' script in the /usr/lib/vxvm/bin (or the /etc/vx/bin)
directory:
	# /usr/lib/vxvm/bin/vxunroot

Then run the 'vxedit' command to remove the references to the
'rootvol' and 'swapvol', etc. volumes that were created on the bootable disk:
	# vxedit -fr rm rootvol
	# vxedit -fr rm swapvol
	# vxedit -fr rm usrvol


3.2  Volume Manager is not seeing a disk device any longer

This can happen because you have replaced a disk device, or possibly because
the system was booted before a device in the SSA was ready.
In either case, it is a simple matter to 'ping' the device(s) and bring
them back 'on-line'.

There are two ways to accomplish this:
  - use 'drvconfig'     -- if the system has booted without all disks ready
                           [**With EXTREME caution, on a quiesced system only]
  - use 'vxdctl enable' -- if you have replaced a disk device, and Volume
                           Manager cannot find the new device.
(I prefer the vxdctl command, unless the disk has been flagged as off-line
to the system.)
These commands find what is out there and act accordingly.

The correct way to replace a failed device is to tell the Volume Manager
that you are replacing the component before you physically remove it.

Use 'vxdiskadm' option 4 to remove a component for replacement.  
(Very often this procedure is not followed, so the VM cannot find the 
disk.  You will get messages informing you that the device is not part of 
VM if you try to remove it, or that it already is part of VM if you try to 
add it.)

Once the VM software has been notified that the device is there, it is a 
simple matter to bring it back into use.  When you find that you have 
replaced a disk without first telling the VM software that you were going 
to do this, my recommended procedure is:
	1) Run 'vxdctl enable'.
	2) Run 'vxdiskadm' and select option 5 to replace the failed component.


3.3  Software and Firmware Differences, What You Need to Know

(Refer to section 7.0 for a copy of a support matrix)
We began with two Operating Systems (OS), so we had two versions of firmware
that corresponded with the OS.  It began to get a bit confusing, as we had 
many versions of SSA/Volume Manager, firmware and OS levels.  We have a 
matrix available for use that explains what goes with what.

I will list out the different versions:
  OS             SSA drivers      Volume Manager
  -----------    -----------      ---------------
  Solaris 2.3    1.x, 2.x         1.3, 2.x
  Solaris 2.4    2.0, 2.1         2.0, 2.1, 2.1.1
  Solaris 2.5    2.1              2.1.1

There have been a few revisions of firmware developed over the life of 
the SSA.  There are many reasons for so many revisions, which I will not 
go into in this forum.  Suffice it to say that each revision had a purpose, 
and for the most part was better than its predecessor.

We recommend that the latest revision be downloaded into all SSAs.  We have 
found historically that the latest revs of the firmware will repair or 
prevent many problems.  

(If you have a question regarding any version of firmware, please do not
hesitate to contact a Solution Center and ask about it.)

We are getting better, as now there are firmware revisions coming out that
are compatible with any OS*, and any version of SSA driver.

Firmware version numbers originally followed the SSA driver versions:
For version 1.3, the firmware versions are 1.xx
For version 2.0 or 2.1, the firmware versions are 2.xx
The new versions are 3.x; these are compatible with version 
1.x* and all version 2.xx SSA drivers.

*(At the time of the writing of this document, version 3.3 firmware is 
not yet available for the Solaris 2.3 OS.  Controllers that contain this
version are not compatible with Solaris 2.3 and cannot be used with it.)

There were some issues related to the age of the controller board and the 
ability to 'uprev' or 'backrev' the firmware.  The oldest boards will not 
allow some versions of uprev'ed firmware to download.  The newer boards
will not allow some older versions of firmware to be downloaded.  Yes,
this was a bit of a confusing problem!

The good news is that the current boards should work anywhere.

We have found that placing the NVRAM fastwrite tables into a known state 
before any downloading of firmware will enable the firmware to load correctly.

Also, all the instructions indicate a reset of the SSA as a required step.
This makes the SSA use the newly downloaded firmware.  The instructions
may tell you to power cycle the unit with the power switch to reset it,
but there is a better way: use the reset switch.

On the Array controller board, there is a reset switch.  This just 'rereads'
the firmware.  It is located between the DIN connector and the "Diag" switch.
It is large enough to use a pencil eraser to push it, and sits inside a raised
circle. 

If there was a problem with the firmware, and the unit is power cycled, you 
might see a failure (PMF displayed) and you might lose the connection to the 
host. When this happens, the array controller board must be replaced.

Here is some interesting information:
   -  SSA controller boards 501-2080-09 or higher and 501-2651-xx 
      were shipped with a newer SCSI controller chip from a second source 
      that makes them incompatible with older SSA firmware versions.

   -  On an SSA with Solaris 2.3 and an sbus with 1.33 Fcode version, the 
      sbus may not recognize the SSA if the SSA driver patch levels are not 
      high enough.

   -  The NVRAM procedure places the NVRAM fastwrite tables into a known 
      state and saves them properly.  This allows the SSA firmware to create 
      the correct checksum for the SSA firmware just loaded, thus eliminating 
      the PMF ERROR.

		The NVRAM Procedure is:
	a) Issue the appropriate command to enable the SSA Fastwrite option:

		(i) Solaris 2.4 Hardware 3/95 and later:

			/opt/SUNWssa/bin/ssaadm fast_write -s -e X
				where X is any disk drive

		(ii) All other Solaris versions:

			/opt/SUNWssa/bin/ssacli -s -e fast_write X
				where X is any disk drive

	b) Issue the appropriate command to disable the SSA Fastwrite option:

		(i) Solaris 2.4 Hardware 3/95 and later:

			/opt/SUNWssa/bin/ssaadm fast_write -s -d X
				where X is any disk drive

		(ii) All other Solaris versions:

			/opt/SUNWssa/bin/ssacli -s -d fast_write X
				where X is any disk drive

NOTE: It is NOT required to apply the above commands to all drives; apply
      them to any ONE disk drive only.
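The order above (enable, then disable, on a single drive) is easy to get backwards when switching between ssaadm and ssacli.  The following is a minimal Python 3 sketch, assuming Python is available wherever you prepare the commands.  It only builds the command lines and does not run them.  The function name and the disk name "c1t0d0" are hypothetical; X in the text is simply any one SSA disk.

```python
import shlex

SSA_BIN = "/opt/SUNWssa/bin"

def nvram_reset_commands(use_ssaadm, disk):
    """Build (not run) the enable/disable fast_write commands for ONE disk.

    use_ssaadm: True for Solaris 2.4 Hardware 3/95 and later (ssaadm),
                False for all other Solaris versions (ssacli).
    disk:       any one SSA disk drive (hypothetical name in the example).
    """
    if use_ssaadm:
        enable = [SSA_BIN + "/ssaadm", "fast_write", "-s", "-e", disk]
        disable = [SSA_BIN + "/ssaadm", "fast_write", "-s", "-d", disk]
    else:
        enable = [SSA_BIN + "/ssacli", "-s", "-e", "fast_write", disk]
        disable = [SSA_BIN + "/ssacli", "-s", "-d", "fast_write", disk]
    # Enable first, then disable: this leaves the NVRAM fastwrite tables
    # saved in a known state before the firmware download.
    return [enable, disable]

if __name__ == "__main__":
    for cmd in nvram_reset_commands(True, "c1t0d0"):
        print(shlex.join(cmd))
```

Returning argument lists rather than strings makes it easy to review the commands first and to pass them to a process runner later if you choose.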

CAUTION: Do not interrupt the download procedure once it has begun.  If
it has to be interrupted, DO NOT RESET the controller.  For example, if
you notice that you are loading the wrong version, let the download
complete, then download the correct one.  When the correct one has
completed, reset the SSA.


3.4  Hot Spares How They Work (Volume Manager)

A hot spare is a disk device that is kept 'in reserve' for use 
by the software only.  It is used in place of a disk that fails, but only 
if that disk is part of a mirror or raid-5 volume type.

Keep in mind that a normal striped volume or a simple (concatenated) volume
has no ability to recreate missing data, so a hot spare cannot help with
these volume types.  If one of these volumes has a disk failure, its data
must be restored from a backup source.

The software will use the remaining information from a volume to re-create 
the missing data from a failed disk.  In the case of a raid-5 volume, the 
remaining components are XOR'd with the parity to yield the missing data.
In the case of a mirror, it is simply copied over from the mirrored side.
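The RAID-5 case can be illustrated with a few bytes of made-up data.  This Python 3 sketch shows XOR parity only; it does not show how VM lays out stripes on disk.  Three data units and their parity stand in for one stripe of a 4-column volume, and the "failed" unit is rebuilt from the survivors.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical stripe: three data units plus one parity unit.
data = [b"SSA!", b"VxVM", b"RAID"]
parity = xor_blocks(data)

# Column 1 is lost.  XOR the remaining data units with the parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt)  # b'VxVM'
```

This works because XOR is its own inverse.  If parity = d0 ^ d1 ^ d2, then d0 ^ d2 ^ parity = d1.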

The way Volume Manager uses hot spares has caused some controversy.  Its
behavior is easier to understand once you look at how it actually operates.

A hot spare will not be placed into use unless the entire disk fails.  This
means that a failure in a single subdisk does not necessarily bring a hot
spare into play.

When a subdisk fails, VM tries a write operation in the private region of
that disk.  If this write succeeds, VM assumes the disk is 'good' and does
not use a hot spare.  If the write fails, VM assumes the whole disk has
failed and uses a hot spare.
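The decision just described can be modeled in a few lines.  This is a hypothetical Python 3 model of the documented behavior, not Veritas code.  The probe callable stands in for VM's test write to the private region.

```python
def hot_spare_decision(private_region_write_ok):
    """Model of VM's reaction to a subdisk failure (illustration only).

    private_region_write_ok: hypothetical callable that attempts a test
    write to the disk's private region and returns True on success.
    """
    if private_region_write_ok():
        # The disk still accepts writes in its private region, so it is
        # treated as 'good': it stays in place and no hot spare is used.
        return "keep disk"
    # The test write failed as well, so the whole disk is treated as
    # failed and a hot spare is brought in.
    return "use hot spare"

print(hot_spare_decision(lambda: True))   # keep disk
print(hot_spare_decision(lambda: False))  # use hot spare
```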

WHY?  Why does the whole disk have to fail?  The answer is very simple.
With this software we can create many volumes out of many components, in
many configurations.  We can have striped, raid-5, and simple volumes all in
the same disk group, and we can place these separate volume types on the
same disk.  A hot spare replacing that disk may cause many volumes to go
offline.  Veritas decided to replace a disk only when it has failed in both
the public region (the original subdisk failure) and the private region
(the test write).  Otherwise the disk is left as is.

Here is an example.  Let's say we have several volume types on one disk: a
simple volume subdisk, a striped volume subdisk, and a mirrored volume
subdisk.  A read failure occurs on the mirror subdisk, and hot sparing is
enabled.  Think about what would happen if the hot spare replaced this disk.
As the hot spare is put in place of the original disk, the simple and
striped volumes go down!  The hot spare disk has no mechanism to recover
their data, so those two volumes would have to be re-created and restored
from a backup source.  Because the VM software allows an extremely large
number of subdisks on one disk, many volumes could be affected this way.
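The example can be restated as a small Python 3 check.  It sorts the volumes that have subdisks on one disk into those a hot spare could rebuild and those that would need a restore from backup.  The volume names and layout labels are hypothetical.  Following the text above, only mirror and raid-5 layouts are treated as able to recreate data.

```python
# Layouts that can recreate the data from a replaced disk (per the text).
REDUNDANT = {"mirror", "raid5"}

def replacement_impact(subdisks):
    """Split the volumes with subdisks on one disk into two lists.

    subdisks: (volume_name, layout) pairs for one physical disk.
    Returns (rebuilt_by_spare, restore_from_backup).
    """
    rebuilt = sorted({vol for vol, layout in subdisks if layout in REDUNDANT})
    lost = sorted({vol for vol, layout in subdisks if layout not in REDUNDANT})
    return rebuilt, lost

disk = [("simplevol", "simple"), ("stripevol", "stripe"), ("mirvol", "mirror")]
print(replacement_impact(disk))
# (['mirvol'], ['simplevol', 'stripevol'])
```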


3.5  WWN (World Wide Number) and How to Change it

The World Wide Number is a unique number.  It was intended to remain unique
per SSA, because of the future possibility of accessing SSAs directly on a
network, much like an IP address on the Internet.

Since that future has not arrived (yet), there are times when it may be
necessary to change the WWN inside the SSA.

The last four digits of this address are displayed on the front-panel LCD
of the SSA.  The WWN is stored in a PROM on the Array Controller, so if the
Array Controller board is replaced, the SSA gets a new WWN.  Because the WWN
is effectively the address the system uses when it creates the device
trees, the SSA may then seem to have "disappeared" from your system.

How do you find the original address (WWN)?  You can pull it from the
messages file (find the boot cycle's Fibre Channel ONLINE message), find it
in the /devices directory structure, or follow the link in the /dev/dsk
directory:

	# ls -l /dev/dsk/c2t0d0s2
	/dev/dsk/c2t0d0s2 ->../../devices/io-unit@f,e1200000/sbi@0,0/
	SUNW,soc@1,0/SUNW,pln@b00008000,5438af/sd@0,0:c
                                   ^^^^ ^^^^^^
The WWN is a twelve (12) digit hexadecimal number, and it is represented in
an unusual way in the device path.  The upper four digits are the last four
digits to the left of the comma.  The digits to the right of the comma form
the lower eight digits and must be padded with leading zeros to fill 8
digits.  In the example above,

    the WWN is 8000005438af, not 0080005438af.
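The padding rule can be checked mechanically.  This Python 3 sketch applies the rule stated above to the pln component of a device path.  The function name is hypothetical, and the sample link is the one from the example.

```python
import re

def wwn_from_pln_path(path):
    """Return the 12-digit WWN encoded in a SUNW,pln@<hi>,<lo> component."""
    m = re.search(r"SUNW,pln@([0-9a-fA-F]+),([0-9a-fA-F]+)", path)
    if m is None:
        raise ValueError("no SUNW,pln@ component in path")
    hi, lo = m.groups()
    # Upper 4 digits: the last four digits left of the comma.
    # Lower 8 digits: the digits right of the comma, zero-padded to 8.
    return (hi[-4:].rjust(4, "0") + lo.rjust(8, "0")).lower()

link = ("../../devices/io-unit@f,e1200000/sbi@0,0/"
        "SUNW,soc@1,0/SUNW,pln@b00008000,5438af/sd@0,0:c")
wwn = wwn_from_pln_path(link)
print(wwn)       # 8000005438af
print(wwn[-4:])  # 38af, the part shown on the front-panel LCD
```

Comparing the last four digits with the LCD is a quick way to confirm that you have the right SSA.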

You then have a choice: use the new address, or change the controller back
to the old address.

To use the new address, remove the old address information and do a
reconfiguration boot, which places the new address where the old one was
(i.e., under the same cNtNdN names).  This means removing the listing in the
/devices/io.../s.... (and possibly also the controller links in /dev/dsk and
/dev/rdsk).  The subsequent reconfiguration boot should reconstruct a device
tree that points to the new controller address in place of the original.

To change the WWN on the Array Controller, use the following commands:

  Solaris 2.3:
	# ssacli -s -w <WWN> download cN

  Solaris 2.4 and above:       
	# ssaadm download -w <WWN> cN

Reset the SSA (use the reset switch).
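A mistyped WWN is written straight into the Array Controller, so it is worth validating the value before running the download.  This Python 3 sketch checks for exactly twelve hex digits and a cN controller name, then picks the command form for the Solaris release.  The function name and the (major, minor) release tuple are this sketch's own conventions.

```python
import re

def wwn_download_command(solaris_release, wwn, controller):
    """Build the WWN download command for a release such as (2, 4)."""
    if not re.fullmatch(r"[0-9a-fA-F]{12}", wwn):
        raise ValueError("WWN must be exactly 12 hexadecimal digits")
    if not re.fullmatch(r"c\d+", controller):
        raise ValueError("controller must look like cN, e.g. c2")
    if tuple(solaris_release) < (2, 4):      # Solaris 2.3
        return ["ssacli", "-s", "-w", wwn, "download", controller]
    return ["ssaadm", "download", "-w", wwn, controller]  # 2.4 and above

print(" ".join(wwn_download_command((2, 4), "8000005438af", "c2")))
```

Remember that the reset switch still has to be pressed afterward.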


3.5a  WWN How to Use the New One, instead of changing it.

You can use the new WWN that is on a new controller card by removing the 
references to the original in the device tree on the system, before doing a 
reconfiguration boot.

Remove the "SUNW,soc@N,N" directory and its contents from the
'/devices/io.....' directory structure of the device tree.  Also remove all
entries in /dev/dsk and /dev/rdsk for this controller's disks (for example,
all the entries for c5).

Since the location of the host adapter SBus card has not changed, 'boot -r'
simply rebuilds that controller's portion of the device tree with the NEW
address of the array board (pln), and creates the devices in /dev/dsk and
/dev/rdsk.

The VM software will use the same devices, since the cNtNdN device names
will not have changed.
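When removing the /dev/dsk and /dev/rdsk entries, the most common mistake is matching too much, for example catching c15 when you meant c5.  This Python 3 sketch is a dry run: it lists the entries for exactly one controller and deletes nothing.  The function names are hypothetical, and "c5" is only the example from the text.

```python
import os
import re

def controller_entries(names, controller):
    """Return the cNtNdNsN names that belong exactly to `controller`."""
    pattern = re.compile(re.escape(controller) + r"t\d+d\d+(s\d+)?")
    return sorted(n for n in names if pattern.fullmatch(n))

def planned_removals(controller, dev_root="/dev"):
    """Dry run: list (do not delete) the links to remove before 'boot -r'."""
    paths = []
    for sub in ("dsk", "rdsk"):
        directory = os.path.join(dev_root, sub)
        if os.path.isdir(directory):
            for name in controller_entries(os.listdir(directory), controller):
                paths.append(os.path.join(directory, name))
    return paths

if __name__ == "__main__":
    for p in planned_removals("c5"):
        print(p)
```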


3.6  Using the 'vxdiskadm' Utility

Many administrative operations can be performed through the use of this
utility.  

	# vxdiskadm

	Volume Manager Support Operations
	Menu: VolumeManager/Disk

 	1	Add or initialize one or more disks
 	2	Encapsulate one or more disks
 	3	Remove a disk
 	4	Remove a disk for replacement
 	5	Replace a failed or removed disk
 	6	Mirror volumes on a disk
 	7	Move volumes from a disk
 	8	Enable access to (import) a disk group
 	9	Remove access to (deport) a disk group
 	10	Enable (online) a disk device
 	11	Disable (offline) a disk device
 	12	Mark a disk as a hot-spare for a disk group
 	13	Turn off the hot-spare flag on a disk
 	list	List disk information

	?	Display help about menu
 	??	Display help about the menuing system
 	q	Exit from menus

	Select an operation to perform: 

3.7  How to Remove or Replace a Disk

When a disk is failing, or has failed, replace it.  Since the VM software
keeps its configuration information in the private regions of all its
disks, you must notify the software of the change that is about to take
place.  Otherwise the software keeps looking for the original disk, which
is no longer there, and will not allow you to add the new one in its place.

The command to use for this operation is:  
	# vxdiskadm

When you are ready to replace the disk device, first tell the VM software
that you are going to replace the disk.  Once the disk has been replaced,
simply tell the VM software that you have completed replacement.

(NOTE: if the disk holds a raid-5 or mirror component, recovery begins
automatically following this procedure.  Other volume types must be
restored from a backup source.)

Example:
	# vxdiskadm
	<utility menu options>

Begin with option 4 to remove for replacement.

	Remove a disk for replacement
	Menu: VolumeManager/Disk/RemoveForReplace

	  Use this menu operation to remove a physical disk from a disk
	  group, while retaining the disk name.  This changes the state
	  for the disk name to a "removed" disk.  If there are any
	  initialized disks that are not part of a disk group, you will be
	  given the option of using one of these disks as a replacement.

	 disk name [<disk>,list,q,?] 
Enter the disk name, such as disk12 or sybase03.  You will be asked further
questions to confirm the operation you are about to perform.

Replace the disk, then:

Select option 5 to replace a failed disk:

	Replace a failed or removed disk
	Menu: VolumeManager/Disk/ReplaceDisk

	  Use this menu operation to specify a replacement disk for a disk
	  that you removed with the "Remove a disk for replacement" menu
	  operation, or that failed during use.  You will be prompted for
	  a disk name to replace and a disk device to use as a replacement.
	  You can choose an uninitialized disk, in which case the disk will
	  be initialized, or you can choose a disk that you have already
	  initialized using the Add or initialize a disk menu operation.

	Select a removed or failed disk [<disk>,list,q,?] 

Enter the name of the disk that you removed (for example, disk12).  The
access names of the available replacement devices are then listed:

	  The following devices are available as replacements:

	  	c1t5d2s2   
	
	  You can choose one of these disks to replace disk16.
	  Choose "none" to initialize another disk to replace disk16.

	Choose a device, or select "none"
	[<device>,none,q,?] (default: c1t5d2s2) 

Then continue to answer the questions verifying operation.


3.8  Boot Issues

One of the most frustrating occurrences with a system is when it will not
boot up.  This problem may be compounded if the boot disk is encapsulated 
for use by the Volume Manager.  Of course if it is encapsulated, it should 
be mirrored.  If the primary boot disk will not boot up, then the mirror boot 
disk should allow a boot cycle to complete and bring the system up.

If the system will not boot from either disk device, the alternative is to
remove the Volume Manager references for the system's filesystems, so that
they are mounted from the underlying disk slices.
(Refer to section 2.4.2 for prtvtoc information.)

Most systems can boot without the Volume Manager software running.  If the
underlying filesystem slices still exist on the primary boot disk, the
following procedure can be used.  (If not, follow the recovery procedure in
the User's Guide (version 2.x: page E-34), beginning with either
restoration of the filesystems or re-installation of the OS.)

Boot the system from your OS CD-ROM in single-user mode (ok> boot cdrom -s).
Run a filesystem check (fsck) on the root slice of the boot disk, mount it
on the /a mount point, and "cd" to /a/etc.

Perform one of the following procedures: 

A)	Replace the current vfstab file with the original copy from before the
	VM software was installed.  Then comment out any entry in the
	/etc/system file that refers to a 'bootdev' (remember to use '*' as
	the comment symbol).

	When the 'vxinstall' program is run, it copies your original
	vfstab file to a file called:  vfstab.prevm

	You can move your current file to another filename and copy this
	".prevm" file to vfstab.  Then unmount the filesystem (/a) and boot
	from your boot disk.  (Subsequent boot issues are not discussed here.)

	   1) # mv vfstab vfstab.org
	   2) # cp vfstab.prevm vfstab
	   3) # cd /
	   4) # umount /a
	   5) # halt 
	 ok> boot

	If this '.prevm' file does not exist, you must hand-edit your 
	current vfstab file.  ***CAUTION*** copy it first to another filename.

B)	Hand-edit the vfstab file to remove all references to "/dev/vx"
	devices to mount:

	   1) Replace the entries for the bootable filesystems with the direct
	      slices to mount.

#device		device		mount		FS	fsck	mount	mount
#to mount	to fsck		point		type	pass	at boot	options
#
#/dev/dsk/c1d0s2 /dev/rdsk/c1d0s2 /usr		ufs	1	yes	-
/proc		-		/proc		proc	-	no	-
fd		-		/dev/fd		fd	-	no	-
swap		-		/tmp		tmpfs	-	yes	-
	
/dev/dsk/c0t3d0s0	/dev/rdsk/c0t3d0s0	/	ufs	1	no   -
/dev/dsk/c0t3d0s6	/dev/rdsk/c0t3d0s6	/usr	ufs	1	no   -	
/dev/dsk/c0t3d0s3	/dev/rdsk/c0t3d0s3	/var	ufs	1	no   -	
	
	etc.....

	   2) # cd /
	   3) # umount /a
	   4) # halt
	 ok> boot
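Both procedures are plain text edits, so you can rehearse them on saved copies of the files before touching the real ones.  The Python 3 sketch below works on lists of lines.  You must supply the mapping from /dev/vx devices to slices yourself, for example from vfstab.prevm or prtvtoc output.  All device and volume names in it are hypothetical.  The keyword passed to comment_out should be whatever your /etc/system actually contains; 'bootdev' is the name used in the text.

```python
def comment_out(lines, keyword):
    """Procedure A: prefix '*' to /etc/system lines mentioning keyword."""
    return ["*" + line if keyword in line and not line.startswith("*") else line
            for line in lines]

def replace_vx_devices(lines, slice_for):
    """Procedure B: point /dev/vx vfstab entries at the direct slices.

    slice_for maps a /dev/vx/dsk device to its slice, e.g.
    {"/dev/vx/dsk/rootvol": "c0t3d0s0"} (hypothetical names).
    """
    out = []
    for line in lines:
        fields = line.split()
        if len(fields) == 7 and fields[0].startswith("/dev/vx/dsk/"):
            slc = slice_for[fields[0]]       # KeyError if one is missing
            fields[0] = "/dev/dsk/" + slc
            if fields[1] != "-":             # swap has no fsck device
                fields[1] = "/dev/rdsk/" + slc
            line = "\t".join(fields)
        out.append(line)
    return out
```

Comment lines and non-VM entries such as /proc pass through unchanged.  A missing mapping raises an error instead of silently leaving a /dev/vx entry behind.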

(Other boot issues are not discussed here.)



3.9  SSA fast_write feature versus PrestoServe

You can use either the SSA fast_write capability, or Prestoserve,
to cache write data.  You can even use both at the same time,
although it is not cost-effective.  Presto + SSA fast_write
is slightly faster than either alone, but the differential
is not worth the price.

Overall, the SSA fast_write feature is more general, is more scalable
(since it is in every SSA, not just in the host), and is safely
multi-ported.  SSA fast_write is therefore the preferred solution where
available.


3.9.1  Prestoserve with host-based RAID (SDS/VxVM) products

If you use Prestoserve with either Solstice DiskSuite (SDS) or
SPARCstorage Array Volume Manager (VxVM), some preparation is required.

With VxVM, follow the product notes on this topic (a copy is in
INFODOC 13492) regarding which volumes/filesystems may be prestoized, and
the order in which Prestoserve should be started relative to those
products.

See also the SDS 4.0 jumbo-patch README or relevant product-note, for
similar information for SDS.


3.10  RAID-5 Information

One of the questions we most often are asked in regard to Raid-5 configuring 
is "How can I tell how much overhead the parity will take up on my volume?  
I must know how big a volume I can create."

There are a couple of things you can do here.  One is to use the 'vxassist'
command to help you determine how much space is available in a disk group
for use with a Raid-5 configuration.

vxassist has an option, undocumented in the manual page but documented in the 
help output, that can be used to find out the maximum size in K that a volume 
can be made in a given diskgroup.

# vxassist help usage
vxvm:vxassist: INFO: vxassist - Perform simple volume administration
 
vxvm:vxassist: INFO: 
    Usage: vxassist [-g diskgroup] [-U usetype] [-d file] [-nbf] keyword arg...
Recognized keywords:
    make volume-name len [attrs...]
    mirror volume-name [attrs...]
    addlog volume-name [attrs...]
    move volume-name [attrs...]
    growto volume-name new-length [attrs...]
    growby volume-name length-change [attrs...]
    shrinkto volume-name new-length [attrs...]
    shrinkby volume-name length-change [attrs...]
    snapstart volume-name [attrs...]
    snapwait volume-name
    snapshot volume-name snapshot-name
    [-p] maxsize [attrs...]
    [-p] maxgrow volume-name [attrs...]

For example, to find out what size we can use for a raid-5 volume, use the 
following command:
	# vxassist -p -g datadg maxsize layout=raid5 max_nraid5column=4
The result would be listed in 'k'.

Another way is to use the rule of thumb that the parity takes up about
1/(number of columns) of the total space in the volume.  So the formula is:

	volume size = totaldiskspace - (1/#ofcols * totaldiskspace)
	4000m - (1/4 * 4000m) = 4000m - 1000m = 3000m  (or about 3gb)

Within one SSA, the maximum number of columns used would be 6, and a
RAID-5 volume needs at least 3 columns, so the parity overhead should fall
between 1/6 and 1/3 of the total.  Subtract this amount from the total,
and the balance should be close to the size that may be used for the
volume and still fit.
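The rule of thumb is simple enough to tabulate for every column count you might use.  This Python 3 sketch applies the same formula, total - total/ncols.  It is an estimate only; 'vxassist maxsize' remains the authoritative answer for a given disk group.  The 3-column minimum is the lower bound implied by the 1/3 figure above.

```python
def raid5_usable(total, ncols):
    """Rule-of-thumb RAID-5 volume size: total minus 1/ncols for parity."""
    if ncols < 3:
        raise ValueError("a RAID-5 volume needs at least 3 columns")
    return total - total / ncols

print(raid5_usable(4000, 4))  # 3000.0 (same units as total, here MB)

# Range within one SSA: 3 columns (1/3 parity) up to 6 columns (1/6 parity)
for ncols in range(3, 7):
    print(ncols, round(raid5_usable(4000, ncols)))
```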

3.11  ROOTDG Recovery

(Refer to SRDB 12072 and 11136.)
When the rootdg is not able to run, the vxconfigd will not start; when this 
happens, the Volume Manager software will not run at all.

Most screen messages will report a vxconfigd: error or a VxVM error, and 
most will point to the fact that a disk group is not correct or usable.

To recover, you can either re-encapsulate the bootable disk, or use a single
unused slice of a disk to hold the rootdg disk group.  Once you have decided,
follow the matching procedure below.

If the only rootdg disks were the bootable disk and its mirror, you should
re-encapsulate the bootable disk.  You can use the vxinstall utility to do
this, but please use it with caution.  Use the "custom" installation.  When
the 'c0' disks are listed, tell vxinstall to handle them individually;
select 'encapsulation' for the bootable disk only, and select 'leave these
disks alone' for all the rest on 'c0'.  For all other controllers that are
listed, tell vxinstall to 'leave these disks alone'.  DO NOT BREAK OUT OF
THIS UTILITY (e.g., with control-C).  This places the bootable disk in
rootdg; from there the daemons can be restarted manually (see below, or
refer to Appendix E-38).

To use a slice of a disk to hold the rootdg disk group, use the following
procedure:

(Be sure that you have a slice that can be used, one that contains no data.
If need be, use the format utility to create a slice of approximately 1024
sectors: one or two cylinders, depending on the size of the disk.)

Example:  (we are using slice 7 of one of the disks on our system)

1. Disable transactions:
	# vxconfigd -m disable
	# ps -ef | grep vxconfigd
    root    58     1 80 10:08:39 ?        0:01 vxconfigd -m disable
    root   520   328  4 10:35:09 pts/0    0:00 grep vxconfigd

2. Initialize the database:
	# vxdctl init

3. Make a new rootdg group:
	# vxdg init rootdg

4. Add a simple slice:
	# vxdctl add disk c0t1d0s7 
vxvm:vxdctl: WARNING: Device c0t1d0s7: Not currently in the configuration

(Note: this warning is normal)

5. Add disk records:
	# vxdisk -f init c0t1d0s7 

6. Add the disk name to the rootdg disk group:
	# vxdg adddisk c0t1d0s7

7. Enable transactions:
	# vxdctl enable

(Note:  You might need to bring the daemons up with the complete startup 
procedure:)
	# vxiod set 10
	# vxconfigd -m disable    (this should be done already...)
	# vxdctl init
	# vxdctl enable
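Because the seven steps must run in order against the same slice, some sites script them.  This Python 3 sketch only builds the command lists from steps 1-7, with a basic check on the slice name.  It does not run anything, so you can review the output before typing each command.  The function name is hypothetical.

```python
import re
import shlex

def rootdg_on_slice(slice_name):
    """Steps 1-7 above, in order, for one unused slice (built, not run)."""
    if not re.fullmatch(r"c\d+t\d+d\d+s[0-7]", slice_name):
        raise ValueError("expected a slice name such as c0t1d0s7")
    return [
        ["vxconfigd", "-m", "disable"],          # 1. disable transactions
        ["vxdctl", "init"],                      # 2. initialize the database
        ["vxdg", "init", "rootdg"],              # 3. make a new rootdg group
        ["vxdctl", "add", "disk", slice_name],   # 4. add the simple slice
        ["vxdisk", "-f", "init", slice_name],    # 5. add disk records
        ["vxdg", "adddisk", slice_name],         # 6. add it to rootdg
        ["vxdctl", "enable"],                    # 7. enable transactions
    ]

if __name__ == "__main__":
    for cmd in rootdg_on_slice("c0t1d0s7"):
        print("#", shlex.join(cmd))
```

The WARNING printed by step 4 is still expected when you run the real commands.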



3.12  Loss of Disk Group Configuration Information; How to Save the 
Information (Refer to SRDB 12006)

The Volume Manager keeps all of its configuration information in the private
region on all disks under its control.  If the configuration information 
on disk cannot be read by the VM software, that disk group will not be 
available for use.  This can be devastating to a system, and may require
days' worth of data restoration from tape.

This is not an area of a disk that can be 'backed up' in the conventional 
manner for restoration if the need arises.  So we are often asked what
can be done to 'keep' this information for use "in case".

There is a way to get copies of this information in a format that the VM
software can use if needed: run two commands, saving their output to two
separate files, then keep copies of these files handy.

   Get a copy of the existing devices:
	# vxdisk list > <somefilename>

   Get a copy of the configuration in a format that can be used:
     (This will be run for each disk group, and after any configuration
      changes are made, must be run again to get the current info.)
	# vxprint -g <diskgroupname> -hmvps > <otherfilename>
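Since the vxprint output must be refreshed for every disk group after every configuration change, it helps to generate the lines in one place.  This Python 3 sketch builds the save commands and the matching vxmake restore line.  The output directory and file-naming scheme are hypothetical.  The restore line assumes vxmake's -g option to select the disk group; check vxmake(1M) on your release.

```python
import shlex

def save_config_commands(diskgroups, outdir):
    """Shell lines that save the vxdisk listing and each group's vxprint."""
    lines = ["vxdisk list > " + shlex.quote(outdir + "/vxdisk.list")]
    for dg in diskgroups:
        out = outdir + "/vxprint." + dg
        lines.append("vxprint -g %s -hmvps > %s" % (shlex.quote(dg), shlex.quote(out)))
    return lines

def restore_command(diskgroup, outdir):
    """The matching vxmake line for recreating one group's configuration."""
    return "vxmake -g %s -d %s" % (shlex.quote(diskgroup),
                                   shlex.quote(outdir + "/vxprint." + diskgroup))

for line in save_config_commands(["rootdg", "datadg"], "/var/tmp/vxsave"):
    print("#", line)
```

Keep a copy of the output directory off the system as well.  A copy that lives only on the disks it describes does not help when those disks are unreadable.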

In the event of a disk group configuration problem, the outputs of the
two commands above can be used to recreate it.

1.	Use the vxdisk list file to re-initialize the disks.  Be sure to 
	name the correct devices as the correct disk names (for the disk 
	group), etc.  Make sure when you have finished that the current 
	vxdisk listing looks identical to the original file.

2.	Feed the saved vxprint output to the vxmake command.  This recreates
	the configuration records in the private regions of the disks in
	this disk group.
	# vxmake -g <diskgroupname> -d <otherfilename>



3.13  IOSTAT Output; How to find the listed "ssd" device it reports

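The iostat command reports disks as "ssdN", where N is the 'instance' number
of the device.  To find the actual device, look up that instance number in
the /etc/path_to_inst file; for example, for ssd128:

	# grep 128 /etc/path_to_inst

A plain grep also matches any line that merely contains those digits, so
check that 128 appears as the instance-number field after the quoted path.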
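A stricter lookup compares the instance field itself instead of grepping for digits.  This Python 3 sketch parses path_to_inst lines of the form "path" instance, with an optional third "driver" field.  When the driver field is absent, it falls back to the node name of the last path component, which is an assumption.  The sample lines are made up for illustration.

```python
import shlex

def find_instance(path_to_inst_text, driver, instance):
    """Return the physical paths recorded for driver<instance>."""
    hits = []
    for raw in path_to_inst_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = shlex.split(line)           # strips the double quotes
        if len(fields) < 2 or not fields[1].isdigit():
            continue
        path, inst = fields[0], int(fields[1])
        if len(fields) > 2:
            drv = fields[2]                  # explicit driver name field
        else:
            # No driver field: use the node name of the last component,
            # e.g. ".../ssd@0,0" -> "ssd".
            drv = path.rsplit("/", 1)[-1].split("@", 1)[0]
        if inst == instance and drv == driver:
            hits.append(path)
    return hits

sample = '''#
"/io-unit@f,e1200000/sbi@0,0/SUNW,soc@1,0/SUNW,pln@b00008000,5438af/ssd@0,0" 128
"/io-unit@f,e1200000/sbi@0,0/SUNW,soc@1,0/SUNW,pln@b00008000,5438af/ssd@0,1" 129
"/io-unit@f,e1200000/sbi@0,0/dma@0,81000/esp@0,80000/sd@3,0" 3 "sd"
'''
print(find_instance(sample, "ssd", 128))
```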

 

3.14  Dual-Hosted SSAs and Simulation of a "failover" on Dual-Host

This is strictly a manual operation on normal Solaris systems.

The standard Veritas software is not designed for true dual-host
connections, and neither is the Solaris OS.  Neither piece of software has
any mechanism to verify what is where, who is up, and who owns what.  (This
remains true even with third-party software such as FirstWatch or
OpenVision running.)

In VM software, the rootdg cannot be moved from one system to another; so
to connect two hosts to one SSA, other disk groups must be involved.

Ground rules for Dual-Host configurations:

1) You MUST have disk groups other than rootdg; preferably, no SSA disks
	should be in the rootdg of either (any) system.

2) Only one system can `own' a disk group at any given time.

3) No cross-system mirroring is allowed, since each disk group is owned
	exclusively by one system.

4) It is best to keep the controller configurations the same on both
	systems; this way the device names remain the same.

5) If one system dies, the ONLY way a fail-over will occur is through a
	manual operation.  There is no automatic fail-over mechanism in the
	normal VM software, nor in Solaris 2.x.

This means that diskgroups must be forcibly imported onto the other system.

To run a 'fail-over' test:

1.	 Set up the two systems with all the software and current patches. 

2.	 On the first system (System-A), configure the SSA and Volume Manager
	 software.

3.	 Make some test volumes with data, etc.  Bring this system down.

4.	 Move the SSA connection to System-B, if necessary.

5.	 On the second system (System-B), run 'boot -r' to allow the
	 system to configure the SSA devices.
	   Remove the /etc/vx/reconfig.d/state.d/install-db file.
	   (Refer to Appendix E-38 of the SSA User's Guide.)
	   Manually start the daemons, specifying the System-A hostname, to
	   get the rootdg up on System-B:
	# vxiod set 10
	# vxconfigd -m disable
	# vxdctl init <originalhostname>	(in our case:  System-A)
	# vxdctl enable
	   Import all other disk groups.

The Volume Manager should be completely up and running now.
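Step 5 is the part most often done out of order during a test, so here it is as an ordered list.  This Python 3 sketch builds, but does not run, the commands for taking over on System-B.  The host and disk group names are hypothetical.  It uses a plain 'vxdg import'; the text notes that groups may have to be forcibly imported, so check vxdg(1M) for the force option on your release rather than relying on this sketch.

```python
import shlex

def takeover_commands(original_host, diskgroups):
    """Commands for step 5 above, in order (built, not run)."""
    cmds = [
        ["rm", "-f", "/etc/vx/reconfig.d/state.d/install-db"],
        ["vxiod", "set", "10"],
        ["vxconfigd", "-m", "disable"],
        ["vxdctl", "init", original_host],   # hostname recorded on the disks
        ["vxdctl", "enable"],
    ]
    # "Import all other disk groups"; see vxdg(1M) for forcing an import
    # when the other host still appears to own the group.
    cmds += [["vxdg", "import", dg] for dg in diskgroups if dg != "rootdg"]
    return cmds

if __name__ == "__main__":
    for cmd in takeover_commands("System-A", ["rootdg", "datadg"]):
        print("#", shlex.join(cmd))
```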



3.15  Moving an SSA to another system.
(See also the procedure in section 3.14.)

  1)  The system does not have any SSA devices on it currently:
Do a reconfiguration boot so that this system can 'see' this SSA.

When moving an SSA to another host, the Volume Manager must be started
manually, specifying the original host name.

If this host name is unknown, you can get it from the private region with
the following command:
	# /etc/vx/diag.d/vxprivutil list /dev/rdsk/cXtYdZs2
   Look for the hostid information.  (example: hostid:  unix)

Once you have the original host name, you can use the four-command daemon
startup procedure in step 5 of section 3.14 (refer to Appendix E-38 in the
SSA User's Guide).

If this SSA is going to stay on this system, you may want to change the
host name recorded on its disks to the current host name.  To do this, use
the 'hostid' option of the daemon control command:
	# vxdctl hostid <hostname>


  2) The system has SSA devices on it currently:
When moving an SSA onto a system that already has SSAs, remember that the
system already has a rootdg.  If there is a rootdg on the 'new' SSA, you
will have to import it as a new disk group.

Remember to do a reconfiguration boot so that the system can 'see' this
SSA.

For all disk groups other than rootdg, use an import command to bring them
online on this system.

For the rootdg disk group, import it under a new disk group name.  Before
removing the SSA from the original system, obtain the id string of the disk
group, if possible, with this command:
	# vxdisk -s list
Otherwise, use the vxprivutil command to scan a disk and find the 'dgid'
information there:
	# /etc/vx/diag.d/vxprivutil scan /dev/rdsk/c4t3d0s2

Choose a disk that is listed as part of 'rootdg' and find the line that
begins with "dgid: ".  This is the id string for the disk group.
  [Example: dgid:   832724095.1025.systemb]
Record this information for use after moving the SSA onto the new system.
To bring in this SSA's rootdg configuration, import it with a new disk
group name:
	# vxdg -n <newdiskgroupname> import <idstring>
	 [Example:  vxdg -n sysbdg import 832724095.1025.systemb]
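If you saved the vxprivutil output to a file before moving the SSA, the import line can be built from it directly, which avoids retyping the long id string.  This Python 3 sketch assumes only that the saved output contains "key: value" lines like the "hostid:" and "dgid:" examples quoted above; the full output format is not reproduced here.  The function names and the sample text are hypothetical.

```python
import re

def private_region_value(text, key):
    """Return the value on the first 'key: value' line of the output."""
    m = re.search(r"^\s*" + re.escape(key) + r":\s*(\S+)", text, re.MULTILINE)
    return m.group(1) if m else None

def rename_import(text, new_dg):
    """Build 'vxdg -n <new_dg> import <dgid>' from saved vxprivutil output."""
    dgid = private_region_value(text, "dgid")
    if dgid is None:
        raise ValueError("no 'dgid:' line found")
    return ["vxdg", "-n", new_dg, "import", dgid]

saved = """hostid:  unix
dgid:   832724095.1025.systemb
"""
print(rename_import(saved, "sysbdg"))
```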


3.16  Fail-Over Simulation for testing hot spares with Mirrors and Raid-5

There is a correct way and an incorrect way to test this fail-over
capability inside a model 1xx desktop SSA.

The incorrect way is to pull out a tray.

The best way to test this is to set up your volume with a disk in each of
the three trays (minimum), and set up a hot spare disk somewhere.  Create a
filesystem on the volume, and mount it.  Then:

	1- Begin a lengthy I/O session to the mounted filesystem.
	2- While it runs, use the format utility to remove the slice
	   information from one disk in the volume and relabel it:

	format -->  select a disk from the volume -->  partition -->
	make slices 3 and 4 begin at 0 with a length of 0 -->
	label the disk

By removing the references to both the public and private regions on the
disk, you make the VM software assume a full-disk failure, and it begins
to bring the hot spare disk into place.


------------------------------------------------------------------------------
4.0	FAQ's

Please contact your local Sun office for a list of FAQ's.

------------------------------------------------------------------------------
5.0  	Patches

5.1  SPARCstorage Array, Veritas Volume Manager PATCHLIST
	(Please refer to the SunSolve system or CDrom for the current version.)

Solaris 2.3	VM 1.3
	101765	SunOS 5.3: disks program supports only 16 drives per controller
	102198	vxva 1.3: patch to fix known problems with 1.3 vxva
	102199	vxvm 1.3: patch to fix known problems with 1.3 vxvm
	102368	SPARCstorage Array 1.0: bug fixes for firmware 1.9 on SSA softw
	102408	SPARCstorage Array 1.0: Jumbo patch for SSA drivers

Solaris 2.3	VM 2.0
	101765	SunOS 5.3: disks program supports only 16 drives per controller
	102301	Volume Manager 2.0: log replay problems with raid5 write entrie
	102400	SPARCstorage Array 2.0: Jumbo patch for SSA drivers

Solaris 2.3	VM 2.1
	101765	SunOS 5.3: disks program supports only 16 drives per controller
	102403	Volume Manager 2.1: Volume Manager Visual Administrator Fixes
	102465  SPARCstorage Array 2.1: Jumbo patch for SSA
	
Solaris 2.4	VM 2.0
	102283	SunOS 5.4: disks program supports only 32 drives per controller
	102446	SunOS 5.4: format fix
	102301	Volume Manager 2.0: log replay problems with raid5 write entrie
*HW1194	102347	SPARCstorage Array 2.0: Jumbo patch for SSA
**HW395	102432	SPARCstorage Array HW395: Jumbo patch for SSA drivers
***	103290	SPARCstorage Array 2.0: SSA Jumbo patch for Solaris 2.4 11/94,HW395

Solaris 2.4	VM 2.1
	102283	SunOS 5.4: disks program supports only 32 drives per controller
	102446	SunOS 5.4: format fix
	102403	Volume Manager 2.1: Volume Manager Visual Administrator Fixes
*HW1194	102347	SPARCstorage Array 2.0: Jumbo patch for SSA
**HW395	102432	SPARCstorage Array HW395: Jumbo patch for SSA drivers
***	103290	SPARCstorage Array 2.0: SSA Jumbo patch for Solaris 2.4 11/94,
                HW395

Solaris 2.5	VM 2.1.1
	103017	SPARCstorage Array Solaris 2.5: Point patch for SSA
	

------------------------------------------------------------------------------
6.0  	Bugs & RFE's - Known Problems

6.1  VXVA Core Dumps when Reconnecting a Disk

 Bug Id:     1242519
 Category:  pluto
 Subcategory:  vxvm_va
 State:  dispatched
 Release summary: 2.1.1
 Synopsis:  VXVA core dumps when reconnecting a disk that vxconfigd doesn't 
            see

 Description:
Solaris 2.5, vxva 2.1.1 on a sun4d 2000: vxva core dumps.

Dual-hosted SSA.

A disk goes bad.  Not knowing there is a disk failure, systemA is
rebooted.  Because the disk is unreadable, it is not in vxconfigd's
configuration on systemA.  SystemB has not been rebooted, so it still sees
the disk.  The disk is replaced and initialized from systemB.  Back on
systemA, an attempt is made to run advanced-ops > diskgroup > reconnect
through vxva, and vxva core dumps.
 Workaround:
On SystemB, reinitialize with 'vxdctl disable' followed by 'vxdctl enable'.
After that, the "advanced-ops > diskgroup > reconnect" operation
works without error.


------------------------------------------------------------------------------
7.0  	All Other References Available

 
7.1  FRU part numbers
Explanation of the Model numbering:

XXX = Type of SSA; CPU type; disks
  Model 100 = table-top box; Sparc chip; .5gb disks
        101 = table-top box; Sparc chip; 1.5gb disks
        102 = table-top box; Sparc chip; 2.1gb disks
        112 = table-top box; Swift chip; 2.1gb disks

        200 = Rack mount box; Sparc chip; <no disks>
        210 = Rack mount box; Swift chip; <no disks>

FRUs
 Description						Part Number
----------------------------------------------------------------------------
MODEL (10x):
Front Panel Assy.					540-2382-xx
Fan Tray						540-2573-xx
Chassis Enclosure  	
		Plug Cover				330-1589-xx
		Chassis enclosure			340-2670-xx
		Side panels (2)				330-1470-xx
		Foot (4)				330-1590-xx
		Top/Bottom cover (2)			330-1469-xx
Subsystem backplane					501-2029-xx
Array Controller
	 (without Battery or FC/OM)	      		501-2080-xx
Power Supply						540-2465-xx
Disk Tray						540-2245-xx


MODELs 11x and 21x:  (Pluto-II)
Array Controller 11x (with Battery and FC/OM)		501-2982-02
Array Controller 21x (with Battery and FC/OM)		501-3024-02
Array Controller 11x (without Battery or FC/OM)		501-2872-03
Array Controller 21x (without Battery or FC/OM)		501-3021-03


7.2  Documentation Part Numbers

Part Number:	Document:
------------	--------- 

Version2.x:
802-2041-10	SPARCstorage Array Configuration Guide
802-2042-10	SPARCstorage Array User's Guide
801-2205-12	SPARCstorage Array Installation Manual
801-2206-12	SPARCstorage Array Service Manual

==========================================================================
Version 1.x:
801-7838-11	SPARCstorage Array Product Note
801-2204-11	SPARCstorage Array User's Guide
802-1242-10	SPARCstorage Volume Manager System Administrators Guide
801-6530-10	SPARCstorage Array Configuration Guide
801-2205-11	SPARCstorage Array Installation Manual
801-2206-11	SPARCstorage Array Service Manual
801-2207-10	Disk Drive Installation Manual for the SPARCstorage Array 
801-6313-10	Fibre Channel SBus Card Installation Manual 
801-6326-10	Fibre Channel Optical Module Installation Manual 
801-6306-11	Fibre Optic Cable Product Note 
801-7103-10 	SPARCstorage Array Regulatory Compliance Manual 
801-7173-10	Power Cord selection Product Note 


7.3  White Paper Locations:

Sun internal:
All white papers reside on the newstop FTP site, or on the web page:
	http://www.corp/prodmktg/pme/newstop/whitepapers

Veritas home page:
	http://www.veritas.com


------------------------------------------------------------------------------
8.0  	Supportability
	A statement clarifying what is and is not supported.

General Summary for Support of the SPARCstorage Array and VM Software

* Answer specific questions and assist in troubleshooting specific installation
issues. Answer specific configuration questions. {We will refer a customer to
a service provider for installation and configuration issues too detailed for
telephone support or requiring hand-holding assistance.}

* Answer questions regarding product usage. {We will refer customers to
training or to a service provider for issues too detailed for telephone
support or requiring hand-holding assistance.}

* Assist customers in debugging and troubleshooting problems specific to Sun
systems and Sun products. {We cannot guarantee a solution for problems with
non-Sun products, and will generally refer a customer to a service provider
for these issues.}

* File bug reports and escalate product defects to engineering to obtain a
fix or workaround. We may require the customer to supply a code sample of
100 lines or fewer that reproduces the defect. {Warranty calls will not be
escalated through CTE.}

* For performance and third-party issues, we will provide the customer with
general how-to-debug information and refer them to an appropriate service
provider.

Service providers may include, but are not limited to:
ITOPS, Sun Integration, the local Sun office (SSE), third-party equipment or
software vendors, Sun Education Services, or NASC T&M assistance.

------------------------------------------------------------------------------
9.0  	Additional support information

Veritas 
	415-335-8000
	800-258-8649

RAID-5 configuration information:
	http://www.sun.com/sunworldonline/swol-09-1995/swol-09-raid5.html



9.1 Veritas Tech Note 8889 (1995)

Using Prestoserve with Volume Manager

Prestoserve is a Sun Microsystems product designed to accelerate the
performance of filesystems, particularly on a server that exports
filesystems via NFS. This is accomplished through NVRAM hardware and the
Prestoserve drivers. The hardware provides a fast, non-volatile, solid-state
write-back cache that allows writes to a disk device to be reported to the
user as complete before the data reaches the disk.

This mechanism can be configured to work below VxVM, as a direct replacement
for the disk devices that VxVM uses. This approach presents no particular
problems for VxVM, which remains unaware of the underlying cache device. In
the event of a failure of the NVRAM devices, however, data can be lost,
since the disks backing the NVRAM may not be up to date.

Prestoserve can, however, be configured to run above VxVM, in such a way
that VxVM volumes take the place of the disks that Prestoserve controls. In
this situation, VxVM has a number of problems to address.

One problem is Prestoserve's use of disk device numbers. Some applications
(including Prestoserve) record device numbers and expect them to stay the
same across reboots. VxVM attempts to keep device numbers constant across
reboots, but if a different combination of disk groups is imported, a
conflict of minor numbers can be detected. In this case, the conflicting
devices in the later import are renumbered to a new minor number range. The
GA release of this product will provide a new mechanism for setting minor
number ranges for a disk group, which will give the user a reliable way to
avoid this problem.

The danger of VxVM changing its device numbers on a reboot following a 
system failure is that Prestoserve may flush its dirty buffers to the wrong 
volume devices.  This can have destructive results.
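The failure described above can be illustrated with a toy model. The sketch below is hypothetical and is not VxVM's real renumbering code: each disk group prefers a base minor number, a later import that collides with an already-used range is moved to the next free range, and a dirty buffer saved under a fixed device number ends up pointing at a different volume when the import order changes. The disk group names, volume names, and range size are all made up for illustration.

```python
# Toy model (hypothetical; NOT VxVM's real algorithm) of minor-number
# renumbering when disk groups are imported in a different order.

RANGE_SIZE = 100  # hypothetical size of each disk group's minor-number range


def import_disk_groups(groups, order):
    """Return {minor number: "diskgroup/volume"} after importing `order`.

    groups maps a disk group name to (preferred base minor, [volume names]).
    A group whose preferred range is already taken by an earlier import is
    moved to the next free range.
    """
    used_bases = set()
    next_free = max(base for base, _ in groups.values()) + RANGE_SIZE
    minors = {}
    for name in order:
        base, volumes = groups[name]
        if base in used_bases:  # conflict: the later import is renumbered
            base = next_free
            next_free += RANGE_SIZE
        used_bases.add(base)
        for offset, volume in enumerate(volumes):
            minors[base + offset] = f"{name}/{volume}"
    return minors


# Two hypothetical disk groups that happen to prefer the same minor range.
groups = {"dgA": (100, ["vol01"]), "dgB": (100, ["vol01"])}

before_crash = import_disk_groups(groups, ["dgA", "dgB"])
after_reboot = import_disk_groups(groups, ["dgB", "dgA"])

dirty_minor = 100  # Prestoserve saved dirty buffers for this device number
print("buffers written for:", before_crash[dirty_minor])
print("flushed after reboot to:", after_reboot[dirty_minor])
```

With the import order reversed, the buffers written for dgA/vol01 are flushed to dgB/vol01, which is the destructive case the tech note warns about.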

To avoid this problem, either use Prestoserve only for volumes in the rootdg
disk group, or strictly control the set of disk groups imported on different
hosts.

Unfortunately there is, as yet, no mechanism to enforce this ordering for
any disk groups that are automatically imported. However, if all disk
groups have at least one readable configuration copy and no collisions are
detected between the disk groups, this operation will work.

Another problem is with the start-up of Prestoserve. Following a system
failure, the Prestoserve drivers flush all outstanding dirty buffers to
disk. If this flush occurs before the VxVM drivers have been loaded into the
kernel and before the volume devices have been started and made available
for use, Prestoserve's attempts to flush to the volumes will fail.

Warning:	 This problem could lead to data loss.

To prevent this situation, we recommend changing the start-up order so that
Prestoserve starts after the volumes have been started. To do this, perform
the following steps:

1. Edit the /etc/system file and add the line:

exclude: drv/pr

This keeps the Prestoserve driver from being loaded automatically during
boot, before the volume devices have been started.

2. Edit the /etc/init.d/vxvm-startup2 file and add the following lines to
the end of the file:

modload /kernel/drv/pr

presto -p > /dev/null

This loads the Prestoserve driver, and starts the flush operation, after the
VxVM daemon and the volume devices have been started.

------------------------------------------------------------------------------

10.0 	WARRANTY

If we include all of this information in product shipments, and explain
that warranty customers will receive only this information as help, it may
greatly reduce our warranty calls. For anything not covered here, customers
can phone their local Sun office for further assistance.

===============================================================================
11.0   SSA Support Matrix (Jan. 1997)

      Official SSA Software/Firmware Configuration Matrix
                     Rev 0.9.12    15 Jan 97
The following table shows the most recent levels of OS, software 
and firmware for the SSA.

               Table 1:

 Solaris      SSA Software (b)     SSA Firmware   SOCHA FCode
 ----------   ------------------   ------------   ---------------------
 2.3 (c)      103351-02 (e)        3.6 (i)        1.18/1.33/1.52 (a)(h)
              103479-02 (e)(f)

 2.4 (a)(d)   103290-04            3.9            1.18/1.33/1.52 (a)(h)

 2.5          103017-05            3.9            1.18/1.33/1.52 (a)(h)

 2.5.1        103766-02            3.9            1.18/1.33/1.52 (a)(h)


The following table shows the OS and the most recent levels of
disk management software for the SSA.

                   Table 2:

 Solaris      VM CD     VM Patch      Solstice    SDS Patch (j)
              Release                 DiskSuite
 ----------   -------   -----------   ---------   -------------
 2.3 (c)      2.1       102403-04     4.0         102580-13
              2.1.1     NONE          4.1         103421-01 (g)
              2.3       NONE

 2.4          2.1       102403-04     4.0         102580-13
              2.1.1     NONE          4.1         103421-01 (g)
              2.3       NONE

 2.5          2.1.1     103367-03     4.0         102580-13
              2.3       NONE          4.1         103421-01 (g)

 2.5.1        2.1.1     103367-03     4.0         102580-13
              2.3       NONE          4.1         103421-01 (g)


(a) Neither Solaris 2.3, Solaris 2.4 11/94, nor the 1.18 FCode supports
booting from the SSA.
(b) 101765-02 for Solaris 2.3 and SSA 2x0 with more than 32 disks;
102283-01 for Solaris 2.4 and SSA 2x0 with more than 32 disks;
102446-01, sd/ssd disk format patch for Solaris 2.4.
(c) Does not support FASTWRITE.
(d) Solaris 2.4 11/94 and 2.4 3/95 now use the same SSA jumbo
patch.
(e) The two patches 103351-02 and 103479-01 are the Solaris 2.3
equivalent of the Solaris 2.4 patch 103290-02 and the Solaris 2.5
patch 103017-04. 103351-02 contains the PLN driver, SOC driver,
and 3.6 firmware, while 103479-01 contains the SD driver. It is
recommended that they be installed together.
(f) This is an SD point patch and will be replaced by a Solaris
2.3 patch in the future.
(g) Patch 103421-01 is a diagnostic utility to check for damaged
parity on SDS RAID-5 volumes. The patch applies only to SDS 4.0.
(h) The 1.52 FCode is presently available only by replacing the
current SOCHA SBus card with the latest SOCHA SBus card
(501-2069-09).
(i) A patch for Solaris 2.3 support with the SSA 3.9 firmware has
not been produced and tested.
(j) Only SDS 4.0 has patches; there are none for 4.1 yet.

===============================================================================
11.1  SPARCstorage Array Software Configuration Guide (1996)
________________________________________________________________________
SPARCstorage Array Software Configuration Matrix                  Page 1
Version 4.1, 29 February 1996

INTRODUCTION

This document has two parts:  a functionality table to help you decide
which version of Solaris will best meet your SPARCstorage Array (SSA)
needs, and four different matrices of supported SSA software
configurations, one for each supported version of Solaris.

Once you have used the functionality table below to choose your preferred
version of Solaris, you can go straight to the pages indicated. It is
not necessary to refer to the other matrices in the document.

We hope you find the document useful.

________________________________________________________________________
SPARCstorage Array Functionality Support Table

INSTRUCTIONS:
 
1. Choose the functionality you need.
2. Find the version of Solaris 2.x which supports that functionality 
   (newer is better).
3. Look up the specific SSA Software Configuration Matrix for the 
   desired Solaris 2.x version on the page number indicated in the 
   table below.
 
                       |Solaris 2.5|Solaris 2.4|Solaris 2.4|Solaris 2.3|
Functionality          | HW 11/95  |  HW 3/95  | HW 11/94  |           |
-----------------------|-----------|-----------|-----------|-----------|
Matrix page numbers    |     2     |    3,4    |    5,6    |    7,8    |
-----------------------|-----------|-----------|-----------|-----------|
SSA Model 101,102,200  |    Yes    |    Yes    |    Yes    |    Yes    |
-----------------------|-----------|-----------|-----------|-----------|
SSA Model 112,210      |    Yes    |    Yes    |    Yes    |    Yes**  |
-----------------------|-----------|-----------|-----------|-----------|
RAID 5                 |    Yes    |    Yes    |    Yes    |    Yes    |
-----------------------|-----------|-----------|-----------|-----------|
NVRAM fast-write       |    Yes    |    Yes    |    Yes    |    No     |
-----------------------|-----------|-----------|-----------|-----------|
Bootability            |    Yes    |    Yes    |    No     |    No     |
-----------------------|-----------|-----------|-----------|-----------|
Kernel Asynch. I/O     |    Yes    |    Yes    |    Yes    |    No     |
-----------------------|-----------|-----------|-----------|-----------|
SunSoft Solstice       |    Yes    |    Yes    |    Yes    |    Yes*   |
Disksuite (SDS) 4.0    |           |           |           |           |
-----------------------|-----------|-----------|-----------|-----------|
Veritas Volume Manager |    Yes    |    Yes    |    Yes    |    Yes    |
(VxVM) 2.1.1           |           |           |           |           |
-----------------------|-----------|-----------|-----------|-----------|
Veritas VxVM 2.1       |    No     |    Yes    |    Yes    |    Yes    |
-----------------------|-----------|-----------|-----------|-----------|

* Some features not supported.
** Requires patch due to be released 31 March 1996.
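The three instructions above amount to a simple search: walk the Solaris columns from newest to oldest and stop at the first one that supports every feature you need. The sketch below transcribes the table into a dictionary (treating "Yes*" and "Yes**" as Yes, which is an assumption; those entries carry the caveats in the footnotes). The feature labels are shortened from the table rows.

```python
# Sketch of the three-step selection above: return the newest Solaris
# release (and its matrix pages) whose column supports every feature needed.
# ASSUMPTION: "Yes*" and "Yes**" count as Yes here; check the footnotes.

RELEASES = ["Solaris 2.5 HW 11/95", "Solaris 2.4 HW 3/95",
            "Solaris 2.4 HW 11/94", "Solaris 2.3"]  # newest first
MATRIX_PAGES = ["2", "3,4", "5,6", "7,8"]

SUPPORT = {  # one flag per release, in the same order as RELEASES
    "SSA Model 101,102,200": [True, True, True, True],
    "SSA Model 112,210":     [True, True, True, True],   # 2.3 needs a patch (**)
    "RAID 5":                [True, True, True, True],
    "NVRAM fast-write":      [True, True, True, False],
    "Bootability":           [True, True, False, False],
    "Kernel Asynch. I/O":    [True, True, True, False],
    "SDS 4.0":               [True, True, True, True],   # 2.3: some features missing (*)
    "VxVM 2.1.1":            [True, True, True, True],
    "VxVM 2.1":              [False, True, True, True],
}


def choose_release(features):
    """Return (release, matrix pages), or None if no release supports all features."""
    for i, release in enumerate(RELEASES):
        if all(SUPPORT[feature][i] for feature in features):
            return release, MATRIX_PAGES[i]
    return None


print(choose_release(["Bootability", "VxVM 2.1"]))  # ('Solaris 2.4 HW 3/95', '3,4')
```

For example, a site that needs to boot from the SSA and wants to stay on VxVM 2.1 is pointed to Solaris 2.4 HW 3/95, since Solaris 2.5 does not support VxVM 2.1.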
 


________________________________________________________________________
SPARCstorage Array Software Configuration Matrix                  Page 2
for Solaris 2.5 Hardware: 11/95
Version 4.1, 29 February 1996


GENERAL INFORMATION:

As a general rule of thumb, we strongly recommend you get the latest 
jumbo patches to ensure you have correct functionality support, plus 
the latest bug fixes.  

REQUIRED COMPONENTS:

Solaris 2.5 Hardware: 11/95 
   patch 103017-02 or later
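Requirements such as "patch 103017-02 or later" can be checked against the list of installed patches. The sketch below assumes output in the style of the Solaris `showrev -p` command, with lines beginning "Patch: NNNNNN-RR"; the sample text is hypothetical, with the remaining fields elided. It only counts revisions of the same base patch, so a replacement patch with a different base number would not be recognized.

```python
import re


def installed_revision(showrev_output, base):
    """Highest installed revision of patch `base` (e.g. "103017"), or None.

    ASSUMPTION: lines look like "Patch: NNNNNN-RR ..." as in `showrev -p`.
    """
    pattern = rf"^Patch: {re.escape(base)}-(\d{{2}})\b"
    revisions = [int(r) for r in re.findall(pattern, showrev_output, re.MULTILINE)]
    return max(revisions, default=None)


def meets_minimum(showrev_output, required):
    """True if `required` ("NNNNNN-RR") is installed at that revision or later."""
    base, revision = required.split("-")
    installed = installed_revision(showrev_output, base)
    return installed is not None and installed >= int(revision)


# Hypothetical excerpt of `showrev -p` output (other fields elided).
sample = """\
Patch: 103017-02 Obsoletes: ...
Patch: 103017-05 Obsoletes: ...
Patch: 102580-13 Obsoletes: ...
"""

print(meets_minimum(sample, "103017-02"))  # True: 103017-05 is installed
print(meets_minimum(sample, "102580-14"))  # False: only revision 13 found
```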

OPTIONAL COMPONENTS:

Solstice DiskSuite (SDS) 4.0
   patch 102580-xx for SDS 4.0
Veritas Volume Manager (VxVM) 2.1.1

CONFIGURATION MATRIX

SSA System Software is part of Solaris 2.5 Hardware: 11/95, including 
support for Models 101/102/112/200/210.
For SSA installation/upgrade instructions, see:

- SMCC SPARC(tm) Hardware Platform Guide Solaris(tm) 2.5 (802-3697-10)
- SPARCstorage Array Software and Volume Manager 2.1.1 Product Note
  (802-5314-10) (if using VxVM)

Any combination of software, firmware, and FCode not shown in the table 
is not supported.

   ssa    | FC/S     |   Model   | Model |
 firmware | FCode    |101,102,200|112,210|   ODS   |   SDS   |   VxVM    |
----------|----------|-----------|-------|---------|---------|-----------|
  3.4 (e) | 1.18 (a) |     Y     |   Y   | 3.0 (b) | 4.0 (c) | 2.1.1 (d) |
  3.4 (e) | 1.33     |     Y     |   Y   | 3.0 (b) | 4.0 (c) | 2.1.1 (d) |
----------|----------|-----------|-------|---------|---------|-----------|
  2.4     | 1.18 (a) |     Y     |   N   | 3.0 (b) | 4.0 (c) | 2.1.1 (d) |
  2.4     | 1.33     |     Y     |   N   | 3.0 (b) | 4.0 (c) | 2.1.1 (d) |
----------|----------|-----------|-------|---------|---------|-----------|

NOTES:

a. Does not support bootability; see the SSA install/upgrade instructions 
   to determine the FC/S FCode rev, and how to upgrade it, if necessary.
b. Does not support KAIO or RAID 5.
c. Requires patch 102580-01 or later to support KAIO.
d. VxVM 1.x, 2.0 and 2.1 are not supported on Solaris 2.5.
e. Requires patch 103017-02 or later.
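The Solaris 2.5 matrix above can also be expressed as a lookup, since any combination not shown is unsupported. The sketch below is a minimal transcription of that table and of notes (a) and (e); the function name and note wording are illustrative, not part of any Sun tool.

```python
# Supported combinations from the Solaris 2.5 Hardware: 11/95 matrix.
# Anything not listed is unsupported, as the page states.
BASE_MODELS = {"101", "102", "200"}
ALL_MODELS = BASE_MODELS | {"112", "210"}

SUPPORTED = {
    # (ssa firmware, FC/S FCode) -> SSA models supported
    ("3.4", "1.18"): ALL_MODELS,
    ("3.4", "1.33"): ALL_MODELS,
    ("2.4", "1.18"): BASE_MODELS,
    ("2.4", "1.33"): BASE_MODELS,
}


def check_solaris_25(firmware, fcode, model):
    """Return (supported, notes) for one SSA on Solaris 2.5 HW 11/95."""
    if model not in SUPPORTED.get((firmware, fcode), set()):
        return False, []
    notes = []
    if fcode == "1.18":
        notes.append("(a) does not support bootability")
    if firmware == "3.4":
        notes.append("(e) requires patch 103017-02 or later")
    return True, notes


for combo in [("3.4", "1.18", "210"), ("2.4", "1.33", "112")]:
    print(combo, check_solaris_25(*combo))
```

A Model 112 or 210 therefore needs the 3.4 firmware on Solaris 2.5; with 2.4 firmware the check reports it as unsupported.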


________________________________________________________________________
SPARCstorage Array Software Configuration Matrix                  Page 3
for Solaris 2.4 Hardware: 3/95
Version 4.1, 29 February 1996

GENERAL INFORMATION:
 
SSA System Software is part of Solaris 2.4 Hardware: 3/95
For SSA installation/upgrade instructions, see:
 
- 2.4 HW 3/95 Hardware Platform Guide (802-2966-10)
- SPARCstorage Array Product Note (802-2043-10)
- SPARCstorage Array Software and Volume Manager 2.1.1 Product Note
  (802-5314-10) (if using VxVM 2.1.1)
- SPARCstorage Array Software and Volume Manager 2.1 Product Note
  (804-4996-10) (only if using VxVM 2.1)

As a general rule of thumb, we strongly recommend you get the latest
jumbo patches to ensure you have correct functionality support, plus
the latest bug fixes.  
 
REQUIRED COMPONENTS:
 
Solaris 2.4 Hardware: 3/95 (Solaris CD, ignore the SSA patches on the
  Updates CD)
SSA jumbo patch 102432-08 (or later) for Solaris 2.4 HW 3/95
patch 102446-01 (format pgm confused by sd and ssd)
patch 102283-01 (disks pgm support for more than 32 disks, 
                 for Model 200/210)
 
OPTIONAL COMPONENTS:
 
Solstice DiskSuite (SDS) 4.0
   patch 102580-xx for SDS 4.0
Online DiskSuite (ODS) 3.0
Veritas Volume Manager (VxVM) 2.1.1
Veritas Volume Manager (VxVM) 2.1
   patch 102403-xx for VxVM 2.1


(Software configuration matrix on next page.)


________________________________________________________________________
SPARCstorage Array Software Configuration Matrix                  Page 4
for Solaris 2.4 Hardware: 3/95
Version 4.1, 29 February 1996

Any combination of software, firmware, and FCode not shown in the table
is not supported.

  ssa    | FC/S     |   Model   | Model |
firmware | FCode    |101,102,200|112,210|   ODS   |   SDS   |  VxVM    |
---------|----------|-----------|-------|---------|---------|----------|
 3.4 (a) | 1.18 (c) |     Y     |   Y   | 3.0 (d) | 4.0 (e) | 2.1.1    |
 3.4 (a) | 1.33     |     Y     |   Y   | 3.0 (d) | 4.0 (e) | 2.1.1    |
---------|----------|-----------|-------|---------|---------|----------|
 2.4 (b) | 1.18 (c) |     Y     |   N   | 3.0 (d) | 4.0 (e) | 2.1.1    |
 2.4 (b) | 1.33     |     Y     |   N   | 3.0 (d) | 4.0 (e) | 2.1.1    |
---------|----------|-----------|-------|---------|---------|----------|
 3.4 (a) | 1.18 (c) |     Y     |   N   |         |         | 2.1 (f)  |
 3.4 (a) | 1.33     |     Y     |   N   |         |         | 2.1 (f)  |
---------|----------|-----------|-------|---------|---------|----------|
 2.4 (b) | 1.18 (c) |     Y     |   N   |         |         | 2.1 (f)  |
 2.4 (b) | 1.33     |     Y     |   N   |         |         | 2.1 (f)  |
---------|----------|-----------|-------|---------|---------|----------|

a. ssafirmware 3.4 is on patch 102432-10 (or later).
   See patch README to determine current ssafirmware rev level and
   how to download new ssafirmware, if necessary.
b. ssafirmware 2.4 is on patch 102432-08.
   See patch README to determine current ssafirmware rev level and 
   how to download new ssafirmware, if necessary.
c. Does not support bootability.
d. Does not support KAIO or RAID 5.
e. Requires patch 102580-01 or later to support KAIO.
f. Requires patch 102403-01 or later to support KAIO.


________________________________________________________________________
SPARCstorage Array Software Configuration Matrix                  Page 5
for Solaris 2.4 Hardware: 11/94
Version 4.1, 29 February 1996
 
GENERAL INFORMATION:
 
SPARCstorage Array System Software is not part of Solaris 2.4 
Hardware: 11/94, nor is it part of any currently supported unbundled 
software release.  Therefore we recommend that only SPARCstorage Array 
upgrade customers already using Solaris 2.4 Hardware: 11/94 remain on 
this release.  We strongly recommend that customers with new installations 
use Solaris 2.4 Hardware: 3/95 instead.

For SSA upgrade instructions, see:
- SPARCstorage Array Product Note (802-2043-10)
- SPARCstorage Array Software and Volume Manager 2.1.1 Product Note
  (802-5314-10) (if using VxVM 2.1.1)
- SPARCstorage Array Software and Volume Manager 2.1 Product Note
  (804-4996-10) (only if using VxVM 2.1)

As a general rule of thumb, we strongly recommend you get the latest
jumbo patches to ensure you have correct functionality support, plus
the latest bug fixes.  
 
REQUIRED COMPONENTS:

Solaris 2.4 Hardware: 11/94 (Solaris CD, no SSA software on the Updates CD)
SSA jumbo patch 102347-08 (or later) for Solaris 2.4 HW 11/94
patch 102446-01 (format pgm confused by sd and ssd)
patch 102283-01 (disks pgm support for more than 32 disks,
                 for Model 200/210)
 
OPTIONAL COMPONENTS:
 
Solstice DiskSuite (SDS) 4.0
   patch 102580-xx for SDS 4.0
Online DiskSuite (ODS) 3.0
Veritas Volume Manager (VxVM) 2.1.1
Veritas Volume Manager (VxVM) 2.1
   patch 102403-xx for VxVM 2.1
 
(Software configuration matrix on next page.)


________________________________________________________________________
SPARCstorage Array Software Configuration Matrix                  Page 6
for Solaris 2.4 Hardware: 11/94
Version 4.1, 29 February 1996

Any combination of software, firmware, and FCode not shown in the table
is not supported.

  ssa    | FC/S     |   Model   | Model |
firmware | FCode    |101,102,200|112,210|   ODS   |   SDS   |  VxVM    |
---------|----------|-----------|-------|---------|---------|----------|
 3.4 (a) | 1.18 (c) |     Y     |   Y   | 3.0 (d) | 4.0 (e) | 2.1.1    |
 3.4 (a) | 1.33 (c) |     Y     |   Y   | 3.0 (d) | 4.0 (e) | 2.1.1    |
---------|----------|-----------|-------|---------|---------|----------|
 2.4 (b) | 1.18 (c) |     Y     |   N   | 3.0 (d) | 4.0 (e) | 2.1.1    |
 2.4 (b) | 1.33 (c) |     Y     |   N   | 3.0 (d) | 4.0 (e) | 2.1.1    |
---------|----------|-----------|-------|---------|---------|----------|
 3.4 (a) | 1.18 (c) |     Y     |   N   |         |         | 2.1 (f)  |
 3.4 (a) | 1.33 (c) |     Y     |   N   |         |         | 2.1 (f)  |
---------|----------|-----------|-------|---------|---------|----------|
 2.4 (b) | 1.18 (c) |     Y     |   N   |         |         | 2.1 (f)  |
 2.4 (b) | 1.33 (c) |     Y     |   N   |         |         | 2.1 (f)  |
---------|----------|-----------|-------|---------|---------|----------|
 
a. ssafirmware 3.4 is on patch 102347-10 (or later).
   See patch README to determine current ssafirmware rev level and
   how to download new ssafirmware, if necessary.
b. ssafirmware 2.4 is on patch 102347-08.
   See patch README to determine current ssafirmware rev level and
   how to download new ssafirmware, if necessary.
c. Bootability is not supported on Solaris 2.4 HW 11/94 regardless of 
   the FCode used.
d. Does not support KAIO or RAID 5.
e. Requires patch 102580-01 or later to support KAIO.
f. Requires patch 102403-01 or later to support KAIO.


________________________________________________________________________
SPARCstorage Array Software Configuration Matrix                  Page 7
for Solaris 2.3
Version 4.1, 29 February 1996

GENERAL INFORMATION:
 
SPARCstorage Array system software for Solaris 2.3 is part of the 
unbundled SPARCstorage Array Software and Volume Manager product.
You will need this product in order to install SPARCstorage Array 
system software, even if you do not wish to use the optional Volume 
Manager.  SPARCstorage Array system software is bundled as part of
newer Solaris releases.

For SSA installation/upgrade instructions, see:
 
- SPARCstorage Array Product Note (802-2043-10)
- SPARCstorage Array Software and Volume Manager 2.1.1 Product Note
  (802-5314-10) or
  SPARCstorage Array Software and Volume Manager 2.1 Product Note
  (804-4996-10)

As a general rule of thumb, we strongly recommend that you get the latest
jumbo patches to ensure you have correct functionality support, plus
the latest bug fixes. We strongly recommend that you use Solaris 2.4.
 
REQUIRED COMPONENTS:
 
Solaris 2.3
      kernel jumbo -54, -68, or later
SPARCstorage Array Software and Volume Manager 2.1.1 (or 2.1)
102465-03 (or later) for SPARCstorage software 2.1.1 (or 2.1)
patch 101765-02 (disks pgm support for more than 32 disks,
                 for Model 200/210)

OPTIONAL COMPONENTS:

Solstice DiskSuite (SDS) 4.0
   patch 102580-xx for SDS 4.0
Online DiskSuite (ODS) 3.0
Veritas Volume Manager (VxVM) 2.1.1
Veritas Volume Manager (VxVM) 2.1
   patch 102403-xx for VxVM 2.1

(Software configuration matrix on next page.)


________________________________________________________________________
SPARCstorage Array Software Configuration Matrix                  Page 8
for Solaris 2.3
Version 4.1, 29 February 1996

Any combination of software, firmware, and FCode not shown in the table
is not supported.

 
  ssa    | FC/S     |   Model   | Model |
firmware | FCode    |101,102,200|112,210|   ODS   |   SDS   |  VxVM    |
---------|----------|-----------|-------|---------|---------|----------|
 3.4 (a) | 1.18 (c) |     Y     |   Y   | 3.0 (d) | 4.0 (e) | 2.1.1 (e)|
 3.4 (a) | 1.33 (c) |     Y     |   Y   | 3.0 (d) | 4.0 (e) | 2.1.1 (e)|
---------|----------|-----------|-------|---------|---------|----------|
 1.12(b) | 1.18 (c) |     Y     |   N   | 3.0 (d) | 4.0 (e) | 2.1.1 (e)|
 1.12(b) | 1.33 (c) |     Y     |   N   | 3.0 (d) | 4.0 (e) | 2.1.1 (e)|
---------|----------|-----------|-------|---------|---------|----------|
 3.4 (a) | 1.18 (c) |     Y     |   N   |         |         | 2.1 (e)  |
 3.4 (a) | 1.33 (c) |     Y     |   N   |         |         | 2.1 (e)  |
---------|----------|-----------|-------|---------|---------|----------|
 1.12(b) | 1.18 (c) |     Y     |   N   |         |         | 2.1 (e)  |
 1.12(b) | 1.33 (c) |     Y     |   N   |         |         | 2.1 (e)  |
---------|----------|-----------|-------|---------|---------|----------|
 
a. ssafirmware 3.4 will be on a future revision of patch 102465,
   due 31 March 1996.
b. ssafirmware 1.12 is on patch 102465-02.
   See patch README to determine current ssafirmware rev level and
   how to download new ssafirmware, if necessary.
c. Bootability is not supported on Solaris 2.3 regardless of the FCode used.
d. Does not support KAIO or RAID 5.
e. KAIO not supported on Solaris 2.3.



===============================================================================

SOLUTION SUMMARY:

 

PRODUCT AREA: SunOS Unbundled
PRODUCT: Veritas Volume Manager
SUNOS RELEASE: any
HARDWARE: SPARCstorage Array