Documented configuration tunables

The Solaris 2 AnswerBook Performance Section offers a list of tunable parameters. The size of these data structures has no effect on performance, but if they are set too low an application might not run at all. Configuring shared memory allocations for databases falls into this category.

Kernel configuration and tuning variables normally are edited into the /etc/system file by hand. Unfortunately, any kernel data that has a symbol can be set via this file at boot time, whether it is a documented tunable or not. The result of such fiddling can be less than ideal. The kernel is supplied as many separate modules (type ls /kernel/* to see some of them); to set a variable in a module or device driver when it is loaded, the variable name must be prefixed by the module name and a colon. For example:

    set pt_cnt = 1000
    set shmsys:shminfo_shmmax = 0x20000000

The history of kernel tuning

Given today's self-adjusting kernels, why is there so much emphasis on tuning? And why do we expect big performance boosts from kernel tweaks? I think the reasons are historical. For an explanation, let's revisit my car analogy.

Compare a car from the 1970s with a similar 1995 model. The older car has a carburetor, needs regular tune-ups, and is likely to be temperamental at best. The 1995 car has computerized fuel injection and self-adjusting engine components, and is easier to live with, consistent, and reliable. If the old car won't start reliably, you get out the manual and tinker with a large number of fine adjustments. In contrast, the new car's computerized ignition and fuel injection systems have few if any user-serviceable components.

Unix started out in an environment where users had source code and did their own tuning and support. (If you like this way of working, you probably should run the free Unix clone Linux on your PC at home -- if you don't already.)
As Unix became a commercial platform for running applications, the user profile changed. Today's users typically only want to run their applications, and consider tinkering with the operating system an unwelcome distraction. SunSoft engineers put a lot of effort into automating the tuning for Solaris 2. It adaptively scales according to the hardware capabilities and the current workload. The self-tuning nature of modern cars is now a major selling point. Likewise, the self-configuring and self-tuning nature of Solaris contributes to its ease of use and greatly reduces the potential gains from tweaking it yourself.

With each successive version of Solaris 2, SunSoft has removed tuning variables by converting hand-adjusted values into adaptively managed limits. If SunSoft can describe a tunable variable and offer detailed guidelines about when and how it should be tuned, it can either document this in the manual or implement the tuning automatically. Rather than require manual tuning by users and administrators, SunSoft opted to employ automatic tuning in most cases.

The Solaris 2 tuning manual should really tell you which things don't need to be tuned any more, but it doesn't. This is one of my complaints about the manual, which, in my opinion, is in need of a complete rewrite. It is too closely based on the original Unix System V manual from many years ago, when tuning was needed and worthwhile.

Tuning to incorporate extra information

An adaptively managed kernel can react only to the workload it sees. If you know enough about the workload, you may be able to use the extra information to effectively pre-configure the algorithms. In most cases the gains are minor. Increasing the size of the name caches on NFS servers falls into this category. One problem is that the administrator often knows enough to be dangerous, but not enough to be useful.
Tuning during development

The primary reason so many obscure "folklore" kernel tunables exist is that tunables are often used to provide options and allow tuning during the development process. Kernel developers can read the source code and try things out under controlled conditions. When the final product ships, the tunables are often still there. Each bug fix and new version of a product potentially changes the meaning of the tunables. This is the biggest danger for an end user, who is guessing what a tunable does from its name or from knowledge of an older Unix implementation.

Tuning to solve problems

When a bug or performance problem needs fixing, the engineer typically tries to find an easy workaround that can be implemented immediately. It takes much longer to rewrite and test the code to eliminate the problem, so a proper fix likely won't exist until it appears in a patch or in the next release of the operating system. There may be a kernel tunable that can be changed to provide a partial workaround, and this information will be provided to users. Unfortunately, these "point-patch" fixes sometimes become part of the folklore and are propagated indiscriminately -- a short-term fix in one case may be a long-term problem in another.

In one real-life case a large SPARCcenter 2000 configuration was running very slowly. The problem turned out to be a setting in /etc/system that had been supplied to fix a problem on a small SPARCstation 2 several years before. The administrator had carefully added it during installation to every machine at his site. Instead of increasing the size of a dynamically configured kernel table on a SPARCstation 2 with 32 megabytes of RAM, the tweak was drastically reducing the table size on a machine with 1 gigabyte of RAM. The underlying problem did not even exist in the version of Solaris 2 that was currently being used at the site!

The lesson: Clean out your /etc/system when you upgrade.
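
As a starting point for that kind of clean-out, it can help to list every variable a copy of /etc/system sets, so each one can be checked against the release you are running. The sketch below is illustrative only: the function name list_settings and the sample text are hypothetical, it assumes comment lines begin with an asterisk (or a hash mark), and it recognizes only simple "set name = value" lines, ignoring every other directive.

```python
import re

# Matches lines such as "set pt_cnt = 1000" or
# "set shmsys:shminfo_shmmax = 0x20000000".
SET_LINE = re.compile(r"^\s*set\s+(?:(\w+):)?(\w+)\s*=\s*(\S+)")


def list_settings(text):
    """Return (module, variable, value) for each simple 'set' line.

    module is None when the variable has no module prefix.
    Assumption: comment lines start with '*' or '#'; any other
    kind of /etc/system directive is skipped.
    """
    found = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped[0] in "*#":
            continue
        match = SET_LINE.match(stripped)
        if match:
            found.append((match.group(1), match.group(2), match.group(3)))
    return found


# Hypothetical file contents, reusing the two settings shown earlier.
SAMPLE = """\
* settings carried over from an older machine
set pt_cnt = 1000
set shmsys:shminfo_shmmax = 0x20000000
"""

if __name__ == "__main__":
    for module, name, value in list_settings(SAMPLE):
        print(module or "(kernel)", name, value)
```

The script only reports what is set; deciding whether a setting still makes sense on the current release remains a manual job.
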
The placebo effect

You may be convinced that setting a tunable has a profound effect on your system when it is truly doing nothing. In one case an administrator was adamant that a bogus setting could not be removed from /etc/system without causing serious performance problems. Although the "variable not found" error message that was displayed during boot was pointed out, it took a while to convince him that this meant that the variable no longer existed in this release and thus the setting could not be having any effect.

Tunable kernel parameters

The kernel chapter of my book, Sun Performance and Tuning: SPARC and Solaris, explains how the main kernel algorithms work. The kernel tunable values listed in this section include the main tunables worth worrying about. A huge number of global values are defined in the kernel; if you hear of a tweak that is not listed here, think twice before using it. The algorithms, default values, and existence of many of these variables vary from one release to the next. Do not assume that an undocumented tweak that works well for one kernel will apply to other releases, other kernel architectures of the same release, or even a different patch level. The tables that follow are taken from Appendix A of my book, and contain cross-references to the detailed descriptions in the book.

--------------------------------------------------------------------------------
Primary Configuration Variables in Solaris 2.3, 2.4 and 2.5

Name      Default        Min  Max   Reference
____      _______        ___  ___   _________
maxusers  MB available   8    2048  "Autoconfiguration of maxusers in Solaris
          RAM (physmem)             2.3 and Solaris 2.4" on page 188
pt_cnt    48             48   3000  "Changing maxusers and Pseudo-ttys in
                                    Solaris 2" on page 187
--------------------------------------------------------------------------------

maxusers

I never set maxusers. It sizes itself based on the amount of RAM in the system.
In some cases on configurations with gigabytes of RAM it needs to be reduced to avoid problems with lack of kernel address space. The kernel uses up a lot of space keeping track of all the RAM in a system. Several other kernel table sizes and limits are derived from maxusers. The name is historical, and has no real link to the number of users a system is expected to support.

pt_cnt

The variable that really limits the number of remote user logins on the system is pt_cnt. It may be necessary to set the number of pseudo-ttys higher than the default of 48, especially in a time-sharing system that uses telnet from Ethernet terminal servers to connect users to the system. Solaris 2.3 and later are tested up to about 3000 idle, pseudo-tty-based logins. The format of the utmp file entry imposes a practical limit of 62*62 = 3844 telnets and another 3844 rlogins; it is best to keep pt_cnt under 3000. To actually create the /dev/pts entries, a boot -r is required after pt_cnt is set.

--------------------------------------------------------------------------------
File Name and Attribute Cache Sizes for Solaris 2

Name        Default               Min  Max    Reference
____        _______               ___  ___    _________
ncsize      (maxusers * 17) + 90  226  34906  "Directory Name Lookup Cache" on page 189
ufs_ninode  (maxusers * 17) + 90  226  34906  "The Inode Cache and File Data Caching" on page 191
--------------------------------------------------------------------------------

ncsize

The directory name lookup cache (DNLC) is sized to a default value based on maxusers. A large cache size (ncsize) significantly helps NFS servers that have a lot of clients. On other systems the default is adequate. The only limit to the size of the DNLC is available kernel memory. For NFS server benchmarks, the limit has been set as high as 16,000; for the maximum maxusers value of 2048, the limit would be set at 34,906. Each DNLC entry is quite small, since it basically just holds up to a 30-character name.
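
The default sizes in the table above follow directly from maxusers, and the arithmetic can be checked with a few lines of Python. This is only a sketch of the table's formula, not kernel code: the function names are hypothetical, and treating maxusers as the RAM size in megabytes clamped to the table's 8 to 2048 range is a simplification, since the kernel bases it on available rather than installed memory.

```python
def default_maxusers(ram_mb):
    """Approximate maxusers: megabytes of RAM, clamped to 8..2048.

    Assumption: installed RAM stands in for the 'MB available RAM'
    figure that the kernel actually uses.
    """
    return max(8, min(2048, ram_mb))


def default_cache_size(maxusers):
    """Default for both ncsize and ufs_ninode: (maxusers * 17) + 90."""
    return maxusers * 17 + 90


if __name__ == "__main__":
    for ram_mb in (8, 32, 256, 1024, 4096):
        maxusers = default_maxusers(ram_mb)
        print(ram_mb, maxusers, default_cache_size(maxusers))
```

With maxusers at its minimum of 8 the formula gives 226, and at its maximum of 2048 it gives 34906, matching the Min and Max columns of the table. A machine with 256 megabytes comes out at 4442 entries.
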
Increase ncsize to at least 5000 on a busy NFS server that has 256 megabytes or less RAM by adding the following line to /etc/system:

    set ncsize=5000

If you have more than 256 megabytes of RAM, ncsize will already be big enough.

--------------------------------------------------------------------------------
Hardware-Specific Configuration Tunables

Name               Default    Min  Max  Reference
____               _______    ___  ___  _________
use_mxcc_prefetch  0 (sun4d)  0    1    "The SuperSPARC with SuperCache Two-level
                   1 (sun4m)            Cache Architecture" on page 159
--------------------------------------------------------------------------------

use_mxcc_prefetch

This one falls in the category of knowing your workload and optimizing accordingly. The SuperSPARC's external cache controller can pre-fetch the next cache subblock before you need it. This tends to improve performance in floating-point-intensive applications that sweep through memory sequentially. Database applications have a random access pattern, so prefetching does not help, and will most likely get in the way. By default, prefetch is turned on for desktop systems like the SPARCstation 20, and turned off on servers like the SPARCserver 1000. You could try changing the setting for SPARCstation 20 database servers and/or SPARCserver 1000 compute servers.

System V shared memory and semaphores

Shared-memory parameters usually are set based on the needs of specific applications. Most of these parameters are limits, so setting them too high does not consume any extra resources. The shmsys:shminfo_shmni tunable is an exception, as it causes structures to be preallocated.
--------------------------------------------------------------------------------
Shared Memory and Semaphore Tunables in Solaris 2

Name                    Default  Min      Max            Description
____                    _______  ___      ___            ___________
shmsys:shminfo_shmmax   1048576  1048576  Available RAM  Maximum shm segment size in bytes
shmsys:shminfo_shmmin   1        1        -              Minimum shm segment size in bytes
shmsys:shminfo_shmni    100      100      -              Number of shm identifiers to pre-allocate
shmsys:shminfo_shmseg   6        6        -              Maximum number of shm segments per process
semsys:seminfo_semmap   10       10       -              Number of entries in semaphore map
semsys:seminfo_semmni   10       10       65535          Number of semaphore identifiers
semsys:seminfo_semmns   60       -        -              Number of semaphores in system
semsys:seminfo_semmnu   30       -        -              Number of undo structures in system
semsys:seminfo_semmsl   25       -        -              Maximum number of semaphores per ID
semsys:seminfo_semopm   10       -        -              Maximum number of operations per semop call
semsys:seminfo_semume   10       -        -              Maximum number of undo entries per process
semsys:seminfo_semusz   96       -        -              Size in bytes of undo structure, derived from semume
semsys:seminfo_semvmx   32767    -        -              Semaphore maximum value
semsys:seminfo_semaem   16384    -        -              Adjust on exit maximum value
msgsys:msgmap           100      100      -              # of entries in msg map
msgsys:msgma            2048     2048     -              max message size
msgsys:msgnb            4096     4096     -              max # bytes on queue
msgsys:msgmni           50       50       -              # of message queue identifiers
msgsys:msgssz           8        8        -              msg segment size (should be word size multiple)
msgsys:msgtql           40       40       -              # of system message header
msgsys:msgseg           1024     1024     32767          # of msg segments
--------------------------------------------------------------------------------

The ones that went away

I looked at HP-UX 9.0 on an HP 9000 server. The sam utility provides an interface for kernel configuration. Like Solaris 1/SunOS 4, the HP-UX kernel must be recompiled and relinked to tune it and to add drivers and subsystems. In Solaris 2, filesystems, drivers, and modules are loaded into memory when they are used, and the memory is returned if the module is no longer needed.
Rather than providing a GUI, Solaris 2 makes the whole process transparent.

There are 50 or more tunable values listed in sam. Some of them are familiar or map to dynamically managed Solaris 2 parameters. There is a maxusers parameter that must be set manually, and several other parameters that are sized based upon maxusers in a similar way to Solaris 2. Of the tunables that I can identify, the Solaris 2 equivalents are either unnecessary or listed above.

Dynamic kernel tables in Solaris 2

Solaris 2 dynamically manages the memory used by the open file table, the lock table (in 2.5), the callout queue, the streams subsystem, the process table, and the inode cache. Unlike other Unix implementations, which statically allocate a full-size array of data structures and thus waste a lot of precious memory, Solaris 2 allocates memory as it goes along. Some of the old tunables used to size the statically allocated memory in other Unixes still exist in Solaris 2, but now they are used as limits to prevent too many data structures from being allocated.

This dynamic allocation approach is one reason why it is safe to let maxusers scale automatically to very high levels. In Solaris 1 or HP-UX 9, setting maxusers to 1024 and rebuilding the kernel would result in a huge kernel (which might not be able to boot) and a huge waste of memory. In Solaris 2, however, the relatively small DNLC is the only statically sized table derived from maxusers.