The Sun Constant Performance Metric

The Sun Constant Performance Metric (SCPM) is designed for capacity planning. It indicates the processing potential of a combination of hardware and operating system. The specific intent of the metric is to enable the comparison of dissimilar systems, giving the user a consistent and methodical basis for planning capacity. In addition to comparing systems, the metric provides a convenient way of expressing the amount of work being done by a system.

Historical Context

Traditionally, systems have been compared using similar benchmarks, such as SPECint92 or TPC-C. However, getting a comparable set of benchmarks on each platform is difficult at best, because the benchmarks evolve at a different rate than the platforms. The current UltraSPARC-based platforms are benchmarked with SPEC95 and TPC-C, while older SuperSPARC-based systems such as the SPARCstation 20 were benchmarked with SPEC92 and TPC-B. Because of the very large semantic differences between successive generations of benchmarks, there is no realistic way to compare a SPEC92 result with a SPEC95 result, or a TPC-B result with a TPC-C result.

Multiprocessor systems further complicate the problem by enabling many possible configurations for each basic model. Which would deliver higher performance, an E6000 with 13x 167MHz/1MB modules or an E4500 with 7x 250MHz/4MB modules? With multicomputer processing systems beginning to appear, it was clear that a new way of comparing machine capacity had to be created.

Metrics

The SCPM indicates the processing potential of the system as a whole. It is derived from a combination of benchmarking and quantitative estimation. It is neither practical nor necessary to benchmark every single configuration. Fortunately, systems are sufficiently well-behaved that interpolations from extreme configurations need only spot-checking to ensure the required accuracy.
(For example, all of the configurations using an odd number of processors are interpolated, with the exception of uniprocessors.)

The SCPM is also known as an "M-value", and its units are quanta. Both the term "M-value" and the unit "quanta" are historical, and reflect the terms used in the literature[1]. A configuration with an SCPM of 10,000 is said to have an M-value of 10,000. If it is running at 50% utilization, the same system is said to be expending 5,000 quanta.

Comparing systems is straightforward: if system A has twice the M-value of system B, it is about twice as fast. In the example above, the 13x 167/1MB system has M=19,997, while the 7x 250/4MB system has M=18,049. So the older system delivers greater throughput, by a small margin.

It is often useful to characterize existing workloads in terms of the quanta expended per user. A fully utilized system with M=10,000 that serves 300 users is expending an average of about 33 quanta per user, while another system with M=71 but only one user is working more than twice as hard on a per-user basis.

By knowing the basic characteristics of an operational workload, it is relatively straightforward to plan platform transitions or predicted growth in user populations. If a particular application is known to consume about 50 quanta per user, a reasonable estimate is that 1,000 users will require a system with M=50,000.

Similarly, an entire methodology has been developed that characterizes the demand on peripheral resources as a function of expended processing effort. For example, the relative I/O content (denoted R) is the ratio of disk I/Os to CPU expenditure, and the relative network content (N) is defined in the same way for network traffic. Because workloads usually have fairly predictable trends, these and other related metrics are very useful for predicting workload characteristics [2].

None of these metrics should be taken as absolute gospel, nor should one expect them to be minutely accurate.
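With that caveat in mind, the per-user arithmetic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`quanta_expended`, `quanta_per_user`, `required_m_value`) are hypothetical and not part of any Sun tool, and utilization is assumed to be expressed as a fraction between 0 and 1.

```python
# Hypothetical helpers illustrating the M-value arithmetic from the text.
# None of these names come from the original document.

def quanta_expended(m_value, utilization):
    """Quanta expended by a system of the given M-value.

    Assumption: utilization is a fraction in [0, 1], so a system with
    M=10,000 at 50% utilization expends 5,000 quanta.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return m_value * utilization


def quanta_per_user(m_value, users, utilization=1.0):
    """Average quanta expended per user (full utilization by default)."""
    if users <= 0:
        raise ValueError("users must be positive")
    return quanta_expended(m_value, utilization) / users


def required_m_value(per_user_quanta, users):
    """First-order estimate of the M-value needed for a user population."""
    return per_user_quanta * users


if __name__ == "__main__":
    print(quanta_expended(10_000, 0.5))         # 5000.0
    print(round(quanta_per_user(10_000, 300)))  # 33
    print(required_m_value(50, 1_000))          # 50000
```

The estimate from `required_m_value` is a starting point only; it assumes the per-user demand stays constant as the user population grows.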
Getting to within 10%-15% is about all that can be expected of these metrics, since there are far too many variables not taken into account. However, the metrics do provide a structured, methodical framework in which to begin the capacity planning process.

Metric Tables

Because the operating system is a significant component of the operating platform's capability, the SCPM is provided as a series of tables, one for each operating system. These are:

    Solaris 7
    Solaris 2.6
    Solaris 2.5 (and Solaris 2.5.1)
    Solaris 2.4

No tables have been measured or computed for Solaris 1, nor for any competitive platforms.

Latency

Like every other performance metric, the SCPM has some idiosyncrasies. In particular, it is biased toward analysis of throughput rather than processing latency. Two systems that have the same M-value will process approximately the same amount of work in a given period of time. However, this does not mean that the user experience on the two systems will be precisely the same. In particular, the metric does not account for wide variances in the basic processor speed. For example, consider a uniprocessor with M=10,000 and a dual-processor with M=10,000. In most circumstances, they run the same number of transactions per minute, but the uniprocessor's basic processing speed is twice as fast, so response time will probably feel much different to a user.

Fortunately, this type of analysis isn't always mandatory, since most capacity planners are focused on throughput. Furthermore, the differences in basic CPU speed are usually not so disparate as to create serious problems with the analysis.

Scalability of large systems

Another major issue is scalability. The scalability of an application depends on a large number of variables, including the design of the application, the way it uses a database management system, and the inherent interaction of data and users within the application.
Naturally, the scalability of the hardware platform and its operating system must also be considered. Fortunately, the hardware and Solaris are mature and scale sufficiently well that it is unlikely that either will be the limiting factor in scalability.

The SCPM tables necessarily make assumptions about the scalability of the application-DBMS-OS-hardware stack. The specific assumptions are rarely an issue for small and even mid-range platforms, but are of critical importance to E10000 users. The scalability represented in the table reflects typical user experience. However, applications have been observed to be both much less and much more scalable.

The chart at left shows the scalability of a number of benchmarks. The typical scalability of user benchmarks is in the middle of the range, near the theoretical linear (dashed line). The user benchmarks included SAP, PeopleSoft, ad-hoc decision support and some home-grown transaction processing applications (their specific identities cannot be disclosed due to non-disclosure requirements).

In the absence of any other information, a first-order approximation of scalability can be deduced from the ratio of processing to shared code. The more processing done per unit of data, the less likely it is that the user will encounter other users in shared code. TPC-C scales less well than the user applications because it has proportionately less processing per unit of shared work in the database.

The NFS server benchmark LADDIS does not scale very well past about 16-20 CPUs. It is nearly a worst case, because 100% of its code is running under the operating system's locks, compared with about 15% in the other commercially oriented benchmarks.

The last curve illustrates a case which in fact is superlinear. The code is a heavily multithreaded HPC program that requires approximately 50 MB of CPU cache to run efficiently.
Each processor has 4 MB of cache, so the performance of small configurations is terrible (and also not very scalable). The application is missing cache constantly and therefore running at memory speeds. But when the cache configuration exceeds 50 MB (13 or more processors at 4 MB each), the code runs from cache and at far higher speed. Once the code runs from cache, it is far more efficient than the base case, resulting in superlinear performance. This type of code is uncommon, but does exist.

Counterexamples also exist, of course. We have seen at least one code that ran about as fast on a 16-CPU system as on a uniprocessor. In fact, when we turned off fourteen of the CPUs, response time got slightly better (gulp)! It turned out that the application was using a very crude locking strategy. Users had to obtain exclusive access to the primary table in the database in order to make progress. Even worse, the table was locked for essentially the entire duration of a transaction, resulting in a system that could effectively run only a single user's transaction at any given time. Scalability was almost literally zero.

Scalability of "small" systems

One last comment on scalability. Scalability is almost never an issue on small to midrange platforms. The graph below zooms in on the same data presented above, but concentrates on the 1-24 processor range. With the exception of the LADDIS curve, it is nearly impossible to distinguish the various curves.

We've used LADDIS as an example of a code that does not scale well. It's worth noting that despite the relatively poor scalability, Sun's LADDIS scores are excellent, and are the highest non-cluster results. See the SPEC reporting page for LADDIS for an illustration.

------------------------------------------------------------------------
NOTE: to convert between x.xx and xxxx values (new versus old), use the multiplier of 3899 for x.xx to xxxx, or divide xxxx by 3899.
------------------------------------------------------------------------

Solaris 8 Relative Performance E10K/1-400MHz

Sun Fire Servers (F3800-F6800)

NCPU  900/8MB
1     1.92
2     3.82
3     5.71
4     7.59
5     9.45
6     11.31
7     13.15
8     14.96
9     16.79
10    18.57
11    20.36
12    22.14
13    23.90
14    25.66
15    27.39
16    29.12
17    30.83
18    32.54
19    34.22
20    35.90
21    37.55
22    39.21
23    40.87
24    42.50

Sun Fire F15000 (15K)

NCPU  900/8MB
4     7.26
8     14.22
12    20.92
16    27.34
20    33.50
24    39.44
28    45.15
32    50.62
36    55.87
40    60.92
44    65.78
48    70.45
52    74.90
...
56    79.21
60    83.34
64    87.31
68    91.13
72    94.78

Sun Fire V880

NCPU  750/8MB
1     1.71
2     3.39
3     5.04
4     6.70
5     8.36
6     9.96
7     11.59
8     13.20

Enterprise Servers (E3000-E6500)

NCPU  464/8MB   400/8MB
1     1.18      1.08
2     2.34      2.14
3     3.49      3.18
4     4.61      4.20
5     5.73      5.22
6     6.83      6.22
7     7.90      7.18
8     8.94      8.15
9     9.99      9.12
10    11.01     10.04
11    12.03     10.98
12    13.02     11.87
13    13.99     12.79
14    14.96     13.66
15    15.90     14.52
16    16.82     15.39
17    17.73     16.23
18    18.65     17.07
19    19.54     17.89
20    20.41     18.68
21    21.27     19.49
22    22.11     20.25
23    22.93     21.02
24    23.77     21.78
25    24.56     22.55
26    25.35     23.29
27    26.14     24.00
28    26.90     24.71
29    27.67     25.43
30    28.41     26.11

Enterprise 10000 (Starfire)

NCPU  466/8MB   400/8MB
1     1.10      1.00
2     2.19      1.99
3     3.26      2.98
4     4.33      3.95
5     5.38      4.89
6     6.42      5.86
7     7.46      6.80
8     8.48      7.72
9     9.48      8.64
10    10.47     9.55
11    11.46     10.47
12    12.43     11.36
13    13.40     12.25
14    14.37     13.12
15    15.31     13.99
16    16.25     14.85
17    17.17     15.69
18    18.09     16.54
19    19.01     17.38
20    19.90     18.22
21    20.79     19.03
22    21.66     19.82
23    22.52     20.64
24    23.39     21.43
25    24.23     22.22
26    25.07     22.98
27    25.91     23.77
28    26.73     24.51
29    27.54     25.27
30    28.33     26.01
31    29.15     26.75
32    29.94     27.49
33    30.70     28.23
34    31.46     28.94
35    32.23     29.66
36    32.99     30.34
37    33.73     31.03
38    34.47     31.75
39    35.21     32.41
40    35.92     33.10
41    36.64     33.76
42    37.35     34.42
43    38.06     35.08
44    38.75     35.72
45    39.44     36.36
46    40.10     36.99
47    40.76     37.63
48    41.43     38.27
49    42.09     38.88
50    42.75     39.49
51    43.39     40.10
52    44.03     40.69
53    44.64     41.27
54    45.27     41.86
55    45.89     42.45
56    46.50     43.03
57    47.11     43.59
58    47.69     44.15
59    48.28     44.71
60    48.87     45.27
61    49.43     45.81
62    50.01     46.34
63    50.57     46.88
64    51.13     47.41

Workgroup Servers

E250/E450

NCPU  480/8MB   400/4MB
1     1.24      1.08
2     2.42      2.11
3     3.54      3.08
4     4.64      4.03

E220/E420

NCPU  450/4MB
1     1.19
2     2.34
3     3.41
4     4.46

Sun Fire 280R

NCPU  750/8MB
1     1.63
2     3.24

------------------------------------------------------------------------
Sun Constant Performance Metrics

NOTE: to convert between x.xx and xxxx values (new versus old), use the multiplier of 3899 for x.xx to xxxx, or divide xxxx by 3899.
------------------------------------------------------------------------

Enterprise 3000 - Enterprise 6500

NCPU  400/8MB   400/4MB   336/4MB   250/4MB   250/1MB   167/1MB   167/512K
1     4210      3900      3300      2710      2360      1880      1670
2     8360      7730      6550      5380      4620      3710      3260
3     12400     11400     9740      8000      6790      5460      4790
4     16400     15100     12800     10500     8870      7160      6240
5     20400     18700     15900     13100     10800     8800      7630
6     24300     22200     18900     15600     12700     10300     8960
7     28200     25700     21900     18000     14600     11900     10200
8     32000     29100     24800     20400     16300     13300     11400
9     35700     32400     27700     22800     18000     14700     12600
10    39400     35700     30500     25100     19600     16100     13700
11    43000     38800     33200     27400     21200     17400     14700
12    46600     42000     35900     29600     22700     18700     15700
13    50100     45000     38600     31800     24100     19900     16700
14    53600     48000     41200     34000     25500     21100     17600
15    57000     51000     43800     36100     26800     22300     18500
16    60300     53900     46300     38200     28100     23400     19300
17    63600     56700     48800     40300     29300     24500     20100
18    66900     59500     51200     42300     30400     25500     20900
19    70100     62200     53600     44300     31500     26500     21600
20    73300     64800     55900     46300     32600     27400     22300
21    76400     67400     58200     48200     33600     28400     23000
22    79500     70000     60500     50100     34600     29300     23600
23    82500     72500     62700     52000     35500     30100     24200
24    85500     74900     64900     53800     36400     31000     24800
25    88400     77400     67000     55600     37300     31800     25300
26    91300     79700     69100     57400     38100     32500     25900
27    94100     82000     71200     59100     38900     33300     26400
28    96900     84300     73200     60800     39700     34000     26900
29    99700     86500     75200     62500     40400     34700     27300
30    102000    88700     77200     64100     41100     35400     27800

Enterprise 10000 (Starfire)

NCPU  400/8MB   400/4MB   336/4MB   250/4MB   250/1MB
1     3920      3630      2970      2360      2120
2     7800      7220      5920      4700      4200
3     11600     10700     8840      7020      6250
4     15400     14200     11700     9330      8250
5     19200     17700     14600     11600     10200
6     22900     21100     17400     13800     12100
7     26600     24500     20200     16100     14000
8     30300     27800     23000     18300     15800
9     33900     31100     25700     20600     17600
10    37500     34400     28400     22800     19400
11    41000     37600     31100     25000     21100
12    44500     40800     33800     27100     22800
13    48000     43900     36400     29300     24500
14    51400     47000     39100     31400     26100
15    54900     50100     41600     33600     27700
16    58200     53100     44200     35700     29300
17    61600     56100     46700     37800     30800
18    64900     59100     49300     39800     32400
19    68200     62000     51700     41900     33800
20    71400     64900     54200     43900     35300
21    74600     67700     56600     46000     36700
22    77800     70600     59100     48000     38100
23    80900     73300     61400     50000     39500
24    84000     76100     63800     52000     40800
25    87100     78800     66100     53900     42100
26    90200     81500     68500     55900     43400
27    93200     84200     70800     57800     44600
28    96200     86800     73000     59800     45900
29    99100     89400     75300     61700     47100
30    102000    91900     77500     63600     48300
31    105000    94500     79700     65400     49400
32    107000    97000     81900     67300     50600
33    110000    99400     84000     69200     51700
34    113000    101000    86200     71000     52800
35    116000    104000    88300     72800     53800
36    119000    106000    90400     74700     54900
37    121000    109000    92500     76500     55900
38    124000    111000    94500     78200     56900
39    127000    113000    96600     80000     57900
40    129000    115000    98600     81800     58900
41    132000    118000    100000    83500     59800
42    135000    120000    102000    85200     60700
43    137000    122000    104000    87000     61600
44    140000    124000    106000    88700     62500
45    142000    126000    108000    90400     63400
46    145000    129000    110000    92100     64300
47    147000    131000    112000    93700     65100
48    150000    133000    114000    95400     65900
49    152000    135000    115000    97000     66700
50    154000    137000    117000    98700     67500
51    157000    139000    119000    100000    68300
52    159000    141000    121000    101000    69100
53    162000    143000    123000    103000    69800
54    164000    144000    124000    105000    70500
55    166000    146000    126000    106000    71200
56    168000    148000    128000    108000    71900
57    171000    150000    129000    109000    72600
58    173000    152000    131000    111000    73300
59    175000    154000    133000    112000    74000
60    177000    156000    134000    114000    74600
61    179000    157000    136000    115000    75200
62    181000    159000    138000    117000    75900
63    184000    161000    139000    118000    76500
64    186000    162000    141000    120000    77100

Enterprise 250 / Enterprise 450

NCPU  400/2MB   300/2MB   250/2MB
1     3800      3120      2490
2     7500      6150      4810
3     0         0         0
4     14300     11700     9360

Ultra2 / Ultra60 / Ultra30

NCPU  360/2MB   300/2MB   200/2MB
1     3680      3120      2520
2     6830      5790      4690

Ultra5 / Ultra10

NCPU  440/2MB   360/512K  300/512K  266/512K
1     3280      2410      2050      2050      1840

Ultra1

NCPU  167/512K  143/512K
1     1500      1280

SPARCcenter 2000E / SPARCcenter 2000

NCPU  85/2MB    60/2MB    60/1MB    50/2MB    50/1MB    40/1MB
1     837       770       707       651       451       397
2     1660      1530      1400      1290      898       790
3     2480      2280      2090      1930      1340      1170
4     3290      3020      2780      2560      1770      1560
5     4090      3760      3460      3180      2200      1940
6     4880      4490      4120      3800      2630      2310
7     5660      5210      4790      4410      3050      2690
8     6440      5920      5440      5010      3470      3050
9     7200      6630      6090      5610      3880      3420
10    7960      7320      6730      6200      4290      3780
11    8710      8010      7360      6780      4700      4130
12    9450      8690      7990      7360      5100      4480
13    10100     9370      8610      7930      5490      4830
14    10900     10000     9220      8490      5880      5180
15    11600     10600     9830      9050      6270      5520
16    12300     11300     10400     9600      6650      5850
17    13000     11900     11000     10100     7030      6190
18    13700     12600     11600     10600     7400      6520
19    14400     13200     12100     11200     7770      6840
20    15000     13800     12700     11700     8140      7160

SPARCserver 1000E / SPARCserver 1000

NCPU  85/1MB    60/1MB    50/1MB    40/1MB
1     662       623       441       387
2     1280      1210      829       724
3     1880      1770      1170      1010
4     2430      2310      1470      1270
5     2960      2810      1730      1490
6     3460      3290      1960      1680
7     3930      3750      2170      1850
8     4380      4190      2350      2000

SPARCstation20

NCPU  75/1MB    60/1MB    50/1MB    50/noE$
1     638       530       402       290
2     1230      1020      782       563
3     0         0         0         0
4     0         0         1480      0

SPARCstation10

NCPU  50/1MB    40/1MB    40/noE$
1     338       284       204
2     657       552       397
3     0         0         0
4     1240      0         0

SPARCstation2, SPARCstation IPX

NCPU  40/64K
1     149
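
As a rough illustration of how the tables might be used, the following Python sketch converts between the relative (x.xx) scale and the older SCPM (xxxx) scale using the multiplier of 3899 given in the note, and computes how far a configuration falls from linear scaling. The function names are hypothetical, and the sample values are copied from the Enterprise 10000 400/8MB column of the SCPM table. Because table entries are rounded, converted values should be treated as approximate.

```python
# Hypothetical helpers for working with the tables; not part of any Sun tool.

SCALE = 3899  # multiplier from the note: x.xx * 3899 -> xxxx


def relative_to_scpm(relative):
    """Convert a relative (x.xx) value to the older SCPM (xxxx) scale."""
    return relative * SCALE


def scpm_to_relative(scpm):
    """Convert an SCPM (xxxx) value to the relative (x.xx) scale."""
    return scpm / SCALE


def scaling_efficiency(m_n, m_1, ncpu):
    """Fraction of ideal linear scaling achieved: M(n) / (n * M(1)).

    A value of 1.0 is perfectly linear; values above 1.0 are superlinear.
    """
    if ncpu <= 0 or m_1 <= 0:
        raise ValueError("ncpu and m_1 must be positive")
    return m_n / (ncpu * m_1)


if __name__ == "__main__":
    print(relative_to_scpm(1.00))  # 3899.0
    # Enterprise 10000, 400/8MB: M=3920 at 1 CPU, M=186000 at 64 CPUs.
    print(round(scaling_efficiency(186000, 3920, 64), 2))  # 0.74
```

Comparing `scaling_efficiency` across columns of the same table gives a quick view of how the assumed scalability varies with processor speed and cache size.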