Chapter 1. Quick Tips and Recipes

This book presents too much detail for you to absorb quickly, so, to start you off, I’ll first list a few recurring situations and frequently asked questions I have encountered, with references to the rest of the book for in-depth detail. If you are new to Solaris, some of these tips may be confusing. You can follow the cross-references to get a deeper explanation, or read on and return to the tips later.

For situations where you have no idea where to begin, I have also outlined a “cold start” procedure that should help you focus on problem areas. That section is followed by some performance-oriented configuration recipes for common system types.

Quick Reference for Common Tuning Tips

This list focuses primarily, but not exclusively, on servers running Solaris 2. It should help you decide whether you have overloaded the disks, network, available RAM, or CPUs.

The system will usually have a disk bottleneck.

In nearly every case the most serious bottleneck is an overloaded or slow disk. Use iostat -xn 30[1] to look for disks that are more than 5 percent busy and have average response times of more than 30 ms. The response time is mislabeled svc_t; it is the time between a user process issuing a read and the read completing (for example), so it is often in the critical path for user response times. If many other processes are accessing one disk, a queue can form, and response times of over 1000 ms (not a misprint, over one second!) can occur as you wait to get to the front of the queue. With careful configuration and a disk array controller that includes nonvolatile RAM (NVRAM), you can keep average response time under 10 ms. See “Load Monitoring and Balancing” on page 75 for more details.

[1] The -xn option is Solaris 2.6 specific; use -x in previous releases.
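
As an illustration of what to look for (a sketch only: the device name and all figures are invented, and the exact column layout varies between Solaris releases):

% iostat -xn 30
                        extended disk statistics
disk      r/s  w/s   Kr/s   Kw/s wait actv  svc_t  %w  %b
c0t0d0   12.0 28.5   96.0  228.0  0.0  1.4   34.5   0  38

Here, c0t0d0 is 38% busy (%b) with an average response time of 34.5 ms (svc_t), so it breaks both the 5 percent and 30 ms thresholds and is worth investigating.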

Increasing the inode cache size may help reduce the number of disk I/Os required to manage file systems; see “The Inode Cache and File Data Caching” on page 362. If you have a large memory configuration, the inode cache will already be big enough.

Keep checking iostat -xn 30 as tuning progresses. When a bottleneck is removed, the system may start to run faster, and as more work is done, some other disk will overload. At some point, you may need to stripe file systems and tablespaces over multiple disks.

Disks that contain UFS file systems will show high average service times when they are idle. This is caused by short bursts of updates from the filesystem flushing process described in “Idle Disks and Long Service Times” on page 186. This effect can safely be ignored, hence the 5% busy threshold mentioned above.

Poor NFS response times may be hard to see.

Waiting for a network-mounted file system to respond is not counted in the same way as waiting for a local disk. The system will appear to be idle when it is really in a network I/O wait state. Use nfsstat -m or (if you are running Solaris 2.6) iostat -xn to find out which NFS® server is likely to be the problem, go to it, and check its disk performance. You should look at the NFS operation mix with nfsstat on both the client and server and, if writes are common or the server’s disk is too busy, configure a Prestoserve or NVRAM at the server. A 10-Mbit Ethernet will be overloaded very easily; the network should be replaced with 100-Mbit Ethernet, preferably in switched full-duplex mode. See the SMCC NFS Server Performance and Tuning Guide on the Solaris SMCC Hardware AnswerBook® CD, or look for it on http://docs.sun.com.
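
To illustrate (a sketch: the mount point, server name, and timings are invented), nfsstat -m reports smoothed round-trip times (srtt) for each mount, which show whether a particular server is responding slowly:

% nfsstat -m
/home/fred from servhome:/export/home/fred
 Flags: vers=2,hard,intr,dynamic,rsize=8192,wsize=8192,retrans=5
 Lookups: srtt=7 (17ms), dev=4 (20ms), cur=2 (40ms)
 Reads:   srtt=16 (40ms), dev=8 (40ms), cur=6 (120ms)
 Writes:  srtt=19 (47ms), dev=3 (15ms), cur=5 (100ms)

A mount whose read and write times are consistently much worse than the others points at the server to investigate first.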

Avoid the common vmstat misconceptions.

When you look at vmstat, please don’t waste time worrying about where all the RAM has gone. After a while, the free list will stabilize at around 3% of the total memory configured[2]. The system stops bothering to reclaim memory above this level, even when you aren’t running anything. See “Understanding vmstat and sar Output” on page 320. You can also ignore the third “w” column, which is a count of how many idle processes are currently swapped out. Of course, you must also remember to ignore the first line output by vmstat.

[2] The actual level depends on the version of the operating system you are running; it may be fixed at a megabyte or less.

% vmstat 5 
procs memory page disk faults cpu
r b w swap free re mf pi po fr de sr f0 s0 s1 s5 in sy cs us sy id
0 0 0 9760 16208 0 4 6 2 5 0 0 0 141 35 19 149 898 99 6 2 92
0 0 12 212672 1776 0 1 3 0 0 0 0 0 1 0 0 105 140 50 0 0 99


Don’t panic when you see page-ins and page-outs in vmstat.

These activities are normal since all filesystem I/O is done by means of the paging process. Hundreds or thousands of kilobytes paged in and paged out are not a cause for concern, just a sign that the system is working hard.

Use page scanner “sr” activity as your RAM shortage indicator.

When you really are short of memory, the scanner will be running continuously at a high rate (over 200 pages/second averaged over 30 seconds). If it runs in separated high-level bursts and you are running Solaris 2.5 or earlier, make sure you have a recent kernel patch installed—an updated paging algorithm in Solaris 2.5.1 was backported to previous releases. See “Understanding vmstat and sar Output” on page 320.

Look for a long run queue (vmstat procs r).

If the run queue or load average is more than four times the number of CPUs, then processes end up waiting too long for a slice of CPU time. This waiting can increase the interactive response time seen by users. Add more CPU power to the system (see “Monitoring Processors” on page 229).
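
For example (a sketch with invented numbers), count the CPUs, then compare the count with the load average:

% psrinfo | wc -l
       2
% uptime
 10:45am  up 30 day(s),  3 users,  load average: 9.52, 8.61, 8.10

With two CPUs, a sustained load average above 8 (four times two) means processes are queuing too long for CPU time.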

Look for processes blocked waiting for I/O (vmstat procs b).

A blocked process is a sign of a disk bottleneck. If the number of blocked processes approaches or exceeds the number in the run queue, tune your disk subsystem. Whenever there are any blocked processes, all CPU idle time is treated as wait-for-I/O time! The vmstat command correctly includes wait for I/O in its idle value, but the wait-for-I/O component can be viewed separately with iostat or sar. If you are running database batch jobs, you should expect to have some blocked processes, but you can increase batch throughput by removing disk bottlenecks.

Check for CPU system time dominating user time.

If there is more system time than user time and the machine is not an NFS server, you may have a problem. NFS service is entirely inside the kernel, so system time will normally dominate on an NFS server. To find out the source of system calls, see “Tracing Applications” on page 155. To look for high interrupt rates and excessive mutex contention, see “Use of mpstat to Monitor Interrupts and Mutexes” on page 236.
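
As an illustration (a sketch: all the rates are invented), mpstat shows per-CPU interrupt (intr) and mutex stall (smtx) rates alongside the user/system split:

% mpstat 30
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0   12   0    5   460  300  890   50   30  120    0  3200   25  60   5  10
  1    8   0    2   220  100  780   40   28  110    0  2900   30  55   5  10

Here, system time dominates user time on both CPUs; smtx is the column to watch for mutex contention.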

Watch out for processes that hog the CPU.

Processes sometimes fail in a way that consumes an entire CPU. This type of failure can make the machine seem sluggish. Watch for processes that are accumulating CPU time rapidly when you don’t think they should be. Use ps or see “pea.se” on page 488. If you find that the system process fsflush is using a lot of CPU power, see the description of the kernel variables “tune_t_fsflushr and autoup” on page 339.

Cold Start Procedure

I see a lot of questions from users or administrators who have decided that they have a performance problem but don’t know where to start or what information to provide when they ask for help. I have seen email from people who just say “my system is slow” and give no additional information at all. I have also seen 10-megabyte email messages with 20 attachments containing days of vmstat, sar, and iostat reports, but with no indication of what application the machine is supposed to be running. In this section, I’ll lead you through the initial questions that need to be answered. This may be enough to get you on the right track to solving the problem yourself, and it will make it easier to ask for help effectively.

  1. What is the business function of the system?

    What is the system used for? What is its primary application? It could be a file server, database server, end-user CAD workstation, Internet server, or embedded control system.

  2. Who and where are the users?

    How many users are there, how do they use the system, and what kind of work patterns do they have? They might be a classroom full of students, people browsing the Internet from home, data entry clerks, development engineers, real-time data feeds, or batch jobs. Are the end users directly connected? From what kind of device?

  3. Who says there is a performance problem, and what is slow?

    Are the end users complaining, or do you have some objective business measure like batch jobs not completing quickly enough? If there are no complaints, then you should be measuring business-oriented throughput and response times, together with system utilization levels. Don’t waste time worrying about obscure kernel measurements. If you have established a baseline of utilization, business throughput, and response times, then it is obvious when there is a problem because the response time will have increased, and that is what drives user perceptions of performance. It is useful to have real measures of response times or a way to derive them. You may get only subjective measures—“it feels sluggish today”—or have to use a stopwatch to time things. See “Collecting Measurements” on page 48.

  4. What is the system configuration?

    How many machines are involved, what is the CPU, memory, network, and disk setup, what version of Solaris is running, what relevant patches are loaded? A good description of a system might be something like this: an Ultra2/2200, with 512 MB, one 100-Mbit switched duplex Ethernet, two internal 2-GB disks with six external 4-GB disks on their own controller, running Solaris 2.5.1 with the latest kernel, network device, and TCP patches.

  5. What application software is in use?

    If the system is just running Solaris services, which ones are most significant? If it is an NFS server, is it running NFS V2 or NFS V3? (This depends mostly upon the NFS clients.) If it is a web server, is it running Sun’s SWS, Netscape, or Apache (and which version)? If it is a database server, which database is it, and are the database tables running on raw disk or in filesystem tables? Has a database vendor specialist checked that the database is configured for good performance and indexed correctly?

  6. What are the busy processes on the system doing?

    A system becomes busy by running application processes; the most important things to look at are which processes are busy, who started them, how much CPU they are using, how much memory they are using, and how long they have been running. If you have a lot of short-lived processes, the only way to catch their usage is to use system accounting; see “Using Accounting to Monitor the Workload” on page 48. For long-lived processes, you can use the ps command or a tool such as top, proctool, or symon; see “Sun Symon” on page 35. A simple and effective summary is to use the old Berkeley version of ps to get a top-ten listing, as shown in Figure 1-1. On a large system, there may be many more than ten busy processes, so get all that are using significant amounts of CPU so that you have captured 90% or more of the CPU consumption by processes.

    Figure 1-1. Example Listing the Busiest Processes on a System
    % /usr/ucb/ps uaxw | head 
    USER PID %CPU %MEM SZ RSS TT S START TIME COMMAND
    adrianc 2431 17.9 22.63857628568 ? S Oct 13 7:38 maker
    adrianc 666 3.0 14.913073618848 console R Oct 02 12:28 /usr/openwin/bin/X :0
    root 6268 0.2 0.9 1120 1072 pts/4 O 17:00:29 0:00 /usr/ucb/ps uaxw
    adrianc 2936 0.1 1.8 3672 2248 ?? S Oct 14 0:04 /usr/openwin/bin/cmdtool
    root 3 0.1 0.0 0 0 ? S Oct 02 2:17 fsflush
    root 0 0.0 0.0 0 0 ? T Oct 02 0:00 sched
    root 1 0.0 0.1 1664 136 ? S Oct 02 0:00 /etc/init -
    root 2 0.0 0.0 0 0 ? S Oct 02 0:00 pageout
    root 93 0.0 0.2 1392 216 ? S Oct 02 0:00 /usr/sbin/in.routed -q


    Unfortunately, some of the numbers above run together: the %MEM field shows the RSS as a percentage of total memory. SZ shows the size of the process virtual address space; for X servers this size includes a memory-mapped frame buffer, and in this case, for a Creator3D the frame buffer address space adds over 100 megabytes to the total. For normal processes, a large SZ indicates a large swap space usage. The RSS column shows the amount of RAM mapped to that process, including RAM shared with other processes. In this case, PID 2431 has an SZ of 38576 Kbytes and RSS of 28568 Kbytes, 22.6% of the available memory on this 128-Mbyte Ultra. The X server has an SZ of 130736 Kbytes and an RSS of 18848 Kbytes.

  7. What are the CPU and disk utilization levels?

    How busy is the CPU overall, what’s the proportion of user and system CPU time, how busy are the disks, and which ones have the highest load? All this information can be seen with iostat -xc (iostat -xPnce in Solaris 2.6—think of “expense” to remember the new options). Don’t collect more than 100 samples, strip out all the idle disks, and set your recording interval to match the time span you need to instrument. For a 24-hour day, 15-minute intervals are fine. For a 10-minute period when the system is busy, 10-second intervals are fine. The shorter the time interval, the more “noisy” the data will be because the peaks are not smoothed out over time. Gathering both a long-term and a short-term peak view helps highlight the problem areas. One way to collect this data is to use the SE toolkit: a script I wrote, called virtual_adrian.se (see “The SymbEL Language” on page 505 and “virtual_adrian.se and /etc/rc2.d/S90va_monitor” on page 498), writes out to a text-based log whenever it sees part of the system (a disk or whatever) that seems to be slow or overloaded. A sample collection setup is sketched below.
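
    As a sketch of such a collection (the intervals and counts just encode the arithmetic above; note that the first iostat sample reports averages since boot, so ask for one extra sample and discard it):

    % iostat -xc 900 97 > iostat_day.log &      24 hours at 15-minute intervals
    % iostat -xc 10 61 > iostat_peak.log &      10 busy minutes at 10-second intervals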

  8. What is making the disks busy?

    If the whole disk subsystem is idle, then you can skip this question. The per-process data does not tell you which disks the processes are accessing. Use the df command to list mounted file systems, and use showmount to show which ones are exported from an NFS server; then, figure out how the applications are installed to work out which disks are being hit and where raw database tables are located. The swap -l command lists swap file locations; watch these carefully in the iostat data because they all become very busy with paging activity when there is a memory shortage.
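
    A minimal sketch of the commands involved (run showmount on the NFS server itself):

    % df -k                        list mounted file systems
    % /usr/sbin/showmount -e       list file systems exported by this NFS server
    % swap -l                      list swap file and partition locations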

  9. What is the network name service configuration?

    If the machine is responding slowly but does not seem to be at all busy, it may be waiting for some other system to respond to a request. A surprising number of problems can be caused by badly configured name services. Check /etc/nsswitch.conf and /etc/resolv.conf to see if DNS, NIS, or NIS+ is in use. Make sure the name servers are all running and responding quickly, and check that the system is routing properly over the network. A quick check sequence is sketched below.
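
    Here is one such sequence (a sketch: the name server address is invented):

    % grep hosts /etc/nsswitch.conf
    hosts:      files dns
    % cat /etc/resolv.conf
    nameserver 129.146.1.2
    % ping 129.146.1.2             make sure each name server is alive
    % netstat -rn                  make sure there is a valid default route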

  10. How much network activity is there?

    You need to look at the packet rate on each interface, the NFS client and server operation rates, and the TCP connection rate, throughput, and retransmission rate. One way is to run the following commands twice, separated by a defined time interval.

    % netstat -i; nfsstat; netstat -s

    Another way is to use the SE toolkit’s nx.se script that monitors the interfaces and TCP data along the lines of iostat -x.

    % se nx.se 10 
    Current tcp RtoMin is 200, interval 10, start Thu Oct 16 16:52:33 1997
    Name Ipkt/s Opkt/s Err/s Coll% NoCP/s Defr/s tcpIn tcpOut Conn/s %Retran
    hme0 212.0 426.9 0.00 0.00 0.00 0.00 65 593435 0.00 0.00
    hme0 176.1 352.6 0.00 0.00 0.00 0.00 53 490379 0.00 0.00

  11. Is there enough memory?

    When an application starts up, grows, or reads files, it takes memory from the free list. When the free list gets down to a few megabytes, the kernel decides which files and processes to steal memory from to replenish the free list. It decides by scanning pages, looking for ones that haven’t been used recently, and paging out their contents so that the memory can be put on the free list. If there is no scanning, then you definitely have enough memory. If there is a lot of scanning and the swap disks are busy at the same time, you need more memory. If the swap disks are more than 50% busy, you should make swap files or partitions on other disks to spread the load and improve performance while waiting for more RAM to be delivered. You can use vmstat or sar -g to look at the paging system (a sample run is sketched below), or virtual_adrian.se will watch it for you, using the technique described in “RAM Rule” on page 456.
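
    A sketch of what a real memory shortage looks like in sar -g (the rates are invented; compare pgscan/s with the 200 pages/second threshold given earlier):

    % sar -g 30 3

    16:52:33  pgout/s ppgout/s pgfree/s pgscan/s %ufs_ipf
    16:53:03     2.10    15.32    18.41   230.92     0.00
    16:53:33     1.87    14.10    16.88   215.50     0.00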

  12. What changed recently and what is on the way?

    It is always useful to know what was changed. You might have added a lot more users, or some event might have caused higher user activity than usual. You might have upgraded an application to add features or installed a newer version. Other systems may have been added to the network. Configuration changes or hardware “upgrades” can sometimes impact performance if they are not configured properly. You might have added a hardware RAID controller but forgotten to enable its nonvolatile RAM for fast write capability. It is also useful to know what might happen in the future. How much extra capacity might be needed for the next bunch of additional users or new applications?

Configuration and Tuning Recipes

The rest of this book gives you the information you need in order to understand a lot about how SPARC systems running Solaris work and about the basic principles involved in performance and tuning. Probably all you really want right now is to be told what to do or what to buy and how to set it up. This section just tells you what to do, with references to the rest of the book if you want to find out why. If you decide that you want to vary the recipes and do something different, you should really read the rest of the book first! For much more information on configuration issues, you should definitely get a copy of the book Configuration and Capacity Planning for Solaris Servers, by Brian Wong (Sun Press).

The intention behind these recipes is to provide situation-oriented advice, which gathers together information about how to use a system rather than focusing on a particular subsystem in a generic sense.

Single-User Desktop Workstation Recipe

Local Disks and NFS Mounts

My recommended configuration is to NFS-mount home, mail, and application program directories from one or more workgroup servers. Configure a single local disk to have two partitions, one for the operating system and one for swap space. It is easy to overload a single swap disk, so if you have more than one local disk, split any swap partitions or files evenly across all the disks (keep one clear for cachefs if necessary; see below).

Swap Space

Most application vendors can tell you how much swap space their application needs. If you have no idea how much swap space you will need, configure at least 128 Mbytes of virtual memory to start with. It’s easy to add more later, so don’t go overboard. With Solaris 2, the swap partition should be sized to top up the RAM size to get to the amount of virtual memory that you need[3]; e.g., 96-Mbyte swap with 32-Mbyte RAM, 64-Mbyte swap with 64-Mbyte RAM, no swap partition at all with 128 or more Mbytes of RAM. If your application vendor says a Solaris 2 application needs 64 Mbytes of RAM and 256 Mbytes of swap, this adds up to 320 Mbytes of virtual memory. You could configure 128 Mbytes of RAM and 192 Mbytes of swap instead. If you run out of swap space, make a swap file (I put them in /swap) or add more RAM. Older systems tend to run smaller applications, so they can get away with less virtual memory space. Later systems running the Common Desktop Environment (CDE) window system will need a lot more virtual memory space than systems running OpenWindows™.

[3] See “Virtual Memory Address Space Segments” on page 325 for a description of the unique Solaris 2 swap system.

File Systems and Upgrades

Make the rest of the disk into one big root partition that includes /usr, /opt, /var, and /swap. The main reason for doing this is to pool all the free space so that you can easily use upgrade or install to move up to the next OS release without running out of space in one of the partitions. It also prevents /var from overflowing and makes it easy to have a /swap directory to hold extra swap files if they are needed. In Solaris 2, /tmp uses the RAM-based tmpfs by default; the mount /tmp command should be uncommented in /etc/rc.local to enable it for SunOS 4.X.

Solaris 2 systems should be automatically installed from a JumpStart™ install server that includes a post-install script to set up all the local customizations. Since the disk can be restored by means of JumpStart and contains no local user files, it is never necessary to back it up over the network. A JumpStart install is much less frequent than a network backup, so its use aids good performance of the network. A useful tip to free up disk space for upgrades is to remove any swap files before you run the upgrade, then re-create the swap file afterwards, as shown below.

# swap -d /swap/swapfile
# rm /swap/swapfile
(comment out the swapfile entry in /etc/vfstab)
(shut down and run the upgrade)
# mkfile 100M /swap/swapfile
# swap -a /swap/swapfile
(add the swapfile entry back into /etc/vfstab)

Applications and Cachefs

If possible, NFS-mount the applications read-only to avoid the write-back of file access times, which is unwanted overhead. Configure the cache file system for all application code mount points. First, make /cache, then mount all the application directories, using the same cache. If you use large applications, check the application file sizes; if anything you access often is over 3 Mbytes, increase the maxfilesize parameter for the cache with cfsadmin.

Do not use cachefs for mail directories. It might be useful for home directories if most files are read rather than written—try it with and without to see. Cache loads when a large file is read for the first time can overload the disk. If there is more than one disk on the system, then don’t put any cache on the same disk as any swap space. Swap and cache are often both busy at the same time when a new application is started. The cache works best for files that don’t change often and are read many times by the NFS client.

If your application is very data-intensive, reading and writing large files (as often occurs with EDA, MCAD, and Earth Resources applications), you are likely to need an FDDI or 100-Mbit Fast Ethernet interface. If large files are written out a lot, avoid cachefs for that file system. Figure 1-2 shows how to set up and use cachefs.

Figure 1-2. Setting up Cachefs and Checking for Large Files
# cfsadmin -c /cache
# find /net/apphost/export/appdir -size +3000k -ls
105849 3408 -rwxr-xr-x 1 root bin 3474324 Mar 1 13:16 /net/apphost/export/appdir/SUNWwabi/bin/wabiprog
# cfsadmin -u -o maxfilesize=4 /cache
# cfsadmin -l /cache
cfsadmin: list cache FS information
  maxblocks     90%
  minblocks      0%
  threshblocks  85%
  maxfiles      90%
  minfiles       0%
  threshfiles   85%
  maxfilesize    4MB
# mount -F cachefs -o backfstype=nfs,cachedir=/cache apphost:/export/appdir /usr/appdir

Example Filesystem Table

The filesystem mount table shown in Figure 1-3 is for a system with a single local disk. Application program code is mounted read-only from apphost, using cachefs. Mail is mounted from mailhost. Home directories are automounted and so do not appear in this table. This system has a swap partition, and an additional swap file has been added. Direct automount mappings can be used to mount applications (including the cachefs options) and mail.

Figure 1-3. Sample /etc/vfstab for Workstation Recipe
#device             device              mount      FS       fsck  mount    mount
#to mount           to fsck             point      type     pass  at boot  options
/proc               -                   /proc      proc     -     no       -
fd                  -                   /dev/fd    fd       -     no       -
swap                -                   /tmp       tmpfs    -     yes      -
/dev/dsk/c0t3d0s0   /dev/rdsk/c0t3d0s0  /          ufs      1     no       -
/dev/dsk/c0t3d0s1   -                   -          swap     -     no       -
/swap/swapfile      -                   -          swap     -     no       -
apphost:/usr/dist   /cache              /usr/dist  cachefs  3     yes      ro,backfstype=nfs,cachedir=/cache
mailhost:/var/mail  -                   /var/mail  nfs      -     yes      rw,bg


Kernel Tuning

Since this setup is the most common, Solaris is already well tuned for it, so you don’t need to set anything.

Workgroup Server Recipe

Workgroup servers provide reliable file storage and email and printer support at a departmental level. In a Unix environment, NFS file services are provided; in a PC environment, Microsoft SMB, Novell NetWare, and Apple file service protocols need to be supported, commonly using Syntax TotalNet or Samba to provide the service. It has also become common to provide web services on the home file server, so that it also acts as a home page server for the users and the projects associated with the department. Network Computers such as Sun’s JavaStation™ need a boot server and backend application support that can be combined with the other functions of a workgroup server. If a sophisticated proxy caching web server hierarchy is implemented across the enterprise, then the workgroup server may also be the first-level cache.

This sounds like a complex mixture, but over the last few years, individual system performance has increased and several functions have coalesced into a single system. Solaris has several advantages over Windows NT-based workgroup servers in this respect. The stability and availability requirements of a server increase as more and more users and services depend upon it, and Solaris is far more robust than NT. Solaris can also be reconfigured and tuned without requiring rebooting, whereas almost every change to an NT server requires a reboot that adds to the downtime. NT has also picked up a bad reputation for poor performance when several workloads are run together on a system. Solaris not only time-shares many competing workloads effectively, it also scales up to far larger systems, so more work can be combined into a single system. Taking all this into account, a typical new Solaris workgroup server installation is likely to replace several diverse existing servers. Administrative and installation simplicity is important, the hardware can be configured to be resilient to common failures, and Solaris is well tested and reliable, so a single server is appropriate. The complexity and rigorous procedures needed to run a high-availability cluster will rule it out for all but the biggest installations.

Workload Mixture

File service is going to be the largest component of the workload. The basic Solaris email services are efficient enough to support many hundreds of users, and by the use of Solstice™ Internet Mail Server (SIMS), tens of thousands of users can be hosted on a single machine, so a workgroup environment will not stress the email system. The web server load will also be light, with occasional short bursts of activity.

NFS requests will always run in the kernel at a higher priority than that of user-level programs, so a saturated NFS server is unlikely to provide high performance for other uses. However, NFS is now so efficient and system performance is so high that it is, in practice, impossible to saturate even a small server. Serving PC-oriented protocols by using Samba is less efficient than NFS but is still comparable to a dedicated PC file server. A uniprocessor UltraSPARC™ server can saturate a 100-Mbit network with HTTP, NFS, and/or SMB traffic if there is enough demand from the clients.

RAM Requirements

Dedicated file and mail servers do not need much RAM. The kernel will be configured to be a little larger than normal, but the main, active user program will be sendmail. The rest of RAM is a cache for the file systems. Unix-based NFS clients do a great deal of caching, so they don’t usually ask for the same data more than once. Home directories and mail files are specific to a single user, so there will not often be multiple reads of the same data from different machines. Anything that is constantly reread from the server by multiple workstations is a prime candidate for setting up with cachefs on the clients.

NFS clients that are PCs running MS-Windows™ or MacOS™ put much less load on an NFS server than a Unix client does, but they require fast response times for short bursts of activity and do little caching themselves. It may be worth having extra memory on the server to try to cache requests to common files.

Allow 16 Mbytes of RAM for the kernel, printing, and other Solaris daemons, then 64 Mbytes for each fully loaded FDDI or 100-Mbit Ethernet being served. For example, a server feeding two fully loaded 100-Mbit networks would start at 16 + (2 x 64) = 144 Mbytes.

Write Acceleration

The problem of making writes go fast on file servers merits special consideration. As an example, consider a single NFS write. The server must update the inode block containing the last modified time, the indirect block (and possibly a double indirect block), and the data block itself. The inodes and indirect blocks may not be near the data block, so three or four random seeks may be required on the same disk before the server can confirm that the data is safely written. To accelerate this process, all the filesystem metadata writes (inodes and indirect blocks) are quickly logged somewhere as a sequential stream. The log is not read back very often because the original data is still in memory and it is eventually flushed to its proper location. If the system crashes, the data in memory is lost, and on reboot, the log is read to quickly restore the filesystem state.

There are several ways to implement the log.

  • The oldest implementation is a nonvolatile memory board known as a Prestoserve™ that intercepts synchronous filesystem writes. These boards are still available for use in small systems, but the SBus device driver can only handle systems that have a single SBus, so large servers are not supported.

  • The SPARCstation™ 10, 20, SPARCserver™ 1000, and SPARCcenter™ 2000 systems accept a special nonvolatile memory SIMM (NVSIMM) that uses a Prestoserve driver. A problem with this solution is that the data is stored inside a system, so if it is configured as a dual-failover, high-availability system, the logged data cannot be accessed when failover occurs. However, it is the fastest option for a single system.

  • The current generation of Ultra Enterprise server products does not have an NVSIMM option, and all the newest machines have a PCI bus rather than an SBus. The log is stored in the disk subsystem so it can be shared in an HA setup. The disk subsystem could include a controller with its own NVRAM (remember to enable fast writes on the SPARCstorage™ Array and similar products), or at the low end, a dedicated log disk should be used with a product such as Solstice DiskSuite™. The log disk can be used to accelerate several file systems, but it should be completely dedicated to its job, with no distractions and a small partition (no more than 100 Mbytes) to make sure that the disk heads never have to seek more than a few cylinders.

  • The current 7200 rpm disks have very high sequential data rates of about 10 Mbytes/s and handle logging well. You just have to ignore the unused few gigabytes of data space: don’t make a file system on it.
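
As a hedged sketch of the dedicated log disk approach with Solstice DiskSuite (the metadevice names and disk slices are invented), a trans metadevice pairs a master device with a small logging device:

# metainit d12 1 1 c1t2d0s0        master device holding the file system
# metainit d13 1 1 c2t0d0s0        small partition on the dedicated log disk
# metainit d10 -t d12 d13          trans metadevice combining master and log
# mount -F ufs /dev/md/dsk/d10 /export/home

The same logging device can be shared by several trans metadevices, which is how one log disk accelerates several file systems.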

The recommended disk configuration is to have a single, large root partition like the workstation recipe, with the exception of the /var directories. /var/mail should be on a separate disk partition because it needs to be accelerated. Mail programs on the NFS clients rewrite the entire mail file when the user saves changes; the time this rewrite takes is very noticeable, and a log disk or Prestoserve speeds it up a great deal. The /var file system should be big enough to hold a lot of mail (several megabytes per user account). You will need space in /var/spool for outgoing mail and printer jobs (at least 10 or 20 megabytes, sometimes much more). Home directories should be striped over several disks and configured with a log disk.

Network Configurations

A typical setup for a commercial workgroup is based on a 100-Mbit connection from a server to a network switch that feeds multiple, independent 10-Mbit networks with client PCs or Network Computers. Engineering workstations should be using 100-Mbit client connections in a switched full-duplex network infrastructure. Don’t skimp on network bandwidth if you care about performance—an Ultra 1/170 is fast enough to saturate a 100-Mbit connection. The server may need to use multiple 100-Mbit networks to feed all its clients. High-bandwidth connections can be provided by trunking, where the Quadruple Fast Ethernet (QFE) card aggregates its ports to behave like a 400-Mbit duplex connection.

An upcoming option is gigabit Ethernet. Sun’s initial product is a switch that takes a gigabit connection from a server and feeds multiple switched 100-Mbit ports to its clients. Some high-end sites are using 155-Mbit ATM cards in the client systems and 622-Mbit cards from the server to an ATM switch, or back-to-back ATM622 cards to provide a high-bandwidth link between a pair of servers. A benefit of ATM is that the default maximum IP packet size is 9 Kbytes when it is used as a LAN. This is far more efficient than the 1500-byte packets used by Ethernet, but it is only effective if there is an ATM connection all the way from the server to the clients.

CPU Loading for NFS Servers

A SuperSPARC™ can handle four or five 10-Mbit Ethernets. Each fully loaded 100-Mbit Ethernet or FDDI should have two SuperSPARC processors or one UltraSPARC to handle the network, NFS protocol, and disk I/O load.

Disk Configurations

Since an NFS lookup or read can involve two trips over the network from the client as well as a disk I/O, getting good perceived performance from the server requires a low-latency disk subsystem that averages better than 40 ms service time. Use Solstice DiskSuite or SPARCstorage Manager (VxVM) to stripe file systems so that the load is evenly balanced across as many independent disks as possible. You will get six times better performance from a stripe of six 4.3-Gbyte disks than you will get from one 23-Gbyte disk. For good performance, configure four to six disks in the stripe for each loaded 100-Mbit network. The data-intensive clients that need faster networks tend to do more sequential accesses and so get more throughput from the disks than the typical random access load. The logging file system supported by Solstice DiskSuite is especially useful with multi-gigabyte file systems because it avoids a time-consuming, full filesystem check.
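
A hedged sketch of such a stripe with Solstice DiskSuite (the metadevice name, disk slices, and the 64-Kbyte interlace are invented; tune the interlace to your workload):

# metainit d20 1 6 c1t0d0s2 c1t1d0s2 c1t2d0s2 c1t3d0s2 c1t4d0s2 c1t5d0s2 -i 64k
# newfs /dev/md/rdsk/d20
# mount -F ufs /dev/md/dsk/d20 /export/home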

Setting the Number of NFS Threads

In SunOS™ 4.X, each NFS daemon appears as a separate process, although the NFS daemons do all their work in the kernel. In Solaris 2, a single NFS daemon process and a number of kernel threads do basically the same job. Configure two threads per active client machine, or 32 per Ethernet. The default of 16 is suitable only for casual NFS use, and there is little overhead from having several hundred threads, even on a low-end server. Since kernel threads all use the same context, there is little thread switch overhead in either SunOS 4 or Solaris 2.

For example, a server with two Ethernets running SunOS 4 would need nfsd 64 to be set in /etc/rc.local and, when running Solaris 2, would need /usr/lib/nfs/nfsd -a 64 to be set in the file /etc/init.d/nfs.server (which is hard-linked to /etc/rc3.d/S15nfs.server).
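
A minimal sketch of making the Solaris 2 change (this assumes the stock startup script; find the nfsd line, edit the thread count, then restart the NFS service):

# grep nfsd /etc/init.d/nfs.server
        /usr/lib/nfs/nfsd -a 16
(change the 16 to 64 with an editor)
# /etc/init.d/nfs.server stop
# /etc/init.d/nfs.server start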

Kernel Tuning

NFS servers don’t often need a lot of RAM but do need large name lookup caches, which are sized automatically, based on the RAM size, in Solaris 2. The two main changes recommended are to make the inode and directory name lookup caches have at least 8,000 entries. See “Vnodes, Inodes, and Rnodes” on page 360 for more details.

The default size of both caches in Solaris 2 will be (RAM-2)*17+90. For a 64-Mbyte system, this size works out at 1144; for the 128-Mbyte server assumed in Figure 1-4, it is 2232, well below the recommended 8,000. If you have more than 512 Mbytes of RAM, the cache is big enough (see Figure 1-4).

Figure 1-4. Sample /etc/system Kernel Tuning Parameters for a 128-Mbyte Solaris 2 NFS Server
set ncsize=8000              for NFS servers with under 512 MB RAM 
set ufs:ufs_ninode=8000 for NFS servers with under 512 MB RAM

Database Server Recipe

I am not a database specialist. You should refer to the many good books on database performance tuning for each of the main database vendors. Here, I’ll give some general recommendations on system considerations that apply to most databases.

Workload Mixture

System performance scales to much higher levels than most databases require. Whereas in the past we would recommend that a separate system be used for each database, it is now more common to consolidate several workloads onto a single, large system. With a greater dependency on a smaller number of systems, configuring for high availability is more important. Failover and parallel database clusters are becoming much more common.

The two main categories of database usage are online transaction processing, such as order entry typified by the TPC-C benchmark, and complex analytical queries made against a data warehouse, as typified by the TPC-D benchmark. Take the time to read the detailed reports of tested configurations that are available on the TPC web site at http://www.tpc.org. A lot of effort goes into setting up very high performance configurations for those tests, and you can copy some of the techniques yourself.

RAM Requirements

Database servers need a lot of RAM. Each database and application vendor should provide detailed guidelines on how to configure the system. If you have no other guidance, I will suggest a starting point that has been used with Oracle®. For the database back end, allow 64 Mbytes of RAM for the kernel, Solaris daemons, and database backend processes. Then, allow another 2 Mbytes for each user if time-shared, or 512 Kbytes per user on a pure back end. Finally, add memory for the shared global area. As a worked example, a pure back end supporting 200 users with a 256-Mbyte shared global area would start at 64 + (200 x 0.5) + 256 = 420 Mbytes.

Log-Based and Direct I/O File Systems

Use the log-based UFS file system supported by Solstice DiskSuite if your database tables must be stored in UFS file systems rather than on raw disk. Oracle, Informix, and Sybase work best with raw disk, but other databases such as Ingres and Progress have to use file systems. With Solaris 2.6, there is a direct I/O option that can be used to provide raw access to a file in the file system. This option is not as good as raw but can greatly reduce memory demands.
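
A hedged sketch of the Solaris 2.6 option (the device and mount point are invented): the forcedirectio mount option bypasses the filesystem page cache for that mount.

# mount -F ufs -o forcedirectio /dev/md/dsk/d30 /oradata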

Network Loading for Database Servers

The network activity caused by SQL is so application dependent that it is hard to give general guidelines. You will need to do some work with the snoop command or a network analyzer on a live system to work out the load. Some applications may work well over a dial-up modem, whereas others will need 100-Mbit networks or may only work well when the client and server are on the same machine.

CPU Loading

CPU loading cannot be estimated easily; any guidelines provided in this book would lead to a mixture of overconfigured and underconfigured systems. Database sizing involves too many variables and is outside the scope of this book. I recommend that you read Brian Wong’s book Configuration and Capacity Planning for Solaris Servers, which covers the subject in depth. Sizing data for common application and database combinations is being generated within Sun for use by systems engineers and resellers. Database vendors often place restrictions on the publication of performance-related information about their products.

Disk Configurations

Getting good, perceived performance from the server requires a low-latency disk subsystem. For good write latency, NVRAM is normally configured in the disk controller subsystem. Use Solstice DiskSuite, SPARCstorage Manager, or a hardware RAID system such as Sun’s RSM2000 to stripe file systems so that the load is evenly balanced across as many independent disks as possible. You will get six times better performance from a stripe of six 4.3-Gbyte disks than you will from one 23-Gbyte disk. Extremely large disk configurations are now common, and several thousand disks may be connected to a single system. Make sure you are familiar with the new iostat options in Solaris 2.6, described in “Output Formats and Options for iostat” on page 183. You can now get an inventory of the disk configuration and look for error counts on a drive-by-drive basis, as sketched below.
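
For example (a sketch: the drive details shown are invented), the iostat -E option gives the per-drive inventory and error counts:

% iostat -E
sd0      Soft Errors: 0 Hard Errors: 2 Transport Errors: 0
Vendor: SEAGATE  Product: ST34371W  Revision: 0484  Serial No: 00123456
Size: 4.29GB <4292075520 bytes>
Media Error: 2 Device Not Ready: 0 No Device: 0 Recoverable: 0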

Setting the Shared Memory Size

Shared memory size is often set too small. On a database using raw disks, the size needs to be set higher than on a system with UFS. The effect of UFS is to provide additional data buffering and a duplicate copy of the data in shared memory. This improves read performance and can help sequential table scan rates when UFS prefetching is more aggressive than the database’s own prefetching. The drawback is that more RAM and CPU time are used. When running raw, you can choose to run with less caching and use less RAM, or you can go for higher performance by using a much bigger shared memory area and similar total RAM usage. As a first approximation, use half of your total main memory as shared memory, then measure the database to see if it is oversized or too small and adjust as necessary. For example, the shmmax limit of 268435456 bytes in Figure 1-5 allows a 256-Mbyte shared memory area, which matches a 512-Mbyte system using half its memory.

Kernel Tuning

Databases tend to use lots of shared memory and semaphore settings. These do not affect performance; as long as the shared memory and semaphore settings are big enough, the programs will run. Each database vendor supplies its own guidelines. See “tune_t_fsflushr and autoup” on page 339 for advice on tuning the fsflush daemon. Figure 1-5 presents an example.

Figure 1-5. Example /etc/system Entries for a Database Server
* example shared memory settings needed for database 
set shmsys:shminfo_shmmax=268435456
set shmsys:shminfo_shmmni=512
set shmsys:shminfo_shmseg=150
set semsys:seminfo_semmap=350
set semsys:seminfo_semmni=350
set semsys:seminfo_semmns=1000
set semsys:seminfo_semmnu=700
set semsys:seminfo_semume=100
* keep fsflush from hogging a CPU
set autoup=240

Multiuser Server with ASCII or X Terminals Recipe

There is little difference in kind between dumb ASCII terminals, proprietary graphics terminals connected over serial ports, IBM 3270 terminals connected over SNA, and X terminals. The terminal understands a fixed, low-level display protocol and has varying amounts of built-in functionality, but all application processing is done on time-shared multiuser servers.

What’s the Difference between Client/Server and Time-shared Configurations?

The term client/server is sometimes used to describe a time-shared system with X terminals. Personally, I don’t like this use of the term because I think it is misleading. The primary extra capability of an X terminal over other types of terminals is that it can make direct connections to many servers at the same time. As an example, consider upgrading a time-shared server by replacing ASCII terminals with X terminals running the same application in a terminal emulator window. There is still no separate client processing going on. An upgrade to client/server would be to have users running part or all of the application on a SPARCstation or PC on their desk, with an application-specific protocol linking them to a database or an NFS server back end.

Performance Is Usually a Problem on Time-shared Systems

From a performance point of view, time-shared systems are usually a source of problems. Part of the problem is that Unix assumes that its users will be well behaved and has few ways to deal with users or programs that intentionally or accidentally take an unfair share of the system. It is sometimes known as a denial-of-service attack when a user tries to consume all the CPU, RAM, disk space[4], or swap space, or to overflow kernel tables. Even unintentional overload can cause serious problems. If, instead, you can configure a client/server system where users get their own SPARCstation or PC to use and abuse, then it is much harder for one user to affect the performance seen by the rest of the user community.

[4] If this is a problem, then you can use the standard BSD Unix disk quotas system in Solaris.

The Softway ShareII resource management system has been ported to Solaris 2. This product was developed to solve the problem by allocating fair shares of system resources to users and groups of users. See http://www.softway.com.au for more details.

Another problem occurs when applications that were developed for use on high-end SPARCstations are installed on a server and used via X terminals. If the application runs well on a low-powered machine like a SPARCstation 5, then sharing a more powerful machine makes sense. If the application is normally used on an Ultra 2, then don’t expect many copies to run simultaneously on an X-terminal server.

X terminals work very well in a general office environment where most users spend a small proportion of their time actually working at the X terminal. X terminals don’t work well if all the users are active all the time. Try to avoid configuring data entry or telephone sales sweatshops or student classrooms full of X terminals; the backend system needed to support them will often be large and expensive or seriously underpowered. Some software licensing practices can make a single, large system with X terminals cheaper than lots of smaller systems. Hopefully, more software vendors will convert to floating per-user licenses for software.

There is a large movement in the industry to replace all kinds of terminals with Network Computers running Java, so old-style terminals are beginning to die out.

Internet and Java Server Recipes

The whole subject of Internet and intranet web servers, proxy caching web servers, and servers for Java-based applications is covered in detail in Chapter 4, “Internet Servers,” and Chapter 5, “Java Application Servers.” The simplest sizing recipe is based on two key things. The first is that the minimum baseline for good web server performance is Solaris 2.6 and an up-to-date, efficient web server. The second is that, given efficient software, it will be easy to saturate your network, so base all your system sizing on how much network traffic your switches, backbones, and wide-area links can handle.