Let's begin our discussion of network I/O monitoring by revisiting our old standby, netstat, which displays overall network statistics. Probably one of the most common commands you will type is netstat -in:
Network — The actual network address to which the interface connects
Address — Media Access Control (MAC) or IP address
Ipkts — Total number of packets received by the interface
Ierrs — Number of errors reported back from the interface
Opkts — Number of packets transmitted from the interface
Oerrs — Number of error packets transmitted from the interface
Coll — Number of collisions on the adapter (if you're using Ethernet, you won't see anything here)
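As a rough illustration of how the Ipkts, Ierrs, Opkts, and Oerrs columns can be put to work, the sketch below turns them into error percentages per interface. The function name iface_error_rates is hypothetical, and the parsing assumes the five counters are the last five columns of each row, in the order listed above. Check it against the output on your own system before relying on it.

```shell
# Hypothetical helper: reads "netstat -in" style text on stdin and prints
#   <interface> <network> <input error %> <output error %>
# Assumption: the last five columns are Ipkts Ierrs Opkts Oerrs Coll.
iface_error_rates() {
  awk '$1 != "Name" && NF >= 7 {
    ipkts = $(NF-4); ierrs = $(NF-3)
    opkts = $(NF-2); oerrs = $(NF-1)
    # Guard against division by zero on idle interfaces.
    ipct = (ipkts > 0) ? 100 * ierrs / ipkts : 0
    opct = (opkts > 0) ? 100 * oerrs / opkts : 0
    printf "%s %s %.3f %.3f\n", $1, $3, ipct, opct
  }'
}

# On a live system (not run here): netstat -in | iface_error_rates
```

Counting columns from the right means a row with an empty Address column is still parsed correctly.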
Another handy netstat flag is -m. This option lets you view the kernel memory allocation statistics, including mbuf memory requests (and buffer size), amount of memory in use, and failures by CPU:
The entstat output provides a potpourri of information. You won't see many collisions because you'll probably be working in a switched environment. Look for transmit errors, and make sure they're not increasing too fast.
You need to learn to troubleshoot collision and error problems before you even begin to think about tuning. As an alternative, you can use netstat -v, which provides similar information.
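One way to judge whether transmit errors are "increasing too fast" is to sample the counter twice and look at the rate of change. The sketch below is only an illustration: the helper names tx_errors and tx_error_rate are hypothetical, and the parsing assumes the entstat report contains a line beginning with "Transmit Errors:" followed by the count.

```shell
# Hypothetical helper: extract the transmit error count from entstat-style
# text on stdin. Assumes a line of the form "Transmit Errors: N ...".
tx_errors() {
  sed -n 's/^Transmit Errors: *\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: errors per second between two samples.
# Usage: tx_error_rate BEFORE AFTER SECONDS
tx_error_rate() {
  awk -v b="$1" -v a="$2" -v s="$3" \
    'BEGIN { if (s > 0) printf "%.2f\n", (a - b) / s; else print "0.00" }'
}

# On a live system (not run here; the adapter name ent0 is an assumption):
#   before=$(entstat ent0 | tx_errors); sleep 60
#   after=$(entstat ent0 | tx_errors)
#   tx_error_rate "$before" "$after" 60
```

A steady nonzero rate across several intervals is more telling than a single large counter value, which may simply reflect a long uptime.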
The netpmon command reports information about CPU usage as it relates to the network. It also provides data about network device driver I/O, Internet socket calls, and various other statistics.
Similar to its other trace brethren, tprof and filemon, netpmon starts a trace and runs in the background until you stop it with the trcstop command. I like netpmon because it really gives you a detailed overview of network activity and also captures data for trending and analysis (although it's not as useful as nmon for the latter purpose). In the following example, we'll use a trace buffer size of 2 million bytes:
As you can see, little overall network I/O activity was going on during this time. The top section of the output is most important. It helps you gain an understanding of which processes are eating up network I/O time.
The lsattr command, which we used in Chapter 13 to view hardware parameters, is another tool you'll use frequently to display the settings of your interfaces. The attributes reported by this command are configured using either the chdev or the no command. Let's display the driver parameters using lsattr:
# lsattr -El en0
alias4                IPv4 Alias including Subnet Mask           True
alias6                IPv6 Alias including Prefix Length         True
arp           on      Address Resolution Protocol (ARP)          True
authority             Authorized Users                           True
broadcast             Broadcast Address                          True
mtu           1500    Maximum IP Packet Size for This Device     True
netaddr               Internet Address                           True
netaddr6              IPv6 Internet Address                      True
netmask               Subnet Mask                                True
prefixlen             Prefix Length for IPv6 Internet Address    True
remmtu        576     Maximum IP Packet Size for REMOTE Networks True
rfc1323               Enable/Disable TCP RFC 1323 Window Scaling True
security      none    Security Level                             True
state         detach  Current Interface Status                   True
tcp_mssdflt           Set TCP Maximum Segment Size               True
tcp_nodelay           Enable/Disable TCP_NODELAY Option          True
tcp_recvspace         Set Socket Buffer Space for Receiving      True
tcp_sendspace         Set Socket Buffer Space for Sending        True
Sometimes, I also like to use the spray command to troubleshoot possible problems (although this command is often blocked because it's not very secure). The spray command sends a one-way stream of packets from your host to the remote host and reports the number of packets dropped as well as the transfer rate:
# /usr/etc/spray lpar8test -c 2000 -l 1400 -d 1
sending 2000 packets of length 1402 to lpar8test ...
        34 packets (1.700%) dropped by lpar8test
        23667 packets/second, 33181234 bytes/second
In the preceding example, 2,000 packets were sent to the lpar8test host, with a delay of one microsecond between packets. Each packet consisted of 1,400 bytes.
Before using spray, make sure the sprayd daemon isn't commented out in /etc/inetd.conf (it is commented out by default in AIX), and don't forget to refresh inetd. A substantial number of dropped packets is a sign of trouble, either on the network or on the remote host, and is worth investigating further.
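If you run spray regularly, it can help to pull the drop percentage out of the report and compare it with a limit. The following is a minimal sketch under stated assumptions: the helper names spray_drop_pct and drops_exceed are hypothetical, the parsing relies on the "(N%) dropped by" wording shown in the sample output above, and the 1 percent limit is an arbitrary illustration rather than a recommended threshold.

```shell
# Hypothetical helper: extract the drop percentage from spray output on stdin.
# Assumes the report contains text of the form "(1.700%) dropped by <host>".
spray_drop_pct() {
  sed -n 's/.*(\([0-9.][0-9.]*\)%) dropped by.*/\1/p' | head -n 1
}

# Hypothetical helper: succeeds (exit 0) when PCT is greater than LIMIT.
# Usage: drops_exceed PCT LIMIT
drops_exceed() {
  awk -v p="$1" -v t="$2" 'BEGIN { exit !(p > t) }'
}

# On a live system (not run here):
#   pct=$(/usr/etc/spray lpar8test -c 2000 -l 1400 -d 1 | spray_drop_pct)
#   drops_exceed "$pct" 1.0 && echo "drop rate ${pct}% exceeds limit"
```

For the sample run above, 34 dropped packets out of 2,000 is the 1.700 percent that spray reports.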
14.2. Monitoring NFS
This section covers the use of the nmon, topas, nfsstat, nfs, nfs4cl, and netpmon commands to monitor the Network File System (NFS). For NFS tuning, you could use a tool such as topas or nmon initially because these commands provide a nice dashboard view of what is happening in your system. Remember that NFS performance problems might not be related to your NFS subsystem at all; your bottleneck could be on the network or, from a server perspective, related to CPU or disk I/O. Running a tool such as topas or nmon can quickly help you get a sense of what the real issues are.
Consider a system that has two CPUs and is running AIX 5.3 TL_6. The report in Figure 14.1 shows nmon output from an NFS perspective.
Figure 14.1. NFS nmon output
Look at all the information that is available to you from an NFS (client and server) perspective using nmon! There are no current bottlenecks at all on this system.
Although topas has improved recently with its ability to capture data, nmon might still be a better first choice. While topas provides a front end similar to nmon, nmon is more useful in terms of long-term trending and analysis.
14.3. nfsstat
The nfsstat tool is arguably the most important tool you'll work with as you monitor your network. This command displays all types of information about NFS and remote procedure calls (RPCs). You can use nfsstat as a monitoring tool to troubleshoot problems and also employ it for performance tuning.
Depending on the flags you use, you can have nfsstat display NFS client or server information. The command can also show the actual usage count of file system operations. This detail helps you understand exactly how each file system is utilized, so that you can know how to best tune your system. Look at the client flag (-c) first.
As you can tell, in this example no file systems are mounted using NFS Version 4, only NFS Version 3.
Unlike the vast majority of performance tuning commands, nfs4cl can also be used to tune your system. You do this by using the setfsoptions subcommand to tune NFS Version 4. Another parameter you can tune is the previously mentioned timeo, which specifies the timeout value for the RPC calls to the server.
14.5. netpmon and NFS
The netpmon command can also help you troubleshoot NFS bottlenecks. In addition to monitoring many other types of network statistics, netpmon monitors NFS activity. For clients, it monitors both read and write subroutines and NFS RPC requests. For servers, it monitors read and write requests. The command starts a trace and runs in the background until you stop it.
First, let's kick off the trace:
# netpmon -T 3000000 -o /tmp/nfrss.out
You run the trcstop command to signal the end of the trace, as the following message informs you:
# Sun Oct 7 07:06:14 2007
System: AIX 5.3 Node: lpar24ml162f_pub Machine: 00C22F2F4C00
Run trcstop command to signal end of trace.
call times (msec): avg 1.408 min 0.274 max 979.611 sdev 21.310
COMBINED (All Servers)
calls: 5602
call times (msec): avg 1.408 min 0.274 max 979.611 sdev 21.310
In this case, you can see the NFS Version 3 client statistics by server.
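When a report contains many of these "call times" entries, a quick way to spot outliers is to compare the maximum with the average. The sketch below is illustrative only: the helper name call_time_spread is hypothetical, and the parsing assumes the "call times (msec): avg ... min ... max ... sdev ..." layout shown above.

```shell
# Hypothetical helper: for each "call times (msec):" line on stdin, print the
# average, the maximum, and the max-to-average ratio (rounded).
call_time_spread() {
  awk '/call times \(msec\):/ {
    avg = 0; max = 0
    # Find the labelled values rather than relying on fixed positions.
    for (i = 1; i < NF; i++) {
      if ($i == "avg") avg = $(i + 1)
      if ($i == "max") max = $(i + 1)
    }
    if (avg > 0) printf "avg=%s max=%s ratio=%.0f\n", avg, max, max / avg
  }'
}

# On a live system (not run here): call_time_spread < /tmp/nfrss.out
```

In the sample above, the slowest call took roughly 700 times the average, which suggests that a handful of slow calls, rather than uniformly slow service, accounts for the large standard deviation.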
Although netpmon is a useful trace utility, its performance overhead can sometimes outweigh its benefits, particularly when you have other ways to obtain similar information. So be aware of this consideration when using this utility.
14.6. Monitoring Network Packets
Earlier, I addressed some of the very basic flags, such as -in, that you typically use with the netstat command. Using netstat, you can also monitor more detailed information about the packets themselves. For example, the -D option reports the overall number of packets received, transmitted, and dropped in your communications subsystem. The command output sorts the results by device, driver, and protocol:
There are actually so many different ways to use netstat that the best place to start is to look at the man page for netstat and go from there. Don't be afraid to run these commands, because they won't eat up disk space or affect performance.
14.7. iptrace, ipreport, and ipfilter
The tracing tools provided within AIX are used to record detailed information about packets. Use these commands with more caution.
The tools are extremely helpful when you're trying to determine the root cause of network performance problems. Check out iptrace and ipreport first. The iptrace command records all packets received from the network interfaces. The ipreport command formats the data generated from iptrace into a readable trace report. You can also use the ipfilter command to sort the output file created from ipreport.
Let's try starting the trace and running it for one minute:
# /usr/sbin/iptrace -a -i en0 iptrace [1] 7375
# [774252 [1] + Done /usr/sbin/iptrace -a -i en0 iptrace.out
As you can imagine, the trace file can become very large fairly quickly. The file for this example grew to 40 MB in less than a minute! Be very careful when running these traces because you'll run out of disk space really fast if you don't have enough free space for these files.
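Before starting a longer capture, it is worth doing the arithmetic on how large the file could get. The sketch below assumes a steady growth rate, such as the roughly 40 MB per minute seen in this example. The helper names projected_trace_mb and trace_fits, and the default 20 percent safety margin, are hypothetical choices for illustration.

```shell
# Hypothetical helper: projected trace size in MB, given a growth rate in
# MB per minute and a duration in minutes (integer arithmetic only).
projected_trace_mb() {
  echo $(( $1 * $2 ))
}

# Hypothetical helper: succeeds only if the projected size fits within the
# free space (in MB) after reserving a safety margin (default 20 percent).
# Usage: trace_fits RATE_MB_PER_MIN MINUTES FREE_MB [MARGIN_PCT]
trace_fits() {
  rate=$1; minutes=$2; free_mb=$3; margin_pct=${4:-20}
  needed=$(projected_trace_mb "$rate" "$minutes")
  usable=$(( free_mb * (100 - margin_pct) / 100 ))
  [ "$needed" -le "$usable" ]
}

# Example: a 10-minute trace at 40 MB per minute needs about 400 MB.
projected_trace_mb 40 10
```

The free-space figure would come from your own check of the target file system. Real traffic is bursty, so treat the projection as a lower bound rather than a guarantee.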
You can also start the trace using the System Resource Controller (SRC).
14.8. tcpdump
What about tcpdump? This command prints the headers of the packets that are captured for each network interface card (NIC). One important difference with tcpdump is that, unlike iptrace, it can look at only one network interface at a time. And because iptrace examines the entire packet from the kernel space, its results can include lots of dropped packets. With tcpdump, you can limit the amount of data to be traced. Also, you don't need to use an ipreport type of command to format the binary data because tcpdump performs both the trace and the output.