As heavy users of the LAMP stack for our applications, we inevitably run into systems that do not perform as expected. One of our webservers (part of an application cluster) often shows load spikes that seem unrelated to the actual traffic on the machine. For example, it may be serving only 80 httpd requests while the load average sits at 8 or 9. So how do we begin to identify where the bottleneck is? Typically it comes down to one of two places: CPU or disk I/O. Two tools help us tell which one: vmstat and iostat.
On this particular machine, ‘vmstat 5’ shows this:
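Before reaching for vmstat, it can help to put the load average in terms of CPU cores, since a load of 8 means something very different on a 2-core box than on a 16-core one. Here is a minimal Python sketch, assuming a Unix-like host. The helper name `load_per_core` and the 1.0 cutoff are our own illustrative choices, not anything from the machines described here. Keep in mind that on Linux the load average also counts tasks blocked in uninterruptible I/O wait, so a high ratio alone does not say whether CPU or disk is to blame.

```python
import os


def load_per_core(load, cores):
    """Return the load average divided by the number of CPU cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load / cores


if __name__ == "__main__":
    one_min, five_min, fifteen_min = os.getloadavg()  # Unix only
    cores = os.cpu_count() or 1
    ratio = load_per_core(one_min, cores)
    print(f"1-min load {one_min:.2f} on {cores} cores = {ratio:.2f} per core")
    if ratio > 1.0:  # rule-of-thumb cutoff, an assumption
        print("More runnable or I/O-blocked tasks than cores; check vmstat and iostat next.")
```

A ratio well above 1 is only a hint that something is queuing; the tools that follow show what.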
[user@host:~] vmstat 5
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b swpd    free   buff   cache   si   so    bi    bo    in   cs us sy id wa st
 2  0   40 1327880 676512 4689568    0    0     2    90     1    0 24  3 69  1  3
 0  0   40 1330500 676516 4689696    0    0     0   135  1384  356 36  7 53  0  4
 0  0   40 1328756 676516 4689768    0    0     0   406  1517  454 35  5 56  0  5
 0  0   40 1317160 676516 4689904    0    0     0  1058  1928  590 27  5 64  1  3
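Note the ‘us’ column: user CPU time is 24, 36, 35, and 27%, and the ‘id’ (idle) column never drops below 53%, so this machine is not currently CPU bound. (The first row of vmstat output is an average since boot; the rows after it are the live 5-second samples.)
Here is another machine that is currently under high load. Note the ‘us’ column: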
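Reading the columns by eye works for a few rows, but for a longer capture it is handy to let a script do the averaging. The sketch below parses saved vmstat output by column name, so it does not care whether the ‘wa’ and ‘st’ columns are present, and it averages ‘us’, ‘sy’ and ‘id’ while skipping the first since-boot row. The function names `parse_vmstat` and `cpu_summary` are hypothetical helpers of ours, not part of any tool.

```python
def parse_vmstat(text):
    """Parse `vmstat N` output into a list of dicts keyed by column name."""
    rows, header = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":  # the column-name line, e.g. "r b swpd ..."
            header = fields
        elif header and len(fields) == len(header) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(header, map(int, fields))))
    return rows


def cpu_summary(rows, skip_first=True):
    """Average us/sy/id over the samples, optionally dropping the since-boot row."""
    samples = rows[1:] if skip_first and len(rows) > 1 else rows
    if not samples:
        raise ValueError("no vmstat samples found")
    return {k: sum(r[k] for r in samples) / len(samples) for k in ("us", "sy", "id")}


# The capture from the first machine, with whitespace collapsed.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
2 0 40 1327880 676512 4689568 0 0 2 90 1 0 24 3 69 1 3
0 0 40 1330500 676516 4689696 0 0 0 135 1384 356 36 7 53 0 4
0 0 40 1328756 676516 4689768 0 0 0 406 1517 454 35 5 56 0 5
0 0 40 1317160 676516 4689904 0 0 0 1058 1928 590 27 5 64 1 3
"""

if __name__ == "__main__":
    for name, value in cpu_summary(parse_vmstat(SAMPLE)).items():
        print(f"{name}: {value:.1f}%")
```

Because the parser keys on the header row rather than on fixed positions, the same code should also cope with older vmstat layouts that have a different set of columns.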
[user@otherhost root]# vmstat 5
   procs                      memory      swap          io      system      cpu
 r  b  w   swpd   free   buff  cache   si   so    bi    bo    in    cs us sy id
 6  1  1 697600  63072 647548 932252    0    0    32   238  1332  1274 83  5 12
17  1  1 695724  82228 647560 932476    0    0    22   193  1355  1225 91  5  4
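This machine looks to be CPU bound: user time is 83 to 91%, idle has dropped to 12% and then 4%, and the run queue (‘r’) shows 6 and then 17 runnable processes.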
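The ‘r’ column is worth a second look, because a run queue that stays well above the number of cores is a strong sign of CPU saturation. A small Python sketch of that comparison follows. The function name `run_queue_per_core` is ours, and the core count of 2 is purely a placeholder, since we have not stated how many CPUs this machine has.

```python
def run_queue_per_core(r_values, cores):
    """Average vmstat 'r' (runnable processes) divided by the core count."""
    if cores < 1 or not r_values:
        raise ValueError("need at least one sample and one core")
    return sum(r_values) / len(r_values) / cores


if __name__ == "__main__":
    r_samples = [6, 17]   # 'r' column from the two samples above
    assumed_cores = 2     # hypothetical: the real core count is not given here
    ratio = run_queue_per_core(r_samples, assumed_cores)
    print(f"{ratio:.2f} runnable processes per core")
```

Anything consistently above 1 per core means processes are waiting for CPU time rather than for disk.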
Moving on to disk, let’s look at the first machine:
[mklatsky@adweb:~] iostat -dx 5
Linux 2.6.18-xenU-ec2-v1.0 (host) 04/15/2009
Device: rrqm/s wrqm/s   r/s    w/s rsec/s  wsec/s avgrq-sz avgqu-sz  await svctm %util
sdb       0.01  21.91  0.35  17.48   7.33  315.10    18.08     0.43  23.85  1.31  2.34
sdc       0.00   0.00  0.00   0.00   0.00    0.00    30.03     0.00   2.45  2.07  0.00
sda1      0.01   3.37  0.08   2.38   2.18   46.08    19.59     0.18  72.80  1.83  0.45

Device: rrqm/s wrqm/s   r/s    w/s rsec/s  wsec/s avgrq-sz avgqu-sz  await svctm %util
sdb       0.00  27.00  0.00  12.40   0.00  315.20    25.42     0.09   7.50  0.29  0.36
sdc       0.00   0.00  0.00   0.00   0.00    0.00     0.00     0.00   0.00  0.00  0.00
sda1      0.00   4.20  0.00   1.00   0.00   41.60    41.60     0.00   0.00  0.00  0.00
(Yes, it’s a Xen instance.)
Note the ‘%util’ column: disk utilization is minimal (under 3% on every device), which is good. But this system is not currently under load, so we’ll need to rerun iostat during a load spike before ruling out disk I/O.
This is the second system, still under load:
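Since the interesting numbers only show up during a spike, one option is to log iostat to a file and scan it afterwards. The sketch below parses saved ‘iostat -dx’ output into one dictionary per report and lists any device whose %util crosses a threshold. The names `parse_iostat` and `busy_devices` are hypothetical, and the 80% default is an arbitrary cutoff we picked for illustration. As with vmstat, the first report is an average since boot, so the later reports are the ones to watch.

```python
def parse_iostat(text):
    """Parse `iostat -dx` output into a list of reports: {device: {column: value}}."""
    reports, header = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].rstrip(":") == "Device":  # start of a new report
            header = fields[1:]
            reports.append({})
        elif header and len(fields) == len(header) + 1:
            try:
                values = [float(f) for f in fields[1:]]
            except ValueError:
                continue  # not a data row
            reports[-1][fields[0]] = dict(zip(header, values))
    return reports


def busy_devices(report, threshold=80.0):
    """Devices whose %util is at or above the threshold (80 is an assumed cutoff)."""
    return sorted(dev for dev, cols in report.items() if cols["%util"] >= threshold)


# The capture from the first machine, with whitespace collapsed.
SAMPLE = """\
Linux 2.6.18-xenU-ec2-v1.0 (host) 04/15/2009
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sdb 0.01 21.91 0.35 17.48 7.33 315.10 18.08 0.43 23.85 1.31 2.34
sdc 0.00 0.00 0.00 0.00 0.00 0.00 30.03 0.00 2.45 2.07 0.00
sda1 0.01 3.37 0.08 2.38 2.18 46.08 19.59 0.18 72.80 1.83 0.45
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sdb 0.00 27.00 0.00 12.40 0.00 315.20 25.42 0.09 7.50 0.29 0.36
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda1 0.00 4.20 0.00 1.00 0.00 41.60 41.60 0.00 0.00 0.00 0.00
"""

if __name__ == "__main__":
    for number, report in enumerate(parse_iostat(SAMPLE), start=1):
        print(f"report {number}: busy devices = {busy_devices(report)}")
```

On the idle capture above no device comes anywhere near the cutoff, which matches what we see by eye.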
[root@draco root]# iostat -dx 5
Linux 2.4.9-e.65smp (anotherhost) 04/15/2009
Device: rrqm/s wrqm/s   r/s   w/s rsec/s wsec/s  rkB/s  wkB/s avgrq-sz avgqu-sz await svctm %util
sda       7.80  53.40  2.80  7.40  84.80 488.00  42.40 244.00    56.16     0.06  6.27  1.96  2.00
sda1      0.00   0.00  0.00  0.00   0.00   0.00   0.00   0.00     0.00     0.00  0.00  0.00  0.00
sda2      7.80  29.20  2.40  2.20  81.60 251.20  40.80 125.60    72.35     0.03  6.52  3.91  1.80
sda3      0.00   0.00  0.00  0.00   0.00   0.00   0.00   0.00     0.00     0.00  0.00  0.00  0.00
sda4      0.00   0.00  0.00  0.00   0.00   0.00   0.00   0.00     0.00     0.00  0.00  0.00  0.00
sda5      0.00   2.00  0.00  0.40   0.00  19.20   0.00   9.60    48.00     0.00  5.00  5.00  0.20
sda6      0.00   0.00  0.00  0.00   0.00   0.00   0.00   0.00     0.00     0.00  0.00  0.00  0.00
sda7      0.00   0.00  0.00  0.00   0.00   0.00   0.00   0.00     0.00     0.00  0.00  0.00  0.00
sda8      0.00  13.40  0.00  2.60   0.00 129.60   0.00  64.80    49.85     0.02  8.46  0.77  0.20
sda9      0.00   8.80  0.40  2.20   3.20  88.00   1.60  44.00    35.08     0.01  3.85  1.54  0.40
sda10     0.00   0.00  0.00  0.00   0.00   0.00   0.00   0.00     0.00     0.00  0.00  0.00  0.00
‘%util’ is low here too (2% at most), so disk is not the bottleneck and we need to work out why this system is CPU bound. That will be for the next post.
…to be continued