Identifying bottlenecks on your server

As heavy users of the LAMP stack for our applications, we inevitably find that some systems do not perform as expected. We have one webserver (part of an application cluster) whose load often spikes in ways that seem unrelated to the actual traffic on the machine. For example, we may have 80 httpd requests, yet the load on the machine is 8 or 9. So how do we begin to identify where the bottleneck is? Typically, those bottlenecks can be narrowed to two places: CPU and I/O (disk). We can check our system with a couple of tools to identify where the problem is: vmstat and iostat.
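Before reaching for those tools, it can help to put the load figure in context. On Linux the load average counts processes that are running or waiting for CPU as well as those blocked in uninterruptible sleep (usually disk I/O), so a load of 8 or 9 means something different on a 2-core box than on a 16-core one. The following is a minimal sketch, not part of our actual tooling; the function names are made up for illustration.

```python
import os


def load_per_core(load_1min, cores):
    """Return the 1-minute load average divided by the number of CPU cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load_1min / cores


def current_load_per_core():
    """Read the live 1-minute load average and normalise it by core count.

    os.getloadavg() is only available on Unix-like systems.
    """
    one_min, _five_min, _fifteen_min = os.getloadavg()
    cores = os.cpu_count() or 1  # cpu_count() can return None
    return load_per_core(one_min, cores)
```

Calling `current_load_per_core()` on the machine in question gives a rough sense of how oversubscribed it is. A value well above 1.0 says that work is queuing, but not whether it is queuing for CPU or for disk, which is what vmstat and iostat are for.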

On this particular machine, ‘vmstat 5’ shows this:

[user@host:~] vmstat 5

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------

r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st

2  0     40 1327880 676512 4689568    0    0     2    90    1    0 24  3 69  1  3

0  0     40 1330500 676516 4689696    0    0     0   135 1384  356 36  7 53  0  4

0  0     40 1328756 676516 4689768    0    0     0   406 1517  454 35  5 56  0  5

0  0     40 1317160 676516 4689904    0    0     0  1058 1928  590 27  5 64  1  3

Note the ‘us’ column: we show 24, 36, 35, and 27% user CPU usage, with idle (‘id’) between 53 and 69%, so we are not currently CPU bound.

Here is another machine that is currently at high load. Note the ‘us’ column:

[user@otherhost root]# vmstat 5

procs                      memory    swap          io     system         cpu

r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id

6  1  1 697600  63072 647548 932252   0   0    32   238 1332  1274  83   5  12

17  1  1 695724  82228 647560 932476   0   0    22   193 1355  1225  91   5   4

With ‘us’ at 83 and 91% and idle at only 12 and 4%, this machine looks to be CPU bound.
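Reading the columns by eye works for a handful of samples, but the same check can be scripted. The sketch below parses pasted vmstat output by header name, so it copes with both the newer layout (with ‘wa’ and ‘st’) and the older one (with a ‘w’ column and no ‘wa’). The thresholds of 80% busy and 20% I/O wait are assumptions chosen for illustration, not values that vmstat defines. It skips the first sample because vmstat's first report gives averages since the last reboot.

```python
MACHINE_1 = """
r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
2  0     40 1327880 676512 4689568    0    0     2    90    1    0 24  3 69  1  3
0  0     40 1330500 676516 4689696    0    0     0   135 1384  356 36  7 53  0  4
0  0     40 1328756 676516 4689768    0    0     0   406 1517  454 35  5 56  0  5
0  0     40 1317160 676516 4689904    0    0     0  1058 1928  590 27  5 64  1  3
"""

MACHINE_2 = """
r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
6  1  1 697600  63072 647548 932252   0   0    32   238 1332  1274  83   5  12
17  1  1 695724  82228 647560 932476   0   0    22   193 1355  1225  91   5   4
"""


def parse_vmstat(text):
    """Parse vmstat output into a list of dicts keyed by column name."""
    header = None
    samples = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The column-name line is the one containing both 'us' and 'id'.
        if "us" in fields and "id" in fields:
            header = fields
            continue
        # Data rows are all-numeric and as wide as the header.
        if header and len(fields) == len(header) and all(f.isdigit() for f in fields):
            samples.append(dict(zip(header, map(int, fields))))
    return samples


def classify(samples, cpu_threshold=80, wait_threshold=20):
    """Label a run of samples; both thresholds are illustrative assumptions."""
    if not samples:
        raise ValueError("no vmstat samples found")
    # Drop the first sample (averages since boot) when there are others.
    recent = samples[1:] or samples
    busy = sum(s["us"] + s["sy"] for s in recent) / len(recent)
    # Older vmstat versions have no 'wa' column; treat that as 0.
    wait = sum(s.get("wa", 0) for s in recent) / len(recent)
    if busy >= cpu_threshold:
        return "cpu-bound"
    if wait >= wait_threshold:
        return "io-wait"
    return "not saturated"


print(classify(parse_vmstat(MACHINE_1)))  # first machine
print(classify(parse_vmstat(MACHINE_2)))  # second machine
```

With the samples shown above, the first machine comes out as not saturated and the second as CPU bound, which matches the reading by eye.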

Moving on to disk, let’s look at the first machine:

[mklatsky@adweb:~] iostat -dx 5

Linux 2.6.18-xenU-ec2-v1.0 (host)         04/15/2009

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util

sdb               0.01    21.91  0.35 17.48     7.33   315.10    18.08     0.43   23.85   1.31   2.34

sdc               0.00     0.00  0.00  0.00     0.00     0.00    30.03     0.00    2.45   2.07   0.00

sda1              0.01     3.37  0.08  2.38     2.18    46.08    19.59     0.18   72.80   1.83   0.45

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util

sdb               0.00    27.00  0.00 12.40     0.00   315.20    25.42     0.09    7.50   0.29   0.36

sdc               0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

sda1              0.00     4.20  0.00  1.00     0.00    41.60    41.60     0.00    0.00   0.00   0.00

(Yes, it’s a Xen instance.)

Note the ‘%util’ column: disk utilization is minimal, which is good. But this system is not currently under load, so we’ll need to run the same check again while it is.
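When that time comes, picking ‘%util’ out of a long iostat capture can be scripted rather than done by eye. This is a rough sketch that assumes the ‘iostat -dx’ layout shown above, where each report starts with a ‘Device:’ header line; the function names are hypothetical. As with vmstat, the first report covers the time since boot, so the helper ignores it when later reports exist.

```python
SAMPLE = """
Linux 2.6.18-xenU-ec2-v1.0 (host)         04/15/2009
Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.01    21.91  0.35 17.48     7.33   315.10    18.08     0.43   23.85   1.31   2.34
sdc               0.00     0.00  0.00  0.00     0.00     0.00    30.03     0.00    2.45   2.07   0.00
sda1              0.01     3.37  0.08  2.38     2.18    46.08    19.59     0.18   72.80   1.83   0.45
Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00    27.00  0.00 12.40     0.00   315.20    25.42     0.09    7.50   0.29   0.36
sdc               0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda1              0.00     4.20  0.00  1.00     0.00    41.60    41.60     0.00    0.00   0.00   0.00
"""


def parse_iostat(text):
    """Return {device: [%util, ...]} with one value per report in the output."""
    header = None
    util_col = None
    util = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("Device"):
            # Locate %util by name so extra columns (rkB/s, wkB/s) do not matter.
            header = fields if "%util" in fields else None
            util_col = header.index("%util") if header else None
            continue
        if header and len(fields) == len(header):
            try:
                value = float(fields[util_col])
            except ValueError:
                continue  # not a data row
            util.setdefault(fields[0], []).append(value)
    return util


def busiest(util_by_device):
    """Return (device, peak %util), ignoring each device's since-boot report."""
    if not util_by_device:
        raise ValueError("no iostat device rows found")
    peaks = {dev: max(vals[1:] or vals) for dev, vals in util_by_device.items()}
    device = max(peaks, key=peaks.get)
    return device, peaks[device]


print(busiest(parse_iostat(SAMPLE)))
```

Run against a capture taken under load, the device it reports is the first place to look if the peak is anywhere near 100%.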

This is the second system, still under load:

[root@draco root]# iostat -dx 5

Linux 2.4.9-e.65smp (anotherhost)     04/15/2009

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util

sda          7.80  53.40  2.80  7.40   84.80  488.00    42.40   244.00    56.16     0.06    6.27   1.96   2.00

sda1         0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

sda2         7.80  29.20  2.40  2.20   81.60  251.20    40.80   125.60    72.35     0.03    6.52   3.91   1.80

sda3         0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

sda4         0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

sda5         0.00   2.00  0.00  0.40    0.00   19.20     0.00     9.60    48.00     0.00    5.00   5.00   0.20

sda6         0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

sda7         0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

sda8         0.00  13.40  0.00  2.60    0.00  129.60     0.00    64.80    49.85     0.02    8.46   0.77   0.20

sda9         0.00   8.80  0.40  2.20    3.20   88.00     1.60    44.00    35.08     0.01    3.85   1.54   0.40

sda10        0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

‘%util’ is pretty low here too, so disk is not the bottleneck, and we need to address why this system is CPU bound. That will be for the next post.

…to be continued
