SUMMARY: Performance problems

From: David Matinyarare <d.matinyarare_at_iorin.gov.pl> Date: Thu May 13 2004 - 03:33:03 EDT

Thanks a lot to all of you, especially Terry L Moore and Sreanath
Sarikonda.
Sreanath passed on to me an explanation which he got from Bret.
From all the mails I received, most of you pointed me to the vmstat
man page. As for prstat, the understanding of the command's output is
mixed. Below are the explanations I got from Terry and Sreanath. These
mails gave me an insight into the interpretation of the vmstat and
prstat output.

I was walking the same path as you are now. This group is a great
group. Below is the explanation of the prstat output given by Bret.
(Thank you, Bret.) My fair guess is that you need more CPU in your
system. I don't think it has any memory shortage.

Sreenath,

The problem is how prstat is summing the memory. With databases, shared
memory is used, which means that all the processes share one pool of
memory: a big chunk that the application defines and sets aside for its
own use. So, every database process will attach to this memory. If you
look at an individual oracle process, it will show that it is using 2G
of memory, for example, but that is the shared memory. What prstat is
doing is looking at all those processes and adding them up as if each
process were using a separate chunk of memory, when actually, that is
all shared memory. So, if there are 100 processes attached to a 2G
shared memory pool (cache), then prstat may report that 200G of memory
is being used, when actually only 2G is being used.
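
The arithmetic in the example above can be sketched in a few lines of
Python. This is only an illustration of the double counting, not how
prstat is implemented. The function names and the optional per-process
private memory parameter are assumptions made for the sketch.

```python
# Illustration only: the function names and the private_gb parameter are
# hypothetical, and this is not how prstat computes its totals.

def summed_per_process_gb(nprocs, shared_gb, private_gb=0.0):
    """Total when the shared segment is counted once per attached process."""
    return nprocs * (shared_gb + private_gb)


def actual_gb(nprocs, shared_gb, private_gb=0.0):
    """Total when the shared segment is counted only once."""
    return shared_gb + nprocs * private_gb


if __name__ == "__main__":
    # Figures from the example above: 100 processes attached to a 2G pool.
    print("summed per process:", summed_per_process_gb(100, 2), "G")
    print("actually in use:   ", actual_gb(100, 2), "G")
```

The gap between the two totals grows with the number of attached
processes, which is why a per-user summary can exceed physical memory
many times over.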

If you want a true breakdown of memory, either pay a lot of cash for
something that breaks it down or use prtmem from the RMCmemtool package.

The problem with tools like top, vmstat and prstat, and a bunch that
cost a lot of money, is that they report just about all memory as
always used. This is because of the way Sun uses its file cache. When
you open a file, it is kept in file cache for faster retrieval later.
This file cache is actually free memory that can be allocated at any
given time. But some tools see it as "used" memory. So they report the
memory incorrectly.

Thanks,
Brent

David,
     vmstat output is unreliable in almost all cases.  It is useful,
though, in pointing in the general direction of a problem.  You can
then use other tools to nail down what is happening.  The one column
that is very reliable in vmstat is the "sr" column.  This column
indicates the "pages scanned by clock algorithm" (from the man page for
vmstat).  The clock algorithm runs when the system is running low on
memory and must scan memory for pages that haven't been accessed within
a certain time period.  If you see non-zero numbers in sr, your system
is in desperate need of more memory.
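
As a rough sketch of the sr check described above, the following Python
pulls the sr column out of vmstat sample lines. The two samples are the
first and last lines of the vmstat output in the original question with
the line wraps undone. The function names are made up for this
illustration.

```python
# Column names as printed by "vmstat 5" in the original question.
HEADER = "r b w swap free re mf pi po fr de sr f0 s0 s1 s2 in sy cs us sy id".split()

# First and last samples from the original question, line wraps undone.
SAMPLES = [
    "55 0 0 7867392 1323736 7 89 2 0 0 0 0 0 0 0 9 536 1227 334 98 2 0",
    "13 0 0 7867104 1321368 317 100 2340 0 0 0 0 0 0 0 59 1102 1923 1086 96 4 0",
]


def scan_rates(samples, header=HEADER):
    """Return the sr value of each sample line."""
    sr = header.index("sr")
    return [int(line.split()[sr]) for line in samples]


def scanner_active(samples):
    """True if any sample shows the page scanner running (sr > 0)."""
    return any(rate > 0 for rate in scan_rates(samples))


if __name__ == "__main__":
    print("sr values:", scan_rates(SAMPLES))
    print("page scanner active:", scanner_active(SAMPLES))
```

Both samples have sr at zero, so by this test alone the output does not
point at a memory shortage.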
     s2 is a partition on one of your SCSI disk drives.  The column
represents "the number of disk operations per second" (again from man
vmstat).  Also from the man page, "the number is the logical unit
number".  What do you have mounted on slice 2 of your main SCSI disk?
You can use the "disks" option of vmstat to choose exactly which disk
goes in which column.  See the "OPERANDS" section of the man page under
the "disks" paragraph.
     When I look at your output, I see two things.  First, your CPU is
tied up with user processes (us in the range of 98% almost constantly).
So this tells me that you have some heavy duty user programs running,
not a system problem.  I would suggest you look at your user processes
one at a time and see which ones are the biggest hogs.
     Second, I see that you have most processes in the run queue.  The
system is working them as fast as it can.  They seem to be pretty
demanding though.  I don't see any blocked processes.  I don't see a
lot of swapped processes.  I do see that you have almost 8GB of memory
with about 1.3GB free.
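
The readings above can be reproduced from the raw samples with a small
Python sketch. It uses the first two sample lines from the original
question and assumes, as on Solaris, that vmstat prints swap and free in
kilobytes. The helper names are hypothetical, and the conversion uses
decimal gigabytes to match the round figures quoted above.

```python
# Column names as printed by "vmstat 5" in the original question.
HEADER = "r b w swap free re mf pi po fr de sr f0 s0 s1 s2 in sy cs us sy id".split()

# First two samples from the original question, line wraps undone.
SAMPLES = [
    "55 0 0 7867392 1323736 7 89 2 0 0 0 0 0 0 0 9 536 1227 334 98 2 0",
    "53 0 0 7867760 1323816 0 0 2 0 0 0 0 0 5 0 7 625 1576 479 97 3 0",
]


def column(samples, name, header=HEADER):
    """Values of one named column; for "sy" this is the faults column,
    because index() returns the first match."""
    idx = header.index(name)
    return [int(line.split()[idx]) for line in samples]


def kb_to_gb(kb):
    """Kilobytes to decimal gigabytes."""
    return kb / 1_000_000


if __name__ == "__main__":
    print("run queue (r):  ", column(SAMPLES, "r"))
    print("blocked (b):    ", column(SAMPLES, "b"))
    print("swapped (w):    ", column(SAMPLES, "w"))
    print("user CPU % (us):", column(SAMPLES, "us"))
    print("swap column, GB:", [round(kb_to_gb(v), 1) for v in column(SAMPLES, "swap")])
    print("free column, GB:", [round(kb_to_gb(v), 1) for v in column(SAMPLES, "free")])
```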
     I wouldn't worry about re & mf so much.  They mean that in the
course of doing its job, the OS went to look at some memory and it had
been used by something else.  So it gets the data again.  It happens
all the time.
     Where I get a bad feeling is that you have 63GB of programs in
memory and only about 7.8GB of memory.  What are the big memory hogs?
Is your system swapping out to the disk (swap space) almost
continually?  That will slow the machine WAY down.  If it is true, then
you may be memory constrained.  More memory might help that, but moving
some of the heavy hitting processes off to other machines might help,
too!  Maybe your lack of CPUs means that you need more CPUs in the form
of other machines to handle the load.
     Directions?  Nail down which disk is getting the most use.  Take a
look at "iostat 5".  Also consider using sar to capture some better
statistics.  From what you gave in the message, I only have about a 40%
confidence that anything I said above is true.  Look some other places.
And good luck in your hunting.
                     Terry

And my original question was:

> Date: Wed, 12 May 2004 08:08:42 +0200
> From: "David Matinyarare" <d.matinyarare@piorin.gov.pl>
> To: <sunmanagers@sunmanagers.org>
> Subject: performance problems
>
> I have been searching the net, without success, for a full and clear
> explanation of the fields in the "vmstat 5" and "prstat -a" output.
> Of late my system has been performing badly, so I ran "vmstat 5".
> The output which I got is as follows:
>
>  procs      memory               page               disk         faults        cpu
>   r b w    swap    free  re  mf   pi po fr de sr f0 s0 s1 s2   in   sy   cs us sy id
>  55 0 0 7867392 1323736   7  89    2  0  0  0  0  0  0  0  9  536 1227  334 98  2  0
>  53 0 0 7867760 1323816   0   0    2  0  0  0  0  0  5  0  7  625 1576  479 97  3  0
>  51 0 0 7867760 1323784   0   0    4  0  0  0  0  0  0  0  5  558 1838  465 98  2  0
>  52 0 0 7866312 1322960  18 210    4  0  0  0  0  0  1  0  7  671 2390  530 95  5  0
>  46 0 0 7866440 1323152  17 201    4  0  0  0  0  0  1  0  6  927 1878  595 97  3  0
>  43 0 0 7866480 1322856   8 104    5  0  0  0  0  0  0  0 19  808 2535  661 96  4  0
>  36 0 0 7867096 1323104  33 370    4  1  1  0  0  0  0  0  9  964 2551  659 95  5  0
>  42 0 0 7865824 1321944   8 140    8  0  0  0  0  0  0  0 13  673 2605  575 97  3  0
>  37 0 0 7867248 1323056   0   0    1  0  0  0  0  0  3  0  7  661 2341  477 95  5  0
>  34 0 0 7867248 1323040   0   0    5  0  0  0  0  0  0  0  6  671 2977  510 96  4  0
>  24 0 0 7867248 1322864   6   0   51  0  0  0  0  0  1  0 58  961 1621  708 98  2  0
>  21 0 0 7867248 1322680   9  98   24  0  0  0  0  0  0  0 77 1008 1067  634 98  2  0
>  14 0 0 7867248 1322616   2   0   16  0  0  0  0  0  0  0 69  930 1155  660 98  2  0
>  16 0 0 7866816 1322456  17 200    9  1  1  0  0  0  3  0 55  990 1589  737 94  6  0
>  29 0 0 7867040 1322264  30 176   13  0  0  0  0  0  0  0 13  963 5145  830 95  5  0
>  26 0 0 7866776 1321968  18 196   86  3  3  0  0  0  1  0 12  751 4172  637 96  4  0
>  22 0 0 7867208 1322032  19 204    8  0  0  0  0  0  0  0  6 1013 4015  985 94  6  0
>  13 0 0 7867120 1321888   8 100    8  0  0  0  0  0  0  0  3 1016 1907 1036 97  3  0
>  13 0 0 7867104 1321368 317 100 2340  0  0  0  0  0  0  0 59 1102 1923 1086 96  4  0
>
> Well, there are a lot of processes waiting to run, which means the
> number of CPUs (processors) is low. What about the "re" and "mf"
> fields? There are page reclaims and minor faults; then what? What
> clear performance problem is it showing? There is also the field
> under disk, "s2". What is this field telling me? Also the fields
> under "faults": in, sy, cs. Are they supposed to be as high as they
> are?
>
> When I run prstat I get something like the following:
>
>  NPROC USERNAME  SIZE   RSS MEMORY      TIME  CPU
>     55 xxxxxxxx   64G   63G   100%   5:21.02  69%
>
> What does 100% mean here under MEMORY?
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers