Thanks a lot to all of you, especially Terry L Moore and Sreanath Sarikonda. Sreanath made available to me an explanation which he got from Bret.

From what I picked up from all the mails I received, most of you pointed me to the vmstat man page ("man vmstat"). As for prstat, there is a mixed understanding of the command output. Below are the explanations I got from Terry and Sreanath. These mails gave me an insight into the interpretation of the vmstat and prstat output.

I was walking the same path as you are now. This group is a great group. Below is the explanation of the prstat output given by Bret. (Thank you, Bret.) I make a fair guess that you require more CPU in your system. I don't think it has any memory shortage.

Sreenath,

The problem is how prstat is summing the memory. Databases use shared memory: the application defines a big chunk of memory and sets it aside for its own use, and all of its processes share that pool. So every database process will attach to this memory. If you look at an individual Oracle process, it will show that it is using 2G of memory, for example, but that is the shared memory. What prstat is doing is looking at all those processes and adding them up as if each process were using a separate chunk of memory, when actually that is all shared memory. So, if there are 100 processes attached to a 2G shared memory pool (cache), then prstat may report that 200G of memory is being used, when actually only 2G is being used.

If you want a true breakdown of memory, either pay a lot of cash for something that breaks it down or use prtmem from the RMCmemtool package.

The problem with tools like top, vmstat and prstat, and a bunch that cost a lot of money, is that they report just about all memory as always used. This is because of the way Sun uses its file cache. When you open a file, it is kept in the file cache for faster retrieval later. This file cache is actually free memory that can be allocated at any given time. But some tools see it as "used" memory.
So they report the memory incorrectly.

Thanks,
Brent

David,

vmstat output is unreliable in almost all cases. It is useful, though, in pointing in the general direction of a problem. You can then use other tools to nail down what is happening.

The one column that is very reliable in vmstat is the "sr" column. This column indicates the "pages scanned by clock algorithm" (from the man page for vmstat). The clock algorithm runs when the system is out of memory and must scan memory for pages that haven't been accessed within a certain time period. If you see numbers in sr, your system is in desperate need of more memory.

s2 is a partition on one of your SCSI disk drives. The column represents "the number of disk operations per second" (again from man vmstat). Also from the man page, "the number is the logical unit number". What do you have mounted on slice 2 of your main SCSI disk? You can use the "disks" option of vmstat to choose exactly which disk goes in which column. See the "OPERANDS" section of the man page under the "disks" paragraph.

When I look at your output, I see two things.

First, your CPU is tied up with user processes (us is in the range of 98% almost constantly). So this tells me that you have some heavy-duty user programs running, not a system problem. I would suggest you look at your user processes one at a time and see which ones are the biggest hogs.

Second, I see that you have most processes in the run queue. The system is working them as fast as it can. They seem to be pretty demanding, though. I don't see any blocked processes. I don't see a lot of swapped processes. I do see that you have almost 8GB of memory with about 1.3GB free.

I wouldn't worry about re & mf so much. They mean that in the course of doing its job, the OS went to look at some memory and it had been used by something else. So it gets the data again. It happens all the time.

Where I get a bad feeling is that you have 63GB of programs in memory and only about 7.8GB of memory.
What are the big memory hogs? Is your system swapping out to the disk (swap space) almost continually? That will slow the machine WAY down. If it is true, then you may be memory constrained. More memory might help that, but moving some of the heavy-hitting processes off to other machines might help, too! Maybe your lack of CPUs means that you need more CPUs in the form of other machines to handle the load.

Directions? Nail down which disk is getting the most use. Take a look at "iostat 5". Also consider using sar to capture some better statistics.

From what you gave in the message, I only have about a 40% confidence that anything I said above is true. Look some other places. And good luck in your hunting.

Terry

And my original question was:

> Date: Wed, 12 May 2004 08:08:42 +0200
> From: "David Matinyarare" <d.matinyarare@piorin.gov.pl>
> To: <sunmanagers@sunmanagers.org>
> Subject: performance problems
>
> I have been searching the net, without success, for a full, clear
> explanation of the fields in the "vmstat 5" output and the
> "prstat -a" output. Of late my system has been performing badly and
> I ran "vmstat 5".
> The output which I got is as follows:
>
> procs      memory                page              disk         faults       cpu
>  r b w    swap    free  re  mf   pi po fr de sr f0 s0 s1 s2   in   sy   cs us sy id
> 55 0 0 7867392 1323736   7  89    2  0  0  0  0  0  0  0  9  536 1227  334 98  2  0
> 53 0 0 7867760 1323816   0   0    2  0  0  0  0  0  5  0  7  625 1576  479 97  3  0
> 51 0 0 7867760 1323784   0   0    4  0  0  0  0  0  0  0  5  558 1838  465 98  2  0
> 52 0 0 7866312 1322960  18 210    4  0  0  0  0  0  1  0  7  671 2390  530 95  5  0
> 46 0 0 7866440 1323152  17 201    4  0  0  0  0  0  1  0  6  927 1878  595 97  3  0
> 43 0 0 7866480 1322856   8 104    5  0  0  0  0  0  0  0 19  808 2535  661 96  4  0
> 36 0 0 7867096 1323104  33 370    4  1  1  0  0  0  0  0  9  964 2551  659 95  5  0
> 42 0 0 7865824 1321944   8 140    8  0  0  0  0  0  0  0 13  673 2605  575 97  3  0
> 37 0 0 7867248 1323056   0   0    1  0  0  0  0  0  3  0  7  661 2341  477 95  5  0
> 34 0 0 7867248 1323040   0   0    5  0  0  0  0  0  0  0  6  671 2977  510 96  4  0
> 24 0 0 7867248 1322864   6   0   51  0  0  0  0  0  1  0 58  961 1621  708 98  2  0
> 21 0 0 7867248 1322680   9  98   24  0  0  0  0  0  0  0 77 1008 1067  634 98  2  0
> 14 0 0 7867248 1322616   2   0   16  0  0  0  0  0  0  0 69  930 1155  660 98  2  0
> 16 0 0 7866816 1322456  17 200    9  1  1  0  0  0  3  0 55  990 1589  737 94  6  0
> 29 0 0 7867040 1322264  30 176   13  0  0  0  0  0  0  0 13  963 5145  830 95  5  0
> 26 0 0 7866776 1321968  18 196   86  3  3  0  0  0  1  0 12  751 4172  637 96  4  0
> 22 0 0 7867208 1322032  19 204    8  0  0  0  0  0  0  0  6 1013 4015  985 94  6  0
> 13 0 0 7867120 1321888   8 100    8  0  0  0  0  0  0  0  3 1016 1907 1036 97  3  0
> 13 0 0 7867104 1321368 317 100 2340  0  0  0  0  0  0  0 59 1102 1923 1086 96  4  0
>
> Well, there are a lot of processes waiting to run, which suggests
> that the number of CPUs (processors) is too low. What about the "re"
> and "mf" fields? They are page reclaims and minor faults, but then
> what? What clear performance problem is it showing? There is also
> the field "s2" under disk. What is this field telling me? Also the
> fields under "faults": in, sy, cs. Are they supposed to be as high
> as they are?
>
> When I run prstat I get something like the output below:
>
> NPROC USERNAME  SIZE   RSS MEMORY      TIME  CPU
>    55 xxxxxxxx   64G   63G   100%   5:21.02  69%
>
> What does 100% mean here under MEMORY?
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers
Received on Thu May 13 03:32:55 2004
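
To put numbers on Bret's point about prstat summing shared memory once per process, here is a minimal Python sketch of the arithmetic. The function names and the zero private-memory figure are assumptions made for illustration; only the 100 processes and the 2G shared pool come from his example, and this is not how prstat itself is implemented.

```python
def naive_total_gb(n_procs, shared_gb, private_gb_each):
    """Add up per-process sizes, so the shared pool is counted once per process."""
    return n_procs * (shared_gb + private_gb_each)


def actual_total_gb(n_procs, shared_gb, private_gb_each):
    """Count the shared pool once, plus each process's private memory."""
    return shared_gb + n_procs * private_gb_each


if __name__ == "__main__":
    # Bret's example: 100 processes attached to one 2G shared pool.
    # Private memory per process is assumed to be 0 to keep the numbers simple.
    print("summed per process:", naive_total_gb(100, 2, 0), "GB")
    print("actually in use:   ", actual_total_gb(100, 2, 0), "GB")
```

The gap between the two figures grows with the number of attached processes, which may be why a per-user total can look far larger than the physical memory in the machine.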
This archive was generated by hypermail 2.1.8 : Thu Mar 03 2016 - 06:43:33 EST
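
For anyone who wants to repeat Terry's reading of the vmstat output (run queue, scan rate, user CPU), the following Python sketch parses rows in the 22-column layout shown in the original question and summarizes those columns. The column list, the "cpu_sy" name for the second "sy" column, and the function names are assumptions for illustration; the three sample rows are copied from the question.

```python
# Column order as in the "vmstat 5" header quoted in the question.
# The second "sy" (CPU system time) is renamed "cpu_sy" to avoid a clash.
COLUMNS = ["r", "b", "w", "swap", "free", "re", "mf", "pi", "po", "fr", "de",
           "sr", "f0", "s0", "s1", "s2", "in", "sy", "cs", "us", "cpu_sy", "id"]

SAMPLE = """\
55 0 0 7867392 1323736 7 89 2 0 0 0 0 0 0 0 9 536 1227 334 98 2 0
53 0 0 7867760 1323816 0 0 2 0 0 0 0 0 5 0 7 625 1576 479 97 3 0
51 0 0 7867760 1323784 0 0 4 0 0 0 0 0 0 0 5 558 1838 465 98 2 0
"""


def parse_rows(text):
    """Return one dict per data row; header or malformed lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS) or not all(f.isdigit() for f in fields):
            continue
        rows.append(dict(zip(COLUMNS, (int(f) for f in fields))))
    return rows


def summarize(rows):
    """Summarize the columns Terry looked at: r, sr, us, and free memory."""
    n = len(rows)
    return {
        "avg_run_queue": sum(row["r"] for row in rows) / n,
        "max_scan_rate": max(row["sr"] for row in rows),
        "avg_user_cpu": sum(row["us"] for row in rows) / n,
        "min_free_kb": min(row["free"] for row in rows),
    }


if __name__ == "__main__":
    for name, value in summarize(parse_rows(SAMPLE)).items():
        print(name, value)
```

On these rows the scan rate stays at zero while the run queue and user CPU are both high, which matches Terry's reading that the machine looks CPU-bound rather than short of memory.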