SUMMARY: mutexes and heavy load

From: Nasiadek, Jedrzej <JNasiadek_at_era.pl>
Date: Mon Jan 07 2002 - 04:56:28 EST
Hi!

Ok. I've got the following answer from Don (thanks a bunch, Don), and it seems
to be a good explanation of what was going on with my system.

> There is a problem in Solaris 7 and early Solaris 8 systems with regard to
> large numbers of mutexes.  As the number of mutexes in the system increases,
> the amount of CPU time they consume increases linearly after a certain point,
> until they are consuming virtually all of the available CPU time, even if
> the system is essentially idle!

> There is a patch for Solaris 7 (which I don't have the number of any more)
> and one for early Solaris 8, which is 108827-05.

I've found the patch for Solaris 7 - it's 106980-17. I applied it and
rebooted the machine.
Now, after 2 days, it still works like a charm. We'll see what it will be like
after a few weeks...
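
For anyone else who runs into this: a quick way to check whether the relevant
patch is already on a box, and whether mutex spin is still a problem after the
reboot, is something along these lines (patch IDs as above; the exact revision
suffix will vary from system to system):

  # List installed patches and look for the mutex fix
  # (106980 on Solaris 7, 108827 on early Solaris 8)
  showrev -p | egrep '106980|108827'

  # After the reboot, keep an eye on the smtx column - it should stay far
  # below the several-thousand-per-CPU values in the mpstat output below
  mpstat 5 5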

And here is my original mail:

> Hi people!
> 
> I've got a strange problem:
> The machine is a 14-processor domain on an E10k with Solaris 7 11/99. It
> runs Oracle 8.1.7 and some processes that use this database. There are
> about 1000 Oracle sessions. The machine is _heavily_ loaded - loadavg is
> 95. And.. well, just have a look at some statistics:
> 
> "vmstat 3" shows:
>    r b w     swap    free re  mf pi po fr de sr s0 s1 s2 s8         in     sy    cs  us sy  id
>    3 0 0    16928    1384  0   0  0  0  0  0  0  0  0  0  0 4294967196      0     0 -46 -8 -95
>   81 1 0 17316992 6124176  0  66  0  0  0  0  0  0  0  0  0       7880 589133 12364  75 25   0
>   87 3 0 17315136 6122600  0 133  0  0  0  0  0  0  0  0  0       9435 552462 14111  74 26   0
>   95 2 0 17311744 6120176  0 242  0  0  0  0  0  4  3  0  0       9574 493991 14674  75 25   0
>   90 0 0 17308128 6117512  0 301  0  0  0  0  0  0  0  0  0       8812 594396 13331  75 25   0
>  102 1 0 17305520 6115024  0 245  0  0  0  0  0  0  0  0  0       6746 590357 11123  76 24   0
> 
> And "mpstat 3" shows
> 
> CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt
idl
>   0    2   0    0   226    1  563  232   29 6601    0 43940   74  26   0
0
>   1   18   0   13  1005  474 1975  783   30 5776    0 30448   71  29   0
0
>   2   11   0   93   911  533 1736  668   35 6544    0 33818   80  20   0
0
>   3   28   0  815   173    1  413  167   23 5810    0 28223   68  32   0
0
>  28    0   0  537   456    1 1140  469   20 8601    0 46569   78  22   0
0
>  30    0   0    0   385    1 1080  404   23 7813    0 48830   76  24   0
0
>  32    0   0   34   262    1  673  270   22 8893    0 54499   78  22   0
0
>  33    1   0   17   302    1  792  316   16 7328    0 47958   79  21   0
0
>  34   19   0 5169   172    1  439  174   19 7511    0 42699   78  22   0
0
>  35    0   0   86   366    1  939  383   20 9280    0 55249   73  27   0
0
>  48  117   0  489   244    1  573  257   32 4971    0 27181   84  16   0
0
>  49   12   0 1248   205    1  530  209   26 8644    0 53641   74  26   0
0
>  50   12   0   13   428   71  585  212   20 9953    0 65631   54  46   0
0
>  51    0   0   43  2699 2276  596  209   20 7892    0 43045   64  36   0
0
> 
> Look at the smtx column (spins on kernel mutexes) - it's incredibly high!
> 
> And then I ran "lockstat sleep 5"; here is a bit of its output:
> 
> Adaptive mutex spin: 467216 events
> 
> Count indv cuml rcnt     spin Lock                   Caller
> 
> -------------------------------------------------------------------------------
> 439790  94%  94% 1.00       36 tod_lock               uniqtime+0x10
>   1841   0%  95% 1.00        3 0x30000c6c000          untimeout+0x18
>   1828   0%  95% 1.00        2 0x30000c72000          untimeout+0x18
>   1663   0%  95% 1.00        4 0x30000c72000          timeout_common+0x4
>   1547   0%  96% 1.00        3 0x30000c6c000          timeout_common+0x4
>   1337   0%  96% 1.00        3 0x30000c75000          timeout_common+0x4
>   1305   0%  96% 1.00        2 0x30000c75000          untimeout+0x18
>   1255   0%  96% 1.00      102 0x30005c703c8          qfestart+0x204
> 
> I have totally no clue what to do with this :-(
> Any suggestions will be helpful.
> I will summarize of course.
> 
> best regards,
>     Jedrzej Nasiadek
> 
> P.S.
>   One more thing - I've looked at ps' output and there is no CPU-consuming
>   "pig"; the busiest process occupies 1.8% CPU.
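
A closing note for anyone who finds this in the archives: the lockstat output
above is completely dominated by tod_lock, taken from uniqtime() - which, as
far as I can tell from the names, is the kernel's time-of-day path, presumably
hammered by the ~1000 sessions all asking for the current time. The ~467000
spin events in 5 seconds line up with the smtx rates in mpstat. If you want to
check a box for the same symptom, something like this should do (see
lockstat(1M) for the exact options on your release):

  # Is tod_lock still the top spinner?
  lockstat sleep 5 | grep tod_lock

  # Same data with caller stacks, to see which code paths take the lock
  # (-s sets the stack depth; the output file name is just an example)
  lockstat -s 10 sleep 5 > /tmp/lockstat-stacks.out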
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers