Summary: What is eating up memory?

From: Avrami, Louis (LouisA@omnipoint-pcs.com)
Date: Mon Feb 21 2000 - 17:41:20 CST


Thanks to the following people for their replies:

Gerard Hynes        Don Cockman                 Paul Teasdel
Rich Sullivan       Daniel Serna                Nelson Caparrosso
Eugene Choi         Darren Brechman-Toussaint   Adam Levin
Buddy Lumpkin       Brett Lymn                  James Coby
Bismark Espinoza    Nikolai Zeldovich           Steve Gauthier
Gary Jenson         Oscar Goosens               Dwight Peterson
Bruce Zimmer        Kevin@joltin.com            Brian Laughton
Dave Harrington

The problem actually was an Oracle bug. As part of the RDBMS, we are
running the Oracle ConText Cartridge, which enables text scanning and
searching within a database.

When doing a cold backup of the database, we would shut it down with

shutdown abort      (cancel all processes running within the database)
startup restrict    (start the database back up, only to allow the
                     aborted processes to roll back)
shutdown            (normal shutdown)

We assumed that the shutdown abort would also take care of killing the
ConText Cartridge. It did kill it, but killing it that way was also the
cause of a memory leak that consumed a large amount of memory. Here is the
bug number and abstract from the Oracle MetaLink site,
http://metalink.oracle.com:

Found bug 720219

  Abstract: CONTEXT SERVERS CONTINUALLY CONSUMING MEMORY.
  This bug is fixed in 8.1.5

Since we're on Oracle RDBMS version 8.0.5.1 and can't upgrade right now, we
used the suggested workaround of executing the following PL/SQL command
within the database using DBA privileges:

        execute ctx_adm.shutdown('all');

By doing this prior to the 'shutdown abort - startup restrict - shutdown'
sequence, we shut down the ConText Cartridge cleanly as well, eliminating
the memory leak.
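
As a rough sketch of the ordering only (not the exact script we run), the
command sequence can be written out like this and then fed to whatever tool
your site uses to issue DBA commands. The function name is made up for
illustration; the four commands are the ones described above.

```shell
# Hypothetical helper: prints the cold-backup shutdown sequence in order.
# The ConText Cartridge is shut down cleanly BEFORE the abort.
cold_backup_shutdown_script() {
    cat <<'EOF'
execute ctx_adm.shutdown('all');
shutdown abort
startup restrict
shutdown
EOF
}

cold_backup_shutdown_script
```

The point is only that ctx_adm.shutdown comes first; how the commands get
executed (and under which DBA login) is site-specific and not shown here.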

We identified this problem by performing snapshots with the commands sar -r,
vmstat, ps -elf, /usr/ucb/ps -aux and top, sorted by memory size. We did
snapshots before and after each major process on the 450. When we tested
the cold backup, we spotted the problem. We've been able to consistently
recreate it. Shutting down the ConText Cartridge cleanly eliminates the
problem.
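
For anyone who wants to repeat the before/after snapshot approach, here is
a minimal sketch. The function name, the output directory and the sample
counts are my own assumptions, not anything standard; the commands are the
ones listed above (top is left out because its batch options vary between
versions). A command that is missing or fails is simply noted in the file.

```shell
# take_snapshot LABEL [DIR]
# Writes the output of several memory-related commands to DIR/LABEL.txt
# and prints the name of the file it wrote.
take_snapshot() {
    label=$1
    dir=${2:-./memsnap}          # assumed default directory
    mkdir -p "$dir" || return 1
    out="$dir/$label.txt"
    : > "$out"
    for cmd in "sar -r 1 1" "vmstat 1 2" "ps -elf" "/usr/ucb/ps -aux"; do
        echo "=== $cmd ===" >> "$out"
        # $cmd is left unquoted on purpose so it splits into command + args
        $cmd >> "$out" 2>&1 || echo "(command failed or unavailable)" >> "$out"
    done
    echo "$out"
}
```

Run something like "take_snapshot before_backup" and "take_snapshot
after_backup" around each major process, then diff the two files.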

Many thought that what I was experiencing was file system caching. Several
suggested a very helpful link on Sun memory management and file system
caching:

http://www.sunworld.com/sunworldonline/swol-10-1995/swol-10-perf.html

as well as the book "Sun Performance Tuning" by Adrian Cockcroft.

A summary of other helpful suggestions:

Talk to your Oracle dba if he/she is not a boob
I am the Oracle DBA boob, with no Sys Admin, so ....

        vmstat 2 20
Ignore the first line of output; it is an average of the numbers since the
machine booted. vmstat has a column called sr (scan rate). This shows how
much activity there is in reclaiming pages to put on the free list. If this
number is zero or low, then there's nothing to worry about.
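
A small sketch of how the sr column could be averaged automatically,
skipping that first since-boot line. The function name is hypothetical, and
the sketch assumes the column header is literally "sr", as in the vmstat
output described above.

```shell
# avg_scan_rate: reads vmstat output on stdin, prints the average of the
# "sr" column, ignoring the first data line (the since-boot average).
avg_scan_rate() {
    awk '
        # Until the header naming the columns is seen, look for "sr".
        col == 0 { for (i = 1; i <= NF; i++) if ($i == "sr") col = i; next }
        # First line after the header is the since-boot average: skip it.
        !skipped { skipped = 1; next }
        # Only count numeric values, so repeated headers are ignored.
        $col ~ /^[0-9]+$/ { sum += $col; n++ }
        END { if (n) printf "%.1f\n", sum / n; else print "no samples" }
    '
}
```

Usage would be along the lines of "vmstat 2 20 | avg_scan_rate".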

        /usr/ucb/ps -aux
which reveals the percentage of memory and CPU that each individual process
is consuming.

        /usr/proc/bin/pmap
to break down the memory consumption of each process.

        ps -elf
Adding up the RSS column would detail how much memory was being used.

        ps -e -ouser,pid,rss,vsz,args | sort +3n -4
which will show you who the offenders are for eating up large amounts of
virtual memory - modify the sort to see what's using up real memory and
compare the results over a couple of days.
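
To compare results over a couple of days it can help to boil that listing
down to one number per user. A minimal sketch, assuming the input has the
same five columns as the ps command above (user, pid, rss, vsz, args) with
one header line; the function name is made up. Keep in mind that shared
memory such as the Oracle SGA is counted in the RSS of every process that
attaches to it, so the totals overstate real usage.

```shell
# rss_by_user: reads "ps -e -o user,pid,rss,vsz,args" output on stdin and
# prints the total RSS (in KB, as reported by ps) per user, largest first.
rss_by_user() {
    awk 'NR > 1 { kb[$1] += $3 }                       # skip header line
         END { for (u in kb) printf "%s %d\n", u, kb[u] }' |
    sort -k2,2nr
}
```

For example: "ps -e -o user,pid,rss,vsz,args | rss_by_user", saved to a
dated file each day.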

        sysdef -i
to check tunables defined in /etc/system

        memtool - ftp://playground.sun.com/pub/memtool

        top
start top, type 'o' and then type 'size'. Top will then sort by memory size.

        proctool - http://www.sunfreeware.com

        

> ----------
> From: Avrami, Louis
> Sent: Tuesday, February 15, 2000 5:51 PM
> To: 'sun-managers@sunmanagers.ececs.uc.edu'
> Subject: What is eating up memory?
>
> Hello Sun-Managers,
>
> We're experiencing an interesting problem here that I'm hoping you can
> help us with. We are "losing" memory, and we're not sure how to identify
> WHAT may be eating it up.
>
> Here's the problem:
>
> Enterprise 450, 4 gig of memory, running Solaris 2.6, recommended
> patches. We're in the process of building an application, so the machine
> isn't being utilized that much. Here is a "typical" resource snapshot,
> using top:
>
> last pid: 2085; load averages: 0.02, 0.02, 0.02                10:43:55
> 53 processes: 52 sleeping, 1 on cpu
> CPU states: 96.7% idle, 0.3% user, 0.6% kernel, 2.4% iowait, 0.0% swap
> Memory: 4096M real, 3720M free, 164M swap in use, 5322M swap free
>
> Right now on the machine we're only running an Oracle database with an SGA
> (memory allocation) of approximately 100 meg and SQLNET/Net8 (Oracle's
> TCP/IP to enable network connectivity to the database).
>
> However, after a day or so, we will see the amount of available free
> memory drop down to only 450 meg or so! Here's a top snapshot of this
> situation:
>
> last pid: 5853; load averages: 1.04, 0.66, 0.42                12:45:58
> 71 processes: 68 sleeping, 1 stopped, 2 on cpu
> CPU states: 47.6% idle, 49.3% user, 3.1% kernel, 0.0% iowait, 0.0% swap
> Memory: 4096M real, 423M free, 276M swap in use, 5204M swap free
>
>   PID USERNAME THR PRI NICE  SIZE   RES STATE   TIME    CPU COMMAND
>   320 oracle    11  48    0  124M  107M sleep   2:06  0.43% oracle
>  4932 lavrami    1  28    0 1568K 1400K cpu3    0:00  0.39% top
>     3 root       1  60  -20    0K    0K sleep   1:49  0.31% fsflush
>   318 oracle    11  59    0  124M  107M sleep   0:04  0.00% oracle
>   316 oracle    29  59    0  125M  107M sleep   0:02  0.00% oracle
>   601 oracle     1  48    0  123M  110M sleep   0:01  0.00% oracle
>  2110 oracle     1  48    0  123M  110M sleep   0:01  0.00% oracle
>   350 oracle     1  48    0   11M 7792K sleep   0:01  0.00% tnslsnr
>   260 root       1  58    0 3360K 2704K sleep   0:01  0.00% nsrd
>   427 oracle     1  59    0  126M  113M sleep   0:01  0.00% oracle
>   428 root       1  59    0   16M 6592K sleep   0:01  0.00% Xsun
>     0 root       1  96  -20    0K    0K stop    0:00  0.00% sched
>   306 root       1  28   10 3448K 2520K sleep   0:00  0.00% nsrindexd
>   304 root       1  28   10 3040K 2352K sleep   0:00  0.00% nsrmmdbd
>  1433 oracle     1  29    0  123M  110M sleep   0:00  0.00% oracle
>
>
>
> What is puzzling is that looking at a full top snapshot under these
> conditions doesn't reveal what processes are eating up the additional
> memory. Nothing stands out as the culprit. Conditions on the machine
> and the amount of access to the database are fairly constant. When we are
> short of memory I have shut down the Oracle database and Net8 to see
> if that would "free" the missing memory, but that didn't do the trick.
> The amount of free memory would increase from approximately 450 meg to
> approximately 560 meg, which is the size of the Oracle database plus Net8.
>
> The only thing which seems to restore the memory is cycling the machine,
> which isn't a solution in a production environment.
>
> We are going to run a series of processes on the machine to try to
> identify what may be the cause of the excessive memory utilization. Can
> anyone suggest what commands to run to help identify what OS processes
> could be eating up memory? We can find out when it is happening with top,
> sar -r, vmstat, etc., but what we really want to do is identify exactly
> WHAT is responsible for the extra memory utilization.
>
> I imagine what we want to do is some kind of snapshot under ideal
> conditions as a baseline, then run our tests and hopefully catch the
> culprits.
>
> If anyone can suggest any commands, methodology, tools, etc., it would be
> appreciated. I will summarize the responses and report the results of our
> tests.
>
> Thanks,
>
> Lou Avrami


