SUMMARY: Lost swap space

From: Jeff Kennedy (jeff.kennedy@natdecsys.com)
Date: Wed Jun 23 1999 - 12:46:01 CDT


Sorry, forgot to "summarize" it.
---------------------- Forwarded by Jeff Kennedy/NDS on 07/23/99 10:49 AM ---------------------------

"Jeff Kennedy" <jeff.kennedy@natdecsys.com> on 06/23/99 08:49:35 AM
                                                              
                                                              
                                                              
 To: sun-managers@sunmanagers.ececs.uc.edu

 cc: (bcc: Jeff Kennedy/NDS)

 Subject: Lost swap space
                                                              

Well, isn't this special?! I held off on posting a summary for a long
time so we could verify that we had found the cause. It seems we have.

As I mentioned, Oracle 7.x had been running for almost a year without
incident. It had three databases under it, each with its own shared
memory. Then Oracle 8 got loaded...

Oracle 8, with just one database under it, was given the same amount of
shared memory as the whole of Oracle 7. So basically, shared memory usage
doubled overnight. Top that with a horde of developers hitting the new
database and creating tablespaces left and right, then add million-record
updates, and I've got a hurtin' system. It took a while to figure this one
out since the shared memory issue was never mentioned. Twice more my
server went from swap to squat. Only after shutting down databases and
bringing them back up in different orders did we notice the problem.
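
As a rough illustration of the arithmetic behind that doubling, here is a
minimal Python sketch. The segment sizes and database names are
hypothetical placeholders (the post never gives the real figures); only
the relationship between them comes from the description above.

```python
# Hypothetical shared memory sizes in MB; the post does not give real
# figures, so these values and database names are placeholders only.
ORACLE7_SHM_MB = {"db_a": 256, "db_b": 256, "db_c": 256}


def total_mb(segments):
    """Sum the shared memory configured for a set of databases."""
    return sum(segments.values())


# Oracle 8's single database was given as much shared memory as all
# three Oracle 7 databases combined.
ORACLE8_SHM_MB = {"db_new": total_mb(ORACLE7_SHM_MB)}

before = total_mb(ORACLE7_SHM_MB)
after = before + total_mb(ORACLE8_SHM_MB)

print(f"before Oracle 8: {before} MB of shared memory")
print(f"after Oracle 8:  {after} MB of shared memory")
print(f"growth factor:   {after / before:.1f}x")
```

Whatever the real sizes were, the total comes out at twice the old
figure, and that is before any user sessions or /tmp usage are counted.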

This is what I get for assuming something stupid wasn't done to begin with.
Had I merely asked that question in the beginning it would have been much
less painful.

The shared memory has been cut in half and I haven't had a problem like
that since.

Thanks to:
Phil Banham
Stephen Harris
Unixboy
Amarjeet Virdi
Viet Hoang

~Jeff Kennedy
---------------------- Forwarded by Jeff Kennedy/NDS on 07/23/99 08:42 AM ---------------------------

Jeff Kennedy <jeff.kennedy@natdecsys.com> on 06/08/99 09:30:56 AM

 To: Sun Managers List
          <sun-managers@sunmanagers.ececs.uc.edu>

 cc: (bcc: Jeff Kennedy/NDS)

 Subject: Lost swap space

Hello All,

I have an E3500 with 4 GB of RAM running VM 2.5 and Solaris 2.6. This is a
db server which has been running Oracle 7.x for the last year and just
recently had 8i added (within the last week).

Yesterday my swap partition dropped to 900 KB free. Everything thrashed royally
and I ended up rebooting. The messages file showed the following:

Jun 7 15:46:02 host unix: WARNING: Sorry, no swap space to grow stack for pid 3998 (sqlplus)
Jun 7 15:46:02 host last message repeated 15 times
Jun 7 15:50:26 host unix: WARNING: /tmp: File system full, swap space limit exceeded
Jun 7 15:50:49 host unix: NOTICE: alloc: /opt: file system full
Jun 7 16:11:35 host unix: WARNING: /tmp: File system full, swap space limit exceeded
Jun 7 16:11:35 host last message repeated 2 times
Jun 7 16:22:35 host /usr/dt/bin/ttsession[5168]: Error: rpc.ttdbserverd on host.domain.com is not running
Jun 7 16:23:00 host /usr/dt/bin/ttsession[5168]: _Tt_db_client::connectToDb(): fcntl(F_SETFD): Bad file number
Jun 7 16:23:04 host /usr/dt/bin/ttsession[5168]: _Tt_db_file::_Tt_db_file(): _file_cache->insert(host.domain.com/etc/tt/types.xdr), dbStatus 16

Before this there was nothing out of the ordinary. After this is the
reboot.

Oracle 7 dumped core in its home directory (under /opt), but I have no
idea which caused which: did Oracle dump because of swap, or did swap go
because of Oracle? I would lean towards the former since the core wasn't
in swap.

Here is what the swap looks like:

host# swap -l
swapfile             dev   swaplo  blocks    free
/dev/vx/dsk/swapvol  158,5     16 2097120 1788144
host# swap -s
total: 3265888k bytes allocated + 13784k reserved = 3279672k used, 1243952k available
host#
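
The figures above can be cross-checked with a little arithmetic. A minimal
Python sketch, assuming the usual Solaris conventions that `swap -l`
reports 512-byte blocks and `swap -s` reports kilobytes of virtual swap;
the function names are made up for illustration:

```python
import re

# Output copied from the post (the wrapped `swap -s` line is joined).
SWAP_S = ("total: 3265888k bytes allocated + 13784k reserved = "
          "3279672k used, 1243952k available")
SWAP_L = "/dev/vx/dsk/swapvol 158,5 16 2097120 1788144"

SWAP_S_RE = re.compile(
    r"(\d+)k bytes allocated \+ (\d+)k reserved = (\d+)k used, "
    r"(\d+)k available"
)


def parse_swap_s(line):
    """Return (allocated, reserved, used, available) in KB."""
    match = SWAP_S_RE.search(line)
    if match is None:
        raise ValueError("unrecognised swap -s line: %r" % line)
    return tuple(int(group) for group in match.groups())


def parse_swap_l(line):
    """Return (total_kb, free_kb) for one device line of `swap -l`.

    Assumes the blocks and free columns count 512-byte blocks,
    so two blocks make one KB.
    """
    fields = line.split()
    blocks, free = int(fields[3]), int(fields[4])
    return blocks // 2, free // 2


allocated, reserved, used, available = parse_swap_s(SWAP_S)
total_virtual = used + available
print(f"virtual swap: {total_virtual} KB, "
      f"{100.0 * used / total_virtual:.1f}% used")

device_total, device_free = parse_swap_l(SWAP_L)
print(f"swap device: {device_total} KB total, {device_free} KB free")
```

Note that the two commands measure different things: `swap -l` covers
only the swap device, while `swap -s` totals the virtual swap, which is
why its numbers are so much larger.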

The incident seems to have subsided but I would like to find out what
caused it in the first place so that I can try and keep it from
happening again.

Thanks,

Jeff Kennedy


