Summary: core cannot be saved

From: zhouna (zhouna@21cn.com)
Date: Mon Jul 17 2000 - 01:31:38 CDT


Hi,

I don't know whether the earlier summary reached the list, so I am sending
it again.

zhou na

>My original question was:
>
>One Sun E3500 box, Solaris 2.6 with the latest patches, running VxFS and
>VxVM.
>
>It reboots automatically every 2 or 3 days, and no information about the
>crash can be found in /var/adm/messages.
>
>I enabled core saving, but no core is saved after it reboots.
>
>The swap on the box is 2 GB, and patch 107490 has been installed:
># showrev -p | grep 107490
>Patch: 107490-01 Obsoletes: Requires: Incompatibles: Packages: SUNWcsu
>---------------------------------------------------------------------------
>1. The response from John Saalwaechter:
>
>Possible problems:
>
>1) I have seen this before when the system has large, active filesystems.
>When the system crashes and automatically reboots, the fsck processes
>need so much memory that too much swap gets used before savecore has a
>chance to pull the core dump out. The solution is to set the auto-boot?
>OBP variable to false, and then after a crash, boot into single-user mode
>and run savecore manually. This lets you save the dump from swap before
>all of the fscks run. (This is usually a UFS problem, and you mentioned
>that you are using vxfs, so this probably isn't it.)
>
>2) Perhaps you don't have enough free disk space in /var/crash.
>
>3) Perhaps your "dump device" is incorrectly configured because of VxVM.
>With encapsulated swap, you need to be very careful that the "dumpfile"
>setting in the kernel still points to a valid disk slice. One way to
>check this: shut down the system normally, then at the ok prompt issue
>the "sync" command. This is completely harmless, but it does dump core,
>and you should end up with a valid core dump in /var/crash/`hostname`
>after the reboot. Another way to check is to run "adb -k" as root and
>then type "dumpfile+10/s". You should get back a slice, such as
>/dev/dsk/c0t0d0s1. If you get back "/dev/vx/dsk/swapvol", you definitely
>won't be able to get a core dump.
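A small sketch for reading the result of the adb check in 3). dumpfile_is_slice is a hypothetical helper, not a Solaris command. It assumes the device path is the last whitespace-separated field of the line adb prints (the exact adb output format is an assumption), and it also treats SDS metadevices as suspect, following Paul McKernan's note further down.

```sh
#!/bin/sh
# Hypothetical helper, not a Solaris command: decide whether the line that
# "dumpfile+10/s" printed under "adb -k" names a plain disk slice.
# Assumption: the device path is the last whitespace-separated field.
dumpfile_is_slice() {
    dev=`echo "$1" | awk '{ print $NF }'`
    case "$dev" in
        /dev/vx/*|/dev/md/*)
            # VxVM volume or SDS metadevice: treat as a problem
            echo "NOT OK: $dev is a volume, not a slice"
            return 1 ;;
        /dev/dsk/*)
            # plain slice such as /dev/dsk/c0t0d0s1
            echo "OK: $dev"
            return 0 ;;
        *)
            echo "UNKNOWN: '$dev'"
            return 1 ;;
    esac
}
```

Usage: paste the line adb printed, e.g. dumpfile_is_slice "/dev/vx/dsk/swapvol" prints a NOT OK line and returns 1.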
>---------------------------------------------------------------------------
>
>2. Eddy and many other people mentioned:
>
>Been there, it's the pits. One thing you can try is prtdiag -v; it could
>tell you some info. Also, I don't remember exactly how to do this, but set
>eeprom not to reboot when it crashes, then do a sync (I think) and it will
>drop core, then reboot, and you can look at it.
>---------------------------------------------------------------------------
>
>3. Chris Graves gave a lot of useful information:
>
>Problem Description
>
>
>Sending core dumps to alternate dump devices rather than the primary swap
>device.
>
>This is useful in cases where the primary swap device is too small
>to hold a dump, or where the customer is running Veritas Volume
>Manager and has encapsulated the root disk.
>
>It works because the kernel sets up a variable named `dumpfile' to
>be the *first* swap device configured. Normally this is the primary
>swap device designated in /etc/vfstab. This script needs to be placed
>in /etc/rcS.d and is run *before* the primary swap device is set up.
>
>The alternate partition is set as a swap device with `swap -a' then deleted
>with `swap -d', in effect only setting the `dumpfile' variable.
>
>Problem Solution
>
>
>
>Solaris 7 or later does not require this procedure. Solaris 7 and later
>releases provide the "dumpadm" command, which allows simple alteration of
>the location of the dump device (among other things).
>
>For releases earlier than Solaris 7:
>
># S32dumpdev: to be placed in /etc/rcS.d
>#
># This file sets DUMPDEV to be the current dump device.
>#
># This needs to be done before savecore is run, so that the
># pages are read from the correct dump device.
>#
># Set DUMPDEV to the physical slice to use as the dump device
># (c0t0d0s1 below is only an example); comment the line out to
># make this script do nothing.
>#
>DUMPDEV="/dev/dsk/c0t0d0s1"
>#
>
>if [ -n "$DUMPDEV" ]; then
> echo "setting dump device to $DUMPDEV"
> # Adding the slice as the first swap device makes it the
> # kernel's dumpfile; deleting it again keeps dumpfile set.
> swap -a "$DUMPDEV"
> swap -d "$DUMPDEV"
>fi
>
>
>*** A special technical note ***
>
>The method used by the SRDB takes advantage of the fact that the
>initial swap device is also set up as the dump device. The device's
>vnode reference count is incremented twice, setting it to 2.
>
>When the swap -d is done, the vnode reference count is decremented.
>The vnode remains "active" since the reference count is now 1.
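To make the reference-count argument concrete, here is a toy model in POSIX sh (it uses $(( )), so on Solaris 2.6 run it under ksh rather than the Bourne /sbin/sh). It is my own simplification of the note above, not Solaris kernel code: the first device added becomes dumpfile with two references, swap -d drops only one, and the primary swap volume added later does not replace dumpfile.

```sh
#!/bin/sh
# Toy model of the reference counting described above. It only updates
# shell variables; it does not run swap(1M) or touch the kernel.
dumpfile=""      # the kernel's dump device, in this model
dump_refcnt=0    # vnode reference count of that device, in this model

model_swap_add() {
    if [ -z "$dumpfile" ]; then
        # first swap device: referenced once for swap, once for dump
        dumpfile=$1
        dump_refcnt=2
    elif [ "$1" = "$dumpfile" ]; then
        dump_refcnt=$((dump_refcnt + 1))
    fi
    # any later swap device leaves dumpfile unchanged
}

model_swap_delete() {
    if [ "$1" = "$dumpfile" ]; then
        # only the swap reference is dropped; dumpfile is not reset
        dump_refcnt=$((dump_refcnt - 1))
    fi
}
```

Running model_swap_add /dev/dsk/c0t0d0s1, then model_swap_delete /dev/dsk/c0t0d0s1, then model_swap_add /dev/vx/dsk/swapvol leaves dumpfile at the slice with a count of 1, which is the state S32dumpdev relies on.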
>
>In Solaris 2.6 and below, if the disk slice used for the dump device
>exceeds 2GB, ensure the following respective patch is installed;
>otherwise no core file will be produced:
>
>2.5.1 108083-01 SunOS 5.5.1: Dump patch
>2.6 107490-01 SunOS 5.6: savecore doesn't work if swap slice is over 2G
>
>After 2.6 no patches are necessary.
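A sketch of checking for these patches, along the lines of the showrev -p | grep at the top. dump_patch_ok is a hypothetical helper; it assumes showrev -p lines start with "Patch: NNNNNN-RR" as in the output quoted above, and it assumes any later revision of the patch still contains the fix.

```sh
#!/bin/sh
# Hypothetical helper: check "showrev -p" output for the >2GB dump patch.
#   $1 = OS release as printed by "uname -r" (5.5.1 = Solaris 2.5.1, 5.6 = 2.6)
#   $2 = text of "showrev -p"
# Any revision of the base patch ID is accepted (assumption).
dump_patch_ok() {
    case "$1" in
        5.5.1) patch=108083 ;;
        5.6)   patch=107490 ;;
        5.7|5.8|5.9|5.1[0-9]) return 0 ;;   # "After 2.6 no patches are necessary"
        *)     echo "release $1 not covered by the note"; return 2 ;;
    esac
    printf '%s\n' "$2" | grep "^Patch: $patch-" >/dev/null
}
```

Usage on the box itself: dump_patch_ok `uname -r` "`showrev -p`" && echo patch present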
>
>There is no way for the system administrator to know (short of using
>adb) that this device is still allocated for use by dump during the
>next crash. Therefore, it is very possible that long after the fact,
>another administrator might decide that the device is "free". He
>can newfs it, mount it, and put the device to use. When the next
>crash occurs, the filesystem on that device will be destroyed.
>
>*** Recommendation ***
>
>Put a comment line in /etc/vfstab explaining that the device has
>been set aside for use as a dump device when the next crash occurs.
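A minimal sketch of a check another admin could run before newfs-ing a slice, assuming the reservation comment names the full slice path, for example a line like "#/dev/dsk/c0t0d0s1 reserved as dump device by /etc/rcS.d/S32dumpdev" (that wording is only a suggestion). slice_reserved_for_dump is a hypothetical helper.

```sh
#!/bin/sh
# Hypothetical pre-newfs check: is the slice mentioned on a comment line of
# vfstab (the reservation convention recommended above)?
#   $1 = slice, e.g. /dev/dsk/c0t0d0s1 (used as a grep pattern; slice
#        paths contain no regex metacharacters)
#   $2 = vfstab file (default /etc/vfstab)
# Returns 0 if reserved (do not reuse it), 1 otherwise.
slice_reserved_for_dump() {
    grep "^#.*$1" "${2:-/etc/vfstab}" >/dev/null
}
```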
>
>INTERNAL SUMMARY:
>
>Date: Tue, 11 Jan 2000 15:50:54 -0500 (EST)
>From: "Paul J. McKernan" <Paul.McKernan@east.sun.com>
>Subject: Can't get a kernel dump?
>To: krnl@corp.sun.com
>
>
>SRDB 11964 contains an /etc/rcS.d script you must use if
>your primary swap is an encapsulated VxVM volume. This must
>also be used if your primary swap is a mirrored SDS metadevice.
>Without this, your customer won't be able to capture kernel
>crash dumps. As the SRDB states, this becomes moot in
>Solaris 7 and beyond.
>
>Here's an example of a primary swap that's an SDS metadevice:
>
>eastcores% cat explorer.80ab8452.arcsun01-2000.01.11.11.14/disks/swap-l.out
>swapfile             dev  swaplo   blocks     free
>/dev/md/dsk/d1      85,1      16  4194272  4194272
>
>Their primary swap is 'd1' (a Solstice Disksuite metadevice).
>Is it a metamirror?
>
>eastcores% more explorer.80ab8452.arcsun01-2000.01.11.11.14/disks/sds/metastat.out
>*
>*
>*
>d1: Mirror
> Submirror 0: d11
> State: Okay
> Submirror 1: d12
> State: Okay
> Pass: 1
> Read option: roundrobin (default)
> Write option: parallel (default)
> Size: 4197879 blocks
>
>d11: Submirror of d1
> State: Okay
> Size: 4197879 blocks
> Stripe 0:
> Device Start Block Dbase State Hot Spare
> c0t0d0s1 0 No Okay
>
>
>d12: Submirror of d1
> State: Okay
> Size: 4197879 blocks
> Stripe 0:
> Device Start Block Dbase State Hot Spare
> c0t1d0s1 0 No Okay
>*
>*
>*
>
>We can see that 'd1' is a mirror of metadevices 'd11' and 'd12',
>and that 'd11' and 'd12' are built on physical disk slices.
>This customer needs the "S32dumpdev" script from SRDB 11964.
>They need to place the script in /etc/rcS.d and then edit it
>so that the DUMPDEV is defined as one of the physical disk
>slices associated with either 'd11' or 'd12' (doesn't matter
>which one they pick).
>
>DUMPDEV="/dev/dsk/c0t0d0s1"
>
>or this customer could use:
>
>DUMPDEV="/dev/dsk/c0t1d0s1"
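A sketch for pulling the candidate DUMPDEV slices out of metastat output instead of reading them by eye. mirror_slices is a hypothetical helper; it assumes the metastat layout shown above (a "dNN: Submirror of dM" header followed by cXtYdZsN device lines) and uses awk -v, so on Solaris 2.6 you may need nawk instead of awk.

```sh
#!/bin/sh
# Hypothetical helper: list the physical slices under the submirrors of an
# SDS mirror, reading "metastat" output on stdin. Each printed path is a
# candidate DUMPDEV value for S32dumpdev.
#   $1 = mirror name, e.g. d1
mirror_slices() {
    awk -v m="$1" '
        # a new metadevice header: are we inside a submirror of m?
        /^d[0-9]+:/ { insub = ($0 ~ ("Submirror of " m "$")) }
        # device lines inside such a submirror name a physical slice
        insub && $1 ~ /^c[0-9]+t[0-9]+d[0-9]+s[0-7]$/ { print "/dev/dsk/" $1 }
    '
}
```

For the metastat output above, metastat | mirror_slices d1 should print /dev/dsk/c0t0d0s1 and /dev/dsk/c0t1d0s1, matching the two DUMPDEV choices.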
>
>Also, in Solaris 2.6 and below, if the disk slice used for the
>dump device exceeds 2GB, ensure the following respective patch
>is installed; otherwise no core file will be produced:
>
>2.5.1 108083-01 SunOS 5.5.1: Dump patch
>2.6 107490-01 SunOS 5.6: savecore doesn't work if swap slice is over 2G
>---------------------------------------------------------------------------
>
>4. Other people suggested that /var/crash may need to be big enough
>to hold all of swap.
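A sketch of that space check, using "all of swap" as a conservative upper bound. crash_space_ok is a hypothetical helper; it assumes you feed it the free kilobytes from df -k for the savecore directory and the 512-byte block count that swap -l reports for the dump slice. It uses $(( )), so on Solaris 2.6 run it under ksh rather than /sbin/sh.

```sh
#!/bin/sh
# Hypothetical check: is there room in the savecore directory for a dump as
# large as the dump slice? Uses the conservative "all of swap" rule above.
#   $1 = free space in the savecore directory, in kilobytes (df -k "avail")
#   $2 = size of the dump/swap slice in 512-byte blocks (swap -l "blocks")
crash_space_ok() {
    avail_kb=$1
    need_kb=$(( $2 / 2 ))          # 512-byte blocks -> kilobytes
    if [ "$avail_kb" -ge "$need_kb" ]; then
        echo "OK: ${avail_kb}K free, up to ${need_kb}K needed"
        return 0
    fi
    echo "TOO SMALL: ${avail_kb}K free, up to ${need_kb}K needed"
    return 1
}
```

With the 4194272-block swap from the swap -l example in this thread, crash_space_ok asks for 2097136K, i.e. about 2 GB in /var/crash.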
>
>===========================================================================
>
>Solution to my problem:
>
>I have 3 large, active file systems on the box. The 'fsck pass' field
>in /etc/vfstab was originally set to 3; we changed it from 3 to 2.
>Finally, the core was saved after the most recent system crash.
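If you want to review the fsck pass values at a glance before and after a change like ours, a sketch like this may help. fsck_passes is a hypothetical helper; it assumes the standard seven-field vfstab layout, where field 3 is the mount point and field 5 is the fsck pass.

```sh
#!/bin/sh
# Sketch: print the mount point and fsck pass (field 5) of each vfstab
# entry, skipping comment lines and lines without all seven fields.
#   $1 = vfstab file (default /etc/vfstab)
fsck_passes() {
    awk '!/^#/ && NF >= 7 { print $3, $5 }' "${1:-/etc/vfstab}"
}
```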
>
>
>Thanks for your answers.
>
>zhou na
>
>

S
U BEFORE POSTING please READ the FAQ located at
N ftp://ftp.cs.toronto.edu/pub/jdd/sun-managers/faq
. and the list POLICY statement located at
M ftp://ftp.cs.toronto.edu/pub/jdd/sun-managers/policy
A To submit questions/summaries to this list send your email message to:
N sun-managers@ececs.uc.edu
A To unsubscribe from this list please send an email message to:
G majordomo@sunmanagers.ececs.uc.edu
E and in the BODY type:
R unsubscribe sun-managers
S Or
. unsubscribe sun-managers original@subscription.address
L To view an archive of this list please visit:
I http://www.latech.edu/sunman.html
S
T


