Even though the actual problem wasn't fingered directly, I
must say that I learned a lot from all who responded! Many, many
thanks go out to the following responders for sharing their
wisdom and troubleshooting methods!
Let me also add:
I was *awestruck* by the wide geographic breadth of this list...
Even though I posted the problem at the end of the 'normal'
(i.e. other non-sys-admin employees) work day here on the East
Coast of the US, and had actually resigned myself to struggling
with the problem alone for most of the night, responders rang in
from all over the world proving that it's always daytime somewhere...
Well, enough verbosity from me... on to the summary:
ORIGINAL PROBLEM DESCRIPTION:
Yesterday I upgraded an IPX that was running SunOS 4.1.3
(a porting platform) to Solaris 2.4 (again for a porting platform)
and everything went great. Today I made several changes to the
/etc/system file to support shared memory settings for Oracle
(yes, I know the machine is underpowered but it's a porting/test
platform) and did a boot -r. The system hung after stating that
it was starting the syslog service. Assuming I had screwed something
up in the /etc/system file, after waiting it out for about 5 minutes, I
halted it, and booted it off the network (had to create an install server
since it has no local CD) and mounted the root partition on /a
and edited out the changes... I can't seem now to boot past the
syslog startup... I've even gone so far as to boot off the network
and rename syslog.conf to prevent syslog from starting, but it still
hangs right after setting the multicast.... (without a syslog.conf
syslog won't start, so it's obviously hanging right after....)
Anyways.. I'm open to any and all clues (or past similar experiences)
as to what could be at fault, or even how to begin to troubleshoot
a hang with no error messages..... I'm at my wits' end...
I know the hardware's working, as I can boot off the CD (across the
network) and it works fine.. And the system worked fine until the boot
-r after the /etc/system changes... What could be going on?!!!
ACTUAL PROBLEM CAUSE:
After about 2 hours of adding debug statements (Entry and Exit) to the
scripts in init.d, which simply proved that it would hang right after
entering the 'cron' script, and of repeatedly booting to the hang and
then booting off the cd (net) again,
I finally bit the bullet and reinstalled the OS. ***HOWEVER - when
putting everything back - piece by piece, I managed to identify what the
original problem was!!! I had simply preserved the automount files from
when the machine was originally running SunOS, and when I put
them back and ran 'autofs stop' and a subsequent 'autofs start'
I magically lost the ability to execute any OS program!
This got me thinking, and upon rebooting off the cd I discovered that
the config for soft mounts was including a mount over /usr/lib!
The moment the automounter reached the point where it would
soft mount the foreign file system, it obscured access to the
local /usr/lib/lib*.so.* files (exactly which lib file is required
escapes me), which prevents executables from using their linked libraries!
Turns out the point where the boot was stopping had nothing to do with
the cause: a subsystem that had already started up successfully
(automountd) was hammering the rest of the startup....
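For anyone who wants to check inherited automount files before putting them
back, here is a minimal sketch in Python of the kind of check that would have
caught this. It is only an illustration, not what I actually ran. It assumes
a simplified map format in which the first field of each line is an absolute
mount point (as in auto_master or a direct map); the list of "critical"
directories and the function names are my own assumptions.

```python
import posixpath

# Directories that programs need in order to start at all.
# This list is an illustrative assumption, not taken from a real system.
CRITICAL_DIRS = ("/usr/lib", "/usr/bin", "/usr/sbin", "/sbin", "/etc")


def shadowed_dirs(mount_point, critical=CRITICAL_DIRS):
    """Return the critical directories hidden by mounting on mount_point."""
    mp = posixpath.normpath(mount_point)
    hidden = []
    for d in critical:
        # A mount on d itself, or on any ancestor of d, obscures d.
        if d == mp or d.startswith(mp.rstrip("/") + "/"):
            hidden.append(d)
    return hidden


def risky_entries(map_text):
    """Scan simplified map text and report mount points that hide system dirs."""
    findings = []
    for raw in map_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line or line.startswith("+"):  # skip blanks and +map includes
            continue
        key = line.split()[0]
        # Only absolute mount points matter here; "/-" merely introduces a
        # direct map and relative keys belong to indirect maps.
        if not key.startswith("/") or key == "/-":
            continue
        hidden = shadowed_dirs(key)
        if hidden:
            findings.append((key, hidden))
    return findings


if __name__ == "__main__":
    # Hypothetical map contents, made up for this example.
    sample = (
        "/net -hosts\n"
        "/- auto_direct\n"
        "/usr/lib server:/export/lib  # inherited entry\n"
        "/home auto_home\n"
    )
    for mount_point, hidden in risky_entries(sample):
        print(mount_point, "would hide", ", ".join(hidden))
```

Feed it the text of each inherited map file; anything it reports deserves a
close look before you start the automounter.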
Don't inherit things without examining them carefully.... I
hadn't thought about the automounter as 99.8% of all the
filesystems we automount are DATA only... (and
devices like remote cd drives). This was one 'inheritance'
that bit me... An old colleague of mine used to say:
"Trust...... BUT VERIFY!" -- He was right.
Thanks again for all the help, and support!!!
Bob Bennett (W): firstname.lastname@example.org
Aspect Development Inc. (H): email@example.com
Nashua, NH USA
(603) 880-3764 Ext. 18