Advance SUMMARY: callbparams_free_id: not found

From: Jochen Bern (bern@penthesilea.uni-trier.de)
Date: Tue Jun 09 1998 - 16:32:15 CDT


Not actually fixed yet, but the Bug Report located hits *SO* close to
Home that I'm confident that it's the same Bug. I wrote:

> I just had one of the Dpt. Servers freeze, THRICE IN A ROW, with
> possibly relevant Contents in /var/adm/messages as reproduced
> below.
[See below; main Message "unix: callbparams_free_id: not found".
HUNDREDS of them. Host seems to have an "invisible" Memory Hog.]
> SunOS jupiter 5.4 Generic_101945-52 sun4m sparc
> Recommended and Security Patches (though it's been a Couple Months
> again ...)

1. Updated Problem Description: I watched a Crash creep up with 'ps'
   and 'top'. While the Total of Process Sizes as given by 'ps'
   stayed reasonable, 'top' reported less and less free RAM.
   Upon reaching Zero - while still having ~ 1/2 GB free Swap -
   the Machine would refuse Network Services (telnet, ssh, NFS,
   NIS+, ...) while still responding to the Console; sometimes
   it stayed 'ping'able, sometimes it didn't. I'm led to believe
   that this is a pretty sure Sign of a Kernel Memory Leak (rather
   than, e.g., 'ps' missing some Process(es) hogging Memory).
   
   I later found that the Crashes correlated with a Power User
   trying to 'tar' ~ 90,000 very small Files off an NFS Mount,
   which he'd never done before.
   
   The Machine is currently running low on RAM *again*, even though
   the 'tar' is NOT being tried anymore and the Machine used to reach
   Uptimes in the 1-3 Months Range, so the 'tar' mightn't be the
   *whole* Story. :-{

2. The Award goes to Frank de la Torre <ntlinux@hotmail.com> for
   pointing me to BugId 1231772. Excerpt from the Description:

| Bug Id: 1231772
| Category: kernel
| Subcategory: nfs
| State: closed
| Release summary: 1.0_a_plus, 5.4
| Synopsis: system hangs during backup (tar) of an NFS mounted filesystem
| Integrated in releases:
| Patch id:
| Description:
| Configuration:
| SS10 512 in 2.4: name is ultimate
[...]
| My customer performs backup (using tar) of a filesystem mounted via NFS.
| This filesystem contains tons of tiny files.
[...]
| Error messages appear and repeat on the console at the time of hang:
| "callbparams_free_id: not found".
[...]
| The description field as copied from bug report 1231614 follows:
| server hangs when performing backup of an NFS mounted filesystem
| Work around:
| Use "cp" instead of "tar".
| Try tuning down the number of rnodes. The number of rnodes in the dump files
| was 5000. Use /etc/system file:
| set nfs:nrnode=1

   (I assume that there are two more Workarounds/Fixes: a) run 'tar cf -'
   on the NFS Server and pipe the Output through ssh or somesuch to the
   Target Host, and b) upgrade to 2.5 ?)
   
   I'll turn down the rnodes *and* tell the User to avoid this Behaviour.
   Belt&Suspenders. ;-) I'll upgrade to 2.5 once I've pinpointed the
   Terms of the (future?) Site License being bought by the Computing
   Center ...

3. For the Time being, I'll assume that the two other System Messages
   logged:

> Jun 2 17:23:37 jupiter unix: ksyms: too many open references.
> Jun 9 12:56:07 jupiter nis_cachemgr: nis_cast: t_bind: Not enough space

   are non-Problems, namely:
   
   a) ksyms: Judi <cjudi@sprint.net> told me that the Kernel probably
      was unable to free Inodes, and that I might want to reboot and
      remount to cure it. Well, the Messages (from one Week ago) didn't
      repeat in the first Place, and today's Crashes certainly gave
      the Machine a good Deal of Reboots :-}, so I'll leave it at that.

   b) nis_cast: The Message seems to indicate Memory Allocation Problems,
      and the Machine *was* running low on Memory because of the principal
      Problem at that Time, so ... Judi related these to the bootparams
      Service under NIS; Problem is that I don't run bootparams. ;-)

4. Thanks to:
        Ryan Matteson <ryanm@accn.org>
        Horst Scheuermann <scheuerm@rzsun08.uni-trier.de>
        Steve Kay <stevekay@hotmail.com>
        Frank de la Torre <ntlinux@hotmail.com>
        Judi <cjudi@sprint.net>
        Alex de la Salle <adelasalle@hotmail.com>
   and anyone whose Reply I receive when this Mail is already out.
   Other interesting Suggestions collected from the abovementioned
   Replies:

5. Ryan suspected a Disk filling up. Well, the Disks on that Machine
   ARE more or less full, but that hasn't been a Problem for the OS
   so far, only for the Users. ;-)

6. There's a Memory Leak "in the callbparams Area" of 2.4, possibly
   related to the Problem, as per BugId 1130741. Fixed in 2.5.

7. Frank first suspected the Network Adapter, more precisely, BugIds
   1209096 and 1214439 (which refer to SUN's quad Ethernet Card).
   (The Machine in Question is an SS10 Clone with the normal LANCE
   Ethernet (le0).)

8. Steve suspected Keyboard I/O (alas, nobody touched the Keyboard
   between Crashes) and advised me to debug the STREAMS Modules using
   'strconf' and 'strace'.

9. Judi pointed me to 'ps -elc' and the appropriate adb Incantation
   to trace Memory hogging Processes; Problem is that it wasn't a
   *Process* eating RAM ...

10. Alex located /usr/include/sys/strsubr.h as the Header File relating
    to callbparams_free_id(), and advised me to check Files related
    to Oracle Streams (set up in /etc/system) for Corruption. However,
    there's no Oracle or /etc/system Magic around so far.

11. /var/adm/messages Excerpt as included in the original Request:

> Jun 2 17:23:37 jupiter unix: ksyms: too many open references.
> [25 more such Messages]
> Jun 2 17:24:56 jupiter unix: ksyms: too many open references.
> [One Week later ...]
> Jun 9 12:18:25 jupiter unix: callbparams_free_id: not found
> Jun 9 12:20:12 jupiter last message repeated 425 times
> [Freeze 1; Powercycle]
> Jun 9 12:55:52 jupiter unix: callbparams_free_id: not found
> Jun 9 12:55:58 jupiter last message repeated 24 times
> Jun 9 12:55:59 jupiter unix: callbparams_free_id: not found
> Jun 9 12:56:07 jupiter last message repeated 33 times
> Jun 9 12:56:07 jupiter nis_cachemgr: nis_cast: t_bind: Not enough space
> Jun 9 12:56:07 jupiter nis_cachemgr: nis_cast: t_bind: Not enough space
> Jun 9 12:56:07 jupiter unix: callbparams_free_id: not found
> Jun 9 12:57:00 jupiter last message repeated 212 times
> [Freeze 2; Stop-A, "reset". Watched 'ps' and 'top' for a
> While, SOMETHING is eating Mem like crazy, but I can't
> identify the Culprit ...]
> Jun 9 13:21:32 jupiter unix: callbparams_free_id: not found
> Jun 9 13:22:06 jupiter last message repeated 137 times
> Jun 9 13:22:06 jupiter unix: callbparams_free_id: not found
> Jun 9 13:22:27 jupiter last message repeated 85 times
> Jun 9 13:22:28 jupiter unix: callbparams_free_id: not found
> Jun 9 13:22:53 jupiter last message repeated 86 times
> Jun 9 13:22:54 jupiter unix: callbparams_free_id: not found
> Jun 9 13:23:52 jupiter last message repeated 232 times
> Jun 9 13:23:53 jupiter unix: callbparams_free_id: not found
> [Freeze 3; Stop-A, "reset". Users give up, Machine
> seems somewhat stabilized so far ...]
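
A rough Sketch for tallying such an Excerpt: syslog folds consecutive
identical Messages into "last message repeated N times", so the real
Count per Message only shows once those Lines are added back in. The
Function Name count_messages and the SAMPLE Constant below are made up
for Illustration; the Sample Lines are copied from the Excerpt above,
and the Line Format is assumed to be "Mon D HH:MM:SS host text" with an
optional "> " Quote Prefix.

```python
import re
from collections import Counter

# Assumed line shape: optional "> " quote prefix, then
# "Mon D HH:MM:SS host rest-of-line".
LINE_RE = re.compile(
    r"^(?:>\s*)?[A-Z][a-z]{2}\s+\d+\s+\d\d:\d\d:\d\d\s+(\S+)\s+(.*)$"
)
REPEAT_RE = re.compile(r"^last message repeated (\d+) times?$")

# A few lines taken from the excerpt above (first freeze and the start
# of the second one).
SAMPLE = """\
> Jun 9 12:18:25 jupiter unix: callbparams_free_id: not found
> Jun 9 12:20:12 jupiter last message repeated 425 times
> [Freeze 1; Powercycle]
> Jun 9 12:55:52 jupiter unix: callbparams_free_id: not found
> Jun 9 12:55:58 jupiter last message repeated 24 times
> Jun 9 12:56:07 jupiter nis_cachemgr: nis_cast: t_bind: Not enough space
""".splitlines()


def count_messages(lines):
    """Return a Counter of message text -> total occurrences.

    A "last message repeated N times" line adds N to the count of the
    most recent ordinary message. Lines that do not look like syslog
    entries (e.g. bracketed comments) are skipped.
    """
    totals = Counter()
    previous = None
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # comment or continuation line, not a log entry
        text = match.group(2)
        repeat = REPEAT_RE.match(text)
        if repeat:
            if previous is not None:
                totals[previous] += int(repeat.group(1))
        else:
            totals[text] += 1
            previous = text
    return totals


if __name__ == "__main__":
    for text, total in count_messages(SAMPLE).most_common():
        print(total, text)
```

Fed the whole Excerpt instead of SAMPLE, the same Function gives the
Totals per Message across all three Freezes.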

Thanks again,
                                                                J. Bern

-- 
  /\  /""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""\
 /  \/ bern@uni-trier.de    (Size Limit!)   | P.O. Box 1203 | Ham:  \/\
/ J. \ bern@ti.uni-trier.de (SUNAttachm.OK) | D-54202 Trier | DD0KZ /  \
\Bern/ No Finger etc.; Use Mail (Subj. "##" for Autoreply List) and \  /
 \  /\ WWW. /\/
  \/  \____________________________________________________________/


