SUMMARY: Message Queue full and App. fail

From: Joohyun Cha <>
Date: Mon Jan 17 2005 - 01:46:00 EST
Hi. Managers.

Thanks for your quick and kind advice, especially:
  Paul Wilkinson
  John Leadeham

My original post is at the end of this message.

Although I haven't found an exact answer yet, one thing I am sure of is that
there is no OS bug in the message queue code. Below is part of Paul's message.

> Sun are pretty open about OS bugs, search
> Message queues have been around in Unix for so long that I'd have
> thought the code involved is well tested and stable, I'd be surprised
> if the OS is at fault here. I've programmed apps using message
> queues without such problems.

I largely agree with him. I will look more carefully for a bug in the
application rather than in the OS.

Thanks all.

I asked.
Hi. Managers

I have a problem with my servers: an E3500 with 8 CPUs (Sol 7) and an
SF4800 with 8 CPUs (Sol 8). Several applications run on them (not common
applications such as an RDBMS, Web, WAS etc.; they were all developed by our
programmers). Each application consists of several independent processes
that use several message queues for IPC.
The symptoms are:
 1) All of a sudden some message queue fills up completely and msgsnd() in
    the sending process fails, so the application fails. I noticed this
    using 'ipcs'.
 2) The system slows down. CPU %sys is over 50% according to sar and vmstat
    (normal system load is %sys < 20%).
 3) msgrcv() in the receiving process returns very slowly. msgrcv() is
    called in blocking mode (without IPC_NOWAIT).
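
One way to watch for symptom 1 from inside the application, rather than
with 'ipcs', is to query the queue with msgctl(IPC_STAT) and to send with
IPC_NOWAIT so that a full queue shows up as EAGAIN. The following is only a
minimal sketch under stated assumptions: the names demo_msg, try_send and
report_queue and the 64-byte payload are hypothetical and are not taken from
the application described here, which uses blocking calls.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Hypothetical message layout: System V messages start with a long type. */
struct demo_msg {
    long mtype;
    char mtext[64];
};

/* Try to send without blocking.
 * Returns 0 on success, 1 if the queue is full (EAGAIN), -1 on other errors. */
static int try_send(int msqid, const char *text)
{
    struct demo_msg m;

    m.mtype = 1;
    strncpy(m.mtext, text, sizeof m.mtext - 1);
    m.mtext[sizeof m.mtext - 1] = '\0';

    if (msgsnd(msqid, &m, sizeof m.mtext, IPC_NOWAIT) == 0)
        return 0;
    return (errno == EAGAIN) ? 1 : -1;
}

/* Print the queue depth, its byte limit, and the PIDs of the last sender
 * and last receiver. Returns the number of queued messages, or -1 if
 * msgctl() fails. */
static long report_queue(int msqid)
{
    struct msqid_ds ds;

    if (msgctl(msqid, IPC_STAT, &ds) == -1)
        return -1;

    printf("msqid %d: %lu messages queued, limit %lu bytes, "
           "last sender pid %ld, last receiver pid %ld\n",
           msqid,
           (unsigned long)ds.msg_qnum,
           (unsigned long)ds.msg_qbytes,
           (long)ds.msg_lspid,
           (long)ds.msg_lrpid);
    return (long)ds.msg_qnum;
}
```

If try_send() returns 1, calling report_queue() on the same queue shows how
many messages are waiting and which process last called msgrcv(), which may
help tell a stalled receiver apart from a burst of senders.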

The same problem occurs on systems that
 1) have been running for over 200 days
 2) span nearly all platforms and OS versions (E3500 with Sol 7 and SF4800
    with Sol 8)
 3) have no system log messages (/var/adm/messages)
 4) have processes using message queues

The problem goes away after a system reboot. I don't know what causes it.
Assuming the application is programmed properly, is there any kernel bug
related to this issue? I wonder why the problem goes away after a reboot.
Do I need to reboot the servers periodically to prevent this problem? Is
there any recommendation from Sun about periodic reboots?

May I ask for your ideas?
Thanks in advance.
main(){int a=122,j=11;while(a>-50){a=a>0?a:111;printf("%c",a);a=j==49?46:a-j;
sunmanagers mailing list
Received on Mon Jan 17 01:46:29 2005
