Original message at end.

Bottom line. The combination:

    SunRay Server Software 1.3
    Solaris 8 Release 10/00
    Solaris 8 Patch 109077 (patches included in recommended cluster)
    Solaris 8 Patch 111302

is not compatible. Won't work. Will fail.

SunRay Server Software 1.3 apparently requires Solaris 8 Release 4/01 or later, and preferably Release 7/01 or later (although I was running 10/00). Patch 109077 updates dhcpd and its configuration, on which SunRay Server Software depends. This patch precipitated the failure. Furthermore, the patch has a bunch of dependencies, and the instructions recommend that you NOT try to uninstall it. So I seemed to be basically stuck as far as simple solutions were concerned.

I tried doing an upgrade install of Solaris 8 Release 2/02. Freakishly, I had a disk drive failure during the install. So: find an unused disk drive, partition it to match, go to backup tapes for recovery, and punt on the upgrade for now.

Unfortunately, when I went back to the January full backup, I found it had the same failure. On checking the patches, I found it had an earlier version of 109077. On checking my records for patching, rebooting, and backups, I found I had not rebooted since before that patch cluster. If I had, the system would have failed back then. So, go back even further on my full backup tapes and recover again. This worked, but then I had a couple of months of fixes and changes on that server that I had to repeat. Fortunately, it wasn't too much.

Anyway, I'm back up and running, and next time there is a break on campus and I can schedule some official down time, I'll try the upgrade to Solaris 8 Release 2/02 and SunRay Server Software 2.0 (that combination works).

---------------
Bloody Details for those who care
---------------

After having gone through this "bare metal" recovery, I now have some changes I will make in my backup procedures. More on that after these details.
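A side note on the patch check mentioned in the summary above. What follows is only a minimal sketch of how one might list the installed revisions of a single patch ID. It assumes the usual "showrev -p" line format ("Patch: <id>-<rev> Obsoletes: ... Requires: ... Incompatibles: ... Packages: ..."), and the function name patch_revs is made up for illustration; it is not a Solaris command.

```shell
#!/bin/sh
# Sketch only: print the installed revisions of one patch ID.
# Reads "showrev -p" style lines on stdin, assumed to look like
#   Patch: <id>-<rev> Obsoletes: ... Requires: ... Incompatibles: ... Packages: ...
# patch_revs is a hypothetical helper name, not a Solaris command.
patch_revs() {
    # $1 is the numeric patch ID without the revision suffix.
    # Print only the "<id>-<rev>" field of lines for that patch.
    sed -n "s/^Patch: \($1-[0-9][0-9]*\) .*/\1/p"
}

# Typical use on a running system:
#   showrev -p | patch_revs 109077
```

No output means no revision of that patch ID is recorded as installed on the system the listing came from.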
Since this server has no tape drive, I do my backups to a tape drive on another server. So that added to my difficulties a little. I had to go through:

    reboot/shutdown/init and then "stop-a" to get to ok prompt
    insert CD 1 of 2 of Solaris 8 software
    ifconfig hme0 129.117.162.215 netmask 255.255.255.0 broadcast 129.117.162.254 up
    ping 129.117.162.133

Since I'm booted from CD, I don't have my user accounts and profiles, so I have to get the machine at 133 to let me in as root. I have rshd on that machine already open and covered by tcp_wrappers to allow in only my servers that have no tape drives. Now I had to add a /.rhosts file for root. It had to have DNS names for reverse lookup. When I started by trying just the IP address, it didn't work.

    129.117.162.215 +
    sunrayserver +
    sunrayserver.mydomain.edu +

Then I'm all set to do my recovery from the other server's tape drive.

    newfs /dev/rdsk/c0t0d0s0
    mount /dev/dsk/c0t0d0s0 /a
    cd /a
    ufsrestore rvf 129.117.162.133:/dev/rmt/0n
    ls
    cd ..
    umount /a

Repeat the above for each partition required and on tape, in sequence. Then do a:

    installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk \
        /dev/rdsk/c0t0d0s0

The documentation said to do "pboot", but I found that in that directory the only file was "bootblk". When I rebooted, it worked. I did a "uname -i" to see what it returned (SUNW,Ultra-4) and looked down through the directories.

---------------
Changes to my procedures
---------------

I use a script to generate an informational file that I call a label and then write it out as the first item on the tape when I do backups. Thus, when I pick up a tape, I can pull off that first file with an interactive ufsrestore and see what I put on the tape and what the machine it came from was like. My label looks like:

<Label>

Amen-ra-02Dec2003-t1

Tue Dec 2 09:36:17 EST 2003

Library Information Systems & Technology Services
W.E.B. Du Bois Library
University of Massachusetts
(413) 545-0074

------------

Filesystem                        kbytes     used     avail capacity  Mounted on
/dev/dsk/c0t0d0s0               15344171   539601  14651129       4%  /
/proc                                  0        0         0       0%  /proc
fd                                     0        0         0       0%  /dev/fd
mnttab                                 0        0         0       0%  /etc/mnttab
/dev/dsk/c0t2d0s3               15346527  1063952  14129110       8%  /var
swap                             4511248       16   4511232       1%  /var/run
swap                             4614696   103464   4511232       3%  /tmp
/dev/dsk/c0t0d0s5                1018191   230495    726605      25%  /opt
/dev/dsk/c0t3d0s7               11214644  4813822   6288676      44%  /usr/local
/dev/dsk/c0t3d0s6                4131866  1719578   2370970      43%  /export/home
/dev/dsk/c0t0d0s1                1018191   247358    709742      26%  /usr/openwin
/proc                                  0        0         0       0%  /var/opt/SUNWbb/root/proc
localhost:(cifsBrowse)browser         10       10         0     100%  /CIFS
/tmp/SUNWut/sessions             4614696   103464   4511232       3%  /var/opt/SUNWbb/root/tmp/SUNWut/sessions
/tmp/SUNWut/units                4614696   103464   4511232       3%  /var/opt/SUNWbb/root/tmp/SUNWut/units

</label>

I thought that would be more or less totally adequate. However, it was not as easy as it should have been to get the information I needed. I am changing my script to make this label more informative by including prtvtoc output for each of the drives included in the backup, and a "cat" of /etc/vfstab and of the backup script, as well as the "df -k" that I have been putting there. That will give me all the information I need to replace and repartition a failed drive, as well as to recover from hacks or software failures when I have intact partitions to recover to.

---------------

Chris Hoogendyk

-
   O__  ---- Network Specialist & Unix Systems Administrator
  c/ /'_ --- Library Information Systems & Technology Services
 (*) \(*) -- W.E.B. Du Bois Library
~~~~~~~~~~ - University of Massachusetts, Amherst

<choogend@library.umass.edu>

---------------

-------- Original Message --------
Subject: SunRay server failure
Date: Mon, 01 Mar 2004 22:39:38 -0500
From: Chris Hoogendyk <choogend@library.umass.edu>
To: Sun Managers <sunmanagers@sunmanagers.org>

E450, Solaris 8, SunRay Server Software 1.3, 20 SunRay1's in Restricted Access Mode.
Last Friday I did the latest Recommended and Security patches. Last done a little over a month ago. This morning I rebooted the server around 7am.

Mid afternoon today, my SunRays started failing. The first two were hung waiting for DHCP. I tested by recycling my SunRay (logging out) before going down to look. It started a new session just fine. I used a Fluke to test the connection for the failed SunRays and could not get a DHCP lease. I went straight to the switch port with the Fluke to eliminate any intervening wiring questions. No DHCP. I thought perhaps it was a problem with the switch VLANs (Cisco).

This evening I got a call that all the SunRays were failing. I connected to the server. Looking at the messages file from the SunRay web admin interface, I found the following error sequence repeated over and over for one SunRay or another:

Mar 1 17:17:22 amen-ra-01 utauthd: [ID 639584 user.info] Worker2 NOTICE: whichServer pseudo.080020c0c454:
Mar 1 17:17:22 amen-ra-01 utauthd: [ID 641787 user.info] Worker2 NOTICE: CLAIMED by StartSession.m3 NAME: pseudo.080020c0c454 PARAMETERS: {_=1, rawId=080020c0c454, terminalIPA=192.168.128.61, startRes=1152x900, state=disconnected, initState=0, fw=1.3_12.c_111891-05,REV=2002.05.10.11.53,Boot:1.3; 1999.11.29-09:58:55-GMT, pn=34583, rawType=pseudo, sn=080020c0c454, tokenSeq=1, event=insert, id=080020c0c454, cause=insert, hw=SunRayP1, type=pseudo, namespace=IEEE802}
Mar 1 17:17:22 amen-ra-01 utauthd: [ID 388005 user.info] Worker2 NOTICE: CONNECT IEEE802.080020c0c454, pseudo.080020c0c454, all connections allowed
Mar 1 17:17:22 amen-ra-01 utauthd: [ID 475121 user.info] Worker2 NOTICE: SESSION_OK pseudo.080020c0c454
Mar 1 17:52:49 amen-ra-01 utauthd: [ID 794400 user.info] SessionManager0 NOTICE: EMPTY: ACTIVE session
Mar 1 17:52:49 amen-ra-01 utauthd: [ID 716730 user.info] Terminator NOTICE: DISCONNECT IEEE802.080020c0c454, pseudo.080020c0c454 session terminated
Mar 1 17:52:49 amen-ra-01 utauthd: [ID 190098 user.info] Terminator NOTICE: DESTROY pseudo.080020c0c454 lifetime=2127277
Mar 1 17:52:49 amen-ra-01 utauthd: [ID 927710 user.info] SessionManager0 NOTICE: TERMINATE: inactive session

followed by this:

Mar 1 18:55:25 amen-ra-01 utauthd: [ID 699394 user.info] Worker3 NOTICE: SESSION_OK pseudo.080020c0c454
Mar 1 19:26:34 [192.168.128.176.2.2] 0x0.0x42e1b9 8:0:20:f9:68:97 Kernel: panic: AutoRenewDHCP: IPA lease expired -- must restart
Mar 1 19:26:40 [192.168.128.174.2.2] 0x0.0x42ee63 8:0:20:c0:c5:ea Kernel: panic: AutoRenewDHCP: IPA lease expired -- must restart
Mar 1 19:26:40 [192.168.128.61.2.2] 0x0.0x42ee60 8:0:20:c0:c4:54 Kernel: panic: AutoRenewDHCP: IPA lease expired -- must restart
Mar 1 19:26:40 [192.168.128.160.2.2] 0x0.0x42ee2e 8:0:20:b9:66:d7 Kernel: panic: AutoRenewDHCP: IPA lease expired -- must restart
Mar 1 19:26:40 [192.168.128.56.2.2] 0x0.0x42ee42 8:0:20:c1:c:44 Kernel: panic: AutoRenewDHCP: IPA lease expired -- must restart
Mar 1 19:26:41 [192.168.128.62.2.2] 0x0.0x42ee68 8:0:20:e7:b5:8c Kernel: panic: AutoRenewDHCP: IPA lease expired -- must restart
Mar 1 19:26:42 [192.168.128.171.2.2] 0x0.0x42ef11 8:0:20:c0:bd:f2 Kernel: panic: AutoRenewDHCP: IPA lease expired -- must restart
Mar 1 19:26:43 [192.168.128.175.2.2] 0x0.0x42efda 8:0:20:f2:47:7a Kernel: panic: AutoRenewDHCP: IPA lease expired -- must restart
Mar 1 19:26:45 [192.168.128.177.2.2] 0x0.0x42ee4b 8:0:20:f5:76:76 Kernel: panic: AutoRenewDHCP: IPA lease expired -- must restart
Mar 1 19:26:45 [192.168.128.177.2.2] 0x0.0x42ee4b 8:0:20:f5:76:76 Kernel: 0x665830-0x6658b7: 0x4a1c8 backtrace_me+0x24(...)
Mar 1 19:26:45 [192.168.128.177.2.2] 0x0.0x42ee4b 8:0:20:f5:76:76 Kernel: 0x6658b8-0x66592f: 0x493fc panic+0x4c(...)
Mar 1 19:26:45 [192.168.128.177.2.2] 0x0.0x42ee4b 8:0:20:f5:76:76 Kernel: 0x665930-0x6659ef: 0x55334 AutoRenewDHCP+0x18c(...)
Mar 1 19:26:45 [192.168.128.177.2.2] 0x0.0x42ee4b 8:0:20:f5:76:76 Kernel: Top: 0x44cc4 proc_spawn_pid+0x3cc(...)
Mar 1 19:26:46 [192.168.128.179.2.2] 0x0.0x42f0de 8:0:20:f0:fd:60 Kernel: panic: AutoRenewDHCP: IPA lease expired -- must restart
Mar 1 19:26:46 [192.168.128.68.2.2] 0x0.0x42ee7d 8:0:20:f5:73:4 Kernel: panic: AutoRenewDHCP: IPA lease expired -- must restart
Mar 1 19:26:47 [192.168.128.33.2.2] 0x0.0x42ee82 8:0:20:f9:69:aa Kernel: panic: AutoRenewDHCP: IPA lease expired -- must restart
Mar 1 19:26:50 [192.168.128.162.2.2] 0x0.0x42f238 8:0:20:b6:1:69 Kernel: panic: AutoRenewDHCP: IPA lease expired -- must restart
Mar 1 19:26:57 [192.168.128.69.2.2] 0x0.0x42f253 0:3:ba:d:99:f4 Kernel: panic: AutoRenewDHCP: IPA lease expired -- must restart
Mar 1 19:30:16 amen-ra-01 utauthd: [ID 607465 user.info] Worker3 UNEXPECTED: Terminal.readMesages: java.net.SocketException: Connection reset by peer
Mar 1 19:30:16 amen-ra-01 utauthd: [ID 181342 user.info] Worker3 NOTICE: DISCONNECT IEEE802.080020f96897, pseudo.080020f96897 destroy
Mar 1 19:30:16 amen-ra-01 utauthd: [ID 791169 user.info] Worker3 UNEXPECTED: during send to: java.net.SocketOutputStream@1bca4f error=java.io.IOException: Broken pipe
Mar 1 19:30:16 amen-ra-01 utauthd: [ID 151315 user.info] Worker3 NOTICE: DESTROY pseudo.080020f96897 lifetime=43693338
Mar 1 19:30:24 amen-ra-01 utauthd: [ID 607465 user.info] Worker3 UNEXPECTED: Terminal.readMesages: java.net.SocketException: Connection reset by peer
Mar 1 19:30:24 amen-ra-01 utauthd: [ID 667050 user.info] Worker3 NOTICE: DISCONNECT IEEE802.080020c0c454, pseudo.080020c0c454 destroy
Mar 1 19:30:24 amen-ra-01 utauthd: [ID 118975 user.info] Worker3 UNEXPECTED: during send to: java.net.SocketOutputStream@11bee50 error=java.io.IOException: Broken pipe
Mar 1 19:30:24 amen-ra-01 utauthd: [ID 669981 user.info] Worker3 NOTICE: DESTROY pseudo.080020c0c454 lifetime=2099658
Mar 1 19:30:28 amen-ra-01 utauthd: [ID 607465 user.info] Worker3 UNEXPECTED: Terminal.readMesages: java.net.SocketException: Connection reset by peer

Rebooting the server accomplished nothing.
From the web admin interface for the SunRay Server Software, restarting the service gave the following in the messages file:

Mar 1 22:11:07 amen-ra-01 UTPOLICY: [ID 702911 user.info] Restarting SunRay services
Mar 1 22:11:07 amen-ra-01 UTPOLICY: [ID 702911 user.info] stopping authentication manager
Mar 1 22:11:07 amen-ra-01 UTPOLICY: [ID 702911 user.info] starting session manager
Mar 1 22:11:07 amen-ra-01 UTPOLICY: [ID 702911 user.info] starting device manager
Mar 1 22:11:07 amen-ra-01 UTPOLICY: [ID 702911 user.info] starting printer service
Mar 1 22:11:07 amen-ra-01 UTPOLICY: [ID 702911 user.info] starting serial service
Mar 1 22:11:07 amen-ra-01 UTPOLICY: [ID 702911 user.info] # Using local policy
Mar 1 22:11:07 amen-ra-01 UTPOLICY: [ID 702911 user.info] starting authentication manager
Mar 1 22:11:07 amen-ra-01 utauthd: [ID 396523 user.info] main NOTICE: SmartCardConfigData: LDAP contains no smartcard configuration files
Mar 1 22:11:07 amen-ra-01 utauthd: [ID 253120 user.info] main NOTICE: SmartCardConfigData: read 17 smartcard configuration files from directory file: /etc/opt/SUNWut/smartcard/probe_order.conf
Mar 1 22:11:08 amen-ra-01 utauthd: [ID 353254 user.info] main NOTICE: SmartCardConfigData: /etc/opt/SUNWut/smartcard/Payflex-All.cfg: 237 tokens processed
Mar 1 22:11:08 amen-ra-01 utauthd: [ID 762100 user.info] main NOTICE: SmartCardConfigData: /etc/opt/SUNWut/smartcard/MondexMM2.cfg: 89 tokens processed
Mar 1 22:11:08 amen-ra-01 utauthd: [ID 192469 user.info] main NOTICE: SmartCardConfigData: /etc/opt/SUNWut/smartcard/JavaBadge.cfg: 144 tokens processed
Mar 1 22:11:08 amen-ra-01 utauthd: [ID 582636 user.info] main NOTICE: SmartCardConfigData: /etc/opt/SUNWut/smartcard/OpenPlatform.cfg: 144 tokens processed
Mar 1 22:11:08 amen-ra-01 utauthd: [ID 462772 user.info] main NOTICE: SmartCardConfigData: /etc/opt/SUNWut/smartcard/CyberflexAccess.cfg: 104 tokens processed
Mar 1 22:11:08 amen-ra-01 utauthd: [ID 240283 user.info] main NOTICE: SmartCardConfigData: /etc/opt/SUNWut/smartcard/ActivCardGold.cfg: 100 tokens processed
Mar 1 22:11:08 amen-ra-01 utauthd: [ID 783214 user.info] main NOTICE: SmartCardConfigData: /etc/opt/SUNWut/smartcard/GEMPLUS-MPCOS.cfg: 145 tokens processed
Mar 1 22:11:08 amen-ra-01 utauthd: [ID 522253 user.info] main NOTICE: SmartCardConfigData: /etc/opt/SUNWut/smartcard/GEMPLUS-MPCOS-3DES.cfg: 124 tokens processed
Mar 1 22:11:08 amen-ra-01 utauthd: [ID 658487 user.info] main NOTICE: SmartCardConfigData: /etc/opt/SUNWut/smartcard/GEMPLUS-GPK4000.cfg: 138 tokens processed
Mar 1 22:11:08 amen-ra-01 utauthd: [ID 293412 user.info] main NOTICE: SmartCardConfigData: /etc/opt/SUNWut/smartcard/PKCS15.cfg: 106 tokens processed
Mar 1 22:11:08 amen-ra-01 utauthd: [ID 313178 user.info] main NOTICE: SmartCardConfigData: /etc/opt/SUNWut/smartcard/SpanishUniversity-TIBC.cfg: 98 tokens processed
Mar 1 22:11:08 amen-ra-01 utauthd: [ID 486291 user.info] main NOTICE: SmartCardConfigData: /etc/opt/SUNWut/smartcard/GD-SMARTCAFE.cfg: 74 tokens processed
Mar 1 22:11:08 amen-ra-01 utauthd: [ID 858185 user.info] main NOTICE: SmartCardConfigData: /etc/opt/SUNWut/smartcard/GD-STARCOS.cfg: 74 tokens processed
Mar 1 22:11:08 amen-ra-01 utauthd: [ID 884807 user.info] main NOTICE: SmartCardConfigData: /etc/opt/SUNWut/smartcard/BullTB.cfg: 114 tokens processed
Mar 1 22:11:08 amen-ra-01 utauthd: [ID 163784 user.info] main NOTICE: SmartCardConfigData: /etc/opt/SUNWut/smartcard/MondexUNU.cfg: 67 tokens processed
Mar 1 22:11:08 amen-ra-01 utauthd: [ID 524863 user.info] main NOTICE: SmartCardConfigData: /etc/opt/SUNWut/smartcard/Cryptoflex.cfg: 144 tokens processed
Mar 1 22:11:08 amen-ra-01 utauthd: [ID 651628 user.info] main NOTICE: SmartCardConfigData: /etc/opt/SUNWut/smartcard/UnknownCard.cfg: 63 tokens processed
Mar 1 22:11:08 amen-ra-01 utauthd: [ID 723974 user.info] main NOTICE: Loaded module /opt/SUNWut/lib/modules/StartSession.m0
Mar 1 22:11:08 amen-ra-01 utauthd: [ID 612231 user.info] main NOTICE: Loaded module /opt/SUNWut/lib/modules/Authxlation.m1
Mar 1 22:11:08 amen-ra-01 utauthd: [ID 709793 user.info] main NOTICE: Loaded module /opt/SUNWut/lib/modules/ServerSelect.m2
Mar 1 22:11:08 amen-ra-01 utauthd: [ID 723977 user.info] main NOTICE: Loaded module /opt/SUNWut/lib/modules/StartSession.m3
Mar 1 22:11:08 amen-ra-01 utauthd: [ID 723978 user.info] main NOTICE: Loaded module /opt/SUNWut/lib/modules/StartSession.m4
Mar 1 22:11:08 amen-ra-01 utauthd: [ID 745985 user.info] main NOTICE: 5 authentication modules loaded.
Mar 1 22:11:08 amen-ra-01 utauthd: [ID 826448 user.info] deviceManager0 NOTICE: DeviceManager.getDeviceManager: Initiate callback to utdevMgrd at localhost:7011
Mar 1 22:11:08 amen-ra-01 utauthd: [ID 914482 user.info] deviceManager0 NOTICE: DeviceManager.initiateCallback localhost:7010 established communication
Mar 1 22:11:29 amen-ra-01 policy[1484]: [ID 702911 user.info] TIMEOUT!!!
Mar 1 22:11:33 amen-ra-01 admincgi[5534]: [ID 702911 user.info]
Mar 1 22:11:33 amen-ra-01 admincgi[5534]: [ID 702911 user.info] amen-ra-01: Restarting servers... messages will be logged to /var/opt/SUNWut/log/messages.
Mar 1 22:11:33 amen-ra-01 admincgi[5534]: [ID 702911 user.info] amen-ra-01: ERROR: Service reset failed. Host unreachable.
Mar 1 22:11:33 amen-ra-01 admincgi[5534]: [ID 702911 user.info]

I'm really at a loss and I have some critical service points down. Any help would be greatly appreciated.

My server having panicked, myself having panicked, I'm now going to crash. I'll look at this again and any replies at 7am EST.

TIA

---------------

Chris Hoogendyk

-
   O__  ---- Network Specialist & Unix Systems Administrator
  c/ /'_ --- Library Information Systems & Technology Services
 (*) \(*) -- W.E.B. Du Bois Library
~~~~~~~~~~ - University of Massachusetts, Amherst

<choogend@library.umass.edu>

---------------

_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers

Received on Mon Mar 8 17:38:56 2004