Hi All,

Many thanks for the responses I've gotten (in no particular order) from: Paul Roetman, Hichael Morton, Jay Lessert and also Vadim Carter (of AC&NC, HWVendorCo for the Jetstor disk array). Please see the end of this email for the text of the replies.

Bottom line // the range of suggestions includes:

-> There should *not* be an OS limitation / cache issue causing the observed problem; folks report manipulating large (90+ gig) files without seeing this type of problem.

-> In future, as a workaround, request that the DBA do dumps to "many small files" rather than "one big file". This is apparently possible in Oracle8 (although not as easy as it used to be in Ora7, I'm told?) and is a decent workaround.

-> Possibly, depending on the data type of the tables being dumped, subsequent (or inline, via named pipes) compression using gzip MAY result in oradmp files that are smaller / more manageable. [Alas, in my case the large table being dumped holds very dense binary data that compresses poorly.]

-> Confirm that performance of the system for small-file backups is OK NOW (yes - it was); that the filesystem isn't corrupt (it is "logging" and fsck'ed itself OK / quickly after the freeze-crash-reboot of yesterday AM); and that the large file isn't corrupt (believed to be OK since the fsck was OK).

However, it gets "better". I did more extensive digging on google / sunsolve using "cadp160" as the search term, since this was cited in a message logged shortly before the system froze/hung yesterday AM (when loading began to pick up on Monday AM as users came in to work). What I've learned is **VERY GRIM**, assuming I can believe it all. ie,

-> The cadp160 driver [the ultra160 SCSI kernel module driver] on Solaris x86 has a long history of being buggy & unreliable, especially at times of significant disk load. This can result in such a fun range of things as data corruption, terrible performance, freezes/reboots, etc etc. There are entries in sunsolve dating back to 2000 and as recent as May/31/03 which are in keeping with these problems, including such things as:

   BugID:    Description:
   4481205   cadp160: performance of cadp160 is very poor
   4379142   cadp160: Solaris panics while running stress tests

-> There is a "rather interesting" posting I located via google which appears to have been made by someone who claims to be the original developer of a low-level SCSI driver module commonly used in Solaris // which is the basis of many other such drivers subsequently developed (Bruce Adler, driver is GLM). If this posting is true, it suggests that Sun has known about this problem with cadp160 for quite a long time; that it came about for absurd reasons; and that it is quite disgusting that it remains unresolved. And .. IFF this story is true, then it certainly suggests that the cadp160 driver needs to be rewritten from scratch, and that until this happens it should **NEVER** be anywhere near a production server. For anyone interested in the details, the (long) posting / sordid tale is available at the URL:

   http://archives.neohapsis.com/archives/openbsd/2002-02/0598.html

So. As a temporary workaround, I believe I'll add an entry to /etc/system reading "exclude: drv/cadp160" - which should force use of the older (apparently more reliable) cadp driver - albeit at non-ultra160 performance, but hopefully infinitely more stable / less buggy. After making this change I'll do some trivial tests (ie, attempt to copy the 130-gig file between slices; re-initiate the netbackup job) and observe the performance and iowait loading (rough commands sketched below).
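For the record, here is roughly what I plan to type. It is untested as written; the mount points /u01 and /u02 and the dump file name are just examples from my own layout, so substitute your own:

   # tell the kernel never to load the cadp160 module, so the controller
   # falls back to the older cadp driver (reconfiguration reboot to be safe)
   echo 'exclude: drv/cadp160' >> /etc/system
   touch /reconfigure
   init 6

   # after the reboot, confirm which driver actually got bound / loaded
   prtconf -D | grep -i adp
   modinfo | grep -i cadp

   # then re-run the trivial copy test while watching disk busy / wait
   # times from a second window
   iostat -xn 5
   timex cp /u01/export/bigtable.dmp /u02/scratch/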
My expected / hoped-for result is better performance / less iowait loading.

I hope this summary is of some use to others. In the unlikely event that anyone from Sun reads this, I would encourage you to inquire about when the cadp160 driver redevelopment will begin :-)

Thanks,

Tim Chipman

====paste====original text of replies======

... have you thought about compressing the file on the fly - with database exports, [I] generally get over 90% compression. Create a bunch of pipes (eg file[0-20].dmp), and a bunch of gzip processes

   gzip < file0.dmp > file0.dmp.gz &

then export the database to the pipes...

   exp ... file=(file0.dmp, file1.dmp, .. ,file20.dmp) \
       filesize=2147483648 ....

That way you end up with a bunch of 200 meg compressed files to back up, and even if you do uncompress them, they are smaller than two gig. I have a script that generates all this on the fly and cleans up after itself, if you are interested. Note: import can be done straight from the compressed files using the same pipe system! [A fuller sketch of this appears after the replies.]

Cheers

----------------

Have you confirmed ~12MB/s *now* with a 10GB file in the same file system as your 100+GB file? ... Do you have any interesting non-default entries in /etc/system? I've manipulated 90GB single files on SPARC Solaris 8 (on vxvm RAID0+1 volumes) with no performance issues. Are you positive the RAID5 volume is intact (no coincidental failed subdisk)? ... You *could* try bypassing the normal I/O buffering by backing it up with ufsdump, which will happily do level 0's of subdirectories, if you ask. Not very portable, of course. [An example appears after the replies.]

------------------

... the dba should be able to split the file into smaller files for backup. ...
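In case it saves anyone some typing, here is a rough sketch of the named-pipe export described in the first reply above. It is untested as written; /export/dump, the "system/manager" login and the "bigschema" owner are placeholders only, and the 2-gig filesize comes straight from the reply (a dump the size of mine would obviously need more pipes or a larger filesize):

   #!/bin/sh
   # create 21 named pipes, each drained by a background gzip; the
   # compressed .gz files are what actually lands on disk
   DUMPDIR=/export/dump        # placeholder - point at a real scratch area
   FILES=""
   i=0
   while [ $i -le 20 ] ; do
       PIPE=$DUMPDIR/file$i.dmp
       rm -f $PIPE
       mknod $PIPE p
       gzip < $PIPE > $PIPE.gz &
       FILES="$FILES,$PIPE"
       i=`expr $i + 1`
   done
   FILES=`echo $FILES | sed 's/^,//'`   # strip the leading comma

   # "system/manager" and "owner=bigschema" are placeholders - use a real
   # login and the schema that owns the big table
   exp system/manager owner=bigschema file=$FILES filesize=2147483648

   # note: if exp didn't need all 21 pipes, some gzips may still be
   # blocked waiting for a writer at this point
   wait
   rm -f $DUMPDIR/file*.dmp             # remove the pipes, keep the .gz files

As the reply notes, import works the same way in reverse: gunzip into the pipes in the background, then point imp at the same file list.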
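And for completeness, the ufsdump suggestion from the second reply might look something like this (again untested; the subdirectory holding the dump file and the output location are placeholders, and this writes to a file rather than tape):

   # level-0 dump of just the subdirectory holding the export file,
   # bypassing the normal buffered I/O path as the reply suggests
   ufsdump 0f /u02/backups/exports.ufsdump /u01/export

Restoring would be via ufsrestore against that dump file, so as the reply says, not very portable.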