SUMMARY: Large NFS file server

From: doug@seismo.gps.caltech.edu
Date: Fri Nov 30 1990 - 13:41:09 CST


Last week I posted the following question:

> On rather short notice, I may have to specify a LARGE file server to
> provide NFS service to a network of Suns (and Vaxen running TGV Multinet
> NFS). This file server will need to archive and make available
> 100+ GB of "online" disk space.
>
> I know about Epoch's family of jukebox systems with its hierarchy of storage
> (normal disk, r/w magneto-optical jukebox, and WORM jukebox). Is there
> anyone else selling such a hardware/software solution to providing vast
> amounts of "online" storage? Any other suggestions? The delays associated
> with the Epoch approach of staging data from slow-speed media to high-speed
> media are OK, but since this would be an operator-less environment, the
> ability for all of the data to appear "online" is important.
>
> In general, the bulk of the data would be written once and read many times --
> i.e. it will be used as an archival and retrieval system.
>
> Also, if anyone is using a database system (such as Oracle) to manage access
> to data on such a system, please let me know if it is possible to use
> databases on remote NFS filesystems.
>
Summary responses:
 
The various options that were suggested fell into 3 categories:
1. Large real disk farms
2. Virtual disk farms using hierarchical storage (magnetic and optical)
3. Jukeboxes with non-virtual disk software, or "roll-your-own" software.

In more detail:

------------------------------------------------------------------------
1. Large real disk farms

A. Auspex
Auspex sells large NFS servers with all data on disk. Unfortunately, the largest
Auspex configuration currently supports only 60 gig, which isn't quite what
you need. If you are willing to buy two servers, you should consider this
option.

Auspex's real forte is servicing large numbers of NFS requests -- over 1000
average mix NFS requests per second. If your application requires large
amounts of data, but relatively low request rates, then an Epoch system may
be a good match. If you have lots of clients all accessing the data in
parallel, then an Auspex solution probably makes more sense.

B. Convex
Convex C2's have as an option what we call the IDC -- this is an IPI
controller that sits directly on an I/O channel, and will run 32 drives
chained on several channels. I'm not sure how many IDC's will fit into
a machine, but I imagine that your capacity could be approached. That's
about all I know about that product, although I'm sure that if you call
Convex at (214) 497-4000, they'll be happy to connect you to
someone with a tie who can give you the official scoop.

Convex have a system called EMASS which it is claimed can handle between 1
and 1000 Tb (yes, Terabytes) of data. It can use 3480 cartridges or Exabytes
to store the data, with a robot-controlled arm to load the things. They claim
30 secs to access the tape (of course you then have to find the data you want
on that tape!). I am not sure if they have optical disk as intermediate
storage.

C. Sun
No one explicitly suggested this, but the same thing could be accomplished
via multiple Sun servers with disk farms.
------------------------------------------------------------------------
2. Virtual disk farms using hierarchical storage (magnetic and optical)

A. Epoch
My organisation has just purchased an Epoch for use with SUN systems. We
haven't yet had it installed, but the purpose of the procurement was, like
yours, to cope with very large (tens of Gb / annum) volumes of data. We
investigated 'conventional' data storage methods but they were not at all
cost-effective compared with the Epoch. Beware that tapes written on the
Epoch will probably NOT be readable elsewhere, since their tape management
system puts its own format labels on. (I am not sure how data is written
beyond the label). EPOCH seems to be great as a back-end fileserver, but
not really as an operating system server - in fairness Epoch went to lengths
to point this out before our purchase. Our people who will be running the
thing suspect that the way it will be used will in fact be to bring a few
hundred Mb at a time of 'live' data back to users' 'own' local disks, let
them work on it, and then push it back to the Epoch for storage. Maybe we'll
know more after a few months' experience. To summarise - Epoch looks good
for data storage, but you need something else to hang OS and applications
s/w off. Also, although it is the first coherent product of this type (that
we know of anyway) there are others on the way, and I guess it's possible
that they will become the 'standard' not Epoch. However our impressions were
favourable, and I hope the reality will be as good!

The advantage of the Epoch system is that all the data DOES appear
on-line. Our users have no problem with the latency, because they're
used to it. We've had a system for over 2 years, and with proper
maintenance on the jukebox (specifically the drive) it's pretty
reliable.
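
To illustrate the staging behaviour described above, here is a toy sketch
of a fast disk tier sitting in front of a slow jukebox tier. It is not
Epoch's actual algorithm: the least-recently-used eviction policy, the
class name StagingCache, and the file names and sizes are all assumptions
made up for the example.

```python
from collections import OrderedDict


class StagingCache:
    """Toy model: a fast disk tier in front of a slow jukebox tier.

    Hypothetical illustration only; LRU eviction is an assumed policy.
    """

    def __init__(self, capacity_mb, archive):
        self.capacity_mb = capacity_mb
        self.archive = archive        # name -> size in MB; stands in for the jukebox
        self.staged = OrderedDict()   # name -> size in MB; least recently used first
        self.used_mb = 0

    def read(self, name):
        """Return "hit" if the file is on fast disk, else stage it and return "staged"."""
        if name in self.staged:
            self.staged.move_to_end(name)   # mark as most recently used
            return "hit"
        size = self.archive[name]           # KeyError if the file is not archived
        if size > self.capacity_mb:
            raise ValueError("file larger than the fast tier")
        # Evict least recently used files until the new one fits.
        while self.used_mb + size > self.capacity_mb:
            _, evicted_size = self.staged.popitem(last=False)
            self.used_mb -= evicted_size
        self.staged[name] = size
        self.used_mb += size
        return "staged"


# Made-up archive contents and a 500 MB fast tier.
archive = {"a": 200, "b": 200, "c": 200}
cache = StagingCache(500, archive)
results = [cache.read(n) for n in ["a", "b", "a", "c", "b"]]
print(results)
```

Every file is readable by name, so it all "appears online"; only the
"staged" reads pay the jukebox delay.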

B. R-Squared
I have just gotten a flyer from the R-Squared company on a new jukebox
system they have. It looks very interesting and in the same price range as
an Epoch. An advantage of the R**2 system is that it can deal with files
larger than 300M as individual files can span optical media platters. It
also appears that the overall performance of the R**2 system would be a
little slower than the Epoch because it is not a multi-tier system, but it
does use a big SCSI disk as a cache. Then again it may make up for that in
that it attaches directly to your existing Sun system and doesn't have to
rely on a slow Moto processor. I think that all that R**2 has is a 20Gig juke
whereas Epoch can have up to 5 jukes that range in size from 20Gig to 300Gig
each. New jukes are on the horizon though to change all capacities to
bigger numbers.

C. HP
HP has a MO jukebox based archiving system which can have some large
standard drives hooked up to it to provide much of the same functionality.
------------------------------------------------------------------------
3. Jukeboxes with non-virtual disk software or "roll-your-own" software.

I also have a flyer from Apunix on a juke system that they have. I don't
have details on it though.

I just got literature from Pinnacle describing their 'REO-36000' erasable
optical disk auto-changer which holds up to 56 5.25" optical disks (36 Gbytes).
Data transfer rate 7.4 Mb/s, average seek 65 ms, disk change 11.5 s.
Price is $50k. (I'm just reading their spec sheet, no other knowledge)
        Pinnacle Micro
        15265 Alton Parkway
        Irvine, CA 92718
        (800) 553-7070
        in CA: (714) 727-3300
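
As a rough sizing check, the figures quoted above (36 Gbytes and $50k per
changer, 56 disks) can be set against the 100 GB target from the original
question. This is a back-of-the-envelope sketch only: it assumes list
price per unit and ignores the host server, software, media, and any
discounts. The variable names are made up for the example.

```python
import math

TARGET_GB = 100            # "100+ GB" from the original question
CHANGER_GB = 36            # capacity per changer, from the spec sheet above
CHANGER_PRICE_USD = 50_000 # price per changer, from the spec sheet above
DISKS_PER_CHANGER = 56     # disks per changer, from the spec sheet above

# Whole changers needed to reach the target capacity.
units = math.ceil(TARGET_GB / CHANGER_GB)
total_capacity_gb = units * CHANGER_GB
total_price_usd = units * CHANGER_PRICE_USD

# Derived per-unit figures.
price_per_gb = CHANGER_PRICE_USD / CHANGER_GB
gb_per_disk = CHANGER_GB / DISKS_PER_CHANGER

print(f"changers needed: {units}")
print(f"total capacity:  {total_capacity_gb} GB")
print(f"total price:     ${total_price_usd:,}")
print(f"price per GB:    ${price_per_gb:,.0f}")
print(f"GB per disk:     {gb_per_disk:.2f}")
```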

At a recent show, I saw a vendor that had a package that added new file
system software and an optical disk to turn a standard Sun server into
something like an Epoch system. (I'm afraid I don't have any
literature on that, so I can't give you more details.)

I just went through this exercise for the Mars Observer GRS DB here at LPL.
There are several others following Epoch's lead but if it's short notice
that's driving you, I'd stick with them. Other names that come to mind are
Qstar, Xedaco(sp?) & such... I'd be very interested in hearing the details
of your requirements; the HFS has some very attractive characteristics in the
operatorless environment you describe.

Zetaco.... 800-423-3020
NetStore... I haven't checked on them either.
Qstar... 301-564-6006

[ Qstar is sending me literature, but from my phone conversation, it sounds
like their software provides tape emulation for an optical jukebox. ]

Check out Computer Upgrade Corp. in Anaheim CA. They have an optical jukebox
with a 90GB capacity max with options for MO and WORM in the same box. Their
number is (714) 630-3457.

------------------------------------------------------------------------
Regarding use of a database on a NFS file server:

Re ORACLE - beware that you CANNOT use the EPOCH as a database machine.
We had hopes of doing this, but apparently ORACLE needs to live on
a proper physical disk partition. I think it can't even access data on
an NFS-mounted partition, but I am not sure about that. The best
config seems to be to have the actual ORACLE RDBMS on a 'central' system
and use SQL*NET products on all workstations that need access to that data.
As I understand it otherwise you will need to pay for an RDBMS licence on
all the workstations you are actually storing data on. Best to check with
ORACLE themselves re all of this.

------------------------------------------------------------------------
Thanks to:
        "Anthony A. Datri" <convex!datri@uxc.cso.uiuc.edu>
        trinkle@cs.purdue.edu
        Robert L Krawitz <rlk@think.com>
        rackow@antares.mcs.anl.gov
        pbg@cs.brown.edu (Peter Galvin)
        George Young <young@vlsi.ll.mit.edu>
        "Mark D. Baushke" <mdb@esd.3com.com>
        rick@wiau.medical-biophysics.manchester.ac.uk
        elroy!ames!claris!voder!nsc!dtg.nsc.com!levine@csvax.cs.caltech.edu (David LeVine)
        "Dr. Phil Stanford, NCS Local Support, Wormley." <PNS@ibma.nerc-wormley.ac.uk>
        auspex!hitz@uunet.uu.net (Dave Hitz)
        jgotobed@hindmost.lpl.arizona.edu (Joe Gotobed x4549)
        odt@base.bellcore.com

----------------------------------------------------------------------------
Doug Neuhauser Div. of Geological and Planetary Sciences
doug@seismo.gps.caltech.edu California Institute of Technology
818-356-3993 MS 252-21, Pasadena, CA 91125


