I recently posted an enquiry regarding the pros and cons of using
disk arrays as opposed to disk packs in a mass storage scenario (some
60GB in a full text search engine environment). The enquiry stems from
the need for this sort of space, but with the age-old problem of not
enough pennies to spend. The question was how much I would sacrifice
in performance etc. by choosing the cheaper route, i.e. disk packs as
opposed to, say, a Sparc Storage Array.
I got a number of very informative replies which I think are worth
making available to others, so here they are.
Incidentally, I have decided to push for the storage array and now
await the management decision :-)
------------------------------------
I have been buying my disk drives from a company in the US for many
years. They do a lot of firmware tweaking for disk drives and are very
experienced.
I am shortly going to buy one of their in-house RAID systems when
funds permit.
The price/specs are:-
Wide SCSI caching RAID array controller with 32MB of cache
18 x 5.31GB disks --> 80GB before filesystem
Installed and tested in an enclosure for approx. $40,000
(22,500 pounds)
Notes:-
This is the price in Houston; shipping to the UK needs to be costed.
You can probably get an SBus wide SCSI card from them to go into an
ULTRA-1 (the ULTRA-2 already has wide SCSI).
You can contact them in Houston, Texas, on 001-713-467-7273.
Britton Dick is my salesman.
Regards
Rob
Robert Gillespie                       Robert.Gillespie@waii.com
Geoscience Systems Administrator,      Tel:   +44 181 585 4060
Western Atlas International Inc.,      Fax:   +44 181 231 7260
455 London Road, Isleworth,            Telex: 24970 WESGEO G
Middlesex, TW7 5AB, England.
----------------------------------
Stuart,
We have finished looking into this same issue.
The basics show that RAID 5 is probably the fastest for disk
throughput (especially writes). Disk mirroring is the most reliable,
since you always have duplicates and can usually do offline backups
while keeping your data available. Write throughput is where you take
the performance hit with mirroring; however, reads can sometimes be
faster, since some mirroring software can read from both mirrors and
use whichever copy returns the data first.
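For reference, here is a minimal counting sketch (Python, purely
illustrative) of the mirroring behaviour described above, with the
classic RAID-5 small-write parity update shown for comparison; the I/O
counts are the textbook model, not measurements of any particular
array or driver.

# Illustrative I/O counting for mirroring vs. a RAID-5 small write.
# Textbook model only; real arrays cache and coalesce, so treat these
# numbers as the shape of the trade-off, not a benchmark.

def mirror_write_ios():
    # the same block is written to both copies of the mirror
    return 2

def mirror_read_ios():
    # either copy can answer; some drivers pick the less busy side,
    # which is why mirrored reads can beat a single disk
    return 1

def raid5_small_write_ios():
    # read old data + old parity, XOR in the change, write both back
    return 4

def raid5_new_parity(old_data, old_parity, new_data):
    # the parity update itself is just a pair of XORs
    return old_parity ^ old_data ^ new_data

if __name__ == "__main__":
    print("mirrored read      :", mirror_read_ios(), "I/O")
    print("mirrored write     :", mirror_write_ios(), "I/Os")
    print("RAID-5 small write :", raid5_small_write_ios(), "I/Os")
    print("parity example     :", bin(raid5_new_parity(0b1010, 0b0110, 0b1111)))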
I suggest you look into Connely RAID systems and Sun's Solstice
DiskSuite (previously ODS) for mirroring. Both are relatively easy to
use.
Robert.
-------------------------------
I'm running an Oracle system on an Ultra 1-170 with 256MB RAM. This
has the Sun 24GB disk array - NOT A SPARC STORAGE ARRAY - this is
6 x 4.2GB fast/wide disks and requires a fast/wide SCSI adapter in one
of the SBus slots. So far, I/O hasn't been a problem :-) The box is
hot-swappable for disks if required, and with the metadisk package
(which comes with Solaris 2.5.1) it can be used as a RAID system.
rgds
Stephen
--------------------------------
The default rule for most database applications is that the more
disks available, the better the performance is going to be. Most
databases are held on RAID systems, as these offer the best balance of
performance against safety. With this type of system the disks which
contain the actual database are separate from the log disk, which will
be heavily used if a lot of transactions are performed. This improves
disk I/O and overall system performance. It gives you the ability to
'roll back' to a certain point if an error has occurred, and you are
also able to 'lose' a disk temporarily, as the information is
'mirrored' across several disks (RAID 5), and the system can still
run.
If you are not concerned about downtime or the safety of the data,
then go for a number of disks under the control of array software such
as DiskSuite. This gives you the ability to set up and control the
disks however you wish, and it also does away with the need for an
actual disk array (I think), which brings the cost down. The
performance is not as good, as the disks must all be connected by
normal cables, so you are limited by the speed of the ports and the
transfer rates (unlike an array, where the disks are all housed in one
box with a fibre link to the main server).
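Some back-of-the-envelope arithmetic for that bus limit (the per-disk,
SCSI and fibre figures below are assumptions roughly typical of the
hardware discussed here, not quotes from any vendor): a handful of
drives can already saturate a single fast/wide SCSI bus, which is why
spreading the load over an array's internal controllers and a fibre
host link helps.

# Rough bandwidth arithmetic; every figure is an assumption to be
# replaced with your own numbers.
DISK_MBPS      = 5.0    # sustained throughput of one drive (assumed)
FAST_WIDE_SCSI = 20.0   # one fast/wide SCSI bus (assumed)
FIBRE_LINK     = 25.0   # SSA-style fibre host connection (assumed)

def aggregate_throughput(n_disks, bus_mbps):
    # the bus caps whatever the disks could deliver together
    return min(n_disks * DISK_MBPS, bus_mbps)

for disks in (2, 4, 6, 12):
    print("%2d disks on one SCSI bus: %5.1f MB/s"
          % (disks, aggregate_throughput(disks, FAST_WIDE_SCSI)))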
---------------------------------
The server edition of Solaris includes DiskSuite, so you can build one
or more striped diskset(s) to increase performance. Of course the host
CPU will have to take care of everything without the
help of the SSA CPU, but functionality will be the same. You should be
able to get higher I/O throughput than without striping, but not as
high as with the SSA, and CPU load will be somewhat higher than it
would be with the SSA.
If you start using RAID 5 striping, I think you'll see more of a
performance hit, as the parity information will have to be generated.
But since you don't seem to mind having to reload after a crash, you
could get by without it. If you get RSM shelves you should be able to
just add an SSA controller later.
Just remember: If you build one 60GB file system, a reload after a
disk crash will take 2 days.....
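That two-day figure is easy to sanity-check with a bit of arithmetic;
the tape throughputs below are assumptions (substitute whatever your
backup hardware actually sustains), and a real reload also adds tape
changes, newfs and verification time on top.

# Rough full-reload time for one large file system.
# Throughput figures are assumptions; plug in your own.
def restore_hours(gigabytes, mb_per_sec):
    return gigabytes * 1024.0 / mb_per_sec / 3600.0

for rate in (0.5, 1.0, 1.5):    # sustained MB/s from tape (assumed)
    print("60 GB at %.1f MB/s: about %.0f hours"
          % (rate, restore_hours(60, rate)))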
--------------------------------------
You probably should get your local Sun sales rep in to do some quotes
for you. I suspect that it will actually come out cheaper to do a disk
array for space in this range rather than multiple disk packs. You
will definitely get better throughput with an array, since the Sun
disk arrays use a fibre-optic interface rather than a SCSI interface
to the CPU.
--------------------------------------
From real-world experience, managers typically say "We need you to
build a 60GB Oracle database system, but you only have X dollars"
(where X is an amount that is WAY too low to even think about starting
to build a 10GB system).
I have seen a LOT of money spent on managing and recovering, and lost
in man-power, downtime and production time, due to disk failures and
power-supply failures. If those responsible for the money had budgeted
for a REAL RAID system in the first place, they would have been ahead
of the game.
In the long run you have to ask yourself and your manager the
following questions:
1) How important is this data?
2) How many people will be affected if the data becomes unavailable?
   (Disk or power supply failure)
3) How long will a recovery take? Worst and best scenario?
4) What is the cost of having people sitting idle for that amount of
   time?
5) What is the recovery cost?
6) Will you actually get ALL the data back to the point of failure?
7) What is the cost of re-entering data from the time of backup to
   the time of failure?
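Questions 3 to 7 reduce to some simple arithmetic that is worth
putting in front of a manager; here is a sketch with made-up figures
(head-count, hourly rate and recovery times are all assumptions to be
replaced with your own).

# Rough downtime cost; every input below is a placeholder assumption.
def downtime_cost(people_idle, hourly_rate, recovery_hours, reentry_hours=0.0):
    idle_cost    = people_idle * hourly_rate * recovery_hours
    reentry_cost = people_idle * hourly_rate * reentry_hours
    return idle_cost + reentry_cost

best  = downtime_cost(people_idle=20, hourly_rate=30, recovery_hours=8)
worst = downtime_cost(people_idle=20, hourly_rate=30, recovery_hours=48,
                      reentry_hours=16)
print("best case  recovery cost: %8.0f" % best)
print("worst case recovery cost: %8.0f" % worst)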
80% of all managers, in my opinion, severely underestimate all of the
above points. They are in a "Computer City" mindset and allocate money
accordingly. They do not fully understand the REAL costs mentioned
above.
IMHO, for a 60GB database system you want to look at the following:
  Ultra Sparc 2 (single processor at first, add second if necessary)
  512MB memory (1GB would be better)
  1 Baydel Raider-5 30GB disk array (Master configuration, 64MB cache)
  1 Baydel Raider-5 30GB disk array (Slave configuration)
This is not going to be cheap, but neither is downtime or lost
productivity.
---------------------------------
The storage array is actually *slightly* slower than having a bunch of
disks on a bunch of controllers. It won't hurt you to use MultiPacks.
However, you might not have enough Sbus slots in an Ultra1 for
controllers if you want to put on a lot of disk.
I can't be sure from the description above whether or not the machine
will directly support more than one user (via login or heavyweight net
connection or whatever). If it does, and if you can afford it,
consider a multiprocessor - at least an Ultra 2.
It can really help improve response time for multiple users, or when
the database backend is made up of more than one heavily-used process.
) If I am only after a huge chunk of disk space, is RAID technology
) going to provide me with much additional performance? This is an
) in-house development/test system, so we don't really have much need
) for disk mirroring or such, but in the same breath we don't want to
) be hit by any disk I/O bottlenecks.
You don't have to get too fancy, but yes, striping your disks will
help maximize your IO throughput.
November's SunWorld Online has a very good article on this:
http://www.sun.com/sunworldonline/swol-11-1996/swol-11-perf.html
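For the curious, plain striping (RAID 0) is nothing more exotic than
dealing consecutive chunks of the logical device out across the member
disks, which is what lets several spindles work on one I/O stream at
once. Here is a minimal model of the block layout (the interlace size
and disk count are arbitrary assumptions, not DiskSuite defaults).

# Minimal RAID-0 layout model: logical blocks are dealt round-robin
# across the members in chunks of one interlace. Illustrative only.
INTERLACE_BLOCKS = 16     # assumed interlace, in blocks
N_DISKS          = 4      # assumed number of stripe members

def locate(logical_block):
    chunk  = logical_block // INTERLACE_BLOCKS
    disk   = chunk % N_DISKS
    offset = (chunk // N_DISKS) * INTERLACE_BLOCKS + logical_block % INTERLACE_BLOCKS
    return disk, offset

for lb in (0, 16, 32, 48, 64):
    disk, offset = locate(lb)
    print("logical block %3d -> disk %d, offset %d" % (lb, disk, offset))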
--------------------------------------
When going for this much disk and such a demanding application I
would definitely go for the SSA 100. If you want your data mirrored I
would go for an SSA 214. This way you will never meet with any
performance, upgrade or maintenance problems.
>If I am only after a huge chunk of disk space, is RAID technology
>going to provide me with much additional performance? This is an
>in-house development/test system, so we don't really have much need
>for disk mirroring or such, but in the same breath we don't want to
>be hit by any disk I/O bottlenecks.
You will get quite a lot of extra performance with striped disks (also
called RAID 0), but RAID 5, for example, will slow you down slightly.
----------------------------------------
The biggest problem with multiple disk packs is that you will have a
severe bottleneck on your SCSI interface, unless you were to add SBus
cards and have many SCSI interfaces; but this would be very expensive
and not have the same return on investment that an array would offer.
A Sparc Storage Array 110 would be an obvious fit for your needs, as
it can accommodate 30 drives x 2GB each = 60GB. The SSA has several
disk controllers within it and connects to the host via a Fibre
Channel connection, so it is very fast.
You raise a very valid point with regard to mirroring. I am using
RAID 0+1 on my array, so I effectively have half the capacity, but I
am willing to sacrifice capacity to ensure higher availability. My
database is also nowhere near 60GB and won't be soon, so my needs are
different from yours. My point here is that if your database is read
intensive, RAID 0+1 can give better performance than without any
striping or mirroring.
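The capacity trade-off mentioned above is simple arithmetic; a quick
sketch follows (the disk size and count are assumptions chosen to
match the 60GB figure, and the RAID-5 line assumes one disk's worth of
parity overhead).

# Usable capacity under the RAID levels discussed in this summary.
# Assumed: 30 x 2GB drives, single-parity RAID 5. Illustrative only.
DISK_GB = 2.0
N_DISKS = 30
raw = DISK_GB * N_DISKS

print("raw      : %4.0f GB" % raw)
print("RAID 0   : %4.0f GB  (striping only, no redundancy)" % raw)
print("RAID 0+1 : %4.0f GB  (everything mirrored)" % (raw / 2))
print("RAID 5   : %4.0f GB  (one disk's worth of parity)"
      % (DISK_GB * (N_DISKS - 1)))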
I'm not a salesman for Sun, and there are many good array products
out there that you should look into.
If you are dealing with that large an amount of data, I'd think an
array would be worth the investment. Just my $0.02 USD. Good luck
with your decision.
Thanks again to all those who responded
Stuart Burch
(Administrator/developer)
Derwent Information Systems
Email: sburch@derwent.co.uk