Here is a summary of the responses that I received about the network job
scheduler - Condor.
I have separated my responses into positive and negative sections. The positives won out.
Thanks to the following for your great responses.
Richard Elling, elling@eng.auburn.edu
Frank Northrup, northrup@cps.msu.edu
Tim Priddy, tpriddy@homrun.intel.com
Pete Biggs, pete@physical-chemistry.oxford.ac.uk
Brian Wong, bwong@East.sun.com
Kevin Sheehan, sundev!fletch!kevin@Sun.COM
Joachim Holzfuss, hofu%VANGOGH.TH-DARMSTADT.DE@CUNYVM.CUNY.EDU
Gernot Ullrich, gernot.ullrich@Germany.Sun.COM
First of all, Condor will...
"...distribute a number of jobs over a number of
machines at off hours and monitor the queue of jobs to
dispatch another job when a machine finishes."
ken.baer@East.sun.com
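To make the quoted behaviour concrete, here is a minimal sketch of a queue that hands the next waiting job to whichever machine finishes first. It is a simulation only and is not taken from Condor; the function name `dispatch`, the machine names, and the run times are all hypothetical.

```python
import heapq
from collections import deque

def dispatch(jobs, machines):
    """Simulate handing queued jobs to machines as each one frees up.

    jobs: list of (job_name, run_time) pairs, in queue order.
    machines: non-empty list of machine names, all idle at time 0.
    Returns a list of (job_name, machine, start_time, finish_time).
    """
    queue = deque(jobs)
    # Min-heap of (time the machine becomes free, machine name).
    free_at = [(0, m) for m in machines]
    heapq.heapify(free_at)
    schedule = []
    while queue:
        now, machine = heapq.heappop(free_at)   # next machine to finish
        name, run_time = queue.popleft()        # next waiting job
        finish = now + run_time
        schedule.append((name, machine, now, finish))
        heapq.heappush(free_at, (finish, machine))
    return schedule
```

For example, `dispatch([("a", 3), ("b", 1), ("c", 2)], ["m1", "m2"])` places job "c" on "m2", because "m2" finishes job "b" before "m1" finishes job "a".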
Condor can be obtained from
shorty.cs.wisc.edu 128.105.2.8 Condor
NEGATIVE
********
------------------------------
We looked at Condor, but decided not to install it because:
1. no shared libraries allowed (monster executables)
2. no signal handling allowed (what about FPEs?)
3. requires local disk storage of indeterminate size
(potentially a major limitation given #1 and the
fact that most of our workstations only have 100 Mb
local disks.)
We would like to have some sort of method for running processes
on unused machines that was elegant, easy to administer, transparent
to the programmer, and works. Unfortunately, I don't think condor
is the solution.
That's my 2 cents.
Richard Elling Manager of Network Support
Auburn University Engineering Administration
relling@eng.auburn.edu KB4HB [44.100.0.72] (205)844-2280
-------------------------------
POSITIVE
********
-------------------------------
I've been running Condor for a couple of months now and generally I'm
pleased with it - it seems to do what it claims, and I've never had any
sign of problems with the programs that it is running. It takes a little
work to get programs working under it, especially fortran programs, but it
is certainly possible (it took me about 10 minutes for the first C program,
and about an hour for a fortran prog.). The only slight flakiness is its
interface to X - it needs to communicate with the X system to determine
keyboard or mouse activity, and there have been a couple of instances of the
condor system crashing when X either starts up or closes down (but I must
say that even though condor goes down on one machine, the program it was
running just gets put back in the pool, and is executed from the last check-
point - in that respect it is a very robust system). The author is aware of
this, and rumour has it that a new version is due sometime - but I don't
know when.
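The restart-from-checkpoint behaviour described above can be illustrated with a small sketch. Note the assumptions: this is an application-level checkpoint written by the program itself, whereas the reply describes Condor doing it on the program's behalf; the function name, the JSON file format, and the `stop_after` parameter (which simulates the job being evicted) are hypothetical.

```python
import json
import os

def run_with_checkpoints(total_steps, path, every=100, stop_after=None):
    """Sum the integers 0..total_steps-1, saving progress to `path`.

    If `path` already exists, resume from the saved state instead of
    starting over. `stop_after` simulates the job being evicted after
    that many steps in this run (for illustration only).
    """
    state = {"step": 0, "total": 0}
    if os.path.exists(path):
        with open(path) as f:
            state = json.load(f)
    done_this_run = 0
    while state["step"] < total_steps:
        if stop_after is not None and done_this_run >= stop_after:
            return None                      # evicted before finishing
        state["total"] += state["step"]
        state["step"] += 1
        done_this_run += 1
        if state["step"] % every == 0:
            # Write to a temporary file first so a crash mid-write
            # cannot leave a corrupt checkpoint behind.
            tmp = path + ".tmp"
            with open(tmp, "w") as f:
                json.dump(state, f)
            os.replace(tmp, path)
    return state["total"]
```

A run that is interrupted returns `None`; calling the function again with the same `path` picks up at the last saved step, so only the work done since that checkpoint is repeated.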
Overall I believe that Condor is a definite plus on our system - it is
excellent for compute-intensive jobs, so much so that Condor has increased
the throughput of large jobs on our systems to a level where people prefer
to use it over the vastly overworked Cray we have access to - a job may take
10 times as long to run, but at least it gets run!
I hope this is of some use to you,
Pete
Pete Biggs pete @ uk.ac.ox.physchem
System Manager
Physical Chemistry Lab
Oxford University phone: +44 865 275490
South Parks Rd fax: +44 865 275410
Oxford OX1 3QZ
UK
--------------------------------
Well, not a condor user, but I've worked with ISIS quite a bit - not
at the higher level of scheduling facilities, but I recommend it highly
if you're doing network applications.
l & h,
kev
Kevin Sheehan
synergy!kevin@Sun.COM
--------------------------------
I've been running condor for some time now, and found it a
very good thing for our number crunching society here.
Some drawbacks are however:
1) condor_scheduler (or some name like that) takes about 1.5 hours
per day of cpu time (approx)
2) the jobs run by condor are not ``niceable'' to say 15 or so,
This is really bad, because we still want other batchqueues
on the system(s). We removed the keyboard idle feature in the
config file and put the cpu idle load value up to 1.5, so it runs jobs
above load 1.5. But because condor runs on nice 0 it eats up
all of the cpu time otherwise needed by other batch jobs on nice 15.
My solution was to renice the condor_startd to 15, not quite
elegant though.
3) I really don't know if condor runs on or will be running on
SunOS 4.1, which we will be running soon.
4) There hasn't been an upgrade to condor for 2 years or so, to my
knowledge. My version is 4.0.0. So I wonder if someone will port
it to other OS versions.
5) A limitation is that you can't open files; all you can do
is write (*,*) in Fortran. You have to strengthen your
convincing power for the users.
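The renice workaround in item 2 relies on the fact that, on Unix, a child process inherits its parent's nice value, so lowering the priority of the daemon that starts the jobs also lowers the priority of those jobs. A minimal sketch of the renice step, assuming a Unix system and Python's standard `os.setpriority`; the helper name `renice_up` is hypothetical and nothing here is part of Condor:

```python
import os

def renice_up(pid, niceness=15):
    """Raise the nice value of `pid` to at least `niceness` (Unix only).

    This only ever lowers priority, which an unprivileged user may do
    for their own processes. A pid of 0 means the calling process.
    Returns the nice value in effect afterwards.
    """
    current = os.getpriority(os.PRIO_PROCESS, pid)
    if current < niceness:
        os.setpriority(os.PRIO_PROCESS, pid, niceness)
    return os.getpriority(os.PRIO_PROCESS, pid)
```

Calling `renice_up(pid)` with the pid of the job-starting daemon would have roughly the same effect as running `renice` on it by hand, for jobs started after the call.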
Dr. J. Holzfuss bitnet: xphyhofu@ddathd21.bitnet ==
== IAP, TH Darmstadt internet: hofu@gauguin.th-darmstadt.de ==
== Schlossgartenstr.11 voice: +/49-6151-162884 ==
== 6100 Darmstadt, FRG
------------------------------------