I am currently in search of network batch software for our network of Sun
workstations. As I am new to this kind of software, I have turned to the net
for information on the available packages.
Freely available are: NQS, Monsanto-NQS, DNQS, DQS, Condor, PVM
Commercial systems are: LSF, IBM Load Leveller, CODINE, HP Task Broker, ...
After a first posting, the answers to which gave me an idea of what software
is available for network batch processing, I sent out additional postings to
ask for people's experience with some specific packages. This posting provides
a summary of the responses to these postings. I include my original news
postings here to provide the readers of this summary with a context.
Posting to comp.parallel, comp.unix.misc, comp.unix.large:
>Some days ago I asked on the net about software that supports splitting a
>large program across a network of SUN workstations. Since I already managed
>to break up the large programs into many small jobs, I was looking for
>software that would enable me to submit all these jobs as a batch job to all
>the machines in the network. That software should automatically deal with
>network problems and machines going down, allow the user to add and delete
>jobs from/to the submit queue, allow scheduling of jobs to certain machines
>and allow monitoring the work in progress.
>I received pointers to the freely available software packages NQS and DQS,
>which seem to be doing what I am looking for. I did an archie search on NQS
>and DQS and found that there seem to be four different programs out there,
>namely NQS, Monsanto-NQS, DNQS, and DQS. The latest versions of these software
>packages that I could find are:
>nqs_2.5.tar.gz retrieved from lune.csc.liv.ac.uk
> in /hpux9/Networking
> file dated Dec 16, 1993
>dqns_1.11.tar.Z retrieved from ftp.physics.mcgill.ca
> in /pub/Dnqs
> file dated May 5, 1993
>Monsanto_NQS.3.36.5.tar.gz retrieved from mrcnext.cso.uiuc.edu
> in /pub/linux/system/Network/distrib/qs/Monsanto-NQS
> file dated Dec 16, 1994
>DQS_3_1_1.tar.gz retrieved from gundel.zdv.uni-mainz.de
> directory /pub/batch/dqs
> file dated Aug 29, 1994
>
>I haven't had the time yet to compare the packages with each other. Any info
>from people who have experience with one or more of these packages is highly
>appreciated. Please send email to norbert@iit.com if you know anything about
>the merits of these programs. I will post a summary of any responses I get to
>comp.unix.misc. If you send email and don't want it to be included in the
>summary, please indicate so. Thanks for your help in advance.
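The requirements in the posting above (submit many small jobs to all machines,
survive hosts going down, keep track of what ran where) can be illustrated
with a minimal sketch. It is not taken from NQS, DQS, or any other package
named here. The names dispatch, HostDown, and run are hypothetical, and run
stands in for whatever mechanism actually starts a job on a remote host.

```python
from collections import deque


class HostDown(Exception):
    """Raised by the (hypothetical) runner when a host cannot be reached."""


def dispatch(jobs, hosts, run, max_attempts=3):
    """Run every job on some host, requeueing a job when its host fails.

    jobs  -- iterable of job names
    hosts -- list of host names
    run   -- callable run(host, job); assumed to raise HostDown on failure
    Returns (done, failed): done maps job -> host, failed lists abandoned jobs.
    """
    queue = deque((job, 0) for job in jobs)
    alive = list(hosts)          # hosts still believed to be up
    done, failed = {}, []
    turn = 0                     # round-robin counter
    while queue:
        job, attempts = queue.popleft()
        if not alive or attempts >= max_attempts:
            failed.append(job)   # nowhere left to run it, or too many retries
            continue
        host = alive[turn % len(alive)]
        turn += 1
        try:
            run(host, job)
            done[job] = host
        except HostDown:
            alive.remove(host)                  # stop using the dead host
            queue.append((job, attempts + 1))   # put the job back in the queue
    return done, failed
```

A real batch system would also have to notice when a host comes back up,
persist the queue across restarts of the scheduler itself, and let users add
or delete queued jobs; none of that is attempted here.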
Posting to comp.sys.sun.misc:
>I am looking for network batch processing software for our network of SUN
>workstations. This will allow us to meet our increased computational needs by
>spreading the load equally over all machines in the network. It seems that the
>freely available software like NQS, DNQS, DQS, Monsanto-NQS is not able to
>meet our need because of lack of robustness (recovering from network trouble
>and host reboot) and lack of features (e.g. no support for software with
>license manager, no adequate way to set up a load profile for every host by
>time/day). I am therefore looking at commercial software, specifically LSF
>(Load Sharing Facility) by Platform Computing of Toronto, Ont. Interestingly
>enough, this software is endorsed and co-marketed by SGI, DEC, and HP, but
>not SUN! IBM has their own product called Load Leveller. To my knowledge, Sun
>neither manufactures, co-markets, nor endorses any network batch processing
>software (why?). I would like to hear from anyone who has some information on
>or experience with network batch processing software on Sun workstations and
>LSF in particular.
>
>Please reply by email, since our newsfeed is flaky at times. I will post a
>summary here with all the email replies I get. Thanks for your help.
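The "load profile for every host by time/day" mentioned in the posting above
could look roughly like the following sketch. The table layout, the host
names desk1 and server1, and the function max_jobs are all assumptions made
for illustration; no package discussed here is known to use this format.

```python
from datetime import datetime

# Hypothetical profile table: for each host, a list of rules
# (weekdays, start_hour, end_hour, max_batch_jobs). Monday is weekday 0.
PROFILES = {
    "desk1": [(range(0, 5), 8, 18, 0)],   # Mon-Fri office hours: no batch jobs
    "server1": [],                        # no restrictions
}
DEFAULT_MAX_JOBS = 2  # limit used when no rule matches


def max_jobs(host, when):
    """Return how many batch jobs `host` may run at datetime `when`."""
    for days, start, end, limit in PROFILES.get(host, []):
        if when.weekday() in days and start <= when.hour < end:
            return limit
    return DEFAULT_MAX_JOBS
```

A scheduler would consult max_jobs(host, datetime.now()) before placing a job,
so that a desktop machine is left alone while its owner is likely to be using
it.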
It seems that in my posting to comp.sys.sun.misc I inexplicably exceeded the
80-column format (this has been corrected in the copy above). For that I
apologize.
So far I have received seven email replies. I'd like to thank Francois P.
Thibaud <thibaud@kether.cgd.ucar.edu>, Jingwen Wang <jwang@sys.toronto.edu>,
Matthias Linke <mlinke@informatik.uni-rostock.de>, Glenn Malling
<gmalling@mailbox.syr.edu>, Kenneth J. Bongort <kbongort@panix.com>, Jim Pruyne
<pruyne@cs.wisc.edu>, and Scott Strecker <strecker@cray.com> for taking the time
to answer my questions. Their email replies are included below and have been
edited only to delete my original postings that were quoted in the replies.
Again, thanks to all respondents.
-- Norbert Juffa (norbert@iit.com)
===== Reply 1 =================================================================
>From thibaud@kether.cgd.ucar.edu Thu Dec 22 14:13:26 1994
Date: Thu, 22 Dec 1994 15:19:00 -0700
From: thibaud@kether.cgd.ucar.edu (Francois P. Thibaud)
To: norbert@iit.com
Subject: Re: Please relate experience with NQS, DNQS, DQS, Monsanto-NQS
X-Organization: University of Maryland at College Park (UMCP) and
X-Organization: The National Center for Atmospheric Research (NCAR)
X-Address: 1850, Table Mesa Drive, PO Box 3000, Boulder CO 80307-3000 USA
X-Phone: (+1)303-497-1707, Fax: (+1)303-497-1700
X-Url: http://www.cgd.ucar.edu:/gds/thibaud/
X-Mailer: sendmail.el (GNU Emacs-19.28 on Sun's Solaris 2.3)
Reply-To: "Francois P. Thibaud" <thibaud@ncar.ucar.edu>
Content-Length: 3091
Hi !
I am currently writing a small paper on that subject (not yet ready;
I'll send it to you if you are interested at mid January or so). I
believe that none of the packages you mentioned will do what you
want, namely "automatically deal with network problems and machines
going down, allow the user to add and delete jobs from/to the submit
queue, allow scheduling of jobs to certain machines and allow
monitoring the work in progress".
In fact, to my knowledge, *NOTHING* exists that would do what you
want !!! I have been working on something to handle errors. PVM 3.3:
@Manual{PVM3,
title = "PVM 3 User's Guide and Reference Manual (PVM-3.3.4)",
author = "Geist, Al and Beguelin, Adam and Dongarra, Jack and Jiang,
Weicheng and Manchek, Robert and Sunderam, Vaidy",
organization = "Oak Ridge National Laboratory",
address = "Oak Ridge, Tennessee 37831",
year = "1994",
month = "September",
note = "ORNL/TM-12187 (at URL:
$http://www.netlib.org/pvm3/index.html$)"
}
offers some partial solutions.
I would be most interested in the feedback you will get from the Net
as well as your experiences in this field !
Kind Regards !
Francois P. Thibaud
Organization: University of Maryland at College Park (UMCP) and
The National Center for Atmospheric Research (NCAR)
Address: 1850, Table Mesa Drive; PO Box 3000; Boulder CO 80307-3000 USA
Phone: (+1)303-497-1707; Fax: (+1)303-497-1700; Room 505, North tower
URL: http://www.cgd.ucar.edu:/gds/thibaud/
===== Reply 2 =================================================================
>From mlinke@informatik.uni-rostock.de Fri Dec 23 01:02:36 1994
Date: Fri, 23 Dec 1994 10:07:19 +0100
From: Matthias Linke <mlinke@informatik.uni-rostock.de>
To: norbert@iit.com
Subject: Re: Please relate experience with NQS, DNQS, DQS, Monsanto-NQS
Newsgroups: comp.unix.misc,comp.unix.large,comp.parallel,comp.parallel.pvm
Organization: University of Rostock, CS Dept. (Germany)
X-Newsreader: TIN [version 1.2 PL2]
Content-Length: 1553
HalliHallo !!
There is a paper comparing the main batch systems like NQS, DQS, CODINE...
You can find it at
techreports.larc.nasa.gov
/pub/techreports/larc/94/tm109025.ps.Z
Enjoy it!
Greetings, Matthias.
Matthias Linke
University of Rostock
Department of Computer Science
Institute of Technical Informatics
18051 Rostock
Germany
Tel Germany-0381-44424-155
email mlinke@informatik.uni-rostock.de
_/_/_/_/ _/_/_/ _/ _/ _/
_/ _/ _/ _/ _/_/ _/
_/_/_/ _/_/_/ _/ _/ _/ _/
_/ _/ _/ _/ _/ _/_/ Fachbereich
\----\ _/ _/_/_/_/ _/ _/ _/ Informatik
\------------------------------------------------------/
\----------- UNIVERSITAET ROSTOCK ------------------/
\--------------------------------------------------/
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
===== Reply 3 =================================================================
>From jwang@sys.toronto.edu Fri Dec 23 10:17:28 1994
From: Jingwen Wang <jwang@sys.toronto.edu>
To: norbert@iit.com
Subject: Re: Please relate experience with NQS, DNQS, DQS, Monsanto-NQS
Newsgroups: comp.unix.misc,comp.unix.large,comp.parallel,comp.parallel.pvm
X-Newsreader: NN version 6.5.0 #6 (NOV)
Date: Fri, 23 Dec 1994 13:22:36 -0500
Content-Length: 2817
I am sure NQS does not quite do what you want. I would suggest you try LSF.
A lot of NQS users have replaced their NQS by LSF.
LSF is from Platform Computing. Send email to info@platform.com for a free
evaluation copy.
If you need information on different packages that do similar things to
LSF, read the following information:
ftp 128.100.2.220:
cd distrib/lsf/doc
binary
get lsf_comp.ps.Z
There are also many other files in this directory that describe LSF.
Jingwen Wang
Computer Systems Research Institute
University of Toronto
6 Kings College Road
Toronto, Ont., Canada M5S 1A1
E-mail: jwang@sys.toronto.edu
Tel: (416)-978-1675
===== Reply 4 =================================================================
>From gmalling@mailbox.syr.edu Tue Dec 27 12:48:35 1994
From: gmalling@mailbox.syr.edu (Glenn Malling)
Date: Tue, 27 Dec 1994 15:54:00 +0500
To: norbert@iit.com
Subject: Re: Network batching software for SUN workstations
Content-Length: 1582
I believe that IBM has a version of Load Leveler for SUN. I could be
wrong but I think I remember seeing a product announcement for it.
Glenn A. Malling <gmalling@MAILBOX.SYR.EDU>
Syracuse University Computing Services +1 (315) 443-4111
220 Machinery Hall Syracuse, New York 13244-1260
===== Reply 5 =================================================================
Author expressed preference for not having his response published on the net.
===== Reply 6 =================================================================
>From pruyne@cs.wisc.edu Wed Dec 28 08:39:18 1994
Date: Wed, 28 Dec 94 10:43:48 -0600
From: pruyne@cs.wisc.edu (Jim Pruyne)
To: norbert@iit.com
Subject: Re: Summary: Splitting large program across network of SUNs
Newsgroups: comp.parallel
X-Newsreader: NN version 6.5.0 #5 (NOV)
Content-Length: 1203
Here's another answer in category two:
You might also be interested in checking out Condor which was developed
here at the University of Wisconsin. It is a batch system along the lines
of NQS/DQS, but has a few extra features. In particular, Condor can do
checkpointing and migration of your jobs which allows it to use spare
cycles from users' desktops while they are away, and gives you some extra
reliability in case of failure. You can get Condor free via ftp from
ftp.cs.wisc.edu.
--- Jim
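The checkpointing idea mentioned in the reply above can be illustrated at the
application level with the sketch below. This is not how Condor works
internally: the reply describes Condor checkpointing and migrating jobs
itself, whereas this sketch has the program save its own progress to a file
so that it can resume after being interrupted. The function name
run_with_checkpoint, the JSON file format, and the stop_after parameter
(which only simulates an interruption) are assumptions made for illustration.

```python
import json
import os


def run_with_checkpoint(n_steps, path, stop_after=None):
    """Sum 0..n_steps-1, saving progress to `path` after every step.

    If `path` already exists, work resumes from the saved state.
    stop_after simulates the job being interrupted after that many steps;
    in that case None is returned and the checkpoint file is left behind.
    """
    state = {"next": 0, "total": 0}
    if os.path.exists(path):
        with open(path) as f:
            state = json.load(f)        # resume from the last checkpoint
    steps_done = 0
    while state["next"] < n_steps:
        if stop_after is not None and steps_done >= stop_after:
            return None                 # simulated interruption
        state["total"] += state["next"]
        state["next"] += 1
        steps_done += 1
        # Write to a temporary file first, then rename, so that a crash
        # during the write cannot leave a half-written checkpoint.
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(state, f)
        os.replace(tmp, path)
    return state["total"]
```

Because the checkpoint is an ordinary file, the interrupted job could be
restarted on a different machine, provided both machines can see the file.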
===== Reply 7 =================================================================
>From strecker@ektar.cray.com Wed Dec 28 09:21:23 1994
From: strecker@ektar.cray.com
Subject: CraySoft NQE News Release (fwd)
To: norbert@iit.com
Date: Wed, 28 Dec 94 10:54:11 CST
X-Mailer: ELM [version 2.3 PL2]
Content-Length: 6424
Norbert,
I picked up your thread on the newsgroup comp.parallel and thought
you might be interested in the following announcement. Hope this is
Scott
Forwarded message:
>
> Media: Mardi Larson, at UniForum March 23 - 25
> (booth #3021)
> After UniForum, 612/683-3538,
> Steve Conway 612-683-7133
>
> Financial: Bill Gacki, 612/683-7372
>
> CRAY RESEARCH ANNOUNCES NEW VERSION OF
> CRAYSOFT'S SOPHISTICATED BATCH PROCESSING
> SOFTWARE
>
> NQE Software Supports More Platforms, Is Basis For
> Cummings Group Network Tools Development
>
> SAN FRANCISCO, Calif., March 23, 1994 -- Cray Research, Inc.
> today announced that its sophisticated batch processing and
> automatic load balancing software for UNIX computer networks
> is scheduled to be available third quarter of this year on a
> broader range of workstations and servers. The Network
> Queuing Environment (NQE) software 1.1, the newest version of
> the distributed batch management product that is developed by
> Cray Research and marketed and sold on other computer
> platforms by the company's CraySoft initiative, is being
> demonstrated here at the UniForum conference and exhibit this
> week.
>
> Cray Research also announced today that it has signed a Value
> Added Reseller (VAR) agreement with The Cummings Group,
> Inc., (TCG), Seattle, Wash., authors of the NQS Exec and
> NCToolset products. As part of the agreement, TCG will use
> the NQE product as the job management component for its suite
> of network tools.
>
> The NQE product today supports SPARC/Solaris-compliant
> systems like the CRAY SUPERSERVER 6400 (CS6400) and Sun
> Microsystems Computer Corp.'s (SMCC) family of products. NQE
> 1.1 software will support these platforms, as well as
> SPARC/SunOS systems, IBM RS6000 systems, SGI systems, HP
> PA-RISC systems, and DEC Alpha systems running OSF/1.
> Distributed batch job demonstrations of the NQE 1.1 software
> are being conducted at UniForum on a variety of systems
> including a Sun SPARC workstation, an IBM RS6000
> workstation, a DEC Alpha workstation, a CS6400 system, and a
> CRAY EL92 system, all in Cray Research's booth (#3021), as
> well as a SPARCcenter 2000 system residing in SMCC's booth.
>
> "Since we began shipping the NQE product earlier this year we
> have seen a lot of interest from both technical and commercial
> customers who recognize this software's potential as a
> powerful new tool for production-quality distributed batch
> management," said Leary Gates, CraySoft program manager.
>
> NQE software automatically distributes a job to the most
> appropriate resource on the network and provides reliable data
> transfer. This has generated interest from users in the
> financial, automotive, and pharmaceutical industries, as well
> as in the government and university marketplace worldwide,
> Gates said.
>
> "These users are interested in NQE software because of its
> sophisticated load balancing capabilities that automatically
> assign jobs to the best available resource on a network --
> whether that is a single workstation, a workstation cluster, a
> server, Superserver system or Cray Research supercomputer.
> This product allows users to share enterprise-wide network
> resources more effectively," Gates said.
>
> He said that a customer does not require a Cray Research
> system to use CraySoft software. CraySoft's NQE software is
> a client/server product that provides reliable server batch
> processing. It is based on the industry-proven Cray Research
> NQS-based software developed and continually enhanced for
> the company's supercomputing systems.
>
> Gates said CraySoft's NQE 1.1 has a list price of $2,875 (U.S.)
> for a 10 user network license, noting that its competitive
> pricing strategy makes it an even more attractive product
> because customers are charged based only on the number of
> concurrent users on the server. Clients can be freely
> distributed on the customer network.
>
> Regarding the agreement between Cray Research and TCG,
> Daniel Cummings, chairman of TCG, said "the decision to
> replace NQS Exec, the batch scheduling subsystem in
> NCtoolset, with the CraySoft NQE software was based on the
> new software's superior functionality and CraySoft's
> commitment to creating standard-based products. CraySoft
> understands our market and customer needs and Cray Research
> is the leader in high-performance technology and understands
> how to move that technology out to the world of network
> computing."
>
> According to Bob Slone, TCG president, CraySoft's NQE will be
> the basis for all future software development related to TCG's
> networking tools aimed at the commercial marketplace. "Many
> commercial organizations are looking for a heterogeneous
> batch processing and network load balancing solution," said
> Slone. "These are complex network issues and we view
> CraySoft's NQE product as the industry's leading solution. We
> are pleased to have this agreement with Cray and to offer to
> our customers its core technology coupled with our value-
> added features."
>
> CraySoft, formed in Oct., 1993, is an initiative aimed at
> bringing Cray Research software -- networking and application
> software, compilers, tools, and libraries -- to more users on a
> variety of computer platforms including workstations, servers
> and PCs. CraySoft's next product is the CraySoft Fortran 90
> programming environment, which is based on Cray Fortran 90
> (CF90), the first full, native implementation of the F90
> standard.
>
> CraySoft products are available through VARs and distributors
> like TCG, Cray Research sales offices worldwide, and Cray
> Research's new high-performance hotline, 1-800-BUY-CRAY,
> as well as e-mail at craysoft@cray.com.
>
> Cray Research creates the most powerful, highest-quality
> computational tools for solving the world's most challenging
> industrial and scientific problems.
>
> # # #
>
>
>
>
> --
>
> -- Conrad Anderson
> Employee Communications
> (612) 683-7338
>
--
=====================================================================
Scott Strecker                         Internet: strecker@cray.com
ASIC Engineering Software Support      Ma Bell : 715.726.4735
Cray Research Inc. Chippewa Falls,WI   FAX     : 715.726.4070
=====================================================================
===== End of email replies received ===========================================