SUMMARY: HPC 3.1 and MPI

From: Mr Rene Occelli <rene_at_iusti.univ-mrs.fr>
Date: Tue Jul 17 2001 - 04:48:58 EDT
Hi,

I received no responses to this problem, but after some manipulation
everything is OK.

Original:

After installing Sun HPC 3.1 on two machines (an E4000 and an E4500) under
Solaris 8, I have some trouble running simple MPI jobs.
I created two partitions named un and deux, with one node in each:
Leila# mpinfo -N
NAME         UP PARTITION   OS       OSREL NCPU   FMEM   FSWP LOAD1 LOAD5 LOAD15
Leila         y deux        SunOS      5.8    6 663.42   2521  0.00  0.08  2.04
Lisa          y un          SunOS      5.8   14  12024  14344 11.00 11.00 11.02

There is no problem launching simple Unix commands. The problem appears on
the master with a simple MPI job. The job passes on partition un:

Leila% mprun -p un  -np 3  connectivity
Connectivity test on 3 processes PASSED.

But it fails on partition deux, which contains node Leila:

Leila% mprun -p deux -np 3 connectivity
connectivity: TMRTE_vna_init: Node object not found: Internal error
[unknown MPI_COMM_WORLD 0] ERROR in MPI_Init: unclassified error: RTE_Init_tables: Node object not found: Internal error
Fatal error, aborting.
connectivity: TMRTE_vna_init: Node object not found: Internal error
connectivity: TMRTE_vna_init: Node object not found: Internal error[unknown MPI_COMM_WORLD 1] ERROR
 in MPI_Init: unclassified error: RTE_Init_tables: Node object not found: Internal error
Fatal error, aborting.
[unknown MPI_COMM_WORLD 2] ERROR in MPI_Init: unclassified error: RTE_Init_tables: Node object not found: Internal error
Fatal error, aborting.
Signaled.
Leila% 

Solution:

After reading the FAQ and discussing with the hotline, I chose to uninstall
all the software on each node and reinstall it. Same problem.
In fact the install is done via a GUI tool (grrr!!) which is buggy, and
the procedure is heavy and not simple.
Running find / -name '*hpc*' -ls shows that some files were not removed
even after the uninstallation. This was the case for the /var/hpc directory,
which contains the cluster database (not a readable file, grrr!!!).
After removing this directory and redoing a clean install, all is OK.
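
For anyone repeating this cleanup, here is a minimal sketch of the two steps
above as shell functions. The function names list_hpc_leftovers and
remove_hpc_db are my own hypothetical helpers, not part of Sun HPC. The only
facts taken from this post are the find pattern and the /var/hpc path. Both
functions take an optional path so they can be tried on a scratch directory
first.

```shell
#!/bin/sh
# Sketch only. Both helper names are hypothetical, not Sun HPC tools.

# List everything under a root (default /) whose name contains "hpc".
list_hpc_leftovers() {
    root="${1:-/}"
    # 2>/dev/null hides "permission denied" noise from unreadable directories
    find "$root" -name '*hpc*' -print 2>/dev/null
}

# Remove the leftover cluster database directory if it is still present.
# The default /var/hpc is the location reported in this post.
remove_hpc_db() {
    dir="${1:-/var/hpc}"
    if [ -d "$dir" ]; then
        rm -rf "$dir"
    fi
}
```

After uninstalling, run list_hpc_leftovers as root and review the output
by hand, since the pattern can also match unrelated files. Then run
remove_hpc_db before the clean reinstall.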

Conclusion: if you are going to install HPC 3.1, wait (or ask) for the 4.0
version.
Bye

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+       Rene OCCELLI                                            +
+       I.U.S.T.I. C.N.R.S. U.M.R.  6595                        +
+       Technopole de Chateau Gombert                           +
+       5 Rue Enrico FERMI                                      +
+       13453 MARSEILLE Cedex  13 France                        +
+       Tel: (33)04 91 10 69 37     04 91 10 69 38              +
+       Fax: (33)04 91 10 69 69                                 +
+       Email: rene@iusti.univ-mrs.fr                           +
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Received on Tue Jul 17 09:48:58 2001
