Page 1 of 2 1 2
MPI_Init() error
#3777 10/16/04 05:27 PM
PKo (OP)
hi all,

When I start a job with mpirun on HP-UX 11.23, I get an error from the MPI_Init() routine.

When I execute parallel CHARMM like this:

mpirun -np 4 $home/charmm/path out &

it displays this immediately after the shell prompt returns:

MPI Application rank 3 exited before MPI_Init() with status 0

----------------

Do I have to set any environment variables for mpirun? Why does it show this error even though I compiled successfully with the MPI libraries? How can I solve this?

thanks a lot for your comments

regards
praveen.

Re: MPI_Init() error
PKo #3778 10/18/04 09:51 AM
Starting up a parallel job with mpirun should be like:

mpirun -np x /path/to/charmm/executable < input > output

Optionally, you can add "&" at the end of the line to start in background.

If the CHARMM output file contains something like:
 RDTITL > No title read.

***** LEVEL 1 WARNING FROM < RDTITL > *****
***** Title expected.
******************************************
BOMLEV ( 0) IS NOT REACHED. WRNLEV IS 5



NORMAL TERMINATION BY END OF FILE


then mpirun doesn't do a proper redirection of stdin and CHARMM cannot read
the input file. If so, you should try to find out whether mpirun has any
options related to this redirection. There might be options for distributing stdin
to all processes in the parallel job or to only some of them; CHARMM reads
stdin only in the first process (MPI rank 0) and then broadcasts to all other
processes.
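
A minimal sketch of how that symptom could be checked from the shell. The function name stdin_was_lost is hypothetical (it is not part of CHARMM or of any MPI distribution), and it only looks for the two messages quoted above, so an input file that really is empty would trigger it as well.

```shell
# Hypothetical helper, not part of CHARMM or any MPI package.
# Succeeds (exit status 0) if the given CHARMM output file contains both
# the "No title read" warning and the end-of-file termination message,
# i.e. the pattern that suggests stdin never reached CHARMM.
stdin_was_lost() {
    out_file="$1"
    grep -q "No title read" "$out_file" &&
        grep -q "NORMAL TERMINATION BY END OF FILE" "$out_file"
}
```

It could be used as: stdin_was_lost output && echo "stdin was probably not redirected", where output is the file that mpirun wrote to.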

Re: MPI_Init() error
bogdan #3779 10/18/04 10:48 AM
Note that there is also a problem with rewinding stdin under MPI. From parallel.doc (the same applies if you start CHARMM using mpirun):
Running CHARMM on parallel systems

General note for MPI systems.
Most MPI systems do not allow rewind of stdin which means charmm input files
containing "goto" statements would not work if invoked directly
(this example uses MPICH):
~charmm/exec/gnu/charmm -p4wd . -p4pg file < my.inp > my.out [charmm options]

The workaround is simple:
~charmm/exec/gnu/charmm -p4wd . -p4pg file < my.stdin > my.out ZZZ=my.inp [charmm options]

where the file my.stdin just streams to the real inputfile:
* Stream to real file given as ZZZ=filename on commandline. Note that the filename
* cannot consist of a mixture of upper- and lower-case letters.
*
stream @ZZZ
stop
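
The wrapper file can also be generated from the shell. This is only a sketch: the function name write_stream_wrapper and its default file name are assumptions, and the file contents are copied from the parallel.doc excerpt above.

```shell
# Hypothetical helper: write the small CHARMM wrapper input that streams
# to the real input file named by ZZZ=... on the command line.
# $1 is the wrapper file to create (defaults to my.stdin).
write_stream_wrapper() {
    wrapper="${1:-my.stdin}"
    cat > "$wrapper" <<'EOF'
* Stream to real file given as ZZZ=filename on commandline. Note that the filename
* cannot consist of a mixture of upper- and lower-case letters.
*
stream @ZZZ
stop
EOF
}
```

After write_stream_wrapper my.stdin, CHARMM would be started as in the workaround line above, with my.stdin on stdin and ZZZ=my.inp among the arguments.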


Lennart Nilsson
Karolinska Institutet
Stockholm, Sweden
Re: MPI_Init() error
bogdan #3780 10/18/04 12:01 PM
PKo (OP)
hi bogdan and lennart,

Actually, I started the mpirun job the same way you described, but unfortunately I didn't write it correctly in my last message.

I started it like this:

mpirun -np 2 $home/c30b1/exec/hpux/charmm < input.inp > output.out &

and the output file was the same as the one you showed:

RDTITL> No title read.

***** LEVEL 1 WARNING FROM < RDTITL > *****
***** Title expected.
******************************************
BOMLEV ( 0) IS NOT REACHED. WRNLEV IS 5


$$$$$$ New timer profile $$$$$

NORMAL TERMINATION BY END OF FILE

*******************************************************

Is there a problem with my MPI CHARMM compilation? I hope not, because a few months back I compiled it on a 2-processor SGI machine and it worked well with mpirun there, but it does not on HP-UX.

One more question about what Lennart quoted for parallel systems (also seen in parallel.doc):

"-p4wd . -p4pg file"

What are those options? Are they machine specific? Do I have to include them on HP-UX machines as well?

Thanks a lot for your useful comments, they were really helpful.

rgds
praveen.

Re: MPI_Init() error
PKo #3781 10/18/04 12:30 PM
It is not a problem of compiling it, but of running it. Different systems (even
different systems from the same vendor) can behave differently with respect
to how parallel jobs are started. The MPI standard covers only the way
data communication takes place between the processes, not how the processes
are started. So please follow my suggestion of finding out whether the mpirun that
you are using on that system has any command line options related to
stdin redirection.
The mpirun from SGI's IRIX does stdin redirection as expected, but as explained
above this is no indication that the HP-UX mpirun should do the same.

-p4pg and -p4wd are options coming from MPICH. If you are not using MPICH
(which is the case, as you are using the MPI implementation from HP-UX), they are
not valid.
When using MPICH, they are read by the MPI library code; they are related
to the start-up of the processes that form the parallel job. They can be passed
as options to the MPI binary directly, as in the example from the CHARMM doc, or to
mpirun, as in:

mpirun -p4wd . -p4pg groupfile /path/to/charmm < input > output

Re: MPI_Init() error
bogdan #3782 10/19/04 01:55 PM
PKo (OP)
hi bogdan,

thanks for your reply

I tried the stdio option (-stdio=i+) with mpirun.

I issued the command line like this:

mpirun -stdio=i+ -np 2 $home/c30b1/exec/hpux/charmm out &

mpirun did start two processes, but everything was written twice to the output file instead of the job being shared between the 2 processors.

********my output file *****************
---------- --------- --------- --------- --------- ---------
MINI> 20 -82516.94591 80.49844 0.38597 0.00029
MINI INTERN> 3221.96017 2515.51567 0.00000 -3444.94203 0.00000
MINI EXTERN> 10309.77404 -86515.28785 0.00000 0.00000 0.00000
MINI IMAGES> 443.53538 -6740.17467 0.00000 0.00000 0.00000
MINI EWALD> 475.00510-471747.78348 468965.45177 0.00000 0.00000
---------- --------- --------- --------- --------- ---------
MINI> 20 -82516.94591 80.49844 0.38597 0.00029
MINI INTERN> 3221.96017 2515.51567 0.00000 -3444.94203 0.00000
MINI EXTERN> 10309.77404 -86515.28785 0.00000 0.00000 0.00000
MINI IMAGES> 443.53538 -6740.17467 0.00000 0.00000 0.00000
MINI EWALD> 475.00510-471747.78348 468965.45177 0.00000 0.00000
---------- --------- --------- --------- --------- ---------
MINI> 40 -82547.45726 30.51135 0.15295 0.00014
MINI INTERN> 3222.61026 2516.51511 0.00000 -3445.48129 0.00000
MINI EXTERN> 10316.03843 -86550.42539 0.00000 0.00000 0.00000
MINI IMAGES> 444.61800 -6743.25800 0.00000 0.00000 0.00000
MINI EWALD> 474.56940-471747.78348 468965.13969 0.00000 0.00000
---------- --------- --------- --------- --------- ---------
MINI> 40 -82547.45726 30.51135 0.15295 0.00014
MINI INTERN> 3222.61026 2516.51511 0.00000 -3445.48129 0.00000
MINI EXTERN> 10316.03843 -86550.42539 0.00000 0.00000 0.00000
MINI IMAGES> 444.61800 -6743.25800 0.00000 0.00000 0.00000
MINI EWALD> 474.56940-471747.78348 468965.13969 0.00000 0.00000
---------- --------- --------- --------- --------- ---------
MINI> 60 -82568.42259 20.96534 0.63141 0.00039
MINI INTERN> 3227.74315 2517.18678 0.00000 -3444.67665 0.00000
MINI EXTERN> 10324.99714 -86584.41199 0.00000 0.00000 0.00000
MINI IMAGES> 445.77425 -6746.37058 0.00000 0.00000 0.00000
MINI EWALD> 474.22426-471747.78348 468964.89451 0.00000 0.00000
---------- --------- --------- --------- --------- ---------
MINI> 60 -82568.42259 20.96534 0.63141 0.00039
MINI INTERN> 3227.74315 2517.18678 0.00000 -3444.67665 0.00000
MINI EXTERN> 10324.99714 -86584.41199 0.00000 0.00000 0.00000
MINI IMAGES> 445.77425 -6746.37058 0.00000 0.00000 0.00000
MINI EWALD> 474.22426-471747.78348 468964.89451 0.00000 0.00000

*********************************************************

When I run the command below, it seems this was not a parallel job, which agrees with the output file above: the number of processes is shown as 0.

mpijob -u
JOB USER NPROCS PROGNAME
7305 konidala 0
7844 konidala 0

please let me know if you have any suggestions...

thank you

regards
praveen.

Re: MPI_Init() error
PKo #3783 10/19/04 02:59 PM
Is $home/c30b1/exec/hpux/charmm an MPI binary? Having the same lines several
times in the output file is a sign of trying to run a non-parallel binary in
parallel. You should look at the end of the build/hpux/hpux.log file and see
what the last command executed was. If the linking is not done with mpif77 or
the MPI library is not linked (-lmpi is missing), then most probably the
binary was not compiled with MPI. In that case, there will be 2 CHARMM
processes running, both doing exactly the same computations and producing
2 identical results, but this will not provide any speed-up...
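
A rough sketch for spotting such duplicated output without reading the whole file. The function name count_repeated_mini_lines is hypothetical, and the check assumes the output comes from a minimization, so that energy lines start with "MINI>" and every step is printed once in a correct run.

```shell
# Hypothetical helper: print how many distinct "MINI>" energy lines occur
# more than once in a CHARMM output file. A non-zero count suggests that
# several identical serial processes wrote to the same file.
count_repeated_mini_lines() {
    out_file="$1"
    # sort groups identical lines together; uniq -d keeps one copy of each
    # line that appears at least twice; wc -l counts those lines.
    grep 'MINI>' "$out_file" | sort | uniq -d | wc -l | tr -d ' '
}
```

For an output file like the one posted above, count_repeated_mini_lines output.out would print a number greater than 0.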

Re: MPI_Init() error
bogdan #3784 10/19/04 03:14 PM
PKo (OP)
hi bogdan,

I compiled CHARMM with mpif90 since there is no f77 compiler on the machine.

***********************
vibran COMPLETED
mpif90 +U77 +i8 -lmtmpi -ldmpi -o charmm.ex /.....

************************

There was no -lmpi at the end of the linking command.

I checked in /build/hpux/ and there is an mpi directory with libmpi.a, .so, etc. files. I hope it took those library files and produced a parallel CHARMM executable. Am I right?

Is there any way to check whether my CHARMM executable is
an MPI binary or not?

thanks for your reply

regards
praveen.

Re: MPI_Init() error
PKo #3785 10/19/04 03:46 PM
What are the options -lmtmpi -ldmpi, which seem somehow related to MPI, supposed to do?
If you started with a non-MPI CHARMM installation and only afterwards changed things to make an MPI one, without deleting the build/hpux and lib/hpux directories, you most likely have a mix-up that will not work in parallel. One indication is to look for the MPI keyword in build/hpux/pref.dat; if it is not there, you don't have an MPI-enabled binary, even if you compiled it with mpif90.

So, I would suggest starting from scratch: unpack the CHARMM archive, make all the modifications that were made to the current CHARMM tree at once (I hope that you kept track of them...), and then run install.com, which should run without interruption until it successfully produces the CHARMM binary.
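
The pref.dat check mentioned above could be done along these lines. This is a sketch: the function name has_pref_keyword is hypothetical, and it assumes that keywords in pref.dat are separated by whitespace or line breaks. The -w option of grep is used so that a longer keyword that merely contains the letters MPI does not count as a match.

```shell
# Hypothetical helper: succeed (exit status 0) if the given keyword
# appears as a whole word in a CHARMM pref.dat file.
# $1 = keyword to look for (e.g. MPI), $2 = path to pref.dat
has_pref_keyword() {
    keyword="$1"
    pref_file="$2"
    grep -q -w -e "$keyword" "$pref_file"
}
```

For example: has_pref_keyword MPI build/hpux/pref.dat || echo "no MPI keyword, the binary is probably not MPI-enabled".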

Re: MPI_Init() error
bogdan #3786 10/19/04 05:00 PM
PKo (OP)
hi bogdan,

I cleaned everything and started again, editing the CHARMM source tree once and compiling it with the f90 compiler.

What I observed is that I can't find the MPI or PARALLEL keywords in pref.dat, although I compiled with MPI.

Again, after trying many compilation options, I finally managed to produce an executable.
--------------------------------------------------
FC = f90 +DA2.0W +DSitanium2 +Ofenvaccess +fp_exception +FPZ +FPO -dynamic +parallel +extend_source +E4 +noppu +T +U77 +cpp=yes
-------------------------
LD = f90 +DSitanium2 +Ofenvaccess +FPZ +FPO +noppu +U77 +fp_exception +extend_source +parallel -dynamic
-----------------------------------------------

The compilation was successful, but mpirun did not work.

Can I add the MPI or PARALLEL keywords to pref.dat and try to compile again? Will that work? It's really confusing for me now.

thanks for your help

regards
praveen konidala.
