OP | Forum Member | Joined: Jan 2004 | Posts: 91
hi all,
when i start the mpirun application on hpux 11.23, i get an error from the MPI_Init() routine.
i execute the parallel charmm like this:
mpirun -np 4 $home/charmm/path out &
and immediately after the shell prompt it displays:
MPI Application rank 3 exited before MPI_Init() with status 0
----------------
do i have to set any environment variables for mpirun? why does it show this error even though i compiled successfully with the mpi libraries? how can i solve this?
thanks a lot for your comments
regards praveen.
Forum Member | Joined: Feb 2004 | Posts: 147
Starting up a parallel job with mpirun should look like:
mpirun -np x /path/to/charmm/executable < input > output
Optionally, you can add "&" at the end of the line to start it in the background. If the CHARMM output file contains something like:
RDTITL> No title read.
***** LEVEL 1 WARNING FROM <RDTITL> *****
***** Title expected.
******************************************
BOMLEV ( 0) IS NOT REACHED. WRNLEV IS 5
NORMAL TERMINATION BY END OF FILE
then mpirun does not do a proper redirection of stdin and CHARMM cannot read the input file. If so, you should check whether your mpirun has any options related to this redirection. There may be options for distributing stdin to all processes in the parallel job or only to some of them; CHARMM reads stdin only in the first process (MPI rank 0) and then broadcasts it to all the other processes.
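For illustration, the read-and-broadcast pattern described above looks roughly like this (a minimal Fortran sketch, not CHARMM's actual code; the program and variable names are made up):
--------------------------------------------------
      program readbcast
      implicit none
      include 'mpif.h'
      integer ierr, rank
      character*80 line
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
c     only rank 0 reads a line from stdin...
      if (rank .eq. 0) read(*,'(A)') line
c     ...and the line is then broadcast to all the other ranks
      call MPI_BCAST(line, 80, MPI_CHARACTER, 0, MPI_COMM_WORLD, ierr)
      call MPI_FINALIZE(ierr)
      end
--------------------------------------------------
So only rank 0 needs to see the input file on stdin; if mpirun does not deliver stdin to rank 0, CHARMM reads an empty input, prints the warning above and stops.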
Forum Member | Joined: Sep 2003 | Posts: 4,883 | Likes: 12
Note that there is also a problem with rewinding stdin under MPI. From parallel.doc (the same applies if you start CHARMM using mpirun):

Running CHARMM on parallel systems

General note for MPI systems. Most MPI systems do not allow rewind of stdin, which means charmm input files containing "goto" statements would not work if invoked directly (this example uses MPICH):
~charmm/exec/gnu/charmm -p4wd . -p4pg file < my.inp > my.out [charmm options]
The workaround is simple:
~charmm/exec/gnu/charmm -p4wd . -p4pg file < my.stdin > my.out ZZZ=my.inp [charmm options]
where the file my.stdin just streams to the real input file:
* Stream to real file given as ZZZ=filename on commandline. Note that the filename
* cannot consist of a mixture of upper- and lower-case letters.
*
stream @ZZZ
stop
Lennart Nilsson Karolinska Institutet Stockholm, Sweden
OP | Forum Member | Joined: Jan 2004 | Posts: 91
hi bogdan and lennart,
actually i started the mpirun job the same way as you stated, but unfortunately i didn't write it correctly in my last mail...
i started it like this:
mpirun -np 2 $home/c30b1/exec/hpux/charmm < input.inp > output.out &
and the output file was the same as you described in your mail:
RDTITL> No title read.
***** LEVEL 1 WARNING FROM <RDTITL> *****
***** Title expected.
******************************************
BOMLEV ( 0) IS NOT REACHED. WRNLEV IS 5
$$$$$$ New timer profile $$$$$
NORMAL TERMINATION BY END OF FILE
*******************************************************
is there any problem with my mpi charmm compilation?? i hope not, because a few months back i compiled it on a 2-processor SGI machine and it worked well with mpirun, but not on HPUX.
one more question about what lennart said regarding parallel systems (also seen in parallel.doc):
"-p4wd . -p4pg file"
what are those options? are they machine specific? do i have to include them even for HPUX machines?
thanks a lot for your useful comments...it was really helpful...
rgds praveen.
Forum Member | Joined: Feb 2004 | Posts: 147
It is not a problem of compiling it, but of running it. Different systems (even different systems from the same vendor) can behave differently with respect to how parallel jobs are started. The MPI standard covers only the way data communication takes place between the processes, not how the processes are started. So please follow my suggestion and find out whether the mpirun that you are using on that system has any command-line options related to stdin redirection. The mpirun from SGI's IRIX does stdin redirection as expected, but as explained above that is no indication that the HPUX mpirun does the same.
-p4pg and -p4wd are options coming from MPICH. If you are not using MPICH (which is the case here, since you are using the MPI implementation from HPUX), they are not valid. When using MPICH, they are read by the MPI library code; they are related to the start-up of the processes that form the parallel job. They can be passed as options either to the MPI binary directly, as in the example from the CHARMM doc, or to mpirun, as in:
mpirun -p4wd . -p4pg groupfile /path/to/charmm < input > output
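Just for reference (you are not using MPICH, so this does not apply to your HPUX setup): the file given to -p4pg is a "procgroup" file listing where the processes should run. With MPICH's ch_p4 device it looks roughly like this, with the hostnames here being made-up examples:
--------------------------------------------------
local 0
node1 1 /path/to/charmm/executable
node2 1 /path/to/charmm/executable
--------------------------------------------------
The first column is the host, the second is the number of additional processes to start there, the third is the program to run; "local 0" refers to the process already started on the machine where mpirun was invoked.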
OP | Forum Member | Joined: Jan 2004 | Posts: 91
hi bogdan,
thanks for your reply
i tried the stdio option (-stdio=i+) with mpirun
i issued the command line like this:
mpirun -stdio=i+ -np 2 $home/c30b1/exec/hpux/charmm out &
mpirun did start with the two processors, but everything was written twice to the output file instead of the job being shared between the 2 processors.
********my output file *****************
 ---------- --------- --------- --------- --------- ---------
MINI>       20 -82516.94591   80.49844    0.38597    0.00029
MINI INTERN>  3221.96017  2515.51567     0.00000 -3444.94203     0.00000
MINI EXTERN> 10309.77404 -86515.28785    0.00000     0.00000     0.00000
MINI IMAGES>   443.53538 -6740.17467     0.00000     0.00000     0.00000
MINI EWALD>    475.00510-471747.78348 468965.45177   0.00000     0.00000
 ---------- --------- --------- --------- --------- ---------
MINI>       20 -82516.94591   80.49844    0.38597    0.00029
MINI INTERN>  3221.96017  2515.51567     0.00000 -3444.94203     0.00000
MINI EXTERN> 10309.77404 -86515.28785    0.00000     0.00000     0.00000
MINI IMAGES>   443.53538 -6740.17467     0.00000     0.00000     0.00000
MINI EWALD>    475.00510-471747.78348 468965.45177   0.00000     0.00000
 ---------- --------- --------- --------- --------- ---------
MINI>       40 -82547.45726   30.51135    0.15295    0.00014
MINI INTERN>  3222.61026  2516.51511     0.00000 -3445.48129     0.00000
MINI EXTERN> 10316.03843 -86550.42539    0.00000     0.00000     0.00000
MINI IMAGES>   444.61800 -6743.25800     0.00000     0.00000     0.00000
MINI EWALD>    474.56940-471747.78348 468965.13969   0.00000     0.00000
 ---------- --------- --------- --------- --------- ---------
MINI>       40 -82547.45726   30.51135    0.15295    0.00014
MINI INTERN>  3222.61026  2516.51511     0.00000 -3445.48129     0.00000
MINI EXTERN> 10316.03843 -86550.42539    0.00000     0.00000     0.00000
MINI IMAGES>   444.61800 -6743.25800     0.00000     0.00000     0.00000
MINI EWALD>    474.56940-471747.78348 468965.13969   0.00000     0.00000
 ---------- --------- --------- --------- --------- ---------
MINI>       60 -82568.42259   20.96534    0.63141    0.00039
MINI INTERN>  3227.74315  2517.18678     0.00000 -3444.67665     0.00000
MINI EXTERN> 10324.99714 -86584.41199    0.00000     0.00000     0.00000
MINI IMAGES>   445.77425 -6746.37058     0.00000     0.00000     0.00000
MINI EWALD>    474.22426-471747.78348 468964.89451   0.00000     0.00000
 ---------- --------- --------- --------- --------- ---------
MINI>       60 -82568.42259   20.96534    0.63141    0.00039
MINI INTERN>  3227.74315  2517.18678     0.00000 -3444.67665     0.00000
MINI EXTERN> 10324.99714 -86584.41199    0.00000     0.00000     0.00000
MINI IMAGES>   445.77425 -6746.37058     0.00000     0.00000     0.00000
MINI EWALD>    474.22426-471747.78348 468964.89451   0.00000     0.00000
*********************************************************
when i check with the command below, it seems this was not a parallel job, which also agrees with the output file above: the number of processes is shown as 0 here.
mpijob -u
JOB    USER      NPROCS  PROGNAME
7305   konidala  0
7844   konidala  0
please let me know if you have any suggestions...
thank you
regards praveen.
Forum Member | Joined: Feb 2004 | Posts: 147
Is $home/c30b1/exec/hpux/charmm an MPI binary? Having the same lines appear several times in the output file is a sign of trying to run a non-parallel binary in parallel. You should look at the end of the build/hpux/hpux.log file and see what the last command executed was. If the linking is not done with mpif77, or the MPI library is not linked (-lmpi is missing), then most probably the binary was not compiled with MPI. In that case there will be 2 CHARMM processes running, both doing exactly the same computations and producing 2 identical results - but without any speed-up...
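A quick extra check, assuming the standard nm tool is available on the system and the binary is not stripped: look for MPI symbols in the executable itself, e.g.
nm $home/c30b1/exec/hpux/charmm | grep -i mpi_init
If nothing is printed, the executable was almost certainly built without MPI.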
OP | Forum Member | Joined: Jan 2004 | Posts: 91
hi bogdan,
I compiled charmm with mpif90 since there were no f77 compilers on the machine.
***********************
vibran COMPLETED
mpif90 +U77 +i8 -lmtmpi -ldmpi -o charmm.ex /.....
************************
there was no -lmpi at the end of the linking step...
I checked in /build/hpux/ and there was an mpi directory with the libmpi.a or .so etc. files...i hope it took those library files and produced a parallel charmm executable...am i right??
is there any way to check whether my charmm executable is an MPI binary or not?
thanks for your reply
regards praveen.
Forum Member | Joined: Feb 2004 | Posts: 147
What are the options -lmtmpi -ldmpi supposed to do? (They do seem to be related somehow to MPI.) If you started with a non-MPI CHARMM installation and only afterwards changed things to do an MPI one, without deleting the build/hpux and lib/hpux directories, most likely you have a mix-up that will not work in parallel. One indication is to look for the MPI keyword in build/hpux/pref.dat; if it is not there, you do not have an MPI-enabled binary, even if you compiled it with mpif90.
So, I would suggest starting from scratch: unpack the CHARMM archive, make all the modifications to the CHARMM tree at once (I hope that you kept track of them...) and then run install.com, which should run without interruption until it successfully produces the CHARMM binary.
OP | Forum Member | Joined: Jan 2004 | Posts: 91
hi bogdan,
I cleaned everything and started again, editing (once) and compiling the charmm source code with the f90 compilers.
what i observed is that i didn't find the MPI or PARALLEL keywords in pref.dat, although i compiled with MPI.
after trying several compilation options i finally managed to produce an executable:
--------------------------------------------------
FC = f90 +DA2.0W +DSitanium2 +Ofenvaccess +fp_exception +FPZ +FPO -dynamic +parallel +extend_source +E4 +noppu +T +U77 +cpp=yes
-------------------------
LD = f90 +DSitanium2 +Ofenvaccess +FPZ +FPO +noppu +U77 +fp_exception +extend_source +parallel -dynamic
-----------------------------------------------
the compilation was successful but mpirun was not working.
can i include the MPI or PARALLEL keywords in pref.dat and try to compile again? will it work? it's really confusing for me now.
thanks for your help
regards praveen konidala.
Forum Member | Joined: Sep 2003 | Posts: 4,883 | Likes: 12
You have tried many things, and still it seems that you are not getting a parallel MPI executable.
Can you compile and run a small MPI test program? This is a basic step to master when installing CHARMM on a new platform - the same compiler and runtime options should then be used for installing and running CHARMM. Is your mpif90 set up for use with the same compiler as you have been trying to use for the CHARMM installation?
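Something along these lines will do (a minimal sketch; the file and program names are just examples, any small program that calls MPI_Init and prints its rank will serve):
--------------------------------------------------
      program mpitest
      implicit none
      include 'mpif.h'
      integer ierr, rank, nprocs
c     initialize MPI and report which rank this process is
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
      write(*,*) 'hello from rank', rank, 'of', nprocs
      call MPI_FINALIZE(ierr)
      end
--------------------------------------------------
Compile it with the same wrapper you use for CHARMM (e.g. mpif90 -o mpitest mpitest.f) and start it with mpirun -np 2 mpitest; if the two processes do not report different ranks, the problem is with the MPI setup itself, not with CHARMM.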
It would be easier to help if you could post the following from a clean installation:
1/ The command line used for the installation (install.com ....)
2/ The Makefile that was used (ie, the one that was produced from Makefile_gnu)
3/ The last part of the logfile, showing the linking step
You could also try to use the self-contained socket library instead of MPI, just to get going with a parallel CHARMM version:
install.com gnu large S EFC
Lennart Nilsson Karolinska Institutet Stockholm, Sweden
Forum Member | Joined: Feb 2004 | Posts: 147
Quote:
what i observed is that i didn't find the MPI or PARALLEL keywords in pref.dat, although i compiled with MPI
OK. Stupid question: did you modify the install.com file to add support for MPI to the "hpux" architecture? The install.com that I have looked at from c28b1 and c31b1 does not have support for MPI, only for PVM and sockets. I assumed that you had already done this, since you were talking about compiling with MPI, but if that is not the case, here's what you have to do:
In install.com replace the lines:
  case hpux:
    if (! -e prefx_hpux) f77 -o prefx_$$ prefx.f
    echo "HPUX" >! $chmbuild/pref$$.dat
    echo "UNIX" >> $chmbuild/pref$$.dat
    echo "SCALAR" >> $chmbuild/pref$$.dat
    if ( $pvmset == 1 ) echo "PVMC" >> $chmbuild/pref$$.dat
    if ( $socket == 1 ) echo "SOCKET" >> $chmbuild/pref$$.dat
    if ( $pvmset == 1 || $socket == 1 ) then
      echo "CMPI" >> $chmbuild/pref$$.dat
      echo "PARALLEL" >> $chmbuild/pref$$.dat

with

  case hpux:
    if (! -e prefx_hpux) f77 -o prefx_$$ prefx.f
    echo "HPUX" >! $chmbuild/pref$$.dat
    echo "UNIX" >> $chmbuild/pref$$.dat
    echo "SCALAR" >> $chmbuild/pref$$.dat
    if ( $pvmset == 1 ) echo "PVMC" >> $chmbuild/pref$$.dat
    if ( $socket == 1 ) echo "SOCKET" >> $chmbuild/pref$$.dat
    if ( $mpiset == 1 ) echo "MPI" >> $chmbuild/pref$$.dat
    if ( $pvmset == 1 || $socket == 1 || $mpiset == 1 ) then
      echo "CMPI" >> $chmbuild/pref$$.dat
      echo "PARALLEL" >> $chmbuild/pref$$.dat

After you do this, you have to delete build/hpux and lib/hpux.
OP | Forum Member | Joined: Jan 2004 | Posts: 91
hi lennart and bogdan,
happy to tell you that i started the parallel job with mpirun and i think it is working well... i am checking it with my test jobs.
hi bogdan, i had already added mpiset to the hpux case (similar to the gnu case) and also made some changes to the compilers (mpif90)...i am waiting to check the results now..
thanks for the hint regarding pref.dat that you gave me yesterday..it was very useful...you know, editing the source code or dealing with charmm problems as a whole is always difficult, at least at the beginning, for people coming from different backgrounds.
anyway, thanks to everyone who helped me...
rgds praveen.