Parallel CHARMM
#33414 02/10/14 04:10 PM
Joined: Jan 2013
Posts: 6
sats Offline OP
Forum Member
Dear CHARMM users,

Our group recently got a new cluster of 11 nodes, each with 16 processors (176 processors in total). I installed CHARMM c36b2 and charmm_gamess using the mpif90 from MVAPICH2. I tried running CHARMM on different numbers of nodes: it runs perfectly fine on 1, 2, and 4 nodes, but it does not run on 3 nodes or on more than 4 nodes.

It gives the following error in the CHARMM output file:

Parameter: STREAM <- "EQL4.INF"
1
Chemistry at HARvard Macromolecular Mechanics
(CHARMM) - Developmental Version 36b2 February 15, 2012
Copyright(c) 1984-2001 President and Fellows of Harvard College
All Rights Reserved
Current operating system: Linux-2.6.32-358.el6.x86_64(x86_64)@node11.i[+ 47]
Created on 2/11/14 at 5:28:35 by user: root

Maximum number of ATOMS: 360720, and RESidues: 120240
RDTITL> * CHARMM TEST STREAM FILE FOR PARALLEL RUNS
RDTITL> * WHERE REWINDING STDIN IS NOT POSSIBLE
RDTITL> *

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 11
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================

The PBS error file contains the following:

[node9.iiserpune.ac.in:mpi_rank_32][error_sighandler] Caught error: Segmentation fault (signal 11)
[proxy:0:0@node11.iiserpune.ac.in] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:913): assert (!closed) failed
[proxy:0:0@node11.iiserpune.ac.in] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0@node11.iiserpune.ac.in] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
[proxy:0:1@node10.iiserpune.ac.in] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:913): assert (!closed) failed
[proxy:0:1@node10.iiserpune.ac.in] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:1@node10.iiserpune.ac.in] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
[mpiexec@node11.iiserpune.ac.in] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
[mpiexec@node11.iiserpune.ac.in] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpiexec@node11.iiserpune.ac.in] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:217): launcher returned error waiting for completion
[mpiexec@node11.iiserpune.ac.in] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion


I also tried charmm_gamess with 16 replicas; it gives the same problem.

However, it runs fine when I use 8 nodes with 8 processors from each node.

Can anybody help?


sats

Re: Parallel CHARMM
tjaart #33415 02/10/14 06:11 PM
Joined: Sep 2003
Posts: 8,498
rmv Online Content
Forum Member
Unfortunately, those MPI messages don't provide any information about what caused the failure; as it appears to happen right after the title was printed in the CHARMM log, that does suggest an error during parallel setup. You may need to enable more detailed message reporting with CHARMM.

Until quite recently, calculations using PM Ewald were required to use a total number of cores that was a power of 2; 16, 32, and 64 should be okay, but not 48 cores. I believe that may have changed with c37b1, the next release.
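
As a rough illustration of the power-of-two rule described above, the sketch below lists which full-node layouts on an 11-node, 16-core-per-node cluster give a power-of-two core count. It is not CHARMM's own check; the function names `is_power_of_two` and `power_of_two_layouts` are made up for this example, and it only tests the core-count rule, not any other cause of failure.

```python
from typing import List, Tuple


def is_power_of_two(n: int) -> bool:
    """Return True if n is a positive power of two (1, 2, 4, 8, ...)."""
    return n > 0 and (n & (n - 1)) == 0


def power_of_two_layouts(max_nodes: int, cores_per_node: int) -> List[Tuple[int, int]]:
    """List (nodes, total_cores) pairs whose total core count is a power of two.

    Assumes every node in the job contributes the same number of cores.
    """
    layouts = []
    for nodes in range(1, max_nodes + 1):
        total = nodes * cores_per_node
        if is_power_of_two(total):
            layouts.append((nodes, total))
    return layouts


if __name__ == "__main__":
    # Cluster size taken from the original post: 11 nodes, 16 cores each.
    for nodes, total in power_of_two_layouts(11, 16):
        print(f"{nodes} node(s) x 16 cores = {total} cores")
    # The reported working case of 8 nodes x 8 cores is also a power of two.
    print("8 x 8 =", 8 * 8, "->", is_power_of_two(8 * 8))
    # The failing 3-node case gives 48 cores, which is not.
    print("3 x 16 =", 3 * 16, "->", is_power_of_two(3 * 16))
```

Under this rule, the 1-, 2-, and 4-node runs (16, 32, and 64 cores) and the 8 x 8 run (64 cores) qualify, while the 3-node run (48 cores) does not, which matches the pattern reported in the first post.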

Note that there is a separate forum for issues with QM methods in CHARMM; it's not clear the symptom with GAMESS incorporated is from the same cause.

This should have been a New Topic, instead of a reply to an ancient post; it has been moved.


Last edited by rmv; 02/10/14 06:14 PM. Reason: New Topic!

Rick Venable
computational chemist

