Joined: Jun 2012
Posts: 18
Forum Member
I want to run a large number of simulations (approx. 100), each one 100 ns long.

Our research group has a cluster computing facility of approximately 200 cores.

I want to automate the process using a shell script.

How should I do it? Someone told me that varying the ISEED value would be useful.

Last edited by souparnoadhikary; 01/29/14 05:28 PM.

Souparno Adhikary,


Trainee,
Centre of Excellence in Bioinformatics,
Bose Institute,
Kolkata
Joined: Sep 2003
Posts: 8,623
Likes: 24
rmv
With newer versions of CHARMM (which are many times faster!), it should not be necessary to specify ISEED; the default now is to auto-seed based on a time counter.

I would also introduce some random variation in the model building.


Rick Venable
computational chemist

Joined: Dec 2005
Posts: 1,535
Forum Member
  • This question might be better suited for the "Other User Discussion and Questions" forum.
  • Assuming you can use the whole cluster, you probably want to run the lowest number of cores per simulation that keeps the whole cluster busy, in order to minimize the parallel overhead. In your example (200 cores, 100 simulations), that would be 2 cores per simulation (assuming your chemical system parallelizes well on 2 cores).
  • With CHARMM versions older than the ones Rick is talking about (which ones?), giving different ISEED values to different simulations is indeed important if you're simulating the same system every time. People often use random numbers from a source other than CHARMM for that.
  • As for the actual scripting, we can't do that for you. Typically, the script will need to auto-generate job files and submit them to your queuing system.
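For the scripting part, here is a minimal sketch in Python (not a CHARMM tool, just one possible approach) that writes one job file per simulation, uses the core count from the 200 cores / 100 simulations example, and draws a distinct seed for each run from the operating system's random source rather than from CHARMM. Everything site-specific is an assumption: the PBS/Torque `#PBS` directives, the `mpirun` launch line, the hypothetical input name `dyn.inp`, and passing the seed as a `seed=` command-line variable that the input would then use as `iseed @seed`. Check how your CHARMM build accepts command-line variables and adapt the template to your queuing system; with versions that auto-seed, the seed can simply be dropped.

```python
import random
from pathlib import Path

# Hypothetical job-script template. The #PBS lines assume a PBS/Torque queue,
# and "seed=..." assumes the (hypothetical) dyn.inp reads a @seed variable,
# e.g. "dyna ... iseed @seed". Adapt both to your site and CHARMM version.
TEMPLATE = """#!/bin/sh
#PBS -N md{index:03d}
#PBS -l nodes=1:ppn={cores}
cd "$PBS_O_WORKDIR"
mpirun -np {cores} charmm seed={seed} < dyn.inp > md{index:03d}.out
"""


def cores_per_job(total_cores: int, n_sims: int) -> int:
    """Fewest cores per job that still keeps the whole cluster busy."""
    return max(1, total_cores // n_sims)


def write_job_files(n_sims: int, total_cores: int, outdir: str) -> list[Path]:
    """Write one job script per simulation, each with its own random seed."""
    cores = cores_per_job(total_cores, n_sims)
    # Seeds come from the OS entropy pool (not from CHARMM); sample() makes them distinct.
    seeds = random.SystemRandom().sample(range(1, 2**31 - 1), n_sims)
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, seed in enumerate(seeds, start=1):
        path = out / f"md{index:03d}.sh"
        path.write_text(TEMPLATE.format(index=index, cores=cores, seed=seed))
        paths.append(path)
    return paths
```

Calling `write_job_files(100, 200, "jobs")` writes `md001.sh` through `md100.sh`, each requesting 2 cores; submitting them (for example, a loop calling `qsub` on each file) depends on your queue setup.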

Last edited by Kenno; 01/29/14 06:09 PM. Reason: added "with older CHARMM versions..."
Joined: Sep 2003
Posts: 8,623
Likes: 24
rmv
The default random number generator was changed with c36b1, the first public Fortran 95 release; I think auto-seeding was introduced then as well, but I'd have to do some digging or test runs to verify that. It would be easy enough to test: start two simulations, and check the output log for the seed info printed during dynamics startup. Releases c35b6 and earlier require specifying ISEED to get more divergent simulations.

The biggest speed increase came with c37b1, with the incorporation of a spatial domain decomposition scheme (DOMDEC), similar to what's used in the NAMD, GROMACS, and Desmond/Anton simulation engines. If the systems have more than a few thousand atoms, I might not consider CHARMM for this without using the DOMDEC code, which makes CHARMM performance comparable to NAMD.


Rick Venable
computational chemist

Joined: Sep 2003
Posts: 4,861
Likes: 10
Forum Member
Definitely upgrade. Do not specify the iseed.
For analysis, all CHARMM commands that read trajectories accept the keyword NOCHECK, which allows you to process non-sequential trajectory files (this, too, is only available in recent versions).

And get a couple of GPUs, e.g. a GTX TITAN or GTX 780 Ti; one such card roughly corresponds to your entire cluster, and is perfectly good for vanilla MD simulations.

Last edited by lennart; 01/29/14 07:27 PM. Reason: GPU promotion

Lennart Nilsson
Karolinska Institutet
Stockholm, Sweden
Joined: Sep 2003
Posts: 8,623
Likes: 24
rmv
I'm still not convinced on the value of GPUs; the double precision performance is still not that great, and while mixed precision is much more accurate than single precision, it is not as accurate as full double precision.

While the OpenMM implementation is probably the only publicly available GPU implementation in CHARMM worth considering, it is still somewhat experimental, and supports only a limited subset of the program's capabilities, fewer than DOMDEC currently supports. In my evaluation, I found it to be inadequate for our computing needs.


Rick Venable
computational chemist

Joined: Jun 2012
Posts: 18
Forum Member
We have CHARMM c37b1. My system runs well on 8 cores at a time.

Sorry for posting in the wrong section...

Last edited by souparnoadhikary; 01/30/14 07:30 AM. Reason: forum to section

Souparno Adhikary,


Trainee,
Centre of Excellence in Bioinformatics,
Bose Institute,
Kolkata
Joined: Dec 2005
Posts: 1,535
Forum Member
Yes, your system may run well on 8 cores, but it will still run less than 8 times as fast as on 1 core, and less than 4 times as fast as on 2 cores. The difference is what we call "parallel overhead", and it is pretty much unavoidable for problems that are not embarrassingly parallel, which includes MD. I don't have a good reference discussing the basics of parallel computing, but the Wikipedia article on Amdahl's law should give some clue. If you're interested in a few of the simulations finishing in as little wall-clock time as possible, then yes, run on 8 (or more) cores. However, if you're interested in all 100 simulations finishing as soon as possible, then you will be slightly faster with less parallelization (provided you're still keeping the whole cluster busy).
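To put rough numbers on this, here is a small Python sketch based on Amdahl's law, S(p) = 1 / (f + (1 - f)/p), where f is the fraction of the work that does not parallelize. The serial fraction of 2% is a purely hypothetical value chosen for illustration, and in real MD the communication overhead usually grows with the core count, so read the result as a trend rather than a prediction. It assumes a dedicated 200-core cluster on which jobs run in "waves" of as many simultaneous simulations as fit.

```python
import math


def amdahl_speedup(p: int, serial_fraction: float) -> float:
    """Amdahl's law: speedup on p cores when serial_fraction of the work stays serial."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)


def campaign_time(n_sims: int, total_cores: int, cores_per_sim: int,
                  serial_fraction: float, t_one_core: float = 1.0) -> float:
    """Wall-clock time until all n_sims finish on a dedicated cluster, in units of t_one_core."""
    if not 1 <= cores_per_sim <= total_cores:
        raise ValueError("cores_per_sim must be between 1 and total_cores")
    concurrent = total_cores // cores_per_sim   # simulations running at the same time
    waves = math.ceil(n_sims / concurrent)      # rounds of jobs needed to finish them all
    return waves * t_one_core / amdahl_speedup(cores_per_sim, serial_fraction)


# Hypothetical 2% serial fraction; compare 2 vs 8 cores per simulation.
for p in (2, 8):
    t = campaign_time(n_sims=100, total_cores=200, cores_per_sim=p, serial_fraction=0.02)
    print(f"{p} cores/sim: all 100 runs done after {t:.2f} x the 1-core runtime")
```

Under these assumptions, 2 cores per run finishes all 100 simulations in about 0.51 of a single-core runtime, against about 0.57 for 8 cores per run (25 at a time, in four waves), while each individual 8-core run finishes much sooner.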
All of this is assuming you don't need to share the cluster with anyone else. If you do, then the human factor comes into play. Submitting many long-running calculations makes for less ideal scheduling than a few short-running ones with higher parallelism, so your colleagues might start complaining.

Our group's most important project is force field development, which involves running many small simulations for a short time (as opposed to a few big ones for a long time), so we routinely take advantage of the notion that less parallelism makes for more efficient CPU usage, even in the design of our computer clusters. Nevertheless, we do have standing recommendations to use minimum numbers of cores for certain kinds of jobs on certain clusters, in order to reduce the incidence of people filling the cluster with jobs that run forever while other people get impatient. Avoiding this is very much worth a few % in computing efficiency.

Last edited by Kenno; 01/30/14 04:22 PM. Reason: added last sentence.

Moderated by  BRBrooks, lennart, rmv 
