The posts that follow describe a system for running MD in discrete chunks, producing a series of sequentially numbered files. The csh script, dyn.csh, has been used extensively on Unix systems and Linux clusters for years, and some of its concepts date back to VAX DCL scripts for running CHARMM. This particular example is for a Linux cluster with a PBS queueing system and a cover script for running CHARMM that handles the runtime setup, especially for MPI-based executables. The output files all have generic names initially; they are numbered only if dyn.csh determines, based on a few simple tests, that the run was successful.
The numbering is controlled by a file named next.seqno, which dyn.csh creates and then updates after each successful run. In the absence of this file, dyn.csh assumes a new simulation is being started: it runs the MD startup script (dynstrt.inp), uses sequence number 1 for numbering, and puts the number 2 into next.seqno. When next.seqno is present, the MD continuation script (dyn.inp) is run instead, extending the dynamics. The optional last.seqno file can be used to set a controlled stopping point.
Another key step after a successful run is copying dyn.res to dyn.rea, the restart file read by the next run.
For a failed run, dyn.out is renamed to dyn.err.D.H, where D and H are the date and time in numeric form.
The dyn.csh script is set up to run 5 repetitions and then attempts to resubmit itself: it uses ssh to connect to the cluster head node and runs a small csh script there that invokes qsub. That script, lobos.com in this case, contains:
#!/bin/csh
qsub -N $cwd:t -l nodes=1:ctown:ppn=8 dyn.csh
The same script is used to start the chain. If ssh (or rsh) to the head node doesn't work at your site, you can increase the number of repetitions in dyn.csh and resubmit manually. The '$cwd:t' construct (the tail, i.e. last component, of the current directory path) uses the current directory name as the PBS job name.
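The scripting system assumes a separate subdirectory for each simulation, since the generic file names (dyn.out, dyn.res, next.seqno, and so on) would otherwise collide.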
The posts that follow contain three files:
- dynstrt.inp: MD startup, run if next.seqno does not exist
- dyn.inp: continue MD; this file is run repeatedly
- dyn.csh: the csh script that runs CHARMM and evaluates success