I've posted two input scripts I used recently to read in an enzyme system and the pertussis toxin 1PRT in .pdb (NOT .ent) format; CHARMM isn't programmed (yet) to read the new CIF format, so coords must be in the "classic" .pdb format. I've done this a fair number of times, and have developed a fairly robust (but not automated) procedure which uses a script like the two posted, along with some advance preparation of the .pdb format file(s).

Note that while it may be possible to use an alternate procedure and get a CHARMM PSF and coords, my procedure has the following advantages:

+ the sequence is read directly from the PDB coord file
+ original residue numbering in the PDB file is retained
+ temperature B factors are read into WMAIN

I've included a fair number of comments in these advanced usage examples, which should explain the purpose and rationale for the commands used. A few additional remarks are in order, both because of the extra preparation steps and because residue numbering can create some confusion.

First, some examination and hand editing of the PDB file is needed:

+ change HIS residue name to e.g. HSE
+ create a new file for each subunit, and at ANY gap in residue numbers
+ for multi-model NMR files, create new files for each model
+ examine subunit termini for residue numbering, blocking groups

Optional:
+ change atom name CD1 to CD for residue ILE
+ for std CO2- termini, change atom names for O,OXT to OT1,OT2
+ other edits may be needed for e.g. acetyl on Nterm, amide Cterm

I've indicated the last 3 as optional, because one can use IC BUILD to
place a few missing heavy atoms, e.g. at the termini. For detailed
simulation work, you should probably revisit the issue of HIS
protonation states.
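
The renaming edits listed above can also be done with a short script instead
of by hand. The following is only a rough Python sketch, not part of the posted
CHARMM scripts; the function name fix_names is hypothetical, and it assumes
fixed-column "classic" PDB ATOM records (atom name in columns 13-16, residue
name in 18-20, chain, residue number and insertion code in 22-27). It treats
any residue that carries an OXT atom as a standard CO2- terminus.

```python
def fix_names(lines, his_name="HSE"):
    """Apply CHARMM-style names to PDB ATOM lines (hypothetical helper).

    HIS -> his_name, ILE CD1 -> CD, and O/OXT -> OT1/OT2 on any residue
    that contains an OXT atom (assumed to be a standard CO2- terminus).
    """
    def res_key(line):
        # chain ID + residue number + insertion code (columns 22-27)
        return line[21:27]

    # residues that carry an OXT atom, i.e. the C-terminal residues
    cterm = {res_key(l) for l in lines
             if l.startswith("ATOM") and l[12:16].strip() == "OXT"}

    out = []
    for line in lines:
        if line.startswith("ATOM"):
            name = line[12:16].strip()
            res = line[17:20]
            new_name = name
            if res == "ILE" and name == "CD1":
                new_name = "CD"
            elif res_key(line) in cterm and name == "O":
                new_name = "OT1"
            elif res_key(line) in cterm and name == "OXT":
                new_name = "OT2"
            if new_name != name:
                # names of up to 3 characters start in column 14
                line = line[:12] + " " + new_name.ljust(3) + line[16:]
            if res == "HIS":
                line = line[:17] + his_name + line[20:]
        out.append(line)
    return out
```

Typical use would be to read the file with readlines(), pass the list through
fix_names, and write the result to a new file. The choice of HSE vs. another
HIS form is still yours to make, and the output should be checked by eye.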

A key concept is that anytime residue N+1 does NOT follow residue N, a
new file must be created; the most common examples are gaps in numbering
due to disordered loops, or abrupt changes for the beginning of a new
subunit for more complex proteins.
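
To make the "new file at every break" rule concrete, here is a minimal Python
sketch of how the break points could be located. It is an illustration only,
not the procedure used for the posted scripts: the function name
split_segments is hypothetical, it looks only at ATOM records (so waters and
ligands in HETATM records are ignored), it assumes a single-model file, and it
ignores insertion codes, which would need separate handling.

```python
def split_segments(lines):
    """Group PDB ATOM lines into runs of consecutively numbered residues.

    A new segment starts whenever the chain ID changes or the residue
    number is neither the same as, nor one more than, the previous one.
    """
    segments = []
    prev = None  # (chain, residue number) of the previous ATOM line
    for line in lines:
        if not line.startswith("ATOM"):
            continue
        chain = line[21]
        resseq = int(line[22:26])
        if (prev is None or chain != prev[0]
                or resseq not in (prev[1], prev[1] + 1)):
            segments.append([])  # break found: residue N+1 does not follow N
        segments[-1].append(line)
        prev = (chain, resseq)
    return segments
```

Each returned segment can then be written to its own .pdb file; the number of
segments tells you how many separate files (and COOR READ steps) are needed.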

Another important concept for this is the distinction between RESNO and
RESID within CHARMM; the PDB residue number RESID is treated as an
arbitrary label, while RESNO is the absolute integer residue number
within the PSF regardless of RESID. This way, for a second subunit, the
first RESID may be 1, but RESNO might be e.g. 224, assuming the first
subunit had 223 residues. This allows preserving the identifiers used
by the molecular biologists and crystallographers. However, in order
for this to work, the OFFSET keyword is needed with the COOR READ
command, and the value is chosen such that RESID+OFFSET = RESNO. For
the above example, the OFFSET would be 223 when reading the second
subunit (even if the last RESID of the first subunit is not 223).
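
The OFFSET arithmetic can be checked with a few lines of Python. This is just
the relation RESID+OFFSET = RESNO applied file by file; the function name
coor_read_offsets and the input layout (first RESID and residue count for
each file, in the order they appear in the PSF) are assumptions made for the
illustration, not anything CHARMM provides.

```python
def coor_read_offsets(segments):
    """Return the OFFSET value for each coordinate file.

    segments: list of (first_resid, n_residues) tuples in PSF order.
    RESNO counts from 1 across the whole PSF, so for each file
    OFFSET = (RESNO of its first residue) - (RESID of its first residue).
    """
    offsets = []
    resno_start = 1  # RESNO of the first residue in the current file
    for first_resid, n_residues in segments:
        offsets.append(resno_start - first_resid)
        resno_start += n_residues
    return offsets


# First subunit: 223 residues, RESIDs starting at 1.
# Second subunit: RESIDs start again at 1, so its first RESNO is 224.
print(coor_read_offsets([(1, 223), (1, 100)]))   # [0, 223]
```

Note that only the residue count of the preceding files matters, not their
last RESID: a first subunit of 223 residues numbered from 5 still gives an
OFFSET of 223 for the second subunit.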

To understand this fully, look at the scripts in conjunction with the
enzyme (1IRK) or pertussis toxin .pdb file (1PRT).


Rick Venable
computational chemist