Hi Maik.
The bad-complement-strand error was fixed back in May, so the fix just missed the 1.6 release.  If you get the 1.7 production release, the complement strand residues will be named correctly.


Hi Eric,

thanks a lot for your help. That is exactly what I wanted!

By the way, I created the example DNA file in ChimeraX using build start nucleic type DNA form B. And yes, the two strands do have different residue names in the PDB file:

SEQRES   1 A   17  DA  DG  DG  DG  DA  DG  DT  DA  DA  DT  DC  DC  DC
SEQRES   2 A   17  DC  DT  DT  DG
SEQRES   1 B   17  C   A   A   G   G   G   G   A   T   T   A   C   T
SEQRES   2 B   17  C   C   C   T
ATOM      1  C4' DA  A   1     212.165 192.028 187.380  1.00  0.00           C
ATOM      2  O5' DA  A   1     214.218 191.381 188.348  1.00  0.00           O
ATOM      3  O3' DA  A   1     210.239 190.586 187.380  1.00  0.00           O

Can you reproduce this?



Hi Maik,
The residue names come from the input, so chain B in your input file must have those single-letter codes.  The PDB-standard DNA residue names are two letters starting with D, whereas RNA names are one letter, but your one-letter residue names clearly are not RNA since there is a "T" among them.  Anyway, whatever created the input file is the culprit here.
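To illustrate the naming convention just described, here is a minimal sketch in plain Python that reduces either style of residue name to a one-letter code. The helper name one_letter_code is hypothetical and not part of ChimeraX, and it only handles the unmodified bases A, C, G, T and U.

```python
def one_letter_code(residue_name):
    """Return a one-letter base code for a nucleotide residue name.

    Hypothetical helper for illustration; it is not a ChimeraX function.
    """
    name = residue_name.strip().upper()
    # PDB-standard DNA names: 'D' followed by the base letter (DA, DC, DG, DT).
    if len(name) == 2 and name[0] == "D" and name[1] in "ACGT":
        return name[1]
    # One-letter names: RNA-style A, C, G, U, plus the nonstandard 'T'
    # that appears in chain B of the file discussed in this thread.
    if len(name) == 1 and name in "ACGTU":
        return name
    raise ValueError(f"unrecognized nucleotide residue name: {residue_name!r}")


# Residue names as reported for the selection #1/A:2-4 and #1/B:6-9.
names = ["DG", "DG", "DG", "G", "G", "A", "T"]
print("".join(one_letter_code(n) for n in names))
```

For the seven residue names above this prints GGGGGAT. It does not separate the chains, which is why working from the Chain objects is the cleaner route.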
For getting the sequences cleanly, I would use the Chain objects, which know both the residues and corresponding sequence.  Assuming your selections on the chains are contiguous, this should work:

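from chimerax.core.commands import run

selection = run(session, "select ...")
for chain in selection.chains:
    # Positions along the chain whose residue exists and has selected atoms
    indices = [i for i, r in enumerate(chain.residues) if r and r.atoms.selecteds.any()]
    # Print the sequence spanning the first through last selected position
    print(chain, "".join(chain.characters[min(indices):max(indices)+1]))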
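
The slicing step in that loop can be tried outside ChimeraX. The following is a rough sketch in plain Python, assuming the chain's sequence is available as a string and the selection as one boolean flag per position. The function name selected_span and the flag list are stand-ins made up for illustration, not ChimeraX API.

```python
def selected_span(characters, selected_flags):
    """Return the sequence from the first to the last selected position.

    characters: the chain's full one-letter sequence.
    selected_flags: one boolean per position, True where the residue is selected.
    Both arguments are illustrative stand-ins for the chain data.
    """
    indices = [i for i, sel in enumerate(selected_flags) if sel]
    if not indices:
        # Nothing selected in this chain; avoid calling min() on an empty list.
        return ""
    return "".join(characters[min(indices):max(indices) + 1])


# Chain B sequence as listed in the SEQRES records earlier in this thread.
chain_b = "CAAGGGGATTACTCCCT"
# Mark residues 6-9 as selected, assuming numbering starts at 1 with no gaps.
flags = [6 <= number <= 9 for number in range(1, len(chain_b) + 1)]
print(selected_span(chain_b, flags))
```

With residues 6-9 flagged this prints GGAT. Note that taking the min-to-max span means a non-contiguous selection would also return the unselected residues in between, hence the contiguity assumption stated above.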


Eric Pettersen
UCSF Computer Graphics Lab

Hi there,

I'm looking for a neat way to retrieve the sequence of an atom selection (in this case part of a double-stranded DNA molecule) from within a python script.

I've been playing around with:

selection = run(session, 'select #1/A:2-4|#1/B:6-9')
for a in selection.atoms.unique_residues:
    print(a)
which gives me:

151 atoms, 169 bonds, 7 residues, 1 model selected
/A DG 2
/A DG 3
/A DG 4
/B G 6
/B G 7
/B A 8
/B T 9

This is a good start, but ideally I'd like to get back a contiguous sequence string for each chain (like (('A', 'GGG'), ('B', 'GGAT'))). Also, I do not really understand why residue names are reported as 'DG' for strand A but 'G' for strand B, which makes extracting a sequence string from the above output quite difficult.

I've also tried:

selection = run(session, 'select #1/A:2-4|#1/B:6-9')
for a in selection.atoms.by_chain:
which gives me:

151 atoms, 169 bonds, 7 residues, 1 model selected

This is obviously also not what I want since it returns the entire sequence of each chain.

I guess I could take the residue numbers from the first command and use those to slice the strings returned by the second, but that feels a little awkward, and so I was wondering if there was a better, more direct way to achieve this seemingly simple task.

Your help would be much appreciated.



