Hi Maik. The bad-complement-strand error was fixed back in May, so just missed the 1.6 release. If you get the 1.7 production release, the complement strand residues will be named correctly. --Eric
On Jan 9, 2024, at 1:39 AM, Engeholm, Maik via ChimeraX-users <chimerax-users@cgl.ucsf.edu> wrote:
Hi Eric,
thanks a lot for your help. That is exactly what I wanted!
By the way, I created the example DNA file in ChimeraX using build start nucleic with type DNA and form B. And yes, the PDB file has these different residue names for the two strands:

SEQRES 1 A 17 DA DG DG DG DA DG DT DA DA DT DC DC DC
SEQRES 2 A 17 DC DT DT DG
SEQRES 1 B 17 C A A G G G G A T T A C T
SEQRES 2 B 17 C C C T
ATOM 1 C4' DA A 1 212.165 192.028 187.380 1.00 0.00 C
ATOM 2 O5' DA A 1 214.218 191.381 188.348 1.00 0.00 O
ATOM 3 O3' DA A 1 210.239 190.586 187.380 1.00 0.00 O
...
Can you reproduce this?
Regards,
Maik

From: Eric Pettersen <pett@cgl.ucsf.edu>
Sent: Monday, January 8, 2024 9:10:05 PM
To: Engeholm, Maik
Cc: chimerax-users@cgl.ucsf.edu
Subject: Re: [chimerax-users] retrieve the sequence of an atom selection
Hi Maik,
The residue names come from the input, so chain B in your input file must have those single-letter codes. The PDB-standard DNA residue names are two letters starting with D, whereas RNA names are one letter, but your one-letter residue names clearly are not RNA since there is a "T" in them. Anyway, whatever created the input file is the culprit here. For getting the sequences cleanly, I would use the Chain objects, which know both the residues and the corresponding sequence. Assuming your selections on the chains are contiguous, this should work:
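from chimerax.core.commands import run

selection = run(session, "select ...")
for chain in selection.chains:
    # Sequence positions whose residue exists in the structure and has a selected atom
    indices = [i for i, r in enumerate(chain.residues) if r and r.atoms.selecteds.any()]
    # Slice the chain's sequence from the first to the last selected position
    print(chain, "".join(chain.characters[min(indices):max(indices)+1]))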
--Eric
Eric Pettersen UCSF Computer Graphics Lab
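The snippet above relies on the selection being contiguous within each chain. As a rough sketch of how a non-contiguous selection might be handled, the following plain-Python helper splits a sequence into its selected stretches. The function name `selected_runs` and the per-position boolean list are assumptions made for illustration and are not part of the ChimeraX API; inside ChimeraX the flags would have to be derived from something like the `r.atoms.selecteds.any()` test used above.

```python
def selected_runs(characters, selected):
    """Return each stretch of consecutive selected positions as its own string.

    characters: the full chain sequence, one letter per position.
    selected:   one boolean per position (hypothetical stand-in for a
                per-residue "has a selected atom" test).
    """
    runs, current = [], []
    for char, is_selected in zip(characters, selected):
        if is_selected:
            current.append(char)
        elif current:
            # An unselected position closes the stretch collected so far
            runs.append("".join(current))
            current = []
    if current:
        runs.append("".join(current))
    return runs


# Stand-in data: chain B from this thread, with positions 1-2 and 5-8 (0-based) selected
chars = "CAAGGGGATTACTCCCT"
flags = [i in (1, 2, 5, 6, 7, 8) for i in range(len(chars))]
print(selected_runs(chars, flags))
```

With a contiguous selection the helper returns a single string, which matches what the min/max slice produces.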
On Jan 6, 2024, at 2:02 PM, Engeholm, Maik via ChimeraX-users <chimerax-users@cgl.ucsf.edu> wrote:
Hi there,
I'm looking for a neat way to retrieve the sequence of an atom selection (in this case part of a double-stranded DNA molecule) from within a python script.
I've been playing around with:
selection = run(session, 'select #1/A:2-4|#1/B:6-9')
for a in selection.atoms.unique_residues:
    print(a)
which gives me:
151 atoms, 169 bonds, 7 residues, 1 model selected
/A DG 2
/A DG 3
/A DG 4
/B G 6
/B G 7
/B A 8
/B T 9

This is a good start, but ideally I'd like to get back a contiguous sequence string for each chain (like (('A', 'GGG'), ('B', 'GGAT'))). Also, I do not really understand why residue names are reported as 'DG' for strand A but 'G' for strand B, which makes extracting a sequence string from the above output quite difficult.
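For illustration, here is a minimal sketch of how the mixed residue names could be normalized into per-chain strings of that shape. It assumes the residues are already available as (chain ID, residue name, residue number) tuples; the helper names `one_letter` and `sequences_by_chain` are hypothetical and not ChimeraX functions. It does not check that the residue numbers are contiguous.

```python
def one_letter(res_name):
    """Map a nucleotide residue name to one letter.

    Two-letter names starting with 'D' (DA, DC, DG, DT) lose the prefix;
    one-letter names are returned unchanged.
    """
    name = res_name.strip().upper()
    if len(name) == 2 and name[0] == "D":
        return name[1]
    return name


def sequences_by_chain(residues):
    """Group (chain_id, res_name, number) tuples into (chain_id, sequence) pairs."""
    by_chain = {}
    # Sort by chain ID, then residue number, so letters are joined in order
    for chain_id, res_name, number in sorted(residues, key=lambda r: (r[0], r[2])):
        by_chain.setdefault(chain_id, []).append(one_letter(res_name))
    return tuple((cid, "".join(letters)) for cid, letters in by_chain.items())


# The seven residues reported by the select command above
residues = [
    ("A", "DG", 2), ("A", "DG", 3), ("A", "DG", 4),
    ("B", "G", 6), ("B", "G", 7), ("B", "A", 8), ("B", "T", 9),
]
print(sequences_by_chain(residues))  # (('A', 'GGG'), ('B', 'GGAT'))
```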
I've also tried:
selection = run(session, 'select #1/A:2-4|#1/B:6-9')
for a in selection.atoms.by_chain:
    print(a[1])
    print(a[2].unique_residues.unique_sequences[0][1])
which gives me:
151 atoms, 169 bonds, 7 residues, 1 model selected
A
AGGGAGTAATCCCCTTG
B
CAAGGGGATTACTCCCT
This is obviously also not what I want since it returns the entire sequence of each chain.
I guess I could take the residue numbers from the first command and use those to slice the strings returned by the second, but that feels a little awkward, and so I was wondering if there was a better, more direct way to achieve this seemingly simple task.
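The slicing idea described here can be sketched in a few lines of plain Python. This is only an illustration under two assumptions: residue numbering in each chain starts at 1 with no gaps, and the selected numbers are contiguous. The function name `slice_sequence` and its `first_number` parameter are hypothetical.

```python
def slice_sequence(full_sequence, numbers, first_number=1):
    """Return the part of full_sequence spanned by the given residue numbers.

    Assumes the residue numbered `first_number` sits at string index 0 and
    that numbering has no gaps or insertion codes.
    """
    lo, hi = min(numbers), max(numbers)
    return full_sequence[lo - first_number : hi - first_number + 1]


# Full chain sequences and selected residue numbers taken from the two outputs above
full = {"A": "AGGGAGTAATCCCCTTG", "B": "CAAGGGGATTACTCCCT"}
picked = {"A": [2, 3, 4], "B": [6, 7, 8, 9]}

result = tuple((cid, slice_sequence(full[cid], picked[cid])) for cid in full)
print(result)  # (('A', 'GGG'), ('B', 'GGAT'))
```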
Your help would be much appreciated.
Cheers,
Maik
_______________________________________________
ChimeraX-users mailing list -- chimerax-users@cgl.ucsf.edu
To unsubscribe send an email to chimerax-users-leave@cgl.ucsf.edu
Archives: https://mail.cgl.ucsf.edu/mailman/archives/list/chimerax-users@cgl.ucsf.edu/