Hi Elaine,

Thanks. I was trying to predict a multimer (the overall structure of several subunit chains) in Colab, using the individual sequences separated by commas.

However, an error occurred and no PDB file was generated. The error message is as follows:

ERROR:colabfold.batch:Could not generate input features af1848: Invalid character in the sequence:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/colabfold/batch.py", line 1357, in run
model_type,
File "<ipython-input-1-d6881d38b934>", line 122, in generate_input_feature_wrapper
(input_features, domain_names) = batch.generate_input_feature_orig(*args, **kw)
File "/usr/local/lib/python3.7/dist-packages/colabfold/batch.py", line 1018, in generate_input_feature
sequence, input_msa, template_features[sequence_index]
File "/usr/local/lib/python3.7/dist-packages/colabfold/batch.py", line 869, in build_monomer_feature
sequence=sequence, description="none", num_res=len(sequence)
File "/usr/local/lib/python3.7/dist-packages/alphafold/data/pipeline.py", line 43, in make_sequence_features
map_unknown_to_x=True)
File "/usr/local/lib/python3.7/dist-packages/alphafold/common/residue_constants.py", line 580, in sequence_to_onehot
raise ValueError(f'Invalid character in the sequence: {aa_type}')
ValueError: Invalid character in the sequence:
INFO:colabfold.batch:Done
Downloading structure predictions to directory Downloads/ChimeraX/AlphaFold
cp: cannot stat '*_relaxed_rank_1_model_*.pdb': No such file or directory
cp: cannot stat '*_unrelaxed_rank_1_model_*_scores.json': No such file or directory

-Dennis

On Wed, 5 Oct 2022 at 23:36, Elaine Meng <meng@cgl.ucsf.edu> wrote:

Hi Dennis,
Your sequence input is wrong - it should contain only the sequences pasted as plain text, with only a comma between them (NOT the ">description" lines, because the input is not supposed to be in FASTA format). How to input sequence(s) is explained in the AlphaFold help page.

<https://rbvi.ucsf.edu/chimerax/docs/user/tools/alphafold.html#predict>

"For predicting a complex (multimer), the sequences of all chains in the complex must be given. The same sequence must be given multiple times if it occurs in multiple copies in the complex. The sequences can be specified either collectively as a model number chosen from the menu of currently open models (e.g. when that model contains multiple chains), or individually within a comma-separated list of UniProt identifiers or pasted-in amino acid sequences."

E.g. something like

ACCCC,ALLPAAAA

I hope this helps,
Elaine
-----
Elaine C. Meng, Ph.D.
UCSF Chimera(X) team
Department of Pharmaceutical Chemistry
University of California, San Francisco

> On Oct 5, 2022, at 4:12 AM, Dennis Poh via ChimeraX-users <chimerax-users@cgl.ucsf.edu> wrote:
>
> Sorry, I couldn't generate the PDB file from the ChimeraX Colab. Is there anything that I may have missed?
>
> -Dennis
>
> On Wed, 5 Oct 2022 at 17:11, Dennis Poh <pohdennis90@gmail.com> wrote:
> Hi
>
> I encountered a problem in multimer prediction with the sequences I use as input.
> It always reports:
> "Missing or invalid "sequences" argument: Sequences argument"
> and " is not a chain specifier, alignment id, UniProt id, or sequence characters"
>
> My input is always in something like FASTA format:
> >seq_id
> ACCCC
>
> >seq_id2
> ALLPAAAA
>
> May I know how I can rectify this?
>
> Thanks!
> - Dennis