trRosetta

  Available multiple sequence alignment formats

The trRosetta server now accepts a multiple sequence alignment (MSA) as input. The MSA can be supplied in any of the following formats:

  A3M format

The A3M format is a variant of aligned FASTA in which inserts are written as lowercase characters, matches as uppercase characters, deletions as '-', and gaps aligned to inserts as '.'. Gaps aligned to inserts may be omitted in the A3M format.

In the standard A3M format, each sequence is preceded by a header line that starts with '>'. For example:

>example
ETESMKTVRIREKIKKFLGDRPRNTAEILEHINSTMRHGTTSQQLGNVLSKDKDIVKVGYIKRSGILSGGYDICEWATRNWVAEHCPEWTE
>1
----MRTTRLRQKIKKFLNERGeANTTEILEHVNSTMRHGTTPQQLGNVLSKDKDILKVATTKRGGALSGRYEICVWTLRP-----------
>2
----MDSQNLRDLIRNYLSERPRNTIEISAWLASQMDPNSCPEDVTNILEADESIVRIGTVRKSGMRLTDLPISEWASSSWVRRHE-----
>3
----MNSQNLRELIRNYLSERPRNTIEISTWLSSQIDPTNSPVDITSILEADDQIVRIGTVRKSGMRRSESPVSEWASNTWVKHHE-----
>4
--RDMDTEKVREIVRNYISERPRNTAEIAAWLNRH-DDGTGGSDVAAILESDGSFVRIGTVRTSGMTGNSPPLSEWATEKWIQHHER----
>5
-----RTRRLREAVLVFLEEKGnANTVEVFDYLNERFRWGATMNQVGNILAKDTRFAKVGHQ-RGQFRGSVYTVCVWALS------------
>6
-----RTKRLREAVRVYLAENGrSHTVDIFDHLNDRFSWGATMNQVGNILAKDNRFEKVGHVRD-FFRGARYTVCVWDLAS-----------


The '>' header lines can be omitted. In that case, each sequence must be written on a single line. For example:

ETESMKTVRIREKIKKFLGDRPRNTAEILEHINSTMRHGTTSQQLGNVLSKDKDIVKVGYIKRSGILSGGYDICEWATRNWVAEHCPEWTE
----MRTTRLRQKIKKFLNERGeANTTEILEHVNSTMRHGTTPQQLGNVLSKDKDILKVATTKRGGALSGRYEICVWTLRP-----------
----MDSQNLRDLIRNYLSERPRNTIEISAWLASQMDPNSCPEDVTNILEADESIVRIGTVRKSGMRLTDLPISEWASSSWVRRHE-----
----MNSQNLRELIRNYLSERPRNTIEISTWLSSQIDPTNSPVDITSILEADDQIVRIGTVRKSGMRRSESPVSEWASNTWVKHHE-----
--RDMDTEKVREIVRNYISERPRNTAEIAAWLNRH-DDGTGGSDVAAILESDGSFVRIGTVRTSGMTGNSPPLSEWATEKWIQHHER----
-----RTRRLREAVLVFLEEKGnANTVEVFDYLNERFRWGATMNQVGNILAKDTRFAKVGHQ-RGQFRGSVYTVCVWALS------------
-----RTKRLREAVRVYLAENGrSHTVDIFDHLNDRFSWGATMNQVGNILAKDNRFEKVGHVRD-FFRGARYTVCVWDLAS-----------
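
As an illustration of the two A3M layouts above, the following Python sketch reads A3M text with or without '>' header lines and removes insert columns so that every row has the same length. It is not the server's own parser; the function names `read_a3m` and `match_columns` are hypothetical, and naming headerless sequences by their index is an assumption made for this example.

```python
import string

# Deletes lowercase letters (inserts) and '.' (gaps aligned to inserts).
_DROP_INSERTS = str.maketrans("", "", string.ascii_lowercase + ".")


def read_a3m(text):
    """Return (names, rows) from A3M text, with or without '>' header lines."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    has_headers = any(ln.startswith(">") for ln in lines)
    names, rows = [], []
    for ln in lines:
        if ln.startswith(">"):
            # A header line opens a new record.
            names.append(ln[1:].strip())
            rows.append("")
        elif has_headers:
            if not rows:
                raise ValueError("sequence data found before the first '>' header")
            # With headers, a sequence may span several lines.
            rows[-1] += ln
        else:
            # Without headers, each line is one complete sequence.
            names.append(str(len(rows)))
            rows.append(ln)
    return names, rows


def match_columns(row):
    """Keep only match and deletion columns of one A3M row."""
    return row.translate(_DROP_INSERTS)
```

After `match_columns` is applied, all rows of a valid A3M alignment should have the same length as the query sequence, which gives a simple sanity check before submitting a file.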

  FASTA format

The FASTA format is plain aligned FASTA, in which lowercase and uppercase characters are equivalent, and '.' and '-' are equivalent (both denote a gap).

In the standard FASTA format, each sequence is preceded by a header line that starts with '>'. For example:


>example
ETESMKTVRIREKIKKFLGDRPRNTAEILEHINSTMRHGTTSQQLGNVLSKDKDIVKVGYIKRSGILSGGYDICEWATRNWVAEHCPEWTE
>1
----MRTTRLRQKIKKFLNERGANTTEILEHVNSTMRHGTTPQQLGNVLSKDKDILKVATTKRGGALSGRYEICVWTLRP-----------
>2
----MDSQNLRDLIRNYLSERPRNTIEISAWLASQMDPNSCPEDVTNILEADESIVRIGTVRKSGMRLTDLPISEWASSSWVRRHE-----
>3
----MNSQNLRELIRNYLSERPRNTIEISTWLSSQIDPTNSPVDITSILEADDQIVRIGTVRKSGMRRSESPVSEWASNTWVKHHE-----
>4
--RDMDTEKVREIVRNYISERPRNTAEIAAWLNRH-DDGTGGSDVAAILESDGSFVRIGTVRTSGMTGNSPPLSEWATEKWIQHHER----
>5
-----RTRRLREAVLVFLEEKGANTVEVFDYLNERFRWGATMNQVGNILAKDTRFAKVGHQ-RGQFRGSVYTVCVWALS------------
>6
-----RTKRLREAVRVYLAENGSHTVDIFDHLNDRFSWGATMNQVGNILAKDNRFEKVGHVRD-FFRGARYTVCVWDLAS-----------


The '>' header lines can be omitted. In that case, each sequence must be written on a single line. For example:

ETESMKTVRIREKIKKFLGDRPRNTAEILEHINSTMRHGTTSQQLGNVLSKDKDIVKVGYIKRSGILSGGYDICEWATRNWVAEHCPEWTE
----MRTTRLRQKIKKFLNERGANTTEILEHVNSTMRHGTTPQQLGNVLSKDKDILKVATTKRGGALSGRYEICVWTLRP-----------
----MDSQNLRDLIRNYLSERPRNTIEISAWLASQMDPNSCPEDVTNILEADESIVRIGTVRKSGMRLTDLPISEWASSSWVRRHE-----
----MNSQNLRELIRNYLSERPRNTIEISTWLSSQIDPTNSPVDITSILEADDQIVRIGTVRKSGMRRSESPVSEWASNTWVKHHE-----
--RDMDTEKVREIVRNYISERPRNTAEIAAWLNRH-DDGTGGSDVAAILESDGSFVRIGTVRTSGMTGNSPPLSEWATEKWIQHHER----
-----RTRRLREAVLVFLEEKGANTVEVFDYLNERFRWGATMNQVGNILAKDTRFAKVGHQ-RGQFRGSVYTVCVWALS------------
-----RTKRLREAVRVYLAENGSHTVDIFDHLNDRFSWGATMNQVGNILAKDNRFEKVGHVRD-FFRGARYTVCVWDLAS-----------
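
Because case and the two gap characters carry no meaning in aligned FASTA, rows can be normalized before further processing. The sketch below shows one way to do this in Python; `normalize_fasta_row` and `rows_are_aligned` are hypothetical helper names, and the choice of uppercase letters and '-' as the canonical form is an assumption for this example, not something the server requires.

```python
def normalize_fasta_row(row):
    """Map an aligned FASTA row to uppercase letters with '-' as the only gap."""
    return row.upper().replace(".", "-")


def rows_are_aligned(rows):
    """Return True if all rows have the same length (or there are no rows)."""
    return len({len(r) for r in rows}) <= 1
```

For example, `normalize_fasta_row("ac.D-e")` gives `"AC-D-E"`, and `rows_are_aligned` can be used to catch rows that were truncated or wrapped by mistake.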


  A2M format

The A2M format is a variant of aligned FASTA in which inserts are written as lowercase characters, matches as uppercase characters, deletions as '-', and gaps aligned to inserts as '.'. Unlike A3M, the '.' characters are kept, so all rows have the same length.

In the standard A2M format, each sequence is preceded by a header line that starts with '>'. For example:


>example
ETESMKTVRIREKIKKFLGDRP.RNTAEILEHINSTMRHGTTSQQLGNVLSKDKDIVKVGYIKRSGILSGGYDICEWATRNWVAEHCPEWTE
>1
----MRTTRLRQKIKKFLNERGeANTTEILEHVNSTMRHGTTPQQLGNVLSKDKDILKVATTKRGGALSGRYEICVWTLRP-----------
>2
----MDSQNLRDLIRNYLSERP.RNTIEISAWLASQMDPNSCPEDVTNILEADESIVRIGTVRKSGMRLTDLPISEWASSSWVRRHE-----
>3
----MNSQNLRELIRNYLSERP.RNTIEISTWLSSQIDPTNSPVDITSILEADDQIVRIGTVRKSGMRRSESPVSEWASNTWVKHHE-----
>4
--RDMDTEKVREIVRNYISERP.RNTAEIAAWLNRH-DDGTGGSDVAAILESDGSFVRIGTVRTSGMTGNSPPLSEWATEKWIQHHER----
>5
-----RTRRLREAVLVFLEEKGnANTVEVFDYLNERFRWGATMNQVGNILAKDTRFAKVGHQ-RGQFRGSVYTVCVWALS------------
>6
-----RTKRLREAVRVYLAENGrSHTVDIFDHLNDRFSWGATMNQVGNILAKDNRFEKVGHVRD-FFRGARYTVCVWDLAS-----------


The '>' header lines can be omitted. In that case, each sequence must be written on a single line. For example:

ETESMKTVRIREKIKKFLGDRP.RNTAEILEHINSTMRHGTTSQQLGNVLSKDKDIVKVGYIKRSGILSGGYDICEWATRNWVAEHCPEWTE
----MRTTRLRQKIKKFLNERGeANTTEILEHVNSTMRHGTTPQQLGNVLSKDKDILKVATTKRGGALSGRYEICVWTLRP-----------
----MDSQNLRDLIRNYLSERP.RNTIEISAWLASQMDPNSCPEDVTNILEADESIVRIGTVRKSGMRLTDLPISEWASSSWVRRHE-----
----MNSQNLRELIRNYLSERP.RNTIEISTWLSSQIDPTNSPVDITSILEADDQIVRIGTVRKSGMRRSESPVSEWASNTWVKHHE-----
--RDMDTEKVREIVRNYISERP.RNTAEIAAWLNRH-DDGTGGSDVAAILESDGSFVRIGTVRTSGMTGNSPPLSEWATEKWIQHHER----
-----RTRRLREAVLVFLEEKGnANTVEVFDYLNERFRWGATMNQVGNILAKDTRFAKVGHQ-RGQFRGSVYTVCVWALS------------
-----RTKRLREAVRVYLAENGrSHTVDIFDHLNDRFSWGATMNQVGNILAKDNRFEKVGHVRD-FFRGARYTVCVWDLAS-----------
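
Since the only difference between the A2M and A3M rows shown on this page is the '.' character for gaps aligned to inserts, an A2M row can be turned into an A3M row by dropping those characters. A minimal Python sketch follows; `a2m_to_a3m` is a hypothetical name used only for this example.

```python
def a2m_to_a3m(row):
    """Drop '.' (gaps aligned to inserts) from one A2M row to get an A3M row."""
    return row.replace(".", "")


def convert_rows(rows):
    """Convert every A2M row in a list, leaving the input list unchanged."""
    return [a2m_to_a3m(r) for r in rows]
```

Rows that contain an insert (a lowercase character) are left unchanged, while rows padded with '.' become one character shorter per insert column.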


  STO format

The STO (Stockholm) format consists of a header line with a format and version identifier; mark-up lines starting with "#=GF", "#=GC", "#=GS", or "#=GR"; alignment lines, each containing a sequence name followed by the aligned sequence; and a "//" line marking the end of the alignment. Inserts are written as lowercase characters, matches as uppercase characters, and gaps as '.' or '-'.

See an example:


# STOCKHOLM 1.0
#=GF ID DUF3860
#=GF AC PF12976.9
#=GF DE Domain of Unknown Function with PDB structure (DUF3860)
#=GF AU Ellrott K;0000-0002-6573-5900
#=GF SE JCSG structure PDB:2OD5
#=GF GA 27.00 27.00;
#=GF TC 33.80 46.10;
#=GF NC 26.00 21.70;
#=GF BM hmmbuild HMM.ann SEED.ann
#=GF SM hmmsearch -Z 57096847 -E 1000 --cpu 4 HMM pfamseq
#=GF TP Family
#=GF WK Domain_of_unknown_function
#=GF CL CL0123
#=GF DR INTERPRO; IPR024619;
#=GF DR SO; 0100021; polypeptide_conserved_region;
#=GF CC A protein family created to cover PDB:2OD5. 2OD5 is a
#=GF CC hypothetical protein (JCVI_PEP_1096688149193) from an
#=GF CC environmental metagenome (unidentified marine microbe).
#=GF SQ 2
#=GS A0A3A5W886_9EURY/2-78 AC A0A3A5W886.1
#=GS A0A3A5W7Z0_9EURY/1-90 AC A0A3A5W7Z0.1
A0A3A5W886_9EURY/2-78 s-RTARLRNEIAQYLETNgVSNTSQILDHVNKRFRWGATMNQVGNVLARDRRFEKLGITEGTTMAGFRERVCIWA-------------------lva
A0A3A5W7Z0_9EURY/1-90 .MKTVRIREKIKKFLGDK.PRNTAEILEHINSTMRHGTTSQQLGNVLSKDKDIVKVGYIKRSGILSGGYDICEWATRTWVS-DNCPGWEEG---qp.
#=GC seq_cons ..+TsRlRpcItpaLtsp.spNTupIL-HlNpphRaGsT.pQlGNVLu+D+ch.KlGhhctoshhuht.clC.WA....................s.
//
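
The structure described above can be read with a few lines of Python. The sketch below skips the header and mark-up lines, collects the alignment lines, and stops at the "//" terminator. It is a simplified reader written for this page, not the server's parser; `read_stockholm` is a hypothetical name, and concatenating repeated names (to handle alignments split into several blocks) is an assumption of this example.

```python
def read_stockholm(text):
    """Return {sequence name: aligned sequence} from Stockholm-format text."""
    seqs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            # Blank lines, the '# STOCKHOLM 1.0' header, and '#=G?' mark-up.
            continue
        if line == "//":
            # End of the alignment; anything after it is ignored.
            break
        name, aligned = line.split(None, 1)
        # Append so that a name repeated in a later block extends its row.
        seqs[name] = seqs.get(name, "") + aligned.strip()
    return seqs
```

The returned dictionary keeps the sequences in file order, so the first entry corresponds to the first alignment line.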


  Need more help?

If you have more questions or comments about the server, please email yangjyqd.sdu.edu.cn.