The training dataset:
  • The FASTA sequences for the training set (15051 proteins): download
  • The PDB structures for the training set (15051 proteins): download (size: 761M)
  • The NPZ files (containing the MSAs and the extracted geometries) for the training set (15051 proteins): download (size: 21G)
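This page does not list the key names or array layouts stored in each NPZ file, so a sensible first step after downloading is to inspect one archive. Below is a minimal Python sketch that assumes NumPy is installed. `summarize_npz` is a hypothetical helper name, and the sketch makes no assumption about which keys the archives contain.

```python
import numpy as np


def summarize_npz(path):
    """Return {array_name: (shape, dtype_string)} for every array in an .npz archive.

    Hypothetical helper: it only lists whatever arrays the archive stores and
    does not assume any particular key names.
    """
    # allow_pickle=False refuses pickled object arrays (a safety default);
    # np.load raises ValueError if the archive contains them.
    with np.load(path, allow_pickle=False) as data:
        return {name: (data[name].shape, str(data[name].dtype)) for name in data.files}
```

For example, `summarize_npz("some_protein.npz")` (a placeholder file name) returns one `(shape, dtype)` entry per stored array. Enable `allow_pickle=True` only if loading fails on object arrays and you trust the file.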

The MSAs in a3m format for the test sets can be downloaded below:
  • CASP13 (Note: the native structures can be downloaded from the CASP13 website: http://predictioncenter.org/download_area/CASP13/targets/)
  • CASP14 (Note: the native structures can be downloaded from the CASP14 website: http://predictioncenter.org/download_area/CASP14/targets/)
  • CASP15 (Note: the native structures can be downloaded from the CASP15 website: http://predictioncenter.org/download_area/CASP15/targets/)
  • CAMEO (201812-201906, used in trRosetta) (both the MSAs and the native structure files are available)
  • CAMEO (202006-202009, used in trRosettaX) (both the MSAs and the native structure files are available)
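In a3m files, lowercase letters mark insertions relative to the query (typically the first sequence), and `-` marks deletions. Dropping the lowercase characters therefore leaves every row aligned to the query's columns. The following minimal Python sketch performs that conversion and assumes plain-text a3m input. `parse_a3m` and `read_a3m` are hypothetical helper names, not part of trRosetta or trRosettaX.

```python
import string

# Characters removed from each row: lowercase insertion states, plus '.'
# in case a file uses a2m-style gap characters in insert columns (an assumption).
_DROP_INSERTIONS = str.maketrans("", "", string.ascii_lowercase + ".")


def parse_a3m(text):
    """Parse a3m text into a list of (header, aligned_sequence) pairs."""
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and '#' comment lines (some a3m files start with one).
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks).translate(_DROP_INSERTIONS)))
            header, chunks = line[1:], []
        else:
            # Sequences may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks).translate(_DROP_INSERTIONS)))
    return records


def read_a3m(path):
    """Read an a3m file from disk and parse it with parse_a3m."""
    with open(path) as handle:
        return parse_a3m(handle.read())
```

After parsing, every aligned sequence should have the same length as the query, which is a quick sanity check on a downloaded MSA.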