The training dataset:
The FASTA sequences for the training set (15051 proteins): download
The PDB structures for the training set (15051 proteins): download (size: 761M)
The NPZ files (containing the MSAs and the extracted geometries) for the training set (15051 proteins): download (size: 21G)
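Before loading the NPZ files, it can help to check which arrays each one holds. The sketch below relies only on the fact that an .npz file is a zip archive with one .npy member per array; the function name list_npz_arrays is hypothetical, and the array names in the training files are not listed on this page, so none are assumed here.

```python
import zipfile


def list_npz_arrays(path):
    """Return the array names stored in an .npz archive without loading the data."""
    with zipfile.ZipFile(path) as archive:
        # Each array is stored as "<name>.npy" inside the zip archive.
        return [
            name[: -len(".npy")]
            for name in archive.namelist()
            if name.endswith(".npy")
        ]
```

Once the names are known, the arrays themselves can be read with numpy.load(path), which returns a dictionary-like object keyed by those names.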
The MSAs in a3m format for the test sets can be downloaded below:
CASP13 (Note: the native structures can be downloaded from the CASP13 website: http://predictioncenter.org/download_area/CASP13/targets/)
CASP14 (Note: the native structures can be downloaded from the CASP14 website: http://predictioncenter.org/download_area/CASP14/targets/)
CASP15 (Note: the native structures can be downloaded from the CASP15 website: http://predictioncenter.org/download_area/CASP15/targets/)
CAMEO (201812-201906, used in trRosetta) (both MSAs and native structure files are available)
CAMEO (202006-202009, used in trRosettaX) (both MSAs and native structure files are available)
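For readers new to the a3m format used by these MSAs: it is FASTA-like, with insertions relative to the query written as lowercase letters. Removing the lowercase letters gives rows of equal length, aligned to the query. The following is a minimal sketch of such a reader; the function name read_a3m is hypothetical and it is not the parser used by trRosetta.

```python
import string

# Translation table that deletes lowercase letters (a3m insertion states).
_DROP_LOWERCASE = str.maketrans("", "", string.ascii_lowercase)


def read_a3m(path):
    """Parse an a3m file into (headers, aligned_sequences)."""
    headers, seqs = [], []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # New record: store the header and start an empty sequence.
                headers.append(line[1:])
                seqs.append("")
            elif seqs:
                # Sequence line (possibly wrapped): drop insertions and append.
                seqs[-1] += line.translate(_DROP_LOWERCASE)
    return headers, seqs
```

The first record is the query sequence. If the file is well formed, every returned sequence has the same length as the query, which makes a quick sanity check after downloading.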