The training datasets

The complete training dataset:
  • Training dataset (2.4 GB; NPZ files storing the MSAs, secondary structures (SSs), and extracted geometries)
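Because the training data ships as NPZ files, it can help to check which arrays a file holds before loading anything. The sketch below relies only on the fact that an NPZ file is a zip archive with one .npy member per array. The function name `list_npz_arrays` is hypothetical, and the array names actually used in the training files are not documented here, so inspect a real file to find them.

```python
import zipfile


def list_npz_arrays(path):
    """Return the sorted names of the arrays stored in an NPZ file, without loading them."""
    suffix = ".npy"
    with zipfile.ZipFile(path) as archive:
        # Each archive member is named "<array_name>.npy"; strip the suffix
        # to recover the array name that numpy would expose.
        return sorted(
            name[: -len(suffix)]
            for name in archive.namelist()
            if name.endswith(suffix)
        )
```

Once the names are known, an individual array can be read with `numpy.load(path)[name]`.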

The five training subsets used for testing on RNA-Puzzles targets, with the corresponding pretrained models:
  • Before 2010-12: list, model
  • Before 2013-07: list, model
  • Before 2016-07: list, model
  • Before 2017-01: list, model
  • Before 2019-04: list, model

The benchmark datasets

  • 30 independent RNAs (36.5 MB)
  • 20 RNA-Puzzles targets (1.65 MB)
  • 12 blind test targets from CASP15 (4.9 MB)
  • 3 blind test targets from RNA-Puzzles (0.29 MB)

Each of the above tarballs contains the following data:
  • The MSAs.
  • The PDB structures.
  • The secondary structures predicted by SPOT-RNA.
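
To check that a downloaded benchmark tarball holds the three kinds of data listed above, its members can be tallied by file extension without extracting anything. This is a minimal sketch using Python's standard `tarfile` module. The function name `count_by_extension` is hypothetical, and the file extensions used inside the tarballs are not stated here, so the counts have to be interpreted against the real archive.

```python
import tarfile
from collections import Counter
from pathlib import PurePosixPath


def count_by_extension(tar_path):
    """Count the regular files in a tarball, grouped by lowercase file extension."""
    counts = Counter()
    # Mode "r:*" lets tarfile detect the compression (gzip, bz2, xz, or none).
    with tarfile.open(tar_path, "r:*") as tar:
        for member in tar:
            if member.isfile():
                suffix = PurePosixPath(member.name).suffix.lower()
                counts[suffix or "(no extension)"] += 1
    return dict(counts)
```

Calling `count_by_extension` on a downloaded tarball returns a dictionary mapping each extension to the number of files that carry it.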

Reference

  • Wang et al., trRosettaRNA: automated prediction of RNA 3D structure with transformer network, Nature Communications, 14: 7266 (2023). (PDF)