    Training dataset

  • Training dataset (552 sequences)

    Validation dataset

  • Validation dataset (227 sequences)

    Independent test datasets

  • TE210 dataset (210 sequences)
  • TE83 dataset (83 sequences)
  • Dataset results

    CAID datasets

  • CAID2 binding dataset (78 sequences)
  • CAID2 linker dataset (40 sequences)
  • CAID3 binding dataset (51 sequences)
  • CAID3 linker dataset (20 sequences)

    The CAID2 and CAID3 datasets used in this work were obtained from the official Critical Assessment of Intrinsic Disorder (CAID) challenge website. They correspond to the CAID Round 2 and Round 3 benchmark collections, respectively.

    Description of datasets

    For Training, Validation, and Independent Test Datasets

    All three datasets share the same structure:

    For CAID Datasets

    All four datasets share the same structure:

    Reference

  • Liang et al., Hybrid deep learning with protein language models and dual-path architecture for predicting IDP functions, Briefings in Bioinformatics, 27: bbag126 (2026).