Prediction of Intrinsically Disordered Functional Regions by IDPFunNet

training dataset (552 sequences)

validation dataset (227 sequences)

TE210 dataset (210 sequences)

TE83 dataset (83 sequences)

All three datasets share the same structure:

Line 1: Protein ID - a unique identifier for each protein.
Line 2: Protein sequence - the amino acid sequence in one-letter code.
Line 3: Annotations of Intrinsically Disordered Regions (IDR) - '1' indicates an IDR, '0' indicates a non-IDR.
Line 4: Annotations of Disordered Protein-binding Regions (PB) - '1' indicates a PB, '0' indicates a non-PB.
Line 5: Annotations of Disordered Nucleic Acid-binding Regions (NB) - '1' indicates an NB, '0' indicates a non-NB.
Line 6: Annotations of Disordered Lipid-binding Regions (LB) - '1' indicates an LB, '0' indicates a non-LB.
Line 7: Annotations of Disordered Ion-binding Regions (IB) - '1' indicates an IB, '0' indicates a non-IB.
Line 8: Annotations of Disordered Small Molecule-binding Regions (SB) - '1' indicates an SB, '0' indicates a non-SB.
Line 9: Annotations of Disordered Flexible Linkers (DFL) - '1' indicates a DFL, '0' indicates a non-DFL.
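The 9-line layout above can be read with a small parser. This is a minimal sketch, not part of IDPFunNet, and it rests on assumptions not stated in this README: records are 9 consecutive non-empty lines, blank lines (if any) only separate records, a leading '>' on the ID line is stripped if present, and every annotation line has one '0'/'1' character per residue. The names `Protein`, `parse_records`, `load` and `LABELS` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

# Order of the annotation lines (Lines 3-9) as listed in the README.
LABELS = ["IDR", "PB", "NB", "LB", "IB", "SB", "DFL"]
RECORD_LINES = 2 + len(LABELS)  # ID + sequence + 7 annotation lines


@dataclass
class Protein:
    pid: str
    sequence: str
    labels: Dict[str, List[int]]  # 0/1 labels keyed by LABELS


def parse_records(lines: List[str]) -> List[Protein]:
    """Group non-empty lines into 9-line records (assumed layout)."""
    # Assumption: blank lines carry no information and can be dropped.
    rows = [ln.strip() for ln in lines if ln.strip()]
    if len(rows) % RECORD_LINES:
        raise ValueError(f"{len(rows)} lines is not a multiple of {RECORD_LINES}")
    proteins = []
    for i in range(0, len(rows), RECORD_LINES):
        pid, seq, *ann = rows[i:i + RECORD_LINES]
        labels = {}
        for name, ann_str in zip(LABELS, ann):
            # Each annotation string should align with the sequence.
            if len(ann_str) != len(seq):
                raise ValueError(
                    f"{pid}: {name} length {len(ann_str)} != sequence length {len(seq)}"
                )
            labels[name] = [int(c) for c in ann_str]
        # Assumption: an optional FASTA-style '>' prefix on the ID is removed.
        proteins.append(Protein(pid.lstrip(">"), seq, labels))
    return proteins


def load(path: str) -> List[Protein]:
    """Read one dataset file (file name is up to the user)."""
    with open(path) as fh:
        return parse_records(fh.readlines())
```

For example, `load("training.txt")` (a hypothetical file name) would return one `Protein` per record, and `p.labels["PB"]` gives the per-position protein-binding labels. The length check is there so that a shifted or truncated record fails loudly instead of silently misaligning the labels.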

All four datasets share the same structure:

Line 1: Protein ID - a unique identifier for each protein.
Line 2: Protein sequence - the amino acid sequence in one-letter code.
Line 3: Annotations of Disordered Binding Regions (BR)/Disordered Flexible Linkers (DFL) - '1' indicates a BR/DFL, '0' indicates a non-BR/DFL.
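A sketch for the 3-line layout follows, together with a helper that turns an annotation string into contiguous regions. It assumes records are 3 consecutive non-empty lines with an optional leading '>' on the ID; `parse_three_line` and `segments` are hypothetical names, and reporting regions as 1-based inclusive (start, end) pairs is an illustrative choice, not a convention taken from the datasets.

```python
from typing import List, Tuple


def parse_three_line(lines: List[str]) -> List[Tuple[str, str, str]]:
    """Split non-empty lines into (id, sequence, annotation) triples."""
    rows = [ln.strip() for ln in lines if ln.strip()]
    if len(rows) % 3:
        raise ValueError(f"{len(rows)} lines is not a multiple of 3")
    records = []
    for i in range(0, len(rows), 3):
        pid, seq, ann = rows[i:i + 3]
        if len(ann) != len(seq):
            raise ValueError(f"{pid}: annotation length {len(ann)} != sequence length {len(seq)}")
        records.append((pid.lstrip(">"), seq, ann))
    return records


def segments(ann: str) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) spans of consecutive '1's."""
    spans = []
    start = None
    for pos, c in enumerate(ann, start=1):
        if c == "1" and start is None:
            start = pos  # a positive region opens here
        elif c != "1" and start is not None:
            spans.append((start, pos - 1))  # region closed by a '0'
            start = None
    if start is not None:
        spans.append((start, len(ann)))  # region runs to the sequence end
    return spans
```

For instance, `segments("0011100110")` yields `[(3, 5), (8, 9)]`, i.e. two BR/DFL regions covering positions 3-5 and 8-9.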

Liang et al., Hybrid Deep Learning with Protein Language Models and Dual-Path Architecture for Predicting IDP Functions, submitted, 2025.