Similarity impacts machine learning-based scoring function

Benchmark dataset

The files below give the similarity between the original training and test samples. In each file, the first line lists the PDB IDs of the test proteins, and the first column lists the PDB IDs of the training proteins.

pairwise structural similarity

pairwise sequence similarity
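
The layout described above (test PDB IDs on the first line, training PDB IDs in the first column) can be read with a few lines of Python. The following is a minimal sketch, not the authors' code. It assumes the fields are whitespace-delimited and that every remaining cell is a numeric similarity score; neither is stated on this page, so adjust `sep` as needed. The function names, the IDs, and the values in the demo are all hypothetical.

```python
import io


def read_similarity_matrix(handle, sep=None):
    """Parse a training-by-test similarity matrix into a nested dict.

    Assumed layout: first line = test PDB IDs, first field of each later
    line = training PDB ID, remaining fields = similarity scores.
    sep=None splits on any whitespace (an assumption about the files).
    """
    lines = [ln.strip() for ln in handle if ln.strip()]
    header = lines[0].split(sep)
    scores = {}
    for ln in lines[1:]:
        fields = ln.split(sep)
        train_id, values = fields[0], fields[1:]
        # Tolerate an optional corner label at the start of the header row.
        test_ids = header[1:] if len(header) == len(values) + 1 else header
        if len(test_ids) != len(values):
            raise ValueError(
                f"row {train_id}: expected {len(test_ids)} values, got {len(values)}"
            )
        scores[train_id] = {t: float(v) for t, v in zip(test_ids, values)}
    return scores


def max_similarity_to_training(scores):
    """For each test protein, return its highest similarity to any training protein."""
    best = {}
    for row in scores.values():
        for test_id, value in row.items():
            best[test_id] = max(value, best.get(test_id, float("-inf")))
    return best


# Made-up IDs and values, for illustration only.
demo = io.StringIO("t001 t002\nr001 0.9 0.2\nr002 0.4 0.7\n")
scores = read_similarity_matrix(demo)
best = max_similarity_to_training(scores)
```

To read one of the real files, pass an open file object instead of `demo`. The per-test maximum is only one possible summary of such a matrix and is shown here as an example of working with the parsed data.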

Similarities inside the training and test sets (not used in the paper but may be useful for related studies)