We evaluated our method at the level of individual binding pockets, considering all ligand-binding pockets for each protein. Metrics such as MCC, DCA, DCC, precision, recall, and F1-score were computed for each pocket and then averaged to get the protein-level performance. Finally, results from all proteins were combined to report the overall performance across the dataset.
The evaluation script is available in COACH-D2.0_evaluate.py.
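The aggregation described above can be illustrated with a minimal sketch. It is not the contents of COACH-D2.0_evaluate.py. It assumes that each predicted pocket has already been paired with its reference pocket, that pockets are represented as sets of residue identifiers, and that "combined" means an unweighted mean over proteins. The function names are hypothetical. DCC and DCA are distance-based and are omitted here.

```python
import math
from statistics import mean


def pocket_metrics(pred, true, n_residues):
    """Residue-level metrics for one pocket.

    pred, true: sets of residue identifiers (predicted / annotated).
    n_residues: total number of residues in the protein.
    """
    tp = len(pred & true)
    fp = len(pred - true)
    fn = len(true - pred)
    tn = n_residues - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


def protein_metrics(pockets, n_residues):
    """Average the per-pocket metrics of one protein.

    pockets: list of (pred, true) pairs, assumed to be matched already.
    """
    per_pocket = [pocket_metrics(p, t, n_residues) for p, t in pockets]
    return {k: mean(m[k] for m in per_pocket) for k in per_pocket[0]}


def dataset_metrics(proteins):
    """Unweighted mean over proteins (an assumption about 'combined').

    proteins: list of (pockets, n_residues) tuples.
    """
    per_protein = [protein_metrics(pk, n) for pk, n in proteins]
    return {k: mean(m[k] for m in per_protein) for k in per_protein[0]}


# Toy data: protein A has two pockets, protein B has one.
protein_a = ([({1, 2, 3}, {2, 3, 4}), ({7, 8}, {7, 8})], 10)
protein_b = ([({1}, {1, 2})], 5)
overall = dataset_metrics([protein_a, protein_b])
```

Averaging within each protein first keeps proteins with many pockets from dominating the dataset-level score.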
Benchmark Datasets
To comprehensively evaluate the performance of our method, we employed three benchmark datasets: Coach420, Holo1k, and Holo243.
1. Coach420
The Coach420 dataset is a widely used benchmark consisting of 420 monomeric protein–ligand complexes.
The protein structures and ligand information were obtained from the P2Rank dataset repository.
Label: Coach420_label.csv
Binding site annotations were defined using a geometric criterion: a residue is considered a binding residue if the distance between its closest atom and a ligand atom is below a threshold equal to the sum of the van der Waals radii of the two atoms plus 0.5 Å.
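A minimal sketch of this criterion is shown below. The radius table (Bondi values for a few common elements) and the function names are assumptions, since the text does not state which radii were used. The sketch reads the rule as "any residue atom and ligand atom pair closer than its own pair-specific cutoff".

```python
import math

# Bondi van der Waals radii in Å (assumed; not specified in the text).
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "P": 1.80, "S": 1.80}
TOLERANCE = 0.5  # Å, as stated in the annotation protocol


def atoms_in_contact(atom_a, atom_b):
    """Each atom is (element, (x, y, z)); the cutoff depends on both elements."""
    (elem_a, xyz_a), (elem_b, xyz_b) = atom_a, atom_b
    cutoff = VDW_RADII[elem_a] + VDW_RADII[elem_b] + TOLERANCE
    return math.dist(xyz_a, xyz_b) < cutoff


def is_binding_residue(residue_atoms, ligand_atoms):
    """True if any residue atom is in contact with any ligand atom."""
    return any(atoms_in_contact(r, l)
               for r in residue_atoms for l in ligand_atoms)


# Toy example: a C/O pair has a cutoff of 1.70 + 1.52 + 0.5 = 3.72 Å.
residue = [("C", (0.0, 0.0, 0.0)), ("N", (1.5, 0.0, 0.0))]
near_ligand = [("O", (0.0, 3.5, 0.0))]
far_ligand = [("O", (0.0, 6.0, 0.0))]
```

Here `is_binding_residue(residue, near_ligand)` holds because the C and O atoms are 3.5 Å apart, while `far_ligand` lies outside every pair cutoff.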
To further refine the annotations, we merged certain k-mer ligands, defined as k covalently connected small molecules occupying the same binding pocket. In addition, we excluded ligands that are biologically unrelated but spatially adjacent, in order to prevent the inclusion of biologically irrelevant binding sites.
2. Holo1k
The Holo1k dataset comprises 1,169 monomeric and multimeric proteins.
Data: P2Rank holo4k repository
Holo1k is a curated subset of the P2Rank Holo4k dataset, optimized for runtime efficiency in COACH-D 2.0. CD-HIT clustering (60% identity) was used to reduce redundancy, resulting in 1,169 representative proteins.
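For readers who want to reproduce a comparable redundancy reduction, the sketch below builds a CD-HIT command at 60% identity and extracts the representative sequence of each cluster from the resulting .clstr file. The exact invocation used for Holo1k is not given in the text, so the file names, the helper names, and the sequence identifiers in the example are hypothetical. The `-n` word size follows the ranges recommended in the CD-HIT user guide.

```python
import subprocess


def cd_hit_command(fasta_in, fasta_out, identity=0.6):
    """Build a cd-hit command; the word size (-n) must suit the threshold."""
    if identity >= 0.7:
        word = 5
    elif identity >= 0.6:
        word = 4
    elif identity >= 0.5:
        word = 3
    else:
        word = 2
    return ["cd-hit", "-i", fasta_in, "-o", fasta_out,
            "-c", str(identity), "-n", str(word)]


def run_cd_hit(fasta_in, fasta_out, identity=0.6):
    """Run cd-hit; requires the cd-hit binary to be on PATH."""
    subprocess.run(cd_hit_command(fasta_in, fasta_out, identity), check=True)


def representative_ids(clstr_text):
    """Return the representative of each cluster (lines ending in '*')."""
    reps = []
    for line in clstr_text.splitlines():
        line = line.strip()
        if line.endswith("*") and ">" in line:
            reps.append(line.split(">", 1)[1].split("...", 1)[0])
    return reps


# Hypothetical .clstr content with two clusters.
example_clstr = """>Cluster 0
0\t250aa, >1abcA... *
1\t248aa, >2xyzA... at 85.00%
>Cluster 1
0\t120aa, >3defB... *
"""
```

With this input, `representative_ids(example_clstr)` keeps one entry per cluster, which is how a clustered set shrinks to its representative proteins.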
The following chart compares Holo1k and Holo4k on three statistics:
(a) Number of protein chains
(b) Average chain length
(c) Number of binding pockets
The results show that the two datasets exhibit highly similar distributions for these features. This demonstrates that Holo1k effectively reduces redundancy while preserving the overall characteristics of the original dataset, making it a representative subset.
Label: Holo1k_label.csv
Binding site annotations were generated using the same geometric criterion described for Coach420. Similarly, k-mer ligands were merged, and biologically unrelated yet spatially adjacent ligands were excluded to ensure annotation relevance.
3. Holo243
The Holo243 dataset includes 243 cross-chain protein–ligand complexes selected from the Holo1k dataset, designed to evaluate performance on cross-chain binding pockets.
Data: Holo243.list
To assess performance in practical multimeric scenarios, we predicted the 3D structures of these complexes using the AlphaFold3 webserver. The predicted structures are relatively accurate, with an average global TM-score of 0.97 and an average pocket TM-score of 0.71.
Per-protein TM-score statistics are available in Holo243_statistics.csv.
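As a reminder of what these numbers measure, the sketch below evaluates the standard TM-score sum for a fixed superposition, given the distances between aligned residue pairs. The real TM-score takes the maximum over superpositions, so this is only a lower bound for it. How the pocket TM-score was restricted to pocket residues is not specified in the text. The d0 floor of 0.5 Å for short chains follows common implementations and is an assumption here, as are the function names.

```python
def tm_d0(length):
    """Length-dependent distance scale d0 in Å."""
    if length > 21:
        return 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8
    return 0.5  # floor for short chains (assumed convention)


def tm_score_fixed(distances, ref_length):
    """TM-score sum for one given superposition.

    distances: distances in Å between aligned residue pairs.
    ref_length: length of the reference structure used for normalization.
    """
    d0 = tm_d0(ref_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / ref_length


# A perfect match of a 100-residue protein scores 1.0;
# if every aligned pair is exactly d0 apart, the score is 0.5.
perfect = tm_score_fixed([0.0] * 100, 100)
half = tm_score_fixed([tm_d0(100)] * 100, 100)
```

Because the score is normalized by the reference length, unaligned residues lower it, so values near 1 indicate both high coverage and small deviations.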
Label: Holo243_label.csv
Binding site annotations in Holo243 follow the same protocol as Coach420 and Holo1k.