Our benchmark included seven state-of-the-art tools developed to assess the functional impact of noncoding variants. We excluded variants tested in mice, variants included in the training datasets of GWAVA and EnsembleExpr, and variants in the dataset that DeepSEA uses to compute its empirical background distributions. After filtering, the benchmark contained 37,816 positive variants and 5,772,175 negative variants.

Note: EnsembleExpr could not finish the full benchmark in a reasonable time. We therefore randomly sampled the benchmark five times (sampling fraction of 1%) to generate five sampled benchmark datasets, and the evaluation of EnsembleExpr is based on its average performance across these five datasets.
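The sampling procedure in the note above can be sketched as follows. This is a minimal illustration, not the code used for the benchmark: the function names `sample_benchmark` and `average_metric`, the seed, and the toy data are hypothetical, and the sketch assumes simple random sampling without replacement within each draw (the note does not say whether sampling was stratified by label).

```python
import random
from statistics import mean

def sample_benchmark(variants, fraction=0.01, n_repeats=5, seed=0):
    """Draw n_repeats random subsets, each holding `fraction` of the variants.

    Assumption: each subset is sampled without replacement and
    independently of the other subsets.
    """
    rng = random.Random(seed)
    k = max(1, round(len(variants) * fraction))
    return [rng.sample(variants, k) for _ in range(n_repeats)]

def average_metric(subsets, metric):
    """Average a metric (a callable applied to one subset) over all subsets."""
    return mean(metric(subset) for subset in subsets)

# Toy data: (variant_id, label) pairs; 1 = positive, 0 = negative.
variants = [(f"var{i}", 1 if i % 100 == 0 else 0) for i in range(10_000)]
subsets = sample_benchmark(variants)

# Example metric: fraction of positive variants in a subset.
positive_rate = average_metric(
    subsets, lambda s: sum(label for _, label in s) / len(s)
)
```

In practice the `metric` callable would run the tool on one sampled dataset and return a performance value such as AUROC, so that the reported figure is the mean over the five datasets.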

State-of-the-art computational tools involved
Tool | Modeling Approach | Year | Website | Reference

Sensitivity, Specificity, F1 score, Balanced Accuracy, AUROC and AUPRC
Tool | Threshold | Sensitivity | Specificity | F1 score | Balanced Accuracy | AUROC | AUPRC
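For readers unfamiliar with the metrics in this table, the sketch below shows one common way to compute them from binary labels and tool scores. It is an illustration under stated assumptions, not the evaluation code used here: the function names are hypothetical, a variant is called positive when its score is at or above the threshold, AUROC is computed from rank sums (ties receive average ranks), and AUPRC is approximated by average precision. The benchmark may have used a different AUPRC estimator.

```python
def threshold_metrics(labels, scores, threshold):
    """Sensitivity, specificity, F1 and balanced accuracy at one threshold.

    Assumes both classes are present and that score >= threshold
    means the variant is predicted positive.
    """
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }

def auroc(labels, scores):
    """AUROC via the rank-sum (Mann-Whitney U) formulation."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def average_precision(labels, scores):
    """AUPRC approximated as average precision over distinct score values."""
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(labels)
    tp = fp = 0
    ap = prev_recall = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            tp += pairs[j][1]
            fp += 1 - pairs[j][1]
            j += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * (tp / (tp + fp))
        prev_recall = recall
        i = j
    return ap

# Toy example: 1 = positive variant, 0 = negative variant.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
metrics = threshold_metrics(labels, scores, threshold=0.5)
roc = auroc(labels, scores)
ap = average_precision(labels, scores)
```

Because negatives outnumber positives by roughly 150 to 1 in this benchmark, AUPRC and F1 are more sensitive to false positives than AUROC, which is one reason to report the threshold-free and threshold-based metrics side by side.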

Performance on variants overlapping the GWAS Catalog v1.0.2 (except EnsembleExpr)

Performance on variants with different phastCons100way scores (except EnsembleExpr)

Performance on different cell lines (except EnsembleExpr)