Our benchmark included seven state-of-the-art tools developed to assess the functional impact of noncoding variants. We excluded variants tested in mice, variants included in the training datasets of GWAVA and EnsembleExpr, and variants in the dataset that DeepSEA uses to compute its empirical background distributions. After filtering, 37816 positive variants and 5772175 negative variants remained in the benchmark.
Note: Because EnsembleExpr could not finish the full benchmark in a reasonable time, we randomly sampled the benchmark five times (sampling fraction of 1% each) to generate five subsampled benchmark datasets. The evaluation of EnsembleExpr is based on its average performance across these five datasets.
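The subsample-and-average procedure in the note can be sketched as follows. This is only an illustration under stated assumptions, not the code used for the benchmark: the function names, the `(variant, label)` record layout, the `score_fn` callback, and the fixed seed are all hypothetical, and the sketch assumes simple random sampling without replacement over the whole dataset, since the text does not say whether positives and negatives were sampled separately. Only the threshold-based metrics are shown.

```python
import random
from statistics import mean


def confusion_metrics(labels, scores, threshold):
    """Sensitivity, specificity, F1 and balanced accuracy at a score threshold."""
    tp = fp = tn = fn = 0
    for is_positive, score in zip(labels, scores):
        predicted_positive = score >= threshold
        if is_positive and predicted_positive:
            tp += 1
        elif is_positive:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    # Guard against empty classes, which can occur in small subsamples.
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if 2 * tp + fp + fn else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


def subsample(records, fraction, rng):
    """Draw a simple random sample (without replacement) of the given fraction."""
    size = max(1, round(len(records) * fraction))
    return rng.sample(records, size)


def averaged_metrics(records, score_fn, threshold,
                     fraction=0.01, repeats=5, seed=0):
    """Average each metric over `repeats` independent subsamples.

    `records` is a list of (variant, label) pairs; `score_fn` stands in for
    the tool being evaluated (hypothetical interface).
    """
    rng = random.Random(seed)
    runs = []
    for _ in range(repeats):
        sample = subsample(records, fraction, rng)
        labels = [label for _, label in sample]
        scores = [score_fn(variant) for variant, _ in sample]
        runs.append(confusion_metrics(labels, scores, threshold))
    return {name: mean(run[name] for run in runs) for name in runs[0]}
```

With the defaults (`fraction=0.01`, `repeats=5`), each run scores about 1% of the records, so the slow tool is invoked on roughly 5% of the full benchmark in total.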
|Tool||Threshold||Sensitivity||Specificity||F1 score||Balanced Accuracy||AUROC||AUPRC|