REVA

Benchmark

Seven state-of-the-art tools developed to assess the functional impact of noncoding variants were included in our benchmark. We excluded variants tested in mice, variants included in the training datasets of GWAVA and EnsembleExpr, and variants in the dataset that DeepSEA uses to compute its empirical background distributions. After filtering, 37816 positive variants and 5772175 negative variants were included in the benchmark.

Note: EnsembleExpr could not finish the full benchmark in a reasonable time. We therefore drew five random samples, each with a sampling fraction of 1%, to generate five sampled benchmark datasets, and evaluated EnsembleExpr by its average performance across these five datasets.
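The repeated-sampling procedure in the note can be sketched as follows. This is a minimal illustration, not the authors' actual script: the function names `sample_benchmark` and `average_over_samples`, the use of fixed seeds, and simple sampling without replacement (with no stratification by label) are all assumptions.

```python
import random
from statistics import mean


def sample_benchmark(variants, fraction=0.01, seed=0):
    """Draw a random subset of the benchmark without replacement.

    `fraction` and `seed` are illustrative parameters (assumptions).
    """
    rng = random.Random(seed)
    k = max(1, round(len(variants) * fraction))
    return rng.sample(variants, k)


def average_over_samples(variants, evaluate, n_samples=5, fraction=0.01):
    """Average a metric over several independently sampled datasets.

    `evaluate` is a hypothetical callable that maps a sampled dataset
    to a single performance value (for example, AUROC).
    """
    scores = [
        evaluate(sample_benchmark(variants, fraction, seed=i))
        for i in range(n_samples)
    ]
    return mean(scores)
```

Using a different seed for each draw keeps the five samples independent while leaving the run reproducible.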

State-of-the-art computational tools involved

Tool	Modeling Approach	Year	Website	Reference

Results

Sensitivity, Specificity, F1 score, Balanced Accuracy, AUROC and AUPRC

Tool	Threshold	Sensitivity	Specificity	F1 score	Balanced Accuracy	AUROC	AUPRC
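For reference, the threshold-dependent columns above can be computed from a tool's scores as in the sketch below. It assumes that higher scores indicate a more likely functional variant and that a score at or above the threshold counts as a positive prediction; the function name `threshold_metrics` is hypothetical, and each tool may use a different convention. AUROC and AUPRC are threshold-free and are not covered here.

```python
def threshold_metrics(scores, labels, threshold):
    """Compute sensitivity, specificity, F1 and balanced accuracy.

    Assumes labels are 1 (positive) or 0 (negative) and that
    score >= threshold means "predicted positive" (an assumption).
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if predicted and label == 1:
            tp += 1
        elif predicted and label == 0:
            fp += 1
        elif not predicted and label == 0:
            tn += 1
        else:
            fn += 1

    def ratio(num, den):
        # Return 0.0 instead of dividing by zero for empty classes.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }
```

With far more negative than positive variants, as in this benchmark, balanced accuracy and F1 are more informative than plain accuracy, which is why a sketch like this reports them together.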

Performance on variants having overlap with GWAS catalog v1.0.2 (except EnsembleExpr)

Performance on variants with different phastCons100way scores (except EnsembleExpr)

Performance on different cell lines (except EnsembleExpr)