Commit c8cb7a7

final commit readme.md
1 parent 6b46dae commit c8cb7a7

1 file changed

Lines changed: 1 addition & 28 deletions

README.md

@@ -6,12 +6,6 @@ NABench
 </h3>
 </div>

-<div align="center">
-<h4>
-📄 <a href="https://openreview.net/forum?id=d0gvsym66h" target="_blank">Paper</a> | 💻 <a href="https://github.com/mrzzmrzz/NABench" target="_blank">Code</a> | 📚 <a href="https://anonymous.4open.science/r/NABench-20CB/" target="_blank">Resources</a>
-</h4>
-</div>
-
 ## Overview

 Variations in nucleotide sequences often lead to significant changes in fitness. Nucleotide Foundation Models (NFMs) have emerged as a new paradigm in fitness prediction, enabling increasingly accurate estimation of fitness directly from sequence. However, assessing the advantages of these models remains challenging due to the use of diverse and specific experimental datasets, and their performance often varies markedly across different nucleic acid families, complicating **fair comparisons**.
@@ -38,7 +32,7 @@ This suggests fundamental differences in the nature of the representations learn

 ## Baseline Models

-Our benchmark evaluates a total of 27 nucleotide foundation models, which are categorized into four main architectural classes: **BERT-like**, **GPT-like**, **Hyena**, and **LLaMA-based**.
+Our benchmark evaluates a total of 29 nucleotide foundation models, which are categorized into four main architectural classes: **BERT-like**, **GPT-like**, **Hyena**, and **LLaMA-based**.

 | Model | Params | Max Length | Tokenization | Architecture |
 |---|---|---|---|---|
@@ -84,16 +78,6 @@ FILENAME="NABench_DMS_assays.zip"
 curl -o ${FILENAME} https://your-hosting-url/NABench/${FILENAME} # Please replace with your data hosting URL
 unzip ${FILENAME} && rm ${FILENAME}
 ```
-| Data | Size (unzipped) | Filename |
-|---|---|---|
-| DMS Assays (processed) | 50MB | NABench_DMS_assays.zip |
-| SELEX Assays (processed) | 2.1GB | NABench_SELEX_assays.zip |
-| Zero-shot Scores (DMS) | 1.5GB | zero_shot_DMS_scores.zip |
-| Zero-shot Scores (SELEX) | 8.0GB | zero_shot_SELEX_scores.zip |
-| Supervised Scores | 1.2GB | supervised_scores.zip |
-| Cross-Validation Folds | 200MB | cv_folds.zip |
-| Raw Data | 2.5GB | raw_data.zip |
-
 ## How to Contribute

 ### New Assays
@@ -154,14 +138,3 @@ This script will generate detailed performance reports, including metrics aggreg
 We thank all the researchers and experimentalists who developed the original assays and foundation models that made this benchmark possible. We also acknowledge the invaluable contributions of the communities behind **ProteinGym** and **RNAGym**, which heavily inspired this work.

 Please consider citing the corresponding papers of the models and datasets you use from this benchmark.
-
-## Citation
-If you use NABench in your work, please cite the following paper:
-
-```bibtex
-@article{nawork2024,
-title={NABench: Large-Scale Benchmarks of Nucleotide Foundation Models for Fitness Prediction},
-author={Antiquus S. Hippocampus and Natalia Cerebro and Amelie P. Amygdale and Ji Q. Ren and Yevgeny LeNet},
-year={2024},
-journal={ICLR 2026 Conference Track on Datasets and Benchmarks}
-}
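
The four hunk headers above use the unified-diff convention `@@ -old_start,old_len +new_start,new_len @@`. As a rough sanity check, the net line change implied by those headers should match the reported "1 addition & 28 deletions", which is a net change of -27 lines. The sketch below is illustrative only: the helper name `net_line_change` is hypothetical and not part of the repository, and the header strings are abbreviated copies of the ones shown in this commit.

```python
import re

# Matches unified-diff hunk headers such as "@@ -6,12 +6,6 @@ NABench".
# A missing length (e.g. "@@ -3 +3 @@") means a length of 1 by convention.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def net_line_change(diff_text: str) -> int:
    """Sum (new_len - old_len) over every hunk header found in diff_text."""
    total = 0
    for line in diff_text.splitlines():
        match = HUNK_RE.match(line)
        if match:
            old_len = int(match.group(2)) if match.group(2) is not None else 1
            new_len = int(match.group(4)) if match.group(4) is not None else 1
            total += new_len - old_len
    return total


# Hunk headers from this commit (trailing context text abbreviated).
headers = """\
@@ -6,12 +6,6 @@ NABench
@@ -38,7 +32,7 @@ This suggests
@@ -84,16 +78,6 @@ FILENAME="NABench_DMS_assays.zip"
@@ -154,14 +138,3 @@ This script will generate
"""

print(net_line_change(headers))  # -27, i.e. 1 addition minus 28 deletions
```

The header arithmetic only gives the net change. Separating additions from deletions would require counting the `+` and `-` lines inside each hunk.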
