Searching benchmarks
Details of the software, datasets, and commands used.
Software
GTDB r202 representative genomes are used for tests:
- file size: 46.26 GB
- files: 47,894
- bases: 151.94 Gb
KMCP vs COBS
All k-mers are indexed and searched.
Database size and building time:
| | cobs | kmcp |
|---|---|---|
| database size | 86.96 GB | 55.15 GB |
| building time | 29m55s | 21m04s |
| temporary files | 160.76 GB | 935.11 GB |
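To compare the two tools as ratios, the figures in the table can be put through a few lines of arithmetic. The sketch below copies the database sizes and building times from the table; `to_seconds` is a hypothetical helper written for this illustration (it is not part of KMCP or COBS), and it accepts both the `29m55s` and `21min04s` spellings.

```python
import re


def to_seconds(text):
    """Parse durations such as '29m55s' or '21min04s' into seconds."""
    match = re.fullmatch(r"(\d+)m(?:in)?(\d+)s", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = map(int, match.groups())
    return minutes * 60 + seconds


# Values copied from the KMCP vs COBS table.
cobs = {"size_gb": 86.96, "build": "29m55s"}
kmcp = {"size_gb": 55.15, "build": "21min04s"}

size_ratio = kmcp["size_gb"] / cobs["size_gb"]
time_ratio = to_seconds(kmcp["build"]) / to_seconds(cobs["build"])

print(f"KMCP/COBS database size: {size_ratio:.2f}")  # 0.63
print(f"KMCP/COBS building time: {time_ratio:.2f}")  # 0.70
```

On these numbers the KMCP database is roughly 63% of the size of the COBS database and takes roughly 70% of the time to build.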
Searching with bacterial genomes or short reads (~1M reads).
KMCP vs Mash and Sourmash
Only FracMinHash (Scaled MinHash) sketches (scale=1000, for Sourmash and KMCP) or MinHash sketches (scale=3400, for Mash) are indexed and searched.
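For readers unfamiliar with the `scale` parameter, here is a minimal sketch of the FracMinHash idea: a k-mer is kept only if its hash falls in the lowest `1/scale` fraction of the hash space, so `scale=1000` retains about one k-mer in a thousand. This is an illustration under stated assumptions, not the implementation used by KMCP or Sourmash: the hash function (`blake2b` from Python's standard library), the default `k=21`, and the function names are all chosen for this example, and canonical (strand-independent) k-mers are omitted for brevity.

```python
import hashlib

MAX_HASH = 2**64  # size of the 64-bit hash space used in this sketch


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def hash64(kmer):
    """Map a k-mer to a 64-bit integer (stand-in hash, an assumption)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def frac_minhash(seq, k=21, scale=1000):
    """Keep the hashes that fall below MAX_HASH / scale."""
    threshold = MAX_HASH // scale
    return {h for h in map(hash64, kmers(seq, k)) if h < threshold}


def containment(query, target):
    """Fraction of the query sketch that is also present in the target sketch."""
    if not query:
        return 0.0
    return len(query & target) / len(query)
```

Because the threshold depends only on `scale`, a sketch built at a larger scale is always a subset of the sketch built at a smaller scale from the same sequence, and sketch size grows with genome size instead of being fixed as in classic MinHash.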
Database size and building time:
| | mash | sourmash | kmcp |
|---|---|---|---|
| database size | 1.22 GB | 5.19 GB | 1.52 GB |
| building time | 13m30s | 40m39s | 7m59s |
| temporary files | - | - | 1.85 GB |
Searching with bacterial genomes.
Result
