Download
KMCP is implemented in the Go programming language, and statically linked executable binary files are freely available.
SIMD instructions support
SIMD extensions including AVX512, AVX2, and SSE2 are sequentially detected and used in two packages for better searching performance.
- pand, for accelerating searching on databases constructed with multiple hash functions.
- pospop, for batch counting matched k-mers in bloom filters.
ARM architecture is supported, but kmcp search would be slower.
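To see which of these extensions a Linux machine advertises, the CPU flags can be inspected by hand. The sketch below is only a user-side check and is not how KMCP detects SIMD support internally; the function name simd_level is hypothetical, and it assumes the x86 flag names found in /proc/cpuinfo (avx512*, avx2, sse2).

```bash
# Report the best SIMD level found in a CPU flags string.
# simd_level is a hypothetical helper, not part of KMCP.
simd_level() {
    # Pad with spaces so each flag can be matched as a whole word.
    flags=" $1 "
    case "$flags" in
        *" avx512"*) echo "avx512" ;;   # any AVX-512 subset, e.g. avx512f
        *" avx2 "*)  echo "avx2" ;;
        *" sse2 "*)  echo "sse2" ;;
        *)           echo "none" ;;
    esac
}

# On Linux x86 systems, feed it the first "flags" line of /proc/cpuinfo.
if [ -r /proc/cpuinfo ]; then
    simd_level "$(grep -m1 '^flags' /proc/cpuinfo)"
fi
```

On ARM machines /proc/cpuinfo has no flags line, so the check prints none, which matches the note above that kmcp search runs without these extensions there.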
Current Version
v0.9.1 - 2022-12-26 
kmcp search
- faster speed for ARM architectures.
- fix compilation for ARM architectures.
Links
OS | Arch | File, China mirror | Download Count |
---|---|---|---|
Linux | 64-bit | kmcp_linux_amd64.tar.gz, China mirror | |
Linux | arm64 | kmcp_linux_arm64.tar.gz, China mirror | |
macOS | 64-bit | kmcp_darwin_amd64.tar.gz, China mirror | |
macOS | arm64 | kmcp_darwin_arm64.tar.gz, China mirror | |
Windows | 64-bit | kmcp_windows_amd64.exe.tar.gz, China mirror | |
Notes:
- Please open an issue to request binaries for other platforms, or compile from the source.
- Run kmcp version to check for updates!
- Run kmcp autocompletion to update the shell autocompletion script!
Installation
Method 1: Install using conda

conda install -c bioconda kmcp
Method 2: Download binaries
Download the compressed executable file for your operating system, and decompress it with the tar -zxvf *.tar.gz command or other tools. Then:
- For Linux-like systems:
    - If you have root privileges, simply copy it to /usr/local/bin:
        sudo cp kmcp /usr/local/bin/
    - Or copy it to any directory in the environment variable PATH:
        mkdir -p $HOME/bin/; cp kmcp $HOME/bin/
- For Windows, just copy kmcp.exe to C:\WINDOWS\system32.
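As a minimal sketch of picking the right archive for Linux or macOS, the helper below maps the output of uname to the file names listed in the Links table. The function name kmcp_asset_name is hypothetical, and the archive itself still has to be downloaded from the links above.

```bash
# Map an OS name and CPU architecture (as printed by uname) to the
# archive names from the Links table. kmcp_asset_name is a hypothetical
# helper; Windows is not covered here.
kmcp_asset_name() {
    os="$1"
    arch="$2"
    case "$os" in
        Linux)  os_tag="linux" ;;
        Darwin) os_tag="darwin" ;;
        *) echo "unsupported OS: $os" >&2; return 1 ;;
    esac
    case "$arch" in
        x86_64|amd64)  arch_tag="amd64" ;;
        aarch64|arm64) arch_tag="arm64" ;;
        *) echo "unsupported architecture: $arch" >&2; return 1 ;;
    esac
    echo "kmcp_${os_tag}_${arch_tag}.tar.gz"
}

# Typical use after downloading the archive into the current directory:
#   file=$(kmcp_asset_name "$(uname -s)" "$(uname -m)")
#   tar -zxvf "$file"
#   mkdir -p "$HOME/bin" && cp kmcp "$HOME/bin/"
```

Keeping the mapping in a function makes it easy to reuse in a setup script; make sure $HOME/bin is in PATH afterwards.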
Method 3: Compile from source
- Install Go

    wget https://go.dev/dl/go1.17.13.linux-amd64.tar.gz
    tar -zxf go1.17.13.linux-amd64.tar.gz -C $HOME/

    # or
    #   echo "export PATH=$PATH:$HOME/go/bin" >> ~/.bashrc
    #   source ~/.bashrc
    export PATH=$PATH:$HOME/go/bin

- Compile KMCP

    # ------------- the latest stable version -------------
    go get -v -u github.com/shenwei356/kmcp/kmcp

    # The executable binary file is located in:
    #   ~/go/bin/kmcp
    # You can also move it to anywhere in the $PATH
    mkdir -p $HOME/bin
    cp ~/go/bin/kmcp $HOME/bin/

    # --------------- the development version --------------
    git clone https://github.com/shenwei356/kmcp
    cd kmcp/kmcp/
    go build

    # The executable binary file is located in:
    #   ./kmcp
    # You can also move it to anywhere in the $PATH
    mkdir -p $HOME/bin
    cp ./kmcp $HOME/bin/
Shell-completion
Supported shell: bash|zsh|fish|powershell
Bash:
# generate completion shell
kmcp autocompletion --shell bash
# configure if never did.
# install bash-completion if the "complete" command is not found.
echo "for bcfile in ~/.bash_completion.d/* ; do source \$bcfile; done" >> ~/.bash_completion
echo "source ~/.bash_completion" >> ~/.bashrc
Zsh:
# generate completion shell
kmcp autocompletion --shell zsh --file ~/.zfunc/_kmcp
# configure if never did
echo 'fpath=( ~/.zfunc "${fpath[@]}" )' >> ~/.zshrc
echo "autoload -U compinit; compinit" >> ~/.zshrc
fish:
kmcp autocompletion --shell fish --file ~/.config/fish/completions/kmcp.fish
Release History
v0.9.0 - 2022-09-28 
- compute:
    - smaller output files and faster speed.
    - more even genome splitting.
- index:
    - faster speed due to smaller input files.
- search:
    - more accurate and smaller query FPR following Theorem 2 in SBT paper, instead of the Chernoff bound.
    - change the default value of -f/--max-fpr from 0.05 to 0.01.
    - 10-20% speedup.
- profile:
    - more accurate abundance estimation using EM algorithm.
    - change the default value of -f/--max-fpr from 0.05 to 0.01.
    - mode 0: change the default value of -H/--min-hic-ureads-qcov from 0.55 to 0.7.
    - increase float width of reference coverage in KMCP profile format from 2 to 6.
- utils query-fpr:
    - compute query FPR following Theorem 2 in SBT paper, instead of the Chernoff bound.
- new commands:
    - utils split-genomes for splitting genomes into chunks.
    - utils ref-info for printing information of reference (chunks), including the number of k-mers and the actual false-positive rate.
v0.8.3 - 2022-08-15 
- kmcp: fix compiling from source for ARM architectures. #17
- search:
    - fix searching with paired-end reads where the read2 is shorter than the value of --min-query-len. #10
    - fix the log. #8
    - a new flag -f/--max-fpr: maximum false positive rate of a query (default 0.05). It reduces the unnecessary output when searching with a low minimum query coverage (-t/--min-query-cov).
- profile:
    - recommend using the flag --no-amb-corr to disable ambiguous reads correction when >= 1000 candidates are detected.
    - fix logging when using --level strain and no taxonomy given.
v0.8.2 - 2022-03-26 
- search:
    - flag -g/--query-whole-file:
        - fix panic for invalid input.
        - add gaps of k-1 bp before concatenating seqs.
        - add warning for invalid input.
- profile:
    - allow modifying parts of parameters in preset profiling modes. #5
    - decrease thresholds of minimum reads and unique reads in preset profiling modes 1 and 2 for low coverage sequence data. The profiling results generated with mode 3 in the manuscript are not affected.
v0.8.1 - 2022-03-07 
- update help message, show common usages, add examples, add notes to important options.
v0.8.0 - 2022-02-24 
- commands:
    - new command utils cov2simi: Convert k-mer coverage to sequence similarity.
    - new command utils query-fpr: Compute the maximum false positive rate of a query.
- compute:
    - update doc.
    - add flags compatibility check.
- search:
    - output the false positive rate of each match, rather than the FPR upper bound of the query. This could save some short queries with high similarity.
    - change default values of reads filter, because clinical data contain many short reads.
        - -c/--min-uniq-reads: 30 -> 10.
        - -m/--min-query-len: 70 -> 30.
    - update doc.
- profile:
    - rename flags:
        - --keep-main-matches -> --keep-main-matches.
        - --keep-perfect-match -> --keep-perfect-matches.
    - change default values:
        - --max-qcov-gap: 0.2 -> 0.4.
    - mode 0 (pathogen detection):
        - switch on flag --keep-main-matches
        - use --max-qcov-gap 0.4
    - update doc.
v0.7.1 - 2022-02-08 
- profile:
    - new flag --metaphlan-report-version and the default value is 3. #4
    - column name renamed: from fragsFrac, fragsRelDepth, fragsRelDepthStd to chunksFrac, chunksRelDepth, chunksRelDepthStd.
    - fix computation of chunksRelDepth.
    - slightly improve sensitivity for -m 0.
v0.7.0 - 2022-01-24 
- commands:
    - new command utils filter: Filter search results and find species-specific queries.
    - new command utils merge-regions: Merge species/assembly-specific regions.
    - rename info to utils index-info.
- compute:
    - skip k-mers containing Ns.
    - when splitting a genome into fragments, sequences are concatenated with k-1 'N's instead of direct concatenation. It eliminates fake k-mers at the concatenation position.
    - set default value for flag -N/--ref-name-regexp: (?i)(.+)\.(f[aq](st[aq])?|fna)(.gz)?$.
    - fix a rare bug when splitting FASTQ files.
- search:
    - support searching with paired-end reads, which has a higher specificity and a lower sensitivity. A flag --try-se is added for searching read1/read2 when the paired-end reads have no hits.
    - fix matches order of a query.
    - fix queries with many Ns.
    - change default value of flag -t/--min-query-qcov from 0.6 to 0.55 (similarity ~96.5%).
    - change default value of flag -n/--keep-top-scores from 5 to 0, i.e., keep all matches by default.
    - new flag -w/--load-whole-db: load all index files into memory.
    - 10-25% faster.
    - better log.
- merge:
    - fix adding up hits.
    - fix bug of incorrect order, reduce memory usage.
    - support one input file.
- profile:
    - change analysis workflow, using 4 stages.
    - output format change: new column coverage, fragsRelDepth and fragsRelDepthStd.
    - change default file extension of binning file.
    - check if the taxid of a target is given by taxid mapping file.
    - automatically switch to the new taxid for a merged one.
    - change computation of score.
    - new flag -d/--max-frags-depth-stdev.
    - new option -m/--mode.
    - change default value of flag -t/--min-query-qcov from 0.6 to 0.55 (similarity ~96.5%).
    - change default value of flag -n/--keep-top-qcovs from 5 to 0 (keep all matches).
    - change default value of flag -f/--max-fpr from 0.01 to 0.05.
    - change default value of flag -H/--min-hic-ureads-qcov from 0.8 to 0.75 (similarity ~98%).
    - faster search result parsing.
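To illustrate what the default -N/--ref-name-regexp value from v0.7.0 matches, here is a small bash sketch. It assumes the reference name is the first capture group of the file name, which the changelog does not state explicitly. The function name ref_name is hypothetical, and bash's nocasematch option stands in for the (?i) prefix, which POSIX regular expressions do not support.

```bash
# Extract a reference name from a sequence file name using the default
# pattern (?i)(.+)\.(f[aq](st[aq])?|fna)(.gz)?$ from the changelog.
# ref_name is a hypothetical helper, not part of KMCP.
ref_name() {
    # Same pattern without the (?i) prefix; the dot in (.gz) is kept
    # unescaped, exactly as in the documented default.
    local re='(.+)\.(f[aq](st[aq])?|fna)(.gz)?$'
    local name=""
    shopt -s nocasematch            # emulate (?i)
    if [[ "$1" =~ $re ]]; then
        name="${BASH_REMATCH[1]}"   # assumed: group 1 is the reference name
    fi
    shopt -u nocasematch
    [ -n "$name" ] && echo "$name"
}

ref_name "GCF_000001.1.fna.gz"
ref_name "genome.fasta"
```

File names with extensions outside the pattern, such as notes.txt, produce no output and a non-zero exit status.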
v0.6.0 - 2021-08-13
- new command merge: merge search results from multiple databases.
- compute:
    - fix splitting very short genomes.
    - remove flag -e/--exact-number, making it default.
- index:
    - do not round up sizes of indexes. The searching speed is not affected and is even faster due to optimization of the search command.
    - use three k-mers thresholds to control index file size.
    - better control of concurrency number and better progress bar.
    - do not support RAMBO index anymore.
- search:
    - 1.37X speedup, and faster for database with two or more hash functions.
    - new flag -S/--do-not-sort.
- profile:
    - fix a nil pointer bug when no taxid mapping data given.
    - fix number of ureads.
    - new flag -m/--keep-main-matches and --max-score-gap
v0.5.0 - 2021-06-24
- compute:
    - support multiple sizes of k-mer.
    - fix bug of --by-seq.
    - more log.
- index:
    - default block size is computed by -j/--threads instead of number of CPUs.
- search:
    - show real-time processing speed.
    - new flag -g/--query-whole-file.
    - new flag -u/--kmer-dedup-threshold.
    - new flag -m/--min-query-len.
    - increase speed for database with multiple hashes.
- profile:
    - better decision of the existence of a reference.
    - new flag -B/--binning-result for output reads binning result.
    - new flag -m/--norm-abund.
v0.4.0 - 2021-04-08
- new command: profile for generating taxonomic profile from search result.
- compute:
    - new flag -B/--seq-name-filter for filtering out unwanted sequences like plasmid.
    - new flag -N/--ref-name-regexp for extracting reference name from sequence file.
- search:
    - change default threshold value.
    - new flag -n/--keep-top-scores for keeping matches with the top N score.
v0.3.0 - 2021-03-16
- use --quiet to replace --verbose, making printing log info default.
- search:
    - fix computing intersection between repeats.
    - fix closing mmap on Windows.
    - change output format and add Jaccard Index.
    - speedup by parallelizing name mapping and database closing.
    - flush result immediately.
    - keep the output order by default
- compute: change default file regexp for matching .fna files.
- autocompletion: support bash, zsh, fish, powershell.
v0.2.1 - 2020-12-31
- index: reduce memory occupation.
v0.2.0 - 2020-12-30
- Add support of RAMBO like indexing.
- Limit to only one input database.
- Change output format.
v0.1.0 - 2020-xx-xx
- First release with basic function.