Download

KMCP is implemented in the Go programming language. Statically linked executable binaries are freely available.

SIMD instructions support

SIMD extensions, including AVX512, AVX2, and SSE2, are detected in turn and used in two packages for better search performance.

  • pand, for accelerating searching on databases constructed with multiple hash functions.
  • pospop, for batch counting matched k-mers in bloom filters.
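
To see which of these extensions a Linux machine offers, the CPU flags in /proc/cpuinfo can be inspected. The sketch below is not part of KMCP: best_simd is a hypothetical helper, and it assumes the Linux kernel's flag names (avx512f, avx2, sse2). The exact AVX-512 subsets that pand and pospop need are not stated here, so avx512f serves only as a rough proxy.

```shell
# Report the widest SIMD extension named in a CPU flags string.
# best_simd is a hypothetical helper, not a kmcp command.
best_simd() {
    case " $1 " in
        *" avx512f "*) echo AVX512 ;;
        *" avx2 "*)    echo AVX2 ;;
        *" sse2 "*)    echo SSE2 ;;
        *)             echo none ;;
    esac
}

# Linux on x86 only: use the flags of the first CPU listed.
# Other systems have no "flags" line, so the result is "none" or nothing.
if [ -r /proc/cpuinfo ]; then
    best_simd "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)"
fi
```

The checks run from widest to narrowest, mirroring the detection order described above.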

Current Version

v0.9.0 - 2022-09-28

  • compute:
    • smaller output files and faster speed.
    • more even genome splitting.
  • index:
    • faster speed due to smaller input files.
  • search:
    • more accurate and smaller query FPR, following Theorem 2 in the SBT paper instead of the Chernoff bound.
    • change the default value of -f/--max-fpr from 0.05 to 0.01.
    • 10-20% speedup.
  • profile:
    • more accurate abundance estimation using EM algorithm.
    • change the default value of -f/--max-fpr from 0.05 to 0.01.
    • mode 0: change the default value of -H/--min-hic-ureads-qcov from 0.55 to 0.7.
    • increase float width of reference coverage in KMCP profile format from 2 to 6.
  • utils query-fpr:
    • compute query FPR following Theorem 2 in the SBT paper, instead of the Chernoff bound.
  • new commands:
    • utils split-genomes for splitting genomes into chunks.
    • utils ref-info for printing information of reference (chunks), including the number of k-mers and the actual false-positive rate.

OS       Arch    File (a China mirror is also available)
Linux    64-bit  kmcp_linux_amd64.tar.gz
Linux    arm64   kmcp_linux_arm64.tar.gz
macOS    64-bit  kmcp_darwin_amd64.tar.gz
macOS    arm64   kmcp_darwin_arm64.tar.gz
Windows  64-bit  kmcp_windows_amd64.exe.tar.gz

Notes:

  • Please open an issue to request binaries for other platforms, or compile from source.
  • Run kmcp version to check for updates.
  • Run kmcp autocompletion to update the shell autocompletion script.
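
Picking the right file from the list above can be scripted. The following is a minimal sketch, assuming the usual values printed by uname -s and uname -m; kmcp_asset is a hypothetical helper name, not a kmcp command, and the Windows file is left out because uname is not normally available there.

```shell
# Map `uname -s` / `uname -m` output to one of the file names listed above.
# kmcp_asset is a hypothetical helper, not a kmcp command.
kmcp_asset() {
    case "$1/$2" in
        Linux/x86_64)              echo kmcp_linux_amd64.tar.gz ;;
        Linux/aarch64|Linux/arm64) echo kmcp_linux_arm64.tar.gz ;;
        Darwin/x86_64)             echo kmcp_darwin_amd64.tar.gz ;;
        Darwin/arm64)              echo kmcp_darwin_arm64.tar.gz ;;
        *) echo "no prebuilt binary listed for $1 $2" >&2; return 1 ;;
    esac
}

# Print the file name for the current machine, if one is listed.
kmcp_asset "$(uname -s)" "$(uname -m)" || echo "request a binary or compile from source"
```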

Installation

Method 1: Install using conda

conda install -c bioconda kmcp

Method 2: Download binaries

Download the compressed executable file for your operating system and decompress it with the tar -zxvf *.tar.gz command or another tool. Then:

  • For Linux-like systems

    • If you have root privilege, simply copy it to /usr/local/bin:

      sudo cp kmcp /usr/local/bin/
      
    • Or copy it to any directory listed in the PATH environment variable, for example $HOME/bin if it is on your PATH:

      mkdir -p $HOME/bin/; cp kmcp $HOME/bin/
      
  • For Windows, just copy kmcp.exe to C:\WINDOWS\system32.
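
On Linux-like systems, copying kmcp to $HOME/bin only helps if that directory is actually on PATH. A minimal check could look like the sketch below; on_path is a hypothetical helper name, and the suggested export line assumes a bash setup that reads ~/.bashrc.

```shell
# Succeed if directory $1 is one of the colon-separated entries of $PATH.
# on_path is a hypothetical helper, not a kmcp command.
on_path() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

if on_path "$HOME/bin"; then
    echo "$HOME/bin is on PATH"
else
    # Printed literally so it can be pasted into ~/.bashrc.
    echo 'add this line to ~/.bashrc: export PATH=$PATH:$HOME/bin'
fi
```

Wrapping both sides in colons makes the match exact, so /opt does not count as present merely because /opt/tools is.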

Method 3: Compile from source

  1. Install Go

    wget https://go.dev/dl/go1.17.13.linux-amd64.tar.gz
    
    tar -zxf go1.17.13.linux-amd64.tar.gz -C $HOME/
    
    # or make the change permanent (single quotes keep $PATH unexpanded until ~/.bashrc is read):
    #   echo 'export PATH=$PATH:$HOME/go/bin' >> ~/.bashrc
    #   source ~/.bashrc
    export PATH=$PATH:$HOME/go/bin
    
  2. Compile KMCP

    # ------------- the latest stable version -------------
    
    go get -v -u github.com/shenwei356/kmcp/kmcp
    
    # The executable binary file is located in:
    #   ~/go/bin/kmcp
    # You can also copy it to any directory in $PATH
    mkdir -p $HOME/bin
    cp ~/go/bin/kmcp $HOME/bin/
    
    # --------------- the development version --------------
    
    git clone https://github.com/shenwei356/kmcp
    cd kmcp/kmcp/
    go build
    
    # The executable binary file is located in:
    #   ./kmcp
    # You can also copy it to any directory in $PATH
    mkdir -p $HOME/bin
    cp ./kmcp $HOME/bin/
    

Shell-completion

Supported shells: bash|zsh|fish|powershell

Bash:

# generate the completion script
kmcp autocompletion --shell bash

# one-time configuration, if not done before.
# install bash-completion if the "complete" command is not found.
echo "for bcfile in ~/.bash_completion.d/* ; do source \$bcfile; done" >> ~/.bash_completion
echo "source ~/.bash_completion" >> ~/.bashrc

Zsh:

# generate the completion script
kmcp autocompletion --shell zsh --file ~/.zfunc/_kmcp

# one-time configuration, if not done before
echo 'fpath=( ~/.zfunc "${fpath[@]}" )' >> ~/.zshrc
echo "autoload -U compinit; compinit" >> ~/.zshrc

fish:

kmcp autocompletion --shell fish --file ~/.config/fish/completions/kmcp.fish

Release History

v0.8.3 - 2022-08-15

  • kmcp: fix compiling from source for ARM architectures. #17
  • search:
    • fix searching with paired-end reads where the read2 is shorter than the value of --min-query-len. #10
    • fix the log. #8
    • a new flag -f/--max-fpr: maximum false positive rate of a query (default 0.05). It reduces unnecessary output when searching with a low minimum query coverage (-t/--min-query-cov).
  • profile:
    • recommend using the flag --no-amb-corr to disable ambiguous reads correction when >= 1000 candidates are detected.
    • fix logging when using --level strain and no taxonomy given.

v0.8.2 - 2022-03-26

  • search:
    • flag -g/--query-whole-file:
      • fix panic for invalid input.
      • add gaps of k-1 bp before concatenating seqs.
    • add warning for invalid input.
  • profile:
    • allow modifying parts of parameters in preset profiling modes. #5
    • decrease thresholds of minimum reads and unique reads in preset profiling modes 1 and 2 for low-coverage sequence data. The profiling results generated with mode 3 in the manuscript are not affected.

v0.8.1 - 2022-03-07

  • update help message, show common usages, add examples, add notes to important options.

v0.8.0 - 2022-02-24

  • commands:
    • new command utils cov2simi: Convert k-mer coverage to sequence similarity.
    • new command utils query-fpr: Compute the maximum false positive rate of a query.
  • compute:
    • update doc.
    • add flags compatibility check.
  • search:
    • output the false positive rate of each match, rather than the FPR upper bound of the query. This could save some short queries with high similarity.
    • change default values of reads filter, because clinical data contain many short reads.
      • -c/--min-uniq-reads: 30 -> 10.
      • -m/--min-query-len: 70 -> 30.
    • update doc.
  • profile:
    • rename flags:
      • --keep-main-match -> --keep-main-matches.
      • --keep-perfect-match -> --keep-perfect-matches.
    • change default values:
      • --max-qcov-gap: 0.2 -> 0.4.
    • mode 0 (pathogen detection):
      • switch on flag --keep-main-matches
      • use --max-qcov-gap 0.4
    • update doc.

v0.7.1 - 2022-02-08

  • profile:
    • new flag --metaphlan-report-version and the default value is 3. #4
    • column name renamed: from fragsFrac, fragsRelDepth, fragsRelDepthStd to chunksFrac, chunksRelDepth, chunksRelDepthStd.
    • fix computation of chunksRelDepth.
    • slightly improve sensitivity for -m 0.

v0.7.0 - 2022-01-24

  • commands:
    • new command utils filter: Filter search results and find species-specific queries.
    • new command utils merge-regions: Merge species/assembly-specific regions.
    • rename info to utils index-info.
  • compute:
    • skip k-mer containing Ns.
    • when splitting a genome into fragments, sequences are joined with k-1 'N's instead of being concatenated directly. This eliminates fake k-mers at the concatenation position.
    • set default value for flag -N/--ref-name-regexp: (?i)(.+)\.(f[aq](st[aq])?|fna)(.gz)?$.
    • fix a rare bug when splitting FASTQ files.
  • search:
    • support searching with paired-end reads, which has a higher specificity and a lower sensitivity. A flag --try-se is added to search read1/read2 when the paired-end reads have no hits.
    • fix matches order of a query.
    • fix queries with many Ns.
    • change default value of flag -t/--min-query-qcov from 0.6 to 0.55 (similarity ~96.5%).
    • change default value of flag -n/--keep-top-scores from 5 to 0, i.e., keep all matches by default.
    • new flag -w/--load-whole-db: load all index files into memory.
    • 10-25% faster.
    • better log.
  • merge:
    • fix adding up hits.
    • fix bug of incorrect order, reduce memory usage.
    • support one input file.
  • profile:
    • change analysis workflow, using 4 stages.
    • output format change: new column coverage, fragsRelDepth and fragsRelDepthStd.
    • change default file extension of binning file.
    • check if the taxid of a target is given by taxid mapping file.
    • automatically switch to the new taxid for a merged one.
    • change computation of score.
    • new flag -d/--max-frags-depth-stdev.
    • new option -m/--mode.
    • change default value of flag -t/--min-query-qcov from 0.6 to 0.55 (similarity ~96.5%).
    • change default value of flag -n/--keep-top-qcovs from 5 to 0 (keep all matches).
    • change default value of flag -f/--max-fpr from 0.01 to 0.05.
    • change default value of flag -H/--min-hic-ureads-qcov from 0.8 to 0.75 (similarity ~98%).
    • faster search result parsing.
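
The effect of the k-1 'N' gap introduced in this release can be illustrated with a toy example. This is only a sketch, not KMCP's Go implementation: kmers_without_n is a hypothetical helper, and k = 4 and the two short sequences are made-up values chosen for readability.

```shell
# Print every k-mer of sequence $1 (k = $2) that contains no 'N'.
# kmers_without_n is a hypothetical helper, not a kmcp command.
kmers_without_n() {
    printf '%s\n' "$1" | awk -v k="$2" '{
        for (i = 1; i + k - 1 <= length($0); i++) {
            kmer = substr($0, i, k)
            if (index(kmer, "N") == 0) print kmer
        }
    }'
}

k=4
a=ACGT
b=TTGA

# Build a gap of k-1 Ns.
gap=
i=1
while [ "$i" -lt "$k" ]; do gap="${gap}N"; i=$((i + 1)); done

echo "direct concatenation:"
kmers_without_n "$a$b" "$k"
echo "with a gap of k-1 Ns:"
kmers_without_n "$a$gap$b" "$k"
```

With direct concatenation, the k-mers CGTT, GTTT, and TTTG span the junction and exist in neither input sequence. With the gap, every window that crosses the junction contains at least one N and is skipped, so only ACGT and TTGA remain.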

v0.6.0 - 2021-08-13

  • new command:
    • merge: merge search results from multiple databases.
  • compute:
    • fix splitting very short genomes.
    • remove flag -e/--exact-number, making it default.
  • index:
    • do not round up sizes of indexes. The searching speed is not affected, and is even faster due to optimization of the search command.
    • use three k-mers thresholds to control index file size.
    • better control of concurrency number and better progress bar.
    • do not support RAMBO index anymore.
  • search:
    • 1.37X speedup, and faster for database with two or more hash functions.
    • new flag -S/--do-not-sort.
  • profile:
    • fix a nil pointer bug when no taxid mapping data given.
    • fix number of ureads.
    • new flag -m/--keep-main-matches and --max-score-gap

v0.5.0 - 2021-06-24

  • compute:
    • support multiple sizes of k-mer.
    • fix bug of --by-seq.
    • more log.
  • index:
    • default block size is computed by -j/--threads instead of number of CPUs.
  • search:
    • show real-time processing speed.
    • new flag -g/--query-whole-file.
    • new flag -u/--kmer-dedup-threshold.
    • new flag -m/--min-query-len.
    • increase speed for databases with multiple hashes.
  • profile:
    • better decision of the existence of a reference.
    • new flag -B/--binning-result for output reads binning result.
    • new flag -m/--norm-abund.

v0.4.0 - 2021-04-08

  • new command:
    • profile for generating taxonomic profile from search result.
  • compute:
    • new flag -B/--seq-name-filter for filtering out unwanted sequences like plasmid.
    • new flag -N/--ref-name-regexp for extracting reference name from sequence file.
  • search:
    • change default threshold value.
    • new flag -n/--keep-top-scores for keeping matches with the top N score.

v0.3.0 - 2021-03-16

  • use --quiet to replace --verbose, making printing log info default.
  • search:
    • fix computing intersection between repeats.
    • fix closing mmap on Windows.
    • change output format and add Jaccard Index.
    • speedup by parallelizing name mapping and database closing.
    • flush result immediately.
    • keep the output order by default.
  • compute: change default file regexp for matching .fna files.
  • autocompletion: support bash, zsh, fish, powershell.

v0.2.1 - 2020-12-31

  • index: reduce memory occupation.

v0.2.0 - 2020-12-30

  • Add support for RAMBO-like indexing.
  • Limit to only one input database.
  • Change output format.

v0.1.0 - 2020-xx-xx

  • First release with basic function.