Download
unikmer is implemented in the Go programming language; statically linked executable binaries are freely available.
Current Version
v0.20.0 - 2023-11-11
- unikmer:
    - update help messages.
    - rename subcommand uniqs to map.
    - do not add the extension .unik if the value of flag -o/--out-prefix already has one.
- unikmer sort:
    - fix the bug of using both -d/--repeated and -m/--chunk-size, which seems to have existed for 4 years.
- unikmer merge:
    - fix the bug of missing one record when taxid information is contained.
- unikmer num:
    - flag -f/--force also supports sorted files.
- unikmer map:
    - fix a bug of missing some regions.
- unikmer locate:
    - fix seq ID error in output.
    - deduplicate output.
Links
OS | Arch | File, China mirror |
---|---|---|
Linux | 64-bit | unikmer_linux_amd64.tar.gz, China mirror |
Linux | arm64 | unikmer_linux_arm64.tar.gz, China mirror |
macOS | 64-bit | unikmer_darwin_amd64.tar.gz, China mirror |
macOS | arm64 | unikmer_darwin_arm64.tar.gz, China mirror |
Windows | 64-bit | unikmer_windows_amd64.exe.tar.gz, China mirror |
Notes:
- Please open an issue to request binaries for other platforms.
- Run unikmer version to check for updates!
- Run unikmer autocompletion to update the shell autocompletion script!
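Picking the right archive from the table can be scripted. The following is only a sketch: the helper name unikmer_asset is hypothetical and not part of unikmer, and it assumes the usual strings printed by uname -s and uname -m. It covers only the archives listed above and reports an error for anything else.

```shell
#!/bin/sh
# Hypothetical helper (not part of unikmer): map an OS name and a machine
# name, as printed by `uname -s` and `uname -m`, to one of the archive
# names listed in the table above.
unikmer_asset() {
    os=$1
    arch=$2

    # Normalize the machine name to the suffix used in the file names.
    case "$arch" in
        x86_64|amd64)  arch=amd64 ;;
        aarch64|arm64) arch=arm64 ;;
        *) echo "unsupported architecture: $arch" >&2; return 1 ;;
    esac

    case "$os" in
        Linux)  echo "unikmer_linux_${arch}.tar.gz" ;;
        Darwin) echo "unikmer_darwin_${arch}.tar.gz" ;;
        MINGW*|MSYS*|CYGWIN*)
            # Only a 64-bit (amd64) Windows build is listed.
            if [ "$arch" = amd64 ]; then
                echo "unikmer_windows_amd64.exe.tar.gz"
            else
                echo "no Windows build listed for: $arch" >&2
                return 1
            fi ;;
        *) echo "unsupported OS: $os" >&2; return 1 ;;
    esac
}

# Print the archive name for the current machine.
unikmer_asset "$(uname -s)" "$(uname -m)"
```

The function only builds a file name; downloading is left to the links in the table.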
Installation
Method 1: Install using conda
conda install -c bioconda unikmer
Method 2: Download binaries
Download the compressed executable file for your operating system, and decompress it with the command tar -zxvf *.tar.gz or with other tools. Then:
- For Linux-like systems:
    - If you have root privileges, simply copy it to /usr/local/bin:
      sudo cp unikmer /usr/local/bin/
    - Or copy it to any directory listed in the environment variable PATH:
      mkdir -p $HOME/bin/; cp unikmer $HOME/bin/
- For Windows, just copy unikmer.exe to C:\WINDOWS\system32.
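Copying the binary to $HOME/bin only helps if that directory is actually listed in PATH. A minimal sketch for checking this and adding the directory for the current session follows; the helper name path_contains is hypothetical, and making the change permanent (for example in ~/.bashrc) is left to the reader.

```shell
#!/bin/sh
# Hypothetical helper: succeed if directory $1 is one of the
# colon-separated entries of the PATH-like string $2.
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Add $HOME/bin for the current session only if it is not already listed.
if ! path_contains "$HOME/bin" "$PATH"; then
    PATH="$HOME/bin:$PATH"
    export PATH
fi

# Report which unikmer executable (if any) the shell would now run.
command -v unikmer || echo "unikmer not found in PATH" >&2
```

Wrapping both strings in colons makes the match exact, so /usr does not count as present just because /usr/bin is.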
Method 3: Compile from source
- Install Go:

    wget https://go.dev/dl/go1.17.13.linux-amd64.tar.gz
    tar -zxf go1.17.13.linux-amd64.tar.gz -C $HOME/

    # or
    #   echo "export PATH=$PATH:$HOME/go/bin" >> ~/.bashrc
    #   source ~/.bashrc
    export PATH=$PATH:$HOME/go/bin

- Compile unikmer:

    # ------------- the latest stable version -------------
    go get -v -u github.com/shenwei356/unikmer/unikmer

    # The executable binary file is located in:
    #   ~/go/bin/unikmer
    # You can also move it to anywhere in the $PATH
    mkdir -p $HOME/bin
    cp ~/go/bin/unikmer $HOME/bin/

    # --------------- the development version --------------
    git clone https://github.com/shenwei356/unikmer
    cd unikmer/unikmer/
    go build

    # The executable binary file is located in:
    #   ./unikmer
    # You can also move it to anywhere in the $PATH
    mkdir -p $HOME/bin
    cp ./unikmer $HOME/bin/
Shell-completion
Supported shell: bash|zsh|fish|powershell
Bash:
# generate completion shell
unikmer autocompletion --shell bash
# configure once, if not done before.
# install bash-completion if the "complete" command is not found.
echo "for bcfile in ~/.bash_completion.d/* ; do source \$bcfile; done" >> ~/.bash_completion
echo "source ~/.bash_completion" >> ~/.bashrc
Zsh:
# generate completion shell
unikmer autocompletion --shell zsh --file ~/.zfunc/_unikmer
# configure once, if not done before.
echo 'fpath=( ~/.zfunc "${fpath[@]}" )' >> ~/.zshrc
echo "autoload -U compinit; compinit" >> ~/.zshrc
fish:
unikmer autocompletion --shell fish --file ~/.config/fish/completions/unikmer.fish
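The configuration lines for Bash and Zsh are meant to be added once; running the echo ... >> commands again appends duplicates. One possible way to make such a step repeatable is sketched below. The helper name append_once is hypothetical, and the demonstration writes to a scratch file rather than a real startup file.

```shell
#!/bin/sh
# Hypothetical helper: append line $1 to file $2 unless the file already
# contains exactly that line.
append_once() {
    line=$1
    file=$2
    touch "$file"
    # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demonstration on a scratch file, so no real startup file is touched.
demo=$(mktemp)
append_once 'source ~/.bash_completion' "$demo"
append_once 'source ~/.bash_completion' "$demo"   # second call is a no-op
cat "$demo"
rm -f "$demo"
```

After both calls the scratch file holds the line only once. To use the helper for real, pass ~/.bashrc or ~/.zshrc as the second argument.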
Release History
v0.19.1 - 2022-12-26
- unikmer: when the environment variable UNIKMER_DB is set, explicitly setting --data-dir will override the value of UNIKMER_DB.
- unikmer uniqs: skip sequences shorter than K.
- unikmer count/encode: limit the maximum k-mer size to 64.
v0.19.0 - 2022-04-25
- rename command genautocomplete to autocompletion.
- remove command help.
- change default value of option -j from 2 to 4.
- unikmer count/uniqs/locate: new flag -B/--seq-name-filter for filtering out unwanted sequences like plasmids.
- unikmer count: add support of .xz and .zst files.
v0.18.8 - 2021-09-17
- use new version of nthash with better performance.
- unikmer info: fix typos.
v0.18.7 - 2021-08-30
- unikmer: better counting speed by upstream optimization of FASTA/Q parsing.
- unikmer concat: fix parsing flag -n.
v0.17.3 - 2021-05-16
- unikmer: fix building for 386. #21
v0.17.2 - 2021-02-05
- unikmer: slight speedup for computing LCA.
- unikmer rfilter:
    - flag -E/--equal-to supports multiple values.
    - new flag -n/--save-predictable-norank: do not discard some special ranks without order when using -L, where the rank of the closest higher node is still lower than the rank cutoff.
v0.17.1 - 2021-01-18
- unikmer rfilter: change handling of black list.
v0.17.0 - 2021-01-15
- syncmer value changed with different hash method.
- unikmer count: syncmer value changed.
v0.16.1 - 2020-12-28
- change Header.Number from int64 to uint64.
- unikmer info: fix recounting problem for unsorted kmers but with Number.
v0.16.0 - 2020-12-28
- unikmer:
    - binary file format change: fix reading long description, and bump version to 5.0.
    - better binary file parsing performance.
v0.15.0 - 2020-12-25
- unikmer:
    - binary file minor change: increase description maximal length from 128 B to 1 KB.
    - separating k-mers (sketches) indexing and searching from unikmer, including unikmer db info/index/search.
- unikmer count: fix syncmer.
- unikmer dump: new flag --hashed.
- rename unikmer stats to unikmer info, and add new column description.
v0.14.0 - 2020-11-25
- unikmer union: fix bug when flag -s not given.
- unikmer count/uniqs/locate: performance improvement on generating k-mers.
- unikmer count/db: support scaled/minimizer/syncmer sketch.
- unikmer stats: change format.
v0.13.0 - 2020-10-23
- new command unikmer common: finding k-mers shared by most of multiple binary files.
- unikmer common/count/diff/grep/rfilter/sort/split/union: faster sorting.
- unikmer uniqs: better result for flag --circular.
- unikmer search: fix a bug when searching on database with more than one hash.
v0.12.0 - 2020-09-24
- unikmer:
    - support longer k (k>32) by saving ntHash.
    - new flag -nocheck-file for not checking binary file.
- new commands:
    - unikmer db index: constructing index from binary files
    - unikmer db info: printing information of index file
    - unikmer db search: searching sequence from index database
- unikmer rfilter: change format of rank order file.
- unikmer inter/union: speedup for single input file.
- unikmer concat:
    - new flag -t/--taxid for assigning global taxid, this can slightly reduce file size.
    - new flag -n/--number for setting number of k-mers.
- unikmer num:
    - new flag -f/--force for counting k-mers.
- unikmer locate: output in BED6.
- unikmer locate/uniqs: support multiple genome files.
- unikmer uniqs:
    - stricter multiple mapping limit.
    - new flag -W/--seqs-in-a-file-as-one-genome.
- unikmer count:
    - new flag -u/--unique for output unique (single copy) kmers
v0.11.0 - 2020-07-06
- new command: unikmer rfilter for filtering k-mers by taxonomic rank.
- unikmer inter: new flag -m/--mix-taxid allowing part of files being without taxids.
- unikmer dump: fix a nil pointer bug.
- unikmer count:
    - fix checking taxid in sequence header.
    - fix setting global taxid.
- unikmer count/diff/union: slightly reduce memory and speedup when sorting k-mers.
- unikmer filter: change scoring.
- unikmer count/locate/uniqs: remove flag --circular.
v0.10.0 - 2020-05-21
- unikmer: fix loading custom taxonomy files.
- unikmer count:
    - new flag -d for only counting duplicate k-mers, for removing singletons in FASTQ.
    - fix nil pointer bug of -t.
- unikmer split: fix memory bug and the bug of missing the last odd k-mer for given ONE sorted input file.
- unikmer sort: skip loading taxonomy data when neither -u nor -d is given.
- unikmer diff: 2X speedup, and requires the 1st file to be sorted.
- unikmer inter: 2-5X speedup, and requires all files to be sorted, sorted output by default.
v0.9.0 - 2020-02-18
- unikmer: new binary format supporting optional Taxids.
- deleted command: unikmer subset.
- new command: unikmer head for extracting the first N k-mers.
- new command: unikmer tsplit for splitting k-mers according to taxid.
- unikmer grep: support searching with taxids.
- unikmer count: support parsing taxid from FASTA/Q header.
v0.8.0 - 2019-02-09
- unikmer:
    - new option -i/--infile-list, if given, files in the list file are appended to files from cli arguments.
    - improve performance of binary file reading and writing.
- unikmer sort/split/merge: safer forcing deletion of existed outdir, and better log.
- unikmer split: performance improvement for single sorted input file.
- unikmer sort: performance improvement for using -m/--chunk-size.
- unikmer grep: rewrite, support loading queries from .unik files.
- unikmer dump: fix number information in output file.
- unikmer concat: new flag -s/--sorted.
v0.7.0 - 2019-09-29
- new command unikmer filter: filter low-complexity k-mers.
- new command unikmer split: split k-mers into sorted chunk files.
- new command unikmer merge: merge from sorted chunk files.
- unikmer view:
    - new option -N/--show-code-only for only showing encoded integers.
    - fix output error for -q/--fastq.
- unikmer uniqs:
    - new option -x/--max-cont-non-uniq-kmers for limiting max continuous non-unique k-mers.
    - new option -X/--max-num-cont-non-uniq-kmers for limiting max number of continuous non-unique k-mers.
    - fix bug for -m/--min-len.
- unikmer union:
    - new option -d/--repeated for only printing duplicate k-mers.
- unikmer sort:
    - new option -u/--unique for removing duplicate k-mers.
    - new option -d/--repeated for only printing duplicate k-mers.
    - new option -m/--chunk-size for limiting maximum memory for sorting.
- unikmer diff:
    - small speed improvements.
v0.6.2 - 2019-01-21
- unikmer encode: better output for bits presentation of encoded k-mers (-a/--all)
v0.6.1 - 2019-01-21
- unikmer dump:
    - new option -K/--canonical to keep the canonical k-mers.
    - new option -k/--canonical-only to only keep the canonical k-mers.
    - new option -s/--sorted to save sorted k-mers.
- unikmer encode: add option -K/--canonical to keep the canonical k-mers.
v0.6.0 - 2019-01-20
- unikmer: check encoded integer overflow
- new command unikmer encode: encode plain k-mer text to integer
- new command unikmer decode: decode encoded integer to k-mer text
v0.5.3 - 2018-11-28
- unikmer count/dump: check files before handling them.
v0.5.2 - 2018-11-28
- unikmer locate: fix bug.
- unikmer: doc update.
v0.5.1 - 2018-11-07
- unikmer locate/uniqs: fix options checking.
v0.5.0 - 2018-11-07
- unikmer diff: fix concurrency bug when cloning kmers from first file.
- new command unikmer locate: locate Kmers in genome.
- new command unikmer uniqs: mapping Kmers back to genome and find unique subsequences.
v0.4.4 - 2018-10-27
- unikmer: add global option -L/--compression-level.
- unikmer diff: reduce memory occupation, speed not affected.
v0.4.3 - 2018-10-13
- unikmer diff: fix bug of hanging when the first file has no Kmers.
v0.4.2 - 2018-10-13
- unikmer stats/diff: more intuitive output
v0.4.1 - 2018-10-10
- Better performance of writing and reading binary files
v0.4.0 - 2018-10-09
- Binary serialization format changed.
- new command unikmer sort: sort binary files
- unikmer count/diff/union/inter: better performance, add option to sort Kmers which significantly reduces file size
- unikmer dump: changed option
- unikmer count: changed option
v0.3.1 - 2018-09-25
- Binary serialization format changed.
- new command unikmer stats: statistics of binary files.
- unikmer: adding global option -i/--infile-list for reading files listed in file.
- unikmer diff: fixed a concurrency bug when no diff found.
v0.2.1 - 2018-09-23
- unikmer count: performance improvement and new option --canonical for only keeping canonical Kmers.
v0.2.0 - 2018-09-09
- new command unikmer sample: sample Kmers from binary files.
- new global options:
    - -c, --compact: write more compact binary file with little loss of speed.
    - -C, --no-compress: do not compress binary file (not recommended).
- some improvements.
v0.1.0 - 2018-08-09
- first release