Download
SeqKit is implemented in the Go programming language, and statically linked executable binaries are freely available.
Please cite:
Wei Shen*, Botond Sipos, and Liuyang Zhao. 2024. SeqKit2: A Swiss Army Knife for Sequence and Alignment Processing. iMeta e191. doi:10.1002/imt2.191.
Current Version
- SeqKit v2.10.1 - 2025-08-19
  - seqkit seq:
    - fix validating sequences: it failed to report an error when the invalid sequence is not the last one in the input. #536
  - seqkit stats:
    - fix decimal places of some fields when using -T.
  - seqkit fx2tab:
    - fix the calculation of GC content (--gc). Previously, the denominator was the total sequence length, which could lead to inaccuracies due to the potential presence of gaps in the sequence. #515
  - seqkit sample:
    - fix -n for in-memory mode. #518
  - seqkit subseq:
    - fix the bug that subseq --feature is not case insensitive. #523
  - seqkit grep/locate/mutate:
    - update help message for -p/--pattern, to show how to set multiple values. #527 by @corneliusroemer
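The GC-content fix for seqkit fx2tab can be pictured with a small shell sketch. This is not SeqKit's implementation: the helper name gc_content is hypothetical, and the sketch assumes the corrected denominator is the number of non-gap characters, with "-", "." and space treated as gaps (the release note does not spell out the exact rule).

```shell
# gc_content SEQ: print the GC percentage of SEQ, counted over non-gap characters.
# Hypothetical helper for illustration; assumes "-", "." and space are gap letters.
gc_content() {
    printf '%s\n' "$1" | awk '{
        seq = toupper($0)
        gsub(/[-. ]/, "", seq)        # drop gap characters before counting
        n = length(seq)               # denominator: non-gap characters only
        gc = gsub(/[GC]/, "", seq)    # gsub returns the number of G/C characters removed
        if (n == 0) { print "0.00"; exit }
        printf "%.2f\n", 100 * gc / n
    }'
}
```

For the aligned sequence AC--GT, this sketch reports 50.00, whereas dividing by the total length (6) would give about 33.33.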
Links
| OS | Arch | File, China mirror |
|---|---|---|
| Linux | 32-bit | seqkit_linux_386.tar.gz, China mirror |
| Linux | 64-bit | seqkit_linux_amd64.tar.gz, China mirror |
| Linux | arm64 | seqkit_linux_arm64.tar.gz, China mirror |
| macOS | 64-bit | seqkit_darwin_amd64.tar.gz, China mirror |
| macOS | arm64 | seqkit_darwin_arm64.tar.gz, China mirror |
| Windows | 32-bit | seqkit_windows_386.exe.tar.gz, China mirror |
| Windows | 64-bit | seqkit_windows_amd64.exe.tar.gz, China mirror |
Notes
- please open an issue to request binaries for other platforms.
- run seqkit version to check for updates!
- run seqkit genautocomplete to update the shell autocompletion script!
Installation
Method 1: Download binaries (latest stable version)
Just download the compressed executable file for your operating system and decompress it with the tar -zxvf *.tar.gz command or other tools. Then:
- For Linux-like systems
  - If you have root privileges, simply copy it to /usr/local/bin:
    sudo cp seqkit /usr/local/bin/
  - Or copy it to any directory listed in the PATH environment variable:
    mkdir -p $HOME/bin/; cp seqkit $HOME/bin/
- For Windows, just copy seqkit.exe to C:\WINDOWS\system32.
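Picking the right archive from the Links table can be scripted. The following is a minimal sketch, not part of SeqKit: the function name seqkit_asset_name is hypothetical, and it only covers the Linux and macOS rows of the table, mapping the values printed by uname -s and uname -m to the file names listed there.

```shell
# seqkit_asset_name OS ARCH: print the archive name from the Links table.
# Hypothetical helper; OS and ARCH are expected in the form printed by
# `uname -s` and `uname -m`. Only Linux and macOS are handled.
seqkit_asset_name() {
    os=$1
    arch=$2
    case "$os" in
        Linux)  os_tag=linux ;;
        Darwin) os_tag=darwin ;;
        *) echo "unsupported OS: $os" >&2; return 1 ;;
    esac
    case "$arch" in
        x86_64|amd64)  arch_tag=amd64 ;;
        aarch64|arm64) arch_tag=arm64 ;;
        i386|i686)
            # the table lists a 32-bit build for Linux, but not for macOS
            if [ "$os_tag" != linux ]; then
                echo "no 32-bit build listed for $os" >&2
                return 1
            fi
            arch_tag=386 ;;
        *) echo "unsupported architecture: $arch" >&2; return 1 ;;
    esac
    printf 'seqkit_%s_%s.tar.gz\n' "$os_tag" "$arch_tag"
}
```

A typical call would be seqkit_asset_name "$(uname -s)" "$(uname -m)", whose output is the file to download and unpack with tar -zxvf.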
Method 2: Install via conda or pixi (latest stable version)

# conda or mamba
conda install -c bioconda seqkit
# pixi
pixi global install -c bioconda seqkit
Method 3: Install via homebrew (latest stable version)
brew install seqkit
Method 5: Docker based installation (might not be latest stable version)
Clone this repo:
git clone https://github.com/shenwei356/seqkit
Run the following commands:
cd seqkit
docker build -t shenwei356/seqkit .
docker run -it shenwei356/seqkit:latest
Method 6: Compiling from source (latest stable/dev version)
# ------------------- install golang -----------------
# download Go from https://go.dev/dl
wget https://go.dev/dl/go1.25.0.linux-amd64.tar.gz
tar -zxf go1.25.0.linux-amd64.tar.gz -C $HOME/
# or
# echo 'export PATH=$PATH:$HOME/go/bin' >> ~/.bashrc
# source ~/.bashrc
export PATH=$PATH:$HOME/go/bin
# --------------- the stable/development version --------------
git clone https://github.com/shenwei356/seqkit
cd seqkit/seqkit/
# optionally choose a release
# git checkout v2.10.1
export GOEXPERIMENT=greenteagc # for go1.25
go build -trimpath -ldflags="-s -w" -tags netgo
# The executable binary file is located in:
# ./seqkit
# You can also move it to anywhere in the $PATH
mkdir -p $HOME/bin
cp ./seqkit $HOME/bin/
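Copying seqkit to $HOME/bin only helps if that directory is listed in PATH. The check below is a small sketch for verifying this; the function name path_contains is hypothetical and not part of SeqKit.

```shell
# path_contains DIR: succeed if DIR is one of the colon-separated entries of PATH.
# Hypothetical helper for illustration.
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use:
#   path_contains "$HOME/bin" || echo 'add $HOME/bin to PATH, e.g. in ~/.bashrc'
```

Wrapping PATH in colons lets the pattern match whole entries only, so /opt/bin does not match /opt/bin2.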
Shell-completion
Supported shell: bash|zsh|fish|powershell
Bash:
# generate completion shell
seqkit genautocomplete --shell bash
# configure it if you have not done so before.
# install bash-completion if the "complete" command is not found.
echo "for bcfile in ~/.bash_completion.d/* ; do source \$bcfile; done" >> ~/.bash_completion
echo "source ~/.bash_completion" >> ~/.bashrc
Zsh:
# generate completion shell
seqkit genautocomplete --shell zsh --file ~/.zfunc/_seqkit
# configure it if you have not done so before
echo 'fpath=( ~/.zfunc "${fpath[@]}" )' >> ~/.zshrc
echo "autoload -U compinit; compinit" >> ~/.zshrc
fish:
seqkit genautocomplete --shell fish --file ~/.config/fish/completions/seqkit.fish
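The configuration lines above are meant to be run only once; running them again appends duplicate lines to the startup files. One way to make that step repeatable is an append-if-missing helper. This is a sketch with a hypothetical function name, append_once, not something SeqKit provides.

```shell
# append_once LINE FILE: append LINE to FILE unless an identical line is already there.
# Hypothetical helper for illustration.
append_once() {
    line=$1
    file=$2
    mkdir -p "$(dirname "$file")"
    touch "$file"
    # -x: match whole lines, -F: treat LINE as a fixed string, -q: no output
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example use:
#   append_once 'source ~/.bash_completion' ~/.bashrc
#   append_once 'autoload -U compinit; compinit' ~/.zshrc
```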
Release history
- SeqKit v2.10.0 - 2025-03-12
  - seqkit:
    - add a global flag --skip-file-check: skip input file checking when given a file list if you believe these files do exist. It helps to reduce file checking time when given a huge number of sequence files.
  - seqkit split2:
    - fix prefix checking when paired-end files are given. #512
  - seqkit stat:
    - do not compute GC content and N's for protein sequence. #497
  - seqkit grep:
    - add early exit for --delete-matched when no patterns remain. #505 by @sawyerknoblich
  - seqkit concat:
    - add an option -F/--fill to use a sequence of "-" for IDs missing in some files, can be used in MSA results. #510
- SeqKit v2.9.0 - 2024-11-01
  - seqkit:
    - Fix sequence ID parsing with the default regular expression (in this case, we actually use bytes.Index instead) for a rare case: "xxx\tyyy zzz" was wrongly parsed as "xxx\tyyy". #486
  - seqkit locate:
    - Fix -G/--non-greedy for tandem repeats, e.g., ATTCGATTCGATTCG (ATTCGx3).
  - seqkit grep/subseq:
    - Fix negative regions longer than sequence length. #479.
  - seqkit stats:
    - Add an extra column sum_n to count the number of ambiguous characters. #490
- SeqKit v2.8.2 - 2024-05-17
  - seqkit amplicon:
    - Fix a bug introduced in v2.7.0: when more than one pair of primers was given, only the last one was used. #457
  - seqkit translate:
    - Add option -e/--skip-translate-errors to skip translation errors and output an empty sequence. #458
  - seqkit split:
    - Add flag -I/--ignore-case for -i/--by-id. #462
- SeqKit v2.8.1 - 2024-04-07
- SeqKit v2.8.0 - 2024-03-11
  - seqkit stats:
    - Add column N50_num, an alias of L50. #15
  - seqkit seq/locate/fish/watch:
    - Remove the flag -V/--validate-seq-length. Now the whole sequence will be checked if -v/--validate-seq is given.
  - seqkit amplicon:
    - Fix the speed problem, introduced in v2.7.0. #439.
    - Slightly faster by reusing objects.
  - seqkit seq:
    - Change the threshold sequence length for parallelizing complement sequence computation, 1kb->1Mb.
- SeqKit v2.7.0 - 2024-01-31
  - seqkit:
    - Grouping subcommands in help message, which is intuitive for beginners.
  - seqkit grep:
    - New flag: -D/--allow-duplicated-patterns for outputting records multiple times when duplicated patterns are given. #427
  - seqkit subseq:
    - Use the ID regular expression from the option --id-regexp to create the FASTA index file. This solves the panic that happened for sequences containing tabs in the headers. #432
  - seqkit split/sort/shuffle:
    - When using the two-pass mode (-2/--two-pass), replace possible tabs in the sequence header.
  - seqkit rmdup:
    - Write an empty file of duplicate numbers and lists of IDs even if there are no duplicates when using -D/--dup-num-file. #436
  - seqkit stats:
    - New flag -S/--skip-file-check to skip input file checking when given files or a file list. It's very useful if you run it with millions of files.
- SeqKit v2.6.1 - 2023-11-18
  - seqkit:
    - fix panic of nil pointer introduced in v2.6.0, which happens when handling multiple input files and some of them have file sizes of zero.
  - seqkit seq:
    - fix panic (close of closed channel) when using -v to check sequences.
- SeqKit v2.6.0 - 2023-11-09
  - seqkit:
    - add the shortcut -X for the flag --infile-list.
  - seqkit common:
    - add a new flag -e/--check-embedded-seqs for detecting embedded sequences.
    - for matching by sequences: reduced the memory occupation and corrected numbers in the log. #416
  - seqkit stat:
    - add a new column AvgQual for average quality score. #411
  - seqkit split2:
    - fix the panic for invalid input.
  - seqkit subseq:
    - add a new flag -R/--region-coord for appending coordinates to sequence ID for -r/--region. #413
  - seqkit locate:
    - add a new flag -s/--max-len-to-show to show at most X characters for the search pattern or matched sequences.
  - seqkit seq:
    - change the nucleotide color theme. #412
- SeqKit v2.5.1 - 2023-08-09
- SeqKit v2.5.0 - 2023-07-16
  - new command seqkit merge-slides: merge sliding windows generated from seqkit sliding. #390
  - seqkit stats:
    - added a new flag -N/--N for appending other N50-like stats as new columns. #393
    - added a progress bar for > 1 input files.
    - write the result of each file immediately (no output buffer) when using -T/--tabular.
  - seqkit translate:
    - add options -s/--out-subseqs and -m/--min-len to write ORFs longer than x amino acids as individual records. #389
  - seqkit sum:
    - do not remove possible '*' by default and delete confusing warnings. Thanks to @photocyte. #399
    - added a progress bar for > 1 input files.
  - seqkit pair:
    - remove the restriction of requiring FASTQ format, i.e., FASTA files are also supported.
  - seqkit seq:
    - update help messages. #387
  - seqkit fx2tab:
    - faster alphabet computation (-a/--alphabet) with a new data structure. Thanks to @elliotwutingfeng #388
  - seqkit subseq:
    - accept reverse coordinates in BED/GTF. #392
- SeqKit v2.4.0 - 2023-03-17
  - seqkit:
  - seqkit locate:
    - do not remove embedded regions when searching with regular expressions. #368
  - seqkit amplicon:
    - fix BED coordinates for amplicons found in the minus strand. #367
  - seqkit split:
    - fix forgetting to add extension for --two-pass. #332
  - seqkit stats:
    - fix computing Q1 and Q3 of sequence length for one record. #353
  - seqkit grep:
    - fix count number (-C) for matching with mismatch (-m > 0). #370
  - seqkit replace:
    - add some flags to select a subset of records to edit; these flags are transplanted from seqkit grep. #348
  - seqkit faidx:
    - allow empty lines at the end of sequences.
  - seqkit faidx/sort/shuffle/split/subseq:
  - seqkit seq:
    - allow filtering sequences of length zero. Thanks to @penglbio.
  - seqkit rename:
    - new flag -s/--separator for setting separator between original ID/name and the counter (default "_"). #360
    - new flag -N/--start-num for setting starting count number for duplicated IDs/names (default 2). #360
    - new flag -1/--rename-1st-rec for renaming the first record as well. #360
    - do not append space if there's no description after the sequence ID.
  - seqkit sliding:
    - new flag -S/--suffix for changing the suffix added to the sequence ID (default: "_sliding").
- SeqKit v2.3.1 - 2022-09-22
- SeqKit v2.3.0 - 2022-08-12
- SeqKit v2.2.0 - 2022-03-14
  - seqkit:
    - add support of xz and zstd input/output formats. #274
    - fix panic when reading records with header of ID + blanks.
  - new command seqkit sum: computing message digest for all sequences in FASTA/Q files. The idea comes from @photocyte and the format borrows from seqhash. #262
  - new command seqkit fa2fq: retrieving corresponding FASTQ records by a FASTA file
  - seqkit split2:
  - seqkit concat:
  - seqkit locate:
    - parallelizing -F/--use-fmi and -m for large number of search patterns.
  - seqkit amplicon:
    - new flag -M/--output-mismatches to append the total mismatches and mismatches of 5' end and 3' end. #286
  - seqkit grep:
    - detect FASTA/Q symbol @ and > in the searching patterns and show warnings.
    - add new flag -C/--count, like grep -c in GNU grep. #267
  - seqkit range:
    - support removing leading 100 seqs (seqkit range -r 101:-1 == tail -n +101). #279
  - seqkit subseq:
    - report error when no options were given.
  - update doc:
- SeqKit v2.1.0 - 2021-11-15
  - seqkit seq:
    - fix filtering by average quality -Q/-R. #257
  - seqkit convert:
  - seqkit split:
    - fix writing an extra empty file when using --two-pass. #244
  - seqkit subseq:
    - fix --bed, which failed to recognize the strand.
  - seqkit fq2fa:
    - faster, and do not wrap sequences.
  - seqkit grep/locate/mutate:
    - detect unquoted comma and show warning message, e.g., -p 'A{2,}'. #250
- SeqKit v2.0.0 - 2021-08-27
  - Performance improvements
    - seqkit:
      - faster FASTA/Q reading and writing, especially on FASTQ, see the benchmark.
        - reading (plain text): 4X faster. seqkit stats dataset_C.fq
        - reading (gzip files): 45% faster. seqkit stats dataset_C.fq.gz
        - reading + writing (plain text): 3.5X faster. seqkit grep -p . -v dataset_C.fq -o t
        - reading + writing (gzip files): 2.2X faster. seqkit grep -p . -v dataset_C.fq.gz -o t.gz
      - change default value of -j/--threads from 2 to 4, which is faster for writing gzip files.
    - seqkit seq:
      - fix writing speed, which was slowed down in v0.12.1.
  - Breaking changes
    - seqkit grep/rmdup/common:
      - consider reverse complement sequence by default for comparing by sequence, add flag -P/--only-positive-strand. #215
    - seqkit rename:
      - rename ID only, do not append original header to new ID. #236
    - seqkit fx2tab:
      - for -s/--seq-hash: outputting MD5 instead of hash value (integers) of xxhash. #219
  - Bugfixes
  - New features/enhancements
    - seqkit grep:
      - allow empty pattern files.
    - seqkit faidx:
      - support region with begin > end, i.e., returning reverse complement sequence.
      - add new flag -l/--region-file: file containing a list of regions.
    - seqkit fx2tab:
      - new flag -Q/--no-qual for disabling outputting quality even for FASTQ file. #221
    - seqkit amplicon:
      - new flag -u/--save-unmatched for saving records that do not match any primer.
    - seqkit sort:
      - new flag -b/--by-bases for sorting by non-gap bases, for multiple sequence alignment files. #216
- SeqKit v0.16.1 - 2021-05-20
- SeqKit v0.16.0 - 2021-04-16
  - new command seqkit head-genome:
    - print sequences of the first genome with common prefixes in name
  - seqkit grep/locate/amplicon -m
    - much faster (300-400x) searching with mismatch allowed by optimizing FM-indexing and parallelization.
    - new flag -I/--immediate-output.
  - seqkit grep/locate:
  - seqkit locate:
    - removing debug info for -r introduced in a0f6b6e. #180
  - seqkit amplicon:
    - fix bug of -m, when mismatch is allowed.
  - seqkit fx2tab:
    - new flag -C/--base-count for counting bases. #183
  - seqkit tab2fx:
    - fix a rare bug. #197
  - seqkit subseq:
    - fix bug for BED with empty columns. #195
  - seqkit genautocomplete:
    - support bash|zsh|fish|powershell.
- SeqKit v0.15.0 - 2021-01-12
  - seqkit grep/locate: update help message.
  - seqkit grep: search on both strands when searching by sequence.
  - seqkit split2: fix redundant log when using -s.
  - seqkit bam: new field RightSoftClipSeq. #172
  - seqkit sample -2: remove extra \n. #173
  - seqkit split2 -l: fix bug for splitting by accumulative length; this bug occurs when the first record is longer than -l, no sequences are lost.
- SeqKit v0.14.0 - 2020-10-30
  - new command seqkit pair: match up paired-end reads from two fastq files, faster than fastq-pair.
  - seqkit translate: new flag -F/--append-frame for optionally adding frame info to ID. #159
  - seqkit stats: reduce memory usage when using -a for calculating N50. #153
  - seqkit mutate: fix inserting sequence -i/--insertion, this bug occurs when insert site is big in some cases, don't worry if no error reported.
  - seqkit replace:
    - new flag -U/--keep-untouched: do not change anything when no value found for the key (only for sequence name).
    - do not support editing FASTQ sequence.
  - seqkit grep/locate: new flag --circular for supporting circular genome. #158
- SeqKit v0.13.2 - 2020-07-13
  - seqkit sana: fix bug causing hanging on empty files. #149
- SeqKit v0.13.1 - 2020-07-09
  - seqkit sana: fix bug causing hanging on empty files. #148
- SeqKit v0.13.0 - 2020-07-07
  - seqkit: fix a rare FASTA/Q parser bug. #127
  - seqkit seq: output sequence or quality in single line when -s/--seq or -q/--qual is on. #132
  - seqkit translate: delete debug info, #133, and fix typo. #134
  - seqkit split2: tiny performance improvement. #137
  - seqkit stats: new flag -i/--stdin-label for replacing default "-" for stdin. #139
  - seqkit fx2tab: new flag -s/--seq-hash for printing hash of sequence (case sensitive). #144
  - seqkit amplicon:
  - New features and improvements by @bsipos. #130, #147
    - new command seqkit scat, for real-time robust concatenation of fastx files.
    - Rewrote the parser behind the sana subcommand, now it supports robust parsing of fasta file as well.
    - Added a "toolbox" feature to the bam subcommand (-T), which is a collection of filters acting on streams of BAM records configured through a YAML string (see the docs for more).
    - Added the SEQKIT_THREADS environmental variable to override the default number of threads.
- SeqKit v0.12.1 - 2020-04-21
  - seqkit bam: add colorised and pretty printed output, by @bsipos. #110
  - seqkit locate/grep: fix bug of -m, when query contains letters not in subject sequences. #124
  - seqkit split2: new flag -l/--by-length for splitting into chunks of N bases.
  - seqkit fx2tab:
  - seqkit seq: new flag -k/--color: colorize sequences.
- SeqKit v0.12.0 - 2020-02-18
  - seqkit:
    - fix checking input file existence.
    - new global flag --infile-list for long list of input files, if given, they are appended to files from cli arguments.
  - seqkit faidx: supporting "truncated" (no ending newline character) file.
  - seqkit seq:
    - do not force switching on -g when using -m/-M.
    - show recommendation if flag -t/--seq-type is not DNA/RNA when computing complement sequence. #103
  - seqkit translate: supporting multiple frames. #96
  - seqkit grep/locate:
    - add detection and warning for space existing in search pattern/sequence.
    - speed improvement (2X) for -m/--max-mismatch. shenwei356/bwt/issues/3
  - seqkit locate:
    - new flag -M/--hide-matched for hiding matched sequences. #98
    - new flag -r/--use-regexp for explicitly using regular expression, so improve speed of default index operation. And you have to switch this on if using regexp now. #101
    - new flag -F/--use-fmi for improving search speed for lots of sequence patterns.
  - seqkit rename: making IDs unique across multiple files, and can write into multiple files. #100
  - seqkit sample: fix stdin checking for flag -2. #102.
  - seqkit rename/split/split2: fix detection of existed outdir.
  - seqkit split: fix bug of seqkit split -i -2 and parallelize it.
  - seqkit version: checking update is optional (-u).
- SeqKit v0.11.0 - 2019-09-25
  - seqkit: fix hanging when reading from truncated gzip file.
  - new commands:
    - seqkit amplicon: retrieve amplicon (or specific region around it) via primer(s).
  - new commands by @bsipos:
    - seqkit watch: monitoring and online histograms of sequence features.
    - seqkit sana: sanitize broken single line fastq files.
    - seqkit fish: look for short sequences in larger sequences using local alignment.
    - seqkit bam: monitoring and online histograms of BAM record features.
  - seqkit grep/locate: reduce memory occupation when using flag -m/--max-mismatch.
  - seqkit seq: fix panic of computing complement sequence for long sequences containing illegal letters without flag -v on. #84
- SeqKit v0.10.2 - 2019-07-30
  - seqkit: fix bug of parsing sequence ID delimited by tab (\t). #78
  - seqkit grep: better logic of --delete-matched.
  - seqkit common/rmdup/split: use xxhash to replace MD5 when comparing with sequence, discard flag -m/--md5.
  - seqkit stats: new flag -b/--basename for outputting basename instead of full path.
- SeqKit v0.10.1 - 2019-02-27
  - seqkit fx2tab: new option -q/--avg-qual for outputting average read quality. #60
  - seqkit grep/locate: fix support of X when using -d/--degenerate. #61
  - seqkit translate:
    - new flag -M/--init-codon-as-M to translate initial codon at beginning to 'M'. #62
    - translates --- to - for aligned DNA/RNA, flag -X needed. #63
    - supports codons containing ambiguous bases, e.g., GGN->G, ATH->I. #64
    - new flag -l/--list-transl-table to show details of translate table N
    - new flag -L/--list-transl-table-with-amb-codons to show details of translate table N (including ambiguous codons)
  - seqkit split/split2: fix bug of ignoring -O when reading from stdin.
- SeqKit v0.10.0 - 2018-12-24
  - seqkit: report error when input is directory.
  - new command seqkit mutate: edit sequence (point mutation, insertion, deletion).
- SeqKit v0.9.3 - 2018-12-02
  - seqkit stats: fix panic for empty file. #57
  - seqkit translate: add flag -x/--allow-unknown-codon to translate unknown codon to X.
- SeqKit v0.9.2 - 2018-11-16
  - seqkit: stricter checking for value of global flag -t/--seq-type.
  - seqkit sliding: fix bug for flag -g/--greedy. #54
  - seqkit translate: fix bug for frame < 0. #55
  - seqkit seq: add TAB to default blank characters (flag -G/--gap-letters), and fix filter result when using flag -g/--remove-gaps along with -m/--min-len or -M/--max-len
- SeqKit v0.9.1 - 2018-10-12
- SeqKit v0.9.0 - 2018-09-26
  - seqkit: better handling of empty files, no error message shown. #36
  - new subcommand seqkit split2: split sequences into files by size/parts (FASTA, PE/SE FASTQ). #35
  - new subcommand seqkit translate: translate DNA/RNA to protein sequence. #28
  - seqkit sort: fix bug when using -2 -i, and add support for sorting in natural order. #39
  - seqkit grep and seqkit locate: add experimental support of mismatch when searching subsequences. #14
  - seqkit stats: add stats of Q20 and Q30 for FASTQ. #45
- SeqKit v0.8.1 - 2018-06-29
  - seqkit: do not call pigz or gzip for decompressing gzipped file any more. But you can still utilize pigz or gzip by pigz -d -c seqs.fq.gz | seqkit xxx.
  - seqkit subseq: fix bug of missing quality when using --gtf or --bed
  - seqkit stats: parallelize counting files, it's much faster for lots of small files, especially for files on SSD
- SeqKit v0.8.0 - 2018-03-22
  - seqkit: stricter FASTA/Q format requirement, i.e., must start with > or @.
  - seqkit: fix output format for FASTQ files containing zero-length records, yes this happens.
  - seqkit: add amino acid code O (pyrrolysine) and U (selenocysteine).
  - seqkit replace: add flag --nr-width to fill leading 0s for {nr}, useful for preparing sequence submission (">strain_00001 XX", ">strain_00002 XX").
  - seqkit subseq: require BED file to be tab-delimited.
- SeqKit v0.7.2 - 2017-12-03
  - seqkit tab2fx: fix a concurrency bug that occurs with low probability when only 1-column data provided.
  - seqkit stats: add quartiles of sequence length
  - seqkit faidx: add support for retrieving subsequence using seq ID and region, which is similar to "samtools faidx" but has some extra features
- SeqKit v0.7.1 - 2017-09-22
  - seqkit convert: fix bug of read quality containing only 3 or less values. shenwei356/bio/issues/3
  - seqkit stats: add option -T/--tabular to output in machine-friendly tabular format. #23
  - seqkit common: increase speed and decrease memory occupation, and add some notes.
  - fix some typos. #22
  - suggestion: please install pigz to gain better parsing performance for gzipped data.
- SeqKit v0.7.0 - 2017-08-12
  - add new command convert for converting FASTQ quality encoding between Sanger, Solexa and Illumina. Thanks to @cviner for the suggestion (#18).
  - add new command range for printing FASTA/Q records in a range (start:end). #19.
  - add new command concate for concatenating sequences with same ID from multiple files.
- SeqKit v0.6.0 - 2017-06-21
- SeqKit v0.5.5 - 2017-05-10
  - Increasing speed of reading .gz file by utilizing gzip (1.3X), it would be much faster if you installed pigz (2X).
  - Fixing colorful output in Windows
  - seqkit locate: add flag --gtf and --bed to output GTF/BED6 format, so the result can be used in seqkit subseq.
  - seqkit subseq: fix bug of --bed, add checking coordinate.
- SeqKit v0.5.4 - 2017-04-11
  - seqkit subseq --gtf, add flag --gtf-tag to set tag that's outputted as sequence comment
  - fix seqkit split and seqkit sample: forget not to wrap sequence and quality in output for FASTQ format
  - compile with go1.8.1
- SeqKit v0.5.3 - 2017-04-01
  - seqkit grep: fix bug when using seqkit grep -r -f patternfile: all records will be retrieved due to failing to discard the blank pattern (""). #11
- SeqKit v0.5.2 - 2017-03-24
  - seqkit stats -a and seqkit seq -g -G: change default gap letters from '- ' to '- .'
  - seqkit subseq: fix bug of range overflow when using -d/--down-stream or -u/--up-stream for retrieving subseq using BED (--bed) or GTF (--gtf) file.
  - seqkit locate: add flag -G/--non-greedy, non-greedy mode, faster but may miss motifs overlapping with others.
- SeqKit v0.5.1 - 2017-03-12
  - seqkit restart: fix bug of flag parsing
- SeqKit v0.5.0 - 2017-03-11
  - new command seqkit restart, for resetting start position for circular genome.
  - seqkit sliding: add flag -g/--greedy, exporting last subsequences even shorter than windows size.
  - seqkit seq:
    - add flag -m/--min-len and -M/--max-len to filter sequences by length.
    - rename flag -G/--gap-letter to -G/--gap-letters.
  - seqkit stat:
    - renamed to seqkit stats, don't worry, old name is still available as an alias.
    - add new flag -a/--all, for all statistics, including sum_gap, N50, and L50.
- SeqKit v0.4.5 - 2017-02-26
  - seqkit seq: fix bug of failing to reverse quality of FASTQ sequence
- SeqKit v0.4.4 - 2017-02-17
  - seqkit locate: fix bug of missing regular-expression motifs containing non-DNA characters (e.g., ACT.{6,7}CGG) from motif file (-f).
  - compiled with go v1.8.
- SeqKit v0.4.3 - 2016-12-22
  - fix bug of seqkit stat: min_len always be 0 in versions: v0.4.0, v0.4.1, v0.4.2
- SeqKit v0.4.2 - 2016-12-21
  - fix header information of seqkit subseq when retrieving up- and down-stream sequences using GTF/BED file.
- SeqKit v0.4.1 - 2016-12-16
  - enhancement: remove redundant regions for seqkit locate.
- SeqKit v0.4.0 - 2016-12-07
  - fix bug of seqkit locate, e.g., only find two locations (1-4, 7-10, missing 4-7) of ACGA in ACGACGACGA.
  - better output of seqkit stat for empty file.
- SeqKit v0.3.9 - 2016-12-04
  - fix bug of region selection for blank sequences. Affected commands include seqkit subseq --region, seqkit grep --region, seqkit split --by-region.
  - compile with go1.8beta1.
- SeqKit v0.3.8.1 - 2016-11-25
  - enhancement and bugfix of seqkit common: two or more same files allowed, fix log information of number of extracted sequences in the first file.
- SeqKit v0.3.8 - 2016-11-24
  - enhancement of seqkit common: better handling of files containing replicated sequences
- SeqKit v0.3.7 - 2016-11-23
  - fix bug in seqkit split --by-id when sequence ID contains invalid characters for system path.
  - add more flags validation for seqkit replace.
  - enhancement: raise error when key pattern matches multiple targets in cases of replacing with key-value files and more controls are added.
  - changes: do not wrap sequence and quality in output for FASTQ format.
- SeqKit v0.3.6 - 2016-11-03
  - add new feature for seqkit grep: new flag -R (--region) for specifying sequence region for searching.
- SeqKit v0.3.5 - 2016-10-30
  - fix bug of seqkit grep: flag -i (--ignore-case) did not work when not using regular expression
- SeqKit v0.3.4.1 - 2016-09-21
  - improve performance of reading (~10%) and writing (100%) gzip-compressed file by using the github.com/klauspost/pgzip package
  - add citation
- SeqKit v0.3.4 - 2016-09-17
  - bugfix: seq wrongly handles only the first one sequence file when multiple files given
  - new feature: fx2tab can output alphabet letters of a sequence by flag -a (--alphabet)
  - new feature: new flag -K (--keep-key) for replace, when replacing with key-value file, one can choose keeping the key as value or not.
- SeqKit v0.3.3 - 2016-08-18
  - fix bug of seqkit replace, wrongly starting from 2 when using {nr} in -r (--replacement)
  - new feature: seqkit replace supports replacement symbols {nr} (record number) and {kv} (corresponding value of the key ($1) by key-value file)
- SeqKit v0.3.2 - 2016-08-13
  - fix bug of seqkit split, error when target file is in a directory.
  - improve performance of seqkit sliding for big sequences, and output last part even if it's shorter than window size, output of FASTQ is also supported.
- SeqKit v0.3.1.1 - 2016-08-07
  - compile with go1.7rc5, with higher performance and smaller size of binary file
- SeqKit v0.3.1 - 2016-08-02
  - improve speed of seqkit locate
- SeqKit v0.3.0 - 2016-07-28
  - use fork of github.com/brentp/xopen, using zcat for speedup of .gz file reading on *nix systems.
  - improve speed of parsing sequence ID when creating FASTA index
  - reduce memory usage of seqkit subseq --gtf
  - fix bug of seqkit subseq when using flag --id-ncbi
  - fix bug of seqkit split, outdir error
  - fix bug of seqkit seq -p, last base is wrongly failed to convert when sequence length is odd.
  - add "sum_len" result for output of seqkit stat
- seqkit v0.2.9 - 2016-07-24
  - fix minor bug of seqkit split and seqkit shuffle, header name error due to improper use of pointer
  - add option -O (--out-dir) to seqkit split
- seqkit v0.2.8 - 2016-07-19
  - improve speed of parsing sequence ID, not using regular expression for default --id-regexp
  - improve speed of record outputting for small-size sequences
  - fix minor bug: seqkit seq for blank record
  - update benchmark result
- seqkit v0.2.7 - 2016-07-18
  - reduce memory usage by optimizing the outputting of sequences. detail: using BufferedByteSliceWrapper to reuse bytes.Buffer.
  - reduce memory usage and improve speed by using custom buffered reading mechanism, instead of using standard library bufio, which is slow for large genome sequence.
  - discard strategy of "buffer" and "chunk" of FASTA/Q records, just parse records one by one.
  - delete global flags -c (--chunk-size) and -b (--buffer-size).
  - add function testing scripts
- seqkit v0.2.6 - 2016-07-01
  - fix bug of seqkit subseq: Inplace subseq method led to wrong result
- seqkit v0.2.5.1
  - fix a bug of seqkit subseq: chromosome name was not converted to lower case when using --gtf or --bed
- seqkit v0.2.5 - 2016-07-01
  - fix a serious bug brought in v0.2.3, using unsafe method to convert string to []byte
  - add awk-like built-in variable of record number ({NR}) for seqkit replace
- seqkit v0.2.4.1 - 2016-06-12
  - fix several bugs from library bio, affected situations:
    - Locating patterns in sequences by pattern FASTA file: seqkit locate -f
    - Reading FASTQ file with record of which the quality starts with +
  - add command version
- seqkit v0.2.4 - 2016-05-31
  - add subcommand head
- seqkit v0.2.3 - 2016-05-08
  - reduce memory occupation by avoiding copying data when converting string to []byte
  - speed up reverse-complement by avoiding repeatedly calling functions
- seqkit v0.2.2 - 2016-05-06
  - reduce memory occupation of subcommands that use FASTA index
- seqkit v0.2.1 - 2016-05-02
  - improve performance of outputting.
  - fix bug of seqkit seq -g for FASTA format
  - some other minor fix of code and docs
  - update benchmark results
- seqkit v0.2.0 - 2016-04-29
  - reduce memory usage of writing output
  - fix bug of subseq, shuffle, sort when reading from stdin
  - reduce memory usage of faidx
  - make validating sequences an optional option in seq command, it saves some time.
- seqkit v0.1.9 - 2016-04-26
  - using custom FASTA index file extension: .seqkit.fai
  - reducing memory usage of sample --number --two-pass
  - change default CPU number to 2 for multi-cpus computer, and 1 for single-CPU computer
- seqkit v0.1.8 - 2016-04-24
  - add subcommand rename to rename duplicated IDs
  - add subcommand faidx to create FASTA index file
  - utilize faidx to improve performance of subseq
  - shuffle, sort and split support two-pass mode (by flag -2) with faidx to reduce memory usage.
  - document update
- seqkit v0.1.7 - 2016-04-21
  - add support for (multi-line) FASTQ format
  - update document, add technical details
  - rename subcommands fa2tab and tab2fa to fx2tab and tab2fx
  - add subcommand fq2fa
  - add column "seq_format" to stat
  - add global flag -b (--buffer-size)
  - little change of flag in subseq and some other commands
- seqkit v0.1.6 - 2016-04-07
  - add subcommand replace
- seqkit v0.1.5.2 - 2016-04-06
  - fix bug of grep, when not using flag -r, flag -i will not take effect.
- seqkit v0.1.5.1
  - fix result of seqkit sample -n
  - fix benchmark script
- seqkit v0.1.5 - 2016-03-29
  - add global flag --id-ncbi
  - add flag -d (--dup-seqs-file) and -D (--dup-num-file) for subcommand rmdup
  - make using MD5 as an optional flag -m (--md5) in subcommand rmdup and common
  - fix file name suffix of seqkit split result
  - minor modification of sliding output
- seqkit v0.1.4.1 - 2016-03-27
  - change alignment of stat output
  - more precise control of the number of CPUs
- seqkit v0.1.4 - 2016-03-25
  - add subcommand sort
  - improve subcommand subseq: supporting of getting subsequences by GTF and BED files
  - change name format of sliding result
  - prettier output of stat
- seqkit v0.1.3.1 - 2016-03-16
  - Performance improvement by reducing time of cleaning spaces
  - Document update
- seqkit v0.1.3 - 2016-03-15
  - Further performance improvement
  - Rename sub command extract to grep
  - Change default value of flag --threads back to the CPU number of the current device, change default value of flag --chunk-size back to 10000 sequences.
  - Update benchmark
- seqkit v0.1.2 - 2016-03-14
  - Add flag --dna2rna and --rna2dna to subcommand seq.
- seqkit v0.1.1 - 2016-03-13
  - 5.5X speedup of FASTA file parsing by avoiding regular expressions to remove spaces and using slice indexing instead of a map to validate letters
  - Change default value of global flag --threads to 1, since most of the subcommands are I/O intensive. For computation-intensive jobs, like extract and locate, you may set a bigger value.
  - Change default value of global flag --chunk-size to 100.
  - Add subcommand stat
  - Fix bug of failing to automatically detect alphabet when only one record in file.
- seqkit v0.1 - 2016-03-11
  - first release of seqkit