Download

SeqKit is implemented in the Go programming language; statically linked executable binaries are freely available.

Please cite:

  1. Wei Shen*, Botond Sipos, and Liuyang Zhao. 2024. SeqKit2: A Swiss Army Knife for Sequence and Alignment Processing. iMeta e191. doi:10.1002/imt2.191.
  2. Wei Shen, Shuai Le, Yan Li*, and Fuquan Hu*. SeqKit: a cross-platform and ultrafast toolkit for FASTA/Q file manipulation. PLOS ONE. doi:10.1371/journal.pone.0163962.

Current Version

  • SeqKit v2.8.1 - 2024-04-07
    • seqkit sana:
      • Add support for FASTQ files with IDs in the separator (+, 3rd) lines. #446, #429, #408
    • seqkit subseq:
      • Add some docs to show how to keep the original order of sequences when extracting with BED: compress the input FASTA file. #451

OS       Arch    File, China mirror
Linux    32-bit  seqkit_linux_386.tar.gz, China mirror
Linux    64-bit  seqkit_linux_amd64.tar.gz, China mirror
Linux    arm64   seqkit_linux_arm64.tar.gz, China mirror
macOS    64-bit  seqkit_darwin_amd64.tar.gz, China mirror
macOS    arm64   seqkit_darwin_arm64.tar.gz, China mirror
Windows  32-bit  seqkit_windows_386.exe.tar.gz, China mirror
Windows  64-bit  seqkit_windows_amd64.exe.tar.gz, China mirror

Notes

  • Please open an issue to request binaries for other platforms.
  • Run seqkit version to check for updates!
  • Run seqkit genautocomplete to update the shell autocompletion script!

Installation

Method 1: Download binaries (latest stable version)

Download the compressed executable file for your operating system and decompress it with tar -zxvf *.tar.gz or another tool. Then:

  1. For Linux-like systems

    1. If you have root privileges, simply copy it to /usr/local/bin:

      sudo cp seqkit /usr/local/bin/
      
    2. Or copy it to any directory listed in the PATH environment variable:

      mkdir -p $HOME/bin/; cp seqkit $HOME/bin/
      
  2. For Windows, just copy seqkit.exe to C:\WINDOWS\system32.
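
Choosing the right archive on Linux or macOS can be scripted. The following is a minimal sketch, not part of SeqKit: the helper name seqkit_asset is hypothetical, and the mapping from uname output to file names is an assumption based on the file names in the download table above. It only prints a file name and does not download anything.

```shell
# Hypothetical helper (not part of SeqKit): map an OS name and CPU architecture,
# as printed by `uname -s` and `uname -m`, to a file name from the download table.
seqkit_asset() {
    case "$1" in
        Linux)  _os=linux ;;
        Darwin) _os=darwin ;;
        *) echo "no Unix binary listed for OS: $1" >&2; return 1 ;;
    esac
    case "$2" in
        x86_64|amd64)        _arch=amd64 ;;
        aarch64|arm64)       _arch=arm64 ;;
        i386|i486|i586|i686) _arch=386 ;;
        *) echo "no binary listed for architecture: $2" >&2; return 1 ;;
    esac
    # The table lists no 32-bit macOS build.
    if [ "$_os" = darwin ] && [ "$_arch" = 386 ]; then
        echo "no binary listed for 32-bit macOS" >&2
        return 1
    fi
    echo "seqkit_${_os}_${_arch}.tar.gz"
}

# Print the file name for the current machine.
seqkit_asset "$(uname -s)" "$(uname -m)" || echo "see the download table" >&2
```

The printed name can then be looked up on the releases page and unpacked with the tar command shown above.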

Method 2: Install via conda (latest stable version)

conda install -c bioconda seqkit

Method 3: Install via homebrew (might not be latest stable version)

brew install seqkit

Method 4: For Go developer (latest stable/dev version)

go get -u github.com/shenwei356/seqkit/v2/seqkit/

Method 5: Docker-based installation (might not be latest stable version)

Install Docker

git clone this repo:

git clone https://github.com/shenwei356/seqkit

Run the following commands:

cd seqkit
docker build -t shenwei356/seqkit .
docker run -it shenwei356/seqkit:latest

Method 6: Compiling from source (latest stable/dev version)

# ------------------- install golang -----------------

# download Go from https://go.dev/dl
wget https://go.dev/dl/go1.17.13.linux-amd64.tar.gz

tar -zxf go1.17.13.linux-amd64.tar.gz -C $HOME/

# or
#   echo 'export PATH=$PATH:$HOME/go/bin' >> ~/.bashrc
#   source ~/.bashrc
export PATH=$PATH:$HOME/go/bin


# ------------- the latest stable version -------------

go get -v -u github.com/shenwei356/seqkit/v2/seqkit

# The executable binary file is located in:
#   ~/go/bin/seqkit
# You can also copy it to any directory in $PATH
mkdir -p $HOME/bin
cp ~/go/bin/seqkit $HOME/bin/

# --------------- the development version --------------

git clone https://github.com/shenwei356/seqkit
cd seqkit/seqkit/
go build

# The executable binary file is located in:
#   ./seqkit
# You can also copy it to any directory in $PATH
mkdir -p $HOME/bin
cp ./seqkit $HOME/bin/
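
Copying the binary to $HOME/bin, as in Methods 1 and 6, only makes seqkit callable if that directory is listed in PATH. A minimal sketch for checking this follows. The function name dir_in_path is hypothetical, and the comparison is a plain string match, so an entry written with a trailing slash would not be recognized.

```shell
# Hypothetical helper: succeed if directory $1 is one of the entries of $PATH.
dir_in_path() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

if dir_in_path "$HOME/bin"; then
    echo "$HOME/bin is already on PATH"
else
    # Single quotes keep the variables unexpanded in the printed hint.
    echo 'not on PATH; add this line to your shell startup file: export PATH=$PATH:$HOME/bin'
fi
```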

Shell-completion

Supported shells: bash|zsh|fish|powershell

Bash:

# generate the completion script
seqkit genautocomplete --shell bash

# configure it if you have not done so before.
# install bash-completion if the "complete" command is not found.
echo "for bcfile in ~/.bash_completion.d/* ; do source \$bcfile; done" >> ~/.bash_completion
echo "source ~/.bash_completion" >> ~/.bashrc

Zsh:

# generate the completion script
seqkit genautocomplete --shell zsh --file ~/.zfunc/_seqkit

# configure it if you have not done so before
echo 'fpath=( ~/.zfunc "${fpath[@]}" )' >> ~/.zshrc
echo "autoload -U compinit; compinit" >> ~/.zshrc

fish:

seqkit genautocomplete --shell fish --file ~/.config/fish/completions/seqkit.fish
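
The echo ... >> commands in the Bash and Zsh steps append a line every time they are run, so repeating them leaves duplicate entries in the startup files. A minimal sketch of an idempotent alternative follows. append_once is a hypothetical helper, not a SeqKit command, and the demonstration writes to a temporary file rather than to a real startup file.

```shell
# Hypothetical helper: append line $1 to file $2 only if the file
# does not already contain exactly that line.
append_once() {
    touch "$2"
    # -q: quiet, -x: match whole lines only, -F: treat the line as a fixed string
    grep -qxF -- "$1" "$2" || printf '%s\n' "$1" >> "$2"
}

# Demonstration on a temporary file.
rc="$(mktemp)"
append_once 'source ~/.bash_completion' "$rc"
append_once 'source ~/.bash_completion' "$rc"   # second call changes nothing
cat "$rc"                                       # the line appears only once
rm -f "$rc"
```

To apply it for real, pass the startup file instead of the temporary one, for example append_once 'source ~/.bash_completion' ~/.bashrc.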

Release history

  • SeqKit v2.8.0 - 2024-03-11
    • seqkit stats:
      • Add column N50_num, an alias of L50, #15.
    • seqkit seq/locate/fish/watch:
      • Remove the flag -V/--validate-seq-length. Now the whole sequence is checked if -v/--validate-seq is given.
    • seqkit amplicon:
      • Fix the speed problem, introduced in v2.7.0. #439.
      • Slightly faster by reusing objects.
    • seqkit seq:
      • Change the threshold sequence length for parallelizing complement sequence computation, 1kb->1Mb.
  • SeqKit v2.7.0 - 2024-01-31
    • seqkit:
      • Group subcommands in the help message, which is more intuitive for beginners.
    • seqkit grep:
      • New flag: -D/--allow-duplicated-patterns for outputting records multiple times when duplicated patterns are given. #427
    • seqkit subseq:
      • Use the ID regular expression from the option --id-regexp to create the FASTA index file. This fixes the panic that occurred for sequences containing tabs in the headers. #432
    • seqkit split/sort/shuffle:
      • When using the two-pass mode (-2/--two-pass), replace possible tabs in the sequence header.
    • seqkit rmdup:
      • Write an empty file of duplicate numbers and lists of IDs even if there are no duplicates when using -D/--dup-num-file. #436
    • seqkit stats:
      • New flag -S/--skip-file-check to skip input file checking when given files or a file list. It's very useful if you run it with millions of files.
  • SeqKit v2.6.1 - 2023-11-18
    • seqkit:
      • fix panic of nil pointer introduced in v2.6.0, which happens when handling multiple input files and some of them have file sizes of zero.
    • seqkit seq:
      • fix panic (close of closed channel) when using -v to check sequences.
  • SeqKit v2.6.0 - 2023-11-09
    • seqkit:
      • add the shortcut -X for the flag --infile-list.
    • seqkit common:
      • add a new flag -e/--check-embedded-seqs for detecting embedded sequences.
      • for matching by sequences: reduced the memory occupation and corrected numbers in the log. #416
    • seqkit stat:
      • add a new column AvgQual for average quality score. #411
    • seqkit split2:
      • fix the panic for invalid input.
    • seqkit subseq:
      • add a new flag -R/--region-coord for appending coordinates to sequence ID for -r/--region. #413
    • seqkit locate:
      • add a new flag -s/--max-len-to-show to show at most X characters for the search pattern or matched sequences.
    • seqkit seq:
      • change the nucleotide color theme. #412
  • SeqKit v2.5.1 - 2023-08-09
    • seqkit stats:
      • fix a concurrency bug (file name error) introduced in v2.5.0. #405
    • seqkit subseq:
      • sequence/chromosome IDs are case-sensitive now. #400
  • SeqKit v2.5.0 - 2023-07-16
    • new command seqkit merge-slides: merge sliding windows generated from seqkit sliding. #390
    • seqkit stats:
      • added a new flag -N/--N for appending other N50-like stats as new columns. #393
      • added a progress bar for > 1 input files.
      • write the result of each file immediately (no output buffer) when using -T/--tabular.
    • seqkit translate:
      • add options -s/--out-subseqs and -m/--min-len to write ORFs longer than x amino acids as individual records. #389
    • seqkit sum:
      • do not remove possible '*' by default and delete confusing warnings. Thanks to @photocyte. #399
      • added a progress bar for > 1 input files.
    • seqkit pair:
      • remove the restriction of requiring FASTQ format, i.e., FASTA files are also supported.
    • seqkit seq:
      • update help messages. #387
    • seqkit fxtab:
      • faster alphabet computation (-a/--alphabet) with a new data structure. Thanks to @elliotwutingfeng #388
    • seqkit subseq:
      • accept reverse coordinates in BED/GTF. #392
  • SeqKit v2.4.0 - 2023-03-17
    • seqkit:
      • support bzip2 format. #361
      • support setting compression level for gzip, zstd, and bzip2 format via --compress-level. #320
      • the global flag --infile-list accepts stdin (-) now.
      • wrap the help message of flags.
    • seqkit locate:
      • do not remove embedded regions when searching with regular expressions. #368
    • seqkit amplicon:
      • fix BED coordinates for amplicons found in the minus strand. #367
    • seqkit split:
      • fix forgetting to add extension for --two-pass. #332
    • seqkit stats:
      • fix compute Q1 and Q3 of sequence length for one record. #353
    • seqkit grep:
      • fix count number (-C) for matching with mismatch (-m > 0). #370
    • seqkit replace:
      • add some flags for editing only a matching subset of records; these flags are ported from seqkit grep. #348
    • seqkit faidx:
      • allow empty lines at the end of sequences.
    • seqkit faidx/sort/shuffle/split/subseq:
      • new flag -U/--update-faidx: update the FASTA index file if it exists, to guarantee the index file matches the FASTA files. #364
      • improve log info and update help message. #365
    • seqkit seq:
      • allow filtering sequences of length zero. thanks to @penglbio.
    • seqkit rename:
      • new flag -s/--separator for setting separator between original ID/name and the counter (default "_"). #360
      • new flag -N/--start-num for setting starting count number for duplicated IDs/names (default 2). #360
      • new flag -1/--rename-1st-rec for renaming the first record as well. #360
      • do not append space if there's no description after the sequence ID.
    • seqkit sliding:
      • new flag -S/--suffix for changing the suffix added to the sequence ID (default: "_sliding").
  • SeqKit v2.3.1 - 2022-09-22
    • seqkit grep/locate: fix bug of FMIndex building for empty sequences. #321
    • seqkit split2: fix bug of splitting two FASTA files. #325
    • seqkit faidx: --id-regexp works now.
  • SeqKit v2.3.0 - 2022-08-12
    • seqkit grep/rename:
      • reduce memory consumption for a large number of search patterns, and it's faster. #305
      • 2X faster -s/--by-seq.
    • seqkit split:
      • fix outputting an empty file when the number of sequences equals the split size. #293
      • add options to set output file prefix and extension. #296
    • seqkit split2:
      • reduce memory consumption. #304
      • add options to set output file prefix.
    • seqkit stats:
      • add GC content. #294
  • SeqKit v2.2.0 - 2022-03-14

    • seqkit:
      • add support of xz and zstd input/output formats. #274
      • fix panic when reading records with header of ID + blanks.
    • new command seqkit sum: computing message digest for all sequences in FASTA/Q files. The idea comes from @photocyte and the format borrows from seqhash #262
    • new command seqkit fa2fq: retrieving corresponding FASTQ records by a FASTA file
    • seqkit split2:
      • new flag -e/--extension for forcing compression or changing compression format. #276
      • support changing output prefix via -o/--out-file. #275
    • seqkit concat:
      • fix handling of multiple seqs with the same ID in one file. #269
      • performing outer/full join. #270
      • preserve the comments. #271
    • seqkit locate:
      • parallelizing -F/--use-fmi and -m for large number of search patterns.
    • seqkit amplicon:
      • new flag -M/--output-mismatches to append the total mismatches and mismatches of 5' end and 3' end. #286
    • seqkit grep:
      • detect FASTA/Q symbol @ and > in the searching patterns and show warnings.
      • add new flag -C/--count, like grep -c in GNU grep. #267
    • seqkit range:
      • support removing leading 100 seqs (seqkit range -r 101:-1 == tail -n +101). #279
    • seqkit subseq:
      • report error when no options were given.
    • update doc:
      • seqkit head: add doc for "seqkit tail": seqkit range -N:-1 seqs.fasta. #272
      • seqkit rmdup: add the note of only the first record being saved for duplicates. #265
  • SeqKit v2.1.0 - 2021-11-15

    • seqkit seq:
      • fix filtering by average quality -Q/-R. #257
    • seqkit convert:
      • fix quality encoding checking, change default value of -N/--thresh-B-in-n-most-common from 4 to 2. #254 and #239
    • seqkit split:
      • fix writing an extra empty file when using --two-pass. #244
    • seqkit subseq:
      • fix --bed, which failed to recognize strand ".".
    • seqkit fq2fa:
      • faster, and do not wrap sequences.
    • seqkit grep/locate/mutate:
      • detect unquoted comma and show warning message, e.g., -p 'A{2,}'. #250
  • SeqKit v2.0.0 - 2021-08-27
    • Performance improvements
      • seqkit:
        • faster FASTA/Q reading and writing, especially on FASTQ, see the benchmark.
          • reading (plain text): 4X faster. seqkit stats dataset_C.fq
          • reading (gzip files): 45% faster. seqkit stats dataset_C.fq.gz
          • reading + writing (plain text): 3.5X faster. seqkit grep -p . -v dataset_C.fq -o t
          • reading + writing (gzip files): 2.2X faster. seqkit grep -p . -v dataset_C.fq.gz -o t.gz
        • change default value of -j/--threads from 2 to 4, which is faster for writing gzip files.
      • seqkit seq:
        • fix writing speed, which was slowed down in v0.12.1.
    • Breaking changes
      • seqkit grep/rmdup/common:
        • consider reverse complement sequence by default for comparing by sequence, add flag -P/--only-positive-strand. #215
      • seqkit rename:
        • rename ID only, do not append original header to new ID. #236
      • seqkit fx2tab:
        • for -s/--seq-hash: outputting MD5 instead of hash value (integers) of xxhash. #219
    • Bugfixes
      • seqkit seq:
        • fix failing to output gzipped format for file name with extension of .gz since v0.12.1.
      • seqkit tab2fx:
        • fix bug for very long sequences. #214
      • seqkit fish:
        • fix range check. #213
      • seqkit grep:
        • it's not exactly a bug: forgot to use multi-threads for -m > 0.
    • New features/enhancements
      • seqkit grep:
        • allow empty pattern files.
      • seqkit faidx:
        • support region with begin > end, i.e., returning reverse complement sequence
        • add new flag -l/--region-file: file containing a list of regions.
      • seqkit fx2tab:
        • new flag -Q/--no-qual for disabling quality output even for FASTQ files. #221
      • seqkit amplicon:
        • new flag -u/--save-unmatched for saving records that do not match any primer.
      • seqkit sort:
        • new flag -b/--by-bases for sorting by non-gap bases, for multiple sequence alignment files. #216
  • SeqKit v0.16.1 - 2021-05-20
    • seqkit shuffle --two-pass: fix bug introduced in #173. #209
    • seqkit pair: fix a dangerous bug: when input files are not in current directory, input files were overwritten.
  • SeqKit v0.16.0 - 2021-04-16
    • new command seqkit head-genome:
      • print sequences of the first genome with common prefixes in name
    • seqkit grep/locate/amplicon -m:
      • much faster (300-400x) searching with mismatch allowed by optimizing FM-indexing and parallelization.
      • new flag -I/--immediate-output.
    • seqkit grep/locate:
      • fix bug of -m when the query contains letters not in the alphabet, usually for protein sequences. #178, #179
      • only search on the positive strand when searching unlimited or protein sequences.
    • seqkit locate:
      • removing debug info for -r introduced in a0f6b6e. #180
    • seqkit amplicon:
      • fix bug of -m, when mismatch is allowed.
    • seqkit fx2tab:
      • new flag -C/--base-count for counting bases. #183
    • seqkit tab2fx:
      • fix a rare bug. #197
    • seqkit subseq:
      • fix bug for BED with empty columns. #195
    • seqkit genautocomplete:
      • support bash|zsh|fish|powershell.
  • SeqKit v0.15.0 - 2021-01-12
    • seqkit grep/locate: update help message.
    • seqkit grep: search on both strand when searching by sequence.
    • seqkit split2: fix redundant log when using -s.
    • seqkit bam: new field RightSoftClipSeq. #172
    • seqkit sample -2: remove extra \n. #173
    • seqkit split2 -l: fix bug for splitting by accumulative length. This bug occurs when the first record is longer than -l; no sequences are lost.
  • SeqKit v0.14.0 - 2020-10-30
    • new command seqkit pair: match up paired-end reads from two fastq files, faster than fastq-pair.
    • seqkit translate: new flag -F/--append-frame for optionally adding frame info to the ID. #159
    • seqkit stats: reduce memory usage when using -a for calculating N50. #153
    • seqkit mutate: fix inserting sequence with -i/--insertion. This bug occurs in some cases when the insertion site is large; don't worry if no error was reported.
    • seqkit replace:
      • new flag -U/--keep-untouched: do not change anything when no value found for the key (only for sequence name).
      • do not support editing FASTQ sequence.
    • seqkit grep/locate: new flag --circular for supporting circular genome. #158
  • SeqKit v0.13.2 - 2020-07-13
    • seqkit sana: fix bug causing hanging on empty files. #149
  • SeqKit v0.13.1 - 2020-07-09
    • seqkit sana: fix bug causing hanging on empty files. #148
  • SeqKit v0.13.0 - 2020-07-07
    • seqkit: fix a rare FASTA/Q parser bug. #127
    • seqkit seq: output sequence or quality in single line when -s/--seq or -q/--qual is on. #132
    • seqkit translate: delete debug info, #133, and fix typo. #134
    • seqkit split2: tiny performance improvement. #137
    • seqkit stats: new flag -i/--stdin-label for replacing default "-" for stdin. #139
    • seqkit fx2tab: new flag -s/--seq-hash for printing hash of sequence (case sensitive). #144
    • seqkit amplicon:
      • fix bug of missing searching reverse strand. #140
      • supporting degenerate bases now. #83
      • new flag -p/--primer-file for reading list of primer pairs. #142
      • new flag --bed for outputting in BED6+1 format. #141
    • New features and improvements by @bsipos. #130, #147
      • new command seqkit scat, for real-time robust concatenation of fastx files.
      • Rewrote the parser behind the sana subcommand, now it supports robust parsing of fasta file as well.
      • Added a "toolbox" feature to the bam subcommand (-T), which is a collection of filters acting on streams of BAM records configured through a YAML string (see the docs for more).
      • Added the SEQKIT_THREADS environmental variable to override the default number of threads.
  • SeqKit v0.12.1 - 2020-04-21
    • seqkit bam: add colorised and pretty printed output, by @bsipos. #110
    • seqkit locate/grep: fix bug of -m, when query contains letters not in subject sequences. #124
    • seqkit split2: new flag -l/--by-length for splitting into chunks of N bases.
    • seqkit fx2tab:
      • new flag -I/--case-sensitive for calculating case sensitive base content. #108
      • add missing column name for average quality for -H -q. #115
      • fix output of -n/--only-name, do not write empty columns of sequence and quality. #104, #115
    • seqkit seq: new flag -k/--color: colorize sequences.
  • SeqKit v0.12.0 - 2020-02-18
    • seqkit:
      • fix checking input file existence.
      • new global flag --infile-list for long list of input files, if given, they are appended to files from cli arguments.
    • seqkit faidx: supporting "truncated" (no ending newline character) file.
    • seqkit seq:
      • do not force switching on -g when using -m/-M.
      • show recommendation if flag -t/--seq-type is not DNA/RNA when computing complement sequence. #103
    • seqkit translate: supporting multiple frames. #96
    • seqkit grep/locate:
      • add detection and warning for space existing in search pattern/sequence.
      • speed improvement (2X) for -m/--max-mismatch. shenwei356/bwt/issues/3
    • seqkit locate:
      • new flag -M/--hide-matched for hiding matched sequences. #98
      • new flag -r/--use-regexp for explicitly using regular expression, so improve speed of default index operation. And you have to switch this on if using regexp now. #101
      • new flag -F/--use-fmi for improving search speed for lots of sequence patterns.
    • seqkit rename: making IDs unique across multiple files, and can write into multiple files. #100
    • seqkit sample: fix stdin checking for flag -2. #102.
    • seqkit rename/split/split2: fix detection of existing outdir.
    • seqkit split: fix bug of seqkit split -i -2 and parallelize it.
    • seqkit version: checking update is optional (-u).
  • SeqKit v0.11.0 - 2019-09-25
    • seqkit: fix hanging when reading from truncated gzip file.
    • new commands:
      • seqkit amplicon: retrieve amplicon (or specific region around it) via primer(s).
    • new commands by @bsipos:
      • seqkit watch: monitoring and online histograms of sequence features.
      • seqkit sana: sanitize broken single line fastq files.
      • seqkit fish: look for short sequences in larger sequences using local alignment.
      • seqkit bam: monitoring and online histograms of BAM record features.
    • seqkit grep/locate: reduce memory occupation when using flag -m/--max-mismatch.
    • seqkit seq: fix panic of computing complement sequence for long sequences containing illegal letters without flag -v on. #84
  • SeqKit v0.10.2 - 2019-07-30
    • seqkit: fix bug of parsing sequence ID delimited by tab (\t). #78
    • seqkit grep: better logic of --delete-matched.
    • seqkit common/rmdup/split: use xxhash to replace MD5 when comparing with sequence, discard flag -m/--md5.
    • seqkit stats: new flag -b/--basename for outputting basename instead of full path.
  • SeqKit v0.10.1 - 2019-02-27
    • seqkit fx2tab: new option -q/--avg-qual for outputting average read quality. #60
    • seqkit grep/locate: fix support of X when using -d/--degenerate. #61
    • seqkit translate:
      • new flag -M/--init-codon-as-M to translate initial codon at beginning to 'M'. #62
      • translates --- to - for aligned DNA/RNA, flag -X needed. #63
      • supports codons containing ambiguous bases, e.g., GGN->G, ATH->I. #64
      • new flag -l/--list-transl-table to show details of translate table N
      • new flag -L/--list-transl-table-with-amb-codons to show details of translate table N (including ambiguous codons)
    • seqkit split/split2: fix bug of ignoring -O when reading from stdin.
  • SeqKit v0.10.0 - 2018-12-24
    • seqkit: report error when input is directory.
    • new command seqkit mutate: edit sequence (point mutation, insertion, deletion).
  • SeqKit v0.9.3 - 2018-12-02
    • seqkit stats: fix panic for empty file. #57
    • seqkit translate: add flag -x/--allow-unknown-codon to translate unknown codon to X.
  • SeqKit v0.9.2 - 2018-11-16
    • seqkit: stricter checking for value of global flag -t/--seq-type.
    • seqkit sliding: fix bug for flag -g/--greedy. #54
    • seqkit translate: fix bug for frame < 0. #55
    • seqkit seq: add TAB to default blank characters (flag -G/--gap-letters), and fix filter result when using flag -g/--remove-gaps along with -m/--min-len or -M/--max-len
  • SeqKit v0.9.1 - 2018-10-12
    • seqkit faidx: fix bug of retrieving subsequence with multiple regions on same sequence. #48
    • seqkit sort/shuffle/split: fix bug when using -2/--two-pass to process .gz files. #52
  • SeqKit v0.9.0 - 2018-09-26
    • seqkit: better handling of empty files, no error message shown. #36
    • new subcommand seqkit split2: split sequences into files by size/parts (FASTA, PE/SE FASTQ). #35
    • new subcommand seqkit translate: translate DNA/RNA to protein sequence. #28
    • seqkit sort: fix bug when using -2 -i, and add support for sorting in natural order. #39
    • seqkit grep and seqkit locate: add experimental support of mismatch when searching subsequences. #14
    • seqkit stats: add stats of Q20 and Q30 for FASTQ. #45
  • SeqKit v0.8.1 - 2018-06-29
    • seqkit: do not call pigz or gzip for decompressing gzipped file any more. But you can still utilize pigz or gzip by pigz -d -c seqs.fq.gz | seqkit xxx.
    • seqkit subseq: fix bug of missing quality when using --gtf or --bed
    • seqkit stats: parallelize counting files, it's much faster for lots of small files, especially for files on SSD
  • SeqKit v0.8.0 - 2018-03-22
    • seqkit: stricter FASTA/Q format requirement, i.e., must start with > or @.
    • seqkit: fix output format for FASTQ files containing zero-length records, yes this happens.
    • seqkit: add amino acid code O (pyrrolysine) and U (selenocysteine).
    • seqkit replace: add flag --nr-width to fill leading 0s for {nr}, useful for preparing sequence submission (">strain_00001 XX", ">strain_00002 XX").
    • seqkit subseq: require BED file to be tab-delimited.
  • SeqKit v0.7.2 - 2017-12-03
    • seqkit tab2fx: fix a concurrency bug that occurs with low probability when only 1-column data is provided.
    • seqkit stats: add quartiles of sequence length
    • seqkit faidx: add support for retrieving subsequence using seq ID and region, which is similar to "samtools faidx" but has some extra features
  • SeqKit v0.7.1 - 2017-09-22
    • seqkit convert: fix bug of read quality containing only 3 or fewer values. shenwei356/bio/issues/3
    • seqkit stats: add option -T/--tabular to output in machine-friendly tabular format. #23
    • seqkit common: increase speed and decrease memory occupation, and add some notes.
    • fix some typos. #22
    • suggestion: please install pigz to gain better parsing performance for gzipped data.
  • SeqKit v0.7.0 - 2017-08-12
    • add new command convert for converting FASTQ quality encoding between Sanger, Solexa and Illumina. Thanks to @cviner for the suggestion (#18). usage & example.
    • add new command range for printing FASTA/Q records in a range (start:end). #19. usage & example.
    • add new command concat for concatenating sequences with same ID from multiple files. usage & example.
  • SeqKit v0.6.0 - 2017-06-21
    • add new command genautocomplete to generate shell autocompletion script! (#17)
    • add new command seqkit dup for duplicating sequences (#16)
    • seqkit stats -a does not show L50, which may cause confusion (#15)
    • seqkit subseq --bed: more robust for bad BED files
  • SeqKit v0.5.5 - 2017-05-10
    • Increasing speed of reading .gz file by utilizing gzip (1.3X), it would be much faster if you installed pigz (2X).
    • Fixing colorful output in Windows
    • seqkit locate: add flag --gtf and --bed to output GTF/BED6 format, so the result can be used in seqkit subseq.
    • seqkit subseq: fix bug of --bed, add checking coordinate.
  • SeqKit v0.5.4 - 2017-04-11
    • seqkit subseq --gtf, add flag --gtf-tag to set tag that's outputted as sequence comment
    • fix seqkit split and seqkit sample: sequence and quality were wrongly wrapped in FASTQ output
    • compile with go1.8.1
  • SeqKit v0.5.3 - 2017-04-01
    • seqkit grep: fix bug when using seqkit grep -r -f patternfile: all records will be retrieved due to failing to discard the blank pattern (""). #11
  • SeqKit v0.5.2 - 2017-03-24
    • seqkit stats -a and seqkit seq -g -G: change default gap letters from '- ' to '- .'
    • seqkit subseq: fix bug of range overflow when using -d/--down-stream or -u/--up-stream for retrieving subseq using BED (--bed) or GTF (--gtf) file.
    • seqkit locate: add flag -G/--non-greedy, non-greedy mode, faster but may miss motifs overlapping with others.
  • SeqKit v0.5.1 - 2017-03-12
    • seqkit restart: fix bug of flag parsing
  • SeqKit v0.5.0 - 2017-03-11
    • new command seqkit restart, for resetting start position for circular genome.
    • seqkit sliding: add flag -g/--greedy, exporting last subsequences even if shorter than the window size.
    • seqkit seq:
      • add flag -m/--min-len and -M/--max-len to filter sequences by length.
      • rename flag -G/--gap-letter to -G/--gap-letters.
    • seqkit stat:
      • renamed to seqkit stats, don't worry, old name is still available as an alias.
      • add new flag -a/--all, for all statistics, including sum_gap, N50, and L50.
  • SeqKit v0.4.5 - 2017-02-26
    • seqkit seq: fix bug of failing to reverse quality of FASTQ sequence
  • SeqKit v0.4.4 - 2017-02-17
    • seqkit locate: fix bug of missing regular-expression motifs containing non-DNA characters (e.g., ACT.{6,7}CGG) from motif file (-f).
    • compiled with go v1.8.
  • SeqKit v0.4.3 - 2016-12-22
    • fix bug of seqkit stat: min_len was always 0 in versions v0.4.0, v0.4.1, and v0.4.2
  • SeqKit v0.4.2 - 2016-12-21
    • fix header information of seqkit subseq when retrieving upstream and downstream sequences using GTF/BED file.
  • SeqKit v0.4.1 - 2016-12-16
    • enhancement: remove redundant regions for seqkit locate.
  • SeqKit v0.4.0 - 2016-12-07
    • fix bug of seqkit locate, e.g., only finding two locations (1-4, 7-10, missing 4-7) of ACGA in ACGACGACGA.
    • better output of seqkit stat for empty file.
  • SeqKit v0.3.9 - 2016-12-04
    • fix bug of region selection for blank sequences. affected commands include seqkit subseq --region, seqkit grep --region, seqkit split --by-region.
    • compile with go1.8beta1.
  • SeqKit v0.3.8.1 - 2016-11-25
    • enhancement and bugfix of seqkit common: two or more same files allowed, fix log information of number of extracted sequences in the first file.
  • SeqKit v0.3.8 - 2016-11-24
    • enhancement of seqkit common: better handling of files containing replicated sequences
  • SeqKit v0.3.7 - 2016-11-23
    • fix bug in seqkit split --by-id when sequence ID contains invalid characters for system path.
    • add more flags validation for seqkit replace.
    • enhancement: raise error when key pattern matches multiple targets in cases of replacing with key-value files and more controls are added.
    • changes: do not wrap sequence and quality in output for FASTQ format.
  • SeqKit v0.3.6 - 2016-11-03
    • add new feature for seqkit grep: new flag -R (--region) for specifying sequence region for searching.
  • SeqKit v0.3.5 - 2016-10-30
    • fix bug of seqkit grep: flag -i (--ignore-case) did not work when not using regular expression
  • SeqKit v0.3.4.1 - 2016-09-21
    • improve performance of reading (~10%) and writing (100%) gzip-compressed file by using github.com/klauspost/pgzip package
    • add citation
  • SeqKit v0.3.4 - 2016-09-17 Github Releases (by Release)
    • bugfix: seq wrongly handled only the first sequence file when multiple files were given
    • new feature: fx2tab can output alphabet letters of a sequence by flag -a (--alphabet)
    • new feature: new flag -K (--keep-key) for replace; when replacing with a key-value file, one can choose whether to keep the key as the value.
  • SeqKit v0.3.3 - 2016-08-18 Github Releases (by Release)
    • fix bug of seqkit replace, wrongly starting from 2 when using {nr} in -r (--replacement)
    • new feature: seqkit replace supports replacement symbols {nr} (record number) and {kv} (corresponding value of the key ($1) by key-value file)
  • SeqKit v0.3.2 - 2016-08-13 Github Releases (by Release)
    • fix bug of seqkit split, error when target file is in a directory.
    • improve performance of seqkit sliding for big sequences, and output the last part even if it is shorter than the window size; FASTQ output is also supported.
  • SeqKit v0.3.1.1 - 2016-08-07 Github Releases (by Release)
    • compile with go1.7rc5, with higher performance and smaller size of binary file
  • SeqKit v0.3.1 - 2016-08-02 Github Releases (by Release)
    • improve speed of seqkit locate
  • SeqKit v0.3.0 - 2016-07-28 Github Releases (by Release)
    • use fork of github.com/brentp/xopen, using zcat for speedup of .gz file reading on *nix systems.
    • improve speed of parsing sequence ID when creating FASTA index
    • reduce memory usage of seqkit subseq --gtf
    • fix bug of seqkit subseq when using flag --id-ncbi
    • fix bug of seqkit split, outdir error
    • fix bug of seqkit seq -p: the last base failed to be converted when the sequence length was odd.
    • add "sum_len" result for output of seqkit stat
  • seqkit v0.2.9 - 2016-07-24 Github Releases (by Release)
    • fix minor bug of seqkit split and seqkit shuffle, header name error due to improper use of pointer
    • add option -O (--out-dir) to seqkit split
  • seqkit v0.2.8 - 2016-07-19 Github Releases (by Release)
    • improve speed of parsing sequence ID, not using regular expression for default --id-regexp
    • improve speed of record output for small sequences
    • fix minor bug: seqkit seq for blank record
    • update benchmark result
  • seqkit v0.2.7 - 2016-07-18 Github Releases (by Release)
    • reduce memory usage by optimizing the output of sequences. Detail: using BufferedByteSliceWrapper to reuse bytes.Buffer.
    • reduce memory usage and improve speed by using a custom buffered reading mechanism instead of the standard library bufio, which is slow for large genome sequences.
    • discard strategy of "buffer" and "chunk" of FASTA/Q records, just parse records one by one.
    • delete global flags -c (--chunk-size) and -b (--buffer-size).
    • add function testing scripts
  • seqkit v0.2.6 - 2016-07-01 Github Releases (by Release)
    • fix bug of seqkit subseq: the in-place subseq method led to wrong results
  • seqkit v0.2.5.1 Github Releases (by Release)
    • fix a bug of seqkit subseq: the chromosome name was not converted to lower case when using --gtf or --bed
  • seqkit v0.2.5 - 2016-07-01 Github Releases (by Release)
    • fix a serious bug introduced in v0.2.3, caused by using an unsafe method to convert string to []byte
    • add awk-like built-in variable of record number ({NR}) for seqkit replace
  • seqkit v0.2.4.1 - 2016-06-12 Github Releases (by Release)
    • fix several bugs from library bio, affected situations:
      • Locating patterns in sequences by pattern FASTA file: seqkit locate -f
      • Reading FASTQ files containing records whose quality line starts with +
    • add command version
  • seqkit v0.2.4 - 2016-05-31 Github Releases (by Release)
    • add subcommand head
  • seqkit v0.2.3 - 2016-05-08 Github Releases (by Release)
    • reduce memory usage by avoiding copying data when converting string to []byte
    • speed up reverse-complement by avoiding repeated function calls
  • seqkit v0.2.2 - 2016-05-06 Github Releases (by Release)
    • reduce memory occupation of subcommands that use FASTA index
  • seqkit v0.2.1 - 2016-05-02 Github Releases (by Release)
    • improve output performance.
    • fix bug of seqkit seq -g for FASTA format
    • some other minor fix of code and docs
    • update benchmark results
  • seqkit v0.2.0 - 2016-04-29 Github Releases (by Release)
    • reduce memory usage of writing output
    • fix bug of subseq, shuffle, sort when reading from stdin
    • reduce memory usage of faidx
    • make sequence validation optional in the seq command, which saves some time.
  • seqkit v0.1.9 - 2016-04-26 Github Releases (by Release)
    • using custom FASTA index file extension: .seqkit.fai
    • reducing memory usage of sample --number --two-pass
    • change default CPU number to 2 for multi-CPU computers, and 1 for single-CPU computers
  • seqkit v0.1.8 - 2016-04-24 Github Releases (by Release)
    • add subcommand rename to rename duplicated IDs
    • add subcommand faidx to create FASTA index file
    • utilize faidx to improve performance of subseq
    • shuffle, sort and split support two-pass mode (by flag -2) with faidx to reduce memory usage.
    • document update
  • seqkit v0.1.7 - 2016-04-21 Github Releases (by Release)
    • add support for (multi-line) FASTQ format
    • update document, add technical details
    • rename subcommands fa2tab and tab2fa to fx2tab and tab2fx
    • add subcommand fq2fa
    • add column "seq_format" to stat
    • add global flag -b (--buffer-size)
    • minor changes to flags in subseq and some other commands
  • seqkit v0.1.6 - 2016-04-07 Github Releases (by Release)
    • add subcommand replace
  • seqkit v0.1.5.2 - 2016-04-06 Github Releases (by Release)
    • fix bug of grep, when not using flag -r, flag -i will not take effect.
  • seqkit v0.1.5.1 Github Releases (by Release)
    • fix result of seqkit sample -n
    • fix benchmark script
  • seqkit v0.1.5 - 2016-03-29 Github Releases (by Release)
    • add global flag --id-ncbi
    • add flag -d (--dup-seqs-file) and -D (--dup-num-file) for subcommand rmdup
    • make using MD5 as an optional flag -m (--md5) in subcommand rmdup and common
    • fix file name suffix of seqkit split result
    • minor modification of sliding output
  • seqkit v0.1.4.1 - 2016-03-27 Github Releases (by Release)
    • change alignment of stat output
    • more precise control of the number of CPUs
  • seqkit v0.1.4 - 2016-03-25 Github Releases (by Release)
    • add subcommand sort
    • improve subcommand subseq: support getting subsequences by GTF and BED files
    • change name format of sliding result
    • prettier output of stat
  • seqkit v0.1.3.1 - 2016-03-16 Github Releases (by Release)
    • Performance improvement by reducing time of cleaning spaces
    • Document update
  • seqkit v0.1.3 - 2016-03-15 Github Releases (by Release)
    • Further performance improvement
    • Rename subcommand extract to grep
    • Change default value of flag --threads back to the number of CPUs of the current device, and change default value of flag --chunk-size back to 10000 sequences.
    • Update benchmark
  • seqkit v0.1.2 - 2016-03-14 Github Releases (by Release)
    • Add flag --dna2rna and --rna2dna to subcommand seq.
  • seqkit v0.1.1 - 2016-03-13 Github Releases (by Release)
    • 5.5X speedup of FASTA file parsing by avoiding regular expressions for removing spaces (detail) and by using slice indexing instead of a map to validate letters (detail)
    • Change default value of global flag --threads to 1, since most of the subcommands are I/O intensive. For computation-intensive jobs, like extract and locate, you may set a bigger value.
    • Change default value of global flag --chunk-size to 100.
    • Add subcommand stat
    • Fix bug of failing to automatically detect alphabet when only one record in file.
  • seqkit v0.1 - 2016-03-11 Github Releases (by Release)
    • first release of seqkit
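
The seqkit locate fix in v0.4.0 concerns overlapping matches: ACGA occurs in ACGACGACGA at 1-4, 4-7, and 7-10. The following is a minimal Python sketch of the idea only, not SeqKit's actual implementation (SeqKit is written in Go); the function name `find_overlapping` is hypothetical and exists only for this illustration.

```python
def find_overlapping(pattern: str, sequence: str) -> list[tuple[int, int]]:
    """Return 1-based, inclusive (start, end) positions of every occurrence
    of pattern in sequence, including occurrences that overlap each other."""
    if not pattern:
        return []
    hits = []
    start = sequence.find(pattern)
    while start != -1:
        hits.append((start + 1, start + len(pattern)))
        # Advance by one base rather than by len(pattern); jumping past the
        # whole match is what would skip the overlapping hit at 4-7.
        start = sequence.find(pattern, start + 1)
    return hits


print(find_overlapping("ACGA", "ACGACGACGA"))  # [(1, 4), (4, 7), (7, 10)]
```

Resuming the search one position after each hit keeps every overlapping location, at the cost of rescanning some bases.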