merge-search-results
$ lexicmap utils merge-search-results -h
Merge a query's search results from multiple indexes
Attention:
1. These search results should come from the same ONE query.
If not, please specify one query with the flag -q/--query
2. We assume that genome IDs are distinct across all indexes.
3. One or more input files are accepted, via positional parameters
and/or a file list via the flag -X/--infile-list.
4. Both the default 20- and 24-column formats are supported,
and formats better be consistent across all input files.
If not, the output format would be the one with a valid record.
Usage:
lexicmap utils merge-search-results [flags]
Flags:
-b, --buffer-size string ► Size of buffer, supported unit: K, M, G. You need increase the value
when "bufio.Scanner: token too long" error reported (default "20M")
-h, --help help for merge-search-results
-o, --out-file string ► Out file, supports the ".gz" suffix ("-" for stdout). (default "-")
-q, --query string ► Query ID to merge
Global Flags:
-X, --infile-list string ► File of input file list (one file per line). If given, they are
appended to files from CLI arguments.
--log string ► Log file.
--quiet ► Do not print any verbose information. But you can write them to a file
with --log.
-j, --threads int ► Number of CPU cores to use. By default, it uses all available cores.
(default 16)
Building 3 indexes with demo data.
lexicmap index -O demo_part1.lmi -X <(ls refs/*.fa.gz | awk 'NR<=5')
lexicmap index -O demo_part2.lmi -X <(ls refs/*.fa.gz | awk 'NR>=6 && NR<=10')
lexicmap index -O demo_part3.lmi -X <(ls refs/*.fa.gz | awk 'NR>10')
Searching a query in these 3 indexes.
query=bench/b.gene_E_coli_16S.fasta
# in practice, one can search in multiple cluster nodes
for index in demo_*.lmi; do
lexicmap search $query -d $index -o t.lexicmap@$index.tsv.gz --debug
done
Merge search results
lexicmap utils merge-search-results -o t.lexicmap.tsv.gz t.lexicmap@*.tsv.gz
22:41:03.963 [INFO] 15 genome hits merged from 3 files for query: NC_000913.3:4166659-4168200
If some files contain search results of multiple queries, then specify one query to merge.
# search with multiple queries
for index in demo_*.lmi; do
lexicmap search bench/*.fasta -d $index -o t.lexicmap2@$index.tsv.gz --debug
done
# this would failed
lexicmap utils merge-search-results -o t.lexicmap2.tsv.gz t.lexicmap2@*.tsv.gz
22:46:07.807 [ERRO] inconsistent queries: 'blaLAQ__NG_076677.1' in file 't.lexicmap2@demo_part2.lmi.tsv.gz' and 'blaSHV__NG_242606.1' in file 't.lexicmap2@demo_part1.lmi.tsv.gz. Please specify one query with flag -q/--query
# specify one query
lexicmap utils merge-search-results -o t.lexicmap2.tsv.gz t.lexicmap2@*.tsv.gz \
-q NC_000913.3:4166659-4168200
22:49:01.076 [INFO] 15 genome hits merged from 3 files for query: NC_000913.3:4166659-4168200