bcftools(1)

BCFTOOLS(1)                                                        BCFTOOLS(1)

NAME

       bcftools - utilities for variant calling and manipulating VCFs and
       BCFs.

SYNOPSIS

       bcftools [--version|--version-only] [--help] [COMMAND] [OPTIONS]

DESCRIPTION

       BCFtools is a set of utilities that manipulate variant calls in the
       Variant Call Format (VCF) and its binary counterpart BCF. All commands
       work transparently with both VCFs and BCFs, both uncompressed and
       BGZF-compressed.

       Most commands accept VCF, bgzipped VCF and BCF, with the file type
       detected automatically even when streaming from a pipe. Indexed VCF and
       BCF will work in all situations. Unindexed VCF and BCF and streams will
       work in most, but not all, situations. In general, whenever multiple
       VCFs are read simultaneously, they must be indexed and therefore also
       compressed. (Note that files with non-standard index names can be
       accessed as e.g. "bcftools view -r X:2928329
       file.vcf.gz##idx##non-standard-index-name".)

       BCFtools is designed to work on a stream. It regards an input file "-"
       as the standard input (stdin) and outputs to the standard output
       (stdout). Several commands can thus be combined with Unix pipes.

   VERSION
       This manual page was last updated 2021-07-07 and refers to bcftools git
       version 1.13.

   BCF1
       The BCF1 format output by versions of samtools <= 0.1.19 is not
       compatible with this version of bcftools. To read BCF1 files one can
       use the view command from old versions of bcftools packaged with
       samtools versions <= 0.1.19 to convert to VCF, which can then be read
       by this version of bcftools.

               samtools-0.1.19/bcftools/bcftools view file.bcf1 | bcftools view

   VARIANT CALLING
       See bcftools call for variant calling from the output of the samtools
       mpileup command. In versions of samtools <= 0.1.19 calling was done
       with bcftools view. Users are now required to choose between the old
       samtools calling model (-c/--consensus-caller) and the new multiallelic
       calling model (-m/--multiallelic-caller). The multiallelic calling
       model is recommended for most tasks.

LIST OF COMMANDS

       For a full list of available commands, run bcftools without arguments.
       For a full list of available options, run bcftools COMMAND without
       arguments.

       •   annotate   .. edit VCF files, add or remove annotations

       •   call       .. SNP/indel calling (former "view")

       •   cnv        .. Copy Number Variation caller

       •   concat     .. concatenate VCF/BCF files from the same set of samples

       •   consensus  .. create consensus sequence by applying VCF variants

       •   convert    .. convert VCF/BCF to other formats and back

       •   csq        .. haplotype aware consequence caller

       •   filter     .. filter VCF/BCF files using fixed thresholds

       •   gtcheck    .. check sample concordance, detect sample swaps and
           contamination

       •   index      .. index VCF/BCF

       •   isec       .. intersections of VCF/BCF files

       •   merge      .. merge VCF/BCF files from non-overlapping sample
           sets

       •   mpileup    .. multi-way pileup producing genotype likelihoods

       •   norm       .. normalize indels

       •   plugin     .. run user-defined plugin

       •   polysomy   .. detect contaminations and whole-chromosome
           aberrations

       •   query      .. transform VCF/BCF into user-defined formats

       •   reheader   .. modify VCF/BCF header, change sample names

       •   roh        .. identify runs of homo/auto-zygosity

       •   sort       .. sort VCF/BCF files

       •   stats      .. produce VCF/BCF stats (former vcfcheck)

       •   view       .. subset, filter and convert VCF and BCF files

LIST OF SCRIPTS

       Some helper scripts are bundled with the bcftools code.

       •   plot-vcfstats  .. plots the output of stats

COMMANDS AND OPTIONS

   Common Options
       The following options are common to many bcftools commands. See usage
       for specific commands to see if they apply.

       FILE
           Files can be either VCF or BCF, uncompressed or BGZF-compressed. The
           file "-" is interpreted as standard input. Some tools may require
           tabix- or CSI-indexed files.

       -c, --collapse snps|indels|both|all|some|none|id
           Controls how to treat records with duplicate positions and defines
           compatible records across multiple input files. Here by
           "compatible" we mean records which should be considered as
           identical by the tools. For example, when performing line
           intersections, the desire may be to consider as identical all sites
           with matching positions (bcftools isec -c all), or only sites with
           matching variant type (bcftools isec -c snps -c indels), or only
           sites with all alleles identical (bcftools isec -c none).

           none
               only records with identical REF and ALT alleles are compatible

           some
               only records where some subset of ALT alleles match are
               compatible

           all
               all records are compatible, regardless of whether the ALT
               alleles match or not. In the case of records with the same
               position, only the first will be considered and appear on
               output.

           snps
               any SNP records are compatible, regardless of whether the ALT
               alleles match or not. For duplicate positions, only the first
               SNP record will be considered and appear on output.

           indels
               all indel records are compatible, regardless of whether the
               REF and ALT alleles match or not. For duplicate positions, only
               the first indel record will be considered and appear on output.

           both
               abbreviation of "-c indels -c snps"

           id
               only records with identical ID column are compatible. Supported
               by bcftools merge only.

       -f, --apply-filters LIST
           Skip sites where FILTER column does not contain any of the strings
           listed in LIST. For example, to include only sites which have no
           filters set, use -f .,PASS.
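
           The selection rule of -f can be mimicked on an uncompressed VCF with
           awk. The sketch below only illustrates the semantics described
           above and is not how bcftools implements them. The function name
           keep_filters is hypothetical, and the sketch assumes that FILTER is
           the seventh tab-separated column and that multiple filter names are
           separated by semicolons.

```shell
# keep_filters LIST: read an uncompressed VCF on stdin and print header lines
# plus records whose FILTER column contains at least one string from LIST.
# Hypothetical helper for illustration; bcftools does this with -f LIST.
keep_filters() {
    awk -F'\t' -v list="$1" '
        BEGIN { n = split(list, want, ",") }
        /^#/  { print; next }                    # always keep the header
        {
            m = split($7, have, ";")             # FILTER may hold several names
            for (i = 1; i <= m; i++)
                for (j = 1; j <= n; j++)
                    if (have[i] == want[j]) { print; next }
        }'
}
```

           Calling keep_filters '.,PASS' on a stream then roughly corresponds
           to bcftools view -f .,PASS on the same input.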

       --no-version
           Do not append version and command line information to the output
           VCF header.

       -o, --output FILE
           When output consists of a single stream, write it to FILE rather
           than to standard output, where it is written by default.

       -O, --output-type b|u|z|v
           Output compressed BCF (b), uncompressed BCF (u), compressed VCF
           (z), uncompressed VCF (v). Use the -Ou option when piping between
           bcftools subcommands to speed up performance by removing
           unnecessary compression/decompression and VCF←→BCF conversion.

       -r, --regions chr|chr:pos|chr:beg-end|chr:beg-[,...]
           Comma-separated list of regions, see also -R, --regions-file.
           Overlapping records are matched even when the starting coordinate
           is outside of the region, unlike the -t/-T options where only the
           POS coordinate is checked. Note that -r cannot be used in
           combination with -R.

       -R, --regions-file FILE
           Regions can be specified either on command line or in a VCF, BED,
           or tab-delimited file (the default). The columns of the
           tab-delimited file can contain either positions (two-column format)
           or intervals (three-column format): CHROM, POS, and, optionally,
           END, where positions are 1-based and inclusive. The columns of the
           tab-delimited BED file are also CHROM, POS and END (trailing
           columns are ignored), but coordinates are 0-based, half-open. To
           indicate that a file be treated as BED rather than the 1-based
           tab-delimited file, the file must have the ".bed" or ".bed.gz"
           suffix (case-insensitive). Uncompressed files are stored in memory,
           while bgzip-compressed and tabix-indexed region files are streamed.
           Note that sequence names must match exactly: "chr20" is not the
           same as "20". Also note that chromosome ordering in FILE will be
           respected; the VCF will be processed in the order in which
           chromosomes first appear in FILE. However, within chromosomes, the
           VCF will always be processed in ascending genomic coordinate order
           no matter what order they appear in FILE. Note that overlapping
           regions in FILE can result in duplicated, out-of-order positions in
           the output. This option requires indexed VCF/BCF files. Note that
           -R cannot be used in combination with -r.
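
           The two coordinate conventions are easy to confuse, so a small
           conversion sketch may help. It rewrites BED intervals (0-based,
           half-open) into the 1-based, inclusive three-column layout. The
           helper name bed_to_tab is hypothetical and only the first three
           columns are considered.

```shell
# bed_to_tab: convert 0-based, half-open BED intervals (stdin) into the
# 1-based, inclusive CHROM/POS/END layout of the tab-delimited regions format.
# Hypothetical helper name; only the first three columns are used.
bed_to_tab() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ { next }                   # skip comment lines, if any
         { print $1, $2 + 1, $3 }'       # start shifts by one, end is unchanged
}

# The same interval written both ways: bases 100..200 of sequence "20".
printf '20\t99\t200\n' | bed_to_tab      # prints 20, 100 and 200, tab-separated
```

           A file converted this way describes the same bases as the BED
           original and would be saved without the ".bed" suffix so that it is
           read as a 1-based tab-delimited file.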

       -s, --samples [^]LIST
           Comma-separated list of samples to include, or to exclude if
           prefixed with "^". The sample order is updated to reflect that given
           on the command line. Note that in general tags such as INFO/AC,
           INFO/AN, etc. are not updated to correspond to the subset samples.
           bcftools view is the exception where some tags will be updated
           (unless the -I, --no-update option is used; see bcftools view
           documentation). To use updated tags for the subset in another
           command one can pipe from view into that command. For example:

               bcftools view -Ou -s sample1,sample2 file.vcf | bcftools query -f '%INFO/AC\t%INFO/AN\n'

       -S, --samples-file FILE
           File of sample names to include, or to exclude if prefixed with
           "^". One sample per line. See also the note above for the -s,
           --samples option. The sample order is updated to reflect that given
           in the input file. The command bcftools call accepts an optional
           second column indicating ploidy (0, 1 or 2) or sex (as defined by
           --ploidy, for example "F" or "M"), for example:

               sample1    1
               sample2    2
               sample3    2

           or

               sample1    M
               sample2    F
               sample3    F

           If the second column is not present, the sex "F" is assumed. With
           bcftools call -C trio, a PED file is expected. The program ignores
           the first column and the last indicates sex (1=male, 2=female), for
           example:

               ignored_column  daughterA fatherA  motherA  2
               ignored_column  sonB      fatherB  motherB  1
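
           When the same cohort is described by a PED file but a plain
           two-column samples file is needed, the sex column can be derived
           mechanically. The following is a minimal sketch under stated
           assumptions: the helper name ped_to_samples is hypothetical, and
           the "M"/"F" codes are assumed to match the codes of the ploidy
           definition in use.

```shell
# ped_to_samples: turn PED-style lines (stdin) into a two-column samples file.
# Column 2 is taken as the sample name and the last column as sex (1=male,
# 2=female), as in the PED example above. Hypothetical helper; the M/F codes
# are an assumption and must match the ploidy definition in use.
ped_to_samples() {
    awk '{
        sex = ($NF == 1) ? "M" : (($NF == 2) ? "F" : "")
        if (sex == "") {
            print "unrecognised sex code on line " NR > "/dev/stderr"
            exit 1
        }
        print $2 "\t" sex
    }'
}
```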

       -t, --targets [^]chr|chr:pos|chr:from-to|chr:from-[,...]
           Similar to -r, --regions, but the next position is accessed by
           streaming the whole VCF/BCF rather than using the tbi/csi index.
           Both -r and -t options can be applied simultaneously: -r uses the
           index to jump to a region and -t discards positions which are not
           in the targets. Unlike -r, targets can be prefixed with "^" to
           request logical complement. For example, "^X,Y,MT" indicates that
           sequences X, Y and MT should be skipped. Yet another difference
           between -t/-T and -r/-R is that -r/-R checks for proper overlaps
           and considers both POS and the end position of an indel, while
           -t/-T considers the POS coordinate only. Note that -t cannot be
           used in combination with -T.
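
           The difference between the overlap rule of -r/-R and the POS-only
           rule of -t/-T can be made concrete with a toy model. This is a
           simplified sketch, not bcftools code: the helper name match_modes is
           hypothetical, and a record is assumed to span POS through
           POS+length(REF)-1, ignoring symbolic alleles and INFO/END.

```shell
# match_modes BEG END: read "POS REF" pairs on stdin and report, for the
# interval BEG-END, whether the record would be selected under the overlap
# rule described for -r and under the POS-only rule described for -t.
# Simplified model (assumption): the record spans POS .. POS+length(REF)-1.
match_modes() {
    awk -v lo="$1" -v hi="$2" '{
        pos = $1; last = pos + length($2) - 1
        r = (pos <= hi && last >= lo) ? "yes" : "no"   # any overlap counts
        t = (pos >= lo && pos <= hi)  ? "yes" : "no"   # POS itself must be inside
        print pos, $2, "-r:" r, "-t:" t
    }'
}
```

           For instance, a deletion with POS 99 and REF "ACGT" reaches into the
           interval 100-200, so in this model it is matched under the -r rule
           but not under the -t rule.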

       -T, --targets-file [^]FILE
           Same as -t, --targets, but reads regions from a file. Note that -T
           cannot be used in combination with -t.

           With the call -C alleles command, the third column of the targets
           file must be a comma-separated list of alleles, starting with the
           reference allele. Note that the file must be compressed and
           indexed. Such a file can be easily created from a VCF using:

               bcftools query -f'%CHROM\t%POS\t%REF,%ALT\n' file.vcf | bgzip -c > als.tsv.gz && tabix -s1 -b2 -e2 als.tsv.gz

       --threads INT
           Use multithreading with INT worker threads. The option is currently
           used only for the compression of the output stream, only when
           --output-type is b or z. Default: 0.

   bcftools annotate [OPTIONS] FILE
       Add or remove annotations.

       -a, --annotations file
           Bgzip-compressed and tabix-indexed file with annotations. The file
           can be VCF, BED, or a tab-delimited file with mandatory columns
           CHROM, POS (or, alternatively, FROM and TO), optional columns REF
           and ALT, and an arbitrary number of annotation columns. BED files
           are expected to have the ".bed" or ".bed.gz" suffix
           (case-insensitive), otherwise a tab-delimited file is assumed. Note
           that in the case of a tab-delimited file, the coordinates POS, FROM
           and TO are one-based and inclusive. When REF and ALT are present,
           only matching VCF records will be annotated. When multiple ALT
           alleles are present in the annotation file (given as a
           comma-separated list of alleles), at least one must match one of
           the alleles in the corresponding VCF record. Similarly, at least
           one alternate allele from a multi-allelic VCF record must be
           present in the annotation file. Missing values can be added by
           providing "." in place of the actual value. Note that flag types,
           such as "INFO/FLAG", can be annotated by including a field with the
           value "1" to set the flag, "0" to remove it, or "." to keep
           existing flags. See also -c, --columns and -h, --header-lines.

               # Sample annotation file with columns CHROM, POS, STRING_TAG, NUMERIC_TAG
               1  752566  SomeString      5
               1  798959  SomeOtherString 6

       --collapse snps|indels|both|all|some|none
           Controls how to match records from the annotation file to the
           target VCF. Effective only when -a is a VCF or BCF. See Common
           Options for more.

       -c, --columns list
           Comma-separated list of columns or tags to carry over from the
           annotation file (see also -a, --annotations). If the annotation
           file is not a VCF/BCF, list describes the columns of the annotation
           file and must include CHROM, POS (or, alternatively, FROM and TO),
           and optionally REF and ALT. Unused columns which should be ignored
           can be indicated by "-".

           If the annotation file is a VCF/BCF, only the edited columns/tags
           must be present and their order does not matter. The columns ID,
           QUAL, FILTER, INFO and FORMAT can be edited, where INFO tags can be
           written both as "INFO/TAG" or simply "TAG", and FORMAT tags can be
           written as "FORMAT/TAG" or "FMT/TAG". The imported VCF annotations
           can be renamed as "DST_TAG:=SRC_TAG" or "FMT/DST_TAG:=FMT/SRC_TAG".

           To carry over all INFO annotations, use "INFO". To add all INFO
           annotations except "TAG", use "^INFO/TAG". By default, existing
           values are replaced.

           To add annotations without overwriting existing values (that is, to
           add missing tags or add values to existing tags with missing
           values), use "+TAG" instead of "TAG". To append to existing values
           (rather than replacing or leaving untouched), use "=TAG" (instead
           of "TAG" or "+TAG"). To replace only existing values without
           modifying missing annotations, use "-TAG". To match the record also
           by ID, in addition to REF and ALT, use "~ID".

           If the annotation file is not a VCF/BCF, all new annotations must
           be defined via -h, --header-lines.

           See also the -l, --merge-logic option.

       -C, --columns-file file
           Read the list of columns from a file (normally given via the -c,
           --columns option). Use "-" to skip a column of the annotation file.
           One column name per row; an additional space- or tab-separated
           field can be present to indicate the merge logic (normally given
           via the -l, --merge-logic option). This is useful when many
           annotations are added at once.

       -e, --exclude EXPRESSION
           exclude sites for which EXPRESSION is true. For valid expressions
           see EXPRESSIONS.

       --force
           continue even when parsing errors, such as undefined tags, are
           encountered. Note this can be an unsafe operation and can result in
           corrupted BCF files. If this option is used, make sure to sanity
           check the result thoroughly.

       -h, --header-lines file
           Lines to append to the VCF header, see also -c, --columns and -a,
           --annotations. For example:

               ##INFO=<ID=NUMERIC_TAG,Number=1,Type=Integer,Description="Example header line">
               ##INFO=<ID=STRING_TAG,Number=1,Type=String,Description="Yet another header line">

       -I, --set-id [+]FORMAT
           assign ID on the fly. The format is the same as in the query
           command (see below). By default all existing IDs are replaced. If
           the format string is preceded by "+", only missing IDs will be set.
           For example, one can use

               bcftools annotate --set-id +'%CHROM\_%POS\_%REF\_%FIRST_ALT' file.vcf

       -i, --include EXPRESSION
           include only sites for which EXPRESSION is true. For valid
           expressions see EXPRESSIONS.

       -k, --keep-sites
           keep sites which do not pass -i and -e expressions instead of
           discarding them

       -l, --merge-logic tag:first|append|append-missing|unique|sum|avg|min|max[,...]
           When multiple regions overlap a single record, this option defines
           how to treat multiple annotation values when setting tag in the
           destination file: use the first encountered value ignoring the rest
           (first); append allowing duplicates (append); append even if the
           appended value is missing, i.e. is a dot (append-missing); append
           discarding duplicate values (unique); sum the values (sum, numeric
           fields only); average the values (avg); use the minimum value (min)
           or the maximum (max).

           Note that this option is intended for use with BED or TAB-delimited
           annotation files only. Moreover, it is effective only when either
           REF and ALT or BEG and END --columns are present.

           Multiple rules can be given either as a comma-separated list or by
           giving the option multiple times. This is an experimental feature.
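
           To see what each merge rule produces, the rules can be replayed on a
           short list of values. The sketch below only mirrors the wording of
           the rules above and is not taken from the bcftools source. The
           helper name merge_values is hypothetical, and skipping missing
           values (".") in the numeric rules and in plain append is an
           assumption.

```shell
# merge_values RULE: combine whitespace-separated annotation values read from
# stdin according to RULE. Hypothetical helper that mirrors the description
# of the rules; the handling of "." in numeric rules is an assumption.
merge_values() {
    awk -v rule="$1" '
    { for (i = 1; i <= NF; i++) v[++n] = $i }
    END {
        if (n == 0) exit
        out = ""; sum = 0; cnt = 0
        for (i = 1; i <= n; i++) {
            x = v[i]
            if (rule == "first") { out = v[1]; break }
            if (rule == "append" || rule == "append-missing" || rule == "unique") {
                if (x == "." && rule != "append-missing") continue
                if (rule == "unique" && (x in seen)) continue
                seen[x] = 1
                out = (out == "") ? x : out "," x
                continue
            }
            if (x == ".") continue               # numeric rules skip missing values
            cnt++; sum += x
            if (cnt == 1 || x + 0 < lo) lo = x + 0
            if (cnt == 1 || x + 0 > hi) hi = x + 0
        }
        if (rule == "sum") out = sum
        else if (rule == "avg") out = (cnt ? sum / cnt : ".")
        else if (rule == "min") out = (cnt ? lo : ".")
        else if (rule == "max") out = (cnt ? hi : ".")
        print out
    }'
}
```

           For example, the values 1, 2 and 2 give "1,2,2" under append, "1,2"
           under unique and 5 under sum.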

       -m, --mark-sites TAG
           annotate sites which are present ("+") or absent ("-") in the -a
           file with a new INFO/TAG flag

       --no-version
           see Common Options

       -o, --output FILE
           see Common Options

       -O, --output-type b|u|z|v
           see Common Options

       -r, --regions chr|chr:pos|chr:from-to|chr:from-[,...]
           see Common Options

       -R, --regions-file file
           see Common Options

       --rename-annots file
           rename annotations according to the map in file, with "old_name
           new_name\n" pairs separated by whitespaces, each on a separate
           line. The old name must be prefixed with the annotation type: INFO,
           FORMAT, or FILTER.

       --rename-chrs file
           rename chromosomes according to the map in file, with "old_name
           new_name\n" pairs separated by whitespaces, each on a separate
           line.

       -s, --samples [^]LIST
           subset of samples to annotate, see also Common Options

       -S, --samples-file FILE
           subset of samples to annotate. If the samples are named differently
           in the target VCF and the -a, --annotations VCF, the name mapping
           can be given as "src_name dst_name\n", separated by whitespaces,
           each pair on a separate line.

       --single-overlaps
           use this option to keep memory requirements low with very large
           annotation files. Note, however, that this comes at a cost: only
           single overlapping intervals are considered in this mode. This was
           the default mode until the commit af6f0c9 (Feb 24 2019).

       --threads INT
           see Common Options

       -x, --remove list
           List of annotations to remove. Use "FILTER" to remove all filters
           or "FILTER/SomeFilter" to remove a specific filter. Similarly,
           "INFO" can be used to remove all INFO tags and "FORMAT" to remove
           all FORMAT tags except GT. To remove all INFO tags except "FOO" and
           "BAR", use "^INFO/FOO,INFO/BAR" (and similarly for FORMAT and
           FILTER). "INFO" can be abbreviated to "INF" and "FORMAT" to "FMT".

       Examples:

               # Remove three fields
               bcftools annotate -x ID,INFO/DP,FORMAT/DP file.vcf.gz

               # Remove all INFO fields and all FORMAT fields except for GT and PL
               bcftools annotate -x INFO,^FORMAT/GT,FORMAT/PL file.vcf

               # Add ID, QUAL and INFO/TAG, not replacing TAG if already present
               bcftools annotate -a src.bcf -c ID,QUAL,+TAG dst.bcf

               # Carry over all INFO and FORMAT annotations except FORMAT/GT
               bcftools annotate -a src.bcf -c INFO,^FORMAT/GT dst.bcf

               # Annotate from a tab-delimited file with six columns (the fifth is ignored),
               # first indexing with tabix. The coordinates are 1-based.
               tabix -s1 -b2 -e2 annots.tab.gz
               bcftools annotate -a annots.tab.gz -h annots.hdr -c CHROM,POS,REF,ALT,-,TAG file.vcf

               # Annotate from a tab-delimited file with regions (1-based coordinates, inclusive)
               tabix -s1 -b2 -e3 annots.tab.gz
               bcftools annotate -a annots.tab.gz -h annots.hdr -c CHROM,FROM,TO,TAG input.vcf

               # Annotate from a bed file (0-based coordinates, half-closed, half-open intervals)
               bcftools annotate -a annots.bed.gz -h annots.hdr -c CHROM,FROM,TO,TAG input.vcf

               # Transfer the INFO/END tag, matching by POS,REF,ALT and ID. This example assumes
               # that INFO/END is already present in the VCF header.
               bcftools annotate -a annots.tab.gz -c CHROM,POS,~ID,REF,ALT,INFO/END input.vcf

               # For more examples see http://samtools.github.io/bcftools/howtos/annotate.html

   bcftools call [OPTIONS] FILE
       This command replaces the former bcftools view caller. Some of the
       original functionality has been temporarily lost in the process of
       transition to htslib <http://github.com/samtools/htslib>, but will be
       added back on popular demand. The original calling model can be
       invoked with the -c option.

   File format options:
       --no-version
           see Common Options

       -o, --output FILE
           see Common Options

       -O, --output-type b|u|z|v
           see Common Options

       --ploidy ASSEMBLY[?]
           predefined ploidy, use list (or any other unused word) to print a
           list of all predefined assemblies. Append a question mark to print
           the actual definition. See also --ploidy-file.

       --ploidy-file FILE
           ploidy definition given as a space/tab-delimited list of CHROM,
           FROM, TO, SEX, PLOIDY. The SEX codes are arbitrary and correspond
           to the ones used by --samples-file. The default ploidy can be given
           using the starred records (see below); unlisted regions have ploidy
           2. The default ploidy definition is

               X 1 60000 M 1
               X 2699521 154931043 M 1
               Y 1 59373566 M 1
               Y 1 59373566 F 0
               MT 1 16569 M 1
               MT 1 16569 F 1
               *  * *     M 2
               *  * *     F 2
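
           How such a definition resolves to a ploidy for a given site can be
           sketched with a small lookup. The helper name ploidy_at is
           hypothetical, and the precedence used here (a listed region first,
           then the starred default for that sex, then 2) is an assumption
           based on the description above rather than on the bcftools source.

```shell
# ploidy_at FILE CHROM POS SEX: look up the ploidy in a ploidy definition
# with the columns CHROM, FROM, TO, SEX, PLOIDY. Hypothetical helper; the
# precedence (listed region, then starred default, then 2) is an assumption.
ploidy_at() {
    awk -v chr="$2" -v pos="$3" -v sex="$4" '
        $1 == "*" && $4 == sex { dflt = $5; next }
        ($1 "") == (chr "") && $4 == sex && pos >= $2 && pos <= $3 { hit = $5; found = 1 }
        END {
            if (found) print hit
            else if (dflt != "") print dflt
            else print 2
        }' "$1"
}

# The default definition quoted above, written to a scratch file.
ploidy_file=$(mktemp)
cat > "$ploidy_file" <<'EOF'
X 1 60000 M 1
X 2699521 154931043 M 1
Y 1 59373566 M 1
Y 1 59373566 F 0
MT 1 16569 M 1
MT 1 16569 F 1
*  * *     M 2
*  * *     F 2
EOF

ploidy_at "$ploidy_file" X 100 M      # inside the first listed X interval
ploidy_at "$ploidy_file" Y 500 F      # Y is listed with ploidy 0 for F
```

           Sites outside every listed interval, such as X:100000 for "M" or any
           autosomal position, fall through to the starred default.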

       -r, --regions chr|chr:pos|chr:from-to|chr:from-[,...]
           see Common Options

       -R, --regions-file file
           see Common Options

       -s, --samples LIST
           see Common Options

       -S, --samples-file FILE
           see Common Options

       -t, --targets LIST
           see Common Options

       -T, --targets-file FILE
           see Common Options

       --threads INT
           see Common Options

   Input/output options:
       -A, --keep-alts
           output all alternate alleles present in the alignments even if they
           do not appear in any of the genotypes

       -f, --format-fields list
           comma-separated list of FORMAT fields to output for each sample.
           Currently GQ and GP fields are supported. For convenience, the
           fields can be given as lower case letters. A "^" prefix requests
           removal of auxiliary tags that are useful only for calling.

       -F, --prior-freqs AN,AC
           take advantage of prior knowledge of population allele frequencies.
           The workflow looks like this:

               # Extract AN,AC values from an existing VCF, such as 1000Genomes
               bcftools query -f'%CHROM\t%POS\t%REF\t%ALT\t%AN\t%AC\n' 1000Genomes.bcf | bgzip -c > AFs.tab.gz

               # If the tags AN,AC are not already present, use the +fill-tags plugin
               bcftools +fill-tags 1000Genomes.bcf | bcftools query -f'%CHROM\t%POS\t%REF\t%ALT\t%AN\t%AC\n' | bgzip -c > AFs.tab.gz
               tabix -s1 -b2 -e2 AFs.tab.gz

               # Create a VCF header description, here we name the tags REF_AN,REF_AC
               cat AFs.hdr
               ##INFO=<ID=REF_AN,Number=1,Type=Integer,Description="Total number of alleles in reference genotypes">
               ##INFO=<ID=REF_AC,Number=A,Type=Integer,Description="Allele count in reference genotypes for each ALT allele">

               # Now before calling, stream the raw mpileup output through `bcftools annotate` to add the frequencies
               bcftools mpileup [...] -Ou | bcftools annotate -a AFs.tab.gz -h AFs.hdr -c CHROM,POS,REF,ALT,REF_AN,REF_AC -Ou | bcftools call -mv -F REF_AN,REF_AC [...]

       -G, --group-samples FILE|-
           by default, all samples are assumed to come from a single
           population. This option allows grouping samples into populations
           and applying the HWE assumption within but not across the
           populations. FILE is a tab-delimited text file with sample names in
           the first column and group names in the second column. If - is
           given instead, no HWE assumption is made at all and single-sample
           calling is performed. (Note that in low coverage data this inflates
           the rate of false positives.) The -G option requires the presence
           of per-sample FORMAT/QS or FORMAT/AD tag generated with bcftools
           mpileup -a QS (or -a AD).

       -g, --gvcf INT
           output also gVCF blocks of homozygous REF calls. The parameter INT
           is the minimum per-sample depth required to include a site in the
           non-variant block.

       -i, --insert-missed INT
           output also sites missed by mpileup but present in -T,
           --targets-file.

       -M, --keep-masked-ref
           output sites where REF allele is N

       -V, --skip-variants snps|indels
           skip indel/SNP sites

       -v, --variants-only
           output variant sites only

   Consensus/variant calling options:
       -c, --consensus-caller
           the original samtools/bcftools calling method (conflicts with -m)

       -C, --constrain alleles|trio

           alleles
               call genotypes given alleles. See also -T, --targets-file.

           trio
               call genotypes given the father-mother-child constraint. See
               also -s, --samples and -n, --novel-rate.

       -m, --multiallelic-caller
           alternative model for multiallelic and rare-variant calling
           designed to overcome known limitations in -c calling model
           (conflicts with -c)

       -n, --novel-rate float[,...]
           likelihood of novel mutation for constrained -C trio calling. The
           trio genotype calling maximizes likelihood of a particular
           combination of genotypes for father, mother and the child
           P(F=i,M=j,C=k) = P(unconstrained) * Pn + P(constrained) * (1-Pn).
           By providing three values, the mutation rate Pn is set explicitly
           for SNPs, deletions and insertions, respectively. If two values are
           given, the first is interpreted as the mutation rate of SNPs and
           the second is used to calculate the mutation rate of indels
           according to their length as Pn=float*exp(-a-b*len), where
           a=22.8689, b=0.2994 for insertions and a=21.9313, b=0.2856 for
           deletions [pubmed:23975140]. If only one value is given, the same
           mutation rate Pn is used for SNPs and indels.
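
           The length-dependent indel rate used in the two-value form can be
           evaluated directly to get a feel for its magnitude. This is a
           minimal sketch of the formula Pn=float*exp(-a-b*len) with the
           constants quoted above; the helper name indel_novel_rate and its
           argument order are hypothetical, and it does not call bcftools.

```shell
# indel_novel_rate RATE TYPE LEN: evaluate Pn = RATE * exp(-a - b*LEN) with
# the constants quoted above (TYPE is "ins" or "del"). Hypothetical helper
# for checking magnitudes only.
indel_novel_rate() {
    awk -v rate="$1" -v type="$2" -v len="$3" 'BEGIN {
        if (type == "ins")      { a = 22.8689; b = 0.2994 }
        else if (type == "del") { a = 21.9313; b = 0.2856 }
        else { print "TYPE must be ins or del" > "/dev/stderr"; exit 1 }
        printf "%.3e\n", rate * exp(-a - b * len)
    }'
}
```

           Because b is positive for both indel types, the resulting rate
           decreases as the indel length grows.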

       -p, --pval-threshold float
           with -c, accept variant if P(ref|D) < float.

       -P, --prior float
           expected substitution rate, or 0 to disable the prior. Only with
           -m.

       -t, --targets file|chr|chr:pos|chr:from-to|chr:from-[,...]
           see Common Options

       -X, --chromosome-X
           haploid output for male samples (requires PED file with -s)

       -Y, --chromosome-Y
           haploid output for males and skips females (requires PED file with
           -s)
644
645   bcftools cnv [OPTIONS] FILE
646       Copy  number  variation  caller.  Requires  a  VCF  annotated  with
647       Illumina’s B-allele frequency (BAF) and Log  R  Ratio  intensity  (LRR)
648       values.  The  HMM  considers  the  following  copy  number states: CN 2
649       (normal), 1 (single-copy  loss),  0  (complete  loss),  3  (single-copy
650       gain).
651
652   General Options:
653       -c, --control-sample string
654           optional  control  sample  name.  If  given,  pairwise  calling  is
655           performed and the -P option can be used.
656
657       -f, --AF-file file
658           read allele frequencies from  a tab-delimited file with the columns
659           CHR,POS,REF,ALT,AF
660
661       -o, --output-dir path
662           output directory
663
664       -p, --plot-threshold float
665           call  matplotlib  to  produce plots for chromosomes with quality at
666           least float, useful for visual inspection of the calls. With -p  0,
667           plots  for  all  chromosomes  will  be  generated.  If not given, a
668           matplotlib script will be created but not called.
669
670       -r, --regions chr|chr:pos|chr:from-to|chr:from-[,...]
671           see Common Options
672
673       -R, --regions-file file
674           see Common Options
675
676       -s, --query-sample string
677           query sample name
678
679       -t, --targets LIST
680           see Common Options
681
682       -T, --targets-file FILE
683           see Common Options
684
685   HMM Options:
686       -a, --aberrant float[,float]
687           fraction of aberrant cells in query and control.  The  hallmark  of
688           duplications  and  contaminations  is the BAF value of heterozygous
689           markers which is dependent  on  the  fraction  of  aberrant  cells.
690           Sensitivity  to  smaller  fractions  of  cells  can be increased by
691           setting -a to a lower value. Note however, that this comes  at  the
692           cost of increased false discovery rate.
693
694       -b, --BAF-weight float
695           relative contribution from BAF
696
697       -d, --BAF-dev float[,float]
698           expected  BAF  deviation  in  query  and  control,  i.e.  the noise
699           observed in the data.
700
701       -e, --err-prob float
702           uniform error probability
703
704       -l, --LRR-weight float
705           relative contribution from LRR. With noisy data,  this  option  can
706           have a big effect on the number of calls produced. In truly  random
707           noise (such as in simulated data), the value  should  be  set  high
708           (1.0),  but  in  the  presence of systematic noise when LRR are not
709           informative, lower values result in cleaner calls (0.2).
710
711       -L, --LRR-smooth-win int
712           reduce LRR noise by applying moving average given this window size
713
714       -O, --optimize float
715           iteratively estimate the fraction of aberrant cells,  down  to  the
716           given  fraction.  Lowering this value from the default 1.0 to, say,
717           0.3 can help discover more events but also increases noise.
718
719       -P, --same-prob float
720           the prior probability of the query and the control sample being the
721           same.  Setting  to  0 calls both independently, setting to 1 forces
722           the same copy number state in both.
723
724       -x, --xy-prob float
725           the HMM probability of transition to  another  copy  number  state.
726           Increasing this value leads to smaller and more frequent calls.
727
728   bcftools concat [OPTIONS] FILE1 FILE2 [...]
729       Concatenate  or  combine  VCF/BCF files. All source files must have the
730       same sample columns appearing in the  same  order.  Can  be  used,  for
731       example,  to concatenate chromosome VCFs into one VCF, or combine a SNP
732       VCF and an indel VCF into one. The input files must be  sorted  by  chr
733       and  position.  The files must be given in the correct order to produce
734       sorted  VCF  on  output  unless  the  -a,  --allow-overlaps  option  is
735       specified.  With the --naive option, the files are concatenated without
736       being recompressed, which is very fast.
737
738       -a, --allow-overlaps
739           First coordinate of the next file can precede last  record  of  the
740           current file.
741
742       -c, --compact-PS
743           Do not output PS tag at each site, only at the start of a new phase
744           set block.
745
746       -d, --rm-dups snps|indels|both|all|exact
747           Output duplicate records of  specified  type  present  in  multiple
748           files only once. Requires -a, --allow-overlaps.
749
750       -D, --remove-duplicates
751           Alias for -d exact
752
753       -f, --file-list FILE
754           Read file names from FILE, one file name per line.
755
756       -l, --ligate
757           Ligate  phased  VCFs  by  matching phase at overlapping haplotypes.
758           Note that the option is intended for  VCFs  with  perfect  overlap;
759           sites in overlapping regions present in one file but missing in the
760           other are dropped.
761
762       --no-version
763           see Common Options
764
765       -n, --naive
766           Concatenate VCF or BCF files without recompression.  This  is  very
767           fast  but  requires that all files are of the same type (all VCF or
768           all BCF) and have the same headers. This is because  all  tags  and
769           chromosome  names  in  the BCF body rely on the order of the contig
770           and tag definitions in the header. A header compatibility check  is
771           performed  and the program throws an error if it is not safe to use
772           the option.
773
774       --naive-force
775           Same  as  --naive,  but  header  compatibility  is   not   checked.
776           Dangerous, use with caution.
777
778       -o, --output FILE
779           see Common Options
780
781       -O, --output-type b|u|z|v
782           see Common Options
783
784       -q, --min-PQ INT
785           Break phase set if phasing quality is lower than INT
786
787       -r, --regions chr|chr:pos|chr:from-to|chr:from-[,...]
788           see Common Options. Requires -a, --allow-overlaps.
789
790       -R, --regions-file FILE
791           see Common Options. Requires -a, --allow-overlaps.
792
793       --threads INT
794           see Common Options
795
796   bcftools consensus [OPTIONS] FILE
797       Create consensus sequence by applying VCF variants to a reference fasta
798       file. By default, the program  will  apply  all  ALT  variants  to  the
799       reference  fasta  to  obtain the consensus sequence. Using the --sample
800       (and, optionally, --haplotype) option will apply  genotype  (haplotype)
801       calls from FORMAT/GT. Note that the program does not act as a primitive
802       variant caller and ignores allelic depth information, such  as  INFO/AD
803       or FORMAT/AD. For that, consider using the setGT plugin.
804
805       -c, --chain FILE
806           write a chain file for liftover
807
808       -e, --exclude EXPRESSION
809           exclude  sites  for which EXPRESSION is true. For valid expressions
810           see EXPRESSIONS.
811
812       -f, --fasta-ref FILE
813           reference sequence in fasta format
814
815       -H, --haplotype 1|2|R|A|I|LR|LA|SR|SA|1pIu|2pIu
816           choose which allele from the FORMAT/GT field to use (the codes  are
817           case-insensitive):
818
819           1
820               the first allele, regardless of phasing
821
822           2
823               the second allele, regardless of phasing
824
825           R
826               the REF allele (in heterozygous genotypes)
827
828           A
829               the ALT allele (in heterozygous genotypes)
830
831           I
832               IUPAC code for all genotypes
833
834           LR, LA
835               the  longer  allele.  If both have the same length, use the REF
836               allele (LR), or the ALT allele  (LA)
837
838           SR, SA
839               the shorter allele. If both have the same length, use  the  REF
840               allele (SR), or the ALT allele  (SA)
841
842           1pIu, 2pIu
843               first/second  allele  for  phased  genotypes and IUPAC code for
844               unphased genotypes
845
846           This option requires -s, unless exactly one sample is present in the VCF.
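
The allele choice for the R, A, LR, LA, SR and SA codes can be sketched as below for a heterozygous REF/ALT genotype. This only restates the rules listed above; pick_allele is a hypothetical helper and not part of bcftools.

```python
def pick_allele(ref: str, alt: str, code: str) -> str:
    """Choose between REF and ALT of a heterozygous genotype by -H code."""
    code = code.upper()  # the codes are case-insensitive
    if code == "R":
        return ref
    if code == "A":
        return alt
    if code in ("LR", "LA", "SR", "SA"):
        if len(ref) == len(alt):
            # Tie: the second letter of the code decides (R = REF, A = ALT).
            return ref if code[1] == "R" else alt
        longer, shorter = (ref, alt) if len(ref) > len(alt) else (alt, ref)
        return longer if code[0] == "L" else shorter
    raise ValueError(f"code not covered by this sketch: {code}")
```

For example, with REF=A and ALT=AT the code LR selects AT, while SR selects A.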
847
848       -i, --include EXPRESSION
849           include  only  sites  for  which  EXPRESSION  is  true.  For  valid
850           expressions see EXPRESSIONS.
851
852       -I, --iupac-codes
853           output variants in the form of IUPAC ambiguity codes
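
For reference, the standard IUPAC ambiguity codes for a pair of single-base alleles can be looked up as in the sketch below. It covers only two-base combinations of A, C, G and T; how bcftools treats indels or more than two alleles is not covered here, and iupac_code is a name chosen for this illustration.

```python
# Standard IUPAC nucleotide codes for one or two distinct bases.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
}


def iupac_code(allele1: str, allele2: str) -> str:
    """Return the IUPAC code representing the two single-base alleles."""
    return IUPAC[frozenset((allele1.upper(), allele2.upper()))]
```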
854
855       --mark-del CHAR
856           instead of removing sequence, insert CHAR for deletions
857
858       --mark-ins uc|lc
859           highlight  inserted  sequence  in uppercase (uc) or lowercase (lc),
860           leaving the rest of the sequence as is
861
862       --mark-snv uc|lc
863           highlight  substitutions  in  uppercase  (uc)  or  lowercase  (lc),
864           leaving the rest of the sequence as is
865
866       -m, --mask FILE
867           BED  file  or  TAB  file  with  regions  to be replaced with N (the
868           default) or as  specified  by  the  next  --mask-with  option.  See
869           discussion  of  --regions-file  in  Common  Options for file format
870           details.
871
872       --mask-with CHAR|lc|uc
873           replace  sequence  from  --mask  with  CHAR,  skipping  overlapping
874           variants, or change to lowercase (lc) or uppercase (uc)
875
876       -M, --missing CHAR
877           instead  of  skipping  the  missing genotypes, output the character
878           CHAR (e.g. "?")
879
880       -o, --output FILE
881           write output to a file
882
883       -s, --sample NAME
884           apply variants of the given sample
885
886       Examples:
887
888               # Apply variants present in sample "NA001", output IUPAC codes for hets
889               bcftools consensus -I -s NA001 -f in.fa in.vcf.gz > out.fa
890
891               # Create consensus for one region. The fasta header lines are then expected
892               # in the form ">chr:from-to".
893               samtools faidx ref.fa 8:11870-11890 | bcftools consensus in.vcf.gz -o out.fa
894
895   bcftools convert [OPTIONS] FILE
896   VCF input options:
897       -e, --exclude EXPRESSION
898           exclude sites for which EXPRESSION is true. For  valid  expressions
899           see EXPRESSIONS.
900
901       -i, --include EXPRESSION
902           include  only  sites  for  which  EXPRESSION  is  true.  For  valid
903           expressions see EXPRESSIONS.
904
905       -r, --regions chr|chr:pos|chr:from-to|chr:from-[,...]
906           see Common Options
907
908       -R, --regions-file FILE
909           see Common Options
910
911       -s, --samples LIST
912           see Common Options
913
914       -S, --samples-file FILE
915           see Common Options
916
917       -t, --targets LIST
918           see Common Options
919
920       -T, --targets-file FILE
921           see Common Options
922
923   VCF output options:
924       --no-version
925           see Common Options
926
927       -o, --output FILE
928           see Common Options
929
930       -O, --output-type b|u|z|v
931           see Common Options
932
933       --threads INT
934           see Common Options
935
936   GEN/SAMPLE conversion:
937       -G, --gensample2vcf prefix or gen-file,sample-file
938           convert IMPUTE2 output to VCF. The second column  must  be  of  the
939           form  "CHROM:POS_REF_ALT"  to detect possible strand swaps; IMPUTE2
940           leaves the first one empty ("--") when sites from  reference  panel
941           are filled in. See also -g below.
942
943       -g, --gensample prefix or gen-file,sample-file
944           convert  from VCF to gen/sample format used by IMPUTE2 and SHAPEIT.
945           The columns of .gen file format  are  ID1,ID2,POS,A,B  followed  by
946           three  genotype  probabilities P(AA), P(AB), P(BB) for each sample.
947           In order to prevent strand swaps, the program uses IDs of the  form
948           "CHROM:POS_REF_ALT". For example:
949
950             .gen
951             ----
952             1:111485207_G_A 1:111485207_G_A 111485207 G A 0 1 0 0 1 0
953             1:111494194_C_T 1:111494194_C_T 111494194 C T 0 1 0 0 0 1
954
955             .sample
956             -------
957             ID_1 ID_2 missing
958             0 0 0
959             sample1 sample1 0
960             sample2 sample2 0
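
A minimal sketch of reading one .gen line in the layout described above (ID1, ID2, POS, A, B, then three probabilities per sample). The function name and the returned dictionary keys are illustrative only.

```python
def parse_gen_line(line: str) -> dict:
    """Split a .gen line into its five leading columns and per-sample triples."""
    fields = line.split()
    id1, id2, pos, allele_a, allele_b = fields[:5]
    probs = [float(x) for x in fields[5:]]
    if len(probs) % 3 != 0:
        raise ValueError("expected three probabilities per sample")
    # Each sample contributes P(AA), P(AB), P(BB).
    triples = [tuple(probs[i:i + 3]) for i in range(0, len(probs), 3)]
    return {"id1": id1, "id2": id2, "pos": int(pos),
            "a": allele_a, "b": allele_b, "samples": triples}
```

Applied to the first .gen line of the example, this yields two samples, both with probability 1 for the AB genotype.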
961
962       --tag STRING
963           tag to take values for .gen file: GT,PL,GL,GP
964
965       --chrom
966           output chromosome in the first column instead of CHROM:POS_REF_ALT
967
968       --sex FILE
969           output sex column in the sample file. The FILE format is
970
971               MaleSample    M
972               FemaleSample  F
973
974       --vcf-ids
975           output VCF IDs in the second column instead of CHROM:POS_REF_ALT
976
977   gVCF conversion:
978       --gvcf2vcf
979           convert gVCF to VCF, expanding REF blocks into sites. Note that the
980           -i and -e options  work  differently  with  this  switch.  In  this
981           situation  the  filtering  expressions define which sites should be
982           expanded and which sites should be left unmodified, but  all  sites
983           are printed on output. In order to drop sites, stream first through
984           bcftools view.
985
986       -f, --fasta-ref file
987           reference sequence in fasta format. Must be indexed  with  samtools
988           faidx
989
990   HAP/SAMPLE conversion:
991       --hapsample2vcf prefix or hap-file,sample-file
992           convert from hap/sample format to VCF. The columns of .hap file are
993           similar to .gen file  above,  but  there  are  only  two  haplotype
994           columns  per sample. Note that the first column of the .hap file is
995           expected to be in the form "CHR:POS_REF_ALT(_END)?", with the  _END
996           being optional for defining the INFO/END tag when ALT is a symbolic
997           allele, for example:
998
999             .hap
1000             ----
1001             1:111485207_G_A rsID1 111485207 G A 0 1 0 0
1002             1:111494194_C_T rsID2 111494194 C T 0 1 0 0
1003             1:111495231_A_<DEL>_111495784 rsID3 111495231 A <DEL> 0 0 1 0
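
The ID layout "CHR:POS_REF_ALT(_END)?" can be taken apart with a regular expression along the following lines. This is a sketch under the assumption that REF contains no underscore and that a trailing "_digits" suffix is always the END coordinate; parse_hap_id is a hypothetical helper.

```python
import re

# CHR:POS_REF_ALT with an optional trailing _END (digits only).
HAP_ID = re.compile(r"^([^:]+):(\d+)_([^_]+)_(.+?)(?:_(\d+))?$")


def parse_hap_id(hap_id: str) -> dict:
    """Split a .hap ID into chromosome, position, alleles and optional END."""
    m = HAP_ID.match(hap_id)
    if m is None:
        raise ValueError(f"unexpected ID format: {hap_id}")
    chrom, pos, ref, alt, end = m.groups()
    return {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt,
            "end": int(end) if end is not None else None}
```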
1004
1005       --hapsample prefix or hap-file,sample-file
1006           convert from VCF to hap/sample format used by IMPUTE2 and  SHAPEIT.
1007           The  columns  of .hap file begin with ID,RSID,POS,REF,ALT. In order
1008           to  prevent  strand  swaps,  the  program  uses  IDs  of  the  form
1009           "CHROM:POS_REF_ALT".
1010
1011       --haploid2diploid
1012           with  -h  option  converts  haploid genotypes to homozygous diploid
1013           genotypes. For example, the program will print 0 0 instead  of  the
1014           default  0  -.  This  is  useful  for  programs which do not handle
1015           haploid genotypes correctly.
1016
1017       --sex FILE
1018           output sex column in the sample file. The FILE format is
1019
1020               MaleSample    M
1021               FemaleSample  F
1022
1023       --vcf-ids
1024           output VCF IDs instead of "CHROM:POS_REF_ALT" IDs
1025
1026   HAP/LEGEND/SAMPLE conversion:
1027       -H, --haplegendsample2vcf prefix or hap-file,legend-file,sample-file
1028           convert from hap/legend/sample format used by IMPUTE2 to  VCF,  see
1029           also -h, --haplegendsample below.
1030
1031       -h, --haplegendsample prefix or hap-file,legend-file,sample-file
1032           convert  from  VCF  to hap/legend/sample format used by IMPUTE2 and
1033           SHAPEIT. The columns of the .legend file are ID,POS,REF,ALT. In order to
1034           prevent   strand   swaps,   the   program  uses  IDs  of  the  form
1035           "CHROM:POS_REF_ALT". The .sample file is quite basic at the  moment
1036           with columns for population, group and sex expected to be edited by
1037           the user. For example:
1038
1039             .hap
1040             -----
1041             0 1 0 0 1 0
1042             0 1 0 0 0 1
1043
1044             .legend
1045             -------
1046             id position a0 a1
1047             1:111485207_G_A 111485207 G A
1048             1:111494194_C_T 111494194 C T
1049
1050             .sample
1051             -------
1052             sample population group sex
1053             sample1 sample1 sample1 2
1054             sample2 sample2 sample2 2
1055
1056       --haploid2diploid
1057           with -h option converts haploid  genotypes  to  homozygous  diploid
1058           genotypes.  For  example, the program will print 0 0 instead of the
1059           default 0 -. This is  useful  for  programs  which  do  not  handle
1060           haploid genotypes correctly.
1061
1062       --sex FILE
1063           output sex column in the sample file. The FILE format is
1064
1065               MaleSample    M
1066               FemaleSample  F
1067
1068       --vcf-ids
1069           output VCF IDs instead of "CHROM:POS_REF_ALT" IDs
1070
1071   TSV conversion:
1072       --tsv2vcf file
1073           convert  from  TSV (tab-separated values) format (such as generated
1074           by 23andMe) to VCF. The input file fields can  be  tab-  or  space-
1075           delimited.
1076
1077       -c, --columns list
1078           comma-separated  list  of  fields in the input file. In the current
1079           version, the fields CHROM, POS, ID, and AA  are  expected  and  can
1080           appear in arbitrary order. Columns which should be ignored  in  the
1081           input file can be indicated by "-". The AA field lists  alleles  on
1082           the  forward reference strand, for example "CC" or "CT" for diploid
1083           genotypes  or  "C"  for  haploid   genotypes   (sex   chromosomes).
1084           Insertions and deletions are not supported yet. Missing data can be
1085           indicated with "--".
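
How a -c column list maps onto one input row can be sketched as follows. The row values used in the usage note are invented for illustration, and parse_tsv_row is not a bcftools function.

```python
def parse_tsv_row(line: str, columns: str) -> dict:
    """Map a tab- or space-delimited row onto the names given in `columns`.

    Columns named "-" are skipped, mirroring the description above.
    """
    names = columns.split(",")
    values = line.split()
    if len(values) < len(names):
        raise ValueError("row has fewer fields than the column list")
    return {name: value for name, value in zip(names, values) if name != "-"}
```

For instance, parse_tsv_row("rs123\t1\t1000\tCT", "ID,CHROM,POS,AA") gives a record with AA set to "CT", a diploid genotype on the forward reference strand.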
1086
1087       -f, --fasta-ref file
1088           reference sequence in fasta format. Must be indexed  with  samtools
1089           faidx
1090
1091       -s, --samples LIST
1092           list of sample names. See Common Options
1093
1094       -S, --samples-file FILE
1095           file of sample names. See Common Options
1096
1097       Example:
1098
1099           # Convert 23andme results into VCF
1100           bcftools convert -c ID,CHROM,POS,AA -s SampleName -f 23andme-ref.fa --tsv2vcf 23andme.txt -Oz -o out.vcf.gz
1101
1102   bcftools csq [OPTIONS] FILE
1103       Haplotype-aware consequence predictor which correctly handles  combined
1104       variants such as MNPs split over multiple VCF records,  SNPs  separated
1105       by  an  intron  (but  adjacent  in  the  spliced  transcript) or nearby
1106       frame-shifting  indels  which  in   combination   in   fact   are   not
1107       frame-shifting.
1108
1109       The  output  VCF  is  annotated  with  INFO/BCSQ  and  FORMAT/BCSQ tags
1110       (configurable with the -c option). The latter is a bitmask  of  indexes
1111       to INFO/BCSQ, with interleaved haplotypes. See the usage examples below
1112       for using the %TBCSQ converter in query for  extracting  a  more  human
1113       readable form from this bitmask. The construction of the bitmask limits
1114       the number of consequences that can be referenced  per  sample  in  the
1115       FORMAT/BCSQ  tags. By default this is 15, but if more are required, see
1116       the --ncsq option.
1117
1118       The program requires on input a VCF/BCF file, the reference  genome  in
1119       fasta  format  (--fasta-ref)  and  genomic  features in the GFF3 format
1120       downloadable from the Ensembl website  (--gff-annot),  and  outputs  an
1121       annotated   VCF/BCF  file.  Currently,  only  Ensembl  GFF3  files  are
1122       supported.
1123
1124       By default, the input VCF should be phased. If  phase  is  unknown,  or
1125       only partially known, the --phase option can be used to indicate how to
1126       handle unphased data. Alternatively, haplotype  aware  calling  can  be
1127       turned off with the --local-csq option.
1128
1129       If   conflicting   (overlapping)  variants  within  one  haplotype  are
1130       detected, a warning will be emitted and predictions will  be  based  on
1131       only the first variant in the analysis.
1132
1133       Symbolic alleles are not supported. They will remain unannotated in the
1134       output VCF and are ignored for the prediction analysis.
1135
1136       -c, --custom-tag STRING
1137           use this custom tag to store consequences rather than  the  default
1138           BCSQ tag
1139
1140       -B, --trim-protein-seq INT
1141           abbreviate  protein-changing  predictions  to  a  maximum  of   INT
1142           amino acids. For example, instead of  writing  the  whole  modified
1143           protein sequence with potentially hundreds of amino acids, with  -B
1144           1 only an abbreviated version  such  as  25E..329>25G..94  will  be
1145           written.
1146
1147       -e, --exclude EXPRESSION
1148           exclude  sites  for which EXPRESSION is true. For valid expressions
1149           see EXPRESSIONS.
1150
1151       -f, --fasta-ref FILE
1152           reference sequence in fasta format (required)
1153
1154       --force
1155           run even if some sanity checks fail. Currently  the  option  allows
1156           skipping transcripts in malformed GFFs with incorrect phase.
1157
1158       -g, --gff-annot FILE
1159           GFF3 annotation file (required), such as <ftp://ftp.ensembl.org/
1160           pub/current_gff3/homo_sapiens>. An example of a minimal working GFF
1161           file:
1162
1163               # The program looks for "CDS", "exon", "three_prime_UTR" and "five_prime_UTR" lines,
1164               # looks up their parent transcript (determined from the "Parent=transcript:" attribute),
1165               # the gene (determined from the transcript's "Parent=gene:" attribute), and the biotype
1166               # (the most interesting is "protein_coding").
1167               #
1168               # Attributes required for
1169               #   gene lines:
1170               #   - ID=gene:<gene_id>
1171               #   - biotype=<biotype>
1172               #   - Name=<gene_name>      [optional]
1173               #
1174               #   transcript lines:
1175               #   - ID=transcript:<transcript_id>
1176               #   - Parent=gene:<gene_id>
1177               #   - biotype=<biotype>
1178               #
1179               #   other lines (CDS, exon, five_prime_UTR, three_prime_UTR):
1180               #   - Parent=transcript:<transcript_id>
1181               #
1182               # Supported biotypes:
1183               #   - see the function gff_parse_biotype() in bcftools/csq.c
1184
1185               1   ignored_field  gene            21  2148  . -   . ID=gene:GeneId;biotype=protein_coding;Name=GeneName
1186               1   ignored_field  transcript      21  2148  . -   . ID=transcript:TranscriptId;Parent=gene:GeneId;biotype=protein_coding
1187               1   ignored_field  three_prime_UTR 21  2054  . -   . Parent=transcript:TranscriptId
1188               1   ignored_field  exon            21  2148  . -   . Parent=transcript:TranscriptId
1189               1   ignored_field  CDS             21  2148  . -   1   Parent=transcript:TranscriptId
1190               1   ignored_field  five_prime_UTR  210 2148  . -   . Parent=transcript:TranscriptId
1191
1192       -i, --include EXPRESSION
1193           include  only  sites  for  which  EXPRESSION  is  true.  For  valid
1194           expressions see EXPRESSIONS.
1195
1196       -l, --local-csq
1197           switch  off  haplotype-aware  calling,  run  localized  predictions
1198           considering only one VCF record at a time
1199
1200       -n, --ncsq INT
1201           maximum  number  of per-haplotype consequences to consider for each
1202           site. The INFO/BCSQ column includes all consequences, but only  the
1203           first INT will be referenced by the FORMAT/BCSQ fields. The default
1204           value is 15 which corresponds to one  32-bit  integer  per  diploid
1205           sample,   after   accounting   for   values  reserved  by  the  BCF
1206           specification. Note that increasing the value  leads  to  increased
1207           size of the output BCF.
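
The interleaved bitmask can be decoded roughly as shown below. The bit layout is an assumption made for this sketch (bit 2*i refers to consequence i on the first haplotype, bit 2*i+1 to the same consequence on the second); it handles a single integer only, which matches the default of 15 consequences (2 x 15 = 30 bits). The %TBCSQ converter in bcftools query remains the supported way to read this field.

```python
def decode_bcsq_mask(mask: int, nhap: int = 2, ncsq: int = 15) -> list:
    """Return, for each haplotype, the INFO/BCSQ indexes whose bits are set.

    Assumed layout (not confirmed by the text above): bits are interleaved
    by haplotype, so bit nhap*i + h marks consequence i on haplotype h.
    """
    per_hap = [[] for _ in range(nhap)]
    for i in range(ncsq):
        for h in range(nhap):
            if (mask >> (nhap * i + h)) & 1:
                per_hap[h].append(i)
    return per_hap
```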
1208
1209       --no-version
1210           see Common Options
1211
1212       -o, --output FILE
1213           see Common Options
1214
1215       -O, --output-type b|t|u|z|v
1216           see  Common Options. In addition, a custom tab-delimited plain text
1217           output can be printed (t).
1218
1219       -p, --phase a|m|r|R|s
1220           how to handle unphased heterozygous genotypes:
1221
1222           a
1223               take GTs as is, create haplotypes regardless of  phase  (0/1  →
1224               0|1)
1225
1226           m
1227               merge all GTs into a single haplotype (0/1 → 1, 1/2 → 1)
1228
1229           r
1230               require phased GTs, throw an error on unphased heterozygous GTs
1231
1232           R
1233               create  non-reference  haplotypes if possible (0/1 → 1|1, 1/2 →
1234               1|2)
1235
1236           s
1237               skip unphased heterozygous GTs
1238
1239       -q, --quiet
1240           suppress warning messages
1241
1242       -r, --regions chr|chr:pos|chr:from-to|chr:from-[,...]
1243           see Common Options
1244
1245       -R, --regions-file FILE
1246           see Common Options
1247
1248       -s, --samples LIST
1249           samples to include or "-" to apply all variants and ignore samples
1250
1251       -S, --samples-file FILE
1252           see Common Options
1253
1254       -t, --targets LIST
1255           see Common Options
1256
1257       -T, --targets-file FILE
1258           see Common Options
1259
1260       Examples:
1261
1262               # Basic usage
1263               bcftools csq -f hs37d5.fa -g Homo_sapiens.GRCh37.82.gff3.gz in.vcf -Ob -o out.bcf
1264
1265               # Extract the translated haplotype consequences. The following TBCSQ variations
1266               # are recognised:
1267               #   %TBCSQ    .. print consequences in all haplotypes in separate columns
1268               #   %TBCSQ{0} .. print the first haplotype only
1269               #   %TBCSQ{1} .. print the second haplotype only
1270               #   %TBCSQ{*} .. print a list of unique consequences present in either haplotype
1271               bcftools query -f'[%CHROM\t%POS\t%SAMPLE\t%TBCSQ\n]' out.bcf
1272
1273       Examples of BCSQ annotation:
1274
1275               # Two separate VCF records at positions 2:122106101 and 2:122106102
1276               # change the same codon. This UV-induced C>T dinucleotide mutation
1277               # has been annotated fully at the position 2:122106101 with
1278               #   - consequence type
1279               #   - gene name
1280               #   - ensembl transcript ID
1281               #   - coding strand (+ fwd, - rev)
1282               #   - amino acid position (in the coding strand orientation)
1283               #   - list of corresponding VCF variants
1284               # The annotation at the second position gives the position of the full
1285               # annotation
1286               BCSQ=missense|CLASP1|ENST00000545861|-|1174P>1174L|122106101G>A+122106102G>A
1287               BCSQ=@122106101
1288
1289               # A frame-restoring combination of two frameshift insertions C>CG and T>TGG
1290               BCSQ=@46115084
1291               BCSQ=inframe_insertion|COPZ2|ENST00000006101|-|18AGRGP>18AQAGGP|46115072C>CG+46115084T>TGG
1292
1293               # Stop gained variant
1294               BCSQ=stop_gained|C2orf83|ENST00000264387|-|141W>141*|228476140C>T
1295
1296               # Consequence types of variants downstream from a stop are prefixed with *
1297               BCSQ=*missense|PER3|ENST00000361923|+|1028M>1028T|7890117T>C
1298
1299   bcftools filter [OPTIONS] FILE
1300       Apply fixed-threshold filters.
1301
1302       -e, --exclude EXPRESSION
1303           exclude sites for which EXPRESSION is true. For  valid  expressions
1304           see EXPRESSIONS.
1305
1306       -g, --SnpGap INT[:'indel',mnp,bnd,other,overlap]
1307           filter  SNPs  within  INT  base  pairs  of  an  indel   or   other
1308           variant type. The  following  example  demonstrates  the  logic  of
1309           --SnpGap 3 applied on a deletion and an insertion:
1310
1311           The SNPs at positions 1 and 7 are filtered, positions 0 and 8 are not:
1312                    0123456789
1313               ref  .G.GT..G..
1314               del  .A.G-..A..
1315           Here the positions 1 and 6 are filtered, 0 and 7 are not:
1316                    0123-456789
1317               ref  .G.G-..G..
1318               ins  .A.GT..A..
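
The window implied by the two diagrams can be reproduced with the sketch below. It assumes a left-anchored indel (REF and ALT share their first base) and uses the same coordinates as the diagrams; it is a reading of the examples, not the bcftools source, and snpgap_window is a hypothetical name.

```python
def snpgap_window(anchor: int, ref: str, alt: str, gap: int) -> tuple:
    """Return (first, last) positions in which SNPs would be filtered."""
    if len(ref) > len(alt):
        # Deletion: measure the gap from the deleted bases themselves.
        first_deleted = anchor + len(alt)
        last_deleted = anchor + len(ref) - 1
        return (first_deleted - gap, last_deleted + gap)
    if len(alt) > len(ref):
        # Insertion: the new sequence sits between anchor and anchor + 1.
        return (anchor - gap + 1, anchor + gap)
    raise ValueError("not an indel")
```

With --SnpGap 3, the deletion GT>G anchored at position 3 gives the window 1 to 7, and the insertion G>GT at the same anchor gives 1 to 6, matching the positions marked as filtered above.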
1319
1320       -G, --IndelGap INT
1321           filter  clusters  of  indels  separated  by INT or fewer base pairs
1322           allowing only one to pass. The following example  demonstrates  the
1323           logic of --IndelGap 2 applied on a deletion and an insertion:
1324
1325           The second indel is filtered:
1326                    012345678901
1327               ref  .GT.GT..GT..
1328               del  .G-.G-..G-..
1329           And similarly here, the second is filtered:
1330                    01 23 456 78
1331               ref  .A-.A-..A-..
1332               ins  .AT.AT..AT..
1333
1334       -i, --include EXPRESSION
1335           include  only  sites  for  which  EXPRESSION  is  true.  For  valid
1336           expressions see EXPRESSIONS.
1337
1338       -m, --mode [+x]
1339           define behaviour at sites with  existing  FILTER  annotations.  The
1340           default  mode  replaces existing filters of failed sites with a new
1341           FILTER  string  while  leaving  sites  which  pass  untouched  when
1342           non-empty  and  setting to "PASS" when the FILTER string is absent.
1343           The "+" mode appends new FILTER strings of failed sites instead  of
1344           replacing  them. The "x" mode resets filters of sites which pass to
1345           "PASS". Modes "+" and "x" can both be set.
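
The three behaviours can be summarised in a short sketch. The filter names q10 and LowQual in the usage note are invented, update_filter is a hypothetical helper, and the handling of an existing "PASS" value in "+" mode (it is replaced rather than appended to) is an assumption of this example.

```python
def update_filter(existing: list, failed: bool, new_filter: str, mode: str = "") -> list:
    """Return the FILTER values of a site after filtering.

    existing: current FILTER strings, [] when the field is absent.
    mode: "", "+", "x" or "+x", as accepted by -m.
    """
    if failed:
        if "+" in mode and existing and existing != ["PASS"]:
            # Append instead of replacing; avoid duplicating the name.
            return existing if new_filter in existing else existing + [new_filter]
        return [new_filter]
    # Site passes: reset with "x", or set PASS when nothing was there.
    if "x" in mode or not existing:
        return ["PASS"]
    return existing
```

For example, a failed site carrying q10 ends up with only LowQual in the default mode and with both q10 and LowQual in "+" mode.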
1346
1347       --no-version
1348           see Common Options
1349
1350       -o, --output FILE
1351           see Common Options
1352
1353       -O, --output-type b|u|z|v
1354           see Common Options
1355
1356       -r, --regions chr|chr:pos|chr:from-to|chr:from-[,...]
1357           see Common Options
1358
1359       -R, --regions-file file
1360           see Common Options
1361
1362       -s, --soft-filter STRING|+
1363           annotate FILTER column with STRING or, with +, a unique filter name
1364           generated by the program ("Filter%d").
1365
1366       -S, --set-GTs .|0
1367           set  genotypes  of failed samples to missing value (.) or reference
1368           allele (0)
1369
1370       -t, --targets chr|chr:pos|chr:from-to|chr:from-[,...]
1371           see Common Options
1372
1373       -T, --targets-file file
1374           see Common Options
1375
1376       --threads INT
1377           see Common Options
1378
1379   bcftools gtcheck [OPTIONS] [-g genotypes.vcf.gz] query.vcf.gz
1380       Checks sample identity. The program can operate in two modes. If the -g
1381       option  is  given, the identity of samples from query.vcf.gz is checked
1382       against the samples in the -g file. Without the -g option, multi-sample
1383       cross-check of samples in query.vcf.gz is performed.
1384
1385       --distinctive-sites NUM[,MEM[,DIR]]
1386           Find  sites that can distinguish between at least NUM sample pairs.
1387           If  the  number is smaller than or equal to 1, it is interpreted as
1388           the fraction of pairs. The optional MEM string sets the maximum memory
1389           used for in-memory sorting and DIR is the temporary  directory  for
1390           external sorting. This option also requires --pairs to be given.
1391
1392       --dry-run
1393           Stop after first record to estimate required time.
1394
1395       -e, --error-probability INT
1396           Interpret genotypes and genotype likelihoods probabilistically. The
1397           value of INT represents genotype quality when GT tag is used  (e.g.
1398           Q=30  represents one error in 1,000 genotypes and Q=40 one error in
1399           10,000 genotypes) and is ignored when PL tag is used (in that  case
1400           an  arbitrary  non-zero  integer can be provided). See also the -u,
1401           --use  option  below.  If  set  to  0,  the  discordance equals the
1402           number  of  mismatching  genotypes  when  GT  vs GT is compared. If
1403           performance is an issue, set to 0 for faster run but less  accurate
1404           results.
1405
1406       -g, --genotypes FILE
1407           VCF/BCF file with reference genotypes to compare against
1408
1409       -H, --homs-only
1410           Homozygous  genotypes only, useful with low coverage data (requires
1411           -g, --genotypes)
1412
1413       --n-matches INT
1414           Print only top INT matches for each sample, 0  for  unlimited.  Use
1415           negative value to sort by HWE probability rather than the number of
1416           discordant sites. Note that average score is used to determine  the
1417           top matches, not absolute values.
1418
1419       --no-HWE-prob
1420           Disable   calculation   of   HWE   probability   to  reduce  memory
1421           requirements with comparisons between a very large number of sample
1422           pairs.
1423
1424       -p, --pairs LIST
1425           A  comma-separated  list  of  sample  pairs to compare. When the -g
1426           option is given, the first sample must be from the query file,  the
1427           second   from   the   -g  file,  third  from  the  query  file  etc
1428           (qry,gt[,qry,gt..]). Without the -g option, the pairs  are  created
1429           the   same   way   but   both  samples  are  from  the  query  file
1430           (qry,qry[,qry,qry..])
1431
1432       -P, --pairs-file FILE
1433           A file with tab-delimited sample pairs to compare. The first sample
1434           in  the  pair  must  come  from the query file, the second from the
1435           genotypes file when -g is given
1436
1437       -r, --regions chr|chr:pos|chr:from-to|chr:from-[,...]
1438           Restrict to comma-separated list of regions, see Common Options
1439
1440       -R, --regions-file FILE
1441           Restrict to regions listed in a file, see Common Options
1442
1443       -s, --samples [qry|gt]:LIST
1444           List of query samples or -g samples. If neither -s nor
1445           -S is given, all possible sample pair combinations are
1446           compared.
1447
1448       -S, --samples-file [qry|gt]:FILE
1449           File with the query or -g samples to compare. If neither -s
1450           nor -S is given, all possible sample pair combinations are
1451           compared.
1452
1453       -t, --targets chr|chr:pos|chr:from-to|chr:from-[,...]
1454           see Common Options
1455
1456       -T, --targets-file file
1457           see Common Options
1458
1459       -u, --use TAG1[,TAG2]
1460           specifies which tag to use in the query  file  (TAG1)  and  the  -g
1461           (TAG2)  file.  By default, the PL tag is used in the query file and
1462           GT in the -g file when available.
1463
1464       Examples:
1465
1466              # Check discordance of all samples from B against all samples in A
1467              bcftools gtcheck -g A.bcf B.bcf
1468
1469              # Limit comparisons to the given list of samples
1470              bcftools gtcheck -s gt:a1,a2,a3 -s qry:b1,b2 -g A.bcf B.bcf
1471
1472              # Compare only two pairs a1,b1 and a1,b2
1473              bcftools gtcheck -p a1,b1,a1,b2 -g A.bcf B.bcf
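
           The values given to -e, --error-probability are Phred-scaled, as
           in the Q=30 and Q=40 examples above. A minimal sketch of the
           conversion, using a hypothetical helper phred_to_error that is
           not part of bcftools:

```shell
# Convert a Phred-scaled quality Q to an error probability 10^(-Q/10).
# phred_to_error is an illustrative name, not a bcftools command.
phred_to_error() {
    awk -v q="$1" 'BEGIN { printf "%g\n", 10 ^ (-q / 10) }'
}

phred_to_error 30   # one error in 1,000 genotypes
phred_to_error 40   # one error in 10,000 genotypes
```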
1474
1475   bcftools index [OPTIONS]  in.bcf|in.vcf.gz
1476       Creates index for bgzip compressed VCF/BCF files for random access. CSI
1477       (coordinate-sorted  index)  is  created  by  default.  The  CSI  format
1478       supports indexing of chromosomes up to  length  2^31.  TBI  (tabix
1479       index) files, which support chromosome lengths up to 2^29,
1480       can be created by using the -t/--tbi option or using the tabix  program
1481       packaged with htslib. When loading an index file, bcftools will try the
1482       CSI first and then the TBI.
1483
1484   Indexing options:
1485       -c, --csi
1486           generate CSI-format index for VCF/BCF files [default]
1487
1488       -f, --force
1489           overwrite index if it already exists
1490
1491       -m, --min-shift INT
1492           set minimal interval size for CSI indices to 2^INT; default: 14
1493
1494       -o, --output FILE
1495           output file name. If not set, then the index will be created  using
1496           the input file name plus a .csi or .tbi extension
1497
1498       -t, --tbi
1499           generate TBI-format index for VCF files
1500
1501       --threads INT
1502           see Common Options
1503
1504   Stats options:
1505       -n, --nrecords
1506           print the number of records based on the CSI or TBI index files
1507
1508       -s, --stats
1509           Print  per contig stats based on the CSI or TBI index files. Output
1510           format is three tab-delimited  columns  listing  the  contig  name,
1511           contig  length (. if unknown) and number of records for the contig.
1512           Contigs with zero records are not printed.
1513
1514   bcftools isec [OPTIONS]  A.vcf.gz B.vcf.gz [...]
1515       Creates intersections, unions and complements of VCF  files.  Depending
1516       on the options, the program can output records from one (or more) files
1517       which have (or  do  not  have)  corresponding  records  with  the  same
1518       position in the other files.
1519
1520       -c, --collapse snps|indels|both|all|some|none
1521           see Common Options
1522
1523       -C, --complement
1524           output  positions present only in the first file but missing in the
1525           others
1526
1527       -e, --exclude -|EXPRESSION
1528           exclude sites for which EXPRESSION is true. If -e (or  -i)  appears
1529           only  once,  the  same  filtering expression will be applied to all
1530           input files. Otherwise, -e or -i must be given for each input file.
1531           To  indicate  that  no filtering should be performed on a file, use
1532           "-" in place of EXPRESSION, as shown  in  the  example  below.  For
1533           valid expressions see EXPRESSIONS.
1534
1535       -f, --apply-filters LIST
1536           see Common Options
1537
1538       -i, --include EXPRESSION
1539           include  only sites for which EXPRESSION is true. See discussion of
1540           -e, --exclude above.
1541
1542       -n, --nfiles [+-=]INT|~BITMAP
1543           output positions present in this many (=), this many or  more  (+),
1544           this many or fewer (-), or the exact same (~) files
1545
1546       -o, --output FILE
1547           see  Common  Options.  When  several  files are being output, their
1548           names are controlled via -p instead.
1549
1550       -O, --output-type b|u|z|v
1551           see Common Options
1552
1553       -p, --prefix DIR
1554           if given, subset each of the input files accordingly. See also -w.
1555
1556       -r, --regions chr|chr:pos|chr:from-to|chr:from-[,...]
1557           see Common Options
1558
1559       -R, --regions-file file
1560           see Common Options
1561
1562       -t, --targets chr|chr:pos|chr:from-to|chr:from-[,...]
1563           see Common Options
1564
1565       -T, --targets-file file
1566           see Common Options
1567
1568       -w, --write LIST
1569           list of input files to output given as 1-based indices. With -p and
1570           no -w, all files are written.
1571
1572   Examples:
1573       Create  intersection  and  complements of two sets saving the output in
1574       dir/*
1575
1576               bcftools isec -p dir A.vcf.gz B.vcf.gz
1577
1578       Filter sites in A (require INFO/MAF>=0.01) and B  (require  INFO/dbSNP)
1579       but  not  in  C, and create an intersection, including only sites which
1580       appear in at least two of the files after filters have been applied
1581
1582               bcftools isec -e'MAF<0.01' -i'dbSNP=1' -e- A.vcf.gz B.vcf.gz C.vcf.gz -n +2 -p dir
1583
1584       Extract and write records from A shared by both A  and  B  using  exact
1585       allele match
1586
1587               bcftools isec -p dir -n=2 -w1 A.vcf.gz B.vcf.gz
1588
1589       Extract records private to A or B comparing by position only
1590
1591               bcftools isec -p dir -n-1 -c all A.vcf.gz B.vcf.gz
1592
1593       Print a list of records which are present in A and B but not in C and D
1594
1595               bcftools isec -n~1100 -c all A.vcf.gz B.vcf.gz C.vcf.gz D.vcf.gz
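
       The ~BITMAP form of -n, --nfiles used in the last example has one
       digit per input file, in command-line order. The following sketch
       (the helper name bitmap_files is hypothetical and only illustrates
       how the bitmap lines up with the file list) shows how -n~1100 maps
       onto four inputs:

```shell
# bitmap_files BITMAP FILE...
# Pair each digit of BITMAP with the input file at the same position.
bitmap_files() {
    bitmap=$1; shift
    present="" absent=""
    for f in "$@"; do
        case $bitmap in
            1*) present="$present $f" ;;   # record must be in this file
            0*) absent="$absent $f" ;;     # record must not be in this file
            *)  echo "bitmap must have one 0/1 digit per file" >&2
                return 1 ;;
        esac
        bitmap=${bitmap#?}                 # drop the digit just consumed
    done
    echo "present in:${present:- none}"
    echo "absent from:${absent:- none}"
}

bitmap_files 1100 A.vcf.gz B.vcf.gz C.vcf.gz D.vcf.gz
```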
1596
1597   bcftools merge [OPTIONS] A.vcf.gz B.vcf.gz [...]
1598       Merge multiple VCF/BCF files from non-overlapping sample sets to create
1599       one  multi-sample  file.  For  example,  when  merging  file   A.vcf.gz
1600       containing  samples  S1, S2 and S3 and file B.vcf.gz containing samples
1601       S3 and S4, the output file will contain five samples named S1, S2,  S3,
1602       2:S3 and S4.
1603
1604       Note that it is the responsibility of the user to ensure that the sample
1605       names are unique across all files. If they are not,  the  program  will
1606       exit  with  an  error  unless  the option --force-samples is given. The
1607       sample names can also be given explicitly using the --print-header  and
1608       --use-header options.
1609
1610       Note  that  only records from different files can be merged, never from
1611       the same file. For "vertical" merge take a look at bcftools  concat  or
1612       bcftools norm -m instead.
1613
1614       --force-samples
1615           if  the  merged  files  contain  duplicate  sample  names,  proceed
1616           anyway. Duplicate sample names will be resolved by  prepending  the
1617           index  of  the  file  as  it  appeared  on  the command line to the
1618           conflicting sample name (see 2:S3 in the above example).
1619
1620       --print-header
1621           print only merged header and exit
1622
1623       --use-header FILE
1624           use the VCF header in the provided text FILE
1625
1626       -0, --missing-to-ref
1627           assume genotypes at missing sites are 0/0
1628
1629       -f, --apply-filters LIST
1630           see Common Options
1631
1632       -F, --filter-logic x|+
1633           Set the output record to PASS if any of the inputs is PASS (x),  or
1634           apply all filters (+), which is the default.
1635
1636       -g, --gvcf -|FILE
1637           merge gVCF blocks, INFO/END tag is expected. If the reference fasta
1638           file FILE is not given and the dash (-) is given, unknown reference
1639           bases  generated at gVCF block splits will be substituted with N’s.
1640           The --gvcf  option  uses  the  following  default  INFO  rules:  -i
1641           QS:sum,MinDP:min,I16:sum,IDV:max,IMF:max.
1642
1643       -i, --info-rules -|TAG:METHOD[,...]
1644           Rules  for merging INFO fields (scalars or vectors) or - to disable
1645           the default rules. METHOD is one  of  sum,  avg,  min,  max,  join.
1646           Default is DP:sum,DP4:sum if these fields exist in the input files.
1647           Fields with no specified rule will take the value  from  the  first
1648           input  file. The merged QUAL value is currently set to the maximum.
1649           This behaviour is not user controllable at the moment.
1650
1651       -l, --file-list FILE
1652           Read file names from FILE, one file name per line.
1653
1654       -L, --local-alleles INT
1655           Sites with many  alternate  alleles  can  require  extremely  large
1656           storage  space which can exceed the 2GB size limit representable by
1657           BCF. This is caused by Number=G  tags  (such  as  FORMAT/PL)  which
1658           store  a  value  for  each  combination  of reference and alternate
1659           alleles. The -L, --local-alleles option allows replacing such  tags
1660           with  a  localized tag (FORMAT/LPL) which only includes a subset of
1661           alternate alleles relevant for that sample. A new FORMAT/LAA tag is
1662           added which lists 1-based indices of the alternate alleles relevant
1663           (local) for the current sample. The number INT  gives  the  maximum
1664           number of alternate alleles that can be included in the PL tag. The
1665           default value is 0 which disables the feature  and  outputs  values
1666           for all alternate alleles.
1667
1668       -m, --merge snps|indels|both|all|none|id
1669           The  option  controls  what  types  of  multiallelic records can be
1670           created:
1671
1672           -m none   .. no new multiallelics, output multiple records instead
1673           -m snps   .. allow multiallelic SNP records
1674           -m indels .. allow multiallelic indel records
1675           -m both   .. both SNP and indel records can be multiallelic
1676           -m all    .. SNP records can be merged with indel records
1677           -m id     .. merge by ID
1678
1679       --no-index
1680           this option allows merging files without indexing them  first.  In
1681           order  for this option to work, the user must ensure that the input
1682           files have chromosomes in the same order and  consistent  with  the
1683           order of sequences in the VCF header.
1684
1685       --no-version
1686           see Common Options
1687
1688       -o, --output FILE
1689           see Common Options
1690
1691       -O, --output-type b|u|z|v
1692           see Common Options
1693
1694       -r, --regions chr|chr:pos|chr:from-to|chr:from-[,...]
1695           see Common Options
1696
1697       -R, --regions-file file
1698           see Common Options
1699
1700       --threads INT
1701           see Common Options
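
           The renaming performed with --force-samples, as in the S1..S4
           example at the start of this section, can be modelled in a few
           lines of shell. This is a sketch of the documented naming rule
           only; merged_sample_names is a hypothetical helper, not a
           bcftools command.

```shell
# merged_sample_names "SAMPLES_OF_FILE_1" "SAMPLES_OF_FILE_2" ...
# Each argument is a space-separated sample list of one input file.
merged_sample_names() {
    idx=0 seen=" " out=""
    for samples in "$@"; do
        idx=$((idx + 1))
        for s in $samples; do
            case $seen in
                # duplicate name: prepend the 1-based index of the file
                *" $s "*) s="$idx:$s" ;;
            esac
            seen="$seen$s "
            out="$out${out:+ }$s"
        done
    done
    echo "$out"
}

# A.vcf.gz has S1, S2, S3 and B.vcf.gz has S3, S4
merged_sample_names "S1 S2 S3" "S3 S4"
```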
1702
1703   bcftools mpileup [OPTIONS] -f ref.fa in.bam [in2.bam [...]]
1704       Generate VCF or BCF containing genotype likelihoods for one or multiple
1705       alignment (BAM or CRAM) files. This is based on the  original  samtools
1706       mpileup  command  (with  the  -v  or  -g  options)  producing  genotype
1707       likelihoods in VCF or BCF format, but not the  textual  pileup  output.
1708       The  mpileup  command  was  transferred  to  bcftools in order to avoid
1709       errors resulting from use of  incompatible  versions  of  samtools  and
1710       bcftools when used in the mpileup+bcftools call pipeline.
1711
1712       Individuals  are  identified  from the SM tags in the @RG header lines.
1713       Multiple individuals can be pooled in  one  alignment  file,  and  one
1714       individual  can  be  split across multiple files. If sample identifiers
1715       are absent, each input file is regarded as one sample.
1716
1717       Note that there are two orthogonal ways to  specify  locations  in  the
1718       input  file;  via  -r  region  and  -t  positions. The former uses (and
1719       requires) an index to do random access while the latter streams through
1720       the  file  contents  keeping only the specified regions, requiring no
1721       index. The two may be used in  conjunction.  For  example  a  BED  file
1722       containing locations of genes in chromosome 20 could be specified using
1723       -r 20 -t chr20.bed, meaning that the index is used to  find  chromosome
1724       20 and then it is filtered for the regions listed in the BED file. Also
1725       note that the -r option can be much slower than -t  with  many  regions
1726       and  can  require  more memory when multiple regions and many alignment
1727       files are processed.
1728
1729   Input options
1730       -6, --illumina1.3+
1731           Assume the quality is in the Illumina 1.3+ encoding.
1732
1733       -A, --count-orphans
1734           Do not skip anomalous read pairs in variant calling.
1735
1736       -b, --bam-list FILE
1737           List of input alignment files, one file per line [null]
1738
1739       -B, --no-BAQ
1740           Disable probabilistic  realignment  for  the  computation  of  base
1741           alignment  quality  (BAQ). BAQ is the Phred-scaled probability of a
1742           read base being misaligned. Applying BAQ (that is, not  using  this
1743           option) greatly helps to reduce false SNPs caused by misalignments.
1744
1745       -C, --adjust-MQ INT
1746           Coefficient   for  downgrading mapping quality for reads containing
1747           excessive mismatches. Given a read with a phred-scaled  probability
1748           q  of  being  generated  from  the mapped position, the new mapping
1749           quality is about sqrt((INT-q)/INT)*INT. A zero value (the  default)
1750           disables this functionality.
1751
1752       -D, --full-BAQ
1753           Run  the  BAQ algorithm on all reads, not just those in problematic
1754           regions. This matches the behaviour for Bcftools 1.12 and earlier.
1755
1756           By default mpileup uses heuristics to decide when to apply the  BAQ
1757           algorithm.  Most  sequences  will not be BAQ adjusted, giving a CPU
1758           time closer to --no-BAQ, but it will still be  applied  in  regions
1759           with suspected problematic alignments. This has been tested to work
1760           well on single sample data with  even  allele  frequency,  but  the
1761           reliability  is unknown for multi-sample calling and for low allele
1762           frequency variants so  full  BAQ  is  still  recommended  in  those
1763           scenarios.
1764
1765       -d, --max-depth INT
1766           At  a  position, read at most INT reads per input file.  Note  that
1767           the original samtools mpileup command had a minimum value of 8000/n
1768           where  n was the number of input files given to mpileup. This means
1769           that in samtools mpileup  the  default  was  highly  likely  to  be
1770           increased and the -d parameter would have an effect only once above
1771           the cross-sample minimum of 8000.  This  behavior  was  problematic
1772           when  working  with a combination of single- and multi-sample bams,
1773           therefore in bcftools mpileup the user is given  the  full  control
1774           (and responsibility), and an informative message is printed instead
1775           [250]
1776
1777       -E, --redo-BAQ
1778           Recalculate BAQ on the fly, ignore existing BQ tags
1779
1780       -f, --fasta-ref FILE
1781           The faidx-indexed reference file in the FASTA format. The file  can
1782           be optionally compressed by bgzip. Reference is required by default
1783           unless the --no-reference option is set [null]
1784
1785       --no-reference
1786           Do not require the --fasta-ref option.
1787
1788       -G, --read-groups FILE
1789           list of read groups to include or exclude if prefixed with "^".
1790           One  read  group per line. This file can also be used to assign new
1791           sample names to read groups by giving the  new  sample  name  as  a
1792           second   white-space-separated  field,  like  this:  "read_group_id
1793           new_sample_name".  If  the read group name is not unique, the bam
1794           file  name  can   also   be   included:  "read_group_id  file_name
1795           sample_name". If all  reads  from  the  alignment  file  should  be
1796           treated  as  a  single  sample, the asterisk symbol can be used: "*
1797           file_name sample_name". Alignments without a read group ID  can  be
1798           matched  with  "?". NOTE: The meaning of bcftools mpileup -G is the
1799           opposite of samtools mpileup -G.
1800
1801               RG_ID_1
1802               RG_ID_2  SAMPLE_A
1803               RG_ID_3  SAMPLE_A
1804               RG_ID_4  SAMPLE_B
1805               RG_ID_5  FILE_1.bam  SAMPLE_A
1806               RG_ID_6  FILE_2.bam  SAMPLE_A
1807               *        FILE_3.bam  SAMPLE_C
1808               ?        FILE_3.bam  SAMPLE_D
1809
1810       -q, --min-MQ INT
1811           Minimum mapping quality for an alignment to be used [0]
1812
1813       -Q, --min-BQ INT
1814           Minimum base quality for a base to be considered [13]
1815
1816       --max-BQ INT
1817           Caps the base  quality  to  a  maximum  value  [60].  This  can  be
1818           particularly  useful on technologies that produce overly optimistic
1819           high qualities, leading to too many false  positives  or  incorrect
1820           genotype assignments.
1821
1822       -r, --regions CHR|CHR:POS|CHR:FROM-TO|CHR:FROM-[,...]
1823           Only  generate  mpileup  output  in  given  regions.  Requires  the
1824           alignment files to be indexed. If used in conjunction with -t  then
1825           considers the intersection; see Common Options
1826
1827       -R, --regions-file FILE
1828           As  for  -r,  --regions,  but  regions  read  from FILE; see Common
1829           Options
1830
1831       --ignore-RG
1832           Ignore RG tags. Treat all  reads  in  one  alignment  file  as  one
1833           sample.
1834
1835       --rf, --incl-flags STR|INT
1836           Required flags: skip reads with mask bits unset  [null]
1837
1838       --ff, --excl-flags STR|INT
1839           Filter     flags:     skip     reads    with    mask    bits    set
1840           [UNMAP,SECONDARY,QCFAIL,DUP]
1841
1842       -s, --samples LIST
1843           list of sample names. See Common Options
1844
1845       -S, --samples-file FILE
1846           file of sample  names  to  include  or  exclude  if  prefixed  with
1847           "^".  One sample per line. This file can also be used to rename
1848           samples   by   giving   the   new   sample   name   as   a   second
1849           white-space-separated  column, like this: "old_name new_name". If a
1850           sample name contains spaces, the spaces can be  escaped  using  the
1851           backslash character, for example "Not\ a\ good\ sample\ name".
1852
1853       -t, --targets LIST
1854           see Common Options
1855
1856       -T, --targets-file FILE
1857           see Common Options
1858
1859       -x, --ignore-overlaps
1860           Disable read-pair overlap detection.
1861
1862       --seed INT
1863           Set the random number seed used when sub-sampling deep regions [0].
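
           The approximate formula quoted under -C, --adjust-MQ above can be
           evaluated directly. The sketch below only computes
           sqrt((INT-q)/INT)*INT as written in the text; the helper name
           adjusted_mq is hypothetical, and reporting "undefined" for q
           greater than INT is an assumption of the sketch, since the text
           does not say how that case is handled.

```shell
# adjusted_mq INT Q
# Evaluate the documented approximation sqrt((INT-q)/INT)*INT.
adjusted_mq() {
    awk -v c="$1" -v q="$2" 'BEGIN {
        if (c <= 0) { print "disabled"; exit }   # 0 disables the feature
        if (q > c)  { print "undefined"; exit }  # assumption, see lead-in
        printf "%.1f\n", sqrt((c - q) / c) * c
    }'
}

adjusted_mq 50 20   # the -C50 setting mentioned in the mpileup examples
adjusted_mq 0 20
```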
1864
1865   Output options
1866       -a, --annotate LIST
1867           Comma-separated   list   of   FORMAT   and  INFO  tags  to  output.
1868           (case-insensitive, the "FORMAT/" prefix is optional, and use "?" to
1869           list available annotations on the command line) [null]:
1870
1871           FORMAT/AD   .. Allelic depth (Number=R,Type=Integer)
1872           FORMAT/ADF  .. Allelic depths on the forward strand (Number=R,Type=Integer)
1873           FORMAT/ADR  .. Allelic depths on the reverse strand (Number=R,Type=Integer)
1874           FORMAT/DP   .. Number of high-quality bases (Number=1,Type=Integer)
1875           FORMAT/SP   .. Phred-scaled strand bias P-value (Number=1,Type=Integer)
1876           FORMAT/SCR  .. Number of soft-clipped reads (Number=1,Type=Integer)
1877
1878           INFO/AD     .. Total allelic depth (Number=R,Type=Integer)
1879           INFO/ADF    .. Total allelic depths on the forward strand (Number=R,Type=Integer)
1880           INFO/ADR    .. Total allelic depths on the reverse strand (Number=R,Type=Integer)
1881           INFO/SCR    .. Number of soft-clipped reads (Number=1,Type=Integer)
1882
1883           FORMAT/DV   .. Deprecated in favor of FORMAT/AD; Number of high-quality non-reference bases (Number=1,Type=Integer)
1884           FORMAT/DP4  .. Deprecated in favor of FORMAT/ADF and FORMAT/ADR; Number of high-quality ref-forward, ref-reverse,
1885                          alt-forward and alt-reverse bases (Number=4,Type=Integer)
1886           FORMAT/DPR  .. Deprecated in favor of FORMAT/AD; Number of high-quality bases for each observed allele (Number=R,Type=Integer)
1887           INFO/DPR    .. Deprecated in favor of INFO/AD; Number of high-quality bases for each observed allele (Number=R,Type=Integer)
1888
1889       -g, --gvcf INT[,...]
1890           output  gVCF blocks of homozygous REF calls, with depth (DP) ranges
1891           specified by the list of integers. For example, passing  5,15  will
1892           group  sites  into two types of gVCF blocks, the first with minimum
1893           per-sample DP from the interval [5,15) and the latter with  minimum
1894           depth  15  or  more. In this example, sites with minimum per-sample
1895           depth less than 5 will be printed as separate records,  outside  of
1896           gVCF blocks.
1897
1898       --no-version
1899           see Common Options
1900
1901       -o, --output FILE
1902           Write  output  to FILE, rather than the default of standard output.
1903           (The same short option is used for both --open-prob  and  --output.
1904           If  -o’s  argument  contains  any non-digit characters other than a
1905           leading + or - sign,  it  is  interpreted  as --output. Usually the
1906           filename  extension  will  take  care  of  this, but to write to an
1907           entirely numeric filename use -o ./123 or --output 123.)
1908
1909       -O, --output-type b|u|z|v
1910           see Common Options
1911
1912       --threads INT
1913           see Common Options
1914
1915       -U, --mwu-u
1916           Use the previous Mann-Whitney U test score from  version  1.12  and
1917           earlier.  This  is  a  probability  score, but importantly it folds
1918           probabilities above or below the desired score into the same P. The
1919           new  Mann-Whitney U test score is a "Z score", expressing the score
1920           as the number of standard deviations away from the mean (with  zero
1921           matching  the  mean).  It  keeps   both   positive   and   negative
1922           values. This can be important  for  some  tests  where  errors  are
1923           asymmetric.
1924
1925               This option changes the INFO field names produced back to the ones
1926               used by the earlier Bcftools releases. For example BQBZ becomes
1927               BQB.
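
           The depth ranges given to -g, --gvcf above can be read as
           half-open bins. The following sketch classifies a minimum
           per-sample depth against a list such as 5,15; the helper name
           gvcf_bin and its output wording are illustrative assumptions,
           not bcftools output.

```shell
# gvcf_bin BOUNDS DP
# BOUNDS is the comma-separated list given to -g, DP the minimum
# per-sample depth at a homozygous REF site.
gvcf_bin() {
    awk -v bounds="$1" -v dp="$2" 'BEGIN {
        n = split(bounds, b, ",")
        # below the first bound: printed outside of gVCF blocks
        if (dp < b[1] + 0) { print "separate record"; exit }
        for (i = 1; i < n; i++)
            if (dp < b[i + 1] + 0) {
                print "block [" b[i] "," b[i + 1] ")"
                exit
            }
        # at or above the last bound: open-ended block
        print "block [" b[n] ",inf)"
    }'
}

gvcf_bin 5,15 3
gvcf_bin 5,15 7
gvcf_bin 5,15 20
```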
1928
1929   Options for SNP/INDEL genotype likelihood computation
1930       -X, --config STR
1931           Specify  a  platform  specific  configuration  profile. The profile
1932           should be one  of  1.12,  illumina,  ont  or  pacbio-ccs.  Settings
1933           applied are as follows:
1934
1935               1.12           -Q13 -h100 -m1
1936               illumina       [ default values ]
1937               ont            -B -Q5 --max-BQ 30 -I
1938               pacbio-ccs     -D -Q5 --max-BQ 50 -F0.1 -o25 -e1 -M99999
1939
1940       --ar, --ambig-reads drop|incAD|incAD0
1941           What  to  do  with ambiguous indel reads that do not span an entire
1942           short tandem repeat region: discard ambiguous  reads  from  calling
1943           and do not increment high-quality AD depth counters (drop), exclude
1944           from calling but  increment  AD  counters  proportionally  (incAD),
1945           exclude  from  calling  and  increment  the  first  value of the AD
1946           counter (incAD0) [drop]
1947
1948       -e, --ext-prob INT
1949           Phred-scaled gap extension sequencing error  probability.  Reducing
1950           INT leads to longer indels [20]
1951
1952       -F, --gap-frac FLOAT
1953           Minimum fraction of gapped reads [0.002]
1954
1955       -h, --tandem-qual INT
1956           Coefficient  for  modeling  homopolymer  errors.  Given  an  l-long
1957           homopolymer run, the sequencing error of an  indel  of  size  s  is
1958           modeled as INT*s/l [500]. Increasing this informs the caller  that
1959           indels in long homopolymers are more likely genuine and less likely
1960           to  be sequencing artifacts. Hence increasing tandem-qual will have
1961           higher recall and lower precision. Bcftools 1.12 and earlier had  a
1962           default   of   100,   which  was  tuned  around  more  error  prone
1963           instruments. Note changing this may have  a  minor  impact  on  SNP
1964           calling too. For maximum SNP calling accuracy, it may be preferable
1965           to adjust this lower again, although  this  will  adversely  affect
1966           indels.
1967
1968       --indel-bias FLOAT
1969           Skews   the   indel   scores   up  or  down,  trading  recall  (low
1970           false-negative)  vs  precision  (low  false-positive)   [1.0].   In
1971           Bcftools  1.12  and earlier this parameter didn’t exist, but had an
1972           implied value of 1.0. If you are planning to do heavy filtering  of
1973           variants, selecting the best quality ones only (favouring precision
1974           over recall), it is advisable to set  this  lower  (such  as  0.75)
1975           while  higher  depth  samples or where you favour recall rates over
1976           precision may work better with a higher value such as 2.0.
1977
1978       -I, --skip-indels
1979           Do not perform INDEL calling
1980
1981       -L, --max-idepth INT
1982           Skip INDEL calling if the average per-sample  depth  is  above  INT
1983           [250]
1984
1985       -m, --min-ireads INT
1986           Minimum number of gapped reads for indel candidates [1]
1987
1988       -M, --max-read-len INT
1989           The  maximum  read  length  permitted  by  the BAQ algorithm [500].
1990           Variants are still called on longer reads, but  they  will  not  be
1991           passed  through  the  BAQ  method.  This  limit  exists  to prevent
1992           excessively long BAQ times and high memory usage. Note  if  partial
1993           BAQ  is enabled with -D then raising this parameter will likely not
1994           have a significant CPU cost.
1995
1996       -o, --open-prob INT
1997           Phred-scaled gap open sequencing error  probability.  Reducing  INT
1998           leads  to more indel calls. (The same short option is used for both
1999           --open-prob and --output.  When  -o’s  argument  contains  only  an
2000           optional  +  or  -  sign  followed  by  the  digits  0  to 9, it is
2001           interpreted  as --open-prob.) [40]
2002
2003       -p, --per-sample-mF
2004           Apply -m and -F thresholds per sample to  increase  sensitivity  of
2005           calling.  By  default both options are applied to reads pooled from
2006           all samples.
2007
2008       -P, --platforms STR
2009           Comma-delimited  list  of  platforms (determined  by  @RG-PL)  from
2010           which  indel  candidates are obtained. It is recommended to collect
2011           indel candidates from sequencing technologies that have  low  indel
2012           error rate such as ILLUMINA [all]
2013
2014   Examples:
2015       Call SNPs and short INDELs, then mark low-quality sites and sites with
2016       a read depth exceeding a limit. (The read depth limit should be set to
2017       about twice the average read depth, as higher read depths usually
2018       indicate problematic regions which are often enriched for artefacts.)
2019       Consider adding -C50 to mpileup if mapping quality is
2020       overestimated for reads containing excessive mismatches. Applying
2021       this option usually helps for BWA-backtrack alignments, but may not
2022       help for other aligners.
2023
2024               bcftools mpileup -Ou -f ref.fa aln.bam | \
2025               bcftools call -Ou -mv | \
2026               bcftools filter -s LowQual -e '%QUAL<20 || DP>100' > var.flt.vcf
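
       The depth limit in the filter expression above follows the "about twice
       the average read depth" rule of thumb. A minimal Python sketch of how
       such an expression could be assembled from a measured mean depth; the
       helper name and its parameters are hypothetical and not part of
       bcftools:

```python
def low_quality_filter(mean_depth: float, min_qual: int = 20) -> str:
    """Build an exclusion expression for 'bcftools filter -e'.

    The depth limit is set to twice the mean depth, rounded to an integer.
    """
    max_depth = round(2 * mean_depth)
    return f"%QUAL<{min_qual} || DP>{max_depth}"
```

       For a mean depth of 50 this gives the expression used in the example.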
2027
2028   bcftools norm [OPTIONS] file.vcf.gz
2029       Left-align and  normalize  indels,  check  if  REF  alleles  match  the
2030       reference,   split  multiallelic  sites  into  multiple  rows;  recover
2031       multiallelics from multiple rows. Left-alignment and normalization will
2032       only be applied if the --fasta-ref option is supplied.
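
       To illustrate what left-alignment and normalization mean, the Python
       sketch below applies the commonly described procedure: trim bases
       shared at the right end of all alleles, extend to the left from the
       reference when an allele becomes empty, then trim a shared left prefix.
       It is a simplified illustration, not the bcftools implementation; the
       function name is hypothetical, positions are 1-based, and at least one
       ALT is assumed to differ from REF.

```python
def left_align(ref_seq: str, pos: int, ref: str, alts: list):
    """Left-align and trim one variant; returns (pos, ref, alts)."""
    alleles = [ref] + list(alts)
    changed = True
    while changed:
        changed = False
        # Trim the last base while every allele ends with the same base.
        if all(alleles) and len({a[-1] for a in alleles}) == 1:
            alleles = [a[:-1] for a in alleles]
            changed = True
        # If an allele became empty, extend all alleles one base to the left.
        if any(not a for a in alleles):
            if pos == 1:
                raise ValueError("cannot extend past the start of the sequence")
            pos -= 1
            base = ref_seq[pos - 1]
            alleles = [base + a for a in alleles]
            changed = True
    # Trim a shared leading base while every allele keeps at least one base.
    while min(len(a) for a in alleles) >= 2 and len({a[0] for a in alleles}) == 1:
        alleles = [a[1:] for a in alleles]
        pos += 1
    return pos, alleles[0], alleles[1:]
```

       For example, in the sequence GGGCACACACAGGG a deletion of one CA unit
       written as position 9, REF ACA, ALT A is shifted to position 3, REF
       GCA, ALT G.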
2033
2034       -a, --atomize
2035           Decompose  complex variants, e.g. split MNVs into consecutive SNVs.
2036           See also --atom-overlaps and --old-rec-tag.
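
           As a simplified illustration of splitting an MNV into consecutive
           SNVs, the Python sketch below handles only REF and ALT alleles of
           equal length and ignores genotypes and overlapping alleles. It is
           not the bcftools implementation and the function name is
           hypothetical.

```python
def atomize_mnv(pos: int, ref: str, alt: str):
    """Split an equal-length REF/ALT pair into (pos, ref_base, alt_base) SNVs."""
    if len(ref) != len(alt):
        raise ValueError("this sketch only handles alleles of equal length")
    return [
        (pos + offset, ref_base, alt_base)
        for offset, (ref_base, alt_base) in enumerate(zip(ref, alt))
        if ref_base != alt_base  # positions that match the reference are skipped
    ]
```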
2037
2038       --atom-overlaps .|*
2039           Alleles missing because of an overlapping variant can be set either
2040           to missing (.) or to the star allele (*), as recommended by the VCF
2041           specification. IMPORTANT: Note that the asterisk is expanded by the shell
2042           and must be put in quotes or escaped by a backslash:
2043
2044               # Before atomization:
2045               100  CC  C,GG   1/2
2046
2047               # After:
2048               #   bcftools norm -a --atom-overlaps .
2049               100  C   G      ./1
2050               100  CC  C      1/.
2051               101  C   G      ./1
2052
2053               # After:
2054               #   bcftools norm -a --atom-overlaps '*'
2055               #   bcftools norm -a --atom-overlaps \*
2056               100  C   G,*    2/1
2057               100  CC  C,*    1/2
2058               101  C   G,*    2/1
2059
2060       -c, --check-ref e|w|x|s
2061           what  to  do  when  incorrect or missing REF allele is encountered:
2062           exit (e), warn (w), exclude (x), or set/fix (s) bad  sites.  The  w
2063           option  can  be combined with x and s. Note that s can swap alleles
2064           and will update genotypes (GT) and AC counts, but will not  attempt
2065           to  fix  PL or other fields. Also note, and this cannot be stressed
2066           enough, that s will NOT fix strand issues in your VCF, do  NOT  use
2067           it for that purpose!!! (Instead see
2068           <http://samtools.github.io/bcftools/howtos/plugin.af-dist.html> and
2069           <http://samtools.github.io/bcftools/howtos/plugin.fixref.html>.)
2070
2071       -d, --rm-dup snps|indels|both|all|exact
2072           If  a  record  is  present  multiple  times,  output only the first
2073           instance. See also --collapse in Common Options.
2074
2075       -D, --remove-duplicates
2076           If a record is present multiple times, output only the first
2077           instance. Alias for -d none, deprecated.
2078
2079       -f, --fasta-ref FILE
2080           reference   sequence.   Supplying   this   option   will   turn  on
2081           left-alignment   and   normalization,   however,   see   also   the
2082           --do-not-normalize option below.
2083
2084       --force
2085           try  to  proceed  with  -m-  even  if malformed tags with incorrect
2086           number  of  fields   are   encountered,   discarding   such   tags.
2087           (Experimental, use at your own risk.)
2088
2089       --keep-sum TAG[,...]
2090           keep vector sum constant when splitting multiallelic sites. Only AD
2091           tag is currently supported. See also
2092           <https://github.com/samtools/bcftools/issues/360>
2093
2094       -m, --multiallelics -|+[snps|indels|both|any]
2095           split  multiallelic  sites  into  biallelic  records  (-)  or  join
2096           biallelic sites into multiallelic records  (+).  An  optional  type
2097           string  can  follow  which  controls  variant types which should be
2098           split or merged together: If only SNP records should  be  split  or
2099           merged,  specify  snps;  if  both  SNPs and indels should be merged
2100           separately into two records,  specify  both;  if  SNPs  and  indels
2101           should be merged into a single record, specify any.
2102
2103       --no-version
2104           see Common Options
2105
2106       -N, --do-not-normalize
2107           do not normalize indels. The -c s option can be used to fix or set
2108           the REF allele from the reference given with -f; with -N, supplying
2109           -f does not turn on indel normalization as it normally would.
2110
2111       --old-rec-tag STR
2112           Add INFO/STR annotation with the original record. The format of the
2113           annotation is CHROM|POS|REF|ALT|USED_ALT_IDX.
2114
2115       -o, --output FILE
2116           see Common Options
2117
2118       -O, --output-type b|u|z|v
2119           see Common Options
2120
2121       -r, --regions chr|chr:pos|chr:from-to|chr:from-[,...]
2122           see Common Options
2123
2124       -R, --regions-file file
2125           see Common Options
2126
2127       -s, --strict-filter
2128           when merging (-m+), merged site is PASS only  if  all  sites  being
2129           merged PASS
2130
2131       -t, --targets LIST
2132           see Common Options
2133
2134       -T, --targets-file FILE
2135           see Common Options
2136
2137       --threads INT
2138           see Common Options
2139
2140       -w, --site-win INT
2141           maximum  distance  between  two  records  to  consider when locally
2142           sorting variants which changed position during the realignment
2143
2144   bcftools [plugin NAME|+NAME] [OPTIONS] FILE -- [PLUGIN OPTIONS]
2145       A common framework for various utilities. Plugins can be used the
2146       same way as normal commands, except that their name is prefixed with "+". Most
2147       plugins accept two types of parameters: general options shared by all
2148       plugins followed by a separator, and a list of plugin-specific options.
2149       There are some exceptions to this rule: some plugins do not accept the
2150       common options and implement their own parameters. Therefore, please pay
2151       attention to the usage examples that each plugin comes with.
2152
2153   VCF input options:
2154       -e, --exclude EXPRESSION
2155           exclude sites for which EXPRESSION is true. For  valid  expressions
2156           see EXPRESSIONS.
2157
2158       -i, --include EXPRESSION
2159           include  only  sites  for  which  EXPRESSION  is  true.  For  valid
2160           expressions see EXPRESSIONS.
2161
2162       -r, --regions chr|chr:pos|chr:from-to|chr:from-[,...]
2163           see Common Options
2164
2165       -R, --regions-file file
2166           see Common Options
2167
2168       -t, --targets chr|chr:pos|chr:from-to|chr:from-[,...]
2169           see Common Options
2170
2171       -T, --targets-file file
2172           see Common Options
2173
2174   VCF output options:
2175       --no-version
2176           see Common Options
2177
2178       -o, --output FILE
2179           see Common Options
2180
2181       -O, --output-type b|u|z|v
2182           see Common Options
2183
2184       --threads INT
2185           see Common Options
2186
2187   Plugin options:
2188       -h, --help
2189           list plugin’s options
2190
2191       -l, --list-plugins
2192           List all available plugins.
2193
2194           By default, appropriate system directories are searched for
2195           installed plugins. You can override this by setting the
2196           BCFTOOLS_PLUGINS environment variable to a colon-separated list of
2197           directories to search. If BCFTOOLS_PLUGINS begins with a colon, ends
2198           with a colon, or contains adjacent colons, the system directories are
2199           also searched at that position in the list of directories.
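
           The BCFTOOLS_PLUGINS rule can be pictured with the Python sketch
           below: every empty entry in the colon-separated list stands for the
           system directories. This is an illustration only, not the bcftools
           source; the function name and the system directory used in the
           usage note are hypothetical, and dropping repeated directories is
           an assumption of the sketch.

```python
def plugin_search_path(env_value, system_dirs):
    """Expand a BCFTOOLS_PLUGINS-style value into an ordered directory list."""
    if not env_value:
        # Variable unset or empty: only the system directories are searched.
        return list(system_dirs)
    dirs = []
    for entry in env_value.split(":"):
        if entry:
            dirs.append(entry)
        else:
            # A leading, trailing, or doubled colon yields an empty entry.
            dirs.extend(system_dirs)
    # Keep the first occurrence of each directory, preserving order.
    return list(dict.fromkeys(dirs))
```

           With a hypothetical system directory /usr/libexec/bcftools, the
           value "/opt/plugins:" would search /opt/plugins first and the
           system directory second.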
2204
2205       -v, --verbose
2206           print debugging information to debug plugin failure
2207
2208       -V, --version
2209           print version string and exit
2210
2211   List of plugins coming with the distribution:
2212       ad-bias
2213           find positions with wildly varying  ALT  allele  frequency  (Fisher
2214           test on FMT/AD)
2215
2216       add-variantkey
2217           add VariantKey INFO fields VKX and RSX
2218
2219       af-dist
2220           collect AF deviation stats and GT probability distribution given AF
2221           and assuming HWE
2222
2223       allele-length
2224           count the frequency of the length of REF, ALT and REF+ALT
2225
2226       check-ploidy
2227           check if ploidy of samples is consistent for all sites
2228
2229       check-sparsity
2230           print samples without genotypes in a region or chromosome
2231
2232       color-chrs
2233           color shared chromosomal segments, requires trio  VCF  with  phased
2234           GTs
2235
2236       contrast
2237           runs  a basic association test, per-site or in a region, and checks
2238           for novel alleles and genotypes in two groups of samples. Adds  the
2239           following INFO annotations:
2240
2241           •   PASSOC    ..  Fisher’s  exact  test  probability  of  genotypic
2242               association (REF vs non-REF allele)
2243
2244           •   FASSOC  .. proportion of non-REF allele in controls and cases
2245
2246           •   NASSOC  .. number of  control-ref,  control-alt,  case-ref  and
2247               case-alt alleles
2248
2249           •   NOVELAL  ..  lists  samples with a novel allele not observed in
2250               the control group
2251
2252           •   NOVELGT .. lists samples with a novel genotype not observed  in
2253               the control group
2254
2255       counts
2256           a  minimal  plugin  which  counts number of SNPs, Indels, and total
2257           number of sites.
2258
2259       dosage
2260           print genotype dosage. By default the plugin searches  for  PL,  GL
2261           and GT, in that order.
2262
2263       fill-from-fasta
2264           fill INFO or REF field based on values in a fasta file
2265
2266       fill-tags
2267           set various INFO tags. The list of tags supported in this version:
2268
2269           •   INFO/AC          Number:A   Type:Integer   ..  Allele  count in
2270               genotypes
2271
2272           •   INFO/AC_Hom     Number:A  Type:Integer   ..  Allele  counts  in
2273               homozygous genotypes
2274
2275           •   INFO/AC_Het      Number:A   Type:Integer   ..  Allele counts in
2276               heterozygous genotypes
2277
2278           •   INFO/AC_Hemi    Number:A  Type:Integer   ..  Allele  counts  in
2279               hemizygous genotypes
2280
2281           •   INFO/AF         Number:A  Type:Float    .. Allele frequency
2282
2283           •   INFO/AN          Number:1   Type:Integer   ..  Total  number of
2284               alleles in called genotypes
2285
2286           •   INFO/ExcHet      Number:A    Type:Float      ..   Test   excess
2287               heterozygosity; 1=good, 0=bad
2288
2289           •   INFO/END         Number:1  Type:Integer  .. End position of the
2290               variant
2291
2292           •   INFO/F_MISSING  Number:1  Type:Float    .. Fraction of  missing
2293               genotypes
2294
2295           •   INFO/HWE           Number:A     Type:Float      ..   HWE   test
2296               (PMID:15789306); 1=good, 0=bad
2297
2298           •   INFO/MAF         Number:A   Type:Float      ..   Minor   Allele
2299               frequency
2300
2301           •   INFO/NS          Number:1   Type:Integer   .. Number of samples
2302               with data
2303
2304           •   INFO/TYPE        Number:.  Type:String    ..  The  record  type
2305               (REF,SNP,MNP,INDEL,etc)
2306
2307           •   FORMAT/VAF       Number:A   Type:Float     ..  The  fraction of
2308               reads with the alternate allele, requires FORMAT/AD or ADF+ADR
2309
2310           •   FORMAT/VAF1      Number:1   Type:Float     ..   The   same   as
2311               FORMAT/VAF but for all alternate alleles cumulatively
2312
2313           •   TAG=func(TAG)   Number:1  Type:Integer  .. Experimental support
2314               for user-defined expressions such as "DP=sum(DP)"
2315
2316       fix-ploidy
2317           sets correct ploidy
2318
2319       fixref
2320           determine and fix strand orientation
2321
2322       frameshifts
2323           annotate frameshift indels
2324
2325       GTisec
2326           count genotype intersections across all possible sample subsets  in
2327           a vcf file
2328
2329       GTsubset
2330           output only sites where the requested samples all exclusively share
2331           a genotype
2332
2333       guess-ploidy
2334           determine sample sex by checking genotype  likelihoods  (GL,PL)  or
2335           genotypes (GT) in the non-PAR region of chrX.
2336
2337       gvcfz
2338           compress  gVCF  file  by  resizing  non-variant blocks according to
2339           specified criteria
2340
2341       impute-info
2342           add imputation information metrics  to  the  INFO  field  based  on
2343           selected FORMAT tags
2344
2345       indel-stats
2346           calculates per-sample or de novo indels stats. The usage and format
2347           is similar to smpl-stats and trio-stats
2348
2349       isecGT
2350           compare two files and set non-identical genotypes to missing
2351
2352       mendelian
2353           count Mendelian consistent / inconsistent genotypes.
2354
2355       missing2ref
2356           sets missing genotypes ("./.") to ref allele ("0/0" or "0|0")
2357
2358       parental-origin
2359           determine parental origin of a CNV region
2360
2361       prune
2362           prune  sites  by   missingness,   allele   frequency   or   linkage
2363           disequilibrium.  Alternatively,  annotate sites with r2, Lewontin’s
2364           D' (PMID:19433632), Ragsdale’s D (PMID:31697386).
2365
2366       remove-overlaps
2367           remove overlapping variants and duplicate sites
2368
2369       scatter
2370           intended as an inverse to bcftools concat, scatter VCF by chunks or
2371           regions, creating multiple VCFs.
2372
2373       setGT
2374           general  tool  to set genotypes according to rules requested by the
2375           user
2376
2377       smpl-stats
2378           calculates basic per-sample stats. The usage and format is  similar
2379           to indel-stats and trio-stats.
2380
2381       split
2382           split VCF by sample, creating single- or multi-sample VCFs
2383
2384       split-vep
2385           extract fields from structured annotations such as INFO/CSQ created
2386           by bcftools/csq or VEP. These can be added as a new INFO  field  to
2387           the VCF or in a custom text format. See <http://samtools.github.io/
2388           bcftools/howtos/plugin.split-vep.html> for more.
2389
2390       tag2tag
2391           Convert between similar tags, such as GL,PL,GP or QR,QA,QS.
2392
2393       trio-dnm2
2394           screen variants for possible de-novo mutations in trios
2395
2396       trio-stats
2397           calculate transmission rate in trio children. The usage and  format
2398           is similar to indel-stats and smpl-stats.
2399
2400       trio-switch-rate
2401           calculate  phase switch rate in trio samples, children samples must
2402           have phased GTs
2403
2404       variantkey-hex
2405           generate unsorted VariantKey-RSid index files in hexadecimal format
2406
2407   Examples:
2408           # List options common to all plugins
2409           bcftools plugin
2410
2411           # List available plugins
2412           bcftools plugin -l
2413
2414           # Run a plugin
2415           bcftools plugin counts in.vcf
2416
2417           # Run a plugin using the abbreviated "+" notation
2418           bcftools +counts in.vcf
2419
2420           # Run a plugin from an explicit location
2421           bcftools +/path/to/counts.so in.vcf
2422
2423           # The input VCF can be streamed just like in other commands
2424           cat in.vcf | bcftools +counts
2425
2426           # Print usage information of plugin "dosage"
2427           bcftools +dosage -h
2428
2429           # Replace missing genotypes with 0/0
2430           bcftools +missing2ref in.vcf
2431
2432           # Replace missing genotypes with 0|0
2433           bcftools +missing2ref in.vcf -- -p
2434
2435   Plugins troubleshooting:
2436       Things to check if your plugin does not show up in the bcftools  plugin
2437       -l output:
2438
2439       •   Run with the -v option for verbose output: bcftools plugin -lv
2440
2441       •   Does  the environment variable BCFTOOLS_PLUGINS include the correct
2442           path?
2443
2444   Plugins API:
2445           // Short description used by 'bcftools plugin -l'
2446           const char *about(void);
2447
2448           // Longer description used by 'bcftools +name -h'
2449           const char *usage(void);
2450
2451           // Called once at startup, allows initialization of local variables.
2452           // Return 1 to suppress normal VCF/BCF header output, -1 on critical
2453           // errors, 0 otherwise.
2454           int init(int argc, char **argv, bcf_hdr_t *in_hdr, bcf_hdr_t *out_hdr);
2455
2456           // Called for each VCF record, return NULL to suppress the output
2457           bcf1_t *process(bcf1_t *rec);
2458
2459           // Called after all lines have been processed to clean up
2460           void destroy(void);
2461
2462   bcftools polysomy [OPTIONS] file.vcf.gz
2463       Detect the number of chromosomal copies in VCFs annotated with
2464       Illumina’s B-allele frequency (BAF) values. Note that this command is
2465       not compiled in by default, see the section Optional Compilation with
2466       GSL in the INSTALL file for help.
2467
2468   General options:
2469       -o, --output-dir path
2470           output directory
2471
2472       -r, --regions chr|chr:pos|chr:from-to|chr:from-[,...]
2473           see Common Options
2474
2475       -R, --regions-file file
2476           see Common Options
2477
2478       -s, --sample string
2479           sample name
2480
2481       -t, --targets LIST
2482           see Common Options
2483
2484       -T, --targets-file FILE
2485           see Common Options
2486
2487       -v, --verbose
2488           verbose debugging output which gives hints about the thresholds and
2489           decisions made by the program.  Note  that  the  exact  output  can
2490           change between versions.
2491
2492   Algorithm options:
2493       -b, --peak-size float
2494           the minimum peak size considered a good match; a value from the
2495           interval [0,1], where larger is stricter
2496
2497       -c, --cn-penalty float
2498           a penalty  for  increasing  copy  number  state.  How  this  works:
2499           multiple  peaks  are  always  a  better  fit  than  a  single peak,
2500           therefore the program prefers a single peak  (normal  copy  number)
2501           unless  the  absolute  deviation  of  the  multiple  peaks  fit  is
2502           significantly smaller. Here the meaning of "significant"  is  given
2503           by the float from the interval [0,1] where larger is stricter.
2504
2505       -f, --fit-th float
2506           threshold  for  goodness  of  fit  (normalized absolute deviation),
2507           smaller is stricter
2508
2509       -i, --include-aa
2510           include also the AA peak in CN2 and CN3  evaluation.  This  usually
2511           requires increasing -f.
2512
2513       -m, --min-fraction float
2514           minimum distinguishable fraction of aberrant cells. Experience
2515           shows that estimates of 20% and more are trustworthy.
2516
2517       -p, --peak-symmetry float
2518           a heuristic to filter failed fits where the expected peak symmetry
2519           is violated. The float is from the interval [0,1] and larger is
2520           stricter
2521
2522   bcftools query [OPTIONS] file.vcf.gz [file.vcf.gz [...]]
2523       Extracts fields from VCF or BCF files and outputs them in  user-defined
2524       format.
2525
2526       -e, --exclude EXPRESSION
2527           exclude  sites  for which EXPRESSION is true. For valid expressions
2528           see EXPRESSIONS.
2529
2530       -f, --format FORMAT
2531           learn by example, see below
2532
2533       -H, --print-header
2534           print header
2535
2536       -i, --include EXPRESSION
2537           include  only  sites  for  which  EXPRESSION  is  true.  For  valid
2538           expressions see EXPRESSIONS.
2539
2540       -l, --list-samples
2541           list sample names and exit
2542
2543       -o, --output FILE
2544           see Common Options
2545
2546       -r, --regions chr|chr:pos|chr:from-to|chr:from-[,...]
2547           see Common Options
2548
2549       -R, --regions-file file
2550           see Common Options
2551
2552       -s, --samples LIST
2553           see Common Options
2554
2555       -S, --samples-file FILE
2556           see Common Options
2557
2558       -t, --targets chr|chr:pos|chr:from-to|chr:from-[,...]
2559           see Common Options
2560
2561       -T, --targets-file file
2562           see Common Options
2563
2564       -u, --allow-undef-tags
2565           do  not  throw  an  error if there are undefined tags in the format
2566           string, print "." instead
2567
2568       -v, --vcf-list FILE
2569           process multiple VCFs listed in the file
2570
2571   Format:
2572           %CHROM          The CHROM column (similarly also other columns: POS, ID, REF, ALT, QUAL, FILTER)
2573           %END            End position of the REF allele
2574           %END0           End position of the REF allele in 0-based coordinates
2575           %FIRST_ALT      Alias for %ALT{0}
2576           %FORMAT         Prints all FORMAT fields or a subset of samples with -s or -S
2577           %GT             Genotype (e.g. 0/1)
2578           %INFO           Prints the whole INFO column
2579           %INFO/TAG       Any tag in the INFO column
2580           %IUPACGT        Genotype translated to IUPAC ambiguity codes (e.g. M instead of C/A)
2581           %LINE           Prints the whole line
2582           %MASK           Indicates presence of the site in other files (with multiple files)
2583           %N_PASS(expr)   Number of samples that pass the filtering expression (see EXPRESSIONS)
2584           %POS0           POS in 0-based coordinates
2585           %PBINOM(TAG)    Calculate phred-scaled binomial probability, the allele index is determined from GT
2586           %SAMPLE         Sample name
2587           %TAG{INT}       Curly brackets to print a subfield (e.g. INFO/TAG{1}, the indexes are 0-based)
2588           %TBCSQ          Translated FORMAT/BCSQ. See the csq command above for explanation and examples.
2589           %TGT            Translated genotype (e.g. C/A)
2590           %TYPE           Variant type (REF, SNP, MNP, INDEL, BND, OTHER)
2591           []              Format fields must be enclosed in brackets to loop over all samples
2592           \n              new line
2593           \t              tab character
2594
2595           Everything else is printed verbatim.
2596
2597   Examples:
2598           # Print chromosome, position, ref allele and the first alternate allele
2599           bcftools query -f '%CHROM  %POS  %REF  %ALT{0}\n' file.vcf.gz
2600
2601           # Similar to above, but use tabs instead of spaces, add sample name and genotype
2602           bcftools query -f '%CHROM\t%POS\t%REF\t%ALT[\t%SAMPLE=%GT]\n' file.vcf.gz
2603
2604           # Print FORMAT/GQ fields followed by FORMAT/GT fields
2605           bcftools query -f 'GQ:[ %GQ] \t GT:[ %GT]\n' file.vcf
2606
2607           # Make a BED file: chr, pos (0-based), end pos (1-based), id
2608           bcftools query -f'%CHROM\t%POS0\t%END\t%ID\n' file.bcf
2609
2610           # Print only samples with alternate (non-reference) genotypes
2611           bcftools query -f'[%CHROM:%POS %SAMPLE %GT\n]' -i'GT="alt"' file.bcf
2612
2613           # Print all samples at sites with at least one alternate genotype
2614           bcftools view -i'GT="alt"' file.bcf -Ou | bcftools query -f'[%CHROM:%POS %SAMPLE %GT\n]'
2615
2616           # Print phred-scaled binomial probability from FORMAT/AD tag for all heterozygous genotypes
2617           bcftools query -i'GT="het"' -f'[%CHROM:%POS %SAMPLE %GT %PBINOM(AD)\n]' file.vcf
2618
2619           # Print the second value of AC field if bigger than 10. Note the (unfortunate) difference in
2620           # index subscript notation: formatting expressions (-f) use "{}" while filtering expressions
2621           # (-i) use "[]". This is for historic reasons and backward-compatibility.
2622           bcftools query -f '%AC{1}\n' -i 'AC[1]>10' file.vcf.gz
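
       The BED example above combines a 0-based start (%POS0) with a 1-based
       end (%END). The Python sketch below shows the arithmetic for a record
       whose end is determined by the REF allele alone. It assumes no INFO/END
       annotation is present; the function name is hypothetical.

```python
def bed_fields(chrom: str, pos: int, ref: str, variant_id: str = ".") -> str:
    """Format chrom, 0-based start, 1-based end, and ID as one BED line."""
    start0 = pos - 1           # %POS0: POS in 0-based coordinates
    end = pos + len(ref) - 1   # %END: end position of the REF allele
    return f"{chrom}\t{start0}\t{end}\t{variant_id}"
```

       A SNV at position 100 therefore covers the half-open BED interval
       99-100, and a three-base REF allele at the same position covers 99-102.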
2623
2624   bcftools reheader [OPTIONS] file.vcf.gz
2625       Modify header of VCF/BCF files, change sample names.
2626
2627       -f, --fai FILE
2628           add to the header contig names and their lengths from the  provided
2629           fasta  index  file (.fai). Lengths of existing contig lines will be
2630           updated and contig lines not  present  in  the  fai  file  will  be
2631           removed
2632
2633       -h, --header FILE
2634           new VCF header
2635
2636       -o, --output FILE
2637           see Common Options
2638
2639       -s, --samples FILE
2640           new  sample  names,  one  name  per line, in the same order as they
2641           appear in the VCF file. Alternatively, only samples which  need  to
2642           be  renamed  can be listed as "old_name new_name\n" pairs separated
2643           by whitespaces, each on a separate line. If a sample name  contains
2644           spaces,  the  spaces  can be escaped using the backslash character,
2645           for example "Not\ a\ good\ sample\ name".
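
           The two accepted line formats and the backslash escaping can be
           illustrated with a short Python sketch. It shows how such a line
           might be interpreted and is not the bcftools parser; the function
           name is hypothetical.

```python
import re


def parse_rename_line(line: str):
    """Parse one line of a sample-renaming file.

    Returns (None, new_name) for a single name, or (old_name, new_name)
    for a pair. A backslash-escaped space belongs to the name.
    """
    # Split on whitespace that is not preceded by a backslash.
    fields = re.split(r"(?<!\\)\s+", line.strip())
    names = [field.replace("\\ ", " ") for field in fields]
    if len(names) == 1:
        return (None, names[0])
    if len(names) == 2:
        return (names[0], names[1])
    raise ValueError(f"expected one or two names, got {len(names)}")
```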
2646
2647       -T, --temp-prefix PATH
2648           template for temporary file names, used with -f
2649
2650       --threads INT
2651           see Common Options
2652
2653   bcftools roh [OPTIONS] file.vcf.gz
2654       A program for detecting  runs  of  homo/autozygosity.  Only  bi-allelic
2655       sites are considered.
2656
2657   The HMM model:
2658           Notation:
2659             D  = Data, AZ = autozygosity, HW = Hardy-Weinberg (non-autozygosity),
2660             f  = non-ref allele frequency
2661
2662           Emission probabilities:
2663             oAZ = P_i(D|AZ) = (1-f)*P(D|RR) + f*P(D|AA)
2664             oHW = P_i(D|HW) = (1-f)^2 * P(D|RR) + f^2 * P(D|AA) + 2*f*(1-f)*P(D|RA)
2665
2666           Transition probabilities:
2667             tAZ = P(AZ|HW)  .. from HW to AZ, the -a parameter
2668             tHW = P(HW|AZ)  .. from AZ to HW, the -H parameter
2669
2670             ci  = P_i(C)  .. probability of cross-over at site i, from genetic map
2671             AZi = P_i(AZ) .. probability of site i being AZ/non-AZ, scaled so that AZi+HWi = 1
2672             HWi = P_i(HW)
2673
2674             P_{i+1}(AZ) = oAZ * max[(1 - tAZ * ci) * AZ{i-1} , tAZ * ci * (1-AZ{i-1})]
2675             P_{i+1}(HW) = oHW * max[(1 - tHW * ci) * (1-AZ{i-1}) , tHW * ci * AZ{i-1}]
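
       The emission probabilities and one update step can be written out in
       Python as below. The sketch transcribes the formulas exactly as given
       above and rescales the result so that the two state probabilities sum
       to 1. It is an illustration, not the bcftools source; function and
       variable names are hypothetical.

```python
def emission_probs(f, p_rr, p_ra, p_aa):
    """Return (oAZ, oHW) for non-ref allele frequency f and P(D|genotype)."""
    o_az = (1 - f) * p_rr + f * p_aa
    o_hw = (1 - f) ** 2 * p_rr + f ** 2 * p_aa + 2 * f * (1 - f) * p_ra
    return o_az, o_hw


def viterbi_step(az_prev, o_az, o_hw, t_az, t_hw, ci):
    """One update of the AZ/HW state probabilities, following the text above."""
    hw_prev = 1 - az_prev
    p_az = o_az * max((1 - t_az * ci) * az_prev, t_az * ci * hw_prev)
    p_hw = o_hw * max((1 - t_hw * ci) * hw_prev, t_hw * ci * az_prev)
    total = p_az + p_hw  # assumed non-zero in this sketch
    return p_az / total, p_hw / total
```

       With f = 0.5 and data that support only the RR genotype, the
       autozygous state has the higher emission probability (0.5 against
       0.25), which is what drives the model towards AZ in long homozygous
       stretches.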
2676
2677   General Options:
2678       --AF-dflt FLOAT
2679           in  case  allele frequency is not known, use the FLOAT. By default,
2680           sites where allele frequency cannot be determined,  or  is  0,  are
2681           skipped.
2682
2683       --AF-tag TAG
2684           use  the  specified  INFO  tag  TAG as an allele frequency estimate
2685           instead of the default AC and AN tags. Sites which do not have  TAG
2686           will be skipped.
2687
2688       --AF-file FILE
2689           Read  allele  frequencies  from a tab-delimited file containing the
2690           columns: CHROM\tPOS\tREF,ALT\tAF. The file can be  compressed  with
2691           bgzip  and  indexed  with  tabix  -s1  -b2 -e2. Sites which are not
2692           present in the FILE or have different reference or alternate allele
2693           will be skipped. Note that such a file can be easily created from a
2694           VCF using:
2695
2696               bcftools query -f'%CHROM\t%POS\t%REF,%ALT\t%INFO/TAG\n' file.vcf | bgzip -c > freqs.tab.gz
2697
2698       -b, --buffer-size INT[,INT]
2699           when the entire many-sample file cannot fit into memory, a  sliding
2700           buffer approach can be used. The first value is the number of sites
2701           to keep in memory. If negative, it is interpreted  as  the  maximum
2702           memory  to  use, in MB. The second, optional, value sets the number
2703           of overlapping sites. The default overlap is set to roughly  1%  of
2704           the buffer size.
2705
2706       -e, --estimate-AF FILE
2707           estimate  the allele frequency by recalculating INFO/AC and INFO/AN
2708           on the fly, using the specified TAG which can be  either  FORMAT/GT
2709           ("GT")  or  FORMAT/PL ("PL"). If TAG is not given, "GT" is assumed.
2710           Either all  samples  ("-")  or  samples  listed  in  FILE  will  be
2711           included.  For example, use "PL,-" to estimate AF from FORMAT/PL of
2712           all samples. If neither -e  nor  the  other  --AF-...  options  are
2713           given,  the  allele  frequency  is  estimated from AC and AN counts
2714           which are already present in the INFO field.
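
           Estimating the allele frequency from FORMAT/GT amounts to counting
           alleles. A minimal Python sketch for a bi-allelic site, assuming
           genotypes are given as strings such as "0/1" or "1|1" and missing
           alleles as "."; the function name is hypothetical and this is not
           the bcftools code:

```python
import re


def allele_frequency(genotypes):
    """Return (AC, AN, AF) computed from a list of GT strings."""
    ac = an = 0
    for gt in genotypes:
        for allele in re.split(r"[/|]", gt):
            if allele == ".":
                continue  # missing alleles do not count towards AN
            an += 1
            if allele != "0":
                ac += 1
    af = ac / an if an else None  # AF is undefined when nothing is called
    return ac, an, af
```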
2715
2716       --exclude EXPRESSION
2717           exclude sites for which EXPRESSION is true. For  valid  expressions
2718           see EXPRESSIONS.
2719
2720       -G, --GTs-only FLOAT
2721           use  genotypes  (FORMAT/GT  fields)  ignoring  genotype likelihoods
2722           (FORMAT/PL), setting PL of unseen genotypes to FLOAT. Safe value to
2723           use is 30 to account for GT errors.
2724
2725       --include EXPRESSION
2726           include  only  sites  for  which  EXPRESSION  is  true.  For  valid
2727           expressions see EXPRESSIONS.
2728
2729       -I, --skip-indels
2730           skip indels as their genotypes are usually enriched for errors
2731
2732       -m, --genetic-map FILE
2733           genetic map in the format required also by IMPUTE2. Only the  first
2734           and  third column are used (position and Genetic_Map(cM)). The FILE
2735           can be a single file or a file  mask,  where  string  "{CHROM}"  is
2736           replaced with chromosome name.
2737
2738       -M, --rec-rate FLOAT
2739           constant   recombination   rate   per   bp.   In  combination  with
2740           --genetic-map, the --rec-rate parameter is interpreted differently,
2741           as  FLOAT-fold  increase  of transition probabilities, which allows
2742           the  model  to  become  more  sensitive  yet  still   account   for
2743           recombination  hotspots.  Note that the range of values therefore
2744           differs between the two cases:  normally the parameter  will  be
2745           in  the  range (1e-3,1e-9) but with --genetic-map it will be in the
2746           range (10,1000).
2747
2748       -o, --output FILE
2749           Write output to FILE.  By  default,  the  output  is  printed  to
2750           stdout.
2751
2752       -O, --output-type s|r[z]
2753           Generate  per-site  output (s) or per-region output (r). By default
2754           both types are printed and the output is uncompressed. Add z for  a
2755           compressed output.
2756
2757       -r, --regions chr|chr:pos|chr:from-to|chr:from-[,...]
2758           see Common Options
2759
2760       -R, --regions-file file
2761           see Common Options
2762
2763       -s, --samples LIST
2764           see Common Options
2765
2766       -S, --samples-file FILE
2767           see Common Options
2768
2769       -t, --targets chr|chr:pos|chr:from-to|chr:from-[,...]
2770           see Common Options
2771
2772       -T, --targets-file file
2773           see Common Options
2774
2775   HMM Options:
2776       -a, --hw-to-az FLOAT
2777           P(AZ|HW)   transition   probability  from  HW  (Hardy-Weinberg)  to
2778           AZ (autozygous) state
2779
2780       -H, --az-to-hw FLOAT
2781           P(HW|AZ) transition probability from AZ to HW state
2782
2783       -V, --viterbi-training FLOAT
2784           estimate HMM  parameters  using  Baum-Welch  algorithm,  using  the
2785           convergence threshold FLOAT, e.g. 1e-10 (experimental)
2786
2787   bcftools sort [OPTIONS] file.bcf
2788       -m, --max-mem FLOAT[kMG]
2789           Maximum memory to use. Approximate, affects the number of temporary
2790           files written to the disk. Note that if the command fails  at  this
2791           step  because  of  too  many  open  files, your system limit on the
2792           number of open files ("ulimit") may need to be increased.
2793
2794       -o, --output FILE
2795           see Common Options
2796
2797       -O, --output-type b|u|z|v
2798           see Common Options
2799
2800       -T, --temp-dir DIR
2801           Use this directory to store temporary files
2802
2803   bcftools stats [OPTIONS] A.vcf.gz [B.vcf.gz]
2804       Parses VCF or BCF and produces text file stats which  is  suitable  for
2805       machine  processing  and  can  be plotted using plot-vcfstats. When two
2806       files are given, the program generates separate stats for  intersection
2807       and  the  complements.  By  default only sites are compared; -s/-S must
2808       be given to also include sample columns. When one VCF file is specified on
2809       the  command  line, then stats by non-reference allele frequency, depth
2810       distribution, stats by quality and per-sample counts, singleton  stats,
2811       etc.  are  printed.  When  two  VCF files are given, then stats such as
2812       concordance (Genotype concordance by  non-reference  allele  frequency,
2813       Genotype   concordance   by   sample,  Non-Reference  Discordance)  and
2814       correlation are  also  printed.  Per-site  discordance  (PSD)  is  also
2815       printed in --verbose mode.
2816
2817       --af-bins LIST|FILE
2818           comma separated list of allele frequency bins (e.g. 0.1,0.5,1) or a
2819           file  listing  the  allele  frequency  bins  one  per  line   (e.g.
2820           0.1\n0.5\n1)
2821
2822       --af-tag TAG
2823           allele frequency INFO tag to use for binning. By default the allele
2824           frequency is estimated from AC/AN, if available, or  directly  from
2825           the genotypes (GT) if not.
2826
2827       -1, --1st-allele-only
2828           consider only the 1st alternate allele at multiallelic sites
2829
2830       -c, --collapse snps|indels|both|all|some|none
2831           see Common Options
2832
2833       -d, --depth INT,INT,INT
2834           ranges of depth distribution: min, max, and size of the bin
2835
2836       --debug
2837           produce verbose per-site and per-sample output
2838
2839       -e, --exclude EXPRESSION
2840           exclude  sites  for which EXPRESSION is true. For valid expressions
2841           see EXPRESSIONS.
2842
2843       -E, --exons file.gz
2844           tab-delimited file with exons for indel frameshifts statistics. The
2845           columns  of  the  file  are CHR, FROM, TO, with 1-based, inclusive,
2846           positions. The file is BGZF-compressed and indexed with tabix
2847
2848               tabix -s1 -b2 -e3 file.gz
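
               Exon coordinates are often kept in BED format, which uses
               0-based, end-exclusive intervals. A minimal Python sketch of the
               conversion to the 1-based, inclusive CHR, FROM, TO layout
               described above is shown here. The function name and the input
               rows are hypothetical, and the output still has to be
               BGZF-compressed and indexed with tabix.

```python
def bed_to_exon_rows(bed_lines):
    """Convert BED intervals to tab-delimited CHR, FROM, TO rows.

    BED start is 0-based and the end is exclusive, so FROM = start + 1
    and TO = end gives 1-based, inclusive positions.
    """
    rows = []
    for line in bed_lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and BED header lines
        chrom, start, end = line.split("\t")[:3]
        rows.append((chrom, int(start) + 1, int(end)))
    # tabix expects records grouped by chromosome and sorted by position
    rows.sort(key=lambda r: (r[0], r[1], r[2]))
    return ["\t".join(str(field) for field in row) for row in rows]


# Hypothetical BED input, for illustration only
for row in bed_to_exon_rows(["chr1\t99\t200\n", "chr1\t9\t20\n"]):
    print(row)
```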
2849
2850       -f, --apply-filters LIST
2851           see Common Options
2852
2853       -F, --fasta-ref ref.fa
2854           faidx indexed reference sequence file to determine INDEL context
2855
2856       -i, --include EXPRESSION
2857           include  only  sites  for  which  EXPRESSION  is  true.  For  valid
2858           expressions see EXPRESSIONS.
2859
2860       -I, --split-by-ID
2861           collect  stats  separately  for  sites which have the ID column set
2862           ("known sites") or which do not have  the  ID  column  set  ("novel
2863           sites").
2864
2865       -r, --regions chr|chr:pos|chr:from-to|chr:from-[,...]
2866           see Common Options
2867
2868       -R, --regions-file file
2869           see Common Options
2870
2871       -s, --samples LIST
2872           see Common Options
2873
2874       -S, --samples-file FILE
2875           see Common Options
2876
2877       -t, --targets chr|chr:pos|chr:from-to|chr:from-[,...]
2878           see Common Options
2879
2880       -T, --targets-file file
2881           see Common Options
2882
2883       -u, --user-tstv <TAG[:min:max:n]>
2884           collect Ts/Tv stats for any tag using the given binning [0:1:100]
2885
2886       -v, --verbose
2887           produce verbose per-site and per-sample output
2888
2889   bcftools view [OPTIONS] file.vcf.gz [REGION [...]]
2890       View,  subset  and  filter  VCF  or BCF files by position and filtering
2891       expression. Convert between VCF and BCF. Former bcftools subset.
2892
2893   Output options
2894       -G, --drop-genotypes
2895           drop individual genotype information (after subsetting if -s option
2896           is set)
2897
2898       -h, --header-only
2899           output the VCF header only
2900
2901       -H, --no-header
2902           suppress the header in VCF output
2903
2904       -l, --compression-level [0-9]
2905           compression  level. 0 stands for uncompressed, 1 for best speed and
2906           9 for best compression.
2907
2908       --no-version
2909           see Common Options
2910
2911       -O, --output-type b|u|z|v
2912           see Common Options
2913
2914       -o, --output FILE
2915           output file name. If not  present,  the  default  is  to  print  to
2916           standard output (stdout).
2917
2918       -r, --regions chr|chr:pos|chr:from-to|chr:from-[,...]
2919           see Common Options
2920
2921       -R, --regions-file file
2922           see Common Options
2923
2924       -t, --targets chr|chr:pos|chr:from-to|chr:from-[,...]
2925           see Common Options
2926
2927       -T, --targets-file file
2928           see Common Options
2929
2930       --threads INT
2931           see Common Options
2932
2933   Subset options:
2934       -a, --trim-alt-alleles
2935           remove alleles not seen in the genotype fields from the ALT column.
2936           Note that if no alternate allele remains after trimming, the record
2937           itself is not removed but ALT is set to ".". If the option -s or -S
2938           is given, removes alleles not seen in the subset. INFO  and  FORMAT
2939           tags declared as Type=A, G or R will be trimmed as well.
2940
2941       --force-samples
2942           only warn about unknown subset samples
2943
2944       -I, --no-update
2945           do  not (re)calculate INFO fields for the subset (currently INFO/AC
2946           and INFO/AN)
2947
2948       -s, --samples LIST
2949           see Common Options. Note that it is  possible  to  create  multiple
2950           subsets simultaneously using the split plugin.
2951
2952       -S, --samples-file FILE
2953           see  Common  Options.  Note  that it is possible to create multiple
2954           subsets simultaneously using the split plugin.
2955
2956   Filter options:
2957       Note that filter options below dealing  with  counting  the  number  of
2958       alleles will, for speed, first check for the values of AC and AN in the
2959       INFO column to avoid parsing all the genotype (FORMAT/GT) fields in the
2960       VCF. This means that a filter like --min-af 0.1 will be calculated from
2961       INFO/AC and INFO/AN when available or FORMAT/GT otherwise. However,  it
2962       will  not  attempt  to  use  any other existing field, like INFO/AF for
2963       example. For that, use --exclude AF<0.1 instead.
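
           The lookup order described above can be sketched in Python. This
           illustrates the documented behaviour and is not the bcftools
           implementation: the function name and the record layout (a dict for
           INFO, tuples of allele indices for genotypes, None for a missing
           allele) are assumptions made for the example.

```python
def nref_allele_frequency(info, genotypes):
    """Non-reference allele frequency of one site.

    Uses INFO/AC and INFO/AN when both are present, otherwise counts
    alleles in the genotypes. An existing INFO/AF value is never used.
    """
    if "AC" in info and "AN" in info:
        an = info["AN"]
        return sum(info["AC"]) / an if an else None
    alleles = [a for gt in genotypes for a in gt if a is not None]
    if not alleles:
        return None  # no called alleles, frequency undefined
    return sum(1 for a in alleles if a > 0) / len(alleles)


# AC/AN present: AF is ignored even though it disagrees with AC/AN
print(nref_allele_frequency({"AC": [2], "AN": 8, "AF": [0.9]}, []))
# AC/AN absent: fall back to counting alleles in the genotypes
print(nref_allele_frequency({}, [(0, 1), (1, 1), (None, None)]))
```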
2964
2965       Also note that one must be careful when sample subsetting and filtering
2966       is  performed  in  a  single  command  because  the  order  of internal
2967       operations can influence the result. For example, the  -i/-e  filtering
2968       is  performed  before sample removal, but the -P filtering is performed
2969       after, and some are inherently ambiguous, for example allele counts can
2970       be  taken  from  the INFO column when present but calculated on the fly
2971       when absent. Therefore it is strongly  recommended  to  spell  out  the
2972       required  order  explicitly by separating such commands into two steps.
2973       (Make sure to use the -O u option when piping!)
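
           One way to spell out the order explicitly is to build the two steps
           as separate bcftools view invocations joined by a pipe. The Python
           sketch below only assembles and prints the two commands; the helper
           names, sample names, file names and the expression are hypothetical.
           It subsets the samples first and then filters on the recalculated
           allele count.

```python
import shlex
import subprocess


def two_step_commands(samples, expression, in_file, out_file):
    """Build argv lists: subset samples first, then filter the subset."""
    # Step 1: subset samples, write uncompressed BCF (-O u) to the pipe
    subset = ["bcftools", "view", "-s", ",".join(samples), "-O", "u", in_file]
    # Step 2: filter the stream read from stdin ("-"), write compressed VCF
    filt = ["bcftools", "view", "-i", expression, "-O", "z", "-o", out_file, "-"]
    return subset, filt


def run_two_step(samples, expression, in_file, out_file):
    """Run the two commands as a pipeline; requires bcftools on PATH."""
    subset, filt = two_step_commands(samples, expression, in_file, out_file)
    p1 = subprocess.Popen(subset, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(filt, stdin=p1.stdout)
    p1.stdout.close()  # let the first step see a closed pipe if step 2 exits
    rc2 = p2.wait()
    rc1 = p1.wait()
    if rc1 != 0 or rc2 != 0:
        raise RuntimeError(f"pipeline failed (exit codes {rc1}, {rc2})")


subset_cmd, filter_cmd = two_step_commands(
    ["sampleA", "sampleB"], "AC>0", "in.bcf", "out.vcf.gz"
)
print(shlex.join(subset_cmd), "|", shlex.join(filter_cmd))
```

           Passing the arguments as a list also avoids any shell quoting of the
           filtering expression.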
2974
2975       -c, --min-ac INT[:nref|:alt1|:minor|:major|:nonmajor]
2976           minimum allele count (INFO/AC) of sites to be  printed.  Specifying
2977           the  type  of  allele  is  optional and can be set to non-reference
2978           (nref, the default), 1st  alternate   (alt1),  the  least  frequent
2979           (minor),  the  most  frequent  (major)  or  sum of all but the most
2980           frequent (nonmajor) alleles.
2981
2982       -C, --max-ac INT[:nref|:alt1|:minor|:major|:nonmajor]
2983           maximum allele count (INFO/AC) of sites to be  printed.  Specifying
2984           the  type  of  allele  is  optional and can be set to non-reference
2985           (nref, the default), 1st  alternate   (alt1),  the  least  frequent
2986           (minor),  the  most  frequent  (major)  or  sum of all but the most
2987           frequent (nonmajor) alleles.
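
               The allele types can be illustrated with a short Python sketch
               that follows the wording above literally. It is not the bcftools
               implementation, whose handling of ties and edge cases may
               differ; the function name and the input layout (per-allele
               counts with REF first) are assumptions.

```python
def allele_count(counts, kind="nref"):
    """Summarize per-allele counts (REF first, then ALT alleles in order)."""
    if kind == "nref":        # all non-reference alleles together
        return sum(counts[1:])
    if kind == "alt1":        # first alternate allele only
        return counts[1] if len(counts) > 1 else 0
    if kind == "minor":       # the least frequent allele
        return min(counts)
    if kind == "major":       # the most frequent allele
        return max(counts)
    if kind == "nonmajor":    # everything except the most frequent allele
        return sum(counts) - max(counts)
    raise ValueError(f"unknown allele type: {kind}")


# REF seen 6 times, first ALT 3 times, second ALT once
counts = [6, 3, 1]
for kind in ("nref", "alt1", "minor", "major", "nonmajor"):
    print(kind, allele_count(counts, kind))
```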
2988
2989       -e, --exclude EXPRESSION
2990           exclude sites for which EXPRESSION is true. For  valid  expressions
2991           see EXPRESSIONS.
2992
2993       -f, --apply-filters LIST
2994           see Common Options
2995
2996       -g, --genotype [^][hom|het|miss]
2997           include  only sites with one or more homozygous (hom), heterozygous
2998           (het) or missing (miss) genotypes. When  prefixed  with  ^,  the
2999           logic  is  reversed;  thus  ^het  excludes sites with heterozygous
3000           genotypes.
3001
3002       -i, --include EXPRESSION
3003           include sites for which EXPRESSION is true. For  valid  expressions
3004           see EXPRESSIONS.
3005
3006       -k, --known
3007           print known sites only (ID column is not ".")
3008
3009       -m, --min-alleles INT
3010           print sites with at least INT alleles listed in REF and ALT columns
3011
3012       -M, --max-alleles INT
3013           print sites with at most INT alleles listed in REF and ALT columns.
3014           Use -m2 -M2 -v snps to only view biallelic SNPs.
3015
3016       -n, --novel
3017           print novel sites only (ID column is ".")
3018
3019       -p, --phased
3020           print sites where all samples are  phased.  Haploid  genotypes  are
3021           considered phased. Missing genotypes are considered unphased unless the
3022           phased bit is set.
3023
3024       -P, --exclude-phased
3025           exclude sites where all samples are phased
3026
3027       -q, --min-af FLOAT[:nref|:alt1|:minor|:major|:nonmajor]
3028           minimum allele  frequency  (INFO/AC  /  INFO/AN)  of  sites  to  be
3029           printed.  Specifying  the type of allele is optional and can be set
3030           to non-reference (nref, the default), 1st  alternate   (alt1),  the
3031           least frequent (minor), the most frequent (major) or sum of all but
3032           the most frequent (nonmajor) alleles.
3033
3034       -Q, --max-af FLOAT[:nref|:alt1|:minor|:major|:nonmajor]
3035           maximum allele  frequency  (INFO/AC  /  INFO/AN)  of  sites  to  be
3036           printed.  Specifying  the type of allele is optional and can be set
3037           to non-reference (nref, the default), 1st  alternate   (alt1),  the
3038           least frequent (minor), the most frequent (major) or sum of all but
3039           the most frequent (nonmajor) alleles.
3040
3041       -u, --uncalled
3042           print sites without a called genotype
3043
3044       -U, --exclude-uncalled
3045           exclude sites without a called genotype
3046
3047       -v, --types snps|indels|mnps|other
3048           comma-separated list of variant types to select. Site  is  selected
3049           if  any  of  the  ALT  alleles  is of the type requested. Types are
3050           determined by comparing the REF and ALT alleles in the  VCF  record,
3051           not  by INFO  tags like INFO/INDEL or INFO/VT. Use --include to select
3052           based on INFO tags.
3053
3054       -V, --exclude-types snps|indels|mnps|ref|bnd|other
3055           comma-separated list of variant types to exclude. Site is  excluded
3056           if  any  of  the  ALT  alleles  is of the type requested. Types are
3057           determined by comparing the REF and ALT alleles in the  VCF  record,
3058           not  by INFO tags like INFO/INDEL or INFO/VT. Use --exclude to exclude
3059           based on INFO tags.
3060
3061       -x, --private
3062           print sites where only the subset samples  carry  a  non-reference
3063           allele. Requires --samples or --samples-file.
3064
3065       -X, --exclude-private
3066           exclude  sites where only the subset samples carry a non-reference
3067           allele
3068
3069   bcftools help [COMMAND] | bcftools --help [COMMAND]
3070       Display   a   brief  usage  message  listing  the   bcftools   commands
3071       available.  If the name of a command is also given, e.g., bcftools help
3072       view, the  detailed  usage  message  for  that  particular  command  is
3073       displayed.
3074
3075   bcftools [--version|-v]
3076       Display  the version numbers and copyright information for bcftools and
3077       the important libraries used by bcftools.
3078
3079   bcftools [--version-only]
3080       Display the full bcftools version number in a machine-readable format.
3081

EXPRESSIONS

3083       These filtering expressions are accepted by most of the commands.
3084
3085       Valid expressions may contain:
3086
3087       •   numerical  constants,  string  constants,  file  names   (this   is
3088           currently supported only to filter by the ID column)
3089
3090               1, 1.0, 1e-4
3091               "String"
3092               @file_name
3093
3094       •   arithmetic operators
3095
3096               +,*,-,/
3097
3098       •   comparison operators
3099
3100               == (same as =), >, >=, <=, <, !=
3101
3102       •   regex  operators  "~"  and  its negation "!~". The expressions are
3103           case sensitive unless "/i" is added.
3104
3105               INFO/HAYSTACK ~ "needle"
3106               INFO/HAYSTACK ~ "NEEDless/i"
3107
3108       •   parentheses
3109
3110               (, )
3111
3112       •   logical operators. See also the examples below and the filtering
3113           tutorial <http://samtools.github.io/bcftools/howtos/filtering.html>
3114           about the distinction between "&&" vs "&" and  "||"  vs
3115           "|".
3116
3117               &&,  &, ||,  |
3118
3119       •   INFO tags, FORMAT tags, column names
3120
3121               INFO/DP or DP
3122               FORMAT/DV, FMT/DV, or DV
3123               FILTER, QUAL, ID, CHROM, POS, REF, ALT[0]
3124
3125       •   starting with 1.11, the FILTER column can be queried as follows:
3126
3127               FILTER="PASS"
3128               FILTER="A"          .. exact match, for example "A;B" does not pass
3129               FILTER!="A"         .. exact match, for example "A;B" does pass
3130               FILTER~"A"          .. both "A" and "A;B" pass
3131               FILTER!~"A"         .. neither "A" nor "A;B" pass
3132
3133       •   1 (or 0) to test the presence (or absence) of a flag
3134
3135               FlagA=1 && FlagB=0
3136
3137       •   "." to test missing values
3138
3139               DP=".", DP!=".", ALT="."
3140
3141       •   missing  genotypes  can  be  matched regardless of phase and ploidy
3142           (".|.", "./.", ".", "0|.") using these expressions
3143
3144               GT="mis", GT~"\.", GT!~"\."
3145
3146       •   missing genotypes can be matched including  the  phase  and  ploidy
3147           (".|.", "./.", ".") using these expressions
3148
3149               GT=".|.", GT="./.", GT="."
3150
3151       •   sample  genotype: reference (haploid or diploid), alternate (hom or
3152           het,   haploid   or   diploid),   missing   genotype,   homozygous,
3153           heterozygous,  haploid,  ref-ref  hom,  alt-alt  hom,  ref-alt het,
3154           alt-alt het, haploid ref, haploid alt (case-insensitive)
3155
3156               GT="ref"
3157               GT="alt"
3158               GT="mis"
3159               GT="hom"
3160               GT="het"
3161               GT="hap"
3162               GT="RR"
3163               GT="AA"
3164               GT="RA" or GT="AR"
3165               GT="Aa" or GT="aA"
3166               GT="R"
3167               GT="A"
3168
3169       •   TYPE     for     variant     type      in      REF,ALT      columns
3170           (indel,snp,mnp,ref,bnd,other,overlap).  Use the regex operator "~"
3171           to require at least one allele of the given type or the equal  sign
3172           "=" to require that all alleles are of the given type. Compare
3173
3174               TYPE="snp"
3175               TYPE~"snp"
3176               TYPE!="snp"
3177               TYPE!~"snp"
3178
3179       •   array  subscripts (0-based), "*" for any element, "-" to indicate a
3180           range. Note that for querying FORMAT vectors, the colon ":" can  be
3181           used  to  select a sample and an element of the vector, as shown in
3182           the examples below
3183
3184               INFO/AF[0] > 0.3             .. first AF value bigger than 0.3
3185               FORMAT/AD[0:0] > 30          .. first AD value of the first sample bigger than 30
3186               FORMAT/AD[0:1]               .. first sample, second AD value
3187               FORMAT/AD[1:0]               .. second sample, first AD value
3188               DP4[*] == 0                  .. any DP4 value
3189               FORMAT/DP[0]   > 30          .. DP of the first sample bigger than 30
3190               FORMAT/DP[1-3] > 10          .. samples 2-4
3191               FORMAT/DP[1-]  < 7           .. all samples but the first
3192               FORMAT/DP[0,2-4] > 20        .. samples 1, 3-5
3193               FORMAT/AD[0:1]               .. first sample, second AD field
3194               FORMAT/AD[0:*], AD[0:] or AD[0] .. first sample, any AD field
3195               FORMAT/AD[*:1] or AD[:1]        .. any sample, second AD field
3196               (DP4[0]+DP4[1])/(DP4[2]+DP4[3]) > 0.3
3197               CSQ[*] ~ "missense_variant.*deleterious"
3198
3199       •   with many samples it can be more practical to provide a  file  with
3200           sample names, one sample name per line
3201
3202               GT[@samples.txt]="het" & binom(AD)<0.01
3203
3204       •   function  on  FORMAT tags (over samples) and INFO tags (over vector
3205           fields): maximum; minimum; arithmetic mean (AVG is synonymous  with
3206           MEAN);  median;  standard  deviation from mean; sum; string length;
3207           absolute value; number of elements:
3208
3209               MAX, MIN, AVG, MEAN, MEDIAN, STDEV, SUM, STRLEN, ABS, COUNT
3210
3211           Note that functions above evaluate to a  single  value  across  all
3212           samples  and  are  intended to select sites, not samples, even when
3213           applied on FORMAT tags. However, when prefixed with SMPL_  (or  "s"
3214           for brevity, e.g. SMPL_MAX or sMAX), they will evaluate to a vector
3215           of per-sample values when applied on FORMAT tags:
3216
3217               SMPL_MAX, SMPL_MIN, SMPL_AVG, SMPL_MEAN, SMPL_MEDIAN, SMPL_STDEV, SMPL_SUM,
3218               sMAX, sMIN, sAVG, sMEAN, sMEDIAN, sSTDEV, sSUM
3219
3220       •   two-tailed binomial test. Note that for N=0 the test evaluates to a
3221           missing  value  and  when FORMAT/GT is used to determine the vector
3222           indices, it evaluates to 1 for homozygous genotypes.
3223
3224               binom(FMT/AD)                .. GT can be used to determine the correct index
3225               binom(AD[0],AD[1])           .. or the fields can be given explicitly
3226               phred(binom())               .. the same as binom but phred-scaled
3227
3228       •   variables calculated on the fly if not present: number of alternate
3229           alleles;  number  of  samples;  count  of  alternate alleles; minor
3230           allele count (similar to  AC  but  is  always  smaller  than  0.5);
3231           frequency  of  alternate  alleles  (AF=AC/AN);  frequency  of minor
3232           alleles (MAF=MAC/AN); number of alleles in called genotypes; number
3233           of  samples with missing genotype; fraction of samples with missing
3234           genotype; indel length (deletions negative, insertions positive)
3235
3236               N_ALT, N_SAMPLES, AC, MAC, AF, MAF, AN, N_MISSING, F_MISSING, ILEN
3237
3238       •   the number (N_PASS) or fraction (F_PASS) of samples which pass  the
3239           expression
3240
3241               N_PASS(GQ>90 & GT!="mis") > 90
3242               F_PASS(GQ>90 & GT!="mis") > 0.9
3243
3244       •   custom perl filtering. Note that this command is not compiled in by
3245           default, see the section Optional  Compilation  with  Perl  in  the
3246           INSTALL  file  for help and misc/demo-flt.pl for a working example.
3247           The demo defines  the  perl  subroutine  "severity"  which  can  be
3248           invoked from the command line as follows:
3249
3250               perl:path/to/script.pl; perl.severity(INFO/CSQ) > 3
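
           The array-subscript syntax listed above (0-based indices, "*" for
           any element, "-" for a range, ":" to separate the sample from the
           vector element) can be made concrete with a small Python sketch. It
           mirrors the documented examples only and is not the bcftools parser;
           both function names are hypothetical.

```python
def split_format_subscript(spec):
    """Split "sample:element"; an omitted part means "any" ("*")."""
    sample, _, element = spec.partition(":")
    return sample or "*", element or "*"


def parse_subscript(spec, n):
    """Expand "0", "1-3", "1-", "0,2-4" or "*" into 0-based indices.

    `n` is the number of available elements (samples or vector fields).
    """
    if spec == "*":
        return list(range(n))
    indices = []
    for part in spec.split(","):
        if "-" in part:
            first, last = part.split("-", 1)
            stop = int(last) if last else n - 1  # "1-" is open-ended
            indices.extend(range(int(first), min(stop, n - 1) + 1))
        else:
            indices.append(int(part))
    return indices


# FORMAT/DP[0,2-4] with six samples selects samples 1 and 3-5
print(parse_subscript("0,2-4", 6))
# FORMAT/AD[:1] means any sample, second AD field
print(split_format_subscript(":1"))
```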
3251
3252       Notes:
3253
3254       •   String comparisons and regular expressions are case-insensitive
3255
3256       •   Comma  in  strings  is interpreted as a separator and when multiple
3257           values are compared,  the  OR  logic  is  used.  Consequently,  the
3258           following two expressions are equivalent but not the third:
3259
3260               -i 'TAG="hello,world"'
3261               -i 'TAG="hello" || TAG="world"'
3262               -i 'TAG="hello" && TAG="world"'
3263
3264       •   Variables  and  function  names  are  case-insensitive, but not tag
3265           names.  For  example,  "qual"  can  be  used  instead  of   "QUAL",
3266           "strlen()" instead of "STRLEN()" , but not "dp" instead of "DP".
3267
3268       •   When  querying  multiple values, all elements are tested and the OR
3269           logic  is  used  on  the  result.  For   example,   when   querying
3270           "TAG=1,2,3,4", it will be evaluated as follows:
3271
3272               -i 'TAG[*]=1'   .. true, the record will be printed
3273               -i 'TAG[*]!=1'  .. true
3274               -e 'TAG[*]=1'   .. false, the record will be discarded
3275               -e 'TAG[*]!=1'  .. false
3276               -i 'TAG[0]=1'   .. true
3277               -i 'TAG[0]!=1'  .. false
3278               -e 'TAG[0]=1'   .. false
3279               -e 'TAG[0]!=1'  .. true
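
           The table above follows from two rules: an expression over several
           values is true if any value satisfies it, and -e discards exactly the
           records that -i would keep. A minimal Python sketch of those two
           rules (not bcftools code; the function names are hypothetical)
           reproduces the eight outcomes:

```python
import operator

OPS = {"=": operator.eq, "!=": operator.ne}


def expression_true(values, op, target, index=None):
    """Evaluate TAG[index] op target; index=None stands for TAG[*].

    With several values the per-element results are combined with OR.
    """
    selected = values if index is None else [values[index]]
    return any(OPS[op](v, target) for v in selected)


def record_printed(mode, values, op, target, index=None):
    """-i prints records where the expression is true, -e where it is false."""
    result = expression_true(values, op, target, index)
    return result if mode == "-i" else not result


tag = [1, 2, 3, 4]
for index in (None, 0):
    for mode in ("-i", "-e"):
        for op in ("=", "!="):
            subscript = "*" if index is None else index
            printed = record_printed(mode, tag, op, 1, index)
            print(f"{mode} 'TAG[{subscript}]{op}1' .. {printed}")
```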
3280
3281       Examples:
3282
3283           MIN(DV)>5       .. selects the whole site, evaluates min across all values and samples
3284
3285           SMPL_MIN(DV)>5  .. selects matching samples, evaluates within samples
3286
3287           MIN(DV/DP)>0.3
3288
3289           MIN(DP)>10 & MIN(DV)>3
3290
3291           FMT/DP>10  & FMT/GQ>10 .. both conditions must be satisfied within one sample
3292
3293           FMT/DP>10 && FMT/GQ>10 .. the conditions can be satisfied in different samples
3294
3295           QUAL>10 |  FMT/GQ>10   .. true for sites with QUAL>10 or a sample with GQ>10, but selects only samples with GQ>10
3296
3297           QUAL>10 || FMT/GQ>10   .. true for sites with QUAL>10 or a sample with GQ>10, plus selects all samples at such sites
3298
3299           TYPE="snp" && QUAL>=10 && (DP4[2]+DP4[3] > 2)
3300
3301           COUNT(GT="hom")=0      .. no homozygous genotypes at the site
3302
3303           AVG(GQ)>50             .. average (arithmetic mean) of genotype qualities bigger than 50
3304
3305           ID=@file       .. selects lines with ID present in the file
3306
3307           ID!=@~/file    .. skip lines with ID present in the ~/file
3308
3309           MAF[0]<0.05    .. select rare variants at 5% cutoff
3310
3311           POS>=100   .. restrict your range query, e.g. 20:100-200 to strictly sites with POS in that range.
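
           The difference between the single and the double ampersand in the
           FMT/DP and FMT/GQ examples above can be sketched in Python. The
           sketch models only whether a site is selected, not which samples
           are; the function names and the sample values are hypothetical.

```python
def single_ampersand(samples):
    """FMT/DP>10 & FMT/GQ>10: both conditions must hold within one sample."""
    return any(s["DP"] > 10 and s["GQ"] > 10 for s in samples)


def double_ampersand(samples):
    """FMT/DP>10 && FMT/GQ>10: the conditions may hold in different samples."""
    return any(s["DP"] > 10 for s in samples) and any(s["GQ"] > 10 for s in samples)


# One sample has high depth, another has high genotype quality
site = [{"DP": 25, "GQ": 5}, {"DP": 4, "GQ": 40}]
print(single_ampersand(site))  # no single sample satisfies both conditions
print(double_ampersand(site))  # each condition is met by some sample
```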
3312
3313       Shell expansion:
3314
3315       Note that expressions must often be quoted because some characters have
3316       special meaning in the shell.  Enclosing  the  expression  in  single
3317       quotes  ensures  that  the whole expression is passed to the program as
3318       intended:
3319
3320           bcftools view -i '%ID!="." & MAF[0]<0.01'
3321
3322       Please refer to the documentation of your shell for details.
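
       When a command line is assembled by a script rather than typed by hand,
       the quoting can be generated instead of written manually. The Python
       sketch below uses the standard shlex module for POSIX shells; the input
       file name is hypothetical.

```python
import shlex

# The expression from the example above, exactly as bcftools should receive it
expression = '%ID!="." & MAF[0]<0.01'

# Hypothetical input file; each list element is one argument
argv = ["bcftools", "view", "-i", expression, "file.vcf.gz"]

# shlex.join() quotes every argument that contains shell-special characters
command_line = shlex.join(argv)
print(command_line)

# Splitting the quoted command line again recovers the original arguments
print(shlex.split(command_line) == argv)
```

       Passing the argument list directly to subprocess.run(), without a
       shell, avoids the quoting problem altogether.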
3323

SCRIPTS AND OPTIONS

3325   plot-vcfstats [OPTIONS] file.vchk [...]
3326       Script for processing output of bcftools stats. It  can  merge  results
3327       from   multiple  outputs  (useful  when  running  the  stats  for  each
3328       chromosome separately), plots graphs and creates a PDF presentation.
3329
3330       -m, --merge
3331           Merge vcfstats files to STDOUT, skip plotting.
3332
3333       -p, --prefix DIR
3334           The output directory. This directory will be created if it does not
3335           exist.
3336
3337       -P, --no-PDF
3338           Skip the PDF creation step.
3339
3340       -r, --rasterize
3341           Rasterize  PDF images for faster rendering. This is the default and
3342           the opposite of -v, --vectors.
3343
3344       -s, --sample-names
3345           Use sample names for xticks rather than numeric IDs.
3346
3347       -t, --title STRING
3348           Identify files by these titles in plots. The option  can  be  given
3349           multiple  times,  for  each ID in the bcftools stats output. If not
3350           present, the script will use abbreviated source file names for  the
3351           titles.
3352
3353       -v, --vectors
3354           Generate  vector  graphics  for  PDF  images,  the  opposite of -r,
3355           --rasterize.
3356
3357       -T, --main-title STRING
3358           Main title for the PDF.
3359
3360       Example:
3361
3362           # Generate the stats
3363           bcftools stats -s - > file.vchk
3364
3365           # Plot the stats
3366           plot-vcfstats -p outdir file.vchk
3367
3368           # The final looks can be customized by editing the generated
3369           # 'outdir/plot.py' script and re-running manually
3370           cd outdir && python plot.py && pdflatex summary.tex
3371

PERFORMANCE

3373       HTSlib was designed with BCF format in mind. When  parsing  VCF  files,
3374       all  records  are  internally converted into BCF representation. Simple
3375       operations, like removing a single column  from  a  VCF  file,  can
3376       therefore be done much faster with standard UNIX commands, such as awk or
3377       cut. Therefore it is recommended to  use  BCF  as  input/output  format
3378       whenever  possible  to  avoid  the large  overhead  of  the VCF → BCF → VCF
3379       conversion.
3380

BUGS

3382       Please report any bugs you encounter on the GitHub website:
3383       <http://github.com/samtools/bcftools>
3384

AUTHORS

3386       Heng  Li  from  the  Sanger  Institute  wrote the original C version of
3387       htslib, samtools and bcftools. Bob Handsaker from the  Broad  Institute
3388       implemented  the  BGZF  library.  Petr Danecek, Shane McCarthy and John
3389       Marshall are  maintaining and further developing bcftools.  Many  other
3390       people   contributed   to   the   program   and   to  the  file  format
3391       specifications, both directly  and  indirectly  by  providing  patches,
3392       testing and reporting bugs. We thank them all.
3393

RESOURCES

3395       BCFtools GitHub website: <http://github.com/samtools/bcftools>
3396
3397       Samtools GitHub website: <http://github.com/samtools/samtools>
3398
3399       HTSlib GitHub website: <http://github.com/samtools/htslib>
3400
3401       File format specifications: <http://samtools.github.io/hts-specs>
3402
3403       BCFtools documentation: <http://samtools.github.io/bcftools>
3404
3405       BCFtools wiki page: <https://github.com/samtools/bcftools/wiki>
3406

COPYING

3408       The  MIT/Expat  License  or  GPL  License, see the LICENSE document for
3409       details. Copyright (c) Genome Research Ltd.
3410
3411
3412
3413                                  2021-07-07                       BCFTOOLS(1)