esl-mask(1)                      Easel Manual                      esl-mask(1)

NAME

       esl-mask - mask sequence residues with X's (or other characters)

SYNOPSIS

       esl-mask [options] seqfile maskfile

DESCRIPTION

       esl-mask reads lines from maskfile that give start/end coordinates for
       regions in each sequence in seqfile, masks these residues (changes them
       to X's), and outputs the masked sequence.

       The maskfile is a space-delimited file. Blank lines and lines that
       start with '#' (comments) are ignored. Each data line contains at least
       three fields: seqname, start, and end. The seqname is the name of a
       sequence in the seqfile, and start and end are coordinates defining a
       region in that sequence. The coordinates are indexed <1..L> with
       respect to a sequence of length <L>.

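       As an illustration of the maskfile format described above, the
       following Python sketch parses maskfile text into regions. It is not
       part of Easel: the function name parse_maskfile is hypothetical, and
       the sketch assumes that fields are separated by any run of whitespace
       and that start and end are decimal integers.

```python
def parse_maskfile(text):
    """Return a list of (seqname, start, end) tuples from maskfile text.

    Illustrative sketch only; not Easel's parser.
    """
    regions = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Blank lines and lines that start with '#' are ignored.
        if not line.strip() or line.startswith("#"):
            continue
        # Assumption: any run of whitespace separates fields.
        fields = line.split()
        if len(fields) < 3:
            raise ValueError(f"line {lineno}: expected at least 3 fields")
        # Only the first three fields are used; extra fields are allowed.
        name, start, end = fields[0], int(fields[1]), int(fields[2])
        regions.append((name, start, end))
    return regions
```

       Fields after the third are accepted and ignored here, since a data
       line is only required to contain at least three fields.
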
       By default, the sequence names must appear in exactly the same order
       and number as the sequences in the seqfile. This is easy to enforce:
       the maskfile format is also a legal list of names for esl-sfetch, so
       you can always fetch a temporary sequence file with esl-sfetch and
       pipe that to esl-mask. (Alternatively, see the -R option for fetching
       from an SSI-indexed seqfile.)

       The default is to mask the region indicated by <start>..<end>.
       Alternatively, everything but this region can be masked; see the -r
       reverse masking option.

       The default is to mask residues by converting them to X's. Any other
       masking character can be chosen (see the -m option), or alternatively,
       masked residues can be lowercased (see the -l option).

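       The masking modes described above (default X masking, an alternative
       mask character, and lowercasing) can be sketched in Python as follows.
       This is an illustration of the documented behavior, not Easel's
       implementation. The function name mask_region and its parameters are
       hypothetical, and the sketch assumes 1 <= start <= end <= L.

```python
def mask_region(seq, start, end, mask_char="X", lowercase=False):
    """Mask residues start..end (1-based, inclusive) of seq.

    Illustrative sketch only; assumes 1 <= start <= end <= len(seq).
    """
    # Convert 1-based inclusive coordinates to a Python half-open slice.
    i, j = start - 1, end
    if lowercase:
        # Lowercase mode: masked residues become lower case and
        # unmasked residues become upper case.
        return seq[:i].upper() + seq[i:j].lower() + seq[j:].upper()
    # Default mode: every masked residue becomes mask_char.
    return seq[:i] + mask_char * (j - i) + seq[j:]


print(mask_region("ACDEFGHIKL", 3, 5))                  # ACXXXGHIKL
print(mask_region("ACDEFGHIKL", 3, 5, mask_char="N"))   # ACNNNGHIKL
print(mask_region("ACDEFGHIKL", 3, 5, lowercase=True))  # ACdefGHIKL
```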

OPTIONS

       -h     Print brief help; includes version number and summary of all
              options, including expert options.

       -l     Lowercase; mask by converting masked characters to lower case
              and unmasked characters to upper case.

       -m <c> Mask by converting masked residues to <c> instead of the default
              X.

       -o <f> Send output to file <f> instead of stdout.

       -r     Reverse mask; mask everything outside the region start..end, as
              opposed to the default of masking that region.

       -R     Random access; fetch sequences from seqfile rather than
              requiring that sequence names in maskfile and seqfile come in
              exactly the same order and number. The seqfile must be SSI
              indexed (see esl-sfetch --index).

       -x <n> Extend all masked regions by up to <n> residues on each side.
              For normal masking, this means masking <start>-<n>..<end>+<n>.
              For reverse masking, this means masking 1..<start>-1+<n> and
              <end>+1-<n>..L in a sequence of length L.

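       The coordinate arithmetic for -x, with and without -r, can be worked
       through in a short Python sketch. It is not Easel's implementation.
       The names masked_positions and apply_mask are hypothetical, and
       clamping the extended region to 1..L is an assumption based on the
       phrase "by up to <n> residues".

```python
def masked_positions(L, start, end, reverse=False, extend=0):
    """Return the set of 1-based positions masked in a sequence of length L.

    Illustrative sketch only. Assumption: extended regions are clamped
    to the range 1..L.
    """
    if reverse:
        # Reverse masking: mask 1..start-1+n and end+1-n..L.
        left = range(1, min(L, start - 1 + extend) + 1)
        right = range(max(1, end + 1 - extend), L + 1)
        return set(left) | set(right)
    # Normal masking: mask start-n..end+n.
    lo = max(1, start - extend)
    hi = min(L, end + extend)
    return set(range(lo, hi + 1))


def apply_mask(seq, positions, mask_char="X"):
    """Replace residues at the given 1-based positions with mask_char."""
    return "".join(
        mask_char if pos in positions else residue
        for pos, residue in enumerate(seq, start=1)
    )


seq = "ACDEFGHIKL"
print(apply_mask(seq, masked_positions(len(seq), 4, 6, extend=2)))
# AXXXXXXXKL
print(apply_mask(seq, masked_positions(len(seq), 4, 6, reverse=True)))
# XXXEFGXXXX
print(apply_mask(seq, masked_positions(len(seq), 4, 6, reverse=True, extend=1)))
# XXXXFXXXXX
```

       Under reverse masking, extension grows the two masked flanks inward,
       so the unmasked region shrinks by up to <n> residues on each side.
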
       --informat <s>
              Assert that input seqfile is in format <s>, bypassing format
              autodetection. Common choices for <s> include: fasta, embl,
              genbank. Alignment formats also work; common choices include:
              stockholm, a2m, afa, psiblast, clustal, phylip. For more
              information, and for codes for some less common formats, see
              the main documentation. The string <s> is case-insensitive
              (fasta or FASTA both work).

SEE ALSO

       http://bioeasel.org/

COPYRIGHT

       Copyright (C) 2020 Howard Hughes Medical Institute.
       Freely distributed under the BSD open source license.

AUTHOR

       http://eddylab.org

Easel @EASELVERSION@               Nov 2020                        esl-mask(1)