esl-mask(1) Easel Manual esl-mask(1)

NAME
esl-mask - mask sequence residues with X's (or other characters)

SYNOPSIS
esl-mask [options] seqfile maskfile

DESCRIPTION
esl-mask reads lines from maskfile that give start/end coordinates for
regions in each sequence in seqfile, masks these residues (changes them
to X's), and outputs the masked sequence.

The maskfile is a space-delimited file. Blank lines and lines that
start with '#' (comments) are ignored. Each data line contains at least
three fields: seqname, start, and end. The seqname is the name of a
sequence in the seqfile, and start and end are coordinates defining a
region in that sequence. The coordinates are indexed <1..L> with
respect to a sequence of length <L>.
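
The maskfile rules above can be sketched in Python. This is only an
illustration of the file format as described here, not Easel's own
parser; the function name parse_maskfile and the sample data are
hypothetical, and tolerating leading whitespace before '#' is an
assumption.

```python
def parse_maskfile(text):
    """Return a list of (seqname, start, end) tuples from maskfile text."""
    regions = []
    for line in text.splitlines():
        stripped = line.strip()
        # Blank lines and '#' comment lines are ignored.
        # (Assumption: leading whitespace before '#' is tolerated.)
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split()
        if len(fields) < 3:
            raise ValueError(f"expected at least 3 fields: {line!r}")
        # Only the first three fields are used; extra fields are skipped.
        regions.append((fields[0], int(fields[1]), int(fields[2])))
    return regions


# Hypothetical maskfile content, with 1-based inclusive coordinates.
example = """\
# seqname start end
seq1 5 10

seq2 1 3 extra-field
"""
print(parse_maskfile(example))
```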

By default, the sequence names must appear in exactly the same order
and number as the sequences in the seqfile. This is easy to enforce,
because the format of maskfile is also legal as a list of names for
esl-sfetch, so you can always fetch a temporary sequence file with
esl-sfetch and pipe that to esl-mask. (Alternatively, see the -R
option for fetching from an SSI-indexed seqfile.)
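
The default ordering requirement amounts to a pairwise comparison of
the two name lists. The sketch below shows that check in Python; the
function name check_same_order is hypothetical and the error messages
are not those printed by esl-mask.

```python
def check_same_order(seq_names, mask_names):
    """Raise ValueError unless both lists hold the same names in the same order."""
    if len(seq_names) != len(mask_names):
        raise ValueError(
            f"{len(seq_names)} sequences but {len(mask_names)} mask lines"
        )
    for i, (seq_name, mask_name) in enumerate(zip(seq_names, mask_names), start=1):
        if seq_name != mask_name:
            raise ValueError(
                f"entry {i}: seqfile has {seq_name!r}, maskfile has {mask_name!r}"
            )


check_same_order(["seq1", "seq2"], ["seq1", "seq2"])  # passes silently
```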

The default is to mask the region indicated by <start>..<end>.
Alternatively, everything but this region can be masked; see the -r
reverse masking option.

The default is to mask residues by converting them to X's. Any other
masking character can be chosen (see -m option), or alternatively,
masked residues can be lowercased (see -l option).
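
The masking modes described above can be illustrated with a short
Python sketch. It is not Easel's implementation; the function name
mask and its keyword arguments are hypothetical stand-ins for the
default behavior and the -m, -r, and -l options.

```python
def mask(seq, start, end, mask_char="X", reverse=False, lowercase=False):
    """Mask the 1-based inclusive region start..end of seq."""
    out = []
    for i, residue in enumerate(seq, start=1):
        in_region = start <= i <= end
        # Reverse masking masks everything outside the region instead.
        masked = (not in_region) if reverse else in_region
        if lowercase:
            # Lowercase mode: masked residues lower case, the rest upper case.
            out.append(residue.lower() if masked else residue.upper())
        else:
            out.append(mask_char if masked else residue)
    return "".join(out)


print(mask("ACDEFGHIKL", 3, 5))                  # ACXXXGHIKL
print(mask("ACDEFGHIKL", 3, 5, mask_char="N"))   # ACNNNGHIKL
print(mask("ACDEFGHIKL", 3, 5, reverse=True))    # XXDEFXXXXX
print(mask("ACDEFGHIKL", 3, 5, lowercase=True))  # ACdefGHIKL
```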

OPTIONS
-h Print brief help; includes version number and summary of all
options, including expert options.

-l Lowercase; mask by converting masked characters to lower case
and unmasked characters to upper case.

-m <c> Mask by converting masked residues to <c> instead of the default
X.

-o <f> Send output to file <f> instead of stdout.

-r Reverse mask; mask everything outside the region start..end, as
opposed to the default of masking that region.

-R Random access; fetch sequences from seqfile rather than requiring
that sequence names in maskfile and seqfile come in exactly
the same order and number. The seqfile must be SSI indexed (see
esl-sfetch --index).

-x <n> Extend all masked regions by up to <n> residues on each side.
For normal masking, this means masking <start>-<n>..<end>+<n>.
For reverse masking, this means masking 1..<start>-1+<n> and
<end>+1-<n>..L in a sequence of length L.
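
The coordinate arithmetic for -x can be worked through in Python. This
is a sketch, not Easel's code: the function name masked_intervals is
hypothetical, and clamping the result to 1..L is an assumption based
on the phrase "by up to <n> residues".

```python
def masked_intervals(start, end, length, extend=0, reverse=False):
    """Return the 1-based inclusive intervals that would be masked."""
    if reverse:
        # Reverse masking: 1..start-1+n and end+1-n..L
        candidates = [(1, start - 1 + extend), (end + 1 - extend, length)]
    else:
        # Normal masking: start-n..end+n
        candidates = [(start - extend, end + extend)]
    # Assumption: intervals are clamped to the sequence bounds 1..L.
    clamped = [(max(1, a), min(length, b)) for a, b in candidates]
    # Drop empty intervals, e.g. 1..0 when start is 1 and extend is 0.
    return [(a, b) for a, b in clamped if a <= b]


print(masked_intervals(5, 10, 20, extend=2))                # [(3, 12)]
print(masked_intervals(5, 10, 20, extend=2, reverse=True))  # [(1, 6), (9, 20)]
print(masked_intervals(2, 19, 20, extend=5))                # [(1, 20)]
```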

--informat <s>
Assert that input seqfile is in format <s>, bypassing format
autodetection. Common choices for <s> include: fasta, embl,
genbank. Alignment formats also work; common choices include:
stockholm, a2m, afa, psiblast, clustal, phylip. For more
information, and for codes for some less common formats, see the main
documentation. The string <s> is case-insensitive (fasta or
FASTA both work).

SEE ALSO
http://bioeasel.org/

COPYRIGHT
Copyright (C) 2020 Howard Hughes Medical Institute.
Freely distributed under the BSD open source license.

AUTHOR
http://eddylab.org

Easel @EASELVERSION@ Nov 2020 esl-mask(1)