esl-sfetch(1)                    Easel Manual                    esl-sfetch(1)

NAME
       esl-sfetch - retrieve (sub-)sequences from a sequence file

SYNOPSIS
       esl-sfetch [options] seqfile key
           (retrieve a single sequence by key)

       esl-sfetch -c from..to [options] seqfile key
           (retrieve a single subsequence by key and coords)

       esl-sfetch -f [options] seqfile keyfile
           (retrieve multiple sequences using a file of keys)

       esl-sfetch -Cf [options] seqfile subseq-coord-file
           (retrieve multiple subsequences using a file of keys and coords)

       esl-sfetch --index seqfile
           (index a sequence file for retrievals)

DESCRIPTION
       esl-sfetch retrieves one or more sequences or subsequences from
       seqfile.

       The seqfile must first be indexed using esl-sfetch --index seqfile.
       This creates an SSI index file named seqfile.ssi.
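
       The index-once-then-fetch workflow can be scripted. Below is a
       minimal Python sketch, assuming esl-sfetch is on your PATH. The
       names ssi_path, run_checked and ensure_index are hypothetical
       helpers, not part of Easel. The runner parameter exists only so the
       logic can be tried without esl-sfetch installed.

```python
import os
import subprocess
from typing import Callable, List


def ssi_path(seqfile: str) -> str:
    """Path of the SSI index that `esl-sfetch --index seqfile` creates."""
    return seqfile + ".ssi"


def run_checked(argv: List[str]) -> None:
    """Default runner: execute argv, raising CalledProcessError on failure."""
    subprocess.run(argv, check=True)


def ensure_index(seqfile: str,
                 runner: Callable[[List[str]], None] = run_checked) -> bool:
    """Build seqfile.ssi if it is missing (hypothetical helper).

    Returns True if `esl-sfetch --index` was run, False if an index file
    already existed. It only checks for the file's existence, so it does
    not detect an out-of-date index.
    """
    if os.path.exists(ssi_path(seqfile)):
        return False
    runner(["esl-sfetch", "--index", seqfile])
    return True
```

       For example, ensure_index("swissprot") indexes the swissprot file
       on the first call and does nothing on later calls.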

       To retrieve a single complete sequence, run esl-sfetch seqfile key,
       where key is the name or accession of the desired sequence.
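
       When calling esl-sfetch from a script, it helps to build the
       argument list in one place. This is a sketch under the assumption
       that only the single-key options documented on this page are
       needed. The function name sfetch_argv is hypothetical. Options are
       placed before the two positional arguments, following the synopsis.

```python
from typing import List, Optional


def sfetch_argv(seqfile: str, key: str,
                coords: Optional[str] = None,
                outfile: Optional[str] = None,
                new_name: Optional[str] = None,
                revcomp: bool = False) -> List[str]:
    """Assemble `esl-sfetch [options] seqfile key` for one key."""
    argv = ["esl-sfetch"]
    if coords is not None:
        argv += ["-c", coords]    # subsequence, e.g. "23..100" or "100..23"
    if outfile is not None:
        argv += ["-o", outfile]   # write to this file instead of stdout
    if new_name is not None:
        argv += ["-n", new_name]  # rename the retrieved (sub-)sequence
    if revcomp:
        argv.append("-r")         # reverse complement; DNA/RNA only
    argv += [seqfile, key]
    return argv
```

       With esl-sfetch installed, the resulting list can be passed to
       Python's subprocess.run, for example
       subprocess.run(sfetch_argv("swissprot", "SRPA_HUMAN"), check=True).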

       To retrieve a single subsequence rather than a complete sequence, use
       the -c start..end option to provide start and end coordinates. The
       start and end coordinates are provided as one string, separated by any
       nonnumeric, nonwhitespace character or characters you like; see the -c
       option below for more details.
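
       The coordinate-string rule can be modeled in a few lines. This is a
       hypothetical Python re-implementation written for illustration; it
       is not Easel's own (C) parser. It accepts digits, then one or more
       separator characters that are neither digits nor whitespace, then
       an optional end coordinate. It also reports whether from > to,
       i.e. whether a reverse complement is implied.

```python
import re
from typing import Optional, Tuple

# start, one or more non-digit/non-whitespace separators, optional end
_COORDS = re.compile(r"(\d+)[^\d\s]+(\d*)")


def parse_coords(coords: str) -> Tuple[int, Optional[int], bool]:
    """Return (start, end, reverse) for a -c style coords string.

    end is None when omitted (retrieve through the end of the sequence);
    reverse is True when start > end.
    """
    m = _COORDS.fullmatch(coords)
    if m is None:
        raise ValueError(f"bad coords string: {coords!r}")
    start = int(m.group(1))
    end = int(m.group(2)) if m.group(2) else None
    reverse = end is not None and start > end
    return start, end, reverse
```

       Under this model, parse_coords("23..100"), parse_coords("23/100")
       and parse_coords("23-100") all give (23, 100, False). A string with
       a space as the only separator is rejected.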

       To retrieve more than one complete sequence at once, use the -f
       option. The second command-line argument then names a keyfile that
       contains a list of names or accessions, one per line. The first
       whitespace-delimited field on each line of this file is parsed as
       the name/accession.
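
       The keyfile format (together with the comment and blank-line rules
       given under -f below) can be read as follows. This is a minimal
       sketch; parse_keyfile is a hypothetical helper. It also assumes
       that only lines whose very first character is # count as comments.

```python
from typing import Iterable, List


def parse_keyfile(lines: Iterable[str]) -> List[str]:
    """Extract keys from keyfile lines.

    The first whitespace-delimited field of each line is the key; further
    fields are ignored, as are blank lines and lines starting with #.
    """
    keys = []
    for line in lines:
        if line.startswith("#"):
            continue                # comment line
        fields = line.split()
        if not fields:
            continue                # blank or whitespace-only line
        keys.append(fields[0])
    return keys
```

       Passing an open file object, e.g. parse_keyfile(open("keys.txt")),
       works because file objects iterate line by line; keys.txt is a
       hypothetical name.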

       To retrieve more than one subsequence at once, use the -C option in
       addition to -f; the second argument is then parsed as a list of
       subsequence coordinate lines. See the -C option below for more
       details, including the format of these lines.
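
       A subsequence coordinate file has four whitespace-delimited columns
       (new_name, from, to, name/accession), as described under -C below.
       The following sketch reads and writes such lines. SubseqRequest,
       parse_coord_file and format_coord_line are hypothetical names. The
       sketch assumes both coordinates are present as integers and treats
       a non-blank, non-comment line with fewer than four fields as an
       error.

```python
from typing import Iterable, List, NamedTuple


class SubseqRequest(NamedTuple):
    new_name: str   # name given to the extracted subsequence
    start: int      # "from" coordinate
    end: int        # "to" coordinate; start > end implies reverse complement
    source: str     # name or accession of the sequence in seqfile


def parse_coord_file(lines: Iterable[str]) -> List[SubseqRequest]:
    """Parse lines of the form: new_name from to name/accession."""
    requests = []
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("#"):
            continue                        # comment line
        fields = line.split()
        if not fields:
            continue                        # blank line
        if len(fields) < 4:
            raise ValueError(f"line {lineno}: expected 4 fields, got {len(fields)}")
        new_name, start, end, source = fields[:4]   # extra fields ignored
        requests.append(SubseqRequest(new_name, int(start), int(end), source))
    return requests


def format_coord_line(req: SubseqRequest) -> str:
    """Write one request in the same four-column layout."""
    return f"{req.new_name} {req.start} {req.end} {req.source}\n"
```

       A file written with format_coord_line could then be passed as the
       second argument of esl-sfetch -Cf seqfile subseq-coord-file.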

       In DNA/RNA files, you may extract (sub-)sequences in reverse complement
       orientation in two different ways: either by providing a from
       coordinate that is greater than to, or by providing the -r option.
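
       The from-greater-than-to convention can be illustrated with a small
       Python model. The sketch assumes 1-based, inclusive coordinates.
       The helper names revcomp and subseq are hypothetical. It uses the
       standard IUPAC nucleotide complements and treats a sequence as RNA
       if it contains U. This page does not say how esl-sfetch combines -r
       with from > to, so the sketch does not model that case.

```python
# IUPAC nucleotide complements (upper and lower case). S, W and N are their
# own complements; characters not listed (e.g. gaps) pass through unchanged.
_DNA = str.maketrans("ACGTRYKMBDHVNSWacgtrykmbdhvnsw",
                     "TGCAYRMKVHDBNSWtgcayrmkvhdbnsw")
_RNA = str.maketrans("ACGURYKMBDHVNSWacgurykmbdhvnsw",
                     "UGCAYRMKVHDBNSWugcayrmkvhdbnsw")


def revcomp(seq: str) -> str:
    """Reverse complement; treats seq as RNA if it contains U/u, else DNA."""
    table = _RNA if ("U" in seq or "u" in seq) else _DNA
    return seq.translate(table)[::-1]


def subseq(seq: str, start: int, end: int) -> str:
    """Extract start..end (1-based, inclusive).

    If start > end, return the reverse complement of end..start.
    """
    if start <= end:
        return seq[start - 1:end]
    return revcomp(seq[end - 1:start])
```

       For example, subseq("AAACCCGT", 2, 5) gives "AACC", and swapping the
       coordinates, subseq("AAACCCGT", 5, 2), gives its reverse complement
       "GGTT".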

       When the -f option is used to do multiple (sub-)sequence retrieval, the
       keyfile (or subseq-coord-file) argument may be - (a single dash), in
       which case the list of names/accessions (or subsequence coordinate
       lines) is read from standard input. However, because a standard input
       stream can't be SSI indexed, (sub-)sequence retrieval from stdin may
       be slow.
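
       Reading keys from standard input is convenient when the key list is
       generated by another program. This is a minimal Python sketch,
       assuming esl-sfetch is on your PATH. stdin_fetch_argv,
       keys_to_stdin and fetch_via_stdin are hypothetical helper names.

```python
import subprocess
from typing import Iterable, List


def stdin_fetch_argv(seqfile: str, coords_mode: bool = False) -> List[str]:
    """Command line that reads keys (or coordinate lines, with -C) from stdin."""
    flags = "-Cf" if coords_mode else "-f"
    return ["esl-sfetch", flags, seqfile, "-"]


def keys_to_stdin(keys: Iterable[str]) -> str:
    """One key per line, newline-terminated, as in a keyfile."""
    return "".join(f"{key}\n" for key in keys)


def fetch_via_stdin(seqfile: str, keys: Iterable[str]) -> str:
    """Run esl-sfetch with the keys piped on stdin; return its stdout."""
    result = subprocess.run(stdin_fetch_argv(seqfile),
                            input=keys_to_stdin(keys),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

       For a long list already stored on disk, passing the keyfile name
       directly avoids building the whole list in memory first.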

OPTIONS
       -h     Print brief help; includes version number and summary of all
              options, including expert options.

       -c coords
              Retrieve a subsequence with start and end coordinates specified
              by the coords string. This string consists of start and end
              coordinates separated by any nonnumeric, nonwhitespace character
              or characters you like; for example, -c 23..100, -c 23/100, or
              -c 23-100 all work. To retrieve a suffix of the sequence, you can
              omit the end coordinate; for example, -c 23: retrieves from
              position 23 to the end of the sequence. To specify reverse
              complement (for DNA/RNA sequence), you can specify from greater
              than to; for example, -c 100..23 retrieves the reverse complement
              strand from 100 to 23.

       -f     Interpret the second argument as a keyfile instead of as just one
              key. The first whitespace-delimited field on each line of keyfile
              is interpreted as a name or accession to be fetched. Any other
              fields on a line after the first one are ignored. Blank lines and
              lines beginning with # are ignored. This option doesn't work with
              the --index option.

       -o <f> Output retrieved sequences to a file <f> instead of to stdout.

       -n <s> Rename the retrieved (sub-)sequence to <s>. Incompatible with -f.

       -r     Reverse complement the retrieved (sub-)sequence. Only accepted
              for DNA/RNA sequences.

       -C     Multiple subsequence retrieval mode; requires the -f option.
              Specifies that the second command line argument is to be parsed
              as a subsequence coordinate file, consisting of lines containing
              four whitespace-delimited fields: new_name, from, to,
              name/accession. For each such line, the sequence name/accession
              is found, a subsequence from..to is extracted, and the
              subsequence is renamed new_name before being output. Any other
              fields after the first four are ignored. Blank lines and lines
              beginning with # are ignored.

       -O     Output retrieved sequence to a file named key. This is a
              convenience for saving some typing: instead of
                  % esl-sfetch -o SRPA_HUMAN swissprot SRPA_HUMAN
              you can just type
                  % esl-sfetch -O swissprot SRPA_HUMAN
              The -O option only works if you're retrieving a single
              sequence; it is incompatible with -f.

       --index
              Instead of retrieving a key, the special command esl-sfetch
              --index seqfile produces an SSI index of the names and
              accessions of the sequences in the seqfile. Indexing should be
              done once on the seqfile to prepare it for all future fetches.

       --informat <s>
              Assert that seqfile is in format <s>, bypassing format
              autodetection. Common choices for <s> include: fasta, embl,
              genbank. Alignment formats also work; common choices include:
              stockholm, a2m, afa, psiblast, clustal, phylip. For more
              information, and for codes for some less common formats, see the
              main documentation. The string <s> is case-insensitive (fasta or
              FASTA both work).

SEE ALSO
       http://bioeasel.org/

COPYRIGHT
       Copyright (C) 2020 Howard Hughes Medical Institute.
       Freely distributed under the BSD open source license.

AUTHOR
       http://eddylab.org

Easel 0.48                        Nov 2020                     esl-sfetch(1)