esl-shuffle(1)                  Easel Manual                  esl-shuffle(1)

esl-shuffle - shuffling sequences or generating random ones

esl-shuffle [options] seqfile
       (shuffle sequences)

esl-shuffle -G [options]
       (generate random sequences)

esl-shuffle -A [options] msafile
       (shuffle multiple sequence alignments)

esl-shuffle has three different modes of operation.

By default, esl-shuffle reads individual sequences from seqfile,
shuffles them, and outputs the shuffled sequences. In this mode,
shuffling preserves monoresidue composition unless another method is
selected; the alternatives are listed below.

With the -G option, esl-shuffle generates random sequences of a given
number, length, and alphabet. The -N option controls the number
(default is 1), the -L option controls the length (default is 0), and
the --amino, --dna, and --rna options control the alphabet.

With the -A option, esl-shuffle reads one or more multiple alignments
from msafile, shuffles them, and outputs the shuffled alignments. By
default, the alignment is shuffled columnwise (i.e., column order is
permuted). Other options are listed below.
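
The default columnwise shuffle described above can be illustrated with a
minimal Python sketch. This is not Easel's implementation: the function
name shuffle_columns and the representation of an alignment as a list of
equal-length strings are assumptions made for the example.

```python
import random

def shuffle_columns(alignment, rng):
    """Return a copy of the alignment with its column order permuted.

    `alignment` is a list of equal-length strings, one per aligned
    sequence (an assumption of this sketch, not an Easel data type).
    """
    ncols = len(alignment[0])
    if any(len(row) != ncols for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    order = list(range(ncols))
    rng.shuffle(order)  # one permutation, applied to every row alike
    return ["".join(row[j] for j in order) for row in alignment]

msa = ["ACG-T", "A-GGT", "ACGCT"]
shuffled_msa = shuffle_columns(msa, random.Random(42))
```

Because the same permutation is applied to every row, each column keeps
its contents intact and only the positions of the columns change.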

-h     Print brief help; includes version number and summary of all
       options, including expert options.

-o <f> Direct output to a file named <f> rather than to stdout.

-N <n> Generate <n> sequences, or perform <n> independent shuffles per
       input sequence or alignment.

-L <n> Generate sequences of length <n>, or truncate output shuffled
       sequences or alignments to a length of <n>.

These options only apply in default (sequence shuffling) mode. They
are mutually exclusive.

-m     Monoresidue shuffling (the default): preserve monoresidue
       composition exactly. Uses the Fisher/Yates algorithm (aka
       Knuth's "Algorithm P").

-d     Diresidue shuffling; preserve diresidue composition exactly.
       Uses the Altschul/Erickson algorithm (Altschul and Erickson,
       1986). A more efficient algorithm (Kandel and Winkler 1996) is
       known but has not yet been implemented in Easel.

-0     0th order Markov generation: generate a sequence of the same
       length with the same 0th order Markov frequencies. Such a
       sequence will approximately preserve the monoresidue
       composition of the input.

-1     1st order Markov generation: generate a sequence of the same
       length with the same 1st order Markov frequencies. Such a
       sequence will approximately preserve the diresidue composition
       of the input.

-r     Reversal; reverse each input.

-w <n> Regionally shuffle the input in nonoverlapping windows of size
       <n> residues, preserving exact monoresidue composition in each
       window.
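
To make the difference between exact shuffling and Markov generation
concrete, here is a minimal Python sketch of three of the methods above:
monoresidue shuffling (-m), windowed shuffling (-w), and 0th order
Markov generation (-0). It is illustrative only and is not Easel's code.
The function names are hypothetical, and the handling of a final window
shorter than <n> (it is shuffled on its own) is an assumption of the
sketch.

```python
import random
from collections import Counter

def mono_shuffle(seq, rng):
    """Fisher/Yates shuffle: monoresidue composition is preserved exactly."""
    residues = list(seq)
    for i in range(len(residues) - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive at both ends
        residues[i], residues[j] = residues[j], residues[i]
    return "".join(residues)

def window_shuffle(seq, w, rng):
    """Shuffle each nonoverlapping window of w residues separately."""
    if w < 1:
        raise ValueError("window size must be at least 1")
    # Assumption: a final window shorter than w is shuffled by itself.
    return "".join(mono_shuffle(seq[i:i + w], rng)
                   for i in range(0, len(seq), w))

def markov0(seq, rng):
    """Sample a same-length sequence i.i.d. from the input's residue
    frequencies; composition is preserved only approximately."""
    if not seq:
        return ""
    counts = Counter(seq)
    symbols = sorted(counts)
    weights = [counts[s] for s in symbols]
    return "".join(rng.choices(symbols, weights=weights, k=len(seq)))

rng = random.Random(1)
exact = mono_shuffle("ACGTACGTAA", rng)     # same residue counts as input
regional = window_shuffle("AAAACCCCGGGG", 4, rng)
sampled = markov0("ACGTACGTAA", rng)        # counts may differ from input
```

The shuffles rearrange the residues that are actually present, so counts
are exact. Markov generation draws each residue independently from the
observed frequencies, so counts match only in expectation.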

-b     Sample columns with replacement, in order to generate a
       bootstrap-resampled alignment dataset.

-v     Shuffle residues within each column independently; i.e., permute
       residue order in each column ("vertical" shuffling).
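
The two alignment options above can be sketched in Python as follows.
This is an illustration under stated assumptions, not Easel's
implementation: an alignment is modeled as a list of equal-length
strings, the function names are hypothetical, and gap characters are
treated like any other symbol.

```python
import random

def bootstrap_columns(alignment, rng):
    """Build a same-width alignment by sampling columns with replacement."""
    ncols = len(alignment[0])
    picks = [rng.randrange(ncols) for _ in range(ncols)]
    return ["".join(row[j] for j in picks) for row in alignment]

def shuffle_within_columns(alignment, rng):
    """Permute residue order independently inside each column."""
    columns = [list(col) for col in zip(*alignment)]
    for col in columns:
        rng.shuffle(col)
    # Transpose back from columns to rows.
    return ["".join(row) for row in zip(*columns)]

aln = ["ACGT", "A-GT", "TCGA"]
boot = bootstrap_columns(aln, random.Random(7))
vert = shuffle_within_columns(aln, random.Random(7))
```

In the bootstrap sample, some original columns may appear several times
and others not at all. In the vertical shuffle, every column keeps its
composition but the row a residue belongs to changes.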

One of these must be selected if -G is used.

--amino
       Generate amino acid sequences.

--dna  Generate DNA sequences.

--rna  Generate RNA sequences.

--informat <s>
       Assert that input seqfile is in format <s>, bypassing format
       autodetection. Common choices for <s> include: fasta, embl,
       genbank. Alignment formats also work; common choices include:
       stockholm, a2m, afa, psiblast, clustal, phylip. For more
       information, and for codes for some less common formats, see
       the main documentation. The string <s> is case-insensitive
       (fasta or FASTA both work).

--seed <n>
       Specify the seed for the random number generator, where the seed
       <n> is an integer greater than zero. This can be used to make
       the results of esl-shuffle reproducible. If <n> is 0, the
       random number generator is seeded arbitrarily and stochastic
       simulations will vary from run to run. Arbitrary seeding (0) is
       the default.
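
The seeding convention above (a positive seed is reproducible, 0 means
arbitrary seeding) can be mimicked in a short Python sketch. The names
make_rng and shuffled are hypothetical, and Python's generator is not
Easel's, so the output will not match what esl-shuffle produces for the
same seed.

```python
import random

def make_rng(seed=0):
    """Return a generator: a fixed stream if seed > 0, arbitrary if 0."""
    if seed < 0:
        raise ValueError("seed must be an integer >= 0")
    # random.Random() with no argument seeds itself from the OS or clock.
    return random.Random(seed) if seed > 0 else random.Random()

def shuffled(seq, seed=0):
    """Shuffle a sequence using a generator built from the given seed."""
    residues = list(seq)
    make_rng(seed).shuffle(residues)
    return "".join(residues)

first = shuffled("ACGTACGT", seed=42)
second = shuffled("ACGTACGT", seed=42)   # identical to `first`
varying = shuffled("ACGTACGT")           # may differ from run to run
```

Fixing the seed is useful when a shuffled control set needs to be
regenerated exactly, for example when rerunning an analysis.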

http://bioeasel.org/

Copyright (C) 2020 Howard Hughes Medical Institute.
Freely distributed under the BSD open source license.

http://eddylab.org

Easel 0.48                        Nov 2020                    esl-shuffle(1)