esl-shuffle(1)                   Easel Manual                   esl-shuffle(1)

NAME

       esl-shuffle - shuffling sequences or generating random ones

SYNOPSIS

       esl-shuffle [options] seqfile
         (shuffle sequences)

       esl-shuffle -G [options]
         (generate random sequences)

       esl-shuffle -A [options] msafile
         (shuffle multiple sequence alignments)

DESCRIPTION

       esl-shuffle has three different modes of operation.

       By default, esl-shuffle reads individual sequences from seqfile,
       shuffles them, and outputs the shuffled sequences. Unless another
       sequence shuffling option is given, shuffling preserves monoresidue
       composition exactly; the alternatives are listed below.

       With the -G option, esl-shuffle generates random sequences in a
       chosen alphabet. The -N option controls the number of sequences
       (default 1), the -L option controls their length (default 0), and
       the --amino, --dna, and --rna options select the alphabet.

       With the -A option, esl-shuffle reads one or more multiple
       alignments from msafile, shuffles them, and outputs the shuffled
       alignments. By default, each alignment is shuffled columnwise
       (i.e., the column order is permuted). Other options are listed
       below.

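       The default -A shuffle can be pictured as drawing one random
       permutation of the column indices and applying it to every row, so
       whole columns move together. The Python sketch below assumes an
       alignment held as a list of equal-length strings; shuffle_columns is
       a hypothetical name for illustration, not part of Easel's API.

```python
import random


def shuffle_columns(rows, rng):
    """Columnwise shuffle: permute column order, same permutation for all rows.

    rows is assumed to be a list of equal-length aligned strings.
    """
    order = list(range(len(rows[0])))
    rng.shuffle(order)  # one permutation shared by every row
    return ["".join(row[j] for j in order) for row in rows]


if __name__ == "__main__":
    for row in shuffle_columns(["AC-GT", "ACAG-", "TCAGT"], random.Random(42)):
        print(row)
```

       Because each column is moved intact, per-column residue content is
       unchanged; only the order of columns differs.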

GENERAL OPTIONS

       -h     Print brief help; includes version number and summary of all
              options, including expert options.

       -o <f> Direct output to a file named <f> rather than to stdout.

       -N <n> Generate <n> sequences, or perform <n> independent shuffles
              per input sequence or alignment.

       -L <n> Generate sequences of length <n>, or truncate the shuffled
              output sequences or alignments to a length of <n>.

SEQUENCE SHUFFLING OPTIONS

       These options apply only in default (sequence shuffling) mode. They
       are mutually exclusive.

       -m     Monoresidue shuffling (the default): preserve monoresidue
              composition exactly. Uses the Fisher/Yates algorithm (aka
              Knuth's "Algorithm P").

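       A minimal Python sketch of a Fisher/Yates monoresidue shuffle
       follows. It walks the sequence from the end, swapping each position
       with a uniformly chosen earlier (or same) position, which makes every
       permutation equally likely. The function name fisher_yates is
       hypothetical; this illustrates the algorithm, not Easel's C code.

```python
import random


def fisher_yates(seq, rng):
    """Return a monoresidue shuffle of seq; composition is preserved exactly."""
    s = list(seq)
    # Swap position i with a uniformly chosen j in [0, i], from the end down.
    for i in range(len(s) - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive at both ends
        s[i], s[j] = s[j], s[i]
    return "".join(s)


if __name__ == "__main__":
    print(fisher_yates("ACGTACGGTTAC", random.Random(7)))
```

       Python's own random.shuffle uses the same algorithm; the explicit
       loop is shown only to make the steps visible.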
       -d     Diresidue shuffling: preserve diresidue composition exactly.
              Uses the Altschul/Erickson algorithm (Altschul and Erickson,
              1986). A more efficient algorithm (Kandel and Winkler, 1996) is
              known but has not yet been implemented in Easel.

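       The idea behind Altschul/Erickson diresidue shuffling can be
       sketched in Python: treat every adjacent residue pair as a directed
       edge, pick a random "last exit" edge for each residue so that these
       edges form a tree leading to the final residue (rejecting and
       retrying otherwise), randomly order the remaining edges, and walk the
       graph from the first residue. Any such walk uses every edge once, so
       diresidue counts are preserved. The names below are hypothetical and
       this is an illustrative sketch, not Easel's implementation.

```python
import random
from collections import defaultdict


def _reaches(v, target, last_edge, edges, limit):
    """Follow chosen last edges from v; True if target is hit within limit steps."""
    for _ in range(limit):
        if v == target:
            return True
        v = edges[v][last_edge[v]]
    return v == target


def diresidue_shuffle(seq, rng):
    """Shuffle seq while preserving its diresidue (and monoresidue) counts."""
    n = len(seq)
    if n < 3:
        return seq  # nothing else has the same diresidues
    first, last = seq[0], seq[-1]
    edges = defaultdict(list)  # edges[a] lists the residue following each a
    for a, b in zip(seq, seq[1:]):
        edges[a].append(b)
    vertices = sorted(set(seq))  # sorted so a given seed is reproducible

    # Random last-exit edge for every residue except the final one; retry
    # until all of them lead to `last` without a cycle.
    while True:
        last_edge = {v: rng.randrange(len(edges[v])) for v in vertices if v != last}
        if all(_reaches(v, last, last_edge, edges, len(vertices)) for v in last_edge):
            break

    # Shuffle each residue's other edges; its last-exit edge goes at the end.
    order = {}
    for v, outs in edges.items():
        if v in last_edge:
            k = last_edge[v]
            rest = outs[:k] + outs[k + 1:]
            rng.shuffle(rest)
            order[v] = rest + [outs[k]]
        else:
            rest = outs[:]
            rng.shuffle(rest)
            order[v] = rest

    # Walk from the first residue, consuming each edge list front to back.
    out, pos, cur = [first], defaultdict(int), first
    for _ in range(n - 1):
        cur_next = order[cur][pos[cur]]
        pos[cur] += 1
        out.append(cur_next)
        cur = cur_next
    return "".join(out)


if __name__ == "__main__":
    print(diresidue_shuffle("ACGTTGCAACGTAGGCTTAACG", random.Random(3)))
```

       The output always begins and ends with the same residues as the
       input, a property that follows from preserving every diresidue.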
       -0     0th order Markov generation: generate a sequence of the same
              length with the same 0th order Markov frequencies. Such a
              sequence will approximately preserve the monoresidue
              composition of the input.

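       0th order Markov generation differs from shuffling: residues are
       drawn independently at the input's observed frequencies, so the
       composition matches only on average. A minimal Python sketch, with
       markov0 as a hypothetical name:

```python
import random
from collections import Counter


def markov0(seq, rng):
    """Same-length sequence of i.i.d. residues drawn at seq's residue frequencies."""
    if not seq:
        return ""
    counts = Counter(seq)
    residues = list(counts)
    weights = [counts[r] for r in residues]
    return "".join(rng.choices(residues, weights=weights, k=len(seq)))


if __name__ == "__main__":
    print(markov0("AAACGGGT", random.Random(2)))
```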
       -1     1st order Markov generation: generate a sequence of the same
              length with the same 1st order Markov frequencies. Such a
              sequence will approximately preserve the diresidue composition
              of the input.

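       A 1st order Markov generator can be sketched by counting transitions
       a -> b in the input and then sampling each next residue from the
       counts for the current one. This sketch assumes the sequence is
       treated as circular (last residue followed by the first) so that
       every residue has at least one successor, and that the first
       residue is drawn at the monoresidue frequencies; both are our
       assumptions, and Easel may handle the sequence ends differently.
       markov1 is a hypothetical name.

```python
import random
from collections import Counter, defaultdict


def markov1(seq, rng):
    """Same-length sequence sampled from seq's 1st order transition counts."""
    n = len(seq)
    if n == 0:
        return ""
    # Assumption: circular counting, so pairs are (seq[i], seq[(i + 1) % n]).
    trans = defaultdict(Counter)
    for a, b in zip(seq, seq[1:] + seq[0]):
        trans[a][b] += 1
    mono = Counter(seq)
    cur = rng.choices(list(mono), weights=list(mono.values()))[0]
    out = [cur]
    for _ in range(n - 1):
        succ = trans[cur]
        cur = rng.choices(list(succ), weights=list(succ.values()))[0]
        out.append(cur)
    return "".join(out)


if __name__ == "__main__":
    print(markov1("AACGTTTGCA", random.Random(3)))
```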
       -r     Reversal: reverse each input.

       -w <n> Regionally shuffle the input in nonoverlapping windows of size
              <n> residues, preserving exact monoresidue composition in each
              window.

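       Windowed shuffling can be sketched as cutting the sequence into
       consecutive blocks of <n> residues and shuffling each block on its
       own. The man page does not say how a final partial window is
       treated; this sketch assumes it is simply shuffled as a shorter
       window. window_shuffle is a hypothetical name.

```python
import random


def window_shuffle(seq, w, rng):
    """Shuffle seq within nonoverlapping windows of w residues."""
    if w < 1:
        raise ValueError("window size must be positive")
    out = []
    for start in range(0, len(seq), w):
        chunk = list(seq[start:start + w])  # last chunk may be shorter than w
        rng.shuffle(chunk)
        out.extend(chunk)
    return "".join(out)


if __name__ == "__main__":
    print(window_shuffle("AACCGGTTACGTA", 4, random.Random(9)))
```

       Local composition is kept, so regional biases (e.g. a GC-rich
       stretch) survive the shuffle.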

MULTIPLE ALIGNMENT SHUFFLING OPTIONS

       -b     Sample columns with replacement, in order to generate a
              bootstrap-resampled alignment dataset.

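       Bootstrap resampling of an alignment can be sketched as drawing as
       many column indices as the alignment has columns, with replacement,
       and building every row from those same indices. Some columns then
       appear more than once and others not at all. The representation
       (list of equal-length strings) and the name bootstrap_columns are
       assumptions for illustration.

```python
import random


def bootstrap_columns(rows, rng):
    """Bootstrap-resample alignment columns; every row uses the same picks."""
    ncol = len(rows[0])
    picks = [rng.randrange(ncol) for _ in range(ncol)]  # with replacement
    return ["".join(row[j] for j in picks) for row in rows]


if __name__ == "__main__":
    for row in bootstrap_columns(["AC-GT", "ACAG-", "TCAGT"], random.Random(4)):
        print(row)
```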
       -v     Shuffle residues within each column independently; i.e.,
              permute residue order in each column ("vertical" shuffling).

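       Vertical shuffling keeps columns in place but permutes which row
       each residue lands in, separately for every column. This Python
       sketch shuffles gap characters along with residues; whether Easel
       treats gaps specially is not stated here, so that is an assumption.
       shuffle_vertical is a hypothetical name.

```python
import random


def shuffle_vertical(rows, rng):
    """Independently permute the residues of each alignment column across rows."""
    nrow, ncol = len(rows), len(rows[0])
    grid = [list(r) for r in rows]
    for j in range(ncol):
        col = [grid[i][j] for i in range(nrow)]
        rng.shuffle(col)  # a fresh permutation for every column
        for i in range(nrow):
            grid[i][j] = col[i]
    return ["".join(r) for r in grid]


if __name__ == "__main__":
    for row in shuffle_vertical(["AC-GT", "ACAG-", "TCAGT"], random.Random(8)):
        print(row)
```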

SEQUENCE GENERATION OPTIONS

       One of these must be selected if -G is used.

       --amino
              Generate amino acid sequences.

       --dna  Generate DNA sequences.

       --rna  Generate RNA sequences.

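       Generation mode (-G with -N, -L, and an alphabet option) can be
       sketched as producing N strings of L residues each. The man page
       does not state which residue frequencies esl-shuffle uses; this
       sketch assumes uniform frequencies over the canonical residues
       only, which may differ from Easel's choice. ALPHABETS and
       generate_random_seqs are hypothetical names.

```python
import random

# Canonical residues only (no ambiguity codes); an assumption of this sketch.
ALPHABETS = {
    "amino": "ACDEFGHIKLMNPQRSTVWY",
    "dna": "ACGT",
    "rna": "ACGU",
}


def generate_random_seqs(alphabet, n, length, rng):
    """Return n sequences of the given length with i.i.d. uniform residues."""
    residues = ALPHABETS[alphabet]
    return ["".join(rng.choice(residues) for _ in range(length)) for _ in range(n)]


if __name__ == "__main__":
    for s in generate_random_seqs("dna", 3, 20, random.Random(0)):
        print(s)
```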

EXPERT OPTIONS

       --informat <s>
              Assert that input seqfile is in format <s>, bypassing format
              autodetection. Common choices for <s> include: fasta, embl,
              genbank. Alignment formats also work; common choices include:
              stockholm, a2m, afa, psiblast, clustal, phylip. For more
              information, and for codes for some less common formats, see
              the main documentation. The string <s> is case-insensitive
              (fasta or FASTA both work).

       --seed <n>
              Set the seed for the random number generator to <n>, a
              nonnegative integer. A positive seed makes the results of
              esl-shuffle reproducible. If <n> is 0, the random number
              generator is seeded arbitrarily and stochastic simulations
              will vary from run to run. Arbitrary seeding (0) is the
              default.

SEE ALSO

       http://bioeasel.org/

       Copyright (C) 2020 Howard Hughes Medical Institute.
       Freely distributed under the BSD open source license.

AUTHOR

       http://eddylab.org

Easel 0.48                         Nov 2020                     esl-shuffle(1)