RAY(1) User Commands RAY(1)

Ray - assemble genomes in parallel using the message-passing interface
7
mpiexec -n NUMBER_OF_RANKS Ray -k KMERLENGTH -p l1_1.fastq l1_2.fastq -p l2_1.fastq l2_2.fastq -o test

mpiexec -n NUMBER_OF_RANKS Ray Ray.conf # with commands in a file
14
The Ray genome assembler is built on top of RayPlatform, a generic
plugin-based distributed and parallel compute engine that uses the
message-passing interface (MPI) to exchange messages between processes.
21
22 Ray targets several applications:
23
- de novo genome assembly (with Ray vanilla)
- de novo meta-genome assembly (with Ray Méta)
- de novo transcriptome assembly (works, but has not been tested extensively)
- quantification of contig abundances
- quantification of microbiome consortia members (with Ray Communities)
- quantification of transcript expression
- taxonomy profiling of samples (with Ray Communities)
- gene ontology profiling of samples (with Ray Ontologies)
33
34
36 -help
37 Displays this help page.
38
39 -version
40 Displays Ray version and compilation options.
41
42 Using a configuration file
43
44 Ray can be launched with
45 mpiexec -n 16 Ray Ray.conf
46 The configuration file can include comments (starting with #).
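
As a sketch of what such a file might look like, the snippet below writes a small Ray.conf and prints the launch command. It assumes the file simply lists the options that would otherwise follow Ray on the command line (the synopsis describes it as a file with commands in it). The read file names, the k-mer length of 31, and the 16 ranks are placeholders.

```shell
# Write a hypothetical Ray.conf. The options are the ones documented on this
# page; the file names and values are placeholders, not real data.
cat > Ray.conf <<'EOF'
# lines starting with # are comments
-k 31
-p lib1_1.fastq lib1_2.fastq
-o RayOutput
EOF

# Print (rather than run) the launch command shown above.
echo "mpiexec -n 16 Ray Ray.conf"
```

Keeping the options in a file makes a run easier to repeat and keeps the mpiexec line short.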
47
48 K-mer length
49
50 -k kmerLength
Selects the length of k-mers. The default value is 21.
The length must be odd because reverse-complement vertices are stored together.
The maximum length is set at compilation by MAXKMERLENGTH.
Larger k-mers use more memory.
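
The two constraints on -k can be checked before a job is submitted. The helper below is a minimal sketch, not part of Ray: the function name is hypothetical, and the fallback limit of 32 is only an assumed value for MAXKMERLENGTH, which depends on how Ray was compiled.

```shell
# Hypothetical pre-flight check for -k; not part of Ray itself.
# $1 = k-mer length, $2 = compile-time MAXKMERLENGTH (32 is only an assumed value).
check_kmer_length() (
    k=$1
    max_k=${2:-32}
    if [ $(( k % 2 )) -eq 0 ]; then
        echo "rejected: $k is even"
    elif [ "$k" -gt "$max_k" ]; then
        echo "rejected: $k exceeds MAXKMERLENGTH=$max_k"
    else
        echo "ok: -k $k"
    fi
)

check_kmer_length 21      # the default value
check_kmer_length 30      # even, so rejected
check_kmer_length 63 64   # accepted only because the limit is passed as 64
```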
57
58 Inputs
59
-p leftSequenceFile rightSequenceFile [averageOuterDistance standardDeviation]
Provides two files containing paired-end reads.
averageOuterDistance and standardDeviation are automatically computed if not provided.

-i interleavedSequenceFile [averageOuterDistance standardDeviation]
Provides one file containing interleaved paired-end reads.
averageOuterDistance and standardDeviation are automatically computed if not provided.
72
73 -s sequenceFile
74 Provides a file containing single-end reads.
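
A sketch of how the three input options might appear together, assuming they can be mixed in one invocation in the same way the synopsis repeats -p. All file names are placeholders, and 300 and 30 are made-up values for averageOuterDistance and standardDeviation. The command is printed, not run.

```shell
# Assemble a hypothetical command line piece by piece.
ray_cmd="mpiexec -n 16 Ray -k 31"
# Paired files with an explicit outer distance (300) and standard deviation (30).
ray_cmd="$ray_cmd -p lib1_1.fastq lib1_2.fastq 300 30"
# Interleaved pairs; the distances are left for Ray to compute.
ray_cmd="$ray_cmd -i lib2_interleaved.fastq"
# Single-end reads.
ray_cmd="$ray_cmd -s single.fastq"
ray_cmd="$ray_cmd -o RayOutput"

# Print the command instead of running it.
echo "$ray_cmd"
```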
75
76 Outputs
77
78 -o outputDirectory
Specifies the directory for output files. The default is RayOutput.
81
82 Assembly options (defaults work well)
83
84 -disable-recycling
Disables read recycling during the assembly.
Reads are set free in 3 cases:
1. the distance did not match for a pair
2. the read has not met its mate
3. the library population indicates a wrong placement
See "Constrained traversal of repeats with paired sequences",
Sébastien Boisvert, Élénie Godzaridis, François Laviolette & Jacques Corbeil,
First Annual RECOMB Satellite Workshop on Massively Parallel Sequencing, March 26-27 2011, Vancouver, BC, Canada.
96
97 -disable-scaffolder
98 Disables the scaffolder.
99
100 -minimum-contig-length minimumContigLength
Sets the minimum contig length. The default is 100 nucleotides.
103
104 -color-space
105 Runs in color-space
106 Needs csfasta files. Activated automatically if csfasta
107 files are provided.
108
109 -use-maximum-seed-coverage maximumSeedCoverageDepth
Ignores any seed with a coverage depth above this threshold.
The default is 4294967295 (the largest 32-bit unsigned integer).
113
114 -use-minimum-seed-coverage minimumSeedCoverageDepth
115 Sets the minimum seed coverage depth.
116 Any path with a coverage depth lower than this will be
117 discarded. The default is 0.
118
119 Distributed storage engine (all these values are for each MPI rank)
120
121 -bloom-filter-bits bits
Sets the number of bits for the Bloom filter.
The default is 268435456 bits; 0 bits disables the Bloom filter.

-hash-table-buckets buckets
Sets the initial number of buckets. Must be a power of 2.
Default value: 268435456

-hash-table-buckets-per-group buckets
Sets the number of buckets per group for sparse storage.
Default value: 64; must be >= 1 and <= 64
134
135 -hash-table-load-factor-threshold threshold
136 Sets the load factor threshold for real-time resizing
137 Default value: 0.75, must be >= 0.5 and < 1
138
139 -hash-table-verbosity
140 Activates verbosity for the distributed storage engine
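
Because every value in this section applies to each MPI rank, totals for a job scale with the number of ranks. The arithmetic below is a rough sketch covering only the Bloom filter bits; it says nothing about the memory taken by hash table buckets, which this page does not specify. The helper names and the 16 ranks are illustrative assumptions.

```shell
# Hypothetical helpers; not part of Ray.

# Succeeds when $1 is a positive power of 2 (required for -hash-table-buckets).
is_power_of_two() (
    n=$1
    [ "$n" -gt 0 ] && [ $(( n & (n - 1) )) -eq 0 ]
)

# Converts a Bloom filter size in bits to MiB (8 bits per byte).
bloom_filter_mib() (
    echo $(( $1 / 8 / 1024 / 1024 ))
)

bits=268435456   # default for -bloom-filter-bits
ranks=16         # assumed number of MPI ranks
per_rank=$(bloom_filter_mib "$bits")
echo "Bloom filter: $per_rank MiB per rank, $(( per_rank * ranks )) MiB over $ranks ranks"

if is_power_of_two 268435456; then
    echo "268435456 is a valid -hash-table-buckets value"
fi
```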
141
142 Biological abundances
143
144 -search searchDirectory
Provides a directory containing fasta files to be searched in the de Bruijn graph.
Biological abundances will be written to RayOutput/BiologicalAbundances
See Documentation/BiologicalAbundances.txt

-one-color-per-file
Sets one color per file instead of one per sequence.
By default, each sequence in each file has a different color.
For files with large numbers of sequences, using a single color per file may be more efficient.
157
158 Taxonomic profiling with colored de Bruijn graphs
159
-with-taxonomy Genome-to-Taxon.tsv TreeOfLife-Edges.tsv Taxon-Names.tsv
162 Provides a taxonomy.
163 Computes and writes detailed taxonomic profiles.
164 See Documentation/Taxonomy.txt for details.
165
166 -gene-ontology OntologyTerms.txt Annotations.txt
167 Provides an ontology and annotations.
OntologyTerms.txt is fetched from http://geneontology.org
Annotations.txt is a 2-column file (EMBL_CDS handle & gene ontology identifier)
See Documentation/GeneOntology.txt

Other outputs
173
174 -enable-neighbourhoods
175 Computes contig neighborhoods in the de Bruijn graph
176 Output file: RayOutput/NeighbourhoodRelations.txt
177
178 -amos
Writes the AMOS file called RayOutput/AMOS.afg
An AMOS file contains read positions on contigs.
It can be opened with software that has a graphical user interface.
183
184 -write-kmers
185 Writes k-mer graph to RayOutput/kmers.txt
186 The resulting file is not utilised by Ray.
187 The resulting file is very large.
188
189 -write-read-markers
190 Writes read markers to disk.
191
192 -write-seeds
Writes seed DNA sequences to RayOutput/Rank<rank>.RaySeeds.fasta

-write-extensions
Writes extension DNA sequences to RayOutput/Rank<rank>.RayExtensions.fasta

-write-contig-paths
Writes contig paths with coverage values to RayOutput/Rank<rank>.RayContigPaths.txt
203
204 -write-marker-summary
205 Writes marker statistics.
206
207 Memory usage
208
209 -show-memory-usage
Shows memory usage. Data is fetched from /proc on GNU/Linux.
Needs __linux__ (the macro that compilers define when building for Linux).
213
214 -show-memory-allocations
215 Shows memory allocation events
216
217 Algorithm verbosity
218
219 -show-extension-choice
220 Shows the choice made (with other choices) during the
221 extension.
222
223 -show-ending-context
224 Shows the ending context of each extension.
225 Shows the children of the vertex where extension was too
226 difficult.
227
228 -show-distance-summary
229 Shows summary of outer distances used for an extension
230 path.
231
232 -show-consensus
Shows the consensus when a choice is made.
234
235 Checkpointing
236
237 -write-checkpoints checkpointDirectory
Writes checkpoint files.

-read-checkpoints checkpointDirectory
Reads checkpoint files.

-read-write-checkpoints checkpointDirectory
Reads and writes checkpoint files.
245
246 Message routing for large number of cores
247
248 -route-messages
Enables the Ray message router. Disabled by default.
Messages are routed so that each rank communicates directly with only a few others.
Without -route-messages, any rank can communicate directly with any other rank.
Files generated: Routing/Connections.txt, Routing/Routes.txt, Routing/RelayEvents.txt and Routing/Summary.txt
257
258 -connection-type type
Sets the connection type for routes.
Accepted values are debruijn, hypercube, polytope, group, random, kautz and complete. Default is debruijn.
debruijn: a full de Bruijn graph for a given alphabet and diameter
hypercube: a hypercube; the alphabet is {0,1} and the number of vertices is a power of 2
polytope: a convex regular polytope; the alphabet is {0,1,...,B-1} and the number of vertices is a power of B
group: silly model where one representative per group can communicate with outsiders
random: Erdős-Rényi model
kautz: a full Kautz graph, which is a subgraph of a de Bruijn graph
complete: a full graph with all the possible connections
With the type debruijn, the number of ranks must be a power of an integer.
Examples: 256 = 16*16, 512 = 8*8*8, 49 = 7*7, and so on.
Otherwise, use another connection type.
With the type kautz, the number of ranks n must be n=(k+1)*k^(d-1) for some k and d.
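
The rank-count constraints for debruijn and kautz can be checked with a little arithmetic before a job is submitted. The helpers below are a sketch with hypothetical names; they take the base, or k and d, explicitly, because this page does not say how Ray chooses them (see Documentation/Routing.txt).

```shell
# Hypothetical helpers; not part of Ray.

# Succeeds when n = base^d for some d >= 1 (the debruijn constraint).
# $1 = base, $2 = number of ranks.
is_power_of() (
    base=$1
    n=$2
    [ "$base" -ge 2 ] || exit 1
    p=$base
    while [ "$p" -lt "$n" ]; do
        p=$(( p * base ))
    done
    [ "$p" -eq "$n" ]
)

# Prints n = (k+1) * k^(d-1), the rank count required for kautz.
# $1 = k, $2 = d.
kautz_ranks() (
    k=$1
    d=$2
    n=$(( k + 1 ))
    i=1
    while [ "$i" -lt "$d" ]; do
        n=$(( n * k ))
        i=$(( i + 1 ))
    done
    echo "$n"
)

is_power_of 16 256 && echo "256 ranks fit debruijn with base 16"
is_power_of 8 512  && echo "512 ranks fit debruijn with base 8"
is_power_of 7 49   && echo "49 ranks fit debruijn with base 7"
echo "kautz with k=3, d=3 needs $(kautz_ranks 3 3) ranks"
```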
280
281 -routing-graph-degree degree
282 Specifies the outgoing degree for the routing graph.
283 See Documentation/Routing.txt
284
285 Hardware testing
286
287 -test-network-only
288 Tests the network and returns.
289
290 -write-network-test-raw-data
291 Writes one additional file per rank detailing the network
292 test.
293
294 -exchanges NumberOfExchanges
295 Sets the number of exchanges
296
297 -disable-network-test
298 Skips the network test.
299
300 Debugging
301
302 -verify-message-integrity
Checks message data reliability for any non-empty message.
Add '-D CONFIG_SSE_4_2' in the Makefile to use the hardware instruction (SSE 4.2).
307
308 -run-profiler
Runs the profiler as the code runs. By default, only granularity warnings are shown.
Running the profiler increases running times.

-with-profiler-details
Shows the number of messages sent and received in each method during each time slice (epoch). Needs -run-profiler.
316
317 -show-communication-events
318 Shows all messages sent and received.
319
320 -show-read-placement
321 Shows read placement in the graph during the extension.
322
323 -debug-bubbles
324 Debugs bubble code.
325 Bubbles can be due to heterozygous sites or sequencing
326 errors or other (unknown) events
327
328 -debug-seeds
329 Debugs seed code.
330 Seeds are paths in the graph that are likely unique.
331
332 -debug-fusions
333 Debugs fusion code.
334
335 -debug-scaffolder
Debugs the scaffolder.
337
339 Input files
340
Note: the file format is determined from the file extension.
342
343 .fasta
344 .fasta.gz (needs HAVE_LIBZ=y at compilation)
345 .fasta.bz2 (needs HAVE_LIBBZ2=y at compilation)
346 .fastq
347 .fastq.gz (needs HAVE_LIBZ=y at compilation)
348 .fastq.bz2 (needs HAVE_LIBBZ2=y at compilation)
349 .sff (paired reads must be extracted manually)
350 .csfasta (color-space reads)
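
Since the format is taken from the extension alone, a file name can be classified up front to see which compile-time option it depends on. The function below is a sketch that mirrors the list above; its name and output strings are hypothetical and it is not part of Ray.

```shell
# Hypothetical helper mirroring the extension list above; not part of Ray.
describe_ray_input() (
    case $1 in
        *.csfasta)               echo "color-space reads" ;;
        *.fasta.gz|*.fastq.gz)   echo "gzip-compressed; needs HAVE_LIBZ=y" ;;
        *.fasta.bz2|*.fastq.bz2) echo "bzip2-compressed; needs HAVE_LIBBZ2=y" ;;
        *.fasta|*.fastq)         echo "plain text" ;;
        *.sff)                   echo "sff; paired reads must be extracted manually" ;;
        *)                       echo "unrecognised extension" ;;
    esac
)

# Placeholder file names.
for f in reads_1.fastq.gz reads.fasta.bz2 run.sff reads.csfasta reads.txt; do
    echo "$f: $(describe_ray_input "$f")"
done
```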
351
Output files
353
354 Scaffolds
355
356 RayOutput/Scaffolds.fasta
357 The scaffold sequences in FASTA format
358 RayOutput/ScaffoldComponents.txt
359 The components of each scaffold
360 RayOutput/ScaffoldLengths.txt
361 The length of each scaffold
362 RayOutput/ScaffoldLinks.txt
363 Scaffold links
364
365 Contigs
366
367 RayOutput/Contigs.fasta
368 Contiguous sequences in FASTA format
369 RayOutput/ContigLengths.txt
370 The lengths of contiguous sequences
371
372 Summary
373
374 RayOutput/OutputNumbers.txt
375 Overall numbers for the assembly
376
377 de Bruijn graph
378
379 RayOutput/CoverageDistribution.txt
380 The distribution of coverage values
381 RayOutput/CoverageDistributionAnalysis.txt
382 Analysis of the coverage distribution
383 RayOutput/degreeDistribution.txt
384 Distribution of ingoing and outgoing degrees
385 RayOutput/kmers.txt
386 k-mer graph, required option: -write-kmers
387 The resulting file is not utilised by Ray.
388 The resulting file is very large.
389
390 Assembly steps
391
392 RayOutput/SeedLengthDistribution.txt
393 Distribution of seed length
394 RayOutput/Rank<rank>.OptimalReadMarkers.txt
395 Read markers.
396 RayOutput/Rank<rank>.RaySeeds.fasta
397 Seed DNA sequences, required option: -write-seeds
398 RayOutput/Rank<rank>.RayExtensions.fasta
399 Extension DNA sequences, required option: -write-extensions
400 RayOutput/Rank<rank>.RayContigPaths.txt
Contig paths with coverage values, required option: -write-contig-paths
403
404 Paired reads
405
406 RayOutput/LibraryStatistics.txt
407 Estimation of outer distances for paired reads
408 RayOutput/Library<LibraryNumber>.txt
409 Frequencies for observed outer distances (insert size + read
410 lengths)
411
412 Partition
413
414 RayOutput/NumberOfSequences.txt
415 Number of reads in each file
416 RayOutput/SequencePartition.txt
417 Sequence partition
418
419 Ray software
420
421 RayOutput/RayVersion.txt
422 The version of Ray
423 RayOutput/RayCommand.txt
The exact command that was provided
425
426 AMOS
427
428 RayOutput/AMOS.afg
429 Assembly representation in AMOS format, required option:
430 -amos
431
432 Communication
433
RayOutput/MessagePassingInterface.txt
Number of messages sent
RayOutput/NetworkTest.txt
Latencies in microseconds
RayOutput/Rank<rank>NetworkTestData.txt
Network test raw data
439
441 - mpiexec -n 1 Ray -help|less (always up-to-date)
442 - This help page (always up-to-date)
443 - The directory Documentation/
444 - Manual (Portable Document Format): InstructionManual.tex (in
445 Documentation)
- Mailing list archives: http://sourceforge.net/mailarchive/forum.php?forum_name=denovoassembler-users
448
450 Written by Sébastien Boisvert.
451
453 Report bugs to denovoassembler-users@lists.sourceforge.net
454 Home page: <http://denovoassembler.sourceforge.net/>
455
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, version 3 of the License.
462
463 This program is distributed in the hope that it will be useful,
464 but WITHOUT ANY WARRANTY; without even the implied warranty of
465 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
466 GNU General Public License for more details.
467
468 You have received a copy of the GNU General Public License
469 along with this program (see LICENSE).

Ray 2.1.0 November 2012 RAY(1)