RAY(1)                           User Commands                          RAY(1)

NAME

       Ray - assemble genomes in parallel using the message-passing interface

SYNOPSIS

              mpiexec -n NUMBER_OF_RANKS Ray -k KMERLENGTH -p l1_1.fastq l1_2.fastq -p l2_1.fastq l2_2.fastq -o test

              mpiexec -n NUMBER_OF_RANKS Ray Ray.conf    # with commands in a file

DESCRIPTION

         The Ray genome assembler is built on top of RayPlatform, a generic
         plugin-based distributed and parallel compute engine that uses the
         message-passing interface (MPI) for communication.

         Ray targets several applications:

           - de novo genome assembly (with Ray vanilla)
           - de novo meta-genome assembly (with Ray Méta)
           - de novo transcriptome assembly (works, but not extensively tested)
           - quantification of contig abundances
           - quantification of microbiome consortia members (with Ray Communities)
           - quantification of transcript expression
           - taxonomy profiling of samples (with Ray Communities)
           - gene ontology profiling of samples (with Ray Ontologies)

OPTIONS

              -help
                     Displays this help page.

              -version
                     Displays Ray version and compilation options.

         Using a configuration file

           Ray can be launched with its options stored in a configuration file:

               mpiexec -n 16 Ray Ray.conf

           The configuration file can include comments (starting with #).
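
As a rough illustration, a configuration file could be generated from a script. This is only a sketch: the helper name write_ray_conf is hypothetical, and the layout (one option per line) is an assumption, since the text above only states that the file holds the commands and may contain # comments.

```python
from pathlib import Path
import tempfile


def write_ray_conf(path, kmer_length, paired_libraries, output_directory):
    """Write Ray options to a file, one option per line (assumed layout)."""
    lines = ["# Ray configuration (lines starting with # are comments)"]
    lines.append(f"-k {kmer_length}")
    for left, right in paired_libraries:
        lines.append(f"-p {left} {right}")
    lines.append(f"-o {output_directory}")
    text = "\n".join(lines) + "\n"
    Path(path).write_text(text, encoding="utf-8")
    return text


# Write the example into a temporary directory so nothing is overwritten.
conf_path = Path(tempfile.mkdtemp()) / "Ray.conf"
conf_text = write_ray_conf(
    conf_path,
    kmer_length=31,
    paired_libraries=[("l1_1.fastq", "l1_2.fastq"), ("l2_1.fastq", "l2_2.fastq")],
    output_directory="test",
)
print(conf_text, end="")
```

The resulting file would then be passed as the single argument, as in the mpiexec -n 16 Ray Ray.conf form shown above.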

         K-mer length

              -k kmerLength
                     Selects the length of k-mers. The default value is 21.
                     It must be odd because reverse-complement vertices are
                     stored together.
                     The maximum length is defined at compilation by
                     MAXKMERLENGTH.
                     Larger k-mers use more memory.
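
One way to see why an odd length helps: with an odd k, a k-mer can never equal its own reverse complement, because the middle base would have to be its own complement. The sketch below illustrates this and checks a candidate k. It is not Ray's code; the function names are hypothetical, and the limit of 32 is only an assumed stand-in for MAXKMERLENGTH, including whether the limit itself is allowed.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Return the reverse complement of a DNA k-mer."""
    return kmer.translate(_COMPLEMENT)[::-1]


def is_valid_kmer_length(k, max_kmer_length=32):
    """Check that k is odd and within the compile-time limit.

    max_kmer_length stands in for MAXKMERLENGTH; 32 is an assumed value.
    """
    return k % 2 == 1 and 0 < k <= max_kmer_length


# Even k: a k-mer can be identical to its reverse complement.
print(reverse_complement("ACGT"))   # ACGT
# Odd k: the k-mer and its reverse complement always differ.
print(reverse_complement("ACGTA"))  # TACGT
print(is_valid_kmer_length(21), is_valid_kmer_length(22))  # True False
```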

         Inputs

              -p leftSequenceFile rightSequenceFile [averageOuterDistance standardDeviation]
                     Provides two files containing paired-end reads.
                     averageOuterDistance and standardDeviation are
                     automatically computed if not provided.

              -i interleavedSequenceFile [averageOuterDistance standardDeviation]
                     Provides one file containing interleaved paired-end reads.
                     averageOuterDistance and standardDeviation are
                     automatically computed if not provided.

              -s sequenceFile
                     Provides a file containing single-end reads.
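
The bracketed pair in the -p synopsis means the two distance values are given together or not at all. A minimal sketch of assembling such a command line follows; the helper names are hypothetical, and the values 300 and 30 are made up purely for illustration.

```python
import shlex


def paired_option(left, right, average_outer_distance=None, standard_deviation=None):
    """Return the argument list for one -p library."""
    # Both distance values must be present, or both absent.
    if (average_outer_distance is None) != (standard_deviation is None):
        raise ValueError("give both averageOuterDistance and standardDeviation, or neither")
    args = ["-p", left, right]
    if average_outer_distance is not None:
        args += [str(average_outer_distance), str(standard_deviation)]
    return args


def build_ray_command(ranks, kmer_length, output_directory, libraries):
    """Assemble an mpiexec argument list from paired-end libraries."""
    argv = ["mpiexec", "-n", str(ranks), "Ray", "-k", str(kmer_length)]
    for library in libraries:
        argv += paired_option(*library)
    argv += ["-o", output_directory]
    return argv


command = build_ray_command(
    ranks=16,
    kmer_length=31,
    output_directory="test",
    libraries=[("l1_1.fastq", "l1_2.fastq"), ("l2_1.fastq", "l2_2.fastq", 300, 30)],
)
print(shlex.join(command))
```

The list form can be handed to subprocess.run directly, which avoids shell quoting problems with unusual file names.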

         Outputs

              -o outputDirectory
                     Specifies the directory for output files. Default is
                     RayOutput.

         Assembly options (defaults work well)

              -disable-recycling
                     Disables read recycling during the assembly.
                     Reads will be set free in 3 cases:
                     1. the distance did not match for a pair
                     2. the read has not met its mate
                     3. the library population indicates a wrong placement
                     See: Constrained traversal of repeats with paired
                     sequences. Sébastien Boisvert, Élénie Godzaridis,
                     François Laviolette & Jacques Corbeil. First Annual
                     RECOMB Satellite Workshop on Massively Parallel
                     Sequencing, March 26-27 2011, Vancouver, BC, Canada.

              -disable-scaffolder
                     Disables the scaffolder.

              -minimum-contig-length minimumContigLength
                     Changes the minimum contig length. The default is 100
                     nucleotides.

              -color-space
                     Runs in color-space.
                     Needs csfasta files. Activated automatically if csfasta
                     files are provided.

              -use-maximum-seed-coverage maximumSeedCoverageDepth
                     Ignores any seed with a coverage depth above this
                     threshold.
                     The default is 4294967295.

              -use-minimum-seed-coverage minimumSeedCoverageDepth
                     Sets the minimum seed coverage depth.
                     Any path with a coverage depth lower than this will be
                     discarded. The default is 0.

         Distributed storage engine (all these values are for each MPI rank)

              -bloom-filter-bits bits
                     Sets the number of bits for the Bloom filter.
                     Default is 268435456 bits; 0 bits disables the Bloom
                     filter.
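
Because the value applies to each MPI rank, it can help to translate it into memory. The following is a back-of-the-envelope sketch that assumes the filter occupies exactly the requested number of bits, with no bookkeeping overhead; the function names are hypothetical.

```python
def bits_to_mebibytes(bits):
    """Convert a number of bits to mebibytes (1 MiB = 2**20 bytes)."""
    return bits / 8 / 2**20


def total_bloom_mebibytes(bits_per_rank, ranks):
    """Memory for the Bloom filters summed over all MPI ranks."""
    return bits_to_mebibytes(bits_per_rank) * ranks


DEFAULT_BLOOM_FILTER_BITS = 268435456  # default quoted in the option text

print(bits_to_mebibytes(DEFAULT_BLOOM_FILTER_BITS))          # 32.0
print(total_bloom_mebibytes(DEFAULT_BLOOM_FILTER_BITS, 16))  # 512.0
```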

              -hash-table-buckets buckets
                     Sets the initial number of buckets. Must be a power of 2.
                     Default value: 268435456
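
A candidate bucket count can be checked, or rounded up to the next power of 2, before it is passed to Ray. This is a generic sketch with hypothetical helper names, not part of Ray itself.

```python
def is_power_of_two(n):
    """True when n is a positive integer with exactly one bit set."""
    return n > 0 and (n & (n - 1)) == 0


def next_power_of_two(n):
    """Smallest power of 2 that is greater than or equal to n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1 << (n - 1).bit_length()


print(is_power_of_two(268435456))    # True (the default, 2**28)
print(is_power_of_two(1000))         # False
print(next_power_of_two(1000))       # 1024
```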

              -hash-table-buckets-per-group buckets
                     Sets the number of buckets per group for sparse storage.
                     Default value: 64. Must be >= 1 and <= 64.

              -hash-table-load-factor-threshold threshold
                     Sets the load factor threshold for real-time resizing.
                     Default value: 0.75. Must be >= 0.5 and < 1.

              -hash-table-verbosity
                     Activates verbosity for the distributed storage engine.

         Biological abundances

              -search searchDirectory
                     Provides a directory containing fasta files to be
                     searched in the de Bruijn graph.
                     Biological abundances will be written to
                     RayOutput/BiologicalAbundances
                     See Documentation/BiologicalAbundances.txt

              -one-color-per-file
                     Sets one color per file instead of one per sequence.
                     By default, each sequence in each file has a different
                     color.
                     For files with large numbers of sequences, using one
                     single color per file may be more efficient.

         Taxonomic profiling with colored de Bruijn graphs

              -with-taxonomy Genome-to-Taxon.tsv TreeOfLife-Edges.tsv Taxon-Names.tsv
                     Provides a taxonomy.
                     Computes and writes detailed taxonomic profiles.
                     See Documentation/Taxonomy.txt for details.

              -gene-ontology OntologyTerms.txt Annotations.txt
                     Provides an ontology and annotations.
                     OntologyTerms.txt is fetched from http://geneontology.org
                     Annotations.txt is a 2-column file (EMBL_CDS handle &
                     gene ontology identifier).
                     See Documentation/GeneOntology.txt

         Other outputs

              -enable-neighbourhoods
                     Computes contig neighbourhoods in the de Bruijn graph.
                     Output file: RayOutput/NeighbourhoodRelations.txt

              -amos
                     Writes the AMOS file called RayOutput/AMOS.afg
                     An AMOS file contains read positions on contigs.
                     Can be opened with software that has a graphical user
                     interface.

              -write-kmers
                     Writes k-mer graph to RayOutput/kmers.txt
                     The resulting file is not used by Ray.
                     The resulting file is very large.

              -write-read-markers
                     Writes read markers to disk.

              -write-seeds
                     Writes seed DNA sequences to
                     RayOutput/Rank<rank>.RaySeeds.fasta

              -write-extensions
                     Writes extension DNA sequences to
                     RayOutput/Rank<rank>.RayExtensions.fasta

              -write-contig-paths
                     Writes contig paths with coverage values
                     to RayOutput/Rank<rank>.RayContigPaths.txt

              -write-marker-summary
                     Writes marker statistics.

         Memory usage

              -show-memory-usage
                     Shows memory usage. Data is fetched from /proc on
                     GNU/Linux.
                     Needs __linux__

              -show-memory-allocations
                     Shows memory allocation events.

         Algorithm verbosity

              -show-extension-choice
                     Shows the choice made (with other choices) during the
                     extension.

              -show-ending-context
                     Shows the ending context of each extension.
                     Shows the children of the vertex where extension was too
                     difficult.

              -show-distance-summary
                     Shows summary of outer distances used for an extension
                     path.

              -show-consensus
                     Shows the consensus when a choice is made.

         Checkpointing

              -write-checkpoints checkpointDirectory
                     Writes checkpoint files.

              -read-checkpoints checkpointDirectory
                     Reads checkpoint files.

              -read-write-checkpoints checkpointDirectory
                     Reads and writes checkpoint files.

         Message routing for large numbers of cores

              -route-messages
                     Enables the Ray message router. Disabled by default.
                     Messages will be routed so that any rank communicates
                     directly with only a few others.
                     Without -route-messages, any rank can communicate
                     directly with any other rank.
                     Files generated: Routing/Connections.txt,
                     Routing/Routes.txt, Routing/RelayEvents.txt
                     and Routing/Summary.txt

              -connection-type type
                     Sets the connection type for routes.
                     Accepted values are debruijn, hypercube, polytope, group,
                     random, kautz and complete. Default is debruijn.
                      debruijn: a full de Bruijn graph for a given alphabet
                       and diameter
                      hypercube: a hypercube, alphabet is {0,1} and the number
                       of vertices is a power of 2
                      polytope: a convex regular polytope, alphabet is
                       {0,1,...,B-1} and the number of vertices is a power of B
                      group: silly model where one representative per group
                       can communicate with outsiders
                      random: Erdős-Rényi model
                      kautz: a full Kautz graph, which is a subgraph of a
                       de Bruijn graph
                      complete: a full graph with all the possible connections
                     With the type debruijn, the number of ranks must be a
                     power of an integer.
                     Examples: 256 = 16*16, 512 = 8*8*8, 49 = 7*7, and so on.
                     Otherwise, use another connection type.
                     With the type kautz, the number of ranks n must be
                     n = (k+1)*k^(d-1) for some k and d.
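
Before requesting ranks from a scheduler, the two rank-count constraints can be checked with a short script. The sketch below is illustrative only: the function names are hypothetical, and restricting the exponent to at least 2 (for debruijn) and both k and d to at least 2 (for kautz) is an assumption made to skip trivial cases, not something the text above states.

```python
from math import isqrt


def power_decompositions(n):
    """Return (base, exponent) pairs with base**exponent == n and exponent >= 2."""
    pairs = []
    for base in range(2, isqrt(n) + 1):
        value, exponent = base * base, 2
        while value < n:
            value *= base
            exponent += 1
        if value == n:
            pairs.append((base, exponent))
    return pairs


def kautz_parameters(n):
    """Return (k, d) pairs with n == (k + 1) * k**(d - 1), for k >= 2 and d >= 2."""
    pairs = []
    k = 2
    while (k + 1) * k <= n:
        value, d = (k + 1) * k, 2
        while value < n:
            value *= k
            d += 1
        if value == n:
            pairs.append((k, d))
        k += 1
    return pairs


print(power_decompositions(256))  # [(2, 8), (4, 4), (16, 2)]
print(power_decompositions(48))   # []
print(kautz_parameters(12))       # [(2, 3), (3, 2)]
```

An empty list suggests that the rank count does not fit the corresponding connection type under these assumptions.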

              -routing-graph-degree degree
                     Specifies the outgoing degree for the routing graph.
                     See Documentation/Routing.txt

         Hardware testing

              -test-network-only
                     Tests the network and returns.

              -write-network-test-raw-data
                     Writes one additional file per rank detailing the network
                     test.

              -exchanges NumberOfExchanges
                     Sets the number of exchanges.

              -disable-network-test
                     Skips the network test.

         Debugging

              -verify-message-integrity
                     Checks message data reliability for any non-empty
                     message.
                     Add '-D CONFIG_SSE_4_2' in the Makefile to use the
                     hardware instruction (SSE 4.2).

              -run-profiler
                     Runs the profiler as the code runs. By default, only
                     shows granularity warnings.
                     Running the profiler increases running times.

              -with-profiler-details
                     Shows the number of messages sent and received in each
                     method during each time slice (epoch). Needs
                     -run-profiler.

              -show-communication-events
                     Shows all messages sent and received.

              -show-read-placement
                     Shows read placement in the graph during the extension.

              -debug-bubbles
                     Debugs bubble code.
                     Bubbles can be due to heterozygous sites, sequencing
                     errors, or other (unknown) events.

              -debug-seeds
                     Debugs seed code.
                     Seeds are paths in the graph that are likely unique.

              -debug-fusions
                     Debugs fusion code.

              -debug-scaffolder
                     Debugs the scaffolder.

FILES

         Input files

            Note: the file format is determined from the file extension.

            .fasta
            .fasta.gz (needs HAVE_LIBZ=y at compilation)
            .fasta.bz2 (needs HAVE_LIBBZ2=y at compilation)
            .fastq
            .fastq.gz (needs HAVE_LIBZ=y at compilation)
            .fastq.bz2 (needs HAVE_LIBBZ2=y at compilation)
            .sff (paired reads must be extracted manually)
            .csfasta (color-space reads)
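
Since the format is inferred from the extension, a pre-flight check can flag unsupported files and the build option a compressed input would need. The sketch below mirrors the list above; the function name is hypothetical and matching is assumed to be case-sensitive.

```python
# Extension -> build option required at compilation (None when no option is listed).
SUPPORTED_EXTENSIONS = {
    ".fasta": None,
    ".fasta.gz": "HAVE_LIBZ=y",
    ".fasta.bz2": "HAVE_LIBBZ2=y",
    ".fastq": None,
    ".fastq.gz": "HAVE_LIBZ=y",
    ".fastq.bz2": "HAVE_LIBBZ2=y",
    ".sff": None,
    ".csfasta": None,
}


def classify_input(filename):
    """Return (extension, required_build_option) for a supported input file."""
    for extension, build_option in SUPPORTED_EXTENSIONS.items():
        if filename.endswith(extension):
            return extension, build_option
    raise ValueError(f"unsupported input file extension: {filename}")


print(classify_input("l1_1.fastq"))       # ('.fastq', None)
print(classify_input("reads.fasta.bz2"))  # ('.fasta.bz2', 'HAVE_LIBBZ2=y')
```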

         Output files

         Scaffolds

            RayOutput/Scaffolds.fasta
                 The scaffold sequences in FASTA format
            RayOutput/ScaffoldComponents.txt
                 The components of each scaffold
            RayOutput/ScaffoldLengths.txt
                 The length of each scaffold
            RayOutput/ScaffoldLinks.txt
                 Scaffold links

         Contigs

            RayOutput/Contigs.fasta
                 Contiguous sequences in FASTA format
            RayOutput/ContigLengths.txt
                 The lengths of contiguous sequences

         Summary

            RayOutput/OutputNumbers.txt
                 Overall numbers for the assembly

         de Bruijn graph

            RayOutput/CoverageDistribution.txt
                 The distribution of coverage values
            RayOutput/CoverageDistributionAnalysis.txt
                 Analysis of the coverage distribution
            RayOutput/degreeDistribution.txt
                 Distribution of ingoing and outgoing degrees
            RayOutput/kmers.txt
                 k-mer graph, required option: -write-kmers
                 The resulting file is not used by Ray.
                 The resulting file is very large.

         Assembly steps

            RayOutput/SeedLengthDistribution.txt
                 Distribution of seed lengths
            RayOutput/Rank<rank>.OptimalReadMarkers.txt
                 Read markers
            RayOutput/Rank<rank>.RaySeeds.fasta
                 Seed DNA sequences, required option: -write-seeds
            RayOutput/Rank<rank>.RayExtensions.fasta
                 Extension DNA sequences, required option: -write-extensions
            RayOutput/Rank<rank>.RayContigPaths.txt
                 Contig paths with coverage values, required option:
                 -write-contig-paths

         Paired reads

            RayOutput/LibraryStatistics.txt
                 Estimation of outer distances for paired reads
            RayOutput/Library<LibraryNumber>.txt
                 Frequencies for observed outer distances (insert size + read
                 lengths)
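
To make the outer-distance definition concrete, the sketch below computes an outer distance and summarizes a table of observed frequencies. It works on an in-memory mapping rather than parsing Library<LibraryNumber>.txt, whose exact layout is not described here; the function names and sample numbers are hypothetical, and the population standard deviation is an assumed choice, not necessarily what Ray computes.

```python
from math import sqrt


def outer_distance(insert_size, left_read_length, right_read_length):
    """Outer distance as defined above: insert size plus both read lengths."""
    return insert_size + left_read_length + right_read_length


def summarize_outer_distances(frequencies):
    """Return (mean, standard deviation) for a {distance: count} mapping."""
    total = sum(frequencies.values())
    if total == 0:
        raise ValueError("no observations")
    mean = sum(distance * count for distance, count in frequencies.items()) / total
    variance = sum(count * (distance - mean) ** 2
                   for distance, count in frequencies.items()) / total
    return mean, sqrt(variance)


print(outer_distance(100, 50, 50))  # 200
mean, deviation = summarize_outer_distances({190: 1, 200: 2, 210: 1})
print(mean, round(deviation, 3))    # 200.0 7.071
```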

         Partition

            RayOutput/NumberOfSequences.txt
                 Number of reads in each file
            RayOutput/SequencePartition.txt
                 Sequence partition

         Ray software

            RayOutput/RayVersion.txt
                 The version of Ray
            RayOutput/RayCommand.txt
                 The exact command provided

         AMOS

            RayOutput/AMOS.afg
                 Assembly representation in AMOS format, required option:
                 -amos

         Communication

            RayOutput/MessagePassingInterface.txt
                 Number of messages sent
            RayOutput/NetworkTest.txt
                 Latencies in microseconds
            RayOutput/Rank<rank>NetworkTestData.txt
                 Network test raw data

DOCUMENTATION

              - mpiexec -n 1 Ray -help|less (always up-to-date)
              - This help page (always up-to-date)
              - The directory Documentation/
              - Manual (Portable Document Format): InstructionManual.tex (in Documentation)
              - Mailing list archives: http://sourceforge.net/mailarchive/forum.php?forum_name=denovoassembler-users

AUTHOR

              Written by Sébastien Boisvert.

REPORTING BUGS

              Report bugs to denovoassembler-users@lists.sourceforge.net
              Home page: <http://denovoassembler.sourceforge.net/>

COPYRIGHT

              This program is free software: you can redistribute it and/or modify
              it under the terms of the GNU General Public License as published by
              the Free Software Foundation, version 3 of the License.

              This program is distributed in the hope that it will be useful,
              but WITHOUT ANY WARRANTY; without even the implied warranty of
              MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
              GNU General Public License for more details.

              You have received a copy of the GNU General Public License
              along with this program (see LICENSE).

Ray 2.1.0                        November 2012                          RAY(1)