Bio::AnalysisI(3)     User Contributed Perl Documentation     Bio::AnalysisI(3)

NAME

Bio::AnalysisI - An interface to any (local or remote) analysis tool
7
SYNOPSIS

This is an interface module - you do not instantiate it. Use the
"Bio::Tools::Run::Analysis" module instead:

    use Bio::Tools::Run::Analysis;
    my $tool = Bio::Tools::Run::Analysis->new(@args);
14
DESCRIPTION

This interface contains all public methods for accessing and controlling
local and remote analysis tools. It is meant to be used on the client
side.

FEEDBACK

Mailing Lists
22
23 User feedback is an integral part of the evolution of this and other
24 Bioperl modules. Send your comments and suggestions preferably to the
25 Bioperl mailing list. Your participation is much appreciated.
26
27 bioperl-l@bioperl.org - General discussion
28 http://bioperl.org/wiki/Mailing_lists - About the mailing lists
29
30 Reporting Bugs
31
32 Report bugs to the Bioperl bug tracking system to help us keep track of
33 the bugs and their resolution. Bug reports can be submitted via the
34 web:
35
36 http://bugzilla.open-bio.org/
37
AUTHOR

Martin Senger (martin.senger@gmail.com)

COPYRIGHT

Copyright (c) 2003, Martin Senger and EMBL-EBI. All Rights Reserved.

This module is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.

DISCLAIMER

This software is provided "as is" without warranty of any kind.

SEE ALSO

<http://www.ebi.ac.uk/soaplab/Perl_Client.html>

APPENDIX

This is actually the main documentation...

If you try to call any of these methods directly on this
"Bio::AnalysisI" object you will get a "not implemented" error message.
You need to call them on a "Bio::Tools::Run::Analysis" object instead.
59
60 analysis_name
61
62 Usage : $tool->analysis_name;
63 Returns : a name of this analysis
64 Args : none
65
66 analysis_spec
67
68 Usage : $tool->analysis_spec;
69 Returns : a hash reference describing this analysis
70 Args : none
71
The returned hash reference uses the following keys (not all of them
are always present, and others may be present as well): "name", "type",
"version", "supplier", "installation", "description".
75
76 Here is an example output:
77
78 Analysis 'edit.seqret':
79 installation => EMBL-EBI
80 description => Reads and writes (returns) sequences
81 supplier => EMBOSS
82 version => 2.6.0
83 type => edit
84 name => seqret
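
A listing like the one above can be produced from the returned hash reference with a few lines of Perl. The following is only a sketch: the literal hash stands in for the value of $tool->analysis_spec, the helper name format_spec is made up for this example, and building the label as "type.name" is an assumption based on the example output.

```perl
use strict;
use warnings;

# Stand-in for the hash reference returned by $tool->analysis_spec;
# the values are copied from the example output above.
my $spec = {
    name         => 'seqret',
    type         => 'edit',
    version      => '2.6.0',
    supplier     => 'EMBOSS',
    installation => 'EMBL-EBI',
    description  => 'Reads and writes (returns) sequences',
};

# Hypothetical helper: format a spec as "Analysis 'type.name':" followed
# by one "key => value" line per key that is actually present.
sub format_spec {
    my ($spec) = @_;
    # Either key may be missing, so undefined parts are skipped.
    my $label = join '.', grep { defined } @{$spec}{qw(type name)};
    my @lines = ("Analysis '$label':");
    for my $key (sort keys %$spec) {
        push @lines, sprintf "    %-12s => %s", $key, $spec->{$key};
    }
    return join "\n", @lines;
}

print format_spec($spec), "\n";
```

Iterating over the keys that exist, rather than over a fixed list, keeps the sketch usable when a provider omits keys or adds its own.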
85
86 describe
87
Usage : $tool->describe;
89 Returns : an XML detailed description of this analysis
90 Args : none
91
The returned XML string contains metadata describing this analysis
service. It also includes the metadata that is returned, in a more
convenient form, by the methods "analysis_spec", "input_spec" and
"result_spec".
95
96 The DTD used for returned metadata is based on the adopted standard
97 (BSA specification for analysis engine):
98
    <!ELEMENT DsLSRAnalysis (analysis)+>

    <!ELEMENT analysis (description?, input*, output*, extension?)>

    <!ATTLIST analysis
        type          CDATA #REQUIRED
        name          CDATA #IMPLIED
        version       CDATA #IMPLIED
        supplier      CDATA #IMPLIED
        installation  CDATA #IMPLIED>

    <!ELEMENT description ANY>
    <!ELEMENT extension ANY>

    <!ELEMENT input (default?, allowed*, extension?)>

    <!ATTLIST input
        type       CDATA #REQUIRED
        name       CDATA #REQUIRED
        mandatory  (true|false) "false">

    <!ELEMENT default (#PCDATA)>
    <!ELEMENT allowed (#PCDATA)>

    <!ELEMENT output (extension?)>

    <!ATTLIST output
        type  CDATA #REQUIRED
        name  CDATA #REQUIRED>
128
The DTD may, however, be extended by provider-specific metadata. For
example, the EBI experimental SOAP-based service on top of EMBOSS uses
the DTD explained at "http://www.ebi.ac.uk/~senger/applab".
132
133 input_spec
134
135 Usage : $tool->input_spec;
136 Returns : an array reference with hashes as elements
137 Args : none
138
The analysis input data are named, and each input can also be associated
with a default value, with a list of allowed values and with a few other
attributes. The names are important for feeding the service with the
input data (the inputs are given to the methods "create_job", "run",
and/or "wait_for" as name/value pairs).
144
145 Here is a (slightly shortened) example of an input specification:
146
147 $input_spec = [
148 {
149 'mandatory' => 'false',
150 'type' => 'String',
151 'name' => 'sequence_usa'
152 },
153 {
154 'mandatory' => 'false',
155 'type' => 'String',
156 'name' => 'sequence_direct_data'
157 },
158 {
159 'mandatory' => 'false',
160 'allowed_values' => [
161 'gcg',
162 'gcg8',
163 ...
164 'raw'
165 ],
166 'type' => 'String',
167 'name' => 'sformat'
168 },
169 {
170 'mandatory' => 'false',
171 'type' => 'String',
172 'name' => 'sbegin'
173 },
174 {
175 'mandatory' => 'false',
176 'type' => 'String',
177 'name' => 'send'
178 },
179 {
180 'mandatory' => 'false',
181 'type' => 'String',
182 'name' => 'sprotein'
183 },
184 {
185 'mandatory' => 'false',
186 'type' => 'String',
187 'name' => 'snucleotide'
188 },
189 {
190 'mandatory' => 'false',
191 'type' => 'String',
192 'name' => 'sreverse'
193 },
194 {
195 'mandatory' => 'false',
196 'type' => 'String',
197 'name' => 'slower'
198 },
199 {
200 'mandatory' => 'false',
201 'type' => 'String',
202 'name' => 'supper'
203 },
204 {
205 'mandatory' => 'false',
206 'default' => 'false',
207 'type' => 'String',
208 'name' => 'firstonly'
209 },
210 {
211 'mandatory' => 'false',
212 'default' => 'fasta',
213 'allowed_values' => [
214 'gcg',
215 'gcg8',
216 'embl',
217 ...
218 'raw'
219 ],
220 'type' => 'String',
221 'name' => 'osformat'
222 }
223 ];
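
Because the specification is a plain array of hashes, it can be walked with ordinary Perl. The sketch below assumes a shortened copy of the structure shown above in place of a live $tool->input_spec call; the helper name summarize_inputs is hypothetical, and the allowed values listed are only those visible in the example.

```perl
use strict;
use warnings;

# Shortened stand-in for the array reference returned by
# $tool->input_spec; entries are taken from the example above.
my $input_spec = [
    { mandatory => 'false', type => 'String', name => 'sequence_usa' },
    { mandatory => 'false', type => 'String', name => 'sformat',
      allowed_values => [ 'gcg', 'gcg8', 'raw' ] },
    { mandatory => 'false', type => 'String', name => 'firstonly',
      default => 'false' },
];

# Hypothetical helper: build one summary line per input.
sub summarize_inputs {
    my ($spec) = @_;
    my @lines;
    for my $input (@$spec) {
        my $line = "$input->{name} ($input->{type})";
        # 'mandatory' holds the string 'true' or 'false', not a Perl
        # boolean, so it has to be compared as a string.
        my $mandatory = $input->{mandatory} || 'false';
        $line .= ' [mandatory]' if $mandatory eq 'true';
        $line .= " default=$input->{default}" if defined $input->{default};
        $line .= ' allowed=' . join(',', @{ $input->{allowed_values} })
            if $input->{allowed_values};
        push @lines, $line;
    }
    return @lines;
}

print "$_\n" for summarize_inputs($input_spec);
```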
224
225 result_spec
226
227 Usage : $tool->result_spec;
228 Returns : a hash reference with result names as keys
229 and result types as values
230 Args : none
231
232 The analysis results are named and can be retrieved using their names
233 by methods "results" and "result".
234
235 Here is an example of the result specification (again for the service
236 edit.seqret):
237
238 $result_spec = {
239 'outseq' => 'String',
240 'report' => 'String',
241 'detailed_status' => 'String'
242 };
243
244 create_job
245
246 Usage : $tool->create_job ( {'sequence'=>'tatat'} )
247 Returns : Bio::Tools::Run::Analysis::Job
248 Args : data and parameters for this execution
249 (in various formats)
250
251 Create an object representing a single execution of this analysis tool.
252
Call this method if you wish to "stage the scene" - to create a job
with all input data but without actually running it. This method is
called automatically from other methods ("run" and "wait_for"), so
usually you do not need to call it directly.

The input data and parameters for this execution can be specified in
various ways:
260
array reference
The array has scalar elements of the form

name = [[@]value]

where "name" is the name of an input data or input parameter (see
method "input_spec" for finding what names are recognized by this
analysis) and "value" is a value for this data/parameter. If
"value" is missing, a 1 is assumed (which is convenient for
boolean options). If "value" starts with "@" it is treated as a
local filename, and the contents of that file are used as the
data/parameter value.
273
274 hash reference
275 The same as with the array reference but now there is no need to
276 use an equal sign. The hash keys are input names and hash values
277 their data. The values can again start with a "@" sign indicating a
278 local filename.
279
280 scalar
In this case, the parameter represents a job ID obtained in some
previous invocation - such a job already exists on the server side,
and we are just re-creating it here using the same job ID.

TBD: here we should allow the same by using a reference to the
Bio::Tools::Run::Analysis::Job object.

undef
Finally, if the parameter is undefined, the server is asked to create
an empty job. The input data may be added later using "set_data..."
method(s) - see scripts/papplmaker.PLS for details.
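
The relationship between the array-reference and hash-reference forms can be sketched as follows. This is an illustration of the rules described above, not the library's own code: the helper names inputs_to_hash and resolve_value are made up, and the input names are taken from the usage examples in this document.

```perl
use strict;
use warnings;

# Illustration only: turn the array-reference form into the equivalent
# hash-reference form.
sub inputs_to_hash {
    my ($args) = @_;
    return { %$args } if ref $args eq 'HASH';    # already name => value
    my %inputs;
    for my $item (@$args) {
        # Split on the first '=' only, so that values may contain '='.
        my ($name, $value) = split /\s*=\s*/, $item, 2;
        # A bare name is treated as a boolean option set to 1.
        $inputs{$name} = defined $value ? $value : 1;
    }
    return \%inputs;
}

# Hypothetical helper: resolve a leading '@' by reading the named local
# file; any other value is returned unchanged.
sub resolve_value {
    my ($value) = @_;
    my ($file) = $value =~ /^\@(.+)/s;
    return $value unless defined $file;
    open my $fh, '<', $file or die "Cannot read $file: $!";
    local $/;                                    # slurp the whole file
    return scalar <$fh>;
}

my $inputs = inputs_to_hash([ 'sequence=tatat', 'osformat=embl', 'firstonly' ]);
print "$_ => ", resolve_value($inputs->{$_}), "\n" for sort keys %$inputs;
```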
292
293 run
294
295 Usage : $tool->run ( ['sequence=@my.seq', 'osformat=embl'] )
296 Returns : Bio::Tools::Run::Analysis::Job,
297 representing started job (an execution)
298 Args : the same as for create_job
299
300 Create a job and start it, but do not wait for its completion.
301
302 wait_for
303
Usage : $tool->wait_for ( { 'sequence' => '@my.file' } )
305 Returns : Bio::Tools::Run::Analysis::Job,
306 representing finished job
307 Args : the same as for create_job
308
309 Create a job, start it and wait for its completion.
310
311 Note that this is a blocking method. It returns only after the executed
312 job finishes, either normally or by an error.
313
314 Usually, after this call, you ask for results of the finished job:
315
    $tool->wait_for (...)->results;
317
Module Bio::AnalysisI::JobI

An interface to the public methods provided by
"Bio::Tools::Run::Analysis::Job" objects.

The "Bio::Tools::Run::Analysis::Job" objects represent a created,
running, or finished execution of an analysis tool.

The factory for these objects is the module "Bio::Tools::Run::Analysis",
where the following methods return a "Bio::Tools::Run::Analysis::Job"
object:
328
329 create_job (returning a prepared job)
330 run (returning a running job)
331 wait_for (returning a finished job)
332
333 id
334
335 Usage : $job->id;
336 Returns : this job ID
337 Args : none
338
Each job (an execution) is identified by this unique ID, which can be
used later to re-create the same job (in other words: to re-connect to
the same job). This is useful when a job takes a long time to finish
and your client program does not want to wait for it within the same
session.
344
345 run
346
347 Usage : $job->run
348 Returns : itself
349 Args : none
350
It starts a previously created job. The job must already have all its
input data filled in. This differs from the method of the same name on
the "Bio::Tools::Run::Analysis" object, where "run" also creates a new
job and therefore accepts input data.
355
356 wait_for
357
358 Usage : $job->wait_for
359 Returns : itself
360 Args : none
361
362 It waits until a previously started execution of this job finishes.
363
364 terminate
365
366 Usage : $job->terminate
367 Returns : itself
368 Args : none
369
Stop the currently running job (represented by this object). This is a
definitive stop; there is no way to resume the job later.
372
373 last_event
374
375 Usage : $job->last_event
376 Returns : an XML string
377 Args : none
378
It returns a short XML document showing what happened last with this
job. The DTD used is:
381
    <!-- place for extensions -->
    <!ENTITY % event_body_template "(state_changed | heartbeat_progress | percent_progress | time_progress | step_progress)">

    <!ELEMENT analysis_event (message?, (%event_body_template;)?)>

    <!ATTLIST analysis_event
        timestamp  CDATA #IMPLIED>

    <!ELEMENT message (#PCDATA)>

    <!ELEMENT state_changed EMPTY>
    <!ENTITY % analysis_state "created | running | completed | terminated_by_request | terminated_by_error">
    <!ATTLIST state_changed
        previous_state  (%analysis_state;) "created"
        new_state       (%analysis_state;) "created">

    <!ELEMENT heartbeat_progress EMPTY>

    <!ELEMENT percent_progress EMPTY>
    <!ATTLIST percent_progress
        percentage  CDATA #REQUIRED>

    <!ELEMENT time_progress EMPTY>
    <!ATTLIST time_progress
        remaining  CDATA #REQUIRED>

    <!ELEMENT step_progress EMPTY>
    <!ATTLIST step_progress
        total_steps      CDATA #IMPLIED
        steps_completed  CDATA #REQUIRED>
412
Here is an example of what is returned after a job has been created and
started, but before it finishes (note that the example uses the analysis
'showdb', which does not need any input data):

    use Bio::Tools::Run::Analysis;
    print Bio::Tools::Run::Analysis->new(-name => 'display.showdb')
        ->run
        ->last_event;
421
422 It prints:
423
424 <?xml version = "1.0"?>
425 <analysis_event>
426 <message>Mar 3, 2003 5:14:46 PM (Europe/London)</message>
427 <state_changed previous_state="created" new_state="running"/>
428 </analysis_event>
429
430 The same example but now after it finishes:
431
    use Bio::Tools::Run::Analysis;
    print Bio::Tools::Run::Analysis->new(-name => 'display.showdb')
        ->wait_for
        ->last_event;
436
437 <?xml version = "1.0"?>
438 <analysis_event>
439 <message>Mar 3, 2003 5:17:14 PM (Europe/London)</message>
440 <state_changed previous_state="running" new_state="completed"/>
441 </analysis_event>
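
A client that only needs the state transition can pull it out of the returned string. The sketch below is one possible way to do that with core Perl: the sample XML is copied from the first example above, the helper name state_change is hypothetical, and a regular expression is used only because the document is small and flat. A real XML parser would be the safer choice for provider-extended events.

```perl
use strict;
use warnings;

# The XML printed by the first example above, used as sample input.
my $event = <<'XML';
<?xml version = "1.0"?>
<analysis_event>
  <message>Mar 3, 2003 5:14:46 PM (Europe/London)</message>
  <state_changed previous_state="created" new_state="running"/>
</analysis_event>
XML

# Hypothetical helper: return (previous_state, new_state), or an empty
# list if the event carries no state_changed element.
sub state_change {
    my ($xml) = @_;
    # Capture everything between the element name and the closing '>'.
    my ($attrs) = $xml =~ /<state_changed\b([^>]*)>/ or return;
    # Collect name="value" pairs into a hash.
    my %attr = $attrs =~ /(\w+)\s*=\s*"([^"]*)"/g;
    return @attr{qw(previous_state new_state)};
}

my ($from, $to) = state_change($event);
print "$from -> $to\n" if defined $to;
```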
442
443 status
444
445 Usage : $job->status
446 Returns : string describing the job status
447 Args : none
448
449 It returns one of the following strings (and perhaps more if a server
450 implementation extended possible job states):
451
452 CREATED
453 RUNNING
454 COMPLETED
455 TERMINATED_BY_REQUEST
456 TERMINATED_BY_ERROR
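
When "run" is used instead of the blocking "wait_for", a client can poll "status" itself. The following sketch assumes that the three COMPLETED/TERMINATED_* strings are the only final states (a server with extended states may differ). MockJob and poll_until_done are invented for this example so that it runs without a server; a real job object would be passed in the same way.

```perl
use strict;
use warnings;

# Mock job used only to make this sketch runnable without a server:
# each call to status() returns the next state in a fixed sequence,
# and the last state is repeated once the sequence is used up.
package MockJob;
sub new {
    my ($class, @states) = @_;
    return bless { states => [@states] }, $class;
}
sub status {
    my ($self) = @_;
    my $states = $self->{states};
    return @$states > 1 ? shift @$states : $states->[0];
}

package main;

# Assumed final states, taken from the list above.
my %FINAL = map { $_ => 1 }
    qw(COMPLETED TERMINATED_BY_REQUEST TERMINATED_BY_ERROR);

# Hypothetical helper: poll $job->status until a final state is seen or
# $max_polls is exhausted; returns the last status that was read.
sub poll_until_done {
    my ($job, $interval, $max_polls) = @_;
    my $status;
    for (1 .. $max_polls) {
        $status = $job->status;
        last if $FINAL{$status};
        sleep $interval if $interval;
    }
    return $status;
}

my $job = MockJob->new(qw(CREATED RUNNING RUNNING COMPLETED));
print poll_until_done($job, 0, 10), "\n";
```

Bounding the number of polls keeps the loop from running forever if the server reports a state that this sketch does not recognise as final.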
457
458 created
459
460 Usage : $job->created (1)
461 Returns : time when this job was created
462 Args : optional
463
Without any argument it returns the time of creation of this job in
seconds, counting from the beginning of the UNIX epoch (1 January 1970).
With a true argument it returns a formatted time, using the rules
described in "Bio::Tools::Run::Analysis::Utils::format_time".
468
469 started
470
471 Usage : $job->started (1)
472 Returns : time when this job was started
473 Args : optional
474
475 See "created".
476
477 ended
478
479 Usage : $job->ended (1)
480 Returns : time when this job was terminated
481 Args : optional
482
483 See "created".
484
485 elapsed
486
487 Usage : $job->elapsed
Returns : elapsed time of the execution of the given job
(in milliseconds), or 0 if the job was not yet started
Args : none

Note that some server implementations cannot count in milliseconds, so
the returned time may be rounded to whole seconds.
494
495 times
496
497 Usage : $job->times ('formatted')
Returns : a hash reference with all time characteristics
Args : optional

It is a convenience method returning a hash reference with the following
keys:

created
started
ended
elapsed

See "created" for remarks on time formatting.
510
511 An example - both for unformatted and formatted times:
512
    use Data::Dumper;
    use Bio::Tools::Run::Analysis;

    # formatted times
    my $rh = Bio::Tools::Run::Analysis->new(-name => 'nucleic_cpg_islands.cpgplot')
        ->wait_for ( { 'sequence_usa' => 'embl:hsu52852' } )
        ->times (1);
    print Data::Dumper->Dump ( [$rh], ['Times']);

    # unformatted times (seconds since the epoch)
    $rh = Bio::Tools::Run::Analysis->new(-name => 'nucleic_cpg_islands.cpgplot')
        ->wait_for ( { 'sequence_usa' => 'embl:AL499624' } )
        ->times;
    print Data::Dumper->Dump ( [$rh], ['Times']);
523
524 $Times = {
525 'ended' => 'Mon Mar 3 17:52:06 2003',
526 'started' => 'Mon Mar 3 17:52:05 2003',
527 'elapsed' => '1000',
528 'created' => 'Mon Mar 3 17:52:05 2003'
529 };
530 $Times = {
531 'ended' => '1046713961',
532 'started' => '1046713926',
533 'elapsed' => '35000',
534 'created' => '1046713926'
535 };
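
Note the mixed units in the unformatted hash: "created", "started" and "ended" are in seconds, while "elapsed" is in milliseconds. The sketch below checks this against the second dump above, using a literal hash in place of a live $job->times call. The gmtime rendering at the end is only one possible way to display an epoch value and is not necessarily the format produced by $job->times(1).

```perl
use strict;
use warnings;

# Stand-in for the hash reference returned by $job->times (no
# argument); the values are copied from the second dump above.
my $times = {
    created => '1046713926',
    started => '1046713926',
    ended   => '1046713961',
    elapsed => '35000',
};

# started/ended are in seconds, elapsed is in milliseconds, so the
# difference has to be scaled before the two can be compared.
my $wall_ms = ($times->{ended} - $times->{started}) * 1000;
printf "ended - started = %d ms, elapsed = %d ms\n",
    $wall_ms, $times->{elapsed};

# One possible human-readable rendering of an epoch value, in UTC,
# using Perl's built-in gmtime.
my $started_utc = scalar gmtime($times->{started});
print "started: $started_utc UTC\n";
```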
536
537 results
538
539 Usage : $job->results (...)
540 Returns : one or more results created by this job
Args : various, see below

This is a complex method that tries to make sense of all kinds of
results. In particular, it helps to put binary results (such as images)
into local files. Generally it deals with the following facts:
546
· Each analysis tool may produce more than one result.

· Some results may contain binary data not suitable for printing into
a terminal window.

· Some results may be split into a variable number of parts (this is
mainly true for image results, which can consist of several *.png
files).

Note also that results have names, to distinguish them when there is
more than one. The names can be obtained by the method "result_spec".

Here are the rules describing how the method works:
560
    Retrieving NAMED results:
    -------------------------
    results ('name1', ...)                     => return results as they are, no storing into files

    results ( { 'name1' => 'filename', ... } ) => store into 'filename', return 'filename'
    results ( 'name1=filename', ...)           => ditto

    results ( { 'name1' => '-', ... } )        => send result to STDOUT, do not return anything
    results ( 'name1=-', ...)                  => ditto

    results ( { 'name1' => '@', ... } )        => store into a file whose name is invented by
                                                  this method, perhaps using RESULT_NAME_TEMPLATE env
    results ( 'name1=@', ...)                  => ditto

    results ( { 'name1' => '?', ... } )        => find out what type this result is and then use
                                                  {'name1'=>'@'} for binary files, and a regular
                                                  return for non-binary files
    results ( 'name1=?', ...)                  => ditto

    Retrieving ALL results:
    -----------------------
    results()                                  => return all results as they are, no storing into files

    results ('@')                              => return all results, as if each of them was given
                                                  as {'name' => '@'} (see above)

    results ('?')                              => return all results, as if each of them was given
                                                  as {'name' => '?'} (see above)

    Misc:
    -----
    * any result can be returned as a scalar value, or as an array reference
      (the latter is used for results consisting of more parts, such as images);
      this applies regardless of whether the returned result is the result itself
      or a filename created for the result

    * look in the documentation of the panalysis[.PLS] script for examples
      (especially how to use various templates for inventing file names)
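
The string forms in the table above can be read as a small dispatch on whatever follows the '=' sign. The sketch below illustrates that reading only; it is not how the method is implemented. The helper name classify_result_arg, the action labels, the '*all*' marker, and the names 'image' and 'out.fasta' are all invented for the example ('report' and 'outseq' come from the result_spec example).

```perl
use strict;
use warnings;

# Illustration only: the action labels are made up for this sketch.
my %ACTION = (
    '-' => 'print to STDOUT',
    '@' => 'store in a generated file name',
    '?' => 'store if binary, otherwise return',
);

# Hypothetical helper: classify one string argument of results() and
# return a (result name, action) pair.
sub classify_result_arg {
    my ($arg) = @_;
    # A bare '@' or '?' applies the rule to all results.
    return ('*all*', $ACTION{$arg}) if $arg eq '@' or $arg eq '?';
    my ($name, $target) = split /=/, $arg, 2;
    # A bare name means: return the result itself.
    return ($name, 'return as is') unless defined $target;
    # Anything other than '-', '@' or '?' is taken as a file name.
    return ($name, $ACTION{$target} || "store in file '$target'");
}

for my $arg ('report', 'outseq=out.fasta', 'report=-', 'image=@', 'image=?', '@') {
    my ($name, $action) = classify_result_arg($arg);
    print "$name: $action\n";
}
```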
599
600 result
601
602 Usage : $job->result (...)
603 Returns : the first result
604 Args : see 'results'
605
606 remove
607
608 Usage : $job->remove
609 Returns : 1
610 Args : none
611
The job object is not actually removed at this time; it is only marked
(by setting the "_destroy_on_exit" attribute to 1) as ready for deletion
when the client program ends. The deletion includes a request to the
server to forget the job mirror object on the server side.
616

perl v5.8.8                       2007-05-07                 Bio::AnalysisI(3)