Bio::AnalysisI(3) User Contributed Perl Documentation Bio::AnalysisI(3)

6 Bio::AnalysisI - An interface to any (local or remote) analysis tool
7
This is an interface module; you do not instantiate it. Use the
"Bio::Tools::Run::Analysis" module instead:

    use Bio::Tools::Run::Analysis;
    my $tool = Bio::Tools::Run::Analysis->new(@args);
14
16 This interface contains all public methods for accessing and
17 controlling local and remote analysis tools. It is meant to be used on
18 the client side.
19
21 Mailing Lists
22 User feedback is an integral part of the evolution of this and other
23 Bioperl modules. Send your comments and suggestions preferably to the
24 Bioperl mailing list. Your participation is much appreciated.
25
26 bioperl-l@bioperl.org - General discussion
27 http://bioperl.org/wiki/Mailing_lists - About the mailing lists
28
29 Support
30 Please direct usage questions or support issues to the mailing list:
31
32 bioperl-l@bioperl.org
33
rather than to the module maintainer directly. Many experienced and
responsive experts will be able to look at the problem and quickly address
it. Please include a thorough description of the problem with code and
data examples if at all possible.
38
39 Reporting Bugs
40 Report bugs to the Bioperl bug tracking system to help us keep track of
41 the bugs and their resolution. Bug reports can be submitted via the
42 web:
43
44 http://bugzilla.open-bio.org/
45
47 Martin Senger (martin.senger@gmail.com)
48
50 Copyright (c) 2003, Martin Senger and EMBL-EBI. All Rights Reserved.
51
52 This module is free software; you can redistribute it and/or modify it
53 under the same terms as Perl itself.
54
56 This software is provided "as is" without warranty of any kind.
57
59 http://www.ebi.ac.uk/Tools/webservices/soaplab/guide
60
62 This is actually the main documentation...
63
If you try to call any of these methods directly on this
"Bio::AnalysisI" object you will get a "not implemented" error message.
You need to call them on a "Bio::Tools::Run::Analysis" object instead.
67
68 analysis_name
69 Usage : $tool->analysis_name;
70 Returns : a name of this analysis
71 Args : none
72
73 analysis_spec
74 Usage : $tool->analysis_spec;
75 Returns : a hash reference describing this analysis
76 Args : none
77
78 The returned hash reference uses the following keys (not all of them
79 always present, perhaps others present as well): "name", "type",
80 "version", "supplier", "installation", "description".
81
82 Here is an example output:
83
84 Analysis 'edit.seqret':
85 installation => EMBL-EBI
86 description => Reads and writes (returns) sequences
87 supplier => EMBOSS
88 version => 2.6.0
89 type => edit
90 name => seqret
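
A listing like the one above could be produced from the returned hash reference roughly as follows. This is a minimal sketch: the hash literal stands in for the value returned by "analysis_spec" (values copied from the example output), and the helper name format_spec is an assumption made for illustration, not part of the interface.

```perl
use strict;
use warnings;

# Stand-in for the hash reference returned by $tool->analysis_spec.
my $spec = {
    name         => 'seqret',
    type         => 'edit',
    version      => '2.6.0',
    supplier     => 'EMBOSS',
    installation => 'EMBL-EBI',
    description  => 'Reads and writes (returns) sequences',
};

# Hypothetical helper: render the spec as "key => value" lines.
# Keys are sorted so the output is stable; since not all keys are
# always present, only the keys that exist are printed.
sub format_spec {
    my ($spec) = @_;
    my $title = join '.', grep { defined } @{$spec}{qw(type name)};
    my @lines = ("Analysis '$title':");
    push @lines, sprintf("    %-12s => %s", $_, $spec->{$_})
        for sort keys %$spec;
    return join("\n", @lines) . "\n";
}

print format_spec($spec);
```

Iterating over the keys that are actually present, rather than over a fixed list, keeps the sketch tolerant of servers that omit some keys or add their own.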
91
describe
Usage : $tool->describe;
Returns : a detailed XML description of this analysis
Args : none

The returned XML string contains metadata describing this analysis
service. It also includes the metadata that methods "analysis_spec",
"input_spec" and "result_spec" return in a more convenient form.
100
101 The DTD used for returned metadata is based on the adopted standard
102 (BSA specification for analysis engine):
103
104 <!ELEMENT DsLSRAnalysis (analysis)+>
105
106 <!ELEMENT analysis (description?, input*, output*, extension?)>
107
108 <!ATTLIST analysis
109 type CDATA #REQUIRED
110 name CDATA #IMPLIED
111 version CDATA #IMPLIED
112 supplier CDATA #IMPLIED
113 installation CDATA #IMPLIED>
114
115 <!ELEMENT description ANY>
116 <!ELEMENT extension ANY>
117
118 <!ELEMENT input (default?, allowed*, extension?)>
119
120 <!ATTLIST input
121 type CDATA #REQUIRED
122 name CDATA #REQUIRED
123 mandatory (true|false) "false">
124
125 <!ELEMENT default (#PCDATA)>
126 <!ELEMENT allowed (#PCDATA)>
127
128 <!ELEMENT output (extension?)>
129
130 <!ATTLIST output
131 type CDATA #REQUIRED
132 name CDATA #REQUIRED>
133
The DTD may, however, be extended by provider-specific metadata. For
example, the EBI experimental SOAP-based service on top of EMBOSS uses the
DTD explained at "http://www.ebi.ac.uk/~senger/applab".
137
138 input_spec
139 Usage : $tool->input_spec;
140 Returns : an array reference with hashes as elements
141 Args : none
142
The analysis input data are named, and can also be associated with a
default value, with allowed values and with a few other attributes. The
names are important for feeding the service with the input data (the
inputs are given to methods "create_job", "run", and/or "wait_for" as
name/value pairs).
148
149 Here is a (slightly shortened) example of an input specification:
150
151 $input_spec = [
152 {
153 'mandatory' => 'false',
154 'type' => 'String',
155 'name' => 'sequence_usa'
156 },
157 {
158 'mandatory' => 'false',
159 'type' => 'String',
160 'name' => 'sequence_direct_data'
161 },
162 {
163 'mandatory' => 'false',
164 'allowed_values' => [
165 'gcg',
166 'gcg8',
167 ...
168 'raw'
169 ],
170 'type' => 'String',
171 'name' => 'sformat'
172 },
173 {
174 'mandatory' => 'false',
175 'type' => 'String',
176 'name' => 'sbegin'
177 },
178 {
179 'mandatory' => 'false',
180 'type' => 'String',
181 'name' => 'send'
182 },
183 {
184 'mandatory' => 'false',
185 'type' => 'String',
186 'name' => 'sprotein'
187 },
188 {
189 'mandatory' => 'false',
190 'type' => 'String',
191 'name' => 'snucleotide'
192 },
193 {
194 'mandatory' => 'false',
195 'type' => 'String',
196 'name' => 'sreverse'
197 },
198 {
199 'mandatory' => 'false',
200 'type' => 'String',
201 'name' => 'slower'
202 },
203 {
204 'mandatory' => 'false',
205 'type' => 'String',
206 'name' => 'supper'
207 },
208 {
209 'mandatory' => 'false',
210 'default' => 'false',
211 'type' => 'String',
212 'name' => 'firstonly'
213 },
214 {
215 'mandatory' => 'false',
216 'default' => 'fasta',
217 'allowed_values' => [
218 'gcg',
219 'gcg8',
220 'embl',
221 ...
222 'raw'
223 ],
224 'type' => 'String',
225 'name' => 'osformat'
226 }
227 ];
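
Because the specification is a plain Perl data structure, a client can check its inputs before submitting a job. The following is a minimal sketch under stated assumptions: the structure is a shortened stand-in for what "input_spec" returns (the allowed values are only those shown in the example above), and check_inputs is a hypothetical helper, not a method of this interface.

```perl
use strict;
use warnings;

# Shortened stand-in for the array reference returned by $tool->input_spec.
my $input_spec = [
    { mandatory => 'false', type => 'String', name => 'sequence_usa' },
    { mandatory => 'false', type => 'String', name => 'firstonly',
      default => 'false' },
    { mandatory => 'false', type => 'String', name => 'osformat',
      default => 'fasta',
      allowed_values => [ 'gcg', 'gcg8', 'embl', 'raw' ] },
];

# Hypothetical helper: check a name/value hash against the specification.
# Returns a list of problems (an empty list means nothing was found).
sub check_inputs {
    my ($spec, $inputs) = @_;
    my %known = map { $_->{name} => $_ } @$spec;
    my @problems;

    for my $name (sort keys %$inputs) {
        my $entry = $known{$name};
        if (!$entry) {
            push @problems, "unknown input '$name'";
            next;
        }
        my $allowed = $entry->{allowed_values} or next;
        my $value = $inputs->{$name};
        push @problems, "value '$value' not allowed for '$name'"
            unless grep { $_ eq $value } @$allowed;
    }

    # 'mandatory' is the string 'true' or 'false', not a Perl boolean.
    for my $entry (@$spec) {
        push @problems, "missing mandatory input '$entry->{name}'"
            if $entry->{mandatory} eq 'true'
            && !exists $inputs->{ $entry->{name} };
    }
    return @problems;
}

print "$_\n" for check_inputs($input_spec, { osformat => 'swiss', foo => 1 });
```

The sketch compares values literally, so a value given as "@filename" would have to be resolved before such a check.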
228
229 result_spec
230 Usage : $tool->result_spec;
231 Returns : a hash reference with result names as keys
232 and result types as values
233 Args : none
234
235 The analysis results are named and can be retrieved using their names
236 by methods "results" and "result".
237
238 Here is an example of the result specification (again for the service
239 edit.seqret):
240
241 $result_spec = {
242 'outseq' => 'String',
243 'report' => 'String',
244 'detailed_status' => 'String'
245 };
246
247 create_job
248 Usage : $tool->create_job ( {'sequence'=>'tatat'} )
249 Returns : Bio::Tools::Run::Analysis::Job
250 Args : data and parameters for this execution
251 (in various formats)
252
253 Create an object representing a single execution of this analysis tool.
254
Call this method if you wish to "stage the scene" - to create a job
with all input data but without actually running it. This method is
called automatically from other methods ("run" and "wait_for") so
usually you do not need to call it directly.

The input data and parameters for this execution can be specified in
various ways:
263
264 array reference
265 The array has scalar elements of the form
266
267 name = [[@]value]
268
where "name" is the name of an input data item or input parameter
(see method "input_spec" for the names recognized by this analysis)
and "value" is a value for this data/parameter. If "value" is
missing, 1 is assumed (which is convenient for boolean options). If
"value" starts with "@" it is treated as a local filename, and the
contents of that file are used as the data/parameter value.
276
277 hash reference
278 The same as with the array reference but now there is no need to
279 use an equal sign. The hash keys are input names and hash values
280 their data. The values can again start with a "@" sign indicating a
281 local filename.
282
283 scalar
In this case, the parameter represents a job ID obtained in some
previous invocation. Such a job already exists on the server side,
and we are just re-creating it here using the same job ID.
287
288 TBD: here we should allow the same by using a reference to the
289 Bio::Tools::Run::Analysis::Job object.
290
291 undef
Finally, if the parameter is undefined, the server is asked to create
an empty job. The input data may be added later using "set_data..."
method(s) - see scripts/papplmaker.PLS for details.
295
296 run
297 Usage : $tool->run ( ['sequence=@my.seq', 'osformat=embl'] )
298 Returns : Bio::Tools::Run::Analysis::Job,
299 representing started job (an execution)
300 Args : the same as for create_job
301
302 Create a job and start it, but do not wait for its completion.
303
304 wait_for
Usage : $tool->wait_for ( { 'sequence' => '@my.file' } )
306 Returns : Bio::Tools::Run::Analysis::Job,
307 representing finished job
308 Args : the same as for create_job
309
310 Create a job, start it and wait for its completion.
311
312 Note that this is a blocking method. It returns only after the executed
313 job finishes, either normally or by an error.
314
315 Usually, after this call, you ask for results of the finished job:
316
317 $analysis->wait_for (...)->results;
318
320 An interface to the public methods provided by
321 "Bio::Tools::Run::Analysis::Job" objects.
322
323 The "Bio::Tools::Run::Analysis::Job" objects represent a created,
324 running, or finished execution of an analysis tool.
325
The factory for these objects is module "Bio::Tools::Run::Analysis"
where the following methods return a "Bio::Tools::Run::Analysis::Job"
object:
329
330 create_job (returning a prepared job)
331 run (returning a running job)
332 wait_for (returning a finished job)
333
334 id
335 Usage : $job->id;
336 Returns : this job ID
337 Args : none
338
Each job (an execution) is identifiable by this unique ID, which can be
used later to re-create the same job (in other words: to re-connect to
the same job). It is useful in cases when a job takes a long time to
finish and your client program does not want to wait for it within the
same session.
344
345 Bio::AnalysisI::JobI::run
346 Usage : $job->run
347 Returns : itself
348 Args : none
349
It starts a previously created job. The job must already have all its
input data filled in. This differs from the method of the same name of
the "Bio::Tools::Run::Analysis" object, where the "run" method also
creates a new job, allowing you to set its input data.
355
356 Bio::AnalysisI::JobI::wait_for
357 Usage : $job->wait_for
358 Returns : itself
359 Args : none
360
361 It waits until a previously started execution of this job finishes.
362
363 terminate
364 Usage : $job->terminate
365 Returns : itself
366 Args : none
367
Stop the currently running job (represented by this object). This is a
definitive stop; there is no way to resume it later.
370
371 last_event
372 Usage : $job->last_event
373 Returns : an XML string
374 Args : none
375
It returns a short XML document showing what happened last with this
job. The DTD used is:
378
379 <!-- place for extensions -->
380 <!ENTITY % event_body_template "(state_changed | heartbeat_progress | percent_progress | time_progress | step_progress)">
381
382 <!ELEMENT analysis_event (message?, (%event_body_template;)?)>
383
384 <!ATTLIST analysis_event
385 timestamp CDATA #IMPLIED>
386
387 <!ELEMENT message (#PCDATA)>
388
389 <!ELEMENT state_changed EMPTY>
390 <!ENTITY % analysis_state "created | running | completed | terminated_by_request | terminated_by_error">
391 <!ATTLIST state_changed
392 previous_state (%analysis_state;) "created"
393 new_state (%analysis_state;) "created">
394
395 <!ELEMENT heartbeat_progress EMPTY>
396
397 <!ELEMENT percent_progress EMPTY>
398 <!ATTLIST percent_progress
399 percentage CDATA #REQUIRED>
400
401 <!ELEMENT time_progress EMPTY>
402 <!ATTLIST time_progress
403 remaining CDATA #REQUIRED>
404
405 <!ELEMENT step_progress EMPTY>
406 <!ATTLIST step_progress
407 total_steps CDATA #IMPLIED
408 steps_completed CDATA #REQUIRED>
409
Here is an example of what is returned after a job was created and
started, but before it finishes (note that the example uses an analysis
'showdb' which does not need any input data):

    use Bio::Tools::Run::Analysis;
    print Bio::Tools::Run::Analysis->new(-name => 'display.showdb')
        ->run
        ->last_event;
418
419 It prints:
420
421 <?xml version = "1.0"?>
422 <analysis_event>
423 <message>Mar 3, 2003 5:14:46 PM (Europe/London)</message>
424 <state_changed previous_state="created" new_state="running"/>
425 </analysis_event>
426
The same example, but now after the job finishes:

    use Bio::Tools::Run::Analysis;
    print Bio::Tools::Run::Analysis->new(-name => 'display.showdb')
        ->wait_for
        ->last_event;

It prints:

434 <?xml version = "1.0"?>
435 <analysis_event>
436 <message>Mar 3, 2003 5:17:14 PM (Europe/London)</message>
437 <state_changed previous_state="running" new_state="completed"/>
438 </analysis_event>
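
A client that only needs the state transition can pull it out of such an event document. The following is a rough sketch: state_change is a hypothetical helper, and the regex-based extraction assumes the simple layout shown above; a real client would more likely use a proper XML parser.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the state transition from an event document.
# Returns (previous_state, new_state), or an empty list when the event
# carries no state_changed element.
sub state_change {
    my ($xml) = @_;
    my ($element) = $xml =~ /<state_changed\b([^>]*)>/ or return;
    my %attr = $element =~ /(\w+)\s*=\s*"([^"]*)"/g;
    # The DTD gives both attributes the default value "created".
    return ( $attr{previous_state} // 'created',
             $attr{new_state}      // 'created' );
}

# Event text copied from the second example above.
my $event = <<'XML';
<?xml version = "1.0"?>
<analysis_event>
  <message>Mar 3, 2003 5:17:14 PM (Europe/London)</message>
  <state_changed previous_state="running" new_state="completed"/>
</analysis_event>
XML

my ($from, $to) = state_change($event);
print "$from -> $to\n";
```

The other event bodies listed in the DTD (heartbeat_progress, percent_progress and so on) are ignored by this sketch.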
439
440 status
441 Usage : $job->status
442 Returns : string describing the job status
443 Args : none
444
445 It returns one of the following strings (and perhaps more if a server
446 implementation extended possible job states):
447
448 CREATED
449 RUNNING
450 COMPLETED
451 TERMINATED_BY_REQUEST
452 TERMINATED_BY_ERROR
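
When a job was started with "run" rather than "wait_for", the status string can be polled until it reaches a final state. The following is a minimal sketch: is_finished and poll_until_finished are hypothetical helpers, the status source is passed in as a code reference so the example runs without a server, and a server that extends the set of states would need a longer list.

```perl
use strict;
use warnings;

# The three states after which a job will not change any more, taken from
# the list above. A server may define additional states.
my %TERMINAL = map { $_ => 1 }
    qw(COMPLETED TERMINATED_BY_REQUEST TERMINATED_BY_ERROR);

# Hypothetical helper: true once the status string denotes a finished job.
sub is_finished {
    my ($status) = @_;
    return exists $TERMINAL{ uc $status } ? 1 : 0;
}

# Hypothetical polling loop. $get_status is any code reference returning
# the current status string, for example sub { $job->status }.
sub poll_until_finished {
    my ($get_status, $max_polls, $delay) = @_;
    for (1 .. $max_polls) {
        my $status = $get_status->();
        return $status if is_finished($status);
        sleep $delay if $delay;
    }
    return;    # still not finished after $max_polls attempts
}

# Demonstration with a canned sequence of statuses instead of a real job.
my @canned = qw(CREATED RUNNING RUNNING COMPLETED);
print poll_until_finished(sub { shift @canned }, 10, 0), "\n";
```

Bounding the number of polls keeps a client from waiting forever on a job that never reaches a final state.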
453
454 created
455 Usage : $job->created (1)
456 Returns : time when this job was created
457 Args : optional
458
459 Without any argument it returns a time of creation of this job in
460 seconds, counting from the beginning of the UNIX epoch (1.1.1970). With
461 a true argument it returns a formatted time, using rules described in
462 "Bio::Tools::Run::Analysis::Utils::format_time".
463
464 started
465 Usage : $job->started (1)
466 Returns : time when this job was started
467 Args : optional
468
469 See "created".
470
471 ended
472 Usage : $job->ended (1)
473 Returns : time when this job was terminated
474 Args : optional
475
476 See "created".
477
478 elapsed
479 Usage : $job->elapsed
Returns : elapsed time of the execution of the given job
(in milliseconds), or 0 if the job was not yet started
Args : none

Note that some server implementations cannot count in milliseconds, so
the returned time may be rounded to seconds.
486
times
Usage : $job->times ('formatted')
Returns : a hash reference with all time characteristics
Args : optional

It is a convenience method returning a hash reference with the following
keys:

created
started
ended
elapsed

See "created" for remarks on time formatting.

An example - both for formatted and unformatted times:
503
    use Data::Dumper;
    use Bio::Tools::Run::Analysis;

    # Formatted times (true argument).
    my $rh = Bio::Tools::Run::Analysis->new(-name => 'nucleic_cpg_islands.cpgplot')
        ->wait_for ( { 'sequence_usa' => 'embl:hsu52852' } )
        ->times (1);
    print Data::Dumper->Dump ( [$rh], ['Times']);

    # Unformatted times (no argument): epoch seconds, elapsed in milliseconds.
    $rh = Bio::Tools::Run::Analysis->new(-name => 'nucleic_cpg_islands.cpgplot')
        ->wait_for ( { 'sequence_usa' => 'embl:AL499624' } )
        ->times;
    print Data::Dumper->Dump ( [$rh], ['Times']);

It prints:

515 $Times = {
516 'ended' => 'Mon Mar 3 17:52:06 2003',
517 'started' => 'Mon Mar 3 17:52:05 2003',
518 'elapsed' => '1000',
519 'created' => 'Mon Mar 3 17:52:05 2003'
520 };
521 $Times = {
522 'ended' => '1046713961',
523 'started' => '1046713926',
524 'elapsed' => '35000',
525 'created' => '1046713926'
526 };
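
The unformatted values can be used directly in arithmetic. The following sketch uses a hash that stands in for the value returned by "times" without arguments (numbers copied from the second dump above); summarise_times is a hypothetical helper.

```perl
use strict;
use warnings;

# Stand-in for the hash reference returned by $job->times without arguments
# (values copied from the second dump above).
my $times = {
    created => '1046713926',
    started => '1046713926',
    ended   => '1046713961',
    elapsed => '35000',
};

# Hypothetical helper: summarise the raw values in seconds.
sub summarise_times {
    my ($t) = @_;
    return {
        queued_s  => $t->{started} - $t->{created},   # waiting before start
        wall_s    => $t->{ended}   - $t->{started},   # from epoch timestamps
        elapsed_s => $t->{elapsed} / 1000,            # reported in milliseconds
    };
}

my $summary = summarise_times($times);
printf "%-9s %s\n", $_, $summary->{$_} for sort keys %$summary;
```

For the values above, the difference between "ended" and "started" and the reported "elapsed" time agree at 35 seconds.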
527
528 results
529 Usage : $job->results (...)
530 Returns : one or more results created by this job
Args : various, see below

This is a complex method that tries to handle all kinds of results
sensibly. In particular, it helps to put binary results (such as images)
into local files. Generally it deals with the following facts:

· Each analysis tool may produce several results.

· Some results may contain binary data not suitable for printing into
a terminal window.

· Some results may be split into a variable number of parts (this is
mainly true for image results, which can consist of several *.png
files).

Note also that results have names to distinguish them when there are
several. The names can be obtained by method "result_spec".

Here are the rules for how the method works:
550
551 Retrieving NAMED results:
552 -------------------------
553 results ('name1', ...) => return results as they are, no storing into files
554
555 results ( { 'name1' => 'filename', ... } ) => store into 'filename', return 'filename'
556 results ( 'name1=filename', ...) => ditto
557
558 results ( { 'name1' => '-', ... } ) => send result to the STDOUT, do not return anything
559 results ( 'name1=-', ...) => ditto
560
561 results ( { 'name1' => '@', ... } ) => store into file whose name is invented by
562 this method, perhaps using RESULT_NAME_TEMPLATE env
563 results ( 'name1=@', ...) => ditto
564
results ( { 'name1' => '?', ... } ) => find out the type of this result and then use
{'name1'=>'@'} for binary files, and a regular
return for non-binary files
results ( 'name1=?', ...) => ditto
569
570 Retrieving ALL results:
571 -----------------------
572 results() => return all results as they are, no storing into files
573
574 results ('@') => return all results, as if each of them given
575 as {'name' => '@'} (see above)
576
577 results ('?') => return all results, as if each of them given
578 as {'name' => '?'} (see above)
579
580 Misc:
581 -----
* any result can be returned as a scalar value, or as an array reference
(the latter is used for results consisting of several parts, such as images);
this applies regardless of whether the returned result is the result itself
or a filename created for the result

* look in the documentation of the panalysis[.PLS] script for examples
(especially how to use various templates for inventing file names)
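
The string forms of the arguments listed above can be read as "name=destination" requests. The following sketch only classifies such a request and does not retrieve anything: classify_request and its action labels are hypothetical names chosen for illustration, and the result names are taken from the "result_spec" example.

```perl
use strict;
use warnings;

# Special destinations from the rules above, mapped to illustrative labels.
my %ACTION = ( '-' => 'stdout', '@' => 'auto_file', '?' => 'auto_detect' );

# Hypothetical helper: describe what one string argument of results() asks
# for. It classifies the request only; nothing is fetched or written.
sub classify_request {
    my ($arg) = @_;

    # A bare '@' or '?' applies that rule to every result of the job.
    return { name => '(all)', action => $ACTION{$arg} }
        if $arg eq '@' || $arg eq '?';

    # Split on the first '=' only, so file names may contain '='.
    my ($name, $dest) = split /=/, $arg, 2;

    return { name => $name, action => 'return' } unless defined $dest;
    return { name => $name, action => $ACTION{$dest} } if exists $ACTION{$dest};
    return { name => $name, action => 'file', file => $dest };
}

for my $arg ('report', 'outseq=seq.txt', 'report=-', 'detailed_status=?', '@') {
    my $req = classify_request($arg);
    print join(' ', map { "$_=$req->{$_}" } sort keys %$req), "\n";
}
```

The hash-reference forms carry the same information with the result name as the key and the destination as the value.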
589
590 result
591 Usage : $job->result (...)
592 Returns : the first result
593 Args : see 'results'
594
595 remove
596 Usage : $job->remove
597 Returns : 1
598 Args : none
599
The job object is not actually removed at this time, but it is marked
(by setting the "_destroy_on_exit" attribute to 1) as ready for deletion
when the client program ends (including a request to the server to forget
the job mirror object on the server side).
604
605
606
607perl v5.12.0 2010-04-29 Bio::AnalysisI(3)