Bio::AnalysisI(3pm)

Bio::AnalysisI(3)     User Contributed Perl Documentation    Bio::AnalysisI(3)

NAME

6       Bio::AnalysisI - An interface to any (local or remote) analysis tool
7

SYNOPSIS

       This is an interface module; you do not instantiate it directly.
       Use the "Bio::Tools::Run::Analysis" module instead:

         use Bio::Tools::Run::Analysis;
         my $tool = Bio::Tools::Run::Analysis->new(@args);
14

DESCRIPTION

       This interface contains all public methods for accessing and
       controlling local and remote analysis tools. It is meant to be used on
       the client side.
19

FEEDBACK

21       Mailing Lists
22
23       User feedback is an integral part of the evolution of this and other
24       Bioperl modules. Send your comments and suggestions preferably to the
25       Bioperl mailing list.  Your participation is much appreciated.
26
27         bioperl-l@bioperl.org                  - General discussion
28         http://bioperl.org/wiki/Mailing_lists  - About the mailing lists
29
30       Reporting Bugs
31
32       Report bugs to the Bioperl bug tracking system to help us keep track of
33       the bugs and their resolution. Bug reports can be submitted via the
34       web:
35
36         http://bugzilla.open-bio.org/
37

AUTHOR

39       Martin Senger (martin.senger@gmail.com)
40

COPYRIGHT

42       Copyright (c) 2003, Martin Senger and EMBL-EBI.  All Rights Reserved.
43
44       This module is free software; you can redistribute it and/or modify it
45       under the same terms as Perl itself.
46

DISCLAIMER

48       This software is provided "as is" without warranty of any kind.
49

APPENDIX

54       This is actually the main documentation...
55
       If you try to call any of these methods directly on a
       "Bio::AnalysisI" object, you will get a "not implemented" error
       message. You need to call them on a "Bio::Tools::Run::Analysis"
       object instead.
59
60       analysis_name
61
62        Usage   : $tool->analysis_name;
        Returns : the name of this analysis
64        Args    : none
65
66       analysis_spec
67
68        Usage   : $tool->analysis_spec;
69        Returns : a hash reference describing this analysis
70        Args    : none
71
       The returned hash reference uses the following keys (not all of them
       are always present, and others may be present as well): "name",
       "type", "version", "supplier", "installation", "description".
75
76       Here is an example output:
77
78         Analysis 'edit.seqret':
79               installation => EMBL-EBI
80               description => Reads and writes (returns) sequences
81               supplier => EMBOSS
82               version => 2.6.0
83               type => edit
84               name => seqret
85
86       describe
87
        Usage   : $tool->describe;
        Returns : a detailed XML description of this analysis
90        Args    : none
91
       The returned XML string contains metadata describing this analysis
       service. It also includes the metadata returned (in a more convenient
       form) by the methods "analysis_spec", "input_spec" and "result_spec".
95
96       The DTD used for returned metadata is based on the adopted standard
97       (BSA specification for analysis engine):
98
99         <!ELEMENT DsLSRAnalysis (analysis)+>
100
101         <!ELEMENT analysis (description?, input*, output*, extension?)>
102
103         <!ATTLIST analysis
104             type          CDATA #REQUIRED
105             name          CDATA #IMPLIED
106             version       CDATA #IMPLIED
107             supplier      CDATA #IMPLIED
108             installation  CDATA #IMPLIED>
109
110         <!ELEMENT description ANY>
111         <!ELEMENT extension ANY>
112
113         <!ELEMENT input (default?, allowed*, extension?)>
114
115         <!ATTLIST input
116             type          CDATA #REQUIRED
117             name          CDATA #REQUIRED
             mandatory     (true|false) "false">
119
120         <!ELEMENT default (#PCDATA)>
121         <!ELEMENT allowed (#PCDATA)>
122
123         <!ELEMENT output (extension?)>
124
125         <!ATTLIST output
126             type          CDATA #REQUIRED
127             name          CDATA #REQUIRED>
128
       The DTD may, however, be extended by provider-specific metadata. For
       example, the EBI experimental SOAP-based service on top of EMBOSS uses
       the DTD explained at "http://www.ebi.ac.uk/~senger/applab".
132
133       input_spec
134
135        Usage   : $tool->input_spec;
136        Returns : an array reference with hashes as elements
137        Args    : none
138
       The analysis input data are named, and each input can also be
       associated with a default value, a list of allowed values, and a few
       other attributes. The names are important for feeding the service with
       the input data (the inputs are given to the methods "create_job",
       "run", and/or "wait_for" as name/value pairs).
144
145       Here is a (slightly shortened) example of an input specification:
146
147        $input_spec = [
148                 {
149                   'mandatory' => 'false',
150                   'type' => 'String',
151                   'name' => 'sequence_usa'
152                 },
153                 {
154                   'mandatory' => 'false',
155                   'type' => 'String',
156                   'name' => 'sequence_direct_data'
157                 },
158                 {
159                   'mandatory' => 'false',
160                   'allowed_values' => [
161                                         'gcg',
162                                         'gcg8',
163                                         ...
164                                         'raw'
165                                       ],
166                   'type' => 'String',
167                   'name' => 'sformat'
168                 },
169                 {
170                   'mandatory' => 'false',
171                   'type' => 'String',
172                   'name' => 'sbegin'
173                 },
174                 {
175                   'mandatory' => 'false',
176                   'type' => 'String',
177                   'name' => 'send'
178                 },
179                 {
180                   'mandatory' => 'false',
181                   'type' => 'String',
182                   'name' => 'sprotein'
183                 },
184                 {
185                   'mandatory' => 'false',
186                   'type' => 'String',
187                   'name' => 'snucleotide'
188                 },
189                 {
190                   'mandatory' => 'false',
191                   'type' => 'String',
192                   'name' => 'sreverse'
193                 },
194                 {
195                   'mandatory' => 'false',
196                   'type' => 'String',
197                   'name' => 'slower'
198                 },
199                 {
200                   'mandatory' => 'false',
201                   'type' => 'String',
202                   'name' => 'supper'
203                 },
204                 {
205                   'mandatory' => 'false',
206                   'default' => 'false',
207                   'type' => 'String',
208                   'name' => 'firstonly'
209                 },
210                 {
211                   'mandatory' => 'false',
212                   'default' => 'fasta',
213                   'allowed_values' => [
214                                         'gcg',
215                                         'gcg8',
216                                         'embl',
217                                         ...
218                                         'raw'
219                                       ],
220                   'type' => 'String',
221                   'name' => 'osformat'
222                 }
223               ];
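
       The structure above lends itself to a quick client-side check before
       a job is submitted. The following is only a sketch: "check_inputs" is
       a hypothetical helper, not part of BioPerl, and the specification is a
       shortened literal copy of the example rather than a live call to
       "input_spec". Values that start with "@" (local files) are compared
       literally here, which a real check would probably want to skip.

```perl
use strict;
use warnings;

# Shortened literal copy of the example specification above. Only values
# visible in the example are listed; the real allowed_values list is longer.
my $input_spec = [
    { mandatory => 'false', type => 'String', name => 'sequence_usa' },
    { mandatory => 'false', type => 'String', name => 'firstonly',
      default   => 'false' },
    { mandatory => 'false', type => 'String', name => 'osformat',
      default   => 'fasta',
      allowed_values => [ 'gcg', 'gcg8', 'embl', 'raw' ] },
];

# Hypothetical helper (not a BioPerl method): compare a hash of inputs
# against a specification and return a list of problem descriptions.
sub check_inputs {
    my ($spec, $inputs) = @_;
    my %known = map { $_->{name} => $_ } @$spec;
    my @problems;

    for my $name (sort keys %$inputs) {
        my $entry = $known{$name};
        if (!$entry) {
            push @problems, "unknown input '$name'";
            next;
        }
        # Inputs without an allowed_values list accept any value.
        my $allowed = $entry->{allowed_values} or next;
        push @problems, "value '$inputs->{$name}' not allowed for '$name'"
            unless grep { $_ eq $inputs->{$name} } @$allowed;
    }

    # 'mandatory' is the string 'true' or 'false', not a Perl boolean.
    for my $entry (@$spec) {
        next unless $entry->{mandatory} eq 'true';
        push @problems, "missing mandatory input '$entry->{name}'"
            unless exists $inputs->{ $entry->{name} };
    }
    return @problems;
}

print "$_\n" for check_inputs($input_spec, { osformat => 'xyz', bogus => 1 });
```

       Note that "mandatory" is compared as a string because the
       specification reports it as 'true' or 'false'.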
224
225       result_spec
226
227        Usage   : $tool->result_spec;
228        Returns : a hash reference with result names as keys
229                  and result types as values
230        Args    : none
231
232       The analysis results are named and can be retrieved using their names
233       by methods "results" and "result".
234
235       Here is an example of the result specification (again for the service
236       edit.seqret):
237
238         $result_spec = {
239                 'outseq' => 'String',
240                 'report' => 'String',
241                 'detailed_status' => 'String'
242               };
243
244       create_job
245
246        Usage   : $tool->create_job ( {'sequence'=>'tatat'} )
247        Returns : Bio::Tools::Run::Analysis::Job
248        Args    : data and parameters for this execution
249                  (in various formats)
250
251       Create an object representing a single execution of this analysis tool.
252
       Call this method if you wish to "stage the scene": to create a job
       with all input data but without actually running it. This method is
       called automatically from other methods ("run" and "wait_for"), so
       usually you do not need to call it directly.

       The input data and parameters for this execution can be specified in
       various ways:
260
261       array reference
262           The array has scalar elements of the form
263
264              name = [[@]value]
265
           where "name" is the name of an input data item or input parameter
           (see the method "input_spec" to find out which names are
           recognized by this analysis) and "value" is a value for this
           data/parameter. If "value" is missing, 1 is assumed (which is
           convenient for boolean options). If "value" starts with "@", it is
           treated as a local filename, and the contents of that file are
           used as the data/parameter value.
273
274       hash reference
           The same as with the array reference, but there is no need to use
           an equal sign. The hash keys are input names and the hash values
           are their data. The values can again start with a "@" sign,
           indicating a local filename.
279
280       scalar
           In this case, the parameter represents a job ID obtained in some
           previous invocation. Such a job already exists on the server side,
           and we are just re-creating it here using the same job ID.

           TBD: here we should allow the same by using a reference to the
           Bio::Tools::Run::Analysis::Job object.
287
288       undef
           Finally, if the parameter is undefined, ask the server to create
           an empty job. The input data may be added later using the
           "set_data..." method(s); see scripts/papplmaker.PLS for details.
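
       To make the array-reference rules concrete, here is a minimal sketch
       of how the "name = [[@]value]" form could be turned into the
       equivalent hash reference. The helper "inputs_to_hash" is
       hypothetical; it illustrates the rules described above and is not the
       code BioPerl itself uses.

```perl
use strict;
use warnings;

# Hypothetical helper: convert [ 'name=value', 'flag', 'name=@file' ]
# into { name => value, flag => 1, name => <contents of file> }.
sub inputs_to_hash {
    my ($args) = @_;
    my %inputs;
    for my $item (@$args) {
        # Split on the first '=' only, tolerating spaces around it.
        my ($name, $value) = split /\s*=\s*/, $item, 2;

        # A bare name is shorthand for a true boolean option.
        $value = 1 unless defined $value;

        # A leading '@' means "read the value from this local file".
        if ($value =~ /\A\@(.+)\z/s) {
            my $file = $1;
            open my $fh, '<', $file or die "Cannot read '$file': $!";
            local $/;            # slurp mode, limited to this block
            $value = <$fh>;
            close $fh;
        }
        $inputs{$name} = $value;
    }
    return \%inputs;
}

my $inputs = inputs_to_hash([ 'sequence=tatat', 'firstonly', 'osformat = embl' ]);
print "$_ => $inputs->{$_}\n" for sort keys %$inputs;
```

       The resulting hash reference has the same shape as the second form
       described above, so the two forms can be treated uniformly once
       normalized.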
292
293       run
294
295        Usage   : $tool->run ( ['sequence=@my.seq', 'osformat=embl'] )
296        Returns : Bio::Tools::Run::Analysis::Job,
297                  representing started job (an execution)
298        Args    : the same as for create_job
299
300       Create a job and start it, but do not wait for its completion.
301
302       wait_for
303
        Usage   : $tool->wait_for ( { 'sequence' => '@my.file' } )
305        Returns : Bio::Tools::Run::Analysis::Job,
306                  representing finished job
307        Args    : the same as for create_job
308
309       Create a job, start it and wait for its completion.
310
       Note that this is a blocking method. It returns only after the executed
       job finishes, either normally or with an error.

       Usually, after this call, you ask for the results of the finished job:

           $tool->wait_for (...)->results;
317

Module Bio::AnalysisI::JobI

       An interface to the public methods provided by
       "Bio::Tools::Run::Analysis::Job" objects.

       A "Bio::Tools::Run::Analysis::Job" object represents a created,
       running, or finished execution of an analysis tool.

       The factory for these objects is the module
       "Bio::Tools::Run::Analysis", where the following methods return a
       "Bio::Tools::Run::Analysis::Job" object:
328
329           create_job   (returning a prepared job)
330           run          (returning a running job)
331           wait_for     (returning a finished job)
332
333       id
334
335        Usage   : $job->id;
336        Returns : this job ID
337        Args    : none
338
       Each job (an execution) is identified by this unique ID, which can be
       used later to re-create the same job (in other words, to re-connect to
       the same job). This is useful when a job takes a long time to finish
       and your client program does not want to wait for it within the same
       session.
344
345       run
346
347        Usage   : $job->run
348        Returns : itself
349        Args    : none
350
       Starts a previously created job. The job must already have all its
       input data filled in. This differs from the method of the same name on
       the "Bio::Tools::Run::Analysis" object, where the "run" method also
       creates a new job and lets you set its input data.
355
356       wait_for
357
358        Usage   : $job->wait_for
359        Returns : itself
360        Args    : none
361
362       It waits until a previously started execution of this job finishes.
363
364       terminate
365
366        Usage   : $job->terminate
367        Returns : itself
368        Args    : none
369
       Stop the currently running job (represented by this object). This stop
       is final; there is no way to resume the job later.
372
373       last_event
374
375        Usage   : $job->last_event
376        Returns : an XML string
377        Args    : none
378
       It returns a short XML document showing what happened last with this
       job. The DTD used is:

          <!-- place for extensions -->
          <!ENTITY % event_body_template "(state_changed | heartbeat_progress | percent_progress | time_progress | step_progress)">
384
385          <!ELEMENT analysis_event (message?, (%event_body_template;)?)>
386
387          <!ATTLIST analysis_event
388              timestamp  CDATA #IMPLIED>
389
390          <!ELEMENT message (#PCDATA)>
391
392          <!ELEMENT state_changed EMPTY>
          <!ENTITY % analysis_state "created | running | completed | terminated_by_request | terminated_by_error">
394          <!ATTLIST state_changed
395              previous_state  (%analysis_state;) "created"
396              new_state       (%analysis_state;) "created">
397
398          <!ELEMENT heartbeat_progress EMPTY>
399
400          <!ELEMENT percent_progress EMPTY>
401          <!ATTLIST percent_progress
402              percentage CDATA #REQUIRED>
403
404          <!ELEMENT time_progress EMPTY>
405          <!ATTLIST time_progress
406              remaining CDATA #REQUIRED>
407
408          <!ELEMENT step_progress EMPTY>
409          <!ATTLIST step_progress
410              total_steps      CDATA #IMPLIED
411              steps_completed CDATA #REQUIRED>
412
       Here is an example of what is returned after a job has been created
       and started, but before it finishes (note that the example uses the
       analysis 'showdb', which does not need any input data):

          use Bio::Tools::Run::Analysis;
          print Bio::Tools::Run::Analysis->new(-name => 'display.showdb')
                    ->run
                    ->last_event;
421
422       It prints:
423
424          <?xml version = "1.0"?>
425          <analysis_event>
426            <message>Mar 3, 2003 5:14:46 PM (Europe/London)</message>
427            <state_changed previous_state="created" new_state="running"/>
428          </analysis_event>
429
       The same example, but now after the job finishes:

          use Bio::Tools::Run::Analysis;
          print Bio::Tools::Run::Analysis->new(-name => 'display.showdb')
                    ->wait_for
                    ->last_event;
436
437          <?xml version = "1.0"?>
438          <analysis_event>
439            <message>Mar 3, 2003 5:17:14 PM (Europe/London)</message>
440            <state_changed previous_state="running" new_state="completed"/>
441          </analysis_event>
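
       A client that only needs the state transition can pull it out of the
       returned document. The sketch below uses a regular expression on the
       sample output shown above; "state_change" is a hypothetical helper,
       and a proper XML parser would be the safer choice for anything beyond
       this simple case. The fallback to "created" follows the attribute
       defaults declared in the DTD.

```perl
use strict;
use warnings;

# Hypothetical helper: return (previous_state, new_state) from an
# analysis_event document, or an empty list if it reports no state change.
sub state_change {
    my ($xml) = @_;

    # Grab the attribute text of the <state_changed .../> element, if any.
    my ($attrs) = $xml =~ /<state_changed\b([^>]*)>/ or return;

    # Collect name="value" pairs into a hash.
    my %attr = $attrs =~ /(\w+)\s*=\s*"([^"]*)"/g;

    # Both attributes default to "created" according to the DTD.
    return ( $attr{previous_state} // 'created',
             $attr{new_state}      // 'created' );
}

# Literal copy of the sample output shown above.
my $event = <<'XML';
<?xml version = "1.0"?>
<analysis_event>
  <message>Mar 3, 2003 5:17:14 PM (Europe/London)</message>
  <state_changed previous_state="running" new_state="completed"/>
</analysis_event>
XML

my ($from, $to) = state_change($event);
print "job went from $from to $to\n";
```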
442
443       status
444
445        Usage   : $job->status
446        Returns : string describing the job status
447        Args    : none
448
       It returns one of the following strings (and perhaps others, if a
       server implementation extends the set of possible job states):
451
452          CREATED
453          RUNNING
454          COMPLETED
455          TERMINATED_BY_REQUEST
456          TERMINATED_BY_ERROR
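
       When polling a job, a client typically needs to know whether a status
       string means the job is over. A minimal sketch, assuming that the
       three states below are the only final ones; "is_finished" is a
       hypothetical helper, and any server-specific extension state is
       treated as "not finished" here, which may not be what every server
       intends.

```perl
use strict;
use warnings;

# States after which the job will not change any more (an assumption based
# on the list above; servers may add their own states).
my %FINAL = map { $_ => 1 }
    qw(COMPLETED TERMINATED_BY_REQUEST TERMINATED_BY_ERROR);

# Hypothetical helper: true if the given status string is a final state.
sub is_finished {
    my ($status) = @_;
    return 0 unless defined $status;
    return $FINAL{ uc $status } ? 1 : 0;
}

for my $status (qw(CREATED RUNNING COMPLETED TERMINATED_BY_ERROR)) {
    printf "%-20s %s\n", $status, is_finished($status) ? 'finished' : 'pending';
}
```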
457
458       created
459
460        Usage   : $job->created (1)
461        Returns : time when this job was created
462        Args    : optional
463
       Without any argument it returns the time of creation of this job in
       seconds, counting from the beginning of the UNIX epoch (1 January
       1970). With a true argument it returns a formatted time, using the
       rules described in "Bio::Tools::Run::Analysis::Utils::format_time".
468
469       started
470
471        Usage   : $job->started (1)
472        Returns : time when this job was started
473        Args    : optional
474
475       See "created".
476
477       ended
478
479        Usage   : $job->ended (1)
480        Returns : time when this job was terminated
481        Args    : optional
482
483       See "created".
484
485       elapsed
486
487        Usage   : $job->elapsed
        Returns : elapsed time of the execution of the given job
                  (in milliseconds), or 0 if the job was not yet started
        Args    : none

       Note that some server implementations cannot count in milliseconds, so
       the returned time may be rounded to seconds.
494
495       times
496
497        Usage   : $job->times ('formatted')
        Returns : a hash reference with all time characteristics
        Args    : optional

       It is a convenience method returning a hash reference with the
       following keys:

          created
          started
          ended
          elapsed

       See "created" for remarks on time formatting.
510
       An example, for both formatted and unformatted times:

          use Data::Dumper;
          use Bio::Tools::Run::Analysis;
          my $rh = Bio::Tools::Run::Analysis->new(-name => 'nucleic_cpg_islands.cpgplot')
                    ->wait_for ( { 'sequence_usa' => 'embl:hsu52852' } )
                    ->times (1);
          print Data::Dumper->Dump ( [$rh], ['Times']);
          $rh = Bio::Tools::Run::Analysis->new(-name => 'nucleic_cpg_islands.cpgplot')
                    ->wait_for ( { 'sequence_usa' => 'embl:AL499624' } )
                    ->times;
          print Data::Dumper->Dump ( [$rh], ['Times']);
523
524          $Times = {
525                  'ended'   => 'Mon Mar  3 17:52:06 2003',
526                  'started' => 'Mon Mar  3 17:52:05 2003',
527                  'elapsed' => '1000',
528                  'created' => 'Mon Mar  3 17:52:05 2003'
529                };
530          $Times = {
531                  'ended'   => '1046713961',
532                  'started' => '1046713926',
533                  'elapsed' => '35000',
534                  'created' => '1046713926'
535                };
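
       The unformatted values are plain epoch seconds, with "elapsed" in
       milliseconds, so they can be cross-checked with ordinary arithmetic.
       The sketch below works on a literal copy of the second hash above;
       "wall_clock_ms" is a hypothetical helper, and the UTC rendering uses
       Perl's built-in "gmtime", which is not necessarily the format that
       "format_time" produces.

```perl
use strict;
use warnings;

# Literal copy of the unformatted example above.
my $times = {
    created => 1046713926,
    started => 1046713926,
    ended   => 1046713961,
    elapsed => 35000,
};

# Hypothetical helper: run time in milliseconds derived from the epoch
# stamps, or 0 if the job has not both started and ended.
sub wall_clock_ms {
    my ($t) = @_;
    return 0 unless $t->{started} && $t->{ended};
    return ( $t->{ended} - $t->{started} ) * 1000;
}

printf "derived run time : %d ms\n", wall_clock_ms($times);
printf "reported elapsed : %d ms\n", $times->{elapsed};
printf "started (UTC)    : %s\n",    scalar gmtime( $times->{started} );
```

       For this example the derived run time and the reported "elapsed"
       value agree; on servers that round to whole seconds, small
       differences would be expected.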
536
537       results
538
539        Usage   : $job->results (...)
540        Returns : one or more results created by this job
        Args    : various, see below

       This is a complex method that tries to make sense of all kinds of
       results. In particular, it tries to help with putting binary results
       (such as images) into local files. Generally it deals with the
       following facts:

       ·   Each analysis tool may produce several results.

       ·   Some results may contain binary data not suitable for printing into
           a terminal window.

       ·   Some results may be split into a variable number of parts (this is
           mainly true for image results, which can consist of several *.png
           files).

       Note also that results have names to distinguish them when there are
       several. The names can be obtained by the method "result_spec".

       Here are the rules for how the method works:
560
561           Retrieving NAMED results:
562           -------------------------
563            results ('name1', ...)   => return results as they are, no storing into files
564
565            results ( { 'name1' => 'filename', ... } )  => store into 'filename', return 'filename'
566            results ( 'name1=filename', ...)            => ditto
567
568            results ( { 'name1' => '-', ... } )         => send result to the STDOUT, do not return anything
569            results ( 'name1=-', ...)                   => ditto
570
571            results ( { 'name1' => '@', ... } )  => store into file whose name is invented by
572                                                    this method, perhaps using RESULT_NAME_TEMPLATE env
573            results ( 'name1=@', ...)            => ditto
574
            results ( { 'name1' => '?', ... } )  => find out the type of this result and then use
                                                    {'name1'=>'@'} for binary files, and a regular
                                                    return for non-binary files
            results ( 'name1=?', ...)            => ditto
579
580           Retrieving ALL results:
581           -----------------------
582            results()     => return all results as they are, no storing into files
583
584            results ('@') => return all results, as if each of them given
585                             as {'name' => '@'} (see above)
586
587            results ('?') => return all results, as if each of them given
588                             as {'name' => '?'} (see above)
589
590           Misc:
591           -----
            * any result can be returned as a scalar value, or as an array reference
              (the latter is used for results consisting of several parts, such as images);
              this applies regardless of whether the returned result is the result itself
              or a filename created for the result

            * look in the documentation of the panalysis[.PLS] script for examples
              (especially how to use various templates for inventing file names)
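
       The string forms in the rules above can be summarized as a small
       classifier. This is only an illustration of the rules as documented:
       "classify_request" and its action labels are hypothetical and do not
       correspond to anything in the BioPerl implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: describe what a single string argument to
# results() asks for, following the rules listed above.
sub classify_request {
    my ($arg) = @_;

    # A bare '@' or '?' applies the rule to ALL results.
    return { name => '*', action => 'invent_filename' } if $arg eq '@';
    return { name => '*', action => 'decide_by_type' }  if $arg eq '?';

    # Otherwise the argument is 'name' or 'name=target'.
    my ($name, $target) = split /=/, $arg, 2;
    return { name => $name, action => 'return_as_is' } unless defined $target;
    return { name => $name, action => 'to_stdout' }       if $target eq '-';
    return { name => $name, action => 'invent_filename' } if $target eq '@';
    return { name => $name, action => 'decide_by_type' }  if $target eq '?';
    return { name => $name, action => 'store_in_file', file => $target };
}

for my $arg ( 'outseq', 'outseq=result.txt', 'report=-', 'outseq=@', '?' ) {
    my $req = classify_request($arg);
    printf "%-18s => %s for %s\n", $arg, $req->{action}, $req->{name};
}
```

       The hash-reference form maps onto the same actions, with the hash
       value playing the role of the text after the equal sign.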
599
600       result
601
602        Usage   : $job->result (...)
603        Returns : the first result
604        Args    : see 'results'
605
606       remove
607
608        Usage   : $job->remove
609        Returns : 1
610        Args    : none
611
       The job object is not actually removed at this time, but it is marked
       (by setting the "_destroy_on_exit" attribute to 1) as ready for
       deletion when the client program ends (including a request to the
       server to forget the job mirror object on the server side).
616
perl v5.8.8                       2007-05-07                 Bio::AnalysisI(3)