Boulder::Blast(3)     User Contributed Perl Documentation    Boulder::Blast(3)

NAME

       Boulder::Blast - Parse and read BLAST files

SYNOPSIS

         use Boulder::Blast;

         # parse from a single file
         $blast = Boulder::Blast->parse('run3.blast');

         # parse and read a set of blast output files
         $stream = Boulder::Blast->new('run3.blast','run4.blast');
         while ($blast = $stream->get) {
            # do something with $blast object
         }

         # parse and read a whole directory of blast runs
         $stream = Boulder::Blast->new(<*.blast>);
         while ($blast = $stream->get) {
            # do something with $blast object
         }

         # parse and read from STDIN
         $stream = Boulder::Blast->new;
         while ($blast = $stream->get) {
            # do something with $blast object
         }

         # parse and read as a filehandle
         $stream = Boulder::Blast->newFh(<*.blast>);
         while ($blast = <$stream>) {
            # do something with $blast object
         }

         # once you have a $blast object, you can get info about it:
         $query = $blast->Blast_query;
         @hits  = $blast->Blast_hits;
         foreach $hit (@hits) {
            $hit_sequence = $hit->Name;    # get the ID
            $significance = $hit->Signif;  # get the significance
            @hsps = $hit->Hsps;            # list of HSPs
            foreach $hsp (@hsps) {
              $query   = $hsp->Query;      # query sequence
              $subject = $hsp->Subject;    # subject sequence
              $signif  = $hsp->Signif;     # significance of HSP
            }
         }

DESCRIPTION

       The Boulder::Blast class parses the output of the Washington University
       (WU) or National Center for Biotechnology Information (NCBI) series of
       BLAST programs and turns it into Stone records.  You may then use the
       standard Stone access methods to retrieve information about the BLAST
       run, or add the information to a Boulder stream.

       The parser works equally well on the contents of a static file, or on
       information read dynamically from a filehandle or pipe.
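
       As a minimal sketch of the access pattern described above, the helper
       below walks a parsed record using only the accessors shown in the
       SYNOPSIS.  The name summarize_blast() is hypothetical and not part of
       Boulder::Blast; it accepts any object that answers Blast_query,
       Blast_hits, Name, Signif and Hsps.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Boulder::Blast): build one tab-separated
# line per hit, containing the query name, hit name, hit significance and
# the number of HSPs in that hit.
sub summarize_blast {
    my ($blast) = @_;
    my $query = $blast->Blast_query;
    my @lines;
    for my $hit ($blast->Blast_hits) {
        my @hsps = $hit->Hsps;    # list context: all HSP subrecords
        push @lines, sprintf("%s\t%s\t%s\t%d",
                             $query, $hit->Name, $hit->Signif, scalar @hsps);
    }
    return @lines;
}

# Typical use, assuming Boulder::Blast is installed:
#   my $stream = Boulder::Blast->new('run3.blast');
#   while (my $blast = $stream->get) {
#       print "$_\n" for summarize_blast($blast);
#   }
```

       Keeping the helper independent of how the record was obtained means
       the same code serves records from parse(), get() or a newFh() handle.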

METHODS

   parse() Method
           $stone = Boulder::Blast->parse($file_path);
           $stone = Boulder::Blast->parse($filehandle);

       The parse() method accepts a path to a file or a filehandle, parses its
       contents, and returns a Boulder Stone object.  The file path may be
       absolute or relative to the current directory.  The filehandle may be
       specified as an IO::File object, a FileHandle object, or a reference to
       a glob ("\*FILEHANDLE" notation).  If you call parse() without any
       arguments, it will try to parse the contents of standard input.

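       The three filehandle forms listed above can be produced with core
       Perl alone.  The sketch below opens one file in each style; the helper
       name blast_handles() is hypothetical, and the call to parse() is shown
       only in a comment because it assumes Boulder::Blast is installed.

```perl
use strict;
use warnings;
use IO::File;
use FileHandle;

# Hypothetical helper: open the same file three ways, returning an IO::File
# object, a FileHandle object and a glob reference, in that order.
sub blast_handles {
    my ($path) = @_;
    my $io = IO::File->new($path, 'r')   or die "Cannot open $path: $!";
    my $fh = FileHandle->new($path, 'r') or die "Cannot open $path: $!";
    open(BLAST, '<', $path)              or die "Cannot open $path: $!";
    return ($io, $fh, \*BLAST);
}

# Assuming Boulder::Blast is installed, each handle could then be parsed:
#   for my $handle (blast_handles('run3.blast')) {
#       my $stone = Boulder::Blast->parse($handle);
#   }
```
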
   new() Method
           $stream = Boulder::Blast->new;
           $stream = Boulder::Blast->new($file [,@more_files]);
           $stream = Boulder::Blast->new(\*FILEHANDLE);

       If you wish, you may create the parser first with
       Boulder::Blast->new(), and then invoke the parser object's parse()
       method as many times as you wish, producing a Stone object each time.

TAGS

       The following tags are defined in the parsed Blast Stone object:

   Information about the program
       These top-level tags provide information about the version of the BLAST
       program itself.

       Blast_program
           The name of the algorithm used to run the analysis.  Possible
           values include:

                   blastn
                   blastp
                   blastx
                   tblastn
                   tblastx
                   fasta3
                   fastx3
                   fasty3
                   tfasta3
                   tfastx3
                   tfasty3

       Blast_version
           This gives the version of the program in whatever form appears on
           the banner page, e.g. "2.0a19-WashU".

       Blast_program_date
           This gives the date on which the program was compiled, if and only
           if it appears on the banner page.

   Information about the run
       These top-level tags give information about the particular run, such as
       the parameters that were used for the algorithm.

       Blast_run_date
           This gives the date and time at which the similarity analysis was
           run, in the format "Fri Jul  6 09:32:36 1998".

       Blast_parms
           This points to a subrecord containing information about the
           algorithm's runtime parameters.  The following subtags are used.
           Others may be added in the future:

                   Hspmax          the value of the -hspmax argument
                   Expectation     the value of E
                   Matrix          the matrix in use, e.g. BLOSUM62
                   Ctxfactor       the value of the -ctxfactor argument
                   Gapall          the value of the -gapall argument

   Information about the query sequence and subject database
       These top-level tags give information about the query sequence and the
       database that was searched.

       Blast_query
           The identifier for the search sequence, as defined by the FASTA
           format.  This will be the first set of non-whitespace characters
           following the ">" character.  In other words, the search sequence
           "name".

       Blast_query_length
           The length of the query sequence, in base pairs.

       Blast_db
           The Unix filesystem path to the subject database.

       Blast_db_title
           The title of the subject database.

   The search results: the Blast_hits tag
       Each BLAST hit is represented by the tag Blast_hits.  There may be
       zero, one, or many such tags.  They are presented in order of
       significance, most significant hit first.

       Each Blast_hits tag is a Stone subrecord containing the following
       subtags:

       Name
           The name/identifier of the sequence that was hit.

       Length
           The total length of the sequence that was hit.

       Signif
           The significance of the hit.  If there are multiple HSPs in the
           hit, this will be the most significant (smallest) value.

       Identity
           The percent identity of the hit.  If there are multiple HSPs, this
           will be the one with the highest percent identity.

       Expect
           The expectation value for the hit.  If there are multiple HSPs,
           this will be the lowest expectation value in the set.

       Hsps
           One or more sub-sub-tags, pointing to a nested record containing
           information about each high-scoring segment pair (HSP).  See the
           next section for details.

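       The hit-level rules above (smallest Signif, lowest Expect, highest
       Identity, most significant hit first) can be sketched as follows.
       Plain hashes stand in for Stone subrecords, values are assumed to be
       plain numbers apart from a trailing "%" on Identity, and the helper
       names are hypothetical; this is not the module's own code.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Strip a trailing "%" so that "30%" can be compared numerically.
sub percent {
    my ($value) = @_;
    $value =~ s/%\z//;
    return $value + 0;
}

# Hypothetical helper: derive hit-level values from a list of HSP hashes.
sub hit_summary {
    my @hsps = @_;
    return {
        Signif   => min(map { $_->{Signif} } @hsps),             # smallest P
        Expect   => min(map { $_->{Expect} } @hsps),             # lowest E
        Identity => max(map { percent($_->{Identity}) } @hsps),  # highest %
    };
}

# Hypothetical helper: order hits so the most significant comes first.
sub sort_hits {
    my @sorted = sort { $a->{Signif} <=> $b->{Signif} } @_;
    return @sorted;
}
```
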
   The Hsp records: the Hsps tag
       Each Blast_hits tag will have at least one, and possibly several Hsps
       tags, each one corresponding to a high-scoring segment pair (HSP).
       These records contain detailed information about the hit, including the
       alignments.  Tags are as follows:

       Signif
           The significance (P value) of this HSP.

       Bits
           The number of bits of significance.

       Expect
           Expectation value for this HSP.

       Identity
           Percent identity.

       Positives
           Percent positive matches.

       Score
           The Smith-Waterman alignment score.

       Orientation
           The word "plus" or "minus".  This tag is only present for
           nucleotide searches, when the reverse complement match may be
           present.

       Strand
           Depending on algorithm used, indicates complementarity of match and
           possibly the reading frame.  This is copied out of the blast
           report.  Possibilities include:

            "Plus / Minus" "Plus / Plus" -- blastn algorithm
            "+1 / -2" "+2 / -2"         -- blastx, tblastx

       Query_start
           Position at which the HSP starts in the query sequence (1-based
           indexing).

       Query_end
           Position at which the HSP stops in the query sequence.

       Subject_start
           Position at which the HSP starts in the subject (target) sequence.

       Subject_end
           Position at which the HSP stops in the subject (target) sequence.

       Query, Subject, Alignment
           These three tags contain strings which, together, create the gapped
           alignment of the query sequence with the subject sequence.

           For example, to print the alignment of the first HSP of the first
           match, you might say:

             $hsp = $blast->Blast_hits->Hsps;
             print join("\n",$hsp->Query,$hsp->Alignment,$hsp->Subject),"\n";

       See the bottom of this manual page for example BLAST runs.
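
       Real alignments are often longer than a terminal line.  As a sketch,
       the hypothetical helper below (not part of Boulder::Blast) takes the
       three strings returned by Query, Alignment and Subject and stacks them
       in fixed-width blocks; the default width of 60 is an arbitrary choice.

```perl
use strict;
use warnings;

# Hypothetical helper: format the Query, Alignment and Subject strings of one
# HSP as stacked blocks of at most $width columns each.
sub format_alignment {
    my ($query, $align, $subject, $width) = @_;
    $width ||= 60;
    my $len = length $query;
    # Pad shorter strings with spaces so every substr() stays in range.
    $_ = sprintf('%-*s', $len, $_) for ($align, $subject);
    my @blocks;
    for (my $pos = 0; $pos < $len; $pos += $width) {
        push @blocks, join "\n",
            'Query:   ' . substr($query,   $pos, $width),
            '         ' . substr($align,   $pos, $width),
            'Subject: ' . substr($subject, $pos, $width);
    }
    return join("\n\n", @blocks) . "\n";
}

# Typical use with an HSP record:
#   print format_alignment($hsp->Query, $hsp->Alignment, $hsp->Subject);
```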

CAVEATS

       This module has been extensively tested with WUBLAST, but very little
       with NCBI BLAST.  It probably will not work with PSI Blast or other
       variants.

       The author plans to adapt this module to parse other BLAST variants,
       as well as non-BLAST formats such as the output of Fastn.

SEE ALSO

       Boulder, Boulder::GenBank

AUTHOR

       Lincoln Stein <lstein@cshl.org>.

       Copyright (c) 1998-1999 Cold Spring Harbor Laboratory

       This library is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.  See DISCLAIMER.txt for
       disclaimers of warranty.

EXAMPLE BLASTN RUN

       This output was generated by the quickblast.pl program, which is
       located in the eg/ subdirectory of the Boulder distribution directory.
       It is a typical blastn (nucleotide->nucleotide) run; however, long lines
       (usually DNA sequences) have been truncated.  Also note that per the
       Boulder protocol, the percent sign (%) is escaped in the usual way.  It
       will be unescaped when reading the stream back in.

        Blast_run_date=Fri Nov  6 14:40:41 1998
        Blast_db_date=2:40 PM EST Nov 6, 1998
        Blast_parms={
          Hspmax=10
          Expectation=10
          Matrix=+5,-4
          Ctxfactor=2.00
        }
        Blast_program_date=05-Feb-1998
        Blast_db= /usr/tmp/quickblast18202aaaa
        Blast_version=2.0a19-WashU
        Blast_query=BCD207R
        Blast_db_title= test.fasta
        Blast_query_length=332
        Blast_program=blastn
        Blast_hits={
          Signif=3.5e-74
          Expect=3.5e-74,
          Name=BCD207R
          Identity=100%25
          Length=332
          Hsps={
            Subject=GTGCTTTCAAACATTGATGGATTCCTCCCCTTGACATATATATATACTTTGGGTTCCCGCAA...
            Signif=3.5e-74
            Length=332
            Bits=249.1
            Query_start=1
            Subject_end=332
            Query=GTGCTTTCAAACATTGATGGATTCCTCCCCTTGACATATATATATACTTTGGGTTCCCGCAA...
            Positives=100%25
            Expect=3.5e-74,
            Identity=100%25
            Query_end=332
            Orientation=plus
            Score=1660
            Strand=Plus / Plus
            Subject_start=1
            Alignment=||||||||||||||||||||||||||||||||||||||||||||||||||||||||||...
          }
        }
        =
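
       The "%25" values in the record above are escaped percent signs.  The
       sketch below assumes the escaping is URL-style (a "%" followed by two
       hexadecimal digits), which is what the example suggests; the helper
       names are hypothetical, only "%" itself is escaped here, and Boulder
       performs this conversion for you when it reads or writes a stream.

```perl
use strict;
use warnings;

# Hypothetical helper: escape literal percent signs as "%25".
# Any other reserved characters are outside the scope of this sketch.
sub boulder_escape {
    my ($value) = @_;
    $value =~ s/%/%25/g;
    return $value;
}

# Hypothetical helper: turn each "%XX" hex pair back into its character.
sub boulder_unescape {
    my ($value) = @_;
    $value =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $value;
}
```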

EXAMPLE BLASTP RUN

       Here is the output from a typical blastp (protein->protein) run.  Long
       lines have again been truncated.

        Blast_run_date=Fri Nov  6 14:37:23 1998
        Blast_db_date=2:36 PM EST Nov 6, 1998
        Blast_parms={
          Hspmax=10
          Expectation=10
          Matrix=BLOSUM62
          Ctxfactor=1.00
        }
        Blast_program_date=05-Feb-1998
        Blast_db= /usr/tmp/quickblast18141aaaa
        Blast_version=2.0a19-WashU
        Blast_query=YAL004W
        Blast_db_title= elegans.fasta
        Blast_query_length=216
        Blast_program=blastp
        Blast_hits={
          Signif=0.95
          Expect=3.0,
          Name=C28H8.2
          Identity=30%25
          Length=51
          Hsps={
            Subject=HMTVEFHVTSQSW---FGFEDHFHMIIR-AVNDENVGWGVRYLSMAF
            Signif=0.95
            Length=46
            Bits=15.8
            Query_start=100
            Subject_end=49
            Query=HLTQD-HGGDLFWGKVLGFTLKFNLNLRLTVNIDQLEWEVLHVSLHF
            Positives=52%25
            Expect=3.0,
            Identity=30%25
            Query_end=145
            Orientation=plus
            Score=45
            Subject_start=7
            Alignment=H+T + H     W    GF   F++ +R  VN + + W V ++S+ F
          }
        }
        Blast_hits={
          Signif=0.99
          Expect=4.7,
          Name=ZK896.2
          Identity=24%25
          Length=340
          Hsps={
            Subject=FSGKFTTFVLNKDQATLRMSSAEKTAEWNTAFDSRRGFF----TSGNYGL...
            Signif=0.99
            Length=101
            Bits=22.9
            Query_start=110
            Subject_end=243
            Query=FWGKVLGFTL-KFNLNLRLTVNIDQLEWEVLHVSLHFWVVEVSTDQTLSVE...
            Positives=41%25
            Expect=4.7,
            Identity=24%25
            Query_end=210
            Orientation=plus
            Score=65
            Subject_start=146
            Alignment=F GK   F L K    LR++      EW     S   +     T     +...
          }
        }
        =
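
       The coordinate tags in the first HSP above can be checked against its
       aligned strings.  With 1-based inclusive coordinates, Query_start=100
       and Query_end=145 cover 46 residues, which matches both the HSP's
       Length and the Query string once its gap character is removed.  The
       helpers below are a hypothetical sketch; the use of abs() assumes
       that some reports may list start after end, which this page does not
       confirm.

```perl
use strict;
use warnings;

# Hypothetical helper: number of residues covered by 1-based inclusive
# coordinates. abs() tolerates start > end (an assumption, see above).
sub hsp_span {
    my ($start, $end) = @_;
    return abs($end - $start) + 1;
}

# Hypothetical helper: count the characters of an aligned string that are
# not gap characters ("-").
sub ungapped_length {
    my ($aligned) = @_;
    return ($aligned =~ tr/-//c);
}

# Values taken from the first HSP of the blastp example:
#   hsp_span(100, 145)                                        query span
#   ungapped_length('HLTQD-HGGDLFWGKVLGFTLKFNLNLRLTVNIDQLEWEVLHVSLHF')
```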

perl v5.12.0                      2002-02-04                 Boulder::Blast(3)