Boulder::Blast(3)     User Contributed Perl Documentation    Boulder::Blast(3)

NAME

       Boulder::Blast - Parse and read BLAST files

SYNOPSIS

         use Boulder::Blast;

         # parse from a single file
         $blast = Boulder::Blast->parse('run3.blast');

         # parse and read a set of blast output files
         $stream = Boulder::Blast->new('run3.blast','run4.blast');
         while ($blast = $stream->get) {
            # do something with $blast object
         }

         # parse and read a whole directory of blast runs
         $stream = Boulder::Blast->new(<*.blast>);
         while ($blast = $stream->get) {
            # do something with $blast object
         }

         # parse and read from STDIN
         $stream = Boulder::Blast->new;
         while ($blast = $stream->get) {
            # do something with $blast object
         }

         # parse and read as a filehandle
         $stream = Boulder::Blast->newFh(<*.blast>);
         while ($blast = <$stream>) {
            # do something with $blast object
         }

         # once you have a $blast object, you can get info about it:
         $query = $blast->Blast_query;
         @hits  = $blast->Blast_hits;
         foreach $hit (@hits) {
            $hit_sequence = $hit->Name;    # get the ID
            $significance = $hit->Signif;  # get the significance
            @hsps = $hit->Hsps;            # list of HSPs
            foreach $hsp (@hsps) {
              $query   = $hsp->Query;      # query sequence
              $subject = $hsp->Subject;    # subject sequence
              $signif  = $hsp->Signif;     # significance of HSP
            }
         }

DESCRIPTION

       The Boulder::Blast class parses the output of the Washington University
       (WU) or National Center for Biotechnology Information (NCBI) series of
       BLAST programs and turns it into Stone records.  You may then use the
       standard Stone access methods to retrieve information about the BLAST
       run, or add the information to a Boulder stream.

       The parser works equally well on the contents of a static file, or on
       information read dynamically from a filehandle or pipe.

METHODS

       parse() Method

           $stone = Boulder::Blast->parse($file_path);
           $stone = Boulder::Blast->parse($filehandle);

       The parse() method accepts a path to a file or a filehandle, parses its
       contents, and returns a Boulder Stone object.  The file path may be
       absolute or relative to the current directory.  The filehandle may be
       specified as an IO::File object, a FileHandle object, or a reference to
       a glob ("\*FILEHANDLE" notation).  If you call parse() without any
       arguments, it will try to parse the contents of standard input.

       new() Method

           $stream = Boulder::Blast->new;
           $stream = Boulder::Blast->new($file [,@more_files]);
           $stream = Boulder::Blast->new(\*FILEHANDLE);

       If you wish, you may create the parser first with Boulder::Blast->new(),
       and then invoke the parser object's parse() method as many times as you
       wish to, producing a Stone object each time.

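       As a minimal sketch of the streaming style shown in the SYNOPSIS, the
       helper below drains any object that offers a get() method and hands
       each parsed record to a callback.  The name for_each_report() is
       hypothetical and is not part of Boulder::Blast; the commented usage
       assumes that the module is installed and that the named files exist.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Boulder::Blast): call $callback once for
# every record returned by $stream->get, and return the number of records.
sub for_each_report {
    my ($stream, $callback) = @_;
    my $count = 0;
    while (my $blast = $stream->get) {
        $callback->($blast);
        $count++;
    }
    return $count;
}

# Usage with the real module (assumes Boulder::Blast is installed):
#   use Boulder::Blast;
#   my $stream = Boulder::Blast->new('run3.blast', 'run4.blast');
#   for_each_report($stream, sub { print $_[0]->Blast_query, "\n" });
```

       Keeping the loop in one place means the per-report code does not need
       to know whether the stream was built from files or from standard
       input.
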
TAGS

       The following tags are defined in the parsed Blast Stone object:

       Information about the program

       These top-level tags provide information about the version of the BLAST
       program itself.

       Blast_program
           The name of the algorithm used to run the analysis.  Possible
           values include:

                   blastn
                   blastp
                   blastx
                   tblastn
                   tblastx
                   fasta3
                   fastx3
                   fasty3
                   tfasta3
                   tfastx3
                   tfasty3

       Blast_version
           This gives the version of the program in whatever form appears on
           the banner page, e.g. "2.0a19-WashU".

       Blast_program_date
           This gives the date at which the program was compiled, if and only
           if it appears on the banner page.

       Information about the run

       These top-level tags give information about the particular run, such as
       the parameters that were used for the algorithm.

       Blast_run_date
           This gives the date and time at which the similarity analysis was
           run, in the format "Fri Jul  6 09:32:36 1998".

       Blast_parms
           This points to a subrecord containing information about the
           algorithm's runtime parameters.  The following subtags are used.
           Others may be added in the future:

                   Hspmax          the value of the -hspmax argument
                   Expectation     the value of E
                   Matrix          the matrix in use, e.g. BLOSUM62
                   Ctxfactor       the value of the -ctxfactor argument
                   Gapall          the value of the -gapall argument

       Information about the query sequence and subject database

       These top-level tags give information about the query sequence and the
       database that was searched.

       Blast_query
           The identifier for the search sequence, as defined by the FASTA
           format.  This will be the first set of non-whitespace characters
           following the ">" character.  In other words, the search sequence
           "name".

       Blast_query_length
           The length of the query sequence, in base pairs.

       Blast_db
           The Unix filesystem path to the subject database.

       Blast_db_title
           The title of the subject database.

       The search results: the Blast_hits tag

       Each BLAST hit is represented by the tag Blast_hits.  There may be
       zero, one, or many such tags.  They are presented in order of
       significance, most significant hit first.

       Each Blast_hits tag is a Stone subrecord containing the following
       subtags:

       Name
           The name/identifier of the sequence that was hit.

       Length
           The total length of the sequence that was hit.

       Signif
           The significance of the hit.  If there are multiple HSPs in the
           hit, this will be the most significant (smallest) value.

       Identity
           The percent identity of the hit.  If there are multiple HSPs, this
           will be the one with the highest percent identity.

       Expect
           The expectation value for the hit.  If there are multiple HSPs,
           this will be the lowest expectation value in the set.

       Hsps
           One or more sub-sub-tags, pointing to a nested record containing
           information about each high-scoring segment pair (HSP).  See the
           next section for details.

       The Hsp records: the Hsps tag

       Each Blast_hits tag will have at least one, and possibly several, Hsps
       tags, each one corresponding to a high-scoring segment pair (HSP).
       These records contain detailed information about the hit, including the
       alignments.  Tags are as follows:

       Signif
           The significance (P value) of this HSP.

       Bits
           The number of bits of significance.

       Expect
           Expectation value for this HSP.

       Identity
           Percent identity.

       Positives
           Percent positive matches.

       Score
           The Smith-Waterman alignment score.

       Orientation
           The word "plus" or "minus".  This tag is only present for
           nucleotide searches, when the reverse complement match may be
           present.

       Strand
           Depending on algorithm used, indicates complementarity of match and
           possibly the reading frame.  This is copied out of the blast
           report.  Possibilities include:

            "Plus / Minus" "Plus / Plus" -- blastn algorithm
            "+1 / -2" "+2 / -2"         -- blastx, tblastx

       Query_start
           Position at which the HSP starts in the query sequence (1-based
           indexing).

       Query_end
           Position at which the HSP stops in the query sequence.

       Subject_start
           Position at which the HSP starts in the subject (target) sequence.

       Subject_end
           Position at which the HSP stops in the subject (target) sequence.

       Query, Subject, Alignment
           These three tags contain strings which, together, create the gapped
           alignment of the query sequence with the subject sequence.

           For example, to print the alignment of the first HSP of the first
           match, you might say:

             $hsp = $blast->Blast_hits->Hsps;
             print join("\n",$hsp->Query,$hsp->Alignment,$hsp->Subject),"\n";

       See the bottom of this manual page for an example BLAST run.

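       The accessors described in this section combine naturally into small
       filters.  The sketch below keeps only the hits whose Signif value is at
       or below a cutoff, and computes how many query positions an HSP covers
       from its Query_start and Query_end tags.  Both function names are
       hypothetical.  The code assumes that Signif holds a plain number such
       as 3.5e-74 and that the coordinates are inclusive, which agrees with
       the example runs at the end of this page (Query_start=100,
       Query_end=145, Length=46).

```perl
use strict;
use warnings;

# Hypothetical helper: return the hits of a parsed report whose Signif
# value is less than or equal to $cutoff (smaller means more significant).
sub significant_hits {
    my ($blast, $cutoff) = @_;
    return grep { $_->Signif() <= $cutoff } $blast->Blast_hits;
}

# Hypothetical helper: number of query positions covered by one HSP.
# Coordinates are 1-based and treated as inclusive; abs() is a guard in
# case a report lists the end before the start (an assumption).
sub query_span {
    my ($hsp) = @_;
    return abs($hsp->Query_end() - $hsp->Query_start()) + 1;
}

# Usage with a parsed report (assumes $blast came from Boulder::Blast):
#   for my $hit (significant_hits($blast, 1e-5)) {
#       print $hit->Name, "\t", query_span($_), "\n" for $hit->Hsps;
#   }
```
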
CAVEATS

       This module has been extensively tested with WUBLAST, but very little
       with NCBI BLAST.  It probably will not work with PSI-BLAST or other
       variants.

       The author plans to adapt this module to parse other formats, as well
       as non-BLAST formats such as the output of Fastn.

SEE ALSO

       Boulder, Boulder::GenBank

AUTHOR

       Lincoln Stein <lstein@cshl.org>.

       Copyright (c) 1998-1999 Cold Spring Harbor Laboratory

       This library is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.  See DISCLAIMER.txt for
       disclaimers of warranty.

EXAMPLE BLASTN RUN

       This output was generated by the quickblast.pl program, which is
       located in the eg/ subdirectory of the Boulder distribution directory.
       It is a typical blastn (nucleotide->nucleotide) run; however, long lines
       (usually DNA sequences) have been truncated.  Also note that per the
       Boulder protocol, the percent sign (%) is escaped in the usual way.  It
       will be unescaped when reading the stream back in.

        Blast_run_date=Fri Nov  6 14:40:41 1998
        Blast_db_date=2:40 PM EST Nov 6, 1998
        Blast_parms={
          Hspmax=10
          Expectation=10
          Matrix=+5,-4
          Ctxfactor=2.00
        }
        Blast_program_date=05-Feb-1998
        Blast_db= /usr/tmp/quickblast18202aaaa
        Blast_version=2.0a19-WashU
        Blast_query=BCD207R
        Blast_db_title= test.fasta
        Blast_query_length=332
        Blast_program=blastn
        Blast_hits={
          Signif=3.5e-74
          Expect=3.5e-74,
          Name=BCD207R
          Identity=100%25
          Length=332
          Hsps={
            Subject=GTGCTTTCAAACATTGATGGATTCCTCCCCTTGACATATATATATACTTTGGGTTCCCGCAA...
            Signif=3.5e-74
            Length=332
            Bits=249.1
            Query_start=1
            Subject_end=332
            Query=GTGCTTTCAAACATTGATGGATTCCTCCCCTTGACATATATATATACTTTGGGTTCCCGCAA...
            Positives=100%25
            Expect=3.5e-74,
            Identity=100%25
            Query_end=332
            Orientation=plus
            Score=1660
            Strand=Plus / Plus
            Subject_start=1
            Alignment=||||||||||||||||||||||||||||||||||||||||||||||||||||||||||...
          }
        }
        =

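       The escaping mentioned above shows up in the output as %25 standing in
       for a literal percent sign.  As a minimal sketch, assuming the escape
       is the URL-style %XX hexadecimal form (which is what Identity=100%25
       suggests), the hypothetical helper below decodes a single value by
       hand.  Reading the records back through a Boulder stream performs this
       step for you, so the helper is only of use when inspecting the raw
       text.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the Boulder distribution): replace each
# %XX hexadecimal escape in a raw tag value with the character it encodes.
sub unescape_boulder_value {
    my ($value) = @_;
    $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $value;
}

print unescape_boulder_value('100%25'), "\n";   # prints "100%"
```
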
Example BLASTP run

       Here is the output from a typical blastp (protein->protein) run.  Long
       lines have again been truncated.

        Blast_run_date=Fri Nov  6 14:37:23 1998
        Blast_db_date=2:36 PM EST Nov 6, 1998
        Blast_parms={
          Hspmax=10
          Expectation=10
          Matrix=BLOSUM62
          Ctxfactor=1.00
        }
        Blast_program_date=05-Feb-1998
        Blast_db= /usr/tmp/quickblast18141aaaa
        Blast_version=2.0a19-WashU
        Blast_query=YAL004W
        Blast_db_title= elegans.fasta
        Blast_query_length=216
        Blast_program=blastp
        Blast_hits={
          Signif=0.95
          Expect=3.0,
          Name=C28H8.2
          Identity=30%25
          Length=51
          Hsps={
            Subject=HMTVEFHVTSQSW---FGFEDHFHMIIR-AVNDENVGWGVRYLSMAF
            Signif=0.95
            Length=46
            Bits=15.8
            Query_start=100
            Subject_end=49
            Query=HLTQD-HGGDLFWGKVLGFTLKFNLNLRLTVNIDQLEWEVLHVSLHF
            Positives=52%25
            Expect=3.0,
            Identity=30%25
            Query_end=145
            Orientation=plus
            Score=45
            Subject_start=7
            Alignment=H+T + H     W    GF   F++ +R  VN + + W V ++S+ F
          }
        }
        Blast_hits={
          Signif=0.99
          Expect=4.7,
          Name=ZK896.2
          Identity=24%25
          Length=340
          Hsps={
            Subject=FSGKFTTFVLNKDQATLRMSSAEKTAEWNTAFDSRRGFF----TSGNYGL...
            Signif=0.99
            Length=101
            Bits=22.9
            Query_start=110
            Subject_end=243
            Query=FWGKVLGFTL-KFNLNLRLTVNIDQLEWEVLHVSLHFWVVEVSTDQTLSVE...
            Positives=41%25
            Expect=4.7,
            Identity=24%25
            Query_end=210
            Orientation=plus
            Score=65
            Subject_start=146
            Alignment=F GK   F L K    LR++      EW     S   +     T     +...
          }
        }
        =

perl v5.8.8                       2000-06-08                 Boulder::Blast(3)