Boulder::Blast(3)     User Contributed Perl Documentation     Boulder::Blast(3)

NAME
Boulder::Blast - Parse and read BLAST files

SYNOPSIS
    use Boulder::Blast;

    # parse from a single file
    $blast = Boulder::Blast->parse('run3.blast');

    # parse and read a set of blast output files
    $stream = Boulder::Blast->new('run3.blast','run4.blast');
    while ($blast = $stream->get) {
        # do something with $blast object
    }

    # parse and read a whole directory of blast runs
    $stream = Boulder::Blast->new(<*.blast>);
    while ($blast = $stream->get) {
        # do something with $blast object
    }

    # parse and read from STDIN
    $stream = Boulder::Blast->new;
    while ($blast = $stream->get) {
        # do something with $blast object
    }

    # parse and read as a filehandle
    $stream = Boulder::Blast->newFh(<*.blast>);
    while ($blast = <$stream>) {
        # do something with $blast object
    }

    # once you have a $blast object, you can get info about it:
    $query = $blast->Blast_query;
    @hits  = $blast->Blast_hits;
    foreach $hit (@hits) {
        $hit_sequence = $hit->Name;    # get the ID
        $significance = $hit->Signif;  # get the significance
        @hsps = $hit->Hsps;            # list of HSPs
        foreach $hsp (@hsps) {
            $query   = $hsp->Query;    # query sequence
            $subject = $hsp->Subject;  # subject sequence
            $signif  = $hsp->Signif;   # significance of HSP
        }
    }

DESCRIPTION
The Boulder::Blast class parses the output of the Washington University
(WU) or National Center for Biotechnology Information (NCBI) series of
BLAST programs and turns it into Stone records. You may then use the
standard Stone access methods to retrieve information about the BLAST
run, or add the information to a Boulder stream.

The parser works equally well on the contents of a static file, or on
information read dynamically from a filehandle or pipe.

parse() Method

    $stone = Boulder::Blast->parse($file_path);
    $stone = Boulder::Blast->parse($filehandle);

The parse() method accepts a path to a file or a filehandle, parses its
contents, and returns a Boulder Stone object. The file path may be
absolute or relative to the current directory. The filehandle may be
specified as an IO::File object, a FileHandle object, or a reference to
a glob ("\*FILEHANDLE" notation). If you call parse() without any
arguments, it will try to parse the contents of standard input.

new() Method

    $stream = Boulder::Blast->new;
    $stream = Boulder::Blast->new($file [,@more_files]);
    $stream = Boulder::Blast->new(\*FILEHANDLE);

If you wish, you may create the parser first with Boulder::Blast->new(),
and then invoke the parser object's parse() method as many times as you
wish, producing a Stone object each time.

The following tags are defined in the parsed Blast Stone object:

Information about the program

These top-level tags provide information about the version of the BLAST
program itself.

Blast_program
    The name of the algorithm used to run the analysis. Possible values
    include:

        blastn
        blastp
        blastx
        tblastn
        tblastx
        fasta3
        fastx3
        fasty3
        tfasta3
        tfastx3
        tfasty3

Blast_version
    This gives the version of the program in whatever form appears on
    the banner page, e.g. "2.0a19-WashU".

Blast_program_date
    This gives the date on which the program was compiled, if and only
    if it appears on the banner page.

Information about the run

These top-level tags give information about the particular run, such as
the parameters that were used for the algorithm.

Blast_run_date
    This gives the date and time at which the similarity analysis was
    run, in the format "Fri Jul 6 09:32:36 1998".

Blast_parms
    This points to a subrecord containing information about the
    algorithm's runtime parameters. The following subtags are used.
    Others may be added in the future:

        Hspmax       the value of the -hspmax argument
        Expectation  the value of E
        Matrix       the matrix in use, e.g. BLOSUM62
        Ctxfactor    the value of the -ctxfactor argument
        Gapall       the value of the -gapall argument
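
A run date in the format shown for Blast_run_date can be converted into a date object, for instance to sort or filter runs by time. What follows is only a minimal sketch: it assumes the core Time::Piece module (bundled with Perl 5.10 and later) is available, parse_run_date is a hypothetical helper that is not part of Boulder::Blast, and the sample string is copied from the blastn record at the end of this page.

```perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical helper (not part of Boulder::Blast): convert a
# Blast_run_date string into a Time::Piece object.
sub parse_run_date {
    my ($raw) = @_;
    # Trim the string and collapse runs of whitespace, since the day of
    # the month may be padded with an extra space.
    my $clean = join ' ', split ' ', $raw;
    return Time::Piece->strptime($clean, '%a %b %d %H:%M:%S %Y');
}

# With the real module the string would come from $blast->Blast_run_date.
my $when = parse_run_date('Fri Nov  6 14:40:41 1998');
print $when->ymd, ' ', $when->hms, "\n";
```

The resulting object is interpreted without any time zone information, because the BLAST banner does not supply one in this format.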

Information about the query sequence and subject database

These top-level tags give information about the query sequence and the
database that was searched.

Blast_query
    The identifier for the search sequence, as defined by the FASTA
    format. This will be the first set of non-whitespace characters
    following the ">" character. In other words, the search sequence
    "name".

Blast_query_length
    The length of the query sequence, in base pairs (or in residues
    for a protein query).

Blast_db
    The Unix filesystem path to the subject database.

Blast_db_title
    The title of the subject database.

The search results: the Blast_hits tag

Each BLAST hit is represented by the tag Blast_hits. There may be
zero, one, or many such tags. They will be presented in reverse sorted
order of significance, i.e. most significant hit first.

Each Blast_hits tag is a Stone subrecord containing the following
subtags:

Name
    The name/identifier of the sequence that was hit.

Length
    The total length of the sequence that was hit.

Signif
    The significance of the hit. If there are multiple HSPs in the
    hit, this will be the most significant (smallest) value.

Identity
    The percent identity of the hit. If there are multiple HSPs, this
    will be the one with the highest percent identity.

Expect
    The expectation value for the hit. If there are multiple HSPs,
    this will be the lowest expectation value in the set.

Hsps
    One or more sub-sub-tags, pointing to a nested record containing
    information about each high-scoring segment pair (HSP). See the
    next section for details.
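
In the sample records at the end of this page, the Expect values carry a trailing comma ("3.0,") and the percentages end in a percent sign once unescaped, so a numeric comparison may need a little cleanup first. The sketch below shows one way to filter hits by expectation value. It runs on plain hashes standing in for hit subrecords; numeric_value and the cutoff are hypothetical, and whether your version of the module emits the trailing comma is something to check against your own output.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a raw tag value such as "3.0," or "30%"
# into a plain number by dropping a trailing comma or percent sign.
sub numeric_value {
    my ($raw) = @_;
    (my $n = $raw) =~ s/[,%]\s*$//;
    return $n + 0;
}

# Stand-in data copied from the blastp sample record. With the real
# module these values would come from $hit->Name, $hit->Expect and
# $hit->Identity.
my @hits = (
    { Name => 'C28H8.2', Expect => '3.0,', Identity => '30%' },
    { Name => 'ZK896.2', Expect => '4.7,', Identity => '24%' },
);

my $max_expect = 4.0;    # arbitrary cutoff, for illustration only
my @kept = grep { numeric_value($_->{Expect}) <= $max_expect } @hits;
print "$_->{Name}\n" for @kept;
```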

The Hsp records: the Hsps tag

Each Blast_hits tag will have at least one, and possibly several Hsps
tags, each one corresponding to a high-scoring segment pair (HSP).
These records contain detailed information about the hit, including the
alignments. Tags are as follows:

Signif
    The significance (P value) of this HSP.

Bits
    The number of bits of significance.

Expect
    Expectation value for this HSP.

Identity
    Percent identity.

Positives
    Percent positive matches.

Score
    The Smith-Waterman alignment score.

Orientation
    The word "plus" or "minus". This tag is only present for
    nucleotide searches, when the reverse complement match may be
    present.

Strand
    Depending on algorithm used, indicates complementarity of match and
    possibly the reading frame. This is copied out of the blast
    report. Possibilities include:

        "Plus / Minus" "Plus / Plus"  -- blastn algorithm
        "+1 / -2" "+2 / -2"           -- blastx, tblastx

Query_start
    Position at which the HSP starts in the query sequence (1-based
    indexing).

Query_end
    Position at which the HSP stops in the query sequence.

Subject_start
    Position at which the HSP starts in the subject (target) sequence.

Subject_end
    Position at which the HSP stops in the subject (target) sequence.

Query, Subject, Alignment
    These three tags contain strings which, together, create the gapped
    alignment of the query sequence with the subject sequence.

    For example, to print the alignment of the first HSP of the first
    match, you might say:

        $hsp = $blast->Blast_hits->Hsps;
        print join("\n",$hsp->Query,$hsp->Alignment,$hsp->Subject),"\n";

See the bottom of this manual page for an example BLAST run.
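
Printing the three strings one under another works for short HSPs, but long alignments are easier to read when broken into fixed-width blocks. The following is a rough sketch of that idea in plain Perl. format_alignment is a hypothetical helper, not something the module provides, and the short strings in the demonstration are made up; in practice the arguments would be $hsp->Query, $hsp->Alignment and $hsp->Subject.

```perl
use strict;
use warnings;

# Hypothetical helper: break the three alignment strings into blocks of
# $width columns so that long alignments stay readable.
sub format_alignment {
    my ($query, $alignment, $subject, $width) = @_;
    $width ||= 60;

    # Pad the shorter strings so substr() never reads past the end.
    for ($alignment, $subject) {
        my $missing = length($query) - length($_);
        $_ .= ' ' x $missing if $missing > 0;
    }

    my @blocks;
    for (my $pos = 0; $pos < length($query); $pos += $width) {
        push @blocks, join("\n",
            'Query:   ' . substr($query,     $pos, $width),
            '         ' . substr($alignment, $pos, $width),
            'Sbjct:   ' . substr($subject,   $pos, $width),
        );
    }
    return join("\n\n", @blocks);
}

# Made-up strings, wrapped at 4 columns to keep the demonstration short.
print format_alignment('ACGTACGTAC', '|||| |||||', 'ACGTTCGTAC', 4), "\n";
```

The "Query:" and "Sbjct:" labels are a layout choice borrowed from typical BLAST reports; any fixed-width prefix works as long as all three lines share it.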

This module has been extensively tested with WUBLAST, but very little
with NCBI BLAST. It probably will not work with PSI-BLAST or other
variants.

The author plans to adapt this module to parse other formats, as well
as non-BLAST formats such as the output of Fastn.

SEE ALSO
Boulder, Boulder::GenBank

AUTHOR
Lincoln Stein <lstein@cshl.org>.

Copyright (c) 1998-1999 Cold Spring Harbor Laboratory

This library is free software; you can redistribute it and/or modify it
under the same terms as Perl itself. See DISCLAIMER.txt for
disclaimers of warranty.

This output was generated by the quickblast.pl program, which is
located in the eg/ subdirectory of the Boulder distribution directory.
It is a typical blastn (nucleotide->nucleotide) run; however, long lines
(usually DNA sequences) have been truncated. Also note that per the
Boulder protocol, the percent sign (%) is escaped in the usual way, as
%25. It will be unescaped when reading the stream back in.

    Blast_run_date=Fri Nov 6 14:40:41 1998
    Blast_db_date=2:40 PM EST Nov 6, 1998
    Blast_parms={
      Hspmax=10
      Expectation=10
      Matrix=+5,-4
      Ctxfactor=2.00
    }
    Blast_program_date=05-Feb-1998
    Blast_db= /usr/tmp/quickblast18202aaaa
    Blast_version=2.0a19-WashU
    Blast_query=BCD207R
    Blast_db_title= test.fasta
    Blast_query_length=332
    Blast_program=blastn
    Blast_hits={
      Signif=3.5e-74
      Expect=3.5e-74,
      Name=BCD207R
      Identity=100%25
      Length=332
      Hsps={
        Subject=GTGCTTTCAAACATTGATGGATTCCTCCCCTTGACATATATATATACTTTGGGTTCCCGCAA...
        Signif=3.5e-74
        Length=332
        Bits=249.1
        Query_start=1
        Subject_end=332
        Query=GTGCTTTCAAACATTGATGGATTCCTCCCCTTGACATATATATATACTTTGGGTTCCCGCAA...
        Positives=100%25
        Expect=3.5e-74,
        Identity=100%25
        Query_end=332
        Orientation=plus
        Score=1660
        Strand=Plus / Plus
        Subject_start=1
        Alignment=||||||||||||||||||||||||||||||||||||||||||||||||||||||||||...
      }
    }
    =
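
The %25 sequences in the record above are escaped percent signs. The Boulder modules undo this when they read a stream back in, but if you process such a record as raw text you have to decode it yourself. Here is a minimal sketch, assuming the escapes are two-digit hexadecimal codes as in URLs; unescape_value is a hypothetical helper and not part of the Boulder distribution.

```perl
use strict;
use warnings;

# Hypothetical helper: decode %XX escapes (two hex digits) in a raw
# tag value taken from a Boulder-format text record.
sub unescape_value {
    my ($value) = @_;
    $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $value;
}

# Split a raw "Tag=value" line from the sample record and decode the value.
my ($tag, $raw) = split /=/, 'Identity=100%25', 2;
print "$tag: ", unescape_value($raw), "\n";
```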

Here is the output from a typical blastp (protein->protein) run. Long
lines have again been truncated.

    Blast_run_date=Fri Nov 6 14:37:23 1998
    Blast_db_date=2:36 PM EST Nov 6, 1998
    Blast_parms={
      Hspmax=10
      Expectation=10
      Matrix=BLOSUM62
      Ctxfactor=1.00
    }
    Blast_program_date=05-Feb-1998
    Blast_db= /usr/tmp/quickblast18141aaaa
    Blast_version=2.0a19-WashU
    Blast_query=YAL004W
    Blast_db_title= elegans.fasta
    Blast_query_length=216
    Blast_program=blastp
    Blast_hits={
      Signif=0.95
      Expect=3.0,
      Name=C28H8.2
      Identity=30%25
      Length=51
      Hsps={
        Subject=HMTVEFHVTSQSW---FGFEDHFHMIIR-AVNDENVGWGVRYLSMAF
        Signif=0.95
        Length=46
        Bits=15.8
        Query_start=100
        Subject_end=49
        Query=HLTQD-HGGDLFWGKVLGFTLKFNLNLRLTVNIDQLEWEVLHVSLHF
        Positives=52%25
        Expect=3.0,
        Identity=30%25
        Query_end=145
        Orientation=plus
        Score=45
        Subject_start=7
        Alignment=H+T + H     W    GF   F++ +R  VN + + W V ++S+ F
      }
    }
    Blast_hits={
      Signif=0.99
      Expect=4.7,
      Name=ZK896.2
      Identity=24%25
      Length=340
      Hsps={
        Subject=FSGKFTTFVLNKDQATLRMSSAEKTAEWNTAFDSRRGFF----TSGNYGL...
        Signif=0.99
        Length=101
        Bits=22.9
        Query_start=110
        Subject_end=243
        Query=FWGKVLGFTL-KFNLNLRLTVNIDQLEWEVLHVSLHFWVVEVSTDQTLSVE...
        Positives=41%25
        Expect=4.7,
        Identity=24%25
        Query_end=210
        Orientation=plus
        Score=65
        Subject_start=146
        Alignment=F GK   F L K    LR++      EW     S   +     T     +...
      }
    }
    =

perl v5.8.8                       2000-06-08                 Boulder::Blast(3)