Bio::Restriction::EnzymeI(3)  User Contributed Perl Documentation  Bio::Restriction::EnzymeI(3)

NAME
       Bio::Restriction::EnzymeI - Interface class for restriction
       endonuclease

SYNOPSIS
         # do not run this class directly

DESCRIPTION
       This module defines methods for a single restriction endonuclease. For
       an implementation, see Bio::Restriction::Enzyme.

FEEDBACK
   Mailing Lists
       User feedback is an integral part of the evolution of this and other
       Bioperl modules. Send your comments and suggestions preferably to one
       of the Bioperl mailing lists. Your participation is much appreciated.

         bioperl-l@bioperl.org                  - General discussion
         http://bioperl.org/wiki/Mailing_lists  - About the mailing lists

   Support
       Please direct usage questions or support issues to the mailing list:

       bioperl-l@bioperl.org

       rather than to the module maintainer directly. Many experienced and
       responsive experts will be able to look at the problem and quickly
       address it. Please include a thorough description of the problem with
       code and data examples if at all possible.

   Reporting Bugs
       Report bugs to the Bioperl bug tracking system to help us keep track
       of the bugs and their resolution. Bug reports can be submitted via the
       web:

         http://bugzilla.open-bio.org/

AUTHOR
       Heikki Lehvaslaiho, heikki-at-bioperl-dot-org

CONTRIBUTORS
       Rob Edwards, redwards@utmem.edu

SEE ALSO
       Bio::Restriction::Enzyme

APPENDIX
       Methods beginning with a leading underscore are considered private and
       are intended for internal use by this module. They are not considered
       part of the public interface and are described here for documentation
       purposes only.

   name
        Title    : name
        Usage    : $re->name($newval)
        Function : Gets/Sets the restriction enzyme name
        Example  : $re->name('EcoRI')
        Returns  : value of name
        Args     : newvalue (optional)

       This will also clean up the name. I have added this because some people
       get confused about restriction enzyme names. The name should begin with
       one upper case letter followed by two lower case letters (because it is
       derived from the organism name, e.g. EcoRI is from E. coli). After that
       the conventions are less consistent, but the numbers should be Roman
       numerals rather than Arabic digits, so those are corrected. At least
       this will provide some standard, I hope.
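
       The cleanup described above could look roughly like the following. This
       is a minimal sketch, not the module's actual code: the helper names
       to_roman and normalize_enzyme_name are hypothetical, and it only
       handles a trailing run of Arabic digits.

```perl
use strict;
use warnings;

# Convert a small positive integer (1..89) to a Roman numeral.
sub to_roman {
    my ($n) = @_;
    my @map = ([50, 'L'], [40, 'XL'], [10, 'X'], [9, 'IX'],
               [5, 'V'], [4, 'IV'], [1, 'I']);
    my $out = '';
    for my $pair (@map) {
        my ($value, $symbol) = @$pair;
        while ($n >= $value) { $out .= $symbol; $n -= $value; }
    }
    return $out;
}

# Hypothetical cleanup: one upper case letter, two lower case letters,
# the middle part left untouched, trailing Arabic digits made Roman.
sub normalize_enzyme_name {
    my ($name) = @_;
    return $name unless $name =~ /^([A-Za-z])([A-Za-z]{2})(.*?)(\d*)$/;
    my ($first, $rest, $middle, $digits) = ($1, $2, $3, $4);
    my $roman = length $digits ? to_roman($digits) : '';
    return uc($first) . lc($rest) . $middle . $roman;
}

print normalize_enzyme_name($_), "\n" for qw(ecoR1 hind3 EcoRI);
```

       Names that are already well formed, such as EcoRI, pass through
       unchanged.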

   site
        Title    : site
        Usage    : $re->site();
        Function : Gets/sets the recognition sequence for the enzyme.
        Example  : $seq_string = $re->site();
        Returns  : String containing recognition sequence indicating
                 : cleavage site as in 'G^AATTC'.
        Argument : n/a
        Throws   : n/a

       Side effect: the sequence is always converted to upper case.

       The cut site can also be set by using methods cut and
       complementary_cut.

       This will pad out missing sequence with N's. For example the enzyme
       Acc36I cuts at ACCTGC(4/8). This will be returned as ACCTGCNNNN^

       Note that the common notation ACCTGC(4/8) means that the forward strand
       cut is four nucleotides after the END of the recognition site. The
       forward cut() in the coordinates used here in Acc36I ACCTGC(4/8) is at
       6+4 i.e. 10.

       ** This is the main settable method for the recognition site.
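
       The padding rule can be illustrated with a short sketch. It assumes the
       cut is given as a bond position counted from the first nucleotide of
       the site, as described above; site_with_cut is a hypothetical helper,
       not a method of this interface.

```perl
use strict;
use warnings;

# Insert '^' at the forward cut position, padding with N when the cut
# lies beyond the end of the recognition sequence.
sub site_with_cut {
    my ($seq, $cut) = @_;
    die "Cuts upstream of the site are not handled in this sketch\n"
        if $cut < 0;
    $seq = uc $seq;                                   # always upper case
    $seq .= 'N' x ($cut - length $seq) if $cut > length $seq;
    return substr($seq, 0, $cut) . '^' . substr($seq, $cut);
}

print site_with_cut('ACCTGC', 10), "\n";   # Acc36I: ACCTGCNNNN^
print site_with_cut('gaattc', 1),  "\n";   # EcoRI:  G^AATTC
```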

   revcom_site
        Title    : revcom_site
        Usage    : $re->revcom_site();
        Function : Gets/sets the complementary recognition sequence for the enzyme.
        Example  : $seq_string = $re->revcom_site();
        Returns  : String containing recognition sequence indicating
                 : cleavage site as in 'G^AATTC'.
        Argument : Sequence of the site
        Throws   : n/a

       This is the same as site, except it returns the revcom site. For
       palindromic enzymes these two are identical. For non-palindromic
       enzymes they are not!

       See also site above.

   cut
        Title    : cut
        Usage    : $num = $re->cut(1);
        Function : Sets/gets an integer indicating the position of cleavage
                   relative to the 5' end of the recognition sequence in the
                   forward strand.

                   For type II enzymes, sets the symmetrically positioned
                   reverse strand cut site by calling complementary_cut().

        Returns  : Integer, 0 if not set
        Argument : an integer for the forward strand cut site (optional)

       Note that the common notation ACCTGC(4/8) means that the forward strand
       cut is four nucleotides after the END of the recognition site. The
       forward cut in the coordinates used here in Acc36I ACCTGC(4/8) is at
       6+4 i.e. 10.

       Note that REBASE uses a notation where cuts within symmetric sites are
       marked by '^' within the forward sequence, but if the site is
       asymmetric the parenthesis syntax is used, where numbering ALWAYS
       starts from the last nucleotide in the forward strand. That's why AciI,
       whose site is usually written as CCGC(-3/-1), actually cuts in

         C^C G C
         G G C^G

       In our notation, these locations are 1 and 3.

       The cut locations in the notation used are relative to the first
       (non-N) nucleotide of the reported forward strand of the recognition
       sequence. The following diagram numbers the phosphodiester bonds
       (marked by + ) which can be cut by the restriction enzymes:

                                  1   2   3   4   5   6   7   8  ...
            N + N + N + N + N + G + A + C + T + G + G + N + N + N
        ...  -5  -4  -3  -2  -1
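
       The conversion from the REBASE parenthesis syntax to the coordinates
       used here amounts to adding the site length to each offset. The sketch
       below assumes input of the form SEQ(fwd/rev); cut_positions is a
       hypothetical helper used only for illustration, and it assumes the
       reverse strand offset converts the same way as the forward one.

```perl
use strict;
use warnings;

# REBASE offsets count from the LAST nucleotide of the forward strand;
# the coordinates used here count bonds from the FIRST nucleotide.
sub cut_positions {
    my ($rebase) = @_;
    my ($seq, $fwd, $rev) = $rebase =~ m{^([A-Za-z]+)\((-?\d+)/(-?\d+)\)$}
        or die "Unrecognised notation: $rebase\n";
    my $len = length $seq;
    return ($len + $fwd, $len + $rev);
}

printf "%s: cut=%d complementary_cut=%d\n", $_, cut_positions($_)
    for 'ACCTGC(4/8)', 'CCGC(-3/-1)';
```

       For AciI this gives 1 and 3, matching the diagram above, and for
       Acc36I the forward cut comes out at 10.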

   complementary_cut
        Title    : complementary_cut
        Usage    : $num = $re->complementary_cut('1');
        Function : Sets/Gets an integer indicating the position of cleavage
                 : on the reverse strand of the restriction site.
        Returns  : Integer
        Argument : An integer (optional)
        Throws   : Exception if argument is non-numeric.

       This method determines the cut on the reverse strand of the sequence.
       For most enzymes this will be within the sequence, and will be set
       automatically based on the forward strand cut, but it need not be.

       Note that the returned location indicates the location AFTER the first
       non-N site nucleotide in the FORWARD strand.

   type
        Title    : type
        Usage    : $re->type();
        Function : Get/set the restriction system type
        Returns  :
        Argument : optional type: ('I'|'II'|'III')

       Restriction enzymes have been categorized into three types. Some REBASE
       formats give the type, but the following rules can be used to classify
       the known enzymes:

       1.  Bipartite site (with 6-8 Ns in the middle and the cut site is > 50
           nt away) => type I

       2.  Site length < 3 => type I

       3.  5-6 asymmetric site and cuts >20 nt away => type III

       4.  All other => type II

       There are some enzymes in REBASE which have bipartite recognition site
       and cut far from the site but are still classified as type I. I've no
       idea if this is really so.
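
       The four rules above can be sketched as a small function. This is an
       illustration under stated assumptions rather than the code used by any
       implementing class: classify_type and revcom are hypothetical helpers,
       the site is passed without the '^' marker, and the cut distances in the
       usage lines are made-up values chosen to trigger each rule.

```perl
use strict;
use warnings;

# Reverse complement restricted to A/C/G/T/N, enough for this sketch.
sub revcom {
    my ($seq) = @_;
    (my $rc = reverse uc $seq) =~ tr/ACGT/TGCA/;
    return $rc;
}

# $site: recognition sequence; $distance: how far outside the site the
# forward strand cut lies, in nucleotides (0 if the cut is inside).
sub classify_type {
    my ($site, $distance) = @_;
    $site = uc $site;
    my $len = length $site;
    return 'I'   if $site =~ /^[^N]+N{6,8}[^N]+$/ && $distance > 50; # rule 1
    return 'I'   if $len < 3;                                        # rule 2
    return 'III' if $len >= 5 && $len <= 6
                 && $site ne revcom($site) && $distance > 20;        # rule 3
    return 'II';                                                     # rule 4
}

print classify_type('AACNNNNNNGTGC', 60), "\n";  # bipartite, far cut
print classify_type('CAGCAG', 25), "\n";         # short asymmetric, far cut
print classify_type('GAATTC', 0), "\n";          # everything else
```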

   seq
        Title    : seq
        Usage    : $re->seq();
        Function : Get the Bio::PrimarySeq.pm object representing
                 : the recognition sequence
        Returns  : A Bio::PrimarySeq object representing the
                   enzyme recognition site
        Argument : n/a
        Throws   : n/a

   string
        Title    : string
        Usage    : $re->string();
        Function : Get a string representing the recognition sequence.
        Returns  : String. Does NOT contain a '^' representing the cut location
                   as returned by the site() method.
        Argument : n/a
        Throws   : n/a

   revcom
        Title    : revcom
        Usage    : $re->revcom();
        Function : Get a string representing the reverse complement of
                 : the recognition sequence.
        Returns  : String
        Argument : n/a
        Throws   : n/a

   recognition_length
        Title    : recognition_length
        Usage    : $re->recognition_length();
        Function : Get the length of the RECOGNITION sequence.
                   This is the total recognition sequence,
                   including the ambiguous codes.
        Returns  : An integer
        Argument : Nothing

       See also: non_ambiguous_length

   non_ambiguous_length
        Title    : non_ambiguous_length
        Usage    : $re->non_ambiguous_length();
        Function : Get the nonambiguous length of the RECOGNITION sequence.
                   This is the total recognition sequence,
                   excluding the ambiguous codes.
        Returns  : An integer
        Argument : Nothing

       See also: recognition_length

   cutter
        Title    : cutter
        Usage    : $re->cutter
        Function : Returns the "cutter" value of the recognition site.

                   This is a value relative to site length and lack of
                   ambiguity codes. Hence: 'RCATGY' is a five (5) cutter site
                   and 'CCTNAGG' a six cutter

                   This measure correlates to the frequency of the enzyme
                   cuts much better than plain recognition site length.

        Example  : $re->cutter
        Returns  : integer or float number
        Args     : none

       Why is this better than just stripping the ambiguous codes? Think about
       it like this: You have a random sequence; all nucleotides are equally
       probable. You have a four nucleotide RE site. The probability of that
       site finding a match is one out of 4^4 or 256, meaning that on average
       a four cutter finds a match every 256 nucleotides. For a six cutter,
       the average fragment length is 4^6 or 4096. In the case of ambiguity
       codes the chances of finding a match are better: an R (A|G) has a 1/2
       chance of finding a match in a random sequence. Therefore, for RGCGCY
       the probability is one out of (2*4*4*4*4*2), which is exactly the same
       as for a five cutter! Cutter, although it can have non-integer values,
       turns out to be a useful and simple measure.

       From bug 2178: VHDB are ambiguity symbols that match three different
       nucleotides, so they contribute less to the effective recognition
       sequence length than e.g. Y which matches only two nucleotides. A
       symbol which matches n of the 4 nucleotides has an effective length of
       1 - log(n) / log(4).
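
       The effective length formula can be applied symbol by symbol and summed
       over the site. The following is a minimal sketch of that calculation;
       the function name cutter_value and the lookup table are assumptions
       made for this example, and GGVCC is just an illustrative site
       containing a three-way ambiguity symbol.

```perl
use strict;
use warnings;

# Number of nucleotides matched by each IUPAC symbol.
my %matches = (
    A => 1, C => 1, G => 1, T => 1,
    R => 2, Y => 2, S => 2, W => 2, K => 2, M => 2,
    B => 3, D => 3, H => 3, V => 3,
    N => 4,
);

# Sum of 1 - log(n)/log(4) over all symbols of the site.
sub cutter_value {
    my ($site) = @_;
    my $total = 0;
    for my $symbol (split //, uc $site) {
        my $n = $matches{$symbol} or die "Unknown symbol: $symbol\n";
        $total += 1 - log($n) / log(4);
    }
    return $total;
}

printf "%-8s %.3f\n", $_, cutter_value($_)
    for qw(GAATTC RCATGY CCTNAGG GGVCC);
```

       RCATGY comes out as a five cutter and CCTNAGG as a six cutter, in line
       with the description above, while a site containing V, H, D or B gets a
       non-integer value.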

   is_palindromic
        Title    : is_palindromic
        Usage    : $re->is_palindromic();
        Function : Determines if the recognition sequence is palindromic
                 : for the current restriction enzyme.
        Returns  : Boolean
        Argument : n/a
        Throws   : n/a

       A palindromic site (EcoRI):

         5-GAATTC-3
         3-CTTAAG-5
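
       A site is palindromic when it equals its own reverse complement. The
       sketch below shows one way to test this on plain strings, including
       IUPAC ambiguity codes; the function names are hypothetical and the code
       does not depend on BioPerl.

```perl
use strict;
use warnings;

# Reverse complement with IUPAC ambiguity codes.
# S, W and N are their own complements and need no mapping.
sub revcom_string {
    my ($seq) = @_;
    my $rc = reverse uc $seq;
    $rc =~ tr/ACGTRYKMBDHV/TGCAYRMKVHDB/;
    return $rc;
}

sub site_is_palindromic {
    my ($site) = @_;
    return uc($site) eq revcom_string($site) ? 1 : 0;
}

printf "%-7s %s\n", $_, site_is_palindromic($_) ? 'palindromic' : 'not palindromic'
    for qw(GAATTC RCATGY ACCTGC);
```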

   overhang
        Title    : overhang
        Usage    : $re->overhang();
        Function : Determines the overhang of the restriction enzyme
        Returns  : "5'", "3'", "blunt" or undef
        Argument : n/a
        Throws   : n/a

       A blunt site in SmaI returns "blunt"

         5' C C C^G G G 3'
         3' G G G^C C C 5'

       A 5' overhang in EcoRI returns "5'"

         5' G^A A T T C 3'
         3' C T T A A^G 5'

       A 3' overhang in KpnI returns "3'"

         5' G G T A C^C 3'
         3' C^C A T G G 5'

   overhang_seq
        Title    : overhang_seq
        Usage    : $re->overhang_seq();
        Function : Determines the overhang sequence of the restriction enzyme
        Returns  : a Bio::LocatableSeq
        Argument : n/a
        Throws   : n/a

       I do not think it is necessary to create a seq object of these.
       (Heikki)

       Note: returns empty string for blunt sequences and undef for ones that
       we don't know. Compare these:

       A blunt site in SmaI returns empty string

         5' C C C^G G G 3'
         3' G G G^C C C 5'

       A 5' overhang in EcoRI returns "AATT"

         5' G^A A T T C 3'
         3' C T T A A^G 5'

       A 3' overhang in KpnI returns "GTAC"

         5' G G T A C^C 3'
         3' C^C A T G G 5'

       Note that you need to use method overhang to decide whether it is a 5'
       or 3' overhang!!!

       Note: The overhang stuff does not work if the site is asymmetric!
       Rethink!
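
       For symmetric sites, both the overhang type and its sequence follow
       from the two cut positions. A minimal sketch, assuming both cuts are
       given in forward strand coordinates as described for cut and
       complementary_cut; overhang_of is a hypothetical helper that returns
       plain strings rather than sequence objects.

```perl
use strict;
use warnings;

# $seq: recognition sequence; $cut / $comp_cut: forward and reverse strand
# cut positions, both counted along the forward strand.
sub overhang_of {
    my ($seq, $cut, $comp_cut) = @_;
    return ('blunt', '') if $cut == $comp_cut;
    my ($from, $to) = sort { $a <=> $b } $cut, $comp_cut;
    my $type = $cut < $comp_cut ? "5'" : "3'";
    return ($type, substr($seq, $from, $to - $from));
}

for my $enzyme (['SmaI',  'CCCGGG', 3, 3],
                ['EcoRI', 'GAATTC', 1, 5],
                ['KpnI',  'GGTACC', 5, 1]) {
    my ($name, @args) = @$enzyme;
    my ($type, $overhang_seq) = overhang_of(@args);
    print "$name: $type '$overhang_seq'\n";
}
```

       The three enzymes reproduce the cases drawn above: blunt with an empty
       string, 5' with AATT, and 3' with GTAC.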

   compatible_ends
        Title    : compatible_ends
        Usage    : $re->compatible_ends($re2);
        Function : Determines if the two restriction enzyme cut sites
                   have compatible ends.
        Returns  : 0 if not, 1 if only one pair ends match, 2 if both ends.
        Argument : a Bio::Restriction::Enzyme
        Throws   : if the argument is not a Bio::Restriction::Enzyme or
                   if there are Ns in the overhangs

       In case of type II enzymes which cut symmetrically, this function can
       be considered to return a boolean value.

   is_ambiguous
        Title    : is_ambiguous
        Usage    : $re->is_ambiguous();
        Function : Determines if the restriction enzyme contains ambiguous sequences
        Returns  : Boolean
        Argument : n/a
        Throws   : n/a

Additional methods from Rebase
   is_prototype
        Title    : is_prototype
        Usage    : $re->is_prototype
        Function : Get/Set method for finding out if this enzyme is a prototype
        Example  : $re->is_prototype(1)
        Returns  : Boolean
        Args     : none

       Prototype enzymes are the most commonly available and usually the first
       enzymes discovered that have the same recognition site. Using only
       prototype enzymes in restriction analysis avoids redundancy and speeds
       things up.

   prototype_name
        Title    : prototype_name
        Usage    : $re->prototype_name
        Function : Get/Set method for the name of prototype for
                   this enzyme's recognition site
        Example  : $re->prototype_name(1)
        Returns  : prototype enzyme name string or an empty string
        Args     : optional prototype enzyme name string

       If the enzyme itself is the prototype, its own name is returned. Not to
       confuse the negative result with an unset value, use method
       is_prototype.

       This method is called prototype_name rather than prototype, because it
       returns a string rather than an object.

   isoschizomers
        Title     : isoschizomers
        Usage     : $re->isoschizomers(@list);
        Function  : Gets/Sets a list of known isoschizomers (enzymes that
                    recognize the same site, but don't necessarily cut at
                    the same position).
        Arguments : A reference to an array that contains the isoschizomers
        Returns   : A reference to an array of the known isoschizomers or 0
                    if not defined.

       Added for compatibility to REBASE

   purge_isoschizomers
        Title     : purge_isoschizomers
        Usage     : $re->purge_isoschizomers();
        Function  : Purges the set of isoschizomers for this enzyme
        Arguments :
        Returns   : 1

   methylation_sites
        Title     : methylation_sites
        Usage     : $re->methylation_sites(\%sites);
        Function  : Gets/Sets known methylation sites (positions on the sequence
                    that get modified to promote or prevent cleavage).
        Arguments : A reference to a hash that contains the methylation sites
        Returns   : A reference to a hash of the methylation sites or
                    an empty string if not defined.

       There are three types of methylation sites:

       ·   (6) = N6-methyladenosine

       ·   (5) = 5-methylcytosine

       ·   (4) = N4-methylcytosine

       These are stored as 6, 5, and 4 respectively. The hash has the
       sequence position as the key and the type of methylation as the value.
       A negative number in the sequence position indicates that the DNA is
       methylated on the complementary strand.

       Note that in REBASE, the methylation positions are given

       Added for compatibility to REBASE.
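
       The hash layout described above (position as key, methylation type as
       value, negative positions for the complementary strand) can be read
       back as follows. This is only a sketch: the positions in %sites are
       hypothetical example data, not taken from REBASE, and
       describe_methylation is not part of this interface.

```perl
use strict;
use warnings;

my %type_name = (
    6 => 'N6-methyladenosine',
    5 => '5-methylcytosine',
    4 => 'N4-methylcytosine',
);

# Hypothetical example data: sequence position => methylation type.
my %sites = (3 => 6, -3 => 6);

# Turn the hash into readable lines, forward strand first for each position.
sub describe_methylation {
    my ($sites) = @_;
    my @lines;
    for my $pos (sort { abs($a) <=> abs($b) || $b <=> $a } keys %$sites) {
        my $strand = $pos < 0 ? 'complementary' : 'forward';
        push @lines, sprintf '%d on %s strand: %s',
            abs($pos), $strand, $type_name{ $sites->{$pos} };
    }
    return @lines;
}

print "$_\n" for describe_methylation(\%sites);
```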

   purge_methylation_sites
        Title     : purge_methylation_sites
        Usage     : $re->purge_methylation_sites();
        Function  : Purges the set of methylation_sites for this enzyme
        Arguments :
        Returns   :

   microbe
        Title     : microbe
        Usage     : $re->microbe($microbe);
        Function  : Gets/Sets microorganism where the restriction enzyme was found
        Arguments : A scalar containing the microbe's name
        Returns   : A scalar containing the microbe's name or 0 if not defined

       Added for compatibility to REBASE

   source
        Title     : source
        Usage     : $re->source('Rob Edwards');
        Function  : Gets/Sets the person who provided the enzyme
        Arguments : A scalar containing the person's name
        Returns   : A scalar containing the person's name or 0 if not defined

       Added for compatibility to REBASE

   vendors
        Title     : vendors
        Usage     : $re->vendors(@list_of_companies);
        Function  : Gets/Sets a list of companies that you can get the enzyme from.
                    Also sets the commercially_available boolean
        Arguments : A reference to an array containing the names of companies
                    that you can get the enzyme from
        Returns   : A reference to an array containing the names of companies
                    that you can get the enzyme from

       Added for compatibility to REBASE

   purge_vendors
        Title     : purge_vendors
        Usage     : $re->purge_vendors();
        Function  : Purges the set of vendors for this enzyme
        Arguments :
        Returns   :

   vendor
        Title     : vendor
        Usage     : $re->vendor(@list_of_companies);
        Function  : Gets/Sets a list of companies that you can get the enzyme from.
                    Also sets the commercially_available boolean
        Arguments : A reference to an array containing the names of companies
                    that you can get the enzyme from
        Returns   : A reference to an array containing the names of companies
                    that you can get the enzyme from

       Added for compatibility to REBASE

   references
        Title     : references
        Usage     : $re->references(string);
        Function  : Gets/Sets the references for this enzyme
        Arguments : an array of string reference(s) (optional)
        Returns   : an array of references

       Use purge_references to reset the list of references

       This should be a Bio::Biblio or Bio::Annotation::Reference object, but
       it's not (yet)

   purge_references
        Title     : purge_references
        Usage     : $re->purge_references();
        Function  : Purges the set of references for this enzyme
        Arguments :
        Returns   : 1

   clone
        Title     : clone
        Usage     : $re->clone
        Function  : Deep copy of the object
        Arguments : -
        Returns   : new Bio::Restriction::EnzymeI object

       This works as long as the object is a clean in-memory object using
       scalars, arrays and hashes. You have been warned.

       If you have module Storable, it is used, otherwise local code is used.
       Todo: local code cuts circular references.
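
       The Storable route mentioned above can be illustrated with dclone on a
       plain data structure. The hash below is a hypothetical stand-in for an
       enzyme object (it is not how any implementing class stores its data),
       and the vendor names are placeholders.

```perl
use strict;
use warnings;
use Storable qw(dclone);

# Hypothetical stand-in for an enzyme object: scalars, arrays and hashes only.
my $enzyme = {
    name    => 'EcoRI',
    site    => 'G^AATTC',
    vendors => ['Vendor A', 'Vendor B'],
};

my $copy = dclone($enzyme);
push @{ $copy->{vendors} }, 'Vendor C';   # changes the copy only

printf "original: %d vendors, copy: %d vendors\n",
    scalar @{ $enzyme->{vendors} }, scalar @{ $copy->{vendors} };
```

       Because the copy is deep, nested arrays and hashes are duplicated as
       well, so later changes to the clone do not leak back into the original.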

perl v5.12.0                      2010-04-29      Bio::Restriction::EnzymeI(3)