Bio::Restriction::Enzyme(3)   User Contributed Perl Documentation   Bio::Restriction::Enzyme(3)

NAME
Bio::Restriction::Enzyme - A single restriction endonuclease (cuts DNA
at specific locations)

SYNOPSIS
    # set up a single restriction enzyme. This contains lots of
    # information about the enzyme that is generally parsed from a
    # REBASE file and can then be read back

    use Bio::Restriction::Enzyme;

    # define a new enzyme with the cut sequence
    my $re = Bio::Restriction::Enzyme->new
        (-enzyme => 'EcoRI', -seq => 'G^AATTC');

    # once the sequence has been defined a bunch of stuff is calculated
    # for you:

    #### PRECALCULATED

    # find where the enzyme cuts after ...
    my $ca = $re->cut;

    # ... and where it cuts on the opposite strand
    my $oca = $re->complementary_cut;

    # get the cut sequence string back.
    # Note that site will return the sequence with a caret
    my $with_caret = $re->site;       # returns 'G^AATTC'

    # the recognition sequence is also held as a Bio::PrimarySeq object ...
    my $seq_obj = $re->seq;
    # ... while string returns it as plain text without the caret
    my $without_caret = $re->string;  # returns 'GAATTC'

    # what is the reverse complement of the cut site
    my $rc = $re->revcom;             # returns 'GAATTC'

    # now the recognition length. There are two measures:
    #   recognition_length() is the length of the sequence
    #   cutter() is an estimate of cut frequency

    my $recog_length = $re->recognition_length;  # returns 6
    # also returns 6 in this case but would return
    # 4 for GANNTC and 5 for RGATCY (BstX2I)!
    my $cutter = $re->cutter;

    # is the sequence a palindrome - the same forwards and backwards
    my $pal = $re->is_palindromic;    # this is a boolean

    # overhang returns one of "5'", "3'" or "blunt", or undef if unknown.
    # Direction is very important if you use Klenow!
    my $oh = $re->overhang;

    # is the cut blunt (i.e. no overhang - the forward and reverse
    # cuts are at the same position)
    print "blunt\n" if defined $oh and $oh eq 'blunt';

    # what is the overhang sequence
    my $ohseq = $re->overhang_seq;    # will return 'AATT'

    # is the sequence ambiguous - does it contain non-GATC bases?
    my $ambig = $re->is_ambiguous;    # this is boolean

    print "Stuff about the enzyme\nCuts after: $ca\n",
          "Complementary cut: $oca\nSite:\n\t$with_caret or\n",
          "\t$without_caret\n";
    print "Reverse of the sequence: $rc\nRecognition length: $recog_length\n",
          "Is it palindromic? $pal\n";
    print "The overhang is $oh with sequence $ohseq\n",
          "And is it ambiguous? $ambig\n\n";

    ### THINGS YOU CAN SET, and get from rich REBASE file

    # get or set the isoschizomers (enzymes that recognize the same
    # site)
    $re->isoschizomers('PvuII', 'SmaI');  # not really true :)
    print "Isoschizomers are ", join " ", $re->isoschizomers, "\n";

    # get or set the methylation sites
    $re->methylation_sites(2);            # not really true :)
    print "Methylated at ", join " ", keys %{$re->methylation_sites}, "\n";

    # get or set the source microbe
    $re->microbe('E. coli');
    print "It came from ", $re->microbe, "\n";

    # get or set the person who isolated it
    $re->source("Rob");                   # not really true :)
    print $re->source, " sent it to us\n";

    # get or set whether it is commercially available and the company
    # that it can be bought from
    $re->vendors('NEB');                  # my favorite
    print "Is it commercially available: ";
    print $re->vendors ? "Yes" : "No";
    print " and it can be got from ", join " ",
          $re->vendors, "\n";

    # get or set a reference for this
    $re->references('Edwards et al. J. Bacteriology');
    print "It was not published in ", join(" ", $re->references), "\n";

    # get or set the enzyme name
    $re->name('BamHI');
    print "The name of EcoRI is not really ", $re->name, "\n";

DESCRIPTION
This module defines a single restriction endonuclease. You can use it
to make custom restriction enzymes, and it is used by
Bio::Restriction::IO to define enzymes in the New England Biolabs
REBASE collection.

Use Bio::Restriction::Analysis to figure out which enzymes are
available and where they cut your sequence.

RESTRICTION MODIFICATION SYSTEMS
At least three genetically and biochemically distinct restriction
modification systems exist. Their cutting components are known as
restriction endonucleases. The three systems are known by Roman
numerals: Type I, II, and III restriction enzymes.

REBASE format 'cutzymes' (#15) lists the enzyme type in its last field.
The categories there do not always match the following short
descriptions of the enzyme types. See
http://it.stlawu.edu/~tbudd/rmsyst.html for a better overview.

TypeI

Type I systems recognize a bipartite asymmetrical sequence of 5-7 bp:

    ---TGA*NnTGCT---    * = methylation sites
    ---ACTNnA*CGA---    n = 6 for EcoK, n = 8 for EcoB

The cleavage site is roughly 1000 (400-7000) base pairs from the
recognition site.

TypeII

The simplest and most common (at least commercially).

Site recognition is via short palindromic base sequences that are 4-6
base pairs long. Cleavage usually occurs within the recognition site,
but may occasionally be just adjacent to the palindromic sequence. It
may produce blunt-end termini or staggered, "sticky end" termini.

TypeIII

The recognition site is a 5-7 bp asymmetrical sequence. Cleavage is
ATP dependent, occurs 24-26 base pairs downstream from the recognition
site, and usually yields staggered cuts 2-4 bases apart.

COMMENTS
I am trying to make this backwards compatible with
Bio::Tools::RestrictionEnzyme. Undoubtedly some things will break, but
we can fix things as we progress.

I have added another comments section at the end of this POD that
discusses a couple of areas I know are broken (at the moment).

TO DO
· Convert vendors to use full names of companies instead of codes

· Add regular expression based matching to vendors

· Move away from the archaic ^ notation for cut sites. Ideally I'd
  like to remove this altogether, or add a method that adds it in if
  someone really wants it. We should settle on a (sequence, number)
  notation.

FEEDBACK
Mailing Lists

User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to one
of the Bioperl mailing lists. Your participation is much appreciated.

    bioperl-l@bioperl.org                  - General discussion
    http://bioperl.org/wiki/Mailing_lists  - About the mailing lists

Reporting Bugs

Report bugs to the Bioperl bug tracking system to help us keep track of
the bugs and their resolution. Bug reports can be submitted via the
web:

    http://bugzilla.open-bio.org/

AUTHOR
Rob Edwards, redwards@utmem.edu

CONTRIBUTORS
Heikki Lehvaslaiho, heikki-at-bioperl-dot-org
Peter Blaiklock, pblaiklo@restrictionmapper.org

COPYRIGHT
Copyright (c) 2003 Rob Edwards.

Some of this work is Copyright (c) 1997-2002 Steve A. Chervitz. All
Rights Reserved. This module is free software; you can redistribute it
and/or modify it under the same terms as Perl itself.

SEE ALSO
Bio::Restriction::Analysis, Bio::Restriction::EnzymeCollection,
Bio::Restriction::IO

APPENDIX
Methods beginning with a leading underscore are considered private and
are intended for internal use by this module. They are not considered
part of the public interface and are described here for documentation
purposes only.

new

    Title     : new
    Function  : Initializes the Enzyme object
    Returns   : The Restriction::Enzyme object
    Argument  : A standard definition can have several formats. For example:
                Bio::Restriction::Enzyme->new(-enzyme=>'EcoRI', -seq=>'GAATTC', -cut=>'1')
                Or, you can define the cut site in the sequence, for example
                Bio::Restriction::Enzyme->new(-enzyme=>'EcoRI', -seq=>'G^AATTC'),
                but you must use a caret
                Or, a sequence can cut outside the recognition site, for example
                Bio::Restriction::Enzyme->new(-enzyme=>'AbeI', -seq=>'CCTCAGC', -cut=>'-5/-2')

                Other arguments:
                -isoschizomers=>\@list  a reference to an array of
                  known isoschizomers
                -references=>$ref  a reference to the enzyme
                -source=>$source  the source (person) of the enzyme
                -commercial_availability=>@companies  a list of companies
                  that supply the enzyme
                -methylation_site=>\%sites  a reference to hash that has
                  the position as the key and the type of methylation
                  as the value

A Restriction::Enzyme object manages its recognition sequence as a
Bio::PrimarySeq object.

The minimum requirement is for a name and a sequence.

This will create the restriction enzyme object, and define several
things about the sequence, such as whether it is palindromic, its
size, etc.

name

    Title     : name
    Usage     : $re->name($newval)
    Function  : Gets/Sets the restriction enzyme name
    Example   : $re->name('EcoRI')
    Returns   : value of name
    Args      : newvalue (optional)

This will also clean up the name. I have added this because some people
get confused about restriction enzyme names. The name should be one
upper case letter followed by two lower case letters (because it is
derived from the organism name, e.g. EcoRI is from E. coli). After
that it is all confused, but the numbers should be Roman numerals, not
Arabic numbers, so we correct those. At least this will provide some
standard, I hope.
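
The numeral correction described above could be sketched as follows. This is only an illustration in plain Perl, not the module's own code: clean_name and to_roman are hypothetical helpers, and the sketch assumes the only cleanup needed is turning a trailing Arabic number into a Roman numeral.

```perl
use strict;
use warnings;

# Convert a positive integer (below 4000) to a Roman numeral.
sub to_roman {
    my ($n) = @_;
    my @map = (
        [1000, 'M'], [900, 'CM'], [500, 'D'], [400, 'CD'],
        [100,  'C'], [90,  'XC'], [50,  'L'], [40,  'XL'],
        [10,   'X'], [9,   'IX'], [5,   'V'], [4,   'IV'], [1, 'I'],
    );
    my $roman = '';
    for my $pair (@map) {
        my ($value, $symbol) = @$pair;
        while ($n >= $value) { $roman .= $symbol; $n -= $value; }
    }
    return $roman;
}

# Hypothetical cleanup: replace a trailing Arabic number with a Roman numeral.
sub clean_name {
    my ($name) = @_;
    $name =~ s/(\d+)$/to_roman($1)/e;
    return $name;
}

print clean_name($_), "\n" for qw(EcoR1 Hind3 BamHI);
```

Names that already end in a Roman numeral, such as BamHI, pass through unchanged.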

site

    Title     : site
    Usage     : $re->site();
    Function  : Gets/sets the recognition sequence for the enzyme.
    Example   : $seq_string = $re->site();
    Returns   : String containing recognition sequence indicating
              : cleavage site as in 'G^AATTC'.
    Argument  : n/a
    Throws    : n/a

Side effect: the sequence is always converted to upper case.

The cut site can also be set by using the methods cut and
complementary_cut.

This will pad out missing sequence with N's. For example the enzyme
Acc36I cuts at ACCTGC(4/8). This will be returned as ACCTGCNNNN^

Note that the common notation ACCTGC(4/8) means that the forward strand
cut is four nucleotides after the END of the recognition site. In the
coordinates used here, the forward cut() of Acc36I ACCTGC(4/8) is at
6+4, i.e. 10.

** This is the main settable method for the recognition site.

revcom_site

    Title     : revcom_site
    Usage     : $re->revcom_site();
    Function  : Gets/sets the complementary recognition sequence for the enzyme.
    Example   : $seq_string = $re->revcom_site();
    Returns   : String containing recognition sequence indicating
              : cleavage site as in 'G^AATTC'.
    Argument  : Sequence of the site
    Throws    : n/a

This is the same as site, except it returns the revcom site. For
palindromic enzymes these two are identical. For non-palindromic
enzymes they are not!

See also site above.

cut

    Title     : cut
    Usage     : $num = $re->cut(1);
    Function  : Sets/gets an integer indicating the position of cleavage
                relative to the 5' end of the recognition sequence in the
                forward strand.

                For type II enzymes, sets the symmetrically positioned
                reverse strand cut site by calling complementary_cut().

    Returns   : Integer, 0 if not set
    Argument  : an integer for the forward strand cut site (optional)

Note that the common notation ACCTGC(4/8) means that the forward strand
cut is four nucleotides after the END of the recognition site. In the
coordinates used here, the forward cut of Acc36I ACCTGC(4/8) is at
6+4, i.e. 10.

Note that REBASE uses a notation where cuts within symmetric sites are
marked by '^' within the forward sequence, but if the site is
asymmetric the parenthesis syntax is used, where numbering ALWAYS
starts from the last nucleotide in the forward strand. That is why
AciI, whose site is usually written as CCGC(-3/-1), actually cuts in

    C^C G C
    G G C^G

In our notation, these locations are 1 and 3.
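
The arithmetic behind both examples is the same: add the REBASE offset to the length of the recognition site. A minimal sketch in plain Perl, where rebase_to_cuts is a hypothetical helper and not a method of this module:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Restriction::Enzyme): convert REBASE
# parenthesis offsets, which count from the last nucleotide of the site, into
# cut positions that count from the first nucleotide of the site.
sub rebase_to_cuts {
    my ($site, $fwd_offset, $rev_offset) = @_;
    my $len = length $site;
    return ($len + $fwd_offset, $len + $rev_offset);
}

my @acc36i = rebase_to_cuts('ACCTGC', 4, 8);    # Acc36I ACCTGC(4/8)
my @acii   = rebase_to_cuts('CCGC', -3, -1);    # AciI   CCGC(-3/-1)

print "Acc36I: forward $acc36i[0], complementary $acc36i[1]\n";
print "AciI:   forward $acii[0], complementary $acii[1]\n";
```

For AciI this gives 1 and 3, as stated above. For Acc36I the forward cut is 10, and the same rule puts the complementary cut at 6+8.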

The cut locations in the notation used are relative to the first
(non-N) nucleotide of the reported forward strand of the recognition
sequence. The following diagram numbers the phosphodiester bonds
(marked by + ) which can be cut by the restriction enzymes:

                              1   2   3   4   5   6   7   8  ...
        N + N + N + N + N + G + A + C + T + G + G + N + N + N
    ...  -5  -4  -3  -2  -1

complementary_cut

    Title     : complementary_cut
    Usage     : $num = $re->complementary_cut('1');
    Function  : Sets/Gets an integer indicating the position of cleavage
              : on the reverse strand of the restriction site.
    Returns   : Integer
    Argument  : An integer (optional)
    Throws    : Exception if argument is non-numeric.

This method determines the cut on the reverse strand of the sequence.
For most enzymes this will be within the sequence, and will be set
automatically based on the forward strand cut, but it need not be.

Note that the returned location indicates the location AFTER the first
non-N site nucleotide in the FORWARD strand.

type

    Title     : type
    Usage     : $re->type();
    Function  : Get/set the restriction system type
    Returns   :
    Argument  : optional type: ('I'|'II'|'III')

Restriction enzymes have been categorized into three types. Some REBASE
formats give the type, but the following rules can be used to classify
the known enzymes:

1   Bipartite site (with 6-8 Ns in the middle and the cut site is > 50
    nt away) => type I

2   Site length < 3 => type I

3   5-6 asymmetric site and cuts >20 nt away => type III

4   All other => type II

There are some enzymes in REBASE which have a bipartite recognition
site and cut far from the site but are still classified as type I. I've
no idea if this is really so.
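
The four rules above can be written down directly. The following is a rough sketch in plain Perl, not the module's implementation: guess_type and revcom are hypothetical helpers, "asymmetric" is taken to mean that the site differs from its reverse complement, and $distance is assumed to be the number of nucleotides between the cut and the recognition site (0 when the cut is inside the site). The type III input is an illustrative 6 bp asymmetric site.

```perl
use strict;
use warnings;

# Reverse complement, including IUPAC ambiguity codes.
sub revcom {
    my ($seq) = @_;
    (my $rc = reverse uc $seq) =~ tr/ACGTRYKMBVDHN/TGCAYRMKVBHDN/;
    return $rc;
}

# Hypothetical classifier following the four rules in the text.
sub guess_type {
    my ($site, $distance) = @_;
    $site = uc $site;
    my $len = length $site;
    return 'I'   if $site =~ /^[^N]+N{6,8}[^N]+$/ && $distance > 50;  # rule 1
    return 'I'   if $len < 3;                                          # rule 2
    return 'III' if $len >= 5 && $len <= 6
                 && $site ne revcom($site) && $distance > 20;          # rule 3
    return 'II';                                                       # rule 4
}

my $ecob_site = 'TGA' . ('N' x 8) . 'TGCT';   # bipartite site from the text
print "bipartite, cut 1000 nt away: type ", guess_type($ecob_site, 1000), "\n";
print "CAGCAG, cut 25 nt away:      type ", guess_type('CAGCAG', 25), "\n";
print "GAATTC, cut inside the site: type ", guess_type('GAATTC', 0), "\n";
```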

seq

    Title     : seq
    Usage     : $re->seq();
    Function  : Get the Bio::PrimarySeq.pm object representing
              : the recognition sequence
    Returns   : A Bio::PrimarySeq object representing the
                enzyme recognition site
    Argument  : n/a
    Throws    : n/a

string

    Title     : string
    Usage     : $re->string();
    Function  : Get a string representing the recognition sequence.
    Returns   : String. Does NOT contain a '^' representing the cut location
                as returned by the site() method.
    Argument  : n/a
    Throws    : n/a

revcom

    Title     : revcom
    Usage     : $re->revcom();
    Function  : Get a string representing the reverse complement of
              : the recognition sequence.
    Returns   : String
    Argument  : n/a
    Throws    : n/a

recognition_length

    Title     : recognition_length
    Usage     : $re->recognition_length();
    Function  : Get the length of the RECOGNITION sequence.
                This is the total recognition sequence,
                including the ambiguous codes.
    Returns   : An integer
    Argument  : Nothing

See also: non_ambiguous_length

cutter

    Title     : cutter
    Usage     : $re->cutter
    Function  : Returns the "cutter" value of the recognition site.

                This is a value relative to site length and lack of
                ambiguity codes. Hence: 'RCATGY' is a five (5) cutter site
                and 'CCTNAGG' a six cutter

                This measure correlates to the frequency of the enzyme
                cuts much better than plain recognition site length.

    Example   : $re->cutter
    Returns   : integer or float number
    Args      : none

Why is this better than just stripping the ambiguous codes? Think about
it like this: You have a random sequence; all nucleotides are equally
probable. You have a four nucleotide restriction site. The probability
of that site finding a match is one out of 4^4 or 256, meaning that on
average a four cutter finds a match every 256 nucleotides. For a six
cutter, the average fragment length is 4^6 or 4096. In the case of
ambiguity codes the chances of finding a match are better: an R (A|G)
has a 1/2 chance of finding a match in a random sequence. Therefore,
for RGCGCY the probability is one out of (2*4*4*4*4*2), which is
exactly the same as for a five cutter! Cutter, although it can have
non-integer values, turns out to be a useful and simple measure.
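
The reasoning above amounts to summing, over the positions of the site, the base-4 logarithm of 4 divided by the number of bases that position can match. A small sketch in plain Perl shows the idea. It is an assumption about how such a value could be computed, not the module's code; cutter_value and %degeneracy are hypothetical names.

```perl
use strict;
use warnings;

# Number of bases each IUPAC code can match.
my %degeneracy = (
    A => 1, C => 1, G => 1, T => 1,
    R => 2, Y => 2, S => 2, W => 2, K => 2, M => 2,
    B => 3, D => 3, H => 3, V => 3,
    N => 4,
);

# Hypothetical helper: each position contributes log4(4 / degeneracy),
# i.e. 1 for an unambiguous base, 0.5 for a two-fold code and 0 for N.
sub cutter_value {
    my ($site) = @_;
    my $value = 0;
    for my $code (split //, uc $site) {
        my $d = $degeneracy{$code}
            or die "Unknown IUPAC code '$code' in '$site'\n";
        $value += log(4 / $d) / log(4);
    }
    return $value;
}

printf "%-8s %.2f\n", $_, cutter_value($_)
    for qw(GAATTC RGCGCY GANNTC CCTNAGG);
```

Three-fold codes such as B or D give non-integer values, which is why the method is documented as returning an integer or a float.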

is_palindromic

    Title     : is_palindromic
    Usage     : $re->is_palindromic();
    Function  : Determines if the recognition sequence is palindromic
              : for the current restriction enzyme.
    Returns   : Boolean
    Argument  : n/a
    Throws    : n/a

A palindromic site (EcoRI):

    5-GAATTC-3
    3-CTTAAG-5
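
In other words, a site is palindromic when it reads the same as its own reverse complement. A minimal sketch of that test in plain Perl, using a hypothetical helper named looks_palindromic rather than the module's method:

```perl
use strict;
use warnings;

# Hypothetical check: a site is palindromic when it equals its own
# reverse complement (IUPAC ambiguity codes are complemented too).
sub looks_palindromic {
    my ($site) = @_;
    (my $rc = reverse uc $site) =~ tr/ACGTRYKMBVDHN/TGCAYRMKVBHDN/;
    return $rc eq uc($site) ? 1 : 0;
}

print "$_: ", (looks_palindromic($_) ? 'palindromic' : 'not palindromic'), "\n"
    for qw(GAATTC CCGC RGATCY);
```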

overhang

    Title     : overhang
    Usage     : $re->overhang();
    Function  : Determines the overhang of the restriction enzyme
    Returns   : "5'", "3'", "blunt" or undef
    Argument  : n/a
    Throws    : n/a

A blunt site in SmaI returns "blunt"

    5' C C C^G G G 3'
    3' G G G^C C C 5'

A 5' overhang in EcoRI returns "5'"

    5' G^A A T T C 3'
    3' C T T A A^G 5'

A 3' overhang in KpnI returns "3'"

    5' G G T A C^C 3'
    3' C^C A T G G 5'

overhang_seq

    Title     : overhang_seq
    Usage     : $re->overhang_seq();
    Function  : Determines the overhang sequence of the restriction enzyme
    Returns   : a Bio::LocatableSeq
    Argument  : n/a
    Throws    : n/a

I do not think it is necessary to create a seq object of these.
(Heikki)

Note: returns empty string for blunt sequences and undef for ones that
we don't know. Compare these:

A blunt site in SmaI returns empty string

    5' C C C^G G G 3'
    3' G G G^C C C 5'

A 5' overhang in EcoRI returns "AATT"

    5' G^A A T T C 3'
    3' C T T A A^G 5'

A 3' overhang in KpnI returns "GTAC"

    5' G G T A C^C 3'
    3' C^C A T G G 5'

Note that you need to use method overhang to decide whether it is a 5'
or 3' overhang!!!

Note: The overhang stuff does not work if the site is asymmetric!
Rethink!
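
For the three symmetric examples above, both the overhang type and its sequence follow from the two cut positions. The sketch below is plain Perl with a hypothetical helper, overhang_info, and is not the module's implementation. It assumes both positions are counted from the first nucleotide of the forward strand and that both cuts fall inside the site.

```perl
use strict;
use warnings;

# Hypothetical helper: $cut and $comp_cut are both counted from the first
# nucleotide of the forward strand. Assumes both cuts fall inside the site.
sub overhang_info {
    my ($site, $cut, $comp_cut) = @_;
    return ('blunt', '') if $cut == $comp_cut;
    my ($start, $end) = sort { $a <=> $b } ($cut, $comp_cut);
    my $type = $cut < $comp_cut ? "5'" : "3'";
    return ($type, substr($site, $start, $end - $start));
}

# name, site, forward cut, complementary cut (read off the diagrams above)
my @examples = (
    ['SmaI',  'CCCGGG', 3, 3],
    ['EcoRI', 'GAATTC', 1, 5],
    ['KpnI',  'GGTACC', 5, 1],
);

for my $enzyme (@examples) {
    my ($name, $site, $cut, $comp_cut) = @$enzyme;
    my ($type, $seq) = overhang_info($site, $cut, $comp_cut);
    print "$name: $type overhang '$seq'\n";
}
```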

compatible_ends

    Title     : compatible_ends
    Usage     : $re->compatible_ends($re2);
    Function  : Determines if the two restriction enzyme cut sites
                have compatible ends.
    Returns   : 0 if not, 1 if only one pair of ends matches, 2 if both ends do.
    Argument  : a Bio::Restriction::Enzyme
    Throws    : an exception unless the argument is a Bio::Restriction::Enzyme,
                or if there are Ns in the overhangs

In the case of type II enzymes which cut symmetrically, this function
can be considered to return a boolean value.

is_ambiguous

    Title     : is_ambiguous
    Usage     : $re->is_ambiguous();
    Function  : Determines if the restriction enzyme contains ambiguous sequences
    Returns   : Boolean
    Argument  : n/a
    Throws    : n/a

Additional methods from Rebase

is_prototype

    Title     : is_prototype
    Usage     : $re->is_prototype
    Function  : Get/Set method for finding out if this enzyme is a prototype
    Example   : $re->is_prototype(1)
    Returns   : Boolean
    Args      : boolean (optional)

Prototype enzymes are the most commonly available and usually the first
enzymes discovered that have the same recognition site. Using only
prototype enzymes in restriction analysis avoids redundancy and speeds
things up.

prototype_name

    Title     : prototype_name
    Usage     : $re->prototype_name
    Function  : Get/Set method for the name of prototype for
                this enzyme's recognition site
    Example   : $re->prototype_name('EcoRI')
    Returns   : prototype enzyme name string or an empty string
    Args      : optional prototype enzyme name string

If the enzyme itself is the prototype, its own name is returned. To
avoid confusing a negative result with an unset value, use the method
is_prototype.

This method is called prototype_name rather than prototype, because it
returns a string rather than an object.

isoschizomers

    Title     : isoschizomers
    Usage     : $re->isoschizomers(@list);
    Function  : Gets/Sets a list of known isoschizomers (enzymes that
                recognize the same site, but don't necessarily cut at
                the same position).
    Arguments : A reference to an array that contains the isoschizomers
    Returns   : A reference to an array of the known isoschizomers or 0
                if not defined.

This has to be the hardest name to spell. Added for compatibility to
REBASE

purge_isoschizomers

    Title     : purge_isoschizomers
    Usage     : $re->purge_isoschizomers();
    Function  : Purges the set of isoschizomers for this enzyme
    Arguments :
    Returns   : 1

methylation_sites

    Title     : methylation_sites
    Usage     : $re->methylation_sites(\%sites);
    Function  : Gets/Sets known methylation sites (positions on the sequence
                that get modified to promote or prevent cleavage).
    Arguments : A reference to a hash that contains the methylation sites
    Returns   : A reference to a hash of the methylation sites or
                an empty string if not defined.

There are three types of methylation sites:

*   (6) = N6-methyladenosine
*   (5) = 5-methylcytosine
*   (4) = N4-methylcytosine

These are stored as 6, 5, and 4 respectively. The hash has the
sequence position as the key and the type of methylation as the value.
A negative number in the sequence position indicates that the DNA is
methylated on the complementary strand.
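
A hash in this layout can be read back as shown in the sketch below. The positions and the %sites data are made up for illustration, and the code is plain Perl that does not call the module.

```perl
use strict;
use warnings;

# Methylation type codes as listed above.
my %type_name = (
    6 => 'N6-methyladenosine',
    5 => '5-methylcytosine',
    4 => 'N4-methylcytosine',
);

# Hypothetical data: position => methylation type. Negative positions
# refer to the complementary strand.
my %sites = (3 => 6, -3 => 6);

for my $pos (sort { $a <=> $b } keys %sites) {
    my $strand = $pos < 0 ? 'complementary' : 'forward';
    printf "position %d on the %s strand: %s\n",
        abs($pos), $strand, $type_name{ $sites{$pos} };
}
```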

Note that in REBASE, the methylation positions are given Added for
compatibility to REBASE.

purge_methylation_sites

    Title     : purge_methylation_sites
    Usage     : $re->purge_methylation_sites();
    Function  : Purges the set of methylation_sites for this enzyme
    Arguments :
    Returns   :

microbe

    Title     : microbe
    Usage     : $re->microbe($microbe);
    Function  : Gets/Sets microorganism where the restriction enzyme was found
    Arguments : A scalar containing the microbe's name
    Returns   : A scalar containing the microbe's name or 0 if not defined

Added for compatibility to REBASE

source

    Title     : source
    Usage     : $re->source('Rob Edwards');
    Function  : Gets/Sets the person who provided the enzyme
    Arguments : A scalar containing the person's name
    Returns   : A scalar containing the person's name or 0 if not defined

Added for compatibility to REBASE

vendors

    Title     : vendors
    Usage     : $re->vendors(@list_of_companies);
    Function  : Gets/Sets a list of companies that you can get the enzyme from.
                Also sets the commercially_available boolean
    Arguments : A reference to an array containing the names of companies
                that you can get the enzyme from
    Returns   : A reference to an array containing the names of companies
                that you can get the enzyme from

Added for compatibility to REBASE

purge_vendors

    Title     : purge_vendors
    Usage     : $re->purge_vendors();
    Function  : Purges the set of vendors for this enzyme
    Arguments :
    Returns   :

vendor

    Title     : vendor
    Usage     : $re->vendor(@list_of_companies);
    Function  : Gets/Sets a list of companies that you can get the enzyme from.
                Also sets the commercially_available boolean
    Arguments : A reference to an array containing the names of companies
                that you can get the enzyme from
    Returns   : A reference to an array containing the names of companies
                that you can get the enzyme from

Added for compatibility to REBASE

references

    Title     : references
    Usage     : $re->references(string);
    Function  : Gets/Sets the references for this enzyme
    Arguments : an array of string reference(s) (optional)
    Returns   : an array of references

Use purge_references to reset the list of references

This should be a Bio::Biblio object, but it's not (yet)

purge_references

    Title     : purge_references
    Usage     : $re->purge_references();
    Function  : Purges the set of references for this enzyme
    Arguments :
    Returns   : 1

clone

    Title     : clone
    Usage     : $re->clone
    Function  : Deep copy of the object
    Arguments : -
    Returns   : new Bio::Restriction::EnzymeI object

This works as long as the object is a clean in-memory object using
scalars, arrays and hashes. You have been warned.

If you have module Storable, it is used, otherwise local code is used.
Todo: local code cuts circular references.

perl v5.8.8                     2007-05-07      Bio::Restriction::Enzyme(3)