Bio::Restriction::EnzymeI(3pm)

Bio::Restriction::EnzymeI(3)  User Contributed Perl Documentation  Bio::Restriction::EnzymeI(3)

NAME

       Bio::Restriction::EnzymeI - Interface class for restriction
       endonuclease

SYNOPSIS

         # do not run this class directly

DESCRIPTION

       This module defines methods for a single restriction endonuclease.  For
       an implementation, see Bio::Restriction::Enzyme.

FEEDBACK

   Mailing Lists
       User feedback is an integral part of the evolution of this and other
       Bioperl modules. Send your comments and suggestions preferably to one
       of the Bioperl mailing lists. Your participation is much appreciated.

         bioperl-l@bioperl.org                  - General discussion
         http://bioperl.org/wiki/Mailing_lists  - About the mailing lists

   Support
       Please direct usage questions or support issues to the mailing list:

       bioperl-l@bioperl.org

       rather than to the module maintainer directly. Many experienced and
       responsive experts will be able to look at the problem and quickly
       address it. Please include a thorough description of the problem with
       code and data examples if at all possible.

   Reporting Bugs
       Report bugs to the Bioperl bug tracking system to help us keep track
       of the bugs and their resolution. Bug reports can be submitted via the
       web:

         http://bugzilla.open-bio.org/

AUTHOR

       Heikki Lehvaslaiho, heikki-at-bioperl-dot-org

CONTRIBUTORS

       Rob Edwards, redwards@utmem.edu

APPENDIX

       Methods beginning with a leading underscore are considered private and
       are intended for internal use by this module. They are not considered
       part of the public interface and are described here for documentation
       purposes only.

Essential methods

   name
        Title    : name
        Usage    : $re->name($newval)
        Function : Gets/Sets the restriction enzyme name
        Example  : $re->name('EcoRI')
        Returns  : value of name
        Args     : newvalue (optional)

       Setting the name also cleans it up, because restriction enzyme names
       are often written inconsistently. The name should start with one upper
       case letter followed by two lower case letters (this prefix is derived
       from the organism name, e.g. EcoRI is from E. coli). What follows is
       less regular, but enzyme numbers should be Roman numerals rather than
       Arabic digits, so those are corrected. This should at least provide
       some standard.
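
       The clean-up described above can be sketched in plain Perl. This is an
       illustration only, not the module's actual code: the helper names
       normalize_name and to_roman are hypothetical, and the sketch assumes a
       name made of a three-letter prefix, a free-form middle part, and
       optional trailing Arabic digits.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a positive integer to a Roman numeral.
sub to_roman {
    my ($n) = @_;
    my @table = (
        [1000, 'M'], [900, 'CM'], [500, 'D'], [400, 'CD'],
        [100,  'C'], [90,  'XC'], [50,  'L'], [40,  'XL'],
        [10,   'X'], [9,   'IX'], [5,   'V'], [4,   'IV'], [1, 'I'],
    );
    my $roman = '';
    for my $pair (@table) {
        my ($value, $symbol) = @$pair;
        while ($n >= $value) {
            $roman .= $symbol;
            $n     -= $value;
        }
    }
    return $roman;
}

# Hypothetical helper: fix the case of the organism prefix and turn
# trailing Arabic digits into a Roman numeral. Anything else is kept.
sub normalize_name {
    my ($name) = @_;
    my ($prefix, $middle, $digits) = $name =~ /^([A-Za-z]{3})(.*?)(\d*)$/
        or return $name;    # not in the assumed shape: leave untouched
    $prefix = ucfirst(lc($prefix));
    my $roman = length($digits) ? to_roman($digits) : '';
    return $prefix . $middle . $roman;
}

print normalize_name('ecoR1'), "\n";
print normalize_name('hind3'), "\n";
```

       Digits that are not at the end of the name (as in Acc36I) are left
       alone in this sketch.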

   site
        Title     : site
        Usage     : $re->site();
        Function  : Gets/sets the recognition sequence for the enzyme.
        Example   : $seq_string = $re->site();
        Returns   : String containing recognition sequence indicating
                  : cleavage site as in  'G^AATTC'.
        Argument  : n/a
        Throws    : n/a

       Side effect: the sequence is always converted to upper case.

       The cut site can also be set by using methods cut and
       complementary_cut.

       This will pad out missing sequence with N's. For example the enzyme
       Acc36I cuts at ACCTGC(4/8). This will be returned as ACCTGCNNNN^

       Note that the common notation ACCTGC(4/8) means that the forward strand
       cut is four nucleotides after the END of the recognition site. The
       forward cut() in the coordinates used here in Acc36I ACCTGC(4/8) is at
       6+4 i.e. 10.

       ** This is the main settable method for the recognition site.
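
       The padding behaviour can be illustrated with a small stand-alone
       function. This is a sketch under stated assumptions, not the module's
       implementation: site_with_cut is a hypothetical name, and the cut
       position is assumed to be 1 or greater, counted from the first
       nucleotide of the site.

```perl
use strict;
use warnings;

# Hypothetical helper: insert '^' after $cut nucleotides, padding the
# sequence with N's when the cut lies beyond the recognition site.
sub site_with_cut {
    my ($seq, $cut) = @_;
    $seq = uc($seq);                              # always upper case
    if ($cut > length($seq)) {
        $seq .= 'N' x ($cut - length($seq));      # pad missing sequence
    }
    substr($seq, $cut, 0, '^');                   # mark the cleavage point
    return $seq;
}

print site_with_cut('gaattc', 1),  "\n";   # G^AATTC
print site_with_cut('ACCTGC', 10), "\n";   # ACCTGCNNNN^
```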

   revcom_site
        Title     : revcom_site
        Usage     : $re->revcom_site();
        Function  : Gets/sets the complementary recognition sequence for the enzyme.
        Example   : $seq_string = $re->revcom_site();
        Returns   : String containing recognition sequence indicating
                  : cleavage site as in  'G^AATTC'.
        Argument  : Sequence of the site
        Throws    : n/a

       This is the same as site, except it returns the revcom site. For
       palindromic enzymes these two are identical. For non-palindromic
       enzymes they are not!

       See also site above.

   cut
        Title     : cut
        Usage     : $num = $re->cut(1);
        Function  : Sets/gets an integer indicating the position of cleavage
                    relative to the 5' end of the recognition sequence in the
                    forward strand.

                    For type II enzymes, sets the symmetrically positioned
                    reverse strand cut site by calling complementary_cut().

        Returns   : Integer, 0 if not set
        Argument  : an integer for the forward strand cut site (optional)

       Note that the common notation ACCTGC(4/8) means that the forward strand
       cut is four nucleotides after the END of the recognition site. The
       forward cut in the coordinates used here in Acc36I ACCTGC(4/8) is at
       6+4 i.e. 10.

       Note that REBASE uses a notation where cuts within symmetric sites are
       marked by '^' within the forward sequence, but if the site is
       asymmetric the parenthesis syntax is used, where numbering ALWAYS
       starts from the last nucleotide in the forward strand. That's why AciI,
       whose site is usually written as CCGC(-3/-1), actually cuts in

         C^C G C
         G G C^G

       In our notation, these locations are 1 and 3.

       The cut locations in the notation used are relative to the first (non-
       N) nucleotide of the reported forward strand of the recognition
       sequence. The following diagram numbers the phosphodiester bonds
       (marked by + ) which can be cut by the restriction enzymes:

                                  1   2   3   4   5   6   7   8  ...
            N + N + N + N + N + G + A + C + T + G + G + N + N + N
         ... -5  -4  -3  -2  -1
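
       The conversion from the REBASE parenthesis notation to the coordinates
       used here can be sketched as follows. The function name rebase_to_cuts
       is hypothetical and this is not the module's parser. Two points are
       assumptions rather than statements from this page: the reverse strand
       number is converted the same way as the forward one, and positions
       that fall before the first nucleotide skip zero, as the diagram above
       suggests.

```perl
use strict;
use warnings;

# Hypothetical helper: turn 'SEQ(f/r)' into (SEQ, forward cut, reverse cut),
# with both cuts counted from the first nucleotide of the forward strand.
sub rebase_to_cuts {
    my ($notation) = @_;
    my ($seq, $fwd, $rev) = $notation =~ m{^([A-Za-z]+)\((-?\d+)/(-?\d+)\)$}
        or die "Not in SEQ(n/m) form: $notation\n";
    my $len = length($seq);
    my @cuts;
    for my $offset ($fwd, $rev) {
        my $pos = $len + $offset;      # REBASE counts from the END of the site
        $pos -= 1 if $pos <= 0;        # assumption: there is no bond numbered 0
        push @cuts, $pos;
    }
    return ($seq, @cuts);
}

my ($seq, $cut, $comp) = rebase_to_cuts('CCGC(-3/-1)');
print "$seq: cut=$cut complementary_cut=$comp\n";   # CCGC: cut=1 complementary_cut=3
```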

   complementary_cut
        Title     : complementary_cut
        Usage     : $num = $re->complementary_cut('1');
        Function  : Sets/Gets an integer indicating the position of cleavage
                  : on the reverse strand of the restriction site.
        Returns   : Integer
        Argument  : An integer (optional)
        Throws    : Exception if argument is non-numeric.

       This method determines the cut on the reverse strand of the sequence.
       For most enzymes this will be within the sequence, and will be set
       automatically based on the forward strand cut, but it need not be.

       Note that the returned location indicates the location AFTER the first
       non-N site nucleotide in the FORWARD strand.

Read only (usually) recognition site descriptive methods

   type
        Title     : type
        Usage     : $re->type();
        Function  : Get/set the restriction system type
        Returns   :
        Argument  : optional type: ('I'|'II'|'III')

       Restriction enzymes have been categorized into three types. Some REBASE
       formats give the type, but the following rules can be used to classify
       the known enzymes:

       1.  Bipartite site (with 6-8 Ns in the middle and the cut site is > 50
           nt away) => type I

       2.  Site length < 3  => type I

       3.  5-6 asymmetric site and cuts >20 nt away => type III

       4.  All other  => type II

       There are some enzymes in REBASE which have a bipartite recognition
       site and cut far from the site but are still classified as type I. I've
       no idea if this is really so.
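
       The four rules above can be written down as a short function. This is
       a sketch under stated assumptions, not the module's classifier:
       guess_type and revcom are hypothetical helpers, the caller is assumed
       to supply the distance between the site and the cut, "asymmetric" is
       read as "not equal to its own reverse complement", and the inputs
       below are illustrative.

```perl
use strict;
use warnings;

# Hypothetical helper: reverse complement, including IUPAC ambiguity codes.
sub revcom {
    my ($seq) = @_;
    my $rc = scalar reverse(uc($seq));
    $rc =~ tr/ACGTRYKMBDHV/TGCAYRMKVHDB/;
    return $rc;
}

# Hypothetical helper: apply rules 1-4 in order.
# $distance is the number of nucleotides between the site and the cut.
sub guess_type {
    my ($site, $distance) = @_;
    $site = uc($site);
    my $len = length($site);

    # Rule 1: bipartite site with 6-8 Ns in the middle, cut > 50 nt away
    return 'I' if $site =~ /^[^N]+N{6,8}[^N]+$/ && $distance > 50;
    # Rule 2: very short site
    return 'I' if $len < 3;
    # Rule 3: asymmetric site of 5-6 nt, cut > 20 nt away
    return 'III'
        if $len >= 5 && $len <= 6 && $site ne revcom($site) && $distance > 20;
    # Rule 4: everything else
    return 'II';
}

print guess_type('GAATTC', 0), "\n";           # II
print guess_type('AACNNNNNNGTGC', 60), "\n";   # I
print guess_type('CAGCAG', 25), "\n";          # III
```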

   seq
        Title     : seq
        Usage     : $re->seq();
        Function  : Get the Bio::PrimarySeq.pm object representing
                  : the recognition sequence
        Returns   : A Bio::PrimarySeq object representing the
                    enzyme recognition site
        Argument  : n/a
        Throws    : n/a

   string
        Title     : string
        Usage     : $re->string();
        Function  : Get a string representing the recognition sequence.
        Returns   : String. Does NOT contain a  '^' representing the cut location
                    as returned by the site() method.
        Argument  : n/a
        Throws    : n/a

   revcom
        Title     : revcom
        Usage     : $re->revcom();
        Function  : Get a string representing the reverse complement of
                  : the recognition sequence.
        Returns   : String
        Argument  : n/a
        Throws    : n/a

   recognition_length
        Title     : recognition_length
        Usage     : $re->recognition_length();
        Function  : Get the length of the RECOGNITION sequence.
                    This is the total recognition sequence,
                    including the ambiguous codes.
        Returns   : An integer
        Argument  : Nothing

       See also: non_ambiguous_length

   non_ambiguous_length
        Title     : non_ambiguous_length
        Usage     : $re->non_ambiguous_length();
        Function  : Get the nonambiguous length of the RECOGNITION sequence.
                    This is the total recognition sequence,
                    excluding the ambiguous codes.
        Returns   : An integer
        Argument  : Nothing

       See also: recognition_length

   cutter
        Title    : cutter
        Usage    : $re->cutter
        Function : Returns the "cutter" value of the recognition site.

                   This is a value relative to site length and lack of
                   ambiguity codes. Hence: 'RCATGY' is a five (5) cutter site
                   and 'CCTNAGG' a six cutter

                   This measure correlates to the frequency of the enzyme
                   cuts much better than plain recognition site length.

        Example  : $re->cutter
        Returns  : integer or float number
        Args     : none

       Why is this better than just stripping the ambiguous codes? Think about
       it like this: You have a random sequence; all nucleotides are equally
       probable. You have a four nucleotide re site. The probability of that
       site finding a match is one out of 4^4 or 256, meaning that on average
       a four cutter finds a match every 256 nucleotides. For a six cutter,
       the average fragment length is 4^6 or 4096. In the case of ambiguity
       codes the chances of finding a match are better: an R (A|G) has a 1/2
       chance of finding a match in a random sequence. Therefore, for RGCGCY
       the probability is one out of (2*4*4*4*4*2), which is exactly the same
       as for a five cutter! Cutter, although it can have non-integer values,
       turns out to be a useful and simple measure.

       From bug 2178: VHDB are ambiguity symbols that match three different
       nucleotides, so they contribute less to the effective recognition
       sequence length than e.g. Y which matches only two nucleotides. A
       symbol which matches n of the 4 nucleotides has an effective length of
       1 - log(n) / log(4).
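
       The formula above is easy to check with a few lines of stand-alone
       Perl. This is a minimal sketch rather than the module's code: the
       function is written here for illustration, and the lookup table
       assumes the standard IUPAC nucleotide ambiguity codes.

```perl
use strict;
use warnings;

# Number of nucleotides each IUPAC symbol can match
my %MATCHES = (
    A => 1, C => 1, G => 1, T => 1,
    R => 2, Y => 2, S => 2, W => 2, K => 2, M => 2,
    B => 3, D => 3, H => 3, V => 3,
    N => 4,
);

# Sum the effective length 1 - log(n)/log(4) over all symbols of the site.
sub cutter {
    my ($site) = @_;
    $site = uc($site);
    $site =~ s/\^//g;                  # ignore a cut marker if present
    my $total = 0;
    for my $symbol (split //, $site) {
        my $n = $MATCHES{$symbol}
            or die "Unknown nucleotide symbol '$symbol'\n";
        $total += 1 - log($n) / log(4);
    }
    return $total;
}

printf "%.2f\n", cutter('RCATGY');    # 5.00
printf "%.2f\n", cutter('CCTNAGG');   # 6.00
printf "%.2f\n", cutter('GAATTC');    # 6.00
```

       An unambiguous symbol contributes 1, a two-fold ambiguity 0.5, and N
       contributes nothing, which is why the values need not be integers.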

   is_palindromic
        Title     : is_palindromic
        Usage     : $re->is_palindromic();
        Function  : Determines if the recognition sequence is palindromic
                  : for the current restriction enzyme.
        Returns   : Boolean
        Argument  : n/a
        Throws    : n/a

       A palindromic site (EcoRI):

         5-GAATTC-3
         3-CTTAAG-5
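
       A site is palindromic in this sense when it equals its own reverse
       complement. The following stand-alone sketch shows the test. The
       function names are chosen for illustration and are not the module's
       internals, and the complement table assumes IUPAC ambiguity codes.

```perl
use strict;
use warnings;

# Reverse complement, including IUPAC ambiguity codes
# (S, W and N are their own complements and need no mapping).
sub revcom {
    my ($seq) = @_;
    my $rc = scalar reverse(uc($seq));
    $rc =~ tr/ACGTRYKMBDHV/TGCAYRMKVHDB/;
    return $rc;
}

# True when the site reads the same on both strands.
sub is_palindromic {
    my ($site) = @_;
    return uc($site) eq revcom($site) ? 1 : 0;
}

print is_palindromic('GAATTC'), "\n";   # 1
print is_palindromic('ACCTGC'), "\n";   # 0
```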

   overhang
        Title     : overhang
        Usage     : $re->overhang();
        Function  : Determines the overhang of the restriction enzyme
        Returns   : "5'", "3'", "blunt" or undef
        Argument  : n/a
        Throws    : n/a

       A blunt site in SmaI returns "blunt"

         5' C C C^G G G 3'
         3' G G G^C C C 5'

       A 5' overhang in EcoRI returns "5'"

         5' G^A A T T C 3'
         3' C T T A A^G 5'

       A 3' overhang in KpnI returns "3'"

         5' G G T A C^C 3'
         3' C^C A T G G 5'

   overhang_seq
        Title     : overhang_seq
        Usage     : $re->overhang_seq();
        Function  : Determines the overhang sequence of the restriction enzyme
        Returns   : a Bio::LocatableSeq
        Argument  : n/a
        Throws    : n/a

       I do not think it is necessary to create a seq object of these.
       (Heikki)

       Note: returns empty string for blunt sequences and undef for ones that
       we don't know.  Compare these:

       A blunt site in SmaI returns empty string

         5' C C C^G G G 3'
         3' G G G^C C C 5'

       A 5' overhang in EcoRI returns "AATT"

         5' G^A A T T C 3'
         3' C T T A A^G 5'

       A 3' overhang in KpnI returns "GTAC"

         5' G G T A C^C 3'
         3' C^C A T G G 5'

       Note that you need to use method overhang to decide whether it is a 5'
       or 3' overhang!!!

       Note: The overhang stuff does not work if the site is asymmetric!
       Rethink!
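
       The relation between the two cut positions and the overhang can be
       sketched in plain Perl. This is an illustration, not the module's
       implementation: the function below is hypothetical, and it assumes a
       symmetric site with both cuts inside the recognition sequence, given
       in the forward strand coordinates used by cut and complementary_cut.

```perl
use strict;
use warnings;

# Hypothetical helper: return (overhang type, overhang sequence) for a
# site string without '^' and its forward and reverse strand cuts.
sub overhang {
    my ($site, $cut, $comp_cut) = @_;
    my $type = $cut < $comp_cut ? "5'"
             : $cut > $comp_cut ? "3'"
             :                    'blunt';
    # The single-stranded part lies between the two cut positions
    my ($start, $end) = $cut < $comp_cut ? ($cut, $comp_cut)
                                         : ($comp_cut, $cut);
    my $seq = substr(uc($site), $start, $end - $start);
    return ($type, $seq);
}

for my $enzyme (['SmaI',  'CCCGGG', 3, 3],
                ['EcoRI', 'GAATTC', 1, 5],
                ['KpnI',  'GGTACC', 5, 1]) {
    my ($name, $site, $cut, $comp_cut) = @$enzyme;
    my ($type, $seq) = overhang($site, $cut, $comp_cut);
    print "$name: $type '$seq'\n";
}
```

       For the three enzymes drawn above this gives blunt with an empty
       string, 5' with AATT, and 3' with GTAC.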

   compatible_ends
        Title     : compatible_ends
        Usage     : $re->compatible_ends($re2);
        Function  : Determines if the two restriction enzyme cut sites
                     have compatible ends.
        Returns   : 0 if not, 1 if only one pair of ends matches, 2 if both
                    pairs match.
        Argument  : a Bio::Restriction::Enzyme
        Throws    : if the argument is not a Bio::Restriction::Enzyme or
                    if there are Ns in the overhangs

       In case of type II enzymes which cut symmetrically, this function can
       be considered to return a boolean value.

   is_ambiguous
        Title     : is_ambiguous
        Usage     : $re->is_ambiguous();
        Function  : Determines if the restriction enzyme contains ambiguous sequences
        Returns   : Boolean
        Argument  : n/a
        Throws    : n/a

Additional methods from Rebase

   is_prototype
        Title    : is_prototype
        Usage    : $re->is_prototype
        Function : Get/Set method for finding out if this enzyme is a prototype
        Example  : $re->is_prototype(1)
        Returns  : Boolean
        Args     : none

       Prototype enzymes are the most commonly available and usually the
       first enzymes discovered that have the same recognition site. Using
       only prototype enzymes in restriction analysis avoids redundancy and
       speeds things up.

   prototype_name
        Title    : prototype_name
        Usage    : $re->prototype_name
        Function : Get/Set method for the name of prototype for
                   this enzyme's recognition site
        Example  : $re->prototype_name(1)
        Returns  : prototype enzyme name string or an empty string
        Args     : optional prototype enzyme name string

       If the enzyme itself is the prototype, its own name is returned.  To
       avoid confusing the negative result with an unset value, use method
       is_prototype.

       This method is called prototype_name rather than prototype, because it
       returns a string rather than an object.

   isoschizomers
        Title     : isoschizomers
        Usage     : $re->isoschizomers(@list);
        Function  : Gets/Sets a list of known isoschizomers (enzymes that
                    recognize the same site, but don't necessarily cut at
                    the same position).
        Arguments : A reference to an array that contains the isoschizomers
        Returns   : A reference to an array of the known isoschizomers or 0
                    if not defined.

       Added for compatibility with REBASE

   purge_isoschizomers
        Title     : purge_isoschizomers
        Usage     : $re->purge_isoschizomers();
        Function  : Purges the set of isoschizomers for this enzyme
        Arguments :
        Returns   : 1

   methylation_sites
        Title     : methylation_sites
        Usage     : $re->methylation_sites(\%sites);
        Function  : Gets/Sets known methylation sites (positions on the sequence
                    that get modified to promote or prevent cleavage).
        Arguments : A reference to a hash that contains the methylation sites
        Returns   : A reference to a hash of the methylation sites or
                    an empty string if not defined.

       There are three types of methylation sites:

       ·  (6) = N6-methyladenosine

       ·  (5) = 5-methylcytosine

       ·  (4) = N4-methylcytosine

       These are stored as 6, 5, and 4 respectively.  The hash has the
       sequence position as the key and the type of methylation as the value.
       A negative number in the sequence position indicates that the DNA is
       methylated on the complementary strand.

       Note that in REBASE, the methylation positions are given Added for
       compatibility to REBASE.
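
       The layout of the methylation hash can be illustrated without the
       module. This is a sketch only: describe_methylation is a hypothetical
       helper, and the positions in %sites are made up to show the data
       structure, not taken from REBASE.

```perl
use strict;
use warnings;

# Methylation type codes as listed above
my %METHYLATION_TYPE = (
    6 => 'N6-methyladenosine',
    5 => '5-methylcytosine',
    4 => 'N4-methylcytosine',
);

# Hypothetical helper: turn a {position => type} hash reference into
# readable lines. A negative position means the complementary strand.
sub describe_methylation {
    my ($sites) = @_;
    my @lines;
    for my $pos (sort { $a <=> $b } keys %$sites) {
        my $type   = $METHYLATION_TYPE{ $sites->{$pos} } // 'unknown type';
        my $strand = $pos < 0 ? 'complementary' : 'forward';
        push @lines,
            sprintf('position %d (%s strand): %s', abs($pos), $strand, $type);
    }
    return @lines;
}

# Made-up positions, only to show the shape of the hash
my %sites = (3 => 6, -4 => 6);
print "$_\n" for describe_methylation(\%sites);
```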

   purge_methylation_sites
        Title     : purge_methylation_sites
        Usage     : $re->purge_methylation_sites();
        Function  : Purges the set of methylation_sites for this enzyme
        Arguments :
        Returns   :

   microbe
        Title     : microbe
        Usage     : $re->microbe($microbe);
        Function  : Gets/Sets microorganism where the restriction enzyme was found
        Arguments : A scalar containing the microbe's name
        Returns   : A scalar containing the microbe's name or 0 if not defined

       Added for compatibility with REBASE

   source
        Title     : source
        Usage     : $re->source('Rob Edwards');
        Function  : Gets/Sets the person who provided the enzyme
        Arguments : A scalar containing the person's name
        Returns   : A scalar containing the person's name or 0 if not defined

       Added for compatibility with REBASE

   vendors
        Title     : vendors
        Usage     : $re->vendors(@list_of_companies);
        Function  : Gets/Sets a list of companies that you can get the enzyme from.
                    Also sets the commercially_available boolean
        Arguments : A reference to an array containing the names of companies
                    that you can get the enzyme from
        Returns   : A reference to an array containing the names of companies
                    that you can get the enzyme from

       Added for compatibility with REBASE

   purge_vendors
        Title     : purge_vendors
        Usage     : $re->purge_vendors();
        Function  : Purges the set of vendors for this enzyme
        Arguments :
        Returns   :

   vendor
        Title     : vendor
        Usage     : $re->vendor(@list_of_companies);
        Function  : Gets/Sets a list of companies that you can get the enzyme from.
                    Also sets the commercially_available boolean
        Arguments : A reference to an array containing the names of companies
                    that you can get the enzyme from
        Returns   : A reference to an array containing the names of companies
                    that you can get the enzyme from

       Added for compatibility with REBASE

   references
        Title     : references
        Usage     : $re->references(string);
        Function  : Gets/Sets the references for this enzyme
        Arguments : an array of string reference(s) (optional)
        Returns   : an array of references

       Use purge_references to reset the list of references

       This should be a Bio::Biblio or Bio::Annotation::Reference object, but
       it's not (yet)

   purge_references
        Title     : purge_references
        Usage     : $re->purge_references();
        Function  : Purges the set of references for this enzyme
        Arguments :
        Returns   : 1

   clone
        Title     : clone
        Usage     : $re->clone
        Function  : Deep copy of the object
        Arguments : -
        Returns   : new Bio::Restriction::EnzymeI object

       This works as long as the object is a clean in-memory object using
       scalars, arrays and hashes. You have been warned.

       If you have module Storable, it is used, otherwise local code is used.
       Todo: local code cuts circular references.

perl v5.12.0                      2010-04-29      Bio::Restriction::EnzymeI(3)