GO::AnnotationProvider::AnnotationParser(3)  User Contributed Perl Documentation  GO::AnnotationProvider::AnnotationParser(3)

NAME

       GO::AnnotationProvider::AnnotationParser - parses a gene annotation
       file

SYNOPSIS

       GO::AnnotationProvider::AnnotationParser - reads a Gene Ontology gene
       associations file, and provides methods by which to retrieve the GO
       annotations for an annotated entity.  Note, it is case insensitive,
       with some caveats - see documentation below.

           my $annotationParser = GO::AnnotationProvider::AnnotationParser->new(annotationFile => "data/gene_association.sgd");

           my $geneName = "AAT2";

           # goIdsByName returns an array reference (or undef for an unknown name)
           my $goIdsRef = $annotationParser->goIdsByName(name   => $geneName,
                                                         aspect => 'P');

           print "GO associations for gene: ", join (" ", @{ $goIdsRef || [] }), "\n";

           print "Database ID for gene: ", $annotationParser->databaseIdByName($geneName), "\n";

           print "Database name: ", $annotationParser->databaseName(), "\n";

           print "Standard name for gene: ", $annotationParser->standardNameByName($geneName), "\n";

           my @geneNames = $annotationParser->allStandardNames();

           foreach my $i (0..10) {

               print "$geneNames[$i]\n";

           }

DESCRIPTION

       GO::AnnotationProvider::AnnotationParser is a concrete subclass of
       GO::AnnotationProvider, and creates a data structure mapping gene names
       to GO annotations by parsing a file of annotations provided by the Gene
       Ontology Consortium.

       This package provides object methods for retrieving GO annotations that
       have been parsed from a 'gene associations' file, provided by the Gene
       Ontology Consortium.  The format for the file is:

       Lines beginning with a '!' character are comment lines.

           Column  Cardinality   Contents
           ------  -----------   -------------------------------------------------------------
               0       1         Database abbreviation for the source of annotation (e.g. SGD)
               1       1         Database identifier of the annotated entity
               2       1         Standard name of the annotated entity
               3       0,1       NOT (if a gene is specifically NOT annotated to the term)
               4       1         GOID of the annotation
               5       1,n       Reference(s) for the annotation
               6       1         Evidence code for the annotation
               7       0,n       With or From (a bit mysterious)
               8       1         Aspect of the Annotation (C, F, P)
               9       0,1       Name of the product being annotated
              10       0,n       Alias(es) of the annotated product
              11       1         type of annotated entity (one of gene, transcript, protein)
              12       1,2       taxonomic id of the organism encoding and/or using the product
              13       1         Date of annotation YYYYMMDD
              14       1         Assigned_by : The database which made the annotation

       Columns are separated by tabs.  For those entries with a cardinality
       greater than 1, multiple entries are pipe (|) delimited.

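       As a rough illustration of the format described above, the following
       sketch splits one line into its columns using only core Perl.  The
       helper name parse_association_line and the sample record are made up
       for this example, and this is not the parsing code used by the module
       itself.

```perl
use strict;
use warnings;

# Columns that may hold several pipe-delimited values, per the table above.
my %MULTI_VALUED = map { $_ => 1 } (5, 7, 10, 12);

# Hypothetical helper: split one line of a gene associations file into its
# columns.  Returns nothing for comment lines (starting with '!') and for
# blank lines.
sub parse_association_line {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^!/ || $line !~ /\S/;

    # The -1 limit keeps trailing empty columns instead of dropping them.
    my @columns = split /\t/, $line, -1;

    # Turn each multi-valued column into a reference to an array of values.
    foreach my $i (grep { $_ <= $#columns } keys %MULTI_VALUED) {
        $columns[$i] = [ split /\|/, $columns[$i] ];
    }
    return \@columns;
}

# A made-up record, used only to show the shape of the result.
my $example = join "\t", 'SGD', 'S000001', 'ABC1', '', 'GO:0000001',
    'REF:1|REF:2', 'IDA', '', 'P', 'example product', 'alias1|alias2',
    'gene', 'taxon:4932', '20080513', 'SGD';

my $record = parse_association_line($example);
print "GOID:    $record->[4]\n";
print "Aliases: @{ $record->[10] }\n";
```

       Empty optional columns (such as column 7 here) come back as references
       to empty arrays, so callers can treat every multi-valued column the
       same way.
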
       Further details can be found at:

       http://www.geneontology.org/doc/GO.annotation.html#file

       The following assumptions about the file are made (and should be true):

           1.  All aliases appear for all entries of a given annotated product
           2.  The database identifiers are unique, in that two different
               entities cannot have the same database id.

TODO

       Also see the TODO list in the parent, GO::AnnotationProvider.

        1.  Add in methods that will allow retrieval of evidence codes with
            the annotations for a particular entity.

        2.  Add in methods that return all the annotated entities for a
            particular GOID.

        3.  Add in the ability to request only annotations either including
            or excluding particular evidence codes.  Such evidence codes
            could be provided as an anonymous array as the value of a named
            argument.

        4.  Same as number 3, except allow the retrieval of annotated
            entities for a particular GOID, based on inclusion or exclusion
            of certain evidence codes.

        These first four items will require a reworking of how data are
        stored on the backend, and thus the parsing code itself, though it
        should not affect any of the already existing API.

        5.  Instead of 'use'ing Storable, 'require' it instead, only at the
            point of use, which will mean that AnnotationParser can be
            happily used in the absence of Storable, just without those
            functions that need it.

        6.  Extend the ValidateFile class method to check that an entity
            is never annotated to the same node twice, with the same
            evidence, with the same reference.

        7.  An additional checker, that uses an AnnotationProvider in
            conjunction with an OntologyProvider, would be useful, that
            checks that some of the annotations themselves are valid, i.e.
            that no entities are annotated to the 'unknown' node in a
            particular aspect, and also to another node within that same
            aspect.  Can annotations be redundant? i.e., if an entity is
            annotated to a node, and an ancestor of the node, is that
            annotation redundant?  Does it depend on the evidence codes and
            references?  Or are such annotations reinforcing?  These things
            are useful to consider when formulating the confidence which can
            be attributed to an annotation.

Class Methods

   Usage
       This class method simply prints out a usage statement, along with an
       error message, if one was passed in.

       Usage :

           GO::AnnotationProvider::AnnotationParser->Usage();

   ValidateFile
       This class method reads an annotation file, and returns a reference to
       an array of errors that are present within the file.  The errors are
       simply strings, each beginning with "Line $lineNo : " where $lineNo is
       the number of the line in the file where the error was found.

       Usage:

           my $errorsRef = GO::AnnotationProvider::AnnotationParser->ValidateFile(annotationFile => $file);

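       Because each error string starts with the documented "Line $lineNo : "
       prefix, the affected line numbers can be recovered from the returned
       array.  The sketch below assumes nothing beyond that prefix; the
       helper name error_line_numbers and the sample error strings are
       invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the distinct line numbers out of the error
# strings returned by ValidateFile, relying only on the "Line N : " prefix.
# Intended to be called in list context.
sub error_line_numbers {
    my ($errorsRef) = @_;
    my %seen;
    return sort { $a <=> $b }
           grep { !$seen{$_}++ }
           map  { /^Line (\d+) : / ? $1 : () } @{$errorsRef};
}

# Made-up error strings in the documented format.
my @sampleErrors = (
    "Line 12 : example problem",
    "Line 3 : another example problem",
    "Line 12 : a second problem on the same line",
);

print join(", ", error_line_numbers(\@sampleErrors)), "\n";
```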

Constructor

   new
       This is the constructor for an AnnotationParser object.

       The constructor expects one of two arguments, either an 'annotationFile'
       argument, or an 'objectFile' argument.  When instantiated with an
       annotationFile argument, it expects it to correspond to an annotation
       file created by one of the GO consortium members, according to their
       file format.  When instantiated with an objectFile argument, it expects
       to open a previously created annotationParser object that has been
       serialized to disk (see the serializeToDisk method).

       Usage:

           my $annotationParser = GO::AnnotationProvider::AnnotationParser->new(annotationFile => $file);

           my $annotationParser = GO::AnnotationProvider::AnnotationParser->new(objectFile => $file);

Public instance methods

Some methods dealing with ambiguous names

       Because there are many names by which an annotated entity may be
       referred to, that are non-unique, there exists a set of methods for
       determining whether a name is ambiguous, and to what database
       identifiers such ambiguous names may refer.

       Note that the AnnotationParser is now case insensitive, but with some
       caveats.  For instance, you can use 'cdc6' to retrieve data for CDC6.
       However, if one gene has been referred to as abc1, and another
       referred to as ABC1, then these are treated as different, and
       unambiguous.  However, the text 'Abc1' would be considered ambiguous,
       because it could refer to either.  On the other hand, if a single gene
       is referred to as XYZ1 and xyz1, and no other genes have that name (in
       any casing), then Xyz1 would still be considered unambiguous.

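       The casing rules above can be modelled with two lookup tables, one
       keyed on the exact name and one on the upper-cased name.  This is only
       an illustrative model written for this page, not the module's internal
       data structure; add_name, is_ambiguous and the ids ID1 to ID3 are
       hypothetical.

```perl
use strict;
use warnings;

# Illustrative index: names are recorded both with their exact casing and
# upper-cased, each mapping to the set of database ids seen for them.
my (%idsByExactName, %idsByUpperName);

sub add_name {
    my ($name, $databaseId) = @_;
    $idsByExactName{$name}{$databaseId}    = 1;
    $idsByUpperName{uc $name}{$databaseId} = 1;
    return;
}

# A name seen with this exact casing is judged on that casing alone;
# otherwise all casings of the name are pooled.
sub is_ambiguous {
    my ($name) = @_;
    my $ids = $idsByExactName{$name} || $idsByUpperName{uc $name} || {};
    return keys(%{$ids}) > 1 ? 1 : 0;
}

add_name('abc1', 'ID1');   # two different genes that differ only by case
add_name('ABC1', 'ID2');
add_name('XYZ1', 'ID3');   # one gene seen under two casings
add_name('xyz1', 'ID3');

foreach my $query (qw(abc1 ABC1 Abc1 Xyz1)) {
    printf "%-5s %s\n", $query, is_ambiguous($query) ? 'ambiguous' : 'unambiguous';
}
```

       In this model only 'Abc1' is reported as ambiguous, which matches the
       examples given in the paragraph above.
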
   nameIsAmbiguous
       This public method returns a boolean to indicate whether a name is
       ambiguous, i.e. whether the name might map to more than one entity (and
       therefore more than one databaseId).

       NB: API change:

       nameIsAmbiguous is now case insensitive - that is, if there is a name
       that is used twice using different casing, that will be treated as
       ambiguous.  Previous versions would not have treated these as
       ambiguous.  In the case that a name is provided in a certain casing,
       which was encountered only once, then it will be treated as
       unambiguous.  This is the price of wanting a case insensitive
       annotation parser...

       Usage:

           if ($annotationParser->nameIsAmbiguous($name)){

               # do something useful....or not....

           }

   databaseIdsForAmbiguousName
       This public method returns an array of database identifiers for an
       ambiguous name.  If the name is not ambiguous, an empty list will be
       returned.

       NB: API change:

       databaseIdsForAmbiguousName is now case insensitive - that is, if there
       is a name that is used twice using different casing, that will be
       treated as ambiguous.  Previous versions would not have treated these
       as ambiguous.  However, if the name provided is of the exact casing as
       a name that appeared only once with that exact casing, then it is
       treated as unambiguous.  This is the price of wanting a case insensitive
       annotation parser...

       Usage:

           my @databaseIds = $annotationParser->databaseIdsForAmbiguousName($name);

   ambiguousNames
       This method returns an array of names, which from the annotation file
       have been deemed to be ambiguous.

       Note - even though we have made the annotation parser case insensitive,
       if something appeared in the annotations file as BLAH1 and blah1, we
       would not deem either of these to be ambiguous.  However, if it
       appeared as blah1 twice, referring to two different genes, then blah1
       would be ambiguous.

       Usage:

           my @ambiguousNames = $annotationParser->ambiguousNames();

Methods for retrieving GO annotations for entities

   goIdsByDatabaseId
       This public method returns a reference to an array of GOIDs that are
       associated with the supplied databaseId for a specific aspect.  If no
       annotations are associated with that databaseId in that aspect, then a
       reference to an empty array will be returned.  If the databaseId is not
       recognized, then undef will be returned.  In the case that a databaseId
       is ambiguous (for instance the same databaseId exists but with
       different casings) then if the supplied database id matches the exact
       case of one of those supplied, then that is the one it will be treated
       as.  In the case where the databaseId matches none of the possibilities
       by case, then a fatal error will occur, because the provided databaseId
       was ambiguous.

       Usage:

           my $goidsRef = $annotationParser->goIdsByDatabaseId(databaseId => $databaseId,
                                                               aspect     => $aspect);  # 'P', 'F' or 'C'

   goIdsByStandardName
       This public method returns a reference to an array of GOIDs that are
       associated with the supplied standardName for a specific aspect.  If no
       annotations are associated with the entity with that standard name in
       that aspect, then a reference to an empty list will be returned.  If
       the supplied name is not used as a standard name, then undef will be
       returned.  In the case that the supplied standardName is ambiguous (for
       instance the same standardName exists but with different casings) then
       if the supplied standardName matches the exact case of one of those
       supplied, then that is the one it will be treated as.  In the case
       where the standardName matches none of the possibilities by case, then
       a fatal error will occur, because the provided standardName was
       ambiguous.

       Usage:

           my $goidsRef = $annotationParser->goIdsByStandardName(standardName => $standardName,
                                                                 aspect       => $aspect);  # 'P', 'F' or 'C'

   goIdsByName
       This public method returns a reference to an array of GO IDs that are
       associated with the supplied name for a specific aspect.  If there are
       no GO associations for the entity corresponding to the supplied name in
       the provided aspect, then a reference to an empty list will be
       returned.  If the supplied name does not correspond to any entity, then
       undef will be returned.  Because the name can be any of the databaseId,
       the standard name, or any of the aliases, it is possible that the name
       might be ambiguous.  Clients of this object should first test whether
       the name they are using is ambiguous, using the nameIsAmbiguous()
       method, and handle it accordingly.  If an ambiguous name is supplied,
       then the method will die.

       NB: API change:

       goIdsByName is now case insensitive - that is, if there is a name that
       is used twice using different casing, that will be treated as
       ambiguous.  Previous versions would not have treated these as
       ambiguous.  This is the price of wanting a case insensitive annotation
       parser.  In the event that a name is provided that is ambiguous because
       of case, if it matches exactly the case of one of the possible matches,
       it will be treated unambiguously.

       Usage:

           my $goidsRef = $annotationParser->goIdsByName(name   => $name,
                                                         aspect => $aspect);  # 'P', 'F' or 'C'

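       One way to follow the advice above is to test for ambiguity first and,
       for an ambiguous name, fall back to one lookup per database id.  The
       sketch below uses only methods documented on this page; the helper
       name go_ids_for_any_name is hypothetical, and $parser is assumed to be
       an AnnotationParser object.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash reference mapping each database id that
# $name may refer to onto its array reference of GOIDs for $aspect.
sub go_ids_for_any_name {
    my ($parser, $name, $aspect) = @_;

    my @databaseIds;
    if ($parser->nameIsAmbiguous($name)) {
        # several entities share this name, so look each one up by id
        @databaseIds = $parser->databaseIdsForAmbiguousName($name);
    }
    else {
        # undef here means the name is not known at all
        my $databaseId = $parser->databaseIdByName($name);
        @databaseIds = defined $databaseId ? ($databaseId) : ();
    }

    my %goIdsFor;
    foreach my $databaseId (@databaseIds) {
        # an undef result is stored as a reference to an empty array
        $goIdsFor{$databaseId} = $parser->goIdsByDatabaseId(databaseId => $databaseId,
                                                            aspect     => $aspect) || [];
    }
    return \%goIdsFor;
}
```

       An unknown name yields an empty hash, so the caller can loop over the
       result without special cases.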

Methods for mapping different types of name to each other

   standardNameByDatabaseId
       This method returns the standard name for a database id.

       NB: API change

       standardNameByDatabaseId is now case insensitive - that is, if there is
       a databaseId that is used twice (or more) using different casing, it
       will be treated as ambiguous.  Previous versions would not have treated
       these as ambiguous.  This is the price of wanting a case insensitive
       annotation parser.  In the event that a name is provided that is
       ambiguous because of case, if it matches exactly the case of one of the
       possible matches, it will be treated unambiguously.

       Usage:

           my $standardName = $annotationParser->standardNameByDatabaseId($databaseId);

   databaseIdByStandardName
       This method returns the database id for a standard name.

       NB: API change

       databaseIdByStandardName is now case insensitive - that is, if there is
       a standard name that is used twice (or more) using different casing, it
       will be treated as ambiguous.  Previous versions would not have treated
       these as ambiguous.  This is the price of wanting a case insensitive
       annotation parser.  In the event that a name is provided that is
       ambiguous because of case, if it matches exactly the case of one of the
       possible matches, it will be treated unambiguously.

       Usage:

           my $databaseId = $annotationParser->databaseIdByStandardName($standardName);

   databaseIdByName
       This method returns the database id for any identifier for a gene (e.g.
       by databaseId itself, by standard name, or by alias).  If the used name
       is ambiguous, then the program will die.  Thus clients should call the
       nameIsAmbiguous() method, prior to using this method.  If the name does
       not map to any databaseId, then undef will be returned.

       NB: API change

       databaseIdByName is now case insensitive - that is, if there is a name
       that is used twice using different casing, that will be treated as
       ambiguous.  Previous versions would not have treated these as
       ambiguous.  This is the price of wanting a case insensitive annotation
       parser.  In the event that a name is provided that is ambiguous because
       of case, if it matches exactly the case of one of the possible matches,
       it will be treated unambiguously.

       Usage:

           my $databaseId = $annotationParser->databaseIdByName($name);

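       Where calling nameIsAmbiguous() first is inconvenient, the fatal error
       can instead be trapped with eval.  This is a sketch of one possible
       wrapper, not part of the module; try_database_id is a hypothetical
       name and $parser is assumed to be an AnnotationParser object.

```perl
use strict;
use warnings;

# Hypothetical wrapper: returns ($databaseId, $error).  $error is set only
# when databaseIdByName died, e.g. because $name was ambiguous.  An unknown
# name gives (undef, undef), as the method itself returns undef for it.
sub try_database_id {
    my ($parser, $name) = @_;
    my $databaseId;
    my $ok = eval { $databaseId = $parser->databaseIdByName($name); 1 };
    return $ok ? ($databaseId, undef) : (undef, $@ || 'unknown error');
}
```
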
   standardNameByName
       This public method returns the standard name for the gene specified
       by the given name.  Because a name may be ambiguous, the
       nameIsAmbiguous() method should be called first.  If an ambiguous name
       is supplied, then it will die with an appropriate error message.  If
       the name does not map to a standard name, then undef will be returned.

       NB: API change

       standardNameByName is now case insensitive - that is, if there is a
       name that is used twice using different casing, that will be treated as
       ambiguous.  Previous versions would not have treated these as
       ambiguous.  This is the price of wanting a case insensitive annotation
       parser.

       Usage:

           my $standardName = $annotationParser->standardNameByName($name);

Other methods relating to names

   nameIsStandardName
       This method returns a boolean to indicate whether the supplied name is
       used as a standard name.

       NB : API change.

       This is now case insensitive.  If you provide abC1, and ABc1 is a
       standard name, then it will return true.

       Usage :

           if ($annotationParser->nameIsStandardName($name)){

               # do something

           }

   nameIsDatabaseId
       This method returns a boolean to indicate whether the supplied name is
       used as a database id.

       NB : API change.

       This is now case insensitive.  If you provide abC1, and ABc1 is a
       database id, then it will return true.

       Usage :

           if ($annotationParser->nameIsDatabaseId($name)){

               # do something

           }

   nameIsAnnotated
       This method returns a boolean to indicate whether the supplied name has
       any annotations, either when considered as a databaseId, a
       standardName, or an alias.  If an aspect is also supplied, then it
       indicates whether that name has any annotations in that aspect only.

       NB: API change.

       This is now case insensitive.  If you provide abC1, and ABc1 has
       annotations, then it will return true.

       Usage :

           if ($annotationParser->nameIsAnnotated(name => $name)){

               # blah

           }

       or:

           if ($annotationParser->nameIsAnnotated(name   => $name,
                                                  aspect => $aspect)){

               # blah

           }

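       The aspect form can be applied to each of the three aspect codes in
       turn to find where a name has annotations.  A minimal sketch, assuming
       $parser is an AnnotationParser object; annotated_aspects is a
       hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the aspect codes (from P, F and C) in which
# $name has at least one annotation, using only nameIsAnnotated.
sub annotated_aspects {
    my ($parser, $name) = @_;
    return grep { $parser->nameIsAnnotated(name => $name, aspect => $_) } qw(P F C);
}
```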

Other public methods

   databaseName
       This method returns the name of the annotating authority from the file
       that was supplied to the constructor.

       Usage :

           my $databaseName = $annotationParser->databaseName();

   numAnnotatedGenes
       This method returns the number of entities in the annotation file that
       have annotations in the supplied aspect.  If no aspect is provided,
       then it will return the number of genes with an annotation in at least
       one aspect of GO.

       Usage:

           my $numAnnotatedGenes = $annotationParser->numAnnotatedGenes();

           my $numAnnotatedGenes = $annotationParser->numAnnotatedGenes($aspect);

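       Both call forms can be combined into a small per-aspect summary.  The
       sketch below is an illustration only; annotation_counts and the 'any'
       key are made up here, and $parser is assumed to be an AnnotationParser
       object.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash reference with one count per aspect
# code, plus 'any' for genes annotated in at least one aspect.
sub annotation_counts {
    my ($parser) = @_;
    my %count = map { $_ => $parser->numAnnotatedGenes($_) } qw(P F C);
    $count{any} = $parser->numAnnotatedGenes();
    return \%count;
}
```
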
   allDatabaseIds
       This public method returns an array of all the database identifiers.

       Usage:

           my @databaseIds = $annotationParser->allDatabaseIds();

   allStandardNames
       This public method returns an array of all standard names.

       Usage:

           my @standardNames = $annotationParser->allStandardNames();

Methods to do with files

   file
       This method returns the name of the file that was used to instantiate
       the object.

       Usage:

           my $file = $annotationParser->file;

   serializeToDisk
       This public method saves the current state of the Annotation Parser
       Object to a file, using the Storable package.  The data are saved in
       network order for portability, just in case.  The name of the object
       file is returned.  By default, the name of the original file will be
       used to make the name of the object file (including the full path from
       where the file came), or the client can instead supply their own
       filename.

       Usage:

           my $fileName = $annotationParser->serializeToDisk;

           my $fileName = $annotationParser->serializeToDisk(filename => $filename);

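       Together with the objectFile form of the constructor, this allows a
       simple caching pattern: parse the annotation file once, then reload
       the serialized object on later runs.  The following is a sketch under
       that assumption; load_or_build_parser is a hypothetical helper, and
       $class is assumed to be 'GO::AnnotationProvider::AnnotationParser'.

```perl
use strict;
use warnings;

# Hypothetical helper: reuse a serialized parser when the object file
# exists, otherwise parse the annotation file and serialize the result
# for next time.
sub load_or_build_parser {
    my ($class, $annotationFile, $objectFile) = @_;

    return $class->new(objectFile => $objectFile) if -e $objectFile;

    my $parser = $class->new(annotationFile => $annotationFile);
    $parser->serializeToDisk(filename => $objectFile);
    return $parser;
}
```

       This sketch does not check whether the annotation file has changed
       since the object file was written, so a stale object file would need
       to be removed by hand.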

Modifications

       CVS info is listed here:

        # $Author: sherlock $
        # $Date: 2008/05/13 23:06:16 $
        # $Log: AnnotationParser.pm,v $
        # Revision 1.35  2008/05/13 23:06:16  sherlock
        # updated to fix bug with querying with a name that was unambiguous when
        # taking its casing into account.
        #
        # Revision 1.34  2007/03/18 03:09:05  sherlock
        # couple of PerlCritic suggested improvements, and an extra check to
        # make sure that the cardinality between standard names and database ids
        # is 1:1
        #
        # Revision 1.33  2006/07/28 00:02:14  sherlock
        # fixed a couple of typos
        #
        # Revision 1.32  2004/07/28 17:12:10  sherlock
        # bumped version
        #
        # Revision 1.31  2004/07/28 17:03:49  sherlock
        # fixed bugs when calling goidsByDatabaseId instead of goIdsByDatabaseId
        # on lines 1592 and 1617 - thanks to lfriedl@cs.umass.edu for spotting this.
        #
        # Revision 1.30  2003/11/26 18:44:28  sherlock
        # finished making all the changes that were required to make it case
        # insensitive, and modified POD accordingly.  It appears to all work as
        # expected...
        #
        # Revision 1.29  2003/11/22 00:05:05  sherlock
        # made a very large number of changes to make much of it
        # case-insensitive, such that using CDC6 or cdc6 amounts to the same
        # query, as long as both versions of that name don't exist in the
        # annotations file.  Still needs a little work to allow names that are
        # potentially ambiguous to be not ambiguous, if their casing matches
        # exactly one form of the name that has been seen.  Have started to
        # update test suite to check all the case insensitive stuff, but is not
        # yet finished.
        #
        #

AUTHORS

       Elizabeth Boyle, ell@mit.edu

       Gavin Sherlock,  sherlock@genome.stanford.edu

perl v5.28.0                      2008-05-13                      GO::AnnotationProvider::AnnotationParser(3)