bp_unflatten_seq.pl(1)

BP_UNFLATTEN_SEQ(1)   User Contributed Perl Documentation  BP_UNFLATTEN_SEQ(1)

NAME

       unflatten_seq - unflatten a genbank or genbank-style feature file into
       a nested SeqFeature hierarchy

SYNOPSIS

         unflatten_seq.PLS -e 3 -gff ~/cvs/bioperl-live/t/data/AE003644_Adh-genomic.gb

         unflatten_seq.PLS --detail ~/cvs/bioperl-live/t/data/AE003644_Adh-genomic.gb

         unflatten_seq.PLS -i foo.embl --from embl --to chadoxml -o out.chado.xml

         unflatten_seq.PLS --notypemap --detail --to asciitree -ethresh 2 AE003644_Adh-genomic.gb

DESCRIPTION

       This script will unflatten a GenBank or GenBank-style file of
       SeqFeatures into a nested hierarchy.

       See Bio::SeqFeature::Tools::Unflattener

       In a GenBank/EMBL representation, features are 'flat': for example,
       there is no link between an mRNA and a CDS, other than implicit links
       (e.g. via tags or via splice site coordinates), which may be hard to
       code for.

       This is most easily illustrated with the default output format,
       asciitree.

       An unflattened GenBank feature set may look like this (AB077698):

         Seq: AB077698
           databank_entry                                   1..2701[+]
           gene
             mRNA
               CDS hCHCR-G                                  80..1144[+]
               exon                                         80..1144[+]
             five_prime_UTR                                 1..79[+]
             located_sequence_feature                       137..196[+]
             located_sequence_feature                       239..292[+]
             located_sequence_feature                       617..676[+]
             located_sequence_feature                       725..778[+]
             three_prime_UTR                                1145..2659[+]
             polyA_site                                     1606..1606[+]
             polyA_site                                     2660..2660[+]

       Or like this (portion of AE003734):

         gene
           mRNA CG3320-RA
             CDS CG3320-PA                                53126..54971[-]
             exon                                         52204..53323[-]
             exon                                         53404..53631[-]
             exon                                         53688..53735[-]
             exon                                         53798..53918[-]
             exon                                         54949..55287[-]
           mRNA CG3320-RB
             CDS CG3320-PB                                53383..54971[-]
             exon                                         52204..53631[-]
             exon                                         53688..53735[-]
             exon                                         53798..53918[-]
             exon                                         54949..55287[-]
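
       The idea of turning a flat feature list into a tree can be sketched
       in a few lines of Python. This is a toy illustration only, not the
       algorithm used by Bio::SeqFeature::Tools::Unflattener: the Feature
       class, the PARENT_TYPE table, the coordinate-containment rule and the
       coordinates themselves are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """Hypothetical stand-in for a sequence feature (not a BioPerl class)."""
    type: str
    start: int
    end: int
    children: list = field(default_factory=list)


# Assumed containment rules for this sketch: child type -> parent type.
PARENT_TYPE = {"mRNA": "gene", "CDS": "mRNA", "exon": "mRNA"}


def unflatten(flat):
    """Nest each feature under the first feature of its parent type that spans it."""
    roots = []
    for feat in flat:
        wanted = PARENT_TYPE.get(feat.type)
        parent = next(
            (p for p in flat
             if p.type == wanted and p.start <= feat.start and feat.end <= p.end),
            None,
        )
        # Features with no matching parent stay at the top level.
        (parent.children if parent else roots).append(feat)
    return roots


def asciitree(feats, depth=0):
    """Render the hierarchy as indented lines, two spaces per level."""
    lines = []
    for f in feats:
        lines.append(f"{'  ' * depth}{f.type} {f.start}..{f.end}")
        lines.extend(asciitree(f.children, depth + 1))
    return lines


# Toy coordinates, invented for the example.
flat = [
    Feature("gene", 1, 100),
    Feature("mRNA", 1, 100),
    Feature("CDS", 20, 80),
    Feature("exon", 1, 100),
]
roots = unflatten(flat)
lines = asciitree(roots)
print("\n".join(lines))
```

       Real records are harder than this: several transcripts can share
       coordinates, which is why implicit links such as tags and splice
       sites have to be taken into account.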

       The unflattening will also 'normalize' the containment hierarchy, in
       the sense of standardising it: for example, it makes sure there is
       always a transcript record, even if GenBank specifies only CDS and
       gene.
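
       A minimal sketch of this kind of normalization, assuming features are
       plain Python dicts with "type" and "children" keys (a representation
       invented for the example; ensure_transcript is a hypothetical helper,
       not part of BioPerl):

```python
def ensure_transcript(gene):
    """Return a copy of the gene in which every direct CDS child is wrapped
    in an inferred mRNA node, so a transcript level always exists."""
    children = []
    for child in gene["children"]:
        if child["type"] == "CDS":
            # No transcript was given for this CDS: infer one around it.
            child = {"type": "mRNA", "children": [child]}
        children.append(child)
    return {"type": gene["type"], "children": children}


# A gene annotated with a CDS only, as some GenBank records are.
gene = {"type": "gene", "children": [{"type": "CDS", "children": []}]}
normalized = ensure_transcript(gene)
print(normalized)
```

       Genes that already carry an mRNA are passed through unchanged by this
       sketch, since only CDS nodes directly under the gene are wrapped.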

       By default, the GenBank types will be mapped to SO (Sequence
       Ontology) types.

       See Bio::SeqFeature::Tools::TypeMapper
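
       Conceptually, type mapping is a table lookup. The sketch below is an
       illustration only: the GENBANK_TO_SO table is a tiny subset whose
       pairs are inferred from the example output earlier in this section,
       and map_type is a hypothetical helper. The real mapping is defined by
       Bio::SeqFeature::Tools::TypeMapper.

```python
# Illustrative subset only; inferred from the asciitree examples, not copied
# from Bio::SeqFeature::Tools::TypeMapper.
GENBANK_TO_SO = {
    "5'UTR": "five_prime_UTR",
    "3'UTR": "three_prime_UTR",
    "source": "databank_entry",
}


def map_type(genbank_type, typemap=GENBANK_TO_SO):
    """Return the mapped type, or the original type when no mapping is known."""
    return typemap.get(genbank_type, genbank_type)


mapped = [map_type(t) for t in ["source", "gene", "5'UTR", "3'UTR"]]
print(mapped)
```

       Passing an empty table to map_type leaves every type untouched, which
       is roughly the effect one would expect from suppressing the mapping.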

COMMAND LINE ARGUMENTS

       -i|input FILE
           input file (can also be specified as last argument)

       -from FORMAT
           input format (defaults to genbank)

           it probably doesn't make much sense to use this for non-flat
           formats, i.e. anything other than embl/genbank

       -to FORMAT
           output format (defaults to asciitree)

           should really be a format that is nested-SeqFeature aware; I think
           this is only asciitree, chadoxml and gff3

       -gff
           export in GFF3 format (pre-3 GFFs make no sense with
           unflattened sequences, as they have no set way of representing
           feature graphs)

       -o|output FILE
           output file (defaults to STDOUT)

       -detail
           show extra detail on features (asciitree mode only)

       -e|ethresh INT
           sets the error threshold on unflattening

           by default this script will fail with an error if it encounters
           anything unexpected in the GenBank file; raise the error threshold
           to have such problems ignored (and reported on STDERR)

       -nomagic
           suppress use_magic in unflattening (see
           Bio::SeqFeature::Tools::Unflattener)

       -notypemap
           suppress type mapping (see Bio::SeqFeature::Tools::TypeMapper)
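
       The behaviour of the error threshold can be sketched as follows. This
       is an assumption-laden illustration, not the script's code: it
       supposes that each problem carries a numeric severity and that only a
       severity above the threshold is fatal. UnflattenError and
       handle_problem are hypothetical names.

```python
import sys


class UnflattenError(Exception):
    """Hypothetical exception type used only in this sketch."""


def handle_problem(message, severity, threshold=0):
    """Raise when severity exceeds the threshold; otherwise report and go on."""
    if severity > threshold:
        raise UnflattenError(message)
    # Tolerated problems are still reported, on STDERR.
    print(f"PROBLEM (severity {severity}): {message}", file=sys.stderr)


# With a raised threshold the problem is reported but not fatal.
handle_problem("CDS without mRNA", severity=2, threshold=2)

# With the default threshold the same problem stops processing.
try:
    handle_problem("CDS without mRNA", severity=2)
    raised = False
except UnflattenError:
    raised = True
```

       Under these assumptions, a higher threshold trades strictness for the
       ability to get through imperfect records.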

TODO

       Bio::SeqFeature::Tools::Unflattener allows fine-grained control over
       the unflattening process; more options need to be added to expose
       this control at the command line.

FEEDBACK

   Mailing Lists
       User feedback is an integral part of the evolution of this and other
       Bioperl modules. Send your comments and suggestions preferably to the
       Bioperl mailing list.  Your participation is much appreciated.

         bioperl-l@bioperl.org                  - General discussion
         http://bioperl.org/wiki/Mailing_lists  - About the mailing lists

   Reporting Bugs
       Report bugs to the Bioperl bug tracking system to help us keep track of
       the bugs and their resolution. Bug reports can be submitted via email
       or the web:

         http://bugzilla.open-bio.org/

AUTHOR

        Chris Mungall <cjm-at-bioperl.org>



perl v5.12.0                      2010-04-29               BP_UNFLATTEN_SEQ(1)