bp_unflatten_seq.pl(1)

BP_UNFLATTEN_SEQ(1)   User Contributed Perl Documentation  BP_UNFLATTEN_SEQ(1)

NAME

       unflatten_seq - unflatten a GenBank or GenBank-style feature file into
       a nested SeqFeature hierarchy

SYNOPSIS

         unflatten_seq.PLS -e 3 -gff ~/cvs/bioperl-live/t/data/AE003644_Adh-genomic.gb

         unflatten_seq.PLS --detail ~/cvs/bioperl-live/t/data/AE003644_Adh-genomic.gb

         unflatten_seq.PLS -i foo.embl --from embl --to chadoxml -o out.chado.xml

         unflatten_seq.PLS --notypemap --detail --to asciitree -ethresh 2 AE003644_Adh-genomic.gb

DESCRIPTION

       This script will unflatten a GenBank or GenBank-style file of
       SeqFeatures into a nested hierarchy.

       See Bio::SeqFeature::Tools::Unflattener.

       In a GenBank/EMBL representation, features are 'flat': for example,
       there is no link between an mRNA and a CDS other than implicit links
       (e.g. via tags or via splice site coordinates), which may be hard to
       code for.

       This is most easily illustrated with the default output format,
       asciitree.

       An unflattened GenBank feature set may look like this (AB077698):

         Seq: AB077698
           databank_entry                                   1..2701[+]
           gene
             mRNA
               CDS hCHCR-G                                  80..1144[+]
               exon                                         80..1144[+]
             five_prime_UTR                                 1..79[+]
             located_sequence_feature                       137..196[+]
             located_sequence_feature                       239..292[+]
             located_sequence_feature                       617..676[+]
             located_sequence_feature                       725..778[+]
             three_prime_UTR                                1145..2659[+]
             polyA_site                                     1606..1606[+]
             polyA_site                                     2660..2660[+]

       Or like this (portion of AE003734):

         gene
           mRNA CG3320-RA
             CDS CG3320-PA                                53126..54971[-]
             exon                                         52204..53323[-]
             exon                                         53404..53631[-]
             exon                                         53688..53735[-]
             exon                                         53798..53918[-]
             exon                                         54949..55287[-]
           mRNA CG3320-RB
             CDS CG3320-PB                                53383..54971[-]
             exon                                         52204..53631[-]
             exon                                         53688..53735[-]
             exon                                         53798..53918[-]
             exon                                         54949..55287[-]

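       The idea of turning a flat feature list into a tree can be sketched
       in a few lines of Python. This is only a toy illustration, not how
       Bio::SeqFeature::Tools::Unflattener works: it nests features purely
       by coordinate containment and a hypothetical type ranking (RANK),
       whereas the real module also has to resolve cases such as several
       mRNAs sharing exons. The CDS and exon coordinates are taken from the
       AB077698 example; the gene and mRNA spans are assumed.

```python
from dataclasses import dataclass, field

# Hypothetical ranking for this sketch: a lower rank sits higher in the tree.
RANK = {"gene": 0, "mRNA": 1, "CDS": 2, "exon": 2}


@dataclass
class Feature:
    type: str
    start: int
    end: int
    children: list = field(default_factory=list)


def contains(outer, inner):
    """True if inner lies entirely within outer."""
    return outer.start <= inner.start and inner.end <= outer.end


def unflatten(flat):
    """Attach each feature to the tightest containing feature one rank above it."""
    roots = []
    ordered = sorted(flat, key=lambda f: RANK[f.type])  # stable sort
    for feat in ordered:
        parents = [p for p in ordered
                   if RANK[p.type] == RANK[feat.type] - 1 and contains(p, feat)]
        if parents:
            min(parents, key=lambda p: p.end - p.start).children.append(feat)
        else:
            roots.append(feat)
    return roots


def render(feats, depth=0):
    """Return indented lines, loosely in the spirit of the asciitree output."""
    lines = []
    for f in feats:
        lines.append("  " * depth + f"{f.type} {f.start}..{f.end}")
        lines.extend(render(f.children, depth + 1))
    return lines


# Flat input: nothing links the CDS or exon to the mRNA or gene.
flat = [
    Feature("gene", 1, 2701),   # span assumed for illustration
    Feature("mRNA", 1, 2701),   # span assumed for illustration
    Feature("CDS", 80, 1144),
    Feature("exon", 80, 1144),
]
tree = unflatten(flat)
print("\n".join(render(tree)))
```

       Containment alone is ambiguous as soon as two transcripts overlap,
       which is why implicit links such as tags and splice site coordinates
       matter in practice.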
       The unflattening will also 'normalize' the containment hierarchy (in
       the sense of standardising it, e.g. making sure there is always a
       transcript record, even if GenBank just specifies CDS and gene).

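       As a rough picture of what such normalization means, the Python
       sketch below wraps any CDS that hangs directly off a gene in an
       inferred mRNA node. The dictionary layout and the normalize()
       helper are hypothetical and exist only for this illustration; the
       actual rules live in Bio::SeqFeature::Tools::Unflattener.

```python
def normalize(gene):
    """Return a copy of gene in which every direct CDS child sits under an mRNA."""
    children = []
    for child in gene["children"]:
        if child["type"] == "CDS":
            # Infer a transcript with the same span as the CDS (an assumption
            # made for this sketch, not a statement about the real tool).
            children.append({"type": "mRNA",
                             "span": child["span"],
                             "children": [child]})
        else:
            children.append(child)
    return {**gene, "children": children}


# A record that specifies only gene and CDS.
gene = {"type": "gene", "span": (80, 1144),
        "children": [{"type": "CDS", "span": (80, 1144), "children": []}]}
fixed = normalize(gene)
print(fixed["children"][0]["type"])
```

       Running normalize() on an already normalized gene leaves it
       unchanged, since the CDS is then no longer a direct child.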
       By default, the GenBank types will be mapped to SO (Sequence
       Ontology) types.

       See Bio::SeqFeature::Tools::TypeMapper.
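       Conceptually, type mapping is a table lookup. The Python sketch
       below shows the shape of it; the four pairings are assumptions
       inferred from the SO names visible in the AB077698 example, and the
       authoritative table is the one in
       Bio::SeqFeature::Tools::TypeMapper. The typemap flag is a
       hypothetical stand-in for the effect of -notypemap.

```python
# Illustrative pairings only (assumed, not copied from TypeMapper).
GENBANK_TO_SO = {
    "source": "databank_entry",
    "misc_feature": "located_sequence_feature",
    "5'UTR": "five_prime_UTR",
    "3'UTR": "three_prime_UTR",
}


def map_type(genbank_type, typemap=True):
    """Map a GenBank feature key to an SO term; unknown keys pass through."""
    if not typemap:
        return genbank_type
    return GENBANK_TO_SO.get(genbank_type, genbank_type)


for key in ("source", "5'UTR", "CDS"):
    print(key, "->", map_type(key))
```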

COMMAND LINE ARGUMENTS

       -i|input FILE
           input file (can also be specified as last argument)

       -from FORMAT
           input format (defaults to genbank)

           probably doesn't make much sense to use this for non-flat
           formats, i.e. formats other than embl/genbank

       -to FORMAT
           output format (defaults to asciitree)

           should really be a format that is nested SeqFeature aware; I think
           this is only asciitree, chadoxml and gff3

       -gff
           export to GFF3 format (pre-3 GFFs make no sense with
           unflattened sequences, as they have no set way of representing
           feature graphs)

       -o|output FILE
           output file (defaults to STDOUT)

       -detail
           show extra detail on features (asciitree mode only)

       -e|ethresh INT
           sets the error threshold on unflattening

           by default this script will die if it encounters anything
           unexpected in the GenBank file; raise the error threshold to have
           such problems ignored (and reported on STDERR)

       -nomagic
           suppress use_magic in unflattening (see
           Bio::SeqFeature::Tools::Unflattener)

       -notypemap
           suppress type mapping (see Bio::SeqFeature::Tools::TypeMapper)
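       The behaviour described for -e|ethresh can be pictured with the
       Python sketch below. It assumes that each problem carries an
       integer severity and that a problem is fatal only when its severity
       exceeds the threshold; both the severity scale and the names
       (UnflattenError, handle_problems) are hypothetical and not taken
       from the script.

```python
import io
import sys


class UnflattenError(Exception):
    """Raised when a problem is more severe than the threshold allows."""


def handle_problems(problems, ethresh=0, log=sys.stderr):
    """Raise on problems above ethresh; report the rest on the log stream."""
    for severity, message in problems:
        if severity > ethresh:
            raise UnflattenError(message)
        print(f"ignored (severity {severity}): {message}", file=log)


# Made-up problems for illustration.
problems = [(1, "CDS without a gene"), (2, "exon outside any mRNA")]

log = io.StringIO()
handle_problems(problems, ethresh=2, log=log)  # tolerated and reported
print(log.getvalue(), end="")
```

       With the default threshold of 0 in this sketch, the first problem
       in the list would already raise UnflattenError.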

TODO

       Bio::SeqFeature::Tools::Unflattener allows fine-grained control over
       the unflattening process; more options need to be added to expose this
       control at the command line.

FEEDBACK

       Mailing Lists

       User feedback is an integral part of the evolution of this and other
       Bioperl modules. Send your comments and suggestions preferably to the
       Bioperl mailing list.  Your participation is much appreciated.

         bioperl-l@bioperl.org                  - General discussion
         http://bioperl.org/wiki/Mailing_lists  - About the mailing lists

       Reporting Bugs

       Report bugs to the Bioperl bug tracking system to help us keep track of
       the bugs and their resolution. Bug reports can be submitted via email
       or the web:

         http://bugzilla.open-bio.org/

AUTHOR

        Chris Mungall <cjm-at-bioperl.org>

perl v5.8.8                       2007-05-07               BP_UNFLATTEN_SEQ(1)