bp_genbank2gff3.pl(1)

BP_GENBANK2GFF3(1)    User Contributed Perl Documentation   BP_GENBANK2GFF3(1)

NAME

       bp_genbank2gff3.pl -- Genbank->gbrowse-friendly GFF3

SYNOPSIS

         bp_genbank2gff3.pl [options] filename(s)

         # process a directory containing GenBank flatfiles
         perl bp_genbank2gff3.pl --dir path_to_files --zip

         # process a single file, ignore explicit exons and introns
         perl bp_genbank2gff3.pl --filter exon --filter intron file.gbk.gz

         # process a list of files
         perl bp_genbank2gff3.pl *gbk.gz

           Options:
               --dir     -d  path to a directory of genbank flatfiles
               --outdir  -o  location to write GFF files
               --zip     -z  compress GFF3 output files with gzip
               --summary -s  print a summary of the features in each contig
               --filter  -x  genbank feature type(s) to ignore
               --split   -y  split output to separate GFF and fasta files for
                             each genbank record
               --nolump  -n  separate file for each reference sequence
                             (default is to lump all records together into one
                              output file for each input file)
               --ethresh -e  error threshold for unflattener
                             set this high (>2) to ignore all unflattener errors
               --help    -h  display this message

DESCRIPTION

       This script uses Bio::SeqFeature::Tools::Unflattener and
       Bio::Tools::GFF to convert GenBank flatfiles to GFF3 with gene
       containment hierarchies mapped for optimal display in gbrowse.

       The input files are assumed to be gzipped GenBank flatfiles for refseq
       contigs.  The files may contain multiple GenBank records.  Either a
       single file or an entire directory can be processed.  By default, the
       DNA sequence is embedded in the GFF, but it can be saved to a separate
       fasta file with the --split (-y) option.

       If an input file contains multiple records, the default behaviour is to
       dump all GFF and sequence to a file of the same name (with .gff
       appended).  Using the 'nolump' option will create a separate file for
       each genbank record.  Using the 'split' option will create separate GFF
       and Fasta files for each genbank record.
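
       The three output layouts described above differ only in one flag.
       As a rough sketch, the helper below (a hypothetical function, not
       part of this script) prints the matching command line without
       running it.  The input name contigs.gbk.gz is a placeholder.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the command line for one of the
# three output layouts. It does not invoke bp_genbank2gff3.pl.
genbank2gff3_cmd() {
    layout=$1   # lump | nolump | split
    input=$2    # gzipped GenBank flatfile (placeholder name)
    case "$layout" in
        lump)   flag='' ;;          # default: one .gff per input file
        nolump) flag='--nolump' ;;  # one file per GenBank record
        split)  flag='--split' ;;   # separate GFF and fasta per record
        *) echo "unknown layout: $layout" >&2; return 1 ;;
    esac
    # ${flag:+$flag } emits the flag plus a space only when it is set
    echo "bp_genbank2gff3.pl ${flag:+$flag }$input"
}

genbank2gff3_cmd lump   contigs.gbk.gz
genbank2gff3_cmd nolump contigs.gbk.gz
genbank2gff3_cmd split  contigs.gbk.gz
```

       Once a printed command looks right, it can be copied and run
       as-is, with --outdir added if the output should go elsewhere.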

       Notes

       Note1:

       In cases where the input files contain many GenBank records (for
       example, the chromosome files for the mouse genome build), a very large
       number of output files will be produced if the 'split' or 'nolump'
       options are selected.  If you do have lists of files > 6000, use the
       --long_list option in bp_bulk_load_gff.pl or bp_fast_load_gff.pl to
       load the gff and/or fasta files.
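
       One way to check whether an output directory has crossed the
       6000-file figure from Note1 is to count its files before loading.
       The function below is a minimal sketch.  The name needs_long_list
       and its optional limit argument are illustrative, and it assumes
       a find that supports -maxdepth (GNU and BSD find both do).

```shell
#!/bin/sh
# Hypothetical helper: count regular files directly inside a directory
# and report whether the count exceeds a limit (default 6000, per Note1).
needs_long_list() {
    dir=${1:-.}
    limit=${2:-6000}
    # wc -l may pad its output with spaces on some systems; strip them
    count=$(find "$dir" -maxdepth 1 -type f | wc -l | tr -d ' ')
    if [ "$count" -gt "$limit" ]; then
        echo "$count files: consider --long_list"
    else
        echo "$count files: --long_list not needed"
    fi
}

# Check the current directory
needs_long_list .
```

       The count covers every regular file in the directory, so it is
       only meaningful when the directory holds just the GFF and fasta
       output.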

       Note2:

       This script is designed for refseq genomic sequence entries.  It may
       work for third party annotations but this has not been tested.

AUTHOR

       Sheldon McKay (mckays@cshl.edu)

       Copyright (c) 2004 Cold Spring Harbor Laboratory.

perl v5.8.8                       2007-05-07                BP_GENBANK2GFF3(1)