SIBsim4(1)                       User Manuals                       SIBsim4(1)

NAME

       SIBsim4 - align RNA sequences with a DNA sequence, allowing for introns

SYNOPSIS

       SIBsim4 [ options ] dna rna_db
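       As an illustration of the synopsis, the following Python sketch
       assembles a SIBsim4 command line as an argument list. The helper name
       build_sibsim4_command and the file names genome.fa and ests.fa are
       hypothetical; the -A and -R options are the ones documented under
       OPTIONS. Passing each option and its value as separate arguments is an
       assumption based on the "-A <int>" notation. The sketch only builds the
       list and does not run the program.

```python
from typing import List, Optional


def build_sibsim4_command(dna: str, rna_db: str,
                          output_format: Optional[int] = None,
                          strand: Optional[int] = None) -> List[str]:
    """Assemble 'SIBsim4 [ options ] dna rna_db' as an argument list.

    Hypothetical helper: options left as None are omitted, so the
    program's own defaults apply.
    """
    cmd = ["SIBsim4"]
    if output_format is not None:
        cmd += ["-A", str(output_format)]   # output format (0, 1, 3 or 4)
    if strand is not None:
        cmd += ["-R", str(strand)]          # direction of search (0, 1 or 2)
    cmd += [dna, rna_db]                    # positional arguments come last
    return cmd


# Hypothetical input files: exon endpoints plus alignment text,
# searching the '+' strand only.
cmd = build_sibsim4_command("genome.fa", "ests.fa", output_format=3, strand=0)
print(" ".join(cmd))
```

       If SIBsim4 is installed and on the search path, such a list could be
       handed to subprocess.run() from the Python standard library.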

DESCRIPTION

       SIBsim4 is a similarity-based tool for aligning a collection of
       expressed sequences (EST, mRNA) with a genomic DNA sequence.

       Launching SIBsim4 without any arguments prints the list of options,
       along with their default values.

       SIBsim4 employs a BLAST-based technique to first determine the basic
       matching blocks representing the "exon cores". In this first stage, it
       detects all possible exact matches of W-mers (i.e., DNA words of size
       W) between the two sequences and extends them to maximal scoring
       gap-free segments. In the second stage, the exon cores are extended
       into the adjacent as-yet-unmatched fragments using greedy alignment
       algorithms, and heuristics are used to favor configurations that
       conform to the splice-site recognition signals (e.g., GT-AG). If
       necessary, the process is repeated with less stringent parameters on
       the unmatched fragments.

       By default, SIBsim4 searches both strands and reports the best matches,
       measured by the number of matching nucleotides found in the alignment.
       The -R command-line option can be used to restrict the search to one
       orientation (strand) only.

       Currently, four major alignment display options are supported,
       controlled by the -A option. By default, only the exon endpoints, the
       overall similarity, and the orientation of the introns are reported. An
       arrow sign ('->' or '<-') indicates the orientation of the intron. The
       sign '==' marks the absence from the alignment of a cDNA fragment
       starting at that position.

       In the description below, the term MSP denotes a maximal scoring pair,
       that is, a pair of highly similar fragments in the two sequences,
       obtained during the BLAST-like procedure by extending a W-mer hit by
       matches and perhaps a few mismatches.
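       The first stage described above (exact W-mer hits extended to gap-free
       segments) can be sketched in Python as follows. This is a simplified
       illustration and not the SIBsim4 implementation: the function names are
       hypothetical, the match and mismatch scores mirror the documented
       defaults of the -r and -q options, and treating the -X value as a
       BLAST-style drop-off limit is an assumption. The second stage (greedy
       alignment and splice-site heuristics) is not covered.

```python
from typing import Dict, List, Tuple


def find_seeds(dna: str, rna: str, w: int) -> List[Tuple[int, int]]:
    """Return (dna_pos, rna_pos) for every exact W-mer match."""
    index: Dict[str, List[int]] = {}
    for i in range(len(dna) - w + 1):
        index.setdefault(dna[i:i + w], []).append(i)
    seeds = []
    for j in range(len(rna) - w + 1):
        for i in index.get(rna[j:j + w], []):
            seeds.append((i, j))
    return seeds


def extend_one_way(dna: str, rna: str, i: int, j: int, step: int,
                   match: int, mismatch: int, x_drop: int) -> Tuple[int, int]:
    """Extend without gaps in one direction; return (best_gain, best_len)."""
    score = best = 0
    length = best_len = 0
    while 0 <= i < len(dna) and 0 <= j < len(rna):
        score += match if dna[i] == rna[j] else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score > x_drop:
            break  # assumed stop rule: score fell too far below the best
        i += step
        j += step
    return best, best_len


def extend_seed(dna: str, rna: str, i: int, j: int, w: int,
                match: int = 1, mismatch: int = -5,
                x_drop: int = 12) -> Tuple[int, int, int, int]:
    """Grow a W-mer hit at (i, j) into a gap-free segment.

    Returns (dna_start, rna_start, length, score).
    """
    right_gain, right_len = extend_one_way(
        dna, rna, i + w, j + w, 1, match, mismatch, x_drop)
    left_gain, left_len = extend_one_way(
        dna, rna, i - 1, j - 1, -1, match, mismatch, x_drop)
    score = w * match + left_gain + right_gain
    return (i - left_len, j - left_len, w + left_len + right_len, score)


# Toy sequences, chosen only for illustration.
dna = "TTTTTAGGTCATTGCTTTTT"
rna = "AGGTCATTGC"
seeds = find_seeds(dna, rna, 6)
print(seeds)
print(extend_seed(dna, rna, *seeds[0], 6))
```

       In this toy example every seed lies on the same diagonal, so extending
       any one of them yields the same segment; a real implementation would
       also have to merge or discard such redundant hits.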

OPTIONS

       -A <int>
              output format
                0: exon endpoints only
                1: alignment text
                3: both exon endpoints and alignment text
                4: both exon endpoints and alignment text with polyA info

              Note that 2 is unimplemented.

              Default value is 0.

       -C <int>
              MSP score threshold for the second pass.

              Default value is 12.

       -c <int>
              minimum score cutoff value. Alignments with scores below this
              value are not reported.

              Default value is 50.

       -E <int>
              cutoff value.

              Default value is 3.

       -f <int>
              score filter in percent. When multiple hits are detected for
              the same RNA element, only those having a score within this
              percentage of the maximal score for that RNA element are
              reported. Setting this value to 0 disables filtering, and all
              hits are reported, provided their score is above the cutoff
              value specified through the -c option.

              Default value is 75.

       -g <int>
              join exons when the gaps on the genomic and RNA sequences have
              lengths that differ by at most this percentage.

              Default value is 10.

       -I <int>
              window width in which to search for intron splicing.

              Default value is 6.

       -K <int>
              MSP score threshold for the first pass.

              Default value is 16.

       -L <str>
              a comma-separated list of forward splice types.

              Default value is "GTAG,GCAG,GTAC,ATAC".

       -M <int>
              when scoring splice sites, evaluate the match within M
              nucleotides.

              Default value is 10.

       -o <int>
              when printing results, offset nucleotide positions in the dna
              sequence by this amount.

              Default value is 0.

       -q <int>
              penalty for a nucleotide mismatch.

              Default value is -5.

       -R <int>
              direction of search
                0: search the '+' (direct) strand only
                1: search the '-' strand only
                2: search both strands

              Default value is 2.

       -r <int>
              reward for a nucleotide match.

              Default value is 1.

       -s <int>
              split score in percent. While linking MSPs, if two consecutive
              groups of exons look as if they could be part of two different
              copies of the same gene, each group is tested to see whether
              its score relative to the best overall score is greater than
              this value. If both groups have a relative score above this
              threshold, they are split.

              Default value is 75.

       -W <int>
              word size.

              Default value is 12.

       -X <int>
              value for terminating word extensions.

              Default value is 12.
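       The interplay of the score thresholds set by the -c, -f and -s options
       can be sketched in Python as follows. This is one possible reading of
       the descriptions above, not the SIBsim4 implementation: the function
       names are hypothetical, "within this percentage of the maximal score"
       is assumed to mean "at least that percentage of the maximal score",
       and the best overall score used for the split test is assumed to be
       supplied by the caller.

```python
from typing import List


def filter_hits(scores: List[int], cutoff: int = 50,
                filter_pct: int = 75) -> List[int]:
    """Apply the -c cutoff, then the -f percentage filter, to hit scores."""
    kept = [s for s in scores if s >= cutoff]    # -c: drop low-scoring hits
    if not kept or filter_pct == 0:              # -f 0 disables the filter
        return kept
    best = max(kept)
    # Assumed reading of -f: keep hits scoring at least filter_pct percent
    # of the best score (integer arithmetic avoids rounding issues).
    return [s for s in kept if s * 100 >= best * filter_pct]


def should_split(group_a: int, group_b: int, best_overall: int,
                 split_pct: int = 75) -> bool:
    """Assumed reading of -s: split when both groups exceed the threshold."""
    return (group_a * 100 > best_overall * split_pct
            and group_b * 100 > best_overall * split_pct)


# Made-up scores, for illustration only.
print(filter_hits([200, 160, 140, 40]))
print(filter_hits([200, 160, 140, 40], filter_pct=0))
print(should_split(180, 170, 200))
print(should_split(180, 100, 200))
```

       With the default values, a hit scoring 140 is dropped when the best
       hit for the same RNA element scores 200, because 140 is below 75
       percent of 200.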

Bioinformatics                    April 2007                        SIBsim4(1)