SIBsim4(1)                       User Manuals                       SIBsim4(1)

NAME

       SIBsim4 - align RNA sequences with a DNA sequence, allowing for introns

SYNOPSIS

       SIBsim4 [ options ] dna rna_db
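       As an illustration of the synopsis, the following Python sketch
       assembles an argument list in the documented order: options first,
       then the DNA file, then the RNA database. The helper name
       sibsim4_command and the file names genome.fa and ests.fa are
       hypothetical; only the option letters (-A, -R) come from the OPTIONS
       section of this manual.

```python
import shlex


def sibsim4_command(dna, rna_db, **options):
    """Build an argv list of the form: SIBsim4 [ options ] dna rna_db.

    Each keyword argument is turned into "-<letter> <value>".
    This helper is illustrative only and is not part of SIBsim4.
    """
    argv = ["SIBsim4"]
    for flag, value in options.items():
        argv += [f"-{flag}", str(value)]
    argv += [dna, rna_db]
    return argv


# Hypothetical input files; -A 3 and -R 0 are documented under OPTIONS.
cmd = sibsim4_command("genome.fa", "ests.fa", A=3, R=0)
print(shlex.join(cmd))  # prints: SIBsim4 -A 3 -R 0 genome.fa ests.fa
```

       On a system where SIBsim4 is installed, the resulting list could be
       passed to subprocess.run.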

DESCRIPTION

       SIBsim4 is a similarity-based tool for aligning a collection of
       expressed sequences (EST, mRNA) with a genomic DNA sequence.

       Launching SIBsim4 without any arguments prints the list of options
       along with their default values.

       SIBsim4 employs a blast-based technique to first determine the basic
       matching blocks representing the "exon cores". In this first stage, it
       detects all possible exact matches of W-mers (i.e., DNA words of size
       W) between the two sequences and extends them to maximal scoring
       gap-free segments. In the second stage, the exon cores are extended
       into the adjacent as-yet-unmatched fragments using greedy alignment
       algorithms, and heuristics are used to favor configurations that
       conform to the splice-site recognition signals (e.g., GT-AG). If
       necessary, the process is repeated with less stringent parameters on
       the unmatched fragments.

       By default, SIBsim4 searches both strands and reports the best matches,
       measured by the number of matching nucleotides found in the alignment.
       The -R command line option can be used to restrict the search to one
       orientation (strand) only.

       Currently, four major alignment display options are supported,
       controlled by the -A option. By default, only the endpoints, overall
       similarity, and orientation of the introns are reported. An arrow sign
       ('->' or '<-') indicates the orientation of the intron. The sign '=='
       marks the absence from the alignment of a cDNA fragment starting at
       that position.

       In the description below, the term MSP denotes a maximal scoring pair,
       that is, a pair of highly similar fragments in the two sequences,
       obtained during the blast-like procedure by extending a W-mer hit by
       matches and perhaps a few mismatches.
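       The first stage described above (exact W-mer hits extended into
       gap-free segments, i.e. MSPs) can be sketched in Python as follows.
       This is a simplified illustration, not SIBsim4's actual code. The
       function names are hypothetical. The defaults mirror the documented
       values of -W, -r, -q and -X, but treating -X as a score drop-off that
       stops the extension is an assumption.

```python
from collections import defaultdict


def find_seeds(dna, rna, w=12):
    """Return (dna_pos, rna_pos) pairs where a W-mer matches exactly."""
    index = defaultdict(list)
    for i in range(len(dna) - w + 1):
        index[dna[i:i + w]].append(i)
    seeds = []
    for j in range(len(rna) - w + 1):
        for i in index.get(rna[j:j + w], ()):
            seeds.append((i, j))
    return seeds


def extend_seed(dna, rna, i, j, w=12, match=1, mismatch=-5, x_drop=12):
    """Extend an exact W-mer hit in both directions without gaps.

    Extension in a direction stops once the running score has fallen more
    than x_drop below the best score seen (assumed meaning of -X).
    Returns (dna_start, rna_start, length, score) of the best segment.
    """
    best = w * match

    # Extend to the right of the seed.
    score, right, k = best, 0, 0
    while i + w + k < len(dna) and j + w + k < len(rna):
        score += match if dna[i + w + k] == rna[j + w + k] else mismatch
        k += 1
        if score > best:
            best, right = score, k
        elif best - score > x_drop:
            break

    # Extend to the left of the seed, starting from the best score so far.
    score, left, k = best, 0, 0
    while i - k - 1 >= 0 and j - k - 1 >= 0:
        score += match if dna[i - k - 1] == rna[j - k - 1] else mismatch
        k += 1
        if score > best:
            best, left = score, k
        elif best - score > x_drop:
            break

    return (i - left, j - left, w + left + right, best)
```

       Overlapping seeds on the same diagonal extend to the same segment, so
       a real implementation would deduplicate them; that step is omitted
       here.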

OPTIONS

       -A <int>
              output format
                0: exon endpoints only
                1: alignment text
                3: both exon endpoints and alignment text
                4: both exon endpoints and alignment text with polyA info

              Note that 2 is unimplemented.

              Default value is 0.

       -C <int>
              MSP score threshold for the second pass.

              Default value is 12.

       -c <int>
              minimum score cutoff value. Alignments whose scores are below
              this value are not reported.

              Default value is 50.

       -E <int>
              cutoff value.

              Default value is 3.

       -f <int>
              score filter in percent. When multiple hits are detected for
              the same RNA element, only those having a score within this
              percentage of the maximal score for that RNA element are
              reported. Setting this value to 0 disables filtering, and all
              hits are reported, provided their score is above the cutoff
              value specified through the -c option.

              Default value is 75.
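              The interaction between -c and -f can be sketched as follows.
              This is an illustration under one stated assumption: "within
              this percentage of the maximal score" is read as "at least f
              percent of the best score", which is consistent with 0
              disabling the filter. The function name filter_hits is
              hypothetical.

```python
def filter_hits(scores, f=75, c=50):
    """Keep the scores that would be reported for one RNA element.

    c: minimum score cutoff (scores below it are dropped).
    f: score filter in percent of the best score; 0 disables it.
    Assumption: a hit passes when score >= f% of the maximal score.
    """
    kept = [s for s in scores if s >= c]
    if f == 0 or not kept:
        return kept
    best = max(kept)
    return [s for s in kept if s * 100 >= f * best]
```

              With the defaults, scores of 200, 160, 149 and 40 for the same
              RNA element would leave 200 and 160 under this reading.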

       -g <int>
              join exons when the gaps on the genomic and RNA sequences have
              lengths that differ by at most this percentage.

              Default value is 10.
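              A possible reading of this test is sketched below. The manual
              does not state which length the percentage is relative to;
              the sketch assumes the longer of the two gaps, and the
              function name gaps_compatible is hypothetical.

```python
def gaps_compatible(genomic_gap, rna_gap, g=10):
    """Return True when two gap lengths differ by at most g percent.

    Assumption: the percentage is taken relative to the longer gap.
    """
    longer = max(genomic_gap, rna_gap)
    if longer == 0:
        return True  # both gaps are empty, so nothing separates the exons
    return abs(genomic_gap - rna_gap) * 100 <= g * longer
```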

       -H <int>
              report chimeric transcripts when the best score is lower than
              this percentage of the overall RNA coverage and the chimera
              score is greater than this percentage of the RNA length (0
              disables this report).

              Default value is 75.

       -I <int>
              window width in which to search for intron splicing.

              Default value is 6.

       -K <int>
              MSP score threshold for the first pass.

              Default value is 16.

       -L <str>
              a comma-separated list of forward splice-types.

              Default value is "GTAG,GCAG,GTAC,ATAC".
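              The sketch below shows one way to interpret such a list. It
              assumes that each four-letter entry is the two-nucleotide
              donor signal followed by the two-nucleotide acceptor signal,
              in line with the GT-AG example in the DESCRIPTION. The
              function names are hypothetical.

```python
def parse_splice_types(spec="GTAG,GCAG,GTAC,ATAC"):
    """Split a -L style list into (donor, acceptor) dinucleotide pairs."""
    pairs = []
    for item in spec.split(","):
        item = item.strip().upper()
        if len(item) != 4:
            raise ValueError(f"expected a 4-letter splice type, got {item!r}")
        pairs.append((item[:2], item[2:]))
    return pairs


def intron_type(dna, start, end, pairs):
    """Return the first (donor, acceptor) pair matching dna[start:end].

    The intron is taken to start with the donor and end with the acceptor
    on the forward strand; None is returned when no pair matches.
    """
    intron = dna[start:end].upper()
    if len(intron) < 4:
        return None
    for donor, acceptor in pairs:
        if intron.startswith(donor) and intron.endswith(acceptor):
            return (donor, acceptor)
    return None
```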

       -M <int>
              when scoring splice sites, evaluate the match within M
              nucleotides.

              Default value is 10.

       -o <int>
              when printing results, offset nucleotide positions in the DNA
              sequence by this amount.

              Default value is 0.

       -q <int>
              penalty for a nucleotide mismatch.

              Default value is -5.

       -R <int>
              direction of search
                0: search the '+' (direct) strand only
                1: search the '-' strand only
                2: search both strands

              Default value is 2.

       -r <int>
              reward for a nucleotide match.

              Default value is 1.

       -S <int>
              splice site indels search breadth. While determining the best
              position of a splice site, SIBsim4 will evaluate adding at most
              this number of insertions and deletions on the DNA strand on
              each side of the splice junction.

              Default value is 2.

       -s <int>
              split score in percent. While linking MSPs, if two consecutive
              groups of exons look as though they could be part of two
              different copies of the same gene, they are tested to see
              whether the score of each individual group relative to the best
              overall score is greater than this value. If both groups have a
              relative score above this threshold, they are split.

              Default value is 75.
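              The split test can be sketched as a comparison of two relative
              scores. How SIBsim4 computes the group scores and the best
              overall score is not specified here, so they are plain
              parameters in this illustration, and the function name
              should_split is hypothetical.

```python
def should_split(score_a, score_b, best_overall, s=75):
    """Return True when both exon groups exceed s percent of the best score.

    score_a, score_b: scores of the two consecutive exon groups.
    best_overall: the best overall score used as the reference.
    """
    if best_overall <= 0:
        return False
    a_passes = score_a * 100 > s * best_overall
    b_passes = score_b * 100 > s * best_overall
    return a_passes and b_passes
```

              Both groups must pass; a single strong group next to a weak one
              is kept together.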

       -W <int>
              word size.

              Default value is 12.

       -X <int>
              value for terminating word extensions.

              Default value is 12.


Bioinformatics                    April 2007                        SIBsim4(1)