mkdssp(1)                        USER COMMANDS                       mkdssp(1)

NAME
       mkdssp - Calculate secondary structure for proteins in a PDB file

SYNOPSIS
       mkdssp [OPTION] pdbfile [dsspfile]

DESCRIPTION
       The mkdssp program was originally designed by Wolfgang Kabsch and Chris
       Sander to standardize secondary structure assignment. DSSP is a
       database of secondary structure assignments (and much more) for all
       protein entries in the Protein Data Bank (PDB), and mkdssp is the
       application that calculates the DSSP entries from PDB entries. Please
       note that mkdssp does not predict secondary structure.

OPTIONS
       If you invoke mkdssp with only one parameter, it will be interpreted as
       the PDB file to process and output will be sent to stdout. If a second
       parameter is specified, it is interpreted as the name of the DSSP file
       to create. Both the input and the output file names may have either
       .gz or .bz2 as extension, in which case the file is read or written
       with the corresponding compression.
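
       A script that consumes mkdssp output can follow the same extension
       convention when reading the result. The Python sketch below is
       illustrative only: the helper name open_by_extension is hypothetical
       and not part of mkdssp, and it relies solely on the standard gzip and
       bz2 modules.

```python
import bz2
import gzip


def open_by_extension(path, mode="rt"):
    """Open a file, picking the (de)compressor from its extension.

    Hypothetical helper mirroring the convention described above:
    .gz -> gzip, .bz2 -> bzip2, anything else -> plain file.
    """
    if path.endswith(".gz"):
        return gzip.open(path, mode)
    if path.endswith(".bz2"):
        return bz2.open(path, mode)
    return open(path, mode)
```

       With this helper, a compressed and an uncompressed DSSP file can be
       read through the same code path, for example by iterating over the
       lines of the returned file object.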

       -i, --input filename
              The file name of a PDB formatted file containing the protein
              structure data. This file may be compressed by gzip or bzip2.

       -o, --output filename
              The file name of a DSSP file to create. If the filename ends in
              .gz or .bz2, a compressed file is created.

       -v, --verbose
              Write out diagnostic information.

       --version
              Print the version number and exit.

       -h, --help
              Print the help message and exit.
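
       The two invocation styles (positional file names or the -i/-o
       options) can be sketched as a small Python wrapper. This is a sketch
       under stated assumptions: the function names are hypothetical, mkdssp
       is assumed to be on the PATH, and --input/--output are assumed to be
       usable in place of the positional arguments, which the option
       descriptions suggest but do not state outright.

```python
import subprocess


def mkdssp_command(pdbfile, dsspfile=None, verbose=False, use_flags=False):
    """Build an argument list for mkdssp (hypothetical helper).

    use_flags=False passes the files positionally, as in the synopsis;
    use_flags=True uses the --input/--output options instead.
    """
    cmd = ["mkdssp"]
    if verbose:
        cmd.append("--verbose")
    if use_flags:
        cmd += ["--input", pdbfile]
        if dsspfile is not None:
            cmd += ["--output", dsspfile]
    else:
        cmd.append(pdbfile)
        if dsspfile is not None:
            cmd.append(dsspfile)
    return cmd


def run_mkdssp(pdbfile, dsspfile=None, **kwargs):
    """Run mkdssp and return its stdout as text.

    When no dsspfile is given, stdout carries the DSSP data.
    """
    result = subprocess.run(
        mkdssp_command(pdbfile, dsspfile, **kwargs),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

       Building the argument list separately from running it keeps the
       command easy to inspect or log before anything is executed.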

THEORY
       The DSSP program works by calculating the most likely secondary
       structure assignment given the 3D structure of a protein. It does this
       by reading the position of the atoms in a protein (the ATOM records in
       a PDB file), followed by calculation of the H-bond energy between all
       atoms. The best two H-bonds for each atom are then used to determine
       the most likely class of secondary structure for each residue in the
       protein.
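
       The energy calculation can be illustrated with the electrostatic
       model published by Kabsch and Sander, which this page does not spell
       out. In that model the energy between a backbone N-H donor and a C=O
       acceptor is E = q1 * q2 * (1/r(ON) + 1/r(CH) - 1/r(OH) - 1/r(CN)) * f,
       with q1 = 0.42e, q2 = 0.20e, f = 332, distances in Ångström and E in
       kcal/mol; an H-bond is assigned when E is below -0.5 kcal/mol. The
       constants, the cutoff and the function names in the Python sketch
       below are assumptions taken from that publication, not from the
       mkdssp source.

```python
import math

Q1 = 0.42            # partial charge on C=O, in units of e (assumed)
Q2 = 0.20            # partial charge on N-H, in units of e (assumed)
F = 332.0            # dimensional factor giving kcal/mol (assumed)
HBOND_CUTOFF = -0.5  # kcal/mol; energies below this count as an H-bond


def hbond_energy(n, h, c, o):
    """Electrostatic H-bond energy in kcal/mol.

    n, h: donor N and H positions; c, o: acceptor C and O positions.
    Each position is an (x, y, z) tuple in Angstrom.
    """
    r_on = math.dist(o, n)
    r_ch = math.dist(c, h)
    r_oh = math.dist(o, h)
    r_cn = math.dist(c, n)
    return Q1 * Q2 * F * (1 / r_on + 1 / r_ch - 1 / r_oh - 1 / r_cn)


def is_hbond(n, h, c, o):
    """True when the computed energy is below the cutoff."""
    return hbond_energy(n, h, c, o) < HBOND_CUTOFF
```

       For an idealized collinear arrangement (N-H 1.0, N...O 2.9 and C=O
       1.23 Ångström) this sketch gives roughly -2.9 kcal/mol, well below
       the cutoff.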

       This means you do need to have a full and valid 3D structure for a
       protein to be able to calculate the secondary structure. There's no
       magic in DSSP, so, for example, it cannot guess the secondary
       structure for a mutated protein for which you don't have the 3D
       structure.

DSSP FILE FORMAT
       The header part of each DSSP file is self-explanatory: it contains
       some of the information copied over from the PDB file, along with some
       statistics gathered while calculating the secondary structure.

       The second half of the file contains the calculated secondary
       structure information per residue. What follows is a brief explanation
       for each column.

       Column Name           Description
       ────────────────────────────────────────────────────────────────────────
       #                     The residue number as counted by mkdssp.
       RESIDUE               The residue number as specified by the PDB
                             file, followed by a chain identifier.
       AA                    The one-letter code for the amino acid. A
                             lower case letter means this is a cysteine
                             that forms a sulfur bridge with the other
                             amino acid in this column that has the same
                             lower case letter.
       STRUCTURE             This is a complex column containing multiple
                             sub-columns. The first sub-column contains a
                             letter indicating the secondary structure
                             assigned to this residue. Valid values are:

                             Code   Description
                             H      Alpha Helix
                             B      Beta Bridge
                             E      Strand
                             G      Helix-3 (3-10 helix)
                             I      Helix-5 (pi helix)
                             T      Turn
                             S      Bend

                             What follows are three columns indicating,
                             for each of the three helix types (3, 4 and
                             5), whether this residue is a candidate in
                             forming this helix. A > character indicates
                             it starts a helix, a number indicates it is
                             inside such a helix and a < character means
                             it ends the helix.
                             The next column contains an S character if
                             this residue is a possible bend.
                             Then there is a column indicating the
                             chirality, which can be either positive or
                             negative (i.e. the alpha torsion is either
                             positive or negative).
                             The last two columns contain beta bridge
                             labels. Lower case here means a parallel
                             bridge and thus upper case means
                             anti-parallel.
       BP1 and BP2           The first and second bridge pair candidates,
                             followed by a letter indicating the sheet.
       ACC                   The accessibility of this residue: the
                             surface area, expressed in square Ångström,
                             that can be accessed by a water molecule.
       N-H-->O..O-->H-N      Four columns giving, for each residue, the
                             H-bond energy with another residue where the
                             current residue is either acceptor or donor.
                             Each column contains two numbers: the first
                             is an offset from the current residue to the
                             partner residue in this H-bond (in DSSP
                             numbering), the second is the calculated
                             energy for this H-bond.
       TCO                   The cosine of the angle between C=O of the
                             current residue and C=O of the previous
                             residue. For alpha-helices TCO is near +1,
                             for beta-sheets TCO is near -1. Not used for
                             structure definition.
       Kappa                 The virtual bond angle (bend angle) defined
                             by the three C-alpha atoms of the residues
                             current - 2, current and current + 2. Used
                             to define bend (structure code 'S').
       PHI and PSI           IUPAC peptide backbone torsion angles.
       X-CA, Y-CA and Z-CA   The C-alpha coordinates.
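
       The two geometric columns, TCO and Kappa, can be reproduced from atom
       coordinates. The Python sketch below follows the definitions in the
       table; the function names are hypothetical, and the 70 degree bend
       threshold comes from the original DSSP definition rather than from
       this page, so treat it as an assumption.

```python
import math

BEND_THRESHOLD = 70.0  # degrees; assumed, from the original DSSP definition


def _sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return tuple(x - y for x, y in zip(a, b))


def cos_angle(u, v):
    """Cosine of the angle between two 3D vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v))
    return dot / norm


def tco(c_prev, o_prev, c_cur, o_cur):
    """Cosine of the angle between C=O of the previous and current residue."""
    return cos_angle(_sub(o_cur, c_cur), _sub(o_prev, c_prev))


def kappa(ca_minus2, ca_cur, ca_plus2):
    """Bend angle in degrees from the C-alpha atoms of residues i-2, i, i+2.

    A straight chain gives 0 degrees; sharper bends give larger angles.
    """
    u = _sub(ca_cur, ca_minus2)
    v = _sub(ca_plus2, ca_cur)
    c = max(-1.0, min(1.0, cos_angle(u, v)))  # guard against rounding
    return math.degrees(math.acos(c))


def is_bend(ca_minus2, ca_cur, ca_plus2):
    """True when kappa exceeds the assumed bend threshold."""
    return kappa(ca_minus2, ca_cur, ca_plus2) > BEND_THRESHOLD
```

       Both quantities reduce to the angle between two vectors, so they
       share the cos_angle helper; only the choice of atoms differs.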

HISTORY
       The original DSSP application was written by Wolfgang Kabsch and Chris
       Sander in Pascal. This version is a complete rewrite in C++ based on
       the original source code. A few bugs have been fixed since, and the
       algorithms have been tweaked here and there.

TODO
       The code desperately needs an update. The first thing that needs
       implementing is the improved recognition of pi-helices. A second
       improvement would be to use an angle-dependent H-bond energy
       calculation.

BUGS
       If you find any, please let me know.

AUTHOR
       Maarten L. Hekkelman (m.hekkelman (at) cmbi.ru.nl)

version 2.0.4                     18-apr-2012                        mkdssp(1)