Protein Shape Strings and DNA Sequences

Stockholm University
Structural Chemistry
Sven Hovmöller
  

query about protein shape strings and DNA sequences

query about Ramachandran plot for proteins in the PDB

Search swissprot access number from PDB chains and others

This site contains two databases covering all protein chains in the Protein Data Bank (PDB): one for shape strings and one for the corresponding DNA sequences. You can search for a protein chain by shape string, DNA sequence, or amino acid sequence. You can also run a query for a large set of proteins. The database was updated on .

1. Shape strings

This is a joint project of Roger E. Ison, Robert H. Kretsinger and Sven Hovmöller. There is also a program, "FRAGS", developed by Roger Ison, which uses shape strings for protein structure prediction.

Each amino acid is assigned a shape symbol according to its dihedral angles (Φ, Ψ) in the Ramachandran plot (see figure below). There are 8 shape symbols: A = α-helices; K = 3_10-helices; S = β-sheets; R = polyPro II; U, V = bridging regions; T = turns (also called left-handed helix); G = almost entirely Gly. The area assigned to each shape differs slightly between amino acids. For details see Hovmöller, S., Zhou, T. and Ohlson, T. (2002), Acta Cryst. D58, 768. Here is the reprint of the paper in PDF format.

[Interactive figure: definition of shape strings on the Ramachandran plot. Clicking an amino acid (ALA, VAL, LEU, ILE, PRO, PHE, MET, LYS, ARG, HIS, GLY, SER, THR, CYS, TYR, ASN, GLU, TRP, ASP, GLN, or All) shows its shape composition over S, R, U, V, K, A, T and G; clicking a shape area shows the amino acid composition (%) of that shape.]
Note: AAS = amino acids

Click the table to download the complete composition data for the 20 amino acids in all 8 shapes. As of the update date above, there are shape strings for out of chains in PDB.
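The composition figures in the table can be reproduced for any single shape string by counting symbols. The following is a minimal Python sketch, assuming a shape string contains only the eight symbols listed above; the function name `shape_composition` and the example string are made up for illustration and are not part of this service.

```python
from collections import Counter

# The eight shape symbols defined in section 1.
SHAPE_SYMBOLS = "SRUVKATG"


def shape_composition(shape_string):
    """Return the percentage of each shape symbol in a shape string."""
    unknown = set(shape_string) - set(SHAPE_SYMBOLS)
    if unknown:
        # Assumption: anything outside the eight symbols is an error here.
        raise ValueError(f"unexpected symbols: {sorted(unknown)}")
    counts = Counter(shape_string)
    total = len(shape_string)
    return {
        symbol: (100.0 * counts[symbol] / total if total else 0.0)
        for symbol in SHAPE_SYMBOLS
    }


# Made-up shape string, for illustration only.
example = "SSSSRAAAAAAAATG"
composition = shape_composition(example)
for symbol in SHAPE_SYMBOLS:
    print(f"{symbol}: {composition[symbol]:.1f}%")
```

Real shape strings may use an additional symbol for residues whose angles are undefined; in that case the check for unknown symbols would need to be relaxed.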

Figure legend: sheet; random; bridge 1; bridge 2; 3-10 helix; right-handed helix; turns; special Gly region; special Gly region.

2. DNA sequences

Given a protein subunit in the PDB, you can of course retrieve its DNA sequence with SRS (Sequence Retrieval System, a network browser for databanks in molecular biology). This is time-consuming, however, because there is no entry for an individual chain of a PDB protein and only one DNA sequence can be retrieved at a time. Swiss-Prot also provides, on its FTP server, a file containing an index of PDB entries referenced in Swiss-Prot, but it does not say which subunit, or which segment of a subunit, corresponds to the Swiss-Prot sequence. Here we present a database that lists the protein subunits in the PDB together with their DNA sequences and the corresponding segment(s) of the Swiss-Prot entries. As of the update date above, there are DNA sequences for out of chains in PDB.

The DNA database was built as follows: 1) find all Swiss-Prot entries that refer to protein chains in the PDB; 2) for those Swiss-Prot entries that are cross-referenced in EMBL, extract the DNA sequences of the protein chains from EMBL. Two files are available (ftp.fos.su.se/pub/pdbdna): "pdb_siss.zip", which gives the relationship between PDB entries and Swiss-Prot entries, and "pdb_swiss_embl.zip", which contains the DNA sequences of proteins in the PDB.
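The two steps amount to chaining two lookups: PDB chain to Swiss-Prot entry, then Swiss-Prot entry to EMBL DNA sequence. The sketch below illustrates that join in Python with small in-memory dictionaries. All identifiers, accession numbers and sequences in it are invented, and it does not reflect the actual format of the downloadable files.

```python
# Step 1 (hypothetical data): PDB (entry, chain) -> Swiss-Prot accession.
pdb_to_swissprot = {
    ("1ABC", "A"): "P00001",
    ("1ABC", "B"): "P00002",
    ("2XYZ", "A"): "P00003",
}

# Step 2 (hypothetical data): Swiss-Prot accession -> DNA sequence from EMBL.
# P00002 is deliberately missing to show a chain without a DNA sequence.
swissprot_to_embl_dna = {
    "P00001": "ATGGCTTAA",
    "P00003": "ATGAAATAA",
}


def dna_for_chains(chain_map, dna_map):
    """Join the two lookups; chains without an EMBL sequence are skipped."""
    result = {}
    for chain, accession in chain_map.items():
        dna = dna_map.get(accession)
        if dna is not None:
            result[chain] = (accession, dna)
    return result


for (entry, chain), (accession, dna) in dna_for_chains(
    pdb_to_swissprot, swissprot_to_embl_dna
).items():
    print(entry, chain, accession, dna)
```

Chains whose Swiss-Prot entry has no EMBL cross-reference drop out of the result, which is one reason a database built this way has DNA sequences for only part of the PDB chains.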

When a chain (or segment) from the PDB is more than 98% identical to a chain (or segment) in Swiss-Prot, we accept the DNA sequence of the Swiss-Prot entry as that of the PDB entry. As a result, the amino acid sequence deduced from the DNA sequence is often not 100% identical to the amino acid sequence in the PDB. The most common reason for this discrepancy is that the protein reported in the PDB has been slightly engineered; in most cases one or two amino acids have been mutated or deleted. The DNA sequences shown are those occurring in the native organisms, while the amino acids are those of the structure reported in the PDB. Codons are marked in color at the positions where the amino acids do not match.
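The acceptance rule and the marking of mismatched codons can be illustrated with a short Python sketch. It is a simplification under stated assumptions, not the procedure used to build this database: it uses the standard genetic code, it handles only ungapped sequences of equal length (so deletions are not covered), and the function names and example sequences are made up. Square brackets stand in for the color marking.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; '*' is a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def split_codons(dna):
    """Split a DNA sequence into complete codons, ignoring trailing bases."""
    usable = len(dna) - len(dna) % 3
    return [dna[i:i + 3] for i in range(0, usable, 3)]


def translate(dna):
    """Translate DNA into one-letter amino acid codes."""
    return "".join(CODON_TABLE[codon] for codon in split_codons(dna))


def compare(dna, pdb_sequence, threshold=98.0):
    """Return (percent identity, accepted?, mismatched codon indices)."""
    translated = translate(dna).rstrip("*")  # drop the trailing stop codon
    if not translated or len(translated) != len(pdb_sequence):
        raise ValueError("this sketch needs non-empty, equal-length sequences")
    mismatches = [
        i for i, (a, b) in enumerate(zip(translated, pdb_sequence)) if a != b
    ]
    identity = 100.0 * (len(translated) - len(mismatches)) / len(translated)
    return identity, identity > threshold, mismatches


def mark_codons(dna, mismatches):
    """Bracket the codons whose amino acid differs from the PDB sequence."""
    return " ".join(
        f"[{codon}]" if i in mismatches else codon
        for i, codon in enumerate(split_codons(dna))
    )


# Made-up example: the DNA encodes M-A-K, the PDB chain reads M-A-R.
dna = "ATGGCTAAATAA"
identity, accepted, mismatches = compare(dna, "MAR")
print(f"{identity:.1f}% identical, accepted: {accepted}")
print(mark_codons(dna, mismatches))
```

With a three-residue toy sequence a single mutation already falls below the 98% threshold; on a real chain of several hundred residues, one or two substitutions would still pass.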



Comments and suggestions about this service are welcome; please send them to Tuping Zhou.