Initial Bioinformatic Investigation
Using Bioinformatic Tools to Strategically Design Expression/Purification Projects
Dr. Nurit Kleinberger-Doron
Your comments are most welcome.
Entries since November 2003
Bioinformatics Tools Sorted According to Rational Project Design
Bioinformatics Tools Sorted by Expression Problems
Problem | Possible Causes | Bioinformatics Tools
Very low amounts of expressed proteins | Secondary structure of mRNA; rare codons; low t1/2; secretion signal |
Truncated forms | Rare codons; genetic code differences [trp-stop]; additional RBS [consider GUG too] {alternative reading frame}; proteases during induction or lysis; cloning out of frame |
Insoluble protein | Post-translational modifications; transmembrane domains; in-frame mutation/s due to rare codons or non-standard genetic code |
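Two of the causes listed in the table, rare codons and in-frame stop codons arising from genetic code differences, can be checked with a quick scan of the coding sequence before turning to dedicated tools. The following is a minimal Python sketch, assuming the sequence is DNA (or RNA) starting in frame at the first base. The rare-codon set is an illustrative selection often quoted for E. coli and is not taken from this page; replace it with values from a codon usage table for your host. All function names are hypothetical.

```python
# Codons often cited as rare in E. coli. Illustrative assumption only:
# consult a codon usage table for the actual expression host.
RARE_CODONS_ECOLI = {"AGG", "AGA", "CGA", "CTA", "ATA", "CCC"}

# Stop codons of the standard genetic code. TGA is read as Trp in some
# non-standard codes (e.g. vertebrate mitochondria), which is one way a
# "trp-stop" truncation can arise in a host using the standard code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_codons(cds):
    """Split a coding sequence (frame 0) into complete codons."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def rare_codon_positions(cds, rare=RARE_CODONS_ECOLI):
    """Return (codon_index, codon) for every rare codon, 1-based."""
    return [(i, c) for i, c in enumerate(split_codons(cds), start=1) if c in rare]


def internal_stops(cds):
    """Return 1-based codon indices of in-frame stops before the last codon."""
    codons = split_codons(cds)
    return [i for i, c in enumerate(codons[:-1], start=1) if c in STOP_CODONS]


if __name__ == "__main__":
    example = "ATGAGGAGACTATGAGGTTAA"
    print(rare_codon_positions(example))
    print(internal_stops(example))
```

Clusters of rare codons near the 5' end and any internal stop are the hits worth following up with the codon usage tools listed on this page.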
Additional Tools & Databases
Preliminary Search of Databases Before Starting a Project
Preliminary Search Using Keywords
Sequence-based Preliminary Search & Sequence Alignments
DNA & RNA Sequence Analysis
Secondary Structure of mRNA
Codon Usage and Translation Frames
Alternative Splicing
Ribosome Binding Site (Background)
Protein Sequence Analysis
Motifs and Repeats in Proteins
- Interpro - A database of protein families, domains and functional sites in which identifiable features found in known proteins can be applied to unknown protein sequences.
- CDD [Conserved Domain Database] - a collection of sequence alignments and profiles representing protein domains conserved in molecular evolution (NCBI) [also appears as part of BLAST output]. Run a BLAST search against the CDD.
- ELM - Eukaryotic Linear Motif Resource for Functional Sites in Proteins.
- Prosite - A database of protein families and domains. It consists of biologically significant sites, patterns and profiles that help to reliably identify to which known protein family (if any) a new sequence belongs.
- ProTeUs (PROtein TErminUS) - a tool for the identification of short linear signatures in protein termini. (About)
- QuasiMotiFinder - A server for the identification of motifs and signature-like patterns in protein sequences (based on Prosite and multiple alignments)
- Motif Search - for DNA or protein sequences.
- Radar - Rapid Automatic Detection and Alignment of Repeats in protein sequences.
- SAPS - Statistical Analysis of Protein Sequences.
- Additional sites (Expasy list - "Pattern & Profiles Searches").
- Protein patterns and motifs - various tools (GeneInfinity)
- Protein domains and families - various tools (GeneInfinity)
Physicochemical properties
- Protparam - Physico-chemical parameters of a protein sequence (amino-acid and atomic compositions, pI, extinction coefficient, etc.)
- Protein Calculator - Generates molecular weight information (including scanning mass spectrometry results), estimated charges (including pI estimation), UV absorption coefficients, crystallographic solvent content percentage and Vm, and counts atoms and residues based on the protein sequence.
- EMBOSS sequence statistics - Pepinfo / Pepwindow / Pepstats.
- ProtScale (several hydrophobicity scales)
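As an illustration of the kind of sequence-derived parameter these tools report, the sketch below estimates the molar extinction coefficient at 280 nm from the Trp, Tyr and cystine content. It is a minimal Python sketch, not the code of any of the servers above: the per-residue values (5500, 1490 and 125 M^-1 cm^-1) are the ones commonly quoted for this calculation, and the function names are hypothetical.

```python
from collections import Counter

# Commonly quoted molar extinction contributions at 280 nm (M^-1 cm^-1).
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125  # per disulfide bond (pair of cysteines)


def composition(seq):
    """Count each residue in a one-letter protein sequence."""
    return Counter(seq.upper())


def extinction_280(seq, cystines=None):
    """Estimate the extinction coefficient at 280 nm.

    cystines is the number of disulfide bonds. None means "assume every
    possible pair of cysteines is oxidised"; 0 means fully reduced.
    """
    counts = composition(seq)
    max_pairs = counts["C"] // 2
    if cystines is None:
        cystines = max_pairs
    if not 0 <= cystines <= max_pairs:
        raise ValueError("cystines must be between 0 and the number of Cys pairs")
    return counts["W"] * EXT_TRP + counts["Y"] * EXT_TYR + cystines * EXT_CYSTINE


if __name__ == "__main__":
    print(extinction_280("WWYCC", cystines=0))  # all cysteines reduced
    print(extinction_280("WWYCC"))              # all cysteines paired
```

Reporting both the reduced and the fully oxidised value brackets the true coefficient when the disulfide state of the protein is unknown.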
Protein Turnover
- PESTfind - Polypeptide sequences enriched in proline (P), glutamic acid (E), serine (S) and threonine (T) target proteins for rapid destruction. PESTfind produces a score ranging from about -50 to +50. By definition, a score above zero denotes a possible PEST region, but a value greater than +5 sparks real interest.
- Destruction Box (D box) Finder - the D box characterizes some proteins destined for proteolysis by the ubiquitin and 26S proteasome pathway.
- SProtP Human - Short-lived protein prediction in human
- Protparam - references to N-end rule (scroll down)
- I-Mutant2.0 - a tool for predicting protein stability upon single site mutation.
- MUpro - Prediction of protein stability changes for single site mutations from sequences.
- Additional tools for assessing protein stability following mutagenesis.
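The PESTfind score thresholds described above (above zero: possible PEST region; above +5: of real interest) can be applied to a list of scored segments as follows. This is a minimal Python sketch; the segment labels, the category wording and the function names are hypothetical, and the scores themselves would come from PESTfind output.

```python
def classify_pest_score(score):
    """Map a PESTfind score to the categories described in the text."""
    if score > 5:
        return "strong candidate"
    if score > 0:
        return "possible PEST region"
    return "not a PEST region"


def rank_pest_segments(segments):
    """Keep segments scoring above zero, best first.

    segments is an iterable of (label, score) pairs.
    """
    hits = [(label, score, classify_pest_score(score))
            for label, score in segments if score > 0]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    scored = [("residues 12-40", 7.3), ("residues 88-101", 1.2), ("residues 150-170", -12.5)]
    for label, score, verdict in rank_pest_segments(scored):
        print(label, score, verdict)
```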
Proteolytic Cleavage
Co- and Post-translational Modifications (A short summary)
Phosphorylation
- Netphos - Predicts serine, threonine and tyrosine phosphorylation sites in eukaryotic proteins.
- KinasePhos - computes the location of the phosphorylation sites and the corresponding catalytic protein kinases.
- NetPhosK - produces predictions of kinase-specific eukaryotic protein phosphorylation sites. Currently NetPhosK covers the following kinases: PKA, PKC, PKG, CKII, Cdc2, CaM-II, ATM, DNA PK, Cdk5, p38 MAPK, GSK3, CKI, PKB, RSK, INSR, EGFR and Src.
- NetPhosYeast - predicts serine and threonine phosphorylation sites in yeast proteins.
- Scansite3.0 - searches for motifs within proteins that are likely to be phosphorylated by specific protein kinases or bind to domains such as SH2 domains, 14-3-3 domains or PDZ domains.
- Phospho.ELM - a database of S/T/Y phosphorylation sites
- PhosphoSitePlus - covers phosphorylation, acetylation, ubiquitination and additional PTMs
- GPS: Group-based Phosphorylation Scoring Method - in-silico prediction of phosphorylation sites of specific kinases.
- PPSP - Prediction of Protein Kinase-specific Phosphorylation sites.
- PhosphoSVM - a non-kinase-specific protein phosphorylation site prediction method that integrates the following sequence-level scores: Shannon entropy (SE), relative entropy (RE), predicted protein secondary structure (SS), predicted protein disorder (PD), accessible surface area (ASA), overlapping properties (OP), averaged cumulative hydrophobicity (ACH), and k-nearest neighbor (KNN).
- DisPhos - predicts S, T and Y phosphorylation. The amino acid composition, sequence complexity, hydrophobicity, charge and other sequence attributes of regions adjacent to phosphorylation sites are very similar to those of intrinsically disordered protein regions, which suggests that disorder in and around the potential phosphorylation target site is an important prerequisite for phosphorylation. DISPHOS therefore uses disorder information to improve the discrimination between phosphorylation and non-phosphorylation sites.
- Phosida - posttranslational modifications database - covers phosphorylation, acetylation & N-glycosylation
Glycosylation (Uniprot background)
- GlycoEP - prediction of glycosites in eukaryotic glycoproteins.
- YinOYang - Predicts O-β-GlcNAc attachment sites in eukaryotic protein sequences.
- NetOGlyc - Predicts mucin type GalNAc O-glycosylation sites in mammalian proteins.
- NetNGlyc - Predicts N-Glycosylation sites in human proteins.
- DictyOGlyc - Predicts GlcNAc O-glycosylation sites in Dictyostelium discoideum proteins.
- GPP - glycosylation predictor (N- & O- linked)
- EnsembleGly - O-, N- & C- linked glycosylation sites
- O-glycosylation prediction
- ISOGlyP - Isoform Specific O-Glycosylation Prediction
- NetCGlyc - predictions of C-mannosylation sites in mammalian proteins.
- NetGlycate - predicts glycation of epsilon amino groups of lysines in mammalian proteins.
- Phosida - posttranslational modifications database - covers phosphorylation, acetylation & N-glycosylation
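The predictors above go well beyond pattern matching, but a quick first pass for potential N-glycosylation sites is to look for the consensus sequon N-X-S/T, where X is any residue except proline. The sketch below is a minimal Python example of that scan only; it is not how NetNGlyc or the other servers work, the function name is hypothetical, and a sequon is necessary rather than sufficient for glycosylation.

```python
import re

# N, then any residue except P, then S or T. The lookahead makes the
# search report overlapping sequons as well.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glyc_sequons(seq):
    """Return (position, tripeptide) for each N-X-S/T sequon, 1-based."""
    seq = seq.upper()
    return [(m.start() + 1, seq[m.start():m.start() + 3])
            for m in SEQUON.finditer(seq)]


if __name__ == "__main__":
    print(n_glyc_sequons("MNGSANPTNNTS"))
```

Sequons found this way are candidates to submit to the predictors listed above, which also take sequence context into account.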
Addition of Lipid Moieties
- NMT - Predicts N-terminal N-Myristoylation by MyristoylCoA:Protein N-Myristoyltransferase.
- Myristoylator - predicts N-terminal myristoylation of proteins by neural networks.
- PlantsP - predicts plant specific myristoylation.
- PrePS - Prenylation Prediction Suite (additional details).
- CSS-Palm - palmitoylation sites prediction
- NBA-Palm - prediction of palmitoylation sites.
- GPI lipid anchor predictor - animals, plants.
- GPI-SOM: Identification of GPI-anchor signals.
- PredGPI
- Big-PI predictor - GPI modification site prediction.
S-S bonds or metal binding sites
- CYSPRED - Predicts cysteines that are likely to be partners in cysteine bridges. (Program described in: Fariselli P, Riccobelli P, Casadio R. Proteins (1999) 36:340-346)
- DCON - Predictor of Disulfide Connectivity in Proteins.
- Disulfind - Cysteines Bonding State and Connectivity Predictor.
- Dianna - Cysteine state and Disulfide Bond partner prediction.
- EDBCP - Ensemble-based disulfide bonding connectivity pattern
- GDAP - disulfide bond prediction by sequence to structure mapping
- CysRedox - predicting the redox state of cysteines in proteins from multiple sequence alignments.
- CysState - cysteines disulfide bonding state prediction from protein sequence
- DiPro - protein disulfide bond prediction
- MetalDetector - cysteine and histidine metal binding sites predictor
- Dinosolve - disulfide bonding prediction server
Ubiquitination & Sumoylation
Others
- TermiNator - predicts N-terminal methionine excision, N-terminal acetylation, N-terminal myristoylation and S-palmitoylation of either prokaryotic or eukaryotic proteins originating from organellar or nuclear genomes.
- The Sulfinator - Predicts tyrosine sulfation sites in protein sequences
- SulfoSite - predicts sulfation sites.
- GPS-YNO2 - prediction of tyrosine nitration sites.
- NetAcet - predicts substrates of N-acetyltransferase A (NatA). The method was trained on yeast data but it obtains similar performance values on mammalian substrates acetylated by NatA orthologs.
- GPS-SNO - prediction of S-nitrosylation sites.
- MeMo - predicts arginine and lysine methylation sites in proteins
- PhosphoSitePlus - covers phosphorylation, acetylation, ubiquitination and additional PTMs
- PTMs peptide scanner - phosphorylation, sumoylation, palmitoylation, methylation and acetylation.
- Additional sites of interest (Expasy list, Gene Infinity list)
Sub-cellular Localization and Signal Peptides
Subcellular Compartments
- psort - Several programs for subcellular localization prediction (eukaryotic sequences, plant and Gram-positive bacterial sequences, Gram-negative bacterial sequences)
- SoftBerry Protein Location Finding - offers different programs for animal/fungi, plant and bacterial proteins.
- ESLPred - prediction of subcellular localization of eukaryotic proteins.
- PSLPred - for prokaryotic proteins
- Cell PLoc 2.0 - a package of web servers for predicting subcellular localization of proteins in different organisms (prokaryotes & eukaryotes including plants)
- LocDB - protein localization database for human & arabidopsis
- MultiLoc2 - for eukaryotes (University of Tubingen)
- eSLDB - covers human, mouse, C. elegans, S. cerevisiae and A. thaliana.
- Locate - subcellular localization database for mouse and human
- OrganelleDB - for some eukaryotes
- BaCello - for eukaryotes
- TargetP Server - Predicts the subcellular location of eukaryotic protein sequences. The assignment is based on the predicted presence of any of the N-terminal presequences: chloroplast transit peptide (cTP), mitochondrial targeting peptide (mTP) or secretory pathway signal peptide (SP).
- SubLoc 1.0 - for prokaryotes and eukaryotes (contains fewer subcellular targets than other programs, and so may bias results)
- SecretomeP - Prediction of non-classical and leaderless protein secretion. Produces ab initio predictions of non-classical, i.e. not signal peptide triggered, protein secretion. The method queries a large number of other feature prediction servers to obtain information on various post-translational and localizational aspects of the protein, which are integrated into the final secretion prediction. (Paper)
- Golgi transmembrane predictor - predicts Golgi membrane proteins based on their transmembrane domains. This prediction method is only valid for Type II transmembrane proteins, and the output is simply "predicted to be Golgi localised" or "predicted to transit through the Golgi" (post-Golgi localisation).
- PTS1 predictor - predicts the peroxisomal targeting signal 1.
- Additional tools (compiled by GeneInfinity)
Targeting Peptides
- psort - various tools for prediction of protein sorting signals in different groups of organisms
- SignalP - Predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms (Gram-positive prokaryotes, Gram-negative prokaryotes, and eukaryotes).
- Sigcleave - Reports protein signal cleavage sites.
- Signal-3L - animals, plants and bacteria
- SPEPLip - Predictor of Signal Peptide and Lipoprotein Cleavage Sites in Proteins
- LipoP - prediction of lipoproteins and signal peptides in Gram negative bacteria
- PredLipo - prediction of lipoproteins and signal peptides in Gram positive bacteria
- cNLS mapper - Prediction of importin α-dependent nuclear localization signals
- NucPred
- NetNES - predicts leucine-rich nuclear export signals (NES) in eukaryotic proteins.
- ChloroP - Prediction of chloroplast transit peptides
- MitoProt - Prediction of mitochondrial targeting sequences.
- Predotar - Prediction of mitochondrial and plastid targeting sequences.
- PTS1 predictor - for Peroxisomal Targeting Signal 1
- SecretomeP - predicts non-classical, i.e. not signal peptide triggered, protein secretion in eukaryotes. The method queries a large number of other feature prediction servers to obtain information on various post-translational and localizational aspects of the protein, which are integrated into the final secretion prediction.
- TatP - predicts the presence and location of Twin-arginine signal peptide cleavage sites in bacteria.
- ProP - predicts arginine and lysine propeptide cleavage sites in eukaryotic protein sequences.
- Additional tools (compiled by Gene Infinity)
Protein-Protein Interactions
- STRING - a database of known and predicted protein-protein interactions. The interactions include direct (physical) and indirect (functional) associations; they are derived from four sources: 1. Genomic Context 2. High-throughput Experiments 3. (Conserved) Coexpression 4. Previous Knowledge.
- IntAct - all interactions are derived from literature curation or direct user submissions
- HPRD (Human Protein Reference Database) - examine the sections: "interactions" & "PTMs and Substrates".
- DIP - Database of interacting proteins.
- MINT - Molecular Interactions Database.
- Bind - Biomolecular Interaction Network Database.
- iPfam - describes domain-domain interactions that are observed in PDB entries.
- InterDom - a database of putative interacting protein domains derived from multiple sources, ranging from domain fusions (Rosetta Stone), protein interactions (DIP and BIND), protein complexes (PDB), to scientific literature (MEDLINE).
- Additional tools (compiled by Gene Infinity)
Biological Pathways
Experimentally Determined Protein Structures
Search for known structures using:
- PDB - RCSB protein data bank.
- MMDB - Entrez structures (molecular modeling database)
- PDBe
- OCA - provides rich content annotation on structure and function, generating dynamic links to several external sources.
- PDBSum - a pictorial database providing an at-a-glance overview of every macromolecular structure deposited in the PDB. It provides schematic diagrams of the molecules in each structure and of the interactions between them.
- iPfam - describes domain-domain interactions that are observed in PDB entries.
Structure Classification
- SCOP - structural classification of proteins
- CATH - a hierarchical domain classification of protein structures in the Protein Data Bank.
Structure & Function Predictions
Function Prediction
- ProtFun - predicts protein function from sequence. The method queries a large number of other feature prediction servers to obtain information on various post-translational and localizational aspects of the protein, which are integrated into final predictions of the cellular role, enzyme class (if any), and selected Gene Ontology categories of the submitted sequence; Paper 1; Paper 2.
- PFP - protein function prediction
- ConSurf - Server for the identification of functional regions in proteins with or without known structures.
- ProFunc - prediction of function from protein structure
- GOtcha - a function prediction for your sequence
Secondary Structure Prediction
- PsiPred - Protein Structure Prediction Server
- Proteus2 - bundles signal peptide identification, transmembrane helix prediction, transmembrane beta-strand prediction, secondary structure prediction (for soluble proteins) and homology modeling (i.e. 3D structure generation) into a single prediction pipeline.
- JPred3 - a secondary structure prediction server powered by the Jnet algorithm.
- PredictProtein - offers the following: generation of multiple sequence alignments (MaxHom), detection of functional motifs (PROSITE), detection of composition-bias (SEG), detection of protein domains (PRODOM), fold recognition by prediction-based threading (TOPITS), predictions of: secondary structure (PHDsec and PROFsec), residue solvent accessibility (PHDacc and PROFacc), transmembrane helix location and topology (PHDhtm, PHDtopology), protein globularity (GLOBE), coiled-coil regions (COILS), cysteine bonds (CYSPRED), structural switching regions (ASP)
- Scratch protein predictor - offers the following predictions: secondary structure, solvent accessibility, transmembrane regions, disordered regions, disulfide bonds, domains, antigenicity.
- Expasy tools
- Additional tools (compiled by Gene Infinity)
Disordered Proteins
Domain & domain-linker Prediction
Topology Prediction
α helices
- Psipred - You may select one of three prediction methods to apply to your sequence: PSIPRED - a highly accurate method for protein secondary structure prediction, MEMSAT - a widely used transmembrane topology prediction method and GenTHREADER - a sequence profile based fold recognition method.
- TMHMM - Prediction of transmembrane helices in proteins, nice graphics
- Phobius - a combined transmembrane topology and signal peptide predictor.
- SOSUI - Predicts transmembrane helices in proteins and includes the helical wheels in the graphic presentation. Checks for presence of signal peptide to avoid the risk of signal peptides being predicted as putative TM as well.
- TMPred - presence of transmembrane helices and their orientation.
- MemBrain
- TOPCONS - consensus prediction of membrane protein topology
- Octopus
- MINNOU
- Topo2 - transmembrane protein display software - user has to supply the data about TMDs. Residues of interest can be highlighted.
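For a rough feel of what transmembrane helix predictors look for, a sliding-window hydropathy average can be computed locally. The sketch below is a minimal Python example using the Kyte-Doolittle scale with a 19-residue window and a cut-off of 1.6, a classic heuristic for membrane-spanning segments. It is far cruder than TMHMM, Phobius or TOPCONS, it cannot distinguish signal peptides from transmembrane helices, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence.

    Raises KeyError if the sequence contains a non-standard residue.
    """
    seq = seq.upper()
    if window < 1 or window > len(seq):
        return []
    values = [KYTE_DOOLITTLE[aa] for aa in seq]
    total = sum(values[:window])
    profile = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]  # slide the window by one
        profile.append(total / window)
    return profile


def candidate_tm_windows(seq, window=19, threshold=1.6):
    """Return 1-based start positions of windows above the threshold."""
    profile = hydropathy_profile(seq, window)
    return [i + 1 for i, mean in enumerate(profile) if mean > threshold]


if __name__ == "__main__":
    toy = "D" * 10 + "L" * 19 + "D" * 10
    print(candidate_tm_windows(toy))
```

Runs of consecutive window starts above the threshold point at a hydrophobic stretch worth checking with the dedicated topology predictors listed above.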
β sheets
Helical Wheels
Additional Tools
- Suggest an expression system (SuggestES) - developed at the Weizmann Institute of Science. SuggestES takes the protein sequence you provide and scans a large database of protein sequences with known results for different expression systems. When generating a suggestion, SuggestES takes several parameters into consideration:
  - Similarity: how similar is your sequence to the existing data in the database? The expression systems used on sequences similar to yours are preferred when creating the list of suggestions.
  - Recentness: how recently was a given expression system used? The older the record of the usage of a given expression system, the less this system will influence the final result. This gives visibility to recently introduced systems.
  - Frequency: how frequently has a given expression system been used?
- SECRET - predicts the chance that a soluble protein will crystallize.
- ESSPRESSO (estimation of protein expression and solubility)
- SERp - aims to aid identification of sites that are most suitable for mutation designed to enhance crystallizability by a Surface Entropy Reduction approach.
- CRYSTALP2 - for in-silico prediction of protein crystallization propensity.
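To make the three SuggestES criteria (similarity, recentness, frequency) more concrete, the toy Python sketch below shows one way such evidence could be combined. It is purely illustrative: how SuggestES actually weights these parameters is not described here, and the record format, the exponential decay, the half-life value and the function name are all assumptions.

```python
from collections import defaultdict


def rank_expression_systems(records, current_year, half_life_years=5.0):
    """Rank expression systems from past usage records.

    records is an iterable of (system, similarity, year), where similarity
    is a 0..1 score of the past sequence against the query sequence.
    Hypothetical scoring: each record contributes its similarity, halved
    for every half_life_years of age; contributions are summed per system,
    so frequently used systems accumulate more score.
    """
    scores = defaultdict(float)
    for system, similarity, year in records:
        age = max(0, current_year - year)
        recency_weight = 0.5 ** (age / half_life_years)
        scores[system] += similarity * recency_weight
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    history = [
        ("E. coli", 0.9, 2010),
        ("E. coli", 0.8, 2010),
        ("Insect cells", 0.9, 2000),
    ]
    for system, score in rank_expression_systems(history, current_year=2010):
        print(system, round(score, 3))
```

In this toy data the older insect-cell record is down-weighted by age, while the two recent E. coli records add up, which mirrors the intent of the recentness and frequency criteria.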
This site is maintained by Dr. Nurit Doron.