Initial Bioinformatic Investigation
Using Bioinformatic Tools to Strategically Design Expression/Purification Projects
Dr. Nurit Kleinberger-Doron
Your comments are most welcome.
Entries since November 2003
Bioinformatics Tools Sorted according to Rational Project Design
Bioinformatics Tools Sorted by Expression Problems
| Problem | Possible Causes | Bioinformatics Tools |
| Very low amounts of expressed proteins | Secondary structure of mRNA; rare codons; low t1/2; secretion signal | |
| Truncated forms | Rare codons; genetic code differences [trp-stop]; additional RBS [consider GUG too] {alternative reading frame}; proteases during induction or lysis; cloning out of frame | |
| Insoluble protein | Post-translational modifications; transmembrane domains; in-frame mutation/s due to rare codons or non-standard genetic code | |
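Two of the causes listed in the table, rare codons and genetic code differences such as a codon that reads as Trp in the source organism but as a stop in the host, can be screened for directly on the coding sequence. The following is a minimal sketch, not a replacement for the dedicated tools. The function names are hypothetical, and the rare-codon set is an assumption (codons commonly cited as low-usage in E. coli); it should be replaced with a codon usage table for the actual host.

```python
# Assumption: a small set of codons commonly cited as rare in E. coli.
# This list is illustrative and does not come from this page.
RARE_CODONS_ECOLI = {"AGG", "AGA", "CGA", "CTA", "ATA", "CCC"}
# Stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_codons(cds):
    """Split a coding sequence into in-frame codons (a trailing partial codon is ignored)."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def rare_codon_positions(cds, rare=RARE_CODONS_ECOLI):
    """Return (1-based codon index, codon) for every codon found in the rare set."""
    return [(i + 1, c) for i, c in enumerate(split_codons(cds)) if c in rare]


def first_premature_stop(cds):
    """Return the 1-based index of the first stop codon before the final codon, or None."""
    codons = split_codons(cds)
    for i, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            return i + 1
    return None


if __name__ == "__main__":
    example = "ATGAGGCTAAAATAA"  # made-up sequence for illustration
    print(rare_codon_positions(example))
    print(first_premature_stop("ATGTGAAAATAA"))
```

A premature stop reported here for a gene from an organism with a non-standard genetic code is one possible explanation for a truncated product; a frame shift introduced during cloning would show up the same way.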
Additional Tools & Databases
Preliminary Search of Databases Before Starting a Project
Preliminary Search Using Keywords
Sequence-based Preliminary Search & Sequence Alignments
DNA Sequence Analysis
Secondary Structure of mRNA
Codon Usage and Translation Frames
Alternative Splicing
Protein Sequence Analysis
Motifs and Repeats in Proteins
- Interpro - A database of protein families, domains and functional sites in which identifiable features found in known proteins can be applied to unknown protein sequences.
- CDD [Conserved Domain Database] - a collection of sequence alignments and profiles representing protein domains conserved in molecular evolution (NCBI) [also appears as part of Blast output]. Run a BLAST search against the CDD.
- ELM - Eukaryotic Linear Motif Resource for Functional Sites in Proteins.
- Prosite - A database of protein families and domains. It consists of biologically significant sites, patterns and profiles that help to reliably identify to which known protein family (if any) a new sequence belongs.
- ProTeUs - (PROtein TErminUS) - a tool for the identification of short linear signatures in protein termini. (About)
- QuasiMotiFinder - A server for the identification of motifs and signature-like patterns in protein sequences (based on Prosite and multiple alignments).
- Motif Search - for DNA or protein sequences.
- Radar - Rapid Automatic Detection and Alignment of Repeats in protein sequences.
- SAPS - Statistical Analysis of Protein Sequences.
- Additional sites (Expasy list).
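Pattern databases such as Prosite describe short motifs in a compact syntax that maps closely onto regular expressions. As a rough illustration of how such a pattern scan works, the sketch below translates a simplified subset of the PROSITE pattern syntax into a Python regular expression and reports matches. The function names are hypothetical, only the basic syntax elements are handled, and the example pattern (the widely quoted N-glycosylation site pattern `N-{P}-[ST]-{P}`) is supplied here as an assumption rather than taken from this page.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simplified PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # {P}    -> [^P]
        element = element.replace("(", "{").replace(")", "}")   # x(2,4) -> x{2,4}
        element = element.replace("x", ".")                     # x      -> any residue
        element = element.replace("<", "^").replace(">", "$")   # N-/C-terminal anchors
        parts.append(element)
    return "".join(parts)


def find_motif(sequence, pattern):
    """Return (1-based start, matched text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence.upper())]


if __name__ == "__main__":
    print(prosite_to_regex("N-{P}-[ST]-{P}"))
    print(find_motif("AANGSAA", "N-{P}-[ST]-{P}"))
```

Short patterns like this match by chance quite often, so a hit is only a hint that the site may be present; the servers listed above add profiles, alignments and context to reduce false positives.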
Physicochemical properties
- Protparam - Physico-chemical parameters of a protein sequence (amino-acid and atomic compositions, pI, extinction coefficient, etc.)
- Protein Calculator - Generates molecular weight information (including scanning mass spectrometry results), estimated charges (including pI estimation), UV absorption coefficients, crystallographic solvent content percentage and Vm, and counts atoms and residues based on the protein sequence.
- EMBOSS Pepinfo/Pepwindow/Pepstats.
- ProtScale (several hydrophobicity scales)
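Some of the parameters these tools report can be approximated by hand, which is useful for sanity-checking a server result. The sketch below computes the amino-acid composition and an estimate of the molar extinction coefficient at 280 nm. It assumes the commonly used per-residue values of 5500 for Trp, 1490 for Tyr and 125 per cystine (in M^-1 cm^-1); the function names are hypothetical, and the number of disulfide-bonded cystines has to be supplied by the user because it cannot be read from the sequence alone.

```python
from collections import Counter


def composition(sequence):
    """Return the fraction of each residue type in the sequence."""
    seq = sequence.upper()
    counts = Counter(seq)
    return {aa: n / len(seq) for aa, n in counts.items()}


def extinction_coefficient_280(sequence, cystines=0):
    """Estimate the molar extinction coefficient at 280 nm (M^-1 cm^-1).

    Assumed contributions: Trp 5500, Tyr 1490, cystine (disulfide) 125.
    """
    seq = sequence.upper()
    return 5500 * seq.count("W") + 1490 * seq.count("Y") + 125 * cystines


if __name__ == "__main__":
    peptide = "MWWYK"  # made-up sequence for illustration
    print(composition(peptide))
    print(extinction_coefficient_280(peptide))
```

The estimate is least reliable for proteins without tryptophan, so for such sequences the value from any tool should be treated with extra caution.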
Protein Turnover
- PESTfind - Polypeptide sequences enriched in proline (P), glutamic acid (E), serine (S) and threonine (T) target proteins for rapid destruction. PESTfind produces a score ranging from about -50 to +50. By definition, a score above zero denotes a possible PEST region, but a value greater than +5 sparks real interest.
- Destruction Box (D box) Finder - the D box characterizes some proteins destined for proteolysis by the ubiquitin and 26S proteasome pathway.
- Protparam
- references to N-end rule (scroll down)
- I-Mutant2.0 - a tool for predicting protein stability upon single site mutation.
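The PESTfind thresholds quoted above are easy to apply when post-processing a batch of results. The sketch below encodes them as a small classifier and, as a separate and cruder screen, lists candidate destruction boxes using the minimal R-x-x-L consensus. The function names and category labels are hypothetical, and the R-x-x-L consensus is an assumption added here, not something stated on this page; it is far less specific than a dedicated D box finder.

```python
import re


def classify_pest_score(score):
    """Map a PESTfind score onto the interpretation given in the text."""
    if score > 5:
        return "strong"      # greater than +5: of real interest
    if score > 0:
        return "possible"    # above zero: possible PEST region
    return "unlikely"


# Assumption: minimal D box consensus R-x-x-L. The lookahead lets
# overlapping candidates be reported as well.
DBOX = re.compile(r"(?=(R..L))")


def dbox_candidates(sequence):
    """Return (1-based start, matched residues) for every R-x-x-L occurrence."""
    return [(m.start() + 1, m.group(1)) for m in DBOX.finditer(sequence.upper())]


if __name__ == "__main__":
    print(classify_pest_score(7.2))
    print(dbox_candidates("MRAALGK"))  # made-up sequence for illustration
```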
Proteolytic Cleavage
Co- and Post-translational Modifications (A short summary)
Phosphorylation
Sub-cellular Localization and Signal Peptides
Subcellular Compartments
- psort - Programs for subcellular localization prediction (eukaryotic sequences, plant and Gram-positive bacterial sequences, Gram-negative bacterial sequences)
- SoftBerry Protein Location Finding - offers different programs for animal/fungi, plant and bacterial proteins.
- ESLPred - prediction of subcellular localization of proteins.
- TargetP Server - Predicts the subcellular location of eukaryotic protein sequences. The subcellular location assignment is based on the predicted presence of any of the N-terminal presequences: chloroplast transit peptide (cTP), mitochondrial targeting peptide (mTP) or secretory pathway signal peptide (SP).
- PLOC - protein localization prediction (plants, fungi, animals).
- MultiLoc/TargetLoc (Tübingen University)
- LocTarget - from Columbia University Bioinformatics Center
- LOCSVMpsi - eukaryotic protein subcellular LOCalization based on SVM and PSI-blast
- LOCtree - a eukaryotic and prokaryotic localization prediction tool available at the CUBIC site.
- Proteome Analyst Specialized Subcellular Localization Server.
- SubLoc - Prediction of protein subcellular localization (contains fewer subcellular targets than other programs, and so may bias results)
- BLSTM-LOC Protein Localization Prediction Server (eukaryotes: plants vs. non-plants)
- SecretomeP - Prediction of non-classical and leaderless protein secretion. Produces ab initio predictions of non-classical, i.e. not signal peptide triggered, protein secretion. The method queries a large number of other feature prediction servers to obtain information on various post-translational and localizational aspects of the protein, which are integrated into the final secretion prediction. (Paper)
- Golgi transmembrane predictor - predicts Golgi membrane proteins based on their transmembrane domains. This prediction method is only valid for Type II transmembrane proteins, and the output is simply "predicted to be Golgi localised" or "predicted to transit through the Golgi" (post-Golgi localisation).
- PTS1 predictor - predicts the peroxisomal targeting signal 1.
Targeting Peptides
- ipsort - predicts whether a sequence contains a Signal Peptide (SP), Mitochondrial Targeting Peptide (mTP), or Chloroplast Transit Peptide (cTP).
- SignalP - Predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms (Gram-positive prokaryotes, Gram-negative prokaryotes, and eukaryotes).
- Sigcleave - Reports protein signal cleavage sites.
- SIGFIND - Signal Peptide Prediction Server (Eukaryotes)
- Signal Peptide Prediction (Leeds University)
- SPEPLip - Predictor of Signal Peptide and Lipoprotein Cleavage Sites in Proteins
- LipoP - prediction of lipoproteins and signal peptides in Gram-negative bacteria
- PredictNLS - analysis and determination of Nuclear Localization Signals
- NetNES - predicts leucine-rich nuclear export signals (NES) in eukaryotic proteins.
- ChloroP - Prediction of chloroplast transit peptides
- MitoProt - Prediction of mitochondrial targeting sequences.
- Predotar - Prediction of mitochondrial and plastid targeting sequences.
- SecretomeP - predicts non-classical, i.e. not signal peptide triggered, protein secretion in eukaryotes. The method queries a large number of other feature prediction servers to obtain information on various post-translational and localizational aspects of the protein, which are integrated into the final secretion prediction.
- TatP - predicts the presence and location of Twin-arginine signal peptide cleavage sites in bacteria.
- ProP - predicts arginine and lysine propeptide cleavage sites in eukaryotic protein sequences.
Protein-Protein Interactions
- STRING - a database of known and predicted protein-protein interactions. The interactions include direct (physical) and indirect (functional) associations; they are derived from four sources: (1) Genomic Context, (2) High-throughput Experiments, (3) (Conserved) Coexpression, (4) Previous Knowledge.
- IntAct - all interactions are derived from literature curation or direct user submissions
- HPRD (Human Protein Reference Database) - examine the sections: "interactions" & "PTMs and Substrates".
- DIP - Database of interacting proteins.
- MINT - Molecular Interactions Database.
- Bind - Biomolecular Interaction Network Database.
- iPfam - describes domain-domain interactions that are observed in PDB entries.
- InterDom - a database of putative interacting protein domains derived from multiple sources, ranging from domain fusions (Rosetta Stone), protein interactions (DIP and BIND), protein complexes (PDB), to scientific literature (MEDLINE).
- Additional sites (list of the UK Human Genome Mapping Project Resource Centre, Finley Lab list)
Biological Pathways
Experimentally Determined Protein Structures
Search for known structures using:
- PDB - RCSB protein data bank.
- MMDB - Entrez structures (molecular modeling database)
- MSD - EBI structures (macromolecular structures database)
- OCA - provides rich content annotation on structure and function, generating dynamic links to several external sources.
- PDBSum - a pictorial database providing an at-a-glance overview of every macromolecular structure deposited in the PDB. It provides schematic diagrams of the molecules in each structure and of the interactions between them.
- iPfam - describes domain-domain interactions that are observed in PDB entries.
Structure & Function Predictions
Function Prediction
- ProtFun - predicts protein function from sequence. The method queries a large number of other feature prediction servers to obtain information on various post-translational and localizational aspects of the protein, which are integrated into final predictions of the cellular role, enzyme class (if any), and selected Gene Ontology categories of the submitted sequence. (Paper 1; Paper 2; A short presentation)
- GeneQuiz Server - derives functional annotation for protein sequences and provides supporting evidence, including family alignments.
- SVMProt: Protein Functional Family Prediction
- ConSEq - A server for the identification of functionally and structurally important residues in protein sequences without known structures.
- ConSurf - Server for the identification of functional regions in proteins with known structures.
Secondary Structure Prediction
- PsiPred - Protein Structure Prediction Server
- Proteus2 - bundles signal peptide identification, transmembrane helix prediction, transmembrane beta-strand prediction, secondary structure prediction (for soluble proteins) and homology modeling (i.e. 3D structure generation) into a single prediction pipeline.
- JPred3 - a secondary structure prediction server powered by the Jnet algorithm.
- PredictProtein - offers the following: generation of multiple sequence alignments (MaxHom), detection of functional motifs (PROSITE), detection of composition-bias (SEG), detection of protein domains (PRODOM), fold recognition by prediction-based threading (TOPITS), and predictions of secondary structure (PHDsec and PROFsec), residue solvent accessibility (PHDacc and PROFacc), transmembrane helix location and topology (PHDhtm, PHDtopology), protein globularity (GLOBE), coiled-coil regions (COILS), cysteine bonds (CYSPRED) and structural switching regions (ASP).
- Expasy tools
Disordered Proteins
Topology Prediction
α helices
- Psipred - You may select one of three prediction methods to apply to your sequence: PSIPRED - a highly accurate method for protein secondary structure prediction, MEMSAT - a widely used transmembrane topology prediction method, and GenTHREADER - a sequence profile based fold recognition method.
- TMHMM - Prediction of transmembrane helices in proteins, nice graphics
- Phobius - a combined transmembrane topology and signal peptide predictor.
- SOSUI - Predicts transmembrane helices in proteins and includes the helical wheels in the graphic presentation. Checks for presence of signal peptide to avoid the risk of signal peptides being predicted as putative TM as well.
- BPROMPT - Bayesian PRediction Of Membrane Protein Topology. Uses a Bayesian Belief Network to combine the results of other prediction methods, providing a more accurate consensus prediction. Topology predictions with accuracies of 70% for prokaryotes and 53% for eukaryotes were achieved. (Abstract of paper)
- Expasy's tools for topology prediction.
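Before submitting a sequence to the predictors above, a quick sliding-window hydropathy scan gives a first impression of whether transmembrane helices are likely. The sketch below uses the Kyte-Doolittle hydropathy scale with a 19-residue window and a cutoff of 1.6, the settings usually quoted for this kind of scan. The function names are hypothetical, and the window and threshold are assumptions that can be changed. This is a much simpler technique than the HMM-based and consensus methods listed above, and it does not distinguish signal peptides from transmembrane segments.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Return the mean hydropathy of every full window along the sequence.

    Raises KeyError for residues outside the 20 standard amino acids.
    """
    seq = sequence.upper()
    if len(seq) < window:
        return []
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def candidate_tm_windows(sequence, window=19, threshold=1.6):
    """Return 1-based start positions of windows whose mean reaches the threshold."""
    profile = hydropathy_profile(sequence, window)
    return [i + 1 for i, mean in enumerate(profile) if mean >= threshold]


if __name__ == "__main__":
    toy = "D" * 10 + "L" * 19 + "D" * 10  # made-up sequence for illustration
    print(candidate_tm_windows(toy))
```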
β sheets
- ConBBPRED: Consensus Prediction of TransMembrane Beta-Barrel Proteins (about)
- PROFtmb - a prediction service for Bacterial Transmembrane Beta Barrels
- TMB-Hunt - a web server to screen sequence sets for transmembrane beta-barrel proteins.
Helical Wheels
Additional Tools
- Suggest an expression system - developed at the Weizmann Institute of Science. SuggestES takes the protein sequence you provide and scans a large database of protein sequences with known results for different expression systems. When generating a suggestion, SuggestES takes several parameters into consideration:
  - Similarity: how similar is your sequence to the existing data in the database? The expression systems used on sequences similar to yours are preferred when creating the list of suggestions.
  - Recentness: how recently was a given expression system used? The older the record of the usage of a given expression system, the less this system will influence the final result. This gives visibility to recently introduced systems.
  - Frequency: how frequently has a given expression system been used?
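One way the three parameters could be combined is sketched below. This is purely illustrative and is not SuggestES's actual algorithm, which is not described here: the function name, the record layout, the exponential decay and the five-year half-life are all assumptions. Each database record contributes its similarity to the query, down-weighted by its age, and the contributions are summed per expression system so that frequently used systems accumulate more weight.

```python
from collections import defaultdict


def rank_expression_systems(records, current_year, half_life_years=5.0):
    """Rank expression systems from (system, similarity, year) records.

    Hypothetical scoring, not the SuggestES method:
      - similarity (0..1) weights records close to the query sequence,
      - recentness halves a record's weight every `half_life_years`,
      - frequency emerges from summing the weights per system.
    """
    totals = defaultdict(float)
    for system, similarity, year in records:
        age = max(0, current_year - year)
        recency_weight = 0.5 ** (age / half_life_years)
        totals[system] += similarity * recency_weight
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up records for illustration only.
    records = [
        ("E. coli", 0.9, 2008),
        ("E. coli", 0.8, 2007),
        ("Pichia pastoris", 0.9, 1998),
    ]
    for system, score in rank_expression_systems(records, current_year=2008):
        print(f"{system}: {score:.3f}")
```

Summing rather than averaging is a deliberate choice in this sketch: it lets frequency count, at the cost of favouring systems that are simply over-represented in the database.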
This site is maintained by Dr. Nurit Doron.