Repository of databases and bioinformatic tools related to transcription regulatory processes.
- ISMARA: The Integrated System for Motif Activity Response Analysis is a free online tool that models genome-wide expression data in terms of our genome-wide annotations of regulatory sites. For a given input expression dataset, it infers the key transcription regulators, their sample-dependent activities, and their genome-wide targets.
- CRUNCH: A completely automated procedure for ChIP-seq data analysis, starting from raw read quality control, through read mapping, peak detection and annotation, and including comprehensive DNA sequence motif analysis.
- SwissRegulon: A database of genome-wide annotations of regulatory sites. We currently have annotations for 17 prokaryotes and 3 eukaryotes in our collection.
- PhyloGibbs: An algorithm for inferring regulatory motifs and regulatory sites from collections of DNA sequences, including multiple alignments of orthologous sequences from related organisms. PhyloGibbs uses a rigorous Bayesian approach that combines a search for overrepresented sequence motifs with sequence conservation analysis of the putative sites for these motifs.
- RealPhy: The Reference sequence Alignment based Phylogeny builder is a free online pipeline that can infer phylogenetic trees from whole genome sequence data.
- TCS: A database of predicted two-component signalling interactions across bacterial genomes.
We present MotEvo, an integrated suite of Bayesian probabilistic methods for
the prediction of TFBSs and inference of regulatory motifs from multiple
alignments of phylogenetically related DNA sequences, which incorporates all
features just mentioned. In addition, MotEvo incorporates a novel model for
detecting unknown functional elements that are under evolutionary
constraint, and a new robust model for treating gain and loss of TFBSs
along a phylogeny. Rigorous benchmarking tests on ChIP-seq datasets show
that MotEvo's novel features significantly improve the accuracy of TFBS
prediction, motif inference, and enhancer prediction.
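
As a point of orientation, the simplest building block behind weight-matrix-based TFBS prediction can be sketched as a log-odds scan of a single sequence. This is not MotEvo's model: it ignores alignments, phylogeny, and site gain and loss. The function name `log_odds_scores`, the uniform background, and the toy matrix `TOY_WM` are all assumptions made for illustration.

```python
from math import log

# Hypothetical toy weight matrix for the consensus "TATA":
# one dict of base probabilities per motif column (no zero entries).
TOY_WM = [
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
]


def log_odds_scores(sequence, wm, background=None):
    """Score every window of `sequence` against `wm`.

    Returns a list of (start, score) pairs, where score is the log ratio of
    the window's probability under the weight matrix to its probability
    under the background model (uniform by default, an assumption).
    """
    background = background or {base: 0.25 for base in "ACGT"}
    width = len(wm)
    scores = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(
            log(column[base] / background[base])
            for column, base in zip(wm, window)
        )
        scores.append((start, score))
    return scores
```

Calling `log_odds_scores("GGTATAGG", TOY_WM)` returns one score per window; positive scores mark windows that fit the matrix better than the background.
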
PhyloGibbs is an algorithm for discovering regulatory sites in a collection
of DNA sequences, including multiple alignments of orthologous sequences
from related organisms. Many existing approaches search either for
sequence motifs that are overrepresented in the input data, or for
sequence segments that are more evolutionarily conserved than
expected. PhyloGibbs combines these two approaches and identifies
significant sequence-motifs by taking both over-representation and
conservation signals into account.
Using the assumption that regulatory sites can be represented as samples
from weight matrices (WMs), we derive a unique probability distribution for
assignments of sites into clusters. Our algorithm, PROCSE (probabilistic
clustering of sequences), uses Monte Carlo sampling of this distribution to
partition and align thousands of short DNA sequences into clusters. The
algorithm internally determines the number of clusters from the data and
assigns significance to the resulting clusters.
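
The idea of scoring an assignment of sites to clusters can be illustrated with a small sketch. It assumes a uniform (Dirichlet) prior over weight-matrix columns, so that the unknown WM can be integrated out in closed form; that prior, the function names, and the restriction to equal-length, pre-aligned sequences are assumptions of this example and not a description of the PROCSE implementation. The Monte Carlo sampling over partitions is not shown.

```python
from collections import Counter
from math import lgamma

ALPHABET = "ACGT"


def log_cluster_likelihood(seqs):
    """Log-probability that all `seqs` were sampled from one unknown WM.

    Each column contributes (K-1)! * prod_a(n_a!) / (n+K-1)!, the
    Dirichlet-multinomial marginal under a uniform prior, where K is the
    alphabet size, n the number of sequences, and n_a the count of base a.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences in a cluster must have equal length")
    n, k = len(seqs), len(ALPHABET)
    total = 0.0
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        total += lgamma(k) - lgamma(n + k)  # log (K-1)! - log (n+K-1)!
        total += sum(lgamma(counts[a] + 1) for a in ALPHABET)  # log n_a!
    return total


def log_partition_likelihood(partition):
    """Sum of cluster log-likelihoods; clusters are treated as independent."""
    return sum(log_cluster_likelihood(cluster) for cluster in partition)
```

Under this score, `log_partition_likelihood([["ACGT", "ACGT"]])` exceeds `log_partition_likelihood([["ACGT"], ["ACGT"]])`, so identical sites are better explained by a shared cluster, while sites that disagree in every column score higher when kept apart.
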
We develop a computational method that uses Hidden Markov Models and an
Expectation Maximization algorithm to detect cis-regulatory modules, given
the weight matrices of a set of transcription factors known to work
together. Two novel features of our probabilistic model are: (i)
correlations between binding sites, known to be required for module
activity, are exploited, and (ii) phylogenetic comparisons among sequences
from multiple species are made to highlight a regulatory module. The novel
features are shown to improve detection of modules, in experiments on
synthetic as well as biological data.
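
To make the Hidden Markov Model idea concrete, the sketch below decodes a toy two-state model (background versus module) with the Viterbi algorithm. It is a heavily simplified illustration: the observation alphabet ("S" for a predicted binding site, "-" for none), the state names, and every probability are invented for this example, and it omits the Expectation Maximization training, the correlations between binding sites, and the phylogenetic comparisons described above.

```python
from math import log

# Hypothetical toy parameters, chosen only for illustration.
STATES = ("background", "module")
START = {"background": 0.9, "module": 0.1}
TRANS = {
    "background": {"background": 0.9, "module": 0.1},
    "module": {"background": 0.2, "module": 0.8},
}
EMIT = {
    "background": {"S": 0.05, "-": 0.95},
    "module": {"S": 0.5, "-": 0.5},
}


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state path for `observations`."""
    if not observations:
        return []
    # best[t][s]: log-probability of the best path ending in state s at t.
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]])
             for s in states}]
    back = []  # back[t-1][s]: predecessor of s on the best path at t
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states,
                       key=lambda p: best[-1][p] + log(trans_p[p][s]))
            scores[s] = (best[-1][prev] + log(trans_p[prev][s])
                         + log(emit_p[s][obs]))
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    state = max(states, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

With these toy parameters, a run of positions without sites decodes as background and a dense run of sites decodes as module, for example `viterbi("SSSSSS", STATES, START, TRANS, EMIT)`.
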
Spa is a computer program for aligning cDNA sequences to a genome. It uses
a probabilistic Bayesian model to find the optimal alignment. To keep
running times feasible, Spa uses the BLAT gfServer to identify genomic loci
and returns the best mapping from these loci.