Software

MotEvo
We present MotEvo, an integrated suite of Bayesian probabilistic methods for
the prediction of TFBSs and inference of regulatory motifs from multiple
alignments of phylogenetically related DNA sequences which incorporates all
features just mentioned. In addition, MotEvo incorporates a novel model for
detecting unknown functional elements that are under evolutionary
constraint, and a new robust model for treating gain and loss of TFBSs
along a phylogeny. Rigorous benchmarking tests on ChIP-seq datasets show
that MotEvo's novel features significantly improve the accuracy of TFBS
prediction, motif inference, and enhancer prediction.

Download:
Source,
Linux binary,
Mac binary


PhyloGibbs

PhyloGibbs is an algorithm for discovering regulatory sites in a collection
of DNA sequences, including multiple alignments of orthologous sequences
from related organisms. Many existing approaches search either for
sequence-motifs that are overrepresented in the input data, or for
sequence-segments that are more evolutionarily conserved than
expected. PhyloGibbs combines these two approaches and identifies
significant sequence-motifs by taking both over-representation and
conservation signals into account.
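
As a toy illustration of the two signals named above, the sketch below counts k-mer occurrences across a set of sequences (over-representation) and measures how many columns of an alignment window are identical in every row (conservation). It is not the PhyloGibbs sampler; the function names `kmer_counts` and `conserved_fraction` and the choice to treat gaps as mismatches are assumptions made for this example only.

```python
from collections import Counter


def kmer_counts(sequences, k):
    """Count how often each k-mer occurs across all input sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def conserved_fraction(aligned_rows, start, k):
    """Fraction of the k alignment columns beginning at `start` that
    carry the same base in every row (a gap '-' counts as a mismatch)."""
    conserved = 0
    for col in range(start, start + k):
        column = {row[col] for row in aligned_rows}
        if len(column) == 1 and "-" not in column:
            conserved += 1
    return conserved / k
```

A window that is both frequent in `kmer_counts` and highly conserved by `conserved_fraction` is the kind of candidate that a combined method would favor over one that shows only a single signal.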


PROCSE


Using the assumption that regulatory sites can be represented as samples
from weight matrices (WMs), we derive a unique probability distribution for
assignments of sites into clusters. Our algorithm, PROCSE (probabilistic
clustering of sequences), uses Monte Carlo sampling of this distribution to
partition and align thousands of short DNA sequences into clusters. The
algorithm internally determines the number of clusters from the data and
assigns significance to the resulting clusters.
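
To make the clustering idea concrete, the following minimal sketch scores a candidate cluster by the probability that all of its sites were drawn from one unknown WM, integrating over the WM. It assumes a uniform prior on each WM column, which is an assumption of this example and may differ from the prior used in PROCSE; the name `log_cluster_likelihood` is hypothetical, and no Monte Carlo sampling is shown.

```python
from math import lgamma

ALPHABET = "ACGT"


def log_cluster_likelihood(sites):
    """Log probability that equal-length sites were sampled from a single
    unknown WM, with the WM integrated out under a uniform prior
    (assumed here). Per column: Gamma(4)/Gamma(n+4) * prod_a Gamma(n_a+1)."""
    n = len(sites)
    width = len(sites[0])
    total = 0.0
    for col in range(width):
        counts = [sum(1 for s in sites if s[col] == a) for a in ALPHABET]
        total += lgamma(4) - lgamma(n + 4)
        total += sum(lgamma(c + 1) for c in counts)
    return total
```

Under this score, two sites belong together when `log_cluster_likelihood` of the pair exceeds the sum of the scores of the two sites taken separately. Identical sites satisfy this; sites that differ at every position do not.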


STUBB

We develop a computational method that uses Hidden Markov Models and an
Expectation Maximization algorithm to detect cis-regulatory modules, given
the weight matrices of a set of transcription factors known to work
together. Two novel features of our probabilistic model are: (i)
correlations between binding sites, known to be required for module
activity, are exploited, and (ii) phylogenetic comparisons among sequences
from multiple species are made to highlight a regulatory module. The novel
features are shown to improve detection of modules, in experiments on
synthetic as well as biological data.
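
The core of an HMM of this kind can be sketched as a forward recursion that sums over every way of parsing a sequence into background bases and non-overlapping binding sites. The sketch below shows only that recursion for a single sequence. It omits the Expectation Maximization step, the correlations between binding sites, and the phylogenetic comparison. The function name, the argument layout (each WM as a list of per-position dictionaries), and the fixed site probabilities `wm_probs` are assumptions of this example, not the STUBB interface.

```python
def sequence_likelihood(seq, wms, wm_probs, background):
    """Probability of `seq` summed over all parses into background bases
    and non-overlapping WM sites (forward recursion).

    wms:        list of WMs, each a list of dicts mapping base -> probability
    wm_probs:   probability of starting a site for each WM (hypothetical
                fixed values; EM would re-estimate them)
    background: dict mapping base -> background probability
    """
    p_bg = 1.0 - sum(wm_probs)
    forward = [0.0] * (len(seq) + 1)
    forward[0] = 1.0
    for end in range(1, len(seq) + 1):
        # Last position emitted by the background state.
        total = forward[end - 1] * p_bg * background[seq[end - 1]]
        # Last `width` positions emitted as one site of some WM.
        for wm, p_wm in zip(wms, wm_probs):
            width = len(wm)
            if width <= end:
                site_prob = 1.0
                for offset, column in enumerate(wm):
                    site_prob *= column[seq[end - width + offset]]
                total += forward[end - width] * p_wm * site_prob
        forward[end] = total
    return forward[len(seq)]
```

Comparing this likelihood with the background-only likelihood of the same window gives a simple score for how strongly the window resembles a cluster of sites for the given factors.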


SPA

Spa is a computer program for aligning cDNA sequences to a genome. It uses
a probabilistic Bayesian model to find the optimal alignment. To keep
running times feasible, we use the BLAT gfServer to identify candidate
genomic loci and return the best mapping from these loci.