Electronic Resource
Berkeley, Calif. : Berkeley Electronic Press (now: De Gruyter)
Statistical applications in genetics and molecular biology
4.2005, 1, art1
ISSN:
1544-6115
Source:
Berkeley Electronic Press Academic Journals
Topics:
Biology
Notes:
Transcription factors and many other DNA-binding proteins recognize more than one specific sequence. Among sequences recognized by a given DNA-binding protein, different positions exhibit varying degrees of conservation. The reason is that base pairs that are more extensively contacted by the protein tend to be more conserved. This observation can be used in the discovery of transcription factor binding sites. Here we present a rigorous means to accomplish this. In particular, we constrain the order of the information (entropy) in the columns of the position specific weight matrix (PWM) which characterizes the motif being sought. We then show how to compute the maximum likelihood estimate of a PWM under such order restrictions. This computation is easily integrated with the EM algorithm or the Gibbs sampler to enhance performance in the search for motifs in unaligned sequences. We demonstrate our method on a well-known data set of binding sites of the transcription factor Crp in E. coli.
Type of Medium:
Electronic Resource
URL:
http://www.bepress.com/sagmb/vol4/iss1/art1