The EH1 motif is found in some transcription factors, where it provides a repressive function by binding to the Groucho/TLE1 co-repressor. Copley reported new EH1 motifs, which he justified on the grounds that they were enriched in proteins annotated with certain transcription-related keywords (PMID:16309560). Using our SIRW server with the Lig_EH1 regular expression from ELM, we can attempt to repeat his findings. SIRW allows combined keyword and RegExp searches.

The probability of an association can be estimated with the Fisher Exact Test or the related Hypergeometric Distribution. Both should be available in any statistical package (such as R or Mathematica). See the Wikipedia entries for more information on these tests.

Open SIRW, select Uniprot_human and then pagequery. Type "human" in the Species field, then copy and paste the EH1 motif into the Pattern field. Click Do Query, then be patient while the result loads. This search gives you two numbers: the total number of sequences and the subset matching the motif.

Now type "HOX" in the Link field and repeat the search. This gives you two more numbers: the number of sequences known to have a HOX domain and the subset of those that match the motif.

Question:
* Is EH1 more or less frequent in HOX proteins?

Repeat with the other domain class keywords: BZIP, TBOX, ZnF_C2H2, PAS.

Questions:
* Is EH1 found more often, less often or at the expected frequency in these transcription factor classes?
* If any classes disfavour EH1, how would you explain it?
* Do you think keyword associations can indicate that most of these motifs are functional?

Now calculate the probability
You must first use the SIRW numbers to calculate the four mutually exclusive cells of the 2x2 contingency table (the four combinations of with/without keyword and with/without motif). Enter the numbers and press the calculate_Fishers_Exact_Test button. How significant are the enrichments and depletions?
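
If you want to check the arithmetic by hand, the step from the four SIRW numbers to the four contingency cells and on to a one-sided Fisher Exact Test can be sketched in Python as below. This is only an illustrative sketch using the standard library, not the code behind the calculate_Fishers_Exact_Test button. The function names (contingency_cells, expected_matches, fisher_one_sided) are hypothetical, and the counts in the example at the bottom are made-up placeholders, not real SIRW results; substitute the numbers from your own searches.

```python
from math import comb


def contingency_cells(total, total_with_motif, keyword, keyword_with_motif):
    """Turn the four SIRW counts into the four mutually exclusive cells."""
    a = keyword_with_motif                     # keyword, motif
    b = keyword - keyword_with_motif           # keyword, no motif
    c = total_with_motif - keyword_with_motif  # no keyword, motif
    d = total - keyword - c                    # no keyword, no motif
    if min(a, b, c, d) < 0:
        raise ValueError("counts are inconsistent with each other")
    return a, b, c, d


def expected_matches(a, b, c, d):
    """Motif matches expected in the keyword set if there is no association."""
    return (a + b) * (a + c) / (a + b + c + d)


def fisher_one_sided(a, b, c, d):
    """Return (p_enriched, p_depleted) from the hypergeometric distribution."""
    N = a + b + c + d   # all sequences
    K = a + c           # all sequences matching the motif
    n = a + b           # all sequences carrying the keyword

    def pmf(k):
        # Probability of exactly k motif matches among the n keyword sequences
        return comb(K, k) * comb(N - K, n - k) / comb(N, n)

    lowest = max(0, n - (N - K))
    highest = min(K, n)
    p_enriched = sum(pmf(k) for k in range(a, highest + 1))  # P(X >= a)
    p_depleted = sum(pmf(k) for k in range(lowest, a + 1))   # P(X <= a)
    return p_enriched, p_depleted


if __name__ == "__main__":
    # Placeholder counts for illustration only (NOT real SIRW output).
    cells = contingency_cells(total=20000, total_with_motif=1000,
                              keyword=250, keyword_with_motif=40)
    print("cells (a, b, c, d):", cells)
    print("expected matches in keyword set:", expected_matches(*cells))
    print("p(enriched), p(depleted):", fisher_one_sided(*cells))
```

Comparing the observed count in the first cell with the expected count tells you the direction of the effect: a small p_enriched suggests the motif is over-represented in that domain class, and a small p_depleted suggests it is under-represented. The two-sided value reported by a statistics package will generally differ from either one-sided value.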