## Fisher's Exact Test

#### Estimate the probability of a motif and keyword association

## Example: Calculating a keyword-motif association

**Using keyword association to provide surrogate significance.**
The EH1 motif is found in some transcription factors, where it provides a
repressive function by binding to the Groucho/TLE1 repressor. Copley reported
new EH1 motifs, which he justified on the grounds that they were enriched among
proteins carrying certain transcriptional keywords (PMID:16309560).
Using our SIRW server with the Lig_EH1 regular expression from ELM, we can attempt to repeat
his findings. SIRW allows combined keyword + RegExp searches. The probability
of an association can be estimated with Fisher's Exact Test or the related hypergeometric
distribution. Both are available in most statistical packages (such as R or Mathematica).
See the Wikipedia entries for more information on these tests.
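
As a reminder of what is being computed, the hypergeometric probability can be written as follows. The symbols are chosen here for illustration and do not come from SIRW: $N$ is the total number of sequences searched, $K$ the number carrying the keyword, $n$ the number matching the motif, and $k$ the number with both.

$$P(X = k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$$

The one-sided p-value for enrichment is the probability of seeing $k$ or more overlapping sequences by chance:

$$p_{\text{enriched}} = \sum_{i=k}^{\min(K,\,n)} P(X = i)$$

For depletion, the sum instead runs from the smallest possible overlap, $\max(0,\, n + K - N)$, up to $k$.
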

Open SIRW, select Uniprot_human and then pagequery.
Type "human" in the Species field, then copy and paste the EH1 motif into the
Pattern field. Click Do Query, then be patient while the result loads. This search
gives you two numbers: the total number of sequences and the subset matching the motif. Now type
"HOX" in the Link field and repeat the search. This gives you two more numbers:
the number of sequences known to have a HOX domain and the subset of those that match the motif.

Question:

* Is EH1 more or less frequent in HOX proteins?

Repeat with the domain class keywords BZIP, TBOX, ZnF_C2H2 and PAS.

Questions:

* Is EH1 found more often, less often or at the expected frequency in these
transcription factor classes?
* If any classes disfavour EH1, how would you explain it?
* Do you think keyword associations can indicate that most of these motifs
are functional?

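You must first use the SIRW numbers to calculate the four mutually exclusive counts for the 2x2
contingency table (the four combinations of with/without keyword and with/without motif).
For example, the count for "keyword but no motif" is the number of keyword sequences minus
the number of keyword sequences that match the motif.
Enter the numbers and press the calculate_Fishers_Exact_Test button.
How significant are the enrichments and depletions?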
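
If you want to check the result outside the web form, the calculation can be sketched in a few lines of Python using only the standard library. This is a minimal sketch, not the code behind the calculate_Fishers_Exact_Test button: the function names `contingency_table` and `fisher_exact` are made up for this example, and the counts in the usage lines are placeholders, not real SIRW output. The two-sided p-value here sums all tables that are no more probable than the observed one, which is one common convention.

```python
from math import comb


def contingency_table(total, total_motif, keyword, keyword_motif):
    """Turn the four SIRW numbers into four mutually exclusive counts.

    total         -- all sequences searched
    total_motif   -- sequences matching the motif
    keyword       -- sequences carrying the keyword
    keyword_motif -- sequences with both keyword and motif
    """
    a = keyword_motif                # keyword, motif
    b = keyword - keyword_motif      # keyword, no motif
    c = total_motif - keyword_motif  # no keyword, motif
    d = total - keyword - c          # no keyword, no motif
    if min(a, b, c, d) < 0:
        raise ValueError("counts are inconsistent with each other")
    return a, b, c, d


def fisher_exact(a, b, c, d):
    """Return (p_enriched, p_depleted, p_two_sided) for a 2x2 table."""
    n_keyword = a + b
    n_motif = a + c
    n_total = a + b + c + d
    smallest = max(0, n_keyword + n_motif - n_total)
    largest = min(n_keyword, n_motif)
    denominator = comb(n_total, n_motif)

    # Hypergeometric probability of each possible overlap k.
    probs = {
        k: comb(n_keyword, k) * comb(n_total - n_keyword, n_motif - k) / denominator
        for k in range(smallest, largest + 1)
    }
    p_observed = probs[a]
    p_enriched = sum(p for k, p in probs.items() if k >= a)
    p_depleted = sum(p for k, p in probs.items() if k <= a)
    # Small tolerance so that ties with the observed table are included.
    p_two_sided = sum(p for p in probs.values() if p <= p_observed * (1 + 1e-7))
    return min(1.0, p_enriched), min(1.0, p_depleted), min(1.0, p_two_sided)


# Placeholder counts for illustration only; substitute your own SIRW numbers.
table = contingency_table(total=1000, total_motif=100, keyword=50, keyword_motif=12)
p_enriched, p_depleted, p_two_sided = fisher_exact(*table)
print("table (a, b, c, d):", table)
print("p (enriched):", p_enriched)
print("p (depleted):", p_depleted)
print("p (two-sided):", p_two_sided)
```

A small `p_enriched` suggests the motif occurs in the keyword class more often than expected by chance, while a small `p_depleted` suggests the class disfavours it.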

