Wednesday, November 28, 2007

Biclusters and support threshold.

Each relation was biclustered at several different support values.
Support thresholds chosen for the relations:
1) Gene - Metabolite = 1
In this relation, we excluded the following metabolites, because they occur in too many reactions:

GDP, Hydrogen peroxide, 2-Oxoglutarate, Ammonium, Acetyl-CoA, L-Glutamate, O2, Nicotinamide adenine dinucleotide - reduced, AMP, Nicotinamide adenine dinucleotide, CO2, Coenzyme A, Nicotinamide adenine dinucleotide phosphate - reduced, Nicotinamide adenine dinucleotide phosphate, Diphosphate, Phosphate, ADP, ATP, H2O, H+

Because we removed these metabolites, the total number of genes in the relation drops from 748 to 627.
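Here is a minimal sketch of why dropping metabolites also drops genes: a gene whose only metabolites are on the excluded list no longer appears in the relation at all. The excluded set is the list above; the gene names, the extra metabolites, and the helper names `filter_relation` and `genes_in` are made up for illustration, and this is not the actual preprocessing script.

```python
# Metabolites excluded from the Gene - Metabolite relation (list from the post).
EXCLUDED_METABOLITES = {
    "GDP", "Hydrogen peroxide", "2-Oxoglutarate", "Ammonium", "Acetyl-CoA",
    "L-Glutamate", "O2", "Nicotinamide adenine dinucleotide - reduced", "AMP",
    "Nicotinamide adenine dinucleotide", "CO2", "Coenzyme A",
    "Nicotinamide adenine dinucleotide phosphate - reduced",
    "Nicotinamide adenine dinucleotide phosphate", "Diphosphate", "Phosphate",
    "ADP", "ATP", "H2O", "H+",
}


def filter_relation(pairs, excluded=EXCLUDED_METABOLITES):
    """Drop every (gene, metabolite) pair whose metabolite is excluded."""
    return [(gene, met) for gene, met in pairs if met not in excluded]


def genes_in(pairs):
    """Set of genes that still have at least one pair in the relation."""
    return {gene for gene, _ in pairs}


# Hypothetical toy relation; geneB only touches excluded metabolites.
toy_pairs = [
    ("geneA", "ATP"), ("geneA", "L-Serine"),
    ("geneB", "ATP"), ("geneB", "H2O"),
    ("geneC", "Pyruvate"),
]

filtered = filter_relation(toy_pairs)
print(len(genes_in(toy_pairs)), "->", len(genes_in(filtered)))  # 3 -> 2
```

In the toy data geneB disappears because both of its metabolites are excluded, which is the same effect that takes the real relation from 748 genes to 627.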

2) GO-Bio - Gene = 1
3) GO-Cel - Gene = 1
4) GO-Mol - Gene = 1
5) Gene - Biochem pathway = 1

6) DNABinding
We have 14 different DNABinding relations. The support thresholds for these relations are:
Acid - 1
Alpha - 1
BUT14 - 1
BUT90 - 1
GAL - 1
H2O2 Hi - 1
H2O2 Lo - 1
HEAT - 1
Pi - 1
RAFF - 1
RAPA - 1
SM - 1
Thi - 1
YPD - 15

YPD was a huge relation. The number of biclusters at each support value:
support 1 = 61431
support 10 = 27156
support 11 = 22225
support 12 = 18081
support 13 = 14594
support 14 = 11752
support 15 = 9510
support 16 = 7752
support 17 = 6413
support 20 = 3874
support 25 = 1950
support 30 = 1094
support 35 = 670

Given the number of biclusters they produce, support 1 and support 10 were eliminated. For the rest, I plotted the size of the biclusters (rows x columns) against their support.

Since about 10,000 biclusters is what the CDM pipeline can handle, the choice came down to support 14, 15 and 16. I plotted the following graphs to get an idea of how good the bicluster sizes are:

1) Number of genes x Number of TFs
2) Number of genes x Number of biclusters with Y number of genes
3) Number of TFs x Number of biclusters with Y number of TFs
4) Number of genes x Number of TFs x Number of biclusters with X genes and Y TFs
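The data behind these four plots can be tabulated with a few counters. This is only a sketch: it assumes each bicluster is available as a (set of genes, set of TFs) pair, and the toy biclusters and the function name `size_histograms` are made up for illustration.

```python
from collections import Counter

# Hypothetical toy biclusters: (genes, TFs) for a DNABinding relation.
toy_biclusters = [
    ({"g1", "g2", "g3"}, {"tf1", "tf2"}),
    ({"g1", "g2"}, {"tf1", "tf2", "tf3"}),
    ({"g4", "g5", "g6"}, {"tf2", "tf4"}),
]


def size_histograms(biclusters):
    """Return the data series for the four plots listed above."""
    # Plot 1: one (number of genes, number of TFs) point per bicluster.
    shapes = [(len(genes), len(tfs)) for genes, tfs in biclusters]
    # Plot 2: how many biclusters have a given number of genes.
    by_genes = Counter(n_genes for n_genes, _ in shapes)
    # Plot 3: how many biclusters have a given number of TFs.
    by_tfs = Counter(n_tfs for _, n_tfs in shapes)
    # Plot 4: how many biclusters have exactly X genes and Y TFs.
    by_shape = Counter(shapes)
    return shapes, by_genes, by_tfs, by_shape


shapes, by_genes, by_tfs, by_shape = size_histograms(toy_biclusters)
print(shapes)
print(dict(by_genes), dict(by_tfs), dict(by_shape))
```

Any plotting tool can take these counters directly; the counting is kept separate so the same numbers can be compared across support values.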

I chose 15 as the support threshold for YPD because at this threshold the number of biclusters is feasible to handle and the number of thick biclusters (those with many rows and columns) is still considerably high.
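The count-based half of this decision can be reproduced from the YPD numbers listed earlier. The sketch below takes the cap of 10,000 from the pipeline limit mentioned in the post; the function names are made up, and the bicluster-size half of the decision (the plots) is not captured here.

```python
# YPD bicluster counts per support value, copied from the list in this post.
BICLUSTER_COUNTS = {
    1: 61431, 10: 27156, 11: 22225, 12: 18081, 13: 14594, 14: 11752,
    15: 9510, 16: 7752, 17: 6413, 20: 3874, 25: 1950, 30: 1094, 35: 670,
}
PIPELINE_CAP = 10_000  # roughly what the CDM pipeline can handle


def closest_to_cap(counts, cap, k=3):
    """The k support values whose bicluster counts are nearest the cap."""
    nearest = sorted(counts, key=lambda s: abs(counts[s] - cap))[:k]
    return sorted(nearest)


def lowest_support_under_cap(counts, cap):
    """Smallest support value whose bicluster count fits under the cap."""
    feasible = [s for s, n in counts.items() if n <= cap]
    return min(feasible) if feasible else None


print(closest_to_cap(BICLUSTER_COUNTS, PIPELINE_CAP))            # [14, 15, 16]
print(lowest_support_under_cap(BICLUSTER_COUNTS, PIPELINE_CAP))  # 15
```

Taking the lowest support that fits under the cap keeps as many biclusters as the pipeline allows, which agrees with the choice of 15 here.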
