Statistics of the Database


Contents of Statistics
Table 1. Statistics of the Database

Table 2A. Statistics of the cDNA

Table 2B. Statistics of the Solexa datasets

Figure 1. Identification of the Putative Alternative Promoters (PAPs) in Human Genes

Table 3. Statistics of the Putative Alternative Promoters (PAPs)

Figure 2. Schematic representation of the data flow

Figure 3. Classification of the TSSs using different cut-offs and criteria

Figure 4. Size of the one TSS cluster

Library List: H.sapiens

Library List: M. musculus

Statistics of the Database
Table 1. Statistics of the Database

            #covered locus   #promoter   #total TSSs   #allocated clones

human               15,262      30,964       425,117           1,359,000
mouse               14,162      19,023       149,876             364,487
zebrafish            3,061       3,382        15,198              32,263
malaria              1,527        N.A.         6,908              10,236
schyzon              3,635        N.A.        14,029              22,923

Statistics of the cDNA
Table 2A. Statistics of the cDNA

                               human     mouse   zebrafish   malaria   schyzon

Total cDNAs                1,780,295   580,209     159,981    12,484    30,766
Mapped cDNAs               1,548,357   457,007      98,164     9,498    27,947
Possible truncated cDNAs     181,237    28,081      12,198     2,691        76
RefSeq hit cDNAs           1,359,000   364,487      32,263    10,236    22,923
*Covered RefSeq               14,628    13,704       3,061     1,218     3,627
Promoters                     30,964    19,023       3,382     1,734     4,067

*RefSeq total 17,217 (Human), 16,138 (Mouse), 4,234 (Zebrafish), 5,409 (Malaria), 4,539 (schyzon)
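
The footnote gives the denominators needed to turn the "Covered RefSeq" row into a coverage fraction per species. The following is a minimal Python sketch of that arithmetic; the dictionary and function names are illustrative and are not part of the database.

```python
# Values copied from Table 2A ("Covered RefSeq" row) and its footnote.
covered_refseq = {
    "human": 14628,
    "mouse": 13704,
    "zebrafish": 3061,
    "malaria": 1218,
    "schyzon": 3627,
}
total_refseq = {
    "human": 17217,
    "mouse": 16138,
    "zebrafish": 4234,
    "malaria": 5409,
    "schyzon": 4539,
}


def coverage_percent(covered, total):
    """Return the percentage of RefSeq entries covered, per species."""
    return {species: 100.0 * covered[species] / total[species] for species in covered}


coverage = coverage_percent(covered_refseq, total_refseq)
for species, pct in coverage.items():
    print(f"{species}: {pct:.1f}% of RefSeq covered")
```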
Statistics of the Solexa datasets
Table 2B. Statistics of the Solexa datasets
Statistics of the human datasets produced by the Solexa sequencer (upper two rows) and by the original Sanger sequencer (bottom row). For the Solexa datasets, only TSSs supported by five or more sequences were counted.

                      total #mapped   #sequences associated   #represented   total #putative
                          sequences                with NMs            NMs         promoters

MCF7(Solexa)             11,919,330              10,000,349         12,133            29,210
HEK293(Solexa)           10,062,560               8,633,345         11,598            41,238
cDNA(Sanger, total)       1,540,411               1,370,985         15,194            32,122

Identification of the Putative Alternative Promoters (PAPs) in Human Genes


Figure 1. Identification of the Putative Alternative Promoters (PAPs) in Human Genes
Schematic representation of the mapping of the 5'-ends of the oligo-cap cDNAs, the determination of the TSSs, and the clustering of the TSSs to identify the PAPs. The boxes and lines represent exons and introns, respectively. The RefSeq sequences are colored red and the oligo-cap cDNAs blue. The lowest (gray) oligo-cap cDNA is excluded from the dataset because its 5'-end is located within an internal exon of the RefSeq. The third-lowest oligo-cap cDNA is accepted because explaining its presence otherwise would require hypothesizing that it is a truncated form of the erroneously spliced second-lowest transcript, and the chance of such a combination of events should be low. The shaded boxes represent the retained introns. Altogether, this case consists of 8 "full-length" oligo-cap cDNAs that are mapped at 6 TSSs, clustered into 3 PAPs.
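
The clustering step in this caption can be illustrated with a small Python sketch. It collapses cDNA 5'-end positions into distinct TSSs and then groups neighbouring TSSs into clusters. The gap-based rule, the 500 bp interval, the function name, and the coordinates are all assumptions made for illustration; this page does not state the actual interval or procedure used. The toy coordinates were chosen only to reproduce the counts in the caption (8 cDNAs, 6 TSSs, 3 PAPs).

```python
from collections import Counter


def cluster_tss(five_prime_ends, max_gap):
    """Collapse 5'-end positions into TSSs, then group TSSs into clusters.

    A new cluster starts whenever the distance to the previous TSS exceeds
    max_gap. This gap rule is an illustrative assumption, not the
    database's documented method.
    """
    tss_counts = Counter(five_prime_ends)  # TSS position -> number of cDNAs
    clusters = []
    for position in sorted(tss_counts):
        if clusters and position - clusters[-1][-1] <= max_gap:
            clusters[-1].append(position)
        else:
            clusters.append([position])
    return tss_counts, clusters


# Made-up 5'-end coordinates for 8 "full-length" oligo-cap cDNAs.
ends = [100, 100, 130, 160, 1200, 1200, 1250, 3000]
tss_counts, clusters = cluster_tss(ends, max_gap=500)  # 500 bp is an assumed interval

print(len(ends), "cDNAs,", len(tss_counts), "TSSs,", len(clusters), "clusters")
```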
Statistics of the Putative Alternative Promoters (PAPs)
Table 3. Statistics of the Putative Alternative Promoters (PAPs)

#PAPs          #Locus           #included TSS positions   #cDNA clones (ave.)

1 (PAP-less)    6954 (48%)                        70175                    43
2               3724 (26%)                        67846                    83
3               1821 (12%)                        44455                   115
4               1003 (6.9%)                       32582                   160
5                490 (3.3%)                       19962                   166
6                294 (2.0%)                       13937                   159
7                147 (1.0%)                        7948                   184
8                 85 (0.6%)                        4912                   194
9                 42 (0.3%)                        2167                   163
10                25 (0.2%)                        1650                   164
>10               43 (0.3%)                        4140                   341

total          14628 (100%)                      269774                    80
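
The total row of Table 3 can be cross-checked against the per-row values. The Python sketch below reads each row as (number of loci, included TSS positions, average cDNA clones per locus); that column reading and the variable names are assumptions made for illustration. Because the per-row averages are rounded, the locus-weighted mean is only expected to approximate the reported overall average.

```python
# (label, loci, included TSS positions, average cDNA clones) per row of Table 3.
rows = [
    ("1", 6954, 70175, 43),
    ("2", 3724, 67846, 83),
    ("3", 1821, 44455, 115),
    ("4", 1003, 32582, 160),
    ("5", 490, 19962, 166),
    ("6", 294, 13937, 159),
    ("7", 147, 7948, 184),
    ("8", 85, 4912, 194),
    ("9", 42, 2167, 163),
    ("10", 25, 1650, 164),
    (">10", 43, 4140, 341),
]

total_loci = sum(loci for _, loci, _, _ in rows)
total_tss = sum(tss for _, _, tss, _ in rows)
# Locus-weighted mean of the per-row average clone counts.
weighted_avg_clones = sum(loci * avg for _, loci, _, avg in rows) / total_loci

print("total loci:", total_loci)
print("total TSS positions:", total_tss)
print(f"locus-weighted average clones: {weighted_avg_clones:.1f}")
```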
Schematic representation of the data flow

Figure 2. Schematic representation of the data flow
The boxes that are marked with asterisks (A~G*) correspond to the respective forms illustrated in Figure XXX.
Classification of the TSSs using different cut-offs and criteria

Figure 3. Classification of the TSSs using different cut-offs and criteria
The distribution of the putative alternative promoters (PAPs) was calculated using different intervals. The vertical axis shows the number of genes containing alternative promoters (APs). The horizontal axis shows the intervals separating the PAPs.
Size of the one TSS cluster

Figure 4. Size of the one TSS cluster
The horizontal axis shows the size of the TSS clusters (the distance between the two TSSs farthest apart within a cluster). The vertical axis shows the number of clusters.