Statistics of the Database
May, 2003 (updated for DBTSS ver.3)


Contents of Statistics
Table 1. Statistics of the Database

Table 2. Statistics of Full-length cDNA Sequences Used for the Database Construction

Figure 1. Comparison between Ref-fulls and RefSeqs

Figure 2. Distribution of the Numbers of Mapped Full-length cDNAs

Figure 3. Distribution of the Sequence Conservation between Human and Mouse Promoter Pairs

Figure 4. Schematic representation of the data flow

Library List: H.sapiens

Library List: M.musculus

Statistics of the Database
Table 1. Statistics of the Database

                                                Human           Mouse

RefSeq and Ref-full
   Ref-full (promoter retrieval successful)     8,793 (48%)     6,875 (53%)
   Ref-full (total)                             9,270 (51%)     7,524 (58%)
   Ref-full that extended RefSeq                6,042 (33%)     5,018 (38%)
   Ref-full that did not extend RefSeq          3,228 (18%)     2,506 (19%)
   RefSeq that are not covered by Ref-full      8,944 (49%)     5,557 (42%)
   RefSeq (total)                              18,214 (100%)   13,081 (100%)

One-pass sequences and genome mapping
   Hit to RefSeq (genome mapping successful)  190,964 (48%)   195,446 (34%)
   Hit to RefSeq (genome mapping ambiguous)    36,267 (9%)     36,624 (6%)
   No hit to RefSeq                           172,994 (43%)   348,139 (60%)
   One-pass (total)                           400,225 (100%)  580,209 (100%)

Statistics on the number of promoters, the redundancy of the supporting full-length cDNAs, and the differences from the public data are shown.
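Each percentage in Table 1 is the count's share of its column total (the RefSeq total in the first block, the one-pass total in the second). As a minimal check, the human RefSeq/Ref-full figures can be recomputed as sketched below. The counts are copied from the table; the variable and function names are illustrative only and are not part of DBTSS.

```python
# Counts copied from Table 1 (human column); the percentages are recomputed here.
human_refseq_total = 18214
human_counts = {
    "Ref-full (promoter retrieval successful)": 8793,
    "Ref-full (total)": 9270,
    "Ref-full that extended RefSeq": 6042,
    "Ref-full that did not extend RefSeq": 3228,
    "RefSeq that are not covered by Ref-full": 8944,
}


def percent_of_total(count, total):
    """Return count as a whole-number percentage of total."""
    return round(100 * count / total)


# Hypothetical helper output: label -> rounded percentage of the RefSeq total.
human_percentages = {
    label: percent_of_total(count, human_refseq_total)
    for label, count in human_counts.items()
}

for label, pct in human_percentages.items():
    print(f"{label}: {human_counts[label]:,} ({pct}%)")
```

The same arithmetic also confirms the internal consistency of the table: the two Ref-full subgroups (extended and not extended) sum to the Ref-full total, and the Ref-full total plus the uncovered RefSeqs gives the RefSeq total.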
Statistics of Full-length cDNA Sequences Used for the Database Construction
Table 2. Statistics of Full-length cDNA Sequences Used for the Database Construction

                      Number of registered    Average length difference    Average length difference
                      genes (average          against RefSeq               against RefSeq
                      redundancy)             (mRNA level)                 (genomic level)

   Human              8,793 (21.7)            71.6                         4,396
   Mouse              6,875 (28.4)            76.0                         4,027
   Human/mouse pair   3,324 (25.2/38.0)       63.3/68.8                    3,998/3,380

Statistics of the full-length cDNAs used for the database construction are shown.
Comparison between Ref-fulls and RefSeqs



Figure 1. Comparison between Ref-fulls and RefSeqs
The distributions of the differences between Ref-fulls and RefSeqs are shown, compared at the mRNA level (A) and at the genomic level (B). Black and gray bars represent human and mouse genes, respectively.
Distribution of the Numbers of Mapped Full-length cDNAs
Figure 2. Distribution of the Numbers of Mapped Full-length cDNAs
The distribution of the numbers of mapped TSSs is shown for human and mouse genes by black and gray bars, respectively.
Distribution of the Sequence Conservation between Human and Mouse Promoter Pairs
Figure 3. Distribution of the Sequence Conservation between Human and Mouse Promoter Pairs
Sequence alignments were performed with LALIGN using the default parameters. The sequence identity was evaluated as the number of aligned nucleotides in the region from -1000 to +200 (TSS: 0).
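The identity measure described above can be illustrated with a short sketch. It assumes that a pairwise alignment of the two promoter regions is already available as two gapped strings of equal length (for example, parsed from alignment output), and it reads "aligned nucleotides" as columns where both sequences carry the same base. Both points are assumptions, as is the normalization by region length, which the text does not specify. All function names and the toy sequences are hypothetical.

```python
# Promoter region boundaries relative to the TSS (TSS = 0), taken from the text.
REGION_START = -1000
REGION_END = 200


def count_identical_columns(aligned_a, aligned_b, gap="-"):
    """Count alignment columns where both sequences carry the same nucleotide."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != gap and a == b
    )


def percent_identity(aligned_a, aligned_b, region_length=REGION_END - REGION_START):
    """Identical columns as a percentage of region_length.

    ASSUMPTION: the denominator is the promoter region length (1,200 here);
    the source does not state how the count was normalized.
    """
    return 100.0 * count_identical_columns(aligned_a, aligned_b) / region_length


# Toy alignment (hypothetical sequences, not DBTSS data).
human = "ACGT-A"
mouse = "ACCTTA"
identical = count_identical_columns(human, mouse)
print(identical, percent_identity(human, mouse, region_length=len(human)))
```

Because a local alignment may cover only part of the region, dividing by the full region length (rather than by the alignment length) keeps values comparable across promoter pairs; whether the original analysis did this is not stated.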
Schematic representation of the data flow

Figure 4. Schematic representation of the data flow
The boxes that are marked with asterisks (A~G*) correspond to the respective forms illustrated in Figure 5.