1-5. Database Production

I) Similarity test of cDNA and genomic sequences
Each sequence obtained by random sequencing of the cDNA libraries constructed by the oligo-capping method was first processed to trim vector sequence and low-quality regions. The trimmed sequences were then compared with human reference sequences (RefSeq) using the BLAST program. If a homologue was found with more than 95% identity and an e-value below 10^-100, the sequence was regarded as identical to that RefSeq sequence. To obtain precise TSS information, sequences with multiple homologues in RefSeq were discarded. We also removed sequences that could not be mapped to the human genome working draft sequence (Golden Path) database. The remaining sequences were mapped to the human genome sequence using the sim4 program, and the mapped sequences were classified by their corresponding RefSeq genes.
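The BLAST filtering step described above can be sketched as follows. This is a minimal illustration, not the pipeline actually used: it assumes the BLAST hits have already been parsed into tuples of (query ID, RefSeq ID, percent identity, e-value), and the function name `assign_refseq` and the example IDs are hypothetical.

```python
from collections import defaultdict

MIN_IDENTITY = 95.0   # percent identity threshold stated in the text
MAX_EVALUE = 1e-100   # e-value threshold stated in the text


def assign_refseq(hits):
    """Map each cDNA sequence to its single qualifying RefSeq homologue.

    hits: iterable of (query_id, refseq_id, percent_identity, evalue).
    Sequences with no qualifying hit, or with qualifying hits to more
    than one RefSeq entry, are left out of the result.
    """
    matches = defaultdict(set)
    for query_id, refseq_id, identity, evalue in hits:
        if identity > MIN_IDENTITY and evalue < MAX_EVALUE:
            matches[query_id].add(refseq_id)
    # Keep only sequences with exactly one RefSeq homologue.
    return {q: next(iter(ids)) for q, ids in matches.items() if len(ids) == 1}


# Hypothetical example hits (IDs are placeholders, not real records).
example_hits = [
    ("clone_a", "NM_A", 99.0, 0.0),     # passes both thresholds
    ("clone_b", "NM_B", 99.0, 1e-120),  # passes, but clone_b also hits NM_C
    ("clone_b", "NM_C", 98.0, 1e-110),
    ("clone_c", "NM_D", 90.0, 0.0),     # identity too low
    ("clone_d", "NM_E", 99.0, 1e-50),   # e-value too high
]
assigned = assign_refseq(example_hits)
```

Here only `clone_a` is assigned a RefSeq entry: `clone_b` is dropped because it has two homologues, and the other two fail a threshold.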


II) Representative cDNA: Ref-full
We constructed representative cDNA sequences, called 'Ref-full' sequences. Comparison between the RefSeq sequences and our mapped clone data indicated that almost half of the RefSeq sequences could be extended towards the 5'-end. For example, while the RefSeq entry NM_005718 starts at position 11,780,511 of chromosome 3, our clone HRC00655 starts at 11,775,385. We therefore added 175 bases of the 5'-end region of HRC00655 to NM_005718, defining the 'Ref-full' sequence of NM_005718. The Ref-full sequences are available via FTP.
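The 5'-end extension can be sketched as below. In the NM_005718 example the two start positions are 5,126 bp apart on the genome while only 175 bases were added; one possible explanation, which is an assumption here and not stated above, is that the extra clone sequence is interrupted by an intron. The sketch therefore counts only clone bases that fall inside aligned blocks upstream of the RefSeq start. It assumes a plus-strand gene and 1-based inclusive block coordinates such as those an aligner like sim4 might report. The function names and toy coordinates are hypothetical.

```python
def upstream_bases(clone_blocks, refseq_start):
    """Count clone bases aligned upstream of the RefSeq start (plus strand).

    clone_blocks: list of (genome_start, genome_end) aligned blocks,
    1-based inclusive, in transcript order.
    """
    total = 0
    for start, end in clone_blocks:
        if start >= refseq_start:
            continue  # block lies entirely at or after the RefSeq start
        # Count only the part of the block before the RefSeq start.
        total += min(end, refseq_start - 1) - start + 1
    return total


def make_ref_full(clone_seq, refseq_seq, n_upstream):
    """Prepend the first n_upstream bases of the clone to the RefSeq sequence."""
    return clone_seq[:n_upstream] + refseq_seq


# Toy example: a 20-base upstream block, an intron, then a block that
# begins exactly at the RefSeq start.
toy_blocks = [(100, 119), (500, 560)]
n_extra = upstream_bases(toy_blocks, refseq_start=500)
ref_full = make_ref_full("A" * 20 + "CCC", "CCCGGG", n_extra)
```

In this toy case `n_extra` is 20 although the genomic start positions differ by 400, which mirrors how a large genomic offset can correspond to a much shorter added sequence.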