Background



1. Human Genome Project



	With the completion of the sequencing of chromosomes 21 (Hattori et al) and 22 (Dunham et al) and of the draft human genome (Nature 2001, Science 2001), the human genome sequencing project has reached its final stage. The collected sequence data are now being assembled and posted on the GoldenPath (http://genome.ucsc.edu/) and NCBI (http://www.ncbi.nlm.nih.gov/genome/seq/) web sites. Although work remains to correct errors in the current data with respect to sequencing, assembly and the order of contigs, we now have an overview of the entire structure of the human genome. The finishing of the other chromosomes is also expected within 2-3 years (Collins et al). That will mark a transition from the era of structural genomics to that of functional genomics. Computational approaches for finding genes in large volumes of compiled genomic sequence have also advanced remarkably. However, it is still difficult to determine the position and the precise structure of a gene by computational methods alone. Identification of genes and deciphering of their functions will continue for a long time....

2. Human cDNA Project



	A cDNA is a faithful copy of an mRNA and therefore records which part of the human genome is transcribed and how the transcript is spliced. Because it contains a continuous CDS (not interrupted by introns), a cDNA can serve as an ideal template for producing recombinant protein. cDNA cloning is therefore indispensable for the experimental analysis of gene function. In parallel with human genomic sequencing, cDNA sequencing projects have also been carried out intensively, and millions of cDNA sequences (EST sequences) are registered in cDNA databases. Recently, the RefSeq Project (RefSeq) has reorganized the fragmented cDNA sequences to generate a non-redundant "reference cDNA set" in which each entry contains at least the entire CDS. Although the set covers about 13,000 human genes, most of its entries represent only incomplete cDNA sequences, usually lacking exact 5'-end information. This is because most cDNA clones are truncated when they are isolated from cDNA libraries constructed by conventional methods.
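The idea that a cDNA carries a continuous sequence with the introns removed can be illustrated with a small sketch. The sequence, the exon coordinates and the function name `spliced_sequence` are all made up for illustration; the sketch assumes a forward-strand gene and 0-based, half-open exon intervals, and it is not taken from any project pipeline.

```python
def spliced_sequence(genomic, exons):
    """Join exon intervals into one continuous sequence.

    genomic: genomic DNA string (forward strand assumed).
    exons:   list of (start, end) pairs, 0-based and half-open (assumption).
    """
    return "".join(genomic[start:end] for start, end in sorted(exons))


# Toy data: uppercase letters mark exons, lowercase letters mark introns.
genomic = "ATGGCCgtaagtttcagTTTGACgtgagtctagAAATAA"
exons = [(0, 6), (17, 23), (33, 39)]  # hypothetical exon coordinates

cdna = spliced_sequence(genomic, exons)
print(cdna)
```

In this toy case the joined sequence contains only the uppercase exon bases, so the reading frame runs uninterrupted from the start codon to the stop codon, which is what makes a cDNA usable as an expression template.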

3. Human Full-length cDNA Project



	Exact 5'-end information from full-length cDNAs is indispensable for (1) precise determination of the transcriptional start sites (TSS) and (2) identification of the promoter region, which is usually located adjacent to the TSS. To collect full-length cDNAs efficiently, we had to overcome a serious drawback of conventional cDNA libraries: their low content of full-length cDNAs. We have developed the oligo-capping method and a procedure for constructing a full-length enriched cDNA library (for more details, see 1. Construction of a full-length cDNA library). Using full-length enriched cDNA libraries constructed with this technology, we have been conducting a large-scale collection and sequencing of full-length cDNAs. Last year, about 160,000 clones were isolated from more than 50 full-length enriched human cDNA libraries, and one-pass sequencing was performed from the 5'-ends. The libraries used for the analysis are shown in List of full-length cDNA libraries. These sequence data were useful not only for characterization of the genes but also for analysis of the promoter regions (for more details, see 2. Significance of full-length cDNA data).
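As a rough illustration of why an exact TSS matters for promoter analysis, the sketch below derives a candidate promoter window from a TSS position and strand. The function name `promoter_window` and the window sizes (1,000 bp upstream, 100 bp downstream) are arbitrary assumptions chosen for the example, not values used by the project; coordinates are assumed to be 1-based and inclusive.

```python
def promoter_window(tss, strand, upstream=1000, downstream=100):
    """Return a (start, end) genomic interval around a TSS.

    tss:        1-based genomic position of the transcriptional start site.
    strand:     '+' or '-'.
    upstream / downstream: window sizes in bp (illustrative assumptions).
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        # On the minus strand, "upstream" lies at higher coordinates.
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Clamp to the first base of the chromosome.
    return max(1, start), end


plus_window = promoter_window(5000, "+")
minus_window = promoter_window(5000, "-")
print(plus_window, minus_window)
```

Because the window is anchored on the TSS, a cDNA truncated at its 5' end would shift the whole interval downstream by the length of the missing sequence, which is why full-length clones are needed for this kind of analysis.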
