An article on the draft sequence of the human genome was published in Nature on February 15 (vol. 409, pp. 860-921) by the International Human Genome Sequencing Consortium. The Consortium consists of public research institutions in the USA, the United Kingdom, Japan, France, Germany, and China. From Japan, the Human Genome Research Group at the RIKEN Genomic Sciences Center (Project Director: Sakaki Yoshiyuki) and the research group of Professor Shimizu Nobuyoshi at the Department of Molecular Biology, Keio University School of Medicine, participate in the Consortium. GenBank/NCBI, EMBL/EBI, and DDBJ/CIB, which construct and maintain the nucleotide sequence database through international collaboration, are also mentioned at the end of the paper.
The completion of the human genome draft sequence was announced at the White House last June, and was followed by improvements in sequence quality and by biological analyses. These efforts resulted in the Nature paper and in the Science paper (vol. 291, pp. 1304-1351) by Celera Genomics. As noted at the beginning of the Nature paper, 100 years after the rediscovery of Mendel’s laws of heredity, we human beings have reached the fundamental level of our own genetic information, beyond which no further detail can be described. This achievement is also of great significance as the starting point of biology in the 21st century.
The International Human Genome Sequencing Consortium held simultaneous press conferences worldwide on February 12, prior to publication of the Nature paper. In Japan, the press conference was attended by Director Wada Akiyoshi, Project Director Sakaki Yoshiyuki, and Team Leader Fujiyama Asao of the RIKEN Genomic Sciences Center; Professor Shimizu Nobuyoshi and Associate Professor Minoshima Nobuo of Keio University School of Medicine; and Professor Sugawara Hideaki of the Center for Information Biology, National Institute of Genetics. Professor Sugawara represented the DNA Data Bank of Japan (DDBJ), and Dr. Fujiyama is also affiliated with the Department of Human Genetics, National Institute of Genetics.
The following results of the genomic analyses (from the Nature paper) stimulate our intellectual curiosity about topics such as the evolution of our own species and the variety of human biological phenomena.
- The genomic landscape shows marked variation in the distribution of a number of features, including genes, transposable elements, GC content, CpG islands and recombination rate. This gives us important clues about function. For example, the developmentally important HOX gene clusters are the most repeat-poor regions of the human genome, probably reflecting the very complex coordinate regulation of the genes in the clusters.
- There appear to be about 30,000 to 40,000 protein-coding genes in the human genome, only about twice as many as in worm or fruit fly. However, the genes are more complex, with more alternative splicing generating a larger number of protein products.
- The full set of proteins (the 'proteome') encoded by the human genome is more complex than those of invertebrates. This is due in part to the presence of vertebrate-specific protein domains and motifs (an estimated 7% of the total), but more to the fact that vertebrates appear to have arranged pre-existing components into a richer collection of domain architectures.
- Hundreds of human genes appear likely to have resulted from horizontal transfer from bacteria at some point in the vertebrate lineage.
- Although about half of the human genome derives from transposable elements, there has been a marked decline in the overall activity of such elements in the hominid lineage. DNA transposons appear to have become completely inactive and long-terminal repeat (LTR) retroposons may also have done so.
- The mutation rate is about twice as high in male as in female meiosis.
- Recombination rates tend to be much higher in distal regions of chromosomes and on shorter chromosome arms in general, in a pattern that promotes the occurrence of at least one crossover per chromosome arm in each meiosis.
- More than 1.4 million single nucleotide polymorphisms (SNPs) in the human genome have been identified.
All nucleotide sequence data determined by the International Human Genome Sequencing Consortium are openly available from the DDBJ/EMBL/GenBank International Nucleotide Sequence Database. At DDBJ, these sequence entries are placed in either the HTG or the HUM division of the DDBJ database. Human sequence data organized by chromosome can be retrieved from the download site of the DDBJ/CIB Human Genomics Studio.
The nucleotide sequence data released by the International Human Genome Sequencing Consortium and those determined by Celera Genomics differ. For example, the International Consortium covered the human genome with a small number of long contigs, whereas Celera covered it with many short contigs. The results of the sequence analyses also differ, probably because of differences in the source material (individual variation), in the sequencing methods, and in the methods of data analysis.
However, the data produced by Celera Genomics are not released through public databases but from the company's server, with restrictions on use. Many protests, including one by the Science Council of Japan, were made against this situation, because this practice may undermine the reproducibility that scientific papers must guarantee, and because it will lead to fragmentation of the databases that are important for biological studies.
Although such political issues remain, obtaining the complete nucleotide sequence of the human genome within the next 2-3 years will be a great achievement not only for biology but for modern civilization. We at the DNA Data Bank of Japan (DDBJ) have an important responsibility for database construction in this international enterprise, and we will expand our efforts. We ask for your continued cooperation.