The NIG International Symposium 2017, Commemorating the 30th Anniversary of DDBJ, was held at the Mishima Citizens Cultural Hall in Shizuoka, Japan. Oral sessions were held on the third day of the symposium (29 May).
In this talk, Goto Susumu gives a presentation entitled "From database to database integration" (32:16). All presentations are listed in the YouTube playlist.
In the big data era of life science, it is crucial for researchers to have access to up-to-date, easy-to-use databases. Over 1,500 databases are catalogued on the web site of the online Nucleic Acids Research (NAR) database issue, yet it is still not easy for a researcher to find an appropriate database in terms of both freshness and usefulness. While developing and maintaining each useful database is of course very important, integrating the databases is also indispensable for understanding and interpreting experimental results more easily. Databases can be classified by several criteria, such as the data types and source information used in the category list on the NAR web site. Another classification distinguishes archival data repositories such as DDBJ from knowledgebases such as KEGG. KEGG can also be considered an integrated database: it curates information from several data sources using its own ontology, which provides unique biological content in an integrated and uniform way. It also serves as a backend data source for several bioinformatics analysis tools, including functional annotation systems for omics data. To integrate databases further for more useful applications, technologies that support the interoperability of databases distributed across the Internet will be necessary. The Semantic Web is one such technology, exploited by the National Bioscience Database Center (NBDC), JST, and the Database Center for Life Science (DBCLS), ROIS. Most data are represented in Resource Description Framework (RDF) format with properly designed ontologies, following the FAIR principles of making data findable, accessible, interoperable and reusable. DBCLS has been developing technologies to integrate databases using RDF and, more broadly, the linked open data concept, and has been harnessing the community to provide, integrate and utilize biological data in an integrated way. I have recently moved to DBCLS from the KEGG group, so I would like to talk about the two approaches and discuss their differences.