The International Nucleotide Sequence Database Collaboration (INSDC), consisting of DDBJ, EBI and NCBI, holds an international collaborators meeting every year.
In 2010, the meeting was held at EBI in the UK on 19-21 May to discuss practical matters in maintaining and updating the nucleotide sequence data archives: DDBJ, EMBL-Bank, GenBank, the Sequence Read Archive and the Trace Archive. As a result of travel disruptions caused by the Icelandic volcano, the meeting was shorter than expected. Despite these difficulties, we believe that we made significant progress at the meeting. The outcomes of the meeting are summarized below.
Sampling information for genome scale data
At the request of the Genomic Standards Consortium (GSC), INSDC has been discussing since 2005 how to include sampling information for genome scale data in sequence data records, in compliance with Minimum Information about a (Meta)Genome Sequence (MIGS/MIMS) and Minimum Information about an Environmental Sequence (MIENS).
Since 2009, DDBJ, EMBL-Bank and GenBank have been using structured COMMENT/CC lines to describe this kind of extended metadata. However, holding extensible metadata in a separate reference database has some advantages: it can be maintained and updated independently, and it reduces redundancy of content. INSDC should therefore consider providing extensible metadata through a reference database.
See also Genomic Standards Consortium on Wikipedia
Minimal submission requirements for INSDC
INSDC will register its minimal submission requirements with Minimum Information for Biological and Biomedical Investigations (MIBBI). MIBBI is a project to synthesize reporting guidelines from various communities into a suite of orthogonal standards.
Prokaryotic Annotation Workshop
Researchers who participated in the Prokaryotic Annotation Workshop, hosted by NCBI, asked INSDC to make some modifications to the description rules for features and qualifiers. The requests came mainly from the J. Craig Venter Institute (JCVI).
INSDC mainly discussed how to cite references for annotated features, and a guideline for protein nomenclature for the values of /product qualifiers in CDS features.
Since 2005, INSDC has discussed assigning project IDs as a flag to identify many kinds of large scale sequencing projects, and the scheme has undergone considerable modification.
In 2010, the project ID schema will be substantially modified to cover many kinds of biological data other than nucleotide sequences, such as array and mass spectrometry data. The database has been renamed the BioProject database, which NCBI will make available in the near future.
All organism names that are represented in the sequence data of INSDC are registered in the taxonomy database.
Since 2009, the taxonomy database has considered terminating the assignment of strain level taxonomy IDs for microorganism genomes. However, since many institutes have already cited those strain level IDs, we will continue to add strain level taxids for prokaryotes for at least one more year.
In May 2010, the European Nucleotide Archive (ENA) was launched in the UK, consolidating three major sequence resources in Europe, the EMBL Nucleotide Sequence Database (EMBL-Bank), the Trace Archive and the Sequence Read Archive, to become Europe's primary access point to globally comprehensive nucleotide sequence information.
Since 2009, new collaborators have joined INSDC, so some INSDC documents about policies and activities should be updated.
A paper for SRA introduction
We will prepare a joint SRA paper with details about the data model.
Dealing with data from new sequencing platforms
The SRA schema will be updated to support new sequencing platforms;
Unless otherwise specified, the following items will take effect from October 2010 with the revision of the Feature Table Definition.
Examples
/experiment="COORDINATES: N-terminus verified by Edman degradation [PMID: 8096212]"
/inference="DESCRIPTION: similar to AA sequence: INSDC: AAF23014.2"
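The /experiment example above appears to follow a pattern of an optional category prefix, free text, and a bracketed PubMed ID. The following is a minimal Python sketch of how such a value might be split into its parts. The function name parse_experiment and the list of category prefixes are assumptions made for illustration only; they are not taken from the Feature Table Definition, which remains the authoritative description of the format.

```python
import re

# Category prefixes assumed for this sketch. COORDINATES and DESCRIPTION
# appear in the examples above; EXISTENCE is an additional assumption.
CATEGORIES = ("COORDINATES", "DESCRIPTION", "EXISTENCE")

CATEGORY_RE = re.compile(r"^(%s):\s*" % "|".join(CATEGORIES))
PMID_RE = re.compile(r"\s*\[PMID:\s*(\d+)\]")


def parse_experiment(value):
    """Split an /experiment qualifier value into category, text and PMIDs.

    Hypothetical helper: the category is None when no known prefix is found.
    """
    text = value.strip()
    category = None

    # Peel off the optional "CATEGORY:" prefix.
    match = CATEGORY_RE.match(text)
    if match:
        category = match.group(1)
        text = text[match.end():]

    # Collect every bracketed PubMed ID, then remove them from the free text.
    pmids = PMID_RE.findall(text)
    text = PMID_RE.sub("", text).strip()

    return {"category": category, "text": text, "pmids": pmids}


if __name__ == "__main__":
    example = ("COORDINATES: N-terminus verified by Edman degradation "
               "[PMID: 8096212]")
    print(parse_experiment(example))
```

Keeping the category optional means a value written without a prefix still parses, with the category reported as None.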