The International Nucleotide Sequence Database Collaboration (INSDC), which consists of DDBJ, EBI and NCBI, holds an international collaborators meeting every year.
In 2010, the meeting was held at EBI in the UK on 19-21 May to discuss practical matters in maintaining and updating the nucleotide sequence data archives: DDBJ, EMBL-Bank, GenBank, the Sequence Read Archive and the Trace Archive. Because of travel disruptions caused by the Icelandic volcanic eruption, the meeting was shorter than planned. Despite these difficulties, we believe we made significant progress. The outcomes of the meeting are summarized below.
Sampling information for genome scale data
At the request of the Genomic Standards Consortium (GSC), INSDC has, since 2005, discussed including sampling information for genome-scale data in sequence data records, in compliance with the Minimum Information about a (Meta)Genome Sequence (MIGS/MIMS) and the Minimum Information about an Environmental Sequence (MIENS).
Since 2009, DDBJ, EMBL-Bank and GenBank have used structured COMMENT/CC lines to describe this kind of extended metadata. However, a reference database for extensible metadata would have some advantages: the metadata could be maintained and updated independently, and redundancy of content would be reduced. INSDC should therefore consider providing extensible metadata through a reference database.
See also the Genomic Standards Consortium article on Wikipedia.
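To illustrate what "structured" means for these COMMENT lines, here is a minimal Python sketch that pulls key/value pairs out of a comment block. It assumes the layout seen in GenBank structured comments: a "##<name>-START##" line, then "key :: value" lines, then a "##<name>-END##" line. The function name and the MIGS field values below are illustrative only, not an INSDC tool or real record.

```python
import re

# Assumed structured-comment layout: "##<name>-START##", then
# "key :: value" lines, then "##<name>-END##".
START = re.compile(r"^##(?P<name>.+)-START##$")
END = re.compile(r"^##(?P<name>.+)-END##$")


def parse_structured_comments(lines):
    """Return {block_name: {key: value}} for every START/END block found."""
    blocks = {}
    current = None  # name of the block we are inside, or None
    for raw in lines:
        line = raw.strip()
        start = START.match(line)
        if start:
            current = start.group("name")
            blocks[current] = {}
            continue
        end = END.match(line)
        if end and end.group("name") == current:
            current = None
            continue
        if current is not None and "::" in line:
            key, value = line.split("::", 1)
            blocks[current][key.strip()] = value.strip()
    return blocks


# Hypothetical comment block with illustrative MIGS-style fields.
comment = """
##MIGS-Data-START##
investigation_type :: bacteria_archaea
project_name       :: Example genome project
##MIGS-Data-END##
""".splitlines()

parsed = parse_structured_comments(comment)
print(parsed)
```

Because every record repeats such blocks, the same parser also shows the redundancy that a separate reference database for metadata would avoid.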
Minimal submission requirements for INSDC
INSDC will register its minimal submission requirements with Minimum Information for Biological and Biomedical Investigations (MIBBI), a project that synthesizes reporting guidelines from various communities into a suite of orthogonal standards.
Prokaryotic Annotation Workshop
Researchers who participated in the Prokaryotic Annotation Workshop, hosted by NCBI, asked INSDC to modify some of the description rules for features and qualifiers. Most of the requests came from the J. Craig Venter Institute (JCVI).
INSDC mainly discussed how to cite references for annotated features, and a protein nomenclature guideline for the values of /product qualifiers in CDS features.
Since 2005, INSDC has discussed assigning project IDs as flags that identify many kinds of large-scale sequencing projects, and the scheme has been modified considerably along the way.
In 2010, the project ID schema will be substantially revised to cover many kinds of biological data other than nucleotide sequences, such as array and mass spectrometry data. The database has been renamed the BioProject database and will be made available by NCBI in the near future.
All organism names represented in INSDC sequence data are registered in the taxonomy database.
Since 2009, the taxonomy database has considered ending the assignment of strain-level taxonomy IDs to microbial genomes. However, because many institutes already cite these strain-level IDs, we will continue to add strain-level taxids for prokaryotes for at least one more year.
In May 2010, the European Nucleotide Archive (ENA) was launched in the UK. It consolidates three major European sequence resources (the EMBL Nucleotide Sequence Database (EMBL-Bank), the Trace Archive and the Sequence Read Archive) and is Europe's primary access point to globally comprehensive nucleotide sequence information.
Since 2009, new collaborators have joined INSDC, so some INSDC documents on policies and activities need to be updated.
A paper introducing the SRA
We will prepare a joint SRA paper describing the data model in detail.
Dealing with data from new sequencing platforms
The SRA schema will be updated to support new sequencing platforms.
Unless otherwise specified, the following changes will take effect in October 2010 with the revision of the Feature Table Definition.
For data submitted to DDBJ, the conflict feature can no longer be used.
For data submitted to DDBJ, those qualifiers can no longer be used.
Since 2006, transposable elements have been described with the repeat_region feature and the /mobile_element qualifier. A new mobile_element feature and /mobile_element_type qualifier will be added and used to describe transposable elements.
This modification will be applied in December 2010.
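The change amounts to moving the same information to a new feature key and qualifier name. The Python sketch below shows that conversion on a hypothetical in-memory feature (a key, a location string and a list of qualifier pairs); the `Feature` class and `to_mobile_element` function are illustrative names, not part of any INSDC software. It assumes the old /mobile_element value already uses the "<type>[:<name>]" form and can be carried over unchanged.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """Hypothetical feature representation: key, location, qualifier pairs."""
    key: str
    location: str
    qualifiers: list = field(default_factory=list)


def to_mobile_element(feature):
    """Rewrite an old repeat_region + /mobile_element feature as a
    mobile_element feature carrying /mobile_element_type.

    Assumption: the /mobile_element value format ("<type>[:<name>]")
    is reused as-is for /mobile_element_type.
    """
    if feature.key != "repeat_region":
        return feature
    if not any(name == "mobile_element" for name, _ in feature.qualifiers):
        return feature  # an ordinary repeat, not a transposable element
    new_quals = [
        ("mobile_element_type", value) if name == "mobile_element" else (name, value)
        for name, value in feature.qualifiers
    ]
    return Feature("mobile_element", feature.location, new_quals)


# Illustrative old-style feature (location and values are made up).
old = Feature("repeat_region", "1..1200",
              [("mobile_element", "transposon:Tn5"), ("note", "example")])
new = to_mobile_element(old)
print(new.key, new.qualifiers)
# mobile_element [('mobile_element_type', 'transposon:Tn5'), ('note', 'example')]
```

repeat_region features without /mobile_element are returned untouched, since they describe ordinary repeats rather than transposable elements.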
To flag entries that represent the sequence of a whole replicon, we will use a /whole_replicon qualifier.
The timeline for this addition has not yet been set.
The /artificial_location qualifier was introduced in 2009 as a valueless qualifier. To record the reason it is used, the qualifier will take one of two controlled values: "heterogenous population sequenced" or "low-quality sequence region".
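Moving from a valueless qualifier to a controlled vocabulary means a submission check can reject both the old form and free text. A minimal Python sketch of such a check follows, with the two values copied verbatim from the text above; `check_artificial_location` is a hypothetical helper, and it assumes the new rule is already in effect.

```python
# The two controlled values, copied verbatim from the meeting summary.
ARTIFICIAL_LOCATION_VALUES = {
    "heterogenous population sequenced",
    "low-quality sequence region",
}


def check_artificial_location(value):
    """Return a list of problems in an /artificial_location value.

    None stands for the old valueless form, which this sketch treats as
    an error because a controlled value is now expected.
    """
    if value is None:
        return ["valueless /artificial_location: a controlled value is expected"]
    if value not in ARTIFICIAL_LOCATION_VALUES:
        return [f"unknown /artificial_location value: {value!r}"]
    return []


print(check_artificial_location("low-quality sequence region"))
# []
```

Returning a list of messages rather than raising an error lets a checker collect every problem in a record in a single pass.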
In response to requests from the Prokaryotic Annotation Workshop, the formats of the /experiment and /inference qualifiers will be improved, mainly so that the supporting evidence for a feature can be cited.
Examples
/experiment="COORDINATES: N-terminus verified by Edman degradation [PMID: 8096212]"
/inference="DESCRIPTION: similar to AA sequence: INSDC: AAF23014.2"
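The examples show a common pattern: an optional category prefix ("COORDINATES:", "DESCRIPTION:") followed by the evidence itself. The Python sketch below splits values of that shape. The category list is partly an assumption (COORDINATES and DESCRIPTION come from the examples; EXISTENCE is assumed), as are the "[PMID: n]" citation pattern and the "type: evidence basis" split for /inference. The function names are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# COORDINATES and DESCRIPTION appear in the examples; EXISTENCE is assumed.
CATEGORIES = ("COORDINATES", "DESCRIPTION", "EXISTENCE")
PMID = re.compile(r"\[PMID:\s*(\d+)\]")


class Evidence(NamedTuple):
    category: Optional[str]
    text: str
    pmids: list


def split_category(value):
    """Split an optional leading 'CATEGORY:' prefix from a qualifier value."""
    head, sep, rest = value.partition(":")
    if sep and head.strip() in CATEGORIES:
        return head.strip(), rest.strip()
    return None, value.strip()


def parse_experiment(value):
    """Return the category, free text and any cited PMIDs of an /experiment value."""
    category, text = split_category(value)
    return Evidence(category, text, PMID.findall(text))


def parse_inference(value):
    """Return (category, inference type, evidence basis) for an /inference value."""
    category, text = split_category(value)
    kind, _, basis = text.partition(":")
    return category, kind.strip(), basis.strip()


exp = parse_experiment(
    "COORDINATES: N-terminus verified by Edman degradation [PMID: 8096212]")
inf = parse_inference("DESCRIPTION: similar to AA sequence: INSDC: AAF23014.2")
print(exp.category, exp.pmids)  # COORDINATES ['8096212']
print(inf)  # ('DESCRIPTION', 'similar to AA sequence', 'INSDC: AAF23014.2')
```

Values with no recognized prefix come back with a category of `None`, so older records without categories can still be parsed.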
As mentioned above, the Prokaryotic Annotation Workshop requested improvements to the description rules for features and qualifiers, one of which concerned pseudogene annotation. We also discussed this issue as a way to resolve a problem with the use of the /pseudo qualifier raised at ICM2009. However, we could not reach agreement at the meeting, mainly because of the difficulty of maintaining consistency with existing records.
This issue will be reconsidered.