To operate and carry out the collaborative construction of the international nucleotide sequence database, the three data banks (DDBJ, EMBL-Bank/EBI and GenBank/NCBI) hold an international collaborators meeting every year. In 2005, the meeting was held at DDBJ in Mishima on 16-18 May.
DDBJ, EMBL-Bank and GenBank each reported on their activities over the past year and discussed practical matters in maintaining and updating the nucleotide sequence database. The outcomes of the meeting are summarized below.
It was agreed that the international nucleotide sequence database built through the collaboration among DDBJ, EMBL-Bank and GenBank will be called INSDC, the International Nucleotide Sequence Database Collaboration. INSDC has made its web site public: http://www.insdc.org/
Since 2003, we have discussed the INSDSeq-XML format. Ahead of EMBL-Bank and GenBank, DDBJ has made trial data in INSDSeq-XML format available at its FTP site and through its retrieval tool, getentry.
Since 2004, we have accepted the submission of MGA data. We reconsidered the rules for acceptance and the format for distribution.
Since 2002, we have accepted TPA submissions. Until now, some biological evidence has been required for a TPA submission. We are now planning to also accept sequences inferred from non-experimental evidence. We will continue to discuss the guidelines for the acceptance and classification of TPA submissions.
Since 2003, many genome projects have used the /locus_tag qualifier as an identifier for tracking purposes. Until now, submitters were free to choose the prefixes of their locus_tag values. However, because we are concerned that this could cause some disruption in the future, we will manage and assign locus_tag prefixes to keep them unique across the whole database. In connection with this, we will extend our flat file format to include a project ID that can be used to identify the project (mainly for genome projects).
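The idea of centrally assigned, database-wide unique prefixes can be sketched as follows. This is only an illustration, not the actual INSDC procedure: the class name LocusTagPrefixRegistry, the case-insensitive comparison, the "PREFIX_identifier" layout of a locus_tag and the project IDs used below are all assumptions made for the example.

```python
class LocusTagPrefixRegistry:
    """Hypothetical registry that keeps locus_tag prefixes unique per project."""

    def __init__(self):
        # Maps an upper-cased prefix to the project ID that owns it.
        self._owner_by_prefix = {}

    def register(self, prefix, project_id):
        """Assign a prefix to a project; refuse if another project owns it."""
        key = prefix.upper()  # assumption: prefixes are compared case-insensitively
        owner = self._owner_by_prefix.get(key)
        if owner is not None and owner != project_id:
            raise ValueError(
                "prefix %r is already assigned to project %r" % (prefix, owner)
            )
        self._owner_by_prefix[key] = project_id

    def check_locus_tag(self, locus_tag, project_id):
        """Return True if the tag's prefix is registered to the given project."""
        # Assumption: a locus_tag looks like "PREFIX_identifier".
        prefix, separator, _ = locus_tag.partition("_")
        if not separator:
            return False
        return self._owner_by_prefix.get(prefix.upper()) == project_id


registry = LocusTagPrefixRegistry()
registry.register("ABC", "project-1")  # hypothetical prefix and project ID
print(registry.check_locus_tag("ABC_0001", "project-1"))
print(registry.check_locus_tag("ABC_0001", "project-2"))
```

Keeping the owning project next to each prefix is what allows a project ID in the flat file to be cross-checked against the locus_tag values of an entry.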
rRNA sequences are not annotated consistently in the database with respect to strandedness and partiality. It was agreed that we should check them, and that the same preference for plus-strand annotation should be applied to other single features.
For features, especially CDS, database users want to know whether a feature description is based on a biological experiment or only on inference, such as sequence similarity. To make this evidence information available, the /evidence qualifier will be split into two new qualifiers, /experiment and /inference.
Recently, the number of entries from research on environmental sampling and the diversity of life (e.g. the BARCODE project) has increased significantly. For these submissions, it is important to describe the specimen from which the sequence was obtained. Therefore, five new qualifiers will become legal on the source feature.
"hydrogenosome" will be added to the list of legal values for the /organelle qualifier.
The rules for the description of locations will be changed:
Combinations of "join" and "order" operators in one location will be illegal.
The use of two identical location construction operators within one location will be illegal.
For example, "100..100" will be illegal.
The usage of "^" will be restricted to adjacent nucleotides.
For example, "100^200" will be illegal.
The use of the range (m.n) descriptor within location spans will be illegal.
For example, "(5.10)..100" will be illegal.
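As a rough illustration of how the location rules above might be checked, here is a minimal Python sketch. It is not an official validator: the function name location_violations and the regular expressions are assumptions made for this example, and the second rule is interpreted solely from its example "100..100" (a span whose two endpoints are identical). Special cases such as "^" across the origin of a circular molecule are not handled.

```python
import re


def location_violations(location):
    """Return a list of announced rules that a location string appears to break."""
    problems = []
    text = location.replace(" ", "")

    # "join" and "order" must not be combined in one location.
    if "join(" in text and "order(" in text:
        problems.append("combines join and order")

    # Assumption: read from the example "100..100", a span with identical endpoints.
    for start, end in re.findall(r"(\d+)\.\.(\d+)", text):
        if start == end:
            problems.append("span %s..%s has identical endpoints" % (start, end))

    # "^" may only sit between two adjacent nucleotides, e.g. 100^101.
    for left, right in re.findall(r"(\d+)\^(\d+)", text):
        if int(right) - int(left) != 1:
            problems.append("%s^%s is not between adjacent nucleotides" % (left, right))

    # The (m.n) range descriptor is not allowed within a location span.
    if re.search(r"\(\d+\.\d+\)", text):
        problems.append("uses the (m.n) range descriptor")

    return problems


for example in ["100..100", "100^200", "(5.10)..100", "join(1..10,20..30)"]:
    print(example, location_violations(example))
```

Returning a list of messages rather than a single true/false value lets one location report several violations at once, which is convenient when existing entries are being re-checked.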