The International Nucleotide Sequence Database Collaboration (INSDC) comprises three data banks: DDBJ, EMBL-Bank/EBI, and GenBank/NCBI. The three banks hold an international collaborators meeting every year.
In 2009, the meeting was held at NCBI in the USA on 12-13 May.
DDBJ, EMBL-Bank, and GenBank each reported on their activities over the past year and discussed practical matters for maintaining and updating INSDC.
Also, starting this year (2009), INSDC has added a collaborative meeting to deal with the mass sequence data produced by next-generation sequencers (Short Read Archive) and the traces produced by traditional sequencers (Trace Archive).
The first meeting for this collaboration was held at NCBI in the USA on 14-15 May 2009.
The outcomes of the two meetings are summarized below.
As mentioned above, the databases that collect output from next-generation sequencers have been part of INSDC since 2009.
INSDC will ask major scientific journals to require that DRA/ERA/SRA accession numbers for the corresponding sequence data be included in paper submissions.
DDBJ/EMBL-Bank/GenBank reject submissions of EST sequence data produced by 454 sequencers (GS-20, GS-FLX, etc.).
In principle, only DRA/ERA/SRA should accept those kinds of EST data.
In 2008, INSDC decided to use project IDs not only for genome/metagenome projects but also for many kinds of large-scale sequencing projects, including transcriptomes.
DDBJ and GenBank indicate the project ID on the DBLINK line of flat files.
EMBL-Bank indicates project IDs on the PR line of flat files.
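Because the project ID sits on a different line type depending on the data bank, a reader of flat files has to check both. The following is a minimal sketch of such a lookup, assuming the line layouts "DBLINK      Project:12345" and "PR   Project:12345;". Those layouts, the function names, and the example ID are assumptions made for illustration and are not specified in this report.

```python
import re

# Assumed line layouts (not specified in this report):
#   DDBJ/GenBank:  "DBLINK      Project:12345"
#   EMBL-Bank:     "PR   Project:12345;"
_PROJECT_RE = re.compile(r"^(DBLINK|PR)\s+Project:\s*(\d+)")


def project_id(line):
    """Return the numeric project ID from a DBLINK or PR line, else None."""
    match = _PROJECT_RE.match(line)
    return int(match.group(2)) if match else None


def find_project_id(flat_file_text):
    """Return the first project ID found in a flat-file record, else None."""
    for line in flat_file_text.splitlines():
        pid = project_id(line)
        if pid is not None:
            return pid
    return None
```

Calling find_project_id on the text of one record returns the ID as an integer regardless of which bank produced the record, or None when the record carries no project line.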
For genome/metagenome projects, we have almost finished assigning project IDs.
All organism names that appear in the sequence data of DDBJ/EMBL-Bank/GenBank are registered in the taxonomy database. The taxonomy database has assigned strain-level taxonomy IDs to whole-genome-scale submissions of microorganisms in order to flag those genome projects.
Since INSDC now provides project IDs as a way to index genome projects, we discussed terminating the assignment of strain-level taxonomy IDs for microorganism genomes. However, since many institutes have already cited those strain-level IDs, we should carefully consider the confusion that the policy change could cause.
With the increase in submissions of large-scale draft sequence data, submitters often want to annotate candidate protein-coding regions that contain frame mismatches with CDS features, avoiding translation errors by joining locations with a location operator.
To distinguish these kinds of CDS features, we will introduce a new qualifier, /artificial_location, as a flag. However, DDBJ/EMBL-Bank/GenBank will accept it only in submissions from whole-genome-scale projects, including large-scale transcriptomes.
Recently, GenBank started to use a structured COMMENT approach to capture metadata about the biological sample that has been sequenced.
The concept behind the structured COMMENT is to provide submitters with a mechanism that allows them to supply a set of tag/value data elements that are not currently supported by the Feature Table.
DDBJ/EMBL-Bank/GenBank will discuss the format of the structured COMMENT/CC line so that it can be used in a formalized way.
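To illustrate what a set of tag/value data elements could look like to a program, here is a minimal sketch of a reader. Since the format is still under discussion, the layout used here is purely an assumption: a block opened by a "##...-START##" line, closed by a "##...-END##" line, with each element written as "tag :: value". The function name and the sample tags are hypothetical, and the COMMENT or CC line prefix is assumed to have been stripped already.

```python
def parse_structured_comment(lines):
    """Collect tag/value pairs found between START and END marker lines.

    Assumed layout (hypothetical, not defined in this report):
        ##Some-Name-START##
        tag :: value
        ##Some-Name-END##
    """
    pairs = {}
    inside = False
    for raw in lines:
        line = raw.strip()
        if line.startswith("##") and line.endswith("-START##"):
            inside = True
        elif line.startswith("##") and line.endswith("-END##"):
            inside = False
        elif inside and "::" in line:
            # Split on the first "::" only, so values may contain "::".
            tag, value = line.split("::", 1)
            pairs[tag.strip()] = value.strip()
    return pairs
```

Text outside the marker lines is ignored, so free-text comments can coexist with the structured block in the same record.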
The following items will take effect from October 2009 with the revision of the Feature Table Definition, unless otherwise specified.
The word "pseudo" is likely to be associated with "pseudogene", but the /pseudo qualifier is used for both putative pseudogenes and non-functional forms. The qualifier should therefore be split and/or renamed to reflect its actual usages.
This issue will be reconsidered.
Previously (before May 2009), DDBJ accepted sequence data in which a single /strain qualifier described multiple names:
/strain="ATCC #### (= JCM ### = NBRC ###)"
To describe equivalent strain names, appropriate use of the /note qualifier is recommended:
/note="strain coidentity: JCM ### = NBRC ###"
/strain="ATCC ####"
The modification will be applied in December 2009.
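The rewrite from the old multiple-name /strain value to the recommended /strain plus /note pair can be sketched as follows. This is only an illustration of the rule described above, not a tool used by the data banks: the function name is hypothetical, the strain numbers in the usage example are made up, and the code assumes the old value always has the form "NAME (= NAME = NAME)".

```python
import re

# Matches 'PRIMARY (= SYNONYM = SYNONYM ...)', the old multiple-name form.
_OLD_STRAIN_RE = re.compile(r"^\s*([^()]+?)\s*\(\s*=\s*(.+?)\s*\)\s*$")


def split_strain(old_value):
    """Split 'A (= B = C)' into (strain, note).

    Returns (value, None) when the value carries no equivalent names.
    """
    match = _OLD_STRAIN_RE.match(old_value)
    if match is None:
        return old_value.strip(), None
    primary, rest = match.groups()
    synonyms = [name.strip() for name in rest.split("=") if name.strip()]
    note = "strain coidentity: " + " = ".join(synonyms)
    return primary, note
```

For example, split_strain("ATCC 1234 (= JCM 5678 = NBRC 9012)") yields the strain "ATCC 1234" and the note "strain coidentity: JCM 5678 = NBRC 9012", which map onto the /strain and /note qualifiers respectively.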
In order to describe inferential support more effectively, the format of the /inference qualifier will be improved.
The discussion has been ongoing since 2008.