The three data banks of the International Nucleotide Sequence Database Collaboration (INSDC), DDBJ, EMBL-Bank/EBI and GenBank/NCBI, hold an international collaborators' meeting every year.
In 2008, the meeting was held at DDBJ in Japan on 20-22 May.
DDBJ, EMBL-Bank and GenBank each reported on their activities over the past year and discussed practical matters concerning the maintenance and development of INSDC.
From June 2008, INSDC introduces a new division for assembled mRNA sequences, TSA (Transcriptome Shotgun Assembly). Note that a TSA submission requires the original sequence data of the primary transcripts to be deposited in the EST division of INSDC, the Trace Archive, or the Short Read Archive. More information on how to submit TSA entries will be provided on the DDBJ website.
In principle, raw reads from next-generation sequencing should be registered in the Short Read Archive. Following the workshop on MINSEQE (Minimal Information about a High Throughput Sequencing Experiment), it was noted that next-generation sequencing data not originally intended for submission to INSD may lead to the discovery of variation or to re-annotation, which could then be submitted to INSDC as TPA (Third Party Annotation) or TSA entries. Even so, the number of TPA entries is not expected to grow rapidly.
INSDC generally accepts all sequence data, regardless of source or sequence identity. However, to take advantage of normalisation for variation studies, a single submission representing multiple identical sequences is also acceptable, provided that the frequency and the total sample number are described in the /frequency qualifier of the source feature.
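To make the collapsing step concrete, here is a minimal Python sketch, assuming each input string is one sequence observed in one sample and that identity is compared case-insensitively. The helper name `collapse_identical` and the "m/n" output string (modelled on the /frequency="23/108" example listed later for the /frequency qualifier) are illustrative choices, not an INSDC or DDBJ tool.

```python
from collections import Counter
from typing import List, Tuple


def collapse_identical(sequences: List[str]) -> List[Tuple[str, str]]:
    """Hypothetical helper: merge identical sequences into one record each.

    Assumes one sequence per sample, so the total sample number n is the
    length of the input list. Returns (sequence, frequency) pairs, most
    frequent first, where frequency is written as "m/n".
    """
    total = len(sequences)
    # Assumption: sequences differing only in letter case count as identical.
    counts = Counter(seq.upper() for seq in sequences)
    return [(seq, f"{m}/{total}") for seq, m in counts.most_common()]


if __name__ == "__main__":
    for seq, freq in collapse_identical(["ACGT", "acgt", "ACGA"]):
        print(f'{seq}\t/frequency="{freq}"')
```

Each returned pair would correspond to one submitted entry whose /frequency value records both how often the sequence was seen and the total number of samples.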
The electronic publication token "(er)" in REFERENCE/JOURNAL lines will be removed, and existing records will be retrofitted to conventional article citations where possible.
Unless otherwise specified, the following changes will take effect in October 2008 with the revision of the Feature Table Definition.
The /mol_type qualifier is used in the source feature to indicate the in vivo, synthetic or hypothetical molecule type. The vocabulary list for the /mol_type qualifier will be modified as follows:
The ncRNA feature uses the /ncRNA_class qualifier, with a controlled vocabulary, to indicate which type of non-protein-coding feature is being represented. The controlled vocabulary for the /ncRNA_class qualifier will be modified as follows:
Format of the /satellite qualifier value: "<satellite_type>[:<class>][ <identifier>]"
where <satellite_type> is one of the following:
"satellite", "microsatellite", "minisatellite"
Examples:
/satellite="satellite: S1a"
/satellite="satellite: gamma III"
/satellite="minisatellite"
/satellite="microsatellite: DC130"
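The /satellite value format can be checked with a small parser. The sketch below is an illustration under stated assumptions, not DDBJ or INSDC validation code: the function name `parse_satellite` is hypothetical, and it returns everything after the satellite type as one string rather than separating <class> from <identifier>, since the examples above do not make that boundary unambiguous.

```python
from typing import Optional, Tuple

# The three satellite types allowed by the format description.
SATELLITE_TYPES = {"satellite", "microsatellite", "minisatellite"}


def parse_satellite(value: str) -> Tuple[str, Optional[str]]:
    """Hypothetical parser for "<satellite_type>[:<class>][ <identifier>]".

    Returns (satellite_type, detail), where detail is the remaining
    class and/or identifier text, or None if nothing follows the type.
    Raises ValueError if the satellite type is not one of the allowed terms.
    """
    text = value.strip()
    if ":" in text:
        # A class (and possibly an identifier) follows the colon.
        sat_type, _, rest = text.partition(":")
    else:
        # No class: anything after the first space is an identifier.
        sat_type, _, rest = text.partition(" ")
    sat_type = sat_type.strip()
    if sat_type not in SATELLITE_TYPES:
        raise ValueError(f"unknown satellite_type: {sat_type!r}")
    rest = rest.strip()
    return sat_type, rest or None


if __name__ == "__main__":
    for v in ["satellite: S1a", "satellite: gamma III", "minisatellite", "microsatellite: DC130"]:
        print(v, "->", parse_satellite(v))
```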
To represent sample size, the following value formats will also be legal for the /frequency qualifier, in addition to decimal fractions:
"[m] in [n]" or "[m] / [n]"
Examples:
/frequency="23/108"
/frequency="1 in 12"
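Similarly, a hypothetical `parse_frequency` helper shows how the three legal /frequency forms could be read back into a fraction plus an optional sample size. It is a minimal sketch that assumes a frequency must lie between 0 and 1 and that "[m] in [n]" needs whitespace around "in"; neither rule is taken from the Feature Table Definition.

```python
import re
from fractions import Fraction
from typing import Optional, Tuple

# "[m] / [n]" (spaces optional) and "[m] in [n]" (spaces assumed required).
_SLASH = re.compile(r"(\d+)\s*/\s*(\d+)")
_IN = re.compile(r"(\d+)\s+in\s+(\d+)")


def parse_frequency(value: str) -> Tuple[Fraction, Optional[int]]:
    """Return (frequency, sample_size); sample_size is None for decimal input."""
    text = value.strip()
    for pattern in (_SLASH, _IN):
        match = pattern.fullmatch(text)
        if match:
            m, n = int(match.group(1)), int(match.group(2))
            # Assumption: m occurrences cannot exceed n samples.
            if n == 0 or m > n:
                raise ValueError(f"invalid ratio in /frequency: {value!r}")
            return Fraction(m, n), n
    # Otherwise expect a decimal fraction such as "0.21" (no sample size).
    try:
        freq = Fraction(text)
    except ValueError:
        raise ValueError(f"unrecognised /frequency value: {value!r}") from None
    if not 0 <= freq <= 1:
        raise ValueError(f"/frequency out of range: {value!r}")
    return freq, None


if __name__ == "__main__":
    for v in ["23/108", "1 in 12", "0.21"]:
        print(v, "->", parse_frequency(v))
```

Returning a `Fraction` rather than a float keeps "23/108" exact, so the original ratio and sample size can be written back unchanged.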
Where possible, both /host and /lab_host should be given as a binomial scientific name.
Examples:
/lab_host="Gallus gallus"
/lab_host="Gallus gallus embryo"
/lab_host="Escherichia coli strain DH5 alpha"
/lab_host="Homo sapiens HeLa cells"
Note: The /proviral qualifier will remain in use.
The /rearranged and /germline qualifiers are intended to indicate whether or not the sequence has undergone somatic rearrangement as part of an adaptive immune response. Because many existing entries use these qualifiers incorrectly, those entries will be corrected.
Further minor changes to the usage of the /gene qualifier are also expected. Details will be made available shortly.
To describe inferential support more effectively, the format of the /inference qualifier will be improved. Details will be made available shortly.
The /sex qualifier will also remain in use. Guidelines for describing both /mating_type and /sex will be made available shortly.