27th INSDC meeting report: May 20-22 2014, Mishima, Japan
The International Nucleotide Sequence Database Collaboration (INSDC), consisting of DDBJ, EBI and NCBI, holds an international meeting every year.
In 2014, the meeting was held at DDBJ on 20-22 May to discuss practical matters in maintaining and updating the nucleotide sequence data archives: DDBJ, ENA, GenBank, the Sequence Read Archive (SRA) and the Trace Archive.
The outcomes of the meeting are summarized below.
Items Discussed and To Be Studied
- BioSample database
- The BioSample database contains descriptions of biological source materials used in experimental assays.
DDBJ also started accepting BioSample submissions in early 2014.
Following the meetings in 2012 and 2013, we discussed action items for collecting and sharing BioSample data across INSDC.
- Issues concerning the large number of genome submissions, WGS and so on
- Data Exchange
- We discussed new data formats and an effective way to exchange data with each other.
- Assembly (Genome Collection)
- Following the 2012 meeting, we continue to collaborate on Assembly activities to collect genome sequences.
- About /protein_id
- We currently receive many submissions of similar genomes, for example from multiple strains of a single species. Because the CDS features of such genomes contain many orthologs, many /protein_ids appear to be assigned redundantly. We discussed possible new ways of assigning /protein_id.
Changes in SRA XML schema
SRA XML schema version 2.0 continues to be discussed for refactoring SRA metadata with BioProject and BioSample data.
Forthcoming changes in The DDBJ/EMBL/GenBank Feature Table: Definition
Unless otherwise specified, the following items will apply from October 2014 with the next revision of the Feature Table Definition.
- A new feature, regulatory, and a new qualifier, /regulatory_class, will be available from December 2014.
The following features will be replaced by the new feature: -35_signal, -10_signal, CAAT_signal, GC_signal, TATA_signal, polyA_signal, attenuator, terminator, promoter, enhancer, RBS.
- The abbreviation for ‘dihydrouridine’ in the Modified Base Abbreviations and the /mod_base qualifier, “d”, will be replaced by “dhu”.
- To make sure that prim_transcript and precursor_RNA can be used for RNA transcripts other than mRNA, their definitions will be updated accordingly.
- Since 2013, a new qualifier, /type_material, has been under consideration for specifying type strains, type specimens and so on.
Its details and the date from which it will apply have not yet been decided.
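As a rough illustration of the regulatory and /mod_base changes listed above, the following Python sketch rewrites a legacy feature key into the new regulatory feature and updates the dihydrouridine abbreviation. The names convert_feature and LEGACY_TO_REGULATORY_CLASS are hypothetical, and the /regulatory_class values are assumed from the INSDC controlled vocabulary; this report does not list them, so they should be checked against the published Feature Table Definition.

```python
# Illustrative sketch only. The /regulatory_class values are an assumption
# taken from the INSDC controlled vocabulary; this report does not list them.
LEGACY_TO_REGULATORY_CLASS = {
    "-35_signal": "minus_35_signal",
    "-10_signal": "minus_10_signal",
    "CAAT_signal": "CAAT_signal",
    "GC_signal": "GC_signal",
    "TATA_signal": "TATA_box",
    "polyA_signal": "polyA_signal_sequence",
    "attenuator": "attenuator",
    "terminator": "terminator",
    "promoter": "promoter",
    "enhancer": "enhancer",
    "RBS": "ribosome_binding_site",
}


def convert_feature(key, qualifiers):
    """Return (key, qualifiers) updated for the revised feature table.

    `key` is a feature key such as "promoter"; `qualifiers` is a dict of
    qualifier names (without the leading slash) to values.
    """
    new_qualifiers = dict(qualifiers)  # leave the caller's dict untouched

    # Legacy regulatory features become "regulatory" + /regulatory_class.
    if key in LEGACY_TO_REGULATORY_CLASS:
        new_qualifiers["regulatory_class"] = LEGACY_TO_REGULATORY_CLASS[key]
        key = "regulatory"

    # The dihydrouridine abbreviation "d" is replaced by "dhu".
    if new_qualifiers.get("mod_base") == "d":
        new_qualifiers["mod_base"] = "dhu"

    return key, new_qualifiers


if __name__ == "__main__":
    print(convert_feature("RBS", {"note": "example"}))
    print(convert_feature("modified_base", {"mod_base": "d"}))
```

Features whose keys are not in the mapping, such as CDS, pass through unchanged, so the sketch can be applied to a whole feature table without special-casing.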