• (5st November 13:00-November 10th 13:00) Suspension of all submission and search services for maintenance
  • Suspension of taxonomy database update due to US federal funding expiration
  • DDBJ Rel. 139.0, DAD Rel. 109.0 Completed

26th INSDC meeting report: May 21-23 2013, Hinxton, UK

  • Home
  • 26th INSDC meeting report: May 21-23 2013, Hinxton, UK

2013

26th INSDC meeting report: May 21-23 2013, Hinxton, UK

International Nucleotide Sequence Database Collaboration (INSDC), consisted of DDBJ, EBI and NCBI, hold the international meeting every year.
In 2013, the meeting was held at EBI in UK, 21-23 May, to discuss practical matters to maintain and update nucleotide sequence data archives; DDBJ, EMBL-Bank, GenBank, Sequence Read Archive (SRA) and Trace Archive.
The outcomes of the meeting are summarized below.

The Items; Discussed and To Be Studied

BioSample database
The BioSample database contains descriptions of biological source materials used in experimental assays. The purpose of the BioSample database is to provide unified storage and access to information about biological samples. These samples may have investigation information stored in other databases (e.g. nucleotide sequence, expression).
Following the meeting on 2012, we discussed action items to collect and to share BioSample data at INSDC.
DDBJ started to accept BioSample submissions in 2014.
Strain level taxonomy ID assignment for microorganism genome submission
All organism names that are represented in the sequence data of INSDC are registered to the taxonomy database.
Since 2009, taxonomy database has considered to terminate assignment of strain level taxonomy ID for microorganism genomes.
From 2014, we will provide BioSample data instead of strain level taxonomy ID, and will terminate to assign strain level taxonomy ID for microorganism genomes
We reported in detail about this issue in an academic paper

Changes related to INSDC submission

Relaxation rules to accept WGS and scaffold data
Heretofore, INSDC accepted sequences of overlapping reads (not including any sequencing gaps) as WGS entries and accepted AGP format to indicate scaffolds (including sequencing gaps) as CON entries.
Recently, the policy seemed to be out of date, because some of software tools for genome assemble support to output scaffold only in sequences, not in AGP format.
So, we decided to accept sequences of scaffolds with gap n’s.
See also INSDC standards for genome assembly submission
Accepting submission of scaffolded TSA data
Recently, paired-end sequencing is fairly common not only for genomes but also for transcriptomes and some of the RNAseq assembly software packages have added scaffolding. So, we started accepting these scaffolded assemblies as TSA records with assembly_gap features and /linkage_evidence=”paired-ends” or some.
Update guidelines for TPA submission
Guidelines for TPA submission will be updated to cope with the current status of data submission.
See also TPA Submission Guidelines.
Major modification points are follows;
  1. TPA is renamed from “Third Party Annotation” to “Third Party Data”.
  2. Specify to accept not only annotation but also assemble for TPA.
  3. A new subcategory, “TPA:specialist_db” will be added in TPA to accept submissions from expert databases.

Changes in SRA XML schema

SRA XML schema version 1.5 has been applied.
The modification points are elimination and consolidation of redundant description items.

SRA XML schema version 2.0 continues to be discussed for refactoring SRA metadata with BioProject and BioSample data.

We decided to allow SRA accessions to have variable lengths after 6 digits have been used up, e.g. SRR1000000 would follow SRR999999.

Forthcoming changes in the DDBJ/EMBL/GenBank Feature Table: Definition

The following items will be applied from October 2013 with the revision of Feature Table Definition, if not otherwise specified.

  • It is reconfirmed that 5’UTR and 3’UTR features can be used for RNA viral genome.
    Their definitions will be updated appropriately
  • It is reconfirmed that 5’UTR and 3’UTR features can be used for RNA viral genome.
    It will be applied from December 2013
    1. Time (with time zone): in the ISO standard format
      i.e. “2007-04-05T14:30Z
    2. Range: in the format delimited by “/”
      i.e. “2007-03-01T13:00Z/2008-05-11T15:30Z””
  • A new value, “lncRNA”, will be legal for /ncRNA_class qualifier.
  • The qualifier, /estimated_length, will be modified to allow different lengths for unknown length gaps.
  • A new qualifier, /type_material, will be considered to specify type strains, type specimens and so on.
    It is not decided in details and applicable period of the qualifier.