26th: May 21-23 2013, Hinxton, UK
The International Nucleotide Sequence Database Collaboration (INSDC), which consists of DDBJ, EBI and NCBI, holds an international collaborators meeting every year.
In 2013, the meeting was held at EBI in the UK on 21-23 May to discuss practical matters of maintaining and updating the nucleotide sequence data archives: DDBJ, EMBL-Bank, GenBank, Sequence Read Archive (SRA) and Trace Archive.
The outcomes of the meeting are summarized below.
Items Discussed and To Be Studied
- BioSample database
- The BioSample database contains descriptions of biological source materials used in experimental assays. The purpose of the BioSample database is to provide unified storage and access to information about biological samples. These samples may have investigation information stored in other databases (e.g. nucleotide sequence, expression).
Following the 2012 meeting, we discussed action items for collecting and sharing BioSample data within INSDC.
DDBJ started to accept BioSample submissions in 2014.
- Strain level taxonomy ID assignment for microorganism genome submission
- All organism names represented in INSDC sequence data are registered in the taxonomy database.
Since 2009, the taxonomy database has considered terminating the assignment of strain level taxonomy IDs for microorganism genomes.
From 2014, we will provide BioSample data instead of strain level taxonomy IDs, and will stop assigning strain level taxonomy IDs for microorganism genomes.
We reported on this issue in detail in an academic paper.
Changes related to INSDC submission
- Relaxed rules for accepting WGS and scaffold data
- Until now, INSDC accepted sequences of overlapping reads (not including any sequencing gaps) as WGS entries, and accepted AGP format to indicate scaffolds (including sequencing gaps) as CON entries.
Recently, this policy came to seem out of date, because some genome assembly software tools output scaffolds only as sequences, not in AGP format.
So, we decided to accept scaffold sequences in which gaps are represented by runs of n’s.
See also INSDC standards for genome assembly submission.
- Accepting submission of scaffolded TSA data
- Recently, paired-end sequencing has become fairly common not only for genomes but also for transcriptomes, and some RNA-seq assembly software packages have added scaffolding. So, we started accepting these scaffolded assemblies as TSA records with assembly_gap features and /linkage_evidence="paired-ends" or another linkage evidence value.
- Update guidelines for TPA submission
- Guidelines for TPA submission will be updated to reflect the current status of data submission.
See also TPA Submission Guidelines.
The major modifications are as follows:
- TPA is renamed from “Third Party Annotation” to “Third Party Data”.
- TPA will explicitly accept not only annotation but also assembly.
- A new subcategory, “TPA:specialist_db”, will be added to TPA to accept submissions from expert databases.
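For the scaffold sequences with gap n’s described above, a submitter may want to locate each run of n’s, for example to cross-check it against an assembly_gap feature. The following is a minimal sketch in Python, not an INSDC tool: the function name `find_gap_runs` and the `min_len` threshold are hypothetical and chosen only for illustration.

```python
import re


def find_gap_runs(sequence, min_len=1):
    """Return 1-based inclusive (start, end) coordinates of each run of n/N.

    min_len is a hypothetical threshold for illustration; it is not a
    value defined by INSDC.
    """
    pattern = re.compile(r"[nN]{%d,}" % min_len)
    # m.start() is 0-based, so add 1; m.end() is exclusive, which equals
    # the 1-based inclusive end position.
    return [(m.start() + 1, m.end()) for m in pattern.finditer(sequence)]


# Toy scaffold: 4 bases, a 10-base gap, 8 bases, a 5-base gap, 3 bases.
scaffold = "ACGT" + "N" * 10 + "ACGTTGCA" + "n" * 5 + "GGA"

for start, end in find_gap_runs(scaffold, min_len=5):
    print(f"assembly_gap {start}..{end} (length {end - start + 1})")
```

Coordinates are reported 1-based and inclusive because that is how feature locations are usually written in flat files.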
Changes in SRA XML schema
SRA XML schema version 1.5 has been applied.
The modifications eliminate and consolidate redundant description items.
SRA XML schema version 2.0 remains under discussion; it is intended to refactor SRA metadata together with BioProject and BioSample data.
We decided to allow SRA accessions to grow beyond 6 digits once the 6-digit numbers have been used up, e.g. SRR1000000 would follow SRR999999.
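Software that validates accessions with a fixed-width pattern would reject the longer form, so a check along the following lines may be useful. This is a sketch under assumptions: it covers only the SRR prefix used in the example above, and the names `RUN_ACCESSION`, `is_run_accession` and `next_accession` are hypothetical.

```python
import re

# Assumed pattern: the SRR prefix followed by six or more digits.
# Other prefixes are deliberately left out of this sketch.
RUN_ACCESSION = re.compile(r"^SRR\d{6,}$")


def is_run_accession(text):
    """Return True if text looks like an SRR accession of 6+ digits."""
    return RUN_ACCESSION.match(text) is not None


def next_accession(accession):
    """Return the following accession, widening past 6 digits if needed."""
    if not is_run_accession(accession):
        raise ValueError(f"not an SRR accession: {accession!r}")
    prefix, number = accession[:3], int(accession[3:])
    # Zero-pad to at least 6 digits; larger numbers simply use more digits.
    return f"{prefix}{number + 1:06d}"


print(next_accession("SRR999999"))
```

The key design point is `\d{6,}` rather than `\d{6}`: the minimum width stays at six digits while the upper bound is left open.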
Forthcoming changes in the DDBJ/EMBL/GenBank Feature Table: Definition
Unless otherwise specified, the following items will be applied from October 2013 with the revision of the Feature Table Definition.
- It is reconfirmed that 5’UTR and 3’UTR features can be used for RNA viral genomes. Their definitions will be updated appropriately.
- Time and range values in the following formats will be allowed. This will be applied from December 2013.
- Time (with time zone): in the ISO standard format, e.g. “2007-04-05T14:30Z”
- Range: in the format delimited by “/”, e.g. “2007-03-01T13:00Z/2008-05-11T15:30Z”
- A new value, “lncRNA”, will be legal for /ncRNA_class qualifier.
- The qualifier, /estimated_length, will be modified to allow different lengths for unknown length gaps.
- A new qualifier, /type_material, will be considered to specify type strains, type specimens and so on.
Its details and the date from which it will apply have not been decided.
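As an illustration of the time and range formats listed above, the following Python sketch parses the two example values. It assumes only the minute-precision UTC form shown in the examples ("YYYY-MM-DDThh:mmZ"); the function names `parse_time` and `parse_time_or_range` are hypothetical, and the actual Feature Table Definition may allow further forms.

```python
from datetime import datetime, timezone


def parse_time(value):
    """Parse a value such as "2007-04-05T14:30Z" into an aware datetime."""
    # The trailing "Z" is matched literally and interpreted as UTC.
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%MZ")
    return parsed.replace(tzinfo=timezone.utc)


def parse_time_or_range(value):
    """Return (start, end); end is None when value is a single time."""
    parts = value.split("/")
    if len(parts) == 1:
        return parse_time(parts[0]), None
    if len(parts) == 2:
        start, end = parse_time(parts[0]), parse_time(parts[1])
        if end < start:
            raise ValueError("range end precedes range start")
        return start, end
    raise ValueError(f"expected at most one '/': {value!r}")


print(parse_time_or_range("2007-04-05T14:30Z"))
print(parse_time_or_range("2007-03-01T13:00Z/2008-05-11T15:30Z"))
```

Rejecting a range whose end precedes its start is a choice made in this sketch, not a rule stated in the meeting summary.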