19th INSDC meeting report: May 15-17 2006, Bethesda, USA

2006

19th INSDC meeting report: May 15-17 2006, Bethesda, USA

International Nucleotide Sequence Database Collaboration (INSDC), the three data banks; DDBJ, EMBL-Bank/EBI, GenBank/NCBI hold the international meeting every year.
In 2006, the meeting was held at GenBank in Bethesda, Maryland, USA, 15-17 May.

DDBJ, EMBL-Bank and GenBank reported each bank activities in the last year, discussed practical matters to maintain and to update INSDC.
The outcomes of the meeting are summarized below.

The Items; Discussed and To Be Studied

No restriction for using INSDC data: INSDC confirmed that we should not accept any submissions with restrictions in free public access.
Non-submission of data to INSDC: There are sequence and annotation data, although published, but not available at INSD. We will be in touch with authors and editors to remind them of the importance of submitting sequences and annotation to the databases.
INSDC homepage: Since 2005, INSDC has made public its web site; www.insdc.org/.
The three banks agreed with that we are to add more contents for the web site.
INSDSeq-XML: Since 2003, we have discussed the schema of this common XML description named INSDSeq-XML.
Since 2005, three banks have trially exchanged data in INSDSeq-XML format.
Thoroughly reviewing of the trial, we discussed some improvement of INSDSeq-XML to provide it as common XML description among three banks.
locus_tag: Since 2003, the /locus_tag qualifier has been used as the identifier for the tracking purpose by many genome projects. In the past, we allowed submitters to use the flexible prefixes for their locus_tag.
Since 2005, to keep uniquness through INSDC, we have disccused to manage and to assign prefixes of locus_tag. The framework to assign the locus_tag prefixes will be available in the near future

Changes to the Feature Table Document: Features and Qualifiers

New amino acid abbreviations, “J” and “O”

1) Pyl (O); Pyrrolysine: The 22nd naturally encoded amino acid, pyrrolysine was discovered.
The JCBN IUBMB-IUPAC (the Joint Commission on Biochemical Nomenclature of IUBMB and IUPAC) has agreed that Pyl (the three-letter abbreviation), O (the one-letter abbreviation) will be recommended for this amino acid.
2) Xle (J); Leucine or Isoleucine: The residue abbreviations, Xle (the three-letter abbreviation) and J (the one-letter abbreviation) are reserved for the case that cannot experimentally distinguish leucine from isoleucine.
So, we are to add the following abbreviations;

Abbreviation	1 letter abbreviation	Amino acid name
Xle	J	Leucine or Isoleucine
Pyl	O	Pyrrolysine

INSDC will use “J” and “O” for the amino acid sequences of /translation qualifiers in CDS features.

Two old qualifiers, /transposon and /insertion_seq, will be integrated into a new qualifier, /mobile_element. The qualifier will be legal on only repeat_region feature as below;
```
Format:
      /mobile_element="<mobile_element_type>[:<mobile_element_name>]"
Example:
      /mobile_element="transposon:Tnp9"
```
The specified value for <mobile_element_type> is either of followings;
- transposon
- retrotransposon
- integron
- insertion sequence
- non-LTR retrotransposon
- SINE
- MITE
- LINE
- other
“viral cRNA” is added to the specified values for the qualifier, /mol_type that indicates molecule type of the sequence in vivo on the source feature.

Definition of “viral cRNA”
cRNA is a plus-strand copy of a minus strand RNA genome which serves as a template to make viral progeny genomes
The /operon qualifier will be legal on rRNA feature.
/EC_number should be more controlled.

Furthermore, we will accept the symbol “n” (initial of “new”) to indicate that the code is not available now and will be assigned later.
For the values of /PCR_primers qualifier, modified base codes.

Modified base codes (i.e. “i”; inosine) are required to be described with enclosing in the brackets; “<” and “>” for the values of /PCR_primers.
```
Example:
      /PCR_primer="fwd_name; hoge1, fwd_seq;cgkgtgtatcttact
      rev_name; hoge2, rev_seq;cg<i>gtgtatcttact"
```
The rules for the description of location will be slightly changed.

The use of range “(m.n)” descriptor will be discontinued.