Last updated: 2015.9.16.

Requirements Specific to Each Category

This chapter explains requirements that are specific to each category. Please also refer to the general requirements when preparing the sequence and annotation files for your submission.

 

DIVISION

Example: DIVISION in annotation file
Entry   Feature   Location  Qualifier  Value
COMMON  DIVISION            division   EST
  • The DIVISION feature in the annotation file indicates that the entries belong to exactly one of CON/ENV/EST/GSS/HTC/HTG/STS/SYN/TSA.
  • Please enter the division name (three capital letters) as the Value for Qualifier: division.
  • In principle, please describe the DIVISION feature in the COMMON entry.
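
The rule above can be checked mechanically before submission. The following is a minimal sketch, not an official DDBJ tool: it assumes the annotation file is tab-delimited with the five columns shown in the example, and the names VALID_DIVISIONS and is_valid_division are hypothetical.

```python
# Division codes listed in this section (three capital letters each).
VALID_DIVISIONS = {"CON", "ENV", "EST", "GSS", "HTC", "HTG", "STS", "SYN", "TSA"}


def is_valid_division(value: str) -> bool:
    """Return True if value is exactly one of the listed division codes."""
    return value in VALID_DIVISIONS


# The COMMON row from the example, written as tab-separated columns
# (assumption: Entry, Feature, Location, Qualifier, Value; Location is empty).
row = "COMMON\tDIVISION\t\tdivision\tEST".split("\t")
entry, feature, location, qualifier, value = row

if feature == "DIVISION" and qualifier == "division":
    print(value, "is valid" if is_valid_division(value) else "is NOT valid")
```

The check is case-sensitive on purpose, because the value has to be given in capital letters.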

 

DATATYPE

Example: DATATYPE in annotation file
Entry   Feature   Location  Qualifier  Value
COMMON  DATATYPE            type       WGS
  • The DATATYPE feature indicates that the entries are one of WGS, TPA, or TPA-WGS.
  • Please enter the type name, WGS or TPA, as the Value for Qualifier: type.
  • Please describe the DATATYPE feature in the COMMON entry.

 

KEYWORD

Example: KEYWORD in annotation file
Entry   Feature   Location  Qualifier  Value
        KEYWORD             keyword    ENV
Specified values for KEYWORD/keyword
Categories  Values for keyword                   Remarks
ENV         ENV
EST         EST
            some other terms                     Please refer to For EST Submissions
HTC         HTC, and some other terms            Please contact us before your submission.
HTG         HTG, and some other terms            Depending on the phase of sequencing. Please contact us before your submission.
GSS         GSS
STS         STS
WGS         WGS
TPA         TPA, Third Party Data
            TPA:inferential or TPA:experimental  One of the two is mandatory.
TSA         TSA, Transcriptome Shotgun Assembly
Others                                           Please contact us before your submission.
  • Building on the categories given in the DIVISION and DATATYPE sections, KEYWORDs use a controlled vocabulary to describe more detailed and specific information, such as experimental methods.
  • Please see INSDC agreed methodological keywords for the controlled keyword terms.
  • Please enter the specified values for Qualifier: keyword.
  • Some Values, shown in red letters in the above table, are mandatory for each category.
  • Please contact us before your submission to confirm the details of the KEYWORD description.
  • For EST submissions, see also For EST Submissions.
  • For WGS and CON, see also For WGS and scaffold CON.

 

For EST Submissions

Example: 5' EST
Entry   Feature   Location  Qualifier  Value
        KEYWORD             keyword    EST
                            keyword    5'-end sequence (5'-EST)
  • For EST submissions, at least two keywords are required: EST and one of the following three terms.
    • For 5' EST submissions --- 5'-end sequence (5'-EST)
    • For 3' EST submissions --- 3'-end sequence (3'-EST)
    • Other than the above two cases --- unspecified EST
  • For 3' ESTs, to indicate whether your sequences correspond to the anti-sense or the sense strand, please include one of the following two COMMENTs.
For anti-sense strand:
Entry   Feature   Location  Qualifier  Value
        COMMENT             line       3'-EST sequences are presented as anti-sense strand.
For sense strand:
Entry   Feature   Location  Qualifier  Value
        COMMENT             line       3'-EST sequences are presented as sense strand.
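
The EST keyword rule in this section can be sketched as a small check. This is an illustration only: the function name check_est_keywords is hypothetical, and reading "one of the following three terms" as "exactly one" is an assumption rather than something this page states.

```python
# The three EST type keywords listed in this section.
EST_TYPE_KEYWORDS = {
    "5'-end sequence (5'-EST)",
    "3'-end sequence (3'-EST)",
    "unspecified EST",
}


def check_est_keywords(keywords):
    """Return a list of problems; an empty list means the rule is satisfied."""
    problems = []
    if "EST" not in keywords:
        problems.append("missing keyword: EST")
    type_keywords = [k for k in keywords if k in EST_TYPE_KEYWORDS]
    # Assumption: exactly one of the three type keywords is expected.
    if len(type_keywords) != 1:
        problems.append("expected exactly one EST type keyword")
    return problems


print(check_est_keywords(["EST", "5'-end sequence (5'-EST)"]))
print(check_est_keywords(["EST"]))
```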

 

For HTG Submissions

For HTG submissions, we recommend using keywords to indicate the sequencing status of the HTG data.
See INSDC agreed methodological keywords for the definition of each KEYWORD.

Example I: containing unordered pieces
Entry   Feature   Location  Qualifier  Value
        KEYWORD             keyword    HTG
                            keyword    HTGS_PHASE1
                            keyword    HTGS_DRAFT
Example II: containing only ordered pieces
Entry   Feature   Location  Qualifier  Value
        KEYWORD             keyword    HTG
                            keyword    HTGS_PHASE2

 

For WGS and scaffold CON

Example: WGS draft genome
Entry   Feature   Location  Qualifier  Value
        KEYWORD             keyword    WGS
                            keyword    STANDARD_DRAFT

For WGS and scaffold CON, please select a keyword from the following list.
See INSDC agreed methodological keywords for the definition of each KEYWORD.

  • STANDARD_DRAFT
  • HIGH_QUALITY_DRAFT
  • IMPROVED_HIGH_QUALITY_DRAFT
  • NON_CONTIGUOUS_FINISHED

 

DBLINK

Example: DBLINK in annotation file
Entry   Feature   Location  Qualifier              Value
        DBLINK              project                PRJDB12345
                            biosample              SAMD90000000
                            sequence read archive  DRR999000
                            sequence read archive  DRR999001

 

locus_tag

For whole-genome-scale submissions with many annotated features, we recommend using the qualifier locus_tag on the Biological Features that indicate protein products (CDSs) and transcripts (rRNA, tRNA, and so on).
The locus_tag prefix and the BioProject ID should be registered in the DDBJ BioProject Database in advance.

 

source: ff_definition

Example: ff_definition in annotation file
Entry   Feature   Location  Qualifier      Value
        source    1..516    organism       Mus musculus
                            mol_type       mRNA
                            ff_definition  Mus musculus mRNA, clone: @@[clone]@@
                            clone          PC0110
Value formats of ff_definition
Categories                                        Format for the value of ff_definition
WGS                                               [scientific name] DNA, contig: [contig id], [other information]
BAC/YAC genomic clones in unfinished phase (HTG)  [scientific name] DNA, chromosome [chromosome, map], [BAC/YAC] clone: [clone name],*** SEQUENCING IN PROGRESS ***
BAC/YAC genomic clones in finished phase          [scientific name] DNA, chromosome [chromosome, map], [BAC/YAC] clone: [clone name]
EST                                               [scientific name] [mol_type], clone: [clone name], [other information]
EST                                               [scientific name] cDNA, clone: [clone name], [other information]
GSS                                               [scientific name] DNA, clone: [clone name], [other information]
STS                                               [scientific name] DNA, [chromosome, map], [marker name], sequence tagged site
Others                                            Please contact us before your submission, if necessary.
  • ff_definition is a Qualifier that is not defined in The DDBJ/EMBL/GenBank Feature Table: Definition.
  • The Qualifier: ff_definition can be described on source, one of the Biological Features.
  • If necessary, you can describe ff_definition in an entry, but only once per entry.
  • The value of ff_definition is used for the DEFINITION line in the DDBJ flat file format.
    Please refer to The relationships between annotation file and DDBJ flat file and Sample Annotation File.
  • In the Value of ff_definition, a meta description (e.g. @@[clone]@@) can be used to quote the values of other qualifiers. The meta description, a Qualifier name enclosed by "@@[" and "]@@", is replaced by the value of the target Qualifier ("clone" in the above sample) when ff_definition is reflected in the DEFINITION line of the DDBJ flat file.
  • In principle, DEFINITION is described according to the above table. However, if you would like to enter the values of ff_definition qualifiers yourself, please contact us before your submission.
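
The replacement of a meta description can be illustrated with a short sketch. This is not DDBJ's implementation. The function name expand_ff_definition is hypothetical, and leaving a placeholder untouched when the target qualifier is absent is an assumption made for this example.

```python
import re

# A Qualifier name enclosed by "@@[" and "]@@", e.g. @@[clone]@@.
META_DESCRIPTION = re.compile(r"@@\[([^\]]+)\]@@")


def expand_ff_definition(template: str, qualifiers: dict) -> str:
    """Replace each @@[name]@@ with the value of the qualifier called name."""

    def substitute(match):
        name = match.group(1)
        # Assumption: keep the placeholder as-is if the qualifier is missing.
        return qualifiers.get(name, match.group(0))

    return META_DESCRIPTION.sub(substitute, template)


# Qualifiers of the source feature from the example above.
source_qualifiers = {
    "organism": "Mus musculus",
    "mol_type": "mRNA",
    "clone": "PC0110",
}
print(expand_ff_definition("Mus musculus mRNA, clone: @@[clone]@@", source_qualifiers))
# prints: Mus musculus mRNA, clone: PC0110
```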

 

assembly_gap: Sequencing Gap Region

In whole-genome-scale sequencing such as HTG, or in large-scale assembled EST sequences such as the TSA division, entries may contain sequencing gaps that result from the assembly process or from regions that are difficult to read. You can indicate such gaps by writing "n" in the sequence. In the annotation file, you have to indicate the regions of sequencing gaps with assembly_gap features.

Example: assembly_gap in annotation file
Entry   Feature       Location  Qualifier         Value
        assembly_gap  101..200  estimated_length  unknown
                                gap_type          within scaffold
                                linkage_evidence  paired-ends
  • Though the assembly_gap feature is one of the Biological Features, its format is slightly different from the others.
  • You cannot use join, order, or complement in the Location of assembly_gap features.

Length of the gap is unknown

The location span of the assembly_gap feature for a gap of unknown length has to be specified by the submitter; the specified gap length has to be reasonable (1000 or less) and is represented by "n"s in the sequence.
The Value of Qualifier: estimated_length on the assembly_gap feature must be unknown.

For transcriptome records (TSA division), the value of estimated_length on assembly_gap features must be an integer, not "unknown".

Length of the gap is estimated

The location span of the assembly_gap feature for a gap of "known" length should correspond to the number of "n"s in the sequence. The Value of Qualifier: estimated_length on the assembly_gap feature must be known.
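
Because assembly_gap locations have to match the runs of "n" in the sequence, candidate locations can be derived from the sequence itself. The sketch below is only an illustration: find_gap_spans and its min_length parameter are hypothetical, and whether a short run of "n" is a gap or just ambiguous bases is a judgment the submitter has to make.

```python
import re


def find_gap_spans(sequence: str, min_length: int = 1):
    """Return 1-based inclusive (start, end) spans for each run of 'n'."""
    spans = []
    for match in re.finditer(r"n+", sequence.lower()):
        if match.end() - match.start() >= min_length:
            # match.start() is 0-based; match.end() is exclusive, so it
            # already equals the 1-based inclusive end position.
            spans.append((match.start() + 1, match.end()))
    return spans


# Toy sequence: 100 bases, a 100-base gap, then 100 more bases.
seq = "acgt" * 25 + "n" * 100 + "acgt" * 25
for start, end in find_gap_spans(seq):
    print(f"assembly_gap\t{start}..{end}")
# prints: assembly_gap	101..200
```

The span found for the toy sequence matches the Location 101..200 used in the example above.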

 

TOPOLOGY

Example: TOPOLOGY in annotation file
Entry   Feature   Location  Qualifier  Value
        TOPOLOGY            circular
  • Please enter circular as the Qualifier of the TOPOLOGY feature when the whole nucleotide molecule is circular, that is, when its first and last positions are joined in the actual molecule.
    e.g. the complete genome sequence of a circular virus
  • In the DDBJ flat file, topology is indicated in the LOCUS line. See also Sample annotation file.

 

TPA/TSA: PRIMARY_CONTIG, Citation of Primary Entries

Example: PRIMARY_CONTIG in annotation file
Entry   Feature         Location   Qualifier      Value
        PRIMARY_CONTIG  1..438     entry          ZZ000010.1
                                   primary_bases  1..438
        PRIMARY_CONTIG  377..696   entry          ZZ000011.1
                                   primary_bases  1..320
                                   complement
        PRIMARY_CONTIG  590..1191  entry          ZZ000022.0
                                   primary_bases  1..601
Qualifiers available for PRIMARY_CONTIG
Qualifier      Remarks for the value description
entry          Accession number of the cited primary entry (with version number)
primary_bases  The base span of the cited primary sequence. Example: 1..500
complement     Indicates that the complementary strand of the primary sequence is cited
  • Please specify TPA as the value of DATATYPE/type, or TSA as the value of DIVISION/division, in the annotation file.
  • PRIMARY_CONTIG, entry, and primary_bases are the Feature and Qualifiers prepared to describe the alignments of primary entries for TPA/TSA submissions.
  • In PRIMARY_CONTIG, it is necessary to refer to the accession number(s) (with version) in the primary database and to enter the base spans of the primary sequences that contribute to the TPA/TSA sequence.
  • For the value of entry, please enter the cited accession number (with version).
  • If the primary entry has been submitted to DDBJ/EMBL-Bank/GenBank, a version number is required for the accession number. If the primary entry is not public, please use 0 [zero] for the version. e.g. ZZ000022.0
  • For the value of primary_bases, please enter the base span cited from the primary sequence.
  • You cannot use join, order, or complement in the Location column. Please describe each PRIMARY_CONTIG with its own location, even within the same entry.
  • If the primary sequence corresponds to the reverse strand of the TPA/TSA sequence, please add the complement qualifier.
  • For details, refer to Sample Annotation File and The relationships between annotation file and DDBJ flat file.
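
The rules for entry values and base spans can be sketched as two small parsers. This is a minimal illustration under stated assumptions: the function names parse_primary_entry and parse_span are hypothetical, and the accession itself is not validated beyond being non-empty.

```python
def parse_primary_entry(value: str):
    """Split 'ACCESSION.VERSION' into (accession, version, is_public).

    Version 0 marks a primary entry that is not public yet.
    """
    accession, sep, version_text = value.rpartition(".")
    if not sep or not accession or not version_text.isdigit():
        raise ValueError(f"expected ACCESSION.VERSION, got {value!r}")
    version = int(version_text)
    return accession, version, version != 0


def parse_span(span: str):
    """Parse a plain 'start..end' span; join, order, complement are rejected."""
    start_text, sep, end_text = span.partition("..")
    if not sep or not start_text.isdigit() or not end_text.isdigit():
        raise ValueError(f"expected START..END, got {span!r}")
    start, end = int(start_text), int(end_text)
    if start < 1 or start > end:
        raise ValueError(f"invalid span: {span!r}")
    return start, end


print(parse_primary_entry("ZZ000022.0"))
print(parse_span("590..1191"))
```

A reverse-strand citation is expressed with the separate complement qualifier, which is why parse_span refuses a value such as complement(1..320).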