Representative submissions of identical sequences for variation studies

Variation studies based on re-sequencing projects have recently become more common, and the volume of sequence data from these projects has grown accordingly.
DDBJ (INSDC) basically accepts all sequence data, regardless of source and sequence identity. However, if this policy were applied strictly, some of the data would be highly redundant.

To allow variation-study data to be normalised, DDBJ also accepts a single submission that represents multiple identical sequences, with the frequency and total sample number described by the /haplotype qualifier of the source feature and/or the /frequency qualifier of the variation feature.

Representative submission for variation studies does NOT mean that all identical (or similar) sequences derived from the same species should be represented by a single sequence entry.
So that research data can be evaluated properly, DDBJ recommends normalising the data for variation studies into an appropriate set of entries; basically, the number of entries should equal the number of sequence polymorphisms multiplied by the number of sampled populations.

sequence polymorphism
a unit of sequence variation that can carry a unique description in /haplotype, /allele and/or other qualifiers.
sampled population
a unit of observed samples that can carry a unique description in /geo_loc_name, /lat_lon, /collection_date, /host and/or other qualifiers.

For example, suppose that a study of a locus in cat genomes, comparing Japan with the USA, finds the three haplotypes shown in the table below, and that the sequences within each haplotype are identical. DDBJ can accept these results as a submission of 231 sequence entries, one for each individual; however, such a set of sequence data would be very redundant for both submitters and users.

polymorphism (haplotype)     A     B     C   total
Japan                       75    38     0     113
USA                         26    32    60     118
total                      101    70    60     231

Since three types of identical sequence were observed, it would be possible to submit only three representative sequence entries to DDBJ for the publication of this study.
However, that would make it difficult for users to understand what kinds of samples were used in the study.
Therefore, it is strongly recommended to submit five representative entries to DDBJ (there are 6 possible patterns, i.e. 3 haplotypes x 2 countries, but haplotype C was not observed in Japan), each with a source feature described as in the following example.
Furthermore, if the samples were observed over time, you may also wish to consider using the /collection_date qualifier.
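
The counting in this example can be sketched in a few lines of Python. This is only an illustration, not a DDBJ tool: the `counts` dictionary and the `representative_entries` function are hypothetical names, and the sketch assumes that the denominator in the /haplotype description is the total number of samples from the same country, as in "75 in 113" for haplotype A in Japan.

```python
# Observed counts from the cat example: counts[country][haplotype].
counts = {
    "Japan": {"A": 75, "B": 38, "C": 0},
    "USA": {"A": 26, "B": 32, "C": 60},
}


def representative_entries(counts):
    """Return one record per (country, haplotype) pair that was actually observed."""
    entries = []
    for country, by_haplotype in counts.items():
        total = sum(by_haplotype.values())  # number of samples in this population
        for haplotype, n in by_haplotype.items():
            if n == 0:
                continue  # an unobserved combination gets no entry
            entries.append({
                "geo_loc_name": country,
                # Assumed format: "<haplotype> [<count> in <population total>]"
                "haplotype": f"{haplotype} [{n} in {total}]",
            })
    return entries


entries = representative_entries(counts)
for entry in entries:
    print(entry["geo_loc_name"], entry["haplotype"])
```

Skipping the zero cell (haplotype C in Japan) is what reduces the 6 possible patterns to 5 representative entries.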

    source        1..365
                  /collection_date="2007"
                  /geo_loc_name="Japan"
                  /haplotype="A [75 in 113]"
                  /mol_type="genomic DNA"
                  /organism="Felis catus"
    variation     124
                  /frequency="75 in 113"
                  /inference="similar to DNA sequence (same 
                  species):INSD:AB012345.1"
                  /replace="t"
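
When many representative entries must be prepared, a source feature like the one above could be generated from a list of qualifiers. The following Python sketch is an assumption-laden illustration: `format_feature` is a hypothetical helper, it only mimics the indentation shown in the example on this page, and it does not wrap long qualifier values or validate anything against the DDBJ submission rules.

```python
def format_feature(key, location, qualifiers):
    """Render one feature in the indented layout of the example on this page.

    key        -- feature key, e.g. "source" or "variation"
    location   -- location string, e.g. "1..365"
    qualifiers -- list of (name, value) pairs, written in the given order
    """
    # Feature key: 4-space indent, padded so the location starts at column 19.
    lines = [f"    {key:<14}{location}"]
    for name, value in qualifiers:
        # Qualifiers are indented to the same column as the location.
        lines.append(f'{" " * 18}/{name}="{value}"')
    return "\n".join(lines)


# Values copied from the Japan / haplotype A example.
source = format_feature("source", "1..365", [
    ("collection_date", "2007"),
    ("geo_loc_name", "Japan"),
    ("haplotype", "A [75 in 113]"),
    ("mol_type", "genomic DNA"),
    ("organism", "Felis catus"),
])
print(source)
```

The output is intended for checking descriptions by eye; the actual submission should follow the formats accepted by the DDBJ submission systems.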