DDBJ Annotated/Assembled Sequences

TPA

From January 2025, TPA-Exp and TPA-Inf submission types will no longer be accepted as new submissions.
TPA (Third Party Data) is a collection of nucleotide sequence data that has been assembled or (re)annotated by a third party (TPA submitters) based on entries already registered in INSDC (called primary entries). Such assemblies cover two cases: those built from one or more primary entries, and those that also contain newly determined sequence. TPA sequence data should be submitted to the INSDC databases (DDBJ/ENA/GenBank) as part of publishing biological research on primary nucleotide sequences.

Reference Literature: Cochrane,G. et al. (2006) OMICS,10(2): 105-113

Definition of primary entry for TPA

Primary entries used to build a TPA sequence are those that have been experimentally determined and are publicly available in the INSDC databases.
Each primary entry must be identified in the TPA entry.

Primary entries are sometimes not yet public when the TPA sequence is submitted. However, they must be made public by the time the TPA sequence is released.

Acceptable TPA sequence data

To distinguish annotation supported by wet-lab experimental evidence from inferred annotation, the TPA dataset is divided into TPA:experimental and TPA:inferential.

Please refer to the detailed list of TPA rules.

TPA:experimental describes records that include functional annotation derived at least in part from peer-reviewed wet-lab experimental investigation.
TPA:inferential describes records that include functional annotation derived from peer-reviewed bioinformatic investigation.
TPA:assembly describes records reporting assembly or reassembly whose generation, whether purely informatic or informed by experimentation, has been subject to peer review. Annotation may or may not be available and does not need to be part of the peer review for this TPA class.
TPA:specialist_db describes records whose sequences are submitted from an existing authoritative public database that is built using INSDC sequence data and is described in an accepted peer-reviewed publication. The existing database is therefore recognized to be comprehensive, to have added value, and to be maintained long term.

[Note] Until 2005, only entries supported by biological (wet-lab) experiments were accepted in TPA. Since 2006, entries not supported by wet-lab experiments have also been included in TPA when they meet the requirements of the TPA Submission Guidelines.

See also: Unacceptable records for TPA

The following cases are NOT acceptable in TPA:

  • Consensus sequences obtained from multiple species.
  • Annotation of repeat features only (and no other features).
  • Annotation generated by an automated tool, such as GeneMark, tRNAscan or ORF finder, for which no further evidence, experimental or otherwise, is presented.
    The annotation in these cases has not been the subject of the peer review of the publication.
  • A record representing a completely sequenced genome that includes only features without assigned gene symbols or product identifiers, none of which has wet-laboratory experimental evidence.

Notes on the TPA submission

  • TPA data can be submitted through the Mass Submission System (MSS). Please use the MSS form site to complete the TPA submission.
  • The accession numbers of primary entries used to assemble a TPA sequence must be cited in the TPA submission.
  • The sequences of primary entries used to assemble a TPA sequence must be submitted to INSDC as ‘primary data (i.e. not TPA)’ or to the Trace Archive. If your TPA sequence contains a region that cannot be obtained from INSDC or the Trace Archive but that you determined experimentally yourself, you must first submit that region to DDBJ or the Trace Archive.
  • For a TPA sequence to be made public, the evidence supporting the sequence or annotation must be shown in a paper in a peer-reviewed journal.
  • To describe the correspondence of sequence regions between the TPA and primary entries, prepare the locations on both the TPA sequence and the primary entries.
  • For a whole-genome-scale assembly submission (e.g. TPA-WGS; Third Party Data-Whole Genome Shotgun), you must register the project in the BioProject database and the biosample in the BioSample database before submitting the TPA data. If you wish to annotate all protein-coding genes and non-protein-coding RNA genes on the assembled sequences, please register a locus_tag prefix when submitting each BioSample.
  • Sample annotation: TPA-WGS annotation

Sequence alignment rules between TPA and primary entries

  • The accession numbers of the primary entries should be described on the COMMENT line or in a PRIMARY block. On COMMENT lines, list the accession numbers, with or without additional information. In a PRIMARY block, you can describe the correspondence of sequence regions between the TPA and primary entries in detail, such as the locations of the sequences.
  • There must be no stretch of more than 50 bp that is not accounted for by any contributing entry.
  • A TPA sequence may not differ by more than 5% from the primary sequence(s) used to build or assemble it, counting any unmatched sections. (This applies to the overall length and to each individual primary accession.)
  • This difference of 5% or less includes sections of the TPA sequence not covered by any primary entry, as well as any differences between the TPA sequence and the primaries used, such as insertions, deletions, and substitutions.
  • These rules are based on length and similarity.
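The 50 bp and 5% rules above can be roughly checked before submission. The Python sketch below is a hypothetical helper (`uncovered_stretches` and `check_tpa_rules` are illustrative names, not part of any DDBJ tool). It assumes TPA spans are 1-based inclusive coordinates, as in the TPA_SPAN column of a PRIMARY block, and it only measures uncovered sections of the whole TPA sequence; insertions, deletions and substitutions must be counted by the caller from its own alignment, and the per-accession check is not covered.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # 1-based, inclusive (start, end) on the TPA sequence


def uncovered_stretches(tpa_length: int, spans: List[Span]) -> List[Span]:
    """Return the regions of the TPA sequence not covered by any primary span."""
    gaps = []
    next_uncovered = 1  # first position not yet known to be covered
    for start, end in sorted(spans):
        if start > next_uncovered:
            gaps.append((next_uncovered, start - 1))
        next_uncovered = max(next_uncovered, end + 1)
    if next_uncovered <= tpa_length:
        gaps.append((next_uncovered, tpa_length))
    return gaps


def check_tpa_rules(tpa_length: int, spans: List[Span], differing_bases: int = 0,
                    max_gap: int = 50, max_diff_fraction: float = 0.05) -> List[str]:
    """Hypothetical pre-submission check of the 50 bp and 5% rules.

    differing_bases: insertions, deletions and substitutions counted by the
    caller from its own alignment (this sketch does not align sequences).
    """
    problems = []
    gaps = uncovered_stretches(tpa_length, spans)
    # Rule 1: no uncovered stretch longer than max_gap bp.
    for start, end in gaps:
        if end - start + 1 > max_gap:
            problems.append(f"uncovered stretch {start}-{end} exceeds {max_gap} bp")
    # Rule 2: uncovered bases plus differing bases must stay within 5% overall.
    uncovered = sum(end - start + 1 for start, end in gaps)
    diff_fraction = (uncovered + differing_bases) / tpa_length
    if diff_fraction > max_diff_fraction:
        problems.append(f"difference {diff_fraction:.1%} exceeds {max_diff_fraction:.0%}")
    return problems


if __name__ == "__main__":
    # TPA_SPAN column of the PRIMARY block in the non-TPA-assembly sample.
    sample_spans = [(1, 211), (195, 352), (339, 533), (526, 789),
                    (754, 1022), (1005, 1198), (1002, 1203)]
    print(check_tpa_rules(1203, sample_spans))  # → []
```

Sorting the spans first lets overlapping spans, which are common in PRIMARY blocks, be merged in a single pass.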

Aspects of TPA on DDBJ flat file

  • The LOCUS line shows the taxonomic division, except for CON and TSA entries.
  • The DEFINITION line begins with “TPA_exp:” (for TPA:experimental) or “TPA_inf:” (for TPA:inferential); the TPA-assembly sample below begins with “TPA_asm:”.
  • One of the following sets of values is shown on the KEYWORDS line:
    • for TPA:experimental: Third Party Data; TPA; TPA:experimental.
    • for TPA:inferential: Third Party Data; TPA; TPA:inferential.
    • for TPA:assembly: Third Party Data; TPA; TPA:assembly.
    • for TPA:specialist_db: Third Party Data; TPA; TPA:specialist_db.
  • The PRIMARY block gives the base spans, taken from the sequences of the primary entries, that contribute to each region of the TPA sequence.
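A record's TPA class can be read back from these lines. The Python sketch below is hypothetical (the function names are illustrative); it takes the KEYWORDS value without the `KEYWORDS` label and recognizes only the DEFINITION prefixes that appear on this page. The `TPA_asm:` prefix is taken from the TPA-assembly sample; prefixes for other classes are not handled.

```python
from typing import Optional

# TPA classes listed for the KEYWORDS line on this page.
TPA_CLASSES = ("TPA:experimental", "TPA:inferential", "TPA:assembly", "TPA:specialist_db")

# DEFINITION-line prefixes seen on this page ("TPA_asm:" comes from the
# TPA-assembly sample). Any other prefix is treated as unknown.
DEFINITION_PREFIXES = {
    "TPA_exp:": "TPA:experimental",
    "TPA_inf:": "TPA:inferential",
    "TPA_asm:": "TPA:assembly",
}


def tpa_class_from_keywords(keywords: str) -> Optional[str]:
    """Return the TPA class named in a KEYWORDS value, or None if it is not a TPA record."""
    # "WGS; Third Party Data; TPA; TPA:assembly." -> ["WGS", "Third Party Data", ...]
    terms = [term.strip() for term in keywords.strip().rstrip(".").split(";")]
    if "Third Party Data" not in terms or "TPA" not in terms:
        return None
    for term in terms:
        if term in TPA_CLASSES:
            return term
    return None


def tpa_class_from_definition(definition: str) -> Optional[str]:
    """Return the TPA class implied by the DEFINITION prefix, or None."""
    for prefix, tpa_class in DEFINITION_PREFIXES.items():
        if definition.startswith(prefix):
            return tpa_class
    return None


if __name__ == "__main__":
    print(tpa_class_from_keywords("Third Party Data; TPA; TPA:inferential."))
    print(tpa_class_from_definition("TPA_inf: Ladona fulva ELOVL9 mRNA"))
```

Comparing both results is a quick consistency check between the DEFINITION and KEYWORDS lines of one record.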

Samples of TPA flat files

Example of non-TPA-assembly
In general, a TPA entry submitted to DDBJ is assigned an accession number consisting of either 2 letters and 6 digits or 4 letters and 8 digits.
LOCUS       BR000000              1203 bp    DNA    linear   INV 24-OCT-2023
DEFINITION  TPA_inf: Ladona fulva ELOVL9 mRNA for elongation of very 
            long chain fatty acids protein 9, complete cds
ACCESSION   BR000000
VERSION     BR000000.1
KEYWORDS    Third Party Data; TPA; TPA:inferential.
SOURCE      Ladona fulva (scarce chaser)
  ORGANISM  Ladona fulva
            Eukaryota; Metazoa; Ecdysozoa; Arthropoda; Hexapoda; 
            Insecta; Pterygota; Palaeoptera; Odonata; Epiprocta;
            Anisoptera; Libellulidae; Ladona.
REFERENCE   1  (bases 1 to 1203)
  AUTHORS   Mishima,H. and Shizuoka,T.
  TITLE     Direct Submission
  JOURNAL   Submitted (08-SEP-2022) to the DDBJ/EMBL/GenBank databases.
            Contact:Hanako Mishima
            National Institute of Genetics, DNA Data Bank of Japan; Yata 1111,
            Mishima, Shizuoka 411-8540, Japan
REFERENCE   2
  AUTHORS   Mishima,H., Shizuoka,T. and Fuji,I.
  TITLE     Molecular basis of wax-based color change and UV
            reflection in dragonflies
  JOURNAL   Elife 8, e43045 (2019)
COMMENT    THIRD PARTY DATABASE: This TPA record uses data from INSD 
             entry ********.*
PRIMARY     TPA_SPAN            PRIMARY_IDENTIFIER PRIMARY_SPAN        COMP
            1-211               ZZ000001.1         558648-558708 
            195-352             ZZ000012.1         465516-465706       c
            339-533             ZZ000101.1         465272-465352 
            526-789             ZZ123456.1         464731-464787       c
            754-1022            ZZ234567.1         462998-463103
            1005-1198           ZZ234568.1         462269-462405       c
            1002-1203           ZZ345679.1         460365-460532       c
FEATURES             Location/Qualifiers
     source          1..1203
                     /db_xref="taxon:123851"
                     /geo_loc_name="missing: third party data"
                     /collection_date="missing: third party data"
                     /mol_type="genomic DNA"
                     /organism="Ladona fulva"
     CDS             join(25..259,361..786,821..960) 
                     /codon_start=1
                     /gene="ELOVL9"
                     /product="elongation of very long chain fatty
                     acids protein 9"
                     /protein_id="xxxxxxxxxx.1"
                     /transl_table=1
                     /translation="MAAIASQVVDKYFEFMETKSDPRTSEWFLMSGP
                     GPLVFVLVTYLYFCNKVGPQWMEKRKPYDLKPLLIAYNLIQVLFSVW
                     LVWEGLQGGWLHHYNLKCQPVDYSNDPVAIRMANACWWYFFCKLIEL
                     LDTVFFVLRKKNNQISFLHLYHHTLMPVCAWIGTKFLPGGHGTFLGV
                     INSFVHIIMYFYYMMSAMGPQYQKYIWWKKYLTTLQMVQFCMIFIHS
                     SQLLIYECNYPKTIIVLLGINALFFLGLFGNFYRKSYKARNMKVE
"
BASE COUNT          214 a          156 c          174 g          257 t
ORIGIN
        1 atggcggcga tcgctagcca ggttgttgac aagtatttcg agttcatgga gaccaagagc
        :
        -- The rest of sequence is omitted --
//
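The PRIMARY block in the sample above is a small column-aligned table. As a hypothetical illustration (the `PrimarySpan` and `parse_primary_block` names are not from any DDBJ tool), the Python sketch below reads each row into a record. It splits rows on whitespace, which is a simplification of the real column layout, and treats a `c` in the COMP column as marking a complementary primary span.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PrimarySpan:
    tpa_start: int           # TPA_SPAN start (1-based)
    tpa_end: int             # TPA_SPAN end
    primary_identifier: str  # accession.version of the primary entry
    primary_start: int       # PRIMARY_SPAN start
    primary_end: int         # PRIMARY_SPAN end
    complement: bool         # True when the COMP column holds "c"


def _parse_range(text: str) -> Tuple[int, int]:
    start, end = text.split("-")
    return int(start), int(end)


def parse_primary_block(block: str) -> List[PrimarySpan]:
    """Parse PRIMARY block rows, assuming whitespace-separated columns."""
    rows = []
    for line in block.splitlines():
        fields = line.split()
        # Skip blank lines and the column-header line that starts with "PRIMARY".
        if not fields or fields[0] == "PRIMARY":
            continue
        tpa_start, tpa_end = _parse_range(fields[0])
        primary_start, primary_end = _parse_range(fields[2])
        complement = len(fields) > 3 and fields[3] == "c"
        rows.append(PrimarySpan(tpa_start, tpa_end, fields[1],
                                primary_start, primary_end, complement))
    return rows


# PRIMARY block copied from the non-TPA-assembly sample.
SAMPLE = """\
PRIMARY     TPA_SPAN            PRIMARY_IDENTIFIER PRIMARY_SPAN        COMP
            1-211               ZZ000001.1         558648-558708
            195-352             ZZ000012.1         465516-465706       c
            339-533             ZZ000101.1         465272-465352
            526-789             ZZ123456.1         464731-464787       c
            754-1022            ZZ234567.1         462998-463103
            1005-1198           ZZ234568.1         462269-462405       c
            1002-1203           ZZ345679.1         460365-460532       c
"""

if __name__ == "__main__":
    for row in parse_primary_block(SAMPLE):
        strand = "complement" if row.complement else "forward"
        print(f"TPA {row.tpa_start}-{row.tpa_end} <- {row.primary_identifier} "
              f"{row.primary_start}-{row.primary_end} ({strand})")
```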
Example of TPA-assembly
In general, a TPA-assembly entry submitted to DDBJ is assigned an accession number consisting of either 2 letters and 6 digits or 4 letters and 8 digits.
LOCUS       EZZZ01000001              259680 bp    DNA    linear   VRT 24-OCT-2023
DEFINITION  TPA_asm: Casuarius casuarius DNA, secondary_bubble21.
ACCESSION   EZZZ01000001 EZZZ01000000
VERSION     EZZZ01000001.1
DBLINK      BioProject:PRJDB99999
            Sequence Read Archive:SRR9999990, SRR9999991, 
            SRR9999992, SRR9999993
            BioSample:SAMD99999999
KEYWORDS    WGS; Third Party Data; TPA; TPA:assembly.
SOURCE      Casuarius casuarius (southern cassowary)
  ORGANISM  Casuarius casuarius
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; 
            Euteleostomi; Archelosauria; Archosauria; Dinosauria;
            Saurischia; Theropoda; Coelurosauria; Aves;
            Palaeognathae; Casuariiformes; Casuariidae; Casuarius.
REFERENCE   1  (bases 1 to 259680)
  AUTHORS   Mishima,H. and Shizuoka,T.
  TITLE     Direct Submission
  JOURNAL   Submitted (11-NOV-2022) to the DDBJ/EMBL/GenBank databases.
            Contact:Hanako Mishima
            National Institute of Genetics, DNA Data Bank of Japan; Yata 1111,
            Mishima, Shizuoka 411-8540, Japan
REFERENCE   2
  AUTHORS   Mishima,H., Shizuoka,T. and Fuji,I.
  TITLE     Diploid genome assembly of Casuarius casuarius.
  JOURNAL   Genome Biol Evol (2023)
  REMARK    Publication Status: Available-Online prior to print
            DOI:10.xxx/xxxx/xxxxxx
COMMENT    
            ##Genome-Assembly-Data-START##
            Assembly Method       :: HGAP v. 1.0; Celera Assembler v. 7.0; 
                                     Quiver v. 1.4.0; Sequencher v. 5.1
            Assembly Name         :: MusC56 v1
            Genome Coverage       :: 238x
            Sequencing Technology :: PacBio RS, Illumina GAIIx
            ##Genome-Assembly-Data-END##
            
            Third party assembly of primary data, 
            SRR9999990-SRR9999993.
            This is a diploid assembly of a female cassowary
            individual. The alternate pseudohaplotype (secondary
            bubble) contigs are secondary_bubble21 -
            secondary_bubble181348. The unassigned (non
            bubble hetero) contigs are non_bubble_hetero3148954 -
            non_bubble_hetero3150069. The homologous (non bubble
            other) contigs are
            non_bubble_other181349-non_bubble_other181377.
FEATURES             Location/Qualifiers
     source          1..259680
                     /db_xref="taxon:8787"
                     /geo_loc_name="missing: third party data"
                     /collection_date="missing: third party data"
                     /submitter_seqid="secondary_bubble21"
                     /mol_type="genomic DNA"
                     /organism="Casuarius casuarius"
     CDS             join(36..256,321..597,712..891) 
                     /codon_start=1
                     /locus_tag="ABCDS_000010"
                     /product="hypothetical protein"
                     /protein_id="xxxxxxxxxx.1"
                     /transl_table=1
                     /translation="MSKSIRNPIYPPVKGTVFDQLFYNRLYDYQTEM
                     ANIEHVLKTNFSKYSKGKYNQDIVSDIFGQGIFVVDGEKWKQQRKLA
                     SFFSTRVLRDFSCSVFRRNAFEISGATKSFDMQDILMRCTLDSIFKV
                     GFGIDLNCLEGSSKEGTAFMDPEENDTYLRDIILNFMIAGKDTSANT
                     LSWFLYMLCKNPLIQEKVAQEVRDVVGGQVGDPDELVANITDAALEK
                     MHYL"
     assembly_gap    921..1156 
                     /estimated_length=236
                     /gap_type="within scaffold"
                     /linkage_evidence="paired_ends"

BASE COUNT          54123 a          69116 c          62143 g          62168 t
ORIGIN
        1 aaaaaaagag gttaaaaaat ctgggagttg cttagctaca ctagactgat ccttgaggaa
        :
        -- The rest of sequence is omitted --
//
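Both samples note that a TPA entry is generally assigned an accession of 2 letters and 6 digits or 4 letters and 8 digits. A minimal Python sketch of that shape check follows; the regular expression, the optional `.version` suffix, and the function name are illustrative assumptions. Because primary entries (such as ZZ000001.1 in the PRIMARY block) use the same shapes, a match says nothing about whether an entry is actually TPA.

```python
import re

# Accession shapes described for TPA entries on this page:
# 2 letters + 6 digits, or 4 letters + 8 digits, optionally followed by ".version".
ACCESSION_RE = re.compile(r"(?:[A-Z]{2}\d{6}|[A-Z]{4}\d{8})(?:\.\d+)?")


def has_tpa_accession_shape(accession: str) -> bool:
    """Check only the shape of an accession; it does not prove the entry is TPA."""
    return ACCESSION_RE.fullmatch(accession) is not None


if __name__ == "__main__":
    for acc in ("BR000000", "EZZZ01000001.1", "BR00000"):
        print(acc, has_tpa_accession_shape(acc))
```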

Related pages

  • Data Submission from Genome Project
  • Submission of environmental sequences
  • Data Submission from Transcriptome Project