TPA (Third Party Data) is a nucleotide sequence data collection in which each entry is built by assembling primary entries made public by the International Nucleotide Sequence Database Collaboration (INSDC: DDBJ/EMBL-Bank/GenBank) and/or the Trace Archive, with additional feature annotation determined by the TPA submitter using experimental or inferential methods. Such assemblies fall into two cases: those built from one or more primary entries only, and those that also contain newly determined sequence. TPA sequence data should be submitted to DDBJ/EMBL-Bank/GenBank as part of the process of publishing biological research, in the same way as primary nucleotide sequences.
Reference: Cochrane, G. et al. (2006) OMICS 10(2): 105-113.
- Definition of a primary entry for TPA
- Primary entries used to build a TPA sequence are entries that have been experimentally determined and are publicly available in the DDBJ/EMBL-Bank/GenBank databases. Each primary entry must be identified in the TPA entry.
A primary entry may not yet be public when the TPA sequence is submitted; however, all primary entries must be public by the time the TPA sequence is released.
- Acceptable TPA sequence data
- To distinguish annotation supported by wet-lab experimental evidence from inferred annotation, the TPA dataset is divided into TPA:experimental and TPA:inferential. Two further classes, TPA:assembly and TPA:specialist_db, cover assembly records and records from specialist databases.
Please refer to the detailed list of TPA rules.
- TPA:experimental describes records that include functional annotation derived at least in part from peer-reviewed wet-lab experimental investigation.
- TPA:inferential describes records that include functional annotation derived from peer-reviewed bioinformatic investigation.
- TPA:assembly describes records reporting an assembly or reassembly whose generation, whether purely informatic or informed by experimentation, has been subject to peer review. Annotation may or may not be present, and for this class it need not have been part of the peer review.
- TPA:specialist_db describes records whose sequences are submitted from an existing authoritative public database that is built from INSDC sequence data and is described in an accepted peer-reviewed publication. Such a database is therefore recognized as comprehensive, as adding value, and as being maintained over the long term.
[Note] Until 2005, only entries supported by biological (wet-lab) experiments were accepted in TPA. Since 2006, entries not supported by wet-lab experiments have also been accepted, provided they meet the requirements of the TPA Submission Guidelines.
- The following cases are NOT acceptable in TPA:
- Annotation consisting only of repeat features.
- Annotation produced by an automated tool, such as GeneMark, tRNAscan-SE or ORF finder, where no further evidence, experimental or otherwise, is presented for it. In such cases the annotation has not been subject to the peer review of the publication.
- A record representing a completely sequenced genome that includes only features without assigned gene symbols or product identifiers, none of which has wet-lab experimental evidence.
- Notes on the TPA submission
- Consensus sequences obtained from multiple species are not acceptable.
- The sequences of primary entries used to assemble a TPA sequence must already have been submitted to INSDC as ‘primary data (i.e. not TPA)’ or to the Trace Archive. If your TPA sequence contains a region that cannot be obtained from INSDC or the Trace Archive but that you have determined experimentally yourself, you must first submit that region to DDBJ or the Trace Archive.
- Before a TPA sequence can be made public, the evidence supporting the sequence or its annotation must be published in a peer-reviewed journal.
- To describe the correspondence between regions of the TPA sequence and regions of the primary entries, prepare both locations: the span on the TPA sequence and the matching span on each primary entry.
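As a minimal sketch of that correspondence, assuming each region is recorded as a pair of 1-based inclusive spans of equal length (no insertions or deletions inside the span) plus a flag for the complementary strand, the pair can be held in a small record and used to translate a TPA position into a primary position. The class name `SpanPair` is hypothetical, and the complement handling (first TPA base pairs with the last primary base) is an assumption about reverse-complemented spans.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanPair:
    """One TPA region and the primary-entry region it was taken from.

    Hypothetical structure: 1-based inclusive coordinates, and both spans
    must have the same length (this sketch does not model indels).
    """
    tpa_start: int
    tpa_end: int
    primary_id: str
    primary_start: int
    primary_end: int
    complement: bool = False

    def __post_init__(self) -> None:
        if self.tpa_end - self.tpa_start != self.primary_end - self.primary_start:
            raise ValueError("this sketch only handles equal-length spans")

    def to_primary(self, tpa_pos: int) -> int:
        """Map a TPA position to the corresponding primary position."""
        if not self.tpa_start <= tpa_pos <= self.tpa_end:
            raise ValueError(f"{tpa_pos} is outside {self.tpa_start}-{self.tpa_end}")
        offset = tpa_pos - self.tpa_start
        if self.complement:
            # Assumed reverse-strand rule: count back from the primary end.
            return self.primary_end - offset
        return self.primary_start + offset


forward = SpanPair(1, 1000, "ZZ000001.1", 50001, 51000)
reverse = SpanPair(901, 2000, "ZZ000002.1", 25001, 26100, complement=True)
print(forward.to_primary(1))     # 50001
print(reverse.to_primary(901))   # 26100
print(reverse.to_primary(2000))  # 25001
```

Storing both spans explicitly, rather than an offset, keeps the record close to how the locations are written down for submission.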
- Sequence alignment rules between TPA and primary entries
- The TPA sequence must not contain any stretch of more than 50 bp that is not accounted for by a contributing entry.
- A TPA sequence must not differ by more than 5% from the primary sequence(s) used to build or assemble it, counting any unmatched sections. This applies both to the overall length and to each individual primary accession.
- The 5% (or smaller) difference includes sections of the TPA sequence not covered by any primary entry, as well as any differences between the TPA sequence and the primaries used, such as insertions, deletions and substitutions.
- These rules are based on length and similarity.
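The two rules above can be checked with a short script. This is a sketch under stated assumptions: contributing regions are given as 1-based inclusive spans on the TPA sequence, and the number of differing bases (insertions, deletions and substitutions) inside covered regions is supplied from your own alignment, because the sketch does not align sequences itself. It checks only the overall figures, not the per-accession 5% limit. The function names are hypothetical.

```python
def uncovered_stretches(tpa_length, spans):
    """Return (start, end) runs of TPA positions not covered by any span."""
    covered = [False] * (tpa_length + 1)  # index 0 unused; positions are 1-based
    for start, end in spans:
        for pos in range(max(start, 1), min(end, tpa_length) + 1):
            covered[pos] = True
    stretches = []
    run_start = None
    for pos in range(1, tpa_length + 1):
        if not covered[pos] and run_start is None:
            run_start = pos                      # an uncovered run begins
        elif covered[pos] and run_start is not None:
            stretches.append((run_start, pos - 1))
            run_start = None                     # the run has ended
    if run_start is not None:
        stretches.append((run_start, tpa_length))
    return stretches


def check_tpa_rules(tpa_length, spans, mismatched_bases=0,
                    max_gap=50, max_diff_percent=5.0):
    """Apply the 50 bp gap rule and the overall 5% difference rule."""
    gaps = uncovered_stretches(tpa_length, spans)
    longest_gap = max((end - start + 1 for start, end in gaps), default=0)
    uncovered = sum(end - start + 1 for start, end in gaps)
    # Uncovered bases count toward the difference, together with mismatches.
    diff_percent = 100.0 * (uncovered + mismatched_bases) / tpa_length
    return {
        "longest_gap": longest_gap,
        "diff_percent": diff_percent,
        "gap_rule_ok": longest_gap <= max_gap,
        "diff_rule_ok": diff_percent <= max_diff_percent,
    }


# A 2000 bp TPA sequence with a 60 bp gap (1001-1060) and 20 substitutions.
result = check_tpa_rules(2000, [(1, 1000), (1061, 2000)], mismatched_bases=20)
print(result)
# gap_rule_ok is False (60 bp > 50 bp); diff_percent is 4.0, so the 5% rule passes
```

The example shows that the two rules are independent: a record can stay within the 5% limit and still fail because a single uncovered stretch is longer than 50 bp.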
- TPA-specific aspects of the DDBJ flat file
- The LOCUS line shows the taxonomic division, except for CON and TSA entries.
- The DEFINITION line begins with “TPA_exp:” (for TPA:experimental) or “TPA_inf:” (for TPA:inferential).
- The KEYWORDS line contains one of the following sets of values:
  for TPA:experimental: Third Party Data; TPA; TPA:experimental.
  for TPA:inferential: Third Party Data; TPA; TPA:inferential.
  for TPA:assembly: Third Party Data; TPA; TPA:assembly.
  for TPA:specialist_db: Third Party Data; TPA; TPA:specialist_db.
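- The PRIMARY block lists the base spans of primary entries that contribute to regions of the TPA sequence: for each region it gives the TPA span, the primary entry's identifier, the primary span and, in the COMP column, "c" when the complementary strand is used.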
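The correspondence between TPA class, KEYWORDS values and DEFINITION prefix listed above can be captured in a lookup table. A minimal sketch: the helper names are hypothetical, only the two DEFINITION prefixes stated above (“TPA_exp:” and “TPA_inf:”) are included, and the KEYWORDS line is assumed to be already joined onto a single line.

```python
# KEYWORDS values per TPA class, as listed above.
KEYWORDS = {
    "experimental": "Third Party Data; TPA; TPA:experimental.",
    "inferential": "Third Party Data; TPA; TPA:inferential.",
    "assembly": "Third Party Data; TPA; TPA:assembly.",
    "specialist_db": "Third Party Data; TPA; TPA:specialist_db.",
}
# Only these two prefixes are given in the text; other classes have none here.
DEFINITION_PREFIX = {"experimental": "TPA_exp:", "inferential": "TPA_inf:"}


def tpa_class_from_keywords(keywords_line):
    """Return the TPA class named in a KEYWORDS line, or None if it matches none."""
    for tpa_class, value in KEYWORDS.items():
        if keywords_line.strip() == value:
            return tpa_class
    return None


def definition_prefix_ok(tpa_class, definition_line):
    """Check that the DEFINITION line starts with the expected prefix, if one is defined."""
    prefix = DEFINITION_PREFIX.get(tpa_class)
    return prefix is None or definition_line.startswith(prefix)


cls = tpa_class_from_keywords("Third Party Data; TPA; TPA:experimental.")
print(cls)  # experimental
print(definition_prefix_ok(cls, "TPA_exp: Homo sapiens GAPD gene ..."))  # True
```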
Sample TPA flat file
LOCUS       BR000000                2000 bp    DNA     linear   HUM 17-SEP-2006
DEFINITION  TPA_exp: Homo sapiens GAPD gene for glyceraldehyde-3-phosphate
            dehydrogenase, complete cds.
ACCESSION   BR000000
VERSION     BR000000.1
KEYWORDS    Third Party Data; TPA; TPA:experimental.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
            Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 2000)
  AUTHORS   Mishima,H. and Shizuoka,T.
  TITLE     Direct Submission
  JOURNAL   Submitted (30-NOV-2005) to the DDBJ/EMBL/GenBank databases.
            Contact:Hanako Mishima
            National Institute of Genetics, DNA Data Bank of Japan; Yata 1111,
            Mishima, Shizuoka 411-8540, Japan
REFERENCE   2
  AUTHORS   Mishima,H., Shizuoka,T. and Fuji,I.
  TITLE     Glyceraldehyde-3-phosphate dehydrogenase of human
  JOURNAL   TPA Biol Chem 10, 50-59 (2006)
COMMENT
PRIMARY     TPA_SPAN            PRIMARY_IDENTIFIER PRIMARY_SPAN        COMP
            1-1000              ZZ000001.1         50001-51000
            101-200             ZZ000003.1         1-100
            501-600             ZZ000003.1         101-200
            901-2000            ZZ000002.1         25001-26100         c
            1451-1550           ZZ000003.1         201-300
FEATURES             Location/Qualifiers
     source          1..2000
                     /db_xref="taxon:9606"
                     /mol_type="genomic DNA"
                     /organism="Homo sapiens"
     CDS             join(153..200,501..600,1451..1500)
                     /codon_start=1
                     /gene="GAPD"
                     /product="glyceraldehyde-3-phosphate dehydrogenase"
                     /protein_id="FAA00000.1"
                     /transl_table=1
                     /translation="MWYQSLVIIEKLNLEANIGKLINTKDNINIRCRLSHTEEHSWHS
                     NNSQLNLIVDLIYNFYINWSK"
BASE COUNT      522 a    493 c    524 g    461 t
ORIGIN
        1 attaatataa gctaaatatg tttttcaata tatattgata atagaatatc aacaatttgg
                    :
          -- The rest of sequence is omitted --
                    :
//
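To work with a PRIMARY block programmatically, each row can be split into TPA span, primary identifier, primary span and the optional COMP flag. A minimal sketch, assuming one row per line with whitespace-separated columns (TPA_SPAN, PRIMARY_IDENTIFIER, PRIMARY_SPAN, optional COMP) and the header line already removed; the function and type names are hypothetical, and the rows are copied from the sample entry BR000000.

```python
from typing import List, NamedTuple, Tuple


class PrimaryRow(NamedTuple):
    tpa_span: Tuple[int, int]
    primary_id: str
    primary_span: Tuple[int, int]
    complement: bool


def parse_span(text: str) -> Tuple[int, int]:
    """Turn '901-2000' into (901, 2000)."""
    start, end = text.split("-")
    return int(start), int(end)


def parse_primary_block(lines: List[str]) -> List[PrimaryRow]:
    """Parse PRIMARY block rows; a fourth column 'c' marks the complementary strand."""
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        complement = len(fields) > 3 and fields[3] == "c"
        rows.append(PrimaryRow(parse_span(fields[0]), fields[1],
                               parse_span(fields[2]), complement))
    return rows


SAMPLE = """\
1-1000              ZZ000001.1         50001-51000
101-200             ZZ000003.1         1-100
501-600             ZZ000003.1         101-200
901-2000            ZZ000002.1         25001-26100         c
1451-1550           ZZ000003.1         201-300
"""

rows = parse_primary_block(SAMPLE.splitlines())
for row in rows:
    tpa_len = row.tpa_span[1] - row.tpa_span[0] + 1
    primary_len = row.primary_span[1] - row.primary_span[0] + 1
    strand = "complement" if row.complement else "forward"
    print(row.primary_id, tpa_len, primary_len, strand)

# Every primary entry cited in the block (each must be identified and public).
print(sorted({row.primary_id for row in rows}))
```

The parsed spans can then feed a coverage check such as the 50 bp and 5% rules, since the TPA_SPAN column gives exactly the regions accounted for by contributing entries.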