Trace Archive
Trace Archive has been retired.
See “Access Trace Data” regarding how to access trace data.
Example: TI number 2282248605
curl "https://www.ncbi.nlm.nih.gov/Traces/sra-reads-be/fasta?ti=2282248605&retmode=text"
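The same request can be sketched in Python. This is a minimal illustration that reuses the endpoint and query parameters from the curl example above; the function names are hypothetical, and the endpoint may not remain available.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint copied from the curl example; its continued availability is an assumption.
BASE_URL = "https://www.ncbi.nlm.nih.gov/Traces/sra-reads-be/fasta"


def build_fasta_url(ti: int) -> str:
    """Build the FASTA request URL for one Trace Identifier (TI)."""
    return f"{BASE_URL}?{urlencode({'ti': ti, 'retmode': 'text'})}"


def fetch_fasta(ti: int, timeout: float = 30.0) -> str:
    """Download the FASTA text for a TI (requires network access)."""
    with urlopen(build_fasta_url(ti), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `build_fasta_url(2282248605)` reproduces the URL used in the curl command.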
You may submit capillary sequencing data to DRA. Please select capillary sequencing instruments for the Experiment Instrument.
Real data DRX395641-DRX395673.
Trace Archive overview
DDBJ Trace Archive (DTA) is a permanent repository of DNA sequence chromatograms (traces), base calls, and quality estimates for single-pass reads from various large-scale sequencing projects. DTA is a member of the International Nucleotide Sequence Database Collaboration (INSDC) and collects the data in collaboration with NCBI and EBI. The NCBI Trace Archive issues and manages IDs.
Released data can be searched and retrieved at the NCBI Trace Archive.
DDBJ Sequence Read Archive (DRA) accepts trace data. Please consider submitting trace data to DRA.
Metadata
Some fields are required only for specific combinations of STRATEGY and TRACE_TYPE_CODE. You can check the requirements in the Validation Table. Metadata can be searched at the NCBI Trace Archive.
Required*
May be required, depending upon the trace type and strategy employed*
Metadata Field List
- ACCESSION
- DDBJ/EMBL/GenBank accession number
Type: varchar(30)
Example: AC22227
The ACCESSION is assigned upon deposition to a public repository (DDBJ/EMBL/GenBank). This field will not be applicable to all trace types (primarily WGS). However, if this field contains a valid accession identifier, it facilitates correlation between the primary sequence data (in Trace) and the secondary sequence data (in the public repository).
- AMPLIFICATION_FORWARD*
- The forward amplification primer sequence
Type: varchar(100)
Example: GGATTCTGACTAACGAGC
The AMPLIFICATION_FORWARD field allows submitters to define the primers used to amplify templates for sequencing. This field is required when TRACE_TYPE_CODE=PCR or RT-PCR.
- AMPLIFICATION_REVERSE*
- The reverse amplification primer sequence.
Type: varchar(100)
Example: GGATTCTGACTAACGAGC
The AMPLIFICATION_REVERSE field allows submitters to define the primers used to amplify templates for sequencing. This field is required when TRACE_TYPE_CODE=PCR or RT-PCR.
- AMPLIFICATION_SIZE
- The expected amplification size for a pair of primers.
Type: int
Example: 500
The AMPLIFICATION_SIZE field allows submitters to define the expected amplification size for a pair of primers (defined in the AMPLIFICATION_FORWARD and AMPLIFICATION_REVERSE fields). This number should be given in base pairs. If TRACE_TYPE_CODE=PCR, the amplification size is based on amplification of genomic DNA. If TRACE_TYPE_CODE=RT-PCR, the amplification size is based on amplification of transcript.
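The conditional requirement on the two primer fields can be checked before submission. The sketch below assumes each metadata record is held as a Python dict keyed by field name and that TRACE_TYPE_CODE values are compared exactly as written; the function name is hypothetical.

```python
# Trace types for which the amplification primers are required (from the field descriptions).
PCR_TRACE_TYPES = {"PCR", "RT-PCR"}


def missing_amplification_fields(record: dict) -> list:
    """Return the amplification primer fields that are required but empty."""
    if record.get("TRACE_TYPE_CODE") not in PCR_TRACE_TYPES:
        return []  # primers are not required for other trace types
    required = ("AMPLIFICATION_FORWARD", "AMPLIFICATION_REVERSE")
    return [name for name in required if not record.get(name)]
```

A PCR record that carries only the forward primer would be reported as missing AMPLIFICATION_REVERSE.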
- ANONYMIZED_ID
- Anonymous ID for an individual.
Type: varchar(100)
Example: 2222anonym
Used in projects to maintain the anonymity of donors. In many cases, a controlled-access database can map many ANONYMIZED_ID values in the Trace Archive to a single individual ID for which phenotypic information may be available.
- ATTEMPT
- Number of times the sequencing project has been attempted by the center and/or submitted to the Trace Archive.
Type: tinyint(1-255)
Example: 2
- BASE_FILE
- File name with base calls.
Type: varchar(200)
Example: ./mytraces/123clone.fasta
Trace files that do not include the base calls must provide this information in a separate file. The file designations are recorded in the BASE_FILE field of the metadata file. If base calls are provided in separate files, the information in these files will overwrite any information in the trace (usually *.scf) file. If the base calls that would be provided in the BASE_FILE are the same as the information in the trace file, DO NOT PROVIDE THE FILE. If the center provides the BASE_FILE and QUAL_FILE, then the peak index information should also be provided in a file called PEAK_FILE.
- CENTER_NAME*
- Name of the sequencing center.
Type: varchar(50)
Example: WUGSC
Sequencing centers wishing to submit data must contact the DDBJ Trace Archive administrators to determine a center abbreviation. This abbreviation is used in the CENTER_NAME field. This field has a controlled vocabulary. For the complete list of submitting centers see: http://www.ncbi.nlm.nih.gov/Traces/trace.cgi?view=submitting_centers
These center names are controlled separately from those of the Sequence Read Archive.
- CENTER_PROJECT*
- Center defined project name.
Type: varchar(100)
Example: HBBB
The CENTER_PROJECT reflects a sequencing center’s internal designation for a specific sequencing project. This field can be useful for grouping related traces.
- CHEMISTRY
- Description of the chemistry used in the sequencing reaction.
Type: varchar(50)
Example: BIGDYEV3.0
- CHEMISTRY_TYPE
- Type of chemistry used in the sequencing reaction.
Type: char(50)
Example: P
The CHEMISTRY_TYPE field uses a controlled list.
Accepted values are:
- P: primer
- T: terminator
- CHROMOSOME
- Chromosome to which the trace is assigned.
Type: varchar(8)
Example: 11
The CHROMOSOME field indicates the chromosome to which a trace has been assigned. Gene names or cytogenetic positions are not appropriate substitutes for chromosome information.
- CLIP_QUALITY_LEFT
- Left clip of the read, in base pairs, based on quality analysis.
Type: int
Example: 56
The CLIP_QUALITY_LEFT field indicates the base at the beginning of the sequence at which the read should be clipped due to poor-quality sequence. The given value is the first base of the high-quality region of the trace.
- CLIP_QUALITY_RIGHT
- Right clip of the read, in base pairs, based on quality analysis.
Type: int
Example: 256
The CLIP_QUALITY_RIGHT field indicates the base at the end of the sequence at which the read should be clipped due to poor-quality sequence. The given value is the last base of the high-quality region of the trace.
- CLIP_VECTOR_LEFT*
- Left clip of the read, in base pairs, based on vector sequence.
Type: int
Example: 75
The CLIP_VECTOR_LEFT field indicates the base at the beginning of the sequence at which the read should be clipped due to vector sequence. The given value is the first base of non-vector sequence. This field is required for almost all combinations of STRATEGY and TRACE_TYPE_CODE. This information can be omitted if the INSERT_FLANK_LEFT field is populated or TRACE_TYPE_CODE is PCR or RT-PCR.
- CLIP_VECTOR_RIGHT*
- Right clip of the read, in base pairs, based on vector sequence.
Type: int
Example: 275
The CLIP_VECTOR_RIGHT field indicates the base at the end of the sequence at which the read should be clipped due to vector sequence. The given value is the last base of non-vector sequence. This field is required for almost all combinations of STRATEGY and TRACE_TYPE_CODE. This information can be omitted if the INSERT_FLANK_RIGHT field is populated or TRACE_TYPE_CODE is PCR or RT-PCR.
NOTE: Many centers combine vector and quality analysis, and thus have only one set of clip values. In this case, the set of values should be placed in the CLIP_VECTOR_LEFT/CLIP_VECTOR_RIGHT fields.
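To illustrate how the four clip fields interact, the sketch below combines the quality and vector clips and extracts the retained bases. It assumes clip positions are 1-based and inclusive (the left value is the first kept base, the right value the last kept base) and that a record is a Python dict; both function names are hypothetical.

```python
def combined_clip(record: dict) -> tuple:
    """Return the most restrictive (left, right) clip from the quality and vector fields."""
    lefts = [record[k] for k in ("CLIP_QUALITY_LEFT", "CLIP_VECTOR_LEFT") if record.get(k)]
    rights = [record[k] for k in ("CLIP_QUALITY_RIGHT", "CLIP_VECTOR_RIGHT") if record.get(k)]
    # The later left clip and the earlier right clip bound the usable region.
    return (max(lefts) if lefts else None, min(rights) if rights else None)


def clipped_bases(bases: str, left=None, right=None) -> str:
    """Keep bases left..right, treating both positions as 1-based and inclusive."""
    start = (left - 1) if left else 0      # first kept base as a 0-based index
    end = right if right else len(bases)   # slice end is exclusive, so `right` is kept
    return bases[start:end]
```

With the example values from the four clip fields (56, 256, 75, 275), `combined_clip` yields `(75, 256)`.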
- CLONE_ID*
- The name of the clone from which the trace was derived.
Type: varchar(30)
Example: RP23-1123F10
The CLONE_ID field is used to store the identifier of an individual clone, for example a BAC clone, PAC clone or cDNA clone. If the clone is registered with the Clone Registry (http://www.ncbi.nlm.nih.gov/clone/), standard Clone Registry nomenclature (http://www.ncbi.nlm.nih.gov/clone/content/overview/) should be used.
This field is required for the following combinations of STRATEGY and TRACE_TYPE_CODE:
STRATEGY=cDNA;TRACE_TYPE_CODE=Any
STRATEGY=EST;TRACE_TYPE_CODE=Any
STRATEGY=CLONEEND;TRACE_TYPE_CODE=CLONEEND
STRATEGY=CLONE;TRACE_TYPE_CODE=Any
STRATEGY=ENCODE;TRACE_TYPE_CODE=SHOTGUN; PrimerWalk; CLONEEND
STRATEGY=FINISHING;TRACE_TYPE_CODE=Any
- CLONE_ID_LIST*
- Semicolon-delimited list of clones if the STRATEGY is PoolClone.
Type: varchar(30)
Example: RP23-200A2;RP23-500P1
The CLONE_ID_LIST field is used only if STRATEGY=PoolClone. In this case, the list of clones is provided as a semicolon-delimited list. If the clones are registered with the Clone Registry (http://www.ncbi.nlm.nih.gov/clone/), standard Clone Registry nomenclature (http://www.ncbi.nlm.nih.gov/clone/content/overview/) should be used (see CLONE_ID field).
Note: The number of clones in the list is not limited, but each individual clone name within the list is limited to 30 bytes.
This field is required for the following combination of STRATEGY and TRACE_TYPE_CODE: STRATEGY=PoolClone;TRACE_TYPE_CODE=Any
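A CLONE_ID_LIST value can be split and checked against the 30-byte limit per clone name. This is a minimal sketch; the function name is hypothetical, and measuring the limit in UTF-8 bytes is an assumption.

```python
MAX_CLONE_ID_BYTES = 30  # per-clone limit stated in the CLONE_ID_LIST description


def parse_clone_id_list(value: str) -> list:
    """Split a semicolon-delimited CLONE_ID_LIST and enforce the per-clone size limit."""
    clones = [clone.strip() for clone in value.split(";") if clone.strip()]
    for clone in clones:
        # Assumption: the limit is counted in UTF-8 encoded bytes.
        if len(clone.encode("utf-8")) > MAX_CLONE_ID_BYTES:
            raise ValueError(f"clone name exceeds {MAX_CLONE_ID_BYTES} bytes: {clone}")
    return clones
```

Applied to the example value, the function returns the two clone names `RP23-200A2` and `RP23-500P1`.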
- COLLECTION_DATE*
- The full date, in “Mar 2 2006 12:00AM” format, on which an environmental sample was collected.
Type: datetime
Example: Mar 2 2006 12:00AM
The COLLECTION_DATE field is used to define the date and time at which an environmental sample was collected.
This field is required for the following combinations of STRATEGY and TRACE_TYPE_CODE:
STRATEGY=Env Sample-Geo; TRACE_TYPE_CODE=Any
STRATEGY=Env Sample-Host; TRACE_TYPE_CODE=Any
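The date format shown in the example can be parsed and produced with the standard library. The sketch below assumes English month abbreviations (the default C locale) and a non-padded day and hour as in the example; the function names are hypothetical.

```python
from datetime import datetime

# strptime pattern matching "Mar 2 2006 12:00AM"; %d and %I also accept unpadded numbers.
COLLECTION_DATE_FORMAT = "%b %d %Y %I:%M%p"


def parse_collection_date(value: str) -> datetime:
    """Parse a COLLECTION_DATE string, tolerating repeated whitespace."""
    return datetime.strptime(" ".join(value.split()), COLLECTION_DATE_FORMAT)


def format_collection_date(moment: datetime) -> str:
    """Format a datetime like the example, without zero-padding the day or hour."""
    hour = moment.hour % 12 or 12               # 0 -> 12 for the 12-hour clock
    suffix = "AM" if moment.hour < 12 else "PM"
    return f"{moment.strftime('%b')} {moment.day} {moment.year} {hour}:{moment.minute:02d}{suffix}"
```

The day and hour are formatted by hand because `strftime` would zero-pad them, which the example value does not.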
- CVECTOR_ACCESSION
- Repository (DDBJ/EMBL/GenBank) accession identifier for the cloning vector.
Type: varchar(50)
Example: AY451994
The CVECTOR_ACCESSION field holds the accession number for the cloning vector used. This cloning vector relates to the clone named in the CLONE_ID field.
- CVECTOR_CODE
- Center defined code for the cloning vector.
Type: varchar(50)
Example: PBACE3.6
The CVECTOR_CODE field holds the user-defined identifier for the cloning vector. Submitters are encouraged to submit all vector sequence information to public repositories.
- DEPTH
- Depth (in meters) at which an environmental sample was collected.
Type: float
Example: 10M
The DEPTH field is applicable to water samples and earth samples. If the value of this field is NULL, the sample is assumed to have been taken from the surface of the environment. Although this field is applicable only to environmental samples, it is not required.
- ELEVATION
- Elevation (in meters) at which an environmental sample was collected.
Type: float
Example: 500
If the value of this field is NULL, it is assumed the data were obtained at sea level. The ELEVATION field is applicable only to some environmental sample data and is not a required field.
- ENVIRONMENT_TYPE*
- Type of environment from which an environmental sample was collected.
Type: varchar(250)
Example: sea water
The ENVIRONMENT_TYPE field is used to describe the specific environment from which an environmental sample was taken. While the LATITUDE and LONGITUDE fields describe the location, many types of environment could exist at this location (for example, soil, sludge, tree roots, etc.).
This field would be required for the following combination of STRATEGY and TRACE_TYPE_CODE: STRATEGY=Env Sample-Geo; TRACE_TYPE_CODE=Any
- EXTENDED_DATA
- Extra ancillary information wrapped in an EXTENDED_DATA block, where actual values are provided with a special tag.
Type: varchar()
Example:
<extended_data>
<field name='SamplingSiteMonthChlorophyllLevel'>1.4 mg_mm</field>
<field name='SamplingSiteYearlyChlorophyllLevel'>1.12 mg_mm</field>
<field name='SamplingSiteYearlyChlorophyllLevelStdError'>0.19 mg_mm</field>
</extended_data>
The ‘=’ sign and the field separator character ‘|’ should be excluded from names and their values. No other validity checks will be performed on the data.
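An EXTENDED_DATA block can be generated from name/value pairs while rejecting the two excluded characters. This is a minimal sketch that mirrors the layout of the example block; the function name is hypothetical, and XML-escaping the values is an added precaution rather than a documented requirement.

```python
from xml.sax.saxutils import escape

FORBIDDEN_CHARS = ("=", "|")  # characters excluded from names and values


def build_extended_data(fields: dict) -> str:
    """Render name/value pairs as an <extended_data> block."""
    lines = ["<extended_data>"]
    for name, value in fields.items():
        text = str(value)
        for item in (name, text):
            if any(char in item for char in FORBIDDEN_CHARS):
                raise ValueError(f"'=' and '|' are not allowed: {item}")
        # Escape XML special characters; the attribute is single-quoted as in the example.
        safe_name = escape(name, {"'": "&apos;"})
        lines.append(f"<field name='{safe_name}'>{escape(text)}</field>")
    lines.append("</extended_data>")
    return "\n".join(lines)
```

For instance, `build_extended_data({"SamplingSiteMonthChlorophyllLevel": "1.4 mg_mm"})` produces a block with a single `<field>` element.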
- FEATURE_ID_FILE
- File describing the features and their locations on a chip.
Type: varchar(200)
Example: ./mytraces/chip2.cdf
The FEATURE_ID_FILE provides the location and sequence of the features for a given chip when TRACE_TYPE_CODE="CHIP".
- FEATURE_ID_FILE_NAME*
- Reference to a common FEATURE_ID_FILE which should be submitted first.
Type: varchar(200)
Example:
This field is required when TRACE_TYPE_CODE="CHIP".
- FEATURE_SIGNAL_FILE
- File giving the signal and variance for features on a chip.
Type: varchar(200)
Example: ./mytraces/chip2.signal
The FEATURE_SIGNAL_FILE provides the signal and variance of signal for the features on a given chip when TRACE_TYPE_CODE="CHIP".
- FEATURE_SIGNAL_FILE_NAME*
- Reference to a common FEATURE_SIGNAL_FILE which should be submitted first.
Type: varchar(200)
Example:
This field is required when TRACE_TYPE_CODE="CHIP".
- GENE_NAME
- Gene name or some other common identifier.
Type: varchar(100)
Example: transporter 1
Free text. This field is mainly for TRACE_TYPE_CODE='Re-sequencing' or 'ENCODE'. When a group is analyzing a particular gene, they may want to refer to that gene by its name or some other common identifier.
- HI_FILTER_SIZE
- The largest filter used to stratify an environmental sample.
Type: varchar(50)
Example: 50 micron
The HI_FILTER_SIZE field is applicable only to environmental sample data and is not a required field.
- HOST_CONDITION
- The condition of the host from which an environmental sample was obtained.
Type: varchar(100)
Example: HIV-positive
The HOST_CONDITION field is applicable only to environmental sample data and is used to describe the condition (healthy, sick, etc.) of the host from which a sample was taken.
- HOST_ID*
- Unique identifier for the specific host from which an environmental sample was taken.
Type: varchar(100)
Example: yerkes pedigree #C0479 ‘Clint’
The HOST_ID field is applicable only to environmental sample data and is used to capture the unique name for the specific host from which a sample was obtained.
This field would be required for the following combination of STRATEGY and TRACE_TYPE_CODE: STRATEGY=Env Sample-Host; TRACE_TYPE_CODE=Any
- HOST_LOCATION*
- Specific location on the host from which an environmental sample was collected.
Type: varchar(100)
Example: rumen
The HOST_LOCATION field is applicable only to environmental sample data and is used to describe the specific part of the host from which the sample was obtained, for example: dental plaque, hindgut, root surfaces.
This field would be required for the following combination of STRATEGY and TRACE_TYPE_CODE: STRATEGY=Env Sample-Host; TRACE_TYPE_CODE=Any
- HOST_SPECIES*
- The host from which an environmental sample was obtained.
Type: varchar(100)
Example: Pan troglodytes
The HOST_SPECIES field is applicable only to environmental sample data.
This field would be required for the following combination of STRATEGY and TRACE_TYPE_CODE: STRATEGY=Env Sample-Host; TRACE_TYPE_CODE=Any
- INDIVIDUAL_ID
- Publicly available identifier to denote a specific individual or sample from which a trace was derived.
Type: varchar(100)
Example: NA12345
The INDIVIDUAL_ID field provides a center-specific unique ID that can associate a specific trace with an individual. This will be used primarily for population-based studies.
- INSERT_FLANK_LEFT*
- Flanking sequence at the cloning junction.
Type: varchar(100)
Example: AAGGTGCGATGCAGTGGCAGTAGCAGTGTCGACGTGACGATTCGTCCGGA
The INSERT_FLANK_LEFT field should provide from 50 up to 100 bases of sequence (including linkers) to the left of the cloning junction. This information will allow users to perform their own vector trimming of reads. This field is required for almost all combinations of STRATEGY and TRACE_TYPE_CODE. This field can be omitted if CLIP_VECTOR_LEFT is populated. However, INSERT_FLANK_LEFT is the preferred choice. If there was no cloning step involved in the sequencing, please populate the field with ‘NONE’.
- INSERT_FLANK_RIGHT*
- Flanking sequence at the cloning junction.
Type: varchar(100)
Example: AAGGCGCGATGCAGTGAGCGAGGCTGACGTCGGCTAGCGTCGCGTCGGGT
The INSERT_FLANK_RIGHT field should provide from 50 up to 100 bases of sequence (including linkers) to the right of the cloning junction. This information will allow users to perform their own vector trimming of reads. This field is required for almost all combinations of STRATEGY and TRACE_TYPE_CODE. This field can be omitted if CLIP_VECTOR_RIGHT is populated. However, INSERT_FLANK_RIGHT is the preferred choice. If there was no cloning step involved in the sequencing, please populate the field with ‘NONE’. It is anticipated that if INSERT_FLANK_LEFT is populated, INSERT_FLANK_RIGHT will also be populated. It is not anticipated that a mixture of clip values and junction sequence will be specified (i.e. CLIP_VECTOR_LEFT and INSERT_FLANK_RIGHT populated for the same record).
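The expectations for the flanking-sequence fields can be expressed as a small consistency check. The sketch below assumes a record is a Python dict keyed by field name and treats the expectations as warnings rather than hard errors; the function name and messages are hypothetical.

```python
def check_vector_trimming(record: dict) -> list:
    """Return a list of problems found in the vector-trimming fields of one record."""
    problems = []
    sides = ("LEFT", "RIGHT")
    flanks = {side: record.get(f"INSERT_FLANK_{side}") for side in sides}
    clips = {side: record.get(f"CLIP_VECTOR_{side}") for side in sides}

    for side, flank in flanks.items():
        # 'NONE' marks a record with no cloning step; otherwise expect 50 to 100 bases.
        if flank and flank != "NONE" and not 50 <= len(flank) <= 100:
            problems.append(f"INSERT_FLANK_{side} should hold 50 to 100 bases")

    if bool(flanks["LEFT"]) != bool(flanks["RIGHT"]):
        problems.append("INSERT_FLANK_LEFT and INSERT_FLANK_RIGHT should be populated together")

    # A mixture means one side relies on a flank while the other relies only on a clip value.
    flank_sides = [side for side in sides if flanks[side]]
    clip_only_sides = [side for side in sides if clips[side] is not None and not flanks[side]]
    if flank_sides and clip_only_sides:
        problems.append("clip values and flanking sequences are mixed in one record")
    return problems
```

A record with only CLIP_VECTOR_LEFT and INSERT_FLANK_RIGHT populated would be reported twice: once for the unpaired flank and once for the mixture.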
- INSERT_SIZE*
- Expected size of the insert (referred to by the value in the TEMPLATE_ID field) in base pairs.
Type: int
Example: 2000
The INSERT_SIZE field indicates the expected insert size of the clone that is sequenced. It is understood that this is an estimate based upon the average insert sizes found in a given library. However, this information is critical for certain experiments, such as whole genome assembly.
This field would be required for the following combinations of STRATEGY and TRACE_TYPE_CODE:
STRATEGY=Any;TRACE_TYPE_CODE=WGS
STRATEGY=Any;TRACE_TYPE_CODE=WCS
STRATEGY=cDNA;TRACE_TYPE_CODE=CLONEEND
STRATEGY=CLONEEND;TRACE_TYPE_CODE=CLONEEND
- INSERT_STDEV*
- Approximate standard deviation of value in INSERT_SIZE field.
Type: int
Example: 200
The INSERT_STDEV field reflects the approximate standard deviation of the insert size. It is understood that this information is an approximation and may change as better data are obtained. This field would be required for the following combinations of STRATEGY and TRACE_TYPE_CODE:
STRATEGY=Any;TRACE_TYPE_CODE=WGS
STRATEGY=Any;TRACE_TYPE_CODE=WCS
STRATEGY=cDNA;TRACE_TYPE_CODE=CLONEEND
STRATEGY=CLONEEND;TRACE_TYPE_CODE=CLONEEND
- LATITUDE*
- The latitude measurement (using standard GPS notation) from which a sample was collected.
Type: float
Example: 54.736
The LATITUDE field is required to describe the collection of some environmental sample data. The latitude range is [-90,90], with the equator at 0 and positive values north of the equator. This field would be required for the following combination of STRATEGY and TRACE_TYPE_CODE:
STRATEGY=Env Sample-Geo;TRACE_TYPE_CODE=Any
- LIBRARY_ID*
- The source of the clone identified in the CLONE_ID field
Type: varchar(100)
Example: RP23
The LIBRARY_ID field documents the source library of the archival clone resource. Many genomic libraries have been registered with the Clone Registry (http://www.ncbi.nlm.nih.gov/clone) and the standard nomenclature (http://www.ncbi.nlm.nih.gov/clone/content/overview/) should be used for these libraries. This field would be required for the following combinations of STRATEGY and TRACE_TYPE_CODE:
STRATEGY=cDNA;TRACE_TYPE_CODE=Any
STRATEGY=EST;TRACE_TYPE_CODE=Any
STRATEGY=CLONEEND;TRACE_TYPE_CODE=CLONEEND
STRATEGY=CLONE;TRACE_TYPE_CODE=Any
STRATEGY=ENCODE;TRACE_TYPE_CODE=SHOTGUN; PrimerWalk; CLONEEND
- LONGITUDE*
- The longitude measurement (using standard GPS notation) from which a sample was collected.
Type: float
Example: -86.403
The LONGITUDE field is required to describe the collection of some environmental sample data. Longitude ranges from 0° at the Prime Meridian to +180° eastward and -180° westward.
This field would be required for the following combination of STRATEGY and TRACE_TYPE_CODE:
STRATEGY=Env Sample-Geo; TRACE_TYPE_CODE=Any
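The sign conventions for LATITUDE and LONGITUDE can be checked with a short range test. The following is a sketch with hypothetical function names, using the ranges given in the two field descriptions.

```python
def validate_coordinates(latitude: float, longitude: float) -> None:
    """Raise ValueError if either coordinate falls outside its documented range."""
    if not -90.0 <= latitude <= 90.0:
        raise ValueError(f"LATITUDE must be within [-90, 90]: {latitude}")
    if not -180.0 <= longitude <= 180.0:
        raise ValueError(f"LONGITUDE must be within [-180, 180]: {longitude}")


def describe_coordinates(latitude: float, longitude: float) -> str:
    """Render signed decimal degrees with hemisphere letters for a quick visual check."""
    validate_coordinates(latitude, longitude)
    north_south = "N" if latitude >= 0 else "S"   # positive latitude is north of the equator
    east_west = "E" if longitude >= 0 else "W"    # positive longitude is east of the Prime Meridian
    return f"{abs(latitude)}°{north_south}, {abs(longitude)}°{east_west}"
```

The example values 54.736 and -86.403 therefore describe a point in the northern and western hemispheres.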
- LO_FILTER_SIZE
- The smallest filter size used to stratify an environmental sample.
Type: varchar(50)
Example: 25 micron
The LO_FILTER_SIZE field is applicable only to environmental sample data and is not a required field.
- NCBI_PROJECT_ID
- BioProject ID generated by the INSDC.
Type: int
Example: 7
The NCBI_PROJECT_ID field links traces to the BioProject database, making it easy to retrieve the set of traces for each project. Genome sequencing centers may register their project with DDBJ BioProject prior to the submission of genomic sequence data. Submitters need not submit sequencing data at the time they register their project.
- ORGANISM_NAME*
- Description of species for BARCODE project from which trace is derived.
Type: varchar(100)
Example: Acanthocybium solandri
The ORGANISM_NAME field is used to classify the read by species for BARCODE data, using the proper taxonomic name in accordance with the Taxonomy Browser. SPECIES_CODE="BARCODESPECIES" for all traces from this project. This field would be required for STRATEGY=BARCODE.
- PEAK_FILE
- Name of file that contains the list of peak values.
Type: varchar(200)
Example: ./mytraces/123clone.peak
Consult the BASE_FILE field description for more information.
- PH
- The pH at which an environmental sample was collected.
Type: float
Example: 7.2
The PH field is applicable only to environmental sample data and is not a required field.
- PICK_GROUP_ID
- Id to group traces picked at the same time.
Type: int
Example: 939065
- PLACE_NAME
- Country in which the biological sample was collected and/or common name for a given location.
Type: varchar(250)
Example: Octopus Springs
The PLACE_NAME field is applicable to environmental sample data, but is not required.
- PLATE_ID
- Submitter defined plate id.
Type: varchar(32)
Example: 203
The PLATE_ID and WELL_ID fields are intended to identify the storage location of the sequencing template (not the library well coordinate of an archival clone named in the CLONE_ID field). This may enable flipped or contaminated trays to be easily identified. If a particular experiment did not require the use of a plate, please populate this field with ‘0’.
- POPULATION_ID
- Center provided id to designate a population from which a trace (or group of traces) was derived.
Type: varchar(100)
Example: CEPH
The POPULATION_ID field is used to capture center-specific designations of groups of individuals. This will likely only be useful in population studies (usually STRATEGY=SNP).
- PREP_GROUP_ID
- ID that defines groups of traces prepared at the same time.
Type: varchar(30)
Example: A2
- PRIMER
- The primer sequence (used in the sequencing reaction).
Type: varchar(200)
Example: GAATACCTACGATCGCC
The value of the PRIMER field is the actual base sequence of the sequencing primer used. If a center uses a primer extensively, the primer sequence can be entered into the list of primer codes and the PRIMER_CODE field can be used.
- PRIMER_CODE
- Identifier for the sequencing primer used.
Type: varchar(30)
Example: Sp6
- PRIMER_LIST*
- A ‘;’ delimited list of primers used in a mapping experiment (such as AFLP).
Type: varchar(100)
Example: AAGGTCTGCGCGTGTC;AGCTGCGTACGTAATCG;
This field is required if STRATEGY="AFLP" and TRACE_TYPE_CODE="PCR".
- PROGRAM_ID*
- The program used to create the trace file.
Type: varchar(100)
Example: phred-19990722h
The PROGRAM_ID field is used to indicate the base calling program. This field is free text. Program names, version numbers or dates are very useful.
More example values:
- phred-19980904e
- abi-3.1
- ATQA
- TraceTuner
- Licor
- Megabase
- Beckman
- PROJECT_NAME
- Term by which to group traces from different centers based on a common project.
Type: varchar(50)
Example: New Project
This field lets sequencing centers that are working on the same large project group all of the traces for this project under a common term. This field has a controlled vocabulary. Sequencing centers wishing to submit data must contact the DDBJ Trace Archive to determine a name that all members of the project agree on.
- QUAL_FILE
- Name of file containing the quality scores.
Type: varchar(200)
Example: ./mytraces/123clone.fasta.qs
Trace files that do not include the quality scores must provide this information in a separate file. The file designations are recorded in the QUAL_FILE field of the metadata file. The actual quality scores are stored in the file designated in the QUAL_FILE field. If quality scores are provided in separate files, the information in these files will overwrite any information in the trace (usually *.scf) file. If the quality scores that would be provided in the QUAL_FILE are the same as the information in the trace file, DO NOT PROVIDE THE FILE. However, it is important to note that some formats do not include the quality scores, in which case these values must be provided as ancillary information. If the center provides the BASE_FILE and QUAL_FILE, then the peak index information should also be provided in a file called PEAK_FILE.
- REFERENCE_ACCESSION*
- Reference accession (use accession and version to specify a particular instance of a sequence) used as the basis for a re-sequencing project. For the Comparative strategy, give the sequence used as the basis for primer design.
Type: varchar(50)
Example: NT_029829.1
This field is required for the following combination of STRATEGY and TRACE_TYPE_CODE:
STRATEGY=Re-sequencing; Comparative;TRACE_TYPE_CODE=Any
- REFERENCE_ACC_MAX*
- Finish position for a particular amplicon in re-sequencing or comparative projects.
Type: int
Example: 30929
This field points to the finishing coordinate of the accession.version described in the REFERENCE_ACCESSION field. All coordinates should be 1-based (i.e. sequences start at base 1, not base 0). This field is required for the following combination of STRATEGY and TRACE_TYPE_CODE:
STRATEGY=Re-sequencing; TRACE_TYPE_CODE=SHOTGUN; PCR;RT-PCR
- REFERENCE_ACC_MIN*
- Start position for a particular amplicon in re-sequencing or comparative projects.
Type: int
Example: 29829
This field points to the starting coordinate of the accession.version described in the REFERENCE_ACCESSION field. All coordinates should be 1-based (i.e. sequences start at base 1, not base 0). This field is required for the following combination of STRATEGY and TRACE_TYPE_CODE:
STRATEGY=Re-sequencing; TRACE_TYPE_CODE=SHOTGUN; PCR;RT-PCR
- REFERENCE_OFFSET*
- Sequence offset of accession specified in REFERENCE_ACCESSION field to define the coordinate start position used as the basis for a re-sequencing project.
Type: int
Example: 1520899
This field points to the starting coordinate of the accession.version described in the REFERENCE_ACCESSION field. All coordinates should be 1-based (i.e. sequences start at base 1, not base 0). This field is required for the following combination of STRATEGY and TRACE_TYPE_CODE:
STRATEGY=Re-sequencing; TRACE_TYPE_CODE=CHIP
- REFERENCE_SET_MAX
- Finish position for an entire re-sequencing region. This region may include several amplicons.
Type: int
Example: 29829
This field points to the finishing coordinate of the accession.version described in the REFERENCE_ACCESSION field for an entire re-sequencing region. All coordinates should be 1-based (i.e. sequences start at base 1, not base 0). The REFERENCE_ACC_[MIN|MAX] and REFERENCE_SET_[MIN|MAX] fields should refer to the same REFERENCE_ACCESSION.
- REFERENCE_SET_MIN
- Start position for an entire re-sequencing region. This region may include several amplicons.
Type: int
Example: 29829
This field points to the starting coordinate of the accession.version described in the REFERENCE_ACCESSION field for an entire re-sequencing region. All coordinates should be 1-based (i.e. sequences start at base 1, not base 0). The REFERENCE_ACC_[MIN|MAX] and REFERENCE_SET_[MIN|MAX] fields should refer to the same REFERENCE_ACCESSION.
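Because the reference coordinates are 1-based, they need a small conversion before being used with 0-based tools. The sketch below shows that conversion for Python slicing and a containment check; it assumes the MIN and MAX positions are both inclusive and that each amplicon lies inside its re-sequencing region. The function names are hypothetical.

```python
def amplicon_slice(acc_min: int, acc_max: int) -> slice:
    """Convert 1-based, inclusive REFERENCE_ACC_MIN/MAX into a 0-based Python slice."""
    if acc_min < 1 or acc_max < acc_min:
        raise ValueError("coordinates must satisfy 1 <= MIN <= MAX")
    return slice(acc_min - 1, acc_max)  # the exclusive slice end keeps base acc_max


def amplicon_within_set(acc_min: int, acc_max: int, set_min: int, set_max: int) -> bool:
    """Check that an amplicon lies inside the REFERENCE_SET_MIN..REFERENCE_SET_MAX region."""
    return set_min <= acc_min <= acc_max <= set_max
```

With the example values 29829 and 30929, the resulting slice covers 1101 bases.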
- RUN_DATE
- Date the sequencing reaction was run.
Type: datetime
Example: 2000-10-28
- RUN_GROUP_ID
- ID used to group traces run on the same machine.
Type: varchar(30)
Example: group2
- RUN_LANE
- Lane or capillary of the trace.
Type: int
Example: 1
The RUN_LANE field documents the specific lane or capillary on which a trace was obtained.
- RUN_MACHINE_ID
- ID of the specific sequencing machine on which a trace was obtained.
Type: varchar(30)
Example: machine2
- RUN_MACHINE_TYPE
- Type or model of machine on which a trace was obtained.
Type: varchar(30)
Example: ABI 310
- SALINITY
- The salinity at which an environmental sample was collected, measured in parts per thousand (promille).
Type: float
Example: 20
The SALINITY field is only applicable to environmental sample data but is not a required field.
- SEQ_LIB_ID*
- Center specified M13/PUC library that is actually sequenced.
Type: varchar(255)
Example: 22194
The SEQ_LIB_ID field is the center identifier for the M13/PUC-based clone that is actually sequenced. This will allow grouping of traces by the actual ligation event and is applicable to most projects. This value will be unique within a given center. This field would be required for the following combinations of STRATEGY and TRACE_TYPE_CODE:
STRATEGY=Any;TRACE_TYPE_CODE=SHOTGUN
STRATEGY=Any;TRACE_TYPE_CODE=WGS/WCS
- SOURCE_TYPE*
- Source of the DNA.
Type: varchar(50)
Example: GENOMIC DNA
The SOURCE_TYPE field consists of a code. Possible values are:
- G=Genomic DNA (includes PCR products from genomic DNA)
- N=Non Genomic DNA (EST, cDNA, RT-PCR, screened libraries)
- VIRAL RNA=Viral RNA
- SYNTHETIC=Synthetic DNA
Accepted values are G, N, GENOMIC, NON GENOMIC, VIRAL RNA, SYNTHETIC.
- SPECIES_CODE*
- Description of species from which trace is derived.
Type: varchar(100)
Example: Homo sapiens
The SPECIES_CODE field is used to classify the read by species, using proper taxonomic names where possible. This field is currently maintained as a controlled vocabulary. For a list of species currently contained within the Trace Archive, see: http://www.ncbi.nlm.nih.gov/Traces/trace.cgi?cmd=stat&f=xml_list_species&m=obtain&s=species
To submit a new species, please contact the DDBJ Trace Archive prior to submission. For cases in which the taxonomic origin of a specific trace is unclear, the taxonomic classification ‘ENVIRONMENTAL SEQUENCE’ can be used for environmental samples, or ‘ARTIFICIAL SEQUENCE’ for artificial material.
- STRAIN*
- Strain from which a trace is derived.
Type: varchar(50)
Example: C57BL/6J
- STRATEGY*
- Experimental STRATEGY.
Type: varchar(50)
Example: MODEL VERIFY
The experimental strategy used when obtaining the trace. This is intended to be a controlled vocabulary to which submitters contribute as needed to define various experiments and projects.
Current values (this list would continually be expanding):
- AFLP: Amplified Fragment Length Polymorphism
- BARCODE: DNA sequence analysis of a uniform target gene to enable species identification
- CCS: Concatenated cDNA sequencing
- cDNA: Sequences generated in the process of sequencing cDNA clones
- CF-S: Cot-filtered single/low-copy genomic DNA
- CF-M: Cot-filtered moderately repetitive genomic DNA
- CF-H: Cot-filtered highly repetitive genomic DNA
- CF-T: Cot-filtered theoretical single-copy DNA
- CLONE: Genomic clone based (hierarchical) sequencing
- CLONEEND: Sequences generated from the end of a clone (BAC/PAC/Fosmid or cDNA)
- Comparative: Sequences obtained using primers designed from related species
- CTS: Concatenated Tag Sequencing
- Env Sample-GEO: Geographically generated environmental sample
- Env Sample-Host: Environmental samples collected from a specific host
- EST: single pass sequencing of cDNA templates
- FINISHING: a read specifically made for finishing, could be either BAC finishing or Whole Genome Assembly (WGA) finishing
- MODEL VERIFY: Sequences obtained to verify proposed gene models
- PoolClone: Pools of clones (BACs mostly)
- SNP: Reads used for SNP identification
- TARGETED LOCUS: Sequences obtained from templates generated by primers designed to amplify a specific genetic locus
- Re-sequencing: Re-sequencing of targeted genomic regions
- RT-PCR: Sequences obtained using templates generated by Reverse Transcriptase Polymerase Chain Reaction
- WGA: Whole Genome Assembly
- SUBMISSION_TYPE*
- Type of submission.
Type: varchar(50)
Example: NEW
The SUBMISSION_TYPE field allows the following values:
- NEW: use to submit new data
- UPDATE: use to replace traces and their ancillary information. Previous data will be saved with their TIs; new traces with the same TRACE_NAME values will receive new TIs and become active
- UPDATEINFO: use to update or add ancillary information for already existing traces without re-submitting the entire package of data
- WITHDRAW: use to withdraw traces
- SVECTOR_ACCESSION
- DDBJ/EMBL/GenBank accession of the sequencing vector.
Type: varchar(50)
Example: X52325
- SVECTOR_CODE
- Center defined code for the sequencing vector.
Type: varchar(50)
Example: pBluescript SK(+)
- TEMPERATURE
- The temperature (in °C) at which an environmental sample was collected.
Type: float
Example: 30
The TEMPERATURE field applies only to environmental sample data and is not a required field.
- TEMPLATE_ID
- Submitter defined identifier for the sequencing template.
Type: varchar(50)
Example: HBBBA2211
The TEMPLATE_ID field is used to uniquely identify the actual template that is sequenced. This field, in conjunction with the TRACE_END field, can be used to identify traces that should be marked as ‘mate_pairs’ because they come from opposite ends of the same clone.
- TRACE_END
- Defines the end of the template contained in the read.
Type: varchar(50)
Example: F
The TRACE_END field can have the following values:
- F: FORWARD
- R: REVERSE
- N: UNKNOWN
- TRACE_FILE*
- Filename with the trace, relative to the top of the volume.
Type: varchar(200)
Example: ./traces/TRACE001.scf
- TRACE_FORMAT*
- Format of the trace file.
Type: varchar(20)
Example: scf
The TRACE_FORMAT field can have the following values:
- SCF - A standard file format for data from DNA sequencing instruments.
- ABI - An ABI trace file is a binary file containing the trace data and the sequence.
- TRACE_NAME*
- Center defined trace identifier.
Type: varchar(250)
Example: HBBBA1U2211
The TRACE_NAME field must be unique within a center, but is not required to be unique between centers. The combination of TRACE_NAME and CENTER_NAME acts as a unique key within the Trace Archive.
- TRACE_TYPE_CODE*
- Sequencing strategy by which the trace was obtained.
Type: varchar(50)
Example: wgs
The TRACE_TYPE_CODE field reflects the sequencing STRATEGY used to obtain the trace.
Current values:
- CHIP: Sequences obtained using microarrays (also called DNA chips or gene chips)
- CLONEEND: Sequences generated from the end of a large insert (BAC/PAC/Fosmid) or cDNA clone
- EST: Single Pass Expressed Sequence Tag
- HTP SELEX: High throughput SELEX
- OTHER: Other than PCR, PrimerWalk, SHOTGUN or TRANSPOSON for FINISHING STRATEGY
- PCR: Sequences obtained using templates generated by genomic Polymerase Chain Reaction
- PrimerWalk: Sequences generated through a primer walking step
- RT-PCR: Sequences obtained using templates generated by Reverse Transcriptase Polymerase Chain Reaction
- SHOTGUN: Shotgun sequencing of clones (genomic or cDNA)
- TRANSPOSON: Sequences obtained using templates generated by transposons
- WCS: Whole Chromosome Shotgun
- WGS: Whole Genome Shotgun
- TRANSPOSON_ACC*
- DDBJ/EMBL/Genbank accession for the transposon used in generating the sequencing template.
Type: varchar(50)
Example: X00913
The TRANSPOSON_ACC field is required for the following combination of STRATEGY and TRACE_TYPE_CODE:
STRATEGY=Any;TRACE_TYPE_CODE=TRANSPOSON
- TRANSPOSON_CODE*
- Center defined code for the transposon used in generating the sequencing template.
Type: varchar(50)
Example: Mu transposon
The TRANSPOSON_CODE field is required for the following combination of STRATEGY and TRACE_TYPE_CODE:
STRATEGY=Any;TRACE_TYPE_CODE=TRANSPOSON
- WELL_ID
- Center defined well identifier for the sequencing reaction.
Type: varchar(50)
Example: A1
The WELL_ID field, in combination with the PLATE_ID field, is used to define the storage location of the sequencing reaction (see the note for the PLATE_ID field). Typically, sequencing reactions are performed in standard microtiter dishes having either 96 or 384 wells (see standard configurations below).
Standard 96 well microtiter configuration
Standard 384 well microtiter configuration
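As a rough illustration of the two configurations named above, the sketch below enumerates well identifiers. It assumes the conventional geometry of 8 rows (A to H) by 12 columns for a 96-well plate and 16 rows (A to P) by 24 columns for a 384-well plate. The function name `well_ids` and the unpadded `A1` style are hypothetical choices for this example; WELL_ID is center defined, so a center may use a different convention.

```python
import string

# Conventional plate geometries as (rows, columns). This is an assumption:
# WELL_ID is a center-defined value and no format is prescribed here.
PLATE_LAYOUTS = {96: (8, 12), 384: (16, 24)}


def well_ids(plate_size):
    """Return well identifiers such as 'A1' in row-major order."""
    rows, cols = PLATE_LAYOUTS[plate_size]
    return [
        f"{string.ascii_uppercase[r]}{c}"
        for r in range(rows)
        for c in range(1, cols + 1)
    ]


wells_96 = well_ids(96)
wells_384 = well_ids(384)
print(wells_96[0], wells_96[-1], len(wells_96))     # A1 H12 96
print(wells_384[0], wells_384[-1], len(wells_384))  # A1 P24 384
```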
Internal Fields List
- BASECALL_LENGTH
- Length of the trace in base pairs.
Type: int
Example: 396
- BASES_20
- Number of base pairs for which the quality score exceeds 20.
Type: smallint
Example: 50
Warning: Some depositions do not have quality scores. This is likely due to the center submitting ABI files and not providing quality calls separately.
- BASES_40
- Number of base pairs for which the quality score exceeds 40.
Type: smallint
Example: 50
Warning: Some depositions do not have quality scores. This is likely due to the center submitting ABI files and not providing quality calls separately.
- BASES_60
- Number of base pairs for which the quality score exceeds 60.
Type: smallint
Example: 50
Warning: Some depositions do not have quality scores. This is likely due to the center submitting ABI files and not providing quality calls separately.
- LOAD_DATE
- Date on which the data was loaded.
Type: smalldatetime
Example: Jan 8 2001 11:59AM
- MATE_PAIR
- TIs of the reads obtained from the other end of the same template.
Type: int
Example: 203682255
A MATE PAIR is the pair of reads obtained from the two ends of the same template (FORWARD and REVERSE).
- REPLACED_BY
- TI that replaced the current TI as “active”.
Type: int
Example: 304753779
This field points to the more recent data set. If a trace was updated, the REPLACED_BY field stores the TI of the new trace. If only ancillary information was updated, then replaced_by=0 and is not shown.
- STATE
- Indicates the status of the trace.
Type: varchar
Example: active
Possible values:
- active
- updated
- withdrawn
- TAXID
- NCBI Taxonomy ID.
Type: int
Example: 10090
This field links DDBJ Trace Archive with NCBI Taxonomy Browser.
- TI
- Trace unique internal Identifier.
Type: int
Example: 304753779
The TI is assigned to a record at the loading stage, and any record or set of records can be obtained by its identifiers.
- UPDATE_DATE
- Date on which the data was updated/replaced.
Type: smalldatetime
Example: Jul 19 2001 3:48PM
This field is used to store the date of the last update.
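The relationship between TEMPLATE_ID, TRACE_END and MATE_PAIR described in the field lists can be sketched as follows. This is only an illustration of the idea, not the archive's actual pairing procedure, which is not documented here. The function name `find_mate_pairs`, the record layout and the sample values are hypothetical, and the rule of pairing only templates with exactly one F read and one R read is a simplifying assumption.

```python
from collections import defaultdict


def find_mate_pairs(traces):
    """Pair forward and reverse reads that share a TEMPLATE_ID.

    `traces` is an iterable of dicts with 'trace_name', 'template_id'
    and 'trace_end' keys (F, R or N). Only templates with exactly one
    F read and one R read are reported (a simplifying assumption).
    """
    by_template = defaultdict(lambda: {"F": [], "R": []})
    for trace in traces:
        end = trace.get("trace_end")
        if end in ("F", "R"):  # reads with N (unknown) cannot be paired
            by_template[trace["template_id"]][end].append(trace["trace_name"])

    pairs = {}
    for template_id, ends in by_template.items():
        if len(ends["F"]) == 1 and len(ends["R"]) == 1:
            pairs[template_id] = (ends["F"][0], ends["R"][0])
    return pairs


# Hypothetical records for illustration only.
example_traces = [
    {"trace_name": "T1_F", "template_id": "T1", "trace_end": "F"},
    {"trace_name": "T1_R", "template_id": "T1", "trace_end": "R"},
    {"trace_name": "T2_F", "template_id": "T2", "trace_end": "F"},
    {"trace_name": "T3_N", "template_id": "T3", "trace_end": "N"},
]
mate_pairs = find_mate_pairs(example_traces)
print(mate_pairs)  # {'T1': ('T1_F', 'T1_R')}
```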
Submit trace data
Data submission of human subjects research
For all data from human subjects research submitted to DDBJ, it is the
submitter’s responsibility to ensure that the privacy of participants
(human subjects) is protected in accordance with all applicable laws,
regulations and policies of the submitter’s institute.
In principle, make sure to remove any direct personal identifiers of
human subjects from your submissions.
Before submitting data from human subjects research, read the “Data
submission of human subjects research”.
Create submission files
The metadata file (TRACEINFO file) describes the submitted data and points to the location of the chromatograms. When extracted, every submission should have a single top directory, and all metadata files should be placed directly under it. If the submission contains trace files, create at least one subdirectory under the top directory and place all trace files there. Trace files (in either SCF or ABI format) should not appear in the top-level directory. We suggest naming subdirectories after the traces or the project. Nested subdirectories are allowed and encouraged as a way to group traces. Below is an example of the submission directory hierarchy.
Submission directory hierarchy example
TOP_DIRECTORY/
TOP_DIRECTORY/TRACEINFO
TOP_DIRECTORY/traces
TOP_DIRECTORY/traces/FLJ/
TOP_DIRECTORY/traces/FLJ/FLJA1U0001.scf
TOP_DIRECTORY/traces/FLJ/FLJA1U0002.scf
TOP_DIRECTORY/traces/FLJ/FLJA1U0003.scf
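Before uploading, a layout like the one above could be checked with a small script. The following is a minimal sketch, not an official DTA tool: the function name `check_layout` and the set of file extensions treated as trace files are assumptions. It only tests the two rules stated in this section, namely that TRACEINFO sits in the top directory and that no trace file does.

```python
import tempfile
from pathlib import Path

# Assumed extensions for SCF and ABI trace files; adjust to your own naming.
TRACE_SUFFIXES = {".scf", ".abi", ".ab1"}


def check_layout(top_dir):
    """Return a list of layout problems found in a submission directory."""
    top = Path(top_dir)
    problems = []
    if not (top / "TRACEINFO").is_file():
        problems.append("TRACEINFO is missing from the top directory")
    for entry in top.iterdir():
        if entry.is_file() and entry.suffix.lower() in TRACE_SUFFIXES:
            problems.append(f"trace file in top directory: {entry.name}")
    return problems


# Build a throwaway directory tree to demonstrate the check.
with tempfile.TemporaryDirectory() as tmp:
    top = Path(tmp)
    (top / "TRACEINFO").write_text("placeholder metadata\n")
    (top / "traces" / "FLJ").mkdir(parents=True)
    (top / "traces" / "FLJ" / "FLJA1U0001.scf").write_bytes(b"")
    good_layout = check_layout(top)        # trace files only in a subdirectory
    (top / "stray.scf").write_bytes(b"")   # misplaced trace file
    bad_layout = check_layout(top)

print(good_layout)
print(bad_layout)
```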
The metadata file can be either in XML or in tab-delimited format. The metadata requirements for specific combinations of STRATEGY and TRACE_TYPE_CODE are listed in the Validation Table (spreadsheet format). Both types of metadata file can begin with a common fields section, which defines the values shared by all traces in the submission, if any.
Below are examples of TRACEINFO metadata files.
TRACEINFO xml example
<?xml version="1.0"?>
<trace_volume>
  <common_fields>
    <center_name>CENTER NAME ACRONYM IS HERE</center_name>
    <center_project>FLJ</center_project>
    <source_type>N</source_type>
    <species_code>HOMO SAPIENS</species_code>
    <strategy>EST</strategy>
    <submission_type>NEW</submission_type>
    <trace_format>SCF</trace_format>
    <trace_type_code>EST</trace_type_code>
  </common_fields>
  <trace>
    <trace_name>F-3NB691000020</trace_name>
    <trace_file>./traces/F-3NB691000020.scf</trace_file>
    <clone_id>3NB691000020</clone_id>
    <library_id>3NB691</library_id>
    <template_id>3NB691000020</template_id>
  </trace>
  <trace>
    <trace_name>F-3NB691000033</trace_name>
    <trace_file>./traces/F-3NB691000033.scf</trace_file>
    <clone_id>3NB691000033</clone_id>
    <library_id>3NB691</library_id>
    <template_id>3NB691000033</template_id>
  </trace>
  --- more information ---
</trace_volume>
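To show how the common fields relate to the individual trace records, the sketch below reads a shortened copy of the XML example with Python's standard `xml.etree.ElementTree` module and merges the common values into every trace. The function name `read_traceinfo_xml` is hypothetical, and letting a per-trace value override a common value of the same name is an assumption of this sketch, not a documented rule.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the TRACEINFO xml example, without the elided part.
TRACEINFO_XML = """<?xml version="1.0"?>
<trace_volume>
  <common_fields>
    <center_name>CENTER NAME ACRONYM IS HERE</center_name>
    <strategy>EST</strategy>
    <trace_type_code>EST</trace_type_code>
  </common_fields>
  <trace>
    <trace_name>F-3NB691000020</trace_name>
    <trace_file>./traces/F-3NB691000020.scf</trace_file>
  </trace>
</trace_volume>
"""


def read_traceinfo_xml(text):
    """Return one dict per <trace>, with the common fields merged in."""
    root = ET.fromstring(text)
    common = {}
    common_node = root.find("common_fields")
    if common_node is not None:
        common = {child.tag: (child.text or "").strip() for child in common_node}

    records = []
    for trace in root.findall("trace"):
        record = dict(common)  # start from the shared values
        # Assumption: a value given on the trace wins over the common one.
        record.update({child.tag: (child.text or "").strip() for child in trace})
        records.append(record)
    return records


xml_records = read_traceinfo_xml(TRACEINFO_XML)
print(xml_records[0]["trace_name"], xml_records[0]["strategy"])
```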
TRACEINFO tab-delimited text example
center_name = CENTER NAME ACRONYM IS HERE
center_project = FLJ
source_type = N
species_code = HOMO SAPIENS
strategy = EST
submission_type = NEW
trace_format = SCF
trace_type_code = EST
trace_name clone_id library_id template_id trace_file
F-3NB691000020 3NB691000020 3NB691 3NB691000020 ./traces/F-3NB691000020.scf
F-3NB691000033 3NB691000033 3NB691 3NB691000033 ./traces/F-3NB691000033.scf
--- more information ---
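The tab-delimited layout can be read in a similar way. The sketch below assumes that the common fields section consists of `name = value` lines, that the first line containing a tab is the column header, and that every later line is one trace record with tab-separated values. These are assumptions drawn from the example above, and the function name `read_traceinfo_tab` is hypothetical.

```python
# Shortened copy of the tab-delimited example, with explicit tab characters.
TRACEINFO_TAB = (
    "center_project = FLJ\n"
    "strategy = EST\n"
    "trace_name\tclone_id\ttrace_file\n"
    "F-3NB691000020\t3NB691000020\t./traces/F-3NB691000020.scf\n"
    "F-3NB691000033\t3NB691000033\t./traces/F-3NB691000033.scf\n"
)


def read_traceinfo_tab(text):
    """Return one dict per trace row, with the common fields merged in."""
    common = {}
    header = None
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if header is None and "\t" not in line and "=" in line:
            # Common fields section: "name = value"
            key, _, value = line.partition("=")
            common[key.strip()] = value.strip()
        elif header is None:
            header = line.split("\t")  # column names for the trace rows
        else:
            record = dict(common)
            record.update(zip(header, line.split("\t")))
            records.append(record)
    return records


tab_records = read_traceinfo_tab(TRACEINFO_TAB)
print(len(tab_records), tab_records[1]["trace_file"])
```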
Upload submission files
DTA creates a directory for each data submission; please contact the DTA team to request one. Transfer the files by SCP according to the manual.
Submission directory example
submission/submitter_id/dta/dta_submitter_id-0001
The directory for a DTA submission is separate from those for the DDBJ Sequence Read Archive.
Completion of submission
Once the submission files are complete, DTA can keep the data private until the submitter instructs us to release them. After the release instruction, DTA uploads the files to the NCBI Trace Archive. As soon as the data are loaded into the NCBI Trace Archive, TI numbers are assigned and the data become public.
Please note that TI number assignment and data release are concurrent events.
Update
To update records, please contact the DTA team.