Trace Archive

Trace Archive has been retired.
See “Access Trace Data” regarding how to access trace data.

Example: TI number 2282248605
curl "https://www.ncbi.nlm.nih.gov/Traces/sra-reads-be/fasta?ti=2282248605&retmode=text"
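
The same request URL can be assembled programmatically. The following is a minimal sketch that only builds the URL from the curl example above; the function name trace_fasta_url is hypothetical, and the only query parameters used are the two shown in the example (ti and retmode).

```python
from urllib.parse import urlencode

# Endpoint copied from the curl example; no other endpoints are assumed.
BASE_URL = "https://www.ncbi.nlm.nih.gov/Traces/sra-reads-be/fasta"


def trace_fasta_url(ti: int) -> str:
    """Return the FASTA retrieval URL for one TI number (hypothetical helper)."""
    query = urlencode({"ti": ti, "retmode": "text"})
    return f"{BASE_URL}?{query}"


print(trace_fasta_url(2282248605))
```

The resulting string can be passed to curl or to any HTTP client.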

You may submit capillary sequencing data to DRA. Please select a capillary sequencing instrument for the Experiment Instrument.
For examples of real data, see DRX395641-DRX395673.

Trace Archive overview

DDBJ Trace Archive (DTA) is a permanent repository of DNA sequence chromatograms (traces), base calls, and quality estimates for single-pass reads from various large-scale sequencing projects. DTA is a member of the International Nucleotide Sequence Database Collaboration (INSDC) and collects the data in collaboration with NCBI and EBI. NCBI Trace Archive issues and manages IDs.

Released data can be searched and retrieved at the NCBI Trace Archive.

DDBJ Sequence Read Archive accepts trace data. Please consider submitting trace data to DRA.

Metadata

There are fields that are required for specific combinations of STRATEGY and TRACE_TYPE_CODE. You may check requirements in the Validation Table. Metadata can be searched at the NCBI Trace Archive.

Trace Archive RFC

Required*
May be required, depending upon the trace type and strategy employed*

Metadata Field List

ACCESSION

DDBJ/EMBL/Genbank accession number

Type: varchar(30)
Example: AC22227

The ACCESSION is assigned upon deposition to a public repository (DDBJ/EMBL/Genbank). This field will not be applicable to all trace types (primarily WGS). However, if this field contains a valid accession identifier, correlation between the primary sequence data (in Trace) and the secondary sequence data (in the public repository) is facilitated.

AMPLIFICATION_FORWARD *

The forward amplification primer sequence

Type: varchar(100)
Example: GGATTCTGACTAACGAGC

The AMPLIFICATION_FORWARD field is to allow submitters to define the primers used to amplify templates for sequencing. This field is required when TRACE_TYPE_CODE=PCR or RT-PCR.

AMPLIFICATION_REVERSE *

The reverse amplification primer sequence.

Type: varchar(100)
Example: GGATTCTGACTAACGAGC

The AMPLIFICATION_REVERSE field is to allow submitters to define the primers used to amplify templates for sequencing. This field is required when TRACE_TYPE_CODE=PCR or RT-PCR.
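
The conditional requirement on the two primer fields can be checked before submission. This is a minimal sketch, assuming each metadata record is held as a plain dict keyed by field name; the function name and the dict representation are illustrative assumptions, not part of the Trace Archive tooling.

```python
# Trace types for which both amplification primers are required (from the text above).
PCR_TRACE_TYPES = {"PCR", "RT-PCR"}


def missing_amplification_primers(record: dict) -> list:
    """Return the names of required primer fields that are absent or empty."""
    if record.get("TRACE_TYPE_CODE") not in PCR_TRACE_TYPES:
        return []  # primers are not required for other trace types
    required = ("AMPLIFICATION_FORWARD", "AMPLIFICATION_REVERSE")
    return [field for field in required if not record.get(field)]


record = {"TRACE_TYPE_CODE": "PCR", "AMPLIFICATION_FORWARD": "GGATTCTGACTAACGAGC"}
print(missing_amplification_primers(record))
```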

AMPLIFICATION_SIZE

The expected amplification size for a pair of primers.

Type: int
Example: 500

The AMPLIFICATION_SIZE field allows submitters to define the expected amplification size for a pair of primers (defined in the AMPLIFICATION_FORWARD and AMPLIFICATION_REVERSE fields). This number should be given in base pairs. If TRACE_TYPE_CODE=PCR, the amplification size is based on amplification of genomic DNA. If TRACE_TYPE_CODE=RT-PCR, the amplification size is based on amplification of transcript.

ANONYMIZED_ID

Anonymous ID for an individual.

Type: varchar(100)
Example: 2222anonym

Used in projects to maintain the anonymity of donors. In many cases, there may be a controlled access database that can map many anonymized_ids in the trace archive to a single individual id for which phenotypic information may be available.

ATTEMPT: Number of times the sequencing project has been attempted by the center and/or submitted to the Trace Archive.
Type: tinyint(1-255)
Example: 2

BASE_FILE

File name with base calls.

Type: varchar(200)
Example: ./mytraces/123clone.fasta

Trace files which do not include the base calls must provide this information in a separate file. The file designations are recorded in the BASE_FILE field of the metadata file. If base calls are provided in separate files, the information in these files will overwrite any information in the trace (usually *.scf) file. If the base calls that would be provided in the BASE_FILE are the same as the information in the trace file, DO NOT PROVIDE THE FILE. If the center provides the BASE_FILE and QUAL_FILE, then the peak index information should also be provided in a file called PEAK_FILE.

CENTER_NAME *

Name of the sequencing center.

Type: varchar(50)
Example: WUGSC

Sequencing centers wishing to submit data must contact the DDBJ Trace Archive administrators to determine a center abbreviation. This abbreviation is used in the CENTER_NAME field. This field has a controlled vocabulary. For the complete list of submitting centers see: http://www.ncbi.nlm.nih.gov/Traces/trace.cgi?view=submitting_centers

These center names are controlled separately from those of the Sequence Read Archive.

CENTER_PROJECT *

Center defined project name.

Type: varchar(100)
Example: HBBB

The CENTER_PROJECT reflects a sequencing center’s internal designation for a specific sequencing project. This field can be useful for grouping related traces.

CHEMISTRY: Description of the chemistry used in the sequencing reaction.
Type: varchar(50)
Example: BIGDYEV3.0

CHEMISTRY_TYPE

Type of chemistry used in the sequencing reaction.

Type: char(50)
Example: P

The CHEMISTRY_TYPE uses a controlled list.
Accepted values are:
Primer
Terminator
p=primer; t=terminator

CHROMOSOME

Chromosome to which the trace is assigned.

Type: varchar(8)
Example: 11

The CHROMOSOME field indicates to which chromosome a trace has been assigned. Gene names or cytogenetic positions are not appropriate substitutes for chromosome information.

CLIP_QUALITY_LEFT

Left clip of the read, in base pairs, based on quality analysis.

Type: int
Example: 56

The CLIP_QUALITY_LEFT field indicates the base at the beginning of the sequence at which the read should be clipped due to poor quality sequence. The given value would be the first base of the high quality region of the trace.

CLIP_QUALITY_RIGHT

Right clip of the read, in base pairs, based on quality analysis.

Type: int
Example: 256

The CLIP_QUALITY_RIGHT field indicates the base at the end of the sequence at which the read should be clipped due to poor quality sequence. The given value would be the last base of the high quality region of the trace.
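
How the two quality clip values select the high-quality region can be sketched as follows. The sketch assumes the clip values are 1-based and inclusive (the field descriptions name the first and last high-quality base but do not state the numbering base), and the function name clip_read is hypothetical.

```python
def clip_read(bases: str, clip_left: int, clip_right: int) -> str:
    """Return the region from clip_left to clip_right.

    Assumption: both positions are 1-based and inclusive.
    """
    if not 1 <= clip_left <= clip_right <= len(bases):
        raise ValueError("expected 1 <= clip_left <= clip_right <= read length")
    # Convert the 1-based inclusive range to a 0-based half-open slice.
    return bases[clip_left - 1:clip_right]


print(clip_read("ACGTACGTAC", 3, 6))
```

The same slicing applies to CLIP_VECTOR_LEFT and CLIP_VECTOR_RIGHT under the same numbering assumption.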

CLIP_VECTOR_LEFT *

Left clip of the read, in base pairs, based on vector sequence.

Type: int
Example: 75

The CLIP_VECTOR_LEFT field indicates the base at the beginning of the sequence at which the read should be clipped due to vector sequence. The given value would be the first base of non-vector sequence. This field is required for almost all combinations of STRATEGY and TRACE_TYPE_CODE. This information can be omitted if the INSERT_FLANK_LEFT field is populated or TRACE_TYPE_CODE is PCR or RT-PCR.

CLIP_VECTOR_RIGHT *

Right clip of the read, in base pairs, based on vector sequence.

Type: int
Example: 275

The CLIP_VECTOR_RIGHT field indicates the base at the end of the sequence at which the read should be clipped due to vector sequence. The given value would be the last base of non-vector sequence. This field is required for almost all combinations of STRATEGY and TRACE_TYPE_CODE. This information can be omitted if the INSERT_FLANK_RIGHT field is populated or TRACE_TYPE_CODE is PCR or RT-PCR.
NOTE: Many centers combine vector and quality analysis, and thus have only one set of clip values. In this case, the set of values should be placed in the CLIP_VECTOR_LEFT/CLIP_VECTOR_RIGHT fields.

CLONE_ID *

The name of the clone from which the trace was derived.

Type: varchar(30)
Example: RP23-1123F10

The CLONE_ID field is used to store the identifier related to an individual clone, for example a BAC clone, PAC clone or cDNA clone. If the clone is registered with the Clone Registry (http://www.ncbi.nlm.nih.gov/clone/), standard clone registry nomenclature (http://www.ncbi.nlm.nih.gov/clone/content/overview/) should be used.
This field is required for the following combination of STRATEGY and TRACE_TYPE_CODE:
STRATEGY=cDNA;TRACE_TYPE_CODE=Any
STRATEGY=EST;TRACE_TYPE_CODE=Any
STRATEGY=CLONEEND;TRACE_TYPE_CODE=CLONEEND
STRATEGY=CLONE;TRACE_TYPE_CODE=Any
STRATEGY=ENCODE;TRACE_TYPE_CODE=SHOTGUN; PrimerWalk; CLONEEND
STRATEGY=FINISHING;TRACE_TYPE_CODE=Any
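
The STRATEGY and TRACE_TYPE_CODE combinations listed for CLONE_ID can be expressed as a small lookup. This is a sketch only: it assumes the ENCODE rule covers the three trace types SHOTGUN, PrimerWalk and CLONEEND and that FINISHING applies to any trace type, which is one reading of the list; the Validation Table remains the authoritative source. All names in the code are hypothetical.

```python
ANY = None  # marker meaning "any TRACE_TYPE_CODE"

# (STRATEGY, allowed TRACE_TYPE_CODE values) pairs, as read from the list above.
CLONE_ID_RULES = [
    ("cDNA", ANY),
    ("EST", ANY),
    ("CLONEEND", {"CLONEEND"}),
    ("CLONE", ANY),
    ("ENCODE", {"SHOTGUN", "PrimerWalk", "CLONEEND"}),
    ("FINISHING", ANY),
]


def clone_id_required(strategy: str, trace_type_code: str) -> bool:
    """Return True if CLONE_ID is required for this combination."""
    for rule_strategy, allowed in CLONE_ID_RULES:
        if strategy == rule_strategy and (allowed is ANY or trace_type_code in allowed):
            return True
    return False


print(clone_id_required("ENCODE", "PrimerWalk"))
```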

CLONE_ID_LIST *

Semi-colon delimited list of clones if the Strategy is PoolClone.

Type: varchar(30)
Example: RP23-200A2;RP23-500P1

The CLONE_ID_LIST field is used only if STRATEGY=PoolClone. In this case, the list of clones is provided as a semicolon delimited list. If the clones are registered with the Clone Registry (http://www.ncbi.nlm.nih.gov/clone/), standard clone registry nomenclature (http://www.ncbi.nlm.nih.gov/clone/content/overview/) should be used (see CLONE_ID field).
Note: The list of clones is not limited, but the size of the individual clone within the list is limited to 30 bytes.
This field is required for the following combination of STRATEGY and TRACE_TYPE_CODE: STRATEGY=PoolClone;TRACE_TYPE_CODE=Any
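
Splitting a CLONE_ID_LIST value and applying the 30-byte limit per clone might look like the sketch below. The function name is hypothetical, and measuring the limit on the UTF-8 encoding is an assumption; the description only says "30 bytes".

```python
def parse_clone_id_list(value: str, max_bytes: int = 30) -> list:
    """Split a semicolon-delimited clone list and check each clone's size."""
    clones = [clone.strip() for clone in value.split(";") if clone.strip()]
    for clone in clones:
        # Assumption: the 30-byte limit is measured on the UTF-8 encoding.
        if len(clone.encode("utf-8")) > max_bytes:
            raise ValueError(f"clone name exceeds {max_bytes} bytes: {clone}")
    return clones


print(parse_clone_id_list("RP23-200A2;RP23-500P1"))
```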

COLLECTION_DATE *

The full date, in “Mar 2 2006 12:00AM” format, on which an environmental sample was collected.

Type: datetime
Example: Mar 2 2006 12:00AM

The COLLECTION_DATE field is used to define the date and time on which an environmental sample was collected.
This field is required for the following combination of STRATEGY and TRACE_TYPE_CODE:
STRATEGY=Env Sample-Geo; TRACE_TYPE_CODE=Any
STRATEGY=Env Sample-Host; TRACE_TYPE_CODE=Any
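
A value in the “Mar 2 2006 12:00AM” format can be parsed with the Python standard library. This is a minimal sketch; the format string is an inference from the single example above, and %b and %p depend on an English locale being active.

```python
from datetime import datetime

# Inferred from the example "Mar 2 2006 12:00AM": month abbreviation, day, year, 12-hour time.
COLLECTION_DATE_FORMAT = "%b %d %Y %I:%M%p"


def parse_collection_date(value: str) -> datetime:
    """Parse a COLLECTION_DATE string (hypothetical helper)."""
    return datetime.strptime(value, COLLECTION_DATE_FORMAT)


print(parse_collection_date("Mar 2 2006 12:00AM").isoformat())
```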

CVECTOR_ACCESSION

Repository (DDBJ/EMBL/Genbank) accession identifier for the cloning vector.

Type: varchar(50)
Example: AY451994

The CVECTOR_ACCESSION field holds the accession number for the cloning vector used. This cloning vector relates to the clone named in the CLONE_ID field.

CVECTOR_CODE

Center defined code for the cloning vector.

Type: varchar(50)
Example: PBACE3.6

The CVECTOR_CODE field holds the user defined identifier for the cloning vector. Submitters are encouraged to submit all vector sequence information to public repositories.

DEPTH

Depth (in meters) at which an environmental sample was collected.

Type: float
Example: 10M

The DEPTH field is applicable to water samples and earth samples. If the value of this field is NULL, it is anticipated the sample was taken from the surface of the environment. While this field is only applicable to environmental samples, it is not required.

ELEVATION

Elevation (in meters) at which an environmental sample was collected.

Type: float
Example: 500

If the value of this field is NULL it is assumed the data were obtained at sea level. The field ELEVATION is only applicable to some environmental sample data, but is not a required field.

ENVIRONMENT_TYPE *

Type of environment from which an environmental sample was collected.

Type: varchar(250)
Example: sea water

The ENVIRONMENT_TYPE field is used to describe the specific environment from which an environmental sample was taken. While the LATITUDE and LONGITUDE fields describe the location, many types of environment could exist at this location (for example, soil, sludge, tree roots, etc).
This field would be required for the following combination of STRATEGY and TRACE_TYPE_CODE: STRATEGY=Env Sample-Geo; TRACE_TYPE_CODE=Any

EXTENDED_DATA

Extra ancillary information wrapped in an EXTENDED_DATA block, where actual values are provided with a special tag.

Type: varchar()
Example:

<extended_data>
<field name='SamplingSiteMonthChlorophyllLevel'>1.4 mg_mm</field>
<field name='SamplingSiteYearlyChlorophyllLevel'>1.12 mg_mm</field>
<field name='SamplingSiteYearlyChlorophyllLevelStdError'>0.19 mg_mm</field>
</extended_data>
The ‘=’ sign and the field separator character ‘|’ should be excluded from names and their values. No other validity checks will be performed on the data.
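
An EXTENDED_DATA block can be generated from name/value pairs while enforcing the restriction on ‘=’ and ‘|’. The sketch below is illustrative only: the function name is hypothetical, and it writes attribute values with double quotes, which is equivalent XML to the single-quoted example.

```python
from xml.sax.saxutils import escape, quoteattr

FORBIDDEN = ("=", "|")  # characters excluded from names and values, per the text above


def build_extended_data(fields: dict) -> str:
    """Return an <extended_data> block for the given name/value pairs."""
    lines = ["<extended_data>"]
    for name, value in fields.items():
        for text in (name, value):
            if any(ch in text for ch in FORBIDDEN):
                raise ValueError(f"'=' and '|' are not allowed: {text!r}")
        lines.append(f"<field name={quoteattr(name)}>{escape(value)}</field>")
    lines.append("</extended_data>")
    return "\n".join(lines)


print(build_extended_data({"SamplingSiteMonthChlorophyllLevel": "1.4 mg_mm"}))
```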

FEATURE_ID_FILE

File describing the features and their locations on a chip.

Type: varchar(200)
Example: ./mytraces/chip2.cdf

The FEATURE_ID_FILE provides the location and sequence of the features for a given chip when TRACE_TYPE_CODE=“CHIP”.

FEATURE_ID_FILE_NAME *

Reference to a common FEATURE_ID_FILE which should be submitted first.

Type: varchar(200)
Example:

This field is required when TRACE_TYPE_CODE=“CHIP”.

FEATURE_SIGNAL_FILE

File giving the signal and variance for features on a chip.

Type: varchar(200)
Example: ./mytraces/chip2.signal

The FEATURE_SIGNAL_FILE provides the signal and variance of signal for the features on a given chip when TRACE_TYPE_CODE=“CHIP”.

FEATURE_SIGNAL_FILE_NAME *

Reference to a common FEATURE_SIGNAL_FILE which should be submitted first.

Type: varchar(200)
Example:

This field is required when TRACE_TYPE_CODE=“CHIP”.

GENE_NAME

Gene name or some other common identifier.

Type: varchar(100)
Example: transporter 1

Free text. This field is mainly for TRACE_TYPE_CODE=‘Re-sequencing’ or ‘ENCODE’. When a group is analyzing a particular gene, they may want to refer to that gene by its name or some other common identifier.

HI_FILTER_SIZE

The largest filter used to stratify an environmental sample.

Type: varchar(50)
Example: 50 micron

The HI_FILTER_SIZE field is applicable only to environmental sample data but is not a required field.

HOST_CONDITION

The condition of the host from which an environmental sample was obtained.

Type: varchar(100)
Example: HIV-positive

The HOST_CONDITION field is only applicable to environmental sample data and is used to describe the condition (healthy, sick, etc) of the host from which a sample was taken.

HOST_ID *

Unique identifier for the specific host from which an environmental sample was taken.

Type: varchar(100)
Example: yerkes pedigree #C0479 ‘Clint’

The HOST_ID field is only applicable to environmental sample data and is used to capture the unique name for the specific host from which a sample was obtained.
This field would be required for the following combination of STRATEGY and TRACE_TYPE_CODE: STRATEGY=Env Sample-Host; TRACE_TYPE_CODE=Any

HOST_LOCATION *

Specific location on the host from which an environmental sample was collected.

Type: varchar(100)
Example: rumen

The HOST_LOCATION field is only applicable to environmental sample data and is used to describe the specific part of the host from which the sample was obtained, for example: dental plaque, hindgut, root surfaces.
This field would be required for the following combination of STRATEGY and TRACE_TYPE_CODE: STRATEGY=Env Sample-Host; TRACE_TYPE_CODE=Any

HOST_SPECIES *

The host from which an environmental sample was obtained.

Type: varchar(100)
Example: Pan troglodytes

The HOST_SPECIES field is only applicable to environmental sample data.
This field would be required for the following combination of STRATEGY and TRACE_TYPE_CODE: STRATEGY=Env Sample-Host; TRACE_TYPE_CODE=Any

INDIVIDUAL_ID

Publicly available identifier to denote a specific individual or sample from which a trace was derived.

Type: varchar(100)
Example: NA12345

The INDIVIDUAL_ID field provides a center specific unique id that can associate a specific trace to an individual. This will be used primarily for population based studies.

INSERT_FLANK_LEFT *

Flanking sequence at the cloning junction.

Type: varchar(100)
Example: AAGGTGCGATGCAGTGGCAGTAGCAGTGTCGACGTGACGATTCGTCCGGA

The INSERT_FLANK_LEFT field should provide from 50 up to 100 bases of sequence (including linkers) to the left of the cloning junction. This information will allow users to perform their own vector trimming of reads. This field is required for almost all combinations of STRATEGY and TRACE_TYPE_CODE. This field can be omitted if CLIP_VECTOR_LEFT is populated. However, INSERT_FLANK_LEFT is the preferred choice. If there was no cloning step involved in the sequencing, please populate the field with ‘NONE’.

INSERT_FLANK_RIGHT *

Flanking sequence at the cloning junction.

Type: varchar(100)
Example: AAGGCGCGATGCAGTGAGCGAGGCTGACGTCGGCTAGCGTCGCGTCGGGT

The INSERT_FLANK_RIGHT field should provide from 50 up to 100 bases of sequence (including linkers) to the right of the cloning junction. This information will allow users to perform their own vector trimming of reads. This field is required for almost all combinations of STRATEGY and TRACE_TYPE_CODE. This field can be omitted if CLIP_VECTOR_RIGHT is populated. However, INSERT_FLANK_RIGHT is the preferred choice. If there was no cloning step involved in the sequencing, please populate the field with ‘NONE’. It is anticipated that if INSERT_FLANK_LEFT is populated, INSERT_FLANK_RIGHT will also be populated. It is not anticipated that a mixture of clip values and junction sequence will be specified (i.e. CLIP_VECTOR_LEFT and INSERT_FLANK_RIGHT populated for the same record).
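
Whether a record uses flanking sequences, clip values, or an unexpected mixture of the two can be checked with a sketch such as the following. The record-as-dict representation, the function name, and the four return labels are assumptions made for illustration; a complete flank pair is treated as the preferred choice, following the description above.

```python
def _present(record: dict, field: str) -> bool:
    """A field counts as populated when it is neither missing nor empty."""
    return record.get(field) not in (None, "")


def vector_trim_mode(record: dict) -> str:
    """Classify a record as 'flank', 'clip', 'none', or 'mixed'."""
    flanks = [_present(record, f) for f in ("INSERT_FLANK_LEFT", "INSERT_FLANK_RIGHT")]
    clips = [_present(record, f) for f in ("CLIP_VECTOR_LEFT", "CLIP_VECTOR_RIGHT")]
    if all(flanks):
        return "flank"  # preferred choice when both flanks are given
    if all(clips) and not any(flanks):
        return "clip"
    if not any(flanks) and not any(clips):
        return "none"
    return "mixed"  # e.g. CLIP_VECTOR_LEFT with INSERT_FLANK_RIGHT


print(vector_trim_mode({"CLIP_VECTOR_LEFT": 75, "INSERT_FLANK_RIGHT": "AAGG"}))
```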

INSERT_SIZE *

Expected size of the insert (referred to by the value in the TEMPLATE_ID field) in base pairs

Type: int
Example: 2000

The INSERT_SIZE field indicates the expected insert size of the clone that is sequenced. It is understood that this is an estimate based upon the average insert sizes found in a given library. However, this information is critical for certain experiments, such as whole genome assembly.

This field would be required for the following combination of STRATEGY and TRACE_TYPE_CODE:
STRATEGY=Any;TRACE_TYPE_CODE=WGS
STRATEGY=Any;TRACE_TYPE_CODE=WCS
STRATEGY=cDNA;TRACE_TYPE_CODE=CLONEEND
STRATEGY=CLONEEND;TRACE_TYPE_CODE=CLONEEND

INSERT_STDEV *

Approximate standard deviation of value in INSERT_SIZE field.

Type: int
Example: 200

The INSERT_STDEV field reflects the approximate standard deviation of the insert size. It is understood that this information is an approximation and may change as better data is obtained. This field would be required for the following combination of STRATEGY and TRACE_TYPE_CODE:
STRATEGY=Any;TRACE_TYPE_CODE=WGS
STRATEGY=Any;TRACE_TYPE_CODE=WCS
STRATEGY=cDNA;TRACE_TYPE_CODE=CLONEEND
STRATEGY=CLONEEND;TRACE_TYPE_CODE=CLONEEND

LATITUDE *

The latitude measurement (using standard GPS notation) from which a sample was collected.

Type: float
Example: 54.736

The LATITUDE field is required to describe the collection of some environmental sample data. The latitude range is [-90,90] with the equator as 0 latitude, and positive values of latitude are north of the equator. This field would be required for the following combination of STRATEGY and TRACE_TYPE_CODE:
STRATEGY=Env Sample-Geo;TRACE_TYPE_CODE=Any

LIBRARY_ID *

The source of the clone identified in the CLONE_ID field

Type: varchar(100)
Example: RP23

The LIBRARY_ID field documents the source library of the archival clone resource. Many genomic libraries have been registered with the Clone Registry (http://www.ncbi.nlm.nih.gov/clone) and the standard nomenclature (http://www.ncbi.nlm.nih.gov/clone/content/overview/) should be used for these libraries. This field would be required for the following combination of STRATEGY and TRACE_TYPE_CODE:
STRATEGY=cDNA;TRACE_TYPE_CODE=Any
STRATEGY=EST;TRACE_TYPE_CODE=Any
STRATEGY=CLONEEND;TRACE_TYPE_CODE=CLONEEND
STRATEGY=CLONE;TRACE_TYPE_CODE=Any
STRATEGY=ENCODE;TRACE_TYPE_CODE=SHOTGUN; PrimerWalk; CLONEEND

LONGITUDE *

The longitude measurement (using standard GPS notation) from which a sample was collected.

Type: float
Example: -86.403

The LONGITUDE field is required to describe the collection of some environmental sample data. Longitude ranges from 0° at the Prime Meridian to +180° eastward and -180° westward.
This field would be required for the following combination of STRATEGY and TRACE_TYPE_CODE:
STRATEGY=Env Sample-Geo; TRACE_TYPE_CODE=Any
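
The stated ranges for LATITUDE and LONGITUDE translate into a simple bounds check. This is a minimal sketch with a hypothetical function name, assuming both values are decimal degrees as in the examples.

```python
def validate_coordinates(latitude: float, longitude: float) -> None:
    """Raise ValueError if either coordinate is outside its documented range."""
    if not -90.0 <= latitude <= 90.0:
        raise ValueError(f"LATITUDE out of range [-90, 90]: {latitude}")
    if not -180.0 <= longitude <= 180.0:
        raise ValueError(f"LONGITUDE out of range [-180, 180]: {longitude}")


# The example values from the LATITUDE and LONGITUDE fields pass the check.
validate_coordinates(54.736, -86.403)
```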

LO_FILTER_SIZE

The smallest filter size used to stratify an environmental sample.

Type: varchar(50)
Example: 25 micron

The LO_FILTER_SIZE field is only applicable to environmental sample data but is not a required field.

NCBI_PROJECT_ID

BioProject ID generated by the INSDC.

Type: int
Example: 7

The NCBI_PROJECT_ID field links traces to the BioProject database, making it easy to retrieve sets of traces from each project. Genome sequencing centers may register their project with DDBJ BioProject prior to the submission of genomic sequence data. Submitters need not submit sequencing data at the time they register their project.

ORGANISM_NAME *

Description of species for BARCODE project from which trace is derived.

Type: varchar(100)
Example: Acanthocybium solandri

The ORGANISM_NAME field is used to classify the read by species for BARCODE data, using the proper taxonomic name in accordance with the Taxonomy Browser. SPECIES_CODE=“BARCODESPECIES” for all traces from this project. This field would be required for STRATEGY=BARCODE.

PEAK_FILE

Name of file that contains the list of peak values.

Type: varchar(200)
Example: ./mytraces/123clone.peak

Consult the BASE_FILE field description for more information.

PH

The pH at which an environmental sample was collected.

Type: float
Example: 7.2

The PH field is only applicable to environmental sample data but is not a required field.

PICK_GROUP_ID: Id to group traces picked at the same time.
Type: int
Example: 939065

PLACE_NAME

Country in which the biological sample was collected and/or common name for a given location.

Type: varchar(250)
Example: Octopus Springs

The PLACE_NAME field is applicable to environmental sample data, but is not required.

PLATE_ID

Submitter defined plate id.

Type: varchar(32)
Example: 203

The PLATE_ID and WELL_ID fields are intended to identify the storage location of the sequencing template (not the library well coordinate of an archival clone named in the CLONE_ID field). This may enable flipped or contaminated trays to be easily identified. If a particular experiment did not require the use of a plate, please populate this field with ‘0’.

POPULATION_ID

Center provided id to designate a population from which a trace (or group of traces) was derived.

Type: varchar(100)
Example: CEPH

The POPULATION_ID field is used to capture center specific designations of groups of individuals. This will likely only be useful in population studies (usually STRATEGY=SNP).

PREP_GROUP_ID: ID that defines groups of traces prepared at the same time.
Type: varchar(30)
Example: A2

PRIMER

The primer sequence (used in the sequencing reaction).

Type: varchar(200)
Example: GAATACCTACGATCGCC

The value of the PRIMER field is the actual base sequence of the sequencing primer used. If a center uses a primer extensively, the primer sequence can be entered into the list of primer codes and the PRIMER_CODE field can be used.

PRIMER_CODE: Identifier for the sequencing primer used.
Type: varchar(30)
Example: Sp6

PRIMER_LIST *

A ‘;’ delimited list of primers used in a mapping experiment (such as AFLP).

Type: varchar(100)
Example: AAGGTCTGCGCGTGTC;AGCTGCGTACGTAATCG;

This field is required if STRATEGY=“AFLP” and TRACE_TYPE_CODE=“PCR”.

PROGRAM_ID *

The program used to create the trace file.

Type: varchar(100)
Example: phred-19990722h

The PROGRAM_ID field is used to indicate the base calling program. This field is free text. Program name, version numbers or dates are very useful.
More example values:

phred-19980904e
abi-3.1
ATQA
TraceTuner
Licor
Megabase
Beckman

PROJECT_NAME

Term by which to group traces from different centers based on a common project.

Type: varchar(50)
Example: New Project

In this way sequencing centers that are working on the same large project can group all of the traces for this project using a common term. This field has a controlled vocabulary. Sequencing centers wishing to submit data must contact the DDBJ Trace Archive to determine a name that all members of the project agree on.

QUAL_FILE

Name of file containing the quality scores.

Type: varchar(200)
Example: ./mytraces/123clone.fasta.qs

Trace files which do not include the quality scores must provide this information in a separate file. The file designations are recorded in the QUAL_FILE field of the metadata file. The actual quality scores are stored in the file designated in the QUAL_FILE field. If quality scores are provided in separate files, the information in these files will overwrite any information in the trace (usually *.scf) file. If the quality scores that would be provided in the QUAL_FILE are the same as the information in the trace file, DO NOT PROVIDE THE FILE. However, it is important to note that if a format does not include the quality scores, then these values must be provided as ancillary information. If the center provides the BASE_FILE and QUAL_FILE, then the peak index information should also be provided in a file called PEAK_FILE.

REFERENCE_ACCESSION *

Reference accession (use accession and version to specify a particular instance of a sequence) used as the basis for a re-sequencing project. For the Comparative strategy, this is the sequence used as the basis for primer design.

Type: varchar(50)
Example: NT_029829.1

This field is required for the following combination of STRATEGY and TRACE_TYPE_CODE:
STRATEGY=Re-sequencing; Comparative; TRACE_TYPE_CODE=Any

REFERENCE_ACC_MAX *

Finish position for a particular amplicon in re-sequencing or comparative projects.

Type: int
Example: 30929

This field points to the finishing coordinate of the accession.version described in the REFERENCE_ACCESSION field. All coordinates should be in 1 base coordinates (i.e. sequences start at base 1, not base 0). This field is required for the following combination of STRATEGY and TRACE_TYPE_CODE:
STRATEGY=Re-sequencing; TRACE_TYPE_CODE=SHOTGUN; PCR;RT-PCR

REFERENCE_ACC_MIN *

Start position for a particular amplicon in re-sequencing or comparative projects.

Type: int
Example: 29829

This field points to the starting coordinate of the accession.version described in the REFERENCE_ACCESSION field. All coordinates should be in 1 base coordinates (i.e. sequences start at base 1, not base 0). This field is required for the following combination of STRATEGY and TRACE_TYPE_CODE:
STRATEGY=Re-sequencing; TRACE_TYPE_CODE=SHOTGUN; PCR;RT-PCR
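
Because the coordinates are 1-based, converting them to an amplicon length or to a 0-based slice needs an offset of one. The sketch below assumes the finish position is inclusive, which the field descriptions imply but do not state outright; both function names are hypothetical.

```python
def amplicon_length(acc_min: int, acc_max: int) -> int:
    """Length of the amplicon, assuming 1-based inclusive coordinates."""
    if acc_min < 1 or acc_max < acc_min:
        raise ValueError("expected 1 <= REFERENCE_ACC_MIN <= REFERENCE_ACC_MAX")
    return acc_max - acc_min + 1


def to_zero_based_slice(acc_min: int, acc_max: int) -> slice:
    """Equivalent 0-based, half-open slice for indexing a sequence string."""
    return slice(acc_min - 1, acc_max)


# Example values taken from the REFERENCE_ACC_MIN and REFERENCE_ACC_MAX fields.
print(amplicon_length(29829, 30929))
```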

REFERENCE_OFFSET *

Sequence offset of accession specified in REFERENCE_ACCESSION field to define the coordinate start position used as the basis for a re-sequencing project.

Type: int
Example: 1520899

REFERENCE_SET_MAX

Finish position for an entire re-sequencing region. This region may include several amplicons.

Type: int
Example: 29829

This field points to the finishing coordinate of the accession.version described in the REFERENCE_ACCESSION field for an entire re-sequencing region. All coordinates should be in 1 base coordinates (i.e. sequences start at base 1, not base 0). The REFERENCE_ACC_[MIN|MAX] and REFERENCE_SET_[MIN|MAX] should refer to the same REFERENCE_ACCESSION.

REFERENCE_SET_MIN

Start position for an entire re-sequencing region. This region may include several amplicons.

Type: int
Example: 29829

RUN_DATE: Date the sequencing reaction was run.
Type: datetime
Example: 2000-10-28

RUN_GROUP_ID: ID used to group traces run on the same machine.
Type: varchar(30)
Example: group2

RUN_LANE

Lane or capillary of the trace.

Type: int
Example: 1

The RUN_LANE documents the specific lane or capillary on which a trace was obtained.

RUN_MACHINE_ID: ID of the specific sequencing machine on which a trace was obtained.
Type: varchar(30)
Example: machine2

RUN_MACHINE_TYPE: Type or model of machine on which a trace was obtained.
Type: varchar(30)
Example: ABI 310

SALINITY: The salinity at which an environmental sample was collected measured in parts per thousand units (promille).
Type: float
Example: 20

The SALINITY field is only applicable to environmental sample data but is not a required field.

SEQ_LIB_ID *

Center specified M13/PUC library that is actually sequenced.

Type: varchar(255)
Example: 22194

The SEQ_LIB_ID field is the center identifier for the M13/PUC based clone that is actually sequenced. This will allow grouping of traces by the actual ligation event and is applicable to most projects. This value will be unique within a given center. This field would be required for the following combination of STRATEGY and TRACE_TYPE_CODE:
STRATEGY=Any;TRACE_TYPE_CODE=SHOTGUN
STRATEGY=Any;TRACE_TYPE_CODE=WGS/WCS

SOURCE_TYPE *

Source of the DNA.

Type: varchar(50)
Example: GENOMIC DNA

The SOURCE_TYPE field consists of a code. Possible values are:

G=Genomic DNA (includes PCR products from genomic DNA)
N=Non Genomic DNA (EST, cDNA, RT-PCR, screened libraries)
VIRAL RNA=Viral RNA
SYNTHETIC=Synthetic DNA

Accepted values are G, N, GENOMIC, NON GENOMIC, VIRAL RNA, SYNTHETIC

SPECIES_CODE *

Description of species from which trace is derived.

Type: varchar(100)
Example: Homo sapiens

The SPECIES_CODE field is used to classify the read by species, using proper taxonomic names where possible. This field is currently maintained as a controlled vocabulary. For a list of species currently contained within the Trace Archive, see: http://www.ncbi.nlm.nih.gov/Traces/trace.cgi?cmd=stat&f=xml_list_species&m=obtain&s=species
To submit a new species, please contact the DDBJ Trace Archive prior to submission. When the taxonomic origin of a specific trace is unclear, the taxonomic classification ‘ENVIRONMENTAL SEQUENCE’ can be used for environmental samples or ‘ARTIFICIAL SEQUENCE’ for artificial material.

STRAIN *

Strain from which a trace is derived.

Type: varchar(50)
Example: C57BL/6J

STRAIN is required for STRATEGY=“SNP”.

STRATEGY *

Experimental STRATEGY.

Type: varchar(50)
Example: MODEL VERIFY

Experimental STRATEGY used when obtaining the trace. It is proposed that this would be a controlled vocabulary, but that submitters would contribute to this list as needed to define various experiments and projects.

Current values (this list would continually be expanding):

AFLP: Amplified Fragment Length Polymorphism
BARCODE: DNA sequence analysis of a uniform target gene to enable species identification
CCS: Concatenated cDNA sequencing
cDNA: Sequences generated in the process of sequencing cDNA clones
CF-S: Cot-filtered single/low-copy genomic DNA
CF-M: Cot-filtered moderately repetitive genomic DNA
CF-H: Cot-filtered highly repetitive genomic DNA
CF-T: Cot-filtered theoretical single-copy DNA
CLONE: Genomic clone based (hierarchical) sequencing
CLONEEND: Sequences generated from the end of a clone (BAC/PAC/Fosmid or cDNA)
Comparative: Sequences obtained using primers designed from related species
CTS: Concatenated Tag Sequencing
Env Sample-GEO: Geographically generated environmental sample
Env Sample-Host: Environmental samples collected from a specific host
EST: single pass sequencing of cDNA templates
FINISHING: a read specifically made for finishing, could be either BAC finishing or Whole Genome Assembly (WGA) finishing
MODEL VERIFY: Sequences obtained to verify proposed gene models
PoolClone: Pools of clones (BACs mostly)
SNP: Reads used for SNP identification
TARGETED LOCUS: Sequences obtained from templates generated by primers designed to amplify a specific genetic locus
Re-sequencing: Re-sequencing of targeted genomic regions
RT-PCR: Sequences obtained using templates generated by Reverse Transcriptase Polymerase Chain Reaction
WGA: Whole Genome Assembly

SUBMISSION_TYPE *

Type of submission.

Type: varchar(50)
Example: NEW

The SUBMISSION_TYPE field allowed values:

NEW: use to submit new data
UPDATE: use to renew traces and their ancillary information. Previous data will be saved with their TIs; new traces with the same trace_names will receive new TIs and become active
UPDATEINFO: use to update or add ancillary information for already existing traces without re-submitting the entire package of data
WITHDRAW: use to withdraw traces

SVECTOR_ACCESSION: DDBJ/EMBL/Genbank accession of the sequencing vector.
Type: varchar(50)
Example: X52325

SVECTOR_CODE: Center defined code for the sequencing vector.
Type: varchar(50)
Example: pBluescript SK(+)

TEMPERATURE

The temperature (in °C) at which an environmental sample was collected.

Type: float
Example: 30

The TEMPERATURE field is only applicable to environmental sample data but is not a required field.

TEMPLATE_ID

Submitter defined identifier for the sequencing template.

Type: varchar(50)
Example: HBBBA2211

The TEMPLATE_ID field is used to uniquely identify the actual template that is sequenced. This field, in conjunction with the TRACE_END field, can be used to identify traces that should be marked as ‘mate_pairs’ because they come from opposite ends of the same clone.

TRACE_END

Defines the end of the template contained in the read.

Type: varchar(50)
Example: F

The TRACE_END field can have the following values:

F: FORWARD
R: REVERSE
N: UNKNOWN
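
As an illustration of how TEMPLATE_ID and TRACE_END can be combined to find mate pairs, the following Python sketch pairs one forward and one reverse read per template. The function name and the dictionary keys are assumptions made for this example; the pairing logic used by the archive itself is not described in this document.

```python
from collections import defaultdict


def find_mate_pairs(traces):
    """Pair F and R reads that share a TEMPLATE_ID.

    `traces` is a list of dicts with the (assumed) keys
    'trace_name', 'template_id' and 'trace_end'.
    """
    by_template = defaultdict(lambda: {"F": [], "R": []})
    for trace in traces:
        end = trace.get("trace_end", "N")
        if end in ("F", "R"):  # reads with TRACE_END=N cannot be paired
            by_template[trace["template_id"]][end].append(trace["trace_name"])

    pairs = []
    for template_id, ends in by_template.items():
        # Keep only unambiguous templates: exactly one read from each end
        if len(ends["F"]) == 1 and len(ends["R"]) == 1:
            pairs.append((template_id, ends["F"][0], ends["R"][0]))
    return pairs
```

Templates with more than one read from the same end are skipped here rather than guessed at; a real pipeline would need its own rule for those cases.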

TRACE_FILE *: Filename with the trace, relative to the top of the volume.
Type: varchar(200)
Example: ./traces/TRACE001.scf

TRACE_FORMAT *

Format of the trace file.

Type: varchar(20)
Example: scf

The TRACE_FORMAT field can have the following values:

SCF - A standard file format for data from DNA sequencing instruments.
ABI - An ABI trace file is a binary file that includes the trace data and the sequence.

TRACE_NAME *

Center defined trace identifier.

Type: varchar(250)
Example: HBBBA1U2211

The TRACE_NAME field must be unique within a center, but is not required to be unique between centers. The combination of TRACE_NAME and CENTER_NAME acts as a unique key within the Trace Archive.
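
Before submission, the uniqueness rule can be checked locally. The sketch below is a minimal example, assuming the records are available as dictionaries with 'center_name' and 'trace_name' keys; the function name is hypothetical.

```python
from collections import Counter


def duplicate_keys(records):
    """Return (center_name, trace_name) pairs that occur more than once."""
    counts = Counter((r["center_name"], r["trace_name"]) for r in records)
    return sorted(key for key, n in counts.items() if n > 1)
```

The same TRACE_NAME under two different centers is not reported, since only the combination has to be unique.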

TRACE_TYPE_CODE *

Sequencing strategy by which the trace was obtained.

Type: varchar(50)
Example: wgs

The field TRACE_TYPE_CODE reflects the sequencing STRATEGY used to obtain the trace.

Current values:

CHIP: Sequences obtained using microarrays (also called DNA chips or gene chips)
CLONEEND: Sequences generated from the end of a large insert (BAC/PAC/Fosmid) or cDNA clone
EST: Single Pass Expressed Sequence Tag
HTP SELEX: High throughput SELEX
OTHER: Other than PCR, PrimerWalk, SHOTGUN or TRANSPOSON for FINISHING STRATEGY
PCR: Sequences obtained using templates generated by genomic Polymerase Chain Reaction
PrimerWalk: Sequences generated through a primer walking step
RT-PCR: Sequences obtained using templates generated by Reverse Transcriptase Polymerase Chain Reaction
SHOTGUN: Shotgun sequencing of clones (genomic or cDNA)
TRANSPOSON: Sequences obtained using templates generated by transposons
WCS: Whole Chromosome Shotgun
WGS: Whole Genome Shotgun

TRANSPOSON_ACC *

DDBJ/EMBL/Genbank accession for transposon used in generating sequencing template.

Type: varchar(50)
Example: X00913

The TRANSPOSON_ACC field is required for the following combination of STRATEGY and TRACE_TYPE_CODE:
STRATEGY=Any;TRACE_TYPE_CODE=TRANSPOSON

TRANSPOSON_CODE *

Center defined code for transposon used in generating sequencing template.

Type: varchar(50)
Example: Mu transposon

The TRANSPOSON_CODE field is required for the following combination of STRATEGY and TRACE_TYPE_CODE:
STRATEGY=Any;TRACE_TYPE_CODE=TRANSPOSON
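
Conditional requirements of this kind can be expressed as a small lookup table. The sketch below encodes only the TRANSPOSON rule stated for TRANSPOSON_ACC and TRANSPOSON_CODE; the table name, function name and lowercase record keys are assumptions for illustration, and the Validation Table remains the authoritative source for all other combinations.

```python
# Fields required for a given TRACE_TYPE_CODE, regardless of STRATEGY.
# Only the rule described in this document is listed here.
CONDITIONAL_FIELDS = {
    "TRANSPOSON": ("transposon_acc", "transposon_code"),
}


def missing_conditional_fields(record):
    """Return the conditionally required fields that are absent or empty."""
    trace_type = (record.get("trace_type_code") or "").upper()
    required = CONDITIONAL_FIELDS.get(trace_type, ())
    return [name for name in required if not record.get(name)]
```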

WELL_ID

Center defined well identifier for the sequencing reaction.

Type: varchar(50)
Example: A1

The field WELL_ID, in combination with the field PLATE_ID, is used to define the storage location of the sequencing reaction (see the note with the field PLATE_ID). Typically, sequencing reactions are performed in standard microtiter dishes having either 96 or 384 wells (see standard configurations below).
Figure: Standard 96 well microtiter configuration
Figure: Standard 384 well microtiter configuration
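
Because WELL_ID is center defined, its exact syntax can vary. The following sketch assumes the common convention of a row letter followed by a column number (as in the example A1), with 8 x 12 wells on a 96 well plate and 16 x 24 wells on a 384 well plate. The function name and the returned zero-based coordinates are choices made for this example.

```python
import re

# (rows, columns) of the standard microtiter layouts
PLATE_LAYOUTS = {96: (8, 12), 384: (16, 24)}


def parse_well_id(well_id, plate_size=96):
    """Convert a WELL_ID such as 'A1' into zero-based (row, column)."""
    rows, cols = PLATE_LAYOUTS[plate_size]
    match = re.fullmatch(r"([A-Za-z])(\d{1,2})", well_id.strip())
    if match is None:
        raise ValueError(f"unrecognised WELL_ID: {well_id!r}")
    row = ord(match.group(1).upper()) - ord("A")
    col = int(match.group(2)) - 1
    if row >= rows or not 0 <= col < cols:
        raise ValueError(f"{well_id!r} is outside a {plate_size} well plate")
    return row, col
```

For example, `parse_well_id("H12")` gives the last well of a 96 well plate, while `parse_well_id("P24", plate_size=384)` gives the last well of a 384 well plate.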

Internal Fields List

BASECALL_LENGTH: Length of the trace in base pairs.
Type: int
Example: 396

BASES_20

Number of base pairs for which the quality score exceeds 20.

Type: smallint
Example: 50

Warning: There are some depositions that do not have quality scores. This is likely due to the center submitting ABI files and not providing quality calls separately.

BASES_40

Number of base pairs for which the quality score exceeds 40.

Type: smallint
Example: 50

Warning: There are some depositions that do not have quality scores. This is likely due to the center submitting ABI files and not providing quality calls separately.

BASES_60

Number of base pairs for which the quality score exceeds 60.

Type: smallint
Example: 50

Warning: There are some depositions that do not have quality scores. This is likely due to the center submitting ABI files and not providing quality calls separately.
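
The three BASES_* counts can be reproduced from a list of per-base quality scores. This is a minimal sketch: the function name is hypothetical, and the strict comparison follows the word "exceeds" in the field descriptions, since the exact rule used at load time is not stated here.

```python
def count_bases_above(quality_scores, thresholds=(20, 40, 60)):
    """Return {threshold: number of bases whose score exceeds it}."""
    return {t: sum(1 for q in quality_scores if q > t) for t in thresholds}


scores = [12, 25, 41, 41, 60, 61]
print(count_bases_above(scores))  # {20: 5, 40: 4, 60: 1}
```

A read without quality scores yields zero for every threshold in this sketch, which is not the same as "no quality data"; callers may want to treat an empty list separately.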

LOAD_DATE: Date on which the data was loaded.
Type: smalldatetime
Example: Jan 8 2001 11:59AM

MATE_PAIR

TIs of the reads obtained from the other end of the same template.

Type: int
Example: 203682255

MATE PAIR is the pair of reads obtained from two ends of the same template (FORWARD and REVERSE).

REPLACED_BY

TI that replaced the current TI as “active”.

Type: int
Example: 304753779

This field points to the more recent data set. If the trace was updated, the REPLACED_BY field stores the TI of the new trace. If only ancillary information has been updated, then replaced_by=0 and the field is not shown.
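
If a trace has been updated more than once, the current record can be found by following REPLACED_BY until it is 0. The sketch below assumes the TI to REPLACED_BY relation has already been loaded into a plain dictionary; that mapping and the function name are hypothetical and not part of any archive interface.

```python
def resolve_active_ti(ti, replaced_by):
    """Follow REPLACED_BY links from `ti` to the most recent TI.

    `replaced_by` maps a TI to the TI that replaced it, with 0
    (or a missing entry) meaning the record was never replaced.
    """
    seen = set()
    while replaced_by.get(ti, 0) != 0:
        if ti in seen:  # guard against inconsistent, circular data
            raise ValueError(f"REPLACED_BY cycle detected at TI {ti}")
        seen.add(ti)
        ti = replaced_by[ti]
    return ti
```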

STATE

Indicates the status of the trace.

Type: varchar
Example: active

Possible values:

active
updated
withdrawn

TAXID

NCBI Taxonomy ID.

Type: int
Example: 10090

This field links DDBJ Trace Archive with NCBI Taxonomy Browser.

TI

Trace unique internal Identifier.

Type: int
Example: 304753779

It is assigned to a record at the loading stage, and any record, or any number of records, can be obtained by their identifiers.

UPDATE_DATE

Date on which the data was updated/replaced.

Type: smalldatetime
Example: Jul 19 2001 3:48PM

This field is used to store the date of the last update.

Submit trace data

Data submission of human subjects research
For all data from human subjects research submitted to DDBJ, it is the submitter’s responsibility to ensure that the privacy of participants (human subjects) is protected in accordance with all applicable laws, regulations, and the policies of the submitter’s institution.
In principle, remove any direct personal identifiers of human subjects from your submissions.
Before submitting data from human subjects research, read “Data submission of human subjects research”.

Create submission files

The metadata file (TRACEINFO file) describes the submitted data and points to the location of the chromatograms. When extracted, every submission should have a single top directory, and all metadata files should be placed directly under it. If the submission contains trace files, create at least one subdirectory under the top directory and place all trace files there: trace files (in either SCF or ABI format) should not appear in the top-level directory. We suggest naming subdirectories after the traces or the project. Nested subdirectories are allowed and encouraged as a way to group traces. Below is an example of the submission directory hierarchy.

Submission directory hierarchy example

TOP_DIRECTORY/
TOP_DIRECTORY/TRACEINFO
TOP_DIRECTORY/traces
TOP_DIRECTORY/traces/FLJ/
TOP_DIRECTORY/traces/FLJ/FLJA1U0001.scf
TOP_DIRECTORY/traces/FLJ/FLJA1U0002.scf
TOP_DIRECTORY/traces/FLJ/FLJA1U0003.scf
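
The placement rule for trace files can be checked against each TRACE_FILE value before upload. The following sketch is an illustration only: the function name and the messages are made up for this example, and the checks for absolute paths and ".." components are extra assumptions that follow from TRACE_FILE being relative to the top of the volume.

```python
from pathlib import PurePosixPath


def trace_path_problems(trace_file):
    """Return a list of layout problems for one TRACE_FILE value."""
    path = PurePosixPath(trace_file)  # a leading "./" is dropped
    problems = []
    if path.is_absolute():
        problems.append("path must be relative to the top directory")
    if ".." in path.parts:
        problems.append("path must not leave the top directory")
    if len(path.parts) < 2:
        problems.append("trace file must be inside a subdirectory")
    return problems
```

An empty list means the path is consistent with the hierarchy shown above, for example `./traces/FLJ/FLJA1U0001.scf`.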

The metadata file can be in either XML or tab-delimited format. The metadata requirements for specific combinations of STRATEGY and TRACE_TYPE_CODE are given in the Validation Table (spreadsheet format). Both types of metadata file can begin with a common fields section, which defines values shared by all traces in the submission, if any.

Below are examples of TRACEINFO metadata files.

TRACEINFO xml example

<?xml version="1.0"?>
<trace_volume>
   <common_fields>
      <center_name>CENTER NAME ACRONYM IS HERE</center_name>
      <center_project>FLJ</center_project>
      <source_type>N</source_type>
      <species_code>HOMO SAPIENS</species_code>
      <strategy>EST</strategy>
      <submission_type>NEW</submission_type>
      <trace_format>SCF</trace_format>
      <trace_type_code>EST</trace_type_code>
   </common_fields>
   <trace>
      <trace_name>F-3NB691000020</trace_name>
      <trace_file>./traces/F-3NB691000020.scf</trace_file>
      <clone_id>3NB691000020</clone_id>
      <library_id>3NB691</library_id>
      <template_id>3NB691000020</template_id>
   </trace>
   <trace>
      <trace_name>F-3NB691000033</trace_name>
      <trace_file>./traces/F-3NB691000033.scf</trace_file>
      <clone_id>3NB691000033</clone_id>
      <library_id>3NB691</library_id>
      <template_id>3NB691000033</template_id>
   </trace>
     --- more information ---
</trace_volume>

TRACEINFO tab-delimited text example

center_name = CENTER NAME ACRONYM IS HERE
center_project = FLJ
source_type = N
species_code = HOMO SAPIENS
strategy = EST
submission_type = NEW
trace_format = SCF
trace_type_code = EST
trace_name  clone_id    library_id  template_id trace_file
F-3NB691000020  3NB691000020    3NB691  3NB691000020    ./traces/F-3NB691000020.scf
F-3NB691000033  3NB691000033    3NB691  3NB691000033    ./traces/F-3NB691000033.scf
--- more information ---            
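
The two TRACEINFO layouts carry the same information, so one can be derived from the other. The sketch below reads the tab-delimited form and renders the XML form using only the Python standard library. It assumes that common fields are written as "name = value" lines before a tab-separated header row, as in the example above; the function names are hypothetical and no validation against the Validation Table is attempted.

```python
import xml.etree.ElementTree as ET


def parse_traceinfo(text):
    """Split a tab-delimited TRACEINFO into (common_fields, trace_rows)."""
    common, header, rows = {}, None, []
    for line in text.splitlines():
        if not line.strip():
            continue
        if header is None and "=" in line and "\t" not in line:
            # "name = value" line in the common fields section
            key, value = line.split("=", 1)
            common[key.strip()] = value.strip()
        elif header is None:
            # The first tab-separated line is taken as the column header
            header = [name.strip() for name in line.split("\t")]
        else:
            values = [value.strip() for value in line.split("\t")]
            rows.append(dict(zip(header, values)))
    return common, rows


def to_xml(common, rows):
    """Render parsed TRACEINFO data in the XML layout."""
    root = ET.Element("trace_volume")
    common_node = ET.SubElement(root, "common_fields")
    for name, value in common.items():
        ET.SubElement(common_node, name).text = value
    for row in rows:
        trace_node = ET.SubElement(root, "trace")
        for name, value in row.items():
            ET.SubElement(trace_node, name).text = value
    if hasattr(ET, "indent"):  # pretty-printing, available from Python 3.9
        ET.indent(root, space="   ")
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")
```

Calling `to_xml(*parse_traceinfo(text))` on a tab-delimited file produces a `trace_volume` document with one `trace` element per data row.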

Upload submission files

DTA creates a directory for data submission. Please contact the DTA team. Transfer files by SCP according to the manual.

Submission directory example

submission/submitter_id/dta/dta_submitter_id-0001

Directory for the DTA submission is separated from those for the DDBJ Sequence Read Archive.

Completion of submission

Once the submission files are complete, DTA can keep the data private until the submitter instructs us to release them. After the release instruction, DTA uploads the files to the NCBI Trace Archive. As soon as the data are loaded into the NCBI Trace Archive, TI numbers are assigned and the data become public.

Please note that TI number assignment and data release are concurrent events.

Update

To update the records, please contact the DTA team.