Sequence Read Archive

Example of metadata

The DRA metadata submission tool cannot describe technical reads (adapter, primer and barcode sequences). When the sequencing data to be submitted contain technical reads, submitters need to create Experiment XML files and describe the technical reads in the <SPOT_DESCRIPTOR> element.

Experiment (Spot, Platform)

454 single reads

Read

Read composition

Read Index      : 0        1
Read            : TCAG     ATAGAGTTGATCCTGGCTCAT……………
Base Coordinate : 1        5 (spot ends at 80)
Read Type       : Adapter  Forward

Metadata

Spot (Read Spec)

Read Index  Read Class        Read Type  Ordering Method
0           Technical Read    Adapter    BaseCoord = 1
1           Application Read  Forward    BaseCoord = 5

Experiment XML (SPOT_DESCRIPTOR)

<SPOT_DESCRIPTOR>
  <SPOT_DECODE_SPEC>
    <READ_SPEC>
      <READ_INDEX>0</READ_INDEX>
      <READ_CLASS>Technical Read</READ_CLASS>
      <READ_TYPE>Adapter</READ_TYPE>
      <BASE_COORD>1</BASE_COORD>
    </READ_SPEC>
    <READ_SPEC>
      <READ_INDEX>1</READ_INDEX>
      <READ_CLASS>Application Read</READ_CLASS>
      <READ_TYPE>Forward</READ_TYPE>
      <BASE_COORD>5</BASE_COORD>
    </READ_SPEC>
  </SPOT_DECODE_SPEC>
</SPOT_DESCRIPTOR>
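
The mapping from the Read Spec table to the XML above can be sketched in Python. This is only an illustration, not a DRA tool: the function name `build_spot_descriptor` and the row format are assumptions made for the example, and only the standard library is used.

```python
import xml.etree.ElementTree as ET


def build_spot_descriptor(read_specs, spot_length=None):
    """Build a SPOT_DESCRIPTOR element from (read_class, read_type, base_coord) rows.

    Hypothetical helper for illustration; READ_INDEX is taken from the row order.
    """
    descriptor = ET.Element("SPOT_DESCRIPTOR")
    decode_spec = ET.SubElement(descriptor, "SPOT_DECODE_SPEC")
    if spot_length is not None:
        ET.SubElement(decode_spec, "SPOT_LENGTH").text = str(spot_length)
    for index, (read_class, read_type, base_coord) in enumerate(read_specs):
        spec = ET.SubElement(decode_spec, "READ_SPEC")
        ET.SubElement(spec, "READ_INDEX").text = str(index)
        ET.SubElement(spec, "READ_CLASS").text = read_class
        ET.SubElement(spec, "READ_TYPE").text = read_type
        ET.SubElement(spec, "BASE_COORD").text = str(base_coord)
    return descriptor


# Rows of the 454 single read example: a 4-base adapter, then the forward read.
rows_454_single = [
    ("Technical Read", "Adapter", 1),
    ("Application Read", "Forward", 5),
]

descriptor = build_spot_descriptor(rows_454_single)
if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
    ET.indent(descriptor, space="  ")
xml_text = ET.tostring(descriptor, encoding="unicode")
print(xml_text)
```

Passing `spot_length=72` with a single forward row would produce a descriptor shaped like the fixed-length examples on this page.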

Platform

Platform  Instrument Model
LS454     454 GS FLX Titanium

454 paired reads

Read

Read composition

Read Index      : 0        1                  2                   3
Read            : TCAG     ATAGAGT……………CCTGG  TCGTAT……………TATTACG  CTCAT……………
Base Coordinate : 1        5
Read Type       : Adapter  Forward            Linker              Forward

Metadata

Spot (Read Spec)

Read Index  Read Class        Read Type  Ordering Method
0           Technical Read    Adapter    BaseCoord = 1
1           Application Read  Forward    BaseCoord = 5
2           Technical Read    Linker     ExpectedBasecallTable
3           Application Read  Forward    RelativeOrder

Expected Basecall Table

Base Call                                   Min Match  Max Mismatch  Match Edge
TCGTATAACTTCGTATAATGTATGCTATACGAAGTTATTACG  38         5             full
CGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA  38         5             full

Experiment XML (SPOT_DESCRIPTOR)

<SPOT_DESCRIPTOR>
  <SPOT_DECODE_SPEC>
    <READ_SPEC>
      <READ_INDEX>0</READ_INDEX>
      <READ_CLASS>Technical Read</READ_CLASS>
      <READ_TYPE>Adapter</READ_TYPE>
      <BASE_COORD>1</BASE_COORD>
    </READ_SPEC>
    <READ_SPEC>
      <READ_INDEX>1</READ_INDEX>
      <READ_CLASS>Application Read</READ_CLASS>
      <READ_TYPE>Forward</READ_TYPE>
      <BASE_COORD>5</BASE_COORD>
    </READ_SPEC>
    <READ_SPEC>
      <READ_INDEX>2</READ_INDEX>
      <READ_CLASS>Technical Read</READ_CLASS>
      <READ_TYPE>Linker</READ_TYPE>
      <EXPECTED_BASECALL_TABLE>
        <BASECALL min_match="38" max_mismatch="5" match_edge="full">TCGTATAACTTCGTATAATGTATGCTATACGAAGTTATTACG</BASECALL>
        <BASECALL min_match="38" max_mismatch="5" match_edge="full">CGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA</BASECALL>
      </EXPECTED_BASECALL_TABLE>          
    </READ_SPEC>   
    <READ_SPEC>
      <READ_INDEX>3</READ_INDEX>
      <READ_CLASS>Application Read</READ_CLASS>
      <READ_TYPE>Forward</READ_TYPE>
      <RELATIVE_ORDER follows_read_index="2"/>
    </READ_SPEC>        
  </SPOT_DECODE_SPEC>      
</SPOT_DESCRIPTOR>
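
To see how the three ordering methods appear in the XML, the following Python sketch reads a descriptor like the one above and reports, for each read, which element locates it. The helper name `ordering_method` is an assumption made for this example; the element names come from the XML shown on this page.

```python
import xml.etree.ElementTree as ET

# The 454 paired read descriptor from this page, embedded so the sketch is self-contained.
SPOT_DESCRIPTOR_XML = """
<SPOT_DESCRIPTOR>
  <SPOT_DECODE_SPEC>
    <READ_SPEC>
      <READ_INDEX>0</READ_INDEX>
      <READ_CLASS>Technical Read</READ_CLASS>
      <READ_TYPE>Adapter</READ_TYPE>
      <BASE_COORD>1</BASE_COORD>
    </READ_SPEC>
    <READ_SPEC>
      <READ_INDEX>1</READ_INDEX>
      <READ_CLASS>Application Read</READ_CLASS>
      <READ_TYPE>Forward</READ_TYPE>
      <BASE_COORD>5</BASE_COORD>
    </READ_SPEC>
    <READ_SPEC>
      <READ_INDEX>2</READ_INDEX>
      <READ_CLASS>Technical Read</READ_CLASS>
      <READ_TYPE>Linker</READ_TYPE>
      <EXPECTED_BASECALL_TABLE>
        <BASECALL min_match="38" max_mismatch="5" match_edge="full">TCGTATAACTTCGTATAATGTATGCTATACGAAGTTATTACG</BASECALL>
        <BASECALL min_match="38" max_mismatch="5" match_edge="full">CGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA</BASECALL>
      </EXPECTED_BASECALL_TABLE>
    </READ_SPEC>
    <READ_SPEC>
      <READ_INDEX>3</READ_INDEX>
      <READ_CLASS>Application Read</READ_CLASS>
      <READ_TYPE>Forward</READ_TYPE>
      <RELATIVE_ORDER follows_read_index="2"/>
    </READ_SPEC>
  </SPOT_DECODE_SPEC>
</SPOT_DESCRIPTOR>
"""


def ordering_method(read_spec):
    """Name the ordering method of one READ_SPEC element (hypothetical helper)."""
    base_coord = read_spec.find("BASE_COORD")
    if base_coord is not None:
        return f"BaseCoord = {base_coord.text}"
    if read_spec.find("EXPECTED_BASECALL_TABLE") is not None:
        return "ExpectedBasecallTable"
    if read_spec.find("RELATIVE_ORDER") is not None:
        return "RelativeOrder"
    return "unknown"


root = ET.fromstring(SPOT_DESCRIPTOR_XML.strip())
summary = [
    (int(spec.findtext("READ_INDEX")), spec.findtext("READ_TYPE"), ordering_method(spec))
    for spec in root.iter("READ_SPEC")
]
for row in summary:
    print(row)
```

The printed rows should match the Read Spec table of this example, which makes the sketch a quick way to compare a hand-written descriptor against the intended layout.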

Platform

Platform  Instrument Model
LS454     454 GS FLX Titanium

Illumina single reads

Read

Read composition

Read Index      : 0
Read            : ATAGAGTTGATCCTGG……………CCTGGCTCA
Base Coordinate : 1 (spot ends at 72)
Read Type       : Forward

Metadata

Spot (Read Spec)

Read Index  Read Class        Read Type  Ordering Method
0           Application Read  Forward    BaseCoord = 1

Experiment XML (SPOT_DESCRIPTOR)

<SPOT_DESCRIPTOR>
  <SPOT_DECODE_SPEC>
    <SPOT_LENGTH>72</SPOT_LENGTH>
    <READ_SPEC>
      <READ_INDEX>0</READ_INDEX>
      <READ_CLASS>Application Read</READ_CLASS>
      <READ_TYPE>Forward</READ_TYPE>
      <BASE_COORD>1</BASE_COORD>
    </READ_SPEC>
  </SPOT_DECODE_SPEC>
</SPOT_DESCRIPTOR>

Platform

Platform  Instrument Model              Sequence Length
Illumina  Illumina Genome Analyzer IIx  72

Illumina paired reads

Read

Read composition

Read Index      : 0                      1
Read            : ATAGAGTTGATCCTGG……………  CCTGGCTCATCAGTTGAT……………
Base Coordinate : 1                      101 (spot ends at 200)
Read Type       : Forward                Reverse

Metadata

Spot (Read Spec)

Read Index  Read Class        Read Type  Ordering Method
0           Application Read  Forward    BaseCoord = 1
1           Application Read  Reverse    BaseCoord = 101

Experiment XML (SPOT_DESCRIPTOR)

<SPOT_DESCRIPTOR>
  <SPOT_DECODE_SPEC>
    <SPOT_LENGTH>200</SPOT_LENGTH>
    <READ_SPEC>
      <READ_INDEX>0</READ_INDEX>
      <READ_CLASS>Application Read</READ_CLASS>
      <READ_TYPE>Forward</READ_TYPE>
      <BASE_COORD>1</BASE_COORD>
    </READ_SPEC>
    <READ_SPEC>
      <READ_INDEX>1</READ_INDEX>
      <READ_CLASS>Application Read</READ_CLASS>
      <READ_TYPE>Reverse</READ_TYPE>
      <BASE_COORD>101</BASE_COORD>
    </READ_SPEC>
  </SPOT_DECODE_SPEC>
</SPOT_DESCRIPTOR>
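
The relation between SPOT_LENGTH and the BASE_COORD values can be checked with a little arithmetic. The Python sketch below assumes that each read ends one base before the next read starts and that the last read ends at SPOT_LENGTH, which is how the read composition of this example is laid out; `read_ranges` is a name chosen for the example.

```python
def read_ranges(base_coords, spot_length):
    """Return the inclusive (start, end) range of each read in a fixed-length spot.

    Assumption: a read ends one base before the next BASE_COORD,
    and the last read ends at SPOT_LENGTH.
    """
    ranges = []
    for index, start in enumerate(base_coords):
        if index + 1 < len(base_coords):
            end = base_coords[index + 1] - 1
        else:
            end = spot_length
        ranges.append((start, end))
    return ranges


# Illumina paired reads: BASE_COORD 1 and 101, SPOT_LENGTH 200.
illumina_paired = read_ranges([1, 101], 200)
read_lengths = [end - start + 1 for start, end in illumina_paired]

print(illumina_paired)  # [(1, 100), (101, 200)]
print(read_lengths)     # [100, 100]
```

Under the same assumption, the SOLiD paired example on this page (BASE_COORD 1 and 26, SPOT_LENGTH 50) gives two reads of 25 bases each.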

Platform

Platform  Instrument Model              Sequence Length
Illumina  Illumina Genome Analyzer IIx  200

SOLiD single reads

Read

Read composition

Read Index      : 0
Read            : TTGATCCTGG……………CGCTCA
Base Coordinate : 1 (spot ends at 50)
Read Type       : Forward

Metadata

Spot (Read Spec)

Read Index  Read Class        Read Type  Ordering Method
0           Application Read  Forward    BaseCoord = 1

Experiment XML (SPOT_DESCRIPTOR)

<SPOT_DESCRIPTOR>
  <SPOT_DECODE_SPEC>
    <SPOT_LENGTH>50</SPOT_LENGTH>
    <READ_SPEC>
      <READ_INDEX>0</READ_INDEX>
      <READ_CLASS>Application Read</READ_CLASS>
      <READ_TYPE>Forward</READ_TYPE>
      <BASE_COORD>1</BASE_COORD>
    </READ_SPEC>
  </SPOT_DECODE_SPEC>
</SPOT_DESCRIPTOR>

Platform

Platform   Instrument Model     Sequence Length
ABI SOLID  AB SOLiD System 3.0  50

SOLiD paired reads

Read

Read composition

Read Index      : 0             1
Read            : ATCCTGG……………  CATCAGTTGAT……………
Base Coordinate : 1             26 (spot ends at 50)
Read Type       : Forward       Forward

Metadata

Spot (Read Spec)

Read Index  Read Class        Read Type  Ordering Method
0           Application Read  Forward    BaseCoord = 1
1           Application Read  Forward    BaseCoord = 26

Experiment XML (SPOT_DESCRIPTOR)

<SPOT_DESCRIPTOR>
  <SPOT_DECODE_SPEC>
    <SPOT_LENGTH>50</SPOT_LENGTH>
    <READ_SPEC>
      <READ_INDEX>0</READ_INDEX>
      <READ_CLASS>Application Read</READ_CLASS>
      <READ_TYPE>Forward</READ_TYPE>
      <BASE_COORD>1</BASE_COORD>
    </READ_SPEC>
    <READ_SPEC>
      <READ_INDEX>1</READ_INDEX>
      <READ_CLASS>Application Read</READ_CLASS>
      <READ_TYPE>Forward</READ_TYPE>
      <BASE_COORD>26</BASE_COORD>
    </READ_SPEC>
  </SPOT_DECODE_SPEC>
</SPOT_DESCRIPTOR>

Platform

Platform   Instrument Model     Sequence Length
ABI SOLID  AB SOLiD System 3.0  50

Ion Torrent single reads

Read

Read composition

Read Index      : 0
Read            : AGCCGTATATAG……………CGTCAGAA
Base Coordinate : 1 (spot ends at x)
Read Type       : Forward

Metadata

Spot (Read Spec)

Read Index  Read Class        Read Type  Ordering Method
0           Application Read  Forward    BaseCoord = 1

Experiment XML (SPOT_DESCRIPTOR)

<SPOT_DESCRIPTOR>
  <SPOT_DECODE_SPEC>
    <READ_SPEC>
      <READ_INDEX>0</READ_INDEX>
      <READ_CLASS>Application Read</READ_CLASS>
      <READ_TYPE>Forward</READ_TYPE>
      <BASE_COORD>1</BASE_COORD>
    </READ_SPEC>
  </SPOT_DECODE_SPEC>
</SPOT_DESCRIPTOR>

Platform

Platform     Instrument Model        Sequence Length
Ion Torrent  Ion Torrent PGM/Proton

PacBio single reads (Standard sequencing)

Metadata

Spot (Read Spec)

Adapter Spec
ATCTCTCTCTTTTCCTCCTCCTCCGTTGTTGTTGTTGAGAGAGAT


Read Index  Read Class        Read Type  Ordering Method
0           Application Read  Forward    BaseCoord = 1

Experiment XML (SPOT_DESCRIPTOR)

<SPOT_DESCRIPTOR>
  <SPOT_DECODE_SPEC>
    <READ_SPEC>
      <READ_INDEX>0</READ_INDEX>
      <READ_CLASS>Application Read</READ_CLASS>
      <READ_TYPE>Forward</READ_TYPE>
      <BASE_COORD>1</BASE_COORD>
    </READ_SPEC>
  </SPOT_DECODE_SPEC>
</SPOT_DESCRIPTOR>

Platform

Platform     Instrument Model
PacBio SMRT  PacBio RS

Experiment Attribute

Tag                  Value                Units
Sequencing Protocol  Standard sequencing
Insert size          8000

Experiment XML (EXPERIMENT_ATTRIBUTES)

<EXPERIMENT_ATTRIBUTES>
  <EXPERIMENT_ATTRIBUTE>
    <TAG>Sequencing Protocol</TAG>
    <VALUE>Standard sequencing</VALUE>
  </EXPERIMENT_ATTRIBUTE>
  <EXPERIMENT_ATTRIBUTE>
    <TAG>Insert size</TAG>
    <VALUE>8000</VALUE>
  </EXPERIMENT_ATTRIBUTE>    
</EXPERIMENT_ATTRIBUTES>
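
Each EXPERIMENT_ATTRIBUTE is a TAG/VALUE pair, so the block above can be read back into a dictionary. The following Python sketch is one way to do that with the standard library; `parse_experiment_attributes` is a hypothetical helper, not part of any DRA tool.

```python
import xml.etree.ElementTree as ET

# The EXPERIMENT_ATTRIBUTES block of the standard sequencing example.
EXPERIMENT_ATTRIBUTES_XML = """
<EXPERIMENT_ATTRIBUTES>
  <EXPERIMENT_ATTRIBUTE>
    <TAG>Sequencing Protocol</TAG>
    <VALUE>Standard sequencing</VALUE>
  </EXPERIMENT_ATTRIBUTE>
  <EXPERIMENT_ATTRIBUTE>
    <TAG>Insert size</TAG>
    <VALUE>8000</VALUE>
  </EXPERIMENT_ATTRIBUTE>
</EXPERIMENT_ATTRIBUTES>
"""


def parse_experiment_attributes(xml_text):
    """Return a {TAG: VALUE} dictionary (hypothetical helper for illustration)."""
    root = ET.fromstring(xml_text.strip())
    return {
        attribute.findtext("TAG"): attribute.findtext("VALUE")
        for attribute in root.iter("EXPERIMENT_ATTRIBUTE")
    }


attributes = parse_experiment_attributes(EXPERIMENT_ATTRIBUTES_XML)
print(attributes["Sequencing Protocol"])  # Standard sequencing
print(int(attributes["Insert size"]))     # 8000
```

A dictionary like this makes it easy to confirm before submission that both tags used in this example are present.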

PacBio single reads (Circular consensus sequencing)

Metadata

Spot (Read Spec)

Read Index  Read Class        Read Type  Ordering Method
0           Application Read  Other      BaseCoord = 1

Experiment XML (SPOT_DESCRIPTOR)

<SPOT_DESCRIPTOR>
  <SPOT_DECODE_SPEC>
    <READ_SPEC>
      <READ_INDEX>0</READ_INDEX>
      <READ_CLASS>Application Read</READ_CLASS>
      <READ_TYPE>Forward</READ_TYPE>
      <BASE_COORD>1</BASE_COORD>
    </READ_SPEC>
  </SPOT_DECODE_SPEC>
</SPOT_DESCRIPTOR>

Platform

Platform     Instrument Model
PacBio SMRT  PacBio RS

Experiment Attribute

Tag                  Value                          Units
Sequencing Protocol  Circular consensus sequencing
Insert size          700

Experiment XML (EXPERIMENT_ATTRIBUTES)

<EXPERIMENT_ATTRIBUTES>
  <EXPERIMENT_ATTRIBUTE>
    <TAG>Sequencing Protocol</TAG>
    <VALUE>Circular consensus sequencing</VALUE>
  </EXPERIMENT_ATTRIBUTE>
  <EXPERIMENT_ATTRIBUTE>
    <TAG>Insert size</TAG>
    <VALUE>700</VALUE>
  </EXPERIMENT_ATTRIBUTE>    
</EXPERIMENT_ATTRIBUTES>

Typical examples causing errors in data validation

If validation fails, stop the validation process, correct the metadata, and then re-upload the data files.

  • Reads having no application read
  • Reads with inconsistent base coordinate
  • Reads with relative order which cannot be specified

Reads having no application read

Read

Read composition

Read Index      : 0       1
Read            : ATCCGG  CATCAGTTGAT…………………………………………………
Base Coordinate : 1       7 (spot ends at 50)
Read Type       : Primer  Linker (should have at least one application read)
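
This condition can be checked before upload with a one-line test. The Python sketch below is only an illustration: the row format and the name `has_application_read` are assumptions, and the primer and linker are treated as technical reads, as elsewhere on this page.

```python
def has_application_read(read_specs):
    """Return True if at least one (read_class, read_type) row is an application read."""
    return any(read_class == "Application Read" for read_class, _ in read_specs)


# The failing layout above: a primer followed by a linker, both technical reads.
failing_spot = [("Technical Read", "Primer"), ("Technical Read", "Linker")]
# A layout that passes: the 454 single read example (adapter, then forward read).
passing_spot = [("Technical Read", "Adapter"), ("Application Read", "Forward")]

print(has_application_read(failing_spot))  # False
print(has_application_read(passing_spot))  # True
```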

Reads with inconsistent base coordinate

Read 1

Read composition

Read Index      : 0            1
Read            : ATCCGG……………  CATCAG……………
Base Coordinate : 1            1 (should be > 1)
Read Type       : Forward      Reverse

Read 2

Read composition

Read Index      : 0        1             2
Read            : TCAG     ATAGAGTTG………  TCGTATAACTTCGTATAATGTATGCTATACGAAGTT
Base Coordinate : 1        5             4 (should be > 5)
Read Type       : Adapter  Forward       Reverse

Read 3

Read composition

Read Index      : 0                      1
Read            : ATCCGGGTGTGTCATCAG     CATCAG……………
Base Coordinate : 2 (should start at 1)  19
Read Type       : Adapter                Forward
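
The three cases above follow two rules: the first read starts at base coordinate 1, and each later base coordinate is greater than the one before it. A minimal Python sketch of such a check is shown below; it is inferred from these examples rather than taken from the DRA validator, and `base_coord_errors` is a hypothetical name.

```python
def base_coord_errors(base_coords):
    """List the problems found in the base coordinates of one spot, in read order."""
    errors = []
    if base_coords and base_coords[0] != 1:
        errors.append(f"read 0 starts at {base_coords[0]}, expected 1")
    for index in range(1, len(base_coords)):
        previous, current = base_coords[index - 1], base_coords[index]
        if current <= previous:
            errors.append(f"read {index} starts at {current}, expected > {previous}")
    return errors


print(base_coord_errors([1, 1]))     # Read 1: second read does not advance
print(base_coord_errors([1, 5, 4]))  # Read 2: third read starts before the second
print(base_coord_errors([2, 19]))    # Read 3: first read does not start at 1
print(base_coord_errors([1, 101]))   # consistent layout, no errors
```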

Reads with relative order which cannot be specified

Read

Read composition

Read Index      : 0        1           2        3
Read            : TCAG     ATAGA……………  …………………  CTCAT…………………………………………………………
Base Coordinate : 1        5
Read Type       : Adapter  Forward     Linker   Forward (this Forward read cannot be specified)

Metadata

Spot (Read Spec)

Read Index  Read Class        Read Type  Ordering Method
0           Technical Read    Adapter    BaseCoord = 1
1           Application Read  Forward    BaseCoord = 5
2           Technical Read    Linker     RelativeOrder
3           Application Read  Forward    RelativeOrder
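
In this layout the linker has neither a base coordinate nor an expected basecall table, so the read that follows it has nothing to be positioned against. The Python sketch below flags that pattern under a simplifying assumption drawn from this example only: a RelativeOrder read is reported when the read before it is also RelativeOrder. The actual validator may apply different or stricter rules, and `unresolvable_reads` is a hypothetical name.

```python
def unresolvable_reads(ordering_methods):
    """Return the indices of RelativeOrder reads whose predecessor is also RelativeOrder.

    Assumption: such a read cannot be located because the preceding read
    has no base coordinate and no expected basecall table of its own.
    """
    return [
        index
        for index in range(1, len(ordering_methods))
        if ordering_methods[index] == "RelativeOrder"
        and ordering_methods[index - 1] == "RelativeOrder"
    ]


# The failing Read Spec above, and the working 454 paired read layout for comparison.
failing_layout = ["BaseCoord", "BaseCoord", "RelativeOrder", "RelativeOrder"]
working_layout = ["BaseCoord", "BaseCoord", "ExpectedBasecallTable", "RelativeOrder"]

print(unresolvable_reads(failing_layout))  # [3]
print(unresolvable_reads(working_layout))  # []
```

Describing the linker with an expected basecall table, as in the 454 paired read example, is what makes the final forward read locatable.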

Experiment (Pipeline)

Example 1 of Experiment XML Pipeline

<PROCESSING>
  <PIPELINE>
    <PIPE_SECTION section_name="Base Caller">
      <STEP_INDEX>1</STEP_INDEX>
      <PREV_STEP_INDEX>NIL</PREV_STEP_INDEX>
      <PROGRAM>Casava</PROGRAM>
      <VERSION>V1.8.3_V3.2.1</VERSION>
      <NOTES/>
    </PIPE_SECTION>
    <PIPE_SECTION section_name="Quality Scores">
      <STEP_INDEX>2</STEP_INDEX>
      <PREV_STEP_INDEX>1</PREV_STEP_INDEX>
      <PROGRAM>Casava</PROGRAM>
      <VERSION>V1.8.3_V3.2.1</VERSION>
      <NOTES/>
    </PIPE_SECTION>
  </PIPELINE>
</PROCESSING>

Example 2 of Experiment XML Pipeline

<PROCESSING>
  <PIPELINE>
    <PIPE_SECTION>
      <STEP_INDEX>1</STEP_INDEX>
      <PREV_STEP_INDEX/>
      <PROGRAM>bwa</PROGRAM>
      <VERSION>0.5.9-r16</VERSION>
      <NOTES>BWA-MEM algorithm alignment</NOTES>
    </PIPE_SECTION>
    <PIPE_SECTION>
      <STEP_INDEX>2</STEP_INDEX>
      <PREV_STEP_INDEX>1</PREV_STEP_INDEX>
      <PROGRAM>Picard</PROGRAM>
      <VERSION>1.74(1243)</VERSION>
      <NOTES>Duplicate reads marked</NOTES>
    </PIPE_SECTION>
    <PIPE_SECTION>
      <STEP_INDEX>3</STEP_INDEX>
      <PREV_STEP_INDEX>2</PREV_STEP_INDEX>
      <PROGRAM>GATK</PROGRAM>
      <VERSION>1.4-29</VERSION>
      <NOTES>Indel realignment</NOTES>
    </PIPE_SECTION>
    <PIPE_SECTION>
      <STEP_INDEX>4</STEP_INDEX>
      <PREV_STEP_INDEX>3</PREV_STEP_INDEX>
      <PROGRAM>GATK</PROGRAM>
      <VERSION>1.4-29</VERSION>
      <NOTES>Base quality score recalibration</NOTES>
    </PIPE_SECTION>
  </PIPELINE>
</PROCESSING>
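
In both pipeline examples, the first step has an empty or NIL PREV_STEP_INDEX and every later step points back to the STEP_INDEX of the step before it. The Python sketch below checks that pattern on a shortened pipeline. It is an illustration based on these two examples, not a statement of the schema rules, and `pipeline_chain_errors` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# A shortened two-step pipeline in the style of Example 2.
VALID_PROCESSING_XML = """
<PROCESSING>
  <PIPELINE>
    <PIPE_SECTION>
      <STEP_INDEX>1</STEP_INDEX>
      <PREV_STEP_INDEX/>
      <PROGRAM>bwa</PROGRAM>
    </PIPE_SECTION>
    <PIPE_SECTION>
      <STEP_INDEX>2</STEP_INDEX>
      <PREV_STEP_INDEX>1</PREV_STEP_INDEX>
      <PROGRAM>Picard</PROGRAM>
    </PIPE_SECTION>
  </PIPELINE>
</PROCESSING>
"""

# The same pipeline with a deliberately wrong back-reference in step 2.
BROKEN_PROCESSING_XML = VALID_PROCESSING_XML.replace(
    "<PREV_STEP_INDEX>1</PREV_STEP_INDEX>", "<PREV_STEP_INDEX>3</PREV_STEP_INDEX>"
)


def pipeline_chain_errors(processing_xml):
    """List steps whose PREV_STEP_INDEX does not match the preceding STEP_INDEX."""
    root = ET.fromstring(processing_xml.strip())
    errors = []
    previous_step = None
    for section in root.iter("PIPE_SECTION"):
        step = (section.findtext("STEP_INDEX") or "").strip()
        prev = (section.findtext("PREV_STEP_INDEX") or "").strip()
        if previous_step is None:
            # First step: the examples use either an empty element or NIL.
            if prev not in ("", "NIL"):
                errors.append(f"step {step}: first step should have no previous step")
        elif prev != previous_step:
            errors.append(f"step {step}: PREV_STEP_INDEX is {prev!r}, expected {previous_step!r}")
        previous_step = step
    return errors


valid_errors = pipeline_chain_errors(VALID_PROCESSING_XML)
broken_errors = pipeline_chain_errors(BROKEN_PROCESSING_XML)
print(valid_errors)
print(broken_errors)
```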