- Checking if an array design is already registered
- Creating an array design format (ADF) file
The aim of the ADF component is to describe a microarray design in a spreadsheet or, for complex cases, a set of spreadsheets. Conceptually, microarray designs are devised to measure the presence and/or abundance of genomic sequence entities in biological samples. Genomic sequences of interest are represented by one or more synthetic sequences, which are in turn placed at one or more physical locations on the two-dimensional surface of the microarray. Therefore, to fully describe a microarray layout, the ADF must capture information about the genomic sequences, the synthetic sequences, their physical positions on the array, and the relationships (mappings) between them. The same array design can be used in many different hybridizations across many different experiments.
This page is about submitting microarray designs (array layout and annotation) to DOR.
Use the diagram below to decide if you need to submit an array design to us.
If the array design you used has already been described in ArrayExpress/DOR then you do not need to submit it. Many commercial and academic array designs from organizations such as Affymetrix, Agilent, Illumina, Nimblegen and Sanger are already loaded into ArrayExpress/DOR.
To search for registered array designs, use the ArrayExpress platform designs query interface.
If your array design is already in ArrayExpress/DOR then you can use it in your MAGE-TAB experiment submission by entering the accession number of the array design in the 'Array Design REF' column in the SDRF section of your spreadsheet.
If you used a commercial catalogue array design and you cannot find it in ArrayExpress/DOR, please contact the DOR team and tell us the exact name of the array design you used and its manufacturer.
An array design format (ADF) file is simply a table with standardized column names describing what was printed/synthesized at each position on a microarray. The ADF file can be created in any spreadsheet application but must be saved as a tab delimited text file.
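As a rough illustration of the "tab delimited text" requirement, the sketch below writes a tiny table with Python's standard `csv` module instead of a spreadsheet application. The column headings and the `[main]` marker are taken from later sections of this page; the reporter names, coordinate values and the helper name `write_adf_table` are made up for the example.

```python
import csv
import io

# Hypothetical feature rows; the values are invented for illustration only.
rows = [
    {"Block Column": 1, "Block Row": 1, "Column": 1, "Row": 1, "Reporter Name": "rep_0001"},
    {"Block Column": 1, "Block Row": 1, "Column": 2, "Row": 1, "Reporter Name": "rep_0002"},
]


def write_adf_table(rows, handle):
    """Write the main ADF table as tab-delimited text (illustrative helper)."""
    # The "[main]" line marks the start of the main table (see the ADF sections below).
    handle.write("[main]\n")
    writer = csv.DictWriter(
        handle,
        fieldnames=list(rows[0].keys()),
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
write_adf_table(rows, buffer)
adf_text = buffer.getvalue()
```

To produce a file on disk, the same helper could be called with a handle from `open(path, "w", newline="", encoding="utf-8")`.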
Each ADF file may start with an optional header section providing top-level information about the array design. As described for the IDF, optional "Comment" rows may be used to provide extra information needed. Additional rows providing Term Source information are included in the ADF header to allow the full encoding of array design information in the absence of any investigation-level detail. These Term Source rows are treated in the same way as for the IDF, and are used to indicate the source databases or files used for sequence database accessions and ontology terms. As many Term Sources may be used as needed, listed horizontally in columns as for the IDF. See Table for a list of ADF header row types. All tags are optional, and a tag can have at most one value. The tags (rows) can appear in any order, except that associated attributes must immediately follow the object they are associated with.
|Array Design Name||Text|
|Technology Type||Ontology term|
|Technology Type Term Source REF||Term Source Name|
|Technology Type Term Accession Number||Term Accession Number|
|Surface Type||Ontology term|
|Surface Type Term Source REF||Term Source Name|
|Surface Type Term Accession Number||Term Accession Number|
|Substrate Type||Ontology term|
|Substrate Type Term Source REF||Term Source Name|
|Substrate Type Term Accession Number||Term Accession Number|
|Sequence Polymer Type||Ontology term|
|Sequence Polymer Type Term Source REF||Term Source Name|
|Sequence Polymer Type Term Accession Number||Term Accession Number|
|Term Source Name||Text tag as used in main ADF table|
|Term Source File||URI|
|Term Source Version||Text|
Each spot on the array is called a feature. The position of each feature is described by 4 coordinates: Block Column, Block Row, Column, Row. These 4 columns are mandatory in the ADF and each line in your ADF will correspond to one feature. Features cannot be duplicated on an array as each spot can occur only once, but reporters can be printed at several different locations. All the features that appear in your raw data files must be included in the ADF even if there is nothing spotted there.
|Block Column||Block Row||Column||Row|
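Because each feature may occur only once, a quick duplicate check on the four coordinate columns can catch layout mistakes early. The following is a minimal sketch, assuming the table has already been read into a list of dictionaries keyed by column heading; `find_duplicate_features` is a hypothetical helper, not part of any ArrayExpress tool.

```python
from collections import Counter

COORD_COLUMNS = ("Block Column", "Block Row", "Column", "Row")


def find_duplicate_features(rows):
    """Return the coordinate tuples that appear on more than one row."""
    counts = Counter(tuple(row[name] for name in COORD_COLUMNS) for row in rows)
    return sorted(coord for coord, seen in counts.items() if seen > 1)


# Hypothetical rows: the second and third describe the same spot.
example_rows = [
    {"Block Column": "1", "Block Row": "1", "Column": "1", "Row": "1"},
    {"Block Column": "1", "Block Row": "1", "Column": "2", "Row": "1"},
    {"Block Column": "1", "Block Row": "1", "Column": "2", "Row": "1"},
]
duplicates = find_duplicate_features(example_rows)
```

Note that the check looks only at coordinates: the same reporter appearing at several different coordinates is allowed and is not reported.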
Synthetic sequences, used as proxies for genomic entities, can be deposited in one or more spot locations and array designs. These elements are called Reporters, and it is a MIAME requirement to publish the actual sequences physically present on the array. A Reporter is therefore uniquely defined by its ID and its sequence. The model also requires additional information, such as the role (experimental or control) and, where appropriate, the kind of control it represents.
The Reporter Name entered should be the same as the one you use for the reporter in final gene expression matrices and other normalized data files. We use the reporter name values in the array design files and data files to link array annotation to measurement values in data files.
|Reporter Name||Reporter Sequence||Reporter Group[role]||Control Type|
|Reporter Name||Reporter Database Entry[flybase]||Reporter Group[role]||Control Type|
We need database entries or actual sequences to describe what is on your array. For database entries, we need to know which database the accession numbers come from, so please supply a database code inside the [square brackets] in the header row. You can find a complete list of allowed databases here (use the values in the 'Name' column). A short list of common ones is below.
Describe what type of controls were used in the "Control Type" column. If the spot is not a control then do not fill in anything in this column. The allowed values for this column are:
- control_biosequence - for example a spike
- control_buffer - buffer spotted on the array
- control_empty - nothing spotted on the array
- control_genomic_DNA - e.g. salmon sperm DNA
- control_label - landing lights
- control_reporter_size - size standard
- control_spike_calibration - spike at varying concentrations
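The allowed values listed above lend themselves to a simple vocabulary check. This is only a sketch: it assumes rows are dictionaries keyed by column heading and that an empty cell means "not a control", as described above. The helper name `invalid_control_types` is hypothetical.

```python
ALLOWED_CONTROL_TYPES = frozenset({
    "control_biosequence",
    "control_buffer",
    "control_empty",
    "control_genomic_DNA",
    "control_label",
    "control_reporter_size",
    "control_spike_calibration",
})


def invalid_control_types(rows):
    """Return (row_number, value) pairs for unrecognised Control Type values."""
    problems = []
    for number, row in enumerate(rows, start=1):
        value = row.get("Control Type", "").strip()
        # Empty cells are fine: the spot is simply not a control.
        if value and value not in ALLOWED_CONTROL_TYPES:
            problems.append((number, value))
    return problems


# Hypothetical rows; the third value is a deliberate misspelling.
example_rows = [
    {"Reporter Name": "rep_0001", "Control Type": ""},
    {"Reporter Name": "rep_0002", "Control Type": "control_buffer"},
    {"Reporter Name": "rep_0003", "Control Type": "control_bufer"},
]
problems = invalid_control_types(example_rows)
```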
This section addresses the description of the biological sequence of interest which is interrogated by the synthetic probe (Reporter) sequences. For simple microarray designs, spot location, spot sequence and genomic sequence are directly associated in a one-to-one relationship. Interpretation is straightforward: one location, one probe, one gene or biological entity. In these cases, all layers can be combined in a single spreadsheet, and the ADF can be considered completely and unequivocally represented. In more elaborate microarray designs, hybridization signals observed from a series of spot sequences can be combined to estimate measurements for the surveyed genomic sequences. The format proposed here is designed to encode simple cases where there is a one-to-one or many-to-one mapping from Reporters (probe sequences) to Composite Elements (biologically relevant sequences).
Each feature, reporter or composite element can be annotated with "Comment[<category>]" columns which allow users to provide information that is additional to the usual Database Entry annotations.
|Reporter Name||Composite Element Name||Composite Element Database Entry[tair]||Comment[Chromosome]|
Case 1: Absence of technical replicates, direct association between representative sequences and genomic sequences:
Case 2: Technical replicates, and direct association between representative sequences and genomic sequences. Description of Composite Element is not required, and the relevant Composite Element columns may be omitted from the ADF:
Case 3: Absence of technical replicates, and any genomic sequence being represented by more than one representative sequence. This use-case requires extra columns to describe the Composite Elements, and is only supported for cases where many Reporters map to one Composite Element:
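The restriction that only one-to-one and many-to-one Reporter to Composite Element mappings are supported can be checked mechanically. The sketch below flags any reporter that is linked to more than one composite element; it assumes rows are dictionaries keyed by column heading, and both the helper name and the example identifiers are hypothetical.

```python
from collections import defaultdict


def reporters_with_multiple_composites(rows):
    """Map each offending Reporter Name to the composite elements it points at."""
    mapping = defaultdict(set)
    for row in rows:
        composite = row.get("Composite Element Name", "").strip()
        if composite:
            mapping[row["Reporter Name"]].add(composite)
    return {
        reporter: sorted(composites)
        for reporter, composites in mapping.items()
        if len(composites) > 1
    }


# Hypothetical rows: rep_A and rep_B both map to gene_1 (many-to-one, allowed),
# while rep_C maps to two different composite elements (not supported).
example_rows = [
    {"Reporter Name": "rep_A", "Composite Element Name": "gene_1"},
    {"Reporter Name": "rep_B", "Composite Element Name": "gene_1"},
    {"Reporter Name": "rep_C", "Composite Element Name": "gene_2"},
    {"Reporter Name": "rep_C", "Composite Element Name": "gene_3"},
]
offenders = reporters_with_multiple_composites(example_rows)
```

Rows that repeat the same reporter and composite element pair (technical replicates at different coordinates) do not trigger the check, because the composite names are collected in a set.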
There are two ADF sections:
- An optional header section, with top-level information.
- The main ADF table itself; supported column headings are given in Table. This table should be preceded by a "[main]" header (section delimiter) which is case-insensitive.
|(Feature)||Block Column, Block Row, Column, Row, Comment|
|Reporter Name||Reporter Database Entry, Reporter Sequence, Reporter Group, Control Type, Comment|
|Reporter Database Entry|
|Reporter Group||Reporter Group Term Source REF|
|Reporter Group Term Source REF||Reporter Group Term Accession Number|
|Reporter Group Term Accession Number|
|Control Type||Control Type Term Source REF|
|Control Type Term Source REF||Control Type Term Accession Number|
|Control Type Term Accession Number|
|Composite Element Name||Composite Element Database Entry, Comment|
|Composite Element Database Entry||Comment|
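The two-section layout (optional header, then the main table after a case-insensitive "[main]" line) can be separated with a few lines of code. This is a minimal sketch under stated assumptions: the file is already tab-delimited text, header rows have the tag in the first cell and values in the following cells, and a file without a "[main]" line is treated as header only. The function name `split_adf` and the example array name are hypothetical.

```python
def split_adf(text):
    """Split ADF text into (header dict, list of main-table lines)."""
    header_lines, table_lines = [], []
    in_main = False
    for line in text.splitlines():
        # The section delimiter is matched case-insensitively.
        if not in_main and line.strip().lower() == "[main]":
            in_main = True
            continue
        (table_lines if in_main else header_lines).append(line)

    header = {}
    for line in header_lines:
        if not line.strip():
            continue  # skip blank lines in the header section
        tag, *values = line.split("\t")
        header[tag] = values
    return header, table_lines


example_text = (
    "Array Design Name\tMy Array\n"
    "Term Source Name\tflybase\ttair\n"
    "[MAIN]\n"
    "Block Column\tBlock Row\tColumn\tRow\n"
    "1\t1\t1\t1\n"
)
header, table_lines = split_adf(example_text)
```

Header values are kept as lists so that rows such as Term Source Name, whose entries are listed horizontally, keep all of their columns.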
The "Reporter Group" ADF heading may be used to describe a variety of different group types; typical examples would be "role" (with values "experimental" and "control") or "species" for multi-species arrays. Enter the group type (e.g., "role" or "species") as free text inside the square brackets of the column heading.
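Headings such as "Reporter Group[role]" or "Reporter Database Entry[flybase]" combine a standard column name with a bracketed qualifier. One possible way to separate the two is a small regular expression, sketched below; `parse_heading` and `HEADING_PATTERN` are illustrative names, and the pattern assumes at most one bracketed qualifier at the end of the heading.

```python
import re

# Column name, optionally followed by a single [qualifier] at the end.
HEADING_PATTERN = re.compile(
    r"^\s*(?P<name>[^\[\]]+?)\s*(?:\[(?P<qualifier>[^\[\]]*)\])?\s*$"
)


def parse_heading(heading):
    """Return (column name, qualifier or None) for one ADF column heading."""
    match = HEADING_PATTERN.match(heading)
    if match is None:
        raise ValueError(f"Unrecognised column heading: {heading!r}")
    return match.group("name"), match.group("qualifier")


parsed = [
    parse_heading("Reporter Group[role]"),
    parse_heading("Reporter Database Entry[flybase]"),
    parse_heading("Control Type"),
]
```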
ArrayExpress provides a tool that checks your ADF for common formatting errors and reports any problems it finds. Please fix as many of these as you can before submitting the ADF, as this will speed up the processing of your array design submission:
ADF format checking tool (ArrayExpress)
- File is in tab-delimited text format (not Excel)
- Feature coordinates are in Block Column, Block Row, Column, Row format
- The following columns are included:
  - Reporter Name
  - Reporter Sequence and/or Reporter Database Entry
  - Reporter Group[role]
  - Control Type
- If it is an oligo array the column Reporter Sequence is included
- One or more Reporter Database Entry columns are included (if sequence not included)
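Parts of the checklist above can be approximated in code before using the ArrayExpress checking tool. The sketch below inspects only the column headings of the main table; it is not a substitute for the official checker, and the function names and message wording are hypothetical.

```python
COORD_COLUMNS = ["Block Column", "Block Row", "Column", "Row"]


def base_name(heading):
    """Strip any [qualifier] from a column heading."""
    return heading.split("[", 1)[0].strip()


def check_headings(headings):
    """Return a list of human-readable problems found in the column headings."""
    names = {base_name(h) for h in headings}
    problems = []
    for column in COORD_COLUMNS:
        if column not in names:
            problems.append(f"Missing feature coordinate column: {column}")
    if "Reporter Name" not in names:
        problems.append("Missing column: Reporter Name")
    if not names & {"Reporter Sequence", "Reporter Database Entry"}:
        problems.append("Need Reporter Sequence and/or Reporter Database Entry")
    if "Reporter Group[role]" not in [h.strip() for h in headings]:
        problems.append("Missing column: Reporter Group[role]")
    if "Control Type" not in names:
        problems.append("Missing column: Control Type")
    return problems


complete = COORD_COLUMNS + [
    "Reporter Name",
    "Reporter Database Entry[flybase]",
    "Reporter Group[role]",
    "Control Type",
]
incomplete = COORD_COLUMNS + ["Reporter Name"]
```

Calling `check_headings(complete)` returns an empty list, while `check_headings(incomplete)` reports the three missing reporter annotation columns.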