The BioSample database was developed to serve as a central location in which to store descriptive information about biological samples used to generate experimental data in any of the primary data archives.
The following figure depicts how BioSample records are organized and linked with other objects. This example is composed of one umbrella project that encompasses three subprojects, each of which generated data derived from two BioSample records. Users can query either the BioProject or the BioSample database to retrieve the relevant records, and then navigate through links to the corresponding experimental data, which continue to be stored in DDBJ's primary data archives: DDBJ, DRA and DOR.
Given the huge diversity of sample types handled by archival databases, and the fact that appropriate sample descriptions are often dependent on the context of the study, the definition of what a BioSample represents is deliberately flexible. Typical examples of a BioSample include a cell line, a primary tissue biopsy, an individual organism or an environmental isolate.
Biological and technical replicates are represented by separate BioSamples with distinct values of the 'replicate' attribute, e.g., 'biological replicate 1' and 'biological replicate 2'.
FAQ: How many samples do I need for my DRA submission?
Information about the sample will include:
- The material sampled, e.g., organs, tissues, cell type
- Phenotypic information including disease states and clinical information about the individual
The information about human subjects and access to it will be compliant with all relevant ethical requirements. The DDBJ BioSample database does not support controlled access mechanisms and thus cannot host human clinical samples that may have associated privacy concerns.
A particular set of biosamples submitted directly to the BioSample databases may subsequently be referenced from many experiments. We refer to this set of samples as reference biosamples. Examples include commonly used cell lines and mouse strains. The BioSample database pre-registers commonly used samples to make it easy to reference them from other INSDC databases. Sources of reference biosamples include ATCC and Coriell.
A major component of a BioSample record is the sample attributes section. Attributes define the material under investigation and can include sample characteristics such as cell type, collection site and phenotypic information like disease state.
BioSample attributes are captured as structured name:value pairs, for example, tissue:liver.
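As a minimal sketch of the name:value structure described above, the following Python snippet splits such pairs into a dictionary. The function names and the second example pair are illustrative assumptions, not part of any DDBJ tool.

```python
from __future__ import annotations


def parse_attribute(pair: str) -> tuple[str, str]:
    """Split a 'name:value' string into a (name, value) tuple."""
    # Split on the first colon only, so a value may itself contain colons.
    name, sep, value = pair.partition(":")
    if not sep or not name.strip():
        raise ValueError(f"not a name:value pair: {pair!r}")
    return name.strip(), value.strip()


def parse_attributes(pairs: list[str]) -> dict[str, str]:
    """Collect several pairs into a dict, rejecting duplicate names."""
    attributes: dict[str, str] = {}
    for pair in pairs:
        name, value = parse_attribute(pair)
        if name in attributes:
            raise ValueError(f"duplicate attribute name: {name}")
        attributes[name] = value
    return attributes


# "tissue:liver" is the example from the text; the second pair is hypothetical.
sample = parse_attributes(["tissue:liver", "cell type: hepatocyte"])
```

Splitting on the first colon only is a design choice made here so that values such as URLs or times are not cut short.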
The database supports and encourages use of dictionaries of attribute names.
The first targeted dictionaries implemented in the DDBJ BioSample database are the MIxS minimum information checklists for standardizing descriptions of genomes, metagenomes and targeted locus sequences, as developed by the Genomic Standards Consortium.
For the MIxS checklists, please see Nature Biotechnology 29, 415–420 (2011) | doi: 10.1038/nbt.1823 (PMID:21552244).
For the organism name in the BioSample organism attribute, see the "Organism name" page. Previously, a strain name or some other lower-rank taxon was required in the organism name for whole genome sequences, mainly those of microorganisms. Currently, however, the value of the organism qualifier should in principle be just the scientific name, even for microbial genomes. Please give the strain name in the strain attribute of the BioSample.
Related news: Changes in organism strain information management
Contact information of submitter(s). Questions and notifications about a submission are sent to the e-mail address(es) listed here. Personal contact information is considered confidential and is collected for use by DDBJ staff should questions arise; the general information about the research center is used for public display.
- First name*
- Submitter's first name.
- Last name*
- Submitter's last name.
- E-mail address. Enter an address from the organization's domain.
- Organization to which a contact person belongs.
- Submitting organization*
- Full name of organization.
- Submitting organization URL
- The URL of submitter's organization.
Select "Hold" or "Release". You cannot specify a hold date. Please see Data Release for details of the release mechanism.
- Release: the submitted BioSample record is released immediately after the curation process finishes.
- Hold: the submitted BioSample record is released when the DDBJ, DRA or DTA record(s) referencing this BioSample ID are released. Private DDBJ record(s) referencing this BioSample ID are not released.
- External Links
- A URL may be provided, with a label for the resource, to reference a resource that is directly relevant to the submitted sample.
- Link description
- Display name of web site that is related to this sample.
- URL of the web site.
- Genome, metagenome or marker sequences (MIxS compliant)
- Use for genomes, metagenomes, and marker sequences. These samples include specific attributes that have been defined by the Genomic Standards Consortium (GSC) to formally describe and standardize sample metadata for genomes, metagenomes, and marker sequences. The samples are validated for compliance based on the presence of the required core attributes as described in MIxS. For details, please see the GSC websites.
- Other samples (e.g. transcriptome, epigenetics etc)
- Use for any sample type (e.g. transcriptome, epigenetics etc). These samples are described using common core attributes and submitter-supplied custom attributes.
- (Meta)Genomic Sequences Sample (MIMS)
Environmental/Metagenome Genomic Sequences
- Genomic Sequences Sample (MIGS)
Cultured Bacterial/Archaeal Genomic Sequences, Eukaryotic Genomic Sequences, Viral Genomic Sequences
Environmental samples do not include endosymbionts that can be reliably recovered from a particular host, organisms from a readily identifiable but uncultured field sample (e.g., many cyanobacteria), or phytoplasmas that can be reliably recovered from diseased plants (even though these cannot be grown in axenic culture). For such samples, select "Cultured Bacterial/Archaeal", "Eukaryotic" or "Viral".
- Marker Sequences Sample (MIMARKS)
Specimen Marker Sequences, Survey related Marker Sequences
MIMARKS specimen: for marker gene (e.g., COI) sequences obtained from any material identifiable by means of specimens
MIMARKS-specimen applies to the contextual data for marker gene sequences from cultured or voucher-identifiable specimens.
MIMARKS survey: for uncultured diversity marker gene (e.g., 16S rRNA, 18S rRNA, nif, amoA, rpo) surveys
MIMARKS-survey is applicable to contextual data for marker gene sequences, obtained directly from the environment, without culturing or identification of the organisms.
- Environmental package (MIxS Sample)
No package, air, host-associated, human-associated, human-gut, human-oral, human-skin, human-vaginal, microbial mat/biofilm, miscellaneous or artificial, plant-associated, sediment, soil, wastewater/sludge, water
- Sample attributes
- Download the BioSample worksheet, which has been customised to fit each model. This is a tab-delimited text file that can be opened with a spreadsheet program or a text editor. The validator checks the uploaded text file and returns warning and error messages. Revise the text file according to the messages and upload it again. Submitters cannot submit the BioSample until all errors are resolved.
- List of attributes. Besides the mandatory fields, there are several optional attribute fields. To make the BioSample record most useful, you should include all available information in the submission. Commonly used and useful attributes have been defined, with standardized nomenclature. In preparing your submission, please refer to this attributes list and BioSample examples and fill in the relevant fields. If you have information of a type that does not appear in the standard list, you can create it as a user-defined attribute.
- Review your submission and submit the BioSample by clicking the "Submit" button at the bottom. The uploaded sample attribute file can be downloaded as "Submission ID.txt".
Submission to BioSample
- Submission of research data from human subjects
- When submitting data from human subjects (human data) to the databases of the DDBJ Center, it is the submitter's responsibility to ensure that the dignity and rights of the human subjects are protected in accordance with all applicable laws, ordinances, guidelines and policies of the submitter's institution. In principle, make sure to remove any direct personal identifiers of human subjects from the data to be submitted. Before submitting human data, read "Submission of research data from human subjects".
Create a new sample submission
Obtain a submission account according to the Account Handbook.
Go to the BioSample submission page from the “Biosample” menu at the top. Create a new sample submission by clicking the [New submission] button.
To submit a BioSample, fill in the tabs from left to right.
For details of BioSample metadata, please see "BioSample metadata".
Select a sample type under "SAMPLE TYPE". For genome samples, the minimum sample attributes are defined by MIxS.
For details of sample types, please see the BioSample Handbook.
Enter sample attributes
List and explanation of BioSample attributes. User-defined attributes can be added to the rightmost column.
Download a template text file according to the selected sample type to enter sample attributes.
The main step of a sample submission is describing the samples with required, optional and user-defined attributes.
The text file is tab-delimited and can be opened and edited in a spreadsheet editor (e.g. Excel®). Attribute names are in the header line. Attributes marked with "*" are required.
From the second line onward, enter one sample per line. For a project that does not yet have a PRJD accession number, enter the PSUB submission ID in bioproject_id.
A single submission can contain several samples, entered as one sample per line in the tab-delimited sample attribute file.
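The layout described above (a header line, required attributes marked with "*", one sample per line) can be checked locally before upload. The following Python sketch is only an illustration: the column names other than bioproject_id, the position of the "*" marker, and the PSUB value are assumptions, and it does not reproduce the DDBJ validator.

```python
from __future__ import annotations

import csv
import io


def read_samples(text: str) -> list[dict[str, str]]:
    """Parse a tab-delimited attribute file and check required columns."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows:
        raise ValueError("file is empty")
    header = rows[0]
    # Assumption: a required attribute carries "*" somewhere in its header cell.
    names = [h.strip().strip("*") for h in header]
    required = {h.strip().strip("*") for h in header if "*" in h}

    samples = []
    for line_no, row in enumerate(rows[1:], start=2):
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        # Pad short rows so that every column has a (possibly empty) value.
        row = row + [""] * (len(names) - len(row))
        sample = dict(zip(names, (cell.strip() for cell in row)))
        missing = sorted(n for n in required if not sample[n])
        if missing:
            raise ValueError(f"line {line_no}: missing required {missing}")
        samples.append(sample)
    return samples


# Hypothetical template content; PSUB000001 is a made-up submission ID.
TEMPLATE = (
    "*sample_name\t*organism\tbioproject_id\ttissue\n"
    "sample1\tMus musculus\tPSUB000001\tliver\n"
    "sample2\tMus musculus\tPSUB000001\t\n"
)
samples = read_samples(TEMPLATE)
```

Each returned dictionary corresponds to one line of the file, so the list length equals the number of samples in the submission.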
Missing value reporting
The International Nucleotide Sequence Database Collaboration (INSDC) has developed a standardised missing/null value reporting language to be used where a value of an expected format for sample metadata reporting cannot be provided. Submitters are strongly encouraged to always provide true values of expected formats. However, if missing/null value reporting is required, submitters are asked to use the term with the finest granularity for their situation. If appropriate, use a "lower level" term; if not, use a "top level" term.
To facilitate an understanding of the supported terms we enclose a table with the missing/null value terms and their definitions.
Please use the following standardised missing value vocabulary only if a true value of an expected format for a mandatory field is missing. If a true value is missing for a recommended or an optional field then these fields should not be used for reporting at all.
INSDC missing value reporting terms
| Top level | Lower level | Definition |
|---|---|---|
| not applicable | | information is inappropriate to report, can indicate that the standard itself fails to model or represent the information appropriately |
| missing | not collected | information of an expected format was not given because it has not been collected |
| missing | not provided | information of an expected format was not given, a value may be given at the later stage |
| missing | restricted access | information exists but can not be released openly because of privacy concerns |
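The rule stated above (missing value terms only in mandatory fields; optional fields without a true value are left out) can be sketched as a simple check. The function name, the case-insensitive comparison and the example field names are assumptions made for illustration, not the behaviour of the DDBJ validator.

```python
from __future__ import annotations

# Top-level terms mapped to their lower-level terms, as listed in the table.
MISSING_TERMS = {
    "not applicable": [],
    "missing": ["not collected", "not provided", "restricted access"],
}
ALL_TERMS = set(MISSING_TERMS) | {
    term for lower in MISSING_TERMS.values() for term in lower
}


def check_missing_values(sample: dict[str, str], mandatory: set[str]) -> list[str]:
    """Return a list of problems with how missing values are reported."""
    problems = []
    for name, value in sample.items():
        value = value.strip()
        # Assumption: terms are compared case-insensitively.
        is_term = value.lower() in ALL_TERMS
        if name in mandatory:
            if not value:
                problems.append(
                    f"{name}: mandatory field is empty; give a true value "
                    "or an INSDC missing value term"
                )
        elif is_term:
            problems.append(
                f"{name}: optional field holds a missing value term; "
                "leave the field out instead"
            )
    return problems


# Hypothetical sample; the field names are illustrative only.
problems = check_missing_values(
    {"collection_date": "not collected", "tissue": "not provided", "organism": ""},
    mandatory={"collection_date", "organism"},
)
```

In this example the mandatory collection_date passes because it uses a lower-level term, while tissue and organism are both reported.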
Sample attributes validation
Upload the sample attribute file by selecting the file and clicking the Continue button. The validator checks the uploaded file according to the rules and returns error and warning messages. Submitters cannot submit the BioSample until all errors are resolved.
For validation rules and messages, please see the Validation rules page.
Check the content in the final "OVERVIEW" tab and submit the samples. In the "ATTRIBUTES" area, the submitted sample attribute file can be downloaded.
A temporary ID starting with SSUB is automatically assigned to a submitted BioSample. Until an official accession number is issued, the submitted sample is referenced by this ID. After the review process, DDBJ BioSample issues accession numbers with the prefix SAMD to the completed data. You can view the status and accession numbers of submitted samples in your submission account.
- Do NOT cite a temporary ID starting with SSUB in references.
- Do not resubmit samples that have already been registered at EBI or NCBI.
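To make the distinction between the two identifier types concrete, here is a small Python sketch that tells a temporary SSUB ID from a SAMD accession. Only the prefixes come from the text; the assumption that each prefix is followed by digits only, and the example identifiers, are hypothetical.

```python
from __future__ import annotations

import re

# Assumption: each prefix is followed by digits only; the real layout may differ.
ID_PATTERNS = {
    "temporary submission ID": re.compile(r"^SSUB\d+$"),
    "BioSample accession": re.compile(r"^SAMD\d+$"),
}


def classify_id(identifier: str) -> str | None:
    """Return the kind of identifier, or None if it matches neither prefix."""
    for kind, pattern in ID_PATTERNS.items():
        if pattern.match(identifier.strip()):
            return kind
    return None


def is_citable(identifier: str) -> bool:
    """Only official SAMD accessions should be cited in references."""
    return classify_id(identifier) == "BioSample accession"
```

A check such as `is_citable("SSUB000001")` returning False could be used to catch temporary IDs in a manuscript before publication.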
Release of BioSample
You can select the following options:
- Release immediately following curation
- Release when referenced data is published
A hold date cannot be set for BioSample.
The submitted sample data can be kept private. Sample data are automatically released when the linked DDBJ record(s) are published. The release of the BioSample record does not trigger the release of private DDBJ sequence record(s) referencing this BioSample accession. However, the release of the BioSample record does trigger the release of the referencing BioProject.
It is possible to update data after registration. Please contact us through the Message form.
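The release rules in this section can be summarised as a small in-memory model. This is a sketch under stated assumptions: the class names, function names and accessions are hypothetical and do not describe how the DDBJ systems are implemented.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Record:
    """Any archived record with a public/private flag (hypothetical model)."""
    accession: str
    public: bool = False


@dataclass
class BioSample(Record):
    bioproject: Optional[Record] = None
    sequence_records: list[Record] = field(default_factory=list)


def release_biosample(sample: BioSample) -> None:
    """Release a BioSample and the BioProject that references it."""
    sample.public = True
    if sample.bioproject is not None:
        sample.bioproject.public = True
    # Private sequence records referencing the sample are left untouched.


def publish_sequence_record(sample: BioSample, record: Record) -> None:
    """Publishing a linked sequence record also releases a held BioSample."""
    record.public = True
    release_biosample(sample)


# Hypothetical accessions for illustration only.
project = Record("PRJD00001")
seq_a, seq_b = Record("SEQ_A"), Record("SEQ_B")
held = BioSample("SAMD00000001", bioproject=project, sequence_records=[seq_a, seq_b])
publish_sequence_record(held, seq_a)
```

After the last call, the sample and its BioProject are public, while the second sequence record stays private until it is published in its own right.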