BioSample Overview

The BioSample database contains descriptive information about the biological source materials, or samples, used to generate experimental data in any of the primary data archives. BioSample, together with BioProject, serves to organize related data across databases.

Sample granularity

In general, create one BioSample record per biological source material. Extracted molecules, such as nucleic acids and metabolites, are represented by DRA, GEA and MetaboBank metadata.

Biological replicates are represented by separate BioSamples with distinct biological_replicate attributes, for example, ‘biological_replicate = 1’ and ‘biological_replicate = 2’ (see “DRA objects organization”).
Technical replicates are represented by DRA Experiments and GEA SDRF. Use a single BioSample for technical replicates.
For RNA and metabolites extracted from the same plant leaf, create one BioSample and represent the extracts by GEA and MetaboBank metadata.
If a paired-end library from a single sample is sequenced, do not create separate samples for the forward and reverse reads; register both reads in one DRA Run (see “DRA objects organization”).
If a sample is sequenced on different sequencing instruments, link DRA Experiments with distinct instrument models to the same BioSample.
Register a separate BioSample for each unique source; e.g., RNA from the wings is a separate BioSample from RNA from the legs if the two sources were sequenced independently.
A genome assembly sample requires genome-specific attributes such as the locus tag prefix, so register it separately from RNA and metabolite samples.

Examples:

23,000 unique 16S amplicons from a single seawater collection point - 1 BioSample (1 sample was collected and then analyzed to deduce 16S diversity)
3 “identical” transgenic mice treated with the same drug as part of an experiment - 3 BioSamples (biological replicates are represented by separate BioSamples)
To examine gene expression profiles, CHO cells infected with a virus and sampled at 0, 2, 4, and 8 hours post infection - 4 BioSamples (4 time points)
To analyze differences in gene expression levels, RNA-seq data from a single male anteater taken from the brain, heart, lungs, testes, and liver - 5 BioSamples (5 different tissues isolated)
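The granularity rules above amount to keying each BioSample on its biological source material. The Python sketch below is a minimal illustration under stated assumptions: SequencedLibrary and its fields are hypothetical and not part of any DDBJ submission format, and the key (organism, source, biological replicate) is a simplification of the rules. It reproduces the anteater count from the examples, and shows that a second instrument run does not add a BioSample.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SequencedLibrary:
    """Hypothetical record of one sequenced library (illustrative fields only)."""
    organism: str
    source: str                # tissue, time point, collection site, ...
    biological_replicate: int  # a distinct value means a distinct BioSample
    technical_replicate: int   # represented by DRA Experiments, not BioSamples
    instrument: str            # represented by DRA Experiments, not BioSamples


def biosample_key(lib: SequencedLibrary) -> tuple:
    # Only the biological source material determines the BioSample;
    # technical replicate and instrument are deliberately ignored.
    return (lib.organism, lib.source, lib.biological_replicate)


def group_into_biosamples(libraries):
    samples = defaultdict(list)
    for lib in libraries:
        samples[biosample_key(lib)].append(lib)
    return dict(samples)


# Anteater example: five tissues from one animal.
tissues = ["brain", "heart", "lungs", "testes", "liver"]
anteater = [SequencedLibrary("anteater", t, 1, 1, "instrument A") for t in tissues]
# A second run of the liver library on another instrument adds a DRA Experiment only.
anteater.append(SequencedLibrary("anteater", "liver", 1, 2, "instrument B"))
print(len(group_into_biosamples(anteater)))  # 5
```

The same function gives 3 BioSamples for the three drug-treated mice, because each mouse carries its own biological_replicate value.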

Sample attributes

A major component of a BioSample record is the sample attributes section. Attributes describe the material under investigation and can include sample characteristics such as collection site and phenotypic information. BioSample attributes are captured as structured name:value pairs, for example, tissue: liver (see the attributes list).
The database supports and encourages the use of standardized attribute names by providing packages with predefined attributes.
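Because attributes are name:value pairs, a value may itself contain a colon. The sketch below, assuming attributes arrive as "name: value" strings (a hypothetical input format, not the DDBJ submission format), splits each one at the first colon only so that such values stay intact; the example geo_loc_name value and the rejection of duplicate names are illustrative choices of this sketch. It needs Python 3.9+ for the built-in generic type hints.

```python
def parse_attribute(text: str) -> tuple[str, str]:
    """Split a 'name: value' string at the first colon only.

    Splitting at the first colon keeps values such as
    'geo_loc_name: Japan:Shizuoka, Mishima' intact.
    """
    name, sep, value = text.partition(":")
    if not sep or not name.strip():
        raise ValueError(f"not a name:value pair: {text!r}")
    return name.strip(), value.strip()


def parse_attributes(lines) -> dict[str, str]:
    attributes = {}
    for line in lines:
        name, value = parse_attribute(line)
        if name in attributes:
            raise ValueError(f"duplicate attribute: {name}")
        attributes[name] = value
    return attributes


print(parse_attributes(["tissue: liver", "geo_loc_name: Japan:Shizuoka, Mishima"]))
# {'tissue': 'liver', 'geo_loc_name': 'Japan:Shizuoka, Mishima'}
```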

Organism

For the organism name to enter in the BioSample organism attribute, see the “Organism name” page.

Spatio-temporal annotation

INSDC improves the utility of sequence data and sample traceability by making the description of sample location and collection date mandatory. Please also see the INSDC spatio-temporal annotation standards.

Location of collection: Where the sequenced sample was collected. Provide a location meaningful enough to interpret the data; at a minimum, give the name of the country, ocean, or sea. The relevant attribute is geo_loc_name in both BioSample and DDBJ.
Date and time of collection: When the sequenced sample was collected. Provide a date and time meaningful enough to interpret the data, at least to the nearest year. The relevant attribute is collection_date in both BioSample and DDBJ.

In cases where this information cannot be provided (e.g., pathogen samples for which it would make a human identifiable) or is not relevant (e.g., a study of a model organism lab stock), you can declare an appropriate exemption using the terms defined in the INSDC Missing Value Reporting Standards. See also the FAQs regarding sample collection location and time.
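The minimum requirements above can be checked before submission. The following Python sketch is a hedged illustration, not a DDBJ validator: it assumes collection_date uses ISO 8601 dates (year, year-month or full date, optionally with a time) and ignores other forms such as date ranges; it only checks that geo_loc_name has a non-empty top-level name before the first colon, without verifying it against the INSDC country list; and MISSING_VALUE_TERMS is an assumed subset of the INSDC missing-value terms, so consult the standard for the authoritative list.

```python
import re
from datetime import date

# Assumed subset of INSDC missing-value terms; the INSDC Missing Value
# Reporting Standards are the authoritative list.
MISSING_VALUE_TERMS = {
    "missing",
    "not applicable",
    "not collected",
    "not provided",
    "restricted access",
    "missing: human-identifiable",
    "missing: lab stock",
}

# YYYY, YYYY-MM or YYYY-MM-DD, optionally followed by a time part such as T10:30Z.
_DATE = re.compile(r"^(\d{4})(?:-(\d{2})(?:-(\d{2})(?:T\S+)?)?)?$")


def check_collection_date(value: str) -> bool:
    value = value.strip()
    if value in MISSING_VALUE_TERMS:
        return True
    m = _DATE.match(value)
    if m is None:
        return False  # not an ISO 8601 date with at least year precision
    year, month, day = m.groups()
    try:
        # Validates the month and day; the time part is not checked further.
        date(int(year), int(month or 1), int(day or 1))
    except ValueError:
        return False
    return True


def check_geo_loc_name(value: str) -> bool:
    value = value.strip()
    if value in MISSING_VALUE_TERMS:
        return True
    # The text before the first colon should name a country, ocean or sea;
    # this sketch only checks that it is present, not that it is a valid name.
    return bool(value.split(":", 1)[0].strip())
```

For a model organism lab stock, for instance, a submitter would enter an exemption term rather than an invented location, and both functions accept it.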

Sample package

BioSample promotes richer sample descriptions and standardized attribute names by providing sample packages designed for each type of sample and sequence. See “Sample attributes” for the attributes provided by each package.
The package itself is only a mechanism to promote adequate sample description; the attributes are what matter for interpreting a sample. Therefore, if samples are described by appropriate attributes, there is no need to change the package even if the submitted samples use a non-recommended package.

Package series

Standard
Pathogen
MIxS
- MIxS Environmental package

Standard

Standard packages according to sample types and organisms.

SARS-CoV-2: clinical or host-associated
SARS-CoV-2 samples that are relevant to public health.
SARS-CoV-2: wastewater surveillance
SARS-CoV-2 wastewater surveillance samples that are relevant to public health.
Microbe
Bacteria or other unicellular microbes.
Model organism or animal
Animals or common laboratory model organisms, e.g., mouse and Drosophila.
Metagenome or environmental
Metagenomic and environmental samples.
Invertebrate
Invertebrate sample.
Human
Human samples. WARNING: Only for human samples that have no privacy concerns. Make sure to remove any direct personal identifiers from your submission. If you need to protect samples, submit samples and data to the Japanese Genotype-phenotype Archive (JGA), which has controlled-access mechanisms.
Plant
Plant sample or cell line.
Viral
Virus samples not directly associated with disease. For viral pathogens, use the Pathogen: clinical or host-associated package.
Beta-lactamase
Beta-lactamase gene transformants that have antibiotic resistance data.
Omics
Gene expression, epigenetics and metabolomics data samples.

Pathogen

Use for pathogen samples that are relevant to public health.

Pathogen: clinical or host-associated
Clinical or host-associated pathogen samples.
Pathogen: environmental/food/other
Environmental/food/other pathogen samples.

MIxS

Use for samples from which genome or metagenome sequences were obtained.

Cultured Bacterial/Archaeal Genomic Sequences (MIGS.ba)
Cultured bacterial or archaeal genomic sequences. Organism must have lineage Bacteria or Archaea.
Eukaryotic Genomic Sequences (MIGS.eu)
Eukaryotic genomic sequences. Organism must have lineage Eukaryota.
Viral Genomic Sequences (MIGS.vi)
Virus genomic sequences. Organism must have lineage Viruses.
Environmental and metagenome sequences (MIMS.me)
Organism must be a metagenome, where the lineage starts with unclassified sequences and the scientific name ends with 'metagenome'.
Metagenome-assembled Genome Sequences (MIMAG)
Metagenome-assembled genome sequences. Use the MIUVIG package for virus genomes.
Single Amplified Genome Sequences (MISAG)
Single amplified genome sequences produced by isolating individual cells.
Specimen Marker Sequences (MIMARKS.specimen)
Marker gene sequences, e.g., 16S, 18S, 23S, 28S rRNA or COI, obtained from specimens.
Survey-related Marker Sequences (MIMARKS.survey)
Marker gene sequences, e.g., 16S, 18S, 23S, 28S rRNA or COI, obtained directly from the environment, without culturing or identification of the organisms. Organism must be a metagenome.
Uncultivated Viral Genome Sequences (MIUVIG)
Uncultivated virus genomes identified in metagenome and metatranscriptome datasets. Organism must have lineage Viruses.
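The organism constraints listed for the MIxS packages can be expressed as a single check. The Python sketch below assumes the organism's taxonomic lineage is available as a list of names from the root (for example ["cellular organisms", "Bacteria"]); that input format and the function name are hypothetical. Packages without a stated organism constraint, such as MIMAG, are accepted unchanged.

```python
def organism_fits_mixs_package(package: str, organism: str, lineage: list) -> bool:
    """Check the organism constraints stated for the MIxS packages.

    lineage: taxonomic lineage names from the root (assumed input format).
    """
    if package == "MIGS.ba":
        return "Bacteria" in lineage or "Archaea" in lineage
    if package == "MIGS.eu":
        return "Eukaryota" in lineage
    if package in ("MIGS.vi", "MIUVIG"):
        return "Viruses" in lineage
    if package in ("MIMS.me", "MIMARKS.survey"):
        # A metagenome: lineage starts with "unclassified sequences"
        # and the scientific name ends with "metagenome".
        return (
            bool(lineage)
            and lineage[0] == "unclassified sequences"
            and organism.endswith("metagenome")
        )
    return True  # no organism constraint stated for this package


print(organism_fits_mixs_package(
    "MIMS.me", "soil metagenome",
    ["unclassified sequences", "metagenomes", "ecological metagenomes"],
))  # True
```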

MIxS Environmental package

Select an appropriate environmental package for a MIxS environmental/metagenome sample. Each environmental package adds predefined attributes that describe the sampling environment (for example, “altitude” for the “air” environmental package).
For the MIMS.me and MIMARKS.survey packages, “No package” cannot be selected.

agriculture
air
built
food-animal
food-farm_env
food-human_foods
food-prod_facility
host-associated
human-associated
human-gut
human-oral
human-skin
human-vaginal
hydrocarbon-cores
hydrocarbon-fluids_swabs
microbial
miscellaneous
plant-associated
sediment
soil
symbiont-associated
wastewater
water
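The list above, together with the rule that “No package” is not allowed for MIMS.me and MIMARKS.survey, can be captured in a small check. This Python sketch is illustrative only: representing “No package” as None and the function name are assumptions, not DDBJ conventions.

```python
from typing import Optional

# The environmental packages listed above.
ENV_PACKAGES = frozenset({
    "agriculture", "air", "built", "food-animal", "food-farm_env",
    "food-human_foods", "food-prod_facility", "host-associated",
    "human-associated", "human-gut", "human-oral", "human-skin",
    "human-vaginal", "hydrocarbon-cores", "hydrocarbon-fluids_swabs",
    "microbial", "miscellaneous", "plant-associated", "sediment", "soil",
    "symbiont-associated", "wastewater", "water",
})

# MIxS packages for which "No package" cannot be selected.
REQUIRES_ENV_PACKAGE = frozenset({"MIMS.me", "MIMARKS.survey"})


def check_env_package(mixs_package: str, env_package: Optional[str] = None) -> bool:
    """Return True if the combination is allowed; None stands for "No package"."""
    if env_package is None:
        return mixs_package not in REQUIRES_ENV_PACKAGE
    return env_package in ENV_PACKAGES
```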

How to select a package

Select a package according to the organism and the data. When appropriate packages exist in both the Standard and MIxS series, compare their attribute lists and select the one that better describes your sample.

Genome assembly sample

A DDBJ/ENA/GenBank genome sequence should be linked to one BioProject and one BioSample.
Select a package according to the species of your sample.

Isolated, cultured prokaryotes
Microbe or MIGS.ba
Eukaryotes
Model organism or animal/Invertebrate/Plant or MIGS.eu

To register the locus tag prefix required for an annotated genome submission, enter the prefix in the BioSample locus_tag_prefix attribute.
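A prefix can be checked locally before it is entered. The Python sketch below assumes the commonly documented INSDC format for locus tag prefixes (3 to 12 letters or digits, starting with a letter); this rule is an assumption not stated on this page, so confirm it against the DDBJ locus_tag guidance before relying on it.

```python
import re

# Assumed rule: 3-12 ASCII letters or digits, the first being a letter.
_LOCUS_TAG_PREFIX = re.compile(r"[A-Za-z][A-Za-z0-9]{2,11}")


def is_valid_locus_tag_prefix(prefix: str) -> bool:
    return _LOCUS_TAG_PREFIX.fullmatch(prefix) is not None


candidates = ["ABC", "AB", "1ABC", "ABC_1", "ABCDEFGHIJKL"]
print([p for p in candidates if is_valid_locus_tag_prefix(p)])
# ['ABC', 'ABCDEFGHIJKL']
```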

Metagenome samples

Use different packages for metagenome samples depending on the assembly level. Please see Metagenome assembly.

Raw reads and primary metagenome
MIMS.me or Metagenome or environmental
Binned metagenome and MAG
MIMAG. Use MIUVIG for virus metagenomic assemblies.
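The selection guidance in this section is essentially a lookup from the kind of sample to a short list of candidate packages. The Python sketch below encodes the cases listed above; the case labels are hypothetical keys invented for this sketch. It deliberately stops at candidates, because choosing between a Standard and a MIxS package still means comparing their attribute lists, which the code does not attempt.

```python
# Candidate packages per case, as listed in this section.
# The case labels are hypothetical keys chosen for this sketch.
CANDIDATE_PACKAGES = {
    "isolated, cultured prokaryote": ["Microbe", "MIGS.ba"],
    "eukaryote": ["Model organism or animal", "Invertebrate", "Plant", "MIGS.eu"],
    "raw reads or primary metagenome": ["MIMS.me", "Metagenome or environmental"],
    "binned metagenome or MAG": ["MIMAG"],
    "virus metagenomic assembly": ["MIUVIG"],
}


def candidate_packages(case: str) -> list:
    try:
        return list(CANDIDATE_PACKAGES[case])
    except KeyError:
        raise ValueError(f"no package guidance for case: {case!r}") from None


print(candidate_packages("isolated, cultured prokaryote"))  # ['Microbe', 'MIGS.ba']
```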

Derived sample

For a mixed sample consisting of multiple samples, register a derived sample and list the accession numbers of the component samples in the derived_from attribute, separated by commas; a hyphen denotes a range of consecutive accessions. Example: SAMD00000001,SAMD00000002,SAMD00000008-SAMD00000100. A derived sample is necessary in the following cases.
INSDC requires that a genome assembly sequence links to one BioProject and one BioSample. Therefore, when submitting to DDBJ a genome sequence assembled from reads of multiple samples, you need to represent those samples by a derived sample. For example, to submit a genome sequence assembled from reads of male and female samples, register a derived sample citing both BioSample accessions.
Likewise, to submit a MAG computationally constructed from environmental samples, register a derived sample for the MAG and list the accession numbers of the environmental samples in the derived_from attribute.
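To see which samples a derived_from value actually covers, ranges can be expanded into individual accessions. The Python sketch below assumes accessions consist of an uppercase prefix followed by zero-padded digits, as in the SAMD example, and that both ends of a range share the same prefix; the function name is hypothetical.

```python
import re

# Assumed accession shape: uppercase prefix + zero-padded digits, e.g. SAMD00000001.
_ACCESSION = re.compile(r"^([A-Z]+)(\d+)$")


def expand_derived_from(value: str) -> list:
    """Expand 'A,B,C-D' into individual accessions (ranges are inclusive)."""
    accessions = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        if "-" in item:
            start, end = (part.strip() for part in item.split("-", 1))
            m1, m2 = _ACCESSION.match(start), _ACCESSION.match(end)
            if not (m1 and m2) or m1.group(1) != m2.group(1):
                raise ValueError(f"bad range: {item!r}")
            prefix, width = m1.group(1), len(m1.group(2))
            lo, hi = int(m1.group(2)), int(m2.group(2))
            if lo > hi:
                raise ValueError(f"reversed range: {item!r}")
            # Keep the original zero padding when generating accessions.
            accessions.extend(f"{prefix}{n:0{width}d}" for n in range(lo, hi + 1))
        else:
            if not _ACCESSION.match(item):
                raise ValueError(f"bad accession: {item!r}")
            accessions.append(item)
    return accessions


samples = expand_derived_from("SAMD00000001,SAMD00000002,SAMD00000008-SAMD00000100")
print(len(samples), samples[2], samples[-1])  # 95 SAMD00000008 SAMD00000100
```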

Human sample

Submission of research data from human subjects

Submit data derived from human subjects (human data) to the databases of the Bioinformation and DDBJ Center in compliance with “Submission of Research Data from Human Subjects”.

WARNING: Only use for human samples or cell lines that have no privacy concerns. For all studies involving human subjects, it is the submitter’s responsibility to ensure that the information supplied protects participant privacy in accordance with all applicable laws, regulations and institutional policies. Make sure to remove any direct personal identifiers from your submission. If there are patient privacy concerns about making data fully public, please submit samples and data to the Japanese Genotype-phenotype Archive (JGA) database. JGA has controlled-access mechanisms and is an appropriate resource for hosting sensitive patient data.

Sample attributes

Describe the following attributes for Human (Homo sapiens) samples by using the Human package. Please see this page for explanations of the attributes.

Sample derived from human subjects

Fill in the anonymized subject ID in isolate.

Cell line

Recommended:

cell_type

Primary cell

Indicate primary cells in sample_type, i.e., sample_type: primary cell.

iPS cell

In most cases, iPS cells are used in a differentiated state, so information about the cells both before and after differentiation is important. In addition to the attributes above, provide the attributes indicated below. This also applies to ES cells used after differentiation. For complex samples, such as cells differentiated several times, provide a free-text description.

Samples from human subjects

Describe information regarding differentiation in cell_type. For example, cell_type: iPS cell derived megakaryocyte cell.

Cell line

Describe information regarding differentiation in cell_type. For example, cell_type: iPS cell (cell_line:253G1) derived megakaryocyte cell. In addition, describe provider information in biomaterial_provider. For example, biomaterial_provider: ATCC.
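The two iPS cases differ only in which attributes are filled and how cell_type is phrased. The Python sketch below assembles the attribute sets following the examples in this section; the helper name, its parameters and the subject ID "subject_001" are hypothetical, and it covers only the attributes mentioned here, not the full Human package.

```python
from typing import Optional


def ips_derived_attributes(
    derived_cell: str,
    *,
    subject_id: Optional[str] = None,
    cell_line: Optional[str] = None,
    provider: Optional[str] = None,
) -> dict:
    """Build attributes for a differentiated iPS cell sample (hypothetical helper).

    Pass subject_id for a sample from a human subject, or cell_line
    (and optionally provider) for an established cell line.
    """
    if (subject_id is None) == (cell_line is None):
        raise ValueError("give exactly one of subject_id or cell_line")
    if subject_id is not None:
        # The anonymized subject ID goes in isolate.
        return {"isolate": subject_id, "cell_type": f"iPS cell derived {derived_cell}"}
    attributes = {"cell_type": f"iPS cell (cell_line:{cell_line}) derived {derived_cell}"}
    if provider is not None:
        attributes["biomaterial_provider"] = provider
    return attributes


print(ips_derived_attributes("megakaryocyte cell", cell_line="253G1", provider="ATCC"))
```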

XML schema

BioSample XML schema