BioSample
BioSample Overview
The BioSample database stores descriptive information about the biological source materials, or samples, used to generate experimental data in any of the primary data archives. BioSample, together with BioProject, serves to organize related data across databases.
Sample granularity
In general, create one BioSample record for each biological source material. Extracted molecules, such as nucleic acids and metabolites, are represented by DRA, GEA and MetaboBank metadata.
- Biological replicates are represented by separate BioSamples with distinct ‘biological_replicate’ attributes, for example, ‘biological_replicate = 1’ and ‘biological_replicate = 2’ (see DRA objects organization).
- Technical replicates are represented by DRA Experiments and GEA SDRF. Use a single BioSample for technical replicates.
- For RNA and metabolite samples extracted from a plant leaf, create one BioSample and represent the extracts by GEA and MetaboBank metadata.
- If a paired-end library from a single sample is sequenced, do not create separate samples for forward and reverse reads; register both reads in one DRA Run (see DRA objects organization).
- If a sample is sequenced by different sequencing instruments, link DRA Experiments with distinct Instrument models to a single BioSample.
- Register a separate BioSample for each unique source, e.g., RNA from the wings is a separate BioSample from RNA from the legs if those two sources were sequenced independently.
- A genome assembly sample requires genome-specific attributes such as a locus tag prefix, so it needs to be kept separate from other RNA and metabolite samples.
Examples:
- 23,000 unique 16S amplicons from a single seawater collection point - 1 BioSample (1 sample was collected and then analyzed to deduce 16S diversity)
- 3 “identical” transgenic mice treated with the same drug as part of an experiment - 3 BioSamples (biological replicates are represented by separate BioSamples)
- To examine gene expression profiles, CHO cells infected with a virus and sampled at 0, 2, 4, and 8 hours post infection - 4 BioSamples (4 time points)
- To analyze differences in gene expression levels, RNA-seq data from a single male anteater taken from the brain, heart, lungs, testes, and liver - 5 BioSamples (5 different tissues isolated)
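The granularity rules above can be sketched in code. The following is a minimal illustration, not a DDBJ tool: the `DataFile` record and its field names are hypothetical, and it assumes that a BioSample is identified by the biological source (individual, tissue, time point), while technical replicates and read direction never create additional BioSamples.

```python
from typing import NamedTuple


class DataFile(NamedTuple):
    """One sequenced file; all field names are hypothetical."""
    individual: str           # organism, animal, or culture the material came from
    tissue: str               # biological source within the individual
    time_point: str           # sampling time, "" if not applicable
    technical_replicate: int  # repeated library or run of the same material
    read_direction: str       # "forward" or "reverse" for paired-end reads


def biosample_keys(files):
    """Return the distinct biological sources, one per BioSample.

    Technical replicates and read direction are deliberately ignored,
    because they are represented by DRA Experiments and Runs instead.
    """
    return {(f.individual, f.tissue, f.time_point) for f in files}


# The anteater example: five tissues, paired-end reads for each.
tissues = ["brain", "heart", "lungs", "testes", "liver"]
files = [
    DataFile("anteater_1", tissue, "", 1, direction)
    for tissue in tissues
    for direction in ("forward", "reverse")
]
print(len(biosample_keys(files)))  # 10 files, 5 BioSamples
```

Biological replicates come out as separate BioSamples here because each replicate has a different `individual` value.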
Sample attributes
A major component of a BioSample record is the sample attributes section. Attributes define the material under investigation and can include sample characteristics such as collection site and phenotypic information.
BioSample attributes are captured as structured name:value pairs, for example, tissue:liver (see the attributes list).
The database supports and encourages use of dictionaries of attribute names by providing packages with pre-defined attributes.
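As an illustration of the name:value structure, the sketch below splits a text such as tissue:liver into its two parts. The function name `parse_attribute` is hypothetical, and splitting on the first colon only is an assumption made here so that values which themselves contain a colon stay intact.

```python
def parse_attribute(text):
    """Split 'name:value' into a (name, value) pair.

    Only the first colon separates name from value, so a value such as
    'Japan:Kanagawa' is kept whole.
    """
    name, separator, value = text.partition(":")
    name, value = name.strip(), value.strip()
    if not separator or not name or not value:
        raise ValueError(f"not a name:value pair: {text!r}")
    return name, value


print(parse_attribute("tissue:liver"))  # ('tissue', 'liver')
```

A collection of such pairs can then be held in a plain dictionary keyed by attribute name.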
Organism
For the organism name to enter in the BioSample organism attribute, see the “Organism name” page.
Spatio-temporal annotation
INSDC improves the utility of sequence data and sample traceability by making the description of sample location and collection date mandatory. Please also see INSDC spatio-temporal annotation standards.
- Location of collection: Where the sequenced sample was collected. Provide a location that is meaningful for interpreting the data. At a minimum, give the name of the country, ocean, or sea. Relevant attributes are the BioSample and DDBJ geo_loc_name attributes.
- Date and time of collection: When the sequenced sample was collected. Provide a date and time that are meaningful for interpreting the data. Describe the date at least to the nearest year. Relevant attributes are the BioSample and DDBJ collection_date attributes.
In cases where this information cannot be provided (e.g., pathogen samples for which this information would lead to identifiability of a human) or is not relevant (e.g., study of a model organism lab stock), you can declare an appropriate exemption using the exemption terms defined in the INSDC Missing Value Reporting Standards. See also the FAQs regarding sample collection location and time.
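The minimum requirements above (a non-empty location, a date given at least to the year, or a declared exemption) can be checked with a short script before submission. This is a sketch under stated assumptions, not the validator DDBJ uses: the function name is hypothetical, the year test is a simple pattern, and the two exemption terms are illustrative placeholders that should be replaced with the full list from the INSDC Missing Value Reporting Standards.

```python
import re

# Illustrative placeholders only; take the real terms from the
# INSDC Missing Value Reporting Standards.
EXEMPTION_TERMS = {"missing: lab stock", "missing: human-identifiable"}


def check_spatiotemporal(attributes, exemptions=EXEMPTION_TERMS):
    """Return a list of problems with geo_loc_name and collection_date."""
    problems = []

    location = attributes.get("geo_loc_name", "").strip()
    if not location:
        problems.append("geo_loc_name is empty")

    date = attributes.get("collection_date", "").strip()
    # Accept an exemption term, or a value that starts with a 4-digit year.
    if date not in exemptions and not re.match(r"\d{4}(?!\d)", date):
        problems.append("collection_date needs at least a year or an exemption term")

    return problems


print(check_spatiotemporal({"geo_loc_name": "Japan", "collection_date": "2021-05"}))
```

An empty list means both minimum requirements are met. A country-level location is accepted as-is because the check only tests that the value is present.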
Sample package
BioSample promotes richer sample description and standardization of attribute names by providing sample packages designed for each type of sample and sequence. See “Sample attributes” for attributes provided by packages.
A package is only a mechanism to promote adequate sample description; the attributes themselves matter more for interpreting a sample. Therefore, if samples are described by appropriate attributes, you do not need to change the package even if the submitted samples use a package other than the recommended one.
Package series
Standard
Standard packages according to sample types and organisms.
- SARS-CoV-2: clinical or host-associated
SARS-CoV-2 samples that are relevant to public health.
- SARS-CoV-2: wastewater surveillance
SARS-CoV-2 wastewater surveillance samples that are relevant to public health.
- Microbe
Bacteria or other unicellular microbes.
- Model organism or animal
Animals or common laboratory model organisms, e.g., mouse and Drosophila.
- Metagenome or environmental
Metagenomic and environmental samples.
- Invertebrate
Invertebrate sample.
- Human
Human samples. WARNING: Only for human samples that have no privacy concerns. Make sure to remove any direct personal identifiers from your submission. If you need to protect samples, please submit samples and data to the Japanese Genotype-phenotype Archive (JGA), which has controlled access mechanisms.
- Plant
Plant sample or cell line.
- Viral
Virus samples not directly associated with disease. For viral pathogens, use the Pathogen: clinical or host-associated package.
- Beta-lactamase
Beta-lactamase gene transformants that have antibiotic resistance data.
- Omics
Gene expression, epigenetics and metabolomics data samples.
Pathogen
Use for pathogen samples that are relevant to public health.
- Pathogen: clinical or host-associated
Clinical or host-associated pathogen samples.
- Pathogen: environmental/food/other
Environmental/food/other pathogen samples.
MIxS
Used for samples from which genome and metagenome sequences were obtained.
- Cultured Bacterial/Archaeal Genomic Sequences (MIGS.ba)
Cultured bacterial or archaeal genomic sequences. Organism must have lineage Bacteria or Archaea.
- Eukaryotic Genomic Sequences (MIGS.eu)
Eukaryotic genomic sequences. Organism must have lineage Eukaryota.
- Viral Genomic Sequences (MIGS.vi)
Virus genomic sequences. Organism must have lineage Viruses.
- Environmental and metagenome sequences
Organism must be a metagenome, where lineage starts with unclassified sequences and scientific name ends with 'metagenome'.
- Metagenome-assembled Genome Sequences (MIMAG)
Metagenome-assembled genome sequences. Use the MIUVIG package for virus genomes.
- Single Amplified Genome Sequences (MISAG)
Single amplified genome sequences produced by isolating individual cells.
- Specimen Marker Sequences (MIMARKS.specimen)
Marker gene sequences, e.g., 16S, 18S, 23S, 28S rRNA or COI, obtained from specimens.
- Survey-related Marker Sequences (MIMARKS.survey)
Marker gene sequences, e.g., 16S, 18S, 23S, 28S rRNA or COI, obtained directly from the environment, without culturing or identification of the organisms. Organism must be a metagenome.
- Uncultivated Viral Genome Sequences (MIUVIG)
Uncultivated virus genome identified in metagenome and metatranscriptome datasets. Organism must have lineage Viruses.
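The organism requirements stated for the MIxS packages above can be expressed as a small lookup. This is a minimal sketch, not the check DDBJ performs: the function and table names are hypothetical, the lineage is assumed to be a list of taxon names from the root down, and the "Environmental and metagenome sequences" package is assumed to be the one abbreviated MIMS.me. Packages with no stated organism rule are accepted unconditionally.

```python
# Lineage that must appear somewhere in the organism's lineage, per package.
LINEAGE_RULES = {
    "MIGS.ba": ("Bacteria", "Archaea"),
    "MIGS.eu": ("Eukaryota",),
    "MIGS.vi": ("Viruses",),
    "MIUVIG": ("Viruses",),
}

# Packages whose organism must be a metagenome (MIMS.me is an assumed label).
METAGENOME_PACKAGES = {"MIMS.me", "MIMARKS.survey"}


def organism_fits_package(package, scientific_name, lineage):
    """Check an organism against the rule stated for a MIxS package.

    lineage: list of taxon names from the root down (assumed format).
    """
    if package in METAGENOME_PACKAGES:
        return (
            bool(lineage)
            and lineage[0] == "unclassified sequences"
            and scientific_name.endswith("metagenome")
        )
    required = LINEAGE_RULES.get(package)
    if required is None:
        return True  # no organism rule is stated for this package
    return any(taxon in lineage for taxon in required)


print(organism_fits_package(
    "MIMARKS.survey", "marine metagenome",
    ["unclassified sequences", "metagenomes", "ecological metagenomes"],
))
```

Keeping the rules in a table makes it easy to compare them against the package descriptions when those change.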
MIxS Environmental package
Select an appropriate environmental package for a MIxS environmental/metagenome sample. Predefined attributes to describe sampling environments are added (for example, “altitude” for the “air” environmental package).
For the MIMS.me and MIMARKS.survey packages, “No package” cannot be selected.
- agriculture
- air
- built
- food-animal
- food-farm_env
- food-human_foods
- food-prod_facility
- host-associated
- human-associated
- human-gut
- human-oral
- human-skin
- human-vaginal
- hydrocarbon-cores
- hydrocarbon-fluids_swabs
- microbial
- miscellaneous
- plant-associated
- sediment
- soil
- symbiont-associated
- wastewater
- water
How to select a package
Select a package according to the organism and the data. When appropriate packages are found in both the Standard and MIxS series, compare their attribute lists and select the one that better describes your sample.
Genome assembly sample
A DDBJ/ENA/GenBank genome sequence should be linked to one BioProject and one BioSample.
Select a package according to the species of your sample.
- Isolated, cultured prokaryotes
- Eukaryotes
Register the locus tag prefix required for an annotated genome submission by entering it in the BioSample locus_tag_prefix attribute.
Metagenome samples
Different packages need to be used for metagenome assembly samples at different assembly levels. Please see Metagenome assembly.
- Raw reads and primary metagenome.
- Binned metagenome and MAG
Derived sample
For a mixed sample that consists of multiple samples, register a derived sample and list the accession numbers (separated by commas or hyphens) of the component samples in derived_from. Example: SAMD00000001,SAMD00000002,SAMD00000008-SAMD00000100. A derived sample is necessary in the following cases.
INSDC requires that a genome assembly sequence link to one BioProject and one BioSample. Therefore, when submitting to DDBJ a genome sequence assembled from reads of multiple samples, you need to represent those samples by a derived sample. For example, to submit a genome sequence assembled from reads of male and female samples, register a derived sample citing both BioSample accessions.
As another example, to submit a MAG computationally constructed from environmental samples, register a derived sample for the MAG and list the accession numbers of the environmental samples in the derived_from attribute.
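To see which component samples a derived_from value covers, the comma- and hyphen-separated notation can be expanded into individual accessions. The sketch below is illustrative only: the function name is hypothetical, and it assumes that a hyphenated range means every consecutive accession number between the two ends, inclusive, with both ends sharing the same prefix.

```python
import re

# An accession is assumed to be an uppercase prefix followed by digits.
_ACCESSION = re.compile(r"^([A-Z]+)(\d+)$")


def expand_derived_from(value):
    """Expand 'A,B,C-D' into a flat list of accession strings."""
    accessions = []
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        # A single accession is treated as a range of length one.
        start, _, end = part.partition("-")
        start, end = start.strip(), (end.strip() or start.strip())
        first, last = _ACCESSION.match(start), _ACCESSION.match(end)
        if not first or not last or first.group(1) != last.group(1):
            raise ValueError(f"unrecognized accession or range: {part!r}")
        prefix, width = first.group(1), len(first.group(2))
        low, high = int(first.group(2)), int(last.group(2))
        if low > high:
            raise ValueError(f"range runs backwards: {part!r}")
        # Re-pad each number to the width of the original accession.
        accessions.extend(f"{prefix}{n:0{width}d}" for n in range(low, high + 1))
    return accessions


samples = expand_derived_from("SAMD00000001,SAMD00000002,SAMD00000008-SAMD00000100")
print(len(samples), samples[0], samples[-1])
```

For the example value in the text, this yields the two single accessions followed by the 93 accessions in the range.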
Human sample
Submission of research data from human subjects
Submit data derived from human subjects (human data) to the databases of Bioinformation and DDBJ Center in compliance with “Submission of Research Data from Human Subjects”.
WARNING: Only use for human samples or cell lines that have no privacy concerns. For all studies involving human subjects, it is the submitter’s responsibility to ensure that the information supplied protects participant privacy in accordance with all applicable laws, regulations and institutional policies. Make sure to remove any direct personal identifiers from your submission. If there are patient privacy concerns regarding making data fully public, please submit samples and data to the Japanese Genotype-phenotype Archive (JGA) database. JGA has controlled access mechanisms and is an appropriate resource for hosting sensitive patient data.
Sample attributes
Describe the following attributes for a human (Homo sapiens) sample by using the Human package. Please see this page for attribute explanation.
Sample derived from human subjects
Enter an anonymized subject ID in isolate.
Cell line
Recommended:
- cell_type
Primary cell
Indicate a primary cell in sample_type, for example, sample_type: primary cell.
iPS cell
In most cases, iPS cells are used in a differentiated state, so information on the cells before and after differentiation is important. In addition to the above, provide the attributes indicated below. The same applies to ES cells used after differentiation. For complex samples, such as cells differentiated several times, provide a free-text description.
Samples from human subjects
Describe information regarding differentiation in cell_type. For example, cell_type: iPS cell derived megakaryocyte cell.
Cell line
Describe information regarding differentiation in cell_type. For example, cell_type: iPS cell (cell_line:253G1) derived megakaryocyte cell. In addition, describe provider information in biomaterial_provider. For example, biomaterial_provider: ATCC.
Antibiogram
An antibiogram table can be included in a BioSample record (Example: SAMN07958491). To submit the table, please contact us through the BioProject/BioSample/DRA update request form.
For the antibiogram submission guidelines, please see the NCBI BioSample pages.