DDBJ Annotated/Assembled Sequences
Pseudohaplotype
Historically, whole-genome sequencing generated a single consensus sequence without distinguishing between alleles on homologous chromosomes. Long-read sequencing technologies can distinguish the haploid chromosomes (called “pseudohaplotypes” in INSDC). Because pseudohaplotype sequencing produces two genome sequences from a single sample, INSDC has established a guideline for submitting pseudohaplotype sequences.
[Figure: pseudohaplotype]
This page describes a typical pseudohaplotype sequence submission. To distinguish the pseudohaplotype assemblies, name one “Principal” and the other “Alternate”. There are no absolute criteria for this choice; name them based on sequence length or sequencing accuracy. Because both pseudohaplotype assemblies are derived from the same sample, they share the same BioSample. Because INSDC manages a genome assembly by the unique combination of BioProject and BioSample, create a separate BioProject for each of the principal and alternate pseudohaplotypes so that each combination is unique. Create an umbrella BioProject to group these projects.
If the raw sequencing data in DRA contain both pseudohaplotypes, create a BioProject for the DRA data separate from those for the assemblies. If the DRA data are derived from the same sample as the assemblies, use the same BioSample.
BioProject
Create a separate BioProject for each of the principal and alternate pseudohaplotypes, and an umbrella BioProject to group these projects.
- BioProject 1: Principal
  - Add phasing information to the title, for example, Principal pseudohaplotype or Primary haplotype.
- BioProject 2: Alternate
  - Add phasing information to the title, for example, Alternate pseudohaplotype or Alternate haplotype.
- Umbrella BioProject
  - Groups BioProject 1, BioProject 2, and other related BioProjects (BioProject 3 for DRA in the figure).
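The requirement that each assembly has a unique BioProject and BioSample combination can be checked before submission. The following is a minimal sketch rather than a DDBJ tool: the `Assembly` record, the `duplicated_combinations` helper, and the placeholder identifiers are all hypothetical names introduced for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Assembly:
    """Hypothetical record describing one pseudohaplotype assembly."""
    label: str        # "Principal" or "Alternate"
    bioproject: str   # BioProject referenced by this assembly
    biosample: str    # BioSample shared by both pseudohaplotypes


def duplicated_combinations(assemblies):
    """Return the (BioProject, BioSample) pairs used by more than one assembly."""
    counts = Counter((a.bioproject, a.biosample) for a in assemblies)
    return [pair for pair, n in counts.items() if n > 1]


# Placeholder identifiers, not real accessions.
assemblies = [
    Assembly("Principal", "BIOPROJECT_1", "BIOSAMPLE_SHARED"),
    Assembly("Alternate", "BIOPROJECT_2", "BIOSAMPLE_SHARED"),
]

# An empty list means every assembly has a unique combination.
print(duplicated_combinations(assemblies))
```

If both assemblies pointed at the same BioProject, the shared BioSample would make the two combinations identical, and the helper would report that pair.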
BioSample
Because the pseudohaplotypes share the same sample, create a single BioSample.
- Select MIGS package.
- Create a common BioSample for the principal and alternate pseudohaplotypes.
- If you add gene annotations to the pseudohaplotype sequences, enter the locus tag prefix you want to use for both the principal and the alternate pseudohaplotypes in the locus_tag_prefix attribute. Because the prefix is shared by both pseudohaplotypes, distinguish their loci within the tag itself, for example, A1C_p00001 (principal) and A1C_a00001 (alternate).
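The locus tag naming shown in the example above can be generated programmatically. This is a minimal sketch under stated assumptions: the `locus_tag` function is hypothetical, and the "p"/"a" markers and five-digit zero padding simply mirror the A1C_p00001 and A1C_a00001 examples; they are one possible convention, not a stated requirement.

```python
def locus_tag(prefix: str, haplotype: str, number: int, width: int = 5) -> str:
    """Build a locus tag such as A1C_p00001 from a shared prefix.

    haplotype must be "principal" or "alternate". The single-letter marker
    and the zero-padded width follow the example in the text (assumption).
    """
    markers = {"principal": "p", "alternate": "a"}
    if haplotype not in markers:
        raise ValueError(f"unknown haplotype: {haplotype!r}")
    return f"{prefix}_{markers[haplotype]}{number:0{width}d}"


print(locus_tag("A1C", "principal", 1))  # A1C_p00001
print(locus_tag("A1C", "alternate", 1))  # A1C_a00001
```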
DDBJ
Submit the principal and the alternate pseudohaplotype sequences.
- Principal pseudohaplotype
  - Reference the BioProject 1 (Principal) in DBLINK.
  - Add the pre-defined comment in ST_COMMENT.
    Genome-Assembly-Data ST_COMMENT: Diploid :: Principal Pseudohaplotype
- Alternate pseudohaplotype
  - Reference the BioProject 2 (Alternate) in DBLINK.
  - Add the pre-defined comment in ST_COMMENT.
    Genome-Assembly-Data ST_COMMENT: Diploid :: Alternate Pseudohaplotype
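The pairing described above, where each pseudohaplotype has its own BioProject and its own pre-defined comment, can be summarized in a small lookup. This is only an illustrative sketch: the `assembly_metadata` function, the dictionary keys, and the placeholder BioProject identifiers are hypothetical and do not represent DDBJ annotation file syntax. The comment strings are copied verbatim from the list above.

```python
# Pre-defined comments quoted verbatim from the text above.
PREDEFINED_COMMENT = {
    "principal": "Diploid :: Principal Pseudohaplotype",
    "alternate": "Diploid :: Alternate Pseudohaplotype",
}


def assembly_metadata(haplotype: str, bioproject: str) -> dict:
    """Pair a pseudohaplotype with its BioProject and pre-defined comment.

    The keys are illustrative labels only (assumption), not the
    field names of any real submission format.
    """
    if haplotype not in PREDEFINED_COMMENT:
        raise ValueError(f"unknown haplotype: {haplotype!r}")
    return {
        "dblink_bioproject": bioproject,
        "st_comment": PREDEFINED_COMMENT[haplotype],
    }


# Placeholder BioProject identifiers, not real accessions.
for haplotype, project in [("principal", "BIOPROJECT_1"), ("alternate", "BIOPROJECT_2")]:
    print(assembly_metadata(haplotype, project))
```

Keeping the comment strings in one table makes it harder to attach the Principal comment to the Alternate submission by mistake.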
Real-world examples
Common
- BioProject: PRJDB10054 (Umbrella)
- BioSample: SAMD00229903
Principal pseudohaplotype
- BioProject: PRJDB10055
- DDBJ: BLYA01000001-BLYA01003780
Alternate pseudohaplotype
- BioProject: PRJDB10056
- DDBJ: BLYB01000001-BLYB01003780
DRA
- BioProject: PRJDB9979
- DRA: DRX222432-DRX222163, DRR231909-DRR231923