Pseudohaplotype

Historically, whole-genome sequencing generated a single consensus sequence without distinguishing between alleles on homologous chromosomes. Long-read sequencing technologies can distinguish the haploid chromosomes (called “pseudohaplotypes” in INSDC). Because pseudohaplotype sequencing produces two genome sequences from a single sample, INSDC has established a guideline for pseudohaplotype sequence submission.


This page describes a typical case of pseudohaplotype sequence submission. To distinguish the pseudohaplotype assemblies, name one “Principal” and the other “Alternate”. There are no absolute criteria, but please choose the names based on sequence length or sequencing accuracy. Because both pseudohaplotype assemblies are derived from the same sample, they share the same BioSample. Because INSDC manages a genome assembly by the unique combination of BioProject and BioSample, create a separate BioProject for each of the principal and alternate pseudohaplotypes so that each combination is unique. Create an umbrella BioProject to group these projects.

If the raw sequencing data in DRA contain both pseudohaplotypes, create a BioProject for DRA separate from those for the assemblies. If the DRA data are derived from the same sample as the assemblies, use the same BioSample.

Pseudohaplotype data submission

BioProject

Create a separate BioProject for each of the principal and alternate pseudohaplotypes, and an umbrella BioProject to group these projects.

  • BioProject 1: Principal
    • Add phasing information in the title. For example, Principal pseudohaplotype or Primary haplotype.
  • BioProject 2: Alternate
    • Add phasing information in the title. For example, Alternate pseudohaplotype or Alternate haplotype.
  • Umbrella BioProject
    • Groups BioProjects 1 and 2 and any other related BioProjects (BioProject 3 for DRA in the figure).
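
The layout above can be sketched as a small data model. This is only an illustration of why two BioProjects are needed when the BioSample is shared. The class name, the helper function and all accessions (PROJECT_1, SAMPLE_X and so on) are hypothetical placeholders, not real records or part of any DDBJ tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assembly:
    label: str        # "Principal" or "Alternate"
    bioproject: str   # placeholder accession (hypothetical)
    biosample: str    # placeholder accession (hypothetical)


def has_unique_pairs(assemblies):
    """Return True if no two assemblies share a (BioProject, BioSample) pair."""
    pairs = [(a.bioproject, a.biosample) for a in assemblies]
    return len(pairs) == len(set(pairs))


SHARED_SAMPLE = "SAMPLE_X"  # one BioSample shared by both pseudohaplotypes

layout = [
    Assembly("Principal", "PROJECT_1", SHARED_SAMPLE),
    Assembly("Alternate", "PROJECT_2", SHARED_SAMPLE),
]

# The umbrella BioProject only groups the others; PROJECT_3 stands for the
# optional DRA BioProject.
umbrella = {
    "accession": "PROJECT_UMBRELLA",
    "members": ["PROJECT_1", "PROJECT_2", "PROJECT_3"],
}

print(has_unique_pairs(layout))
```

If both assemblies were registered under a single BioProject, the two pairs would be identical and the check would fail, which is the situation the separate BioProjects avoid.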

BioSample

Because the sample is shared by both pseudohaplotypes, create a single BioSample.

  • Select the MIGS package.
  • Create a common BioSample for principal and alternate pseudohaplotype.
  • If you add gene annotations to the pseudohaplotype sequences, enter the locus tag prefix you want to use for both the principal and the alternate pseudohaplotypes in the locus_tag_prefix attribute. Because the prefix is shared by the two pseudohaplotypes, distinguish their loci within the tag itself, for example, A1C_p00001 (principal) and A1C_a00001 (alternate).
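
As a minimal sketch of the naming pattern in the example above, the helper below builds locus tags from a shared prefix. The function name, the "p"/"a" marker mapping and the five-digit width are assumptions taken from the A1C_p00001 / A1C_a00001 example, not a required format.

```python
def make_locus_tag(prefix: str, haplotype: str, index: int, width: int = 5) -> str:
    """Build a locus tag such as A1C_p00001 from a shared prefix.

    The "p"/"a" markers and the zero-padded width mirror the example in the
    text; they are illustrative, not mandated.
    """
    markers = {"principal": "p", "alternate": "a"}
    if haplotype not in markers:
        raise ValueError(f"unknown haplotype: {haplotype!r}")
    return f"{prefix}_{markers[haplotype]}{index:0{width}d}"


print(make_locus_tag("A1C", "principal", 1))  # A1C_p00001
print(make_locus_tag("A1C", "alternate", 1))  # A1C_a00001
```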

DDBJ

Submit the principal and the alternate pseudohaplotype sequences.

  • Principal pseudohaplotype
    • Reference BioProject 1 (Principal) in DBLINK.
    • Add the pre-defined comment in ST_COMMENT.
      Genome-Assembly-Data ST_COMMENT: Diploid :: Principal Pseudohaplotype
  • Alternate pseudohaplotype
    • Reference BioProject 2 (Alternate) in DBLINK.
    • Add the pre-defined comment in ST_COMMENT.
      Genome-Assembly-Data ST_COMMENT: Diploid :: Alternate Pseudohaplotype
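
The pairing of DBLINK and ST_COMMENT values described above can be summarized with a short sketch. It does not reproduce the syntax of the DDBJ annotation file. The function name, the dictionary layout and the accessions PROJECT_1 and PROJECT_2 are hypothetical placeholders used only to show which value goes with which assembly.

```python
def assembly_metadata(role: str, bioproject: str) -> dict:
    """Collect the per-assembly values named in the text.

    role must be "Principal" or "Alternate"; bioproject is the accession of
    the BioProject created for that pseudohaplotype (placeholder here).
    """
    if role not in ("Principal", "Alternate"):
        raise ValueError(f"unknown role: {role!r}")
    return {
        "DBLINK": bioproject,
        # (tag set, field, value) as given in the pre-defined comment
        "ST_COMMENT": ("Genome-Assembly-Data", "Diploid", f"{role} Pseudohaplotype"),
    }


principal = assembly_metadata("Principal", "PROJECT_1")  # hypothetical accession
alternate = assembly_metadata("Alternate", "PROJECT_2")  # hypothetical accession

for record in (principal, alternate):
    tagset, field, value = record["ST_COMMENT"]
    print(f"{record['DBLINK']}: {tagset} ST_COMMENT: {field} :: {value}")
```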

Real-world examples

Common

Principal pseudohaplotype

Alternate pseudohaplotype

DRA