feat: introduces support for specific reference genomes
Previously, the tool relied on a single set of PRIMARY_CHROMOSOMES that was
hardcoded to the hg38 primary chromosomes. Now, the tool has a set of supported
reference genomes, namely (to start):
* GRCh38NoAlt (from the NCBI)
* hs37d5 (from the 1000 Genomes Project)
These two genomes were selected simply because (a) GRCh38NoAlt is probably the
most popular GRCh38 genome and (b) hs37d5 is the genome used for phase 2 and
phase 3 of the 1000 Genomes Project, a fairly popular publicly available
resource and the subject of many QC papers.
Introducing a reference genome into the code required multiple QC facets to be
updated to use this functionality. For each of these, I chose to simply pass
the reference genome to the initialization function for the facet: it's up to
the facet to take what it needs from the reference genome and store it for
later use (as opposed to adding a lifecycle hook injecting it).
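As a rough illustration of that design (not the actual code in this changeset),
the sketch below passes a reference genome to a facet's constructor and lets the
facet keep only what it needs. The `ReferenceGenome` trait, the
`ChromosomeFacet` type, and the method names are hypothetical, and the choice of
which chromosomes count as "primary" is an assumption made for the example.

```rust
use std::collections::HashSet;

/// Hypothetical abstraction over a supported reference genome.
pub trait ReferenceGenome {
    fn name(&self) -> &'static str;
    /// Names of the primary chromosomes (assumed set, for illustration only).
    fn primary_chromosomes(&self) -> Vec<String>;
}

pub struct GRCh38NoAlt;
pub struct Hs37d5;

impl ReferenceGenome for GRCh38NoAlt {
    fn name(&self) -> &'static str {
        "GRCh38NoAlt"
    }

    fn primary_chromosomes(&self) -> Vec<String> {
        // "chr"-prefixed names: chr1..chr22, chrX, chrY, chrM.
        (1..=22)
            .map(|i| format!("chr{}", i))
            .chain(["chrX", "chrY", "chrM"].map(String::from))
            .collect()
    }
}

impl ReferenceGenome for Hs37d5 {
    fn name(&self) -> &'static str {
        "hs37d5"
    }

    fn primary_chromosomes(&self) -> Vec<String> {
        // Unprefixed names: 1..22, X, Y, MT.
        (1..=22)
            .map(|i| i.to_string())
            .chain(["X", "Y", "MT"].map(String::from))
            .collect()
    }
}

/// Hypothetical QC facet that only cares about primary chromosome names.
pub struct ChromosomeFacet {
    primary_chromosomes: HashSet<String>,
}

impl ChromosomeFacet {
    /// The facet takes what it needs from the reference genome at
    /// initialization and stores it for later use.
    pub fn new(reference_genome: &dyn ReferenceGenome) -> Self {
        Self {
            primary_chromosomes: reference_genome
                .primary_chromosomes()
                .into_iter()
                .collect(),
        }
    }

    pub fn is_primary(&self, sequence_name: &str) -> bool {
        self.primary_chromosomes.contains(sequence_name)
    }
}
```

With this shape, constructing `ChromosomeFacet::new(&Hs37d5)` is all a caller
has to do; no separate lifecycle hook is needed to inject the genome later.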
Other notable, related changes:
* I now include a check at the beginning of the `qc` command to ensure that the
sequences in the header of the file match the reference genome the user
specified on the command line. In the future, I also plan to add checks that
the actual FASTA file matches the specified reference genome (if provided)
_and_ that the GFF file matches the specified reference genome (if provided).
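A minimal sketch of what such a header check might look like is shown here. It
is not the implementation from this changeset: the `Sequence` type and the
`check_header_matches_reference` function are hypothetical, and it assumes that
"match" means every sequence in the header must exist in the reference genome
with the same name and length.

```rust
use std::collections::HashMap;

/// Hypothetical (name, length) pair, as found in a header's sequence
/// dictionary or in a reference genome definition.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Sequence {
    pub name: String,
    pub length: usize,
}

impl Sequence {
    pub fn new(name: &str, length: usize) -> Self {
        Self {
            name: name.to_string(),
            length,
        }
    }
}

/// Returns an error describing the first header sequence that is missing
/// from the reference genome or whose length differs from it.
pub fn check_header_matches_reference(
    header: &[Sequence],
    reference: &[Sequence],
) -> Result<(), String> {
    // Index the reference sequences by name for constant-time lookups.
    let expected: HashMap<&str, usize> = reference
        .iter()
        .map(|s| (s.name.as_str(), s.length))
        .collect();

    for seq in header {
        match expected.get(seq.name.as_str()) {
            None => {
                return Err(format!(
                    "sequence '{}' is not part of the reference genome",
                    seq.name
                ))
            }
            Some(&length) if length != seq.length => {
                return Err(format!(
                    "sequence '{}' has length {} in the header but {} in the reference genome",
                    seq.name, seq.length, length
                ))
            }
            Some(_) => {}
        }
    }

    Ok(())
}
```

Running a check like this before any records are processed means a mismatched
reference (for example, a file with `1` where `chr1` is expected) fails fast
with a clear message.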
Some other changes introduced in this changeset don't, at first, appear
directly related:
* We've now moved away from using `async`/`await` for the `qc` subcommand, as
there is an obscure compiler bug that doesn't allow two generic lifetimes and
one static lifetime with an `async` function. Thus, I decided to just move away
from using `async`/`await` altogether, as I had been considering that
regardless (we already moved away from using the lazy evaluation facilities
in noodles). See issues rust-lang/rust#63033 and
rust-lang/rust#99190 for more details.
* In testing this code, I was running into an error where a record fell outside
of the valid range of a sequence. This was annoying, so I just decided to fix
it as part of this changeset. There is no other deep reason why those changes
are included here.