CladeTime
- class cladetime.CladeTime(sequence_as_of=None, tree_as_of=None)
Interface for Nextstrain SARS-CoV-2 genome sequences and clades.
The CladeTime class is instantiated with two optional arguments that specify the point in time at which to access genome sequences/metadata as well as the reference tree used for clade assignment. CladeTime interacts with GenBank-based data provided by the Nextstrain project.
Important
Historical data availability is constrained by Nextstrain’s infrastructure:
sequence_as_of: Must be >= 2025-09-29 (Nextstrain S3 90-day retention)
tree_as_of: Must be >= 2024-10-09 (variant-nowcast-hub archive availability)
These constraints reflect the 90-day retention policy for S3 versioned objects that Nextstrain implemented in October 2025. Dates outside these windows will raise CladeTimeDataUnavailableError. See GitHub issue #185 for details and potential workarounds.
Note: These limitations may change as Nextstrain’s infrastructure evolves.
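The two lower bounds above can be checked before constructing a CladeTime object. The following is a minimal sketch using only the standard library. The helper check_as_of_dates and the constant names are hypothetical and not part of cladetime, and the cutoff dates are copied from the note above, so they go stale if Nextstrain's retention policy changes.

```python
from datetime import datetime, timezone

# Cutoffs copied from the note above (assumed current; they may change).
MIN_SEQUENCE_AS_OF = datetime(2025, 9, 29, tzinfo=timezone.utc)
MIN_TREE_AS_OF = datetime(2024, 10, 9, tzinfo=timezone.utc)


def check_as_of_dates(sequence_as_of: datetime, tree_as_of: datetime) -> list[str]:
    """Return a list of problems; an empty list means both dates are in range.

    Hypothetical helper, not part of cladetime. Both arguments must be
    timezone-aware datetimes so they can be compared with the cutoffs.
    """
    problems = []
    if sequence_as_of < MIN_SEQUENCE_AS_OF:
        problems.append(
            f"sequence_as_of {sequence_as_of:%Y-%m-%d} is before "
            f"{MIN_SEQUENCE_AS_OF:%Y-%m-%d}"
        )
    if tree_as_of < MIN_TREE_AS_OF:
        problems.append(
            f"tree_as_of {tree_as_of:%Y-%m-%d} is before {MIN_TREE_AS_OF:%Y-%m-%d}"
        )
    return problems
```

A caller could run this check first and only construct CladeTime when the returned list is empty, rather than relying on catching CladeTimeDataUnavailableError afterwards.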
- Parameters:
sequence_as_of (datetime.datetime | str | None) – Sets the versions of Nextstrain SARS-CoV-2 genome sequence and sequence metadata files that will be used by CladeTime properties and methods. Can be a datetime object or a string in YYYY-MM-DD format, both of which will be treated as UTC. The default value is the current UTC time. Dates passed as YYYY-MM-DD strings will be set to 11:59:59 PM UTC. Must be >= 2025-09-29.
tree_as_of (datetime.datetime | str | None) – Sets the version of the Nextstrain reference tree that will be used by CladeTime. Can be a datetime object or a string in YYYY-MM-DD format, both of which will be treated as UTC. The default value is sequence_as_of. Dates passed as YYYY-MM-DD strings will be set to 11:59:59 PM UTC. Must be >= 2024-10-09.
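The normalization rules described for these two parameters can be sketched with the standard library as follows. This is an illustration of the documented behavior, not cladetime's actual implementation: normalize_as_of and resolve_as_of are hypothetical names, and converting a timezone-aware datetime to UTC is an assumption, since the text only states that values are treated as UTC.

```python
from __future__ import annotations

from datetime import datetime, time, timezone


def normalize_as_of(value: datetime | str | None) -> datetime:
    """Hypothetical helper mirroring the documented rules for as_of values."""
    if value is None:
        # Documented default for sequence_as_of: the current UTC time.
        return datetime.now(timezone.utc)
    if isinstance(value, str):
        # YYYY-MM-DD strings are set to 11:59:59 PM UTC on that date.
        day = datetime.strptime(value, "%Y-%m-%d").date()
        return datetime.combine(day, time(23, 59, 59), tzinfo=timezone.utc)
    if value.tzinfo is None:
        # Naive datetimes are treated as UTC.
        return value.replace(tzinfo=timezone.utc)
    # Assumption: aware datetimes are converted to UTC.
    return value.astimezone(timezone.utc)


def resolve_as_of(
    sequence_as_of: datetime | str | None = None,
    tree_as_of: datetime | str | None = None,
) -> tuple[datetime, datetime]:
    """Return (sequence_as_of, tree_as_of); tree_as_of defaults to sequence_as_of."""
    seq = normalize_as_of(sequence_as_of)
    tree = seq if tree_as_of is None else normalize_as_of(tree_as_of)
    return seq, tree
```

For example, resolve_as_of("2025-10-15") yields the same end-of-day UTC timestamp for both values, because tree_as_of falls back to sequence_as_of when it is omitted.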
- url_ncov_metadata
S3 URL to metadata from the Nextstrain pipeline run that generated the sequence clade assignments in url_sequence_metadata.
- Type:
- url_sequence
S3 URL to the Nextstrain SARS-CoV-2 sequence file (zst-compressed .fasta) that was current at the date specified in sequence_as_of.
- Type:
- url_sequence_metadata
S3 URL to the Nextstrain SARS-CoV-2 sequence metadata file (zst-compressed .tsv) that was current at the date specified in sequence_as_of.
- Type:
- property sequence_as_of: datetime
The date and time (UTC) used to retrieve Nextstrain sequences and sequence metadata. url_sequence and url_sequence_metadata link to Nextstrain files that were current as of this date.
- Type:
- property tree_as_of: datetime
The date and time (UTC) used to retrieve the Nextstrain reference tree.
- Type:
- property ncov_metadata: dict
Metadata for the reference tree that was used for SARS-CoV-2 clade assignments as of tree_as_of. This property will be empty for dates before 2024-08-01, when Nextstrain began publishing ncov pipeline metadata.
- Type:
- property sequence_metadata: LazyFrame
A Polars LazyFrame that references url_sequence_metadata.
- Type:
- assign_clades(sequence_metadata: LazyFrame, output_file: Path | str | None = None) → Clade
Assign clades to a specified set of sequences.
For each sequence in a sequence file (.fasta), assign a Nextstrain clade using the Nextclade reference tree that corresponds to the tree_as_of date. The earliest available tree_as_of date is 2024-08-01, when Nextstrain began publishing the pipeline metadata that Cladetime uses to retrieve past reference trees.
- Parameters:
sequence_metadata (polars.LazyFrame) – A Polars LazyFrame of the Nextstrain sequence metadata to use for clade assignment.
output_file (Path | str | None) – The full path (including a .tsv filename) where the clade assignment output file will be saved. The default value is <home_dir>/cladetime/clade_assignments.tsv.
- Returns:
A Clade object that contains detailed and summarized information about clades assigned to the sequences in sequence_metadata.
- Return type:
- Raises:
CladeTimeSequenceWarning – If sequence_metadata is empty, the clade assignment process will be stopped.
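Because the empty-input case is reported as a warning rather than an exception, a caller may want to detect it explicitly. The sketch below shows one way to do that with the standard warnings module. It assumes the warning is emitted through warnings.warn, which the text above does not confirm, and it uses stand-ins (SequenceWarningStandIn, assign_clades_stub, run_checked) instead of the real cladetime objects so that it runs without network access.

```python
import warnings


class SequenceWarningStandIn(UserWarning):
    """Stand-in for cladetime's CladeTimeSequenceWarning (hypothetical)."""


def assign_clades_stub(rows: list) -> list:
    """Stand-in for assign_clades: warns and stops early on empty input."""
    if not rows:
        warnings.warn(
            "sequence_metadata is empty; skipping clade assignment",
            SequenceWarningStandIn,
        )
        return []
    return [f"assigned:{row}" for row in rows]


def run_checked(rows: list) -> tuple[list, bool]:
    """Run the stub and report whether the empty-input warning was emitted."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones that would normally be shown once.
        warnings.simplefilter("always")
        result = assign_clades_stub(rows)
    stopped = any(issubclass(w.category, SequenceWarningStandIn) for w in caught)
    return result, stopped
```

With the real library, the same pattern would wrap the call to assign_clades and test for CladeTimeSequenceWarning in place of the stand-in class.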
Example
>>> import polars as pl
>>>
>>> from cladetime import CladeTime, sequence
>>> ct = CladeTime(sequence_as_of="2024-11-15", tree_as_of="2024-09-01")
>>>
>>> filtered_metadata = sequence.filter_metadata(
...     ct.sequence_metadata,
...     collection_min_date="2024-10-01",
... )
>>> clade_assignments = ct.assign_clades(filtered_metadata)
>>>
>>> clade_assignment_summary = clade_assignments.summary
>>> (
...     clade_assignment_summary
...     .select(["location", "date", "clade_nextstrain", "count"])
...     .sort("count", descending=True)
...     .collect(stream=True)
...     .head()
... )
┌──────────┬────────────┬──────────────────┬───────┐
│ location ┆ date       ┆ clade_nextstrain ┆ count │
│ ---      ┆ ---        ┆ ---              ┆ ---   │
│ str      ┆ date       ┆ str              ┆ u32   │
╞══════════╪════════════╪══════════════════╪═══════╡
│ NY       ┆ 2024-10-01 ┆ 24C              ┆ 15    │
│ NY       ┆ 2024-10-15 ┆ 24C              ┆ 15    │
│ NY       ┆ 2024-10-03 ┆ 24C              ┆ 14    │
│ NY       ┆ 2024-10-14 ┆ 24C              ┆ 14    │
│ NJ       ┆ 2024-10-16 ┆ 24C              ┆ 12    │
└──────────┴────────────┴──────────────────┴───────┘