CladeTime

class cladetime.CladeTime(sequence_as_of=None, tree_as_of=None)

Interface for Nextstrain SARS-CoV-2 genome sequences and clades.

The CladeTime class is instantiated with two optional arguments that specify the point in time at which to access genome sequences/metadata as well as the reference tree used for clade assignment. CladeTime interacts with GenBank-based data provided by the Nextstrain project.

Important

Historical data availability is constrained by Nextstrain’s infrastructure:

  • sequence_as_of: Must be >= 2025-09-29 (Nextstrain S3 90-day retention)

  • tree_as_of: Must be >= 2024-10-09 (variant-nowcast-hub archive availability)

These constraints reflect Nextstrain’s October 2025 implementation of a 90-day retention policy for S3 versioned objects. Dates outside these windows raise CladeTimeDataUnavailableError. See GitHub issue #185 for details and potential workarounds.

Note: These limitations may change as Nextstrain’s infrastructure evolves.

Parameters:
  • sequence_as_of (datetime.datetime | str | None) – Sets the versions of Nextstrain SARS-CoV-2 genome sequence and sequence metadata files that will be used by CladeTime properties and methods. Can be a datetime object or a string in YYYY-MM-DD format, both of which will be treated as UTC. The default value is the current UTC time. Dates passed as YYYY-MM-DD strings will be set to 11:59:59 PM UTC. Must be >= 2025-09-29.

  • tree_as_of (datetime.datetime | str | None) – Sets the version of the Nextstrain reference tree that will be used by CladeTime. Can be a datetime object or a string in YYYY-MM-DD format, both of which will be treated as UTC. The default value is sequence_as_of. Dates passed as YYYY-MM-DD strings will be set to 11:59:59 PM UTC. Must be >= 2024-10-09.
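
The date handling described for these two parameters can be sketched with the standard library alone. This is an illustrative approximation, not CladeTime's actual implementation: the helper names (normalize_as_of, resolve_as_of, is_available) are hypothetical, the minimum dates are copied from the note above and may change, and comparing against midnight UTC of the minimum date is an assumption.

```python
from datetime import date, datetime, time, timezone

# Minimum dates quoted in the "Important" note; assumed to start at 00:00 UTC.
MIN_SEQUENCE_AS_OF = datetime(2025, 9, 29, tzinfo=timezone.utc)
MIN_TREE_AS_OF = datetime(2024, 10, 9, tzinfo=timezone.utc)


def normalize_as_of(value=None, now=None):
    """Hypothetical helper: turn an as_of argument into a UTC datetime."""
    if value is None:
        # Default: the current UTC time.
        return now or datetime.now(timezone.utc)
    if isinstance(value, str):
        # YYYY-MM-DD strings are pinned to 11:59:59 PM UTC on that day.
        day = date.fromisoformat(value)
        return datetime.combine(day, time(23, 59, 59), tzinfo=timezone.utc)
    if value.tzinfo is None:
        # Naive datetimes are treated as UTC.
        return value.replace(tzinfo=timezone.utc)
    # Assumption: aware datetimes are converted to UTC.
    return value.astimezone(timezone.utc)


def resolve_as_of(sequence_as_of=None, tree_as_of=None, now=None):
    """Hypothetical helper: tree_as_of falls back to sequence_as_of."""
    seq = normalize_as_of(sequence_as_of, now)
    tree = seq if tree_as_of is None else normalize_as_of(tree_as_of, now)
    return seq, tree


def is_available(as_of, minimum):
    """True if as_of falls on or after the documented minimum date."""
    return as_of >= minimum
```

Under these assumptions, resolve_as_of("2025-10-15") yields the same end-of-day UTC timestamp for both values, and a check such as is_available(seq, MIN_SEQUENCE_AS_OF) can be run before constructing a CladeTime object to anticipate a CladeTimeDataUnavailableError.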

url_ncov_metadata

S3 URL to metadata from the Nextstrain pipeline run that generated the sequence clade assignments in url_sequence_metadata

Type:

str

url_sequence

S3 URL to the Nextstrain SARS-CoV-2 sequence file (zst-compressed .fasta) that was current at the date specified in sequence_as_of

Type:

str

url_sequence_metadata

S3 URL to the Nextstrain SARS-CoV-2 sequence metadata file (zst-compressed .tsv) that was current at the date specified in sequence_as_of

Type:

str

property sequence_as_of: datetime

The date and time (UTC) used to retrieve Nextstrain sequences and sequence metadata. url_sequence and url_sequence_metadata link to Nextstrain files that were current as of this date.

Type:

datetime.datetime

property tree_as_of: datetime

The date and time (UTC) used to retrieve the Nextstrain reference tree.

Type:

datetime.datetime

property ncov_metadata: dict

Metadata for the reference tree that was used for SARS-CoV-2 clade assignments as of tree_as_of. This property will be empty for dates before 2024-08-01, when Nextstrain began publishing ncov pipeline metadata.

Type:

dict

property sequence_metadata: LazyFrame

A Polars LazyFrame that references url_sequence_metadata

Type:

polars.LazyFrame

assign_clades(sequence_metadata: LazyFrame, output_file: Path | str | None = None) → Clade

Assign clades to a specified set of sequences.

For each sequence in a sequence file (.fasta), assign a Nextstrain clade using the Nextclade reference tree that corresponds to the tree_as_of date. The earliest available tree_as_of date is 2024-08-01, when Nextstrain began publishing the pipeline metadata that Cladetime uses to retrieve past reference trees.

Parameters:
  • sequence_metadata (polars.LazyFrame) – A Polars LazyFrame of the Nextstrain sequence metadata to use for clade assignment.

  • output_file (pathlib.Path | str | None) – The full path (including a .tsv filename) where the clade assignment output file will be saved. The default value is <home_dir>/cladetime/clade_assignments.tsv.

Returns:

A Clade object that contains detailed and summarized information about clades assigned to the sequences in sequence_metadata.

Return type:

cladetime.clade.Clade

Raises:

CladeTimeSequenceWarning – If sequence_metadata is empty, the clade assignment process will be stopped.
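
The output_file default described under Parameters can be sketched as follows. This is a minimal illustration, not the library's code: resolve_output_file is a hypothetical helper, and rejecting paths without a .tsv suffix is an assumption drawn from the parameter description rather than documented behavior.

```python
from pathlib import Path


def resolve_output_file(output_file=None):
    """Hypothetical helper: pick the .tsv path for clade assignments."""
    if output_file is None:
        # Documented default: <home_dir>/cladetime/clade_assignments.tsv
        return Path.home() / "cladetime" / "clade_assignments.tsv"
    path = Path(output_file)
    if path.suffix != ".tsv":
        # Assumption: the path is expected to end in a .tsv filename.
        raise ValueError(f"output_file must end in .tsv, got {path.name!r}")
    return path
```

Accepting both str and Path and converting once at the top keeps the rest of the function working with a single type.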

Example

>>> import polars as pl
>>>
>>> from cladetime import CladeTime, sequence
>>> ct = CladeTime(sequence_as_of="2024-11-15", tree_as_of="2024-09-01")
>>>
>>> # Keep only sequences collected on or after 2024-10-01
>>> filtered_metadata = sequence.filter_metadata(
...     ct.sequence_metadata,
...     collection_min_date="2024-10-01",
... )
>>> clade_assignments = ct.assign_clades(filtered_metadata)
>>>
>>> # Show the most frequent location/date/clade combinations
>>> clade_assignment_summary = clade_assignments.summary
>>> (
...     clade_assignment_summary
...     .select(["location", "date", "clade_nextstrain", "count"])
...     .sort("count", descending=True)
...     .collect(stream=True)
...     .head()
... )
┌──────────┬────────────┬──────────────────┬───────┐
│ location ┆ date       ┆ clade_nextstrain ┆ count │
│ ---      ┆ ---        ┆ ---              ┆ ---   │
│ str      ┆ date       ┆ str              ┆ u32   │
╞══════════╪════════════╪══════════════════╪═══════╡
│ NY       ┆ 2024-10-01 ┆ 24C              ┆ 15    │
│ NY       ┆ 2024-10-15 ┆ 24C              ┆ 15    │
│ NY       ┆ 2024-10-03 ┆ 24C              ┆ 14    │
│ NY       ┆ 2024-10-14 ┆ 24C              ┆ 14    │
│ NJ       ┆ 2024-10-16 ┆ 24C              ┆ 12    │
└──────────┴────────────┴──────────────────┴───────┘