DataFrameComparison.change_counts
- DataFrameComparison.change_counts(column: str, /, *, lazy: Literal[True], include_sample_primary_key: bool = False)
- DataFrameComparison.change_counts(column: str, /, *, lazy: Literal[False] = False, include_sample_primary_key: bool = False)
Get the changes of a column, sorted in descending order of frequency.
- Parameters:
column – The name of the column to compare.
lazy – If True, return a lazy frame. Otherwise, return an eager frame (default).
include_sample_primary_key – Whether to include a sample primary key for each change.
- Returns:
A data frame or lazy frame containing the change counts of the specified column, sorted by count with the most frequent change first.
Examples
>>> import polars as pl
>>> from diffly import compare_frames
>>> left = pl.DataFrame({"id": [1, 2, 3], "status": ["a", "b", "c"]})
>>> right = pl.DataFrame({"id": [1, 2, 3], "status": ["a", "x", "x"]})
>>> comparison = compare_frames(left, right, primary_key="id")
>>> comparison.change_counts("status")
shape: (2, 3)
┌──────┬───────┬───────┐
│ left ┆ right ┆ count │
│ ---  ┆ ---   ┆ ---   │
│ str  ┆ str   ┆ u32   │
╞══════╪═══════╪═══════╡
│ c    ┆ x     ┆ 1     │
│ b    ┆ x     ┆ 1     │
└──────┴───────┴───────┘
Include a sample primary key for each change:
>>> comparison.change_counts("status", include_sample_primary_key=True)
shape: (2, 4)
┌──────┬───────┬───────┬───────────┐
│ left ┆ right ┆ count ┆ sample_id │
│ ---  ┆ ---   ┆ ---   ┆ ---       │
│ str  ┆ str   ┆ u32   ┆ i64       │
╞══════╪═══════╪═══════╪═══════════╡
│ c    ┆ x     ┆ 1     ┆ 3         │
│ b    ┆ x     ┆ 1     ┆ 2         │
└──────┴───────┴───────┴───────────┘
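To make the meaning of the counts concrete, the following is a rough, standard-library-only sketch of the idea behind this method: pair rows by primary key, keep the pairs whose column value differs, count each (left, right) combination, and order the result with the most frequent change first. It is not how diffly implements the method. The helper name `sketch_change_counts`, the list-of-dicts input, and the sample data (which adds a fourth row so that one change occurs twice) are assumptions made for illustration only.

```python
from collections import Counter


def sketch_change_counts(left, right, column, primary_key="id"):
    """Illustrative stand-in (hypothetical helper, not part of diffly).

    Counts the (left, right) value pairs of `column` that differ between
    rows sharing the same primary key, most frequent change first.
    """
    # Index the right-hand rows by primary key so rows can be paired up.
    right_by_key = {row[primary_key]: row for row in right}

    counts = Counter()
    sample_keys = {}
    for row in left:
        key = row[primary_key]
        other = right_by_key.get(key)
        if other is None:
            continue  # a value can only "change" for rows present on both sides
        change = (row[column], other[column])
        if change[0] != change[1]:
            counts[change] += 1
            # Remember the first primary key seen for this change as its sample.
            sample_keys.setdefault(change, key)

    # most_common() orders by count, descending; ties keep first-seen order.
    return [
        {"left": old, "right": new, "count": n, "sample_id": sample_keys[(old, new)]}
        for (old, new), n in counts.most_common()
    ]


# Made-up data: the change "b" -> "x" occurs twice, "c" -> "x" once.
left = [
    {"id": 1, "status": "a"},
    {"id": 2, "status": "b"},
    {"id": 3, "status": "c"},
    {"id": 4, "status": "b"},
]
right = [
    {"id": 1, "status": "a"},
    {"id": 2, "status": "x"},
    {"id": 3, "status": "x"},
    {"id": 4, "status": "x"},
]

result = sketch_change_counts(left, right, "status")
for row in result:
    print(row)
```

In this sketch the unchanged row (id 1) does not appear in the result, and the sample key is simply the first matching key encountered. Which sample key diffly itself reports for a change that occurs several times is not specified above.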