DataFrameComparison.change_counts

DataFrameComparison.change_counts( column: str, /, *, lazy: Literal[True], include_sample_primary_key: bool = False, ) → LazyFrame

DataFrameComparison.change_counts( column: str, /, *, lazy: Literal[False] = False, include_sample_primary_key: bool = False, ) → DataFrame

Count each distinct change (left value to right value) in a column, sorted in descending order of frequency.

Parameters:

column – The name of the column to compare.
lazy – If True, return a lazy frame. Otherwise, return an eager frame (default).
include_sample_primary_key – Whether to include a sample primary key for each change, that is, the key of one row in which the change occurs.

Returns:

A data frame or lazy frame containing the change counts of the specified column, sorted by count with the most frequent change first.

Examples

>>> import polars as pl
>>> from diffly import compare_frames
>>> left = pl.DataFrame({"id": [1, 2, 3], "status": ["a", "b", "c"]})
>>> right = pl.DataFrame({"id": [1, 2, 3], "status": ["a", "x", "x"]})
>>> comparison = compare_frames(left, right, primary_key="id")
>>> comparison.change_counts("status")
shape: (2, 3)
┌──────┬───────┬───────┐
│ left ┆ right ┆ count │
│ ---  ┆ ---   ┆ ---   │
│ str  ┆ str   ┆ u32   │
╞══════╪═══════╪═══════╡
│ c    ┆ x     ┆ 1     │
│ b    ┆ x     ┆ 1     │
└──────┴───────┴───────┘

Include a sample primary key for each change:

>>> comparison.change_counts("status", include_sample_primary_key=True)
shape: (2, 4)
┌──────┬───────┬───────┬───────────┐
│ left ┆ right ┆ count ┆ sample_id │
│ ---  ┆ ---   ┆ ---   ┆ ---       │
│ str  ┆ str   ┆ u32   ┆ i64       │
╞══════╪═══════╪═══════╪═══════════╡
│ c    ┆ x     ┆ 1     ┆ 3         │
│ b    ┆ x     ┆ 1     ┆ 2         │
└──────┴───────┴───────┴───────────┘
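
To make the semantics concrete, here is a minimal pure-Python sketch of what a "change count" means. It is not how diffly computes the result. The helper `change_counts_sketch`, the dictionary inputs, and the rule of taking the smallest key as the sample are all assumptions made for illustration:

```python
from collections import Counter

# Hypothetical data: primary key -> value of the compared column.
left = {1: "a", 2: "b", 3: "c", 4: "b"}
right = {1: "a", 2: "x", 3: "x", 4: "x"}


def change_counts_sketch(left, right):
    """Count (left value, right value) pairs that differ, for keys in both inputs."""
    counts = Counter()
    samples = {}
    # Visit shared keys in ascending order so the sample is the smallest key.
    for key in sorted(left.keys() & right.keys()):
        pair = (left[key], right[key])
        if pair[0] != pair[1]:
            counts[pair] += 1
            # Assumption: keep the first key seen as the sample for this change.
            samples.setdefault(pair, key)
    # most_common() orders the pairs by count, highest first.
    return [(lv, rv, n, samples[(lv, rv)]) for (lv, rv), n in counts.most_common()]


rows = change_counts_sketch(left, right)
print(rows)  # [('b', 'x', 2, 2), ('c', 'x', 1, 3)]
```

Each tuple corresponds to one row of the result: the left value, the right value, the number of rows with that change, and one primary key where it occurs. Unchanged rows (key 1 here) do not appear.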