DataFrameComparison.joined_unequal

DataFrameComparison.joined_unequal(*subset: str, select: Literal['all', 'subset'] | list[str] = 'all', lazy: Literal[True]) → LazyFrame

DataFrameComparison.joined_unequal(*subset: str, select: Literal['all', 'subset'] | list[str] = 'all', lazy: Literal[False] = False) → DataFrame

Return the rows that can be joined across both data frames and that have at least one mismatching value in any of the columns in subset.

Parameters:

subset – The columns to check for mismatches. If not provided, all common columns are used. Must only contain common columns.
select – Which columns should be selected in the result. “all” (default) selects all columns. “subset” selects only the primary key and the columns from subset in the compared data frames. Providing a list of strings behaves the same as “subset” but additionally selects the columns in the list from the compared data frames. The list must only contain common columns.
lazy – If True, return a lazy frame. Otherwise, return an eager frame (default).

Returns:

A data frame or lazy frame containing the rows that can be joined and have at least one mismatching value across the specified columns.

Raises:

ValueError – If any of the provided columns are not common columns.

Columns which are not used for joining have a suffix _left for the left data frame and a suffix _right for the right data frame.
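
The selection rule and the suffix naming described above can be illustrated with a small plain-Python sketch. This is not diffly's implementation. The helper joined_unequal_rows is hypothetical, and it assumes a single key column, unique key values on each side, and exact equality (no float tolerance or special null handling), none of which is confirmed by this page.

```python
# Plain-Python illustration of the selection rule; NOT diffly's implementation.
# Assumptions: a single key column, unique key values on both sides,
# and exact equality (no float tolerance, no special null handling).

def joined_unequal_rows(left, right, key, subset=None):
    """Return joined rows that have at least one mismatch in `subset`."""
    right_by_key = {row[key]: row for row in right}
    result = []
    for l_row in left:
        r_row = right_by_key.get(l_row[key])
        if r_row is None:
            continue  # no join partner, so the row cannot be "joined"
        common = [c for c in l_row if c != key and c in r_row]
        # Without an explicit subset, every common column is checked.
        checked = list(subset) if subset else common
        if any(l_row[c] != r_row[c] for c in checked):
            out = {key: l_row[key]}
            for c in common:
                # Non-key columns get a _left / _right suffix.
                out[f"{c}_left"] = l_row[c]
                out[f"{c}_right"] = r_row[c]
            result.append(out)
    return result


left = [
    {"id": 1, "status": "a", "value": 10.0},
    {"id": 2, "status": "b", "value": 20.0},
    {"id": 3, "status": "c", "value": 30.0},
    {"id": 4, "status": "d", "value": 40.0},  # no partner on the right
]
right = [
    {"id": 1, "status": "a", "value": 10.0},
    {"id": 2, "status": "x", "value": 25.0},
    {"id": 3, "status": "x", "value": 30.0},
]

for row in joined_unequal_rows(left, right, "id"):
    print(row)
```

In this sketch the rows with id 2 and id 3 are returned. Row 3 is included although its value columns agree, because status differs. Restricting the check with subset=["value"] leaves only row 2, and row 4 never appears because it has no join partner.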

Examples

>>> import polars as pl
>>> from diffly import compare_frames
>>> left = pl.DataFrame({"id": [1, 2, 3], "status": ["a", "b", "c"], "value": [10.0, 20.0, 30.0]})
>>> right = pl.DataFrame({"id": [1, 2, 3], "status": ["a", "x", "x"], "value": [10.0, 25.0, 30.0]})
>>> comparison = compare_frames(left, right, primary_key="id")
>>> comparison.joined_unequal()
shape: (2, 5)
┌─────┬─────────────┬──────────────┬────────────┬─────────────┐
│ id  ┆ status_left ┆ status_right ┆ value_left ┆ value_right │
│ --- ┆ ---         ┆ ---          ┆ ---        ┆ ---         │
│ i64 ┆ str         ┆ str          ┆ f64        ┆ f64         │
╞═════╪═════════════╪══════════════╪════════════╪═════════════╡
│ 2   ┆ b           ┆ x            ┆ 20.0       ┆ 25.0        │
│ 3   ┆ c           ┆ x            ┆ 30.0       ┆ 30.0        │
└─────┴─────────────┴──────────────┴────────────┴─────────────┘

Use select="subset" to only include the columns being compared:

>>> comparison.joined_unequal("status", select="subset")
shape: (2, 3)
┌─────┬─────────────┬──────────────┐
│ id  ┆ status_left ┆ status_right │
│ --- ┆ ---         ┆ ---          │
│ i64 ┆ str         ┆ str          │
╞═════╪═════════════╪══════════════╡
│ 2   ┆ b           ┆ x            │
│ 3   ┆ c           ┆ x            │
└─────┴─────────────┴──────────────┘
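
The effect of select on the result columns, as described under Parameters, can be summarized with a short sketch. The helper selected_columns is hypothetical and is not part of diffly. It only handles common columns, and the column order it produces is an assumption modeled on the outputs shown above.

```python
# Hypothetical helper that mirrors the documented `select` rules.
# It is NOT diffly's API; column ordering is an assumption.

def selected_columns(primary_key, common, subset, select="all"):
    """Return the expected result column names for a given `select`."""
    # An empty subset means "check all common columns".
    subset = list(subset) or list(common)
    if select == "all":
        chosen = list(common)
    elif select == "subset":
        chosen = list(subset)
    else:
        # A list behaves like "subset" plus the listed columns.
        chosen = subset + [c for c in select if c not in subset]
    unknown = [c for c in chosen if c not in common]
    if unknown:
        raise ValueError(f"not common columns: {unknown}")
    names = list(primary_key)
    for c in chosen:
        names += [f"{c}_left", f"{c}_right"]
    return names


common = ["status", "value"]
print(selected_columns(["id"], common, ["status"], "subset"))
print(selected_columns(["id"], common, ["status"], ["value"]))
```

The first call yields the three columns shown in the select="subset" example above. The second adds value_left and value_right, matching the documented behavior of passing a list.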