DataFrameComparison.num_rows_joined_unequal

DataFrameComparison.num_rows_joined_unequal(*subset: str) → int

The number of rows that can be joined between the two data frames and have at least one mismatching value in any column of subset.

Parameters:

subset – The columns to check for mismatches. If not provided, all common columns are used.

Returns:

The number of rows that can be joined and have at least one mismatching value across the specified columns.

Raises:

ValueError – If any of the provided columns are not common columns.

Examples

>>> import polars as pl
>>> from diffly import compare_frames
>>> left = pl.DataFrame({"id": [1, 2, 3], "status": ["a", "b", "c"], "value": [10.0, 20.0, 30.0]})
>>> right = pl.DataFrame({"id": [1, 2, 3], "status": ["a", "x", "x"], "value": [10.0, 25.0, 30.0]})
>>> comparison = compare_frames(left, right, primary_key="id")
>>> comparison.num_rows_joined_unequal()
2
>>> comparison.num_rows_joined_unequal("value")
1
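
The counting rule described above can be sketched in plain Python. This is an illustration of the semantics only, not diffly's implementation: `count_joined_unequal` is a hypothetical helper, frames are modelled as dicts of column lists, values are compared with exact equality (no float tolerance or null handling), and the primary key is assumed not to count as a checkable column.

```python
# Illustrative sketch only: a plain-Python stand-in, not diffly's implementation.
def count_joined_unequal(left, right, primary_key, *subset):
    """Count rows present on both sides that differ in at least one checked column."""
    # Columns shared by both inputs, excluding the join key (an assumption).
    common = [c for c in left if c in right and c != primary_key]
    columns = list(subset) if subset else common

    # Mirror the documented ValueError for columns that are not common.
    unknown = [c for c in columns if c not in common]
    if unknown:
        raise ValueError(f"not common columns: {unknown}")

    # Map each primary-key value on the right to its row position.
    right_index = {key: i for i, key in enumerate(right[primary_key])}

    count = 0
    for i, key in enumerate(left[primary_key]):
        j = right_index.get(key)
        if j is None:
            continue  # row exists only on the left, so it cannot be joined
        # A joined row counts once, however many columns mismatch.
        if any(left[c][i] != right[c][j] for c in columns):
            count += 1
    return count


left = {"id": [1, 2, 3], "status": ["a", "b", "c"], "value": [10.0, 20.0, 30.0]}
right = {"id": [1, 2, 3], "status": ["a", "x", "x"], "value": [10.0, 25.0, 30.0]}

print(count_joined_unequal(left, right, "id"))           # 2
print(count_joined_unequal(left, right, "id", "value"))  # 1
```

Row 2 differs in both status and value but is counted once, which is why the unrestricted count is 2 rather than 3.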