DataFrameComparison.joined_equal
- DataFrameComparison.joined_equal(
      *subset: str,
      lazy: Literal[True],
  ) → LazyFrame
- DataFrameComparison.joined_equal(
      *subset: str,
      lazy: Literal[False] = False,
  ) → DataFrame
The rows of both data frames that can be joined and have matching values in all columns in subset.
- Parameters:
subset – The columns to check for equality. If not provided, all common columns are used.
lazy – If True, return a lazy frame. Otherwise, return an eager frame (default).
- Returns:
A data frame or lazy frame containing the rows that can be joined and have matching values across the specified columns.
- Raises:
ValueError – If any of the provided columns are not common columns.
Columns which are not used for joining have a suffix _left for the left data frame and a suffix _right for the right data frame.

Examples
>>> import polars as pl
>>> from diffly import compare_frames
>>> left = pl.DataFrame({"id": [1, 2, 3], "status": ["a", "b", "c"], "value": [10.0, 20.0, 30.0]})
>>> right = pl.DataFrame({"id": [1, 2, 3], "status": ["a", "x", "x"], "value": [10.0, 25.0, 30.0]})
>>> comparison = compare_frames(left, right, primary_key="id")
>>> comparison.joined_equal()
shape: (1, 5)
┌─────┬─────────────┬──────────────┬────────────┬─────────────┐
│ id  ┆ status_left ┆ status_right ┆ value_left ┆ value_right │
│ --- ┆ ---         ┆ ---          ┆ ---        ┆ ---         │
│ i64 ┆ str         ┆ str          ┆ f64        ┆ f64         │
╞═════╪═════════════╪══════════════╪════════════╪═════════════╡
│ 1   ┆ a           ┆ a            ┆ 10.0       ┆ 10.0        │
└─────┴─────────────┴──────────────┴────────────┴─────────────┘
Only check a subset of columns for equality:
>>> comparison.joined_equal("value")
shape: (2, 5)
┌─────┬─────────────┬──────────────┬────────────┬─────────────┐
│ id  ┆ status_left ┆ status_right ┆ value_left ┆ value_right │
│ --- ┆ ---         ┆ ---          ┆ ---        ┆ ---         │
│ i64 ┆ str         ┆ str          ┆ f64        ┆ f64         │
╞═════╪═════════════╪══════════════╪════════════╪═════════════╡
│ 1   ┆ a           ┆ a            ┆ 10.0       ┆ 10.0        │
│ 3   ┆ c           ┆ x            ┆ 30.0       ┆ 30.0        │
└─────┴─────────────┴──────────────┴────────────┴─────────────┘
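To make the semantics above concrete, here is a minimal plain-Python sketch of the documented behaviour. It is not how diffly is implemented. The function name joined_equal_sketch and the list-of-dicts row representation are hypothetical and used only for illustration. The sketch assumes a single primary key column with unique values in each input.

```python
from typing import Any

Row = dict[str, Any]


def joined_equal_sketch(
    left: list[Row],
    right: list[Row],
    primary_key: str,
    *subset: str,
) -> list[Row]:
    """Hypothetical model of the documented behaviour, not diffly's code."""
    if not left or not right:
        return []
    # Columns present in both inputs, excluding the join key.
    common = [c for c in left[0] if c in right[0] and c != primary_key]
    # Mirror the documented error for columns that are not common columns.
    unknown = [c for c in subset if c not in common]
    if unknown:
        raise ValueError(f"not common columns: {unknown}")
    # Without an explicit subset, every common column is compared.
    to_check = list(subset) if subset else common

    # Assumes primary key values are unique within each input.
    right_by_key = {row[primary_key]: row for row in right}
    result: list[Row] = []
    for l_row in left:
        r_row = right_by_key.get(l_row[primary_key])
        if r_row is None:
            continue  # this row cannot be joined
        if all(l_row[c] == r_row[c] for c in to_check):
            joined: Row = {primary_key: l_row[primary_key]}
            # Non-key columns get the _left / _right suffixes.
            for c in common:
                joined[f"{c}_left"] = l_row[c]
                joined[f"{c}_right"] = r_row[c]
            result.append(joined)
    return result


left = [
    {"id": 1, "status": "a", "value": 10.0},
    {"id": 2, "status": "b", "value": 20.0},
    {"id": 3, "status": "c", "value": 30.0},
]
right = [
    {"id": 1, "status": "a", "value": 10.0},
    {"id": 2, "status": "x", "value": 25.0},
    {"id": 3, "status": "x", "value": 30.0},
]

all_equal = joined_equal_sketch(left, right, "id")
value_equal = joined_equal_sketch(left, right, "id", "value")
print([row["id"] for row in all_equal])    # [1]
print([row["id"] for row in value_equal])  # [1, 3]
```

With the same data as the examples above, the sketch keeps only id 1 when all common columns are compared, and ids 1 and 3 when only value is compared. Passing a column name that is not common to both inputs raises ValueError, as documented for the real method.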