Schema Comparison

class diffly.comparison.Schemas(
left_schema: dict[str, DataType],
right_schema: dict[str, DataType],
)

Container object providing information about the schemas of compared data frames.

Schemas.left()

Schema of the left data frame.

Schemas.right()

Schema of the right data frame.

Schemas.equal(*[, check_dtypes])

Whether the schemas of the left and right data frames are equal.

Schemas.in_common()

Columns that are present in both data frames, mapped to their data types in the left and right data frame.

Schemas.left_only()

Columns that are only present in the left data frame, mapped to their data types.

Schemas.right_only()

Columns that are only present in the right data frame, mapped to their data types.
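The relationship between the three column accessors above can be sketched with plain dictionaries. This is an illustration only, not diffly's implementation: `split_schemas` and `schemas_equal` are hypothetical helpers, dtypes are written as strings instead of Polars data types, and the reading of `check_dtypes=False` as "compare column names only" is an assumption.

```python
# Plain-dict stand-ins for two frame schemas (dtype names as strings).
left_schema = {"id": "Int64", "value": "Int64", "note": "String"}
right_schema = {"id": "Int64", "value": "Float64", "flag": "Boolean"}


def split_schemas(left, right):
    """Hypothetical helper: partition columns as the accessors above describe."""
    # Columns in both schemas, mapped to (left dtype, right dtype).
    in_common = {name: (left[name], right[name]) for name in left if name in right}
    # Columns found on one side only, mapped to their single dtype.
    left_only = {name: dtype for name, dtype in left.items() if name not in right}
    right_only = {name: dtype for name, dtype in right.items() if name not in left}
    return in_common, left_only, right_only


def schemas_equal(left, right, check_dtypes=True):
    """Hypothetical helper: assumed meaning of equal(check_dtypes=...)."""
    if check_dtypes:
        # Same column names and the same dtype for every column.
        return left == right
    # Assumption: with check_dtypes=False only the column names are compared.
    return left.keys() == right.keys()


in_common, left_only, right_only = split_schemas(left_schema, right_schema)
```

Every column lands in exactly one of the three groups. Whether diffly's own equality check also takes column order into account is not stated here, so the sketch ignores order.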

class diffly.comparison.Schemas.Schema

Child container for the schema of a data frame.

column_names() → set[str]

The names of the columns.

class diffly.comparison.Schemas.JointSchema

Child container for the joint schema of two data frames.

column_names() → set[str]

The names of the columns.

matching_dtypes() → Schema

The columns that have matching dtypes, mapped to the common dtype.

Examples

>>> import polars as pl
>>> from diffly import compare_frames
>>> left = pl.DataFrame({"id": [1], "value": [10]})
>>> right = pl.DataFrame({"id": [1], "value": [10.0]})
>>> compare_frames(left, right, primary_key="id").schemas.in_common().matching_dtypes()
{'id': Int64}

mismatching_dtypes() → Self

The columns that have mismatching dtypes, mapped to the dtypes in the left and right data frame.

Examples

>>> import polars as pl
>>> from diffly import compare_frames
>>> left = pl.DataFrame({"id": [1], "value": [10]})
>>> right = pl.DataFrame({"id": [1], "value": [10.0]})
>>> compare_frames(left, right, primary_key="id").schemas.in_common().mismatching_dtypes()
{'value': (Int64, Float64)}
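Taken together, the two examples suggest that `matching_dtypes()` and `mismatching_dtypes()` split the common columns into two disjoint groups. A minimal plain-Python sketch of that split, assuming dtypes are compared by simple equality and using strings in place of Polars data types (the variable names are illustrative and not part of diffly):

```python
# Joint schema of the common columns: name -> (left dtype, right dtype).
common = {"id": ("Int64", "Int64"), "value": ("Int64", "Float64")}

# Columns whose dtypes agree, mapped to the single shared dtype.
matching = {name: l for name, (l, r) in common.items() if l == r}

# Columns whose dtypes differ, still mapped to the (left, right) pair.
mismatching = {name: (l, r) for name, (l, r) in common.items() if l != r}

# Each common column appears in exactly one of the two results.
assert set(matching) | set(mismatching) == set(common)
assert not set(matching) & set(mismatching)
```

This mirrors the return types in the signatures: the matching side collapses to one dtype per column, while the mismatching side keeps both.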