ENH: check_dtype='equiv' for assert_frame_equal in unit testing #59182

levaphenyl · 2024-07-04T14:05:06Z

Feature Type

Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas

Problem Description

My test fails because the cast data and the expected data have slightly different dtypes, such as int32 vs. int64.
I don't want to use assert_frame_equal(df1, df2, check_dtype=False) because it skips the dtype check entirely, so even an integer column compared against a float column would pass.

import pandas as pd


a = pd.DataFrame({'Int': [1, 2, 3], 'Float': [0.57, 0.179, 0.213]})  # Automatic type casting
# Force 32-bit
b = a.copy()
b['Int'] = b['Int'].astype('int32')
b['Float'] = b['Float'].astype('float32')
# Force 64-bit
c = a.copy()
c['Int'] = c['Int'].astype('int64')
c['Float'] = c['Float'].astype('float64')
try:
    pd.testing.assert_frame_equal(b, c)
    print('Success')
except AssertionError as err:
    print(err)

gives

Attributes of DataFrame.iloc[:, 0] (column name="Int") are different

Attribute "dtype" are different
[left]:  int32
[right]: int64

Feature Description

Something like assert_frame_equal(df1, df2, check_dtype='equiv') would be handy but it does not work because the function uses the hard check of assert_attr_equal under the hood.

It means changing the logic to either have a soft attribute check in assert_attr_equal, or call a new function if the check_dtype is set to 'equiv'.
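As a rough illustration of what a "soft" dtype check could look like, here is a minimal sketch written outside of pandas. The names dtypes_equivalent and assert_frame_equal_equiv are hypothetical and are not part of the pandas API. The sketch assumes "equivalent" means "both integer" or "both float", and that both frames have the same columns in the same order.

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype, pandas_dtype


def dtypes_equivalent(left, right) -> bool:
    """Hypothetical helper: True if the dtypes are equal or of the same family.

    Assumption: "same family" means both integer or both float.
    """
    left, right = pandas_dtype(left), pandas_dtype(right)
    if left == right:
        return True
    both_int = is_integer_dtype(left) and is_integer_dtype(right)
    both_float = is_float_dtype(left) and is_float_dtype(right)
    return both_int or both_float


def assert_frame_equal_equiv(left: pd.DataFrame, right: pd.DataFrame) -> None:
    """Hypothetical stand-in for assert_frame_equal(..., check_dtype='equiv')."""
    # Assumption: columns must match exactly, including their order.
    pd.testing.assert_index_equal(left.columns, right.columns)
    for name in left.columns:
        ldtype, rdtype = left[name].dtype, right[name].dtype
        if not dtypes_equivalent(ldtype, rdtype):
            raise AssertionError(
                f'Column "{name}" has non-equivalent dtypes: {ldtype} vs. {rdtype}'
            )
    # The dtype families were checked above, so only values are compared here.
    pd.testing.assert_frame_equal(left, right, check_dtype=False)
```

With this sketch, the frames b and c from the example above would be expected to compare as equal, while an int64 column compared against a float64 column would still raise an AssertionError. Neither input is modified, since no casting takes place.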

Alternative Solutions

I added a workaround function to my unit tests, which casts the data type of one DataFrame to the other when the types are similar (int, float).

def assert_frame_equiv(left: pd.DataFrame, right: pd.DataFrame) -> None:
    """Convert equivalent data types to same before comparing.

    Parameters
    ----------
    left : DataFrame
        First DataFrame to compare.
    right : DataFrame
        Second DataFrame to compare.

    Raises
    ------
    AssertionError
        If the DataFrames are different.
    """
    # First, check that the columns are the same.
    pd.testing.assert_index_equal(left.columns, right.columns, check_order=False)
    # Work on a copy so that the caller's DataFrame is not modified in place.
    left = left.copy()
    # Knowing the column names are the same, cast to the same data type if equivalent.
    for col_name in left.columns:
        lcol = left[col_name]
        rcol = right[col_name]
        if (
            (pd.api.types.is_integer_dtype(lcol) and pd.api.types.is_integer_dtype(rcol))
            or (pd.api.types.is_float_dtype(lcol) and pd.api.types.is_float_dtype(rcol))
        ):
            left[col_name] = lcol.astype(rcol.dtype)

    return pd.testing.assert_frame_equal(left, right, check_like=True)

Additional Context

Adapted from my answer on SO.

Thanks for making pandas!


levaphenyl added the Enhancement and Needs Triage (issue that has not been reviewed by a pandas team member) labels Jul 4, 2024
