ENH: check_dtype='equiv' for assert_frame_equal in unit testing #59182

Open
1 of 3 tasks
levaphenyl opened this issue Jul 4, 2024 · 0 comments
Labels: Enhancement, Needs Triage (issue that has not been reviewed by a pandas team member)


@levaphenyl

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

My test fails because the cast data and the expected data have slightly different dtypes, e.g. int32 vs. int64.
I don't want to use assert_frame_equal(df1, df2, check_dtype=False), because then the dtype is not checked at all, so even an integer column compared against a float column with the same values would pass.

```python
import pandas as pd


a = pd.DataFrame({'Int': [1, 2, 3], 'Float': [0.57, 0.179, 0.213]})  # dtypes inferred automatically
# Force 32-bit
b = a.copy()
b['Int'] = b['Int'].astype('int32')
b['Float'] = b['Float'].astype('float32')
# Force 64-bit
c = a.copy()
c['Int'] = c['Int'].astype('int64')
c['Float'] = c['Float'].astype('float64')
try:
    pd.testing.assert_frame_equal(b, c)
    print('Success')
except AssertionError as err:
    print(err)
```

gives

```
Attributes of DataFrame.iloc[:, 0] (column name="Int") are different

Attribute "dtype" are different
[left]:  int32
[right]: int64
```
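For comparison, a small check of the check_dtype=False behaviour mentioned above. The column name is reused from the reproducer and the values are arbitrary. It shows why that option is too permissive here: an integer column and a float column holding the same values are accepted as equal, because the dtype comparison is skipped entirely.

```python
import pandas as pd

ints = pd.DataFrame({"Int": [1, 2, 3]})
floats = pd.DataFrame({"Int": [1.0, 2.0, 3.0]})

# The dtypes differ in kind (integer vs. float), not just in bit width...
assert ints["Int"].dtype != floats["Int"].dtype

# ...but with check_dtype=False only the values are compared, so this does not raise.
pd.testing.assert_frame_equal(ints, floats, check_dtype=False)
```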

Feature Description

Something like assert_frame_equal(df1, df2, check_dtype='equiv') would be handy, but it does not work, because under the hood the function uses the strict check of assert_attr_equal.

Supporting it would mean changing the logic: either add a soft attribute check to assert_attr_equal, or call a new function when check_dtype is set to 'equiv'.
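A minimal sketch of the second option, a separate function used when dtypes only need to be "equivalent". The names dtypes_equivalent and assert_frame_equal_equiv are hypothetical (they are not pandas API), and the rule for what counts as equivalent is an assumption: dtypes of the same integer, float, complex or boolean kind match (signed and unsigned integers are lumped together), while every other dtype must match exactly. Dtypes are checked per column, and the value comparison is then delegated to assert_frame_equal(..., check_dtype=False), so no values are cast.

```python
import numpy as np
import pandas as pd
from pandas.api.types import (
    is_bool_dtype,
    is_complex_dtype,
    is_float_dtype,
    is_integer_dtype,
)

# Assumption: dtypes of the same kind are "equivalent"; signed and unsigned
# integers are lumped together, and every other dtype must match exactly.
_KIND_CHECKS = (is_integer_dtype, is_float_dtype, is_complex_dtype, is_bool_dtype)


def dtypes_equivalent(left, right) -> bool:
    """Soft dtype check: identical dtypes, or both dtypes of the same kind."""
    if left == right:
        return True
    return any(check(left) and check(right) for check in _KIND_CHECKS)


def assert_frame_equal_equiv(left: pd.DataFrame, right: pd.DataFrame) -> None:
    """Hypothetical stand-in for assert_frame_equal(..., check_dtype='equiv')."""
    # Same labels in the same order, so left.dtypes and right.dtypes line up.
    pd.testing.assert_index_equal(left.columns, right.columns)
    for col, ldtype, rdtype in zip(left.columns, left.dtypes, right.dtypes):
        if not dtypes_equivalent(ldtype, rdtype):
            raise AssertionError(
                f'dtypes of column "{col}" are not equivalent: {ldtype} vs. {rdtype}'
            )
    # Dtypes are known to be equivalent, so only the strict dtype check is
    # skipped; values, index and the other attributes are still compared.
    pd.testing.assert_frame_equal(left, right, check_dtype=False)


# Usage with 32-bit and 64-bit frames like the ones in the report above.
b = pd.DataFrame({
    "Int": np.array([1, 2, 3], dtype="int32"),
    "Float": np.array([0.57, 0.179, 0.213], dtype="float32"),
})
c = b.astype({"Int": "int64", "Float": "float64"})
assert_frame_equal_equiv(b, c)  # passes: only equivalent dtypes differ
```

A design note: checking equivalence and then comparing with check_dtype=False leaves the data untouched, whereas casting one frame to the other's dtypes can lose information if the target dtype is narrower.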

Alternative Solutions

I added a workaround function to my unit tests. It casts each column of one DataFrame to the dtype of the matching column in the other when both dtypes are of the same kind (both integer or both float).

```python
import pandas as pd


def assert_frame_equiv(left: pd.DataFrame, right: pd.DataFrame) -> None:
    """Cast equivalent dtypes to the same dtype before comparing.

    Parameters
    ----------
    left : DataFrame
        First DataFrame to compare.
    right : DataFrame
        Second DataFrame to compare.

    Raises
    ------
    AssertionError
        If the DataFrames are different.
    """
    # First, check that the columns are the same (in any order).
    pd.testing.assert_index_equal(left.columns, right.columns, check_order=False)
    # Work on a copy so that the caller's DataFrame is not modified.
    left = left.copy()
    # Knowing the column names are the same, cast to the same dtype if equivalent.
    for col_name in left.columns:
        lcol = left[col_name]
        rcol = right[col_name]
        if (
            (pd.api.types.is_integer_dtype(lcol) and pd.api.types.is_integer_dtype(rcol))
            or (pd.api.types.is_float_dtype(lcol) and pd.api.types.is_float_dtype(rcol))
        ):
            left[col_name] = lcol.astype(rcol.dtype)

    # check_like=True ignores the order of columns and index labels.
    pd.testing.assert_frame_equal(left, right, check_like=True)
```

Additional Context

Adapted from my answer on Stack Overflow.

Thanks for making pandas!

levaphenyl added the Enhancement and Needs Triage labels on Jul 4, 2024.