Reusing a pandera.Field instance in a DataFrameModel definition causes unexpected behavior and an unhelpful error message.
If the same Field instance is assigned to two attributes, the first attribute is silently dropped from the model. This should raise an exception when the model class is built, but currently an error only surfaces later, when validation fails.
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandera.
(optional) I have confirmed this bug exists on the main branch of pandera.
Code Sample, a copy-pastable example
from pandera import DataFrameModel, Field
from pandas import DataFrame

# This Fails
# Attempt to create a field instance to be reused on multiple fields
GenericField: float = Field(ge=0)

class BadModelDF(DataFrameModel):
    field: float = GenericField
    field_1: float = GenericField  # Bug: this breaks the model

    class Config:
        strict = True

# Works
# Workaround, create a new Field object for each model field
def generic_field() -> Field:
    return Field(ge=0)

# This creates a valid model that works as expected
class GoodModelDF(DataFrameModel):
    field: float = generic_field()
    field_1: float = generic_field()

    class Config:
        strict = True

df = DataFrame({'field': [0.0, 0.1], 'field_1': [0.2, 0.3]})
print(GoodModelDF(df))

# Raises SchemaError
BadModelDF.validate(df)
Raises: SchemaError: column 'field' not in DataFrameSchema {'field_1': <Schema Column(name=field_1, type=DataType(float64))>}
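The error above shows a schema that contains only field_1. One plausible explanation, offered here as an assumption rather than something confirmed against pandera's source, is that the field object stores its own attribute name, so a shared instance remembers only the last name it was assigned to. The sketch below reproduces that effect in plain Python. `SketchField` and `SketchModel` are hypothetical stand-ins, not pandera classes.

```python
class SketchField:
    """Hypothetical stand-in for a field object; not pandera's FieldInfo."""

    def __init__(self, ge=None):
        self.ge = ge
        self.name = None

    def __set_name__(self, owner, name):
        # Python calls this once for every class attribute the instance is
        # bound to, so a shared instance keeps only the last name.
        self.name = name


class SketchModel:
    @classmethod
    def columns(cls):
        # Assumption: columns are keyed by the name stored on the field object.
        return {
            value.name: value
            for value in vars(cls).values()
            if isinstance(value, SketchField)
        }


shared = SketchField(ge=0)


class Bad(SketchModel):
    field = shared
    field_1 = shared  # overwrites the name recorded for `field`


class Good(SketchModel):
    field = SketchField(ge=0)
    field_1 = SketchField(ge=0)


print(sorted(Bad.columns()))   # ['field_1']
print(sorted(Good.columns()))  # ['field', 'field_1']
```

If pandera behaves along these lines, it would explain why creating a fresh Field per attribute avoids the problem.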
Expected behavior
Reusing the GenericField instance should result in a valid model/schema that includes both field and field_1 columns.
OR
Assigning the same instance to multiple field attributes should raise an expressive exception when the class is created.
Ex:
class BadModelDF(DataFrameModel):
    field: float = GenericField
    field_1: float = GenericField  # Bug: this breaks the model
Raises: AttributeError: Field instances cannot be used for multiple columns. GenericField<pandera.api.dataframe.model_components.FieldInfo("None") object at 0x7fd460386490> assigned to 'field' and 'field_1' attributes for BadModelDF
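To illustrate the second option, here is a minimal sketch of how a class-creation check could detect a reused instance by object identity. It is plain Python under stated assumptions: `SketchField` and `CheckedModel` are hypothetical names, and this is not how pandera is implemented today.

```python
class SketchField:
    """Hypothetical stand-in for a field object; not pandera's FieldInfo."""

    def __init__(self, ge=None):
        self.ge = ge


class CheckedModel:
    """Hypothetical base class that rejects reused field instances."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        seen = {}  # id(field object) -> first attribute name it was bound to
        for attr, value in vars(cls).items():
            if not isinstance(value, SketchField):
                continue
            first = seen.setdefault(id(value), attr)
            if first != attr:
                raise AttributeError(
                    "Field instances cannot be used for multiple columns: "
                    f"the same instance is assigned to {first!r} and "
                    f"{attr!r} in {cls.__name__}"
                )


shared = SketchField(ge=0)

try:
    class BadModel(CheckedModel):
        field = shared
        field_1 = shared  # same object as `field`, so the check fires
except AttributeError as exc:
    message = str(exc)
else:
    message = None

print(message)
```

The check runs in `__init_subclass__`, so the error appears at the class definition rather than at validation time, which is the behavior requested above.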
Desktop (please complete the following information):
OS: [Ubuntu]
Version: [e.g. 22.04]