Expected SchemaErrors (rather than SchemaError) which details the missing column and any other validation failures in the DataFrame.
Desktop (please complete the following information):
OS: macOS Sonoma Version 14.5
Browser: Chrome
Version: 0.19.3
Python Version: 3.11
Screenshots
If applicable, add screenshots to help explain your problem.
Additional context
It looks like the SchemaErrors class is attempting to concatenate DataFrames representing different failure cases into one DataFrame. In this case, the below are the DataFrames it is trying to concat:
It should be as simple as setting the dtype of column to always be str, but I have no knowledge of the pandera codebase, so I have no idea whether it's actually as quick a fix as I think it is 😂
Full Traceback:
SchemaError Traceback (most recent call last)
Cell In[6], line 1
----> 1 schema.validate(df, lazy=True)
File ~/.local/share/virtualenvs/zeus-XZJKObwC/lib/python3.11/site-packages/pandera/api/polars/container.py:58, in DataFrameSchema.validate(self, check_obj, head, tail, sample, random_state, lazy, inplace)
54 if is_dataframe:
55 # if validating a polars DataFrame, use the global config setting
56 check_obj = check_obj.lazy()
---> 58 output = self.get_backend(check_obj).validate(
59 check_obj=check_obj,
60 schema=self,
61 head=head,
62 tail=tail,
63 sample=sample,
64 random_state=random_state,
65 lazy=lazy,
66 inplace=inplace,
67 )
69 if is_dataframe:
70 output = output.collect()
File ~/.local/share/virtualenvs/zeus-XZJKObwC/lib/python3.11/site-packages/pandera/backends/polars/container.py:122, in DataFrameSchemaBackend.validate(self, check_obj, schema, head, tail, sample, random_state, lazy, inplace)
120 check_obj = self.drop_invalid_rows(check_obj, error_handler)
121 else:
--> 122 raise SchemaErrors(
123 schema=schema,
124 schema_errors=error_handler.schema_errors,
125 data=check_obj,
126 )
128 return check_obj
File ~/.local/share/virtualenvs/zeus-XZJKObwC/lib/python3.11/site-packages/pandera/errors.py:183, in SchemaErrors.__init__(self, schema, schema_errors, data)
178 self.schema_errors = schema_errors
179 self.data = data
181 failure_cases_metadata = schema.get_backend(
182 data
--> 183 ).failure_cases_metadata(schema.name, schema_errors)
184 self.error_counts = failure_cases_metadata.error_counts
185 self.failure_cases = failure_cases_metadata.failure_cases
File ~/.local/share/virtualenvs/zeus-XZJKObwC/lib/python3.11/site-packages/pandera/backends/polars/base.py:204, in PolarsSchemaBackend.failure_cases_metadata(self, schema_name, schema_errors)
198 failure_cases_df = pl.DataFrame(scalar_failure_cases).cast(
199 {"check_number": pl.Int32, "index": pl.Int32}
200 )
202 failure_case_collection.append(failure_cases_df)
--> 204 failure_cases = pl.concat(failure_case_collection)
206 error_handler = ErrorHandler()
207 error_handler.collect_errors(schema_errors)
File ~/.local/share/virtualenvs/zeus-XZJKObwC/lib/python3.11/site-packages/polars/functions/eager.py:187, in concat(items, how, rechunk, parallel)
184 out = wrap_df(plr.concat_df(elems))
185 elif how == "vertical_relaxed":
186 out = wrap_ldf(
--> 187 plr.concat_lf(
188 [df.lazy() for df in elems],
189 rechunk=rechunk,
190 parallel=parallel,
191 to_supertypes=True,
192 )
193 ).collect(no_optimization=True)
195 elif how == "diagonal":
196 out = wrap_df(plr.concat_df_diagonal(elems))
SchemaError: type String is incompatible with expected type Null
I was about to try to make a bugfix PR but I noticed the bug seems to be fixed on the main branch... although there's nothing in recent commits that would suggest a reason for it being fixed as far as I can see 🤔
@cosmicBboy Would it be possible to cut a bugfix release so I can check whether it's definitely fixed? If not, no worries.