Describe the bug
I was helping a user Amith with a question about cudf.pandas and the library pandapower (see RAPIDS GoAI Slack). The pandapower notebook examples trigger a bug in cudf.pandas around the .at accessor. @Jacey0 helped debug this with me during PyCon 2024. Thanks Jacey!
Here is the minimal code snippet needed to reproduce:
%load_ext cudf.pandas
import pandas as pd

bus = pd.DataFrame({'name': []})
entries = {'name': 'Bus 0', 'geo': 'X and Y'}
for col, val in entries.items():
    print("Setting", col, "to", val, end="\n\n")
    print("Before:")
    print(bus, end="\n\n")
    bus.at[0, col] = val  # pandas adds a new row and 'geo' column fine
    print("After:")
    print(bus, end="\n\n")
Error output:
Setting name to Bus 0

Before:
Empty DataFrame
Columns: [name]
Index: []

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/cudf/core/series.py in _loc_to_iloc(self, arg)
    381             if (n := len(indices)) == 0:
--> 382                 raise KeyError("Label scalar is out of bounds")
    383             elif n == 1:

KeyError: 'Label scalar is out of bounds'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
15 frames
KeyError: 'Label scalar is out of bounds'

During handling of the above exception, another exception occurred:

IndexError                                Traceback (most recent call last)
IndexError: Index out of bounds

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/pandas/core/indexing.py in __setitem__(self, key, value)
    843         else:
    844             key = com.apply_if_callable(key, self.obj)
--> 845         indexer = self._get_setitem_indexer(key)
    846         self._has_valid_setitem_indexer(key)
    847

AttributeError: '_AtIndexer' object has no attribute '_get_setitem_indexer'
Expected behavior
Here is the output from plain pandas:
Setting name to Bus 0

Before:
Empty DataFrame
Columns: [name]
Index: []

After:
    name
0  Bus 0

Setting geo to X and Y

Before:
    name
0  Bus 0

After:
    name      geo
0  Bus 0  X and Y
So cudf.DataFrame.at just returns cudf.DataFrame.loc, so a pandas.DataFrame.at call will always succeed in returning cudf.pandas' intermediate proxy of pandas.DataFrame.loc.
So if pandas.DataFrame.at.__setitem__ fails in cudf.pandas, we rewind to pandas.DataFrame.loc.__setitem__ instead of pandas.DataFrame.at.__setitem__.
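The misrouting described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `FastFrame`, `FastLoc`, `SlowFrame`, `SlowAt`, `SlowLoc`, `SLOW_COUNTERPART`, and `set_with_fallback` are all hypothetical names that exist in neither cudf nor pandas, and the real cudf.pandas proxy machinery is considerably more involved. The only point is that when the fast `.at` hands back the same indexer type as `.loc`, a fallback keyed on that type has no way to tell which accessor was originally requested.

```python
# Toy model only: none of these classes exist in cudf, pandas, or cudf.pandas.

class FastLoc:
    """Hypothetical fast-path indexer that cannot enlarge an empty frame."""

    def __init__(self, frame):
        self.frame = frame

    def __setitem__(self, key, value):
        raise KeyError("Label scalar is out of bounds")


class FastFrame:
    """Fast-path frame where .at is an alias for the .loc indexer type."""

    @property
    def loc(self):
        return FastLoc(self)

    @property
    def at(self):
        return FastLoc(self)  # same type as .loc: the source of the confusion


class SlowLoc:
    def __init__(self, frame):
        self.frame = frame

    def __setitem__(self, key, value):
        self.frame.log.append(("loc", key, value))


class SlowAt:
    def __init__(self, frame):
        self.frame = frame

    def __setitem__(self, key, value):
        self.frame.log.append(("at", key, value))


class SlowFrame:
    """Slow-path frame where .at and .loc are distinct indexer types."""

    def __init__(self):
        self.log = []

    @property
    def loc(self):
        return SlowLoc(self)

    @property
    def at(self):
        return SlowAt(self)


# Assumed lookup from fast indexer type to the slow accessor to replay on.
SLOW_COUNTERPART = {FastLoc: "loc"}


def set_with_fallback(fast, slow, accessor, key, value):
    """Try the fast path; on failure, replay on the slow path by fast type."""
    fast_indexer = getattr(fast, accessor)
    try:
        fast_indexer[key] = value
        return "fast"
    except KeyError:
        slow_name = SLOW_COUNTERPART[type(fast_indexer)]
        getattr(slow, slow_name)[key] = value
        return slow_name


slow = SlowFrame()
routed = set_with_fallback(FastFrame(), slow, "at", (0, "name"), "Bus 0")
print(routed)  # the caller asked for "at", but the replay went through "loc"
```

In this toy, the information that the caller used `.at` is lost as soon as the fast path returns a `FastLoc`, which mirrors the aliasing problem described in the comment.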
I think we need pandas.DataFrame.at to be an IntermediateProxy of an IntermediateProxy as a "proper" fix. I'm not sure if there's a straightforward way to do this today.
Alternatively, if we make cudf.DataFrame.at not map to cudf.DataFrame.loc (i.e. return a different object with the same implementation), that would be an easier fix...
> Alternatively, if we make cudf.DataFrame.at not map to cudf.DataFrame.loc (i.e. return a different object with the same implementation), that would be an easier fix...
That sounds like the right thing to do. cudf's object model should mirror that of pandas. If at returns loc in cudf but not pandas, that's a problem in the mirrored object model.
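As a rough sketch of what "a different object with the same implementation" could look like, the toy below gives `.at` its own indexer type that inherits all behavior from the `.loc` indexer. The names `FrameSketch`, `LocIndexerSketch`, and `AtIndexerSketch` are hypothetical and the dictionary-backed storage is purely illustrative; this is not cudf's actual code or the change that was eventually made.

```python
# Hypothetical sketch: these classes are illustrative and not part of cudf.

class LocIndexerSketch:
    """Label-based indexer over a toy column store: {column: {row: value}}."""

    def __init__(self, frame):
        self._frame = frame

    def __getitem__(self, key):
        row, col = key
        return self._frame.columns[col][row]

    def __setitem__(self, key, value):
        row, col = key
        # Create the column and/or row on demand (setting with enlargement).
        self._frame.columns.setdefault(col, {})[row] = value


class AtIndexerSketch(LocIndexerSketch):
    """Same implementation as LocIndexerSketch, but a distinct type."""


class FrameSketch:
    def __init__(self):
        self.columns = {}

    @property
    def loc(self):
        return LocIndexerSketch(self)

    @property
    def at(self):
        return AtIndexerSketch(self)


frame = FrameSketch()
frame.at[0, "name"] = "Bus 0"
frame.at[0, "geo"] = "X and Y"
print(type(frame.at).__name__, type(frame.loc).__name__)
print(frame.loc[0, "name"], "|", frame.loc[0, "geo"])
```

Because the two accessors now have different types, anything that dispatches on the indexer type can tell `.at` apart from `.loc`, while the shared base class keeps the behavior identical.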
Steps/Code to reproduce bug
Here is a Colab notebook with a reproducer: https://colab.research.google.com/gist/bdice/f5be320dada30671015d5b53642ec19c/solution-test.ipynb