Describe the bug
There is an inconsistency in the forward fill behavior of cudf when replacing np.inf and -np.inf values using a list. The same operation works correctly with pandas, or when np.inf and -np.inf are replaced separately.
The problem appears already after the replace call:
import cudf
import numpy as np
s = cudf.Series([1, -np.inf, np.inf])
print(s.replace([-np.inf, np.inf], np.nan))
print(s.replace(-np.inf, np.nan).replace(np.inf, np.nan))
The former produces:
0 1.0
1 NaN
2 NaN
dtype: float64
The latter:
0 1.0
1 <NA>
2 <NA>
dtype: float64
groupby.ffill handles the latter case, but not the former, in the way you might expect from pandas (where NaN is considered a missing value).
I agree that replace should produce the same output for the two examples in this comment (I think the latter is "more correct").
To work around this, if you replace your usage of np.nan in your replace call with None, then everything works as anticipated.
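A minimal sketch of that workaround follows. The helper name `infs_to_null` and the `demo` function are hypothetical and only for illustration; the one cudf call relied on is `Series.replace` with `None` as the replacement value, as described above. `demo` needs a working cudf installation and is not called automatically.

```python
# Both infinities, written without numpy so the helper has no hard dependencies.
INF_VALUES = [float("-inf"), float("inf")]


def infs_to_null(series):
    """Replace +/-inf with None rather than np.nan.

    `series` is assumed to be a cudf.Series (anything exposing a compatible
    `.replace(to_replace, value)` method will do). Per the comment above,
    passing None should mark the entries as missing (<NA>) instead of
    storing a NaN value.
    """
    return series.replace(INF_VALUES, None)


def demo():
    # Assumption: cudf is installed and a GPU is available.
    import cudf

    s = cudf.Series([1, float("-inf"), float("inf")])
    print(infs_to_null(s))
```

Calling `demo()` in a cudf environment prints the replaced series, which can be compared against the two outputs shown earlier in this comment.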
Note that this is a consequence of cudf being slightly stricter than pandas in a number of places about the difference between nan and NA: the latter indicates an actually missing value, whereas the former (in cudf) does not.
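The distinction can be illustrated with a toy forward fill in plain Python. This is only a model of the two conventions, not how cudf or pandas implement `ffill`: the function `ffill` and its `nan_is_missing` flag are made up for this sketch, with `None` standing in for NA.

```python
import math


def ffill(values, nan_is_missing):
    """Forward-fill a list of floats.

    None always counts as missing (the analogue of NA). A float NaN counts
    as missing only when nan_is_missing is True (the pandas-like convention).
    """
    out = []
    last = None  # most recent non-missing value seen so far
    for v in values:
        is_nan = isinstance(v, float) and math.isnan(v)
        if v is None or (nan_is_missing and is_nan):
            out.append(last)
        else:
            out.append(v)
            last = v
    return out


data_nan = [1.0, float("nan"), 3.0]   # NaN stored as an ordinary value
data_null = [1.0, None, 3.0]          # a genuinely missing entry

strict = ffill(data_nan, nan_is_missing=False)   # NaN is kept, nothing to fill
lenient = ffill(data_nan, nan_is_missing=True)   # NaN is filled with 1.0
nulls = ffill(data_null, nan_is_missing=False)   # None is filled with 1.0
```

Under the strict convention the NaN survives the fill, which mirrors why the list-based replace with np.nan does not get forward filled, while a replace that produces a real missing entry does.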
Steps/Code to reproduce bug
Output
Expected behavior
DataFrame after forward fill:
group value
0 A 1.0
1 A 1.0
2 A 3.0
3 B
4 B 5.0
5 B 5.0
Environment overview (please complete the following information)
It works fine if the replace is split into separate calls:
or if pandas is used instead