
Calling df.loc with multiple arguments results in KeyError #4354

Closed
naren-ponder opened this issue Mar 30, 2022 · 8 comments · May be fixed by #4421
Labels
bug 🦗 Something isn't working P1 Important tasks that we should complete soon pandas concordance 🐼 Functionality that does not match pandas pandas 🤔 Weird Behaviors of Pandas

Comments

@naren-ponder
Collaborator

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS Version 11.6.4
  • Modin version (modin.__version__): 0.14.0
  • Python version: Python 3.8.11
  • Code we can use to reproduce:
import modin.pandas as pd
import numpy as np

arrays = [
    np.array(["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"]),
    np.array(["one", "two", "one", "two", "one", "two", "one", "two"]),
]
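# Two-level MultiIndex on the rows; the columns are the default labels 0..3
df = pd.DataFrame(np.random.randn(8, 4), index=arrays)
# pandas returns the row ('bar', 'one'); Modin 0.14.0 raises KeyError
df.loc['bar', 'one']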

Resulting Error:

KeyError                                  Traceback (most recent call last)
<ipython-input-7-5557f8ed36a3> in <module>
----> 1 df.loc['bar', 'one']

~/Desktop/modin/modin/pandas/indexing.py in __getitem__(self, key)
    636             return self._handle_boolean_masking(row_loc, col_loc)
    637 
--> 638         row_lookup, col_lookup = self._compute_lookup(row_loc, col_loc)
    639         result = super(_LocIndexer, self).__getitem__(row_lookup, col_lookup, ndim)
    640         if isinstance(result, Series):

~/Desktop/modin/modin/pandas/indexing.py in _compute_lookup(self, row_loc, col_loc)
    843                         else axis_loc
    844                     )
--> 845                     raise KeyError(missing_labels)
    846 
    847             if isinstance(axis_lookup, pandas.Index) and not is_range_like(axis_lookup):

KeyError: array(['one'], dtype='<U3')

Expected Output (with pandas):

0    0.395674
1   -0.426304
2    0.273483
3   -0.702982
Name: (bar, one), dtype: float64

Describe the problem

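Calling df.loc with a multi-element key on a MultiIndex, such as df.loc['bar', 'one'], makes Modin treat the second element as a column label. Because 'one' is not among the columns, Modin reports it as a missing label and raises a KeyError, whereas pandas resolves the whole key to the row ('bar', 'one').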
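To make the two readings of the key concrete, here is a minimal sketch run against plain pandas (the behavior Modin aims to match), using deterministic values in place of the report's random data. It checks that 'one' is not a column label, that pandas returns the row ('bar', 'one'), and shows the spelling df.loc[('bar', 'one'), :], which states the row/column split explicitly. Whether Modin 0.14.0 accepts that explicit spelling is not established in this thread.

```python
import numpy as np
import pandas as pd

# Same index as the report; deterministic values stand in for np.random.randn.
arrays = [
    np.array(["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"]),
    np.array(["one", "two", "one", "two", "one", "two", "one", "two"]),
]
df = pd.DataFrame(np.arange(32.0).reshape(8, 4), index=arrays)

# 'one' is not a column label: the columns are the default RangeIndex 0..3.
print("one" in df.columns)  # False

# pandas resolves the whole key as a row label on the two-level index.
row = df.loc["bar", "one"]
print(row.name)  # ('bar', 'one')

# Spelling out the column selector makes the (row, column) split explicit.
explicit = df.loc[("bar", "one"), :]
```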

Source code / logs

@naren-ponder naren-ponder self-assigned this Mar 30, 2022
@anmyachev
Collaborator

@naren-ponder do you find this behavior strange? I would expect that you have to pass the tuple explicitly to select from a MultiIndex, like df.loc[('bar', 'one')].

If this behavior is wrong in pandas itself, maybe we should not replicate it?
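One detail relevant to this point: Python packs comma-separated subscripts into a single tuple before `__getitem__` is called, so `df.loc['bar', 'one']` and `df.loc[('bar', 'one')]` hand `.loc` exactly the same key. The stdlib-only sketch below illustrates this with a purely illustrative `KeyRecorder` class (not part of pandas or Modin); only a second element such as `:` separates a row key from a column key.

```python
class KeyRecorder:
    # Returns whatever key Python passes to __getitem__, unchanged.
    def __getitem__(self, key):
        return key


rec = KeyRecorder()
print(rec["bar", "one"])       # ('bar', 'one')
print(rec[("bar", "one")])     # ('bar', 'one'): the identical key
print(rec[("bar", "one"), :])  # (('bar', 'one'), slice(None, None, None))
```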

@naren-ponder
Collaborator Author

@anmyachev The "expected output" section I indicated above is what happens when you run that snippet of code with pandas. So given that we want to mirror the pandas behavior, I think this is a bug that should be fixed. Perhaps I am misunderstanding your question?

@anmyachev
Collaborator

@naren-ponder In general, you are right. But I seem to recall a precedent where we issued a warning telling users that Modin's behavior in a particular case deliberately differs from pandas because the pandas behavior is erroneous. @modin-project/modin-core do you remember this case? Or am I confusing something?

@anmyachev
Collaborator

I checked the docs, and the pandas behavior in this case is not erroneous. So we definitely need to fix this case.

However, my previous question still stands.

@alvin-chang

I got the same error, thus upvoting this issue.

@YarShev
Collaborator

YarShev commented Apr 5, 2022

@anmyachev, if Modin's behavior does not match pandas, we issue a warning like this:

operation="melt", message="Order of rows could be different from pandas"
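The snippet above shows only the arguments of Modin's warning call. As a rough illustration of the pattern, here is a minimal sketch built on the standard `warnings` module; the helper name `warn_pandas_mismatch` and its message format are hypothetical, not Modin's actual internal API.

```python
import warnings


def warn_pandas_mismatch(operation: str, message: str) -> None:
    # Hypothetical helper: emits a UserWarning flagging a known divergence from pandas.
    # stacklevel=2 attributes the warning to the caller rather than this helper.
    warnings.warn(
        f"{operation}: behavior may differ from pandas. {message}.",
        UserWarning,
        stacklevel=2,
    )


warn_pandas_mismatch(operation="melt", message="Order of rows could be different from pandas")
```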

@dchigarev dchigarev added pandas 🤔 Weird Behaviors of Pandas pandas concordance 🐼 Functionality that does not match pandas labels Apr 14, 2022
@naren-ponder
Collaborator Author

@alvin-chang An easy workaround for this issue would be to separate out the calls to .loc. For instance in the case listed above you could do df.loc['bar'].loc['one']. This should unblock you while we work towards putting in a fix.
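A quick way to check that the chained workaround selects the same data is sketched below, run against plain pandas with deterministic values in place of random ones (not verified against Modin 0.14.0 here). The values match, but the result's name differs: the chained call yields a Series named 'one', while the direct lookup is named ('bar', 'one'), which matters if downstream code relies on the name.

```python
import numpy as np
import pandas as pd

arrays = [
    np.array(["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"]),
    np.array(["one", "two", "one", "two", "one", "two", "one", "two"]),
]
df = pd.DataFrame(np.arange(32.0).reshape(8, 4), index=arrays)

# Workaround from the thread: select one index level per .loc call.
chained = df.loc["bar"].loc["one"]

# Direct lookup with the full key, as pandas supports it.
direct = df.loc["bar", "one"]

# Same values and index; only the Series name differs.
print(chained.equals(direct))    # True
print(chained.name, direct.name)  # one ('bar', 'one')
```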

@naren-ponder naren-ponder added the bug 🦗 Something isn't working label Apr 26, 2022
naren-ponder added a commit to naren-ponder/modin that referenced this issue Jun 1, 2022
@pyrito pyrito closed this as completed Aug 11, 2022
@pyrito pyrito reopened this Aug 11, 2022
@mvashishtha mvashishtha added the P1 Important tasks that we should complete soon label Sep 20, 2022
@mvashishtha
Collaborator

This works as of commit 80c7891.


7 participants