FEAT-#7331: Initial Polars API #7332
base: main
This commit adds a polars namespace to Modin, and the DataFrame and Series objects and their respective APIs. This doesn't include error handling and is still missing several polars features:

* LazyFrame
* Expressions
* String, Temporal, Struct, and other Series accessors
* Several parameters
* Operators that we don't have query compiler methods for
  * e.g. sin, cos, tan, etc.

Those will be handled in a future PR.

Signed-off-by: Devin Petersohn <[email protected]>
I reviewed the files alphabetically through `item` in `dataframe.py`. I'll make another round of comments later today.
```python
def __eq__(self, other) -> "BasePolarsDataset":
    return self.__constructor__(
        _query_compiler=self._query_compiler.eq(
            other._query_compiler if hasattr(other, "_query_compiler") else other
        )
    )
```
optional comment: Why look for `query_compiler` instead of checking whether `other` is a `BasePolarsDataset`?
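To make the trade-off concrete, here is a minimal, self-contained sketch of the `isinstance` alternative. `FakeQueryCompiler` is a hypothetical stand-in for Modin's query compiler (only `eq` is implemented), and `BasePolarsDataset` is stripped down to what `__eq__` needs. The `isinstance` check unwraps only polars datasets, whereas `hasattr(other, "_query_compiler")` would also unwrap any other object exposing that attribute, such as a `modin.pandas` DataFrame.

```python
class FakeQueryCompiler:
    """Hypothetical stand-in for Modin's query compiler; only `eq` is implemented."""

    def __init__(self, values):
        self.values = list(values)

    def eq(self, other):
        # Element-wise comparison against another compiler or a scalar.
        if isinstance(other, FakeQueryCompiler):
            return FakeQueryCompiler(a == b for a, b in zip(self.values, other.values))
        return FakeQueryCompiler(v == other for v in self.values)


class BasePolarsDataset:
    """Stripped-down dataset holding a query compiler (sketch only)."""

    def __init__(self, _query_compiler):
        self._query_compiler = _query_compiler

    @property
    def __constructor__(self):
        # Return the concrete class so results keep the caller's type.
        return type(self)

    def __eq__(self, other) -> "BasePolarsDataset":
        # Unwrap only polars datasets; scalars and other objects pass through as-is.
        if isinstance(other, BasePolarsDataset):
            other = other._query_compiler
        return self.__constructor__(_query_compiler=self._query_compiler.eq(other))


left = BasePolarsDataset(FakeQueryCompiler([1, 2, 3]))
right = BasePolarsDataset(FakeQueryCompiler([1, 0, 3]))
print((left == right)._query_compiler.values)  # [True, False, True]
print((left == 2)._query_compiler.values)  # [False, True, False]
```

The `hasattr` form is more permissive; which one fits depends on whether mixing in non-polars Modin objects should be allowed.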
```python
    Returns:
        Arrow representation of the DataFrame.
    """
    return polars.from_pandas(self._query_compiler.to_pandas()).to_arrow()
```
optional nit: should we add a `_to_polars()` to alias `polars.from_pandas(self._query_compiler.to_pandas())`?
And should we replace all the `polars.from_pandas(self._query_compiler.to_pandas())` calls here with `self._to_polars()`?
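The refactor suggested in this thread routes every eager conversion through one private helper. The sketch below is self-contained: `FakeQueryCompiler.to_pandas()`, `from_pandas`, and `FakeEagerFrame` are hypothetical stand-ins for the real Modin, pandas, and polars objects so the code runs without those libraries; only the `_to_polars()` name comes from the review comment.

```python
class FakeEagerFrame:
    """Hypothetical stand-in for polars.DataFrame; only exposes to_arrow."""

    def __init__(self, columns):
        self.columns = dict(columns)

    def to_arrow(self):
        # Real polars returns a pyarrow.Table; the sketch returns a tagged copy.
        return ("arrow", dict(self.columns))


def from_pandas(frame):
    """Hypothetical stand-in for polars.from_pandas."""
    return FakeEagerFrame(frame)


class FakeQueryCompiler:
    """Hypothetical stand-in for Modin's query compiler."""

    def __init__(self, columns):
        self._columns = columns

    def to_pandas(self):
        # Real Modin returns a pandas.DataFrame; a plain dict is enough here.
        return dict(self._columns)


class SketchDataFrame:
    def __init__(self, _query_compiler):
        self._query_compiler = _query_compiler

    def _to_polars(self):
        """Single conversion path: query compiler -> pandas -> eager polars."""
        return from_pandas(self._query_compiler.to_pandas())

    def to_arrow(self):
        # Was: polars.from_pandas(self._query_compiler.to_pandas()).to_arrow()
        return self._to_polars().to_arrow()
```

Every other call site that currently spells out the conversion would call `self._to_polars()` the same way, so a later change to the conversion path touches one method.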
`modin/polars/dataframe.py` (outdated)

```python
groupby = group_by

def drop(self, *columns):
```
nit: this is missing `strict`.
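For reference, polars' `DataFrame.drop` takes a keyword-only `strict` flag (default `True` in polars 1.x): with `strict=True`, dropping a column that does not exist raises an error, and with `strict=False` missing names are ignored. The minimal sketch below shows those semantics over a plain dict of columns. `SketchFrame` and the use of `KeyError` (polars raises its own column-not-found error) are assumptions for illustration, not Modin's implementation, and the sketch omits polars selectors.

```python
class SketchFrame:
    """Hypothetical column store used to illustrate drop(..., strict=...)."""

    def __init__(self, columns):
        self.columns = dict(columns)

    def drop(self, *columns, strict: bool = True):
        # Accept both drop("a", "b") and drop(["a", "b"]).
        names = []
        for col in columns:
            if isinstance(col, (list, tuple)):
                names.extend(col)
            else:
                names.append(col)
        to_drop = set(names)
        missing = [name for name in names if name not in self.columns]
        if strict and missing:
            # polars raises a column-not-found error here; KeyError is a stand-in.
            raise KeyError(f"columns not found: {missing}")
        # Keep the remaining columns in their original order.
        return SketchFrame(
            {name: values for name, values in self.columns.items() if name not in to_drop}
        )
```

Because `strict` follows `*columns`, it can only be passed by keyword, which matches how polars exposes it.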
Co-authored-by: Mahesh Vashishtha <[email protected]>
I've made it through `len` in `groupby.py`.
This commit adds a polars namespace to Modin, and the DataFrame and Series objects and their respective APIs.
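This PR adds the following to Modin:

* `modin.polars.DataFrame`
* `modin.polars.Series`
* `modin.polars.GroupBy`

This PR does NOT include matching error handling and is still missing several polars features:

* LazyFrame
* Expressions
* String, Temporal, Struct, and other Series accessors
* Several parameters
* Operators that we don't have query compiler methods for (e.g. sin, cos, tan)

These features will be added in a future PR.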
What do these changes do?

* `flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py`
* `black --check modin/ asv_bench/benchmarks scripts/doc_checker.py`
* `git commit -s`
* `docs/development/architecture.rst` is up-to-date