[FEA] Support rolling operations in Polars engine (window functions) #16176

Open
beckernick opened this issue Jul 2, 2024 · 0 comments
Labels: cudf.polars (Issues specific to cudf.polars), feature request (New feature or request)


@beckernick
Member

Rolling operations (window functions) are common when processing real-world datasets to answer questions like "What's the rolling average transaction volume per entity?" The combination of groupby and rolling windows is especially valuable.
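To make the "rolling average per entity" case concrete, here is a minimal pure-Python sketch of what a grouped, time-based rolling mean computes. It is not Polars or cuDF code. It assumes a right-closed window (t - period, t], which matches the default closed="right" behavior of Polars' rolling, and input sorted by timestamp within each entity. The function name and the sample accounts are hypothetical.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def grouped_rolling_mean(rows, period):
    """Per-entity rolling mean over the window (t - period, t].

    rows: iterable of (entity, timestamp, value), sorted by timestamp
    within each entity. Returns (entity, timestamp, mean) in input order.
    """
    windows = defaultdict(deque)  # entity -> (timestamp, value) pairs in the window
    sums = defaultdict(float)     # entity -> running sum of values in the window
    out = []
    for entity, ts, value in rows:
        win = windows[entity]
        win.append((ts, value))
        sums[entity] += value
        # Evict observations at or before ts - period (the left edge is open).
        # The current row always survives, so the window never empties.
        while win[0][0] <= ts - period:
            _, old = win.popleft()
            sums[entity] -= old
        out.append((entity, ts, sums[entity] / len(win)))
    return out


# Hypothetical transactions: (entity, timestamp, volume).
rows = [
    ("acct_a", datetime(2020, 1, 1, 12), 100),
    ("acct_b", datetime(2020, 1, 1, 13), 40),
    ("acct_a", datetime(2020, 1, 2, 9), 300),
    ("acct_a", datetime(2020, 1, 3, 20), 50),
    ("acct_b", datetime(2020, 1, 4, 8), 60),
]

for entity, ts, mean in grouped_rolling_mean(rows, timedelta(days=2)):
    print(entity, ts, mean)
```

Each entity keeps its own window and running sum, so the per-row cost is amortized O(1); a GPU implementation would instead compute all groups' windows in parallel.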

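cuDF has extensive support for rolling/window functions, but they are not yet available in the cuDF-backed Polars engine (cudf_polars). The following query runs on the default CPU engine but fails when executed through the cuDF callback: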

import polars as pl
from functools import partial
from cudf_polars.callback import execute_with_cudf

use_cudf = partial(execute_with_cudf, raise_on_fail=True)

dates = [
    "2020-01-01 13:45:48",
    "2020-01-01 16:42:13",
    "2020-01-01 16:45:09",
    "2020-01-02 18:12:48",
    "2020-01-03 19:45:32",
    "2020-01-08 23:16:43",
]
df = pl.LazyFrame({"dt": dates, "a": [3, 7, 5, 9, 2, 1]}).with_columns(
    pl.col("dt").str.strptime(pl.Datetime).set_sorted()
)

# Temporary workaround for other unsupported operations: materialize on the CPU, then convert back to a LazyFrame
df = df.collect().lazy()

query = (
    df.rolling(index_column="dt", period="2d")
    .agg(
        pl.sum("a").alias("sum_a"),
        pl.min("a").alias("min_a"),
        pl.max("a").alias("max_a"),
    )
)

print(query.collect())
print(query.collect(post_opt_callback=use_cudf))
shape: (6, 4)
┌─────────────────────┬───────┬───────┬───────┐
│ dt                  ┆ sum_a ┆ min_a ┆ max_a │
│ ---                 ┆ ---   ┆ ---   ┆ ---   │
│ datetime[μs]        ┆ i64   ┆ i64   ┆ i64   │
╞═════════════════════╪═══════╪═══════╪═══════╡
│ 2020-01-01 13:45:48 ┆ 3     ┆ 3     ┆ 3     │
│ 2020-01-01 16:42:13 ┆ 10    ┆ 3     ┆ 7     │
│ 2020-01-01 16:45:09 ┆ 15    ┆ 3     ┆ 7     │
│ 2020-01-02 18:12:48 ┆ 24    ┆ 3     ┆ 9     │
│ 2020-01-03 19:45:32 ┆ 11    ┆ 2     ┆ 9     │
│ 2020-01-08 23:16:43 ┆ 1     ┆ 1     ┆ 1     │
└─────────────────────┴───────┴───────┴───────┘
---------------------------------------------------------------------------
ComputeError                              Traceback (most recent call last)
Cell In[158], line 33
     23 query = (
     24     df.rolling(index_column="dt", period="2d")
     25     .agg(
   (...)
     29     )
     30 )
     32 print(query.collect())
---> 33 print(query.collect(post_opt_callback=use_cudf))

File /raid/nicholasb/miniconda3/envs/all_cuda-122_arch-x86_64/lib/python3.11/site-packages/polars/lazyframe/frame.py:1942, in LazyFrame.collect(self, type_coercion, predicate_pushdown, projection_pushdown, simplify_expression, slice_pushdown, comm_subplan_elim, comm_subexpr_elim, cluster_with_columns, no_optimization, streaming, background, _eager, **_kwargs)
   1939 # Only for testing purposes atm.
   1940 callback = _kwargs.get("post_opt_callback")
-> 1942 return wrap_df(ldf.collect(callback))

ComputeError: 'cuda' conversion failed: NotImplementedError: rolling window/groupby
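The expected CPU result shown earlier can be cross-checked without Polars: with period="2d" and the default right-closed window, each output row aggregates the values whose timestamp lies in (t - 2d, t]. The brute-force sketch below (O(n^2), meant only as a reference for checking results; the helper name is hypothetical) reuses the data from the reproducer.

```python
from datetime import datetime, timedelta

# Same data as the reproducer: timestamps (already sorted) and column "a".
dates = [
    "2020-01-01 13:45:48",
    "2020-01-01 16:42:13",
    "2020-01-01 16:45:09",
    "2020-01-02 18:12:48",
    "2020-01-03 19:45:32",
    "2020-01-08 23:16:43",
]
a = [3, 7, 5, 9, 2, 1]
ts = [datetime.strptime(d, "%Y-%m-%d %H:%M:%S") for d in dates]


def rolling_sum_min_max(ts, values, period):
    """Brute-force rolling sum/min/max over the window (t - period, t]."""
    out = []
    for t in ts:
        # Left edge excluded, current row included (right-closed window).
        window = [v for s, v in zip(ts, values) if t - period < s <= t]
        out.append((sum(window), min(window), max(window)))
    return out


results = rolling_sum_min_max(ts, a, timedelta(days=2))
for t, (sum_a, min_a, max_a) in zip(ts, results):
    print(t, sum_a, min_a, max_a)
```

Each tuple matches the corresponding sum_a, min_a, max_a row of the CPU output, which gives a reference to compare the GPU engine's result against once rolling is supported.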
beckernick added the feature request label Jul 2, 2024
mroeschke added the cudf.polars label Jul 2, 2024
Projects
Status: In Progress