Unable to specify columns with a dot in the name in predicate #2624

Open
emanueledomingo opened this issue Jun 26, 2024 · 1 comment
Labels
bug Something isn't working

emanueledomingo commented Jun 26, 2024

Environment

Delta-rs version:

How do I find the delta-rs version as a Python user?

Binding: 0.18.1

Environment:

  • OS: Ubuntu 22.04 LTS

Bug

What happened: I cannot use a predicate that references a column whose name contains a dot, such as " \"Product.Id\" = '1' ", when writing with the Rust engine. The name is interpreted as "Product"."Id" instead of "Product.Id".

What you expected to happen: The column name containing the dot is parsed correctly.

How to reproduce it:

import deltalake
import pyarrow as pa

ta = pa.Table.from_pydict(
    {
        "Product.Id": ['x-0', 'x-1', 'x-2', 'x-3'],
    }
)

fp = "./resources/path/to/table"

deltalake.write_deltalake(
    table_or_uri=fp,
    data=ta,
    partition_by=["Product.Id"],
    engine="rust",
    mode="overwrite",
    predicate="\"Product.Id\" = 'x-1'"
)

More details:

Here is the stack trace:

DeltaError                                Traceback (most recent call last)
Cell In[89], line 12
      4 ta = pa.Table.from_pydict(
      5     {
      6         "Product.Id": ['x-0', 'x-1', 'x-2', 'x-3'],
      7     }
      8 )
     10 fp = "./resources/path/to/table"
---> 12 deltalake.write_deltalake(
     13     table_or_uri=fp,
     14     data=ta,
     15     partition_by=["Product.Id"],
     16     engine="rust",
     17     mode="overwrite",
     18     predicate="\"Product.Id\" = 'x-1'"
     19 )

File ~/mambaforge/envs/delta/lib/python3.12/site-packages/deltalake/writer.py:304, in write_deltalake(table_or_uri, data, schema, partition_by, mode, file_options, max_partitions, max_open_files, max_rows_per_file, min_rows_per_group, max_rows_per_group, name, description, configuration, schema_mode, storage_options, partition_filters, predicate, large_dtypes, engine, writer_properties, custom_metadata)
    301     return
    303 data = RecordBatchReader.from_batches(schema, (batch for batch in data))
--> 304 write_deltalake_rust(
    305     table_uri=table_uri,
    306     data=data,
    307     partition_by=partition_by,
    308     mode=mode,
    309     table=table._table if table is not None else None,
    310     schema_mode=schema_mode,
    311     predicate=predicate,
    312     name=name,
    313     description=description,
    314     configuration=configuration,
    315     storage_options=storage_options,
    316     writer_properties=(
    317         writer_properties._to_dict() if writer_properties else None
    318     ),
    319     custom_metadata=custom_metadata,
    320 )
    321 if table:
    322     table.update_incremental()

DeltaError: Generic DeltaTable error: Schema error: No field named "Product"."Id". Valid fields are "88e03a2f-8d4f-407c-98de-cb67462708d2"."Product.Id".

It seems that the predicate parser splits the column name at the dot, and the SQL backend (DataFusion, I suppose) then interprets the first part as a table name.
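
To illustrate the suspected behavior, here is a minimal pure-Python sketch. It is not delta-rs or DataFusion code, and the helpers `naive_split` and `quote_aware_split` are hypothetical names made up for this example. It only shows the difference between splitting an identifier on every dot and splitting only on dots outside double quotes.

```python
def naive_split(identifier: str) -> list[str]:
    """Drop double quotes, then split on every dot (the suspected behavior)."""
    return identifier.replace('"', "").split(".")


def quote_aware_split(identifier: str) -> list[str]:
    """Split on dots only when they appear outside double quotes."""
    parts: list[str] = []
    current: list[str] = []
    in_quotes = False
    for ch in identifier:
        if ch == '"':
            # Toggle quoted state; the quote character itself is not kept.
            in_quotes = not in_quotes
        elif ch == "." and not in_quotes:
            # An unquoted dot separates a qualifier from the next part.
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


print(naive_split('"Product.Id"'))        # ['Product', 'Id']
print(quote_aware_split('"Product.Id"'))  # ['Product.Id']
```

With the naive split, "Product" ends up in the position where a table qualifier would normally go, which would match the `No field named "Product"."Id"` message above.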

emanueledomingo added the bug label Jun 26, 2024

emanueledomingo (Author) commented

I ran some further tests with other quoting styles and got:

  • If I use "`Product.Id` = 'x-1'", I get DeltaError: Generic DeltaTable error: Schema error: No field named Product.Id. Valid fields are "88e03a2f-8d4f-407c-98de-cb67462708d2"."Product.Id".
  • If I use "`\"Product.Id\"` = 'x-1'", I get DeltaError: Generic DeltaTable error: Schema error: No field named """Product.Id""" Valid fields are "Product.Id".
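
For anyone who wants to re-run these trials, here is a small sketch of a harness that tries each quoting style and records the error message. `try_predicates` and `write_with_predicate` are hypothetical helper names introduced for this example; the `write_deltalake` call itself is the same one as in the reproduction above and needs `deltalake` and `pyarrow` installed.

```python
from typing import Callable, Dict, Iterable, Optional

# The three quoting styles tried in this issue.
PREDICATES = [
    "\"Product.Id\" = 'x-1'",
    "`Product.Id` = 'x-1'",
    "`\"Product.Id\"` = 'x-1'",
]


def try_predicates(
    write: Callable[[str], None], predicates: Iterable[str]
) -> Dict[str, Optional[str]]:
    """Call write(predicate) for each predicate; map it to None or the error text."""
    results: Dict[str, Optional[str]] = {}
    for predicate in predicates:
        try:
            write(predicate)
            results[predicate] = None
        except Exception as exc:  # record any failure instead of stopping
            results[predicate] = str(exc)
    return results


def write_with_predicate(predicate: str) -> None:
    """Same call as in the reproduction; imports are local so the harness loads without them."""
    import deltalake
    import pyarrow as pa

    ta = pa.Table.from_pydict({"Product.Id": ["x-0", "x-1", "x-2", "x-3"]})
    deltalake.write_deltalake(
        table_or_uri="./resources/path/to/table",
        data=ta,
        partition_by=["Product.Id"],
        engine="rust",
        mode="overwrite",
        predicate=predicate,
    )
```

Calling `try_predicates(write_with_predicate, PREDICATES)` returns a dict keyed by predicate, with `None` for a write that succeeded and the error text otherwise.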
