Concurrent writers return an error despite successfully writing to the file #2609
Comments
I think the information you are missing is that in a Delta table, writes happen in two stages: (1) write the data files (Parquet), then (2) commit to the transaction log. The second writer detects that the table already exists and fails at step (2). The files created in step (1) that were part of the failed transaction can be cleaned up with the VACUUM operation.
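To make the two-stage behavior concrete, here is a minimal sketch (assuming an unpartitioned table at a local path; `table_path` is illustrative, not from the issue) that compares the Parquet files on disk with the files the transaction log actually references; the difference is what a failed step (2) leaves behind:

```python
import os
from deltalake import DeltaTable

table_path = "/tmp/my_delta_table"  # hypothetical local, unpartitioned table

dt = DeltaTable(table_path)
committed = set(dt.files())  # data files referenced by the transaction log
on_disk = {f for f in os.listdir(table_path) if f.endswith(".parquet")}

# Files written in step (1) whose commit failed in step (2):
orphaned = on_disk - committed
print(orphaned)
```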
Thank you for the clarification. I have gone through the transaction and vacuum documentation. Following the example, when I try to vacuum, it returns an empty list. Is there something I am missing to tell it to clean up the files from the failed transaction?
You should read the documentation for the vacuum method, particularly https://delta-io.github.io/delta-rs/api/delta_table/#deltalake.DeltaTable.vacuum
After going through the documentation I have set the … There are a total of 3 Parquet files generated by the sample program in the path, and I observed that the …
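For reference, a minimal sketch of the vacuum calls the documentation describes; the exact parameter values used in the comment above were lost, so these are illustrative:

```python
from deltalake import DeltaTable

dt = DeltaTable("/tmp/my_delta_table")  # hypothetical local table

# Dry run (the default): only lists the files that would be deleted.
print(dt.vacuum())

# Actually delete files, overriding the default 7-day retention period.
deleted = dt.vacuum(
    retention_hours=0,
    dry_run=False,
    enforce_retention_duration=False,
)
print(deleted)
```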
I have the same issue here. Even if I vacuum the files, or delete and recreate the table, I get the same error: the data gets appended but the transaction-log commit fails.
Environment
Delta-rs version: 0.18.1
Binding: rust
Environment:
Bug
What happened: I have multiple writers running as Python processes, all writing to the same table location. Each writer is responsible for creating a pandas dataframe and writing it to that exact location. There are 4 different scenarios for these writes; the error cases observed are:
2.1. Delta transaction failed, version 0 already exists
2.2. Generic error: A Delta Lake table already exists at that location
3.1. Delta table already exists, write mode set to error
When a process throws error 2.1 or 2.2, I expect its write to fail. But when inspecting the table I observe that the writer's data has been appended anyway.
For the reproducible example below, suppose process-1 and process-2 hit error 2.2. I expected only the data from process 3 to be present in the table, but the table contained data from all three writers.
What you expected to happen: When a writer throws an error, I expect its write to fail. When multiple writers write to the same location, only one should succeed and the other writes should fail and return an error.
How to reproduce it:
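The original snippet did not survive extraction; below is a minimal sketch of the setup described above, assuming three processes racing to create the same table (`table_path` and the dataframe contents are illustrative):

```python
import multiprocessing as mp

import pandas as pd
from deltalake import write_deltalake

table_path = "/tmp/concurrent_delta_table"  # hypothetical local table

def writer(process_id: int) -> None:
    df = pd.DataFrame({"writer": [f"process {process_id}"]})
    try:
        # Default mode is "error": the write should fail if the table exists.
        write_deltalake(table_path, df)
        print(f"process {process_id}: succeeded")
    except Exception as exc:
        print(f"process {process_id}: failed: {exc}")

if __name__ == "__main__":
    procs = [mp.Process(target=writer, args=(i,)) for i in (1, 2, 3)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

With this race, the losing processes report error 2.1 or 2.2, yet their Parquet files can still end up under table_path.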
More details:
Regarding the Generic error: A Delta Lake table already exists at that location, I believe this is handled in crates/core/src/logstore/mod.rs, and that function is called from crates/core/src/operations/create.rs. The Delta table already exists, write mode set to error case is handled on the Python side in python/deltalake/writer.py.
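In outline, the Python-side check works roughly like the following sketch; this is a simplified paraphrase, not the actual implementation, and the helper names are illustrative:

```python
from deltalake import DeltaTable
from deltalake.exceptions import TableNotFoundError

def existing_table(table_uri: str):
    # Probe for an existing table; None means a new table would be created.
    try:
        return DeltaTable(table_uri)
    except TableNotFoundError:
        return None

def check_mode(table_uri: str, mode: str) -> None:
    # Hypothetical stand-in for the pre-write check in writer.py.
    if existing_table(table_uri) is not None and mode == "error":
        raise FileExistsError(
            "Delta table already exists, write mode set to error"
        )
```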