Replies: 5 comments
-
Looking at the checkpoint code, I see this line: Am I misunderstanding what checkpoint does? Why is it being set to
-
@echai58 That change seems to have been made 3 years ago ^^, but the protocol mentions data_change=false when files were already present in the table, so I guess that's the reason in this context. Could you check what delta-spark does, and whether it preserves data_change as-is?
-
Yeah, in my eyes a checkpoint isn't really a transaction because it doesn't generate a commit file, so I feel it should preserve it. But yeah, I'll look into what the delta-spark behavior is, good call.
-
@ion-elgreco Seems like checkpointing with delta-spark also sets I brought this up because I have a use case where I'm interested in figuring out which partitions were edited in the past
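For the "which partitions were edited" use case, one possible workaround is to read the JSON commit files directly instead of relying on the flag stored in the checkpoint. The snippet below is only a rough sketch under a few assumptions: the table lives on a local filesystem, the relevant commit files have not yet been removed by log cleanup, and `partitions_with_data_change` is a hypothetical helper name, not part of delta-rs or delta-spark. It relies only on the log layout described in the Delta protocol (newline-delimited JSON actions in `_delta_log/<version>.json`, with `dataChange` and `partitionValues` on `add`/`remove` actions).

```python
import json
from pathlib import Path


def partitions_with_data_change(table_path):
    """Map commit version -> set of partitions touched by dataChange=true actions.

    Hypothetical helper for illustration; not an API of delta-rs or delta-spark.
    """
    log_dir = Path(table_path) / "_delta_log"
    changed = {}
    # Plain commits are named <zero-padded version>.json. Checkpoints (.parquet)
    # and files such as _last_checkpoint do not match and are ignored.
    for commit_file in sorted(log_dir.glob("*.json")):
        if not commit_file.stem.isdigit():
            continue
        version = int(commit_file.stem)
        for line in commit_file.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            # Both add and remove actions carry a dataChange flag.
            for kind in ("add", "remove"):
                body = action.get(kind)
                if body and body.get("dataChange"):
                    values = body.get("partitionValues") or {}
                    # Use a sorted tuple so the partition is hashable.
                    key = tuple(sorted(values.items()))
                    changed.setdefault(version, set()).add(key)
    return changed
```

Calling `partitions_with_data_change("path/to/table")` returns a dict keyed by commit version, so filtering by a version or time range would be a separate step on top of it. Since it never reads the checkpoint parquet, the result does not depend on what value the checkpoint writes for `dataChange`.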
-
Environment
Delta-rs version: 0.16.3
Binding: Python
Bug
What happened:
![image](https://private-user-images.githubusercontent.com/56415623/317480180-53de4a76-7cbb-4cbb-a510-87c7cce9a545.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjExNTQ1NjYsIm5iZiI6MTcyMTE1NDI2NiwicGF0aCI6Ii81NjQxNTYyMy8zMTc0ODAxODAtNTNkZTRhNzYtN2NiYi00Y2JiLWE1MTAtODdjN2NjZTlhNTQ1LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MTYlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzE2VDE4MjQyNlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTFlZGVlODEwNTE1ODJlODE5ODMwODFmMDc0ZjM5ODUxNGMxMmJiMDU3ZTI5ODdmNWIyZDFhM2NiNjBmYmRjZjUmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.Ua2hTVq96WzZLnrTx3WWLgyES3Y5UYQcsCgf-OONWus)
![image](https://private-user-images.githubusercontent.com/56415623/317481582-ea125344-a8ac-4077-a398-3e1d3b2b2b6b.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjExNTQ1NjYsIm5iZiI6MTcyMTE1NDI2NiwicGF0aCI6Ii81NjQxNTYyMy8zMTc0ODE1ODItZWExMjUzNDQtYThhYy00MDc3LWEzOTgtM2UxZDNiMmIyYjZiLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MTYlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzE2VDE4MjQyNlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWM5YzcyMjE0ZGUzNTZlMmFhZTZlYTYyMDhkNWZkNWUwYmNmNGNjYzMyZGM4NDRlYmQ5MTYwODg3OGIwYWZlMDQmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.dHzvfVc7OiLmAzycAtGWmhTtPnejqQ0iZplMTWp8zKk)
Before checkpointing, `table.get_add_actions(flatten=True).to_pandas()` looks like:
After checkpointing, the same call looks like:
which shows everything is unchanged, except `data_change` goes from True to False.
Examining the checkpoint parquet file, I see the following entry for `add`:
while the original commit file shows:
What you expected to happen:
Unless I'm misunderstanding what `checkpoint` does, I believe it is a bug that `dataChange` is being set to `False`, instead of preserving the value from the original commit file.
How to reproduce it: