What happened:
I compacted a table, and it replaced a set of files with a new identically-sized set of files.
What you expected to happen:
I expect compaction to do no work if it will not reduce the file count. Combined with #2576, this means there is nothing I can do to get a table into a state where I am sure that compaction is a no-op (incidentally, the example below also reproduces #2576, showing that it can take more than one compaction to get a table into a minimal state).
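Until that changes, the only client-side guard I can see is a loop. A minimal sketch of one, assuming only the documented Python API (`DeltaTable.files()` and `optimize.compact()`); `compact_until_stable` and `open_table` are hypothetical names of mine, not part of deltalake:

```python
def compact_until_stable(open_table, target_size, max_passes=10):
    """Compact repeatedly until a pass no longer reduces the file count.

    `open_table` is a zero-argument callable that re-opens the table,
    e.g. lambda: deltalake.DeltaTable('./storageloop-table').
    Returns the final file count.
    """
    prev_count = None
    for _ in range(max_passes):
        dt = open_table()
        count = len(dt.files())
        if prev_count is not None and count >= prev_count:
            # The previous pass did not shrink the table; stop here.
            return count
        prev_count = count
        dt.optimize.compact(target_size=target_size)
    return prev_count
```

Note that this only detects the plateau after the fact, so given the behavior reported here it still pays for at least one redundant rewrite of the final files.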
How to reproduce it:
```python
import deltalake
import pyarrow as pa

for z in range(10):
    deltalake.write_deltalake(
        './storageloop-table',
        pa.Table.from_pydict(
            {
                "x": pa.array([x % 207 for x in range(1000000)]),
                "y": pa.array([x % 3008 for x in range(1000000)]),
                "z": pa.array([z for _ in range(1000000)]),
            }
        ),
        mode='append',
    )

for _ in range(5):
    dt = deltalake.DeltaTable('./storageloop-table')
    print(f"Table has {len(dt.files())} files pre-compaction")
    # use a small target_size for this toy example so we can
    # reproduce it with smaller data
    stats = dt.optimize.compact(target_size=2**21)
    print(f"Compaction added {stats['numFilesAdded']} files and removed {stats['numFilesRemoved']} files")
```
Outputs:
```
Table has 10 files pre-compaction
Compaction added 3 files and removed 9 files
Table has 4 files pre-compaction
Compaction added 2 files and removed 4 files
Table has 2 files pre-compaction
Compaction added 2 files and removed 2 files
Table has 2 files pre-compaction
Compaction added 2 files and removed 2 files
Table has 2 files pre-compaction
Compaction added 2 files and removed 2 files
```
More details:
Without having looked at the implementation, my guess is that the compaction algorithm decides it can merge the two files and issues a write of a single file to the table, and then some lower-level mechanism splits that write back up into two files.
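To make that guess concrete, here is a toy, pure-Python model of it — the bin-splitting at `target_size` boundaries is my assumption, not the actual delta-rs writer logic. Merging two files whose combined size exceeds `target_size` and then splitting the merged output at `target_size` boundaries yields two files again, so the file count never drops:

```python
TARGET = 2**21  # the target_size used in the reproduction above

def split_at_target(total_bytes, target=TARGET):
    """Split one logical write into files of at most `target` bytes each."""
    files = []
    while total_bytes > 0:
        files.append(min(total_bytes, target))
        total_bytes -= files[-1]
    return files

# Two ~1.2 MB files merge into a ~2.4 MB write, which exceeds target_size,
# so the writer splits it back into two files: no net reduction.
merged = 1_250_000 + 1_250_000
print(len(split_at_target(merged)))
```

Under this model the table is already at a fixed point, which would explain why every further compaction adds and removes exactly two files.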
Environment
Delta-rs version: 0.15.3
Binding: python
Environment: Python 3.9.16