[Data] ArrowVariableShapedTensorArray with LargeListArray #46434
Labels
data
Ray Data-related issues
enhancement
Request for new feature and/or capability
P1
Issue that should be fixed within a few weeks
Description
The current implementation only allows creating ArrowVariableShapedTensorArray objects with at most (2^31)-1 elements, because it uses PyArrow's ListArray (in ray.air.util.tensor_extensions.arrow, L812), which uses 32-bit offsets for indexing. As a result, storing data such as long time series that contain more elements than 32-bit offsets can address causes an overflow.
Providing the option to replace ListArray with PyArrow's LargeListArray would allow storing arrays with up to (2^63)-1 elements. (Note: this would also require changing the OFFSET_DTYPE in L722.)
Use case
The goal is to be able to store long time series in Arrow format (for example, long audio recordings or audio with high sampling rates).