[FEA] Adjust read_json to allow reading byte ranges from source files >2 GB #16138

Open
GregoryKimball opened this issue Jun 30, 2024 · 0 comments
Labels: cuIO (cuIO issue), feature request (New feature or request), libcudf (Affects libcudf (C++/CUDA) code)


GregoryKimball commented Jun 30, 2024

Is your feature request related to a problem? Please describe.
When reading a byte range from a source file larger than 2 GB, cudf read_json throws:
CUDF failure at: /opt/conda/conda-bld/work/cpp/src/io/json/read_json.cu:311: The size of each source file must be less than INT_MAX bytes

Is it possible to adjust this exception to allow for byte range reading from large source files?

Describe the solution you'd like
Hopefully we can adjust the batching in read_json so that byte range reads smaller than 2 GB succeed even when the source file itself is larger than 2 GB.
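
To make the request concrete, here is a minimal Python sketch of the difference between limiting the source size and limiting the requested byte range. It is not the libcudf implementation: the function names are hypothetical, the "range size of 0 means read to the end" convention is an assumption, and the first check is only inferred from the wording of the error message.

```python
INT_MAX = 2**31 - 1  # limit named in the error message


def effective_range_size(source_size: int, range_offset: int, range_size: int) -> int:
    """Bytes actually requested from one source.

    Assumption: range_size == 0 means "read from range_offset to the end".
    """
    if source_size < 0 or range_offset < 0 or range_size < 0:
        raise ValueError("sizes and offsets must be non-negative")
    remaining = max(source_size - range_offset, 0)
    return remaining if range_size == 0 else min(range_size, remaining)


def check_source_size(source_size: int) -> None:
    """Check phrased like the reported error: rejects any large source outright."""
    if source_size >= INT_MAX:
        raise ValueError("The size of each source file must be less than INT_MAX bytes")


def check_requested_range(source_size: int, range_offset: int, range_size: int) -> int:
    """Hypothetical alternative: limit the requested bytes, not the whole source."""
    requested = effective_range_size(source_size, range_offset, range_size)
    if requested >= INT_MAX:
        raise ValueError("The requested byte range must be less than INT_MAX bytes")
    return requested


# A 1 GiB range taken from a 3 GiB source passes the range-based check,
# while the source-size check would reject the same read.
GIB = 2**30
requested = check_requested_range(3 * GIB, 2 * GIB, 1 * GIB)
```

With a check of this shape, only reads whose requested range is itself too large would fail, and those could still be handled by splitting the range into smaller batches.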

GregoryKimball added the feature request, cuIO, and libcudf labels on Jun 30, 2024
GregoryKimball added this to the Nested JSON reader milestone on Jun 30, 2024