Describe the bug
I'm trying to save a `Dataset` using the `save_to_disk()` function with:
- `num_proc > 1`
- `dataset_path` being an S3 bucket path, e.g. `"s3://{bucket_name}/{dataset_folder}/"`

The hf progress bar shows up but the saving does not seem to start.
When using one processor only (`num_proc=1`), everything works fine.
When saving the dataset on local disk (as opposed to an S3 bucket) with `num_proc > 1`, everything works fine.

Thank you for your help! :)
Steps to reproduce the bug
I tried without any storage options:
and with the specific s3fs storage options:
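The snippets for the two attempts are not included in this copy of the report. As a rough sketch only (not the reporter's original code), the two calls might look something like the following. It assumes the `save_to_disk()` keyword arguments `num_proc` and `storage_options`, and the s3fs credential keys `key` and `secret`; the helper names `s3_dataset_path` and `save_to_s3`, the bucket and folder names, and the credential values are hypothetical placeholders.

```python
from typing import Any, Dict, Optional


def s3_dataset_path(bucket_name: str, dataset_folder: str) -> str:
    """Build an s3:// URL in the shape described in the report."""
    return f"s3://{bucket_name}/{dataset_folder}/"


def save_to_s3(
    dataset: Any,
    bucket_name: str,
    dataset_folder: str,
    num_proc: int,
    storage_options: Optional[Dict[str, str]] = None,
) -> str:
    """Call save_to_disk() on a Dataset or DatasetDict with an S3 target.

    Returns the path that was passed to save_to_disk().
    """
    path = s3_dataset_path(bucket_name, dataset_folder)
    # num_proc > 1 together with an s3:// path is the combination
    # reported as never starting to write.
    dataset.save_to_disk(path, num_proc=num_proc, storage_options=storage_options)
    return path


# Attempt 1 (no storage options; credentials resolved from the environment):
#   save_to_s3(dataset, "my-bucket", "my-dataset", num_proc=2)
#
# Attempt 2 (explicit s3fs credentials; placeholder values):
#   save_to_s3(
#       dataset, "my-bucket", "my-dataset", num_proc=2,
#       storage_options={"key": "<aws_access_key_id>",
#                        "secret": "<aws_secret_access_key>"},
#   )
```

Switching `num_proc` to `1` in either call, or pointing the path at a local directory, gives the configurations that are described as working.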
I'm guessing I might be using the `storage_options` parameter wrongly, but I didn't find anything online that made it work.

NB: Behavior is the same when trying to save the whole `DatasetDict`.

Expected behavior
Progress bar fills in and saving is carried out.
Environment info
datasets==2.18.0