Describe the bug
I'm posting here because I'm still not sure what the issue is, or whether I'm using IterableDatasets incorrectly.
I'm following the guide at https://huggingface.co/learn/cookbook/en/fine_tuning_code_llm_on_single_gpu pretty much to a tee, and I have verified that it works when fine-tuning on the provided dataset.
However, I apply some data preprocessing steps (filtering out entries), and when I swap in my own dataset, training fails. I eventually worked around this by simply setting streaming=False in load_dataset.
Could this be some sort of network / firewall issue on my end?
Steps to reproduce the bug
I made a post with a more detailed description of how I reproduced this problem before I found my workaround: https://discuss.huggingface.co/t/problem-with-custom-iterator-of-streaming-dataset-not-returning-anything/94551
Here is the problematic dataset snippet, which works with streaming=False (and with the buffer_size keyword removed from shuffle)
The annoying part is that it only fails once training is running, and I can't predict when it will fail, except that it always fails during evaluation.
Expected behavior
The expected behavior is that I should get something from the iterator when it is called, instead of getting nothing or getting stuck in a loop somewhere.
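To tell "the iterator yields nothing" apart from "the iterator hangs", it may help to probe it outside the trainer. The following is only a minimal diagnostic sketch, not part of the cookbook or of the datasets library: probe_iterable and its parameters are hypothetical names, and it assumes the object being checked is an ordinary Python iterable (which a streaming dataset should be).

```python
import itertools
import threading
from typing import Any, Iterable, List, Tuple


def probe_iterable(
    iterable: Iterable[Any], n: int = 3, timeout_s: float = 30.0
) -> Tuple[str, List[Any]]:
    """Try to pull up to `n` items from `iterable` in a worker thread.

    Returns (status, items), where status is:
      "ok"      - at least one item arrived before the iterator stopped
      "empty"   - the iterator finished without yielding anything
      "timeout" - the worker was still blocked after `timeout_s` seconds
    (hypothetical helper, for debugging only)
    """
    items: List[Any] = []
    errors: List[BaseException] = []

    def worker() -> None:
        try:
            # islice stops after n items, so an infinite iterator is fine here.
            for item in itertools.islice(iterable, n):
                items.append(item)
        except Exception as exc:  # surface iterator errors to the caller
            errors.append(exc)

    # Daemon thread: a hung iterator will not keep the process alive.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)

    if thread.is_alive():
        return "timeout", list(items)
    if errors:
        raise errors[0]
    return ("ok" if items else "empty"), list(items)
```

Calling it on the filtered and shuffled streaming dataset, and again on the wrapped dataset that is handed to the trainer, should show which stage stops producing samples. A "timeout" result would point more towards a stalled download or an endless loop, while "empty" would suggest that the filter removed every entry.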
Environment info
datasets version: 2.20.0
huggingface_hub version: 0.23.4
fsspec version: 2024.5.0