Dataset with streaming doesn't work with proxy #6992

YHL04 opened this issue Jun 22, 2024 · 1 comment

YHL04 commented Jun 22, 2024

Describe the bug

I'm trying to stream a dataset with datasets because it is too large to download in full, but it hangs indefinitely without loading the first batch. I run on AIMOS, a supercomputer that reaches the internet through a proxy, so I assume the problem is related to the network configuration. I have already set both HTTP_PROXY and HTTPS_PROXY. Loading with streaming=False works fine.
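To confirm that the proxy variables are actually visible inside the Python process (and not only in the login shell), a small check like the one below may help. It is only a sketch: the helper name `proxy_env_report` is made up for illustration, and it reports both upper-case and lower-case spellings because some tools read only one of them.

```python
import os

# Proxy-related variable names; each is checked in upper and lower case.
PROXY_VARS = ("HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY")


def proxy_env_report(environ=None):
    """Return a dict mapping each proxy variable spelling to its value (or None).

    `environ` defaults to os.environ; passing a plain dict makes the
    function easy to try out without touching the real environment.
    """
    env = os.environ if environ is None else environ
    report = {}
    for name in PROXY_VARS:
        for key in (name, name.lower()):
            report[key] = env.get(key)
    return report


if __name__ == "__main__":
    for key, value in proxy_env_report().items():
        print(f"{key}={value!r}")
```

Running this inside the same job script that launches training shows whether the compute node sees the same proxy settings as the login node.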

Steps to reproduce the bug

Call load_dataset with streaming=True on AIMOS.
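Because the symptom is an indefinite hang, it can be useful to wrap the reproduction in a timeout so that it reports a clear result instead of blocking the job. The following is a minimal sketch using only the standard library; `first_item_or_timeout` is a hypothetical helper, not part of datasets, and the dataset name in the usage comment is a placeholder.

```python
import queue
import threading


def first_item_or_timeout(make_iterable, timeout_s=60.0):
    """Try to fetch the first item of an iterable within `timeout_s` seconds.

    `make_iterable` is a zero-argument callable that builds the iterable; it
    runs inside the worker thread, so a hang while building it is covered too.
    Returns ("ok", item), ("error", exception) or ("timeout", None).
    """
    result = queue.Queue(maxsize=1)

    def worker():
        try:
            result.put(("ok", next(iter(make_iterable()))))
        except Exception as exc:  # includes StopIteration for empty iterables
            result.put(("error", exc))

    # Daemon thread: a worker that is stuck forever will not block interpreter exit.
    threading.Thread(target=worker, daemon=True).start()
    try:
        return result.get(timeout=timeout_s)
    except queue.Empty:
        return ("timeout", None)


# Usage with datasets (assumes the library is installed; the name is a placeholder):
#   from datasets import load_dataset
#   print(first_item_or_timeout(
#       lambda: load_dataset("<dataset-name>", split="train", streaming=True),
#       timeout_s=120.0,
#   ))
```

A "timeout" result with streaming=True, next to a successful non-streaming load, would match the behavior described in this report.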

Expected behavior

load_dataset does not hang and yields the first batches so that the training run can start.

Environment info

_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
_pytorch_select 2.0 cuda_2
abseil-cpp 20220623.0 h9888cd1_6 conda-forge
absl-py 1.0.0 py311h399429b_0
aiofiles 23.2.1 pyhd8ed1ab_0 conda-forge
aiohttp 3.8.6 py311hf118e41_0
aiosignal 1.2.0 pyhd3eb1b0_0
archspec 0.2.3 pyhd8ed1ab_0 conda-forge
arrow-cpp 11.0.0 ha3edaa6_5_cpu conda-forge
async-timeout 4.0.2 py311h6ffa863_0
attrs 23.1.0 py311h6ffa863_0
av 10.0.0 py311he6153ed_2
aws-c-auth 0.6.24 hb81f6d7_5 conda-forge
aws-c-cal 0.5.20 h3c2b4d9_6 conda-forge
aws-c-common 0.8.11 h4194056_0 conda-forge
aws-c-compression 0.2.16 ha19333d_3 conda-forge
aws-c-event-stream 0.2.18 h12a9399_6 conda-forge
aws-c-http 0.7.4 ha2cde00_2 conda-forge
aws-c-io 0.13.17 h9189062_2 conda-forge
aws-c-mqtt 0.8.6 h40d1a04_6 conda-forge
aws-c-s3 0.2.4 hbdbe4f0_3 conda-forge
aws-c-sdkutils 0.1.7 ha19333d_3 conda-forge
aws-checksums 0.1.14 ha19333d_3 conda-forge
aws-crt-cpp 0.19.7 hd018011_7 conda-forge
aws-sdk-cpp 1.10.57 hb9575ba_4 conda-forge
blas 1.0 openblas
blinker 1.8.2 pyhd8ed1ab_0 conda-forge
boltons 23.0.0 py311h6ffa863_0
boost-cpp 1.82.0 h25e6d66_2
bottleneck 1.3.5 py311h34f6284_0
brotli 1.0.9 hf118e41_7
brotli-bin 1.0.9 hf118e41_7
brotli-python 1.0.9 py311h4a02239_7
bzip2 1.0.8 h7b6447c_0
c-ares 1.19.1 hf118e41_0
ca-certificates 2024.6.2 h0f6029e_0 conda-forge
cachetools 5.3.3 pyhd8ed1ab_0 conda-forge
certifi 2024.6.2 pyhd8ed1ab_0 conda-forge
cffi 1.15.1 py311hf118e41_3
charset-normalizer 2.0.4 pyhd3eb1b0_0
click 8.1.7 unix_pyh707e725_0 conda-forge
conda 24.5.0 py311h1af927a_0 conda-forge
conda-content-trust 0.2.0 py311h6ffa863_0
conda-libmamba-solver 23.11.1 py311h6ffa863_0
conda-package-handling 2.2.0 py311h6ffa863_0
conda-package-streaming 0.9.0 py311h6ffa863_0
contourpy 1.0.5 py311h25e6d66_0
cryptography 41.0.3 py311hb0e80e7_0
cudatoolkit 11.8.0 hedcfb66_13 conda-forge
cudnn 8.9.2_11.8 h9ceb136_1
cycler 0.11.0 pyhd3eb1b0_0
datasets 2.12.0 py311h6ffa863_0
dill 0.3.6 py311h6ffa863_0
distro 1.9.0 pyhd8ed1ab_0 conda-forge
ffmpeg 4.2.2 opence_0
filelock 3.9.0 py311h6ffa863_0
fmt 9.1.0 h25e6d66_0
fonttools 4.25.0 pyhd3eb1b0_0
freetype 2.12.1 hd23a775_0
frozendict 2.4.4 py311hb02d432_0 conda-forge
frozenlist 1.4.0 py311hf118e41_0
fsspec 2023.9.2 py311h6ffa863_0
gflags 2.2.2 he6710b0_0
giflib 5.2.1 hf118e41_3
glog 0.6.0 hbe088e0_0 conda-forge
gmp 6.3.0 h46f38da_0 conda-forge
gmpy2 2.1.5 py311h2758da7_1 conda-forge
google-auth 2.30.0 pyhff2d567_0 conda-forge
google-auth-oauthlib 0.5.3 pyhd8ed1ab_0 conda-forge
grpc-cpp 1.51.1 h8ba971d_1 conda-forge
grpcio 1.54.3 py311h414e0d3_0
huggingface_hub 0.17.3 py311h6ffa863_0
icu 73.1 h4a02239_0
idna 3.4 py311h6ffa863_0
importlib-metadata 6.0.0 py311h6ffa863_0
jinja2 3.1.4 pyhd8ed1ab_0 conda-forge
jpeg 9e hf118e41_1
jsonpatch 1.32 pyhd3eb1b0_0
jsonpointer 2.1 pyhd3eb1b0_0
kiwisolver 1.4.4 py311h4a02239_0
krb5 1.20.1 hc019ccd_1
lame 3.100 hb283c62_1003 conda-forge
lcms2 2.12 h2045e0b_0
ld_impl_linux-ppc64le 2.38 hec883e6_1
lerc 3.0 h29c3540_0
leveldb 1.23 h24532b4_1 conda-forge
libabseil 20220623.0 cxx17_h9235812_6 conda-forge
libarchive 3.6.2 hd8ab008_2
libarrow 11.0.0 h837770b_5_cpu conda-forge
libboost 1.82.0 haf51a6a_2
libbrotlicommon 1.0.9 hf118e41_7
libbrotlidec 1.0.9 hf118e41_7
libbrotlienc 1.0.9 hf118e41_7
libcrc32c 1.1.2 h3b9df90_0 conda-forge
libcurl 8.4.0 h4d62439_0
libdeflate 1.17 hf118e41_1
libedit 3.1.20221030 hf118e41_0
libev 4.33 h140841e_1
libevent 2.1.10 h19c23f1_4 conda-forge
libexpat 2.6.2 h46f38da_0 conda-forge
libffi 3.4.4 h4a02239_0
libgcc-ng 13.2.0 h31e42bb_10 conda-forge
libgfortran-ng 11.2.0 hb3889a9_1
libgfortran5 11.2.0 h1234567_1
libgomp 13.2.0 h31e42bb_10 conda-forge
libgoogle-cloud 2.7.0 h11140b6_1 conda-forge
libgrpc 1.51.1 h4d29a31_1 conda-forge
libmamba 1.5.3 h7c6fafd_0
libmambapy 1.5.3 py311h828bf7b_0
libnghttp2 1.57.0 h44e5816_0
libnsl 2.0.1 ha17a0cc_0 conda-forge
libopenblas 0.3.23 hc5a31fb_2
libopus 1.3.1 h4e0d66e_1 conda-forge
libpng 1.6.39 hf118e41_0
libprotobuf 3.21.12 h1776448_0
libsolv 0.7.24 h0f529ac_0
libsqlite 3.45.3 hd4bbf49_0 conda-forge
libssh2 1.10.0 h50fa78f_2
libstdcxx-ng 13.2.0 h262982c_10 conda-forge
libthrift 0.18.0 h82f1162_0 conda-forge
libtiff 4.5.1 h4a02239_0
libutf8proc 2.8.0 hb283c62_0 conda-forge
libuuid 2.38.1 h4194056_0 conda-forge
libvpx 1.13.1 h46f38da_0 conda-forge
libwebp 1.3.2 h0f96ee2_0
libwebp-base 1.3.2 hf118e41_0
libxcrypt 4.4.36 ha17a0cc_1 conda-forge
libxml2 2.10.4 h18e3229_1
libzlib 1.2.13 h1f2b957_6 conda-forge
llvm-openmp 14.0.6 hc028133_0
lmdb 0.9.31 ha17a0cc_1 conda-forge
lz4-c 1.9.4 h4a02239_0
markdown 3.4.4 pyhd8ed1ab_0 conda-forge
markupsafe 2.1.5 py311h32d8acf_0 conda-forge
matplotlib 3.8.0 py311h6ffa863_0
matplotlib-base 3.8.0 py311h52e1fcc_0
menuinst 2.1.1 py311h1af927a_0 conda-forge
mpc 1.3.1 heaf1863_0 conda-forge
mpfr 4.2.1 haad2271_1 conda-forge
mpmath 1.3.0 pyhd8ed1ab_0 conda-forge
multidict 6.0.2 py311hf118e41_0
multiprocess 0.70.14 py311h6ffa863_0
munkres 1.1.4 py_0
mypy_extensions 1.0.0 pyha770c72_0 conda-forge
nccl 2.18.3 cuda11.8_1
ncurses 6.4 h4a02239_0
nest-asyncio 1.6.0 pyhd8ed1ab_0 conda-forge
networkx 2.8.8 pyhd8ed1ab_0 conda-forge
nomkl 3.0 0
numactl 2.0.16 hba61f60_1
numexpr 2.8.7 py311hc46fc55_0
numpy 1.24.3 py311h148a09e_0
numpy-base 1.24.3 py311h06b82f6_0
oauthlib 3.2.2 pyhd8ed1ab_0 conda-forge
openjpeg 2.4.0 hfe35807_0
openssl 3.3.1 h1f2b957_0 conda-forge
orc 1.8.2 h341c9a4_2 conda-forge
packaging 23.1 py311h6ffa863_0
pandas 2.1.1 py311h52e1fcc_0
pcre2 10.42 h280155c_0
pillow 10.0.1 py311he33076b_0
pip 23.3 py311h6ffa863_0
platformdirs 4.2.2 pyhd8ed1ab_0 conda-forge
pluggy 1.0.0 py311h6ffa863_1
pooch 1.8.2 pyhd8ed1ab_0 conda-forge
protobuf 4.21.12 py311ha7baec7_1
psutil 5.9.8 py311hd26027c_0 conda-forge
pyarrow 11.0.0 py311h04a18d5_1
pyasn1 0.6.0 pyhd8ed1ab_0 conda-forge
pyasn1-modules 0.4.0 pyhd8ed1ab_0 conda-forge
pybind11-abi 4 hd3eb1b0_1
pycosat 0.6.6 py311hf118e41_0
pycparser 2.21 pyhd3eb1b0_0
pyjwt 2.8.0 pyhd8ed1ab_1 conda-forge
pyopenssl 23.2.0 py311h6ffa863_0
pyparsing 3.0.9 py311h6ffa863_0
pyre-extensions 0.0.30 pyhd8ed1ab_0 conda-forge
pysocks 1.7.1 py311h6ffa863_0
python 3.11.8 h3332dee_0_cpython conda-forge
python-dateutil 2.8.2 pyhd3eb1b0_0
python-tzdata 2023.3 pyhd3eb1b0_0
python-xxhash 2.0.2 py311hf118e41_1
python_abi 3.11 4_cp311 conda-forge
pytorch 2.0.1 cuda11.8_py311_1
pytorch-base 2.0.1 cuda11.8_py311_pb4.21.12_4
pytz 2023.3.post1 py311h6ffa863_0
pyu2f 0.1.5 pyhd8ed1ab_0 conda-forge
pyyaml 6.0.1 py311hf118e41_0
re2 2023.02.01 h883269e_0 conda-forge
readline 8.2 hf118e41_0
regex 2023.10.3 py311hf118e41_0
reproc 14.2.4 h29c3540_1
reproc-cpp 14.2.4 h29c3540_1
requests 2.31.0 py311h6ffa863_0
requests-oauthlib 2.0.0 pyhd8ed1ab_0 conda-forge
responses 0.13.3 pyhd3eb1b0_0
rsa 4.9 pyhd8ed1ab_0 conda-forge
ruamel.yaml 0.17.21 py311hf118e41_0
s2n 1.3.37 h5e47323_0 conda-forge
safetensors 0.4.0 py311hda16d9e_0
scipy 1.11.1 py311hd69e9bb_0
sentencepiece 0.1.97 h1e74c73_py311_pb4.21.12_2
setuptools 68.0.0 py311h6ffa863_0
six 1.16.0 pyhd3eb1b0_1
snappy 1.1.9 h29c3540_0
sqlite 3.41.2 hf118e41_0
sympy 1.12.1 pypyh2585a3b_103 conda-forge
tabulate 0.8.10 pyhd8ed1ab_0 conda-forge
tensorboard 2.13.0 pyhab0730d_pb4.21.12_1
tensorboard-data-server 0.7.0 pyh6f84499_1
tensorboard-plugin-wit 1.6.0 pyh9f0ad1d_0 conda-forge
tk 8.6.13 hd4bbf49_0 conda-forge
tokenizers 0.13.3 py311h3d4f45a_0
torchdata 0.6.0 py311_2
torchsnapshot 0.1.0 pyhd8ed1ab_0 conda-forge
torchtext-base 0.15.2 cuda11.8_py311_1
torchtnt 0.2.4 pyhd8ed1ab_0 conda-forge
torchvision-base 0.15.2 cuda11.8_py311_1
tornado 6.3.3 py311hf118e41_0
tqdm 4.65.0 py311h7837921_0
transformers 4.32.1 py311h6ffa863_0
truststore 0.8.0 py311h6ffa863_0
typing-extensions 4.7.1 py311h6ffa863_0
typing_extensions 4.7.1 py311h6ffa863_0
typing_inspect 0.9.0 pyhd8ed1ab_0 conda-forge
tzdata 2023c h04d1e81_0
urllib3 1.26.18 py311h6ffa863_0
utf8proc 2.6.1 h140841e_0
werkzeug 2.3.8 pyhd8ed1ab_0 conda-forge
wheel 0.41.2 py311h6ffa863_0
xxhash 0.8.0 h140841e_3
xz 5.4.2 hf118e41_0
yaml 0.2.5 h7b6447c_0
yaml-cpp 0.8.0 h4a02239_0
yarl 1.8.1 py311hf118e41_0
zipp 3.11.0 py311h6ffa863_0
zlib 1.2.13 h1f2b957_6 conda-forge
zstandard 0.19.0 py311hf118e41_0
zstd 1.5.5 h57e4825_0

lhoestq commented Jun 25, 2024

Hi! Can you try updating datasets and huggingface_hub?

pip install -U datasets huggingface_hub
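After upgrading, one way to confirm which versions the environment actually picks up (the listing above shows datasets 2.12.0 and huggingface_hub 0.17.3) is to query the installed package metadata. This is a sketch using the standard library; `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(["datasets", "huggingface_hub"]).items():
        print(f"{name}: {version}")
```

In a conda environment where the packages were first installed with conda, this also helps verify that the pip upgrade took effect in the interpreter used for training.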
