
WikiText-2 is not a zip file #2588

Open
CharryLee0426 opened this issue Mar 7, 2024 · 3 comments

Comments

@CharryLee0426

When I executed the following part:

from d2l import torch as d2l

batch_size, max_len = 512, 64
train_iter, vocab = d2l.load_data_wiki(batch_size, max_len)
from d2l import mxnet as d2l

batch_size, max_len = 512, 64
train_iter, vocab = d2l.load_data_wiki(batch_size, max_len)

I got this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/charry/miniconda3/envs/d2l/lib/python3.9/site-packages/d2l/torch.py", line 2443, in load_data_wiki
    data_dir = d2l.download_extract('wikitext-2', 'wikitext-2')
  File "/home/charry/miniconda3/envs/d2l/lib/python3.9/site-packages/d2l/torch.py", line 3247, in download_extract
    fp = zipfile.ZipFile(fname, 'r')
  File "/home/charry/miniconda3/envs/d2l/lib/python3.9/zipfile.py", line 1266, in __init__
    self._RealGetContents()
  File "/home/charry/miniconda3/envs/d2l/lib/python3.9/zipfile.py", line 1333, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file

I think this is because the dataset on the server has been damaged. I reproduced this error with d2l 1.0.0 through 1.0.3, and it occurs wherever the WikiText-2 dataset is needed.

One of my pull requests failed its checks because of this error. I also noticed that some pull requests fixing typos failed their checks for the same reason.

I hope this error can be fixed as soon as possible.

@CharryLee0426

The wikitext-2 dataset URL returns this error:

<Error>
<Code>AccessDenied</Code>
<Message>Access Denied</Message>
<RequestId>MM9XHEKPABYT4NPW</RequestId>
<HostId>KOjOK6r2VNkvN6gS28B7s2akq8hULUJohhsiCnyrL9RMzjk3RAIvYnVZiHGd6PPVEIDnQHTijnI=</HostId>
</Error>
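
Since the URL returns this XML instead of the archive, the file saved in the local cache is probably this error body rather than a zip, which would explain the BadZipFile above. A minimal sketch for checking that locally, using only the standard library. `describe_download` is a hypothetical helper name, not part of d2l, and the path in the usage note assumes d2l's default `../data` cache directory:

```python
import zipfile
from pathlib import Path


def describe_download(path, preview_bytes=200):
    """Report whether `path` is a real zip archive; if not, preview its contents.

    Hypothetical diagnostic helper, not part of d2l.
    """
    path = Path(path)
    if zipfile.is_zipfile(path):
        return "valid zip archive"
    # Not a zip: show the first bytes, which often reveal an HTML/XML error page.
    head = path.read_bytes()[:preview_bytes]
    return "not a zip archive; file starts with: " + head.decode("utf-8", errors="replace")
```

For example, `print(describe_download("../data/wikitext-2-v1.zip"))` (path is an assumption). If the preview shows the `<Error>` body, the cached file is the server's error response and not a corrupted archive.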

@donny-nyc

Having the same issue. Is there an updated URL we can use?

@MassEast

Same issue here. According to the book, the dataset is from

Merity, S., Xiong, C., Bradbury, J., & Socher, R. (2016). Pointer sentinel mixture models. ArXiv:1609.07843.

That paper links to http://metamind.io/research/the-wikitext-long-term-dependency-language-modeling-dataset/, and this site can no longer be reached. Likewise, https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-v1.zip is no longer accessible. Does anyone have a good mirror for this?
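
If someone does find a mirror, it would be worth checking the copy before trusting it. A rough sketch of a SHA-1 check, assuming (not verified here) that `d2l.DATA_HUB['wikitext-2']` is a `(url, sha1)` pair whose second element can serve as the expected hash. `sha1_of_file` and `matches_expected` are hypothetical helper names:

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of the file at `path`, read in chunks."""
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        # Read fixed-size chunks until EOF so large archives are not loaded at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha1.update(chunk)
    return sha1.hexdigest()


def matches_expected(path, expected_sha1):
    """Compare a local file's SHA-1 against an expected hex digest (case-insensitive)."""
    return sha1_of_file(path) == expected_sha1.lower()
```

Usage would be something like `matches_expected("wikitext-2-v1.zip", expected)`, where `expected` is the hash taken from the d2l source (an assumption about where it lives).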
