Bad request when continuations query for zh.wikipedia #162

Open
arbalest339 opened this issue Jan 18, 2021 · 0 comments
@arbalest339

Hello, I have a problem when querying zh.wikipedia. Here are my code and console output:

```python
import wptools
page = wptools.page('西安', lang='zh')
page.get_query(proxy='http://127.0.0.1:1080')   # local proxy
```

```
zh.wikipedia.org (query) 西安
zh.wikipedia.org (query) 西安市 (&plcontinue=7536|0|炮里街道)
Traceback (most recent call last):
  File "d:\software\Anaconda\lib\site-packages\wptools\core.py", line 199, in _load_response
    data = utils.json_loads(response)
  File "d:\software\Anaconda\lib\site-packages\wptools\utils.py", line 95, in json_loads
    return json.loads(data, encoding='utf-8')
  File "d:\software\Anaconda\lib\json\__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "d:\software\Anaconda\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "d:\software\Anaconda\lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "d:\software\Anaconda\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "d:\software\Anaconda\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "c:\Users\lzk\.vscode\extensions\ms-python.python-2020.12.424452561\pythonFiles\lib\python\debugpy\__main__.py", line 45, in <module>
    cli.main()
  File "c:\Users\lzk\.vscode\extensions\ms-python.python-2020.12.424452561\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 444, in main
    run()
  File "c:\Users\lzk\.vscode\extensions\ms-python.python-2020.12.424452561\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 285, in run_file
    runpy.run_path(target_as_str, run_name=compat.force_str("__main__"))
  File "d:\software\Anaconda\lib\runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "d:\software\Anaconda\lib\runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "d:\software\Anaconda\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "d:\项目代码\wiki\wikitools_exp.py", line 3, in <module>
    page.get_query(proxy='http://127.0.0.1:1080')   # set up local proxy
  File "d:\software\Anaconda\lib\site-packages\wptools\page.py", line 641, in get_query
  File "d:\software\Anaconda\lib\site-packages\wptools\core.py", line 183, in _get
    self._set_data(action)
  File "d:\software\Anaconda\lib\site-packages\wptools\page.py", line 200, in _set_data
    self._set_query_data(action)
  File "d:\software\Anaconda\lib\site-packages\wptools\page.py", line 295, in _set_query_data
    data = self._load_response(action)
  File "d:\software\Anaconda\lib\site-packages\wptools\core.py", line 201, in _load_response
    raise ValueError(_query)
ValueError: https://zh.wikipedia.org/w/api.php?action=query&exintro&formatversion=2&inprop=url|watchers&list=random&pithumbsize=240&pllimit=500&ppprop=disambiguation|wikibase_item&prop=extracts|info|links|pageassessments|pageimages|pageprops|pageterms|redirects&redirects&rdlimit=500&rnlimit=1&rnnamespace=0&titles=%E8%A5%BF%E5%AE%89%E5%B8%82&plcontinue=7536|0|炮里街道
```
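The first traceback shows `json.loads` failing at line 1, column 1, which means the body handed to wptools' `utils.json_loads` was not JSON at all (here, presumably the plain-text "Bad request" response). As a sketch only, a hypothetical helper such as `parse_api_response` below (not part of wptools) would put that body into the error message instead of a bare `JSONDecodeError`:

```python
import json
from typing import Any


def parse_api_response(body: str, url: str) -> Any:
    """Decode a MediaWiki API response, reporting non-JSON bodies clearly.

    Hypothetical helper for illustration; it is not wptools' json_loads.
    """
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        # Include the start of the body, e.g. an HTTP error text like "Bad request".
        snippet = body[:80].strip()
        raise ValueError(f"non-JSON response from {url}: {snippet!r}") from exc


if __name__ == "__main__":
    print(parse_api_response('{"batchcomplete": true}', "https://example.org/w/api.php"))
    try:
        parse_api_response("Bad request", "https://example.org/w/api.php")
    except ValueError as err:
        print(err)
```

The original `JSONDecodeError` is kept as the chained cause, so no debugging information is lost.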

I've noticed that after the query for "西安" finished, wptools went on to send another query (for "西安市" with "&plcontinue=7536|0|炮里街道"), which is not what I needed. Reading the source further, it seems that in page.py, line 640, wptools tries to make more queries based on the "continue" field of the response.
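For context, the MediaWiki API marks an incomplete result with a top-level `continue` object, and a client fetches the rest by repeating the original request with every key from that object added. A minimal sketch of that single step, assuming the new-style `continue` format; the response dict is illustrative (not captured from the server), and `next_request_params` is a hypothetical name, not the code at page.py line 640:

```python
from typing import Optional


def next_request_params(params: dict, response: dict) -> Optional[dict]:
    """Return parameters for the follow-up request, or None if the result is complete."""
    cont = response.get("continue")
    if not cont:
        return None  # no "continue" key: nothing more to fetch
    # Repeat the original request, adding (or overriding) every continue key.
    return {**params, **cont}


# Illustrative response shape for a prop=links query that hit pllimit.
params = {"action": "query", "titles": "西安市", "prop": "links", "pllimit": "500"}
response = {
    "continue": {"plcontinue": "7536|0|炮里街道", "continue": "||"},
    "query": {"pages": []},
}
print(next_request_params(params, response))
```

This is why the second request in the console output targets the same page ("西安市") rather than "炮里街道": the `plcontinue` value only tells the server where to resume the page's list of links.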

Issue #57 says that this is newly added support, but it should be an optional feature implemented in the function "get_querymore" (line 645). However, this continuation support is currently implemented in "get_query" as well. I believe this is a small bug that should be fixed.
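To illustrate the suggestion that continuation be opt-in (as with "get_querymore") rather than automatic in "get_query", here is a sketch with a hypothetical `follow_continue` flag. `fetch` stands in for whatever sends one API request and returns the decoded JSON; neither `run_query` nor `fetch` is part of wptools, and the demo uses a fake two-batch API instead of the network:

```python
from typing import Callable, Iterator

# Hypothetical type: takes request parameters, returns the decoded JSON response.
Fetch = Callable[[dict], dict]


def run_query(fetch: Fetch, params: dict, follow_continue: bool = False) -> Iterator[dict]:
    """Yield result batches; follow "continue" only when explicitly asked to."""
    request = dict(params)
    while True:
        response = fetch(request)
        yield response
        cont = response.get("continue")
        if not (follow_continue and cont):
            return  # single query by default, or nothing left to fetch
        request = {**params, **cont}


def fake_fetch(request: dict) -> dict:
    """Fake API: the first call reports more links, the continued call finishes."""
    if "plcontinue" in request:
        return {"batch": 2}
    return {"batch": 1, "continue": {"plcontinue": "1|0|B", "continue": "||"}}


print(len(list(run_query(fake_fetch, {"titles": "X"}))))  # 1
print(len(list(run_query(fake_fetch, {"titles": "X"}, follow_continue=True))))  # 2
```

Defaulting the flag to `False` keeps a plain query to exactly one request, which is the behavior the report expects from "get_query".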

Although redundant, this still works for en.wikipedia. But when querying zh.wikipedia, something seems to be wrong with the URL: pycurl always returns "Bad request" (core.py line 175), which is not JSON and therefore cannot be parsed by json.loads. So I believe this is a second bug.
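In the failing URL, the title is percent-encoded (`titles=%E8%A5%BF%E5%AE%89%E5%B8%82`), but the continuation value `7536|0|炮里街道` is appended raw. The other parameters already carry unencoded `|` (e.g. `inprop=url|watchers`), so the likelier problem is the unencoded non-ASCII text; that reading is an assumption, though it is consistent with the later commit message about an incorrect URL giving "400 Bad Request". A sketch using the standard library's `urllib.parse.urlencode` to encode every parameter, `plcontinue` included; `build_query_url` is a hypothetical helper, not wptools code:

```python
from urllib.parse import parse_qs, urlencode, urlparse

API = "https://zh.wikipedia.org/w/api.php"


def build_query_url(params: dict) -> str:
    """Hypothetical helper: encode all parameters, continuation values included."""
    # urlencode percent-encodes keys and values as UTF-8 (quote_plus by default),
    # so non-ASCII titles and the '|' separators become %XX escapes.
    return API + "?" + urlencode(params)


params = {
    "action": "query",
    "formatversion": "2",
    "titles": "西安市",
    "plcontinue": "7536|0|炮里街道",
}
good = build_query_url(params)
bad = API + "?titles=%E8%A5%BF%E5%AE%89%E5%B8%82&plcontinue=7536|0|炮里街道"

print(good.isascii())  # True
print(bad.isascii())   # False: non-ASCII characters left unencoded
# Round-trip check: decoding the query string restores the original value.
print(parse_qs(urlparse(good).query)["plcontinue"])  # ['7536|0|炮里街道']
```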

For now, I have simply deleted lines 640-641 of page.py, and it works well for me. Looking forward to your reply.

siznax self-assigned this Jan 26, 2021
kovarden added a commit to kovarden/wptools that referenced this issue Mar 12, 2021
Incorrect URL in TCP packet gives "400 Bad Request" to curl.

siznax#162