Description
If one attempts to use scrapy-playwright with scrapy shell, simply doing a fetch
results in a hang.
Steps to Reproduce
- install scrapy-playwright,
- launch scrapy shell with the AsyncioSelectorReactor,
- run fetch('https://www.google.com', meta={'playwright': True, 'playwright_include_page': True})
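For reference, a minimal sketch of the project settings these steps assume. The setting names follow the scrapy-playwright README; the report itself does not show the settings that were used, so treat the exact values as an assumption.

```python
# settings.py (sketch): enable scrapy-playwright and the asyncio reactor.
# The values below are assumed from the scrapy-playwright README; the
# original report does not include its settings file.

# Route HTTP and HTTPS requests through the Playwright download handler.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright requires the asyncio-based Twisted reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```

With these settings in place, running scrapy shell from the project directory and issuing the fetch call above should be enough to trigger the hang.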
Expected behavior:
The request should be handled by scrapy-playwright and result in a response.
Actual behavior:
The request is blocked soon after the second thread is created, right before the browser page is created. The traceback with Ctrl-C shows that execution is blocked in blockingCallFromThread. On closing the shell, the following error message appears:
asyncio: Task was destroyed but it is pending!
task: <Task pending name='Task-163' coro=<ScrapyPlaywrightDownloadHandler._download_request() running at /home/petros/.local/lib/python3.10/site-packages/scrapy_playwright/handler.py:272> cb=[Deferred.fromFuture.<locals>.adapt() at /usr/lib/python3.10/site-packages/twisted/internet/defer.py:1063]>
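The symptom (a thread blocked on a result that never arrives, with a task still pending at shutdown) can be approximated with plain asyncio. This is only an analogy under an assumption: that the work ends up scheduled on an event loop other than the one actually being run. It does not use Scrapy, Twisted, or Playwright, and the names download and fetch_with_timeout are hypothetical.

```python
import asyncio
import concurrent.futures
import threading


async def download() -> str:
    # Hypothetical stand-in for the real download coroutine.
    return "response"


def fetch_with_timeout(timeout: float = 0.5) -> str:
    # One loop is actually driven by a background thread...
    running_loop = asyncio.new_event_loop()
    # ...while a second, stray loop exists but is never run.
    idle_loop = asyncio.new_event_loop()
    runner = threading.Thread(target=running_loop.run_forever, daemon=True)
    runner.start()

    coro = download()
    # Assumption for illustration: the coroutine is handed to the wrong loop.
    future = asyncio.run_coroutine_threadsafe(coro, idle_loop)
    try:
        # Without the timeout this call would block forever.
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        return "timed out"
    finally:
        future.cancel()
        coro.close()  # avoid a "never awaited" warning for the unused coroutine
        running_loop.call_soon_threadsafe(running_loop.stop)
        runner.join()
        running_loop.close()
        idle_loop.close()


if __name__ == "__main__":
    print(fetch_with_timeout())
```

The timeout is there only so the sketch terminates; a blocking wait without one would hang in the same way the shell does.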
Reproduces how often:
Always.
Versions
Scrapy : 2.8.0
lxml : 4.9.2.0
libxml2 : 2.9.14
cssselect : 1.2.0
parsel : 1.7.0
w3lib : 1.22.0
Twisted : 22.10.0
Python : 3.10.9 (main, Dec 08 2022, 14:49:06) [GCC]
pyOpenSSL : 23.0.0 (OpenSSL 3.0.7 1 Nov 2022)
cryptography : 39.0.1
Platform : Linux-6.1.8-1-default-x86_64-with-glibc2.36
Additional context
There is a long thread about the root cause in https://discord.com/channels/851364676688543744/1073689007927590932. The summary is that set_asyncio_event_loop seems to have the following issues:
- it uses get_asyncio_event_loop_policy().get_event_loop(), which results in the edge case described in "asyncio Policies documentation needs clarification" (python/cpython#96377 (comment)), namely set_asyncio_event_loop creates an event loop, even though another already exists.
- it uses asyncio.get_event_loop_policy(), which will probably be deprecated, according to "asyncio Policies documentation needs clarification" (python/cpython#96377 (comment)).
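The first point can be illustrated with a small standalone sketch. It assumes the running loop was never registered with set_event_loop(), and it may not match the exact scenario in the linked comment. The function names are hypothetical, and the behaviour depends on the Python version: older versions create a second loop, newer ones may raise RuntimeError instead, which the sketch treats as "no matching loop".

```python
import asyncio
import warnings


async def policy_loop_matches_running_loop() -> bool:
    running = asyncio.get_running_loop()
    with warnings.catch_warnings():
        # Newer Python versions emit DeprecationWarning for the policy API.
        warnings.simplefilter("ignore", DeprecationWarning)
        try:
            # The policy only consults the loop registered for this thread;
            # it does not look at the loop that is currently running.
            policy_loop = asyncio.get_event_loop_policy().get_event_loop()
        except RuntimeError:
            # Raised when no loop is registered and none is created implicitly.
            policy_loop = None
    matches = policy_loop is running
    if policy_loop is not None and not matches and not policy_loop.is_running():
        policy_loop.close()  # tidy up the extra loop, if one was created
    return matches


def main() -> bool:
    # Deliberately not passed to asyncio.set_event_loop().
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(policy_loop_matches_running_loop())
    finally:
        loop.close()


if __name__ == "__main__":
    print("policy loop is the running loop:", main())
```

Inside a coroutine, asyncio.get_running_loop() always returns the loop that is executing it, which is why it is generally the safer way to find the active loop.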