Asyncio event loop handling results in hang in scrapy shell with scrapy-playwright #5831

Closed
@auxsvr

Description

Using scrapy-playwright from scrapy shell hangs: a plain fetch with the playwright meta key set never returns.

Steps to Reproduce

  1. install scrapy-playwright,
  2. launch scrapy shell with the AsyncioSelectorReactor,
  3. run fetch('https://www.google.com', meta={'playwright': True, 'playwright_include_page': True})
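The report does not show the project settings used for step 2. A minimal sketch of what they would typically look like, assuming the configuration described in the scrapy-playwright README (the actual settings in the reporter's project may differ):

```python
# settings.py (assumed; not shown in the original report)

# Route HTTP(S) requests through the scrapy-playwright download handler,
# the same handler class that appears in the traceback of this report.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# Make Twisted run on top of an asyncio event loop.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```

With these settings in place, running scrapy shell from the project directory and issuing the fetch call from step 3 should be enough to trigger the hang.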

Expected behavior:
The request should be handled by scrapy-playwright and result in a response.

Actual behavior:
The request blocks soon after the second thread is created, right before the browser page is created. Interrupting with Ctrl-C gives a traceback showing that execution is blocked in blockingCallFromThread. On closing the shell, the following error message appears:

asyncio: Task was destroyed but it is pending!
task: <Task pending name='Task-163' coro=<ScrapyPlaywrightDownloadHandler._download_request() running at /home/petros/.local/lib/python3.10/site-packages/scrapy_playwright/handler.py:272> cb=[Deferred.fromFuture.<locals>.adapt() at /usr/lib/python3.10/site-packages/twisted/internet/defer.py:1063]>

Reproduces how often:
Always.

Versions

Scrapy : 2.8.0
lxml : 4.9.2.0
libxml2 : 2.9.14
cssselect : 1.2.0
parsel : 1.7.0
w3lib : 1.22.0
Twisted : 22.10.0
Python : 3.10.9 (main, Dec 08 2022, 14:49:06) [GCC]
pyOpenSSL : 23.0.0 (OpenSSL 3.0.7 1 Nov 2022)
cryptography : 39.0.1
Platform : Linux-6.1.8-1-default-x86_64-with-glibc2.36

Additional context

There is a long thread about the root cause in https://discord.com/channels/851364676688543744/1073689007927590932. The summary is that set_asyncio_event_loop seems to have the following issues:

  1. it uses get_asyncio_event_loop_policy().get_event_loop(), which results in the edge case described in a comment on python/cpython#96377 ("asyncio Policies documentation needs clarification"): set_asyncio_event_loop creates a new event loop even though another already exists.
  2. it uses asyncio.get_event_loop_policy(), which will probably be deprecated, according to a comment on the same issue.
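To illustrate why a second event loop is a problem, here is a minimal standalone sketch. It is not Scrapy's code: start_background_loop and which_loop are hypothetical helpers, and the background thread only stands in for whatever thread drives the existing loop. It shows that a freshly created loop is a different object from the one already running, and that work meant for the existing loop has to be submitted to that loop explicitly.

```python
import asyncio
import threading


def start_background_loop():
    """Start an event loop in a daemon thread and return it once it is running."""
    loop = asyncio.new_event_loop()
    ready = threading.Event()

    def run():
        # The callback fires on the first iteration, so the loop is running by then.
        loop.call_soon(ready.set)
        loop.run_forever()

    threading.Thread(target=run, daemon=True).start()
    ready.wait()
    return loop


async def which_loop():
    """Return the loop that is actually executing this coroutine."""
    return asyncio.get_running_loop()


existing = start_background_loop()

# Creating another loop does not reuse the running one; nothing drives this
# second loop, so tasks scheduled on it would stay pending forever.
second = asyncio.new_event_loop()
second_is_distinct = second is not existing
second.close()

# Submitting the coroutine to the existing loop from this thread does run it.
future = asyncio.run_coroutine_threadsafe(which_loop(), existing)
ran_on_existing = future.result(timeout=5) is existing

# Ask the background loop to stop so the example ends cleanly.
existing.call_soon_threadsafe(existing.stop)
```

If the hang has the cause suggested above, the pending _download_request task in the error message would be consistent with a coroutine waiting on a loop that nothing is running.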
