Skip to content

Add Pyodide support and CI jobs for Zarr #1903

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 74 commits into
base: main
Choose a base branch
from

Conversation

agriyakhetarpal
Copy link

@agriyakhetarpal agriyakhetarpal commented May 23, 2024

Description

This description is a stub and will be updated in due course.

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

Additional context

Here are a few xrefs to some other PRs that have been implemented for packages in the Scientific Python ecosystem:

The greater, long-term goal towards this change is to implement Sphinx-based interactive documentation via JupyterLite and WASM-powered, in-browser kernels, as referenced in Quansight-Labs/czi-scientific-python-mgmt#19, see also: Quansight-Labs/czi-scientific-python-mgmt#18. A pilot that can be readily tried out is available for the "API reference" pages under the PyWavelets documentation. This will be preceded by configuring a place to publish these WebAssembly/Emscripten wheels nightly or on a scheduled cadence for Zarr (which could be a third-party, PyPI-like index such as Anaconda.org) and then integrating Sphinx extensions such as jupyterlite-sphinx for hosted documentation.

@agriyakhetarpal
Copy link
Author

Zarr v3 has crc32c and zstandard as dependencies, which were not present for v2. These packages contain compiled code and are not available in Pyodide yet (based on the logs from the workflow run). I shall try to add recipes for them Pyodide, but I am afraid I shall not have an answer for how long that can take.

@agriyakhetarpal
Copy link
Author

Update: we added both crc32c and zstandard (please see the issues/PRs linked above). Pyodide 0.26.0 should be out very soon containing both packages plus with a bump to the Python version (to 3.12.1) and an ABI-breaking change (via Emscripten 3.1.58). I shall revisit this PR in a few days.

Next, when Zarr v3 (stable) comes out on PyPI in the next month or so, I shall bump the version/recipe for the in-tree Pyodide build as well. It would be great to have the out-of-tree build, i.e., this PR, ready for review/merging by then, and I will be committed to putting my best efforts in doing so – thanks!

@agriyakhetarpal
Copy link
Author

The Pyodide 0.26.0 release is out, and we have been lucky to have got both crc32c and zstandard in time (but in any case, they would have got into a patch release anyway). Let me update this PR in a short moment.

Copy link
Author

@agriyakhetarpal agriyakhetarpal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi! I apologise for the radio silence on this PR. I've recently been exploring Pyodide/WASM support for Zarr again, as we aim to reintroduce Zarr to https://github.com/pyodide/pyodide-recipes. We will utilise that repository in the next release of Pyodide, namely version 0.28, scheduled for release in a few weeks – please see the tracking issue at pyodide/pyodide-recipes#99 for a subset of our remaining work. We currently use Python 3.13, and the first version of Zarr that supports it is v3, which meant that we needed to get this working – we had disabled Zarr due to the use of threads. I've tried to revisit support for Pyodide through the recent commits I've pushed, and I've been able to get it to work.

If I understand correctly, Zarr's inherent threading model in v3 can be bypassed in WebAssembly through the browser's/Node.js's event loop, as demonstrated in my changes to src/zarr/core/sync.py. The only route it disables is zarr.sync, and tests in test_sync.py are skipped as a result. If I understand correctly, zarr.sync is going away anyway with a later version of Zarr.

This PR depends on its sister PR that I created at zarr-developers/numcodecs#529 where we're hoping to add Pyodide/WASM support for numcodecs. All tests have been passing there and it has been ready for further review. Here, I'm temporarily building numcodecs from my PR's branch there, as the version of numcodecs in the Pyodide distribution itself is old does not satisfy the constraint here (>=0.16). I still haven't been able to figure out why the VCS system is not picking up the right version in CI – it looks like a bug where actions/checkout@v4 does not fetch tags and does not perform an unshallow clone when checking out to a directory other than {{ github.workspace }}. Either way, I've temporarily skipped the version check for now so that we can let the tests run till the end.

Hence, while this is obviously not in a mergeable state – I've been able to get several tests passing and Zarr seems to be working well. I believe this PR is ready for review and further inspection, and it should enable us to integrate a portion of this PR as a patch for Pyodide, allowing Zarr to be re-enabled once the idea receives peer review here. A full review can proceed in the meantime as I work through this PR.

In other respects, what I'm finding a bit concerning is the slowness of the test suite in CI. The CI job took 5 hours to run, which seems like an anomaly – in a Pyodide virtual environment on my macOS machine locally, I've been able to consistently run the tests until completion in ~11 minutes on multiple runs, rather than hours. test_set_orthogonal_selection_3d takes 6034 seconds here, which is more than 150x of what it takes outside of CI. Would it make sense to add an additional slow_in_wasm marker so that pytest can ignore the tests shown below?

============================= slowest 10 durations =============================
6034.84s call     tests/test_indexing.py::test_set_orthogonal_selection_3d
4101.22s call     tests/test_properties.py::test_array_creates_implicit_groups
3878.97s call     tests/test_properties.py::test_array_roundtrip
924.30s call     tests/test_properties.py::test_roundtrip_array_metadata_from_store
714.00s call     tests/test_indexing.py::test_set_orthogonal_selection_2d
397.94s call     tests/test_indexing.py::test_set_coordinate_selection_2d
242.29s call     tests/test_indexing.py::test_set_orthogonal_selection_1d
150.35s call     tests/test_indexing.py::test_set_block_selection_2d
121.05s call     tests/test_indexing.py::test_set_block_selection_1d
95.83s call     tests/test_indexing.py::test_get_orthogonal_selection_3d

I've also xfailed some flaky tests arising from Hypothesis, which I haven't been able to figure out the cause for, and marked a Blosc test where I get mismatched arrays (which will need further probing into how Blosc was compiled to WASM) as a known failure case – I hope that is alright!

Out of the two failing tests in the CI job:

  • test_array_roundtrip xfails on my local setup on several reruns due to Hypothesis having degraded performance when generating test cases, and it looks like it strict-xpassed here. It is likely a flake, too.
  • I had introduced a lenience factor for tests/test_group.py::test_group_members_performance[memory] to accommodate the fact that it takes a little longer to run, likely due to overhead accumulated by running in WASM and the latency coming from the Node.js event loop. The lenience factor/tolerance of 1.1 is apparently good enough for my machine, but we know CI machines are quite unreliable, and the required lenience factor to make it pass here would be 4. I'm open to receiving suggestions about this test. My suggestion would be to rework the test method to test versus multiple group counts—parametrised—and verify that the time taken doesn't scale linearly with the group count (i.e., times should be roughly similar if $O(1)$, later times should be much larger if $O(n)$; and the time ratio should ideally be much less than the linear scaling ratio).

Tip

I've introduced a slow_wasm marker and skipped the relevant slow tests completely, and the test_group_members_performance test should pass now.

Thank you for your time!

Comment on lines +80 to +83
# For some reason fetch-depth: 0 and fetch-tags: true aren't working...
- name: Manually fetch tags for numcodecs
working-directory: numcodecs-wasm
run: git fetch --tags
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For unexplained reasons, even this doesn't help.

Comment on lines +45 to +50
# _numcodecs_version = Version(numcodecs.__version__)
# if _numcodecs_version < Version("0.13.0"):
# raise RuntimeError(
# "numcodecs version >= 0.13.0 is required to use the zstd codec. "
# f"Version {_numcodecs_version} is currently installed."
# )
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Temporary change; these lines will be uncommented when we either have a solution for the numcodecs version being fetched in the CI job or we create yet another alpha with an updated version of numcodecs.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about skipping tests that require newer version of numcodecs?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, no, this is only to disable the version check. numcodecs is being compiled from zarr-developers/numcodecs#529 with the latest version, just that the version is incorrect.

@agriyakhetarpal agriyakhetarpal changed the title [WIP]: Add Pyodide support and CI jobs for Zarr Add Pyodide support and CI jobs for Zarr May 28, 2025
@agriyakhetarpal
Copy link
Author

Marking this as ready for review based on the above notes.

@agriyakhetarpal agriyakhetarpal marked this pull request as ready for review May 28, 2025 11:16
@agriyakhetarpal
Copy link
Author

I can't request a review from you as you haven't been previously involved in this PR, @ryanking13, but it would be nice if you could take a look at it as well – thank you!

Comment on lines +45 to +50
# _numcodecs_version = Version(numcodecs.__version__)
# if _numcodecs_version < Version("0.13.0"):
# raise RuntimeError(
# "numcodecs version >= 0.13.0 is required to use the zstd codec. "
# f"Version {_numcodecs_version} is currently installed."
# )

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about skipping tests that require newer version of numcodecs?

# https://developer.mozilla.org/en-US/docs/Web/API/setTimeout
if IS_WASM:
current_loop = asyncio.get_running_loop()
return current_loop.run_until_complete(coro)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

run_until_complete is no-op in Pyodide unless JSPI is enabled. So I wonder whether if it is better to use run_until_complete here or just raise an error.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see; do we know of a way to check if JSPI is enabled? We don't have --experimental-wasm-jspi in https://github.com/pyodide/pyodide/blob/adfa092b26745a16127ce2ad680f81407f07738d/src/templates/python#L24-L26 so I thought it isn't enabled, but it seems to be enabled – I added a few debug print statements, and the currently running loop is indeed the WebLoop, and run_until_complete doesn't seem to be a no-op. I'm able to get an AsyncArray after sync() completes.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yes. It looks like we are enabling jspi in pyodide venv. We pass enableRunUntilComplete parameter in loadPyodide.

https://github.com/pyodide/pyodide/blob/adfa092b26745a16127ce2ad680f81407f07738d/src/templates/python_cli_entry.mjs#L41

But this parameter might not be enabled for users who are using in zarr in the browser. So I think it is safe to not assume that run_until_complete would wait until the async job is finished.

Copy link
Author

@agriyakhetarpal agriyakhetarpal May 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense to me – I can grab a wheel and test it in the browser as well; it's probably not going to work. It looks like there would be a reasonable amount of time between Phase 4 and Phase 5 for JSPI.

Could you give me a code suggestion here? i.e., I'm thinking that it isn't possible to ship support without JSPI in that case (unless you know of a way for this) and that we can't re-enable Zarr's recipe because of this PR alone – and so this PR will benefit only CI testing for now?

Comment on lines 44 to 46
Patch Pyodide's WebLoop to fix interoperability with pytest-asyncio.

WebLoop.shutdown_asyncgens() raises NotImplementedError, which causes

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should implement it in Pyodide?

If this is only problematic in testing environment, how about moving this into conftest.py?

Copy link
Author

@agriyakhetarpal agriyakhetarpal May 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should implement it in Pyodide?

I haven't thought about it, but yes, I think this should be helpful to put somewhere in pytest-pyodide, if I understand you correctly. I haven't come across this case/problem in any other downstream package, so I'm not sure (but perhaps Zarr is among the first).

If this is only problematic in testing environment, how about moving this into conftest.py?

Yes, makes sense to me! I'll move it to conftest.py and make it run only when PYTEST_CURRENT_TEST is active.

Edit: I've moved the patch to conftest.py in bad9920

@github-actions github-actions bot removed the needs release notes Automatically applied to PRs which haven't added release notes label May 29, 2025
@agriyakhetarpal
Copy link
Author

agriyakhetarpal commented May 29, 2025

The tests are now passing in https://github.com/zarr-developers/zarr-python/actions/runs/15322119791/job/43108023246?pr=1903 (except the extremely slow ones – also, they don't exhibit the same level of slowness at all if you run them locally); and I've added a release note highlighting Pyodide/WebAssembly support. Codecov is going to complain because it isn't aware of this WebAssembly environment, so I'm happy to either mock it to absurdly boost the coverage value or skip those lines with a # pragma: no cover comment. Thank you!

@hoodmane
Copy link
Contributor

It would be nice to make coverage work with the webassembly tests too. Though we still haven't done it for Pyodide itself.

@agriyakhetarpal
Copy link
Author

It would be nice to make coverage work with the webassembly tests too. Though we still haven't done it for Pyodide itself.

Yes, I'd assume merely collecting or operating on coverage data should already work, as coverage and pytest-cov themselves are able to generate reports, and we can combine them outside of Codecov to fix the missing values. It doesn't work for Codecov, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants