-
-
Notifications
You must be signed in to change notification settings - Fork 330
(fix): use typesize
on Blosc
codec
#2962
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(fix): use typesize
on Blosc
codec
#2962
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks good @ilan-gold, we just a release note
Apologies did not mean for you guys to do an immediate review, will keep that in mind next time, this was mostly to remind myself to finish up :) |
no worries, I was trigger-happy here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth adding a test for this that asserts a certain compressed size, so we can catch regressions in the future? The doctest is a nice way to catch this, but I worry that it might get removed or changed whereas a test is more likely to stay around.
…hon into ig/typesize_for_blosc
…hon into ig/typesize_for_blosc
…hon into ig/typesize_for_blosc
Does anyone have access to a windows machine? Or should we just xfail this and move on? I am not sure if the issue is numpy or the python version interacting with numcodecs here causing the sizes to be off. We can open an issue if someone can come up with access to a windows machine can create a repro |
tests/test_codecs/test_blosc.py
Outdated
|
||
|
||
async def test_typesize() -> None: | ||
a = np.arange(1000000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a = np.arange(1000000) | |
a = np.arange(2**16, dtype=np.uint16) |
As a thought, worth explicitly specifying the data type (and making the data smaller)? Don't know if it will fix the windows issue, but I think worth doing anyway os there's a concrete bytesize, and perhaps using integer data type will help with linux/windows because perhaps they have different floating point implementations (although that's wild speculation on my part...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
arange
is default uint64
so I'll push something with that added.
The first four dumped characters on the failing windows case are:
while they should be
This would indicate to me the typesize is being incorrectly encoded but I (a) don't know why and (b) don't know what the |
Weird... I'm guessing that would be an upstream numcodecs issue/fix, so we could probably cut our losses here and just xfail the test on windows for now. |
Ok @dstansby great call - it looks like it was just being explicit, there must be different behavior on windows for that version. There is a warning in the documentation https://numpy.org/doc/stable/reference/generated/numpy.arange.html but I figured we hadn't actually hit any of those conditions. |
Thank you @ilan-gold and @dstansby for working on this bug! I really appreciate your efforts. 🙏 |
Fixes #2766 and fixes #2171
TODO:
docs/user-guide/*.rst
changes/