interpolate_na with limit argument changes size of chunks #2514
Comments
The problem seems to occur here Lines 368 to 376 in 5940100
because of the usage of Hence, |
Thanks, @cchwala, for reporting the issue. It looks like the actual chunk sizes are ((10, 735),), not all 10.
In [16]: ds_test.interpolate_na(dim='time', limit=20)['foo'].chunks
Out[16]: ((10, 735),)
(why does our The problem would be in xarray/xarray/core/dask_array_ops.py Lines 74 to 85 in 5940100
This method is designed to be used for arrays with multiple chunks, so I did not worry about it adding a small chunk at the head. |
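To make the reported layout concrete, here is a minimal sketch that scans a `.chunks`-style tuple for chunks shorter than a rolling window. The helper name `chunks_below_window` is hypothetical (it is not part of xarray or dask), and the window length of 60 is only an illustrative assumption.

```python
def chunks_below_window(chunks, window):
    """Return (axis, chunk_index, size) for every chunk smaller than `window`.

    `chunks` is a tuple of per-axis tuples, in the same shape as the
    `.chunks` attribute quoted above. Hypothetical helper, for illustration.
    """
    return [
        (axis, index, size)
        for axis, sizes in enumerate(chunks)
        for index, size in enumerate(sizes)
        if size < window
    ]


# The layout reported in the comment above.
reported = ((10, 735),)

# With an assumed window of 60, the leading chunk of 10 is flagged.
print(chunks_below_window(reported, window=60))  # [(0, 0, 10)]
```

A check like this could be used in a test to flag an unexpectedly small leading chunk after `interpolate_na`.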
Thanks @fujiisoup for the quick response and the pointers. I will have a look and report back if a PR is within my capabilities or not. |
EDIT: The issue of this post is now separated #2531
I think I have a fix, but wanted to write some failing tests before committing the changes. Doing this I discovered that also
import pandas as pd
import numpy as np
import xarray as xr

t = pd.date_range(start='2018-01-01', end='2018-02-01', freq='H')
bar = np.sin(np.arange(len(t)))
baz = np.cos(np.arange(len(t)))
da_test = xr.DataArray(data=np.stack([bar, baz]),
                       coords={'time': t,
                               'sensor': ['one', 'two']},
                       dims=('sensor', 'time'))
print(da_test.chunk({'time': 100}).rolling(time=60).mean().chunks)
print(da_test.chunk({'time': 100}).rolling(time=60).count().chunks)
My fix solves my initial problem, but maybe if done correctly it should also solve this bug. Any idea why this depends on whether I have already pushed some WIP changes. Should I already open a PR even though most new tests still fail? |
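As a reference point for the two `print` calls in the example above, this sketch computes the chunk layout one would expect along `time` if chunking is left untouched. It assumes that chunking an axis of length n with a fixed size yields full chunks followed by one remainder chunk; `regular_chunks` is a hypothetical helper, not an xarray or dask function.

```python
def regular_chunks(length, chunk_size):
    """Split `length` into full chunks of `chunk_size` plus one remainder chunk."""
    full, remainder = divmod(length, chunk_size)
    return (chunk_size,) * full + ((remainder,) if remainder else ())


# Hourly timestamps from 2018-01-01 to 2018-02-01 inclusive: 31 days * 24 h + 1.
n_time = 31 * 24 + 1

print(regular_chunks(n_time, 100))
# (100, 100, 100, 100, 100, 100, 100, 45)
```

A failing test could compare the `time` entry of `.chunks` after `rolling(...).mean()` or `rolling(...).count()` against this tuple.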
@cchwala Discussion is a lot easier on a PR so go ahead and do that. You can add WIP in the title. |
Nice catch! For some historical reasons,
xarray/xarray/core/dask_array_ops.py Line 23 in b622c5e
It looks like there is another bug in this function. |
@dcherian Okay. A WIP PR will follow, but might take some days. |
In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity. If this issue remains relevant, please comment here or remove the |
Code Sample, a copy-pastable example if possible
Output of the above code. Note the different chunk sizes, depending on the value of limit:
Problem description
When using xarray.DataArray.interpolate_na() with the limit kwarg, the chunksize of the resulting dask.arrays changes.
Expected Output
The chunksize should not change. Very small chunks, which result from typical small values of limit, are not optimal for the performance of dask. Also, things like .rolling() will fail if the chunksize is smaller than the window length of the rolling window.
Output of xr.show_versions()
xarray: 0.10.9
pandas: 0.23.3
numpy: 1.13.3
scipy: 1.0.0
netCDF4: 1.4.1
h5netcdf: 0.5.0
h5py: 2.8.0
Nio: None
zarr: None
cftime: 1.0.1
PseudonetCDF: None
rasterio: None
iris: None
bottleneck: 1.2.1
cyordereddict: 1.0.0
dask: 0.19.4
distributed: 1.23.3
matplotlib: 2.2.2
cartopy: 0.16.0
seaborn: 0.8.1
setuptools: 38.5.2
pip: 9.0.1
conda: 4.5.11
pytest: 3.4.2
IPython: 5.5.0
sphinx: None