DataArray.rolling() does not preserve chunksizes in some cases #2531

cchwala · 2018-10-31T20:50:33Z

This issue was found and discussed in the related issue #2514.

I am opening a separate issue for clarity.

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np
import xarray as xr

t = pd.date_range(start='2018-01-01', end='2018-02-01', freq='H')
bar = np.sin(np.arange(len(t)))
baz = np.cos(np.arange(len(t)))

da_test = xr.DataArray(data=np.stack([bar, baz]),
                       coords={'time': t,
                               'sensor': ['one', 'two']},
                       dims=('sensor', 'time'))

print(da_test.chunk({'time': 100}).rolling(time=60).mean().chunks)

print(da_test.chunk({'time': 100}).rolling(time=60).count().chunks)

Output for `mean`: ((2,), (745,))
Output for `count`: ((2,), (100, 100, 100, 100, 100, 100, 100, 45))
Desired Output: ((2,), (100, 100, 100, 100, 100, 100, 100, 45))
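
For reference, the desired chunk layout follows directly from the dimension length and the requested chunk size. The sketch below is plain Python and does not call xarray or dask. The helper name `expected_chunks` is hypothetical and only illustrates the arithmetic behind the desired tuple (745 hourly time steps split into blocks of 100).

```python
def expected_chunks(dim_length, chunk_size):
    """Return the chunk tuple for one dimension split into blocks of chunk_size.

    Hypothetical helper for illustration only; it is not part of xarray or dask.
    """
    if dim_length < 0 or chunk_size <= 0:
        raise ValueError("dim_length must be >= 0 and chunk_size must be > 0")
    full_blocks, remainder = divmod(dim_length, chunk_size)
    chunks = (chunk_size,) * full_blocks
    if remainder:
        # The last block holds whatever is left over.
        chunks += (remainder,)
    return chunks


# 31 days of hourly data plus the closing timestamp gives 745 time steps.
n_time = 31 * 24 + 1
desired = ((2,), expected_chunks(n_time, 100))
print(desired)
```

Comparing such a tuple with the `.chunks` attribute of a rolling result is a quick way to spot where the chunking is lost.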

Problem description

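DataArray.rolling() does not always preserve the chunk sizes of a dask-backed array. Whether it does apparently depends on the reduction applied: in the example above, `count` keeps the chunks along `time`, while `mean` collapses them into a single chunk of length 745.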

Output of `xr.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.15.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: None.None

xarray: 0.10.9
pandas: 0.23.3
numpy: 1.13.3
scipy: 1.0.0
netCDF4: 1.4.1
h5netcdf: 0.5.0
h5py: 2.8.0
Nio: None
zarr: None
cftime: 1.0.1
PseudonetCDF: None
rasterio: None
iris: None
bottleneck: 1.2.1
cyordereddict: 1.0.0
dask: 0.19.4
distributed: 1.23.3
matplotlib: 2.2.2
cartopy: 0.16.0
seaborn: 0.8.1
setuptools: 38.5.2
pip: 9.0.1
conda: 4.5.11
pytest: 3.4.2
IPython: 5.5.0
sphinx: None


cchwala · 2018-10-31T20:52:49Z

The cause has been explained by @fujiisoup in #2514 (comment):

Nice catch!

For historical reasons, mean and some other reduction methods use bottleneck by default, while count does not.

mean goes through this function:

xarray/xarray/core/dask_array_ops.py

Line 23 in b622c5e
def dask_rolling_wrapper(moving_func, a, window, min_count=None, axis=-1):

It looks like there is another bug in this function.
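
To illustrate why a rolling reduction can in principle keep the input chunks, here is a minimal NumPy-only sketch of the overlap idea: each chunk borrows the last `window - 1` values of its predecessor, the moving function runs on the extended block, and the borrowed part is trimmed off again. This is a stand-in written under those assumptions, not the code of `dask_rolling_wrapper`; the names `trailing_mean` and `rolling_mean_by_chunk` are hypothetical.

```python
import numpy as np


def trailing_mean(values, window):
    """Trailing moving mean of a 1-D array; NaN where the window is incomplete."""
    out = np.full(values.shape, np.nan)
    if values.size >= window:
        # A 'valid' convolution with a uniform kernel yields one mean per full window.
        out[window - 1:] = np.convolve(values, np.ones(window) / window, mode="valid")
    return out


def rolling_mean_by_chunk(values, chunk_size, window):
    """Apply trailing_mean chunk by chunk and return the list of output chunks."""
    halo = window - 1
    out_chunks = []
    for start in range(0, values.size, chunk_size):
        stop = min(start + chunk_size, values.size)
        # Borrow up to `halo` values from the preceding data (fewer near the start).
        lo = max(start - halo, 0)
        extended = values[lo:stop]
        # Pad with NaN where no preceding data exists, so every block has a full halo.
        pad = halo - (start - lo)
        extended = np.concatenate([np.full(pad, np.nan), extended])
        # Compute on the extended block, then trim the halo off again.
        out_chunks.append(trailing_mean(extended, window)[halo:])
    return out_chunks


x = np.sin(np.arange(745, dtype=float))
chunks = rolling_mean_by_chunk(x, chunk_size=100, window=60)
print(tuple(len(c) for c in chunks))
```

Because the halo is trimmed after the computation, every output chunk has the same length as its input chunk, which is the behaviour the desired output in this issue asks for.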

mangecoeur · 2019-02-04T09:06:14Z

Perhaps related: I was running into MemoryErrors with a large array and also noticed that chunk sizes were not respected (basically, xarray tried to process the array in one go). It turned out that I had forgotten to install both bottleneck and numexpr. After installing both (installing bottleneck alone was not enough), everything worked as expected.
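
A quick way to check for the situation described in this comment is to test whether both optional packages can be imported in the active environment. The sketch below uses only the standard library; the function name `optional_deps_available` is hypothetical, and the check says nothing about whether xarray actually uses the packages for a given operation.

```python
import importlib.util


def optional_deps_available(names=("bottleneck", "numexpr")):
    """Map each package name to True if it can be found in this environment."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


print(optional_deps_available())
```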

This was referenced Oct 31, 2018

[WIP] Fix problem with wrong chunksizes when using rolling_window on dask.array #2532

Closed

interpolate_na with limit argument changes size of chunks #2514

Closed

dcherian added API design topic-rolling and removed API design labels Feb 16, 2021

dcherian mentioned this issue Mar 1, 2021

Use numpy & dask sliding_window_view for rolling #4977

Merged


dcherian closed this as completed in #4977 Mar 26, 2021
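
The title of #4977 names `sliding_window_view` as the building block for rolling. As a rough NumPy-only illustration of that function (not the code from the PR, and assuming NumPy 1.20 or later), a rolling mean can be expressed as a reduction over the trailing window axis:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.arange(6, dtype=float)          # [0, 1, 2, 3, 4, 5]
windows = sliding_window_view(x, 3)    # one row per full window of length 3
rolling_mean = windows.mean(axis=-1)   # reduce over the window axis

print(windows.shape)   # (4, 3)
print(rolling_mean)    # [1. 2. 3. 4.]
```

The view only covers positions with a complete window, so the result here is shorter than the input by `window - 1` elements.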
