Problems plotting resampled time series when using loffset #18467

Open
skjaeve opened this issue Nov 24, 2017 · 5 comments

@skjaeve

skjaeve commented Nov 24, 2017

Code Sample, a copy-pastable example if possible

#!/usr/bin/env python3

import pandas as pd
import matplotlib as mpl
from matplotlib import pyplot as plt
import numpy as np
try:
    import seaborn
    seaborn.set()
except ImportError:
    # seaborn is optional; fall back to matplotlib's default style
    pass

plt.ion()
plt.close('all')

print("pandas:", pd.__version__)
print("matplotlib:", mpl.__version__)

# sample data
t = pd.date_range('2017-06-06 00:00', '2017-06-06 09:59', freq='1min')
d = pd.Series(np.sin(np.linspace(0, 7, len(t))), index=t)

# no shift
d_r = d.resample('1h').mean()

# Shift using offset
d_ro = d.resample('1h', loffset='30min').mean()

# Shift by recreating object
new_index = d_r.index + (d_r.index.to_series().diff()/2).mean()
d_rs = pd.Series(data=d_r.values, index=new_index)

# This works
plt.figure()
plt.title('1: Full data and resampled unshifted')
d.plot(color='blue', label='full')
d_r.plot(color='green', label='unshifted')
plt.legend()
plt.savefig('1.png')

# This does not work
plt.figure()
plt.title('2: Resampled unshifted and loffset')
d_r.plot(color='green', label='unshifted')
d_ro.plot(color='brown', label='loffset')
plt.legend()
plt.savefig('2.png')

# This works
plt.figure()
plt.title('3: Resampled loffset and shifted')
d_ro.plot(color='brown', label='loffset')
d_rs.plot(color='red', label='shifted', marker='p', zorder=0)
plt.legend()
plt.savefig('3.png')

# This does not work
plt.figure()
plt.title('4: Full data and resampled loffset')
d.plot(color='blue', label='full')
d_ro.plot(color='brown', label='loffset')
plt.legend()
plt.savefig('4.png')

# This works
plt.figure()
plt.title('5: This is what I expect')
plt.plot(d.index.to_pydatetime(), d, color='blue', label='full')
d_r.plot(color='green', label='unshifted')
d_ro.plot(color='brown', label='loffset', marker='P', markersize=14)
d_rs.plot(color='red', label='shifted', marker='p')
plt.legend()
plt.savefig('5.png')

# Set label font small to prevent overlap
mpl.rcParams['xtick.labelsize'] = 6

f, ax = plt.subplots(nrows=4, sharex=False, figsize=(6, 10))
plt.suptitle('6: separate subplots sharex=False')
d.plot(color='blue', ax=ax[0])
d_r.plot(color='green', ax=ax[1])
d_ro.plot(color='brown', ax=ax[2])
d_rs.plot(color='red', ax=ax[3], marker='p')
plt.subplots_adjust(hspace=0.6, top=0.95, bottom=0.1, left=0.1, right=0.95)
# Show dates on same format
ax[1].xaxis.set_major_formatter(mpl.dates.DateFormatter('%Y-%m-%d\n%H:%M'))
ax[3].xaxis.set_major_formatter(mpl.dates.DateFormatter('%Y-%m-%d\n%H:%M'))
plt.savefig('6.png')

Problem description

When plotting resampled time series with and without a shift (the loffset parameter to resample()), something goes wrong with the time axis.
Example data:

  1. d time series, resolution 1 minute
  2. d_r resampled to 1 hour
  3. d_ro resampled to 1 hour, index shifted 1/2 hour using loffset parameter to resample()
  4. d_rs is d_r with index manually shifted 1/2 hour
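
For readers on newer pandas, here is a minimal sketch of the same series. It assumes that `loffset` is not available (it was deprecated and later removed from `resample()`), so the label shift is applied to the index after resampling. That stand-in is an assumption of this sketch, not the code from the report, and it may not reproduce the plotting problem exactly.

```python
import numpy as np
import pandas as pd

# Same sample data as in the script above: one value per minute for ten hours.
t = pd.date_range("2017-06-06 00:00", "2017-06-06 09:59", freq="1min")
d = pd.Series(np.sin(np.linspace(0, 7, len(t))), index=t)

# Hourly means, labelled at the start of each hour.
d_r = d.resample("1h").mean()

# Assumed stand-in for resample("1h", loffset="30min"): shift the labels afterwards.
d_ro = d_r.copy()
d_ro.index = d_r.index + pd.Timedelta("30min")

# Manual shift by half the mean spacing, rebuilt as a new Series.
half_step = d_r.index.to_series().diff().mean() / 2
d_rs = pd.Series(d_r.to_numpy(), index=d_r.index + half_step)

# Print the attributes that are worth comparing between the three indexes.
for name, s in [("d_r", d_r), ("d_ro", d_ro), ("d_rs", d_rs)]:
    print(name, s.index[0], "freq:", s.index.freq, "inferred:", s.index.inferred_freq)
```

The timestamps of `d_ro` and `d_rs` are identical here, so any difference in how they plot would have to come from index metadata such as `freq` rather than from the values.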

Plotting these graphs on top of each other doesn't work as expected.
Figure 5 shows the result I expected, figures 1-4 show various combinations. Figure 6 shows each series plotted individually, with something odd going on in the time axis.

When I inspect the various objects in IPython, there's nothing in the datetime index that looks off.

In Figure 5, I use Matplotlib to plot the un-resampled data, manually converting the date axis. Once I've done that, all the resampled data plots nicely, even though "unshifted" and "loffset" did not work together in Figure 2. In Figure 6, I formatted the x-axis manually, and clearly something is odd with the axes of the first two subplots. If I use automatic formatting, the top two subplots have the same x-axis and formatting, and the lower two likewise.
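
The Figure 5 approach can be taken one step further by drawing every series with Matplotlib directly, so that pandas' own time-series plotting is never involved. The following is only a sketch of that workaround, not a fix for the underlying issue. It assumes a non-interactive `Agg` backend, and it shifts the index after resampling as a stand-in for `loffset`.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window needed
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

t = pd.date_range("2017-06-06 00:00", "2017-06-06 09:59", freq="1min")
d = pd.Series(np.sin(np.linspace(0, 7, len(t))), index=t)
d_r = d.resample("1h").mean()

# Stand-in for the loffset series: same values, labels moved by 30 minutes.
d_shift = d_r.copy()
d_shift.index = d_r.index + pd.Timedelta("30min")

fig, ax = plt.subplots()
for series, color, label in [
    (d, "blue", "full"),
    (d_r, "green", "unshifted"),
    (d_shift, "brown", "shifted"),
]:
    # Hand Matplotlib plain datetime objects so one date converter
    # is used for every line on the axes.
    ax.plot(series.index.to_pydatetime(), series.to_numpy(), color=color, label=label)

ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
ax.legend()
fig.canvas.draw()
```

Because all three lines share the same date units, the order of the `ax.plot` calls should not matter in this version.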

Expected Output

I expected a result like Figure 5, but without needing to do anything to set the xaxis.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.13-300.fc27.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: nn_NO.utf8
LOCALE: nn_NO.UTF-8

pandas: 0.21.0
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.5.0.post20170921
Cython: 0.26.1
numpy: 1.13.3
scipy: 1.0.0
pyarrow: 0.7.1
xarray: 0.9.6
IPython: 6.2.1
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: 0.4.0
matplotlib: 2.1.0
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 0.9.8
lxml: 3.8.0
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@skjaeve
Author

skjaeve commented Nov 24, 2017

[Attached images: figures 1 to 6, as saved by the script above (1.png to 6.png)]

@jorisvandenbossche
Member

I think this is a duplicate of #9053, and if so, very much a coincidence, as I was just coding on a fix :-)

@jorisvandenbossche
Member

See the explanation in #9053 (comment); it's not as easy a fix as I thought, due to how matplotlib updates the units of the x-axis in a second plot.

However, given that in this issue both plotting calls use a pandas Series/DataFrame.plot(..) method, we should be able to solve this case in pandas (as opposed to the case in #9053).

@jreback jreback added the Visualization plotting label Nov 25, 2017
@mrlnc

mrlnc commented Dec 11, 2017

I'm a pandas beginner and found this while searching for a workaround. I've noticed that things seem to work if you plot a non-resampled time series first. In the example above, it seems to work if you just swap the calls to the plot function.

Plot 2

original

# This does not work
plt.figure()
plt.title('2: Resampled unshifted and loffset')
d_r.plot(color='green', label='unshifted')
d_ro.plot(color='brown', label='loffset')
plt.legend()
plt.savefig('2.png')

[Image: figure 2, original order]

order changed

Now the plot calls are just swapped.

# Same figure with the two plot calls swapped
plt.figure()
plt.title('2: Resampled unshifted and loffset')
d_ro.plot(color='brown', label='loffset')
d_r.plot(color='green', label='unshifted')
plt.legend()
plt.savefig('2.png')

[Image: figure 2, plot calls swapped]

Plot 4

original

# This does not work
plt.figure()
plt.title('4: Full data and resampled loffset')
d.plot(color='blue', label='full')
d_ro.plot(color='brown', label='loffset')
plt.legend()
plt.savefig('4.png')

[Image: figure 4, original order]

order changed

# Same figure with the two plot calls swapped
plt.figure()
plt.title('4: Full data and resampled loffset')
d_ro.plot(color='brown', label='loffset')
d.plot(color='blue', label='full')
plt.legend()
plt.savefig('4.png')

[Image: figure 4, plot calls swapped]

@ewquon

ewquon commented Jun 29, 2018

FWIW...

After dumping out the problematic data series to a csv file, I've found that data with an inconsistent datetime delta causes problems (sensitivity to plot order, causes data to be shifted to the year ~1970, and/or causes the notebook to crash):

2016-11-21 18:00:02,0.322354120406
2016-11-21 19:00:02,
2016-11-21 20:00:03,0.0447800715636
2016-11-21 21:00:04,0.126263457436
2016-11-21 22:00:05,0.0244525723174
...

But manually fixing the datetime interval resolves the issues:

2016-11-21 18:00:00,0.322354120406
2016-11-21 19:00:00,
2016-11-21 20:00:00,0.0447800715636
2016-11-21 21:00:00,0.126263457436
2016-11-21 22:00:00,0.0244525723174
...

Not sure if that sheds any light on this bug or not. I'm running Pandas 0.21.
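
The manual fix described above could be scripted roughly as follows. This is a sketch that uses only the five rows shown in the comment (the elided rows are not reconstructed), with hypothetical column names `time` and `value`. It assumes that flooring each timestamp to the hour is an acceptable way to regularise the index, which would not hold if two samples fell within the same hour.

```python
import io

import pandas as pd

# The five rows quoted above, with their slightly drifting timestamps.
raw = """2016-11-21 18:00:02,0.322354120406
2016-11-21 19:00:02,
2016-11-21 20:00:03,0.0447800715636
2016-11-21 21:00:04,0.126263457436
2016-11-21 22:00:05,0.0244525723174
"""

df = pd.read_csv(
    io.StringIO(raw),
    header=None,
    names=["time", "value"],  # hypothetical column names
    parse_dates=["time"],
    index_col="time",
)
s = df["value"]

# Distinct spacings between consecutive timestamps before the fix.
steps_before = s.index.to_series().diff().dropna().unique()

# Snap every timestamp down to the start of its hour.
fixed = s.copy()
fixed.index = s.index.floor("h")

# Distinct spacings after the fix.
steps_after = fixed.index.to_series().diff().dropna().unique()

print("distinct steps before:", len(steps_before))
print("distinct steps after: ", len(steps_after))
```

Comparing the number of distinct steps before and after is a quick way to check whether an index is evenly spaced.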

@mroeschke mroeschke added the Bug label Jun 12, 2021