Problem with plotting frequencies "L" and "S" #7772

Closed
mmajewsk opened this issue Jul 17, 2014 · 2 comments · Fixed by #7803
Labels
Bug Period Period data type Visualization plotting
Comments

@mmajewsk

I am opening a new issue because I worked out what is wrong in #7760.
I have two data sets, one with a frequency of 1 second:
http://pastebin.com/HenYJxdV
and the other with a frequency of 0.1 second (100 milliseconds):
http://pastebin.com/HenYJxdV

I wanted to plot them on the same axes, so I did:

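import matplotlib.pyplot as plt

# dfA holds the 1-second data, dfB the 100-millisecond data
seriesA = dfA["more"]
seriesB = dfB["less"]

# draw both series on the same axes
plt.figure()
seriesA.plot()
seriesB.plot()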

which raised the error shown in the linked issue.
Yet similar data, from the same source and of the same type, often plotted correctly.
What caused this, and what made the difference, given that the data came from the same source?
Two things:
1. First:
line 1515 in tools\plotting.py

return Period(x[0], freq).to_timestamp(tz=x.tz) == x[0]

This line made the plotting function behave differently for different data sets, but only for data with millisecond frequency (seriesB.plot()).
How differently?
When x[0] is

 Timestamp: 2001-01-01 09:12:13.200000

then Period(x[0], freq).to_timestamp(tz=x.tz) is:

Timestamp: 2001-01-01 09:12:13.200000002

which makes the comparison return False. But when x[0] is

 Timestamp: 2001-01-01 09:03:11.500000

the expression Period(x[0], freq).to_timestamp(tz=x.tz) evaluates to:

 Timestamp: 2001-01-01 09:03:11.500000

so the comparison returns True.
Surprisingly, the data plotted correctly in the first case, where that line returns False (e.g. Timestamp: 2001-01-01 09:12:13.200000002).
That is because the call stack leads back to line 1539 of the same file:

if self._is_ts_plot():

where self._is_ts_plot() then evaluates to False, so pandas does not try to detect the frequency and convert the index, but simply plots the data.
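To make the effect of that strict equality concrete, here is a minimal sketch. It is not the pandas code: integer nanoseconds stand in for Timestamps, every name in it is hypothetical, and the `drift_ns` parameter only simulates the 2 ns discrepancy seen above (the issue does not say where it comes from).

```python
# Illustrative only: integer nanoseconds stand in for pandas Timestamps,
# and all names here are hypothetical, not pandas functions.
NS_PER_MS = 1_000_000


def round_trip(ts_ns, drift_ns=0):
    """Truncate to a millisecond period and convert back to nanoseconds.

    drift_ns simulates a conversion error such as the ...200000002 value
    reported above; it is an assumption made for illustration.
    """
    ordinal = ts_ns // NS_PER_MS           # timestamp -> period ordinal
    return ordinal * NS_PER_MS + drift_ns  # period -> start timestamp


def passes_equality_check(ts_ns, drift_ns=0):
    """Mimic the `... == x[0]` comparison with exact equality."""
    return round_trip(ts_ns, drift_ns) == ts_ns


first = 13_200_000_000  # 13.2 s expressed in nanoseconds

print(passes_equality_check(first))              # True
print(passes_equality_check(first, drift_ns=2))  # False
```

A 2 ns difference is enough to flip the result, which matches the two cases described above: one data set passes the check and the other does not.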

2. Second:
line 92 of tseries\plotting.py

def _maybe_resample(series, ax, freq, plotf, kwargs):
    ax_freq = _get_ax_freq(ax)
    if ax_freq is not None and freq != ax_freq:
        if frequencies.is_superperiod(freq, ax_freq):  # upsample input
            series = series.copy()
            series.index = series.index.asfreq(ax_freq, how='s')
            freq = ax_freq
        elif _is_sup(freq, ax_freq):  # one is weekly
            how = kwargs.pop('how', 'last')
            series = series.resample('D', how=how).dropna()
            series = series.resample(ax_freq, how=how).dropna()
            freq = ax_freq
        elif frequencies.is_subperiod(freq, ax_freq) or _is_sub(freq, ax_freq):
            _upsample_others(ax, freq, plotf, kwargs)
            ax_freq = freq
        else:  # pragma: no cover
            raise ValueError('Incompatible frequency conversion')
    return freq, ax_freq, series

For the data set that does not plot (the one with Timestamp: 2001-01-01 09:03:11.500000), this function falls through every condition and reaches the final "else" with

ax_freq = "S"
freq = "L"

None of these functions handles the frequency "L".
I presume it should be handled in

frequencies.is_subperiod(freq, ax_freq)

but as you can see here https://github.com/pydata/pandas/blob/790d6464130fe9448739e48678f466d5452992ca/pandas/tseries/frequencies.py#L904
it is not.
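As a rough idea of what handling "L" could look like, here is a sketch that is not the pandas implementation and not the eventual fix. It assumes the usual frequency codes (T = minute, L = millisecond, U = microsecond, N = nanosecond) and decides the relation purely from fixed lengths in nanoseconds; the function names are hypothetical.

```python
# Hypothetical sketch, not pandas code: fixed-length frequency codes
# mapped to their length in nanoseconds.
NANOS_PER_UNIT = {
    "D": 86_400_000_000_000,
    "H": 3_600_000_000_000,
    "T": 60_000_000_000,
    "S": 1_000_000_000,
    "L": 1_000_000,
    "U": 1_000,
    "N": 1,
}


def is_subperiod_sketch(source, target):
    """True if a whole number of `source` periods fits into one `target` period."""
    return NANOS_PER_UNIT[target] % NANOS_PER_UNIT[source] == 0


def is_superperiod_sketch(source, target):
    """True if one `source` period splits evenly into `target` periods."""
    return is_subperiod_sketch(target, source)


print(is_subperiod_sketch("L", "S"))    # True
print(is_subperiod_sketch("S", "L"))    # False
print(is_superperiod_sketch("S", "L"))  # True
```

With a check like this, freq = "L" against ax_freq = "S" would take the subperiod branch of _maybe_resample instead of reaching the final "else".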

Anyway, since I presume fixing the resampling problem may be tricky, I'd like to suggest an option (if one does not already exist) that tells pandas not to look for periods or treat the data as a time series, but simply to plot it.
Something like

seriesB.plot(please_oh_please_make_this_thing_plot=True)

which would make pandas not treat the data as a time series and force self._is_ts_plot() to be False.

@sinhrks
Member

sinhrks commented Jul 19, 2014

You can use x_compat=True; I hope it is shorter than you expected :) As you've pointed out, is_superperiod and is_subperiod don't handle frequencies higher than S. I'm trying to fix this as well.

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt

s1 = pd.Series(np.random.rand(50), index=pd.date_range('2014-07-01 09:00', freq='S', periods=50))
s2 = pd.Series(np.random.rand(500), index=pd.date_range('2014-07-01 09:00', freq='100L', periods=500))

s1.plot(x_compat=True)
s2.plot(x_compat=True)

[Figure: figure_1]

@mmajewsk
Author

@sinhrks
I want to make a note here:
I could not find "x_compat" in the API docs: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.plot.html#pandas.Series.plot

I strongly suggest adding it there, because that is the first place anyone would look for information about plotting, and I would not have thought that something covered in the tutorial is missing from the API docs.

Honestly, I thought this argument was more related to x-axis ticks than to plotting the data in general.
