Add cdfplot #5700

TomAugspurger

WIP for now. Closes #2669

I'm using statsmodels' KDE implementation right now; it has a cdf method, but scipy's gaussian_kde doesn't. I still need to check the math, but I think taking a cumsum of the density evaluated on a grid and normalizing by the total should give approximately the same result.

I'm also going to add kwargs for things like the inverse cdf.
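The cumsum idea can be checked with a small, self-contained sketch. It is not the code from this PR and uses neither statsmodels nor scipy: the hand-rolled Gaussian KDE, the sample `data`, the `bandwidth` and the grid size are all assumptions made for illustration. For a Gaussian kernel the exact CDF is the mean of per-point normal CDFs, so the normalized cumsum can be compared against it directly.

```python
import math
from itertools import accumulate


def gaussian_kde_pdf(data, bandwidth):
    """Return a function that evaluates a Gaussian KDE density at x."""
    norm = len(data) * bandwidth * math.sqrt(2.0 * math.pi)

    def pdf(x):
        return sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data) / norm

    return pdf


def gaussian_kde_cdf(data, bandwidth):
    """Exact CDF of the same KDE: the mean of the per-point normal CDFs."""
    scale = bandwidth * math.sqrt(2.0)

    def cdf(x):
        return sum(0.5 * (1.0 + math.erf((x - xi) / scale)) for xi in data) / len(data)

    return cdf


def cumsum_cdf(pdf, grid):
    """Approximate the CDF on an evenly spaced grid: cumsum of the density / total."""
    running = list(accumulate(pdf(x) for x in grid))
    return [r / running[-1] for r in running]


# Illustrative inputs (assumptions, not data from the PR).
data = [-1.3, -0.4, 0.0, 0.2, 0.9, 1.7, 2.5]
bandwidth = 0.5
n_points = 2001
lo = min(data) - 5 * bandwidth
hi = max(data) + 5 * bandwidth
grid = [lo + (hi - lo) * i / (n_points - 1) for i in range(n_points)]

approx = cumsum_cdf(gaussian_kde_pdf(data, bandwidth), grid)
exact_cdf = gaussian_kde_cdf(data, bandwidth)
max_error = max(abs(a - exact_cdf(x)) for a, x in zip(approx, grid))
print("largest gap between cumsum CDF and exact CDF:", max_error)
```

The two agree only up to a discretization error that shrinks as the grid gets finer, so the cumsum version is an approximation rather than an identity.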

[Image: cdf]

klass = KdePlot
elif kind == 'cdf': # should be unified

TODO (soonish): we should make this into a dict instead, would be simpler:

PLOT_KINDS = {"line": LinePlot,
              "bar": BarPlot,
              "barh": BarPlot,
              "kde": KdePlot,
              "cdf": CDFPlot}
# ... other stuff
if kind not in PLOT_KINDS:
    raise ValueError("Invalid chart type given: %s" % kind)
klass = PLOT_KINDS[kind]

@jtratner

@TomAugspurger not saying the dict thing is required to get this merged, but if you wanted to add a second commit that changed this all to two dicts (one for Series, one for DataFrame), rather than if/elif statements, that'd be a complementary addition to this PR.
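A rough sketch of what the two-dict variant might look like follows. The plot classes here are empty stand-ins rather than the real pandas classes, and `get_plot_class`, the dict names and the exact set of kinds in each dict (including the DataFrame-only `"scatter"` entry) are hypothetical names chosen for illustration.

```python
# Empty stand-ins for the real plot classes (assumption: only the names matter here).
class LinePlot: pass
class BarPlot: pass
class KdePlot: pass
class CDFPlot: pass
class ScatterPlot: pass


# Kinds shared by Series and DataFrame plotting (illustrative selection).
_COMMON_KINDS = {
    "line": LinePlot,
    "bar": BarPlot,
    "barh": BarPlot,
    "kde": KdePlot,
    "cdf": CDFPlot,
}

SERIES_PLOT_KINDS = dict(_COMMON_KINDS)
# Hypothetical: DataFrame plotting accepts one extra kind.
DATAFRAME_PLOT_KINDS = dict(_COMMON_KINDS, scatter=ScatterPlot)


def get_plot_class(kind, is_dataframe):
    """Look up the plot class for `kind`, replacing a chain of if/elif tests."""
    kinds = DATAFRAME_PLOT_KINDS if is_dataframe else SERIES_PLOT_KINDS
    if kind not in kinds:
        raise ValueError("Invalid chart type given: %s" % kind)
    return kinds[kind]


klass = get_plot_class("cdf", is_dataframe=False)
```

With this layout, adding a new kind is a one-line change to the relevant dict, and the error message for an unknown kind lives in a single place.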

jtratner mentioned this pull request Dec 14, 2013

@jseabold

jseabold commented Jan 4, 2014

FWIW, in statsmodels (and places in scipy.stats), we compute the cdf from the pdf using scipy.integrate.quad.

https://github.com/statsmodels/statsmodels/blob/master/statsmodels/nonparametric/kde.py#L161
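As a minimal sketch of the quad-based approach, the density of a scipy `gaussian_kde` can be integrated numerically up to a point. This is not the statsmodels implementation linked above: the helper name `kde_cdf`, the finite lower limit of eight kernel bandwidths below the smallest observation, and the random sample are assumptions made for illustration.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import gaussian_kde


def kde_cdf(kde, x, n_bandwidths=8.0):
    """Integrate a 1-D gaussian_kde density from the far left tail up to x."""
    # Kernel standard deviation; `covariance` is a 1x1 array for 1-D data.
    h = float(np.sqrt(kde.covariance[0, 0]))
    # Finite lower limit (assumption): the density mass below it is negligible.
    lower = float(kde.dataset.min()) - n_bandwidths * h
    if x <= lower:
        return 0.0
    value, _abserr = quad(lambda t: float(kde(t)[0]), lower, x)
    return value


# Illustrative sample (assumption, not data from the PR).
rng = np.random.default_rng(0)
sample = rng.normal(size=200)
kde = gaussian_kde(sample)

cdf_at_zero = kde_cdf(kde, 0.0)
print("estimated CDF at 0:", cdf_at_zero)
```

For a one-dimensional Gaussian KDE, scipy's `integrate_box_1d(low, high)` gives the same quantity in closed form, which makes it a convenient cross-check for the numerical integral.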

@TomAugspurger

Okay, this one can sit until the 0.14 cycle has started and I can add a release notes entry. I got on a bit of a DRYness run, so I ended up combining KdePlot and CDFPlot into a single DistributionPlot class.

I've only used statsmodels for the actual fitting of the cdf. Scipy doesn't provide a cdf on their kde object, so we'd have to duplicate the code that @jseabold linked to above. May as well just use what they've done.

And of course if this has fallen out of scope for pandas then close without merging!

@ghost

ghost commented Jan 29, 2014

We can't have a dependency on statsmodels, except for testing.
I remember kde plots were a bit suspect in terms of scope too,
expanding further is therefore suspect too.

Let's call this out of scope.

This pull request was closed.
Successfully merging this pull request may close these issues.

Add Cumulative Distribution Function Plotting
3 participants