
daily to dekadal functions #527

Merged
remicousin merged 16 commits into master from daily2dekadal on May 13, 2025

Conversation

remicousin
Contributor

Functions to compute dekadal totals from daily data and their tests. Initially needed to generate synthetic dekadal data from synthetic daily data to test Maprooms.

In the process, I (eventually!) discovered how to make use of the Interval objects that groupby functions return. It turns out they are pandas.Interval, but you need a surgical strike to get at the pandas object and its attributes: too soon and it's still a DataArray, too late and it's an ndarray... which tells you a lot about my understanding of how xarray, numpy, and pandas intertwine...
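A minimal sketch of that layering, purely illustrative (this is not code from the PR; the array, the "T" coordinate name, and the January dates are made up):

```python
import numpy as np
import pandas as pd
import xarray as xr

# 31 days of dummy daily data on a time coordinate named "T".
t = pd.date_range("2000-01-01", "2000-01-31", freq="D")
da = xr.DataArray(np.ones(t.size), coords={"T": t}, dims=["T"])

# Dekad edges for January: the 1st, 11th, 21st, and the 1st of the next month.
edges = pd.to_datetime(["2000-01-01", "2000-01-11", "2000-01-21", "2000-02-01"])
totals = da.groupby_bins("T", edges, right=False).sum()

coord = totals["T_bins"]    # coordinate whose values are pandas.Interval objects
too_soon = coord[0]         # still a 0-d DataArray, not yet the Interval
interval = too_soon.item()  # the pandas.Interval itself, with .left, .mid, .right
too_late = coord.values     # already a plain object ndarray
print(interval.left, interval.mid, interval.right)
```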

So there are 3 functions (a rough usage sketch follows the list):

  • groupedby_dekads, derived from xr.groupby_bins, returns a grouped object to which group methods (sum, mean, etc.) can be applied
  • swap_interval assigns a new coord at the left, mid, or right point of the interval dim that a grouped object returns after a method has been applied to it
  • daily2dekad_sum applies all of the above to return dekadal totals with a dim labeled at the beginning of each dekad (which is what we used in the zarrification scripts)
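A rough sketch of how the three functions could fit together, under assumptions about the edge computation, the signatures, and the default "T" coordinate name that may differ from the actual implementation in this PR:

```python
import pandas as pd
import xarray as xr


def dekad_edges(start, end):
    """Hypothetical helper: bin edges on the 1st, 11th and 21st of each month,
    extended past `end` so the last dekad has a closing edge."""
    days = pd.date_range(start, pd.Timestamp(end) + pd.Timedelta(days=11), freq="D")
    return pd.DatetimeIndex([d for d in days if d.day in (1, 11, 21)])


def groupedby_dekads(daily, time_dim="T"):
    """Group daily data into dekadal bins; the caller applies sum, mean, etc."""
    edges = dekad_edges(daily[time_dim].values[0], daily[time_dim].values[-1])
    return daily.groupby_bins(time_dim, edges, right=False)


def swap_interval(da, interval_dim, point="left"):
    """Replace a pandas.Interval coordinate by its left, mid or right points."""
    points = [getattr(iv, point) for iv in da[interval_dim].values]
    return da.assign_coords({interval_dim: points})


def daily2dekad_sum(daily, time_dim="T"):
    """Dekadal totals labeled by the first day of each dekad."""
    totals = groupedby_dekads(daily, time_dim).sum()
    return swap_interval(totals, f"{time_dim}_bins", "left").rename(
        {f"{time_dim}_bins": time_dim}
    )
```

With something along these lines, `daily2dekad_sum(daily_rainfall)` would return totals on a plain datetime coordinate sitting at each dekad's first day.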

For the very short-term purpose of feeding synthetic data into Maprooms, I will also need a mean version for temperature. It's trivial.
I didn't deal with units since we might use pint some day.

For the medium term, swap_interval can be applied to any sort of dim (not just time). There could be simplification/generalization to do after checking which xarray objects return Interval dims and whether they always name them the same way, or what-have-you. Right now I assumed some behavior.

For the longer term, this could open the door to a clean library for dealing with time: the Intervals contain all the information we need, and it can be retrieved. I imagine pair-wise operations could be applied to the Interval dims directly, which would force the Intervals to be consistent across the pair (Ingrid relies only on centers). Time-wise operations could probably be automated to start from Intervals, move to point-wise values for the sake of the operation, and return to the appropriate Intervals accordingly. But we have other discussions on time elsewhere, and maybe this is not new, or needs to be completed by what was said there.

@remicousin
Contributor Author

I mean, this PR is really just to have acceptable dekadal data to feed the synthetic Maprooms. We can move the broader time dimension of this work to where it belongs.

@aaron-kaplan
Collaborator

The Interval class is interesting. But if you have to convert it to points in order to do anything useful with it, then it's of limited utility.

@remicousin
Contributor Author

> The Interval class is interesting. But if you have to convert it to points in order to do anything useful with it, then it's of limited utility.

All the commits but the last address the comments of this PR. The last one adds a suite of functions that can replace all the others, including the older seasonal functions, for which confirming test cases have been added too.

You have to make a concession somewhere. With Intervals, you have to convert them to something to do the math. Without them, you need at least 2 coords to describe one time dimension...
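To make the concession concrete, here is a small illustrative sketch (the values and coordinate names are made up) of the two options: one Interval-valued coordinate versus two plain point coordinates on the same dimension:

```python
import pandas as pd
import xarray as xr

# Dekad intervals for January 2000.
dekads = pd.IntervalIndex.from_breaks(
    pd.to_datetime(["2000-01-01", "2000-01-11", "2000-01-21", "2000-02-01"]),
    closed="left",
)

# Option 1: a single Interval-valued coordinate; any arithmetic on time first
# requires pulling out .left, .mid or .right from the Intervals.
with_intervals = xr.DataArray([10.0, 12.0, 9.0], dims=["T"], coords={"T": dekads})

# Option 2: two plain datetime coordinates describing the same time dimension.
with_points = xr.DataArray(
    [10.0, 12.0, 9.0],
    dims=["T"],
    coords={"T": dekads.left, "T_end": ("T", dekads.right)},
)
```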

@aaron-kaplan
Collaborator

I've reviewed all but the last commit ("another suite of functions..."). Will have to continue with that next week.

@aaron-kaplan
Collaborator

I'm having trouble figuring out where you're going with this last commit. I'm a little sleepy, so maybe I'll have better luck another time; but also some high-level guidance might help.

Is this meant to be a piece of the general-purpose time library you mentioned in the last paragraph of the PR description? If so, then before reviewing a detailed implementation I would like to see a high-level summary of the proposed design. What class of problem are you trying to solve, how do resampling and regrouping contribute to solving those problems, and then what is the resampling and regrouping API you propose. Perhaps do some example calculations, just on paper, using the API you wish you had.

@remicousin
Contributor Author

The commits above address all but the "another suite..." commit.

@remicousin
Contributor Author

> I'm having trouble figuring out where you're going with this last commit. I'm a little sleepy, so maybe I'll have better luck another time; but also some high-level guidance might help.
>
> Is this meant to be a piece of the general-purpose time library you mentioned in the last paragraph of the PR description? If so, then before reviewing a detailed implementation I would like to see a high-level summary of the proposed design. What class of problem are you trying to solve, how do resampling and regrouping contribute to solving those problems, and then what is the resampling and regrouping API you propose. Perhaps do some example calculations, just on paper, using the API you wish you had.

Yes, it's an experiment toward a general-purpose time library. I still want to finish cleaning up the existing Python code we have (splitting data reading / apps, etc.) before I engage in that (or actually I could work on both in parallel if I don't dive into server log analyses). But I got carried away with the idea and stuck it in here. So maybe see it as an experiment. I can still write high-level considerations in here.

@remicousin
Contributor Author

So at minimum for this PR, the question to answer is whether it's a good/useful generalization of the groupby_dekads function written in the first part of the PR.

@aaron-kaplan
Collaborator

I would prefer to have one PR with a minimal implementation of just what we need now to make synthetic data for testing, and a different branch where we experiment with generalizations. The short-term minimal stuff might belong in a testutils.py rather than calc.py so we don't build too much on top of it before we decide it's the right approach.

@remicousin
Contributor Author

Moved the experimental "regroup" with pd.Interval to another branch.

@remicousin merged commit 279f07b into master on May 13, 2025
1 check passed
@remicousin deleted the daily2dekadal branch on May 13, 2025 at 18:26