Skip to content

[enhancement] ICC_rep_anova could be significantly faster than it is when running on images #3406

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
bbfrederick opened this issue Nov 9, 2021 · 4 comments

Comments

@bbfrederick
Copy link
Contributor

Summary

ICC_rep_anova calculates ICC(3,1) on a table of subjects and repeated measures. Setting up the design matrix is far more computationally expensive than actually calculating ICC from the input data (by around a factor of 100 for 2000 subjects and 2 repeated measures). For repeated calculations (such as calculating ICC on every voxel of a set of images), this goes much more slowly than it should.

I made a modified version of ICC_rep_anova for a project I'm working on where I needed the speedup for the calculation to be practical; It's simple enough to cache the design matrix calculations and put in some if statements to decide if they need to be calculated/recalculated. I don't know if this is something anybody else actually ever wants to do, but if so, it's an easy fix.

Actual behavior

As implemented, the design matrix setup is performed on each call.

Expected behavior

The setup only depends on the shape of the input table. When running the calculation on an image, this shape will be the same for each voxel, so does not need to be redone. If the routine is uninitialized, the design information should be calculated and cached. If the routine is called again, and the design matrix is unchanged, this information should be retrieved and used. If the design matrix is changed, the cached information should be discarded and recalculated.

Script/Workflow details

A proposed fix is here in lines 89 and 96-123

This seems to pass all of the tests.

@effigies
Copy link
Member

effigies commented Nov 9, 2021

Hi Blaise. Yes, we should definitely refactor this for efficiency. Would you care to open a PR? That will make it easier to comment with specific suggestions.

@bbfrederick
Copy link
Contributor Author

Sure thing. Just following the proper order from the guidelines (issue first, PR next!)

@effigies
Copy link
Member

effigies commented Nov 9, 2021

Ah, possibly we should revise those. Personally I solve my own problems and then open a PR to discuss whether they can be merged upstream. The issue -> PR pipeline makes more sense to me when I haven't already written code and wouldn't bother if I knew it would never get merged.

@bbfrederick
Copy link
Contributor Author

I realized that there is one off topic addition to this PR - I added a nan_to_num to the final ICC calculation because I was very occasionally getting back NaNs for some particularly weird data. I'm happy to lose that if that's not considered best practice (i.e. maybe you should get the NaN back so that you know your input data is somehow unsound).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants