[enhancement] ICC_rep_anova could be significantly faster than it is when running on images #3406

bbfrederick · 2021-11-09T16:02:21Z

Summary

ICC_rep_anova calculates ICC(3,1) on a table of subjects and repeated measures. Setting up the design matrix is far more computationally expensive than calculating ICC from the input data (by roughly a factor of 100 for 2000 subjects and 2 repeated measures). For repeated calculations, such as computing ICC at every voxel of a set of images, this makes the routine much slower than it needs to be.

I made a modified version of ICC_rep_anova for a project I'm working on, where I needed the speedup for the calculation to be practical. It's simple enough to cache the design matrix calculations and add a few if statements to decide whether they need to be calculated or recalculated. I don't know if this is something anybody else actually wants to do, but if so, it's an easy fix.

Actual behavior

As implemented, the design matrix setup is performed on each call.

Expected behavior

The setup depends only on the shape of the input table. When running the calculation on an image, this shape is the same for every voxel, so the setup does not need to be redone. If the routine is uninitialized, the design information should be calculated and cached. If the routine is called again and the shape (and hence the design matrix) is unchanged, the cached information should be retrieved and used. If the shape changes, the cached information should be discarded and recalculated.

Script/Workflow details

A proposed fix is here in lines 89 and 96-123

This seems to pass all of the tests.

effigies · 2021-11-09T16:49:14Z

Hi Blaise. Yes, we should definitely refactor this for efficiency. Would you care to open a PR? That will make it easier to comment with specific suggestions.

bbfrederick · 2021-11-09T17:12:25Z

Sure thing. Just following the proper order from the guidelines (issue first, PR next!)

effigies · 2021-11-09T17:26:08Z

Ah, possibly we should revise those. Personally I solve my own problems and then open a PR to discuss whether they can be merged upstream. The issue -> PR pipeline makes more sense to me when I haven't already written code and wouldn't bother if I knew it would never get merged.

bbfrederick · 2021-11-09T17:44:09Z

I realized that there is one off-topic addition to this PR: I added a nan_to_num to the final ICC calculation because I was very occasionally getting back NaNs for some particularly weird data. I'm happy to drop that if it's not considered best practice (i.e. maybe you should get the NaN back so that you know your input data is somehow unsound).
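
For reference, a small sketch of the trade-off. The helper `safe_ratio` and its `mask_nan` flag are hypothetical and not part of the PR; the example only assumes that the NaN comes from a 0/0 in the final ratio, as it would for a table with no variance at all.

```python
import numpy as np


def safe_ratio(numerator, denominator, mask_nan=False):
    """Divide two scalars, optionally replacing a NaN result with 0.0."""
    # Suppress the 0/0 runtime warning so the caller decides how to handle it.
    with np.errstate(divide="ignore", invalid="ignore"):
        value = np.float64(numerator) / np.float64(denominator)
    # np.nan_to_num maps NaN to 0.0 (and infinities to large finite values).
    return np.nan_to_num(value) if mask_nan else value
```

With `mask_nan=False` the caller sees the NaN and can flag the voxel as unsound; with `mask_nan=True` the result silently becomes 0.0, which is indistinguishable from a genuine ICC of zero.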

bbfrederick mentioned this issue Nov 9, 2021

RF: Optimize ICC_rep_anova using a global cache #3407

Closed

1 task

effigies closed this as completed Mar 23, 2023
