You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
ICC_rep_anova calculates ICC(3,1) on a table of subjects and repeated measures. Setting up the design matrix is far more computationally expensive than actually calculating ICC from the input data (by around a factor of 100 for 2000 subjects and 2 repeated measures). For repeated calculations (such as calculating ICC on every voxel of a set of images), this goes much more slowly than it should.
I made a modified version of ICC_rep_anova for a project I'm working on where I needed the speedup for the calculation to be practical; It's simple enough to cache the design matrix calculations and put in some if statements to decide if they need to be calculated/recalculated. I don't know if this is something anybody else actually ever wants to do, but if so, it's an easy fix.
Actual behavior
As implemented, the design matrix setup is performed on each call.
Expected behavior
The setup only depends on the shape of the input table. When running the calculation on an image, this shape will be the same for each voxel, so does not need to be redone. If the routine is uninitialized, the design information should be calculated and cached. If the routine is called again, and the design matrix is unchanged, this information should be retrieved and used. If the design matrix is changed, the cached information should be discarded and recalculated.
Hi Blaise. Yes, we should definitely refactor this for efficiency. Would you care to open a PR? That will make it easier to comment with specific suggestions.
Ah, possibly we should revise those. Personally I solve my own problems and then open a PR to discuss whether they can be merged upstream. The issue -> PR pipeline makes more sense to me when I haven't already written code and wouldn't bother if I knew it would never get merged.
I realized that there is one off topic addition to this PR - I added a nan_to_num to the final ICC calculation because I was very occasionally getting back NaNs for some particularly weird data. I'm happy to lose that if that's not considered best practice (i.e. maybe you should get the NaN back so that you know your input data is somehow unsound).
Summary
ICC_rep_anova calculates ICC(3,1) on a table of subjects and repeated measures. Setting up the design matrix is far more computationally expensive than actually calculating ICC from the input data (by around a factor of 100 for 2000 subjects and 2 repeated measures). For repeated calculations (such as calculating ICC on every voxel of a set of images), this goes much more slowly than it should.
I made a modified version of ICC_rep_anova for a project I'm working on where I needed the speedup for the calculation to be practical; It's simple enough to cache the design matrix calculations and put in some if statements to decide if they need to be calculated/recalculated. I don't know if this is something anybody else actually ever wants to do, but if so, it's an easy fix.
Actual behavior
As implemented, the design matrix setup is performed on each call.
Expected behavior
The setup only depends on the shape of the input table. When running the calculation on an image, this shape will be the same for each voxel, so does not need to be redone. If the routine is uninitialized, the design information should be calculated and cached. If the routine is called again, and the design matrix is unchanged, this information should be retrieved and used. If the design matrix is changed, the cached information should be discarded and recalculated.
Script/Workflow details
A proposed fix is here in lines 89 and 96-123
This seems to pass all of the tests.
The text was updated successfully, but these errors were encountered: