Description
🚀 The feature
For each classification dataset with a balanced class distribution (MNIST, CIFAR-N, etc.), it would be very useful to provide a standard imbalanced version of the dataset. For a dataset with
I am not sure whether torchvision should provide these datasets directly or provide a data loader that imbalances an existing dataset.
Motivation, pitch
Many papers address the problem of training on an imbalanced dataset and testing on a balanced one; for instance, see this. As far as I know, there is no systematic way of generating such datasets for people using PyTorch. Here are a few very similar implementations, none of which is fully satisfying:
- https://github.com/zhangyongshun/BagofTricks-LT/blob/main/lib/dataset/cao_cifar.py
- https://github.com/KaihuaTang/Long-Tailed-Recognition.pytorch/blob/master/classification/data/ImbalanceCIFAR.py
Such datasets seem to exist for TensorFlow; for instance, section 3 of the README of this repo provides links to download TFRecord datasets.
I feel it would be a very nice feature for torchvision to either include such datasets or make them easy to craft.
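To make the "craft them easily" option more concrete, here is a minimal sketch of the kind of helper I have in mind. It is only an illustration: the function name `long_tailed_indices` and the `imbalance_ratio` parameter are hypothetical and not part of torchvision, and the exponential decay of class sizes is an assumption modelled on what the implementations linked above appear to do. It works on a plain list of labels so it does not depend on any particular dataset class.

```python
import random
from collections import Counter, defaultdict


def long_tailed_indices(labels, imbalance_ratio=0.01, seed=0):
    """Pick a subset of indices whose class sizes decay exponentially.

    Hypothetical helper, not a torchvision API. With C classes taken in
    sorted label order, the class at rank k keeps roughly
    n_max * imbalance_ratio ** (k / (C - 1)) samples, where n_max is the
    size of the largest class in the original data.
    """
    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[int(label)].append(idx)

    classes = sorted(by_class)
    n_max = max(len(by_class[c]) for c in classes)
    rng = random.Random(seed)  # seeded so the subset is reproducible

    kept = []
    for rank, c in enumerate(classes):
        exponent = rank / max(len(classes) - 1, 1)
        target = max(1, int(n_max * imbalance_ratio ** exponent))
        target = min(target, len(by_class[c]))  # never ask for more than exists
        kept.extend(rng.sample(by_class[c], target))
    return sorted(kept)


# Synthetic balanced labels: 10 classes, 500 samples each.
labels = [i % 10 for i in range(5000)]
indices = long_tailed_indices(labels, imbalance_ratio=0.01)
counts = Counter(labels[i] for i in indices)
```

With a real dataset, the returned indices could presumably be passed to `torch.utils.data.Subset(dataset, indices)` to build the imbalanced training set while the test set stays balanced. Fixing the seed and the ratio is what would make such a subset a shared, reproducible benchmark rather than a per-paper variant.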
Alternatives
No response
Additional context
No response
cc @pmeier