DESIGN: NA values in floating point arrays #46
Description
Do we want to continue to use NaN? There are possible computational benefits to doing so, but we have the opportunity with pandas 2.0 to change to using bitmaps everywhere, which brings a lot of consistency to how missing data is handled (for example: isnull
becomes free under this scenario). We may want to do some benchmarking to understand the overhead of using bitmaps in arithmetic and aggregations (rather than letting the CPU propagate NaN where relevant).
One benefit of the bitmap route is that we can use hardware popcount to skip null checking on groups of 64 or 128 bits at a time (or more, if there are AVX popcount instructions available, not actually sure about this), so the performance in aggregations may actually get a lot better on mostly non-null data.