blosc2.jit support for pandas UDFs #383

datapythonista · 2025-04-13T19:28:14Z

We discussed this informally in the past; this issue lays out more clearly how blosc2.jit and pandas can interact.

I'm about to open a PR in pandas to support this:

import numpy as np
import pandas
import blosc2

def my_func(x):
    return np.sin(x * 2)

s = pandas.Series([1, 2, 3], index=list('abc'), name='sample')

# normal call executed by pandas
print(s.map(my_func))

# we let blosc2 handle this
print(s.map(my_func, engine=blosc2.jit))

To be able to do this, we would need blosc2 to implement a new interface. The implementation shouldn't be too complex, something like the following. The example ignores skip_na and omits a second method, apply, for column-wise operations (where the function is called with the whole array, not with each scalar):

import numpy as np
import blosc2

# Reference base class: https://github.com/pandas-dev/pandas/blob/main/pandas/core/apply.py#L77
class Blosc2ExecutionEngine:
    @staticmethod
    def map(data, func, args, kwargs, decorator, skip_na):
        if not isinstance(data, np.ndarray):
            # we probably received a Series
            if hasattr(data, "values"):
                data = data.values
            else:
                # there is a chance that we call this with a pyarrow object in the future
                # f-string and type(...) so the message names the unsupported type
                raise ValueError(f"blosc2.jit does not support {type(data).__name__}")

        func = decorator(func)
        result = func(data, *args, **kwargs)
        return result


blosc2.jit.__pandas_udf__ = Blosc2ExecutionEngine

The advantage of this approach over just decorating the function is that the whole execution loop can be jitted, not only the individual calls.
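To make the dispatch concrete, here is a minimal sketch of how a caller might route a map call through the `__pandas_udf__` attribute. It is an illustration only: `ToyExecutionEngine`, `toy_jit` and `dispatch_map` are hypothetical stand-ins (pure Python, no pandas, NumPy or blosc2 required), and the actual pandas dispatch logic may differ. Unlike blosc2.jit, the toy decorator simply wraps a scalar function in a loop.

```python
import math


class ToyExecutionEngine:
    """Hypothetical stand-in for Blosc2ExecutionEngine."""

    @staticmethod
    def map(data, func, args, kwargs, decorator, skip_na):
        # Unwrap Series-like objects, mirroring the `.values` check above.
        # skip_na is accepted but ignored, as in the example in this issue.
        if hasattr(data, "values"):
            data = data.values
        func = decorator(func)
        # The engine, not the caller, owns the loop over the data.
        return func(data, *args, **kwargs)


def toy_jit(func):
    """Hypothetical decorator: lifts a scalar function to a whole container."""
    def wrapper(values, *args, **kwargs):
        return [func(v, *args, **kwargs) for v in values]
    return wrapper


# Same attachment pattern as `blosc2.jit.__pandas_udf__ = ...` above.
toy_jit.__pandas_udf__ = ToyExecutionEngine


def dispatch_map(data, func, engine=None, args=(), kwargs=None):
    """Assumed caller-side logic; not the actual pandas implementation."""
    kwargs = kwargs or {}
    udf_engine = getattr(engine, "__pandas_udf__", None)
    if udf_engine is None:
        # No engine given: plain element-by-element Python loop.
        return [func(v, *args, **kwargs) for v in data]
    return udf_engine.map(data, func, args, kwargs, decorator=engine, skip_na=False)


def my_func(x):
    return math.sin(x * 2)


plain = dispatch_map([1, 2, 3], my_func)
jitted = dispatch_map([1, 2, 3], my_func, engine=toy_jit)
```

Both calls produce the same values; the difference is that in the second call the engine receives the whole container and decides how to iterate over it, which is what would let a real engine compile the loop.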

What do you think? Is this something you'd like to implement? Any feedback? It's designed in a way that you don't need to add a dependency on pandas. We aim to have Numba and Bodo supporting this same interface, and possibly others.


FrancescAlted · 2025-04-14T11:44:04Z

Sure. That code seems quite unobtrusive, and we would be happy to serve the pandas community. Would you mind sending a PR?

datapythonista linked a pull request May 24, 2025 that will close this issue

Add support for new pandas UDF engine #418

Open
