We discussed this informally in the past. This issue describes more concretely how blosc2.jit and pandas can interact.
I'm about to open a PR in pandas to support this:

    import numpy as np
    import pandas
    import blosc2

    def my_func(x):
        return np.sin(x * 2)

    s = pandas.Series([1, 2, 3], index=list('abc'), name='sample')

    # normal call executed by pandas
    print(s.map(my_func))

    # we let blosc2 handle this
    print(s.map(my_func, engine=blosc2.jit))

To be able to do this, we would need blosc2 to implement a new interface. The implementation shouldn't be too complex. Something like the following would do; the example ignores skip_na and omits a second method, apply, for column-wise operations (where the function is called with the whole array, not each scalar):

    import numpy as np
    import blosc2

    # Reference base class: https://github.com/pandas-dev/pandas/blob/main/pandas/core/apply.py#L77
    class Blosc2ExecutionEngine:
        @staticmethod
        def map(data, func, args, kwargs, decorator, skip_na):
            if not isinstance(data, np.ndarray):
                # we probably received a Series
                if hasattr(data, "values"):
                    data = data.values
                else:
                    # there is a chance that we call this with a pyarrow object in the future
                    raise ValueError(f"blosc2.jit does not support {type(data).__name__}")
            # decorator is the engine itself (blosc2.jit); apply it once to the function
            func = decorator(func)
            # call the decorated function with the whole array, not element by element
            result = func(data, *args, **kwargs)
            return result

    blosc2.jit.__pandas_udf__ = Blosc2ExecutionEngine

The advantage of this approach over just decorating the function is that the whole execution loop can be jitted, not only the individual calls.
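To make the difference concrete, here is a minimal sketch that uses only the standard library. Everything in it is a hypothetical stand-in: `counting_decorator` plays the role of a JIT decorator such as blosc2.jit (it only counts calls and compiles nothing), and `ToyExecutionEngine` mirrors the `map` signature from the example above. Because plain Python lists are not vectorized, the toy engine writes the loop out explicitly and decorates that loop, whereas the example above calls the decorated function directly on the array.

```python
import math


def counting_decorator(func):
    """Hypothetical stand-in for a JIT decorator: counts calls, compiles nothing."""
    def wrapper(*args, **kwargs):
        counting_decorator.calls += 1
        return func(*args, **kwargs)
    return wrapper


counting_decorator.calls = 0


class ToyExecutionEngine:
    """Hypothetical engine with the same map signature as the example above."""

    @staticmethod
    def map(data, func, args, kwargs, decorator, skip_na):
        # skip_na is ignored here, as in the example above.
        def loop(values):
            # Plain lists are not vectorized, so the loop is written out.
            return [func(v, *args, **kwargs) for v in values]

        # The decorator wraps the whole loop, so it is entered only once.
        return decorator(loop)(data)


# Attach the engine the same way the example attaches it to blosc2.jit.
counting_decorator.__pandas_udf__ = ToyExecutionEngine


def my_func(x):
    return math.sin(x * 2)


data = [1.0, 2.0, 3.0]

# Option 1: decorate the function; the caller's loop stays outside the decorator.
decorated = counting_decorator(my_func)
per_call = [decorated(x) for x in data]
calls_per_element = counting_decorator.calls

# Option 2: hand the loop to the engine found on the decorator.
counting_decorator.calls = 0
engine = counting_decorator.__pandas_udf__
whole_loop = engine.map(data, my_func, (), {}, counting_decorator, False)
calls_whole_loop = counting_decorator.calls

print(calls_per_element, calls_whole_loop)
```

Both options produce the same values, but the decorated code is entered once per element in the first case and once in total in the second. With a real JIT, that second case is where the loop itself can be compiled.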
What do you think? Is this something you'd like to implement? Any feedback? It's designed in a way that you don't need to add a dependency on pandas. We aim to have Numba and Bodo supporting this same interface, and possibly others.
xref pandas-dev/pandas#61125