Use BaseExecutionEngine for Python and Numba engines #61458

datapythonista · 2025-05-19T13:04:52Z

In #61032 we created a new base class, BaseExecutionEngine, that engines can subclass to handle apply and map operations. The base class was initially created so that third-party engines can be passed to DataFrame.apply(..., engine=third_party_engine). But our core engines, Python and Numba, can also be implemented as subclasses of this base class. This would make the code cleaner and more maintainable, and it may make it easy to move the Numba engine out of the pandas code base.

The whole migration to the new interface is quite a big change, so it's recommended to make the transition step by step, in small pull requests.
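To make the idea concrete, here is a minimal sketch of the pattern. It does not import anything from pandas: the BaseExecutionEngine below is a local stand-in, its method names and parameters are my reading of #61032 and should be checked against pandas/core/apply.py, and ListEngine is a hypothetical engine that works on a plain list of rows rather than a DataFrame.

```python
import abc


class BaseExecutionEngine(abc.ABC):
    """Local stand-in for the pandas base class; NOT imported from pandas."""

    @staticmethod
    @abc.abstractmethod
    def map(data, func, args, kwargs, decorator, skip_na):
        """Run func on every element of data."""

    @staticmethod
    @abc.abstractmethod
    def apply(data, func, args, kwargs, decorator, axis):
        """Run func along one axis of data."""


class ListEngine(BaseExecutionEngine):
    """Hypothetical engine operating on a list of rows (assumption)."""

    @staticmethod
    def map(data, func, args, kwargs, decorator, skip_na):
        if decorator is not None:
            func = decorator(func)
        # With skip_na=True, missing values (None here) are passed through.
        return [
            [x if (skip_na and x is None) else func(x, *args, **kwargs) for x in row]
            for row in data
        ]

    @staticmethod
    def apply(data, func, args, kwargs, decorator, axis):
        if decorator is not None:
            func = decorator(func)
        # axis=0: one call per column; any other value: one call per row.
        groups = zip(*data) if axis == 0 else data
        return [func(list(group), *args, **kwargs) for group in groups]


rows = [[1, 2], [3, 4]]
col_sums = ListEngine.apply(rows, sum, (), {}, None, axis=0)
row_sums = ListEngine.apply(rows, sum, (), {}, None, axis=1)
doubled = ListEngine.map(rows, lambda x: x * 2, (), {}, None, skip_na=False)
```

The point of the sketch is only that each engine becomes one class with the same two entry points, so the calling code does not need to know which engine it is talking to.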

arthurlw · 2025-05-20T11:35:19Z

Thanks for assigning me this @datapythonista ! This looks interesting to work on and I'll start looking into it.

datapythonista · 2025-05-20T12:27:56Z

Thanks @arthurlw. A possible approach would be to start with Numba only. The Numba engine is only implemented for DataFrame.apply for now, and only for certain kinds of arguments. For example, it doesn't work with ufuncs.

I think the whole Numba engine was introduced in two PRs, #54666 and #55104, and hasn't changed much since. So it should be easy to see all the changes implemented for the engine.

The main logic is implemented here: https://github.com/pandas-dev/pandas/blob/main/pandas/core/apply.py#L1096

I think having the whole Numba engine as a subclass of the base executor would already be quite valuable, and much easier than refactoring all the Python engine code.

For reference, you have an implementation of a third-party executor engine in this PR: https://github.com/bodo-ai/Bodo/pull/410/files
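As a rough illustration of an engine that supports only part of the interface, the sketch below rejects map entirely and accepts only plain Python functions in apply. Everything in it is an assumption for illustration: BaseExecutionEngine is a local stand-in rather than the pandas class, NumbaLikeEngine and fake_jit are hypothetical names, the function-type check is just one possible way to refuse ufunc-like callables, and nothing is actually compiled with Numba.

```python
import abc
import types


class BaseExecutionEngine(abc.ABC):
    """Local stand-in for the pandas base class; NOT imported from pandas."""

    @staticmethod
    @abc.abstractmethod
    def map(data, func, args, kwargs, decorator, skip_na): ...

    @staticmethod
    @abc.abstractmethod
    def apply(data, func, args, kwargs, decorator, axis): ...


class NumbaLikeEngine(BaseExecutionEngine):
    """Hypothetical engine that implements only a subset of the interface."""

    @staticmethod
    def map(data, func, args, kwargs, decorator, skip_na):
        raise NotImplementedError("this engine only implements apply")

    @staticmethod
    def apply(data, func, args, kwargs, decorator, axis):
        # Assumed check: accept plain Python functions only, so builtins
        # and ufunc-like callables are rejected up front.
        if not isinstance(func, types.FunctionType):
            raise NotImplementedError("only plain Python functions are supported")
        compiled = decorator(func) if decorator is not None else func
        # axis=0: one call per column; any other value: one call per row.
        groups = zip(*data) if axis == 0 else data
        return [compiled(list(group), *args, **kwargs) for group in groups]


def fake_jit(func):
    """Stand-in for a JIT decorator such as numba.jit; compiles nothing."""
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


def total(values):
    return sum(values)


result = NumbaLikeEngine.apply([[1, 2], [3, 4]], total, (), {}, fake_jit, axis=1)
```

Keeping the unsupported cases as explicit NotImplementedError raises inside the engine class would keep the restrictions next to the code they apply to, which seems to fit the step-by-step migration suggested above.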

arthurlw · 2025-05-22T12:55:25Z

Hey @datapythonista I’ve been thinking about how to best organize the engine subclasses and avoid circular imports. One option is to move the base class and all engine implementations into a new pandas/core/engines/ sub-package:

pandas/core/
├─ apply.py
└─ engines/
   ├─ base.py              # BaseExecutionEngine
   ├─ python_engine.py     # PythonExecutionEngine
   └─ numba_engine.py      # NumbaExecutionEngine

This keeps each engine in its own file and provides a clear plugin point for third-party engines. What do you think?

datapythonista · 2025-05-22T14:18:24Z

This looks reasonable. I'd probably start by creating the NumbaExecutionEngine class in apply.py for now, as I think it'll be fairly small, and keeping it in the same file also avoids circular imports. But once we properly split the Python and the Numba engines, I think it makes sense to split it this way. It might be clearer to name the directory/module apply, since engine can mean different things in pandas.

datapythonista assigned arthurlw May 19, 2025

datapythonista added the Apply Apply, Aggregate, Transform, Map label May 19, 2025

datapythonista mentioned this issue May 19, 2025

DOC: User Guide Page on user-defined functions #61195

Merged

5 tasks
