Use BaseExecutionEngine for Python and Numba engines #61458

Open
datapythonista opened this issue May 19, 2025 · 4 comments
Labels: Apply (Apply, Aggregate, Transform, Map)

Comments

@datapythonista
Member

In #61032 we created a new base class, BaseExecutionEngine, that engines can subclass to handle apply and map operations. The base class was initially created so that third-party engines can be passed to DataFrame.apply(..., engine=third_party_engine). But our core engines, Python and Numba, can also be implemented as subclasses of this base class. This would make the code cleaner and more maintainable, and it may make it easy to move the Numba engine out of the pandas code base.

The whole migration to the new interface is quite a big change, so it's recommended to make the transition step by step, in small pull requests.
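For illustration, the engine interface described above could be sketched roughly as follows. This is a minimal stand-alone sketch, not the pandas implementation: the `BaseExecutionEngine` defined here is a stand-in, its method signatures are assumptions, `PythonEngine` is a hypothetical name, and a plain list of rows stands in for a DataFrame.

```python
from abc import ABC, abstractmethod
from typing import Callable, Optional

Table = list  # stand-in for a DataFrame: a list of rows (assumption)


class BaseExecutionEngine(ABC):
    """Stand-in for the pandas base class; signatures are assumptions."""

    @staticmethod
    @abstractmethod
    def map(data: Table, func: Callable, args: tuple, kwargs: dict,
            decorator: Optional[Callable], skip_na: bool) -> Table:
        """Apply ``func`` to every element."""

    @staticmethod
    @abstractmethod
    def apply(data: Table, func: Callable, args: tuple, kwargs: dict,
              decorator: Optional[Callable], axis: int) -> list:
        """Apply ``func`` to every column (axis=0) or row (axis=1)."""


class PythonEngine(BaseExecutionEngine):
    """Hypothetical pure-Python engine implementing both operations."""

    @staticmethod
    def map(data, func, args, kwargs, decorator, skip_na):
        if decorator is not None:
            func = decorator(func)
        # With skip_na=True, missing values (None here) pass through untouched.
        return [
            [x if (skip_na and x is None) else func(x, *args, **kwargs)
             for x in row]
            for row in data
        ]

    @staticmethod
    def apply(data, func, args, kwargs, decorator, axis):
        if decorator is not None:
            func = decorator(func)
        # axis=1 iterates over rows; axis=0 transposes to iterate over columns.
        vectors = data if axis == 1 else [list(col) for col in zip(*data)]
        return [func(vec, *args, **kwargs) for vec in vectors]


table = [[1, 2], [3, 4]]
column_sums = PythonEngine.apply(table, sum, (), {}, None, axis=0)
row_sums = PythonEngine.apply(table, sum, (), {}, None, axis=1)
doubled = PythonEngine.map([[1, None]], lambda x: x * 2, (), {}, None, skip_na=True)
```

Because both operations sit behind the same two static methods, the calling code does not need to know which engine it is talking to, which is what would let core and third-party engines share one code path.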

@arthurlw
Member

Thanks for assigning this to me, @datapythonista! This looks interesting to work on, and I'll start looking into it.

@datapythonista
Member Author

Thanks @arthurlw. A possible approach is to start with Numba only. The Numba engine is only implemented for DataFrame.apply for now, and only for certain parameter types. For example, it doesn't work with ufuncs.

I think the whole Numba engine was introduced in two PRs, #54666 and #55104, and hasn't changed much since. So it should be easy to see all the changes made for the engine.

The main logic is implemented here: https://github.com/pandas-dev/pandas/blob/main/pandas/core/apply.py#L1096

I think having the whole Numba engine as a subclass of the base executor would already be quite valuable, and much easier than refactoring all the Python engine code.

For reference, you have an implementation of a third-party executor engine in this PR: https://github.com/bodo-ai/Bodo/pull/410/files
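As a rough sketch of the Numba-only first step suggested above, an engine subclass might implement only `apply` and compile the function once through a decorator. Everything here is an assumption for illustration: the base class is a stand-in, `NumbaStyleEngine` and `fake_jit` are hypothetical names, and `fake_jit` only records calls rather than compiling anything (a real engine would receive something like a Numba JIT decorator).

```python
from abc import ABC, abstractmethod


class BaseExecutionEngine(ABC):
    """Minimal stand-in for the pandas base class (an assumption)."""

    @staticmethod
    @abstractmethod
    def map(data, func, args, kwargs, decorator, skip_na): ...

    @staticmethod
    @abstractmethod
    def apply(data, func, args, kwargs, decorator, axis): ...


class NumbaStyleEngine(BaseExecutionEngine):
    """Hypothetical engine that compiles the function before looping."""

    @staticmethod
    def map(data, func, args, kwargs, decorator, skip_na):
        # The thread notes the Numba engine only covers DataFrame.apply.
        raise NotImplementedError("this engine only supports apply")

    @staticmethod
    def apply(data, func, args, kwargs, decorator, axis):
        if decorator is None:
            raise ValueError("a JIT decorator is required")
        compiled = decorator(func)  # compile once, outside the loop
        # A list of rows stands in for a DataFrame; axis=0 walks columns.
        vectors = data if axis == 1 else [list(c) for c in zip(*data)]
        return [compiled(v, *args, **kwargs) for v in vectors]


compile_log = []


def fake_jit(func):
    """Stand-in for a JIT decorator; records each 'compilation'."""
    compile_log.append(func.__name__)
    return func


def row_total(values):
    return sum(values)


totals = NumbaStyleEngine.apply(
    [[1, 2], [3, 4]], row_total, (), {}, fake_jit, axis=1
)
```

Raising from `map` keeps the unsupported operation explicit instead of silently falling back to another engine, which mirrors the limited scope described in the comment above.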

@arthurlw
Member

Hey @datapythonista, I’ve been thinking about how best to organize the engine subclasses and avoid circular imports. One option is to move the base class and all engine implementations into a new pandas/core/engines/ sub-package:

pandas/core/
├─ apply.py
└─ engines/
   ├─ base.py              # BaseExecutionEngine
   ├─ python_engine.py     # PythonExecutionEngine
   └─ numba_engine.py      # NumbaExecutionEngine

This keeps each engine in its own file and provides a clear plugin point for third-party engines. What do you think?

@datapythonista
Member Author

This looks reasonable. I'd probably start by creating the NumbaExecutionEngine class in apply.py for now, as I think it'll be fairly small, and keeping it in the same file also avoids circular imports. But once we properly split the Python and Numba engines, I think it makes sense to organize them this way. Maybe it'd be clearer to name the directory/module apply, since "engine" can mean different things in pandas.
