Use BaseExecutionEngine for Python and Numba engines #61458
Comments
Thanks for assigning me this, @datapythonista! This looks interesting to work on, and I'll start looking into it.
Thanks @arthurlw. A possible approach could be to start with numba only. The numba engine is only implemented for I think all of the numba engine was introduced in two PRs, #54666 and #55104, and hasn't changed much. So it should be easy to see all the changes implemented for the engine. The main logic is implemented here: https://github.com/pandas-dev/pandas/blob/main/pandas/core/apply.py#L1096 I think having the whole numba engine as a subclass of the base executor would already be quite valuable, and much easier than refactoring all the Python engine code. For reference, there is an implementation of a third-party executor engine in this PR: https://github.com/bodo-ai/Bodo/pull/410/files
Hey @datapythonista I’ve been thinking about how to best organize the engine subclasses and avoid circular imports. One option is to move the base class and all engine implementations into a new
This keeps each engine in its own file and provides a clear plugin point for third-party engines. What do you think?
This looks reasonable. I'd probably start creating the
In #61032 we created a new base class, `BaseExecutionEngine`, that engines can subclass to handle `apply` and `map` operations. The base class was initially created to allow third-party engines to be passed to `DataFrame.apply(..., engine=third_party_engine)`. But our core engines, Python and Numba, can also be implemented as subclasses of this base class. This will make the code cleaner and more maintainable, and it may make it easy to move the Numba engine outside of the pandas code base.

The whole migration to the new interface is quite a big change, so it's recommended to make the transition step by step, in small pull requests.
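
To make the idea concrete, here is a minimal sketch of what a core engine written as a subclass might look like. It does not import pandas: the `BaseExecutionEngine` below is a stand-in whose method signatures are my assumption of the interface from #61032, and `PythonEngine` and `run_apply` are hypothetical names used only for illustration. A plain list of rows stands in for a `DataFrame`, and `None` stands in for a missing value.

```python
import abc


class BaseExecutionEngine(abc.ABC):
    """Stand-in for the pandas base class (assumed interface, not imported from pandas)."""

    @staticmethod
    @abc.abstractmethod
    def map(data, func, args, kwargs, decorator, skip_na):
        """Apply ``func`` to each element of ``data``."""

    @staticmethod
    @abc.abstractmethod
    def apply(data, func, args, kwargs, decorator, axis):
        """Apply ``func`` to each column (axis=0) or row (axis=1) of ``data``."""


class PythonEngine(BaseExecutionEngine):
    """Hypothetical pure-Python engine; a list of rows stands in for a DataFrame."""

    @staticmethod
    def map(data, func, args, kwargs, decorator, skip_na):
        # An engine may wrap the function first (e.g. with a JIT decorator).
        if decorator is not None:
            func = decorator(func)
        # With skip_na, missing values (None here) pass through untouched.
        return [
            value if (skip_na and value is None) else func(value, *args, **kwargs)
            for value in data
        ]

    @staticmethod
    def apply(data, func, args, kwargs, decorator, axis):
        if decorator is not None:
            func = decorator(func)
        # axis=0 iterates over columns, axis=1 over rows.
        slices = zip(*data) if axis == 0 else data
        return [func(list(s), *args, **kwargs) for s in slices]


def run_apply(data, func, axis=0, engine=PythonEngine, args=(), kwargs=None):
    """Hypothetical dispatcher: the caller only ever talks to the engine interface."""
    return engine.apply(data, func, args, kwargs or {}, None, axis)


rows = [[1, 2], [3, 4]]
column_sums = run_apply(rows, sum, axis=0)
row_sums = run_apply(rows, sum, axis=1)
doubled = PythonEngine.map([1, None, 3], lambda x: x * 2, (), {}, None, True)
```

With this shape, the calling code no longer needs engine-specific branches: a Numba engine would be another subclass that passes its JIT wrapper through the same two methods, which is what would make it possible to move it out of the code base later.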