-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Use ZendMM in ext-gmp #16609
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use ZendMM in ext-gmp #16609
Conversation
While libgmp uses true globals to track the allocator, and is therefore not safe to use in a threaded environment, we can make use of a thread-local variable that indicates whether the current thread is currently performing a PHP request. We set this TLS global to true if it is in RINIT, and set it to false in post-RSHUTDOWN. This ensures that any (de)allocation during the request will go through ZendMM while keeping it possible for this thread to be reused or for other threads using libgmp. Co-authored-by: "Christoph M. Becker" <[email protected]>
A similar strategy may be possible for libxml, except there we also have to take into account lazy initialization. |
Hm or not? |
@@ -551,10 +588,19 @@ ZEND_MINIT_FUNCTION(gmp) | |||
|
|||
register_gmp_symbols(module_number); | |||
|
|||
mp_get_memory_functions(&gmp_old_alloc, &gmp_old_realloc, &gmp_old_free); | |||
mp_set_memory_functions(gmp_alloc, gmp_realloc, gmp_free); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should be nice and restore the old functions on MSHUTDOWN?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that would be better yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks sensible to me, but my knowledge of TSRM is basically non-existent :)
@@ -551,10 +588,19 @@ ZEND_MINIT_FUNCTION(gmp) | |||
|
|||
register_gmp_symbols(module_number); | |||
|
|||
mp_get_memory_functions(&gmp_old_alloc, &gmp_old_realloc, &gmp_old_free); | |||
mp_set_memory_functions(gmp_alloc, gmp_realloc, gmp_free); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that would be better yes.
It doesn't actually use the TSRM, it uses only |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An interesting approach! But I'm afraid that cannot work in the general case, since libgmp uses process globals. Even if we only install our allocators during request processing, another thread may still use ours. It might be safe enough for Apache mpm_winnt, for instance, but I'm not even sure about that.
It seems to me that this libgmp API is fundamentally flawed. Besides that it would need thread-local storage and locking during installation and fetching the allocators for multi-threaded processes, it also should not support arbitrary re-installation of allocators (or at least be very explicit in the documentation that anybody who installs new allocators needs to re-install the old allocators when done, so they basically work as a stack),
That's not true. The other threads will have
So the way I implemented it here is basically like a stack indeed. Module init and shutdown should only happen at webserver process start and shutdown, so I think the risk of racing for the installation of the handler is basically zero. |
Oh, right!
Maybe I'm overthinking this, but I see potential issues even for single-threaded processes. What if someone else (that would need to be a PHP module for single-threaded processes, though) installs their own handlers, and after being finished restores the original handlers (not those that have been installed previously)? What happens if libgmp allows to install some callbacks (in the broadest sense; could be a signal handler), where some resources would be freed, now possibly using a different allocator? This might not be a problem in practise for classic SAPI, but I'm also thinking about embed, FrankenPHP, Swoole and such. Given that memory allocation is only one issue with libgmp (the other that some overflows may |
That is already a problem even without this patch.
That's not fundamentally different from what could happen with e.g. the Apache SAPI which is first-party to PHP.
We should only bundle libraries if we have no other choice. I think that the overflow aborts are not fundamentally so problematic that we should go the bundled library route. Bundling creates a burden (just look at the state of libgd, where some fixes are only in PHP and some are only in upstream). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is already a problem even without this patch.
Yeah, right. So maybe we should go with this approach. But see the question below.
We should only bundle libraries if we have no other choice. I think that the overflow aborts are not fundamentally so problematic that we should go the bundled library route. Bundling creates a burden (just look at the state of libgd, where some fixes are only in PHP and some are only in upstream).
True. Plus that is another discussion.
/* Allocator state: `gmp_request_allocator` is a true TLS global | ||
* to indicate whether this thread is currently working on a PHP request. | ||
* Just checking EG() is not enough because other modules may be loaded in the webserver. */ | ||
static TSRM_TLS bool gmp_request_allocator = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't this be a gmp module global?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No.
Simplest counter-example:
Imagine you're in an embedded SAPI with multiple threads. If you only run a single PHP interpreter at a time you can safely use NTS in this scenario. When NTS is used, the variable will be a true global, which would break other threads trying to use libgmp.
That aside, AFAIK, a true TLS variable will be faster than having to go through the TSRM too. As other DSO's don't need access to this variable directly there's no reason to put it in the module globals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thank you!
Sorry to be late on this one For memory, the ZendMM in ext/gmp was dropped 10 years ago in 7740eda So I still think it is a bad idea to try to change allocators. |
Thank @remicollet! So there are apparently still some cases where this might clash. :( |
IMHO a safe solution is to allocate additional storage to store pointer to the free function to use for each allocated space. But this have to be done in libgmp, which is not the case. |
It might be worth talking to the GMP developers if we haven't; maybe there's some improvements to be made over there. Of course, any improvements that result from it will take a while to propagate to downstreams... |
The bug report that Remi linked was talking about the order of module initialization, where one module used libgmp before ext-gmp loaded and switched allocators. However that is not a problem because the allocator is only active during request with this patch. Also: no need to store metadata as Remi suggested, there's a is_zend_ptr() function. |
This work in PHP side to check a buffer was allocated from PHP, but not in the other side (there is no warranty a buffer allocated by PHP will not be freed by another library with standard free function) In all case I'm against this change, if really you plan to implement it in PHP next, please make it an optional behavior, which can be enabled/disabled by a build option, so downstream can skip it. Having a library with customizable allocators doesn't mean we have to use them. Most library don't have and it is not a real problem (libssl, libicu, libonig, libxml2..) |
It is a real problem, do some research by searching bugsnet or this issue tracker. |
I agree. If a library potentially allocates a lot of memory,
I'm not 100% sure that I understand all details (and their implications) of that bug report. @smalyshev said:
If that is true, even with this patch (which seems to me as good as can be), there would be a problem. Now, I don't know enough about libgmp (and mpir might behave somewhat differently anyway), but if they implement something like shared numbers (similar to BCMath), then I see a real problem. Maybe we should discuss this on the list, in the hope to get further/more detailed insights about what may happen, and how usable custom allocators need to be designed, wrt libgmp, but also wrt other libraries (e.g. there is a PR against libgd open for a long time). And it also might make sense to contact the libgmp maintainers, as suggested by @NattyNarwhal. |
cc @cmb69 I think I found a solution for using ZendMM anyway even when the library uses true globals. I took your patch and extended it. There's probably still some room for improvement. There's also a test failure because now one test hits the memory limit.
While libgmp uses true globals to track the allocator, and is therefore not safe to use in a threaded environment, we can make use of a thread-local variable that indicates whether the current thread is currently performing a PHP request. We set this TLS global to true if it is in RINIT, and set it to false in post-RSHUTDOWN. This ensures that any (de)allocation during the request will go through ZendMM while keeping it possible for this thread to be reused or for other threads using libgmp.