Use ZendMM in ext-gmp #16609

nielsdos · 2024-10-26T09:28:08Z

cc @cmb69 I think I found a solution for using ZendMM anyway even when the library uses true globals. I took your patch and extended it. There's probably still some room for improvement. There's also a test failure because now one test hits the memory limit.

While libgmp uses true globals to track the allocator, and is therefore not safe to use in a threaded environment, we can make use of a thread-local variable that indicates whether the current thread is currently performing a PHP request. We set this TLS global to true if it is in RINIT, and set it to false in post-RSHUTDOWN. This ensures that any (de)allocation during the request will go through ZendMM while keeping it possible for this thread to be reused or for other threads using libgmp.

While libgmp uses true globals to track the allocator, and is therefore not safe to use in a threaded environment, we can make use of a thread-local variable that indicates whether the current thread is currently performing a PHP request. We set this TLS global to true if it is in RINIT, and set it to false in post-RSHUTDOWN. This ensures that any (de)allocation during the request will go through ZendMM while keeping it possible for this thread to be reused or for other threads using libgmp. Co-authored-by: "Christoph M. Becker" <[email protected]>

nielsdos · 2024-10-26T09:29:06Z

A similar strategy may be possible for libxml, except there we also have to take into account lazy initialization.

nielsdos · 2024-10-26T10:13:05Z

There's also a test failure because now one test hits the memory limit.

Hm or not? ext/gmp/tests/gmp_setbit_long.phpt exhausts all ZendMM memory for me, maybe it's because I'm on a debug build.

nielsdos · 2024-10-26T10:19:33Z

ext/gmp/gmp.c

@@ -551,10 +588,19 @@ ZEND_MINIT_FUNCTION(gmp)

 	register_gmp_symbols(module_number);

+	mp_get_memory_functions(&gmp_old_alloc, &gmp_old_realloc, &gmp_old_free);
+	mp_set_memory_functions(gmp_alloc, gmp_realloc, gmp_free);


Maybe we should be nice and restore the old functions on MSHUTDOWN?

I think that would be better yes.

Girgias

This looks sensible to me, but my knowledge of TSRM is basically non-existent :)

Girgias · 2024-10-27T10:29:26Z

ext/gmp/gmp.c

@@ -551,10 +588,19 @@ ZEND_MINIT_FUNCTION(gmp)

 	register_gmp_symbols(module_number);

+	mp_get_memory_functions(&gmp_old_alloc, &gmp_old_realloc, &gmp_old_free);
+	mp_set_memory_functions(gmp_alloc, gmp_realloc, gmp_free);


I think that would be better yes.

nielsdos · 2024-10-27T10:45:44Z

It doesn't actually use the TSRM, it uses only TSRM_TLS which creates a thread local variable.

cmb69

An interesting approach! But I'm afraid that cannot work in the general case, since libgmp uses process globals. Even if we only install our allocators during request processing, another thread may still use ours. It might be safe enough for Apache mpm_winnt, for instance, but I'm not even sure about that.

It seems to me that this libgmp API is fundamentally flawed. Besides that it would need thread-local storage and locking during installation and fetching the allocators for multi-threaded processes, it also should not support arbitrary re-installation of allocators (or at least be very explicit in the documentation that anybody who installs new allocators needs to re-install the old allocators when done, so they basically work as a stack),

nielsdos · 2024-10-27T15:37:22Z

But I'm afraid that cannot work in the general case, since libgmp uses process globals. Even if we only install our allocators during request processing, another thread may still use ours.

That's not true. The other threads will have gmp_request_allocator set to false, causing them to use the system allocator.

Besides that it would need thread-local storage and locking during installation and fetching the allocators for multi-threaded processes, it also should not support arbitrary re-installation of allocators (or at least be very explicit in the documentation that anybody who installs new allocators needs to re-install the old allocators when done, so they basically work as a stack),

So the way I implemented it here is basically like a stack indeed. Module init and shutdown should only happen at webserver process start and shutdown, so I think the risk of racing for the installation of the handler is basically zero.

cmb69 · 2024-10-27T16:01:38Z

That's not true. The other threads will have gmp_request_allocator set to false, causing them to use the system allocator.

Oh, right!

So the way I implemented it here is basically like a stack indeed. Module init and shutdown should only happen at webserver process start and shutdown, so I think the risk of racing for the installation of the handler is basically zero.

Maybe I'm overthinking this, but I see potential issues even for single-threaded processes. What if someone else (that would need to be a PHP module for single-threaded processes, though) installs their own handlers, and after being finished restores the original handlers (not those that have been installed previously)? What happens if libgmp allows to install some callbacks (in the broadest sense; could be a signal handler), where some resources would be freed, now possibly using a different allocator?

This might not be a problem in practise for classic SAPI, but I'm also thinking about embed, FrankenPHP, Swoole and such.

Given that memory allocation is only one issue with libgmp (the other that some overflows may abort()), I wonder for some time whether we should actually bundle mini-gmp (probably way slower, possibly out-dated), or maybe another library (perhaps libtommath), and adapt as needed. That could possibly also be used as low-level support for BCMath. And maybe at some point in time, PHP would get native bigint support.

nielsdos · 2024-10-27T16:08:23Z

Maybe I'm overthinking this, but I see potential issues even for single-threaded processes. What if someone else (that would need to be a PHP module for single-threaded processes, though) installs their own handlers, and after being finished restores the original handlers (not those that have been installed previously)? What happens if libgmp allows to install some callbacks (in the broadest sense; could be a signal handler), where some resources would be freed, now possibly using a different allocator?

That is already a problem even without this patch.

This might not be a problem in practise for classic SAPI, but I'm also thinking about embed, FrankenPHP, Swoole and such.

That's not fundamentally different from what could happen with e.g. the Apache SAPI which is first-party to PHP.

Given that memory allocation is only one issue with libgmp (the other that some overflows may abort()), I wonder for some time whether we should actually bundle mini-gmp (probably way slower, possibly out-dated), or maybe another library (perhaps libtommath), and adapt as needed.

We should only bundle libraries if we have no other choice. I think that the overflow aborts are not fundamentally so problematic that we should go the bundled library route. Bundling creates a burden (just look at the state of libgd, where some fixes are only in PHP and some are only in upstream).

cmb69

That is already a problem even without this patch.

Yeah, right. So maybe we should go with this approach. But see the question below.

We should only bundle libraries if we have no other choice. I think that the overflow aborts are not fundamentally so problematic that we should go the bundled library route. Bundling creates a burden (just look at the state of libgd, where some fixes are only in PHP and some are only in upstream).

True. Plus that is another discussion.

cmb69 · 2024-10-27T18:54:22Z

ext/gmp/gmp.c

+/* Allocator state: `gmp_request_allocator` is a true TLS global
+ * to indicate whether this thread is currently working on a PHP request.
+ * Just checking EG() is not enough because other modules may be loaded in the webserver. */
+static TSRM_TLS bool gmp_request_allocator = false;


Couldn't this be a gmp module global?

No.

Simplest counter-example:

Imagine you're in an embedded SAPI with multiple threads. If you only run a single PHP interpreter at a time you can safely use NTS in this scenario. When NTS is used, the variable will be a true global, which would break other threads trying to use libgmp.

That aside, AFAIK, a true TLS variable will be faster than having to go through the TSRM too. As other DSO's don't need access to this variable directly there's no reason to put it in the module globals.

Ah, thank you!

remicollet · 2024-10-28T14:29:20Z

Sorry to be late on this one

For memory, the ZendMM in ext/gmp was dropped 10 years ago in 7740eda
because of https://bugs.php.net/bug.php?id=63595 (nothing related to ZTS here)

So I still think it is a bad idea to try to change allocators.

cmb69 · 2024-10-28T15:01:01Z

Thank @remicollet! So there are apparently still some cases where this might clash. :(

Sadly, this might not work as hoped.

remicollet · 2024-10-28T15:20:25Z

IMHO a safe solution is to allocate additional storage to store pointer to the free function to use for each allocated space.

But this have to be done in libgmp, which is not the case.

NattyNarwhal · 2024-10-28T16:05:03Z

It might be worth talking to the GMP developers if we haven't; maybe there's some improvements to be made over there. Of course, any improvements that result from it will take a while to propagate to downstreams...

nielsdos · 2024-10-28T16:30:47Z

The bug report that Remi linked was talking about the order of module initialization, where one module used libgmp before ext-gmp loaded and switched allocators. However that is not a problem because the allocator is only active during request with this patch.

Also: no need to store metadata as Remi suggested, there's a is_zend_ptr() function.

remicollet · 2024-10-29T07:52:02Z

Also: no need to store metadata as Remi suggested, there's a is_zend_ptr() function.

This work in PHP side to check a buffer was allocated from PHP, but not in the other side (there is no warranty a buffer allocated by PHP will not be freed by another library with standard free function)

In all case I'm against this change, if really you plan to implement it in PHP next, please make it an optional behavior, which can be enabled/disabled by a build option, so downstream can skip it.

Having a library with customizable allocators doesn't mean we have to use them. Most library don't have and it is not a real problem (libssl, libicu, libonig, libxml2..)

nielsdos · 2024-10-29T08:11:12Z

It is a real problem, do some research by searching bugsnet or this issue tracker.
Anyway, I'm not gonna spend any more energy on this.

cmb69 · 2024-10-29T10:07:55Z

It is a real problem, […]

I agree. If a library potentially allocates a lot of memory, memory_limit is moot, and that is not good[TM]. And if a library specific allocator abort()s, that is also rather bad. And there may be further issues I'm overlooking.

The bug report that Remi linked was talking about the order of module initialization, where one module used libgmp before ext-gmp loaded and switched allocators. However that is not a problem because the allocator is only active during request with this patch.

I'm not 100% sure that I understand all details (and their implications) of that bug report. @smalyshev said:

The big problem is that nobody guarantees us gnutls GMP object lifetimes would match PHP request lifetime. They could allocate something outside the request and free inside, or vice versa.

If that is true, even with this patch (which seems to me as good as can be), there would be a problem. Now, I don't know enough about libgmp (and mpir might behave somewhat differently anyway), but if they implement something like shared numbers (similar to BCMath), then I see a real problem.

Maybe we should discuss this on the list, in the hope to get further/more detailed insights about what may happen, and how usable custom allocators need to be designed, wrt libgmp, but also wrt other libraries (e.g. there is a PR against libgd open for a long time).

And it also might make sense to contact the libgmp maintainers, as suggested by @NattyNarwhal.

nielsdos requested a review from cmb69 October 26, 2024 09:28

nielsdos requested a review from Girgias as a code owner October 26, 2024 09:28

github-actions bot added the Extension: gmp label Oct 26, 2024

no return

d816fc3

nielsdos commented Oct 26, 2024

View reviewed changes

Girgias reviewed Oct 27, 2024

View reviewed changes

cmb69 reviewed Oct 27, 2024

View reviewed changes

cmb69 mentioned this pull request Oct 27, 2024

Application-supplied memory allocators libgd/libgd#692

Open

cmb69 previously approved these changes Oct 27, 2024

View reviewed changes

nielsdos closed this Oct 29, 2024

cmb69 mentioned this pull request Nov 21, 2024

gmp_pow(64, 11) throws overflow exception #16870

Closed

Use ZendMM in ext-gmp #16609

Use ZendMM in ext-gmp #16609

Uh oh!

Conversation

nielsdos commented Oct 26, 2024

Uh oh!

nielsdos commented Oct 26, 2024

Uh oh!

nielsdos commented Oct 26, 2024

Uh oh!

nielsdos Oct 26, 2024

Choose a reason for hiding this comment

Uh oh!

Girgias Oct 27, 2024

Choose a reason for hiding this comment

Uh oh!

Girgias left a comment

Choose a reason for hiding this comment

Uh oh!

Girgias Oct 27, 2024

Choose a reason for hiding this comment

Uh oh!

nielsdos commented Oct 27, 2024

Uh oh!

cmb69 left a comment

Choose a reason for hiding this comment

Uh oh!

nielsdos commented Oct 27, 2024

Uh oh!

cmb69 commented Oct 27, 2024

Uh oh!

nielsdos commented Oct 27, 2024

Uh oh!

cmb69 left a comment

Choose a reason for hiding this comment

Uh oh!

cmb69 Oct 27, 2024

Choose a reason for hiding this comment

Uh oh!

nielsdos Oct 27, 2024

Choose a reason for hiding this comment

Uh oh!

cmb69 Oct 27, 2024

Choose a reason for hiding this comment

Uh oh!

remicollet commented Oct 28, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cmb69 commented Oct 28, 2024

Uh oh!

remicollet commented Oct 28, 2024

Uh oh!

NattyNarwhal commented Oct 28, 2024

Uh oh!

nielsdos commented Oct 28, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

remicollet commented Oct 29, 2024

Uh oh!

nielsdos commented Oct 29, 2024

Uh oh!

cmb69 commented Oct 29, 2024

Uh oh!

Uh oh!

remicollet commented Oct 28, 2024 •

edited

Loading

nielsdos commented Oct 28, 2024 •

edited

Loading