Skip to content

bpo-43179: Generalise alignment for optimised string routines #24624

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 31, 2021

Conversation

jrtc27
Copy link
Contributor

@jrtc27 jrtc27 commented Feb 22, 2021

The recent "Move support of legacy platforms/architectures outside Python" thread pointed out that there is currently a gross hack for m68k. Such a hack is not needed, just correct code that doesn't rely on so much implementation-defined behaviour. Thus, introduce and correctly use ALIGNOF_FOO in place of SIZEOF_FOO for alignment-related code in optimised string routines.

The first commit is required for correctness on architectures where alignof(x) is less than sizeof(x) due to the hardware having weaker alignment requirements for certain types (early m68k processors had a 16-bit data bus, and so 32-bit loads/stores were permitted to be only 16-bit aligned, though more recent ones likely have a full 32-bit data bus and have to deal with unaligned accesses much like modern x86 and Arm cores) but makes no changes that are otherwise not strictly required.

The second much larger commit fixes the many variants of optimised string routines to condition themselves on alignof(x) not sizeof(x). This is not required for correctness, but is good practice (and ensures that asserts with sizeof(x) rather than alignof(x) don't start to creep back in); all that happens when the conditions are too strict is they fall back on less-optimised versions.

https://bugs.python.org/issue43179

@the-knights-who-say-ni
Copy link

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

CLA Missing

Our records indicate the following people have not signed the CLA:

@jrtc27

For legal reasons we need all the people listed to sign the CLA before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

If you have recently signed the CLA, please wait at least one business day
before our records are updated.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

@jrtc27
Copy link
Contributor Author

jrtc27 commented Feb 22, 2021

@glaubitz Would be good if you could verify there are no regressions on m68k, please :)

@jrtc27 jrtc27 force-pushed the sizeof-is-not-alignof branch from 499dadf to 62b132b Compare February 22, 2021 20:46
@glaubitz
Copy link
Contributor

Thanks! Working on it. We should probably pull in @andreas-schwab as well.

On m68k, alignments of primitives is more relaxed, with 4-byte and
8-byte types only requiring 2-byte alignment, thus using sizeof(size_t)
does not work. Instead, use the portable alternative.

Note that this is a minimal fix that only relaxes the assertion and the
condition for when to use the optimised version remains overly strict.
Such issues will be fixed tree-wide in the next commit.

NB: In C11 we could use _Alignof(size_t) instead, but for compatibility
we use autoconf.
@jrtc27 jrtc27 force-pushed the sizeof-is-not-alignof branch from 62b132b to e046b24 Compare February 22, 2021 20:57
@jrtc27 jrtc27 requested a review from a team as a code owner February 22, 2021 20:57
@glaubitz
Copy link
Contributor

@jrtc27 This triggers an assertion:

python: malloc.c:2379: sysmalloc: Assertion `(old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)' failed.

@jrtc27 jrtc27 force-pushed the sizeof-is-not-alignof branch from e046b24 to 31947b7 Compare February 22, 2021 21:48
@jrtc27
Copy link
Contributor Author

jrtc27 commented Feb 22, 2021

@jrtc27 This triggers an assertion:

python: malloc.c:2379: sysmalloc: Assertion `(old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)' failed.

Don't know if I found the cause of that, but I had been a bit too aggressive with the changes and misunderstood one part of the code (which was particularly annoying to fix, so glad it's not needed). Try again now? Have you also verified that it was a regression compared with master?

@glaubitz
Copy link
Contributor

I applied the patches against the python3.10 package from experimental which builds fine:

https://buildd.debian.org/status/fetch.php?pkg=python3.10&arch=m68k&ver=3.10.0%7Ea4-2&stamp=1610391393&raw=0

With the patches, the assertion fails. I'm trying git master with your patches now.

@glaubitz
Copy link
Contributor

Tried with git master - works fine, breaks with your changes, unfortunately:

python: malloc.c:2379: sysmalloc: Assertion `(old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)' failed.

C only requires that sizeof(x) is a multiple of alignof(x), not that the
two are equal. Thus anywhere where we optimise based on alignment we
should be using alignof(x) not sizeof(x).

This is more annoying than it would be in C11 where we could just use
_Alignof(x) (and alignof(x) in C++11), but since we still require only
C99 we must plumb the information all the way from autoconf through the
various typedefs and defines.
@jrtc27 jrtc27 force-pushed the sizeof-is-not-alignof branch from e30c139 to e81d2ea Compare February 22, 2021 23:11
@glaubitz
Copy link
Contributor

Changes are fine now and the assertion no longer fails on m68k.

Great work and thank you!

@glaubitz
Copy link
Contributor

Also verified to work on 32-bit PowerPC and 64-bit SPARC.

/* Help allocation */
const char *_p = p;
while (_p < aligned_end) {
while (_p + SIZEOF_SIZE_T <= end) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution. It looks is good change.

But I have questions about one particular change. Are you sure that there is no overflow here? Are you sure that all compilers optimize this code so that the loop does not contain superfluous addition?

Copy link
Contributor Author

@jrtc27 jrtc27 Feb 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can rewrite those all as end - _p >= SIZEOF_SIZE_T if you'd rather, as yes technically this can generate a pointer slightly out of bounds (and in theory overflow if you have a pointer to the last page of the address space, though in practice I hope that page is never given out as most software is unlikely to work if it's used...), though this is a common enough pattern for software that I think compilers end up having to support it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But also, it's 2021, not the late 90s, any compiler worth your time is already rewriting this entire thing behind your back (e.g. the "help allocation" is completely meaningless for any compiler you'll have encountered in the past 10+ years, that's such a basic optimisation for it to do).

@glaubitz
Copy link
Contributor

glaubitz commented Mar 9, 2021

Ping? Any news on this? Would be great if this change could be integrated for 3.10 so that there is less arch-specific code in CPython.

@glaubitz
Copy link
Contributor

So, any chance we can get this further?

@ambv ambv added skip news type-feature A feature request or enhancement and removed awaiting review labels Mar 31, 2021
@ambv ambv changed the title Generalise alignment for optimised string routines bpo-43179: Generalise alignment for optimised string routines Mar 31, 2021
@ambv ambv merged commit dec0757 into python:master Mar 31, 2021
ambv added a commit that referenced this pull request Mar 31, 2021
@ambv
Copy link
Contributor

ambv commented Mar 31, 2021

Thanks for taking the time to implement this, Jessica. Thanks @glaubitz for the ping.

@glaubitz
Copy link
Contributor

Thanks a lot for merging this \o/.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
skip news type-feature A feature request or enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants