[WIP/to be DISCUSSED] Add pep517.metadata module for highlevel metadata access #44


Closed

Conversation

@ghost commented Mar 2, 2019

This adds a pep517.metadata module that my past self desperately wished had been available, and which would have saved me weeks of becoming far more familiar with packaging than I had ever planned to. The main functionality is listing package dependencies.

This pull request is a work in progress, but I'll put it here early to discuss:

  • should this be part of PEP517, or something else? (I strongly believe it should be part of something official though; I hope you feel similarly)

  • does the general approach I've taken here make sense? (e.g. is it reasonable to expect tempfile/a filesystem to be available?)

  • would you like the function signatures to look different?

What it provides (copied & pasted from what I added to the README):

   # Getting the package name from a pip reference:

   from pep517.metadata import get_package_name
   print(get_package_name("pillow"))
   # Outputs: Pillow (note the spelling!)


   # Getting package dependencies:

   from pep517.metadata import get_package_dependencies
   print(get_package_dependencies("pep517"))
   # Outputs: ['pytoml']


   # Getting the package name from an arbitrary package source:

   from pep517.metadata import get_package_name
   print(get_package_name("/some/local/project/folder/"))
   # Outputs the package name

(Also, yes, I am aware it lacks unit tests. I will add them, I promise; don't merge until they're present.)

@pfmoore (Member) commented Mar 2, 2019

I don't really understand what this does, but from what I can see it looks too high-level for this project, so I'd prefer it to be published somewhere else, probably as an independent project in the first instance.

Also, the code references pythonforandroid. Was this a typo, or is this code not yet intended to be an actual working version?

@ghost (Author) commented Mar 2, 2019

be published somewhere else

Well, just tell me where! Because the last time I asked for such a high-level lib, I was only pointed here, to pep517. I think it's a major problem with Python packaging that no such high-level code is available for easy use.

Also, the code references pythonforandroid

Oops, nice catch. This is a typo; pythonforandroid is what I originally wrote this for, before realizing it would be better off upstream (it is kind of silly that I would need to write this myself, don't you think?)

Edit:

I don't really understand what this does

I copied & pasted the usage examples I added to the README into the pull request post above.

@ghost (Author) commented Mar 2, 2019

For what it's worth, I just fixed the flake8/style errors and the left-over pythonforandroid reference in the code.

@pfmoore (Member) commented Mar 2, 2019

Well just tell me where!

My initial thought was that you should publish and maintain it yourself. If it's useful to others, then at a later stage it could be merged into an existing project, if an appropriate one exists.

last time I asked for such a "highlevel lib", I was only pointed here, to pep517

For me, the key thing is that the pep517 package contains functions that work on local directories containing source trees, and supplies wrappers around the raw PEP 517 backend hooks that are defined in the standard. It's high level in the sense that it provides a friendly wrapper around the low-level PEP 517 hook mechanisms, taking care of the necessary admin. But from a user's point of view, it's still pretty low level, IMO, and I'd expect it to be mostly used in building tools (specifically build frontends, in PEP 517 terms).
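
For context, a minimal sketch of driving the backend hooks through this library on a local source tree; the backend name is hard-coded here for illustration, whereas a real frontend would read it from pyproject.toml:

    from pep517.wrappers import Pep517HookCaller

    # Wrap the PEP 517 hooks of a local source tree; 'setuptools.build_meta'
    # is an assumption here - frontends normally read the backend name from
    # the [build-system] table of pyproject.toml
    hooks = Pep517HookCaller('/some/local/project/folder/', 'setuptools.build_meta')
    print(hooks.get_requires_for_build_wheel())
    # e.g. ['wheel'] for a typical setuptools project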

You seem to be writing functions that download packages by name, and extract metadata from them. That doesn't seem like a good fit here.

But I'm not the owner of this project, so it's not really my call.

@ghost (Author) commented Mar 2, 2019

My initial thought was that you should publish and maintain it yourself

I think anything that isn't official is kind of not solving anything / missing the point.

@jaraco already does that, there are 99 other weird pip wrappers that do it too, and it's impossible to tell which of them aren't useless (since most of them are pre-PEP 517 and pretty outdated / don't deal with non-setuptools builds / ...).

Unless this is somehow officially curated, things will stay as they are: people are going to continue making up their own alternatives because it's not clear what works, and Python meta-package handling will remain annoying to get into. This doesn't mean I wouldn't be willing to maintain this code, but I am not interested in maintaining it as yet another unofficial packaging library. In that case, I'd rather add it to pythonforandroid itself.

For me, the key thing is that the pep517 package contains functions that work on local directories containing source trees [...] You seem to be writing functions that download packages by name

It works perfectly fine on local source trees! See the usage example. It just also works on remote sources, since it's meant to be high-level/universal. (Edit: it is also impossible to recursively get the dependencies without this, which get_package_dependencies optionally supports.)

@ghost (Author) commented Mar 2, 2019

I have now opened a meta ticket for discussing where such code could possibly live on the packaging-problems tracker: pypa/packaging-problems#247

@gaborbernat commented

I also struggle to understand what this does, and why it supports pytoml.yml and METADATA (I'm not sure what the first is, and the second is wheel-only, no?)

@ghost (Author) commented Mar 2, 2019

I also struggle to understand what this does

The basic mechanism really is quite simple: it extracts METADATA, and from that the name & dependencies (see the example code above for the actual information spat out in the end).

METADATA [...] is wheel only, not?

METADATA is no longer wheel-only thanks to prepare_metadata_for_build_wheel: with that hook, pep517 can spit it out without building the full wheel (at least for setuptools packages it can), which avoids going through the whole build process. (It still requires installing setup_requires/build_requires, though, so it's only semi-efficient.)

Basically, all the pull request does is download the package via pip if it isn't a local source tree, extract it, and use pep517's prepare_metadata_for_build_wheel to get METADATA for the downloaded result. From there, it spits out the queried info (name, ...) via simple text parsing of METADATA.

Oh, and it's all done in a venv, so technically you're not permanently installing anything, which was the entire point. (Of course the setup_requires and such do get installed into the venv, but how would that be avoided?)
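
For illustration, a minimal sketch of that text-parsing step, assuming metadata_path is a hypothetical path to a METADATA file produced by prepare_metadata_for_build_wheel (METADATA uses RFC 822-style headers, so the stdlib email parser can read it):

    from email.parser import Parser

    def parse_metadata(metadata_path):
        # METADATA is an RFC 822-style header file, readable with the
        # stdlib email parser and no extra dependencies
        with open(metadata_path, encoding='utf-8') as f:
            msg = Parser().parsestr(f.read())
        name = msg['Name']
        # Requires-Dist appears once per dependency, so collect all of them
        requires = msg.get_all('Requires-Dist') or []
        return name, requires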

pytoml.yml [...] not sure what the first is

No clue either! I added it because pep517 seems to kinda rely on it, and it contains the build_requires info for packages that use it, so I pick that up if it's present (since it counts as "dependencies" for building the wheel). I have never personally authored a project that uses that file.
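
(For reference, the file in question turns out further down the thread to be pyproject.toml; a minimal one looks like this, with the build requirements living in the [build-system] table defined by PEP 518:)

    [build-system]
    requires = ["setuptools >= 40.8.0", "wheel"]
    build-backend = "setuptools.build_meta"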

@gaborbernat commented

I see now. I suppose one can use prepare_metadata_for_build_wheel to generate all the required information, then just read that file to provide the answers you want. In essence, then, all this does is parse that file and return the answers by dictionary lookup. Given that you need the PEP 517 chain to call prepare_metadata_for_build_wheel, I suppose it's okay for this to be here. Downloading the package, on the other hand, does feel like something that should be left up to the frontend that calls this module, and not be part of this package. My 2 cents on it.

@ghost (Author) commented Mar 2, 2019

After this, in essence, all this does is parse that file and return back the answers by dictionary lookup

Exactly. Now all I am hoping for is to get it integrated somewhere, because if it's not upstream somewhere, I'd need to put this into python-for-android itself, where it seems grossly misplaced; it is, after all, not a completely insignificant amount of code, and I don't think it can be shortened much.

I suppose it's okay for this to be here

Well, that I don't know. I also wouldn't mind adapting it for another library, if I'm told which one that would be.

@ghost (Author) commented Mar 2, 2019

Downloading the package, on the other hand, does feel like something that should be left up to the frontend

Wait, that greatly confuses me. What frontend would that be? You seem to be imagining some involved tool, but all python-for-android needs is to know the dependencies, including indirect ones. There is no frontend of any sort I can think of; what would that be? If you have a good idea which library that should be, then maybe all the code should go there. I don't think this should be considered a "backend".

Edit: to clarify, the entire point really was to make something universal that "just works" on anything pip would install. The distinction between whether a package is available locally or remotely just isn't meaningful in some basic situations (e.g. when it's referenced as something to install and you want to know the dependencies in advance!), and the downloading is not trivial; see the length of the code I needed to make it work. I think it is part of the main usefulness of this code that the downloading is included.

@gaborbernat commented

You must have a frontend that drives the backend, so that the backend can spit out the correct metadata (this pep517 library is just a part of what frontends do, and is pretty much non-working on its own). Such frontends at the moment are pip, tox, and poetry, AFAIK. None of these tools wants to offer what you want, so you're probably looking at creating your own custom frontend that uses pip + pep517 here to generate indirect (transitive) dependencies.

@ghost (Author) commented Mar 2, 2019

Right, I am essentially looking for a standalone frontend library in that case. What I wrote is really meant to be exactly that (think of a pip library, which wouldn't be needed if pip could be used as a library), just the very basics, of course. Isn't that a common request? If there really isn't one yet, that would surprise me, and I really think there should be one (an official one).

@gaborbernat commented

Yes, there really isn't one. As I said, the metadata-reading part is OK to be here; generating transitive dependencies is something that a frontend should do. You can try to propose it for pip, but I think your best chance is to write your own, publish it, and maybe with time and maturity it will be merged in. Remember pip has no API (just a CLI), which is why what you're requesting might not be a good fit there.

@ghost (Author) commented Mar 2, 2019

OK, I could look into separating out the metadata extraction part if there is consensus that that part might be worth including. I'll be happy with any subpart being included somewhere, because that means less packaging code oddly misplaced in our downstream (that being python-for-android).

best chance is to write your own, publish it

That's really unlikely to happen; I just don't have the time 😢 What I can't contribute to existing projects is likely just going to end up inside our Android-specific handling, which is why I'm looking so hard for places to add this outside of our codebase. It just seems way too universal to bury in our project.

@ghost (Author) commented Mar 2, 2019

(By which I'm not saying I'd dump the code somewhere and not respond to problems, but a separate project involves a website, much more documentation overall, a release cycle, separate testing infrastructure, ... it's just a lot more work than helping out with a small part of an existing project.)

@takluyver (Member) commented

Thanks, but I agree with @pfmoore - this isn't a good fit for the pep517 package. I understand your frustration - there certainly should be a standard way to look up metadata for a named package from PyPI. But that's not what this package is: it's a mechanism for one tool (a 'frontend') to call another (a 'backend') using the interface defined in PEP 517.

I suspect that getting metadata from a named package is more complicated than it seems, which may go some way to explain why it's not already easy. E.g. if there are different packages available for one release, which one do you get the metadata from? You might argue that this is a corner case, and you may well not care about it, but it will matter to someone else.

Finally, reinforcing a point that Paul and Gabor have already made: blessing brand-new code as the official way to do something isn't really how PyPA works. It's much healthier for someone to release something, get it battle-tested for a while, start establishing it as a de-facto standard, and then have PyPA adopt maintenance of it. It's fine that you don't want to be that someone. But right now, neither do we. ;-)

@takluyver closed this Mar 2, 2019
@ghost (Author) commented Mar 2, 2019

Makes sense that it doesn't go here. I still don't understand the battle-tested notion, though; you're acting like I want to start some grand new project, when I'm really just looking to make a minor contribution to the right place. Happy for any new suggestions as to where it might fit here: pypa/packaging-problems#247

@pfmoore (Member) commented Mar 2, 2019

I'm really just looking for making a minor contribution

I think you're drastically underestimating the complexity of this task. What we're trying to tell you is that supporting a piece of code like this is likely to be a big job. You seem to get that - after all, you've said you don't want to maintain this as a standalone project - but then you seem to be assuming that another project will be happy to take on that maintenance burden. I don't really know how to respond to that assumption other than by saying "it's more than I'd be willing to take on in a project I'm maintaining" :-(

@ghost (Author) commented Mar 3, 2019

What? You're making it sound like I don't want to maintain the code, even though I've said multiple times that I would be interested in taking care of it, just not as a standalone project. Again, there are so many libs already; does there really need to be another one? And yes, maintaining a project is more overhead than just maintaining a component; that should be obvious.

Nevertheless, if none of you can find a good match for this component, then this is all moot. I don't want you to take it if it's not a good fit. But please don't say I wouldn't help look after it.

(Small edit: and again, if any of you ever happens to work on a lib where it would fit, just poke me. I might still be interested in working on it as a module of that.)

@takluyver (Member) commented

I think we all tend to feel responsible for projects we create, and thus we can be protective of them. It's great that you want to maintain this code, but if you contribute it to someone else's project, whoever feels responsible for that project implicitly assumes responsibility for your code. And even if you're keen to maintain it now, maybe in a year or two you'll be busy with something else, and someone else is stuck maintaining it. I'm not saying this to call you flaky - it's how life goes. People move jobs, people's interests change, or people's circumstances change and they suddenly don't have time. It's certainly happened to me.

I think this is really why we're so keen to tell people to release code themselves. If I've contributed code to someone's package, it's much harder for them to say "that part is nothing to do with me, I'll ignore issues relating to it".

@ghost (Author) commented Mar 3, 2019

Makes sense, thanks for the clarification! It might sometimes make it more difficult to get contributions, but I guess everything in life is a trade-off... if any of you ever changes their mind and thinks this particular code would make a great addition, let me know.

@ghost (Author) commented Mar 3, 2019

@takluyver, just out of curiosity, for our limited internal use of this code (and I understand if you can't be bothered to respond for that purpose):

E.g. if there are different packages available for one release, which one do you get the metadata from?

Can you point me to something / give an example of that? If there's a corner case where it breaks, I'd still be interested in knowing, since after all we're going to use this code internally... so far I've been under the impression that a "package reference argument" (whatever the proper name is) passed to pip can only map to one package, and that's also what the code would pick up. But if that's not necessarily the case, I'd love to learn more.

@takluyver (Member) commented

Yup, I can explain a bit. I like the sound of my own digital voice. 😉

First off a package can have an sdist and a wheel (as I think you know, from scanning your code). There's no guarantee that their metadata is the same. AFAIK, the sdist doesn't even have to have a PKG-INFO metadata file, although most do. Packages can also have multiple wheels, which could all have different metadata.

The underlying technical issue is that the metadata is generated at build time. Taking this to a ridiculous extreme, a package could add a dependency if it's built on a Tuesday. The only way to get the right metadata is to get or create a wheel for the relevant platform, and then ensure that you use the exact same wheel when constructing your application.

I don't have a good example handy at present, but this isn't just a facetious concern. When we were hashing out PEP 517, one suggestion was that packages which build against numpy could add a dependency on the exact version of numpy they were built against, ensuring they're not installed with another version. This sort of thing could grow in importance now, because PEP 517 separates the build environment from the installation environment.

In the past (before wheels), this kind of mechanism was also used a lot to make conditional dependencies based on platform or Python version - your setup script would include something like:

# Don't copy this example of what not to do!
if sys.version_info[0] < 3:
    install_requires.append('configparser')

Nowadays this use case is mostly handled by 'environment markers', specifying a requirement which only applies in certain conditions. But some packages might still be doing things the old way, or need conditions which environment markers can't specify.
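
For comparison, the same conditional dependency expressed declaratively with an environment marker (PEP 508 syntax):

    # Equivalent to the conditional above, but recorded in the metadata
    # and evaluated at install time rather than baked in at build time
    install_requires = [
        'configparser; python_version < "3"',
    ]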

@ghost (Author) commented Mar 4, 2019

@takluyver thanks, that was very insightful! ❤️ I've seen environment markers already; very interesting to know that other packages choose a more dynamic route.

@gaborbernat commented

I don't have a good example handy at present, but this isn't just a facetious concern. When we were hashing out PEP 517, one suggestion was that packages which build against numpy could add a dependency on the exact version of numpy they were built against, ensuring they're not installed with another version. This sort of thing could grow in importance now, because PEP 517 separates the build environment from the installation environment.

@takluyver wouldn't this just be an install_requires of numpy == 1.23.4?

@takluyver (Member) commented

wouldn't this just be an install requires of numpy == 1.23.4 ?

It would, but the wheel would have a more specific requirement than the sdist. The sdist might require numpy >= 1.20 and < 2, so you can build against any of those versions, but once you have built it, the resulting wheel depends on numpy == 1.23.4.
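
Illustratively (hypothetical lines, with made-up version numbers), the two artifacts' metadata could then differ like this:

    sdist (PKG-INFO):  Requires-Dist: numpy (>=1.20,<2)
    wheel (METADATA):  Requires-Dist: numpy (==1.23.4)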

@gaborbernat commented

Hmm, can the user express this in a setuptools build?

@takluyver (Member) commented

It's bound to be possible with enough hacking, but I don't think there's currently a supported way to specify it.

@dholth (Member) commented Mar 6, 2019

Mystery solved: it supports pyproject.toml; the comment mentioning pytoml.yml is wrong.

@jaraco (Member) commented Mar 7, 2019

I haven't had time to absorb this full thread or the related threads, but I do believe there's a lot of overlap between what's being proposed here and the work on importlib_metadata, which does seek to present a reliable interface for getting metadata for a package. Unfortunately, it does have some issues, particularly with unbuilt packages. Still, my expectation was that a tool could (a) build the metadata for a package using PEP 517 and then (b) point importlib_metadata at the result to extract the details (name, requirements, etc.).

@ghost (Author) commented Mar 7, 2019

@jaraco that looks very interesting! For what it's worth, though, my current code needs to (for my use case), and also does, obtain the metadata for whatever pip would fetch, which can be an egg but could also be a wheel. So something that only works with eggs (as the README suggests) wouldn't be that well-suited. But very interesting indeed!

@takluyver (Member) commented

I think you misread the README there - it says eggs are unsupported. Pip also won't install eggs.

@ghost (Author) commented Mar 7, 2019

@takluyver ah, oops, I did misread indeed 😂 😂 sorry about that. Then it might be exactly what I need; it looks like it has the potential to replace half of my code or something. Very nice find, @jaraco, I'll look into it!

Edit: oops, I see you actually even suggested it to me before: pypa/setuptools#1606 (comment), but that was back before I used pep517, so I had no installed package.

Well, this is still unsatisfactory then, because installing will often mean building a full wheel, which can take significantly longer than just the setup_requires step. So not a good match, I'm afraid; I'd essentially be trading code for much worse performance (due to a much higher install workload).

@ghost (Author) commented Mar 7, 2019

This is the same problem with almost all introspection libs, by the way: they assume the package is installed, and that just means a huge waste of time. (pep517's hooks.prepare_metadata_for_build_wheel only needs setup_requires/build_requires processed, not all of install_requires, nor the Cythonization of the package itself and such, which a wheel build or install would require. This can make a huge time difference for some packages with big binary components.)

Since introspection's main issue in general is that it takes forever if the package is complex and not yet installed, this is not really a reasonable trade-off, I feel. It can make a difference of many minutes per package in the most extreme cases.

@takluyver (Member) commented Mar 7, 2019 via email

@ghost (Author) commented Mar 7, 2019

This is what I meant: you can't definitively know the metadata until you have either an installed package or a built wheel ready to install.

Well, you can, sometimes, which is the point (and as you said yourself in the last paragraph).

Pep 517 does specify a hook to prepare metadata without building, but it's optional.

Yes, but it can make a huge difference when it's there. I'm speaking from actual tests here. (And it makes sense if you think about it: you can skip the entire Cython compilation/any other native code of an entire huge project.) And the truth is, even if importlib_metadata or any of the other dozens of libs that only work with installed packages are super neat, if they still require me to install the package, I don't think it's worth it 🤷‍♀️ it can really make a huge difference in practice, and not just in a theoretical scenario.

Edit: just to illustrate my point, think about when you run this recursively (to get an entire tree of what pip would install). If there are multiple Cython projects in install_requires, but never or almost never in setup_requires/build_requires, this can add up quickly to a really significant amount of time, like many minutes - I've been there. (See the sketch below.)
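
A hypothetical sketch of that recursive walk, built on this PR's get_package_dependencies (the crude reduction of requirement strings to bare names is for illustration only):

    import re
    from pep517.metadata import get_package_dependencies

    def all_dependencies(package, seen=None):
        # Depth-first walk over the dependency tree, tracking visited names
        seen = set() if seen is None else seen
        for req in get_package_dependencies(package):
            # Crudely reduce e.g. 'numpy (>=1.20)' to 'numpy'
            name = re.split(r'[;\s\[<>=!~(]', req, 1)[0]
            if name and name not in seen:
                seen.add(name)
                all_dependencies(name, seen)
        return seen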

sdist [...] the same as the wheel [...] you can't be sure of that

Yeah, you've mentioned that. It could be a problem if pip installs an sdist only and not a wheel, but would it do that for any package with binary components? (Not a rhetorical question; I really don't know.) Anyway, it seems to me like an unrelated, difficult & certainly important problem, but not one for this particular question of whether to use importlib_metadata or not.

Edit: in general, I know I might have gotten some corner cases wrong, and if there's a lib that doesn't force an install, I'd be happy to look at it for sure. As I think I made clear, I really don't want to take care of this code myself more than I need to 😬 but a replacement still kinda needs to make sense.

@takluyver (Member) commented Mar 7, 2019 via email

@ghost (Author) commented Mar 7, 2019

Yeah, it makes sense. Sorry, I didn't want to be a downer; I just wanted to explain why importlib_metadata sadly doesn't seem to be a very good match. It would kinda need to be a lib which doesn't need a full package install, by using this pep517 hook accordingly; until then, I'm probably better off sticking with my meh code.

@jaraco (Member) commented Apr 8, 2019

PEP 517 has a section for building just the metadata... and importlib_metadata should be able to consume that metadata. Currently, there's no public API for that, but you could probably get by with the private APIs for now. Something like:

import os
import pathlib

import pep517.wrappers
import importlib_metadata


def get_metadata_for_unbuilt_package(source_dir='.'):
    # The backend name should really come from pyproject.toml; its
    # requirements must already be importable (no isolated build env here)
    hooks = pep517.wrappers.Pep517HookCaller(source_dir, 'setuptools.build_meta')
    meta_dir = os.path.join(source_dir, 'meta')
    os.makedirs(meta_dir, exist_ok=True)
    # The hook returns the name of the .dist-info directory it wrote
    dist_info = hooks.prepare_metadata_for_build_wheel(meta_dir)
    dist = importlib_metadata._hooks.PathDistribution(
        pathlib.Path(meta_dir, dist_info))
    return dist.metadata['Name'], dist.version

I haven't tried this technique, but this is what I had in mind. I agree, you shouldn't have to install a package to get its metadata (though you do have to invoke the 'build metadata' step).

@ghost (Author) commented Apr 8, 2019 via email
