Skip to content

Support for custom options and proto2 extensions #591

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 24 commits into
base: master
Choose a base branch
from

Conversation

nswarm
Copy link

@nswarm nswarm commented Feb 5, 2022

Hi, this PR adds support for extensions (proto2 only) and custom options (proto2 and 3).

Custom options are built on top of extensions, so I'll refer to them collectively as "extensions".

Please see the included documentation and tests for examples of how prost users would use these features.

Overview

This code is roughly patterned after how extensions are handled in other languages generated by protoc itself.

Here's a board which explains the flow of custom options. Extensions are a little simpler since they're just accessible in the final generated code instead of at parse time, but the overall picture similar.

protobuf-extensions-discovery

The core functionality lives in src/extension.rs:

  • Extendable
    • Applied to proto messages marked with extensions n to m; via code generation.
    • Provides functions for working with the underlying ExtensionSet.
  • ExtensionSet
    • Contains a set of decoded values in a type-opaque manner through ExtensionValue.
    • Its methods take ExtensionImpl<T> which can be used to manage the data of an ExtensionValue by downcasting it to an ExtensionValueImpl.
  • Extension/ExtensionImpl
    • const definitions are generated in rust code from the proto extend MessageName {...} defintions.
    • These definitions are used with the Extendable/ExtensionSet to provide type information.
  • ExtensionRegistry
    • This is a temporary structure that you load the generated const Extensions into before parsing.
    • It provides the information to the parser necessary to decode values into the type-erased ExtensionSets

The other major file src/generic.rs provides traits Merge, MergeRepeated, Encode, and EncodeRepeated, which exist exclusively to map rust types to the various functions in prost::encoding. This allows merging and encoding to work for the ExtensionSet fields that won't have prost derive attributes for each of field it needs to decode.

The remaining changes are mostly implementing the above traits, tests, documentation.

Notes

  • This PR makes additive changes to prost_types::protobuf.rs which adds to all *Options types:
    • derive ::prost::Extendable.
    • a pub extension_set field. ⚠️ This will break for anyone creating these types directly through e.g. FileOptions{...}.
    • an impl that defines a const EXTENDABLE_TYPE_ID.
  • This PR has almost zero runtime impact on anyone not using extensions.
    • The only effect is the various *Options take an additional Option<Box<_>> bytes.
  • I ran into a build issue on which seems to have been encountered by others (Add missing perl-unicode feature to regex crate #556). That is the change to prost-build/Cargo.toml.
  • I am happy to provide maintenance for this area of code in the future.

A nice side effect of this change is prost could define its own custom options for use in compile_protos e.g. the ability to tag specific fields to generate using small_vec.

Related issues

Resolves #256
Resolves #100

Copy link
Contributor

@neoeinstein neoeinstein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor documentation changes that looked to be mismatched.

Would be interested to see how this could be used to, for instance, add an attribute (such as a proc-macro) to a message or a field to modify the code generation behavior.

@nswarm
Copy link
Author

nswarm commented Feb 20, 2022

Thanks @neoeinstein for some great usability changes!

@nswarm
Copy link
Author

nswarm commented Feb 21, 2022

Would be interested to see how this could be used to, for instance, add an attribute (such as a proc-macro) to a message or a field to modify the code generation behavior.

Operating inside the prost lib, we could define prost-level custom options that could be explicitly loaded and referenced during code generation and add the relevant rust attribute. That could even be a user defined macro, for instance we could directly inject the name into the attr list:

option (prost_proc_attr) = "::mycrate::MyAttr";

This would be akin to the protoc plugin insertion points in google's generators, although wouldn't scale very well for code blocks.

Is that the kind of thing you mean?

@neoeinstein
Copy link
Contributor

That's part of it. Inside prost, I'm thinking about things like indicating a message or field should be a foreign type, providing specified names, etc. Below is a series of potential things off the top of my head. Not all of these would need to be implemented; some will be more useful than others.

// This one would be more complex due to impacts to type resolution
// in generated code. It's also a bit less useful due to the way we use `include!`.
option (prost.package).path = "::my_crate::placement";

message Example {
    option (prost.type).type_name = "ExampleRust";
    option (prost.type).attrs = "#[::aliri_braid::braid]";

    uint64 id = 1 [(prost.field).field_name = "identifier"];
}

message External {
    option (prost.type).external = "::my_external::Type";

    sint32 value = 1;
}

message MessageWithUuid {
    string uuid = 1 [(prost.field).type = "::uuid::Uuid"];
}

message OwnedAndBorrowed {
    option (prost.type).gen_borrowed = true;

    bytes data = 1 [
        (prost.field).type = "::bytes::Bytes",
        (prost.field).borrowed_type = "[u8]"
    ];
    string my_typed_string = 2 [
        (prost.field).type = "::my_crate::MyStronglyTypedString",
        (prost.field).borrowed_type = "::my_crate::MyStronglyTypedStringRef"
    ];
}

A borrowed mode would make it possible to reduce unnecessary copies, as you could deserialize an owned buffer into values referring to that data or construct a message for serialization without needing to take ownership of all of the parts.

@neoeinstein
Copy link
Contributor

I posted up a really quick example for the renaming of a field option I mentioned above. Should probably move this discussion to a separate issue or such, but it all builds on top of this PR right now. Renaming of types and packages likely requires a two-pass process so that we can collect all of the aliases first and then use the correct aliases and internally-defined externs when doing code generation pass.

neoeinstein/prost@extensions-usability...neoeinstein:prost-ext

@@ -403,7 +403,7 @@ impl<'a> CodeGenerator<'a> {
self.push_indent();
self.buf.push_str("pub const ");
self.buf.push_str(&extension.name().to_ascii_uppercase());
self.buf.push_str(": &::prost::ExtensionImpl<");
self.buf.push_str(": &'static ::prost::ExtensionImpl<");
Copy link
Contributor

@neoeinstein neoeinstein Feb 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually did this initially, and then removed it based on a clippy recomendation because all const references have to be 'static already. Since this produces code in user's crates, I removed it so that they wouldn't get the generated clippy warning. It could probably be removed using an #[allow] attribute instead, though.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually did this initially, and then removed it based on a clippy recomendation.

Really? It wasn't even compiling for me with errors like:

error[E0106]: missing lifetime specifier
   --> <snip>\prost\target\debug\build\protobuf-56feaaf2b10845f1\out/protobuf_unittest.rs:893:40
    |
893 | ...   pub const TEST_EXT_ORDERINGS3: &::prost::ExtensionImpl<TestExtensionOrderings3> = &::prost::ExtensionImpl::<TestExtensionOrderings3...
    |                                      ^ expected named lifetime parameter
    ```

Copy link
Contributor

@neoeinstein neoeinstein Feb 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like a compiler version issue. What version of the rust compiler are you using locally?

This is the specific clippy lint regarding redundant static lifetimes. According to this comment, it expects a Rust compiler to be version 1.17 or later, which is almost 5 years old now.

The Prost minimum supported version of the Rust compiler right now is, I believe, 1.51.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've built/tested on the following compilers using cargo test --workspace:

1.51.0-x86_64-unknown-linux-gnu
stable-x86_64-unknown-linux-gnu (1.58.1)

1.51.0-x86_64-pc-windows-msvc
stable-x86_64-pc-windows-msvc (1.58.1)

And I get the errors on all of them. Although I did find another unsupported 1.51 issue...

I do notice if I only use cargo test I won't get the errors. By chance are you not using --workspace?

Copy link
Contributor

@neoeinstein neoeinstein Feb 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooooh. After using cargo test --all, I see where it is. These are the nested extensions that end up inside impl blocks. For the module-level consts, the static modifier is redundant, but inside of an impl block they aren't considered redundant. I think that makes it possible for us to split the difference, either adding 'static always and throwing in the allow, or only adding in 'static when we're dealing with a nested extension.

This also is a good place to note that you'll see the clippy redundant_static_lifetimes warnings I mentioned above running cargo clippy --all (along with several others that should be fixed up in different PR).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Threw you one more PR. It adds the above mentioned clippy lint as a kind of least-common-denominator. Beyond that, I found some errors when building on no-std, so I added the appropriate use references to the alloc crate to make sure that path compiles as well.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks for digging into that!

@vriesk
Copy link
Contributor

vriesk commented May 30, 2022

Hi! Is this change abandoned now? It is somewhat similar to my feature request here: #658

I'd be particularly eager to see a way to inject support for PGV, preferably directly into validator tags.

@neoeinstein
Copy link
Contributor

Hi! Is this change abandoned now? It is somewhat similar to my feature request here: #658

I'd be particularly eager to see a way to inject support for PGV, preferably directly into validator tags.

This change is not abandoned, and indeed is planned on the way to Prost 1.0. However, we may not land this in the exact form provided in the PR for general public use. There is an underlying need to support unrecognized tags that this should be built upon, and that would eliminate the need to do special work up-front with a registry in order to get to these extensions.

Nonetheless, there are several, including myself, who are using this PR branch as a git dependency and a bridge for doing supplemental codegen. I have some plans to create a generator similar to PGV in my line of protoc-gen-prost crates.

The key thing is that, we want to have an API where you can just use decode rather than decode_with_registry.

It may be that we could land this for an intermediate release and then have a further breaking change that removes the need for a registry, but if we start publishing a release that includes support for it, we'd need to make sure the functionality isn't broken even across breaking API changes.

@vriesk
Copy link
Contributor

vriesk commented May 30, 2022

Thanks for the update. Yeah, I was even thinking about forking prost-build privately for now, however the problem I see is that it is non-trivial to integrate it with tonic for the full service generation (but I have not tried hard enough TBF). Or is there?

@nswarm
Copy link
Author

nswarm commented May 30, 2022

There is an underlying need to support unrecognized tags that this should be built upon, and that would eliminate the need to do special work up-front with a registry in order to get to these extensions.
(snip)
The key thing is that, we want to have an API where you can just use decode rather than decode_with_registry.

I'm not sure that is possible without a registry or some form of additional information up front.

Extensions are fundamentally data that cannot be parsed by the default parser, so the user has to provide additional information i.e. the registry which tells it how to parse that data. Similarly, unregonized tags / uninterpreted options are just serialized data the parser was not provided a schema for handling.

In either case you still need a way to parse it that's not available by default. If you don't provide a registry or schema up front, you get a list of "sorry couldn't parse this" data that you have to manually parse yourself.

In the case of extensions, the registry step could be eliminated by having a static, global registry that automatically contains all extensions in generated code at compile time, that the default parser would know to use. This is how google's c++ does it. That could be additive on top of this PR and even feature gated for anyone who wants to specify registry manually. Other implementations like java and c# use the extension registry pattern as well.

Unrecognized tags are a different beast because it means you have a different or out of date schema from what the data was serialized with. I think it's a subtle but important difference from extensions.

For that reason I think support for unrecognized tags is a separate code path from extensions, but fwiw I have not thought deeply on that.

@LucioFranco
Copy link
Member

@nswarm I would be interested in landing some of your work but because I am not the original author of this crate we are gonna need go through a proper proposal process w/ feedback etc in a github issue. Then I think we can start to land smaller PRs that are easier to review etc. Sorry for the major delay on reviewing I've had a huge backlog to get through. Thanks for your patience.

@nswarm
Copy link
Author

nswarm commented Jun 27, 2022

@nswarm I would be interested in landing some of your work but because I am not the original author of this crate we are gonna need go through a proper proposal process w/ feedback etc in a github issue. Then I think we can start to land smaller PRs that are easier to review etc. Sorry for the major delay on reviewing I've had a huge backlog to get through. Thanks for your patience.

That's fine. Let me know how you'd like me to proceed.

I'll need to spend some time figuring out how to break it up as well.

@LucioFranco
Copy link
Member

@nswarm lets start in an issue and you can tag me in it so I get an email and we can go from there. My gh notifications is much better now so I should have some time to talk through design etc. Thanks!

repeated bool ext_bool_repeated = 128;
repeated string ext_string_repeated = 129;
repeated bytes ext_bytes_repeated = 130;
repeated CustomMessageType ext_message_repeated = 131;
Copy link
Contributor

@andrewhickman andrewhickman Jul 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its a bit of an edge case, but it would be nice to test a group field here as well

Actually, I think the Merge/Encode traits don't handle groups at currently. Maybe ProtoIntType needs a Group variant?

/// Extension data must be retrieved and modified through use of static Extensions in generated code.
pub trait Extendable: 'static {
/// A static type id associated with this Extendable.
fn extendable_type_id() -> ExtendableTypeId;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fn extendable_type_id() -> ExtendableTypeId;
fn extendable_type_id(&self) -> &'a str;

Could this return a borrowed string to allow implementing it for something like DynamicMessage? Most consumers can just call T::default().extendable_type_id()

/// A generic container for an Extension that can delegate to the implementation for type-specific work.
pub trait Extension {
/// Fully-qualified type name of the message this extension is for.
fn extendable_type_id(&self) -> ExtendableTypeId;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also could return a borrowed string to allow implementing for something like ExtensionDescriptor

fn extension_set(&self) -> &ExtensionSet<Self>;

/// The data structure that stores extension data.
fn extension_set_mut(&mut self) -> &mut ExtensionSet<Self>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two methods basically require the implementation to have an ExtensionSet field, which feels unnecessarily restrictive. For example, another valid implementation would be to store unknown fields at decode time (without looking at ExtensionRegistry and then decode on-the-fly in methods like extension_data

@nswarm
Copy link
Author

nswarm commented Aug 1, 2022

Thanks for the comments @andrewhickman! Note that I probably won't do much work on this for a bit, per status in: #674

Copy link

@slinkydeveloper slinkydeveloper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi all,
I need this feature for a prototype that I'm building that manipulates FileDescriptorSet with options, I'll be more than happy to provide a thorough review and feedback in the next days.

If it can be any helpful, here is my branch rebased with master: https://github.com/slinkydeveloper/prost/tree/extensions


### Reading Custom Options from Descriptors

```rust

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
```rust
```rust,ignore

To disable testing it


When working with extensions on a decoded object, you use a reference to the generated `Extension` as a sort of handle.

```rust

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
```rust
```rust,ignore

To disable testing it

@cgorski
Copy link

cgorski commented Sep 28, 2022

@nswarm just wanted to let you know I use this PR in a private project with a few code additions. We're replacing a legacy app using old proto2 definitions and it would not be possible to do a drop-in replacement in any reasonable way without this PR.

@Houndie
Copy link

Houndie commented Mar 2, 2023

I know this PR is kind of in limbo right now, and while I'm not volunteering to push it through to the end, I did go and make a fork of it and merge master into it. From reading the comments, it seems like a number of people are using this PR branch, so having an updated one might be helpful

https://github.com/Houndie/prost/tree/extensions

(I also fixed a small bug where Result wasn't properly scoped in macro generation)

@ozars
Copy link

ozars commented Sep 3, 2023

In the same vein, I rebased @Houndie's branch on current master at https://github.com/ozars/prost/tree/extensions, fixing a small issue due to failed doctests in readme file.

@jhugard
Copy link

jhugard commented Oct 2, 2023

I added a few commits to @ozars' fork above & merged in master to pick up 0.12.1

https://github.com/jhugard/prost/tree/extensions

  • Added extension_registry to ServiceGenerator Config & builder, so that that extensions can be used in the service_generator.
  • Commented-out generation of the file-level register_extensions helper, because it creates multiple conflicting instances when extensions are defined in multiple proto files using the same namespace. The message-level extensions must be registered individually instead.

@jhugard
Copy link

jhugard commented Oct 2, 2023

As an aside, I would be more in favor of a reflection-based approach applied at runtime. Feels very clunky to have to bootstrap extensions by copying a generated file into my project, so that the extension definitions are available for use with prost_build in build.rs.

@Kampfkarren
Copy link

Kampfkarren commented Jun 9, 2024

Should be noted that none of the forks here I've tried have seemed to work with ServiceGenerator, I still have no idea how to extract custom options from methods in those :(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Custom options? Are proto2 extensions supported?