Skip to content

[feat] Support AUTO_PUBLISH schema. #142

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Conversation

shibd
Copy link
Member

@shibd shibd commented Dec 12, 2022

Master Issue: apache/pulsar/pull/2685

Motivation

Implementing AUTO_PUBLISH schema solves the following problems:

  1. If a schema exists in a topic, but the producer does not know it. He still wanted to send data to the topic. (The scenario in which the DLQ sends a message: Please refer [feat] Support Dead Letter Topic. #139 [Verifying this change][5])

  2. For clients based on C++ implementations(Node and Python), these client supports help users serialize and deserialize.
    They can use this AUTO_PUBLISH schema feature to implement the same semantics as Java clients.

Modifications

  • When creating a producer with AUTO_PUBLISH schema, try to get the schema of that topic and use it.
  • Support getSchema on LookupService(HTTP and Binary).

Verifying this change

  • Add LookupServiceTest unit test to cover getSchema logic.
  • Add SchemaTest.testAutoPublicSchema unit test to cover creating producer success when the topic has a schema.

Documentation

  • doc-required
    (Your PR needs to update docs and you will update later)

  • doc-not-needed
    (Please explain why)

  • doc
    (Your PR contains doc changes)

  • doc-complete
    (Docs have been already added)

@shibd shibd mentioned this pull request Dec 12, 2022
4 tasks
@shibd shibd changed the title Support AUTO_PUBLISH schema. [feat] Support AUTO_PUBLISH schema. Dec 12, 2022
Copy link
Contributor

@BewareMyPower BewareMyPower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should review the motivation of the AUTO_PRODUCE schema before introducing this feature into other clients.

It was introduced from apache/pulsar#2685 without any PIP. The motivation doesn't make sense to me now. It's more likely to solve the scenario that a Pulsar Function cannot send messages to a topic with schema. But the cost is high because each time a message is sent, the validation will happen.

Regarding the DLQ scenario, is there any reason to register chema for the DLQ topic? Could this DLQ topic be produced by other clients?

@BewareMyPower
Copy link
Contributor

It looks like this PR mainly solves the issue that the producer cannot be created when isSchemaValidationEnforced is true. But the validation still requires the new dependencies like Avro. Otherwise, the AUTO_PUBLISH schema with no validation could be weird.

We should not introduce the validation logic into the C++ client. Maybe we can think of a way to support passing a customized message builder so that we can perform the validation on a generic type without exposing the internal code.

If a schema exists in a topic, but the producer does not know it. He still wanted to send data to the topic. (The scenario in which the DLQ sends a message

Could you describe in which case could the DLQ topic have schema? I thought he topic could only be created by the client itself, right?

@shibd
Copy link
Member Author

shibd commented Dec 16, 2022

It looks like this PR mainly solves the issue that the producer cannot be created when isSchemaValidationEnforced is true. But the validation still requires the new dependencies like Avro. Otherwise, the AUTO_PUBLISH schema with no validation could be weird.

Yes, and we also need to support setting the schema when sending messages. Only then is AUTO_PUBLISH fully usable.

We should not introduce the validation logic into the C++ client. Maybe we can think of a way to support passing a customized message builder so that we can perform the validation on a generic type without exposing the internal code.

I think we can open an issue or mail thread to discuss it.

Could you describe in which case could the DLQ topic have schema? I thought he topic could only be created by the client itself, right?

Yes, When the user creates a dead-letter topic by himself, and the schema of the dead-letter topic is not compatible from the schema of the source topic, the DLQ producer will create failed.

BTW: This scenario is relatively rare, and I don't think it affects the release of DLQ features.

I will close this PR first.

@shibd shibd closed this Dec 16, 2022
@shibd
Copy link
Member Author

shibd commented Dec 16, 2022

Yes, When the user creates a dead-letter topic by himself, and the schema of the dead-letter topic is not compatible from the schema of the source topic, the DLQ producer will create failed.

Rectify: If a topic has a schema, and the consumer uses AUTO_CONSUME schema, and DLQTopic also has the same schema. In this case, the DLQ producer fails to create(IncompatibleSchema). Because DLQ producers created in consumers do not have a schema.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants