[bug] Schema validation not working: producing messages that are non-compliant with the schema is allowed #1296
Comments
Pulsar does not validate the structure of JSON messages against the topic's schema. It only checks whether the producer's schema definition matches the broker's topic schema. In contrast, the Avro schema validates the message structure during both encoding and decoding. The Java client behaves the same way. If you need to verify message structure compatibility when sending messages, you could use the Avro schema. Otherwise, you need to ensure yourself that the message structure matches the producer's schema. Here's a similar example in Java:

import java.nio.charset.StandardCharsets;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;
import org.apache.pulsar.client.api.schema.GenericRecord;
import org.apache.pulsar.client.api.schema.GenericSchema;
import org.apache.pulsar.common.schema.SchemaInfo;
import org.apache.pulsar.common.schema.SchemaType;

PulsarClient client = PulsarClient.builder()
        .serviceUrl("pulsar://localhost:6650")
        .build();

// Avro-style record definition, registered with schema type JSON.
String schemaDef = "{\n" +
        " \"type\": \"record\",\n" +
        " \"name\": \"SchemaTest\",\n" +
        " \"fields\": [\n" +
        " { \"name\": \"userName\", \"type\": \"string\" },\n" +
        " { \"name\": \"userAge\", \"type\": \"int\" }\n" +
        " ]\n" +
        "}";
SchemaInfo schemaInfo = SchemaInfo.builder()
        .name("SchemaTest")
        .type(SchemaType.JSON)
        .schema(schemaDef.getBytes(StandardCharsets.UTF_8))
        .build();
GenericSchema<GenericRecord> schema = Schema.generic(schemaInfo);
Producer<GenericRecord> producer = client.newProducer(schema)
        .topic("test-schema")
        .create();

// Field names deliberately do not match the schema definition.
GenericRecord record = schema.newRecordBuilder()
        .set("notUserName", "Alice")
        .set("notUserAge", 30)
        .build();
producer.send(record);
System.out.println("Message sent!");
producer.close();
client.close();

The message will be sent successfully without being validated against the schema. |
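To illustrate the "ensure it yourself" workaround mentioned above, here is a minimal sketch of a pre-send check in plain Java. `RecordShapeCheck` and `violations` are hypothetical names made up for this example, not part of any Pulsar client, and the expected fields are hard-coded from the `SchemaTest` definition rather than parsed from the schema. It only compares top-level field names and Java types, so treat it as a starting point, not as real schema validation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the Pulsar client API.
final class RecordShapeCheck {
    private RecordShapeCheck() {}

    /** Returns human-readable problems; an empty list means the record matches. */
    static List<String> violations(Map<String, Class<?>> expected, Map<String, Object> record) {
        List<String> problems = new ArrayList<>();
        // Every expected field must be present and have the expected Java type.
        for (Map.Entry<String, Class<?>> field : expected.entrySet()) {
            Object value = record.get(field.getKey());
            if (value == null) {
                problems.add("missing field: " + field.getKey());
            } else if (!field.getValue().isInstance(value)) {
                problems.add("wrong type for " + field.getKey() + ": expected "
                        + field.getValue().getSimpleName()
                        + ", got " + value.getClass().getSimpleName());
            }
        }
        // Fields that the schema does not declare are reported as well.
        for (String name : record.keySet()) {
            if (!expected.containsKey(name)) {
                problems.add("unknown field: " + name);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Assumption: these mirror the two fields of the SchemaTest definition.
        Map<String, Class<?>> expected = new LinkedHashMap<>();
        expected.put("userName", String.class);
        expected.put("userAge", Integer.class);

        Map<String, Object> record = new LinkedHashMap<>();
        record.put("notUserName", "Alice");
        record.put("notUserAge", 30);

        List<String> problems = violations(expected, record);
        if (!problems.isEmpty()) {
            // In a real producer this is where the send would be skipped.
            System.out.println("Refusing to send: " + problems);
        }
    }
}
```

A check like this would run in application code right before the producer's send call, which keeps the client's existing behavior untouched.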
I expected JsonSchema to function similarly to AvroSchema, since it's very close to it: the schema definition is Avro for both, and apparently the only differences are the type in SchemaInfo and how message validation works. That might be a naive view, since I haven't looked at how they differ in the internals of Pulsar. Shouldn't the client library ensure that messages sent with a schema producer adhere to its schema? The .NET library, for example, returns a typed producer which can't send other types of messages. My impression of Pulsar's schema validation feature is that it should bring in some guardrails and establish a contract between producers and consumers. |
When the client encodes an Avro message, the Avro encoder automatically validates it against the schema. However, JSON schema encoding works differently. We do not pass the schema definition to the encoder. Instead, we encode from any object type to JSON bytes. The JSON encoder does not perform any validation. This is the same for both the Go and Java clients.
Client SDKs such as .NET and Java can use generic classes to restrict the type of messages sent or received by the producer or consumer. However, the Go producer and consumer do not use generics. In fact, they are more similar to the Java client's Producer<byte[]>. |
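The difference between a generic, typed producer and a byte-oriented one can be sketched in plain Java as follows. All names here (`TypedSendSketch`, `RawSender`, `TypedSender`, `User`, `encodeUser`) are hypothetical stand-ins invented for this example and are not Pulsar classes; the hand-written JSON encoder does no escaping and exists only to keep the sketch dependency-free.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-ins for illustration; these are not Pulsar classes.
final class TypedSendSketch {
    private TypedSendSketch() {}

    /** Accepts raw bytes only, so any payload compiles and is accepted. */
    static final class RawSender {
        final List<byte[]> sent = new ArrayList<>();

        void send(byte[] payload) {
            sent.add(payload);
        }
    }

    /** Accepts T only; the compiler rejects any other message type. */
    static final class TypedSender<T> {
        private final RawSender raw;
        private final Function<T, byte[]> encoder;

        TypedSender(RawSender raw, Function<T, byte[]> encoder) {
            this.raw = raw;
            this.encoder = encoder;
        }

        void send(T message) {
            raw.send(encoder.apply(message));
        }
    }

    static final class User {
        final String userName;
        final int userAge;

        User(String userName, int userAge) {
            this.userName = userName;
            this.userAge = userAge;
        }
    }

    /** Naive JSON encoding without string escaping; good enough for a sketch. */
    static byte[] encodeUser(User user) {
        String json = "{\"userName\":\"" + user.userName + "\",\"userAge\":" + user.userAge + "}";
        return json.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        RawSender raw = new RawSender();
        TypedSender<User> typed = new TypedSender<>(raw, TypedSendSketch::encodeUser);

        typed.send(new User("Alice", 30));   // compiles: the argument is a User
        // typed.send("not a user");         // would not compile: String is not User

        // The raw sender has no such guard and takes arbitrary bytes.
        raw.send("anything at all".getBytes(StandardCharsets.UTF_8));
        System.out.println("payloads sent: " + raw.sent.size());
    }
}
```

The typed wrapper moves the guarantee to compile time, which is only possible when the message type is known statically; a byte-oriented sender would need a runtime check instead.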
Thanks for pointing it out, it was something that I saw and had me wondering why there's no validation. If I understand correctly, there's nothing to do for now besides accepting and working around it e.g. if validation is needed, to use
I think it would be interesting to offer this. For example, I noticed that although
Thanks for taking the time to respond to all of this. I'm trying to better understand the existing functionality, whether there's something we can contribute, and what needs to be taken as is. |
Would it make sense for me to contribute some extra validation to not allow using Payload if a schema is used, ensuring that you can't override schema validation? I think this could be added here. |
This would break the existing behavior. I'm afraid many users already use it that way. |
Expected behavior
Setup:
Expected:
Actual behavior
The producer created with a schema can publish non-compliant messages. The payload is not validated against the schema.
Steps to reproduce
The first 3 steps from setup can be done through the admin API:
The code:
The schema from the registry (http response dump):
System configuration
Pulsar version: 3.0.6.8