-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue 578: Find schema version by fingerprint of schema text. #579
Issue 578: Find schema version by fingerprint of schema text. #579
Conversation
Adds a new REST endpoint to find a schema version by the fingerprint of the schema text.
@joaopedroantonio Thanks for showing interest in the project. May I know the circumstances that you are trying to address with this PR? |
@raju-saravanan Sure, I can give you some context. At DefinedCrowd we're improving our event sourcing infrastructure and as part of that initiative we want to add Schema Registry to store our schema definitions. For critical services, we want to register new schema versions at deploy time and publish messages without having to contact the Schema Registry in runtime. To accomplish this, instead of using the schema version id in the message, we can use the schema text fingerprint. This has a couple downsides, the increase in the message size and the (remote) possibility of having a hash collision but it's a trade off we're willing to make for increased resiliency and easier scaling in critical parts of our architecture. |
@joaopedroantonio So will you be exposing new ser des protocol in SerDesProtocolHandlerRegistry to use fingerprint to serialize and deserialze message ? We may have a scenario where we have two schema version with same fingerprint esp when two schemas are canonically same but have different default values for certain fields in the schema, in that case with the current implementation - which picks up a first schema from the result set - deserailized message might not be the one a use expects. We can resolve this by sorting the results set and pick up with schema version with latest timestamp. Will you be ok with this ? @satishd and @michaelandrepearce can you guys share your opinions on this ? |
@raju-saravanan I wasn't thinking about implementing a new protocol, as we'll be using the Schema Registry in our scope solely to store and manage the schemas. My suggestion is that we keep this PR simply for exposing the REST endpoint and I'll create a new one to implement the new Protocol. What you think? Regarding the scenario of having two schema version with the same fingerprint, I agree with returning the latest schema version rather than the first or we could even make this endpoint return all of them. I'm find with both options. What you think? |
@joaopedroantonio:
I have couple of more questions,
|
Exactly, we'll handle serialization/deserizalization on our side (at least for now).
Wasn't planning to but I'll take a look into it and might end up implementing it for the sake of fun. I'll keep it on a separate PR though.
Yes, I'll do it. I'll try to update this PR until the end of the week with the two changes we've discussed:
|
…rsion found. - Add unit tests to validate find by fingerprint feature.
@raju-saravanan implemented the changes discussed previously, let me know your thoughts. :) |
@@ -82,6 +82,15 @@ default SchemaVersionInfo getSchemaVersionInfo(String schemaName, String schemaT | |||
*/ | |||
SchemaVersionInfo getSchemaVersionInfo(String schemaName, String schemaText, boolean disableCanonicalCheck) throws SchemaNotFoundException, InvalidSchemaException, SchemaBranchNotFoundException; | |||
|
|||
/** | |||
* If there is a version of the schema with the given fingerprint then it returns the respective {@link SchemaVersionInfo}, else it returns null. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of "else it returns null" can we replace it with "else it throws SchemaNotFoundException"?
LOG.warn(String.format("Multiple schemas found for the same fingerprint: %s", fingerprint)); | ||
} | ||
|
||
return schemas.stream() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of sorting the schemas in-memory, can we get sorted schemas from the database and pick the first one? You can have a look at this code which does this.
@joaopedroantonio : Left few more comments to address. |
- Sort schemas by timestamp in database, instead of in memory. - Improve javadoc.
Thank you for the comments @raju-saravanan, I think they're addressed. :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 LGTM
Adds a new REST endpoint to find a schema version by the fingerprint of
the schema text.