Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Schema registry design proposal #511

Merged
merged 14 commits into from
Jul 6, 2023

Conversation

slinkydeveloper
Copy link
Contributor

Part of #43.

This PR contains a design proposal for the schema registry. The idea of the next steps is the following:

  • This package hopefully will contain little to nothing "public types", except Schemas and the interfaces. All the other types should live in restate_types, as described in Refactor common #420
  • Schemas will get an implementation of each of the defined interfaces, all of them sharing the same tree of Endpoint/Service/DescriptorPool metadata under a single ArcSwap. We can also reason about potential optimizations using thread locals, as described here: Investigate whether Arc<ArcSwap> for accessing shuffle senders in network is more efficient than Arc<Mutex> #136 (comment)
  • We get rid of restate_service_metadata and restate_key_extractor modules
  • We implement the write side of the schema in Schemas#register_new_endpoint. This will contain down the road the logic for:
    • Register a new service
    • Update a service
    • Schema inconsistency checks
    • Registration in "Development mode"
  • This Schemas object in the distributed mode will be updated by the "node service" somehow, perhaps not through register_new_endpoint but a rather simple method that will apply some "diff" calculated by the meta service. Perhaps moving forward register_new_endpoint will be split between doing checks, calculating the diff and then applying the diff itself.

Copy link
Contributor

@tillrohrmann tillrohrmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea of the schema package make sense to me. Having a single source Schemas which contains the schema information and then exposes the different views to it via traits sounds like a good idea to me. This will hopefully help us keeping schema information consistent.

Once we go distributed then there will probably be a small difference between the schema registry running on the meta and the ones being local to the worker which is that the latter might want to ask the former for an update if a resolution operation failed because some information was missing.

@slinkydeveloper
Copy link
Contributor Author

@tillrohrmann this PR is ready to be reviewed again I think. I updated it with the latest changes and rebased on #551.

Copy link
Contributor

@tillrohrmann tillrohrmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for creating this PR @slinkydeveloper. The changes make sense to me. I had a few minor comments. +1 for merging this PR once they are resolved and the preceding PR gets merged.

src/schema/src/endpoint.rs Outdated Show resolved Hide resolved
Comment on lines 6 to 7
// We should try to not leak prost_reflect types as they can be problematic to implement
// the consistent schema view in future.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add an explanation why this is the case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment doesn't apply with last commits.

src/types/src/identifiers.rs Outdated Show resolved Hide resolved
Comment on lines +19 to +26
// I need to run this twice to make sure I split the file descriptor set between prod and test
prost_build::Config::new()
.file_descriptor_set_path(
PathBuf::from(env::var("OUT_DIR").expect("OUT_DIR environment variable not set"))
.join("file_descriptor_set_test.bin"),
)
.bytes(["."])
.compile_protos(&["tests/proto/test.proto"], &["tests/proto"])?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this can't be within a if cfg!(test) or so?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, build.rs is once per project: rust-lang/cargo#1581 rust-lang/cargo#1430

src/schema_impl/src/lib.rs Outdated Show resolved Hide resolved
src/schema_impl/src/proto_symbol.rs Outdated Show resolved Hide resolved
src/schema_impl/src/proto_symbol.rs Outdated Show resolved Hide resolved
src/schema_impl/src/lib.rs Outdated Show resolved Hide resolved
src/schema_impl/src/lib.rs Outdated Show resolved Hide resolved
@@ -89,7 +89,7 @@ impl ProtoSymbols {
for file in &files {
// We rename files prepending them with the endpoint id
// to avoid collision between file names of unrelated endpoints
// TODO this should probably go in https://github.com/restatedev/restate/issues/43
// TODO with schema checks in place, should we move this or remove this?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove prepending the endpoint_id you mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, changed the comment to specify that.

@slinkydeveloper slinkydeveloper merged commit 9a546e0 into restatedev:main Jul 6, 2023
@slinkydeveloper slinkydeveloper deleted the issues/43 branch July 6, 2023 15:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants