This issue was moved to a discussion.

[RFC]: Server architecture (gRPC?) #10

Closed
abrassel opened this issue Jun 20, 2024 · 9 comments

@abrassel
Collaborator

TL;DR

As we begin work on the server, we will need to settle on a high-level architecture. What will the server do? What framework(s) will we use? What is the entity source of truth? ... and more.

This RFC will dedicate one section to each of these questions, and we can expand the scope as necessary.

External Requirements

MUST

  • Implement HTTP serving
  • Handle application/x-ndjson
  • Conform to Unity Catalog OpenAPI spec

SHOULD

  • Allow users to plug their own database
  • Support concurrent requests at scale
  • Support monitoring and other observability nice-to-haves
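
One way to read the "plug their own database" requirement is to have request handlers depend on a storage trait rather than on a concrete database. The following is a minimal std-only sketch under that assumption; `Catalog`, `CatalogStore`, `StoreError`, `InMemoryStore`, and `default_store` are hypothetical names for illustration, not part of the existing codebase, and a real server would likely want async methods.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical, heavily trimmed catalog entity; the real object model is larger.
#[derive(Clone, Debug, PartialEq)]
pub struct Catalog {
    pub name: String,
    pub comment: Option<String>,
}

/// Errors a storage backend may report (hypothetical).
#[derive(Debug, PartialEq)]
pub enum StoreError {
    AlreadyExists(String),
    NotFound(String),
}

/// Hypothetical storage abstraction: handlers depend on this trait,
/// so users can swap in their own database by implementing it.
pub trait CatalogStore: Send + Sync {
    fn create_catalog(&self, catalog: Catalog) -> Result<(), StoreError>;
    fn get_catalog(&self, name: &str) -> Result<Catalog, StoreError>;
}

/// In-memory backend, handy for tests; a SQL-backed type would
/// implement the same trait.
#[derive(Default)]
pub struct InMemoryStore {
    catalogs: RwLock<HashMap<String, Catalog>>,
}

impl CatalogStore for InMemoryStore {
    fn create_catalog(&self, catalog: Catalog) -> Result<(), StoreError> {
        let mut map = self.catalogs.write().unwrap();
        if map.contains_key(&catalog.name) {
            return Err(StoreError::AlreadyExists(catalog.name));
        }
        map.insert(catalog.name.clone(), catalog);
        Ok(())
    }

    fn get_catalog(&self, name: &str) -> Result<Catalog, StoreError> {
        self.catalogs
            .read()
            .unwrap()
            .get(name)
            .cloned()
            .ok_or_else(|| StoreError::NotFound(name.to_string()))
    }
}

/// The server would share the backend across requests behind a trait object.
pub fn default_store() -> Arc<dyn CatalogStore> {
    Arc::new(InMemoryStore::default())
}
```

Keeping the backend behind `Arc<dyn CatalogStore>` means the HTTP handlers (and any later gRPC handlers) could share one store without knowing which database sits underneath.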

gRPC

In addition to these external requirements, I propose that our implementation also expose a mirrored set of gRPC endpoints. This, combined with a protobuf spec, will have a number of benefits:

  • The usual suspects (performance, etc.). Notably, 7-10x faster
  • Easy for other clients to piggy-back on our protobuf definitions
  • Can use protobuf to generate our object model

Object Model

Currently, we have code-genned a set of Rust objects to represent the Unity Catalog object model. If we adopt protobuf, I propose that we instead generate our Rust object model from the protobuf definitions. This comes with a number of benefits:

  • Easily discoverable for non-Rust users
  • Centralized object model
  • Declarative object model
  • Consistent with other Databricks OSS projects such as spark-connect
  • Doesn't produce an implicit transitive dependency on Rust objects for non-Rust clients.

Server architecture

If we pursue a dual gRPC and HTTP server, this guide seems like a decent model to follow.

TL;DR it proposes using axum, hyper, tonic, tower.

We can furthermore use prost to generate our Rust types and utoipa for our OpenAPI and Swagger spec. It is worth noting that we will need to do additional work, and possibly make an upstream contribution, to properly support the application/x-ndjson specification. See this issue for some context.
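
A common way to serve gRPC and HTTP from a single port is to dispatch each request on its `content-type` header, since gRPC requests use `application/grpc` (optionally with a suffix such as `+proto`). The sketch below illustrates only that decision, using the standard library; `Backend` and `route_by_content_type` are hypothetical names, and in a real server this check would presumably sit in a tower `Service` that wraps the axum router and the tonic service.

```rust
/// Which inner service should handle a request (hypothetical type).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Backend {
    Grpc,
    Rest,
}

/// Decide where to send a request based on its `content-type` header value.
/// Anything that is not recognizably gRPC falls through to the REST router.
pub fn route_by_content_type(content_type: Option<&str>) -> Backend {
    match content_type {
        Some(ct) if is_grpc(ct) => Backend::Grpc,
        _ => Backend::Rest,
    }
}

fn is_grpc(content_type: &str) -> bool {
    // Drop parameters such as "; charset=utf-8" and compare case-insensitively.
    let media_type = content_type
        .split(';')
        .next()
        .unwrap_or("")
        .trim()
        .to_ascii_lowercase();
    // Matches "application/grpc" and suffixed forms like "application/grpc+proto",
    // but not "application/grpc-web", which is a different protocol.
    media_type == "application/grpc" || media_type.starts_with("application/grpc+")
}
```

With this split, `application/json` and `application/x-ndjson` requests would reach the REST side, while only gRPC traffic reaches tonic.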

@abhiaagarwal
Collaborator

abhiaagarwal commented Jun 20, 2024

This sounds like a great plan @abrassel! A couple questions I have:

  • For gRPC, do we want to work with the Java reference implementation or can we differentiate here?
  • What's the need for application/x-ndjson?

In addition, I've used that guide before for personal use and it's a bit out of date for modern axum + tonic versions, but I think there's been a good amount of progress on that end. tokio-rs/axum#2736

@ognis1205

ognis1205 commented Jun 20, 2024

@abrassel @abhiaagarwal

Sorry for jumping into the conversation. Regarding application/x-ndjson, if I understand correctly, Unity Catalog will eventually support the Delta Sharing protocol as well. At that point, application/x-ndjson will be relevant when implementing the following protocol specification:

@abhiaagarwal
Collaborator

abhiaagarwal commented Jun 20, 2024

Hey @ognis1205,

First of all, don't apologize! This is a "Request for Comments", any and all comments are appreciated :D

Second of all, I just looked at the Delta Sharing protocol and it looks relatively trivial to implement (in fact, I see on your profile that you have a delta-sharing-rs server implementation; we can likely just leverage that directly and nest it under the main router). My only question is, where did you get the information that Unity will support Delta Sharing? I was under the impression that DBX provides its own proprietary server implementation that interfaces with Unity, but it's not necessarily a built-in feature of the catalog itself. I'm not aware of the internals.

@ognis1205

@abhiaagarwal

Thank you for the reply and your understanding. Regarding the main router, yes, I thought the same way as you did. The reason I believe Unity will support Delta Sharing is due to the following comment and the resource:

As you mentioned, just from the roadmap and her statement, it might still be unclear how they plan to support the Delta Sharing protocol.

@abhiaagarwal
Collaborator

abhiaagarwal commented Jun 20, 2024

@ognis1205 ty so much for the links! You're indeed right. At the end of the day, the "unity catalog protocol" is basically an access-control server for assets scoped like a database (all it does is hand out leases to assets living in cloud storage), delta sharing is basically the same thing without the multimodality and some parquet-specific optimizations (like data skipping). I guess we can say that the unity catalog is meant to represent an evolution of delta-sharing (while delta-sharing is a bit more stateful, unity catalog is theoretically agnostic to the underlying data asset).

That is to say, if the unity catalog is a more generalized form of delta-sharing (which I currently believe it to be), then nesting a router under the main unity catalog router is probably trivial depending on the backend.

I don't know in all honesty, but anyways, I just discovered axum-extra supports ndjson, so it's kind of a moot point anyways :)
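
For context on what that support has to do, the application/x-ndjson framing itself is simple: one JSON document per line, each terminated by a newline. Below is a minimal std-only sketch of that framing, assuming the documents are already serialized; `encode_ndjson` and `decode_ndjson` are hypothetical helpers for illustration, not axum-extra's API.

```rust
/// Frame already-serialized JSON documents as an `application/x-ndjson` body:
/// one document per line, each terminated by `\n`.
/// Rejects documents containing raw line breaks, which would corrupt the framing.
pub fn encode_ndjson<I, S>(docs: I) -> Result<String, String>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let mut body = String::new();
    for doc in docs {
        let doc = doc.as_ref();
        if doc.contains('\n') || doc.contains('\r') {
            return Err(format!("document contains a raw line break: {:?}", doc));
        }
        body.push_str(doc);
        body.push('\n');
    }
    Ok(body)
}

/// Split an ndjson body back into per-line JSON documents, skipping blank lines.
/// Each returned slice still needs to be parsed by a JSON library.
pub fn decode_ndjson(body: &str) -> Vec<&str> {
    body.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .collect()
}
```

In practice a streaming response would emit each line as it becomes available rather than building the whole body up front.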

@amogh-jahagirdar

One requirement I'd like to advocate for is support for the Iceberg REST catalog, like what's being worked on in https://github.com/unitycatalog/unitycatalog ! I'd be happy to help with any efforts in that area.

@abrassel
Collaborator Author

Thanks @amogh-jahagirdar ! That's a great suggestion. I agree that we should definitely prioritize that super useful feature.

It may be slightly outside the scope of this RFC, since here we're focused on the broad capabilities and architecture, i.e. whether we expose gRPC endpoints, rather than which API endpoints we expose.

That being said, it would be great if you could submit an RFC explicitly asking for Iceberg support! I don't think it'll be controversial :)

@abhiaagarwal
Collaborator

Final thoughts from anyone in this thread?

Personally, I am inclined towards gRPC and Protobuf definitions, but at least for the time being, I want to focus on the REST implementation first and retrofit gRPC later. I've spent some time trying to get the new axum and tonic working together and it's quite challenging; I worry that gRPC will block us.

@abrassel
Collaborator Author

Sounds great to me! I think let's consider this RFC closed. I'll be approaching this from the client side, and we can use Swagger to generate a Rust client from the OpenAPI spec.

@unitycatalog unitycatalog locked and limited conversation to collaborators Jun 29, 2024
@rtyler rtyler converted this issue into discussion #14 Jun 29, 2024


4 participants