Druid Catalog basics #13165
Conversation
This pull request introduces 8 alerts when merging c958a98 into ebfe1c0 - view on LGTM.com.
Catalog object model for tables, columns
Druid metadata DB storage (as an extension)
REST API to update the catalog (as an extension)
Integration tests
Model only: no planner integration yet
c958a98 to bd7c8be (compare)
This pull request introduces 6 alerts when merging bd7c8be into 7fa53ff - view on LGTM.com.
Build is clean except for one unrelated flaky failure in a Kafka IT. Will rerun later as I'm sure someone will have comments that require code changes and a new build.
Thanks a lot for the design issue and changes @paul-rogers! 👍
I've taken a first pass at the core changes and am posting the current review as a checkpoint while I go through the remaining part.
Thanks for the in-code explanations and patience as well!
Addressed direct review comments
Removed rollup support per offline discussion
@rohangarg, thanks for your review. Addressed your comments. Also revised the model to remove the "rollup" concept, leaving just the one "datasource" table type.
The change is huge, so I only took a short pass through it and left some minor suggestions.
 * If SQL based, return the SQL version of the metastore
 * connector. Throws an exception if not SQL-based.
 */
SQLMetadataConnector sqlConnector();
About this interface and the isSql method above: I think we don't need such a specialized method defined at this interface level. A more appropriate way is to change it to MetadataStorageConnector connector(); and remove isSql from the interface. The SQLCatalogManager then checks whether the returned connector is an instanceof SQLMetadataConnector or not.
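For illustration, a rough sketch of the shape being suggested here; aside from MetadataStorageConnector and SQLMetadataConnector, the names are hypothetical placeholders, not the PR's actual code:

import org.apache.druid.metadata.MetadataStorageConnector;
import org.apache.druid.metadata.SQLMetadataConnector;

// Hypothetical sketch of the suggested refactor: expose only the generic
// connector and let SQLCatalogManager downcast when it needs SQL features.
interface CatalogStorage          // illustrative name, not the PR's interface
{
  MetadataStorageConnector connector();
}

class SQLCatalogManager
{
  private final SQLMetadataConnector sqlConnector;

  SQLCatalogManager(CatalogStorage storage)
  {
    MetadataStorageConnector connector = storage.connector();
    if (!(connector instanceof SQLMetadataConnector)) {
      // A SQL-based catalog manager cannot work with a non-SQL metadata store.
      throw new IllegalStateException("catalog storage requires a SQL metadata connector");
    }
    this.sqlConnector = (SQLMetadataConnector) connector;
  }
}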
You are probably right. This is a work in progress: I have not yet set up a way to test with one of the non-SQL storage extensions. The DB integration is rather a hack: extensions can't really depend on one another, so the extension-DB code that creates tables (etc.) can't know about the catalog (yet), and so that code lives here. Since it lives here, we have to know that we are, in fact, dealing with a SQL DB so the code can blow up if the storage is something else (this extension doesn't know how to create the table-like thing that the target non-SQL DB needs).
Druid is in dire need of a more general DB storage API (as well as a mechanism for DB schema evolution).
In the same vein, I think the getDBI method on the SQLMetadataConnector interface feels a bit like a leak. But given the lack of choices, I'm OK with the current implementation in general.
A small styling suggestion would be to use MetadataStorage instead of Metastore to adhere to the current naming convention.
A more general issue is how we should handle metadata storage for extensions. The catalog is an extension for now. Once it is merged into core Druid, the DB access code can be distributed across the SQL extensions and other extensions.
For an extension, however, the storage has to live in the extension itself. It is hard to see how, say, the catalog extension would work with a MongoDB storage extension: either that extension has to know about the catalog extension, or vice versa.
Renamed class and combined the interface and implementation since it turned out there was only one implementation.
Thank you @rohangarg and @FrankChen021 for your reviews. I've either updated the code or provided an explanation for each of your comments.
register(new SchemaDefnImpl(
    TableId.LOOKUP_SCHEMA,
    ResourceType.CONFIG,
    null // TODO
It means that this early draft does not yet include lookup tables and so there are no valid table types for the lookup schema. Ideally, that will change as the project progresses so that lookup tables can have entries in the catalog.
register(new SchemaDefnImpl(
    TableId.VIEW_SCHEMA,
    ResourceType.VIEW,
    null // TODO
Druid currently offers multiple view systems: one via an extension, and another two available from Imply. There may be more. Eventually, it will make sense to define views in the catalog, just as is done in most DB systems. The TODO here is a reminder that that work is still pending; it can proceed once we get the basics working.
public class ClusterKeySpec
{
  private final String expr;
  private final boolean desc;
My understanding is that clustering is sorting, and sorting is either ascending or descending. (The Druid implementation seems to always use nulls-first sorting.) See SortColumn in MSQ. In fact, maybe this should just reuse SortColumn, now that I know that class exists...
Can you give me a hint as to what other options we might offer in the future? That way, we can figure out how to encode them.
I believe @clintropolis is working on some cool new indexing ideas. If that results in more than one index choice per column (i.e. front-encoding or not), then we can add properties to columns for per-column choices, or to the table for cross-column choices. (The reason that clustering is a table property is that it includes more than one column.)
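For context on the encoding question, a minimal Jackson-serializable version of the two-field class quoted above might look like the sketch below; the JSON property names and accessors are assumptions for illustration, not the PR's actual code. A list of such keys would then express a multi-column sort, e.g. [{"expr": "country", "desc": false}, {"expr": "city", "desc": true}].

import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonProperty;

// Hypothetical sketch only: one clustering key = a sort expression plus a
// direction, so a list of these describes the table-level clustering order.
class ClusterKeySpec
{
  private final String expr;   // column name or expression to cluster (sort) on
  private final boolean desc;  // true for descending; ascending otherwise

  @JsonCreator
  ClusterKeySpec(
      @JsonProperty("expr") String expr,
      @JsonProperty("desc") boolean desc
  )
  {
    this.expr = expr;
    this.desc = desc;
  }

  @JsonProperty("expr")
  String expr()
  {
    return expr;
  }

  @JsonProperty("desc")
  boolean desc()
  {
    return desc;
  }
}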
}
TableSpec spec = tableSpec.spec();
if (CatalogUpdateNotifier.TOMBSTONE_TABLE_TYPE.equals(spec.type())) {
  listener.deleted(tableSpec.id());
Why not have two different endpoints: one for update, one for delete?
Or three: one for add, another for update, and another for delete. Then, replicate that for other catalog objects: connections, etc. Pretty soon one has a big API to maintain. Since the goal here is to simply shift delta messages, we use the "tombstone encoding" to merge the delete and the add/update cases.
The code right now is specific to tables because that's all this version of the catalog includes. Still, due to the rolling-upgrade forward/backward compatibility issue, perhaps this needs to change. Perhaps another approach is to wrap the table in an event message:
{
  "objectType": "table",
  "action": "delete",
  "table": {
    "schema": "druid",
    "name": "doomed"
  }
}
A receiver would ignore object types it doesn't understand (it means the coordinator is newer than the broker.) I'll play with that a bit to see if that will work.
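To make the compatibility argument concrete, here is a rough, hypothetical sketch of how a receiver might handle such an envelope with plain Jackson, ignoring object types it does not recognize; the class and field names are illustrative, not part of this PR:

import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

// Hypothetical event envelope: objectType + action + an opaque payload.
@JsonIgnoreProperties(ignoreUnknown = true)
class CatalogEvent
{
  public String objectType;
  public String action;
  public JsonNode table;   // payload left as raw JSON until we know the type
}

class CatalogEventReceiver
{
  private final ObjectMapper mapper = new ObjectMapper();

  public void onMessage(String json) throws Exception
  {
    CatalogEvent event = mapper.readValue(json, CatalogEvent.class);
    if (!"table".equals(event.objectType)) {
      // Unknown object type: the coordinator is newer than this broker. Ignore.
      return;
    }
    if ("delete".equals(event.action)) {
      // handle table deletion using event.table ("schema", "name")
    } else {
      // handle add/update
    }
  }
}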
  }
}

private Response authorizeTable(TableId tableId, TableSpec spec, final HttpServletRequest req)
Why not use resourceFilter?
A resource filter is handy when the resource pattern is known and multiple endpoints follow the same pattern. At present, the endpoints here follow different patterns. This code is more like the Broker (which has to root around in the SQL to find the resources) than, say, tasks, where it is obvious that the task ID is to be authorized and there are multiple task endpoints.
The filter is a crude tool. Another form of reuse is the subroutine, which is the technique used here.
That said, if the redesign of the REST API results in multiple messages, then we'll consider moving the auth code to a filter if that makes the code simpler.
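As a rough illustration of the subroutine approach (not the PR's actual code), a shared helper along these lines could authorize a table resource from each endpoint using Druid's AuthorizationUtils; the helper name, resource type, and return convention are assumptions:

import java.util.Collections;
import javax.servlet.http.HttpServletRequest;
import javax.ws.rs.core.Response;
import org.apache.druid.server.security.Access;
import org.apache.druid.server.security.Action;
import org.apache.druid.server.security.AuthorizationUtils;
import org.apache.druid.server.security.AuthorizerMapper;
import org.apache.druid.server.security.Resource;
import org.apache.druid.server.security.ResourceAction;
import org.apache.druid.server.security.ResourceType;

class CatalogAuthHelper
{
  private final AuthorizerMapper authorizerMapper;

  CatalogAuthHelper(AuthorizerMapper authorizerMapper)
  {
    this.authorizerMapper = authorizerMapper;
  }

  // Shared subroutine: each endpoint calls this with whatever resource it
  // derives from its own path or payload, rather than relying on one filter.
  Response authorizeDatasourceWrite(String datasourceName, HttpServletRequest req)
  {
    ResourceAction action = new ResourceAction(
        new Resource(datasourceName, ResourceType.DATASOURCE),
        Action.WRITE
    );
    Access access = AuthorizationUtils.authorizeAllResourceActions(
        req,
        Collections.singletonList(action),
        authorizerMapper
    );
    if (!access.isAllowed()) {
      return Response.status(Response.Status.FORBIDDEN).build();
    }
    return null; // null means "authorized"; the endpoint continues its work
  }
}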
This pull request introduces 1 alert when merging 5de07ce into c875f4b - view on LGTM.com.
Thanks for making all the changes!! Once we get the tests fixed, I think this is good to go.
The Druid catalog provides a collection of metadata "hints" about tables (datasources, input sources, views, etc.) within Druid. This PR provides the foundation: the DB and REST layer, but not yet the integration with the Calcite SQL layer. This is a much-refined version of the earlier catalog PR.
The DB layer extends what is done for other Druid metadata tables. The semantic ("business logic") layer provides the usual CRUD operations on tables. The entire design is pretty standard and follows Druid patterns. The key difference is the rather extreme lengths taken by the implementation to ensure each bit is easily testable without mocks. That means many interfaces which can be implemented in multiple ways.
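To give a flavor of the CRUD-style semantic layer described above, a simplified interface might look like the following; this is an illustrative sketch based on the TableId, TableMetadata, and TableSpec types referenced elsewhere in this PR, not the actual API, and the package paths and method signatures are assumptions:

import java.util.List;

import org.apache.druid.catalog.model.TableId;        // package paths assumed
import org.apache.druid.catalog.model.TableMetadata;  // from the file layout in this PR
import org.apache.druid.catalog.model.TableSpec;

// Illustrative sketch of the "business logic" layer: standard CRUD
// operations on table metadata, keyed by schema + name via TableId.
interface CatalogTableOperations
{
  void create(TableId id, TableSpec spec);      // fails if the entry already exists
  TableMetadata read(TableId id);               // null (or throws) if not defined
  void update(TableId id, TableSpec spec);      // replace the spec of an existing entry
  void delete(TableId id);                      // remove the entry from the catalog
  List<TableMetadata> list(String schemaName);  // all entries within one schema
}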
Parts available in this PR include:
- Catalog object model for tables and columns
- Druid metadata DB storage (as an extension)
- REST API to update the catalog (as an extension)
- Integration tests
- Model only: no planner integration yet

The catalog mechanism is split into two parts: the catalog object model and semantic layer, and a druid-catalog extension which stores data in the Druid metadata database. This split exists for two reasons; among them, users of the druid-catalog extension now acknowledge that they are using it only for testing, not production, and at their own risk.

Functionality not in this PR, but which will appear in the next one, includes:
- Syntax and planning for ingestion (INSERT and REPLACE).
- Use of catalog metadata in SELECT queries.

This is a great opportunity for reviewers to provide guidance on the basic catalog mechanism before we start building SQL integration on top. Please see the Druid catalog issue for additional details about the goals and design of the catalog.
Update 10-17
After much debate, we decided to drop the rollup feature from the catalog. It turns out that "rollup" is a feature used for many purposes and more thought is needed to work out how users might want to use the rollup features. Compared to the Catalog proposal, the initial feature won't help with MSQ ingestion of rollup tables: users are empowered to specify on each ingestion how they want those segments to be rolled up.
A new commit pulls out the portions of the code in support of the proposed, but now dropped, rollup datasource type.
This PR has: