-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Query Time Lookup #1259
Query Time Lookup #1259
Conversation
31769bc
to
37ce08d
Compare
"columns":["key","value] | ||
} | ||
}, | ||
"updateMs":0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would the update be a function outside of the namespace? Shouldn't the namespace define it's own update mechanisms that make sense for however it is integrated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can, especially since updateMs doesn't make much sense for some namespace types like Kafka.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you saying that you are agreeing and will remove updateMs
as an external property, instead, making it a part of the particular namespace?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, instead of having a "namespace" and "updateMS" the namespace spec will have updateMs in it for those which understand such a thing, and the "namespace" will be the top level object.
6f79ce8
to
ff42f6c
Compare
final String namespace | ||
) | ||
{ | ||
Preconditions.checkNotNull(kafkaTopic, "kafkatTopic required"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with @NotNull in place, are we putting additional checks with the expectation that these objects will be hand constructed without json deserialization as well?
and a minor typo, s/kafkatTopic/kafkaTopic/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are only checks (Preconditions.checkNotNull) in the constructor. The only way someone should be able to set the values as null is via reflection, in which case all bets are off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with Preconditions check but thought they were redundant because jackson will do the null checks automatically given that you have @NotNull specified on those.
c98ae14
to
a4431bb
Compare
Assert.assertTrue(fnCache.containsKey(ns)); | ||
prior = runs.get(); | ||
Thread.sleep(50); | ||
Assert.assertTrue(runs.get() > prior); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is really no guaranty that this will happen (it is likely though). since this check does not seem to be essential (i understand it is checking that updater runner continued to run as scheduled), does it make sense to remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could, but the purpose is exactly as you put it, to make sure the scheduler is behaving properly before continuing with the rest of the test.
There is a similar question asked in just a few lines (which is where I on very rare occasions see problems) which is if the scheduler has now stopped properly.
If you have an idea for a better way to test or ensure this then I am very open to suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that if a namespace is scheduled to update regularly, I don't want it to continue updating after it has been told to cancel its tasks, or else the namespace will re-populate during the delete process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I modified this a bit to hopefully change the way the tests are done and squash the problem, which I think was a test problem rather than an impl problem.
* date version of data given a base descriptor and a matching pattern. "Version" is completely dependent on the | ||
* implementation but is commonly equal to the "last modified" timestamp. | ||
* | ||
* EXPERIMENTAL: This is implemented explicitly for URIExtractionNamespaceFunctionFactory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'd prefer not to have messages like this. When we release QTL, it should be tested in production somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing experimental notice.
There was a question as to why certain namespace types exist. Here's a brief rundown:
The question was if a kafka extension is really needed. It is the only way to instantly propagate a change through the cluster in the current collection of ways to propagate data. If that is deemed not required then it is pretty easy to pull out the extension. |
62cccac
to
a592d31
Compare
* Adds kafka, URI, and JDBC namespace defintions * Add ability to explicitly rename using a "namespace" which is a particular data collection that is loaded on all realtime, historic nodes, and brokers. If any of these nodes has the namespace extension, ALL nodes have the namespace extension. * Add namespace caching and populating (can be on heap or off heap) * Add NamespaceExtractionCacheManager for handling caches * Added ExtractionNamespace for handling metadata on the extraction namespaces * Added ExtractionNamespaceUpdate for handling metadata related to updates * Add extension which caches renames from a kafka stream (requires kafka8) * Added README.md for the namespace kafka extension * Added docs * Added namespace/size, namespace/count, namespace/deltaTasksStarted metrics Add static config for namespaces via `druid.query.extraction.namespace` * This is a rebase of https://github.com/b-slim/druid/tree/static_config_only
a592d31
to
86ede70
Compare
This PR is at the code review stage. Comments on either the high-level overview or the low-level implementations are welcome. This master comment will be updated with pertinent discussion points if they become major topics in the track below.
Add query time lookups for renames via query "namespace" (may end up renaming this to something with a more natural description)
namespace/size
,namespace/count
,namespace/deltaTasksStarted
metricsNote, the below is for the second round of PRs, which will go in after the static config.
here is how namespace populating will eventually work ( https://github.com/metamx/druid/tree/queryTimeLookup_announce ):
Since everything is loaded everywhere this approach is an "at least once announcement" kind of approach and things eventually settle onto a state as the leading coordinator keeps polling the metadata and spitting out the "correct" state to each node.
Prior PR:
#1093