HadoopIndexer job with input as the datasource and configured segments table doesn't work #7482

Closed
samarthjain opened this issue Apr 15, 2019 · 1 comment

Comments

@samarthjain
Contributor

Affected Version

0.14, 0.13, 0.12. The problem was encountered on 0.12.

Description

I was trying out the Hadoop-based re-ingestion job (http://druid.io/docs/latest/ingestion/update-existing-data.html), which uses the datasource itself as the input.

When I ran the job, it failed because it was trying to read segment metadata from the default druid_segments table rather than from the table I specified in the metadataUpdateSpec, customprefix_segments:

"metadataUpdateSpec": {
"connectURI": "jdbc:mysql...",
"password": "XXXXXXX",
"segmentTable": "customprefix_segments",
"type": "mysql",
"user": "XXXXXXXX"
},

Looking at the code, I see that the segmentTable specified in the spec is actually passed in as the pending_segments table (the 3rd constructor parameter is for pending_segments, while the 4th is for the segments table):
https://github.com/apache/incubator-druid/blob/master/indexing-hadoop/src/main/java/org/apache/druid/indexer/updater/MetadataStorageUpdaterJobSpec.java#L92

This code has been around for a long time, though, so we would have to be careful before simply switching the order of the parameter values.
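For illustration, here is a rough sketch of how the mix-up in MetadataStorageUpdaterJobSpec.get() looks. The surrounding constructor parameters of MetadataStorageTablesConfig are assumptions (only the relative positions of the pendingSegments and segments parameters matter for this issue), so treat it as a sketch rather than a copy of the linked code.

// Sketch only -- the parameters other than pendingSegments/segments are assumptions,
// not the actual Druid source. The point: the spec's segmentTable is handed to the
// 3rd slot (pendingSegments) when it should be passed as the 4th (segments).
public MetadataStorageTablesConfig get()
{
  return new MetadataStorageTablesConfig(
      null,          // base table prefix
      null,          // dataSource table
      segmentTable,  // 3rd param: pendingSegments table -- segmentTable currently lands here
      null           // 4th param: segments table        -- where segmentTable should go
      /* remaining table-name parameters omitted in this sketch */
  );
}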

@samarthjain
Contributor Author

Merged
