Allow writing InputRowParser extensions that use hadoop/any libraries #1695
Conversation
Most of the changes are updates due to constructor arg changes in DataSchema. Changes to note are in DataSchema.java, HadoopDruidIndexerConfig.java, IndexingHadoopModule.java and HadoopyStringInputRowParser.java.
    null,
    ImmutableList.of("timestamp", "host", "host2", "visited_num")
)
MAPPER.convertValue(
Does this work recursively? Or would you get a Map where the values are regular objects rather than other Maps?
It does convert recursively, everything to Map. Also, it wouldn't matter for this test as long as
mapper.convertValue(mapper.convertValue(parser, Map.class), InputRowParser.class)
produces the right thing for an InputRowParser parser. Specific changes for DataSchema serde are tested in DataSchemaTest.
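To illustrate the round-trip property being discussed, here is a minimal, self-contained sketch; the DemoParser class and its field are hypothetical stand-ins for a real InputRowParser implementation registered with Druid's mapper.

```java
import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.Map;

public class ParserSerdeRoundTrip
{
  // Hypothetical stand-in for an InputRowParser implementation; the real test
  // uses Druid's parser classes registered with the mapper.
  public static class DemoParser
  {
    private final String timestampFormat;

    @JsonCreator
    public DemoParser(@JsonProperty("timestampFormat") String timestampFormat)
    {
      this.timestampFormat = timestampFormat;
    }

    @JsonProperty
    public String getTimestampFormat()
    {
      return timestampFormat;
    }

    @Override
    public boolean equals(Object o)
    {
      return o instanceof DemoParser
             && ((DemoParser) o).timestampFormat.equals(timestampFormat);
    }

    @Override
    public int hashCode()
    {
      return timestampFormat.hashCode();
    }
  }

  public static void main(String[] args)
  {
    ObjectMapper mapper = new ObjectMapper();
    DemoParser original = new DemoParser("yyyy-MM-dd");

    // convertValue walks the object graph recursively, producing nested Maps
    // and Lists all the way down rather than a Map of opaque Java objects.
    Map<?, ?> asMap = mapper.convertValue(original, Map.class);

    // Converting the Map back should yield an equivalent parser, which is the
    // round-trip property the test relies on.
    DemoParser roundTripped = mapper.convertValue(asMap, DemoParser.class);

    System.out.println(original.equals(roundTripped)); // expected: true
  }
}
```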
@himanshug had a couple of minor comments, but overall looks good to me. I don't see a better way of having hadoopy parsers while still respecting the classpath situation.
Discussed this on the dev sync; @cheddar thought that in the future it would make sense for the hadoop input stuff to be loadable from extensions, similar to how the firehoses can be. 👍 on this PR for now though, since that's a longer-term approach.
Force-pushed from 70d4478 to 7b305c2 (Compare)
@gianm addressed review comments and rebased against latest master.
private static final ObjectMapper jsonMapper;
static {
  jsonMapper = new DefaultObjectMapper();
  jsonMapper.setInjectableValues(new InjectableValues.Std().addValue(ObjectMapper.class, jsonMapper));
}
Injected ObjectMappers should be tagged with @Json or @Smile. Is there somewhere that is not doing this?
not needed
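For reference, a minimal sketch of the injection pattern the review comment refers to, using plain Jackson; the ExampleSpec class is hypothetical, and the Druid-specific @Json/@Smile qualifier is only mentioned in a comment because the tests above register the mapper directly against ObjectMapper.class, which is why the tag is not needed there.

```java
import com.fasterxml.jackson.annotation.JacksonInject;
import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.databind.InjectableValues;
import com.fasterxml.jackson.databind.ObjectMapper;

public class InjectedMapperExample
{
  // Hypothetical spec class that needs the ObjectMapper at deserialization time.
  public static class ExampleSpec
  {
    private final ObjectMapper jsonMapper;
    private final String name;

    @JsonCreator
    public ExampleSpec(
        // In Druid production code this parameter would additionally carry the
        // @Json (or @Smile) binding annotation, which is the convention the
        // review comment refers to; plain Jackson only needs @JacksonInject.
        @JacksonInject ObjectMapper jsonMapper,
        @JsonProperty("name") String name
    )
    {
      this.jsonMapper = jsonMapper;
      this.name = name;
    }

    public ObjectMapper getJsonMapper()
    {
      return jsonMapper;
    }
  }

  public static void main(String[] args) throws Exception
  {
    ObjectMapper mapper = new ObjectMapper();
    // The tests above do exactly this: register the mapper itself as an
    // injectable value keyed by ObjectMapper.class, so no qualifier is needed.
    mapper.setInjectableValues(new InjectableValues.Std().addValue(ObjectMapper.class, mapper));

    ExampleSpec spec = mapper.readValue("{\"name\":\"demo\"}", ExampleSpec.class);
    System.out.println(spec.getJsonMapper() == mapper); // expected: true
  }
}
```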
@gianm / @himanshug the problem of lazy converting of
public HadoopIngestionSpecUpdateDatasourcePathSpecSegmentsTest()
{
  jsonMapper = new DefaultObjectMapper();
  jsonMapper.setInjectableValues(
Same @Json comment.
not needed
@himanshug does this change introduce any performance overhead?
@xvrl none observable. That said, in theory the InputRowParser is created each time getParser() is called (see #1695 (comment)); however, that never happens in a tight loop, so it does not matter.
@drcrallen this PR is pending your response; please take a look whenever you have time. Thanks.
Thanks @himanshug, responded.
so that user hadoop-related InputRowParsers are created only when needed. This allows the overlord to accept a HadoopIndexTask with a hadoopy InputRowParser and not fail because the hadoopy InputRowParser might need hadoop libraries.
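To make the idea concrete, here is a minimal sketch (not Druid's actual code; LazyParserHolder and DemoParser are illustrative names) of deferring parser construction: the parser spec is kept as a plain Map at deserialization time and only converted to a concrete parser when getParser() is called, so a node that never parses rows never loads the parser's possibly hadoop-dependent classes.

```java
import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in for the lazy parser handling described above; the
// class and method names are hypothetical, not Druid's actual API.
public class LazyParserHolder
{
  private final ObjectMapper jsonMapper;
  private final Map<String, Object> parserSpec; // kept un-deserialized

  public LazyParserHolder(ObjectMapper jsonMapper, Map<String, Object> parserSpec)
  {
    this.jsonMapper = jsonMapper;
    this.parserSpec = parserSpec;
  }

  // The concrete parser class is only resolved here. A node that merely
  // accepts and stores the spec (as the overlord does with a HadoopIndexTask)
  // never triggers loading of parser classes that need hadoop libraries.
  public DemoParser getParser()
  {
    return jsonMapper.convertValue(parserSpec, DemoParser.class);
  }

  // Hypothetical parser; in Druid this would be the polymorphic
  // InputRowParser interface, with subtypes registered by extension modules.
  public static class DemoParser
  {
    private final String format;

    @JsonCreator
    public DemoParser(@JsonProperty("format") String format)
    {
      this.format = format;
    }

    @JsonProperty
    public String getFormat()
    {
      return format;
    }
  }

  public static void main(String[] args)
  {
    Map<String, Object> spec = new HashMap<>();
    spec.put("format", "tsv");

    LazyParserHolder holder = new LazyParserHolder(new ObjectMapper(), spec);
    System.out.println(holder.getParser().getFormat()); // prints: tsv
  }
}
```

Note that in this sketch convertValue builds a new parser instance on every getParser() call, which corresponds to the per-call overhead discussed earlier; since getParser() is not invoked in a tight loop, that cost is negligible.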
Force-pushed from 7b305c2 to e8b9ee8 (Compare)
@drcrallen all review comments have been addressed.
👍
…parser Allow writing InputRowParser extensions that use hadoop/any libraries
This PR is a result of the discussion in #1682 (diff). This patch