Auto merging segments created by the Kafka indexing service #4498
Hi Eran. Unfortunately, the only solution currently possible with Druid's native index task is to set up a compaction workflow manually, using a workflow tool such as Oozie. For example, you can schedule a compaction task to run every midnight to merge the segments ingested during that day. FYI, there are two related proposals: automatic background compaction (#4479) and improved segment generation in Kafka indexing tasks (#4178).
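A nightly compaction job of this kind needs to know which interval to merge. Below is a minimal sketch, assuming the job runs shortly after midnight UTC and compacts the previous full UTC day. The helper name `previous_day_interval` is hypothetical and not part of Druid; it only builds the ISO-8601 interval string that a task spec would use.

```python
from datetime import datetime, timedelta, timezone


def previous_day_interval(now: datetime) -> str:
    """Return the ISO-8601 interval covering the full UTC day before `now`.

    `now` is expected to be a timezone-aware datetime.
    """
    # Truncate to the start of the current UTC day.
    today = now.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    yesterday = today - timedelta(days=1)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return f"{yesterday.strftime(fmt)}/{today.strftime(fmt)}"


if __name__ == "__main__":
    # Example: a job triggered ten minutes past midnight on 2017-07-05.
    run_time = datetime(2017, 7, 5, 0, 10, tzinfo=timezone.utc)
    print(previous_day_interval(run_time))
```

The scheduler (Oozie, cron, or similar) would call this once per run and pass the result to whatever builds the compaction task.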
@jihoonson, thank you for the super fast reply :)
Thanks again! Eran
Here are my answers.
@jihoonson, thank you very much for your support! Hadoop it is then... :)
@erankor cool! Feel free to ask if you have more questions.
Hey, we use Hadoop index tasks only. Are these compatible with the Or is it the other way around, and only Index Tasks produce segments with NoneShardSpec?
Hi @l15k4, the Hadoop index task does not use NoneShardSpec unless it is forced to, which is not recommended. The recommended way is to use IndexTask (http://druid.io/docs/latest/ingestion/tasks.html#index-task) with IngestSegmentFirehose (http://druid.io/docs/latest/ingestion/firehose.html#ingestsegmentfirehose). This should work for any type of shardSpec. I'm currently working on a new
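As a rough illustration of the IndexTask plus IngestSegmentFirehose approach, the sketch below builds only the `ioConfig` fragment of an index task that re-reads existing segments. The field names follow the firehose documentation linked above as I understand it and should be checked against the Druid version in use. The function name, the datasource name `events`, and the interval are hypothetical, and the `dataSchema` and `tuningConfig` parts of the task are deliberately omitted.

```python
import json


def ingest_segment_io_config(datasource: str, interval: str) -> dict:
    """Build the ioConfig fragment of an index task that reads existing segments.

    Only the firehose part is sketched here; a complete task spec also
    needs a dataSchema and a tuningConfig.
    """
    return {
        "type": "index",
        "firehose": {
            "type": "ingestSegment",   # read back already-published segments
            "dataSource": datasource,  # hypothetical datasource name
            "interval": interval,      # ISO-8601 interval to re-index
        },
    }


if __name__ == "__main__":
    cfg = ingest_segment_io_config(
        "events", "2017-07-04T00:00:00Z/2017-07-05T00:00:00Z"
    )
    print(json.dumps(cfg, indent=2))
```

The resulting fragment would be embedded in a full index task spec and submitted to the indexing service like any other task.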
@jihoonson I checked the We try to get S3-independent and at the same time leverage
Same as with the Kafka indexing service.
@l15k4 @licl2014, I just saw these comments. Sorry for the late response. NoneShardSpec is not appropriate for every kind of appending. Once a segment with NoneShardSpec is generated, no more segments can be added to the same interval. Tranquility may be fine because it rejects late data, but you can't add more data to the same interval with any task type, including native or Hadoop batch tasks.
Hi all,
We're using Druid with the Kafka indexing service and would like to set up a process for merging the small segments generated for each Kafka partition. As far as I can see, the recommended way of doing it (based on the documentation) is to use Hadoop, but that sounds like added complexity and I'd rather avoid it if possible.
From what I understand (and please correct me if I'm wrong), pull request #3611 added support for merging sharded segments to the basic IndexTask. So it should theoretically be possible to merge the Kafka indexing segments without Hadoop.
However, enabling druid.coordinator.merge.on doesn't work, since the coordinator only looks for segments that use NoneShardSpec (https://github.com/druid-io/druid/blob/b77fab8a30eaaaba4e9f5f87a21a8031e3a20f66/server/src/main/java/io/druid/server/coordinator/helper/DruidCoordinatorSegmentMerger.java#L245).
Can this condition be removed now that the pull request I referenced above has been merged?
Is there a better way to auto-merge segments created by the Kafka indexing service?
Thank you!
Eran