
Docs: Add multi-dimension partitioning doc; refactor native batch and separate into smaller topics. #11983

Merged 15 commits on Dec 3, 2021

techdocsmith

Adds documentation for multi-dimension partitioning. cc: @kfaraz
Refactors the native batch partitioning topic as follows:

  • Native batch ingestion covers parallel-index
  • Native batch simple task indexing covers index
  • Native batch input sources covers ioSource
  • Native batch ingestion with firehose covers deprecated firehose

This PR has:

  • [x] been self-reviewed.

@sthetland sthetland left a comment

Really happy to see this page getting split up. Some suggestions and comments, but all in all, nice one!

@@ -0,0 +1,341 @@
---
id: native-batch-firehose

Looks like the sidebar.json update is missing. (These new pages aren't in the left nav.)

For general information on native batch indexing and parallel task indexing, see [Native batch ingestion](./native-batch.md).

> Firehose input has been deprecated. For information, see [Firehose](./native-batch-firehose.md).

Do we need to say this here? ("Firehose" doesn't otherwise appear on this page.)

When you use multi-dimension partitioning for your data, Druid is able to distribute segment sizes more evenly than with single dimension partitioning.

For segment pruning to be effective and translate into better query performance, you must the first of your `partitionDimensions` at query time. For example, given the following `partitionDimensions`:

I think a word(s) is missing here, but not sure what it should be: "...you must __ the first..."
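
For context on the passage under review, a `tuningConfig` that uses multi-dimension range partitioning might look like the sketch below. The dimension names `country` and `city` and the row target are illustrative assumptions and do not come from this PR. The `range` partitioning type requires `forceGuaranteedRollup` to be `true`.

```json
{
  "tuningConfig": {
    "type": "index_parallel",
    "forceGuaranteedRollup": true,
    "partitionsSpec": {
      "type": "range",
      "partitionDimensions": ["country", "city"],
      "targetRowsPerSegment": 5000000
    }
  }
}
```

Under these assumptions, `country` is the first entry in `partitionDimensions`, so it is the dimension the quoted sentence refers to when it discusses segment pruning at query time.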


@kfaraz kfaraz left a comment

Thanks for organizing the docs better, @techdocsmith!
Just one minor comment, otherwise LGTM.

"type": "json"
}
},
"tuningConfig" : {

I think this should contain a partitionsSpec as some of these fields like maxRowsPerSegment are deprecated, as described in the tuningConfig table below.
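
A minimal sketch of what this suggestion could look like, assuming the example is a simple `index` task and that dynamic partitioning is intended. The quoted hunk shows neither, so the task type and the row limit below are assumptions.

```json
{
  "tuningConfig": {
    "type": "index",
    "partitionsSpec": {
      "type": "dynamic",
      "maxRowsPerSegment": 5000000
    }
  }
}
```

Here `maxRowsPerSegment` moves inside `partitionsSpec` instead of sitting at the top level of `tuningConfig`, where it is deprecated.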

@kfaraz kfaraz left a comment

LGTM 🚀

@abhishekagarwal87 abhishekagarwal87 merged commit 7ed4680 into apache:master Dec 3, 2021
@loquisgon

LGTM

@abhishekagarwal87 abhishekagarwal87 added this to the 0.23.0 milestone May 11, 2022