wip
317brian committed Jan 9, 2025
1 parent 9906544 commit 91ec717
Showing 16 changed files with 73 additions and 39 deletions.
3 changes: 2 additions & 1 deletion docs/api-reference/sql-ingestion-api.md
@@ -379,7 +379,8 @@ print(response.text)

The response shows an example report for a query.

-<details><summary>View the response</summary>
+<details>
+<summary>View the response</summary>

```json
{
2 changes: 1 addition & 1 deletion docs/comparisons/druid-vs-spark.md
@@ -39,4 +39,4 @@ One typical setup seen in production is to process data in Spark, and load the p

For more information about using Druid and Spark together, including benchmarks of the two systems, please see:

-<https://www.linkedin.com/pulse/combining-druid-spark-interactive-flexible-analytics-scale-butani>
+https://www.linkedin.com/pulse/combining-druid-spark-interactive-flexible-analytics-scale-butani
2 changes: 1 addition & 1 deletion docs/configuration/extensions.md
@@ -100,7 +100,7 @@ All of these community extensions can be downloaded using [pull-deps](../operati
|druid-momentsketch|Support for approximate quantile queries using the [momentsketch](https://github.com/stanford-futuredata/momentsketch) library|[link](../development/extensions-contrib/momentsketch-quantiles.md)|
|druid-tdigestsketch|Support for approximate sketch aggregators based on [T-Digest](https://github.com/tdunning/t-digest)|[link](../development/extensions-contrib/tdigestsketch-quantiles.md)|
|gce-extensions|GCE Extensions|[link](../development/extensions-contrib/gce-extensions.md)|
-|prometheus-emitter|Exposes [Druid metrics](../operations/metrics.md) for Prometheus server collection (<https://prometheus.io/>)|[link](../development/extensions-contrib/prometheus.md)|
+|prometheus-emitter|Exposes [Druid metrics](../operations/metrics.md) for Prometheus server collection (https://prometheus.io/)|[link](../development/extensions-contrib/prometheus.md)|
|druid-kubernetes-overlord-extensions|Support for launching tasks in k8s without Middle Managers|[link](../development/extensions-contrib/k8s-jobs.md)|
|druid-spectator-histogram|Support for efficient approximate percentile queries|[link](../development/extensions-contrib/spectator-histogram.md)|
|druid-rabbit-indexing-service|Support for creating and managing [RabbitMQ](https://www.rabbitmq.com/) indexing tasks|[link](../development/extensions-contrib/rabbit-stream-ingestion.md)|
5 changes: 3 additions & 2 deletions docs/configuration/index.md
@@ -403,7 +403,7 @@ Metric monitoring is an essential part of Druid operations. The following monito
|`org.apache.druid.server.metrics.SegmentStatsMonitor` | **EXPERIMENTAL** Reports statistics about segments on Historical services. Available only on Historical services. Not to be used when lazy loading is configured.|
|`org.apache.druid.server.metrics.QueryCountStatsMonitor`|Reports how many queries have been successful/failed/interrupted.|
|`org.apache.druid.server.metrics.SubqueryCountStatsMonitor`|Reports how many subqueries have been materialized as rows or bytes and various other statistics related to the subquery execution|
-|`org.apache.druid.server.emitter.HttpEmittingMonitor`|Reports internal metrics of `http` or `parametrized` emitter (see below). Must not be used with another emitter type. See the description of the metrics here: <https://github.com/apache/druid/pull/4973>.|
+|`org.apache.druid.server.emitter.HttpEmittingMonitor`|Reports internal metrics of `http` or `parametrized` emitter (see below). Must not be used with another emitter type. See the description of the metrics here: https://github.com/apache/druid/pull/4973.|
|`org.apache.druid.server.metrics.TaskCountStatsMonitor`|Reports how many ingestion tasks are currently running/pending/waiting and also the number of successful/failed tasks per emission period.|
|`org.apache.druid.server.metrics.TaskSlotCountStatsMonitor`|Reports metrics about task slot usage per emission period.|
|`org.apache.druid.server.metrics.WorkerTaskCountStatsMonitor`|Reports how many ingestion tasks are currently running/pending/waiting, the number of successful/failed tasks, and metrics about task slot usage for the reporting worker, per emission period. Only supported by Middle Manager node types.|
@@ -1195,7 +1195,8 @@ The following table shows the dynamic configuration properties for the Overlord.

The following is an example of an Overlord dynamic config:

-<details><summary>Click to view the example</summary>
+<details>
+<summary>Click to view the example</summary>

```json
{
2 changes: 1 addition & 1 deletion docs/development/docs-contribute.md
@@ -34,7 +34,7 @@ Druid docs contributors:
Druid docs contributors can open an issue about documentation, or contribute a change with a pull request (PR).

The open source Druid docs are located here:
-<https://druid.apache.org/docs/latest/design/index.html>
+https://druid.apache.org/docs/latest/design/index.html

If you need to update a Druid doc, locate and update the doc in the Druid repo following the instructions below.

3 changes: 2 additions & 1 deletion docs/ingestion/concurrent-append-replace.md
@@ -83,7 +83,8 @@ druid.indexer.task.default.context={"useConcurrentLocks":true}

We recommend that you use the `useConcurrentLocks` context parameter so that Druid automatically determines the task lock types for you. If you need to set the task lock types explicitly, you can read more about them in this section.

-<details><summary>Click here to read more about the lock types.</summary>
+<details>
+<summary>Click here to read more about the lock types.</summary>

Druid uses task locks to make sure that multiple conflicting operations don't happen at once.
There are two task lock types: `APPEND` and `REPLACE`. The type of lock you use is determined by what you're trying to accomplish.
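For illustration, this is roughly what setting a lock type explicitly can look like. The following is a minimal, hypothetical fragment of an append task's payload, showing only the `context` object with the `taskLockType` parameter; the rest of the task spec is omitted:

```json
{
  "context": {
    "taskLockType": "APPEND"
  }
}
```

A replace task would set `"taskLockType": "REPLACE"` instead.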
3 changes: 2 additions & 1 deletion docs/ingestion/kinesis-ingestion.md
@@ -43,7 +43,8 @@ This section outlines the configuration properties that are specific to the Amaz

The following example shows a supervisor spec for a stream with the name `KinesisStream`:

-<details><summary>Click to view the example</summary>
+<details>
+<summary>Click to view the example</summary>

```json
{
24 changes: 16 additions & 8 deletions docs/multi-stage-query/examples.md
@@ -39,7 +39,8 @@ When you insert or replace data with SQL-based ingestion, set the context parame

This example inserts data into a table named `w000` without performing any data rollup:

-<details><summary>Show the query</summary>
+<details>
+<summary>Show the query</summary>

```sql
INSERT INTO w000
@@ -85,7 +86,8 @@ CLUSTERED BY channel

This example inserts data into a table named `kttm_rollup` and performs data rollup. This example implements the recommendations described in [Rollup](./concepts.md#rollup).

-<details><summary>Show the query</summary>
+<details>
+<summary>Show the query</summary>

```sql
INSERT INTO "kttm_rollup"
@@ -126,7 +128,8 @@ CLUSTERED BY browser, session

This example aggregates data from a table named `w000` and inserts the result into `w002`.

-<details><summary>Show the query</summary>
+<details>
+<summary>Show the query</summary>

```sql
INSERT INTO w002
@@ -153,7 +156,8 @@ CLUSTERED BY page

This example inserts data into a table named `w003` and joins data from two sources:

-<details><summary>Show the query</summary>
+<details>
+<summary>Show the query</summary>

```sql
INSERT INTO w003
@@ -209,7 +213,8 @@ PARTITIONED BY HOUR

This example replaces the entire datasource used in the table `w007` with the new query data while dropping the old data:

-<details><summary>Show the query</summary>
+<details>
+<summary>Show the query</summary>

```sql
REPLACE INTO w007
@@ -256,7 +261,8 @@ CLUSTERED BY channel

This example replaces certain segments in a datasource with the new query data while dropping old segments:

-<details><summary>Show the query</summary>
+<details>
+<summary>Show the query</summary>

```sql
REPLACE INTO w007
@@ -279,7 +285,8 @@ CLUSTERED BY page

## REPLACE for reindexing an existing datasource into itself

-<details><summary>Show the query</summary>
+<details>
+<summary>Show the query</summary>

```sql
REPLACE INTO w000
@@ -305,7 +312,8 @@ CLUSTERED BY page

## SELECT with EXTERN and JOIN

-<details><summary>Show the query</summary>
+<details>
+<summary>Show the query</summary>

```sql
WITH flights AS (
2 changes: 1 addition & 1 deletion docs/querying/nested-columns.md
@@ -330,7 +330,7 @@ FROM (
PARTITIONED BY ALL
```

-## Ingest a JSON string as COMPLEX<json\>
+## Ingest a JSON string as COMPLEX\<json\>

If your source data contains serialized JSON strings, you can ingest the data as `COMPLEX<JSON>` as follows:
- During native batch ingestion, call the `parse_json` function in a `transform` object in the `transformSpec`, as sketched below.
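As a sketch of that option, a `transformSpec` that parses a serialized JSON string into a `COMPLEX<JSON>` column might look like the following, assuming a hypothetical input column `raw_json` and output column `nested_data`:

```json
{
  "transformSpec": {
    "transforms": [
      {
        "type": "expression",
        "name": "nested_data",
        "expression": "parse_json(\"raw_json\")"
      }
    ]
  }
}
```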
18 changes: 12 additions & 6 deletions docs/querying/sql-translation.md
@@ -78,7 +78,8 @@ EXPLAIN PLAN statements return:

Example 1: EXPLAIN PLAN for a `SELECT` query on the `wikipedia` datasource:

-<details><summary>Show the query</summary>
+<details>
+<summary>Show the query</summary>

```sql
EXPLAIN PLAN FOR
@@ -93,7 +94,8 @@ GROUP BY channel

The above EXPLAIN PLAN query returns the following result:

-<details><summary>Show the result</summary>
+<details>
+<summary>Show the result</summary>

```json
[
@@ -235,7 +237,8 @@

Example 2: EXPLAIN PLAN for an `INSERT` query that inserts data into the `wikipedia` datasource:

-<details><summary>Show the query</summary>
+<details>
+<summary>Show the query</summary>

```sql
EXPLAIN PLAN FOR
@@ -263,7 +266,8 @@ PARTITIONED BY ALL

The above EXPLAIN PLAN returns the following result:

-<details><summary>Show the result</summary>
+<details>
+<summary>Show the result</summary>

```json
[
@@ -452,7 +456,8 @@ The above EXPLAIN PLAN returns the following result:
Example 3: EXPLAIN PLAN for a `REPLACE` query that replaces all the data in the `wikipedia` datasource with a `DAY`
time partitioning, and `cityName` and `countryName` as the clustering columns:

-<details><summary>Show the query</summary>
+<details>
+<summary>Show the query</summary>

```sql
EXPLAIN PLAN FOR
@@ -482,7 +487,8 @@ CLUSTERED BY cityName, countryName

The above EXPLAIN PLAN query returns the following result:

-<details><summary>Show the result</summary>
+<details>
+<summary>Show the result</summary>

```json
[
3 changes: 2 additions & 1 deletion docs/release-info/upgrade-notes.md
@@ -314,7 +314,8 @@ This property affects both storage and querying, and must be set on all Druid se

The following table illustrates some example scenarios and the impact of the changes.

-<details><summary>Show the table</summary>
+<details>
+<summary>Show the table</summary>

| Query| Druid 27.0.0 and earlier| Druid 28.0.0 and later|
|------|------------------------|----------------------|
3 changes: 2 additions & 1 deletion docs/tutorials/index.md
@@ -145,7 +145,8 @@ Follow these steps to load the sample Wikipedia dataset:
5. Click **Done**. You're returned to the **Query** view that displays the newly generated query.
The query inserts the sample data into the table named `wikiticker-2015-09-12-sampled`.
-<details><summary>Show the query</summary>
+<details>
+<summary>Show the query</summary>
```sql
REPLACE INTO "wikiticker-2015-09-12-sampled" OVERWRITE ALL
6 changes: 4 additions & 2 deletions docs/tutorials/tutorial-msq-convert-spec.md
@@ -41,7 +41,8 @@ To convert the ingestion spec to a query task, do the following:
![Convert ingestion spec to SQL](../assets/multi-stage-query/tutorial-msq-convert.png "Convert ingestion spec to SQL")
3. In the **Ingestion spec to convert** window, insert your ingestion spec. You can use your own spec or the sample ingestion spec provided in the tutorial. The sample spec uses data hosted at `https://druid.apache.org/data/wikipedia.json.gz` and loads it into a table named `wikipedia`:

-<details><summary>Show the spec</summary>
+<details>
+<summary>Show the spec</summary>

```json
{
@@ -127,7 +128,8 @@

4. Click **Submit** to submit the spec. The web console uses the JSON-based ingestion spec to generate a SQL query that you can use instead. This is what the query looks like for the sample ingestion spec:

-<details><summary>Show the query</summary>
+<details>
+<summary>Show the query</summary>

```sql
-- This SQL query was auto generated from an ingestion spec
6 changes: 4 additions & 2 deletions docs/tutorials/tutorial-msq-extern.md
@@ -46,7 +46,8 @@ To generate a query from external data, do the following:
- Customize how Druid handles the data by selecting the **Input format** and its related options, such as adding **JSON parser features** for JSON files.
5. When you're ready, click **Done**. You're returned to the **Query** view where you can see the starter query that will insert the data from the external source into a table named `wikipedia`.

-<details><summary>Show the query</summary>
+<details>
+<summary>Show the query</summary>

```sql
REPLACE INTO "wikipedia" OVERWRITE ALL
@@ -122,7 +123,8 @@ ORDER BY COUNT(*) DESC

With the EXTERN function, you could run the same query on the external data directly without ingesting it first:

-<details><summary>Show the query</summary>
+<details>
+<summary>Show the query</summary>

```sql
SELECT
12 changes: 8 additions & 4 deletions docs/tutorials/tutorial-query-deep-storage.md
@@ -39,7 +39,8 @@ Use the **Load data** wizard or the following SQL query to ingest the `wikipedia

Partitioning by hour provides more segment granularity, so you can selectively load segments onto Historicals or keep them in deep storage.

-<details><summary>Show the query</summary>
+<details>
+<summary>Show the query</summary>

```sql
REPLACE INTO "wikipedia" OVERWRITE ALL
@@ -152,7 +153,8 @@ This query looks for records with timestamps that precede `00:10:00`. Based on t

When you submit the query from deep storage through the API, you get the following response:

-<details><summary>Show the response</summary>
+<details>
+<summary>Show the response</summary>

```json
{
@@ -209,7 +211,8 @@ A successful query also returns a `pages` object that includes the page numbers

Note that `sampleRecords` has been truncated for brevity.

-<details><summary>Show the response</summary>
+<details>
+<summary>Show the response</summary>

```json
{
@@ -265,7 +268,8 @@ curl --location 'http://ROUTER:PORT/druid/v2/sql/statements/:queryId'

Note that the response has been truncated for brevity.

-<details><summary>Show the response</summary>
+<details>
+<summary>Show the response</summary>

```json
[
18 changes: 12 additions & 6 deletions docs/tutorials/tutorial-unnest-arrays.md
@@ -271,7 +271,8 @@ You can use a single unnest datasource to unnest multiple columns. Be careful wh

The following native Scan query returns the rows of the datasource and unnests the values in the `dim3` column by using the `unnest` datasource type:

-<details><summary>Show the query</summary>
+<details>
+<summary>Show the query</summary>

```json
{
@@ -334,7 +335,8 @@ You can implement filters. For example, you can add the following to the Scan qu

The following query returns an unnested version of the column `dim3` as the column `unnest-dim3` sorted in descending order.

-<details><summary>Show the query</summary>
+<details>
+<summary>Show the query</summary>

```json
{
@@ -375,7 +377,8 @@ The following query returns an unnested version of the column `dim3` as the colu

The example topN query unnests `dim3` into the column `unnest-dim3`. The query uses the unnested column as the dimension for the topN query. The results are output to a column named `topN-unnest-d3` and are sorted numerically in ascending order based on the column `a0`, an aggregate value representing the minimum of `m1`.

-<details><summary>Show the query</summary>
+<details>
+<summary>Show the query</summary>

```json
{
@@ -434,7 +437,8 @@ The example topN query unnests `dim3` into the column `unnest-dim3`. The query u

This query joins the `nested_data` table with itself and outputs the unnested data into a new column called `unnest-dim3`.

-<details><summary>Show the query</summary>
+<details>
+<summary>Show the query</summary>

```json
{
@@ -539,7 +543,8 @@ The `unnest` datasource supports unnesting virtual columns, which is a queryable

The following query returns the columns `dim45` and `m1`. The `dim45` column is the unnested version of a virtual column that contains an array of the `dim4` and `dim5` columns.

-<details><summary>Show the query</summary>
+<details>
+<summary>Show the query</summary>

```json
{
@@ -585,7 +590,8 @@ The following query returns the columns `dim45` and `m1`. The `dim45` column is

The following Scan query unnests the column `dim3` into `d3` and a virtual column composed of `dim4` and `dim5` into the column `d45`. It then returns those source columns and their unnested variants.

-<details><summary>Show the query</summary>
+<details>
+<summary>Show the query</summary>

```json
{
