[apache#4767] fix(docs): Update the document of playground (apache#4768)
### What changes were proposed in this pull request?

Update the playground documentation.

### Why are the changes needed?

Fix: apache#4767
This change brings the document in line with the latest playground documentation.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Documentation-only change.
jerqi authored Aug 29, 2024
1 parent 3493c00 commit 19a8ccc
Showing 1 changed file with 64 additions and 16 deletions.
80 changes: 64 additions & 16 deletions docs/how-to-use-the-playground.md
@@ -19,14 +19,14 @@ Install Git and Docker Compose.

The playground runs several services. The TCP ports used may clash with existing services you run, such as MySQL or Postgres.

| Docker container | Ports used |
|-----------------------|----------------------|
| playground-gravitino | 8090 9001 |
| playground-hive | 3307 9003 9084 50071 |
| playground-mysql | 13306 |
| playground-postgresql | 15342 |
| playground-trino | 18080 |
| playground-jupyter | 18888 |
| Docker container | Ports used |
|-----------------------|------------------------|
| playground-gravitino | 8090 9001 |
| playground-hive | 3307 19000 19083 60070 |
| playground-mysql | 13306 |
| playground-postgresql | 15342 |
| playground-trino | 18080 |
| playground-jupyter | 18888 |
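
Before starting the playground, it can help to check whether any of these ports is already taken. The following is a minimal sketch, not part of the playground itself: it assumes `bash` (for its `/dev/tcp` pseudo-device) and uses the port numbers from the updated table above. The helper name `port_in_use` is hypothetical, and the check only detects listeners reachable on `127.0.0.1`.

```shell
#!/usr/bin/env bash
# Ports listed in the updated table above.
ports="8090 9001 3307 19000 19083 60070 13306 15342 18080 18888"

# port_in_use PORT (hypothetical helper): succeeds if something accepts
# TCP connections on 127.0.0.1:PORT. Relies on bash's /dev/tcp redirection;
# the subshell keeps a failed redirection from terminating the script.
port_in_use() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

for port in $ports; do
  if port_in_use "$port"; then
    echo "port $port is already in use"
  fi
done
```

If the script prints nothing, none of the listed ports accepted a connection on localhost at the time of the check.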

## Start playground

@@ -38,8 +38,8 @@ cd gravitino-playground
./launch-playground.sh
```

### Launching a component of the playground

### Launch special component or components of playground
```shell
git clone git@github.com:apache/gravitino-playground.git
cd gravitino-playground
@@ -52,7 +52,7 @@ Note. Components have dependencies, so not launching all components may prevent

### Using Trino CLI in Docker Container

1. Log in to the Gravitino playground Trino Docker container using the following command:
1. Login to the Gravitino playground Trino Docker container using the following command:

```shell
docker exec -it playground-trino bash
@@ -64,17 +64,31 @@ docker exec -it playground-trino bash
trino@container_id:/$ trino
```

### Using Jupyter Notebook
## Using Jupyter Notebook

1. Open the Jupyter Notebook in the browser at [http://localhost:18888](http://localhost:18888).

2. Open the `gravitino-trino-example.ipynb` notebook.

3. Start the notebook and run the cells.

## Using Spark client

1. Log in to the Gravitino playground Spark Docker container using the following command:

```shell
docker exec -it playground-spark bash
```
2. Open the Spark SQL client in the container.
```shell
spark@container_id:/$ cd /opt/spark && /bin/bash bin/spark-sql
```
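
For one-off statements, the same client can also be run without an interactive session. The sketch below assumes the `playground-spark` container name and the `/opt/spark` path shown in the steps above; the wrapper name `run_spark_sql` is hypothetical. It relies on the `-w` (working directory) option of `docker exec` and the `-e` (execute a quoted query string) option of `spark-sql`.

```shell
# run_spark_sql QUERY (hypothetical wrapper): run a single statement with the
# Spark SQL client inside the playground-spark container and print the result.
run_spark_sql() {
  docker exec -w /opt/spark playground-spark /bin/bash bin/spark-sql -e "$1"
}

# Example (requires the playground to be running):
#   run_spark_sql "SHOW DATABASES;"
```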

## Example

### Simple queries
### Simple Trino queries

You can use simple queries to test in the Trino CLI.

@@ -145,6 +159,38 @@ WHERE e.employee_id = p.employee_id AND p.employee_id = s.employee_id
GROUP BY e.employee_id, given_name, family_name;
```

### Using Spark and Trino

You might consider generating data with SparkSQL and then querying this data using Trino. Give it a try with Gravitino:

1. Log in to the Spark container and execute the following SQL statements:

```sql
-- Use the Hive catalog to create a Hive table
USE catalog_hive;
CREATE DATABASE product;
USE product;

CREATE TABLE IF NOT EXISTS employees (
id INT,
name STRING,
age INT
)
PARTITIONED BY (department STRING)
STORED AS PARQUET;
DESC TABLE EXTENDED employees;

INSERT OVERWRITE TABLE employees PARTITION(department='Engineering') VALUES (1, 'John Doe', 30), (2, 'Jane Smith', 28);
INSERT OVERWRITE TABLE employees PARTITION(department='Marketing') VALUES (3, 'Mike Brown', 32);
```

2. Log in to the Trino container and execute the following SQL statement:

```sql
SELECT * FROM catalog_hive.product.employees WHERE department = 'Engineering';
```
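
The Trino step can also be scripted from the host instead of typed into an interactive session. This is a minimal sketch under stated assumptions: it uses the `playground-trino` container name from this document and the Trino CLI's `--execute` option, and the wrapper name `run_trino_query` is hypothetical.

```shell
# run_trino_query QUERY (hypothetical wrapper): run a single statement with
# the Trino CLI inside the playground-trino container and print the result.
run_trino_query() {
  docker exec playground-trino trino --execute "$1"
}

# Example (requires the playground to be running and the Spark step above
# to have created the table):
#   run_trino_query "SELECT * FROM catalog_hive.product.employees WHERE department = 'Engineering'"
```

Wrapping the call in a function keeps the container name in one place, which is convenient when the same check is repeated for several tables.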


### Using Apache Iceberg REST service

Suppose you want to migrate your business from Hive to Iceberg. Some tables will use Hive, and the other tables will use Iceberg.
@@ -155,12 +201,14 @@ Then, you can use Trino to read the data from the Hive table and join it with th

```text
spark.sql.extensions org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
spark.sql.catalog.catalog_iceberg org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.catalog_iceberg.type rest
spark.sql.catalog.catalog_iceberg.uri http://gravitino:9001/iceberg/
spark.sql.catalog.catalog_rest org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.catalog_rest.type rest
spark.sql.catalog.catalog_rest.uri http://gravitino:9001/iceberg/
spark.locality.wait.node 0
```

Note that `catalog_rest` in SparkSQL and `catalog_iceberg` in Gravitino and Trino share the same Iceberg JDBC backend, so they can access the same datasets.

1. Log in to the Spark container and execute the following steps.

```shell
@@ -172,7 +220,7 @@ spark@container_id:/$ cd /opt/spark && /bin/bash bin/spark-sql
```

```SQL
use catalog_iceberg;
use catalog_rest;
create database sales;
use sales;
create table customers (customer_id int, customer_name varchar(100), customer_email varchar(100));