forked from apache/gravitino
Commit
[apache#4767] fix(docs): Update the document of playground (apache#4768)
### What changes were proposed in this pull request?

Update the playground document.

### Why are the changes needed?

Fix: apache#4767

Update the document according to the latest playground document.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Documentation changes only.
Showing 1 changed file with 64 additions and 16 deletions.
@@ -19,14 +19,14 @@ Install Git and Docker Compose.

The playground runs several services. The TCP ports used may clash with existing services you run, such as MySQL or Postgres.

-| Docker container      | Ports used           |
-|-----------------------|----------------------|
-| playground-gravitino  | 8090 9001            |
-| playground-hive       | 3307 9003 9084 50071 |
-| playground-mysql      | 13306                |
-| playground-postgresql | 15342                |
-| playground-trino      | 18080                |
-| playground-jupyter    | 18888                |

| Docker container      | Ports used             |
|-----------------------|------------------------|
| playground-gravitino  | 8090 9001              |
| playground-hive       | 3307 19000 19083 60070 |
| playground-mysql      | 13306                  |
| playground-postgresql | 15342                  |
| playground-trino      | 18080                  |
| playground-jupyter    | 18888                  |
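If you are unsure whether one of these ports is already taken on your machine, a quick check along the lines of the sketch below can tell you before launching anything (it assumes `lsof` is available; `ss -ltn` or `netstat -an` are common alternatives):

```shell
# Check whether anything is already listening on a playground port, e.g. Gravitino's 8090.
# An empty result means the port is free.
lsof -i :8090
```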
## Start playground
@@ -38,8 +38,8 @@ cd gravitino-playground
./launch-playground.sh
```

-### Launching a component of the playground

### Launching a specific component or components of the playground

```shell
git clone git@github.com:apache/gravitino-playground.git
cd gravitino-playground
@@ -52,7 +52,7 @@ Note. Components have dependencies, so not launching all components may prevent

### Using Trino CLI in Docker Container

-1. Log in to the Gravitino playground Trino Docker container using the following command:

1. Login to the Gravitino playground Trino Docker container using the following command:

```shell
docker exec -it playground-trino bash
@@ -64,17 +64,31 @@ docker exec -it playground-trino bash
trino@container_id:/$ trino
```
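Once the Trino CLI prompt appears, a quick sanity check along the lines of the sketch below confirms that Trino can see the catalogs served through Gravitino (the `catalog_hive` name assumes the default playground setup used in the examples later in this guide):

```sql
-- List the catalogs Trino knows about, then the schemas inside the Hive catalog.
SHOW CATALOGS;
SHOW SCHEMAS FROM catalog_hive;
```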
-### Using Jupyter Notebook

## Using Jupyter Notebook

1. Open the Jupyter Notebook in the browser at [http://localhost:18888](http://localhost:18888).

2. Open the `gravitino-trino-example.ipynb` notebook.

3. Start the notebook and run the cells.
## Using Spark client

1. Log in to the Gravitino playground Spark Docker container using the following command:

```shell
docker exec -it playground-spark bash
```

2. Open the Spark SQL client in the container.

```shell
spark@container_id:/$ cd /opt/spark && /bin/bash bin/spark-sql
```
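After the `spark-sql` prompt comes up, a minimal check such as the one below (reusing the `catalog_hive` catalog name that appears in the examples that follow) verifies that the Gravitino-managed catalog is reachable from Spark:

```sql
-- Switch to the Gravitino-managed Hive catalog and list its databases.
USE catalog_hive;
SHOW DATABASES;
```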
## Example

-### Simple queries

### Simple Trino queries

You can run simple queries in the Trino CLI to test the setup.
@@ -145,6 +159,38 @@ WHERE e.employee_id = p.employee_id AND p.employee_id = s.employee_id
GROUP BY e.employee_id, given_name, family_name;
```
### Using Spark and Trino

You might consider generating data with Spark SQL and then querying this data using Trino. Give it a try with Gravitino:

1. Log in to the Spark container and execute the following SQL statements:

```sql
-- Use the Hive catalog to create a Hive table
USE catalog_hive;
CREATE DATABASE product;
USE product;

CREATE TABLE IF NOT EXISTS employees (
  id INT,
  name STRING,
  age INT
)
PARTITIONED BY (department STRING)
STORED AS PARQUET;
DESC TABLE EXTENDED employees;

INSERT OVERWRITE TABLE employees PARTITION(department='Engineering') VALUES (1, 'John Doe', 30), (2, 'Jane Smith', 28);
INSERT OVERWRITE TABLE employees PARTITION(department='Marketing') VALUES (3, 'Mike Brown', 32);
```

2. Log in to the Trino container and execute the following SQL statement:

```sql
SELECT * FROM catalog_hive.product.employees WHERE department = 'Engineering';
```
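As a small illustrative variation on the query above, using the same `employees` table created from Spark, you can also aggregate across both partitions from Trino:

```sql
-- Row counts per department, covering both partitions written from Spark SQL.
SELECT department, count(*) AS employee_count
FROM catalog_hive.product.employees
GROUP BY department;
```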
### Using Apache Iceberg REST service

Suppose you want to migrate your business from Hive to Iceberg. Some tables will use Hive, and the other tables will use Iceberg.
@@ -155,12 +201,14 @@ Then, you can use Trino to read the data from the Hive table and join it with th

```text
spark.sql.extensions org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
-spark.sql.catalog.catalog_iceberg org.apache.iceberg.spark.SparkCatalog
-spark.sql.catalog.catalog_iceberg.type rest
-spark.sql.catalog.catalog_iceberg.uri http://gravitino:9001/iceberg/
spark.sql.catalog.catalog_rest org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.catalog_rest.type rest
spark.sql.catalog.catalog_rest.uri http://gravitino:9001/iceberg/
spark.locality.wait.node 0
```
Please note that `catalog_rest` in Spark SQL and `catalog_iceberg` in Gravitino and Trino share the same Iceberg JDBC backend, which means they can access the same dataset.
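To make that concrete, here is an illustrative pair of queries against the `sales.customers` table created in the steps below; the first is meant for the Spark SQL session and the second for the Trino CLI, and both should return the same result:

```sql
-- From Spark SQL, through the catalog_rest catalog configured above:
SELECT count(*) FROM catalog_rest.sales.customers;

-- From the Trino CLI, through the Gravitino-managed catalog_iceberg catalog:
SELECT count(*) FROM catalog_iceberg.sales.customers;
```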
1. Log in to the Spark container and execute the following steps.

```shell
@@ -172,7 +220,7 @@ spark@container_id:/$ cd /opt/spark && /bin/bash bin/spark-sql
```

```SQL
-use catalog_iceberg;
use catalog_rest;
create database sales;
use sales;
create table customers (customer_id int, customer_name varchar(100), customer_email varchar(100));