[#75]: add CI check for playground #111

Open
wants to merge 32 commits into
base: main

unknowntpo
Contributor

Resolves #75

@unknowntpo unknowntpo force-pushed the feat-ci branch 2 times, most recently from e839f0d to f2210c8 Compare December 24, 2024 00:43
@unknowntpo unknowntpo changed the title feat(ci): add CI check for playground [#75]: add CI check for playground Dec 24, 2024
@unknowntpo unknowntpo force-pushed the feat-ci branch 3 times, most recently from e67031e to c4b4d9f Compare December 24, 2024 02:13
@unknowntpo unknowntpo force-pushed the feat-ci branch 4 times, most recently from 85ab134 to d1b177f Compare January 19, 2025 12:01
@unknowntpo unknowntpo marked this pull request as ready for review February 27, 2025 13:39
@unknowntpo
Contributor Author

@danhuawang Would you like to review this PR?
I enabled the CI test in my repo, and it passes.

image

@@ -35,9 +38,12 @@ if echo "$response" | grep -q "\"code\":0"; then
true
else
# Create Hive catalog for experience Gravitino service
-response=$(curl -X POST -H "Content-Type: application/json" -d '{"name":"catalog_hive","type":"RELATIONAL", "provider":"hive", "comment":"comment","properties":{"metastore.uris":"thrift://'${HIVE_HOST_IP}':9083" }}' http://gravitino:8090/api/metalakes/metalake_demo/catalogs)
+response=$(curl -X POST -H "Content-Type: application/json" -d '{"name":"catalog_hive","type":"RELATIONAL", "provider":"hive", "comment":"comment","properties":{"metastore.uris":"thrift://hive:9083" }}' http://gravitino:8090/api/metalakes/metalake_demo/catalogs)
Contributor

Why replace the environment variable ${HIVE_HOST_IP}? The trino service in docker-compose.yaml already defines it:

  trino:
    image: apache/gravitino-playground:trino-435-gravitino-0.8.0-incubating
    ports:
      - "18080:8080"
    container_name: playground-trino
    environment:
      - HADOOP_USER_NAME=root
      - GRAVITINO_HOST_IP=gravitino
      - GRAVITINO_HOST_PORT=8090
      - GRAVITINO_METALAKE_NAME=metalake_demo
      - HIVE_HOST_IP=hive
      - MYSQL_HOST_IP=mysql
      - POSTGRES_HOST_IP=postgresql
    entrypoint: /bin/bash /tmp/trino/init.sh
    volumes:
      - ./init/trino:/tmp/trino
      - ./init/common:/tmp/common
      - ./healthcheck:/tmp/healthcheck

Contributor Author

I hard-coded this because the script already hard-codes the gravitino host name, so I wanted to keep the style consistent:

http://gravitino:8090/api/metalakes/metalake_demo/catalogs
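
One way to reconcile the two positions, sketched here as an assumption rather than something this PR does: read the host from the environment and fall back to the compose service name when the variable is unset. `hive_catalog_payload` is a hypothetical helper name; the JSON body and the endpoint are the ones from the diff above.

```shell
# Hypothetical helper: build the catalog_hive request body.
# HIVE_HOST_IP is used when docker-compose.yaml sets it; "hive" is the fallback.
hive_catalog_payload() {
  local hive_host="${HIVE_HOST_IP:-hive}"
  printf '{"name":"catalog_hive","type":"RELATIONAL","provider":"hive","comment":"comment","properties":{"metastore.uris":"thrift://%s:9083"}}' "$hive_host"
}

# Usage (same endpoint as in the script under review):
# curl -X POST -H "Content-Type: application/json" -d "$(hive_catalog_payload)" \
#   http://gravitino:8090/api/metalakes/metalake_demo/catalogs
```

With this shape the script behaves the same in containers that define the variable and in those that do not.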

if echo "$response" | grep -q "\"code\":0"; then
true # Placeholder, do nothing
elif echo "$response" | grep -q "\"type\":\"CatalogAlreadyExistsException\""; then
echo "Catalog catalog_hive already exists"
true # Placeholder, do nothing
Contributor

If there is old catalog data in Gravitino, will it cause the test cases to fail? The test case output may not match.

Contributor Author

This is because init_metalake_catalog.sh runs in both the trino and spark containers, so a check-then-create sequence is racy.

For example, during spark container initialization, even if we first send a GET request to confirm that catalog_hive does not exist, the trino container may make the same check and create catalog_hive between our GET and our POST. The second creation attempt then fails with the logic below:

response=$(curl http://gravitino:8090/api/metalakes/metalake_demo/catalogs/catalog_hive)
if echo "$response" | grep -q "\"code\":0"; then
  true
else
  # Create Hive catalog for experience Gravitino service
  response=$(curl -X POST -H "Content-Type: application/json" -d '{"name":"catalog_hive","type":"RELATIONAL", "provider":"hive", "comment":"comment","properties":{"metastore.uris":"thrift://hive:9083" }}' http://gravitino:8090/api/metalakes/metalake_demo/catalogs)
  if echo "$response" | grep -q "\"code\":0"; then
    true # Placeholder, do nothing
  else
    echo "catalog_hive create failed"
    exit 1
  fi
fi
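
For comparison, a minimal sketch of race-tolerant handling along the lines of the new `elif` branches in this diff: treat both `"code":0` and `CatalogAlreadyExistsException` as success, so it does not matter whether the trino or the spark container creates the catalog first. `classify_create_response` is a hypothetical helper name, not a function in the repository.

```shell
# Hypothetical helper: map a create-catalog response to an exit status.
# Returns 0 when the catalog exists after the call, 1 otherwise.
classify_create_response() {
  local response="$1" catalog="$2"
  if echo "$response" | grep -q '"code":0'; then
    return 0
  elif echo "$response" | grep -q '"type":"CatalogAlreadyExistsException"'; then
    echo "Catalog $catalog already exists"
    return 0
  fi
  echo "$catalog create failed"
  return 1
}

# Usage, with $response holding the body returned by the POST:
# classify_create_response "$response" catalog_hive || exit 1
```

This only makes creation idempotent. It does not address the reviewer's concern about stale catalog data from an earlier run.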

@@ -52,6 +58,9 @@ else
response=$(curl -X POST -H "Accept: application/vnd.gravitino.v1+json" -H "Content-Type: application/json" -d '{ "name":"catalog_postgres", "type":"RELATIONAL", "provider":"jdbc-postgresql", "comment":"comment", "properties":{ "jdbc-url":"jdbc:postgresql://postgresql/db", "jdbc-user":"postgres", "jdbc-password":"postgres", "jdbc-database":"db", "jdbc-driver": "org.postgresql.Driver" } }' http://gravitino:8090/api/metalakes/metalake_demo/catalogs)
if echo "$response" | grep -q "\"code\":0"; then
true # Placeholder, do nothing
elif echo "$response" | grep -q "\"type\":\"CatalogAlreadyExistsException\""; then
Contributor

If there is old catalog data in Gravitino, will it cause the test cases to fail?

if echo "$response" | grep -q "catalog_mysql"; then
true # Placeholder, do nothing
elif echo "$response" | grep -q "\"type\":\"CatalogAlreadyExistsException\""; then
echo "Catalog catalog_mysql already exists"
true # Placeholder, do nothing
Contributor

If there is old catalog data in Gravitino, will it cause the test cases to fail?

if echo "$response" | grep -q "\"code\":0"; then
true # Placeholder, do nothing
elif echo "$response" | grep -q "\"type\":\"CatalogAlreadyExistsException\""; then
echo "Catalog catalog_iceberg already exists"
true # Placeholder, do nothing
Contributor

If there is old catalog data in Gravitino, will it cause the test cases to fail?

@@ -78,9 +90,12 @@ if echo "$response" | grep -q "\"code\":0"; then
true
else
# Create Iceberg catalog for experience Gravitino service
-response=$(curl -X POST -H "Accept: application/vnd.gravitino.v1+json" -H "Content-Type: application/json" -d '{ "name":"catalog_iceberg", "type":"RELATIONAL", "provider":"lakehouse-iceberg", "comment":"comment", "properties":{ "uri":"jdbc:mysql://'${MYSQL_HOST_IP}':3306/db", "catalog-backend":"jdbc", "warehouse":"hdfs://'${HIVE_HOST_IP}':9000/user/iceberg/warehouse/", "jdbc-user":"mysql", "jdbc-password":"mysql", "jdbc-driver":"com.mysql.cj.jdbc.Driver"} }' http://gravitino:8090/api/metalakes/metalake_demo/catalogs)
+response=$(curl -X POST -H "Accept: application/vnd.gravitino.v1+json" -H "Content-Type: application/json" -d '{ "name":"catalog_iceberg", "type":"RELATIONAL", "provider":"lakehouse-iceberg", "comment":"comment", "properties":{ "uri":"jdbc:mysql://mysql:3306/db", "catalog-backend":"jdbc", "warehouse":"hdfs://hive:9000/user/iceberg/warehouse/", "jdbc-user":"mysql", "jdbc-password":"mysql", "jdbc-driver":"com.mysql.cj.jdbc.Driver"} }' http://gravitino:8090/api/metalakes/metalake_demo/catalogs)
Contributor

Why replace the environment variables ${HIVE_HOST_IP} and ${MYSQL_HOST_IP}? The trino service in docker-compose.yaml already defines them:

  trino:
    image: apache/gravitino-playground:trino-435-gravitino-0.8.0-incubating
    ports:
      - "18080:8080"
    container_name: playground-trino
    environment:
      - HADOOP_USER_NAME=root
      - GRAVITINO_HOST_IP=gravitino
      - GRAVITINO_HOST_PORT=8090
      - GRAVITINO_METALAKE_NAME=metalake_demo
      - HIVE_HOST_IP=hive
      - MYSQL_HOST_IP=mysql
      - POSTGRES_HOST_IP=postgresql
    entrypoint: /bin/bash /tmp/trino/init.sh
    volumes:
      - ./init/trino:/tmp/trino
      - ./init/common:/tmp/common
      - ./healthcheck:/tmp/healthcheck

Contributor Author

I hard-coded this because the script already hard-codes the gravitino host name, so I wanted to keep the style consistent:

http://gravitino:8090/api/metalakes/metalake_demo/catalogs

cp -r /tmp/gravitino/*.ipynb /home/jovyan
else
cp -r /tmp/gravitino/authorization/*.ipynb /home/jovyan
fi
Contributor

Why remove the condition on $RANGER_ENABLE?

Contributor Author

Because I found something strange: why can't we use Jupyter notebooks such as gravitino-trino-example.ipynb when Ranger is enabled?

@danhuawang
Contributor

@danhuawang Would you like to review this PR? I enabled the CI test in my repo, and it passes.

image

@unknowntpo I have some comments.

@unknowntpo unknowntpo force-pushed the feat-ci branch 2 times, most recently from 28cda30 to 8af0b9e Compare March 1, 2025 13:43
Development

Successfully merging this pull request may close these issues.

[Improvement] Add CI check for PR
2 participants