[#434] feat(CI): Gravitino Trino connector E2E testing #616

Merged: 6 commits, Nov 3, 2023
Changes from 4 commits
5 changes: 4 additions & 1 deletion .gitignore
@@ -37,4 +37,7 @@ distribution
 server/src/main/resources/project.properties
 
 dev/docker/hive/packages
-docs/build
+docs/build
+
+dev/docker/tools/docker-connector
+dev/docker/tools/docker-connector.conf
4 changes: 2 additions & 2 deletions conf/gravitino.conf.template
@@ -8,7 +8,7 @@ gravitino.server.shutdown.timeout = 3000
 
 # THE CONFIGURATION FOR Gravitino WEB SERVER
 # The host name of the built-in web server
-gravitino.server.webserver.host = 127.0.0.1
+gravitino.server.webserver.host = 0.0.0.0
 # The http port number of the built-in web server
 gravitino.server.webserver.httpPort = 8090
 # The min thread size of the built-in web server
@@ -44,7 +44,7 @@ gravitino.auxService.names = iceberg-rest
 # Iceberg REST service classpath
 gravitino.auxService.iceberg-rest.classpath = catalogs/lakehouse-iceberg/libs, catalogs/lakehouse-iceberg/conf
 # Iceberg REST service host
-gravitino.auxService.iceberg-rest.host = 127.0.0.1
+gravitino.auxService.iceberg-rest.host = 0.0.0.0
 # Iceberg REST service http port
 gravitino.auxService.iceberg-rest.httpPort = 9001
 
2 changes: 1 addition & 1 deletion dev/docker/hive/Dockerfile
@@ -4,7 +4,7 @@
 #
 
 FROM ubuntu:16.04
-LABEL maintainer="dev@datastrato.com"
+LABEL maintainer="support@datastrato.com"
 
 ARG HADOOP_PACKAGE_NAME
 ARG HIVE_PACKAGE_NAME
3 changes: 3 additions & 0 deletions dev/docker/hive/README.md
@@ -59,3 +59,6 @@ ssh -p 8022 datastrato@localhost (password: ds123, this is a sudo user)
 - Config HDFS DataNode data transfer address to `0.0.0.0:50010` explicitly
 - Map container hostname to `127.0.0.1` before starting Hadoop
 - Expose `50010` port for the HDFS DataNode
+
+### 0.1.5
+- Rollback `Map container hostname to 127.0.0.1 before starting Hadoop` of `datastrato/gravitino-ci-hive:0.1.4`
6 changes: 0 additions & 6 deletions dev/docker/hive/start.sh
@@ -9,12 +9,6 @@ service ssh start
 ssh-keyscan localhost > /root/.ssh/known_hosts
 ssh-keyscan 0.0.0.0 >> /root/.ssh/known_hosts
 
-# Map the hostname to 127.0.0.1 for external access datanode
-hostname=$(cat /etc/hostname)
-new_content=$(cat /etc/hosts | sed "/$hostname/s/^/# /")
-new_content="${new_content}\n127.0.0.1 ${hostname}"
-echo -e "$new_content" > /etc/hosts
-
 # start hadoop
 ${HADOOP_HOME}/sbin/start-all.sh
 
11 changes: 11 additions & 0 deletions dev/docker/tools/README.md
@@ -0,0 +1,11 @@
<!--
Copyright 2023 Datastrato.
This software is licensed under the Apache License version 2.
-->

# Mac Docker Connector
Docker Desktop for Mac does not let the host (macOS) access containers by their IP addresses, so the host and the containers cannot reach each other's internal services directly over IP.
The [mac-docker-connector](https://github.com/wenjunxiao/mac-docker-connector) lets the macOS host access Docker container IPs directly.
Contributor:

The words here are confusing to me. I think it might be:

Because Docker Desktop for Mac does not provide access to container IP from host(macOS). This can result in host(macOS) and containers not being able to access each other's internal services directly over IPs.

The [mac-docker-connector](https://github.com/wenjunxiao/mac-docker-connector) provides the ability for the macOS host to directly access the docker container IP.

Member Author:

OK, I fixed it.

Before running the integration tests, make sure to execute the `dev/docker/tools/mac-docker-connector.sh` script.
> Developing Gravitino in a Linux environment does not have this limitation and does not require executing the `mac-docker-connector.sh` script ahead of time.
35 changes: 35 additions & 0 deletions dev/docker/tools/mac-docker-connector.sh
@@ -0,0 +1,35 @@
#!/bin/bash
#
# Copyright 2023 Datastrato.
# This software is licensed under the Apache License version 2.
#
#set -ex

bin="$(dirname "${BASH_SOURCE-$0}")"
bin="$(cd "${bin}">/dev/null; pwd)"

OS=$(uname -s)
if [ "${OS}" != "Darwin" ]; then
echo "Only macOS needs to run mac-docker-connector."
exit 1
fi

if pgrep -xq "docker-connector"; then
echo "docker-connector is already running."
exit 1
fi

DOCKER_CONNECTOR_PACKAGE_NAME="docker-connector-darwin.tar.gz"
DOCKER_CONNECTOR_DOWNLOAD_URL="https://github.com/wenjunxiao/mac-docker-connector/releases/download/v3.2/${DOCKER_CONNECTOR_PACKAGE_NAME}"

if [ ! -f "${bin}/docker-connector" ]; then
wget -q -P "${bin}" ${DOCKER_CONNECTOR_DOWNLOAD_URL}
tar -xzf "${bin}/${DOCKER_CONNECTOR_PACKAGE_NAME}" -C "${bin}"
rm -rf "${bin}/${DOCKER_CONNECTOR_PACKAGE_NAME}"
fi

# Create a docker-connector.conf file with the routes to the docker networks
docker network ls --filter driver=bridge --format "{{.ID}}" | xargs docker network inspect --format "route {{range .IPAM.Config}}{{.Subnet}}{{end}}" > "${bin}/docker-connector.conf"

echo "Starting docker-connector requires root privileges. Please enter your password for sudo."
sudo "${bin}/docker-connector" -config "${bin}/docker-connector.conf"
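The `docker network inspect` call above writes one `route <subnet>` line per bridge network into `docker-connector.conf`; if a network has more than one IPAM subnet, those subnets end up concatenated on the same line. As a hypothetical refinement that is not part of this PR, a small filter can emit one `route` line per subnet and skip empty lines. The function name `subnets_to_routes` is illustrative only.

```shell
#!/bin/bash
# Hypothetical helper (not part of this PR): read CIDR subnets, one per line,
# on stdin and print the "route <subnet>" lines used in docker-connector.conf.
subnets_to_routes() {
  local subnet
  # "|| [ -n ... ]" also processes a final line without a trailing newline.
  while IFS= read -r subnet || [ -n "${subnet}" ]; do
    # Skip empty lines, e.g. a network that reports no IPAM subnet.
    [ -n "${subnet}" ] || continue
    printf 'route %s\n' "${subnet}"
  done
}
```

It could be fed by Docker's Go-template output, printing each subnet on its own line with `println`, for example: `docker network ls --filter driver=bridge --format '{{.ID}}' | xargs docker network inspect --format '{{range .IPAM.Config}}{{println .Subnet}}{{end}}' | subnets_to_routes > "${bin}/docker-connector.conf"`. The blank lines this template produces between networks are dropped by the filter.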
7 changes: 7 additions & 0 deletions dev/docker/trino/conf/catalog/gravitino.properties.template
@@ -0,0 +1,7 @@
#
# Copyright 2023 Datastrato.
# This software is licensed under the Apache License version 2.
#
connector.name = gravitino
gravitino.uri = http://GRAVITINO_HOST_IP:GRAVITINO_HOST_PORT
gravitino.metalake = GRAVITINO_METALAKE_NAME
7 changes: 7 additions & 0 deletions dev/docker/trino/conf/catalog/hive.properties.template
@@ -0,0 +1,7 @@
#
# Copyright 2023 Datastrato.
# This software is licensed under the Apache License version 2.
#
connector.name = hive
hive.metastore.uri = thrift://HIVE_HOST_IP:9083
hive.allow-drop-table = true
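The two catalog templates above contain placeholder tokens (`GRAVITINO_HOST_IP`, `GRAVITINO_HOST_PORT`, `GRAVITINO_METALAKE_NAME`, `HIVE_HOST_IP`) that must be replaced with real values before Trino can load the catalogs. This diff does not show how the tests fill them in, so the following is only a hypothetical shell sketch: `render_template` is an illustrative name, and it does literal token replacement with bash parameter expansion, which avoids the delimiter escaping that `sed` would need for values such as URLs.

```shell
#!/bin/bash
# Hypothetical helper (not part of this PR): replace placeholder tokens in a
# .properties.template file and print the rendered result to stdout.
# Usage: render_template <template-file> TOKEN=value [TOKEN=value ...]
render_template() {
  local template="$1"
  shift
  local content pair key value
  # Command substitution drops trailing newlines; printf adds one back below.
  content="$(cat "${template}")"
  for pair in "$@"; do
    key="${pair%%=*}"   # text before the first "="
    value="${pair#*=}"  # text after the first "="
    # Quoted pattern and replacement: both are taken literally.
    content=${content//"${key}"/"${value}"}
  done
  printf '%s\n' "${content}"
}
```

For example: `render_template dev/docker/trino/conf/catalog/gravitino.properties.template GRAVITINO_HOST_IP=<host-ip> GRAVITINO_HOST_PORT=8090 GRAVITINO_METALAKE_NAME=<metalake> > gravitino.properties`, where `8090` is the `httpPort` from `conf/gravitino.conf.template` and the other values are placeholders for your environment.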
5 changes: 5 additions & 0 deletions dev/docker/trino/conf/catalog/jmx.properties
@@ -0,0 +1,5 @@
#
# Copyright 2023 Datastrato.
# This software is licensed under the Apache License version 2.
#
connector.name = jmx
6 changes: 6 additions & 0 deletions dev/docker/trino/conf/catalog/tpcds.properties
@@ -0,0 +1,6 @@
#
# Copyright 2023 Datastrato.
# This software is licensed under the Apache License version 2.
#
connector.name = tpcds
tpcds.splits-per-node = 4
6 changes: 6 additions & 0 deletions dev/docker/trino/conf/catalog/tpch.properties
@@ -0,0 +1,6 @@
#
# Copyright 2023 Datastrato.
# This software is licensed under the Apache License version 2.
#
connector.name = tpch
tpch.splits-per-node = 4
16 changes: 16 additions & 0 deletions dev/docker/trino/conf/config.properties
@@ -0,0 +1,16 @@
#
# Copyright 2023 Datastrato.
# This software is licensed under the Apache License version 2.
#
#single node install config
#coordinator = true
#node-scheduler.include-coordinator = true
#http-server.http.port = 8080
#discovery-server.enabled = true
#discovery.uri = http://localhost:8080
#protocol.v1.alternate-header-name = Presto
#hive.hdfs.impersonation.enabled = true
coordinator = true
node-scheduler.include-coordinator = true
http-server.http.port = 8080
discovery.uri = http://0.0.0.0:8080
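With `coordinator = true` and `node-scheduler.include-coordinator = true`, this single container acts as both coordinator and worker on port 8080, and tests usually have to wait until it has finished starting. The sketch below is hypothetical (the function names are illustrative, not from this PR). It assumes Trino's `/v1/info` REST endpoint, whose JSON response includes a `starting` flag that becomes `false` once the server is ready, and it assumes `curl` is installed.

```shell
#!/bin/bash
# Hypothetical helpers (not part of this PR) for waiting on the Trino coordinator.

# Succeeds if the JSON returned by Trino's /v1/info reports "starting": false.
trino_info_is_ready() {
  local info_json="$1"
  # Tolerate optional whitespace around the colon.
  printf '%s' "${info_json}" | grep -Eq '"starting"[[:space:]]*:[[:space:]]*false'
}

# Polls <base-url>/v1/info once per second; fails after <attempts> tries.
wait_for_trino() {
  local base_url="$1" attempts="${2:-60}" i info
  for ((i = 0; i < attempts; i++)); do
    # Treat any HTTP or connection error as "not ready yet".
    info="$(curl -fsS "${base_url}/v1/info" 2>/dev/null)" || info=""
    if trino_info_is_ready "${info}"; then
      return 0
    fi
    sleep 1
  done
  return 1
}
```

For example, `wait_for_trino http://127.0.0.1:8080 120 || echo "Trino did not start"`; the host and published port depend on how the container is run.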
18 changes: 18 additions & 0 deletions dev/docker/trino/conf/jvm.config
@@ -0,0 +1,18 @@
#
# Copyright 2023 Datastrato.
# This software is licensed under the Apache License version 2.
#
-server
-Xmx1G
-XX:-UseBiasedLocking
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+UseGCOverheadLimit
-XX:+ExitOnOutOfMemoryError
-XX:ReservedCodeCacheSize=256M
-Djdk.attach.allowAttachSelf=true
-Djdk.nio.maxCachedBufferSize=2000000
-DHADOOP_USER_NAME=hive
-Dlog4j.configurationFile=/etc/trino/log4j2.properties
6 changes: 6 additions & 0 deletions dev/docker/trino/conf/log.properties
@@ -0,0 +1,6 @@
#
# Copyright 2023 Datastrato.
# This software is licensed under the Apache License version 2.
#
# Log level for Trino classes (set to DEBUG for verbose logging)
io.trino = INFO
30 changes: 30 additions & 0 deletions dev/docker/trino/conf/log4j2.properties
@@ -0,0 +1,30 @@
#
# Copyright 2023 Datastrato.
# This software is licensed under the Apache License version 2.
#

# Set to debug or trace if log4j initialization is failing
status = info

# Name of the configuration
name = ConsoleLogConfig

# Console appender configuration
appender.console.type = Console
appender.console.name = consoleLogger
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n

# File appender configuration
appender.file.type = File
appender.file.name = fileLogger
appender.file.fileName = gravitino-trino-connector.log
appender.file.layout.type = PatternLayout
appender.file.layout.pattern = %d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n

# Root logger level
rootLogger.level = info

# Root logger referring to console and file appenders
rootLogger.appenderRef.stdout.ref = consoleLogger
rootLogger.appenderRef.file.ref = fileLogger
7 changes: 7 additions & 0 deletions dev/docker/trino/conf/node.properties
@@ -0,0 +1,7 @@
#
# Copyright 2023 Datastrato.
# This software is licensed under the Apache License version 2.
#
node.environment = docker
node.data-dir = /data/trino
plugin.dir = /usr/lib/trino/plugin
18 changes: 18 additions & 0 deletions docs/integration-test.md
@@ -82,12 +82,30 @@ Run only test cases where tag is set `gravitino-docker-it`. [embbeded|deplo
---------------------------------------------------------------
```

If Docker is not installed or the `mac docker connector` is not running, the `./gradlew test -PtestMode=[embedded|deploy]`
command will skip the test cases that depend on the `mac docker connector`.

```text
------------------- Check Docker environment ------------------
Docker server status .......................................... [running]
Gravitino IT Docker container is already running ............... [no]
Run only test cases where tag is set `gravitino-trino-it`. [embbeded|deploy test]
Contributor:

I found that the Gradle output doesn't match the description here.

Member Author:

I have tested these cases:

  1. Neither the Gravitino-hive-ci docker container nor mac-docker-connector running.
------------------ Check Docker environment -----------------
Docker server status ........................................ [running]
Gravitino IT Docker container is already running ............. [no]
Run test cases without `gravitino-tirno-it` tag .............. [embedded test]
Run test cases without `gravitino-docker-it` tag ............. [embedded test]
-------------------------------------------------------------
  2. Only the Gravitino-hive-ci docker container running.
------------------ Check Docker environment -----------------
Docker server status ........................................ [running]
Gravitino IT Docker container is already running ............. [yes]
Run test cases without `gravitino-tirno-it` tag .............. [embedded test]
Use Gravitino IT Docker container to run all integration test. [embedded test]
-------------------------------------------------------------
  3. Both the Gravitino-hive-ci docker container and mac-docker-connector running.
------------------ Check Docker environment -----------------
Docker server status ........................................ [running]
Gravitino IT Docker container is already running ............. [yes]
Use Gravitino IT Docker container to run all integration test. [embedded test]
-------------------------------------------------------------

In addition, I will refactor the HiveCatalogIT code and its documentation in #639.

Contributor:

OK, I see, but there's a typo "tirno".

Member Author:

OK, I fixed it.

---------------------------------------------------------------
```

> Gravitino will run all integration test cases in the GitHub Actions environment.
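The three environment-check outputs in the review thread suggest a simple rule for which tagged tests are skipped. The real check lives in the Gradle build and is not shown in this diff, so the following hypothetical function (`excluded_test_tags` is an illustrative name) only restates that rule: `gravitino-trino-it` tests are excluded unless mac-docker-connector is running, and `gravitino-docker-it` tests are excluded unless the Gravitino IT Docker container is running. The thread does not cover the case where only the connector is running, this sketch simply treats the two conditions independently, and it does not model Linux hosts, which do not need the connector.

```shell
#!/bin/bash
# Hypothetical sketch (not part of this PR): print the test tags to exclude,
# one per line, inferred from the environment-check outputs in the thread.
# Arguments: container_running (yes|no), connector_running (yes|no).
excluded_test_tags() {
  local container_running="$1" connector_running="$2"
  local tags=()
  # Trino tests need the mac-docker-connector route to container IPs.
  if [ "${connector_running}" != "yes" ]; then
    tags+=("gravitino-trino-it")
  fi
  # Docker-based tests need the Gravitino IT Docker container.
  if [ "${container_running}" != "yes" ]; then
    tags+=("gravitino-docker-it")
  fi
  # Print nothing when no tag is excluded (i.e. run all integration tests).
  if [ "${#tags[@]}" -gt 0 ]; then
    printf '%s\n' "${tags[@]}"
  fi
}
```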

### Running Gravitino CI Docker Environment

Before running the tests, make sure Docker is installed.

#### Mac Docker connector
Docker Desktop for Mac does not let the host (macOS) access containers by their IP addresses, so the host and the containers cannot reach each other's internal services directly over IP.
The [mac-docker-connector](https://github.com/wenjunxiao/mac-docker-connector) lets the macOS host access Docker container IPs directly.
Before running the integration tests, make sure to execute the `dev/docker/tools/mac-docker-connector.sh` script.
> Developing Gravitino in a Linux environment does not have this limitation and does not require running the `mac-docker-connector.sh` script beforehand.

#### Running Gravitino Hive CI Docker Environment

1. Run a Hive Docker test environment container locally using the `docker run --rm -d -p 8022:22 -p 8088:8088 -p 9000:9000 -p 9083:9083 -p 10000:10000 -p 10002:10002 -p 50010:50010 -p 50070:50070 -p 50075:50075 datastrato/gravitino-ci-hive` command.
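The container publishes the Hive ports to the host, but the services inside (for example the Hive Metastore on `9083`) may take a while to come up after `docker run` returns. A hypothetical helper such as `wait_for_port` below, which is not part of this PR, can poll a port before the tests start. It relies on bash's `/dev/tcp/<host>/<port>` redirection, a bash feature rather than a real device file.

```shell
#!/bin/bash
# Hypothetical helper (not part of this PR): poll until host:port accepts a
# TCP connection. Usage: wait_for_port <host> <port> [attempts] [delay-seconds]
wait_for_port() {
  local host="$1" port="$2" attempts="${3:-30}" delay="${4:-1}" i
  for ((i = 0; i < attempts; i++)); do
    # Open the socket in a subshell so it is closed again immediately;
    # a refused connection makes the subshell exit non-zero.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    sleep "${delay}"
  done
  return 1
}
```

For example, `wait_for_port 127.0.0.1 9083 60 2 || echo "Hive Metastore is not reachable"` waits up to about two minutes for the metastore port.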
Contributor:

Could you please add documentation on how to run the Trino integration tests locally?

Member Author (xunliu, Oct 31, 2023):

OK, I have documented how to run the Trino integration tests in docs/integration-test.md. In fact, TrinoIT checks the test environment automatically: if mac-docker-connector is not running locally, TrinoIT is not run.

Contributor:

Then you should change the section title, since it only mentions "Hive CI Docker". Also, do we need to start a Docker container before running the integration tests?

I think you should reorganize the doc so that people without background knowledge can run the tests easily, and hide the details as much as possible.

7 changes: 5 additions & 2 deletions gradle/libs.versions.toml
@@ -31,7 +31,8 @@ spark = "3.4.1"
scala-collection-compat = "2.7.0"
sqlite-jdbc = "3.42.0.0"
testng = "7.7.1"

testcontainers = "1.19.0"
trino-jdbc = "426"

protobuf-plugin = "0.9.2"
spotless-plugin = '6.11.0'
@@ -99,7 +100,9 @@ scala-collection-compat = { group = "org.scala-lang.modules", name = "scala-col
sqlite-jdbc = { group = "org.xerial", name = "sqlite-jdbc", version.ref = "sqlite-jdbc" }
testng = { group = "org.testng", name = "testng", version.ref = "testng" }
spark-hive = { group = "org.apache.spark", name = "spark-hive_2.13", version.ref = "spark" }

testcontainers = { group = "org.testcontainers", name = "testcontainers", version.ref = "testcontainers" }
testcontainers-junit-jupiter = { group = "org.testcontainers", name = "junit-jupiter", version.ref = "testcontainers" }
trino-jdbc = { group = "io.trino", name = "trino-jdbc", version.ref = "trino-jdbc" }

[bundles]
log4j = ["slf4j-api", "log4j-slf4j2-impl", "log4j-api", "log4j-core", "log4j-12-api"]