[#434] feat(CI): Graviton Trino connector E2E testing (#616)
### What changes were proposed in this pull request?

1. Add `HiveContainer` and `TrinoContainer`.
2. Use Trino JDBC to connect to `TrinoContainer` and run create-database, create-table, insert, select, and two-table join tests.

### Why are the changes needed?

+ Operate Hive through Trino.
+ Support Trino E2E testing both locally and in GitHub Actions.

Fix: #434 

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
Added `TrinoConnectorIT` to the integration-test module.
xunliu authored and web-flow committed Nov 3, 2023
1 parent 74426c8 commit c568cc3
Showing 31 changed files with 1,376 additions and 18 deletions.
5 changes: 4 additions & 1 deletion .gitignore
@@ -37,4 +37,7 @@ distribution
server/src/main/resources/project.properties

dev/docker/hive/packages
docs/build
docs/build

dev/docker/tools/docker-connector
dev/docker/tools/docker-connector.conf
4 changes: 2 additions & 2 deletions conf/gravitino.conf.template
@@ -8,7 +8,7 @@ gravitino.server.shutdown.timeout = 3000

# THE CONFIGURATION FOR Gravitino WEB SERVER
# The host name of the built-in web server
gravitino.server.webserver.host = 127.0.0.1
gravitino.server.webserver.host = 0.0.0.0
# The http port number of the built-in web server
gravitino.server.webserver.httpPort = 8090
# The min thread size of the built-in web server
@@ -44,7 +44,7 @@ gravitino.auxService.names = iceberg-rest
# Iceberg REST service classpath
gravitino.auxService.iceberg-rest.classpath = catalogs/lakehouse-iceberg/libs, catalogs/lakehouse-iceberg/conf
# Iceberg REST service host
gravitino.auxService.iceberg-rest.host = 127.0.0.1
gravitino.auxService.iceberg-rest.host = 0.0.0.0
# Iceberg REST service http port
gravitino.auxService.iceberg-rest.httpPort = 9001
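
Both host changes above switch the bind address from `127.0.0.1` (loopback only) to `0.0.0.0` (all interfaces). A minimal sketch for checking which scope a conf file currently uses follows; `get_prop` and `bind_scope` are hypothetical helpers written for this illustration, not part of Gravitino, and the sample file is generated inline.

```shell
#!/bin/bash
# get_prop KEY FILE: print the value of a "key = value" line (hypothetical helper).
get_prop() {
  local key="$1" file="$2" line name value
  while IFS= read -r line; do
    case "$line" in \#*) continue ;; esac        # skip comment lines
    [[ "$line" == *=* ]] || continue             # skip lines without '='
    name="${line%%=*}"; value="${line#*=}"
    name="${name//[[:space:]]/}"                 # drop whitespace around the key
    if [ "$name" = "$key" ]; then
      value="${value#"${value%%[![:space:]]*}"}" # trim leading whitespace
      value="${value%"${value##*[![:space:]]}"}" # trim trailing whitespace
      printf '%s\n' "$value"
      return 0
    fi
  done < "$file"
  return 1
}

# bind_scope ADDRESS: describe who can reach a server bound to ADDRESS.
bind_scope() {
  case "$1" in
    127.*|localhost) echo "loopback only" ;;
    0.0.0.0)         echo "all interfaces" ;;
    *)               echo "specific interface" ;;
  esac
}

# Sample conf content mirroring the keys in the diff above.
conf="$(mktemp)"
printf '%s\n' \
  '# The host name of the built-in web server' \
  'gravitino.server.webserver.host = 0.0.0.0' \
  'gravitino.server.webserver.httpPort = 8090' > "$conf"

host="$(get_prop gravitino.server.webserver.host "$conf")"
echo "${host}: $(bind_scope "$host")"
rm -f "$conf"
```

A server bound to loopback is reachable only from the same network namespace, which is why a process running in a container cannot reach it through the host's IP.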

2 changes: 1 addition & 1 deletion dev/docker/hive/Dockerfile
@@ -4,7 +4,7 @@
#

FROM ubuntu:16.04
LABEL maintainer="dev@datastrato.com"
LABEL maintainer="support@datastrato.com"

ARG HADOOP_PACKAGE_NAME
ARG HIVE_PACKAGE_NAME
3 changes: 3 additions & 0 deletions dev/docker/hive/README.md
@@ -59,3 +59,6 @@ ssh -p 8022 datastrato@localhost (password: ds123, this is a sudo user)
- Config HDFS DataNode data transfer address to `0.0.0.0:50010` explicitly
- Map container hostname to `127.0.0.1` before starting Hadoop
- Expose `50010` port for the HDFS DataNode

### 0.1.5
- Rollback `Map container hostname to 127.0.0.1 before starting Hadoop` of `datastrato/gravitino-ci-hive:0.1.4`
6 changes: 0 additions & 6 deletions dev/docker/hive/start.sh
@@ -9,12 +9,6 @@ service ssh start
ssh-keyscan localhost > /root/.ssh/known_hosts
ssh-keyscan 0.0.0.0 >> /root/.ssh/known_hosts

# Map the hostname to 127.0.0.1 for external access datanode
hostname=$(cat /etc/hostname)
new_content=$(cat /etc/hosts | sed "/$hostname/s/^/# /")
new_content="${new_content}\n127.0.0.1 ${hostname}"
echo -e "$new_content" > /etc/hosts

# start hadoop
${HADOOP_HOME}/sbin/start-all.sh
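
The block removed from `start.sh` rewrote `/etc/hosts` so that the container's hostname resolved to `127.0.0.1`. The sketch below reproduces that transformation on sample input instead of the real file, to show what image version 0.1.5 no longer does. `remap_hostname`, the hostname `abc123`, and the sample addresses are hypothetical.

```shell
#!/bin/bash
# remap_hostname HOSTNAME: read hosts-file content on stdin, comment out every
# line that mentions HOSTNAME, then append a 127.0.0.1 mapping for it.
# This mirrors the sed expression in the removed lines.
remap_hostname() {
  local hostname="$1"
  sed "/${hostname}/s/^/# /"
  printf '127.0.0.1 %s\n' "$hostname"
}

# Sample /etc/hosts content of the kind Docker generates for a container.
printf '127.0.0.1 localhost\n172.17.0.2 abc123\n' | remap_hostname abc123
# prints:
#   127.0.0.1 localhost
#   # 172.17.0.2 abc123
#   127.0.0.1 abc123
```

With the block removed, the hostname entry written by Docker stays in place, so the hostname resolves to the container's own IP.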

11 changes: 11 additions & 0 deletions dev/docker/tools/README.md
@@ -0,0 +1,11 @@
<!--
Copyright 2023 Datastrato.
This software is licensed under the Apache License version 2.
-->

# Mac Docker Connector
Docker Desktop for Mac does not provide access to container IPs from the host (macOS), so the host and containers cannot reach each other's internal services directly over IP.
The [mac-docker-connector](https://github.com/wenjunxiao/mac-docker-connector) lets the macOS host access Docker container IPs directly.
Before running the integration tests, make sure to execute the `dev/docker/tools/mac-docker-connector.sh` script.
> Developing Gravitino in a Linux environment does not have this limitation and does not require running the `mac-docker-connector.sh` script ahead of time.
37 changes: 37 additions & 0 deletions dev/docker/tools/mac-docker-connector.sh
@@ -0,0 +1,37 @@
#!/bin/bash
#
# Copyright 2023 Datastrato.
# This software is licensed under the Apache License version 2.
#
#set -ex

bin="$(dirname "${BASH_SOURCE-$0}")"
bin="$(cd "${bin}">/dev/null; pwd)"

OS=$(uname -s)
if [ "${OS}" != "Darwin" ]; then
echo "Only macOS needs to run mac-docker-connector."
exit 1
fi

if pgrep -xq "docker-connector"; then
echo "docker-connector is running."
exit 1
fi

# Download docker-connector
DOCKER_CONNECTOR_PACKAGE_NAME="docker-connector-darwin.tar.gz"
DOCKER_CONNECTOR_DOWNLOAD_URL="https://github.com/wenjunxiao/mac-docker-connector/releases/download/v3.2/${DOCKER_CONNECTOR_PACKAGE_NAME}"
if [ ! -f "${bin}/docker-connector" ]; then
wget -q -P "${bin}" ${DOCKER_CONNECTOR_DOWNLOAD_URL}
tar -xzf "${bin}/${DOCKER_CONNECTOR_PACKAGE_NAME}" -C "${bin}"
rm -rf "${bin}/${DOCKER_CONNECTOR_PACKAGE_NAME}"
fi

# Create a docker-connector.conf file with the routes to the docker networks
if [ ! -f "${bin}/docker-connector.conf" ]; then
docker network ls --filter driver=bridge --format "{{.ID}}" | xargs docker network inspect --format "route {{range .IPAM.Config}}{{.Subnet}}{{end}}" > ${bin}/docker-connector.conf
fi

echo "Start docker-connector requires root privileges, Please enter the root password."
sudo ${bin}/docker-connector -config ${bin}/docker-connector.conf
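
The script above writes one `route <subnet>` line per Docker bridge network into `docker-connector.conf`. That formatting step can be sketched in isolation, without Docker, as follows; `subnets_to_routes` is a hypothetical helper and the subnets are sample values.

```shell
#!/bin/bash
# subnets_to_routes: read one subnet per line on stdin and emit the
# "route <subnet>" lines used in docker-connector.conf. Empty lines are skipped.
subnets_to_routes() {
  local subnet
  while IFS= read -r subnet; do
    [ -n "$subnet" ] && printf 'route %s\n' "$subnet"
  done
  return 0
}

# Sample subnets; real values come from `docker network inspect`.
printf '172.17.0.0/16\n\n172.18.0.0/16\n' | subnets_to_routes
# prints:
#   route 172.17.0.0/16
#   route 172.18.0.0/16
```

Note that the script only generates `docker-connector.conf` when the file does not exist yet, so bridge networks created later are not picked up until the file is deleted and regenerated.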
7 changes: 7 additions & 0 deletions dev/docker/trino/conf/catalog/gravitino.properties.template
@@ -0,0 +1,7 @@
#
# Copyright 2023 Datastrato.
# This software is licensed under the Apache License version 2.
#
connector.name = gravitino
gravitino.uri = http://GRAVITINO_HOST_IP:GRAVITINO_HOST_PORT
gravitino.metalake = GRAVITINO_METALAKE_NAME
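
The template above uses the upper-case placeholders `GRAVITINO_HOST_IP`, `GRAVITINO_HOST_PORT`, and `GRAVITINO_METALAKE_NAME`. How the test harness fills them in is not shown in this part of the diff; the following is a minimal sketch of one way to do it with `sed`, where `render_gravitino_catalog` and the sample values are assumptions.

```shell
#!/bin/bash
# render_gravitino_catalog HOST PORT METALAKE: read the template on stdin and
# substitute the three placeholders (hypothetical helper).
render_gravitino_catalog() {
  local host="$1" port="$2" metalake="$3"
  sed -e "s/GRAVITINO_HOST_IP/${host}/g" \
      -e "s/GRAVITINO_HOST_PORT/${port}/g" \
      -e "s/GRAVITINO_METALAKE_NAME/${metalake}/g"
}

# Template body copied from the diff above (license header omitted).
template='connector.name = gravitino
gravitino.uri = http://GRAVITINO_HOST_IP:GRAVITINO_HOST_PORT
gravitino.metalake = GRAVITINO_METALAKE_NAME'

# Sample values: 192.168.0.10 and "test" are made up; 8090 is the httpPort
# from gravitino.conf.template.
printf '%s\n' "$template" | render_gravitino_catalog 192.168.0.10 8090 test
# prints:
#   connector.name = gravitino
#   gravitino.uri = http://192.168.0.10:8090
#   gravitino.metalake = test
```

The same approach would apply to the `HIVE_HOST_IP` placeholder in `hive.properties.template`.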
7 changes: 7 additions & 0 deletions dev/docker/trino/conf/catalog/hive.properties.template
@@ -0,0 +1,7 @@
#
# Copyright 2023 Datastrato.
# This software is licensed under the Apache License version 2.
#
connector.name = hive
hive.metastore.uri = thrift://HIVE_HOST_IP:9083
hive.allow-drop-table = true
5 changes: 5 additions & 0 deletions dev/docker/trino/conf/catalog/jmx.properties
@@ -0,0 +1,5 @@
#
# Copyright 2023 Datastrato.
# This software is licensed under the Apache License version 2.
#
connector.name = jmx
6 changes: 6 additions & 0 deletions dev/docker/trino/conf/catalog/tpcds.properties
@@ -0,0 +1,6 @@
#
# Copyright 2023 Datastrato.
# This software is licensed under the Apache License version 2.
#
connector.name = tpcds
tpcds.splits-per-node = 4
6 changes: 6 additions & 0 deletions dev/docker/trino/conf/catalog/tpch.properties
@@ -0,0 +1,6 @@
#
# Copyright 2023 Datastrato.
# This software is licensed under the Apache License version 2.
#
connector.name = tpch
tpch.splits-per-node = 4
16 changes: 16 additions & 0 deletions dev/docker/trino/conf/config.properties
@@ -0,0 +1,16 @@
#
# Copyright 2023 Datastrato.
# This software is licensed under the Apache License version 2.
#
#single node install config
#coordinator = true
#node-scheduler.include-coordinator = true
#http-server.http.port = 8080
#discovery-server.enabled = true
#discovery.uri = http://localhost:8080
#protocol.v1.alternate-header-name = Presto
#hive.hdfs.impersonation.enabled = true
coordinator = true
node-scheduler.include-coordinator = true
http-server.http.port = 8080
discovery.uri = http://0.0.0.0:8080
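
This configuration runs a single-node Trino coordinator on port 8080. A test that starts the container usually has to wait until the server has finished starting. The sketch below assumes Trino's `/v1/info` endpoint, which reports a `starting` flag in its JSON response; `trino_ready` is a hypothetical helper and the sample JSON is illustrative, not captured from a real server.

```shell
#!/bin/bash
# trino_ready JSON: succeed if the /v1/info response says startup has finished.
# Whitespace is stripped first so that '"starting" : false' also matches.
trino_ready() {
  printf '%s' "$1" | tr -d '[:space:]' | grep -q '"starting":false'
}

# Illustrative response body (assumed shape).
sample='{"nodeVersion":{"version":"426"},"environment":"docker","coordinator":true,"starting":false}'

if trino_ready "$sample"; then
  echo "ready"
else
  echo "starting"
fi
```

Against a live container the check could be driven by something like `trino_ready "$(curl -s http://localhost:8080/v1/info)"` inside a retry loop.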
18 changes: 18 additions & 0 deletions dev/docker/trino/conf/jvm.config
@@ -0,0 +1,18 @@
#
# Copyright 2023 Datastrato.
# This software is licensed under the Apache License version 2.
#
-server
-Xmx1G
-XX:-UseBiasedLocking
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+UseGCOverheadLimit
-XX:+ExitOnOutOfMemoryError
-XX:ReservedCodeCacheSize=256M
-Djdk.attach.allowAttachSelf=true
-Djdk.nio.maxCachedBufferSize=2000000
-DHADOOP_USER_NAME=hive
-Dlog4j.configurationFile=/etc/trino/log4j2.properties
6 changes: 6 additions & 0 deletions dev/docker/trino/conf/log.properties
@@ -0,0 +1,6 @@
#
# Copyright 2023 Datastrato.
# This software is licensed under the Apache License version 2.
#
# Enable verbose logging from Trino
io.trino = INFO
30 changes: 30 additions & 0 deletions dev/docker/trino/conf/log4j2.properties
@@ -0,0 +1,30 @@
#
# Copyright 2023 Datastrato.
# This software is licensed under the Apache License version 2.
#

# Set to debug or trace if log4j initialization is failing
status = info

# Name of the configuration
name = ConsoleLogConfig

# Console appender configuration
appender.console.type = Console
appender.console.name = consoleLogger
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n

# File appender configuration
appender.file.type = File
appender.file.name = fileLogger
appender.file.fileName = gravitino-trino-connector.log
appender.file.layout.type = PatternLayout
appender.file.layout.pattern = %d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n

# Root logger level
rootLogger.level = info

# Root logger referring to console and file appenders
rootLogger.appenderRef.stdout.ref = consoleLogger
rootLogger.appenderRef.file.ref = fileLogger
7 changes: 7 additions & 0 deletions dev/docker/trino/conf/node.properties
@@ -0,0 +1,7 @@
#
# Copyright 2023 Datastrato.
# This software is licensed under the Apache License version 2.
#
node.environment = docker
node.data-dir = /data/trino
plugin.dir = /usr/lib/trino/plugin
18 changes: 18 additions & 0 deletions docs/integration-test.md
@@ -82,12 +82,30 @@ Run only test cases where tag is set `gravitino-docker-it`. [embbeded|deplo
---------------------------------------------------------------
```

If Docker is not installed or the `mac docker connector` is not running, the `./gradlew test -PtestMode=[embedded|deploy]`
command will skip the test cases that depend on the `mac docker connector`.

```text
------------------- Check Docker environment ------------------
Docker server status .......................................... [running]
Gravitino IT Docker container is already running ............... [no]
Run only test cases where tag is set `gravitino-trino-it`. [embbeded|deploy test]
---------------------------------------------------------------
```
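
The precondition described above can be checked by hand before invoking Gradle. The following is a minimal sketch, not the check the build itself performs: `docker_it_ready` is a hypothetical helper that tests whether the Docker daemon answers and, on macOS only, whether a `docker-connector` process is running.

```shell
#!/bin/bash
# docker_it_ready: succeed only if Docker-dependent tests have what they need.
docker_it_ready() {
  # Docker CLI must be installed and the daemon must respond.
  command -v docker >/dev/null 2>&1 || return 1
  docker info >/dev/null 2>&1 || return 1
  # Only macOS needs the connector to reach container IPs from the host.
  if [ "$(uname -s)" = "Darwin" ]; then
    pgrep -x docker-connector >/dev/null 2>&1 || return 1
  fi
  return 0
}

if docker_it_ready; then
  echo "run Docker-dependent tests"
else
  echo "skip Docker-dependent tests"
fi
```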

> Gravitino will run all integration test cases in the GitHub Actions environment.
### Running Gravitino CI Docker Environment

Before running the tests, make sure Docker is installed.

#### Mac Docker connector
Docker Desktop for Mac does not provide access to container IPs from the host (macOS), so the host and containers cannot reach each other's internal services directly over IP.
The [mac-docker-connector](https://github.com/wenjunxiao/mac-docker-connector) lets the macOS host access Docker container IPs directly.
Before running the integration tests, make sure to execute the `dev/docker/tools/mac-docker-connector.sh` script.
> Developing Gravitino in a Linux environment does not have this limitation and does not require running the `mac-docker-connector.sh` script ahead of time.
#### Running Gravitino Hive CI Docker Environment

1. Run a hive docker test environment container in the local using the `docker run --rm -d -p 8022:22 -p 8088:8088 -p 9000:9000 -p 9083:9083 -p 10000:10000 -p 10002:10002 -p 50010:50010 -p 50070:50070 -p 50075:50075 datastrato/gravitino-ci-hive` command.
7 changes: 5 additions & 2 deletions gradle/libs.versions.toml
@@ -31,7 +31,8 @@ spark = "3.4.1"
scala-collection-compat = "2.7.0"
sqlite-jdbc = "3.42.0.0"
testng = "7.7.1"

testcontainers = "1.19.0"
trino-jdbc = "426"

protobuf-plugin = "0.9.2"
spotless-plugin = '6.11.0'
@@ -99,7 +100,9 @@ scala-collection-compat = { group = "org.scala-lang.modules", name = "scala-col
sqlite-jdbc = { group = "org.xerial", name = "sqlite-jdbc", version.ref = "sqlite-jdbc" }
testng = { group = "org.testng", name = "testng", version.ref = "testng" }
spark-hive = { group = "org.apache.spark", name = "spark-hive_2.13", version.ref = "spark" }

testcontainers = { group = "org.testcontainers", name = "testcontainers", version.ref = "testcontainers" }
testcontainers-junit-jupiter = { group = "org.testcontainers", name = "junit-jupiter", version.ref = "testcontainers" }
trino-jdbc = { group = "io.trino", name = "trino-jdbc", version.ref = "trino-jdbc" }

[bundles]
log4j = ["slf4j-api", "log4j-slf4j2-impl", "log4j-api", "log4j-core", "log4j-12-api"]