shell script changes
vikasgupta78 committed Jun 1, 2023
1 parent 880b4f6 commit a3ddd46
Showing 2 changed files with 25 additions and 3 deletions.
8 changes: 7 additions & 1 deletion docs/running/databricks.md
@@ -9,13 +9,19 @@ nav_order: 6
1. Configure Databricks Connect 11.3 and create the corresponding workspace/cluster:
https://docs.databricks.com/dev-tools/databricks-connect-legacy.html

Ensure you run `databricks-connect configure`

2. Set the env variable ZINGG_HOME to the path of the latest Zingg release jar, e.g. the location of zingg-0.3.5-SNAPSHOT.jar

3. Set the env variable DATA_BRICKS_CONNECT to Y

4. Run `pip install zingg`

5. Now run Zingg using the shell script with the --run-databricks option; a Spark session will be created remotely on Databricks and the job will run in your Databricks environment,
e.g. `./scripts/zingg.sh --run-databricks test/InMemPipeDataBricks.py`
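Putting the steps together, a typical shell session looks like the following sketch (the ZINGG_HOME path is a placeholder; adjust it to wherever your release jar lives):

```shell
# Condensed setup for running Zingg on Databricks via the shell script.
export ZINGG_HOME=/opt/zingg          # directory holding zingg-0.3.5-SNAPSHOT.jar (placeholder path)
export DATA_BRICKS_CONNECT=Y          # tells scripts/zingg.sh to use Databricks Connect
# pip install zingg                   # install the Zingg Python package
# ./scripts/zingg.sh --run-databricks test/InMemPipeDataBricks.py
```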

More details on how the command line works:

https://docs.zingg.ai/zingg/stepbystep/zingg-command-line

# Running on Databricks
20 changes: 18 additions & 2 deletions scripts/zingg.sh
@@ -21,6 +21,13 @@ while [[ $# -gt 0 ]]; do
shift # past argument "run"
shift
;;
--run-databricks)
# this option is to run a user script (python)
RUN_PYTHON_DB_CONNECT_PHASE=1
PYTHON_SCRIPT_DB_CONNECT="$2"
shift # past argument "run-databricks"
shift
;;
--log)
LOG_FILE=$2
LOGGING="--files $LOG_FILE"
@@ -43,9 +50,18 @@ set -- "${POSITIONAL_ARGS[@]}" # restore positional parameters
# if it is a python phase
if [[ $RUN_PYTHON_PHASE -eq 1 ]]; then
EXECUTABLE="$PYTHON_SCRIPT"
elif [[ $RUN_PYTHON_DB_CONNECT_PHASE -eq 1 ]]; then
EXECUTABLE="$PYTHON_SCRIPT_DB_CONNECT"
else
EXECUTABLE="--class zingg.spark.client.SparkClient $ZINGG_JARS"
fi

if [[ $RUN_PYTHON_DB_CONNECT_PHASE -eq 1 ]]; then
unset SPARK_MASTER
unset SPARK_HOME
export DATA_BRICKS_CONNECT=Y
python $EXECUTABLE
else
# All the additional options must be added here
$SPARK_HOME/bin/spark-submit --master $SPARK_MASTER $PROPERTIES --files "./log4j.properties" --conf spark.executor.extraJavaOptions="$log4j_setting -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError -Xloggc:/tmp/memLog.txt -XX:+UseCompressedOops" --conf spark.driver.extraJavaOptions="$log4j_setting" $LOGGING --driver-class-path $ZINGG_JARS $EXECUTABLE $@ --email $EMAIL --license $LICENSE
fi
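The dispatch this commit adds can be sketched as a standalone function; the names mirror the script, but the actual python and spark-submit invocations are replaced with echoes so the sketch stays self-contained:

```shell
# Simplified sketch of the argument dispatch in scripts/zingg.sh
# (the real script runs python / spark-submit instead of echoing).
dispatch() {
  RUN_PYTHON_DB_CONNECT_PHASE=0
  while [ $# -gt 0 ]; do
    case "$1" in
      --run-databricks)
        # run a user Python script over Databricks Connect
        RUN_PYTHON_DB_CONNECT_PHASE=1
        PYTHON_SCRIPT_DB_CONNECT="$2"
        shift   # past argument "--run-databricks"
        shift   # past its value
        ;;
      *)
        shift
        ;;
    esac
  done
  if [ "$RUN_PYTHON_DB_CONNECT_PHASE" -eq 1 ]; then
    # Databricks Connect path: local Spark settings are dropped and the
    # user script is run with plain python
    echo "python $PYTHON_SCRIPT_DB_CONNECT"
  else
    echo "spark-submit ..."
  fi
}
```

Unsetting SPARK_MASTER and SPARK_HOME before the python call matters: it keeps Databricks Connect from being shadowed by a local Spark installation.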
