Development document
git clone https://github.com/Kyligence/kylin-on-parquet-v2.git
# Compile
mvn clean install -DskipTests
- kylin-spark-project: all Parquet-related code
  - kylin-spark-engine: cube build engine
  - kylin-spark-common: common utils
  - kylin-spark-metadata: Parquet metadata
  - kylin-spark-query: query engine
  - kylin-spark-test: integration test cases
- parquet-assembly: packages the job jar
- Download Spark (the community version is not supported for now)
    # The Spark version is spark-2.4.1-os-kylin-r3
    wget https://download-resource.s3.cn-north-1.amazonaws.com.cn/osspark/spark-2.4.1-os-kylin-r3.tgz
- If you submit Spark jobs through a VPN service, you may need to set the following property in ${SPARK_HOME}/spark-env.sh, or add it as a system environment variable
    SPARK_LOCAL_IP=${VPN_LOCAL_IP}
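The VPN setting above can be applied with a small script. This is only a sketch: `set_spark_local_ip` is a hypothetical helper, not part of Spark or Kylin, and the file path and IP address in the usage line are placeholders.

```shell
# set_spark_local_ip FILE IP
# Hypothetical helper: writes SPARK_LOCAL_IP=IP into FILE, replacing an
# existing assignment if there is one, so it is safe to run repeatedly.
set_spark_local_ip() {
  env_file="$1"
  ip="$2"
  touch "$env_file"
  if grep -q '^SPARK_LOCAL_IP=' "$env_file"; then
    # Rewrite the existing line in place (sed keeps a copy as FILE.bak).
    sed -i.bak "s|^SPARK_LOCAL_IP=.*|SPARK_LOCAL_IP=${ip}|" "$env_file"
  else
    echo "SPARK_LOCAL_IP=${ip}" >> "$env_file"
  fi
}
```

For example, `set_spark_local_ip "${SPARK_HOME}/spark-env.sh" "${VPN_LOCAL_IP}"` would record the address your VPN client assigned.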
cd ${KYLIN_SOURCE_CODE}
# For HDP2.x
./build/script/package.sh
# For CDH5.7
./build/script/package.sh -P cdh5.7
# After it finishes, the package will be available in the directory ${KYLIN_SOURCE_CODE}/dist/
# If running on HDP, you need to uncomment the following properties in kylin.properties
kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current
kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=current
kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=current
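If you prefer not to edit kylin.properties by hand, the three HDP lines can be uncommented with sed. This is a sketch under one assumption: each line is commented out with a single leading `#`, optionally followed by spaces. `enable_hdp_version` is a hypothetical helper name.

```shell
# enable_hdp_version FILE
# Hypothetical helper: removes the leading '#' from the three
# extraJavaOptions lines that set -Dhdp.version=current.
# Other commented lines are left untouched; sed keeps a copy as FILE.bak.
enable_hdp_version() {
  props="$1"
  sed -i.bak -E \
    's/^#[[:space:]]*(kylin\.engine\.spark-conf\.spark\.(driver|yarn\.am|executor)\.extraJavaOptions=-Dhdp\.version=current)/\1/' \
    "$props"
}
```

Call it with the path of the kylin.properties file in your unpacked package, for example `enable_hdp_version path/to/kylin.properties`.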
There are two ways to debug locally without connecting to a sandbox:
- Unit tests (UT): all test cases support debugging with local metadata. For example, the cube build UT SparkCubingJobTest is located at ${KYLIN_SOURCE_ROOT}/kylin-spark-project/kylin-spark-engine/src/test/java/org/apache/kylin/engine/spark/job/SparkCubingJobTest.java
  - testBuildJob(): builds a cube and checks the Parquet files
  - testBuildTwoSegmentsAndMerge(): merges two segments and checks the Parquet files
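Outside the IDE, a single test method can usually be selected with Maven Surefire's `-Dtest=Class#method` filter. The sketch below only prints the command and does not run it. It assumes the module's dependencies are already in your local Maven repository (for example after `mvn clean install -DskipTests`), and `ut_command` is a hypothetical helper name.

```shell
# ut_command METHOD
# Hypothetical helper: prints a Maven command that runs one method of
# SparkCubingJobTest. The -pl module path is taken from the test file
# location given above; -DfailIfNoTests=false keeps Surefire from failing
# when the filter matches nothing in a module.
ut_command() {
  method="$1"
  echo "mvn test -pl kylin-spark-project/kylin-spark-engine -Dtest=SparkCubingJobTest#${method} -DfailIfNoTests=false"
}

ut_command testBuildJob
```

Run the printed command from the root of the source tree; pass `testBuildTwoSegmentsAndMerge` to get the command for the merge case.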
- Debug with Tomcat without a Hadoop environment
- Clone and compile the Kylin source code. The steps below assume the path of the Kylin source code is KYLIN_SOURCE_DIR
    git clone https://github.com/Kyligence/kylin-on-parquet-v2.git
    # Compile
    mvn clean install -DskipTests
- Copy the WEB-INF directory from server/src/main/webapp/WEB-INF to webapp/app/WEB-INF
    cd $KYLIN_SOURCE_DIR
    cp -r server/src/main/webapp/WEB-INF webapp/app/WEB-INF
- Install the dependencies of the web app. Please confirm that npm is installed on your machine; if not, please refer to https://www.npmjs.com/get-npm
    cd $KYLIN_SOURCE_DIR/webapp
    npm install -g bower
    bower --allow-root install
- Open the Kylin project with your IDE (IntelliJ IDEA)
- Open the config file for local debugging at "$KYLIN_SOURCE_DIR/examples/test_case_data/sandbox/kylin.properties" and set the items below:
  - Set kylin.metadata.url to the path of your local metadata, or to the Kylin local test metadata in "$KYLIN_SOURCE_DIR/examples/test_case_data/parquet_test"
  - Set kylin.env.zookeeper-is-local=true
  - Set kylin.storage.url to a path on your local machine, like kylin.storage.url=/tmp/kylin; you should create the folder first
  - Set kylin.env.hdfs-working-dir to a path on your local machine with the prefix "file://", like kylin.env.hdfs-working-dir=file:///tmp/kylin_data
  - Set kylin.engine.spark-conf.spark.master to local mode: kylin.engine.spark-conf.spark.master=local
  - Set kylin.engine.spark-conf.spark.eventLog.dir to a path on your local machine for the Spark log, like kylin.engine.spark-conf.spark.eventLog.dir=/tmp/spark-history; you should create the folder first
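The local-debug settings above can be prepared in one go. The sketch below creates the folders and prints the matching property lines so you can paste them into the sandbox kylin.properties. `KYLIN_LOCAL_ROOT` and `print_local_debug_props` are hypothetical names introduced for illustration. kylin.metadata.url is left out because it depends on where your metadata lives.

```shell
# Assumption: all local folders live under one root, /tmp by default,
# which matches the example paths used in the list above.
KYLIN_LOCAL_ROOT="${KYLIN_LOCAL_ROOT:-/tmp}"
STORAGE_DIR="${KYLIN_LOCAL_ROOT}/kylin"
WORKING_DIR="${KYLIN_LOCAL_ROOT}/kylin_data"
EVENT_LOG_DIR="${KYLIN_LOCAL_ROOT}/spark-history"

# Create the folders up front; the list above requires this for the
# storage and event-log folders, and it is harmless for the working dir.
mkdir -p "${STORAGE_DIR}" "${WORKING_DIR}" "${EVENT_LOG_DIR}"

# Print the property lines for local debugging.
print_local_debug_props() {
  cat <<EOF
kylin.env.zookeeper-is-local=true
kylin.storage.url=${STORAGE_DIR}
kylin.env.hdfs-working-dir=file://${WORKING_DIR}
kylin.engine.spark-conf.spark.master=local
kylin.engine.spark-conf.spark.eventLog.dir=${EVENT_LOG_DIR}
EOF
}

print_local_debug_props
```

Printing the lines rather than appending them avoids duplicate keys if the file already contains some of these properties.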
- Open the configuration menu in "Run->Debug Configurations", set the main class to org.apache.kylin.rest.DebugTomcat, set "VM options" to -Dspark.local=true, set "Working directory" to $MODULE_WORKING_DIR$, and enable the option "Include dependencies with 'Provided' scope". Then press the Debug button.
- If all goes well, a Kylin instance is started on your local machine. Log in with the user name "ADMIN" and the default password "KYLIN"
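To check the login from a terminal instead of the browser, you could call Kylin's REST authentication endpoint with HTTP Basic authentication. This is a sketch resting on assumptions not stated in this document: that the instance listens on localhost:7070 and exposes /kylin/api/user/authentication. Both helper names are hypothetical.

```shell
# basic_auth_header USER PASSWORD
# Builds an HTTP Basic "Authorization" header from the credentials.
basic_auth_header() {
  printf 'Authorization: Basic %s' "$(printf '%s:%s' "$1" "$2" | base64)"
}

# check_login
# Assumption: the debug instance listens on localhost:7070; adjust the
# URL if your instance uses a different host or port.
check_login() {
  curl -s -X POST -H "$(basic_auth_header ADMIN KYLIN)" \
    "http://localhost:7070/kylin/api/user/authentication"
}
```

Run `check_login` once the instance is up; a JSON response describing the ADMIN user would indicate that the credentials were accepted.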
- Create a project
- Load a CSV data source by pressing "Data Source->Load CSV File as Table" on the "Model" page, and set the schema for your table. Then press "submit" to save.
- Design your model and cube on the "Model" page; please refer to http://kylin.apache.org/docs/tutorial/create_cube.html
- Build the cube with some time range
- Monitor the cubing job
- After the cube is built, it will be stored as Parquet files
- Query the cube data on the "Insight" page