# Development document
```shell
git clone https://github.com/Kyligence/kylin-on-parquet-v2.git
# Compile
mvn clean install -DskipTests
```
- kylin-spark-project: all Parquet-related code
  - kylin-spark-engine: cube build engine
  - kylin-spark-common: common utils
  - kylin-spark-metadata: Parquet metadata
  - kylin-spark-query: query engine
  - kylin-spark-test: integration test cases
  - parquet-assembly: packages the job jar
Download Spark (the community Spark build is not supported for now):

```shell
# The Spark version is spark-2.4.1-os-kylin-r3
wget https://download-resource.s3.cn-north-1.amazonaws.com.cn/osspark/spark-2.4.1-os-kylin-r3.tgz
```
If you submit Spark jobs through a VPN service, you may need to set the following property in ${SPARK_HOME}/conf/spark-env.sh:

```shell
# Or add it as a system environment variable
SPARK_LOCAL_IP=${VPN_LOCAL_IP}
```
```shell
cd ${KYLIN_SOURCE_CODE}
# For HDP 2.x
./build/script/package.sh
# For CDH 5.7
./build/script/package.sh -P cdh5.7
# After it finishes, the package will be available in ${KYLIN_SOURCE_CODE}/dist/
```
If running on HDP, you need to uncomment the following properties in kylin.properties:

```properties
kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current
kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=current
kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=current
```
Most properties are still supported. Since the query engine runs on Spark, Spark configuration for the query engine can be set with properties like the following:

```properties
kylin.query.spark-conf.spark.executor.cores=5
```

The cube build engine only supports Spark; its Spark configuration looks like this:

```properties
kylin.engine.spark-conf.spark.executor.cores=5
```
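As a sketch of the pattern, any standard Spark setting can be passed through with either prefix; the property values below are illustrative examples, not recommendations:

```properties
# Example only: spark.* settings are forwarded to the respective Spark session
kylin.query.spark-conf.spark.executor.memory=4G
kylin.engine.spark-conf.spark.executor.memory=4G
kylin.engine.spark-conf.spark.executor.instances=2
```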
There are two ways to debug locally without connecting to a sandbox:
- UT: all test cases support debugging with local metadata. For example, the cube build UT SparkCubingJobTest, whose path is ${KYLIN_SOURCE_ROOT}/kylin-spark-project/kylin-spark-engine/src/test/java/org/apache/kylin/engine/spark/job/SparkCubingJobTest.java, contains:
  - testBuildJob(): builds a cube and checks the Parquet files
  - testBuildTwoSegmentsAndMerge(): merges two segments and checks the Parquet files
- Debug with Tomcat without a Hadoop environment. Note that in this mode you can only use the local CSV data source; Hive tables cannot be used.
- Clone and compile the Kylin source code; suppose its path is KYLIN_SOURCE_DIR:

  ```shell
  git clone https://github.com/Kyligence/kylin-on-parquet-v2.git
  # Compile
  mvn clean install -DskipTests
  ```
- Copy WEB-INF from server/src/main/webapp/WEB-INF to webapp/app/WEB-INF:

  ```shell
  cd $KYLIN_SOURCE_DIR
  cp -r server/src/main/webapp/WEB-INF webapp/app/WEB-INF
  ```
- Install the dependencies of the web app. Please confirm npm is installed on your machine; if not, refer to https://www.npmjs.com/get-npm

  ```shell
  cd $KYLIN_SOURCE_DIR/webapp
  npm install -g bower
  bower --allow-root install
  ```
- Open the Kylin project with your IDE (IntelliJ IDEA)
- Open the config file for local debugging at "$KYLIN_SOURCE_DIR/examples/test_case_data/sandbox/kylin.properties" and configure the items below:
  - Set kylin.metadata.url to a path of your local metadata, or to the Kylin local test metadata in "${KYLIN_SOURCE}/example/test_case_data/parquet_test"
  - Set kylin.env.zookeeper-is-local=true
  - Set kylin.storage.url to a path on your local machine, like kylin.storage.url=/tmp/kylin; you should create the folder first
  - Set kylin.env.hdfs-working-dir to a path on your local machine with the "file://" prefix, like kylin.env.hdfs-working-dir=file:///tmp/kylin_data
  - Set kylin.engine.spark-conf.spark.master to local mode: kylin.engine.spark-conf.spark.master=local
  - Set kylin.engine.spark-conf.spark.eventLog.dir to a path on your local machine for the Spark event log, like kylin.engine.spark-conf.spark.eventLog.dir=/tmp/spark-history; you should create the folder first
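The two local folders mentioned above can be created up front; the paths below match the example values (adjust them if you chose different ones):

```shell
# Create the folders for kylin.storage.url and spark.eventLog.dir
# (paths are the example values from the settings above)
mkdir -p /tmp/kylin /tmp/spark-history
ls -d /tmp/kylin /tmp/spark-history
```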
- Open the config menu via "Run->Debug Configurations": set the main class to org.apache.kylin.rest.DebugTomcat, set "VM options" to -Dspark.local=true, set "Working directory" to $MODULE_WORKING_DIR$, and toggle the option "Include dependencies with 'Provided' scope". Then press the Debug button.
- If all goes well, a Kylin instance is started on your local machine; log in with user name "ADMIN" and its default password "KYLIN"
- Create a project
- Load a CSV data source by pressing "Data Source->Load CSV File as Table" on the "Model" page, and set the schema for your table. Then press "Submit" to save.
- Design your model and cube on the "Model" page; please refer to http://kylin.apache.org/docs/tutorial/create_cube.html
- Build the cube with some time range
- Monitor the cubing job
- After the cube is built, it will be stored as Parquet files
- Query the cube data on the "Insight" page
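For the CSV loading step above, a small sample file is enough to try the flow; the file name and columns below are purely illustrative:

```shell
# Hypothetical sample data for the "Load CSV File as Table" step
cat > /tmp/sample_sales.csv <<'EOF'
order_id,part_dt,price
1,2012-01-01,10.5
2,2012-01-02,20.0
EOF
# Show the header row that the table schema should match
head -n 1 /tmp/sample_sales.csv
```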