
Kylin on Parquet Development Document

Source code

git clone https://github.com/Kyligence/kylin-on-parquet-v2.git
# Compile 
mvn clean install -DskipTests
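
If you only changed one module, Maven can rebuild just that module together with the local modules it depends on. A minimal sketch, assuming the standard Maven reactor layout of this repository:

# Rebuild only the spark engine module and the local modules it depends on
mvn clean install -DskipTests -pl kylin-spark-project/kylin-spark-engine -am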

Modules

  • kylin-spark-project
    All Parquet-related code
    • kylin-spark-engine
      cube build engine
    • kylin-spark-common
      common utils
    • kylin-spark-metadata
      Parquet metadata
    • kylin-spark-query
      query engine
    • kylin-spark-test
      integration test cases
  • parquet-assembly
    packages the job jar

Environment

  • Download Spark (the community version is not supported for now)

    # spark version is spark-2.4.1-os-kylin-r3
    wget https://download-resource.s3.cn-north-1.amazonaws.com.cn/osspark/spark-2.4.1-os-kylin-r3.tgz
  • If you submit Spark jobs through a VPN, you may need to set the following property in ${SPARK_HOME}/conf/spark-env.sh

    # or set it as a system environment variable
    SPARK_LOCAL_IP=${VPN_LOCAL_IP}
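
After downloading, a minimal sketch for unpacking Spark and pointing SPARK_HOME at it; the extracted directory name is an assumption based on the tarball name:

tar -zxvf spark-2.4.1-os-kylin-r3.tgz
# Point SPARK_HOME at the extracted directory; adjust the path to your machine
export SPARK_HOME=$(pwd)/spark-2.4.1-os-kylin-r3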

How to package && deploy

cd ${KYLIN_SOURCE_CODE}
# For HDP2.x
./build/script/package.sh

# For CDH5.7
./build/script/package.sh -P cdh5.7
# Once finished, the package will be available in the directory ${KYLIN_SOURCE_CODE}/dist/

# If running on HDP, you need to uncomment the following properties in kylin.properties
kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current
kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=current
kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=current
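
Once packaging finishes, a minimal deployment sketch; the exact tarball name under dist/ depends on the build, so the wildcard and the version placeholder below are assumptions:

# Unpack the binary package and start Kylin
tar -zxvf dist/apache-kylin-*.tar.gz -C /usr/local/
export KYLIN_HOME=/usr/local/apache-kylin-<version>    # adjust to the extracted directory name
$KYLIN_HOME/bin/kylin.sh start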

Configuration

# Most properties are still supported
# As the query engine runs on Spark, Spark configuration is exposed for queries; configure it by adding properties like the following:
kylin.query.spark-conf.spark.executor.cores=5
# The cube build engine only supports Spark; its Spark configuration is set as follows:
kylin.engine.spark-conf.spark.executor.cores=5
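
A short illustrative kylin.properties snippet combining the two prefixes above; the extra keys and values here (memory, instances) are example assumptions, not tuning recommendations:

# Query engine Spark settings (prefix: kylin.query.spark-conf.)
kylin.query.spark-conf.spark.executor.cores=5
kylin.query.spark-conf.spark.executor.memory=4G
# Cube build engine Spark settings (prefix: kylin.engine.spark-conf.)
kylin.engine.spark-conf.spark.executor.cores=5
kylin.engine.spark-conf.spark.executor.instances=4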

Debug

There are two ways to debug locally without connecting to a sandbox:

  • UT. All test cases support debugging with local metadata, e.g. the cube build UT SparkCubingJobTest at ${KYLIN_SOURCE_ROOT}/kylin-spark-project/kylin-spark-engine/src/test/java/org/apache/kylin/engine/spark/job/SparkCubingJobTest.java (a command-line sketch for running it follows this section)

    • testBuildJob()
      builds a cube and checks the Parquet files

    • testBuildTwoSegmentsAndMerge()
      builds two segments, merges them, and checks the Parquet files

  • Debug with Tomcat without a Hadoop environment (note that in this mode only the local CSV data source can be used; Hive tables are not available)

    1. Clone and compile the Kylin source code; suppose the path of the source code is $KYLIN_SOURCE_DIR
    git clone https://github.com/Kyligence/kylin-on-parquet-v2.git
    # Compile
    mvn clean install -DskipTests
    2. Copy WEB-INF under server/src/main/webapp/WEB-INF to webapp/app/WEB-INF
    cd $KYLIN_SOURCE_DIR
    cp -r server/src/main/webapp/WEB-INF webapp/app/WEB-INF
    3. Install the dependencies of the web app. Please confirm npm is installed on your machine; if not, refer to https://www.npmjs.com/get-npm
    cd $KYLIN_SOURCE_DIR/webapp
    npm install -g bower
    bower --allow-root install
    4. Open the Kylin project with your IDE (IntelliJ IDEA)

    5. Open the config file for local debug at "$KYLIN_SOURCE_DIR/examples/test_case_data/sandbox/kylin.properties" and configure the items below:

    • Set kylin.metadata.url to a path of your local metadata, or use the Kylin local test metadata in "$KYLIN_SOURCE_DIR/examples/test_case_data/parquet_test"
    • Set kylin.env.zookeeper-is-local=true
    • Set kylin.storage.url to a path on your local machine, e.g. kylin.storage.url=/tmp/kylin; you should create the folder first
    • Set kylin.env.hdfs-working-dir to a path on your local machine with the "file://" prefix, e.g. kylin.env.hdfs-working-dir=file:///tmp/kylin_data
    • Set kylin.engine.spark-conf.spark.master to local mode: kylin.engine.spark-conf.spark.master=local
    • Set kylin.engine.spark-conf.spark.eventLog.dir to a path on your local machine for the Spark event log, e.g. kylin.engine.spark-conf.spark.eventLog.dir=/tmp/spark-history; you should create the folder first

    6. Open "Run->Debug Configurations", set the main class to org.apache.kylin.rest.DebugTomcat, set "VM options" to -Dspark.local=true, set "Working directory" to $MODULE_WORKING_DIR$, and toggle the option "Include dependencies with 'Provided' scope". Then press the Debug button.

    7. If all goes well, a Kylin instance is started on your local machine; log in with user name "ADMIN" and the default password "KYLIN"

    8. Create a project

    9. Load a CSV data source by pressing "Data Source->Load CSV File as Table" on the "Model" page and set the schema for your table, then press "Submit" to save

    10. Design your model and cube on the "Model" page; please refer to http://kylin.apache.org/docs/tutorial/create_cube.html

    11. Build the cube with some time range

    12. Monitor the cubing job

    13. After the cube is built, it is stored as Parquet files

    14. Query the cube data on the "Insight" page
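
As referenced in the UT bullet above, the cube build UT can also be run from a shell before attaching a debugger. A minimal sketch, assuming the module path shown earlier and standard Maven Surefire behavior:

cd ${KYLIN_SOURCE_ROOT}/kylin-spark-project/kylin-spark-engine
# Run only the SparkCubingJobTest class via Surefire's -Dtest filter
mvn test -Dtest=SparkCubingJobTest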