Commit
Co-authored-by: Liang-Chi Hsieh <[email protected]> Co-authored-by: Kazuyuki Tanimura <[email protected]> Co-authored-by: Steve Vaughan Jr <[email protected]> Co-authored-by: Huaxin Gao <[email protected]> Co-authored-by: Parth Chandra <[email protected]> Co-authored-by: Oleksandr Voievodin <[email protected]>
1 parent 20edb17, commit 3feecfe
Showing 233 changed files with 54,827 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
target | ||
.idea | ||
*.iml | ||
derby.log | ||
metastore_db/ | ||
spark-warehouse/ | ||
dependency-reduced-pom.xml | ||
core/src/execution/generated | ||
prebuild | ||
.flattened-pom.xml |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
// Licensed to the Apache Software Foundation (ASF) under one | ||
// or more contributor license agreements. See the NOTICE file | ||
// distributed with this work for additional information | ||
// regarding copyright ownership. The ASF licenses this file | ||
// to you under the Apache License, Version 2.0 (the | ||
// "License"); you may not use this file except in compliance | ||
// with the License. You may obtain a copy of the License at | ||
// | ||
// http://www.apache.org/licenses/LICENSE-2.0 | ||
// | ||
// Unless required by applicable law or agreed to in writing, | ||
// software distributed under the License is distributed on an | ||
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
// KIND, either express or implied. See the License for the | ||
// specific language governing permissions and limitations | ||
// under the License. | ||
rules = [ | ||
ExplicitResultTypes, | ||
NoAutoTupling, | ||
RemoveUnused, | ||
|
||
DisableSyntax, | ||
LeakingImplicitClassVal, | ||
NoValInForComprehension, | ||
ProcedureSyntax, | ||
RedundantSyntax | ||
] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,96 @@ | ||
<!-- | ||
Licensed to the Apache Software Foundation (ASF) under one | ||
or more contributor license agreements. See the NOTICE file | ||
distributed with this work for additional information | ||
regarding copyright ownership. The ASF licenses this file | ||
to you under the Apache License, Version 2.0 (the | ||
"License"); you may not use this file except in compliance | ||
with the License. You may obtain a copy of the License at | ||
http://www.apache.org/licenses/LICENSE-2.0 | ||
Unless required by applicable law or agreed to in writing, | ||
software distributed under the License is distributed on an | ||
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
KIND, either express or implied. See the License for the | ||
specific language governing permissions and limitations | ||
under the License. | ||
--> | ||
|
||
# Comet Debugging Guide | ||
|
||
This HOWTO describes how to debug JVM code and native code concurrently. The guide assumes you have: | ||
1. IntelliJ as the Java IDE. | ||
2. CLion as the native IDE. For Rust code, the CLion Rust language plugin is required. Note that the | ||
IntelliJ Rust plugin is not sufficient. | ||
3. CLion/LLDB as the native debugger. CLion ships with a bundled LLDB, and the Rust community has | ||
its own packaging of LLDB (`rust-lldb`). Both display Rust symbols better than plain | ||
LLDB or the LLDB that is bundled with Xcode. This guide uses the LLDB packaged with CLion. | ||
4. A Comet _unit_ test to debug, which this guide uses as the canonical use case. | ||
|
||
_Caveat: The steps here have only been tested with JDK 11 on Mac (M1)._ | ||
|
||
## Debugging for Advanced Developers | ||
|
||
Add a `.lldbinit` to comet/core. This is not strictly necessary but will be useful if you want to | ||
use advanced `lldb` debugging. A sample `.lldbinit` is provided in the comet/core directory. | ||
|
||
### In IntelliJ | ||
|
||
1. Set a breakpoint in `NativeBase.load()`, at a point _after_ the Comet library has been loaded. | ||
|
||
1. Add a Debug Configuration for the unit test | ||
|
||
1. In the Debug Configuration for that unit test, add `-Xint` as a JVM parameter. This option runs the JVM in | ||
interpreted-only mode; why it is required here is undocumented *magic*. Without it, the LLDB debugger hits an `EXC_BAD_ACCESS` (or `EXC_BAD_INSTRUCTION`) from | ||
which one cannot recover. | ||
|
||
1. Add a `println` to the unit test to print the PID of the JVM process. (`jps` can also be used, but printing the PID is less error-prone if you have multiple JVM processes running.) | ||
For JDK 8: | ||
```scala | ||
// requires: import java.lang.management.ManagementFactory | ||
println("Waiting for Debugger: PID - " + ManagementFactory.getRuntimeMXBean().getName()) | ||
``` | ||
This will print something like `PID@your_machine_name`. | ||
For JDK 9 and newer: | ||
```scala | ||
println("Waiting for Debugger: PID - " + ProcessHandle.current.pid) | ||
``` | ||
==> Note the PID. | ||
1. Debug-run the test in IntelliJ and wait for the breakpoint to be hit. | ||
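
The PID-printing step above can also be sketched in plain Java. This is an illustrative sketch rather than code from Comet: the class `PidPrinter` and its helper methods are hypothetical names, and parsing the `pid@hostname` form of the runtime MXBean name relies on a HotSpot convention that the API does not guarantee.

```java
import java.lang.management.ManagementFactory;

public class PidPrinter {
    // JDK 8 style: the runtime MXBean name is conventionally "pid@hostname" on HotSpot.
    // Assumption: that format holds; if no '@' is present, -1 is returned instead.
    static long pidFromRuntimeName() {
        String name = ManagementFactory.getRuntimeMXBean().getName();
        int at = name.indexOf('@');
        if (at <= 0) {
            return -1L;
        }
        try {
            return Long.parseLong(name.substring(0, at));
        } catch (NumberFormatException e) {
            return -1L;
        }
    }

    // JDK 9+ style: ProcessHandle exposes the PID directly.
    static long pidFromProcessHandle() {
        return ProcessHandle.current().pid();
    }

    public static void main(String[] args) {
        // Print the PID so it can be matched when attaching the native debugger.
        System.out.println("Waiting for Debugger: PID - " + pidFromProcessHandle());
    }
}
```

On JDK 9 and newer the `ProcessHandle` variant is the simpler choice, since it avoids string parsing altogether.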
### In CLion | ||
1. After the breakpoint is hit in IntelliJ, in CLion (or LLDB from a terminal or editor): | ||
1. Attach to the JVM process (make sure the PID matches). In CLion, this is `Run -> Attach to Process`. | ||
1. Put your breakpoint in the native code. | ||
1. Go back to IntelliJ and resume the process. | ||
1. Most debugging in CLion is similar to IntelliJ. For advanced LLDB-based debugging, the LLDB command line can be accessed from the LLDB tab in the Debugger view. Refer to the [LLDB manual](https://lldb.llvm.org/use/tutorial.html) for LLDB commands. | ||
### After your debugging is done | ||
1. In CLion, detach from the process if not already detached. | ||
2. In IntelliJ, the debugger might have lost track of the process. If so, the debugger tab | ||
will show the process as running (even if the test/job is shown as completed). | ||
3. Close the debugger tab, and if the IDE asks whether it should terminate the process, | ||
click Yes. | ||
4. In a terminal, use `jps` to identify the process with the process ID you were debugging. If | ||
it shows up as running, run `kill -9 [pid]`. If that doesn't remove the process, don't bother: | ||
the process will be left behind as a zombie and will consume no (significant) resources. | ||
It will eventually be cleaned up when you reboot, possibly after a software update. | ||
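
The terminal check in the last step can be approximated from Java 9+ with `ProcessHandle`. The following is a minimal sketch under stated assumptions, not part of Comet: the class name `LeftoverProcessCheck` is hypothetical, and it assumes the current user is allowed to inspect and terminate the target process.

```java
public class LeftoverProcessCheck {
    // Returns true if a process with this PID is visible to the current JVM and still alive.
    static boolean isAlive(long pid) {
        return ProcessHandle.of(pid).map(ProcessHandle::isAlive).orElse(false);
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java LeftoverProcessCheck.java <pid>");
            return;
        }
        long pid = Long.parseLong(args[0]);
        if (!isAlive(pid)) {
            System.out.println("Process " + pid + " is not running.");
            return;
        }
        // Request forced termination; on Unix-like systems this is roughly `kill -9 <pid>`.
        boolean requested = ProcessHandle.of(pid)
                .map(ProcessHandle::destroyForcibly)
                .orElse(false);
        System.out.println("Forced termination requested: " + requested);
    }
}
```

With JDK 11 the file can be run directly as `java LeftoverProcessCheck.java <pid>`, passing the PID noted earlier. As with `kill -9`, a process that has already become a zombie may still be listed afterwards.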
### Additional Info | ||
OpenJDK mailing list on debugging the JDK on macOS | ||
https://mail.openjdk.org/pipermail/hotspot-dev/2019-September/039429.html | ||
Detecting the debugger | ||
https://stackoverflow.com/questions/5393403/can-a-java-application-detect-that-a-debugger-is-attached#:~:text=No.,to%20let%20your%20app%20continue.&text=I%20know%20that%20those%20are,meant%20with%20my%20first%20phrase). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,65 @@ | ||
<!-- | ||
Licensed to the Apache Software Foundation (ASF) under one | ||
or more contributor license agreements. See the NOTICE file | ||
distributed with this work for additional information | ||
regarding copyright ownership. The ASF licenses this file | ||
to you under the Apache License, Version 2.0 (the | ||
"License"); you may not use this file except in compliance | ||
with the License. You may obtain a copy of the License at | ||
http://www.apache.org/licenses/LICENSE-2.0 | ||
Unless required by applicable law or agreed to in writing, | ||
software distributed under the License is distributed on an | ||
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
KIND, either express or implied. See the License for the | ||
specific language governing permissions and limitations | ||
under the License. | ||
--> | ||
|
||
# Comet Development Guide | ||
|
||
## Project Layout | ||
|
||
``` | ||
├── common <- common Java/Scala code | ||
├── conf <- configuration files | ||
├── core <- core native code, in Rust | ||
├── spark <- Spark integration | ||
``` | ||
|
||
## Development Setup | ||
|
||
1. Make sure `JAVA_HOME` is set and points to a JDK 11 installation. | ||
2. Install the Rust toolchain. The easiest way is to use | ||
[rustup](https://rustup.rs). | ||
|
||
## Build & Test | ||
|
||
A few common commands are specified in the project's `Makefile`: | ||
|
||
- `make`: compile the entire project, but don't run tests | ||
- `make test`: compile the project and run tests on both the Rust and Java | ||
sides. | ||
- `make release`: compile the project and create a release build. This | ||
is useful when you want to test a local Comet installation in another project | ||
such as Spark. | ||
- `make clean`: clean up the workspace | ||
- `bin/comet-spark-shell -d . -o spark/target/`: run the Comet Spark shell for V1 datasources | ||
- `bin/comet-spark-shell -d . -o spark/target/ --conf spark.sql.sources.useV1SourceList=""`: run the Comet Spark shell for V2 datasources | ||
|
||
## Benchmark | ||
|
||
There's a `make` command to run micro benchmarks in the repo. For | ||
instance: | ||
|
||
``` | ||
make benchmark-org.apache.spark.sql.benchmark.CometReadBenchmark | ||
``` | ||
|
||
To run TPC-H or TPC-DS micro benchmarks, please follow the instructions | ||
in the respective source code, e.g., `CometTPCHQueryBenchmark`. | ||
|
||
## Debugging | ||
Comet is a multi-language project with native code written in Rust and JVM code written in Java and Scala. | ||
It is possible to debug both native and JVM code concurrently, as described in the [DEBUGGING guide](DEBUGGING.md). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,96 @@ | ||
<!-- | ||
Licensed to the Apache Software Foundation (ASF) under one | ||
or more contributor license agreements. See the NOTICE file | ||
distributed with this work for additional information | ||
regarding copyright ownership. The ASF licenses this file | ||
to you under the Apache License, Version 2.0 (the | ||
"License"); you may not use this file except in compliance | ||
with the License. You may obtain a copy of the License at | ||
http://www.apache.org/licenses/LICENSE-2.0 | ||
Unless required by applicable law or agreed to in writing, | ||
software distributed under the License is distributed on an | ||
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
KIND, either express or implied. See the License for the | ||
specific language governing permissions and limitations | ||
under the License. | ||
--> | ||
|
||
# Expressions Supported by Comet | ||
|
||
The following Spark expressions are currently available: | ||
|
||
+ Literals | ||
+ Arithmetic Operators | ||
+ UnaryMinus | ||
+ Add/Minus/Multiply/Divide/Remainder | ||
+ Conditional functions | ||
+ Case When | ||
+ If | ||
+ Cast | ||
+ Coalesce | ||
+ Boolean functions | ||
+ And | ||
+ Or | ||
+ Not | ||
+ EqualTo | ||
+ EqualNullSafe | ||
+ GreaterThan | ||
+ GreaterThanOrEqual | ||
+ LessThan | ||
+ LessThanOrEqual | ||
+ IsNull | ||
+ IsNotNull | ||
+ In | ||
+ String functions | ||
+ Substring | ||
+ Coalesce | ||
+ StringSpace | ||
+ Like | ||
+ Contains | ||
+ Startswith | ||
+ Endswith | ||
+ Ascii | ||
+ Bit_length | ||
+ Octet_length | ||
+ Upper | ||
+ Lower | ||
+ Chr | ||
+ Initcap | ||
+ Trim/Btrim/Ltrim/Rtrim | ||
+ Concat_ws | ||
+ Repeat | ||
+ Length | ||
+ Reverse | ||
+ Instr | ||
+ Replace | ||
+ Translate | ||
+ Bitwise functions | ||
+ Shiftright/Shiftleft | ||
+ Date/Time functions | ||
+ Year/Hour/Minute/Second | ||
+ Math functions | ||
+ Abs | ||
+ Acos | ||
+ Asin | ||
+ Atan | ||
+ Atan2 | ||
+ Cos | ||
+ Exp | ||
+ Ln | ||
+ Log10 | ||
+ Log2 | ||
+ Pow | ||
+ Round | ||
+ Signum | ||
+ Sin | ||
+ Sqrt | ||
+ Tan | ||
+ Ceil | ||
+ Floor | ||
+ Aggregate functions | ||
+ Count | ||
+ Sum | ||
+ Max | ||
+ Min |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,68 @@ | ||
.PHONY: all core jvm test clean release-linux release bench | ||
|
||
all: core jvm | ||
|
||
core: | ||
cd core && cargo build | ||
jvm: | ||
mvn clean package -DskipTests $(PROFILES) | ||
test: | ||
mvn clean | ||
# We need to compile CometException so that the cargo test can pass | ||
mvn compile -pl common -DskipTests $(PROFILES) | ||
cd core && cargo build && \ | ||
LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}${JAVA_HOME}/lib:${JAVA_HOME}/lib/server:${JAVA_HOME}/lib/jli && \ | ||
DYLD_LIBRARY_PATH=${DYLD_LIBRARY_PATH:+${DYLD_LIBRARY_PATH}:}${JAVA_HOME}/lib:${JAVA_HOME}/lib/server:${JAVA_HOME}/lib/jli \ | ||
RUST_BACKTRACE=1 cargo test | ||
SPARK_HOME=`pwd` COMET_CONF_DIR=$(shell pwd)/conf RUST_BACKTRACE=1 mvn verify $(PROFILES) | ||
clean: | ||
cd core && cargo clean | ||
mvn clean | ||
rm -rf .dist | ||
bench: | ||
cd core && LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}${JAVA_HOME}/lib:${JAVA_HOME}/lib/server:${JAVA_HOME}/lib/jli && \ | ||
DYLD_LIBRARY_PATH=${DYLD_LIBRARY_PATH:+${DYLD_LIBRARY_PATH}:}${JAVA_HOME}/lib:${JAVA_HOME}/lib/server:${JAVA_HOME}/lib/jli \ | ||
RUSTFLAGS="-Ctarget-cpu=native" cargo bench $(filter-out $@,$(MAKECMDGOALS)) | ||
format: | ||
mvn compile test-compile scalafix:scalafix -Psemanticdb $(PROFILES) | ||
mvn spotless:apply $(PROFILES) | ||
|
||
core-amd64: | ||
rustup target add x86_64-apple-darwin | ||
cd core && RUSTFLAGS="-Ctarget-cpu=skylake -Ctarget-feature=-prefer-256-bit" CC=o64-clang CXX=o64-clang++ cargo build --target x86_64-apple-darwin --features nightly --release | ||
mkdir -p common/target/classes/org/apache/comet/darwin/x86_64 | ||
cp core/target/x86_64-apple-darwin/release/libcomet.dylib common/target/classes/org/apache/comet/darwin/x86_64 | ||
cd core && RUSTFLAGS="-Ctarget-cpu=haswell -Ctarget-feature=-prefer-256-bit" cargo build --features nightly --release | ||
mkdir -p common/target/classes/org/apache/comet/linux/amd64 | ||
cp core/target/release/libcomet.so common/target/classes/org/apache/comet/linux/amd64 | ||
jar -cf common/target/comet-native-x86_64.jar \ | ||
-C common/target/classes/org/apache/comet darwin \ | ||
-C common/target/classes/org/apache/comet linux | ||
./dev/deploy-file common/target/comet-native-x86_64.jar comet-native-x86_64${COMET_CLASSIFIER} jar | ||
|
||
core-arm64: | ||
rustup target add aarch64-apple-darwin | ||
cd core && RUSTFLAGS="-Ctarget-cpu=apple-m1" CC=arm64-apple-darwin21.4-clang CXX=arm64-apple-darwin21.4-clang++ CARGO_FEATURE_NEON=1 cargo build --target aarch64-apple-darwin --features nightly --release | ||
mkdir -p common/target/classes/org/apache/comet/darwin/aarch64 | ||
cp core/target/aarch64-apple-darwin/release/libcomet.dylib common/target/classes/org/apache/comet/darwin/aarch64 | ||
cd core && RUSTFLAGS="-Ctarget-cpu=native" cargo build --features nightly --release | ||
mkdir -p common/target/classes/org/apache/comet/linux/aarch64 | ||
cp core/target/release/libcomet.so common/target/classes/org/apache/comet/linux/aarch64 | ||
jar -cf common/target/comet-native-aarch64.jar \ | ||
-C common/target/classes/org/apache/comet darwin \ | ||
-C common/target/classes/org/apache/comet linux | ||
./dev/deploy-file common/target/comet-native-aarch64.jar comet-native-aarch64${COMET_CLASSIFIER} jar | ||
|
||
release-linux: clean | ||
rustup target add aarch64-apple-darwin x86_64-apple-darwin | ||
cd core && RUSTFLAGS="-Ctarget-cpu=apple-m1" CC=arm64-apple-darwin21.4-clang CXX=arm64-apple-darwin21.4-clang++ CARGO_FEATURE_NEON=1 cargo build --target aarch64-apple-darwin --features nightly --release | ||
cd core && RUSTFLAGS="-Ctarget-cpu=skylake -Ctarget-feature=-prefer-256-bit" CC=o64-clang CXX=o64-clang++ cargo build --target x86_64-apple-darwin --features nightly --release | ||
cd core && RUSTFLAGS="-Ctarget-cpu=native -Ctarget-feature=-prefer-256-bit" cargo build --features nightly --release | ||
mvn install -Prelease -DskipTests $(PROFILES) | ||
release: | ||
cd core && RUSTFLAGS="-Ctarget-cpu=native" cargo build --features nightly --release | ||
mvn install -Prelease -DskipTests $(PROFILES) | ||
benchmark-%: clean release | ||
cd spark && COMET_CONF_DIR=$(shell pwd)/conf MAVEN_OPTS='-Xmx20g' .mvn exec:java -Dexec.mainClass="$*" -Dexec.classpathScope="test" -Dexec.cleanupDaemonThreads="false" -Dexec.args="$(filter-out $@,$(MAKECMDGOALS))" $(PROFILES) | ||
.DEFAULT: | ||
@: # ignore arguments provided to benchmarks e.g. "make benchmark-foo -- --bar", we do not want to treat "--bar" as target |