Initial PR #1

Merged
merged 4 commits into from
Feb 9, 2024
Changes from 3 commits
10 changes: 10 additions & 0 deletions .gitignore
@@ -0,0 +1,10 @@
target
.idea
*.iml
derby.log
metastore_db/
spark-warehouse/
dependency-reduced-pom.xml
core/src/execution/generated
prebuild
.flattened-pom.xml
27 changes: 27 additions & 0 deletions .scalafix.conf
@@ -0,0 +1,27 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
rules = [
ExplicitResultTypes,
NoAutoTupling,
RemoveUnused,

DisableSyntax,
LeakingImplicitClassVal,
NoValInForComprehension,
ProcedureSyntax,
RedundantSyntax
]
96 changes: 96 additions & 0 deletions DEBUGGING.md
@@ -0,0 +1,96 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Comet Debugging Guide

This HOWTO describes how to debug JVM code and native code concurrently. The guide assumes you have:
1. IntelliJ as the Java IDE.
2. CLion as the native IDE. For Rust code, the CLion Rust language plugin is required. Note that the
IntelliJ Rust plugin is not sufficient.
3. CLion/LLDB as the native debugger. CLion ships with a bundled LLDB and the Rust community has
its own packaging of LLDB (`lldb-rust`). Both provide a better display of Rust symbols than plain
LLDB or the LLDB that is bundled with Xcode. This guide uses the LLDB packaged with CLion.

The guide uses a Comet _unit_ test as the canonical use case.

_Caveat: The steps here have only been tested with JDK 11 on Mac (M1)._

## Debugging for Advanced Developers

Add a `.lldbinit` to comet/core. This is not strictly necessary but will be useful if you want to
use advanced `lldb` debugging.

### In IntelliJ

1. Set a breakpoint in `NativeBase.load()`, at a point _after_ the Comet library has been loaded.

1. Add a Debug Configuration for the unit test

1. In the Debug Configuration for that unit test add `-Xint` as a JVM parameter. `-Xint` makes the JVM
run in interpreter-only mode; why this is required is undocumented *magic*. Without it, the LLDB
debugger hits an `EXC_BAD_ACCESS` (or `EXC_BAD_INSTRUCTION`) from which one cannot recover.

1. Add a `println` to the unit test to print the PID of the JVM process. (`jps` can also be used, but printing the PID is less error-prone if you have multiple JVM processes running.)

For JDK 8:
```scala
import java.lang.management.ManagementFactory

println("Waiting for Debugger: PID - " + ManagementFactory.getRuntimeMXBean().getName())
```
This will print something like `PID@your_machine_name`.

For JDK 9 and newer:
```scala
println("Waiting for Debugger: PID - " + ProcessHandle.current.pid)
```

==> Note the PID

1. Debug-run the test in IntelliJ and wait for the breakpoint to be hit.

### In CLion

1. After the breakpoint is hit in IntelliJ, switch to CLion (or LLDB from a terminal or editor).

1. Attach to the JVM process (make sure the PID matches). In CLion, this is `Run -> Attach to Process`.

1. Put your breakpoint in the native code.

1. Go back to IntelliJ and resume the process.

1. Most debugging in CLion is similar to Intellij. For advanced LLDB based debugging the LLDB command line can be accessed from the LLDB tab in the Debugger view. Refer to the [LLDB manual](https://lldb.llvm.org/use/tutorial.html) for LLDB commands.
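
For the terminal route mentioned in the first step, attaching can be sketched in shell roughly as follows. This is a minimal sketch and not part of Comet: `pid_from_runtime_name` is a hypothetical helper that strips the `@your_machine_name` suffix printed by the JDK 8 snippet, and the commented usage assumes `lldb` is on your `PATH`.

```shell
# Hypothetical helper (not part of Comet): reduce "PID@your_machine_name"
# to the bare PID. A bare PID is returned unchanged.
pid_from_runtime_name() {
    # ${1%%@*} removes the longest suffix that starts with "@"
    printf '%s\n' "${1%%@*}"
}

# Example usage (commented out so the sketch has no side effects):
#   pid=$(pid_from_runtime_name "12345@your_machine_name")
#   lldb -p "$pid"    # attach LLDB to the running JVM by PID
```

Once attached this way, breakpoints are set from the LLDB prompt rather than in the CLion editor.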

### After your debugging is done

1. In CLion, detach from the process if not already detached.

2. In IntelliJ, the debugger might have lost track of the process. If so, the debugger tab
will show the process as running (even if the test/job is shown as completed).

3. Close the debugger tab, and if the IDE asks whether it should terminate the process,
click Yes.

4. In a terminal, use `jps` to identify the process with the process ID you were debugging. If
it shows up as running, run `kill -9 [pid]`. If that doesn't remove the process, don't bother:
the process will be left behind as a zombie and will consume no (significant) resources.
It will eventually be cleaned up when you reboot, possibly after a software update.
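
The check-then-kill part of the last step could be scripted roughly as below. This is only a sketch: `kill_if_running` is a hypothetical helper name, and it assumes you pass it the PID you noted while debugging. `kill -0` sends no signal; it only tests whether the process still exists.

```shell
# Hypothetical helper (not part of Comet): force-kill a leftover process
# only if it is still alive.
kill_if_running() {
    pid="$1"
    # kill -0 delivers no signal; it succeeds only if the process exists
    if kill -0 "$pid" 2>/dev/null; then
        kill -9 "$pid"
        echo "killed $pid"
    else
        echo "not running: $pid"
    fi
}

# Example usage (commented out so the sketch has no side effects):
#   jps -l                 # list JVM processes and confirm the PID
#   kill_if_running 12345
```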

### Additional Info

OpenJDK mailing list on debugging the JDK on macOS
https://mail.openjdk.org/pipermail/hotspot-dev/2019-September/039429.html

Detecting the debugger
https://stackoverflow.com/questions/5393403/can-a-java-application-detect-that-a-debugger-is-attached
65 changes: 65 additions & 0 deletions DEVELOPMENT.md
@@ -0,0 +1,65 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Comet Development Guide

## Project Layout

```
├── common <- common Java/Scala code
├── conf <- configuration files
├── core <- core native code, in Rust
├── spark <- Spark integration
```

## Development Setup

1. Make sure `JAVA_HOME` is set and points to a JDK 11 installation.
2. Install the Rust toolchain. The easiest way is to use
[rustup](https://rustup.rs).
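
The two setup steps can be sanity-checked with a small shell function. This is a hedged sketch, not something shipped with Comet: `check_dev_setup` is a hypothetical name, and it only verifies that `JAVA_HOME` contains a `java` binary and that `cargo` is on the `PATH`. It does not verify the JDK version.

```shell
# Hypothetical preflight check (not part of the Comet repository).
check_dev_setup() {
    if [ -z "${JAVA_HOME:-}" ]; then
        echo "JAVA_HOME is not set"
        return 1
    fi
    if [ ! -x "$JAVA_HOME/bin/java" ]; then
        echo "no java binary under JAVA_HOME"
        return 1
    fi
    # cargo is installed as part of the Rust toolchain by rustup
    if ! command -v cargo >/dev/null 2>&1; then
        echo "cargo not found; install Rust via rustup"
        return 1
    fi
    echo "ok"
}

# Example usage (commented out so the sketch has no side effects):
#   check_dev_setup && "$JAVA_HOME/bin/java" -version
```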

## Build & Test

A few common commands are specified in the project's `Makefile`:

- `make`: compile the entire project, but don't run tests
- `make test`: compile the project and run tests on both the Rust and Java
sides
- `make release`: compile the project and create a release build. This
is useful when you want to test a local Comet installation in another project
such as Spark.
- `make clean`: clean up the workspace
- `bin/comet-spark-shell -d . -o spark/target/`: run the Comet Spark shell for V1 datasources
- `bin/comet-spark-shell -d . -o spark/target/ --conf spark.sql.sources.useV1SourceList=""`: run the Comet Spark shell for V2 datasources

## Benchmark

There's a `make` command to run micro benchmarks in the repo. For
instance:

```
make benchmark-org.apache.spark.sql.benchmark.CometReadBenchmark
```

To run TPC-H or TPC-DS micro benchmarks, please follow the instructions
in the respective source code, e.g., `CometTPCHQueryBenchmark`.

## Debugging
Comet is a multi-language project with native code written in Rust and JVM code written in Java and Scala.
It is possible to debug both native and JVM code concurrently, as described in the [DEBUGGING guide](DEBUGGING.md).
96 changes: 96 additions & 0 deletions EXPRESSIONS.md
@@ -0,0 +1,96 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Expressions Supported by Comet

The following Spark expressions are currently available:

+ Literals
+ Arithmetic Operators
    + UnaryMinus
    + Add/Minus/Multiply/Divide/Remainder
+ Conditional functions
    + Case When
    + If
+ Cast
+ Coalesce
+ Boolean functions
    + And
    + Or
    + Not
    + EqualTo
    + EqualNullSafe
    + GreaterThan
    + GreaterThanOrEqual
    + LessThan
    + LessThanOrEqual
    + IsNull
    + IsNotNull
    + In
+ String functions
    + Substring
    + Coalesce
    + StringSpace
    + Like
    + Contains
    + Startswith
    + Endswith
    + Ascii
    + Bit_length
    + Octet_length
    + Upper
    + Lower
    + Chr
    + Initcap
    + Trim/Btrim/Ltrim/Rtrim
    + Concat_ws
    + Repeat
    + Length
    + Reverse
    + Instr
    + Replace
    + Translate
+ Bitwise functions
    + Shiftright/Shiftleft
+ Date/Time functions
    + Year/Hour/Minute/Second
+ Math functions
    + Abs
    + Acos
    + Asin
    + Atan
    + Atan2
    + Cos
    + Exp
    + Ln
    + Log10
    + Log2
    + Pow
    + Round
    + Signum
    + Sin
    + Sqrt
    + Tan
    + Ceil
    + Floor
+ Aggregate functions
    + Count
    + Sum
    + Max
    + Min
85 changes: 85 additions & 0 deletions Makefile
@@ -0,0 +1,85 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

.PHONY: all core jvm test clean release-linux release bench

all: core jvm

core:
	cd core && cargo build
jvm:
	mvn clean package -DskipTests $(PROFILES)
test:
	mvn clean
	# We need to compile CometException so that the cargo test can pass
	mvn compile -pl common -DskipTests $(PROFILES)
	cd core && cargo build && \
	LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}${JAVA_HOME}/lib:${JAVA_HOME}/lib/server:${JAVA_HOME}/lib/jli && \
	DYLD_LIBRARY_PATH=${DYLD_LIBRARY_PATH:+${DYLD_LIBRARY_PATH}:}${JAVA_HOME}/lib:${JAVA_HOME}/lib/server:${JAVA_HOME}/lib/jli \
	RUST_BACKTRACE=1 cargo test
	SPARK_HOME=`pwd` COMET_CONF_DIR=$(shell pwd)/conf RUST_BACKTRACE=1 mvn verify $(PROFILES)
clean:
	cd core && cargo clean
	mvn clean
	rm -rf .dist
bench:
	cd core && LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}${JAVA_HOME}/lib:${JAVA_HOME}/lib/server:${JAVA_HOME}/lib/jli && \
	DYLD_LIBRARY_PATH=${DYLD_LIBRARY_PATH:+${DYLD_LIBRARY_PATH}:}${JAVA_HOME}/lib:${JAVA_HOME}/lib/server:${JAVA_HOME}/lib/jli \
	RUSTFLAGS="-Ctarget-cpu=native" cargo bench $(filter-out $@,$(MAKECMDGOALS))
format:
	mvn compile test-compile scalafix:scalafix -Psemanticdb $(PROFILES)
	mvn spotless:apply $(PROFILES)

core-amd64:
	rustup target add x86_64-apple-darwin
	cd core && RUSTFLAGS="-Ctarget-cpu=skylake -Ctarget-feature=-prefer-256-bit" CC=o64-clang CXX=o64-clang++ cargo build --target x86_64-apple-darwin --features nightly --release
	mkdir -p common/target/classes/org/apache/comet/darwin/x86_64
	cp core/target/x86_64-apple-darwin/release/libcomet.dylib common/target/classes/org/apache/comet/darwin/x86_64
	cd core && RUSTFLAGS="-Ctarget-cpu=haswell -Ctarget-feature=-prefer-256-bit" cargo build --features nightly --release
	mkdir -p common/target/classes/org/apache/comet/linux/amd64
	cp core/target/release/libcomet.so common/target/classes/org/apache/comet/linux/amd64
	jar -cf common/target/comet-native-x86_64.jar \
		-C common/target/classes/org/apache/comet darwin \
		-C common/target/classes/org/apache/comet linux
	./dev/deploy-file common/target/comet-native-x86_64.jar comet-native-x86_64${COMET_CLASSIFIER} jar

core-arm64:
	rustup target add aarch64-apple-darwin
	cd core && RUSTFLAGS="-Ctarget-cpu=apple-m1" CC=arm64-apple-darwin21.4-clang CXX=arm64-apple-darwin21.4-clang++ CARGO_FEATURE_NEON=1 cargo build --target aarch64-apple-darwin --features nightly --release
	mkdir -p common/target/classes/org/apache/comet/darwin/aarch64
	cp core/target/aarch64-apple-darwin/release/libcomet.dylib common/target/classes/org/apache/comet/darwin/aarch64
	cd core && RUSTFLAGS="-Ctarget-cpu=native" cargo build --features nightly --release
	mkdir -p common/target/classes/org/apache/comet/linux/aarch64
	cp core/target/release/libcomet.so common/target/classes/org/apache/comet/linux/aarch64
	jar -cf common/target/comet-native-aarch64.jar \
		-C common/target/classes/org/apache/comet darwin \
		-C common/target/classes/org/apache/comet linux
	./dev/deploy-file common/target/comet-native-aarch64.jar comet-native-aarch64${COMET_CLASSIFIER} jar

release-linux: clean
	rustup target add aarch64-apple-darwin x86_64-apple-darwin
	cd core && RUSTFLAGS="-Ctarget-cpu=apple-m1" CC=arm64-apple-darwin21.4-clang CXX=arm64-apple-darwin21.4-clang++ CARGO_FEATURE_NEON=1 cargo build --target aarch64-apple-darwin --features nightly --release
	cd core && RUSTFLAGS="-Ctarget-cpu=skylake -Ctarget-feature=-prefer-256-bit" CC=o64-clang CXX=o64-clang++ cargo build --target x86_64-apple-darwin --features nightly --release
	cd core && RUSTFLAGS="-Ctarget-cpu=native -Ctarget-feature=-prefer-256-bit" cargo build --features nightly --release
	mvn install -Prelease -DskipTests $(PROFILES)
release:
	cd core && RUSTFLAGS="-Ctarget-cpu=native" cargo build --features nightly --release
	mvn install -Prelease -DskipTests $(PROFILES)
benchmark-%: clean release
	cd spark && COMET_CONF_DIR=$(shell pwd)/conf MAVEN_OPTS='-Xmx20g' .mvn exec:java -Dexec.mainClass="$*" -Dexec.classpathScope="test" -Dexec.cleanupDaemonThreads="false" -Dexec.args="$(filter-out $@,$(MAKECMDGOALS))" $(PROFILES)
.DEFAULT:
	@: # ignore arguments provided to benchmarks e.g. "make benchmark-foo -- --bar", we do not want to treat "--bar" as target