-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add GEOS library as an optional dependency #12243
base: main
Are you sure you want to change the base?
Conversation
✅ Deploy Preview for meta-velox ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this.
CMakeLists.txt
Outdated
@@ -138,6 +138,7 @@ option(VELOX_ENABLE_ABFS "Build Abfs Connector" OFF) | |||
option(VELOX_ENABLE_HDFS "Build Hdfs Connector" OFF) | |||
option(VELOX_ENABLE_PARQUET "Enable Parquet support" ON) | |||
option(VELOX_ENABLE_ARROW "Enable Arrow support" OFF) | |||
option(VELOX_ENABLE_GEO "Enable Geospatial support" ON) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably should turn this OFF by default for now. Otherwise, we just bundle it needlessly.
We should add this to the setup scripts to make it a system dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review, updated the code
scripts/setup-centos9.sh
Outdated
@@ -78,7 +78,7 @@ function install_velox_deps_from_dnf { | |||
dnf_install libevent-devel \ | |||
openssl-devel re2-devel libzstd-devel lz4-devel double-conversion-devel \ | |||
libdwarf-devel elfutils-libelf-devel curl-devel libicu-devel bison flex \ | |||
libsodium-devel zlib-devel | |||
libsodium-devel zlib-devel geos-devel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the geos versions be pinned ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current version is 3.10.1-1.el9 (an old version). I can replace it with geos-devel-3.10.1-1.el9
. Does that work for you? Also, do you want to pin the version for the Ubuntu setup?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pinning to some top level like 3.10.1 is good enough :)
@@ -138,6 +138,7 @@ option(VELOX_ENABLE_ABFS "Build Abfs Connector" OFF) | |||
option(VELOX_ENABLE_HDFS "Build Hdfs Connector" OFF) | |||
option(VELOX_ENABLE_PARQUET "Enable Parquet support" ON) | |||
option(VELOX_ENABLE_ARROW "Enable Arrow support" OFF) | |||
option(VELOX_ENABLE_GEO "Enable Geospatial support" OFF) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add VELOX_ENABLE_GEO to the adapters build : https://github.com/facebookincubator/velox/blob/main/.github/workflows/linux-build-base.yml#L87 ?
If it doesnt get exercised by default, we need a signal these changes pass CI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM ;
Would be great if you can pin the versions. Thanks !
efff04f
to
1660caa
Compare
scripts/setup-centos9.sh
Outdated
@@ -78,7 +78,7 @@ function install_velox_deps_from_dnf { | |||
dnf_install libevent-devel \ | |||
openssl-devel re2-devel libzstd-devel lz4-devel double-conversion-devel \ | |||
libdwarf-devel elfutils-libelf-devel curl-devel libicu-devel bison flex \ | |||
libsodium-devel zlib-devel | |||
libsodium-devel zlib-devel geos-devel-3.10.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just checking (this may be my ignorance here): We are using Geos 3.13.0 above and 3.10.1 (devel) here. Is that fine or will that break things?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. It looks like only brew
keeps the version up to date (3.13.0), while the latest versions available are 3.10.1 in epel and 3.12.1 in apt. Based on the release logs of GEOS, there don't seem to be any missing features relevant to our requirements, but I'm unsure about the performance differences.
One option is to remove it as a system dependency, similar to what we did for simdjson
or xsimd
. For example, we don't install libsimdjson-dev
or simdjson-devel
in the Ubuntu and CentOS setup scripts (even though they're available) but only include them in the macOS setup script. In CMakeLists.txt
, we just specify the minimum version in resolve_dependency
in case the user uses a non-bundled version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've had problems in the past with dependencies coming from brew being the latest version (fmt was one of those).
In my opinion, any velox dependency should come from a predefined version and not let the OS pick which one to use. For xsimd and simdjson we should install it into the system as well. Not sure why we rely on the bundling. Likely because these are header only dependencies so it doesn't really matter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @czentgr. That makes sense. Having a pre-defined version ensures the same behaviour for different platforms. But if we install from different package managers, it might still end up with different versions.
Do you think it's a good idea to make it bundled by default and remove it from system dependency, which ensures all environments use the same version, reducing potential compatibility issues.
Or do you prefer a hybrid approach - installing the default version from the package manager and keeping a bundled version? Users can switch to the bundled version if there's an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have been using the hybrid approach in that most dependencies are system dependencies but also offer the bundling of the same versions. Where the OS doesn't give the desired version we build them from scratch via the install methods. So we can pin a version and install it from the scripts and through bundling and be consistent everywhere. Basically not relying on the OS especially because the OS package versions can lag behind quite a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the code
35e8ee0
to
eaf4b64
Compare
scripts/setup-centos9.sh
Outdated
@@ -235,6 +237,14 @@ function install_cuda { | |||
dnf install -y cuda-nvcc-$dashed cuda-cudart-devel-$dashed cuda-nvrtc-devel-$dashed cuda-driver-devel-$dashed | |||
} | |||
|
|||
function install_geos { | |||
if $BUILD_GEOS ; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs to improve to be sure we compare correctly
if [ "$BUILD_GEOS" = true ]; then
See https://stackoverflow.com/questions/2953646/how-can-i-declare-and-use-boolean-variables-in-a-shell-script which has a nice explanation why this isn't sufficient and is actually bad (notably the shell command execution).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I just follow the way to install duckdb (including this buggy test command and echo
). Probably we need to change it as well in another PR.
scripts/setup-centos9.sh
Outdated
@@ -235,6 +237,14 @@ function install_cuda { | |||
dnf install -y cuda-nvcc-$dashed cuda-cudart-devel-$dashed cuda-nvrtc-devel-$dashed cuda-driver-devel-$dashed | |||
} | |||
|
|||
function install_geos { | |||
if $BUILD_GEOS ; then | |||
echo 'Building GEOS' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't echo things for other dependencies. This script runs with -x
and so each command is shown.
@wraymo Please rebase as well (the signature fuzzer should be fixed). |
I'm not sure if there's a way to rebuild both the adapter and CentOS images. Since the adapter image is based on the CentOS image, and the CentOS image's Dockerfile runs However, I suspect that the current CI/CD pipeline is using a cached image, which might explain why GEOS is missing. Additionally, because To resolve this, we likely need to either:
What do you think? @czentgr |
@wraymo The containers are not used during this PR build that would contain the geos library. You could build new dependency images and push then to github and then use them in the PR for a test. They will get rebuild after this PR is merged. But it doesn't matter, for this PR it should use the bundling but it is not. Edit: Ahh yes, in the build it might set it to SYSTEM. Yes, please add the explicit override in the CI for this PR and set it to bundled. |
@czentgr Done. Could you rerun the pipeline? |
LGTM not sure why the tests or signature checks are failing as you didn't touch anything relevant. Maybe geos cmake is setting variables that leak into the main build? (yeah probably this: https://github.com/libgeos/geos/blob/0fdc40af1fb125b59efc3d154c672314beff4249/CMakeLists.txt#L123) The easiest way to fix this is to unset it but there might be more vars being set. Moving it into a subdir like folly and boost will protect through dir scope. |
I think the signature changes issue got resolved. Lets rebase one more time and re-run. |
just rebased |
Not sure what causes the test failure, but when I build the adapter image locally and compile Velox with bundled GEOS, the tests run successfully. |
@czentgr @assignUser Do you have any insights into the failed test cases? They pass on my local setup. |
@assignUser I was using the bundled version without installing the system dependency. I’ve now added |
@assignUser Could you take a look at the changes? |
Summary
This PR introduces GEOS as an optional dependency, marking the first step in splitting #12053 to enable geospatial support.