Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add GEOS library as an optional dependency #12243

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

wraymo
Copy link
Contributor

@wraymo wraymo commented Feb 3, 2025

Summary
This PR introduces GEOS as an optional dependency, marking the first step in splitting #12053 to enable geospatial support.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Feb 3, 2025
Copy link

netlify bot commented Feb 3, 2025

Deploy Preview for meta-velox ready!

Name Link
🔨 Latest commit 34c141a
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/67c0a194d2d23000087fa526
😎 Deploy Preview https://deploy-preview-12243--meta-velox.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

Copy link
Collaborator

@czentgr czentgr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this.

CMakeLists.txt Outdated
@@ -138,6 +138,7 @@ option(VELOX_ENABLE_ABFS "Build Abfs Connector" OFF)
option(VELOX_ENABLE_HDFS "Build Hdfs Connector" OFF)
option(VELOX_ENABLE_PARQUET "Enable Parquet support" ON)
option(VELOX_ENABLE_ARROW "Enable Arrow support" OFF)
option(VELOX_ENABLE_GEO "Enable Geospatial support" ON)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably should turn this OFF by default for now. Otherwise, we just bundle it needlessly.

We should add this to the setup scripts to make it a system dependency.

Copy link
Contributor Author

@wraymo wraymo Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review, updated the code

@@ -78,7 +78,7 @@ function install_velox_deps_from_dnf {
dnf_install libevent-devel \
openssl-devel re2-devel libzstd-devel lz4-devel double-conversion-devel \
libdwarf-devel elfutils-libelf-devel curl-devel libicu-devel bison flex \
libsodium-devel zlib-devel
libsodium-devel zlib-devel geos-devel
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the geos versions be pinned ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current version is 3.10.1-1.el9 (an old version). I can replace it with geos-devel-3.10.1-1.el9. Does that work for you? Also, do you want to pin the version for the Ubuntu setup?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pinning to some top level like 3.10.1 is good enough :)

@@ -138,6 +138,7 @@ option(VELOX_ENABLE_ABFS "Build Abfs Connector" OFF)
option(VELOX_ENABLE_HDFS "Build Hdfs Connector" OFF)
option(VELOX_ENABLE_PARQUET "Enable Parquet support" ON)
option(VELOX_ENABLE_ARROW "Enable Arrow support" OFF)
option(VELOX_ENABLE_GEO "Enable Geospatial support" OFF)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add VELOX_ENABLE_GEO to the adapters build : https://github.com/facebookincubator/velox/blob/main/.github/workflows/linux-build-base.yml#L87 ?
If it doesnt get exercised by default, we need a signal these changes pass CI.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

Copy link
Contributor

@kgpai kgpai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM ;
Would be great if you can pin the versions. Thanks !

@wraymo wraymo force-pushed the geos_dep branch 2 times, most recently from efff04f to 1660caa Compare February 6, 2025 17:17
@@ -78,7 +78,7 @@ function install_velox_deps_from_dnf {
dnf_install libevent-devel \
openssl-devel re2-devel libzstd-devel lz4-devel double-conversion-devel \
libdwarf-devel elfutils-libelf-devel curl-devel libicu-devel bison flex \
libsodium-devel zlib-devel
libsodium-devel zlib-devel geos-devel-3.10.1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checking (this may be my ignorance here): We are using Geos 3.13.0 above and 3.10.1 (devel) here. Is that fine or will that break things?

Copy link
Contributor Author

@wraymo wraymo Feb 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. It looks like only brew keeps the version up to date (3.13.0), while the latest versions available are 3.10.1 in epel and 3.12.1 in apt. Based on the release logs of GEOS, there don't seem to be any missing features relevant to our requirements, but I'm unsure about the performance differences.

One option is to remove it as a system dependency, similar to what we did for simdjson or xsimd. For example, we don't install libsimdjson-dev or simdjson-devel in the Ubuntu and CentOS setup scripts (even though they're available) but only include them in the macOS setup script. In CMakeLists.txt, we just specify the minimum version in resolve_dependency in case the user uses a non-bundled version.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've had problems in the past with dependencies coming from brew being the latest version (fmt was one of those).
In my opinion, any velox dependency should come from a predefined version and not let the OS pick which one to use. For xsimd and simdjson we should install it into the system as well. Not sure why we rely on the bundling. Likely because these are header only dependencies so it doesn't really matter.

Copy link
Contributor Author

@wraymo wraymo Feb 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @czentgr. That makes sense. Having a pre-defined version ensures the same behaviour for different platforms. But if we install from different package managers, it might still end up with different versions.

Do you think it's a good idea to make it bundled by default and remove it from system dependency, which ensures all environments use the same version, reducing potential compatibility issues.

Or do you prefer a hybrid approach - installing the default version from the package manager and keeping a bundled version? Users can switch to the bundled version if there's an error.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have been using the hybrid approach in that most dependencies are system dependencies but also offer the bundling of the same versions. Where the OS doesn't give the desired version we build them from scratch via the install methods. So we can pin a version and install it from the scripts and through bundling and be consistent everywhere. Basically not relying on the OS especially because the OS package versions can lag behind quite a bit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the code

@wraymo wraymo force-pushed the geos_dep branch 2 times, most recently from 35e8ee0 to eaf4b64 Compare February 13, 2025 03:42
@@ -235,6 +237,14 @@ function install_cuda {
dnf install -y cuda-nvcc-$dashed cuda-cudart-devel-$dashed cuda-nvrtc-devel-$dashed cuda-driver-devel-$dashed
}

function install_geos {
if $BUILD_GEOS ; then
Copy link
Collaborator

@czentgr czentgr Feb 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to improve to be sure we compare correctly

if [ "$BUILD_GEOS" = true ]; then

See https://stackoverflow.com/questions/2953646/how-can-i-declare-and-use-boolean-variables-in-a-shell-script which has a nice explanation why this isn't sufficient and is actually bad (notably the shell command execution).

Copy link
Contributor Author

@wraymo wraymo Feb 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I just follow the way to install duckdb (including this buggy test command and echo). Probably we need to change it as well in another PR.

@@ -235,6 +237,14 @@ function install_cuda {
dnf install -y cuda-nvcc-$dashed cuda-cudart-devel-$dashed cuda-nvrtc-devel-$dashed cuda-driver-devel-$dashed
}

function install_geos {
if $BUILD_GEOS ; then
echo 'Building GEOS'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't echo things for other dependencies. This script runs with -x and so each command is shown.

@czentgr
Copy link
Collaborator

czentgr commented Feb 20, 2025

@wraymo Please rebase as well (the signature fuzzer should be fixed).
Interesting, can you check on why the geos is found in the system (even though it is not available in the dependency image) and not set to use the BUNDLED version? I don't think it is part of the dependency image because it later can't find the cmake files for it that should be installed into the system if present.

@wraymo
Copy link
Contributor Author

wraymo commented Feb 20, 2025

I'm not sure if there's a way to rebuild both the adapter and CentOS images. Since the adapter image is based on the CentOS image, and the CentOS image's Dockerfile runs setup-centos.sh, which installs GEOS by default, the image should already have GEOS installed.

However, I suspect that the current CI/CD pipeline is using a cached image, which might explain why GEOS is missing. Additionally, because VELOX_DEPENDENCY_SOURCE is set to SYSTEM for Linux release with adapters (unless explicitly overridden for certain dependencies), it does not use the bundled version of GEOS.

To resolve this, we likely need to either:

  1. Rebuild the CentOS and adapter images to ensure GEOS is included.
  2. Explicitly set geos_SOURCE: BUNDLED to use the bundled version instead.

What do you think? @czentgr

@czentgr
Copy link
Collaborator

czentgr commented Feb 20, 2025

@wraymo The containers are not used during this PR build that would contain the geos library. You could build new dependency images and push then to github and then use them in the PR for a test. They will get rebuild after this PR is merged. But it doesn't matter, for this PR it should use the bundling but it is not.
You could try to build it in the container that the CI pipeline is using to figure out what the env is like.

Edit: Ahh yes, in the build it might set it to SYSTEM. Yes, please add the explicit override in the CI for this PR and set it to bundled.
Once the new dependency images are added this can be changed again.

@wraymo
Copy link
Contributor Author

wraymo commented Feb 20, 2025

@czentgr Done. Could you rerun the pipeline?

@assignUser
Copy link
Collaborator

assignUser commented Feb 22, 2025

LGTM not sure why the tests or signature checks are failing as you didn't touch anything relevant. Maybe geos cmake is setting variables that leak into the main build? (yeah probably this: https://github.com/libgeos/geos/blob/0fdc40af1fb125b59efc3d154c672314beff4249/CMakeLists.txt#L123)

The easiest way to fix this is to unset it but there might be more vars being set. Moving it into a subdir like folly and boost will protect through dir scope.

@czentgr
Copy link
Collaborator

czentgr commented Feb 24, 2025

I think the signature changes issue got resolved. Lets rebase one more time and re-run.

@wraymo
Copy link
Contributor Author

wraymo commented Feb 24, 2025

just rebased

@wraymo
Copy link
Contributor Author

wraymo commented Feb 25, 2025

Not sure what causes the test failure, but when I build the adapter image locally and compile Velox with bundled GEOS, the tests run successfully.

@wraymo
Copy link
Contributor Author

wraymo commented Feb 27, 2025

@czentgr @assignUser Do you have any insights into the failed test cases? They pass on my local setup.

@assignUser
Copy link
Collaborator

@wraymo I assume you have geos installed locally? Try to use the bundled version, that will likely make the error show up. If so I assume it's this issue

@wraymo
Copy link
Contributor Author

wraymo commented Feb 27, 2025

@assignUser I was using the bundled version without installing the system dependency. I’ve now added unset in geos.cmake, and all tests pass successfully.

@wraymo
Copy link
Contributor Author

wraymo commented Feb 28, 2025

@assignUser Could you take a look at the changes?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants