Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new ballista-cli crate #2495

Merged
merged 18 commits into from
May 12, 2022
Merged

Add new ballista-cli crate #2495

merged 18 commits into from
May 12, 2022

Conversation

andygrove
Copy link
Member

@andygrove andygrove commented May 9, 2022

Which issue does this PR close?

Closes #2493 and #2433 and #2517

Rationale for this change

  • Allows us to release DataFusion and Ballista separately. This was not practical before because the datafusion-cli depended on ballista which in turn depended on datafusion, leading to issues such as datafusion-cli crate has circular dependency #2433)
  • Allows DataFusion and Ballista CLIs to evolve separately without Ballista adding development overhead to the DataFusion CLI

Here is the new crate dependency diagram:

crate-deps

What changes are included in this PR?

  • Create new ballista-cli crate by copying all of datafusion-cli crate
  • Remove ballista features from datafusion-cli crate
  • Update dependency diagram
  • Refactor to have ballista-cli re-use some of the code from datafusion-cli instead of duplicating all of the code
  • Update CI scripts
  • Update user guide

Are there any user-facing changes?

Yes

@andygrove andygrove self-assigned this May 9, 2022
@github-actions github-actions bot added datafusion Changes in the datafusion crate development-process Related to development process of DataFusion labels May 9, 2022
@andygrove andygrove changed the title Add new ballista-cli crate WIP: Add new ballista-cli crate May 9, 2022
@alamb
Copy link
Contributor

alamb commented May 9, 2022

cc @liukun4515 and team

@andygrove andygrove changed the title WIP: Add new ballista-cli crate Add new ballista-cli crate May 10, 2022
@andygrove andygrove marked this pull request as ready for review May 10, 2022 14:01
use std::sync::Arc;

/// The CLI supports using a local DataFusion context or a distributed BallistaContext
pub enum Context {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can probably be simplified over time as this cli is always for ballista

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I think ballista still needs a local option but it should be using a local (in-process) ballista scheduler and executor and not using datafusion directly.

@alamb
Copy link
Contributor

alamb commented May 11, 2022

Thanks @andygrove looks much better to me

@andygrove andygrove linked an issue May 11, 2022 that may be closed by this pull request
@andygrove
Copy link
Member Author

Thanks for the review @alamb. I pushed another commit to fix the datafusion-cli lock file that had conflicts after merging from master and also added the ballista-cli lock file that was previously ignored. I also updated the CI integration test to build the ballista-cli.

@andygrove
Copy link
Member Author

AMD64 tests failed with No space left on device ... trying again

@andygrove
Copy link
Member Author

AMD64 tests failed with No space left on device 3 times in a row now.

@andygrove
Copy link
Member Author

@alamb All PR builds are failing on AMD64 with no space left on device. Any idea what we need to do to resolve?

@alamb
Copy link
Contributor

alamb commented May 12, 2022

@alamb All PR builds are failing on AMD64 with no space left on device. Any idea what we need to do to resolve?

Perhaps we can try to update the cache-key in https://github.com/apache/arrow-datafusion/blob/master/.github/workflows/rust.yml#L55 to see if something about the old caches is too large?

@andygrove
Copy link
Member Author

@alamb All PR builds are failing on AMD64 with no space left on device. Any idea what we need to do to resolve?

Perhaps we can try to update the cache-key in https://github.com/apache/arrow-datafusion/blob/master/.github/workflows/rust.yml#L55 to see if something about the old caches is too large?

Thanks. I suppose this cache just continues to grow forever and doesn't clear out old versions of deps that we no longer use? Perhaps we should include the month name in the cache so it resets periodically?

@andygrove
Copy link
Member Author

@alamb Moving the ballista tests to a separate job fixes this. Ok to merge?

@@ -123,9 +115,9 @@ jobs:
run: |
export ARROW_TEST_DATA=$(pwd)/testing/data
export PARQUET_TEST_DATA=$(pwd)/parquet-testing/data
cargo test
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were previously only running cargo test in the datafusion-examples crate in this step

Comment on lines -80 to -82
(cd datafusion-cli && cargo check)
(cd datafusion-cli && cargo check --no-default-features)
(cd datafusion-cli && cargo check --features=ballista)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We test datafusion-cli in the integration tests and we no longer support running this with ballista

@andygrove
Copy link
Member Author

I'm going to go ahead and merge to unblock other PRs and so that I can create the release candidate PR.

@andygrove andygrove merged commit 80f7bbf into apache:master May 12, 2022
@andygrove andygrove deleted the ballista-cli branch May 12, 2022 14:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datafusion Changes in the datafusion crate development-process Related to development process of DataFusion
Projects
None yet
2 participants