Move datafusion-cli to new crate (#231)
* Move datafusion-cli to own crate

* Cargo.toml fixes, remove repl code

* Remove bin option

* License and doc updates

* Fix link to CLI docs in readme

* Use re-exported arrow

* fmt

* Spacing

* Inherit datafusion default

* Update datafusion/docs/cli.md

Co-authored-by: Andy Grove <[email protected]>

* Update datafusion/docs/cli.md

Co-authored-by: Andy Grove <[email protected]>

* Update docker setup

Co-authored-by: Andy Grove <[email protected]>
Dandandan and andygrove authored May 3, 2021
1 parent 3072df6 commit 47bd3fa
Showing 9 changed files with 64 additions and 84 deletions.
30 changes: 2 additions & 28 deletions .dockerignore
@@ -1,28 +1,2 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

# Turn .dockerignore to .dockerallow by excluding everything and explicitly
# allowing specific files and directories. This enables us to quickly add
# dependency files to the docker content without scanning the whole directory.
# This setup requires to all of our docker containers have arrow's source
# as a mounted directory.

ci
dev
testing
parquet-testing
**/target/*
.git
**target
1 change: 1 addition & 0 deletions Cargo.toml
@@ -18,6 +18,7 @@
[workspace]
members = [
"datafusion",
"datafusion-cli",
"datafusion-examples",
"benchmarks",
"ballista/rust/client",
2 changes: 1 addition & 1 deletion README.md
@@ -132,7 +132,7 @@ datafusion = "4.0.0-SNAPSHOT"

## Using DataFusion as a binary

DataFusion also includes a simple command-line interactive SQL utility. See the [CLI reference](docs/cli.md) for more information.
DataFusion also includes a simple command-line interactive SQL utility. See the [CLI reference](datafusion/docs/cli.md) for more information.

# Status

33 changes: 33 additions & 0 deletions datafusion-cli/Cargo.toml
@@ -0,0 +1,33 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

[package]
name = "datafusion-cli"
version = "4.0.0-SNAPSHOT"
authors = ["Apache Arrow <[email protected]>"]
edition = "2018"
keywords = [ "arrow", "query", "sql", "cli", "repl" ]
license = "Apache-2.0"
homepage = "https://github.com/apache/arrow-datafusion"
repository = "https://github.com/apache/arrow-datafusion"


[dependencies]
clap = "2.33"
rustyline = "8.0"
tokio = { version = "1.0", features = ["macros", "rt", "rt-multi-thread", "sync"] }
datafusion = { path = "../datafusion" }
13 changes: 8 additions & 5 deletions datafusion/Dockerfile → datafusion-cli/Dockerfile
@@ -15,11 +15,14 @@
# specific language governing permissions and limitations
# under the License.

FROM rustlang/rust:nightly
FROM rust:latest


COPY ./datafusion ./usr/src/datafusion
COPY ./datafusion-cli ./usr/src/datafusion-cli

WORKDIR /usr/src/datafusion-cli
RUN cargo install --path .

COPY format /arrow/format/
COPY rust /arrow/rust/
WORKDIR /arrow/rust/datafusion
RUN cargo install --bin datafusion-cli --path .

CMD ["datafusion-cli", "--data-path", "/data"]
23 changes: 12 additions & 11 deletions datafusion/src/bin/repl.rs → datafusion-cli/src/main.rs
@@ -17,8 +17,8 @@

#![allow(bare_trait_objects)]

use arrow::util::pretty;
use clap::{crate_version, App, Arg};
use datafusion::arrow::util::pretty;
use datafusion::error::Result;
use datafusion::execution::context::{ExecutionConfig, ExecutionContext};
use rustyline::Editor;
@@ -44,7 +44,7 @@ pub async fn main() {
)
.arg(
Arg::with_name("batch-size")
.help("The batch size of each query, default value is 1048576")
.help("The batch size of each query, or use DataFusion default")
.short("c")
.long("batch-size")
.takes_value(true),
@@ -56,16 +56,17 @@ pub async fn main() {
env::set_current_dir(&p).unwrap();
};

let batch_size = matches
let mut execution_config = ExecutionConfig::new().with_information_schema(true);

if let Some(batch_size) = matches
.value_of("batch-size")
.map(|size| size.parse::<usize>().unwrap())
.unwrap_or(1_048_576);

let mut ctx = ExecutionContext::with_config(
ExecutionConfig::new()
.with_batch_size(batch_size)
.with_information_schema(true),
);
.and_then(|size| size.parse::<usize>().ok())
{
execution_config = execution_config.with_batch_size(batch_size);
};

let mut ctx =
ExecutionContext::with_config(execution_config.with_information_schema(true));

let mut rl = Editor::<()>::new();
rl.load_history(".history").ok();
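
The hunk above changes how `--batch-size` is handled: the old code called `unwrap()` on the parse result and fell back to a hard-coded 1048576, while the new code overrides the batch size only when the argument is present and parses as a `usize`. A minimal, dependency-free sketch of that pattern follows. The `Config` struct, its default of 8192, and `config_from_arg` are hypothetical stand-ins for illustration and are not DataFusion's API.

```rust
/// Hypothetical stand-in for DataFusion's `ExecutionConfig`; only the
/// builder shape matters for this sketch.
#[derive(Debug, Clone, PartialEq)]
struct Config {
    batch_size: usize,
}

impl Config {
    /// The default value here is an assumption chosen for illustration.
    fn new() -> Self {
        Config { batch_size: 8192 }
    }

    /// Builder-style setter that consumes and returns the config.
    fn with_batch_size(mut self, batch_size: usize) -> Self {
        self.batch_size = batch_size;
        self
    }
}

/// Build a config from an optional raw `--batch-size` argument.
/// A missing or unparsable value leaves the library default in place.
fn config_from_arg(raw: Option<&str>) -> Config {
    let mut config = Config::new();
    // `and_then` + `ok()` collapses "absent" and "invalid" into `None`.
    if let Some(batch_size) = raw.and_then(|s| s.parse::<usize>().ok()) {
        config = config.with_batch_size(batch_size);
    }
    config
}

fn main() {
    println!("{:?}", config_from_arg(Some("1024")));
    println!("{:?}", config_from_arg(None));
    println!("{:?}", config_from_arg(Some("not-a-number")));
}
```

One consequence of this shape is that an invalid value such as `--batch-size abc` no longer panics; it is silently ignored and the default applies. Whether to report such input to the user instead is a design choice this sketch does not address.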
9 changes: 1 addition & 8 deletions datafusion/Cargo.toml
@@ -36,13 +36,8 @@ edition = "2018"
name = "datafusion"
path = "src/lib.rs"

[[bin]]
name = "datafusion-cli"
path = "src/bin/main.rs"

[features]
default = ["cli", "crypto_expressions", "regex_expressions", "unicode_expressions"]
cli = ["rustyline"]
default = ["crypto_expressions", "regex_expressions", "unicode_expressions"]
simd = ["arrow/simd"]
crypto_expressions = ["md-5", "sha2"]
regex_expressions = ["regex", "lazy_static"]
@@ -54,8 +49,6 @@ hashbrown = "0.11"
arrow = { git = "https://github.com/apache/arrow-rs", rev = "d008f31b107c1030a1f5144c164e8ca8bf543576", features = ["prettyprint"] }
parquet = { git = "https://github.com/apache/arrow-rs", rev = "d008f31b107c1030a1f5144c164e8ca8bf543576", features = ["arrow"] }
sqlparser = "0.9.0"
clap = "2.33"
rustyline = {version = "7.0", optional = true}
paste = "^1.0"
num_cpus = "1.13.0"
chrono = "0.4"
12 changes: 6 additions & 6 deletions datafusion/docs/cli.md
@@ -26,19 +26,19 @@ The DataFusion CLI is a command-line interactive SQL utility that allows queries
Use the following commands to clone this repository and run the CLI. This will require the Rust toolchain to be installed. Rust can be installed from [https://rustup.rs/](https://rustup.rs/).

```sh
git clone https://github.com/apache/arrow
cd arrow/rust/datafusion
cargo run --bin datafusion-cli --release
git clone https://github.com/apache/arrow-datafusion
cd arrow-datafusion/datafusion-cli
cargo run --release
```

## Run using Docker

Use the following commands to clone this repository and build a Docker image containing the CLI tool. Note that there is a `.dockerignore` file in the root of the repository that may need to be deleted for this to work.

```sh
git clone https://github.com/apache/arrow
cd arrow
docker build -f rust/datafusion/Dockerfile . --tag datafusion-cli
git clone https://github.com/apache/arrow-datafusion
cd arrow-datafusion
docker build -f datafusion-cli/Dockerfile . --tag datafusion-cli
docker run -it -v $(your_data_location):/data datafusion-cli
```

25 changes: 0 additions & 25 deletions datafusion/src/bin/main.rs

This file was deleted.
