Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CHORE] Rust Logical plan skeleton #1192

Merged
merged 7 commits into from
Aug 1, 2023
Merged

[CHORE] Rust Logical plan skeleton #1192

merged 7 commits into from
Aug 1, 2023

Conversation

xcharleslin
Copy link
Contributor

@xcharleslin xcharleslin commented Jul 26, 2023

Skeleton for new logical plan in Rust. Includes the following building blocks:

  • LogicalPlan enum. This is the core logical plan object. It currently has just one variant, Source.
  • SourceInfo enum, which contains information about what the data source is. Currently has one variant, FilesInfo, to represent file source.
  • LogicalPlanBuilder, which wraps a LogicalPlan to:
    • Serve as the API surface for Python.
    • (later) Do resolution of unresolved names to field IDs.
  • PyO3 bindings for LogicalPlanBuilder.

Next steps (separate PRs):

  • Add Daft flag to switch between old and new logical plans.
  • Have Daft create the new logical plan for file source operations. This will involve a new metadata resolution path.
  • A lightweight physical plan translation in Rust.
  • Have Daft be able to dispatch the translated physical plan, by hooking into the existing plan dispatching machinery.

@codecov
Copy link

codecov bot commented Jul 26, 2023

Codecov Report

Merging #1192 (d032844) into main (bc11e57) will increase coverage by 0.00%.
Report is 1 commits behind head on main.
The diff coverage is n/a.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #1192   +/-   ##
=======================================
  Coverage   88.40%   88.41%           
=======================================
  Files          55       55           
  Lines        5606     5617   +11     
=======================================
+ Hits         4956     4966   +10     
- Misses        650      651    +1     

see 1 file with indirect coverage changes

@xcharleslin xcharleslin changed the title Charles/logplan [CHORE] Rust Logical plan skeleton Jul 31, 2023
@github-actions github-actions bot added the chore label Jul 31, 2023
@xcharleslin xcharleslin requested a review from clarkzinzow July 31, 2023 19:59
@xcharleslin xcharleslin marked this pull request as ready for review July 31, 2023 20:01

#[derive(Clone)]
pub struct LogicalPlanBuilder {
_plan: Arc<LogicalPlan>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably don't need the underscore prefix to denote that the field is private, since struct fields are private by default and would only be made public with the pub plan qualifier.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I needed to do it for clippy since the field is not yet used 😅

Comment on lines 14 to 29
// Create a new LogicalPlanBuilder for a Source node.
pub fn new_plan_from_source(filepaths: Vec<String>, schema: SchemaRef) -> LogicalPlanBuilder {
let source_info = source_info::SourceInfo::FilesInfo(source_info::FilesInfo {
filepaths,
schema: schema.clone(),
});
let source_node = LogicalPlan::Source(ops::Source {
schema,
source_info: source_info.into(),
filters: vec![], // Will be populated by plan optimizer.
limit: None, // Will be populated by plan optimizer.
});
LogicalPlanBuilder {
_plan: source_node.into(),
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Since this is a creator for LogicalPlanBuilder, it would probably make more sense to have this as a from_source method.

We should also probably have FilesInfo::new() and Source::new() constructor methods for those structs, especially the latter which will always set empty fields that will be populated later.

Suggested change
// Create a new LogicalPlanBuilder for a Source node.
pub fn new_plan_from_source(filepaths: Vec<String>, schema: SchemaRef) -> LogicalPlanBuilder {
let source_info = source_info::SourceInfo::FilesInfo(source_info::FilesInfo {
filepaths,
schema: schema.clone(),
});
let source_node = LogicalPlan::Source(ops::Source {
schema,
source_info: source_info.into(),
filters: vec![], // Will be populated by plan optimizer.
limit: None, // Will be populated by plan optimizer.
});
LogicalPlanBuilder {
_plan: source_node.into(),
}
}
impl LogicalPlanBuilder {
// Create a new LogicalPlanBuilder for a Source node.
pub fn from_source(filepaths: Vec<String>, schema: SchemaRef) -> Self {
let source_info = source_info::SourceInfo::FilesInfo(source_info::FilesInfo::new(filepaths, schema);
let source_node = LogicalPlan::Source(ops::Source::new(schema, source_info.into());
Self {
_plan: source_node.into(),
}
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! I also made from_source take a Source struct.


#[pyclass]
#[derive(Clone)]
pub struct PyLogicalPlanBuilder {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we expose the LogicalPlanBuilder directly via pyo3 rather than wrapping, similar to what we did for the ImageMode enum?

#[cfg_attr(feature = "python", pyclass)]
pub enum ImageMode {
L = 1,
LA = 2,
RGB = 3,
RGBA = 4,
L16 = 5,
LA16 = 6,
RGB16 = 7,
RGBA16 = 8,
RGB32F = 9,
RGBA32F = 10,
}
#[cfg(feature = "python")]
#[pymethods]
impl ImageMode {
/// Create an ImageMode from its string representation.
///
/// Args:
/// mode: String representation of the mode. This is the same as the enum
/// attribute name, e.g. ``ImageMode.from_mode_string("RGB")`` would
/// return ``ImageMode.RGB``.
#[staticmethod]
pub fn from_mode_string(mode: &str) -> PyResult<Self> {
Self::from_str(mode).map_err(|e| PyValueError::new_err(e.to_string()))
}
pub fn __str__(&self) -> PyResult<String> {
Ok(self.to_string())
}
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this! Done, and removed the pyapi module entirely.

@xcharleslin xcharleslin requested a review from clarkzinzow July 31, 2023 23:23
Copy link
Contributor

@clarkzinzow clarkzinzow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

#[cfg(feature = "python")]
use daft_core::python::schema::PySchema;
#[cfg(feature = "python")]
use pyo3::prelude::*;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, if the number of these uses increases, we can have a single attribute specification apply to an entire block of imports using this syntax:

#[cfg(feature = "python")]
use {
crate::array::pseudo_arrow::PseudoArrowArray,
crate::datatypes::{ListArray, PythonArray},
crate::ffi,
crate::with_match_numeric_daft_types,
log,
ndarray::IntoDimension,
num_traits::{NumCast, ToPrimitive},
numpy::{PyArray3, PyReadonlyArrayDyn},
pyo3::prelude::*,
std::iter,
std::ops::Deref,
};

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think you were the one to make that change in your Duration dtype PR, I didn't know about that syntax until I saw your change! 😄

@xcharleslin xcharleslin merged commit bd4235e into main Aug 1, 2023
@xcharleslin xcharleslin deleted the charles/logplan branch August 1, 2023 16:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants