Implementing more granular Errors using snafu. #913

gsilvestrin · 2023-05-29T23:12:42Z

Uses snafu to expose more granular errors in Lance's API. Check out this example to get a feel for how snafu can be used. For now I have only changed parts of the Dataset::write / Dataset::open functions.

If we go down this route, we will have a lot more error types. We can create Error types per module to avoid polluting the main one.
I kept the existing Error types to make the transition easier, but had to change them from tuple variants to struct variants (that's what makes the PR so big).
In the future, replace From<..> with context.
impl From<Error> for ArrowError: is this method used by the duckdb integration? What's the best way to handle an increasing number of Error types?

Closes #885

gsilvestrin · 2023-05-29T23:14:31Z

rust/src/dataset.rs

-            return Err(Error::IO(
-                "Attempt to write empty record batches".to_string(),
-            ));
+            return Err(Error::EmptyDataset);


Here is an example of how a generic error is replaced with a more granular one.

gsilvestrin · 2023-05-29T23:15:22Z

rust/src/dataset.rs

+        let test_uri = test_dir.path().to_str().unwrap();
+        let mut reader: Box<dyn RecordBatchReader> = Box::new(RecordBatchBuffer::empty());
+        let result = Dataset::write(&mut reader, test_uri, None).await;
+        assert!(matches!(result.unwrap_err(), Error::EmptyDataset { .. }));


Example of how the granular error can be handled
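
To make the two hunks above concrete, here is a minimal std-only sketch of the same idea. It does not use snafu: the `Display` impl is written by hand where the PR would derive it, and `write_batches` and `describe` are hypothetical stand-ins, not Lance functions.

```rust
use std::fmt;

/// Trimmed-down stand-in for the crate's error type; only two variants shown.
#[derive(Debug)]
pub enum Error {
    /// Granular: the caller passed no record batches.
    EmptyDataset,
    /// Generic catch-all kept for the transition.
    IO { message: String },
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::EmptyDataset => write!(f, "Attempt to write empty record batches"),
            Error::IO { message } => write!(f, "IO error: {message}"),
        }
    }
}

impl std::error::Error for Error {}

/// Hypothetical writer: rejects empty input, otherwise reports rows written.
pub fn write_batches(batches: &[Vec<i32>]) -> Result<usize, Error> {
    if batches.is_empty() {
        return Err(Error::EmptyDataset);
    }
    Ok(batches.iter().map(Vec::len).sum())
}

/// A caller can branch on the variant instead of parsing a message string.
pub fn describe(result: &Result<usize, Error>) -> String {
    match result {
        Ok(rows) => format!("wrote {rows} rows"),
        Err(Error::EmptyDataset) => "nothing to write".to_string(),
        Err(other) => format!("failed: {other}"),
    }
}
```

The point of the granular variant is the second `match` arm: with the old string-carrying `Error::IO`, the caller would have had to compare message text.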

gsilvestrin · 2023-05-29T23:15:59Z

rust/src/error.rs

+    #[snafu(display("Append with different schema: original={original} new={new}"))]
+    SchemaMismatch { original: Schema, new: Schema },
+    #[snafu(display("Dataset at path {path} was not found: {source}"))]
+    DatasetNotFound {


Example of how the error is defined

Is there guidance on the granularity at which we should define Error enums?

For example, SchemaMismatch could be a sub-error of Schema. Similarly, the other Dataset ones could be sub-errors of Error::IO.

I think the idea is that each module will declare its own error types; for instance, the encoding module might have SchemaMismatch / CorruptedData / ... Then the dataset.write method can wrap them with added context by using the source field, like Intermediate in this example: https://docs.rs/snafu/latest/snafu/guide/examples/basic/enum.Error.html
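
A rough std-only sketch of that layering, under stated assumptions: `EncodingError`, `DatasetError`, `encode` and `write_dataset` are hypothetical names, schemas are plain strings, and the `Display` / `source` impls are hand-written where snafu's derive would normally generate them.

```rust
use std::error::Error as StdError;
use std::fmt;

/// Hypothetical error type owned by an `encoding` module.
#[derive(Debug)]
pub enum EncodingError {
    SchemaMismatch { original: String, new: String },
    CorruptedData { offset: u64 },
}

impl fmt::Display for EncodingError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            EncodingError::SchemaMismatch { original, new } => {
                write!(f, "Append with different schema: original={original} new={new}")
            }
            EncodingError::CorruptedData { offset } => {
                write!(f, "Corrupted data at offset {offset}")
            }
        }
    }
}

impl StdError for EncodingError {}

/// Hypothetical dataset-level error: adds context and keeps the typed cause.
#[derive(Debug)]
pub enum DatasetError {
    Write { uri: String, source: EncodingError },
}

impl fmt::Display for DatasetError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DatasetError::Write { uri, source } => {
                write!(f, "Failed to write dataset at {uri}: {source}")
            }
        }
    }
}

impl StdError for DatasetError {
    // Exposes the module-level error as the cause of the dataset-level one.
    fn source(&self) -> Option<&(dyn StdError + 'static)> {
        match self {
            DatasetError::Write { source, .. } => Some(source),
        }
    }
}

/// Stand-in for a module-level operation that can fail.
fn encode(page: &[u8]) -> Result<usize, EncodingError> {
    if page.is_empty() {
        return Err(EncodingError::CorruptedData { offset: 0 });
    }
    Ok(page.len())
}

/// The dataset layer wraps the lower-level error instead of flattening it.
pub fn write_dataset(uri: &str, page: &[u8]) -> Result<usize, DatasetError> {
    encode(page).map_err(|source| DatasetError::Write {
        uri: uri.to_string(),
        source,
    })
}
```

Because `source` keeps its concrete type, callers can still match on `EncodingError::CorruptedData` through the wrapper, which an opaque `dyn Error` source would not allow.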

I think one of the challenges of Rust errors is making them about what type of issue occurred rather than where they came from. This is because you either have to use &dyn Error as the source, which makes the error opaque, or create a variant for every possible source.

IMO, ideally we'd have error variants that map closer to what are top-level exceptions in other languages. So all errors about invalid arguments are one variant, all errors about IO failures are another, etc. That will make it easier to map them to proper exceptions in the Node and Python bindings.

For the operations, I think we might wrap them up into another enum for brevity. So I'd imagine the end result looks something like:

enum LanceError {
    #[snafu(display("Invalid input: {source}"))]
    InvalidInput { source: &dyn Error }, // Python: ValueError, Node: RangeError
    #[snafu(display("Operation not allowed: {source}"))]
    InvalidOperation { source: OperationError },
    #[snafu(display("Operation failed due to IO error: {source}"))]
    IOError { source: &dyn Error }, // Python: IOError
    #[snafu(display("Not authenticated or unauthorized: {source}"))]
    AuthError { source: &dyn Error },
}

enum OperationError {
    #[snafu(display("Attempt to write empty record batches"))]
    EmptyDataset,
    #[snafu(display("Dataset already exists: {uri}"))]
    DatasetAlreadyExists { uri: String },
    #[snafu(display("Append with different schema: original={original} new={new}"))]
    SchemaMismatch { original: Schema, new: Schema },
    #[snafu(display("Dataset at path {path} was not found: {source}"))]
    DatasetNotFound { }
}

I like the idea of having some Errors that are reusable (such as InvalidInput), but at the end of the day we do need a single top-level Result / Error type for user APIs - how would that work when we have multiple enums such as LanceError and OperationError?
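
One possible answer, sketched without snafu and with several assumptions: the top-level enum stays the only type in public signatures, and a `From` impl lets `?` lift the nested enum into it. `LanceResult`, `create`, `check_not_exists` and `python_exception` are hypothetical names; `Box<dyn Error + Send + Sync>` replaces the `&dyn Error` of the sketch above so the error can own its source; and the `RuntimeError` mapping is a guess, since the thread only gives mappings for InvalidInput and IOError.

```rust
use std::error::Error as StdError;
use std::fmt;

type BoxedSource = Box<dyn StdError + Send + Sync + 'static>;

/// Nested enum: the specific reasons an operation was rejected.
#[derive(Debug)]
pub enum OperationError {
    EmptyDataset,
    DatasetAlreadyExists { uri: String },
}

impl fmt::Display for OperationError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            OperationError::EmptyDataset => write!(f, "Attempt to write empty record batches"),
            OperationError::DatasetAlreadyExists { uri } => {
                write!(f, "Dataset already exists: {uri}")
            }
        }
    }
}

impl StdError for OperationError {}

/// Single top-level error type exposed by user-facing APIs.
#[derive(Debug)]
pub enum LanceError {
    InvalidInput { source: BoxedSource },
    InvalidOperation { source: OperationError },
    IOError { source: BoxedSource },
}

impl fmt::Display for LanceError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LanceError::InvalidInput { source } => write!(f, "Invalid input: {source}"),
            LanceError::InvalidOperation { source } => write!(f, "Operation not allowed: {source}"),
            LanceError::IOError { source } => {
                write!(f, "Operation failed due to IO error: {source}")
            }
        }
    }
}

impl StdError for LanceError {}

// Lets `?` turn an OperationError into the top-level type.
impl From<OperationError> for LanceError {
    fn from(source: OperationError) -> Self {
        LanceError::InvalidOperation { source }
    }
}

/// Hypothetical alias: the one Result type user APIs would return.
pub type LanceResult<T> = std::result::Result<T, LanceError>;

fn check_not_exists(uri: &str, existing: &[&str]) -> Result<(), OperationError> {
    if existing.contains(&uri) {
        return Err(OperationError::DatasetAlreadyExists { uri: uri.to_string() });
    }
    Ok(())
}

pub fn create(uri: &str, existing: &[&str]) -> LanceResult<()> {
    if uri.is_empty() {
        return Err(LanceError::InvalidInput { source: "uri must not be empty".into() });
    }
    check_not_exists(uri, existing)?; // OperationError -> LanceError via From
    Ok(())
}

/// Binding-side mapping from top-level variant to a Python exception name.
pub fn python_exception(err: &LanceError) -> &'static str {
    match err {
        LanceError::InvalidInput { .. } => "ValueError",
        LanceError::IOError { .. } => "IOError",
        LanceError::InvalidOperation { .. } => "RuntimeError", // assumption, not from the thread
    }
}
```

With this shape, bindings only need to match on the handful of top-level variants, while Rust callers can still drill into `OperationError` when they care about the specific cause.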

eddyxu · 2023-05-30T00:03:33Z

rust/src/error.rs

+    #[snafu(display("Append with different schema: original={original} new={new}"))]
+    SchemaMismatch { original: Schema, new: Schema },
+    #[snafu(display("Dataset at path {path} was not found: {source}"))]
+    DatasetNotFound {


Is there guidance on the granularity at which we should define Error enums?

For example, SchemaMismatch could be a sub-error of Schema. Similarly, the other Dataset ones could be sub-errors of Error::IO.

eddyxu · 2023-05-30T00:47:41Z

rust/src/bin/lq.rs

-        .as_ref()
-        .ok_or_else(|| Error::Index("Must specify column".to_string()))?;
-    let _ = index_type.ok_or_else(|| Error::Index("Must specify index type".to_string()))?;
+    let col = column.as_ref().ok_or_else(|| Error::Index {


This requires more typing than before. Can we add a few helper methods, such as

impl Error { fn index(msg: &str) -> Self {} }

and so on? Otherwise we need to check the internal structure of each Error type every time.

Yes, I think for common cases that would be good. To handle strings maybe

impl Error { fn index(msg: impl Into<String>) -> Self { ... } }

The existing Error::Index will be deprecated - we won't write new code that uses it. The snafu way is to define a new Error:

#[snafu(display("Must specify index type"))] MissingIndexType,

And then add context in the actual function:

let _ = index_type.context(MissingIndexTypeSnafu)?;

Or use the ensure! macro (which could replace most panics in our code):

ensure!(index_type.is_some(), MissingIndexTypeSnafu);

But making all these changes right now takes time; that's why I implemented Error::Index this way.
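
The two styles discussed in this thread can be compared side by side in a small std-only sketch. `check_args` is a hypothetical function modelled on the `lq` hunk, `"ivf_pq"` in the usage note is just an illustrative value, and `Option::ok_or` stands in for snafu's `context`, which plays the same role with a generated context selector.

```rust
/// Reduced stand-in for the crate's error type.
#[derive(Debug, PartialEq)]
pub enum Error {
    /// Transitional variant that still carries a free-form message.
    Index { message: String },
    /// Granular variant: no message needed, the variant says it all.
    MissingIndexType,
}

impl Error {
    /// Helper constructor so call sites need not know the field layout.
    /// `impl Into<String>` accepts both `&str` and `String`.
    pub fn index(message: impl Into<String>) -> Self {
        Error::Index { message: message.into() }
    }
}

/// Hypothetical argument check showing both styles.
pub fn check_args<'a>(
    column: Option<&'a str>,
    index_type: Option<&'a str>,
) -> Result<(&'a str, &'a str), Error> {
    // Transitional style: helper constructor around a message string.
    let col = column.ok_or_else(|| Error::index("Must specify column"))?;
    // Granular style: a dedicated variant, no string at the call site.
    let ty = index_type.ok_or(Error::MissingIndexType)?;
    Ok((col, ty))
}
```

For example, `check_args(Some("vec"), None)` yields `Err(Error::MissingIndexType)`, which a test can compare directly, whereas the `Index` variant can only be checked by message text.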

wjones127 · 2023-05-30T15:52:21Z

rust/src/dataset.rs

+                object_store::Error::NotFound { path: _, source } => Error::DatasetNotFound {
+                    path: base_path.to_string(),
+                    source,
+                },
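
The mapping in this hunk can be sketched without the object_store crate. In the sketch below, `std::io::Error` is used as a stand-in for `object_store::Error` (an assumption made only to keep the example self-contained), and `map_open_error` is a hypothetical helper, not a function from the PR.

```rust
use std::fmt;
use std::io;

/// Reduced stand-in for the crate's error type.
#[derive(Debug)]
pub enum Error {
    DatasetNotFound { path: String, source: io::Error },
    IO { message: String },
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::DatasetNotFound { path, source } => {
                write!(f, "Dataset at path {path} was not found: {source}")
            }
            Error::IO { message } => write!(f, "IO error: {message}"),
        }
    }
}

impl std::error::Error for Error {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            Error::DatasetNotFound { source, .. } => Some(source),
            Error::IO { .. } => None,
        }
    }
}

/// Translates a low-level failure into the granular variant when it is a
/// "not found" error, and falls back to the generic variant otherwise.
pub fn map_open_error(base_path: &str, err: io::Error) -> Error {
    match err.kind() {
        io::ErrorKind::NotFound => Error::DatasetNotFound {
            path: base_path.to_string(),
            source: err,
        },
        _ => Error::IO { message: err.to_string() },
    }
}
```

Note that the granular variant reports the dataset's base path rather than the path of whichever object was missing, which matches the `path: _` pattern in the hunk.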


wjones127 · 2023-05-30T15:56:52Z

rust/src/dataset/fragment.rs

+            return Err(Error::IO {
+                message: "Cannot create FragmentReader with zero readers".to_string(),
+            });


Could we make an Error::InvalidInput error for this kind of validation error?

I think so. I would like to keep this PR focused on the few use cases I have for Dataset; the PR is already pretty big :)

wjones127 · 2023-05-30T15:58:53Z

rust/src/bin/lq.rs

-        .as_ref()
-        .ok_or_else(|| Error::Index("Must specify column".to_string()))?;
-    let _ = index_type.ok_or_else(|| Error::Index("Must specify index type".to_string()))?;
+    let col = column.as_ref().ok_or_else(|| Error::Index {


Yes, I think for common cases that would be good. To handle strings maybe

impl Error { fn index(msg: impl Into<String>) -> Self { ... } }

wjones127 · 2023-05-30T17:01:45Z

rust/src/error.rs

+    #[snafu(display("Append with different schema: original={original} new={new}"))]
+    SchemaMismatch { original: Schema, new: Schema },
+    #[snafu(display("Dataset at path {path} was not found: {source}"))]
+    DatasetNotFound {


I think one of the challenges of Rust errors is making them about what type of issue occurred rather than where they came from. This is because you either have to use &dyn Error as the source, which makes the error opaque, or create a variant for every possible source.

IMO, ideally we'd have error variants that map closer to what are top-level exceptions in other languages. So all errors about invalid arguments are one variant, all errors about IO failures are another, etc. That will make it easier to map them to proper exceptions in the Node and Python bindings.

For the operations, I think we might wrap them up into another enum for brevity. So I'd imagine the end result looks something like:

enum LanceError {
    #[snafu(display("Invalid input: {source}"))]
    InvalidInput { source: &dyn Error }, // Python: ValueError, Node: RangeError
    #[snafu(display("Operation not allowed: {source}"))]
    InvalidOperation { source: OperationError },
    #[snafu(display("Operation failed due to IO error: {source}"))]
    IOError { source: &dyn Error }, // Python: IOError
    #[snafu(display("Not authenticated or unauthorized: {source}"))]
    AuthError { source: &dyn Error },
}

enum OperationError {
    #[snafu(display("Attempt to write empty record batches"))]
    EmptyDataset,
    #[snafu(display("Dataset already exists: {uri}"))]
    DatasetAlreadyExists { uri: String },
    #[snafu(display("Append with different schema: original={original} new={new}"))]
    SchemaMismatch { original: Schema, new: Schema },
    #[snafu(display("Dataset at path {path} was not found: {source}"))]
    DatasetNotFound { }
}

wjones127

This is a good first pass. I think we'll want follow-ups:

Create general errors that can be mapped to Python/Node exceptions (e.g. InvalidInput -> ValueError).
Explore snafu's backtrace support to make debugging easier.

gsilvestrin · 2023-05-30T23:18:44Z

Thanks @wjones127. Agreed, this will need follow-ups, especially on how to manage the growing number of Errors (having InvalidInput would help here). Ideally we can create more Errors as we work on other tasks so it doesn't feel so overwhelming.

Implementing more granular Errors using snafu.

3fe8016

gsilvestrin commented May 29, 2023

Updating lq.rs

c5d28fc

gsilvestrin marked this pull request as ready for review May 29, 2023 23:45

gsilvestrin requested review from eddyxu and changhiskhan May 29, 2023 23:45

eddyxu reviewed May 30, 2023

wjones127 reviewed May 30, 2023

wjones127 approved these changes May 30, 2023

gsilvestrin merged commit 505dbcd into main May 30, 2023

gsilvestrin deleted the gsilvestrin/snafu branch May 30, 2023 23:18

eddyxu pushed a commit that referenced this pull request May 30, 2023

Implementing more granular Errors using snafu. (#913)

fe7a0a1
