Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid column reference in Table.AddColumn outputs null instead of an error #223

Open
3 tasks done
bgribaudo opened this issue Feb 3, 2023 · 4 comments
Open
3 tasks done
Labels
answered Questions that have been answered

Comments

@bgribaudo
Copy link
Contributor

Preflight Checklist

Power Query SDK

0.2.1

Regression From

No response

Platform

Windows

Architecture

x64

OS Version

No response

VSCode version

No response

PQSdkTool Path

No response

Bug Description

In Power Query, when Table.AddColumn's column generator function references a non-existent column, the output generated for the new column should be an error.

Steps to Reproduce

  1. Create a new connector project with a .test.pq file defined as follows (the connector's contents don't matter):
    let
        Source = Table.View(null, [
                GetRows = () => #table({"A"}, {{1}}),
                GetType = () => Value.Type(GetRows())
            ]),
        Added = Table.AddColumn(Source, "B", each if [NonExistent] <> null then 1 else 2)
    in
        Added
    
  2. Evaluate the test script file.

Actual Experience

The following two-column table is returned:

A B
1 null

JSON version of the above:

"Output": [
    {
        "A": 1,
        "B": null
    }
]

Expected Experience

Column B should contain an error, not a null value.

Additional Context

If the same M code is evaluated in Power BI's Query Editor, the following output is produced (reflects the expected error):
image

@bgribaudo
Copy link
Contributor Author

Wondering if the root issue here is not-Table.AddColumns specific, but rather is that any errors in table cells are rendered out as null by the SDK.

The following expression outputs a table whose third row contains an error in column a when run in Query Editor but contains null when run by the SDK.

Table.Pivot(
    Table.FromRecords({
        [key = "x", z = 123, attribute = "a", value = 1],
        [key = "x", z= 456, attribute = "c", value = 3],
        [key = "y", z= 456, attribute = "a", value = 2],
        [key = "y", z= 456, attribute = "a", value = 4]
    }),
    {"a", "b", "c"},
    "attribute",
    "value"
)

@mattmasson
Copy link
Member

Converting errors to null is by design for our console-based output. We're limited on what we can do outside of the PQ experience. Our options are to either throw an exception or return the cell value as null. I believe the previous VS SDK exposed a project property that let you choose the behavior, but I'm not sure we need to expose that behavior. Longer term, we want to use an embedded PQ experience for testing, which should give you the behavior you're looking for.

Here is a longer response regarding errors that Curt wrote up a while back:

We distinguish informally between “top-level errors”, “row-level errors” and “cell-level errors”. These can best be described as follows. Let’s say that your query calculates and returns some table value T. After T has been calculated, we iterate over the rows of T and return the values in each row. A top-level error is one which happens in the process of calculating T. It entirely prevents T from being produced. A row-level error is one which happens when fetching the next row of T. Because the state of the enumeration is unknown, we have to abandon it and report a failure. A cell-level error is one which happens when calculating an individual value in a row of T. Because it doesn’t prevent us from reading other values or continuing to iterate through the rows, it is the least blocking.

Here's the fun part. You can “upgrade” the error by forcing it to be examined “early”. So if your calculation of the top-level value forces enumeration of the table, then a row-level error gets turned into a top-level error. And if the process of moving to the next row forces evaluation of an error value, then a cell-level error gets turned into a row-level error. Consider

let
    Source = #table({"A", "B"}, {{"1", "2"}, {"3", "A"}}),
    AsNumber = Table.TransformColumnTypes(Source, {{"B", type number}}),
    Filtered = Table.SelectRows(AsNumber, each [B] < 100),
    Buffered = Table.Buffer(Filtered)
in
    Buffered

Source has no errors.
AsNumber contains a cell-level error for the B column in the second row.
Filtered upgrades this to a row-level error because it forced evaluation of the cell-level error when moving to the second row.
Buffered upgrades this to a top-level error because it forced the entire table to be enumerated before returning.

Now, top-level and cell-level errors have pretty well-defined behavior because in both cases there’s effective a single value which is in an error state. Row-level errors are a bit of an oddball, because there are no language semantics for enumeration; it’s just something which happens. So there’s no way to trap or represent these. If in the above example you were to look at the value of “Filtered” in Excel, you’d see a single row consisting entirely of error values, and that’s the best we can do inside of M itself. It’s much easier to distinguish these in our consumption layer, because now enumeration is a first-class operation. So enumeration can be made to throw the exception instead of having to rely on a row full of errors.

@mattmasson mattmasson added the answered Questions that have been answered label Apr 18, 2023
@bgribaudo
Copy link
Contributor Author

Thank you, @mattmasson, for sharing that explanation.

@CurtHagenlocher, you should seriously consider turning this into a blog post for the PQ blog...or as a guest contribution for my blog! :-)

@bgribaudo
Copy link
Contributor Author

Hmm....with a text-based output format (like the SDK currently provides), I could imagine outputting error info on a cell-by-cell basis by rendering each individual M cell value as a "try record" of sorts...something like:

"Output": [
    {
        "A": [HasError = false, Value = 1, Type = number],
        "B": [HasError = true, Error = {error goes here)]
    }
]

Technically works, but not the prettiest. At least, it's an idea, for what it's worth (may be more of interest to provide an error-enhanced output format option like this if the SDK's base library is expanded to be a general-purpose "run PQ to produce output" tool).

Would a much simpler fail-safe for now be to add a top-level Boolean property to the output of run-test indicating whether any cell-level errors were encountered? So, in the case of the example table, the consumer (e.g. me) could notice that this property is true. While that won't reveal which cell(s) errored, or what the error(s) was, at least the consumer would be warned that they shouldn't trust the nulls they're receiving, as they might actually represent suppressed errors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
answered Questions that have been answered
Projects
None yet
Development

No branches or pull requests

2 participants