Invalid column reference in Table.AddColumn outputs null instead of an error #223

bgribaudo · 2023-02-03T18:10:07Z

Preflight Checklist

I have installed the latest version of Power Query SDK.
I have checked existing resources, including the common issues and the release notes.
I have searched for similar issues.

Power Query SDK

0.2.1

Regression From

No response

Platform

Windows

Architecture

x64

OS Version

No response

VSCode version

No response

PQSdkTool Path

No response

Bug Description

In Power Query, when Table.AddColumn's column generator function references a non-existent column, the output generated for the new column should be an error.

Steps to Reproduce

Create a new connector project with a .test.pq file defined as follows (the connector's contents don't matter):

let
    Source = Table.View(null, [
            GetRows = () => #table({"A"}, {{1}}),
            GetType = () => Value.Type(GetRows())
        ]),
    Added = Table.AddColumn(Source, "B", each if [NonExistent] <> null then 1 else 2)
in
    Added

Evaluate the test script file.

Actual Experience

The following two-column table is returned:

A	B
1	null

JSON version of the above:

"Output": [
    {
        "A": 1,
        "B": null
    }
]

Expected Experience

Column B should contain an error, not a null value.

Additional Context

If the same M code is evaluated in Power BI's Query Editor, the following output is produced (reflects the expected error):

The text was updated successfully, but these errors were encountered:

bgribaudo · 2023-03-17T21:01:44Z

Wondering if the root issue here is not-Table.AddColumns specific, but rather is that any errors in table cells are rendered out as null by the SDK.

The following expression outputs a table whose third row contains an error in column a when run in Query Editor but contains null when run by the SDK.

Table.Pivot(
    Table.FromRecords({
        [key = "x", z = 123, attribute = "a", value = 1],
        [key = "x", z= 456, attribute = "c", value = 3],
        [key = "y", z= 456, attribute = "a", value = 2],
        [key = "y", z= 456, attribute = "a", value = 4]
    }),
    {"a", "b", "c"},
    "attribute",
    "value"
)

mattmasson · 2023-04-18T15:42:34Z

Converting errors to null is by design for our console-based output. We're limited on what we can do outside of the PQ experience. Our options are to either throw an exception or return the cell value as null. I believe the previous VS SDK exposed a project property that let you choose the behavior, but I'm not sure we need to expose that behavior. Longer term, we want to use an embedded PQ experience for testing, which should give you the behavior you're looking for.

Here is a longer response regarding errors that Curt wrote up a while back:

We distinguish informally between “top-level errors”, “row-level errors” and “cell-level errors”. These can best be described as follows. Let’s say that your query calculates and returns some table value T. After T has been calculated, we iterate over the rows of T and return the values in each row. A top-level error is one which happens in the process of calculating T. It entirely prevents T from being produced. A row-level error is one which happens when fetching the next row of T. Because the state of the enumeration is unknown, we have to abandon it and report a failure. A cell-level error is one which happens when calculating an individual value in a row of T. Because it doesn’t prevent us from reading other values or continuing to iterate through the rows, it is the least blocking.

Here's the fun part. You can “upgrade” the error by forcing it to be examined “early”. So if your calculation of the top-level value forces enumeration of the table, then a row-level error gets turned into a top-level error. And if the process of moving to the next row forces evaluation of an error value, then a cell-level error gets turned into a row-level error. Consider

let
    Source = #table({"A", "B"}, {{"1", "2"}, {"3", "A"}}),
    AsNumber = Table.TransformColumnTypes(Source, {{"B", type number}}),
    Filtered = Table.SelectRows(AsNumber, each [B] < 100),
    Buffered = Table.Buffer(Filtered)
in
    Buffered

Source has no errors.
AsNumber contains a cell-level error for the B column in the second row.
Filtered upgrades this to a row-level error because it forced evaluation of the cell-level error when moving to the second row.
Buffered upgrades this to a top-level error because it forced the entire table to be enumerated before returning.

Now, top-level and cell-level errors have pretty well-defined behavior because in both cases there’s effective a single value which is in an error state. Row-level errors are a bit of an oddball, because there are no language semantics for enumeration; it’s just something which happens. So there’s no way to trap or represent these. If in the above example you were to look at the value of “Filtered” in Excel, you’d see a single row consisting entirely of error values, and that’s the best we can do inside of M itself. It’s much easier to distinguish these in our consumption layer, because now enumeration is a first-class operation. So enumeration can be made to throw the exception instead of having to rely on a row full of errors.

bgribaudo · 2023-04-27T17:07:59Z

Thank you, @mattmasson, for sharing that explanation.

@CurtHagenlocher, you should seriously consider turning this into a blog post for the PQ blog...or as a guest contribution for my blog! :-)

bgribaudo · 2023-04-27T17:20:22Z

Hmm....with a text-based output format (like the SDK currently provides), I could imagine outputting error info on a cell-by-cell basis by rendering each individual M cell value as a "try record" of sorts...something like:

"Output": [
    {
        "A": [HasError = false, Value = 1, Type = number],
        "B": [HasError = true, Error = {error goes here)]
    }
]

Technically works, but not the prettiest. At least, it's an idea, for what it's worth (may be more of interest to provide an error-enhanced output format option like this if the SDK's base library is expanded to be a general-purpose "run PQ to produce output" tool).

Would a much simpler fail-safe for now be to add a top-level Boolean property to the output of run-test indicating whether any cell-level errors were encountered? So, in the case of the example table, the consumer (e.g. me) could notice that this property is true. While that won't reveal which cell(s) errored, or what the error(s) was, at least the consumer would be warned that they shouldn't trust the nulls they're receiving, as they might actually represent suppressed errors.

mattmasson added the answered Questions that have been answered label Apr 18, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Invalid column reference in Table.AddColumn outputs null instead of an error #223

Invalid column reference in Table.AddColumn outputs null instead of an error #223

bgribaudo commented Feb 3, 2023

bgribaudo commented Mar 17, 2023

mattmasson commented Apr 18, 2023

bgribaudo commented Apr 27, 2023

bgribaudo commented Apr 27, 2023

Invalid column reference in Table.AddColumn outputs null instead of an error #223

Invalid column reference in Table.AddColumn outputs null instead of an error #223

Comments

bgribaudo commented Feb 3, 2023

Preflight Checklist

Power Query SDK

Regression From

Platform

Architecture

OS Version

VSCode version

PQSdkTool Path

Bug Description

Steps to Reproduce

Actual Experience

Expected Experience

Additional Context

bgribaudo commented Mar 17, 2023

mattmasson commented Apr 18, 2023

bgribaudo commented Apr 27, 2023

bgribaudo commented Apr 27, 2023