proposal: database/sql: add configuration option to ignore null values in Scan #57099
Comments
Why is this necessary over using
The JSON unmarshaler does not turn
It could also mean that they have an application bug. You could make the same argument about Go turning nil pointer dereferences into the corresponding zero value.
This is preferable as it doesn't require the programmer to know/remember whether or not a given SQL column is nullable. It's not necessary though, you're right that writing all your queries this way would avoid the problem too (as would remembering to include
This is a great point, thanks! I was mistaken about how that worked. The sql package should be able to behave like the JSON unmarshaler, so I think we should instead give the option to ignore null values (and not coalesce them to zero). I'll update the proposal to be less prescriptive.
Agreed that there are scenarios where programmers would not want this behavior (as described in #33835). We should allow programmers to make this decision for themselves.
This problem has bothered me for days.
// DatabaseValueScanner captures one column value, whatever its type.
type DatabaseValueScanner struct {
	Name             string
	DatabaseTypeName string
	Value            any
}
// Scan implements sql.Scanner; a NULL column arrives as a nil src.
func (scanner *DatabaseValueScanner) Scan(src interface{}) error {
	switch value := src.(type) {
	case int8:
		scanner.Value = value
	// ...
	// ...
	case []uint8:
		// Copy the bytes into a string; the driver may reuse the slice.
		scanner.Value = string(value)
	case time.Time:
		scanner.Value = value
	case nil:
		scanner.Value = nil
	default:
		scanner.Value = value
	}
	return nil
}
Here is a common use case
// rows.Scan needs one destination per column, so size the slice by columnTypes.
pointers := make([]any, len(columnTypes))
for i := range pointers {
	pointers[i] = &DatabaseValueScanner{
		Name:             columnTypes[i].Name(),
		DatabaseTypeName: columnTypes[i].DatabaseTypeName(),
	}
}
for rows.Next() {
	if err := rows.Scan(pointers...); err != nil {
		return err
	}
	var values []any
	for _, valueScanner := range pointers {
		values = append(values, valueScanner.(*DatabaseValueScanner).Value)
	}
	dataset.Rows = append(dataset.Rows, values)
}
like #28414, but for all null values
copied from #57066:
Currently, when scanning a NULL database value into a Go type that is not nil-able, this package returns an error like "sql: Scan error on column index 0, name "foo": converting NULL to int is unsupported". This makes logical sense; nil and 0 are different values, and only the latter can be assigned to an int. The Null* types (NullString, NullInt64, etc) account for this difference, giving programmers the ability to safely bridge the gap between nullable SQL values and primitive Go types.
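For reference, here is a minimal sketch of how a Null* type records the difference between NULL and a real value. It calls sql.NullString's Scan method directly so that it runs without a database; the helper name describe is made up for this illustration.

```go
package main

import (
	"database/sql"
	"fmt"
)

// describe is a hypothetical helper: it scans src into a sql.NullString
// and reports what was stored. Scan is invoked directly here only to
// avoid needing a live database; rows.Scan calls the same method when
// the destination is a *sql.NullString.
func describe(src any) (string, error) {
	var ns sql.NullString
	if err := ns.Scan(src); err != nil {
		return "", err
	}
	if !ns.Valid {
		// A NULL column arrives as nil and clears the Valid flag.
		return "NULL", nil
	}
	return fmt.Sprintf("%q", ns.String), nil
}

func main() {
	for _, src := range []any{nil, "hello", ""} {
		s, err := describe(src)
		if err != nil {
			fmt.Println("scan error:", err)
			continue
		}
		fmt.Println(s)
	}
}
```

Note that NULL and the empty string stay distinguishable here, which is exactly the distinction this proposal would let programmers opt out of.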
I propose we give programmers the option to change that behavior and instead ignore NULL values. There are a few reasons I think this option makes sense:
Alignment with programmer intent
In Go, if I want a struct with a string field, I would write
type foo struct{ bar string }
In MySQL, if I want a table with a string column, I would write
CREATE TABLE foo (bar varchar(255));
But this produces incompatible types: in MySQL, any column can hold NULL unless you explicitly say otherwise. If we wanted to enforce that our foo.bar column could never be NULL, we would have to write
CREATE TABLE foo (bar varchar(255) NOT NULL);
This nuance is seldom clear to programmers, and is a lesson that generally only gets learned the hard way. For instance, if you google "go sql null", you will get back numerous pages of blog posts, Stack Overflow posts, etc. Whether or not it is "right", the undeniable fact is that this behavior subverts programmer expectations on a regular basis.
Furthermore, when a programmer writes some Go code to fetch SQL data and work with it in a Go program, choosing to type their variable as a non-nilable type is a signal that they do not need to write special handling for the null case. It isn't up to us to determine whether or not they should; by writing their code this way, the programmer has indicated that they have no use for this distinction. When we force them to care (at runtime, often long after the code has been written, when we unexpectedly encounter a NULL value in the database), this produces broken software that isn't always straightforward to triage/remediate.
Fundamentally, the responsibility of this package is to bridge the gap between SQL and Go. Accounting for the disparate treatment of NULL in these two languages is currently the responsibility of the programmer, but by reconciling this difference in the translation layer, we can prevent bugs and make it easier to write great Go code.
Consistency with json unmarshaling behaviors
As shown in this playground, unmarshaling JSON with null values into non-nilable types already works according to the programmer expectations described above: we leave these values untouched. Whether or not the JSON unmarshaler is "right" is up for debate (see #33835), but the fact of the matter is that this will likely have to remain the default behavior forever. The fact that these two standard library packages have opposite opinions is unfortunate, and can probably never be changed. However, we compound this issue by giving programmers no mechanism to make them behave the same way. So, every Go programmer has to be aware of this distinction and correctly design their types for the environment(s) that they expect to instantiate them from. This is clunky at best, and in practice it creates vast swaths of bug habitat.
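The encoding/json behavior described above can be reproduced with a small standalone program (this is a sketch, not the linked playground). The type foo and the helper decodeOver are illustrative names only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// foo has only non-nilable fields, like the struct in the proposal.
type foo struct {
	Bar string `json:"bar"`
	N   int    `json:"n"`
}

// decodeOver is a hypothetical helper: it unmarshals doc on top of an
// already populated value and returns the result.
func decodeOver(initial foo, doc string) (foo, error) {
	err := json.Unmarshal([]byte(doc), &initial)
	return initial, err
}

func main() {
	got, err := decodeOver(foo{Bar: "unchanged", N: 7}, `{"bar": null, "n": null}`)
	if err != nil {
		fmt.Println("unmarshal error:", err)
		return
	}
	// JSON null has no effect on string and int fields, and no error
	// is returned.
	fmt.Printf("%+v\n", got) // prints {Bar:unchanged N:7}
}
```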
It would be great to introduce an analogous configuration option in the standard json unmarshaler (or perhaps even reuse the same configuration flag), but that is out of scope for this issue.
For the above reasons, it's important to give programmers the ability to opt out of this behavior. Using Null* types or pointers to primitives remains viable for situations where we need to distinguish between NULL and other values, but in situations where the programmer does not care about this distinction, we should allow them this freedom.
I'm not totally sure of the most standard way to specify this bit of configuration; I think it would likely suffice to be a global setting (not per connection, transaction, statement, etc).
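To make the requested semantics concrete, here is a rough sketch of what "ignore NULL" could look like today as a user-land sql.Scanner. The type IgnoreNullString is hypothetical and is not part of database/sql or of this proposal's API. It handles only string and []byte sources, whereas a built-in option would presumably cover every destination type.

```go
package main

import (
	"database/sql"
	"fmt"
)

// IgnoreNullString is a hypothetical wrapper: it scans into *Dest but
// leaves *Dest untouched when the column value is NULL.
type IgnoreNullString struct {
	Dest *string
}

// Compile-time check that the wrapper satisfies sql.Scanner.
var _ sql.Scanner = IgnoreNullString{}

// Scan implements sql.Scanner. A NULL column arrives as a nil src.
func (s IgnoreNullString) Scan(src any) error {
	switch v := src.(type) {
	case nil:
		return nil // NULL: keep whatever *s.Dest already holds
	case string:
		*s.Dest = v
	case []byte:
		*s.Dest = string(v)
	default:
		return fmt.Errorf("IgnoreNullString: unsupported source type %T", src)
	}
	return nil
}

func main() {
	name := "default"
	dest := IgnoreNullString{Dest: &name}

	// Scan is called directly so the sketch runs without a database.
	if err := dest.Scan(nil); err != nil {
		fmt.Println("scan error:", err)
	}
	fmt.Println(name) // still "default" after a NULL

	if err := dest.Scan([]byte("alice")); err != nil {
		fmt.Println("scan error:", err)
	}
	fmt.Println(name) // now "alice"
}
```

In real code the wrapper would be passed as a destination, e.g. rows.Scan(IgnoreNullString{Dest: &name}). Having to wrap every destination this way is the kind of boilerplate a configuration option would avoid.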