-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace negative IMD values with NULLs #1548
Comments
Thanks for catching this. We should check the performance impact, but I'm less worried about a change like this because (a) the What would have caught this earlier is the thing we've discussed several times, some sort of Maybe I should write a placeholder ticket for this so we've got somewhere to link relevant issues to. |
Although the issue mentions negative IMD values, the only negative IMD values are -1s. So, we can use NULLIF to replace -1s with NULLs. I've checked the performance impact of this change by comparing the execution time of the new query (NULLIF) to the old query. The first template query counted the number of rows where imd_rounded was GTE zero. In this case, there was an 88ms difference in elapsed time between new and old, favouring old. The second template query counted the number of rows where imd_rounded was NULL (new query) or was -1 (old query). In this case, there was a 33ms difference in elapsed time between new and old, favouring new. Fixes #1548
In TPP, -1 is used to indicate a missing IMD value. We should replace this with NULL in the table definition in the TPP backend. As with #1523, we should check the performance impact before making any changes.
There's unlikely to be a problem with backwards compatibility, because if we write a dataset to arrow, we'll try to use an unsigned int for this column (because of this constraint) and it'll crash. But it's possible users are writing to CSV, so we should double check, through GH code search.
The text was updated successfully, but these errors were encountered: