Disallow duplicates in `infrastructure.cell_info` #6626

jc-harrison · 2024-05-24T11:24:01Z

#6433 added a new table `infrastructure.cell_info` to FlowDB for recording the full history of cell information, including cells that have been excluded due to quality concerns.

The original intention behind the design of this table was for it to include all cell information, including records with duplicate cell IDs. As such, the constraint that ensures no two simultaneously-valid cells have the same ID is only applied over rows where `to_include = True`, so that duplicates can be stored in the table provided they are excluded from use in analysis.
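To make the current behaviour concrete, here is a minimal pure-Python sketch of what the constraint checks. It is not the FlowDB schema definition: the `CellRecord` class, the `valid_from`/`valid_to` field names, and the half-open validity period are assumptions made for illustration; only `to_include` comes from the table as described above.

```python
from dataclasses import dataclass
from datetime import date
from itertools import combinations


@dataclass(frozen=True)
class CellRecord:
    cell_id: str
    valid_from: date  # hypothetical column name
    valid_to: date    # hypothetical column name; treated as exclusive
    to_include: bool


def periods_overlap(a, b):
    """True if the two records are valid at the same time."""
    return a.valid_from < b.valid_to and b.valid_from < a.valid_to


def violations(records, included_only=True):
    """Return pairs of records that share a cell ID while simultaneously valid.

    With included_only=True, rows with to_include=False are skipped,
    mirroring a constraint applied only over included rows.
    """
    candidates = [r for r in records if r.to_include or not included_only]
    return [
        (a, b)
        for a, b in combinations(candidates, 2)
        if a.cell_id == b.cell_id and periods_overlap(a, b)
    ]
```

Under these assumptions, an included row and an excluded row with the same cell ID and overlapping validity pass the check when `included_only=True`, but are reported as a duplicate pair when the check covers every row.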

On reflection, I think this was a bad design decision. When ingesting new cell information, we want to join to the previous cell information (including "excluded" cells, so that we can carry over these exclusions and avoid re-including cells), and this join is made more complicated by the presence of duplicates in the cell info table.

I think we would do better to distinguish between "valid-but-excluded cells" and "invalid cell records". Valid-but-excluded cells (such as those with suspicious longitude/latitude coordinates) should be included in `infrastructure.cell_info`, so that excluded cells are not re-included in future updates. Invalid cell records (such as duplicates, or records with a null cell ID, which cannot be included in `infrastructure.cell_info` anyway because of the non-null constraint) do not need to be in `infrastructure.cell_info`, but should perhaps be kept in a separate table so that we do not lose the information.
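The proposed distinction could be sketched roughly as follows. This is an illustration only, not the ingestion code: the function names, the dict-based records, the `longitude`/`latitude` keys, the "missing or out-of-range coordinates" test for suspicious cells, and the choice to treat every copy of a duplicated ID as invalid are all assumptions. Validity periods are ignored for brevity.

```python
from collections import Counter


def is_suspicious(record):
    """Assumed quality check: coordinates missing or outside valid ranges."""
    lon, lat = record.get("longitude"), record.get("latitude")
    if lon is None or lat is None:
        return True
    return not (-180 <= lon <= 180 and -90 <= lat <= 90)


def split_records(records, previously_excluded=frozenset()):
    """Split incoming cell records into (valid, invalid).

    valid:   unique, non-null cell IDs; each gets a to_include flag, and
             exclusions from earlier ingests are carried over.
    invalid: duplicates and null-ID records, kept separately so the
             information is not lost.
    """
    counts = Counter(
        r["cell_id"] for r in records if r.get("cell_id") is not None
    )
    valid, invalid = [], []
    for r in records:
        cell_id = r.get("cell_id")
        if cell_id is None or counts[cell_id] > 1:
            invalid.append(r)
        else:
            include = (
                not is_suspicious(r) and cell_id not in previously_excluded
            )
            valid.append({**r, "to_include": include})
    return valid, invalid
```

Because the valid set then contains at most one record per cell ID, carrying over earlier exclusions reduces to a simple lookup by ID, as the `previously_excluded` argument shows.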

jc-harrison added the FlowDB and refactoring labels May 24, 2024

jc-harrison mentioned this issue May 24, 2024

Add invalid_cell_info table and change exclude constraint on cell_info #6627

Merged

8 tasks

mergify bot closed this as completed in #6627 Jun 13, 2024
