#6433 added a new table infrastructure.cell_info to FlowDB for recording the full history of cell information, including cells that have been excluded due to quality concerns.
The original intention behind the design of this table was for it to include all cell information, including records with duplicate cell IDs. As such, the constraint that ensures no two simultaneously-valid cells have the same ID is only applied over rows where to_include = True, so that duplicates can be included in the table provided they are excluded from use in analysis.
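As a rough illustration of how a uniqueness rule restricted to to_include = True lets excluded duplicates through, here is a minimal sketch using an in-memory SQLite database. It is not the FlowDB schema: the two-column table and the index name are assumptions made for the example, validity periods are ignored entirely, and a partial unique index stands in for whatever constraint infrastructure.cell_info actually uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, heavily simplified stand-in for infrastructure.cell_info.
conn.execute(
    """
    CREATE TABLE cell_info (
        cell_id TEXT NOT NULL,
        to_include INTEGER NOT NULL  -- 1 = used in analysis, 0 = excluded
    )
    """
)

# Partial unique index: uniqueness is only enforced over included rows.
conn.execute(
    "CREATE UNIQUE INDEX unique_included_cell "
    "ON cell_info (cell_id) WHERE to_include = 1"
)

conn.execute("INSERT INTO cell_info VALUES ('A', 1)")
conn.execute("INSERT INTO cell_info VALUES ('A', 0)")  # excluded duplicate: accepted
conn.execute("INSERT INTO cell_info VALUES ('A', 0)")  # another excluded duplicate: accepted

try:
    conn.execute("INSERT INTO cell_info VALUES ('A', 1)")
    second_included_accepted = True
except sqlite3.IntegrityError:
    # A second included row with the same ID violates the partial index.
    second_included_accepted = False

rows_for_a = conn.execute(
    "SELECT count(*) FROM cell_info WHERE cell_id = 'A'"
).fetchone()[0]
```

After this runs, the table holds three rows for the same cell ID, which is the situation that complicates any join keyed on cell ID.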
On reflection, I think this was a bad design decision. When ingesting new cell information, we want to join to the previous cell information (including "excluded" cells, so that we can carry over these exclusions and avoid re-including cells), and this join is made more complicated by the presence of duplicates in the cell info table.
I think we would do better to distinguish between "valid-but-excluded cells" and "invalid cell records". Valid-but-excluded cells (such as those with suspicious longitude/latitude coordinates) should be included in infrastructure.cell_info, to avoid re-including excluded cells in future updates. Invalid cell records (such as duplicates, or cell records with a null cell ID, which cannot be included in infrastructure.cell_info anyway due to the non-null constraint) should perhaps be kept in a separate table so that we do not lose the information, but do not need to be in infrastructure.cell_info.
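The proposed split could look something like the sketch below. Everything here is hypothetical: the function name, the record fields (cell_id, lon, lat), the coordinate range check standing in for "suspicious coordinates", and the choice to treat every record that shares a cell ID as invalid (rather than keeping one of them) are assumptions for illustration, not an existing FlowDB interface.

```python
from collections import Counter


def split_cell_records(records):
    """Split raw records into (cell_info_rows, invalid_records).

    cell_info_rows: records with a unique, non-null cell ID, each given a
        to_include flag (False for valid-but-excluded cells).
    invalid_records: records with a null or duplicated cell ID, intended
        for a separate table.
    """
    id_counts = Counter(r["cell_id"] for r in records if r["cell_id"] is not None)
    cell_info_rows, invalid_records = [], []
    for r in records:
        # Assumption: all records sharing an ID are treated as invalid.
        if r["cell_id"] is None or id_counts[r["cell_id"]] > 1:
            invalid_records.append(r)
            continue
        # Assumption: a simple range check stands in for the real quality checks.
        plausible = (
            r["lon"] is not None
            and r["lat"] is not None
            and -180 <= r["lon"] <= 180
            and -90 <= r["lat"] <= 90
        )
        cell_info_rows.append({**r, "to_include": plausible})
    return cell_info_rows, invalid_records


raw = [
    {"cell_id": "A", "lon": 85.3, "lat": 27.7},
    {"cell_id": "B", "lon": 999.0, "lat": 27.7},  # suspicious coordinates
    {"cell_id": "C", "lon": 85.1, "lat": 27.6},   # duplicate ID
    {"cell_id": "C", "lon": 85.2, "lat": 27.5},   # duplicate ID
    {"cell_id": None, "lon": 85.0, "lat": 27.0},  # null ID
]
cell_info_rows, invalid_records = split_cell_records(raw)
```

With the split done this way, cell IDs in the cell info rows are unique, so carrying exclusions over from a previous update reduces to a plain lookup by cell ID.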