Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace ceiling dates with NULL in TPP tables #1523

Closed
rebkwok opened this issue Aug 23, 2023 · 17 comments
Closed

Replace ceiling dates with NULL in TPP tables #1523

rebkwok opened this issue Aug 23, 2023 · 17 comments
Assignees

Comments

@rebkwok
Copy link
Contributor

rebkwok commented Aug 23, 2023

In some TPP tables, missing or unknown dates are represented by a ceiling date (9999-12-31), which can cause confusion or coding errors (see e.g. https://bennettoxford.slack.com/archives/C01D7H9LYKB/p1692786760750299)

We already replace ceiling end dates in the practice_registrations table, and consultation date in the clinical_events table (as of #1522). For any other date fields in TPP tables we should consider whether ceiling dates should be replaced with NULL.

For each date/datetime field, we should check:

  • is it nullable? If not, does the database contains ceiling dates?
  • could a ceiling date for this field represent something meaningful or useful?
  • will any existing ehrQL studies be affected by changing ceiling dates to NULL?
  • what are the performance implications of replacing ceiling dates with NULL?

Date/datetime fields in TPP (based on database schema report):

OPENPROMPT - Non-nullable; no ceiling dates in data

  • OpenPROMPT.ConsultationDate
  • OpenPROMPT.CreationDate

ICNARC - All nullable, no ceiling dates

  • ICNARC.DateOfDeath
  • ICNARC.HospitalAdmissionDate
  • ICNARC.HospitalDischargeDate
  • ICNARC.IcuAdmissionDateTime
  • ICNARC.IcuDischargeDateTime
  • ICNARC.OriginalHospitalAdmissionDate
  • ICNARC.OriginalIcuAdmissionDate
  • ICNARC.UltimateHospitalDischargeDate
  • ICNARC.UltimateIcuDischargeDate

ISARIC_Patient_Data_Topline (all varchar, all nullable - the following are converted to date in cohort-extractor):

(Only used in cohort-extractor, ehrql uses ISARIC_New)

  • assess_or_admit_date
  • cestdat
  • daily_dsstdat
  • daily_lbdat
  • dlvrdtc_rptestcd
  • dsstdat
  • dsstdtc
  • hostdat
  • hostdat_transfer
  • hosttim (cohort extractor converts to date, but the field is a timestring in format 00:00:00)
  • influenza_2021_vaccined
  • symp_or_assess_date

ISARIC_New

All varchar, all nullable, date-like fields can contain "NA"; currently in exploration stage in ehrQL
Datefield used in ehrql:

  • covid19_vaccined
  • covid19_vaccine2d
  • cestdat
  • hostdat (includes impossible future dates - max 3030-04-03)
  • hostdat_transfer
  • dsstdat
  • dsstdtc

APCS

  • APCS.Admission_Date (Nullable, no floor/ceiling values)
  • APCS.Discharge_Date (Nullable, possible default min date - 1900-01-01)

CPNS (all nullable, no ceiling dates)

  • CPNS.DateOfAdmission
  • CPNS.DateOfDeath
  • CPNS.DateOfResult (possible default min date - 1901-01-01)
  • CPNS.DateOfSwabbed (possible default min date - 1901-01-01)
  • CPNS.NationalApprovedDate
  • CPNS.RegionalApprovedDate
  • CPNS.snapDate

EC (all Nullable)

  • Arrival_Date
  • EC_Decision_To_Admit_Date (possible default min date - 1901-01-01, impossible future dates, no ceiling dates)
  • EC_Injury_Date (possible default min date - 1901-01-01, impossible future dates, no ceiling dates)

OPA (all nullable)

  • Appointment_Date
  • Referral_Request_Received_Date (possible default min date - 1901-01-01, impossible future dates, no ceiling dates)

SGSS_AllTests_Negative (All nullable, no floor/ceiling dates)

  • Lab_Report_Date
  • Specimen_Date

SGSS_AllTests_Positive (All nullable, no floor/ceiling dates)

  • Lab_Report_Date
  • Specimen_Date

SGSS_Negative (All nullable, no floor/ceiling dates)

  • Earliest_Specimen_Date
  • Lab_Report_Date

SGSS_Positive (All nullable, no floor/ceiling dates)

  • Earliest_Specimen_Date
  • Lab_Report_Date

Therapeutics (All nullable, no floor/ceiling dates)

  • Received
  • TreatmentStartDate (includes impossible future date - 5202)

WL_ClockStops (datelike fields, all varchar, all nullable, all except RTT have min date 1900-01-01, several have impossible future dates, no ceiling dates)

  • Cancellation_Date
  • Date_Last_Attended
  • Date_Of_Last_Priority_Review
  • DECISION_TO_ADMIT_DATE
  • Due_Date
  • Last_Dna_Date
  • Outpatient_Appointment_Date
  • Outpatient_Future_Appointment_Date
  • Referral_Request_Received_Date
  • REFERRAL_TO_TREATMENT_PERIOD_END_DATE
  • REFERRAL_TO_TREATMENT_PERIOD_START_DATE
  • Tci_Date
  • Week_Ending_Date

WL_Diagnostics (datelike fields, all varchar, all nullable. A couple have ceiling dates - noted below - several have impossible min dates that look like defaults (1900-01-01) and impossible max dates that are not ceiling)

  • Diagnostic_Clock_Start_Date
  • Planned_Diagnostic_Due_Date (ceiling dates)
  • REFERRAL_TO_TREATMENT_PERIOD_END_DATE
  • REFERRAL_TO_TREATMENT_PERIOD_START_DATE
  • Week_Ending_Date

WL_OpenPathways (datelike fields, all varchar, all nullable. A couple have ceiling dates - noted below - several have impossible min dates that look like defaults (1900-01-01) and impossible max dates that are not ceiling)

  • Cancellation_Date
  • Current_Pathway_Period_Start_Date
  • Date_Last_Attended
  • Date_Of_Last_Priority_Review
  • DECISION_TO_ADMIT_DATE
  • Due_Date (ceiling 9999-12-31)
  • Last_Dna_Date
  • Outpatient_Appointment_Date
  • Outpatient_Future_Appointment_Date
  • REFERRAL_REQUEST_RECEIVED_DATE
  • REFERRAL_TO_TREATMENT_PERIOD_END_DATE (ceiling 9999-12-31)
  • REFERRAL_TO_TREATMENT_PERIOD_START_DATE
  • Tci_Date
  • Week_Ending_Date

ONS_Deaths (Nullable, no ceiling dates)

  • dod

DecisionSupportValue (Nullable, no ceiling dates)

  • CalculationDateTime

Appointment (all non Nullable, most contain default min 1900-01-01)

  • ArrivedDate
  • BookedDate
  • EndDate
  • FinishedDate (contains ceiling dates)
  • SeenDate (contains ceiling dates)
  • StartDate

Consultation

  • ConsultationDate (contains ceiling dates and impossible min date (1760))

MedicationIssue (all non-nullable)

  • ConsultationDate (impossible min/max dates, no ceiling)
  • EndDate (contains ceiling date)
  • StartDate (impossible min/max dates, no ceiling)

Organisation (non nullable, no ceiling dates)

  • GoLiveDate

Patient (non nullable)

  • DateOfBirth
  • DateOfDeath (contains ceiling date)

PatientAddress (non nullable)

  • EndDate (contains ceiling date)
  • StartDate (contains ceiling date)

RegistrationHistory

  • EndDate (contains ceiling date)
  • StartDate (contains min 1900-01-01)

Vaccination

  • VaccinationDate (nonnullable, max 9977, no ceiling)

UKRR

  • rrt_start (nullable, no ceiling)
@evansd
Copy link
Contributor

evansd commented Aug 23, 2023

Just to add extra consideration here (as mentioned in this comment), we should be mindful of the performance impact of making these transformations and make sure the updated queries perform adequately.

@rebkwok
Copy link
Contributor Author

rebkwok commented Sep 5, 2023

New query:

SELECT COUNT(*) FROM (
    SELECT Patient_ID AS patient_id,
            CASE
                WHEN ConsultationDate != '9999-12-31T00:00:00' THEN CAST(ConsultationDate AS date)
            END AS date,
            NULL AS snomedct_code,
            CTV3Code AS ctv3_code,
            NumericValue AS numeric_value
        FROM CodedEvent
        UNION ALL
        SELECT
            Patient_ID AS patient_id,
            CASE
                WHEN ConsultationDate != '9999-12-31T00:00:00' THEN CAST(ConsultationDate AS date)
            END AS date,
            ConceptId AS snomedct_code,
            NULL AS ctv3_code,
            NumericValue AS numeric_value
        FROM CodedEvent_SNOMED
) t
WHERE t >= 20220101 AND t <= 20220131

Original query:

SELECT COUNT(*) FROM (
            SELECT
                Patient_ID AS patient_id,
                CAST(ConsultationDate AS date) AS date,
                NULL AS snomedct_code,
                CTV3Code AS ctv3_code,
                NumericValue AS numeric_value
            FROM CodedEvent
            UNION ALL
            SELECT
                Patient_ID AS patient_id,
                CAST(ConsultationDate AS date) AS date,
                ConceptId AS snomedct_code,
                NULL AS ctv3_code,
                NumericValue AS numeric_value
            FROM CodedEvent_SNOMED
) t
WHERE t >= 20220101 AND t <= 20220131

Run with 1 month (2022-01-01 to 2022-01-31) and 6 months (2022-01-01 to 2022-06-30) ranges

filter range run new cpu new elapsed original cpu original elapsed
1 month 1 4,836,105 239,511 25,719 215,253
1 month 2 5,036,597 183,397 32,704 65,780
1 month 3 4,869,610 262,010 32,672 68,357
6 months 1 4,958,292 259,757 211,813 883,616
6 months 2 4,974,973 180,155 213,829 736,947
6 months 3 4,965,485 273,178 228,656 797,122

@evansd
Copy link
Contributor

evansd commented Sep 5, 2023

Wow, what a confusing set of results! But it goes to show that doing this kind of thing is worthwhile, I think.

It's just occurred to me however that is a bit of an unrepresentative test because it doesn't filter on any codes at all, and that means it actually has to do the union and query both of the underlying tables. In practice, we always (or near enough to always as makes no difference) supply either a ctv3_code or snomedct_code filter and that means that the query planner can immediately discard one half of the union and only query a single table.

At the risk of being a massive pain, I think it would be worth re-running with a WHERE ctv3_code IN (A, B, C) and, separately, a WHERE snomedct_code IN (X, Y, Z). I guess it would make sense to choose 3 reasonably common codes, and to use the equivalent CTV3 and SNOMED-CT codes.

The other thing I'd be interested in is whether it makes any difference using NULLIF(ConsulationDate, '9999-12-31T00:00:00') instead of the CASE expression. Semantically they're equivalent of course, but I wouldn't be entirely surprised to find that the query planner can optimise through one and not the other.

@rebkwok
Copy link
Contributor Author

rebkwok commented Sep 5, 2023

New query:

SELECT COUNT(*) FROM (
    SELECT Patient_ID AS patient_id,
            CASE
                WHEN ConsultationDate != '9999-12-31T00:00:00' THEN CAST(ConsultationDate AS date)
            END AS date,
            NULL AS snomedct_code,
            CTV3Code AS ctv3_code,
            NumericValue AS numeric_value
        FROM CodedEvent
        UNION ALL
        SELECT
            Patient_ID AS patient_id,
            CASE
                WHEN ConsultationDate != '9999-12-31T00:00:00' THEN CAST(ConsultationDate AS date)
            END AS date,
            ConceptId AS snomedct_code,
            NULL AS ctv3_code,
            NumericValue AS numeric_value
        FROM CodedEvent_SNOMED
) t
WHERE t.ctv3_code in (...) AND t >= 20220101 AND t <= 20220131

Original Query:

SELECT COUNT(*) FROM (
            SELECT
                Patient_ID AS patient_id,
                CAST(ConsultationDate AS date) AS date,
                NULL AS snomedct_code,
                CTV3Code AS ctv3_code,
                NumericValue AS numeric_value
            FROM CodedEvent
            UNION ALL
            SELECT
                Patient_ID AS patient_id,
                CAST(ConsultationDate AS date) AS date,
                ConceptId AS snomedct_code,
                NULL AS ctv3_code,
                NumericValue AS numeric_value
            FROM CodedEvent_SNOMED
) t
WHERE t.ctv3_code in (...) AND t >= 20220101 AND t <= 20220131

Filtering on CTV3codes

Filters on the top 3 CTV3 Codes from 2022 that aren't Y codes
Scans on CodedEvent table

filter range run new scans new cpu new elapsed original scans original cpu original elapsed
1 month 1 55 123,079 4,577 1 23,671 23,792
1 month 2 69 85,078 2,373 1 23,531 23,534
1 month 3 90 87,421 2,822 1 23,562 23,556
6 months 1 61 87,566 2,625 1 124,063 124,086
6 months 2 55 89,231 2,885 1 124,469 124,458
6 months 3 85 87,628 2,490 1 124,469 126,062

Filtering on SnomedCt Code

Filters on the top 3 Snomed Codes from 2022 that aren't Y codes
Scans on CodedEvent_SNOMED table

filter range run new scans new cpu new elapsed original scans original cpu original elapsed
1 month 1 52 83,910 3,450 1 22,797 22,979
1 month 2 52 73,888 5,631 1 22,843 29,338
1 month 3 52 72,248 2,840 1 22,734 22,994
6 months 1 75 78,079 2,675 84 61,855 2,985
6 months 2 52 74,300 3,908 52 58,450 2,461
6 months 3 52 75,050 3,756 52 59,331 2,738

@rebkwok
Copy link
Contributor Author

rebkwok commented Sep 5, 2023

@evansd As suggested ☝️ with filters on CTV3 and Snomed codes. For the CTV3 one, the new query beats the original in elapsed time over 1 month, and in both elapsed and cpu over 6 months. The Snomed one is similar for one month, but over 6 months the elapsed time is similar for both queries, and cpu lower for the original one, but not by a vast amount. Scans on the CodedEvent_SNOMED table go up for the 6 month one for the original query those, and I've no idea why.

@evansd
Copy link
Contributor

evansd commented Sep 5, 2023

Thanks for doing all this.

These results make so little sense to me. The first row of the CTV3 table shows CPU usage jumping 4-5x with the new query, but elapsed time consistently dropping 10-50x.

I think this just shows that we don't have much an intuition for how the db will perform on any given query and we really need access to more detailed query planner stats.

But all that said, the aim here was just to make sure that the change wasn't going to make things obviously much worse than they were before and on the basis of this I'd say we're probably OK to go ahead.

Would welcome other thoughts (@Jongmassey @inglesp).

@rebkwok
Copy link
Contributor Author

rebkwok commented Sep 5, 2023

Using NULLIF

Filtering on CTV3codes

filter range run new scans new cpu new elapsed original scans original cpu original elapsed
1 month 1 55 122,948 4,311 1 25,828 39,106
1 month 2 103 90,000 3,396 1 23,468 29,884
1 month 3 72 87,957 2,832 1 23,328 23,531
6 months 1 88 88,485 2,790 1 138,000 154,322
6 months 2 66 89,284 2,686 1 123,297 123,298
6 months 3 56 88,844 2,465 1 123,250 123,245

Filtering on Snomed codes

filter range run new scans new cpu new elapsed original scans original cpu original elapsed
1 month 1 69 92,106 3,666 1 26,125 30,220
1 month 2 74 71,983 2,482 1 22,797 22,804
1 month 3 62 74,856 2,380 1 22,750 22,751
6 months 1 58 75,081 2,307 68 61,654 2,265
6 months 2 67 74,140 2,569 53 61,669 2,357
6 months 3 57 77,272 2,495 77 60,353 2,194

@inglesp
Copy link
Contributor

inglesp commented Sep 5, 2023

But all that said, the aim here was just to make sure that the change wasn't going to make things obviously much worse than they were before and on the basis of this I'd say we're probably OK to go ahead.

Yes, on the (rather baffling) evidence I'd agree.

@Jongmassey
Copy link
Contributor

This has a bit of a smell of this being a partitioned table or index(es) (probably on the basis of date), but given that "scan count" hasn't actually meant "how many times the table was accessed" for ages it can sometimes be a slightly misleading metric. What were the physical/logical read stats?

I have a slight hesitation here because of measures, which quite often run month-by-month and if we're increasing the IO by 60x+ and CPU by 3x+ then this might make the already-not-great performance a fair bit worse. For queries with a larger time span I'd say the difference is largely moot.

@Jongmassey
Copy link
Contributor

I prefer NULLIF() syntax but that's really a matter of preference. Good to see that they seem to compile equivalently

@Jongmassey
Copy link
Contributor

It'd be really nice if replacement of NULL-equivalent dates was done during ETL rather than every single query!

@evansd
Copy link
Contributor

evansd commented Sep 6, 2023

It'd be really nice if replacement of NULL-equivalent dates was done during ETL rather than every single query!

It really would, wouldn't it? Another option, which I think would be achievable but a non-trivial amount of work, would be to do the date transformation in ehrQL (rather than in an blob of SQL text which is opaque to ehrQL). That would give ehrQL the option to optimise away the transformation if it can determine that it isn't needed e.g. if you're querying for dates in a particular range. Or replace queries like date.is_not_null with date != '9999-12-31'.

Longer term I'd like to move as much transformation logic as we can out of the QueryTable escape hatch and into something introspectable by ehrQL. But there's no way we're going to be doing that any time soon.

@rebkwok rebkwok moved this from Next to In Progress in Data Team Sep 20, 2023
@rebkwok
Copy link
Contributor Author

rebkwok commented Oct 17, 2023

See issue description for a list of date fields in TPP (for tables that we use in cohort-extractor/ehrql). Of those, the ones to investigate (i.e. that contain ceiling dates and are used/will probably be used in ehrql) are:

  • PatientAddress.StartDate
  • PatientAddress.EndDate
  • RegistrationHistory.EndDate (already replace with null in ehrql, but using CASE; investigate other methods)
  • Patient.DateOfDeath (already replace with null in ehrql, but using CASE; investigate other methods)

Other questions:

  • cohort-extractor and ehrQL use different ISARIC tables
  • cohort-extractor uses SeenDate from Appointments; ehrQl only includes StartDate and BookedDate in the table. SeenDate has ceiling dates.

@rebkwok
Copy link
Contributor Author

rebkwok commented Oct 18, 2023

RegistrationHistory.EndDate

Current query:

SELECT
                reg.Patient_ID AS patient_id,
                CAST(reg.StartDate AS date) AS start_date,
                CASE
                    WHEN reg.EndDate = '9999-12-31' THEN NULL
                    ELSE CAST(reg.EndDate AS date)
                END AS end_date,
                org.Organisation_ID AS practice_pseudo_id,
                NULLIF(org.STPCode, '') AS practice_stp,
                NULLIF(org.Region, '') AS practice_nuts1_region_name
            FROM RegistrationHistory AS reg
            LEFT OUTER JOIN Organisation AS org
            ON reg.Organisation_ID = org.Organisation_ID

New query:

SELECT
                reg.Patient_ID AS patient_id,
                CAST(reg.StartDate AS date) AS start_date,
                 CAST(NULLIF(reg.EndDate,  '9999-12-31') AS date) AS end_date,
                org.Organisation_ID AS practice_pseudo_id,
                NULLIF(org.STPCode, '') AS practice_stp,
                NULLIF(org.Region, '') AS practice_nuts1_region_name
            FROM RegistrationHistory AS reg
            LEFT OUTER JOIN Organisation AS org
            ON reg.Organisation_ID = org.Organisation_ID

Results:

Run Original CPU Original Elapsed New CPU New Elapsed
1 9923 359 10041 325
2 9812 1653 9890 1728
3 1152 1589 10660 1490
4 9267 519 9267 556

@rebkwok
Copy link
Contributor Author

rebkwok commented Oct 18, 2023

Patient.DateOfDeath

Current query:

SELECT
                Patient_ID as patient_id,
                DateOfBirth as date_of_birth,
                CASE
                    WHEN Sex = 'M' THEN 'male'
                    WHEN Sex = 'F' THEN 'female'
                    WHEN Sex = 'I' THEN 'intersex'
                    ELSE 'unknown'
                END AS sex,
                CASE
                    WHEN DateOfDeath != '99991231' THEN DateOfDeath
                END As date_of_death
            FROM Patient

New query:

SELECT
                Patient_ID as patient_id,
                DateOfBirth as date_of_birth,
                CASE
                    WHEN Sex = 'M' THEN 'male'
                    WHEN Sex = 'F' THEN 'female'
                    WHEN Sex = 'I' THEN 'intersex'
                    ELSE 'unknown'
                END AS sex,
                NULLIF(DateOfDeath, '99991231') As date_of_death
            FROM Patient

Results:

Run Original CPU Original Elapsed New CPU New Elapsed
1 2712 78 2199 72
2 2701 94 2517 123
3 2783 217 2642 249

@rebkwok
Copy link
Contributor Author

rebkwok commented Oct 18, 2023

PatientAddress.StartDate, PatientAddress.EndDate

Current query:

            SELECT
                addr.Patient_ID AS patient_id,
                addr.PatientAddress_ID AS address_id,
                CAST(addr.StartDate AS date) AS start_date,
                CAST(addr.EndDate AS date) AS end_date,
                addr.AddressType AS address_type,
                addr.RuralUrbanClassificationCode AS rural_urban_classification,
                addr.ImdRankRounded AS imd_rounded,
                CASE
                    WHEN addr.MSOACode NOT IN ('NPC', '') THEN addr.MSOACode
                END AS msoa_code,
                CASE
                    WHEN addr.MSOACode NOT IN ('NPC', '') THEN 1
                    ELSE 0
                END AS has_postcode,
                CASE
                    WHEN carehm.PatientAddress_ID IS NOT NULL THEN 1
                    ELSE 0
                END AS care_home_is_potential_match,
                CASE
                    WHEN carehm.LocationRequiresNursing = 'Y' THEN 1
                    WHEN carehm.LocationRequiresNursing = 'N' THEN 0
                 END AS care_home_requires_nursing,
                CASE
                    WHEN carehm.LocationDoesNotRequireNursing = 'Y' THEN 1
                    WHEN carehm.LocationDoesNotRequireNursing = 'N' THEN 0
                 END AS care_home_does_not_require_nursing
            FROM PatientAddress AS addr
            LEFT JOIN PotentialCareHomeAddress AS carehm
            ON addr.PatientAddress_ID = carehm.PatientAddress_ID

New query:

            SELECT
                addr.Patient_ID AS patient_id,
                addr.PatientAddress_ID AS address_id,
                NULLIF(CAST(addr.StartDate, '9999-12-31') AS date) AS start_date,
                NULLIF(CAST(addr.EndDate, '9999-12-31') AS date) AS end_date,
                addr.AddressType AS address_type,
                addr.RuralUrbanClassificationCode AS rural_urban_classification,
                addr.ImdRankRounded AS imd_rounded,
                CASE
                    WHEN addr.MSOACode NOT IN ('NPC', '') THEN addr.MSOACode
                END AS msoa_code,
                CASE
                    WHEN addr.MSOACode NOT IN ('NPC', '') THEN 1
                    ELSE 0
                END AS has_postcode,
                CASE
                    WHEN carehm.PatientAddress_ID IS NOT NULL THEN 1
                    ELSE 0
                END AS care_home_is_potential_match,
                CASE
                    WHEN carehm.LocationRequiresNursing = 'Y' THEN 1
                    WHEN carehm.LocationRequiresNursing = 'N' THEN 0
                 END AS care_home_requires_nursing,
                CASE
                    WHEN carehm.LocationDoesNotRequireNursing = 'Y' THEN 1
                    WHEN carehm.LocationDoesNotRequireNursing = 'N' THEN 0
                 END AS care_home_does_not_require_nursing
            FROM PatientAddress AS addr
            LEFT JOIN PotentialCareHomeAddress AS carehm
            ON addr.PatientAddress_ID = carehm.PatientAddress_ID

Results:

Run Original CPU Original Elapsed New CPU New Elapsed
1 26913 918 27623 889
2 26171 1441 28109 1050
3 24673 933 26062 868

@rebkwok
Copy link
Contributor Author

rebkwok commented Oct 19, 2023

Closing this issue:

  • All date fields (in tables used by cohort-extractor/ehrql) checked for ceiling dates
  • Issue created to add seendate to appointment table Add SeenDate to Appointments table #1667
  • Issue create to replace ceiling dates in address table with null Replace ceiling start_date and end_date in PatientAddress table with null #1668
  • checked current method of null-replacement (CASE) in patient table and registration table against NULLIF method - doesn't look like there's anything to be gained from changing it
  • Waiting list tables have ceiling dates in some fields - but this is still all exploratory in ehrQL
  • Several date fields contain impossible dates which are probably data entry errors. Not sure there is much to be done about that except perhaps a note in docs that users should default to using min/max date cutoffs in their filters Document date fields that are known to contain bad dates #1672
  • Bug in cohort-extractor - ISARIC hosttim field is being incorrectly converted to a date, instead of a time. Issue created, but possible it's not actually used by anyone

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

No branches or pull requests

4 participants