Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Datetime grouping by parts #1458

Merged
merged 9 commits into from
Jul 13, 2022
Merged

Datetime grouping by parts #1458

merged 9 commits into from
Jul 13, 2022

Conversation

mathemancer
Copy link
Contributor

@mathemancer mathemancer commented Jul 12, 2022

Fixes #429

This adds grouping by year, month, or day for any date/time types that include those parts. This also adds a separate grouping method that lets one group by any extractable date part (see below for details).

Technical details

Method 1: simplified truncating

The first, simplified, grouping method utilizes the preproc functionality of grouping, and simply specifies some appropriate preprocessing functions to accomplish the desired part extraction. Note that I went with truncating rather than proper part extraction, so

[2022-01-01, 2022-01-02, 2023-01-01, 2023-02-01]

Would produce 2 groups when grouped by year, 3 groups when grouped by month, and then 4 groups grouped by day. This is in contrast to 2 groups by year, 2 groups by month, and 2 groups by day, which would be the effect of actually extracting date parts and grouping by those. I deemed the former logic to be more likely to be the intent of most users.

This should be called with a request like:

GET http://localhost:8000/api/db/v0/tables/7/records/?grouping={"columns": [23], "mode": "distinct", "preproc": "truncate_to_year"}

or

GET http://localhost:8000/api/db/v0/tables/7/records/?grouping={"columns": [23], "mode": "distinct", "preproc": "truncate_to_month"}

or

GET http://localhost:8000/api/db/v0/tables/7/records/?grouping={"columns": [23], "mode": "distinct", "preproc": "truncate_to_day"}

Method 2: Extraction

This grouping method actually extracts a date (or time or interval) part based on a given parameter, and groups based only on that part. So,

[2022-01-01, 2022-01-02, 2023-01-01, 2023-02-01]

would produce 2 groups by year, 2 groups by month, and 2 groups by day.

This method can use any extractable date or time part listed in the documentation

It should be called with a request like:

GET http://localhost:8000/api/db/v0/tables/7/records/?grouping={"columns": [23], "mode": "extract", "extract_field": "year"}

extract_field is the parameter giving the desired date part for grouping.

Screenshots

Checklist

  • My pull request has a descriptive title (not a vague title like Update index.md).
  • My pull request targets the master branch of the repository
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • I added tests for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no
    visible errors.

Developer Certificate of Origin

Developer Certificate of Origin
Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

@mathemancer mathemancer requested review from a team and dmos62 and removed request for a team July 12, 2022 03:57
@codecov-commenter
Copy link

codecov-commenter commented Jul 12, 2022

Codecov Report

Merging #1458 (39d604e) into master (d63bed8) will increase coverage by 0.04%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #1458      +/-   ##
==========================================
+ Coverage   92.66%   92.71%   +0.04%     
==========================================
  Files         130      130              
  Lines        5560     5598      +38     
==========================================
+ Hits         5152     5190      +38     
  Misses        408      408              
Flag Coverage Δ
pytest-backend 92.71% <100.00%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
mathesar/api/pagination.py 78.57% <ø> (ø)
db/functions/known_db_functions.py 100.00% <100.00%> (ø)
db/records/operations/group.py 98.23% <100.00%> (+0.11%) ⬆️
db/types/custom/datetime.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d63bed8...39d604e. Read the comment docs.

@kgodey
Copy link
Contributor

kgodey commented Jul 12, 2022

Would it be easy to add support for additional groupings? The original ticket called for additional groupings (e.g. decade, century, quarter, anything supported by the EXTRACT function).

@mathemancer
Copy link
Contributor Author

mathemancer commented Jul 12, 2022

Would it be easy to add support for additional groupings? The original ticket called for additional groupings (e.g. decade, century, quarter, anything supported by the EXTRACT function).

Yes, (but). The problems with that are two-fold.

First, we don't have (at this moment) a way to encode acceptable string arguments to a given function in the function info endpoint. Thus, we can't have (for example) a 2-argument function to describe both the desire to extract a date part, and which date part to extract without the front end needing to hard-code every string that it wants to be able to extract. After talking a bit with @dmos62 , we decided to avoid creating a separate DBFunction for each date part, since we'd need to remove them all later once we have the descriptive infrastructure more fleshed-out.

The second problem is that simply extracting a date part and then grouping on it is a bit unintuitive, and less likely to be useful than the current truncating logic. In case my example from the description was unclear, extracting (for example) the month from a date and grouping on that will put all dates that happen to occur in December in the same group, regardless of their year. This is in contrast with the current logic, where a "month" grouping is actually a year-month pair grouping.

As an interim step, I'll add a 2-arg extraction version so if we want to hard-code the various strings describing date parts, and get the behavior described in the second paragraph, the functionality will be there. For the current cycle, I recommend the UI call the pre-fab year, month, and day truncating functions since those make more sense for the report generation described in the storyboard.

@mathemancer mathemancer requested review from a team and pavish and removed request for a team July 12, 2022 08:23
Note: don't commit while blind with integration test rage in the future
@mathemancer mathemancer enabled auto-merge July 12, 2022 12:44
@kgodey
Copy link
Contributor

kgodey commented Jul 12, 2022

The second problem is that simply extracting a date part and then grouping on it is a bit unintuitive

I agree with your decision here, I don't think we should naively group by date part.

I think other common groupings that users may want are:

  • Second
  • Minute
  • Hour
  • Quarter
  • Decade

I assume those will be handled in subsequent PRs (maybe not in this cycle)?

@mathemancer
Copy link
Contributor Author

mathemancer commented Jul 13, 2022

The second problem is that simply extracting a date part and then grouping on it is a bit unintuitive

I agree with your decision here, I don't think we should naively group by date part.

I think other common groupings that users may want are:

  • Second
  • Minute
  • Hour
  • Quarter
  • Decade

I assume those will be handled in subsequent PRs (maybe not in this cycle)?

Long run, I want to improve the function description machinery to handle describing acceptable string arguments, then redo most of this work. This would enable those.

@mathemancer mathemancer merged commit 8e877d2 into master Jul 13, 2022
@mathemancer mathemancer deleted the datetime_grouping branch July 13, 2022 04:00
@kgodey
Copy link
Contributor

kgodey commented Jul 13, 2022

Long run, I want to improve the function description machinery to handle describing acceptable string arguments, then redo most of this work. This would enable those.

Can you create a ticket to track this work?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

Implement grouping options for date & time types
3 participants