Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Undesirable record grouping behaviours #352

Closed
pavish opened this issue Jul 12, 2021 · 12 comments · Fixed by #353, #477 or #481
Closed

Undesirable record grouping behaviours #352

pavish opened this issue Jul 12, 2021 · 12 comments · Fixed by #353, #477 or #481
Assignees
Labels
good first issue Everything in "Help wanted", PLUS being relatively easy and straightforward to implement. help wanted Community contributors can implement this ready Ready for implementation work: backend Related to Python, Django, and simple SQL

Comments

@pavish
Copy link
Member

pavish commented Jul 12, 2021

Description

Record grouping has a set of behaviours, that are not desirable.

  • It considers order_by, which leads to formation of incorrect query on the backend, if we don't group by the sorted column.
    Screenshot 2021-07-13 at 12 05 01 AM

  • It considers limit and offset. These apply on the grouped result itself, and is unrelated to the record limit & offset.
    Screenshot 2021-07-13 at 12 02 20 AM
    Screenshot 2021-07-13 at 12 02 42 AM

Expected behavior

  • It should not consider order_by.
  • It should not consider limit and offset.

We could also probably have a dedicated API for this. It could also obtain the values for columns, to filter the grouped results. Having it as part of records API makes less sense, since the group count is not a reflection of the record results.

@eito-fis eito-fis self-assigned this Jul 12, 2021
@eito-fis
Copy link
Contributor

Do we also want the filter parameter to not affect the returned counts?

@pavish
Copy link
Member Author

pavish commented Jul 12, 2021

The filter parameter has to affect the returned counts.

@kgodey
Copy link
Contributor

kgodey commented Jul 12, 2021

@pavish A couple of questions:

(1) Are you suggesting that we move this functionality to a separate API or that we fix the current API? It's unclear from the issue description.

For context, the idea behind keeping it in the record list API was that we could return group counts for a given query. We would only look at the list of records that the passed-in limit, offset, sorts, and filters affect and return group counts for that query. For example, if you were displaying page 2 of a table with 50 records per page, we would show you the group counts relevant to groups visible on that page.

@eito-fis I assume this matches your understanding.

(2) If we do move it to a separate API, I'm not sure what you mean by "It could also obtain the values for columns, to filter the grouped results." Could you explain further, including how you would use it?

@pavish
Copy link
Member Author

pavish commented Jul 12, 2021

@kgodey

(1) I am open to both options. I suggested moving to separate API, as the group_counts currently are not related to the record results.

We would only look at the list of records that the passed-in limit, offset, sorts, and filters affect and return group counts for that query

This would solve all the problems at hand. I also assumed this to be the behaviour initially, but this isn't the current behaviour. For this, we would need to fetch the records first, then find the values for which we obtain the grouped counts, with only the filters and not anything else. But I believe currently, it is directly applying all the query params on the group by query.

(2) If we had a dedicated API, we would need something like /v0/tables/1/record_groups/?fields=["Center"]&filter={Center:'Name of center'}. This would only return the group 'Name of center'. The behaviour is akin to how filters work with records. This will not be needed if we could do the former.

@kgodey
Copy link
Contributor

kgodey commented Jul 12, 2021

Let's go with keeping it in the current API and fixing it to work the way I described. @eito-fis let me know if you have any questions.

@eito-fis
Copy link
Contributor

To be clear, its not that we don't want to consider limit or offset at all, rather we want to limit and offset before grouping. Is this correct?

@kgodey
Copy link
Contributor

kgodey commented Jul 12, 2021

@eito-fis Yes.

@pavish
Copy link
Member Author

pavish commented Jul 12, 2021

@kgodey @eito-fis There is one other change, it's more on the convention side of the API.

Currently the results are in this format, when there are more than one grouped column:

"NASA Kennedy Space Center,Application": 29,
"NASA Kennedy Space Center,Issued": 58,
"NASA Johnson Space Center,Application": 54,
"NASA Ames Research Center,Application": 38,
"NASA Ames Research Center,Issued": 100,

I think it would make more sense to change it to:

"NASA Kennedy Space Center": {
  "Application": 29,
  "Issued": 58
},
"NASA Johnson Space Center": {
  "Application": 54
},
"NASA Ames Research Center": {
  "Application": 38,
  "Issued": 100
},

I am siding with the latter, since column names could have commas in them.

@eito-fis eito-fis mentioned this issue Jul 12, 2021
7 tasks
@kgodey
Copy link
Contributor

kgodey commented Jul 12, 2021

@pavish I find that format confusing, since both NASA Kennedy Space Center and Application are field values, nesting them implies that one is the parent and one is a child, which is incorrect. I do agree that comma separation could end up with unparseable strings.

I suggest we do something like:

{
    "results": [
        {
            "columns": ["NASA Kennedy Space Center", "Application"],
            "count": 29
        },
        {
            "columns": ["NASA Kennedy Space Center", "Issued"],
            "count": 58
        }
    ]
}

@kgodey kgodey added ready Ready for implementation work: backend Related to Python, Django, and simple SQL labels Jul 13, 2021
@pavish
Copy link
Member Author

pavish commented Jul 19, 2021

if you were displaying page 2 of a table with 50 records per page, we would show you the group counts relevant to groups visible on that page.

@kgodey In this scenario, if page 2 has only one group 'Group A' which in total consists of 90 records, I assumed this would return the group name 'Group A' with count 90.

But currently limit is still considered, and it returns 50. We would need to re-open this issue.

@kgodey kgodey reopened this Jul 19, 2021
@kgodey kgodey assigned kgodey and unassigned eito-fis Jul 19, 2021
@kgodey
Copy link
Contributor

kgodey commented Jul 19, 2021

I think this was my error with speccing it out, I didn't think through what considering limits would mean. Since @eito-fis is out this week, I'll prioritize fixing it.

@kgodey kgodey removed their assignment Jul 19, 2021
@kgodey kgodey added good first issue Everything in "Help wanted", PLUS being relatively easy and straightforward to implement. help wanted Community contributors can implement this labels Jul 19, 2021
@kgodey
Copy link
Contributor

kgodey commented Jul 19, 2021

Actually, since this isn't blocking @pavish (the counts will just be wrong for now), I'm not going to prioritize it myself.

@pavish pavish mentioned this issue Jul 20, 2021
7 tasks
@kgodey kgodey self-assigned this Jul 21, 2021
@kgodey kgodey mentioned this issue Jul 23, 2021
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Everything in "Help wanted", PLUS being relatively easy and straightforward to implement. help wanted Community contributors can implement this ready Ready for implementation work: backend Related to Python, Django, and simple SQL
Projects
No open projects
3 participants