Implement a channel language filter. #12892
base: develop
Conversation
Force-pushed from c0db573 to bb22259
Force-pushed from bb22259 to 1f5a9a4
Force-pushed from a468f6f to 5926cb1
Ignore constraints when generating SQLAlchemy schema.
Allow language selection logic to be reused.
Force-pushed from 5926cb1 to 52dde1d
I've added some questions for now, following our snackhack today. I started manual QA and will update with a few more questions/screencasts from that on Monday, mostly thinking about the expected behavior (some of the interactions surprised me).
assessmentmetadata_map = {
    a["contentnode"]: a
    for a in models.AssessmentMetaData.objects.filter(
-        contentnode__in=queryset
+        contentnode__in=ids
During the PR review today, we were discussing why the queryset was replaced with this ids array. Some ideas were that the queryset had never been required, since we had the items array, and items might be a smaller set than the queryset. Or that this might be an optimization. Is that right? Were there other motivations here?
It was an optimization, yes. Because we've already got the items query back from the DB, doing the subquery with the queryset, rather than just passing in the materialized ids, was slowing down the query somewhat. As the content node endpoint is only performant when we do limited queries (and hence we almost always paginate requests to it), the risk of there being too many ids is mitigated.
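For concreteness, a hedged sketch of the two query shapes being compared. The import path, the items shape, and the values() fields are assumptions based on the diff, not the endpoint's exact code:

from kolibri.core.content import models  # assumed import path, per the diff


def assessment_map_from_queryset(queryset):
    # Before: Django compiles the queryset into a correlated subquery,
    # e.g. WHERE contentnode_id IN (SELECT id FROM content_contentnode ...)
    return {
        a["contentnode"]: a
        for a in models.AssessmentMetaData.objects.filter(
            contentnode__in=queryset
        ).values("contentnode", "assessment_item_ids")
    }


def assessment_map_from_items(items):
    # After: the page of items has already been materialized, so passing
    # the concrete ids yields a flat IN (...) clause with no subquery.
    ids = [item["id"] for item in items]
    return {
        a["contentnode"]: a
        for a in models.AssessmentMetaData.objects.filter(
            contentnode__in=ids
        ).values("contentnode", "assessment_item_ids")
    }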
# for topic nodes, this is the union of all languages of all descendant nodes
# and any language set on the topic node itself
# We do this to allow filtering of a topic tree by a specific language for
# multi-language channels.
I think here I'm understanding the why but not the how of the M2M field. Meaning, I understand what's trying to happen and that the M2M field choice facilitates it, but I can't fully wrap my mind around why it has to be a many-to-many.
Because languages live in a separate table, and each contentnode can contain multiple languages. So it needs to be a many-to-many field: for every content node, we could be keying to multiple languages, and every language can be keyed to multiple content nodes.
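A minimal model sketch of that shape; field and table names here are illustrative, not the exact Kolibri definitions:

from django.db import models


class Language(models.Model):
    # e.g. "fr-fr"; lang_code is the part before the dash, e.g. "fr"
    id = models.CharField(max_length=14, primary_key=True)
    lang_code = models.CharField(max_length=3, db_index=True)


class ContentNode(models.Model):
    # The node's own (primary) language; topics may leave this unset.
    lang = models.ForeignKey(
        Language, null=True, blank=True, on_delete=models.SET_NULL
    )
    # Union of languages across the node and all of its descendants.
    # Each node can include many languages, and each language is shared
    # by many nodes, so neither side can hold a single foreign key.
    included_languages = models.ManyToManyField(
        Language, related_name="included_in_contentnodes", blank=True
    )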
    lang=test_language,
)

# This should not raise an IntegrityError
why not?
The M2M through table is uniquely keyed by contentnode_id and language_id, so it would raise an IntegrityError if we tried to create a duplicate entry, as we might if there were multiple children or grandchildren with the same language and we were doing this in a way that failed to account for that.
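To make the failure mode concrete, a sketch assuming an MPTT-style get_descendants() and the model names from the sketch above; this is not the PR's actual annotation code:

# The through table is uniquely keyed on (contentnode_id, language_id),
# so inserting one row per descendant would collide whenever two
# children or grandchildren share a language. Deduplicating the pairs
# first (plus ignore_conflicts as a belt-and-braces guard) avoids the
# IntegrityError.
IncludedLanguage = ContentNode.included_languages.through


def annotate_node_languages(node):
    pairs = {
        (node.id, descendant.lang_id)
        for descendant in node.get_descendants(include_self=True)
        if descendant.lang_id is not None
    }
    IncludedLanguage.objects.bulk_create(
        [
            IncludedLanguage(contentnode_id=node_id, language_id=lang_id)
            for node_id, lang_id in pairs
        ],
        ignore_conflicts=True,
    )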
lang_direction="ltr", | ||
) | ||
|
||
# Create two child nodes with different languages |
so these nodes are different from each other but also different from the parent language
The parent language kind of becomes superfluous here, because having a topic say it is in a specific language when none of its descendants are in that language is just bad metadata - so we do it based entirely on the descendant recursion.
@@ -583,6 +584,31 @@ def recurse_annotation_up_tree(channel_id):
    )
)

+ # First clear out all existing language relationships for nodes in this channel
where would these pre-existing relationships on the nodes have come from?
From a previous annotation pass - we run annotation any time we import additional resources into a channel.
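So, schematically, each import would run something like the following (names assumed, reusing the through-table alias from the sketch above):

def reannotate_channel_languages(channel_id):
    # First clear out all existing language relationships for nodes in
    # this channel, so that rows written by a previous import's
    # annotation pass don't linger for nodes whose languages changed.
    IncludedLanguage.objects.filter(
        contentnode__channel_id=channel_id
    ).delete()
    # ...then repopulate by recursing back up the tree, as in
    # recurse_annotation_up_tree above.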
@@ -9,4 +9,4 @@ django-debug-toolbar==4.3.0
drf-yasg==1.21.7
coreapi==2.3.3
nodeenv==1.3.3
-sqlacodegen==2.1.0
+sqlacodegen==2.3.0.post1
is this just a tangential update? Or is it actually related to this commit?
This was to make sure we had the latest version for recreating the schema.
["content_contentnode.id"], | ||
deferrable=True, | ||
initially="DEFERRED", | ||
), |
I don't know why this was deleted
Because they should never have been added in the first place - adding these constraints into the SQLAlchemy schema adds additional overhead to our channel import code, and they are just a consequence of Django 3.2 now doing foreign key integrity checks on SQLite. They also break tests when additional things are added to the schema.
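Concretely, the generated SQLAlchemy table keeps the column but drops the constraint object - a hypothetical before/after, with column types and names invented for illustration:

from sqlalchemy import Column, MetaData, String, Table

metadata = MetaData()

content_contentnode = Table(
    "content_contentnode",
    metadata,
    Column("id", String(32), primary_key=True),
    Column("parent_id", String(32), index=True),
    # Previously sqlacodegen also emitted a self-referential constraint
    # here, now intentionally omitted to skip SQLite FK integrity checks
    # during channel import:
    # ForeignKeyConstraint(
    #     ["parent_id"],
    #     ["content_contentnode.id"],
    #     deferrable=True,
    #     initially="DEFERRED",
    # ),
)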
On the manual QA side, the only real "bug" I've found is when navigating into a channel, which doesn't work at all if any language filter is applied.

Screen.Recording.2025-02-06.at.1.01.43.PM.mov (screen recording attachment)

What I was trying to do here (other than test) was to determine the "expected" behavior for the resources that are presented to the user once they are browsing the channel. I guess this one isn't a very good example, because it's not a multi-language channel, but if for example I had selected "French" and clicked into a channel that had French and Spanish resources, would the resources be (pre)filtered at this level? Because the query does have

This also breaks navigating into a channel under the "explore libraries" section, if any language filter is applied on the main library page (such as the default language to match the UI). However, the user is able to click on the "explore" button without error. And, once I'm within the "explore libraries" page, I can click on a channel and view it without error. The URLs are different vs. clicking directly on a channel from the library page.
Lastly, I do think that we talked about this, but now that I'm testing it out, I am a little bit surprised by the channel language filtering on the explore libraries experience.
I don't think any of this is blocking, because right now I'm sort of going on "well, I don't know if this makes the most sense", but I would just appreciate @jtamiace taking a look and seeing what she thinks. I know there are some technical constraints that we're operating under, which is again why I don't think any of this is blocking. I think I'm just wondering about what is the best combination of "shared filters" vs. "default and not accidentally confusing people", and I do not have an answer.
I don't think I touched the logic here, no - the main issue is that with the 'frontend applied' language filtering, it is only filtering by languages available in the currently active library (in this case, the library on the local Kolibri). If we filter the previewed libraries by those languages, then they could very well appear completely empty, or they might appear subtly deficient - e.g. we only have fr-fr on the local device, but the remote device only has resources in fr-ca. Doing the filtering on the backend would resolve this, but we would then need to add an additional language filter on the backend that accepts just a language code (with no sub code) and does the relevant filtering. This would break backwards compatibility, but would probably be a better experience.
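If that backend filter were added, it might look roughly like this - a hypothetical django-filter sketch, not code from this PR:

from django.db.models import Q
from django_filters import rest_framework as filters


class ContentNodeFilter(filters.FilterSet):
    # Accepts a bare code like "fr" and matches "fr", "fr-fr", "fr-ca", etc.
    lang_code = filters.CharFilter(method="filter_lang_code")

    def filter_lang_code(self, queryset, name, value):
        code = value.split("-")[0]
        return queryset.filter(
            Q(lang__lang_code=code)
            | Q(included_languages__lang_code=code)
        ).distinct()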
Yes, although this is probably already true for any situations where there isn't an internet connection and there are limited or no LAN devices. I guess I am somewhat less concerned with the "preview" on the home page (although maybe we could add a string especially for the Kolibri library that makes it a bit more clear that it is the entire Kolibri public content library that is being previewed). But I feel more inclined to push for having the same language filter as on the library page apply to the "explore libraries" page (i.e. fr-fr if applied, even when the default language is English) ... would this also require the backend filtering?
Kind of - unless we are looking at a specific library source, we don't have labels for every "included language" in a channel (i.e. a language that is in a channel but is not the primary language), because the API only returns the language id for the included language. So we can't populate a complete list of selectable languages unless we add an additional fetch to every remote library to get all the language labels. Ultimately, I think switching to backend filtering is probably better, even with the backwards incompatibility, simply because it will reduce the awkward frontend data processing we have to do (and the weird await I had to add to wait for the search labels to load before we are able to fetch channels). We can also optimize the backend filtering pretty well, compared to comparing against three language codes.
Summary

Implements a channel language filter. This filters on the language property of the channel/root node metadata, as the included_languages metadata currently only includes the language id and not the name.

References

Fixes #12702

Reviewer guidance

Open questions:
- How should filtering behave for es channels vs. es-* channels?

After updates:
The main open question is around how we are doing the filtering - currently, we use the fact that we are already fetching all the language metadata in the frontend to then send all relevant language codes to the backend in the API request.
This is a neat solution, but does mean we are adding an explicit chain step to our request waterfall - i.e. we have to wait until we've fetched the global search labels before we can fetch anything else.
The other alternative we could do there would be to instead just leverage the fact that we only really care about matching on the lang_code attribute - the part before the dash. We could add a new filter query to the API endpoint and then do matching on the backend against that.

In either case, we could potentially also still improve query performance here by making a language bitmask for the most frequently occurring languages, and annotating nodes for quicker lookups - falling back to matching against the M2M included_languages when requesting languages not included in the bitmask.
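To illustrate that bitmask idea (the bit assignments, the lang_bitmask column name, and the fallback are all hypothetical):

from django.db.models import F

# One bit per frequently occurring language. A precomputed integer
# column (lang_bitmask, the OR of the bits for all included languages)
# would be written onto each node at import/annotation time.
FREQUENT_LANG_BITS = {"en": 1 << 0, "es": 1 << 1, "fr": 1 << 2, "ar": 1 << 3}


def filter_by_lang_code(queryset, code):
    bit = FREQUENT_LANG_BITS.get(code)
    if bit is not None:
        # A single integer AND - no join against the M2M table needed.
        return queryset.annotate(
            _lang_match=F("lang_bitmask").bitand(bit)
        ).filter(_lang_match__gt=0)
    # Long-tail languages fall back to the M2M included_languages join.
    return queryset.filter(included_languages__lang_code=code).distinct()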