Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement a channel language filter. #12892

Draft
wants to merge 3 commits into
base: develop
Choose a base branch
from

Conversation

rtibbles
Copy link
Member

@rtibbles rtibbles commented Nov 27, 2024

Summary

  • Adds a filter for channel languages on the main Library page and each remote library browsing page
  • Populates the filter options only from the language property of the channel/root node metadata as the included_languages metadata currently only includes the language id and not the name.
  • When a language has not been specified (i.e. on first page load), the user is redirected to the language that is the closest match to their interface language. If none exists, no language is selected.
  • Displayed channels are displayed based on an exact match between the selected language and either their channel language, or their included languages property

Screenshot from 2024-11-27 13-43-49
Screenshot from 2024-11-27 13-42-04
Screenshot from 2024-11-27 13-40-42
Screenshot from 2024-11-27 13-40-32

References

Fixes #12702

Reviewer guidance

Open questions:

  • What, if any, effect should the channel language selection have on the 'language' selection for resource/folder search?
  • The language match can be very exact - for example in the first screenshot above where we only see two resources because that's what is in Castillian Spanish specificaly - because we mostly use language + region codes for our interface language, but our content languages are a mix of language and region codes, and just language codes. An alternative would be to do a less strict filtering, so we include all channels that are at least a language only match, so for example, if es-es is the selected language we would show:
    • First all the es-es channels
    • Then all the es channels
    • Then all the es-* channels

After updates:
Main open question is around how we are doing the filtering - currently, we use the fact that we are already fetching all the language metadata in the frontend to then send all relevant language codes to the backend in the API request.

This is a neat solution, but does mean we are adding an explicit chain step to our request waterfall - i.e. we have to wait until we've fetched the global search labels before we can fetch anything else.

The other alternative we could do there would be to instead just leverage the fact that we only really care about matching on the lang_code attribute - the part before the dash. We could add a new filter query to the API endpoint and then do matching on the backend against that.

In either case, we could potentially also still improve query performance here by making a language bitmask for the most frequently occurring languages, and annotate for quicker lookups - falling back to matching against the M2M included_languages when requesting languages not included in the bitmask.

@github-actions github-actions bot added APP: Learn Re: Learn App (content, quizzes, lessons, etc.) DEV: frontend labels Nov 27, 2024
@rtibbles rtibbles force-pushed the filter_your_language branch from c0db573 to bb22259 Compare November 27, 2024 23:28
@marcellamaki marcellamaki self-assigned this Dec 3, 2024
@rtibbles rtibbles marked this pull request as draft December 4, 2024 20:54
@rtibbles rtibbles force-pushed the filter_your_language branch from bb22259 to 1f5a9a4 Compare December 17, 2024 23:12
@github-actions github-actions bot added the DEV: backend Python, databases, networking, filesystem... label Dec 17, 2024
@rtibbles rtibbles force-pushed the filter_your_language branch 2 times, most recently from a468f6f to 5926cb1 Compare December 19, 2024 17:07
@rtibbles rtibbles force-pushed the filter_your_language branch from 5926cb1 to 52dde1d Compare December 27, 2024 16:47
Copy link
Member

@marcellamaki marcellamaki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added some questions for now following our snackhack today. I started manual QA but will update with a few more questions/screencasts from that on Monday just thinking about the expected behavior (some of the interactions surprised me)

assessmentmetadata_map = {
a["contentnode"]: a
for a in models.AssessmentMetaData.objects.filter(
contentnode__in=queryset
contentnode__in=ids
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During the PR review today, we were discussing why the queryset was replaced with this ids array. Some ideas were that the queryset had never been required, since we had the items array, and items might be a smaller set than the queryset. Or, that this might be an optimization. Is that right? were there other motivations here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was an optimization, yes. Because we've already got the items query from the DB, doing the subquery with the queryset, rather than just passing in the materialized ids was slowing down the query somewhat. As the content node endpoint is only performant when we do limited queries (and hence we almost always paginate requests to it) - the risk of there being too many ids is mitigated.

# for topic nodes, this is the union of all languages of all descendant nodes
# and any language set on the topic node itself
# We do this to allow filtering of a topic tree by a specific language for
# multi-language channels.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here I'm understanding the why but not the how of the M2M field. Meaning, I understand what's trying to happen and that the M2M field choice facilitates it, but I can't fully wrap my mind around why it has to be a many to many.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because languages are a separate table - and each contentnode can contain multiple languages. So it needs to be a many to many field, because for every content node, we could be keying to multiple other languages (and vice versa).

lang=test_language,
)

# This should not raise an IntegrityError
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The M2M through table is uniquely keyed by contentnode_Id and language_id - so it would raise an integrity error if we tried to create a duplicate entry, as we might if there were multiple children or grandchildren with the same language, and we were doing this in a way that failed to account for that.

lang_direction="ltr",
)

# Create two child nodes with different languages
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so these nodes are different from each other but also different from the parent language

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parent language kind of becomes superfluous here, because having a topic say it is a specific language when none of its descendants are in that language is just bad metadata - so we do it based entirely on the descendant recursion.

@@ -583,6 +584,31 @@ def recurse_annotation_up_tree(channel_id):
)
)

# First clear out all existing language relationships for nodes in this channel
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where would these pre-existing relationships on the nodes have come from?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a previous annotation - we do annotation any time we import additional resources in a channel.

@@ -9,4 +9,4 @@ django-debug-toolbar==4.3.0
drf-yasg==1.21.7
coreapi==2.3.3
nodeenv==1.3.3
sqlacodegen==2.1.0
sqlacodegen==2.3.0.post1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this just a tangential update? or is it actually related to this commit

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was to make sure we had the latest version for recreating the schema.

["content_contentnode.id"],
deferrable=True,
initially="DEFERRED",
),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why this was deleted

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because they should never have been added in the first place - adding these constraints into the sqlalchemy schema adds additional overhead to our channel import code, and is just a consequence of Django 3.2 doing foreign key integrity checks on SQLite now.

They also break tests when additional things are added to the schema.

@marcellamaki
Copy link
Member

On the manual QA side, the only real "bug" I've found is when navigating into a channel, which doesn't work at all if any language filter is applied.

Screen.Recording.2025-02-06.at.1.01.43.PM.mov

What I was trying to do here (other than test) was to determine what is the "expected" behavior for the resources that are presented to the user once they are browsing the channel. I guess this one isn't a very good example, because it's not a multi-language channel, but if for example I had selected "french" and clicked into a channel that had french and spanish resources, would the resources be (pre)filtered at this level? Because the query does have &primaryLanguage=fr at the end (which maybe is what's contributing to the bug?), and I just wasn't sure how to interpret that, and if it would have implications in the UI for the channel browsing, or if it was there to be used when navigating backwards.

This also breaks navigating into a channel under the "explore libraries" section, if any language filter is applied on the main library page (such as the default language to match the UI). However, the user is able to click on the "explore" button without error. AND, once I'm within the "explore libraries" page, I can click on a channel and view it without error.
Screenshot 2025-02-06 at 1 11 49 PM

The URLS are different
Navigating THRU "explore libraries" page

http://127.0.0.1:8000/en/learn/#/topics/2d04ed86aa8c519fae78a250e03d8482/t/8a2d480dbc9b53408c688e8188326b16/folders?prevName=LIBRARY&prevQuery=%257B%2522prevName%2522%3A%2522LIBRARY%2522,%2522prevQuery%2522%3A%2522%25257B%252522primaryLanguage%252522%3A%252522en%252522%25257D%2522,%2522prevParams%2522%3A%2522%25257B%25257D%2522,%2522primaryLanguage%2522%3A%2522en%2522%257D&prevParams=%257B%2522deviceId%2522%3A%25222d04ed86aa8c519fae78a250e03d8482%2522%257D&primaryLanguage=en

vs. clicking directly on a channel from the library page

http://127.0.0.1:8000/en/learn/#/topics/2d04ed86aa8c519fae78a250e03d8482/t/8166e765a0095bfaa49e98d034653dc5/folders?prevName=LIBRARY&prevQuery=%257B%2522primaryLanguage%2522%3A%2522en%2522%257D&prevParams=%257B%257D&primaryLanguage=en

Lastly, I do think that we talked about this, but now that I'm testing it out, I am a little bit surprised by the channel language filtering on the explore libraries experience.

  • on the main library page, even when a language filter is applied, there is not filter on the previewed channels
  • then, if a user clicks "explore", they have the "default language" (UI language) filter applied
  • if the user changes the default language at the top of their own page (i.e. to French) and then clicks the "explore" under explore libraries, they still have the default (UI language) applied, (i.e. english).

I don't think any of this is blocking, because right now, I'm sort of going on "well I don't know if this makes the most sense", but I would just appreciate @jtamiace taking a look and seeing what she thinks. I know there are some technical constraints that we're operating under which is again why I don't think any of this is blocking. I think I'm just wondering about what is the best combination of "shared filters" vs. "default and not accidentally confusing people" and I do not have an answer.

@rtibbles
Copy link
Member Author

rtibbles commented Feb 7, 2025

on the main library page, even when a language filter is applied, there is not filter on the previewed channels

I don't think I touched the logic here, no - the main issue is that with the 'frontend applied' language filtering, it is only filtering by languages available in the currently active library (in this case, the library on the local Kolibri).

If we filter the previewed libraries by those languages, then they could very well appear completely empty, or they might appear subtly deficient - we only have fr-fr on the local device, but the remote device only has resources in fr-ca.

Doing the filtering on the backend would resolve this, but we would then need to add an additional language filter on the backend that accepts just a language code (with no sub code) and does the relevant filtering. This would break backwards compatibility, but would probably be a better experience.

@marcellamaki
Copy link
Member

If we filter the previewed libraries by those languages, then they could very well appear completely empty, or they might appear subtly deficient - we only have fr-fr on the local device, but the remote device only has resources in fr-ca.

Yes, although this is probably already true for any situations where there isn't an internet connection and there are limited or no LAN devices.

I guess the "preview" on the home page I am somewhat less concerned with (although maybe we could add a string especially for the kolibri library that makes it a bit more clear that it is the entire kolibri public content library that is being previewed). But, I feel more inclined to push for having the same language filter as on the library page apply to the "explore libraries" page (i.e. fr-fr if applied, even when default language is english) ... would this also require the backend filtering?

@rtibbles
Copy link
Member Author

rtibbles commented Feb 7, 2025

would this also require the backend filtering?

Kind of - unless we are looking at a specific library source, we don't have labels for every "included language" in a channel (i.e. a language that is in a channel but is not the primary language), because the API only returns the language id for the included language. So we can't populate a complete list of selectable languages unless we add an additional fetch to every remote library to get all the language labels.

Ultimately, I think switching to backend filtering is probably better, even with the backwards incompatibility, simply because it will reduce the awkward frontend data processing we have to do (and the weird await I had to add to wait for the search labels to load before we are able to fetch channels).

We can also optimize the backend filtering pretty well, compared to comparing against 3 language codes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
APP: Learn Re: Learn App (content, quizzes, lessons, etc.) DEV: backend Python, databases, networking, filesystem... DEV: frontend
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement language filter for channels
2 participants