Implement a channel language filter. #12892
base: develop
Conversation
Force-pushed from c0db573 to bb22259
Force-pushed from bb22259 to 1f5a9a4
Force-pushed from a468f6f to 5926cb1
Ignore constraints when generating SQLAlchemy schema.
Allow language selection logic to be reused.
Force-pushed from 5926cb1 to 52dde1d
I've added some questions for now, following our snackhack today. I started manual QA and will update with a few more questions/screencasts from that on Monday, mostly thinking about the expected behavior (some of the interactions surprised me).
assessmentmetadata_map = {
    a["contentnode"]: a
    for a in models.AssessmentMetaData.objects.filter(
-        contentnode__in=queryset
+        contentnode__in=ids
During the PR review today, we were discussing why the queryset was replaced with this ids array. Some ideas were that the queryset had never been required, since we had the items array, and items might be a smaller set than the queryset. Or that this might be an optimization. Is that right? Were there other motivations here?
It was an optimization, yes. Because we've already got the items query back from the DB, doing the subquery with the queryset, rather than just passing in the materialized ids, was slowing down the query somewhat. As the content node endpoint is only performant when we do limited queries (and hence we almost always paginate requests to it), the risk of there being too many ids is mitigated.
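For concreteness, a hedged sketch of the two query shapes being compared. The import path, the items shape, and the values() fields are assumptions based on the diff, not the endpoint's exact code:

from kolibri.core.content import models  # assumed import path, per the diff


def assessment_map_from_queryset(queryset):
    # Before: Django compiles the queryset into a correlated subquery,
    # e.g. WHERE contentnode_id IN (SELECT id FROM content_contentnode ...)
    return {
        a["contentnode"]: a
        for a in models.AssessmentMetaData.objects.filter(
            contentnode__in=queryset
        ).values("contentnode", "assessment_item_ids")
    }


def assessment_map_from_items(items):
    # After: the page of items has already been materialized, so passing
    # the concrete ids yields a flat IN (...) clause with no subquery.
    ids = [item["id"] for item in items]
    return {
        a["contentnode"]: a
        for a in models.AssessmentMetaData.objects.filter(
            contentnode__in=ids
        ).values("contentnode", "assessment_item_ids")
    }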
# for topic nodes, this is the union of all languages of all descendant nodes
# and any language set on the topic node itself
# We do this to allow filtering of a topic tree by a specific language for
# multi-language channels.
I think here I'm understanding the why but not the how of the M2M field. Meaning, I understand what's trying to happen and that the M2M field choice facilitates it, but I can't fully wrap my mind around why it has to be a many-to-many.
Because languages live in a separate table, and each contentnode can contain multiple languages. So it needs to be a many-to-many field: for every content node, we could be keying to multiple languages, and every language can be keyed to multiple content nodes.
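A minimal model sketch of that shape; field and table names here are illustrative, not the exact Kolibri definitions:

from django.db import models


class Language(models.Model):
    # e.g. "fr-fr"; lang_code is the part before the dash, e.g. "fr"
    id = models.CharField(max_length=14, primary_key=True)
    lang_code = models.CharField(max_length=3, db_index=True)


class ContentNode(models.Model):
    # The node's own (primary) language; topics may leave this unset.
    lang = models.ForeignKey(
        Language, null=True, blank=True, on_delete=models.SET_NULL
    )
    # Union of languages across the node and all of its descendants.
    # Each node can include many languages, and each language is shared
    # by many nodes, so neither side can hold a single foreign key.
    included_languages = models.ManyToManyField(
        Language, related_name="included_in_contentnodes", blank=True
    )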
    lang=test_language,
)

# This should not raise an IntegrityError
why not?
The M2M through table is uniquely keyed by contentnode_id and language_id, so it would raise an IntegrityError if we tried to create a duplicate entry, as we might if there were multiple children or grandchildren with the same language and we were doing this in a way that failed to account for that.
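To make the failure mode concrete, a sketch assuming an MPTT-style get_descendants() and the model names from the sketch above; this is not the PR's actual annotation code:

# The through table is uniquely keyed on (contentnode_id, language_id),
# so inserting one row per descendant would collide whenever two
# children or grandchildren share a language. Deduplicating the pairs
# first (plus ignore_conflicts as a belt-and-braces guard) avoids the
# IntegrityError.
IncludedLanguage = ContentNode.included_languages.through


def annotate_node_languages(node):
    pairs = {
        (node.id, descendant.lang_id)
        for descendant in node.get_descendants(include_self=True)
        if descendant.lang_id is not None
    }
    IncludedLanguage.objects.bulk_create(
        [
            IncludedLanguage(contentnode_id=node_id, language_id=lang_id)
            for node_id, lang_id in pairs
        ],
        ignore_conflicts=True,
    )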
lang_direction="ltr", | ||
) | ||
|
||
# Create two child nodes with different languages |
so these nodes are different from each other but also different from the parent language
The parent language kind of becomes superfluous here, because having a topic say it is in a specific language when none of its descendants are in that language is just bad metadata - so we do it based entirely on the descendant recursion.
@@ -583,6 +584,31 @@ def recurse_annotation_up_tree(channel_id):
    )
)

+ # First clear out all existing language relationships for nodes in this channel
where would these pre-existing relationships on the nodes have come from?
From a previous annotation pass - we run annotation any time we import additional resources into a channel.
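So, schematically, each import would run something like the following (names assumed, reusing the through-table alias from the sketch above):

def reannotate_channel_languages(channel_id):
    # First clear out all existing language relationships for nodes in
    # this channel, so that rows written by a previous import's
    # annotation pass don't linger for nodes whose languages changed.
    IncludedLanguage.objects.filter(
        contentnode__channel_id=channel_id
    ).delete()
    # ...then repopulate by recursing back up the tree, as in
    # recurse_annotation_up_tree above.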
@@ -9,4 +9,4 @@ django-debug-toolbar==4.3.0
drf-yasg==1.21.7
coreapi==2.3.3
nodeenv==1.3.3
-sqlacodegen==2.1.0
+sqlacodegen==2.3.0.post1
is this just a tangential update? Or is it actually related to this commit?
This was to make sure we had the latest version for recreating the schema.
["content_contentnode.id"], | ||
deferrable=True, | ||
initially="DEFERRED", | ||
), |
I don't know why this was deleted
Because they should never have been added in the first place - adding these constraints into the SQLAlchemy schema adds additional overhead to our channel import code, and they are just a consequence of Django 3.2 now doing foreign key integrity checks on SQLite. They also break tests when additional things are added to the schema.
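Concretely, the generated SQLAlchemy table keeps the column but drops the constraint object - a hypothetical before/after, with column types and names invented for illustration:

from sqlalchemy import Column, MetaData, String, Table

metadata = MetaData()

content_contentnode = Table(
    "content_contentnode",
    metadata,
    Column("id", String(32), primary_key=True),
    Column("parent_id", String(32), index=True),
    # Previously sqlacodegen also emitted a self-referential constraint
    # here, now intentionally omitted to skip SQLite FK integrity checks
    # during channel import:
    # ForeignKeyConstraint(
    #     ["parent_id"],
    #     ["content_contentnode.id"],
    #     deferrable=True,
    #     initially="DEFERRED",
    # ),
)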
On the manual QA side, the only real "bug" I've found is when navigating into a channel, which doesn't work at all if any language filter is applied.

Screen.Recording.2025-02-06.at.1.01.43.PM.mov (screen recording attachment)

What I was trying to do here (other than test) was to determine the "expected" behavior for the resources that are presented to the user once they are browsing the channel. I guess this one isn't a very good example, because it's not a multi-language channel, but if for example I had selected "French" and clicked into a channel that had French and Spanish resources, would the resources be (pre)filtered at this level? Because the query does have

This also breaks navigating into a channel under the "explore libraries" section, if any language filter is applied on the main library page (such as the default language to match the UI). However, the user is able to click on the "explore" button without error. And, once I'm within the "explore libraries" page, I can click on a channel and view it without error. The URLs are different vs. clicking directly on a channel from the library page.
Lastly, I do think that we talked about this, but now that I'm testing it out, I am a little bit surprised by the channel language filtering on the explore libraries experience.
I don't think any of this is blocking, because right now I'm sort of going on "well, I don't know if this makes the most sense", but I would just appreciate @jtamiace taking a look and seeing what she thinks. I know there are some technical constraints that we're operating under, which is again why I don't think any of this is blocking. I think I'm just wondering about what is the best combination of "shared filters" vs. "default and not accidentally confusing people", and I do not have an answer.
I don't think I touched the logic here, no - the main issue is that with the 'frontend applied' language filtering, it is only filtering by languages available in the currently active library (in this case, the library on the local Kolibri). If we filter the previewed libraries by those languages, then they could very well appear completely empty, or they might appear subtly deficient - e.g. we only have fr-fr on the local device, but the remote device only has resources in fr-ca. Doing the filtering on the backend would resolve this, but we would then need to add an additional language filter on the backend that accepts just a language code (with no sub code) and does the relevant filtering. This would break backwards compatibility, but would probably be a better experience.
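If that backend filter were added, it might look roughly like this - a hypothetical django-filter sketch, not code from this PR:

from django.db.models import Q
from django_filters import rest_framework as filters


class ContentNodeFilter(filters.FilterSet):
    # Accepts a bare code like "fr" and matches "fr", "fr-fr", "fr-ca", etc.
    lang_code = filters.CharFilter(method="filter_lang_code")

    def filter_lang_code(self, queryset, name, value):
        code = value.split("-")[0]
        return queryset.filter(
            Q(lang__lang_code=code)
            | Q(included_languages__lang_code=code)
        ).distinct()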
Yes, although this is probably already true for any situations where there isn't an internet connection and there are limited or no LAN devices. I guess I am somewhat less concerned with the "preview" on the home page (although maybe we could add a string especially for the Kolibri library that makes it a bit more clear that it is the entire Kolibri public content library that is being previewed). But I feel more inclined to push for having the same language filter as on the library page apply to the "explore libraries" page (i.e. fr-fr if applied, even when the default language is English) ... would this also require the backend filtering?
Kind of - unless we are looking at a specific library source, we don't have labels for every "included language" in a channel (i.e. a language that is in a channel but is not the primary language), because the API only returns the language id for the included language. So we can't populate a complete list of selectable languages unless we add an additional fetch to every remote library to get all the language labels. Ultimately, I think switching to backend filtering is probably better, even with the backwards incompatibility, simply because it will reduce the awkward frontend data processing we have to do (and the weird await I had to add to wait for the search labels to load before we are able to fetch channels). We can also optimize the backend filtering pretty well, compared to comparing against three language codes.
Summary

Implements a channel language filter. This filters on the language property of the channel/root node metadata, as the included_languages metadata currently only includes the language id and not the name.

References

Fixes #12702

Reviewer guidance

Open questions:
- How should filtering behave for es channels vs. es-* channels?

After updates:
The main open question is around how we are doing the filtering - currently, we use the fact that we are already fetching all the language metadata in the frontend to then send all relevant language codes to the backend in the API request.
This is a neat solution, but does mean we are adding an explicit chain step to our request waterfall - i.e. we have to wait until we've fetched the global search labels before we can fetch anything else.
The other alternative we could do there would be to instead just leverage the fact that we only really care about matching on the lang_code attribute - the part before the dash. We could add a new filter query to the API endpoint and then do matching on the backend against that.

In either case, we could potentially also still improve query performance here by making a language bitmask for the most frequently occurring languages, and annotating nodes for quicker lookups - falling back to matching against the M2M included_languages when requesting languages not included in the bitmask.
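To illustrate that bitmask idea (the bit assignments, the lang_bitmask column name, and the fallback are all hypothetical):

from django.db.models import F

# One bit per frequently occurring language. A precomputed integer
# column (lang_bitmask, the OR of the bits for all included languages)
# would be written onto each node at import/annotation time.
FREQUENT_LANG_BITS = {"en": 1 << 0, "es": 1 << 1, "fr": 1 << 2, "ar": 1 << 3}


def filter_by_lang_code(queryset, code):
    bit = FREQUENT_LANG_BITS.get(code)
    if bit is not None:
        # A single integer AND - no join against the M2M table needed.
        return queryset.annotate(
            _lang_match=F("lang_bitmask").bitand(bit)
        ).filter(_lang_match__gt=0)
    # Long-tail languages fall back to the M2M included_languages join.
    return queryset.filter(included_languages__lang_code=code).distinct()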