Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement a channel language filter. #12892

Draft
wants to merge 3 commits into
base: develop
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 12 additions & 8 deletions kolibri/core/content/api.py
Original file line number Diff line number Diff line change
Expand Up @@ -258,12 +258,19 @@ def list(self, request, *args, **kwargs):
return super(RemoteViewSet, self).list(request, *args, **kwargs)


class CharInFilter(BaseInFilter, CharFilter):
pass


class ChannelMetadataFilter(FilterSet):
available = BooleanFilter(method="filter_available", label="Available")
contains_exercise = BooleanFilter(
method="filter_contains_exercise", label="Has exercises"
)
contains_quiz = BooleanFilter(method="filter_contains_quiz", label="Has quizzes")
languages = CharInFilter(
field_name="included_languages", label="Languages", distinct=True
)

class Meta:
model = models.ChannelMetadata
Expand Down Expand Up @@ -417,10 +424,6 @@ class UUIDInFilter(BaseInFilter, UUIDFilter):
pass


class CharInFilter(BaseInFilter, CharFilter):
pass


contentnode_filter_fields = [
"parent",
"parent__isnull",
Expand Down Expand Up @@ -470,7 +473,7 @@ class ContentNodeFilter(FilterSet):
learner_needs = CharFilter(method="bitmask_contains_and")
keywords = CharFilter(method="filter_keywords")
channels = UUIDInFilter(field_name="channel_id")
languages = CharInFilter(field_name="lang_id")
languages = CharInFilter(field_name="included_languages")
categories__isnull = BooleanFilter(field_name="categories", lookup_expr="isnull")
lft__gt = NumberFilter(field_name="lft", lookup_expr="gt")
rght__lt = NumberFilter(field_name="rght", lookup_expr="lt")
Expand Down Expand Up @@ -671,10 +674,11 @@ def get_queryset(self):
return models.ContentNode.objects.filter(available=True)

def get_related_data_maps(self, items, queryset):
ids = [item["id"] for item in items]
assessmentmetadata_map = {
a["contentnode"]: a
for a in models.AssessmentMetaData.objects.filter(
contentnode__in=queryset
contentnode__in=ids
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During the PR review today, we were discussing why the queryset was replaced with this ids array. Some ideas were that the queryset had never been required, since we had the items array, and items might be a smaller set than the queryset. Or, that this might be an optimization. Is that right? were there other motivations here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was an optimization, yes. Because we've already got the items query from the DB, doing the subquery with the queryset, rather than just passing in the materialized ids was slowing down the query somewhat. As the content node endpoint is only performant when we do limited queries (and hence we almost always paginate requests to it) - the risk of there being too many ids is mitigated.

).values(
"assessment_item_ids",
"number_of_assessments",
Expand All @@ -688,7 +692,7 @@ def get_related_data_maps(self, items, queryset):
files_map = {}

files = list(
models.File.objects.filter(contentnode__in=queryset).values(
models.File.objects.filter(contentnode__in=ids).values(
"id",
"contentnode",
"local_file__id",
Expand Down Expand Up @@ -723,7 +727,7 @@ def get_related_data_maps(self, items, queryset):
tags_map = {}

for t in (
models.ContentTag.objects.filter(tagged_content__in=queryset)
models.ContentTag.objects.filter(tagged_content__in=ids)
.values(
"tag_name",
"tagged_content",
Expand Down
129 changes: 31 additions & 98 deletions kolibri/core/content/contentschema/versions/content_schema_current.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,17 +2,13 @@
from sqlalchemy import BigInteger
from sqlalchemy import Boolean
from sqlalchemy import CHAR
from sqlalchemy import CheckConstraint
from sqlalchemy import Column
from sqlalchemy import Float
from sqlalchemy import ForeignKey
from sqlalchemy import ForeignKeyConstraint
from sqlalchemy import Index
from sqlalchemy import Integer
from sqlalchemy import String
from sqlalchemy import Text
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship

Base = declarative_base()
metadata = Base.metadata
Expand Down Expand Up @@ -47,34 +43,17 @@ class ContentLocalfile(Base):
class ContentContentnode(Base):
__tablename__ = "content_contentnode"
__table_args__ = (
CheckConstraint("lft >= 0"),
CheckConstraint("tree_id >= 0"),
CheckConstraint("level >= 0"),
CheckConstraint("duration >= 0"),
CheckConstraint("rght >= 0"),
ForeignKeyConstraint(
["lang_id"],
["content_language.id"],
deferrable=True,
initially="DEFERRED",
),
ForeignKeyConstraint(
["parent_id"],
["content_contentnode.id"],
deferrable=True,
initially="DEFERRED",
),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why this was deleted

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because they should never have been added in the first place - adding these constraints into the sqlalchemy schema adds additional overhead to our channel import code, and is just a consequence of Django 3.2 doing foreign key integrity checks on SQLite now.

They also break tests when additional things are added to the schema.

Index(
"content_contentnode_level_channel_id_available_29f0bb18_idx",
"content_contentnode_level_channel_id_kind_fd732cc4_idx",
"level",
"channel_id",
"available",
"kind",
),
Index(
"content_contentnode_level_channel_id_kind_fd732cc4_idx",
"content_contentnode_level_channel_id_available_29f0bb18_idx",
"level",
"channel_id",
"kind",
"available",
),
)

Expand All @@ -89,6 +68,7 @@ class ContentContentnode(Base):
kind = Column(String(200), nullable=False)
available = Column(Boolean, nullable=False)
lft = Column(Integer, nullable=False)
rght = Column(Integer, nullable=False)
tree_id = Column(Integer, nullable=False, index=True)
level = Column(Integer, nullable=False)
lang_id = Column(String(14), index=True)
Expand All @@ -112,12 +92,8 @@ class ContentContentnode(Base):
learning_activities_bitmask_0 = Column(BigInteger)
ancestors = Column(Text)
admin_imported = Column(Boolean)
rght = Column(Integer, nullable=False)
parent_id = Column(CHAR(32), index=True)

lang = relationship("ContentLanguage")
parent = relationship("ContentContentnode", remote_side=[id])


class ContentAssessmentmetadata(Base):
__tablename__ = "content_assessmentmetadata"
Expand All @@ -128,22 +104,11 @@ class ContentAssessmentmetadata(Base):
mastery_model = Column(Text, nullable=False)
randomize = Column(Boolean, nullable=False)
is_manipulable = Column(Boolean, nullable=False)
contentnode_id = Column(
ForeignKey("content_contentnode.id"), nullable=False, index=True
)

contentnode = relationship("ContentContentnode")
contentnode_id = Column(CHAR(32), nullable=False, index=True)


class ContentChannelmetadata(Base):
__tablename__ = "content_channelmetadata"
__table_args__ = (
CheckConstraint('"order" >= 0'),
ForeignKeyConstraint(
["root_id"],
["content_contentnode.id"],
),
)

id = Column(CHAR(32), primary_key=True)
name = Column(String(200), nullable=False)
Expand All @@ -163,8 +128,6 @@ class ContentChannelmetadata(Base):
included_categories = Column(Text)
included_grade_levels = Column(Text)

root = relationship("ContentContentnode")


class ContentContentnodeHasPrerequisite(Base):
__tablename__ = "content_contentnode_has_prerequisite"
Expand All @@ -178,22 +141,25 @@ class ContentContentnodeHasPrerequisite(Base):
)

id = Column(Integer, primary_key=True)
from_contentnode_id = Column(
ForeignKey("content_contentnode.id"), nullable=False, index=True
)
to_contentnode_id = Column(
ForeignKey("content_contentnode.id"), nullable=False, index=True
)
from_contentnode_id = Column(CHAR(32), nullable=False, index=True)
to_contentnode_id = Column(CHAR(32), nullable=False, index=True)

from_contentnode = relationship(
"ContentContentnode",
primaryjoin="ContentContentnodeHasPrerequisite.from_contentnode_id == ContentContentnode.id",
)
to_contentnode = relationship(
"ContentContentnode",
primaryjoin="ContentContentnodeHasPrerequisite.to_contentnode_id == ContentContentnode.id",

class ContentContentnodeIncludedLanguages(Base):
__tablename__ = "content_contentnode_included_languages"
__table_args__ = (
Index(
"content_contentnode_included_languages_contentnode_id_language_id_7d14ec8b_uniq",
"contentnode_id",
"language_id",
unique=True,
),
)

id = Column(Integer, primary_key=True)
contentnode_id = Column(CHAR(32), nullable=False, index=True)
language_id = Column(String(14), nullable=False, index=True)


class ContentContentnodeRelated(Base):
__tablename__ = "content_contentnode_related"
Expand All @@ -207,21 +173,8 @@ class ContentContentnodeRelated(Base):
)

id = Column(Integer, primary_key=True)
from_contentnode_id = Column(
ForeignKey("content_contentnode.id"), nullable=False, index=True
)
to_contentnode_id = Column(
ForeignKey("content_contentnode.id"), nullable=False, index=True
)

from_contentnode = relationship(
"ContentContentnode",
primaryjoin="ContentContentnodeRelated.from_contentnode_id == ContentContentnode.id",
)
to_contentnode = relationship(
"ContentContentnode",
primaryjoin="ContentContentnodeRelated.to_contentnode_id == ContentContentnode.id",
)
from_contentnode_id = Column(CHAR(32), nullable=False, index=True)
to_contentnode_id = Column(CHAR(32), nullable=False, index=True)


class ContentContentnodeTags(Base):
Expand All @@ -236,15 +189,8 @@ class ContentContentnodeTags(Base):
)

id = Column(Integer, primary_key=True)
contentnode_id = Column(
ForeignKey("content_contentnode.id"), nullable=False, index=True
)
contenttag_id = Column(
ForeignKey("content_contenttag.id"), nullable=False, index=True
)

contentnode = relationship("ContentContentnode")
contenttag = relationship("ContentContenttag")
contentnode_id = Column(CHAR(32), nullable=False, index=True)
contenttag_id = Column(CHAR(32), nullable=False, index=True)


class ContentFile(Base):
Expand All @@ -254,19 +200,11 @@ class ContentFile(Base):
supplementary = Column(Boolean, nullable=False)
thumbnail = Column(Boolean, nullable=False)
priority = Column(Integer, index=True)
contentnode_id = Column(
ForeignKey("content_contentnode.id"), nullable=False, index=True
)
lang_id = Column(ForeignKey("content_language.id"), index=True)
local_file_id = Column(
ForeignKey("content_localfile.id"), nullable=False, index=True
)
contentnode_id = Column(CHAR(32), nullable=False, index=True)
lang_id = Column(String(14), index=True)
local_file_id = Column(String(32), nullable=False, index=True)
preset = Column(String(150), nullable=False)

contentnode = relationship("ContentContentnode")
lang = relationship("ContentLanguage")
local_file = relationship("ContentLocalfile")


class ContentChannelmetadataIncludedLanguages(Base):
__tablename__ = "content_channelmetadata_included_languages"
Expand All @@ -280,11 +218,6 @@ class ContentChannelmetadataIncludedLanguages(Base):
)

id = Column(Integer, primary_key=True)
channelmetadata_id = Column(
ForeignKey("content_channelmetadata.id"), nullable=False, index=True
)
language_id = Column(ForeignKey("content_language.id"), nullable=False, index=True)
sort_value = Column(Integer, nullable=False)

channelmetadata = relationship("ContentChannelmetadata")
language = relationship("ContentLanguage")
channelmetadata_id = Column(CHAR(32), nullable=False, index=True)
language_id = Column(String(14), nullable=False, index=True)
Original file line number Diff line number Diff line change
Expand Up @@ -134,7 +134,7 @@ def handle(self, *args, **options):
metadata.bind = engine

generator = CodeGenerator(
metadata, False, False, True, True, False, nocomments=False
metadata, False, True, True, True, False, nocomments=False
)

with io.open(
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
# Generated by Django 3.2.25 on 2024-12-18 00:14
from django.db import migrations
from django.db import models


class Migration(migrations.Migration):

dependencies = [
("content", "0039_channelmetadata_ordered_fields"),
]

operations = [
migrations.AddField(
model_name="contentnode",
name="included_languages",
field=models.ManyToManyField(
blank=True,
related_name="contentnodes",
to="content.Language",
verbose_name="languages",
),
),
]
10 changes: 10 additions & 0 deletions kolibri/core/content/models.py
Original file line number Diff line number Diff line change
Expand Up @@ -215,6 +215,16 @@ class ContentNode(base_models.ContentNode):
# needs a subsequent Kolibri upgrade step to backfill these values.
admin_imported = models.BooleanField(null=True)

# Languages that are in this node and/or any descendant nodes of this node
# for non-topic nodes, this is the language of the node itself
# for topic nodes, this is the union of all languages of all descendant nodes
# and any language set on the topic node itself
# We do this to allow filtering of a topic tree by a specific language for
# multi-language channels.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here I'm understanding the why but not the how of the M2M field. Meaning, I understand what's trying to happen and that the M2M field choice facilitates it, but I can't fully wrap my mind around why it has to be a many to many.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because languages are a separate table - and each contentnode can contain multiple languages. So it needs to be a many to many field, because for every content node, we could be keying to multiple other languages (and vice versa).

included_languages = models.ManyToManyField(
"Language", related_name="contentnodes", verbose_name="languages", blank=True
)

objects = ContentNodeManager()

class Meta:
Expand Down
Loading