Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes for v2 content libs #143

Merged
merged 1 commit into from
Jan 31, 2024

Conversation

ormsbee
Copy link
Contributor

@ormsbee ormsbee commented Jan 18, 2024

Work to support openedx/edx-platform#34066

This bumps the version for openedx-learning to 0.5.0.

Some highlights:

  • new API functions and queries needed for draft/publish comparisons, fetching & creating component versions, etc.
  • re-created the migrations yet again, since this is the last time I should be able to get away with it before it gets installed into edx-platform.
  • made some fixes that I realized would bite us later (like making the text collation text insensitive for search purposes later).
  • new custom manager to make it easier to specify common "select_related" queries on models, first used on Component.
  • a few performance tweaks based on looking at the queries that show up when using this in edx-platform in Switch v2 libraries to Learning Core data models edx-platform#34066 (mostly getting rid of n+1 queries)

Squashed commit message:

This commit adds a number of features needed to accomodate content
libraries. These are all collected together because this work was done
while iteratively making more and more of libraries functional, and it
was difficult to properly separate out later.

This bumps the version to 0.5.0.

New publishing API functions:

* Learning Packages
    * get_learning_package
    * get_learning_package_by_key,
    * update_learning_package

* Publishable Entities
    * get_publishable_entity and
    * get_publishable_entity_by_key

* Draft/Publishing
    * get_last_publish
    * get_all_drafts
    * get_entities_with_unpublished_changes
    * get_entities_with_unpublished_deletes
    * get_published_version
    * set_draft_version
    * soft_delete_draft
    * reset_drafts_to_published

New Components API functions:

* create_next_version
* get_component_by_key
* component_exists_by_key
* get_components

PublishableEntityMixin improvements:

* New property methods: latest, has_unpublished_changes
* Existing methods were improved to use model-cached values.

Encoding fixes:

* collations for multiple fields were changed to be case insensitive in
  order to better support simple text searches.

Model optimizations:

* created the WithRelationsManager to more easily define querysets that
  involve adding select_related fields. This was used to elininate a
  number of n+1 queries that came up during libraries development.
* PublishableEntityVersion.version_num was reduced to a 4-byte
  PositiveIntegerField instead of using the 8-byte default for
  openedx_learning apps (it's assumed we will never make more than 2
  billion versions of a particular PublishableEntity). This was done to
  help reduce the size of indexes.

Typing:

* type annotations were added to many functions and test functions.

@ormsbee
Copy link
Contributor Author

ormsbee commented Jan 22, 2024

Okay, I think I've gotten all the basic changes in there. Now to fix old tests and add new ones for the new functionality. Then a final re-base and version bump.

@ormsbee ormsbee force-pushed the changes-for-v2-content-libs branch from 7e5546a to 979d6cc Compare January 22, 2024 23:38
@ormsbee
Copy link
Contributor Author

ormsbee commented Jan 22, 2024

Old tests are fixed, added a few new ones, still some more to go. I'll deal with mypy last.

@ormsbee ormsbee marked this pull request as ready for review January 23, 2024 00:07
@ormsbee
Copy link
Contributor Author

ormsbee commented Jan 23, 2024

@kdmccormick, @bradenmacdonald: If you have some time, please give this a review pass. The tests aren't fully there yet, but I'd appreciate your thoughts on the API additions especially.

Copy link
Contributor

@bradenmacdonald bradenmacdonald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These API changes look great @ormsbee ! All my notes here are very minor nits, so feel free to ignore.

openedx_learning/core/components/api.py Outdated Show resolved Hide resolved
openedx_learning/core/components/api.py Show resolved Hide resolved
def get_component_by_pk(component_pk: int) -> Component:
return Component.objects.get(pk=component_pk)
def get_component_by_key(
learning_package_id: int,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit/Q: Is it inconsistent that Learning Package PK is suffixed _id whereas component PKs are suffixed _pk in parameter names in this API? Or is there some difference related to what types of ID each thing has?

learning_package_id: int vs. component_pk: int

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Maybe I should stick with _pk for everything?

The reason why it's not component_id is because there is no Component.id field–Components have a OneToOneField relationship to PublishableEntity with primary_key=True. So the actual database field is oel_components_component.publishable_entity_id.

So maybe I should just convert everything to be _pk for consistency instead, since learning_package.pk will just be an alias to learning_package.id anyway. But one of the things that I didn't think through at the time is that if you have a foreign key relation to Component from another model, the db field will be component_id, and component_version.component_pk doesn't work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I don't think it's a big deal either way, but I'd probably lean toward calling it component_id for consistency, even though Component.id doesn't exist.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ormsbee @bradenmacdonald Consider me a soft vote for _pk. In edx-platform, _id could indicate all sorts of things: primary db keys, opaque keys (course_id, library_id, usage_id), and other things (block_id, user_id). _pk is unambiguously a primary db key.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm going to go with _pk everywhere and do that refactoring now.

openedx_learning/core/components/api.py Outdated Show resolved Hide resolved
openedx_learning/core/components/api.py Outdated Show resolved Hide resolved
openedx_learning/core/publishing/api.py Outdated Show resolved Hide resolved
.filter(learning_package_id=learning_package_id) \
.exclude(draft__version=F('published__version'))

def get_entities_with_unpublished_deletes(learning_package_id: int):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the intended use case of this API method? Perhaps a widget in the UI that says something like "These things have been deleted since your last publish: ____" ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure, actually. It's returned as part of the metadata in the get_library call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the new code, I just run .exists() on the queryset returned here.

tests/openedx_learning/core/components/test_api.py Outdated Show resolved Hide resolved
@ormsbee ormsbee force-pushed the changes-for-v2-content-libs branch from 3207984 to f62f8c5 Compare January 29, 2024 16:18
@ormsbee
Copy link
Contributor Author

ormsbee commented Jan 30, 2024

Okay, I wrote the tests, made changes based on the review comments, and the linters are appeased. I'm going to do the _pk refactoring now, but with any luck I'm hoping this can merge today.

@ormsbee
Copy link
Contributor Author

ormsbee commented Jan 30, 2024

Okay, slight delay because I noticed that it's always doing the work of loading all components no matter the pagination params in the REST API. I think this is a problem on the edx-platform side, but I'm looking into it to be sure.

Copy link
Member

@kdmccormick kdmccormick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed all the apis, but had to sign off before getting to the mixins and model managers. Looking great! No blockers from me.

openedx_learning/core/components/api.py Outdated Show resolved Hide resolved
openedx_learning/core/components/api.py Show resolved Hide resolved
@ormsbee ormsbee force-pushed the changes-for-v2-content-libs branch from f5a042f to d36ec08 Compare January 30, 2024 23:17
@ormsbee
Copy link
Contributor Author

ormsbee commented Jan 30, 2024

Okay, I wrote the tests, made changes based on the review comments, and the linters are appeased. I'm going to do the _pk refactoring now, but with any luck I'm hoping this can merge today.

I went back and forth on _id vs. _pk for a bit and then decided to punt entirely by making these arguments positional-only for the moment. >_<

@ormsbee
Copy link
Contributor Author

ormsbee commented Jan 31, 2024

Also, I tracked down the query issue to something in edx-platform, and I'll have an update for that PR with the fix later. Part of it points to an issue that our internal APIs have with QuerySets and pagination. Namely, the pattern for DRF and Django in general is QuerySet -> Pagination -> Serialization. So it gets awkward when we're returning something from an API that we want to be QuerySet-like in that it's filterable and pageable, but also presents something that is not actually a model. I hacked up something before that wrapped the __iter()__ call and did serialization that seemed to work, but I feel like that's probably too much hackery to want to support in the longer term.

Anyhow, I'll discuss it more on the other PR.

@ormsbee
Copy link
Contributor Author

ormsbee commented Jan 31, 2024

It occurred to me while looking at some recent comments in the wiki w.r.t. library content needs that we probably want to have ComponentType represented as its own model. Partly for saving space, but mostly because it gives us something to later hang policy off of on a per-type basis, e.g. "videos are okay to show in libraries, SGA blocks are experimental for library use".

I implemented that this evening. It was pleasantly straightforward, and it also gets rid of our having to tell pylint to ignore our use of type as a variable name. Tests already pass, so hopefully I can just do it as a fast-follow to this PR tomorrow if this merges.

@ormsbee
Copy link
Contributor Author

ormsbee commented Jan 31, 2024

BTW, the only reason this build isn't green is because I was a doofus and mis-spelled a couple of conventional commit messages. Please know that it's ready for final review, and I'll squash it at the end.

@ormsbee ormsbee mentioned this pull request Jan 31, 2024
@kdmccormick kdmccormick self-assigned this Jan 31, 2024
Copy link
Contributor

@bradenmacdonald bradenmacdonald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are nice API changes and the code looks good to me!

I didn't run/test the code - let me know if you'd like me to.

@bradenmacdonald
Copy link
Contributor

Question that's not directly related: Can you use add_content_to_component_version to modify a ComponentVersion that has already been published?

@ormsbee ormsbee force-pushed the changes-for-v2-content-libs branch from 9695377 to ad6a897 Compare January 31, 2024 19:58
This commit adds a number of features needed to accomodate content
libraries. These are all collected together because this work was done
while iteratively making more and more of libraries functional, and it
was difficult to properly separate out later.

This bumps the version to 0.5.0.

New publishing API functions:

* Learning Packages
    * get_learning_package
    * get_learning_package_by_key,
    * update_learning_package

* Publishable Entities
    * get_publishable_entity and
    * get_publishable_entity_by_key

* Draft/Publishing
    * get_last_publish
    * get_all_drafts
    * get_entities_with_unpublished_changes
    * get_entities_with_unpublished_deletes
    * get_published_version
    * set_draft_version
    * soft_delete_draft
    * reset_drafts_to_published

New Components API functions:

* create_next_version
* get_component_by_key
* component_exists_by_key
* get_components

PublishableEntityMixin improvements:

* New property methods: latest, has_unpublished_changes
* Existing methods were improved to use model-cached values.

Encoding fixes:

* collations for multiple fields were changed to be case insensitive in
  order to better support simple text searches.

Model optimizations:

* created the WithRelationsManager to more easily define querysets that
  involve adding select_related fields. This was used to elininate a
  number of n+1 queries that came up during libraries development.
* PublishableEntityVersion.version_num was reduced to a 4-byte
  PositiveIntegerField instead of using the 8-byte default for
  openedx_learning apps (it's assumed we will never make more than 2
  billion versions of a particular PublishableEntity). This was done to
  help reduce the size of indexes.

Typing:

* type annotations were added to many functions and test functions.
@ormsbee ormsbee force-pushed the changes-for-v2-content-libs branch from ad6a897 to 68a32d8 Compare January 31, 2024 20:00
@ormsbee
Copy link
Contributor Author

ormsbee commented Jan 31, 2024

Thanks folks! I squashed my commits and made a decent commit message out of it. Will merge once things are green.

@ormsbee ormsbee merged commit 2ec2918 into openedx:main Jan 31, 2024
7 checks passed
@ormsbee ormsbee deleted the changes-for-v2-content-libs branch January 31, 2024 20:16
@ormsbee
Copy link
Contributor Author

ormsbee commented Jan 31, 2024

Question that's not directly related: Can you use add_content_to_component_version to modify a ComponentVersion that has already been published?

Yes, you can. 95% of the time you shouldn't, but some real use cases came up where doing so seems like it would be beneficial:

  • Bug fixes: We expect people to attach data to ComponentVersions, but that code will have bugs at some point, and people will want to be able to correct that programmatically without creating a new "version" of that thing.
  • Data upgrade/migration: Say we have tens of thousands of Videos with SRT files, and we want to bulk upgrade everything to VTT files instead. We'd probably update the player to recognize both (with a preference for VTT), and then bulk convert SRT to VTT where VTT is absent.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

3 participants