Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add collections #1824

Merged
merged 5 commits into from
Oct 3, 2024
Merged

Conversation

alejandromumo
Copy link
Member

closes #1820

invenio_rdm_records/collections/models.py Outdated Show resolved Hide resolved
invenio_rdm_records/collections/models.py Outdated Show resolved Hide resolved
invenio_rdm_records/collections/models.py Outdated Show resolved Hide resolved
invenio_rdm_records/collections/models.py Outdated Show resolved Hide resolved
invenio_rdm_records/collections/models.py Outdated Show resolved Hide resolved
ret = db.session.execute(stmt).scalars().all()
return [type(self)(r) for r in ret]

def get_subcollections(self, max_depth=3):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this makes sense to have in CollectionTree API

invenio_rdm_records/collections/api.py Show resolved Hide resolved
invenio_rdm_records/collections/api.py Show resolved Hide resolved

def create(self, identity, community_id, tree_slug, slug, title, query, **kwargs):
"""Create a new collection."""
current_communities.service.require_permission(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it has a REST API, permissions could be added directly to the service so they can e.g. be overwritten by Zenodo to System

Comment on lines +82 to +108
def read(self, identity, id_):
"""Get a collection by ID or slug."""
collection = self.collection_cls.resolve(id_)
if not collection:
raise ValueError(f"Collection {id_} not found.")
if collection.community:
current_communities.service.require_permission(
identity, "read", community_id=collection.community.id
)

return CollectionItem(collection)

def read_slug(self, identity, community_id, tree_slug, slug):
"""Get a collection by slug."""
current_communities.service.require_permission(
identity, "read", community_id=community_id
)

ctree = CollectionTree.get_by_slug(tree_slug, community_id)
if not ctree:
raise ValueError(f"Collection tree {tree_slug} not found.")

collection = self.collection_cls.resolve(slug, ctree.id, use_slug=True)
if not collection:
raise ValueError(f"Collection {slug} not found.")

return CollectionItem(collection)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

collapse into just one read that is smart enough, same as API get

consider depth as well

@classmethod
def get_by_slug(cls, slug, tree_id):
"""Get a collection by slug."""
return cls.query.filter(cls.slug == slug, cls.tree_id == tree_id).one_or_none()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we return None, or should it fail when it's not found?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the point of failing fast, however, I prefer to return None for separation of concerns. Whether the collection does not exist is an exception, I think it should be up to the business layer to decide. From a data point of view, it tried to fetch the collection but nothing was found.

I agree that we need to throw an exception at some point to have the presentation layer(s) acting accordingly (and that's currently missing in this PR). I will add that part in the next iteration since I refactored part of the public API / service layers from Alex's comments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe None works in this case, but you will see soon its limitations. With a None return value there are 2 main problems (and maybe more):

  1. every time you call the func, you need to check if the return value is None or not. And the subsequent code will have to handle this case. So, as in your PR, you end up having if everywhere. Not very elegant.
  2. even if less applicable in this func, in general you can't differentiate different type of unhappy path.

An exception is much more elegant because the caller will have many more options on how to handle the unhappy path.
I don't really understand what are concerns are better separate with returning None.

No problem with solving this later on, in subsequent PRs.

from .models import CollectionTree as CollectionTreeModel


class Collection:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure that this api layer is really needed, when the data model is not a classic record with a json schema? Can't the models be directly used in the service layer (as done in invenio-jobs for example)?
This class looks like a wrapper of the model, and the various methods can be moved to the model class.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On one hand, I agree this class feels like a wrapper to the model, and it could be richer (e.g. I and Alex discussed storing the relation between collections in memory and then they could be "navigated" in memory).

However, some ideas that made me implement a public API for collections:

  1. Abstract some concepts about how we store collections in DB (materialized path). In my opinion, for the developer, things like path should be abstracted and encapsulated under a "nice looking" API.
  2. Related to the one above, adding some public-facing utilities to improve the usability of collections, e.g. collection.ancestors, collection.children, create a collection based on a parent, resolving a collection by id or slug, community_id pair.
  3. dump / to_dict methods for serialization.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My 2 cents

I understand your reasoning. At the same time, with this approach, we will end up having a wrapper classes that have most of the attributes and methods duplicated from the encapsulated model. And then we add a couple of extra methods.
I would not know what is the separation of concerns or abstraction, and if I have to add a new feature (a new query), I would not know if I have to put in the API or in the Model. What is the criteria that tells me where to put it?

For the path, maybe, I don't see it for the moment in the class.
For your second point, it can be done in the model, I don't see the difference.

A model is already an API class. What has been done for records is actually very different and in that case very much needed: the model is loaded into a dict object, with many other extra methods.

If we start creating wrappers everywhere, we will end up having more complex code, and less DRY. As a comparison, it looks a little bit like the props drilling problem in React.

As an alternative, if you really feel like that we need to separate things, then in this case I would go with inheritance instead of composition.

To recap: I think that our final objective is to have the simplest possible code. If the majority of the APIs exposed by a model are intended to be used, I would use the model directly or create inheritance.
Composition and wrapping make sense to me when you want to hide the majority of the features provided by the wrapped object.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are excellent points, I will keep them in mind in the next PR.

Thanks for the detailed explanation 🚀


def create(self, identity, community_id, tree_slug, slug, title, query, **kwargs):
"""Create a new collection."""
current_communities.service.require_permission(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should introduce a collection specific permissions instead of using the communities' ones?

can_update default value is CommunityOwners, is this correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, conceptually, communities are not required to have a collection (think about a collection on the repository level). However, the current state of this service is to require a community (this is our use case right now).

I agree that the collections' permission policy at some point has to be specific for collections, but it still needs more thought and the service to be adapted to it.

For a first launch, the permission can_update from communities is good since we don't want any user of a community creating collections for a community

return CollectionItem(collection)

def read(self, identity, id_):
"""Get a collection by ID or slug."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""Get a collection by ID or slug."""
"""Get a collection by ID."""

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to be refactored in a second iteration when we add collections to the browse page. The idea is that service.read() is flexible enough that it can be done by ID or slug

collection = self.collection_cls.create(
slug=slug, title=title, query=query, ctree=ctree, **kwargs
)
return CollectionItem(collection)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create and add methods are missing the commit operations (for example uow.register(ModelCommitOp(...)))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be added now :)


def test_create(running_app, db, community, community_owner):
"""Test collection creation via API."""
tree = CollectionTree.create(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest changing the test to read from the DB when asserting the creation of the collection [tree]. Given that the db commit is missing, they should fail.

@alejandromumo alejandromumo merged commit ac37cf8 into inveniosoftware:master Oct 3, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add collections backend
3 participants