Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add category #239

Merged
merged 11 commits into from
Aug 30, 2022
Merged

feat: add category #239

merged 11 commits into from
Aug 30, 2022

Conversation

tlambert03
Copy link
Collaborator

@tlambert03 tlambert03 commented Aug 14, 2022

This reverts part of #38 and adds back a category field with a defined list of allowable values for use in in-app filtering.

see #239 (comment) below for clearer detail

@tlambert03 tlambert03 changed the title feat add category feat: add category Aug 14, 2022
@codecov
Copy link

codecov bot commented Aug 14, 2022

Codecov Report

Merging #239 (beda70d) into main (c057206) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##              main      #239   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           37        37           
  Lines         2785      2801   +16     
=========================================
+ Hits          2785      2801   +16     
Impacted Files Coverage Δ
npe2/manifest/schema.py 100.00% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@tlambert03
Copy link
Collaborator Author

@DragaDoncila and @nclack ... I wish i didn't have to request reviews from folks who also work on/with the hub team, since those review requests unfortunately don't get reciprocated in the other direction. but you're the folks who are paying attention to this repo. thank

@tlambert03
Copy link
Collaborator Author

In a followup pr, we should add a less constrained field (like "keywords") that could be any string. if desired, people could use it for the EDAM labels, i.e. EDAM-BIOIMAGING:Topic:Imaging:Fixed sample imaging

@tlambert03 tlambert03 requested review from jni and alisterburt August 14, 2022 15:02
@nclack
Copy link
Collaborator

nclack commented Aug 15, 2022

I feel like it would be better to associate the category with a command instead of the manifest. I think this is what vscode does? And it makes sense to me that I could have a "Segmentation" command and a "Visualization" command, and I'd want to be able to search for them separately.

In a followup pr, we should add a less constrained field (like "keywords") that could be any string.

This is probably the main thing I'm worried about here - tying the npe2 schema to a particular set of categories. I want it to be a string/list-of-string here with the salient categories determined by the use case on the napari side.

Copy link
Contributor

@alisterburt alisterburt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting! and nice point to raise @nclack - I think it's worth keeping the 'plugin category' separate from the 'command category' concept, many plugins may implement a few commands for denoising but I likely wouldn't choose this over a specialist denoising plugin when choosing which plugin to download.

I want it to be a string/list-of-string here with the salient categories determined by the use case on the napari side.

This makes sense to me, my only concern is that without providing a list of pre-defined categories in the spec we're likely to end up in a situation where the 'categories' users provide are less consistent (and therefore less useful) what do you think @tlambert03 ?

Given that this is proposed with a very specific UI goal in mind and I agree that that UI goal is better served by plugin categories than command categories I'm approving here

@tlambert03
Copy link
Collaborator Author

tlambert03 commented Aug 15, 2022

I feel like it would be better to associate the category with a command instead of the manifest. I think this is what vscode does?

vscode uses a relatively small, fixed set of categories: [Programming Languages, Snippets, Linters, Themes, Debuggers, Formatters, Keymaps, SCM Providers, Other, Extension Packs, Language Packs, Data Science, Machine Learning, Visualization, Notebooks, Education, Testing], at the plugin level, which can be used to filter plugins in the search

Screen Shot 2022-08-15 at 6 43 22 AM

I think the trick is being broad enough that we don't need dozens of categories, while being specific enough to be useful. I also don't want to restrict the possible scope here to BioImaging.

tying the npe2 schema to a particular set of categories.

I do get that concern, but share @alisterburt's thoughts that without having a predefined set of choices, it could go off the rails. The trick of course is in the selection, so would appreciate your thoughts on the current list, copied below for convenience (and we can add more in the future, but shouldn't go nuts). I came to this selection by looking through all the current plugins, and by looking a bit through Fiji (though again, don't want to restrict ourselves to that scope ultimately).

class Category(str, Enum):
    """Broad plugin categories, values for PluginManifest.categories."""

    Acquisition = "Acquisition"
    Annotation = "Annotation"
    Data = "Data"
    Denoising = "Denoising"
    Image_Processing = "Image Processing"
    IO = "IO"
    Machine_Learning = "Machine Learning"
    Measurement = "Measurement"
    Segmentation = "Segmentation"
    Themes = "Themes"
    Transformations = "Transformations"
    Utilities = "Utilities"
    Visualization = "Visualization"
    Simulation = "Simulation"

another field added later could be less constricted (such as keywords in the vscode schema)

@tlambert03
Copy link
Collaborator Author

And it makes sense to me that I could have a "Segmentation" command and a "Visualization" command, and I'd want to be able to search for them separately.

one more clarification: a plugin that offered both segmentation and visualization routines would just add both strings to their categories. Users filtering based on either would see that plugin. Then, at runtime (once in napari) we can do much more with command-specific searching. I'm not sure if feels right here to extend the command schema for this (i feel like it would lead to a tremendous amount of verbosity ... imagine Image Processing on every one of scikit-image's commands)

@nclack
Copy link
Collaborator

nclack commented Aug 15, 2022

vscode uses a relatively small, fixed set of categories

Thanks! I missed that. Will keep the discussion focused on plugin categories!

When I'm looking at the vscode extension categories, I think the goal is that the use is pretty well scoped so they don't evolve (much).

I think the trick is being broad enough that we don't need dozens of categories while being specific enough to be useful. I also don't want to restrict the possible scope here to BioImaging.

I like the intent. How confident are you feeling about the categories in the pr at the moment?

One of the most common vscode categories is "Other" but this pr doesn't have that.

I do get that concern, but share @alisterburt's thoughts that without having a predefined set of choices, it could go off the rails.

I'm more or less wondering where they should be validated. As long as there's validation somewhere, I imagine we'll be able to keep the set of categories well defined.

Also, I'm not sure how to define a minimally sufficient set of categories and so I'd like to punt the decision to somewhere closer to the use case.

One can't punt forever though and the vscode examples give me a pretty good idea of what to shoot for. So if I can't convince on the "move the decision toward the use-case" front (and it sounds like I can't 😄), I can review the proposed categories as is.

@tlambert03
Copy link
Collaborator Author

tlambert03 commented Aug 15, 2022

I think the goal is that the use is pretty well scoped so they don't evolve (much).

totally agree.

How confident are you feeling about the categories in the pr at the moment?

maybe 80%? 😂 I feel like they represent mostly non-overlapping applications that cover a broad range... but I also think it's likely that at least one more thing would need to be added. would definitely like more eyes on it (cc @jni, @sofroniewn, @kevinyamauchi, @DragaDoncila )

One of the most common vscode categories is "Other" but this pr doesn't have that.

Yeah, I noticed that too. I could add an "other" ... but I struggled to see the difference/merit between Other, and just nothing at all. Thoughts?

I'm more or less wondering where they should be validated. As long as there's validation somewhere, I imagine we'll be able to keep the set of categories well defined.

Yes, they're validated. by using an Enum, pydantic casts strings and fails if they are not one of the enumerations members.

So if I can't convince on the "move the decision toward the use-case" front (and it sounds like I can't 😄), I can review the proposed categories as is.

I'm happy to be moved! I'm not really following what the suggestion is though? The goal was definitely to try to pick categories that largely cover existing plugin use cases, while not restricting future cases or being to bio-focused.

@nclack
Copy link
Collaborator

nclack commented Aug 15, 2022

I struggled to see the difference/merit between Other, and just nothing at all. Thoughts?

Other just makes sure every plugin can be assigned at least one category.

I'm more or less wondering where they should be validated.

I meant will npe2 validate the categories or napari. The napari one is the one "closer to the use case".

This would mean that npe2 validate wouldn't be able to check if the manifest had valid categories. Instead, napari would be the source of truth and complain if the categories weren't right.

The more I write, the more I think I'm basically advocating for going off the rails (unstructured tagging). And I agree that's not so useful.

@tlambert03
Copy link
Collaborator Author

Other just makes sure every plugin can be assigned at least one category.

yes... but why? 😂 internally, what's the difference between categories = [] and categories = ['Other']?

@tlambert03
Copy link
Collaborator Author

I meant will npe2 validate the categories or napari. The napari one is the one "closer to the use case".

oh, I see. like napari would maintain a list, and npe2 couldn't validate them without napari? yeah, I think my preference is to have them over here.

@DragaDoncila
Copy link
Contributor

Honestly I really like it. I think having an other is meaningless, without providing the option to then enter free text, which is a no-go imo. I think the proposed categories are sufficiently broad to cover the current use cases, and sufficiently specific to be useful.

The only real question for me is if we think any of these categories would ever be removed, and as I look through them I think the answer is "probably not". If we do think some of them should be removed, then I suggest we err on the side of removing, since we can easily add categories back in.

I can imagine having to expand the list to continue getting meaningful filtering as the number of plugins grows, and perhaps also adding a hierarchy? A hierarchy would probably be the most annoying to make backwards compatible but it would definitely be doable, and potentially many of these categories would end up as roots of the hierarchy anyway.

Definitely would like to see the addition of keywords as general tags as well in a future PR - tags are the other category in some ways, and tracking common choices for keywords could give us an idea of the gaps in the currently offered categories

@tlambert03
Copy link
Collaborator Author

The only real question for me is if we think any of these categories would ever be removed, and as I look through them I think the answer is "probably not". If we do think some of them should be removed, then I suggest we err on the side of removing, since we can easily add categories back in.

agree completely, we should trim anything we're not completely happy with. Here are the ones that I'm less sure about, either in terms of inclusion or wording:

  • "Data" -> too abstract, should probably named something like "Sample Data" or "Datasets"?
  • "Denoising" -> necessary? too specific? Or should it be more broad, like "Filters"
  • "IO" -> too jargony? "Input/Output?" "File IO"?
  • Themes -> nothing really implements, and would be easy to add later
  • Transformations -> meant as umbrella for things like registrations as well as deskewing, etc... Good?
  • Utilities -> this is sort of the "other" category, it would be things like "helper widgets" and other things that could have been in core but aren't. Like robert's Layer Info widget

@tlambert03
Copy link
Collaborator Author

tlambert03 commented Aug 16, 2022

and perhaps also adding a hierarchy? A hierarchy would probably be the most annoying to make backwards compatible but it would definitely be doable,

also very much agree. Hierarchy's could be colon separated, Transformations:Registration:Non-linear

so i guess another critical look at the existing proposal from the perspective of "are these are roots" would be good

Copy link
Contributor

@kevinyamauchi kevinyamauchi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @tlambert03 ! Overall, this looks good to me.

From my perspective, the categories should all be at the same level in a hiearchy (i.e., none of the categories are a subset of another). In that lens, I find the inclusion of "Image Processing" and "Denoising" a bit confusing. It feels like "denoising" is a bit specific relative to a category like "Image Processing" and "Denoising" feels like an "Image Processing" operation. What types of plugins did have in mind for "Image Processing"?

I am wondering if "Denoising" and "Image Processing" could be removed and replaced with "filter" or "filtering" (or "Image Filtering"?)?

I don't feel super strongly about this, so happy to defer to the group.

edit: I just read the comment below regarding "roots". I don't think "Image Processing" would be at the same level of hierarchy as "Denoising". I could imagine some of the other categories would be children of "Image Processing" as well (e.g., "Segmentation", "Transformations"). I'm not sure if it makes sense to make "Image Processing" root or start at the level of more specific "Image Processing" operations (e.g., filtering, segmentation, denoising)

so i guess another critical look at the existing proposal from the perspective of "are these are roots" would be good

edit 2: actually after some thought, I think it will yield more usable search results for users if the root starts more granular than "image processing". Thus, it seems to me that we should eliminate "Image Processing" and replace it some more specific categories (e.g., "image filtering").

@kevinyamauchi
Copy link
Contributor

Some feedback from the August 17, 2022 community meeting:

  • people seemed generally in favor of adding categories
  • @andy-sweet suggested it would be a good exercise to write a one sentence summary of each one to look for overlap
  • @Czaki was concerned that "Machine Learning" might be too broad of a category
  • @psobolewskiPhD suggested a "workflow" category

@tlambert03
Copy link
Collaborator Author

thanks @kevinyamauchi!

I find the inclusion of "Image Processing" and "Denoising" a bit confusing.

yep, I can see that. as commented in #239 (comment) ... I think denoising is perhaps too specific. lets remove for now.

What types of plugins did have in mind for "Image Processing"?

well, scikit-image would essentially be Image Processing, but not (just) filtering, and not (just) transforms, etc...

I could imagine some of the other categories would be children of "Image Processing" as well (e.g., "Segmentation", "Transformations"). I'm not sure if it makes sense to make "Image Processing" root or start at the level of more specific "Image Processing" operations (e.g., filtering, segmentation, denoising)

Yep, that one really does seem to be the most overlap-prone. Let's definitely kill "denoising", and then would love to get @jni to weigh in on the rest. My thoughts: yes, segmentation generally involves some sort of image processing, but there are so many segmentation plugins, that grouping them all as "image processing" seems to vague. on the other hand, swapping "image processing" with the much more specific "filtering", means that any image processing that is not filtering doesn't really have a place. So that is indeed the trick here. For example, if we kill "image processing" do we also add "morphological operations", (which are clearly not filters) etc...?

It seems a bit easier to later add Image Processing:Filters than to later move filters to Image Processing:Filters.

@Czaki was concerned that "Machine Learning" might be too broad of a category

we can remove it if you'd like, but i don't think we should be more specific yet

@tlambert03
Copy link
Collaborator Author

tlambert03 commented Aug 17, 2022

one sentence summaries:

  • Acquisition = drive devices (webcams, microscopes, etc) to acquire data
  • Annotation = tools that facilitate labeling, marking, and, erm, "annotating" data within napari
  • Data = Sample datasets for training, demonstration, learning, etc...
  • Image_Processing = Routines that take in numerical arrays and generally return new arrays or datasets (e.g. scikit image stuff, deconvolution, super-res reconstruction, etc...)
  • IO = Plugins that read from and or write to files or data streams not supported natively by napari
  • Machine_Learning = Plugins that employ machine learning to facilitate either training or prediction (could remove, but wouldn't want to go more precise)
  • Measurement = Tools that extract measurements (i.e. into tabular, graph, or other data formats), such as region properties, etc...
  • Segmentation = tools that identify objects and/or boundaries in datasets (including, but not limited to, images)
  • Themes = plugins that modify the look and feel of napari
  • Transformations = linear and or nonlinear registrations, data transformations and mappings in general.
  • Utilities = Conveniences, widgets, etc... stuff that could conceivably be "core" but which is community-supported
  • Visualization = tools for plotting, rendering, and visualization (on top of those provided by napari)
  • Simulation = tools that simulate some physical process. microscope/PSF generators, optics simulators, astronomy simulations, etc...

@kevinyamauchi
Copy link
Contributor

Thanks for the detailed reply, @tlambert03 ! The one sentence summaries really helped clarify things for me.

For example, if we kill "image processing" do we also add "morphological operations", (which are clearly not filters) etc...?

It seems a bit easier to later add Image Processing:Filters than to later move filters to Image Processing:Filters.

I see your point. Given the diversity of Image_Processing:* categories and the removal of denoising I agree we should keep Image_Processing as a category for now with an option to add sub-categories in the future if we deem it useful.

@Czaki was concerned that "Machine Learning" might be too broad of a category

we can remove it if you'd like, but i don't think we should be more specific yet

Sorry, I think I was a bit terse in my summary (rushing off to dinner). There was a bit more nuance here in that people felt that while Machine_Learning is broad, it is useful to annotate the other categories as Machine_Learning owing to the often different hardware and data requirements to use ML-based tools. It does seem to me that it's a bit unlikely given the other categories that a plugin would ever be just Machine_Learning. Do you agree? If so, does this violate the "rootness" guideline?

@kevinyamauchi
Copy link
Contributor

one sentence summaries:

* **Acquisition** = drive devices (webcams, microscopes, etc) to acquire data

* **Annotation** = tools that facilitate labeling, marking, and, erm, "annotating" data within napari

* **Data** = Sample datasets for training, demonstration, learning, etc...

* **Image_Processing** =  Routines that take in numerical arrays and generally return new arrays or datasets (e.g. scikit image stuff, deconvolution, super-res reconstruction, etc...)

* **IO** = Plugins that read from and or write to files or data streams not supported natively by napari

* **Machine_Learning** = Plugins that employ machine learning to facilitate either training or prediction (could remove, but wouldn't want to go more precise)

* **Measurement** = Tools that extract measurements (i.e. into tabular, graph, or other data formats), such as region properties, etc...

* **Segmentation** = tools that identify objects and/or boundaries in datasets (including, but not limited to, images)

* **Themes** = plugins that modify the look and feel of napari

* **Transformations** = linear and or nonlinear registrations, data transformations and mappings in general.

* **Utilities** = Conveniences, widgets, etc... stuff that could conceivably be "core" but which is community-supported

* **Visualization** = tools for plotting, rendering, and visualization (on top of those provided by napari)

* **Simulation** = tools that simulate some physical process. microscope/PSF generators, optics simulators, astronomy simulations, etc...

Once we sort the question regarding Machine_Learning, this list looks good to me!

@kevinyamauchi
Copy link
Contributor

Also, <3 for including simulation! I think napari has a ton of potential as a front end for simulation.

@tlambert03
Copy link
Collaborator Author

tlambert03 commented Aug 18, 2022

It does seem to me that it's a bit unlikely given the other categories that a plugin would ever be just Machine_Learning. Do you agree? If so, does this violate the "rootness" guideline?

A plugin needn't have a single category, it's a list, so you could definitely have [Segmentation, Machine Learning]

I don't see that as violating "rootness" as you could also just have [Segmentation] or perhaps just [Machine Learning] if it offered say some gui for doing anything with machine learning that didn't also fall into any other category (imagine some tensorboard like thing that lets you watch your training, but say that the model being trained didn't really do any of things covered by other categories)

This isnt however a field for dependencies, so it shouldn't be "tensor flow" or "PyTorch" for example (we already have that in dependencies), and I would argue it also shouldn't be "GAN" or "U-Net" either (too specific).

So, it's either remove it or keep it. Thoughts?

Copy link
Contributor

@kevinyamauchi kevinyamauchi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me! Thank you for your patient explanations, @tlambert03 . Based on the discussion, I would prefer that we remove denoising and keep Machine_learning.

npe2/manifest/schema.py Outdated Show resolved Hide resolved
Co-authored-by: Kevin Yamauchi <[email protected]>
@tlambert03
Copy link
Collaborator Author

thanks @kevinyamauchi. will let this sit for another few days if anyone else wants to comment

@brisvag
Copy link
Contributor

brisvag commented Aug 29, 2022

I agree with most of what came up so far, great summary!

Sorry, I think I was a bit terse in my summary (rushing off to dinner). There was a bit more nuance here in that people felt that while Machine_Learning is broad, it is useful to annotate the other categories as Machine_Learning owing to the often different hardware and data requirements to use ML-based tools. It does seem to me that it's a bit unlikely given the other categories that a plugin would ever be just Machine_Learning. Do you agree? If so, does this violate the "rootness" guideline?

Does this mean that something like GPU could be a category, then? I personally don't see these (and Machine Learning) as being the same "kind" of category as Image Processing. I don't really see a use case where just Machine Learning is a useful category... But I also don't feel that strongly about this :)

@tlambert03
Copy link
Collaborator Author

I don't really see a use case where just Machine Learning is a useful category

but again, as mentioned above, it doesn't need to be just machine learning. i.e. [Segmentation, Machine Learning] tells you something that is a bit different than [Segmentation] and [Machine Learning, Acquisition] is different than [Acquisition]. Does that change anything? or still think it should just be left out?

@brisvag
Copy link
Contributor

brisvag commented Aug 29, 2022

Sorry, I put my answer a bit backwards above; when I said:

Does this mean that something like GPU could be a category, then?

I meant it as an answer to your

A plugin needn't have a single category

I'm not saying I'm opposed to GPU, Machine Learning and other "flavour" qualifiers being used as categories, but I'm pointing out that to me they feel different. It's "how does this work?" rather than "what does this do?".

@tlambert03
Copy link
Collaborator Author

It's "how does this work?" rather than "what does this do?".

Indeed, that's a very good way to put it. And perhaps a reason to hold off on including it, if only for being an outlier in that regard

@tlambert03
Copy link
Collaborator Author

removed, prolly easier to add than remove

@tlambert03
Copy link
Collaborator Author

this has been up for a long time, and received a lot of input. So merging. thanks all

@tlambert03 tlambert03 enabled auto-merge (squash) August 30, 2022 01:49
@tlambert03 tlambert03 merged commit 4b7f0fe into napari:main Aug 30, 2022
@tlambert03 tlambert03 deleted the category branch August 30, 2022 01:50
@kne42 kne42 added the enhancement New feature or request label Jan 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants