Question about package domain scope: please clarify "data munging" #21
Hi @eriknw and welcome to pyOpenSci! Thanks for your feedback. Just for context (in case anyone hasn't read the page you linked), data munging is defined there as "Tools for processing data from scientific data formats." You are right that it would be good to have examples. As you probably saw, we are finishing up an overhaul of our guides, and once we finish, we will be able to start actively reviewing packages again now that we have a new fiscal sponsor. So the first answer to your question is: we will probably have a lot more examples soon!
We are focused on more domain-specific packages that build on top of the well-established packages you name, as @lwasser explains here: https://www.pyopensci.org/blog/what-makes-open-source-python-package-healthy.html#a-note-about-our-pyopensci-packages
So, no, those packages would not be in scope, although you are right that, under a very loose definition of "data from scientific formats", they all technically have functionality for data munging. Some examples of packages we've already reviewed that have data munging are:
Similarly, you can see that packages from our sister org rOpenSci with data-munging functionality are focused on more or less domain-specific data formats, e.g., medical record transcription data, spatial data.
@lwasser maybe we should:
@eriknw I hear what you are saying: "data munging" is a very broad term. Would linking to examples address this issue in your mind?
Hi @eriknw!! Welcome to pyOpenSci!! 👋 I'm mostly offline through end of day next Monday, but I wanted to say hello! AND thank you for the question 🎆 I'm just curious - are you considering submitting a package to us and trying to better understand scope? Or are you trying to help us (THANK YOU!) with clarifying those areas in our scope so others can better understand what is in vs. out of scope? Perhaps those bullets are confusing (this has actually been brought up before by @arianesasso, so it is well worth considering carefully!). Or maybe both?
Hi there, thanks for the replies! The answer to @lwasser's question is, potentially, both. I was trying to understand scope, purpose, and vision, and thought data munging in particular needs clarification for the benefit of everyone. To add context, I'm considering submitting The examples I've seen (thanks @NickleDave!) tend to be very niche, specific, and close to the science or application. I think
Thank you for your quick reply @eriknw. I don't want to speak for @lwasser again in some way that makes her reply while she's trying to not work 😬 but I would not say that our scope is limited to very particular domains or applications. One of our goals is to help packages in those domains achieve consistent standards that align with core scientific Python packages. My immediate impression is that @lwasser I actually can't find good language in the guide right now about this, do we need to say more about "Python wrappers / interfaces for tools in other languages" somewhere? (We do talk about API wrappers but that's obvs not the same thing).
Right on, thanks for the quick replies all! (And do please enjoy the holidays and time off if applicable.) I think it's good enough for this issue that As an outsider reading the pyOpenSci website, the vision and purpose seem to be fairly broad and inclusive w.r.t. scientifically oriented packages. The specific section "Python package domain scope" seems narrower. It's probably pretty difficult to adequately define scope in such a way.
Also, my specific questions have been answered, so feel free to close. Thanks again!
Hi @eriknw, I just wanted to follow up after reading the comments above, and then I will close. Our scope categories came out of early pyOpenSci meetings. Early on it made sense to be broad and focus on things I was working on (geospatial & education!), so I had more expertise there. As such, I think we need to revisit them. You aren't the first with this question!
I want to modify @NickleDave's response. Those packages are definitely in our domain scope to be reviewed. However, because they have huge maintainer teams and high-quality infrastructure and are widely visible, they aren't target packages for us to review. But, for example, rOpenSci did an early review of tidyverse (which is big and widely used) in the R world. So I don't want to say that they aren't in scope. They technically are. They just aren't the types of packages we are focused on now. Your package does NOT need to be tightly linked to a specific domain for us to review it (even though you may see examples of this right now in our ecosystem). It can have general applications and still be in scope. And I do think it IS in scope for us (as David said too!). And the fact that it wraps another tool would not make it out of scope. We just want to ensure it does so using best practices (particularly thinking about future maintenance if a maintainer steps down). That is a technical scope issue rather than a domain scope issue. I hope that helps. We shall revisit this for sure!
Oh, also, happy new year!!!
I would categorize it with
I was reading this:
https://www.pyopensci.org/peer-review-guide/software-peer-review-guide/author-guide.html#python-package-domain-scope
and it seems to me that "data munging" is potentially the largest scope (i.e., lots of packages "do something with data"), but it doesn't seem clearly defined or explained (IMHO). I think it would help to have examples. For example, would numpy, scipy, pandas, networkx, and scikit-learn be within scope if they weren't already well-established packages with communities? What are canonical examples of "data munging" packages that are within scope?
Thanks!