-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
crowsetta: A Python tool to work with any format for annotating animal vocalizations and bioacoustics data. #68
Comments
hi @NickleDave responding here so it's visible that this issue is being worked on! I am looking for an editor for this package now given I can't step in due to COI! :) but please know i've reached out to a few people and am in the process of finding an editor. |
Thank you @lwasser, understood |
Hi @NickleDave , I'm Chiara and I'm going to be the editor for your submission. |
Hi @cmarmo! Nice to meet you as well. Looking forward to working with you on the review, I am sure I will learn a lot. A week is perfect, I should have survived a deadline at my day job by then 😆 |
Editor in Chief checksHi @NickleDave, below are the basic checks that your package needs to pass
Editor commentsThe package is already in a very good shape, thank you for all your work! I would like to mention that the I'm going to look for reviewers for your package now. |
🙏 thank you @cmarmo
I can move it to the root, and I agree with you that it's easier to spot that way for someone looking at the repo e.g. on GitHub in the browser. I think I was basically copying what I saw in numpy--I thought the file had to be there for the link to show up when someone opened an issue. But now I see the GitHub docs say it can be there, in the root, or docs, as you said. Good to know. So I'll just move it. |
Done in vocalpy/crowsetta@9d435f6 edit: also I have now filled out the onboarding survey (sorry, forgot!) |
Thanks @NickleDave ! |
Dear @NickleDave, |
Hi @cmarmo, @NickleDave, and @rhine3, |
@cmarmo this is all awesome! @rhine3 and @shaupert welcome to pyopensci!! @shaupert if you have any questions at all about the review process please get in touch. It is useful for us to have a review that focuses on usability, documentation, etc. You are welcome to reach out to us with questions at any time - so please feel free to do that if you wish / if it would be helpful. |
Hi @cmarmo, @NickleDave, et al., Thanks for the invitation to review and the warm welcome! As both a dabbler in Python software development and a potential user of this tool, I am looking forward to reviewing this package. I plan to finish my review by February 10th. |
Package ReviewPlease check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
DocumentationThe package includes all the following forms of documentation:
Readme file requirements
The README should include, from top to bottom:
NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that the a badge for pyOpenSci peer-review will be provided upon acceptance.)
UsabilityReviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole.
Functionality
For packages also submitting to JOSS
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted. The package contains a
Final approval (post-review)
Estimated hours spent reviewing: Review CommentsFunction documentation:A Question about pyOpenSci in general - is an individual example really required for every function? The functions are well-documented but some do not have examples, e.g. crowsetta.data.data.available_formats(). Some functions do have examples e.g. some of the functions here. I’m not sure how to quickly and reliably identify which functions are user-facing and non-user facing. I do feel that the number of examples provided is sufficient. Community guidelines:
README contents:
Usability:Overall, the package was very easy to install on my computer. Documentation is linked to on the repository page and in the README. The entire API is fully documented and I felt that the number of examples provided was sufficient. A minor comment: installing the developer version was slightly more difficult–because the package documentation required me to use pipx/nox, I had to update Homebrew and chase down the source of some errors, which ended up taking me about an hour of detective work (and probably another hour of my computer downloading/upgrading packages in the background). For the average user this doesn’t apply. For the developer, this is a minor and routine obstacle. Functionality:
Tests:A handful of tests failed at first when I attempted to use a Code format:
JOSS paper
|
Hi, please find my review below: Package ReviewPlease check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
DocumentationThe package includes all the following forms of documentation:
Readme file requirements
The README should include, from top to bottom:
NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that the a badge for pyOpenSci peer-review will be provided upon acceptance.)
UsabilityReviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole.
Functionality
Final approval (post-review)
Estimated hours spent reviewing: 10h Review CommentsThis is my first review of a Python package, this is the reason why it took me about 10h. The package does a simple task, which is to parse and convert annotations done on a spectrogram or a waveform into an object that is easy to manipulate. I’m thinking of using it in my own package. As a contributor to the packages As a user of Audacity to annotate my spectrograms, I was a little bit surprised that Finally I tried to follow the guidelines to create and a class for a new format. I found the process a bit hard to follow, especially because it was written for seq not for bbox which I realized to late. Maybe, it would be interesting to have a guideline for both types of format. Below are some minor points TestOnly one test failed. I opened an issue Bugscrowsetta.data.data.available_formats() and crowsetta.formats.as_list() do not give the same list. The format "yarden" is not in crowsetta.data.data.available_formats() Documentationcrowsetta.Segment : miss a description of the input arguments ImprovementsExamples are not provided for each class. I recommend adding a simple example for each class and for each method if possible. SummaryI agree with most of the comments of @rhine3. I don't think it's necessary to add them twice ;). |
Hi @rhine3 and @shaupert thank you both for your detailed reviews and the helpful issues you raised. I plan to reply here and address those issues within two weeks as required by the guide. Just wanted to let you and @cmarmo know in the meantime that I have seen this 🙏 and I'm working on it 😅 |
@rhine3 I agree with you. Was just discussing this with @lwasser in the pyOS Slack (putting on my pyOS maintainer hat for a second). Could you please raise an issue on the software-peer-review repo suggesting we add this? I am guessing we will do so, and want to make sure you get credit |
Hi @NickleDave , just checking... Do you think you will be able to address all the comments about TextGrid, or maybe it is worth to submit to JOSS and let major changes for a new version of crowsetta? |
Hi @cmarmo I think I will be able to address but it's taking a little longer than expected. Feedback from Yannick has made me realize crowsetta is not quite providing what it promises for TextGrids. I thought the fix would be trivial but it turns out the code I vendored is not so easy to change. I have a plan to address, just haven't had time yet. |
Just merged a final PR that addresses feedback from @YannickJadoul along with some stray commits I made to main that I'll link here: At a high level, I totally rewrote the TextGrid class so that it can:
To make the it possible to convert all interval tiers, I also added a new feature for Replying to note how I addressed comments:
I agree this needs to be clear, but would prefer to keep any description of how the format maps to crowsetta's data model in the docstrings with the code, in case it changes. I have added language saying basically "please read the docstring for further detail".
The new TextGrid class raises an error for overlapping intervals. And thanks @drammock for confirming! 🙂
You can now read more than one tier. A single Annotation can have multiple Sequences.
The empty intervals are now removed by default.
This is one of the main reasons I rewrote from scratch.
Both short and full formats should now be parsed and the text encoding should be handled correctly.
I did find some implementations that handle the binary format but did not implement at this time, made an issue to add later Thank you for providing the example TextGrid files.
Now noted in the docstring, along with a link to the issue about adding ability to load binary format.
There is not a way to do this -- would like to know if people would use it. Some formats specify sample numbers so I tried to include this in the "generic" format although I think when segmenting most people end up in units of segment anyways.
You can now specify tiers by name or index.
Crowsetta does mean to provide this but it doesn't right now.
I totally did not know you could do that in docs. Thanks so much for pointing this out. I have fixed it as much as possible. Surprisingly it doesn't work for pandas. Seems to be an open issue related to it: pandas-dev/pandas#38397
Fixed, thank you!
Suppressed But this will have to wait until after today! |
@cmarmo please proceed to checks for JOSS, thank you for your patience |
🎉 crowsetta has been approved by pyOpenSci! Thank you @NickleDave for submitting crowsetta and many thanks to @rhine3 and @shaupert for reviewing this package, and @YannickJadoul for providing expertise on TextGrid format. 😸 There are a few things left to do to wrap up this submission:
It looks like you would like to submit this package to JOSS. Here are the next steps:
|
@NickleDave I have added the 'accepted date' in the issue description. Do you mind releasing the current version of crowsetta as I can refer to it in the issue description? Thanks! |
Wow, I'm impressed how you went above and beyond to address all those picked nits, @NickleDave!
To answer your final question: yes, I created the TextGrid myself to go along with the fragment of the audio file I've been using during developments and tests (and checked with people more knowledgeable about linguistics if it's +- correct ;-) ). Feel free to just use it without any need to reference Parselmouth! |
Hi @cmarmo just updating here that I made a release, here's the DOI: https://zenodo.org/record/7781587. |
@YannickJadoul I asked @lwasser to send you an invite to the pyOpenSci slack too!
yes very much so, thank you |
wow - everyone!! this is awesome. Congratulations @NickleDave !! and @cmarmo for a successful review! And so many thanks @shaupert @YannickJadoul @rhine3 for your time in reviewing here! David, let us know when this goes to JOSS! in the meantime i'll post about crowsetta on social. We'd also love a blog about the package if you have time. 😄 This blog allows us to better promote your package / show what your package can do! |
JOSS review in progress: openjournals/joss-reviews#5332 edit: that was the pre-review issue, this is the review: |
@cmarmo accepted into JOSS! 🎉 |
Congratulations @NickleDave ! I'm going to perform the last editor checks and then close this issue. 🎉 |
All done! Wonderful package and review experience! 🎉 |
hi colleagues!! @NickleDave if you have time, i'd greatly appreciate your taking the post-review survey. the main pieces of data that we'd love to collect are how the review improved / impacted your package. but some other questions are asked there as well. many thanks in advance for doing this!! |
Submitting Author: David Nicholson (@NickleDave )
All current maintainers: (@NickleDave )
Package Name: crowsetta
One-Line Description of Package: A Python tool to work with any format for annotating animal vocalizations and bioacoustics data.
Repository Link: https://github.com/vocalpy/crowsetta
Version submitted: 4.0.0.post2
Editor: @cmarmo
Reviewer 1: @rhine3
Reviewer 2: @shaupert
Archive:
JOSS:
Version accepted: v 5.0
Date accepted (month/day/year): 03/28/2023
Description
crowsetta provides a Pythonic way to work with annotation formats for animal vocalizations and bioacoustics data. It has has built-in support for many widely used formats such as Audacity label tracks, Praat .TextGrid files, and Raven .txt files. The package focuses on providing interoperability, as well as making it easier to share data in plaintext flat-file formats (csv) and common serialization formats (json). In addition, abstractions in the package are designed to make it easy to use these simplified formats for common downstream tasks. Examples of such tasks are fitting statistical models of vocal behavior, and building datasets to train machine learning models that predict new annotations.
Scope
n/a
For all submissions, explain how the and why the package falls under the categories you indicated above. In your explanation, please address the following points (briefly, 1-2 sentences for each):
Anyone that works with animal vocalizations or other bioacoustics data that is annotated in some way. Examples (from the landing page of the docs): neuroscientists studying how songbirds learn their song, or why mice emit ultrasonic calls. Ecologists studying dialects of finches distributed across Asia, linguists studying accents in the Caribbean, a speech pathologist looking for phonetic changes that indicate early onset Alzheimer’s disease.
Not to my knowledge.
There are many format-specific packages in various states of maintenance, e.g. a search for the format
textgrid
used by the application Praat on PyPI currently returns 33 packages (include crowsetta):https://pypi.org/search/?q=textgrid&o=
There are also several larger packages whose functionality includes the ability to parse specific formats, e.g. Parselmouth wraps all of Praat and thus can load TextGrid files. But the goal of crowsetta is mainly to provide interoperability, and to do so for a wide array of formats, so that other higher-level libraries can leverage its functionality. This emphasis on data extraction + munging, like the possibly destructive transformation to other formats, makes it in scope for pyOpenSci.
Technical checks
For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:
Publication options
yes
JOSS Checks
paper.md
matching JOSS's requirements with a high-level description in the package root or ininst/
.Note: Do not submit your package separately to JOSS
Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?
This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.
Code of conduct
Please fill out our survey
submission and improve our peer review process. We will also ask our reviewers
and editors to fill this out.
P.S. *Have feedback/comments about our review process? Leave a comment here
Potential reviewers
As discussed with @lwasser I am tagging some potential reviewers given that pyOpenSci does not currently have anyone that's familiar with this area (besides me)
@rhine3 @shyamblast @YannickJadoul @avakiai @danibene @nilor
edit: just updating here for clarity that I am only suggesting reviewers to help bootstrap the process since we're still a growing org. @lwasser will assign an editor that will then reach out directly to potential reviewers. Sorry for any confusion, and we appreciate interest of people that have replied so far.
Editor and Review Templates
Editor and review templates can be found here
The text was updated successfully, but these errors were encountered: