submission: pangoling: Access to word predictability using large language (transformer) models #575
Thanks for submitting to rOpenSci; our editors and @ropensci-review-bot will reply soon. Type …
🚀 Editor check started 👋
Checks for pangoling (v0.0.0.9005), git hash: 543c11bd
Important: All failing checks above must be addressed prior to proceeding.
Package License: MIT + file LICENSE
1. Package Dependencies
Details of Package Dependency Usage
The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.
Tallies of functions used in each package are shown below. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(<path/to/repo>)', and examining the 'external_calls' table.
package | function calls |
---|---|
base | lapply (19), length (7), c (6), dim (6), paste0 (6), t (6), unlist (4), by (3), list (3), names (3), seq_len (3), which (3), do.call (2), getOption (2), matrix (2), ncol (2), rep (2), seq_along (2), sum (2), unname (2), as.list (1), floor (1), for (1), grepl (1), lengths (1), mode (1), new.env (1), options (1), rownames (1), split (1), switch (1), vector (1) |
pangoling | create_tensor_lst (5), lst_to_kwargs (5), char_to_token (4), encode (4), get_id (4), get_vocab (4), get_word_by_word_texts (2), masked_lp_mat (2), causal_config (1), causal_lp (1), causal_lp_mats (1), causal_mat (1), causal_next_tokens_tbl (1), causal_preload (1), causal_tokens_lp_tbl (1), chr_detect (1), masked_config (1), num_to_token (1), word_lp (1) |
tidytable | map_chr. (4), map2 (3), map. (2), pmap. (2), arrange. (1), map (1), map_dbl. (1), map_dfr (1), map_dfr. (1), map2_dbl. (1), pmap_chr (1), relocate (1), tidytable (1) |
reticulate | py_to_r (5) |
memoise | memoise (3) |
cachem | cache_mem (2) |
graphics | text (2) |
data.table | chmatch (1) |
stats | lm (1) |
tidyselect | everything (1) |
NOTE: Some imported packages appear to have no associated function calls; please ensure with author that these 'Imports' are listed appropriately.
2. Statistical Properties
This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.
Details of statistical properties
The package has:
Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages
All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by the The final measure (
2a. Network visualisation
(Interactive network visualisation of calls between objects in the package.)
3.
id | name | conclusion | sha | run_number | date |
---|---|---|---|---|---|
4232217801 | pages build and deployment | success | c6ce46 | 16 | 2023-02-21 |
4232181659 | pkgdown | success | 543c11 | 44 | 2023-02-21 |
4232181660 | R-CMD-check | success | 543c11 | 37 | 2023-02-21 |
4232181654 | test-coverage | success | 543c11 | 39 | 2023-02-21 |
3b. goodpractice results
R CMD check with rcmdcheck
rcmdcheck found no errors, warnings, or notes
Test coverage with covr
Package coverage: 0.89
The following files are not completely covered by tests:
file | coverage |
---|---|
R/tr_causal.R | 0% |
R/tr_masked.R | 0% |
R/tr_utils.R | 0% |
R/utils.R | 0% |
R/zzz.R | 0% |
Cyclocomplexity with cyclocomp
The following function has cyclocomplexity >= 15:
function | cyclocomplexity |
---|---|
word_lp | 16 |
Static code analyses with lintr
lintr found the following 39 potential issues:
message | number of times |
---|---|
Avoid library() and require() calls in packages | 5 |
Lines should not be more than 80 characters. | 34 |
Package Versions
package | version |
---|---|
pkgstats | 0.1.3 |
pkgcheck | 0.1.1.11 |
Editor-in-Chief Instructions:
Processing may not proceed until the items marked with ✖️ have been resolved.
Hi, also when I run ✔ Package coverage is 94.8%.
Thanks @bnicenboim for your full submission and for explaining the issue with test coverage. Your explanation makes sense, so we can move forward. I'll start searching for a handling editor. In the meantime, you may want to start thinking of potential reviewers to suggest to the handling editor.
Is there a pool of potential reviewers that I can access?
I guess authors are mostly guided by their knowledge of their intended audience. But for inspiration, see how editors look for reviewers. Editors have access to a private Airtable database, but we often look elsewhere.
Dear @bnicenboim, I'm sorry for the extraordinary delay in finding a handling editor. Most editors are busy, and some are handling more than one package. The very few who are available are not yet due to handle another submission. Please hold on a bit longer.
OK, thanks for letting me know, no problem.
@ropensci-review-bot assign @karthik as editor
Assigned! @karthik is now the editor
👋 @bnicenboim
Hi, any news about the next steps?
Hi @bnicenboim
Editor checks:
Editor comments
No additional comments at this time. I'm looking for reviewers at the moment, but if you've got any suggestions for people with expertise but no conflict, please suggest names.
@ropensci-review-bot seeking reviewers
Please add this badge to the README of your package repository: [![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/575_status.svg)](https://github.com/ropensci/software-review/issues/575) Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news
I really don't know about reviewers. I guess someone involved in the packages named here: Or maybe someone based on the reverse imports of reticulate:
@ropensci-review-bot assign @lisalevinson as reviewer
@lisalevinson added to the reviewers list. Review due date is 2023-05-24. Thanks @lisalevinson for accepting to review! Please refer to our reviewer guide. rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.
@lisalevinson: If you haven't done so, please fill in this form for us to update our reviewers records.
Submission on hold!
Hi, I made a lot of progress (and some unrelated changes), and I had some questions for @lisalevinson that were never answered. Regardless, it's fine to keep this on hold until my teaching is over and I can finalize the last things.
@bnicenboim So sorry - I never saw your questions at all. I was on medical leave when you posted them and had an away message on my email, but that wouldn't go back to you through GitHub notifications of course! And then it must have gotten lost in a sea of miscellaneous GitHub emails when I tried to sort through everything after. It would take me some time to try it all out again and remember how everything works - honestly right now this will be hard for me to do before the end of May because I am co-organizing HSP and that is right after our current semester ends. Would that be an OK timeline for when you plan to pick it up again? I may be able to squeeze it in earlier but I don't want to make promises I'm not sure that I can keep!
I don't think I'll touch anything until maybe June, so no worries from my side.
@ldecicco-USGS: Please review the holding status
Hi @bnicenboim - checking in with the rOpenSci editors team. Do you still plan on picking this back up?
Yes, I'm picking it up. Sorry for the long delay. I also want to add some other features. I'll see how easy that is, or at least whether the function names I end up with are compatible with the other uses I have in mind. Also, I don't think my answer to @lisalevinson is relevant anymore: she's completely right that the two uses of causal_lp are confusing. I'll split it into two functions.
Phew, it took some time, but I'm mostly done. I'm checking that I still comply with all of rOpenSci's requirements. Then I'll report the changes I made and how I addressed the reviewers' issues.
@ropensci-review-bot check package
Thanks, about to send the query.
🚀 Editor check started 👋
Checks for pangoling (v0.0.0.9010), git hash: 5931c547
Important: All failing checks above must be addressed prior to proceeding.
Package License: MIT + file LICENSE
1. Package Dependencies
Details of Package Dependency Usage
The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.
Tallies of functions used in each package are shown below. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(<path/to/repo>)', and examining the 'external_calls' table.
package | function calls |
---|---|
base | lapply (26), names (12), length (11), list (11), c (10), dim (9), paste0 (8), by (7), unlist (6), sequence (5), setdiff (5), t (5), which (5), drop (4), split (4), rep (3), seq_along (3), seq_len (3), unname (3), all (2), as.list (2), do.call (2), matrix (2), sum (2), unsplit (2), version (2), for (1), getOption (1), grepl (1), if (1), ifelse (1), is.null (1), lengths (1), mode (1), ncol (1), new.env (1), nrow (1), options (1), rbind (1), rownames (1), switch (1), vapply (1), vector (1) |
pangoling | create_tensor_lst (8), get_vocab (8), encode (6), conc_words (4), get_id (4), lst_to_kwargs (4), causal_config (1), causal_lp (1), causal_lp_mats (1), causal_mat (1), causal_pred_mats (1), causal_preload (1), causal_targets_pred (1), causal_tokens_lp_tbl (1), causal_words_pred (1), char_to_token (1), chr_detect (1), install_py_pangoling (1), is_mac (1), is_really_string (1), ln_p_change (1), masked_lp_mat (1), num_to_token (1), safe_decode (1), word_lp (1) |
tidytable | map (6), map_chr (5), map_dfr (3), map2 (3), pmap (3), arrange (2), tidytable (2), map_dbl (1), map2_dbl (1), pmap_chr (1), relocate (1) |
reticulate | py_to_r (9), virtualenv_starter (2), import (1), use_virtualenv (1), virtualenv_remove (1) |
memoise | memoise (3) |
cachem | cache_mem (2) |
graphics | text (2) |
data.table | chmatch (1) |
stats | lm (1) |
tidyselect | everything (1) |
NOTE: Some imported packages appear to have no associated function calls; please ensure with author that these 'Imports' are listed appropriately.
2. Statistical Properties
This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.
Details of statistical properties
The package has:
Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages
All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by the The final measure (
2a. Network visualisation
(Interactive network visualisation of calls between objects in the package.)
3.
id | name | conclusion | sha | run_number | date |
---|---|---|---|---|---|
12864365257 | pages build and deployment | success | 3a7bec | 24 | 2025-01-20 |
12864039437 | pkgdown | success | 5931c5 | 84 | 2025-01-20 |
12864039448 | R-CMD-check | success | 5931c5 | 77 | 2025-01-20 |
12864039440 | test-coverage | success | 5931c5 | 79 | 2025-01-20 |
3b. goodpractice results
R CMD check with rcmdcheck
R CMD check generated the following error:
- checking examples ... ERROR
Running examples in ‘pangoling-Ex.R’ failed
The error most likely occurred in:
Name: causal_config
Title: Returns the configuration of a causal model
Aliases: causal_config
** Examples
causal_config(model = "gpt2")
Error in py_run_string_impl(code, local, convert) :
ModuleNotFoundError: No module named 'torch'
Run reticulate::py_last_error() for details.
Calls: causal_config ... eval -> -> -> py_run_string_impl
Execution halted
R CMD check generated the following note:
- checking data for non-ASCII characters ... NOTE
Note: found 25872 marked UTF-8 strings
R CMD check generated the following check_fails:
- rcmdcheck_non_ascii_characters_in_data
- rcmdcheck_examples_run
Test coverage with covr
ERROR: Test Coverage Failed
Cyclocomplexity with cyclocomp
The following functions have cyclocomplexity >= 15:
function | cyclocomplexity |
---|---|
install_py_pangoling | 17 |
word_lp | 17 |
Static code analyses with lintr
lintr found no issues with this package!
Package Versions
package | version |
---|---|
pkgstats | 0.2.0.48 |
pkgcheck | 0.1.2.77 |
Editor-in-Chief Instructions:
Processing may not proceed until the items marked with ✖️ have been resolved.
Hi, I'm not sure what to do about the errors. I don't get the warnings or notes here: It might be that rcmdcheck doesn't install the required Python dependencies? But I'm following the latest advice from reticulate (https://rstudio.github.io/reticulate/articles/package.html), both in creating a function to install the Python dependencies rather than installing them automatically, and in how to test them using GitHub Actions.
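As a rough illustration of the reticulate advice mentioned above: the linked vignette describes keeping a placeholder for the Python module that is filled in inside .onLoad with a delayed import, plus an explicit installer function that users call themselves. The base-R sketch below assumes that pattern; the module names, the install_py_deps() helper, and its envname default are hypothetical and not taken from pangoling, and reticulate's recommendations may differ between versions. It also shows where the <<- flagged by pkgcheck typically comes from.

```r
# Hypothetical sketch of the reticulate package pattern; names are illustrative,
# not pangoling's actual code.

# Placeholder for the Python module; it is filled in when the package loads.
transformers <- NULL

.onLoad <- function(libname, pkgname) {
  # delay_load = TRUE defers the real Python import until the module is first
  # used, so loading the R package does not require Python to be ready yet.
  # `<<-` updates the `transformers` binding defined above (the superassignment
  # that pkgcheck reports).
  transformers <<- reticulate::import("transformers", delay_load = TRUE)
}

# Explicit installer that users run themselves, instead of the package
# installing Python dependencies automatically (hypothetical helper name).
install_py_deps <- function(envname = "r-pangoling-example", ...) {
  reticulate::py_install(c("transformers", "torch"), envname = envname, ...)
}
```

Defining these objects does not touch Python at all; only calling install_py_deps() or using the transformers object after .onLoad has run does.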
Hi @bnicenboim, Mark here from the rOpenSci team. Please ignore those check failures; they are because we don't have the ability to install package-specific Python dependencies in our check system. Your link to the successful GitHub workflow is fine. We also no longer have an editor assigned to this submission, since Karthik has stepped back from rOpenSci, so I'll make sure a new one is assigned, and you should hear from the newly assigned editor asap. Thanks for returning to your submission - we always appreciate things getting moving again!
Hi everyone, after a long time, I have an update! I made a lot of changes (see the NEWS file): I renamed most functions, added more arguments, simplified the installation, and added a troubleshooting vignette and a worked-out example with Chinese sentences and surprisal values. Thanks again for the reviews! The long delay turned out to be beneficial, as looking at the package with fresh eyes helped me improve the documentation. I address the reviewers' comments below.
@utkuturk's Review
Clarity issue: Python dependencies
@utkuturk mentioned difficulties with installing the Python dependencies.
Response: I updated the installation process to follow the latest
Clarity issue: Downloading models
@utkuturk suggested making it clearer that the package downloads models locally and does not operate through an API.
Response: This is now explicitly stated in the startup message:
Advanced Configuration:
Thank you so much for the detailed notes and updates, @bnicenboim! @lisalevinson, @utkuturk - I know it has been a while since you completed your reviews. Could you please refer to @bnicenboim's update comments above and assess whether you feel these address the themes you raised? Ideally, please acknowledge using this template.
@ropensci-review-bot assign @emilyriederer as editor
Assigned! @emilyriederer is now the editor
@ropensci-review-bot submit review time 10
Logged review for utkuturk (hours: 10)
@ropensci-review-bot submit review #575 (comment) time 11
Logged review for lisalevinson (hours: 11)
Submitting Author Name: Bruno Nicenboim
Submitting Author Github Handle: @bnicenboim
Repository: https://github.com/bnicenboim/pangoling
Version submitted: 0.0.0.9005
Submission type: Standard
Editor: @emilyriederer
Reviewers: @lisalevinson, @utkuturk
Archive: TBD
Version accepted: TBD
Language: en
Scope
Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
Explain how and why the package falls under these categories (briefly, 1-2 sentences):
The package is built on top of the Python package transformers, and it offers some basic functionality for text analysis, including tokenization and perplexity calculation. Crucially, pangoling also offers word predictability, which is widely used as a predictor in psycholinguistics and neurolinguistics and is not trivial to obtain. Also, transformers works with "tokens" rather than "words", and pangoling takes care of the mapping between the tokens and the target words (or even phrases).
This is mostly for psycholinguists and neurolinguists who use word predictability as a predictor in their research, such as in ERP/EEG and reading studies.
Another R package that acts as a wrapper for transformers is text. However, text is more general, and its focus is on Natural Language Processing and Machine Learning. pangoling is much more specific: its focus is on measures used as predictors in analyses of data from experiments, rather than on NLP. text doesn't allow for generating pangoling's output in a straightforward way; in fact, I'm not sure it's even possible to get token probabilities from text, since it seems more limited than the Python package transformers.
NA
#573
pkgcheck items which your package is unable to pass.
pkgcheck fails only because of the use of <<-. But this is done in .onLoad, as recommended by reticulate. Also see this issue.
Technical checks
Confirm each of the following by checking the box.
This package:
Publication options
Do you intend for this package to go on CRAN?
Do you intend for this package to go on Bioconductor?
Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
MEE Options
Code of conduct