Paper: Scientific Publishing with MyST Markdown #918

Merged
merged 9 commits into from
Sep 25, 2024

rowanc1
Contributor

@rowanc1 rowanc1 commented May 31, 2024

If you are creating this PR in order to submit a draft of your paper, please name your PR with Paper: <title>. An editor will then add a paper label and GitHub Actions will be run to check and build your paper.

See the project readme for more information.

Editor: Hongsup Shin @hongsupshin

Reviewers:

@rowanc1 rowanc1 added the paper This indicates that the PR in question is a paper label May 31, 2024

github-actions bot commented May 31, 2024

Curvenote Preview

Directory Preview Checks Updated (UTC)
papers/cockett_etal 🔍 Inspect 129 checks passed (4 optional) Sep 4, 2024, 10:56 PM

Collaborator

@fwkoch fwkoch left a comment

This is great - very cool to write and submit with the same tools we are writing about. Really shows that something is working!

Most of my comments are minor typos, etc - there were just a few paragraphs I suggested more substantial changes.

Also, while previewing this on my phone, it was problematic that footnotes were not accessible at all... I think it's worth taking some time on jupyter-book/mystmd#926 or another way to enable footnote pop-ups on mobile. But that's a comment for MyST, not this submission 😄


### Authoring Structured Content

There are currently many challenges for individuals or groups to author research information that can be shared in a structured and rigorous way. By this we mean the things that _structurally_ set a scientific article apart from, for example, a blog post: structured content, cross-references, valid citations with persistent identifiers (PIDs), and standardized metadata for licensing, funding information, authors and affiliations. This structured content and metadata as well as the standards behind them are what defines the "scientific record" and enables archiving, discoverability, accessibility, interoperability and the ability to reuse or cite content [@10.1038/sdata.2016.18].
Collaborator

Here, we are explaining how "structure" is a critical component of published research, and part of that is "structured content" - this is a little circular. What actually is "structured content?" Well-defined sections like Introduction, Conclusion, etc? Or machine readability (e.g. XML, JSON)?

@ameyxd ameyxd self-assigned this Jun 4, 2024
@ameyxd

This comment was marked as resolved.

@rowanc1

This comment was marked as resolved.

@lheagy

lheagy commented Jun 7, 2024

Thanks for sharing this, I have added my review, and I am excited about the open-review of SciPy Proceedings! It is a solid article and reads well. I made some annotations on a pdf (the irony! 🙃 ), some grammatical things, and some things to think about for the future.

Transformative Approaches in Scientific Publishing - SciPy Proceedings.pdf

The idea of "continuous science" is really interesting. I think fleshing this out further would be helpful. As a reader, it is quite easy to jump to the conclusion that some of these ideas about "continuous science" will put a lot of work / expectations on reviewers. Which I don't think is the point? To me, it seems that the argument is more that there are practices / ideas from continuous integration that could help improve workflows in science. But this is a bit nuanced and might be nice to try to make more explicit.

Congrats on bringing this together! I look forward to seeing it published in the proceedings

@ameyxd ameyxd removed their assignment Jun 11, 2024

@agoose77 agoose77 left a comment

I got to 2.3 (and stopped before reading it), but wanted to get this feedback in now rather than blocking.

I always find reviews feel a bit "confrontational" -- change this, reword that. So, suffice to say, this is a fab read so far.

In this scenario, a static HTML site is built from your content which can be hosted as any other static website. While some of the advantages of dynamic hosting are lost, it is an easy and accessible way for individuals to self-publish.

In 2024, Curvenote was asked to improve our integrations to GitHub to support the SciPy Proceedings and re-imagine a MyST based publishing approach that uses GitHub for open-peer-review, implementing a submission, editorial and peer review process with GitHub issues, PRs and actions as a fabric.
The process previously used technology shared with the Journal of Open Source Software (JOSS), which popularized this approach [@10.7717/peerj-cs.147].
Member

@stefanv stefanv Jun 17, 2024

Just a factual correction here: the SciPy proceedings tools were implemented around 2010 (I'd have to go check the exact year), and included the proceedings builder (RST to IEEE PDF) and a preview build server (procbuild). The JOSS Whedon bot may have been added into the mix at some point, but SciPy's open, GitHub-based proceedings review, text-format paper build tooling etc. predates JOSS by a significant margin.

/cc @jarrodmillman who has a better memory than me, and who published the first proceedings

Contributor Author

Thank you @stefanv, I will update that to include some of the history. When I was writing that sentence I was only trying to capture the current state rather than evolution, but it does come across as misrepresenting that history a bit. Not my intention!

Member

@stefanv stefanv Jun 17, 2024

Of course, never thought so! Just think it's worth mentioning, since it was novel at the time (and, in a sad statement about the publishing industry, still is, to a large extent).

Member

I would add that the JOSS Whedon bot was customized and renamed "scoobies" for SciPy Proceedings in 2021. What scoobies did is largely superseded by how we are now using GitHub projects. Here's a listing of the "scoobies help" output:

# List all available commands
@scoobies help

# Show our community Code of Conduct and Guidelines
@scoobies code of conduct

# Add to this issue's reviewers list
@scoobies add @username to reviewers

# Remove from this issue's reviewers list
@scoobies remove @username from reviewers

# Assign a user as the editor of this submission
@scoobies assign @username as editor

# Remove the editor assigned to this submission
@scoobies remove editor

# Add a user to this issue's assignees list
@scoobies add assignee: @username

# Remove a user from this issue's assignees list
@scoobies remove assignee: @username

# Builds paper
@scoobies build paper

# Checks build status
@scoobies build status

# Check the references of the paper for missing DOIs
# Optionally, it can be run on a non-default branch 
@scoobies check references
@scoobies check references from branch custom-branch-name

# Label issue with: paper
@scoobies mark paper

# Label issue with: needs-review. Remove labels: unready, needs-more-review, pending-comment, ready
@scoobies mark needs review

# Label issue with: needs-more-review. Remove labels: unready, needs-review, pending-comment, ready
@scoobies mark needs more review

# Label issue with: pending-comment. Remove labels: unready, needs-review, needs-more-review, ready
@scoobies mark pending comment

# Label issue with: ready. Remove labels: unready, needs-review, needs-more-review, pending-comment
@scoobies mark ready

# Label issue with: unready. Remove labels: needs-review, needs-more-review, pending-comment, ready
@scoobies mark not ready

# Label issue with: does_not_build:server
@scoobies mark server fail

# Remove labels: does_not_build:server
@scoobies mark server success
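
The `mark` commands in the listing above amount to a small label state machine: each review-state command adds one label and removes the other four. A minimal sketch of that mapping follows, assuming only the add/remove sets printed in the help output. The names `REVIEW_STATES`, `MARK_COMMANDS` and `apply_mark` are hypothetical and chosen for illustration; this is not the scoobies implementation.

```python
# Hypothetical sketch of the label transitions described by `@scoobies mark ...`.
# Label names are taken from the help output; all other names are illustrative.

# The five mutually exclusive review-state labels.
REVIEW_STATES = {"unready", "needs-review", "needs-more-review", "pending-comment", "ready"}


def exclusive(label):
    """Return (labels to add, labels to remove) for a review-state label."""
    return {label}, REVIEW_STATES - {label}


# command text -> (labels to add, labels to remove)
MARK_COMMANDS = {
    "mark paper": ({"paper"}, set()),
    "mark needs review": exclusive("needs-review"),
    "mark needs more review": exclusive("needs-more-review"),
    "mark pending comment": exclusive("pending-comment"),
    "mark ready": exclusive("ready"),
    "mark not ready": exclusive("unready"),
    "mark server fail": ({"does_not_build:server"}, set()),
    "mark server success": (set(), {"does_not_build:server"}),
}


def apply_mark(labels, command):
    """Return the new label set after applying one mark command."""
    add, remove = MARK_COMMANDS[command]
    return (set(labels) - remove) | add


if __name__ == "__main__":
    labels = apply_mark({"paper", "unready"}, "mark needs review")
    print(sorted(labels))
```

Modelling the review states as one exclusive group keeps the table short and makes it hard to leave an issue carrying two conflicting states, which is the behaviour the help text describes.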

Member

@mepa hosted scoobies on Heroku, I believe. And I hosted procbuild on a VM at RENCI using a letsencrypt/certbot cert. There was overhead/maintenance involved.

Member

I don't recall where we first hosted procbuild in 2023, maybe on a Berkeley machine or on new.scipy.org, but I know @mpacer around 2018 hosted it on Heroku as well.

Member

I'll add more detail to what @stefanv and @cbcunc have already mentioned just for interest. (I'm not implying it needs to be incorporated into the current paper.)

First, here is a nice blog post by @deniederhut about the SciPy partnership with JOSS (which was initiated in 2020): https://blog.neater-hut.com/scipy-is-partnering-with-joss-part-1.html

From the SciPy Proceedings Committee side, Dillon integrated JOSS's review management bot in time for the 2021 proceedings review period, and deployed it on Heroku free tier in both 2021 and 2022. By 2023, Heroku had removed its free product plans so I looked for potential free options but ended up deploying it on Heroku.

Member

SciPy's open, GitHub-based proceedings review, text-format paper build tooling etc. predates JOSS by a significant margin

Indeed, Arfon has cited SciPy as an influence on the design of JOSS's publication system.

Member

Oh, that's great! I thought JOSS came first and SciPy was influenced by it.

@pranoy-ray
Collaborator

Hey all, I am Pranoy, a graduate researcher at Georgia Tech; I will be a reviewer for this paper, and it's nice to meet you all. Can someone send me a direct link to the paper (I mean the PDF or something in a readable format with all the figures incorporated)?

@rowanc1
Contributor Author

rowanc1 commented Jun 20, 2024

The HTML preview is in the second comment!

#918 (comment)

Looking forward to your comments.

@aterrel
Collaborator

aterrel commented Jun 24, 2024

Hello! I'll have some notes on the review this week, been on vacation and travel the last few.

@rowanc1
Contributor Author

rowanc1 commented Jun 24, 2024

Thanks @aterrel and @pranoy-ray -- very much looking forward to your feedback.

@natesjacobs

A few comments on this article below. Loved reading this and very excited for the future of MyST and native computational article formats!

Major / general comments:

  1. I would consider adopting a slightly more specific title / overarching narrative, that isn't "Transformative Approaches in Scientific Publishing" but slightly more nuanced in that it starts from a more realistic place of solving computationally intensive writing (your core strength where you outcompete others) and how THAT can help solve a core problem with all scientific publishing: "How code-friendly authoring platforms will transform how all science articles are written". It feels a bit more specific and plays to your core strength, while still making the broad sweeping claim.

  2. Consider setting up a more specific relationship between MyST and Curvenote, rather than just discussing them as "two different tools doing similar things". I really like the narrative that Curvenote is leading an effort to establish MyST as an open standard for structured articles. So MyST is at the core of a movement, and Curvenote is adopting it for their products and helping to expand it out into more mainstream use.

  3. In regard to Continuous Science Practices and other more idealistic open science concepts, I would make sure to think deeply about past failures when facing the overwhelming force of prestige and career advancement, and make sure that factors into your writing and framing of issues so that the audience doesn't assume naïveté. This is tricky because some audiences like to hear the idealistic pure venting etc (like me!) but I do think a very critical component of success will be answering the question of how a publishing venue or workflow will advance someone's career. Like it or not, that has been the single driving force for all publishing evolutions and lack thereof. It's a hard pill to swallow, since you would hope we would care more about science being correct and rapid, but as far as I can tell the University - Publisher - Funder trifecta still cares deeply about prestige and publishing venue above all else. I can talk more about what I was hoping to build up as an alternative prestige pathway (based on organic growth and earned media which has analogs in other industries like the music industry), but perhaps for this article merely acknowledging it or stating it as an issue would suffice.

  4. Consider emphasizing "fun to write and read" or "expressive articles" as a high level take home from the feel of the article. This may not be appropriate for this article, but one gut level concept I always tried to hit (unsuccessfully most times, so take this with a grain of salt!) is that these new types of authoring can just be a lot more fun and expressive. You don't have to collapse your figure into a single static image. You can link to the dynamic content or the code and let people jump right into it! That's a really fun experience for both the author and the reader. Actions that are fun are sticky for users, and products that are sticky tend to be successful.

Minor / specific comments:

  1. "Many other tools have worked on aspects of integrating computation into scientific articles, notably R-Markdown Xie et al., 2018 and its successor Quarto (https://quarto.org/); both of these projects have similar aims to MyST Markdown."

This feels VERY markdown heavy. I don't have a comprehensive knowledge of computationally focused article formats, but at a minimum you should also mention eLife's experiment with 'executable articles' as well as the efforts of 'papers with code'. Or, alternatively, you could narrow the scope of this statement to markdown related efforts.

  2. Double check to make sure it's clear to the audience that this is the Curvenote team writing about Curvenote

@aterrel
Collaborator

aterrel commented Jun 28, 2024

INITIAL Independent Review Report

Reviewer: Andy R. Terrel

Department/Center/Division: Compute Products

Institution/University/Company: NVIDIA

Field of interest / expertise: Computer Science / Computational Mathematics

Country: USA

Article reviewed: Scientific Publishing with MyST Markdown

GENERAL EVALUATION

Please rate the paper using the following criteria (please use the abbreviation
to the right of the description)::

below    doesn't meet standards for academic publication
meets    meets or exceeds the standards for academic publication
n/a      not applicable

  • Quality of the approach: Meets

  • Quality of the writing: Meets

  • Quality of the figures/tables: Meets

SPECIFIC EVALUATION

For the following questions, please respond with 'yes' or 'no'. If you
answer 'no', please provide a brief, one- to two-sentence explanation.

  • Is the code made publicly available and does the article sufficiently
    describe how to access it?

PARTIAL. Curvenote.com is a proprietary tool, but the software claims to be open source. In trying to reproduce a local build, the documentation links found in the open source repository, https://github.com/curvenote/curvenote, led to broken links on the curvenote.com website (i.e. https://curvenote.com/docs/cli, https://curvenote.com/docs/web, https://curvenote.com/docs/cli/authorization). It is thus hard to verify this claim without fully reviewing the software, which is out of scope for this review.

  • Does the article present the problem in an appropriate context?
    Specifically, does it:

    • explain why the problem is important,
    • describe in which situations it arises,
    • outline relevant previous work,
    • provide background information for non-experts

The article does explain the problem but could be better. My recommendation is to modify the following justifications to only scholarly references that are verifiable rather than hand-wavy arguments without sufficient data.

Using climate change to justify a new workflow is hardly scholarly. I do not see justification that faster publications will solve these generational societal problems.

Additionally, the paper claims that it takes "elite" software teams to use standard devops practices; this is motivated by marketing whitepapers trying to sell platforms, not scholarly inquiry.

Finally, the paper relies on an extremely rough economic argument; claiming that it can address 15% of the global spend on publishing is not believable. Better publishing tools have been around for decades, but the business models of the publishing world have not changed. To make this argument, the authors would need to give a much more detailed analysis of the economics of these direct costs and evaluate what behavior changes scientists would need to make to achieve this lower cost. If they could use tools that were so much cheaper, it seems they would do so, but there are fundamental barriers that are not studied in a scholarly way in this paper.

  • Is the content of the paper accessible to a computational scientist
    with no specific knowledge in the given field?

Yes

  • Does the paper describe a well-formulated scientific or technical
    achievement?

It gives a concept of continuous scientific publication (although using the more generic and misleading term "continuous science") and describes how features of the Curvenote CLI can support such practices.

  • Are the technical and scientific decisions well-motivated and
    clearly explained?

The article needs a better discussion on prior methods to reduce costs of publications. While it may be true that the world does spend too much effort on an older process (which the paper used a paper from 14 years ago to justify), it is not true that there are no other low-cost publishing alternatives. The argument would be much stronger if it compared itself to these processes and their apparent failure to create the cultural change advocated in scientific publishing that is advocated here.

  • Are the code examples (if any) sound, clear, and well-written?

No code examples given.

  • Is the paper factually correct?

Yes, modulo the arguments above that are made in an unverifiable way.

  • Is the language and grammar of sufficient quality?

Yes.

  • Are the conclusions justified?

Yes, the conclusions are mild in comparison to the initial claims of the paper.

  • Is prior work properly and fully cited?

Need citation on other low-cost publication methodologies.

  • Should any part of the article be shortened or expanded? Please explain.

Yes, either justify how faster science solves generational societal problems or remove the claim. The authors need to evaluate other low cost publication methodologies and their gaps rather than just claiming it is a large economic problem and citing a paper from 14 years prior.

  • In your view, is the paper fit for publication in the conference proceedings?
    Please suggest specific improvements and indicate whether you think the
    article needs a significant rewrite (rather than a minor revision).

This paper is fit for publication with minor revisions as stated above.

@rowanc1
Contributor Author

rowanc1 commented Jul 2, 2024

Thank you very much @natesjacobs and @aterrel for the reviews, appreciate the time that you took and will respond with comments and changes to the manuscript. In terms of timing, I will wait for a further review from @pranoy-ray before I dive into satisfying the changes you have requested and providing more information. Thanks again all for your help!

@hongsupshin
Contributor

hongsupshin commented Jul 5, 2024

Hey @rowanc1 , just got an email from @pranoy-ray saying that he won't be able to conduct the review. I will review the paper and leave the comments this weekend!

Contributor

@hongsupshin hongsupshin left a comment

Great article. Suggesting minor changes for improved clarity and typos.

@hongsupshin
Contributor

Hey Rowan, just a friendly reminder that all initial reviews are in. I highly recommend you start responding to the comments soon since it'd take time for the reviewers to respond to the changes. Remember that the open review period ends on Sep 2, and you will not be able to make any changes to the manuscript after that point. If you have any questions, please let me know!

@rowanc1
Contributor Author

rowanc1 commented Aug 27, 2024

Thank you @lheagy for the reviews. I have implemented your spelling and grammar fixes. Additionally, we will improve the figures and the description of continuous science that you suggested.

Thank you @agoose77 for the review and the fixes to the language. I have adopted the majority of your changes and I think they improved the quality of the manuscript.

Thanks @stefanv @cbcunc @mepa @deniederhut for providing some more information on previous versions of the SciPy proceedings infrastructure. I have updated this sentence to better reflect the influence of SciPy Proceedings infrastructure.


@aterrel - thanks for the detailed review.

Is the code made publicly available?

We have updated the text clarifying the scope of this paper and links to the open source repositories:

We have fixed the links in our Curvenote readme, thanks for catching those.

Only use scholarly references that are verifiable

We have added several other references when discussing continuous practices. We have also taken a pass through the article to ensure that some of the claims (both economic and thinking about climate change, pandemics etc.) are motivation only.

Using climate change to justify a new workflow is hardly scholarly.

I have rewritten the abstract and removed this. However, this is relatively common in motivating improvements to scholarly publishing, albeit more on pandemics lately. The argument is that scientists are doing important work, and getting access to that work sooner (e.g. via preprints, open data) in better, more actionable ways (more integrated publishing models, notebooks, integrated compute, etc.) accelerates the time to building upon, understanding or (re)using that work. Some high-profile examples of this motivation are the UN and US policies that make similar claims on open access to research. For example:

[Open Science] is increasingly recognized as a critical accelerator for the achievement of the United Nations Sustainable Development Goals.

Additionally, the White House made a similarly broad claim in 2022:

When federally funded research is available to the public, it can improve lives, provide
policymakers with important evidence with which to make critical decisions, accelerate the rates
of discovery and translation, and drive more equitable outcomes across every sector of society.

Not only to fight a pandemic, but to advance all areas of study, including urgent issues such as cancer, clean energy, economic disparities, and climate change.

  • Nelson Memo - Office of Science and Technology Policy (White House, 2022)

The Nelson Memo similarly goes from those high-level statements to mandating requirements on metadata, co-author affiliations, funding sources, and other persistent identifiers, and to discussing the actual mechanics of open science.

Improved tooling, practices and incentives for open science are critical and can have significant impacts on both the cost and speed of scientific progress.

I do not see justification that faster publications will solve these generational societal problems.

As an example, faster publication of genomes early in the pandemic led to early designs of vaccines within two days and manufacturing within two weeks. The urgency prompted an "unprecedented" and lasting change in how science was shared (via preprints), and major publications temporarily changed their access model to make research free to read. Enabling faster, more open publishing practices "saves lives"; has prompted regulatory change (e.g. the OSTP Nelson memo which comes into full effect in 5 months); and is continuing to change funding models to focus on earlier access to research, especially via preprints.

We have added a footnote as well as some of these references.

"elite" software teams & marketing whitepapers

We have rephrased the "elite" software team language and updated the text of this section. The DORA report is the largest industry survey on continuous delivery practices that we know of. We have also added several other references to smaller studies, surveys and review articles showing similar conclusions and speedups.

claiming that [MyST/Curvenote] can address 15% of the global spend on publishing is not believable

Stranger things have happened! However, we do not make this claim, only that our goal is to reduce these direct-publication costs.

The article needs a better discussion on prior methods to reduce costs of publications. [...] The argument would be much stronger if it compared itself to these processes and their apparent failure to create the cultural change advocated in scientific publishing that is advocated here.

We have improved the comparison to the Journal of Open Source Software (JOSS).
The economics provide motivation; however, the main takeaway is that authoring structured content directly provides new, transformed reading experiences and the ability to iterate/release versions close to "publication quality" by ensuring checks, continuous practices, etc.
We have expanded the comparison to Quarto, which is the closest tool for similar aspects of these workflows, most especially those that are in the author's control rather than the publisher's.
A wider analysis on the economics and reasons why the sociotechnical publishing system has not embraced lower cost models is beyond the scope of this article.

used a paper from 14 years ago to justify

This is a manifesto to a conference; I have updated the text with a footnote to make that clearer.


Thank you @hongsupshin for the review - I have responded to the majority of your comments inline and resolved them. Feel free to reopen if you have further refinements to make! I am still working through a few more of the issues in the next few days!

Thanks @natesjacobs for the review. I will continue to work through some responses in the next day or two as well.


TODO:

  • Address remaining comments by @hongsupshin
  • Final pass on text

Contributor

@hongsupshin hongsupshin left a comment

Looks good to me. Thanks @rowanc1 for all the hard work!

@hongsupshin
Contributor

hongsupshin commented Sep 5, 2024

@aterrel @natesjacobs @agoose77 Hello reviewers, the author @rowanc1 made changes to the manuscript based on your comments. If you don't have any further comments, could you please approve the changes? Thanks for your time and effort for reviewing this paper!

@hongsupshin
Contributor

Thanks @fwkoch @lheagy @aterrel @stefanv @natesjacobs @agoose77 for reviewing this paper!

@cbcunc cbcunc merged commit 694b060 into scipy-conference:2024 Sep 25, 2024
4 checks passed