Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify license headers in source files #11605

Closed
cbrnr opened this issue Mar 28, 2023 · 88 comments · Fixed by #12781
Closed

Simplify license headers in source files #11605

cbrnr opened this issue Mar 28, 2023 · 88 comments · Fixed by #12781
Assignees
Milestone

Comments

@cbrnr
Copy link
Contributor

cbrnr commented Mar 28, 2023

Currently, our source files show individual authors as well as the license in a comment near the top, for example:

# Authors: Jaakko Leppakangas <[email protected]>
#          Robert Luke <[email protected]>
#
# License: BSD-3-Clause

I think this is not ideal for the following reasons:

  1. It is not trivial to keep the list of authors up to date. Since this information is available from Git anyway, it is also not necessary.
  2. I don't think we have always updated these lists, so currently this information is likely outdated.
  3. There's always the question what kind of change(s) qualifies an author to get added to that list.

I suggest that we replace our headers with the following license text across all source files:

# Authors: MNE-Python development team
# License: BSD-3-Clause

The exact wording can be changed of course (feel free to make suggestions), but the point is to get rid of individual authors.

@sappelhoff
Copy link
Member

This was also brought up by @drammock in #9569 (comment) some time ago.

I generally agree. Do we need a copyright notice in each file? Isn't the license in the root of the repository enough, to cover it all?

@cbrnr
Copy link
Contributor Author

cbrnr commented Mar 28, 2023

Removing is even better. I don't think every file needs to have the license when there is a license in the root.

@larsoner
Copy link
Member

larsoner commented Mar 28, 2023

I don't think every file needs to have the license when there is a license in the root.

I was originally on board with this until I read this stackexchange thread where people sometimes (usually?) generally recommend that you do include the license so that a copied-away file still makes it clear what the license is. So I'm +0 on removing the BSD 3-clause line, I'm fine with keeping it or removing it.

Last time the "authors" stuff came up, IIRC (I can't find the thread, maybe it was in person at a dev meeting!) most people thought "yes these should go away" until @agramfort pointed out that it's a nice way for newer contributors to get some credit for what they do. I think we conveged on getting rid of the authors in mne/ but keeping those lines in examples/ and tutorials/ because -- even though they, too, end up out of date -- it's nice for newer and/or less experienced people to have their name visible on something that people are more likely to read (i.e., an example or tutorial) than our whats_new.rst.

@larsoner larsoner added this to the 1.4 milestone Mar 28, 2023
@cbrnr
Copy link
Contributor Author

cbrnr commented Mar 28, 2023

Good points @larsoner. Then I'm also +0 on removing the license line.

I'm also OK with removing authors only in mne/. If we keep them in examples/ and tutorials/, we should define what kind of changes warrant adding a new author. Are we OK with a typo fix? Or does it have to be more substantial? This discussion could be avoided if we remove authors in all sources, which I'd still prefer. We now highlight new authors in our changelog, which I believe we didn't do when @agramfort made this comment.

@larsoner
Copy link
Member

We now highlight new authors in our changelog, which I believe we didn't do when @agramfort made this comment.

Looking back, looks like it was actually @rob-luke who advocated for keeping the examples/tutorial authors here in the same thread linked to before in July 2021 (I had just missed it). And back then we already were highlighting new contributors in the release April 2021 (and even if we didn't, I think the changelog is a lot less visible to end-users in general than examples/tutorials).

we should define what kind of changes warrant adding a new author. Are we OK with a typo fix? Or does it have to be more substantial?

I think people will generally be self-policing here and add their name if they think they did something worthwhile. I think we can take it case-by-case if something concerning does come up rather than trying to codify a policy here.

@drammock
Copy link
Member

I think the changelog is a lot less visible to end-users in general than examples/tutorials

this is the point (originally made by @rob-luke as @larsoner mentioned) that convinced me to keep authors on tutorials/examples, even though I still dislike the fact that they can get stale/out of date/inaccurate.

I like the idea of removing authors from the source .py files within mne/ though. Anyone who is reading the source code I think it's fair to assume they at least know about changelogs, and many will know about git blame too. Plus GitHub shows avatars of all authors if you view the source on GitHub, so that's another way authors are visible for source files.

I'm +0.5 to keep the license lines.

@agramfort
Copy link
Member

agramfort commented Mar 28, 2023 via email

@drammock
Copy link
Member

I've been contacted by people to ask questions in sklearn code as they found my name and email on the files.

I have too, but this is one reason I think it's bad to have the names there, because:

  1. in general I think we want to discourage users from sending emails directly to specific devs --- questions/support requests should go through our preferred channels (discourse / GitHub) so that currently active maintainers will see the message.
  2. Assuming names are in chronological order, if folks email the first name on the list they are in fact more likely to get someone who hasn't touched that code in a while.

having your name on a file gives a sense of code ownership and responsibility

Is this your personal feeling, or have contributors told you that they feel this way? (how many?) If we want to cultivate that feeling of responsibility, we could enforce / strongly encourage using the GitHub codeowner feature

for some contributors it's important to them to have their name. There can be a sense of pride to have your name there and can maybe encourage them to contribute more.

is the sense of pride from seeing your name in the file itself any greater than seeing your name on the associated commit(s)? I feel like we don't really know whether people are adding their names because they feel pride, or because they saw other names and assumed they were supposed to do it.

@jasmainak
Copy link
Member

names on top of files survives github and where the code is stored.

I find this to be the most important reason to keep names ... when files are re-organized "git blame" stops working. E.g.: https://github.com/mne-tools/mne-python/blob/main/mne/report/report.py ... I don't see my name in the git commits

image

even though I might be able to answer questions related to the MNE-report better than the average contributor. For functionality that is old, this may be even more critical as it might be hard to dig up who originally wrote that part of the code.

@drammock
Copy link
Member

ok, I withdraw my objection to names in the source files then. It's weird that you aren't listed by GitHub as a contrib @jasmainak; you show up in the blame of that file:

Screenshot 2023-03-28 at 15-12-04 mne-python_mne_report_report py at main · mne-tools_mne-python

@drammock
Copy link
Member

So summarizing what I think is an emerging consensus (?):

It is not trivial to keep the list of authors up to date. Since this information is available from Git anyway, it is also not necessary.

There are social reasons (pride, fame/publicity, guilt/ownership) to keep them there.

I don't think we have always updated these lists, so currently this information is likely outdated.

The harm caused by outdated information is minimal / consists of inactive devs getting unwanted emails, and/or users having their emails go unanswered. I still think this could be avoided; maybe we include names but not emails?

There's always the question what kind of change(s) qualifies an author to get added to that list.

@larsoner suggests that the honor system / letting people decide for themselves is good enough here. I won't quibble with that.

So in the end maybe there is nothing to be done here @cbrnr ? Unless you want to push for removing just the emails and keeping the names... but IDK if that will be any less contentious.

@cbrnr
Copy link
Contributor Author

cbrnr commented Mar 29, 2023

I still think names in the source code only make sense if they are kept up to date. I rarely bother to put my name there, but now this makes it look as though I did not contribute. I can imagine that this is the case for others as well.

@drammock made some good points: people should not be looking at the source to find developers and ask questions. They should use our official channels. If these are not obvious, we can make them more obvisous in our documentation.

The sense of ownership and responsibility can be instilled by other means, and I think we should try to find something that works automatically. The spirit of open source is not that someone "owns" a piece of code, but we as a community have a shared responsibility for everything we create in MNE-Python. Putting some names in the sources seems to be biased and unfair unless we keep everything up to date all the time (which is not feasible).

Even if some previous contributor knows most about a specific topic I trust current developers to either answer the question or forward it to someone who they think will be able to help.

So I still think that removing names (but keeping the license) in mne/ (but keeping them in examples/ and tutorials/) is the better option.

@agramfort
Copy link
Member

@cbrnr i would say add your name to the files you feel a clear ownership for. I don't want your name in the files then don't.

@cbrnr
Copy link
Contributor Author

cbrnr commented Mar 29, 2023

i would say add your name to the files you feel a clear ownership for. I don't want your name in the files then don't.

It's not about me. It's that I don't like it in general for the reasons I mentioned.

So this is settled? No chance that something can be done here?

@cbrnr
Copy link
Contributor Author

cbrnr commented Mar 29, 2023

Regarding the comment by @jasmainak, you show up in the blame of the file you linked to, so git blame does not stop working when files are re-organized. GitHub shows the most recent contributer per line. It's the list that's buggy, we should probably report that to GitHub. Because that file was renamed, you need to go to the previous file, which is linked in the list of commits.

But this file demonstrates the issue I have: @hoechenberger and @larsoner have contributed a lot, but they are not mentioned at the top of the file. Whether or not that's intentional is also beside the point, because if there is a list this indicates to others that these are the developers who contributed most and are responsible for that file. Which is simply not true.

@cbrnr
Copy link
Contributor Author

cbrnr commented Mar 29, 2023

Here's a shell command that prints out all contributors of a file:

git --no-pager log --follow --pretty=format:"%aN <%ae>" mne/report/report.py | sort | uniq

The important bit is the --follow option, which makes it work across renames.

If you want to preserve the order of contributions:

git --no-pager log --follow --pretty=format:"%aN <%ae>" mne/report/report.py | nl | sort -uk2 | sort -nk1 | cut -f2-
Dimitri Papadopoulos Orfanos <[email protected]>
Eric Larson <[email protected]>
Daniel McCloy <[email protected]>
Mathieu Scheltienne <[email protected]>
Clemens Brunner <[email protected]>
Mathieu Scheltienne <[email protected]>
Richard Höchenberger <[email protected]>
Alex Rockhill <[email protected]>
Jeff Stout <[email protected]>
Marijn van Vliet <[email protected]>
Alexandre Gramfort <[email protected]>
Stefan Appelhoff <[email protected]>
Rob Luke <[email protected]>
Valerii Chirkov <[email protected]>
Martin Schulz <[email protected]>
Guillaume Favelier <[email protected]>
Dmitrii Altukhov <[email protected]>
Alex Rockhill <[email protected]>
Daniel McCloy <[email protected]>
Joan Massich <[email protected]>
Jaakko Leppakangas <[email protected]>
Mainak Jas <[email protected]>
Denis A. Engemann <[email protected]>
Teon Brooks <[email protected]>
Teon Brooks <[email protected]>
Jean-Remi King <[email protected]>
Mainak Jas <[email protected]>
Denis A. Engemann <[email protected]>

@hoechenberger
Copy link
Member

hoechenberger commented Mar 29, 2023

But this file demonstrates the issue I have: @hoechenberger and @larsoner have contributed a lot, but they are not mentioned at the top of the file. Whether or not that's intentional is also beside the point,

I'm okay with having the names of the original authors of a module / function listed on top. I don't care if I'm listed even if I contributed a lot, for as long as the commit history is preserved. Should we at some point be forced to kill the history, we should add all contributors to the top.

Makes me wonder about the planned transition to Black though, where we'll touch many, many lines of the code. I'm wondering if we could associate this gigantic commit with a "virtual" contributor / bot (instead of a real developer account). But this just as an aside.

@cbrnr
Copy link
Contributor Author

cbrnr commented Mar 29, 2023

I'm okay with having the names of the original authors of a module / function listed on top. I don't care if I'm listed even if I contributed a lot, for as long as the commit history is preserved. Should we at some point be forced to kill the history, we should add all contributors to the top.

But some (new) contributors might care, but not dare to ask to include their name because they might think that only original contributors or "worthy" contributions should go in this list.

That's exactly my suggestion, either we copy the full list to the header, or none. Everything in between is not fair.

Coming back to a point I thought we'd converged on (also in a previous discussion): I would leave authors in tutorials and examples, but not in sources. This was the consensus in the linked discussion in my opinion.

Re Black, it is possible to ignore revisions - is that not sufficient? GitHub does support it (https://docs.github.com/en/repositories/working-with-files/using-files/viewing-a-file#ignore-commits-in-the-blame-view).

@britta-wstnr
Copy link
Member

I agree with @agramfort here: it can be valuable for some, especially junior, developers to have their name in the source files. In my opinion, this should be marked by "substantial contribution".

The reason why I think this is valuable is the following: having a name on a source file is an easy way to showcase that one has substantially contributed to a specific method etc. Recruitment committees of various disciplines will probably not use a command line tool or other git/GitHub functionality to figure out contributions. However, adding a link to a file where your name is up to is easy to do.
Furthermore, people who look at the source code locally will make a connection with the names - thus increasing visibility for the contributors.

Now I agree with @cbrnr that the practice how this is done is disadvantaging contributors who are less inclined to just put themselves on the files. I wonder if what we actually need is guidelines what makes a contribution substantial enough to have your name on the file (I also see how this would always be an arbitrary cut-off).

I am against having all names up, as this would add a lot of text for potentially less substantial commits (e.g., having fixed a typo in the docstring etc.) and would decrease visibility.

Another possibility could be to outsource the contributor list to another file we keep on our homepage - that could fix it for my first point, but again not for the visibility.

@hoechenberger
Copy link
Member

But some (new) contributors might care, but not dare to ask to include their name

Ok right, very valid point. I've been in this situation before and I never knew when it's justified to add my name.

@drammock
Copy link
Member

git --no-pager log --follow --pretty=format:"%aN <%ae>" mne/report/report.py | nl | sort -uk2 | sort -nk1 | cut -f2-

If you can add this to a commit hook that automatically runs only on the changed files and then auto updates the author list (and deduplicates with our mailmap!) then I think that would be quite a good solution. Names who only made small typo fixes will still be in there but I'd rather err in that direction than to not include names who made large contributions

@britta-wstnr
Copy link
Member

Mh @drammock I'd argue that then the name list become meaningless, no? For all points mentioned here: ownership, visibility, pride, ...

@cbrnr
Copy link
Contributor Author

cbrnr commented Mar 29, 2023

I'd argue that then the name list become meaningless, no? For all points mentioned here: ownership, visibility, pride, ...

But everyone who contributed has the same right to get ownership, visibility, etc.?

Since I wanted to avoid defining what an "important" contribution is, the best solution IMO is to remove author names from source files and try to add other means to provide these perks. For everyone.

@britta-wstnr
Copy link
Member

britta-wstnr commented Mar 29, 2023

Well, my point is that with all names on there, it would not work that way.
Would someone feel ownership/responsibility if there are 15+ other names? Probably no.
Would someone get visibility if there are 15+ other names? Probably also no.

But I do think that if someone made substantial changes to a part of the code base, they should be able to get recognition for that beyond having their name in a long list of names or not mentioned at all.
At this point I also want to throw in that this might be especially important for marginalized groups in open source, who might not be just "believed" to have "really contributed".

@sappelhoff
Copy link
Member

sappelhoff commented Mar 29, 2023

Having observed this a bit, I want to revise my opinion. I think we should:

  • use CODEOWNERS for "ownership" --> this allows people to get notifications when files are touched
  • use our changelog for "credit"/"attribution" --> after all it's more relevant to be able to point to a specific PR that you have implemented, rather than say "I worked on this file in general"
    • I think what would be really awesome would be a way to filter the changelog by contributor ... something like: list me all contributions by @example-person across all MNE versions
    • ideally, such a query would be associated with a specific URL that could be shared. Then I could send someone a URL and they could see all my contributions with links to the specific PRs.
  • git blame for specifics (the "black" commit can be filtered via a .git-blame-ignore-revs file, which does also work on GitHub, next to the CLI)
  • keep author names in tutorials and examples, AND write some loose guidelines (preferably examples) when a contribution is considered big enough to add a name there ... OR add a "author contributions" section to each tutorial/example ... something like the CREDIT taxonomy
  • remove names and emails from files, and only keep the license abbreviation there --> credit/ownership/blame is IMHO properly solved by the above mentioned items

@cbrnr
Copy link
Contributor Author

cbrnr commented Mar 29, 2023

I agree with @sappelhoff 100%!

@hoechenberger
Copy link
Member

@britta-wstnr

But I do think that if someone made substantial changes to a part of the code base, they should be able to get recognition for that beyond having their name in a long list of names or not mentioned at all.

While I do agree, I also see the difficulty here that @cbrnr mentioned: how do we quantify that? How big or important is "substantial"? Changed lines of code (how many?)? Performance improvements (how much faster?)? Continuous maintenance of a module (for how long?)? Do the barriers we set here depend on seniority of the contributor (beginners -> lower, senior devs -> higher barrier?)?

There must be a clear path that contributors can follow to get their name added at the top-level. Otherwise it'll be a recipe for conflict and disappointment.

@britta-wstnr
Copy link
Member

@hoechenberger as mentioned in one of my earlier comments, I agree with that as well.
However, that also is a problem for tutorials and examples as @sappelhoff acknowledges and proposes to write guidelines for - i.e., we have to solve that anyway.

@drammock
Copy link
Member

drammock commented Mar 29, 2023

@britta-wstnr

But I do think that if someone made substantial changes to a part of the code base, they should be able to get recognition for that beyond having their name in a long list of names or not mentioned at all. At this point I also want to throw in that this might be especially important for marginalized groups in open source, who might not be just "believed" to have "really contributed".

I am very sympathetic to and supportive of these points. But until we have a proposal for how "substantial" is defined and adjudicated, I feel I cannot support inclusion/exclusion of any names --- either show all names, or show none.

I think several of us agree that letting the contributor decide whether or not to include their name isn't a very good solution: it disadvantages anyone feeling shy/humble/insecure and advantages anyone feeling arrogant/confident/entitled. But I also really don't want the mechanism to be "whoever is reviewing the PR decides" --- clearly there are differences of opinion among members of our steering council about how this should be handled, and whether a contributor gets credit in this way should not be determined by luck of the draw in who is available for PR reviews.

@sappelhoff

ideally, such a query would be associated with a specific URL that could be shared. Then I could send someone a URL and they could see all my contributions with links to the specific PRs.

This is already available: https://github.com/mne-tools/mne-python/commits?author=sappelhoff

keep author names in tutorials and examples, AND write some loose guidelines

See above point; "loose guideline" does not, IMO, eliminate the bias toward advantaging some contribs and disadvantaging others, and it adds overhead for devs to adjudicate (or at least "keep an eye out" for whether names were (not) added (in)appropriately).

@larsoner
Copy link
Member

larsoner commented Oct 3, 2023

@agramfort, @drammock and I did a little bit of investigation. When you google authors (we tried @jasmainak and @britta-wstnr for example) you do not get links to source files, even if you go pretty deep in the results. You do (sometimes as the first result!) get links to examples/ and tutorials/. Based on this, the cumulative discussion and ideas above, and some additional discussion today with @britta-wstnr, @wmvanvliet, and @jasmainak (hooray sprint time making chatting much easier!) we came up with the following tentative plan.

  1. Remove names from mne/ source files , replaced with:

    # Authors: MNE-Python developers <https://mne.tools/stable/credit.html>
    # License: BSD-3-Clause
    

    Source files are not easily found by Google so don't give great credit anyway. Also sometimes people email the authors for help with code which is a bug not a feature -- people should use standard communication channels (e.g., discourse).

  2. Create a GitHub action that:

    1. Populates a doc/credit.rst that is roughly organized with public submodules as headings. Under each heading is a list of all authors of the file (ordered by credit, probably +/- lines over the history). This should be at least as discoverable / reportable for people who need it as having the names in the source files since it will be on our webpage, and we can link to it directly from the front page somehow.
    2. Populates examples/ and tutorials/ Python files with all contributor names to a given example/tutorial. Since these files are already discoverable by search, we can easily keep it this way by preserving and extending/completing these automatically.

Hopefully this preserves credit "backward compatibility" sufficiently while improving things overall, especially in terms of needing to make decisions about what to include or exclude with each contribution/PR. Someone (me probably!) "just" needs to implement this stuff but really I don't think it will be too onerous having played around a bit in #11975 and with GitHub actions recently. And critically it's extensible and adjustable as needed, for example we can add sections on grant support and Discourse activity that we currently don't credit sufficiently (if at all).

@cbrnr
Copy link
Contributor Author

cbrnr commented Jun 29, 2024

Meanwhile, Scikit-Learn has had a similar discussion, and they decided to phase out individual author names in source files: https://blog.scikit-learn.org/updates/authorship-info/

@cbrnr
Copy link
Contributor Author

cbrnr commented Jul 11, 2024

Do you think we could find a solution for the upcoming release? My simplified proposal is:

  1. Remove individual authors from source files (but not from examples and tutorials!).
  2. As @larsoner suggested, we can include a URL in the header that links to contributors. I wouldn't bother trying to create something super fancy, let's just move contributors from the bottom of our landing page to https://mne.tools/contributors.html for now. That way, we can always add more things to that page later. I know that's not the same level of detail, but I think we have to find a balance that's not putting too much work on devs to implement, and I don't think we should wait another release.

@larsoner
Copy link
Member

I think everyone is on board with (1) assuming there is some suitable replacement for the source-file credit, but I don't think (2) qualifies as a suitable replacement according to the previous discussions. I have some code somewhere sitting around for #11605 (comment) that I can try to polish, or at least open as a WIP PR. That way if someone is really motivated to work on this (sounds like you might be @cbrnr ?) then they could push it over the finish line.

@cbrnr
Copy link
Contributor Author

cbrnr commented Jul 11, 2024

I thought that in the light of the latest developments in Scikit-Learn, we might be able to reconsider (2). They don't implement such a highly specific credit system, and I don't know any other project that does. So I'm really proposing to follow what other projects are doing to avoid creating too much work for us devs.

@drammock
Copy link
Member

in the light of the latest developments in Scikit-Learn, we might be able to reconsider (2). They don't implement such a highly specific credit system, and I don't know any other project that does.

I think any project that uses https://allcontributors.org/ or https://github.com/mntnr/name-your-contributors is probably doing a better job than either us or sklearn. So while I'm all in favor of looking to other members of our ecosystem for inspiration on ways we could improve, it's not clear to me that emulating sklearn is an improvement.

As I said in #11605 (comment):

I wonder if it would help to try to separate the question of "what would be optimal going forward?" from the question of "what would preserve or improve on the imperfect situation we've inherited?"

which @cbrnr agreed with (at the time at least). If we're going to do incremental steps, let's make sure they're steps forward from the (possibly marginalized/disadvantaged/unprivileged) contributors' point of view, not only from the maintenance burden for current devs view.
IMO, removing the names from the source files and also moving the contributor avatars from our homepage to some other page seems like two steps in the wrong direction (i.e. both are reducing contributor visibility).

@cbrnr
Copy link
Contributor Author

cbrnr commented Jul 11, 2024

Then just keep the contributors on the main page? I feel like we already list all contributors in the release notes, and we specifically highlight new contributors, so I think there's no need for more complexity.

@hoechenberger
Copy link
Member

hoechenberger commented Jul 12, 2024

Why not go all-in and write some tooling that extracts the contributors to each source file from Git and then adds all of them to the top of the file. We now have experience with CI-generated file changes with autofix.ci.

i still think there are much better ways to create contributor visibility but we've embarrassingly been unable to reach a consensus in over 9 months.

FWIW MNE-BIDS just removed individual authors.

@cbrnr
Copy link
Contributor Author

cbrnr commented Jul 12, 2024

I don't like the idea of more tooling. Maybe this has been mentioned before, but why is

https://github.com/mne-tools/mne-python/graphs/contributors

not sufficient? It shows all contributors with commit-level granularity.

@cbrnr
Copy link
Contributor Author

cbrnr commented Jul 12, 2024

What some maybe fail to realize is that the current status quo is deeply unfair and inconsistent. So it doesn't really help to qualify all suggestions as steps in the wrong direction, because doing nothing is actually larger steps in the wrong direction IMO.

@hoechenberger
Copy link
Member

the current status quo is deeply unfair and inconsistent. So it doesn't really help to qualify all suggestions as steps in the wrong direction

I agree

@sappelhoff
Copy link
Member

sappelhoff commented Jul 12, 2024

FWIW MNE-BIDS just removed individual authors.

to explain, I have removed them (with approval) for these reasons (which have all been mentioned above):

  1. inconsistent status quo (some authors missing on some files, not all files had an authors list, things got outdated ...)
  2. author list in files was sort of misleading (say, one person wrote 90%, the other 10%, yet both are listed as equal)
  3. a statement like "this is from US, all the contributors as a team" sends a nice and concise message (rather than, we work as a team, but this is only by people A, B, C), IMHO
  4. original goal of having individual names is better covered by other means (see below)

I find it very important that FOSS contributors receive credit for their contributions, and I believe this can be accurately tracked by existing tools (which have also all been mentioned above):

  1. git blame (and other git commands to inspect the git data)
  2. the changelog we publish on the website
  3. GitHub "tools" to inspect the git data, such as the contributors graph, or a author specific commit history ... or even linking to PRs
  4. the CITATION.cff file
  5. the Zenodo archive (based on data from the CITATION.cff file)
  6. a CODEOWNERS file (we currently don't have that, but might add it in the future)
  7. the paper in JOSS (but this is actually problematic, because not all contributors are authors on the paper, only those that had contributed up to the time of publication)

☝️ picking one of these methods is bound to give a contributor the granularity at which they want to advertise their contributions. I also believe that at least a few of these methods are suitable for non-technically versed people who want to judge the amount of a person's contributions.

For the future, I wish that we could become better (at MNE-BIDS) at tracking different kinds of contributions (contributor roles), that are potentially not picked up by the git history (e.g., informal conversations at meetings, help in the user forum, offline design work that is then committed by someone else, ...). A promising direction in that regard is the potential inclusion of "contributor roles" within the CITATION.cff standard. Note that we already add authors to the changelog if they haven't committed but contributed in another way:

Manually add authors who have done a lot of work (reviewing, advising, ...) but don't show up in the shortlog because they did not submit PRs

I furthermore think MNE-BIDS would profit from having a dedicated "contributors" page, like MNE-Python has on the bottom of the landing page.

Please note that all that I mention above is with regards to MNE-BIDS, which is a much smaller code base with much fewer contributors, so we can't copy all arguments one to one to MNE-Python.

@drammock
Copy link
Member

A real-time discussion with @britta-wstnr and @larsoner today yielded the following proposal:

  1. create a page on our website that, when called with a contributor's GH handle as a URL parameter, will dynamically create a summary of their contributions to MNE. Candidates for inclusion are:
    • number of merged PRs (better than number of commits because it avoids the imbalance created by switching to squash-merging around 2016 or so)
    • number of PR reviews
    • some summary of forum participation (exact metrics TBD)
    • leadership activity (should be API queryable from GH teams; or at least we can make it so that this is possible going forward)
    • participation in sprints or GSoC (easy enough to maintain a YAML or JSON of this data)
    • teaching of MNE workshops (again, manually-maintained JSON or YAML, but burden should hopefully be quite low)
    • obtaining grants or other funding
    • possibly other metrics, either now or later
  2. Each metric gets a sentence or two, contextualizing what is shown (e.g., whether the sprint had competitive admission like our training sprints, or was invite-only). We can discuss whether displaying ranking for each metric is helpful/informative context or not.
  3. The whole page gets some contextualizing info too (e.g., specifically mentions what we don't (or can't easily) track; mentions known biases in the metrics, etc).
  4. Info is pulled via monthly cron job and dumped somewhere, and the dynamic page draws from that data source (the API queries will be too slow to do on every page load)
  5. target is to create and merge this shortly after the next release, so that we have plenty of time to fine-tune it before it becomes part of the stable docs site.
  6. until it's launched, we don't remove names from source files.

@larsoner and I are willing to lead the effort on building this, though @hoechenberger has indicated willingness to work on a "certificate of contribution" so hopefully this is close enough to that idea that he'd be willing to help out with this approach too.

@hoechenberger
Copy link
Member

hoechenberger commented Jul 14, 2024

@drammock I'm happy to work on a certificate template and also to help develop and deploy the API, I've gained some experience in both backend and frontend development including cryptography over the past couple of years

let me know if we should connect for a call to discuss and assign tasks

Great to hear we're finally moving forward here!

@cbrnr
Copy link
Contributor Author

cbrnr commented Jul 15, 2024

Sounds like a plan! I'm fine with reporting merged PRs, but just to clarify:

number of merged PRs (better than number of commits because it avoids the imbalance created by switching to squash-merging around 2016 or so)

This means that after we switched to squash-merging, every merged PR equals a single commit, right? So technically, the imbalance would not affect any new contributors, just very old contributions, right?

@drammock
Copy link
Member

after we switched to squash-merging, every merged PR equals a single commit, right? So technically, the imbalance would not affect any new contributors, just very old contributions, right?

exactly right. Counting commits biases in favor of contributions prior to the squash-merge transition, and unevenly so (bias is stronger if a person commits frequently, regardless of how complex or extensive the work was)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants