-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify license headers in source files #11605
Comments
This was also brought up by @drammock in #9569 (comment) some time ago. I generally agree. Do we need a copyright notice in each file? Isn't the license in the root of the repository enough, to cover it all? |
Removing is even better. I don't think every file needs to have the license when there is a license in the root. |
I was originally on board with this until I read this stackexchange thread where people sometimes (usually?) generally recommend that you do include the license so that a copied-away file still makes it clear what the license is. So I'm +0 on removing the Last time the "authors" stuff came up, IIRC (I can't find the thread, maybe it was in person at a dev meeting!) most people thought "yes these should go away" until @agramfort pointed out that it's a nice way for newer contributors to get some credit for what they do. I think we conveged on getting rid of the |
Good points @larsoner. Then I'm also +0 on removing the license line. I'm also OK with removing authors only in |
Looking back, looks like it was actually @rob-luke who advocated for keeping the examples/tutorial authors here in the same thread linked to before in July 2021 (I had just missed it). And back then we already were highlighting new contributors in the release April 2021 (and even if we didn't, I think the changelog is a lot less visible to end-users in general than examples/tutorials).
I think people will generally be self-policing here and add their name if they think they did something worthwhile. I think we can take it case-by-case if something concerning does come up rather than trying to codify a policy here. |
this is the point (originally made by @rob-luke as @larsoner mentioned) that convinced me to keep authors on tutorials/examples, even though I still dislike the fact that they can get stale/out of date/inaccurate. I like the idea of removing authors from the source I'm +0.5 to keep the license lines. |
my take on this and why I think it's valuable to keep names:
- having your name on a file gives a sense of code ownership and
responsibility.
- names on top of files survives github and where the code is stored.
- I've been contacted by people to ask questions in sklearn code as they
found my name and email on the files.
- for some contributors it's important to them to have their name. There
can be a sense of pride to have your name there and can maybe encourage
them to contribute more.
my 2c
Message ID: ***@***.***>
… |
I have too, but this is one reason I think it's bad to have the names there, because:
Is this your personal feeling, or have contributors told you that they feel this way? (how many?) If we want to cultivate that feeling of responsibility, we could enforce / strongly encourage using the GitHub
is the sense of pride from seeing your name in the file itself any greater than seeing your name on the associated commit(s)? I feel like we don't really know whether people are adding their names because they feel pride, or because they saw other names and assumed they were supposed to do it. |
I find this to be the most important reason to keep names ... when files are re-organized "git blame" stops working. E.g.: https://github.com/mne-tools/mne-python/blob/main/mne/report/report.py ... I don't see my name in the git commits even though I might be able to answer questions related to the MNE-report better than the average contributor. For functionality that is old, this may be even more critical as it might be hard to dig up who originally wrote that part of the code. |
ok, I withdraw my objection to names in the source files then. It's weird that you aren't listed by GitHub as a contrib @jasmainak; you show up in the blame of that file: |
So summarizing what I think is an emerging consensus (?):
There are social reasons (pride, fame/publicity, guilt/ownership) to keep them there.
The harm caused by outdated information is minimal / consists of inactive devs getting unwanted emails, and/or users having their emails go unanswered. I still think this could be avoided; maybe we include names but not emails?
@larsoner suggests that the honor system / letting people decide for themselves is good enough here. I won't quibble with that. So in the end maybe there is nothing to be done here @cbrnr ? Unless you want to push for removing just the emails and keeping the names... but IDK if that will be any less contentious. |
I still think names in the source code only make sense if they are kept up to date. I rarely bother to put my name there, but now this makes it look as though I did not contribute. I can imagine that this is the case for others as well. @drammock made some good points: people should not be looking at the source to find developers and ask questions. They should use our official channels. If these are not obvious, we can make them more obvisous in our documentation. The sense of ownership and responsibility can be instilled by other means, and I think we should try to find something that works automatically. The spirit of open source is not that someone "owns" a piece of code, but we as a community have a shared responsibility for everything we create in MNE-Python. Putting some names in the sources seems to be biased and unfair unless we keep everything up to date all the time (which is not feasible). Even if some previous contributor knows most about a specific topic I trust current developers to either answer the question or forward it to someone who they think will be able to help. So I still think that removing names (but keeping the license) in |
@cbrnr i would say add your name to the files you feel a clear ownership for. I don't want your name in the files then don't. |
It's not about me. It's that I don't like it in general for the reasons I mentioned. So this is settled? No chance that something can be done here? |
Regarding the comment by @jasmainak, you show up in the blame of the file you linked to, so But this file demonstrates the issue I have: @hoechenberger and @larsoner have contributed a lot, but they are not mentioned at the top of the file. Whether or not that's intentional is also beside the point, because if there is a list this indicates to others that these are the developers who contributed most and are responsible for that file. Which is simply not true. |
Here's a shell command that prints out all contributors of a file:
The important bit is the If you want to preserve the order of contributions:
|
I'm okay with having the names of the original authors of a module / function listed on top. I don't care if I'm listed even if I contributed a lot, for as long as the commit history is preserved. Should we at some point be forced to kill the history, we should add all contributors to the top. Makes me wonder about the planned transition to Black though, where we'll touch many, many lines of the code. I'm wondering if we could associate this gigantic commit with a "virtual" contributor / bot (instead of a real developer account). But this just as an aside. |
But some (new) contributors might care, but not dare to ask to include their name because they might think that only original contributors or "worthy" contributions should go in this list. That's exactly my suggestion, either we copy the full list to the header, or none. Everything in between is not fair. Coming back to a point I thought we'd converged on (also in a previous discussion): I would leave authors in tutorials and examples, but not in sources. This was the consensus in the linked discussion in my opinion. Re Black, it is possible to ignore revisions - is that not sufficient? GitHub does support it (https://docs.github.com/en/repositories/working-with-files/using-files/viewing-a-file#ignore-commits-in-the-blame-view). |
I agree with @agramfort here: it can be valuable for some, especially junior, developers to have their name in the source files. In my opinion, this should be marked by "substantial contribution". The reason why I think this is valuable is the following: having a name on a source file is an easy way to showcase that one has substantially contributed to a specific method etc. Recruitment committees of various disciplines will probably not use a command line tool or other git/GitHub functionality to figure out contributions. However, adding a link to a file where your name is up to is easy to do. Now I agree with @cbrnr that the practice how this is done is disadvantaging contributors who are less inclined to just put themselves on the files. I wonder if what we actually need is guidelines what makes a contribution substantial enough to have your name on the file (I also see how this would always be an arbitrary cut-off). I am against having all names up, as this would add a lot of text for potentially less substantial commits (e.g., having fixed a typo in the docstring etc.) and would decrease visibility. Another possibility could be to outsource the contributor list to another file we keep on our homepage - that could fix it for my first point, but again not for the visibility. |
Ok right, very valid point. I've been in this situation before and I never knew when it's justified to add my name. |
If you can add this to a commit hook that automatically runs only on the changed files and then auto updates the author list (and deduplicates with our mailmap!) then I think that would be quite a good solution. Names who only made small typo fixes will still be in there but I'd rather err in that direction than to not include names who made large contributions |
Mh @drammock I'd argue that then the name list become meaningless, no? For all points mentioned here: ownership, visibility, pride, ... |
But everyone who contributed has the same right to get ownership, visibility, etc.? Since I wanted to avoid defining what an "important" contribution is, the best solution IMO is to remove author names from source files and try to add other means to provide these perks. For everyone. |
Well, my point is that with all names on there, it would not work that way. But I do think that if someone made substantial changes to a part of the code base, they should be able to get recognition for that beyond having their name in a long list of names or not mentioned at all. |
Having observed this a bit, I want to revise my opinion. I think we should:
|
I agree with @sappelhoff 100%! |
While I do agree, I also see the difficulty here that @cbrnr mentioned: how do we quantify that? How big or important is "substantial"? Changed lines of code (how many?)? Performance improvements (how much faster?)? Continuous maintenance of a module (for how long?)? Do the barriers we set here depend on seniority of the contributor (beginners -> lower, senior devs -> higher barrier?)? There must be a clear path that contributors can follow to get their name added at the top-level. Otherwise it'll be a recipe for conflict and disappointment. |
@hoechenberger as mentioned in one of my earlier comments, I agree with that as well. |
I am very sympathetic to and supportive of these points. But until we have a proposal for how "substantial" is defined and adjudicated, I feel I cannot support inclusion/exclusion of any names --- either show all names, or show none. I think several of us agree that letting the contributor decide whether or not to include their name isn't a very good solution: it disadvantages anyone feeling shy/humble/insecure and advantages anyone feeling arrogant/confident/entitled. But I also really don't want the mechanism to be "whoever is reviewing the PR decides" --- clearly there are differences of opinion among members of our steering council about how this should be handled, and whether a contributor gets credit in this way should not be determined by luck of the draw in who is available for PR reviews.
This is already available: https://github.com/mne-tools/mne-python/commits?author=sappelhoff
See above point; "loose guideline" does not, IMO, eliminate the bias toward advantaging some contribs and disadvantaging others, and it adds overhead for devs to adjudicate (or at least "keep an eye out" for whether names were (not) added (in)appropriately). |
@agramfort, @drammock and I did a little bit of investigation. When you google authors (we tried @jasmainak and @britta-wstnr for example) you do not get links to source files, even if you go pretty deep in the results. You do (sometimes as the first result!) get links to examples/ and tutorials/. Based on this, the cumulative discussion and ideas above, and some additional discussion today with @britta-wstnr, @wmvanvliet, and @jasmainak (hooray sprint time making chatting much easier!) we came up with the following tentative plan.
Hopefully this preserves credit "backward compatibility" sufficiently while improving things overall, especially in terms of needing to make decisions about what to include or exclude with each contribution/PR. Someone (me probably!) "just" needs to implement this stuff but really I don't think it will be too onerous having played around a bit in #11975 and with GitHub actions recently. And critically it's extensible and adjustable as needed, for example we can add sections on grant support and Discourse activity that we currently don't credit sufficiently (if at all). |
Meanwhile, Scikit-Learn has had a similar discussion, and they decided to phase out individual author names in source files: https://blog.scikit-learn.org/updates/authorship-info/ |
Do you think we could find a solution for the upcoming release? My simplified proposal is:
|
I think everyone is on board with (1) assuming there is some suitable replacement for the source-file credit, but I don't think (2) qualifies as a suitable replacement according to the previous discussions. I have some code somewhere sitting around for #11605 (comment) that I can try to polish, or at least open as a WIP PR. That way if someone is really motivated to work on this (sounds like you might be @cbrnr ?) then they could push it over the finish line. |
I thought that in the light of the latest developments in Scikit-Learn, we might be able to reconsider (2). They don't implement such a highly specific credit system, and I don't know any other project that does. So I'm really proposing to follow what other projects are doing to avoid creating too much work for us devs. |
I think any project that uses https://allcontributors.org/ or https://github.com/mntnr/name-your-contributors is probably doing a better job than either us or sklearn. So while I'm all in favor of looking to other members of our ecosystem for inspiration on ways we could improve, it's not clear to me that emulating sklearn is an improvement. As I said in #11605 (comment):
which @cbrnr agreed with (at the time at least). If we're going to do incremental steps, let's make sure they're steps forward from the (possibly marginalized/disadvantaged/unprivileged) contributors' point of view, not only from the maintenance burden for current devs view. |
Then just keep the contributors on the main page? I feel like we already list all contributors in the release notes, and we specifically highlight new contributors, so I think there's no need for more complexity. |
Why not go all-in and write some tooling that extracts the contributors to each source file from Git and then adds all of them to the top of the file. We now have experience with CI-generated file changes with autofix.ci. i still think there are much better ways to create contributor visibility but we've embarrassingly been unable to reach a consensus in over 9 months. FWIW MNE-BIDS just removed individual authors. |
I don't like the idea of more tooling. Maybe this has been mentioned before, but why is https://github.com/mne-tools/mne-python/graphs/contributors not sufficient? It shows all contributors with commit-level granularity. |
What some maybe fail to realize is that the current status quo is deeply unfair and inconsistent. So it doesn't really help to qualify all suggestions as steps in the wrong direction, because doing nothing is actually larger steps in the wrong direction IMO. |
I agree |
to explain, I have removed them (with approval) for these reasons (which have all been mentioned above):
I find it very important that FOSS contributors receive credit for their contributions, and I believe this can be accurately tracked by existing tools (which have also all been mentioned above):
☝️ picking one of these methods is bound to give a contributor the granularity at which they want to advertise their contributions. I also believe that at least a few of these methods are suitable for non-technically versed people who want to judge the amount of a person's contributions. For the future, I wish that we could become better (at MNE-BIDS) at tracking different kinds of contributions (contributor roles), that are potentially not picked up by the git history (e.g., informal conversations at meetings, help in the user forum, offline design work that is then committed by someone else, ...). A promising direction in that regard is the potential inclusion of "contributor roles" within the CITATION.cff standard. Note that we already add authors to the changelog if they haven't committed but contributed in another way:
I furthermore think MNE-BIDS would profit from having a dedicated "contributors" page, like MNE-Python has on the bottom of the landing page. Please note that all that I mention above is with regards to MNE-BIDS, which is a much smaller code base with much fewer contributors, so we can't copy all arguments one to one to MNE-Python. |
A real-time discussion with @britta-wstnr and @larsoner today yielded the following proposal:
@larsoner and I are willing to lead the effort on building this, though @hoechenberger has indicated willingness to work on a "certificate of contribution" so hopefully this is close enough to that idea that he'd be willing to help out with this approach too. |
@drammock I'm happy to work on a certificate template and also to help develop and deploy the API, I've gained some experience in both backend and frontend development including cryptography over the past couple of years let me know if we should connect for a call to discuss and assign tasks Great to hear we're finally moving forward here! |
Sounds like a plan! I'm fine with reporting merged PRs, but just to clarify:
This means that after we switched to squash-merging, every merged PR equals a single commit, right? So technically, the imbalance would not affect any new contributors, just very old contributions, right? |
exactly right. Counting commits biases in favor of contributions prior to the squash-merge transition, and unevenly so (bias is stronger if a person commits frequently, regardless of how complex or extensive the work was) |
Currently, our source files show individual authors as well as the license in a comment near the top, for example:
I think this is not ideal for the following reasons:
I suggest that we replace our headers with the following license text across all source files:
The exact wording can be changed of course (feel free to make suggestions), but the point is to get rid of individual authors.
The text was updated successfully, but these errors were encountered: