Fix #5499: Include in pip's User-Agent whether it looks like pip is in CI #6273

cjerdonek · 2019-02-16T23:40:28Z

cjerdonek · 2019-02-16T23:44:48Z

By the way, when writing this PR I noticed that test_user_agent() wasn't actually testing anything because the assert was missing.

So that is fixed now, too.

gaborbernat

We should document inline what keys cover what ci managers, maybe link to their documentation. Also maybe this will require them to be whitelisted in tox.

cjerdonek · 2019-02-17T13:07:41Z

@gaborbernat I can provide examples for each key, but it may be hard to be comprehensive, and it can change over time as CI companies go in and out of business. Links to the online documentation of companies can also change. What about linking to the original issue number where there is more info?

Regarding Tox, it wouldn't be required as the purpose of this PR is to provide some indication. There can certainly be false negatives, but it is unlikely for there to be false positives. In that sense, this information would provide a lower bound for CI hits.

gaborbernat · 2019-02-17T13:12:13Z

Well, my worry is more along the lines did we leave some out? When should we remove some when CI companies migrate. As it is the list seems like an arbitrary element, does this cover all popular CIs? (at least per the latest user survey? + Github Actions)

PS. tox is always written with small t.

gaborbernat · 2019-02-17T13:14:04Z

src/pip/_internal/download.py

+    Return whether it looks like pip is running under CI.
+    """
+    return (
+        'BUILD_BUILDID' in os.environ or


seems kinda repetitive, maybe instead any(key in os.environ for key in ('A', 'B))

gaborbernat · 2019-02-17T13:16:44Z

tests/unit/test_download.py

+    user_agent = PipSession().headers["User-Agent"]
+
+    assert user_agent.startswith("pip/%s" % pip.__version__)
+    assert '"ci":true' in user_agent or '"ci":false' in user_agent


I would prefer to test exhaustively here, some kind of parametrize per key exists/missing 🤔 as per this implementation the test would pass if someone writes either return True or return False inside looks like CI 🤔 such regressions would go without much notice

cjerdonek · 2019-02-17T13:20:41Z

Well, my worry is more along the lines did we leave some out?

As I said, it can't hurt if something is left out. At worst, you won't get as much information. Also, it doesn't hurt to leave in a variable. I will add a code comment so you can see some of the CI systems that are covered.

gaborbernat · 2019-02-17T13:27:14Z

@cjerdonek I was thinking to provide a list of covered CIs more along the lines to keep the list easy to maintain, easy to see if we left out some important ones or not. 👍 Granted we can't cover all, but we should strive for as much as possible to make these numbers relevant.

dstufft · 2019-02-17T14:19:40Z

My main concern with something like this is whether "false negatives" matter or not. Right now it's very easy to say that the download stats represent all of downloads from PyPI and that we don't attempt to interpret the results, leaving that up to people who are instead actually querying the data.

I think that by explicitly stating that some downloads are known to be from CI, the implication most people will assume is that the other downloads are not from CI, but the truth is the other downloads we just don't know if they're from CI or not. I think this is made worse by the fact this PR currently implements the "other" case as specifying ci = False when in reality it's more like ci = None (or no CI at all).

It might also be time to formalize pip's user agent in a PEP and make it generic so that other projects can opt-in to providing the same information.

cjerdonek · 2019-02-17T14:19:56Z

Updated.

cjerdonek · 2019-02-17T14:25:47Z

I think that by explicitly stating that some downloads are known to be from CI, the implication most people will assume is that the other downloads are not from CI,

Isn't this just a matter of documentation, or choosing a different name for the key? Like what about "may-be-ci" or "possible-ci" or "looks-like-ci" etc? It doesn't really matter to me. What would you prefer for the name? Re: the "other" case, the false value is meant to convey that pip ran its check and that it returned false.

[Edit: Or "has-ci-env-var"]

pradyunsg · 2019-02-17T14:32:35Z

Maybe ci = "unknown".

pradyunsg · 2019-02-17T14:33:50Z

It might also be time to formalize pip's user agent in a PEP and make it generic so that other projects can opt-in to providing the same information.

+1 to this idea.

dstufft · 2019-02-17T14:36:03Z

I think the name is OK as is. I’m just raising it as a possible concern. I do think that the False value is particularly misleading though, would would prefer either an explicit None or just omitting the key all together.

cjerdonek · 2019-02-17T14:40:55Z

would would prefer either an explicit None or just omitting the key all together.

Okay, I'll change it to None, then, so it's easier to know when the check has been run.

gaborbernat

Slight change suggestions to make it easier to maintain, but otherwise looks good.

tests/unit/test_download.py

src/pip/_internal/download.py

cjerdonek · 2019-02-18T07:34:57Z

Thanks a lot for your detailed reviews and suggestions, @gaborbernat.

ssbarnea

There is a much easier way to detect automation: sys.stdout.isatty().

isatty will be true only for terminals, so any scripts/ci will not use it. Not neet to try to be too smart and guess every possible CI systems in the wild

pradyunsg · 2019-02-18T18:24:12Z

isatty will be true only for terminals, so any scripts/ci will not use it.

On Travis CI, it does return True.

Not sure what the statement after that was for -- I'll politely suggest you read the discussion in the corresponding issue for this.

cjerdonek · 2019-02-18T18:24:42Z

@ssbarnea That suggestion was already raised here: #5499 (comment)
The problem with it is that it can have false positives, which means we can’t rely on it to say anything definitive. The environment variable approach on the other hand wouldn’t, which means we can use it to get a reliable lower bound.

cjerdonek · 2019-02-18T18:31:39Z

I suppose it wouldn’t hurt to add a code comment mentioning for future maintainers why that approach wasn’t used.

cjerdonek · 2019-02-19T03:29:05Z

I added a code comment saying why we don't check for a tty.

pradyunsg

LGTM tho, I suggest waiting for another review on this.

ssbarnea · 2019-02-20T18:27:39Z

AFAIK Travis is the only CI that simulates a TTY during execution so I would find this logic as needing less maintenance: just check for travis envvar, for others it will work fine with istty().

pradyunsg · 2019-02-20T19:06:16Z

I wouldn't mind including a separate UA item for isatty(). Let's keep that separate from the CI env-var check.

cjerdonek · 2019-02-20T20:49:03Z

@ssbarnea There are many CI systems (including probably many we don’t know about), and their behavior can also change over time, so I don’t think we can say that for sure. It would be nice not to introduce a source of false results. I like @pradyunsg’s suggestion to make it another improvement to consider.

ssbarnea · 2019-02-21T12:25:02Z

I think we got lost into details here: I though that we wanted to know if pip was used in an interactive way (where user can interact with it, like answering a prompt...) or not. Any CD/CI would count as an non-interactive executions mode, but the same applies for scripted runs on individual machines if the output is processed by another tool (including a simple redirection to a file.

If I do some-cmd 2>&1 >result.log it also counts as non-interactive execution of some-cmd, same as CI and I can assure you that there is not "CI" variable present in this way. Be sure that some-cmd will return False for isatty() when run like this.

So, I would not say that using isatty() would lower the quality of differentiation but the opposite, it would make the split better.

For start I could see clear problems with current proposed change:

Does not detect Zuul CI - ZUUL_CHANGE is one of the vars defined by it
Choles if CI=false or no, or 0. I bet there are people doing this somewhere.

cjerdonek · 2019-02-22T07:01:03Z

I though that we wanted to know if pip was used in an interactive way (where user can interact with it, like answering a prompt...) or not.

My understanding of the original issue was to provide the ability to filter out (at least a good portion of) pip invocations due to automation. In my view, non-interactive executions on individual machines should not be included in this group.

In any case, regardless of what constitutes "automation," the PR that I submitted here is focused only on detecting a CI environment (which is also why the key was named ci). A key-value storing whether a tty is present could perhaps also be useful, but it is not the subject of this PR and so should be considered separately IMO as @pradyunsg and I have already suggested.

Regarding ZUUL_CHANGE and the possibility of CI=false, etc, I'm willing to make changes like those, but I'd like to hear first from those who already approved the change so I don't make unnecessary changes, or make changes that would cause them no longer to approve. Also, I would like to emphasize that this PR doesn't have to be complete with respect to detecting CI for the information to be useful. This is also mentioned in the code comments. Even if Zuul CI or other systems we're not aware of were left out, it would still let us filter out a large number of CI runs.

cjerdonek · 2019-02-24T22:20:39Z

With three approvals now from PyPA members (as well as myself), I'm going to merge this now. I'm happy to consider either or both of @ssbarnea's two suggestions (ZUUL_CHANGE and the possibility of CI=false, etc) as a follow-on change if any others think it would be useful.

Storing isatty() in the User Agent as an additional piece of info is something else that can be considered as a separate PR. (I'm not planning to create that PR myself though.)

pradyunsg · 2019-02-25T04:41:26Z

Sounds good to me @cjerdonek! :)

lock · 2019-05-28T23:11:09Z

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

cjerdonek force-pushed the issue-5499-detect-ci-for-user-agent branch from 2a5cc25 to ca4679a Compare February 16, 2019 23:42

cjerdonek mentioned this pull request Feb 16, 2019

Differentiating organic vs automated installations #5499

Closed

cjerdonek force-pushed the issue-5499-detect-ci-for-user-agent branch from ca4679a to 3580a5b Compare February 16, 2019 23:48

cjerdonek added type: enhancement Improvements to functionality C: download About fetching data from PyPI and other sources type: maintenance Related to Development and Maintenance Processes labels Feb 16, 2019

cjerdonek force-pushed the issue-5499-detect-ci-for-user-agent branch from 3580a5b to 3d69354 Compare February 17, 2019 00:00

gaborbernat reviewed Feb 17, 2019

View reviewed changes

cjerdonek force-pushed the issue-5499-detect-ci-for-user-agent branch from 3d69354 to 883421e Compare February 17, 2019 14:17

cjerdonek force-pushed the issue-5499-detect-ci-for-user-agent branch 4 times, most recently from 391af7b to 14a6aaa Compare February 17, 2019 15:15

Include in pip's User-Agent whether it looks like pip is in CI.

14a6aaa

gaborbernat approved these changes Feb 17, 2019

View reviewed changes

tests/unit/test_download.py Outdated Show resolved Hide resolved

tests/unit/test_download.py Outdated Show resolved Hide resolved

src/pip/_internal/download.py Outdated Show resolved Hide resolved

gaborbernat approved these changes Feb 18, 2019

View reviewed changes

ssbarnea suggested changes Feb 18, 2019

View reviewed changes

cjerdonek force-pushed the issue-5499-detect-ci-for-user-agent branch from cf1205d to a229f11 Compare February 19, 2019 03:28

Address review comments.

a229f11

pradyunsg approved these changes Feb 20, 2019

View reviewed changes

xavfernandez approved these changes Feb 24, 2019

View reviewed changes

cjerdonek merged commit 821247d into pypa:master Feb 24, 2019

cjerdonek deleted the issue-5499-detect-ci-for-user-agent branch February 24, 2019 22:25

cjerdonek mentioned this pull request May 23, 2019

Check for explicit PIP_IS_CI environment variable to report automated installs to Warehouse. #6522

Merged

lock bot added the auto-locked Outdated issues that have been locked by automation label May 28, 2019

lock bot locked as resolved and limited conversation to collaborators May 28, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix #5499: Include in pip's User-Agent whether it looks like pip is in CI #6273

Fix #5499: Include in pip's User-Agent whether it looks like pip is in CI #6273

cjerdonek commented Feb 16, 2019

cjerdonek commented Feb 16, 2019

gaborbernat left a comment

cjerdonek commented Feb 17, 2019

gaborbernat commented Feb 17, 2019

gaborbernat Feb 17, 2019

gaborbernat Feb 17, 2019

cjerdonek commented Feb 17, 2019

gaborbernat commented Feb 17, 2019

dstufft commented Feb 17, 2019

cjerdonek commented Feb 17, 2019

cjerdonek commented Feb 17, 2019 •

edited

Loading

pradyunsg commented Feb 17, 2019

pradyunsg commented Feb 17, 2019

dstufft commented Feb 17, 2019

cjerdonek commented Feb 17, 2019

gaborbernat left a comment

cjerdonek commented Feb 18, 2019

ssbarnea left a comment

pradyunsg commented Feb 18, 2019

cjerdonek commented Feb 18, 2019

cjerdonek commented Feb 18, 2019

cjerdonek commented Feb 19, 2019

pradyunsg left a comment

ssbarnea commented Feb 20, 2019

pradyunsg commented Feb 20, 2019

cjerdonek commented Feb 20, 2019

ssbarnea commented Feb 21, 2019

cjerdonek commented Feb 22, 2019

cjerdonek commented Feb 24, 2019

pradyunsg commented Feb 25, 2019

lock bot commented May 28, 2019

Fix #5499: Include in pip's User-Agent whether it looks like pip is in CI #6273

Fix #5499: Include in pip's User-Agent whether it looks like pip is in CI #6273

Conversation

cjerdonek commented Feb 16, 2019

cjerdonek commented Feb 16, 2019

gaborbernat left a comment

Choose a reason for hiding this comment

cjerdonek commented Feb 17, 2019

gaborbernat commented Feb 17, 2019

gaborbernat Feb 17, 2019

Choose a reason for hiding this comment

gaborbernat Feb 17, 2019

Choose a reason for hiding this comment

cjerdonek commented Feb 17, 2019

gaborbernat commented Feb 17, 2019

dstufft commented Feb 17, 2019

cjerdonek commented Feb 17, 2019

cjerdonek commented Feb 17, 2019 • edited Loading

pradyunsg commented Feb 17, 2019

pradyunsg commented Feb 17, 2019

dstufft commented Feb 17, 2019

cjerdonek commented Feb 17, 2019

gaborbernat left a comment

Choose a reason for hiding this comment

cjerdonek commented Feb 18, 2019

ssbarnea left a comment

Choose a reason for hiding this comment

pradyunsg commented Feb 18, 2019

cjerdonek commented Feb 18, 2019

cjerdonek commented Feb 18, 2019

cjerdonek commented Feb 19, 2019

pradyunsg left a comment

Choose a reason for hiding this comment

ssbarnea commented Feb 20, 2019

pradyunsg commented Feb 20, 2019

cjerdonek commented Feb 20, 2019

ssbarnea commented Feb 21, 2019

cjerdonek commented Feb 22, 2019

cjerdonek commented Feb 24, 2019

pradyunsg commented Feb 25, 2019

lock bot commented May 28, 2019

cjerdonek commented Feb 17, 2019 •

edited

Loading