Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add --exclude #9992

Merged
merged 22 commits into from
Feb 10, 2021
Merged

Add --exclude #9992

merged 22 commits into from
Feb 10, 2021

Conversation

hauntsaninja
Copy link
Collaborator

Resolves #4675, resolves #9981

Additionally, we always ignore site-packages and node_modules.
Also note that this doesn't really affect import discovery; it only
directly affects passing files or packages to mypy.

The additional check before suggesting "are you missing an
init.py" didn't make any sense to me, so I removed it, appended to
the message and downgraded the severity to note.

Resolves python#4675, resolves python#9981

Additionally, we always ignore site-packages and node_modules.
Also note that this doesn't really affect import discovery; it only
directly affects passing files or packages to mypy.

The additional check before suggesting "are you missing an
__init__.py" didn't make any sense to me, so I removed it, appended to
the message and downgraded the severity to note.
@hauntsaninja hauntsaninja requested a review from JukkaL January 29, 2021 08:16
@github-actions

This comment has been minimized.

@hauntsaninja hauntsaninja marked this pull request as draft January 29, 2021 19:26
hauntsaninja added 2 commits January 29, 2021 15:49
This shouldn't actually come up in practice, since we use absolute paths
pretty aggressively.
@python python deleted a comment from github-actions bot Jan 30, 2021
@python python deleted a comment from github-actions bot Jan 30, 2021
@python python deleted a comment from github-actions bot Jan 30, 2021
@python python deleted a comment from github-actions bot Jan 30, 2021
@python python deleted a comment from github-actions bot Jan 30, 2021
@python python deleted a comment from github-actions bot Jan 30, 2021
@hauntsaninja hauntsaninja marked this pull request as ready for review January 30, 2021 00:47
@python python deleted a comment from github-actions bot Jan 30, 2021
("a2.b.f", "/pkg"),
]

options.ignore_path = ["b/c"]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a test case I'm not sure about.
We could complicate the behaviour, so that paths with slashes in the beginning or middle are treated as fixed paths to ignore, as opposed to patterns. That is, if there's a slash at the beginning it's just an absolute path and we ignore it. If there's a slash in the middle, then we treat it as a fixed path relative to the current directory and ignore it.
As opposed to right now, where we just treat "b/c" as a suffix and use it to ignore "CWD/x/y/b/c" as well as "CWD/b/c"

@intgr
Copy link
Contributor

intgr commented Jan 30, 2021

There's a bikeshedding opportunity here for (re)naming --ignore-path :)

Arbitrary survey of tools that I commonly use:

  • black uses --exclude
  • flake8 uses --exclude (flake8 --ignore refers to ignoring error codes, not paths)
  • pytest uses --ignore, --ignore-glob
  • coverage.py uses --omit
  • Some Unix tools (tar, rsync) use --exclude. But recursive ls uses --ignore

I find that he word "ignore" could also refer to scanning files and ignoring errors therein. Like the existing --ignore-missing-imports or ignore_errors= in mypy.ini

I would slightly prefer --exclude or --exclude-path, as there is less potential for this confusion.

@hauntsaninja
Copy link
Collaborator Author

hauntsaninja commented Jan 30, 2021

Sure, --exclude makes sense to me. The path in --ignore-path could be somewhat misleading, since maybe people could interpret that as a fixed path. I just went ahead with it as a slightly more general rename of the --ignore-dir that @JukkaL proposed in the issue. I'll change it to whatever Jukka decides.

I'll also call out that many other tools support regexes or globs. We can do this too, but it's not critical and I figure it's something we could always choose to add support for later.

@hauntsaninja
Copy link
Collaborator Author

hauntsaninja commented Jan 30, 2021

Oh also I guess I never fully replied to #9981 (comment).
I'm also happy to provide a PR that reverts some of these changes... Although if we were to revert I feel fairly strongly that we should continue to recursively explore with --namespace-packages (despite that being ad hoc and not a complete workaround), since the behaviour is really quite broken without it and it'd be hard to communicate to users what's expected to work and what's not on which version.

@JukkaL
Copy link
Collaborator

JukkaL commented Feb 1, 2021

I like --exclude. I also think that regular expressions would be good to support, since adding support later can break existing exclude patterns, especially on Windows, since \ is special in regular expressions. Regular expressions would probably also be faster to match, and they should not be hard to implement.

I actually like how black does this, and I'd prefer to have something fairly similar. Here are some thoughts:

  • If we'd use regexps, then only a single value for --exclude would ever be needed, as | would work. This would simplify things a little.
  • --exclude could override the default exclusions instead of building on top of them. This could be useful in case our defaults don't work for some people. This would also speed up matching, since there would always be exactly one pattern.
  • By using re.compile beforehand, matching should be pretty fast.
  • If we'd always use / for path separators (even on Windows), the same exclude patterns would work across platforms. This is also nice on Windows, since otherwise \ would have to be escaped, which is awkward in a regexp.

For example, the default pattern could be something like /(site-packages|node_modules|\.[^/]*)/. This works assuming directory paths always include a / suffix. We'd also need to make sure there's also / as a prefix.

Other things that could be useful to ignore by default (and black ignores these):

  • build (setup.py can create this)
  • dist (setup.py can create this)

@JukkaL
Copy link
Collaborator

JukkaL commented Feb 1, 2021

Also, when using verbose logging (-v), it would be good to log all excluded paths.

@intgr
Copy link
Contributor

intgr commented Feb 1, 2021

regular expressions would be good to support, since adding support later can break existing exclude patterns

Alternatively regex excludes could be a separate option (--exclude, --exclude-regex). But I have no preference on this matter.

@marctorsoc
Copy link

I like --exclude. I also think that regular expressions would be good to support, since adding support later can break existing exclude patterns, especially on Windows, since \ is special in regular expressions. Regular expressions would probably also be faster to match, and they should not be hard to implement.

I actually like how black does this, and I'd prefer to have something fairly similar. Here are some thoughts:

  • If we'd use regexps, then only a single value for --exclude would ever be needed, as | would work. This would simplify things a little.
  • --exclude could override the default exclusions instead of building on top of them. This could be useful in case our defaults don't work for some people. This would also speed up matching, since there would always be exactly one pattern.
  • By using re.compile beforehand, matching should be pretty fast.
  • If we'd always use / for path separators (even on Windows), the same exclude patterns would work across platforms. This is also nice on Windows, since otherwise \ would have to be escaped, which is awkward in a regexp.

For example, the default pattern could be something like /(site-packages|node_modules|\.[^/]*)/. This works assuming directory paths always include a / suffix. We'd also need to make sure there's also / as a prefix.

Other things that could be useful to ignore by default (and black ignores these):

  • build (setup.py can create this)
  • dist (setup.py can create this)

I think there are two paths here:

  • do the simple case now (pass a list of directories to ignore + exclude the typical directories most people want to ignore) and work on a better version with regex, etc in a followup PR
  • or do the one with all features within this PR

I would advocate for the former, as even though it would create some dirtier configs, it allows delivering something faster. Regex can be added as a separate option, as @intgr suggests. WDYT?

@JukkaL
Copy link
Collaborator

JukkaL commented Feb 1, 2021

I'd rather avoid the need for two options. We already have a large number of command-line options, and making sure all combinations of options work well is nontrivial.

I think that it's also fine to turn off the new behavior (at least when not using --namespace-packages) in the next release and have the full-featured implementation supporting more complex patterns in the release after that. We can possibly also ignore some hard-coded paths in the first version but not make it configurable.

@python python deleted a comment from github-actions bot Feb 3, 2021
@python python deleted a comment from github-actions bot Feb 3, 2021
@python python deleted a comment from github-actions bot Feb 3, 2021
@python python deleted a comment from github-actions bot Feb 3, 2021
@python python deleted a comment from github-actions bot Feb 3, 2021
@JelleZijlstra
Copy link
Member

I'm a bit late to the party, but for reference Black also got a feature request to support extending the default exclusions (psf/black#1571). I agree that for mypy it makes sense to default to extending the default excludes.

@python python deleted a comment from github-actions bot Feb 3, 2021
@python python deleted a comment from github-actions bot Feb 3, 2021
Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates! Looks good, just a few small nits. I'll probably do the next mypy release next week (Mon, Tue or Wed), if everything goes as planned. If you don't have time to update the PR before that, I can do the minor tweaks myself.

docs/source/command_line.rst Outdated Show resolved Hide resolved
mypy/test/test_find_sources.py Outdated Show resolved Hide resolved
mypy/test/test_find_sources.py Outdated Show resolved Hide resolved
@hauntsaninja
Copy link
Collaborator Author

Changes made!

@github-actions
Copy link
Contributor

github-actions bot commented Feb 6, 2021

Diff from mypy_primer, showing the effect of this PR on open source code:

xarray (https://github.com/pydata/xarray.git)
- conftest.py: error: Are you missing an __init__.py?
+ conftest.py: note: Are you missing an __init__.py? Alternatively, consider using --exclude to avoid checking one of them.

bandersnatch (https://github.com/pypa/bandersnatch.git)
+ src/bandersnatch/tests/plugins/test_allowlist_name.py: note: Are you missing an __init__.py? Alternatively, consider using --exclude to avoid checking one of them.

mitmproxy (https://github.com/mitmproxy/mitmproxy.git)
- examples/addons/anatomy.py: error: Are you missing an __init__.py?
+ examples/addons/anatomy.py: note: Are you missing an __init__.py? Alternatively, consider using --exclude to avoid checking one of them.

optuna (https://github.com/optuna/optuna.git)
+ examples/pruning/simple.py: note: Are you missing an __init__.py? Alternatively, consider using --exclude to avoid checking one of them.


ibis (https://github.com/ibis-project/ibis.git)
+ ibis/backends/hdf5/tests/conftest.py: note: Are you missing an __init__.py? Alternatively, consider using --exclude to avoid checking one of them.

@MatthiasLohr
Copy link

Any idea when this get merged and released? mypy checks are broken for me since 0.800 and I need this option to fix this.

Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates, looks great!

@JukkaL JukkaL merged commit 6bfc2db into python:master Feb 10, 2021
@JukkaL
Copy link
Collaborator

JukkaL commented Feb 10, 2021

@MatthiasLohr I'm going to start to prepare the release now. It should be out today/tomorrow, assuming I encounter no unexpected issues.

@JukkaL JukkaL mentioned this pull request Feb 10, 2021
JukkaL pushed a commit that referenced this pull request Feb 10, 2021
Resolves #4675, resolves #9981.

Additionally, we always ignore site-packages and node_modules,
and directories starting with a dot. Also note that this doesn't really 
affect import discovery; it only directly affects passing files or 
packages to mypy.

The additional check before suggesting "are you missing an
__init__.py" didn't make any sense to me, so I removed it, appended to
the message and downgraded the severity to note.

Co-authored-by: hauntsaninja <>
@MatthiasLohr
Copy link

Awesome, thanks!

@hauntsaninja hauntsaninja deleted the ignore branch February 10, 2021 17:59
@intgr
Copy link
Contributor

intgr commented Mar 10, 2021

I wanted to say (better late than never): thank you so much for implementing this capability on such short notice!

Not only does this solve #9981, it almost halves the time taken for a full mypy run in our project, since we can exclude machine-generated files like test snapshots.

@rgommers
Copy link

rgommers commented Apr 9, 2021

The way this option is implemented as a regex makes use of it in mypy.ini quite weird - a single regex is inconsistent with how files works, and doesn't look scalable on a large code base. Here's what I just had to do in a PyTorch PR:

files =
    torch,
    caffe2,
    test/test_bundled_images.py,
    test/test_bundled_inputs.py,
    test/test_complex.py,
    test/test_datapipe.py,
    test/test_expecttest.py,
    test/test_futures.py,
    test/test_numpy_interop.py,
    test/test_torch.py,
    test/test_type_hints.py,
    test/test_type_info.py,
    test/test_utils.py,
    tools/clang_format_utils.py,
    tools/clang_tidy.py,
    tools/generate_torch_version.py,
    tools/stats_utils/*.py

#
# `exclude` is a regex, not a list of paths like `files` (sigh)
#
exclude = torch/include/|torch/csrc/|torch/distributed/elastic/agent/server/api.py

Using exclude as a list of paths doesn't work.

Is this intentional, and if not should I open a new issue?

@hauntsaninja
Copy link
Collaborator Author

The initial implementation didn't use regexes, but you can see Jukka's reasoning for regexes here: #9992 (comment) (tldr; matches how other tools do it, people are going to want it and it'd be a breaking change later)
Note a list of exact paths would be annoying here, e.g. with exact paths you wouldn't be able to do things like exclude all files named "setup.py" no matter where in a large directory tree they might be, so we weren't ever going for exact "files"-like semantics.

I'd open an issue for this. I think a good path forward here for making long regex's readable is allowing a list of regexes / the flag to be specified multiple times, and just chaining them together with "or"s.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
7 participants