-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Surprising config file resolution behavior #1069
Comments
The current behavior is: take the common ancestor of all provided files, then find the most specific Need to think on the right change here. |
IMO the best approach is to always use the most specific config for every file processed (allowing multiple configs to be used in a given run for different files). Together with config inheritance (#1055), this is most convenient from a user perspective, allowing a single run from the root to process every file with appropriate config. |
Yeah I think that's right. |
(Just not a trivial change so will take a bit of time.) |
Another quirk I've discovered is that the |
I'll try to prioritize this and get it done this week. |
If you can get config inheritance working that would be huge for us. Would go a long way towards preventing confusing and frustrating scenarios where implicit differences in config cause editor-driven and CLI or CI-driven invocations of I realize of course it's a tricky problem-- one that none of the other python linters have solved (to my knowledge). It leaves monorepos in a bad spot. |
Yeah I can totally empathize. Monorepos are amazing but tend to be so under-supported by tooling. Will try to make this happen soon! |
(I'm probably over-stating how much work this is, the code is actually pretty well set-up for it.) |
Noticed after updating to
No longer treats the |
I think you want to set this: [tool.ruff]
src = ["python_modules/dagster", "examples/docs_snippets"] Otherwise, it's as if you're trying to import |
Oh, hmm, I hadn't seen the
Instead the formatting is handled by:
For these other packages, and first party is correctly detected. I guess that seems unintuitive to me? |
Hmm, I don't think that's quite what's happening. If you're using In fact, it does look like src = ["python_modules/dagster", "python_modules/dagster-graphql", "examples/docs_snippets"] Which leads to this change: --- a/python_modules/dagster-graphql/dagster_graphql/implementation/fetch_runs.py
+++ b/python_modules/dagster-graphql/dagster_graphql/implementation/fetch_runs.py
@@ -1,7 +1,6 @@
from collections import defaultdict
from typing import TYPE_CHECKING, Dict, KeysView, List, Mapping, Sequence, cast
-from dagster_graphql.implementation.fetch_assets import get_asset_nodes_by_asset_key
from graphene import ResolveInfo
from dagster import (
@@ -16,6 +15,7 @@
from dagster._core.storage.pipeline_run import RunRecord, RunsFilter
from dagster._core.storage.tags import TagType, get_tag_type
from dagster._legacy import PipelineDefinition, PipelineRunStatus
+from dagster_graphql.implementation.fetch_assets import get_asset_nodes_by_asset_key So there are a few ways to solve this. Immediately, you could add your modules as |
I'm gonna try to write up a proposal for how this system change before I implement it, to ensure that it works well for individual projects and monorepos. |
My current plan is as follows:
A couple other relevant details:
One thing I'm unsure of is how the Thinking aloud for a sec, let's say I have the following setup:
Say, in my root [tool.ruff]
src = ["python_modules/dagster", "python_modules/dagit"] And in [tool.ruff]
extends = "../../pyproject.toml"
extend-ignore = ["E501"] The nice thing about this model is that running The confusing thing is that because we set To achieve that, we'd need to add A few ways to solve that problem:
|
@smackesey - No urgency at all but I'd like to ensure that the scoped work here leads to an improvement in Dagster's setup and workflow, so feedback is very welcome. |
Hey Charlie thanks for writing this up. In general I really like this plan as it incorporates basically everything I requested! Some specific questions/comments:
(1) Will (2) Would it be possible in this scheme to override (without the old mechanism of direct setting overwriting) an inherited
I think this is important since other tools may deal with
Before commenting, I need to clarify my own understanding of
|
Yeah. (Paths will be relative to the
Hah that's tricky. If you had this setup, you would end up ignoring
Yeah this makes sense (and it's not much effort to support).
It's probably easiest to explain it with a detailed example, and then maybe you can help me simplify that explanation. When we go to resolve an import, there are some trivial cases: if an import has a dot level ( If it doesn't fit one of those cases, we take the first part of the import -- so for We then iterate over the
If we can resolve one of these, we mark it as "first-party". Otherwise, it's "third-party". Your example above is correct, except that
|
I think this is important because you should always be able to enable a check from the CLI regardless of what is in your config files.
OK thanks for clarifying. So, I'm not sure of how
I would think the "first party" resolution logic from |
Let me do a bit of research into how |
I guess you could traverse upwards and look for |
Pretty sure every Python module has a unique
Once you have the name of the current module, you can easily tell if the module being imported from should be first-party or not (prior to applying the extra logic of So you would need to model the logic used to generate a module's |
Interesting, yeah you're right-- in that case, IMO this represents an an opportunity to improve on isort's behavior, which is not suited to monorepos. It is confusing to not see intra-package imports classified as first party by static tooling, because logically they are first-party. Let me just describe what I think is the ideal functionality, with some concrete examples from Dagster:
Regarding (1), this matters because a monorepo can easily contain, e.g. example project code (ours does) which should treat the main packages as third-party. (2) is tricky to nail 100% because a module's However, if you decide to go with a wholly explicit approach (all first party must be listed in
This would capture all of Dagster's packages. A caveat is that this overshoots a little bit-- like |
Another possibility, and I think you ultimately might need something like this anyway if you want to replicate e.g. |
Ok cool. This all makes sense. I'm leaning towards the |
I'll provide an update here when I have an estimate of when this should all ship. I'm gonna ship pieces of it incrementally. |
Another config-related item that would be useful-- environment variable interpolation, specifically for the planned
This becomes more useful the more repos you have. |
Alright, let's see how much of this I can get done today. |
Hierarchical settings (use the closest (I have to make some decisions around how Ruff behaves when it can't find a |
@smackesey - I think I'm likely to support glob patterns in |
This is coming along great and we'll merge to Dagster pretty soon. Just wondering if the |
@smackesey -- Na that hasn't gone out yet. But I did ship the |
Correct me if I'm wrong but the globbing just helps capture all of our packages under
That would mean that all of our examples and python modules would form one giant first-party blob right? So that would mean that, no matter which package I'm in, everything caught by the globs is first party. So if I were in a file What I'm looking to achieve is to treat only the current package (and possibly core |
Yeah I think what you're describing is correct (I thought you wanted all modules to use the same set of third- and first-party rules). |
So the idea was to use |
exactly |
Makes sense. I need to figure out if this is behavior that always applies, or is just the new default behavior for the |
IMO it should be on by default, but togglable with a setting. Although I can't think of why one would turn it off (doesn't mean there isn't a case!)-- it's basically the definition of "first party" to be in the current package-- and if you want to make other stuff first party too, you can always specify that explicitly. |
Sounds good. This won’t go out today, but it shouldn’t be too hard to implement. |
### Summary & Motivation Adopt [Ruff](https://github.com/charliermarsh/ruff) as our linter/import sorter. Turns on E (pycodestyle), F (pyflakes), I001 (isort), PL* (pylint), RUF (ruff-specific), T (print statement) rule categories with some selected exclusions. Benefits. Ruff: - Replaces pylint and isort. - Can autofix many detected errors, functionality that `pylint` lacks completely. - Seems 100-1000x faster than pylint (bc it's written in Rust instead of Python). This enables immediate in-editor feedback (there is a VSCode extension and you can also run in a generic language server like `null-ls`) and running on the whole repo as a pre-commit hook. `ruff ``git ls-files '*.py'``` runs on my dev machine near-instantaneously. - Implements rules not implemented in pylint (from flake8 and other Python linters). - Is very actively developed. We can probably get items on our linter wishlist seriously considered and probably implemented on a relatively short timeline. I have already had success with this with Ruff's creator, @charliermarsh. In contrast stuff sits on isort/pylint's issue tracker for years. - Allows us to run a single repo-wide lint step in BK instead of individual steps for every package (since ruff does not depend on Python env). - Has rules to lint/format docstrings, something we have no tool for at present. Current limitations. Ruff: - Does not yet support all of pylint's rules, (but it is moving towards that goal). - Does not yet support human-readable aliases for the numeric error flake8-style error codes it uses. These are much less readable than pylint codes. - Does not yet support in-file file/block/function-scoped error-specific ignores (e.g. the way you can put `# pylint: disable=print-call` etc at the top of a file to disable that rule for the whole file. Instead there are just line-scoped error-specific ignores and file-scoped total opting out of linting. You can also use the config file to ignore specific errors for specific file patterns, but this is often a lot less convenient than in-file comments. - Does not yet support custom plugins. We have a custom pylint plugin with a few rules related to `pendulum` that can't be ported. Notes: - This PR changes a ton of files but the vast majority of these are minor corrections to fix newly detected lint errors or the result of import sorting tweaks. All of the non-lint-error-fix logic is in separate commits: - Updates to dagster dependencies, tool config in pyproject.toml, dev environment setup script, Makefile. - Toxfile changes dropping pylint - Buildkite changes dropping pylint/isort and introducing ruff. - Lint fixes include: - Unused imports removed - Unused variables given `_` prefix - Convert `==/!= None` to `is/is not None` - Condensing of multiple `isinstance` clauses - Rename ambiguous variables (`l` is not allowed) - Many import sorting changes - `From X import Y as Z` imports now enforce one `Y as Z` import per line - Some subtle changes in import spacing (ruff does not _exactly_ match isort but it's pretty close) in certain cases - Stacked on top of this PR is one that turns on Ruff's docstring formatting rules, that has another large changeset. - I changed the structure of all of our export-only `__init__.py` files to match the style used in `dagster/__init__.py`, where there is no `__all__` and exports are redundantly aliased. This is important to allow ruff to auto-detect and auto-delete unused imports (redundant alias marks the import as both public for type checkers and "used" for ruff). **In the future, if you want to expose imported symbols from an `__init__.py`, they must be redundantly aliased or else ruff will auto-delete theme as unused imports.** - Because of the current lack of support for file/block/function-scoped error-specific ignores, I've left our `# pylint` comments in the code until ruff supports this and we can bulk-convert them to ruff equivalents. - There is now a single ruff step in BK instead of the different `pylint` steps for each package. This makes immediate BK output a bit less informative, but saves a lot of computation. You can just run `ruff` locally to get immediate output if the Ruff step fails. - Running ruff via your editor inside `docs_snippets` will probably return different results than running `make ruff` and `make check_ruff`. That's because `ruff` is discovering the `pyproject.toml` in docs snippets and not looking further up the tree to the root where our `ruff` config is (and which `make ruff` explicitly references). I have been talking through these issues with Ruff's creator and they will be resolved soon: see [here](astral-sh/ruff#1069 (comment)). Internal companion PR here: dagster-io/internal#4323 ### How I Tested These Changes BK.
### Summary & Motivation Adopt [Ruff](https://github.com/charliermarsh/ruff) as our linter/import sorter. Turns on E (pycodestyle), F (pyflakes), I001 (isort), PL* (pylint), RUF (ruff-specific), T (print statement) rule categories with some selected exclusions. Benefits. Ruff: - Replaces pylint and isort. - Can autofix many detected errors, functionality that `pylint` lacks completely. - Seems 100-1000x faster than pylint (bc it's written in Rust instead of Python). This enables immediate in-editor feedback (there is a VSCode extension and you can also run in a generic language server like `null-ls`) and running on the whole repo as a pre-commit hook. `ruff ``git ls-files '*.py'``` runs on my dev machine near-instantaneously. - Implements rules not implemented in pylint (from flake8 and other Python linters). - Is very actively developed. We can probably get items on our linter wishlist seriously considered and probably implemented on a relatively short timeline. I have already had success with this with Ruff's creator, @charliermarsh. In contrast stuff sits on isort/pylint's issue tracker for years. - Allows us to run a single repo-wide lint step in BK instead of individual steps for every package (since ruff does not depend on Python env). - Has rules to lint/format docstrings, something we have no tool for at present. Current limitations. Ruff: - Does not yet support all of pylint's rules, (but it is moving towards that goal). - Does not yet support human-readable aliases for the numeric error flake8-style error codes it uses. These are much less readable than pylint codes. - Does not yet support in-file file/block/function-scoped error-specific ignores (e.g. the way you can put `# pylint: disable=print-call` etc at the top of a file to disable that rule for the whole file. Instead there are just line-scoped error-specific ignores and file-scoped total opting out of linting. You can also use the config file to ignore specific errors for specific file patterns, but this is often a lot less convenient than in-file comments. - Does not yet support custom plugins. We have a custom pylint plugin with a few rules related to `pendulum` that can't be ported. Notes: - This PR changes a ton of files but the vast majority of these are minor corrections to fix newly detected lint errors or the result of import sorting tweaks. All of the non-lint-error-fix logic is in separate commits: - Updates to dagster dependencies, tool config in pyproject.toml, dev environment setup script, Makefile. - Toxfile changes dropping pylint - Buildkite changes dropping pylint/isort and introducing ruff. - Lint fixes include: - Unused imports removed - Unused variables given `_` prefix - Convert `==/!= None` to `is/is not None` - Condensing of multiple `isinstance` clauses - Rename ambiguous variables (`l` is not allowed) - Many import sorting changes - `From X import Y as Z` imports now enforce one `Y as Z` import per line - Some subtle changes in import spacing (ruff does not _exactly_ match isort but it's pretty close) in certain cases - Stacked on top of this PR is one that turns on Ruff's docstring formatting rules, that has another large changeset. - I changed the structure of all of our export-only `__init__.py` files to match the style used in `dagster/__init__.py`, where there is no `__all__` and exports are redundantly aliased. This is important to allow ruff to auto-detect and auto-delete unused imports (redundant alias marks the import as both public for type checkers and "used" for ruff). **In the future, if you want to expose imported symbols from an `__init__.py`, they must be redundantly aliased or else ruff will auto-delete theme as unused imports.** - Because of the current lack of support for file/block/function-scoped error-specific ignores, I've left our `# pylint` comments in the code until ruff supports this and we can bulk-convert them to ruff equivalents. - There is now a single ruff step in BK instead of the different `pylint` steps for each package. This makes immediate BK output a bit less informative, but saves a lot of computation. You can just run `ruff` locally to get immediate output if the Ruff step fails. - Running ruff via your editor inside `docs_snippets` will probably return different results than running `make ruff` and `make check_ruff`. That's because `ruff` is discovering the `pyproject.toml` in docs snippets and not looking further up the tree to the root where our `ruff` config is (and which `make ruff` explicitly references). I have been talking through these issues with Ruff's creator and they will be resolved soon: see [here](astral-sh/ruff#1069 (comment)). Internal companion PR here: https://github.com/dagster-io/internal/pull/4323 ### How I Tested These Changes BK.
Our repo has two pyproject.toml files with
[tool.ruff]
sections:pyproject.toml
("root")examples/docs_snippets/pyproject.toml
("docs_snippets")The logic that resolves the config file for any given invocation of
ruff
surprised me:So, for a given file in
examples/docs_snippets
, different runs ofruff
from the same cwd and with the same config files on disk will use differentpyproject.toml
files. It suprised me to have the config used for a particular file determined by whatever unrelated files also happen to be being processed instead of just (a) the cwd; and (b) location of existing config files.I don't have a strong opinion on how a config file is resolved, but IMO it should always resolve to the same one if the launch CWD and config files on disk are constant.
The text was updated successfully, but these errors were encountered: