Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[red-knot] Remove Type::None #14024

Merged
merged 4 commits into from
Nov 4, 2024
Merged

[red-knot] Remove Type::None #14024

merged 4 commits into from
Nov 4, 2024

Conversation

sharkdp
Copy link
Contributor

@sharkdp sharkdp commented Oct 31, 2024

Summary

Removes Type::None in favor of KnownClass::NoneType.to_instance(…).

closes #13670

Test Plan

Existing tests pass.

@sharkdp sharkdp added the red-knot Multi-file analysis & type inference label Oct 31, 2024
@sharkdp sharkdp force-pushed the david/remove-type-none branch from 917ddd1 to 0534f31 Compare October 31, 2024 20:15
Copy link

codspeed-hq bot commented Oct 31, 2024

CodSpeed Performance Report

Merging #14024 will degrade performances by 4.32%

Comparing david/remove-type-none (c7d4bdb) with main (e302c2d)

Summary

❌ 1 (👁 1) regressions
✅ 31 untouched benchmarks

Benchmarks breakdown

Benchmark main david/remove-type-none Change
👁 red_knot_check_file[incremental] 4.4 ms 4.6 ms -4.32%

Copy link
Contributor

github-actions bot commented Oct 31, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Comment on lines 526 to 530
(Type::Instance(self_class), Type::Instance(target_class))
if self_class.is_known(db, KnownClass::NoneType) =>
{
target_class.is_known(db, KnownClass::NoneType)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to remove this branch too once we understand that NoneType is a subclass of object (my MRO PR is coming soon!!)

Copy link
Contributor Author

@sharkdp sharkdp Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this is about NoneType <: NoneType. The X <: object case is handled further above.

I'm actually surprised that NoneType <: NoneType doesn't work without this. We have a other == self check in is_subclass_of on the ClassTypes. Is that not working correctly? Is it actually checking for equal classes or for some other identity. Like just comparing salsa IDs?

my MRO PR is coming soon!!

That will definitely help us remove some cases!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self == other is indeed just checking salsa IDs. But Salsa IDs are assigned by Salsa interning, which interns based on struct equality, so that should be the same thing as an Eq check.

It would be good to understand what's happening here. Let me know if you'd like another pair of eyes on it. Maybe it's related to the fact that there are multiple NoneType in typeshed? (The one in _typeshed and the one in types for more recent Python versions.)

Also, I think this branch should go into is_equivalent_to instead of is_subtype_of.

@sharkdp sharkdp force-pushed the david/remove-type-none branch from 0ff025a to 1735975 Compare October 31, 2024 20:58
Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

I wonder how much of the perf regression is due to the fact that currently I think every None ends up being a union of the None defined in _typeshed and the one defined intypes? This could mean we spend a lot more time in recursive union handling for things like is_subtype_of? If so, this should be largely mitigated by sys.version_info checking.

Comment on lines 526 to 530
(Type::Instance(self_class), Type::Instance(target_class))
if self_class.is_known(db, KnownClass::NoneType) =>
{
target_class.is_known(db, KnownClass::NoneType)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self == other is indeed just checking salsa IDs. But Salsa IDs are assigned by Salsa interning, which interns based on struct equality, so that should be the same thing as an Eq check.

It would be good to understand what's happening here. Let me know if you'd like another pair of eyes on it. Maybe it's related to the fact that there are multiple NoneType in typeshed? (The one in _typeshed and the one in types for more recent Python versions.)

Also, I think this branch should go into is_equivalent_to instead of is_subtype_of.

@MichaReiser
Copy link
Member

MichaReiser commented Nov 1, 2024

I took a look at the perf regression

What I spotted is that this branch now spends 5.6% of total time inside infer_function_definition_statement where we only spent 1.6% before.

Expanding the tree shows that the difference comes from infer_decorators that isn't present in the profile for main. I suspect that it comes from

@lru_cache(maxsize=None)
def cached_tz(hour_str: str, minute_str: str, sign_str: str) -> timezone:
    sign = 1 if sign_str == "+" else -1
    return timezone(
        timedelta(
            hours=sign * int(hour_str),
            minutes=sign * int(minute_str),
        )
    )

Inside infer_decorators there's now a call to Type::none which starts semantic indexing and what not of `lru_cache.

I tried to debug if we now infer a more accurate type for lru_cache but both main and this PR infer Todo for

from functools import lru_cache
from typing import reveal_type

reveal_type(lru_cache(maxsize=None))

The main thing that's puzzling me.... Why is the lru_cache type not cached? Shouldn't this bail out early instead of calling into semantic_index?

image

Ohh... never mind. I looked at the warmup phase instead of the incremental path.

This is for the incremental part

image

The finding is the same... and, for some reason, KnownClass::none is re-executing semantic indexing of the functools module which it should not... Not sure why. We should look into this

incremental check_types this PR

incremental check_types main

@MichaReiser
Copy link
Member

Looking at the salsa code. What this suggests is that we don't reuse the same salsa ids and, because of that have cache misses...

@MichaReiser
Copy link
Member

I'm not sure what's happening and/or if this is a bug in the benchmark. I copied the tomllib and manually ran the red knot CLI in watch mode and the salsa events indicate that it uses the memoized values except for _re.py that changed.

@MichaReiser
Copy link
Member

Okay, I might still have looked at the wrong profiles. I now created a PR to use a named closure for the incremental and warmup vs to avoid this in the future.

I haven't looked at it too closely because salsa's maybe changed after profiles are a bit of a pain but I strongly suspect that the difference mainly comes from that salsa now has to validate the ingredients for the module defining None? I suspect that addressing #13169 could help here as well.

@carljm
Copy link
Contributor

carljm commented Nov 1, 2024

the difference mainly comes from that salsa now has to validate the ingredients for the module defining None

Is this a speculation or is there evidence pointing to it? I couldn't really follow which comments here are based on looking at the wrong part of the profile :)

This could be, though. I was thinking we would already have a dependence on types module (because builtins imports it), but it's possible tomllib didn't actually resolve the type of any builtin depending on types module, so our lazy per-definition inference might have meant we previously didn't have to semantic-index types module.

If this is the cause of the regression, it's mostly just a weakness in our benchmark. Real-world codebases are likely to already rely on the types module in some way; adding a core typechecking dependency on it should not really be considered a real-world regression, if the regression just comes from "now we have to index that module."

I still think it's worth seeing whether Salsa-caching the resolution of the type of None makes a difference here. If the regression is entirely due to having all the new ingredients from the types module, then it won't.

@sharkdp sharkdp force-pushed the david/remove-type-none branch from f5f010f to c7d4bdb Compare November 4, 2024 11:04
@sharkdp
Copy link
Contributor Author

sharkdp commented Nov 4, 2024

Performance investigation

  • The original version of this PR had a -4% performance regression on the "incremental" as well as the "cold" benchmark (results). The "cold" benchmark was probably just slightly below the warning/error threshold.
  • Consequently, a rebase to main (after the merge of Give non-existent files a durability of at least Medium #14034 by @MichaReiser) did not help with improving performance. Both versions are still at -4% (results)
  • I then tried the suggestion by @carljm and added explicit salsa caching for the lookup of NoneType in 503aa46. The CI run now passes, but the performance regression is still there with -3% for both versions (results). Not sure if the resolution of these benchmarks is really sub-1%? If so, this is a small improvement.
  • I checked which modules were resolved when running red-knot on tomllib; On main, we do not load _typeshed; on this branch, we do (due to the explicit lookup of NoneType in typeshed). This alone could probably explain the increase in runtime.

This also explains why I didn't see a performance regression when comparing this branch with main on a larger codebase (black). Because we do load _typeshed even on main (for some reason) when running on black. Results:

Command Mean [ms] Min [ms] Max [ms] Relative
./red_knot_main --current-directory /home/shark/black 113.7 ± 3.4 105.5 120.0 1.00 ± 0.05
./red_knot_remove_none --current-directory /home/shark/black 113.2 ± 4.3 107.3 121.1 1.00

After talking to @MichaReiser, I removed the explicit caching of the NoneType lookup again.

@sharkdp sharkdp force-pushed the david/remove-type-none branch from 503aa46 to c7d4bdb Compare November 4, 2024 12:02
Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've only skimmed, but this all LGTM, and I agree that a small perf "regression" here is fine given your analysis (and given the significant improvements to maintainability this gives us)

@sharkdp sharkdp merged commit 88d9bb1 into main Nov 4, 2024
39 checks passed
@sharkdp sharkdp deleted the david/remove-type-none branch November 4, 2024 13:00
carljm added a commit to Glyphack/ruff that referenced this pull request Nov 5, 2024
* main: (39 commits)
  Also remove trailing comma while fixing C409 and C419 (astral-sh#14097)
  Re-enable clippy `useless-format` (astral-sh#14095)
  Derive message formats macro support to string (astral-sh#14093)
  Avoid cloning `Name` when looking up function and class types (astral-sh#14092)
  Replace `format!` without parameters with `.to_string()` (astral-sh#14090)
  [red-knot] Do not panic when encountering string annotations (astral-sh#14091)
  [red-knot] Add MRO resolution for classes (astral-sh#14027)
  [red-knot] Remove `Type::None` (astral-sh#14024)
  Cached inference of all definitions in an unpacking (astral-sh#13979)
  Update dependency uuid to v11 (astral-sh#14084)
  Update Rust crate notify to v7 (astral-sh#14083)
  Update cloudflare/wrangler-action action to v3.11.0 (astral-sh#14080)
  Update dependency mdformat-mkdocs to v3.1.1 (astral-sh#14081)
  Update pre-commit dependencies (astral-sh#14082)
  Update dependency ruff to v0.7.2 (astral-sh#14077)
  Update NPM Development dependencies (astral-sh#14078)
  Update Rust crate thiserror to v1.0.67 (astral-sh#14076)
  Update Rust crate syn to v2.0.87 (astral-sh#14075)
  Update Rust crate serde to v1.0.214 (astral-sh#14074)
  Update Rust crate pep440_rs to v0.7.2 (astral-sh#14073)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
red-knot Multi-file analysis & type inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[red-knot] remove Type::None in favor of always using Type::Instance(_typeshed_.NoneType)
4 participants