Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Idents in HIR and remove emulation of hygiene with gensyms #51492

Merged
merged 8 commits into from
Jun 28, 2018

Conversation

petrochenkov
Copy link
Contributor

continuation of #51072, part of #49300

Not all Names in HIR are replaced with Idents, only those needed for hygiene or already having attached spans.

@petrochenkov
Copy link
Contributor Author

r? @eddyb

@rust-highfive
Copy link
Collaborator

r? @michaelwoerister

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 11, 2018
@michaelwoerister
Copy link
Member

Thanks for making progress here, @petrochenkov!

Let's do perf-run because this could potentially affect incr. comp. re-use rates.

@bors try

@bors
Copy link
Contributor

bors commented Jun 11, 2018

⌛ Trying commit 0dcd858 with merge f006dd4...

bors added a commit that referenced this pull request Jun 11, 2018
Use `Ident`s in HIR and remove emulation of hygiene with gensyms

continuation of #51072, part of #49300

Not all `Name`s in HIR are replaced with `Ident`s, only those needed for hygiene or already having attached spans.
@bors
Copy link
Contributor

bors commented Jun 11, 2018

☀️ Test successful - status-travis
State: approved= try=True

@Mark-Simulacrum
Copy link
Member

I've queued the perf build.

}
})
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

}
}

pub fn modern(&self) -> LifetimeName {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just checked and "modern" doesn't seem to be documented anywhere. Might be a good idea to point everything to Span (or even SyntaxContext)'s modern method, and have that documented.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, will document.
The name itself is terribly generic, but it does a relatively intuitive thing - normalizes an identifier "up to macros 2.0", i.e. identifiers coming from different macro_rules! macros or from the non-macro context become same, but identifiers coming from different macro macros still stay different.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, so this normalizes for "global" or "item" hygiene, as opposed to "local" hygiene?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some ideas how to make this less error-prone (so that .modern() is never forgotten accidentally) by removing PartialEq impl for Ident and introducing a couple of comparable newtypes, but this requires some prerequisite work.

@eddyb
Copy link
Member

eddyb commented Jun 11, 2018

r=me (with or without documentation for modern)

@Mark-Simulacrum
Copy link
Member

@@ -1133,7 +1133,7 @@ impl<'a, 'gcx> HashStable<StableHashingContext<'a>> for ty::CratePredicatesMap<'

impl_stable_hash_for!(struct ty::AssociatedItem {
def_id,
name,
ident,
Copy link
Contributor Author

@petrochenkov petrochenkov Jun 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@michaelwoerister
A suspicious detail: for ty::FieldDef I had to hash only ident.name instead of full ident (in #51072), otherwise I got some linking errors while building standard library crates, but for ty::AssociatedItem hashing the full ident was still okay for some reason.

@Mark-Simulacrum
Copy link
Member

Perf looks quite bad: 50-100% regressions in incremental println compile times.

@petrochenkov
Copy link
Contributor Author

petrochenkov commented Jun 13, 2018

@michaelwoerister
How to avoid the incremental perf regressions better?
Avoid stable-hashing spans where they didn't previously exist (at least for now)?

@eddyb
Copy link
Member

eddyb commented Jun 13, 2018

@Mark-Simulacrum But is it a fixed cost, that larger crates do not get hit too badly by?

@eddyb
Copy link
Member

eddyb commented Jun 13, 2018

Can we make hashing spans faster, perhaps? What could that entail?

I wonder if there's a good "concatenative hash" we could use to cache "partial hashes" more often (and never hash hashes, among other things). I'm not sure if we really need a guarantee that hash_concat(hash(a), hash(b)) == hash((a, b)) but it mainly should be cheaper to "mix in" a partial hash than having to re-hash it as if it were just some integer value.
cc @eternaleye (who recently linked something similar, but I can't find it)

EDIIT: I thought about maybe replacing stable hashing where collisions are risky with something similar to what we do for Ty, where we always compare by-value as well, but only a single level, not recursing through it, thanks to interning - so what if we interned the HIR?
Also, @edef1c pointed out on IRC that I basically reinvented Merkle trees 😆.

@Mark-Simulacrum
Copy link
Member

It doesn't appear to be fixed cost, I think - the absolute diffs are also nonconstant.

@michaelwoerister
Copy link
Member

The problem is not that we hash more spans, the problem is that more data contains spans and therefore changing spans (e.g. when adding a newline somewhere) invalidates more cache entries. (At least I think that's the problem)

@petrochenkov
Copy link
Contributor Author

@michaelwoerister
I assumed that if spans are not hashed by stable hasher then they effectively don't exist from incremental compilation's point of view and don't invalidate anything on change.

I'm not sure what is covered by incremental compilation right now, but if name resolution parts for methods/fields/lifetimes happen before incremental compilation starts working, then incremental compilation can ignore these spans.

@eternaleye
Copy link
Contributor

@eddyb: So, you skipped a few steps in your message - the operation you described isn't "reinventing Merkle trees", but it is from the same kind of solution space. Some further information:

First - what you described above is known as an "associative" hash. There are currently no constructions for associative hashes with sufficient review to be cryptographically trustworthy. Monahan et. al. have proposed an approach called HashFusion, but it's reasonably likely that its additional structure (left/right inverses and identity hold for the combiner, not merely associativity) renders it insecure.

The reason to want an associative hash is because you can hash the smallest units, and then merge them in parallel from any set of points in the sequences. This allows for computing hashes of components, and recomputing only the changed components, regardless of the size of components. (For those who've studied ahead, this basically allows the root hash of a Merkle tree to be independent of chunk size.)

The closest anything currently gets to this is Merkle trees. However, Merkle trees have to decide a chunk size up front, necessitating padding for small objects and splitting for large ones. In addition, there are a variety of ways to screw up with Merkle trees, and become vulnerable to attacks that cause distinct sequences of input to produce the same Merkle tree - Bitcoin, among others, has been bitten by this in the past.

In addition, none of these approaches would actually fix the issue here - the reason is that a change that inserted or removed bytes would offset the underlying blocks, resulting in every hash afterwards changing. That's where rolling hashes are used currently, but they are not cryptographically strong.

If you were to try and make something for the actual problem being encountered, you'd need to do something like find the change boundaries using a rolling hash, and check a full cryptographic hash (or direct equality) for any ranges it claims are equal. However, this too has serious drawbacks:

  1. The rolling hash stage has a serious precision/memory tradeoff, in that you need to store one rolling hash per point in the original file you want to be able to count as a boundary.
  2. The boundaries it finds are unpredictable, so you will have to re-read the original file to perform the safety check, which requires keeping the original file around somewhere.

@eddyb
Copy link
Member

eddyb commented Jun 14, 2018

@eternaleye note that in the "EDIT" of #51492 (comment), I mention interning - which doesn't require collision resistance, just rare-enough collisions (for performance), as the keys are always compared, even if the hash matches. The main benefit comes from doing this at the "node level" instead of on entire trees, so the keys you always compare, have a fixed size (for each type of node).

@petrochenkov
Copy link
Contributor Author

@bors try

@bors
Copy link
Contributor

bors commented Jun 16, 2018

⌛ Trying commit e1158a897cb796fa4245e8b342237d4d8482cf24 with merge a9f9140f04b5741ff0bdf7090086c1ad567935c2...

@bors
Copy link
Contributor

bors commented Jun 16, 2018

☀️ Test successful - status-travis
State: approved= try=True

@rust-highfive
Copy link
Collaborator

The job x86_64-gnu-tools of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
  0     0    0     0    0     0      0      0 --:--:--  0:00:51 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:52 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:53 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:54 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:55 --:--:--     0curl: (6) Could not resolve host: s3-us-west-1.amazonaws.com
[00:59:28] thread 'main' panicked at 'failed to download openssl source: exit code: 6', bootstrap/native.rs:589:17
[00:59:28] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test --no-fail-fast src/doc/book src/doc/nomicon src/doc/reference src/doc/rust-by-example src/tools/rls src/tools/rustfmt src/tools/miri src/tools/clippy
[00:59:28] Build completed unsuccessfully in 0:56:41
[00:59:28] {"book":"test-pass","rust-by-example":"test-pass","nomicon":"test-pass","rls":"test-pass","clippy-driver":"test-pass","reference":"test-pass","miri":"test-pass","rustfmt":"test-pass"}
[00:59:28] Verifying status of book...
---
travis_time:end:007d9fc1:start=1530169797157452703,finish=1530169797164559194,duration=7106491
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:10902f05
$ head -30 ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
head: cannot open ‘./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers’ for reading: No such file or directory
travis_fold:end:after_failure.4
travis_fold:start:after_failure.5
travis_time:start:17750071
$ dmesg | grep -i kill

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

1 similar comment
@rust-highfive
Copy link
Collaborator

The job x86_64-gnu-tools of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
  0     0    0     0    0     0      0      0 --:--:--  0:00:51 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:52 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:53 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:54 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:55 --:--:--     0curl: (6) Could not resolve host: s3-us-west-1.amazonaws.com
[00:59:28] thread 'main' panicked at 'failed to download openssl source: exit code: 6', bootstrap/native.rs:589:17
[00:59:28] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test --no-fail-fast src/doc/book src/doc/nomicon src/doc/reference src/doc/rust-by-example src/tools/rls src/tools/rustfmt src/tools/miri src/tools/clippy
[00:59:28] Build completed unsuccessfully in 0:56:41
[00:59:28] {"book":"test-pass","rust-by-example":"test-pass","nomicon":"test-pass","rls":"test-pass","clippy-driver":"test-pass","reference":"test-pass","miri":"test-pass","rustfmt":"test-pass"}
[00:59:28] Verifying status of book...
---
travis_time:end:007d9fc1:start=1530169797157452703,finish=1530169797164559194,duration=7106491
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:10902f05
$ head -30 ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
head: cannot open ‘./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers’ for reading: No such file or directory
travis_fold:end:after_failure.4
travis_fold:start:after_failure.5
travis_time:start:17750071
$ dmesg | grep -i kill

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@petrochenkov
Copy link
Contributor Author

@bors r=eddyb

@bors
Copy link
Contributor

bors commented Jun 28, 2018

📌 Commit d7072b5 has been approved by eddyb

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 28, 2018
@bors
Copy link
Contributor

bors commented Jun 28, 2018

⌛ Testing commit d7072b5 with merge d84ad59...

bors added a commit that referenced this pull request Jun 28, 2018
Use `Ident`s in HIR and remove emulation of hygiene with gensyms

continuation of #51072, part of #49300

Not all `Name`s in HIR are replaced with `Ident`s, only those needed for hygiene or already having attached spans.
@bors
Copy link
Contributor

bors commented Jun 28, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: eddyb
Pushing d84ad59 to master...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants