Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ast: Derive Hash and PartialEq on AST node types #5283

Closed
overlookmotel opened this issue Jul 16, 2024 · 19 comments · Fixed by #5451
Closed

ast: Derive Hash and PartialEq on AST node types #5283

overlookmotel opened this issue Jul 16, 2024 · 19 comments · Fixed by #5451
Assignees

Comments

@overlookmotel
Copy link
Contributor

overlookmotel commented Jul 16, 2024

Continuation from #3819

We should derive Hash and PartialEq on all AST node types.

Why?

The reasons we need to codegen them are (copied from #3819 (comment)):

Hash

We want Hash to skip span fields, and also symbol_id etc. Skipping Hash is currently achieved by a dummy Hash impl on Span type:

impl Hash for Span {
fn hash<H: Hasher>(&self, _state: &mut H) {
// hash to nothing so all ast spans can be comparable with hash
}
}

I saw this as a bit of a hack, and thought it would be more "proper" solution to implement Hash on all AST types with the span field explicitly skipped. That would be laborious and error-prone to do by hand, but much easier with codegen.

PartialEq

The reason why we are implementing Hash on AST types is for (if I remember right) comparing expressions in a lint rule to identify pointless if branches e.g. if (x) {} else if (x) { /* unreachable */ }. But actually Hash is not ideal for this use - PartialEq would be more efficient.

But it's hard to implement PartialEq because, again, we need to skip span fields, and the no-op hash hack isn't viable here, so we can't use #[derive(PartialEq)].

So I thought we could do that with codegen.

Use different traits?

Instead of implementing Hash and PartialEq, should we be more explicit about what these trait implementations do, and call them HashContent and PartialEqContent?

PartialEq could then also be implemented on AST nodes for when you want to == check if 2 AST nodes are the same including their spans. And when you compare 2 nodes without spans, you have to do node1.eq_content(&node2) so it's explicit that spans aren't included in the comparison.

I'm not sure if this is a good idea or not.

Other useful info

rzvxa provided some useful background on how Rust's internal derive macros work: #3819 (comment)

@Boshen
Copy link
Member

Boshen commented Aug 28, 2024

This stupid code is affecting Rolldown to use Span in hashmaps:

impl Hash for Span {
    fn hash<H: Hasher>(&self, _state: &mut H) {
        // hash to nothing so all ast spans can be comparable with hash
    }
}

Moving to the main repo as this is a real issue.

@Boshen Boshen transferred this issue from oxc-project/backlog Aug 28, 2024
@Boshen Boshen changed the title Derive Hash and PartialEq on AST node types ast: Derive Hash and PartialEq on AST node types Aug 28, 2024
@Boshen
Copy link
Member

Boshen commented Aug 29, 2024

Once this is done, we can replace this hash comparison of expressions to use PartialEq instead.

pub fn calculate_hash<T: Hash>(t: &T) -> u64 {
let mut hasher = FxHasher::default();
t.hash(&mut hasher);
hasher.finish()
}

@overlookmotel
Copy link
Contributor Author

overlookmotel commented Aug 29, 2024

@Boshen What do you think about my thought above under "Use different traits?" section? I think it's confusing to implement Hash and PartialEq on types and for them to skip hashing/comparing some of the fields - that's not what those traits usually do.

I suggest:

  1. Create new trait PartialEqContent which is like PartialEq but skips Span, ScopeId, SymbolId and ReferenceId.
  2. Implement PartialEqContent on all AST types.
  3. Replace calculate_hash with code that uses PartialEqContent to compare.
  4. Remove the #[derive(Hash)] and custom Hash imps from all AST types (probably we don't need Hash impls any more, if calculate_hash is only current user of it).
  5. #[derive(Hash)] normally on Span.

If we want to, later on we could also implement HashContent, but right now it's not needed, so we can skip it for now.

@Boshen
Copy link
Member

Boshen commented Aug 29, 2024

👍 for removing confusing Hash.

PartialEq means partial equal right? But either choice is fine.

@overlookmotel
Copy link
Contributor Author

I find "partial" confusing, but it seems to mean a different kind of "partial" from "these 2 values are equal considering only part of them".

From PartialEq docs:

This trait allows for comparisons using the equality operator, for types that do not have a full equivalence relation. For example, in floating point numbers NaN != NaN, so floating point types implement PartialEq but not Eq. Formally speaking, when Rhs == Self, this trait corresponds to a partial equivalence relation.

I take this as meaning that PartialEq to all practical purposes means "equal", except from some weird edge cases.

@rzvxa
Copy link
Contributor

rzvxa commented Sep 3, 2024

I gave the definition of Partial equivalence relation a read, It states

In mathematics, a partial equivalence relation (often abbreviated as PER, in older literature also called restricted equivalence relation[1]) is a homogeneous binary relation that is symmetric and transitive. If the relation is also reflexive, then the relation is an equivalence relation.

Given that our equivalence relation between AST nodes is both symmetric and transitive I wonder maybe PartialEq is the right trait to describe it after all. Considering that 2 AST nodes are partially equal with respect to these 2 relations we are not that far off the mark with the actual definition.

My math is pretty rusty but here's what I understand.

Here are the properties of a "PER"

A. R is an equivalence relation on the subset Y = { x X x R x } X .
B. difunctional: the relation is the set { ( a , b ) f a = g b } for two partial functions f , g : X Y and some indicator set Y
C. right and left Euclidean: For a , b , c X , a R b and a R c implies b R c and similarly for left Euclideanness b R a and c R a imply b R c
D. quasi-reflexive: If x , y X and x R y , then x R x and y R y .

Here's my reasoning These aren't proof or anything.

A. For a set of AST nodes there is always a Y that is subset and equal to the original AST nodes set and every element has this partial relationship to itself.
B. If we have 2 functions f and g and they result in the same output it doesn't matter which one we use for mapping a collection to another
C. If we have 3 AST nodes that are elements of X and a PartialEq b, a PartialEq c then b PartialEq c,b PartialEq a and b PartialEq c
D. If x PartialEq y then x PartialEq x and y PartialEq y

With that in mind, I feel we have both requirements of being a Partial Equivalence Relation which are:

  1. if a R b , then b R a (symmetry)
  2. if a R b and b R c , then a R c (transitivity)

You may argue that in most of our use cases, we are reflexive as well but that might be stretching the definitions a bit too far.

@Boshen
Copy link
Member

Boshen commented Sep 3, 2024

😵

My take on this is what IdentifierA == IdentifierB means to the user, is it intuitive enough to think they are comparing values only.

@rzvxa
Copy link
Contributor

rzvxa commented Sep 3, 2024

Give it some thought and let me know which direction you want this to go. Implementing it using our new oxc_ast_tools::Derive trait should be easy enough either way. The OP was written before generate_derive which now resolves this concern with #[generate_derive(PartialEq)].

But it's hard to implement PartialEq because, again, we need to skip span fields, and the no-op hash hack isn't viable here, so we can't use #[derive(PartialEq)].

If we want to go with PartialEq we can do it like this as well:

/// 2 AST nodes are partially equal even when they don't share the same code span.
trait AstEq<T>: PartialEq<T> { }

This would work as a self-documenting super trait so we can describe our custom derive behavior with explicit docs comments and a custom name so we can use it in the nominal contexts if we ever want to not use the PartialEq trait in our generic parameters.

@Boshen
Copy link
Member

Boshen commented Sep 3, 2024

Shall we go with PartialEqContent and HashContent to remove confusion and ambiguity?

@rzvxa
Copy link
Contributor

rzvxa commented Sep 3, 2024

Shall we go with PartialEqContent and HashContent to remove confusion and ambiguity?

So It'd be something like this for comparison?

if a.content_eq(b) {
    // ....
}
// -------------
if !a.content_ne(b) {
    // ...
}

Or is it still a super trait of PartialEq to provide the == operator?

@Boshen
Copy link
Member

Boshen commented Sep 3, 2024

Shall we go for

if a.content_eq(b) {
    // ....
}
// -------------
if !a.content_ne(b) {
    // ...
}

I had nightmares when I worked with with operator overloading in C++ 😁

@overlookmotel
Copy link
Contributor Author

A. R is an equivalence relation on the subset Y = { x X x R x } X .
B. difunctional: the relation is the set { ( a , b ) f a = g b } for two partial functions f , g : X Y and some indicator set Y
C. right and left Euclidean: For a , b , c X , a R b and a R c implies b R c and similarly for left Euclideanness b R a and c R a imply b R c
D. quasi-reflexive: If x , y X and x R y , then x R x and y R y .

Holy flipping christ! Well I can't argue with that 🥲

Concerning the super trait:

If we make ContentEq a super-trait of PartialEq, then node1 == node2 would return false for two NullLiterals with different spans, right? And node1.content_eq(node2) would return true?

If that's correct, I suggest that we don't implement PartialEq on AST types, as then it makes it too easy to do node1 == node2 when what you really mean to do is use content_eq.

Obviously x == y is much nicer syntax, but the reason I suggested not using PartialEq is that (speaking in "common sense" terms, not formal logic), we all understand what == usually means, so it'd be potentially surprising/confusing if there's a "but it's not quite like that" in this case.

Incidentally, I agree with your choice of ContentEq for the trait name, rather than PartialContentEq. If I've understood all this correctly, formally speaking, the relation between AST nodes we're talking about is a full equivalence relation (so like Eq) rather than partial equivalence (PartialEq). So we get a simpler, shorter name, and get to be formally correct. Bonza!

@rzvxa
Copy link
Contributor

rzvxa commented Sep 3, 2024

How are things like this working when our span hash implementation is noop?

if let Some(group_comment) = self.expression_to_group_comment_map.get(&arg.span()) {

@rzvxa
Copy link
Contributor

rzvxa commented Sep 3, 2024

If that's correct, I suggest that we don't implement PartialEq on AST types, as then it makes it too easy to do node1 == node2 when what you really mean to do is use content_eq.

I agree, We should avoid implementing PartialEq where it doesn't make sense.

@Boshen
Copy link
Member

Boshen commented Sep 4, 2024

How are things like this working when our span hash implementation is noop?

if let Some(group_comment) = self.expression_to_group_comment_map.get(&arg.span()) {

It's stored as an array in the hashmap, it'll still use PartialEq from Span 😅

We have a few cases in Rolldown as well.

@rzvxa
Copy link
Contributor

rzvxa commented Sep 4, 2024

It's stored as an array in the hashmap, it'll still use PartialEq from Span 😅

I suspected that it was working because of PartialEq

We have a few cases in Rolldown as well.

Right now I'm trying to remove unnecessary hash implementations in the AST, The final result should implement a normal hash trait for the Span type.

Boshen pushed a commit that referenced this issue Sep 4, 2024
Part of #5283
Boshen pushed a commit that referenced this issue Sep 5, 2024
… from `Span` (#5451)

closes #5283

Also removes the noop Hash implementation on `Span` in favor of a real implementation.
rzvxa added a commit that referenced this issue Sep 5, 2024
… from `Span` (#5451)

closes #5283

Also removes the noop Hash implementation on `Span` in favor of a real implementation.
@rzvxa
Copy link
Contributor

rzvxa commented Sep 5, 2024

@Boshen Didn't Graphite implement closing issues after their respective PR is merged?

@rzvxa rzvxa closed this as completed Sep 5, 2024
@Boshen
Copy link
Member

Boshen commented Sep 5, 2024

@Boshen Didn't Graphite implement closing issues after their respective PR is merged?

Nope, it's something else ... they close referenced PRs from PRs 🤷

@rzvxa
Copy link
Contributor

rzvxa commented Sep 5, 2024

😆 I thought that was a bug since they've added this feature recently. It doesn't make sense to close a PR because a comment from it is mentioned with the word "fixes".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment