Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Are README numbers updated? #7

Closed
WebReflection opened this issue Apr 15, 2020 · 14 comments
Closed

Are README numbers updated? #7

WebReflection opened this issue Apr 15, 2020 · 14 comments

Comments

@WebReflection
Copy link
Collaborator

Results seems quite off on my laptop (Dell XPS i7) so I wonder if there's some update needed, as I can see consistently udomdiff faster than others as overall usage.

js-diff-benchmark

@luwes
Copy link
Owner

luwes commented Apr 15, 2020

it's was kind of head to head on my end, udomdiff was slightly more first but there was also the gzip size.

with your size reduction I'm happy to put you in the first spot! 🎉

thanks for your contributions!

@luwes
Copy link
Owner

luwes commented Apr 15, 2020

I also normalized the replaceChild operation to be actually 2 operations delete and insert. not sure if this is expected but in the random test udomdiff does seem to have extra operations compared to snabbdom and stage0

Screen Shot 2020-04-15 at 9 22 21 AM

@WebReflection
Copy link
Collaborator Author

not sure if this is expected but in the random test udomdiff does seem to have extra operations compared to snabbdom and stage0

but it's faster, so yes, it's all good to me, the algo is what it is, it's focused on best compromise perf/operations/size, I hope that's OK

@luwes
Copy link
Owner

luwes commented Apr 15, 2020

definitely, if it's faster and smaller it's A-OK 👌

@luwes luwes closed this as completed Apr 15, 2020
@blankhart
Copy link
Contributor

udomdiff uses replaceChild in some cases where a node has moved rather than changed. Wouldn't this prevent it from qualifying as reliably "keyed"?

@WebReflection
Copy link
Collaborator Author

WebReflection commented Apr 17, 2020

@blankhart no, it's reliably keyed, unless you prove it's not, and why that's the case.

Keyed and DOM operations also don't have much to do with each other, and all other algorithms replace too. Inserting does the same too, so it'd be nice if FUD is kept away from this project, thanks.

@blankhart
Copy link
Contributor

Sorry and thanks. I was having an X/Y problem and think my question isn't really about udomdiff, which is definitely "keyed" in the sense @leeoniya described in this issue, i.e. "keyed implementations must not recycle/reuse dom nodes for differing keys (or object identities)."

Before looking at that discussion just now I'd assumed that to be "keyed," when a node moves around within the DOM, it was necessary to tell the runtime to move a DOM node, rather than removing or replacing the node and stitching it back in later. But neither of these approaches would recycle or reuse a DOM node by updating it with new data and so are consistent with being "keyed." (If udomdiff doesn't move nodes in both ways I have misunderstood the algorithm.)

It seemed to me though that those approaches may be distinguishable in practice if some process were observing DOM mutations and reacted differently based on the specific manner in which DOM operations moved a node around (i.e., a single insertBefore that moves a node versus replaceChild and insertBefore to remove a node and reinsert it somewhere else). That was my real question, i.e. whether this difference is important.

The only other algorithms here that I have focused on, list-difference and snabbdom, use replaceChild but they don't use it when moving a node. They use it after checking that the replaced node isn't in the new list and the inserted node wasn't in the old list (e.g., when updating a node in Up10th).

I'll think harder about how best to frame the question as it may be way off base and come back on it in an open issue if it remains a concern. I am a beginner in this area and don't know much yet about MutationObserver or other things that may be relevant. I just posted here in short form because I was away from a computer thinking about the udomdiff anomaly described above (low ms, high ops) and the specific optimization that caused it, which I regard as ingenious.

@WebReflection
Copy link
Collaborator Author

WebReflection commented Apr 18, 2020

To start with, uhtml is already in that table as both keyed and non-keyed, as it's capable of both, and it's based on udomdiff, so udomdiff definitively works with a keyed approach.

Moreover, this benchmark is only keyed, as each object is kept, moved, or replaced, by identity, no property/id is used to replace anything, only nodes.

On top of that, moving a node in one operation is not even always possible, take the swap row as example:

  • row 998 needs to be inserted at index 1
  • row 1 needs to be inserted at index 998

It doesn't matter which one you replace, one item will eventually be off the tree until it gets re-positioned where the other item was.

(low ms, high ops)

This is not correct. udomdiff does the minimal amount of needed operations almost everywhere, so the high ops is only the random one, but that's because udomdiff doesn't really look for anything, it linearly crawl the source list and work on demand (it streams operations) without striving to find the absolute minimal amount of ops, as it's kinda irrelevant in the real-world, but it grants low ms in most common cases.

Again, I've previously used Levensthein distance and E. Meyer's O(ND) and both were granting perfect amount of minimal operations but were super slow once applied to the DOM.

To recap:

  • keyed means nodes are either the same, in shuffle, reverse, or even append (first 1k) and prepend (only first 1k are new nodes)
  • keyed doesn't mean a node cannot leave the tree
  • keyed doesn't mean nodes are moved, rather than appended, 'cause even append is a remove, and insert, behind the scene

@WebReflection
Copy link
Collaborator Author

Proof

// create two custom elements
document.body.innerHTML = '<a-node>first</a-node><a-node>second</a-node>';

// define a Custom Elements
customElements.define('a-node', class extends HTMLElement {
  connectedCallback() {
    console.log('connected');
  }
  disconnectedCallback() {
    console.log('disconnected');
  }
});

// read connected twice in console ... and ...

// move last child before the first one, without removing it
document.body.insertBefore(
  document.body.lastChild,
  document.body.firstChild
);

// read in console disconnected, followed by connected

See? Moving a node, or removing and inserting it, doesn't make any difference behind the scene: the node is removed and appended, and any mutation observer or custom element would be triggered showing the sequence of removed and added.

@blankhart
Copy link
Contributor

blankhart commented Apr 18, 2020

I think this is convincing. Thanks for taking the time to provide such a thoughtful answer. The most interesting thing to me about this is that it both suggests the current benchmarks are measuring a few things in misleading ways, and provides a path to solving those problems.

  • Operations. In another issue it is discussed whether to count replaceChild as a 1 operation, but perhaps the real answer is that insertBefore should count as 2 operations when it is moving a node and 1 only when inserting a node. Doing this would clarify whether udomdiff actually may be performing fewer operations than list-difference and snabbdom, but more faithfully than counting replaceChild as 1 operation. Using the current approach, doing this may require manual intervention since the DOM APIs overload the call to insertBefore. An alternative may be to have the benchmark observe node connection/disconnection events rather than counting calls to the DOM API. Based on your answer above this should be able to witness everything.

  • Excess work. The shuffle benchmark, where these issues matter most, seems to assume that the smallest number of operations to transform the 1000-long old list into the new list is 1000, perhaps on the thought that you could just replaceChild all of them. But that would be 2000 DOM operations behind the scenes, as you put it. Moreover, this is not the actual minimum, which would require the benchmark to compute the Levenshtein edit distance as a reference. Seeing how close a fast implementation comes to Levenshtein actually may be valuable for future invention in diffing algorithms, though with the current set of alternatives is probably only actually interesting in the case of udomdiff as the others insert pretty much every time. Similar issues affect the other benchmarks, although in more straightforward ways.

It's unclear how many DOM operations these algorithms are performing using the current numbers based on calls to the DOM API. Of course, if different calls to the DOM API have different performance impacts even if they produce the same number of DOM mutations, that would be more relevant to the benchmark than counting actual DOM mutations and counting which calls occurred may be informative.

I'll want to understand better for myself why the mutation APIs don't allow a user to observe the difference between a "true" move and a removal/reinsertion, but am persuaded by your answer. Most descriptions of observing a childList referred to observing addition or removal of a node, and it wasn't clear what that really meant in the context where you are telling the runtime that morally the node is "just moving" and the runtime is performing the disconnection/connection operations for you.

@WebReflection
Copy link
Collaborator Author

To be honest, the counting was useful for me while developing udomdiff, and I'm not sure why we give it so much importance in here, as all it matters is the benchmark itself, not really the amount of operations.

However, this is how it works on the DOM:

  • insertBefore connects if the node wasn't in the tree, disconnects and connects if it was moved, even if the node is the same, disconnect and connect are triggered
  • replaceChild inevitably results into at least a diconnect, and a connect, but if the node to place was live, there are two disconnect and one connect. It triggers two events even if you replace the same node.
  • appendChild results into 1 operation if the node was off the tree, 2 operations if the node was somewhere else (disconnect, and connect)
  • removeChild is the only one in this list that means only 1 operation, that is disconnect

The appendChild is meaningless for diffing, as operations should be confined in the array of provided nodes, and appendChild might make the container "dirty", because the node might have been appended after other nodes already present in the container, so that is out of the equation.

All differs indeed either do insertBefore, or replaceChild, and removeChild, where only the last one is literally one operation, as it grants the node was live, and it got removed.

Everything else counts as 2 DOM tree mutation events.

Is this relevant? Not sure, the engine might be smart enough to just trigger events without actually needing any repaint/reflow, etc, but this is the status of the DOM when it comes to diffing: every operation costs 1 up to 2, or 3, observable events:

  • inertBefore 1 up to 2, if live
  • replaceChild 2 up to 3, if live
  • removeChild 1 ... but if followed later by an insertBefore, the sum is 2

Maybe we should better instrument all callbacks to reflect this counting, and simply show the result or operations, without expectations, as many replace followed by insertBefore might result into 2 operations, when now it's counted as 3 (see high num of opts in udomdiff and the shuffled "deck").

@WebReflection
Copy link
Collaborator Author

@luwes happy to know what you think about changing the counting to reflect what stated above

@WebReflection
Copy link
Collaborator Author

OK, I went ahead and tested this version of the instrument function:

function instrument(parent) {
  const {
    appendChild,
    insertBefore,
    removeChild,
    replaceChild
  } = parent;
  parent.operations = [];
  parent.appendChild = function (newNode) {
    const {textContent} = newNode;
    if (newNode.parentNode)
      this.operations.push(`append: drop(${textContent})`);
    this.operations.push(`append: add(${textContent})`);
    return appendChild.call(this, newNode);
  };
  parent.insertBefore = function (newNode, oldNode) {
    const {textContent} = newNode;
    if (newNode.parentNode)
      this.operations.push(`insert: drop(${textContent})`);
    this.operations.push(
      oldNode ?
        `insert: put(${textContent}) before (${oldNode.textContent})` :
        `insert: add(${textContent})`
    );
    return insertBefore.call(this, newNode, oldNode);
  };
  parent.removeChild = function (oldNode) {
    this.operations.push(`remove: drop(${oldNode.textContent})`);
    return removeChild.call(this, oldNode);
  };
  parent.replaceChild = function (newNode, oldNode) {
    const {textContent} = newNode;
    this.operations.push(`replace: drop(${oldNode.textContent})`);
    if (newNode.parentNode)
      this.operations.push(`replace: drop(${textContent})`);
    this.operations.push(`replace: put(${textContent})`);
    return replaceChild.call(this, newNode, oldNode);
  };
}

The current situation is now this one:

Screenshot from 2020-04-18 16-11-51

This table actually well explains why udomdiff is as fast as others, and it technically performs exactly same amount of DOM mutations, on the shuffle benchmark too.

I think we should go with this version, as it reflects the amount of observable mutations (disconnect and connect) in the real world.

@WebReflection
Copy link
Collaborator Author

Well, @blankhart, Merge Request landed, so thanks for the conversation, it was actually very useful at the end 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants