
[AOT/perf] classes holding an int can be more efficient than plain integers? #53594

Closed
modulovalue opened this issue Sep 23, 2023 · 7 comments
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. closed-as-intended Closed as the reported issue is expected behavior type-performance Issue relates to performance or code size

Comments

@modulovalue
Contributor

(cf. #53572 (comment))

It can be shown that a program (running on the VM AOT) using a class that holds an integer can perform better than the same program using an integer without a wrapper class.

This can lead to counterintuitive results where a class can perform better than an extension type (cf. #53572)

I'm wondering, could "integers" perform as well as a "class holding an integer"? Or is there a reason why that might not be possible, desired, or worth the effort?

Micro-benchmark as a repro:

dart compile exe --enable-experiment=inline-class dispatch_bench4.dart

./dispatch_bench4.exe

via class • on class with instantiated mixin: 16ms
via extension type • on class with instantiated mixin: 20ms
via integer • on class with instantiated mixin: 20ms
void main() {
  const size = 20000000;
  final datasetClass = List.generate(size, (final a) => SomeClass(foo: a));
  final datasetExtensionType = List.generate(size, (final a) => SomeExtensionType(a));
  final datasetInt = List.generate(size, (final a) => a);
  final sw = Stopwatch();
  void measure(
    final String name,
    final void Function() fn,
  ) {
    sw.reset();
    sw.start();
    fn();
    sw.stop();
    print("$name: ${sw.elapsedMilliseconds}ms");
  }
  const viaClass = RunClass();
  const viaExtensionType = RunExtensionType();
  const viaInt = RunInteger();
  measure(
    "via class • on class with instantiated mixin",
    () => viaClass.execute(datasetClass),
  );
  measure(
    "via extension type • on class with instantiated mixin",
    () => viaExtensionType.execute(datasetExtensionType),
  );
  measure(
    "via integer • on class with instantiated mixin",
    () => viaInt.execute(datasetInt),
  );
}

class RunExtensionType with Run<SomeExtensionType> {
  const RunExtensionType();
  
  @override
  int sum(final SomeExtensionType v) => v.foo;
}

class RunClass with Run<SomeClass> {
  const RunClass();
  
  @override
  int sum(final SomeClass v) => v.foo;
}

class RunInteger with Run<int> {
  const RunInteger();
  
  @override
  int sum(final int v) => v;
}

extension type SomeExtensionType(int foo) {
  static int fooSelector(
    final SomeExtensionType a,
  ) => a.foo;
}

class SomeClass {
  static int fooSelector(
    final SomeClass a,
  ) => a.foo;
  
  final int foo;

  const SomeClass({
    required this.foo,
  });
}

mixin Run<T> {
  int sum(
    final T v,
  );

  int execute(
    final List<T> tree,
  ) {
    int total = 0;
    for (final a in tree) {
      total += sum(a);
    }
    return total;
  }
}
@lrhn
Member

lrhn commented Sep 24, 2023

It seems unlikely that this benchmark can measure a significant difference in the access of the integer. The overhead of multiple function calls is so much larger than the thing being measured, that any difference in inlining or optimization will completely drown out the possible signal.

When I compile this code (only multiplying size by 4 to get larger times), I get

via class • on class with instantiated mixin: 95ms
via extension type • on class with instantiated mixin: 118ms
via integer • on class with instantiated mixin: 117ms

It's consistent that via class is smaller than the others, but also consistent that the difference is very, very small.

If I add the following two lines at the end of main:

  Object o = datasetClass.first;
  if (o is! SomeClass) throw "Bad";

(as an attempt to check that the instances of SomeClass are not allocation-sunk),
then the timing changes to:

via class • on class with instantiated mixin: 96ms
via extension type • on class with instantiated mixin: 99ms
via integer • on class with instantiated mixin: 99ms

If I change those lines to

  T confuse<T>(T v1, T v2) =>
      DateTime.now().millisecondsSinceEpoch > 0 ? v1 : v2;
  Object o = confuse(datasetClass.first, datasetExtensionType.first);
  if (o is! SomeClass) throw "Bad";

(to really check that SomeClass instances exist at runtime)
the timing changes to:

via class • on class with instantiated mixin: 100ms
via extension type • on class with instantiated mixin: 62ms
via integer • on class with instantiated mixin: 63ms

Whatever this thing is timing, it has about a factor of two of uncertainty.

It's as if the more I look at SomeClass dynamically, the faster everything else gets. (Maybe it's because preventing SomeClass from being allocation sunk to an int makes the mixin for the actual int case more specialized. But I'm guessing blindly.)

In any case, this doesn't so much show that class access is faster than extension type access; rather, it shows that also doing class member access may slow down direct access.

That's also interesting.
Suggests that there are things the compiler could choose differently to optimize for speed.
But also that it's mostly black magic from the user side.

@modulovalue
Contributor Author

more that also doing class member access may slow down direct access.

This gave me the idea to try the following:

When I run the code in the issue description repeatedly, I get 15ms for the on class case. However, if I remove the extension and int portion of the benchmark, it never achieves 15ms. The presence of the extension and integer cases appears to speed up the class case.

I'm not sure what to make of this (is this a bug? is this considered expected behavior?), but it is confusing and counterintuitive to me from the perspective of a user. I expected AOT to be more consistent when it comes to performance, or at least not to have later code slow down (or speed up?) code that comes before it.

This is similar to #53571 where the use of some feature appears to slow down unrelated parts of the program.

@lrhn
Member

lrhn commented Sep 24, 2023

Micro-benchmarks are generally not particularly good at measuring optimizing compilers.
If they do show something, there is no guarantee that it applies to real programs too. And they risk the optimizer falling into a pathological case that would never happen in real programs.

If 99% of your program's runtime is spent on the same piece of code, a 5% deviation seems significant.

@modulovalue
Contributor Author

Micro-benchmarks are generally not particularly good at measuring optimizing compilers.

@lrhn Is there any other strategy that you can recommend that would allow one to develop a better intuition for how to write performant Dart programs running on the VM?

Even if micro-benchmarks are imperfect, I'm hoping that they're better than relying entirely on my intuition. I agree that they're suboptimal, but they can be useful!

@lrhn
Member

lrhn commented Sep 25, 2023

I'd write real programs, then see where their bottlenecks are.
Check which operations are actually left in the program after the compiler has optimized it.
Then I'd try different approaches to optimizing, and see what works best.

I'll always go for avoiding allocation, and avoiding copying. And avoiding unnecessary work or overhead in general.
But I don't want to take extra steps, above the normal "don't do unnecessary work", outside of the code that actually matters.
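One way to reduce some of the noise discussed in this thread is to warm up and repeat each measurement, reporting the best round rather than a single run. A minimal sketch (the `bestOfRounds` helper is hypothetical, not a standard API; for serious work the benchmark_harness package is the usual tool):

```dart
import 'dart:math' show min;

/// Hypothetical helper: runs [fn] a few warmup rounds, then [rounds]
/// timed rounds, and returns the best elapsed time in microseconds.
/// Taking the minimum discards rounds perturbed by GC or OS noise.
int bestOfRounds(void Function() fn, {int warmup = 3, int rounds = 10}) {
  for (var i = 0; i < warmup; i++) {
    fn();
  }
  var best = 1 << 62;
  final sw = Stopwatch();
  for (var i = 0; i < rounds; i++) {
    sw
      ..reset()
      ..start();
    fn();
    sw.stop();
    best = min(best, sw.elapsedMicroseconds);
  }
  return best;
}

void main() {
  var total = 0;
  final micros = bestOfRounds(() {
    for (var i = 0; i < 1000000; i++) {
      total += i;
    }
  });
  print('best round: ${micros}us (total=$total)');
}
```

Reporting the minimum assumes the workload is deterministic; for noisier workloads the median is a safer summary.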

@lrhn lrhn added area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. type-performance Issue relates to performance or code size labels Sep 25, 2023
@mkustermann
Member

/cc @alexmarkov

@alexmarkov alexmarkov added the closed-as-intended Closed as the reported issue is expected behavior label Oct 12, 2023
@alexmarkov
Contributor

I'm wondering, could "integers" perform as well as a "class holding an integer"? Or is there a reason why that might not be possible, desired, or worth the effort?

The Dart VM (both JIT and AOT) uses two different representations for integers: tagged (when an int value can potentially be mixed with other objects, e.g. put in a container such as a List) and unboxed. The tagged representation is compatible with other instances. It also avoids instance (box) creation when the value is small (a Smi), at the cost of an extra bit test when loading the value. That is not measured in your micro-benchmark, but it is equally important. What you see in this micro-benchmark is the small overhead of loading a value from the tagged representation compared to loading a field with an unboxed representation. It works as intended.

If you would like to achieve maximum performance, avoid using integers as objects and use specialized lists from dart:typed_data instead of a general-purpose List.
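As a sketch of that advice, summing over an `Int64List` from dart:typed_data keeps the integers in a flat unboxed buffer, whereas a plain `List<int>` stores tagged references (actual speedups depend on the VM and the workload):

```dart
import 'dart:typed_data';

int sumList(List<int> xs) {
  var total = 0;
  for (final x in xs) {
    total += x;
  }
  return total;
}

void main() {
  const size = 1000;
  // General-purpose list: elements are tagged integer references.
  final boxed = List<int>.generate(size, (i) => i);
  // Typed list: elements are stored unboxed as 64-bit ints.
  final unboxed = Int64List(size);
  for (var i = 0; i < size; i++) {
    unboxed[i] = i;
  }
  // Both produce the same sum; the typed list avoids tagged loads
  // when the compiler specializes the loop.
  print(sumList(boxed) == sumList(unboxed)); // prints: true
}
```

Since `Int64List` implements `List<int>`, it can usually be dropped into existing code that takes a `List<int>` without other changes.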
