
Breaking Change Request: Enable isolate groups by-default - Will result in changes to Performance Characteristics #46754

Closed
mkustermann opened this issue Jul 29, 2021 · 45 comments
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. breaking-change-request This tracks requests for feedback on breaking changes enhancement-breaking-change An enhancement which is breaking.

Comments

@mkustermann
Member

mkustermann commented Jul 29, 2021

Intended change
We intend to enable isolate group (sdk/issues/36097) support in the VM by default.

This will make isolates spawned via Isolate.spawn run inside the same isolate group and therefore operate on the same heap, which allows sharing various kinds of objects and enables richer communication.

Intended change in behavior:
The intention is to

  • make the per-isolate base memory overhead smaller (10x less RAM)
  • make isolates faster to spawn (10x faster spawn latency)
  • make isolates communicate faster (8x faster round-trip communication)
  • make receiver isolate of messages mostly non-blocking (removes O(n) receiver cost)
  • allow richer communication between isolates (see sdk/issues/46623)
  • allow sharing of objects (program structure, JITed code, constants, and any String objects - in the future possibly also user-defined data structures)
  • fix long-standing bugs that happen if isolates are used with the (currently non-atomic) hot-reload (e.g. flutter/issues/72195)
  • allow Flutter to smoothly use multiple engines (see flutter.dev/docs/development/add-to-app/multiple-flutters)
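As a concrete illustration of what stays the same API-wise: isolates are still created via Isolate.spawn, and only their runtime characteristics change. A minimal sketch (the worker function and message here are illustrative):

```dart
import 'dart:isolate';

// After this change, the spawned isolate runs in the same isolate
// group as main() and shares program structure, JITed code and
// constants with it - the Isolate.spawn API itself is unchanged.
void _worker(SendPort sendPort) {
  sendPort.send('hello from a lightweight isolate');
}

Future<void> main() async {
  final port = ReceivePort();
  await Isolate.spawn(_worker, port.sendPort);
  print(await port.first);
}
```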

The justification/rationale for making the change
Deliver the improvements mentioned above to users, and enable more data sharing of objects across isolates in the future.

The expected impact of this change
There are no functional differences. There will be changes in performance characteristics. Those changes will almost exclusively be positive.

The thread pool onto which all lightweight isolates are multiplexed is limited in size (around 10 threads at the moment), in order to ensure that all threads executing on different cores have a big enough TLAB (thread-local allocation buffer - a free chunk of memory from new space) for fast bump allocation.

This means isolates will collaborate on garbage collections (and some other events, like lazy JIT compilations). As a consequence, blocking GC operations (such as new-space collections) will affect all isolates. The worst-case pause time due to new-space collections is unchanged; however, heavily allocating isolates can impact other isolates.

In the common case where the generational hypothesis holds (most objects die young), those collections remain fast. Furthermore, for Flutter specifically, any idle time on the UI thread is used to perform GCs, thereby also avoiding overly long pauses due to new-space GCs. (The old space is mainly collected via concurrent marking & sweeping, and therefore does not stop mutators.)

The only existing use case that could be negatively impacted is apps in which many isolates execute in parallel on many cores (e.g. big server applications).

Three important Dart customers (including Flutter) have already been opting into this for a longer period of time in AOT mode. So far we have only heard positive feedback from them (especially about memory footprint reductions).

We expect the only use case that might actually be affected by this change is server customers that use isolates on many threads at the same time.

Clear steps for mitigating the change

For customers that use isolates on many threads at the same time (such as when running on large servers), the possible workaround is to use Isolate.spawnUri() on the same application - this will cause the VM to use an independent isolate group, which gives the old behavior. Communication is then, however, restricted to json-like types.
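A minimal sketch of this mitigation, assuming a script that spawns itself via Isolate.spawnUri (the use of Platform.script and the message shape are illustrative):

```dart
import 'dart:io';
import 'dart:isolate';

// Spawning the *same* script via spawnUri creates an independent
// isolate group (own heap, own GC - the old behavior). Only
// json-like data may cross the group boundary.
Future<void> main(List<String> args, [Object? message]) async {
  if (message is SendPort) {
    // We are the spawnUri'ed child group: do heavy work here on an
    // independent heap, then reply with json-like data.
    message.send({'status': 'done'});
    return;
  }
  final port = ReceivePort();
  await Isolate.spawnUri(Platform.script, [], port.sendPort);
  print(await port.first);
}
```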

(see also go/ig-by-default)

@mkustermann mkustermann added the breaking-change-request This tracks requests for feedback on breaking changes label Jul 29, 2021
@mkustermann
Member Author

mkustermann commented Jul 29, 2021

/cc @a-siva @mraleph @aam (VM)
/cc @Hixie @xster @gaaclarke (Flutter)
/cc @vsmenon @mit-mit

Feel free to CC anyone else.

@xster
Contributor

xster commented Jul 29, 2021

To clarify, the new lightweight isolate implementation now supports both AOT and JIT right?

@aam
Contributor

aam commented Jul 29, 2021

To clarify, the new lightweight isolate implementation now supports both AOT and JIT right?

Right.

@a-siva a-siva added the area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. label Jul 29, 2021
@mkustermann
Member Author

The breaking change has been announced here.

@mit-mit Could you help get any necessary approvals (since @franklinyow is out)?

@Jonas-Sander

We expect the only use case that might actually be affected by this change is server customers that use isolates on many threads at the same time.

Have there been any measurements done on how much performance this might cost?
Additionally, is this just a temporary pain point which will be made more performant in the future? Or is this something where the Dart team doesn't see it as an important use case for Dart and thus won't optimize for it?

I think especially with the functions framework and Dart becoming more and more popular, it might actually become more important in the future (of course this is just my speculation).

@mkustermann
Member Author

Have there been any measurements done on how much performance this might cost?

We have done measurements on worst-case scenarios. For example, when 8 threads continuously build up long-lived data structures (meaning the generational hypothesis - which most VMs optimize for - does not hold) on a 10+ GB heap, this can lead to a 2x slowdown.

Additionally is this just a temporary pain point which will be made more performant in the future?

Firstly, it's unclear whether any of our existing users would run into any pain points in practice (so far we are not aware of any).

We do have ideas about how this could be further optimized and might invest in that if we believe it is worthwhile.

Or is this something where the Dart team doesn't see it as an important use case for Dart and thus won't optimize for it?

Right now the VM team is not optimizing for large server use cases (i.e. 10s-100s of cores and large amounts of RAM) - because that is not how our users use Dart at the moment.

That being said, one can make the VM work well in this setting - e.g. by using many isolate groups (as mentioned in the mitigation section above).

I think especially with the functions framework and Dart becoming more and more popular, it might actually become more important in the future (of course this is just my speculation).

To the best of my knowledge, cloud functions are often executed ephemerally, requests are independent of each other, and little global state is kept. Based on that, I wouldn't say this change would negatively impact such cases (it may even benefit them).

@Jonas-Sander

Thanks for your detailed response! :)

@aam
Contributor

aam commented Jul 30, 2021

Additionally, is this just a temporary pain point which will be made more performant in the future? Or is this something where the Dart team doesn't see it as an important use case for Dart and thus won't optimize for it?

With spawning new isolates being at least 10x faster, what we will potentially see is applications starting to use short-lived / just-in-time-spawned isolates significantly more, in addition to or instead of larger long-running isolates. In other words, use of isolates could become more functions-oriented, similar to how compute in Flutter is built. sendAndExit(sendPort, message), to be released in the future, will speed up this flow even further.
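The functions-oriented pattern described above could look roughly like this (a sketch of the idea, not the Flutter compute implementation; the entry point and task here are illustrative):

```dart
import 'dart:isolate';

int fib(int n) => n < 2 ? n : fib(n - 1) + fib(n - 2);

// Entry point of a short-lived, just-in-time-spawned isolate:
// compute one result, send it back, and exit.
void _entry(List<Object> args) {
  final sendPort = args[0] as SendPort;
  final n = args[1] as int;
  sendPort.send(fib(n));
}

Future<void> main() async {
  final port = ReceivePort();
  await Isolate.spawn(_entry, [port.sendPort, 30]);
  print('fib(30) = ${await port.first}');
}
```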

@mkustermann
Member Author

mkustermann commented Jul 30, 2021

Here are numbers from benchmarks we added specifically to see the impact of this change. They were measured in the standalone Dart VM (JIT and AOT) on an Intel CPU at commit 5c0466a:

| Benchmark | JIT Before | JIT After | JIT Change | AOT Before | AOT After | AOT Change |
| --- | --- | --- | --- | --- | --- | --- |
| IsolateSpawn.Dart2JSToFinishRunningRMS | 3137316.00 | 780292.63 | -75.13 % | 394717.94 | 396074.63 | 0.3437 % |
| IsolateSpawn.Dart2JSToStartRunningRMS | 89621.59 | 574.28 | -99.36 % | 17494.70 | 259.79 | -98.52 % |
| IsolateSpawnMemory.Dart2JSDeltaPeakProcessRss | 972947456.00 | 782774272.00 | -19.55 % | 381440000.00 | 258723840.00 | -32.17 % |
| IsolateSpawnMemory.Dart2JSDeltaRssOnStart | 27043156.00 | 3941717.00 | -85.42 % | 18012842.00 | 1769472.00 | -90.18 % |
| IsolateSpawnMemory.Dart2JSDeltaRssOnEnd | 77070336.00 | 551594.00 | -99.28 % | 108882600.00 | 67689128.00 | -37.83 % |

We measured how long it takes a mid-to-large application (dart2js in this case) to spawn a new isolate, and what the additional base memory overhead of such an isolate is. (The helper isolate will run dart2js on a Dart file.)

We can see that

  • Spawning latency went from being O(n) in application size to being effectively constant (JIT: 574 us, AOT: 260 us).
  • The spawned isolate can re-use JITed code and therefore runs much faster right from the start (4x faster).
  • The peak memory consumption is reduced in JIT as well as AOT.
  • The additional memory per isolate is reduced by around 10x.
| Benchmark | JIT Before | JIT After | JIT Change | AOT Before | AOT After | AOT Change |
| --- | --- | --- | --- | --- | --- | --- |
| SendPort.Receive.BinaryTree.2 | 5.2478 | 1.0162 | -80.64 % | 4.0722 | 1.2520 | -69.25 % |
| SendPort.Receive.BinaryTree.4 | 12.357 | 1.0609 | -91.41 % | 6.3499 | 1.2568 | -80.21 % |
| SendPort.Receive.BinaryTree.6 | 39.917 | 1.1565 | -97.10 % | 14.764 | 1.3500 | -90.86 % |
| SendPort.Receive.BinaryTree.8 | 150.46 | 1.2812 | -99.15 % | 47.475 | 1.5075 | -96.82 % |
| SendPort.Receive.BinaryTree.10 | 628.53 | 1.9064 | -99.70 % | 194.44 | 2.1566 | -98.89 % |
| SendPort.Receive.BinaryTree.12 | 2548.28 | 4.2936 | -99.83 % | 794.96 | 4.3565 | -99.45 % |
| SendPort.Receive.BinaryTree.14 | 10246.44 | 4.6576 | -99.95 % | 3256.48 | 4.6533 | -99.86 % |
| SendPort.Receive.Json.400B | 5.9036 | 1.1105 | -81.19 % | 6.3360 | 1.3150 | -79.25 % |
| SendPort.Receive.Json.5KB | 52.174 | 1.2674 | -97.57 % | 53.186 | 1.5660 | -97.06 % |
| SendPort.Receive.Json.50KB | 503.79 | 2.1735 | -99.57 % | 476.19 | 2.4452 | -99.49 % |
| SendPort.Receive.Json.500KB | 5194.21 | 5.1109 | -99.90 % | 4971.65 | 5.1561 | -99.90 % |
| SendPort.Receive.Json.5MB | 5579.90 | 19.9008 | -99.98 % | 3991.09 | 19.9308 | -99.98 % |
| SendPort.Receive.Nop | 0.84245 | 0.83353 | -1.059 % | 0.94660 | 0.95422 | 0.8056 % |
| SendPort.Send.BinaryTree.2 | 2.6803 | 0.94366 | -64.79 % | 2.7459 | 0.99268 | -63.85 % |
| SendPort.Send.BinaryTree.4 | 7.5588 | 2.1675 | -71.33 % | 8.2269 | 2.6841 | -67.37 % |
| SendPort.Send.BinaryTree.6 | 28.467 | 7.6131 | -73.26 % | 31.281 | 10.248 | -67.24 % |
| SendPort.Send.BinaryTree.8 | 105.96 | 30.193 | -71.51 % | 117.82 | 30.515 | -74.10 % |
| SendPort.Send.BinaryTree.10 | 419.48 | 110.05 | -73.77 % | 404.91 | 112.05 | -72.33 % |
| SendPort.Send.BinaryTree.12 | 1738.89 | 544.90 | -68.66 % | 1739.35 | 523.80 | -69.89 % |
| SendPort.Send.BinaryTree.14 | 7359.78 | 2336.84 | -68.25 % | 7435.21 | 2295.89 | -69.12 % |
| SendPort.Send.Json.400B | 5.3041 | 1.3875 | -73.84 % | 5.3811 | 1.2730 | -76.34 % |
| SendPort.Send.Json.5KB | 87.984 | 18.209 | -79.30 % | 83.766 | 18.214 | -78.26 % |
| SendPort.Send.Json.50KB | 837.65 | 177.17 | -78.85 % | 799.39 | 160.88 | -79.88 % |
| SendPort.Send.Json.500KB | 9134.69 | 2186.36 | -76.07 % | 8697.33 | 1972.85 | -77.32 % |
| SendPort.Send.Json.5MB | 113778.09 | 50449.93 | -55.66 % | 110797.18 | 46839.12 | -57.73 % |
| SendPort.Send.Nop | 0.30535 | 0.30177 | -1.171 % | 0.28601 | 0.30424 | 6.376 % |

Here we can see that an isolate receiving messages no longer pays an O(n) cost; the cost is rather a constant, single-digit number of microseconds.
We can also see that sending json has become significantly faster, around 4x.

Together, the send-and-receive round trip is around 8x faster.

| Benchmark | JIT Before | JIT After | JIT Change | AOT Before | AOT After | AOT Change |
| --- | --- | --- | --- | --- | --- | --- |
| Isolate.SendReceiveBytes100KB | 12903.66 | 25176.57 | 95.11 % | 12683.20 | 25339.64 | 99.79 % |
| Isolate.SendReceiveBytes100MB | 5.9831 | 7.3162 | 22.28 % | 7.3251 | 11.073 | 51.17 % |
| Isolate.SendReceiveBytes10KB | 36008.31 | 62362.08 | 73.19 % | 35959.83 | 59863.60 | 66.47 % |
| Isolate.SendReceiveBytes10MB | 59.690 | 55.972 | -6.229 % | 75.595 | 110.89 | 46.68 % |
| Isolate.SendReceiveBytes1KB | 48756.69 | 80447.05 | 65.00 % | 50621.82 | 78842.88 | 55.75 % |
| Isolate.SendReceiveBytes1MB | 212.57 | 477.19 | 124.5 % | 398.70 | 745.68 | 87.03 % |

We can observe that sending bytes between isolates became between 1.5-2x faster.

(Metric is runs/second, larger numbers are therefore better)

| Benchmark | JIT Before | JIT After | JIT Change | AOT Before | AOT After | AOT Change |
| --- | --- | --- | --- | --- | --- | --- |
| IsolateJson.Decode100KBx1 | 1.7628 | 41.694 | 2265 % | 19.562 | 38.126 | 94.90 % |
| IsolateJson.Decode100KBx4 | 1.2302 | 35.571 | 2791 % | 11.319 | 34.181 | 202.0 % |
| IsolateJson.Decode1MBx1 | 1.0308 | 4.0661 | 294.5 % | 2.3259 | 3.4675 | 49.08 % |
| IsolateJson.Decode1MBx4 | 0.67100 | 3.3272 | 395.9 % | 1.3408 | 2.5559 | 90.64 % |
| IsolateJson.Decode250KBx1 | 1.4840 | 16.600 | 1019 % | 8.3048 | 14.133 | 70.18 % |
| IsolateJson.Decode250KBx4 | 1.0702 | 11.765 | 999.4 % | 5.1592 | 10.635 | 106.1 % |
| IsolateJson.Decode50KBx1 | 1.8633 | 23.011 | 1135 % | 28.397 | 70.602 | 148.6 % |
| IsolateJson.Decode50KBx4 | 1.4497 | 61.380 | 4134 % | 17.875 | 61.395 | 243.5 % |
| IsolateJson.SendAndExit_Decode100KBx1 | 1.7593 | 42.210 | 2299 % | 19.747 | 38.317 | 94.04 % |
| IsolateJson.SendAndExit_Decode100KBx4 | 1.2352 | 36.599 | 2863 % | 11.922 | 34.356 | 188.2 % |
| IsolateJson.SendAndExit_Decode1MBx1 | 1.0358 | 4.1429 | 300.0 % | 2.3442 | 3.4788 | 48.40 % |
| IsolateJson.SendAndExit_Decode1MBx4 | 0.66169 | 2.3940 | 261.8 % | 1.3342 | 2.4545 | 83.97 % |
| IsolateJson.SendAndExit_Decode250KBx1 | 1.4895 | 16.591 | 1014 % | 8.2664 | 14.057 | 70.05 % |
| IsolateJson.SendAndExit_Decode250KBx4 | 1.0751 | 12.811 | 1092 % | 5.2002 | 11.584 | 122.8 % |
| IsolateJson.SendAndExit_Decode50KBx1 | 1.9263 | 68.975 | 3481 % | 29.038 | 68.695 | 136.6 % |
| IsolateJson.SendAndExit_Decode50KBx4 | 1.4529 | 61.024 | 4100 % | 18.488 | 62.664 | 239.0 % |

(Metric is runs/second, larger numbers are therefore better)

We also measure json decoding on helper isolates (x1 or x4 isolates). The isolates receive bytes, perform utf-8 decoding followed by json decoding, and send the result back.

We can observe

  • Due to re-using JITed code, the helper isolate is much faster in decoding json than before (where it had to re-JIT everything).

  • The faster isolate communication (which is only part of this benchmark's work) has led to a 1.5x-3x speedup of the benchmark.
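The shape of the IsolateJson benchmark can be sketched as follows (simplified; the actual benchmark harness differs):

```dart
import 'dart:convert';
import 'dart:isolate';
import 'dart:typed_data';

// Helper isolate: receive bytes, utf-8 decode, json decode, reply.
void _decoder(List<Object> args) {
  final sendPort = args[0] as SendPort;
  final bytes = args[1] as Uint8List;
  sendPort.send(json.decode(utf8.decode(bytes)));
}

Future<void> main() async {
  final bytes = Uint8List.fromList(utf8.encode('{"answer": 42}'));
  final port = ReceivePort();
  await Isolate.spawn(_decoder, [port.sendPort, bytes]);
  final decoded = await port.first as Map;
  print(decoded['answer']); // prints 42
}
```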

| Benchmark | JIT Before | JIT After | JIT Change | AOT Before | AOT After | AOT Change |
| --- | --- | --- | --- | --- | --- | --- |
| EventLoopLatencyJson.Percentile95 | 1008.00 | 13815.0 | 1271 % | 1006.00 | 11897.0 | 1083 % |
| EventLoopLatencyJson.Percentile99 | 1023.00 | 19878.0 | 1843 % | 1015.00 | 18424.0 | 1715 % |
| EventLoopLatencyJson350KB.Percentile95 | 1010.00 | 1400.00 | 38.61 % | 1005.00 | 1010.00 | 0.4975 % |
| EventLoopLatencyJson350KB.Percentile99 | 1018.00 | 2112.00 | 107.5 % | 1012.00 | 1992.00 | 96.84 % |

This shows us that if helper isolates allocate a lot of objects that die young, the pause times in the main isolate are not affected much (1-2 ms). If the generational hypothesis doesn't hold (all objects survive - a hypothetical worst-case scenario), then the main isolate will have the same pause time as the helper isolate that triggers young-generation collections (since the young generation is collected stop-the-world) - which is between 10-20 ms.

For Flutter, though, this would look different, since it uses idle time between frames to trigger GCs, thereby doing more young-space collections (before the space is full) and therefore reducing the pause times (for this hypothetical worst-case scenario).

@gmpassos
Contributor

gmpassos commented Jul 30, 2021

This is a very important "upgrade" to DartVM Isolate.

I want to highlight the context of changes in performance:

  • Isolates in the same group will collaborate with the same GC:

    • Since the amount of memory shared between isolates is increased (which reduces the total number of objects), the total CPU time spent in GC is reduced (compared with multiple isolates in separate groups).
    • Total JIT time is reduced (compared with multiple isolates in separate groups).
  • The current Dart VM isolate situation (Dart 2.13.4) makes it impractical to spawn a high number of isolates, due to bottlenecks (inherent in the current design) in memory, GC and JIT.

    • The new approach suggests a theoretical decrease in performance for servers with a high number of isolates, but that is already difficult in the current situation. Sharing memory, GC and JIT can actually improve the capacity for isolates.
  • The gain in SendPort performance has a big impact.

    • The main trade-off of message-based parallelism (isolates) is the time to serialize, send, receive and deserialize the messages. Sometimes this trade-off makes it impractical to use multiple isolates, since the time to send a message is close to the time to compute the related task.
    • The significant improvement in SendPort increases the number of cases where it is worthwhile to use a Dart VM isolate. The current isolate performance actually makes many scenarios impractical for parallelism in the current Dart VM.

Some questions:

  • Will it be possible to force a newly spawned isolate to be in a different group?
    • This can be interesting for crafting a solution that separates GCs.
    • This feature is important not only for spawnUri, but for any kind of entrypoint (normal Isolate.spawn).

@aam
Contributor

aam commented Jul 30, 2021

Will it be possible to force a newly spawned isolate to be in a different group?

Isolate.spawnUri (unlike Isolate.spawn) spawns the new isolate in its own isolate group.

@gmpassos
Contributor

It would be interesting to spawn a normal entrypoint/function (not spawnUri) in a different group, to allow specific optimizations for some solutions.

@gmpassos
Contributor

gmpassos commented Jul 30, 2021

It's important to test whether spawnUri with a different projectPackageConfig still works well with all these changes.

Also, when the provided projectPackageConfig is the same as the current isolate's, this should be treated as the same "group" (if spawnUri gains the ability to spawn into the current group).

See package dart_spawner for use cases:
https://pub.dev/packages/dart_spawner

@mkustermann
Member Author

Will it be possible to force a newly spawned isolate to be in a different group?
This can be interesting for crafting a solution that separates GCs.
This feature is important not only for spawnUri, but for any kind of entrypoint (normal Isolate.spawn).

That is indeed a very interesting question, one we have also thought about.

In fact, right now we have an internal boolean flag that can be used to make Isolate.spawn(<entry>, ...) spawn <entry> in a newly created isolate group (basically the old behavior). We intentionally did not expose this in the API.

The reasons for that are manifold:

  • There are current limitations (which we intend to lift with this change, see sdk/issues/46623) on what isolates can send to each other. Those limitations often make using isolates hard or cumbersome (e.g. one cannot use a closure as an <entry>). Those restrictions are in place mostly because lifting them is quite hard to implement for our current share-nothing isolates.
    => If we allowed Isolate.spawn() to spawn new isolate groups, those isolates would be more restricted in what they can communicate, effectively leaving the open feature requests in sdk/issues/46623 unaddressed for that use case.

  • Isolates created with Isolate.spawn(<entry>) are today share-nothing and run in different isolate groups. Yet we still allow them to communicate user-defined objects with each other.
    That poses a problem for our very popular hot-reload development feature: a hot-reload acts on one isolate group only, so there is no way to atomically perform a reload of multiple isolates (if they are in different isolate groups). Right now developer tools try to work around that by applying the same program change to all alive isolate groups (hoping that the change gets accepted by all of them or by none, and that no isolate creations are in progress). It is a best-effort approach that, if it fails, can lead to crashes.
    Furthermore, after hot-reloads are performed and new isolates are spawned, they would need to be created by loading the initial program and applying all past reload changes - in a guaranteed way before the new isolate interacts with others (which is hard to guarantee). It would also require the VM to keep the initial program as well as all program diffs (which can be quite big, because those diffs are currently represented in a very coarse-grained way [much bigger than needed]) in memory indefinitely, effectively creating a memory leak.
    => By making Isolate.spawn() only spawn lightweight isolates within the same group, we fix all of those issues.
    => By still allowing Isolate.spawn() to spawn into new groups, we would leave those existing issues unfixed.

So in summary:

Isolate.spawn() is the mechanism to use to create isolates from the exact same (as well as possibly hot-reloaded) application code. We want to allow rich message exchanges between them (including user defined classes, closures, ...). It currently has a lot of issues which we are trying to solve with this work on lightweight isolates.

Isolate.spawnUri() is our mechanism for creating isolates from possibly different application code (possibly the same app, but with different hot-reload state). They will live in their own newly created isolate group. We intentionally restrict communication between such isolate groups to JSON-like data (no user-defined classes, closures, ...) - because there is no guarantee that the code in the spawner and spawnee isolates is compatible.

As mentioned in the mitigations, one can always use Isolate.spawnUri() on the same code as the original isolate, thereby achieving the goal of a separate isolate group, separate heap and independent GC - but one will have to accept the limited communication.

@gmpassos It was a long explanation, but I hope it makes some sense?

@gmpassos
Contributor

gmpassos commented Jul 31, 2021

Thanks for the good answer!

So, how about 2 types of isolate modes, lightweight and fully isolated? Depending on the mode, the category of types shared between isolates is broader or more restricted. If the types sent between isolates are controlled correctly, the issues go away (correct me if I'm wrong).

I haven't looked at the new code, but SendPort will need to check whether both ends are in the same group, to determine the correct category of types to be shared, or issues can happen.

Note that I vote for a future where the features at #46623 are implemented.

Another question:

  • How will SendPort work with this new implementation when Isolate.spawnUri is used? (It seems that you already have 2 modes.)

BTW, nice job! This is hard and very important work.

@mnordine
Contributor

mnordine commented Aug 2, 2021

@mkustermann Is there anywhere we can read on how isolates in an isolate group are distributed across CPU cores?

@mkustermann
Member Author

So, how about 2 types of isolate modes, lightweight and fully isolated? Depending on the mode, the category of types shared
between isolates is broader or more restricted. If the types sent between isolates are controlled correctly, the issues go away
(correct me if I'm wrong).

That is precisely what happens already now:

  • Isolates spawned via Isolate.spawn() will run the same code and allow sending rich user-defined data structures.
  • Isolates spawned via Isolate.spawnUri() will run (possibly different) code and only allow json-like data to be transferred.

SendPort.send() knows the destination and will apply the appropriate validation to the transitive object graph (it already does that now).
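For example, a spawn()ed sibling may exchange instances of user-defined classes (they are copied between the isolates), while a spawnUri()ed group would reject them. A sketch (the Point class and entry point are illustrative):

```dart
import 'dart:isolate';

class Point {
  final int x, y;
  const Point(this.x, this.y);
}

void _entry(List<Object> args) {
  final sendPort = args[0] as SendPort;
  final p = args[1] as Point; // a user-defined object crossed isolates
  sendPort.send(Point(p.y, p.x)); // sending one back is also allowed
}

Future<void> main() async {
  final port = ReceivePort();
  // Allowed: same code, spawn()ed sibling. Sending a Point to a
  // spawnUri()ed isolate group would fail send-time validation.
  await Isolate.spawn(_entry, [port.sendPort, const Point(1, 2)]);
  final swapped = await port.first as Point;
  print('(${swapped.x}, ${swapped.y})');
}
```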

BTW, nice job! This is a hard and very important work.

Glad to receive the positive feedback 👍

@mkustermann Is there anywhere we can read on how isolates in an isolate group are distributed across CPU cores?

It's not explicitly documented anywhere AFAIK.

The Dart VM is quite flexible, so the answer depends: for example in Flutter (an embedder of the Dart VM), the Flutter engine decides the specific OS thread on which the UI isolate runs and how it processes messages (helper isolates are run by the VM). When an isolate is idle and receives a message, the VM will run its message handler on a VM-internal thread pool.

That means over the lifetime of an isolate, it may run on different OS threads. So far we have not had a need for thread-pinning support, but it may come up in the future.

The operating system is responsible for mapping OS threads (e.g. pthreads on Linux) onto CPU cores. There, too, an OS thread can - over its lifetime - run on different cores, depending on the OS scheduler. We don't do any core-pinning in the VM.

@gmpassos
Contributor

gmpassos commented Aug 2, 2021

@mkustermann if the Dart VM already has 2 modes of communication, it would be good to be able to know the mode/capabilities of communication from the current isolate.

With the correct documentation and helpers to know the current status/mode, developers won't have surprises and will be able to craft what they need.

I vote for:

  • spawnUri: isolate in a different group (separate JIT and GC).
  • spawn: 2 modes
    • Sibling: same group (same JIT and GC) with broader types for SendPort.
    • Colleague: separate groups (separate JIT and GC) with limited types for SendPort.
  • the mode names could be better 😎

About "thread-pinning":

Note that most OSes and devices try not to overheat a specific core, so they rotate the "tasks" between cores to avoid that.

@mit-mit
Member

mit-mit commented Aug 3, 2021

@Hixie @vsmenon can you approve this breaking change request?

@Hixie
Contributor

Hixie commented Aug 6, 2021

Improving performance seems great to me.

@vsmenon
Member

vsmenon commented Aug 6, 2021

lgtm

@mit-mit mit-mit added the enhancement-breaking-change An enhancement which is breaking. label Aug 9, 2021
@mit-mit
Member

mit-mit commented Aug 9, 2021

Marking approved

@mkustermann
Member Author

Thank you all for the great discussions. The breaking change has been approved and we'll be performing this change sometime in the next couple of weeks (after the current stable branch is cut) - allowing a long baking time on the dev and beta branches.

If there are any more feature requests, performance bugs or anything else, please file a new github issue under https://github.com/dart-lang/sdk/issues/new

Last replies on this thread:

@mkustermann if the Dart VM already has 2 modes of communication, it would be good to be able to know the mode/capabilities of
communication from the current isolate.

With the correct documentation and helpers to know the current status/mode, developers won't have surprises and will be
able to craft what they need.

The documentation has recently been updated to remove some ambiguity. You can see the newest docs at SendPort.send. It lists what is always supported and what is supported if-and-only-if Isolate.spawn was used.

I vote for:

  • spawnUri: isolate in a different group (separate JIT and GC).
  • spawn: 2 modes
    Sibling: same group (same JIT and GC) with broader types for SendPort.
    Colleague: separate groups (separate JIT and GC) with limited types for SendPort.

Supporting Isolate.spawn() into new groups has its issues. As outlined above, it is especially problematic with hot-reload - one could spawn before/after a reload, and the new isolate group would need to have the before/after program state. That might lead to memory leaks of hot-reload diffs.

The complexity and issues involved in supporting this make us believe it is not a good choice (also considering users can use Isolate.spawnUri - although a little less convenient). If there is really strong demand for this (with actual real-world use cases where the Sibling solution is insufficient), we may reconsider.

@gmpassos If you feel strongly about it, I would encourage you to file a new github issue as a feature request; any discussion can then continue there.

About "thread-pinning":
Note that most OSes and devices try not to overheat a specific core, so they rotate the "tasks" between cores to avoid that.

What I mean by thread pinning is that a given isolate is "pinned" to a specific OS thread (e.g. a pthread). There are use cases where this is needed (e.g. interacting with C code that uses thread-local storage). It doesn't mean that the OS thread is pinned to a specific CPU core.

@gmpassos
Contributor

@mkustermann thanks for the response.

About the "spawn in a separated group":

Now I understand the complexity of implementing it better. I think that spawnUri and Platform.script
(https://api.dartlang.org/stable/dart-io/Platform/script.html) can resolve most use cases when lightweight isolates go to production.

About OS thread pinning: this can be very useful to avoid issues with dart:ffi. It will also help integrations with C/C++ or existing compiled libraries that need it.

Look at the issues Python has with its mandatory GIL (Global Interpreter Lock) - an approach that is totally wrong in my opinion.

Regards.

@xster
Contributor

xster commented Aug 15, 2021

Can we create/post some sample code for this? Also cc @RedBrogdon for devrel.

@mit-mit
Member

mit-mit commented Aug 16, 2021

Can we create/post some sample code for this?

That'd be nice! But please note that this won't be in stable until 2.15.

@mkustermann
Member Author

The flag is now on by-default in all modes. Closing this issue.

Can we create/post some sample code for this? Also cc @RedBrogdon for devrel.

We'll ensure there's good documentation by the time the stable is released.

copybara-service bot pushed a commit that referenced this issue Sep 22, 2021
Issue #46754
Issue #36097

TEST=ci

Change-Id: Ic0b1ecf88790576ae1f31b6a003b2175b9af1c66
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/213343
Commit-Queue: Martin Kustermann <[email protected]>
Reviewed-by: Kevin Moore <[email protected]>
@maxim-saplin

maxim-saplin commented Nov 10, 2021

Any chance there's a detailed doc on migrating to Isolate.spawnUri() to preserve the legacy behaviour of isolates? While refactoring my code I've managed to launch isolates via this method, yet there's a native crash, most likely on the main isolate side when receiving messages from the spawned isolates via SendPort/ReceivePort. There's little I was able to find on the internet that might help with troubleshooting:

../../runtime/vm/message_snapshot.cc: 557: error: expected: !cls.IsNull()
version=2.14.4 (stable) (Wed Oct 13 11:11:32 2021 +0200) on "macos_x64"
pid=80103, thread=13323, isolate_group=main(0x7f9a58926000), isolate=main(0x7f9a5892b000)
isolate_instructions=10bdf30a0, vm_instructions=10bdf30a0
  pc 0x000000010c05b154 fp 0x000070000a3f3b00 dart::Profiler::DumpStackTrace(void*)+0x64
  pc 0x000000010bdf3274 fp 0x000070000a3f3be0 dart::Assert::Fail(char const*, ...)+0x84
  pc 0x000000010bfbf71a fp 0x000070000a3f3c40 dart::ReadApiMessage(dart::Zone*, dart::Message*)+0x896a
  pc 0x000000010bfb6334 fp 0x000070000a3f3cb0 dart::MessageDeserializer::Deserialize()+0x274
  pc 0x000000010bfb6d6f fp 0x000070000a3f3d00 dart::ReadMessage(dart::Thread*, dart::Message*)+0x5f

  pc 0x000000010bf849c9 fp 0x000070000a3f3de0 dart::IsolateMessageHandler::HandleMessage(std::__2::unique_ptr<dart::Message, std::__2::default_delete<dart::Message> >)+0x1a9
  pc 0x000000010bfb1d4c fp 0x000070000a3f3e50 dart::MessageHandler::HandleMessages(dart::MonitorLocker*, bool, bool)+0x12c
  pc 0x000000010bfb247f fp 0x000070000a3f3eb0 dart::MessageHandler::TaskCallback()+0x1df
  pc 0x000000010c0e5bd8 fp 0x000070000a3f3f30 dart::ThreadPool::WorkerLoop(dart::ThreadPool::Worker*)+0x148
  pc 0x000000010c0e603d fp 0x000070000a3f3f60 dart::ThreadPool::Worker::Main(unsigned long)+0x5d
  pc 0x000000010c055b1f fp 0x000070000a3f3fb0 dart::OSThread::GetMaxStackSize()+0xaf
  pc 0x00007ff818400514 fp 0x000070000a3f3fd0 _pthread_start+0x7d
  pc 0x00007ff8183fc02f fp 0x000070000a3f3ff0 thread_start+0xf
-- End of DumpStackTrace

@aam
Contributor

aam commented Nov 10, 2021

Any chance there's a detailed doc on migrating to Isolate.spawnUri() to preserve the legacy behaviour of isolates?

Generally speaking, spawnUri'ed isolates have limitations regarding what can be sent to them that have to be worked around (https://api.dart.dev/dev/2.16.0-0.0.dev/dart-isolate/SendPort/send.html); they cannot be used as a direct replacement for spawn'ed isolates (lightweight or legacy heavyweight). Also note that spawnUri is not supported in the AOT configuration.

While refactoring my code I've managed to launch isolates via this method, yet there's a native crash, most likely on the main isolate side when receiving messages from spawned isolates via SendPort/ReceivePort, and there's little I was able to find on the internet that might help with troubleshooting:

Sorry about the crash. Would you mind opening up a new issue with hopefully some instructions on how to reproduce it?


@maxim-saplin

maxim-saplin commented Nov 10, 2021

@aam thanks for the clarifications! The incompatibility of spawnUri() with AOT effectively means you can't use it with Flutter...

The above logs come from the Dart VM running unit tests, and the most probable cause is that the kind of payload that was OK with the legacy spawn() is now not supported in spawnUri() - I don't think it is worth a separate issue.

P.S.: I'm opening a separate issue regarding performance troubles with this breaking change shortly.
P.P.S.: Having some toggle for the old heavyweight isolates would be a remedy for my app.

@aam
Contributor

aam commented Nov 10, 2021

@aam thanks for the clarifications! The incompatibility of spawnUri() with AOT effectively means you can't use it with Flutter...

To provide further clarification: conceptually, when using spawnUri in AOT you would have to point at a Dart VM snapshot rather than .dart source code. If you do that, the Dart VM should be able to spawn a new isolate group from the snapshot you provided. This has not been well documented or provisioned in the AOT build flow (in Flutter, for example). Basically, those AOT VM snapshots have to be built/prepared ahead of time, and the build/distribution setup has to ensure that child snapshots are compatible with parent snapshots.
Before diving deeper into this, it would be helpful to understand the use case that requires spawnUri (in Flutter or anywhere else) rather than spawn.

The above logs come from the Dart VM running unit tests, and the most probable cause is that the kind of payload that was OK with the legacy spawn() is now not supported in spawnUri() - I don't think it is worth a separate issue.

It would help if you could share the command line with which the unit tests were launched and the revision of the Dart SDK where you see this happening. It should not be happening. :-)

P.S.: I'm opening a separate issue regarding performance troubles with this breaking change shortly. P.P.S.: Having some toggle for the old heavyweight isolates would be a remedy for my app.

Okay, please cc me and @mkustermann on those.

@mtc-jed

mtc-jed commented Mar 3, 2022

@mkustermann, you mention sharing String objects between isolates of the same group ("allow sharing of objects (program structure, JITed code, constants as well as any String objects - in the future possibly also user-defined data structures)").
How is this done? Is this only internal to the engine?

@mkustermann
Member Author

... you mention sharing String objects between isolates of the same group. ... How is this done? Is this only internal to the engine?

When sending messages (e.g. via SendPort.send(<message>)) to other isolates that were spawned using Isolate.spawn() (or higher-level wrappers, e.g. Flutter's compute() function), the <message> graph is transitively copied, but certain objects in it are not copied but shared; that includes String objects.

That means if you e.g. spawn an isolate which loads data from the internet and decodes the bytes to a string, you can send that string to the UI isolate in O(1) time, and it will be sent by-pointer.

We can do that because String objects (like some other objects) are transitively immutable.
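For illustration, a minimal sketch of this pattern (the worker function, the 10 MB size, and the payload are arbitrary; on a VM with isolate groups the send can share the String by pointer rather than deep-copying it):

```dart
import 'dart:isolate';

// Worker: builds a large String, standing in for downloaded + decoded data.
void worker(SendPort sendPort) {
  final body = 'a' * (10 * 1024 * 1024); // 10 MiB string
  // Strings are transitively immutable, so this send can be shared
  // by pointer instead of deep-copied.
  sendPort.send(body);
}

Future<void> main() async {
  final port = ReceivePort();
  await Isolate.spawn(worker, port.sendPort);
  final result = await port.first as String;
  print(result.length); // 10485760
}
```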

@mtc-jed

mtc-jed commented Mar 3, 2022

Ah ok, here I thought there was a way to access shared memory freely; I misunderstood.
Are there any plans to add a way for pointers to any object to be transmissible? This would be highly useful when downloading big JSON payloads (thousands of objects).

@mkustermann
Member Author

Are there any plans to add a way for pointers to any object to be transmissible? This would be highly useful when downloading big JSON payloads (thousands of objects).

We do have some limited support via Isolate.exit(): it avoids a transitive copy by exiting the current isolate and giving the message to the receiver isolate by-pointer. Though it still performs a possibly O(n) verification pass on the sender side.

There have been some talks about allowing general shared (mutable) memory (as in e.g. Java) - though we have no concrete plans at the moment to introduce that.
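A minimal sketch of the Isolate.exit() pattern (the worker function and the list payload are illustrative; requires Dart 2.15+):

```dart
import 'dart:isolate';

// Worker: produces a large result, then terminates and hands it to the
// spawner by pointer via Isolate.exit (no transitive copy; only a
// verification pass runs on this, the sending, side).
void worker(SendPort resultPort) {
  final decoded = List<int>.generate(1 << 20, (i) => i * 2);
  Isolate.exit(resultPort, decoded);
}

Future<void> main() async {
  final port = ReceivePort();
  await Isolate.spawn(worker, port.sendPort);
  final result = await port.first as List<int>;
  print(result.length); // 1048576
}
```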

@mit-mit
Member

mit-mit commented Mar 3, 2022

@mtc-jed I'd love it if you could try the Isolate.exit() approach Martin mentions, and then give us your feedback on whether that worked or not. As Martin details, the verification pass is on the sender side, so it should not cause any slowdown of your UI on the main isolate.

@gmpassos
Contributor

gmpassos commented Mar 3, 2022

Not an Isolate solution, but still a shared pointer solution for the Dart VM:

I needed to implement something similar, and one way was through dart:ffi.

Take a look at:

Note that the shared pointer (and the related memory segment) won't have any concurrency control (no mutex). So it only works for some scenarios, where one side only writes and the other side reads and is capable of knowing whether the read operation was successful.

If dart:ffi had some mutex control for accessing a pointer's bytes, it would be possible to do a lot of interesting things.
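For concreteness, a minimal sketch of that dart:ffi approach (assumes the package:ffi calloc allocator; the 16-byte buffer, the worker function, and the 'done' handshake are illustrative, not a production design - there is no mutex, so it is only safe because exactly one side writes):

```dart
import 'dart:ffi';
import 'dart:isolate';

import 'package:ffi/ffi.dart'; // calloc

// Pass only the native buffer's address (a plain int) to another isolate;
// both isolates then view the same bytes through Pointer.fromAddress.
void worker(List<Object> args) {
  final sendPort = args[0] as SendPort;
  final address = args[1] as int;
  final bytes = Pointer<Uint8>.fromAddress(address).asTypedList(16);
  bytes[0] = 42; // write into the shared segment
  sendPort.send('done'); // crude handshake: tell the reader we are finished
}

Future<void> main() async {
  final buffer = calloc<Uint8>(16); // zero-initialized native memory
  final port = ReceivePort();
  await Isolate.spawn(worker, [port.sendPort, buffer.address]);
  await port.first; // wait for the writer before reading
  print(buffer.asTypedList(16)[0]); // 42
  calloc.free(buffer);
}
```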

Another important thing about Isolate communication:

  • I didn't know that String (an immutable object) won't be copied between Isolates. Maybe this subject deserves some documentation.
  • Are const instances copied between Isolates? How are they actually allocated in the VM across Isolates?
  • Immutable Lists, Sets and Maps could have some way to be created as immutable and shared/"uncopied" between Isolates. For now we only have immutable views of these collections, not an actual immutable object like String.

Note that in some applications, where we use Isolates to use all the cores of the computer, the time it takes to return the data is significant, but we are not using the Isolate.exit approach since we need to keep the Isolate running - it wouldn't be efficient to recreate it and send all the initial data to be processed again. Isolate.exit is a nice strategy, but it's at odds with bootstrapping a task in a new Isolate.

@mtc-jed

mtc-jed commented Mar 4, 2022

@gmpassos Very cool; however, this only works with a list of bytes. I am not aware of any way in Dart to cast an arbitrary user-defined object to its bytes.

@mtc-jed

mtc-jed commented Mar 4, 2022

@mit-mit Running tests on 2.16.1, I get a linearly growing delay between the call to Isolate.exit() and the running of a print() in the calling Isolate. This seems in line with the behaviour described by @mkustermann.
I will probably use the Isolate.exit() method for my particular use case (a one-time download of a bunch of data, so the app can function without an internet connection), since it is clearly well suited.

That being said, pass-by-pointer would very much be appreciated, not only by myself but also by a lot of other people (the benefits of this for a local database could be tremendous, I believe).
The question that's bugging me is: what is gained by preventing developers from passing pointers to other Isolates?

Here's how data downloading would work with pointer passing:
I start my Flutter app. An Isolate is spawned to handle data downloading, and deserialization to a user-defined object.
I order my Isolate to download data, and the Isolate responds after a time with a pointer to the object.
Repeat however many times you want.
Cost to the main Isolate: 1 Isolate.spawn() at app startup.

Without pointer passing, we can only :

  • Pass by value. The main Isolate incurs the cost of deserialization.
  • Use Isolate.exit(). The main Isolate incurs the cost of spawning a new Isolate for every download.

I do not see, AFAIK, a reason to prevent developers from implementing this use case. I of course have only barebones knowledge of the Dart VM and of the general philosophy applied when developing the language; feel free to enlighten me.
I am of the opinion that developers should be given as much freedom as possible, provided the right tools to manage that freedom are available. In addition, implementing such a solution seems trivial, given that the work already done for Isolate.exit() is closely related (I could be 100% wrong on this).

To note: Isolate spawning is apparently way less costly with 2.15's Isolate groups. So Isolate.exit() seems like a sustainable solution, but there's no denying it could be much better with pointer passing.

@gmpassos
Contributor

gmpassos commented Mar 4, 2022

@mtc-jed

mtc-jed commented Mar 4, 2022

@gmpassos This would mean I still have to deserialize my data in the main Isolate, which is exactly the cost I'm trying to avoid.

@gmpassos
Contributor

gmpassos commented Mar 4, 2022

Yes, I agree. To have the originally allocated memory handed over from one Isolate to another, the only way right now is Isolate.exit.

@mraleph
Member

mraleph commented Mar 4, 2022

The question that's bugging me is: what is gained by preventing developers from passing pointers to other Isolates?

It's a language-level design decision - allowing arbitrary objects to be passed around creates shared memory, with all the associated issues and pitfalls. So a simpler programming model is gained by outlawing shared memory.

That being said, I suggest moving discussions about shared memory to other channels - it is off-topic here.

If you want to outline specific issues you are facing, there is dart-lang/language#333

@mkustermann
Member Author

Maybe this issue isn't the right place for this discussion. Could I ask you to open new issues for feature requests / bug reports?

Regarding dart:ffi

Yes, it's the escape hatch from the sound and safe world. It allows unsafe access to shared writable C memory from multiple isolates.

Using dart:ffi it is also possible to call into C for auxiliary things (e.g. acquire/release locks, release/acquire memory ordering, fences, atomics, etc.). Maybe we will eventually provide some of those primitives in dart:ffi itself instead of requiring a call out to C code.

Especially regarding blocking synchronization mechanisms like locks, I'd like to mention that they must be used with care: Dart's event-loop-based programming model relies on Dart code not blocking synchronously for long periods of time. This is especially important for Flutter apps, where the UI isolate needs to be able to render animations at 60+ fps.

@gmpassos
Contributor

gmpassos commented Mar 4, 2022

That being said, I suggest moving discussions about shared memory to other channels - it is off-topic here.

It can be a little bit off-topic, but we are actually talking about shared memory as an alternative given the difficulty of sending large amounts of data or complex objects between Isolates. We don't really want to use shared memory; we'd prefer an elegant and native approach. The idea here is just to show how we are bypassing the performance issue, so that the awesome job done with Isolate groups can achieve its full potential with some future improvements.

Best regards.
