Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[native_assets_cli] Semantics of one vs multiple hook invocations #1739

Closed
mkustermann opened this issue Nov 20, 2024 · 5 comments · Fixed by #1891
Closed

[native_assets_cli] Semantics of one vs multiple hook invocations #1739

mkustermann opened this issue Nov 20, 2024 · 5 comments · Fixed by #1891
Assignees
Labels
P1 A high priority bug; for example, a single project is unusable or has many test failures package:native_assets_cli

Comments

@mkustermann
Copy link
Member

mkustermann commented Nov 20, 2024

A bundling tool may support multiple asset types. It then has the option to

  • Perform a build for each asset type separately
  • Perform a build for all asset types together

The motivation for this is for example flutter: Currently flutter has a loop over architectures and performs a build for each of them separately. Now if we add data asset support we are in a strange spot if we say supportedAssetTypes: [code, data] for all builds:

  • the hook/build.dart will emit data assets for each of the builds (which differ only in architecture)
    => There's room for error, the duplicate data assets could be conflicting (imagine the arm32 build emits data asset with ID foo and file X, but arm64 build emits data asset with ID foo with file Y)

Though flutter only wants one set of data assets irrespective of whether the app runs in arm32 or arm64. So the natural choice is for it to build code assets separately from data assets (and it may not want to supply the architecture for data asset build)

Though that will have implications for our system:

  • the hooks shouldn't send different asset types to the same linker (as linking is done once for a build)
    => already with current system the arm32 and arm64 bit code assets are linked separately, so a linker cannot depend on getting both of them at the same time
  • the hooks shouldn't treat it as an error if a asset type isn't supported, instead they should just not build it in that case (and another hook invocation with a different asset type may obtain it later)

To me this sounds somewhat reasonable. It puts more restrictions on the system but allows more flexibility on the bundling tool (e.g. it can ensure data assets don't depend on architecture). It's not entirely ideal because it means sometimes the hook may be invoked with only data asset type with no architecture and sometimes it may be invoked with code+data asset type with architecture, so the second one the hook theoretically could make it's output dependent on the architecture.

The alternatives would be

  • to ensure there's only one build. But that would possibly require changing CLI protocol to have a List<Architecture> instead of just one Architecture - which then would have other implications (e.g. one may need a per-architecture cCompilerConfig compiler toolchain, ...)
  • to ensure the bundling tool allows different data assets (and other kinds of things) per architecture
  • to make bundling tool error if the different architecture builds resulted in conflicting data assets
    (maybe I'll do that for now)

/cc @dcharkes

@dcharkes
Copy link
Collaborator

dcharkes commented Nov 20, 2024

  • Perform a build for each asset type separately

We could consider this. It indeed solves the issue of inconsistent data assets. However, it doesn't solve the issue with inconsistent code assets. Code assets must be provided identically for the various architectures as well.

  • to ensure there's only one build. But that would possibly require changing CLI protocol to have a List<Architecture> instead of just one Architecture - which then would have other implications (e.g. one may need a per-architecture cCompilerConfig compiler toolchain, ...)

This is what my very first prototypes had. (Back then os and architecture was combined into "target", so you could build android_arm64 and linux_x64 in one invocation.) This indeed led to the issue described, that all config that is correlated with target ends up having to stuck in a map.

  • to ensure the bundling tool allows different data assets (and other kinds of things) per architecture

We probably don't want that, due to Flutter supporting multi-arch apps where we don't want to increase the app size by duplicating assets.

  • to make bundling tool error if the different architecture builds resulted in conflicting data assets (maybe I'll do that for now)

That seems reasonable. 👍 (And is consistent with the checking for code assets. (I hope we have that checking for code assets!))

@dcharkes dcharkes added this to the Native Assets v1.0 milestone Nov 20, 2024
@dcharkes dcharkes changed the title Semantics of one vs multiple hook invocations [native_assets_cli] Semantics of one vs multiple hook invocations Nov 28, 2024
@dcharkes
Copy link
Collaborator

dcharkes commented Dec 4, 2024

Perform a build for each asset type separately

As discussed offline:

  • Another issue with invoking the hooks per asset type is that we don't exactly know what to do with metadata, which meta of which invocation should flow to another invocation.
  • And another issue is outputting code and data assets that belong strictly together: Some dylib that requires some data file at runtime.

Conceptually speaking it would be nice if we could let the framework/embedder decide to invoke per asset type. Then we could completely omit architecture (and even OS) from the invocation to build data assets.

@dcharkes
Copy link
Collaborator

dcharkes commented Jan 9, 2025

Thinking in the direction of keeping one invocation with multiple asset types, and keeping one arch for code assets:

We have asset-specific config leading to different output directories. Due to the extensible nature of input.config adding new asset types with new config could lead to more invocations where input.outputDirectory is not shared. We could consider having input.data.outputDirectory and input.code.outputDirectory. Assuming that the embedder would know about input.config.flutter it would know whether the input.data.outputDirectory can change or not. The ugly thing then becomes that we can't simply hash input.config to get an output dir.

Also, it will be unclear where to put intermediary asset types emitted for linking. E.g. one could emit an encodedAsset for linking that depends on the target OS, which should then be put in the input.code.outputDirectory.

If we have the output directories per asset type, you'd still have multiple invocations building the same data asset in flutter multi-arch builds, but hopefully these should be no-ops on subsequent invocations.

Thinking in the direction of invoking per asset type, map reduce style:

We could merge whatever is emitted metadata-wise. This does require something mergable. E.g. currently it's key value pairs, so we could provide all keys and override values if keys are provided multiple times (or error out). However, we want to go in the direction of using files for metadata (#1251). We should most likely go for a list of metadata files (or keyed metadata files, with a string key).

outputting code and data assets that belong strictly together

This becomes somewhat awkward. Especially if you have an external build process outputting a dylib and some binary blob that is included as data. You'd have to invoke that external process both in the data assets invocation and the code assets invocation. (Basically, for a 2 arch build, you'd invoke (hopefully cached) the external process 3 times.)

It would be a feature that this forces you to have byte-for-byte the same data assets for different archs in the external process.

However, Flutter and Dart don't have multi-OS apps, so it's perfectly fine currently to output a different data asset per OS. And in the data-assets invocation you wouldn't know what the target OS is. (So you can't have your binary blob being different for Android vs host OS.)

Thinking in the direction of one invocation per embedder build/run:

to ensure there's only one build. But that would possibly require changing CLI protocol to have a List<Architecture> instead of just one Architecture - which then would have other implications (e.g. one may need a per-architecture cCompilerConfig compiler toolchain, ...)

This becomes conceptually weird for toolchains now that we have defined the environment variables for the toolchains to be set by the embedder (#1606). I don't believe one can build multi-architecture (x64+arm64) Windows apps, and none of the other OSes currently provide environment variables. But what if some embedder wants to put some archspecific libs in the include path in the environment? If we want to go this direction, we should have a input.config.code.x64.cCompiler.environment instead of passing it in to the environment of the hook invocation.

Passing in multiple architectures seems to align pretty well with XCode. It invokes tools with ARCHS which are all the target architectures.

Passing in multiple architectures doesn't align with the clang world. It has different executables for different target archs.

Passing in multiple architectures doesn't align with the MSVC world. It requires different env vars for different archs.

Passing in multiple architectures doesn't align well with Rust: It requires a single target tripple.

Passing in multiple architectures aligns well with the Android world: You can build a jar with dylibs for different target archs in it.

So it really seems to depend mostly on whether the packaging for the target OS supports multi-arch-apps.

Because in Flutter we do the packaging of multi-arch apps (both for Android and iOS/MacOS), we have been able to hide the complexity so far for building code assets and provide an API that looks a lot like single arch clang/msvc/rust.

The question is, when we start adding new asset types, can we abstract that away for users. E.g. if we add a JarAsset type, it will already need to contain the dylibs for the right target archs, and preferably not for other target archs. So the input.config.jar will need to contain input.config.jar.targetArchitectures. Ditto, if we ever need to support FrameworksAssets for iOS/MacOS, they will also contain multiple architectures.

This points into either a single invocation for everything (and refactoring input.config.code to contain a map with target architectures), or a single invocation per asset type. (Otherwise we'd end up with config.code.targetArchitecture: 'arm64' && config.jar.targetArchitectures: [arm64, riscv64] which then leads to a potential cross product of invocations.)

@mkustermann
Copy link
Member Author

However, Flutter and Dart don't have multi-OS apps, so it's perfectly fine currently to output a different data asset per OS. And in the data-assets invocation you wouldn't know what the target OS is.
(So you can't have your binary blob being different for Android vs host OS.)

With the current status quo, hook invocations to get data assets will not be provided a config that has the OS in it. So one cannot emit per-OS data assets as the hook wouldn't know the OS. (Sidenote: The OS would be the wrong abstraction anyway, because on the web we don't know the OS. So if at all it would be the targetPlatform that we could provide in the config)

But what the hook can always do is it can emit N different data assets, e.g. "foo_.data". The Dart code can then use if (Platform.isLinux) const DataAsset('foo_linux.data') else .... The tree shaking would simplify this and only one DataAsset would be used, ensuring we only bundle one of them at runtime.

The downside is of course, that the development builds don't do linking and therefore would have more assets bundled than needed. But maybe that's ok?

That approach, moving the branching code to Dart doesn't work

  • for things that the Dart code cannot switch on and/or doesn't have compile-time evaluation support in AOT
  • for APIs that are truly declarative, like dart:ffis Native or possibly JAR support

@dcharkes dcharkes added the P1 A high priority bug; for example, a single project is unusable or has many test failures label Jan 10, 2025
@dcharkes
Copy link
Collaborator

dcharkes commented Jan 13, 2025

TL:DR; multi-invocation (single asset type per invocation, single architecture per code-asset invocation).

From a discussion with @mkustermann @mosuem @HosseinYousefi and @liamappelbe:

Pros for single invocation; build multiple asset types, and multiple architectures for code assets:

  • Mental model is easier, write all the code in one go.
  • Aligns with XCode/Gradle.
  • Aligns with Flutter because it supports universal binaries.
  • Con: Painful for data-assets connected to code assets. The non-hot-reload case is easy, but for the hot-reload case the data asset needs to be obtained from the non-hot-reload-build that happened earlier.

Pros for multiple invocation; build one asset type per invocation, and one target architecture for code assets per invocation:

  • More consistent with hot restart, it will invoke data-assets-only.
  • Aligns with clang/apple-clang/MSVC/Rust.
  • Aligns with Dart, because Dart doesn’t support universal binaries.
  • Easier caching for downloading data assets (don’t have to manually use the shared output dir to share between iOS/MacOS).
  • Data assets cannot accidentally, or maliciously, depend on the CodeConfig.
  • It is explicit that universal binaries are the responsible of the embedder, only one architecture is provided per invocation. (Pre-signed frameworks or multi-architecture binaries will require a new asset type, for example a FrameworkAsset.) Users cannot have access to all target archs at the same time.
  • Dependencies are more fine-grained, they are per invocation instead of all together.
  • A hook downloading data assets from a CDN doesn't have to use the shared output directory to not do double work for building for two target OSes. Output dir will be for data assets.
  • Con: Painful for data-assets connected to code assets. Obtaining the data asset needs to somehow invoke a native build. And if you don't want to redo work it needs to be in the shared output directory.

Data assets connected to code assets are painful in either solution (assuming you want to support Flutter hot-reload/hot-restart).

I see more pro's for multiple invocations. So let's consider going that direction:

Multiple invocations and linking: We can choose to do both multiple and single invocation for link hooks even if build hooks are doing multiple invocations.

Pros for multi invocation for linker:

  • Code config is single arch, same is in build hook.
  • Better caching locality.
  • Explicit that embedder is responsible for universal binaries.
  • Consistency with build hook.
  • Con: You cannot have input-asset-type != output-asset-type.

Single invocation for linker:

  • You can have input-asset-type != output-asset-type.

I can't come up with a use case for the linkers emitting different asset types at the moment. E.g. linkers for native code will pass in static libs. And linkers for any asset type built on top of data assets will pass in these asset types.

My current assumption is that linkers really only do tree shaking with the recorded_uses.json to make things smaller. We don't run the linkers in JIT mode, which means that the same assets (or just less assets) with the same or less symbols (native assets) or keys (json data assets) need to be outputted in JIT and AOT.

So I'm leaning towards multiple invocations for link hooks as well.

Multiple invocations and metadata:

  • It could be merge-able as proposed in a previous post.
  • Or it could be only flow to the other packages with exactly the same config.
    • So metadata for building code assets can be code-config dependent.
    • => This would align well with multiple linker invocations, as well. So, I'm leaning this way now.

If we go towards multiple invocations, then we could consider changing input.config.buildAssetTypes : List<String> to input.config.buildAssetType : String to make this explicit in the protocol.
(And double check that we don't invoke it with two asset types in the current Dart/Flutter implementations. Yep, both only pass in [CodeAsset.type] right now.)

@dcharkes dcharkes self-assigned this Jan 13, 2025
@dcharkes dcharkes moved this to In Progress in Native Assets Jan 13, 2025
auto-submit bot pushed a commit that referenced this issue Jan 15, 2025
Document the behavior of multiple invocations of hooks.

Closes: #1739

We could keep `buildAssetTypes` as a list of strings for now. We might want to make link hooks work with multiple asset types later as a non-breaking change.

Both Dart and Flutter only pass `[CodeAsset.type]`, so they are already satisfying that every asset type has its own invocation.
@github-project-automation github-project-automation bot moved this from In Progress to Done in Native Assets Jan 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P1 A high priority bug; for example, a single project is unusable or has many test failures package:native_assets_cli
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

2 participants