[Julep/WIP] Standalone AOT compilation mode #32273

Closed

tshort
Contributor

tshort commented Jun 10, 2019

This mode of compilation aims to statically compile Julia code to libraries or executables that do not need a system image. This will allow Julia to support more use cases:

  • Smaller standalone executables with faster startup.
  • Compilation to standalone libraries. For example, R or Python packages could link to Julia binary libraries.
  • Cross compilation to more limited systems. This could be an embedded system or WebAssembly for web apps.

To support these modes, the following compilation targets could be supported:

  • A shared library that links to the libjulia shared library.
  • An executable that links to the libjulia shared library.
  • An object file meant to dynamically link to the libjulia shared library.

In addition to these, we'd also like to support these same targets, but statically link to libjulia.a for smaller standalone executables or libraries.

My main interest is compilation to WebAssembly (see this issue). See here for a simple web app compiled using this branch of Julia.

Approach

This is based on @vtjnash's work on jn/codegen-norecursion. That capability will be great to have for codegen work. Hopefully, that can be merged soon.

This approach works by introducing a standalone-aot-mode into Julia's code generation process. This is similar to the imaging-mode. The main differences are:

  • ccall -- foreigncalls are normally converted to calls through function pointers. In standalone-aot-mode, they are compiled to normal external function calls to be resolved at link time.

  • cglobal -- As with ccalls, these are compiled to normal external references.

  • Global variables -- This is a tricky part. Global variables (symbols, strings, and Julia global variables) are serialized to a "mini image" (a binary array). An initialization function is provided to restore the global variables upon startup. The serialization code reuses the machinery in "src/dump.c". Some non-core structs and types are converted to tuples or other types that have the same memory layout.

  • Initialization -- This is another tricky part. Initialization includes a simplified version of jl_init that does not load the standard library. It initializes many types, including some defined in base/boot.jl.
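
The global-variable and initialization steps above can be pictured with a small C sketch. This is not the code from the PR (which reuses the serializer in src/dump.c); the byte layout and every name here (`g_counter`, `g_greeting`, `save_mini_image`, `restore_mini_image`) are assumptions made purely for illustration. The idea shown is only that values are flattened into a byte array at build time and an init hook rebuilds them before any compiled code reads them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GREETING_MAX 16

/* Globals that compiled code would reference; empty until restored. */
static int64_t g_counter;
static char g_greeting[GREETING_MAX];

/* Hypothetical layout: 8-byte little-endian integer, 1 length byte,
 * then the string bytes. Returns bytes written, or 0 if it does not fit. */
size_t save_mini_image(unsigned char *out, size_t cap,
                       int64_t counter, const char *greeting)
{
    assert(out != NULL && greeting != NULL);
    size_t len = strlen(greeting);
    if (len >= GREETING_MAX || cap < 9 + len)
        return 0;
    uint64_t bits = (uint64_t)counter;
    for (int i = 0; i < 8; i++)
        out[i] = (unsigned char)(bits >> (8 * i));
    out[8] = (unsigned char)len;
    memcpy(out + 9, greeting, len);
    return 9 + len;
}

/* Stand-in for the startup hook: rebuild the globals from the embedded
 * bytes. Returns 0 on success, -1 if the image is truncated or malformed. */
int restore_mini_image(const unsigned char *img, size_t size)
{
    assert(img != NULL);
    if (size < 9)
        return -1;
    size_t len = img[8];
    if (len >= GREETING_MAX || size < 9 + len)
        return -1;
    uint64_t bits = 0;
    for (int i = 0; i < 8; i++)
        bits |= (uint64_t)img[i] << (8 * i);
    g_counter = (int64_t)bits;
    memcpy(g_greeting, img + 9, len);
    g_greeting[len] = '\0';
    return 0;
}

int64_t get_counter(void) { return g_counter; }
const char *get_greeting(void) { return g_greeting; }
```

In a real build the array produced by the save step would be emitted into the object file, and the restore step would run once from the library's initialization function before any other exported function is called.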

Miscellaneous notes

  • Generic code that uses jl_invoke() or jl_apply_generic() isn't supported. A warning is currently issued for code that is compiled with either of these. This often includes error-handling code.

  • cfunction isn't supported. I'm not sure how to handle that.

  • The tests target Linux. The tests currently use julia-debug.

  • There's a garbage-collection bug lurking somewhere. For at least the rand() test, it crashes unless GC is disabled.

Feedback / next steps

I'm looking forward to guidance on the steps needed to get this into Julia as an experimental feature. This includes tests and code cleanups. If anyone sees any big gotchas or problems with the approach, that discussion would help, too.

Member

@c42f c42f left a comment

Having a working wasm target would be amazing ❤️

Here are a few naive impressions. Hopefully they are more useful than distracting (I am not an expert on this part of the code).

About jl_apply_generic — what might you hope for this to do? Would it be acceptable to embed the julia IR for such functions and run it in the interpreter? I suppose I'm a bit confused about overall aim here, other than "make wasm work well". Is the goal to

  • Avoid runtime codegen?
  • Avoid paying for the size of the standard sysimage?
  • Allow embedding a sysimage?
  • Avoid paying for the size of libjulia?

// }
Module *M = data->M.get();
Function* init_lib_f = cast<Function>(
M->getOrInsertFunction("init_lib", Type::getVoidTy(Context), NULL));
Member

Should have a jl_ prefix? jl_init_lib()?

return LLVMNativeCode(native_code)
catch e
ccall(:jl_clear_standalone_aot_mode, Nothing, ())
throw(e)
Member

(1) Use rethrow() rather than throw(e). The latter will duplicate the exception on the exception stack. (2) But instead, you could just put the call to jl_clear_standalone_aot_mode in a finally block. (3) Instead of both of those... this global setting seems kind of icky anyway. Is it possible to put it in CodegenParams?
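
Point (3) can be illustrated with a small C sketch. It does not reproduce Julia's actual CodegenParams; the struct, its `standalone_aot` field, and the function name are hypothetical. It only shows why carrying the mode in a parameter removes the need for a set/clear pair around compilation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-compilation parameter struct. */
typedef struct {
    bool standalone_aot;   /* assumed field name, not taken from Julia's sources */
} codegen_params;

typedef enum {
    CALL_VIA_FUNCTION_POINTER,   /* slot that is filled in at run time */
    CALL_EXTERNAL_SYMBOL         /* plain reference resolved at link time */
} foreigncall_lowering;

/* The mode is read from the argument, so there is no global to set before
 * compiling and nothing to clear if compilation fails part-way through. */
foreigncall_lowering choose_foreigncall_lowering(const codegen_params *params)
{
    assert(params != NULL);
    return params->standalone_aot ? CALL_EXTERNAL_SYMBOL
                                  : CALL_VIA_FUNCTION_POINTER;
}
```

Because each compilation request carries its own parameters, two requests with different modes cannot interfere with each other, and an error path has no state to restore.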

Contributor Author

Putting it in CodegenParams is probably the way to go. Will work on it.

@@ -832,6 +832,180 @@ void _julia_init(JL_IMAGE_SEARCH rel)
jl_install_sigint_handler();
}

void jl_init_types2(void) JL_GC_DISABLED
{
jl_module_t *core = NULL; // will need to be assigned later
Member

Is this still TODO or is "later" elsewhere in the diff? What's special about this set of types and how does it relate to the init cycle? Could we handle them in a way that is more similar to the usual system?

Contributor Author

I should remove the comment. It's confusing. Later is somewhere else. These types are created in base/boot.jl and then assigned in C in init.c during post_boot_hooks. Without a system image, none of that happens, so this extra init step fixes up a few more basic types.

jl_perm_symsvec(1, "msg"), jl_svec(1, jl_string_type), 0, 0, 1);
}

// Basic initialization that doesn't load a system image
Member

There seems to be some duplicate logic going on here. Could you generalize _julia_init to take the sysimage via a resource interface rather than expecting to find it in a file? Various resource loaders could then be plugged in, for example:

  • In the usual case, loading from file via some search paths
  • An embedded binary blob like your mini sysimage
  • Loading it over the network (maybe this could have benefits for wasm, or maybe it's just crazy talk.)

It might allow you to avoid duplicating the init logic so much?
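
One way to picture such a resource interface is the C sketch below. It is an assumption-laden illustration, not code from Julia: `image_source`, the two loaders, and `init_from_source` are hypothetical names, and "restoring" the image is reduced to summing its bytes so the example stays self-contained. The point is that a single init path can consume an embedded blob or a file without knowing which one it was given.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical interface: anything that can hand the runtime an image as bytes. */
typedef struct image_source {
    /* On success returns 0 and sets *data and *size. */
    int (*load)(const struct image_source *self,
                const unsigned char **data, size_t *size);
    /* Releases whatever load() handed out. */
    void (*release)(const struct image_source *self, const unsigned char *data);
    const void *ctx;    /* loader-specific: blob address or file path */
    size_t ctx_size;    /* blob size; unused by the file loader */
} image_source;

/* Loader 1: an image embedded in the binary. Nothing to allocate or free. */
static int embedded_load(const image_source *self,
                         const unsigned char **data, size_t *size)
{
    *data = self->ctx;
    *size = self->ctx_size;
    return 0;
}
static void embedded_release(const image_source *self, const unsigned char *data)
{
    (void)self; (void)data;
}

/* Loader 2: read the whole file at the path stored in ctx. */
static int file_load(const image_source *self,
                     const unsigned char **data, size_t *size)
{
    FILE *f = fopen((const char *)self->ctx, "rb");
    if (f == NULL)
        return -1;
    unsigned char *buf = NULL;
    long len = -1;
    if (fseek(f, 0, SEEK_END) == 0 && (len = ftell(f)) >= 0 &&
        fseek(f, 0, SEEK_SET) == 0) {
        buf = malloc(len > 0 ? (size_t)len : 1);
        if (buf != NULL && fread(buf, 1, (size_t)len, f) != (size_t)len) {
            free(buf);
            buf = NULL;
        }
    }
    fclose(f);
    if (buf == NULL)
        return -1;
    *data = buf;
    *size = (size_t)len;
    return 0;
}
static void file_release(const image_source *self, const unsigned char *data)
{
    (void)self;
    free((void *)data);
}

image_source embedded_source(const unsigned char *blob, size_t size)
{
    image_source s = { embedded_load, embedded_release, blob, size };
    return s;
}
image_source file_source(const char *path)
{
    image_source s = { file_load, file_release, path, 0 };
    return s;
}

/* One init path for every kind of source. */
int init_from_source(const image_source *src, unsigned long *checksum)
{
    assert(src != NULL && checksum != NULL);
    const unsigned char *data;
    size_t size;
    if (src->load(src, &data, &size) != 0)
        return -1;
    unsigned long sum = 0;
    for (size_t i = 0; i < size; i++)
        sum += data[i];          /* placeholder for real deserialization */
    src->release(src, data);
    *checksum = sum;
    return 0;
}
```

A network loader would be a third pair of load/release functions with the same signatures; the init function would not change.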

Contributor Author

There is some duplicate logic, but I tried to minimize that by using the code in 'dump.c'. Your interface idea is interesting. I don't see how it minimizes duplication, though.

Member

Well I noticed that jl_init_basics shares some 100 lines of code with _julia_init. It seems like this could possibly be factored back together with some more flags or factored apart by extracting some of the shared code.

Contributor Author

Yeah, "factored apart" might be best. I'm worried about too many if (standalone_aot_mode) statements.

I'm still struggling with the resource interface. I understand what you mean, but I'm not sure how to code it, yet.

void hello();
]]
lib.init_lib()
lib.hello()
Member

This is cool. Ultimately better to do this test from C though?

Contributor Author

Yes. I'm not really sure how C tests fit in with Julia's testing infrastructure, though (not that Lua helps with that--Lua was just easy to try).

Member

There's the test/embedding directory which seems quite similar in concept.

@@ -239,6 +331,113 @@ static void makeSafeName(GlobalObject &G)
G.setName(StringRef(SafeName.data(), SafeName.size()));
}

bool isinlibjulia(std::string name) {
Member

I'm not quite sure what's going on here, but this and jl_name_from_type look fishy :-)

Member

By which I mean - having these lists written out makes me think "there must be a better way" ;-)

@tshort
Contributor Author

tshort commented Jun 10, 2019

About jl_apply_generic — what might you hope for this to do? Would it be acceptable to embed the julia IR for such functions and run it in the interpreter?

Running in an interpreter might be an option. I'm not sure how complex that would be to handle or how much overhead it would add. For now, I'm planning to just not support it.

I suppose I'm a bit confused about overall aim here, other than "make wasm work well". Is the goal to

  • Avoid runtime codegen?

Yes.

  • Avoid paying for the size of the standard sysimage?

Yes.

  • Allow embedding a sysimage?

No. There's a mini image embedded that holds global variables, but it doesn't hold code, and it's very limited.

  • Avoid paying for the size of libjulia?

Maybe. libjulia isn't that big. But, if you use static linking, you might be able to strip out unused parts of libjulia. The same is true of other C/C++ libraries some compiled Julia code uses.

@andyferris
Member

I just wanted to say thank you for looking at this - this could immensely expand where I could use Julia (e.g. at work we were discussing the difficulty of using Julia in AWS Lambda functions; smallish precompiled binaries would make this feasible, same for responsive CLI tools).

@c42f
Member

c42f commented Jun 14, 2019

Looking more at this, I still feel like the concepts of sysimage and mini sysimage might not really be that different and could share more code.

  • _julia_init and jl_init_basics are quite similar
  • jl_save_incremental and jl_save_mini_image_to_stream share many similarities
  • _jl_restore_incremental and jl_restore_mini_sysimg are similar

What are the essential points of difference, and can we make things neater by closing the gap a bit? For example

  • Generalizing the code which locates the image data (cf comments about "resource" data above)
  • Allowing a few more things to go into the mini image to possibly avoid some special cases like jl_init_types2

@tshort
Contributor Author

tshort commented Jun 14, 2019

You're right on about the overlap, @c42f. I'll spend some time looking for ways to bridge the gap.

dt = jl_uint32_type;
}
else if (dt->size == 8) { // change the type to a UInt64
dt = jl_uint64_type;
Member

I don't really understand this. Saving the wrong type tag for something doesn't seem useful?

Contributor Author

It was useful in the sense that it could make some code compile and run where it wouldn't compile before. The wrong type is often not a problem because the compiled code doesn't really use the type information, it just needs to get the size right (I know that this isn't always true, so it'd be nice not to do this). I was having problems where saving some types would cause the mini-image to explode in size as it tried to pull in more dependent modules and types. That is likely a sign of another problem, and maybe (hopefully) this is just a temporary band aid, but I haven't found the right approach.

dt = jl_uint64_type;
}
else if (dt->size > 0) { // change the type to a primitive type with correct size
dt = jl_new_primitivetype(jl_symbol("BitsTypeX"), jl_core_module, jl_any_type, jl_emptysvec, dt->size * 8);
Member

If this code doesn't have a fixed repertoire of types --- such that it can handle this new primitive type --- then why not just save the correct type to begin with?

@JeffBezanson
Member

I don't understand the notion that jl_apply_generic can't work. It's a perfectly normal C-callable function. It just does a table lookup, gets a function pointer from that, and calls it. It's also going to be very hard to get any significant piece of julia code working without it. I agree with sometimes wanting to remove the JIT, possibly wanting to remove eval, wanting to remove large parts of Base/stdlib, and maybe even removing the GC. But removing jl_apply_generic would need a very unusual and restrictive context indeed --- an architecture with no indirect call instruction perhaps?
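
The "table lookup, function pointer, call" description can be made concrete with a toy C sketch. This is a deliberate simplification and not Julia's implementation: the real jl_apply_generic dispatches on the types of all arguments and consults method caches, whereas this version keys on a single argument's tag. All names (`value`, `method_entry`, `twice_methods`, `apply_twice`) are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Toy runtime values: a type tag plus a payload. */
typedef enum { TAG_INT, TAG_FLOAT, TAG_STRING } value_tag;

typedef struct {
    value_tag tag;
    union { long i; double f; const char *s; } as;
} value;

typedef value (*method_ptr)(value);

/* One row per compiled specialization: argument type -> function pointer. */
typedef struct {
    value_tag arg_tag;
    method_ptr fptr;
} method_entry;

static value twice_int(value v)   { v.as.i *= 2;   return v; }
static value twice_float(value v) { v.as.f *= 2.0; return v; }

/* The table is ordinary static data, so it can be emitted ahead of time
 * alongside the functions it points to. */
static const method_entry twice_methods[] = {
    { TAG_INT,   twice_int },
    { TAG_FLOAT, twice_float },
};
static const size_t twice_method_count =
    sizeof twice_methods / sizeof twice_methods[0];

/* Find the entry matching the argument's runtime type and call through it.
 * Returns 0 on success, -1 when no method matches. */
int apply_generic(const method_entry *table, size_t count,
                  value arg, value *result)
{
    assert(table != NULL && result != NULL);
    for (size_t k = 0; k < count; k++) {
        if (table[k].arg_tag == arg.tag) {
            *result = table[k].fptr(arg);   /* the indirect call */
            return 0;
        }
    }
    return -1;
}

int apply_twice(value arg, value *result)
{
    return apply_generic(twice_methods, twice_method_count, arg, result);
}
```

Nothing in the lookup needs a JIT: the only requirement is that the target functions were compiled and the table was populated before the call happens.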

@tshort
Contributor Author

tshort commented Jun 19, 2019

I don't understand the notion that jl_apply_generic can't work.

I think of it more as "can't easily work" (at least by me at my state of understanding). I don't understand how to generate or store the table and the functions it points to. I don't think the functions that are pointed to are compiled as part of jl_create_native(). I'm probably missing something...

@tshort tshort mentioned this pull request Jun 19, 2019
@JeffBezanson
Member

I don't understand how to generate or store the table and the functions it points to.

We already do that in the system image, so it's possible...

@JeffBezanson
Member

Let me address some of the goals of this:

  • Avoid runtime codegen?

There are three nascent mechanisms related to this that could be developed further:

  • You can pass --compile=no or --compile=min to disable the JIT.
  • You can build a system image with --compile=all and we'll attempt to exhaustively compile everything, such that --compile=no can work in a subsequent run.
  • You can change the build-time variable JULIACODEGEN to exclude LLVM codegen from libjulia entirely. This is not tested so probably needs some attention.

  • Avoid paying for the size of the standard sysimage?

https://www.youtube.com/watch?v=4NHJqGA6fTw
"Since the dawn of time mankind hath sought to make things smaller"

The easiest way to do this currently is to remove the stdlibs from the sysimg build (base/sysimg.jl). To improve further, I suspect we need some kind of tree-shaking mechanism that tries to remove everything that won't be used at run time (e.g. global bindings that are never referenced). Of course that can't work in general (e.g. if a program calls eval) but can be addressed per-application when needed.
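
As a rough picture of what such a tree-shaking pass does, the C sketch below marks every binding reachable from a set of entry points and treats the rest as removable. It is only a generic reachability sketch under assumed data structures (`binding`, `tree_shake`, and the fixed `MAX_REFS` limit are invented here), not a description of any existing Julia mechanism.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REFS 4

/* A binding (function or global) and the bindings it references, by index. */
typedef struct {
    const char *name;
    size_t nrefs;
    size_t refs[MAX_REFS];
} binding;

/* Depth-first mark of everything reachable from binding idx. */
static void mark_live(const binding *all, size_t count, size_t idx, bool *live)
{
    assert(idx < count);
    if (live[idx])
        return;                 /* already visited; also stops cycles */
    live[idx] = true;
    for (size_t k = 0; k < all[idx].nrefs; k++)
        mark_live(all, count, all[idx].refs[k], live);
}

/* Fills live[] for everything reachable from the roots and returns how
 * many bindings are kept; anything left false could be dropped. */
size_t tree_shake(const binding *all, size_t count,
                  const size_t *roots, size_t nroots, bool *live)
{
    assert(all != NULL && roots != NULL && live != NULL);
    for (size_t i = 0; i < count; i++)
        live[i] = false;
    for (size_t r = 0; r < nroots; r++)
        mark_live(all, count, roots[r], live);
    size_t kept = 0;
    for (size_t i = 0; i < count; i++)
        if (live[i])
            kept++;
    return kept;
}
```

The eval caveat shows up here as a question of roots: a program that can evaluate arbitrary code may reach any binding, so the root set would have to be widened per application.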

@tshort
Contributor Author

tshort commented Jun 19, 2019

On avoiding the runtime codegen, @Keno did all that in julia-wasm.

Regarding size, the cut-down "PackageCompiler" approach is interesting. It's not clear to me how the tree shaking would work. Another issue with that is if/how it would support cross-compilation. With the jl_create_native() approach, that is straightforward a la CUDAnative.

@tshort tshort closed this Jun 27, 2019
@tshort
Contributor Author

tshort commented Jun 27, 2019

Closing as core developers have suggested that a better approach is through a PackageCompiler / static compilation approach. A key to small code size will be the tree shaking.

@tkoolen
Contributor

tkoolen commented Jun 27, 2019

Thanks for meta-shaking the AOT compilation tree anyway!

@datnamer

datnamer commented Jul 1, 2019

@tshort can that path also potentially support minimal runtime targets like WASM and embedded?

@Keno
Member

Keno commented Jul 1, 2019

WASM is not necessarily a minimal runtime target. In any case, that'll be the right place to start. We can come back here and add any missing features to Julia as necessary.
