suggestions for faster startup #260
Man, you really love mmap :) Not unjustified, to be sure. But compressing the trees will eliminate most heap objects and shrink the size of the image file, so it will probably be fine just to load it.
Startup seems ok to me now. Jeff has implemented the tree compression, and although startup improved, it is not instantaneous. Do we know where the time is spent in startup, and which of the suggestions above may help?
It's certainly code generation. Part julia->llvm and part llvm->native. The second one is probably bigger since it runs optimization passes. In fact, disabling llvm optimization passes cuts 0.3s off startup time for me.
One thing to do is experiment with removing optimization passes (codegen.cpp:1990), and see what can be removed without hurting performance.
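For concreteness, a sketch of that experiment, assuming the LLVM 3.0-era FunctionPassManager API; the pass list here is illustrative, not Julia's actual configuration in codegen.cpp:
```cpp
// Sketch: build a FunctionPassManager with a reduced pass list, then time
// startup with individual passes commented in and out.
#include "llvm/PassManager.h"
#include "llvm/Transforms/Scalar.h"

using namespace llvm;

static FunctionPassManager *makeReducedFPM(Module *M) {
    FunctionPassManager *FPM = new FunctionPassManager(M);
    // Cheap cleanups: likely worth keeping.
    FPM->add(createCFGSimplificationPass());
    FPM->add(createInstructionCombiningPass());
    // More expensive passes: candidates to drop and re-benchmark.
    // FPM->add(createReassociatePass());
    // FPM->add(createGVNPass());
    FPM->doInitialization();
    return FPM;
}
```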
While that's helpful, it's never going to get us to really instant startup. Your analysis suggests that what we really need to get there is storing pre-generated machine code in the startup image.
As the standard library keeps growing, startup will continue to become slower. It seems that Stefan's suggestion of pre-generation is the right one. Do notice how building sys.ji has become so slow.
FWIW I just tried Julia and this is the first thing I noticed:
```
$ time julia hello.j
real    0m2.259s
$ cat /proc/cpuinfo | grep model
```
It seems excessively slow. One thing that gives a good first impression about node.js is that it starts up fast -- faster than Python or Ruby. And you can use it in shell scripts. The big mistake in Python and Ruby is that they allow arbitrary code at import time of any module, so big programs often take multiple seconds to even get to main(). Same with C++: in a large code base, static initialization before main() often takes multiple seconds. It also leads to all sorts of annoying language specification issues. I think Dart has some concept of immutable modules (no top-level mutable state?) that might address this, but I haven't found any details (fast application startup is a design goal: http://www.dartlang.org/docs/technical-overview/index.html#goals). It is pretty easy to get this wrong. But I'm excited about Julia; it's amazingly full-featured for an initial release.
Yep. Startup is slow. It's a serious annoyance. Obviously we want it to be lightning fast, but it's hard when you're doing all that JITing. I think the conclusion at this point is that we maybe want the ability to make compiled binaries, which would allow the repl itself to be compiled and start up faster. That's essentially equivalent to storing pre-generated machine code (that's basically what a binary is).
Once we modularize our libraries, we may not need to load the entire world on startup. We can also try to get to the prompt earlier and let stuff happen in the background for a few seconds. -viral
Suggestion: move everything into a module, and then provide a .juliastartup file. Here is my .pystartup file:
```
$ cat ~/.pystartup
$ python
```
Now I don't have to "import os" when starting an interactive Python interpreter, at the expense of slightly slower startup. I could have also done "from os import *" to import everything into the main namespace. When I run a Julia program that does print("hello"), or a unit test, I would like it if it doesn't compile any FFT functions (I presume it's compiling everything in http://julialang.org/manual/standard-library-reference/). You could also provide a variable for the .juliastartup to tell if it's running a program in batch mode or an interactive prompt. And then users can add a bunch of their own stuff if they want everything loaded. When there is a module system, I imagine that the code will be compiled on import. So if everything is moved into a module, it will mostly solve the startup time problem without having to speed up compilation itself. I think there is a bit too much in the global namespace now.
It's not actually compiling the entire standard library, just methods needed at startup and their dependencies. This does touch a large amount of code since we use regexes, various data structures, etc., but not stuff like FFTs.
Yes, that is what I have been thinking as well. Once we have support for modules (soon enough), we should be able to move most of the stuff into modules. For that reason, I am holding off on any major refactoring of the library code.
@andychu: I should point out that "bare" Julia with no imports is very different from "bare" Python with no imports. Bare Julia literally doesn't even have the ability to add or print integers, let alone floats or strings. That's because almost all functionality is implemented in Julia itself instead of in C code that's pre-compiled and always available.
I see that most of the time is spent compiling Julia->LLVM and then LLVM->native. With sys.ji most of the LLVM code is cached, but at this point caching the native code is more important. Most of the standard library does not change, so why can't we compile sys.ji into a sys.so? From my experience with LLVM, compiling LLVM bitcode into a .so is extremely simple. Loading shared libraries is very fast and does not require jumping through hoops. Instant start-up is also essential if we want to use Julia to interface with other unix applications through standard shell scripting. It is also essential when developing and debugging code, since functions and types in Julia are immutable and every change means a restart.
Again, why can't we compile everything into a .so file and let the system handle mmap (and function look-up) for us?
Nobody said we can't compile to a .so. In fact that's exactly what we're talking about doing; there have been several threads on the topic.
@salehqt: The main impediment here is that we use LLVM's JIT infrastructure, which doesn't generate bitcode. It's unclear what the best way to generate .so files from Julia code is. One option would be to port our JIT over to MCJIT, which seems to basically generate a .so in memory and then use it. If you've got any expertise in generating .so files from jitted code, it would be quite welcome.
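As a side note, later MCJIT releases grew a hook for exactly this kind of caching: a rough sketch using llvm::ObjectCache (added around LLVM 3.3; the signatures shown roughly match the 3.6-era API and varied between releases), which hands you the in-memory object file so it can be written to disk and reloaded later:
```cpp
// Sketch: capture MCJIT's generated object image so it can be cached on disk
// and fed back on the next run, skipping codegen entirely.
#include "llvm/ExecutionEngine/ObjectCache.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/FileSystem.h"
#include "llvm/Support/MemoryBuffer.h"
#include "llvm/Support/raw_ostream.h"

class DiskObjectCache : public llvm::ObjectCache {
public:
    // Called by MCJIT after it compiles a Module to an object image.
    void notifyObjectCompiled(const llvm::Module *M,
                              llvm::MemoryBufferRef Obj) override {
        std::error_code EC;
        llvm::raw_fd_ostream Out(M->getModuleIdentifier() + ".o", EC,
                                 llvm::sys::fs::F_None);
        if (!EC)
            Out << Obj.getBuffer();
    }

    // Called by MCJIT before compiling; a non-null return skips codegen.
    std::unique_ptr<llvm::MemoryBuffer>
    getObject(const llvm::Module *M) override {
        auto Buf = llvm::MemoryBuffer::getFile(M->getModuleIdentifier() + ".o");
        return Buf ? std::move(*Buf) : nullptr;
    }
};
// Wired up via ExecutionEngine::setObjectCache(&cache).
```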
In sum, there is no need to replace the current JIT. The best way to do it is to use a .so file as a cache. In my experience with LLVM (implementing a toy language), JIT compiling code is a shortcut for writing bitcode, compiling it to a .so, and reading it back into memory using dlopen. I think .so file generation is only useful when compiling standard libraries or a package, and should be restricted to that. The difficult part is using the .so file as a cache: basically, the JIT compiler should check whether a compiled version of a function can be found in the .so, and use it before JITing the function. .so files are also efficient data structures; one could put all kinds of metadata, hash tables, and even Julia AST inside the .so file, so one .so file would represent a complete Julia package (a similar approach is used in .NET DLLs). A sample use case would be:
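A minimal sketch of the cache-lookup half of that flow, assuming plain POSIX dlopen/dlsym; the names here (jl_cache_handle, mangled_name) are hypothetical, purely for illustration:
```cpp
// Sketch: before JIT-compiling a function, probe a previously built cache .so
// for a precompiled version. A real implementation needs a stable,
// collision-free symbol-mangling convention for Julia methods.
#include <dlfcn.h>
#include <string>

static void *jl_cache_handle = nullptr;  // hypothetical global cache handle

void open_native_cache(const char *path) {
    // RTLD_NOW resolves all symbols up front so failures surface early.
    jl_cache_handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
}

// Returns a cached function pointer, or nullptr to fall back to the JIT.
void *lookup_cached_fptr(const std::string &mangled_name) {
    if (!jl_cache_handle)
        return nullptr;
    return dlsym(jl_cache_handle, mangled_name.c_str());
}
```
If dlsym misses, the normal JIT path runs as before, so the .so stays a pure cache rather than a replacement for the JIT.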
How do you tell the JIT to put the code in a .so?
We're all sold on the idea; I just don't think it's nearly as simple as you're making it out to be. But I'd be extremely happy to find out I'm wrong.
All the work is in arranging things in the runtime and startup so that the
The challenge is not with the JIT storing the bitcode, as much as the dynamic nature of the language. But as Jeff said, most of the heavy lifting is already done.
This is the prototype of the function that writes bitcode; on the same page there is one that reads bitcode from a file. However, I haven't seen any API that saves the native code after it was generated by the ExecutionEngine. What I was suggesting was to save the bitcode and then run an offline compiler (llc & gcc) to generate executable code. This is my code that does this (in Ruby).
This is why it is not really useful for the interpreter, and it should be used for offline compiling of standard libraries and other big packages, e.g. generating sys.so instead of sys.ji.
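For reference, a sketch of that write-then-compile-offline flow, assuming a roughly LLVM 3.6-era C++ API (llvm::WriteBitcodeToFile is the standard bitcode-writing entry point; the shell steps in the comments mirror the llc & gcc pipeline described above):
```cpp
// Sketch: serialize an in-memory Module to a bitcode file for offline
// compilation. The offline steps, run outside the process, would be roughly:
//   llc -filetype=obj -relocation-model=pic sys.bc -o sys.o
//   gcc -shared sys.o -o sys.so
#include "llvm/Bitcode/ReaderWriter.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/FileSystem.h"
#include "llvm/Support/raw_ostream.h"

bool write_module_bitcode(const llvm::Module *M, const char *path) {
    std::error_code EC;
    llvm::raw_fd_ostream Out(path, EC, llvm::sys::fs::F_None);
    if (EC)
        return false;
    llvm::WriteBitcodeToFile(M, Out);  // pointer form in the 3.x API
    return true;
}
```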
Unfortunately, that's not helpful in this situation – if we had an offline compiler that could generate LLVM bitcode, the problem would already be solved.
After examining the code, I can see that it is quite possible with minimal changes. Like Jeff said, all the pieces are there; the .so can be used just as another level of caching. The solution would be calling jl_compile on all regular functions and jl_compile_hint on generic functions to generate all the LLVM code. Once the code is generated, it can be saved to a file for offline compilation, and the bitcode can be compiled to a shared library. At Julia start-up, the shared library is loaded alongside the system image; jl_compile then checks for the existence of a compiled function using dlsym before generating a new one. This would avoid all the Julia->LLVM->native JIT compilation that takes up most of the start-up cost.
A fine description of the easy part of this work.
@salehqt, could you implement your approach for a package to see how it works? That might be an interesting exercise, and there are packages that could use faster start-up times.
Implementing this requires changing the internals of Julia, and it can break a lot of things. In its current state, Julia is implemented like C: all of the includes are imported into one big soup of AST and compiled on demand to LLVM. The resulting LLVM module is also a soup of everything that has been compiled. Julia modules and namespaces are merely scoping constructs and do not really separate the code. I was hoping to just compile a native system image to accompany the current system image (which only contains Julia AST). Later on, other developers could modularize Julia and make separate .so modules. The first step in modularizing would be compiling packages to AST and storing them as binaries, so Julia doesn't end up with the problems RubyGems had: loading and parsing too many source files at the start-up of real applications. In my experience with Julia, the front-end is one of its weakest points, and you cannot rely on it for fast start-up. This is my testing implementation. (does not really work)
Of course they are merely scoping constructs. Try disabling inlining of functions in different modules and see how well that goes over.
Unfortunately, turning inlining off will absolutely destroy performance.
Using the --no-history and -f flags, start-up takes ~0.6 seconds less. It would be great to have a --script or --batch flag optimized for running scripts in this way.
I see essentially no difference on my MacBook Air (commit 2356fb8):
@diegozea do you have a very long history file?
Yes, it's because it is longer...
Do I get to close this now? (#4898)
Yes
I dare say you've earned it. Please do the honors, @vtjnash.
Ideas for faster startup:
With all of these we may be able to get instantaneous startup.