suggestions for faster startup #260
Man, you really love mmap :) Not unjustified, to be sure. But compressing the trees will eliminate most heap objects and shrink the size of the image file, so it will probably be fine just to load it.
Startup seems ok to me now. Jeff has implemented the tree compression, and although startup improved, it is not instantaneous. Do we know where the time is spent in startup, and which of the suggestions above may help?
It's certainly code generation. Part julia->llvm and part llvm->native. The second one is probably bigger since it runs optimization passes. In fact, disabling llvm optimization passes cuts 0.3s off startup time for me.
One thing to do is experiment with removing optimization passes (codegen.cpp:1990), and see what can be removed without hurting performance.
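For concreteness, a sketch of that experiment, assuming the LLVM 3.0-era FunctionPassManager API; the pass list here is illustrative, not Julia's actual configuration in codegen.cpp:
```cpp
// Sketch: build a FunctionPassManager with a reduced pass list, then time
// startup with individual passes commented in and out.
#include "llvm/PassManager.h"
#include "llvm/Transforms/Scalar.h"

using namespace llvm;

static FunctionPassManager *makeReducedFPM(Module *M) {
    FunctionPassManager *FPM = new FunctionPassManager(M);
    // Cheap cleanups: likely worth keeping.
    FPM->add(createCFGSimplificationPass());
    FPM->add(createInstructionCombiningPass());
    // More expensive passes: candidates to drop and re-benchmark.
    // FPM->add(createReassociatePass());
    // FPM->add(createGVNPass());
    FPM->doInitialization();
    return FPM;
}
```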
While that's helpful, it's never going to get us to really instant startup. Your analysis suggests that what we really need to get there is storing pre-generated machine code in the startup image.
As the standard library keeps growing, startup will continue to become slower. It seems that Stefan's suggestion of pre-generation is the right one. Do notice how building sys.ji has become so slow.
FWIW I just tried Julia and this is the first thing I noticed:
```
$ time julia hello.j
real    0m2.259s
$ cat /proc/cpuinfo | grep model
```
It seems excessively slow. One thing that gives a good first impression about node.js is that it starts up fast -- faster than Python or Ruby. And you can use it in shell scripts. The big mistake in Python and Ruby is that they allow arbitrary code at import time of any module, so big programs often take multiple seconds to even get to main(). Same with C++: in a large code base, static initialization before main() often takes multiple seconds. It also leads to all sorts of annoying language specification issues. I think Dart has some concept of immutable modules (no top-level mutable state?) that might address this, but I haven't found any details (fast application startup is a design goal: http://www.dartlang.org/docs/technical-overview/index.html#goals). It is pretty easy to get this wrong. But I'm excited about Julia; it's amazingly full-featured for an initial release.
Yep. Startup is slow. It's a serious annoyance. Obviously we want it to be lightning fast, but it's hard when you're doing all that JITing. I think the conclusion at this point is that we maybe want the ability to make compiled binaries, which would allow the repl itself to be compiled and start up faster. That's essentially equivalent to storing pre-generated machine code (that's basically what a binary is).
Once we modularize our libraries, we may not need to load the entire world on startup. We can also try to get to the prompt earlier and let stuff happen in the background for a few seconds. -viral
Suggestion: move everything into a module, and then provide a .juliastartup file. Here is my .pystartup file:
```
$ cat ~/.pystartup
$ python
```
Now I don't have to "import os" when starting an interactive Python interpreter, at the expense of slightly slower startup. I could have also done "from os import *" to import everything into the main namespace. When I run a Julia program that does print("hello"), or a unit test, I would like it if it doesn't compile any FFT functions (I presume it's compiling everything in http://julialang.org/manual/standard-library-reference/). You could also provide a variable for the .juliastartup to tell if it's running a program in batch mode or an interactive prompt. And then users can add a bunch of their own stuff if they want everything loaded. When there is a module system, I imagine that the code will be compiled on import. So if everything is moved into a module, it will mostly solve the startup time problem without having to speed up compilation itself. I think there is a bit too much in the global namespace now.
It's not actually compiling the entire standard library, just methods needed at startup and their dependencies. This does touch a large amount of code since we use regexes, various data structures, etc., but not stuff like FFTs.
Yes, that is what I have been thinking as well. Once we have support for modules (soon enough), we should be able to move most of the stuff into modules. For that reason, I am holding off on any major refactoring of the library code.
@andychu: I should point out that "bare" Julia with no imports is very different from "bare" Python with no imports. Bare Julia literally doesn't even have the ability to add or print integers, let alone floats or strings. That's because almost all functionality is implemented in Julia itself instead of in C code that's pre-compiled and always available.
I see that most of the time is spent compiling Julia->LLVM and then LLVM->native. With sys.ji most of the LLVM code is cached, but at this point caching the native code is more important. Most of the standard library does not change, so why can't we compile sys.ji into a sys.so? From my experience with LLVM, compiling LLVM bitcode into a .so is extremely simple. Loading shared libraries is very fast and does not require jumping through hoops. Instant start-up is also essential if we want to use Julia to interface with other unix applications through standard shell scripting. It is also essential when developing and debugging code, since functions and types in Julia are immutable and every change means a restart.
Again, why can't we compile everything into a .so file and let the system handle mmap (and function look-up) for us?
Nobody said we can't compile to a .so. In fact that's exactly what we're talking about doing; there have been several threads on the topic.
@salehqt: The main impediment here is that we use LLVM's JIT infrastructure, which doesn't generate bitcode. It's unclear what the best way to generate .so files from Julia code is. One option would be to port our JIT over to MCJIT, which seems to basically generate a .so in memory and then use it. If you've got any expertise in generating .so files from jitted code, it would be quite welcome.
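As a side note, later MCJIT releases grew a hook for exactly this kind of caching: a rough sketch using llvm::ObjectCache (added around LLVM 3.3; the signatures shown roughly match the 3.6-era API and varied between releases), which hands you the in-memory object file so it can be written to disk and reloaded later:
```cpp
// Sketch: capture MCJIT's generated object image so it can be cached on disk
// and fed back on the next run, skipping codegen entirely.
#include "llvm/ExecutionEngine/ObjectCache.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/FileSystem.h"
#include "llvm/Support/MemoryBuffer.h"
#include "llvm/Support/raw_ostream.h"

class DiskObjectCache : public llvm::ObjectCache {
public:
    // Called by MCJIT after it compiles a Module to an object image.
    void notifyObjectCompiled(const llvm::Module *M,
                              llvm::MemoryBufferRef Obj) override {
        std::error_code EC;
        llvm::raw_fd_ostream Out(M->getModuleIdentifier() + ".o", EC,
                                 llvm::sys::fs::F_None);
        if (!EC)
            Out << Obj.getBuffer();
    }

    // Called by MCJIT before compiling; a non-null return skips codegen.
    std::unique_ptr<llvm::MemoryBuffer>
    getObject(const llvm::Module *M) override {
        auto Buf = llvm::MemoryBuffer::getFile(M->getModuleIdentifier() + ".o");
        return Buf ? std::move(*Buf) : nullptr;
    }
};
// Wired up via ExecutionEngine::setObjectCache(&cache).
```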
In sum, there is no need to replace the current JIT. The best way to do it is to use a .so file as a cache. In my experience with LLVM (implementing a toy language), JIT compiling code is a shortcut for writing bitcode, compiling it to a .so, and reading it back into memory using dlopen. I think .so file generation is only useful when compiling standard libraries or a package, and should be restricted to that. The difficult part is using the .so file as a cache: basically, the JIT compiler should check whether a compiled version of a function can be found in the .so, and use it before JITing the function. .so files are also efficient data structures; one could put all kinds of metadata, hash tables, and even Julia AST inside the .so file, so one .so file would represent a complete Julia package (a similar approach is used in .NET DLLs). A sample use case would be:
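A minimal sketch of the cache-lookup half of that flow, assuming plain POSIX dlopen/dlsym; the names here (jl_cache_handle, mangled_name) are hypothetical, purely for illustration:
```cpp
// Sketch: before JIT-compiling a function, probe a previously built cache .so
// for a precompiled version. A real implementation needs a stable,
// collision-free symbol-mangling convention for Julia methods.
#include <dlfcn.h>
#include <string>

static void *jl_cache_handle = nullptr;  // hypothetical global cache handle

void open_native_cache(const char *path) {
    // RTLD_NOW resolves all symbols up front so failures surface early.
    jl_cache_handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
}

// Returns a cached function pointer, or nullptr to fall back to the JIT.
void *lookup_cached_fptr(const std::string &mangled_name) {
    if (!jl_cache_handle)
        return nullptr;
    return dlsym(jl_cache_handle, mangled_name.c_str());
}
```
If dlsym misses, the normal JIT path runs as before, so the .so stays a pure cache rather than a replacement for the JIT.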
How do you tell the JIT to put the code in a .so?
We're all sold on the idea; I just don't think it's nearly as simple as you're making it out to be. But I'd be extremely happy to find out I'm wrong.
All the work is in arranging things in the runtime and startup so that the
The challenge is not with the JIT storing the bitcode, as much as the dynamic nature of the language. But as Jeff said, most of the heavy lifting is already done.
This is the prototype of the function that writes bitcode; on the same page there is one that reads bitcode from a file. However, I haven't seen any API that saves the native code after it was generated by the ExecutionEngine. What I was suggesting was to save the bitcode and then run an offline compiler (llc & gcc) to generate executable code. This is my code that does this (in Ruby).
This is why it is not really useful for the interpreter, and it should be used for offline compiling of standard libraries and other big packages, e.g. generating sys.so instead of sys.ji.
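For reference, a sketch of that write-then-compile-offline flow, assuming a roughly LLVM 3.6-era C++ API (llvm::WriteBitcodeToFile is the standard bitcode-writing entry point; the shell steps in the comments mirror the llc & gcc pipeline described above):
```cpp
// Sketch: serialize an in-memory Module to a bitcode file for offline
// compilation. The offline steps, run outside the process, would be roughly:
//   llc -filetype=obj -relocation-model=pic sys.bc -o sys.o
//   gcc -shared sys.o -o sys.so
#include "llvm/Bitcode/ReaderWriter.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/FileSystem.h"
#include "llvm/Support/raw_ostream.h"

bool write_module_bitcode(const llvm::Module *M, const char *path) {
    std::error_code EC;
    llvm::raw_fd_ostream Out(path, EC, llvm::sys::fs::F_None);
    if (EC)
        return false;
    llvm::WriteBitcodeToFile(M, Out);  // pointer form in the 3.x API
    return true;
}
```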
Unfortunately, that's not helpful in this situation – if we had an offline compiler that could generate LLVM bitcode, the problem would already be solved.
After examining the code, I can see that it is quite possible with minimal changes. Like Jeff said, all the pieces are there; the .so can be used just as another level of caching. The solution would be calling jl_compile on all regular functions and jl_compile_hint on generic functions to generate all the LLVM code. Once the code is generated, it can be saved to a file for offline compilation, and the bitcode can be compiled to a shared library. At Julia start-up, the shared library is loaded alongside the system image; jl_compile then checks for the existence of a compiled function using dlsym before generating a new one. This would avoid all the Julia->LLVM->native JIT compilation that takes up most of the start-up cost.
A fine description of the easy part of this work.
@salehqt, could you implement your approach for a package to see how it works? That might be an interesting exercise, and there are packages that could use faster start-up times.
Implementing this requires changing the internals of Julia, and it can break a lot of things. In its current state, Julia is implemented like C: all of the includes are imported into one big soup of AST and compiled on demand to LLVM. The resulting LLVM module is also a soup of everything that has been compiled. Julia modules and namespaces are merely scoping constructs and do not really separate the code. I was hoping to just compile a native system image to accompany the current system image (which only contains Julia AST). Later on, other developers could modularize Julia and make separate .so modules. The first step in modularizing would be compiling packages to AST and storing them as binaries, so Julia doesn't end up with the problems RubyGems had: loading and parsing too many source files at the start-up of real applications. In my experience with Julia, the front-end is one of its weakest points, and you cannot rely on it for fast start-up. This is my testing implementation. (does not really work)
Of course they are merely scoping constructs. Try disabling inlining of functions in different modules and see how well that goes over.
Unfortunately, turning inlining off will absolutely destroy performance.
Using the --no-history and -f flags, start-up takes ~0.6 seconds less. It would be great to have a --script or --batch flag optimized for running scripts in this way.
I see essentially no difference on my MacBook Air (commit 2356fb8):
@diegozea do you have a very long history file?
Yes, it's because it is longer...
Do I get to close this now? (#4898)
Yes
I dare say you've earned it. Please do the honors, @vtjnash.
Ideas for faster startup:
With all of these we may be able to get instantaneous startup.