[WIP] Add Base.build_sysimg() #7124
Conversation
What if the system image files are installed in a read-only location (for the user), say on Linux, and are managed by a package manager? I think this question was asked already.
why does this need a full-blown […]? This should probably be a future pull-request, but it would also be cool to have it make a […]
Would be nice if building the .ji sysimg could be performed regardless of whether or not a compiler/linker can be found. Only a handful of Windows users will be able to use this otherwise. There is the .bat file for this, but it would be cool if as much as possible could be done through the same command on all platforms.
This is great, but it looks like it should be an external script. You don't really need to be able to call it as part of the standard library.
Maybe a package? I feel this could be easier to maintain in julia, since you don't have to worry about bash-isms and cmd-isms.
The LLVM lld project is still a WIP; however, shipping […]
@vtjnash: What's the size of […]? We cannot use the pure […]. It seems that this is a very crucial thing to solve in order to allow precompilation of arbitrary packages on the user side. So if shipping a linker is not too much of a problem, I think it is absolutely justified. Or is there some other master plan that I missed for enabling package precompilation?
You can build the system image wherever you want, so you could put it in […]
Good point; I've switched it over to […]
You're thinking ahead a few steps and wanting to make stand-alone Julia programs that are simply linked against […]
Julia won't start up without a […]
It certainly seems that way right now, but I have a couple of reasons for not wanting to make it an external command: […]
I was thinking this could also replace the use case of […]
Is there some drawback to using the […]?
The implementation in julia is totally fine; I just want to run it as […]
Ah, I see. I agree; I'll put it in […]
Alright, I've pushed a new version of the commit up. I'm going to copy-paste the commit message here:

Support building of system image in binary builds

This commit adds a few new pieces of functionality:
When testing this change out, I found this gist helpful to put into my `~/.juliarc.jl`.
If I had to give one criticism about this, however, it's that I've been unable to find an instance where the difference between an […]
I even added in support for […]
It also surprises me that the […]
This sounds really great, but I don't understand what happens by default: is a native image built and loaded automatically on subsequent starts? Do you need to build the image manually? As I see it, most people are going to use distribution packages, in which case it is great to be able to build specialized images in addition to the […]. (But indeed the fact that […])
Do you have a Haswell CPU? If so, you're seeing #7155.
```c
const char * cpu_target = jl_compileropts.cpu_target;
if (strcmp(cpu_target, "native") == 0)
    cpu_target = "";
```
It might be that an empty target string defaults to i386 as a lowest common denominator and not to native.
The empty string is what we were passing before, but I'm definitely not knowledgeable about LLVM, so if you've got documentation you can share, I'd love to go over it. I can't find much about `setMCPU()`, but then again I don't really know where to look.
I tried to find some reliable documentation, and the only thing I know is that clang defaults to i386 if neither -march nor -mcpu is given.
This isn't clang though. This is MCJIT, which defaults to cpuid (from TargetSelect.cpp):

```cpp
Triple TheTriple(TargetTriple);
if (TheTriple.getTriple().empty())
    TheTriple.setTriple(sys::getProcessTriple());
```
But isn't TargetTriple -march?
In trunk MCPU is just passed through to createTargetMachine (compare https://github.com/llvm-mirror/llvm/blob/0b6cb7104b15504cd41f48cc2babcbcee70775f3/lib/ExecutionEngine/TargetSelect.cpp#L100)
And the question is what the backend is going to do with it. I haven't found the implementation for that yet.
And the TargetTriple just makes sure that the right bitcode is created for x86_64 vs x86 and Linux vs Mac vs Windows, while MCPU sets the capabilities of the CPU.
But that's my point: with MCPU we are setting the CPU, not the target triplet, and also not MARCH.
I think we should probably set it like here: https://github.com/llvm-mirror/llvm/blob/master/unittests/ExecutionEngine/MCJIT/MCJITTestBase.h#L328
@vchuravy if you can give me example lines to put in, I'm willing to experiment. We've already got some pretty good evidence that the `.setMCPU("")` call is causing AVX instructions to be emitted (whereas `.setMCPU("i386")` is stopping at sse2), but perhaps there are other considerations to be taken into account.
As far as I can tell, `.setMCPU(sys::getHostCPUName())` has the same effect as `.setMCPU("")`. Indeed, it was when I discovered that `sys::getHostCPUName()` was returning a generic x86 string for a "Haswell" processor that I worked on implementing #7155.
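The Haswell mis-detection discussed here ultimately comes down to CPUID feature reporting. As a hedged, self-contained illustration (the helper is mine, not Julia or LLVM code), the AVX feature is reported in bit 28 of the ECX word returned by CPUID leaf 1:

```python
# Hedged sketch: decoding the AVX feature flag from a CPUID leaf-1 ECX word.
# Bit 28 == AVX is architecturally defined; the sample values are made up.
AVX_BIT = 28

def has_avx(ecx: int) -> bool:
    """Return True if the AVX bit is set in a CPUID leaf-1 ECX value."""
    return bool(ecx & (1 << AVX_BIT))

print(has_avx(0x10000000))  # True: only bit 28 set
print(has_avx(0x00000000))  # False: no feature bits set
```

Host-CPU detection such as `sys::getHostCPUName()` consumes bits like these; when the mapping from feature bits to a CPU name is incomplete, you get the generic-x86 fallback described above.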
Yes @ArchRobison is correct. If you still don't believe me, I'd encourage you to step through the function you referenced in a debugger to see the control flow.
I do see an effect.
```
~/src/analyze-x86/analyze-x86 sys_i386.o
instructions:
 cpuid: 0  nop: 42077  call: 0  count: 490683
 i486: 1
 i686: 1104
 mmx: 7625
 sse: 8915
 sse2: 482

~/src/analyze-x86/analyze-x86 sys_native_pre.o
instructions:
 cpuid: 0  nop: 264  call: 0  count: 457087
 i486: 1
 i686: 1105
 mmx: 7625
 sse: 8915
 sse2: 482

~/src/analyze-x86/analyze-x86 sys_native_post.o
instructions:
 cpuid: 0  nop: 258  call: 0  count: 459942
 i486: 1
 i686: 1107
 mmx: 7376
 sse4.2: 3
 avx: 9720
```
Where I applied the following change to codegen.cpp around line 4330:

```cpp
if (strcmp(cpu_target, "native") == 0)
    cpu_target = sys::getHostCPUName().data();
```
The sysimages were generated by:

```
../julia -C i386 --build /tmp/sys_i386 sysimg.jl
../julia -C native --build /tmp/sys_native_pre sysimg.jl
```

then, after applying the change and rebuilding julia:

```
../julia -C native --build /tmp/sys_native_post sysimg.jl
```
I am on a second-generation i5 (Sandy Bridge?), and my architecture should be something like corei7-avx.
So for me there is a definite change when I specify my host architecture, and I am also seeing LLVM defaulting to i386.
Edit: Oh, and I am on LLVM trunk, if that matters.
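The analyze-x86 tallies above amount to binning disassembled mnemonics by the ISA extension that introduced them. A minimal sketch of that idea (the mnemonic table here is a tiny hand-picked illustration, not analyze-x86's actual database):

```python
from collections import Counter

# Illustrative subset mapping a mnemonic to the ISA extension that
# introduced it; a real tool would carry a full instruction table.
ISA_INTRO = {
    "cmpxchg": "i486",
    "cmova": "i686",
    "movq": "mmx",
    "movaps": "sse",
    "movapd": "sse2",
    "vaddps": "avx",
}

def classify(mnemonics):
    """Count mnemonics per ISA extension; unknown ones count as 'base'."""
    return Counter(ISA_INTRO.get(m, "base") for m in mnemonics)

counts = classify(["mov", "movaps", "movaps", "movapd", "vaddps"])
print(counts["sse"], counts["sse2"], counts["avx"], counts["base"])  # 2 1 1 1
```

Seeing `avx` appear in the post-patch image but not the pre-patch one is exactly the kind of signal such a histogram surfaces.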
@nalimilan What happens by default (e.g. if you run […])? That's the main problem with shipping multiple system images right now: you would have to supply the […]. @Keno doh! I read that issue, and I was like "Haswell... that seems pretty recent, so that can't be what I have!" Turns out it is! I applied ArchRobison's patch, rebuilt LLVM, and I now get more interesting results from […]
Unfortunately, I still don't see much of a difference in benchmarks. The only one that shows any kind of difference is […]
@staticfloat Images could be called […]
Or we could try to do what OpenBLAS does for dynamic arch. That would probably be a lot more work than having multiple libraries and loading the right one.
Until we know for certain that there is a measurable performance difference between cpu targets in the system image, this is all moot. If we can ship an […]
I'm not sure I understood you correctly, but from the discussion above I got the feeling that the CPU target of the image determines the instruction sets that are enabled when compiling all further code. This would mean that with an […]
BTW, regarding @staticfloat's comment #7124 (comment): the […]
(Though strictly speaking, i486 instructions should not be used if somebody wanted to run Julia on a true 80386 machine... designed when I wasn't even born. :-)
Ah, that makes perfect sense. I currently have a shortage of 32-bit processors around me, so I didn't notice. ;) I also think that if someone wants to run Julia on an 80386, I will build a custom binary for them myself.
Yes, that is what I expect, and what is shown by […]
I am surprised by that analysis. When I implemented that option, I ran the perf test and noticed about a 20% performance drop on the micro benchmarks when using "core2" as opposed to "native".
Here are my results, from which I draw this analysis. First, build this branch (I've rebased it on top of ArchRobison's LLVM patch, but you'll have to rebuild LLVM to reap the benefits of that) and create three separate system images:
Next, run the […]
Note that I have the aforementioned gist as my `~/.juliarc.jl`.
@Keno when you get a free moment, I'd love to hear your input on reasons why I might not be seeing any differences between cpu architectures.
This commit adds a few new pieces of functionality:

* The `contrib/build_sysimg.jl` script, which builds a new system image. This method can save the system image wherever the user desires; e.g. it could be stored in `~/.julia` to allow for per-user system images, each customized with packages in their own `userimg.jl`. Or, on a restricted system, this allows for creation of a system image without root access.
* The removal of compile-time `JULIA_CPU_TARGET`, in favor of runtime `--cpu-target`/`-C` command-line flags which default to `"native"` but can be set to `"native"`, `"i386"` or `"core2"`. This allows the creation of a system image targeting user-supplied cpu features, e.g. `cd base; ../julia -C i386 --build /tmp/sys_i386 sysimg.jl`.
* I implemented runtime selection of the cpu target by adding a new member to the `jl_compileropts_t` structure called `cpu_target`.
* Because all julia executables are now created equal (rather than before, where a julia executable needed to have the same `JULIA_CPU_TARGET` set internally as the system image had when it was built), we need to know what CPU feature set the system image is targeting before we initialize code generation. So a new function `jl_get_system_image_cpu_target()` is exported, which does exactly what it sounds like.
* I added newlines to the end of a few error messages.
* I found an old parser option `-T` which hadn't been removed yet, so I took the opportunity to do so.

When testing this change out, I found [this gist](https://gist.github.com/staticfloat/93d7050a08ff7bb52373) helpful to put into my `~/.juliarc.jl`.
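To make the new flag's behavior concrete, here is a hedged sketch of the resolution the commit message implies. The helper and its validation are hypothetical (not code from this PR); the accepted names come from the commit message, and "native" resolves to a detected host CPU name in the spirit of the codegen.cpp change discussed in the review:

```python
# Hypothetical helper: resolve a --cpu-target/-C flag value.
# Accepted names come from the commit message; everything else is assumption.
KNOWN_TARGETS = {"native", "i386", "core2"}

def resolve_cpu_target(flag: str, host_cpu: str) -> str:
    if flag not in KNOWN_TARGETS:
        raise ValueError(f"unknown cpu target: {flag!r}")
    # "native" resolves to the detected host CPU (cf. sys::getHostCPUName()).
    return host_cpu if flag == "native" else flag

print(resolve_cpu_target("native", "corei7-avx"))  # corei7-avx
print(resolve_cpu_target("i386", "corei7-avx"))    # i386
```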
Rebased on top of recent tweaks to […]
This is basically just a julia translation of the makefile targets for generating `sys.$(dlext)` from `sys.ji`. I'm sure there are border cases where this won't work, but I've tested it successfully on 64-bit Linux and OSX 10.9.

My thinking behind this is that we can then ship a `sys.ji` that has been created with `.setMCPU("x86")`, or `.setMCPU("core2")`, or whatever, but after we've started up, looked around, and decided that it is possible to regenerate the system image, we can build a native system image that should hopefully be faster.

@Keno (nice!), @vtjnash, @JeffBezanson what do you guys think of this? If this general approach is looked upon with favour, I will add on to this PR with commits to make the `.setMCPU()` calls a little more dynamic. As it stands now, if we distribute a `.setMCPU("i386")` binary, even if we rebuild the system image like this, we don't get any benefit, since the binary already has these restrictions baked in.

In the meantime, this branch can be tested out by installing Julia (e.g. `make install prefix=/my/temp/prefix`), deleting `sys.{so,dll,dylib}`, and then opening Julia and running `Base.build_sysimg()`. This should "just work", or print out intelligent errors when something doesn't work. I think something like this would be much nicer for our "power users" than manually mucking around with system images and such, and it even allows users to play around with their `userimg.jl` to get package insta-loading, regardless of whether they built from source or not.
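One platform-specific detail the description glosses over is picking the shared-library extension (the "dlext" in `sys.$(dlext)`). A hedged sketch of that mapping; the function name and its fallback are mine, not the PR's actual logic:

```python
import sys

def dlext(platform: str = sys.platform) -> str:
    """Map a platform identifier to its shared-library extension."""
    if platform.startswith("linux"):
        return "so"
    if platform == "darwin":
        return "dylib"
    if platform in ("win32", "cygwin"):
        return "dll"
    return "so"  # fallback assumption for other Unixes

print(dlext("linux"), dlext("darwin"), dlext("win32"))  # so dylib dll
```

This is why the testing instructions above mention deleting `sys.{so,dll,dylib}`: the filename of the built image differs per platform even though the workflow is the same.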