Precompile interface file option (to better parallelize the build) #174
First of all, let me list the current limitations to help you compare things in a fair way:
Interesting idea, I never thought about this. I think we can add support for it in the new build system without much difficulty. At the moment we have the following rule:

```haskell
matchBuildResult buildPath "hi" ?> \hi ->
    need [ hi -<.> osuf (detectWay hi) ]
```

Basically it says: if you'd like to get a .hi file, you need to build the corresponding .o file first (the interface is produced as a by-product).
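A rough sketch (not actual Hadrian code) of what a rule that builds the interface directly could look like, assuming a simple `src/` to `_build/` layout and plain GHC flags; package flags, ways and include paths are omitted:

```haskell
import Development.Shake
import Development.Shake.FilePath

-- Hypothetical rule: produce a .hi file by type-checking only,
-- instead of demanding the corresponding .o file first.
interfaceRules :: Rules ()
interfaceRules =
    "_build//*.hi" %> \hi -> do
        -- Assumed layout: _build/Foo/Bar.hi comes from src/Foo/Bar.hs.
        let hs = "src" </> (dropDirectory1 hi -<.> "hs")
        need [hs]
        -- -fno-code skips code generation; -fwrite-interface still emits
        -- the interface; -ohi chooses where it is written.
        cmd_ "ghc" ["-c", "-fno-code", "-fwrite-interface", hs, "-ohi", hi]
```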
@snowleopard yes, that's exactly what I had in mind. The only problem here is that probably for interface files to stay intact or applicable for *.o compilation you need to use exactly the same compiler params like for later *.o compilation. -- my wild guess. Anyway, if there is a possibility to build just necessary hi file that easily, then I'm ready to give it a try on my heavily threaded slow sparc box next week and see if with 32 thread I get better performance. Thanks! |
@kgardas I can enforce absolutely the same build flags for One question: do I have to do |
Looks like I don't need to use |
I've added initial support for this feature. To activate it, set `compileInterfaceFilesSeparately = True`. Note: I sometimes experience the following errors that I cannot explain:
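For reference, activating it presumably amounts to flipping a single Boolean in the user settings; the flag name is the one used later in this thread, while the module it lives in is an assumption:

```haskell
-- In the user-settings module (location assumed, not confirmed here):
compileInterfaceFilesSeparately :: Bool
compileInterfaceFilesSeparately = True
```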
As you can see, this happens in the
@snowleopard First of all, thanks a lot for your fast reaction to this RFE. I'm testing the feature and twice I've seen this error:
Perhaps we stretch GHC too much? Also, what worries me a little bit is that I sometimes see messages
although I've correctly configured the system GMP...
@kgardas: see https://ghc.haskell.org/trac/ghc/ticket/11331 for that panic. They are working on it.
@snowleopard w.r.t. the GMP issue: after doing a proper `gmake clean; rm -rf .build inplace; configure` I see GMP is detected correctly, so this was perhaps some error caused by the merge and by not running cleanup/configure properly. I'll keep an eye on it anyway.
@thomie thanks! It really helps to know this.
I also see failures of GHC sometimes, especially when parallelism is high. Next time I see one, I'll check whether it is the same error. @kgardas Other than the spurious GMP issue and GHC panicking, was the build successful? Note: I've just pushed a fix to suppress another lint error related to GMP: f63e9db.
Yes, this is strange. Are you sure
@snowleopard I'll double-check and will let you know. This was on a -j12 build. Now I'm trying a simple -j1 build.
@snowleopard Complete side note; I'm not sure if this belongs in another issue or not. I usually hit the following error after `rm -rf .build inplace` and when starting the build without the -B option:
This is very reproducible. Perhaps it also points to the need for a proper way to clean with Shake...
@kgardas There is
@snowleopard It looks like the lookupVers2 GHC panic is a showstopper for me now. Anyway, on a completely clean build which was done as
so -j1 and with `compileInterfaceFilesSeparately = True` -- I'm able to see 109
the panic hits me while compiling
@kgardas I'm afraid I don't have any insight into why we see
The issue with
@snowleopard A few cases I've investigated, and the behavioural pattern is still the same:
@kgardas Ah, I see! Now I understand what's going on.
I think this is because the
We probably need to add (one of?) the following steps:
Maybe we only need to do the second step. I'll do a quick experiment and will commit a fix if it works for me.
I confirm that I'm also hit by the
This also happens when generating the interface for
I've committed some further work on this. The build has finished now, improving average parallelism from
@ezyang might have something to say about this ticket.
Just to clarify the reason behind the lint errors in the current implementation: we first create
Is there a way to disable writing of interface files in the normal mode of operation? While searching I came across
Shot in the dark: you can use -hisuf to divert them. What happens when
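For context, -hisuf is an existing GHC flag that changes the suffix used for interface files, so a type-check-only pass could write its interfaces under a different name and never clash with the real ones. A rough sketch of that idea (the file names and everything apart from -hisuf are illustrative):

```haskell
import Development.Shake

-- Illustrative only: a type-check-only pass whose interfaces are diverted
-- to Foo.hi-check via -hisuf, leaving the canonical Foo.hi to be written
-- by the normal compilation later.
typeCheckOnly :: FilePath -> Action ()
typeCheckOnly hs =
    cmd_ "ghc" ["-c", "-fno-code", "-fwrite-interface", "-hisuf", "hi-check", hs]
```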
@kgardas OK, I'm willing to believe that in certain situations code gen could dominate. Could you measure one module you think makes a difference? A useful number would help guide us. This technique trades additional cost for more parallelism, but I always want parallelism for free where possible. I wonder if GHC could be persuaded to generate the C plus the command to compile it, so that Shake could do the two pieces separately.
Obviously, typechecking can't be parallelized unless you actually go and modify GHC a bit. The hope here is to parallelize optimization and code generation. Unfortunately, the results you get optimizing here are not going to be as good as doing it the normal way. When you write out an interface for
In principle, such a system may still be useful for incremental recompilation, because avoiding unfoldings means that things recompile less when you make modifications; it also means that you can parallelize optimization and code generation (so this IS a little different from compiling with
In any case, the current implementation is a bit dodgy, because GHC does not guarantee that an
If you still want to implement this, here's how GHC could be adjusted to make this possible:
Brain wave! I think @ezyang's comments mean this isn't really going to give us what we were hoping for. But there is an alternative which might be faster in all circumstances, never noticeably slower than a normal compile, and therefore could become the default. My scheme:
You get the parallelism of compiling and codegen separately, without having to do anything funky to GHC. There's the assumption that GHC won't break if we replace the .o file afterwards (I think that's fine; compilation checking is the only possible niggle, and I think compilation checking is timestamp only), and that GHC doesn't use the .o file in that compile (I don't think that's true for stub files, but I think it's true most of the time).
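One way the fake-C-compiler / promise idea could be wired up (this is my reading, not necessarily the scheme as intended): point GHC at a wrapper via -pgmc that stashes the generated C source and the real command line, drops a placeholder .o, and returns immediately, leaving Shake to run the saved command later and overwrite the .o. All paths and conventions below are illustrative:

```haskell
import Data.List (find, isSuffixOf)
import System.Directory (copyFile, createDirectoryIfMissing)
import System.Environment (getArgs)
import System.FilePath (takeFileName, (<.>), (</>))

-- Hypothetical fake C compiler, installed via GHC's -pgmc flag.
-- It keeps the generated .c file alive (GHC would delete it on exit),
-- records the real compiler invocation for Shake to replay later,
-- and writes an empty placeholder .o so GHC can carry on.
main :: IO ()
main = do
    args <- getArgs
    let stash = "_build/c-promises"
    createDirectoryIfMissing True stash
    case (find (".c" `isSuffixOf`) args, dropWhile (/= "-o") args) of
        (Just cSrc, "-o" : obj : _) -> do
            let keptC = stash </> takeFileName cSrc
            copyFile cSrc keptC
            -- Record the real command (gcc is an assumption), with the
            -- temporary .c path swapped for the stashed copy.
            writeFile (obj <.> "cmd") $
                unwords ("gcc" : map (\a -> if a == cSrc then keptC else a) args)
            writeFile obj ""                      -- placeholder object file
        _ -> error "fake-cc: could not find a .c input and a -o output"
```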
@ezyang thanks a lot for the in-depth explanation. I was afraid that we might hit a wall here by exploring paths not directly tested/supported by GHC. From your description it looks like the -fwrite-interface -fno-code combination is simply buggy in GHC, as it writes the interface too early, probably in a different place than where the usual interface write happens during normal compilation. Your reference to the GHC ticket leads to quite a lot of other information w.r.t. Backpack etc., which is quite hard to distil.
@ndmitchell Interesting idea, but what about a different approach? Let's test whether GHC already writes .hi files as early as possible (but reliably, as @ezyang pointed out), or change it to do so. It should be done in a kind of transactional way: write the .hi to hi_promise and, once done, rename hi_promise to .hi (or something along those lines), and then continue with compilation to asm/LLVM/C as usual. Then hack Shake not to wait for GHC to finish compiling hs -> hi,o, but to check for the .hi only; if it's there, it can fire off new compilations of dependent modules. Am I clear on this? IMHO this is what GNU make (or the current build system) is not able to do, and this is why it sucks at parallel compilation...
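The "transactional" write described above is essentially the classic write-then-rename pattern; a generic sketch of it (this is not GHC code, and real interfaces are binary — a String is used here only for brevity):

```haskell
import System.Directory (renameFile)

-- Write the interface to a sibling "promise" file first, then rename it
-- into place, so anyone polling for Foo.hi only ever sees a complete file.
-- renameFile is atomic on POSIX filesystems when both paths are on the
-- same filesystem.
writeInterfaceAtomically :: FilePath -> String -> IO ()
writeInterfaceAtomically hi contents = do
    let promise = hi ++ ".promise"
    writeFile promise contents
    renameFile promise hi
```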
@kgardas - I think my hacking of the C file is just an implementation of promises that would be simple and quite robust. Having a Shake rule that starts, produces something, but then continues and produces more without the possibility of pausing it doesn't really fit with Shake. Polling for the .hi file or using notify techniques is a pain. I think the benefit over my promises technique would be small (you save computing the ASM, but little else), and the cost high (polling, watching, a Shake-side implementation).
I should say, while this "doesn't really fit" with Shake, I have no doubt it is somehow possible, so if that becomes the only reason holding this back, I'll have a think.
@ndmitchell Your hacking of the C compiler is interesting, but I fear the build system for unregisterised and registerised builds would be different, which is something I would try to avoid. Or well, perhaps I don't understand how hard it would be to put your idea into GHC's Shake build machinery...
@ndmitchell Yes, my idea is about polling for a kind of intermediate result, since other modules' compilations depend on this intermediate result and not on the actual result of the compilation, i.e. .hi files versus .o files. Surely the final linkage depends on .o files, but this is not important in a situation where you, e.g., see the build process compile DynFlags.hs and then wait ~30 minutes for the C compiler to finish compiling the generated DynFlags C files, while the .hi is already on the drive and your machine is idle, since only one thread out of the 32 available is working. ;-) (DynFlags is the classic example which I need to break here.)
Hmm, true, I hadn't thought of registered vs unregistered - that may be the place where GHC mangles the .o file afterwards, which is mostly fatal to my technique. That said, maybe combining our techniques gives something better - when GHC calls out to the C compiler we know the .hi file is done, so we could use that as the trigger to allow everything else to continue.
Is it really 30 mins to build DynFlags C code? That seems like a bug - as though GHC is tickling some bad complexity in the C compiler. Is that using gcc, or some ancient system compiler?
@ndmitchell Next week I will measure DynFlags for you, but please keep in mind this is an UltraSPARC T1, so basically a 1 GHz single-issue in-order machine with 8 cores and 4 threads per core. Old, and tuned for highly parallel work indeed. Also, if you are curious, just test --enable-unregisterised on Linux and see how the build times differ. The NCG is really the key to speed here... Unfortunately I still have some bugs to fix in the SPARC NCG before the build can be sped up this way...
@ndmitchell A small correction to your fake C compiler idea: the problem to solve is that by the time the real C compiler is invoked, the original C source is long gone, since it is a GHC temporary file which is deleted on GHC process exit (more or less). So your idea also involves copying the C code to some other temporary location to prevent its deletion, and then compiling that copy.
Let me add, it's relatively straightforward using the GHC API to bail out before running the compiler/assembler. But I really don't think it's C's fault. For example, on DynFlags, I'm pretty sure the bottleneck is due to instance deriving on the giant data structure.
@ezyang Regarding your last note about DynFlags: do you suggest this also means that DynFlags.hi will not be generated as soon as I hope? Well, I will need to test this for real, for sure...
I don't know. Here's the GHC bug tracking it: https://ghc.haskell.org/trac/ghc/ticket/7258 Whether or not it generates quickly enough depends on whether the type checker is slow (if so, nothing will help), or the optimizer is slow (if so,
Thanks all for your input. Very interesting discussion, and I hope we'll eventually find a way forward. In the meanwhile, shall I remove my experimental implementation from the codebase? It looks like it won't bring us to the right solution. Or would anyone still like to play with it? The
@snowleopard Could you be so kind as to keep it, at least for benchmarking purposes? IMHO, if it's done correctly then it may represent the parallel build times we could get either with Shake polling for the .hi file as an intermediate, or with a fixed GHC emitting a correct .hi file... At least it's good as a reference, isn't it? I think @ndmitchell's fake C compiler idea may run only a little bit slower, due to waiting for the actual C code to be generated...
@kgardas OK, let's keep it for now. Let us know if you get any interesting benchmarking results.
Hi,
A note about correct and incorrect .hi files for DynFlags:
so hi-no-foce-write-interface is generated by -fno-code -fwrite-interface, which took those 20 seconds. The other one is from an ordinary compilation and took ~13 minutes to get to its write.
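One way to check how close the fast interface is to the one from a full compilation is to dump both with `ghc --show-iface` and compare the output. A small sketch (the file names are placeholders for the two interfaces above, and textual differences in hashes don't necessarily mean a semantic mismatch):

```haskell
import System.Process (readProcess)

-- Dump both interfaces in readable form and compare the text.
compareInterfaces :: IO ()
compareInterfaces = do
    fast <- readProcess "ghc" ["--show-iface", "DynFlags.hi-fast"] ""
    full <- readProcess "ghc" ["--show-iface", "DynFlags.hi"] ""
    putStrLn $ if fast == full then "interfaces match" else "interfaces differ"
```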
I've experimented some with this too, though in the makefile system (most of it messing about just to get the rules to run properly for .hi and .o files). The current state of that mess is at https://github.com/olsner/ghc/commits/separate_hi_2
That included some code to write the "full" interface with -fno-code, attempting to get the same interface generated as when running without -fno-code: olsner/ghc@b762186
An issue with this is that the interface sometimes changes when building the .o file, and then you end up with e.g. DynFlags.o compiled against a new interface and dependent modules built against an old, mismatching interface, and the dependent modules sometimes end up referencing symbols that aren't there or got different names in the actual object file... To detect that, I added some code to panic instead of updating the interface when subsequently compiling the .o file:
There hasn't been much activity here and it looks like the approach we were exploring got stuck anyway, so I'm inclined to close this issue for now. I will also remove the associated experimental code, as it often makes it more difficult for me to work with the build system. We can always bring it back if need be.
https://ghc.haskell.org/trac/ghc/ticket/4012 seems to have made good progress since my last experiment, so it might be worthwhile to make an attempt at picking this up again. It would probably also catch #216 as a bonus, by having the .hi files as actual targets rather than by-products of .o compiles.
@olsner Note: issue #216 can be solved by using Shake's multiple-output rules, which I planned to implement soon (currently progress on Hadrian is slow due to various other commitments). This is not too difficult, but it requires some refactoring of Hadrian. Still much simpler, I think, than solving this issue. However, if you wish to come back to this issue and optimise the build, that would be great! If you'd like to do so, I'd suggest opening a new issue with an outline of the proposed approach, because this thread has got a bit too long to make sense of all the discussions.
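For reference, a Shake multiple-output rule looks roughly like this: one rule declares both the .o and the .hi as its outputs, so the interface becomes a real target instead of a by-product. Paths and flags below are illustrative, not Hadrian's:

```haskell
import Development.Shake
import Development.Shake.FilePath

-- One rule produces both files; Shake then knows the .hi is up to date
-- whenever the rule has run, solving the "by-product" problem.
objectAndInterfaceRule :: Rules ()
objectAndInterfaceRule =
    ["_build//*.o", "_build//*.hi"] &%> \[o, hi] -> do
        let hs = "src" </> (dropDirectory1 o -<.> "hs")
        need [hs]
        cmd_ "ghc" ["-c", hs, "-o", o, "-ohi", hi]
```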
I've done a small experiment on a 6-core Xeon comparing shake -j12 and gmake -j12 builds on Solaris 11.2, on the same code base.
Shake:
Gmake:
I've disabled haddock in the gmake build, but I'm still not sure that gmake and shake build things in an equal way; I'm afraid gmake may build more, so basically the performance of the two builds is, let's say, the same.
In https://mail.haskell.org/pipermail/ghc-devs/2015-March/008474.html I provided some data about the performance of a highly parallel build. As you can see, performance was quite bad at that time, and it looks like the shake-based build is at around the same level. My impression from watching builds roll by on the console is that sometimes the build just waits on one or two files whose interface files it needs, since those are a dependency for a set of other files which then have to wait in a queue instead of being compiled quickly. So my idea is: if we are somehow able to separate the actual file compilation from the generation of the file's interface file, then the performance of the parallel build may be much higher. It looks like GHC supports this with the -fno-code -fwrite-interface command-line options. Now the question is how hard it would be to add that capability to the shake-based build. Thanks a lot for considering it!
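A minimal illustration of the proposed split for a single module, outside any build system (the module name is an example): phase one writes only the interface, so importers can start compiling; phase two produces the object file later.

```haskell
import System.Process (callProcess)

main :: IO ()
main = do
    -- Phase 1: type-check only and write A.hi; modules importing A can
    -- start compiling as soon as this finishes.
    callProcess "ghc" ["-c", "-fno-code", "-fwrite-interface", "A.hs"]
    -- Phase 2: full compilation producing A.o (needed only for linking).
    callProcess "ghc" ["-c", "A.hs"]
```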