-
Notifications
You must be signed in to change notification settings - Fork 37
Successful build error #206
Comments
It's very weird. If we had a little more call stack that would help - building with GHC 8.0 might give that to us. I've also seen this "error" some times - next time it happens can you immediately do |
@snowleopard, you said something about opening this file, writing it, opening it again and modifying it? Can you point me at the code that does that? I can only assume that's at least tangentially related to the issue. |
Sure, here is the line that does post-processing of Note, the second example above does not involve rewriting a file, however, what they both have in common is |
On https://github.com/snowleopard/shaking-up-ghc/blob/master/src/Rules/Data.hs#L138 I see:
Can you give a bit more background on that? As far as I am aware, I support slashes in all the normal places. If there's a difference from GNU Make at parsing the dependency patterns I'd like to fix it. |
Sure, here are some problematic lines:
Note the slashes in names of the variables, and some dollars here and there. I don't think you'll be able to do anything about dollars (apart from ignoring such lines instead of failing), but fixing the slash issue shouldn't be difficult. |
Do they both have I've just reviewed |
@snowleopard, ah, those are variable definitions, which is what Shake doesn't really like - the makefile parser is designed to slurp things from |
In fact, I think the optimal form is:
This ensures that |
Thanks, I'll fix
Looks like I'm mistaken. I think I can't use |
Yes, not saying you should use |
Have a look at this line: https://github.com/snowleopard/shaking-up-ghc/blob/master/src/Settings/Builders/Ghc.hs#L31. This is where I read the file created by that call to |
Nothing obvious... Do you use readFileLines before writing it? That might be a potential problem. |
Well, |
I realised that file With this file gone, I suggest to close this issue. |
I note we've had the issue occur with a few different files, and your patch only removes one. I have a suspicion this one might recur - but we can always open another when it does. |
@ndmitchell There are actually two patches:
So, I hope we won't come across the instances reported at the top again. But I agree, we might come across others in which case we can just reopen this issue. |
@ndmitchell Were you hit by this again? Which file this time? |
I just got:
So it remains an error. Another build causes more to build, so it seems it didn't finish. So I wonder what's going on... Maybe lint checking is failing? Maybe something else? I'd love a stack trace... Do we still do lint checking even if there is a failure in the build (answer: no)? Does timing print regardless (answer:yes)? Does something change directory, which would result in the file not existing (at least the relative file)? Why did it do lint checking happen if the build didn't finish? |
Ouch, this seems more serious. I always thought that Shake starts to print build stats and do lint checking only after a successful build, but it appears not to be the case? By the way, as part of the work on #207 I accidentally broke the build yesterday, which I have hopefully just fixed by 5fcb480. Not sure if this could be related, but in any case your error is interesting to investigate. |
No, build stats are always printed (if you request that using the flags). They are really intended for debugging (although showing them always is quite happy). It's possible your recent fix was the root cause, but I remain deeply skeptical about what's going on... |
I actually meant that I thought Shake does link checking only after a build completes successfully. However, we have It seems though that we are still hitting the same problem: lint is trying to open a file that hasn't yet been properly closed. In case of |
Shake only does lint checking after a successful build. When I reran the build built more. Therefore it was not a complete build. I am just very confused... And the file is not there. If it were merely open already that should be fine. But it is there. Again, confused... |
I've modified Shake with lots more try/catch around the relevant pieces, and logging saying its starting lint/finishing. I'm now running a loop which wipes/builds repeatedly until it hits an error. Hopefully that will flush out anything transient. |
Hopefully this will help to solve the mystery 👍 |
I can't reproduce it either. Lets assume it got fixed and reopen if it didn't. |
I keep getting:
Which looks like a Shake bug to me, and unrelated to this issue - I'll fix that. |
@ndmitchell Can you clarify what's going on? I'm still lost. |
@snowleopard - not really sure. Are you getting random lint failures? I suddenly am this morning. I even wonder if it's my computer reporting dodgy modtimes at various times (it sometimes happens). |
@ndmitchell I've never seen random lint errors as far a I remember. Well, apart from the strange behaviour we are investigating in this issue, but it's not yet certain that lint is to blame here. Are you using the same version of Shake as I (the released one)? |
I'm using HEAD. But it worked fine a few weeks ago, and I don't think anything relevant has changed, so I'm still confused. |
Perhaps, to provide another point of reference, you could follow the steps I listed in #211 (comment) on your VM installation? Let me copy the steps here for convenience:
I've just repeated the steps on a different Windows machine and the error is still reproducible, albeit with a different file:
So, things seem to be converging on |
And I'm getting a lint error exactly where you are getting an openFile error. Maybe this is really a lint error that comes out wrong? Or the lint is in some way important? |
OK, I'll debug this tonight, but sounds like we've caught something. |
Thanks! We might want to re-examine the -- Transform a given file by applying a function to its contents
fixFile :: FilePath -> (String -> String) -> Action ()
fixFile file f = do
putBuild $ "| Fix " ++ file
contents <- liftIO $ IO.withFile file IO.ReadMode $ \h -> do
old <- IO.hGetContents h
let new = f old
IO.evaluate $ rnf new
return new
liftIO $ writeFile file contents Is there a chance it could still let a rule (which called it) finish before the file is written to and closed? This is the only sane explanation of the above errors I can come up with, because it can cause both lint errors (rule finished, modtime/value locked, but the mod time is changed again when the file is closed), and |
No, fixFile looks correct, it's pretty simple. I'll investigate it though... |
I note that you are still building |
Build products produced by The relevant lines are: -- TODO: get rid of this, see #113
liftIO $ IO.copyFile inTreeMk dataFile
[...]
postProcessPackageData context dataFile As far as I can see, this shouldn't be relevant to this issue, but I may be wrong... |
I'm sure it's relevant, I just don't know how :) - best theory is Shake gets confused with also files and lint checking, but still not sure just how yet. |
Ah, could it be because two logically different rules are mixed in one |
It does seem that there are two separate processes which generate the separate files, so splitting it seems the simpler way to go. But I'd still like to track down this bug separately, since I suspect it is in Shake itself. |
I found the bug :) - ndmitchell/shake#427 My guess is that in the last release it comes up with the error we saw, and in the latest release I fixed something (absolutely no idea what!) so it now gives the error I see. Perhaps just disable lint checking until we've got this squashed? |
I've added a new bug-upstream label, to indicate things that are bugs in the build system, but that are the fault of something upstream (typically Shake) - if that tagging doesn't suit you just delete it. |
Cool! Thanks for hunting the bug down :-) The new label is fine!
I'm very rarely affected by this, and CI never fails because of this. Lint checking on the other hand is very useful, so I think we better keep things as they are. The important thing is that we now know it's not a bug in the build system. Do I understand correctly that if I split the rule in two (which seems to be a reasonable thing to do anyway) we will no longer be hit by the bug? |
Correct, splitting will fix the issue here - although it could equally happen in other also-file rules. It sounds like splitting is also semantically the right thing to do anyway. |
OK, I'll do the splitting and will close this issue. |
After a build is successful we sometimes see a mysterious error:
or
I can see that the file does actually exist.
The current conjecture is that this happens during Shake's lint checking phase.
If I run a rebuild, nothing gets rebuilt, which confirms that the build was actually successful.
The text was updated successfully, but these errors were encountered: