Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error #E component 'PDFFile' must be bound to a string denoting a relative path to a readable file in macOS CI #3967

Open
lgoettgens opened this issue Jul 24, 2024 · 16 comments · May be fixed by #4421
Labels
bug Something isn't working package: GAP

Comments

@lgoettgens
Copy link
Member

Seen on a recent master, e.g. in https://github.com/oscar-system/Oscar.jl/actions/runs/10066028282/job/27826455858#step:9:239.
It seems to only occur on the macOS runners.

The only reference to PDFFile I found in this repo is

PDFFile := "doc/manual.pdf",
so I suspect the GAP interface to be somehow the culprit of this.
Maybe @fingolfin or @ThomasBreuer have some insight into this?

More breadcrumbs:

@lgoettgens lgoettgens added bug Something isn't working package: GAP labels Jul 24, 2024
@lgoettgens lgoettgens changed the title Error #E component PDFFile' must be bound to a string denoting a relative path to a readable file` in macOS CI Error #E component 'PDFFile' must be bound to a string denoting a relative path to a readable file in macOS CI Jul 24, 2024
@fingolfin
Copy link
Member

The comment could in principle refer to any of the 150 packages shipped with GAP and being loaded. But they all should have their manual as a PDF bundled, except for OscarInterface and JuliaInterface.

This message shows up when something calls ValidatePackageInfo on a package's metadata. I'll submit a PR to avoid this warning for OscarInterface

But I wonder: why is it being shown at all? Why / what is validating GAP package metadata here? And is it really OscarInterface it complains about, or something else? Perhaps @ThomasBreuer has an idea?

@lgoettgens
Copy link
Member Author

maybe doing something similar to #3968 for JuliaInterface (and JuliaExperimental) gets rid of it? (But that of course needs #3688 to be merged before we can see any changes here in CI)

@ThomasBreuer
Copy link
Member

Code in GAP's PackageManager calls ValidatePackageInfo in several places, I guess this is the source of the output (without having verified this).

One of the problems is that ValidatePackageInfo calls Print if it finds something to complain about; this function should have a quiet mode, and PackageManager should pass its own quiet flag on to ValidatePackageInfo.

(I do not understand why we did not get these messages earlier.)

@lgoettgens
Copy link
Member Author

After some digging around I now have the following additional information:

  • This happened before switching recog to the github version as well (rearrange how Oscar loads GAP packages at runtime #3874), e.g. on the CI run of the v1.1.1 tag.
  • On a recent master, if replacing the RPTU-macOS-runs with github's own runners, the message does not get printed, neither on macOS-latest nor macOS-13 or macOS-12, so the issue seems to be independent of the host architecture.

So @ThomasBreuer and I think that this is somehow due to the way that the RPTU runner re-uses things from previous runs.

@aaruni96 will look into properly separating and cleaning up the Julia Homes on the RPTU runner when he is back in KL.
Let's hope that this then resolves this issue

@fingolfin
Copy link
Member

@lgoettgens @ThomasBreuer so which things are re-used? Is this about a git clone of a dev version of recog sticking around?

If so it could probably be solved by doing rm -rf ~/.gap ~/.julia/gaproot at the start of our CI workflows (at least when they are running on the self-hosted runners).

Or is there anything else you suspect might be relevant?

@lgoettgens
Copy link
Member Author

This unfortunately won't work, as it will introduce race conditions between the different runners on the self-hosted machine.
But combining this together with separating the JULIA_HOMEs of the different runners is exactly what @aaruni96 wanted to try

@fingolfin
Copy link
Member

@aaruni96 already separated them now

@lgoettgens
Copy link
Member Author

I just checked the latest CI run on master (which was 3 days ago), and this doesn't have these prints anymore.
See https://github.com/oscar-system/Oscar.jl/actions/runs/10130857450/job/28149030285#step:9:352.
Thus I think this issue can be closed

@aaruni96
Copy link
Member

which was 3 days ago

I made the change last evening, and re ran just the short test, to make sure nothing is broken as a side effect.

But the long test appears to have failed "one hour ago" with a similar (but not the same) error ( https://github.com/oscar-system/Oscar.jl/actions/runs/10130857450/job/28149030653#step:9:240 ). You can see that its using artefacts from the runner specific depot

Error, file "/Users/aaruni/Desktop/oscar-runners/runner-4/julia-depot/scratchspaces/c863536\
a-3901-11e9-33e7-d5cd0df7b904/gap_14920021487703909718_1.10/pkg/gapdoc/lib/GAP\
Doc2HTML.gi" must exist and be readable

This might be related to the same problem ?

@aaruni96
Copy link
Member

Note, that we have just done a separation of the depots, there is still no "cleanup" phase which removes stuff for a fresh start.

@aaruni96
Copy link
Member

Github runners do allow Pre / Post hook scripts ( https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/running-scripts-before-or-after-a-job ), so, a decision needs to be made about what we should do in the cleanup phase, and then I can work on making it happen.

@fingolfin
Copy link
Member

I am surprised to see this error about GAPDoc2HTML.gi missing -- the suspected cause for this (race condition) should be eliminated now that the runner have separate depots. I just had a look at the machine and the file is there now...

But this suggests that in addition to the two directories I already suggested, perhaps we should also rm -rf JULIADEPOT/scratchspaces ?

@lgoettgens
Copy link
Member Author

I think we should just clear the complete JULIA_DEPOT and instead use the julia-actions/cache action again. This should cache exactly those parts that are safe to keep between runs.

@fingolfin
Copy link
Member

I disagree. That wastes many dozens of gigabytes of bandwidth every day.

@lgoettgens
Copy link
Member Author

I disagree. That wastes many dozens of gigabytes of bandwidth every day.

True...

So removing .gap, .julia/gaproot, and .julia/scratchspaces (or whatever they are called in the multi-depot setup) in a job hook seems to be a good compromise

@fingolfin
Copy link
Member

Is this still occurring anywhere for anyone?

@fingolfin fingolfin linked a pull request Jan 9, 2025 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working package: GAP
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants