Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Version 1.2.12 fails to work on CentOS7, but 1.2.11 was fine #178

Open
iakov opened this issue Jan 27, 2024 · 15 comments
Open

Version 1.2.12 fails to work on CentOS7, but 1.2.11 was fine #178

iakov opened this issue Jan 27, 2024 · 15 comments

Comments

@iakov
Copy link

iakov commented Jan 27, 2024

Thus, this action fails to work where it was used via @v1.2.
Please, remove this release or re-release as v1.2.14 the backward-compatible version.
The problem is hidden in the cache action version 3.2.3 (PR #175), that introduces node20 dependency.
I suggest to re-release the update as v1.3 or even v2, because actually It is not a "minor fix".

@hendrikmuhs
Copy link
Owner

Thanks for the report.

Is this a custom runner image? CentOs7 is not listed at https://github.com/actions/runner-images.

To set expectations right: I can't provide any support for this.

that introduces node20 dependency

you probably saw the merge commit, the change is here: #177

The reason is this.

It is not a "minor fix".
You are right about classification, however I would also not make a big deal out of it. It should be totally fine for you to stay on version 1.2.11. I don't have the capacity to maintain 2 versions, especially as node 16 is EOL anyway. You might as well fix it on your side if you are using a custom runner.

Anyway, can you explain your setup in more detail and if possible provide logs?

Are you effected by this?
Can you confirm that you can execute node 18?

If so, downgrading to node 18 might be the better alternative. I would love to avoid maintaining 2 versions.

@firewave
Copy link

Also occurs with workflows utilizing older docker images.

See
https://github.com/danmar/cppcheck/actions/runs/7677304162/job/20926809147?pr=5917
danmar/cppcheck#5918

@firewave
Copy link

firewave commented Jan 28, 2024

It also seems like this caused a major slowdown in the Post ccache step - it now takes way above 2 minutes instead of being of a few seconds:

Before: https://github.com/danmar/cppcheck/actions/runs/7646437504/job/20835239981
After: https://github.com/danmar/cppcheck/actions/runs/7681894296/job/20935549647?pr=5918

The jobs fixated on the 1.2.11 version are not experiencing this and are much faster with way less data to cache:
https://github.com/danmar/cppcheck/actions/runs/7681894302/job/20935550032?pr=5918

@hendrikmuhs
Copy link
Owner

@firewave

You're 1st problem is similar to the reported one, you are running it inside a custom image. That image seems to lack node 20. It would be nice if you could find out if node 18 would work.

It would be technically better if you run the action outside of docker(on the official github runner) and use docker only for building. This requires exposing the host fs and setting the ccache path accordingly like it is done here.

There is no alternative to an upgrade of the nodejs version, node 16 is EOL as clearly stated by github. However we are flexible w.r.t. upgrading it to 18 vs. 20.

Your 2nd problem: It ran slower, because it did not found a cache. This seems like a different problem, which might or might not be caused by the upgrade of nodejs. For such cases it is best to let it run a 2nd time to see if the problem persists. It might be that the underlying github cache layer puts something more into the key. A 2nd run should get the cache from the 1st run.

@iakov
Copy link
Author

iakov commented Jan 28, 2024

@hendrikmuhs , we use containers for CentOS:7. Imho, there is no need to maintain two versions. Please, do not introduce such a dramatical change with the patch version increment next time :) Probably, there is no action actually needed ATM, because all poor projects, unhappy with the nodejs hidden version shift will stick with the 1.2.11 after reading this thread.

@hendrikmuhs
Copy link
Owner

hendrikmuhs commented Jan 28, 2024

@iakov

As said, there has never be a guarantee that custom runners work. All official github runners support node 20.

Repeating my question: Would node 18 work? - if you or anyone else affected can answer this question, I happily change this, otherwise it stays at it is. I don't have the resources to test this myself.

@iakov
Copy link
Author

iakov commented Jan 28, 2024

Thank you. Node18 does not help. Node18 requires newer GLIBC and is incompatible with Ubuntu 18.04 and other obsolete/EOL releases. The only thing I ask you is about versioning. We are happy to use unsupported older 1.2 version of your action as we do with checkout@v3, it works fine. I kindly as you to tag release as v2 or at least v1.3 for new features with new incompatible dependencies. However, maybe those few projects with broken CI and CD will come to this thread for solution and choose to stick with 1.2.11 until build server update.

@firewave
Copy link

I second to start a new version for this change and leave v1.2 on Node 16.

Your 2nd problem: It ran slower, because it did not found a cache. This seems like a different problem, which might or might not be caused by the upgrade of nodejs. For such cases it is best to let it run a 2nd time to see if the problem persists. It might be that the underlying github cache layer puts something more into the key. A 2nd run should get the cache from the 1st run.

It did find a cache. Otherwise it would not have been able to restore that and have any hits. Here's the same results from a main branch build after the version has been fixated for the docker-based workflows:

1.2.11: https://github.com/danmar/cppcheck/actions/runs/7687753684/job/20948156884
1.2.12: https://github.com/danmar/cppcheck/actions/runs/7687753675/job/20948156975

The stats are different because different ccache version are being used. So I wondered if this might just be an issue with the newer version of that but that is refuted by a build on a newer platform which also uses 1.2.11:
https://github.com/danmar/cppcheck/actions/runs/7687753684/job/20948157383

More 1.2.12 cases on various runners without docker involved:
https://github.com/danmar/cppcheck/actions/runs/7687753687

I do assume this is actually a Node issue and I will file an issue upstream. We should also track this in a different issue within this repo.

@hendrikmuhs
Copy link
Owner

hendrikmuhs commented Jan 29, 2024

@firewave

You might have a problem with cache eviction as I see many many jobs. Back to your initial post:

It did find a cache. Otherwise it would not have been able to restore that and have any hits.

This is from your run:

Run hendrikmuhs/[email protected]
Install ccache
Restore cache
  No cache found.
Configure ccache, linux

Source:

After: https://github.com/danmar/cppcheck/actions/runs/7681894296/job/20935549647?pr=5918

Update:

Or do you mean: It finds the cache now? I am not sure I fully understand. What we know so far is that after the update, the cache was not found.

The inner workings of the cache, meaning the github backend side (not this action, not ccache), are best to my knowledge not open/public source. I found other cases where it behaved strangely, maybe the node version is somehow backed into the key in an implicit way. We also had similar behaviour after github upgraded the client libraries. In summary it happens that a cache entry is for some reason missing, but usually it fixes itself with consecutive runs. This is just the usual SLA of a cache, but as long as it works most of the time, this seems fine.

@hendrikmuhs
Copy link
Owner

The parallel issue actions/runner#2906 reports some success after downgrade, however https://github.com/nodejs/node/blob/v18.x/BUILDING.md#platform-list clearly states the newer glibc for x64.

In general I advise not to use tag sliding. I slide the tag on release, but I consider this bad practice. The suggested practice is an explicit version in combination with dependabot.

The action has very little QA coverage especially when it comes to exotic setups.

@johnwason
Copy link

I am seeing the same issue with older Ubuntu docker images. Version @1.2.11 seems to work, but @v1 fails.

https://github.com/robotraconteur/robotraconteur/actions/runs/7691430275

@hendrikmuhs
Copy link
Owner

Alright, I think here is what we should do:

  • revert the switch to node 20 back to 16 and release it as 1.2.13 with tags for v1 and v1.2
  • add a note in the Readme
  • jump to node 18 on main and release it as v2 and go forward from there

If you want to help:

  • it would be nice to setup a test that executes the action in a custom environment, e.g. an ubuntu 16.04 container

@firewave
Copy link

Alright, I think here is what we should do:

  • revert the switch to node 20 back to 16 and release it as 1.2.13 with tags for v1 and v1.2
  • add a note in the Readme
  • jump to node 18 on main and release it as v2 and go forward from there

Agreed!

If you want to help:

  • it would be nice to setup a test that executes the action in a custom environment, e.g. an ubuntu 16.04 container

Can do.

@iakov
Copy link
Author

iakov commented Jan 31, 2024

I am ready to test new tags for obsolete builders. You are free to send a PRs with changes in this line https://github.com/trikset/trikRuntime/blob/master/.github/workflows/centos.yml#L73 , or I can do it by request.

firewave added a commit to danmar/cppcheck that referenced this issue Feb 6, 2024
#5952)

As the work on hendrikmuhs/ccache-action#178
and hendrikmuhs/ccache-action#179 has stalled
roll back all usages to the previous version for now. This prevents the
steps from running into a 2 minutes timeout.
@firewave
Copy link

firewave commented Jul 8, 2024

Even with fixating to version 1.2.11 the two minute hang is now present because the underlying runners have all been updated to Node 20.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants