-
Notifications
You must be signed in to change notification settings - Fork 273
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Try to repro memory out of bounds errors in CI #1199
Conversation
Memory access out of bounds error:
It's very possible this is caused by the recent memory leak workaround, but is it possible this a real out of bounds memory access that was previously masked by the fact that adjacent memory was not truly freed? To test this, I'm going to try commenting out the "free()" call in wasm_memory_storage, and see what happens. If no tests run out of memory (except the OOM test I just added) and the failing test is no longer triggering an out of bounds error, that may be a clue. |
3ea3a86
to
dab7060
Compare
Interesting. When I commented out the free() call in wasm_memory_storage.c, the memory access out of bounds test failure goes away. With free() in wasm_memory_storage.c, the test failure is due to access out of bounds. From an earlier CI run:
Without free() is wasm_memory_storage.c, the only test failure is the Out of Memory test we just added. From the most recent CI run:
Unless the access out of bounds error has something to do with wasm_memory_storage's free() call itself, maybe there is a real access out of bounds condition. What was previously accessing leaked memory could now be attempting to access freed memory. I'm trying to keep my mind open, but why else would stopping free() silence the access out of bounds error? At least two possibilities:
|
Note: I added a commit to hopefully log OS and node version but found afterward that the info is already linked to in CI job "Set up job" section. This includes the OS version and various software versions, including Node.js. |
In my testing with whatever version of Emscripten is installed with homebrew (3.1.56-git), it turns out that both assigning out of bounds and free()ing out of bounds trigger I alternately commented out the out of range assignment and free() to test in the example below. #include <stdio.h>
#include <malloc.h>
int main() {
char* ptr = malloc(8 * 1024 * 1024);
printf("Ptr: %p\n", ptr);
char* ptr_oob = ptr + (17 * 1024 * 1024);
printf("Oob: %p\n", ptr_oob);
*ptr_oob = 'A';
//free(ptr_oob);
return 0;
} I am trying to reproduce this by running the CI workflow locally using |
One thing that keeps bugging me is what's up with these socket connections in the stack trace? The crash stems from the I wonder if
|
@adamziel, that's helpful! 😄 The socket connections seemed strange, but since Playground is new to me, I wasn't certain they were indeed strange. That's something more to think about. Will try to log each call to original functions from instrumentWasmExports(). Perhaps that will give us a hint about what is triggering this. |
5a147a8
to
cf0dc4e
Compare
^ Hmm... not sure how this review request happened. This is just a PR for trying whatever we can to obtain more info about how we're getting a "memory access out of bounds" error in CI unit tests. I tried:
So far, I haven't learned much more than we already knew. The most interesting thing I noticed from the instrumentation was that there is an apparently anonymous PHP function call not too long before the out of bounds access error. I'm going to try upgrading building with a more up-to-date Emscripten version to see if that has any effect. I doubt it will, but it is worth a try. After that, I plan to let this lie for a couple days of AFK and then resume as able on Tuesday. |
2b4f00b
to
350f625
Compare
A couple more things to explore:
|
The network call is:
The error is gone when DISABLE_WP_CRON is set to true. |
Aha, it's not about the DISABLE_WP_CRON constant. It's about having the it('should set the site option', async () => {
// await defineWpConfigConsts(php, {
// consts: {
// TEST: 1,
// },
// });
await setSiteOptions(php, {
options: {
blogname: 'My test site!',
},
});
console.log('after setSiteOptions');
const response = await php.run({
code: `<?php
require '/wordpress/wp-load.php';
echo get_option('blogname');
`,
});
expect(response.text).toBe('My test site!');
}); |
da3afdd
to
350f625
Compare
CI in this PR now seems to be failing with these errors:
I'm not sure what that's all about – it works in #1212 and, other than the newer emscripten version, I don't see any significant changes between 5a6343a and 350f625. I wonder if the version upgrade changed anything significant in the FS handling – or perhaps one of our "regex patches" no longer applies cleanly? |
@brandonpayton a few questions that came to mind:
You may call JavaScript code from C, too, to inspect the HEAP directly or print stack traces. |
@adamziel Sorry, I was trying a bunch of different things on Saturday and could have left this in a better state for you and others to look at. This is just an error seen after updating Emscripten, and it is probably a distraction (though very interesting for later :). The point of updating Emscripten was just a thing to try at the end of the day to see if it affected anything. Though if it did avoid the error, we couldn't be sure it wasn't just hiding the problem for the moment. If I have time later today, I'll try to restore where this branch was with instrumented JS and PHP, so you can more easily see the CI output for that.
Yeah... that's something we can test sometime apart from this memory-related fix. It was just something to try to see if any behavior changed or if the memory handling caught this earlier or provided more info on the crash.
When wasm_memory_storage.c was instrumented to log the start and end of all chunk_alloc() and chunk_free() calls, I didn't see anything amiss. Here are the wasm_memory_storage logs from one CI run: One thing I saw in test is that tests passed after commenting out free() in wasm_memory_storage. This could mean that there is some other issue in Playground that was relying upon memory never being freed by munmap(). Or it might be something else. But based on the logs above, the crash hasn't appeared to occur within wasm_memory_storage.c. After I restore this branch to using our current Emscripten version and the instrumented JS and PHP, I plan to test this again. Sometimes, CI tests don't catch the "memory access out of bounds" error, but most of the time they have. So I'd like to re-run the tests a bit with the free() removed to be more certain it's not a fluke.
This is possible, but the logs above don't show that.
The logs show the addresses, but I'd like to additionally have The only odd thing I see here is this call to
I've wondered this too and just made a note to look and consider this further. Maybe there are other places we can instrument with log messages. The hardest thing in nailing this down is that we've only been able to reproduce this with GitHub CI. I tried using act to run the CI workflow locally, but it passed with no errors. I've thought about setting up a self-hosted GitHub actions runner to see if local repro was possible there.
Thank you for the link! I wasn't sure what was possible here. One thing I really want is what address is out of bounds. That could help us know whether it is part of previously allocated space or just something else, like a totally bad address that is far out of bounds. |
With trying multiple things within this PR, especially with some of the force-pushing that rewrote previous history and hid associated test runs, it turned out to be very hard for anyone to really follow all the things I've tried so far. And it was even hard for me to follow because links to previous CI runs were hidden after force push. I think we should probably close this and use dedicated PRs for each of the approaches. We can leave multiple investigatory PRs open at the same time and just keep documenting the findings under the original issue #1128. For example, the following would be separate PRs:
I am closing this and working on creating more focused PRs to help nail this problem down with less confusion. |
This is a test PR to hopefully reproduce memory access out of bounds errors with #1189.
So far, @adamziel has only encountered errors in CI (see #1194), but there was one error report in WP.org Slack:
https://wordpress.slack.com/archives/C06Q5DCKZ3L/p1712232849960669