-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
instance churn causes xvfb to occasionally hang #561
Comments
Quick update: I was going through the CircleCI documents to make sure the above wasn't caused by something silly. Per the documentation, CircleCI uses To that end, I think this might be |
Now that I know what to look for, if I'm reading this correctly, the Chromium team has already hit this. It appears to be an issue with forking and a memory allocation deadlock in one of the underlying libraries. I'm going to take a swing at a workaround. |
👍 to spending some time researching this. That said, I think it’s probably a given that, since every instantiation of Given that Chromium has also hit this, if we add a pooling mechanism as above, that also gives us a way to cap the number of Nightmare instances running simultaneously. We can just queue up the rest to run when a window becomes available. |
Consider the time spent. :) I've got a fix working locally: I can churn through hundreds of instances and finish without a problem - a huge improvement over the average failure point of a dozen or two. I also still need to verify CircleCI will use my patch to execute tests. Once I'm reasonably confident that's working, I'll open a PR for review. By the by, the fix opens up an avenue of problems having to do with #502. That's probably a topic best left for that issue or for the eventual PR, and I'll be doing a writeup in one of those spots at some point. This isn't necessarily a resource problem. Having more memory or a faster CPU would likely mask the problem, but wouldn't eliminate it entirely. It has more to do with the underpinnings of (Warning: this thread is about to go offtopic.) That said, I completely agree with you: spinning up multiple Electron instances gets resource-intensive in a hurry. I also agree that Nightmare should probably be what amounts to a That approach has a couple of small hurdles:
I'd also have a couple of questions:
That's all I can think of off the top of my head. I'm willing to dig into the above and start working through some of the implementation bits as time permits. Thoughts? |
PS: the comment I couldn't find? I just found. Of course the second I say I can't find it, I happen to spot it. It's buried in the middle of #553. |
Iiiinteresting. All 👂👂👂
Oh, totally; didn’t mean to imply it was! Just that our heavy resource use probably exacerbates 😬
Yup.
Well, this is kind of funny. It turns out that, if you use the command line to launch a given app, e.g. Due to the way it’s written, the solution is to simply add our own
Insofar as the test suite confines itself to public methods/properties on
Indeed :D I have a private laundry list of things I’d love to see changed in Nightmare, but the only one that is a Big Deal™ (aka the one that gives me that good old “is this a jenga tower” feel [I kid, but still]) is that one. However! I think that issue is largely independent of this one (save that care should be taken not to make it worse).
No. I don’t think switching from “N electrons, 1 window” to “1 electron, N windows” should have any externally visible impact on the API (save for private, non-documented bits). The rule that an instance of
No idea, but:
Back in #479 I wrote:
The current screenshot API should be much lighter, but I’d guesstimate ~50 MB per window on my machine (13" retina MacBook Pro, 64 bit). That figure will definitely vary across hardware and OS. |
TL;DR: Make isn't the friendliest thing to Windows. There's a night sunk into VMs and tinkering that I haven't had time to do yet.
It aaaaabsolutely does. The results I get under chroot on my Chromebook (4gb of ram with a Tegra K1 processor) vs my main development box (16gb/i5) are pretty different. :)
Hhhhhuh. The more you know. 🌠 I had seen the event before, but I hadn't realized that Electron internally deals with it in that way, and the documentation is ... misleading? Thanks for the education, and good eye!
I think the only time internal methods - and specifically,
Oooh, I'm not sure I agree.
True. I have a foul habit of "since we're going to have this down to the studs anyway, why not also do [insert stupidly complex thing]?" Reorganizing how Nightmare and Electron interact seems like a good time to think about simultaneous calls/multiple runs/handling messaging spaghetti.
Mmmm, don't know that I agree. The "window manager" would need to expose methods for listing, retrieving, creating, and closing windows. One of the biggest side perks to doing this is being able to handle pages that open new tabs/windows: when a new window is created from the browser, it could then be created with the Nightmare sugar and registered with the window manager. I guess my point is that windows are not just created by users, they can also be created by the site, and both cases should probably be handled.
Deathly curious what the memory footprint difference is. I'd like to put together something about that, if nothing else for justifying the change.
Strongly agree.
I also like to live dangerously.
Not that I'm against a high-fallutin' implementation like this, but I worry that then you're at the whim and mercy of the OS and may introduce cross-platform problems that are hard to debug/track down. At least for now, I'm in favor of keeping it simple.
(Emphasis mine.) Per |
Well, if I understand it right, that all only applies for “non-bundled” stuff. So if your electron app is sitting inside electron’s resources folder (as it is when a standalone app is all bundled up for distribution),
I guess I was thinking this would be the role of some component that is entirely private—a new nightmare instance would ask that manager for a communications channel to a window when created and would close the channel/tell the manager it’s done when
Ah, hadn’t thought about child windows. This would still necessarily require new API for interacting with those child windows, though, right? Or I guess you could have an event for
+ 💯
Toooootally. Said much more clearly than my “wobbly heuristics” note :P
I have totally forgotten! I think my “20-30 MB normally” and “50-300 MB screenshotting” estimates were a combined 1 Electron + 1 window, so I guess each window would be a bit lighter (but I’m also guessing not a lot—the electron app process is probably relatively light). I also don’t remember whether this was with an essentially blank page or something fairly busy like yahoo.com. |
Not totally disagreeing. I think there are really two different things tied up in #493—one is about pure safety (making sure that two simultaneous attempts to perform a routine in electron’s processes don’t wind up with crossed wires), which fits in more tightly with the work we’re talking about here. The other is whether the overall behavior/API should change to make those kinds of potential crossed-wire situations less likely to occur. That one is a big enough deal all on its own that I think it’s worth separating out. Both of those changes and this one are all potentially high impact, though. Ideally this process/window management work won’t have any API impact for users, but I suspect there’s a reasonable likelihood it might change the story for plugins. Whatever we do with IPC will almost undoubtedly affect both plugins and end-user usage. It might be worth releasing all these changes together as a major revision given they all might be breaking changes. |
I wasn't sure where to put this - a new issue felt the most logical.
Digging into the test failures that happen intermittently, it looks like the behavior that causes the tests to fail also will present in regular applications (albeit seemingly not as often). Consider the following (extremely contrived) case:
This example creates a lot of instance churn, causing Electron instances to be spun up and killed, much like how the unit test suite works. After running many times (I've had it happen in as few as 8 and as many as ~450 iterations, seems to be luck of the draw), Nightmare will hang. In a test context, this will cause Mocha to timeout, which at least partially explains the behavior with the intermittent test failures.
The hang appears to happen when a new
BrowserWindow
instance is created. EnablingELECTRON_ENABLE_LOGGING
andELECTRON_ENABLE_STACK_DUMPING
doesn't yield useful information. Piping Electron output directly to the parent processstd*
buffers also doesn't yield anything terribly interesting.What I'd like to try next:
node-inspector
working with Electron. So far, my attempts to get a debugger attached have been met with limited success.BrowserWindow
churn natively under Electron to see if the behavior can be recreated internally to Electron.I'm open to suggestions and ideas on all points.
Some other peculiar (possibly unrelated?) things that cropped up as a result:
.kill()
call is made because the window is destroyed. I'll be pulling together a small bugfix for that as time permits. (I don't think it has any bearing on how the rest of the project behaves, but it certainly can't hurt to fix.)close
event case for a clean exit never happens unlesswin.close
orwin.destroy
are manually called. In other words, you'll never seeelectron child process exited with code 0: success!
.The text was updated successfully, but these errors were encountered: