Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: state of _stackprof.last_sample_at can leak between invocations #220

Open
dalehamel opened this issue Nov 10, 2023 · 0 comments
Open

Comments

@dalehamel
Copy link

dalehamel commented Nov 10, 2023

We noticed a very peculiar profile:

Screenshot 2023-11-10 at 9 56 27 AM

The top frame is StackProf.start. It is supposedly occupying 20% of the overall aggregated profile time (~10 out of ~50 minutes). How can this be possible?

Checking the code, it looks like it is technically possible that we can have a signal be received within stackprof_start, as the signal handler is installed but there are a few lines left to execute before we return:

https://github.com/tmm1/stackprof/blob/master/ext/stackprof/stackprof.c#L204-L234

Many of the other variables on _stackprof struct are cleared between runs, but notably, last_sample_at is never reset within stackprof_results

Since we never explicitly acquire and relinquish the GVL within stackprof_start, and never reset the value, this is my theory as to what happened:

  • stackprof run A occurs, and on its final sample collection, sets last_sample_at, t_0
  • Some time passes
  • stackprof run B occurs, and during the initialization in stackprof_start, between the installation of the signal handler and before we reset last_sample_at to the current time, the ruby VM swaps to another thread
  • Enough wall clock time is spent on the other thread that our timer fires, and we collect a sample
  • When calculating the weight of the sample, we use t_0 as the base, instead of t_1, because it has not been reset yet
  • Execution returns and we finish executing stackprof_start, and last_sample_at is set to t_1, the current time.

I'm not sure if the above is plausible in ruby's threading model, but it's the best I could come up with^.

I was very surprised to see StackProf.start at all in profiles, let alone so prominently. But from what I can tell, the above scenario is a plausible explanation for how this could occur.

Looking at the same interval of time, we do not see it appear in :cpu mode profiles collected. This makes sense though, as those don't use timestamps for their weighting.

The solution to this I think would be some combination of:

  • Acquire the GVL during stackprof_start and release it just before we return
  • Reset the last_sample_at during the results function, as we do with other variables on the _stackprof struct.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant