Add suspend/resume profiler #968

Open
wants to merge 6 commits into
base: master

@foxtran
Contributor

foxtran commented Jan 14, 2025

This patch adds suspend/resume calls to the profiler that stop the client from generating data for the server. This is useful for long-running applications that can produce extremely large amounts of data.

Users must be careful when calling TracySuspend/TracyResume inside zones.

I have also fixed a couple of bugs related to proper exit of the Tracy client.

Closes #952

@wolfpld
Owner

wolfpld commented Jan 14, 2025

I don't really see how this could work correctly in a multithreaded application.

@foxtran
Contributor Author

foxtran commented Jan 14, 2025

It just stops collecting data from Tracy calls for the whole application, so it works fine.

For example, here is a 4-threaded OpenMP application that I use for testing:

[image: profiler screenshot of the test application]

The client does not get Tracy events, so only a long bar appears after the TracySuspend call. Once TracyResume is called, Tracy works as before.

P.S. I still don't understand why the main thread does not show OpenMP sections :(

@wolfpld
Owner

wolfpld commented Jan 14, 2025

> It just stops collecting data from Tracy calls for the whole application, so it works fine.

This is exactly why it won't work fine.

T1: suspend profiling
T2: enter zone, ignore event
T1: resume profiling
T2: leave zone, send event

At this point the state on client and server is desynchronized and things can't work properly.
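The interleaving above can be replayed in a small model. This is only a sketch under stated assumptions: `ModelClient`, `Event`, and `count_unmatched_ends` are hypothetical names invented for illustration, not Tracy APIs, and the "server" is reduced to a per-thread zone depth counter.

```cpp
#include <map>
#include <vector>

// Hypothetical model, not Tracy code: one global switch gates every event.
enum class Kind { Begin, End };
struct Event { int thread; Kind kind; };

struct ModelClient {
    bool suspended = false;
    std::vector<Event> sent;  // events the server would receive

    void suspend() { suspended = true; }
    void resume()  { suspended = false; }
    // Begin and end check the switch independently of each other.
    void zone_begin(int thread) { if (!suspended) sent.push_back({thread, Kind::Begin}); }
    void zone_end(int thread)   { if (!suspended) sent.push_back({thread, Kind::End}); }
};

// Server-side view: an End arriving at depth 0 has no matching Begin.
int count_unmatched_ends(const std::vector<Event>& events) {
    std::map<int, int> depth;
    int unmatched = 0;
    for (const Event& e : events) {
        if (e.kind == Kind::Begin) {
            ++depth[e.thread];
        } else if (depth[e.thread] > 0) {
            --depth[e.thread];
        } else {
            ++unmatched;
        }
    }
    return unmatched;
}

// Replays the four steps from the comment in a fixed order.
int replay_interleaving() {
    ModelClient client;
    client.suspend();      // T1: suspend profiling
    client.zone_begin(2);  // T2: enter zone, event ignored
    client.resume();       // T1: resume profiling
    client.zone_end(2);    // T2: leave zone, event sent
    return count_unmatched_ends(client.sent);
}
```

In this model `replay_interleaving()` reports one zone end that the server cannot pair with a zone begin, which is the desynchronization described above.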

@foxtran
Contributor Author

foxtran commented Jan 14, 2025

Yep, that is true. Luckily, some applications (mostly HPC) have a single control-flow thread, and suspending/resuming can happen only on that thread.
So:
T1: suspends profiling
T1: spawns T2-Tn
T1-Tn: do collective work
T1-Tn: may enter and exit zones (according to zone scope rules)
T2-Tn: die (actually, they just sleep)
T1: resumes profiling
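A minimal sketch of this control-flow pattern, assuming worker threads only exist between the suspend and resume calls. `ModelProfiler` and `run_control_flow_pattern` are hypothetical stand-ins, not the API added by this patch, and joining the workers stands in for the pool threads going to sleep.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a profiler with a global suspend switch.
class ModelProfiler {
public:
    void suspend() { suspended_.store(true); }
    void resume()  { suspended_.store(false); }
    void zone_begin() {
        if (suspended_.load()) return;
        std::lock_guard<std::mutex> lock(mutex_);
        ++begins_;
    }
    void zone_end() {
        if (suspended_.load()) return;
        std::lock_guard<std::mutex> lock(mutex_);
        ++ends_;
    }
    int begins() { std::lock_guard<std::mutex> lock(mutex_); return begins_; }
    int ends()   { std::lock_guard<std::mutex> lock(mutex_); return ends_; }

private:
    std::atomic<bool> suspended_{false};
    std::mutex mutex_;
    int begins_ = 0;  // zone begins that reached the "server"
    int ends_ = 0;    // zone ends that reached the "server"
};

struct Counts { int begins; int ends; };

Counts run_control_flow_pattern(int workers) {
    ModelProfiler profiler;
    auto collective_work = [&profiler] {
        profiler.zone_begin();
        profiler.zone_end();
    };

    profiler.suspend();                         // T1: suspends profiling
    std::vector<std::thread> pool;
    for (int i = 0; i < workers; ++i) {
        pool.emplace_back(collective_work);     // T1: spawns T2-Tn
    }
    collective_work();                          // T1 takes part in the work
    for (std::thread& t : pool) t.join();       // T2-Tn are done
    profiler.resume();                          // T1: resumes profiling

    collective_work();                          // a zone after resume is recorded whole
    return {profiler.begins(), profiler.ends()};
}
```

Because every worker zone is entered and left inside the suspended window, only the final zone is recorded, with one begin and one end. The sketch does not cover the case where another thread is already inside a zone when suspend is called.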

@foxtran
Contributor Author

foxtran commented Jan 14, 2025

If each zone carried a unique identifier (distinct for every instance, even for the same zone repeated in a loop), the GUI app could recognize that a zone was entered while suspended but exited after resuming. Unfortunately, this would not help if a zone was entered while profiling was active and exited while suspended.
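Both cases can be sketched with a toy model. Everything here is an assumption made for illustration: `ZoneEvent`, `ModelClient`, and `replay` are hypothetical, thread handling is ignored, and nothing like this per-instance id exists in the patch.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical event carrying a per-instance zone id.
struct ZoneEvent { bool is_begin; std::uint64_t id; };

struct ModelClient {
    bool suspended = false;
    std::uint64_t next_id = 0;
    std::vector<ZoneEvent> sent;

    // The id is assigned even while suspended, so a later end can carry it.
    std::uint64_t zone_begin() {
        const std::uint64_t id = next_id++;
        if (!suspended) sent.push_back({true, id});
        return id;
    }
    void zone_end(std::uint64_t id) {
        if (!suspended) sent.push_back({false, id});
    }
};

struct ServerView {
    int orphan_ends_dropped = 0;  // end seen without a begin: detectable
    int zones_left_open = 0;      // begin seen, end never arrives: not detectable
};

ServerView replay(const std::vector<ZoneEvent>& events) {
    std::set<std::uint64_t> open;
    ServerView view;
    for (const ZoneEvent& e : events) {
        if (e.is_begin) {
            open.insert(e.id);
        } else if (open.erase(e.id) == 0) {
            ++view.orphan_ends_dropped;  // unknown id, safe to ignore
        }
    }
    view.zones_left_open = static_cast<int>(open.size());
    return view;
}

ServerView demo_unique_ids() {
    ModelClient client;
    client.suspended = true;
    const std::uint64_t a = client.zone_begin();  // begin suppressed
    client.suspended = false;
    client.zone_end(a);                           // end sent, server drops it

    const std::uint64_t b = client.zone_begin();  // begin sent
    client.suspended = true;
    client.zone_end(b);                           // end suppressed, zone stays open
    return replay(client.sent);
}
```

In this model the server can discard the orphaned end of the first zone, but the second zone is left open because its end never arrives.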

@wolfpld
Owner

wolfpld commented Jan 14, 2025

It seems to me like what you are looking for here is the active parameter in the macros.
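One way to read this suggestion, sketched under the assumption that a zone latches its active flag when it is entered and reuses that value when it is left. `ModelScopedZone` is a hypothetical model written for illustration, not Tracy's implementation; with Tracy itself the flag would be passed through a macro such as `ZoneNamedN(varname, name, active)`.

```cpp
#include <string>
#include <vector>

// Hypothetical RAII zone that remembers whether it was active at entry.
class ModelScopedZone {
public:
    ModelScopedZone(std::vector<std::string>& log, bool active)
        : log_(log), active_(active) {
        if (active_) log_.push_back("begin");
    }
    ~ModelScopedZone() {
        // Uses the latched value, not the current state of the flag.
        if (active_) log_.push_back("end");
    }
    ModelScopedZone(const ModelScopedZone&) = delete;
    ModelScopedZone& operator=(const ModelScopedZone&) = delete;

private:
    std::vector<std::string>& log_;
    const bool active_;
};

std::vector<std::string> demo_latched_active() {
    std::vector<std::string> log;
    bool enabled = false;  // hypothetical application-level switch
    {
        ModelScopedZone zone(log, enabled);  // inactive at entry
        enabled = true;                      // switch flips mid-zone
    }                                        // nothing is emitted on exit
    {
        ModelScopedZone zone(log, enabled);  // active at entry, emits "begin"
        enabled = false;                     // switch flips mid-zone
    }                                        // still emits "end"
    return log;
}
```

Since the decision is made once per zone instance, begin and end are either both emitted or both skipped, so the pairing seen by the server stays consistent even if the switch changes while a zone is open.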

@foxtran
Contributor Author

foxtran commented Jan 14, 2025

active is not a solution, since it still produces a lot of data (mostly from callstacks, I think).

@wolfpld
Owner

wolfpld commented Jan 14, 2025

Callstack collection should be paused when on-demand mode is used and no connection is established. You should be able to extend this to your use case.

@foxtran
Contributor Author

foxtran commented Jan 14, 2025

This is an example of how I'm using Tracy on an HPC cluster. It would be a bit hard to detect the right time to start data collection (more specifically, to start tracy-capture) and the right time to stop it, so as to avoid out-of-memory errors, from the application's logs. I think I can start tracy-capture from my app itself for debugging :-)

Development

Successfully merging this pull request may close these issues.

C API: manual lifetime: pause and resume profiler
2 participants