Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracing federate interactions #178

Merged
merged 61 commits into from
Mar 24, 2023
Merged

Tracing federate interactions #178

merged 61 commits into from
Mar 24, 2023

Conversation

edwardalee
Copy link
Contributor

@edwardalee edwardalee commented Mar 9, 2023

This PR, co-authored by @ChadliaJerad, adds trace points to the interactions between federates and the RTI.

It also removes the tracing mechanism's dependency on the ability to create threads and uses the platform-independent critical section API instead.

edwardalee and others added 30 commits February 16, 2023 16:51
…ssages sent to RTI. But contents are not set yet. Federate tracing is also augmented to report the federation id in the reactor field. This will help in merging information from different lft files.
…fter the check. This reduces the discrepancies while matching communications events with the RTI.
…ld when calling federate tracing (not done yet when calling RTI tracing, because of a segmentation fault)
@ChadliaJerad
Copy link
Collaborator

With the federates, the trace functions are defined as empty macros when tracing is turned off. There is no need for any further check.

With the RTI, I think we should have just one RTI executable and that the -t command-line option should specify whether tracing is on or not. So I don't think compile-time checks make sense for the RTI.

Both are addressed in the two previous commits.
Still, for reducing the memory footprint, currently, RTI_TRACE is kept and used in trace.c.
Any suggestions @petervdonovan?

I think that in an unthreaded runtime, tracing can be implemented without threads. I think we can change the tracing code to use only the critical section features of platform.h, in which case, it will automatically work with both the threaded and unthreaded runtimes. But I have not tried to do this yet, so I could be underestimating the effort.

This PR #182 makes compiling pass. The generated trace file, however, is empty.
The reason is that _lf_flush_trace_thread is the thread who writes to the file. Since in an unthreaded execution (case of bare-metal), lf_thread_create will not do anything, then no thread is created.

I think that writing to the trace file needs to be part of the sequential execution.
What do you think @edwardalee and @erlingrj?

@petervdonovan
Copy link
Contributor

for reducing the memory footprint, currently, RTI_TRACE is kept and used in trace.c.
Any suggestions @petervdonovan?

I still stand by my suggestion here unless there is some reason why that suggestion (moving RTI-specific code to the federated directory instead of having it in trace.c) would not work.

@ChadliaJerad
Copy link
Collaborator

This PR #182 makes compiling pass. The generated trace file, however, is empty. The reason is that _lf_flush_trace_thread is the thread who writes to the file. Since in an unthreaded execution (case of bare-metal), lf_thread_create will not do anything, then no thread is created.

This breaks the CI though...

@ChadliaJerad
Copy link
Collaborator

Refactoring the tracing mechanism this way is much cleaner than in this PR. Thanks, Edward!
There is a longer overhead that appears from time to time though, due to flush_trace() call whenever the buffer is full.
Does it make sense to account for the execution time of flush_trace() and subtract it?
The aim is to make the maximum execution time of reactions agnostic to that overhead.

@ChadliaJerad
Copy link
Collaborator

for reducing the memory footprint, currently, RTI_TRACE is kept and used in trace.c.
Any suggestions @petervdonovan?

I still stand by my suggestion here unless there is some reason why that suggestion (moving RTI-specific code to the federated directory instead of having it in trace.c) would not work.

What do you think we should do here?
I think that having tracing in different files (each specific to a particular endpoint) may lead to inconsistencies when updating.
Otherwise, we can use FEDERATED only.

@petervdonovan
Copy link
Contributor

I think that having tracing in different files (each specific to a particular endpoint) may lead to inconsistencies when updating.

This is true. There are pros and cons.

I think its fine to consider this a matter of opinion and keep it as-is for now. We can always change this later.

@lhstrh lhstrh changed the title Tracing federate interactions. Tracing federate interactions Mar 22, 2023
Copy link
Member

@lhstrh lhstrh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good overall, but there are some minor unaddressed issues. Once fixed, let's merge.

core/federated/RTI/rti.c Outdated Show resolved Hide resolved
core/federated/RTI/rti_lib.c Outdated Show resolved Hide resolved
core/trace.c Outdated Show resolved Hide resolved
core/trace.c Outdated Show resolved Hide resolved
core/trace.c Outdated Show resolved Hide resolved
core/trace.c Outdated Show resolved Hide resolved
core/trace.c Show resolved Hide resolved
@lhstrh lhstrh added the feature New feature label Mar 22, 2023
@edwardalee
Copy link
Contributor Author

edwardalee commented Mar 23, 2023

I think that having tracing in different files (each specific to a particular endpoint) may lead to inconsistencies when updating.

This is true. There are pros and cons.

I think its fine to consider this a matter of opinion and keep it as-is for now. We can always change this later.

As a practical matter, separating out the code for tracing of federated events is difficult because we would have to somehow split the enums that define event types. I’m sure it could be done, but I think it will complicate maintenance of the code and introduce opportunities for errors. So i vote for keeping the code together.

@edwardalee
Copy link
Contributor Author

Refactoring the tracing mechanism this way is much cleaner than in this PR. Thanks, Edward!
There is a longer overhead that appears from time to time though, due to flush_trace() call whenever the buffer is full.
Does it make sense to account for the execution time of flush_trace() and subtract it?
The aim is to make the maximum execution time of reactions agnostic to that overhead.

Oh yes, good point. It may not be necessary to subtract it, however. What if instead we ensure that the physical time of a tracepoint is always recorded after the flush for start events (reaction start, etc) and before the flush for end events? This suggests one more argument to the tracepoint function.

@lhstrh
Copy link
Member

lhstrh commented Mar 23, 2023

@ChadliaJerad, when a concrete code suggestion is given, you can also just click on "Commit suggestion" and save the time of copying the code, committing, and pushing. Also, once a comment has been addressed, please hit the "Resolve conversation" button (unless, of course, there is still more conversation to be had about the issue at hand). Thanks!

@edwardalee
Copy link
Contributor Author

This looks good overall, but there are some minor unaddressed issues. Once fixed, let's merge.

This looks good overall, but there are some minor unaddressed issues. Once fixed, let's merge.

I think all the issues have been addressed, but there is one that says "I agree with Peter. Let's handle this now, before merging." I have no idea what this refers to and there seems to be no way to mark it resolved.

@petervdonovan
Copy link
Contributor

The comment

I agree with Peter. Let's handle this now, before merging.

relates to this discussion on RTI-specific code in trace.c.

I read your comment and can kind of see why the change I requested might be difficult. I would like to avoid having a long discussion about this. If it turns out to be a problem later, then someone else can submit another PR.

Also, I think this minor comment might still be unresolved.

Copy link
Member

@lhstrh lhstrh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: the failing test is due to the runner not being able to download Gradle. This problem comes up with Windows runners because its caching is broken (see lf-lang/lingua-franca#1664)

@edwardalee edwardalee merged commit b363e49 into main Mar 24, 2023
@edwardalee edwardalee deleted the tracing-federates branch March 24, 2023 07:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants