-
Notifications
You must be signed in to change notification settings - Fork 324
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sof performance build enabling #6322
Conversation
scripts/xtensa-build-zephyr.py
Outdated
@@ -185,6 +185,8 @@ def parse_args(): | |||
default=[], help="Paths to overlays") | |||
parser.add_argument("-p", "--pristine", required=False, action="store_true", | |||
help="Perform pristine build removing build directory.") | |||
parser.add_argument("-s", "--perf", required=False, action="store_true", | |||
help="enable performance build for core/task/module.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are only 26 letters and I don't think we should spend one for each overlay. It barely saves any typing compared to -o app/perf<TAB>
and it's an overlay that few people (if any) will use regularly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
performance build is important, and for convinence, let's first use -s first, current only 7 chars are used, in future, if there is a
necessary, let's combine or use -o, for now, let's keep it, since this is the first time for enabling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the moment it's important... to you. Once enough people complain that typing -o app/perf<TAB>
takes too much of their time then we can discuss again. This seems unlikely to me; I expect most people to use the "up" arrow most of the time.
It's not just about using one out 26 characters. It's also about using one of the most common characters in English and also about making this build script longer when it should become shorter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, let me remove the changes and use -o option, when I made these changes, I am not realized that -o can help me achieve this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how many independent changes there here but for sure this requires more than a single commit https://wiki.openstack.org/wiki/GitCommitMessages#Structural_split_of_changes
EDIT: always start a PR as a Github draft. This avoids reviewers seeing so many build failures.
Please fix your email address. it's either @intel.com or @linux.intel.com, thanks. |
Thanks, Fixed |
Sorry, first time uploading to sof, there is some build errors, i will fix them EOW |
db6fa8e
to
55c200e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A "performance build" sounds like a faster build. You mean a "Performance analysis" build or a "performance profiling" build, not a "performance" build.
Here are some terminology examples:
https://en.wikipedia.org/wiki/Perf_(Linux)
https://www.intel.com/content/www/us/en/develop/documentation/vtune-help/top/set-up-analysis-target/linux-targets/compiler-switches-for-perf-analysis-linux-targets.html
xtos/include/sof/lib/perf_cnt.h
Outdated
uint64_t cpu_ts = \ | ||
(uint64_t)perf_cnt_get_cpu_ts(); \ | ||
(pcd)->plat_delta_last = (uint32_t)(plat_ts - (pcd)->plat_ts); \ | ||
(pcd)->cpu_delta_last = (uint32_t)(cpu_ts - (pcd)->cpu_ts); \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No whitespace changes, they make it much harder to see what actually changed.
xtos/include/sof/lib/perf_cnt.h
Outdated
uint32_t plat_ts; | ||
uint32_t cpu_ts; | ||
uint64_t plat_ts; | ||
uint64_t cpu_ts; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You must explain real code changes like this one.
The commit message sounds like it's merely adding a new build configuration but this does more than that.
This most likely requires multiple commits https://wiki.openstack.org/wiki/GitCommitMessages#Structural_split_of_changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets keep these light at 32bit as 64 bit generates math generates many more instructions on 32bit ISAs. Fwiw, CCOUNT (cpu_ts) is 32 bit anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CONFIG_TIMER_HAS_64BIT_CYCLE_COUNTER this was enabled already, current cpu_ts and plat_ts are all get from:
k_cycle_get_64
32 bit need more implementation to avoid overflow, it is just for profiling and only be enabled with -o app/perf_overlay.conf.
e949cdb
to
21dc0ed
Compare
Please review, the only left error is not introduced by this PR, it is already there before the rule(checkpatch_list.sh) created. |
@@ -0,0 +1,2 @@ | |||
CONFIG_PERFORMANCE_COUNTERS=y |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@btian1 Can you split this patch into 2 separate patches with the first one to introduce the macros and the second one for enabling perf build?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, Can you please fix your name in the sign-off?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let me check, I already tried to fix sign off.
ok, seems change too much in one patch is not a good way , let me split based on you, marc, Liam's requirement
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it be better to keep the value NO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andrula-song this overlay is not used by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* TODO: Remove this once the bug gets fixed. | ||
*/ | ||
#ifndef __ZEPHYR__ | ||
#if CONFIG_PERFORMANCE_COUNTERS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the above bug fixed by this commit ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not yet, let me fix that comp_copy first, then split patch for merge, according to songchao, it caused by xcc compiler, will check the exact error and try to fix it.
src/ipc/ipc-helper.c
Outdated
@@ -40,7 +40,7 @@ struct comp_buffer *buffer_new(const struct sof_ipc_buffer *desc) | |||
{ | |||
struct comp_buffer *buffer; | |||
|
|||
tr_info(&buffer_tr, "buffer new size 0x%x id %d.%d flags 0x%x", | |||
tr_info(&buffer_tr, "perf_mem buffer new size 0x%x id %d.%d flags 0x%x", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should put memory based profiling in another patch / PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@btian1 We have heap debug separately and that is probably better place to track the memory usage.
Thanks all for your review, let me first split it to 2 PRs, will take Rajani and Marc's comments, change perf_cnt as a separate commit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll continue the review in next version. But looks good so far. We need to pay close attention to much trace output we change -- i.e. if there's no need to change formatting of a message, it should be kept. We also need to ensure we are still aligned with these enhancement requests:
#5518 and #5521
src/include/sof/audio/component.h
Outdated
@@ -233,12 +233,14 @@ enum { | |||
#endif /* #if defined(__ZEPHYR__) && defined(CONFIG_ZEPHYR_LOG) */ | |||
|
|||
#define comp_perf_info(pcd, comp_p) \ | |||
comp_info(comp_p, "perf comp_copy peak plat %u cpu %u", \ | |||
comp_info(comp_p, "perf_cycle comp_copy peak plat %u cpu %u", \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't know all users of these performance traces, so I'd keep the names unchanged unless the semantics change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
src/schedule/zephyr_domain.c
Outdated
@@ -84,7 +84,7 @@ static void zephyr_domain_thread_fn(void *p1, void *p2, void *p3) | |||
|
|||
if (++runs == 1 << CYCLES_WINDOW_SIZE) { | |||
cycles_sum >>= CYCLES_WINDOW_SIZE; | |||
tr_info(&ll_tr, "ll timer avg %u, max %u, overruns %u", | |||
tr_info(&ll_tr, "perf_cycle ll timer avg %u, max %u, overruns %u", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think here I'd prefer to keep old naming unless there's a really good reason to change. We might have downstream users with scripts parsing this data, and this would be causing churn for them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please be consistent with keyword-value pairs. Use everywhere comma as separator or just blanks to help make automatic parsing scripts for trace log. I'd also align the previous perf traces to same format.
src/ipc/ipc-helper.c
Outdated
@@ -40,7 +40,7 @@ struct comp_buffer *buffer_new(const struct sof_ipc_buffer *desc) | |||
{ | |||
struct comp_buffer *buffer; | |||
|
|||
tr_info(&buffer_tr, "buffer new size 0x%x id %d.%d flags 0x%x", | |||
tr_info(&buffer_tr, "perf_mem buffer new size 0x%x id %d.%d flags 0x%x", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@btian1 We have heap debug separately and that is probably better place to track the memory usage.
@singalsu does your script update align with this PR and are all the data points you need available ? |
I need to try this myself to confirm it has everything needed. One known limitation is not in this PR but in IPC4 mtrace that does not provide component instance identifiers like sof-logger does. The topologies have multiple gain and mixer parts and we need to know for every instance separately the timer counts. So I added to traces additional identifier data but such should be only a temporary workaround. |
I get build fail with IPC4 (
|
src/include/sof/audio/component.h
Outdated
(uint32_t)((pcd)->plat_delta_peak), \ | ||
(uint32_t)((pcd)->cpu_delta_peak)) | ||
|
||
#define comp_perf_avg_info(pcd, comp_p) \ | ||
comp_info(comp_p, "perf comp_copy cpu avg %u (current peak %u)",\ | ||
comp_info(comp_p, "perf_cycle comp_copy samples %u period %u cpu avg %u peak %u",\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's one extra blank in " peak". It's nicer for automatic parsing if there's no need to remove multiple successive blanks. It's good that you use only blanks as separators and removed the above () characters.
This patch works OK with my script with IPC3 build. Here's an example of a playback to PCM0P device with sof-hda-generic.tplg. |
424919a
to
1071675
Compare
@@ -0,0 +1,2 @@ | |||
CONFIG_PERFORMANCE_COUNTERS=y |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, start commit message sentences with capital letters. Doesn't checkpatch complain about this.
Enable a specific performance profiling build to cover core/task/component performance, this build use option -o app/perf_overlay.conf to enable. it utilize perf_cnt module to measure performance. Signed-off-by: btian1 <[email protected]>
comp_info(comp_p, "perf comp_copy cpu avg %u (current peak %u)",\ | ||
comp_info(comp_p, "perf comp_copy samples %u period %u cpu avg %u peak %u",\ | ||
(uint32_t)((comp_p)->frames), \ | ||
(uint32_t)((comp_p)->period), \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
frames
and period
are defined as uint32_t
in src/include/sof/audio/component.h
. Why are they cast to uint32_t
? Casts hide bugs from the compiler, we want fewer casts not more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW renaming comp_p
to comp_dev_p
would save a lot of time to people new to this code (which involves an unreasonable number of macro indirections for "functional programming in C")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still some unresolved issues in review, please see inline.
@@ -0,0 +1,2 @@ | |||
CONFIG_PERFORMANCE_COUNTERS=y |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enable a specific performance profiling build to cover core/task/component
performance, this build use option -o app/perf_overlay.conf to enable.
it utilize perf_cnt module to measure performance.
Signed-off-by: btian1 [email protected]