Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sof performance build enabling #6322

Closed
wants to merge 1 commit into from
Closed

Conversation

btian1
Copy link
Contributor

@btian1 btian1 commented Sep 22, 2022

Enable a specific performance profiling build to cover core/task/component
performance, this build use option -o app/perf_overlay.conf to enable.
it utilize perf_cnt module to measure performance.

Signed-off-by: btian1 [email protected]

@@ -185,6 +185,8 @@ def parse_args():
default=[], help="Paths to overlays")
parser.add_argument("-p", "--pristine", required=False, action="store_true",
help="Perform pristine build removing build directory.")
parser.add_argument("-s", "--perf", required=False, action="store_true",
help="enable performance build for core/task/module.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are only 26 letters and I don't think we should spend one for each overlay. It barely saves any typing compared to -o app/perf<TAB> and it's an overlay that few people (if any) will use regularly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

performance build is important, and for convinence, let's first use -s first, current only 7 chars are used, in future, if there is a
necessary, let's combine or use -o, for now, let's keep it, since this is the first time for enabling.

Copy link
Collaborator

@marc-hb marc-hb Sep 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the moment it's important... to you. Once enough people complain that typing -o app/perf<TAB> takes too much of their time then we can discuss again. This seems unlikely to me; I expect most people to use the "up" arrow most of the time.

It's not just about using one out 26 characters. It's also about using one of the most common characters in English and also about making this build script longer when it should become shorter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, let me remove the changes and use -o option, when I made these changes, I am not realized that -o can help me achieve this.

Copy link
Collaborator

@marc-hb marc-hb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how many independent changes there here but for sure this requires more than a single commit https://wiki.openstack.org/wiki/GitCommitMessages#Structural_split_of_changes

EDIT: always start a PR as a Github draft. This avoids reviewers seeing so many build failures.

@plbossart
Copy link
Member

Signed-off-by: btian1 [email protected]

Please fix your email address. it's either @intel.com or @linux.intel.com, thanks.

@btian1
Copy link
Contributor Author

btian1 commented Sep 22, 2022

Thanks, Fixed

@btian1
Copy link
Contributor Author

btian1 commented Sep 22, 2022

Sorry, first time uploading to sof, there is some build errors, i will fix them EOW

@btian1 btian1 force-pushed the performance branch 2 times, most recently from db6fa8e to 55c200e Compare September 23, 2022 02:50
@fredoh9 fredoh9 self-requested a review September 23, 2022 03:22
Copy link
Collaborator

@marc-hb marc-hb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A "performance build" sounds like a faster build. You mean a "Performance analysis" build or a "performance profiling" build, not a "performance" build.

Here are some terminology examples:
https://en.wikipedia.org/wiki/Perf_(Linux)
https://www.intel.com/content/www/us/en/develop/documentation/vtune-help/top/set-up-analysis-target/linux-targets/compiler-switches-for-perf-analysis-linux-targets.html

uint64_t cpu_ts = \
(uint64_t)perf_cnt_get_cpu_ts(); \
(pcd)->plat_delta_last = (uint32_t)(plat_ts - (pcd)->plat_ts); \
(pcd)->cpu_delta_last = (uint32_t)(cpu_ts - (pcd)->cpu_ts); \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No whitespace changes, they make it much harder to see what actually changed.

uint32_t plat_ts;
uint32_t cpu_ts;
uint64_t plat_ts;
uint64_t cpu_ts;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You must explain real code changes like this one.

The commit message sounds like it's merely adding a new build configuration but this does more than that.

This most likely requires multiple commits https://wiki.openstack.org/wiki/GitCommitMessages#Structural_split_of_changes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets keep these light at 32bit as 64 bit generates math generates many more instructions on 32bit ISAs. Fwiw, CCOUNT (cpu_ts) is 32 bit anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CONFIG_TIMER_HAS_64BIT_CYCLE_COUNTER this was enabled already, current cpu_ts and plat_ts are all get from:
k_cycle_get_64

32 bit need more implementation to avoid overflow, it is just for profiling and only be enabled with -o app/perf_overlay.conf.

@btian1 btian1 force-pushed the performance branch 3 times, most recently from e949cdb to 21dc0ed Compare September 23, 2022 06:57
@btian1
Copy link
Contributor Author

btian1 commented Sep 23, 2022

Please review, the only left error is not introduced by this PR, it is already there before the rule(checkpatch_list.sh) created.

@@ -0,0 +1,2 @@
CONFIG_PERFORMANCE_COUNTERS=y
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@btian1 Can you split this patch into 2 separate patches with the first one to introduce the macros and the second one for enabling perf build?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, Can you please fix your name in the sign-off?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let me check, I already tried to fix sign off.
ok, seems change too much in one patch is not a good way , let me split based on you, marc, Liam's requirement

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be better to keep the value NO?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrula-song this overlay is not used by default.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@btian1 The signed-off is still wrong, you should have proper full name there.
Also, I recommend @ranj063 's idea of splitting to two patches. One to move the task perf tracking to new interface and a separate patch to add the performance build.

* TODO: Remove this once the bug gets fixed.
*/
#ifndef __ZEPHYR__
#if CONFIG_PERFORMANCE_COUNTERS
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the above bug fixed by this commit ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not yet, let me fix that comp_copy first, then split patch for merge, according to songchao, it caused by xcc compiler, will check the exact error and try to fix it.

@@ -40,7 +40,7 @@ struct comp_buffer *buffer_new(const struct sof_ipc_buffer *desc)
{
struct comp_buffer *buffer;

tr_info(&buffer_tr, "buffer new size 0x%x id %d.%d flags 0x%x",
tr_info(&buffer_tr, "perf_mem buffer new size 0x%x id %d.%d flags 0x%x",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should put memory based profiling in another patch / PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@btian1 We have heap debug separately and that is probably better place to track the memory usage.

@btian1
Copy link
Contributor Author

btian1 commented Sep 26, 2022

Thanks all for your review, let me first split it to 2 PRs, will take Rajani and Marc's comments, change perf_cnt as a separate commit.
due to this module will cause the CI failure, it is always there.

@lgirdwood
Copy link
Member

@singalsu can you work with @btian1 and fold in your recent updates here too. Thanks !

Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll continue the review in next version. But looks good so far. We need to pay close attention to much trace output we change -- i.e. if there's no need to change formatting of a message, it should be kept. We also need to ensure we are still aligned with these enhancement requests:
#5518 and #5521

@@ -233,12 +233,14 @@ enum {
#endif /* #if defined(__ZEPHYR__) && defined(CONFIG_ZEPHYR_LOG) */

#define comp_perf_info(pcd, comp_p) \
comp_info(comp_p, "perf comp_copy peak plat %u cpu %u", \
comp_info(comp_p, "perf_cycle comp_copy peak plat %u cpu %u", \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't know all users of these performance traces, so I'd keep the names unchanged unless the semantics change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@@ -84,7 +84,7 @@ static void zephyr_domain_thread_fn(void *p1, void *p2, void *p3)

if (++runs == 1 << CYCLES_WINDOW_SIZE) {
cycles_sum >>= CYCLES_WINDOW_SIZE;
tr_info(&ll_tr, "ll timer avg %u, max %u, overruns %u",
tr_info(&ll_tr, "perf_cycle ll timer avg %u, max %u, overruns %u",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here I'd prefer to keep old naming unless there's a really good reason to change. We might have downstream users with scripts parsing this data, and this would be causing churn for them.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please be consistent with keyword-value pairs. Use everywhere comma as separator or just blanks to help make automatic parsing scripts for trace log. I'd also align the previous perf traces to same format.

@@ -40,7 +40,7 @@ struct comp_buffer *buffer_new(const struct sof_ipc_buffer *desc)
{
struct comp_buffer *buffer;

tr_info(&buffer_tr, "buffer new size 0x%x id %d.%d flags 0x%x",
tr_info(&buffer_tr, "perf_mem buffer new size 0x%x id %d.%d flags 0x%x",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@btian1 We have heap debug separately and that is probably better place to track the memory usage.

src/schedule/zephyr_ll.c Show resolved Hide resolved
@btian1
Copy link
Contributor Author

btian1 commented Sep 28, 2022

this patch depend on #6344, once #6344 merged, I will update a new patch set to cover above comments

@lgirdwood
Copy link
Member

lgirdwood commented Oct 3, 2022

@singalsu does your script update align with this PR and are all the data points you need available ?

@singalsu
Copy link
Collaborator

singalsu commented Oct 7, 2022

@singalsu does your script update align with this PR and are all the data points you need available ?

I need to try this myself to confirm it has everything needed. One known limitation is not in this PR but in IPC4 mtrace that does not provide component instance identifiers like sof-logger does. The topologies have multiple gain and mixer parts and we need to know for every instance separately the timer counts. So I added to traces additional identifier data but such should be only a temporary workaround.

@singalsu
Copy link
Collaborator

singalsu commented Oct 7, 2022

I get build fail with IPC4 (./sof/scripts/xtensa-build-zephyr.py -u -s -i IPC4 tgl):

In file included from /home/singalsu/work/current/sof/sof/src/ipc/ipc-helper.c:10:
/home/singalsu/work/current/sof/sof/zephyr/../src/include/sof/audio/component_ext.h: In function ‘comp_copy’:
/home/singalsu/work/current/sof/sof/zephyr/../src/include/sof/audio/component_ext.h:189: error: ‘__log_level’ undeclared (first use in this function)
/home/singalsu/work/current/sof/sof/zephyr/../src/include/sof/audio/component_ext.h:189: error: (Each undeclared identifier is reported only once
/home/singalsu/work/current/sof/sof/zephyr/../src/include/sof/audio/component_ext.h:189: error: for each function it appears in.)
/home/singalsu/work/current/sof/sof/zephyr/../src/include/sof/audio/component_ext.h:189: error: ‘__log_current_const_data’ undeclared (first use in this function)
/home/singalsu/work/current/sof/sof/zephyr/../src/include/sof/audio/component_ext.h:189: error: ‘__log_current_dynamic_data’ undeclared (first use in this function)

(uint32_t)((pcd)->plat_delta_peak), \
(uint32_t)((pcd)->cpu_delta_peak))

#define comp_perf_avg_info(pcd, comp_p) \
comp_info(comp_p, "perf comp_copy cpu avg %u (current peak %u)",\
comp_info(comp_p, "perf_cycle comp_copy samples %u period %u cpu avg %u peak %u",\
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's one extra blank in " peak". It's nicer for automatic parsing if there's no need to remove multiple successive blanks. It's good that you use only blanks as separators and removed the above () characters.

@singalsu
Copy link
Collaborator

singalsu commented Oct 7, 2022

@singalsu does your script update align with this PR and are all the data points you need available ?

This patch works OK with my script with IPC3 build. Here's an example of a playback to PCM0P device with sof-hda-generic.tplg.

Screenshot from 2022-10-07 16-15-39

@btian1 btian1 force-pushed the performance branch 4 times, most recently from 424919a to 1071675 Compare October 10, 2022 01:35
@btian1 btian1 requested review from singalsu and removed request for mrajwa October 10, 2022 07:06
@@ -0,0 +1,2 @@
CONFIG_PERFORMANCE_COUNTERS=y
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, start commit message sentences with capital letters. Doesn't checkpatch complain about this.

Enable a specific performance profiling build to cover core/task/component
performance, this build use option -o app/perf_overlay.conf to enable.
it utilize perf_cnt module to measure performance.

Signed-off-by: btian1 <[email protected]>
comp_info(comp_p, "perf comp_copy cpu avg %u (current peak %u)",\
comp_info(comp_p, "perf comp_copy samples %u period %u cpu avg %u peak %u",\
(uint32_t)((comp_p)->frames), \
(uint32_t)((comp_p)->period), \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

frames and period are defined as uint32_t in src/include/sof/audio/component.h. Why are they cast to uint32_t? Casts hide bugs from the compiler, we want fewer casts not more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW renaming comp_p to comp_dev_p would save a lot of time to people new to this code (which involves an unreasonable number of macro indirections for "functional programming in C")

Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still some unresolved issues in review, please see inline.

@@ -0,0 +1,2 @@
CONFIG_PERFORMANCE_COUNTERS=y
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@btian1 The signed-off is still wrong, you should have proper full name there.
Also, I recommend @ranj063 's idea of splitting to two patches. One to move the task perf tracking to new interface and a separate patch to add the performance build.

@btian1
Copy link
Contributor Author

btian1 commented Oct 12, 2022

due to this patch cover too much, I closed this PR and submit 3 new PR for your review, please goto:
#6406
#6407
#6408

to continue your review, this PR is closing.

Thanks
Tim

@btian1 btian1 closed this Oct 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants