Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Summary statistics for checkers #4018

Closed
wants to merge 4 commits into from

Conversation

cservakt
Copy link
Collaborator

It can be used to see the number of reports per checker for parse command. Before storing it is helpful to verify with the --summary flag, which checkers are generating too many reports in a large report directory. The result is a table that has a checker name and number of reports columns. It shows the results in descending order.

Example command: CodeChecker parse reports/ --summary

The result table would be:

---==== Checkers Summary Statistics ====----
-------------------------------------------------------------------------------
Checker name                                                | Number of reports
-------------------------------------------------------------------------------
readability-avoid-const-params-in-decls                     |            266836
modernize-use-trailing-return-type                          |            255116
readability-magic-numbers                                   |            138216
modernize-avoid-c-arrays                                    |            116741
-------------------------------------------------------------------------------
----=================----

@cservakt cservakt added CLI 💻 Related to the command-line interface, such as the cmd, store, etc. commands new feature 👍 New feature request labels Sep 22, 2023
@cservakt cservakt added this to the release 6.23.0 milestone Sep 22, 2023
@cservakt cservakt requested a review from Szelethus September 22, 2023 12:58
Copy link
Contributor

@bruntib bruntib left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be a useful feature, so thanks for the development. However, I'm not sure if its current implementation is a good direction.

This solution relies on the fact that grep utility is installed.

Another problem is that the output is printed in a custom format by an additional code. If I would search the implementation where this new summary table is assembled and printed, then I started with the files where all other statistics are put together: https://github.com/Ericsson/codechecker/blob/master/tools/report-converter/codechecker_report_converter/report/statistics.py

As far as I know, the goal of this patch is to provide an option which prints statistics quickly without parsing the .plist files. The implementation could use the metadata.json. This file could contain some statistics data that could be used. I'm not too familiar with the report-converter tool, but after some quick search I found that it knows about the metadata file. It also writes its content. So it is not an out-of-scope thing for this tool to read it too. So the Statistics class could either fetch its data from the metadata.json or fallback to the .plist files.

What do you think of this direction? Maybe we could continue some consultation about it, and in the meantime I'll get familiar with the report-converter.
Thank you!

@cservakt cservakt added the WIP 💣 Work In Progress label Sep 25, 2023
@cservakt cservakt requested a review from dkrupp as a code owner September 26, 2023 13:17
@cservakt cservakt force-pushed the checker-summary branch 4 times, most recently from 695d3ea to 9b564df Compare September 28, 2023 13:41
@cservakt cservakt requested a review from bruntib September 28, 2023 13:59
It can be used to see the number of reports per checker for 'parse' command. Before storing it is helpful to verify with the '--summary' flag, which checkers are generating too many reports in a large report directory. The result is a table that has a checker name and number of reports columns. It shows the results in descending order.
Example command: 'CodeChecker parse reports/ --summary'
the result table would be:
---==== Checkers Summary Statistics ====----
-------------------------------------------------------------------------------
Checker name                                                | Number of reports
-------------------------------------------------------------------------------
readability-avoid-const-params-in-decls                     |            266836
modernize-use-trailing-return-type                          |            255116
readability-magic-numbers                                   |            138216
modernize-avoid-c-arrays                                    |            116741
-------------------------------------------------------------------------------
----=================----
Copy link
Contributor

@Szelethus Szelethus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The number of reports per checker can be verified with

```sh
CodeChecker parse ./my_plists --summary
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like us to start dissociating from plists, because even though that has been the only file format CodeChecker since its inception, thats really not the point here. This is a report directory, regardless of what kinds of files we have in there (not to mention that have issues requesting us to handle sarif natively as well).

Internally, I also hear a lot of well intentioned but misleading discussions on this. We talk about plists, not reports, even well past the point of parsing. Unless we are specifically talking about the parsing process (not even parsing in general), we should omit mentioning the file format.

This is less of a comment on your PR, just my general gripe with CodeChecker.

Suggested change
CodeChecker parse ./my_plists --summary
CodeChecker parse ./my_results --summary

@@ -90,3 +90,20 @@ def add_report(self, report: Report):
self.checker_statistics[
Checker(report.checker_name, report.severity)] += 1
self.file_statistics[report.file.original_path] += 1

def write_checker_summary(self, checker_stats, out=sys.stdout):
""" Print checker summary statistics if it's available. """
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When is it available? I don't think the external factors are expected to change much, we can say a little more.

analyzer/codechecker_analyzer/cmd/parse.py Show resolved Hide resolved
Copy link
Member

@dkrupp dkrupp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add test cases. Otherwise looks ok.

Copy link
Member

@dkrupp dkrupp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The output of the summary and the normal CodeChecker parse command differs.
I guess they should print the same.

I suspect the root cause is that we don't do bug deduplacation at the calculation of this summary.
This is related:
#4042

This summary:

Checker name | Number of reports

cppcoreguidelines-special-member-functions | 23068
misc-misplaced-const | 345
bugprone-sizeof-expression | 335
clang-diagnostic-implicit-int-float-conversion | 85
clang-diagnostic-non-virtual-dtor | 68
clang-diagnostic-switch-enum | 65
clang-diagnostic-implicit-int-conversion | 48
google-global-names-in-headers | 44
cert-err09-cpp | 37
clang-diagnostic-implicit-fallthrough | 13
clang-diagnostic-misleading-indentation | 13
bugprone-misplaced-widening-cast | 6
clang-diagnostic-implicit-float-conversion | 6
clang-diagnostic-implicitly-unsigned-literal | 4
clang-diagnostic-double-promotion | 3
clang-diagnostic-unused-parameter | 2
clang-diagnostic-float-conversion | 1
clang-diagnostic-unused-variable | 1
misc-unconventional-assign-operator | 1
clang-diagnostic-sign-compare | 1

Parse:
----==== Checker Statistics ====----

Checker name | Severity | Number of reports

cppcoreguidelines-special-member-functions | LOW | 533
misc-misplaced-const | LOW | 7
clang-diagnostic-implicit-int-conversion | MEDIUM | 31
clang-diagnostic-switch-enum | MEDIUM | 65
bugprone-sizeof-expression | HIGH | 80
cert-err09-cpp | HIGH | 37
clang-diagnostic-implicit-fallthrough | MEDIUM | 13
clang-diagnostic-misleading-indentation | MEDIUM | 11
clang-diagnostic-float-conversion | MEDIUM | 1
bugprone-misplaced-widening-cast | HIGH | 3
clang-diagnostic-non-virtual-dtor | MEDIUM | 3
clang-diagnostic-unused-parameter | MEDIUM | 2
clang-diagnostic-implicit-int-float-conversion | MEDIUM | 17
clang-diagnostic-unused-variable | MEDIUM | 1
misc-unconventional-assign-operator | MEDIUM | 1
clang-diagnostic-sign-compare | MEDIUM | 1
google-global-names-in-headers | STYLE | 26
clang-diagnostic-implicit-float-conversion | MEDIUM | 6
clang-diagnostic-double-promotion | MEDIUM | 3
clang-diagnostic-implicitly-unsigned-literal | MEDIUM | 4

@dkrupp dkrupp modified the milestones: release 6.23.0, release 6.24.0 Oct 19, 2023
@whisperity whisperity marked this pull request as draft March 27, 2024 11:13
@cservakt
Copy link
Collaborator Author

cservakt commented Apr 2, 2024

This PR is not the appropriate implementation. After redesigning the report directory, there will be a correct solution to sum statistics.

@cservakt cservakt closed this Apr 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLI 💻 Related to the command-line interface, such as the cmd, store, etc. commands new feature 👍 New feature request WIP 💣 Work In Progress
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants