Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug: match-results-forward command generates invalid Sarif (duplicated rules). #2505

Merged
merged 13 commits into from
Aug 19, 2022

Conversation

yongyan-gh
Copy link
Collaborator

@yongyan-gh yongyan-gh commented Jul 20, 2022

Description

Fixes for below issue:
match-results-forward command generates invalid SARIF when results contain SubRuleId (#2486)

  • When merging results and rules in multiple runs, we create a rule-id to index mapping (Dictioanry) to keep tracking all referenced rules. It uses ruleId from result entity as the key. If there are multiple unique sub ruleIds of a parent rule in the results, it caches the same parent rule for each unique sub ruleId, which caused the invalid Sarif output.

ruleId = possilbeParentRuleId;
}
}

if (!string.IsNullOrEmpty(ruleId))
{
if (RuleIdToIndex.TryGetValue(ruleId, out int ruleIndex))
Copy link
Collaborator Author

@yongyan-gh yongyan-gh Jul 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RuleIdToIndex

It uses this rule-id to index mapping to track all referenced rules.
If there are multiple unique sub ruleIds of a same parent rule in the results, it caches the same parent rule for each unique sub ruleId, which caused the invalid Sarif output (same rule duplicated multiple times) #ByDesign

@yongyan-gh yongyan-gh marked this pull request as ready for review July 22, 2022 16:44
@@ -278,6 +278,7 @@ private List<ExtractedResult> ExtractResultsFromRuns(IEnumerable<Run> sarifRuns)
run.Graphs = graphs;
run.Invocations = invocations;
run.Properties = properties;
run.VersionControlProvenance = MergeVersionControlDetails(currentRuns.Concat(previousRuns));
Copy link
Collaborator

@eddynaka eddynaka Jul 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MergeVersionControlDetails

can you verify if we have other relevant properties that we should keep? #Resolved

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are couple properties of run like Addresses/WebRequest/WebResponse etc are not been set.
I am not sure if any special requirements to merge these properties. Can I just combine the values of the properties in the final run object?

Copy link
Collaborator Author

@yongyan-gh yongyan-gh Jul 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved this change to another PR. resolving this comment.

{
ruleId = possilbeParentRuleId;
}
}
Copy link
Collaborator

@eddynaka eddynaka Jul 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we replace this for a smaller logic?

ReportingDescriptor rule = result.GetRule(CurrentRun);
use rule everywhere when possible. #Resolved

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree. The additional checks is to make sure the case is a rule with sub rule. I can trust result.GetRule(...) and remove these checks.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated with simple implementation


RuleIdToIndex[ruleId] = newIndex;
node.RuleIndex = newIndex;
}

node.RuleId = ruleId;
node.RuleId = resultRuleId;
Copy link
Collaborator

@eddynaka eddynaka Jul 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resultRuleId

why do we have to change this? #Resolved

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we remove this entire line or just replace if node.ruleid is null/empty?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, why are we using resultRuleId when the remainder of the logic in this if statement is around using ruleId?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The resultRuleId keeps the value from result entity, ruleId is the value from referenced rule entity. They may be different if result uses a sub rule id of the rule.
e.g.
in rules array there is a rule id is "SEC105/003", value of ruleId should be "SEC105/003"
a result has rule id "SEC105/003/maybenull", which is a sub rule of above rule, resultRuleId keeps this id.
ruleId should be used as key of the cache RuleIdToIndex.
resultRuleId used to set on result rule id.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated the implementation pls see latest iteration

@marmegh
Copy link
Contributor

marmegh commented Jul 22, 2022

We also need a release history update for this.


In reply to: 1192800154


In reply to: 1192800154


In reply to: 1192800154

@@ -78,7 +78,23 @@ public override Result VisitResult(Result node)
}

// Cache RuleId and set Result.RuleIndex to the (new) index
string ruleId = node.ResolvedRuleId(CurrentRun);
string resultRuleId = node.ResolvedRuleId(CurrentRun);
string ruleId = resultRuleId;
Copy link
Contributor

@marmegh marmegh Jul 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need both of these? Why isn't this just:
string ruleId = node.ResolvedRuleId(CurrentRun);? #Resolved

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was to keep the value of the result's rule id and can be used later, i updated the implemenation pls see the latest iteration.

int newIndex = Rules.Count;
Rules.Add(rule);
Rules.Add(rule ?? node.GetRule(CurrentRun));
Copy link
Collaborator

@eddynaka eddynaka Jul 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?? node.GetRule(CurrentRun)

you can remove this, because is exact the same as line 82 #Resolved

@eddynaka
Copy link
Collaborator

Also, the title of the PR should tell more about the bug. What bug are you fixing? Is it one? Many? Should be in the same PR or should be separate?


In reply to: 1192800154

@yongyan-gh yongyan-gh changed the title ResultMatching bug fixes Fixes 2 bugs of match-results-forward command: invalid Sarif (duplicated rules) and missing VersionControlProvenance data. Jul 25, 2022
@yongyan-gh
Copy link
Collaborator Author

updated release history. since both issue are related to result-match-foward command, i put the fix in same pr.


In reply to: 1193015158


private static IList<VersionControlDetails> MergeVersionControlDetails(IEnumerable<Run> runs)
{
// use HashSet with EqualityComparer to avoid duplicated entries
Copy link
Collaborator

@shaopeng-gh shaopeng-gh Jul 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

u

Cap and add . at the end #Closed

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove the entire comment. I don't see the difference between comment and the actual code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed the comments.

@@ -1,5 +1,10 @@
# SARIF Package Release History (SDK, Driver, Converters, and Multitool)

## Unreleased

* BUGFIX: Fix issue `match-results-forward` command generates invalid Sarif when results contain sub ruleId. [#2486](https://github.com/microsoft/sarif-sdk/pull/2486)
Copy link
Collaborator

@eddynaka eddynaka Jul 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BUGFIX: Fix issue match-results-forward

I would either separate in two PRs so each PR has its own fix. Or, make an improvement in the text to be one entry that explains the current behavior for both issues and point to the correct PullRequest Id. #Resolved

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separated into 2 PRs.

@yongyan-gh
Copy link
Collaborator Author

remove fix for #2487 to another PR. keep only fix for #2486


In reply to: 1194682619

@yongyan-gh yongyan-gh changed the title Fixes 2 bugs of match-results-forward command: invalid Sarif (duplicated rules) and missing VersionControlProvenance data. Fix bug: match-results-forward command generate invalid Sarif (duplicated rules). Jul 27, 2022
@yongyan-gh yongyan-gh changed the title Fix bug: match-results-forward command generate invalid Sarif (duplicated rules). Fix bug: match-results-forward command generates invalid Sarif (duplicated rules). Aug 5, 2022
@@ -1,5 +1,9 @@
# SARIF Package Release History (SDK, Driver, Converters, and Multitool)

## Unreleased

* BUGFIX: Fix issue `match-results-forward` command generates invalid Sarif when results contain sub ruleId. [#2486](https://github.com/microsoft/sarif-sdk/pull/2486)
Copy link
Contributor

@marmegh marmegh Aug 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BUGFIX: Fix issue match-results-forward command generates invalid Sarif when results contain sub ruleId. #2486

I'm not sure if this is just unclear or not descriptive enough. What was invalid about the SARIF results before this fix? #Resolved

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue of invalid Sarif produced before the fix is it contains duplicated rule definitions in rules array if the Sarif results contain results have sub ruleId.

E.g. if there are 2 results' ruleId is "TESTRULE/001", the rules array creates 2 duplicated rules of "TESTRULE".

Will update the description with the duplicated rules issue, and update the issue with the detail errors. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, this is fixing an issue where the match-results-forward command generates Sarif with duplicated results if there is a sub ruleId?
Yes, please update both.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And just to be clear, also update the release note here. If the above captures everything, you can copy/paste most of it. Don't forget the backticks around ruleId.

string ruleId = node.ResolvedRuleId(CurrentRun);
string resultRuleId = node.ResolvedRuleId(CurrentRun);
ReportingDescriptor rule = node.GetRule(CurrentRun);
string ruleId = rule?.Id ?? resultRuleId;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

string ruleId = rule?.Id ?? resultRuleId;

Is there a case where rule.Id might not be null and resultRuleId may be another value?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if the ruleId from result (resultRuleId) is a sub ruleId. The rule's Id (rule.Id) is the parent ruleId.

e.g. a result has sub rule id ("SEC105/001/allocator"), and a rule index points to its parent rule definition in rules array.

      "results": [
        {
          "ruleId": "SEC105/001/allocator",
          "ruleIndex": 0,
          ...
        }
          "rules": [
            {
              "id": "SEC105/001", // index 0
              ...
            }

The fix here is to use the parent ruleId instead of sub ruleId as the RuleIdToIndex dictionary's key, to avoid the (parent) rule definition to be added multiple times.

{
subRuleId.StartsWith(parentRuleId).Should().BeTrue();
}
}
Copy link
Contributor

@marmegh marmegh Aug 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can consolidate all of this:

int rule_001Count = 0;
int rule_002Count = 0;
int rule_003Count = 0;
int rule004Count = 0;
int rule_005Count = 0;
int rule_006Count = 0;
int rule_DoesNotExistCount = 0;

mergedRun.Results.Count.Should().Be(11); //Even better if you have a way of calculating this expected value from the inputs above.

foreach (Result result in mergedRune.Results)
{
    ...// keep everything the same and add something like this:
   if(result.Id == "Rule/001")
   {
       rule_001Count+=1;
   }
}

rule_001Count.Should().Be(expectedRule001Count);
//etc

We're trying to avoid iterating over the same lists multiple times. Let me know if this makes sense.
#Resolved

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree, updated & removed duplicated iterations

@marmegh
Copy link
Contributor

marmegh commented Aug 10, 2022

            throw new InvalidOperationException("RemapIndicesVisitor requires CurrentRun to be set before Visiting Results.");

Code coverage (via Live Unit Testing) shows this is the only segment of code in VisitResults that is not covered by a test. Stepped through the added tests and confirmed they cover remaining if/else blocks. All changed code appears fully covered.

If at all possible to sneak in a unit test to cover this section, that would help improve our code coverage overall.


In reply to: 1211334197


In reply to: 1211334197


Refers to: src/Sarif/Visitors/RunMergingVisitor.cs:77 in 9548a62. [](commit_id = 9548a62, deletion_comment = False)

@yongyan-gh yongyan-gh requested a review from cfaucon as a code owner August 11, 2022 08:49
@yongyan-gh
Copy link
Collaborator Author

            throw new InvalidOperationException("RemapIndicesVisitor requires CurrentRun to be set before Visiting Results.");

test case added for the case CurrentRun is null.


In reply to: 1211334197


Refers to: src/Sarif/Visitors/RunMergingVisitor.cs:77 in 9548a62. [](commit_id = 9548a62, deletion_comment = False)

@shaopeng-gh
Copy link
Collaborator

shaopeng-gh commented Aug 18, 2022

Need to resolve the conflicts.


In reply to: 1218847548

Copy link
Collaborator

@shaopeng-gh shaopeng-gh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@yongyan-gh
Copy link
Collaborator Author

resolved


In reply to: 1218847548

@@ -1,5 +1,9 @@
# SARIF Package Release History (SDK, Driver, Converters, and Multitool)

## Unreleased

* BUGFIX: Fix an issue where the `match-results-forward` command generates Sarif with duplicated rules if there are results with sub `ruleId`. [#2486](https://github.com/microsoft/sarif-sdk/pull/2486)
Copy link
Member

@michaelcfanning michaelcfanning Aug 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix an issue where the match-results-forward command generates Sarif with duplicated rules if there are results with sub `ruleId

Please change this to:

Remove duplicated rule definitions when executing match-results-forward commands for results with sub-rule ids. #Resolved

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

Copy link
Member

@michaelcfanning michaelcfanning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@michaelcfanning michaelcfanning merged commit 7dbdf41 into main Aug 19, 2022
@michaelcfanning michaelcfanning deleted the users/yongyan-gh/ResultMatchingBugFix branch August 19, 2022 21:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants