Track how many times patterns are obfuscated #65

Merged
merged 14 commits into from
Sep 21, 2021

Conversation

josefkarasek

Signed-off-by: Josef Karasek [email protected]

@openshift-ci
Contributor

openshift-ci bot commented Sep 16, 2021

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 16, 2021
@josefkarasek josefkarasek changed the title Track how many times patters was obfuscated Track how many times pattern was obfuscated Sep 16, 2021
@josefkarasek josefkarasek changed the title Track how many times pattern was obfuscated Track how many times patterns are obfuscated Sep 16, 2021
@openshift-ci
Contributor

openshift-ci bot commented Sep 16, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: josefkarasek

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 16, 2021
pkg/obfuscator/tracker.go (outdated, resolved)
@@ -15,11 +15,11 @@ type ReplacementTracker interface {
 Initialize(replacements map[string]string)

 // Report returns a mapping of strings which were replaced.
-Report() map[string]string
+Report() map[string]Replacement
Contributor

@tjungblu tjungblu Sep 16, 2021

here's what I would return instead:

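type ReplacementReport struct {
    replacements []Replacement
}

type Replacement struct {
    original    string
    replaced    string
    occurrences int
}

// for backwards compatibility in all our tests
// (assumes the legacy map is keyed by the original string)
func (r ReplacementReport) asMap() map[string]string {
    m := make(map[string]string, len(r.replacements))
    for _, rep := range r.replacements {
        m[rep.original] = rep.replaced
    }
    return m
}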

that way we can serialize the list in yaml correctly, can properly sort on the struct level and do diffs in the e2e tests.

Contributor

in the implementation, you can just keep a second map for the counts and on Report() just collate, similar to the defensive copy we do
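
A minimal sketch of that suggestion, assuming a tracker that keeps a second map for the counts and collates both maps into a sorted slice on `Report()`. The names `countingTracker`, `newCountingTracker` and `Add` are hypothetical and are not taken from this PR.

```go
package main

import (
	"sort"
	"sync"
)

// Replacement is a hypothetical report entry; the fields follow the suggestion above.
type Replacement struct {
	Original    string
	Replaced    string
	Occurrences int
}

// countingTracker is an illustrative stand-in, not the tracker from this PR.
type countingTracker struct {
	lock    sync.Mutex
	mapping map[string]string // original -> replacement
	counts  map[string]int    // original -> number of times it was replaced
}

func newCountingTracker() *countingTracker {
	return &countingTracker{mapping: map[string]string{}, counts: map[string]int{}}
}

// Add records one replacement and bumps its counter.
func (t *countingTracker) Add(original, replaced string) {
	t.lock.Lock()
	defer t.lock.Unlock()
	t.mapping[original] = replaced
	t.counts[original]++
}

// Report collates both maps into a fresh slice sorted by Original.
// The slice is a copy, so callers never hold a reference to internal state.
func (t *countingTracker) Report() []Replacement {
	t.lock.Lock()
	defer t.lock.Unlock()
	out := make([]Replacement, 0, len(t.mapping))
	for original, replaced := range t.mapping {
		out = append(out, Replacement{
			Original:    original,
			Replaced:    replaced,
			Occurrences: t.counts[original],
		})
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Original < out[j].Original })
	return out
}
```

Sorting inside `Report()` keeps the serialized output stable between runs, which is what makes diffs in e2e tests practical.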

pkg/obfuscator/tracker.go (outdated, 3 resolved threads)
pkg/obfuscator/ip.go (outdated, resolved)
Josef Karasek added 6 commits September 17, 2021 16:11
Introduction of a hierarchical Report schema with normalized keys.
```
    Canonical: "EB:A1:2A:B2:09:BF"
    Replacement: "x-mac-0000000001-x"
    Occurrences:
    - Original: "eb-a1-2a-b2-09-bf"
      Count: 5
    - Original: "eb:a1:2a:b2:09:bf"
      Count: 15
```
Normalized keys apply to:
* IPv4 addresses - dots `'.'`: `255.255.255.255`
* MAC addresses - upper case + `':'`: `29:7E:8C:8A:60:D9`
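
The two normalization rules listed above could be sketched as follows. This is only an illustration under the assumption that `-` is the sole alternative separator; `normalizeIPv4` and `normalizeMAC` are hypothetical helper names, not functions from this PR.

```go
package main

import "strings"

// normalizeIPv4 rewrites dash-separated IPv4 addresses to the dotted form.
func normalizeIPv4(s string) string {
	return strings.ReplaceAll(s, "-", ".")
}

// normalizeMAC rewrites a MAC address to upper case with ':' separators.
func normalizeMAC(s string) string {
	return strings.ToUpper(strings.ReplaceAll(s, "-", ":"))
}
```

With these helpers, both `eb-a1-2a-b2-09-bf` and `eb:a1:2a:b2:09:bf` map to the canonical key `EB:A1:2A:B2:09:BF` shown in the report excerpt.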
Josef Karasek added 2 commits September 20, 2021 17:10
Occurrences of values are tied to the specific
format in which they appear in the source.

For example:
`192-168-1-10` and `192.168.1.10` are the same IP address,
just written differently.
In the report, both are tied to the same normalized key
`192.168.1.10`, and each has its own number
of occurrences.
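
As a rough sketch of the behaviour this commit message describes, the function below groups raw matches under their normalized key while counting each spelling separately. `countByFormat` is a hypothetical name, and the nested map is just one possible representation, not the schema used in the PR.

```go
package main

import "strings"

// countByFormat groups raw IPv4 matches under their dotted canonical form
// and counts how often each original spelling appeared.
// Result shape: canonical key -> original spelling -> count.
func countByFormat(matches []string) map[string]map[string]uint {
	report := map[string]map[string]uint{}
	for _, m := range matches {
		canonical := strings.ReplaceAll(m, "-", ".")
		if report[canonical] == nil {
			report[canonical] = map[string]uint{}
		}
		report[canonical][m]++
	}
	return report
}
```

Feeding it `192-168-1-10` once and `192.168.1.10` twice yields a single key `192.168.1.10` with two entries, counted 1 and 2.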
@openshift openshift deleted a comment from josefkarasek Sep 20, 2021
@@ -48,6 +48,7 @@ func (d *domainObfuscator) replaceDomains(input string) string {
 replacement = obfuscatedBaseDomain
 }
 output = strings.ReplaceAll(output, m[0], replacement)
+d.ReplacementTracker.AddReplacement(baseDomain, baseDomain, obfuscatedBaseDomain)
Contributor

shouldn't that be:

d.ReplacementTracker.AddReplacement(baseDomain, m[0], obfuscatedBaseDomain)

?

Author

I thought that too, but the unit tests were failing when I tried it.

fmt.Println(m)
[docs.okd.io docs. okd.io]
...
expected: map[string]string{"okd.io":"domain0000000002", "openshift.com":"domain0000000001"}
actual  : map[string]string{"docs.okd.io":"domain0000000002", "openshift.com":"domain0000000001"}

Author

Might it be that the unit tests have different expectations?

Contributor

The map representation has now changed a bit; the report will show the right thing. There is also the side effect of the generator changes below.

I reckon the tests here should then use the Report directly instead of the map.

Author

I changed the unit tests to use the actual reports
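
For readers unfamiliar with the `m` slice printed earlier in this thread: a submatch slice holds the full match at index 0 followed by one entry per capture group. The sketch below reproduces that shape with a made-up pattern; `domainPattern` and `splitDomain` are assumptions for illustration and are not the regular expression or code used by the domain obfuscator.

```go
package main

import "regexp"

// domainPattern is a hypothetical pattern: an optional subdomain prefix
// followed by the base domain okd.io.
var domainPattern = regexp.MustCompile(`([a-z0-9-]+\.)?(okd\.io)`)

// splitDomain returns the full match, the subdomain prefix and the base domain
// of the first match in input.
func splitDomain(input string) (full, sub, base string, ok bool) {
	m := domainPattern.FindStringSubmatch(input)
	if m == nil {
		return "", "", "", false
	}
	return m[0], m[1], m[2], true
}
```

Reporting `m[0]` records the full spelling (`docs.okd.io`) as the original, while the base domain (`okd.io`) stays the canonical key, which is the distinction the thread settles on.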

@@ -58,12 +58,12 @@ func (o *ipObfuscator) replace(s string) string {
 continue
 }

-cleaned := strings.ReplaceAll(m, "-", ".")
+cleaned := strings.ToUpper(strings.ReplaceAll(m, "-", "."))
Contributor

Not sure what ToUpper does with numbers; what's the rationale for adding that?

Author

IPv6 addresses

Contributor

dang, makes sense. another good reason to split this up. Is there no test case that needs adaptation to this?

Author

I added 2 new test cases
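
To make the answer above concrete: `strings.ToUpper` leaves digits and punctuation untouched and only affects the hexadecimal letters of an IPv6 address. The helper below wraps the expression from the diff; the name `canonicalIP` is hypothetical.

```go
package main

import "strings"

// canonicalIP mirrors the expression from the diff: dashes become dots,
// then letters are upper-cased. Digits are unaffected by ToUpper.
func canonicalIP(m string) string {
	return strings.ToUpper(strings.ReplaceAll(m, "-", "."))
}
```

As a result, two spellings of the same IPv6 address that differ only in letter case end up under one canonical key, while IPv4 addresses are changed only by the separator replacement.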

pkg/obfuscator/ip_test.go (resolved)
pkg/obfuscator/keywords.go (resolved)
Count uint `yaml:"count,omitempty"`
}

func (r *Replacement) Increment(original string, count uint) Replacement {
Contributor

you should definitely add some tests for this method

Contributor

and why do you return the Replacement?

Contributor

I mean, you already have a pointer to the replacement, you can just increment the counter directly without returning it.

Author

👍 on tests

I'll review how Replacement is used
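
A small sketch of the reviewer's point, assuming the occurrences are kept in a map keyed by the original spelling (that layout is an assumption; only the `Canonical` and `ReplacedWith` field names appear in this PR). With a pointer receiver the method can mutate the struct in place and needs no return value.

```go
package main

// Replacement uses field names visible in this PR; the Occurrences map is an
// assumed layout chosen for illustration.
type Replacement struct {
	Canonical    string
	ReplacedWith string
	Occurrences  map[string]uint // original spelling -> count
}

// Increment adds count to the tally for the given original spelling.
// It mutates the receiver directly, so nothing is returned.
func (r *Replacement) Increment(original string, count uint) {
	if r.Occurrences == nil {
		r.Occurrences = map[string]uint{}
	}
	r.Occurrences[original] += count
}
```

One consequence worth noting: values stored in a `map[string]Replacement` are not addressable, so calling a pointer-receiver method on them requires storing `*Replacement` in the map instead (or copying the value out and writing it back).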

Contributor

Also, what the reporting below does is nothing other than a good old std::multimap. I'm sure there must be a good library for us to reuse?

return
}
s.mapping[original] = replacement
new := Replacement{Canonical: canonical, ReplacedWith: replacement}
new.Increment(original, count)
Contributor

I'd add a NewReplacement method that would add the increment internally

Author

added
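
One possible shape for such a constructor, sketched under the assumption that occurrences are kept in a map keyed by the original spelling. The signature of `NewReplacement` here is a guess and may differ from what was actually added in the PR.

```go
package main

// Replacement uses field names visible in this PR; the Occurrences map is an
// assumed layout chosen for illustration.
type Replacement struct {
	Canonical    string
	ReplacedWith string
	Occurrences  map[string]uint // original spelling -> count
}

// NewReplacement builds a Replacement and records the first occurrence
// internally, so callers do not need a separate Increment call.
func NewReplacement(canonical, original, replacement string, count uint) *Replacement {
	return &Replacement{
		Canonical:    canonical,
		ReplacedWith: replacement,
		Occurrences:  map[string]uint{original: count},
	}
}
```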

klog.Exitf("'%s' already has a value reported as '%s', tried to report '%s'", original, val, replacement)
}
if r, ok := s.mapping[canonical]; ok {
s.mapping[canonical] = r.Increment(original, count)
return
Contributor

that's a bit nitty, but instead of return you can also put the other clause into an else branch

Author

added

-r := generator()
-s.mapping[key] = r
-return r
+return generator()
Contributor

@tjungblu tjungblu Sep 20, 2021

don't we have to save that in our mapping? Otherwise we will generate a different identifier for the same canonical key.

also be mindful of race conditions, since everything you return here is outside the lock

Author

I modified AddReplacement so that it takes the canonical key (cleaned version for IP and MAC) along with the original key

AddReplacement(canonical, original string, replacement string)

Contributor

The reason it was there is that there can be a race condition where two goroutines find an empty mapping for the canonical string, and the generator then creates two representations for the same string.

Having this here under a single lock guaranteed that this could never happen. You also removed the Exitf that was basically covering that case should it happen. Put that Exitf back and try it out on a couple of MGs; it should happen fairly frequently.

-r := generator()
-s.mapping[key] = r
-return r
+return generator()
}

func (s *SimpleTracker) Initialize(replacements map[string]string) {
Contributor

I'm not sure whether this is currently used, probably not. But that ideally should take your ReplacementReport. I assume that also is touching @sairameshv a bit - fyi.

omissions []string
}

var _ Reporter = (*SimpleReporter)(nil)
Contributor

what does that do? should we typedef this thing?

Author

It is a compile-time check that the methods of SimpleReporter implement the Reporter interface.

Contributor

@tjungblu tjungblu Sep 20, 2021

mind blown, why are we not returning the Reporter interface in the New func below? that should do the same 👯

Author

Good point. I'll change that.
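
Both approaches from this thread are sketched below. The `Reporter` interface here has a single made-up method, `Omissions`, purely for illustration; the real interface in the PR has a different method set.

```go
package main

// Reporter is a hypothetical one-method interface for illustration.
type Reporter interface {
	Omissions() []string
}

// SimpleReporter is a pared-down stand-in for the type in the PR.
type SimpleReporter struct {
	omissions []string
}

// Omissions returns a copy of the recorded omissions.
func (s *SimpleReporter) Omissions() []string {
	return append([]string(nil), s.omissions...)
}

// Option 1: a blank-identifier assignment. It allocates nothing at runtime and
// fails to compile if *SimpleReporter stops satisfying Reporter.
var _ Reporter = (*SimpleReporter)(nil)

// Option 2: return the interface type from the constructor. The return
// statement only compiles if *SimpleReporter satisfies Reporter.
func NewSimpleReporter() Reporter {
	return &SimpleReporter{}
}
```

Either option gives the same compile-time guarantee; the blank assignment is mainly useful when the constructor has to return the concrete type.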

@@ -85,12 +87,22 @@ func (s *SimpleTracker) GenerateIfAbsent(canonical string, original string, coun
return g
}

func (s *SimpleTracker) Initialize(replacements map[string]string) {
func (s *SimpleTracker) Initialize(report ReplacementReport) {
close(onlyOneInit) // panics when called twice
Contributor

nifty!
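
The trick relies on the fact that closing an already-closed channel panics in Go. A self-contained sketch, in which `initialize` stands in for the real `Initialize` method and `panics` is a hypothetical helper used only to observe the behaviour:

```go
package main

// onlyOneInit guards against repeated initialization.
var onlyOneInit = make(chan struct{})

// initialize stands in for the real Initialize method.
// The second call panics with "close of closed channel".
func initialize() {
	close(onlyOneInit)
}

// panics reports whether calling f results in a panic.
func panics(f func()) (p bool) {
	defer func() { p = recover() != nil }()
	f()
	return false
}
```

By contrast, `sync.Once` would silently ignore a second call, whereas the channel approach fails loudly, which seems to be the intent here.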

@tjungblu
Contributor

/label tide/merge-method-squash

@josefkarasek josefkarasek marked this pull request as ready for review September 21, 2021 15:37
@tjungblu
Contributor

/lgtm

@tjungblu
Contributor

/hold cancel

@openshift-ci openshift-ci bot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Sep 21, 2021
@openshift-ci openshift-ci bot added lgtm Indicates that a PR is ready to be merged. and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Sep 21, 2021
@openshift-merge-robot openshift-merge-robot merged commit 6d26026 into openshift:main Sep 21, 2021