Track how many times patterns are obfuscated #65
Skipping CI for Draft Pull Request.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: josefkarasek. The full list of commands accepted by this bot can be found here. The pull request process is described here.
pkg/obfuscator/tracker.go
Outdated
@@ -15,11 +15,11 @@ type ReplacementTracker interface {
 Initialize(replacements map[string]string)

 // Report returns a mapping of strings which were replaced.
-Report() map[string]string
+Report() map[string]Replacement
Here's what I would return instead:
type ReplacementReport struct {
replacements []Replacement
}
type Replacement struct {
original string
replaced string
occurrences int
}
// for backwards compatibility in all our tests
func (ReplacementReport) asMap() map[string]string {
}
That way we can serialize the list in YAML correctly, sort properly at the struct level, and do diffs in the e2e tests.
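A minimal sketch of how that suggestion could be filled in. The body of `asMap` and the `sorted` helper are assumptions, since the comment leaves the method empty, and the occurrence counts are illustrative. Note that a YAML encoder would only see the fields if they were exported; the lowercase names are kept here to match the comment.

```go
package main

import (
	"fmt"
	"sort"
)

// Replacement mirrors the struct proposed in the review comment.
type Replacement struct {
	original    string
	replaced    string
	occurrences int
}

// ReplacementReport wraps the list of replacements.
type ReplacementReport struct {
	replacements []Replacement
}

// asMap flattens the report into the old original -> replaced map.
// Assumed body: the review comment left it empty.
func (r ReplacementReport) asMap() map[string]string {
	m := make(map[string]string, len(r.replacements))
	for _, rep := range r.replacements {
		m[rep.original] = rep.replaced
	}
	return m
}

// sorted is a hypothetical helper: it returns a copy ordered by the
// original string so that output and diffs are stable.
func (r ReplacementReport) sorted() []Replacement {
	out := append([]Replacement(nil), r.replacements...)
	sort.Slice(out, func(i, j int) bool { return out[i].original < out[j].original })
	return out
}

func main() {
	// Occurrence counts below are illustrative values only.
	report := ReplacementReport{replacements: []Replacement{
		{original: "openshift.com", replaced: "domain0000000001", occurrences: 7},
		{original: "okd.io", replaced: "domain0000000002", occurrences: 3},
	}}
	fmt.Println(report.asMap())
	for _, r := range report.sorted() {
		fmt.Printf("%s -> %s (%d)\n", r.original, r.replaced, r.occurrences)
	}
}
```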
in the implementation, you can just keep a second map for the counts and on Report() just collate, similar to the defensive copy we do
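The "second map plus collate on Report()" idea could look roughly like this. `countingTracker` and its method signatures are hypothetical stand-ins, not the project's actual tracker; the point is only that counts live next to the mapping and are merged into a fresh slice when reporting.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Replacement is one collated report entry.
type Replacement struct {
	Original    string
	Replaced    string
	Occurrences int
}

// countingTracker is a hypothetical tracker used only for this sketch.
type countingTracker struct {
	lock    sync.RWMutex
	mapping map[string]string // original -> replacement
	counts  map[string]int    // original -> number of replacements seen
}

func newCountingTracker() *countingTracker {
	return &countingTracker{mapping: map[string]string{}, counts: map[string]int{}}
}

// AddReplacement records the mapping and bumps the counter under one lock.
func (t *countingTracker) AddReplacement(original, replacement string) {
	t.lock.Lock()
	defer t.lock.Unlock()
	t.mapping[original] = replacement
	t.counts[original]++
}

// Report collates both maps into a new sorted slice, so callers never
// receive a reference to the internal maps (the defensive-copy idea).
func (t *countingTracker) Report() []Replacement {
	t.lock.RLock()
	defer t.lock.RUnlock()
	out := make([]Replacement, 0, len(t.mapping))
	for original, replaced := range t.mapping {
		out = append(out, Replacement{original, replaced, t.counts[original]})
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Original < out[j].Original })
	return out
}

func main() {
	t := newCountingTracker()
	t.AddReplacement("okd.io", "domain0000000002")
	t.AddReplacement("okd.io", "domain0000000002")
	t.AddReplacement("openshift.com", "domain0000000001")
	fmt.Println(t.Report())
}
```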
19b9919 to 0c9f322
Signed-off-by: Josef Karasek <[email protected]>
0c9f322 to 73791b8
Introduction of a hierarchical Report schema with normalized keys.
```
Canonical: "EB:A1:2A:B2:09:BF"
Replacement: "x-mac-0000000001-x"
Occurrences:
- Original: "eb-a1-2a-b2-09-bf"
  Count: 5
- Original: "eb:a1:2a:b2:09:bf"
  Count: 15
```
Normalized keys apply to:
* IPv4 addresses - dots `'.'`: `255.255.255.255`
* MAC addresses - upper case + `':'`: `29:7E:8C:8A:60:D9`
Occurrences of values are tied to the specific format in which they appear in the source. For example, `192-168-1-10` and `192.168.1.10` are the same IP address, just written differently. In the report both are tied to the same normalized key `192.168.1.10`, and each has its own number of occurrences.
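As a rough illustration of the normalization described in the commit message, the sketch below derives a canonical key and counts each original spelling under it. The helper names (`canonicalIPv4`, `canonicalMAC`, `occurrences`) are made up for this example and are not taken from the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// canonicalIPv4 rewrites dash-separated octets to the dotted form.
func canonicalIPv4(s string) string {
	return strings.ReplaceAll(s, "-", ".")
}

// canonicalMAC upper-cases the address and uses ':' as the separator.
func canonicalMAC(s string) string {
	return strings.ToUpper(strings.ReplaceAll(s, "-", ":"))
}

// occurrences maps canonical key -> original spelling -> count.
type occurrences map[string]map[string]uint

// add counts one sighting of the original spelling under its canonical key.
func (o occurrences) add(canonical, original string) {
	if o[canonical] == nil {
		o[canonical] = map[string]uint{}
	}
	o[canonical][original]++
}

func main() {
	occ := occurrences{}
	for _, seen := range []string{"192-168-1-10", "192.168.1.10", "192.168.1.10"} {
		occ.add(canonicalIPv4(seen), seen)
	}
	for _, seen := range []string{"eb-a1-2a-b2-09-bf", "eb:a1:2a:b2:09:bf"} {
		occ.add(canonicalMAC(seen), seen)
	}
	fmt.Println(occ)
}
```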
pkg/obfuscator/domain.go
Outdated
@@ -48,6 +48,7 @@ func (d *domainObfuscator) replaceDomains(input string) string {
 replacement = obfuscatedBaseDomain
 }
 output = strings.ReplaceAll(output, m[0], replacement)
+d.ReplacementTracker.AddReplacement(baseDomain, baseDomain, obfuscatedBaseDomain)
shouldn't that be:
d.ReplacementTracker.AddReplacement(baseDomain, m[0], obfuscatedBaseDomain)
?
I thought that too, but the unit tests were failing when I tried it.
fmt.Println(m)
[docs.okd.io docs. okd.io]
...
expected: map[string]string{"okd.io":"domain0000000002", "openshift.com":"domain0000000001"}
actual : map[string]string{"docs.okd.io":"domain0000000002", "openshift.com":"domain0000000001"}
Might it be that the unit tests have different expectations?
The map representation has changed a bit now; the report will show the right thing. There is also the side effect of the generator changes below.
I reckon the tests here should then use the Report directly instead of the map.
I changed the unit tests to use the actual reports
@@ -58,12 +58,12 @@ func (o *ipObfuscator) replace(s string) string {
 continue
 }

-cleaned := strings.ReplaceAll(m, "-", ".")
+cleaned := strings.ToUpper(strings.ReplaceAll(m, "-", "."))
not sure what toUpper does with numbers, what's the rationale to add that?
IPv6 addresses
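To make the answer concrete: `strings.ToUpper` leaves digits and separators alone and only changes letters, so it is a no-op for IPv4 but folds the hex digits of an IPv6 address to one case. The `clean` wrapper and the IPv6 address below are illustrative only.

```go
package main

import (
	"fmt"
	"strings"
)

// clean mirrors the one-line change in the diff above.
func clean(m string) string {
	return strings.ToUpper(strings.ReplaceAll(m, "-", "."))
}

func main() {
	fmt.Println(clean("192-168-1-10"))             // 192.168.1.10
	fmt.Println(clean("fe80::1ff:fe23:4567:890a")) // FE80::1FF:FE23:4567:890A
}
```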
dang, makes sense. another good reason to split this up. Is there no test case that needs adaptation to this?
I added 2 new test cases
pkg/obfuscator/report_types.go
Outdated
Count uint `yaml:"count,omitempty"`
}

func (r *Replacement) Increment(original string, count uint) Replacement {
you should definitely add some tests for this method
and why do you return the Replacement?
I mean, you already have a pointer to the replacement, you can just increment the counter directly without returning it.
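A sketch of what incrementing in place could look like. The `Occurrences` field layout is an assumption, and the map here stores pointers, which is what allows the method to mutate the stored entry; with a map of struct values, Go does not allow calling a pointer-receiver method on the map element directly.

```go
package main

import "fmt"

// Replacement loosely follows the names in the diff; the Occurrences
// layout (original spelling -> count) is assumed for this sketch.
type Replacement struct {
	Canonical    string
	ReplacedWith string
	Occurrences  map[string]uint
}

// Increment mutates the receiver in place; nothing is returned.
func (r *Replacement) Increment(original string, count uint) {
	if r.Occurrences == nil {
		r.Occurrences = make(map[string]uint)
	}
	r.Occurrences[original] += count
}

func main() {
	// Storing pointers lets callers update an entry without writing it back.
	mapping := map[string]*Replacement{
		"EB:A1:2A:B2:09:BF": {Canonical: "EB:A1:2A:B2:09:BF", ReplacedWith: "x-mac-0000000001-x"},
	}
	mapping["EB:A1:2A:B2:09:BF"].Increment("eb-a1-2a-b2-09-bf", 5)
	mapping["EB:A1:2A:B2:09:BF"].Increment("eb:a1:2a:b2:09:bf", 15)
	fmt.Println(mapping["EB:A1:2A:B2:09:BF"].Occurrences)
}
```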
👍 on tests
I'll review how Replacement is used
Also, the reporting below is nothing other than a good old std::multimap. I'm sure there must be a good library for us to reuse?
pkg/obfuscator/tracker.go
Outdated
return
}
s.mapping[original] = replacement
new := Replacement{Canonical: canonical, ReplacedWith: replacement}
new.Increment(original, count)
I'd add a NewReplacement method that would add the increment internally
added
pkg/obfuscator/tracker.go
Outdated
klog.Exitf("'%s' already has a value reported as '%s', tried to report '%s'", original, val, replacement)
}
if r, ok := s.mapping[canonical]; ok {
s.mapping[canonical] = r.Increment(original, count)
return
that's a bit nitty, but instead of return you can also put the other clause into an else branch
added
pkg/obfuscator/tracker.go
Outdated
-r := generator()
-s.mapping[key] = r
-return r
+return generator()
don't we have to save that in our mapping? Otherwise we will generate a different identifier for the same canonical key.
also be mindful of race conditions, since everything you return here is outside the lock
I modified AddReplacement so that it takes the canonical key (cleaned version for IP and MAC) along with the original key:
AddReplacement(canonical, original string, replacement string)
The reason it was there is that there can be a race condition where two goroutines find an empty mapping for the canonical string, and the generator then creates two representations for the same string.
Having this under a single lock guaranteed that this could never happen. You also removed the Exitf that was basically covering that case if it did happen. Put that Exitf back and try it out on a couple of MGs; it should happen fairly frequently.
pkg/obfuscator/tracker.go
Outdated
-r := generator()
-s.mapping[key] = r
-return r
+return generator()
 }

 func (s *SimpleTracker) Initialize(replacements map[string]string) {
I'm not sure whether this is currently used, probably not. But ideally it should take your ReplacementReport. I assume that also touches @sairameshv a bit, FYI.
omissions []string
}

var _ Reporter = (*SimpleReporter)(nil)
what does that do? should we typedef this thing?
It is a compile-time check that the methods of SimpleReporter implement the Reporter interface.
mind blown, why are we not returning the Reporter interface in the New func below? that should do the same 👯
Good point. I'll change that.
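For reference, both forms discussed in this thread are shown side by side below. The `Omissions` method and the constructor name are hypothetical; only the `omissions` field and the assertion line come from the diff.

```go
package main

import "fmt"

// Reporter is a stand-in interface; its single method is assumed here.
type Reporter interface {
	Omissions() []string
}

// SimpleReporter keeps the field shown in the diff.
type SimpleReporter struct {
	omissions []string
}

// Omissions returns a copy of the recorded omissions.
func (r *SimpleReporter) Omissions() []string {
	return append([]string(nil), r.omissions...)
}

// Compile-time assertion: the build fails if *SimpleReporter ever stops
// satisfying Reporter. No value is allocated; the nil pointer is only typed.
var _ Reporter = (*SimpleReporter)(nil)

// NewSimpleReporter returns the interface type, so the same check is also
// enforced by the compiler at the return statement.
func NewSimpleReporter() Reporter {
	return &SimpleReporter{}
}

func main() {
	r := NewSimpleReporter()
	fmt.Println(len(r.Omissions()))
}
```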
@@ -85,12 +87,22 @@ func (s *SimpleTracker) GenerateIfAbsent(canonical string, original string, coun
 return g
 }

-func (s *SimpleTracker) Initialize(replacements map[string]string) {
+func (s *SimpleTracker) Initialize(report ReplacementReport) {
+close(onlyOneInit) // panics when called twice
nifty!
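The trick relies on the fact that closing an already-closed channel panics at runtime. A small standalone demonstration, where `initialize` and `tryInitialize` are made-up names and the `recover` is there only to observe the panic:

```go
package main

import "fmt"

// onlyOneInit guards against a second initialization.
var onlyOneInit = make(chan struct{})

// initialize stands in for the tracker's Initialize method.
func initialize() {
	close(onlyOneInit) // closing an already-closed channel panics
}

// tryInitialize reports whether the call to initialize panicked.
func tryInitialize() (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	initialize()
	return false
}

func main() {
	fmt.Println(tryInitialize()) // false
	fmt.Println(tryInitialize()) // true
}
```

By contrast, `sync.Once` would silently ignore the second call, so the channel approach fits when a repeated initialization should fail loudly.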
/label tide/merge-method-squash
/lgtm
/hold cancel
Signed-off-by: Josef Karasek [email protected]