[RFC] Regex based IP detection #44
Comments
/cc @datenreisen
Over another channel, the question of performance impact was raised. To get some numbers on this, I've implemented a prototype of the feature and hacked together a quick-and-dirty profiling script. For the tests, I've used an example file. To smooth out any spikes, I ran this file 1000 times through anonip with regex matching and another 1000 times with normal column-based matching. Here are the results:
Based on those numbers, I'd say the performance hit definitely is a concern. OTOH: For normal parsing of a (configurable)
As the effort needed to properly implement this feature is manageable, I propose we implement it and transparently document its performance impact, advising users to use it only when absolutely necessary and when performance is not critical.
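The profiling script itself isn't reproduced here. As a rough, hypothetical stand-in, the micro-benchmark below uses Python's `timeit` to compare a column-based lookup (split on a delimiter, pick a column) with a regex-based lookup on one made-up access-log line. `by_column`, `by_regex` and the sample line are illustrative assumptions, not Anonip's code, and the sketch times extraction only, not anonymization.

```python
import re
import timeit

# Made-up access-log line; the IP sits in the first space-separated column.
LINE = '192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 612'

# Hypothetical regex pointing at the same location as column 1.
PATTERN = re.compile(r"^(\S+) ")


def by_column(line, column=1, delimiter=" "):
    """Column-based lookup: split the line and pick the 1-based column."""
    return line.split(delimiter)[column - 1]


def by_regex(line, pattern=PATTERN):
    """Regex-based lookup: return the first capture group, or None on no match."""
    match = pattern.match(line)
    return match.group(1) if match else None


if __name__ == "__main__":
    runs = 100_000
    for func in (by_column, by_regex):
        # The lambda is called within this iteration, so it sees the right func.
        seconds = timeit.timeit(lambda: func(LINE), number=runs)
        print(f"{func.__name__}: {seconds:.3f}s for {runs} lines")
```

Absolute timings depend on the machine and the regex; a real comparison should run the full anonip pipeline, as the profiling described above did.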
Some thoughts:
This commit implements regex-based IP detection. It is intended for log files where column-based detection doesn't work. See RFC (DigitaleGesellschaft#44) for more information. Closes DigitaleGesellschaft#44
Awesome 🚀
Great idea! Implemented in a similar way:
Great idea! Let's save this for a later iteration, though.
This commit implements regex-based IP detection. It is intended for log files where column-based detection doesn't work. See RFC (DigitaleGesellschaft#44) for more information. Closes DigitaleGesellschaft#42, closes DigitaleGesellschaft#44
Rationale
Our column-based approach of specifying the location of an IP address is not flexible enough to cover all use cases.
A good example of such a use case can be seen in this issue. Since the log format for error logs can't be configured in nginx, Anonip can't reliably detect IP addresses there.
Proposal
I propose an alternative, regex-based IP detection.
I don't intend to match IP addresses with regexes! Instead, I'd like to provide a way to point Anonip to the locations of IP addresses with a regex.
This alternative approach should be provided alongside the already existing column-based approach.
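To illustrate how the two approaches could live side by side, here is a hypothetical `argparse` sketch. It assumes the new option is named `--regex`, as this RFC proposes, and compiles the pattern at parse time so an invalid regex is rejected immediately. The `--column`, `--delimiter` and `--replace` definitions are simplified stand-ins, not Anonip's real argument parser, and `regex_arg`, `build_parser` and `resolve_mode` are illustrative names.

```python
import argparse
import re


def regex_arg(value):
    """argparse type: compile the pattern up front so a bad regex fails fast."""
    try:
        return re.compile(value)
    except re.error as exc:
        raise argparse.ArgumentTypeError(f"invalid regex {value!r}: {exc}") from exc


def build_parser():
    # Hypothetical, simplified option set; not Anonip's actual parser.
    parser = argparse.ArgumentParser(description="regex vs. column mode sketch")
    parser.add_argument("--regex", type=regex_arg, default=None,
                        help="regex whose capture groups mark IP locations")
    parser.add_argument("--column", type=int, nargs="+", default=[1],
                        help="1-based column(s) holding IP addresses")
    parser.add_argument("--delimiter", default=" ",
                        help="column delimiter")
    parser.add_argument("--replace", default=None,
                        help="substitute for values that are not valid IPs")
    return parser


def resolve_mode(args):
    """Pick the detection mode; --regex takes precedence over columns."""
    if args.regex is not None:
        # In regex mode, --column and --delimiter are simply ignored.
        return "regex", args.regex
    return "column", (args.column, args.delimiter)


if __name__ == "__main__":
    demo = build_parser().parse_args(["--regex", r".* client\: ([^,]+), .*"])
    print(resolve_mode(demo)[0])
```

Ignoring `--column`/`--delimiter` in regex mode (rather than rejecting them) keeps existing invocations working when `--regex` is added; an `argparse` error would be the stricter alternative.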
When using the new --regex argument, the --column and --delimiter arguments become obsolete. --replace can still be used for cases where a matched group is not a valid IP address.
Example
The regexes provided in the examples are simplified and only illustrate the proposed feature. For production environments, you'll want more robust ones.
Let's use the log line from the aforementioned issue:
With the new feature in place, we could do:
$ ./anonip.py --regex ".* client\: ([^,]+), .*"
This would then match the provided log line and capture the IP address (XXX.XXX.XXX.XXX) into the first group. To find all IP addresses, Anonip would then iterate over all available matched groups (just one in this example).
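A minimal sketch of the group iteration described above, assuming Python's `re` and `ipaddress` modules: every participating capture group is masked if it parses as an IP address, or swapped for a replacement string (mirroring `--replace`) if it doesn't. `mask_ip`, `anonymize_line` and the masking widths (8 bits for IPv4, 80 for IPv6) are hypothetical and chosen only for illustration; they are not Anonip's implementation or defaults. The sketch also assumes capture groups are not nested, so their spans don't overlap.

```python
import ipaddress
import re


def mask_ip(value, v4_bits=8, v6_bits=80):
    """Zero the trailing bits of an IP address (illustrative widths only)."""
    ip = ipaddress.ip_address(value)  # raises ValueError for non-IP text
    bits = v4_bits if ip.version == 4 else v6_bits
    net = ipaddress.ip_network(f"{ip}/{ip.max_prefixlen - bits}", strict=False)
    return str(net.network_address)


def anonymize_line(line, pattern, replace=None):
    """Rewrite every captured group of `pattern` in `line`."""
    match = pattern.match(line)
    if match is None:
        return line  # line doesn't match: pass it through unchanged
    pieces, last = [], 0
    for index, value in enumerate(match.groups(), start=1):
        if value is None:
            continue  # group belongs to an alternative that didn't match
        try:
            new_value = mask_ip(value)
        except ValueError:
            # Not a valid IP: use the --replace-style string if one was given.
            new_value = value if replace is None else replace
        start, end = match.span(index)
        pieces.append(line[last:start])
        pieces.append(new_value)
        last = end
    pieces.append(line[last:])
    return "".join(pieces)


if __name__ == "__main__":
    pattern = re.compile(r".* client\: ([^,]+), .*")
    line = "[error] 123#0: *1 open() failed, client: 192.0.2.10, server: example.org"
    print(anonymize_line(line, pattern))
```

Validating with `ipaddress` after the regex has located the value keeps the regex itself simple, which matches the intent of pointing at IP locations rather than matching IP addresses.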
More involved example
Let's say we still want to handle the above log line, but additionally expect lines in the following format:
Note the two IP addresses.
This can be handled in a single regex:
$ ./anonip.py --regex "(?:.*, client\: ([^,]+), .*|.* - somefixedstring\: ([^,]+) - .* - ([^,]+))"
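With an alternation, only the groups of the branch that matched are populated; Python's `re` reports the others as `None`, so the iteration over groups has to skip them. The sample lines below are made up, since the original log lines aren't reproduced here, and `captured_values` is a hypothetical helper; the pattern is the one from the command above.

```python
import re

# The combined pattern from the example command, unchanged.
PATTERN = re.compile(
    r"(?:.*, client\: ([^,]+), .*|.* - somefixedstring\: ([^,]+) - .* - ([^,]+))"
)

# Made-up sample lines, one per branch of the alternation.
LINES = [
    "[error] 123#0: *1 open() failed, client: 192.0.2.10, server: example.org",
    "foo - somefixedstring: 198.51.100.7 - bar - 203.0.113.5",
]


def captured_values(line, pattern=PATTERN):
    """Return the non-empty capture groups of the branch that matched."""
    match = pattern.match(line)
    if match is None:
        return []
    # Groups from the branch that didn't match are None and are skipped.
    return [value for value in match.groups() if value is not None]


if __name__ == "__main__":
    for line in LINES:
        print(captured_values(line))
```

Group numbering spans all branches (here groups 1 to 3), so the number of values found per line varies with the branch that matched.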
Considerations
This opens the door to very verbose and hard-to-read commands for running Anonip against certain logs.
But for more advanced users, it would fill the gap that currently exists for log files whose formats Anonip cannot parse.