Correct handling of "source" #63

Closed
alexcpsec opened this issue Sep 4, 2014 · 8 comments

@alexcpsec (Member)

Today, the "source" field contains the URL from which the indicator was gathered.

According to the docs (and in my opinion :P), it should be an identifying string that describes the source and is documented on the Wiki. It bothers me because I cannot match these sources up with the data we provided for the tiq-test samples, so it is an enhancement and a bug at the same time...

Perhaps the thresher_map should be the place for that, or somewhere equivalent in the plugin system from #23. Is there a short-term solution to this that does not require waiting for the plugin refactoring?
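
Purely as an illustration of that idea, a hypothetical label-to-URL map could look like the sketch below. The dict shape, names, and helper are assumptions, not combine's actual thresher_map; the URLs are ones that appear later in this thread.

# Hypothetical sketch: map short, documented source labels to feed URLs,
# so "source" can carry the label rather than the raw URL.
SOURCE_MAP = {
    'alienvault': 'http://reputation.alienvault.com/reputation.data',
    'zeustracker': 'https://zeustracker.abuse.ch/blocklist.php?download=ipblocklist',
}

def label_for(url):
    """Return the documented label for a feed URL, or the URL itself if unknown."""
    for label, feed_url in SOURCE_MAP.items():
        if url == feed_url:
            return label
    return url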

@krmaxwell (Member)

(not ignoring - this one requires 💭 )

@alexcpsec alexcpsec added this to the v0.1.1 Ascended Capybara milestone Sep 16, 2014
@krmaxwell krmaxwell self-assigned this Sep 26, 2014
@krmaxwell (Member)

Per @alexcpsec: turn inbound_urls.txt and outbound_urls.txt into proper config files, mapping each config name string to its URL.

@gbrindisi (Contributor)

I've played a little with the config files and produced some PoC code in a local branch to address this issue.

Basically I've added the feeds to the config file, like so:

[feeds.outbound]
feed_o_label1 = feed_url1
...
...
feed_o_labelN = feed_urlN

[feeds.inbound]
feed_i_label1 = feed_url1
...
...
feed_i_labelN = feed_urlN

Then reaper.py reads the feeds from the config file (sections feeds.outbound and feeds.inbound) and stores the harvested results by label (i.e. feed_o_label1).
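
A minimal sketch of that reading step, assuming the config layout above and a combine.cfg file name (combine itself used Python 2's ConfigParser at the time; the helper name is made up):

# Sketch: read labeled feeds from the config file, using the
# [feeds.outbound] / [feeds.inbound] layout shown above.
import configparser

def load_feeds(cfg_path='combine.cfg'):
    cfg = configparser.RawConfigParser()
    cfg.read(cfg_path)
    feeds = {}
    for direction in ('outbound', 'inbound'):
        section = 'feeds.' + direction
        if cfg.has_section(section):
            # each option name is a feed label, each value its URL
            feeds[direction] = dict(cfg.items(section))
    return feeds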

Next I've improved thresher.py as well, so that it reads the associated parser function from the config file.

For example, in the config the user can now define the preferred parser function like so:

[feeds.parsers]
feed_whatever = whatever_parser

whatever_parser() is then used to parse the results labeled as feed_whatever.
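
A minimal sketch of that lookup, assuming the parser functions live in thresher.py and the [feeds.parsers] layout above (the helper name and config file name are assumptions, not code from the branch):

# Sketch: resolve the parser function named in [feeds.parsers] for a feed label.
import configparser
import thresher  # module holding the process_* parser functions

def get_parser(label, cfg_path='combine.cfg'):
    cfg = configparser.RawConfigParser()
    cfg.read(cfg_path)
    parser_name = cfg.get('feeds.parsers', label)   # e.g. 'process_alienvault'
    return getattr(thresher, parser_name)           # look the function up by name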

This behaviour should be a good starting point for implementing a plugin system in which the parsers are read from other modules.

Please let me know if you like this approach; you can find the code at https://github.com/gbrindisi/combine/tree/labeled-feeds

Hopefully I'll be able to tidy up the code a bit more tomorrow.

@alexcpsec (Member, Author)

I was thinking about this today while working to merge your changes, and I share your thoughts on this.

I'll have a look at your branch tonight and comment with some other suggestions.

@krmaxwell (Member)

I like the idea of paving the way for plugins. The feed_i_whatever syntax feels a little ugly to me for some reason. Need to brain on this a bit.

@gbrindisi (Contributor)

Just to add some perspective, this is the test configuration I've used (most of the entries are commented out):

[feeds.outbound]
#malwaregroup     = http://www.malwaregroup.com/ipaddresses
#malc0de          = http://malc0de.com/bl/IP_Blacklist.txt
#zeustracker      = https://zeustracker.abuse.ch/blocklist.php?download=ipblocklist
#spyeyetracker    = https://spyeyetracker.abuse.ch/blocklist.php?download=ipblocklist
#palevotracker    = https://palevotracker.abuse.ch/blocklists.php?download=ipblocklist
alienvault       = http://reputation.alienvault.com/reputation.data
#nothink-malware-dns = http://www.nothink.org/blacklist/blacklist_malware_dns.txt
#nothink-malware-http = http://www.nothink.org/blacklist/blacklist_malware_http.txt
#nothink-malware-irc = http://www.nothink.org/blacklist/blacklist_malware_irc.txt

[feeds.inbound]
#projecthoneypot = http://www.projecthoneypot.org/list_of_ips.php?rss=1

[feeds.parsers]
alienvault = process_alienvault

Also, I've used the feeds.XXX naming scheme for the config sections to see whether it was feasible to abstract away the standard inbound/outbound categorization. It's just an idea though; there's a sketch of it below.
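
A minimal sketch of that abstraction, deriving the category from whatever follows the feeds. prefix instead of hard-coding inbound/outbound (the helper name and config file name are assumptions):

# Sketch: group labeled feed URLs by the category encoded in the section name.
import configparser

def feeds_by_category(cfg_path='combine.cfg'):
    cfg = configparser.RawConfigParser()
    cfg.read(cfg_path)
    feeds = {}
    for section in cfg.sections():
        if section.startswith('feeds.') and section != 'feeds.parsers':
            category = section.split('.', 1)[1]   # 'outbound', 'inbound', or anything else
            feeds[category] = dict(cfg.items(section))
    return feeds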

@krmaxwell (Member)

I like having that confined to the categories / sections much better than encoding it in the names. But I am reminded that there are two hard things in computer science: cache invalidation, naming things, and off-by-one errors.

@gbrindisi (Contributor)

I've opened a pull request so we can discuss it further: #86

@alexcpsec alexcpsec modified the milestones: v0.2.0 Unnamed Dolphin, v0.1.3 Captivating Capybara Jan 8, 2015
@alexcpsec alexcpsec removed the ready label Jan 8, 2015