use more of the feed #84

Closed
paulpc opened this issue Oct 16, 2014 · 22 comments

Comments

@paulpc
Contributor

paulpc commented Oct 16, 2014

why not keep some of the metadata from the feeds and use it for enrichment? For example, the AlienVault feed has some interesting information as to why that IP is there and would make for better context.

@alexcpsec
Member

Yes, I agree with that 100%. It would be one of the objectives of the plugin model in #23: precise parsing for each feed, and normalizing the different types of information we get from different ones.

There are some prerequisites to get there, but if you want to help us categorize the different kinds of metadata the feeds have, we would greatly appreciate it.

@paulpc
Contributor Author

paulpc commented Oct 16, 2014

on my way there. First, i'm working on shoving the results into CRITs (https://github.com/crits/crits)

@alexcpsec
Member

Nice! 👍 Let me know how that goes :)

@alexcpsec
Member

Hey, @mgoffin, can you give us some pointers on the best way to integrate with CRITs?

@mgoffin

mgoffin commented Oct 17, 2014

Sure! The best thing for now would probably be to write a script using the CRITs API to consume the feed and ingest it into CRITs. Ultimately what would be more beneficial is to create a service which has the ability to pull down the feed, parse it, display results to a user, and let them "approve" which items to accept into the system. That will probably become the standard model for any service(s) that deal with feed ingestion.

@alexcpsec
Member

Cool! I saw the services API on the Wiki, I guess that is what you mean, right?

Do you have a reference implementation I could look at, maybe? It sounds like a good idea to build this integration as more and more people use CRITs.

@mgoffin

mgoffin commented Oct 17, 2014

You'll want to check out the Authenticated API on the wiki. It gives some examples and such. It's not 100% complete, but you can read and write all of the different TLOs. You just can't do updates or removals.
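
A script along those lines might start from something like the sketch below. It only builds the HTTP request for a single IP indicator and does not send anything. The endpoint path `/api/v1/ips/`, the parameter names (`username`, `api_key`, `ip`, `ip_type`, `source`), the host `crits.example` and the helper name `build_ip_request` are all assumptions made for illustration; check them against the Authenticated API wiki page before relying on them.

```python
import urllib.parse
import urllib.request


def build_ip_request(base_url, username, api_key, ip, source):
    """Build (but do not send) a POST request that would add one IP.

    Assumption: the endpoint path and parameter names below are
    illustrative and must be verified against the CRITs wiki.
    """
    params = {
        "username": username,
        "api_key": api_key,
        "ip": ip,
        "ip_type": "Address - ipv4-addr",  # type label as used later in this thread
        "source": source,
    }
    body = urllib.parse.urlencode(params).encode("utf-8")
    url = base_url.rstrip("/") + "/api/v1/ips/"
    return urllib.request.Request(url, data=body, method="POST")


# Hypothetical host and credentials, for illustration only.
req = build_ip_request("https://crits.example", "analyst", "SECRET",
                       "192.0.2.10", "www.blocklist.de")
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would actually submit it. Since updates and removals are not available, a script like this would presumably need to avoid re-sending indicators it has already added.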

@alexcpsec
Member

👍
Thanks!

@paulpc
Contributor Author

paulpc commented Oct 17, 2014

@alexcpsec @mgoffin working on uploading the IOCs via the web API - it's a bit too slow (a few hours for the 300K+ IPs), so i'll try multithreading and if that works better, i'll submit the code for review.

@mgoffin

mgoffin commented Oct 17, 2014

I'll note that we haven't tried hammering the API like that before, so we don't have any useful benchmarks for what speeds we should be getting :)

@paulpc
Contributor Author

paulpc commented Oct 17, 2014

as for the original topic and more context, before Combine i wrote something to do this and I implemented it by trying to get STIX-like fields from the sources.
I did that by defining my sources in this format:

{
  "impact": "high", 
  "source": "malwareDomainList",
  "campaign":"testCampaign", 
  "confidence": "medium", 
  "format": "^\\\".*\\\"\\,\\\"(.*?)\\\"\\,\\\"(\\d+\\.\\d+\\.\\d+\\.\\d+|-)\\\"\\,\\\"(.*?)\\\"\\,\\\".*?\\\"\\,\\\".*?\\\"\\,\\\"(\\d+|-)\\\"", 
  "reference": "http://www.malwaredomainlist.com/updatescsv.php", 
  "fields": ["URI - URL", "Address - ipv4-addr", "URI - Domain Name","Address - asn"] 
}

My intention was to design a relationship engine for all the IOCs from here and upload them into CRITs with their relationships, but i never got to it
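
For readers who want to see how such a definition could be applied, here is a minimal sketch: it loads the `format` regex and `fields` list from the JSON above and pairs each capture group with its field name. The function name `parse_line` and the sample CSV line are hypothetical (the line only mimics the column layout the regex expects, it is not real feed data), and this is not the code paulpc actually wrote.

```python
import json
import re

# Subset of the source definition above, kept as raw text so the
# JSON escaping of the regex is preserved exactly.
SOURCE_JSON = r'''
{
  "source": "malwareDomainList",
  "format": "^\\\".*\\\"\\,\\\"(.*?)\\\"\\,\\\"(\\d+\\.\\d+\\.\\d+\\.\\d+|-)\\\"\\,\\\"(.*?)\\\"\\,\\\".*?\\\"\\,\\\".*?\\\"\\,\\\"(\\d+|-)\\\"",
  "fields": ["URI - URL", "Address - ipv4-addr", "URI - Domain Name", "Address - asn"]
}
'''
SOURCE = json.loads(SOURCE_JSON)


def parse_line(source, line):
    """Return {field name: captured value} for one feed line, or None."""
    match = re.match(source["format"], line)
    if match is None:
        return None
    # The n-th capture group is labelled with the n-th entry of "fields".
    record = dict(zip(source["fields"], match.groups()))
    record["source"] = source["source"]
    return record


# Made-up line: date, URL, IP, reverse lookup, description, registrant, ASN.
SAMPLE = ('"2014/10/17_10:00","example.com/bad.exe","192.0.2.10",'
          '"host.example.com","trojan","registrant","64500"')
print(parse_line(SOURCE, SAMPLE))
```

Keeping the regex and the field labels side by side in the source definition means a new feed only needs a new JSON entry, not new parsing code.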

@paulpc
Contributor Author

paulpc commented Oct 17, 2014

@alexcpsec and @mgoffin , here's my single threaded code: https://github.com/paulpc/combine. I'll wait until I can get better performance before I submit an official pull request

@alexcpsec
Member

@paulpc Got the gist of it by looking at your code, nice work. To speed things up by making the requests parallel, I would suggest you have a look at the grequests package we are using on reaper.py.

@paulpc
Contributor Author

paulpc commented Oct 21, 2014

@alexcpsec , i'll give it a look. I did it manually using multithreading and it ran about 25% faster for 5380 IPs and 201 domains - not sure it's worth the code complications yet.

Fetching inbound URLs
Fetching outbound URLs
Storing raw feeds in harvest.json
Loading raw feed data from harvest.json
Evaluating http://www.projecthoneypot.org/list_of_ips.php?rss=1
Parsing feed from http://www.projecthoneypot.org/list_of_ips.php?rss=1
Evaluating http://www.openbl.org/lists/base_30days.txt
Parsing feed from http://www.openbl.org/lists/base_30days.txt
Evaluating http://www.blocklist.de/lists/ssh.txt
Parsing feed from http://www.blocklist.de/lists/ssh.txt
Parsing feed from http://www.malwaregroup.com/ipaddresses
Parsing feed from http://malc0de.com/bl/IP_Blacklist.txt
Parsing feed from http://www.nothink.org/blacklist/blacklist_malware_dns.txt
Storing parsed data in crop.json
Reading processed data from crop.json
1413901165.67 *** trying single thread***
1413901165.67 reading configs
1413901165.67 going through list
don't yet know what to do with: None[ckaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa]successfully added 5380 IP addresses and 201 domains

1413901615.68 done in  450.012173176  seconds
make sure you have the following sources in CRITs: [u'www.projecthoneypot.org', u'www.openbl.org', u'www.blocklist.de', u'www.malwaregroup.com', u'malc0de.com', u'www.nothink.org']
1413901615.68 *** trying multi thread***
1413901615.68 reading configs
1413901615.69 initializing queue
1413901615.7 starting threads
don't yet know what to do with: None[ckaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa]

1413901945.92 done in  330.232031107  seconds
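
The "initializing queue" and "starting threads" steps in the log suggest a queue drained by worker threads. A minimal sketch of that pattern, using only the standard library, could look like the following. The names `upload_all` and `submit` are hypothetical, `submit` stands in for whatever function sends one indicator to CRITs, and this is not the code from the linked branch.

```python
import queue
import threading


def upload_all(indicators, submit, num_threads=4):
    """Call submit(item) for every indicator using a pool of threads.

    Returns a list of (item, outcome) pairs; outcome is either the
    return value of submit or the exception it raised.
    """
    work = queue.Queue()
    for item in indicators:
        work.put(item)

    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                item = work.get_nowait()
            except queue.Empty:
                return  # nothing left to do
            try:
                outcome = submit(item)
            except Exception as exc:  # keep going if one upload fails
                outcome = exc
            with lock:
                results.append((item, outcome))

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Stand-in submit function so the sketch runs without a CRITs server.
done = upload_all(["192.0.2.1", "192.0.2.2", "example.com"],
                  lambda value: "ok:" + value, num_threads=2)
print(len(done), "indicators processed")
```

Threads help here because each upload spends most of its time waiting on the HTTP round trip, so several requests can be in flight at once.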

@alexcpsec
Member

So maybe 5-ish mins for 5500 indicators? That is not too bad. As to Mike's point, who knows how much CRITs can handle. :)

LMK if you want to merge back when you think you are ready. We might tinker with it in the near future or so to try to add grequests to it.

@paulpc
Contributor Author

paulpc commented Oct 21, 2014

will do - i'm testing with a few more indicators (a couple more blocklist.de files), will clean up my code and submit it. I uploaded my current code on my branch.

Paul Poputa-Clean


@paulpc
Contributor Author

paulpc commented Oct 21, 2014

turns out, the more indicators, the more speed gains:

Fetching inbound URLs
Fetching outbound URLs
Storing raw feeds in harvest.json
Loading raw feed data from harvest.json
Evaluating http://www.projecthoneypot.org/list_of_ips.php?rss=1
Parsing feed from http://www.projecthoneypot.org/list_of_ips.php?rss=1
Evaluating http://www.openbl.org/lists/base_30days.txt
Parsing feed from http://www.openbl.org/lists/base_30days.txt
Evaluating http://www.blocklist.de/lists/ssh.txt
Parsing feed from http://www.blocklist.de/lists/ssh.txt
Evaluating http://www.blocklist.de/lists/apache.txt
Parsing feed from http://www.blocklist.de/lists/apache.txt
Evaluating http://www.blocklist.de/lists/asterisk.txt
Parsing feed from http://www.blocklist.de/lists/asterisk.txt
Evaluating http://www.blocklist.de/lists/bots.txt
Parsing feed from http://www.blocklist.de/lists/bots.txt
Parsing feed from http://www.malwaregroup.com/ipaddresses
Parsing feed from http://malc0de.com/bl/IP_Blacklist.txt
Parsing feed from http://www.nothink.org/blacklist/blacklist_malware_dns.txt
Storing parsed data in crop.json
Reading processed data from crop.json
1413904907.94 *** trying single thread***
1413904907.94 reading configs
1413904907.94 going through list
-- omitted parsing issues for brevity -- 
successfully added 21444 IP addresses and 201 domains
1413907121.27 done in  2213.32997704  seconds
make sure you have the following sources in CRITs: [u'www.projecthoneypot.org', u'www.openbl.org', u'www.blocklist.de', u'www.malwaregroup.com', u'malc0de.com', u'www.nothink.org']
1413907121.27 *** trying multi thread***
1413907121.27 reading configs
1413907121.27 initializing queue
1413907121.33 starting threads
-- omitted parsing issues for brevity -- 
1413908147.52 done in  1026.25563312  seconds

I'll get everything ready for it and submit it for a pull request
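
To put numbers on "the more indicators, the more speed gains", the timings from the two logs in this thread can be compared directly. The sketch below uses only figures printed in those logs. One assumption: the multi-threaded runs processed the same indicator counts as the single-threaded runs, since the logs report the count once per run pair.

```python
# Timings (seconds) and counts copied from the two logs in this thread.
RUNS = {
    "small": {"indicators": 5380 + 201,
              "single": 450.012173176, "multi": 330.232031107},
    "large": {"indicators": 21444 + 201,
              "single": 2213.32997704, "multi": 1026.25563312},
}


def speedup(run):
    """How many times faster the multi-threaded run finished."""
    return run["single"] / run["multi"]


def rate(run, mode):
    """Indicators submitted per second for the given mode."""
    return run["indicators"] / run[mode]


for name, run in RUNS.items():
    print(f"{name}: {speedup(run):.2f}x faster, "
          f"{rate(run, 'single'):.1f} -> {rate(run, 'multi'):.1f} indicators/s")
```

This works out to roughly 1.36x for the smaller run (about a 27% cut in wall-clock time, in line with the "25% faster" quoted earlier) and roughly 2.16x for the larger one.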

@alexcpsec
Member

Looks good. Thanks!

@krmaxwell
Member

So TL;DR: this is multithreading the submission to CRITs and possibly grabbing some additional data from the feeds?

@alexcpsec
Member

No extra info from feeds in this submission, just crits. But the original discussion was about the extra info. :)


@paulpc
Contributor Author

paulpc commented Oct 22, 2014

sorry, @technoskald! discussion got derailed with CRITs. We can get back to the metadata when I have time to code some more. I might wait and see what comes out of the labeled-feeds-branch. Do you know if the conf reader library will read regex out of a conf file or try to interpret / clobber them?
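
On the regex question, here is a small experiment, assuming the conf reader in question is Python's standard `configparser` (written for Python 3; the thread does not say which library or version the project uses). The section name `example_feed` and both patterns are made up. With interpolation disabled, backslashes and quotes come back verbatim; with the default `BasicInterpolation`, a pattern containing a bare `%` fails when it is read.

```python
import configparser
import re

# Hypothetical conf file with two regex-valued options.
CONF_TEXT = r"""
[example_feed]
format = ^"(\d+\.\d+\.\d+\.\d+)","(\d+|-)"$
percent = ^\d+%$
"""


def read_option(text, option, interpolation=None):
    """Read one option; interpolation=None returns the value untouched."""
    parser = configparser.ConfigParser(interpolation=interpolation)
    parser.read_string(text)
    return parser.get("example_feed", option)


# Interpolation off: the regex survives and still compiles and matches.
pattern = read_option(CONF_TEXT, "format")
print(pattern)
print(bool(re.match(pattern, '"192.0.2.10","64500"')))

# Default-style interpolation: a lone '%' is rejected on access.
try:
    read_option(CONF_TEXT, "percent", configparser.BasicInterpolation())
except configparser.InterpolationSyntaxError as exc:
    print("interpolation error:", exc)
```

So under these assumptions the safe choice would be to turn interpolation off (or double every `%` in the patterns).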

@krmaxwell
Member

OK, so this is just about CRITs? Cool then. 👍

alexcpsec mentioned this issue Dec 23, 2014
alexcpsec added this to the v0.2.0 Unnamed Dolphin milestone Jan 8, 2015