Move over to plugins and more #121

Closed
wants to merge 37 commits into from

@sooshie sooshie commented Feb 13, 2015

This PR moves the feeds over to plugins and cleans up the code. All information is now passed as key:value pairs. The inbound and outbound files are gone; all of that is handled via plugins and the JSON docs. I added a clean-mx plugin to demonstrate the HASH and URL types, which were added in addition to IPv4 and FQDN. I also cleaned up enrichment (added DNS resolution) and expanded the CSV output (enriched).

I don't have a CRITS setup to test against so I haven't touched that stuff, nor did I touch the tiq_output.

closes #23
closes #102
closes #101
closes #100
closes #79
closes #84
closes #63
closes #37

@alexcpsec alexcpsec self-assigned this Feb 13, 2015
@krmaxwell
Member

well I know what I'm doing on my 3-day weekend! this looks great, thanks!

@alexcpsec
Member

It seems to be missing some prerequisites. What is this uniaccept thing about? A quick Google search turns up https://github.com/icann/uniaccept-python and suggests that it should be a part of dnspython.

Can you help?

$ ./combine.py
Traceback (most recent call last):
  File "./combine.py", line 10, in <module>
    from reaper import reap
  File "/Users/alexcp/src/combine/reaper.py", line 10, in <module>
    import uniaccept
ImportError: No module named uniaccept

@alexcpsec
Member

You mention it in the README file, but it is not clear how to install it.

@alexcpsec
Member

I added this to the requirements.txt, seemed to do the trick:

-e git+https://github.com/icann/uniaccept-python.git@2fd43061c729fdd834b93ee64ea33695266ddae0#egg=uniaccept-master

@alexcpsec
Member

@sooshie A minor annoyance is this "double logging" at baler and winnower:

2015-02-21 15:31:39,661 - combine.baler - INFO - Reading processed data from crop.json
[2015-02-21 23:31:39.661727] INFO: combine.baler: Reading processed data from crop.json
2015-02-21 15:31:46,640 - combine.baler - INFO - Output regular data as CSV to harvest.csv
[2015-02-21 23:31:46.641092] INFO: combine.baler: Output regular data as CSV to harvest.csv

Not familiar with the logbook package, so not sure what is going on here.
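
For reference, one way this pattern can arise is when the same record reaches two sinks with different formats. The eight-hour offset between the paired timestamps above also hints that the two lines come from separate sinks. The sketch below reproduces the effect with only the standard library `logging` module; it is not a diagnosis of how baler or winnower are actually wired, and the logger name and format strings are made up for illustration.

```python
import io
import logging


def build_logger(stream):
    """Attach two handlers with different formats to one logger."""
    logger = logging.getLogger("combine.baler.demo")  # hypothetical name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    # Start clean so calling this twice does not stack up more handlers.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)

    dashed = logging.StreamHandler(stream)
    dashed.setFormatter(logging.Formatter("%(name)s - %(levelname)s - %(message)s"))
    coloned = logging.StreamHandler(stream)
    coloned.setFormatter(logging.Formatter("%(levelname)s: %(name)s: %(message)s"))

    logger.addHandler(dashed)
    logger.addHandler(coloned)
    return logger


def demo():
    """Log one message and return every line that was written."""
    stream = io.StringIO()
    build_logger(stream).info("Reading processed data from crop.json")
    return stream.getvalue().splitlines()


if __name__ == "__main__":
    for line in demo():
        print(line)
```

A single `info()` call yields two lines here, one per handler. If something similar is happening in Combine, dropping one of the two sinks would be the place to look.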

@krmaxwell
Member

Since uniaccept-python is BSD-licensed and no longer maintained, we might as well bring it in directly to the repository (subject to license compliance which will be straightforward for us).

@alexcpsec
Member

I am tracking this on branch sooshie-master

@krmaxwell
Member

Also: can you talk a little about the grequests replacement with multiprocessing? Having some trouble on a test system with it.

@sroberts

@sooshie It's actually pretty easy to get up and running at this point, the shell script is great for that.

@sooshie
Author

sooshie commented Mar 10, 2015

Ok, got it fixed. Still haven't tested it against a CRITS instance (because $dayjob has other priorities currently). But it was an easy fix. The function was still using the CSV fields vs the JSON I used for re-plumbing. Somebody might check that I used source and reference correctly. Other than that, no errors on running it.

@sooshie
Author

sooshie commented Mar 10, 2015

Don't forget my initial note (all the way at the top), I haven't touched the tiq part of the code, and likely won't for the foreseeable future. But it looks like it shouldn't need it since it relies on functions I've already fixed.

@krmaxwell
Member

There are two fixes for the dnspython/uniaccept issue, the "right" way and the "quick" way.

Right: Fix uniaccept so that it's a true package. I don't know if they'd accept a PR, but since it's BSD licensed we can do just about anything we need to do. This probably involves configuring Combine with setuptools as well.

Quick: Import our fork of uniaccept as a git subtree and update the install instructions in the README.

For now I have gone with the "quick" option so we can get this done. We really need to do it the "right" way at some point. But hey, technical debt is our friend!

@krmaxwell
Member

(image: techdebt, from @alexcpsec)

@alexcpsec
Member

Another one for the list. It seems the thread does not stop processing when there is an ERROR, so the program just waits forever.

[2015-03-10 23:12:10.017235] DEBUG: reaper: Added: http://reputation.alienvault.com/reputation.data
[2015-03-10 23:12:17.232505] ERROR: reaper: Requests Error: HTTPConnectionPool(host='support.clean-mx.de', port=80): Read timed out. (read timeout=7.0)

^CTraceback (most recent call last):
  File "combine.py", line 40, in <module>
    reap('harvest.json')
  File "/Users/alexcp/src/combine/reaper.py", line 80, in reap
    responses = [q.get() for q in queues]
  File "/usr/local/Cellar/python/2.7.8_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/queues.py", line 117, in get
    res = self._recv()
KeyboardInterrupt
^CError in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/local/Cellar/python/2.7.8_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/usr/local/Cellar/python/2.7.8_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/util.py", line 325, in _exit_function
    p.join()
  File "/usr/local/Cellar/python/2.7.8_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/process.py", line 145, in join
    res = self._popen.wait(timeout)
  File "/usr/local/Cellar/python/2.7.8_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/forking.py", line 154, in wait
    return self.poll(0)
  File "/usr/local/Cellar/python/2.7.8_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/forking.py", line 135, in poll
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
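
The traceback shows the parent blocked in `q.get()` at `responses = [q.get() for q in queues]`, which waits forever if a worker logs an error and never puts anything on its queue. A minimal sketch of one possible guard follows: the worker always reports back, and the parent bounds each wait with a timeout. It uses threads and `queue.Queue` to keep the example small; `multiprocessing.Queue.get` accepts the same `timeout` argument and raises the same `queue.Empty`. The names `fetch` and `reap_all` and the fake "timeout" URL check are hypothetical, not the code in reaper.py.

```python
import queue
import threading


def fetch(url, out):
    """Hypothetical worker: puts exactly one result, even when the fetch fails."""
    try:
        if "timeout" in url:  # stand-in for a real network error
            raise IOError("Read timed out.")
        out.put((url, "data from " + url))
    except IOError:
        out.put((url, None))  # report the failure instead of staying silent


def reap_all(urls, wait=2.0):
    """Start one worker per URL and collect results without blocking forever."""
    queues = [queue.Queue() for _ in urls]
    for url, q in zip(urls, queues):
        threading.Thread(target=fetch, args=(url, q), daemon=True).start()

    responses = []
    for url, q in zip(urls, queues):
        try:
            responses.append(q.get(timeout=wait))
        except queue.Empty:
            responses.append((url, None))  # worker never answered in time
    return responses


if __name__ == "__main__":
    print(reap_all(["http://ok.example/feed", "http://timeout.example/feed"]))
```

Either half of the guard is enough to avoid the hang on its own. Using both means a failed feed shows up as `None` rather than stalling the whole run.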

@krmaxwell
Member

OK good it's not just me then. (I let it run for over 45 minutes :( )

We can also disable PalevoTracker (404s) and SpyeyeTracker (no longer active).

@alexcpsec
Member

Typo on Palevo. Should be blocklists.php (for some reason).

SpyEye is dead, can be removed.

@krmaxwell
Member

(venv)kmaxwell@leibniz:~/src/combine(sooshie-master)$ python reaper.py
[2015-03-11 03:16:21.246795] INFO: reaper: Loading Plugins
[2015-03-11 03:16:21.285367] INFO: reaper: Processing: sans
[snip]
[2015-03-11 03:16:33.723057] ERROR: reaper: Requests Error: HTTPConnectionPool(host='support.clean-mx.de', port=80): Read timed out. (read timeout=7.0)
Process Process-26:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "reaper.py", line 32, in get_file
    q.task_done()
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 330, in task_done
    raise ValueError('task_done() called too many times')
ValueError: task_done() called too many times
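
This `ValueError` is raised when `task_done()` is called more times than items were taken off the queue. The traceback alone does not show why `get_file` does that; one guess is that the error path marks the same item done a second time. The sketch below first reproduces the error, then shows a loop that calls `task_done()` exactly once per `get()` by putting it in a `finally` clause. It uses `queue.Queue`, which raises the same message as the `multiprocessing` queue in the traceback; `process_all` and `handler` are hypothetical names.

```python
import queue


def task_done_twice():
    """Reproduce the error: two task_done() calls for one get()."""
    q = queue.Queue()
    q.put("http://example.com/feed")
    q.get()
    q.task_done()
    try:
        q.task_done()  # second call for the same item
    except ValueError as exc:
        return str(exc)
    return None


def process_all(items, handler):
    """Call task_done() exactly once per get(), whether or not handler fails."""
    q = queue.Queue()
    for item in items:
        q.put(item)

    errors = []
    while True:
        try:
            item = q.get_nowait()
        except queue.Empty:
            break
        try:
            handler(item)
        except Exception as exc:
            errors.append((item, str(exc)))
        finally:
            q.task_done()  # the only task_done() for this item

    q.join()  # does not block: every get() was matched by one task_done()
    return errors
```

With this shape, a failing handler is recorded in `errors` and the queue bookkeeping stays balanced.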

@krmaxwell krmaxwell mentioned this pull request Mar 11, 2015
@krmaxwell
Member

FWIW this runs successfully for me now (see the sooshie-master branch in this repo). There are a few minor things to tweak here but we're just about there.

@krmaxwell
Member

OK, this is in dev now and we should continue to iterate on issues from there. We will document those separately.
