FSI Sketch

Relased under the Apache Software Foundation license. See LICENSE.txt. Contains code licensed under the Apache Software Foundation license. See COPYRIGHT.txt for details.

About

This is a disk-backed implementation of the Count-Min Sketch algorithm. The Count-Min Sketch is an Approximate Member Query algorithm, which means that you can say with certainty that something is not in the set and that there is a chance of error when you say that something is in the set. This algorithm is different from a Bloom filter in that it is a multiset instead of a set and you can use it to count instances. It works much like a python set except that you can never retrieve keys, only values. All typical set operations, except union, are supported. You will have better results if you use the add/remove/discard functions instead of the assignment operators as the former are less disruptive to the internal data structure.

Usage

Print out all lines occurring 5 or more times in a file.

from fsisketch import CMSketch

sketch = CMSketch('example', 'I', 10000)
for line in open('example.txt'):
    if sketch[line] == 4:
        print (line.rstrip())
    sketch.add(line)

Track when you last saw something.

from fsisketch import CMSketch
from time import time

sketch = CMSketch('example2', 'd', 10000)
for key in some_generator():
    now = time()
    if key in sketch:
        print ('Saw "{}" {} seconds ago.'.format(key, now - sketch[key]))
    sketch[key] = now

Find unknown words.

from fsisketch import CMSketch

sketch = CMSketch('example3', 'B', 10000)
sketch.update(line.rstrip() for line in open('/usr/share/dict/words'))
new_words = sketch.difference(some_document.split())

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
src/fsisketch		src/fsisketch
tests		tests
.gitignore		.gitignore
.travis.yml		.travis.yml
COPYRIGHT		COPYRIGHT
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
README.rst		README.rst
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

FSI Sketch

About

Usage

About

Releases

Packages

Languages

License

hstern/fsisketch

Folders and files

Latest commit

History

Repository files navigation

FSI Sketch

About

Usage

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages