Merge pull request #2 from whysage/develop

Update README
whysage · Jul 28, 2022 · 383508e
2 parents 4e2376e + 1279cb9
commit 383508e

Showing 2 changed files with 30 additions and 1 deletion.
diff --git a/README.md b/README.md
@@ -1 +1,30 @@
 # Hash Chunker
+
+Generator that yields hash chunks for distributed data processing.
+
+### TLDR
+
+```
+pip install hash-chunker
+```
+
+```
+from hash_chunker import HashChunker
+
+chunks = list(HashChunker().get_chunks(chunk_size=1, all_items_count=2))
+
+assert chunks == [("0000000000", "8000000000"), ("8000000000", "ffffffffff")]
+```
+
+### Description
+
+Imagine a situation where you need to process a huge number of data rows in parallel.
+Each data row has a hash field, and the task is to use it for chunking.
+
+Possible reasons for using a hash field instead of an integer id field:
+- There is no auto-increment id field.
+- The id field has large gaps (1, 2, 3, 100500, 100501, 1000000).
+- Chunking by id can split data that must stay together across different chunks
+(in user behavioral analytics, the id may auto-increment across all users' actions, while
+the user_session hash is tied to a specific user, so chunking by id may split one user session
+across several chunks).
diff --git a/pyproject.toml b/pyproject.toml
@@ -2,7 +2,7 @@
 name = "hash_chunker"
 homepage = "https://github.com/whysage/hash_chunker"
 repository = "https://github.com/whysage/hash_chunker"
-version = "0.1.1"
+version = "0.1.2"
 description = "Generator that yields hash chunks for distributed data processing."
 authors = ["Volodymyr Kochetkov <[email protected]>"]
 license = "MIT"
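The README's TLDR output suggests that chunks are contiguous ranges over a fixed-width hexadecimal space. Below is a hypothetical sketch of that idea, not hash_chunker's actual code. Several details are assumptions inferred from the single README example: the names `get_chunks` and `find_chunk`, the 10-character hash prefix (`HASH_LENGTH`), the rule that the number of chunks is `ceil(all_items_count / chunk_size)`, and the half-open range semantics. Under these assumptions it reproduces the TLDR output.

```python
from typing import Iterator, List, Tuple

# Assumption: hashes are compared by their first 10 hex characters,
# matching the 10-character bounds in the README example.
HASH_LENGTH = 10


def get_chunks(
    chunk_size: int, all_items_count: int, hash_length: int = HASH_LENGTH
) -> Iterator[Tuple[str, str]]:
    """Hypothetical sketch: yield contiguous (start, end) hex ranges."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Integer ceiling division; always produce at least one chunk.
    chunks_count = max(1, -(-all_items_count // chunk_size))
    space = 16 ** hash_length  # number of distinct hex prefixes
    for i in range(chunks_count):
        start = space * i // chunks_count
        end = space * (i + 1) // chunks_count
        if i == chunks_count - 1:
            # 16 ** hash_length needs one more digit, so the last bound
            # is clamped to the largest value, e.g. "ffffffffff".
            end = space - 1
        yield format(start, f"0{hash_length}x"), format(end, f"0{hash_length}x")


def find_chunk(
    hash_value: str, chunks: List[Tuple[str, str]], hash_length: int = HASH_LENGTH
) -> int:
    """Return the index of the chunk whose range contains the hash prefix."""
    # Equal-length lowercase hex strings compare in the same order as numbers.
    prefix = hash_value[:hash_length].lower()
    last = len(chunks) - 1
    for index, (start, end) in enumerate(chunks):
        # Ranges are treated as half-open [start, end); the last range
        # also includes its end so that "ff...f" is not dropped.
        if start <= prefix < end or (index == last and prefix == end):
            return index
    raise ValueError(f"hash {hash_value!r} is not covered by any chunk")


if __name__ == "__main__":
    print(list(get_chunks(chunk_size=1, all_items_count=2)))
```

Because every row with the same `user_session` hash has the same prefix, `find_chunk` always routes those rows to the same chunk, which is the property the Description asks for. How the real library treats the range boundaries (inclusive or exclusive) should be checked against its source.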