Skip to content
switham edited this page Jan 29, 2014 · 10 revisions

(Of course there's nothing quite so refreshing as source code.)

WARNING: You may notice that this code calls a cryptographic hash function. That does NOT mean this code is secure. treeprng is NOT a cryptographic pseudorandom number generator. Do NOT use it in a crypto or security-sensitive application.

TreePRNG takes a "secure hash" (SHA-1 by default) of the path down the tree. At each step, the key that the user provides is serialized with Python's pickle, submethod 2, which is binary rather than text. Each pickle is prepended with an ASCII 'k'. Because pickles are self-delimiting, no two sequences of different k+pickles produce the same byte stream. Pickle itself isn't a hash, so all of the information in the key sequence is fed to the "secure hash".

A new TreePRNG object, like tree here...

    tree = TreePRNG()

...also contains a new hashlib hash object, whose size is fixed by the particular hash algorithm. When a subtree is accessed...

    subtree = tree[key]

...the original hash object is unchanged, but cloned into the child, and then the pickled key bytes are fed to the child's .update() method. The hash object absorbs the key information (no matter how large) without changing size (the word "digest" really applies). By holding onto a subtree object in a variable, a program can access areas within the subtree without needing to reiterate the whole path. However, the digest from a subtree object is exactly what would be produced by updating a single hash with the entire path.

The effect is like having a "current directory" in a file system except you can hold onto as many as you like. Each subtree object is self-contained and has the same fixed size, no matter how deep in the tree. Subtree objects that aren't saved in variables, or are in variables that go out of scope, are recycled.

Besides being convenient, this implies that it's natural for many key paths to be mostly the same. Unless the hash algorithm has a weakness, I think that's not a problem, the digests should appear random everywhere, with high probability, I think.

At the end of a path, the user calls a random method. The first 160 random bits (with SHA1) are just the digest of the sequence of k+pickled keys. Often that's enough since only one random method call is allowed at the end of a key path. But some single method calls (like shuffle()) can use more bits. Each further chunk of (e.g.) 160 bits is the hash of the same path with (ASCII 'p' + pickle(chunk_number)) appended.

Clone this wiki locally