Add an Encoder to the pipeline #75
Comments
A much more radical design. @tixxit, @NathanHowell, curious what you think. Assume that All the games with (In a world with metadata, we also want If it's unsatisfying to determine all of the possible splits up front at the root, you can always add in a re-encoding phase where, given an established tree, you train encoders for each of the leaves, and then union the resulting encodings to get a superset of all of them. Some advantages:
Cons:
Does this all make sense?
... apologies for thinking out loud, but: The above - encoding to a bitset - assumes that there's a bounded, finite number of predicates we might want in the tree. But that's more limiting than we want, and more limiting than necessary. As a simple example, imagine a text feature, where we want the predicates to be of the form The properties we actually want from an encoder are:
I'm not sure yet how that translates to an implementation.
Ok, so in concrete terms, maybe something like:
The In this scheme you would probably specify all the available |
It would be helpful to have a concept of an `Encoder[K,U,V]` which can transform `Map[K,U]` to the `Map[K,V]` used by a given set of trees, and which can be serialized along with the trees.

We also want (in some cases) to be able to "train" these encoders from a training set - for example to figure out which numeric features are continuous vs. discrete, or even to do dimensionality reduction etc.
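
To make the proposal concrete, here is a minimal sketch of what such an encoder could look like. Only the `Encoder[K,U,V]` shape comes from the description above; the `encode` method name, the `Encoded` value type, `NumericEncoder`, and the `maxDistinct` heuristic are all assumptions made up for illustration, not existing API. "Training" here just counts distinct values per feature to guess discrete vs. continuous, which is one simple way to do it rather than a settled design.

```scala
// Hypothetical sketch: everything except the Encoder[K, U, V] shape is an
// illustrative assumption, not part of any existing codebase.

// Transforms raw feature maps into the representation the trees consume.
// Extending Serializable lets an encoder be stored alongside the trees.
trait Encoder[K, U, V] extends Serializable {
  def encode(row: Map[K, U]): Map[K, V]
}

// Assumed encoded value type: tags each numeric feature as discrete or continuous.
sealed trait Encoded
final case class Discrete(value: Double) extends Encoded
final case class Continuous(value: Double) extends Encoded

// Encoder that tags features according to a set of keys known to be discrete.
final case class NumericEncoder[K](discreteKeys: Set[K])
    extends Encoder[K, Double, Encoded] {
  def encode(row: Map[K, Double]): Map[K, Encoded] =
    row.map { case (k, v) =>
      val encoded: Encoded = if (discreteKeys(k)) Discrete(v) else Continuous(v)
      k -> encoded
    }
}

object NumericEncoder {
  // "Trains" an encoder from a training set: a feature is treated as discrete
  // when it takes at most maxDistinct distinct values (an assumed heuristic).
  def train[K](rows: Iterable[Map[K, Double]], maxDistinct: Int = 10): NumericEncoder[K] = {
    val distinctValues: Map[K, Set[Double]] =
      rows.foldLeft(Map.empty[K, Set[Double]]) { (acc, row) =>
        row.foldLeft(acc) { case (a, (k, v)) =>
          a.updated(k, a.getOrElse(k, Set.empty[Double]) + v)
        }
      }
    NumericEncoder(distinctValues.filter(_._2.size <= maxDistinct).keySet)
  }
}
```

Because the trained encoder is a plain case class holding only a `Set[K]`, it can be serialized with whatever mechanism is used for the trees, provided `K` itself is serializable.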