Some Notes #27

EricForgy · 2019-02-12T15:36:35Z

Hi there,

I really like this package and see potential in many applications (and finance being the one on my mind at the moment) ❤️

In fact, the following registered packages have a dependency on AbstractTrees.jl:

BioMedQuery.jl
BioServices.jl
Cascadia.jl
D3Trees.jl
DataDepsGenerators.jl
DiffEqBayes.jl
DiffEqFlux.jl
DiffEqSensitivity.jl
ExprOptimization.jl
ExprRules.jl
Flux.jl
Gumbo.jl
Metalhead.jl
ONNX.jl
Omega.jl
PhysOcean.jl
ReinforcementLearning.jl
TextAnalysis.jl
Turing.jl

Thank you @Nosferican for the list 😊

Looking through a few of these packages, I see most of them just use children and printnode. I'd be curious to find a package that makes deeper use of some of the other functionality of this package. Any suggestions?

The purpose of these notes is to try to help me understand how things work and maybe to suggest some alternate ways forward (and probably learn a lot in the process as I discover my errant ways).

Children or Parents?

I can think of two ways to define a tree structure. It seems the most common way is top down, i.e.

struct TopDownTree
    name::String
    children::Vector{TopDownTree}
end

But, depending on the application, I could see the use for a bottom-up approach, i.e.

struct BottomUpTree
    name::String
    parent::BottomUpTree
end

I think either of the above are sufficient to uniquely define a tree. However, we could be redundant and define something like

struct RedundantTree
    name::String
    parent::RedundantTree
    children::Vector{RedundantTree}
    siblings::Vector{RedundantTree}
end

This is redundant (assuming each node has a single parent which I think is safe for this package) because given only parent info, you can deduce both children and siblings info. Similarly, given only children info, you can deduce both parent and siblings info. Because of this, you'd probably want some consistency checks for a RedundantTree, but no matter.

From what I can tell (and I could always be wrong and this is a caveat for everything I say in these notes 😅), this package seems to favor the TopDownTree approach, but has some types defined to cater for when parent and sibling info is available.

Indexed or Regular?

For my application, I need to keep track of financial accounts for a general ledger. Before looking into tree structures or this package, my initial definition was something along the lines of

struct Account
    name::String
    number::String
    children::Dict{String,Account}
end

As I worked on my package, I started to feel like I was reinventing the wheel, which brought me to AbstractTrees.jl 👍

Digging through the code, I found there is an abstract type TreeKind with two concrete subtypes:

RegularTree <: TreeKind
IndexedTree <: TreeKind

Since I access children nodes via Account.number from a Dict, I guess you would say this type of tree is an IndexedTree with the index being the account number.

If you don't need to access specific nodes in the tree and just want the ability to quickly iterate over children, then you would use something like TopDownTree above where children is a vector of nodes. This type of tree is a RegularTree.

My first instinct, expressed in issue #23, was to have my Account be a subtype of IndexedTree, i.e. I wanted to write something like

struct Account <: IndexedTree
    name::String
    number::String
    children::Dict{String,Account}
end

This led me to spend some time looking at TreeKind. I spent a good chunk of time today making changes so that IndexedTree and RegularTree are both abstract (which breaks the method treekind).

I didn't understand the purpose of treekind at first, but as I write these notes a 💡 just came on.

I guess the intention of treekind is to specify whether my type is a RegularTree or an IndexedTree like

AbstractTrees.treekind(acct::Account) = IndexedTree()

Hmm 🤔 That is clever 😊

Anyway, since I have a high tolerance for pain (at least for the first two times), I started going through the code and replacing things like

https://github.com/Keno/AbstractTrees.jl/blob/master/src/AbstractTrees.jl#L120

tree != roottree && isa(treekind(roottree), IndexedTree) ?

with

tree != roottree && isa(roottree, IndexedTree) ?

in which case, treekind isn't even needed.

It seemed cleaner to me to have RegularTree and IndexedTree to be abstract, but I am feeling less confident about that the more I write 😅

Full Abstract

When I started writing these notes, I was going to suggest reducing the number of concrete types here and rely more on subtyping abstract types, but that would be a very breaking change and maybe not worth it. Especially now that I understand treekind. It comes down to the choice of either

struct Account <: IndexedTree
    name::String
    number::String
    children::Dict{String,Account}
end

or

struct Account
    name::String
    number::String
    children::Dict{String,Account}
end
AbstractTrees.treekind(acct::Account) = IndexedTree()

The former feels more natural to me, where the latter feels like we are implementing some dispatch functionality in the package itself. Maybe the latter is faster or more efficient somehow?

An illustrative example is nextsibling

https://github.com/Keno/AbstractTrees.jl/blob/master/src/AbstractTrees.jl#L401

nextsibling(node) = nextsibling(node, parentlinks(node), siblinglinks(node), treekind(node))

As mentioned above, TreeKind has two concrete subtypes and to make use of the packages functionality, I would define

AbstractTrees.treekind(acct::Account) = IndexedTree()

Simlarly, the abstract type ParentLinks has two concrete subtypes

StoredParents <: ParentLinks
ImplicitParents <: ParentLinks

and I would need to define

AbstractTrees.parentlinks(acct::Account) = ImplicitParents()

since Account does not store parent info explicitly.

In the same way, the abstract type SiblingLinks has two concrete subtypes

StoredSiblings <: SiblingLinks
ImplicitSiblings <: SiblingLinks

and I would need to define

AbstractTrees.siblinglinks(acct::Account) = ImplicitSiblings()

since Account does not store siblings info explicitly.

All together, it would look like

struct Account
    name::String
    number::String
    children::Dict{String,Account}
end
AbstractTrees.treekind(acct::Account) = IndexedTree()
AbstractTrees.parentlinks(acct::Account) = ImplicitParents()
AbstractTrees.siblinglinks(acct::Account) = ImplicitSiblings()

then we can dispatch on the concrete TreeKind, the concrete ParentLinks and the concrete SiblingLinks as in

nextsibling(node) = nextsibling(node, parentlinks(node), siblinglinks(node), treekind(node))

Now that I understand how it works, I think this is cool, but I still have to ask, is this better than defining abstract types instead and having your custom types be subtypes of those?

For example, the most common type of tree, I imagine would be

RegularTree()
ImplicitParents()
ImplicitSiblings()

so why not introduce a top-level abstract AbstractTree and just call this type of tree something like

abstract type RegularTopDownTree <: AbstractTree end

Then instead of

nextsibling(node,ImplicitParent(),ImplicitSibling(),RegularTree())

you could just have

nextsibling(node::RegularTopDownTree)

See what I mean?

Then, instead of

IndexedTree()
ImplicitParents()
ImplicitSiblings()

just call this something like

abstract type IndexedTopDownTree <: AbstractTree end

Then instead of

nextsibling(node,ImplicitParent(),ImplicitSibling(),IndexedTree())

you could just have

nextsibling(node::IndexedTopDownTree)

Then we need names for other combinations, but I don't think that would be too hard, e.g.

nextsibling(node::RegularParentTree) 
= nextsibling(node,StoredParents(),ImplicitSiblings(),RegularTree())

nextsibling(node::IndexedSiblingTree) 
= nextsibling(node,ImpliciyParents(),StoreSiblings(),IndexedTree())

nextsibling(node::RegularRedundantTree) 
= nextsibling(node,StoredParents(),StoredSiblings(),RegularTree())

etc.

Then we wouldn't need the concrete types [Stored|Implicit][Parents|Siblings]() or [Regular|Indexed]Tree() and I'd just need to define my type like

struct Account <: IndexedTopDownTree
    name::String
    number::String
    children::Dict{String,Account}
end

I kind of like this because the abstract type structure itself is a tree 😊

AbstractTree
- RegularTree
  - RegularTopDownTree
  - RegularParentTree
  - RegularSinglingTree
  - RegularRedundantTree
  - RegularBottomUpTree
- IndexedTree
  - IndexedTopDownTree
  - IndexedParentTree
  - IndexedSiblingTree
  - IndexedRedundantTree
  - IndexedBottomUpTree

Granted, this is a lot of abstract trees, but given the name of this package is "AbstractTrees", I'd think that is ok 😅

I'm very curious to hear if anyone has any thoughts on this. I am inclined to go ahead and make these changes, but they would be significant changes and I understand there is a chance it wouldn't get merged here. Obviously, I would be more motivated if I thought there was a chance 😅

Anyway, I accomplished one goal by writing these notes. I learned quite a bit 👍😊

The text was updated successfully, but these errors were encountered:

EricForgy · 2019-02-13T03:14:35Z

My learning continues...

Thanks to a helpful comment on Slack from @kleinschmidt and this video https://www.youtube.com/watch?v=j9w8oHfG1Ic, I now see one source of my ignorance is based on ignorance of "traits". All these singleton concrete types I was confused about are being used as traits and is intentional and a good thing. I still hope to understand a bit more about "Why?" but at least I have a good lead to follow now 😊

EricForgy · 2019-02-13T05:46:52Z

I think I'll close this, but maybe use some knowledge gained to write some docs (other than code comments) 😊

EricForgy closed this as completed Feb 13, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Some Notes #27

Some Notes #27

EricForgy commented Feb 12, 2019 •

edited

Loading

EricForgy commented Feb 13, 2019

EricForgy commented Feb 13, 2019

Some Notes #27

Some Notes #27

Comments

EricForgy commented Feb 12, 2019 • edited Loading

Children or Parents?

Indexed or Regular?

Full Abstract

EricForgy commented Feb 13, 2019

EricForgy commented Feb 13, 2019

EricForgy commented Feb 12, 2019 •

edited

Loading