Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some Notes #27

Closed
EricForgy opened this issue Feb 12, 2019 · 2 comments
Closed

Some Notes #27

EricForgy opened this issue Feb 12, 2019 · 2 comments

Comments

@EricForgy
Copy link
Contributor

EricForgy commented Feb 12, 2019

Hi there,

I really like this package and see potential in many applications (and finance being the one on my mind at the moment) ❤️

In fact, the following registered packages have a dependency on AbstractTrees.jl:

  • BioMedQuery.jl
  • BioServices.jl
  • Cascadia.jl
  • D3Trees.jl
  • DataDepsGenerators.jl
  • DiffEqBayes.jl
  • DiffEqFlux.jl
  • DiffEqSensitivity.jl
  • ExprOptimization.jl
  • ExprRules.jl
  • Flux.jl
  • Gumbo.jl
  • Metalhead.jl
  • ONNX.jl
  • Omega.jl
  • PhysOcean.jl
  • ReinforcementLearning.jl
  • TextAnalysis.jl
  • Turing.jl

Thank you @Nosferican for the list 😊

Looking through a few of these packages, I see most of them just use children and printnode. I'd be curious to find a package that makes deeper use of some of the other functionality of this package. Any suggestions?

The purpose of these notes is to try to help me understand how things work and maybe to suggest some alternate ways forward (and probably learn a lot in the process as I discover my errant ways).

Children or Parents?

I can think of two ways to define a tree structure. It seems the most common way is top down, i.e.

struct TopDownTree
    name::String
    children::Vector{TopDownTree}
end

But, depending on the application, I could see the use for a bottom-up approach, i.e.

struct BottomUpTree
    name::String
    parent::BottomUpTree
end

I think either of the above are sufficient to uniquely define a tree. However, we could be redundant and define something like

struct RedundantTree
    name::String
    parent::RedundantTree
    children::Vector{RedundantTree}
    siblings::Vector{RedundantTree}
end

This is redundant (assuming each node has a single parent which I think is safe for this package) because given only parent info, you can deduce both children and siblings info. Similarly, given only children info, you can deduce both parent and siblings info. Because of this, you'd probably want some consistency checks for a RedundantTree, but no matter.

From what I can tell (and I could always be wrong and this is a caveat for everything I say in these notes 😅), this package seems to favor the TopDownTree approach, but has some types defined to cater for when parent and sibling info is available.

Indexed or Regular?

For my application, I need to keep track of financial accounts for a general ledger. Before looking into tree structures or this package, my initial definition was something along the lines of

struct Account
    name::String
    number::String
    children::Dict{String,Account}
end

As I worked on my package, I started to feel like I was reinventing the wheel, which brought me to AbstractTrees.jl 👍

Digging through the code, I found there is an abstract type TreeKind with two concrete subtypes:

  1. RegularTree <: TreeKind
  2. IndexedTree <: TreeKind

Since I access children nodes via Account.number from a Dict, I guess you would say this type of tree is an IndexedTree with the index being the account number.

If you don't need to access specific nodes in the tree and just want the ability to quickly iterate over children, then you would use something like TopDownTree above where children is a vector of nodes. This type of tree is a RegularTree.

My first instinct, expressed in issue #23, was to have my Account be a subtype of IndexedTree, i.e. I wanted to write something like

struct Account <: IndexedTree
    name::String
    number::String
    children::Dict{String,Account}
end

This led me to spend some time looking at TreeKind. I spent a good chunk of time today making changes so that IndexedTree and RegularTree are both abstract (which breaks the method treekind).

I didn't understand the purpose of treekind at first, but as I write these notes a 💡 just came on.

I guess the intention of treekind is to specify whether my type is a RegularTree or an IndexedTree like

AbstractTrees.treekind(acct::Account) = IndexedTree()

Hmm 🤔 That is clever 😊

Anyway, since I have a high tolerance for pain (at least for the first two times), I started going through the code and replacing things like

https://github.com/Keno/AbstractTrees.jl/blob/master/src/AbstractTrees.jl#L120

tree != roottree && isa(treekind(roottree), IndexedTree) ?

with

tree != roottree && isa(roottree, IndexedTree) ?

in which case, treekind isn't even needed.

It seemed cleaner to me to have RegularTree and IndexedTree to be abstract, but I am feeling less confident about that the more I write 😅

Full Abstract

When I started writing these notes, I was going to suggest reducing the number of concrete types here and rely more on subtyping abstract types, but that would be a very breaking change and maybe not worth it. Especially now that I understand treekind. It comes down to the choice of either

struct Account <: IndexedTree
    name::String
    number::String
    children::Dict{String,Account}
end

or

struct Account
    name::String
    number::String
    children::Dict{String,Account}
end
AbstractTrees.treekind(acct::Account) = IndexedTree()

The former feels more natural to me, where the latter feels like we are implementing some dispatch functionality in the package itself. Maybe the latter is faster or more efficient somehow?

An illustrative example is nextsibling

https://github.com/Keno/AbstractTrees.jl/blob/master/src/AbstractTrees.jl#L401

nextsibling(node) = nextsibling(node, parentlinks(node), siblinglinks(node), treekind(node))

As mentioned above, TreeKind has two concrete subtypes and to make use of the packages functionality, I would define

AbstractTrees.treekind(acct::Account) = IndexedTree()

Simlarly, the abstract type ParentLinks has two concrete subtypes

  1. StoredParents <: ParentLinks
  2. ImplicitParents <: ParentLinks

and I would need to define

AbstractTrees.parentlinks(acct::Account) = ImplicitParents()

since Account does not store parent info explicitly.

In the same way, the abstract type SiblingLinks has two concrete subtypes

  1. StoredSiblings <: SiblingLinks
  2. ImplicitSiblings <: SiblingLinks

and I would need to define

AbstractTrees.siblinglinks(acct::Account) = ImplicitSiblings()

since Account does not store siblings info explicitly.

All together, it would look like

struct Account
    name::String
    number::String
    children::Dict{String,Account}
end
AbstractTrees.treekind(acct::Account) = IndexedTree()
AbstractTrees.parentlinks(acct::Account) = ImplicitParents()
AbstractTrees.siblinglinks(acct::Account) = ImplicitSiblings()

then we can dispatch on the concrete TreeKind, the concrete ParentLinks and the concrete SiblingLinks as in

nextsibling(node) = nextsibling(node, parentlinks(node), siblinglinks(node), treekind(node))

Now that I understand how it works, I think this is cool, but I still have to ask, is this better than defining abstract types instead and having your custom types be subtypes of those?

For example, the most common type of tree, I imagine would be

  • RegularTree()
  • ImplicitParents()
  • ImplicitSiblings()

so why not introduce a top-level abstract AbstractTree and just call this type of tree something like

abstract type RegularTopDownTree <: AbstractTree end

Then instead of

nextsibling(node,ImplicitParent(),ImplicitSibling(),RegularTree())

you could just have

nextsibling(node::RegularTopDownTree)

See what I mean?

Then, instead of

  • IndexedTree()
  • ImplicitParents()
  • ImplicitSiblings()

just call this something like

abstract type IndexedTopDownTree <: AbstractTree end

Then instead of

nextsibling(node,ImplicitParent(),ImplicitSibling(),IndexedTree())

you could just have

nextsibling(node::IndexedTopDownTree)

Then we need names for other combinations, but I don't think that would be too hard, e.g.

nextsibling(node::RegularParentTree) 
= nextsibling(node,StoredParents(),ImplicitSiblings(),RegularTree())
nextsibling(node::IndexedSiblingTree) 
= nextsibling(node,ImpliciyParents(),StoreSiblings(),IndexedTree())
nextsibling(node::RegularRedundantTree) 
= nextsibling(node,StoredParents(),StoredSiblings(),RegularTree())

etc.

Then we wouldn't need the concrete types [Stored|Implicit][Parents|Siblings]() or [Regular|Indexed]Tree() and I'd just need to define my type like

struct Account <: IndexedTopDownTree
    name::String
    number::String
    children::Dict{String,Account}
end

I kind of like this because the abstract type structure itself is a tree 😊

AbstractTree
- RegularTree
  - RegularTopDownTree
  - RegularParentTree
  - RegularSinglingTree
  - RegularRedundantTree
  - RegularBottomUpTree
- IndexedTree
  - IndexedTopDownTree
  - IndexedParentTree
  - IndexedSiblingTree
  - IndexedRedundantTree
  - IndexedBottomUpTree

Granted, this is a lot of abstract trees, but given the name of this package is "AbstractTrees", I'd think that is ok 😅

I'm very curious to hear if anyone has any thoughts on this. I am inclined to go ahead and make these changes, but they would be significant changes and I understand there is a chance it wouldn't get merged here. Obviously, I would be more motivated if I thought there was a chance 😅

Anyway, I accomplished one goal by writing these notes. I learned quite a bit 👍😊

@EricForgy
Copy link
Contributor Author

My learning continues...

Thanks to a helpful comment on Slack from @kleinschmidt and this video https://www.youtube.com/watch?v=j9w8oHfG1Ic, I now see one source of my ignorance is based on ignorance of "traits". All these singleton concrete types I was confused about are being used as traits and is intentional and a good thing. I still hope to understand a bit more about "Why?" but at least I have a good lead to follow now 😊

@EricForgy
Copy link
Contributor Author

I think I'll close this, but maybe use some knowledge gained to write some docs (other than code comments) 😊

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant