Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Documentation #22

Merged
merged 32 commits into from
Sep 6, 2022
Merged

Documentation #22

merged 32 commits into from
Sep 6, 2022

Conversation

ffevotte
Copy link
Collaborator

The documentation effort made by @Vincent-Sivadon is still under way, but I'm creating this WIP PR so that we have a space to discuss issues related to it.

@ffevotte
Copy link
Collaborator Author

One issue that was brought up in earlier discussions comes from the use of a halo or zeros in the Gauss-Seidel example. IIUC, this is currently implemented by using a modified version of PseudoTiledMatrix that is aware of the existence of a halo, but in which only the "interior" of the matrix is tiled. This makes it possible to make the algorithm loops completely unaware of the existence of a halo. In particular, all tiles can assume that they have neighboring data in all directions.

OTOH, an other approach would consist in making the matrix itself unaware of the halo. The algorithm itself should therefore take care of iterating only on valid indices. In terms of documentation, this would have the added benefit that boundary tiles have less than 4 neighbors, so that the number of data dependencies is not always the same. It would be interesting to see whether this can be handled in a friendly way by DataFlowTasks and, if so, to use Gauss Seidel as an example to show how to.

A first idea that comes to mind to declare dependencies that don't always exist would be something along those lines:

@R (i>2 ? A[i-1,:] : nothing)

But I'm not sure whether this already works out of the box from a syntax viewpoint. Maybe it also requires implementing new methods like

memory_overlap(::Nothing, ::Any) = false

@codecov
Copy link

codecov bot commented Aug 16, 2022

Codecov Report

Merging #22 (4a51faa) into main (097ce4d) will decrease coverage by 1.76%.
The diff coverage is 46.42%.

@@            Coverage Diff             @@
##             main      #22      +/-   ##
==========================================
- Coverage   83.33%   81.56%   -1.77%     
==========================================
  Files          10       10              
  Lines         660      678      +18     
==========================================
+ Hits          550      553       +3     
- Misses        110      125      +15     
Impacted Files Coverage Δ
src/dag.jl 75.00% <0.00%> (-5.29%) ⬇️
src/plotdag.jl 81.81% <0.00%> (-3.90%) ⬇️
src/scheduler.jl 76.13% <ø> (ø)
src/plotgeneral.jl 89.92% <76.47%> (-2.71%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

README.md Outdated Show resolved Hide resolved
@maltezfaria
Copy link
Owner

One issue that was brought up in earlier discussions comes from the use of a halo or zeros in the Gauss-Seidel example. IIUC, this is currently implemented by using a modified version of PseudoTiledMatrix that is aware of the existence of a halo, but in which only the "interior" of the matrix is tiled. This makes it possible to make the algorithm loops completely unaware of the existence of a halo. In particular, all tiles can assume that they have neighboring data in all directions.

OTOH, an other approach would consist in making the matrix itself unaware of the halo. The algorithm itself should therefore take care of iterating only on valid indices. In terms of documentation, this would have the added benefit that boundary tiles have less than 4 neighbors, so that the number of data dependencies is not always the same. It would be interesting to see whether this can be handled in a friendly way by DataFlowTasks and, if so, to use Gauss Seidel as an example to show how to.

A first idea that comes to mind to declare dependencies that don't always exist would be something along those lines:

@R (i>2 ? A[i-1,:] : nothing)

But I'm not sure whether this already works out of the box from a syntax viewpoint. Maybe it also requires implementing new methods like

memory_overlap(::Nothing, ::Any) = false

I am lost here, can you guys give me a reference to what you are calling the Gauss-Seidel algorithm? I can't reconcile what I have in mind, which is the iterative method, to this five-point stencil description that I found on the docs 😕

@ffevotte
Copy link
Collaborator Author

I am lost here, can you guys give me a reference to what you are calling the Gauss-Seidel algorithm? I can't reconcile what I have in mind, which is the iterative method, to this five-point stencil description that I found on the docs confused

Yes, I couldn't agree more. I should have mentioned that this is also a question I asked Vincent yesterday. @Vincent-Sivadon seems to have found this terminology in a paper on dataflow parallelism that I haven't seen yet...

Copy link
Collaborator Author

@ffevotte ffevotte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this write-up!

Here are my comments on the brand-new README.

README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated

We'll cover in details the usage and possibilities of the visualization in the documentation.

Note that the visualization tools are not loaded by default, it requires a `Makie` backend and/or `GraphViz` loaded in the REPL. It's meant to be used in development, so it won't pollute the environment you want to use DFT in.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep! If/when #26 (or something like it) gets merged, we'll have to update the doc here accordingly.

README.md Outdated Show resolved Hide resolved
@ffevotte
Copy link
Collaborator Author

A global comment on this branch is: is there a reason why we need to have generated documentation files (such as build/*.html) checked in the git repo ?

README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
Vincent-Sivadon and others added 2 commits August 19, 2022 16:35
This allows testing whether all code in the README is runnable.

Literate.jl also brings collateral benefits, such as the automatic generation of
Jupyter notebooks.
@ffevotte
Copy link
Collaborator Author

Commit 9085534 introduces a way to automatically generate the README.md file from a Literate.jl julia script (located in docs/readme/README.jl). A collateral benefit is that the same contents are now also generated in the form of a Jupyter notebook (which can be seen directly from github or from nbviewer).

The new version of README.md has been generated this way, and we can now be sure that all code in it is runnable (which incidentally increases the number of test cases, although these are run during documentation generation and are therefore not accounted for in the coverage analysis).

I'll now go ahead and try to resolve all remaining issues with README.md, which should get us close to the final contents. Maybe a last review from y'all would be good though, in order to make sure we're on the same page in terms of how we want to present DataFlowTasks to new users.

More work is still needed to finalize the full Documenter docs, especially if we want to include all examples. But maybe we can reach pretty quickly a "presentable" state, where all features are documented. And from there we can add more tutorials/examples (maybe using DemoCards.jl or Literate.jl to more easily ensure they are runnable).

Copy link
Collaborator Author

@ffevotte ffevotte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the README is now in a state where it can be reviewed by anyone willing to.

I've commented on the few issues that still need to be discussed collectively IMO.

# Let's look at a simple example:

using DataFlowTasks
DataFlowTasks.reset!() #hide
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unresolved questions regarding reset:

OTOH: if we don't want to have/advertize it, make sure removing it doesn't affect the outcome of the examples shown here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, and if we want to keep reset(), we'll have to add tests for it.

Comment on lines +165 to +166
DataFlowTasks.reset!()
DataFlowTasks.enable_log()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these lines needed? Would resetlogger! be enough?

Comment on lines +180 to +182
err = norm(F.L*F.U-A,Inf)/max(norm(A),norm(F.L*F.U)) #hide
@show err #hide
@assert err < eps(Float64) #hide
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naive question: why not something simpler, like

err = norm(F.L*F.U - A) / norm(A)

Also: I arbitrarily put the tolerance to machine precision, but this might not be the right thing to do.

Examples have been restructured, but are not ready yet.
For now it is used only on Cholesky
@maltezfaria maltezfaria marked this pull request as ready for review September 6, 2022 07:50
@maltezfaria maltezfaria merged commit 55de4b7 into main Sep 6, 2022
@ffevotte ffevotte deleted the documentation branch September 20, 2022 13:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants