Documentation #22

ffevotte · 2022-08-16T14:51:47Z

The documentation effort made by @Vincent-Sivadon is still under way, but I'm creating this WIP PR so that we have a space to discuss issues related to it.

…owTasks.jl into documentation

ffevotte · 2022-08-16T15:04:11Z

One issue that was brought up in earlier discussions comes from the use of a halo or zeros in the Gauss-Seidel example. IIUC, this is currently implemented by using a modified version of PseudoTiledMatrix that is aware of the existence of a halo, but in which only the "interior" of the matrix is tiled. This makes it possible to make the algorithm loops completely unaware of the existence of a halo. In particular, all tiles can assume that they have neighboring data in all directions.

OTOH, an other approach would consist in making the matrix itself unaware of the halo. The algorithm itself should therefore take care of iterating only on valid indices. In terms of documentation, this would have the added benefit that boundary tiles have less than 4 neighbors, so that the number of data dependencies is not always the same. It would be interesting to see whether this can be handled in a friendly way by DataFlowTasks and, if so, to use Gauss Seidel as an example to show how to.

A first idea that comes to mind to declare dependencies that don't always exist would be something along those lines:

@R (i>2 ? A[i-1,:] : nothing)

But I'm not sure whether this already works out of the box from a syntax viewpoint. Maybe it also requires implementing new methods like

memory_overlap(::Nothing, ::Any) = false

codecov · 2022-08-16T15:39:16Z

Codecov Report

Merging #22 (4a51faa) into main (097ce4d) will decrease coverage by 1.76%.
The diff coverage is 46.42%.

@@            Coverage Diff             @@
##             main      #22      +/-   ##
==========================================
- Coverage   83.33%   81.56%   -1.77%     
==========================================
  Files          10       10              
  Lines         660      678      +18     
==========================================
+ Hits          550      553       +3     
- Misses        110      125      +15

Impacted Files	Coverage Δ
src/dag.jl	`75.00% <0.00%> (-5.29%)`	⬇️
src/plotdag.jl	`81.81% <0.00%> (-3.90%)`	⬇️
src/scheduler.jl	`76.13% <ø> (ø)`
src/plotgeneral.jl	`89.92% <76.47%> (-2.71%)`	⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

README.md

maltezfaria · 2022-08-17T10:37:41Z

One issue that was brought up in earlier discussions comes from the use of a halo or zeros in the Gauss-Seidel example. IIUC, this is currently implemented by using a modified version of PseudoTiledMatrix that is aware of the existence of a halo, but in which only the "interior" of the matrix is tiled. This makes it possible to make the algorithm loops completely unaware of the existence of a halo. In particular, all tiles can assume that they have neighboring data in all directions.

OTOH, an other approach would consist in making the matrix itself unaware of the halo. The algorithm itself should therefore take care of iterating only on valid indices. In terms of documentation, this would have the added benefit that boundary tiles have less than 4 neighbors, so that the number of data dependencies is not always the same. It would be interesting to see whether this can be handled in a friendly way by DataFlowTasks and, if so, to use Gauss Seidel as an example to show how to.

A first idea that comes to mind to declare dependencies that don't always exist would be something along those lines:
@R (i>2 ? A[i-1,:] : nothing)
But I'm not sure whether this already works out of the box from a syntax viewpoint. Maybe it also requires implementing new methods like
memory_overlap(::Nothing, ::Any) = false

I am lost here, can you guys give me a reference to what you are calling the Gauss-Seidel algorithm? I can't reconcile what I have in mind, which is the iterative method, to this five-point stencil description that I found on the docs 😕

ffevotte · 2022-08-17T11:34:00Z

I am lost here, can you guys give me a reference to what you are calling the Gauss-Seidel algorithm? I can't reconcile what I have in mind, which is the iterative method, to this five-point stencil description that I found on the docs confused

Yes, I couldn't agree more. I should have mentioned that this is also a question I asked Vincent yesterday. @Vincent-Sivadon seems to have found this terminology in a paper on dataflow parallelism that I haven't seen yet...

ffevotte

Thanks for this write-up!

Here are my comments on the brand-new README.

README.md

ffevotte · 2022-08-19T09:09:08Z

README.md

+
+We'll cover in details the usage and possibilities of the visualization in the documentation.
+
+Note that the visualization tools are not loaded by default, it requires a `Makie` backend and/or `GraphViz` loaded in the REPL. It's meant to be used in development, so it won't pollute the environment you want to use DFT in.


Yep! If/when #26 (or something like it) gets merged, we'll have to update the doc here accordingly.

README.md

ffevotte · 2022-08-19T10:37:30Z

A global comment on this branch is: is there a reason why we need to have generated documentation files (such as build/*.html) checked in the git repo ?

…owTasks.jl into documentation

README.md

This allows testing whether all code in the README is runnable. Literate.jl also brings collateral benefits, such as the automatic generation of Jupyter notebooks.

ffevotte · 2022-08-22T21:08:33Z

Commit 9085534 introduces a way to automatically generate the README.md file from a Literate.jl julia script (located in docs/readme/README.jl). A collateral benefit is that the same contents are now also generated in the form of a Jupyter notebook (which can be seen directly from github or from nbviewer).

The new version of README.md has been generated this way, and we can now be sure that all code in it is runnable (which incidentally increases the number of test cases, although these are run during documentation generation and are therefore not accounted for in the coverage analysis).

I'll now go ahead and try to resolve all remaining issues with README.md, which should get us close to the final contents. Maybe a last review from y'all would be good though, in order to make sure we're on the same page in terms of how we want to present DataFlowTasks to new users.

More work is still needed to finalize the full Documenter docs, especially if we want to include all examples. But maybe we can reach pretty quickly a "presentable" state, where all features are documented. And from there we can add more tutorials/examples (maybe using DemoCards.jl or Literate.jl to more easily ensure they are runnable).

ffevotte

I think the README is now in a state where it can be reviewed by anyone willing to.

I've commented on the few issues that still need to be discussed collectively IMO.

ffevotte · 2022-08-22T22:47:43Z

docs/readme/README.jl

+# Let's look at a simple example:
+
+using DataFlowTasks
+DataFlowTasks.reset!() #hide


Unresolved questions regarding reset:

do we want to keep it? (cf. DAG environnement not beeing cleaned #24)

even if so, do we want to advertize its use?

OTOH: if we don't want to have/advertize it, make sure removing it doesn't affect the outcome of the examples shown here.

Oh, and if we want to keep reset(), we'll have to add tests for it.

ffevotte · 2022-08-22T22:49:35Z

docs/readme/README.jl

+DataFlowTasks.reset!()
+DataFlowTasks.enable_log()


Are these lines needed? Would resetlogger! be enough?

ffevotte · 2022-08-22T22:52:08Z

docs/readme/README.jl

+err = norm(F.L*F.U-A,Inf)/max(norm(A),norm(F.L*F.U))       #hide
+@show err                                                  #hide
+@assert err < eps(Float64)                                 #hide


Naive question: why not something simpler, like

err = norm(F.L*F.U - A) / norm(A)

Also: I arbitrarily put the tolerance to machine precision, but this might not be the right thing to do.

Examples have been restructured, but are not ready yet.

For now it is used only on Cholesky

Vincent-Sivadon and others added 15 commits July 25, 2022 10:21

README

117adda

Benchmarking subfolder deleted (now in TiledFactorization)

981bdea

README updates

377d01b

README update

d9407be

LP proposal

28d205f

Exemple

f8c4aec

Merge branch 'documentation' of https://github.com/maltezfaria/DataFl…

b7e37b6

…owTasks.jl into documentation

BugFix (plot)

ef8e23d

Removed documentation build

7f45170

BugFix: Labels

96510da

Updates

fe41203

Updates

4dcbd17

New clean README.md

79fbf2f

No Labels BugFix

7f7232d

README update

35997c7

ffevotte assigned Vincent-Sivadon Aug 16, 2022

maltezfaria reviewed Aug 17, 2022

View reviewed changes

README.md Outdated Show resolved Hide resolved

Vincent-Sivadon and others added 8 commits August 17, 2022 14:44

README executable

7757d62

Reset function

17c583a

README basic exemple changes

23c4e9e

PseudoTiledMatrix not in the README anymore

d6093ef

PseudoTiledMatrix not in the Cholesky example anymore

07128bc

PseudoTiledMatrix not in the Seidel example anymore

917e4c0

Simplified local documentation generation

7d90229

Fixed missing image path

f439ee7