async-traversal-paper

tuura · May 10, 2018 · 5d76c64 · 5d76c64
commit 5d76c64
Showing 16 changed files with 2,298 additions and 0 deletions.
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,27 @@
+
+# Created by https://www.gitignore.io/api/windows
+
+### Windows ###
+# Windows image file caches
+Thumbs.db
+ehthumbs.db
+
+# Folder config file
+Desktop.ini
+
+# Recycle Bin used on file shares
+$RECYCLE.BIN/
+
+# Windows Installer files
+*.cab
+*.msi
+*.msm
+*.msp
+
+# Windows shortcuts
+*.lnk
+
+build/
+project.sublime-workspace
+
+*.~*
diff --git a/colorpen.json b/colorpen.json
@@ -0,0 +1,7 @@
+{
+	"LaTeX Error: (.*)": "red bold",
+	"^!(.*)": "red bold",
+	"LaTeX Warning: (.*)": "yellow",
+	".*BibTeX.*": "magenta",
+	"Done.": "green bold"
+}
diff --git a/figures/fig_async.pdf b/figures/fig_async.pdf
diff --git a/figures/fig_mapping.pdf b/figures/fig_mapping.pdf
diff --git a/makefile b/makefile
@@ -0,0 +1,20 @@
+SRC = ./src
+BUILD = ./build
+
+main: paper.pdf
+
+paper.pdf: $(SRC)/*.tex
+	@ mkdir -p $(BUILD)
+	@ pdflatex \
+		-interaction=nonstopmode \
+		-output-directory=$(BUILD) \
+		-include-directory=src \
+		$(SRC)/paper.tex paper.pdf
+
+	@ cp $(SRC)/*.bib $(BUILD)
+	@ cd $(BUILD); bibtex paper.aux
+	@ rm $(BUILD)/*.bib
+
+clean:
+	rm $(BUILD)/*
+	rmdir $(BUILD)
diff --git a/project.sublime-project b/project.sublime-project
@@ -0,0 +1,18 @@
+{
+    "folders": [
+        {
+            "path": ".",
+            "folder_exclude_patterns": [
+                "build",
+                "figure-sources",
+                "figures",
+                "old",
+                "visio"
+            ],
+            "file_exclude_patterns": [
+                "*.xlsx"
+            ]
+        }
+    ],
+    "settings": {}
+}
diff --git a/src/abstract.tex b/src/abstract.tex
@@ -0,0 +1,19 @@
+\begin{abstract}
+
+The availability of detailed protein-protein interaction networks or
+``interactomes'' has made it possible to exploit network analysis techniques to
+discover better drugs in faster and more efficient ways than ever before.
+e-Therapeutics has developed a practical \emph{in silico} approach to drug
+discovery based on the construction and analysis of network representations of
+disease mechanisms. Disease network construction and analysis is based on the
+human interactome, a network of currently about 19K nodes linked by over 0.5M
+interactions. Traversal operations on such a network are expensive and can
+benefit from custom hardware acceleration. In this paper we outline two
+approaches: a synchronous FPGA-based accelerator, which is very simple and
+fast but limited to networks of thousands of proteins and hundreds of thousands
+of interactions, and an asynchronous alternative, which is more scalable and can
+cope with networks comprising millions of nodes, but requires much more
+sophisticated graph traversal algorithms. We present our current solution as a
+challenge for the community: can you help us make it simpler?
+
+\end{abstract}
diff --git a/src/authors.tex b/src/authors.tex
@@ -0,0 +1,33 @@
+% lineno
+\ifdefined \blindreview
+
+	\author{
+		\IEEEauthorblockN{
+			\hspace{1cm}
+		}
+		\IEEEauthorblockA{
+			\\
+		}
+		\\
+	}
+
+\else
+
+
+	\author{
+		\IEEEauthorblockN{
+			Ghaith Tarawneh\textsuperscript{1},
+			Alessandro de Gennaro\textsuperscript{1},
+			Jonny Wray\textsuperscript{2},
+			Andrey Mokhov\textsuperscript{1} and
+			Alex Yakovlev\textsuperscript{1}
+		}
+		\IEEEauthorblockA{
+			\textsuperscript{1}School of Engineering, Newcastle University, UK
+			\\
+			\textsuperscript{2}e-Therapeutics, UK
+		}
+	}
+
+
+\fi
diff --git a/src/bibliography.bib b/src/bibliography.bib
@@ -0,0 +1,26 @@
+@INPROCEEDINGS{fdl2017,
+  author    ={Mokhov, Andrey and de Gennaro, Alessandro and Tarawneh, Ghaith and Wray, Jonny and Lukyanov, Georgy and Mileiko, Sergey and Scott, Joe and Yakovlev, Alex and Brown, Andrew},
+  booktitle ={{Forum on Specification \& Design Languages (FDL)}},
+  title     ={{Language and Hardware Acceleration Backend for Graph Processing}},
+  year      ={2017},
+  volume    ={},
+  number    ={},
+  ISSN      ={},
+}
+
+@INPROCEEDINGS{parco2017,
+  author    ={Andrew Brown and David Thomas and Jeff Reeve and Ghaith Tarawneh and Alessandro de Gennaro and Andrey Mokhov and Matthew Naylor and Tom Kazmierski},
+  booktitle ={{ParCo Conference}},
+  title     ={{Distributed Event-based Computing}},
+  year      ={2017},
+  volume    ={},
+  number    ={},
+  ISSN      ={},
+}
+
+@book{lynch1996distributed,
+  title={Distributed algorithms},
+  author={Lynch, Nancy A},
+  year={1996},
+  publisher={Elsevier}
+}
diff --git a/src/defs.tex b/src/defs.tex
@@ -0,0 +1,8 @@
+% \def \blindreview{} % Hide author information
+
+\hyphenation{meta-stability}
+\hyphenation{synchro-nous}
+\hyphenation{veri-fication}
+\hyphenation{hand-shake}
+
+\newcommand{\mpsat}{\textsc{Mpsat}}
diff --git a/src/fig_async.tex b/src/fig_async.tex
@@ -0,0 +1,18 @@
+% fig_async
+
+\begin{figure}[!t]
+\begin{center}
+
+\vspace{2.3cm}
+
+\includegraphics[width=8.8cm]{figures/fig_async}
+
+\caption{An example to demonstrate the challenge with asynchronous network
+traversal. How do we determine the minimum distance between nodes $A$ and $B$
+if nodes communicate with their neighbours using asynchronous mechanisms and
+the ordering of operations cannot be constrained?}
+
+\label{fig_async}
+\end{center}
+
+\end{figure}
diff --git a/src/fig_mapping.tex b/src/fig_mapping.tex
@@ -0,0 +1,17 @@
+% fig_mapping
+
+\begin{figure}[!t]
+\begin{center}
+
+\includegraphics[width=8.8cm]{figures/fig_mapping}
+
+\caption{
+Mapping a network to hardware. Nodes are implemented as flip-flops, and edges as combinational paths.
+Traversal is performed by propagating a logic high status between flip-flops, starting from a root node,
+and completion is detected by monitoring the status of all nodes.
+}
+
+\label{fig_mapping}
+\end{center}
+
+\end{figure}
diff --git a/src/packages.tex b/src/packages.tex
@@ -0,0 +1,7 @@
+\usepackage[noadjust]{cite}
+\usepackage[pdfborder={0 0 0},bookmarks=false]{hyperref}
+\usepackage[pdftex]{graphicx}
+\usepackage[T1]{fontenc}
+\usepackage{cleveref}
+\usepackage{inconsolata}
+\usepackage[listings,skins]{tcolorbox}
diff --git a/src/paper.tex b/src/paper.tex
@@ -0,0 +1,110 @@
+\documentclass[conference]{IEEEtran}
+
+\input{packages}
+\input{defs}
+\title{Asynchronous Network Traversal for Computational Drug Discovery}
+\input{authors}
+
+\begin{document}
+\maketitle
+\input{abstract}
+
+\vspace{0.2cm}
+
+\section{Synchronous Traversal}
+\label{sec_sync}
+
+In the first approach, we use individual flip-flops to represent network nodes
+and connect these with combinational paths to form edges (\Cref{fig_mapping}).
+Initially, all flip-flops have a logic low value, indicating that none of the
+nodes have been visited. We~simulate network traversal by propagating a logic
+high value between flip-flops, corresponding to the propagation of a
+\emph{visited} state. Nodes at a distance $k$ are visited on the
+$k$\textsuperscript{th} cycle, and the process completes after at most $D$
+cycles ($D$ being the network diameter). Here, we use OR gates to consolidate
+multiple inputs (edges) and compute the next \emph{visited} state of each node.
+A~global reset signal clears all flip-flops at start, after which we set one
+flip-flop to begin traversal. To detect completion, we read the state of all
+flip-flops and determine when all have been visited.
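+
+One possible way to formalize this update, using notation introduced here
+only for illustration, is to let $v_i[k] \in \{0, 1\}$ be the \emph{visited}
+state of node $i$ after cycle $k$, $N(i)$ the set of its neighbours and $r$
+the root. Assuming that each flip-flop also holds its value once set (a
+feedback term that is an assumption and is not spelled out above), the
+OR-based next-state logic can be written as
+\begin{equation}
+	v_i[k+1] = v_i[k] \lor \bigvee_{j \in N(i)} v_j[k],
+\end{equation}
+with $v_r[0] = 1$ and $v_i[0] = 0$ for all $i \neq r$. Under the same
+assumption, completion corresponds to $\bigwedge_i v_i[k] = 1$.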
+
+This approach maps very naturally to FPGAs and provides significant speedups
+compared to software-based breadth-first search \cite{fdl2017}. However, many
+network science applications (including computational drug discovery) use
+small-world networks, characterized by their high connectivity. The abundance
+of edges quickly exhausts the interconnect resources of planar FPGA devices,
+setting an upper limit on the scale of networks that can be implemented.
+
+\section{Challenges of Asynchrony}
+
+Asynchronous mechanisms represent the only scalable scheme for implementing
+and traversing large-scale small-world networks in hardware. This scalability
+comes at the cost of extra complexity, however, since parallel (asynchronous)
+traversal operations across the network are not synchronized to a common
+reference and may proceed in random order.
+
+To illustrate why asynchronous network traversal is more challenging, we will
+refer to the network in \Cref{fig_async}. Here, we assume that network nodes
+communicate their \emph{visited} state by exchanging asynchronous messages,
+and that message delivery can occur in any order. Now, consider the problem of
+determining the minimum distance between nodes $A$ and $B$. We can do this by
+sending messages containing a~distance attribute $d$ between nodes, starting
+with $A$, and incrementing this on each hop. Since message delivery can occur
+in any order, it is possible that the chain of messages $A \rightarrow D
+\rightarrow E \rightarrow B$ is delivered before $A \rightarrow B$. If so,
+node $B$ will update its distance to $A$ twice, and possibly more if more
+paths between $A$ and $B$ exist in a larger network. The question is: how does
+$B$ determine when there are no further updates?
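+
+For illustration, suppose (as an assumption, since the update rule is not
+stated above) that each node stores the smallest distance received so far and
+that $d$ counts one per hop, so that a message arriving at $B$ with distance
+$d$ causes the update
+\begin{equation}
+	d_B \leftarrow \min(d_B, d), \qquad d_B = \infty \mbox{ initially}.
+\end{equation}
+If the three-hop chain arrives first, $B$ sets $d_B = \min(\infty, 3) = 3$,
+and the later direct message gives $d_B = \min(3, 1) = 1$. Both values are
+locally consistent, and nothing in the second message tells $B$ that it is
+the last one.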
+
+The example above highlights one problem that emerges due to asynchrony:
+completion detection. Due to the lack of global state visibility, completion
+detection is not as straightforward as in the synchronous solution of
+\Cref{sec_sync}. How can we perform asynchronous network traversal then? One
+way is to implement a softer form of local synchronization between neighboring
+nodes. We discuss one way to do this below.
+
+\input{fig_mapping}
+
+\section{Asynchronous Traversal}
+\label{sec_async}
+
+The proposed algorithm is from \cite{lynch1996distributed} and relies on
+traversing the network multiple times with increasing depth. Each round
+consists of a forward propagation of \emph{req} messages, starting from a root
+node. Messages contain a hoplimit integer $h$ which is decremented on each
+jump. When $h$ reaches zero, the node sends back an \emph{ack} message which
+is then collated with others and funneled back to the root node. The round is
+complete when the root receives \emph{ack} messages from all its neighbors.
+The root then starts the next round by sending \emph{req} messages to its
+neighbors, this time with an incremented hoplimit.
+
+In the above approach, we enforce some notion of ordering because traversal
+operations across the network are guaranteed to belong to the same round. We
+are therefore able to detect the termination of computations at network
+extremities and synchronize this back to the root. We can sum the number of
+discovered nodes and relay this back via \emph{ack} messages. This provides a
+method to detect completion: we terminate the algorithm after a round has been
+completed and no new nodes were discovered.
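+
+A rough cost estimate can be sketched under assumptions that are not stated
+above: the first round uses a hoplimit of $1$, every hop takes one time unit,
+and $\epsilon \le D$ denotes the largest distance from the root to any node.
+Round $h$ then needs about $h$ hops for \emph{req} messages to reach depth $h$
+and $h$ hops for the \emph{ack} messages to return. The first round that
+discovers no new nodes is round $\epsilon + 1$, so the total is approximately
+\begin{equation}
+	\sum_{h=1}^{\epsilon+1} 2h = (\epsilon + 1)(\epsilon + 2)
+\end{equation}
+time units. Under these assumptions the cost grows quadratically with the
+depth of the network, compared to the at most $D$ cycles of the synchronous
+scheme in \Cref{sec_sync}.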
+
+\section{Future Work}
+
+Network traversal presents an interesting problem where the conventional
+benefits of asynchrony appear to come at the cost of algorithmic complexity.
+Massively-parallel architectures attempting to solve such problems must rely
+on asynchronous communication mechanisms for device scalability purposes
+\cite{parco2017}, but may have to resort to softer forms of synchronization
+for algorithmic reasons. Is this trade-off an intrinsic feature of problems
+such as network traversal? If so, are there other (asynchronous) solutions
+that are better than the one proposed in \Cref{sec_async}?
+
+\newpage
+
+\input{fig_async}
+
+\section*{Acknowledgments}
+
+This work was supported by EPSRC grant EP/N031768/1 (project POETS).
+
+\bibliographystyle{IEEEtran}
+\bibliography{bibliography}
+
+\end{document}