[CIVIS-7472] DOC add technical notes (#9)
jacksonlee-civis authored Nov 30, 2023
1 parent b127a6d commit b1bf2e5
Showing 25 changed files with 1,187 additions and 183 deletions.
4 changes: 2 additions & 2 deletions .circleci/config.yml
@@ -11,7 +11,7 @@ jobs:
type: string
docker:
# Pick the highest Python 3.x version that this package is known to support
- image: cimg/python:3.11
- image: cimg/python:3.12
steps:
- checkout
- run:
@@ -96,7 +96,7 @@ workflows:
- bandit
matrix:
parameters:
python-version: ["3.10", "3.11"]
python-version: ["3.10", "3.11", "3.12"]
- build-python-win:
requires:
- flake8
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -19,6 +19,12 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.

### Security

## [1.4.0] - 2023-11-30

### Added
- Python 3.12 is officially supported and tested on CI.
- Added a technical notes page to the Sphinx documentation.

## [1.3.0] - 2023-06-20

### Added
2 changes: 1 addition & 1 deletion docs/.buildinfo
@@ -1,4 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: b3dd4ed71bb2b56b4ce0349a56974cef
config: 82a13e1d440b1fccb9c82f453fa605a1
tags: 645f666f9bcd5a90fca523b33c5a78b7
13 changes: 4 additions & 9 deletions docs/_sources/index.rst.txt
@@ -70,17 +70,11 @@ Download and Install
Usage
-----

Start with :ref:`quickstart`, and then get inspired by :ref:`more_examples`.
Start with :ref:`quickstart`.
To better understand how the library works, see :ref:`technical`.
Then browse :ref:`more_examples` for inspiration and to see what features are available.
Don't forget to check out the :ref:`api` as well.

Under the Hood
--------------

``async-graph-data-flow`` chains asynchronous functions together
with a :class:`~asyncio.Queue` instance between two functions in the graph.
A queue keeps track of the data items yielded from a source node and feeds them
into its destination node.

License
-------

@@ -104,5 +98,6 @@ Table of Contents
:maxdepth: 2

quickstart
technical
more_examples
api
237 changes: 237 additions & 0 deletions docs/_sources/technical.rst.txt
@@ -0,0 +1,237 @@
.. _technical:

Technical Notes
===============

Functions Arranged as a Graph
-----------------------------

The ``async-graph-data-flow`` package provides a framework for arranging
functions in a directed acyclic graph (DAG).

.. mermaid::

    flowchart LR

    A[func1] --> B[func2]
    B --> C[func3]
    B --> D[func4]
    C --> E[func5]
    D --> E

Each function must be an asynchronous generator function:
it is defined with ``async def``
and contains at least one ``yield``.

.. code-block:: python

    async def func1(arg1, arg2):
        ...
        yield output

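
As a quick illustration of the distinction above, the standard library's
``inspect.isasyncgenfunction`` reports whether a function is an asynchronous
generator function. The two function names below are hypothetical, and this is
only a sketch of the concept, not a description of how the package validates nodes.

```python
import inspect


async def produces_items(n):
    # An asynchronous generator function: `async def` plus at least one `yield`.
    for i in range(n):
        yield i


async def returns_value(n):
    # A plain coroutine function: `async def` but no `yield`.
    return n


is_generator = inspect.isasyncgenfunction(produces_items)
is_coroutine_only = inspect.isasyncgenfunction(returns_value)
print(is_generator, is_coroutine_only)
```

Only the first function has the shape this page describes for a node.
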
For two nodes to be connected correctly,
the source node's function must yield what the destination node's function
expects as input.
For more information,
see the ``unpack_input`` parameter of :func:`~async_graph_data_flow.AsyncGraph.add_node`.
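
The matching requirement can be sketched with plain ``asyncio``, without the
package itself. The function names and the ``run_pair`` driver below are
hypothetical; the driver only imitates the idea of unpacking a yielded tuple
into the destination function's positional arguments and is not the package's
implementation.

```python
import asyncio


async def source():
    # Each yielded item is a 2-tuple, matching the two parameters of `destination`.
    yield "foo-value", "bar-value"
    yield "foo-again", "bar-again"


async def destination(foo, bar):
    yield f"{foo}|{bar}"


async def run_pair():
    results = []
    async for item in source():
        # Unpack the yielded tuple into positional arguments.
        async for out in destination(*item):
            results.append(out)
    return results


results = asyncio.run(run_pair())
print(results)
```

If ``source`` yielded three values instead of two, the call to ``destination``
would raise a ``TypeError``, which is the kind of mismatch the paragraph above warns about.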

.. mermaid::

    flowchart LR

    START[ ] -.-> A
    B -.-> STOP[ ]
    A["async def func2(...):\n&nbsp;&nbsp;&nbsp;&nbsp;...\n&nbsp;&nbsp;&nbsp;&nbsp;yield <strong>foo, bar</strong>"] --> B["async def func3(<strong>foo, bar</strong>):\n&nbsp;&nbsp;&nbsp;&nbsp;...\n&nbsp;&nbsp;&nbsp;&nbsp;yield ..."]
    style A text-align:left
    style B text-align:left
    style START fill-opacity:0, stroke-opacity:0;
    style STOP fill-opacity:0, stroke-opacity:0;

Tasks and Queues
----------------

At runtime, each node's function runs concurrently
as one or more :class:`tasks<asyncio.Task>` in the event loop.
The number of tasks for a given node is controlled by
the ``max_tasks`` parameter of :func:`~async_graph_data_flow.AsyncGraph.add_node`.
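
To make the idea of several tasks per node concrete, here is a minimal sketch in
plain ``asyncio``. ``MAX_TASKS``, ``worker``, and ``run_node`` are hypothetical
names, and the pattern shown (a fixed pool of tasks pulling work from one shared
queue) is an assumption about the general technique, not the package's actual code.

```python
import asyncio

MAX_TASKS = 3  # hypothetical stand-in for a node's ``max_tasks``


async def worker(name, queue, seen):
    while True:
        item = await queue.get()
        try:
            await asyncio.sleep(0)  # pretend to do some asynchronous work
            seen.append((name, item))
        finally:
            queue.task_done()


async def run_node(items):
    queue = asyncio.Queue()
    seen = []
    for item in items:
        queue.put_nowait(item)
    tasks = [
        asyncio.create_task(worker(f"task-{i}", queue, seen))
        for i in range(1, MAX_TASKS + 1)
    ]
    await queue.join()  # wait until every item has been processed
    for task in tasks:
        task.cancel()  # the workers loop forever, so stop them explicitly
    await asyncio.gather(*tasks, return_exceptions=True)
    return seen


seen = asyncio.run(run_node(range(6)))
print(seen)
```

Every item is processed exactly once, no matter which of the tasks picks it up.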

Each node has an associated :class:`Queue<asyncio.Queue>` instance
that supplies items to the node's tasks.
The queue receives its items as the source nodes yield them.
The maximum number of items a queue can hold is specified by ``queue_size``
at :func:`~async_graph_data_flow.AsyncGraph.add_node`.
The queue is first-in-first-out:
it keeps track of the items yielded from the tasks of the source nodes
and feeds them one by one in the order in which it received them.
An item leaves a queue when a task of the destination node becomes
available to process it.
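
The first-in-first-out and bounded-size behavior can be seen with a bare
``asyncio.Queue``. In this sketch, ``maxsize=2`` plays the role of ``queue_size``;
the ``None`` sentinel and the function names are hypothetical devices for the
example and are not claimed to be how the package signals completion.

```python
import asyncio


async def producer(queue, n):
    for i in range(n):
        # When the queue is full, `put` waits until the consumer frees a slot.
        await queue.put(i)
    await queue.put(None)  # sentinel: nothing more to come


async def consumer(queue, received):
    while True:
        item = await queue.get()
        if item is None:
            break
        received.append(item)


async def run_bounded():
    queue = asyncio.Queue(maxsize=2)  # hypothetical stand-in for ``queue_size``
    received = []
    await asyncio.gather(producer(queue, 5), consumer(queue, received))
    return received


received = asyncio.run(run_bounded())
print(received)
```

The consumer sees the items in exactly the order the producer put them, and the
producer can never run more than two items ahead of the consumer.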

.. mermaid::

    flowchart LR

    start1[ ] -.- queue1(("&nbsp;&nbsp;"))
    subgraph box1 [ ]
        queue1 --> node1["&nbsp;&nbsp;&nbsp;&nbsp;"]
    end
    style start1 fill-opacity:0, stroke-opacity:0;

    start2[ ] -.- queue2(("&nbsp;&nbsp;"))
    subgraph box2 [ ]
        queue2 --> node2["&nbsp;&nbsp;&nbsp;&nbsp;"]
    end
    style start2 fill-opacity:0, stroke-opacity:0;

    subgraph node and its associated queue
        queue3((queue)) --> node3[task 1, task 2,\ntask 3, ...]
    end

    node1 --> |yields\nitems| queue3
    node2 --> |yields\nitems| queue3
    node3 -.-> |yields\nitems| STOP[ ]
    style STOP fill-opacity:0, stroke-opacity:0;

Example
-------

The following sample script uses ``async-graph-data-flow`` to process real data
and brings together the components discussed above.
It pulls data from `Open Brewery DB <https://www.openbrewerydb.org/>`_
into a local CSV file.


.. code-block:: python

    # This Python script was tested with Python 3.11.
    # Apart from async-graph-data-flow, it requires several other third-party dependencies,
    # which can be installed by `pip install aiocsv aiofile aiohttp`.

    import aiocsv
    import aiofile
    import aiohttp
    from async_graph_data_flow import AsyncGraph, AsyncExecutor

    # API doc: https://www.openbrewerydb.org/documentation
    URL = "https://api.openbrewerydb.org/v1/breweries"
    CSV_HEADER = [
        "id",
        "name",
        "brewery_type",
        "address_1",
        "address_2",
        "address_3",
        "city",
        "state_province",
        "postal_code",
        "country",
        "longitude",
        "latitude",
        "phone",
        "website_url",
        "state",
        "street",
    ]
    OUTPUT_FILENAME = "breweries_us_async.csv"
    # Module-level flag so that the header row is written only once.
    has_written_csv_header = False


    async def get_open_brewery_data():
        page = 1
        async with aiohttp.ClientSession() as session:
            while True:
                params = {
                    "by_country": "United States",
                    "page": page,
                    "per_page": 200,
                }
                async with session.get(URL, params=params) as response:
                    response.raise_for_status()
                    data = await response.json()
                # An empty page means all data has been retrieved.
                if not data:
                    break
                else:
                    yield data
                    page += 1


    async def write_to_csv(data: list[dict[str, str]]):
        global has_written_csv_header
        # Append mode, because this function is called once per page of data.
        async with aiofile.async_open(OUTPUT_FILENAME, mode="a", encoding="utf8") as f:
            csv_writer = aiocsv.AsyncDictWriter(f, CSV_HEADER)
            if not has_written_csv_header:
                await csv_writer.writeheader()
                has_written_csv_header = True
            await csv_writer.writerows(data)
        # The bare yield makes this an asynchronous generator function.
        yield


    def main():
        graph = AsyncGraph()
        graph.add_node(get_open_brewery_data)
        graph.add_node(write_to_csv)
        graph.add_edge(get_open_brewery_data, write_to_csv)
        executor = AsyncExecutor(graph)
        executor.execute()
        print("data downloaded:", OUTPUT_FILENAME)


    if __name__ == "__main__":
        main()

In this code, ``main()`` defines a graph and executes it.
The graph has two connected nodes.
The source node, with the asynchronous generator function ``get_open_brewery_data()``,
yields items to the destination node with ``write_to_csv()``:

.. mermaid::

    flowchart LR

    A[get_open_brewery_data] --> B[write_to_csv]

For the source node,
the following shows an abridged version of ``get_open_brewery_data()``
to highlight what the function yields:

.. code-block:: python

    async def get_open_brewery_data():
        page = 1
        ...
        while True:
            params = {"page": page, ...}
            ...
            yield data
            page += 1

Because the Open Brewery DB API paginates its data,
``get_open_brewery_data()`` makes an API call for one page of data,
yields this data to the destination node (``write_to_csv()``),
repeats this process, and stops once all pages of data have been retrieved.
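
The pagination loop can be tried without any network access. The sketch below
replaces the HTTP call with a hypothetical in-memory ``fetch_page`` helper and a
made-up ``FAKE_PAGES`` table; only the loop structure (fetch, stop on an empty
page, otherwise yield and advance) mirrors the function described above.

```python
import asyncio

# Hypothetical in-memory "API": page number -> list of records.
FAKE_PAGES = {
    1: [{"id": "a"}, {"id": "b"}],
    2: [{"id": "c"}],
}


async def fetch_page(page):
    # Stand-in for an HTTP request; unknown pages come back empty.
    await asyncio.sleep(0)
    return FAKE_PAGES.get(page, [])


async def get_pages():
    page = 1
    while True:
        data = await fetch_page(page)
        if not data:
            break  # an empty page signals the end of the data
        yield data
        page += 1


async def collect():
    return [data async for data in get_pages()]


pages = asyncio.run(collect())
print(pages)
```

Each yielded item is a whole page (a list of records), not an individual record.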

The queue associated with the destination node, ``write_to_csv()``, supplies
the node's inputs, which are the items yielded by ``get_open_brewery_data()``.


.. mermaid::

    flowchart LR

    Q(("Queue items:\n[{'col1': 'val1', ...}, ...]\n[{'col1': 'val1', ...}, ...]\n...\n"))
    A[get_open_brewery_data]
    B[write_to_csv]
    A --> |yields\nitems| Q
    Q --> B

``get_open_brewery_data()`` yields a page of the Open Brewery DB data,
which is a list of records where each record is a dictionary of column names
mapped to values. The function signature of ``write_to_csv()`` expects exactly
such a list of dictionaries:

.. code-block:: python

    async def write_to_csv(data: list[dict[str, str]]):
        ...

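
To see why a list of dictionaries is a convenient shape for the destination
node, here is a small synchronous sketch using the standard library's ``csv``
module in place of ``aiocsv``. ``FIELDNAMES``, ``write_page``, and the sample
records are hypothetical; the sketch only shows that each page maps directly
onto ``writerows`` and that the header is written once.

```python
import csv
import io

FIELDNAMES = ["id", "name"]  # hypothetical subset of the real CSV header


def write_page(buffer, page, write_header):
    # `page` is a list of dicts keyed by column name, like one yielded item.
    writer = csv.DictWriter(buffer, FIELDNAMES, lineterminator="\n")
    if write_header:
        writer.writeheader()
    writer.writerows(page)


buffer = io.StringIO()
write_page(buffer, [{"id": "1", "name": "A"}], write_header=True)
write_page(buffer, [{"id": "2", "name": "B"}], write_header=False)
csv_text = buffer.getvalue()
print(csv_text)
```
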
22 changes: 22 additions & 0 deletions docs/_static/basic.css
@@ -237,6 +237,10 @@ a.headerlink {
visibility: hidden;
}

a:visited {
color: #551A8B;
}

h1:hover > a.headerlink,
h2:hover > a.headerlink,
h3:hover > a.headerlink,
@@ -670,6 +674,16 @@ dd {
margin-left: 30px;
}

.sig dd {
margin-top: 0px;
margin-bottom: 0px;
}

.sig dl {
margin-top: 0px;
margin-bottom: 0px;
}

dl > dd:last-child,
dl > dd:last-child > :last-child {
margin-bottom: 0;
@@ -738,6 +752,14 @@ abbr, acronym {
cursor: help;
}

.translated {
background-color: rgba(207, 255, 207, 0.2)
}

.untranslated {
background-color: rgba(255, 207, 207, 0.2)
}

/* -- code displays --------------------------------------------------------- */

pre {
3 changes: 1 addition & 2 deletions docs/_static/documentation_options.js
@@ -1,5 +1,4 @@
var DOCUMENTATION_OPTIONS = {
URL_ROOT: document.getElementById("documentation_options").getAttribute('data-url_root'),
const DOCUMENTATION_OPTIONS = {
VERSION: '',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
7 changes: 5 additions & 2 deletions docs/_static/pygments.css
@@ -22,6 +22,7 @@
.highlight .cs { color: #8f5902; font-style: italic } /* Comment.Special */
.highlight .gd { color: #a40000 } /* Generic.Deleted */
.highlight .ge { color: #000000; font-style: italic } /* Generic.Emph */
.highlight .ges { color: #000000; font-weight: bold; font-style: italic } /* Generic.EmphStrong */
.highlight .gr { color: #ef2929 } /* Generic.Error */
.highlight .gh { color: #000080; font-weight: bold } /* Generic.Heading */
.highlight .gi { color: #00A000 } /* Generic.Inserted */
@@ -101,12 +102,13 @@ body[data-theme="dark"] .highlight .x { color: #d0d0d0 } /* Other */
body[data-theme="dark"] .highlight .p { color: #d0d0d0 } /* Punctuation */
body[data-theme="dark"] .highlight .ch { color: #ababab; font-style: italic } /* Comment.Hashbang */
body[data-theme="dark"] .highlight .cm { color: #ababab; font-style: italic } /* Comment.Multiline */
body[data-theme="dark"] .highlight .cp { color: #cd2828; font-weight: bold } /* Comment.Preproc */
body[data-theme="dark"] .highlight .cp { color: #ff3a3a; font-weight: bold } /* Comment.Preproc */
body[data-theme="dark"] .highlight .cpf { color: #ababab; font-style: italic } /* Comment.PreprocFile */
body[data-theme="dark"] .highlight .c1 { color: #ababab; font-style: italic } /* Comment.Single */
body[data-theme="dark"] .highlight .cs { color: #e50808; font-weight: bold; background-color: #520000 } /* Comment.Special */
body[data-theme="dark"] .highlight .gd { color: #d22323 } /* Generic.Deleted */
body[data-theme="dark"] .highlight .ge { color: #d0d0d0; font-style: italic } /* Generic.Emph */
body[data-theme="dark"] .highlight .ges { color: #d0d0d0; font-weight: bold; font-style: italic } /* Generic.EmphStrong */
body[data-theme="dark"] .highlight .gr { color: #d22323 } /* Generic.Error */
body[data-theme="dark"] .highlight .gh { color: #ffffff; font-weight: bold } /* Generic.Heading */
body[data-theme="dark"] .highlight .gi { color: #589819 } /* Generic.Inserted */
@@ -186,12 +188,13 @@ body:not([data-theme="light"]) .highlight .x { color: #d0d0d0 } /* Other */
body:not([data-theme="light"]) .highlight .p { color: #d0d0d0 } /* Punctuation */
body:not([data-theme="light"]) .highlight .ch { color: #ababab; font-style: italic } /* Comment.Hashbang */
body:not([data-theme="light"]) .highlight .cm { color: #ababab; font-style: italic } /* Comment.Multiline */
body:not([data-theme="light"]) .highlight .cp { color: #cd2828; font-weight: bold } /* Comment.Preproc */
body:not([data-theme="light"]) .highlight .cp { color: #ff3a3a; font-weight: bold } /* Comment.Preproc */
body:not([data-theme="light"]) .highlight .cpf { color: #ababab; font-style: italic } /* Comment.PreprocFile */
body:not([data-theme="light"]) .highlight .c1 { color: #ababab; font-style: italic } /* Comment.Single */
body:not([data-theme="light"]) .highlight .cs { color: #e50808; font-weight: bold; background-color: #520000 } /* Comment.Special */
body:not([data-theme="light"]) .highlight .gd { color: #d22323 } /* Generic.Deleted */
body:not([data-theme="light"]) .highlight .ge { color: #d0d0d0; font-style: italic } /* Generic.Emph */
body:not([data-theme="light"]) .highlight .ges { color: #d0d0d0; font-weight: bold; font-style: italic } /* Generic.EmphStrong */
body:not([data-theme="light"]) .highlight .gr { color: #d22323 } /* Generic.Error */
body:not([data-theme="light"]) .highlight .gh { color: #ffffff; font-weight: bold } /* Generic.Heading */
body:not([data-theme="light"]) .highlight .gi { color: #589819 } /* Generic.Inserted */