Commit

docs: readthedocs (#99)
* docs: Contributor Guidelines
* fix: docstrings format for sphinx build
* docs: add User Guide
* docs: update readme
helen-m-lin authored Sep 19, 2024
1 parent 7febec5 commit 073faa6
Showing 18 changed files with 950 additions and 42 deletions.
13 changes: 13 additions & 0 deletions .readthedocs.yaml
@@ -0,0 +1,13 @@
version: 2

build:
  os: ubuntu-22.04
  tools:
    python: "3.10"

python:
  install:
    - method: pip
      path: .
      extra_requirements:
        - dev
26 changes: 12 additions & 14 deletions README.md
@@ -4,20 +4,18 @@
![Code Style](https://img.shields.io/badge/code%20style-black-black)
[![semantic-release: angular](https://img.shields.io/badge/semantic--release-angular-e10079?logo=semantic-release)](https://github.com/semantic-release/semantic-release)

Script to create a metadata analytics table and write it to a Redshift table.
This script will parse through a list of s3 buckets and document whether the data asset records in each of those buckets do or do not contain `metadata.nd.json`.
Index jobs for AIND metadata in AWS DocumentDB and S3.

AIND metadata for data assets is stored in various places and must be
kept in sync:

## Usage
- Define the environment variables in the `.env.template`
- REDSHIFT_SECRETS_NAME: defining secrets name for Amazon Redshift
- BUCKETS: list of buckets in comma-separated format (ex: "bucket_name1, bucket_name2")
- TABLE_NAME: name of table in redshift
- FOLDERS_FILEPATH: Intended filepath for txt file
- METADATA_DIRECTORY: Intended path for directory containing copies of metadata records
- AWS_DEFAULT_REGION: Default AWS region.
- Records containing a metadata.nd.json file will be copied to `METADATA_DIRECTORY` and compared against the list of all records in `FOLDERS_FILEPATH`
- An analytics table containing columns `s3_prefix`, `bucket_name`, and `metadata_bool` will be written to `TABLE_NAME` in Redshift
1. **S3 buckets** store raw metadata files, including the ``metadata.nd.json``.
2. A **document database (DocDB)** contains unstructured JSON
   documents describing the ``metadata.nd.json`` for a data asset.
3. **Code Ocean**: data assets are mounted as Code Ocean data assets.
   Processed results are also stored in an internal Code Ocean bucket.

## Development
- It's a bit tedious, but the dependencies listed in the `pyproject.toml` file need to be manually updated
We have automated jobs to keep changes in DocDB and S3 in sync.
This repository contains the code for these index jobs.
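
As a rough illustration only (a hypothetical sketch, not the actual job code in this repository; the database, collection, and field names are made up), a sync check between S3 and DocDB might look like:

```python
import json

import boto3
from pymongo import MongoClient

s3 = boto3.client("s3")
# Hypothetical database/collection names, reached e.g. via an ssh tunnel
collection = MongoClient("localhost", 27017)["metadata_index"]["data_assets"]


def is_record_in_sync(bucket: str, prefix: str) -> bool:
    """Check if the S3 copy of metadata.nd.json matches the DocDB record."""
    response = s3.get_object(Bucket=bucket, Key=f"{prefix}/metadata.nd.json")
    s3_record = json.loads(response["Body"].read())
    docdb_record = collection.find_one(
        {"location": f"s3://{bucket}/{prefix}"}, {"_id": 0}
    )
    return docdb_record == s3_record
```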

More information including a user guide and contributor guidelines can be found at [readthedocs](https://aind-data-asset-indexer.readthedocs.io).
20 changes: 20 additions & 0 deletions docs/Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
35 changes: 35 additions & 0 deletions docs/make.bat
@@ -0,0 +1,35 @@
@ECHO OFF

pushd %~dp0

REM Command file for Sphinx documentation

if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=source
set BUILDDIR=build

%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
echo.
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
echo.installed, then set the SPHINXBUILD environment variable to point
echo.to the full path of the 'sphinx-build' executable. Alternatively you
echo.may add the Sphinx directory to PATH.
echo.
echo.If you don't have Sphinx installed, grab it from
echo.https://www.sphinx-doc.org/
exit /b 1
)

if "%1" == "" goto help

%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end

:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%

:end
popd
273 changes: 273 additions & 0 deletions docs/source/Contributing.rst
@@ -0,0 +1,273 @@
Contributor Guidelines
======================

This document will go through best practices for contributing to this
project. We welcome and appreciate contributions or ideas for
improvement.

- `Bug Reports and Feature
Requests <#bug-reports-and-feature-requests>`__
- `Local Installation for
Development <#local-installation-for-development>`__
- `Branches and Pull Requests <#branches-and-pull-requests>`__
- `Release Cycles <#release-cycles>`__

Bug Reports and Feature Requests
--------------------------------

Before creating a pull request, we ask contributors to please open a bug
report or feature request first:
`issues <https://github.com/AllenNeuralDynamics/aind-data-asset-indexer/issues/new/choose>`__

We will do our best to monitor and maintain the backlog of issues.

Local Installation for Development
----------------------------------

For development:

- For new features or non-urgent bug fixes, create a branch off of
``dev``
- For an urgent hotfix to our production environment, create a branch
off of ``main``

Consult the `Branches and Pull Requests <#branches-and-pull-requests>`__
and `Release Cycles <#release-cycles>`__ sections for more details.

From the root directory, run:

.. code:: bash

   pip install -e .[dev]

to install the relevant code for development.

.. _running-indexer-jobs-locally:

Running indexer jobs locally
~~~~~~~~~~~~~~~~~~~~~~~~~~~~


The jobs are intended to be run as scheduled AWS ECS tasks in the same VPC
as the DocDB instance. The job settings are stored in AWS Parameter Store.
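
For reference, the stored settings can be inspected with ``boto3`` (a
minimal sketch; the parameter name below is a hypothetical placeholder, and
AWS credentials must already be configured):

.. code:: python

   import boto3

   ssm = boto3.client("ssm")
   # "/aind/indexer/settings" is a hypothetical parameter name
   response = ssm.get_parameter(Name="/aind/indexer/settings", WithDecryption=True)
   print(response["Parameter"]["Value"])  # JSON string of job settings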

If you wish to run the jobs locally, follow these steps:

1. In a new terminal, start an ssh session. Credentials can be found in AWS
   Secrets Manager.

   .. code:: bash

      ssh -L 27017:{docdb_host}:27017 {ssh_username}@{ssh_host} -N -v
2. For the ``IndexAindBucketsJob``, you will need to set the ``INDEXER_PARAM_NAME``.
   Then, run the following:

   .. code:: python

      from aind_data_asset_indexer.index_aind_buckets import IndexAindBucketsJob
      from aind_data_asset_indexer.models import AindIndexBucketsJobSettings

      if __name__ == "__main__":
          main_job_settings = AindIndexBucketsJobSettings.from_param_store(param_store_name=INDEXER_PARAM_NAME)
          main_job_settings.doc_db_host = "localhost"
          main_job = IndexAindBucketsJob(job_settings=main_job_settings)
          main_job.run_job()
3. For the ``CodeOceanIndexBucketJob``, you will need to set the ``CO_INDEXER_PARAM_NAME``
   and ``DEVELOPER_CODEOCEAN_ENDPOINT``. Then, run the following:

   .. code:: python

      from aind_data_asset_indexer.models import CodeOceanIndexBucketJobSettings
      from aind_data_asset_indexer.codeocean_bucket_indexer import CodeOceanIndexBucketJob

      if __name__ == "__main__":
          main_job_settings = CodeOceanIndexBucketJobSettings.from_param_store(param_store_name=CO_INDEXER_PARAM_NAME)
          main_job_settings.doc_db_host = "localhost"
          main_job_settings.temp_codeocean_endpoint = DEVELOPER_CODEOCEAN_ENDPOINT
          main_job = CodeOceanIndexBucketJob(job_settings=main_job_settings)
          main_job.run_job()
4. Close the ssh session when you are done.
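
As an optional sanity check before running a job, you can verify that the
tunnel from step 1 is working (a minimal sketch, assuming ``pymongo`` is
installed; the exact TLS and credential options depend on the cluster
configuration):

.. code:: python

   from pymongo import MongoClient

   # Connect through the port forwarded in step 1, using the DocDB
   # credentials from AWS Secrets Manager.
   client = MongoClient(
       host="localhost",
       port=27017,
       username="{docdb_username}",
       password="{docdb_password}",
       retryWrites=False,
   )
   print(client.server_info()["version"])  # raises if the connection fails
   client.close()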


Branches and Pull Requests
--------------------------

Branch naming conventions
~~~~~~~~~~~~~~~~~~~~~~~~~

Name your branch using the following format:
``<type>-<issue_number>-<short_summary>``

where:

- ``<type>`` is one of:

- **build**: Changes that affect the build system
or external dependencies (e.g., pyproject.toml, setup.py)
- **ci**: Changes to our CI configuration files and scripts
(examples: .github/workflows/ci.yml)
- **docs**: Changes to our documentation
- **feat**: A new feature
- **fix**: A bug fix
- **perf**: A code change that improves performance
- **refactor**: A code change that neither fixes a bug nor adds
a feature, but will make the codebase easier to maintain
- **test**: Adding missing tests or correcting existing tests
- **hotfix**: An urgent bug fix to our production code
- ``<issue_number>`` references the GitHub issue this branch will close
- ``<short_summary>`` is a brief description that shouldn’t be more than 3
words.

Some examples:

- ``feat-12-adds-email-field``
- ``fix-27-corrects-endpoint``
- ``test-43-updates-server-test``
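
For example, to start a feature branch off of ``dev`` (a minimal sketch,
assuming a clean local checkout):

.. code:: bash

   git checkout dev
   git pull
   git checkout -b feat-12-adds-email-field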

We ask that a separate issue and branch are created if code is added
outside the scope of the reference issue.

Commit messages
~~~~~~~~~~~~~~~

Please format your commit messages as ``<type>: <short summary>`` where
``<type>`` is from the list above and the short summary is one or two
sentences.
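
For example, a commit on the ``fix-27-corrects-endpoint`` branch from the
examples above might use the hypothetical message
``fix: Corrects the endpoint used by the server tests``.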

Testing and docstrings
~~~~~~~~~~~~~~~~~~~~~~

We strive for complete code coverage and docstrings, and we also run
code format checks.

- To run the code format check:

  .. code:: bash

     flake8 .

- There are some helpful libraries that will automatically format the
  code and import statements:

  .. code:: bash

     black .

  and

  .. code:: bash

     isort .

  Strings that exceed the maximum line length may still need to be
  formatted manually.

- To run the docstring coverage check and report:

  .. code:: bash

     interrogate -v .

  This project uses NumPy’s docstring format: `Numpy docstring
  standards <https://numpydoc.readthedocs.io/en/latest/format.html>`__.

  Many IDEs can be configured to automatically format docstrings in the
  NumPy convention.

- To run the unit test coverage check and report:

  .. code:: bash

     coverage run -m unittest discover && coverage report

- To view a more detailed html version of the report, run:

  .. code:: bash

     coverage run -m unittest discover && coverage report
     coverage html

  and then open ``htmlcov/index.html`` in a browser.

Pull requests
~~~~~~~~~~~~~

Pull requests and reviews are required before merging code into this
project. You may open a ``Draft`` pull request and ask for a preliminary
review on code that is currently a work-in-progress.

Before requesting a review on a finalized pull request, please verify
that the automated checks have passed first.

Release Cycles
--------------

For this project, we have adopted the `Git
Flow <https://www.gitkraken.com/learn/git/git-flow>`__ system. We will
strive to release new features and bug fixes on a two-week cycle. The
rough workflow is:

Hotfixes
~~~~~~~~

- A ``hotfix`` branch is created off of ``main``
- A Pull Request into ``main`` is opened, reviewed, and merged
- A new ``tag`` with a patch bump is created, and a new ``release`` is
deployed
- The ``main`` branch is merged into all other branches

Feature branches and bug fixes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- A new branch is created off of ``dev``
- A Pull Request into ``dev`` is opened, reviewed, and merged

Release branch
~~~~~~~~~~~~~~

- A new branch ``release-v{new_tag}`` is created
- Documentation updates and bug fixes are created off of the
``release-v{new_tag}`` branch.
- Commits added to the ``release-v{new_tag}`` branch are also merged
  into ``dev``
- Once ready for release, a Pull Request from ``release-v{new_tag}``
into ``main`` is opened for final review
- A new tag will automatically be generated
- Once merged, a new GitHub Release is created manually

Pre-release checklist
~~~~~~~~~~~~~~~~~~~~~

- ☐ Increment ``__version__`` in the
  ``aind_data_asset_indexer/__init__.py`` file
- ☐ Run linters, unit tests, and integration tests
- ☐ Verify code is deployed and tested in test environment
- ☐ Update examples
- ☐ Update documentation

  - Run:

    .. code:: bash

       sphinx-apidoc -o docs/source/ src
       sphinx-build -b html docs/source/ docs/build/html
- ☐ Update and build UML diagrams

  - To build UML diagrams locally using a docker container:

    .. code:: bash

       docker pull plantuml/plantuml-server
       docker run -d -p 8080:8080 plantuml/plantuml-server:jetty

Post-release checklist
~~~~~~~~~~~~~~~~~~~~~~

- ☐ Merge ``main`` into ``dev`` and feature branches
- ☐ Edit release notes if needed
- ☐ Post announcement