
Data curation API

The building blocks of our data curation API.

Quick Tour

Running/stopping the service

You can run the API containers using this command:

make run

You can now navigate to http://localhost:8080/docs to explore the interactive documentation and try out the API endpoints. You can also interact with the API directly through HTTP requests.
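If you script against the service, it can help to wait until the containers are up before sending requests. Below is a minimal standard-library sketch that polls the http://localhost:8080/docs page mentioned above. The helper names, retry count, delay and timeout are illustrative assumptions, not values used by the project.

```python
import time
import urllib.error
import urllib.request


def is_service_up(url: str = "http://localhost:8080/docs", timeout: float = 2.0) -> bool:
    """Return True if the URL answers with an HTTP 2xx status, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout or HTTP error status: treat as "not up yet"
        return False


def wait_for_service(
    url: str = "http://localhost:8080/docs",
    retries: int = 10,
    delay: float = 1.0,
    timeout: float = 2.0,
) -> bool:
    """Poll the URL until it responds or the retries are exhausted."""
    for _ in range(retries):
        if is_service_up(url, timeout=timeout):
            return True
        time.sleep(delay)
    return False
```

After `make run`, calling `wait_for_service()` returns `True` once the docs page responds, and `False` if it never does within the retry budget.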

In order to stop the service, run:

make stop

How is the database organized

The back end's core feature is managing the metadata tables. To support data curation, several tables (object types) are introduced, as described below:

Access-related tables

  • Accesses: stores the hashed credentials and access level for users & devices.
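To illustrate what storing hashed credentials involves, here is a minimal standard-library sketch. It derives a salted PBKDF2 hash and verifies a login attempt in constant time. The hashing scheme, the iteration count, the `Access` fields and the `AccessLevel` values are all hypothetical assumptions. Check the API source for the scheme and access levels it actually uses.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass
from enum import Enum


class AccessLevel(str, Enum):
    # Hypothetical access levels; the real ones are defined by the API
    ADMIN = "admin"
    USER = "user"
    DEVICE = "device"


@dataclass
class Access:
    """Hypothetical row of the Accesses table: login, salt, hash, access level."""
    login: str
    salt: bytes
    hashed_password: bytes
    scope: AccessLevel


ITERATIONS = 200_000  # assumption: work factor chosen for illustration only


def hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)


def create_access(login: str, password: str, scope: AccessLevel) -> Access:
    salt = os.urandom(16)  # a fresh random salt per credential
    return Access(login, salt, hash_password(password, salt), scope)


def verify(access: Access, password: str) -> bool:
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(access.hashed_password, hash_password(password, access.salt))
```

Only the salt and the derived hash are stored, so the plain-text password never reaches the table.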

Core data curation workflow tables

  • Media: metadata of a picture and its storage bucket key.
  • Annotations: metadata of an annotation file and its storage bucket key.

[UML diagram of the database tables]
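The Media and Annotations tables can be pictured as two linked records, each pointing to its content in the storage bucket. The dataclasses below are a minimal sketch of that relationship. The field names (`id`, `bucket_key`, `created_at`, `media_id`) are hypothetical and do not necessarily match the actual database schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


def utcnow() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class Media:
    """Hypothetical Media row: picture metadata plus its storage bucket key."""
    id: int
    bucket_key: Optional[str] = None  # set once the image content is uploaded
    created_at: datetime = field(default_factory=utcnow)


@dataclass
class Annotation:
    """Hypothetical Annotations row, linked to the picture it describes."""
    id: int
    media_id: int  # assumed reference to Media.id
    bucket_key: Optional[str] = None  # set once the annotation file is uploaded
    created_at: datetime = field(default_factory=utcnow)


def annotations_for(media: Media, annotations: List[Annotation]) -> List[Annotation]:
    """Return the annotations attached to a given picture."""
    return [a for a in annotations if a.media_id == media.id]
```

Keeping only the bucket key in the table keeps the metadata small, while the picture and annotation contents live in object storage.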

What is the full data curation workflow through the API

The API has been designed to provide, for each data entry:

  • a timestamp
  • the picture that was uploaded
  • the annotation file associated with that picture

Using the tables described above, uploading a data entry takes the following steps:

  • Prerequisite (ask the instance administrator): register a user.
  • Create a media object & upload content: save the picture metadata and upload the image content.
  • Create an annotation object & upload content: save the annotation metadata and upload the annotation content.
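The steps above can be simulated locally to see how the pieces fit together. The class below is a hypothetical in-memory stand-in, not the API or its Python client. Plain dicts play the role of the metadata tables and the storage bucket, and the method names and bucket key scheme are assumptions. It does keep the order of operations from the list: register a user, then create each object before uploading its content.

```python
from datetime import datetime, timezone
from typing import Dict


class InMemoryCurationAPI:
    """Hypothetical in-memory stand-in: metadata tables and the bucket are plain dicts."""

    def __init__(self) -> None:
        self.users: Dict[str, str] = {}
        self.media: Dict[int, dict] = {}
        self.annotations: Dict[int, dict] = {}
        self.bucket: Dict[str, bytes] = {}  # storage bucket: key -> content

    def register_user(self, login: str, scope: str = "user") -> None:
        """Prerequisite, performed by the instance administrator."""
        self.users[login] = scope

    def _require_user(self, login: str) -> None:
        if login not in self.users:
            raise PermissionError(f"unknown user: {login}")

    def create_media(self, login: str) -> int:
        """Save the picture metadata; the content is uploaded separately."""
        self._require_user(login)
        media_id = len(self.media) + 1
        self.media[media_id] = {"created_at": datetime.now(timezone.utc), "bucket_key": None}
        return media_id

    def upload_media(self, login: str, media_id: int, content: bytes) -> str:
        """Store the image content in the bucket and record its key."""
        self._require_user(login)
        row = self.media[media_id]  # raises KeyError for an unknown media
        key = f"media/{media_id}"  # assumed key scheme
        self.bucket[key] = content
        row["bucket_key"] = key
        return key

    def create_annotation(self, login: str, media_id: int) -> int:
        """Save the annotation metadata, linked to an existing picture."""
        self._require_user(login)
        if media_id not in self.media:
            raise KeyError(f"unknown media: {media_id}")
        annotation_id = len(self.annotations) + 1
        self.annotations[annotation_id] = {
            "media_id": media_id,
            "created_at": datetime.now(timezone.utc),
            "bucket_key": None,
        }
        return annotation_id

    def upload_annotation(self, login: str, annotation_id: int, content: bytes) -> str:
        """Store the annotation file in the bucket and record its key."""
        self._require_user(login)
        row = self.annotations[annotation_id]  # raises KeyError for an unknown annotation
        key = f"annotations/{annotation_id}"  # assumed key scheme
        self.bucket[key] = content
        row["bucket_key"] = key
        return key
```

A typical session calls `register_user`, then `create_media` and `upload_media`, then `create_annotation` and `upload_annotation` with the returned media ID. Requests from unregistered users are rejected.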

Installation

Prerequisites

The project was designed so that everything runs with Docker orchestration (standalone virtual environment), so apart from Docker itself you won't need to install any additional libraries.

Configuration

To run the project, you will need to specify some information, which can be done using a .env file. This file must hold the following information:

  • S3_ACCESS_KEY: the access key ID used to authenticate with the S3 storage service
  • S3_SECRET_KEY: the secret key paired with that access key
  • S3_REGION: the region where your S3 bucket is located
  • S3_ENDPOINT_URL: the S3 endpoint URL provided by your cloud provider
  • BUCKET_NAME: the name of the storage bucket

Optionally, the following information can be added:

  • SENTRY_DSN: the DSN (Data Source Name) of the Sentry project, which monitors back-end errors and reports them.
  • SERVER_NAME: the server name tag attached to Sentry events.
  • CORS_ORIGIN: comma-separated list of origins allowed to make cross-origin (CORS) requests.
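The variables above split into a mandatory set and an optional set. Here is a minimal sketch of how a Python script could validate them and split `CORS_ORIGIN` into a list. The `Settings` class and `load_settings` function are hypothetical, and treating an unset `CORS_ORIGIN` as an empty list is an assumption of this sketch, not the API's documented behavior.

```python
import os
from dataclasses import dataclass, field
from typing import List, Mapping, Optional

# Variables listed as mandatory above
REQUIRED_VARS = ("S3_ACCESS_KEY", "S3_SECRET_KEY", "S3_REGION", "S3_ENDPOINT_URL", "BUCKET_NAME")


@dataclass
class Settings:
    s3_access_key: str
    s3_secret_key: str
    s3_region: str
    s3_endpoint_url: str
    bucket_name: str
    sentry_dsn: Optional[str] = None
    server_name: Optional[str] = None
    cors_origin: List[str] = field(default_factory=list)


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError(f"missing required variables: {', '.join(missing)}")
    # CORS_ORIGIN is comma-separated; surrounding spaces and empty items are dropped
    origins = [o.strip() for o in env.get("CORS_ORIGIN", "").split(",") if o.strip()]
    return Settings(
        s3_access_key=env["S3_ACCESS_KEY"],
        s3_secret_key=env["S3_SECRET_KEY"],
        s3_region=env["S3_REGION"],
        s3_endpoint_url=env["S3_ENDPOINT_URL"],
        bucket_name=env["BUCKET_NAME"],
        sentry_dsn=env.get("SENTRY_DSN") or None,
        server_name=env.get("SERVER_NAME") or None,
        cors_origin=origins,
    )
```

Failing fast on missing mandatory variables surfaces configuration mistakes at startup rather than at the first S3 call.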

Your .env file should look similar to:

S3_ACCESS_KEY=YOUR_ACCESS_KEY
S3_SECRET_KEY=YOUR_SECRET_KEY
S3_REGION=bucket-region
S3_ENDPOINT_URL='https://s3.mydomain.com/'
BUCKET_NAME=my_storage_bucket_name
SENTRY_DSN='https://replace.with.your.sentry.dsn/'
SERVER_NAME=my_server_name

The file should be placed at the root folder of your local copy of the project.
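To sanity-check the file from a Python script, here is a minimal parser for the simple `KEY=VALUE` lines shown above. It skips blank lines and `#` comments and strips one pair of surrounding quotes. It is a hypothetical helper that does not cover full dotenv syntax such as multi-line values or variable expansion. The python-dotenv package (`dotenv_values`) is a more complete option.

```python
from pathlib import Path
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.strip()
        # Drop one pair of matching surrounding quotes, e.g. 'https://...'
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def read_env_file(path: Path = Path(".env")) -> Dict[str, str]:
    """Read and parse the .env file at the project root (the default path)."""
    return parse_env(path.read_text(encoding="utf-8"))
```

Splitting on the first `=` only keeps values such as URLs with query strings intact.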

More goodies

Documentation

The full package documentation is available here for detailed specifications.

Python client

This project is a REST API, and you can interact with the service through HTTP requests. To ease integration into a Python project, however, take a look at our Python client.

Contributing

Any sort of contribution is greatly appreciated!

You can find a short guide in CONTRIBUTING to help grow this project!

License

Distributed under the Apache 2.0 License. See LICENSE for more information.