Clarifai Python Data Utils

This is a collection of utilities for handling various types of multimedia data. The utilities integrate with the Clarifai Python SDK, so you can prepare, convert, and upload data for both visual and textual AI use cases.

Installation

Install from PyPI:

pip install clarifai-datautils
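After installing from PyPI, it can be useful to confirm that the package is visible to the interpreter you intend to use. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and is not part of clarifai-datautils.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "clarifai-datautils" is the distribution name used in the pip command above.
    version = installed_version("clarifai-datautils")
    if version is None:
        print("clarifai-datautils is not installed in this environment")
    else:
        print(f"clarifai-datautils {version} is installed")
```

If the check reports that the package is missing, the usual cause is that pip installed into a different environment than the one running the script.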

Install from Source:

git clone https://github.com/Clarifai/clarifai-python-datautils
cd clarifai-python-datautils
python3 -m venv env
source env/bin/activate
pip3 install -r requirements.txt

Getting started

Quick intro to the Image Annotation Conversion feature:

from clarifai_datautils.image import ImageAnnotations

annotated_dataset = ImageAnnotations.import_from(path='folder_path', format='annotation_format')

Features

Image Utils

Annotation Loader
- Load various annotated image datasets and export to the Clarifai Platform
- Convert from one annotation format to other supported annotation formats
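To give a sense of what converting between annotation formats involves, here is a small illustrative sketch of one step: translating a bounding box between the COCO convention (`[x_min, y_min, width, height]`) and the Pascal VOC convention (`xmin, ymin, xmax, ymax`). This is not the library's internal code, and the function names are hypothetical; in practice the conversion is handled for you by `import_from` and `export_to`.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]


def coco_to_voc_bbox(bbox: Sequence[float]) -> Box:
    """Convert a COCO box [x_min, y_min, width, height] to VOC (xmin, ymin, xmax, ymax)."""
    x_min, y_min, width, height = bbox
    return (x_min, y_min, x_min + width, y_min + height)


def voc_to_coco_bbox(bbox: Sequence[float]) -> Box:
    """Convert a VOC box (xmin, ymin, xmax, ymax) to COCO [x_min, y_min, width, height]."""
    x_min, y_min, x_max, y_max = bbox
    if x_max < x_min or y_max < y_min:
        raise ValueError("VOC box must satisfy xmin <= xmax and ymin <= ymax")
    return (x_min, y_min, x_max - x_min, y_max - y_min)


if __name__ == "__main__":
    coco_box = [10, 20, 30, 40]
    voc_box = coco_to_voc_bbox(coco_box)
    print(voc_box)
    print(voc_to_coco_bbox(voc_box))
```

A full format conversion also has to remap label names, image metadata, and file layout, which is why a dedicated loader is preferable to hand-written scripts.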

Data Ingestion Pipeline

Easy-to-use pipelines to load data from files and ingest it into the Clarifai Platform.
Load text files (PDF, DOC, etc.), then transform, chunk, and upload them to the Clarifai Platform.

Usage

Image Annotation Loader

Setup

To use the Image Annotation Loader, install the extra libraries required for annotations.

from clarifai_datautils.image import ImageAnnotations

# Import a COCO detection dataset from a folder
coco_dataset = ImageAnnotations.import_from(path='folder_path', format='coco_detection')

# Upload to the Clarifai Platform using the Clarifai SDK.
# Set your PAT as an environment variable first:
#   export CLARIFAI_PAT={your personal access token}
from clarifai.client.dataset import Dataset
dataset = Dataset(user_id="user_id", app_id="app_id", dataset_id="dataset_id")
dataset.upload_dataset(dataloader=coco_dataset.dataloader)

# Get information about the loaded dataset
coco_dataset.get_info()

# Export to another supported format
coco_dataset.export_to('voc_detection')
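The upload step relies on the `CLARIFAI_PAT` environment variable being set. A short pre-flight check, sketched below with only the standard library, can surface a missing token before any upload is attempted. The helper `require_pat` is hypothetical and not part of the Clarifai SDK.

```python
import os
from typing import Mapping, Optional


def require_pat(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the personal access token from the environment, or raise if it is missing."""
    # Default to the real process environment; a mapping can be passed in for testing.
    env = os.environ if env is None else env
    pat = env.get("CLARIFAI_PAT", "").strip()
    if not pat:
        raise RuntimeError(
            "CLARIFAI_PAT is not set. Export it before uploading, for example: "
            "export CLARIFAI_PAT={your personal access token}"
        )
    return pat


if __name__ == "__main__":
    try:
        require_pat()
        print("CLARIFAI_PAT found; ready to upload")
    except RuntimeError as err:
        print(err)
```

Calling a check like this at the top of an upload script gives a clear error message up front instead of a failure partway through.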

Data Ingestion Pipelines

Setup

To use the Data Ingestion Pipeline, run:

pip install -r requirements-dev.txt

from clarifai_datautils.text import Pipeline, PDFPartition
from clarifai_datautils.text.pipeline.cleaners import Clean_extra_whitespace

# Define the pipeline: partition PDFs into chunks, then clean up extra whitespace
pipeline = Pipeline(
    name='pipeline-1',
    transformations=[
        PDFPartition(chunking_strategy="by_title", max_characters=1024),
        Clean_extra_whitespace(),
    ],
)

# Upload using the Clarifai SDK.
# dataset_url and file_path are placeholders for your dataset URL and input file path.
from clarifai.client import Dataset
dataset = Dataset(dataset_url)
dataset.upload_dataset(pipeline.run(files=file_path, loader=True))
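To illustrate the idea behind the cleaning and chunking transformations, the sketch below normalizes whitespace and then packs words into chunks no longer than a character limit. It is a simplified stand-in written in plain Python, not the library's implementation, and it does not reproduce the `by_title` strategy; the function names are hypothetical.

```python
import re
from typing import List


def clean_extra_whitespace(text: str) -> str:
    """Collapse runs of whitespace (spaces, tabs, newlines) into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def chunk_by_max_characters(text: str, max_characters: int = 1024) -> List[str]:
    """Greedily pack whole words into chunks of at most max_characters characters."""
    if max_characters <= 0:
        raise ValueError("max_characters must be positive")
    chunks: List[str] = []
    current = ""
    for word in text.split():
        # A single word longer than the limit is split into fixed-size pieces.
        while len(word) > max_characters:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_characters])
            word = word[max_characters:]
        candidate = word if not current else current + " " + word
        if len(candidate) <= max_characters:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    raw = "Section   one\n\n has  some\ttext   that needs cleaning."
    for chunk in chunk_by_max_characters(clean_extra_whitespace(raw), max_characters=20):
        print(repr(chunk))
```

Cleaning before chunking keeps stray whitespace from counting against the character budget of each chunk.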

More Examples

See many more code examples in this repo.
