diff --git a/argilla/docs/assets/images/how_to_guides/distribution/taskdistribution.svg b/argilla/docs/assets/images/how_to_guides/distribution/taskdistribution.svg new file mode 100644 index 0000000000..490571f2cf --- /dev/null +++ b/argilla/docs/assets/images/how_to_guides/distribution/taskdistribution.svg @@ -0,0 +1,33 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/argilla/docs/how_to_guides/annotate.md b/argilla/docs/how_to_guides/annotate.md index 3209965f6e..ba5fdb64e9 100644 --- a/argilla/docs/how_to_guides/annotate.md +++ b/argilla/docs/how_to_guides/annotate.md @@ -72,7 +72,13 @@ If you are starting an annotation effort, all the records are initially kept in - **Pending**: The records without a response. - **Draft**: The records with partial responses. They can be submitted or discarded later. You can’t move them back to the pending queue. - **Discarded**: The records may or may not have responses. They can be edited but you can’t move them back to the pending queue. -- **Submitted**: The records have been fully annotated and have already been submitted. +- **Submitted**: The records have been fully annotated and have already been submitted. You can remove them from this queue and send them to the draft or discarded queues, but never back to the pending queue. + +!!! note + If you are working as part of a team, the number of records in your Pending queue may change as other members of the team submit responses and those records get completed. + +!!! tip + If you are working as part of a team, the records in the draft queue that have been completed by other team members will show a check mark to indicate that there is no need to provide a response. ### Suggestions @@ -115,9 +121,9 @@ The bulk view displays the records in a vertical list. Once this view is active, ### Annotation progress -The global progress of the annotation task from all users is displayed in the dataset list. 
This is indicated in the `Global progress` column, which shows the number of records still to be annotated, along with a progress bar. The progress bar displays the percentage and number of records submitted, conflicting (i.e., those with both submitted and discarded responses), discarded and pending by hovering your mouse over it. +You can track the progress of an annotation task in the progress bar shown in the dataset list and in the progress panel inside the dataset. This bar shows the number of records that have been completed (i.e., those that have the minimum number of submitted responses) and those left to be completed. -You can track your annotation progress in real time from the righ-bottom panel inside the dataset page. This means that, while you are annotating, the progress bar updates as you submit or discard a record. Expanding the panel, the distribution of `Pending`, `Draft`, `Submitted` and `Discarded` responses is displayed in a donut chart. +You can also track your own progress in real time by expanding the bottom-right panel inside the dataset page. There you can see the number of records for which you have `Pending`, `Draft`, `Submitted` and `Discarded` responses. ## Use search, filters, and sort @@ -173,16 +179,4 @@ You can sort your records according to one or several attributes. The insertion time and last update are general to all records. -The suggestion scores, response, and suggestion values for rating questions and metadata properties are available only when they were provided. - -## Annotate in teams - -!!! note - Argilla 2.1 will come with automatic task distribution, which will allow you to distribute the work across several users more efficiently. - -### Edit guidelines in the settings - -As an `owner` or `admin`, you can edit the guidelines as much as you need from the icon settings on the header. Markdown format is enabled. - -!!! 
tip - If you want further guidance on good practices for guidelines during the project development, check this [blog post](https://argilla.io/blog/annotation-guidelines-practices/). +The suggestion scores, response, and suggestion values for rating questions and metadata properties are available only when they were provided. \ No newline at end of file diff --git a/argilla/docs/how_to_guides/dataset.md b/argilla/docs/how_to_guides/dataset.md index 5064d8f131..fd195fae83 100644 --- a/argilla/docs/how_to_guides/dataset.md +++ b/argilla/docs/how_to_guides/dataset.md @@ -42,6 +42,7 @@ A **dataset** is a collection of records that you can configure for labelers to vectors=[rg.VectorField(name="vector", dimensions=10)], guidelines="guidelines", allow_extra_metadata=True, + distribution=rg.TaskDistribution(min_submitted=2) ) ``` @@ -96,6 +97,7 @@ settings = rg.Settings( guidelines="Select the sentiment of the prompt.", fields=[rg.TextField(name="prompt", use_markdown=True)], questions=[rg.LabelQuestion(name="sentiment", labels=["positive", "negative"])], + distribution=rg.TaskDistribution(min_submitted=3) ) dataset1 = rg.Dataset(name="sentiment_analysis_1", settings=settings) @@ -395,6 +397,19 @@ It is good practice to use at least the dataset guidelines if not both methods. !!! tip If you want further guidance on good practices for guidelines during the project development, check our [blog post](https://argilla.io/blog/annotation-guidelines-practices/). +### Distribution + +When working as a team, you may want to distribute the annotation task to ensure efficiency and quality. You can use the `TaskDistribution` settings to configure the minimum number of submitted responses expected for each record. Argilla will use this setting to automatically handle records in your team members' pending queues. + +```python +rg.TaskDistribution( + min_submitted=2 +) +``` + +> To learn more about how to distribute the task among team members, check the [Distribute the annotation guide](../how_to_guides/distribution.md). 
+ + ## List datasets You can list all the datasets available in a workspace using the `datasets` attribute of the `Workspace` class. You can also use `len(workspace.datasets)` to get the number of datasets in a workspace. diff --git a/argilla/docs/how_to_guides/distribution.md b/argilla/docs/how_to_guides/distribution.md new file mode 100644 index 0000000000..94332df052 --- /dev/null +++ b/argilla/docs/how_to_guides/distribution.md @@ -0,0 +1,76 @@ +--- +description: In this section, we will provide a step-by-step guide to show how to distribute the annotation task among team members. +--- + +# Distribute the annotation task among the team + +This guide explains how you can use Argilla’s **automatic task distribution** to efficiently divide the task of annotating a dataset among multiple team members. + +Owners and admins can define the minimum number of submitted responses expected for each record, depending on whether the dataset should have annotation overlap and, if so, how much. Argilla will use this setting to automatically handle the records that will be shown in the pending queues of all users with access to the dataset. + +When a record has met the minimum number of submissions, the status of the record will change to `completed` and the record will be removed from the `Pending` queue of all team members, so they can focus on providing responses where they are most needed. The dataset’s annotation task will be fully completed once all records have the `completed` status. + +![Task Distribution diagram](../assets/images/how_to_guides/distribution/taskdistribution.svg) + +!!! note + The status of a record can be either `completed`, when it has the required number of responses with `submitted` status, or `pending`, when it doesn’t meet this requirement. + + Each record can have multiple responses and each of those can have the status `submitted`, `discarded` or `draft`. + +!!! 
info "Main Class" + + ```python + rg.TaskDistribution( + min_submitted=2 + ) + ``` + > Check the [Task Distribution - Python Reference](../reference/argilla/settings/task_distribution.md) to see the attributes, arguments, and methods of the `TaskDistribution` class in detail. + +## Configure task distribution settings + +By default, Argilla will set the required minimum submitted responses to 1. This means that whenever a record has at least 1 response with the status `submitted`, the status of the record will be `completed` and the record will be removed from the `Pending` queue of other team members. + +!!! tip + Leave the default value of minimum submissions (1) if you are working on your own or when you don't require more than one submitted response per record. + +If you wish to set a different number, you can do so through the `distribution` setting in your dataset settings: + +```python +settings = rg.Settings( + guidelines="These are some guidelines.", + fields=[ + rg.TextField( + name="text", + ), + ], + questions=[ + rg.LabelQuestion( + name="label", + labels=["label_1", "label_2", "label_3"] + ), + ], + distribution=rg.TaskDistribution(min_submitted=3) +) +``` + +> Learn more about configuring dataset settings in the [Dataset management guide](../how_to_guides/dataset.md). + +!!! tip + Increase the number of minimum submissions if you’d like to ensure you get more than one submitted response per record. Make sure that this number is never higher than the number of members in your team. Note that the lower this number is, the faster the task will be completed. + +!!! note + Some records may have more responses than expected if multiple team members submit responses on the same record simultaneously. + +## Change task distribution settings + +If you wish to change the minimum submitted responses required in a dataset, you can do so as long as the annotation hasn’t started, i.e. the dataset has no responses for any records. 
+ +Admins and owners can change this value from the dataset settings page in the UI or from the SDK: + +```python +dataset = client.datasets(...) + +dataset.settings.distribution.min_submitted = 4 + +dataset.update() +``` \ No newline at end of file diff --git a/argilla/docs/how_to_guides/index.md b/argilla/docs/how_to_guides/index.md index 189b144f36..25cd4245f6 100644 --- a/argilla/docs/how_to_guides/index.md +++ b/argilla/docs/how_to_guides/index.md @@ -59,6 +59,22 @@ These guides provide step-by-step instructions for common scenarios, including d [:octicons-arrow-right-24: How-to guide](import_export.md) +- __Annotate a dataset__ + + --- + + Learn how to use the Argilla UI to navigate datasets and submit responses. + + [:octicons-arrow-right-24: How-to guide](annotate.md) + +- __Distribute the annotation__ + + --- + + Learn how to use Argilla's automatic task distribution to annotate as a team efficiently. + + [:octicons-arrow-right-24: How-to guide](distribution.md) + ## Advanced diff --git a/argilla/docs/how_to_guides/query.md b/argilla/docs/how_to_guides/query.md index 9704677c72..642cb915a6 100644 --- a/argilla/docs/how_to_guides/query.md +++ b/argilla/docs/how_to_guides/query.md @@ -122,7 +122,7 @@ You can use the `Filter` class to define the conditions and pass them to the `Da ## Filter by status -You can filter records based on their status. The status can be `pending`, `draft`, `submitted`, or `discarded`. +You can filter records based on record or response status. Record status can be `pending` or `completed` and response status can be `pending`, `draft`, `submitted`, or `discarded`. 
```python import argilla as rg @@ -134,7 +134,12 @@ workspace = client.workspaces("my_workspace") dataset = client.datasets(name="my_dataset", workspace=workspace) status_filter = rg.Query( - filter=rg.Filter(("response.status", "==", "submitted")) + filter=rg.Filter( + [ + ("status", "==", "completed"), + ("response.status", "==", "discarded") + ] + ) ) filtered_records = list(dataset.records(status_filter)) diff --git a/argilla/docs/reference/argilla/SUMMARY.md b/argilla/docs/reference/argilla/SUMMARY.md index a36c2d8177..cfe33198e5 100644 --- a/argilla/docs/reference/argilla/SUMMARY.md +++ b/argilla/docs/reference/argilla/SUMMARY.md @@ -8,6 +8,7 @@ * [Questions](settings/questions.md) * [Metadata](settings/metadata_property.md) * [Vectors](settings/vectors.md) + * [Distribution](settings/task_distribution.md) * [rg.Record](records/records.md) * [rg.Response](records/responses.md) * [rg.Suggestion](records/suggestions.md) diff --git a/argilla/docs/reference/argilla/settings/settings.md b/argilla/docs/reference/argilla/settings/settings.md index 43d71154b8..96d2a6b4f0 100644 --- a/argilla/docs/reference/argilla/settings/settings.md +++ b/argilla/docs/reference/argilla/settings/settings.md @@ -30,7 +30,7 @@ dataset.create() ``` -> To define the settings for fields, questions, metadata, or vectors, refer to the [`rg.TextField`](fields.md), [`rg.LabelQuestion`](questions.md), [`rg.TermsMetadataProperty`](metadata_property.md), and [`rg.VectorField`](vectors.md) class documentation. +> To define the settings for fields, questions, metadata, vectors, or distribution, refer to the [`rg.TextField`](fields.md), [`rg.LabelQuestion`](questions.md), [`rg.TermsMetadataProperty`](metadata_property.md), [`rg.VectorField`](vectors.md), and [`rg.TaskDistribution`](task_distribution.md) class documentation. 
--- diff --git a/argilla/docs/reference/argilla/settings/task_distribution.md b/argilla/docs/reference/argilla/settings/task_distribution.md new file mode 100644 index 0000000000..48e2fe2070 --- /dev/null +++ b/argilla/docs/reference/argilla/settings/task_distribution.md @@ -0,0 +1,42 @@ +--- +hide: footer +--- +# Distribution + +Distribution settings are used to define the criteria used by the tool to automatically manage records in the dataset depending on the expected number of submitted responses per record. + +## Usage Examples + +The default minimum submitted responses per record is 1. If you wish to increase this value, you can define it through the `TaskDistribution` class and pass it to the `Settings` class. + +```python +settings = rg.Settings( + guidelines="These are some guidelines.", + fields=[ + rg.TextField( + name="text", + ), + ], + questions=[ + rg.LabelQuestion( + name="label", + labels=["label_1", "label_2", "label_3"] + ), + ], + distribution=rg.TaskDistribution(min_submitted=3) +) + +dataset = rg.Dataset( + name="my_dataset", + settings=settings +) +``` + +--- + +## `rg.TaskDistribution` + +::: src.argilla.settings._task_distribution.OverlapTaskDistribution + options: + heading_level: 3 + show_root_toc_entry: false \ No newline at end of file diff --git a/argilla/mkdocs.yml b/argilla/mkdocs.yml index c9a25aa8ea..fa7b9ed59e 100644 --- a/argilla/mkdocs.yml +++ b/argilla/mkdocs.yml @@ -142,7 +142,7 @@ nav: - Query and filter records: how_to_guides/query.md - Importing and exporting datasets: how_to_guides/import_export.md - Annotate a dataset: how_to_guides/annotate.md - - Migrate your legacy datasets to Argilla V2: how_to_guides/migrate_from_legacy_datasets.md + - Distribute the annotation task: how_to_guides/distribution.md - Advanced: - Use Markdown to format rich content: how_to_guides/use_markdown_to_format_rich_content.md - Migrate your legacy datasets to Argilla V2: how_to_guides/migrate_from_legacy_datasets.md diff --git 
a/argilla/src/argilla/settings/_task_distribution.py b/argilla/src/argilla/settings/_task_distribution.py index 593df1c681..5b6c44a2b7 100644 --- a/argilla/src/argilla/settings/_task_distribution.py +++ b/argilla/src/argilla/settings/_task_distribution.py @@ -21,8 +21,7 @@ class OverlapTaskDistribution: """The task distribution settings class. - This task distribution defines a number of submitted record required to complete a record. - We could support multiple task distribution strategies in the future + This task distribution defines a number of submitted responses required to complete a record. Args: min_submitted (int): The number of min. submitted responses to complete the record
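
Review note: the completion rule stated in this docstring and in the distribution guide (a record becomes `completed` once it has at least `min_submitted` responses with status `submitted`, and stays `pending` otherwise) can be illustrated with a small standalone sketch. This is not Argilla's implementation and does not call the SDK; `OverlapRule` and `record_status` are hypothetical names introduced only for illustration, and the default of 1 mirrors the default described in the guide.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OverlapRule:
    """Hypothetical stand-in for the overlap setting (not Argilla's class)."""

    # Minimum number of submitted responses needed to complete a record.
    min_submitted: int = 1


def record_status(response_statuses: List[str], rule: OverlapRule) -> str:
    """Return 'completed' once enough responses are 'submitted', else 'pending'.

    Draft and discarded responses do not count towards completion.
    """
    submitted = sum(1 for status in response_statuses if status == "submitted")
    return "completed" if submitted >= rule.min_submitted else "pending"


if __name__ == "__main__":
    rule = OverlapRule(min_submitted=2)
    print(record_status(["submitted", "draft"], rule))
    print(record_status(["submitted", "discarded", "submitted"], rule))
```

Under this rule a record can end up with more submitted responses than `min_submitted` (for example, when two annotators submit at the same moment) and is still reported as `completed`, which matches the note in the guide about simultaneous submissions.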