[FEATURE] `argilla`: add support to distribution (#5187)
# Description

This PR adds support for configuring the task distribution strategy when creating or updating datasets.

We can create datasets with a specific task distribution setup:

```python
from argilla import Dataset, LabelQuestion, Settings, TaskDistribution, TextField

task_distribution = TaskDistribution(min_submitted=4)

settings = Settings(
    fields=[TextField(name="text", title="text")],
    questions=[LabelQuestion(name="label", title="text", labels=["positive", "negative"])],
    distribution=task_distribution,
)

# `dataset_name` is the name of the dataset to create
dataset = Dataset(dataset_name, settings=settings).create()
```

or update an existing dataset (without any user response):

```python
dataset = client.datasets(...)

dataset.settings.distribution.min_submitted = 100
# or
dataset.distribution.min_submitted = 100
# or
dataset.distribution = TaskDistribution(min_submitted=100)

dataset.update()
```

Closes #5033
Closes #5034
Refs: #5246

**Type of change**

- New feature (non-breaking change which adds functionality)
- Improvement (change adding some improvement to an existing functionality)

**How Has This Been Tested**

**Checklist**

- I added relevant documentation
- I followed the style guidelines of this project
- I did a self-review of my code
- I made corresponding changes to the documentation
- I confirm my changes generate no new warnings
- I have added tests that prove my fix is effective or that my feature works
- I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)

---------

- Co-authored-by: José Francisco Calvo
- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
- Co-authored-by: Damián Pumar
- Co-authored-by: Leire
- Co-authored-by: David Berenstein
- Co-authored-by: Natalia Elvira
- Co-authored-by: Sara Han
1 parent 4237e68, commit e640924

Showing 21 changed files with 391 additions and 49 deletions.
`argilla/docs/assets/images/how_to_guides/distribution/taskdistribution.svg`: 33 changes (33 additions, 0 deletions)
@@ -0,0 +1,76 @@
---
description: In this section, we will provide a step-by-step guide to show how to distribute the annotation task among team members.
---

# Distribute the annotation task among the team

This guide explains how you can use Argilla’s **automatic task distribution** to efficiently divide the task of annotating a dataset among multiple team members.

Owners and admins can define the minimum number of submitted responses expected for each record, depending on whether the dataset should have annotation overlap and how much. Argilla uses this setting to automatically decide which records are shown in the pending queues of all users with access to the dataset.

When a record has met the minimum number of submissions, the status of the record will change to `completed` and the record will be removed from the `Pending` queue of all team members, so they can focus on providing responses where they are most needed. The dataset’s annotation task will be fully completed once all records have the `completed` status.

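As a rough illustration of the queue behaviour described above, here is a minimal, stdlib-only Python sketch. It is not Argilla's implementation: `Record` and `pending_queue` are hypothetical names, each record is reduced to its count of `submitted` responses, and per-user views of the queue are not modelled.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical stand-in for a dataset record: only what this sketch needs.
    id: str
    submitted_count: int = 0


def pending_queue(records: list[Record], min_submitted: int) -> list[Record]:
    """Hypothetical helper: keep only records that still need submissions.

    A record drops out of the shared Pending queue as soon as it reaches
    `min_submitted` responses with the `submitted` status.
    """
    return [r for r in records if r.submitted_count < min_submitted]


records = [Record("a", 0), Record("b", 2), Record("c", 1)]
print([r.id for r in pending_queue(records, min_submitted=2)])  # ['a', 'c']
```

Filtering on a single count keeps the rule easy to see: raising `min_submitted` keeps records in the queue longer.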
![Task Distribution diagram](../assets/images/how_to_guides/distribution/taskdistribution.svg)

!!! note
    The status of a record can be either `completed`, when it has the required number of responses with `submitted` status, or `pending`, when it doesn’t meet this requirement.

    Each record can have multiple responses and each of those can have the status `submitted`, `discarded` or `draft`.

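The completion rule in the note can be written as a small function. This is a hypothetical helper, not part of the Argilla SDK: it assumes a record's responses are available as a list of statuses, and it counts only `submitted` ones against `min_submitted` (defaulting to 1, Argilla's default).

```python
from enum import Enum


class ResponseStatus(str, Enum):
    # The three response statuses named in the note above.
    SUBMITTED = "submitted"
    DISCARDED = "discarded"
    DRAFT = "draft"


def record_status(responses: list[ResponseStatus], min_submitted: int = 1) -> str:
    """Hypothetical helper: derive a record's status from its responses.

    Only `submitted` responses count towards `min_submitted`;
    `discarded` and `draft` responses are ignored.
    """
    submitted = sum(1 for r in responses if r == ResponseStatus.SUBMITTED)
    return "completed" if submitted >= min_submitted else "pending"


print(record_status([ResponseStatus.DRAFT, ResponseStatus.SUBMITTED]))  # completed
print(record_status([ResponseStatus.SUBMITTED, ResponseStatus.DISCARDED], min_submitted=2))  # pending
```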
!!! info "Main Class"

    ```python
    rg.TaskDistribution(
        min_submitted=2
    )
    ```
    > Check the [Task Distribution - Python Reference](../reference/argilla/settings/task_distribution.md) to see the attributes, arguments, and methods of the `TaskDistribution` class in detail.

## Configure task distribution settings

By default, Argilla sets the required minimum number of submitted responses to 1. This means that as soon as a record has at least 1 response with the status `submitted`, the record's status changes to `completed` and it is removed from the `Pending` queue of the other team members.

!!! tip
    Leave the default value of minimum submissions (1) if you are working on your own or when you don't require more than one submitted response per record.

If you wish to set a different number, you can do so through the `distribution` setting in your dataset settings:

```python
import argilla as rg

settings = rg.Settings(
    guidelines="These are some guidelines.",
    fields=[
        rg.TextField(
            name="text",
        ),
    ],
    questions=[
        rg.LabelQuestion(
            name="label",
            labels=["label_1", "label_2", "label_3"]
        ),
    ],
    distribution=rg.TaskDistribution(min_submitted=3)
)
```

> Learn more about configuring dataset settings in the [Dataset management guide](../how_to_guides/dataset.md).

!!! tip
    Increase the minimum number of submissions if you’d like to get more than one submitted response per record. Make sure this number is never higher than the number of members in your team. Note that the lower this number is, the faster the task will be completed.

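The trade-off in the tip comes down to arithmetic: a dataset needs `num_records * min_submitted` submissions in total. The sketch below assumes the work is split evenly across the team and that each member submits at most one response per record, which is why `min_submitted` must not exceed the team size. `annotation_workload` is a hypothetical helper, not an Argilla API.

```python
def annotation_workload(num_records: int, min_submitted: int, team_size: int) -> int:
    """Hypothetical helper: submissions each annotator makes, assuming an even split.

    Assumes each team member submits at most one response per record, so
    records could never reach `completed` if `min_submitted` > `team_size`.
    """
    if min_submitted > team_size:
        raise ValueError(
            f"min_submitted={min_submitted} exceeds team size {team_size}; "
            "records could never reach the `completed` status"
        )
    total_submissions = num_records * min_submitted
    # Integer ceiling division: the last annotator may get slightly fewer.
    return -(-total_submissions // team_size)


print(annotation_workload(1000, min_submitted=1, team_size=4))  # 250
print(annotation_workload(1000, min_submitted=3, team_size=4))  # 750
```

Tripling `min_submitted` triples everyone's share of the work, which is the sense in which a lower value finishes the task faster.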
!!! note
    Some records may end up with more responses than expected if multiple team members submit responses to the same record simultaneously.

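The following sequential simulation shows how this can happen: two annotators load the same pending record before either of them submits, so both end up submitting. It is hypothetical illustration code (`Record` and the queue snapshots are invented for the example), not Argilla's queue logic.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    # Hypothetical record that remembers who submitted a response.
    id: str
    submitted_by: list[str] = field(default_factory=list)


MIN_SUBMITTED = 1
record = Record("r1")

# Both annotators fetch their Pending queue while the record still needs a submission.
alice_queue = [record] if len(record.submitted_by) < MIN_SUBMITTED else []
bob_queue = [record] if len(record.submitted_by) < MIN_SUBMITTED else []

# Each one then submits from the queue snapshot they already loaded.
for user, queue in (("alice", alice_queue), ("bob", bob_queue)):
    for r in queue:
        r.submitted_by.append(user)

print(len(record.submitted_by))  # 2, one more than MIN_SUBMITTED
```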
## Change task distribution settings

If you wish to change the minimum number of submitted responses required in a dataset, you can do so as long as the annotation hasn’t started, i.e., the dataset has no responses for any record.

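The rule above can be pictured as a guard that runs before the setting is changed. This is a minimal sketch under stated assumptions: `DatasetState` and `update_min_submitted` are hypothetical, and the exception types are chosen for the example; how Argilla itself reports a rejected change is not shown here.

```python
from dataclasses import dataclass


@dataclass
class DatasetState:
    # Hypothetical snapshot of a dataset: current setting and number of responses.
    min_submitted: int = 1
    response_count: int = 0


def update_min_submitted(dataset: DatasetState, new_value: int) -> None:
    """Hypothetical guard: allow changes only before any response exists."""
    if new_value < 1:
        raise ValueError("min_submitted must be a positive integer")
    if dataset.response_count > 0:
        raise RuntimeError("annotation has started; distribution settings can no longer change")
    dataset.min_submitted = new_value


fresh = DatasetState()
update_min_submitted(fresh, 4)
print(fresh.min_submitted)  # 4
```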
Admins and owners can change this value from the dataset settings page in the UI or from the SDK:

```python
# Assumes `client` is an `rg.Argilla` client connected to your server
dataset = client.datasets(...)

# Only allowed while the dataset has no responses
dataset.settings.distribution.min_submitted = 4

dataset.update()
```
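The commit's PR description lists `dataset.distribution.min_submitted = ...` and `dataset.distribution = TaskDistribution(...)` as alternatives to the `dataset.settings.distribution` form used here. One way such equivalent forms could work is a property that delegates to the settings object. The sketch below assumes that design and uses hypothetical stand-in classes, not the SDK's own.

```python
from dataclasses import dataclass, field


@dataclass
class TaskDistributionSketch:
    # Hypothetical stand-in for rg.TaskDistribution.
    min_submitted: int = 1


@dataclass
class SettingsSketch:
    # Hypothetical stand-in for rg.Settings, reduced to the distribution field.
    distribution: TaskDistributionSketch = field(default_factory=TaskDistributionSketch)


class DatasetSketch:
    """Hypothetical dataset whose `distribution` property proxies `settings.distribution`."""

    def __init__(self, settings: SettingsSketch) -> None:
        self.settings = settings

    @property
    def distribution(self) -> TaskDistributionSketch:
        # Reading returns the very same object stored in the settings.
        return self.settings.distribution

    @distribution.setter
    def distribution(self, value: TaskDistributionSketch) -> None:
        # Assigning replaces the object stored in the settings.
        self.settings.distribution = value


dataset = DatasetSketch(SettingsSketch())
dataset.settings.distribution.min_submitted = 100  # form 1
dataset.distribution.min_submitted = 100  # form 2, same object as form 1
dataset.distribution = TaskDistributionSketch(min_submitted=100)  # form 3
print(dataset.settings.distribution.min_submitted)  # 100
```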
`argilla/docs/reference/argilla/settings/task_distribution.md`: 42 changes (42 additions, 0 deletions)
@@ -0,0 +1,42 @@
---
hide: footer
---
# Distribution

Distribution settings define the criteria Argilla uses to automatically manage the records in a dataset, based on the expected number of submitted responses per record.

## Usage Examples

The default minimum number of submitted responses per record is 1. If you wish to increase this value, you can define it through the `TaskDistribution` class and pass it to the `Settings` class.

```python
import argilla as rg

settings = rg.Settings(
    guidelines="These are some guidelines.",
    fields=[
        rg.TextField(
            name="text",
        ),
    ],
    questions=[
        rg.LabelQuestion(
            name="label",
            labels=["label_1", "label_2", "label_3"]
        ),
    ],
    distribution=rg.TaskDistribution(min_submitted=3)
)

dataset = rg.Dataset(
    name="my_dataset",
    settings=settings
)
```

---

## `rg.TaskDistribution`

::: src.argilla.settings._task_distribution.OverlapTaskDistribution
    options:
        heading_level: 3
        show_root_toc_entry: false
`argilla/src/argilla/_models/_settings/_task_distribution.py`: 29 changes (29 additions, 0 deletions)
@@ -0,0 +1,29 @@

```python
# Copyright 2024-present, Argilla, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

__all__ = ["TaskDistributionModel", "OverlapTaskDistributionModel"]

from typing import Literal

from pydantic import BaseModel, PositiveInt, ConfigDict


class OverlapTaskDistributionModel(BaseModel):
    strategy: Literal["overlap"]
    min_submitted: PositiveInt

    model_config = ConfigDict(validate_assignment=True)


TaskDistributionModel = OverlapTaskDistributionModel
```
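The pydantic model enforces two constraints: `strategy` must be `"overlap"` and `min_submitted` must be a positive integer, and `validate_assignment=True` re-checks them whenever a field is assigned. That is what makes an assignment such as `dataset.settings.distribution.min_submitted = 0` fail early. The stdlib-only sketch below approximates that behaviour without pydantic. `OverlapTaskDistributionSketch` is a hypothetical name, and the sketch is stricter than pydantic's lax-mode type coercion.

```python
from typing import Literal


class OverlapTaskDistributionSketch:
    """Hypothetical stdlib approximation of OverlapTaskDistributionModel.

    Validates on creation and on every assignment, similar in spirit to
    pydantic's `validate_assignment=True`.
    """

    def __init__(self, strategy: Literal["overlap"], min_submitted: int) -> None:
        # Both assignments go through __setattr__, so they are validated too.
        self.strategy = strategy
        self.min_submitted = min_submitted

    def __setattr__(self, name: str, value: object) -> None:
        if name == "strategy" and value != "overlap":
            raise ValueError(f"unsupported strategy: {value!r}")
        if name == "min_submitted" and (
            not isinstance(value, int) or isinstance(value, bool) or value <= 0
        ):
            raise ValueError(f"min_submitted must be a positive int, got {value!r}")
        super().__setattr__(name, value)


dist = OverlapTaskDistributionSketch(strategy="overlap", min_submitted=2)
dist.min_submitted = 100  # accepted
try:
    dist.min_submitted = 0  # rejected, as PositiveInt would reject it
except ValueError as err:
    print(err)
```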