feat: add dataset support to be created using distribution settings #5013


@jfcalvo (Member) commented Jun 13, 2024

Description

This PR is the first one related to the task distribution feature. It adds the following changes:

  • Added a distribution JSON column to the datasets table:
    • This column is non-nullable, so a value is always required when a dataset is created.
    • By default, existing datasets get the value {"strategy": "overlap", "min_submitted": 1}.
  • Added a distribution attribute to the DatasetCreate schema:
    • None is not a valid value.
    • If no value is specified for this attribute, DatasetOverlapDistributionCreate with min_submitted set to 1 is used.
    • DatasetOverlapDistributionCreate only allows values greater than or equal to 1 for the min_submitted attribute.
  • The context create_dataset function now receives a dictionary instead of a DatasetCreate schema.
  • Moved dataset creation validations to a new DatasetCreateValidator class.
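The schema rules listed above can be sketched with standard-library dataclasses. This is a hypothetical illustration, not the actual argilla-server code. Only DatasetCreate, DatasetOverlapDistributionCreate, min_submitted, the "overlap" strategy and the default value come from this PR. The name field, the to_dict helper and the error messages are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class DatasetOverlapDistributionCreate:
    # Only the "overlap" strategy is described in this PR.
    strategy: str = "overlap"
    # Minimum number of submitted responses; values below 1 are rejected.
    min_submitted: int = 1

    def __post_init__(self) -> None:
        if self.strategy != "overlap":
            raise ValueError("only the 'overlap' distribution strategy is supported")
        if self.min_submitted < 1:
            raise ValueError("min_submitted must be greater than or equal to 1")


@dataclass(frozen=True)
class DatasetCreate:
    # Hypothetical minimal field; the real schema has more attributes.
    name: str
    # When omitted, defaults to overlap with min_submitted=1.
    distribution: DatasetOverlapDistributionCreate = field(
        default_factory=DatasetOverlapDistributionCreate
    )

    def __post_init__(self) -> None:
        # An explicit None is rejected instead of being replaced by the default.
        if self.distribution is None:
            raise ValueError("distribution cannot be None")

    def to_dict(self) -> Dict[str, Any]:
        # Hypothetical helper: builds the kind of plain dictionary that
        # create_dataset now receives instead of the schema object.
        return {
            "name": self.name,
            "distribution": {
                "strategy": self.distribution.strategy,
                "min_submitted": self.distribution.min_submitted,
            },
        }
```

For example, DatasetCreate(name="my-dataset").to_dict() yields {"name": "my-dataset", "distribution": {"strategy": "overlap", "min_submitted": 1}}. Passing distribution=None or min_submitted=0 raises ValueError.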

Updating the distribution attribute of datasets will be handled in a separate issue.

Closes #5005

Type of change

(Please delete options that are not relevant. Remember to title the PR according to the type of change)

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactor (change restructuring the codebase without changing functionality)
  • Improvement (change adding some improvement to an existing functionality)
  • Documentation update

How Has This Been Tested

(Please describe the tests that you ran to verify your changes. And ideally, reference tests)

  • Added new tests; existing tests still pass.
  • Check that migration works as expected with old datasets and SQLite.
  • Check that migration works as expected with old datasets and PostgreSQL.
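The migration behaviour checked above can be illustrated with Python's built-in sqlite3 module. This is a hypothetical sketch, not the project's actual migration. The table layout, the TEXT column type and the use of a plain ALTER TABLE ... ADD COLUMN with a DEFAULT are all assumptions. It shows one way to add a non-nullable distribution column so that pre-existing datasets read back the default value.

```python
import json
import sqlite3

DEFAULT_DISTRIBUTION = {"strategy": "overlap", "min_submitted": 1}


def migrate(conn: sqlite3.Connection) -> None:
    # SQLite only accepts ADD COLUMN ... NOT NULL when a non-NULL default is
    # given; rows that existed before the ALTER then read back that default.
    # DDL cannot use bound parameters, so the constant JSON is inlined here.
    default_json = json.dumps(DEFAULT_DISTRIBUTION)
    conn.execute(
        "ALTER TABLE datasets ADD COLUMN distribution TEXT NOT NULL "
        f"DEFAULT '{default_json}'"
    )


def load_distributions(conn: sqlite3.Connection) -> dict:
    # Map each dataset name to its parsed distribution settings.
    rows = conn.execute("SELECT name, distribution FROM datasets ORDER BY id").fetchall()
    return {name: json.loads(distribution) for name, distribution in rows}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical pre-migration schema with one "old" dataset.
    conn.execute("CREATE TABLE datasets (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute("INSERT INTO datasets (name) VALUES ('old-dataset')")
    migrate(conn)
    print(load_distributions(conn))
```

Running it prints {'old-dataset': {'strategy': 'overlap', 'min_submitted': 1}}. After the migration, explicitly inserting NULL into distribution fails with sqlite3.IntegrityError, which matches the non-nullable requirement.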

Checklist

  • I added relevant documentation
  • follows the style guidelines of this project
  • I did a self-review of my code
  • I made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I filled out the contributor form (see text above)
  • I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)

@jfcalvo jfcalvo requested a review from frascuchon June 13, 2024 14:19

codecov bot commented Jun 19, 2024

Codecov Report

Attention: Patch coverage is 97.24771% with 3 lines in your changes missing coverage. Please review.

Project coverage is 92.00%. Comparing base (60d9340) to head (91eb6b1).

Files | Patch % | Lines
...server/src/argilla_server/search_engine/commons.py | 33.33% | 2 Missing ⚠️
...la-server/src/argilla_server/search_engine/base.py | 66.66% | 1 Missing ⚠️
Additional details and impacted files
Additional details and impacted files
@@                               Coverage Diff                                @@
##           feat/add-dataset-automatic-task-distribution    #5013      +/-   ##
================================================================================
+ Coverage                                         91.93%   92.00%   +0.06%     
================================================================================
  Files                                               135      137       +2     
  Lines                                              5818     5905      +87     
================================================================================
+ Hits                                               5349     5433      +84     
- Misses                                              469      472       +3     
Flag | Coverage Δ
argilla-server | 92.00% <97.24%> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown.


…5028)

# Description

This PR adds support for updating dataset distribution settings, for
example, updating the `min_submitted` attribute when the `overlap`
distribution strategy is in use.
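A minimal sketch of the update rule described above. It assumes the stored distribution is a plain dictionary and that `min_submitted` may only be changed while the `overlap` strategy is in use. `update_distribution` is a hypothetical helper, not an argilla-server function.

```python
from typing import Any, Dict, Optional


def update_distribution(
    current: Dict[str, Any], min_submitted: Optional[int] = None
) -> Dict[str, Any]:
    # Work on a copy so the caller's stored settings are not mutated.
    updated = dict(current)
    if min_submitted is not None:
        # Assumption: min_submitted is only meaningful for "overlap".
        if updated.get("strategy") != "overlap":
            raise ValueError("min_submitted can only be updated for the 'overlap' strategy")
        # Same lower bound as on dataset creation.
        if min_submitted < 1:
            raise ValueError("min_submitted must be greater than or equal to 1")
        updated["min_submitted"] = min_submitted
    return updated
```

For example, `update_distribution({"strategy": "overlap", "min_submitted": 1}, min_submitted=2)` returns `{"strategy": "overlap", "min_submitted": 2}`. Omitting `min_submitted` returns an unchanged copy.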

Closes #5010 

**Type of change**

(Please delete options that are not relevant. Remember to title the PR
according to the type of change)

- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to not work as expected)
- [ ] Refactor (change restructuring the codebase without changing
functionality)
- [ ] Improvement (change adding some improvement to an existing
functionality)
- [ ] Documentation update

**How Has This Been Tested**

(Please describe the tests that you ran to verify your changes. And
ideally, reference `tests`)

- [x] Adding new tests.

**Checklist**

- [ ] I added relevant documentation
- [ ] follows the style guidelines of this project
- [ ] I did a self-review of my code
- [ ] I made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I filled out [the contributor form](https://tally.so/r/n9XrxK)
(see text above)
- [ ] I have added relevant notes to the CHANGELOG.md file (See
https://keepachangelog.com/)

---------

Co-authored-by: Paco Aranda <[email protected]>
@jfcalvo jfcalvo merged commit f62d58a into feat/add-dataset-automatic-task-distribution Jul 1, 2024
14 checks passed
@jfcalvo jfcalvo deleted the feat/create-datasets-with-distribution branch July 1, 2024 10:31

github-actions bot commented Jul 1, 2024

The URL of the deployed environment for this PR is

Development

Successfully merging this pull request may close these issues.

[TASK] Allow datasets to be created with specific distribution settings
2 participants