distilabel 1.3.0
#857
Merged
* Add step to combine keys in a dict * Redirect import * Add tests * Add internal function to combine keys in a dict * Fix docstrings per code review
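The "combine keys in a dict" change can be pictured with a small sketch. This is not the code from the PR: `merge_keys` and its `output_key` default are hypothetical names, and the choice to flatten list values into the merged list is an assumption made for illustration.

```python
from typing import Any, Dict, List


def merge_keys(
    row: Dict[str, Any], keys: List[str], output_key: str = "merged"
) -> Dict[str, Any]:
    """Return a copy of `row` with the values under `keys` collected into one list.

    Hypothetical helper for illustration; not the distilabel implementation.
    """
    # Keep every key that is not being merged.
    result = {k: v for k, v in row.items() if k not in keys}
    merged: List[Any] = []
    for key in keys:
        value = row[key]
        # Assumption: list values are flattened, scalars are appended as-is.
        if isinstance(value, list):
            merged.extend(value)
        else:
            merged.append(value)
    result[output_key] = merged
    return result


row = {"queries": ["a", "b"], "extra_query": "c", "id": 1}
print(merge_keys(row, ["queries", "extra_query"]))
```

The input row is left unmodified, which keeps the helper safe to call on rows that other steps still read.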
#758) * Update: naming of CombineKeys to MergeColumns * Update: CombineColumns to GroupColumns * Fix: broken tests after refactor to columns directory * Add: deprecation test CombineColumns * Update src/distilabel/pipeline/utils.py Co-authored-by: Gabriel Martín Blázquez <[email protected]> --------- Co-authored-by: Gabriel Martín Blázquez <[email protected]>
* Add requirements list for a pipeline * Add tests for the new requirements of a Pipeline * Create a RequirementsMixin class to contain common requirements functionality for Step and BasePipeline * Create a decorator to add requirements to Steps * Make the _Step inherit from RequirementsMixin to contain the needed functionality * Implement functionality to check requirements before starting running a Pipeline * Fix test to run with DummyPipeline * Add test for requirements to step created via decorator * Add requirements info to dump and ensure it's loaded back (if found) * Update src/distilabel/mixins/requirements.py Co-authored-by: Gabriel Martín Blázquez <[email protected]> * Apply suggestions from code review! * Update src/distilabel/pipeline/base.py Co-authored-by: Gabriel Martín Blázquez <[email protected]> * Update requirements to store the list of Requirement instances to avoid reinstantiation * Update tests * Fix doc errors from column step refactor * Add missing llm serving/sharing in how to guides * Fix error on internal requirements variable * Include guide to use the requirements decorator * Update docs/sections/how_to_guides/advanced/pipeline_requirements.md Co-authored-by: Gabriel Martín Blázquez <[email protected]> * Update docs/sections/how_to_guides/advanced/pipeline_requirements.md Co-authored-by: Gabriel Martín Blázquez <[email protected]> * Update docs/sections/how_to_guides/advanced/pipeline_requirements.md Co-authored-by: Gabriel Martín Blázquez <[email protected]> * Update docs/sections/how_to_guides/advanced/pipeline_requirements.md Co-authored-by: Gabriel Martín Blázquez <[email protected]> * Update mkdocs.yml Co-authored-by: Gabriel Martín Blázquez <[email protected]> * Change ValueError with ModuleNotFoundError when stopping a pipeline due to requirements not installed --------- Co-authored-by: Gabriel Martín Blázquez <[email protected]>
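The idea of checking requirements before a pipeline starts, and stopping with `ModuleNotFoundError` when something is missing, might look roughly like the sketch below. It uses only the standard library; `missing_requirements` and `check_requirements` are hypothetical names, not the functions added in this PR, and the sketch checks only that a distribution is installed, ignoring version specifiers.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import List


def missing_requirements(requirements: List[str]) -> List[str]:
    """Return the requirement strings whose distribution is not installed.

    Hypothetical helper: version constraints are ignored in this sketch.
    """
    missing = []
    for requirement in requirements:
        # Keep only the distribution name, dropping specifiers and extras.
        name = re.split(r"[<>=!~\[; ]", requirement.strip(), maxsplit=1)[0]
        try:
            version(name)
        except PackageNotFoundError:
            missing.append(requirement)
    return missing


def check_requirements(requirements: List[str]) -> None:
    """Raise ModuleNotFoundError if any requirement is not installed."""
    missing = missing_requirements(requirements)
    if missing:
        raise ModuleNotFoundError(
            f"Missing requirements, install them first: {', '.join(missing)}"
        )


# An empty requirements list never blocks a run.
check_requirements([])
```

Failing before any step is loaded avoids discovering a missing package only after earlier steps have already spent time or compute.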
* Create N replicas per `Step` * Update `_BatchManager` to handle batch sorting uncertainty * Add multiple replicas test * Fix unit tests * Fix `next_expected_seq_no` needed to be updated if `routing_batch_function` * Update `set_next_expected_batch_seq_no` only if no `data` * Fix `next_expected_seq_no` with `routing_batch_function` * Remove prints * Add `StepResource` import * Add missing return type hint * Add `StepResources` docs * Fix typos Co-authored-by: Agus <[email protected]> --------- Co-authored-by: Agus <[email protected]>
* Create N replicas per `Step` * Update `_BatchManager` to handle batch sorting uncertainty * Add multiple replicas test * Fix unit tests * Fix `next_expected_seq_no` needed to be updated if `routing_batch_function` * Update `set_next_expected_batch_seq_no` only if no `data` * Fix `next_expected_seq_no` with `routing_batch_function` * Remove prints * Add `StepResource` import * Add missing return type hint * Add `StepResources` docs * Add `get_steps_load_stages` method * Update to load steps in stages * Add `_teardown` method * Add load stages * Add printing info about stages * Refactor load stages to avoid race conditions * Add load stages integration test * Fix unit tests * Add unit tests for new methods * Move send last batch message * Refactor to make it work with routing batch function * Add integration test for load stages & routing batch function * Update docs to tell about resources as runtime parameters * Add missing doc pages * Update to load stages from cache * Fix bugs requesting initial batches * Add integration tests for recovering states from cache * Remove atexit * Fix docstring typos Co-authored-by: Agus <[email protected]> --------- Co-authored-by: Agus <[email protected]>
* Deprecate `python==3.8` * Fix format
…istiset (#762) * Add option to include the pipeline script as another artifact when pushing a distiset to the hub * Add documentation for the pipeline script uploaded * Inform of the new pipeline script uploaded to the repository in the README * Add docs explaining how to run a pipeline using the CLI * Run python file with distilabel pipeline from CLI * Update docs with new running method * Run script by importing the pipeline from the remote module * Update src/distilabel/cli/pipeline/app.py Co-authored-by: Gabriel Martín Blázquez <[email protected]> * Update src/distilabel/cli/pipeline/utils.py Co-authored-by: Gabriel Martín Blázquez <[email protected]> * Update docs/sections/how_to_guides/advanced/cli/index.md Co-authored-by: Gabriel Martín Blázquez <[email protected]> * Update to importerror as per code review * Add missing import --------- Co-authored-by: Gabriel Martín Blázquez <[email protected]>
* Add `docs-pr.yml` workflow * Remove if condition * Add workflow to remove PR docs on close * Add `GITHUB_TOKEN`
* Create N replicas per `Step` * Update `_BatchManager` to handle batch sorting uncertainty * Add multiple replicas test * Fix unit tests * Fix `next_expected_seq_no` needed to be updated if `routing_batch_function` * Update `set_next_expected_batch_seq_no` only if no `data` * Fix `next_expected_seq_no` with `routing_batch_function` * Remove prints * Add `StepResource` import * Add missing return type hint * Add `StepResources` docs * Add `get_steps_load_stages` method * Update to load steps in stages * Add `_teardown` method * Add load stages * Add printing info about stages * Refactor load stages to avoid race conditions * Add load stages integration test * Fix unit tests * Add unit tests for new methods * Move send last batch message * Refactor to make it work with routing batch function * Add integration test for load stages & routing batch function * Update docs to tell about resources as runtime parameters * Add missing doc pages * Add `ray>=2.31.0` optional dependency * Initial work for `RayPipeline` * Update to load stages from cache * Fix bugs requesting initial batches * Add integration tests for recovering states from cache * Remove atexit * Move `_ProcessWrapper` to different file * `RayPipeline` mvp * Install `ray` if `python!=3.12` * Assign ray actor name * Fix setting `options` for Ray actor * Set name for all the queues * Add requirements * Add docstrings * Remove unit test * Add extra `resources` * Add `ray` method * Add `ray[default]` as dependency * Add `script_executed_in_ray_cluster` function * Fix step load fail didn't stop the pipeline * Run with `RayPipeline` if detected Ray cluster * Set built dag * Fix unit tests * Add `Pipeline` to `RayPipeline` unit tests * Add `ray_init_kwargs` argument * Add `memory` attribute * Add simple `RayPipeline` integration test * Override `RayPipeline.dump` method * Add docs for `RayPipeline` * Fix close PR docs
* Move `CudaDevicePlacementMixin` to new module * Initial work for implementing Magpie * Simplify magpie implementation * Remove `use_open_ai` and add `MagpieChatTemplateMixin` to `InferenceEndpointsLLM` * Add `MagpieChatTemplateMixin` to `vLLM` * Add `MagpieGenerator` task * Fix unit tests * Fix docstrings * Mock `HF_TOKEN` environment variable * Fix list index out of range * Fix `MagpieGenerator` last batch * Add `only_instruction` attribute * Update categories * testing * Worth trying * Add examples * Add magpie unit tests * Fix docstring * Update docstrings * Apply suggestions from code review Co-authored-by: Agus <[email protected]> * Update to `huggingface_hub >= 0.22.0` * Add generation with `chat_completion` * Update `agenerate` arguments * Update unit tests * Fix `tools` were not being used * Update unit tests * Fix list of tuples instead of list of list * Add missing docstring * Add `chat_completion` unit tests * Fix `GroqLLM.generate` unit test after updating `_agenerate` --------- Co-authored-by: Agus <[email protected]>
…ks and handle `None`s. (#784) * Add `end_with_user` flag * Add `include_system_prompt` attribute to `Magpie` * Update docstrings * Update `MagpieBase` to handle `None`s * Fix `InferenceEndpointsLLM` unit tests after release of `huggingface_hub==0.24.0`
* Add `_NoDaemonPool` class * Use `Union` * Update src/distilabel/pipeline/local.py Co-authored-by: Agus <[email protected]> * Update dependency version to `vllm>=0.5.3` and add `setuptools` * Remove pinned `outlines==0.34.0` * Fix docstring * Add docs about `vLLM` with `ray` --------- Co-authored-by: Agus <[email protected]>
* Update default names in GroupColumns * Fix integration test
* Add generating batches to `GeneratorStep` if unique step in the pipeline * Remove print
* Add default name for a pipeline * Move to uuid instead * Fix test and update final name based on uuid
* Update distilabel phrasing based on PR hugging face hub * Update README.md * Update index.md * Fix typos
…EmbeddingGeneration` and `FaissNearestNeighbour` steps (#830) * Add `Embeddings` base class and `SentenceTransformers` class * Add `EmbeddingGeneration` step * Add `precision` attribute * Add docstrings * Add example to docstring * Update component gallery to include `Embeddings` models * Add `sentence-transformers` extra * Add `FaissNearestNeighbour` step * Add category and example * Merge category to icons dictionaries * Add missing unit tests * Add `faiss-cpu` and `faiss-gpu` extras * Update unit tests
* Create file per hostname * Set default `_desired_num_gpus` to `1` * Fix `GeneratorTask`s not getting assigned gpus and name * Add `_init_cuda_device_placement` method * Remove info message * Add disabling `CudaDevicePlacementMixin` if `RayPipeline` * Fix unit test
* Add helper function to create generator step from dataset * Add integration tests for make_generator_step * Redirect import * Update LoadDataFromHub to not call load if a dataset is already defined * Update docs * Add unit tests for the new helper function * Update filename to utils * Add helper method to insert a root step * Add logic to create a generator step internally from a dataset * Pass the dataset variable from all the pipeline implementations * Add type for the input datasets * Avoid circular imports * Add test for pipelines with generator step and dataset * Add integration tests for dataset passed via run method * Fix error evaluation dataframe * Add example on quickstart and entry on how to guide * Update docs/sections/getting_started/quickstart.md Co-authored-by: Gabriel Martín Blázquez <[email protected]> * Update docs/sections/getting_started/quickstart.md Co-authored-by: Gabriel Martín Blázquez <[email protected]> * Update src/distilabel/pipeline/base.py Co-authored-by: Gabriel Martín Blázquez <[email protected]> * Update src/distilabel/pipeline/ray.py Co-authored-by: Gabriel Martín Blázquez <[email protected]> * Update src/distilabel/steps/generators/utils.py Co-authored-by: Gabriel Martín Blázquez <[email protected]> * Update src/distilabel/steps/generators/utils.py Co-authored-by: Gabriel Martín Blázquez <[email protected]> * Update src/distilabel/pipeline/local.py Co-authored-by: Gabriel Martín Blázquez <[email protected]> * Respect import order * Move functionality to a proper internal method * Run linter * Fix format --------- Co-authored-by: Gabriel Martín Blázquez <[email protected]> Co-authored-by: David Berenstein <[email protected]>
… signature (#838) * Do not take into account `disable_cuda_device_placement` for pipeline signature * Fix unit test
…instead of `None` (#841) * Fix default value was `ellipsis` instead of `None` * Fix unit test
* Create placement group for `vLLM` * Use `SPREAD` if `pipeline_parallel_size>1` * Fix bundle initialization * Fix wrong dictionary * Remove using `SPMD` from ray docs * Refactor creating `PlacementGroup` for `vLLM`
* Update `_Argilla` base and `TextGenerationToArgilla` * Fix `_dataset.records.log` and rename to `ArgillaBase` Co-authored-by: Ben Burtenshaw <[email protected]> * Update `TextGenerationToArgilla` subclass inheritance * Remove unused `logger.info` message * Update `PreferenceToArgilla` * Update `argilla` extra to install `argilla_sdk` For the moment it's being installed as `pip install git+https://github.com/argilla-io/argilla-python.git@main` * Add `ArgillaBase` and subclasses unit tests * Install `argilla_sdk` from source and add `ipython` * upgrade argilla dep to latest rc * update code with latest changes * chore: remove unnecessary workspace definition * fix: wrong argilla module import * Update docstrings * Fix lint * Add check for `api_url` and `api_key` * Fix unit tests * Fix unit tests * Update argilla dependency version --------- Co-authored-by: Ben Burtenshaw <[email protected]> Co-authored-by: Francisco Aranda <[email protected]> Co-authored-by: Gabriel Martín Blázquez <[email protected]>
* Use `CudaDevicePlacementMixin` in `RewardModelScore` step * Update `_init_cuda_device_placement` to be `LLM` attribute agnostic * Check if `Step` is instance of `CudaDevicePlacementMixin`
* Allow getting GPUs from several nodes * Fix multiply by float * Fix 0 gpus * Rename variable
* Add Google Analytics and feedback form per page * Remove duplicate extra tag
…iplets (#856) * Add hard-negative flag to include similar challenging negatives on triplets * Update src/distilabel/steps/tasks/sentence_transformers.py Co-authored-by: Gabriel Martín Blázquez <[email protected]> --------- Co-authored-by: Gabriel Martín Blázquez <[email protected]>
* Grab citations from dag * Include citations in README template * Add test to check citations are parsed * Pass dag to create_distiset function * Update citation section in steps that are backed by a paper * Add reference in the docs for the Citations section * Update docs/sections/how_to_guides/advanced/distiset.md Co-authored-by: Gabriel Martín Blázquez <[email protected]> * Update src/distilabel/distiset.py Co-authored-by: Gabriel Martín Blázquez <[email protected]> * Refactor function to grab citations when creating a distiset --------- Co-authored-by: Gabriel Martín Blázquez <[email protected]>
Documentation for this PR has been built. You can view it at: https://distilabel.argilla.io/pr-857/
CodSpeed Performance Report: Merging #857 will not alter performance.