
Add new add_raw_input argument to _Task so we can automatically include the formatted input #903

Merged
plaguss merged 4 commits into develop from add-raw-input
Aug 14, 2024


@plaguss (Contributor) commented Aug 14, 2024

Description

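Adds a new `add_raw_input` attribute to tasks. When it is set to `True`, the formatted input (the value returned by `format_input`) is automatically included in the `distilabel_metadata` column under the key `raw_input_<task name>`, complementing the existing `add_raw_output` option.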
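The behaviour can be pictured with a minimal, pure-Python sketch. It does not use distilabel: `attach_raw_metadata` is a hypothetical helper, and the key layout (`raw_input_<task name>` and `raw_output_<task name>` inside `distilabel_metadata`) is taken from the example output in this PR, not from the library's source.

```python
from typing import Any, Dict, List, Optional


def attach_raw_metadata(
    row: Dict[str, Any],
    task_name: str,
    formatted_input: List[Dict[str, str]],
    raw_output: Optional[str],
    add_raw_input: bool = True,
    add_raw_output: bool = True,
) -> Dict[str, Any]:
    """Hypothetical helper, not distilabel's implementation.

    Stores the formatted input and/or raw output under `distilabel_metadata`,
    using keys suffixed with the task name.
    """
    result = dict(row)  # shallow copy so the caller's row is not mutated
    metadata = dict(result.get("distilabel_metadata", {}))
    if add_raw_output:
        metadata[f"raw_output_{task_name}"] = raw_output
    if add_raw_input:
        # The chat produced by `format_input`, exactly as sent to the LLM.
        metadata[f"raw_input_{task_name}"] = formatted_input
    if metadata:  # only add the column when there is something to store
        result["distilabel_metadata"] = metadata
    return result


row = {"instruction": "test_0", "additional_info": "additional_info_0"}
chat = [
    {"role": "system", "content": ""},
    {"role": "user", "content": row["instruction"]},
]
out = attach_raw_metadata(row, "dummy_task_0", chat, "output")
print(sorted(out["distilabel_metadata"]))
# ['raw_input_dummy_task_0', 'raw_output_dummy_task_0']
```

Returning a copy instead of mutating `row` is a choice made for this sketch only; the library may handle rows differently.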

Example from the tests:

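```python
from typing import TYPE_CHECKING, Any, Dict, List, Union

from distilabel.steps.tasks import Task

if TYPE_CHECKING:
    from distilabel.steps.tasks.typing import ChatType

# `DummyLLM` is a helper from distilabel's unit tests (its import is not shown);
# in this example it returns the text "output" with model_name "test".


class DummyTask(Task):
    @property
    def inputs(self) -> List[str]:
        return ["instruction", "additional_info"]

    def format_input(self, input: Dict[str, Any]) -> "ChatType":
        # This formatted chat is what `add_raw_input=True` stores in the metadata.
        return [
            {"role": "system", "content": ""},
            {"role": "user", "content": input["instruction"]},
        ]

    @property
    def outputs(self) -> List[str]:
        return ["output", "info_from_input"]

    def format_output(
        self, output: Union[str, None], input: Union[Dict[str, Any], None] = None
    ) -> Dict[str, Any]:
        return {"output": output, "info_from_input": input["additional_info"]}  # type: ignore


task = DummyTask(
    llm=DummyLLM(),
    add_raw_output=True,  # store the raw LLM output as raw_output_<task name>
    add_raw_input=True,  # new: store the formatted input as raw_input_<task name>
)
task.load()
# `process` is a generator that yields one list of rows per batch.
next(task.process([
    {"instruction": "test_0", "additional_info": "additional_info_0"},
]))
# [{'additional_info': 'additional_info_0',
#   'distilabel_metadata': {'raw_input_dummy_task_0': [{'content': '',
#                                                       'role': 'system'},
#                                                      {'content': 'test_0',
#                                                       'role': 'user'}],
#                           'raw_output_dummy_task_0': 'output'},
#   'info_from_input': 'additional_info_0',
#   'instruction': 'test_0',
#   'model_name': 'test',
#   'output': 'output'}]
```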
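Because each task writes its formatted input under its own `raw_input_<task name>` key, the prompts can be pulled back out of a finished row. The sketch assumes only that key layout, as seen in the example output; `raw_inputs_by_task` is a hypothetical helper, not part of distilabel.

```python
from typing import Any, Dict

RAW_INPUT_PREFIX = "raw_input_"


def raw_inputs_by_task(row: Dict[str, Any]) -> Dict[str, Any]:
    """Map each task name to the formatted input stored in `distilabel_metadata`.

    Hypothetical helper: it relies only on the `raw_input_<task name>` key
    layout shown in the example output, not on any distilabel API.
    """
    metadata = row.get("distilabel_metadata") or {}
    return {
        key[len(RAW_INPUT_PREFIX):]: value
        for key, value in metadata.items()
        if key.startswith(RAW_INPUT_PREFIX)
    }


# The row produced by `DummyTask` in the example output.
row = {
    "additional_info": "additional_info_0",
    "distilabel_metadata": {
        "raw_input_dummy_task_0": [
            {"content": "", "role": "system"},
            {"content": "test_0", "role": "user"},
        ],
        "raw_output_dummy_task_0": "output",
    },
    "info_from_input": "additional_info_0",
    "instruction": "test_0",
    "model_name": "test",
    "output": "output",
}

for task_name, chat in raw_inputs_by_task(row).items():
    print(task_name, [message["role"] for message in chat])
# dummy_task_0 ['system', 'user']
```

Grouping by task name keeps the helper usable when several tasks in the same pipeline set `add_raw_input=True`.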

Example in the docs: [screenshot not preserved]

Closes #698

@plaguss added the enhancement (New feature or request) label Aug 14, 2024
@plaguss added this to the 1.4.0 milestone Aug 14, 2024
@plaguss self-assigned this Aug 14, 2024
@plaguss linked an issue Aug 14, 2024 that may be closed by this pull request

Documentation for this PR has been built. You can view it at: https://distilabel.argilla.io/pr-903/


codspeed-hq bot commented Aug 14, 2024

CodSpeed Performance Report

Merging #903 will not alter performance

Comparing add-raw-input (50d7b4c) with develop (c8df5a9)

Summary

✅ 1 untouched benchmark

@plaguss marked this pull request as ready for review August 14, 2024 11:26
@plaguss merged commit 3d772c5 into develop Aug 14, 2024
7 checks passed
@plaguss deleted the add-raw-input branch August 14, 2024 14:40
Labels: enhancement (New feature or request)
Projects: None yet
Development: successfully merging this pull request may close "[FEATURE] show raw input/response within tasks"
Participants: 2