[FEATURE] Simplify customizing the TextGeneration task with custom prompts #974

Merged: plaguss merged 12 commits into develop from text-generation-templated on Sep 16, 2024


plaguss
Contributor

@plaguss plaguss commented Sep 12, 2024

Description

This PR adds extra customization to the TextGeneration task. A jinja2 template can now be passed as a string, so as long as no further processing is needed, TextGeneration is the go-to task.

The previous functionality is still present, but now we have some extras:

  • Use a custom template to generate text
from distilabel.steps.tasks import TextGeneration
from distilabel.llms.huggingface import InferenceEndpointsLLM

CUSTOM_TEMPLATE = '''Document:
{{ document }}

Question: {{ question }}

Please provide a clear and concise answer to the question based on the information in the document and your general knowledge:
'''.rstrip()

text_gen = TextGeneration(
    llm=InferenceEndpointsLLM(
        model_id="meta-llama/Meta-Llama-3.1-70B-Instruct",
    ),
    system_prompt="You are a helpful AI assistant. Your task is to answer the following question based on the provided document. If the answer is not explicitly stated in the document, use your knowledge to provide the most relevant and accurate answer possible. If you cannot answer the question based on the given information, state that clearly.",
    template=CUSTOM_TEMPLATE,
    columns=["document", "question"],
)

text_gen.load()

result = next(
    text_gen.process(
        [
            {
                "document": "The Great Barrier Reef, located off the coast of Australia, is the world's largest coral reef system. It stretches over 2,300 kilometers and is home to a diverse array of marine life, including over 1,500 species of fish. However, in recent years, the reef has faced significant challenges due to climate change, with rising sea temperatures causing coral bleaching events.",
                "question": "What is the main threat to the Great Barrier Reef mentioned in the document?"
            }
        ]
    )
)
  • Use a few-shot template with a system prompt passed per input row
from distilabel.steps.tasks import TextGeneration
from distilabel.llms.huggingface import InferenceEndpointsLLM

CUSTOM_TEMPLATE = '''Generate a clear, single-sentence instruction based on the following examples:

{% for example in examples %}
Example {{ loop.index }}:
Instruction: {{ example }}

{% endfor %}
Now, generate a new instruction in a similar style:
'''.rstrip()

text_gen = TextGeneration(
    llm=InferenceEndpointsLLM(
        model_id="meta-llama/Meta-Llama-3.1-70B-Instruct",
    ),
    template=CUSTOM_TEMPLATE,
    columns="examples",
)

text_gen.load()

result = next(
    text_gen.process(
        [
            {
                "examples": ["This is an example", "Another relevant example"],
                "system_prompt": "You are an AI assistant specialised in cybersecurity and computing in general, you make your point clear without any explanations."
            }
        ]
    )
)
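
To see what prompt a given template produces before calling an LLM, the template can be rendered with plain jinja2. The sketch below is not part of this PR: `preview_prompt` and `template_variables` are hypothetical helpers, the templates are shortened versions of the ones above, and it assumes that the task renders the template with the values of the listed `columns` in the same way a bare `jinja2.Environment` would.

```python
from jinja2 import Environment, meta


def preview_prompt(template: str, row: dict) -> str:
    """Hypothetical helper: render a template with one input row using plain jinja2."""
    return Environment().from_string(template).render(**row)


def template_variables(template: str) -> set:
    """Hypothetical helper: names the template expects from outside (candidate `columns`)."""
    env = Environment()
    return meta.find_undeclared_variables(env.parse(template))


# Shortened versions of the templates used in the examples above.
QA_TEMPLATE = '''Document:
{{ document }}

Question: {{ question }}
'''.rstrip()

FEW_SHOT_TEMPLATE = '''{% for example in examples %}
Example {{ loop.index }}:
Instruction: {{ example }}

{% endfor %}
Now, generate a new instruction in a similar style:
'''.rstrip()

qa_prompt = preview_prompt(
    QA_TEMPLATE,
    {"document": "The reef is large.", "question": "How large is the reef?"},
)
few_shot_prompt = preview_prompt(
    FEW_SHOT_TEMPLATE,
    {"examples": ["This is an example", "Another relevant example"]},
)

qa_variables = template_variables(QA_TEMPLATE)
few_shot_variables = template_variables(FEW_SHOT_TEMPLATE)

print(qa_prompt)
print(few_shot_prompt)
print(sorted(qa_variables), sorted(few_shot_variables))
```

Comparing the output of `template_variables` with the `columns` argument is a cheap way to catch a misspelled placeholder before running a full pipeline.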

Updates to TextGeneration customization.

@plaguss plaguss added this to the 1.4.0 milestone Sep 12, 2024
@plaguss plaguss added the enhancement New feature or request label Sep 12, 2024
@plaguss plaguss self-assigned this Sep 12, 2024

Documentation for this PR has been built. You can view it at: https://distilabel.argilla.io/pr-974/

@plaguss plaguss marked this pull request as ready for review September 13, 2024 09:53

codspeed-hq bot commented Sep 13, 2024

CodSpeed Performance Report

Merging #974 will not alter performance

Comparing text-generation-templated (18b6efb) with develop (28ecbc4)

Summary

✅ 1 untouched benchmarks

@plaguss plaguss merged commit af08b59 into develop Sep 16, 2024
7 checks passed
@plaguss plaguss deleted the text-generation-templated branch September 16, 2024 04:26