-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add problem statement rewriting to the rewriting pipeline #192
Conversation
@FelixTJDietrich Can you review the conflicts, please? |
WalkthroughThis pull request introduces a new variant-based approach for text rewriting in the pipeline, specifically for FAQ and problem statement processing. The changes include adding two new system prompts in the Changes
Sequence DiagramsequenceDiagram
participant Client
participant Router
participant Pipeline
participant LLM
Client->>Router: Request text rewriting
Router->>Router: Validate variant
Router->>Pipeline: Initialize with variant
Pipeline->>LLM: Select appropriate system prompt
LLM->>Pipeline: Return rewritten text
Pipeline->>Router: Return processed text
Router->>Client: Send rewritten text
Possibly related PRs
Suggested labels
Suggested reviewers
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
@bassner should be good now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (7)
app/pipeline/rewriting_pipeline.py (3)
57-57
: Remove debug print statement.The print statement appears to be left from debugging and should be removed.
- print(variant_prompts[self.variant])
53-56
: Consider moving variant_prompts dictionary to class level.The
variant_prompts
dictionary is recreated on each call. Consider moving it to class level for better performance and maintainability.class RewritingPipeline(Pipeline): + variant_prompts = { + "faq": system_prompt_faq, + "problem_statement": system_prompt_problem_statement, + }
72-78
: Enhance markdown cleanup with regex.The current markdown cleanup logic uses string operations. Consider using regex for more robust handling of markdown code blocks.
- if response.startswith("```") and response.endswith("```"): - response = response[3:-3] - if response.startswith("markdown"): - response = response[8:] - response = response.strip() + import re + response = re.sub(r'^```(?:markdown)?\s*(.*?)\s*```$', r'\1', response.strip(), flags=re.DOTALL)app/pipeline/prompts/rewriting_prompts.py (2)
1-32
: Consider using Python's textwrap.dedent for better readability.The multi-line string could be more readable using
textwrap.dedent
to handle indentation.+from textwrap import dedent -system_prompt_faq = """\ +system_prompt_faq = dedent("""\ :You are an excellent tutor with expertise in computer science... ###END###\ -""" +""")
1-85
: Add docstrings to document the prompts.Consider adding docstrings to document the purpose and expected format of each prompt.
+"""FAQ rewriting prompt. + +This prompt guides the model to enhance FAQ text while maintaining clarity +and preserving the original meaning. +""" system_prompt_faq = ... +"""Problem statement rewriting prompt. + +This prompt guides the model to refine problem statements while preserving +technical details and maintaining student-focused language. +""" system_prompt_problem_statement = ...app/web/routers/pipelines.py (2)
282-282
: Enhance error message for unknown variants.Consider listing the available variants in the error message for better user experience.
- raise ValueError(f"Unknown variant: {variant}") + available_variants = ["faq", "problem_statement"] + raise ValueError(f"Unknown variant: {variant}. Available variants: {', '.join(available_variants)}")
303-303
: Consider validating variant before converting to lowercase.The variant is converted to lowercase without validation. Consider validating the input first.
+ if not isinstance(variant, str): + raise ValueError(f"Variant must be a string, got {type(variant)}") variant = variant.lower()
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
app/pipeline/prompts/rewriting_prompts.py
(1 hunks)app/pipeline/rewriting_pipeline.py
(6 hunks)app/web/routers/pipelines.py
(3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Build and Push Docker Image
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (5)
app/pipeline/rewriting_pipeline.py (5)
2-2
: Remove unused import.The
Optional
type import is no longer used since thecallback
parameter in__init__
is now required.-from typing import Literal, Optional +from typing import Literal
57-57
: Remove debug print statement.Debug print statement should be removed before merging.
- print(variant_prompts[self.variant])
67-67
: Consider making temperature configurable.The temperature value (0.4) is hardcoded. Consider making it configurable through the constructor or method parameters for more flexibility.
- [prompt], CompletionArguments(temperature=0.4), tools=None + [prompt], CompletionArguments(temperature=self.temperature), tools=NoneAnd in the constructor:
def __init__( self, callback: RewritingCallback, variant: Literal["faq", "problem_statement"], temperature: float = 0.4 ): # ... self.temperature = temperature
72-78
: Enhance markdown cleanup robustness.The markdown cleanup logic could be more robust to handle additional edge cases:
- Multiple code blocks
- Different markdown language specifiers
- Inconsistent newlines
Consider using a regular expression for more robust cleanup:
- if response.startswith("```") and response.endswith("```"): - response = response[3:-3] - if response.startswith("markdown"): - response = response[8:] - response = response.strip() + import re + response = re.sub(r'^```\w*\n?|\n?```$', '', response).strip()
Line range hint
23-81
: Consider a more extensible variant design pattern.The current implementation with a Literal type and dictionary mapping is clean but might be less flexible for future variants. Consider implementing a strategy pattern:
from abc import ABC, abstractmethod class RewritingStrategy(ABC): @abstractmethod def get_prompt(self) -> str: pass class FAQRewritingStrategy(RewritingStrategy): def get_prompt(self) -> str: return system_prompt_faq class ProblemStatementRewritingStrategy(RewritingStrategy): def get_prompt(self) -> str: return system_prompt_problem_statement # Then in RewritingPipeline: class RewritingPipeline(Pipeline): def __init__( self, callback: RewritingCallback, strategy: RewritingStrategy, temperature: float = 0.4 ): self.strategy = strategy self.temperature = temperature # ...This would make it easier to:
- Add new variants without modifying existing code
- Include variant-specific logic beyond just prompt selection
- Test variants in isolation
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
app/pipeline/rewriting_pipeline.py
(6 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Build and Push Docker Image
🔇 Additional comments (3)
app/pipeline/rewriting_pipeline.py (3)
28-28
: LGTM!The variant attribute is well-typed with appropriate constraints.
53-56
: LGTM!Clean implementation of variant to prompt mapping using a dictionary.
Line range hint
30-42
: Verify impact of breaking constructor change.Making
callback
a required parameter is a breaking change. Ensure all existing instantiations ofRewritingPipeline
are updated to provide this parameter.✅ Verification successful
Constructor change impact verified - no issues found
The only instantiation of
RewritingPipeline
in the codebase (inapp/web/routers/pipelines.py
) already provides both required parameters correctly.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for RewritingPipeline instantiations rg -l "RewritingPipeline\(" | while read -r file; do echo "=== $file ===" rg -A 2 "RewritingPipeline\(" "$file" doneLength of output: 658
Note
This PR builds on #191, please merge afterwards
Artemis#10156 is the corresponding Artemis PR to test with
This pull request includes significant updates to the
rewriting_pipeline
to support multiple variants and improve functionality. The most important changes include adding a new prompt for problem statements, modifying the pipeline to handle different variants, and updating the initialization and call methods to accommodate these changes.Support for Multiple Variants:
app/pipeline/prompts/rewriting_prompts.py
: Added a new prompt for problem statements and moved the FAQ prompt to this file.app/pipeline/rewriting_pipeline.py
: Updated the import statements and added avariant
parameter to theRewritingPipeline
class. Modified the__init__
and__call__
methods to handle different prompts based on the variant. [1] [2] [3] [4] [5] [6]Pipeline Initialization:
app/web/routers/pipelines.py
: Updated therun_rewriting_pipeline_worker
andrun_rewriting_pipeline
functions to handle the newvariant
parameter, ensuring the correct pipeline variant is used. [1] [2]Feature Descriptor Updates:
app/web/routers/pipelines.py
: Modified theget_pipeline
function to return specific feature descriptors for the FAQ and problem statement variants.Summary by CodeRabbit
New Features
Improvements