Optimization dry runs #397
This would be a wonderful feature. I have 3 things in mind (not sure how well these can be implemented, as I don't yet have a complete source-code-level understanding):
|
I've started to work on some of these in #408. Not going in the same order, but here's what I have so far:
```diff
-    def compile(self, student, *, teacher=None, trainset, valset=None):
+    def compile(self, student, *, teacher=None, trainset, valset=None, step=False):
         self.trainset = trainset
         self.valset = valset

-    def _bootstrap(self, *, max_bootstraps=None):
+    def _bootstrap(self, *, max_bootstraps=None, step=False):
         max_bootstraps = max_bootstraps or self.max_bootstrapped_demos
         bootstrapped = {}
         self.name2traces = {name: [] for name in self.name2predictor}

         for round_idx in range(self.max_rounds):
             for example_idx, example in enumerate(tqdm.tqdm(self.trainset)):
                 if len(bootstrapped) >= max_bootstraps:
                     break

                 if example_idx not in bootstrapped:
                     success = self._bootstrap_one_example(example, round_idx)

                     if success:
                         bootstrapped[example_idx] = True

+                if step:
+                    user_input = input("Continue bootstrapping? (Y/n): ")
+                    if user_input.lower() == 'n':
+                        print("Bootstrapping interrupted by user.")
+                        return  # Exit the loop and method

         print(f'Bootstrapped {len(bootstrapped)} full traces after {example_idx+1} examples in round {round_idx}.')
```

This seems pretty straightforward. Just adding a `step` flag to `compile` and `_bootstrap` lets the user pause the bootstrapping loop and decide whether to continue; a usage sketch follows below.
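For illustration, here is a hypothetical way the flag might be used once merged (assuming `BootstrapFewShot` from `dspy.teleprompt`, and a user-defined `my_metric`, `my_program`, and `trainset`; the `step` argument is the proposed addition, not an existing parameter):

```python
import dspy
from dspy.teleprompt import BootstrapFewShot

# `step=True` is the proposed flag; everything else is the usual compile call.
teleprompter = BootstrapFewShot(metric=my_metric)
compiled_program = teleprompter.compile(my_program, trainset=trainset, step=True)
```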
By changing `Predict.forward` to also accept a `dry_run` keyword argument, a call can return the prepared prompt and related metadata instead of hitting the LM:
```diff
 class Predict(Parameter):
     ...
     def forward(self, **kwargs):
         # Extract the three privileged keyword arguments.
         new_signature = kwargs.pop("new_signature", None)
         signature = kwargs.pop("signature", self.signature)
         demos = kwargs.pop("demos", self.demos)
+        dry_run = kwargs.pop("dry_run", False)

         ...

+        if dry_run:
+            # Prepare a structured output for the dry run
+            dry_run_info = {
+                'prompt': x,                  # The prepared prompt
+                'config': config,             # The configuration used for generation
+                'signature': str(signature),  # The signature being used
+                'stage': self.stage,          # The current stage
+            }
+
+            # If an encoder is available, include encoded tokens in the output
+            encoder = dsp.settings.config.get('encoder', None)
+            if encoder is not None:
+                encoded_tokens = encoder.encode(x)
+                dry_run_info['encoded_tokens'] = encoded_tokens
+                dry_run_info['token_count'] = len(encoded_tokens)
+
+            # Option 1: Return the dry run information for further inspection
+            return dry_run_info
```
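As a usage illustration (hypothetical call, assuming the `dry_run` kwarg above is in place and a simple string signature):

```python
import dspy

qa = dspy.Predict("question -> answer")

# With the proposed dry_run flag, this would return the dry-run dictionary
# instead of calling the configured LM.
info = qa(question="What is the capital of France?", dry_run=True)
print(info["prompt"])
print(info.get("token_count"))  # only present when an encoder is configured
```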
If they do want to perform a dry run, we use an encoder to count the tokens of the prepared prompt. The encoder is loaded with tiktoken:

```diff
 import tqdm
 import datetime
 import itertools
+import tiktoken

 from collections import defaultdict

 ...

+def load_encoder_for_lm(lm):
+    """
+    Load and cache the tiktoken encoder based on the LM configuration.
+
+    Args:
+        lm: The model name string (an OpenAI model identifier, e.g. "gpt-3.5-turbo").
+    """
+    # Load the encoder. This is a placeholder; adjust based on how you actually load the encoder.
+    encoder = tiktoken.encoding_for_model(lm)
+
+    # Caching could happen here; for now, just return the encoder.
+    return encoder
```

Do note that this only works for OpenAI models, since we're using tiktoken. To make it more robust, we ought to consider the suite of models that are valid and expand from there. Let me know your thoughts. Happy to continue working on this.
|
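One possible direction for non-OpenAI models, purely as an illustrative sketch (the `count_tokens` helper and the fallback heuristic below are made up, not part of any existing API):

```python
import tiktoken

def count_tokens(model_name: str, text: str) -> int:
    """Count tokens with tiktoken when the model is known; otherwise fall back to a rough heuristic."""
    try:
        encoder = tiktoken.encoding_for_model(model_name)
        return len(encoder.encode(text))
    except KeyError:
        # tiktoken raises KeyError for model names it does not recognize.
        # Roughly four characters per token is a common back-of-the-envelope estimate.
        return max(1, len(text) // 4)
```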
One thought on token counting: would it make sense to build this directly into the LM abstraction? I imagine that, while there may be some overlap in tokenization from model to model, it may be cleaner to pair the tokenization directly with the provider.
|
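For example, a provider-paired tokenizer could look roughly like this (an illustrative sketch only; these class names and the `count_tokens` hook are hypothetical, not dspy's actual LM API):

```python
import tiktoken

class BaseLM:
    """Hypothetical base class: each provider supplies its own tokenizer."""
    def count_tokens(self, text: str) -> int:
        raise NotImplementedError

class OpenAILM(BaseLM):
    def __init__(self, model: str):
        self.model = model
        self._encoder = tiktoken.encoding_for_model(model)

    def count_tokens(self, text: str) -> int:
        return len(self._encoder.encode(text))
```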
I think this ties into the refactoring work being discussed in #390, and I agree that it would be good to think of a generic solution that will work when integrating other open-source models.
|
Is there a way to do this without adding more special arguments to the existing interfaces? Could it maybe be done using something like:

```python
with dspy.dryrun():
    optimizer.compile(...)
```

where the `dryrun` context manager would put everything inside the block into dry-run mode?
|
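A minimal sketch of what such a context manager might look like, assuming a hypothetical `dry_run` flag in the settings (neither `dspy.dryrun` nor that flag exists today):

```python
import contextlib
import dspy

@contextlib.contextmanager
def dryrun():
    # Assumes settings.configure accepts arbitrary config keys and that
    # downstream code checks a dry_run flag before calling the LM.
    previous = getattr(dspy.settings, "dry_run", False)
    dspy.settings.configure(dry_run=True)
    try:
        yield
    finally:
        dspy.settings.configure(dry_run=previous)
```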
I love the explorations here. Will look more closely tomorrow most likely, BUT: the main challenge on this issue is possibly unaddressed, which is that a lot of the optimizer logic is complex and data-dependent. For example, bootstrap few-shot will stop after it labels enough training examples — it won't try to trace them all unnecessarily, depending on the metric. Unclear how to dry-run that behavior…
|
If the tokens used are non-deterministic based on the optimization, could it provide value if we simply collected a broad sample of the number of optimization calls and used it to estimate a likely range, as opposed to a single estimate?
|
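As a toy illustration of that range-estimate idea (the sampled counts below are made up):

```python
import statistics

# Hypothetical call counts observed across a handful of probe runs.
sampled_call_counts = [112, 97, 134, 120, 101]

mean = statistics.mean(sampled_call_counts)
stdev = statistics.stdev(sampled_call_counts)
print(f"Estimated LM calls: {mean:.0f} ± {2 * stdev:.0f} (rough two-sigma band)")
```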
Great idea @KCaverly — yeah, this also brings up having a “budget”. I imagine saying: “please don’t make more than 10,000 requests and don’t cost me more than $4 on this run”.
To be clear, budgets are outside the scope of dry runs, but they're related.
|
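Sketching what a budget guard might look like if optimizers checked it around LM calls (illustrative only; none of these names exist in dspy):

```python
class BudgetExceeded(RuntimeError):
    pass

class Budget:
    """Tracks request count and spend; raises once either limit is crossed."""

    def __init__(self, max_requests: int, max_cost_usd: float):
        self.max_requests = max_requests
        self.max_cost_usd = max_cost_usd
        self.requests = 0
        self.cost_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        self.requests += 1
        self.cost_usd += cost_usd
        if self.requests > self.max_requests or self.cost_usd > self.max_cost_usd:
            raise BudgetExceeded(f"{self.requests} requests / ${self.cost_usd:.2f} spent")

# e.g. budget = Budget(max_requests=10_000, max_cost_usd=4.0)
```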
This may need to be done on a per-optimizer basis, but it may be good to think of a dev UX that shows an upfront estimate of the number of calls / tokens in an optimization run, and possibly asks for confirmation if `dspy.settings.confirm_first` (this doesn't exist yet) is True, or something to that effect.
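A rough sketch of that confirmation step (hypothetical: `confirm_first` is the flag proposed above and `maybe_confirm` is a made-up helper):

```python
import dspy

def maybe_confirm(estimated_calls: int, estimated_tokens: int) -> bool:
    """Ask the user to confirm an optimization run when confirm_first is enabled."""
    if not getattr(dspy.settings, "confirm_first", False):
        return True
    answer = input(
        f"This run is estimated to make ~{estimated_calls} LM calls "
        f"(~{estimated_tokens} tokens). Continue? (Y/n): "
    )
    return answer.strip().lower() != "n"
```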