Classifier Free Guidance #536
Thank you Martin. It is really a guidance, not a negative prompt. We could call it a positive prompt: you basically tell the model what to answer. This is not what we want, because it is not as useful as a negative prompt, where we can prevent specific answers (for example for ethical filtering). I am not sure if llama.cpp has support for a negative prompt, or if we can turn this positive prompt (guidance) around to become a negative prompt. Maybe you have some ideas.
Added a possible bug issue: ggerganov/llama.cpp#5709
Do you mean the master branch in llama.cpp? If so, that's expected; there's no stability in the llama.cpp API, so tweaks are required on the C# side every time we update the binaries we ship. The branch should work as-is though, without any modifications.
I'm not 100% sure what you mean, but did you try a negative guidance strength? That should work in the "opposite direction", if that's what you're looking for? Edit: Change the
No, I took [martindevans:guidance_in_token_data_array] and there were a lot of issues, like Span instead of ReadOnlySpan, some 'protected tokens' which I have removed, issues in the native API, etc. If you have committed that version here, then I am not sure how it passed all the checks. Maybe I have downloaded the wrong version.
What we have now is a positive guidance (what usually goes into the normal prompt) instead of a negative one. You can see this if you run your example: if you include 'red' in your negative prompt, then 'red' will be in the output. This is not how a negative prompt should work. If you add 'red' to the negative prompt, then we do not expect 'red' in the output.
…sifier free guidance: force-pushed from 2f4a85b to 879c62e
I've rebased this branch onto master now. I made some changes to the custom sampling pipeline (actually inspired by this PR) which broke things.
So rather than trying to steer the model away from a certain direction/topic, are you trying to make sure it doesn't mention that thing at all?

Results

Just demonstrating some more results, for the sake of discussion. Prompts:

Weight: 2

Unguided: blue. Blue is calm, serene and peaceful. It's the colour of the sky and the ocean, two of my favourite things.

The guidance ("hate red") has been negated to steer the model away from that direction (i.e. towards liking red).

Weight: -2

Unguided: brown. It's such a versatile colour that can be used in many different ways. It's warm, inviting and comforting, which makes

Here the guidance weight has been inverted. It now hates red, just like the instructions in the guidance tell it to. Not sure how useful this is, since you could just put that instruction in the normal prompt.
This sounds very promising! I think that you have just inverted the weight as we need it.
I think that you should change your example like this:
What we still need to figure out is how to make sure that some things are just not mentioned. For example, if I do not ever want to see 'red' in the output, how do we do it? In your example 'red' is mentioned in the output even with weight -2.
Banning a specific token can be achieved with a logit bias, for example setting a bias of
I do not see any logit bias inference parameter. Could you please give an example of how to do this? Also, what is 1171? I have tested the code. A few remarks:
It's here. If you wanted to ban the token "red" you would add

This is a much more limited mechanism than CFG; as you say, it can't ban words that aren't individual tokens. It could also cause it to become predisposed to reductively redacting redundant words ;)
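For illustration, here is a minimal sketch of the mechanism in Python, not the LLamaSharp API; the token id 1171 is just a stand-in for whatever id "red" tokenises to in a given model's vocabulary. A bias of negative infinity makes the token impossible to sample:

```python
import math

def apply_logit_bias(logits, bias):
    """Add a per-token bias to the raw logits before sampling.

    A bias of -inf drives the token's probability to zero, banning it outright.
    """
    out = dict(logits)
    for token_id, b in bias.items():
        out[token_id] = out.get(token_id, 0.0) + b
    return out

# Toy logits over a tiny vocabulary; 1171 stands in for "red".
logits = {1171: 3.2, 42: 2.9, 7: 0.1}

# Ban token 1171 outright.
banned = apply_logit_bias(logits, {1171: -math.inf})

# Greedy sampling now picks the next-best token instead.
best = max(banned, key=banned.get)
```

Note this only works per-token, which is exactly the limitation mentioned above: a word that tokenises into several pieces can't be cleanly banned this way.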
Odd, I don't see this. I'm running with CPU inference though, are you running with CUDA? I just checked and I had missed a few places where resources were not being disposed properly in the examples, I just pushed up a commit fixing that. Hopefully that helps 🤞
If you look in the PR which originally implemented CFG ggerganov/llama.cpp#2135 you can see their examples do the same thing. This makes sense given how it works internally. All it's doing is generating 2 sets of token probabilities at once, but the guidance probabilities lower the chance a token is selected. This means you want both sequences talking about roughly the same thing, otherwise the guidance probabilities are unrelated and don't really do much (it'll probably just make already unlikely tokens less likely).
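As a sketch of what that fusion looks like (in Python, following my reading of the llama.cpp implementation rather than the actual C# sampling code here): both sets of logits are log-softmaxed, then the guided logits are extrapolated away from the guidance logits by the weight.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def cfg_combine(guided_logits, guidance_logits, scale):
    """Fuse the two distributions: extrapolate guided away from guidance by `scale`."""
    g = log_softmax(guided_logits)
    h = log_softmax(guidance_logits)
    return [h_i + scale * (g_i - h_i) for g_i, h_i in zip(g, h)]

# Identical sequences -> identical logits -> the guidance has no effect at all.
same = cfg_combine([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], 2.0)

# Token 0 is much likelier under the guidance sequence, so with scale > 1
# the combined logits push it down relative to token 1.
shifted = cfg_combine([0.0, 0.0], [2.0, 0.0], 2.0)
```

This also shows why both sequences need to be "about the same thing": only the tokens where the two distributions differ are pushed around, everything else is left alone.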
I'm not sure exactly what you mean. Do you mean this bit?

```csharp
// Use this token to advance both guided _and_ guidance, keeping them in sync (except for the initial prompt).
guided.Prompt(g);
guidance.Prompt(g);
```

If so, again this is due to the way CFG works. Both the
Thank you Martin. It is clear now what is happening. We basically need to enter the things we do not want to see in the output before the original prompt in the negative guidance (negative prompt). The fusion of the logits will then make sure that those elements have less chance to appear in the output. Yes, I agree, this logit bias removal of tokens does not make sense.
Tested the code again. No crash this time, you have found the problem 👍.
That shouldn't be a huge problem. The guidance is based on the difference in probabilities between guided and guidance. For example, if you prompted the same thing in both sequences it would make no difference at all to the output, because the probabilities would be identical in both sequences! In the example we're using here with favourite colours, both sequences will generate tokens (with high probability) to do with talking about colours in general (small difference, not much effect from guidance). However, the negative one will have a much higher probability for tokens talking about hating red, so that will have a strong effect on the output (pushing it away from that). This is why the negative sequence needs to contain the same prompt: it keeps the continuations as similar as possible, and thus the differences in token weights are relevant.
I think that it would be a good idea to enforce that the original prompt is added to the end of the guidance prompt (negative prompt). We should ask for the guidance prompt and then join them together internally. It makes no sense to accidentally have a different original prompt in the guidance prompt...
Yeah, I agree. This is the "low level" interface (working directly with sampling, tokens, logits etc). There definitely needs to be a much higher level API over this in the future, where you can just supply a positive and a negative prompt and it internally does the right thing (e.g. joining them together, making sure the combo fits the template etc).
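A sketch of what such a higher-level API could do internally (hypothetical names, in Python for brevity): the caller supplies only a prompt and a negative prompt, and the two sequences are built so that the shared prompt is guaranteed to be identical in both.

```python
def build_cfg_prompts(prompt: str, negative: str) -> tuple[str, str]:
    """Build the guided and guidance prompts for CFG.

    The negative text goes *before* a copy of the original prompt, so both
    sequences end with the same prompt and their continuations stay comparable.
    """
    guided = prompt
    guidance = negative.rstrip() + "\n" + prompt
    return guided, guidance

guided, guidance = build_cfg_prompts(
    "What is your favourite colour?",
    "I hate the colour red.",
)
```

This removes the footgun discussed above: the user can no longer accidentally supply a guidance sequence whose trailing prompt differs from the guided one.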
Since the crashing issue has been fixed for you I'll merge this one. Thanks for your help testing it :)
Implemented a demo of classifier free guidance (aka "negative prompts") using the batched executor.
See #535.