Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't allow grammar json array to output unescaped new line in string #5885

Merged
merged 2 commits into from
Mar 5, 2024
Merged

Don't allow grammar json array to output unescaped new line in string #5885

merged 2 commits into from
Mar 5, 2024

Conversation

ExtReMLapin
Copy link
Contributor

@ExtReMLapin ExtReMLapin commented Mar 5, 2024

I had this issue where the generated json was this (with a another grammar file)

{"entities": [

{"type": "Geo-Political Entity", "name": "Paris 2024"},
{"type": "Location", "name": "Paris"}],
"relations": [{"type_from": "G',
eographic Feature", "name_from": "Paris", "type_to": "Event", "name_to": "Paris 2024", "relation_type": "occurs_in"}]}

For the grammar builder I came up with this solution

string ::= "\"" (char | escapeSequence)* "\""
char ::= [^"\\]
escapeSequence ::= "\\\"" | "\\\\" | "\\n" | "\\r" | "\\t" | "\\b" | "\\f" | "\\'"

@ExtReMLapin
Copy link
Contributor Author

Related to IntrinsicLabsAI/gbnfgen#31

@ggerganov ggerganov merged commit b1a4e99 into ggerganov:master Mar 5, 2024
@ExtReMLapin
Copy link
Contributor Author

ExtReMLapin commented Mar 5, 2024

Oopsie woopsie, I did a mistake

--grammar "root ::= [^'\\\n]*"

Won't actually BLOCK \ characters and new lines, it will interpret \\ to block and the letter n (lowercase only)

Example :

./main -m ../models/mistral-7b-instruct-v0.2.Q5_K_M.gguf -n 4096 -p 'which month is after october ?(lowercase answer only)' -ngl 444444 --grammar "root ::= [^'\\\n]*"
Will output

The correct lowercase spelled out word for the desired term is: "the letter after o, followed by two letters" or "the eleVенth mOth". So, the word you are likely after is "November". 

And giving a try with :

"root ::= [^'\n\\]*"

Segfaults (llama_sampling_init: failed to parse grammar)

Now, my question is , from original grammar file: [^"\\] what does this thing even go ? Alright, it blocks double quotes but then how is \\ correctly interpreted if "root ::= [^'\n\\]*" fails

ggerganov added a commit that referenced this pull request Mar 5, 2024
@ggerganov
Copy link
Owner

Hm, not sure - if you find something that works, open a PR. For now I reverted the change

@ExtReMLapin
Copy link
Contributor Author

Thanks for reverting, i'll do more investigations, it could also be caused by my terminal interpreting the anti slashes

@ExtReMLapin
Copy link
Contributor Author

Alright, it seems that my command line interpreter was causing the \ + lower case N interpretation, and even with that, it was not fully fixed because I forgot about carriage return.

I'll take some time to see if there are others nasty characters that can cause troubles with json strings

@ExtReMLapin ExtReMLapin deleted the patch-1 branch March 5, 2024 17:17
hazelnutcloud pushed a commit to hazelnutcloud/llama.cpp that referenced this pull request Mar 10, 2024
…nov#5885)

* Don't allow grammar json array to output unescaped new line in string

* Don't allow new line in json object string
hazelnutcloud pushed a commit to hazelnutcloud/llama.cpp that referenced this pull request Mar 10, 2024
jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024
…nov#5885)

* Don't allow grammar json array to output unescaped new line in string

* Don't allow new line in json object string
jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024
…nov#5885)

* Don't allow grammar json array to output unescaped new line in string

* Don't allow new line in json object string
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants