Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Call escape_raw_string to make parsing more predictable #34

Merged
merged 1 commit into from
Feb 10, 2024

Conversation

vtjnash
Copy link
Member

@vtjnash vtjnash commented Feb 5, 2024

The advantage here is that you don't escape it twice, so that styled"\\" doesn't need double the number of \\ each time to get one more \ at the end (the downside is that less common forms such as @styled_str "\\" still apply the unescape_string call twice and we cannot catch that)

julia> styled""
""

julia> styled"\\"
"\\"

julia> styled"\\\\"
"\\"

julia> styled"\\\\\\"
"\\\\"

julia> styled"\\\\\\\\"
"\\\\"

@tecosaur
Copy link
Collaborator

tecosaur commented Feb 6, 2024

Thanks! I wonder, why isn't this fixed by just adding \ to the tuple of chars supplied to unescape_string? If I make that change locally I see:

julia> styled"\\"
"\\"

julia> styled"\\\\"
"\\\\"

julia> styled"\\\\\\"
"\\\\\\"

julia> styled"\\\\\\\\"
"\\\\\\\\"

julia> styled"\a\b\e\u200b"
"\a\b\e\u200b"

@vtjnash
Copy link
Member Author

vtjnash commented Feb 6, 2024

That applies it always, rather than just at the end of a string

@tecosaur
Copy link
Collaborator

tecosaur commented Feb 6, 2024

Ah, so the bug you're mentioning in the original commit only affects a terminal /s?

Also, perhaps I'm missing the obvious - would you mind providing an example of how adding / to the tuple of unescape chars isn't a good enough fix?

These are the changes I'm considering:

diff --git a/src/stylemacro.jl b/src/stylemacro.jl
index b030569..9abccb4 100644
--- a/src/stylemacro.jl
+++ b/src/stylemacro.jl
@@ -107,7 +107,7 @@ macro styled_str(raw_content::String)
     end =#
 
     # Instead we'll just use a `NamedTuple`
-    state = let content = unescape_string(raw_content, ('{', '}', '$', '\n', '\r'))
+    state = let content = unescape_string(raw_content, ('\\', '{', '}', '$', '\n', '\r'))
          (; content, bytes = Vector{UInt8}(content),
          s = Iterators.Stateful(pairs(content)),
          parts = Any[],
@@ -635,10 +635,10 @@ macro styled_str(raw_content::String)
     function run_state_machine!(state)
         # Run the state machine
         for (i, char) in state.s
-            if char == '\\'
-                state.escape[] = true
-            elseif state.escape[]
+            if state.escape[]
                 escaped!(state, i, char)
+            elseif char == '\\'
+                state.escape[] = true
             elseif char == '$'
                 interpolated!(state, i, char)
             elseif char == '{'

@vtjnash
Copy link
Member Author

vtjnash commented Feb 6, 2024

These should return the same styled string

julia> styled"\\\\\\\\" * "a"
"\\\\a"

julia> styled"\\\\\\\\a"
"\\\\\\\\a"

@vtjnash
Copy link
Member Author

vtjnash commented Feb 6, 2024

You can attempt to combine escape_string and unescape_raw_string, but it can make the implementation much more confusing and likely to have bugs since they cannot be tested separately. It doesn't seem like that change you are considering implements any part of unescape_raw_string, but rather just reimplements the unescape_string's handling for \\ needlessly

@tecosaur
Copy link
Collaborator

tecosaur commented Feb 7, 2024

Right. Thanks for clarifying. One more question I'd like to ask is regarding this code comment in the PR

assuming the input was entered with single styled"" markers so this transform is unambiguously reversible and not as @styled_str "." or styled"""."""

I'm not too concerned about the @styled_str "..." case, but what's the potential transformation ambiguity with a triple-quote string macro styled"""..."""?

p.s. if there are any relevant tests you think it would be good to have, more tests would be nice, though not required.

@vtjnash
Copy link
Member Author

vtjnash commented Feb 7, 2024

IIRC, styled"""...""" has an ambiguity with styled"""... \\\" ...""" vs. styled"""... \\" ...""" since the final \ in that set is optional. But, actually, I think unescape_string would give the same result for either?

@vtjnash
Copy link
Member Author

vtjnash commented Feb 10, 2024

Bump?

@tecosaur
Copy link
Collaborator

Ah yes. Mind if I force push to your branch with some minor changes (moving a comment, adding some a test or two)?

@vtjnash
Copy link
Member Author

vtjnash commented Feb 10, 2024

Go for it

Due to the initial parsing of the string before the macro sees it,
followed by the use of `unescape_string`, you currently need to double
up trailing backslashes. We can fix this by calling escape_raw_string in
advance.

This change may cause undesirable side-effects with less comply used
macro forms (e.g. `@styled_str "\\"`), but we'll deal with that if and
when it comes up.
@tecosaur tecosaur force-pushed the jn/escape_raw_string branch from 9ca429e to 0625bce Compare February 10, 2024 15:52
@tecosaur
Copy link
Collaborator

Done. The tests I just added failed before the change, but now pass.

I also expanded on the commit message a bit.

@tecosaur tecosaur merged commit 0625bce into main Feb 10, 2024
6 checks passed
@tecosaur tecosaur deleted the jn/escape_raw_string branch February 10, 2024 15:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants