Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unescape printable characters #3140

Merged
merged 9 commits into from
Sep 15, 2022
Merged

unescape printable characters #3140

merged 9 commits into from
Sep 15, 2022

Conversation

kokobd
Copy link
Collaborator

@kokobd kokobd commented Sep 5, 2022

This should fix #3115. It works. I will add tests later if the idea is okay.
Maybe a more proper fix is to modify the Outputable instance for IfaceTyLit in GHC?

Demo:
image

@kokobd kokobd marked this pull request as ready for review September 5, 2022 10:58
@kokobd kokobd requested a review from michaelpj September 5, 2022 10:58
@michaelpj
Copy link
Collaborator

Is there not a more appropriate printing method from GHC we can use? Or are we just only given ones that escape?

@kokobd
Copy link
Collaborator Author

kokobd commented Sep 5, 2022

Is there not a more appropriate printing method from GHC we can use?

I'm not aware of one. Note that SDoc already contains the escaped string, so SDocContext and OutputableP won't help.

@michaelpj
Copy link
Collaborator

Then I think the approach seems reasonable so long as we document it.

@kokobd kokobd requested review from drsooch and fendor September 9, 2022 15:10
@kokobd
Copy link
Collaborator Author

kokobd commented Sep 9, 2022

Then I think the approach seems reasonable so long as we document it.

Done.

Copy link
Collaborator

@michaelpj michaelpj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly just some nits.

@@ -288,9 +289,13 @@ instance Outputable SDoc where

-- | Print a GHC value in `defaultUserStyle` without unique symbols.
--
-- This is the most common print utility, will print with a user-friendly style like: `a_a4ME` as `a`.
-- This is the most common print utility, and it will print with a user-friendly style like: `a_a4ME` as `a`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe also include the fact that it unescapes characters? That's quite an important difference from other outputable functions. In fact... maybe we should use hlint to ban use of other outputable functions apart from this one?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe also include the fact that it unescapes characters?

Done.

maybe we should use hlint to ban use of other outputable functions apart from this one?

TBH I don't know if this is configurable. I think the scheme is to use printOutputable by default, and also allow other print functions to be used when needed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, we can use hlint to ban functions, so we could do it. I don't know if it's a good idea, though.

escapedTextParser = do
xs <- P.many (P.try stringLiteral)
x <- P.manyTill P.anySingle P.eof -- consume characters after the final double quote
pure $ concat xs ++ x
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels a little weird that you handle the content before the opening quote in stringLiteral but the content after the closing quote outside it. Maybe make that consistent?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rewrote stringLiteral to not include any content outside of the double quotes. I think that's more reasonable.

stringLiteral :: TextParser String
stringLiteral = do
before <- P.manyTill P.anySingle (P.char '"') -- include any character before the first double quote
inside <- P.manyTill P.charLiteral (P.char '"')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about "hell\"ooo"? i.e. escaped quotes probably shouldn't stop us

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, from the tests it looks like you made this work but I'm not sure why it works...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P.charLiteral calls lookAhead internally. And this is an example from the haddock of charLiteral:

stringLiteral = char '"' >> manyTill L.charLiteral (char '"')

[ unescapeTest
]

unescapeTest :: TestTree
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

@michaelpj michaelpj added the merge me Label to trigger pull request merge label Sep 15, 2022
@mergify mergify bot merged commit 2bd1863 into master Sep 15, 2022
@kokobd kokobd deleted the kokobd/3115-fix-unicode branch September 16, 2022 07:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merge me Label to trigger pull request merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unreadable unicode in type-level literals
2 participants