# Decouple error reporting from Stream #388
---

It is like this because I wanted to do the calculation of the source position and the fetching of the line to display in one pass. I can imagine why you would want this, though. Can you describe what exactly seems to be problematic about implementing it?

Also, here is an example (granted, it's a very simple one): https://markkarpov.com/tutorial/megaparsec.html#working-with-custom-input-streams
---

I'm working on a parser that consumes custom token streams. But the token streams I'm working with also happen to be equipped with source positions:

```haskell
-- | Run a lexer on a string and produce a lazy stream of tokens
runLexer
  :: forall tok.
     Lexer tok -- ^ lexer specification
  -> String    -- ^ source file name (used in locations)
  -> String    -- ^ source text
  -> TokenStream (L tok)
```

(here […]) In my case, I don't really need […]

edit: […]
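For concreteness, a token stream like the one in the signature above might be a lazy list of tokens annotated with positions. Here is a minimal sketch of such types — the shapes of `Pos`, `L`, and `TokenStream` are my guesses for illustration, not code from the actual project:

```haskell
{-# LANGUAGE DeriveFunctor #-}

-- Hypothetical sketch of a "token stream equipped with source positions".
-- A source position: 1-based line and column.
data Pos = Pos { posLine :: Int, posCol :: Int }
  deriving (Eq, Ord, Show)

-- A token annotated with its start and end positions.
data L tok = L { lStart :: Pos, lEnd :: Pos, lToken :: tok }
  deriving (Eq, Show, Functor)

-- A lazy stream of tokens, ending either cleanly or with a lexical
-- error at some position.
data TokenStream tok
  = TsToken tok (TokenStream tok)
  | TsEof
  | TsError Pos
  deriving (Eq, Show)

-- Collect the tokens of a stream, ignoring how it ended.
streamToList :: TokenStream tok -> [tok]
streamToList (TsToken t rest) = t : streamToList rest
streamToList _                = []
```

The point of the comment above is that positions already travel with every token here, so recomputing them from raw input inside `Stream` is redundant work.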
---

I'm having the same issue. I've written a lexer with a lot of error recovery, and while the output from that stage is usually not too different from the input, there are still many long, complex sequences compressed to a single token, a few tokens inserted from nothing, and a lot of line breaks removed. With all that, even if "line" still had any meaning, I'd really rather just use the stage-1 tokens themselves in warning messages for the next parser pass (which populates a data model), even if it means writing the printer myself.

All of that is underscored even more strongly by the fact that I'm (ab)using the parsers in a way that means they can't fail at the outermost level -- I'm basically required to write a lot of code and carry a lot of data around in order to generate strings I'll never use. I get its utility in the general case, but here I wind up fighting against something that's not even part of the core purpose of Megaparsec (i.e. parser combinators).
---

Confession: I just read the lines I need to display directly from the file.
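That workaround can be sketched as a tiny pure helper — assuming the full source text is at hand as a `String`; the name `fetchLine` is made up for illustration:

```haskell
-- Sketch of "just read the line yourself": given the full source text and
-- a 1-based line number, fetch that line for display in an error message,
-- sidestepping the Stream class's line-fetching machinery entirely.
fetchLine :: String -> Int -> Maybe String
fetchLine src n
  | n < 1     = Nothing
  | otherwise = case drop (n - 1) (lines src) of
      l : _ -> Just l
      []    -> Nothing
```

Since lexer tokens typically already record their line numbers, this is often all the "pretty error context" one actually needs.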
---

In fall 2019 I tried to overhaul the […]

---

@hesiod Where can I see your changes?
---

FWIW, I'm also facing the same challenge. I'm trying to port my lexer and my parser over from Parsec to Megaparsec. The lexer port was easy; the parser of the lexed tokens, not so much. In Parsec I simply had to define one combinator that would extract a token from the stream, plus a location/text equivalent, so Parsec was able to show in error messages what went wrong and where:

```haskell
-- | Consume the given predicate from the token stream.
consumeToken :: (Token -> Maybe a) -> TokenParser (a, Location)
consumeToken f = do
  u <- getState
  tokenPrim
    tokenString
    tokenPosition
    (\(tok, loc) ->
       if locationStartColumn loc > u
         then fmap (, loc) (f tok)
         else Nothing)

-- | Make a string out of the token, for error message purposes.
tokenString :: (Token, Location) -> [Char]
tokenString = tokenStr . fst

-- | Update the position by the token.
tokenPosition :: SourcePos -> (Token, Location) -> t -> SourcePos
tokenPosition pos (_, l) _ =
  setSourceColumn (setSourceLine pos line) col
  where
    (line, col) = (locationStartLine l, locationStartColumn l)
```

As for megaparsec, I attempted to implement a `Stream` instance:

```haskell
instance Ord a => Mega.Stream (Seq (Located a)) where
  type Token (Seq (Located a)) = Located a
  type Tokens (Seq (Located a)) = Seq (Located a)
  tokenToChunk Proxy = pure
  tokensToChunk Proxy = Seq.fromList
  chunkToTokens Proxy = toList
  chunkLength Proxy = length
  chunkEmpty Proxy = null
  positionAt1 Proxy _ (Located start _ _) = start
  positionAtN Proxy pos Seq.Empty = pos
  positionAtN Proxy _ (Located start _ _ :<| _) = start
  advance1 Proxy _ _ (Located _ end _) = end
  advanceN Proxy _ pos Seq.Empty = pos
  advanceN Proxy _ _ ts =
    let Located _ end _ = last (toList ts) in end
  take1_ Seq.Empty = Nothing
  take1_ (t :<| ts) = Just (t, ts)
  takeN_ n s
    | n <= 0 = Just (mempty, s)
    | null s = Nothing
    | otherwise = Just (Seq.splitAt n s)
  takeWhile_ = Seq.spanl

instance Mega.ShowToken (Located Token) where
  showTokens = unwords . map (showToken . locatedThing) . toList
```

Whereas today, the […]
At this point I figured I would come to the issue tracker and see if there was any plan to make this easier. For now I'll stick with megaparsec-6.4.1, because I already have working parser infrastructure with that and I've used up my budgeted time on this instead of writing out the parser for my language. I don't have any speed requirement; I'm just interested in megaparsec for its ability to use custom data structures for error messages, so I might not be in your target userbase.
---

I just attempted to port https://github.com/unisonweb/unison's parser to the latest version of megaparsec, and had to back out after 90 minutes. I had the same issue as others in this thread: it's really hard to implement […]

First I thrashed around a bit, trying to figure out how things are supposed to work from reading the haddocks and the source code. Then I read a couple of threads on this issue tracker and found an example in the tutorial which shows how to do this by carrying around the original non-lexed source, but ultimately calculating the right offsets and such was just too confusing for me to push through tonight. Unison has some lexemes that don't take any space, like virtual semicolons, which may complicate things (but maybe not).

Anyway, my task is to bump Unison to GHC 8.8, and it's a […]
---

I've attempted this with #415. Let me know what you think!

---

Please also see my comment #415 (comment).

---

Resolved in #415.
---

Many of the methods of `Stream`, like `reachOffset` and `showTokens`, are related to printing pretty error messages. But it's a pain to implement these methods for custom token streams. Perhaps we should make error reporting a typeclass of its own, so that people who have their own way of reporting errors don't have to implement those methods.
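For illustration, one possible shape of such a split — this is purely a sketch with simplified, hypothetical signatures, not megaparsec's actual API (the thread above notes the real change landed in #415):

```haskell
{-# LANGUAGE TypeFamilies #-}
import Data.List.NonEmpty (NonEmpty)
import Data.Proxy (Proxy (..))

-- Hypothetical split: Stream keeps only the operations that parsing
-- itself needs, so a custom token stream is a few lines of code...
class Stream s where
  type Token s
  type Tokens s
  take1_ :: s -> Maybe (Token s, s)
  takeN_ :: Int -> s -> Maybe (Tokens s, s)
  takeWhile_ :: (Token s -> Bool) -> s -> (Tokens s, s)

-- ...while everything related to rendering pretty errors lives in a
-- separate class, which users with their own reporting can simply skip.
class Stream s => PrettyStream s where
  showTokens :: Proxy s -> NonEmpty (Token s) -> String
  reachOffsetLine :: Int -> s -> (String, s)

-- The pure half is then trivial for, say, a plain list of tokens:
instance Stream [t] where
  type Token [t] = t
  type Tokens [t] = [t]
  take1_ []       = Nothing
  take1_ (t : ts) = Just (t, ts)
  takeN_ n s
    | n <= 0    = Just ([], s)
    | null s    = Nothing
    | otherwise = Just (splitAt n s)
  takeWhile_ = span
```

The design point is that only parsers which actually render textual error context would need the second class as a constraint.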