takeTill acting weird #80

Closed
banacorn opened this issue Sep 27, 2014 · 7 comments

Comments

@banacorn

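{-# LANGUAGE OverloadedStrings #-}
import Data.Attoparsec.Text (Parser, parseTest, takeTill)
import Data.Text (Text)

-- Consume input up to (but not including) the first 'a'.
parser :: Parser Text
parser = takeTill (== 'a')

main :: IO ()
main = parseTest parser "𝟘a"  -- parseTest prints the Result itself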

The code should print Done "a" "\120792" (unconsumed input "a", result "𝟘"), a clean cut.
Instead I get Done "\57304a" "\120792": the unconsumed input starts with a stray '\57304'.

takeWhile with the negated predicate, takeWhile (/= 'a'), shows the same issue.
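
The reported split can be reproduced without attoparsec in a small, hypothetical model: the input is a list of UTF-16 code units (text-1.x stores Text as UTF-16) and the parser position is counted in code units. The "buggy" variant advances the position by the number of characters consumed instead of the number of code units. This is an illustrative assumption, not attoparsec's actual code, and every name here (encode, decodeHead, scan, takeWhileBuggy, takeTillFixed, ...) is made up for the sketch.

```haskell
import Data.Bits (shiftL, shiftR, (.&.))
import Data.Char (chr, ord)
import Data.Word (Word16)

-- Hypothetical model: the input is a list of UTF-16 code units
-- and the parser position is measured in code units.
type Buffer = [Word16]

-- Encode a String as UTF-16; characters above U+FFFF become surrogate pairs.
encode :: String -> Buffer
encode = concatMap enc
  where
    enc c
      | n < 0x10000 = [fromIntegral n]
      | otherwise   = [ fromIntegral (0xD800 + (m `shiftR` 10))
                      , fromIntegral (0xDC00 + (m .&. 0x3FF)) ]
      where
        n = ord c
        m = n - 0x10000

-- Decode the character at the front of the buffer, with its width in code units.
-- An unpaired surrogate is returned as a Char of its own, width 1.
decodeHead :: Buffer -> Maybe (Char, Int)
decodeHead (hi : lo : _)
  | hi >= 0xD800, hi < 0xDC00, lo >= 0xDC00, lo < 0xE000 =
      Just ( chr (0x10000 + ((fromIntegral hi - 0xD800) `shiftL` 10)
                          + (fromIntegral lo - 0xDC00))
           , 2 )
decodeHead (u : _) = Just (chr (fromIntegral u), 1)
decodeHead []      = Nothing

-- Decode a whole buffer (used to inspect the leftover input).
decode :: Buffer -> String
decode buf = case decodeHead buf of
  Just (c, w) -> c : decode (drop w buf)
  Nothing     -> []

-- Collect characters while p holds, plus how many code units they occupy.
scan :: (Char -> Bool) -> Buffer -> (String, Int)
scan p buf = case decodeHead buf of
  Just (c, w) | p c -> let (cs, n) = scan p (drop w buf) in (c : cs, w + n)
  _                 -> ([], 0)

-- Buggy: advances the position by one code unit per character consumed.
takeWhileBuggy :: (Char -> Bool) -> Buffer -> (String, Buffer)
takeWhileBuggy p buf = (cs, drop (length cs) buf)
  where (cs, _) = scan p buf

-- Fixed: advances the position by the number of code units consumed.
takeWhileFixed :: (Char -> Bool) -> Buffer -> (String, Buffer)
takeWhileFixed p buf = (cs, drop units buf)
  where (cs, units) = scan p buf

-- In this model, takeTill p is takeWhile (not . p).
takeTillBuggy, takeTillFixed :: (Char -> Bool) -> Buffer -> (String, Buffer)
takeTillBuggy p = takeWhileBuggy (not . p)
takeTillFixed p = takeWhileFixed (not . p)
```

In this model, fmap decode (takeTillBuggy (== 'a') (encode "𝟘a")) evaluates to ("\120792","\57304a"), the same result/leftover split as the reported Done "\57304a" "\120792", while takeTillFixed gives ("\120792","a"). It only shows that counting characters where code units are expected is enough to produce the symptom, not how attoparsec is implemented internally.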

The issue can be reproduced with this gist
I'm using attoparsec-0.12.1.2 with text-1.2.0.0

Thanks!

@SeanRBurton
Contributor

Note that '\57304' is the second (low) half of the surrogate pair for '𝟘', which suggests that this bug is caused by advancing by 16 bits irrespective of the width of any particular character. I can reproduce this bug (and similar bugs in scan, peekChar, takeText, and takeLazyText) using any character that needs 32 bits, i.e. two UTF-16 code units (ord c >= 2^16).
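
To double-check the surrogate-pair reading: in UTF-16, a code point c >= 0x10000 is stored as a high surrogate 0xD800 + ((c - 0x10000) >> 10) followed by a low surrogate 0xDC00 + ((c - 0x10000) & 0x3FF). The sketch below applies that standard formula; surrogatePair is an illustrative name, not a function from text or attoparsec.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)

-- Split a code point above U+FFFF into its UTF-16 surrogate pair
-- (high/lead unit, low/trail unit). BMP characters need only one unit.
surrogatePair :: Char -> Maybe (Int, Int)
surrogatePair c
  | n < 0x10000 = Nothing  -- fits in a single 16-bit code unit
  | otherwise   = Just (0xD800 + (m `shiftR` 10), 0xDC00 + (m .&. 0x3FF))
  where
    n = ord c
    m = n - 0x10000
```

For '𝟘' (U+1D7D8, i.e. '\120792') this yields Just (55349, 57304), so the stray '\57304' is exactly the trailing half of the pair. The '💋' (U+1F48B) from the later takeText report gives Just (55357, 56459).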

@basvandijk
Member

...this bug is caused by advancing by 16 bits irrespective of the width of any particular character.

It looks like you're correct.

@bos
Collaborator

bos commented Dec 5, 2014

Thanks for the helpful repro. I'll take a look at this as soon as I can.

This was referenced Dec 7, 2014
@hesselink

Any chance of a release with this fix? I just ran into this with an even simpler reproduction: takeText "💋".

@banacorn
Author

ha, is that a pair of lips?

@bos
Collaborator

bos commented Feb 18, 2015

Released as 0.12.1.3.

bos closed this as completed Feb 18, 2015
@hesselink

Thanks!

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Dec 31, 2019
0.13.2.1
* Improved performance of Data.Attoparsec.Text.asciiCI

0.13.2.0
* pure is now strict in Position

0.13.1.0
* runScanner now correctly returns the final state
  (haskell/attoparsec#105).
* Parser, ZeptoT, Buffer, and More now expose Semigroup instances.
* Parser and ZeptoT now expose MonadFail instances.

0.13.0.2
* Restore the fast specialised character set implementation for Text
* Move testsuite from test-framework to tasty
* Performance optimization of takeWhile and takeWhile1

0.13.0.1
* Fixed a bug in the implementations of inClass and notInClass for
  Text (haskell/attoparsec#103)

0.13.0.0
* Made the parser type in the Zepto module a monad transformer (needed
  by aeson's string unescaping parser).

0.12.1.6
* Fixed a case folding bug in the ByteString version of stringCI.

0.12.1.5
* Fixed an indexing bug in the new Text implementation of string,
  reported by Michel Boucey.

0.12.1.4
* Fixed a case where the string parser would consume an unnecessary
  amount of input before failing a match, when it could bail much
  earlier (haskell/attoparsec#97)
* Added more context to error messages
  (haskell/attoparsec#79)

0.12.1.3
* Fixed incorrect tracking of Text lengths
  (haskell/attoparsec#80)
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Jan 14, 2020