takeTill acting weird #80

banacorn · 2014-09-27T16:26:16Z

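{-# LANGUAGE OverloadedStrings #-}
import Data.Attoparsec.Text (Parser, parseTest, takeTill)
import Data.Text (Text)

parser :: Parser Text
parser = takeTill (== 'a')

main :: IO ()
main = parseTest parser "𝟘a"  -- parseTest already prints the result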

The code should result in Done "a" "\120792", a clean cut.
But I get Done "\57304a" "\120792" instead.

With the predicate negated, takeWhile also presents the same issue.

The issue can be reproduced with this gist
I'm using attoparsec-0.12.1.2 with text-1.2.0.0

Thanks!

SeanRBurton · 2014-12-02T19:58:58Z

Note that '\57304' is the second element of the surrogate pair of '𝟘', which suggests that this bug is caused by advancing by 16 bits irrespective of the width of any particular character. I can reproduce this bug (and similar bugs in scan, peekChar, takeText, and takeLazyText) using any character which requires 32 bits to represent in UTF-16 (i.e. ord c >= 2^16).
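To make the surrogate-pair observation concrete, here is a minimal sketch that uses only base. It is a simplified model and not attoparsec's or text's actual code; the names utf16Units, showUnits, remainderFixedWidth, and remainderWidthAware are made up for illustration. It assumes the UTF-16 representation that text-1.2.x uses internally, and it models the suspected bug as "advance one 16-bit unit per consumed character".

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (chr, ord)
import Data.Word (Word16)

-- Encode one character as UTF-16 code units: a single unit for
-- code points below 2^16, a surrogate pair otherwise.
utf16Units :: Char -> [Word16]
utf16Units c
  | n < 0x10000 = [fromIntegral n]
  | otherwise   = [hi, lo]
  where
    n  = ord c
    m  = n - 0x10000
    hi = fromIntegral (0xD800 + (m `shiftR` 10))
    lo = fromIntegral (0xDC00 + (m .&. 0x3FF))

-- Render raw code units as characters WITHOUT recombining surrogates,
-- so a stray low surrogate shows up as its own character.
showUnits :: [Word16] -> String
showUnits = map (chr . fromIntegral)

-- The input from the report: '𝟘' (U+1D7D8, decimal 120792) followed by 'a'.
input :: [Word16]
input = concatMap utf16Units "\120792a"

-- Hypothetical buggy position tracking: one unit per consumed character.
remainderFixedWidth :: String
remainderFixedWidth = showUnits (drop 1 input)

-- Width-aware tracking: skip as many units as the character occupies.
remainderWidthAware :: String
remainderWidthAware = showUnits (drop (length (utf16Units '\120792')) input)

main :: IO ()
main = do
  print (utf16Units '\120792')  -- [55349,57304]
  print remainderFixedWidth     -- "\57304a"
  print remainderWidthAware     -- "a"
```

The fixed-width remainder matches the "\57304a" from the original report, and the width-aware one gives the expected "a".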

basvandijk · 2014-12-05T11:30:06Z

...this bug is caused by advancing by 16 bits irrespective of the width of any particular character.

It looks like you're correct.

bos · 2014-12-05T19:59:25Z

Thanks for the helpful repro. I'll take a look at this as soon as I can.

hesselink · 2015-02-13T10:19:35Z

Any chance of a release with this fix? I just ran into this with an even simpler reproduction: takeText "💋".

banacorn · 2015-02-13T17:58:18Z

ha, is that a pair of lips?

bos · 2015-02-18T06:15:22Z

Released as 0.12.1.3.

hesselink · 2015-02-18T08:46:42Z

Thanks!

0.13.2.1

* Improved performance of Data.Attoparsec.Text.asciiCI

0.13.2.0

* pure is now strict in Position

0.13.1.0

* runScanner now correctly returns the final state (haskell/attoparsec#105).
* Parser, ZeptoT, Buffer, and More now expose Semigroup instances.
* Parser, and ZeptoT now expose MonadFail instances.

0.13.0.2

* Restore the fast specialised character set implementation for Text
* Move testsuite from test-framework to tasty
* Performance optimization of takeWhile and takeWhile1

0.13.0.1

* Fixed a bug in the implementations of inClass and notInClass for Text (haskell/attoparsec#103)

0.13.0.0

* Made the parser type in the Zepto module a monad transformer (needed by aeson's string unescaping parser).

0.12.1.6

* Fixed a case folding bug in the ByteString version of stringCI.

0.12.1.5

* Fixed an indexing bug in the new Text implementation of string, reported by Michel Boucey.

0.12.1.4

* Fixed a case where the string parser would consume an unnecessary amount of input before failing a match, when it could bail much earlier (haskell/attoparsec#97)
* Added more context to error messages (haskell/attoparsec#79)

0.12.1.3

* Fixed incorrect tracking of Text lengths (haskell/attoparsec#80)

This was referenced Dec 7, 2014

Improve QC instances #86

Merged

Fix Issue #80 #87

Merged

bos closed this as completed Feb 18, 2015
