-
Notifications
You must be signed in to change notification settings - Fork 191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Nested quotes inside of the multi-line strings bugs #230
Comments
This is important. I hit the same exact. Will really appreciate it if this is fixed. |
Sorry to be like this, but i've stepped thru the code in the debugger w/ step debugging, and the whole thing needs a rewrite. the parsing algorithm is fundamentally flawed and it needs an overhaul, if not a parser-generator. Pretty much every line i've stepped over has something .. not-best-practicey about it. I could elaborate 🤪 Specifically the part about how multiline strings are parsed is like, a mostly nonworking hack. Every couple of lines I step over I can come up with another edge case for a multi-line string that trips up the parser. like, parsing a multi-line literal string ( I'm exploring various options now, including trying to get C-bindings to the (unofficially authoritative) Go library .. that could be called from python. I'd be curious to hear others' thoughts of course! Also i find the lack of unit tests troubling. Yes there's the official test suite, but that doesn't cover all these edge cases that people are running in to. When you fix it for these edge cases, there should always be a new test case covering this fix. There isn't |
@hipe thank you for taking a look! Out of curiosity, is go parser more authoritative than the rust one? |
@vors thanks for your response! Offhand i don't know but i (i as in me but also i as in whoever) would certainly look at the rust one too when shopping around for possibilities for a python-go or python-rust interoperability for a possible new python package to work on. Also i would give the rust [crate ?] a close look too when I'm looking for test cases to put in a test suite with better coverage to compliment the "official" (but apparently anemic) test suite. My workaround for my current project .. (i have a bug where, with this ☝️ python implementation of toml, multiline strings that have URL's with html anchors in them '#' they GET REMOVED LIKE THEY'RE COMMENTS LOL 😭 (but only if the number of single quotes that appear before that '#' IS ODD 💀)) ..my workaround for my current projects might be to migrate from toml to this new "eno" https://news.ycombinator.com/item?id=17765426 because it sounds like a better fit for my use case anyway. But there is still this place in my heart for toml and also ☝️ this toml library should "just work" because of how supposedly simple it is and because this module ships in the python standard distribution, right? more later hopefully.. |
I don't think so - I got it from pip. |
Hi Sergei -- thank you for reporting this issue! Unfortunately, I have not yet had the chance to look in depth at fixing it. I am going to add a bit of context for Chip as to why the string parsing in particular is so fucked up and the sorry state of this repository. @hipe This codebase is almost as old as the TOML standard itself. I originally wrote it over 6 years ago after seeing the spec posted on Hacker News. The format at that time was much, much simpler. For example, there was one type of string (not 4!) and there were no inline tables to create mutual nesting of arrays and tables. Since I was so quick to implement it, I was able to snag the The multiline string and literal string stuff was added in v0.3.0, around a year and a half after I had implemented the original parser. So the code suffers from the standard problem of scope creep plus a lack of foresight as to how the spec would expand in the future when the codebase was first written. Beyond the scope creep issue, I have been the primary maintainer of the codebase since I wrote it. This project has never been my day job. I've had less and less spare time to devote to it as I've gone from high school senior to undergrad student to software professional. The primary mode of development is drive-by patches, whether written by someone else or by myself. I have been meaning to go back and add test cases to the test suite to cover all the past issues that I've patched, but it is difficult to justify prioritizing that over fixing new and existing issues as they are reported. And by "meaning to", I mean this is something that has been in the back of my mind as a TODO for months (years?) at this point. If you want to take a stab at it, by all means, please do. Patches welcome 😃 That applies to both this repo and the test suite repo. |
Thank you @uiri! I feel your pain 🙏 and I hope I can help out eventually 🤗 i just re-discovered that Pipfiles are in toml, and so we have: https://github.com/pypa/pip/blob/master/src/pip/_vendor/pytoml/parser.py (and pip certainly isn't going anywhere!) and so it would be a fun compare-and-contrast to see if one can hackishly reach that module and see how it does against our imaginary test suite i keep talking about. but when you look for a "pytoml" library proper, it says it's deprecated and points us back to the topic project https://github.com/avakar/pytoml more later! |
This is somewhat similar to #123 .
Tested on 0.10.0
I found 2 issues with quotes inside a multiline file.
Repro 1
That should not be empty.
Note that
'bar' = 'baz'
is significant - if you drop it, then everything worksRepro 2
Also dropping the
yield
part leads to exception, but I don't think that the quotes are unbalanced thereThe text was updated successfully, but these errors were encountered: