Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #100: track position while parsing for better errors #128

Merged
merged 2 commits into from
Jan 27, 2018

Conversation

pearcedavis
Copy link
Contributor

I made the TomlDecodeError more like the JSONDecodeError from the stdlib json module, allowing position information to be surfaced in errors.

This partially addresses #100, as line numbers are reported accurately in errors. However, column numbers are harder to track in the current code; as a result, errors will often say "column 1", when the issue occurred elsewhere in the line. Also, errors in multiline arrays report the opening line, because they are currently in-lined before parsing.

@uiri
Copy link
Owner

uiri commented Oct 2, 2017

Hi pdedmon,

Thank you for your pull request!

The Travis failure is due to some lines exceeding 80 characters. Not a huge deal. I have a pre-commit hook I use to warn me about these things but I am not sure what is the best way to share it in the repository.

My initial thoughts with regards to the issues tagged user customization were to allow them to be fixed by refactoring the code into a Decoder and Encoder model similar to that used by the python standard library for JSON.

I will take a closer look at this PR this week to see how I feel about your proposed solution. It may be easier to fix #100 and #77 together since they both will rely on the unmangled toml source.

@pearcedavis pearcedavis force-pushed the decode-error-position branch from 43a6b7a to 3bf102a Compare October 2, 2017 04:53
@pearcedavis
Copy link
Contributor Author

No worries! I just noticed the .flake8 file in the repo - I'll make sure to run flake8 before committing next time.

+1 for the Encoder/Decoder model!

@gaborbernat
Copy link

@pdedmon any updates on this?

@pearcedavis
Copy link
Contributor Author

@gaborbernat I'd forgotten about this actually, but I might look into the conflicts and build failures later.

- errors now indicate the position in which they occurred
- the way the parser currently works makes columns hard to track, so
errors will often say "column 1", when they occur elsewhere in the line
- errors in multiline arrays are reported to be on the opening line, due
to the current preprocessing of the input
@pearcedavis pearcedavis force-pushed the decode-error-position branch from 3bf102a to 3ecf210 Compare January 14, 2018 20:53
@pearcedavis
Copy link
Contributor Author

I rebased and did a quick manual test (on top of the toml-test and tox tests), and it seems to be working as before.

@codecov-io
Copy link

codecov-io commented Jan 14, 2018

Codecov Report

Merging #128 into master will increase coverage by 0.26%.
The diff coverage is 47.72%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #128      +/-   ##
==========================================
+ Coverage   63.85%   64.12%   +0.26%     
==========================================
  Files           4        4              
  Lines         711      733      +22     
==========================================
+ Hits          454      470      +16     
- Misses        257      263       +6
Impacted Files Coverage Δ
toml/decoder.py 67.17% <47.72%> (+0.21%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 36ca44c...b6249cb. Read the comment docs.

@gaborbernat
Copy link

@uiri does this look good?

toml/decoder.py Outdated
except IndexError:
raise TomlDecodeError("Invalid escape sequence")
raise ValueError()
except (IndexError, ValueError):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should still be an IndexError? Why the change to ValueError?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the index error was a hack to fall into this except, and that a ValueError was more appropriate for an invalid hex character.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose you are right. It doesn't really matter (to me) which error type is used but I think the except should only accept the intentional error which gets raised for an invalid hex character.

@uiri
Copy link
Owner

uiri commented Jan 17, 2018

Hi,

I think that the TomlDecodeErrors should not change to ValueErrors. I think it would be better to have an alternative __init__ for the TomlDecodeError for the case where there is no file position available.

What is the reasoning behind changing to ValueError ?

@pearcedavis
Copy link
Contributor Author

I inherited from ValueError because that's what the python stdlib does with JSONDecodeError: https://docs.python.org/3/library/json.html#json.JSONDecodeError

@uiri
Copy link
Owner

uiri commented Jan 23, 2018

Inheriting from ValueError isn't the problem. I think that that makes a lot of sense, actually!

I see that there are a lot of instances of TomlDecodeError which this patch replaces with ValueError. I think it would be better to clearly indicate that an error from the library during decoding came from the library and not some other part of a program which happens to consume the library.

@gaborbernat
Copy link

@uiri so what changes do you request/propose?

@pearcedavis
Copy link
Contributor Author

I replaced the uses of TomlDecodeError with ValueError in some of the functions, because those functions didn't have access to the decoder position information. I suppose you could continue to raise a TomlDecodeError without any information, and then augment it before re-raising, but using a ValueError made more sense to me. All of the ValueErrors that have replaced TomlDecodeErrors are caught and result in TomlDecodeErrors, so we are not letting the ValueErrors escape from the lib.

@uiri
Copy link
Owner

uiri commented Jan 27, 2018

Oh! I see, now, thanks for pointing out the except ValueError clauses which you added. Somehow that didn't click for me before.

So then I think it is just the nit around the (slightly, overly) broad except (IndexError, ValueError): line which I would prefer narrowed.

Thank you again for your PR and for doing the rebase after the recent significant changes to master.

@pearcedavis
Copy link
Contributor Author

No worries! I'll update that except line now.

toml/decoder.py Outdated
raise ValueError()
except (IndexError, ValueError):
raise ValueError("Invalid hex character")
except IndexError:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uhhh... shouldn't the raised error match the excepted error?

@pearcedavis
Copy link
Contributor Author

pearcedavis commented Jan 27, 2018 via email

@uiri uiri merged commit 4975790 into uiri:master Jan 27, 2018
uiri added a commit that referenced this pull request Mar 24, 2018
Any post-parsing issues with decoding get bubbled back up as a
ValueError which is then converted into the appropriate
TomlDecodeError.

Actual line number/column positioning might be ported later.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants