Fix #100: track position while parsing for better errors #128

pearcedavis · 2017-10-02T03:27:01Z

I made the TomlDecodeError more like the JSONDecodeError from the stdlib json module, allowing position information to be surfaced in errors.

This partially addresses #100, as line numbers are reported accurately in errors. However, column numbers are harder to track in the current code; as a result, errors will often say "column 1", when the issue occurred elsewhere in the line. Also, errors in multiline arrays report the opening line, because they are currently in-lined before parsing.
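To illustrate how a character offset can be turned into the line and column numbers discussed above, here is a minimal sketch modelled on how the stdlib's JSONDecodeError exposes `lineno` and `colno`. The class name `PositionedDecodeError` is hypothetical and this is not the code from this PR; it only shows the arithmetic under the assumption that the decoder knows the offset `pos` into the original document.

```python
class PositionedDecodeError(ValueError):
    """Hypothetical error type modelled on json.JSONDecodeError."""

    def __init__(self, msg, doc, pos):
        # Line number: newlines before the offset, plus one (1-based).
        lineno = doc.count("\n", 0, pos) + 1
        # Column number: distance from the previous newline (1-based).
        # rfind returns -1 when there is no newline, which yields pos + 1.
        colno = pos - doc.rfind("\n", 0, pos)
        super().__init__(
            "%s (line %d column %d char %d)" % (msg, lineno, colno, pos)
        )
        self.msg = msg
        self.doc = doc
        self.pos = pos
        self.lineno = lineno
        self.colno = colno


doc = "a = 1\nb = = 2\n"
err = PositionedDecodeError("Unexpected '='", doc, 10)
print(err)
```

If the decoder only knows the offset of the start of a line, as described above, `colno` comes out as 1 regardless of where in the line the problem is.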

uiri · 2017-10-02T04:48:59Z

Hi pdedmon,

Thank you for your pull request!

The Travis failure is due to some lines exceeding 80 characters. Not a huge deal. I have a pre-commit hook that warns me about these things, but I am not sure of the best way to share it in the repository.

My initial thought regarding the issues tagged "user customization" was to fix them by refactoring the code into a Decoder and Encoder model similar to the one the Python standard library uses for JSON.

I will take a closer look at this PR this week to see how I feel about your proposed solution. It may be easier to fix #100 and #77 together since they both will rely on the unmangled toml source.

pearcedavis · 2017-10-02T05:03:10Z

No worries! I just noticed the .flake8 file in the repo - I'll make sure to run flake8 before committing next time.

+1 for the Encoder/Decoder model!

gaborbernat · 2018-01-14T11:00:46Z

@pdedmon any updates on this?

pearcedavis · 2018-01-14T17:26:38Z

@gaborbernat I'd forgotten about this actually, but I might look into the conflicts and build failures later.

- errors now indicate the position in which they occurred
- the way the parser currently works makes columns hard to track, so errors will often say "column 1", when they occur elsewhere in the line
- errors in multiline arrays are reported to be on the opening line, due to the current preprocessing of the input

pearcedavis · 2018-01-14T20:55:36Z

I rebased and did a quick manual test (on top of the toml-test and tox tests), and it seems to be working as before.

codecov-io · 2018-01-14T20:57:05Z

Codecov Report

Merging #128 into master will increase coverage by 0.26%.
The diff coverage is 47.72%.

@@            Coverage Diff             @@
##           master     #128      +/-   ##
==========================================
+ Coverage   63.85%   64.12%   +0.26%     
==========================================
  Files           4        4              
  Lines         711      733      +22     
==========================================
+ Hits          454      470      +16     
- Misses        257      263       +6

Impacted Files	Coverage Δ
toml/decoder.py	`67.17% <47.72%> (+0.21%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 36ca44c...b6249cb.

gaborbernat · 2018-01-14T21:37:52Z

@uiri does this look good?

uiri · 2018-01-17T06:00:45Z

toml/decoder.py

-            except IndexError:
-                raise TomlDecodeError("Invalid escape sequence")
+                    raise ValueError()
+            except (IndexError, ValueError):


I think this should still be an IndexError? Why the change to ValueError?

pearcedavis · 2018-01-17

I thought the IndexError was a hack to fall into this except clause, and that a ValueError was more appropriate for an invalid hex character.

uiri · 2018-01-23

I suppose you are right. It doesn't really matter (to me) which error type is used, but I think the except clause should only accept the intentional error that gets raised for an invalid hex character.

uiri · 2018-01-17T06:02:44Z

Hi,

I think that the TomlDecodeErrors should not change to ValueErrors. I think it would be better to have an alternative __init__ for the TomlDecodeError for the case where there is no file position available.

What is the reasoning behind changing to ValueError?

pearcedavis · 2018-01-17T06:15:59Z

I inherited from ValueError because that's what the python stdlib does with JSONDecodeError: https://docs.python.org/3/library/json.html#json.JSONDecodeError
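For reference, the stdlib behaviour mentioned here can be checked with a short sketch. The helper name `describe_json_error` is made up for illustration; `json.JSONDecodeError` and its `lineno`, `colno`, and `pos` attributes are part of the standard library.

```python
import json


def describe_json_error(text):
    """Return position details if `text` is invalid JSON, else None."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # JSONDecodeError subclasses ValueError, so callers that already
        # catch ValueError keep working.
        return {
            "is_value_error": isinstance(err, ValueError),
            "lineno": err.lineno,
            "colno": err.colno,
            "pos": err.pos,
        }
    return None


# The value for "b" is missing on the second line of the document.
info = describe_json_error('{"a": 1,\n "b": }')
print(info)
```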

uiri · 2018-01-23T03:15:29Z

Inheriting from ValueError isn't the problem. I think that that makes a lot of sense, actually!

I see that there are a lot of instances of TomlDecodeError which this patch replaces with ValueError. I think it would be better to clearly indicate that an error from the library during decoding came from the library and not some other part of a program which happens to consume the library.

gaborbernat · 2018-01-25T23:31:38Z

@uiri so what changes do you request/propose?

pearcedavis · 2018-01-26T00:02:36Z

I replaced the uses of TomlDecodeError with ValueError in some of the functions, because those functions didn't have access to the decoder position information. I suppose you could continue to raise a TomlDecodeError without any information, and then augment it before re-raising, but using a ValueError made more sense to me. All of the ValueErrors that have replaced TomlDecodeErrors are caught and result in TomlDecodeErrors, so we are not letting the ValueErrors escape from the lib.
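The pattern described in this comment can be sketched roughly as follows. This is not the library's code: `DecodeErrorSketch`, `_parse_bool`, and `parse_flags` are hypothetical names, and the toy format (one `key = true|false` per line) is an assumption chosen to keep the example small. The helper raises a plain ValueError because it has no position information; the caller, which does track the position, converts it.

```python
class DecodeErrorSketch(ValueError):
    """Hypothetical stand-in for TomlDecodeError; not the library's class."""

    def __init__(self, msg, doc, pos):
        self.lineno = doc.count("\n", 0, pos) + 1
        self.colno = pos - doc.rfind("\n", 0, pos)
        super().__init__(
            "%s (line %d column %d)" % (msg, self.lineno, self.colno)
        )


def _parse_bool(text):
    # This helper does not know where `text` came from, so it can only
    # raise a plain ValueError.
    if text == "true":
        return True
    if text == "false":
        return False
    raise ValueError("Expected 'true' or 'false', got %r" % text)


def parse_flags(doc):
    result = {}
    pos = 0  # offset of the start of the current line
    for line in doc.split("\n"):
        if line.strip():
            key, _, raw = line.partition("=")
            try:
                result[key.strip()] = _parse_bool(raw.strip())
            except ValueError as err:
                # Convert at the boundary so no bare ValueError escapes.
                raise DecodeErrorSketch(str(err), doc, pos) from err
        pos += len(line) + 1  # +1 for the newline removed by split
    return result
```

Because `pos` only tracks the start of each line in this sketch, the reported column is always 1, which mirrors the limitation noted in the PR description.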

uiri · 2018-01-27T04:25:37Z

Oh! I see, now, thanks for pointing out the except ValueError clauses which you added. Somehow that didn't click for me before.

So then I think it is just the nit around the (slightly, overly) broad except (IndexError, ValueError): line which I would prefer narrowed.

Thank you again for your PR and for doing the rebase after the recent significant changes to master.

pearcedavis · 2018-01-27T05:06:45Z

No worries! I'll update that except line now.

uiri · 2018-01-27T06:16:30Z

toml/decoder.py

-                    raise ValueError()
-            except (IndexError, ValueError):
+                    raise ValueError("Invalid hex character")
+            except IndexError:


Uhhh... shouldn't the raised error match the excepted error?

pearcedavis · 2018-01-27T06:25:00Z

Not necessarily, why?

On Fri, 26 Jan 2018, 22:16 Will Pearson, ***@***.***> wrote:

> In toml/decoder.py:
>
> @@ -445,8 +445,8 @@ def _load_unicode_escapes(v, hexbytes, prefix):
>          while i < hxblen:
>              try:
>                  if not hx[i].lower() in hexchars:
> -                    raise ValueError()
> -            except (IndexError, ValueError):
> +                    raise ValueError("Invalid hex character")
> +            except IndexError:
>
> Uhhh... shouldn't the raised error match the excepted error?
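The difference between the two revisions of the except clause can be shown with a small sketch. The function names and return strings below are hypothetical and simplified from the diff; they are not the library's `_load_unicode_escapes`. With the broad clause, both a too-short escape (IndexError) and an invalid character (ValueError) are handled locally. With the narrow clause, the ValueError is not caught by `except IndexError` and propagates to whichever caller handles ValueError.

```python
HEXCHARS = "0123456789abcdef"


def check_hex_broad(hx, n):
    """First revision: one handler covers both failure modes."""
    try:
        for i in range(n):
            if hx[i].lower() not in HEXCHARS:
                raise ValueError()
    except (IndexError, ValueError):
        return "handled locally"
    return "ok"


def check_hex_narrow(hx, n):
    """Second revision: only a too-short input is handled here."""
    try:
        for i in range(n):
            if hx[i].lower() not in HEXCHARS:
                # Not caught below; propagates to the caller.
                raise ValueError("Invalid hex character")
    except IndexError:
        return "handled locally"
    return "ok"


def narrow_outcome(hx, n):
    """Report what a caller of the narrow variant would observe."""
    try:
        return check_hex_narrow(hx, n)
    except ValueError as err:
        return "propagated: %s" % err
```

So the raised and excepted types do not have to match, as long as something further up the call chain catches the ValueError.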

Any post-parsing issues with decoding get bubbled back up as a ValueError which is then converted into the appropriate TomlDecodeError. Actual line number/column positioning might be ported later.

pearcedavis force-pushed the decode-error-position branch from 43a6b7a to 3bf102a Compare October 2, 2017 04:53

pearcedavis force-pushed the decode-error-position branch from 3bf102a to 3ecf210 Compare January 14, 2018 20:53

uiri reviewed Jan 17, 2018

accept only IndexError, not unnecessary ValueError

b6249cb

uiri reviewed Jan 27, 2018

uiri merged commit 4975790 into uiri:master Jan 27, 2018
