Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-92734: Add indentation feature to reprlib.Repr #92735

Merged
merged 18 commits into from
Sep 8, 2022

Conversation

finefoot
Copy link
Contributor

Add an indentation feature to the reprlib.Repr class. This includes
the actual implementation as well as added tests and an attribute
description for the docs.
Copy link
Contributor

@zmievsa zmievsa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't found any issues with the code or the tests. The docs are my only concern.

Doc/library/reprlib.rst Outdated Show resolved Hide resolved
@CAM-Gerlach CAM-Gerlach added type-feature A feature request or enhancement docs Documentation in the Doc dir stdlib Python modules in the Lib dir labels May 14, 2022
Copy link
Member

@CAM-Gerlach CAM-Gerlach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I provided a suggested revision of the documentation section, as you requested on Discourse.

Additionally, some quick testing of a few simple cases locally reveals several apparent issues and odd behaviors, which really should be handled properly (and captured in the tests, which currently are very limited):

  • Setting indent = 0 makes any recursive objects silently not print anything, instead of being a natural extension of the positive integer case as it should be and I'd expect, with line breaks but no indents (or at the absolute minimum raising an error). Example:

    >>> example = [1, 'spam', {'eggs': 1, 'ham': 2}]
    >>> reprlib.aRepr.indent = 0
    >>> print(reprlib.repr(example))
    []

    What I'd expect instead, of course, is

    [
    1,
    'spam',
    {
    'eggs': 1,
    'ham': 2,
    },
    ]
  • Even more unexpectedly, the same happens for an empty string '', which from the perspective of a user is even more confusing, and likewise should work as one would expect

  • Furthermore, negative integers also produce the same behavior, instead of either being an extension of the non-negative case (indenting less nested objects more to the left, with the indent level zero being the most deeply nested object—which I don't think would be useful enough to justify the implementation complexity) or raising an error at any point (which it probably should).

  • Also, passing False produces the same result, while True is equivalent to the integer 1—presuming the former case is fixed to work like 0, a seasoned Python user might anticipate and understand this, but I worry that it might confuse users who expect False to act like None and True to use some default/automatic value, e.g. 4 spaces. My suggestion would be to raise an explicit error in this edge case rather than trying to guess at the user's intent, though I'm not sure what's most idiomatic here.

  • Objects that clearly won't work for indent, like [], {"test": 1}, 4.2 and arbitrary object()s produce two different possible TypeErrors (one for objects that don't support __mul__ with an int, and one for those that can but cannot be __add__ed with a str), neither of which are necessarily clear to the user as to the root cause, and only when actually used to indent. IMO, it would be better to catch this error and raise from a more helpful one, e.g. Repr.indent must be int or str, not {type(indent)}. You could also use a @property to do validation on indent at set-time, but that's not very idiomatic, at least for the stdlib.

Doc/library/reprlib.rst Outdated Show resolved Hide resolved
Lib/test/test_reprlib.py Outdated Show resolved Hide resolved
@finefoot
Copy link
Contributor Author

Thank you very much for the extensive feedback.

and captured in the tests, which currently are very limited

I agree. Do you have a suggestion in which way to add more cases? Because the best solution I can come up with, would be to simply add more explicit tests for sample cases and edge cases and compare the expected string to the return value. You can see that I tried to add a more "sophisticated" test, based on the idea that after removing whitespace and commas the output should be identical to the output produced with Repr.indent = None. But I guess that's not really helping much and I can just remove it again, in favor of more explicit tests?

Setting indent = 0 makes any recursive objects silently not print anything

Good catch! :) I didn't think of this edge case. Repr.indent = "" (and subsequently Repr.indent = 0) should be fixed by f8a2db1. The reason was -len(indent) evaluating to -0, which is 0, which obviously doesn't mean "cut zero characters from end of sequence" but "cut everything until character position 0". Instead, None works for zero-length indentation strings.

IMO, it would be better to catch this error and raise from a more helpful one, e.g. Repr.indent must be int or str, not {type(indent)}. You could also use a @property to do validation on indent at set-time, but that's not very idiomatic, at least for the stdlib.

The other attributes don't have any explicit error handling either. That's why I kept it out of the code, but I am in favor of adding it, due to exactly the situations you described. 👍 If @property isn't idiomatic for the stdlib as you say, I've implemented simple LBYL error handling in 721357c. I noticed that you mentioned raise from, yet I don't think it's necessary (i.e. helpful) to catch the exception from an EAFP approach; and we'd have to LBYL due to disallowing negative integers anyway. Am I overlooking something?

Furthermore, negative integers (...) indenting less nested objects more to the left, with the indent level zero being the most deeply nested object—which I don't think would be useful enough to justify the implementation complexity

I agree with this, too. Therefore, I wrote "non-negative int" in the ValueError message, above.

Co-authored-by: CAM Gerlach <[email protected]>

Co-authored-by: CAM Gerlach <[email protected]>
Lib/reprlib.py Outdated Show resolved Hide resolved
@CAM-Gerlach
Copy link
Member

I agree. Do you have a suggestion in which way to add more cases?

To be honest, every other project I work on uses Pytest, where I'd just use the @pytest.parameterize() decorator for this, and I'm not terribly familiar with the idiomatic unittest approach—especially since the construct that I believe is most appropriate here, subtests, is the one main unittest feature that Pytest does not natively support.

That said, here's a stab at it, based on skimming over a few other examples in the CPython test suite (e.g. test_codecs.py). To summarize, you want methods for each major distinction in behavior (e.g. formatted output, raising an exception), and within each test method, you can use unittest subtests, with one subtest per test case. iterating over a mapping of your inputs to expected outputs, with a key you can use to name the subtests.

Putting it all together, that would look something like this:

import textwrap

...

    def test_valid_indent(self):
        test_obj = [1, 'spam', {'eggs': True, 'ham': []}]
        test_cases = {
            None: """[1, 'spam', {'eggs': True, 'ham': []}]""",
            4: ("""
                [
                    1,
                    'spam',
                    {
                        'eggs': True,
                        'ham': [],
                    },
                ]
                """),
            '-->': ("""
                [
                -->1,
                -->'spam',
                -->{
                -->-->'eggs': True,
                -->-->'ham': [],
                -->},
                ]
                """),
            # Etc.
        }

        for indent, expected_repr in test_cases.items():
            with self.subTest(indent=indent):
                repr_obj = Repr()
                repr_obj.indent = indent
                actual_repr = repr_obj.repr(test_obj).strip('\n')
                expected_repr = textwrap.dedent(expected_repr).strip('\n')
                self.assertEqual(actual_repr, expected_repr)

    def test_invalid_indent(self):
        test_obj = [1, 'spam', {'eggs': True, 'ham': []}]
        test_cases = {
            -2: (ValueError, '[Nn]egative|[Pp]ositive'),
            (4,): (TypeError, None),
            # Etc.
        }
        for indent, (expected_error, expected_msg) in test_cases.items():
            with self.subTest(indent=indent):
                repr_obj = Repr()
                repr_obj.indent = indent
                expected_msg = expected_msg or f'{indent!r}'
                with self.assertRaisesRegex(expected_error, expected_msg):
                    repr_obj.repr(test_obj)

Good catch! :) I didn't think of this edge case.

Its really helpful to get in the habit of doing so. Pretty much any time you're testing functions like this, even just as a quick manual sanity check, you should try common cases like 1, 0, -1, True, False, floats, empty strings, empty collections, arbitrary objects, etc. as appropriate for your use case, and make sure they behave how you expect (whether that's normal output or raising an appropriate error).

Repr.indent = "" (and subsequently Repr.indent = 0) should be fixed by f8a2db1.

Indeed; LGTM in manual testing.

I noticed that you mentioned raise from, yet I don't think it's necessary (i.e. helpful) to catch the exception from an EAFP approach; and we'd have to LBYL due to disallowing negative integers anyway. Am I overlooking something?

Why not? They are two very different cases that need to be handled separately, an invalid type (TypeError) and an invalid value (ValueError), and you already have a if isinstance(indent, int) block anyway for the invalid value (negative int) case. See above for a better approach.

Lib/reprlib.py Outdated Show resolved Hide resolved
@finefoot
Copy link
Contributor Author

Also, passing False produces the same result, while True is equivalent to the integer 1—presuming the former case is fixed to work like 0, a seasoned Python user might anticipate and understand this, but I worry that it might confuse users who expect False to act like None and True to use some default/automatic value, e.g. 4 spaces. My suggestion would be to raise an explicit error in this edge case rather than trying to guess at the user's intent, though I'm not sure what's most idiomatic here.

I think we can reasonably skip this, in accordance with json.dumps, for example:

>>> print(json.dumps([[1]], indent=True))
[
 [
  1
 ]
]
>>> print(json.dumps([[1]], indent=False))
[
[
1
]
]

But I don't feel strongly about it. So if you disagree, I'm happy to open a thread and ask for opinions if there's a different idiomatic solution that would be best to implement. (For both json.dumps and reprlib.Repr in that case, I guess.)

I agree. Do you have a suggestion in which way to add more cases?

To be honest, every other project I work on uses Pytest, where I'd just use the @pytest.parameterize() decorator for this (...) That said, here's a stab at it, based on skimming over a few other examples in the CPython test suite

I apologize for the misunderstanding. I am aware of unittest.TestCase.subTest but simply didn't use it because I wanted to keep the indent tests in accordance with the preexisting tests in test_reprlib.py. I was under the impression that document coherence is more important. I am fine with using subtests. 👍

My question was meant more like this: Theoretically, we could reverse the indentation process and test if the result matches the result that regular Repr.indent = None produces. The rough idea was like this:

# Results are the same after removing whitespace and commas
y = [1, [2, "foo", b"bar", {"a": 1, "b": "abc def ghi", "c": {1: 2, 3: 4, 5: [], 6: {}}}], 3]
self.assertNotEqual(r1.repr(y), r2.repr(y))
self.assertEqual(
    "".join(r1.repr(y).replace(",", "").split()),
    "".join(r2.repr(y).replace(",", "").split())
)

But, like I said before, I don't think it's worth it. Explicit tests of example cases should be fine? I was simply inquiring if you agreed with this or if you think I should work on the "indentation reversal test". :)

Anyway, I've taken your suggestions and added some tests in 1d652a5. I also split it into test_valid_indent and test_invalid_indent, like you proposed.

@CAM-Gerlach
Copy link
Member

I think we can reasonably skip this, in accordance with json.dumps, for example:

I think you're right; its consistent with the behavior elsewhere and its not that big of a deal. You should, however, include True and False in your indent value test cases, which otherwise look fairly complete.

I apologize for the misunderstanding.

No need to apologize!

Anyway, I've taken your suggestions and added some tests in 1d652a5. I also split it into test_valid_indent and test_invalid_indent, like you proposed.

All looks good; aside from including True and False as test cases for indent, to avoid readability impacts due to the space it consumes, I might might your cases cases to a module-level variable, and you could also use somewhat smaller test objects, though its fine if others don't complain. You could also consider using nested subtests (object ---> settings). Other than that, LGTM at least from my end.

@finefoot
Copy link
Contributor Author

You should, however, include True and False in your indent value test cases,

👍 Tests added in 7cc9fc8. I assume, the rationale behind testing this undefined behavior - since the docs don't mention bool for Repr.indent - is less about verifying that indent = True produces the same as indent = 1, but more about verifying that no error happens if a user unknowingly uses said value producing undefined behavior?

to avoid readability impacts due to the space it consumes, I might might your cases cases to a module-level variable, and you could also use somewhat smaller test objects, though its fine if others don't complain

The example in test_pprint.QueryTestCase.test_set_of_sets_reprs is fairly big, too, for example. I guess that means, it's okay?

You could also consider using nested subtests (object ---> settings)

That's a very good suggestion, thank you. :) Implemented in 61a6b5a.

@CAM-Gerlach
Copy link
Member

+1 Tests added in 7cc9fc8. I assume, the rationale behind testing this undefined behavior - since the docs don't mention bool for Repr.indent - is less about verifying that indent = True produces the same as indent = 1, but more about verifying that no error happens if a user unknowingly uses said value producing undefined behavior?

Looks good. It has a dual purpose, in my mind—it both tests the current behavior, and serves as de-facto CPython-developer-facing, machine-checkable verification of it; if it is changed, intentionally or not, then the test would provide clear indication of that and would ensure the tests are updated to cover the new behavior instead.

The example in test_pprint.QueryTestCase.test_set_of_sets_reprs is fairly big, too, for example. I guess that means, it's okay?

I don't have a problem with the size, though personally I'd separate the test data from the test code to make it cleaner to follow (in Pytest, I'd made it a fixture or at least a module-level constant). I was mostly concerned about any objections from others with more authority about it, since I'm not a core dev myself. But its ultimately up to you and whatever core dev does the final review and merge here. Otherwise, LGTM!

Copy link
Member

@CAM-Gerlach CAM-Gerlach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all your hard work and responsiveness to feedback, @finefoot . LGTM now and looking forward to see this merged; this should be ready for core dev review now (with one potential concern about mixing test data and code outstanding, though that can be left to a core dev for final judgment).

@CAM-Gerlach CAM-Gerlach requested a review from rhettinger May 16, 2022 19:41
@CAM-Gerlach
Copy link
Member

Tagging @rhettinger for review, since there's no one listed in the Experts Index or CODEOWNERS and he seems to be de-facto responsible for this module given his code contributions and his review of the most recent significant PR affecting it.

@finefoot
Copy link
Contributor Author

with one potential concern about mixing test data and code outstanding, though that can be left to a core dev for final judgment

Alright, makes sense. Let's see. :)

@CAM-Gerlach Thanks for your patience, explanations and helpful feedback. I learned a lot 👍

@CAM-Gerlach
Copy link
Member

Thank you to, so did I! I'm pretty new to most of this stuff as well and have only just started contributing to CPython proper in the past couple weeks, and pretty much all of that has been docs stuff so far. I'm also not even a "real" software developer, I just play one on the internet—in real life, I'm a NASA satellite scientist, heh.

@CAM-Gerlach
Copy link
Member

Thanks, and sorry for the delay; I've reached out to the core devs to take a look at this...

Copy link
Contributor

@ambv ambv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I find it icky that indent is a field here and not an argument to the repr() method. However, since the rest of the functionality in reprlib provides a precedent for this design, I think we can live with that.

That being said, I have a few comments around the current implementation.

Doc/library/reprlib.rst Show resolved Hide resolved
Doc/library/reprlib.rst Outdated Show resolved Hide resolved
finefoot and others added 3 commits June 28, 2022 14:30
…e-92734.d0wjDt.rst to Misc/NEWS.d/next/Library/2022-05-12-15-19-00.gh-issue-92734.d0wjDt.rst
@finefoot
Copy link
Contributor Author

finefoot commented Jun 28, 2022

I guess this would make sense to add to the docs once it's getting close to getting merged and the Python version is foreseeable?

  • Add .. versionadded:: 3.xyz to docs

It doesn't get added automatically, right?

@CAM-Gerlach
Copy link
Member

CAM-Gerlach commented Jun 29, 2022

I guess this would make sense to add to the docs once it's getting close to getting merged and the Python version is foreseeable?

Add .. versionadded:: 3.xyz to docs

It doesn't get added automatically, right?

Oops, I completely forgot the mention that—great catch. No, its not automatic, and as this is essentially certain to go in Python 3.12 now, we should just go ahead and added it. I'll make a suggestion to that effect.

@CAM-Gerlach CAM-Gerlach requested a review from ambv June 29, 2022 20:08
@CAM-Gerlach
Copy link
Member

FYI, right now things are very intense due to the planned release of 3.11rc1, so if @ambv doesn't respond in a week or two, ping again and I can ping him or another core dev if there still isn't a response.

@rhettinger rhettinger merged commit c06c001 into python:main Sep 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs Documentation in the Doc dir stdlib Python modules in the Lib dir type-feature A feature request or enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants