
Discuss adding labels for fuzzed values #94

Closed
mgold opened this issue May 10, 2019 · 15 comments · Fixed by #187
Labels
Design Question (Needs design discussion) · fuzzers (Concerns randomness or simplifiers) · major release blocker (Issue to be resolved before next major version bump)

Comments

@mgold (Collaborator) commented May 10, 2019

There's a good discussion happening on Discourse about adding labels and coverage requirements to elm-test. Don't do what I did: watch the 50-minute talk linked in the first post before trying to say something intelligent.

There may be breaking changes that come out of the discussion, so I want to hold open the opportunity to make them.

@mgold added the Design Question, fuzzers, and major release blocker labels on May 10, 2019
@Janiczek (Collaborator) commented Jul 26, 2022

I'd like to revive this discussion, as I'd be happy to work on this and we're closing in on releasing 2.0.0.

Sorry for the pings in case they're unwanted; I'm basing them on a) who participated in the Discourse discussion, and b) who I usually use as my sounding board for testing/fuzzing ideas. In no particular order: @BrianHicks @gampleman @harrysarson @drathier @ianmackenzie @rtfeldman @jonathanjouty

There are, IMHO, a few separate ideas; I'll try to summarize them (paraphrasing @mgold's summary on Discourse):

  1. Report the count and/or percentage of rejected values when a Fuzz.invalid has been triggered in any way (Fuzz.filter, Fuzz.invalid by itself or inside Fuzz.andThen or any other way, etc.), so as to warn about inefficient fuzzers. This likely doesn't need to be configured in any way. (Perhaps it also belongs in a totally separate ticket, but I feel it's related.)

  2. Add a way to categorize the values and report the counts and/or percentages of these categories (to test the fuzzers). Something like Fuzz.label : (a -> String) -> Fuzzer a -> Fuzzer a might be a simple API, but by virtue of being an opaque function we don't know all the possible strings that might fall out of it. Representations like List (String, a -> Bool) are more amenable to exhaustive checking in this sense, see point 4.

  3. Be able to show examples of the categories (to test the classifying function / list of predicates). Unsure which module this should live in, but it might best live as its own function similar to Fuzz.examples, usable from the REPL and perhaps connected to the CLI tool. Something like showLabelExamples : List (String, a -> Bool) -> Fuzzer a -> List (String, Maybe a) perhaps, or if it's baked into the fuzzer, then just showLabelExamples : Fuzzer a -> List (String, Maybe a)? (Maybe with an integer specifying the number of tries; also maybe parameterized by seed or returning a Generator.)

  4. Be able to expect/demand certain coverage of those categories. A few axes here also: Nonzero vs certain percentage; just reporting a warning if not met vs running enough tests (ignoring runs : Int config) to be statistically confident ™️ [1] [2] [3] that it is/is not met. This requires the List variant of the classification API, to be able to list all possible classes. (It's incompatible with the (a -> String) way to classify fuzzers.)
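To make the "statistically confident" idea in point 4 concrete, here is a minimal Python sketch (not Elm, and not elm-test's actual algorithm). It uses a Wilson score lower bound, in the spirit of QuickCheck's checkCoverage, to decide whether an observed hit count is confidently at or above a required percentage; the function names are hypothetical:

```python
import math

def wilson_lower_bound(successes: int, n: int, z: float = 2.576) -> float:
    """Lower bound of the Wilson score interval for the true proportion
    (z = 2.576 is roughly a 99% confidence level)."""
    if n == 0:
        return 0.0
    p = successes / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / (1 + z * z / n)

def coverage_confidently_met(successes: int, n: int, required_pct: float) -> bool:
    """True when the observed hits make us statistically confident that the
    label's true frequency is at least required_pct percent."""
    return wilson_lower_bound(successes, n) >= required_pct / 100.0

# 600 hits out of 1000 runs: confidently at least 50% coverage.
# 510 hits out of 1000 runs: too close to 50% to be confident either way,
# so a real runner would keep generating more values before deciding.
print(coverage_confidently_met(600, 1000, 50), coverage_confidently_met(510, 1000, 50))
```

The "run enough tests, ignoring runs" behaviour would then amount to generating values until the lower bound clears the threshold (coverage met) or the symmetric upper bound falls below it (coverage failed).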


First of all, is there something I'm missing? The Discourse post is long, John Hughes' talk has many ideas, there are various implementations in various PBT libraries...


Other than that, would you folks be OK with me working on these? Currently I favor something like

myFuzzer : Fuzzer Int
myFuzzer =
    Fuzz.intRange 1 20

justReportFuzzer : Fuzzer Int
justReportFuzzer =
    myFuzzer
        |> Fuzz.reportCoverage
            [ ( "Lower boundary (1)", \n -> n == 1 )
            , ( "Upper boundary (20)", \n -> n == 20 )
            , ( "In the middle (2..19)", \n -> n > 1 && n < 20 )
            , ( "Outside boundaries??", \n -> n < 1 || n > 20 )
            ]

expectCoverageFuzzer : Fuzzer Int
expectCoverageFuzzer =
    myFuzzer
        |> Fuzz.expectCoverage
            [ ( 10, "Lower boundary (1)", \n -> n == 1 )
            , ( 10, "Upper boundary (20)", \n -> n == 20 )
            , ( 50, "In the middle (2..19)", \n -> n > 1 && n < 20 )
            , ( 0, "Outside boundaries??", \n -> n < 1 || n > 20 )
            ]

mandateCoverageFuzzer : Fuzzer Int
mandateCoverageFuzzer =
    myFuzzer
        |> Fuzz.mandateCoverage
            [ ( 10, "Lower boundary (1)", \n -> n == 1 )
            , ( 10, "Upper boundary (20)", \n -> n == 20 )
            , ( 50, "In the middle (2..19)", \n -> n > 1 && n < 20 )
            , ( 0, "Outside boundaries??", \n -> n < 1 || n > 20 )
            ]

-- Usage in tests doesn't change, there is no Test/Expect API change:

justReport : Test
justReport =
    Test.fuzz justReportFuzzer "justReport" <| \_ -> Expect.pass

expectCoverage : Test
expectCoverage =
    Test.fuzz expectCoverageFuzzer "expectCoverage" <| \_ -> Expect.pass

mandateCoverage : Test
mandateCoverage =
    Test.fuzz mandateCoverageFuzzer "mandateCoverage" <| \_ -> Expect.pass

-- Possible to test the labels out in REPL:

> Fuzz.labelExamples { tries = 100, seed = 1 } mandateCoverageFuzzer
[ ( "Lower boundary (1)", Just 1 )
, ( "Upper boundary (20)", Just 20 )
, ( "In the middle (2..19)", Just 5 )
, ( "Outside boundaries??", Nothing )
]
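Since Fuzz.labelExamples is only a proposal, here is a rough Python model of the intended behaviour (hypothetical names; a real implementation would live inside the fuzzer runner): draw a batch of values from the generator and pick, for each label, the first value that matches its predicate.

```python
import random

def label_examples(labels, generate, tries=100, seed=1):
    """Rough model of the proposed Fuzz.labelExamples: draw `tries` values
    and return, per (name, predicate) pair, the first matching value or
    None when no generated value satisfies the predicate."""
    rng = random.Random(seed)
    samples = [generate(rng) for _ in range(tries)]
    return [
        (name, next((s for s in samples if pred(s)), None))
        for name, pred in labels
    ]

labels = [
    ("Lower boundary (1)",    lambda n: n == 1),
    ("Upper boundary (20)",   lambda n: n == 20),
    ("In the middle (2..19)", lambda n: 1 < n < 20),
    ("Outside boundaries??",  lambda n: n < 1 or n > 20),
]

# Stand-in for Fuzz.intRange 1 20: uniform integers in [1, 20], so the
# "Outside boundaries??" label can never produce an example (-> None).
examples = label_examples(labels, lambda rng: rng.randint(1, 20))
```

Returning Maybe/None for an unmatched label is exactly what makes this useful for testing the predicates themselves: a None where you expected an example signals a dead or mistyped predicate.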

@Janiczek (Collaborator) commented Jul 27, 2022

A few clarifications on my current intentions with the above:

  • 0 would be special in that any occurrence of the label would be counted as a failure (as opposed to other values meaning "at least that percentage")
  • with the List based approach a given value could accumulate multiple labels, like with Html.classList. For a list like
[ ( "green", isGreen )
, ( "red", isRed )
, ( "big", isBig )
]

we'd report all the occurring overlaps:

green, big: 8% (23x)
green: 21% (128x)
red: 32% (211x)

(e.g. red, big == 0x; red, green == 0x; red, green, big == 0x; big == 23x)
( ⚠️ the numbers are bogus )

What I'm not sure about is whether green should also count the occurrences from green, big, or whether they'd be separate. I think the latter makes more sense but it might be nice to spell it out for the user.
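The "each combination is its own bucket" interpretation (the latter option above) can be sketched in Python; the predicates and counts are made up to mirror the green/red/big example:

```python
from collections import Counter

# Hypothetical stand-ins for the (label, predicate) list above.
labels = [
    ("green", lambda v: v["color"] == "green"),
    ("red",   lambda v: v["color"] == "red"),
    ("big",   lambda v: v["size"] > 10),
]

def classify(value):
    """A value accumulates every matching label (like Html.classList);
    the full combination of labels is the reporting bucket."""
    return tuple(name for name, pred in labels if pred(value))

def coverage_report(values):
    counts = Counter(classify(v) for v in values)
    total = len(values)
    return {", ".join(combo): (n, round(100 * n / total, 1))
            for combo, n in counts.items()}

values = ([{"color": "green", "size": 20}] * 23     # green AND big
          + [{"color": "green", "size": 1}] * 128   # green only
          + [{"color": "red", "size": 1}] * 211)    # red only

report = coverage_report(values)
# "green, big" is a separate bucket from "green": overlaps don't double-count.
```

Under these semantics the "green" bucket excludes the "green, big" values, which is the behaviour that probably needs spelling out for the user.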

@gampleman (Contributor) commented

I'd like to reiterate from that discussion that I strongly believe the classification makes a lot more sense in the Expect than in the Fuzzer. What counts as what condition depends on the code under test, not on the fuzzer being used. Putting it in the fuzzer makes fuzzers less re-usable, which I think is a bad thing to encourage.

For instance, I love hacking on elm-geometry: it has such a well-developed library of reusable fuzzers that expressing new tests is typically super easy. I think we should try to guide users into that kind of design.

@Janiczek (Collaborator) commented

@gampleman In addition, I've realized during implementation that making classification live in Fuzzer has the additional problem of having to be done as a last step.

bad : Fuzzer Bool
bad =
  Fuzz.intRange 0 10
  |> Fuzz.reportCoverage [...]
  |> Fuzz.map ((==) 5)

can't retain the coverage request from the first step.
So I like that making the API Expect-centric makes the problem go away entirely.

@jonathanjouty commented Jul 29, 2022

@Janiczek This looks really nice.
A couple of thoughts below:

  • Fuzz.expectCoverage and Fuzz.mandateCoverage: (a bit late as you've already commented on this in the PR 😄 but posting here for posterity) I feel like we should aim to give users the "right" default option, and perhaps this means only providing mandateCoverage and reportCoverage (i.e. do not include expectCoverage).
    If a user wants to see the categories, then reportCoverage is sufficient. What would a warning look like? What extra information would a warning give the user? With a warning, I feel like once a user is done writing the fuzzer, the warning output will be relegated to the depths of CI runs and never looked at properly, and hence a failure is more useful.
  • Special case for 0 coverage: This might have a problem of discoverability. Perhaps a custom type could force the user to be explicit about their expectations? (say, something like NotPresent and Coverage Int).
  • Output numbers (as seen in Fuzzer classification and coverage #187) Personally the total number of runs and percentages are enough, the number of runs for each category adds noise to the output IMO (Haskell QuickCheck seems to include only total and percentages).

@Janiczek (Collaborator) commented Jul 29, 2022

I feel like we should aim to give users the "right" default option, and perhaps this means only providing mandateCoverage and reportCoverage (i.e. do not include expectCoverage).

For posterity (I'm editing the PR description a bunch): I went with reportCoverage for printing it out, and expectCoverage as the "mandate + fail if not satisfied" option. The option that only warned if not satisfied is gone.

Special case for 0 coverage

The current API I'm trying out is

type ExpectedCoverage
  = Zero -- "this variant is never generated!"
  | AtLeast Int -- the bread and butter option that most users will use
  | MoreThan Int

The MoreThan option is debatable, as I think it will only be useful for "more than zero" and at that point the user could probably choose an arbitrary percentage, like AtLeast 1. Should we have it at all?
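For illustration, here is a naive, non-statistical Python sketch of how the three variants might be checked (hypothetical names and semantics, mirroring the ExpectedCoverage type; everything is a percentage, never an absolute count):

```python
def check_coverage(requirements, counts, total):
    """requirements: (label, kind, pct) triples with kind in
    {"zero", "at_least", "more_than"}, mirroring the ExpectedCoverage
    variants. counts maps label -> occurrences over `total` runs.
    Returns failure messages; an empty list means coverage was satisfied."""
    failures = []
    for label, kind, pct in requirements:
        observed = 100 * counts.get(label, 0) / total
        if kind == "zero" and observed > 0:
            failures.append(f"{label}: expected 0%, saw {observed:.1f}%")
        elif kind == "at_least" and observed < pct:
            failures.append(f"{label}: expected at least {pct}%, saw {observed:.1f}%")
        elif kind == "more_than" and observed <= pct:
            failures.append(f"{label}: expected more than {pct}%, saw {observed:.1f}%")
    return failures

counts = {"Lower boundary (1)": 6, "In the middle (2..19)": 90}
failures = check_coverage(
    [("Lower boundary (1)", "at_least", 10),
     ("Outside boundaries??", "zero", 0)],
    counts,
    total=100,
)
# Only the under-covered lower boundary fails; the zero requirement holds.
```

Note how "more_than 0" and "at_least 1" differ only in the boundary case, which is the reason MoreThan's usefulness is debatable above.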

Personally the total number of runs and percentages are enough, the number of runs for each category adds noise to the output IMO

Fair, I can experiment and see how it looks without the totals!

@gampleman (Contributor) commented

What's the point of Zero? Also are counts better than percentages?

@Janiczek (Collaborator) commented

@gampleman The ExpectedCoverage values would all be percentages. I think it doesn't make sense to put absolute counts there, as those would need to change in your test source code every time you changed --runs.

@Janiczek (Collaborator) commented Jul 29, 2022

For @jonathanjouty: Here are the two versions (with and without counts) together:

[Screenshot, 2022-07-29 11:33: coverage report, first version]

[Screenshot, 2022-07-29 14:45: coverage report, second version]

I personally like the version with counts better, I'll leave this detail up for discussion for now.

@gampleman (Contributor) commented

Maybe we could add little bar charts? 😍

@Janiczek (Collaborator) commented

@gampleman What about these?

[Screenshot, 2022-07-29 21:19: coverage report with bar charts]

@Janiczek (Collaborator) commented Jul 29, 2022

Also I figured I never answered this.

What's the point of Zero?

So, I can imagine a situation where you're trying to fuzz a specific variant of your data - perhaps a balanced binary tree that isn't full.

So your fuzzer hopefully generates only a subset of all possible trees (it never generates full trees), and you'd want to check that fuzzer. In the particular test that needs non-full trees, to test something on them (perhaps some kind of insert operation), you would mandate this distribution of trees from your fuzzer:

Test.fuzzWith
  { runs = 10000
  , coverage = Test.expectCoverage [ (Test.zero, "full", Tree.isFull) ]
  }
  nonFullTreeFuzzer
  "insert does XYZ"
  <| \tree -> ...

Is it too convoluted as an example usecase for Zero?

@gampleman (Contributor) commented Jul 29, 2022

I'm not sure why you wouldn't just use Expect.fail...

Why not put the categorisation into Expect? Right now it's in Test, which is kind of weird since it's only accessible through fuzzWith, and that makes it more awkward to use with fuzz2 and friends (of course the more knowledgeable might work out that fuzz2 can be achieved with Fuzz.tuple, but it's an extra conceptual step).

The API, I would think, would just be Expect.label : String -> Expectation -> Expectation; the asserting version could then be the same, but without the need to have the classification functions.

I think this is sensible, since the main usage would anyway be:

myTest = Test.fuzz myfuzzer "custom type" <|
   \data ->
        case data of
            Foo xyzzy ->
                foo xyzzy
                       |> Expect.greaterThan 30
                       |> Expect.label "foo"
            Bar [single] ->
                    ...
                       |> Expect.label "single bar"

etc.

Edit: then reportCoverage could just be kind of like skip, i.e. a little toggleable thingy you can tack on to a test when you're working on it...

@Janiczek (Collaborator) commented Jul 29, 2022

With the Expect.label : String -> Expectation -> Expectation version, I wonder whether there are cases where you'd want to label a value conditionally in a way that doesn't fully mirror the conditions the expectation itself branches on. Would we be locking ourselves out of some potentially useful expressiveness?

I guess you could still do

myTest = Test.fuzz myfuzzer "custom type" <|
   \data ->
       (case data of
            Foo xyzzy ->
                foo xyzzy
                       |> Expect.greaterThan 30
                       |> Expect.label "foo"
            Bar [single] ->
                    ...
                       |> Expect.label "single bar"
        )
        |> Expect.labelIf isBleh "bleh"
        |> Expect.labelIf isBlah "blah"

at the end of the test; it just isn't very nice with the need to parenthesize.
The other way round, with the labelIf at the beginning and <|, it adds indentation, which is also not very beautiful.

I'm guessing the Test and Expect ways to do this are roughly equivalent, but:

  • Expect
    ➕ allows you to label "contextually" in the condition paths created for the purpose of the expectation itself
    ➕ reportCoverage is simpler: Test.reportCoverage <| Test.fuzz ...
    ➖ conditional labeling that doesn't map to expectation conditions needs parentheses or indentation
  • Test
    ➖ touches the FuzzOptions record (more things to upgrade for folks)
    ➕ conditional labeling doesn't get in your way syntax-wise

Anything that I'm missing?

@gampleman (Contributor) commented

I'm wondering if we don't need to experiment with this feature in practice to get enough perspective to decide this correctly. I'm not confident there's a good use case for labelling "independently", since the purpose is to explore different code paths, but perhaps the conditionals are inside the code you are testing and it's more convenient to do the categorisation outside.

I think another disadvantage of the way it's set up in Test is that it doesn't naturally lead to exhaustiveness checking by the compiler, unlike a more standard implementation using a case statement.

Furthermore I think that ergonomically I’d like to be able to leave the logic of labelling in the code, but have an easy way to turn on/off reporting.
