
Update elm-test-rs to v2.0 and add smoke tests #36

Merged
merged 12 commits on Apr 13, 2022

Conversation

jiegillet
Contributor

Solves #35.

This seems a little too simple :)

I'm a little bit wary of merging this now because we don't have any smoke tests. Maybe I should add some?
In any case, I would like to get started on tagging the concept exercise tests first; this will help with testing (at the very least locally, or better, in the CI).

@jiegillet added the x:module/test-runner (Work on Test Runners) and x:size/tiny (Tiny amount of work) labels on Mar 26, 2022
@jiegillet requested a review from a team as a code owner on March 26, 2022 at 12:08
@mpizenberg
Member

I'm not super familiar with testing terminology. What does "smoke test" correspond to? In any case, we don't have any tests at all except some for the code extractor done by Cedd. So some more end-to-end tests would be awesome. I mean tests that take input directories, run the whole test runner and check that the output files are as expected.

@jiegillet
Contributor Author

Yes, what you are describing is exactly what smoke tests are: high-level sanity checks that things actually work or build.

OK, I think I will add some then.

There is another type of test that we could run: in the Elixir test runner, we added the Elixir track repo as a git submodule, and we run the example/exemplar solution of each exercise in CI and check that they all pass. We don't need to update the submodule in the actual repo; it is updated in CI every time. What do you think of that option?

I might also add a check that the test code extractor extracts exactly as many code snippets as there are test " occurrences in the test files. I have a script like that locally. This also requires a git submodule.

@mpizenberg
Member

Do we need to add a submodule for that? Since we can clone things directly in CI, it might not be needed, except if you want to be able to run the tests locally, I guess. But even then, I don't see why we'd need one if we can use git directly in the test script?

I'm curious to see how you set this up anyway!

Contributor

@ceddlyburge left a comment

I ran this locally and it worked for me (apart from a readlink error on my Mac, which is a pre-existing problem and doesn't affect production).

@jiegillet
Contributor Author

Do we need to add a submodule for that? Since we can clone things directly in CI, it might not be needed, except if you want to be able to run the tests locally, I guess. But even then, I don't see why we'd need one if we can use git directly in the test script?

Yes, I guess that's true; I never realized that. I think we initially set it up that way in Elixir because we thought we would need it locally, but it ended up living in CI only.
Maybe I'll give that a go in a different PR.

I'm curious to see how you set this up anyway!

The CI file is here.
The testing script is here.

@ErikSchierboom
Member

Adding smoke tests is definitely something you should do. All the test runners I built have them, and they've been invaluable. See https://github.com/exercism/generic-test-runner/ for an example of how we've done this.

@mpizenberg
Member

@jiegillet I've had a look at the Elixir setup. I'm a bit puzzled by the test_all_exercises.sh script used in CI. I didn't see how it uses the test runner Docker setup. As I understand it, it's just testing all exercises in normal conditions, like what we do in the track tests. If you could point me to the part that does this in the actual test runner, that would help.

Regarding the smoke tests, I think I prefer the generic setup from Erik, where the folder names make explicit what we expect the smoke test to be, like example-partial-fail.

Also, having exact expected outputs will be too time-consuming to maintain, in my opinion. I'd rather have smoke tests checking some properties: for example, that we have the expected status, the expected number of tests, that all test names are strings present in the test file, etc.

In the end, checking those properties on both the track exercises and a few purposely failing exercises would be enough in my opinion to have some confidence that we are not breaking everything.

What do you think?

@jiegillet
Contributor Author

jiegillet commented Mar 29, 2022

Right, those tests run in the CI, not in Docker. If you look at the end of the CI files, there are also smoke tests in Docker (which call this script); it's pretty much the same thing I added to the Elm analyzer, and what I plan on adding here as well.

I will add a couple of failing tests as well, and check some properties like you suggested.

I was not planning on adding all exercises to the Docker test runner, but it would be quite simple to later add some simple checks on all exercises in CI.

@mpizenberg
Member

mpizenberg commented Mar 29, 2022

OK, thanks @jiegillet! The very minimum number of smoke tests will be enough then. And if you find a relatively easy way to script one or two obvious property tests that run on all exercise outputs of the test runner, that would be awesome. Otherwise, let's open an issue and leave it for another PR.

Thanks @ErikSchierboom for the approval!

Some added changes:

It now runs tests in /tmp/solution in order to use the --read-only Docker flag.

run.sh now checks for more failure types:
* if the test file doesn't exist
* if the number of tests doesn't match the number of code snippets

elm-test-rs v2.0 seems to return test results in random order, so I
changed the jq script to merge test results and code snippets in the
order of the snippets.
@jiegillet added the x:size/large (Large amount of work) label and removed the x:size/tiny (Tiny amount of work) label on Mar 30, 2022
@jiegillet
Contributor Author

jiegillet commented Mar 30, 2022

OK, this suddenly went from tiny to large, so you might want to rethink your approvals :)

Basically I added smoke tests for 11 situations in total, with hopefully explicit names in test_data.

In the process of building that, I ended up making a bunch of other changes: checking that the number of test results matches the number of code snippets, changing where the tests are run so it works in --read-only mode (and also so it can run several smoke tests in a row), and more. My last commits have some more details.

@mpizenberg I realized that elm-test-rs v2.0 doesn't export the test results in a determined order; it changes from run to run. Is that expected? It's a big deal because it ended up mixing up the test results and code snippets.
I changed the jq merging script so that the final result keeps the order of the code snippets.

Another question: on the website, it says that Debug.log won't log constants, but that doesn't seem to be the case anymore with v2.0, right? We should change that description (and maybe also insist that passing tests don't produce logs; it is mentioned, but I missed it and was confused for a bit).

@jiegillet changed the title from "Update elm-test-rs to v2.0" to "Update elm-test-rs to v2.0 and add smoke tests" on Mar 30, 2022
@mpizenberg
Member

@mpizenberg I realized that elm-test-rs v2.0 doesn't export the test results in a determined order; it changes from run to run. Is that expected? It's a big deal because it ended up mixing up the test results and code snippets.
I changed the jq merging script so that the final result keeps the order of the code snippets.

@jiegillet so here is a somewhat complete answer to the question "in which order are the tests?".

The reporter has the following interface:

Result String SeededRunners.Kind -> Array TestResult -> Maybe String

So the order depends on the order of that array of test results, which is built by pushing new results as they arrive. So the order is the order in which the tests terminate. Therefore, one requirement with the current setting to have a stable order is to always run with 1 worker (--workers 1), because multiple workers might run tests concurrently. But this is not enough: we also need the dispatch order within a worker to be consistent. And that order is fully determined by the order in which the tests are stored within the runner model, which is obtained via the call SeededRunners.fromTest concatenatedTest flags. And that function from my package calls the function Test.Runner.fromTest from the elm-explorations/test package, which generates the List of tests that I'm converting into an array.
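
In other words, the reporter's array is filled roughly like this as results come in (a simplified sketch of the idea, not the actual elm-test-runner code):

import Array exposing (Array)

pushResult : result -> Array result -> Array result
pushResult newResult results =
    -- results are appended in completion order, which is why the report
    -- order depends on when each test terminates
    Array.push newResult results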

So ultimately, it depends on how the elm-explorations/test package builds its list of tests for each top-level Test in our test files. It also depends on the order in which elm-test-rs orders each top-level Test it finds in the test files, but in our case for exercism, we only have a single top-level describe test, so that doesn't matter. It could matter if we split the main describe into multiple top-level definitions as follows:

suite = describe "TestFile" [ test1, test2 ]
test1 = describe "test1" ...
test2 = describe "test2" ...

But I don't know how the code extractor would behave with that. By the way, we should add the info that a test file must have a specific structure in the docs somewhere, in addition to the "no fuzz" and "string literals" test descriptions.

Anyway, to sum up, I believe that if we use --workers 1 with our setup, the order should stay consistent.

But does it really matter, since we are merging the json report of elm-test-rs with the code retrieved by the extractor? Aren't they merged by key, where the key is the name property of each test? If not, that's a mistake or a forgotten task on our part...

@mpizenberg
Member

mpizenberg commented Mar 30, 2022

Another question: on the website, it says that Debug.log won't log constants, but that doesn't seem to be the case anymore with v2.0, right? We should change that description (and maybe also insist that passing tests don't produce logs; it is mentioned, but I missed it and was confused for a bit).

It's still right: Debug.log can log constants, but it cannot log anything while being used in a constant, because those are evaluated when node opens the compiled js file, resulting in a runtime exception that makes it crash.

constant : Int
constant = Debug.log "this will crash" 42

function : Int -> Int
function _ =
    Debug.log "ok to log this constant" 42

We might have to change the wording; let me know if that's clearer for you. Also, yes, passing tests do not produce logs; it might be worth making that clearer.

BIG EDIT

Sorry, I mixed up Debug.log and Debug.todo in my answer. It's Debug.todo that makes the program crash, not Debug.log. But for the same reason that Debug.todo crashes, Debug.log cannot be captured and attributed to the correct test when used inside a constant. This is because the log is called when node initializes the js file and not when the specific test calling that constant is run. So the log happens before any of the tests are run, and we can't put it back in the correct test entry in the json output.

For example, the following test:

suite : Test
suite =
    Test.describe "Question"
        [ Test.test (Debug.log "is this captured?" "wrong answer") <|
            \_ ->
                Question.answer "What is the Answer to the Ultimate Question of Life, The Universe, and Everything?"
                    |> Expect.equal 42
        ]

will produce the following output when calling elm-test-rs:

Debug logs captured when setting up tests: -----------

is this captured?: "wrong answer"

------------------------------------------------------


Running 2 tests. To reproduce these results later,
run elm-test-rs with --seed 1103062080 and --fuzz 100

↓ Question
✗ wrong answer

    43
    ╷
    │ Expect.equal
    ╵
    42

    with debug logs:

The question was: "What is the Answer to the Ultimate Question of Life, The Universe, and Everything?"


TEST RUN FAILED

Duration: 2 ms
Passed:   1
Failed:   1

As you can see, the log was not captured and reported for the test, but instead at initialization.
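
To restate the correction in code, here is how I would annotate the earlier constant example (my own comments, not an excerpt from the test runner):

constant : Int
constant =
    -- Debug.log in a constant does not crash; it fires when node loads the
    -- compiled js, so it shows up under "Debug logs captured when setting up
    -- tests" instead of being attributed to the test that uses the constant.
    Debug.log "logged at initialization" 42

-- A Debug.todo in a constant, on the other hand, is evaluated at that same
-- load time and throws, which is what actually makes the run crash.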

@mpizenberg
Member

Side note, we are not generating logs for passing tests, but we totally could, by just changing the output in this line https://github.com/mpizenberg/elm-test-runner/blob/master/src/ElmTestRunner/Reporter/Exercism.elm#L120

toExercismResult : TestResult -> ExercismResult
toExercismResult testResult =
    case testResult of
        TestResult.Passed { labels } ->
            { name = extractTestName labels
            , taskId = extractTaskId labels
            , status = "pass"
            , message = Nothing
            , output = Nothing
            }

The passed results also contain a logs field that we are discarding, but we could use it just like in the failing case. Do you think it would make more sense to always report debug logs, and not only for failing tests?
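
For illustration, reusing that field could look roughly like this (a rough sketch only; I'm assuming the logs field is a list of strings as in the failing branch, and the formatting is just an example):

toExercismResult : TestResult -> ExercismResult
toExercismResult testResult =
    case testResult of
        TestResult.Passed { labels, logs } ->
            { name = extractTestName labels
            , taskId = extractTaskId labels
            , status = "pass"
            , message = Nothing
            -- keep the captured logs instead of discarding them
            , output =
                if List.isEmpty logs then
                    Nothing
                else
                    Just (String.concat logs)
            }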

@mpizenberg
Member

In the process of building that, I ended up making a bunch of other changes

It's big! I might not be able to review today.


@jiegillet
Contributor Author

But does it really matter, since we are merging the json report of elm-test-rs with the code retrieved by the extractor? Aren't they merged by key, where the key is the name property of each test? If not, that's a mistake or a forgotten task on our part...

Thanks for the insight. Like you said, it doesn't matter now, because I'm merging by name in the order of the code snippets, but it wasn't the case before: it was just zipping the arrays as they were. This was backed by a comment:

# This rely on the fact that the order is the same in both arrays.

Reading your explanation, I'm surprised this was never an issue. Maybe I never noticed because I barely fail any test? (Haha I wish 😈😈😈)

Also, right now we can only have one test with a given name, but I have some code that is more general and could change that. Some other time though.

But for the same reason that Debug.todo crashes, Debug.log cannot be captured and attributed to the correct test when used inside a constant. This is because the log is called when node initializes the js file and not when the specific test calling that constant is run. So the log happens before any of the tests are run, and we can't put it back in the correct test entry in the json output.

OK, that clears things up for me. Maybe it's worth rewording it a bit, or using the examples you just gave.

The passed results also contain a logs field that we are discarding, but we could use it just like in the failing case. Do you think it would make more sense to always report debug logs, and not only for failing tests?

I'm not sure, I guess so? It doesn't really hurt to include them anyway. Maybe in some cases it's nice to be able to compare some internal value when a test passes and one doesn't? @ceddlyburge what do you think?
In any case, we can do that later; the only thing that would need to change here is updating the expected result of one of the smoke tests.

And as always with big PRs, I don't expect a fast review. Please take your time :)

Contributor

@ceddlyburge left a comment

I think this looks good. I share Matthieu's concern about maintainability. Some of the tests are very specific to the format / tool version. I would probably have fewer smoke tests, to just cover the happy path, and not worry about some of the corner cases, such as missing_solution, missing_tests, and things like that.

@ceddlyburge
Contributor

I'm not sure, I guess so? It doesn't really hurt to include them anyway. Maybe in some cases it's nice to be able to compare some internal value when a test passes and one doesn't? @ceddlyburge what do you think?

I have no strong opinion at the moment on whether to include the Debug.log statements even when the tests are passing. It is something we will probably learn through feedback and doing some exercises ourselves, seeing what feels best, but for now I think we can leave it as it is.

Contributor Author

@jiegillet left a comment

Some of the tests are very specific to the format / tool version. I would probably have fewer smoke tests, to just cover the happy path, and not worry about some of the corner cases, such as missing_solution, missing_tests, and things like that.

I always feel more comfortable with a full set of tests that capture most of the behavior of the code so you can detect behavior changes and make sure all changes are intended. I only found out about the inconsistent test ordering because of that.

That being said, I do agree that missing_solution and missing_tests are quite unlikely to happen on Exercism.

Which ones would you suggest I remove?

  • lucians-luscious-lasagna
    • partial_fail
    • success
    • test_issues
  • two-fer
    • all_fail
    • does_not_compile
    • missing_solution
    • missing_tests
    • partial_fail
    • success
    • test_with_non_literal_name
    • with_debug_message

Co-authored-by: Cedd Burge <[email protected]>
@ceddlyburge
Contributor

Which ones would you suggest I remove?

I think I would remove "test_with_non_literal_name". We know this isn't supported, and this test doesn't detect tests that use it.

I think I would just have one "partial_fail" and one "success" case.

Happy for you to go with your judgement though ...

@jiegillet
Contributor Author

OK for test_with_non_literal_name, but I think I would want to keep partial_fail and success for both, since what is really being tested in lucians-luscious-lasagna is that the task ids are there, pass or fail.

Contributor Author

@jiegillet left a comment

I've replied to your comments.
Basically the answers boil down to "I adapted scripts from the Elixir track because they work."
But I'm not against changing them if it helps maintainability.

Member

@mpizenberg left a comment

This is a solid improvement.

  • The test runner now uses v2 of elm-test-rs, enabling task grouping of tests.
  • The test runner is able to work in read-only mode by using /tmp for the exercise and the elm home.
  • The snippets are correctly matched to the tests.
  • There is a test suite to check that we don't break the container.
  • Plus a few other improvements.

Thank you @jiegillet !

@mpizenberg
Member

I've updated elm-test-rs to v2.0.1 with the name fixes. As a result, the smoke tests here will fail until we merge the other PR for the snippet extraction and then update the tests here.

@mpizenberg
Member

@jiegillet I think I've updated the smoke tests correctly; I'll wait for your check. Don't hesitate to merge if it looks correct.

@jiegillet
Contributor Author

Looks good to me. Let's merge :)
Thanks everyone!
