
Update elm-test-rs to v2.0 and add smoke tests #36

Merged
merged 12 commits on Apr 13, 2022

Conversation

jiegillet
Contributor

Solves #35.

This seems a little too simple :)

I'm a little bit wary of merging this now because we don't have any smoke tests. Maybe I should add some?
In any case, I would like to get started on tagging the concept exercise tests first; this will help with testing (at the very least locally, or better, in the CI).

@jiegillet added the x:module/test-runner (Work on Test Runners) and x:size/tiny (Tiny amount of work) labels on Mar 26, 2022
@jiegillet requested a review from a team as a code owner on March 26, 2022 at 12:08
@mpizenberg
Member

I'm not super familiar with testing terminology. What does "smoke test" correspond to? In any case, we don't have any tests at all except some for the code extractor done by Cedd. So some more end-to-end tests would be awesome. I mean tests that take input directories, run the whole test runner and check that the output files are as expected.

@jiegillet
Contributor Author

Yes, what you are describing is exactly what smoke tests are: high-level sanity checks that things actually work or build.

OK, I think I will add some then.

There is another type of test that we could run: in the Elixir test runner, we added the Elixir track repo as a git submodule, and we run the example/exemplar solution of each exercise in CI and check that they all pass. We don't need to update the submodule in the actual repo; it is updated in CI every time. What do you think of that option?

I might also add a check that the test code extractor extracts exactly as many code snippets as there are test " occurrences in the test files. I have a script like that locally. This also requires a git submodule.

@mpizenberg
Member

Do we need to add a submodule for that? Since we can clone things directly in CI, it might not be needed, except if you want to be able to run the tests locally, I guess. But even then, I don't see why we'd need one if we can use git directly in the test script?

I'm curious to see how you set this up anyway!

Contributor

@ceddlyburge left a comment

I ran this locally and it worked for me (apart from a readlink error on my Mac, which is a pre-existing problem and doesn't affect production).

@jiegillet
Contributor Author

Do we need to add a submodule for that? Since we can clone things directly in CI, it might not be needed, except if you want to be able to run the tests locally, I guess. But even then, I don't see why we'd need one if we can use git directly in the test script?

Yes, I guess that's true; I never realized that. I think we initially set it up that way in Elixir because we thought we would need it locally, but it ended up living in CI only.
Maybe I'll give that a go in a different PR.

I'm curious to see how you set this up anyway!

The CI file is here.
The testing script is here.

@ErikSchierboom
Member

Adding smoke tests is definitely something you should do. All the test runners I built have them, and they've been invaluable. See https://github.com/exercism/generic-test-runner/ for an example of how we've done this.

@mpizenberg
Member

@jiegillet I've had a look at the Elixir setup. I'm a bit puzzled by the test_all_exercises.sh script used in CI. I didn't see how it uses the test runner Docker setup. As I understand it, it's just testing all exercises in normal conditions, like what we do in the track tests. If you could point me to the part that does this in the actual test runner, that would help.

Regarding the smoke tests, I think I prefer the generic setup from Erik, where the folder names make explicit what we expect the smoke test to be, like example-partial-fail.

Also, having exact expected outputs will be too time-consuming to maintain, in my opinion. I'd rather have smoke tests checking some properties: for example, that we have the expected status, the expected number of tests, that all test names are strings present in the test file, etc.

In the end, checking those properties on both the track exercises and a few purposely failing exercises would be enough in my opinion to have some confidence that we are not breaking everything.

What do you think?

@jiegillet
Contributor Author

jiegillet commented Mar 29, 2022

Right, those tests run in the CI, not in Docker. If you look at the end of the CI files, there are also smoke tests in Docker (which call this script); it's pretty much the same thing I added to the Elm analyzer, and what I plan on adding here as well.

I will add a couple of failing tests as well, and check some properties like you suggested.

I was not planning on adding all exercises to the Docker test runner, but it would be quite simple to later add some simple checks on all exercises in CI.

@mpizenberg
Member

mpizenberg commented Mar 29, 2022

OK, thanks @jiegillet! The very minimum number of smoke tests will be enough then. And if you find a relatively easy way to script one or two obvious property tests that run on all exercise outputs of the test runner, that would be awesome. Otherwise, let's open an issue and leave it for another PR.

Thanks @ErikSchierboom for the approval!

Some added changes:

It now runs tests in /tmp/solution in order to use the --read-only Docker flag.

run.sh now checks for more failure types:
* if the test file doesn't exist
* if the number of tests doesn't match the number of code snippets

elm-test-rs v2.0 seems to return test results in random order, so I
changed the jq script to merge test results and code snippets in the
order of the snippets.
@jiegillet added the x:size/large (Large amount of work) label and removed the x:size/tiny (Tiny amount of work) label on Mar 30, 2022
@jiegillet
Contributor Author

jiegillet commented Mar 30, 2022

OK, this suddenly went from tiny to large, so you might want to rethink your approvals :)

Basically I added smoke tests for 11 situations in total, with hopefully explicit names in test_data.

In the process of building that, I ended up making a bunch of other changes: checking that the number of test results matches the number of code snippets, changing where the tests are run so it works in --read-only mode (and also so it can run several smoke tests in a row), and more. My last commits have some more details.

@mpizenberg I realized that elm-test-rs v2.0 doesn't export the test results in a determined order; it changes from run to run. Is that expected? It's a big deal because it ended up mixing up the test results and code snippets.
I changed the jq merging script so that the final result keeps the order of the code snippets.

Another question: on the website, it says that Debug.log won't log constants, but that doesn't seem to be the case anymore with v2.0, right? We should change that description (and maybe also insist that passing tests don't produce logs; it is mentioned, but I missed it and was confused for a bit).

@jiegillet changed the title from "Update elm-test-rs to v2.0" to "Update elm-test-rs to v2.0 and add smoke tests" on Mar 30, 2022
@mpizenberg
Member

@mpizenberg I realized that elm-test-rs v2.0 doesn't export the test results in a determined order; it changes from run to run. Is that expected? It's a big deal because it ended up mixing up the test results and code snippets.
I changed the jq merging script so that the final result keeps the order of the code snippets.

@jiegillet so here is a somewhat complete answer to the question "in which order are the tests?".

The reporter has the following interface:

Result String SeededRunners.Kind -> Array TestResult -> Maybe String

So the order depends on the order of that array of test results, which is built by pushing new results as they arrive. So the order is the order in which the tests terminate. Therefore, one requirement with the current setting to have a stable order is to always run with 1 worker (--workers 1), because multiple workers might run tests concurrently. But this is not enough: we also need the dispatch order within a worker to be consistent. And that order is fully determined by the order in which the tests are stored within the runner model, which is obtained via the call SeededRunners.fromTest concatenatedTest flags. And that function from my package calls the function Test.Runner.fromTest from the elm-explorations/test package, which generates the List of tests that I'm converting into an array.
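
In other words, the reporter's array is filled roughly like this as results come in (a simplified sketch of the idea, not the actual elm-test-runner code):

import Array exposing (Array)

pushResult : result -> Array result -> Array result
pushResult newResult results =
    -- results are appended in completion order, which is why the report
    -- order depends on when each test terminates
    Array.push newResult results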

So ultimately, it depends on how the elm-explorations/test package builds its list of tests for each top-level Test in our test files. It also depends on the order in which elm-test-rs orders each top-level Test it finds in the test files, but in our case for exercism, we only have a single top-level describe test, so that doesn't matter. It could matter if we split the main describe into multiple top-level definitions as follows:

suite = describe "TestFile" [ test1, test2 ]
test1 = describe "test1" ...
test2 = describe "test2" ...

But I don't know how the code extractor would behave with that. By the way, we should add the info that a test file must have a specific structure in the docs somewhere, in addition to the "no fuzz" and "string literals" test descriptions.

Anyway, to sum up, I believe that if we use --workers 1 with our setup, the order should stay consistent.

But does it really matter, since we are merging the json report of elm-test-rs with the code retrieved by the extractor? Aren't they merged by key, where the key is the name property of each test? If not, that's a mistake or a forgotten task on our part...

@mpizenberg
Member

mpizenberg commented Mar 30, 2022

Another question: on the website, it says that Debug.log won't log constants, but that doesn't seem to be the case anymore with v2.0, right? We should change that description (and maybe also insist that passing tests don't produce logs; it is mentioned, but I missed it and was confused for a bit).

It's still right: Debug.log can log constants, but it cannot log anything while being used in a constant, because those are evaluated when node opens the compiled js file, resulting in a runtime exception that makes it crash.

constant : Int
constant = Debug.log "this will crash" 42

function : Int -> Int
function _ =
    Debug.log "ok to log this constant" 42

We might have to change the wording; let me know if that's clearer for you. Also, yes, passing tests do not produce logs; it might be worth making that clearer.

BIG EDIT

Sorry, I mixed up Debug.log and Debug.todo in my answer. It's Debug.todo that makes the program crash, not Debug.log. But for the same reason that Debug.todo crashes, Debug.log cannot be captured and attributed to the correct test when used inside a constant. This is because the log is called when node initializes the js file and not when the specific test calling that constant is run. So the log happens before any of the tests are run, and we can't put it back in the correct test entry in the json output.

For example, the following test:

suite : Test
suite =
    Test.describe "Question"
        [ Test.test (Debug.log "is this captured?" "wrong answer") <|
            \_ ->
                Question.answer "What is the Answer to the Ultimate Question of Life, The Universe, and Everything?"
                    |> Expect.equal 42
        ]

will produce the following output when calling elm-test-rs:

Debug logs captured when setting up tests: -----------

is this captured?: "wrong answer"

------------------------------------------------------


Running 2 tests. To reproduce these results later,
run elm-test-rs with --seed 1103062080 and --fuzz 100

↓ Question
✗ wrong answer

    43
    ╷
    │ Expect.equal
    ╵
    42

    with debug logs:

The question was: "What is the Answer to the Ultimate Question of Life, The Universe, and Everything?"


TEST RUN FAILED

Duration: 2 ms
Passed:   1
Failed:   1

As you can see, the log was not captured and reported for the test, but instead at initialization.
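
To restate the correction in code, here is how I would annotate the earlier constant example (my own comments, not an excerpt from the test runner):

constant : Int
constant =
    -- Debug.log in a constant does not crash; it fires when node loads the
    -- compiled js, so it shows up under "Debug logs captured when setting up
    -- tests" instead of being attributed to the test that uses the constant.
    Debug.log "logged at initialization" 42

-- A Debug.todo in a constant, on the other hand, is evaluated at that same
-- load time and throws, which is what actually makes the run crash.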

@mpizenberg
Member

Side note, we are not generating logs for passing tests, but we totally could, by just changing the output in this line https://github.com/mpizenberg/elm-test-runner/blob/master/src/ElmTestRunner/Reporter/Exercism.elm#L120

toExercismResult : TestResult -> ExercismResult
toExercismResult testResult =
    case testResult of
        TestResult.Passed { labels } ->
            { name = extractTestName labels
            , taskId = extractTaskId labels
            , status = "pass"
            , message = Nothing
            , output = Nothing
            }

The passed results also contain a logs field that we are discarding, but we could use it just like in the failing case. Do you think it would make more sense to always report debug logs, and not only for failing tests?
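
For illustration, reusing that field could look roughly like this (a rough sketch only; I'm assuming the logs field is a list of strings as in the failing branch, and the formatting is just an example):

toExercismResult : TestResult -> ExercismResult
toExercismResult testResult =
    case testResult of
        TestResult.Passed { labels, logs } ->
            { name = extractTestName labels
            , taskId = extractTaskId labels
            , status = "pass"
            , message = Nothing
            -- keep the captured logs instead of discarding them
            , output =
                if List.isEmpty logs then
                    Nothing
                else
                    Just (String.concat logs)
            }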

@mpizenberg
Member

In the process of building that, I ended up making a bunch of other changes

It's big! I might not be able to review today.


@jiegillet
Contributor Author

But does it really matter, since we are merging the json report of elm-test-rs with the code retrieved by the extractor? Aren't they merged by key, where the key is the name property of each test? If not, that's a mistake or a forgotten task on our part...

Thanks for the insight. Like you said, it doesn't matter now, because I'm merging by name in the order of the code snippets, but it wasn't the case before: it was just zipping the arrays as they were. This was backed by a comment:

# This rely on the fact that the order is the same in both arrays.

Reading your explanation, I'm surprised this was never an issue. Maybe I never noticed because I barely fail any test? (Haha I wish 😈😈😈)

Also, right now we can only have one test with a given name, but I have some code that is more general and could change that. Some other time though.

But for the same reason that Debug.todo crashes, Debug.log cannot be captured and attributed to the correct test when used inside a constant. This is because the log is called when node initializes the js file and not when the specific test calling that constant is run. So the log happens before any of the tests are run, and we can't put it back in the correct test entry in the json output.

OK, that clears things up for me. Maybe it's worth rewording it a bit, or using the examples you just gave.

The passed results also contain a logs field that we are discarding, but we could use it just like in the failing case. Do you think it would make more sense to always report debug logs, and not only for failing tests?

I'm not sure, I guess so? It doesn't really hurt to include them anyway. Maybe in some cases it's nice to be able to compare some internal value when a test passes and one doesn't? @ceddlyburge what do you think?
In any case, we can do that later; the only thing that would need to change here is updating the expected result of one of the smoke tests.

And as always with big PRs, I don't expect a fast review. Please take your time :)

Contributor

@ceddlyburge left a comment

I think this looks good. I share Matthieu's concern about maintainability. Some of the tests are very specific to the format / tool version. I would probably have fewer smoke tests, to just cover the happy path, and not worry about some of the corner cases, such as missing_solution, missing_tests, and things like that.

@ceddlyburge
Contributor

I'm not sure, I guess so? It doesn't really hurt to include them anyway. Maybe in some cases it's nice to be able to compare some internal value when a test passes and one doesn't? @ceddlyburge what do you think?

I have no strong opinion at the moment on whether to include the Debug.log statements even when the tests are passing. It is something we will probably learn through feedback and doing some exercises ourselves, seeing what feels best, but for now I think we can leave it as it is.

Contributor Author

@jiegillet left a comment

Some of the tests are very specific to the format / tool version. I would probably have fewer smoke tests, to just cover the happy path, and not worry about some of the corner cases, such as missing_solution, missing_tests, and things like that.

I always feel more comfortable with a full set of tests that capture most of the behavior of the code so you can detect behavior changes and make sure all changes are intended. I only found out about the inconsistent test ordering because of that.

That being said, I do agree that missing_solution and missing_tests are quite unlikely to happen on Exercism.

Which ones would you suggest I remove?

  • lucians-luscious-lasagna
    • partial_fail
    • success
    • test_issues
  • two-fer
    • all_fail
    • does_not_compile
    • missing_solution
    • missing_tests
    • partial_fail
    • success
    • test_with_non_literal_name
    • with_debug_message

Co-authored-by: Cedd Burge <[email protected]>
@ceddlyburge
Contributor

Which ones would you suggest I remove?

I think I would remove "test_with_non_literal_name". We know this isn't supported, and this test doesn't detect tests that use it.

I think I would just have one "partial_fail" and one "success" case.

Happy for you to go with your judgement though ...

@jiegillet
Contributor Author

OK for test_with_non_literal_name, but I think I would want to keep partial_fail and success for both, since what is really being tested in lucians-luscious-lasagna is that the task ids are there, pass or fail.

Contributor Author

@jiegillet left a comment

I've replied to your comments.
Basically the answers boil down to "I adapted scripts from the Elixir track because they work."
But I'm not against changing them if it helps maintainability.

Member

@mpizenberg left a comment

This is a solid improvement.

  • The test runner now uses v2 of elm-test-rs, enabling task grouping of tests.
  • The test runner is able to work in read-only mode by using /tmp for the exercise and the elm home.
  • The snippets are correctly matched to the tests.
  • There is a test suite to check that we don't break the container.
  • Plus a few other improvements.

Thank you @jiegillet !

@mpizenberg
Member

I've updated elm-test-rs to v2.0.1 with the name fixes. As a result, the smoke tests here will fail until we merge the other PR for the snippet extraction and then update the tests here.

@mpizenberg
Member

@jiegillet I think I've updated the smoke tests correctly; I'll wait for your check. Don't hesitate to merge if it looks correct.

@jiegillet
Contributor Author

Looks good to me. Let's merge :)
Thanks everyone!
