Update elm-test-rs to v2.0 and add smoke tests #36
Conversation
I'm not super familiar with test terminology. What does "smoke test" correspond to? In any case, we don't have any tests at all except some for the code extractor done by Cedd, so some more end-to-end tests would be awesome. I mean tests that take input directories, run the whole test runner and check that the output files are as expected.
Yes, what you are describing is exactly what smoke tests are: high-level sanity checks that things actually work or build. OK, I think I will add some then. There is another type of test that we could run: in the Elixir test runner we added the Elixir track repo as a git submodule, and we run the example/exemplar solution of each exercise and check that they all pass in the CI. We don't need to update the submodule in the actual repo, this is done in the CI every time. What do you think of that option? I might also add a check that the test code extractor extracts exactly as many code snippets as there are tests.
Do we need to add a submodule for that? Since we can clone things directly in CI it might not be needed, except if you want to be able to run the tests locally, I guess. But even then, I don't see why we'd need one if we can use git directly in the test script. I'm curious to see how you set this up anyway!
I ran this locally and it worked for me (apart from a readlink error on my Mac, which is a pre-existing problem and doesn't affect production).
Yes, I guess that's true, I never realized that. I think at first we set it up that way in Elixir because we initially thought we would need it locally, but it ended up living in the CI only.
Adding smoke tests is definitely something you should do. All the test runners I built have them, and they've been invaluable. See https://github.com/exercism/generic-test-runner/ for an example of how we've done this.
@jiegillet I've had a look at the elixir setup. I'm a bit puzzled by the ….

Regarding the smoke tests, I think I prefer the generic setup from Eric, where the folder names make explicit what we expect each smoke test to be, like ….

Also, having exact expected output will be too time-consuming in my opinion for maintainability. I think I'd rather have smoke tests checking some properties, for example checking that we have the expected ….

In the end, checking those properties on both the track exercises and a few purposely failing exercises would be enough in my opinion to give us some confidence that we are not breaking everything. What do you think?
Right, those tests run in the CI, not in Docker. If you look at the end of the CI files, there are also smoke tests in docker (which call this script), pretty much the same thing I added to the Elm analyzer, and that I plan on adding here as well. I will add a couple of failing tests as well, and check some properties like you suggest. I was not planning on adding all exercises to the docker test runner, but it would be quite simple to add some simple checks on all exercises in the CI later.
Ok thanks @jiegillet! The very minimum number of smoke tests will be enough then. And if you find a relatively easy way to script one or two obvious property tests that run on the test runner's outputs for all exercises, that would be awesome. Otherwise let's open an issue and leave it for another PR. Thanks @ErikSchierboom for the approval!
Some added changes:
- It now runs tests in /tmp/solution in order to use the --read-only docker flag.
- run.sh now checks for more failure types:
  - if the test file doesn't exist
  - if the number of tests doesn't match the number of code snippets
- elm-test-rs v2.0 seems to return test results in random order, so I changed the jq script to merge test results and code snippets in the order of the snippets.
OK, this suddenly went from tiny to large, so you might want to rethink your approvals :) Basically I added smoke tests for 11 situations in total, with hopefully explicit names in ….

In the process of building that, I ended up making a bunch of other changes: checking that the number of test results matches the number of code snippets, and changing where the tests are run so it works with the --read-only docker flag.

@mpizenberg I realized that elm-test-rs v2.0 doesn't export the test results in a deterministic order, it changes from run to run; is that expected? It's a big deal because it ended up mixing up the test results and code snippets.

Another question: on the website, it says that …
@jiegillet so here is a somewhat complete answer to the question "in which order are the tests?". The reporter has the following interface:

```elm
Result String SeededRunners.Kind -> Array TestResult -> Maybe String
```

So the order depends on the order of that array of test results, which is built by pushing new results as they arrive. So the order is the order in which the tests terminate. Therefore, one requirement with the current setting to have the order stable is to always run with 1 worker (…).

So ultimately, it depends on how the …:

```elm
suite = describe "TestFile" [ test1, test2 ]

test1 = describe "test1" ...

test2 = describe "test2" ...
```

But I don't know how the code extractor would behave with that. By the way, we should add the info that a test file must have a specific structure in the docs somewhere, in addition to the "no fuzz" and "string literals" test descriptions.

Anyway, to sum up, I believe that if we use …. But does it really matter, since we are merging the json report of elm-test-rs with the code retrieved by the extractor? Aren't they merged by key, where the key is the …?
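For reference, a minimal, self-contained sketch of that single top-level describe layout (module and test names here are illustrative, not taken from the track):

```elm
module Tests exposing (suite)

import Expect
import Test exposing (Test, describe, test)


-- One top-level describe wraps every test; test1 and test2 appear
-- in a fixed, declared order inside it.
suite : Test
suite =
    describe "TestFile"
        [ test1
        , test2
        ]


test1 : Test
test1 =
    describe "test1"
        [ test "one equals one" (\_ -> Expect.equal 1 1) ]


test2 : Test
test2 =
    describe "test2"
        [ test "two equals two" (\_ -> Expect.equal 2 2) ]
```

Per the discussion above, the report order would still depend on completion order unless the tests are run with a single worker.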
It's still right, …:

```elm
constant : Int
constant = Debug.log "this will crash" 42

function : Int -> Int
function _ =
    Debug.log "ok to log this constant" 42
```

We might have to change the wording, let me know if that's clearer for you. Also yes, passing tests do not produce logs, it might be worth making that clearer.

BIG EDIT Sorry, I mixed up …. For example, the following test:

```elm
suite : Test
suite =
    Test.describe "Question"
        [ Test.test (Debug.log "is this captured?" "wrong answer") <|
            \_ ->
                Question.answer "What is the Answer to the Ultimate Question of Life, The Universe, and Everything?"
                    |> Expect.equal 42
        ]
```

will produce the following output when calling …:

```
Debug logs captured when setting up tests: -----------
is this captured?: "wrong answer"
------------------------------------------------------

Running 2 tests. To reproduce these results later,
run elm-test-rs with --seed 1103062080 and --fuzz 100

↓ Question
✗ wrong answer

43
╷
│ Expect.equal
╵
42

with debug logs:
The question was: "What is the Answer to the Ultimate Question of Life, The Universe, and Everything?"

TEST RUN FAILED

Duration: 2 ms
Passed: 1
Failed: 1
```

As you can see, the log was not captured and reported for the test, but instead at initialization.
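To restate that distinction with a small illustrative sketch (the module and test names below are made up, not code from this PR):

```elm
module DebugLogPlacement exposing (bodyLogged, setupLogged)

import Expect
import Test exposing (Test)


-- The log argument here is evaluated while *building* the test tree, so it
-- shows up under "Debug logs captured when setting up tests", not under the test.
setupLogged : Test
setupLogged =
    Test.test (Debug.log "evaluated at setup" "setup log example") <|
        \_ -> Expect.pass


-- This log runs inside the test body, so it is captured per test
-- (and, per the discussion above, only surfaced in the report when the test fails).
bodyLogged : Test
bodyLogged =
    Test.test "body log example" <|
        \_ ->
            Debug.log "evaluated while running" 42
                |> Expect.equal 42
```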
Side note, we are not generating logs for passing tests, but we totally could, by just changing the …:

```elm
toExercismResult : TestResult -> ExercismResult
toExercismResult testResult =
    case testResult of
        TestResult.Passed { labels } ->
            { name = extractTestName labels
            , taskId = extractTaskId labels
            , status = "pass"
            , message = Nothing
            , output = Nothing
            }
```

The passed results also contain a ….
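For illustration, a hedged sketch of what that change could look like; it assumes the `Passed` record also exposes a `logs : List String` field (which the truncated sentence above seems to refer to) and reuses the runner's existing `extractTestName`/`extractTaskId` helpers:

```elm
-- Sketch only: assumes TestResult.Passed carries { labels, logs } and keeps
-- the same ExercismResult shape as the existing branch above.
TestResult.Passed { labels, logs } ->
    { name = extractTestName labels
    , taskId = extractTaskId labels
    , status = "pass"
    , message = Nothing
    , output =
        if List.isEmpty logs then
            Nothing

        else
            Just (String.join "\n" logs)
    }
```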
It's big! I might not be able to review today.
Thanks for the insight. Like you said, it doesn't matter now, because I'm merging by ….

Reading your explanation, I'm surprised this was never an issue. Maybe I never noticed because I barely fail any test? (Haha, I wish 😈😈😈) Also now, we can only have one test called ….
OK, that clears things up for me. Maybe it's worth rewording it a bit, or reusing the examples you just gave.
I'm not sure, I guess so? It doesn't really hurt to include them anyway. Maybe in some cases it's nice to be able to compare some internal value when one test passes and another doesn't? @ceddlyburge what do you think? And as always with big PRs, I don't expect a fast review. Please take your time :)
I think this looks good. I share Matthieu's concern about maintainability. Some of the tests are very specific to format / tool version. I would probably have a few fewer smoke tests, to just cover the happy path, and not worry about some of the corner cases, such as missing_solution, missing_tests, and things like that.

I think I have no strong opinion at the moment on whether to include the Debug.log statements even when the tests are passing. I think it is something we will probably learn through feedback / doing some exercises ourselves and seeing what feels best, but for now I think we could leave it as it is.
> Some of the tests are very specific to format / tool version. I would probably have a few fewer smoke tests, to just cover the happy path, and not worry about some of the corner cases, such as missing_solution, missing_tests, and things like that.
I always feel more comfortable with a full set of tests that capture most of the behavior of the code so you can detect behavior changes and make sure all changes are intended. I only found out about the inconsistent test ordering because of that.
That being said, I do agree that missing_solution, missing_tests are quite unlikely to happen on Exercism.
Which ones would you suggest I remove?
- lucians-luscious-lasagna
- partial_fail
- success
- test_issues
- two-fer
- all_fail
- does_not_compile
- missing_solution
- missing_tests
- partial_fail
- success
- test_with_non_literal_name
- with_debug_message
I think I would remove "test_with_non_literal_name". We know this isn't supported, and this test doesn't detect tests that use it. I think I would just have one "partial_fail" and one "success" case. Happy for you to go with your judgement though ...
OK for test_with_non_literal_name, but I think I would want to keep partial_fail and success for both, since what is really being tested in lucians-luscious-lasagna is that the task ids are there, pass or fail.
I've replied to your comments.
Basically the answers boil down to "I adapted scripts from the Elixir track because they work."
But I'm not against changing them if it helps maintainability.
This is a solid improvement.
- The test runner now uses v2 of elm-test-rs, enabling task grouping of tests.
- The test runner is able to work in read-only mode by utilizing temp/ for the exercise and the elm home.
- The snippets are correctly matched to the tests.
- There is a test suite to check that we don't break the container.
- Plus a few other improvements
Thank you @jiegillet !
I've updated elm-test-rs to v2.0.1 with the name fixes. Logically, the smoke tests here will fail until we merge the other PR for the snippet extraction and then update the tests here.
@jiegillet I think I've updated the smoke tests correctly, I'll wait for your check. Don't hesitate to do the merge if it sounds correct.
Looks good to me. Let's merge :)
Solves #35.
This seems a little too simple :)
I'm a little bit wary of merging this now because we don't have any smoke tests. Maybe I should add some?
In any case, I would like to get started on tagging the concept exercise tests first, this will help with testing (at the very least locally, or better in the CI).