
Larger tests become flaky #114

Closed
Nereboss opened this issue May 13, 2024 · 68 comments
Labels
bug Something isn't working

Comments

@Nereboss

Describe the bug

Tests for a function that calls multiple extension methods become flaky.

In our application we have multiple parsers, each of which has a dialog consisting of multiple questions.
All questions are tested separately from one another, and no test there has ever been flaky.

Now the dialog of a parser is one function that calls multiple questions, a bit like this:

Dialog {
    result1 = question1()
    result2 = question2()
    return result1 + result2
}

However, the tests for that dialog function become flaky, and it looks like the terminal is somehow "swallowing keys", with the first keys of the user input missing.

To Reproduce

I wrote a minimal example to make this issue reproducible:

We have extension functions for the different questions like this

fun Session.question1(callback: suspend RunScope.() -> Unit): String {
    var userInput = "-1"
    section {
        textLine("render some text and then display the input:")
        input()
    }.runUntilSignal {
        onInputEntered {
            signal()
            userInput = input
        }
        callback()
    }
    return userInput
}

Tests for this function alone are not flaky, here is a test to try it out:

@Test
fun `does question1 ever become flaky`() {
    val testString = "Long input that should test if the length has any effect on anything"
    var result: String
    testSession { terminal ->
        result = question1(callback = {
            terminal.type(testString)
            terminal.press(Keys.ENTER)
        })
        assertThat(result).isEqualTo(testString)
    }
}

Now we build a dialog from multiple questions with a function like this (the code of question2 is identical to the question1 shown above):

fun Session.dialog(
    callback: suspend RunScope.() -> Unit = {},
): String {
    val result1: String = question1(callback)
    val result2: String = question2(callback)

    return "$result1|||$result2"
}

Tests for this dialog function are flaky with the start of the simulated input being cut off (more often for the 2nd question than the first but it does happen for the first as well). Here is such a test to try it out:

@Test
fun `test if calling the function that calls fun1 from here becomes flaky`() {
    val testString = "Long input that should test if the length has any effect on anything"
    var result: String
    testSession { terminal ->
        result = dialog(callback = {
            terminal.type(testString)
            terminal.press(Keys.ENTER)
        })
        assertThat(result).isEqualTo("$testString|||$testString")
    }
}

The tests seem to fail more often when the CPU is busy, and they also fail more often on weaker hardware.
To make the issue easier to reproduce, we pasted the test for the dialog 6 times into a test class; that way it usually fails within the first 10 executions.

Expected behavior
The tests are not flaky.

Screenshots
Example of the first 2 letters of the input for the 2nd question being "swallowed" (screenshot omitted).

Desktop (please complete the following information):

  • OS: macOS (also tested on Windows with the same result)
  • Kotter version: 1.1.3-SNAPSHOT

Additional context
As a testing environment we used JUnit 5, Gradle 8, and Kotlin 1.9.23.
Annotating the tests with @Execution(ExecutionMode.SAME_THREAD) also did not influence the result.

@Nereboss Nereboss added the bug Something isn't working label May 13, 2024
@Nereboss
Author

@bitspittle friendly poke in case your email settings are still swallowing issues ^^

@bitspittle
Contributor

Thanks for the poke! I did indeed miss it, and you are welcome to poke like this anytime I don't respond within 48 hours.

I probably won't be able to take a close look at this until after my talk on June 7 (well and a few days after, which I'll be spending with friends in the Bay Area), but my intention is to prioritize after that. Worst case there's actually a threading issue in core Kotter itself, which would be worth understanding.

I do appreciate you bringing this up; I won't push out 1.1.3 until we either fix this or understand it better.

@bitspittle
Contributor

Just an FYI, finally back from my talk and getting back into working on stuff again. Will probably be busy today and tomorrow but plan on looking into this flaky issue this week. Thanks for your patience!

@bitspittle
Contributor

I am able to repro this issue locally. Coincidentally enough, it may be related to the same bug opened by another user around this same time, see also: #116

Hopefully we can figure things out tomorrow. It sure seems like occasionally the keyboard input flow from the previous section doesn't shut down early enough, so the next section, as it starts up, briefly sees some of its keys getting swallowed.


"Good news" is that this is the sort of thing that in practice would only happen at the speeds of a computer running tests in a tight loop; it would not be expected for any user to notice this themselves (and even if they somehow did, it would be a single character missed at the beginning of what they were typing, which would be easy to retype)

@bitspittle
Contributor

bitspittle commented Jun 22, 2024

Been a week since my last reply, but I got hit with a bunch of Kobweb stuff, sorry for yet another delay! Finally had some time to dig today.

So looking into this, this is really tricky. The issue is fundamentally about subhuman speeds and coroutine timing. Basically, when you call input we do two things:

  1. Create a Kotlin flow that reads bytes from the terminal and converts them to Kotter Key instances. This flow is tied to the session lifecycle.
  2. Launch a Kotlin coroutine job that collects input Key events and updates a bunch of state that ends up in the final output you see rendered on the screen, e.g. this is where cursor-key presses, the ENTER key, etc. are handled. This work is tied to the section lifecycle.

With the flake you are seeing, what's happening is:

  1. You start the first section, which spins up the "byte to Key" flow and shortly after launches the "process Key" job.
  2. Eventually, the first section exits. The "byte to Key" flow is left alive while the "process Key" job is cancelled.
  3. You run the second section, which launches a second "process Key" job. During this time, you immediately start sending keypresses.
  4. The "byte to Key" flow handles these key presses, converting them to Key instances and sending them off. However...
  5. The "process Key" job is not ready yet, so some of those initial Key events are dropped in that brief window (a standalone sketch of this race follows below).
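About that sketch: this is not Kotter's internal code, just a minimal kotlinx.coroutines analogy. A hot SharedFlow with no replay silently drops values emitted before a collector has subscribed, which is exactly the window described in steps 3 to 5 above.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*

fun main() = runBlocking {
    // Stand-in for the session-scoped "byte to Key" flow (replay = 0).
    val keys = MutableSharedFlow<Char>(extraBufferCapacity = 64)

    // Keys "typed" before the "process Key" collector is attached: with no
    // subscribers and no replay, these emissions are simply dropped.
    keys.tryEmit('H')
    keys.tryEmit('i')

    // Stand-in for the second section's "process Key" job, launched a moment later.
    val processKeyJob = launch {
        keys.collect { print(it) } // never sees 'H' or 'i'
    }

    yield() // give the collector a chance to subscribe
    keys.tryEmit('!') // only this one arrives; the program prints just "!"

    delay(100)
    processKeyJob.cancel()
}
```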

I'm still hoping to fix this in a clean way, but so far my efforts have all failed.

I tried tying the "byte to Key" flow to the section lifecycle instead of the session lifecycle, but it's not easy -- it's tricky to cancel and then restart the flow. Meanwhile, tying the "process Key" job to the session lifecycle caused at least one of my other input tests to hang. Neat.

If you want a temporary fix, you can do something like this:

        onInputEntered {
            signal()
            userInput = input
        }
+       delay(16) // Give Kotter some time to set up `input` machinery
        callback()
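For completeness, here is the question1 helper from the original report with that workaround applied (just a sketch; the 16 ms value is the same guess as in the diff above, not a tuned number):

```kotlin
import kotlinx.coroutines.delay

fun Session.question1(callback: suspend RunScope.() -> Unit): String {
    var userInput = "-1"
    section {
        textLine("render some text and then display the input:")
        input()
    }.runUntilSignal {
        onInputEntered {
            signal()
            userInput = input
        }
        delay(16) // give Kotter some time to set up `input` machinery
        callback()
    }
    return userInput
}
```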

@phanlezz

phanlezz commented Dec 6, 2024

Hello @bitspittle,
I tried adding a delay(20) to all our prompts, but now the tests sometimes get stuck indefinitely.

More specifically, in every one of our Session.myPrompt([...]) functions I added, as you described, the delay before callback() inside the runUntilSignal block after section.

Most of the time it ran through, but in one of six runs it got stuck. (Usually the "Enter" key gets eaten, similar to the strings in the original issue).

Do you have an idea on how to continue?

Edit:
I was not able to recreate the original failure mode while using the delays; currently it just gets stuck.

@bitspittle
Contributor

Nuts! That's disappointing to hear. You're using 1.2.0?

It's been a long time since we had this discussion last. Can you either share your project with me or share exact code so I don't have to piece together code from old comments?

@phanlezz

phanlezz commented Dec 6, 2024

Nuts! That's disappointing to hear. You're using 1.2.0?

It's been a long time since we had this discussion last.

Yeah, only started yesterday looking into this branch again. It's kinda a mess. On this branch we are still using 1.1.3-SNAPSHOT, do you reckon the new version can help with this issue?

Can you either share your project with me or share exact code so I don't have to piece together code from old comments?

Sure, branch: https://github.com/MaibornWolff/codecharta/tree/tech/3483/integrate-kotter
Current test file: dialog, customPrompts

You need to disable caching in the gradle config and then spam build runs, then the error usually occurs rather quickly.

Sorry for all this extra work, I just started working on this issue.

@bitspittle
Contributor

So I added delay(20) at this line

onInputEntered {
   /* ... */
}
delay(20) // <--------
onInputReady()

and I ran InquirerTest.kt and ParserDialogTest.kt a bunch of times (using the IntelliJ UI, so e.g. go to InquirerTest.kt and click on the "run" arrow next to the class InquirerTest line), and I couldn't hit any instances of things hanging.

Maybe you can share your exact repro steps? I guess you're using Gradle?

@phanlezz

phanlezz commented Dec 6, 2024

Thank you again for looking into it :)
There are two important things:

  1. The Inquirer only provides the base templates for prompts. Those base prompts are then used in multiple dialogs (e.g. ParserDialog.kt) to ask the user for the necessary information. The tests (InquirerTest) that directly cover the prompts never failed, iirc.
  2. The problem has a much higher chance of occurring when the whole project is being built (./gradlew clean build in analysis/). It seems like a high CPU load is required to reproduce it. (If I remember correctly, I only got it to crash with the dialog tests directly one or two times.)

You need to run the tests in the dialog test file, as they have multiple inputs back to back, and those tests are what we are trying to fix.

With "run the tests for the dialog" I mean you need to build the whole project, which includes those tests, because the dialog tests alone do not stress the CPU enough.

So step-by-step:

  1. Disable gradle caching, reload gradle model, clean cache manually if necessary
  2. Build the project back to back

I'm also using IntelliJ; the build and clean tasks in the sidebar are usually enough.

You can DM me on Discord if you want to; I just joined your server.

@bitspittle
Contributor

I'm getting ktlint failures

> Task :filter:StructureModifier:ktlintMainSourceSetCheck FAILED
/Users/d9n/tmp/codecharta/analysis/filter/StructureModifier/src/main/kotlin/de/maibornwolff/codecharta/filter/structuremodifier/ParserDialog.kt:25:13 Newline expected after opening parenthesis
/Users/d9n/tmp/codecharta/analysis/filter/StructureModifier/src/main/kotlin/de/maibornwolff/codecharta/filter/structuremodifier/ParserDialog.kt:26:13 Parameter should start on a newline

@phanlezz

phanlezz commented Dec 7, 2024

Try ktlintFormat under formatting. I'm so sorry, this branch is a mess 😢

@bitspittle
Contributor

AFK for a few hours, will follow up later!

@phanlezz

phanlezz commented Dec 7, 2024

AFK for a few hours, will follow up later!

Okay 👍 I will go to sleep now, it's 1 AM here. I will check back tomorrow. I just pushed the formatting changes and the delay as discussed previously. Should be easier to test now.

@bitspittle
Contributor

bitspittle commented Dec 7, 2024

To reproduce pretty quickly...

  • In ParserDialogTest.kt
  • Add this infinite test:
    @Test
    fun `run until frozen`() {
        var i = 0
        while (true) {
            println("Test #${i++}...")
            `should prompt user twice for input file when first input file is invalid`()
        }
    }
  • See how far you get before things freeze. (Not very in my experience)

Note that this freezes even without any delay. Hmmmmm...
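Side note: since the project already uses JUnit 5, an alternative to the hand-rolled while (true) loop is @RepeatedTest, which gives each iteration its own lifecycle and report entry, so the iteration that freezes is easier to spot. This is only a sketch that reuses the existing test method from ParserDialogTest.kt:

```kotlin
import org.junit.jupiter.api.RepeatedTest

@RepeatedTest(1000)
fun `run until frozen (repeated)`() {
    `should prompt user twice for input file when first input file is invalid`()
}
```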

@bitspittle
Contributor

bitspittle commented Dec 7, 2024

I'm going to have to stop for a while here to cook dinner, but it looks like the issue may be something I didn't expect. Basically this code:

                actionCallback = {
                    terminal.press(Keys.DOWN)
                    terminal.press(Keys.DOWN)
                    terminal.press(Keys.DOWN)
                    terminal.press(Keys.ENTER)
                },

must be sometimes dropping a key press? Because sometimes it ends up only selecting "Move nodes" and not "Remove nodes" (implying one of the "down" presses was lost). Then, because you didn't register any callback for move nodes, your test hangs.

@bitspittle
Contributor

bitspittle commented Dec 7, 2024

Oh wait, there are a BUNCH of onInputReady calls in Inquirer.kt. Search for ALL of them and add a delay(20) before them.

@bitspittle
Contributor

Totally unrelated to the bug, here are two optional suggestions which you can ignore:

  1. Add support for HOME and END keys:
onKeyPressed {
    when (key) {
        Keys.UP ->
            if (selection > 0) {
                selection -= 1
            }
        Keys.DOWN ->
            if (selection < choices.size - 1) {
                selection += 1
            }
        Keys.HOME -> {
            selection = 0
        }
        Keys.END -> {
            selection = choices.size - 1
        }
        Keys.ENTER -> {
            result = choices[selection]
            signal()
        }
    }
}
  2. Use the lastIndex extension property for readability:
// BEFORE
if (selection < choices.size - 1) {
    selection += 1
}

// AFTER
if (selection < choices.lastIndex) {
    selection += 1
}

@bitspittle
Contributor

BTW I was going to recommend upgrading to 1.2.0, which should have been identical (although some extension method imports had changed, still a pretty trivial fix), but now a test is failing that wasn't in 1.1.3. Until I figure it out, I'd stick with 1.1.3...

@phanlezz

phanlezz commented Dec 7, 2024

Oh wait, there are a BUNCH of onInputReady calls in Inquirer.kt. Search for ALL of them and add a delay(20) before them.

I think the version I pushed eight hours ago included them already. But maybe I missed one...

@phanlezz

phanlezz commented Dec 7, 2024

To reproduce pretty quickly...

* In `ParserDialogTest.kt`

* Add this infinite test:
    @Test
    fun `run until frozen`() {
        var i = 0
        while (true) {
            println("Test #${i++}...")
            `should prompt user twice for input file when first input file is invalid`()
        }
    }
* See how far you get before things freeze. (Not very in my experience)

I will test that.

Note that this freezes even without any delay. Hmmmmm...

I thought the delay should help with the freezes and dropped input? Or am I misunderstanding something here 😲

@phanlezz

phanlezz commented Dec 7, 2024

The infinite test doesn't really stop for me (commit d063c96d8554522396af67eb7c61173f5e03453f)

(Edit: Which is a good thing, with 1000 successful runs, the test would definitely be safe enough)

@bitspittle
Contributor

I thought the delay should help with the freezes and dropped input? Or am I misunderstanding something here

So when I wrote this comment, I actually didn't have delays everywhere yet. I only had them in one place, without realizing that the pattern was duplicated across many functions and I had to add them everywhere. Once I did, all your tests passed for me.

The infinite test doesn't really stop for me

In my case, once the infinite test ran forever, that should mean all tests will pass. You're not seeing that?

@phanlezz

phanlezz commented Dec 7, 2024


No, sadly even with a two minute timeout, the tests sometimes get stuck if I build the complete project. The infinite test is not similar enough to a complete project build (./gradlew clean build).

Edit: It failed after three successful runs

Don't mind the repetition, it was something I also tried

@bitspittle
Contributor

bitspittle commented Dec 7, 2024

I think what you may want to do is change all your default callbacks in all methods that use input from {} to { press(ENTER) }

Your tests will still fail but they shouldn't hang anymore and maybe you'll get a more meaningful error message.

I'm deep diving into the 1.1.3 vs 1.2.0 bug. It's very hard to understand and debug so far, and I see the different output but I have no idea what the cause is! I'm a little worried 1.2.0 has some issue that got introduced somehow, so I'm prioritizing it at the moment.

@phanlezz

phanlezz commented Dec 7, 2024

I think what you may want to do is change all your default callbacks in all methods that use input from {} to { press(ENTER) }

Your tests will still fail but they shouldn't hang anymore and maybe you'll get a more meaningful error message.

Sorry, I'm not sure if I understand that correctly; can you expand on this? 🤔

I'm deep diving into the 1.1.3 vs 1.2.0 bug. It's very hard to understand and debug so far, and I see the different output but I have no idea what the cause is! I'm a little worried 1.2.0 has some issue that got introduced somehow, so I'm prioritizing it at the moment.

No worries, that's very understandable.

@bitspittle
Contributor

bitspittle commented Dec 7, 2024

Sorry, I'm not sure if I understand that correctly; can you expand on this? 🤔

Ah, it's not as straightforward as I first thought. But maybe something like this:

// Before
internal fun Session.myCollectParserArgs(
    fileCallback: suspend RunScope.() -> Unit = {},
    actionCallback: suspend RunScope.() -> Unit = {},
    printCallback: suspend RunScope.() -> Unit = {},
    setRootCallback: suspend RunScope.() -> Unit = {},
    moveFromCallback: suspend RunScope.() -> Unit = {},
    moveToCallback: suspend RunScope.() -> Unit = {},
    removeNodesCallback: suspend RunScope.() -> Unit = {},
    outFileCallback: suspend RunScope.() -> Unit = {},
): List<String> {

// After
internal fun Session.myCollectParserArgs(
    defaultCallback: suspend RunScope.() -> Unit = {},
    fileCallback: suspend RunScope.() -> Unit = defaultCallback,
    actionCallback: suspend RunScope.() -> Unit = defaultCallback,
    printCallback: suspend RunScope.() -> Unit = defaultCallback,
    setRootCallback: suspend RunScope.() -> Unit = defaultCallback,
    moveFromCallback: suspend RunScope.() -> Unit = defaultCallback,
    moveToCallback: suspend RunScope.() -> Unit = defaultCallback,
    removeNodesCallback: suspend RunScope.() -> Unit = defaultCallback,
    outFileCallback: suspend RunScope.() -> Unit = defaultCallback,
): List<String> {

Then, in test code:

session.myCollectParserArgs(
   defaultCallback = { terminal.press(ENTER) },
   /* ... */
)

That way, if an unexpected option is ever chosen (due to a dropped keypress), you won't end up on an option that just does nothing.

@phanlezz

phanlezz commented Dec 7, 2024

I will check back tomorrow, thank you for your time :)

@bitspittle
Contributor

bitspittle commented Dec 10, 2024

OK @phanlezz try the latest 1.2.1-SNAPSHOT and let me know if that works! Today was insane, a deep dive into hell and back.

Local testing on my end seems good so far, but of course feel free to confirm or deny.

The current version should be a lot more robust / correct, and I was able to get things working where lifecycles for input logic are now only tied to the sections that use it. However, now that I know a bit more, I think I can do better, so I'm going to keep poking at this a bit tomorrow.

@phanlezz

phanlezz commented Dec 10, 2024

OK @phanlezz try the latest 1.2.1-SNAPSHOT and let me know if that works! Today was insane, a deep dive into hell and back.

Local testing on my end seems good so far, but of course feel free to confirm or deny.

The current version should be a lot more robust / correct, and I was able to get things working where lifecycles for input logic are now only tied to the sections that use it. However, now that I know a bit more, I think I can do better, so I'm going to keep poking at this a bit tomorrow.

Sounds awesome! I will start the testing now. Should I swap the in-memory versions back to the other ones?

Edit: The previous function doesn't exist anymore, solving the question 😄

@phanlezz


Good news: ParserDialogTest.kt runs pretty well in isolation; I was not able to crash it.
Sadly, when running all tests, ParserDialogTest.kt sometimes fails (roughly once every five runs?).
It timed out again, with a 10 second timeout.

I'm not sure if this is any helpful:

   Suppressed: java.lang.InterruptedException
   	at kotlinx.coroutines.BlockingCoroutine.joinBlocking(Builders.kt:94)
   	at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking(Builders.kt:69)
   	at kotlinx.coroutines.BuildersKt.runBlocking(Unknown Source)
   	at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking$default(Builders.kt:48)
   	at kotlinx.coroutines.BuildersKt.runBlocking$default(Unknown Source)
   	at com.varabyte.kotter.platform.concurrent.locks.ReentrantReadWriteLock$ReaderLock.lock(ReentrantReadWriteLock.kt:44)
   	at com.varabyte.kotter.runtime.concurrent.ConcurrentScopedData.isActive(ConcurrentScopedData.kt:402)
   	at com.varabyte.kotter.runtime.Session.assertNoActiveSections$kotter(Session.kt:49)
   	at com.varabyte.kotter.foundation.SessionKt.session(Session.kt:83)
   	at com.varabyte.kotter.foundation.SessionKt.session$default(Session.kt:67)
   	at com.varabyte.kotterx.test.foundation.TestSessionKt.testSession(TestSession.kt:36)
   	at com.varabyte.kotterx.test.foundation.TestSessionKt.testSession$default(TestSession.kt:30)
   	at de.maibornwolff.codecharta.filter.structuremodifier.ParserDialogTest.should output correct arguments when print structure is selected(ParserDialogTest.kt:29)
   	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
   	at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:727)
   	at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
   	at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
   	at org.junit.jupiter.engine.extension.SameThreadTimeoutInvocation.proceed(SameThreadTimeoutInvocation.java:45)
   	... 77 more

bitspittle added a commit that referenced this issue Dec 10, 2024
Before, we tried to let various jobs ask nicely if the upstream flow
would wait for them to be ready.

However, it turns out it is easier (and more robust) just to block
progress until these jobs finish connecting to the flows before
continuing.

For a concrete example, here is a test that used to fail:

```
section { }.runUntilSignal {
  onKeyPressed {
    if (key == Keys.Q) signal()
  }
  terminal.press(Keys.Q)
}
```

The reason this would fail is often the `terminal.press(Keys.Q)` line would
get called SO QUICKLY and at JUST SUCH A TIME that its key would get processed
when the upstream flow was ready but the downstream "key processing job" was
not.

Now, what happens is `onKeyPressed` simply blocks, imperceptibly to the human
eye, so that by the time `terminal.press(Keys.Q)` is called, everything is
guaranteed to be hooked up and ready to go.

Fixes #114
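To make the idea in that commit concrete, here is a minimal, standalone sketch of "block until the collector is attached". This is not Kotter's actual implementation; launchCollectorAndAwait and action are made-up names, and it assumes the flow has no other subscribers.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*

// Launch a collector for this flow, then suspend until it has actually subscribed.
// By the time this function returns, emissions can no longer slip past an
// unattached collector.
suspend fun <T> MutableSharedFlow<T>.launchCollectorAndAwait(
    scope: CoroutineScope,
    action: suspend (T) -> Unit,
): Job {
    val job = scope.launch { collect { action(it) } }
    subscriptionCount.first { it > 0 } // wait until the subscription is live
    return job
}
```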
@bitspittle
Contributor

Alright my friend @phanlezz, I have now eaten, drunk, and slept in this code for three days, and I think I finally got it.

After exploring probably a dozen intermediate approaches that never saw the light of day, I think I have something that is both very robust and relatively simple (probably not a coincidence).

I have run tests on your code a bunch locally now, as well as my own, and so far it is holding up. I hope you'll be able to confirm the same on your end. We shall see!

See also:
https://bsky.app/profile/bitspittle.bsky.social/post/3lcy4vycluc2a

P.S. Hopefully you will download 1.2.1-SNAPSHOT automatically since it's been ~24 hours, but if you ever need to force it, here is some advice I got from ChatGPT (which looks right but you'll have to confirm)

@phanlezz

I didn't get notified that you answered; GitHub swallowed the notification. Sorry for keeping you waiting.
I will test it right now and report back! Regarding the SNAPSHOTs, I usually deleted the .gradle cache, re-downloaded everything, and then compared the md5sum of the *.jar file before and after to make sure they are different. Not sure if this is reliable at all, but it worked in the past 😄 I will test the options given by ChatGPT.

@phanlezz

phanlezz commented Dec 11, 2024

I ... I don't really know how to say it, but it failed again 😢
After 39 successful runs the ParserDialogTest got stuck again, with a 60 second timeout.

And after another 19 successful runs, an assertion in InquirerTest failed (see the attached screenshot for its output).

I think we have reached a point where it runs well enough, and I can continue with the integration. I need to check if those errors also happen during our GitHub workflows.

I noticed a strange behavior with the run time of one of the dialog tests: should output correct arguments when print structure is selected() usually takes about 9 times longer than the second-longest test (9 s vs 1 s). I haven't looked into it at all because I don't have much time, so it could just be some weird difference in implementation on our side.

I triple-checked to make sure that I have the correct snapshot version, and the md5sum from the repository matches the md5sum of the local jar that Gradle used. Maybe it is just weird behavior on my side; can you recreate it...? 🤔

@bitspittle
Contributor

What is "auto run"? I'd like to run it myself to see if I can reproduce locally. Also, did you see which test exactly failed in ParserDialogTest?

I woke up this morning and realized there's one more place I can probably simplify. If it works it might be the place the input event is getting missed.

Or maybe we're running into a system issue where you're running out of I/O threads due to how hard you are hammering the system with tests (especially if things are running massively in parallel)? Not likely but maybe worth mentioning.

@phanlezz

What is "auto run"? I'd like to run it myself to see if I can reproduce locally. Also, did you see which test exactly failed in ParserDialogTest?

autorun.sh

#!/bin/bash

counter=1
correctRuns=0

while [ $counter -le 20 ];
do
  echo "Iteration:" $counter
  returnCode="1"
  ./gradlew test --rerun-tasks -q
  returnCode="$?"

  if [ $returnCode = "0" ]; then
    correctRuns=$((correctRuns+1))
  else
   # replace with echo if test should continue on failure
    exit
  fi

  counter=$((counter+1))
done

echo "Correct runs:" $correctRuns

Sadly, I didn't save every log, but I have those two here:

  • java.util.concurrent.TimeoutException: should output correct arguments when print structure is selected() timed out after 10 seconds
  • java.util.concurrent.TimeoutException: should output correct arguments when move nodes is selected() timed out after 60 seconds

(I increased the timeout after the first exit, as I realized that 10 s might be a little too short.)

I woke up this morning and realized there's one more place I can probably simplify. If it works it might be the place the input event is getting missed.

👍

Or maybe we're running into a system issue where you're running out of I/O threads due to how hard you are hammering the system with tests (especially if things are running massively in parallel)? Not likely but maybe worth mentioning.

It could definitely be something like that. That's why I suggested that maybe it is fine as is, because that I/O issue should not happen on GitHub. I mentioned a few times that it seems like a CPU-load thing to me, as Gradle spawns one (?) worker per thread and, although the dialog tests themselves run in sequence, many other tests are executed simultaneously.

@bitspittle
Contributor

New 1.2.1-SNAPSHOT is published. I simplified Kotter a bit further. (I had introduced an intermediate buffer at one point while flailing, and I realized this morning that, with the final approach I went with, I didn't need it anymore.) I'm not sure I can go any simpler at this point.

Thank you for the autorun script. I did a quick test run with two iterations and it looks like it's working. I'll give it a try on my machine with 20 iterations when I eat lunch today.

I mentioned a few times that it seems like a CPU-load thing to me, as Gradle spawns one (?) worker per thread and, although the dialog tests themselves run in sequence, many other tests are executed simultaneously.

Yeah, it could be something like that! The current test timeout defaults are chosen assuming a fairly normal machine under normal load. It would be interesting to tell the difference between a super-delayed run (the system is still just waiting for a worker on the I/O thread pool to get its chance to run) and a really sensitive race condition where my input system is truly left waiting forever on a concurrency latch that will never open.

Either way, the system is in WAY better shape than it was in 1.2.0, so thank you!

@bitspittle
Contributor

BTW @phanlezz are you the same user as Nereboss? I'd like to share thanks in the release notes when I publish 1.2.1, and I'm wondering if I should just be calling out your account or both.

@bitspittle
Contributor

java.util.concurrent.TimeoutException: should output correct arguments when print structure is selected() timed out after 10 seconds
java.util.concurrent.TimeoutException: should output correct arguments when move nodes is selected() timed out after 60 seconds

I just ran both these tests in a loop, 1000 times each, without issue. So I'm going to assume any timeout that occurs after the latest snapshot will be related to thread starvation.

@phanlezz

BTW @phanlezz are you the same user as Nereboss? I'd like to share thanks in the release notes when I publish 1.2.1, and I'm wondering if I should just be calling out your account or both.

We are different people :) You can mention us both.

@phanlezz

phanlezz commented Dec 11, 2024

java.util.concurrent.TimeoutException: should output correct arguments when print structure is selected() timed out after 10 seconds
java.util.concurrent.TimeoutException: should output correct arguments when move nodes is selected() timed out after 60 seconds

I just ran both these tests in a loop, 1000 times each, without issue. So I'm going to assume any timeout that occurs after the latest snapshot will be related to thread starvation.

Yeah, I don't think the tests fail in isolation at all. In the current test suite there is still one test in InquirerTest.kt that runs 1k times iirc, and I let the infinite test run with should output correct arguments when print structure is selected() until ~25k iterations.
Thread starvation might be the right term here :)

@bitspittle
Contributor

bitspittle commented Dec 11, 2024

So I went off today for three hours and kickstarted an autorun. So sure enough it failed on the second iteration, of course it did :P

But not in Kotter? Once in ParseFileToTable.kt and a second time in ParseFolderToTable.kt

test.zip

This was with my local version of your code, so I ended up going back to tech/3483/integrate-kotter and pulling latest. Will test again shortly.

BTW, it's up to you, but I don't think you have to have input should not break across multiple sections in your tests. My own tests already cover it, so you can save yourself a small bit of testing time by removing it in your codebase, if you want.

Finally, I'll say it again in the release notes, but really, thanks so much for the codecharta codebase. Having a clone of it was really helpful to make progress on this issue.

@bitspittle
Contributor

Got to iteration 17 this time. Failure was in: JavaSonarAnalyzerTest. commented out blocks of code is zero if issue is not found()

test.zip

So far no Kotter failures on my end :) (unless these libraries are somehow delegating to Kotter and I didn't realize it!)

@phanlezz

phanlezz commented Dec 12, 2024

BTW, it's up to you, but I don't think you have to have input should not break across multiple sections in your tests. My own tests already cover it, so you can save yourself a small bit of testing time by removing it in your codebase, if you want.

I probably will remove it moving forward

Finally, I'll say it again in the release notes, but really, thanks so much for the codecharta codebase. Having a clone of it was really helpful to make progress on this issue.

Glad it helped :)

Now regarding your test run failures: SourceCodeParser is known to be flaky #3598 and is already in the queue for refactoring/updates #3631.
It seems to me that failures in the SourceCodeParser are very rare on Windows (~1/1000) and quite rare on Linux (~1/100). Maybe they happen on macOS more often? SourceCodeParser currently has nothing to do with Kotter.
It would be nice if you could reproduce the Kotter "issue" on your end, but maybe it is just more likely on my hardware...

Edit: I added your zip files to the issue. I'm not sure if we will fix this bug or start by refactoring via an update of the library used.

@phanlezz

New 1.2.1-SNAPSHOT is published. I simplified Kotter a bit further. (I had introduced an intermediate buffer at one point while flailing, and I realized this morning that, with the final approach I went with, I didn't need it anymore.) I'm not sure I can go any simpler at this point.

I will let some tests run on this new version. I think I missed it when replying yesterday, as I was not working on the code at that time.

Today I actually removed the gradle caches, as --refresh-dependencies didn't do anything. Verified latest version via md5.

@phanlezz

So after 20 iterations I got a failure in InquirerTest.kt:
test.zip

I can confirm now that the SourceCodeParser is quite flaky on macOS. Maybe it can be disabled (temporarily) to test Kotter...

@bitspittle
Contributor

bitspittle commented Dec 12, 2024

Thanks, that is a fascinating flake. It does appear from your output that one of two things happened:

  1. Somehow, the UP key was dropped but no other keys surrounding it were
  2. A final text render didn't happen (meaning Kotter's state was correct but for some reason missed the rerender request or got interrupted?)

I'm going to assume 2)

@bitspittle
Contributor

So I made an alteration to ./autorun.sh, where I only spam the Kotter tests, which makes it much faster:

#!/bin/bash

counter=1
correctRuns=0

while [ $counter -le 100 ];
do
  echo "Iteration:" $counter
  returnCode="1"
  ./gradlew :tools:Inquirer:test :filter:StructureModifier:test --rerun-tasks -q
  returnCode="$?"

  if [ $returnCode = "0" ]; then
    correctRuns=$((correctRuns+1))
  else
   # replace with echo if test should continue on failure
    exit
  fi

  counter=$((counter+1))
done

echo "Correct runs:" $correctRuns

Notice specifically I set the counter to 100 and changed the ./gradlew line to call inquirer and structure modifier tests directly.

After 100 successful runs, I'm going to assume that Kotter is stable at this point and that you probably won't see flakes on your CI moving forward; the flakes we're seeing are probably related to an overloaded system.

Of course, let me know if you see something different.


⚠️ BTW, something I suggest trying. Add --no-daemon to the end of your ./gradlew line. ⚠️

  ./gradlew :tools:Inquirer:test :filter:StructureModifier:test --rerun-tasks -q --no-daemon

What this should do is ensure that Gradle will spin down and release its resources after the tests run.

Maybe this will do nothing, but one possibility is that it will make things more equivalent to the clean slate a CI runner will be in when it runs your tests. Maybe it's an overwhelmed, long-running Gradle environment that is causing your tests to flake on later runs?

Anyway, I've attached my script if you want to see it (which I zipped because GitHub won't let me upload raw .sh files):

autorun.sh.zip

@bitspittle
Contributor

BTW I was thinking about releasing 1.2.1 either this weekend or on Monday, as even if there's more improvements that can be made, I think this is already a strict improvement on top of 1.2.0. However, I'd like to give you more time to play with it just in case you run into an issue over the next few days.

How confident would you be of a 1.2.1 release if you heard me announce it tomorrow?

@phanlezz

BTW I was thinking about releasing 1.2.1 either this weekend or on Monday, as even if there's more improvements that can be made, I think this is already a strict improvement on top of 1.2.0. However, I'd like to give you more time to play with it just in case you run into an issue over the next few days.

I would say that you can definitely release 1.2.1.
Sadly, I won't have time over the next couple of days to continue testing. There is another project that requires my attention.
Afterwards, I will continue to integrate Kotter into more dialogs, which in turn provides more actual test runs on GitHub, which is what really matters in the end.

How confident would you be of a 1.2.1 release if you heard me announce it tomorrow?

I don't really understand this statement; I'm pretty confident in the library and its releases 😲

@phanlezz

So I made an alteration to ./autorun.sh, where I only spammed kotter tests, which makes it much faster:

(...)

Notice specifically I set the counter to 100 and changed the `./gradlew` line to call inquirer and structure modifier tests directly.

Good idea to spam the troubling tests.

After 100 successful runs, I'm going to assume that Kotter is stable at this point, and that you probably won't see flakes on your CI ever moving forward, that the flakes we're seeing are related to an overloaded system maybe.

Of course, let me know if you see something different.

I may have time for some quick runs before bed...

⚠️ BTW, something I suggest trying. Add --no-daemon to the end of your ./gradlew line. ⚠️

  ./gradlew :tools:Inquirer:test :filter:StructureModifier:test --rerun-tasks -q --no-daemon

What this should do is ensure that Gradle will spin down and release its resources after the tests run.

Maybe this will do nothing, but one possibility is that it will make things more equivalent to the clean slate a CI runner will be in when it runs your tests. Maybe it's an overwhelmed, long-running Gradle environment that is causing your tests to flake on later runs?

Could be, I can try it.

Anyway, I've attached my script if you want to see it (which I zipped because GitHub won't let me upload raw .sh files):

autorun.sh.zip

Thanks GitHub :)

@phanlezz

Thanks, that is a fascinating flake. It does appear from your output that one of two things happened:

1. Somehow, the UP key was dropped but no other keys surrounding it were

2. A final text render didn't happen (meaning Kotter's state was correct but for some reason missed the rerender request or got interrupted?)

I'm going to assume 2)

The error reminded me of some test runs a while ago, where the beginning of an input string would be dropped... Well, I will check whether anything pops up with your modified autorun.sh.

@bitspittle
Contributor

The error reminded me of some test runs a while ago, where the beginning of an input string would be dropped... Well, I will check whether anything pops up with your modified autorun.sh.

Dropping the first keypress (or a few) makes sense -- that's absolutely the old buggy behavior that should be fixed. Early dropped keypresses indicated that your test code was running (e.g. terminal.type("hello")) before the system finished hooking itself up.

However, getting early keypresses but then losing a keypress in the middle or at the end? That should logically be impossible, as far as I'm aware.

When a section ends, in theory it checks if any final renders were requested and it blocks, waiting for them:

CoroutineScope(KotterDispatchers.Render).launch { allRendersFinished.complete(Unit) }

However, if somehow due to concurrency timing we missed getting a render request in time? It seems like it must have happened somehow, and it must have been an exceedingly rare event.

For now, I probably just won't think about it :) But please let me know if it happens for you again! Meanwhile, I think this is something that would only ever affect tests, and not real users...

@bitspittle
Contributor

I don't really understand this statement; I'm pretty confident in the library and its releases 😲

Ah, it's just that I've been so focused on my issue for the last few days, I feel like I've lost perspective. I've been suuuuuper zoomed in on just the input code under test here, so I haven't really done a great job verifying that I didn't break the user experience at a higher level. For example, I'm pretty sure I didn't this time, but maybe I could have introduced massive startup performance issues or stuttering while making such a fix.

So yeah, was just looking for a sanity check / second opinion. Sounds like I got it! Thank you for the confidence.

@phanlezz

I ran the modified autorun.sh with --no-daemon for around 600 iterations. I think it is safe to say that Kotter runs 100% stable in isolation. Maybe I can modify the Gradle task in such a way that the dialog tests run in sequence after the other tests have concluded. Of course only if problems occur on GitHub.

@bitspittle
Contributor

1.2.1 is released!
