Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlock on zero minimum workers #33

Closed
sermojohn opened this issue Aug 25, 2022 · 3 comments
Closed

Deadlock on zero minimum workers #33

sermojohn opened this issue Aug 25, 2022 · 3 comments
Labels
good first issue Good for newcomers

Comments

@sermojohn
Copy link

I experienced a deadlock when using pond in the following way (https://go.dev/play/p/eJLX1vc3C81).

workerpool := pond.New(10, 10, pond.Strategy(pond.Eager()))

batchWg := &sync.WaitGroup{}
batch := []string{"message1", "message2"}

for _, _ = range batch {
	batchWg.Add(1)
	workerpool.Submit(func() {
		batchWg.Done()
	})
}

batchWg.Wait()

Minimum workers defaults to 0 and purge can stop idle workers IdleWorkers() > 0.
At the same time, workerpool.Submit can add a task but not start a worker because IdleWorkers() > 0.
If the purger managed to signal the worker to stop before the newly submitted job is consumed by the worker, the workerpool ends up with a non-empty task channel and no workers to process the tasks.

Possible solutions:

  1. Increase minimum workers >= 1. This way there will always be an available worker to process pending tasks.
  2. Improve synchronisation between purge and maybeStartWorker to avoid such cases.
@alitto alitto added the good first issue Good for newcomers label Aug 26, 2022
@alitto
Copy link
Owner

alitto commented Aug 26, 2022

Hey @sermojohn!
I think you are onto something here, this purge function presented the issue you describe in earlier versions of the lib, but after running your snippet I wasn't able to reproduce the deadlock. I'll take a closer look over the weekend, but in the meantime, if you could help me come up with a sample that enters a deadlock in every run that would greatly help me debug this issue.
I made some modifications to your snippet to log how many tasks are pending after the program completes, how many idle/running workers are there before submitting another task, and also reduced the IdleTimeout to 1 microsecond to ensure the purge function runs more often (to force the deadlock). Here's the modified version: https://go.dev/play/p/hw1iGibUVDb

Thanks for opening this issue! 🙂

@alitto
Copy link
Owner

alitto commented Aug 28, 2022

Hey @sermojohn,

I was able to reproduce this issue within a unit test and found a solution for this scenario by ensuring maybeStartWorker is synchronised with purge, as you pointed out.
This is the PR with the fix: #34 and I just released version v1.8.1, which includes it.
Please check it out if you have some time and let me know how that goes 🙂

Thanks!

@sermojohn
Copy link
Author

sermojohn commented Aug 30, 2022

@alitto, thank you for your prompt fix! This library is proved to be the right tool for the job. Feel free to close this issue, it looks fixed from my perspective.

I hope to manage to contribute back to this library in the future!

@alitto alitto closed this as completed Aug 30, 2022
estahn pushed a commit to estahn/k8s-image-swapper that referenced this issue Sep 7, 2022
## [1.3.0](v1.2.3...v1.3.0) (2022-09-07)

### 🎉 Features

* cross account caching with role ([#336](#336)) ([98d138e](98d138e))

### ⬆️ Dependencies

* **deps:** bump actions/cache from 3.0.6 to 3.0.8 ([#319](#319)) ([245ab30](245ab30)), closes [#809](#809) [#833](#833) [#810](#810) [#888](https://github.com/estahn/k8s-image-swapper/issues/888) [#891](https://github.com/estahn/k8s-image-swapper/issues/891) [#899](https://github.com/estahn/k8s-image-swapper/issues/899) [#894](https://github.com/estahn/k8s-image-swapper/issues/894)
* **deps:** bump alpine from 3.16.1 to 3.16.2 ([da05fdd](da05fdd))
* **deps:** bump github.com/alitto/pond from 1.8.0 to 1.8.1 ([#342](#342)) ([4e50c28](4e50c28)), closes [alitto/pond#33](alitto/pond#33) [#34](#34) [#32](#32)
* **deps:** bump github.com/aws/aws-sdk-go from 1.44.70 to 1.44.92 ([0f396c5](0f396c5))
* **deps:** bump github.com/aws/aws-sdk-go from 1.44.70 to 1.44.92 ([#338](#338)) ([fa795ae](fa795ae)), closes [#4548](https://github.com/estahn/k8s-image-swapper/issues/4548) [#4546](https://github.com/estahn/k8s-image-swapper/issues/4546) [#4545](https://github.com/estahn/k8s-image-swapper/issues/4545) [#4544](https://github.com/estahn/k8s-image-swapper/issues/4544) [#4543](https://github.com/estahn/k8s-image-swapper/issues/4543) [#4542](https://github.com/estahn/k8s-image-swapper/issues/4542) [#4539](https://github.com/estahn/k8s-image-swapper/issues/4539) [#4536](https://github.com/estahn/k8s-image-swapper/issues/4536) [#4534](https://github.com/estahn/k8s-image-swapper/issues/4534) [#4533](https://github.com/estahn/k8s-image-swapper/issues/4533)
* **deps:** bump github.com/go-co-op/gocron from 1.16.2 to 1.17.0 ([#340](#340)) ([645bef3](645bef3)), closes [go-co-op/gocron#380](go-co-op/gocron#380) [go-co-op/gocron#381](go-co-op/gocron#381) [go-co-op/gocron#375](go-co-op/gocron#375) [#381](#381) [#380](#380) [#375](#375)
* **deps:** bump github.com/gruntwork-io/terratest from 0.40.19 to 0.40.21 ([#334](#334)) ([d0f6c39](d0f6c39)), closes [#1166](https://github.com/estahn/k8s-image-swapper/issues/1166) [#1159](https://github.com/estahn/k8s-image-swapper/issues/1159)
* **deps:** bump github.com/rs/zerolog from 1.27.0 to 1.28.0 ([#339](#339)) ([7fb4ff5](7fb4ff5)), closes [#457](#457) [#416](#416) [#454](#454) [#453](#453) [#383](#383) [#396](#396) [#414](#414) [#415](#415) [#430](#430) [#432](#432)
* **deps:** bump github.com/spf13/viper from 1.12.0 to 1.13.0 ([#341](#341)) ([9b59bd4](9b59bd4)), closes [spf13/viper#1371](spf13/viper#1371) [spf13/viper#1373](spf13/viper#1373) [spf13/viper#1393](spf13/viper#1393) [spf13/viper#1424](spf13/viper#1424) [spf13/viper#1405](spf13/viper#1405) [spf13/viper#1414](spf13/viper#1414) [spf13/viper#1387](spf13/viper#1387) [spf13/viper#1374](spf13/viper#1374) [spf13/viper#1375](spf13/viper#1375) [spf13/viper#1378](spf13/viper#1378) [spf13/viper#1360](spf13/viper#1360) [spf13/viper#1381](spf13/viper#1381) [spf13/viper#1384](spf13/viper#1384) [spf13/viper#1383](spf13/viper#1383) [spf13/viper#1395](spf13/viper#1395) [spf13/viper#1420](spf13/viper#1420) [spf13/viper#1422](spf13/viper#1422) [spf13/viper#1412](spf13/viper#1412) [spf13/viper#1373](spf13/viper#1373) [spf13/viper#1393](spf13/viper#1393) [spf13/viper#1371](spf13/viper#1371) [spf13/viper#1387](spf13/viper#1387) [spf13/viper#1405](spf13/viper#1405) [spf13/viper#1414](spf13/viper#1414)
* **deps:** bump goreleaser/goreleaser-action from 3.0.0 to 3.1.0 ([#328](#328)) ([a8d2dd1](a8d2dd1)), closes [#369](#369) [#357](#357) [#356](#356) [#360](#360) [#359](#359) [#358](#358) [#367](#367) [#369](#369) [#367](#367) [#358](#358) [#359](#359) [#360](#360) [#357](#357) [#356](#356)
* **deps:** bump k8s.io/api from 0.24.3 to 0.25.0 ([#325](#325)) ([ce10907](ce10907)), closes [#111657](https://github.com/estahn/k8s-image-swapper/issues/111657) [#109090](https://github.com/estahn/k8s-image-swapper/issues/109090) [#111258](https://github.com/estahn/k8s-image-swapper/issues/111258) [#111113](https://github.com/estahn/k8s-image-swapper/issues/111113) [#111696](https://github.com/estahn/k8s-image-swapper/issues/111696) [#108692](https://github.com/estahn/k8s-image-swapper/issues/108692)
* **deps:** bump k8s.io/client-go from 0.24.3 to 0.25.0 ([#324](#324)) ([f7c889f](f7c889f))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

2 participants