Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Service retry logic incorrectly swallows errors and retries when it shouldn't #604

Open
tlake opened this issue Apr 13, 2018 · 1 comment
Assignees
Labels
Milestone

Comments

@tlake
Copy link
Contributor

tlake commented Apr 13, 2018

@the logs below illustrate a failing smoketest due to being run in the wrong region, but they also highlight a more important problem: the retry logic (for services, at least) incorrectly swallows an AWS UnsupportedFeatureException error and proceeds with retry logic when it should instead be failing immediately and returning that error. In other words, we only avoid retry logic for a whitelist of errors, when we should only be entering retry logic for a whitelist of errors. Failing immediately should be the default, not retrying.

Logs from service smoketests:

bats service.bats
 ✗ create
   (in test file service.bats, line 10)
     `l0 service create env_name svc_name_stateless1 dpl_name_stateless:latest' failed
   ENVIRONMENT ID  ENVIRONMENT NAME  OS     LINKS
   envname360ae    env_name          linux  
   DEPLOY ID     DEPLOY NAME         VERSION  COMPATIBILITIES
   dplname54de5  dpl_name_stateful1  2        stateful
   DEPLOY ID     DEPLOY NAME         VERSION  COMPATIBILITIES
   dplname10bf4  dpl_name_stateful2  2        stateful
   DEPLOY ID     DEPLOY NAME         VERSION  COMPATIBILITIES
   dplnamef885d  dpl_name_stateless  12       stateful
                                              stateless
   LOADBALANCER ID  LOADBALANCER NAME  TYPE         ENVIRONMENT  SERVICE  PORTS       PUBLIC  URL
   lbnamea28010     lb_name_alb        application  env_name              80:80/HTTP  true    l0-jlprefactortest-lbnamea28010-1930624962.us-west-2.elb.amazonaws.com
   LOADBALANCER ID  LOADBALANCER NAME  TYPE     ENVIRONMENT  SERVICE  PORTS       PUBLIC  URL
   lbnamecdaff3     lb_name_clb        classic  env_name              80:80/HTTP  true    l0-jlprefactortest-lbnamecdaff3-1410965074.us-west-2.elb.amazonaws.com
   2018/04/13 09:58:02 Maximum retry attempts reached

Logs from layer0 API:

2018/04/13 09:57:59 POST /service 
2018/04/13 09:58:00 [DEBUG] an api error occured: ServerError (code='EventualConsistencyError') UnsupportedFeatureException: FARGATE is not supported in this region.
    status code: 400, request id: cdfcd78e-3f3b-11e8-99b2-7f217f1e677c
2018/04/13 09:58:00 POST /service 
2018/04/13 09:58:01 [DEBUG] an api error occured: ServerError (code='EventualConsistencyError') UnsupportedFeatureException: FARGATE is not supported in this region.
    status code: 400, request id: cea9e083-3f3b-11e8-99b2-7f217f1e677c
2018/04/13 09:58:01 POST /service 
2018/04/13 09:58:02 [DEBUG] an api error occured: ServerError (code='EventualConsistencyError') UnsupportedFeatureException: FARGATE is not supported in this region.
    status code: 400, request id: cf58700a-3f3b-11e8-99b2-7f217f1e677c
@tlake tlake added the bug label Apr 13, 2018
@tlake tlake added this to the v0.11.0 milestone Apr 13, 2018
@tlake tlake self-assigned this Apr 13, 2018
@zpatrick
Copy link
Contributor

From Slack

  • because of the nature of the retry, the "unable to assume role" error, for example, never gets returned to the outer retry.Retry

  • another issue is maybe in RetryWithTimeout, since that creates its own error to stop the loop, and wouldn't return the AWS err

  • One idea is to make it the caller’s responsibility to track errors:

var err error 
fn := func() bool {
    err = foo()
    if err, ok := err.(*awserr.Error); ok && err.Code == "..." {
        return true
    }

    return false
}

retry(fn)
return err

that could work because it should retry again and err = foo() should make err = nil if foo() succeeds. If it retries or times out, err keeps the last error

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants