Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Tink Server] panic: runtime error: index out of range #559

Closed
jacobweinstock opened this issue Nov 20, 2021 · 1 comment · Fixed by #576
Closed

[Tink Server] panic: runtime error: index out of range #559

jacobweinstock opened this issue Nov 20, 2021 · 1 comment · Fixed by #576
Labels
area/tink-server kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
Milestone

Comments

@jacobweinstock
Copy link
Member

jacobweinstock commented Nov 20, 2021

If the ReportActionStatus RPC is called with the following constraints the Tink server will panic with panic: runtime error: index out of range.

  • The workflow has reached its final action (status of that action does not matter).
  • ReportActionStatus, for the same workflowID and workerID, is called with an ActionStatus of pb.State_STATE_RUNNING
    &pb.WorkflowActionStatus{
      	WorkflowId:   workflowID,
      	TaskName:     "doesnt matter",
      	ActionName:   "doesnt matter either",
      	ActionStatus: pb.State_STATE_RUNNING, // This is the key
      	Seconds:      0,
      	Message:      "",
      	CreatedAt:    &timestamppb.Timestamp{},
      	WorkerId:     workerID,
      }

Tink server will panic with

panic: runtime error: index out of range [6] with length 6

goroutine 215 [running]:
github.com/tinkerbell/tink/grpc-server.(*server).ReportActionStatus(0xc0000b2000, 0xeddb80, 0xc0002bb710, 0xc0004226c0, 0xc0000b2000, 0x0, 0x0)
	/home/runner/work/tink/tink/grpc-server/tinkerbell.go:114 +0x100f
github.com/tinkerbell/tink/protos/workflow._WorkflowService_ReportActionStatus_Handler.func1(0xeddb80, 0xc0002bb710, 0xd26d40, 0xc0004226c0, 0xd7e6af, 0x8, 0x0, 0x0)
	/home/runner/work/tink/tink/protos/workflow/workflow.pb.go:2299 +0x86
go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc.UnaryServerInterceptor.func1(0xeddb80, 0xc0002bb710, 0xd26d40, 0xc0004226c0, 0xc0004b5940, 0xc0004b5960, 0x0, 0x0, 0x0, 0x0)
	/home/runner/go/pkg/mod/go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/[email protected]/interceptor.go:327 +0x649
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1(0xeddb80, 0xc0002bb5c0, 0xd26d40, 0xc0004226c0, 0x4e, 0xc00041ed70, 0x203000, 0xc0003b9500)
	/home/runner/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:25 +0x63
github.com/grpc-ecosystem/go-grpc-prometheus.(*ServerMetrics).UnaryServerInterceptor.func1(0xeddb80, 0xc0002bb5c0, 0xd26d40, 0xc0004226c0, 0xc0004b5940, 0xc0004b5980, 0x98136a, 0xce9ea0, 0xc0004b59a0, 0xc0004b5940)
	/home/runner/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/server_metrics.go:107 +0xad
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1(0xeddb80, 0xc0002bb5c0, 0xd26d40, 0xc0004226c0, 0xc0000338c0, 0x0, 0xc000621b30, 0x40cb48)
	/home/runner/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:25 +0x63
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1(0xeddb80, 0xc0002bb5c0, 0xd26d40, 0xc0004226c0, 0xc0004b5940, 0xc0004b5960, 0xc000621ba0, 0x48f618, 0xd07dc0, 0xc0002bb5c0)
	/home/runner/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:34 +0xd5
github.com/tinkerbell/tink/protos/workflow._WorkflowService_ReportActionStatus_Handler(0xd5fa20, 0xc0000b2000, 0xeddb80, 0xc0002bb5c0, 0xc0003c9ec0, 0xc0000a62a0, 0xeddb80, 0xc0002bb5c0, 0xc000066690, 0x6b)
	/home/runner/work/tink/tink/protos/workflow/workflow.pb.go:2301 +0x14b
google.golang.org/grpc.(*Server).processUnaryRPC(0xc0000c41c0, 0xee83c0, 0xc000432600, 0xc0000338c0, 0xc0000a67e0, 0x14da3b0, 0x0, 0x0, 0x0)
	/home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:1292 +0x50a
google.golang.org/grpc.(*Server).handleStream(0xc0000c41c0, 0xee83c0, 0xc000432600, 0xc0000338c0, 0x0)
	/home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:1617 +0xcfd
google.golang.org/grpc.(*Server).serveStreams.func1.2(0xc0005147f0, 0xc0000c41c0, 0xee83c0, 0xc000432600, 0xc0000338c0)
	/home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:940 +0xa1
created by google.golang.org/grpc.(*Server).serveStreams.func1
	/home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:938 +0x204

Expected Behaviour

Current Behaviour

Possible Solution

The offending code is here.
One possible solution is adding something like the following after line 108,

if actionIndex == wfContext.TotalNumberOfActions-1 {
    return nil, status.Errorf(codes.FailedPrecondition, "already at the end of the workflow")
}

Steps to Reproduce (for bugs)

  1. Follow the Sandbox with Vagrant and Virtualbox quickstart
  2. Get the worker ID to be used to reproduce, should be the only one in there. tink hardware list -q
  3. Get the workflow ID to be used to reproduce, should be the only one in there. tink workflow list -q
  4. Get the TLS cert from Tink curl localhost:42114/cert
  5. Make the call to ReportActionStatus
package main

import (
	"context"
	"crypto/x509"

	"github.com/tinkerbell/tink/protos/workflow"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
	"google.golang.org/protobuf/types/known/timestamppb"
)

func main() {
	ctx := context.Background()
	gc, err := grpcClient(ctx)
	if err != nil {
		panic(err)
	}
	workflowClient := workflow.NewWorkflowServiceClient(gc)
	_, err = workflowClient.ReportActionStatus(ctx, &workflow.WorkflowActionStatus{
		WorkflowId:   "<workflow ID from step 3>",
		TaskName:     "doesnt matter",
		ActionName:   "also doesnt matter",
		ActionStatus: workflow.State_STATE_RUNNING,
		Seconds:      0,
		Message:      "",
		CreatedAt:    &timestamppb.Timestamp{},
		WorkerId:     "<worker ID from step 2>",
	})
	if err != nil {
		panic(err)
	}
}

// toCreds takes a byte string, assumed to be a tls cert, and creates a transport credential.
func toCreds(pemCerts []byte) credentials.TransportCredentials {
	cp := x509.NewCertPool()
	ok := cp.AppendCertsFromPEM(pemCerts)
	if !ok {
		return nil
	}
	return credentials.NewClientTLSFromCert(cp, "")
}

func grpcClient(ctx context.Context) (*grpc.ClientConn, error) {
	var cert = []byte(`<insert cert from curl command here>`)
	return grpc.DialContext(ctx, "localhost:42113", grpc.WithTransportCredentials(toCreds(cert)))
}
  1. Observe the panic from the Tink server logs docker logs -f compose_tink-server_1

Context

Your Environment

  • Operating System and version (e.g. Linux, Windows, MacOS):
    MacOS/Linux

  • How are you running Tinkerbell? Using Vagrant & VirtualBox, Vagrant & Libvirt, on Packet using Terraform, or give details:
    Sandbox with Vagrant and Virtualbox at commit e3f8d2809997583d164d595ce81b68b081c4a502
    Tink server at version quay.io/tinkerbell/tink:sha-3743d31e.

  • Link to your project or a code example to reproduce issue:

@jacobweinstock jacobweinstock added kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Nov 20, 2021
@displague
Copy link
Member

Created #576 with the suggested fix from @jacobweinstock

mergify bot added a commit that referenced this issue Jul 6, 2022
## Description


This allows the final action in a workflow to successfully report its status and consequently allow a workflow to complete.
This issue: #559 incorrectly describes a fix that was applied in this PR: #576

## Why is this needed



Fixes: tinkerbell/playground#143

## How Has This Been Tested?



Unit tests were added and the sandbox (vagrant virtualbox quickstart) was used to manually test.

## How are existing users impacted? What migration steps/scripts do we need?





## Checklist:

I have:

- [ ] updated the documentation and/or roadmap (if required)
- [x] added unit or e2e tests
- [ ] provided instructions on how to upgrade
@displague displague added this to the 0.7.0 milestone Aug 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/tink-server kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants