storage: tls error: bad record MAC #1581
Sorry, just trying to better understand what you're seeing here: are you using |
cc @odeke-em for http / tls expertise. |
@jadekler it's the Go team using GCS, not me, but the code is here |
Thank you for filing this issue @mengzhuo, and welcome to the Google Cloud Go project!

@mengzhuo, how often can you reproduce this problem? Is it just a flake, or does it happen consistently? I ask because that error indicates a connection authentication failure, where the payload sent from the server doesn't match expected values. Google's servers seem to use TLSv1.2, as advertised in https://cloud.google.com/load-balancing/docs/ssl-policies-concepts and as you can see by examining the connection when we get a server hello from googleapis.com:443; also, https://cloud.google.com/security/encryption-in-transit/#footnote4 notes that TLS 1.3 is being experimentally rolled out just for Gmail, while Go's TLS now seems to default to TLS 1.3. Did this error only start happening recently? If you are directly using Go 1.13, or tip after @FiloSottile's "TLS 1.3 always on" commit golang/go@a8c2e5c#diff-0977a3b43ae53839ce25d758f24138f2, you can reproduce the connection like this:

```go
package main

import (
	"crypto/tls"
	"log"
)

func main() {
	conn, err := tls.Dial("tcp", "googleapis.com:443", &tls.Config{
		// TODO: please properly load your TLS client configuration here.
		// Just using InsecureSkipVerify for easy testing.
		InsecureSkipVerify: true,
	})
	if err != nil {
		log.Fatalf("Failed to make request: %v", err)
	}
	defer conn.Close()

	// Write something to the connection.
	conn.Write([]byte("POST / HTTP/1.1\r\nPath: /upload/\r\n\r\n"))
}
```

If this issue can be consistently reproduced, please try explicitly capping the client's TLS version at 1.2, like this:
```go
package main

import (
	"crypto/tls"
	"log"
)

func main() {
	conn, err := tls.Dial("tcp", "googleapis.com:443", &tls.Config{
		// TODO: please properly load your TLS client configuration here.
		// Just using InsecureSkipVerify for easy testing.
		InsecureSkipVerify: true,
		MaxVersion:         tls.VersionTLS12,
	})
	if err != nil {
		log.Fatalf("Failed to make request: %v", err)
	}
	defer conn.Close()

	// Write something to the connection.
	conn.Write([]byte("POST / HTTP/1.1\r\nPath: /upload/\r\n\r\n"))
}
```

If that produces a difference, then we'll know this is an issue with Go's TLS 1.3 implementation. I'll also kindly ping @agl since this involves an interaction with Google's TLS. |
@mengzhuo actually, since this involves running lots of code on your builder, a quick way to verify whether the problem is the interaction between Go's TLS v1.3 and Google's TLS v1.2 is to apply this patch to your builder code in cmd/coordinator/gce.go:

```diff
diff --git a/cmd/coordinator/gce.go b/cmd/coordinator/gce.go
index e4e702d..809c647 100644
--- a/cmd/coordinator/gce.go
+++ b/cmd/coordinator/gce.go
@@ -12,6 +12,7 @@ package main
 import (
 	"context"
+	"crypto/tls"
 	"encoding/json"
 	"errors"
 	"fmt"
@@ -44,6 +45,7 @@ import (
 	"golang.org/x/oauth2/google"
 	compute "google.golang.org/api/compute/v1"
 	"google.golang.org/api/googleapi"
+	"google.golang.org/api/option"
 )
 
 func init() {
@@ -137,20 +139,34 @@ func initGCE() error {
 	cfgDump, _ := json.MarshalIndent(buildEnv, "", "  ")
 	log.Printf("Loaded configuration %q for project %q:\n%s", *buildEnvName, buildEnv.ProjectName, cfgDump)
 
+	opts := []option.ClientOption{
+		// Force TLS 1.2 in the HTTP client because of issues:
+		// * https://github.com/golang/go/issues/31217
+		// * https://github.com/googleapis/google-cloud-go/issues/1581
+		// in which there might be a bad interaction with Go's TLS v1.3 and Google's TLS v1.2.
+		option.WithHTTPClient(&http.Client{
+			Transport: &http.Transport{
+				TLSClientConfig: &tls.Config{
+					MaxVersion: tls.VersionTLS12,
+				},
+			},
+		}),
+	}
+
 	ctx := context.Background()
 	if *mode != "dev" {
-		storageClient, err = storage.NewClient(ctx)
+		storageClient, err = storage.NewClient(ctx, opts...)
 		if err != nil {
 			log.Fatalf("storage.NewClient: %v", err)
 		}
-		metricsClient, err = monapi.NewMetricClient(ctx)
+		metricsClient, err = monapi.NewMetricClient(ctx, opts...)
 		if err != nil {
 			log.Fatalf("monapi.NewMetricClient: %v", err)
 		}
 	}
-	dsClient, err = datastore.NewClient(ctx, buildEnv.ProjectName)
+	dsClient, err = datastore.NewClient(ctx, buildEnv.ProjectName, opts...)
 	if err != nil {
 		if *mode == "dev" {
 			log.Printf("Error creating datastore client: %v", err)
```

or, for the full file, which is big, please see this section:

// Copyright 2015 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
// +build go1.13
// +build linux darwin
// Code interacting with Google Compute Engine (GCE) and
// a GCE implementation of the BuildletPool interface.
package main
import (
"context"
"crypto/tls"
"encoding/json"
"errors"
"fmt"
"io"
"io/ioutil"
"log"
"net/http"
"os"
"path"
"sort"
"strconv"
"strings"
"sync"
"time"
"cloud.google.com/go/compute/metadata"
"cloud.google.com/go/datastore"
"cloud.google.com/go/errorreporting"
monapi "cloud.google.com/go/monitoring/apiv3"
"cloud.google.com/go/storage"
"golang.org/x/build/buildenv"
"golang.org/x/build/buildlet"
"golang.org/x/build/cmd/coordinator/spanlog"
"golang.org/x/build/dashboard"
"golang.org/x/build/gerrit"
"golang.org/x/build/internal/buildgo"
"golang.org/x/build/internal/buildstats"
"golang.org/x/build/internal/lru"
"golang.org/x/oauth2"
"golang.org/x/oauth2/google"
compute "google.golang.org/api/compute/v1"
"google.golang.org/api/googleapi"
"google.golang.org/api/option"
)
func init() {
buildlet.GCEGate = gceAPIGate
}
// apiCallTicker ticks regularly, preventing us from accidentally making
// GCE API calls too quickly. Our quota is 20 QPS, but we temporarily
// limit ourselves to less than that.
var apiCallTicker = time.NewTicker(time.Second / 10)
func gceAPIGate() {
<-apiCallTicker.C
}
// Initialized by initGCE:
var (
buildEnv *buildenv.Environment
dsClient *datastore.Client
computeService *compute.Service
gcpCreds *google.Credentials
errTryDeps error // non-nil if try bots are disabled
gerritClient *gerrit.Client
storageClient *storage.Client
metricsClient *monapi.MetricClient
inStaging bool // are we running in the staging project? (named -dev)
errorsClient *errorreporting.Client // Stackdriver errors client
gkeNodeIP string
initGCECalled bool
)
// oAuthHTTPClient is the OAuth2 HTTP client used to make API calls to Google Cloud APIs.
// It is initialized by initGCE.
var oAuthHTTPClient *http.Client
func initGCE() error {
initGCECalled = true
var err error
// If the coordinator is running on a GCE instance and a
// buildEnv was not specified with the env flag, set the
// buildEnvName to the project ID
if *buildEnvName == "" {
if *mode == "dev" {
*buildEnvName = "dev"
} else if metadata.OnGCE() {
*buildEnvName, err = metadata.ProjectID()
if err != nil {
log.Fatalf("metadata.ProjectID: %v", err)
}
}
}
buildEnv = buildenv.ByProjectID(*buildEnvName)
inStaging = (buildEnv == buildenv.Staging)
// If running on GCE, override the zone and static IP, and check service account permissions.
if metadata.OnGCE() {
projectZone, err := metadata.Get("instance/zone")
if err != nil || projectZone == "" {
return fmt.Errorf("failed to get current GCE zone: %v", err)
}
gkeNodeIP, err = metadata.Get("instance/network-interfaces/0/ip")
if err != nil {
return fmt.Errorf("failed to get current instance IP: %v", err)
}
// Convert the zone from "projects/1234/zones/us-central1-a" to "us-central1-a".
projectZone = path.Base(projectZone)
buildEnv.Zone = projectZone
if buildEnv.StaticIP == "" {
buildEnv.StaticIP, err = metadata.ExternalIP()
if err != nil {
return fmt.Errorf("ExternalIP: %v", err)
}
}
if !hasComputeScope() {
return errors.New("coordinator is not running with access to read and write Compute resources. VM support disabled")
}
if value, err := metadata.ProjectAttributeValue("farmer-run-bench"); err == nil {
*shouldRunBench, _ = strconv.ParseBool(value)
}
}
cfgDump, _ := json.MarshalIndent(buildEnv, "", " ")
log.Printf("Loaded configuration %q for project %q:\n%s", *buildEnvName, buildEnv.ProjectName, cfgDump)
opts := []option.ClientOption{
// Force TLS 1.2 in the HTTP client because of issues:
// * https://github.com/golang/go/issues/31217
// * https://github.com/googleapis/google-cloud-go/issues/1581
// in which there might be a bad interaction with Go's TLS v1.3 and Google's TLS v1.2.
option.WithHTTPClient(&http.Client{
Transport: &http.Transport{
TLSClientConfig: &tls.Config{
MaxVersion: tls.VersionTLS12,
},
},
}),
}
ctx := context.Background()
if *mode != "dev" {
storageClient, err = storage.NewClient(ctx, opts...)
if err != nil {
log.Fatalf("storage.NewClient: %v", err)
}
metricsClient, err = monapi.NewMetricClient(ctx, opts...)
if err != nil {
log.Fatalf("monapi.NewMetricClient: %v", err)
}
}
dsClient, err = datastore.NewClient(ctx, buildEnv.ProjectName, opts...)
if err != nil {
if *mode == "dev" {
log.Printf("Error creating datastore client: %v", err)
} else {
log.Fatalf("Error creating datastore client: %v", err)
}
}
// don't send dev errors to Stackdriver.
if *mode != "dev" {
errorsClient, err = errorreporting.NewClient(ctx, buildEnv.ProjectName, errorreporting.Config{
ServiceName: "coordinator",
})
if err != nil {
// don't exit, we still want to run coordinator
log.Printf("Error creating errors client: %v", err)
}
}
gcpCreds, err = buildEnv.Credentials(ctx)
if err != nil {
if *mode == "dev" {
// don't try to do anything else with GCE, as it will likely fail
return nil
}
log.Fatalf("failed to get a token source: %v", err)
}
oAuthHTTPClient = oauth2.NewClient(ctx, gcpCreds.TokenSource)
computeService, _ = compute.New(oAuthHTTPClient)
errTryDeps = checkTryBuildDeps()
if errTryDeps != nil {
log.Printf("TryBot builders disabled due to error: %v", errTryDeps)
} else {
log.Printf("TryBot builders enabled.")
}
if *mode != "dev" {
go syncBuildStatsLoop(buildEnv)
}
go gcePool.pollQuotaLoop()
go createBasepinDisks(context.Background())
return nil
}
func checkTryBuildDeps() error {
if !hasStorageScope() {
return errors.New("coordinator's GCE instance lacks the storage service scope")
}
if *mode == "dev" {
return errors.New("running in dev mode")
}
wr := storageClient.Bucket(buildEnv.LogBucket).Object("hello.txt").NewWriter(context.Background())
fmt.Fprintf(wr, "Hello, world! Coordinator start-up at %v", time.Now())
if err := wr.Close(); err != nil {
return fmt.Errorf("test write of a GCS object to bucket %q failed: %v", buildEnv.LogBucket, err)
}
if inStaging {
// Don't expect to write to Gerrit in staging mode.
gerritClient = gerrit.NewClient("https://go-review.googlesource.com", gerrit.NoAuth)
} else {
gobotPass, err := metadata.ProjectAttributeValue("gobot-password")
if err != nil {
return fmt.Errorf("failed to get project metadata 'gobot-password': %v", err)
}
gerritClient = gerrit.NewClient("https://go-review.googlesource.com",
gerrit.BasicAuth("git-gobot.golang.org", strings.TrimSpace(string(gobotPass))))
}
return nil
}
var gcePool = &gceBuildletPool{}
var _ BuildletPool = (*gceBuildletPool)(nil)
// maxInstances is a temporary hack because we can't get buildlets to boot
// without IPs, and we only have 200 IP addresses.
// TODO(bradfitz): remove this once fixed.
const maxInstances = 190
type gceBuildletPool struct {
mu sync.Mutex // guards all following
disabled bool
// CPU quota usage & limits.
cpuLeft int // dead-reckoning CPUs remain
instLeft int // dead-reckoning instances remain
instUsage int
cpuUsage int
addrUsage int
inst map[string]time.Time // GCE VM instance name -> creationTime
}
func (p *gceBuildletPool) pollQuotaLoop() {
if computeService == nil {
log.Printf("pollQuotaLoop: no GCE access; not checking quota.")
return
}
if buildEnv.ProjectName == "" {
log.Printf("pollQuotaLoop: no GCE project name configured; not checking quota.")
return
}
for {
p.pollQuota()
time.Sleep(5 * time.Second)
}
}
func (p *gceBuildletPool) pollQuota() {
gceAPIGate()
reg, err := computeService.Regions.Get(buildEnv.ProjectName, buildEnv.Region()).Do()
if err != nil {
log.Printf("Failed to get quota for %s/%s: %v", buildEnv.ProjectName, buildEnv.Region(), err)
return
}
p.mu.Lock()
defer p.mu.Unlock()
for _, quota := range reg.Quotas {
switch quota.Metric {
case "CPUS":
p.cpuLeft = int(quota.Limit) - int(quota.Usage)
p.cpuUsage = int(quota.Usage)
case "INSTANCES":
p.instLeft = int(quota.Limit) - int(quota.Usage)
p.instUsage = int(quota.Usage)
case "IN_USE_ADDRESSES":
p.addrUsage = int(quota.Usage)
}
}
}
func (p *gceBuildletPool) SetEnabled(enabled bool) {
p.mu.Lock()
defer p.mu.Unlock()
p.disabled = !enabled
}
func (p *gceBuildletPool) GetBuildlet(ctx context.Context, hostType string, lg logger) (bc *buildlet.Client, err error) {
hconf, ok := dashboard.Hosts[hostType]
if !ok {
return nil, fmt.Errorf("gcepool: unknown host type %q", hostType)
}
qsp := lg.CreateSpan("awaiting_gce_quota")
err = p.awaitVMCountQuota(ctx, hconf.GCENumCPU())
qsp.Done(err)
if err != nil {
return nil, err
}
deleteIn, ok := ctx.Value(buildletTimeoutOpt{}).(time.Duration)
if !ok {
deleteIn = vmDeleteTimeout
}
instName := "buildlet-" + strings.TrimPrefix(hostType, "host-") + "-rn" + randHex(7)
instName = strings.Replace(instName, "_", "-", -1) // Issue 22905; can't use underscores in GCE VMs
p.setInstanceUsed(instName, true)
gceBuildletSpan := lg.CreateSpan("create_gce_buildlet", instName)
defer func() { gceBuildletSpan.Done(err) }()
var (
needDelete bool
createSpan = lg.CreateSpan("create_gce_instance", instName)
waitBuildlet spanlog.Span // made after create is done
curSpan = createSpan // either instSpan or waitBuildlet
)
log.Printf("Creating GCE VM %q for %s", instName, hostType)
bc, err = buildlet.StartNewVM(gcpCreds, buildEnv, instName, hostType, buildlet.VMOpts{
DeleteIn: deleteIn,
OnInstanceRequested: func() {
log.Printf("GCE VM %q now booting", instName)
},
OnInstanceCreated: func() {
needDelete = true
createSpan.Done(nil)
waitBuildlet = lg.CreateSpan("wait_buildlet_start", instName)
curSpan = waitBuildlet
},
OnGotInstanceInfo: func() {
lg.LogEventTime("got_instance_info", "waiting_for_buildlet...")
},
})
if err != nil {
curSpan.Done(err)
log.Printf("Failed to create VM for %s: %v", hostType, err)
if needDelete {
deleteVM(buildEnv.Zone, instName)
p.putVMCountQuota(hconf.GCENumCPU())
}
p.setInstanceUsed(instName, false)
return nil, err
}
waitBuildlet.Done(nil)
bc.SetDescription("GCE VM: " + instName)
bc.SetOnHeartbeatFailure(func() {
p.putBuildlet(bc, hostType, instName)
})
return bc, nil
}
func (p *gceBuildletPool) putBuildlet(bc *buildlet.Client, hostType, instName string) error {
// TODO(bradfitz): add the buildlet to a freelist (of max N
// items) for up to 10 minutes since when it got started if
// it's never seen a command execution failure, and we can
// wipe all its disk content? (perhaps wipe its disk content
// when it's retrieved, like the reverse buildlet pool) But
// this will require re-introducing a distinction in the
// buildlet client library between Close, Destroy/Halt, and
// tracking execution errors. That was all half-baked before
// and thus removed. Now Close always destroys everything.
deleteVM(buildEnv.Zone, instName)
p.setInstanceUsed(instName, false)
hconf, ok := dashboard.Hosts[hostType]
if !ok {
panic("failed to lookup conf") // should've worked if we did it before
}
p.putVMCountQuota(hconf.GCENumCPU())
return nil
}
func (p *gceBuildletPool) WriteHTMLStatus(w io.Writer) {
fmt.Fprintf(w, "<b>GCE pool</b> capacity: %s", p.capacityString())
const show = 6 // must be even
active := p.instancesActive()
if len(active) > 0 {
fmt.Fprintf(w, "<ul>")
for i, inst := range active {
if i < show/2 || i >= len(active)-(show/2) {
fmt.Fprintf(w, "<li>%v, %s</li>\n", inst.name, friendlyDuration(time.Since(inst.creation)))
} else if i == show/2 {
fmt.Fprintf(w, "<li>... %d of %d total omitted ...</li>\n", len(active)-show, len(active))
}
}
fmt.Fprintf(w, "</ul>")
}
}
func (p *gceBuildletPool) String() string {
return fmt.Sprintf("GCE pool capacity: %s", p.capacityString())
}
func (p *gceBuildletPool) capacityString() string {
p.mu.Lock()
defer p.mu.Unlock()
return fmt.Sprintf("%d/%d instances; %d/%d CPUs",
len(p.inst), p.instUsage+p.instLeft,
p.cpuUsage, p.cpuUsage+p.cpuLeft)
}
// awaitVMCountQuota waits for numCPU CPUs of quota to become available,
// or returns ctx.Err.
func (p *gceBuildletPool) awaitVMCountQuota(ctx context.Context, numCPU int) error {
// Poll every 2 seconds, which could be better, but works and
// is simple.
for {
if p.tryAllocateQuota(numCPU) {
return nil
}
select {
case <-time.After(2 * time.Second):
case <-ctx.Done():
return ctx.Err()
}
}
}
func (p *gceBuildletPool) HasCapacity(hostType string) bool {
hconf, ok := dashboard.Hosts[hostType]
if !ok {
return false
}
numCPU := hconf.GCENumCPU()
p.mu.Lock()
defer p.mu.Unlock()
return p.haveQuotaLocked(numCPU)
}
// haveQuotaLocked reports whether the current GCE quota permits
// starting numCPU more CPUs.
//
// precondition: p.mu must be held.
func (p *gceBuildletPool) haveQuotaLocked(numCPU int) bool {
return p.cpuLeft >= numCPU && p.instLeft >= 1 && len(p.inst) < maxInstances && p.addrUsage < maxInstances
}
func (p *gceBuildletPool) tryAllocateQuota(numCPU int) bool {
p.mu.Lock()
defer p.mu.Unlock()
if p.disabled {
return false
}
if p.haveQuotaLocked(numCPU) {
p.cpuUsage += numCPU
p.cpuLeft -= numCPU
p.instLeft--
p.addrUsage++
return true
}
return false
}
// putVMCountQuota adjusts the dead-reckoning of our quota usage by
// one instance and cpu CPUs.
func (p *gceBuildletPool) putVMCountQuota(cpu int) {
p.mu.Lock()
defer p.mu.Unlock()
p.cpuUsage -= cpu
p.cpuLeft += cpu
p.instLeft++
}
func (p *gceBuildletPool) setInstanceUsed(instName string, used bool) {
p.mu.Lock()
defer p.mu.Unlock()
if p.inst == nil {
p.inst = make(map[string]time.Time)
}
if used {
p.inst[instName] = time.Now()
} else {
delete(p.inst, instName)
}
}
func (p *gceBuildletPool) instanceUsed(instName string) bool {
p.mu.Lock()
defer p.mu.Unlock()
_, ok := p.inst[instName]
return ok
}
func (p *gceBuildletPool) instancesActive() (ret []resourceTime) {
p.mu.Lock()
defer p.mu.Unlock()
for name, create := range p.inst {
ret = append(ret, resourceTime{
name: name,
creation: create,
})
}
sort.Sort(byCreationTime(ret))
return ret
}
// resourceTime is a GCE instance or Kube pod name and its creation time.
type resourceTime struct {
name string
creation time.Time
}
type byCreationTime []resourceTime
func (s byCreationTime) Len() int { return len(s) }
func (s byCreationTime) Less(i, j int) bool { return s[i].creation.Before(s[j].creation) }
func (s byCreationTime) Swap(i, j int) { s[i], s[j] = s[j], s[i] }
// cleanUpOldVMs loops forever and periodically enumerates virtual
// machines and deletes those which have expired.
//
// A VM is considered expired if it has a "delete-at" metadata
// attribute having a unix timestamp before the current time.
//
// This is the safety mechanism to delete VMs which stray from the
// normal deleting process. VMs are created to run a single build and
// should be shut down by a controlling process. Due to various types
// of failures, they might get stranded. To prevent them from getting
// stranded and wasting resources forever, we instead set the
// "delete-at" metadata attribute on them when created to some time
// that's well beyond their expected lifetime.
func (p *gceBuildletPool) cleanUpOldVMs() {
if *mode == "dev" {
return
}
if computeService == nil {
return
}
// TODO(bradfitz): remove this list and just query it from the compute API?
// http://godoc.org/google.golang.org/api/compute/v1#RegionsService.Get
// and Region.Zones: http://godoc.org/google.golang.org/api/compute/v1#Region
for {
for _, zone := range buildEnv.ZonesToClean {
if err := p.cleanZoneVMs(zone); err != nil {
log.Printf("Error cleaning VMs in zone %q: %v", zone, err)
}
}
time.Sleep(time.Minute)
}
}
// cleanZoneVMs is part of cleanUpOldVMs, operating on a single zone.
func (p *gceBuildletPool) cleanZoneVMs(zone string) error {
// Fetch the first 500 (default) running instances and clean
// those. We expect that we'll be running many fewer than
// that. Even if we have more, eventually the first 500 will
// either end or be cleaned, and then the next call will get a
// partially-different 500.
// TODO(bradfitz): revisit this code if we ever start running
// thousands of VMs.
gceAPIGate()
list, err := computeService.Instances.List(buildEnv.ProjectName, zone).Do()
if err != nil {
return fmt.Errorf("listing instances: %v", err)
}
for _, inst := range list.Items {
if inst.Metadata == nil {
// Defensive. Not seen in practice.
continue
}
var sawDeleteAt bool
var deleteReason string
for _, it := range inst.Metadata.Items {
if it.Key == "delete-at" {
if it.Value == nil {
log.Printf("missing delete-at value; ignoring")
continue
}
unixDeadline, err := strconv.ParseInt(*it.Value, 10, 64)
if err != nil {
log.Printf("invalid delete-at value %q seen; ignoring", *it.Value)
continue
}
sawDeleteAt = true
if time.Now().Unix() > unixDeadline {
deleteReason = "delete-at expiration"
}
}
}
isBuildlet := strings.HasPrefix(inst.Name, "buildlet-")
if isBuildlet && !sawDeleteAt && !p.instanceUsed(inst.Name) {
createdAt, _ := time.Parse(time.RFC3339Nano, inst.CreationTimestamp)
if createdAt.Before(time.Now().Add(-3 * time.Hour)) {
deleteReason = fmt.Sprintf("no delete-at, created at %s", inst.CreationTimestamp)
}
}
// Delete buildlets (things we made) from previous
// generations. Only deleting things starting with "buildlet-"
// is a historical restriction, but still fine for paranoia.
if deleteReason == "" && sawDeleteAt && isBuildlet && !p.instanceUsed(inst.Name) {
if _, ok := deletedVMCache.Get(inst.Name); !ok {
deleteReason = "from earlier coordinator generation"
}
}
if deleteReason != "" {
log.Printf("deleting VM %q in zone %q; %s ...", inst.Name, zone, deleteReason)
deleteVM(zone, inst.Name)
}
}
return nil
}
var deletedVMCache = lru.New(100) // keyed by instName
// deleteVM starts a delete of an instance in a given zone.
//
// It either returns an operation name (if delete is pending) or the
// empty string if the instance didn't exist.
func deleteVM(zone, instName string) (operation string, err error) {
deletedVMCache.Add(instName, token{})
gceAPIGate()
op, err := computeService.Instances.Delete(buildEnv.ProjectName, zone, instName).Do()
apiErr, ok := err.(*googleapi.Error)
if ok {
if apiErr.Code == 404 {
return "", nil
}
}
if err != nil {
log.Printf("Failed to delete instance %q in zone %q: %v", instName, zone, err)
return "", err
}
log.Printf("Sent request to delete instance %q in zone %q. Operation ID, Name: %v, %v", instName, zone, op.Id, op.Name)
return op.Name, nil
}
func hasScope(want string) bool {
// If not on GCE, assume full access
if !metadata.OnGCE() {
return true
}
scopes, err := metadata.Scopes("default")
if err != nil {
log.Printf("failed to query metadata default scopes: %v", err)
return false
}
for _, v := range scopes {
if v == want {
return true
}
}
return false
}
func hasComputeScope() bool {
return hasScope(compute.ComputeScope) || hasScope(compute.CloudPlatformScope)
}
func hasStorageScope() bool {
return hasScope(storage.ScopeReadWrite) || hasScope(storage.ScopeFullControl) || hasScope(compute.CloudPlatformScope)
}
func readGCSFile(name string) ([]byte, error) {
if *mode == "dev" {
b, ok := testFiles[name]
if !ok {
return nil, &os.PathError{
Op: "open",
Path: name,
Err: os.ErrNotExist,
}
}
return []byte(b), nil
}
r, err := storageClient.Bucket(buildEnv.BuildletBucket).Object(name).NewReader(context.Background())
if err != nil {
return nil, err
}
defer r.Close()
return ioutil.ReadAll(r)
}
// syncBuildStatsLoop runs forever in its own goroutine and syncs the
// coordinator's datastore Build & Span entities to BigQuery
// periodically.
func syncBuildStatsLoop(env *buildenv.Environment) {
ticker := time.NewTicker(5 * time.Minute)
for {
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Minute)
if err := buildstats.SyncBuilds(ctx, env); err != nil {
log.Printf("buildstats: SyncBuilds: %v", err)
}
if err := buildstats.SyncSpans(ctx, env); err != nil {
log.Printf("buildstats: SyncSpans: %v", err)
}
cancel()
<-ticker.C
}
}
// createBasepinDisks creates zone-local copies of VM disk images, to
// speed up VM creations in the future.
//
// Other than a list call, this a no-op unless new VM images were
// added or updated recently.
func createBasepinDisks(ctx context.Context) {
if !metadata.OnGCE() || (buildEnv != buildenv.Production && buildEnv != buildenv.Staging) {
return
}
for {
t0 := time.Now()
bgc, err := buildgo.NewClient(ctx, buildEnv)
if err != nil {
log.Printf("basepin: NewClient: %v", err)
return
}
log.Printf("basepin: creating basepin disks...")
err = bgc.MakeBasepinDisks(ctx)
d := time.Since(t0).Round(time.Second / 10)
if err != nil {
basePinErr.Store(err.Error())
log.Printf("basepin: error creating basepin disks, after %v: %v", d, err)
time.Sleep(5 * time.Minute)
continue
}
basePinErr.Store("")
log.Printf("basepin: created basepin disks after %v", d)
return
}
} |
The most useful thing for debugging the latter would be an issue on golang.org/issues with a PCAP and the result of setting tls.Config.KeyLogWriter during that network capture. |
Thank you @FiloSottile! Indeed, as you've chimed in, it seems this issue is related to @mengzhuo's proxy, and a resolution was reached in golang/go#31217 (comment). I'll close this issue. |
I found it in the Go build farm:
Error: writeSnapshot: local error: tls: bad record MAC
I think this is caused by the "finish_write_snapshot_to_gcs" step, while the coordinator is trying to write a snapshot to GCS.
golang/go#31217