
thanos query cannot connect to TLS terminated grpc thanos sidecar correctly, grpcurl connects correctly #6439

Closed
doctorpangloss opened this issue Jun 13, 2023 · 2 comments


doctorpangloss commented Jun 13, 2023

Thanos, Prometheus and Golang version used: thanos 0.31.0

Object Storage Provider: n/a

What happened:

thanos query cannot connect to an ordinary nginx gRPC-proxied thanos-sidecar address passed as --endpoint, whereas grpcurl using Thanos's protobufs connects fine.

Reopens #4923

Probably related to #6258

Reproduction:

  1. Try to connect to a TLS-terminated sidecar address:
$ GRPC_GO_LOG_VERBOSITY_LEVEL=99 GRPC_GO_LOG_SEVERITY_LEVEL=info thanos query --endpoint some-address:443
...
2023/06/12 18:26:23 INFO: [core] original dial target is: "some-address:443"
2023/06/12 18:26:23 INFO: [core] parsed dial target is: {Scheme:some-address Authority: Endpoint:443 URL:{Scheme:some-address Opaque:443 User: Host: Path: RawPath: OmitHost:false ForceQuery:false RawQuery: Fragment: RawFragment:}}
2023/06/12 18:26:23 INFO: [core] fallback to scheme "passthrough"
2023/06/12 18:26:23 INFO: [core] parsed dial target is: {Scheme:passthrough Authority: Endpoint:some-address:443 URL:{Scheme:passthrough Opaque: User: Host: Path:/some-address:443 RawPath: OmitHost:false ForceQuery:false RawQuery: Fragment: RawFragment:}}
2023/06/12 18:26:23 INFO: [core] Channel authority set to "some-address:443"
level=info ts=2023-06-13T01:26:23.890023Z caller=grpc.go:131 service=gRPC/server component=query msg="listening for serving gRPC" address=0.0.0.0:10901
2023/06/12 18:26:23 INFO: [core] ccResolverWrapper: sending update to cc: {[{some-address:443  <nil> <nil> 0 <nil>}] <nil> <nil>}
2023/06/12 18:26:23 INFO: [core] ClientConn switching balancer to "pick_first"
2023/06/12 18:26:23 INFO: [core] Channel switches to new LB policy "pick_first"
2023/06/12 18:26:23 INFO: [core] blockingPicker: the picked transport is not ready, loop back to repick
2023/06/12 18:26:23 INFO: [core] Subchannel Connectivity change to CONNECTING
2023/06/12 18:26:23 INFO: [core] Subchannel picks a new address "some-address:443" to connect
2023/06/12 18:26:23 INFO: [core] pickfirstBalancer: UpdateSubConnState: 0x14000a44410, {CONNECTING <nil>}
2023/06/12 18:26:23 INFO: [core] Channel Connectivity change to CONNECTING
level=info ts=2023-06-13T01:26:23.890529Z caller=tls_config.go:232 service=http/server component=query msg="Listening on" address=[::]:10902
level=info ts=2023-06-13T01:26:23.890696Z caller=tls_config.go:235 service=http/server component=query msg="TLS is disabled." http2=false address=[::]:10902
2023/06/12 18:26:23 INFO: [core] Subchannel Connectivity change to TRANSIENT_FAILURE
2023/06/12 18:26:23 INFO: [transport] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2023/06/12 18:26:23 INFO: [core] pickfirstBalancer: UpdateSubConnState: 0x14000a44410, {TRANSIENT_FAILURE connection closed before server preface received}
2023/06/12 18:26:23 INFO: [core] Channel Connectivity change to TRANSIENT_FAILURE
  2. Observe the connection fail.
  3. Compare with grpcurl:
$ GRPC_GO_LOG_VERBOSITY_LEVEL=99 GRPC_GO_LOG_SEVERITY_LEVEL=info grpcurl -import-path=protobuf -import-path=./pkg/ -proto=store/storepb/rpc.proto some-address:443 thanos.Store/Info
  2023/06/12 18:33:15 INFO: [core] [Channel #1] Channel created
  2023/06/12 18:33:15 INFO: [core] [Channel #1] original dial target is: "some-address:443"
  2023/06/12 18:33:15 INFO: [core] [Channel #1] parsed dial target is: {Scheme:some-address Authority: Endpoint:443 URL:{Scheme:some-address Opaque:443 User: Host: Path: RawPath: OmitHost:false ForceQuery:false RawQuery: Fragment: RawFragment:}}
  2023/06/12 18:33:15 INFO: [core] [Channel #1] fallback to scheme "passthrough"
  2023/06/12 18:33:15 INFO: [core] [Channel #1] parsed dial target is: {Scheme:passthrough Authority: Endpoint:some-address:443 URL:{Scheme:passthrough Opaque: User: Host: Path:/some-address:443 RawPath: OmitHost:false ForceQuery:false RawQuery: Fragment: RawFragment:}}
  2023/06/12 18:33:15 INFO: [core] [Channel #1] Channel authority set to "some-address:443"
  2023/06/12 18:33:15 INFO: [core] [Channel #1] Resolver state updated: {
    "Addresses": [
      {
        "Addr": "some-address:443",
        "ServerName": "",
        "Attributes": null,
        "BalancerAttributes": null,
        "Type": 0,
        "Metadata": null
      }
    ],
    "ServiceConfig": null,
    "Attributes": null
  } (resolver returned new addresses)
  2023/06/12 18:33:15 INFO: [core] [Channel #1] Channel switches to new LB policy "pick_first"
  2023/06/12 18:33:15 INFO: [core] [Channel #1 SubChannel #2] Subchannel created
  2023/06/12 18:33:15 INFO: [core] [Channel #1 SubChannel #2] Subchannel Connectivity change to CONNECTING
  2023/06/12 18:33:15 INFO: [core] [Channel #1 SubChannel #2] Subchannel picks a new address "some-address:443" to connect
  2023/06/12 18:33:15 INFO: [core] pickfirstBalancer: UpdateSubConnState: 0x14000891530, {CONNECTING <nil>}
  2023/06/12 18:33:15 INFO: [core] [Channel #1] Channel Connectivity change to CONNECTING
  2023/06/12 18:33:15 INFO: [core] [Channel #1 SubChannel #2] Subchannel Connectivity change to READY
  2023/06/12 18:33:15 INFO: [core] pickfirstBalancer: UpdateSubConnState: 0x14000891530, {READY <nil>}
  2023/06/12 18:33:15 INFO: [core] [Channel #1] Channel Connectivity change to READY
  {
    "labels": [
      {
        "name": "cluster",
        "value": "appmana-cluster-03"
      },
      {
        "name": "prometheus",
        "value": "monitoring/kube-prometheus-stack-prometheus"
      },
      {
        "name": "prometheus_replica",
        "value": "prometheus-kube-prometheus-stack-prometheus-0"
      }
    ],
    "maxTime": "9223372036854775807",
    "storeType": "SIDECAR",
    "labelSets": [
      {
        "labels": [
          {
            "name": "cluster",
            "value": "appmana-cluster-03"
          },
          {
            "name": "prometheus",
            "value": "monitoring/kube-prometheus-stack-prometheus"
          },
          {
            "name": "prometheus_replica",
            "value": "prometheus-kube-prometheus-stack-prometheus-0"
          }
        ]
      }
    ]
  }
  2023/06/12 18:33:15 INFO: [core] [Channel #1] Channel Connectivity change to SHUTDOWN
  2023/06/12 18:33:15 INFO: [core] [Channel #1 SubChannel #2] Subchannel Connectivity change to SHUTDOWN
  2023/06/12 18:33:15 INFO: [core] [Channel #1 SubChannel #2] Subchannel deleted
  2023/06/12 18:33:15 INFO: [core] [Channel #1] Channel deleted

What you expected to happen:
You should be able to connect to remote sidecars directly.

How to reproduce it (as minimally and precisely as possible):
Try to connect to a sidecar via --endpoint and specify an address.

Environment:

  • OS (e.g. from /etc/os-release): Linux
  • Kernel (e.g. uname -a): 5.15
@doctorpangloss doctorpangloss changed the title thanos query cannot connect to plain TLS terminated grpc store correctly, grpcurl connects correctly thanos query cannot connect to TLS terminated grpc thanos sidecar correctly, grpcurl connects correctly Jun 13, 2023

doctorpangloss commented Jun 13, 2023

--grpc-client-tls-secure was missing!
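For anyone hitting the same logs: the first trace fails because thanos query dials the endpoint in plaintext gRPC while nginx expects a TLS handshake on port 443. A sketch of the corrected invocation, using the `some-address:443` placeholder from the logs above (the `--grpc-client-server-name` override is optional and only needed when the cert's name differs from the dialed host):

```shell
# --grpc-client-tls-secure makes thanos query perform a TLS handshake
# with the endpoint instead of dialing plaintext gRPC.
thanos query \
  --endpoint some-address:443 \
  --grpc-client-tls-secure \
  --grpc-client-server-name some-address
```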


KM3dd commented Mar 23, 2024

Hello @doctorpangloss, I'm having a hard time with this one. Can you share how you configured the ingress for the remote sidecar? grpcurl is returning `tls: no application protocol`. Thank you!
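Not from the original thread, but `tls: no application protocol` usually means the TLS server never negotiated `h2` via ALPN, which gRPC requires. A minimal nginx sketch of a TLS-terminating gRPC proxy in front of a sidecar — server names, cert paths, and the upstream address are all assumptions, not the reporter's actual config:

```nginx
# Hypothetical TLS-terminating gRPC proxy for a thanos sidecar.
# "http2" on the listen directive is what makes nginx offer h2 via
# ALPN; without it, gRPC clients fail with "tls: no application protocol".
server {
    listen 443 ssl http2;
    server_name some-address;                     # placeholder from the issue

    ssl_certificate     /etc/nginx/tls/tls.crt;   # assumed cert paths
    ssl_certificate_key /etc/nginx/tls/tls.key;

    location / {
        grpc_pass grpc://thanos-sidecar:10901;    # assumed sidecar upstream
    }
}
```

On Kubernetes with ingress-nginx, the equivalent is annotating the Ingress with `nginx.ingress.kubernetes.io/backend-protocol: "GRPC"` so the controller generates a `grpc_pass` upstream.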
