apiserver not starting with audit logging (STDOUT) #4202

Closed

caraboides opened this issue Jan 5, 2018 · 24 comments

Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

@caraboides
  1. What kops version are you running?
     Version 1.8.0 (git-5099bc5)
  2. What Kubernetes version are you running?
     Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.0", GitCommit:"6e937839ac04a38cac63e6a7a306c5d035fe7b0a", GitTreeState:"clean", BuildDate:"2017-09-28T22:46:41Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
  3. What cloud provider are you using?
     aws
  4. What commands did you run? What is the simplest way to reproduce this issue?
     kops edit
kubeAPIServer:
    auditLogPath: '-'
    auditPolicyFile: /srv/kubernetes/audit-policy.yaml

kops update

kops rolling-update cluster \
		--name=$(CLUSTER_NAME) \
		--fail-on-validate-error="$(KOPS_VERIFY)" \
		--instance-group masters\
		--force \
		--yes
  5. What happened after the commands executed?
     The kube-apiserver pod is not starting; I see this in the syslog on the master:
Jan  5 17:22:56 ip-172-24-15-90 docker[2826]: I0105 17:22:56.225033    2882 kube_boot.go:171] kubelet systemd service already running
Jan  5 17:22:56 ip-172-24-15-90 docker[2826]: I0105 17:22:56.225044    2882 channels.go:31] checking channel: "s3://xxx/k8s/k8s-cluster0
1.dev.xxx.yy/addons/bootstrap-channel.yaml"
Jan  5 17:22:56 ip-172-24-15-90 docker[2826]: I0105 17:22:56.225143    2882 channels.go:45] Running command: channels apply channel s3://xxx
tion/k8s/k8s-cluster01.dev.xxx.yy/addons/bootstrap-channel.yaml --v=4 --yes
Jan  5 17:22:56 ip-172-24-15-90 docker[2826]: I0105 17:22:56.314835    2882 channels.go:48] error running channels apply channel s3://xxx/k8s/k8s-cluster01.dev.xxxx.yy/addons/bootstrap-channel.yaml --v=4 --yes:
Jan  5 17:22:56 ip-172-24-15-90 docker[2826]: I0105 17:22:56.314901    2882 channels.go:49] Error: error querying kubernetes version: Get https://127.0.0.1/version: dial tcp 127.0.0.1:443: getsockopt: connection refused
Jan  5 17:22:56 ip-172-24-15-90 docker[2826]: Usage:
Jan  5 17:22:56 ip-172-24-15-90 docker[2826]: channels apply channel [flags]
Jan  5 17:22:56 ip-172-24-15-90 docker[2826]: Flags:
Jan  5 17:22:56 ip-172-24-15-90 docker[2826]: -f, --filename stringSlice   Apply from a local file
Jan  5 17:22:56 ip-172-24-15-90 docker[2826]: --yes                    Apply update
Jan  5 17:22:56 ip-172-24-15-90 docker[2826]: Global Flags:
Jan  5 17:22:56 ip-172-24-15-90 docker[2826]: --alsologtostderr                  log to standard error as well as files
Jan  5 17:22:56 ip-172-24-15-90 docker[2826]: --config string                    config file (default is $HOME/.channels.yaml)
Jan  5 17:22:56 ip-172-24-15-90 docker[2826]: --log_backtrace_at traceLocation   when logging hits line file:N, emit a stack trace (default :0)
Jan  5 17:22:56 ip-172-24-15-90 docker[2826]: --log_dir string                   If non-empty, write log files in this directory
Jan  5 17:22:56 ip-172-24-15-90 docker[2826]: --logtostderr                      log to standard error instead of files (default false)
Jan  5 17:22:56 ip-172-24-15-90 docker[2826]: --stderrthreshold severity         logs at or above this threshold go to stderr (default 2)
Jan  5 17:22:56 ip-172-24-15-90 docker[2826]: -v, --v Level                          log level for V logs (default 0)
Jan  5 17:22:56 ip-172-24-15-90 docker[2826]: --vmodule moduleSpec               comma-separated list of pattern=N settings for file-filtered logging
Jan  5 17:22:56 ip-172-24-15-90 docker[2826]: error querying kubernetes version: Get https://127.0.0.1/version: dial tcp 127.0.0.1:443: getsockopt: connection refused
Jan  5 17:22:56 ip-172-24-15-90 docker[2826]: I0105 17:22:56.314939    2882 channels.go:34] apply channel output was: Error: error querying kubernetes version: Get https://127.0.0.1/version: dial tcp 127.0.0.1:443: getsockopt: connection refused
  6. What did you expect to happen?
     A running apiserver and audit logs in STDOUT.
  7. Please provide your cluster manifest.
apiVersion: kops/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: 2017-03-15T17:28:12Z
  name: k8s-cluster01.dev.xxx.de
spec:
  additionalPolicies:
    master: |
      [
        {
          "Action": [
            "autoscaling:Describe*",
            "cloudwatch:*",
            "logs:*",
            "sns:*"
          ],
          "Effect": "Allow",
          "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "route53:ListHostedZonesByName",
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "elasticloadbalancing:DescribeLoadBalancers",
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "route53:ChangeResourceRecordSets",
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "es:ESHttpGet",
                "es:ESHttpHead",
                "es:ESHttpPost",
                "es:ESHttpPut"
            ],
            "Resource": [
                "arn:aws:es:eu-central-1:xxx:domain/xxxx-ek/*"
            ]
        }
      ]
    node: |
      [
        {
          "Action": [
            "autoscaling:Describe*",
            "cloudwatch:*",
            "logs:*",
            "sns:*"
          ],
          "Effect": "Allow",
          "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "route53:ListHostedZonesByName",
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "elasticloadbalancing:DescribeLoadBalancers",
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "route53:ChangeResourceRecordSets",
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "elasticfilesystem:Describe*",
            "Resource": "*"
        }
      ]
  api:
    loadBalancer:
      type: Public
  authorization:
    alwaysAllow: {}
  channel: stable
  cloudLabels:
    service: kubernetes
    team: k8s
    vertical: k8s
  cloudProvider: aws
  configBase: s3://xxxnger-dev-production-configuration/k8s/k8s-cluster01.dev.xxx.de
  dnsZone: dev.xxx.de
  etcdClusters:
  - etcdMembers:
    - instanceGroup: master-eu-central-1b-1
      name: b-1
    - instanceGroup: master-eu-central-1a-1
      name: a-1
    - instanceGroup: master-eu-central-1a-2
      name: a-2
    name: main
  - etcdMembers:
    - instanceGroup: master-eu-central-1b-1
      name: b-1
    - instanceGroup: master-eu-central-1a-1
      name: a-1
    - instanceGroup: master-eu-central-1a-2
      name: a-2
    name: events
  fileAssets:
  - content: |
      # Log all requests at the Metadata level.
      apiVersion: audit.k8s.io/v1beta1 # This is required.
      kind: Policy
      # Don't generate audit events for all requests in RequestReceived stage.
      omitStages:
        - "RequestReceived"
      rules:
        # Log pod changes at RequestResponse level
        - level: RequestResponse
          resources:
          - group: ""
            # Resource "pods" doesn't match requests to any subresource of pods,
            # which is consistent with the RBAC policy.
            resources: ["pods"]
        # Log "pods/log", "pods/status" at Metadata level
        - level: Metadata
          resources:
          - group: ""
            resources: ["pods/log", "pods/status"]
        # Don't log requests to a configmap called "controller-leader"
        - level: None
          resources:
          - group: ""
            resources: ["configmaps"]
            resourceNames: ["controller-leader"]
        # Don't log watch requests by the "system:kube-proxy" on endpoints or services
        - level: None
          users: ["system:kube-proxy"]
          verbs: ["watch"]
          resources:
          - group: "" # core API group
            resources: ["endpoints", "services"]
        # Don't log authenticated requests to certain non-resource URL paths.
        - level: None
          userGroups: ["system:authenticated"]
          nonResourceURLs:
          - "/api*" # Wildcard matching.
          - "/version"
        # Log the request body of configmap changes in kube-system.
        - level: Request
          resources:
          - group: "" # core API group
            resources: ["configmaps"]
          # This rule only applies to resources in the "kube-system" namespace.
          # The empty string "" can be used to select non-namespaced resources.
          namespaces: ["kube-system"]
        # Log configmap and secret changes in all other namespaces at the Metadata level.
        - level: Metadata
          resources:
          - group: "" # core API group
            resources: ["secrets", "configmaps"]
        # Log all other resources in core and extensions at the Request level.
        - level: Request
          resources:
          - group: "" # core API group
          - group: "extensions" # Version of group should NOT be included.
        # A catch-all rule to log all other requests at the Metadata level.
        - level: Metadata
          # Long-running requests like watches that fall under this rule will not
          # generate an audit event in RequestReceived.
          omitStages:
            - "RequestReceived"
    name: audit-policy-file
    path: /srv/kubernetes/audit-policy.yaml
    roles:
    - Master
  iam:
    legacy: true
  kubeAPIServer:
    authorizationMode: RBAC
    auditLogPath: '-'
    auditPolicyFile: /srv/kubernetes/audit-policy.yaml
  kubernetesApiAccess:
  - 0.0.0.0/0
  kubernetesVersion: 1.8.0
  masterInternalName: api.internal.k8s-cluster01.dev.xxx.yy
  networkCIDR: 172.24.0.0/16
  networkID: vpc-yyyyyyy
  networking:
    flannel:
      backend: udp
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 0.0.0.0/0
  subnets:
  - cidr: 172.24.0.0/19
    id: subnet-yyyyyy
    name: eu-central-1a
    type: Private
    zone: eu-central-1a
  - cidr: 172.24.32.0/19
    id: subnet-yyyyyyy
    name: eu-central-1b
    type: Private
    zone: eu-central-1b
  - cidr: 172.24.128.0/19
    id: subnet-yyyyyy
    name: utility-eu-central-1a
    type: Utility
    zone: eu-central-1a
  - cidr: 172.24.160.0/19
    id: subnet-yyyyyy
    name: utility-eu-central-1b
    type: Utility
    zone: eu-central-1b
  topology:
    dns:
      type: Public
    masters: private
    nodes: private
  8. Anything else do we need to know?
     I have a dirty workaround: I deleted

    auditLogPath: '-'
    auditPolicyFile: /srv/kubernetes/audit-policy.yaml

from the cluster config and activated audit logging by editing kube-apiserver.manifest directly:

  • adding the args --audit-policy-file=/srv/kubernetes/audit-policy.yaml --audit-log-path=- to the /usr/local/bin/kube-apiserver call
  • removing the 2>&1 redirection of STDERR to STDOUT

The working manifest is then:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    dns.alpha.kubernetes.io/internal: api.internal.k8s-cluster01.dev.xxxx.yy
    scheduler.alpha.kubernetes.io/critical-pod: ""
  creationTimestamp: null
  labels:
    k8s-app: kube-apiserver
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - command:
    - /bin/sh
    - -c
    - /usr/local/bin/kube-apiserver --address=127.0.0.1 --admission-control=Initializers,NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,DefaultTolerationSeconds,NodeRestriction,Priority,ResourceQuota
      --allow-privileged=true --anonymous-auth=false --apiserver-count=3 --authorization-mode=RBAC
      --basic-auth-file=/srv/kubernetes/basic_auth.csv --client-ca-file=/srv/kubernetes/ca.crt
      --audit-policy-file=/srv/kubernetes/audit-policy.yaml --audit-log-path=-
      --cloud-provider=aws --etcd-servers-overrides=/events#http://127.0.0.1:4002
      --etcd-servers=http://127.0.0.1:4001 --insecure-port=8080 --kubelet-preferred-address-types=InternalIP,Hostname,ExternalIP
      --proxy-client-cert-file=/srv/kubernetes/apiserver-aggregator.cert --proxy-client-key-file=/srv/kubernetes/apiserver-aggregator.key
      --requestheader-allowed-names=aggregator --requestheader-client-ca-file=/srv/kubernetes/apiserver-aggregator-ca.cert
      --requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group
      --requestheader-username-headers=X-Remote-User --secure-port=443 --service-cluster-ip-range=100.64.0.0/13
      --storage-backend=etcd2 --tls-cert-file=/srv/kubernetes/server.cert --tls-private-key-file=/srv/kubernetes/server.key
      --token-auth-file=/srv/kubernetes/known_tokens.csv --v=2 | /bin/tee -a
      /var/log/kube-apiserver.log
    image: gcr.io/google_containers/kube-apiserver:v1.8.0
    livenessProbe:
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 8080
      initialDelaySeconds: 15
      timeoutSeconds: 15
    name: kube-apiserver
    ports:
    - containerPort: 443
      hostPort: 443
      name: https
    - containerPort: 8080
      hostPort: 8080
      name: local
    resources:
      requests:
        cpu: 150m
    volumeMounts:
    - mountPath: /etc/ssl
      name: etcssl
      readOnly: true
    - mountPath: /etc/pki/tls
      name: etcpkitls
      readOnly: true
    - mountPath: /etc/pki/ca-trust
      name: etcpkica-trust
      readOnly: true
    - mountPath: /usr/share/ssl
      name: usrsharessl
      readOnly: true
    - mountPath: /usr/ssl
      name: usrssl
      readOnly: true
    - mountPath: /usr/lib/ssl
      name: usrlibssl
      readOnly: true
    - mountPath: /usr/local/openssl
      name: usrlocalopenssl
      readOnly: true
    - mountPath: /var/ssl
      name: varssl
      readOnly: true
    - mountPath: /etc/openssl
      name: etcopenssl
      readOnly: true
    - mountPath: /var/log/kube-apiserver.log
      name: logfile
    - mountPath: /srv/kubernetes
      name: srvkube
      readOnly: true
    - mountPath: /srv/sshproxy
      name: srvsshproxy
      readOnly: true
  hostNetwork: true
  tolerations:
  - key: CriticalAddonsOnly
    operator: Exists
  volumes:
  - hostPath:
      path: /etc/ssl
    name: etcssl
  - hostPath:
      path: /etc/pki/tls
    name: etcpkitls
  - hostPath:
      path: /etc/pki/ca-trust
    name: etcpkica-trust
  - hostPath:
      path: /usr/share/ssl
    name: usrsharessl
  - hostPath:
      path: /usr/ssl
    name: usrssl
  - hostPath:
      path: /usr/lib/ssl
    name: usrlibssl
  - hostPath:
      path: /usr/local/openssl
    name: usrlocalopenssl
  - hostPath:
      path: /var/ssl
    name: varssl
  - hostPath:
      path: /etc/openssl
    name: etcopenssl
  - hostPath:
      path: /var/log/kube-apiserver.log
    name: logfile
  - hostPath:
      path: /srv/kubernetes
    name: srvkube
  - hostPath:
      path: /srv/sshproxy
    name: srvsshproxy
status: {}

Now I have audit logs in the STDOUT of the pod and fluentd is able to ship them.

PS:
I see the same error when logging to a file like /tmp/foo with 2>&1 in place. If I delete the 2>&1, then I see audit logs in /tmp/foo inside the container.

spec:
  containers:
  - command:
    - /bin/sh
    - -c
    - /usr/local/bin/kube-apiserver --address=127.0.0.1 --admission-control=Initializers,NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,DefaultTolerationSeconds,NodeRestriction,Priority,ResourceQuota
      --allow-privileged=true --anonymous-auth=false --apiserver-count=3 --authorization-mode=RBAC
      --basic-auth-file=/srv/kubernetes/basic_auth.csv --client-ca-file=/srv/kubernetes/ca.crt
      --audit-policy-file=/srv/kubernetes/audit-policy.yaml --audit-log-path=/tmp/foo
      --audit-log-maxage=10 --audit-log-maxbackup=1 --audit-log-maxsize=100
      --cloud-provider=aws --etcd-servers-overrides=/events#http://127.0.0.1:4002
      --etcd-servers=http://127.0.0.1:4001 --insecure-port=8080 --kubelet-preferred-address-types=InternalIP,Hostname,ExternalIP
      --proxy-client-cert-file=/srv/kubernetes/apiserver-aggregator.cert --proxy-client-key-file=/srv/kubernetes/apiserver-aggregator.key
      --requestheader-allowed-names=aggregator --requestheader-client-ca-file=/srv/kubernetes/apiserver-aggregator-ca.cert
      --requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group
      --requestheader-username-headers=X-Remote-User --secure-port=443 --service-cluster-ip-range=100.64.0.0/13
      --storage-backend=etcd2 --tls-cert-file=/srv/kubernetes/server.cert --tls-private-key-file=/srv/kubernetes/server.key
      --token-auth-file=/srv/kubernetes/known_tokens.csv --v=2 | /bin/tee -a
      /var/log/kube-apiserver.log
    image: gcr.io/google_containers/kube-apiserver:v1.8.0
...
@timworx

timworx commented Jan 6, 2018

What is the output of /var/log/kube-apiserver.log? The syslog just appears to confirm that services can't connect to the kube-apiserver.

@caraboides
Author

There is no output in the kube-apiserver.log.

@hatemosphere

+1 to this, we have the same problem

@chrislovecnm
Contributor

@sethpollack do you have any insight? I think you dropped in the options for API audit logging.

@sethpollack
Contributor

Nope, I never used it in the end.

@caraboides
Author

I could do a simple patch for kops that deactivates the logging of STDERR to /var/log/kube-apiserver.log when audit logging is activated, but that seems like the wrong way of solving the issue.

@chrislovecnm
Contributor

What is the right way? We also have issues making the API server logs accessible via kubectl logs.

@caraboides
Author

In my opinion it is not necessary to log to /var/log/kube-apiserver.log; please correct me if I am wrong. STDOUT is the right target.

@hatemosphere

With the default cluster setup from kops, the logs of the Kubernetes components are duplicated to /var/log/ and to the STDOUT of the corresponding component's container.

@caraboides
Author

caraboides commented Jan 15, 2018

@hatemosphere Why?

@shavo007
Contributor

shavo007 commented Feb 2, 2018

Just hit this issue as well.

I updated my kops config with the audit details and did a rolling update of the master, and now my API server will not start!
Kops version is 1.8.
I edited the cluster and added these details:

  kubeAPIServer:
    auditLogPath: /var/log/kube-apiserver-audit.log
    auditLogMaxAge: 10
    auditLogMaxBackups: 1
    auditLogMaxSize: 100

Then I restarted my master:

kops get instancegroups
kops rolling-update cluster --instance-group <masterName> --force --yes

It was completely broken afterwards.

I checked the logs on the master node for the API server:

etcd cluster is unavailable or misconfigured; error #0: dial tcp 127.0.0.1:4001: getsockopt: connection refused

@caraboides how did you resolve this? Is this a bug?

I have no idea what is wrong.

So I reverted the state in S3 to the previous version, terminated the master, and rebooted.
Back to normal.

Is there an issue with audit logging?

@shavo007
Contributor

shavo007 commented Feb 2, 2018

My fault!

I was running kops 1.7 on a VM and messed up the state files in the S3 bucket! ;-)

@caraboides
Author

caraboides commented Feb 7, 2018

@shavo007 I deleted the 2>&1 in kube-apiserver.manifest.
In my opinion, the | /bin/tee -a /var/log/kube-apiserver.log is not necessary.

@chrislovecnm
Contributor

@KashifSaadat does your PR update this as well?

@KashifSaadat
Contributor

@chrislovecnm sorry which PR? The KubeAPIServerSpec is now included in the user-data for master nodes so a forced rollout shouldn't be necessary any more.

The following PRs slightly change the command exec and logging behaviour:

@morganchristiansson

morganchristiansson commented Mar 8, 2018

We're getting the same error.

Found this error in systemd journal for kubelet:

Mar 08 17:18:04 ip-172-18-42-239.eu-west-2.compute.internal kubelet[1286]: E0308 17:18:04.274957 1286 pod_workers.go:182] Error syncing pod 9be1b1fa1c96172962ab73eff4aa7203 ("kube-apiserver-ip-172-18-42-239.eu-west-2.compute.internal_kube-system(9be1b1fa1c96172962ab73eff4aa7203)"), skipping: failed to "StartContainer" for "kube-apiserver" with CreateContainerError: "Error response from daemon: invalid bind mount spec \".:/.\": invalid volume specification: '.:/.': invalid mount config for type \"volume\": invalid specification: destination can't be '/'"

Looking at kube-apiserver.manifest the following looks incorrect:

    volumeMounts:
    <snip>
    - mountPath: .
      name: auditlogpathdir

and

  volumes:
  <snip>
  - hostPath:
      path: .
    name: auditlogpathdir

kubelet runs in /

lrwxrwxrwx. 1 root root 0 Mar  8 16:47 /proc/1286/cwd -> /

Which could explain why the invalid .:/. bind mount is generated.

The auditlogpathdir volume mount seems to be coming from here: https://github.com/kubernetes/kops/blob/master/nodeup/pkg/model/kube_apiserver.go#L310
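
For what it's worth, Go's filepath.Dir turns the stdout value "-" into ".", which would explain the .:/. spec if nodeup derives the hostPath directory from the audit log path that way. A small standard-library demonstration (not the actual kops code):

package main

import (
	"fmt"
	"path/filepath"
)

func main() {
	// A real file path yields a sensible parent directory for a hostPath mount.
	fmt.Println(filepath.Dir("/var/log/kube-apiserver-audit.log")) // /var/log

	// The stdout value "-" contains no path separator, so Dir returns ".",
	// which then surfaces as the invalid ".:/." bind mount spec.
	fmt.Println(filepath.Dir("-")) // .
}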

@chrislovecnm
Contributor

What should it be?

@morganchristiansson

There shouldn't be a volume mount for auditlogpathdir, as the audit log is being sent to STDOUT.

@morganchristiansson

https://kubernetes.io/docs/tasks/debug-application-cluster/audit/#configuration says

audit-log-path - enables the audit log pointing to a file where the requests are being logged to, '-' means standard out.

kops chokes on audit-log-path being set to '-'. It generates an invalid kube-apiserver.manifest that gives the error I posted above, and the node fails to come up.
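
A rough idea of the kind of guard nodeup would need, as a minimal self-contained sketch; the function auditLogHostDir and its use are illustrative assumptions, not the real kops implementation:

package main

import (
	"fmt"
	"path/filepath"
)

// auditLogHostDir returns the host directory that would need a bind mount for
// the audit log, or "" when the log goes to stdout ("-") and no mount is needed.
// Illustrative sketch only, not the real nodeup code.
func auditLogHostDir(auditLogPath string) string {
	if auditLogPath == "" || auditLogPath == "-" {
		return ""
	}
	return filepath.Dir(auditLogPath)
}

func main() {
	fmt.Printf("%q\n", auditLogHostDir("/var/log/kube-apiserver-audit.log")) // "/var/log"
	fmt.Printf("%q\n", auditLogHostDir("-"))                                 // "" -> skip the auditlogpathdir volume
}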

@mhulscher

I have the same issue. This is inconvenient because I use fluentd to ship the STDOUT/STDERR logs of the kube-apiserver to some faraway place.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Jul 11, 2018
@hatemosphere

Are there any activities ongoing at the moment to solve this?

@chrw

chrw commented Jul 11, 2018

I have the same question. It's still relevant for us.

@kampka
Contributor

kampka commented Jul 16, 2018

I've opened a pull request with a potential fix for this issue, as far as I can reproduce it.
Additional testing is appreciated though.

When testing this issue, please keep in mind that this happens in nodeup, which can be tricky to test.
I advise taking the long way for testing.
