CSI Volume Details API endpoint (/v1/volume/csi/:volumeID) fails when the volume is attached to a job #8661

Closed
RickyGrassmuck opened this issue Aug 12, 2020 · 2 comments


@RickyGrassmuck
Contributor

Nomad version

[root@nomad-dev-0 ~]# nomad version
Nomad v0.12.2 (ee69b3379aeced67e14943b86c4f621451e64e84)

Operating system and Environment details

[root@nomad-dev-0 ~]# cat /etc/os-release 
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

Cinder CSI Plugin being used:

[root@nomad-dev-0 ~]# nomad plugin status -verbose
Container Storage Interface
ID          Provider                  Controllers Healthy/Expected  Nodes Healthy/Expected
cinder-csi  cinder.csi.openstack.org  3/3                           3/3

Volume Status

[root@nomad-dev-0 ~]# nomad volume status
Container Storage Interface
ID       Name         Plugin ID   Schedulable  Access Mode
testvol  test_volume  cinder-csi  true         single-node-writer

Issue

When attempting to load the status of a specific volume after it has already been allocated to a job, the /v1/volume/csi/:volumeID endpoint fails, causing both the nomad volume status [volumeID] command and the UI to fail.

Output when trying to view the status of a Volume that has been allocated to a job already:

[root@nomad-dev-0 ~]# nomad volume status testvol
Error querying volume: Get "https://nomad-dev-0.eco.cpanel.net:4646/v1/volume/csi/testvol": EOF

I checked the server logs and found that the request triggers a panic caused by an invalid memory address or nil pointer dereference.

This doesn't happen when the volume is not actively allocated to a job; after stopping the job, the volume status loads normally:

[root@nomad-dev-0 ~]#  nomad job stop metrics-server
==> Monitoring evaluation "b7751916"
    Evaluation triggered by job "metrics-server"
    Evaluation within deployment: "90826e0d"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "b7751916" finished with status "complete"

[root@nomad-dev-0 ~]# nomad volume status testvol
ID                   = testvol
Name                 = test_volume
External ID          = [openstack-external-id]
Plugin ID            = cinder-csi
Provider             = cinder.csi.openstack.org
Version              = 1.2.0@latest
Schedulable          = true
Controllers Healthy  = 3
Controllers Expected = 3
Nodes Healthy        = 3
Nodes Expected       = 3
Access Mode          = single-node-writer
Attachment Mode      = block-device
Mount Options        = <none>
Namespace            = default

Allocations
No allocations placed

Reproduction steps

  1. Deploy the Cinder CSI plugin (other plugins may cause the same behavior; I haven't tested any, though).
  2. Register a volume.
  3. Check the volume's status and confirm that it loads.
  4. Deploy a job that claims the volume.
  5. Check the volume's status again and note the errors.

Job file (if appropriate)

job "metrics-server" {
  datacenters = ["dc1"]
  group "database" {
    volume "testvol" {
      type = "csi"
      source = "testvol"
      read_only = false
    }
    task "metrics-server" {
      driver = "docker"
      config {
        image = "influxdb:latest"
        volumes = ["testvol:/var/lib/influxdb"]
        port_map {
          "influx" = 8086
        }
      }
      resources {
        network {
          port "influx" {}
        }
      }
    }
  }
}

Volume Registration Configuration

type = "csi"
id = "testvol"
name = "test_volume"
external_id = "[openstack-volume-id]"
access_mode = "single-node-writer"
attachment_mode = "block-device"
plugin_id = "cinder-csi"
mount_options {
   fs_type = "ext4"
}

Nomad Client logs (if appropriate)

Nothing related to this error was found in the client logs.

Nomad Server logs (if appropriate)

Aug 12 21:05:07 nomad-dev-0 nomad[23799]: 2020-08-12T21:05:07.617Z [ERROR] http: http: panic serving 10.3.5.53:54050: runtime error: invalid memory address or nil pointer dereference
Aug 12 21:05:07 nomad-dev-0 nomad[23799]: goroutine 482697 [running]:
Aug 12 21:05:07 nomad-dev-0 nomad[23799]: net/http.(*conn).serve.func1(0xc000616140)
Aug 12 21:05:07 nomad-dev-0 nomad[23799]: net/http/server.go:1800 +0x139
Aug 12 21:05:07 nomad-dev-0 nomad[23799]: panic(0x2b7e160, 0x533c310)
Aug 12 21:05:07 nomad-dev-0 nomad[23799]: runtime/panic.go:975 +0x3e3
Aug 12 21:05:07 nomad-dev-0 nomad[23799]: github.com/hashicorp/nomad/command/agent.structsAllocDeploymentStatusToApi(...)
Aug 12 21:05:07 nomad-dev-0 nomad[23799]: github.com/hashicorp/nomad/command/agent/csi_endpoint.go:420
Aug 12 21:05:07 nomad-dev-0 nomad[23799]: github.com/hashicorp/nomad/command/agent.structsAllocListStubToApi(0xc0005b74a0, 0xc00103e900)
Aug 12 21:05:07 nomad-dev-0 nomad[23799]: github.com/hashicorp/nomad/command/agent/csi_endpoint.go:407 +0x2f9
Aug 12 21:05:07 nomad-dev-0 nomad[23799]: github.com/hashicorp/nomad/command/agent.structsCSIPluginToApi(0xc001c3a700, 0xc0010411a0)
Aug 12 21:05:07 nomad-dev-0 nomad[23799]: github.com/hashicorp/nomad/command/agent/csi_endpoint.go:296 +0x43e
Aug 12 21:05:07 nomad-dev-0 nomad[23799]: github.com/hashicorp/nomad/command/agent.(*HTTPServer).CSIPluginSpecificRequest(0xc0006e2190, 0x38eac00, 0xc0010411a0, 0xc001b92700, 0x4d78e6, 0x5f345983, 0x24b0fcf0, 0xbb43060948b)
Aug 12 21:05:07 nomad-dev-0 nomad[23799]: github.com/hashicorp/nomad/command/agent/csi_endpoint.go:265 +0x351
Aug 12 21:05:07 nomad-dev-0 nomad[23799]: github.com/hashicorp/nomad/command/agent.(*HTTPServer).wrap.func1(0x38eac00, 0xc0010411a0, 0xc001b92700)
Aug 12 21:05:07 nomad-dev-0 nomad[23799]: github.com/hashicorp/nomad/command/agent/http.go:448 +0x176
Aug 12 21:05:07 nomad-dev-0 nomad[23799]: net/http.HandlerFunc.ServeHTTP(0xc0006fa860, 0x38eac00, 0xc0010411a0, 0xc001b92700)
Aug 12 21:05:07 nomad-dev-0 nomad[23799]: net/http/server.go:2041 +0x44
Aug 12 21:05:07 nomad-dev-0 nomad[23799]: net/http.(*ServeMux).ServeHTTP(0xc0006fe0c0, 0x38eac00, 0xc0010411a0, 0xc001b92700)
Aug 12 21:05:07 nomad-dev-0 nomad[23799]: net/http/server.go:2416 +0x1a5
Aug 12 21:05:07 nomad-dev-0 nomad[23799]: github.com/NYTimes/gziphandler.GzipHandlerWithOpts.func1.1(0x38f4c80, 0xc0007ce620, 0xc001b92700)
Aug 12 21:05:07 nomad-dev-0 nomad[23799]: github.com/NYTimes/[email protected]/gzip.go:277 +0x1e6
Aug 12 21:05:07 nomad-dev-0 nomad[23799]: net/http.HandlerFunc.ServeHTTP(0xc00071ecf0, 0x38f4c80, 0xc0007ce620, 0xc001b92700)
Aug 12 21:05:07 nomad-dev-0 nomad[23799]: net/http/server.go:2041 +0x44
Aug 12 21:05:07 nomad-dev-0 nomad[23799]: net/http.serverHandler.ServeHTTP(0xc0006f61c0, 0x38f4c80, 0xc0007ce620, 0xc001b92700)
Aug 12 21:05:07 nomad-dev-0 nomad[23799]: net/http/server.go:2836 +0xa3
Aug 12 21:05:07 nomad-dev-0 nomad[23799]: net/http.(*conn).serve(0xc000616140, 0x3906b80, 0xc001b80cc0)
Aug 12 21:05:07 nomad-dev-0 nomad[23799]: net/http/server.go:1924 +0x86c
Aug 12 21:05:07 nomad-dev-0 nomad[23799]: created by net/http.(*Server).Serve
Aug 12 21:05:07 nomad-dev-0 nomad[23799]: net/http/server.go:2962 +0x35c
Aug 12 21:05:07 nomad-dev-0 nomad[23799]: 2020-08-12T21:05:07.817Z [DEBUG] http: request complete: method=GET path=/v1/volume/csi%2Ftestvol duration=228.369µs
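The trace places the panic in structsAllocDeploymentStatusToApi (csi_endpoint.go:420), called from structsAllocListStubToApi while allocation stubs are converted to their API form. Assuming the panic comes from dereferencing an allocation's deployment status that is unset (nil), a nil guard in the converter would avoid it. The Go sketch below shows only that general shape with hypothetical stand-in types and a hypothetical function name; it is not the code from PR #8655.

```go
package main

import "fmt"

// Hypothetical stand-ins for the internal and API-facing types; the field
// names are illustrative and not copied from Nomad.
type structsDeploymentStatus struct {
	Healthy *bool // assumed to be unset until health is known
	Canary  bool
}

type apiDeploymentStatus struct {
	Healthy *bool
	Canary  bool
}

// deploymentStatusToAPI converts an optional deployment status. A nil input
// maps to a nil output instead of being dereferenced.
func deploymentStatusToAPI(s *structsDeploymentStatus) *apiDeploymentStatus {
	if s == nil {
		return nil
	}
	return &apiDeploymentStatus{Healthy: s.Healthy, Canary: s.Canary}
}

func main() {
	healthy := true
	fmt.Println("nil input maps to nil:", deploymentStatusToAPI(nil) == nil)
	out := deploymentStatusToAPI(&structsDeploymentStatus{Healthy: &healthy})
	fmt.Println("healthy copied:", *out.Healthy)
}
```

Returning nil (rather than an empty struct) keeps "no deployment status" distinguishable from "status with zero values" for API consumers.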
@RickyGrassmuck
Contributor Author

Looks like y'all were on top of this one; I forgot to check closed issues, where I found the original report and PR #8655.

Gonna close this out since it looks to be taken care of!

@github-actions

github-actions bot commented Nov 3, 2022

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 3, 2022