Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add volume reconstruct/cleanup logic in kubelet volume manager #12024

Merged
merged 3 commits into from
Dec 1, 2016

Conversation

jsafrane
Copy link
Contributor

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1398417

There are two UPSTREAM fixes here, both are needed to fix the bug. Beware, the first one is HUGE :-(.

@ncdc
Copy link
Contributor

ncdc commented Nov 24, 2016

@sttts @ingvagabund would you have time to review?

@ingvagabund
Copy link
Member

This is basically backport from kubernetes, so it should be more like a formal review, right?

@ingvagabund
Copy link
Member

Any chance to run more tests than the travis one? Hard to believe it is the only CI that is required to run.

@stevekuznetsov
Copy link
Contributor

[test]

@mfojtik
Copy link
Contributor

mfojtik commented Nov 25, 2016

you forgot to run hack/update-generated-clientsets.sh

@gnufied
Copy link
Member

gnufied commented Nov 25, 2016

Bunch of failures here appears to be because the test harness in jenkins doesn't work well for old builds anymore, such as:

  1. docker version is upgraded to 1.12. Any code that use API/option present in only 1.10 fails (I am looking at router tests)

  2. ./test/extended/networking-minimal.sh does not exist on 1.3

  3. conformonce test is flaking on Extended.deploymentconfigs with failing hook [Conformance] should get all logs from retried hooks #11630

@stevekuznetsov
Copy link
Contributor

stevekuznetsov commented Nov 25, 2016

We will need to set up a different job to handle non-master PR tests for Origin. The setup exists for OSE today, but we have not replicated it here. Any chance we can open this PR against ose/enterprise-3.3 ?

@gnufied
Copy link
Member

gnufied commented Nov 25, 2016

@mfojtik I ran that. It doesn't produce any changes to generated code (or any commitable changes). Also locally when I ran make verify on this PR - it did not throw errors I see here. Not sure

@gnufied
Copy link
Member

gnufied commented Nov 25, 2016

@jsafrane
Copy link
Contributor Author

@gnufied, thanks for taking care of this.

Indeed, make verify fails on PRs to release-1.3 branch, IMO there is nothing I can do about it. Exactly the same patch is in https://github.com/openshift/ose/pull/480 and the bot there does not complain.

@@ -135,6 +135,18 @@ func (plugin *vsphereVolumePlugin) getCloudProvider() (*vsphere.VSphere, error)
return vs, nil
}

func (plugin *vsphereVolumePlugin) ConstructVolumeSpec(volumeName, mountPath string) (*volume.Spec, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we pick up the latest fix
kubernetes/kubernetes#37167

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets keep that to a separate PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kubernetes/kubernetes#37167

that's for photon volume plugin, it's not present in OpenShift 1.3

@childsb
Copy link
Contributor

childsb commented Nov 28, 2016

Both changes are merged upstream (there's some funk in the links & description):

kubernetes/kubernetes#36840

kubernetes/kubernetes#27970

lgtm for merge as soon as tests complete.

@stevekuznetsov
Copy link
Contributor

@smarterclayton is this something we want to be updating the Origin 1.3 branch with? @jsafrane as I said before, our test infra is not set up to be able to run this test correctly. You won't see it pass here. This is why the OSE PR is created.

@@ -37,6 +41,9 @@ type Interface interface {
// IsLikelyNotMountPoint determines if a directory is a mountpoint.
// It should return ErrNotExist when the directory does not exist.
IsLikelyNotMountPoint(file string) (bool, error)
// GetDeviceNameFromMount finds the device name by checking the mount path
// to get the global mount path which matches its plugin directory
GetDeviceNameFromMount(mountPath, pluginDir string) (string, error)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also needs to pick up kubernetes/kubernetes#30724

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR won't hurt , but do we support OpenShift on non-Linux platforms?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Origin is required to compile on all Linux platforms that have a build target (32, 64, arm64, and s390 eventually).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Breaking the openshift dev experience on mac will earn a swift revert :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically must compile on all platforms, must not knowingly break kubelet on all Linux platforms.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, picked kubernetes/kubernetes#30724
thanks for spotting that.

@smarterclayton
Copy link
Contributor

Was this back ported to 1.3? We've had another request to get at fixes in kube 1.3.x - it may be easier to simply merge those fixes via an update.

@gnufied
Copy link
Member

gnufied commented Nov 28, 2016

kubernetes/kubernetes#27970 was backported to kubernetes-1.3 release branch. kubernetes/kubernetes#36840 is too new and hasn't been backported.

glog.V(3).Infof("Orphaned pod %q found, but volumes are not cleaned up; err: %v", uid, err)
continue
}
// Check whether volume is still mounted on disk. If so, do not delete directory
if volumeNames, err := kl.getPodVolumeNameListFromDisk(uid); err != nil || len(volumeNames) != 0 {
Copy link
Member

@gnufied gnufied Nov 28, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eparis @jsafrane so I reviewed this PR in some detail and this is the bit - which helps with volumes not getting deleted if it is still mounted on disk.

The other big change here is in reconciler.go:

  1. Refactor reconcile() function in its own function. There isn't code a code change in whole of reconcilation itself but diff looks pretty big because of refactoring.
  2. After reconcillation - reconciler also periodically calls reconstruct - which populates state of world from pod by iterating through each pod volume plugin directory.

Overall, I think we should be okay.

@childsb
Copy link
Contributor

childsb commented Nov 30, 2016

this is critical bug. both changes were merged upstream already and this is just a pick of those changes. i reviewed and this lgtm.

@smarterclayton
Copy link
Contributor

Removing merge, this does not compile on macs.

@smarterclayton
Copy link
Contributor

You need to fix

○ make
hack/build-go.sh
++ Building go targets for darwin/amd64: cmd/openshift cmd/oc
# github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/mount
/Users/clayton/projects/origin/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/mount/mount.go:88: cannot use Mounter literal (type *Mounter) as type Interface in return argument:
	*Mounter does not implement Interface (missing GetDeviceNameFromMount method)

in this PR before merge.

@jsafrane
Copy link
Contributor Author

@smarterclayton, it should be compilable now, I do not have mac though.

@jsafrane
Copy link
Contributor Author

[test]

@openshift-bot
Copy link
Contributor

Evaluated for origin test up to feae665

@childsb
Copy link
Contributor

childsb commented Nov 30, 2016

I verified that this builds on OSX.

@childsb
Copy link
Contributor

childsb commented Nov 30, 2016

[merge]

@openshift-bot
Copy link
Contributor

continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/11823/) (Base Commit: d200e8a)

@openshift-bot
Copy link
Contributor

Evaluated for origin merge up to feae665

@openshift-bot
Copy link
Contributor

continuous-integration/openshift-jenkins/merge FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/11847/) (Base Commit: d200e8a)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.