Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

updates to containerd executor, including basic CNI based network support #2299

Closed
wants to merge 20 commits into from

Conversation

ijc
Copy link
Contributor

@ijc ijc commented Jul 6, 2017

Apart from actually adding basic CNI support to the containerd executor the largest non-vendoring change here is to refactor some of the network operations into an interface (initially proposed in #2230 (comment)) in order to provide both CNM and CNI integration.

One nice IMHO result of that refactoring is that all imports of libnetwork are now in manager/allocator/cnmallocator and manager/network/cnm (previously some were also in controlapi, which seemed a bit like a layering violation) while all uses of CNI are in manager/network/cni and agent/exec/containerd/adapter.go. I'd still like to abstract the adaptor.go uses further to remove those.

I think there are likely still various bits of manager/allocator/cnmallocator which might more properly belong in manager/network/cnm (I'm thinking the use of the driver registry for example) but I haven't pulled on that thread more than the amount minimally required here.

The revendoring in Update containerd to xxxxx updates a few things which are used outside of the containerd plugin, specifically github.com/opencontainers/runc and
golang.org/x/sys. Both look OK to me but I'm not sure if there is some interaction with e.g. vendoring into moby/moby to worry about (especially runc).

The result of this that basic IP level connectivity is made to containers, I've not sorted out thinkgs like service discovery (e.g. DNS) yet.

In terms of testing I've been using the swarmd project in LinuxKit to fire up a cluster of VMs running Weave Net based CNI networking. A network is created with swarmctl network create --cni '{ "name": "weave", "type": "weave-net" }' and then service create ... --network weave can be used in the normal way.

Allocator() (networkallocator.NetworkAllocator, error)
SetDefaults(spec *api.NetworkSpec) error
ValidateNetworkSpec(spec *api.NetworkSpec) error
PredefinedNetworks() []networkallocator.PredefinedNetworkData
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like the definition of PredefinedNetworkData belongs in this package rather than networkallocator.

We can't move it on its own because that would cause a circular dependency.

I wonder if it would also make sense to move the NetworkAllocator interface definition here. Then this package would contain the interface definitions, and networkallocator would have the implementations.

I'm not completely sure either way, just throwing this out as an idea. Something doesn't feel quite right about this package depending on the allocator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point/idea. I'll try it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually PredefinedNetworkData isn't used anywhere in the network allocator package after the changes in this PR so moving it makes absolute sense and doesn't cause a circular dependency.

I'm still considering possibly moving networkallocator.NetworkAllocator to network.Allocator, I think I like it though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aaronlehmann done. I also refactored the rest of manager/allocator/networkallocator and a few bits of manager/allocator/network.go out into various more suitable places, so the packages which depend on the allocator are now (IMHO at least) much cleaner.

SetDefaults(spec *api.NetworkSpec) error
ValidateNetworkSpec(spec *api.NetworkSpec) error
PredefinedNetworks() []networkallocator.PredefinedNetworkData
SupportIngressNetwork() bool
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SupportsIngressNetwork?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way to do this is only define an IngressNetwork method on implementations that support the ingress network, and type-assert to check whether a particular implementation has this method or not. This keeps implementation-specific details from creeping into the main interface definition. It can be more cumbersome to use, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try this too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aaronlehmann I made the name change as an interim measure. but WRT the IngressNetwork method I think it might make sense to move the newIngressNetwork function into the cnm version of that and use the type check WDYT?

I'm also eyeing up newPredefinedNetwork in a similar light, in essence making PredefinedNetworks return an api.Network instead of the PredefinedNetworkData and so embedding the function into CNM where it is used.


// Model is an abstraction over the Network Model to be used.
type Model interface {
Allocator() (networkallocator.NetworkAllocator, error)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would call this NewAllocator.

continue
}
for _, ip := range ns.cni.IPs {
log.G(ctx).Debugf("Iface# %d, num interfaces %d\n", ip.Interface, len(ns.cni.Interfaces))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no \n.

netWatch, cancel := state.Watch(s.WatchQueue(), api.EventUpdateNetwork{}, api.EventDeleteNetwork{})
defer cancel()
taskWatch, cancel := state.Watch(s.WatchQueue(), api.EventUpdateTask{}, api.EventDeleteTask{})
defer cancel()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this works as intended, but it might clarify the code to avoid reusing the name cancel.

@codecov
Copy link

codecov bot commented Jul 11, 2017

Codecov Report

Merging #2299 into master will decrease coverage by 0.16%.
The diff coverage is 49.32%.

@@            Coverage Diff             @@
##           master    #2299      +/-   ##
==========================================
- Coverage   60.38%   60.22%   -0.17%     
==========================================
  Files         128      130       +2     
  Lines       26056    26122      +66     
==========================================
- Hits        15735    15731       -4     
- Misses       8934     9006      +72     
+ Partials     1387     1385       -2

@ijc ijc force-pushed the containerd branch 2 times, most recently from 395c9a4 to d01cd54 Compare July 11, 2017 15:22
@ijc
Copy link
Contributor Author

ijc commented Jul 11, 2017

Addressed all comments except #2299 (review) and #2299 (review) which GH has helpfully decided are old due to my touching those lines for other reasons. Leaving this comment as a reminder to self.

@ijc ijc force-pushed the containerd branch 4 times, most recently from f6eaabb to 7f80d62 Compare July 12, 2017 14:14
api/network.go Outdated
package api

// IsIngressNetwork check if the network is an ingress network
func IsIngressNetwork(nw *Network) bool {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you put these helper functions in an api/network package? We've tried to reserve api for autogenerated code, though there are some exceptions now.

Copy link
Collaborator

@aaronlehmann aaronlehmann Jul 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think github.com/docker/swarmkit/manager/network would be a fine place to put these. In particular, IsIngressNetworkNeeded is basically business logic, so the manager/network package might make more sense.

@aaronlehmann
Copy link
Collaborator

Mostly LGTM. Left a few comments.

api/types.proto Outdated
string interface = 2;

// Address is the IP address in the form ipv4:XXX or ipv6:YYY
string address = 3;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if it would make sense to break apart the protocol and address into separate fields.

If this format is borrowed from CNI and is well-defined there, this is probably fine, but in general I like to avoid microformats.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I did indeed make this up myself.

api.EndPoint.VirtuallIP.addr is a string. Looking at the usage it appears to be defined as one which is compatible with net.ParseCIDR/net.ParseIP. This seems like a reasonable enough precedent to follow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to this I don't really like what I've done wrt error reporting at https://github.com/ijc/swarmkit/blob/9c619b516e62f103df460dfc8ffd9652634733ac/agent/exec/containerd/controller.go#L66, essentially it just uses an "error:" prefix to distinguish.

I think maybe it would be better to have a separate error field, probably as a oneof.

pg plugingetter.PluginGetter
}

// New produces a fresh CNI Network Model
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CNM

if cnmallocator.IsBuiltInDriver(driver.Name) {
return nil
}
default:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the empty default case.

@ijc
Copy link
Contributor Author

ijc commented Jul 21, 2017

I've now addressed the outstanding review (sorry for the delay), I intend to fold those changes back into the right place in the course of the rebase which I need to do also.

I'm also going to pull in an update containerd v1.0.0-alpha1 (or later if there's another release in the meantime) at the same time.

@ijc ijc force-pushed the containerd branch 3 times, most recently from 240e50a to 8a09c03 Compare July 21, 2017 14:47
@ijc
Copy link
Contributor Author

ijc commented Jul 21, 2017

Rebased. Only minor hiccup was the need to plumb a PluginGetter back through node.Config and manager.Config (previously "Abstract Network Model into an interface" managed to remove this) since the dispatcher now uses it too (since e9a7bc0).

ijc pushed a commit to ijc/linuxkit that referenced this pull request Jul 21, 2017
The PR has been rebased to containerd v1.0.0-alpha1.

Signed-off-by: Ian Campbell <[email protected]>
@@ -193,6 +214,29 @@ func (c *containerAdapter) isPrepared() bool {
return c.container != nil && c.task != nil
}

func cniConfig(n *api.Network) (*libcni.NetworkConfig, error) {
if n.DriverState.Name != "cni" {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could DriverState be nil here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cni.ValidateNetworkSpec would have rejected it if it were, and cniNetworkAllocator.Allocate ensures (redundantly) that it is not, but I think I shouldn't rely on the behaviour of such distant components so I will add a check here too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, cni.ValidateNetworkSpec is looking at DriverConfig not DriverState, so cniNetworkAllocator.Allocate isn't redundant, I should still add the check here though.

}

if spec.Annotations.Name == "" {
spec.Annotations.Name = cni.Network.Name
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noticed this bit. It's sort of problematic - controlapi shouldn't modify the spec, which should only be user data. I can't think of any horrible implications from this particular statement, but we've worked pretty hard to keep a clear separation between user-controlled data and fields that are modifiable by swarmkit, and I wouldn't want to introduce an exception without a very good reason. Among other things, we're thinking of adding a mechanism for signing specs before they're submitted to swarmkit, in the future.

Could we remove SetDefaults and instead dynamically resolve an emptyspec.Annotations.Name field?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could have far reaching implications, for example places like validateAnnotations would need a special case which would depend on the type of spec the annotation was embedded in and we would need to arrange that uses of the network spec always got their name via a wrapper function instead of going to it directly.

There's also the overhead of having to parse the JSON each time (although I have an open plan to try caching that).

I was hoping to avoid requiring the user to specify both spec.Annotations.Name and the cni name identically, but perhaps I should flip this around and instead allow the cni name to be empty and fill it in from the (required) annotation name. That would "just" require me to figure out how to round trip an arbitrary JSON object through a parser to add a field without knowing in advance all of the fields which are present. I'm not actually sure if the go json library can do that, but I'll investigate.

Or another alternative would be to add a Name field to the api.Network object which contains the canonical name as determined by looking at the spec and the cni. AIUI that object is, unlike the spec, internal and fair game for mutating.

WDYT?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping to avoid requiring the user to specify both spec.Annotations.Name and the cni name identically

I think it's not so bad to require the user to specify spec.Annotations.Name. I think we require non-empty values for some object types.

I'm not sure I understand why spec.Annotations.Name would have to match the CNI name, but if they do, enforcing this constraint in controlapi validation sounds okay to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand why spec.Annotations.Name would have to match the CNI name

Just to avoid confusion since the CNI name might get used where the admin might see it e.g. the bridge name (I think) and having that match swarmd's idea of the network's name seems useful.

Maybe it's not worth enforcing it though, if an admin wants to confuse themself that's up to them.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd expect client software (i.e. Docker) to use the same value in each field in practice. I don't think it's necessary to enforce it at the swarmkit level.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okey dokey, it certainly simplifies my life here ;-)

@@ -465,12 +465,31 @@ enum TaskState {
// Remove this message when the states are deemed perfect.
}

// State of Networks attached to a container
message ContainerNetworkStatus {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that this would be used in flat list, where the same values of network and interface may be repeated to provide multiple addresses.

If one of the networks is showing an error, there would be a single entry for it with an error instead of an IP.

This seems different from the way networks are represented internally in the executor. There's a set of networks at the top level, and each network can have zero or more IPs underneath it. Would it make sense to replicate that hierarchy here? My first thought is it would be a little less confusing than the flattened list here with the oneof choosing between an IP or an error condition, but I'm not really familiar with CNI, so it's very possible I'm missing something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have literally no idea why I chose to flatten this in the way I did, I probably started down one path before I had fully groked the CNI data structures and never reevaluated the base decision.

I'll do as you suggest and unflatten it making it a (non-oneof) string error and repeated string ip.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, this is a bit more complicated than I had remembered. The result of a single CNI add call (corresponding to a single api.Tasks.Network) is a Result struct which contains (among other things) an array of Interfaces (name, sandbox, max) and an array of IPs (4 vs 6, reference to the Interfaces array, network and gateway). There's also routes and dns in there. There's a (brief) example under the "Result" heading in the spec.

There isn't a requirement that len(Interfaces) == 1, i.e. a single CNI add call might produce multiple interfaces. It's also possible that len(Interfaces) < len(IPs), i.e. that a single interface might be given multiple IP addresses. There's also more information here (gateways, routes etc) than I have exposed so far.

I'm not really sure how best to approach this. I'm leaning towards a more hierarchical representation of the raw CNI (in particular dereferencing the interface array index) and leaving out the other fields I mentioned for now (they can be added later easily enough). So something of the form:

message ContainerNetworkStatus {
	// Network is the Network ID
	string network = 1;

	// Error, if any.
	string error = 2;

	message NetworkInterface {
		string name = ??;

		// Each IP is in dotted decimal ("192.0.2.1") or IPv6 ("2001:db8::68") form.
		repeated string ip = ?? [(gogoproto.customname) = "IPs"];
	};

	repeated NetworkInterface interfaces = ??;
}

While I was here I did spot that my construction of an interface name (eth%d) to pass to the CNI add call was unnecessary (and probably wrong). The spec requires the plugin to use that name if it is given, but if not can choose its own name(s), which seems like the correct approach to use (and is reflected in the state above).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This proposal makes sense to me. Again, I'm not really familiar with CNI, but this is probably what I'd come up with if asked to abstractly describe network interfaces and their IPs.

@ijc
Copy link
Contributor Author

ijc commented Jul 25, 2017

Review has been addressed.

api/types.proto Outdated
string error = 4;
}
// Each IP is in dotted decimal ("192.0.2.1") or IPv6 ("2001:db8::68") form.
repeated string ip = 2 [(gogoproto.customname) = "IPs"];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The custom name isn't consistent with the protobuf name - only one is plural.

I think I'd just go with IP.

Ian Campbell and others added 20 commits August 3, 2017 13:38
Currently the implementation is the existing CNM provider.

Since plugingetter is in effect CNM specific encapsulate it into the CNM
implementation of network.Model instead of passing it around. This means moving
validate driver (which was only used for networking purposes) down into the
interface too. This also allows moves the caller of
allocator.IsBuiltInNetworkDriver to the CNM model, which can therefore call the
cnmallocator directly.

One downside of this approach is that while the plugingetter was almost always
optional the network model is not, meaning one needs to be added to numerous
test cases.

The uses of plugingetter are now confined to the CNM model and allocator code
rather than being exposed to various parts of the manager and node, which seems
cleaner.

Signed-off-by: Ian Campbell <[email protected]>
and move the CNM implementation into the CNM model.

Signed-off-by: Ian Campbell <[email protected]>
Add a single simplistic test case for the allocator, when in CNI we expect no
allocations other than the population of the NetworkAttachment fields.

The allocator is essentially a nop, except for populating n.DriverState.

Select this model when in containerd mode.

Signed-off-by: Ian Campbell <[email protected]>
Ingress networking is (AFAIK) not covered by CNI and would be provider/plugin
specific (or involve chained plugins), so there is no sane configuration we can
inject by default.

Signed-off-by: Ian Campbell <[email protected]>
This synchronises a few other things with containerd's vendor.conf, image-spec
is containerd executor specific. But github.com/opencontainers/runc and
golang.org/x/sys are used elsewhere.

For reference:
opencontainers/runc@b6b70e5...429a538
golang/sys@5eaf0df...fb4cac3

No changes were needed to the non-vendored code.

Signed-off-by: Ian Campbell <[email protected]>
Also add a SetDefaults which avoids updating the spec in the validation
function in the CNI case.

With this all uses of libnetwork are in manager/allocator/cnmallocator and
manager/network/cnm/cnm.go while all uses of CNI are in
manager/network/cni/cni.go and agent/exec/containerd/adapter.go. I'd still like
to abstract the containerd/adaptor.go case further.

Signed-off-by: Ian Campbell <[email protected]>
Although the current code is not wrong it is clearer to use a specific name.

Signed-off-by: Ian Campbell <[email protected]>
Going to put some non-model stuff in here, not worth a second file.

Signed-off-by: Ian Campbell <[email protected]>
So manager/allocator/networkallocator.NetworkAllocator becomes
manager/network.Allocator.

Signed-off-by: Ian Campbell <[email protected]>
This makes manager/allocator/networkallocator obsolete.

Remove the thin wrappers around these functions from allocator

Signed-off-by: Ian Campbell <[email protected]>
Only a small changes to error handling required.

Also bump some dependent vendorings.

golang.org/x/sys update may not be strictly necessary.

Signed-off-by: Ian Campbell <[email protected]>
No meaningful difference to either, but good to be on the real releases!

Signed-off-by: Ian Campbell <[email protected]>
cniNetworkAllocator.Allocate should have ensured there is always a DriverState,
but in case not (e.g. we end up unexpectedly running with a different network
allocator) check here too.

Signed-off-by: Ian Campbell <[email protected]>
Structure as Network->[]Interfaces->[]IPs, which more closely reflects reality
and is also closer to the CNI model.

While reworking this I observed the CNI_IFNAME environment variable (AKA
libcni.RuntimeConf.IfName) is optional, however with at least the weave-net
plugin (and looking at the CNI repo anything based on the skeleton there)
insists on it.

Relatedly I also see that the Result lacks an Interfaces array and that the IP
addresses reference Interface==-1.

Handle this in the code by creating a fake Interface if none is returned (using
the network if name) and using it if the index is out of bounds.

Signed-off-by: Ian Campbell <[email protected]>
The spec should be readonly to swarmkit. Instead push this requirement onto the
user. This is already checked by `cni.ValidateNetworkSpec` but relax this into
just a warning, so admins have the flexibility to have a different CNI name to
swarm name. However log the mismatch and pass a context to
`network.Model.ValidateNetworkSpec()` to facilitate that.

Signed-off-by: Ian Campbell <[email protected]>
@ijc
Copy link
Contributor Author

ijc commented Aug 3, 2017

Force pushed to resolve conflict with 82c36fe.
I'm going to tag on some commits to update to latest containerd alpha too.

@ijc
Copy link
Contributor Author

ijc commented Aug 3, 2017

Ah, I remember why I hadn't done alpha2 (or now alpha3), it's the logrus change.

I'll revisit that aspect once #2334 has landed. I would expect that merging this PR would make #2334 a bit easier since the delta from alpha1 (here) to alpha2,3,etc should be small since this code uses the client library.

I'll do a fake/local rebase of this onto #2334 to check that theory.

/cc @dmcgowan.

@ijc ijc mentioned this pull request Aug 3, 2017
@AkihiroSuda
Copy link
Member

@ijc What's current status and plan?

@ijc
Copy link
Contributor Author

ijc commented Feb 22, 2018

I'm afraid I've moved onto other things and have no current plans to pursue this further.

ijc pushed a commit to ijc/swarmkit that referenced this pull request Mar 23, 2018
It targets an old pre v1.0 alpha version of containerd and is not currently
maintained.

closes moby#2219, closes moby#2258, closes moby#2259 and closes moby#2299.

The vendoring of libtrust did not become obsolete here but vndr noted that it
was unused so I dropped it while I was there.

Signed-off-by: Ian Campbell <[email protected]>
ijc pushed a commit to ijc/swarmkit that referenced this pull request Mar 23, 2018
It targets an old pre v1.0 alpha version of containerd and is not currently
maintained.

closes moby#2219, closes moby#2258, closes moby#2259 and closes moby#2299.

The vendoring of libtrust did not become obsolete here but vndr noted that it
was unused so I dropped it while I was there.

Signed-off-by: Ian Campbell <[email protected]>
@cyli cyli closed this in #2568 Mar 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants