Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP-4815: DRA Partitionable Devices #4874

Merged
merged 1 commit into from
Oct 10, 2024

Conversation

klueska
Copy link
Contributor

@klueska klueska commented Sep 25, 2024

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Sep 25, 2024
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Sep 25, 2024
@pohly
Copy link
Contributor

pohly commented Sep 25, 2024

/cc

@k8s-ci-robot k8s-ci-robot requested a review from pohly September 25, 2024 15:44
@klueska klueska force-pushed the dra-partitonable-devices branch from c9c762f to a845fd3 Compare September 26, 2024 14:08
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Sep 26, 2024
@klueska
Copy link
Contributor Author

klueska commented Sep 26, 2024

I haven't added any text yet (i plan to do that over the next few days), but I wanted to at least get the proposed API changes along with an example pushed so that those that are familiar with what we want to do here can start taking a look.

Copy link
Member

@johnbelamaric johnbelamaric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a quick look over this. If I understand correctly, you are allowing the creation of "abstract" devices which are then used as sort of "mixins" in the concrete devices via the "extends" attribute. This allows us to capture various aspects of the general shape of devices, and mix-and-match those aspects in the concrete devices.

The first ~1150 lines of the yaml capture those abstract devices, and the rest capture the many concrete instances across 4 GPUs. Do you have a "back-of-the-envelope" size calculation for the YAML similar to what we did here?

I don't love the "abstract" device thing. It's confusing, IMO. I would prefer a separate data type for the "shapes" rather than re-using "device". What would be the implications of that?

keps/sig-node/4815-dra-partitionable-devices/README.md Outdated Show resolved Hide resolved
keps/sig-node/4815-dra-partitionable-devices/README.md Outdated Show resolved Hide resolved
@klueska klueska force-pushed the dra-partitonable-devices branch from a845fd3 to 9adb6cd Compare September 26, 2024 15:08
@klueska
Copy link
Contributor Author

klueska commented Sep 26, 2024

I took a quick look over this. If I understand correctly, you are allowing the creation of "abstract" devices which are then used as sort of "mixins" in the concrete devices via the "extends" attribute. This allows us to capture various aspects of the general shape of devices, and mix-and-match those aspects in the concrete devices.

Yes. That's the idea.

The first ~1150 lines of the yaml capture those abstract devices, and the rest capture the many concrete instances across 4 GPUs.

It's actually 8 GPUs.

Do you have a "back-of-the-envelope" size calculation for the YAML similar to what we did here?

The file size of the example I included here is 48K. I wil

I don't love the "abstract" device thing. It's confusing, IMO. I would prefer a separate data type for the "shapes" rather than re-using "device". What would be the implications of that?

It shouldn't matter since it is a requirement that all devices included in "extends" be abstract. I actually find it fairly intuitive to categorize them as devices (but mark them as abstract) since they can (and by definition do) include all of the same fields as a device. That said, I'd also be fine moving them under DeviceExtensions as a new DeviceShape or DeviceTemplate type.

@klueska klueska force-pushed the dra-partitonable-devices branch 2 times, most recently from a514f0b to 7ec6e85 Compare September 27, 2024 09:19
@klueska
Copy link
Contributor Author

klueska commented Sep 27, 2024

Based on @johnbelamaric's comments I have updated this to include the explicit notion of a mixin. I am happy with the change.

Text coming throughout the day and over the weekend...

Copy link
Member

@johnbelamaric johnbelamaric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the "mixin" a lot more, it's easier to understand the relationships between things. I do wonder though if this is just a new DeviceType rather than an extension of BasicDevice.

keps/sig-node/4815-dra-partitionable-devices/README.md Outdated Show resolved Hide resolved
keps/sig-node/4815-dra-partitionable-devices/README.md Outdated Show resolved Hide resolved
keps/sig-node/4815-dra-partitionable-devices/README.md Outdated Show resolved Hide resolved
- name: mig-1g.5gb
- name: memory-slices-0
consumesCapacityFrom:
- name: gpu-0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You already said that you want to change it, so perhaps this will become clear soon, but let me call it out explicitly here: it's not obvious to me what this gpu-0-mig-1g.5gb-0 device consumes from gpu-0. Presumably any capacity defined for the device (whether it's in the device itself or in a mixin) gets consumed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, as it's defined right now, all capacity must be consumed from somewhere if this field is populated. There is no way to list capacities particular to this device that aren't consumed from elsewhere.

That said, there is (technically) nothing preventing you from listing yourself as a source here. That way you would pull all resources available on GPU 0 from there first, falling back to yourself for those that are only specific to you.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, there is (technically) nothing preventing you from listing yourself as a source here.

Hmm, isn't consumption said to be recursive? Referencing yourself would lead to infinite recursion.

Copy link
Contributor

@pohly pohly Sep 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which might be an important point: allowing a driver to publish something that can lead to infinite recursion should be prevented by validation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Includes[]DeviceMixinRef

// ConsumesCapacityFrom defines the set of devices where any capacity
// consumed by this device should be pulled from. This applies recursively.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This "pulled from" implies that there must be a device that can be pulled from. That device itself gets listed and thus can get allocated.

Suppose we have a flat list of identical devices which have some conflicting capacity requirements. We would need to define a fictional additional device that provides this scarce capacity. To Kubernetes, that device looks like something that it can allocate and a request for a "real" device will match it (same driver name, for example).

This is not insurmountable: the fictional device could have "fictional: true" attribute and the DeviceClass selector could exclude devices where that value is true. I just wanted to call this out. Might be worth adding to the KEP as an example.

Copy link
Contributor Author

@klueska klueska Sep 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is one limitation of this approach (which I thought of but didn't call out explicitly here).

My original proposal (from many months ago) allowed for a separate place to list "shared" capacities that weren't attached to any particular device. Individual devices could then pull rom this list of shared capacities rather than pulling from another device explicitly.

So in the case of GPUs with MIG, we would have one shared capacity per full GPU from which both full GPUs and MIG devices would pull.

@thockin wasn't a fan of this approach though as he thought it seemed too abstract to have a "bag" of resources just declared in a resource slice but not explicitly tied to any particular physical device. At this point, I'm honestly OK with either approach, it's more about what is the common case, and I think needing to pull from some abstract source is actually the outlier, so it's fine to optimize for the more intuitive case.

Regarding having a fictional: true attribute. The first version of this KEP had something similar with an explicit field of abstract: true (not an attribute but an actual top-level field in the Device struct). Having an attribute is just as good (maybe better) though, so long as your DeviceClass includes the ability to filter out these types of devices from actually being allocated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A smaller API is better, so I prefer the approach with the attribute. We just need to explain it because it wouldn't be obvious to driver authors how to do this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is related to @thockin's comment here:
#4874 (comment)

@klueska klueska force-pushed the dra-partitonable-devices branch from 7ec6e85 to 841efd5 Compare October 1, 2024 08:22
@klueska klueska force-pushed the dra-partitonable-devices branch 3 times, most recently from a60a3e2 to b9f4313 Compare October 1, 2024 17:55
@klueska klueska force-pushed the dra-partitonable-devices branch from 0d59a82 to 4c41571 Compare October 2, 2024 11:48
@klueska klueska force-pushed the dra-partitonable-devices branch from 45cd82a to 8a6096e Compare October 9, 2024 12:42
@mrunalp
Copy link
Contributor

mrunalp commented Oct 9, 2024

/approve
Leaving finalizing the shape of API and lgtm for API/scheduling.

@klueska klueska force-pushed the dra-partitonable-devices branch 2 times, most recently from 16dc163 to 9ed882e Compare October 9, 2024 13:36
@johnbelamaric
Copy link
Member

/approve

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 9, 2024
@klueska klueska force-pushed the dra-partitonable-devices branch from 9ed882e to 6ed63f0 Compare October 9, 2024 15:41
Copy link
Member

@alculquicondor alculquicondor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve
from SIG Scheduling

My only concern is from the observability PoV. The question "why did my pod not schedule" is now significantly harder to answer.
Some information about the usage of a ResourceSlice could be helpful, akin to the output of kubectl describe node:

Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests         Limits
  --------           --------         ------
  cpu                284m (30%)       2045m (217%)
  memory             539316480 (18%)  4007175680 (136%)
  ephemeral-storage  0 (0%)           0 (0%)
  hugepages-1Gi      0 (0%)           0 (0%)
  hugepages-2Mi      0 (0%)           0 (0%)

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, johnbelamaric, klueska, mrunalp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mrunalp
Copy link
Contributor

mrunalp commented Oct 9, 2024

Some information about the usage of a ResourceSlice could be helpful, akin to the output of kubectl describe node:

Maybe some summary with capacity per device and consumed / available will be useful 👍

@klueska
Copy link
Contributor Author

klueska commented Oct 9, 2024

Agreed, though that is out of scope for this KEP

@kannon92
Copy link
Contributor

kannon92 commented Oct 9, 2024

From what I understand, this information would be available via a ResourceClaim. I don't think we expect users to look into ResourceSlices for scheduling decisions.

@alculquicondor
Copy link
Member

Agreed, though that is out of scope for this KEP

Not necessarily. It could be left for beta, though.

Copy link
Contributor

@pohly pohly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly just some nits, almost done.

keps/sig-node/4815-dra-partitionable-devices/README.md Outdated Show resolved Hide resolved
keps/sig-node/4815-dra-partitionable-devices/README.md Outdated Show resolved Hide resolved
keps/sig-node/4815-dra-partitionable-devices/README.md Outdated Show resolved Hide resolved
//
// +optional
// +oneOf=deviceMixinType
Composite *CompositeDeviceMixin `json:"composite,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What other mixins do we expect?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure -- @johnbelamaric had the same question. I'm fine removing this level of indirection if that is the consensus.

It just seemed symmetrical with how we treated devices, where a CompositeDevice can only pull mixins of type CompositeDeviceMixin. In the future, AdvancedDevice would only be allowed to pull mixins of type AdvancedDeviceMixin (which may actually be identical to CompositeDeviceMixin, but doesn't have to be).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true. Let's keep it. As it's not really a user-facing API (not written by a user, only read for debugging) the extra indirection doesn't hurt that much.

/lgtm

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also skeptical that we need to extend here, we can look at it in code review

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not opposed to making it flat. It makes it asymmetrical with what we did for devices, but maybe that's OK.

keps/sig-node/4815-dra-partitionable-devices/README.md Outdated Show resolved Hide resolved
keps/sig-node/4815-dra-partitionable-devices/README.md Outdated Show resolved Hide resolved
keps/sig-node/4815-dra-partitionable-devices/README.md Outdated Show resolved Hide resolved
keps/sig-node/4815-dra-partitionable-devices/README.md Outdated Show resolved Hide resolved
keps/sig-node/4815-dra-partitionable-devices/README.md Outdated Show resolved Hide resolved
@klueska klueska force-pushed the dra-partitonable-devices branch from 6ed63f0 to 53e1584 Compare October 10, 2024 11:27
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 10, 2024
@k8s-ci-robot k8s-ci-robot merged commit 18c41ea into kubernetes:master Oct 10, 2024
4 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.32 milestone Oct 10, 2024
Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

John pointed out that some of my comments were made elsewhere, so I apologize - the thread is super long and I thought it made sense to send these sooner rather than later.

I don't hate this. I actualy find the ideas rather elegant, but there are some things that nag at me, like some semi-unstated property that this asserts which may not be true, but I can't actively disprove.

It assumes that every resource is allocated hierarchically and that every level of that hierarchy is itself allocatable. Again, I can't find an example where it's not true, but it feels like an example MUST exist.

I am OK to pursue this KEP, so the fact that it merged already is OK.

matchAttribute: "gpu.nvidia.com/parentUUID"
```

This would result in the following set of non-overlapping partitions:
Copy link
Member

@thockin thockin Oct 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case is interesting - it highlights that either users need to be careful to specify things or the specific-device selection logic needs to be smart about seemingly-equivalent choices.

The table below works, but it could just as easily have chosen:

+------------+------------+------------+------------+------------+------------+------------+------------+
| MemSlice 0 | MemSlice 1 | MemSlice 2 | MemSlice 3 | MemSlice 4 | Memslice 5 | MemSlice 6 | MemSlice 7 |
+------------+------------+------------+------------+------------+------------+------------+------------+
|                      3g.20gb                      |                         X                         |
+------------+------------+------------+------------+------------+------------+------------+------------+
|            X            |            X            |         2g.10gb         |            X            |
+------------+------------+------------+------------+------------+------------+------------+------------+
|      X     |      X     |     X      |     X      |     X      |     X      |   1g.5gb   |    XXX     |   <--- Bad!  The last 1g.5gb can't be allocated
+------------+------------+------------+------------+------------+------------+------------+------------+

Right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scheduler might attempt to do this, but in doing so it would not be able to allocate all requests. I would then need to backtrack and try a different combination. Specific logic for doing this with MIG devices can be seen here: https://github.com/NVIDIA/mig-parted/blob/main/pkg/mig/config/config.go#L125. We would need to generalize this for the abstractions being introduced here.

//
// +optional
// +oneOf=deviceMixinType
Composite *CompositeDeviceMixin `json:"composite,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also skeptical that we need to extend here, we can look at it in code review

//
// +optional
// +oneOf=deviceType
Composite *CompositeDevice `json:"composite,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For later: I am not in love with "compsite" here - it sounds like "compound" which we used (informally) to describe a different idea.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I originally called it PartitionableDevice but that put the emphasis on the the consumeCapacityFrom field and ignored the new mixin field.

I'd love it if we could just add both of these new fields to the existing BasicDevice type (since they are strictly extensions), but as @pohly keeps pointing out (because I keep forgetting) we can't extend the API in this way because it breaks downgrades. An old scheduler will not recognize the new fields and attempt to allocate a device based solely on the fields that it DOES know about. For mixins / attributes this is not too bad because selection would simply fail if the scheduler was not able to see the additional attributes that a mixin provides. Not honoring consumesCapacityFrom properly would be unacceptable though.

//
// The propertes of each included mixin are applied to this device in
// order. Conflicting properties from multiple mixins are taken from the
// last mixin listed that contains them.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we validate that they don't collide? Or is there a useful reason to allow it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we restrict mixins to th scope of a ResourceSlice rather than the scope of an entire pool(which may span multiple resource slices, they yes, I think we can do this validation. In conversations with @pohly I think we want to make this restriction anyway to make validation easier in general. It means that some mixins may be duplicated across multiple resource slices, but I think that is probably OK for the sake of validation.

// The maximum number of attributes and capacities combined is 32.
//
// +optional
Capacity map[QualifiedName]DeviceCapacity `json:"capacity,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to put the ConsumesCapacityFrom inside DeviceCapacity? or in a parallel map?

quantity: 40Gi
devices:
- name: gpu-0
composite:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we don't really even need the composite indirection? What if we just extended basic ? As we move beyond alpha, we can consider whether we even REALLY need basic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this goes back to the comment above...

I'd love it if we could just add both of these new fields to the existing BasicDevice type (since they are strictly extensions), but as @pohly keeps pointing out (because I keep forgetting) we can't extend the API in this way because it breaks downgrades. An old scheduler will not recognize the new fields and attempt to allocate a device based solely on the fields that it DOES know about. For mixins / attributes this is not too bad because selection would simply fail if the scheduler was not able to see the additional attributes that a mixin provides. Not honoring consumesCapacityFrom properly would be unacceptable though.

string: GPU-662077db-fa3f-0d8f-9502-21ab0ef058a2
```

As you can see, three mixins are created called "system-attributes",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point we talked about per-slice attribs thsat would be "inherited". This is similar but more explicit (both good and bad aspects). If we know EVERY device will have common things, my name them in each Device? It's sort of noisy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to see an example where mixins were not identical in every device.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the real-world MIG use case (later in the KEP), only one mixin (with 3 attributes) is shared by all of the devices in the slice. The vast majority of mixins are only referenced by a subset of devices.

In this example, five devices are defined: a full GPU called "gpu-0" and four
partitions of "gpu-0" called "gpu-0-partion-0", "gpu-0-partion-1",
"gpu-0-partion-2", and "gpu-0-partion-3" respectively. The full GPU has 40Gi of
memory available, and each of the partitions pull 10Gi from it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we plan to validate that the sum of 1st level children is less than the parent? E.g. Could I define gpu0 with 40Gi and 4 children each with 20Gi ? That could be an error (e.g. cut-paste) or it could be that they have some resources "dedicated" to them that are not in the parent or it could some sort of fungible pool?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the children declare a capacity "foobar" which is not named in the parent? Is that an error?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a need for top-level resources from which children can consume but are not themselves allocatable? e.g. This card has 8 frobnicators and each child device needs 1 or 2, but there's no allocatable device with 8?

Copy link
Contributor Author

@klueska klueska Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remember -- no one is defining these resource slices by hand (the driver publishes these), so the likelihood of a cut-paste-error is minimal. I don't think its worth trying to define semantics that would enable us to recognize that "gpu0 with 40Gi and 4 children each with 20Gi" is an invalid configuration. Driver implementors should be responsible for this themselves.

What if the children declare a capacity "foobar" which is not named in the parent? Is that an error?

This would be hard to validate if the parent and child are in different resource slices. I think we can (and should) do our best to validate it in cases where they are in the same slice, but not error out if they are not.

Is there a need for top-level resources from which children can consume but are not themselves allocatable? e.g. This card has 8 frobnicators and each child device needs 1 or 2, but there's no allocatable device with 8?

Yes, there are use cases for this. In fact, this is the exact use case @byako pointed out for SR-IOV VFs. You can achieve this in the current design by defining your DeviceClass such that the top level device is never selectable, but its a bit back-handed since this is not encoded anywhere in the ResourceSlice API itself.

We toyed with the idea of adding an extra field to a device to accommodate this case (e.g. abstract: true or allocatable: false), but never made it part of the design. I think the conclusion was that we could rely on DeviceClasses for now and think more deeply about this later.

Happy to continue discussing during implementation though.

quantity: "1"
memory:
quantity: 40Gi
memorySlice0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love this aspect. At some point we talked about "named mutexes" or something, so this would looke more like:

    attributes:
      ...

    capacity:
      memory:
        quantity: 40Gi

    consumes:
      - memorySlice1

I guess it's somewhat minor -- it has to be listed SOMEWHERE. It just feels
weird to be in capacity, as if someone would want to write a claim against
it...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed -- but this is where we are at the moment with the design.

I had also talked about introducing an intRange abstraction at one point so that one could directly write:

consumes:
  memorySlices: "0-3"

but that had it's own problems.

string: NVIDIA A100-SXM4-40GB
type:
string: mig
- name: memory-slices-0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The next ~100 lines of YAML really suggest to me that we're doing somethign wrong

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of this would be unnecessary if we had an intRange or a set abstraction to work with. It only looks as nasty as it does because we need some way of encoding an explicit "set" of resources, but all we have at our disposable is the ability to define a "quantity" of resources.

For example, this mixin:

- name: memory-slices-0-3
  composite:
    capacity:
      memorySlice0:
        quantity: "1"
      memorySlice1:
        quantity: "1"
      memorySlice2:
        quantity: "1"
      memorySlice3:
        quantity: "1"

would turn into a specific capacity entry on the device itself of something like:

capacity:
  memorySlices: [0,1,2,3] // if we supported sets

or

capacity:
  memorySlices: "0-3" // if we supported intRanges

There would be no need for this long list of mixins defining valid memorySlice combinations at all.

Given that this is what we have to work with for now, I think it's acceptable, but we should look into writing a future KEP that introduces set and/or intRange as a usable abstraction here.

@johnbelamaric
Copy link
Member

Thanks Tim. @klueska, @catblade I read Tim's input as saying he is OK with abstract pools :). And figuring out how to define local capacities above and beyond pooled capacities.

One other thing he and I discussed is the idea of adding back in one level of inheritance. I didn't like the original proposal because the complex tree was hard to parse in your head. But we probably could allow the user to define the mixins, and ALSO then to use those mixins to define shapes. Both of those are abstract, just like today. A shape is just a named collection of mixins. But it's not a tree. Then, an individual device has a shape, plus locally defined attributes and capacities (no mixins directly in the devices, I think?).

Our device list then becomes more compact, you don't have to repeat the mixins over and over. But it's just one level deep, not N levels deep. WDYT?

@thockin
Copy link
Member

thockin commented Oct 11, 2024

I read Tim's input as saying he is OK with abstract pools

I would want to see what exactly that means before signing off on it :)

One other thing he and I discussed is the idea of adding back in one level of inheritance

What I meant was more like this:

  • Each Slice can define attributes - all devices in the slice implicitly get those attributes
  • Each slice can define mixins - some devices in the slice explicitly get those attributes via includes
  • Each device can define attributes unique to that device

@johnbelamaric
Copy link
Member

I think a "shape" concept is useful, it avoids the repeated includes for every device of the same shape.

@thockin
Copy link
Member

thockin commented Oct 11, 2024

Too many levels of indirection will make things harder, not easier. I am somewhat skeptical that we need need both shapes and mixins

@catblade
Copy link

I'm thinking of this in terms of set notation. Trying to sort out what that looks like. IE item A is part of set C, but set C also includes D, E, and F. I want to be able to ask for a C, but I also want to be able to ask for an A if I absolutely need an A. I'll look at this later next week and see if I can make sense of it. What number of levels of abstraction can I get?

@klueska
Copy link
Contributor Author

klueska commented Oct 24, 2024

@thockin

What I meant was more like this:

  • Each Slice can define attributes - all devices in the slice implicitly get those attributes
  • Each slice can define mixins - some devices in the slice explicitly get those attributes via includes
  • Each device can define attributes unique to that device

In my experience there are very few attributes that actually get referenced by all devices, so I' don't think introducing a "top-level" attributes concept is worth it. The other bullet points are already supported in the current design.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/node Categorizes an issue or PR as relevant to SIG Node. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.