Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure that promote_secondaries is set on IPAssigner interfaces #6898

Merged

Conversation

antoninbas
Copy link
Contributor

The IPAssigner should ensure that the promote_secondaries sysctl variable is always set when creating an interface. This ensures that When the primary IP address is removed from the interface, a secondary IP address is promoted, instead of removing all the corresponding secondary IP addresses. Otherwise, the deletion of one IP address can trigger the automatic removal of all other IP addresses in the same subnet, if the deleted IP happens to be the primary (first one assigned chronologically). For example, this can affect the Egress feature (when EgressSeparateSubnet is used).

Fixes #6890

@antoninbas antoninbas added kind/bug Categorizes issue or PR as related to a bug. area/transit/egress Issues or PRs related to Egress (SNAT for traffic egressing the cluster). labels Jan 3, 2025
The IPAssigner should ensure that the promote_secondaries sysctl
variable is always set when creating an interface. This ensures that
When the primary IP address is removed from the interface, a secondary
IP address is promoted, instead of removing all the corresponding
secondary IP addresses. Otherwise, the deletion of one IP address can
trigger the automatic removal of all other IP addresses in the same
subnet, if the deleted IP happens to be the primary (first one assigned
chronologically). For example, this can affect the Egress feature (when
EgressSeparateSubnet is used).

Fixes antrea-io#6890

Signed-off-by: Antonin Bas <[email protected]>
@antoninbas antoninbas force-pushed the set-promote_secondaries-in-ip-assigner branch from 472aa34 to c46177f Compare January 3, 2025 21:28
// instead of removing all the corresponding secondary IP addresses. Otherwise, the deletion of one IP address
// can trigger the automatic removal of all other IP addresses in the same subnet, if the deleted IP happens to
// be the primary (first one assigned chronologically).
if err := util.EnsurePromoteSecondariesOnInterface(name); err != nil {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tnqn I added this here to match the call to EnsureRPFilterOnInterface, but it feels like we could move these calls to addVLANAssignee, so that they are called when the antrea-agent restarts. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it makes more sense to move them to addVLANAssignee, as there is no guarantee EnsureRPFilterOnInterface has executed once when the vlan interface is found.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will move both calls together in a follow-up PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up PR: #6900

// instead of removing all the corresponding secondary IP addresses. Otherwise, the deletion of one IP address
// can trigger the automatic removal of all other IP addresses in the same subnet, if the deleted IP happens to
// be the primary (first one assigned chronologically).
if err := util.EnsurePromoteSecondariesOnInterface(deviceName); err != nil {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it makes sense to set this on the dummy interface IMO, but this is not likely to affect any of our use cases

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, is it because the IPs on the dummy interface have /32 mask, hence not in the same subnet?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was my thinking, but I have since realized that it is possible to create an ExternalIPPool with a subnetInfo and without a VLAN ID. In this case, the IP will be addressed to the dummy interface (antrea-egress0) with the prefix length specified in the subnetInfo.
For example:

apiVersion: crd.antrea.io/v1beta1
kind: ExternalIPPool
metadata:
  name: eip
spec:
  ipRanges:
  - start: 10.10.0.2
    end: 10.10.0.10
  subnetInfo:
    gateway: 10.10.0.1
    prefixLength: 24
  nodeSelector: {}

After creating a couple of Egresses, the following can be observed on the Node:

5: antrea-egress0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
    link/ether 6e:44:5f:08:02:e6 brd ff:ff:ff:ff:ff:ff
    inet 10.10.0.2/24 brd 10.10.0.255 scope global antrea-egress0
       valid_lft forever preferred_lft forever
    inet 10.10.0.4/24 brd 10.10.0.255 scope global secondary antrea-egress0
       valid_lft forever preferred_lft forever

Prior to the EgressSeparateSubnet feature, all assigned IPs would be /32, so Egress IPs would never be "secondary" aliases and the problem did not exist.

@antoninbas antoninbas requested review from tnqn and xliuxu January 3, 2025 21:28
@antoninbas antoninbas added kind/cherry-pick Categorizes issue or PR as related to the cherry-pick of a bug fix from the main branch to a release action/release-note Indicates a PR that should be included in release notes. action/backport Indicates a PR that requires backports. and removed kind/cherry-pick Categorizes issue or PR as related to the cherry-pick of a bug fix from the main branch to a release labels Jan 3, 2025
Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for fixing it.

// instead of removing all the corresponding secondary IP addresses. Otherwise, the deletion of one IP address
// can trigger the automatic removal of all other IP addresses in the same subnet, if the deleted IP happens to
// be the primary (first one assigned chronologically).
if err := util.EnsurePromoteSecondariesOnInterface(deviceName); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, is it because the IPs on the dummy interface have /32 mask, hence not in the same subnet?

// instead of removing all the corresponding secondary IP addresses. Otherwise, the deletion of one IP address
// can trigger the automatic removal of all other IP addresses in the same subnet, if the deleted IP happens to
// be the primary (first one assigned chronologically).
if err := util.EnsurePromoteSecondariesOnInterface(name); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it makes more sense to move them to addVLANAssignee, as there is no guarantee EnsureRPFilterOnInterface has executed once when the vlan interface is found.

@antoninbas
Copy link
Contributor Author

/test-all

@antoninbas antoninbas merged commit 1ca6f6c into antrea-io:main Jan 6, 2025
59 of 65 checks passed
@antoninbas antoninbas deleted the set-promote_secondaries-in-ip-assigner branch January 6, 2025 20:41
antoninbas added a commit to antoninbas/antrea that referenced this pull request Jan 8, 2025
…ea-io#6898)

The IPAssigner should ensure that the promote_secondaries sysctl
variable is always set when creating an interface. This ensures that
When the primary IP address is removed from the interface, a secondary
IP address is promoted, instead of removing all the corresponding
secondary IP addresses. Otherwise, the deletion of one IP address can
trigger the automatic removal of all other IP addresses in the same
subnet, if the deleted IP happens to be the primary (first one assigned
chronologically). For example, this can affect the Egress feature (when
EgressSeparateSubnet is used).

Fixes antrea-io#6890

Signed-off-by: Antonin Bas <[email protected]>
antoninbas added a commit to antoninbas/antrea that referenced this pull request Jan 8, 2025
…ea-io#6898)

The IPAssigner should ensure that the promote_secondaries sysctl
variable is always set when creating an interface. This ensures that
When the primary IP address is removed from the interface, a secondary
IP address is promoted, instead of removing all the corresponding
secondary IP addresses. Otherwise, the deletion of one IP address can
trigger the automatic removal of all other IP addresses in the same
subnet, if the deleted IP happens to be the primary (first one assigned
chronologically). For example, this can affect the Egress feature (when
EgressSeparateSubnet is used).

Fixes antrea-io#6890

Signed-off-by: Antonin Bas <[email protected]>
antoninbas added a commit to antoninbas/antrea that referenced this pull request Jan 8, 2025
…ea-io#6898)

The IPAssigner should ensure that the promote_secondaries sysctl
variable is always set when creating an interface. This ensures that
When the primary IP address is removed from the interface, a secondary
IP address is promoted, instead of removing all the corresponding
secondary IP addresses. Otherwise, the deletion of one IP address can
trigger the automatic removal of all other IP addresses in the same
subnet, if the deleted IP happens to be the primary (first one assigned
chronologically). For example, this can affect the Egress feature (when
EgressSeparateSubnet is used).

Fixes antrea-io#6890

Signed-off-by: Antonin Bas <[email protected]>
antoninbas added a commit that referenced this pull request Jan 10, 2025
The IPAssigner should ensure that the promote_secondaries sysctl
variable is always set when creating an interface. This ensures that
When the primary IP address is removed from the interface, a secondary
IP address is promoted, instead of removing all the corresponding
secondary IP addresses. Otherwise, the deletion of one IP address can
trigger the automatic removal of all other IP addresses in the same
subnet, if the deleted IP happens to be the primary (first one assigned
chronologically). For example, this can affect the Egress feature (when
EgressSeparateSubnet is used).

Fixes #6890

Signed-off-by: Antonin Bas <[email protected]>
antoninbas added a commit that referenced this pull request Jan 10, 2025
The IPAssigner should ensure that the promote_secondaries sysctl
variable is always set when creating an interface. This ensures that
When the primary IP address is removed from the interface, a secondary
IP address is promoted, instead of removing all the corresponding
secondary IP addresses. Otherwise, the deletion of one IP address can
trigger the automatic removal of all other IP addresses in the same
subnet, if the deleted IP happens to be the primary (first one assigned
chronologically). For example, this can affect the Egress feature (when
EgressSeparateSubnet is used).

Fixes #6890

Signed-off-by: Antonin Bas <[email protected]>
antoninbas added a commit that referenced this pull request Jan 10, 2025
The IPAssigner should ensure that the promote_secondaries sysctl
variable is always set when creating an interface. This ensures that
When the primary IP address is removed from the interface, a secondary
IP address is promoted, instead of removing all the corresponding
secondary IP addresses. Otherwise, the deletion of one IP address can
trigger the automatic removal of all other IP addresses in the same
subnet, if the deleted IP happens to be the primary (first one assigned
chronologically). For example, this can affect the Egress feature (when
EgressSeparateSubnet is used).

Fixes #6890

Signed-off-by: Antonin Bas <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
action/backport Indicates a PR that requires backports. action/release-note Indicates a PR that should be included in release notes. area/transit/egress Issues or PRs related to Egress (SNAT for traffic egressing the cluster). kind/bug Categorizes issue or PR as related to a bug.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flaky integration test case "TestIPAssigner"
2 participants