Fails to create VRF-bound BGP instance with different ASN than default, if VRF has an L3VNI #16152

Closed
2 tasks done
toreanderson opened this issue Jun 4, 2024 · 8 comments · Fixed by #16159
Labels
triage Needs further investigation

@toreanderson
Contributor

Description

When trying to create a VRF-bound BGP instance that uses a different ASN than the default BGP instance, FRR will refuse to do so with the error message BGP is already running; AS is X, where X is the ASN of the default BGP instance.

This only happens if the VRF has an L3VNI.

Version

FRRouting 9.1 (xps13) on Linux(6.8.9-300.fc40.x86_64).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
configured with:
    '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--program-prefix=' '--disable-dependency-tracking' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib64' '--libexecdir=/usr/libexec' '--localstatedir=/var' '--runstatedir=/run' '--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--sbindir=/usr/libexec/frr' '--sysconfdir=/etc/frr' '--libdir=/usr/lib64/frr' '--libexecdir=/usr/libexec/frr' '--localstatedir=/run/frr' '--enable-multipath=64' '--enable-vtysh=yes' '--disable-ospfclient' '--disable-ospfapi' '--enable-snmp=agentx' '--enable-user=frr' '--enable-group=frr' '--enable-vty-group=frrvty' '--enable-rtadv' '--disable-exampledir' '--enable-systemd=yes' '--enable-static=no' '--disable-ldpd' '--disable-babeld' '--with-moduledir=/usr/lib64/frr/modules' '--with-crypto=openssl' '--enable-fpm' '--enable-grpc' 'build_alias=x86_64-redhat-linux-gnu' 'host_alias=x86_64-redhat-linux-gnu' 'PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig' 'CC=gcc' 'CXX=g++' 'LT_SYS_LIBRARY_PATH=/usr/lib64:'

How to reproduce

Starting with an unconfigured FRR instance already running, issue the following commands:

ip link add up name vrf100 type vrf table 100
ip link add up name br100 master vrf100 type bridge
ip link add up vni10100 type vxlan id 10100
ip link set vni10100 master br100

vtysh <<EOF
configure

vrf vrf100
 vni 10100
exit-vrf

router bgp 50
 address-family l2vpn evpn
  advertise-all-vni
 exit-address-family
exit

router bgp 100 vrf vrf100
exit
EOF

Expected behavior

The commands should complete without issue.

(Note that the FRR documentation makes it clear that using different ASNs in different VRFs is supposed to work.)

Actual behavior

The VRF-bound BGP instance is not created. The script fails with the following output:

xps13(config)# router bgp 100 vrf vrf100
BGP is already running; AS is 50

Additional context

If the L3VNI is bound to the VRF after the BGP instance is created, it works. In other words, after changing the script as follows, the configuration successfully loads:

configure

router bgp 50
 address-family l2vpn evpn
  advertise-all-vni
 exit-address-family
exit

router bgp 100 vrf vrf100
exit

vrf vrf100
 vni 10100
exit-vrf

However, if this configuration is made persistent with write, the vrf section is located above the router bgp sections in the generated configuration file, causing it to not load correctly when FRR (re)starts.
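
The ordering dependence described above can be checked mechanically. The following is a minimal sketch, not part of FRR: `find_risky_vrfs` is a hypothetical helper that scans configuration text and reports VRFs whose `vni` line appears before a `router bgp ASN vrf NAME` line using an ASN different from the default instance. It is a heuristic that assumes plain integer ASNs and the block layout shown in this report, and it does not check whether `advertise-all-vni` is configured.

```python
import re


def find_risky_vrfs(config_text):
    """Return VRF names that match the ordering reported in this issue.

    A VRF is reported when its 'vrf NAME' block contains a 'vni' line that
    appears before a 'router bgp ASN vrf NAME' line whose ASN differs from
    the ASN of the default BGP instance.
    """
    default_asn = None
    current_vrf = None
    vni_line = {}   # VRF name -> index of the first 'vni' line in its block
    vrf_bgp = []    # (line index, ASN, VRF name) for VRF-bound instances

    for idx, raw in enumerate(config_text.splitlines()):
        line = raw.strip()

        m = re.fullmatch(r"vrf (\S+)", line)
        if m:
            current_vrf = m.group(1)
            continue
        if line == "exit-vrf":
            current_vrf = None
            continue
        if current_vrf is not None and re.match(r"vni \d+", line):
            vni_line.setdefault(current_vrf, idx)
            continue

        m = re.fullmatch(r"router bgp (\d+)(?: vrf (\S+))?", line)
        if m:
            asn, vrf = int(m.group(1)), m.group(2)
            if vrf is None:
                if default_asn is None:
                    default_asn = asn
            else:
                vrf_bgp.append((idx, asn, vrf))

    return [
        vrf
        for idx, asn, vrf in vrf_bgp
        if default_asn is not None
        and asn != default_asn
        and vrf in vni_line
        and vni_line[vrf] < idx
    ]
```

Run against the reproduction script above it reports `vrf100`; run against the reordered script it reports nothing, which matches the behavior described in this report.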

Checklist

  • I have searched the open issues for this bug.
  • I have not included sensitive information in this report.
@toreanderson toreanderson added the triage Needs further investigation label Jun 4, 2024
@toreanderson
Contributor Author

Related #9537?

It seems like the conditions for that issue are different. In particular, there is no EVPN/L3VNI in that configuration, which is a requirement for the bug to trigger in mine. That said, it could of course be that there is a single root cause at play here that can be triggered in multiple ways.

@toreanderson
Contributor Author

Strange. Just double-checking, is the L3VNI visible to FRR for you prior to the attempted creation of the BGP instance?

xps13# show evpn vni 
VNI        Type VxLAN IF              # MACs   # ARPs   # Remote VTEPs  Tenant VRF                           
10100      L3   vni10100              0        0        n/a             vrf100                               
xps13# configure 
xps13(config)# router bgp 100 vrf vrf100
BGP is already running; AS is 50

Not sure if it matters, but I'm running kernel 6.8.9-300.fc40.x86_64 (Fedora 40).

@ton31337
Member

ton31337 commented Jun 4, 2024

Yes, pardon, master is affected also. Will check what's going on, and let you know.

@ton31337
Member

ton31337 commented Jun 4, 2024

Overall yes, this is technically the same issue as #9537.

TL;DR: When we configure advertise-all-vni (in this case), a new BGP instance is created with the name vrf100 and ASN 50. Next, when we create router bgp 100 vrf vrf100, we look up the BGP instance with the same name and find it, but the ASNs differ (50 vs. 100).
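
The sequence described in this comment can be illustrated with a toy model. This is a sketch only and not FRR's code: `ToyBgpd`, its methods, and the `"default"` instance name are all hypothetical, and the only behavior modeled is the one stated above (an instance named after the VRF is created with the default ASN, and a later lookup by name finds it with a mismatching ASN).

```python
DEFAULT = "default"  # hypothetical name for the default BGP instance


class BgpError(Exception):
    """Raised when a 'router bgp' command conflicts with an existing instance."""


class ToyBgpd:
    """Toy model of the pre-fix behavior described in this thread."""

    def __init__(self):
        self.instances = {}      # instance name -> ASN
        self.l3vni_vrfs = set()  # VRFs that have an L3VNI bound

    def vrf_vni(self, vrf):
        # Models 'vrf NAME' / 'vni N': remember that the VRF has an L3VNI.
        self.l3vni_vrfs.add(vrf)

    def advertise_all_vni(self):
        # Models the side effect described above: each VRF with an L3VNI
        # gets an instance named after it, carrying the default ASN.
        default_asn = self.instances[DEFAULT]
        for vrf in self.l3vni_vrfs:
            self.instances.setdefault(vrf, default_asn)

    def router_bgp(self, asn, vrf=DEFAULT):
        # Lookup by name; an existing instance with another ASN is a conflict.
        existing = self.instances.get(vrf)
        if existing is not None and existing != asn:
            raise BgpError(f"BGP is already running; AS is {existing}")
        self.instances[vrf] = asn
```

Replaying the reproduction steps (`vrf_vni("vrf100")`, `router_bgp(50)`, `advertise_all_vni()`, `router_bgp(100, "vrf100")`) raises the same error text as in the report, while creating the VRF-bound instance before binding the L3VNI does not.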

@ton31337
Member

ton31337 commented Jun 4, 2024

@toreanderson are you able to test a patch (compile)?

@toreanderson
Contributor Author

> @toreanderson are you able to test a patch (compile)?

Assuming the build process is relatively straightforward (or well documented if not), certainly.

ton31337 added a commit to opensourcerouting/frr that referenced this issue Jun 4, 2024
Configuration:

```
vtysh <<EOF
configure

vrf vrf100
 vni 10100
exit-vrf

router bgp 50
 address-family l2vpn evpn
  advertise-all-vni
 exit-address-family
exit

router bgp 100 vrf vrf100
exit
EOF
```

TL;DR: When we configure `advertise-all-vni` (in this case), a new BGP instance
is created with the name vrf100 and ASN 50. Next, when we create
`router bgp 100 vrf vrf100`, we look up the BGP instance with the same name
and find it, but the ASNs differ (50 vs. 100).

Every such auto-created instance is flagged with BGP_VRF_AUTO.

After the fix:

```
router bgp 50
 !
 address-family l2vpn evpn
  advertise-all-vni
 exit-address-family
exit
!
router bgp 100 vrf vrf100
exit
!
end
donatas.net(config)# router bgp 51
BGP is already running; AS is 50
donatas.net(config)# router bgp 50
donatas.net(config-router)# router bgp 101 vrf vrf100
BGP is already running; AS is 100
donatas.net(config)# router bgp 100 vrf vrf100
donatas.net(config-router)#
```

Fixes: FRRouting#16152
Fixes: FRRouting#9537

Signed-off-by: Donatas Abraitis <[email protected]>
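
The post-fix transcript in the commit message above can be mirrored with a small toy model. This is a sketch under assumptions, not the patch itself: the thread only states that auto-created instances are flagged with BGP_VRF_AUTO, so the rule used here (an explicit `router bgp` command may take over an auto-created instance, set its ASN, and clear the flag) is an assumption, and `ToyBgpdFixed` and its methods are hypothetical names.

```python
DEFAULT = "default"  # hypothetical name for the default BGP instance


class BgpError(Exception):
    """Raised when a 'router bgp' command conflicts with an existing instance."""


class ToyBgpdFixed:
    """Toy model of the post-fix behavior shown in the commit message."""

    def __init__(self):
        self.asn = {}            # instance name -> ASN
        self.auto = set()        # auto-created instances (stand-in for BGP_VRF_AUTO)
        self.l3vni_vrfs = set()  # VRFs that have an L3VNI bound

    def vrf_vni(self, vrf):
        self.l3vni_vrfs.add(vrf)

    def advertise_all_vni(self):
        # Auto-create an instance per L3VNI VRF and remember it was automatic.
        for vrf in self.l3vni_vrfs:
            if vrf not in self.asn:
                self.asn[vrf] = self.asn[DEFAULT]
                self.auto.add(vrf)

    def router_bgp(self, asn, vrf=DEFAULT):
        existing = self.asn.get(vrf)
        # Assumption: only explicitly configured instances pin their ASN.
        if existing is not None and existing != asn and vrf not in self.auto:
            raise BgpError(f"BGP is already running; AS is {existing}")
        self.asn[vrf] = asn
        self.auto.discard(vrf)  # the instance is now explicitly configured
```

With this rule, `router_bgp(100, "vrf100")` succeeds after `advertise_all_vni()`, while later attempts with ASN 51 for the default instance or ASN 101 for vrf100 are still refused, as in the transcript.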
@ton31337
Member

ton31337 commented Jun 4, 2024

The patch is here #16159, you could also wait for the artifacts to be compiled and install .deb, .rpm if CI passes of course.

@toreanderson
Contributor Author

> The patch is here #16159, you could also wait for the artifacts to be compiled and install .deb, .rpm if CI passes of course.

Tested build artifacts on Debian 12:

  • frr_10.1-dev-master-ga24c805-20240604.084942-1~deb12u1_amd64.deb
  • frr_10.1-dev-PR16159-g755fea3-20240604.123658-1~deb12u1_amd64.deb

I can reproduce the issue on the former, but not on the latter. LGTM! 👌
