ipoe gets completely stuck on ipoe session timeout from one client with connectivity issues #163
Comments
Hello @cygnusb, could you please install accel-ppp with debug information and run it under GDB?
OK, thanks. I was able to reproduce this after some use. It seems four threads end up in a deadlock.
arp.c:73 is the following pthread_mutex_lock call:
accel-pppd/ctrl/ipoe/ipoe.c:1275 is the last line of this code
The system is Alma Linux 8 (a RHEL 8 clone) with kernel 4.18.0-348.7.1.el8_5.x86_64, using pull request 159. In addition, I have set up the secondary system (which has this VLAN with weight=0) on Debian 11 to check whether this is related to the OS.
And thread 3:
I think this has to do with a patch of ours in arp.c that does not release the lock. Sorry to bother you, and thanks for the debugging hints.
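For anyone hitting a similar hang: the kind of bug described above (a patched code path returning without releasing a mutex) can be sketched as follows. This is not the actual arp.c code or our patch; `table_lock`, `handle_packet_buggy`, `handle_packet_fixed` and `lock_is_held` are hypothetical names used only for illustration, and a default (non-recursive) mutex is assumed.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock guarding some shared table; not accel-ppp's real data. */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Buggy shape: the early return leaves table_lock held, so the next
 * thread that calls pthread_mutex_lock() on it blocks forever. */
static int handle_packet_buggy(const char *pkt)
{
    pthread_mutex_lock(&table_lock);
    if (pkt == NULL)
        return -1;               /* BUG: lock is never released */
    /* ... work on the shared table would go here ... */
    pthread_mutex_unlock(&table_lock);
    return 0;
}

/* Fixed shape: every path leaves through a single unlock. */
static int handle_packet_fixed(const char *pkt)
{
    int rc = -1;

    pthread_mutex_lock(&table_lock);
    if (pkt == NULL)
        goto out;
    /* ... work on the shared table would go here ... */
    rc = 0;
out:
    pthread_mutex_unlock(&table_lock);
    return rc;
}

/* Probe helper: reports whether table_lock is currently held.
 * Assumes a non-recursive mutex, where trylock fails with EBUSY. */
static bool lock_is_held(void)
{
    int rc = pthread_mutex_trylock(&table_lock);

    if (rc == 0) {
        pthread_mutex_unlock(&table_lock);
        return false;
    }
    assert(rc == EBUSY);
    return true;
}
```

With the buggy variant, one call on the error path is enough to leave the mutex locked, and every later caller then blocks in `pthread_mutex_lock`, which would match a backtrace where several threads all wait on the same lock.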
We are currently trying to roll out ipoe instead of a pptp setup (with accel-ppp) for our users.
With only a handful of users migrated so far, our ipoe setup sometimes gets completely stuck (mostly within hours). The trigger appears to be one client with connectivity issues whose ipoe session times out because of a missing DHCP Request packet. The setup uses L2 with DHCPv4.
After this, we do not see any more log output and accel-ppp is completely stuck. Calling
accel-cmd show sessions
produces no output and never returns to the terminal. I do see some connectivity issues with this particular client in our switch logs. So, as one can also see from the log, the next DHCP request gets lost due to the client's connectivity outage.
I must say that we have some custom changes in our accel-ppp (which I will submit later as a pull request), but I am quite sure that these small changes are not the cause. They include extracting the DHCP Option 82 suboption Subscriber-ID (including logging that option), adding a Class-B route (Option 249), and setting an ip-pool and l4-redirect-pool in each interface config. The custom build is based on commit 1b8711c from Tue Oct 5 16:15:43 2021.
IPv6 Prefix/DP come from RADIUS. The Acct-Interim-Interval is set by accel-ppp. Attached is our accel-ppp.conf.
I was not really able to reproduce this with test clients by cutting their network connection.
My guess is that this is a timing issue, possibly a race condition. Any input is highly appreciated.
accel-ppp.conf.txt