ath79-generic (e.g. WR1043ND v4) - WLAN Mesh broken when upgrading to v2022.1.x because of timing issue in boot process #2779
Comments
This is a v4 that's been active and updated for the last four years: https://hannover.freifunk.net/karte/#/en/map/8416f99bd2d0 With vH31 it is currently running gluon v2022.1.1 (ab1fb05). Meshing does work and can be seen without problems on its status page: Though I certainly might have overlooked something, I think the migration was fine. |
tom reported on IRC that in darmstadt there was a similar issue with a TL-WR1043N v5: |
i just tested the tag v2022.1 and the issue also happens with the initial release of this branch. @AiyionPrime can you be sure the linked device was never reconfigured manually? one probably cannot see this from the available data. i'm currently trying to get my hands on a TL-WR1043N v5 to check if i can reproduce it. it would also help to check it on a second TL-WR1043ND v4 - does anyone have one lying around and can test? |
No, I can not. We can send the owner an email and ask though, if that helps. |
Just some ideas. If they don't apply, then pls bear with me :) |
serious question? well ok: no relevant info there as far as i can see.
that i have already done and also written in my bug report, i added "(fresh install)" now as it seems it hasn't been clear enough. |
ok my first suggestions were very basic. 😑😅 You probably figured this out, but maybe you can replicate what they did in Darmstadt and compare configs before and after saving config mode: |
The only commit that happened after adding the device would be:
Looking back at the old definition:
Maybe I found the issue?
Old address of calibration data:
New definition based on art partition, but art != eeprom(?)
New address of calibration data:
Possible fix? |
the only difference in config i could find is as follows:
looking at the code, there may be an issue with "get_wlan_mac" during the upgrade from 2021.1.x, so the upgrade script returns before setting up the above section. this correlates with what @Djfe found in OpenWrt. can anyone else follow this reasoning? @blocktrron @NeoRaider @adschm ? |
I feel like we should print an error when there is no wmac to be found (Lines 130-131). Such a log could be useful for adding new devices, too. |
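The suggestion above could look roughly like the following. This is only a shell sketch of the idea, not the actual Gluon code (the upgrade script itself is not shown in this thread); the function name `report_missing_mac`, the radio name and the message text are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: instead of returning silently when no WLAN MAC
# address could be determined, leave a trace on stderr.
# $1 = radio name (illustrative), $2 = MAC address found so far (may be empty)
report_missing_mac() {
	radio="$1"
	mac="$2"
	if [ -z "$mac" ]; then
		echo "upgrade: no wmac found for $radio, skipping wireless setup" >&2
		return 1
	fi
	return 0
}
```

A caller would then bail out with something like `report_missing_mac radio0 "$mac" || return`, so the early return still happens but is visible in the log.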
i bought a TL-WR1043N v5 and tested it. |
@Djfe i tested this, it doesn't work and instead soft-bricks the device on upgrade |
after looking a bit into it with the help of rmilecki from openwrt it seems like the issue may not be in the OpenWrt dts ... |
after a discussion in today's Gluon meetup we want to debug the band migration also: |
so after many hours i'm closer to the problem, but without a solution. during the first boot after the upgrade, when the upgrade scripts run, the call to get_htmode in 200-wireless fails and therefore the config update (the lines after it) is never written: the actual path of the phy would be /sys/devices/platform/ahb/18100000.wmac (at least on the TL-WR1043ND v4), and later this path is set correctly. this looks to me like a timing issue during the first boot after the sysupgrade. |
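To check whether a phy is already registered at the moment a script runs, something like the sketch below might help. It assumes the usual sysfs layout where each phy under /sys/class/ieee80211 has a `device` symlink to its platform device; the function name `list_phy_paths` and its optional directory argument are made up here (the argument only exists so the function can be tried against a test directory).

```shell
#!/bin/sh
# Print one line per wireless phy: "<phy name> <resolved device path>".
# $1 = class directory to scan (assumption: defaults to /sys/class/ieee80211)
list_phy_paths() {
	sysfs_root="${1:-/sys/class/ieee80211}"
	for phy in "$sysfs_root"/*; do
		# An unmatched glob stays literal, so skip entries that do not exist.
		[ -e "$phy" ] || continue
		if [ -e "$phy/device" ]; then
			echo "$(basename "$phy") $(readlink -f "$phy/device")"
		else
			echo "$(basename "$phy") (no device link)"
		fi
	done
}
```

If this prints nothing while the upgrade scripts run but lists the wmac path shortly afterwards, that would support the timing theory.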
after talking about it with @NeoRaider on IRC we found out that it may be a timing issue. i verified the theory by adding a 5 second delay in one of gluon's first upgrade scripts here: with that sleep hack, the upgrade works fine! so the already existing hack in OpenWrt does not seem to be sufficient: it would be nice to find a solution that doesn't depend on timing but is deterministic... maybe @NeoRaider comes up with an idea, otherwise we might need to add some seconds of sleep in Gluon |
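One possible refinement of a fixed sleep, sketched here only as an illustration: poll for the device path and give up after a timeout. This is still timing-bounded rather than deterministic, and it is not what the workaround in this thread does; the helper name `wait_for_path` and the default timeout are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: wait until a path exists, polling once per second.
# $1 = path to wait for, $2 = timeout in seconds (assumed default: 5)
# Returns 0 as soon as the path exists, 1 if the timeout elapses first.
wait_for_path() {
	path="$1"
	timeout="${2:-5}"
	waited=0
	while [ ! -e "$path" ]; do
		if [ "$waited" -ge "$timeout" ]; then
			return 1
		fi
		sleep 1
		waited=$((waited + 1))
	done
	return 0
}
```

With the phy path reported for the TL-WR1043ND v4, a call could look like `wait_for_path /sys/devices/platform/ahb/18100000.wmac 10 || echo "wmac did not appear" >&2`. On devices where the path is already present this returns immediately instead of always waiting the full delay.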
…initialisations workaround for a timing issue during first boot on ath79-generic after sysupgrade from ar71xx-generic image GitHub Issue: #2779
i created a pull request for the workaround: #2792. this issue stays open as long as we have no deterministic fix |
wait for device initialisations workaround for a timing issue during first boot on ath79-generic after sysupgrade from ar71xx-generic image GitHub Issue: #2779
removing the issue from the milestones as workarounds have been implemented. |
If I remember correctly, the wifi startup somehow happens asynchronously and you simply cannot depend on it during procd startup. That's why you have to rely on these hotplug.d scripts if you want to configure anything after they have come up. But I might be wrong, it's been a while since I dealt with this stuff. |
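For illustration, an event-driven handler of the kind described above might be sketched like this. It assumes the hotplug mechanism exports `ACTION` and `DEVPATH` to scripts placed under /etc/hotplug.d/ieee80211/; the function name and the echoed message are made up, and a real script would trigger the wireless configuration step instead of printing.

```shell
#!/bin/sh
# Hypothetical body of a script under /etc/hotplug.d/ieee80211/.
# Assumption: ACTION and DEVPATH are provided in the environment.
handle_ieee80211_event() {
	action="$1"
	devpath="$2"
	case "$action" in
		add)
			# The phy exists now, so configuration that depends on it can run.
			echo "phy appeared: $devpath"
			;;
		*)
			# Ignore all other events.
			return 0
			;;
	esac
}

handle_ieee80211_event "${ACTION:-}" "${DEVPATH:-}"
```

The point of this design is that the work runs when the device appears, rather than guessing how long to wait during startup.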
wait for device initialisations workaround for a timing issue during first boot on ath79-generic after sysupgrade from ar71xx-generic image GitHub Issue: freifunk-gluon#2779
should we revert this commit now? |
@Djfe I don't think this issue is specific to the update from 2021.1.x to 2022.1.x; it could easily occur on any upgrade that requires updating the |
@neocturne this could be the fix for our issue as well, no? |
forget the above "fix", because jow wrote on IRC:
maybe someone has an idea how to implement this in order to replace the sleep-Hack |
When upgrading from Gluon v2021.1.x to v2022.1.x, WLAN mesh doesn't work anymore on a TP-Link TL-WR1043ND v4.
The 802.11s mesh interface is shown in "iwinfo" but not on the status page or in "batctl if".
The upgrade process was tested from latest v2021.1.x branch (fresh install) to Gluon v2022.1, v2022.1.1 and v2022.1.2.
The problem does not appear when flashing with "forget settings" and reconfiguring the v2022.1.x firmware from scratch.
The problem does not appear with WR1043ND v2 or v3:
The migration from ar71xx-generic to ath79-generic may have introduced the problem, although @AiyionPrime stated in #2431 that everything was working fine - so maybe the issue was introduced later than v2022.1(.0)?
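The symptom described above (interface visible in "iwinfo" but missing from "batctl if") can be checked on an affected device with a small helper like the one below. This is a sketch: it assumes `batctl if` prints one `<iface>: <state>` entry per line, and the interface name `mesh0` in the usage line is only an example.

```shell
#!/bin/sh
# Succeeds if the given interface appears in `batctl if`-style output
# read from stdin. Assumes one "<iface>: <state>" entry per line.
in_batctl_output() {
	iface="$1"
	grep -q "^${iface}:"
}
```

Usage on the device: `batctl if | in_batctl_output mesh0 || echo "mesh0 is not attached to batman-adv"`.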