Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Main/backup: changing switching rules #1857

Merged

Conversation

maxsharabayko
Copy link
Collaborator

@maxsharabayko maxsharabayko commented Mar 11, 2021

Changing main backup switching logic (WIP).

Changes

  • Member sockets being in a Fresh-Activated state are not used to silence other active links until they become stable. Fixes Do not silence parallel links if there are no stable links #1793.
  • sendBackup_RetryWaitBlocked was not waiting for active links, but was waiting to activate backup links. It conflicts with the logic of the sending function, as all backup activation is handled previously in sendBackup_TryActivateIdleLink. Now waits for active links. Concern: if it happens (should not happen) that there is one idle (stand by) link in the group that was not activated previously, it will continuously report read-readiness, resulting in a CPU-intensive write-readiness checking loop.
  • Moved some logic into a newly created sendBackup_SendOverActive function.
  • Unstable link may become stable again if it continuously gets a response from the receiver (not on a single response).
  • An unstable link that remains unstable for quite some time can be closed if there is at least one stable link in a group (?).
  • If a backup link was activated and qualified as stable, and it has the same or a higher weight as the main link (now unstable), silence the unstable link without closing it (perform a switch).

@maxsharabayko maxsharabayko added Type: Enhancement Indicates new feature requests [core] Area: Changes in SRT library core labels Mar 11, 2021
@maxsharabayko maxsharabayko added this to the v1.4.4 milestone Mar 11, 2021
@maxsharabayko maxsharabayko self-assigned this Mar 11, 2021
@lgtm-com
Copy link
Contributor

lgtm-com bot commented Mar 11, 2021

This pull request introduces 1 alert when merging 2bdb324 into a1bcd4a - view on LGTM.com

new alerts:

  • 1 for Declaration hides parameter

@maxsharabayko maxsharabayko force-pushed the develop/main-backup-switch branch from 89cb4f5 to 55736ca Compare March 29, 2021 11:29
@maxsharabayko
Copy link
Collaborator Author

Additional metrics checked

  • Sender packet drops – not very persistent and receiver is likely to drop first.
  • Sender-side pkt loss (based on NAK reports) percentage. Also varying. Hard to tell which level to use.
  • Packet drop reported by the receiver in extended ACK (PR [core] WIP: Extending ACK with RCV drop total #1889).
    • Probing for stability for n x latency.
    • Easy to implement.
    • Can’t tell which packets were dropped – no drop stats for the whole group.

Experiment

  • RTT 100ms, SRT latency 300ms.
  • 30% loss on path weight 10.
  • 4xlatency for probing period (to transition from unstable -> stable).
  • Closing a member socket which remains unstable for SRTO_PEERIDLETMO
  • New receiver drop total in ACK -> qualify as unstable.

image

Problem

  • Breaking an unstable link from the group does not invoke a connection callback. This logic needs to be somehow implemented, and it is hitting the group-member synchronization difficulties (mutex lock order).

@maxsharabayko
Copy link
Collaborator Author

Member ordering rule by priority

  1. Higher weight (highest priority)
  2. By backup state (if equal weight):
    1. ACTIVE_STABLE
    2. ACTIVE_FRESH
    3. ACTIVE_UNSTABLE_WARY
    4. ACTIVE_UNSTABLE
    5. BROKEN
    6. STANDBY (IDLE)
    7. PENDING
  3. By Socket ID (if equal weight and state): lower value first.

maxsharabayko and others added 3 commits May 5, 2021 16:19
Adding sendBackup_sendOverActive

Still working on sendBackup_SendOverActive

sendBackup_RetryWaitBlocked now waits for active links only

Fixed parameter hiding

Adding SendBackupCtx and updating active member states

Passing SendBackupCtx to sendBackup_SendOverActive.
TODO: iterate over active members

sendBackup_SendOverActive now using SendBackupCtx

WIP. Partially removed old lists of members

Proceeding with the transition.
Added sendBackup_CheckUnstableSockets.
Modified sendBackup_CloseBrokenSockets, etc.

Updated sendBackup_SilenceRedundantLinks

Testable version
Finished RetryWaitBlocked and silenceRedundantLinks

Fixed some mistakes

Added stability trace and fixed some issues

Fixed AssignBackupState, removed some unused funcs

Now using SockerData* instead of gli_t iterator

Unstable if sndDrop or loss above 5 pct

Improved EWMA loss percent (placed in ACK and NAK)

Experimental: receiver drop reports

snd rcv drop

Fixed ACK rcv drop

Wary probing for 2 x latency

Ignore drops in activation

Don't silence unstable if no stable members

Improved member ordering: compare status value

Removed extra stability metrics.
RCV DROP in ACK, loss percent EWMA.

Removed some unused functions
Removed unused sendBackup_CheckRunningLinkStable

New connection error: broken as unstable
@maxsharabayko maxsharabayko marked this pull request as ready for review May 10, 2021 12:27
@maxsharabayko maxsharabayko merged commit 2559797 into Haivision:master May 10, 2021
@maxsharabayko maxsharabayko deleted the develop/main-backup-switch branch May 10, 2021 13:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[core] Area: Changes in SRT library core Type: Enhancement Indicates new feature requests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Do not silence parallel links if there are no stable links
1 participant