
Improve logging in DeltaScheduler #8909

Closed
vladsud opened this issue Jan 27, 2022 · 4 comments

vladsud commented Jan 27, 2022

Forking from #8908 concrete work:

I see 531K "InboundOpsProcessingTime" events in one day.
So yes, DeltaScheduler does seem to be hit a lot.

One thing I do not like is that we issue the event only if we hit 2K idle states.
It feels like once we set a timer in batchEnd(), we have to report an event that it happened, and ideally include:

  1. How many ops we had already processed by that moment.
  2. How many ops remained in the queue.
  3. Once done (reached idle), how many ops were actually processed (this will differ from #2, as ops keep coming in).

BTW, it seems this.isScheduling can be removed. It's always true when we process ops, and always false when we stop processing ops. Given that we always alternate between these states, there is no reason to check the value: we already know what it is.
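To illustrate the three counts proposed above, here is a minimal sketch. This is not the actual DeltaScheduler API; the class and field names (MiniScheduler, opsProcessedAtBatchEnd, etc.) are invented for illustration, and the timer is modelled synchronously.

```typescript
// Hypothetical sketch of per-turn telemetry: one event per scheduling turn,
// carrying the three counts suggested in the comment above.
interface SchedulingTurnEvent {
  opsProcessedAtBatchEnd: number; // ops processed when the timer was set
  opsRemainingAtBatchEnd: number; // queue length when the timer was set
  opsProcessedAtIdle: number;     // final count; can exceed the sum of the
                                  // two above, since ops keep arriving
}

class MiniScheduler {
  private processed = 0;
  private pending: number[] = [];
  private turnStartProcessed = 0;
  public readonly events: SchedulingTurnEvent[] = [];

  public enqueue(op: number): void {
    this.pending.push(op);
  }

  // Process a slice of ops, then "set a timer" (modelled synchronously here)
  // and record the counts known at this point.
  public batchEnd(sliceSize: number): void {
    for (let i = 0; i < sliceSize && this.pending.length > 0; i++) {
      this.pending.shift();
      this.processed++;
    }
    this.events.push({
      opsProcessedAtBatchEnd: this.processed - this.turnStartProcessed,
      opsRemainingAtBatchEnd: this.pending.length,
      opsProcessedAtIdle: 0, // filled in when we reach idle
    });
  }

  // Reached idle: drain whatever arrived since batchEnd, finalize the event.
  public idle(): void {
    while (this.pending.length > 0) {
      this.pending.shift();
      this.processed++;
    }
    const event = this.events[this.events.length - 1];
    event.opsProcessedAtIdle = this.processed - this.turnStartProcessed;
    this.turnStartProcessed = this.processed;
  }
}
```

In a usage run where 5 ops are queued, batchEnd() processes 3, and 2 more ops arrive before idle, the event reports 3 processed / 2 remaining at batchEnd but 7 processed at idle, which is exactly the discrepancy point 3 is meant to capture.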

@vladsud vladsud added bug Something isn't working perf labels Jan 27, 2022
@vladsud vladsud added this to the February 2022 milestone Jan 27, 2022

vladsud commented Jan 27, 2022

I can't find an obvious bug in the logic, but the data does not match my intuition about how it should work.
Given that the graph mentioned in the parent bug tracks acks of our own ops, there should not be that many ops to process in between (unless we were already holding ops when sending an op). It might be worth having this data (as a boolean) on the fluid:telemetry:OpPerf:OpRoundtripTime event.

Do we test that DeltaScheduler does not pause ops if there are not many of them / they are processed fast?
Glancing at the existing UTs, processOp() always injects a 30ms delay per op, which suggests we only test the opposite.

Can we convert the existing tests to use a mock timer?

Please take a closer look; it would be sad if our latencies are affected not by network / service, but by silly bugs.
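A mock-timer test along the lines asked for above could look like this sketch. It does not use the real DeltaScheduler or its test harness; MockTimer and processOps are hypothetical stand-ins showing the shape of the test: with virtual time, we can assert both that fast ops are never paused and that slow ops are, without real 30ms delays.

```typescript
// Hypothetical mock timer: schedules callbacks against virtual time,
// fired explicitly via tick() instead of real setTimeout.
type TimerCallback = () => void;

class MockTimer {
  private queue: { at: number; cb: TimerCallback }[] = [];
  private now = 0;

  public setTimeout(cb: TimerCallback, delayMs: number): void {
    this.queue.push({ at: this.now + delayMs, cb });
  }

  public pendingCount(): number {
    return this.queue.length;
  }

  // Advance virtual time, firing due callbacks in order.
  public tick(ms: number): void {
    this.now += ms;
    this.queue.sort((a, b) => a.at - b.at);
    while (this.queue.length > 0 && this.queue[0].at <= this.now) {
      this.queue.shift()!.cb();
    }
  }
}

// Hypothetical processing loop: pause (i.e. schedule a resume on the timer)
// only when the virtual cost of the batch exceeds the budget.
function processOps(
  opCount: number,
  opCostMs: number,
  budgetMs: number,
  timer: MockTimer,
): { paused: boolean } {
  let elapsed = 0;
  for (let i = 0; i < opCount; i++) {
    elapsed += opCostMs;
    if (elapsed > budgetMs) {
      timer.setTimeout(() => { /* resume remaining ops here */ }, 1);
      return { paused: true };
    }
  }
  return { paused: false };
}
```

With this shape, the missing test case is one line: processOps with zero (or tiny) per-op cost must report paused === false and leave nothing scheduled on the timer.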


vladsud commented Jan 30, 2022

Please note that #8912 tracks the more generic problem of tracking who pauses the inbound queue and for how long.


vladsud commented Mar 30, 2022

This issue is related to #9505. I'd say this bug is lower priority than removing unneeded pauses in DeltaScheduler.
#8912 should also bring more clarity on how long it takes for ops to get processed / whether they get stuck in the queue due to pausing.


vladsud commented Apr 12, 2022

Closing as fixed by the above-mentioned PR.

@vladsud vladsud closed this as completed Apr 12, 2022