[inst-xhr/fetch] Network events incorrectly dropped #5314
Labels
bug
Something isn't working
pkg:instrumentation-fetch
pkg:instrumentation-xml-http-request
priority:p2
Bugs and spec inconsistencies which cause telemetry to be incomplete or incorrect
What happened?
Steps to Reproduce
fetch()
request that is fulfilled by the service workerExpected Result
The instrumentation produces a span with the usual network events. At present, that would be these 8 events:
fetchStart
domainLookupStart
domainLookupEnd
connectStart
connectEnd
requestStart
responseStart
responseEnd
(I would advocate that
workerStart
, if present, should be included as well, but that's a different issue)Actual Result
The instrumentation produces a span with these 7 events:
fetchStart
domainLookupStart
domainLookupEnd
connectStart
connectEnd
responseStart
responseEnd
Note that
requestStart
is missing.Additional Details
This is a direct result of #4486, which is intended to be a fix for #4478:
The wording implies/assumes that:
fetchStart
can be safely assumed to be the "time origin" of the resource timings(1) is definitely a bug worth fixing, but (2) didn't turn out to be correct. As shown in the example above, when a worker is involved,
requestStart
===workerStart
<fetchStart
(in my version of Chrome at least – but IMO any instances of real world occurrences is sufficient to demonstrate the problem with the current code). Thus, therequestStart
network event ends up getting dropped even though it is perfectly valid.I found this issue with service worker, but I think the assumption is fundamentally flawed here, and there may be other cases where this results in the same bug.
If we want to check against a "time origin" value,
startTime
would be the correct one to check.However, we could probably just checking for
timingValue === 0
for the purpose of skipping events. 0 is defined in the resource timing spec as a special default value when the information is not available, so it's not like that's a random coincidence.@dyladan pointed out 0 could be a valid timing for some of these values, so perhaps that's why we didn't go with that approach. However, I'm not sure if that's the case for our special use case here: these values are high-res performance timestamps relative to
performance.timeOrigin
, which is typically the navigation event for the initial page load. It doesn't seem possible for a fetch/xhr event to happen at exactly the same instant that the page began to load, though admittedly I haven't confirmed that with the spec.OpenTelemetry Setup Code
No response
package.json
No response
Relevant log output
No response
Operating System and Version
No response
Runtime and Version
No response
The text was updated successfully, but these errors were encountered: