Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[inst-xhr/fetch] Network events incorrectly dropped #5314

Open
chancancode opened this issue Jan 10, 2025 · 0 comments
Open

[inst-xhr/fetch] Network events incorrectly dropped #5314

chancancode opened this issue Jan 10, 2025 · 0 comments
Labels
bug Something isn't working pkg:instrumentation-fetch pkg:instrumentation-xml-http-request priority:p2 Bugs and spec inconsistencies which cause telemetry to be incomplete or incorrect

Comments

@chancancode
Copy link
Contributor

chancancode commented Jan 10, 2025

What happened?

Steps to Reproduce

  1. Setup the fetch instrumentation
  2. Setup a service worker (such as with conveniences like msw)
  3. Make a fetch() request that is fulfilled by the service worker
  4. Observe that (in Chrome, at least) this request resulted in a ResourceTiming entry similar to this:
{
  "name": "http://localhost:9876/api/status.json",
  "entryType": "resource",
  "startTime": 2171.4000000953674,
  "duration": 2.0999999046325684,
  "initiatorType": "fetch",
  "deliveryType": "cache",
  "nextHopProtocol": "",
  "renderBlockingStatus": "non-blocking",
  "workerStart": 2171.600000143051,
  "redirectStart": 0,
  "redirectEnd": 0,
  "fetchStart": 2171.7000000476837,
  "domainLookupStart": 2171.7000000476837,
  "domainLookupEnd": 2171.7000000476837,
  "connectStart": 2171.7000000476837,
  "secureConnectionStart": 0,
  "connectEnd": 2171.7000000476837,
  "requestStart": 2171.600000143051,
  "responseStart": 2173.4000000953674,
  "firstInterimResponseStart": 2173.4000000953674,
  "responseEnd": 2173.5,
  "transferSize": 0,
  "encodedBodySize": 11,
  "decodedBodySize": 11,
  "responseStatus": 200,
  "serverTiming": []
}

Expected Result

The instrumentation produces a span with the usual network events. At present, that would be these 8 events:

  1. fetchStart
  2. domainLookupStart
  3. domainLookupEnd
  4. connectStart
  5. connectEnd
  6. requestStart
  7. responseStart
  8. responseEnd

(I would advocate that workerStart, if present, should be included as well, but that's a different issue)

Actual Result

The instrumentation produces a span with these 7 events:

  1. fetchStart
  2. domainLookupStart
  3. domainLookupEnd
  4. connectStart
  5. connectEnd
  6. responseStart
  7. responseEnd

Note that requestStart is missing.

Additional Details

This is a direct result of #4486, which is intended to be a fix for #4478:

When browser navigation or resource timing apis don't set a certain timing value or report it as 0 (e.g connectionStart, responseStart,...) the spans generated by instrumentations like like xhr/fetch/resource report a value that is lower than fetchStart ( timeOrigin ) , which can result in wrong metrics by backend.

The wording implies/assumes that:

  1. When certain timing values are unavailable (0), the instrumentation code has a bug that causes nonsensical events to be created
  2. fetchStart can be safely assumed to be the "time origin" of the resource timings

(1) is definitely a bug worth fixing, but (2) didn't turn out to be correct. As shown in the example above, when a worker is involved, requestStart === workerStart < fetchStart (in my version of Chrome at least – but IMO any instances of real world occurrences is sufficient to demonstrate the problem with the current code). Thus, the requestStart network event ends up getting dropped even though it is perfectly valid.

I found this issue with service worker, but I think the assumption is fundamentally flawed here, and there may be other cases where this results in the same bug.

If we want to check against a "time origin" value, startTime would be the correct one to check.

However, we could probably just checking for timingValue === 0 for the purpose of skipping events. 0 is defined in the resource timing spec as a special default value when the information is not available, so it's not like that's a random coincidence.

@dyladan pointed out 0 could be a valid timing for some of these values, so perhaps that's why we didn't go with that approach. However, I'm not sure if that's the case for our special use case here: these values are high-res performance timestamps relative to performance.timeOrigin, which is typically the navigation event for the initial page load. It doesn't seem possible for a fetch/xhr event to happen at exactly the same instant that the page began to load, though admittedly I haven't confirmed that with the spec.

OpenTelemetry Setup Code

No response

package.json

No response

Relevant log output

No response

Operating System and Version

No response

Runtime and Version

No response

@chancancode chancancode added bug Something isn't working triage labels Jan 10, 2025
@pichlermarc pichlermarc added priority:p2 Bugs and spec inconsistencies which cause telemetry to be incomplete or incorrect pkg:instrumentation-fetch pkg:instrumentation-xml-http-request and removed triage labels Jan 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working pkg:instrumentation-fetch pkg:instrumentation-xml-http-request priority:p2 Bugs and spec inconsistencies which cause telemetry to be incomplete or incorrect
Projects
None yet
Development

No branches or pull requests

2 participants