
Better support spans, metrics and events >1s after Page Load #1554

Open
simonhearne opened this issue Dec 11, 2024 · 1 comment


@simonhearne

Problem

The JavaScript agent currently sends data 1,000ms after the Page Load event, in order to capture resources and metrics which change immediately after the event.

There are cases, however, where this delay is not sufficient. For example, a client-side A/B testing framework may manipulate the DOM after Page Load, producing resources and events (such as Largest Contentful Paint) that occur more than 1,000ms after the transaction has been closed and shipped to APM. Other cases may include slow-loading lazy images and third-party content, or complex operations such as flight availability checks or insurance quotes. Further, JavaScript errors that occur after the transaction ends are not collected.

Taking LCP as an example, Google measures LCP as the last candidate reported before the first user interaction. As a result, Elastic APM can report an earlier candidate than other tools (browser developer tools, web-vitals, CrUX), producing a discrepancy in aggregate LCP values.
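To make the discrepancy concrete, here is a small sketch that picks the reported LCP candidate under two different cutoffs: the agent's "ship 1,000ms after load" point and the first user interaction. The timestamps are illustrative, not measurements, and the candidate objects are a simplified, hypothetical stand-in for `largest-contentful-paint` entries; `lcpAtCutoff` is a hypothetical helper, not part of the agent.

```javascript
// Hypothetical, simplified LCP candidates: startTime is ms since navigation start.
// Real entries would come from a PerformanceObserver observing "largest-contentful-paint".
const candidates = [
  { id: "hero-placeholder", startTime: 1200 },
  { id: "ab-test-banner", startTime: 3500 },   // swapped in by a client-side A/B test
  { id: "post-click-modal", startTime: 9000 }, // only appears after the user clicks
];

// Reported LCP = the last candidate emitted at or before the point collection stops.
function lcpAtCutoff(entries, cutoff) {
  let reported = null;
  for (const entry of entries) {
    if (entry.startTime <= cutoff) reported = entry;
  }
  return reported;
}

const loadEventEnd = 2000;     // illustrative Page Load time
const firstInteraction = 6000; // illustrative first click / keypress

const agentLcp = lcpAtCutoff(candidates, loadEventEnd + 1000); // shipped 1,000ms after load
const vitalsLcp = lcpAtCutoff(candidates, firstInteraction);   // stops at first interaction

console.log(agentLcp.id, vitalsLcp.id); // prints "hero-placeholder ab-test-banner"
```

The post-click candidate is excluded under both cutoffs, since browsers stop reporting LCP candidates once the user interacts; the difference comes entirely from the A/B-test element that lands between the agent's cutoff and the first interaction.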

Proposed Solutions

  1. Statically increase PAGE_LOAD_DELAY (e.g. 5,000ms) - this will make it less likely that spans, events and metrics which occur shortly after Page Load are missed by the JavaScript agent.
  2. Dynamically increase the delay if there are active spans when the PAGE_LOAD_DELAY duration expires, up to a maximum delay.
  3. Increase the data collected in the page_exit beacon (currently used for INP), to update metrics such as LCP and add spans which occurred after the initial transaction ends.
  4. Add incremental beacons during the page lifecycle to add spans, update metrics etc.
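Option 2 could look roughly like the sketch below. `waitForPageLoadToSettle`, its defaults and the `hasActiveSpans` callback are hypothetical names chosen for illustration; the agent's real internals may be structured differently. The `wait` function is injectable so the logic can be exercised with a virtual clock instead of real timers.

```javascript
// Hypothetical sketch of option 2; names and defaults are illustrative, not agent internals.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

/**
 * Wait `baseDelay` ms after Page Load, then keep extending in `pollInterval` steps
 * while `hasActiveSpans()` reports open spans, never exceeding `maxDelay` in total.
 * Resolves with the total delay applied, in ms.
 */
async function waitForPageLoadToSettle({
  hasActiveSpans,     // () => boolean, supplied by the caller (hypothetical)
  baseDelay = 1000,   // today's fixed PAGE_LOAD_DELAY
  maxDelay = 5000,    // hard upper bound on the total delay
  pollInterval = 250, // how often to re-check for active spans
  wait = sleep,       // injectable so tests can use a virtual clock
}) {
  await wait(baseDelay);
  let elapsed = baseDelay;
  while (elapsed < maxDelay && hasActiveSpans()) {
    const step = Math.min(pollInterval, maxDelay - elapsed); // never overshoot the cap
    await wait(step);
    elapsed += step;
  }
  return elapsed;
}
```

The `maxDelay` cap is what keeps a perpetually busy page (for example, a span that never ends) from holding the page-load transaction open indefinitely.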

Workaround

It is possible to modify the agent's behavior with a blocking span, holding the transaction open until a fixed or dynamic threshold is met.
The downside is that this adds a span to every transaction which must ultimately be ignored, and it requires implementation effort to determine the correct logic for ending the transaction.

```javascript
// Assumes the Elastic RUM agent is initialised and exposed as the global `elasticApm`.
// Resolves after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

window.addEventListener('load', () => {
    const FALLBACK_PAGE_LOAD_DELAY = 5000;
    // retrieve the current Elastic APM transaction (page load)
    const tr = elasticApm.getCurrentTransaction();
    if (!tr) return; // no active transaction to hold open

    // attach a blocking span to the transaction, preventing the transaction from closing until the span ends
    const span = tr.startSpan("delay-span-ignore", "delay", { blocking: true });

    // race a fallback timeout against a custom promise which resolves when all dynamic content is complete
    // note: `clientScriptsComplete()` is application-specific and must be implemented by the site
    Promise.race([
        sleep(FALLBACK_PAGE_LOAD_DELAY),
        clientScriptsComplete()
    ]).finally(() => {
        // `finally` ends the span even if `clientScriptsComplete()` rejects
        span.end();
    });

    // OR end the blocking span (allowing the transaction to close) after a fixed delay
    // setTimeout(() => span.end(), FALLBACK_PAGE_LOAD_DELAY);
});
```

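`clientScriptsComplete()` in the workaround is left to the implementer. One possible approach, assuming that "no new resource fetches or DOM mutations for N ms" is a good enough proxy for dynamic content being complete, is a quiet-window promise. `createQuietWindow` is a hypothetical helper, and the browser wiring is only shown in comments.

```javascript
// Hypothetical helper: `done` resolves once `notify()` has not been called for `quietMs`.
function createQuietWindow(quietMs) {
  let timer = null;
  let settled = false;
  let resolveDone;
  const done = new Promise((resolve) => {
    resolveDone = resolve;
  });

  // (Re)start the countdown; each call pushes the deadline back by `quietMs`.
  const notify = () => {
    if (settled) return; // ignore activity after the window has already closed
    clearTimeout(timer);
    timer = setTimeout(() => {
      settled = true;
      resolveDone();
    }, quietMs);
  };

  notify(); // start counting as soon as the window is created
  return { notify, done };
}

// Browser wiring (assumption: resource fetches and DOM mutations approximate "dynamic content"):
// const quiet = createQuietWindow(500);
// new PerformanceObserver(() => quiet.notify()).observe({ type: "resource" });
// new MutationObserver(() => quiet.notify())
//   .observe(document.body, { childList: true, subtree: true });
// const clientScriptsComplete = () => quiet.done;
```

Resource entries are only reported once a fetch completes, so a long in-flight request does not extend the window on its own; the `Promise.race` fallback in the workaround still bounds the total wait.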
@lwilkins

Thanks, Simon, for reporting this on my behalf. While my issue focuses on the LCP Core Web Vital, I appreciate that a fix may extend beyond this metric!

It's very possible, though, that continuing to capture LCP after the first user interaction will result in a different sort of error: bigger elements may appear after an interaction, and CrUX won't count these, so it would be good to align the Elastic RUM agent with CrUX here.
