Early browser API accesses and function calls are missed #77
Comments
Hey @asumansenol, thanks for bringing this up! I observed the same with our API collection integration test (https://github.com/duckduckgo/tracker-radar-collector/blob/main/tests/integration/apiCollection.test.js), which is somewhat flaky because of this issue. I suspect a race condition between the API collection script setting things up (https://github.com/duckduckgo/tracker-radar-collector/blob/main/collectors/APICalls/TrackerTracker.js#L126) and scripts on the page that are already running. This is not a huge issue for the DDG use case, since in most cases everything is ready before third-party (3p) requests load and execute, and we operate on a huge sample of sites, but I can see how this is not precise enough for other use cases. I suspect this is fixable. I'll give it a shot next week and let you know.
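To make the suspected race concrete, here is a toy model in plain JavaScript. It is not TRC code: `simulateCrawl`, the fake `canvas` object, and the delay values are all made up for illustration. It only shows that a call made before the instrumentation is installed leaves no record, while a later call does.

```javascript
// Toy model only: none of these names come from tracker-radar-collector.
function simulateCrawl(setupDelayMs, callDelayMs) {
  const recorded = [];
  const canvas = { toDataURL: () => 'data:,' }; // stand-in for a real canvas

  // Hypothetical instrumentation: wrap the method and log each call.
  const installInstrumentation = () => {
    const original = canvas.toDataURL;
    canvas.toDataURL = (...args) => {
      recorded.push('toDataURL');
      return original(...args);
    };
  };
  const pageScript = () => canvas.toDataURL();

  // Run both events in time order. On a tie the page script goes first,
  // which is the unfavourable case for the collector.
  const events = [
    { at: callDelayMs, run: pageScript },
    { at: setupDelayMs, run: installInstrumentation },
  ].sort((a, b) => a.at - b.at);
  events.forEach((event) => event.run());
  return recorded;
}

console.log(simulateCrawl(50, 0));    // [] (the early call is missed)
console.log(simulateCrawl(50, 1000)); // [ 'toDataURL' ]
```

In this model the only thing that decides whether a call is recorded is the ordering of the two events, which is what makes the integration test flaky rather than consistently failing.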
Sorry, still no solution to this. @muodov is updating APICollector for better attribution (#90), but that doesn't seem to have an effect on this issue. I suspect the solution here is to block scripts from running until all collectors are fully set up. This can be done, e.g., via Debugger.pause as soon as the page starts loading.
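A rough sketch of that idea, not tested against a real browser. It assumes `cdp` is a CDP-session-like object with `send(method)` and `on(event, handler)` (as Puppeteer's `CDPSession` provides), and that each collector exposes an async setup function; `setUpBehindPause` and `collectorSetups` are hypothetical names. `Debugger.enable`, `Debugger.pause`, `Debugger.resume` and the `Debugger.paused` event are real CDP names. The handler below resumes on every pause, so a collector that relies on real breakpoint pauses would need to tell its own pauses apart.

```javascript
// Sketch under the assumptions above; not the project's implementation.
async function setUpBehindPause(cdp, collectorSetups) {
  await cdp.send('Debugger.enable');

  let ready = false;
  let pausedDuringSetup = false;
  cdp.on('Debugger.paused', () => {
    if (ready) {
      // The requested pause only took effect after setup had finished.
      cdp.send('Debugger.resume').catch(() => {});
    } else {
      // Page scripts are now held back; release them once setup is done.
      pausedDuringSetup = true;
    }
  });

  // Ask the debugger to stop on the next JavaScript statement.
  await cdp.send('Debugger.pause');

  // Let every collector finish its setup while page scripts are held back.
  await Promise.all(collectorSetups.map((setup) => setup(cdp)));
  ready = true;

  if (pausedDuringSetup) {
    await cdp.send('Debugger.resume');
  }
}
```

Resuming from the `Debugger.paused` handler rather than unconditionally covers the case where the page has not executed any statement by the time setup finishes, so the pause has not actually happened yet.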
There seems to be a problem with RequestCollector and the latest Chromium as well. I'm currently investigating, but don't have a concrete solution yet.
I think this is basically the same problem as described in puppeteer/puppeteer#8507. This was fixed in puppeteer last year, but unfortunately the fix is incompatible with our current CDP usage, as I mentioned in #84 (comment). We're exploring different options to fix this at the moment.
Hi!
While running some pilot crawls for our current study, we found that the TRC doesn’t collect function calls or property accesses when the call/access occurs immediately after page load. Perhaps APICallCollector doesn’t have enough time to register its breakpoints. To test this issue, we have created two test pages that call the toDataURL method of an HTML5 canvas element, one with a timeout and one without.
We’ve visited the test pages using the latest version of TRC without any modification.
npm run crawl -- -u "https://homes.esat.kuleuven.be/~asenol/fp-test-with-timeout/" -o ./data/ -v -f -d 'apis'
npm run crawl -- -u "https://homes.esat.kuleuven.be/~asenol/fp-test-without-timeout/" -o ./data/ -v -f -d 'apis'
I hope this helps. If you need any other info, just let me know.