Access to Browser API webRequest #397

Open
vyach-vasiliev opened this issue May 7, 2017 · 74 comments

Comments

@vyach-vasiliev

Is it possible to access the browser API from userscript code? At least for Chrome-like browsers.

Sometimes the existing API is really not enough, and I do not want to write a separate extension every time for the sake of a couple of lines of code.

Similar to:

    var filter = {urls: ["http://*/*", "https://*/*"], tabId: currentTabId };
    var opt_extraInfoSpec = ['blocking'];
    GM_webRequest.onBeforeRequest.addListener(
        callback, filter, opt_extraInfoSpec);
@tophf

tophf commented May 7, 2017

The thing is, blocking code is synchronous so it must be inside the background page of Tampermonkey and you surely understand it would be insane to allow arbitrary userscripts to run there.

A viable solution might be declarative syntax e.g. provide an array of rules that specify what to do.

@tophf

tophf commented May 7, 2017

On the other hand, maybe Tampermonkey's sandbox can be applied to userscript code in the background page too...

@vyach-vasiliev
Author

vyach-vasiliev commented May 7, 2017

@tophf I understand, and I certainly agree with you that a declarative syntax for rules is required. If I knew exactly how Tampermonkey works, I would try to implement it.

Don't you agree that this feature would give Tampermonkey a strong advantage over similar extensions?

@derjanb
Member

derjanb commented May 8, 2017

Ok, I've created a preview that shows how GM_webRequest and @webRequest could possibly work.

After downloading it and dragging and dropping it onto the extensions page, you can install this script:

// ==UserScript==
// @name         GM_webRequest testing
// @namespace    http://tampermonkey.net/
// @version      0.1
// @description  try to take over the world!
// @author       You
// @match        *://*/*
// @include      http://new_static.url/*
// @include      *//redirected.to/*
// @grant        GM_webRequest
// @webRequest   [{"selector":"*cancel.me/*","action":"cancel"},{"selector":{"include":"*","exclude":"http://exclude.me/*"},"action":{"redirect":"http://new_static.url"}},{"selector":{"match":"*://match.me/*"},"action":{"redirect":{"from":"([^:]+)://match.me/(.*)","to":"$1://redirected.to/$2"}}}]
// ==/UserScript==

var currently_active_webrequest_rule = JSON.stringify(GM_info.script.webRequest); // == @webRequest header from above

GM_webRequest([
    { selector: '*cancel.me/*', action: 'cancel' },
    { selector: { include: '*', exclude: 'http://exclude.me/*' }, action: { redirect: 'http://new_static.url' } },
    { selector: { match: '*://match.me/*' }, action: { redirect: { from: '([^:]+)://match.me/(.*)',  to: '$1://redirected.to/$2' } } }
], function(info, message, details) {
    console.log(info, message, details);
});

/*
  Notes:
    Action:
      The final redirect: URL needs to be included into the scripts @match or @include header
    Selector:
      If just a string is given, it is interpreted as include: property.
      include:, exclude: and match: properties are supported and can have an array or a string value - the syntax equals @include, @match and @exclude
    @webRequest may have a human readable format in the future, but for now it's the stringified first argument of GM_webRequest and allows request
      manipulation even if the script didn't run yet. The only downside atm is that you need to re-register the rules (which overrides all previous ones)
      in order to get manipulation events. GM_webRequest(currently_active_webrequest_rule, function...)
*/

None of this is heavily tested, so expect bugs here and there. :)
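
As a rough illustration of the regexp redirect rule above, the `from`/`to` pair can be read as a regular expression and a substitution pattern. The sketch below assumes the rewrite behaves like JavaScript's `String.prototype.replace`; the thread does not confirm how Tampermonkey implements it, and `applyRedirect` is a hypothetical helper, not part of any API.

```javascript
// Hypothetical helper: models a { redirect: { from, to } } action with
// String.prototype.replace. This is an assumption about the semantics,
// not Tampermonkey's actual implementation.
function applyRedirect(url, redirect) {
    const pattern = new RegExp(redirect.from);
    // Return null when the pattern does not match, so callers can tell
    // "no redirect" apart from "redirected to the same URL".
    if (!pattern.test(url)) return null;
    return url.replace(pattern, redirect.to);
}

const rule = {
    selector: { match: '*://match.me/*' },
    action: { redirect: { from: '([^:]+)://match.me/(.*)', to: '$1://redirected.to/$2' } }
};

const redirected = applyRedirect('https://match.me/path?x=1', rule.action.redirect);
console.log(redirected); // https://redirected.to/path?x=1
```

Trying the pattern on a few sample URLs this way may help catch capture-group mistakes before the rule is put into a `@webRequest` header.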

@jspenguin2017

@derjanb
Thanks! This is going to help a lot!

If I understood correctly, the callback function only receives information about the web request once it is done and cannot manipulate it.
In this case, I think being able to redirect to a generated string as the response content would be helpful.

I know it is not a good idea to let arbitrary functions run in the privileged extension context, but there can be times when that would be handy. What is your opinion on this? Should I just create another extension to do this, or could we have a special header for it?
The redirection and blocking that the design above allows can mostly be done by uBO, Adguard, or other ad blockers; what they cannot do is run a complex function to make a dynamic decision.

Also, as a side thought, shouldn't it be named TM_webRequest? Since this functionality is obviously not in Greasemonkey.

@vyach-vasiliev
Author

@derjanb
Wow, thank you for such a prompt reply! ⚡

I tried to play with https://ssl.gstatic.com/gb/images/v1_b3735dd8.png, but none of the rules work for me.
What am I doing wrong?

// ==UserScript==
// @name         GM_webRequest testing
// @namespace    http://tampermonkey.net/
// @version      0.1
// @description  try to take over the world!
// @author       You
// @match        *://*/*
// @include      *gstatic.com/*
// @include      https://google.com/*
// @include      *//google.com/*
// @grant        GM_webRequest
// @webRequest   [{"selector":"*cancel.me/*","action":"cancel"},{"selector":{"include":"*","exclude":"http://exclude.me/*"},"action":{"redirect":"http://new_static.url"}},{"selector":{"match":"*://match.me/*"},"action":{"redirect":{"from":"([^:]+)://match.me/(.*)","to":"$1://redirected.to/$2"}}}]
// ==/UserScript==

console.log("GM_webRequest start");

var currently_active_webrequest_rule = JSON.stringify(GM_info.script.webRequest); // == @webRequest header from above

GM_webRequest([
    //{ selector: '*gstatic.com/*', action: 'cancel' },
    { selector: { include: '*gstatic.com/*', exclude: 'http://exclude.me/*' }, action: { redirect: 'https://google.com/' } },
    //{ selector: { match: '*://*.gstatic.com*' }, action: { redirect: { from: '([^:]+)://(.+)gstatic.com/(.*)',  to: '$1://google.com/$2' } } }
], function(info, message, details) {
    console.log(info, message, details); // This also does not work
});

@derjanb
Member

derjanb commented May 9, 2017

Ah sorry, I forgot to add this information. Since intercepting requests makes things slower Tampermonkey only handles the following request types at the moment: 'sub_frame', 'script', 'xmlhttprequest' and 'websocket'.

All others are considered to be replaceable by a userscript, even after they were loaded, but that's open to discussion.

@vyach-vasiliev
Author

vyach-vasiliev commented May 9, 2017

Hm, could something like this be done?

/* any JavaScript computation not related to the page goes here (setting vars, functions) */
GM_webRequest([
    { selector: '*gstatic.com/*', action: 'replace' }
], function(info, message, details) {
    console.log(info, message, details);
    details.url = details.url.replace('gstatic', 'gstatic-2');
    return details;
});
input  >>> https://gstatic.com/index.php?foo=bar...
output >>> https://gstatic-2.com/index.php?foo=bar...

We change url or body of the request and query data from the site.
With support for GET and POST requests.
Of course, according to the rules, the script code will be executed once for each request.

@derjanb
Member

derjanb commented May 10, 2017

We change url or body of the request and query data from the site.

Atm only the URL via:

{ selector: { match: '*://match.me/*' }, action: { redirect: { from: '([^:]+)://match.me/(.*)',  to: '$1://redirected.to/$2' } } }

With support for GET and POST requests.

Shouldn't make a difference.

details.url = details.url.replace('gstatic', 'gstatic-2');

This can't work because all communication is asynchronous. When the background page inspects the request, there is no synchronous way to contact the userscript to make a decision. That's why all rules need to be defined by the time the request happens.
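
The timing problem can be shown with a toy model. Everything in it is a stand-in invented for illustration (`askUserscript`, `blockingListener`, and the `urlContains` rule shape are not Tampermonkey or browser APIs); it only assumes that a blocking listener must return its decision synchronously while a reply from the userscript arrives in a later task.

```javascript
// Toy model only: these functions are illustrative stand-ins, not
// Tampermonkey or browser APIs.

// Message passing to the userscript: the reply arrives in a later task.
function askUserscript(details, reply) {
    setTimeout(() => reply({ redirectUrl: details.url.replace('gstatic', 'gstatic-2') }), 0);
}

// A blocking listener must return its decision synchronously.
function blockingListener(details, staticRules) {
    let answer = null;
    askUserscript(details, (reply) => { answer = reply; });
    // The reply has not arrived yet, so only pre-registered rules can be used.
    if (answer !== null) return answer;
    const rule = staticRules.find((r) => details.url.includes(r.urlContains));
    return rule ? rule.response : {};
}

const decision = blockingListener(
    { url: 'https://gstatic.com/index.php?foo=bar' },
    [{ urlContains: 'gstatic.com', response: { cancel: true } }]
);
console.log(decision); // { cancel: true }
```

The userscript's answer does arrive eventually, but only after the listener has already returned, which is why the decision has to come from rules registered in advance.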

@vyach-vasiliev
Author

@derjanb
You misunderstood me. I meant this scheme:

  1. Script installation
  2. Storing the script in the storage (function => toString)
  3. Reloading the background page and loading the userscripts that use GM_webRequest into an array on the background page
  4. A request starts loading (and the userscripts' rules are checked)
  5. Loading a script (string) from the storage (array) and executing it in the webRequest callback

Would that work?
I don't understand well how webRequest works, but I don't want to give up hope of success 😃

@derjanb
Member

derjanb commented May 10, 2017

Loading a script (string) from the storage (array) and executing it in the webRequest callback

The background page has all extension permissions. It's not possible to run foreign JavaScript code there without leaking these permissions. Sorry. That's why a defined rule set is needed to tell the background page what needs to be done with a request.

@vyach-vasiliev
Author

vyach-vasiliev commented May 10, 2017

@derjanb
Oh, thanks, now everything is clear. But can the parameters still be read?
Could there be a parser like this?

var rules = [
    { event: 'request', key: 'url', action: { replace: { from: '([^:]+)://match.me/(.*)', to: '$1://redirected.to/$2' } } }
];  // Executed in order
GM_webRequest(rules, function(info, message, events) {
    console.log(info, message, events); // e.g. events.request.url or events.response.body
}); // Reload the background page and load the rules from GM_webRequest into a rules array there.

Help:

  • actions (action)
    • block
    • replace
  • events (event)
    • request
    • response
  • event keys (key)
    • url
    • header
    • body
  • request types (type, default: all types)
    • xmlhttprequest or fetch
    • window location
  • request methods (method, default: all methods)
    • POST
    • GET
    • etc

UPDATE 03.06.17

@LennyPenny

I don't think only intercepting certain request types is a good idea. There might be instances where one wouldn't want some request to even be made in the first place. Also, in general, replacing things that have already been loaded with something that also has to be loaded first seems pretty slow as well.

How about being able to configure which request types will be intercepted?

@vyach-vasiliev
Author

@LennyPenny Do you mean the XHR and Location types of request handlers, as well as request methods like POST, GET and others?
Please give an example, if it is not too much trouble 😃

@LennyPenny

I'm just responding to @derjanb and this #397 (comment)

@Couchy

Couchy commented Jun 6, 2017

Isn't it still the case that the soonest Tampermonkey can run scripts is asynchronously after document-start? If so, then wouldn't it be possible to miss resources (scripts, etc) loaded in the head? Don't get me wrong, I appreciate the effort, but it seems kind of silly to me to have an API that 1. cannot dynamically handle requests, 2. can only catch certain resource types, and 3. may or may not actually work.

@derjanb
Member

derjanb commented Jun 6, 2017

@Couchy

Isn't it still the case that the soonest Tampermonkey can run scripts is asynchronously after document-start?

That's true. (Even though it's much less likely if the experimental "Inject Mode" is set to "Fast".) That's why you can use @webRequest to allow Tampermonkey to process your rules even though the script doesn't run yet.

can only catch certain resource types

For now. As I said, that's open to discussion and may be subject to change.

@LennyPenny

How about being able to configure which request types will be intercepted?

That's doable, but adds a lot of complexity. So I'll implement it if it's really needed, but will try to go without it as long as possible.

@jasonkhanlar

I was able to use this to cancel a request for a single static JavaScript file. However, when refreshing the page repeatedly, with both Ctrl+R and Ctrl+Shift+R (clear cache), the debug console sometimes doesn't show the log output from: console.log(info, message, details);

In Chromium, I see the red text line for "GET file.js net::ERR_BLOCKED_BY_CLIENT" but sometimes the console.log text is missing. Why is this?

@derjanb
Member

derjanb commented Sep 20, 2017

@jasonkhanlar

Because the script was not injected/executed yet, but Tampermonkey blocked the request based on the @webRequest header. See #211 regarding document-start reliability.

@augustresende

@derjanb I need to block loading only for "script" type requests while still allowing loading via XHR. Is it possible to do this?

@derjanb
Member

derjanb commented Nov 7, 2017

@augustoresende You can initially block everything by using @webRequest and, once your script runs, reconfigure the rules via GM_webRequest to allow it to be downloaded.

@augustresende

augustresende commented Nov 7, 2017

@derjanb @webRequest is not human readable. You can easily fix this by allowing multiple @webRequest headers, like multiple @include:

// @include      *://example1.com/*
// @include      *://example2.com/*
// @include      *://example3.com/*
// @webRequest   [{"selector":"*://example1.com/*","action":"cancel"}]
// @webRequest   [{"selector":"*://example2.com/*","action":"cancel"}]
// @webRequest   [{"selector":"*://example3.com/*","action":"cancel"}]
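
Until something like that is supported, the single-line header could be generated from a readable rule array, since the header is described earlier in this thread as the stringified first argument of GM_webRequest. A minimal sketch; `toWebRequestHeader` is a made-up helper name and the column padding is purely cosmetic:

```javascript
// Hypothetical helper: builds the "// @webRequest" metadata line from a
// readable rule array by JSON-stringifying it.
function toWebRequestHeader(rules) {
    return '// @webRequest   ' + JSON.stringify(rules);
}

const rules = [
    { selector: '*://example1.com/*', action: 'cancel' },
    { selector: '*://example2.com/*', action: 'cancel' },
    { selector: '*://example3.com/*', action: 'cancel' }
];

// Paste the printed line into the userscript's metadata block.
console.log(toWebRequestHeader(rules));
```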

@augustresende

augustresende commented Nov 8, 2017

And... A suggestion:

webRequest Header/Body content intercept and editor

chrome.webRequest.onBeforeSendHeaders.addListener(
  function(details) {
    injectHeader(
      'Referer',
      'https://www.google.com.br/',
      details.requestHeaders
    );

    return {requestHeaders: details.requestHeaders};
  },
  {
    urls: [
      "*://www.example.com/*"
    ],
    types: ["main_frame"]
  },
  ["blocking", "requestHeaders"]
);

GM_xmlhttpRequest already supports request header manipulation, like this:

        GM_xmlhttpRequest({
            method: 'GET',
            url: window.location.href,
            headers: {
                'Referer': 'https://www.google.com.br/'
            },
            anonymous: true,
            onload: function(response) {
            }
        });

This would be like xhook, but xhook only works with XHR and fetch. And please include "main_frame", since it is not replaceable by a userscript.

@augustresende

augustresende commented Nov 9, 2017

@derjanb BUG:
webRequest not blocking script if it is in the browser "memory" cache :(
Chrome

@augustresende

@derjanb ?

@derjanb
Member

derjanb commented Nov 10, 2017

You can easily fix this by allowing multiple @webRequest headers, like multiple @include

Yes, this would be one way to make it more readable.

webRequest Header/Body content intercept and editor

This is AFAIK only possible in Firefox:
https://developer.mozilla.org/en-US/Add-ons/WebExtensions/API/webRequest/StreamFilter
https://bugs.chromium.org/p/chromium/issues/detail?id=104058
and therefore not supported by Tampermonkey.
For the moment you can only cancel requests, download the resource via GM_xhr, modify it and then forward it to the page.

webRequest not blocking script if it is in the browser "memory" cache :(

I know. All you get is the information that it was loaded from cache, in the onResponseStarted listener:
https://developer.chrome.com/extensions/webRequest#property-details-fromCache
but it's not possible to block or modify things at this point.

@AugustoResende2

Header interception works in Chrome:

chrome.webRequest.onBeforeSendHeaders.addListener(
  function(details) {
    injectHeader(
      'Referer',
      'https://www.google.com.br/',
      details.requestHeaders
    );

    return {requestHeaders: details.requestHeaders};
  },
  {
    urls: [
      "*://www.example.com/*"
    ],
    types: ["main_frame"]
  },
  ["blocking", "requestHeaders"]
);

@7nik

7nik commented Jul 22, 2020

@youk, as mentioned above, the communication between the userscript and the background page is asynchronous, so there is no way to dynamically analyze the request and then take different actions. You can only define static rules and listeners for rule triggering.

@derjanb, it looks like GM_webRequest returns an object with an abort method that does nothing.

@7nik

7nik commented Jul 25, 2020

  1. If the action property isn't provided, an error occurs on the background page;
  2. If the action is neither cancel nor redirect ({}), there are no errors but the listener is never called;
  3. If the redirect has both static and dynamic redirections, only the static redirect happens. Maybe it would be better to try the dynamic one first and fall back to the static link if it fails;
  4. details.redirect_url: maybe it should be camel case, details.redirectUrl?

// ==UserScript==
// @name         Test GM.webRequest
// @namespace    http://tampermonkey.net/
// @version      0.1
// @description  try to take over the world!
// @author       7nik
// @match        https://httpbin.org/*
// @grant        GM.webRequest
// @connect      httpbin.org
// ==/UserScript==

(async () => {
    const rules = [
        // error on the background page
        { selector: "https://httpbin.org/anything?key=1" },
        // works but never calls the listener
        { selector: "https://httpbin.org/anything?key=2", action: {} },
        // always redirects to the static URL
        { selector: "https://httpbin.org/anything?key=3", action: {
            redirect: {
                from: "https://httpbin.org/anything\\?(.*)",
                to: "https://httpbin.org/anything/redirect?mode=dynamic&$1",
                url: "https://httpbin.org/anything/redirect?mode=static",
            },
        } },
    ];

    console.log("start");
    for (let i = 0; i < rules.length; i++) {
        try {
            let [action, message, details] = await new Promise((resolve, reject) => {
                GM.webRequest([rules[i]], (...args) => resolve(args))
                    .then(() => fetch(`https://httpbin.org/anything?key=${i+1}`))
                    .finally(() => setTimeout(reject, 500, "timeout"));
            });
            console.log(i+1, details.description || message, details.redirect_url || details.url);
        } catch (ex) {
            console.log(i+1, ex, rules[i]);
        }
    }
    console.log("done");
})();

@rukletsov

I've stumbled upon this ticket while figuring out how to intercept request headers (specifically, auth token) in a user script. I tried a simple match-and-forward rule with GM.webRequest but the listener did not seem to receive request headers; apparently, they are not exposed.

For posterity, the workaround I ended up with relies on overriding window.XMLHttpRequest.prototype.setRequestHeader. The downside of this approach is that it is hard to map headers to request URLs, which can be an issue if the page makes requests with distinct tokens. The script lives here.

@7nik

7nik commented Aug 28, 2020

@rukletsov, GM.webRequest doesn't provide any info about the body or headers of the request. It only allows redirecting and canceling requests and getting notified about it.

You can use a Map/WeakMap to store XHRs and their URLs on open() and then retrieve the URL inside setRequestHeader via the XHR instance, i.e. this.
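
A minimal sketch of that idea, assuming the page sets its headers through `XMLHttpRequest.prototype.setRequestHeader`. `captureRequestHeaders` and `FakeXHR` are hypothetical names; the class is passed in as a parameter so the patch can be tried on a stand-in outside a browser, whereas a userscript would pass `window.XMLHttpRequest`.

```javascript
// Sketch: remember the URL passed to open() per XHR instance, so that
// setRequestHeader() can report which request a header belongs to.
function captureRequestHeaders(XHRClass, onHeader) {
    const urls = new WeakMap(); // XHR instance -> URL given to open()
    const originalOpen = XHRClass.prototype.open;
    const originalSetRequestHeader = XHRClass.prototype.setRequestHeader;

    XHRClass.prototype.open = function (method, url, ...rest) {
        urls.set(this, String(url));
        return originalOpen.call(this, method, url, ...rest);
    };

    XHRClass.prototype.setRequestHeader = function (name, value) {
        onHeader(urls.get(this), name, value);
        return originalSetRequestHeader.call(this, name, value);
    };
}

// Stand-in for XMLHttpRequest so the sketch runs outside a browser.
class FakeXHR {
    open(method, url) { this.opened = [method, url]; }
    setRequestHeader(name, value) { this.header = [name, value]; }
}

const seen = [];
captureRequestHeaders(FakeXHR, (url, name, value) => seen.push({ url, name, value }));

const xhr = new FakeXHR();
xhr.open('GET', 'https://example.com/api/items');
xhr.setRequestHeader('Authorization', 'Bearer token-a');
console.log(seen);
```

A WeakMap is used so that finished XHR objects can still be garbage collected; the original methods are always called, so page behavior should stay unchanged.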

@imrelo

imrelo commented Jan 7, 2021

Why does it only work when I press Ctrl+F5?

@7nik

7nik commented Jan 7, 2021

@imrelo, we aren't psychics to say what's going on without seeing your code.

@MikeDabrowski

MikeDabrowski commented Jan 7, 2021

Can I listen for a request being made, get notified when it is done, and also get the response data?

#397 (comment) Here it is stated that we can download, modify and forward it. How does this forwarding work then?

@7nik

7nik commented Jan 7, 2021

@MikeDabrowski
As I said a little above:

It only allows redirecting and canceling requests and getting notified about it.

Static redirect:

            {
                selector: "https://httpbin.org/anything?redirect=1",
                action: { redirect: "https://httpbin.org/anything/redirect?mode=static1" },
            }

"Dynamic" redirect (made on regexp):

            {
                selector: "https://httpbin.org/anything?redirect=3&key=*",
                action: {
                    redirect: {
                        from: "https://httpbin.org/anything\\?redirect=3&key=(\\d+)",
                        to: "https://httpbin.org/anything/redirect?mode=dynamic&key=$1",
                    },
                },
            },

@MikeDabrowski

Ok, I get it now. All I get is that the request was done and I know its url. Then I could do regular xhr to get details and so on. But in my case endpoint response is random and breaks the whole concept :(

@7nik

7nik commented Jan 7, 2021

Endpoints cannot be absolutely random. You can use multiple selectors with patterns:
selector: ["https://example.com/api/*", "https://api.example.com/v1/*", "*.example.com/api/*"]
Or listen for another endpoint that returns the address of the target endpoint.
And make something similar to #1086 (comment)

@MikeDabrowski

MikeDabrowski commented Jan 7, 2021 via email

@tylkomat

Is there another version than the preview from 2017? Firefox says it's corrupt and chrome says CRX_HEADER_INVALID.

@7nik

7nik commented Aug 30, 2021

@tylkomat, webRequest is now present in both the stable and beta versions.

@tylkomat

@7nik thanks for the info. I didn't notice it was already inside.

Canceling requests works, but redirecting does not. Must the redirect URL be on the same domain as the selector URL?

@7nik

7nik commented Aug 31, 2021

@tylkomat

The final redirect: URL needs to be included into the scripts @match or @include header

@7nik

7nik commented Aug 31, 2021

Here are the docs I have for this function:

GM.webRequest(rules, listener)

(Re-)registers rules for web request manipulation and the listener for triggered rules.
If you only need to register rules, it's better to use the @webRequest header.
Note: webRequest only processes requests of types sub_frame, script, xhr and websocket.

Parameters

  • rules - object[], array of rules with the following properties:
    • selector - string|object, the URLs for which the rule should be triggered; a string value is shorthand for { include: [selector] }; object properties:
      • include - string|string[], URLs, patterns, and regexps that trigger the rule;
      • match - string|string[], URLs and patterns that trigger the rule;
      • exclude - string|string[], URLs, patterns, and regexps for which the rule is not triggered;
    • action - string|object, what to do with the request; the string value "cancel" is shorthand for { cancel: true }; object properties:
      • cancel - boolean, whether to cancel the request;
      • redirect - string|object, redirect to some URL, which must be covered by a @match or @include header. If a string, redirects to that static URL. If an object:
        • from - string, a regexp to extract some parts of the URL, e.g. "([^:]+)://match.me/(.*)";
        • to - string, the substitution pattern, e.g. "$1://redirected.to/$2";
  • listener - function, called when a rule is triggered; it cannot affect the rule's action; arguments:
    • info - string, type of action: "cancel", "redirect";
    • message - string, "ok" or "error";
    • details - object, info about the request and rule:
      • rule - object, the triggered rule;
      • url - string, URL of the request;
      • redirect_url - string, where the request was redirected;
      • description - string, error description.
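
To make the two shorthand forms in the docs above concrete, here is a small sketch that expands them into their object equivalents. `normalizeRule` is a hypothetical helper written only to mirror the wording of these docs; it is not part of Tampermonkey, and how the extension normalizes rules internally is not shown in this thread.

```javascript
// Hypothetical helper: expands the documented shorthand forms into their
// object equivalents.
function normalizeRule(rule) {
    // A string selector is shorthand for { include: [selector] }.
    const selector = typeof rule.selector === 'string'
        ? { include: [rule.selector] }
        : rule.selector;
    // The string action "cancel" is shorthand for { cancel: true }.
    const action = rule.action === 'cancel'
        ? { cancel: true }
        : rule.action;
    return { selector, action };
}

const normalized = [
    { selector: '*cancel.me/*', action: 'cancel' },
    { selector: { match: '*://match.me/*' }, action: { redirect: 'http://new_static.url' } }
].map(normalizeRule);

console.log(JSON.stringify(normalized, null, 2));
```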

And here are some of my notes.

@QwertyChouskie-Asurion

Ah sorry, I forgot to add this information. Since intercepting requests makes things slower Tampermonkey only handles the following request types at the moment: 'sub_frame', 'script', 'xmlhttprequest' and 'websocket'.

All others are considered to be replaceable by a userscript, even after they were loaded, but that's open to discussion.

Is this still the case? I need to modify an image that can be created and destroyed many times, and so far the only way I can find to do this is a redirect.

I can do this using a separate browser extension (Request Interceptor), but really need to consolidate all changes I do to this webapp into one script.

@MikeDabrowski

I settled on overwriting fetch. I assigned a custom function to fetch, did my interception in it, and called the original fetch at the end. Maybe this will help someone.
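
A rough sketch of what such a wrapper could look like. `wrapFetch` and `fakeFetch` are hypothetical names; in a userscript the wrapped function would be `window.fetch` and the result would be assigned back to `window.fetch`, which only helps if the page has not cached a reference to the original beforehand.

```javascript
// Sketch: wrap a fetch-like function so a hook sees every call before the
// original runs.
function wrapFetch(originalFetch, onRequest) {
    return function (input, init) {
        // input may be a URL string, a Request-like object with a url
        // property, or a URL object (which stringifies to its href).
        const url = typeof input === 'string'
            ? input
            : (input && input.url) || String(input);
        onRequest(url, init);
        return originalFetch.call(this, input, init);
    };
}

// Stand-in fetch so the sketch runs without network access.
const fakeFetch = (input) => Promise.resolve({ ok: true, url: String(input) });

const intercepted = [];
const patchedFetch = wrapFetch(fakeFetch, (url) => intercepted.push(url));

patchedFetch('https://example.com/api/data').then((response) => {
    console.log(intercepted, response.ok);
});
```

The hook runs synchronously before the original call, so it can log or record the request, while the returned promise is the original one and the page sees the unmodified response.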

@Nate-Wilkins

Nate-Wilkins commented May 6, 2022

I can't seem to do this at all in Tampermonkey any more.
I had a script that just overrode window.fetch on load, but it appears to be reset after my script executes.

Has anyone else figured out a way to do this?

EDIT

I solved it by overriding XMLHttpRequest (what is with this class name??).
You can find the solution on StackOverflow here -> https://stackoverflow.com/questions/629671/how-can-i-intercept-xmlhttprequests-from-a-greasemonkey-script/72137265#72137265

Also if this is considered a "Bug" in tampermonkey please please do not fix it unless there's a workaround.

@anonghuser

oh wow how did i not find out about this sooner... can't even count how many times i've wanted it. but to be useful it will need more juice. if not arbitrary code execution, then at least regex search/replace in the response, and be applicable to the main html request too. this way we can really prepend a script tag to the html and solve the "guarantee to run before website scripts" hassles, among other things.

@anonghuser

anonghuser commented Nov 20, 2022

btw the api to manipulate most things is blocking but specifically for changing the body it is async, so maybe having user code for that part will be ok?

edit: actually, reading more docs, it can be async for headers too.
https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/webRequest/onBeforeSendHeaders

To modify the headers asynchronously: pass "blocking" in extraInfoSpec, then in your event listener, return a Promise which is resolved with a BlockingResponse.

edit 2: oh, you already knew, but it's ff only. i should've read the thread more carefully. sorry. though personally, this might be a good enough reason to switch to ff. amazing how chrome made no progress in the 5 years this has been open.

@damelco

damelco commented Jan 31, 2023

@derjanb
so ... in 2023, which "versions" of Tampermonkey [still] contain this code?

thanks!!

-DJ

@damelco

damelco commented Apr 9, 2023

so far, script BLOCKING and REDIRECT seem very promising ...
however sub-frame requests are ignored.
have not tried xhr or websocket yet.

thank you for where you have it so far!
very cool.

-DJ
