Skip to content
This repository has been archived by the owner on Apr 22, 2023. It is now read-only.

vm module correction #3042

Closed
isaacs opened this issue Mar 31, 2012 · 15 comments
Closed

vm module correction #3042

isaacs opened this issue Mar 31, 2012 · 15 comments
Labels
Milestone

Comments

@isaacs
Copy link

isaacs commented Mar 31, 2012

The vm module has a lot of problems: https://github.com/joyent/node/issues/search?utf8=%E2%9C%93&q=vm

Furthermore, while it's useful in some scenarios, it's not suitable for what I consider a primary use case for a JavaScript virtual machine, what the JavaScript language was literally designed for: building sandboxes to run arbitrary code in a suitably low-risk manner1, and communicate over proscribed channels with this arbitrary code (eg, via the DOM api.)

Requirements:

I consider these the requirements of a correct JavaScript vm implementation. Our current one is broken.

  1. A context should be indistinguishable from the object it's based on.

    Currently:

    > var o = {}; o.o = o; vm.runInContext("this === this.o", vm.createContext(o));
    false
    > var o = vm.createContext({}); o.o = o; vm.runInContext("this === this.o", o);
    false
    

    Both should return true. This makes it unnecessarily difficult to have links to the global context within the context. As far as I can tell, this is the only way:

    > var o = vm.createContext({}); o.o = vm.runInContext("this", o);vm.runInContext("this === this.o", o);
    true
    > o.o === o
    false
    

    However, note that this breaks the link from outside. So, this blocks communication with the child context about any links that should point to the context object.

    This prevents easy simulation of either node.js or DOM environments.

  2. Items added to the context from anywhere, should be immediately visible to all programs that can access that context. Example:

    var o = {foo:{bar:"baz"}}
    var s = "setInterval(function() { console.log(foo.bar) }, 100)"
    setTimeout(function () { o.foo.bar = "bloo" }, 1000)
    vm.runInContext(s, o)

    This should print "baz" a few times, and then start printing "bloo" once it's updated in the parent context.

  3. As mentioned in Fix issues with v8 contexts in vm module #3039, the vm internal architecture has some problems which cause issues which the Locker object exposes when in debug mode. We need a cleaner way to know when to dispose of the context objects.

@brianmcd's contextify module exhibits the correct behavior in the cases that I've tested. However, the contextify API is not ideal: it adds properties onto the object, and requires user interaction to manually dispose of the context, making it not quite as easy to use as it could be. Also, it leaks the contextify methods into the child context:

> var x = contextify({})
undefined
> x.run("typeof getGlobal")
'function'
> x.run("typeof run")
'function'
> x.run("typeof dispose")
'function'
> x.run("dispose()")
undefined
> x.run("42")
Error: Called run() after dispose().

We should try to stay as close as we can to the current JavaScript API, but with correct semantics and resource management. Also, more of the logic should be moved out of C++ and into lib/vm.js.

  1. Creating a new context: var c = vm.createContext({console:console})

  2. Creating a link to the global in the context: c.global = vm.getGlobal(c)

  3. Iterating over keys in a context: for (var i in c) console.log(i); // prints "console" and "global", only

  4. Any changes to the global object in the child context, or to c itself in the parent context, is immediately seen in the other.

  5. Disposing of a context happens automatically when the Context object is gc'ed, but can be called explicitly by vm.dispose(c)

  6. Cyclical links back to the root object should be detected at the outset, and replaced with vm.getGlobal(obj)

  7. Not sure how we'd handle cases where the supplied object is already the instanceof some other class. It would be acceptable for vm.createContext(obj) to throw if Object.getPrototypeOf(obj) !== Object.prototype, and simpler to deal with than trying to paper over the discrepancy. The understanding should be that it will modify the existing object in-place.

  8. Not sure how we'd handle functions that return the context object directly, when called from inside the context. Consider:

    function getExternalGlobal () { return o }
    function getLocalGlobal () { return vm.getGlobal(o) }
    var o = vm.createContext({})
    o.getExternalGlobal = getExternalGlobal
    o.getLocalGlobal = getLocalGlobal
    o.externalGlobal = o
    o.localGlobal = vm.getGlobal(o)
    
    // Would be ideal if these three all returned true
    vm.runInContext("this === getExternalGlobal()", o)
    vm.runInContext("this === getLocalGlobal()", o)
    vm.runInContext("this === externalGlobal", o)
    vm.runInContext("this === localGlobal", o)

There are probably much more I'm not thinking of right now. Any fix for this should come along with a very thorough set of this-assignment edge case tests.

/cc @laverdet @piscisaureus @bnoordhuis @brianmcd @tmpvar

The history of browsers has shown us that, in fact, this is much harder than only removing access to various system APIs, but also usually requires isolating potentially nefarious code even further so that its process can be killed if it behaves badly.

@brianmcd
Copy link

Thanks for the detailed writeup, Isaac. I did some work on this a couple of weeks ago, and made some good progress. It shouldn't be too hard to realize what you're asking for here (I think my current implementation meets most of these requirements). My work is here: https://github.com/brianmcd/node/commits/vm-contexts . I'm a student trying to finish my thesis, so I've just been swamped lately and haven't had time to clean up the code, check all the edge cases, etc. I will dedicate some time to this on Tuesday or Wednesday this week, and see if I can clean up my work and get it into something worth discussing.

@bnoordhuis
Copy link
Member

The history of browsers has shown us that, in fact, this is much harder than only removing access to various system APIs, but also usually requires isolating potentially nefarious code even further so that its process can be killed if it behaves badly.

If security is important, then VM contexts are the wrong approach, it's not what they're designed for. You need process isolation plus something like seccomp.

@isaacs
Copy link
Author

isaacs commented Apr 1, 2012

@bnoordhuis Yes, that is true. But, just because VM contexts are not an entire security solution, doesn't mean that they aren't part of a valid security solution. In these troubled times, when it seems the entire news industry has abandoned the purity of HTML markup, if you want to scrape a lot of websites, you have to evaluate their javascript in order to get the actual content. If the vm module doens't work for jsdom, it doesn't work.

@isaacs
Copy link
Author

isaacs commented Apr 1, 2012

@brianmcd Thanks for the link. If you feel like working on this, of course, it'd be most welcome, but my main goal with this issue is to delineate what needs to be done, not to push too hard for you to do it if you've got other stuff going on. There's not a huge rush to get this finished.

@ssuda
Copy link

ssuda commented Apr 1, 2012

Much of the criticism around node is due to its single threaded nature. Though some problem is solved by "cluster" module, there are many issues around leaks due to gc not getting time to run.
We need to think seriously about multi thread.
When we are thinking different contexts how about "Isolates"?
Can we run gc in separate thread?

@isaacs
Copy link
Author

isaacs commented Apr 1, 2012

Isolates are gone, and there is no plan for them to return. Let's try to keep this issue single-threaded.

@ghost
Copy link

ghost commented Apr 6, 2012

Contextify actually meets most/all of these requirements as is. I was able to meet the above requirements and replicate the vm module (sans Script) using Contextify's .node. The js bindings it has are what introduce the leaking of its functions but that's unnecessary. In this example I've used a WeakMap which ensures proper garbage collection, but it's possible to mostly meet the needs of this using two arrays, or a C++ land makeWeak.

https://gist.github.com/2317834

Caveat: to make vm.dispose work full would require help from C++ land or use of a Proxy. Actually, A lot of these problems become solvable entirely in JS with proxies... All that's need from C++ land is the ability to turn arbitrary objects into the global variable record for a chunk of code. (you can actually do this right now using with and a proxy at a cost of 100x performance).

@ghost
Copy link

ghost commented Apr 6, 2012

@ghost
Copy link

ghost commented Apr 7, 2012

I realized I misread the first requirement and have made an alternate version. In this one the sandboxes global is never actually returned by createContext, just the input sandox (or newly created {}). runInContext is also modified so that it simply takes any object and either creates a context based on it or uses an existing one.

https://gist.github.com/2324128

First off this means vm.dispose(sandbox) works without any help from C++ land. The one reference to the context wrapper with .run is gone. Second, there's no floating references to the context global (or the way to make it) by default.

Mirroring it so that references to the global, such as in vm.runInContext('({ a: this })', sandbox) are translated requires C++ land help (or Proxies). Two or more contexts at once will have references to the containing object so it can't simply be modified in place. It looks like briandmcd's recent work addresses this.

@AndreasMadsen
Copy link
Member

This would properly also fix this issue #706

var vm = require('vm');
var fn = function (check) { return check instanceof RegExp; };
vm.runInNewContext("fn(/hallo/)", {fn: fn});
// false
vm.runInNewContext("/hallo/ instanceof RegExp", {});
// true

@isaacs
Copy link
Author

isaacs commented Dec 30, 2012

Not getting to this in 0.10, unfortunately. Some day :)

@bmeck
Copy link
Member

bmeck commented Jan 20, 2013

@isaacs I would talk to the SES guys, they seem to have this pretty well covered. May need to figure out how to drop the strict mode etc.

@domenic
Copy link

domenic commented Mar 11, 2013

Let's get to this in 0.11!??!! jsdom would love you forever...

@isaacs
Copy link
Author

isaacs commented Mar 12, 2013

@domenic Patches welcome ;)

@domenic
Copy link

domenic commented Mar 12, 2013

Looks like @Benvie had something close to done. @Benvie, want to step in and try to make it a real patch? Otherwise I'll probably try to bumble my way through the C++.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

7 participants