Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PHP: Make proc_open work without requiring an explicit sleep() #951

Closed
adamziel opened this issue Jan 17, 2024 · 5 comments
Closed

PHP: Make proc_open work without requiring an explicit sleep() #951

adamziel opened this issue Jan 17, 2024 · 5 comments
Labels

Comments

@adamziel
Copy link
Collaborator

adamziel commented Jan 17, 2024

Problem

Playground supports proc_open() through a custom function called js_open_process() added in #596. Instead of deferring to OS process opening functions, like PHP does by default, it calls a user-defined callback that spawns a new process and returns a ChildProcess object.

js_open_process then writes any stdin data using cp.stdin.write(dataStr); and captures the output using cp.stdout.on('data', function (data) {. The stdout event listener is asynchronous and may receive data after 1ms or 100ms – we don't really know.

What does native PHP do?

Let's consider the following script:

<?php
$descriptorspec = array(
    0 => array("pipe", "r"),  // stdin
    1 => array("pipe", "w"),  // stdout
    2 => array("pipe", "w")   // stderr
);
$process = proc_open('less', $descriptorspec, $pipes);

if (is_resource($process)) {
    fwrite($pipes[0], "Hello world!\n");
    fclose($pipes[0]);

    echo stream_get_contents($pipes[1]);
    fclose($pipes[1]);

    proc_close($process);
}

It will output "Hello world!\n". However, there are some nuances.

  • stream_get_contents expects the input pipe to close. If we comment fclose($pipes[0]); in that script above, it will hand indefinitely. I'm not sure where that behavior comes from, but both _php_stream_fill_read_buffer and php_stream_eof sound like interesting candidates.
  • fread() reads whatever is available and returns. If we replace stream_get_contents() with fread($pipes[1], 1024); and comment the fclose() call, the script will finish and output Hello world!\n

If I adjust the proc_open() call to be proc_open('sleep 1; less'), then the fread() call takes more than 1 second which tells me something in it is blocking after all.

Similarly:

proc_open('sleep 1; echo "Hi"; sleep 1; echo "There";', /* ...args */);

// This call takes around 1s and outputs "Hi":
echo fread($pipes[1], 5);

// This call takes around 1s and outputs "There":
echo fread($pipes[1], 5);

Which makes it seem like fread waits for any output, regardless of the buffer size. In this case, we could yield back in _php_stream_read() and until the process fd has some data available.

If, however, I call stream_set_blocking($pipes[1], 0);, then the fread() call returns instantly. For plain streams, that call is translated to flags |= O_NONBLOCK; fcntl(fd, F_SETFL, flags). Cool! We're getting somewhere!

Here's some more resources:

https://bugs.php.net/bug.php?id=47918

I have encountered a number of applications that will cause PHP hang on
fread() until the process closes (regardless of whether or not the buffer has filled).
I have to disappoint here, the anonymous pipes are plain file descriptors

Possible Solutions

A proper fix would yield back to the event loop in the same place where PHP waits for data.

Idea 1 – Async-compatible libc implementation

Playground currently patches PHP with custom implementations of functions like select(2) in an attempt to make the synchronous C code work in an asynchronous JavaScript runtime. These patches target PHP whereas in reality their goal is to replace blocking syscalls with async-compatible syscalls. Instead of PHP, we should be patching the syscalls library.

To illustrate the issue, here's the wasm_select implementation:

EMSCRIPTEN_KEEPALIVE int wasm_select(int max_fd, fd_set * read_fds, fd_set * write_fds, fd_set * except_fds, struct timeval * timeouttv) {
	emscripten_sleep(0); // always yield to JS event loop
	int timeoutms = php_tvtoto(timeouttv);
	int n = 0;
	for (int i = 0; i < max_fd; i++)
	{
		if (FD_ISSET(i, read_fds)) {
			n += wasm_poll_socket(i, POLLIN | POLLOUT, timeoutms);
		} else if (FD_ISSET(i, write_fds)) {
			n += wasm_poll_socket(i, POLLOUT, timeoutms);
		}
	}
	return n;
}

If the relevant syscalls knew how to wait for asynchronous events, we wouldn't need that at all.

Here's the Emscripten-provided library_syscall.js file that handles syscalls:

https://github.com/emscripten-core/emscripten/blob/29b0eaacfda55c5c034453a880609cca91f39d8e/src/library_syscall.js

Functions like __syscall__newselect or __syscall_poll could return Asyncify.sleep() whenever required, handle the asynchronous data flow, and call wakeUp() whenever a regular OS would.

Idea 2 – Handle EWOULDBLOCK in _fd_read

I patched _fd_read as follows:

function _fd_read(fd, iov, iovcnt, pnum) {
	try {
		var stream = SYSCALLS.getStreamFromFD(fd);
		//  console.log({ stream });
		var num = doReadv(stream, iov, iovcnt);
		HEAPU32[pnum >> 2] = num;
		return 0;
	} catch (e) {
		console.error(e);
		console.trace();
		if (typeof FS == 'undefined' || !(e.name === 'ErrnoError')) throw e;
		return e.errno;
	}
}

And got an error with errno=6. It seems like in Emscripten that means EWOULDBLOCK, which makes sense. I found this related Emscripten issue:

pipe() doesn't create a blocking pipe

It seems like we could detect that error and return Asyncify.sleep() instead.

@adamziel
Copy link
Collaborator Author

adamziel commented Jan 17, 2024

I investigated a bit more and added console.trace() to each syscall in php_8_2.js and ran the following script:

<?php

// proc_open `less` and pipe some dummy input to it
$descriptorspec = array(
    0 => array("pipe", "r"),  // stdin
    1 => array("pipe", "w"),  // stdout
    2 => array("pipe", "w")   // stderr
);
$process = proc_open('less', $descriptorspec, $pipes);
if (is_resource($process)) {
    echo "Pipe 0";
    fwrite($pipes[0], "Hello world!\n");
    fclose($pipes[0]);

    echo "Pipe 1";
    echo fread($pipes[1], 1024);
    fclose($pipes[1]);

    // echo "Pipe 2";
    // echo stream_get_contents($pipes[2]);
    // fclose($pipes[2]);

    echo "close";
    proc_close($process);
}

What I found is that patching the syscalls library with Asyncify calls wouldn't help at all. It seems like _fd_read and _fd_write are the culprit here. They seem to be Emscripten versions of:

write(2) manual says:

POSIX requires that a read(2) that can be proved to occur after a
write() has returned will return the new data. Note that not all
filesystems are POSIX conforming.

At the same time, Node.js writable docs says that it...

writable.write(chunk[, encoding][, callback])#
Returns: false if the stream wishes for the calling code to wait for the 'drain' event to be emitted before continuing to write additional data; otherwise true.

I checked, and that write call returns true in our case, though, so it doesn't seem like there's much more we can do here besides somewhat blindly yielding back to the event loop after each _fd_write call. That sounds a bit over the top because:

  • I've seen Emscripten calling write() one byte at a time. I don't know if that happens universally or even often, but that's what I've seen when shimming with proc_open().
  • Asyncifying _fd_write would mean we'd need to add a whole lot of functions to the ASYNCIFY_ONLY list, as write() is ubiquitious and can be called from anywhere. However, we're only really interested here in asyncifying PHP stream handlers.

Perhaps a better choice would be yielding after each php_stream_write call? Which brings us back to square one – patching PHP instead of using an Async-first version of libc .

This was referenced Jan 17, 2024
@adamziel adamziel changed the title PHP: Wrap syscall handlers in Asyncify PHP: Ship Async-compatible libc to make calls like proc_open() just work without wondering where and how to yield back to the event loop Jan 17, 2024
@adamziel adamziel changed the title PHP: Ship Async-compatible libc to make calls like proc_open() just work without wondering where and how to yield back to the event loop PHP: Ship Async-compatible libc to make calls like proc_open() just work Jan 17, 2024
@adamziel adamziel changed the title PHP: Ship Async-compatible libc to make calls like proc_open() just work PHP: Make proc_open work without requiring an explicit sleep() Jan 17, 2024
adamziel added a commit that referenced this issue Jan 17, 2024
Shims read(2) functionallity by providing an alternative read()
function called wasm_read() that gracefully handles blocking
pipes. This is neecessary because Emscripten does not support
blocking pipes and instead returns EWOULDBLOCK.

See:

* #951
* emscripten-core/emscripten#13214

 ## Testing instructions

Confirm the `proc_open()` tests pass on CI. This PR adjusts a few
of them to make sure the output is read without the tricky sleep()
call.
adamziel added a commit that referenced this issue Jan 18, 2024
Shims read(2) functionallity by providing an alternative read()
function called wasm_read() that gracefully handles blocking
pipes. This is neecessary because Emscripten returns EWOULDBLOCK
when reading from a blocking pipe, whereas a clang-compiled 
program would wait until data becomes available.

See:

* #951
* emscripten-core/emscripten#13214

## Other changes

This PR also ships:

* A regexp fix for `@php-wasm/cli` to preserve a trailing whitespace
when rewrite PHP-related spawn shell commands.
* A fix to correctly propagate the child process exit code back to PHP.
The WordPress bootstrap script for PHPUnit needs that to work correctly.

## Motivation

### wp-cli

Without this PR, running `wp-cli.phar` doesn't output anything. The
output is piped through a pager like `less` using `proc_open` and then
displayed by reading from a blocking pipe.

With this PR, running `wp-cli.phar` returns its regular help message as
the pager pipe can be read:

```
NAME
  wp
DESCRIPTION
  Manage WordPress through the command-line.
SYNOPSIS
  wp <command>
SUBCOMMANDS
  cache                 Adds, removes, fetches, and flushes the WP Object Cache
                        object.
  cap                   Adds, removes, and lists capabilities of a user role.
  cli                   Reviews current WP-CLI info, checks for updates, or
                        views defined aliases.
...
```

### PHPUnit

With this PR, Playground can run PHPunit on `wordpress-develop`!

```
TMPDIR=/tmp PHP=8.2 node --loader ../plugins/playground/packages/nx-extensions/src/executors/built-script/loader.mjs ../plugins/playground/dist/packages/php-wasm/cli/main.js ./vendor/bin/phpunit --no-coverage -v
(node:87723) ExperimentalWarning: Custom ESM Loaders is an experimental feature. This feature could change at any time
(Use `node --trace-warnings ...` to show where the warning was created)
Installing...
aaabb
int(0)
Running as single site... To run multisite, use -c tests/phpunit/multisite.xml
Not running ajax tests. To execute these, use --group ajax.
Not running ms-files tests. To execute these, use --group ms-files.
Not running external-http tests. To execute these, use --group external-http.
PHPUnit 9.6.15 by Sebastian Bergmann and contributors.

Runtime:       PHP 8.2.10-dev
Configuration: /Users/cloudnik/www/Automattic/core/wordpress-develop/phpunit.xml.dist
Warning:       Your XML configuration validates against a deprecated schema.
Suggestion:    Migrate your XML configuration using "--migrate-configuration"!

...........................................................    59 / 17622 (  0%)
...........................................................   118 / 17622 (  0%)
............................FFFFFF.........................   177 / 17622 (  1%)
...........................................................   236 / 17622 (  1%)
```

## Testing instructions

Confirm the `proc_open()` tests pass on CI. This PR adjusts a few
of them to make sure the output is read without the tricky sleep()
call.

Related:

* #827
* #930

CC @mho22
@adamziel
Copy link
Collaborator Author

Fixed in #931

@mho22
Copy link
Contributor

mho22 commented Jan 18, 2024

@adamziel I am actually trying php-wasm/cli : 0.5.5 and I have a simple file :

proc.php

<?php

$command = 'ls';

$descriptorspec = [
    0 => ['pipe', 'r'],
    1 => ['pipe', 'w'],
    2 => ['pipe', 'w']
];

$process = proc_open( $command, $descriptorspec, $pipes );

var_dump( proc_get_status( $process ) );

var_dump( stream_get_contents( $pipes[ 1 ] ) );

Here are the results when I use php and php-wasm/cli :

wasm [main] ⚡ php proc.php
array(8) {
  ["command"]=>
  string(2) "ls"
  ["pid"]=>
  int(32507)
  ["running"]=>
  bool(true)
  ["signaled"]=>
  bool(false)
  ["stopped"]=>
  bool(false)
  ["exitcode"]=>
  int(-1)
  ["termsig"]=>
  int(0)
  ["stopsig"]=>
  int(0)
}
string(52) "node_modules
package-lock.json
package.json
proc.php
"
wasm [main] ⚡ node_modules/.bin/cli proc.php
array(8) {
  ["command"]=>
  string(2) "ls"
  ["pid"]=>
  int(4203808)
  ["running"]=>
  bool(false)
  ["signaled"]=>
  bool(false)
  ["stopped"]=>
  bool(false)
  ["exitcode"]=>
  int(-1)
  ["termsig"]=>
  int(0)
  ["stopsig"]=>
  int(0)
}
string(52) "node_modules
package-lock.json
package.json
proc.php
"

First of all, congratulations, there is no more need for sleep(1) in this case!

What I currently see, is the running status as true on php and false on php-wasm. It doesn't look like a problem when you get the return value, but if I call something else like this :

$command = [
    '/usr/local/opt/ffmpeg@4/bin/ffmpeg',
    '-i',
    '/Users/MHO/Work/Sandbox/Web/php/wasm/audio.wav',
    '-acodec',
    'libmp3lame',
    '/Users/MHO/Work/Sandbox/Web/php/wasm/audio.mp3'
];

[ converting a wav file in an mp3 file ]

php does this correctly while php-wasm does not really execute the command.

Any idea where I could find a beginning of a solution to improve this ?

Maybe should I create another issue ?

@adamziel
Copy link
Collaborator Author

Thank you for reporting this! I fixed proc_get_status() in #957. As for the $command issue, do you mean this doesn't work with your other patch from #944?

@mho22
Copy link
Contributor

mho22 commented Jan 19, 2024

@adamziel Yes you're right. I was so into my PR that I forgot I was using the array style to try the command. Your comment made me doubt and it works in a string. Sorry about that... Thank you for your quick answers and don't hesitate to give me some tips when you have time next week about the dynamic pointer generation as mentionned in the last comment of my #944 patch. But I hope the idea I have will fix the PR this week-end!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants