Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZendStr should accept strings containing null bytes. #200

Closed
ju1ius opened this issue Nov 26, 2022 · 1 comment · Fixed by #202
Closed

ZendStr should accept strings containing null bytes. #200

ju1ius opened this issue Nov 26, 2022 · 1 comment · Fixed by #202

Comments

@ju1ius
Copy link
Contributor

ju1ius commented Nov 26, 2022

Hi,

PHP strings are allowed to contain null bytes, but the current implementation of ZendStr doesn't allow it.

#[php_function]
pub fn class_name_of(obj: &ZendObject) -> String {
    obj.get_class_name().unwrap()
}
<?php

var_dump(class_name_of(new class {}));

Expected result:

string(%d) "class@anonymous/path/to/file.php:3$0"
// or, with XDebug enabled:
string(%d) "class@anonymous\000/path/to/file.php:3$0"

Actual result:

PHP Fatal error: Uncaught Exception: String given contains NUL-bytes which cannot be present in a C string. [...]
@ju1ius
Copy link
Contributor Author

ju1ius commented Nov 26, 2022

In ZendStr::new() the &str argument is first converted to a CString before being passed to zend_string_init, but this is incorrect. zend_strings must be able to contain null bytes.

References:

@ju1ius ju1ius changed the title Uncaught Exception: String given contains NUL-bytes which cannot be present in a C string ZendStr should accept string containing null bytes. Nov 26, 2022
@ju1ius ju1ius changed the title ZendStr should accept string containing null bytes. ZendStr should accept strings containing null bytes. Nov 26, 2022
ju1ius added a commit to ju1ius/ext-php-rs that referenced this issue Nov 26, 2022
ju1ius added a commit to ju1ius/ext-php-rs that referenced this issue Nov 26, 2022
ju1ius added a commit to ju1ius/ext-php-rs that referenced this issue Nov 26, 2022
ju1ius added a commit to ju1ius/ext-php-rs that referenced this issue Nov 26, 2022
ju1ius added a commit to ju1ius/ext-php-rs that referenced this issue Nov 27, 2022
ju1ius added a commit to ju1ius/ext-php-rs that referenced this issue Nov 27, 2022
ju1ius added a commit to ju1ius/ext-php-rs that referenced this issue Nov 28, 2022
ptondereau pushed a commit that referenced this issue Dec 9, 2022
Closes #200

## Rationale
In PHP zend_strings are binary strings with no encoding information. They can contain any byte at any position.
The current implementation use `CString` to transfer zend_strings between Rust and PHP, which prevents zend_strings containing null-bytes to roundtrip through the ffi layer. Moreover, `ZendStr::new()` accepts only a `&str`, which is incorrect since a zend_string is not required to be valid UTF8.

When reading the PHP source code, it is apparent that  most functions marked with `ZEND_API` that accept a `const *char` are convenience wrappers that convert the `const *char` to a zend_string and delegate to another function. For example [zend_throw_exception()](https://github.com/php/php-src/blob/eb83e0206c9b9261d786943bf2c5ad61dca287e2/Zend/zend_exceptions.c#L823) takes a `const *char message`, and just converts it to a zend_string before delegating to [zend_throw_exception_zstr()](https://github.com/php/php-src/blob/eb83e0206c9b9261d786943bf2c5ad61dca287e2/Zend/zend_exceptions.c#L795).

I kept this PR focused around `ZendStr` and it's usages in the library, but it should be seen as the first step of a more global effort to remove usages of `CString` everywhere possible.

Also, I didn't change the return type of the string related methods of `Zval` (e.g. I could have made `Zval::set_string()` 
 accept an `impl AsRef<[u8]>` instead of `&str` and return `()` instead of `Result<()>`). If I get feedback that it should be done in this PR, I'll do it.

## Summary of the changes:
### ZendStr
* [BC break]: `ZendStr::new()` and `ZendStr::new_interned()` now accept an `impl AsRef<[u8]>` instead of just `&str`, and are therefore infaillible (outside of the cases where we panic, e.g. when allocation fails). This is a BC break, but it's impact shouldn't be huge (users will most likely just have to remove a bunch of `?` or add a few `Ok()`).
* [BC break]: Conversely, `ZendStr::as_c_str()` now returns a `Result<&CStr>` since it can fail on strings containing null bytes.
* [BC break]: `ZensStr::as_str()` now returns a `Result<&str>` instead of an `Option<&str>` since we have to return an error in case of invalid UTF8.
* adds method `ZendStr::as_bytes()` to return the underlying byte slice.
* adds convenience methods `ZendStr::as_ptr()` and `ZendStr::as_mut_ptr()` to return raw pointers to the zend_string.

 ### ZendStr conversion traits
* adds `impl AsRef<[u8]> for ZendStr`
* [BC break]: replaces `impl TryFrom<String> for ZBox<ZendStr>` by `impl From<String> for ZBox<ZendStr>`.
* [BC break]: replaces `impl TryFrom<&str> for ZBox<ZendStr>` by `impl From<&str> for ZBox<ZendStr>`.
* [BC break]: replaces `impl From<&ZendStr> for &CStr` by `impl TryFrom<&ZendStr> for &CStr`.

### Error
* adds new enum member `Error::InvalidUtf8` used when converting a `ZendStr` to `String` or `&str`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant