-
Notifications
You must be signed in to change notification settings - Fork 67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: allows ZendStr to contain null bytes #202
Conversation
A few off-topic remarks:
WDYT? |
@ptondereau I overlooked some of the |
IMHO, this is a code smell to bring a builder, but I agree that we can have a specific constructor for a not so used context.
It should be done then.
I guess PHP will crash before Rust have the allocation state and this leads to an unresolved error (that I need to create issue) that leak memory of Rust's allocated stuff. I would go for a wrapping with |
81928ad
to
f279440
Compare
f279440
to
a0126cc
Compare
src/types/string.rs
Outdated
pub fn as_str(&self) -> Option<&str> { | ||
self.as_c_str().to_str().ok() | ||
pub fn as_str(&self) -> Result<&str> { | ||
std::str::from_utf8(self.as_bytes()).map_err(|_| Error::InvalidUtf8) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wary that we might have a Schlemiel the Painter problem here.
If I remember correctly, there's a flag somewhere at the C-level to indicate that the string has already been utf8-validated (i.e. by the mbstring or pcre extensions), need to investigate...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This requires adding stuff to the ffi bindings though (the gc flags related stuff in zend_types.h
).
I'm going to leave this off for now and do it in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dors this require mbstring to be installed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dors this require mbstring to be installed
No, the flag is used by mbstring
and PCRE
but it is defined in zend_types.h.
See this PR for historical context. 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO, I think this is a requirement here because It can lead to the same perf regression as in the PHP issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I think so too! I proposed to do this in another PR to keep this one focused (and maximize the chances to get it merged quickly), but if you think it should be done here I'm OK with that.
Do you confirm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this was my intention. Sorry and thank you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shall be done then! 😉
2d3cf78
to
19c67be
Compare
The idea wasn't to bring a builder, just to remove the impl ZendStr {
pub fn new(value: impl AsRef<[u8]>) -> Self {}
pub fn new_persistent(value: impl AsRef<[u8]>) -> Self {}
pub fn new_interned(value: impl AsRef<[u8]>) -> Self {}
} This is more a DX improvement. For example this would allow removing the But this is not needed for this PR, so let's keep things the way they are for now. |
This requires adding symbols to the ffi bindings (see related constants in zend_string.h). |
727b9d3
to
7dfe8ff
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Closes #200
Rationale
In PHP zend_strings are binary strings with no encoding information. They can contain any byte at any position.
The current implementation use
CString
to transfer zend_strings between Rust and PHP, which prevents zend_strings containing null-bytes to roundtrip through the ffi layer. Moreover,ZendStr::new()
accepts only a&str
, which is incorrect since a zend_string is not required to be valid UTF8.When reading the PHP source code, it is apparent that most functions marked with
ZEND_API
that accept aconst *char
are convenience wrappers that convert theconst *char
to a zend_string and delegate to another function. For example zend_throw_exception() takes aconst *char message
, and just converts it to a zend_string before delegating to zend_throw_exception_zstr().I kept this PR focused around
ZendStr
and it's usages in the library, but it should be seen as the first step of a more global effort to remove usages ofCString
everywhere possible.Also, I didn't change the return type of the string related methods of
Zval
(e.g. I could have madeZval::set_string()
accept an
impl AsRef<[u8]>
instead of&str
and return()
instead ofResult<()>
). If I get feedback that it should be done in this PR, I'll do it.Summary of the changes:
ZendStr
ZendStr::new()
andZendStr::new_interned()
now accept animpl AsRef<[u8]>
instead of just&str
, and are therefore infaillible (outside of the cases where we panic, e.g. when allocation fails). This is a BC break, but it's impact shouldn't be huge (users will most likely just have to remove a bunch of?
or add a fewOk()
).ZendStr::as_c_str()
now returns aResult<&CStr>
since it can fail on strings containing null bytes.ZensStr::as_str()
now returns aResult<&str>
instead of anOption<&str>
since we have to return an error in case of invalid UTF8.ZendStr::as_bytes()
to return the underlying byte slice.ZendStr::as_ptr()
andZendStr::as_mut_ptr()
to return raw pointers to the zend_string.ZendStr conversion traits
impl AsRef<[u8]> for ZendStr
impl TryFrom<String> for ZBox<ZendStr>
byimpl From<String> for ZBox<ZendStr>
.impl TryFrom<&str> for ZBox<ZendStr>
byimpl From<&str> for ZBox<ZendStr>
.impl From<&ZendStr> for &CStr
byimpl TryFrom<&ZendStr> for &CStr
.Error
Error::InvalidUtf8
used when converting aZendStr
toString
or&str