New string parsing is very gas heavy #16

Closed
ethanfrey opened this issue May 7, 2020 · 4 comments
@ethanfrey
Member

Just adding the checks for special characters increases gas costs for cosmwasm contracts by about 30% (see comments on CosmWasm/cosmwasm#314).

It should be possible to reduce the cost back to roughly the previous level when no special chars are involved. Start with an optimistic parser and fall back to the current safe parser if needed.

Optimistic check:

From the beginning of the string, go through all chars and increment until we hit a \ or ". If \, then we start the safe parsing. If we hit " first, we found the end and can just copy that range into a String. The fast case should be around the same cost as the previous implementation, maybe a bit more for checking for two chars at each step rather than one, but probably less than now.
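A minimal sketch of that fast path (hypothetical function names; `parse_str_safe` stands in for the existing escape-aware parser):

```rust
/// Optimistic fast path: `input` is the bytes after the opening quote.
/// Scan until `"` (end found, copy the raw range) or `\` (fall back).
fn parse_str_optimistic(input: &[u8]) -> Option<String> {
    for (i, &b) in input.iter().enumerate() {
        match b {
            // End of string with no escapes seen: copy the range directly.
            b'"' => return std::str::from_utf8(&input[..i]).ok().map(String::from),
            // Escape sequence found: hand over to the safe parser.
            b'\\' => return parse_str_safe(input),
            _ => {}
        }
    }
    None // unterminated string
}

/// Placeholder for the existing escape-aware parser.
fn parse_str_safe(_input: &[u8]) -> Option<String> {
    unimplemented!()
}
```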

Optimistic encoding:

If all chars are in the range 0x20-0x7f and exclude \ and ", we can do the straight copy. With any other char, we can use the safe method.
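A sketch of the corresponding encoder (hypothetical; escape handling reduced to what JSON requires, with control chars \u-escaped):

```rust
/// Append `s` to `out` as a JSON string. Fast path: if every byte is in
/// 0x20-0x7f and is neither `"` nor `\`, copy the bytes straight through.
fn encode_json_str(s: &str, out: &mut Vec<u8>) {
    out.push(b'"');
    let fast = s
        .bytes()
        .all(|b| (0x20..=0x7f).contains(&b) && b != b'"' && b != b'\\');
    if fast {
        out.extend_from_slice(s.as_bytes()); // straight copy
    } else {
        for c in s.chars() {
            match c {
                '"' => out.extend_from_slice(b"\\\""),
                '\\' => out.extend_from_slice(b"\\\\"),
                c if (c as u32) < 0x20 => {
                    // Control characters must be \u-escaped in JSON.
                    out.extend_from_slice(format!("\\u{:04x}", c as u32).as_bytes());
                }
                c => {
                    let mut buf = [0u8; 4];
                    out.extend_from_slice(c.encode_utf8(&mut buf).as_bytes());
                }
            }
        }
    }
    out.push(b'"');
}
```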

@webmaster128
Member

webmaster128 commented May 8, 2020

There are a bunch of optimizations in serde_json that we can look into. One example is the parse_str return type

pub enum Reference<'b, 'c, T>
where
    T: ?Sized + 'static,
{
    Borrowed(&'b T),
    Copied(&'c T),
}

that avoids copies as long as no unescaping is required. This is then visited like

        match tri!(self.de.read.parse_str(&mut self.de.scratch)) {
            Reference::Borrowed(s) => visitor.visit_borrowed_str(s),
            Reference::Copied(s) => visitor.visit_str(s),
        }

In order to optimize this, it would be very good to know where our visitor is implemented and how calls to visit_string, visit_str and visit_borrowed_str can be utilized. @ethanfrey do you have a clue where that visitor is? Is this auto-derived code we never see?

And there are a few things that serde-json-core established which are not necessarily what we want, like:

// NOTE(serialize_*signed) This is basically the numtoa implementation minus the lookup tables,
// which take 200+ bytes of ROM / Flash
macro_rules! serialize_unsigned {
    ($self:ident, $N:expr, $v:expr) => {{
        let mut buf = [0u8; $N];

@ethanfrey
Member Author

Okay, I was thinking of trivial optimizations to get back where we were before the changes.

Before we touch these deeper optimizations, I would like to take the time to decide if we stick with JSON into the 1.0 release or use a different codec (not worth putting much more time into JSON if we toss it out by 1.0 anyway)

@webmaster128
Member

> Okay, I was thinking of trivial optimizations to get back where we were before the changes.

This is an impossible goal. JSON serialization/deserialization is expensive and there is no way around it. E.g. the whole idea of zero-copy string deserialization in serde-json-core does not respect the nature of JSON.

Even serde_json makes questionable API design decisions: it allows deserializing into &str (pointing into the JSON source), which works for some input data but then suddenly breaks at runtime for other input data.

@maurolacy

Closed by #25.
