Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replacement API for Deserializer in-band Signalling #1463

Open
mitsuhiko opened this issue Jan 25, 2019 · 3 comments
Open

Replacement API for Deserializer in-band Signalling #1463

mitsuhiko opened this issue Jan 25, 2019 · 3 comments

Comments

@mitsuhiko
Copy link
Contributor

There are quite a few features currently in serde implementations where in-band signalling is used to communicate data from the deserializer into deserialize. This is turning into quite a big problem over time because it destroys the interoperability of serde in various ways.

A good example is the arbitrary_precision flag in serde_json. Since serde has no notion of a large integer the deserializer will deserialize numbers internally by emitting an object with a special $serde_json::private::Number key which is understood by serde_json itself. This however means that if someone tries to use this with other value types (like many libraries have) it will fail.

We use two value types in our own code: we have a fork of serde's internal Content type as well as a variation of serde_json's Value type that carries additional annotations. Currently if we turn on arbitrary_precision as feature we break everything because now any number is emitted as $serde_json::private::Number which none of our deserializers support. However we also feel very uncomfortable turning on this feature because even if we fix our own stuff is not now has a chance to break more.

I understand that a correct fix for this is tricky but I do wonder if it makes sense to start extending the underlying serde type system to carry some extension data around because I think this is going to eventually bite us much harder. I'm not entirely what the best solution is but I think extending the list of serde types to carry some internal extension value is most likely going to play a role in it. Similarly to how u128 was retrofitted we could add a new Signal value that formats can add extra information to.

Still not sure how to best make interoperability work best. Most likely deserializers should be encouraged to invoke some sort of default behavior when they find something like that.

@mitsuhiko
Copy link
Contributor Author

I keep running into this issue and I wonder if there is not something that can be done here. Maybe all that's needed is to add a marker type into the serde data model that can only be passed around but is never serialized / deserialized directly.

@cmpute
Copy link

cmpute commented Nov 17, 2022

I vote for this change. I just notice that there is no way to support serialize and deserialize big integers in serde_yaml and serde_toml. This has been the biggest complaint I have with the serde framework.

@melody-rs
Copy link

Using min_specialization, it's possible to extend serde's visitor trait! Unfortunately it's locked behind nightly.

#![feature(min_specialization)]

use serde;

use serde::de::Error as SerdeError;
use serde::de::{Unexpected, Visitor, Deserialize, Deserializer};
use serde::forward_to_deserialize_any;

#[derive(Debug)]
struct CustomError(String);

impl SerdeError for CustomError {
    fn custom<T>(msg: T) -> Self
    where
        T: std::fmt::Display,
    {
        Self(msg.to_string())
    }
}

impl std::fmt::Display for CustomError {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        f.write_str(&self.0)
    }
}

impl std::error::Error for CustomError {}

struct CustomDeserializer;

impl<'de> Deserializer<'de> for CustomDeserializer {
    type Error = CustomError;
    
    fn deserialize_any<V>(self, visitor: V) -> Result<V::Value, Self::Error> 
    where V: VisitorExt<'de>
    {
        visitor.visit_custom("hello world!", &[1, 2, 3, 4])
    }
    
    forward_to_deserialize_any! {
        bool i8 i16 i32 i64 i128 u8 u16 u32 u64 u128 f32 f64 char str string
        bytes byte_buf option unit unit_struct newtype_struct seq tuple
        tuple_struct map struct enum identifier ignored_any
    }
}

pub trait VisitorExt<'de>: Visitor<'de> {
    fn visit_custom<E>(self, class: &'de str, data: &'de [u8]) -> Result<Self::Value, E>
    where
        E: SerdeError;
}

impl<'de, T> VisitorExt<'de> for T
where
    T: Visitor<'de>,
{
    default fn visit_custom<E>(self, _class: &'de str, _data: &'de [u8]) -> Result<Self::Value, E>
    where
        E: SerdeError
    {
        Err(SerdeError::invalid_type(
            Unexpected::Other("custom_type"),
            &self,
        ))
    }
}

#[derive(Debug)]
#[allow(dead_code)]
struct CustomDeserialize {
    class: String,
    data: Vec<u8>
}

impl<'de> Deserialize<'de> for CustomDeserialize {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error> 
    where D: Deserializer<'de>
    {
        struct CustomDeserializeVisitor;
        
        impl<'de> Visitor<'de> for CustomDeserializeVisitor {
            type Value = CustomDeserialize;
            
            fn expecting(&self, fmt: &mut std::fmt::Formatter<'_>) -> Result<(), std::fmt::Error> {
                fmt.write_str("custom")
            }
        }
        
        impl<'de> VisitorExt<'de> for CustomDeserializeVisitor {
            fn visit_custom<E>(self, class: &'de str, data: &'de [u8]) -> Result<Self::Value, E>
            where
            E: SerdeError {
                Ok(CustomDeserialize {
                    class: class.to_string(),
                    data: data.to_vec()
                })
            }
        }
    
        deserializer.deserialize_any(CustomDeserializeVisitor)
    }
}

fn main() {
    let custom = CustomDeserialize::deserialize(CustomDeserializer).unwrap();
    
    println!("{custom:#?}")
}

When min_specialization is stabilized it might be a good fix to this problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

4 participants