PlotlyJSONEncoder always casts values to float64 due to using tolist() #3232

Open
nicolaskruchten opened this issue Jun 8, 2021 · 5 comments
Labels
feature something new P3 backlog

Comments

@nicolaskruchten
Contributor

Regarding NumPy floating-point precision, and the fact that PlotlyJSONEncoder always casts such values to float64 because it uses tolist()...

This had always bugged me, as it resulted in much larger exports (i.e. html / ipynb file sizes) than necessary (when float16 or float32 is sufficient) and affected not only coordinate data, but also marker sizes, meta info, etc.
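To illustrate the inflation described above, here is a minimal sketch using only NumPy and the standard json module (no plotly involved). The variable names are illustrative. It shows how one float32 value grows once tolist() widens it to a 64-bit Python float:

```python
import json
import numpy as np

x32 = np.float32(0.1)           # nearest float32 to 0.1

as_python_float = x32.tolist()  # widened to a 64-bit Python float
via_str = float(str(x32))       # shortest decimal that identifies the float32

print(json.dumps(as_python_float))  # 0.10000000149011612
print(json.dumps(via_str))          # 0.1
```

The widened value is numerically exact, but its shortest float64 representation needs 17 significant digits, which is what ends up in the exported file.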

Just in case the plotly.py devs or others are interested: I had found a way to avoid this number inflation by modifying (& monkey patching) the encode_as_list method:

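# Helpers that plotly's own encoder module relies on.
from _plotly_utils.optional_imports import get_module
from _plotly_utils.utils import NotEncodable

@staticmethod
def encode_as_list_patch(obj):
    """Attempt to use `tolist` method to convert to normal Python list."""
    if hasattr(obj, "tolist"):

        numpy = get_module("numpy")
        try:
            # Group the dtype test: `and` binds tighter than `or`.
            # Restrict to 1-D arrays, since iterating a 2-D array yields rows.
            if isinstance(obj, numpy.ndarray) \
               and obj.dtype in (numpy.float32, numpy.float16) \
               and obj.ndim == 1 and obj.flags.contiguous:
                # Round-trip through str to keep only the digits the dtype needs.
                return [float('%s' % x) for x in obj]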
        except AttributeError:
            raise NotEncodable

        return obj.tolist()
    else:
        raise NotEncodable

It's about 30-50x slower than .tolist() but, being on the order of a few μs, still much faster than the JSON encoding, with the benefit of ~3x smaller exports.
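The size effect can be checked without plotly. The sketch below is an assumption-laden illustration rather than a benchmark: it serializes an arbitrary float32 test array with the standard json module, once via tolist() and once via the string round trip used in the patch. The actual ratio depends on the data.

```python
import json
import numpy as np

# Arbitrary float32 test data; real traces will behave differently.
arr = np.linspace(0.0, 1.0, 1000, dtype=np.float32)

full = json.dumps(arr.tolist())                   # float64 digits
short = json.dumps([float(str(x)) for x in arr])  # float32 digits only

print(len(full), len(short))

# Check whether casting the shortened values back to float32 restores the array.
restored = np.array(json.loads(short), dtype=np.float32)
print(np.array_equal(restored, arr))
```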

I always wanted to report this, and this PR revived the topic. Could this be relevant for a new issue (especially since orjson will not become the default)?

FYI: for reference, a quick search revealed that a patch of encode_as_list was already suggested before: #1842 (comment), in the context of treating inf & NaN, which got brought up again in #2880 (comment).

Originally posted by @mherrmann3 in #2955 (comment)

@RRiva

RRiva commented Sep 28, 2022

Hi @nicolaskruchten I just wanted to thank you for this nice solution 🙂 My animations went from the original 1500 KB to 900 KB in single precision and 700 KB in half precision, without a visible loss in quality. It would be very nice to have this code merged. The best way to do it is of course open for discussion. On the one hand, it is natural to respect the array type, but on the other I wonder how many users will take advantage of it. An alternative is to add a keyword precision to write_html(), and do the casting/rounding internally. What do you think about it?
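As a rough sketch of what rounding internally could look like, the helper below rounds values to a fixed number of significant digits before serialization. The name round_significant and its digits parameter are hypothetical. No precision keyword exists in write_html() as far as this thread shows.

```python
import json

def round_significant(values, digits):
    """Hypothetical helper: round each value to `digits` significant digits."""
    # The "g" format keeps `digits` significant digits; float() turns the
    # text back into a number whose shortest repr is that short decimal.
    return [float(f"{v:.{digits}g}") for v in values]

data = [0.10000000149011612, 1234.56789, 2.5e-07]
print(json.dumps(data))
print(json.dumps(round_significant(data, 4)))
```

Unlike the dtype-based patch, this approach is lossy by design, so the number of digits would have to be chosen by the user.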

For future reference, here is how to apply the monkey patch.

import importlib
import numpy as np

mod_plty = importlib.import_module('_plotly_utils.utils', 'plotly')

# Code from above.
@staticmethod
def encode_as_list_patch(obj):
    """Attempt to use `tolist` method to convert to normal Python list."""
    if hasattr(obj, "tolist"):
        try:
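            # Group the dtype test: `and` binds tighter than `or`.
            # Restrict to 1-D arrays, since iterating a 2-D array yields rows.
            if isinstance(obj, np.ndarray) \
               and obj.dtype in (np.float32, np.float16) \
               and obj.ndim == 1 and obj.flags.contiguous:
                return [float('%s' % x) for x in obj]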
        except AttributeError:
            raise mod_plty.NotEncodable

        return obj.tolist()
    else:
        raise mod_plty.NotEncodable


mod_plty.PlotlyJSONEncoder.encode_as_list = encode_as_list_patch


# Convert the numpy array to single precision.
arr_single = arr.astype(np.float32)

# Or half precision.
arr_half = arr.astype(np.float16)

# Afterwards, call write_html() as always.

@gvwilson
Contributor

gvwilson commented Jul 5, 2024

Hi - we are trying to tidy up the stale issues and PRs in Plotly's public repositories so that we can focus on things that are still important to our community. Since this one has been sitting for a while, I'm going to close it; if it is still a concern, please add a comment letting us know what recent version of our software you've checked it with so that I can reopen it and add it to our backlog. Alternatively, if it's a request for tech support, please post in our community forum. Thank you - @gvwilson

gvwilson closed this as completed Jul 5, 2024
@RRiva

RRiva commented Jul 8, 2024

Hi @gvwilson, if I understand correctly, the float precision is handled correctly when the orjson engine is selected. Unfortunately, only plotly.io.write_json() accepts the engine argument, while plotly.io.write_html() doesn't. How can I specify this engine when writing an HTML file?

Thanks 🙂

@gvwilson
Contributor

gvwilson commented Jul 8, 2024

Hi @RRiva - I don't have an answer for you right now, but I'll reopen this and add it to our backlog and try to find one for you. Cheers - @gvwilson

@RRiva

RRiva commented Jul 8, 2024

Thanks so much 😄

gvwilson added the P3 backlog label Aug 12, 2024
gvwilson changed the title from "Regarding the numpy floating point precision and that PlotlyJSONEncoder always casts those to float64 due to using tolist()..." to "PlotlyJSONEncoder always casts values to float64 due to using tolist()" Aug 12, 2024
gvwilson added the feature something new label Aug 12, 2024