Dynamic output types #60
Hi, I wanted to give this library a spin, but this is blocking me, and input types can have the same problem. The same goes for the output: different shapes and different data types (returning new values to be used for the cache, plus some logits or other heads). I'm not sure whether it's currently feasible on the input side. I've seen https://docs.rs/ndarray/0.14.0/ndarray/type.IxDyn.html, which could be useful for the dimensions, but it does not seem to solve the i32 vs f32 type problem. Thanks for this lib!
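One way to sidestep the i32-vs-f32 problem, independent of this library, is to keep the shape as runtime data (as `IxDyn` does) and wrap the element type in an enum. A minimal sketch under that assumption; the `DynTensor` and `DynData` names are hypothetical and not part of onnxruntime-rs:

```rust
// Hypothetical dynamic tensor: the shape is runtime data (analogous to
// ndarray's IxDyn) and the element type is an enum over the ONNX types
// we care about, so i32 and f32 outputs can share one container.
#[derive(Debug)]
enum DynData {
    F32(Vec<f32>),
    I32(Vec<i32>),
    I64(Vec<i64>),
}

#[derive(Debug)]
struct DynTensor {
    shape: Vec<usize>,
    data: DynData,
}

impl DynTensor {
    // Total number of elements implied by the shape.
    fn len(&self) -> usize {
        self.shape.iter().product()
    }

    // Callers downcast to the concrete element type they expect.
    fn as_f32(&self) -> Option<&[f32]> {
        match &self.data {
            DynData::F32(v) => Some(v),
            _ => None,
        }
    }
}

fn main() {
    let logits = DynTensor {
        shape: vec![2, 3],
        data: DynData::F32(vec![0.1, 0.2, 0.7, 0.3, 0.3, 0.4]),
    };
    assert_eq!(logits.len(), 6);
    assert!(logits.as_f32().is_some());
}
```

The cost of this design is that every consumer must match on the enum (or call a downcast helper) before touching the data, but it lets one `Vec<DynTensor>` carry outputs of mixed types, such as topk's float values alongside integer indices.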
Another very common use case is indices alongside float values (e.g. torch.topk). Are you still interested in working on #65, @marshallpierce?
Sadly I don't have time. I hope someone picks it up though.
Is this something that would work: #69?
This PR is about inputs. If by "something" you mean the reverse, like a generic […]. It's probably a good idea to study the tch-rs API as well.
That is what I meant. For a bit more context on the PR: I was attempting to remove Python entirely for inference on quantized GPT-2 with ORT, in a webserver context. The end result is that we were still winning something like 2x (I remember 3x, but I don't want to overstate something wrong, I don't really remember) over a relatively naïve Python webserver + GPT-2 + quantized ORT.
In practice I'm encountering models with different types on different outputs. As an example of the problem, a trivial TensorFlow model that takes string input and returns the unique elements of the input tensor produces this ONNX structure:
The two outputs are of different types, so the current output-retrieval API, which assumes one type for all outputs, won't work.
One way to go about it would be to add a trait like the equivalent on the input side:
We could provide implementations of that trait for all the common types that map to ONNX types, as on the input side. In the String case the data would have to be copied, as far as I can tell. Output could be in the form of some new dynamic tensor type that exposes, for each output, the `TensorElementDataType`, so that the user can then use an appropriate type with an `OwnedTensorDataToType` that matches.
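A rough sketch of what that could look like, reusing the `TensorElementDataType` and `OwnedTensorDataToType` names from the comment above. Everything here is illustrative (the enum variants, the raw-byte buffer, and the `DynOutput`/`extract` shapes are assumptions), not onnxruntime-rs's real API:

```rust
// Element types an output can carry; mirrors the idea of ONNX's
// TensorElementDataType enum, with only a few illustrative variants.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TensorElementDataType {
    Float,
    Int64,
    String,
}

// A per-output blob whose element type is only known at runtime.
struct DynOutput {
    dtype: TensorElementDataType,
    bytes: Vec<u8>, // raw buffer as handed back by the runtime
}

// Output-side counterpart of the input trait: types that can own data
// copied out of an output whose element type matches theirs.
trait OwnedTensorDataToType: Sized {
    const DTYPE: TensorElementDataType;
    fn from_bytes(bytes: &[u8]) -> Vec<Self>;
}

impl OwnedTensorDataToType for f32 {
    const DTYPE: TensorElementDataType = TensorElementDataType::Float;
    fn from_bytes(bytes: &[u8]) -> Vec<Self> {
        bytes
            .chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect()
    }
}

impl DynOutput {
    // The user inspects `dtype`, then extracts with a type whose
    // DTYPE matches; a mismatch yields None rather than a
    // wrong-type reinterpretation of the buffer.
    fn extract<T: OwnedTensorDataToType>(&self) -> Option<Vec<T>> {
        if self.dtype == T::DTYPE {
            Some(T::from_bytes(&self.bytes))
        } else {
            None
        }
    }
}

fn main() {
    let out = DynOutput {
        dtype: TensorElementDataType::Float,
        bytes: 1.0f32.to_le_bytes().to_vec(),
    };
    assert_eq!(out.extract::<f32>(), Some(vec![1.0]));
}
```

With something like this, the two-output unique-elements model above would come back as a `Vec<DynOutput>` whose elements report `String` and `Int64` respectively, and the caller would pick the extraction type per output instead of once for the whole run.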