Dynamic output types #60
Hi, I wanted to give this library a spin, but this is blocking me, and input types can have the same problem. The same goes for the output: different shapes and different data types (returning new values to be used for the cache, plus some logits or other heads). I'm not sure whether it's currently feasible on the input side. I've seen https://docs.rs/ndarray/0.14.0/ndarray/type.IxDyn.html, which could be useful for the dimensions, but it does not seem to solve the i32 vs f32 type problem. Thanks for this lib!
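One way to sidestep the i32-vs-f32 problem, independent of this library, is to keep the shape as runtime data (as `IxDyn` does) and wrap the element type in an enum. A minimal sketch under that assumption; the `DynTensor` and `DynData` names are hypothetical and not part of onnxruntime-rs:

```rust
// Hypothetical dynamic tensor: the shape is runtime data (analogous to
// ndarray's IxDyn) and the element type is an enum over the ONNX types
// we care about, so i32 and f32 outputs can share one container.
#[derive(Debug)]
enum DynData {
    F32(Vec<f32>),
    I32(Vec<i32>),
    I64(Vec<i64>),
}

#[derive(Debug)]
struct DynTensor {
    shape: Vec<usize>,
    data: DynData,
}

impl DynTensor {
    // Total number of elements implied by the shape.
    fn len(&self) -> usize {
        self.shape.iter().product()
    }

    // Callers downcast to the concrete element type they expect.
    fn as_f32(&self) -> Option<&[f32]> {
        match &self.data {
            DynData::F32(v) => Some(v),
            _ => None,
        }
    }
}

fn main() {
    let logits = DynTensor {
        shape: vec![2, 3],
        data: DynData::F32(vec![0.1, 0.2, 0.7, 0.3, 0.3, 0.4]),
    };
    assert_eq!(logits.len(), 6);
    assert!(logits.as_f32().is_some());
}
```

The cost of this design is that every consumer must match on the enum (or call a downcast helper) before touching the data, but it lets one `Vec<DynTensor>` carry outputs of mixed types, such as topk's float values alongside integer indices.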
Another very common use case is indices alongside float values (e.g. torch.topk). Are you still interested in working on #65, @marshallpierce?
Sadly I don't have time. I hope someone picks it up though.
Is this something that would work: #69?
This PR is about inputs. If by "something" you mean the reverse, like a generic […]. It's probably a good idea to study the tch-rs API as well.
That is what I meant. For a bit more context on the PR: I was attempting to remove Python entirely for inference on quantized GPT-2 with ORT, in a webserver context. The end result is that we were still winning something like 2x (I remember 3x, but I don't want to overstate something wrong, I don't really remember) over a relatively naïve Python webserver + GPT-2 + quantized ORT.
In practice I'm encountering models with different types on different outputs. As an example of the problem, a trivial TensorFlow model that takes string input and returns the unique elements of the input tensor produces this ONNX structure:
The two outputs are of different types, so the current output-retrieval API, which assumes one type for all outputs, won't work.
One way to go about it would be to add a trait like the equivalent on the input side:
We could provide implementations of that trait for all the common types that map to ONNX types, as on the input side. In the String case the data would have to be copied, as far as I can tell. Output could be in the form of some new dynamic tensor type that exposes, for each output, the `TensorElementDataType`, so that the user can then use an appropriate type with an `OwnedTensorDataToType` that matches.
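A rough sketch of what that could look like, reusing the `TensorElementDataType` and `OwnedTensorDataToType` names from the comment above. Everything here is illustrative (the enum variants, the raw-byte buffer, and the `DynOutput`/`extract` shapes are assumptions), not onnxruntime-rs's real API:

```rust
// Element types an output can carry; mirrors the idea of ONNX's
// TensorElementDataType enum, with only a few illustrative variants.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TensorElementDataType {
    Float,
    Int64,
    String,
}

// A per-output blob whose element type is only known at runtime.
struct DynOutput {
    dtype: TensorElementDataType,
    bytes: Vec<u8>, // raw buffer as handed back by the runtime
}

// Output-side counterpart of the input trait: types that can own data
// copied out of an output whose element type matches theirs.
trait OwnedTensorDataToType: Sized {
    const DTYPE: TensorElementDataType;
    fn from_bytes(bytes: &[u8]) -> Vec<Self>;
}

impl OwnedTensorDataToType for f32 {
    const DTYPE: TensorElementDataType = TensorElementDataType::Float;
    fn from_bytes(bytes: &[u8]) -> Vec<Self> {
        bytes
            .chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect()
    }
}

impl DynOutput {
    // The user inspects `dtype`, then extracts with a type whose
    // DTYPE matches; a mismatch yields None rather than a
    // wrong-type reinterpretation of the buffer.
    fn extract<T: OwnedTensorDataToType>(&self) -> Option<Vec<T>> {
        if self.dtype == T::DTYPE {
            Some(T::from_bytes(&self.bytes))
        } else {
            None
        }
    }
}

fn main() {
    let out = DynOutput {
        dtype: TensorElementDataType::Float,
        bytes: 1.0f32.to_le_bytes().to_vec(),
    };
    assert_eq!(out.extract::<f32>(), Some(vec![1.0]));
}
```

With something like this, the two-output unique-elements model above would come back as a `Vec<DynOutput>` whose elements report `String` and `Int64` respectively, and the caller would pick the extraction type per output instead of once for the whole run.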