Cannot subclass dataset_ops.DatasetV2 #61394
Comments
@AyushExel,
Could you please find the explanation of the Variant tensor (DT_Variant) in the following doc: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/variant.h#L54
@tilakrayal well, so subclassing should work when I initialize it the way I do in the example, right? But that doesn't work. So how can I subclass DatasetV2?
Hi, Please find the below implementation of subclassing
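One possible shape of such a subclass, as a minimal sketch and not necessarily the code posted in this comment: rather than constructing a variant tensor by hand, the subclass builds an ordinary inner dataset and passes that dataset's variant tensor to the base constructor. The class name `UriBackedDataset`, the `uri` and `num_rows` parameters, and the use of `tf.data.Dataset.range` as a stand-in for real file reads are all hypothetical. `_variant_tensor` is a private attribute, so this relies on TensorFlow internals that may change between releases.

```python
import tensorflow as tf


class UriBackedDataset(tf.data.Dataset):
    """Hypothetical tf.data.Dataset subclass that carries extra metadata."""

    def __init__(self, uri, num_rows):
        # Extra metadata kept on the Python object (illustrative names).
        self.uri = uri
        # Stand-in for real I/O: any ordinary dataset could be built here.
        self._inner = tf.data.Dataset.range(num_rows)
        # Reuse the inner dataset's variant tensor instead of creating one.
        # NOTE: `_variant_tensor` is private and not a stable API.
        super().__init__(self._inner._variant_tensor)

    def _inputs(self):
        # Called by the base constructor; no upstream datasets are exposed.
        return []

    @property
    def element_spec(self):
        # Delegate the element structure to the wrapped dataset.
        return self._inner.element_spec


ds = UriBackedDataset("memory://example", num_rows=5)
values = [int(x) for x in ds]
batches = [b.numpy().tolist() for b in ds.batch(2)]
```

Note that built-in transformations such as `batch` return regular dataset objects rather than instances of the subclass, so metadata like `uri` would not follow a chained call unless those methods are overridden as well.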
This issue is stale because it has been open for 7 days with no activity. It will be closed if no further activity occurs. Thank you.
This issue was closed because it has been inactive for 7 days since being marked as stale. Please reopen if you'd like to work on this further.
Issue type
Support
Have you reproduced the bug with TensorFlow Nightly?
No
Source
binary
TensorFlow version
2.x
Custom code
Yes
OS platform and distribution
Mac OS 13.0
Mobile device
No response
Python version
3.9
Bazel version
No response
GCC/compiler version
No response
CUDA/cuDNN version
No response
GPU model and memory
No response
Current behavior?
Hi, I'm from the LanceDB team and we're trying to build native support for tf.data. See the WIP PR here: lancedb/lance#1087.
Ideally, we'd like to simply subclass tf.dataset_ops.DatasetV2 so that all the metadata needed to recreate the dataset can be pushed down to our file format, which enables parallelism elegantly. So, it'd be something like this:
The above code complains that it cannot convert LanceTfDataset to a tf.Tensor/variant.
Issue: what exactly is variant_tensor, and how do we go about creating one? I read through the docs but couldn't find anything concrete. There was a mention that variant_tensor is a special tensor that describes the type of the dataset and that it's equivalent to tf.Variant, but the above code doesn't work.
Having a version of tf.dataset that we can use to capture extra metadata would allow us to improve the interface as well: instead of lance.tf.data.from_dataset(uri, columns, filter, batch_size), we could just have from_lance(uri).filter(..).batch_size(...).shuffle().
So what's the way to go about subclassing a tf Dataset?
Standalone code to reproduce the issue
Relevant log output
No response