I have a Datumaro dataset with nested items: files often have paths such as mydir/file1.jpg. Is there a way to flatten it using Datumaro? I would like to iterate over each item, move item.media.path to the root (possibly checking whether a file of that name already exists), update item.id, then re-export. Is there a way to do this?
Hi @CourchesneA, thank you for your continued interest.
Currently, there is no flatten feature in Datumaro, but there are workarounds that achieve flattening.
When exporting a dataset in Datumaro format, the path of the image is determined by the id and the subset in the DatasetItem. For instance, if the id is "mydir/img1" and the subset is "mysubset", then it would be set as "images/mysubset/mydir/img1.jpg".
Therefore, if all subsets are the same, you can save all images in one folder (e.g., images/mysubset) by changing the id accordingly.
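As a rough illustration of the path rule described above, the following sketch mimics the layout in plain Python. It does not call Datumaro; the helper name `exported_image_path` and the default `.jpg` extension are assumptions made for this example only.

```python
import posixpath


def exported_image_path(item_id: str, subset: str, ext: str = ".jpg") -> str:
    """Mimic the layout described above: images/<subset>/<id><ext>.

    Hypothetical helper for illustration; not part of Datumaro.
    """
    return posixpath.join("images", subset, item_id + ext)


# A nested id keeps its directory component in the exported path.
nested_path = exported_image_path("mydir/img1", "mysubset")
# An id without a directory component lands directly in the subset folder.
flat_path = exported_image_path("img1", "mysubset")

print(nested_path)  # images/mysubset/mydir/img1.jpg
print(flat_path)    # images/mysubset/img1.jpg
```

In other words, once every id has no directory component and every item shares one subset, all images end up in a single folder.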
To achieve this, you can use the reindex transform.
If the dataset contains multiple subsets, you should use the map_subsets transform to merge the subsets into one, then perform the reindex transform to prevent duplicate ids before exporting.
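# Map every subset name to a single "default" subset.
mapping = {subset: "default" for subset in dataset.subsets()}
dataset.transform("map_subsets", mapping=mapping)
# Replace ids with sequential integers starting at 0 so merged items cannot collide.
dataset.transform("reindex", start=0)
# Export in Datumaro format, copying the media files as well.
dataset.export("flattened", "datumaro", save_media=True)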
If there are no duplicates among the file names, you could consider using the id_from_image_name transform. However, if there are duplicates, it will unfortunately result in a RepeatedItemError during export, meaning you cannot retain the original file names.
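If you want to keep the original file names where possible, one option is to compute collision-free flat ids yourself before renaming the items. The sketch below is plain Python and does not call Datumaro: `flatten_ids` is a hypothetical helper, and the `_1`, `_2` suffix scheme is an assumption chosen for illustration. Applying the resulting mapping to the dataset items (for example through a custom item transform) is left out here.

```python
import posixpath
from typing import Dict, Iterable


def flatten_ids(ids: Iterable[str]) -> Dict[str, str]:
    """Map each nested id to a unique flat id.

    The flat id is the last path component of the nested id. When that
    name is already taken, a numeric suffix (_1, _2, ...) is appended.
    Hypothetical helper for illustration; not part of Datumaro.
    """
    taken = set()
    mapping = {}
    for old_id in ids:
        # Normalize Windows-style separators, then drop the directories.
        base = posixpath.basename(old_id.replace("\\", "/"))
        candidate = base
        suffix = 1
        while candidate in taken:
            candidate = f"{base}_{suffix}"
            suffix += 1
        taken.add(candidate)
        mapping[old_id] = candidate
    return mapping


mapping = flatten_ids(["mydir/file1", "other/file1", "file2"])
print(mapping)
# {'mydir/file1': 'file1', 'other/file1': 'file1_1', 'file2': 'file2'}
```

Because every new id is checked against the ids already assigned, the result contains no repeated ids, which is the condition that otherwise triggers the RepeatedItemError.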