Add task type information when importing #1422
Conversation
    caption = 11
    super_resolution = 12
    depth_estimation = 13
    mixed = 14
Is there a way for users to know what tasks are possible when TaskType is mixed?
A mixed task can be transformed to any task type. The reason why we provide mixed is that the Datumaro format can contain any AnnotationType when importing.
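As a rough sketch of this reply (only the enum values shown in the diff above come from the PR; the enum name `TaskType` appears in the question, while `is_transformable` and the conversion rule are assumptions for illustration), mixed can be treated as convertible to any task:

```python
from enum import IntEnum

class TaskType(IntEnum):
    # Only these members appear in the diff above; earlier members are omitted.
    caption = 11
    super_resolution = 12
    depth_estimation = 13
    mixed = 14

def is_transformable(src: TaskType, dst: TaskType) -> bool:
    # Assumption based on the reply above: a mixed dataset can be
    # transformed to any task type, since the Datumaro format may
    # contain any AnnotationType.
    return src == TaskType.mixed or src == dst
```

Under this rule, a mixed dataset converts to anything, while a specific task type only maps to itself.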
src/datumaro/components/dataset.py
I could see there are many changes in plugins/data_format. However, I'd rather revert them and let the set of annotation types existing in the dataset be managed by DatasetStorage (Dataset's dataset item container) or StreamDatasetStorage (the counterpart for StreamDataset). This is because:
- This implementation adds a hidden constraint that every dataset extractor (DatasetBase) should implement an annotation type gatherer such as ann_types = set().
- This implementation is not aligned with our dataset transform logic. It currently computes task_type at DatasetBase. Let's assume that some DatasetBase decides that a given dataset has two annotation types, Label and Bbox. However, if an arbitrary dataset transform is applied on top of it and it drops every Bbox, we must re-compute the set of annotation types that exist in the dataset after the transformation. This should be done by DatasetStorage or StreamDatasetStorage.

Following this idea, it would be:
class DatasetStorage:
    def __init__(self):
        ...
        self._set_of_ann_types: set | None = None
        ...

    @property
    def set_of_ann_types(self):
        if self._set_of_ann_types is None:
            # If reset or not computed, run its iterator to compute
            self._set_of_ann_types = set()
            for item in self:
                for ann in item.annotations:
                    self._set_of_ann_types.add(ann.type)
        return self._set_of_ann_types

    @property
    def task_type(self):
        return infer_task_type_from_set_of_ann_types(self.set_of_ann_types)

    ...

    def _iter_init_cache_unchecked(self) -> Iterable[DatasetItem]:
        # Merges the source, source transforms and patch, caches the result
        # and provides an iterator for the resulting item sequence.
        ...
        # Reset if there is a possible change
        self._set_of_ann_types = None
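The lazy-cache idea in the snippet above can be demonstrated with a minimal, self-contained stand-in (Ann, Item, and MiniStorage are simplified stand-ins invented here, not the actual Datumaro classes):

```python
from dataclasses import dataclass, field

@dataclass
class Ann:
    type: str

@dataclass
class Item:
    annotations: list = field(default_factory=list)

class MiniStorage:
    def __init__(self, items):
        self._items = list(items)
        self._set_of_ann_types = None  # lazily computed cache

    def __iter__(self):
        yield from self._items

    @property
    def set_of_ann_types(self):
        if self._set_of_ann_types is None:
            # If reset or not yet computed, iterate items to gather types
            self._set_of_ann_types = {
                ann.type for item in self for ann in item.annotations
            }
        return self._set_of_ann_types

    def transform_drop(self, ann_type):
        # A transform that mutates items must invalidate the cache,
        # so the set is recomputed on the next access
        for item in self._items:
            item.annotations = [a for a in item.annotations if a.type != ann_type]
        self._set_of_ann_types = None

storage = MiniStorage([Item([Ann("label"), Ann("bbox")]), Item([Ann("bbox")])])
print(storage.set_of_ann_types)  # {"label", "bbox"}
storage.transform_drop("bbox")
print(storage.set_of_ann_types)  # {"label"} after recompute
```

This mirrors the point of the review comment: because a transform can drop an annotation type, the storage layer (not the extractor) must own and invalidate the cached set.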
Thank you for the good idea. When I approached it this way, it required a full iteration over the whole dataset to obtain the available task information. So, I have switched to obtaining the available task during importing. What do you think?
Please see 3d30c1d
I have a question. I'm curious why datasets are designed to have a single task_type. For example, if a dataset has both label and bbox annotations, it can be used for both classification and detection tasks (and even for anomaly cls/det tasks if the labels are anomalous and normal). However, according to your implementation, it seems like the task_type becomes detection.

Hi @jihyeonyi, thank you for the question. We are able to identify the mapping between annotation types and tasks in
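The behavior described in the question (label + bbox being reported as detection) is consistent with a priority-based mapping. A hedged sketch of how such a mapping could behave (the actual rules in Datumaro's TaskAnnotationMapping may differ; infer_task and the string task names here are illustrative assumptions):

```python
def infer_task(ann_types: set) -> str:
    # Hypothetical priority rules: a richer annotation type takes
    # precedence, so a dataset with both label and bbox is treated
    # as a detection dataset rather than a classification one.
    if "bbox" in ann_types:
        return "detection"
    if "label" in ann_types:
        return "classification"
    return "mixed"
```

With these rules, infer_task({"label", "bbox"}) yields "detection", which matches the observation in the question, while a label-only dataset yields "classification".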
Force-pushed from 1d05fe9 to 3d30c1d.
Codecov Report

Attention: Patch coverage is

@@ Coverage Diff @@
##           develop    #1422      +/-   ##
===========================================
+ Coverage    80.85%   80.98%   +0.12%
===========================================
  Files          271      272       +1
  Lines        30689    31137     +448
  Branches      6197     6279      +82
===========================================
+ Hits         24815    25216     +401
- Misses        4489     4505      +16
- Partials      1385     1416      +31
# # when adding a new item, task_type will be updated automatically
# for ann in item.annotations:
#     self._set_of_ann_types.add(ann.type)
# self._task_type = TaskAnnotationMapping().get_task(self._set_of_ann_types)
I think this can be deleted.
fixed at 0a313ca
@@ -643,7 +705,15 @@ def stacked_transform(self) -> IDataset:
        return transform

    def __iter__(self) -> Iterator[DatasetItem]:
        yield from self.stacked_transform
        # yield from self.stacked_transform
This line can be deleted.
fixed at 0a313ca
Summary
How to test
Checklist
License
Feel free to contact the maintainers if that's a concern.