Add Intermediate skills - Level 9: Dataset Exploration from a Query Image/Text (#959)

### Summary

- Ticket no. 108906
- Add intermediate skills - Level 9: Data exploration
- Rename searcher to explorer

### How to test

### Checklist

- [ ] I have added unit tests to cover my changes.
- [ ] I have added integration tests to cover my changes.
- [ ] I have added the description of my changes into [CHANGELOG](https://github.com/openvinotoolkit/datumaro/blob/develop/CHANGELOG.md).
- [X] I have updated the [documentation](https://github.com/openvinotoolkit/datumaro/tree/develop/docs) accordingly

### License

- [ ] I submit _my code changes_ under the same [MIT License](https://github.com/openvinotoolkit/datumaro/blob/develop/LICENSE) that covers the project. Feel free to contact the maintainers if that's a concern.
- [ ] I have updated the license header for each file (see an example below).

```python
# Copyright (C) 2023 Intel Corporation
#
# SPDX-License-Identifier: MIT
```
Sooah Lee authored Apr 20, 2023
1 parent 13fd698, commit efd8a40
Showing 16 changed files with 208 additions and 150 deletions.
48 changes: 48 additions & 0 deletions
docs/source/docs/command-reference/context_free/explorer.md
@@ -0,0 +1,48 @@
# Explore

## Explore datasets

This command explores a dataset for items similar to a query. The query can be any image file or text description, or a list of them. The result contains the top-k most similar items in the target dataset, and a visualization of the result is saved as a PNG file. This feature is intended to help users understand the properties of a dataset more easily.

Explorer operates on hashes. Once you provide a dataset to use as the database, Explorer computes a hash for every dataset item in it. Currently, each hash is computed with CLIP ([article](https://arxiv.org/abs/2103.00020)), which supports both the image and text modalities. The supported model format is OpenVINO IR, and the models are uploaded to [openvinotoolkit storage](https://storage.openvinotoolkit.org/repositories/datumaro/models/). Hash computation for the whole dataset starts when you call the Explorer class. For the database, the hash is computed from the image of each dataset item: CLIP extracts a feature vector from the image, the feature values are converted to binary values, and the resulting elements are packed into bits. Each hash is saved as a `HashKey` in the annotations. Hence, once you have called Explorer on a dataset, every dataset item in it has a `HashKey` among its annotations.

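The binarize-and-pack step described above can be sketched in plain Python. This is a minimal illustration, not Datumaro's implementation: the helper names `binarize` and `pack_bits`, the zero threshold, and the toy 8-value feature vector are assumptions made for this example, and the real features come from the CLIP model.

```python
from typing import List, Sequence


def binarize(features: Sequence[float], threshold: float = 0.0) -> List[int]:
    """Map each feature value to 1 if it exceeds the threshold, else 0.

    The zero threshold is an assumption made for this sketch.
    """
    return [1 if value > threshold else 0 for value in features]


def pack_bits(bits: Sequence[int]) -> bytes:
    """Pack a sequence of 0/1 values into bytes, most significant bit first."""
    packed = bytearray()
    for start in range(0, len(bits), 8):
        chunk = list(bits[start:start + 8])
        chunk += [0] * (8 - len(chunk))  # zero-pad the final byte
        byte = 0
        for bit in chunk:
            byte = (byte << 1) | bit
        packed.append(byte)
    return bytes(packed)


# Hypothetical 8-value feature vector; real CLIP features are much longer.
features = [0.3, -1.2, 0.8, 0.1, -0.4, -0.9, 2.0, -0.1]
hash_key = pack_bits(binarize(features))
```

Packing keeps each hash compact, since eight binary values fit in one byte.
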
To explore similar data in a dataset, first set a query. A query can be an image, a text, a list of images, a list of texts, or a mixed list of images and texts. The query does not need to be an item that exists in the dataset; you can use any data for which you want to find similar items. You also set top-k, the number of similar items to return. The default value of top-k is 50, so set it explicitly if you want fewer results. For a single query, Explorer computes the Hamming distance between the query hash and the hash of every item in the dataset, sorts the distances, and selects the top-k items with the shortest distances. For a list query, the distance computation is repeated for each query, and the top-k items are selected by distance across the whole dataset.

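The distance-based selection described above can be sketched as follows. This is an illustrative sketch rather than the Explorer code: `hamming_distance` and `top_k_similar` are hypothetical helpers, the hashes are toy one-byte values, and combining a list query by keeping each item's smallest distance to any query is only one possible reading of the description.

```python
from typing import Dict, List, Sequence


def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length hashes differ."""
    if len(a) != len(b):
        raise ValueError("hashes must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def top_k_similar(
    queries: Sequence[bytes], database: Dict[str, bytes], topk: int = 50
) -> List[str]:
    """Return the ids of the topk items closest to any of the query hashes."""
    # Assumption: an item's score is its smallest distance to any query.
    best = {
        item_id: min(hamming_distance(query, item_hash) for query in queries)
        for item_id, item_hash in database.items()
    }
    # Sort by distance, breaking ties by id so the order is deterministic.
    return sorted(best, key=lambda item_id: (best[item_id], item_id))[:topk]


# Toy database of one-byte hashes keyed by hypothetical item ids.
database = {
    "a": bytes([0b11110000]),
    "b": bytes([0b11110001]),
    "c": bytes([0b00001111]),
    "d": bytes([0b10100000]),
}
single = top_k_similar([bytes([0b11110000])], database, topk=2)
multi = top_k_similar([bytes([0b11110000]), bytes([0b00001111])], database, topk=3)
```

Because the hashes are bit strings, the distance reduces to an XOR followed by a bit count, which is cheap to evaluate over a whole dataset.
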
The command can be applied to a dataset. To use multiple datasets as the database, use a merged dataset. The current project (`-p/--project`) is also used as a context for plugins, so it can be useful for dataset paths that have custom formats. When not specified, the current project's working tree is used. Saving the visualized result (`-s/--save`) is turned on by default. The visualization is based on [Visualizer](../../jupyter_notebook_examples/visualizer).

Usage:
```bash
datum explore [-q <path/to/image.jpg> or <text_query>] [-topk TOPK]
```

Parameters:
- `-q, --query` (string) - Image path or text to use as the query.
- `-topk` (int) - Number of similar items to find (default: 50).
- `-p, --project` (string) - Directory of the project to operate on (default: current directory).
- `-s, --save` (bool) - Save the visualized result of the similar data (turned on by default).

Examples:
- Use image query
```bash
datum project create <...>
datum project import -f datumaro <path/to/dataset/>
datum explore -q path/to/image.jpg -topk 10
```
- Use text query
```bash
datum project create <...>
datum project import -f datumaro <path/to/dataset/>
datum explore -q elephant -topk 10
```
- Use list of images query
```bash
datum project create <...>
datum project import -f datumaro <path/to/dataset/>
datum explore -q path/to/image1.jpg path/to/image2.jpg path/to/image3.jpg -topk 50
```
- Use list of texts query
```bash
datum project create <...>
datum project import -f datumaro <path/to/dataset/>
datum explore -q motorcycle bus train -topk 50
```
44 changes: 0 additions & 44 deletions
docs/source/docs/command-reference/context_free/searcher.md
This file was deleted.
4 changes: 2 additions & 2 deletions
...commands/datumaro.cli.commands.search.rst → ...ommands/datumaro.cli.commands.explore.rst
@@ -1,7 +1,7 @@
-Search module
+Explore module
 =============

-.. automodule:: datumaro.cli.commands.search
+.. automodule:: datumaro.cli.commands.explore
    :members:
    :undoc-members:
    :show-inheritance:
4 changes: 2 additions & 2 deletions
...mponents/datumaro.components.searcher.rst → ...mponents/datumaro.components.explorer.rst
@@ -1,7 +1,7 @@
-Searcher module
+Explorer module
 ===============

-.. automodule:: datumaro.components.searcher
+.. automodule:: datumaro.components.explorer
    :members:
    :undoc-members:
    :show-inheritance: