Identify task #1839

WithoutPants · 2021-10-13T05:22:04Z

As discussed in #1802.

Resolves #1353 (alongside #1812).
Resolves #1589.

Supercedes #1522.

Adds an Identify task to the Tasks page, grouped with Scan and Auto Tag in the Library section:

Opens a dialog as follows:

By default, only the first stash-box is added as a source. Other sources can be added, but must either be a stash-box instance or a scene fragment scraper.

Allows setting default options which can be overridden in the individual sources. Includes field options:

This allows specific field behaviour to be changed. Strategy can be "ignore", "merge", or "overwrite". For single value fieds, merge ~~is functionally the same as Overwrite~~ only sets if no value is already set on the field. For multivalue fields (performers, tags), it means that existing values are kept while adding to them. For performers, tags and studios, a create missing option is available that creates any missing objects as needed. This defaults to false.

It is possible to set the defaults for the task. This pattern will be applied to the other tasks, alongside a general refactor of the Tasks pages, in another PR.

The Identify task itself tries to identify the scene starting from the first source. If it finds a result, then it sets the scene fields in accordance with the field settings (preferring source-specific settings, then defaulting to the global values. Where a scene field isn't specified, its strategy is set to "merge" and not to create missing objects.

The other sources are only checked if the previous ones had no results.

The task logs if a scene was identified or not.

This PR still has some pending things to do, but it's worth giving it a test as it is.

TODO:

identify from scene selection
support Identify using folder selection
documentation
unit tests

Future iterations:

add identify task as scan option
add filename filtering to scraper configs
add some sort of throttling for scraper requests

SmallCoccinelle

Generally, this looks good. I only looked at the Go code though, since I'm bad at Typescript idioms and stuff. Most of the comments are just my curiosity as to why certain things are written as they are.

graphql/schema/types/metadata.graphql

pkg/manager/config/config.go

pkg/manager/task_identify.go

pkg/models/stash_box.go

pkg/autotag/identify.go

kermieisinthehouse · 2021-10-13T20:34:16Z

It should complain if you have no sources configured, otherwise it will confuse users who haven't configured scrapers / stashdb
The message "Default options will be used where source-specific options are not set." is unclear. It took a little while to figure out what it refers to.
The "Unable to identify" should maybe be at the warning log level, since they outnumber the matches 1000:1 (at least in my dataset)

We should also be much clearer with the user that media information about their entire library is about to be transmitted over the network to their sources. Maybe have an option about what the match criteria to send, as in only send phashes/oshashes, or filename fragments, etc.

kermieisinthehouse · 2021-10-13T20:34:34Z

Some questions: What does merge mean for a studio? What does it mean for titles?

stg-annon · 2021-10-13T20:39:30Z

Some questions: What does merge mean for a studio? What does it mean for titles?

Seems to be ignored and only used for multi field values based off this comment

WithoutPants · 2021-10-13T21:23:30Z

Some questions: What does merge mean for a studio? What does it mean for titles?

For single value fieds, merge is functionally the same as Overwrite. For multivalue fields (performers, tags), it means that existing values are kept while adding to them.

WithoutPants · 2021-10-20T21:13:47Z

Potential future enhancement - would be useful if this (could) include the blacklist from the scene tagger (or it's own blacklist). There are files where it works if query done in the scene tagger due to blacklist stripping out a value like _1080, but doesn't work in Identify because it uses the raw file name.

May be somewhat redundant with the TODO of adding filename filtering to scraper configs, but it may also be less development work if the blacklist data is already available.

Otherwise my tests found that it identified all files that could be found with "fingerprint match", and seemed to correctly identify all files that could be found with the 'query' button in scene tagger as long as that query string had not had any values removed from it by blacklist.

The identify task does not use filenames and the blacklist does not apply. The Identify task is the equivalent of running a query by fingerprint not a keyword query. I realise that some scrapers - for example ThePornDB scraper - use the filename, but in this instance the blacklist is not used; the blacklist is only applied when using the keyword query function.

In short, the Identify task is the equivalent of running the "Scrape by fragment" or "Scrape All" operations in the Tagger, not the "Search" operation.

There is future scope for having scraper-specific options and a blacklist might be worth including, but at least with the current implementation, the blacklist simply doesn't apply.

ui/v2.5/src/docs/en/Identify.md

SmallCoccinelle

Looks good. Went over the Go code only. Added some comments here and there, and some of them are definitely worth looking into.

graphql/schema/schema.graphql

graphql/schema/types/metadata.graphql

pkg/autotag/identify.go

SmallCoccinelle · 2021-10-21T22:22:28Z

pkg/autotag/identify.go

+		}
+
+		if scene == nil {
+			return fmt.Errorf("no scene found with id %d", t.SceneID)


An option here is to have an error at the top-level:

var ErrNotFound = errors.New("entity not found")

so you can wrap that static error here: fmt.Errorf("scene with id %d: %w", t.SceneID, ErrNotFound).

I'll reuse models.ErrNotFound instead, unless there's a reason not to.

Depends on where you want to go. Consolidation is good. Unless you want to tell the errors apart, or you want to avoid cross-dependencies between packages.

pkg/autotag/identify.go

SmallCoccinelle · 2021-10-21T22:44:31Z

pkg/autotag/identify.go

+	return ret
+}
+
+func idListEquals(a, b []int) bool {


Make sure this does what it's supposed to. Consider e.g.:

func TestIdEqualsReject(t *testing.T) { x := []int{1, 1, 1, 2, 2, 2} y := []int{1, 2, 3, 4, 5, 6} if IdListEquals(x, y) { t.Errorf("Lists aren't the same") } }

Which fails.

If you are just interested in precise equality, you can use reflect.DeepEqual. Or use the hammer which is also safer: `https://pkg.go.dev/github.com/google/go-cmp.

The problem is that we want to compare lists while ignoring order, since the order may be changed. I've written a utility function to do this generically.

pkg/autotag/identify.go

pkg/scene/query.go

SmallCoccinelle

LGTM - just updating so I'm caught up.

SmallCoccinelle · 2021-10-21T23:05:05Z

Went over this PR as well. Managed to push the wrong button so I reviewed some outdated stuff as well. But I'm sure you can manage to find if it has moved elsewhere.

ghost · 2021-10-22T22:48:08Z

The UI feels a bit confusing. I would suggest listing all fields all the time, and using a button group for the strategy instead of a dropdown, that way it's easy to quickly toggle on/off different fields. It's also clear what actions will be taken for all fields. For source settings you could have an additional "inherit" option which is selected by default.

I also think it would be better to remove the option to set invalid values, like merge for single value fields. Considering the identify task can run on an entire collection, it's best to be clear about what it will do.

Lastly, for the set cover images option, isn't that basically just another field?

WithoutPants · 2021-10-22T22:58:10Z

The UI feels a bit confusing. I would suggest listing all fields all the time, and using a button group for the strategy instead of a dropdown, that way it's easy to quickly toggle on/off different fields. It's also clear what actions will be taken for all fields. For source settings you could have an additional "inherit" option which is selected by default.

Sounds reasonable, I'll take a look.

I also think it would be better to remove the option to set invalid values, like merge for single value fields. Considering the identify task can run on an entire collection, it's best to be clear about what it will do.

Merge is a valid value, but perhaps could be named better for single-value fields. It basically means overwrite if empty.

Lastly, for the set cover images option, isn't that basically just another field?

Scene cover is in a bit of a funny place at the moment. Conceptually, the image cover always exists - it is automatically generated during scan. We can check for the presence of the cover image in the database, but I don't think this is good long-term design (shifting images out of the database is something I want to do in future). As such, there's no good way to determine if a cover is set (to something other than the default) or not, meaning that the option for it is a simple boolean. There's no reason to have overwrite/merge/ignore state for it, so a simple boolean state seemed like the sensible option.

WithoutPants · 2021-10-24T05:48:05Z

Did some refactoring to support running the scene update post hook after updating a scene.
Fixed bug where directories could not be cleared in the identify dialog.
Identify dialog now shows all fields and presents strategy options with radio buttons instead of dropdown:

ui/v2.5/src/components/Dialogs/IdentifyDialog/ThreeStateBoolean.tsx

* Add identify task * Change type naming * Debounce folder select text input * Add generic slice comparison function

WithoutPants added 19 commits October 8, 2021 12:17

Add identify task

cf2c2d8

Change type naming

bccf78e

Initial UI support

b78b97f

Support adding and editing sources

a3fc3a9

Support rearranging source list

257fecb

Styling

36d7335

Use three-state checkbox

77a940a

Fix cover image condition

715732a

WIP

58f3aa2

Allow object member param reassignment

55170ad

Rearrange to separate files

86abb9f

Use table for field options

848766a

Improve three-state boolean

53e5139

Hide table when no fields

0f20f64

Support saving identify task defaults

acfd16e

Show defaults in source editors

73a6108

Bug fixing

16a84a2

Lint

3f3e38f

Refactoring

aa04569

WithoutPants added the feature Pull requests that add a new feature label Oct 13, 2021

WithoutPants added this to the Version 0.11.0 milestone Oct 13, 2021

WithoutPants marked this pull request as draft October 13, 2021 05:27

SmallCoccinelle reviewed Oct 13, 2021

View reviewed changes

WithoutPants added 3 commits October 14, 2021 08:54

Merge remote-tracking branch 'upstream/develop' into identify-task

6e9b6be

Fix incorrect lock

7c379d3

Reuse context

cc38c2d

bnkai reviewed Oct 20, 2021

View reviewed changes

ui/v2.5/src/docs/en/Identify.md Outdated Show resolved Hide resolved

WithoutPants added 4 commits October 21, 2021 16:57

Fix documentation

ccfcf2b

Merge remote-tracking branch 'upstream/develop' into identify-task

aa8455c

Fix Use Default showing in default field options

eb690a8

Fix setting scene cover

3239349

SmallCoccinelle reviewed Oct 21, 2021

View reviewed changes

WithoutPants added 4 commits October 22, 2021 12:10

Fix documentation

ec7f2c2

Add generic slice comparison function

f5135c0

Review fixes

d5b09be

Review fix

8fc8359

WithoutPants added 3 commits October 23, 2021 10:58

Allow remove all directories from identify dialog

79a61b8

Execute SceneUpdatePost hook

3840e41

Show all fields in options

e1b85d6

Merge remote-tracking branch 'upstream/develop' into identify-task

2c244cf

ghost reviewed Oct 27, 2021

View reviewed changes

ui/v2.5/src/components/Dialogs/IdentifyDialog/ThreeStateBoolean.tsx Outdated Show resolved Hide resolved

WithoutPants added 4 commits October 28, 2021 12:12

Merge remote-tracking branch 'upstream/develop' into identify-task

0b9760a

Fix warning

3553dba

Add id to ThreeStateBoolean checkbox

30a2b53

Add changelog entry [skip ci]

f507609

WithoutPants merged commit 27cdefb into stashapp:develop Oct 28, 2021

WithoutPants added a commit to WithoutPants/stash that referenced this pull request Oct 28, 2021

Identify task (stashapp#1839)

0f64954

* Add identify task * Change type naming * Debounce folder select text input * Add generic slice comparison function

This was referenced Oct 28, 2021

Add stash-box batch scene tagging #1522

Closed

[Feature] Enable "Scrape with..." multiple items (scenes, performers etc.) #1180

Closed

SmallCoccinelle mentioned this pull request Oct 28, 2021

Refactor scraper top half #1893

Merged

19 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Identify task #1839

Identify task #1839

WithoutPants commented Oct 13, 2021 •

edited

Loading

SmallCoccinelle left a comment

kermieisinthehouse commented Oct 13, 2021

kermieisinthehouse commented Oct 13, 2021

stg-annon commented Oct 13, 2021 •

edited

Loading

WithoutPants commented Oct 13, 2021

WithoutPants commented Oct 20, 2021

SmallCoccinelle left a comment

SmallCoccinelle Oct 21, 2021

WithoutPants Oct 22, 2021

SmallCoccinelle Oct 22, 2021

SmallCoccinelle Oct 21, 2021

SmallCoccinelle Oct 21, 2021

WithoutPants Oct 22, 2021

SmallCoccinelle left a comment

SmallCoccinelle commented Oct 21, 2021

ghost commented Oct 22, 2021

WithoutPants commented Oct 22, 2021

WithoutPants commented Oct 24, 2021

Identify task #1839

Identify task #1839

Conversation

WithoutPants commented Oct 13, 2021 • edited Loading

SmallCoccinelle left a comment

Choose a reason for hiding this comment

kermieisinthehouse commented Oct 13, 2021

kermieisinthehouse commented Oct 13, 2021

stg-annon commented Oct 13, 2021 • edited Loading

WithoutPants commented Oct 13, 2021

WithoutPants commented Oct 20, 2021

SmallCoccinelle left a comment

Choose a reason for hiding this comment

SmallCoccinelle Oct 21, 2021

Choose a reason for hiding this comment

WithoutPants Oct 22, 2021

Choose a reason for hiding this comment

SmallCoccinelle Oct 22, 2021

Choose a reason for hiding this comment

SmallCoccinelle Oct 21, 2021

Choose a reason for hiding this comment

SmallCoccinelle Oct 21, 2021

Choose a reason for hiding this comment

WithoutPants Oct 22, 2021

Choose a reason for hiding this comment

SmallCoccinelle left a comment

Choose a reason for hiding this comment

SmallCoccinelle commented Oct 21, 2021

ghost commented Oct 22, 2021

WithoutPants commented Oct 22, 2021

WithoutPants commented Oct 24, 2021

WithoutPants commented Oct 13, 2021 •

edited

Loading

stg-annon commented Oct 13, 2021 •

edited

Loading