Wrong detection for local anime #469

Closed
mufeedali opened this issue Jan 2, 2020 · 14 comments

Comments

@mufeedali
Contributor

Many local anime are recognized incorrectly. For example:

  • Dr. Stone is being recognized as Dr. Stone: Stone Wars, and OreSuki is being recognized as the upcoming OreSuki OVA.
  • Shinchou Yuusha isn't being recognized at all, which might be because the files use the short name.
  • Kimetsu no Yaiba, which seems pretty obvious, isn't recognized at all.

The story is similar for most of the anime in my library. Is there a naming scheme I should be following? That said, detection of local anime seems to be a weak point of Trackma; it does very poorly compared to KawAnime, Kodi, or other tools. Still an amazing tracker, though.

@FichteFoll
Collaborator

FichteFoll commented Jan 2, 2020

Related: #413

Please provide the file names of the mismatched files. You can enable more debug output in the CLI UI by running trackma -d and executing rescan there.

@purposelycryptic

Unless things have changed dramatically since I last gave it a shot, Trackma uses a series of RegEx functions to whittle the filename down to a string most likely to represent the series name and extract an episode number while at it. Anything in any kind of brackets is not considered at all for any kind of ID, which was my biggest problem with it, as I use AniDB series names, where sequel series frequently have the same name, just with the year in brackets.

I spent far more time than I should have rewriting and testing much of the RegEx for it, since only about 65% of my library was being IDed correctly, but I ended up too sick and had to drop it before it was finished, as the fever was just messing things up too much.

It is definitely still picky when it comes to IDing. For example, I just finished rewatching Fairy Tail the other day (naming pattern: Fairy Tail - EPNUM [GROUPNAME] [SOURCE, RESOLUTION, AUDIOCODEC] [CRC32].EXT), and not a single episode was recognized. Then I started on the 2014 sequel series (naming pattern: Fairy Tail (2014) - EPNUM [GROUPNAME] [SOURCE, RESOLUTION, AUDIOCODEC] [CRC32].EXT), which is being IDed as the original series... And this is one of the rare instances on AniList where the sequel series actually uses the same series name as AniDB, too.
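
The bracket-stripping behaviour described above can be sketched roughly as follows. This is a minimal illustration, not Trackma's actual code: the patterns and the `parse_filename` helper are hypothetical, and the file names reuse the naming patterns quoted in this thread with a made-up episode number and extension. It shows why a year in parentheses cannot distinguish a sequel once everything in brackets is discarded.

```python
import re

# Hypothetical patterns for illustration only; these are not Trackma's regexes.
BRACKETED = re.compile(r"\[[^\]]*\]|\([^)]*\)")   # [group], (2014), (codec info)
EXTENSION = re.compile(r"\.[A-Za-z0-9]{2,4}$")    # .mkv, .mp4, ...
EPISODE = re.compile(r"^(?P<title>.+?)\s*-\s*(?P<episode>\d{1,4})\s*$")


def parse_filename(filename):
    """Return (title, episode), or None when no 'Title - NN' shape is found."""
    name = EXTENSION.sub("", filename)
    name = BRACKETED.sub(" ", name)        # everything in brackets is thrown away
    name = name.replace("_", " ")          # treat underscores as spaces
    name = re.sub(r"\s+", " ", name).strip()
    match = EPISODE.match(name)
    if match is None:
        return None
    return match.group("title").strip(), int(match.group("episode"))


original = parse_filename(
    "Fairy Tail - 001 [GROUPNAME] [SOURCE, RESOLUTION, AUDIOCODEC] [CRC32].mkv")
sequel = parse_filename(
    "Fairy Tail (2014) - 001 [GROUPNAME] [SOURCE, RESOLUTION, AUDIOCODEC] [CRC32].mkv")

# Both names reduce to the same title, so the year no longer tells them apart.
print(original, sequel)
```

With a parser of this kind, telling the two series apart would require keeping the parenthesised year as a separate field instead of dropping it.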

¯_(ツ)_/¯

Bah, can't do much of anything from here (stuck in hospital, spending my days watching anime off of my server), since I am on my cell, and am not even allowed to use that in my room. Just saw the issue in my email inbox and thought I'd add my two cents, since I really had nothing better to do during my daily few minutes out of bed.

@molkoback
Contributor

KawAnime seems to be using https://github.com/erengy/anitomy. The parsing algorithm is briefly explained in the README and might be worth looking into.

@mufeedali
Contributor Author

mufeedali commented Jan 3, 2020

> KawAnime seems to be using erengy/anitomy. The parsing algorithm is briefly explained in the README and might be worth looking into.

But then, does that mean simpler naming schemes confuse it? For example, Kimetsu no Yaiba is simply named in the format "Kimetsu no Yaiba - ##.mkv" in my Anime folder and trackma fails to recognize it.

@mufeedali
Contributor Author

> Related: #413
>
> Please provide the file names of the mismatched files. You can enable more debug output in the CLI UI by running trackma -d and executing rescan there.

[D] Engine: Not a show, skipping: ~/Anime/Ansatsu Kyoushitsu 2nd Season [1080]/[Cleo]Ansatsu_Kyoushitsu_2nd_Season_-_10_(Dual Audio_10bit_BD1080p_x265).mkv
This is weird, since it looks like it should work: it seems to match anitomy's naming scheme. The message is repeated for each episode of both seasons.

Not a show, skipping: ~/Anime/Anohana/Ano Hi Mita Hana no Namae o Bokutachi wa Mada Shiranai - 03.mp4
Part of an older, shittier collection, but still should be recognized correctly imo.

[D] Engine: Adding to library: ~/Anime/The Promised Neverland/The Promised Neverland - 12.mkv
[D] Engine: Redirected to Yakusoku no Neverland 2nd Season 12

This is the most obvious issue: it assumes a series is its upcoming second season, OVA, or something of the sort... Does it handle spaces poorly?

@mufeedali
Contributor Author

> Unless things have changed dramatically since I last gave it a shot, Trackma uses a series of RegEx functions to whittle the filename down to a string most likely to represent the series name and extract an episode number while at it. Anything in any kind of brackets is not considered at all for any kind of ID, which was my biggest problem with it, as I use AniDB series names, where sequel series frequently have the same name, just with the year in brackets.

So, it doesn't use Anitomy?

> I spent far more time than I should have rewriting and testing much of the RegEx for it, since only about 65% of my library was being IDed correctly, but I ended up too sick and had to drop it before it was finished, as the fever was just messing things up too much.

Damn, too bad. It would be great to see better parsing.

> It is definitely still picky when it comes to IDing. For example, I just finished rewatching Fairy Tail the other day (naming pattern: Fairy Tail - EPNUM [GROUPNAME] [SOURCE, RESOLUTION, AUDIOCODEC] [CRC32].EXT), and not a single episode was recognized. Then I started on the 2014 sequel series (naming pattern: Fairy Tail (2014) - EPNUM [GROUPNAME] [SOURCE, RESOLUTION, AUDIOCODEC] [CRC32].EXT), which is being IDed as the original series... And this is one of the rare instances on AniList where the sequel series actually uses the same series name as AniDB, too.

Yes, this is weird. It happens a lot, even with a pretty simple naming scheme.

@FichteFoll
Collaborator

You could try enabling matching for your entire library instead of just watching and planning, in case shows on your disk that you have already completed are confusing trackma.

@BanchouBoo

Is there any particular reason Anitomy isn't being used with Trackma? Back when I was on Windows and using Taiga (which uses Anitomy) I never had any issues like this with detection.

z411 added the duplicate label Jan 4, 2020
@z411
Owner

z411 commented Jan 4, 2020

The issues regarding the extraction of the title from the filename could be fixed by switching to Anitomy, for sure. But the issue of the wrong show being recognized is a different one; this is done by a different algorithm doing fuzzy search. This would need rework, but for now we can see if increasing the required ratio improves things.

if highest_ratio[1] > 0.7:
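
To make the effect of that threshold concrete, here is a minimal sketch of ratio-based matching. It assumes the ratio comes from something like Python's `difflib.SequenceMatcher`; the `best_match` helper is hypothetical, and the candidate title "The Promised Neverland Season 2" is an illustrative alias rather than data taken from any tracker. It shows how a sequel can clear a 0.7 cutoff when the exact title is missing from the candidate list, and how a stricter cutoff rejects it.

```python
from difflib import SequenceMatcher


def best_match(guess, candidates, min_ratio=0.7):
    """Return the candidate most similar to guess, or None if none beats min_ratio."""
    best_title, best_ratio = None, 0.0
    for title in candidates:
        # ratio() is 2*M/T: M matched characters, T total length of both strings.
        ratio = SequenceMatcher(None, guess.lower(), title.lower()).ratio()
        if ratio > best_ratio:
            best_title, best_ratio = title, ratio
    return best_title if best_ratio > min_ratio else None


guess = "The Promised Neverland"

# When the exact title is a candidate, it wins with a ratio of 1.0.
print(best_match(guess, ["The Promised Neverland",
                         "The Promised Neverland Season 2"]))

# When only the sequel is a candidate, it still clears the 0.7 cutoff...
print(best_match(guess, ["The Promised Neverland Season 2"]))

# ...but a stricter cutoff rejects it instead of redirecting to the wrong show.
print(best_match(guess, ["The Promised Neverland Season 2"], min_ratio=0.9))
```

Raising the ratio trades false matches for missed ones, so it would probably need testing against a real library before settling on a value.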

@FichteFoll
Collaborator

FichteFoll commented Jan 4, 2020

Anitomy is a C library and even though Python can interface with that, it's always a struggle to distribute - cross-platform at that. It might be easier to reimplement the algorithm in Python.

@mufeedali
Contributor Author

> You could try enabling matching for your entire library instead of just watching and planning, in case shows on your disk that you have already completed are confusing trackma.

Looks like it's enabled already... it even searches for upcoming series in my folder.

@mufeedali
Contributor Author

> The issues regarding the extraction of the title from the filename could be fixed by switching to Anitomy, for sure. But the issue of the wrong show being recognized is a different one; this is done by a different algorithm doing fuzzy search. This would need rework, but for now we can see if increasing the required ratio improves things.

I see. I'll try. Should I close this if it's a duplicate?

@molkoback
Contributor

> The issues regarding the extraction of the title from the filename could be fixed by switching to Anitomy, for sure. But the issue of the wrong show being recognized is a different one; this is done by a different algorithm doing fuzzy search. This would need rework, but for now we can see if increasing the required ratio improves things.
>
> if highest_ratio[1] > 0.7:

Have you tried other fuzzy search methods such as Levenshtein distance? The devs of fuzzywuzzy prefer it (seatgeek/fuzzywuzzy#128).
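
For reference, Levenshtein distance counts the single-character insertions, deletions, and substitutions needed to turn one string into another. The sketch below is a plain-Python version using the standard dynamic-programming recurrence; the `levenshtein` and `similarity` helpers are hypothetical names, and normalising by the longer string is only one possible choice, not something taken from fuzzywuzzy or Trackma.

```python
def levenshtein(a, b):
    """Number of single-character edits needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a                      # keep the inner row as short as possible
    previous = list(range(len(b) + 1))   # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]                    # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def similarity(a, b):
    """Case-insensitive score in [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a.lower(), b.lower()) / longest


print(levenshtein("kitten", "sitting"))
print(similarity("Dr. Stone", "Dr. Stone: Stone Wars"))
```

Under this normalisation, a title that is only a prefix of a much longer sequel title scores low, since every extra character counts as one edit.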

@mufeedali
Contributor Author

Closing it since it's a duplicate.
