Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Implement Query lang #606

Merged
merged 67 commits into from
Dec 6, 2024
Merged

Conversation

Computerdores
Copy link
Collaborator

@Computerdores Computerdores commented Nov 27, 2024

Implements #600

Current supported Grammar:

ORList         = ANDList ( "OR", ANDList)* ;
ANDList        = Term ( ["AND"] Term )* ;
Term           = Constraint | "(", ORList, ")" | "NOT", Term ;

Constraint     = [ConstraintType, ":"], Literal, "[", PropertyList, "]" ;

ConstraintType = "tag" | "tag_id" | "mediatype" | "filetype" | "path" | "special" ; (* not case sensitive *)
PropertyList   = Property, (",", Property)* ;
Property       = ULITERAL, "=", Literal ;
Literal        = ULITERAL | QLITERAL ;

Notes:

  • path:<glob> searches entries using a UNIX-Style GLOB
  • special:untagged searches for entries that don't have any tags

Limitations / Future Work:

  • Sub-Tags do not work yet
  • autocompletion could use an overhaul
  • When query is invalid user should be informed what part of it is wrong
    • also warn / fail when searching for tag / tag id that doesn't exist
  • Backwards and Forwards buttons don't work

Todo:

  • Try to put tag id lookup into subqueries that return tag ids. This should reduce the number of joins in the main query from 4 to 2 1. That might drastically improve performance
  • let chained NOTs cancel eachother out
  • use relational division queries on both negated and non negated Tag constraints
  • fix bug that @CyanVoxel found and @python357-1 figured out the root of

@python357-1
Copy link
Collaborator

python357-1 commented Nov 27, 2024

NOT operator also seems to not work (NOT tag_id:1000 returns no results) in a library with entries that dont have tag id 1000

@Computerdores
Copy link
Collaborator Author

TODO: port untagged and empty or equivalents

@CyanVoxel CyanVoxel added Type: Enhancement New feature or request Type: QoL A quality of life (QoL) enhancement or suggestion Priority: High An important issue requiring attention TagStudio: Search The TagStudio search engine labels Nov 28, 2024
@CyanVoxel CyanVoxel added this to the Alpha v9.5 (Post-SQL) milestone Nov 28, 2024
@CyanVoxel CyanVoxel linked an issue Nov 28, 2024 that may be closed by this pull request
3 tasks
@Computerdores
Copy link
Collaborator Author

NOT operator also seems to not work (NOT tag_id:1000 returns no results) in a library with entries that dont have tag id 1000

NOT is now implemented


def visit_not(self, node: Not) -> ColumnExpressionArgument:
return ~Entry.id.in_(
# TODO TSQLANG there is technically code duplication from Library.search_library here
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really sure on whether / how I should deal with this duplication, opinions would be appreciated

@CyanVoxel CyanVoxel removed the Status: Review Needed A review of this is needed label Dec 4, 2024
@Computerdores
Copy link
Collaborator Author

Computerdores commented Dec 4, 2024

I have now written every test I could think of and then some.
Test Coverage of query_lang/* and visitors.py is at 92%, I think this should be good enough.

(Going through the code you might see # pragma: nocover, this marks lines as excluded from the coverage calculation. I used this to exclude lines that will only ever be run during development / debugging)

Copy link
Member

@CyanVoxel CyanVoxel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fantastic! Works great on my end, and I don't even have much in terms of review feedback left at this point.

Thank you so much for your amazing work on this!!

@Computerdores
Copy link
Collaborator Author

No idea what is going with ruff and mypy, 9d357b0 passed both and on 4265fde they suddenly fail in files that weren't even modified

@CyanVoxel
Copy link
Member

No idea what is going with ruff and mypy, 9d357b0 passed both and on 4265fde they suddenly fail in files that weren't even modified

Hmm, ruff was updated on main recently but I don't see how the mypy error is occurring suddenly

@VasigaranAndAngel
Copy link
Collaborator

No idea what is going with ruff and mypy, 9d357b0 passed both and on 4265fde they suddenly fail in files that weren't even modified

it's strange. i didn’t encounter any errors on my system (windows and ubuntu 22.04) using the same ruff and mypy versions as specified in the workflow. this could be due to:

from src.core.library.alchemy import Library
from src.core.library.alchemy.library import Entry, LibraryStatus

Entry and LibraryStatus are imported directly by importing library and not using the __init__.py ?
in src/core/library/alchemy/, there’s an __init__.py file. (contains __all__ attribute also):

from .enums import ItemType
from .library import Library
from .models import Entry, Tag

__all__ = ["Entry", "Library", "Tag", "ItemType"]

isn’t that odd? but it’s just a weird assumption for a weird issue. nvm.

ruff was updated on main recently

the ruff and mypy versions are hard coded in the workflows, so they aren’t loaded from requirements.txt,

version: 0.6.4
args: 'format --check'
ruff-check:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: chartboost/ruff-action@v1
with:
version: 0.6.4
args: 'check'

python -m pip install --upgrade uv
uv pip install --system -r requirements.txt
uv pip install --system mypy==1.11.2
mkdir tagstudio/.mypy_cache

@CyanVoxel CyanVoxel added the Status: Mergeable The code is ready to be merged label Dec 6, 2024
@CyanVoxel CyanVoxel merged commit a535ed1 into TagStudioDev:main Dec 6, 2024
5 checks passed
@CyanVoxel CyanVoxel removed the Status: Mergeable The code is ready to be merged label Dec 6, 2024
DandyDev01 pushed a commit to DandyDev01/TagStudio that referenced this pull request Dec 13, 2024
* add files

* fix: term was parsing ANDList instead of ORList

* make mypy happy

* ruff format

* add missing todo

* add more constraint types

* add parent property to AST

* add BaseVisitor class

* make mypy happy

* add __init__.py

* Revert "make mypy happy"

This reverts commit 926d0dd.

* refactoring and fixes

* rudimentary search field integration

* fix: check for None properly

* fix: Entries without Tags are now searchable

* make mypy happy

* Revert "fix: Entries without Tags are now searchable"

This reverts commit 19b40af.

* fix: changed joins to outerjoins and added missing outerjoin

* use query lang instead of tag_id FIlterState

* add todos

* fix: remove uncecessary line that broke search when searching for exact name

* fix tag search

* refactoring

* fix: path now uses GLOB operator for proper GLOBs

* refactoring: remove FilterState.id and implement Library.get_entry_full as replacement

* fix: use default value notation instead of if None statement in __post_init__

* remove obsolete Search Mode UI and related code

* ruff fixes

* remove obsolete tests

* fix: item_thumb didn't query entries correctly

* fix: search_library now correctly returns the number of *unique* entries

* make mypy happy

* implement NOT

* remove obsolete filename search

* remove summary as it is not applicable anymore

* finish refactoring of FilterState

* implement special:untagged

* fix: make mypy happy

* Revert changes to search_tags in favor of changes from TagStudioDev#604

* fix: also port test changes

* fix: remove unneccessary import

* fix: remove unused dataclass

* fix: AND now works correctly with tags

* simplify structure of parsed AST

* add performance logging

* perf: Improve performance of search by reducing number of required joins from 4 to 1

* perf: double NOT is now optimized out of the AST

* fix: bug where pages would show less than the configured number of entries

* Revert "add performance logging"

This reverts commit c3c7d75.

* fix: tag_id search was broken

* somewhat adapt the existing autocompletion to this PR

* perf: Use Relational Division Queries to improve Query Execution Time

* fix: raise Exception so as to not fail silently

* fix: Parser bug broke parentheses

* little bit of clean up

* remove unnecessary comment

* add library for testing search

* feat: add basic tests

* fix: and queries containing just one tag were broken

* chore: remove debug code

* feat: more tests

* refactor: more consistent name for variable

Co-authored-by: Travis Abendshien <[email protected]>

* fix: ruff check complaint over double import

---------

Co-authored-by: Travis Abendshien <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Priority: High An important issue requiring attention TagStudio: Search The TagStudio search engine Type: Enhancement New feature or request Type: QoL A quality of life (QoL) enhancement or suggestion
Projects
Status: ✅ Done
Development

Successfully merging this pull request may close these issues.

[Feature Request]: FINAL search engine features for 9.5
4 participants