Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[full-ci] enhancement: add more kql spec tests and simplify ast normalization #7254

Merged
merged 8 commits into from
Sep 11, 2023

Conversation

fschade
Copy link
Contributor

@fschade fschade commented Sep 8, 2023

Description

follow up pr for the ocis kql integration, @butonic found a ms pdf spec which contains even more detail than the KQL docs page.

https://msopenspecs.azureedge.net/files/MS-KQL/%5bMS-KQL%5d.pdf

With that in mind, it looks we understood (or tried to ;)) some parts of the implicit operator node connections wrong.
There are still some parts of which contradict each other, but most of that should be now spec compliant.

following queries are now exactly covered as described in the spec:

* 2.1.2 AND Operator 
* 2.1.6 NOT Operator
* 2.1.8 OR Operator
* 2.1.12 Parentheses
* 2.3.5 Date Tokens
    * Human tokens not implemented
* 3.1.11 Implicit Operator
* 3.1.12 Parentheses
* 3.1.2 AND Operator
* 3.1.6 NOT Operator
* 3.1.8 OR Operator
* 3.2.3 Implicit Operator for Property Restriction
* 3.3.1.1.1 Implicit AND Operator
* 3.3.5 Date Tokens

beside that it also includes:

  • support for - and + bool operators
  • error handling for:
    • queries that start with an binary operator
    • allow only FreeTextKeywordNodes children for GroupNodes with key
  • toolify pigeon
  • optimize grammar and parser
  • better dateTime detection/handling

Related Issue

Motivation and Context

with the found spec, it's more clear how some parts are connected to each other, as we said KQL will be our primary query language, therefore we should respect the spec as good as we can.

How Has This Been Tested?

  • unit tests
  • acceptance tests
  • e2e tests

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Technical debt
  • Tests only (no source changes)

Checklist:

  • Code changes
  • Unit tests added
  • Acceptance tests added
  • Documentation ticket raised:

@fschade fschade force-pushed the optimize-kql branch 2 times, most recently from 48d9ced to 3421942 Compare September 8, 2023 12:10
@fschade fschade marked this pull request as ready for review September 8, 2023 12:19
@2403905
Copy link
Contributor

2403905 commented Sep 8, 2023

It is not related to code.
But related to docs and it looks counter...

3.3.1.1.1 Implicit AND Operator
....
The following queries match the same items:
cat +dog -fox
cat AND dog AND NOT fox
3.3.1.1.2 Implicit OR Operator
....
The following queries match the same items:
cat +dog -fox
(NOT fox) AND (dog OR (dog AND cat))

Why do two equal expressions become different?

@jnweiger
Copy link
Contributor

jnweiger commented Sep 8, 2023

(NOT fox) AND (dog OR (dog AND cat))

Which is reducable. Isn't it? (dog OR (dog AND cat)) is logically equivalent to (dog)
The entire "implicit OR" thing sounds like broken-by-design to me.

@fschade
Copy link
Contributor Author

fschade commented Sep 8, 2023

@2403905, writing from my mobile, I hope it makes sense, to be exact, the docs say

cat +dog -fox
(NOT fox) AND (dog OR (dog AND cat))

are the same, which translates to:

no fox items
all dog items
all cat times but none of them without also having  a dog prop

we will never be able to translate to the exact same query from the docs, but we can creat queries that mean the same.

but i wonder too, I expect

cat and dog and not fox

when reading reading

cat +dog -fox

and we do:

&ast.StringNode{Value: "cat"},
&ast.OperatorNode{Value: kql.BoolAND},
&ast.StringNode{Value: "dog"},
&ast.OperatorNode{Value: kql.BoolAND},
&ast.OperatorNode{Value: kql.BoolNOT},
&ast.StringNode{Value: "fox"},

i picked the one that made most sense, but thanks for birds eyes, we could discuss it together, i expect it's a typo in the docs, makes no sense to me at all.

cat AND dog AND NOT fox

@fschade
Copy link
Contributor Author

fschade commented Sep 9, 2023

@2403905 i had to touch the parser again, thanks to @ScharfViktor ive found a bug in which child nodes were linked with an OR instead of an AND, i found the part in the docs, implemented it and we now can decide on a node bases how connecting edges get handled.

still, the sound strange to me, but i hope the spec is correct:

author:"John Smith" AND author:"Jane Smith"

is the same as

author:("John Smith" "Jane Smith")

but

author:"John Smith" author:"Jane Smith"

is the same as

author:"John Smith" OR author:"Jane Smith"

but what about

("John Smith" "Jane Smith") cat

is it?

"John Smith" AND "Jane Smith" AND cat

or is it?

"John Smith" OR "Jane Smith" AND cat

we now have

"John Smith" AND "Jane Smith" AND cat

wich sounds good to me, what do you all mean?

connecting nodes (with edges) seem straight forward when not using group, the default connection for nodes with the same node is always OR. THis only applies for first level nodes, for grouped nodes it is defined differently. The KQL docs are saying, nodes inside a grouped node, with the same key are connected by a AND edge.
@fschade
Copy link
Contributor Author

fschade commented Sep 9, 2023

also some optimizations down the road.

Before:

❯ go test -bench=. -benchtime=10s -count 5
goos: darwin
goarch: arm64
pkg: github.com/owncloud/ocis/v2/services/search/pkg/query/kql
BenchmarkParse-10          25560            470035 ns/op          329406 B/op       7580 allocs/op
BenchmarkParse-10          25639            465516 ns/op          329405 B/op       7580 allocs/op
BenchmarkParse-10          25536            466214 ns/op          329408 B/op       7580 allocs/op
BenchmarkParse-10          25803            472425 ns/op          329405 B/op       7580 allocs/op
BenchmarkParse-10          25431            467778 ns/op          329405 B/op       7580 allocs/op
PASS
ok      github.com/owncloud/ocis/v2/services/search/pkg/query/kql       83.663s

After:

❯ go test -bench=. -benchtime=10s -count 5
goos: darwin
goarch: arm64
pkg: github.com/owncloud/ocis/v2/services/search/pkg/query/kql
BenchmarkParse-10          48206            247459 ns/op          285539 B/op       6306 allocs/op
BenchmarkParse-10          48243            247084 ns/op          285539 B/op       6306 allocs/op
BenchmarkParse-10          48393            245621 ns/op          285539 B/op       6306 allocs/op
BenchmarkParse-10          48474            246485 ns/op          285539 B/op       6306 allocs/op
BenchmarkParse-10          48990            245175 ns/op          285540 B/op       6306 allocs/op
PASS
ok      github.com/owncloud/ocis/v2/services/search/pkg/query/kql       72.235s

@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 5 Code Smells

87.5% 87.5% Coverage
0.0% 0.0% Duplication

@fschade fschade merged commit c0553c7 into master Sep 11, 2023
@delete-merged-branch delete-merged-branch bot deleted the optimize-kql branch September 11, 2023 11:49
ownclouders pushed a commit that referenced this pull request Sep 11, 2023
…lization (#7254)

* enhancement: add more kql spec tests and simplify ast normalization

* enhancement: kql parser error if query starts with AND

* enhancement: add kql docs and support for date and time only dateTimeRestriction queries

* enhancement: add the ability to decide how kql nodes get connected

connecting nodes (with edges) seem straight forward when not using group, the default connection for nodes with the same node is always OR. THis only applies for first level nodes, for grouped nodes it is defined differently. The KQL docs are saying, nodes inside a grouped node, with the same key are connected by a AND edge.

* enhancement: explicit error handling for falsy group nodes and queries with leading binary operator

* enhancement: use optimized grammar for kql parser and toolify pigeon

* enhancement: simplify error handling

* fix: kql implicit 'AND' and 'OR' follows the ms html spec instead of the pdf spec
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants