
[Lens] Formula: allow "count of hits.total.value", the count of all the documents in which a field value appears, including any (*) field values #115770

Closed
dminovski0 opened this issue Oct 20, 2021 · 4 comments
Labels: Feature:Lens, feedback_needed, Team:Visualizations (Visualization editors, elastic-charts and infrastructure)

@dminovski0

dminovski0 commented Oct 20, 2021

Describe the feature:
This formula feature would return the count of all underlying documents in which a field's value appears. The field can hold a list of values, but this feature counts the matching documents, not the values. For example, if there are 3 documents in the index and one of them has the field items with the values:

"items" : [ "item1", "item2" ]

while the other two documents have no values for this field, then this feature returns a count of one (1) for items:*, instead of two (2).
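To make the distinction concrete, here is a minimal Python sketch that models the index as a plain list of dicts. The `docs` list and both helper functions are hypothetical illustrations, not a Kibana or Elasticsearch API: matching `items:*` counts each document once, while counting values counts every array element.

```python
# Hypothetical in-memory model of the 3-document index from the example above.
docs = [
    {"items": ["item1", "item2"]},
    {},  # no value for "items"
    {},  # no value for "items"
]

def count_docs_with_any_value(documents, field):
    """Count documents where the field has at least one value (like `items.keyword : *`)."""
    return sum(1 for d in documents if d.get(field))

def count_values(documents, field):
    """Count individual values, so a document with an array is counted once per element."""
    return sum(len(d.get(field, [])) for d in documents)

print(count_docs_with_any_value(docs, "items"))  # 1
print(count_values(docs, "items"))  # 2
```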

It can also retrieve the total count when the value is the wildcard operator. It is equivalent to the KQL query:

items.keyword : *

or to the wildcard query:

GET items_index/_search
{
  "query": {
    "wildcard": {
      "items.keyword": {
        "value": "*"
      }
    }
  }
}

Currently, count() can be used with a KQL filter:

count(kql='items.keyword : *')

But it returns the count of the values per row, not the total number of documents.

Describe a specific use case for the feature:

The feature can be used for calculating a percentage when a field can have multiple values. For example, if there are 3 documents, the first with the values

"items" : [ "item1", "item2" ]

the second with the values

"items" : [ "item1", "item3" ]

and the third with the values

"items" : [ "item1" ]

then dividing the count of "item1" by the count of all values gives 3 / 5 = 0.6, or 60 percent. But "item1" appears in all 3 documents, so its representation should be 100%.
The division should be between the count of "item1" and the count of the documents in which items has any value: 3 / 3 = 1, or 100%.
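The arithmetic above can be sketched in Python, assuming each document is reduced to its list of `items` values (a hypothetical in-memory model, not how Lens stores or queries data). The only difference between the two percentages is the denominator: total values versus documents with any value.

```python
# Hypothetical model of the three documents from the use case above.
docs = [
    ["item1", "item2"],
    ["item1", "item3"],
    ["item1"],
]

# Number of documents that contain "item1" (each document counted at most once).
item1_docs = sum(1 for items in docs if "item1" in items)

# Total number of values across all documents (arrays counted per element).
total_values = sum(len(items) for items in docs)

# Total number of documents that have any value for the field.
total_docs = sum(1 for items in docs if items)

value_share = item1_docs / total_values  # 3 / 5 = 0.6
document_share = item1_docs / total_docs  # 3 / 3 = 1.0
print(f"{value_share:.0%} vs {document_share:.0%}")  # 60% vs 100%
```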

@botelastic bot added the needs-team label (Issues missing a team label) Oct 20, 2021
@dminovski0 changed the title [Lens] Allow "count of hits.total.value", the count of all the documents in which a field value appears, including all field values to [Lens] Allow "count of hits.total.value", the count of all the documents in which a field value appears, including any (*) field values Oct 20, 2021
@dminovski0 changed the title [Lens] Allow "count of hits.total.value", the count of all the documents in which a field value appears, including any (*) field values to [Lens] Formula: allow "count of hits.total.value", the count of all the documents in which a field value appears, including any (*) field values Oct 20, 2021
@stratoula added the Feature:Lens, Team:Visualizations (Visualization editors, elastic-charts and infrastructure) and triage_needed labels Oct 26, 2021
@elasticmachine
Contributor

Pinging @elastic/kibana-vis-editors (Team:VisEditors)

@botelastic bot removed the needs-team label (Issues missing a team label) Oct 26, 2021
@flash1293
Contributor

> Currently, the count() feature can use:
> count(kql='items.keyword : *')
> But it returns the count of the values per row, not the total number of all documents.

This is not the case: count() already does what you suggest here. It returns the number of documents, not the number of values.

Actually, what is missing is the other way around: there is no way in Lens today to count the number of items. That is this feature request: #74910

Can you clarify which feature you are looking for here, @dminovski0?

@dminovski0
Author

dminovski0 commented Oct 28, 2021

@flash1293 We need the total number of documents; count() returns the number of documents for a particular value. These counts can then be summed with overall_sum(), but because some documents contain a list of values, those documents are counted more than once.

There are 3 documents here with the data:

"items" : [ "item1" ]
"items" : [ "item1", "item2" ]
"items" : [ "item1", "item2", "item3"]

It seems that count(kql='items.keyword : *') is filtered per row: it matches all (*) values, but only within that row.
To count the number of all values, we can use overall_sum(count(kql='items.keyword : *'))

This shows 6, which is correct: there are 3 item1, 2 item2, and 1 item3 values. We can then divide count() by overall_sum() to get a percentage that shows the share of a value among all values. For example, item1 is 50% of the items and item2 is 33%. But there are only 3 documents, and some of them contain more than one value, so counting the values gives a larger number than the number of documents.
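The per-row behaviour described above can be approximated in Python. This is a hedged sketch under two assumptions: the table is split by the terms of `items.keyword`, so each row counts the documents containing that value, and `overall_sum()` simply adds up those rows. The `docs` list is a hypothetical stand-in for the index, not Lens code.

```python
from collections import Counter

# Hypothetical model of the three documents from this comment.
docs = [
    ["item1"],
    ["item1", "item2"],
    ["item1", "item2", "item3"],
]

# Rows of a table split by the terms of items.keyword: each row counts the
# documents containing that value (a document appears once per distinct value).
per_row_count = Counter(value for items in docs for value in set(items))

# Summing the rows, as overall_sum(count()) does, counts a document once per value.
overall = sum(per_row_count.values())  # 3 + 2 + 1 = 6

for value in sorted(per_row_count):
    share = per_row_count[value] / overall
    print(f"{value}: count {per_row_count[value]}, {share:.0%} of all values")
# item1: count 3, 50% of all values
# item2: count 2, 33% of all values
# item3: count 1, 17% of all values
```

The third document contributes to three rows, which is why the sum reaches 6 even though only 3 documents exist.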

The document with the 3 values is counted 3 times, instead of one.

If we want to see in how many of the documents each item appears, the numbers are different. item1 appears in all of the documents (100%), and item2 appears in 2 of the 3 documents, so its representation is about 67%. The number of all documents is equivalent to "hits.total.value" returned by a _search query.
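A document-based share, with a denominator playing the role of "hits.total.value", could look like the following Python sketch. It assumes the same hypothetical in-memory `docs` list and a hypothetical `document_share` helper; it only illustrates the requested "overall count" semantics and is not an existing Lens formula.

```python
# Hypothetical model of the same three documents.
docs = [
    ["item1"],
    ["item1", "item2"],
    ["item1", "item2", "item3"],
]

# Stand-in for hits.total.value of a query matching any items value:
# every document is counted exactly once, regardless of how many values it has.
total_docs = sum(1 for items in docs if items)  # 3

def document_share(value):
    """Fraction of documents that contain the value at least once."""
    return sum(1 for items in docs if value in items) / total_docs

for value in ("item1", "item2", "item3"):
    print(f"{value}: {document_share(value):.1%} of documents")
# item1: 100.0% of documents
# item2: 66.7% of documents
# item3: 33.3% of documents
```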

But overall_sum() counts the same document more than once when it has a list of values.

@flash1293
Contributor

Thanks for that explanation, I get your issue now. This is captured in #94789: what you need is basically an “overall count”, which differs from “overall_sum(count())” in the case of array values. I added this case to the other issue because I hadn't thought of it before; thanks for raising this.

Closing this as a duplicate; feel free to subscribe to the other issue to track it.

4 participants