Computers need IDs, people want labels #17877
Original comment by @colings86: This looks similar to #5009 from @skearns64 |
Original comment by @markharwood: @skearns64 Looks like users need a solution here. For Graph we need the unambiguity of a unique ID for correct linking purposes but the readability of a label. Here are the options I see:
Option 2 feels like the more robust solution. It would also allow retrieval of properties other than labels e.g. image URLs (think mugshots in a policing system). Before I built the generic Graph UI we have today, I built several bespoke apps on the graph API for the datasets Wikipedia, MovieLens and BestBuy. Each of these used a call-out which took the new vertex IDs being loaded into the workspace and did an mget to load JSON docs that could be attached as metadata to the nodes and used in displays. Clearly this custom code could be replaced with some generic UI settings to define the mapping. The question is where to put this setting - is it Thoughts? |
Original comment by @skearns64: @markharwood - yeah, I agree that this would be super useful. I'd love to hear from @epixa about how he sees #5009, and whether he feels that we should solve this (resolve IDs to strings for display) in Kibana generically via that issue (or another), or whether that is far enough out, or this feels like a separate enough use case, that we should consider adding it directly to the Graph UI. |
Original comment by @markharwood: The way I see a basic label lookup being used in Graph is to add the following fields to define the lookup index: !LINK REDACTED In the StackOverflow data shown in the example I am graphing documents of the type "Post" (to draw out the connections between users and tags, e.g. who might be an expert in #elasticsearch). Tags conveniently serve as both an ID and a label, but unique user IDs are needed for graphing and we need to show a user name for them to be readable. The StackOverflow data has a separate index for users containing their display names, bios, image URLs etc., and we would need to take the IDs we use to identify vertices uniquely and look up the user name from this index. |
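As a rough illustration of the lookup described above, here is a minimal sketch in Python. It assumes the lookup index is addressed in the request URL (for example `GET /users/_mget`) and that the label lives in a field called `display_name`; the index name, field name and helper names are all hypothetical and not taken from the Graph UI. It only builds the multi-get request body and interprets a response of the documented shape, without talking to a cluster.

```python
from typing import Any


def build_mget_body(vertex_ids: list[str]) -> dict[str, Any]:
    """Build a multi-get request body for the given vertex IDs.

    Assumes the lookup index is named in the request URL,
    so the body only needs the list of document IDs.
    """
    return {"ids": vertex_ids}


def labels_from_mget(response: dict[str, Any], label_field: str) -> dict[str, str]:
    """Map each vertex ID to a display label.

    Falls back to the raw ID when the document was not found
    or does not contain the label field.
    """
    labels: dict[str, str] = {}
    for doc in response.get("docs", []):
        doc_id = doc["_id"]
        source = doc.get("_source", {}) if doc.get("found") else {}
        labels[doc_id] = source.get(label_field, doc_id)
    return labels


# Hypothetical response for two vertex IDs, one of which has no user doc.
sample_response = {
    "docs": [
        {"_id": "u1", "found": True, "_source": {"display_name": "Alice"}},
        {"_id": "u2", "found": False},
    ]
}
labels = labels_from_mget(sample_response, "display_name")
```

Falling back to the ID keeps every vertex displayable even when the lookup index is incomplete.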
Original comment by @skearns64: @markharwood - I think that's a workable approach if we only wanted it for Graph, but I expect that we'll want some sort of control like this for Kibana more generally, perhaps as part of index pattern definition? cc @epixa |
Original comment by @markharwood: I spoke to @spalger last night and he suggested keeping this to Graph only for now (I thought he was adding a comment to this ticket as we spoke so it may have wound up somewhere else?) |
Original comment by @epixa: I agree that this should just go into graph right now. This is something that I want to tackle in Kibana, but if we wait to do that, it could be months before we get around to it. |
Original comment by @mikeh-elastic: In my customer discussions, people have asked to use the UID for graph exploration while displaying something like a first+last name and/or an image URL for the icon. |
Original comment by @markharwood: Had a customer discussion and they want to map out their infrastructure of nodes/services and show dependencies from a store of health-check events (e.g. "service A called service B OK"). So to expand the scope of this ticket from "computers want IDs, people want labels", this should perhaps be renamed "event stores hold minimal info, people want detail on the entities they reference". As a foundation it would be useful to assume that each vertex loaded into the graph could optionally have a looked-up JSON structure with many fields that we could use to populate various parts of the graph UI. |
Original comment by @markharwood: I'm going off this idea of attaching labels only at the last minute when IDs need displaying. It helps avoid a common ETL step (enriching incoming events with reference data looked up by ID) but a normalized event store with only IDs makes basic free-text searching difficult and also prevents the proposed LINK REDACTED where the combination of person, address and company name labels held in docs give relevance ranking algorithms lots of useful data to chew on. If we adopt the convention I outlined of combining ID and label e.g
If this is a common enough convention the graph UI could have special treatment for "ID plus label" tokens:
We would need to add a configuration option in graph to declare a field's terms as "containing IDs and labels" and then all of the above functionality could be unlocked. |
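To make the "ID plus label" idea concrete, here is a minimal sketch in Python of how a UI might split such a token for display while still linking on the full term. The thread does not show the actual token format, so the `|` separator, the `Vertex` type and the `parse_term` helper are assumptions made purely for illustration.

```python
from typing import NamedTuple

SEPARATOR = "|"  # hypothetical delimiter; the thread does not specify one


class Vertex(NamedTuple):
    term: str   # full indexed term, kept for unambiguous linking
    label: str  # human-readable part, used only for display


def parse_term(term: str, separator: str = SEPARATOR) -> Vertex:
    """Split an 'ID plus label' token into a vertex.

    Terms without a separator (or with an empty label part)
    are displayed unchanged.
    """
    _id, sep, label = term.partition(separator)
    return Vertex(term=term, label=label if sep and label else term)


# Two different people with the same name stay distinct as terms
# but are shown with the same readable label.
a = parse_term("u42|John Smith")
b = parse_term("u99|John Smith")
```

Because the full term is kept on the vertex, queries that expand the graph can keep using the unique value while only the rendering changes.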
Original comment by @markharwood: Following a discussion with @colings86 we outlined the following possible options for generally associating labels with IDs:
A fourth option is to try to associate a label for a given field from the same (non-nested) Lucene doc but this is not practical for a variety of reasons. For my money option 1 is the least-worst scenario and so we could phase this in as follows:
We continually butt heads with the need for hard IDs and softer, human-understandable labels so we need to find a way through this. |
Original comment by @clintongormley: @markharwood an extension of option 1 would be to set up an analyzer like the following (requires elastic/elasticsearch#18064 to be added to work properly):
Then you can use the Obviously this doesn't just work out of the box, which is a downside. |
Original comment by @markharwood: Thanks for the mapping, @clintongormley !
The problem is most of our analytics (Kibana bar charts, Graph UI...) is on agg results from fielddata/docvalues so accessing _source of individual docs for display purposes is out of the question. |
Original comment by @clintongormley: Actually, you could store the |
Original comment by @markharwood:
I think the basis of my proposal is we go one step further - we have a type called |
Original comment by @skearns64: Do we have field-level metadata in the mappings? I wonder if a middle-road here would be to support metadata on the field level in the mappings. This metadata could be used by default in native ES to explain "magic" like how dynamically detected string field Maybe this is crazy though :) |
Original comment by @markharwood:
I don't believe so but we have doctype-level metadata which can be arbitrarily complex JSON used to describe the doc as a whole. We could use that to refer to fields by name e.g.
Obviously we'd need to work on what convention we might want to adopt for use in there. It's important to remember arrays of things e.g. products in an order cannot easily keep the relationship between the various product IDs and associated product names without resorting to nested docs or complex script logic about same-array-positions. This is why I advocate a convention of combined ID+label tokens in the source docs and mapping logic to support splitting them. |
Pinging @elastic/kibana-data-discovery (Team:DataDiscovery) |
Pinging @elastic/kibana-visualizations @elastic/kibana-visualizations-external (Team:Visualizations) |
Closing this because it's not planned to be resolved in the foreseeable future. It will be tracked in our Icebox and will be re-opened if our priorities change. Feel free to re-open if you think it should be melted sooner. |
Original comment by @markharwood:
This old chestnut is a general concern with Kibana and specifically an issue in Graph.
The unit of our analysis is terms (terms aggs, significant_terms etc) and for this reason they need to be unique:
Consequently, to avoid confusion, unique IDs are generated to represent these entities and we must index those for analysis BUT - when visualizing data in graph UI or elsewhere people typically don't want to see the ugly IDs and want useful labels instead. This translation service could be a configurable feature of graph ("the label for ID field X can be found in index Y and field Z"). This translation can be implemented as a single multi-get operation when new IDs are loaded into the graph workspace. Equally this could be a general feature as part of Kibana for use in all visualizations.
In looking at Panama papers I was forced to index terms that were both an ID and a label - the ID was required to avoid merging multiple "John Smith"s into one but the label was also required to be useful to end users. This made for an ugly UI and added code to the ingest process. The bank client forked the graph UI to trim the ID part of the term from the displayed terms in order to make the UI less ugly.
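The merging problem can be shown with a small, self-contained sketch in Python. The records and field names below are made up for illustration and stand in for what a terms aggregation would count; they are not taken from the Panama papers data.

```python
from collections import Counter

# Hypothetical records: two different people share the same name.
records = [
    {"person_id": "p1", "name": "John Smith"},
    {"person_id": "p2", "name": "John Smith"},
    {"person_id": "p1", "name": "John Smith"},
]

# Counting on the label merges both people into a single bucket.
by_label = Counter(r["name"] for r in records)

# Counting on the unique ID keeps them apart, at the cost of readability.
by_id = Counter(r["person_id"] for r in records)
```

Here `by_label` holds one bucket for "John Smith" while `by_id` holds two, which is why the analysis has to run on IDs and the label has to be attached afterwards.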