From 5fb875a32be06c831bda690912e18644e52768d0 Mon Sep 17 00:00:00 2001 From: Peter Szalai Date: Mon, 3 Oct 2022 09:35:46 +0200 Subject: [PATCH] feat(docs): Improve documentation on Search (#5889) --- docs/how/search.md | 141 ++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 127 insertions(+), 14 deletions(-) diff --git a/docs/how/search.md b/docs/how/search.md index 6cef5cf889311..73d4ec8f1c2c3 100644 --- a/docs/how/search.md +++ b/docs/how/search.md @@ -1,43 +1,145 @@ -# Search Guide +import FeatureAvailability from '@site/src/components/FeatureAvailability'; -## Introduction +# About DataHub Search -The search bar is one of the means of finding data in Datahub. In this document, we discuss more effective ways of finding information beyond doing a standard keyword search. This is because keyword searches can return results from almost any part of an entity. + -### Search in Specific Fields + + + + +The **search bar** is an important mechanism for discovering data assets in DataHub. From the search bar, you can find Datasets, Columns, Dashboards, Charts, Data Pipelines, and more. Simply type in a term and press 'enter'. + +

+ +

+ +**Advanced queries** and the **filter sidebar** help fine-tune queries. For programmatic users, DataHub provides a **GraphQL API** as well. + +## Search Setup, Prerequisites, and Permissions + +Search is available for all users. Although Search works out of the box, the more relevant data you ingest, the better the results are. + +## Using Search + +Searching is as easy as typing in relevant business terms and pressing 'enter' to view matching data assets. + +By default, search terms will match against different aspects of a data asset. This includes asset names, descriptions, tags, terms, owners, and even specific attributes like the names of columns in a table. + + +### Filters + +The filters sidebar sits on the left-hand side of search results, and lets users find assets by drilling down. You can quickly filter by Data Platform (e.g. Snowflake), Tags, Glossary Terms, Domain, Owners, and more with a single click. + +

+ +

+ +### Results + +Search results appear ranked by their relevance. In self-hosted DataHub, ranking is based on how closely the query matched textual fields of an asset and its metadata. In Managed DataHub, ranking is based on a combination of textual relevance, usage (queries / views), and change frequency. + +With better metadata come better results. Learn more about ingesting technical metadata in the [metadata ingestion](../../metadata-ingestion/README.md) guide. + +### Advanced queries + +The search bar supports advanced queries with pattern matching, logical expressions and filtering by specific field matches. + +The following examples are in the format of +X: *typical question* : ```what to key in search bar```. [sample url](https://example.com) Wildcard characters can be added to the search terms as well. These examples are non exhaustive and using Datasets as a reference. -I want to: -1. *Find a dataset with the word **mask** in the name* : +If you want to: +1. Find a dataset with the word **mask** in the name: ```name: *mask*``` [Sample results](https://demo.datahubproject.io/search?page=1&query=name%3A%20%2Amask%2A) This will return entities with **mask** in the name. Names tends to be connected by other symbols, hence the wildcard symbols before and after the word. -2. *Find a dataset with a property, **encoding*** +2. Find a dataset with a property, **encoding** ```customProperties: encoding*``` [Sample results](https://demo.datahubproject.io/search?page=1&query=customProperties%3A%20encoding%2A) Dataset Properties are indexed in ElasticSearch the manner of key=value. Hence if you know the precise key-value pair, you can search using ```key=value```. However, if you only know the key, you can use wildcards to replace the value and that is what is being done here. -3. *Find a dataset with a column name, **latitude*** +3. 
Find a dataset with a column name, **latitude** ```fieldPaths: latitude``` [Sample results](https://demo.datahubproject.io/search?page=1&query=fieldPaths%3A%20latitude) fieldPaths is the name of the attribute that holds the column name in Datasets. -4. *Find a dataset with the term **latitude** in the field description* +4. Find a dataset with the term **latitude** in the field description ```editedFieldDescriptions: latitude OR fieldDescriptions: latitude``` [Sample results](https://demo.datahubproject.io/search?page=1&query=editedFieldDescriptions%3A%20latitude%20OR%20fieldDescriptions%3A%20latitude) Datasets has 2 attributes that contains field description. fieldDescription comes from the SchemaMetadata aspect, while editedFieldDescriptions comes from the EditableSchemaMetadata aspect. EditableSchemaMetadata holds information that comes from UI edits, while SchemaMetadata holds data from ingestion of the dataset. -5. *Find a dataset with the term **logical** in the dataset description* +5. Find a dataset with the term **logical** in the dataset description ```editedDescription: *logical* OR description: *logical*``` [Sample results](https://demo.datahubproject.io/search?page=1&query=editedDescription%3A%20%2Alogical%2A%20OR%20description%3A%20%2Alogical%2A) Similar to field descriptions, dataset descriptions can be found in 2 aspects, hence the need to search 2 attributes. -6. *Find a dataset which reside in one of the browsing folders, for instance, the **hive** folder* +6. Find a dataset that resides in one of the browsing folders, for instance, the **hive** folder ```browsePaths: *hive*``` [Sample results](https://demo.datahubproject.io/search?page=1&query=browsePaths%3A%20%2Ahive%2A) BrowsePath is stored as a complete string, for instance ```/datasets/prod/hive/SampleKafkaDataset```, hence the need for wildcards on both ends of the term to return a result. -## Where to find more information? + + +### Videos + +**What can you do with DataHub?** + +

+ +

+ + +### GraphQL + +* [searchAcrossEntities](https://datahubproject.io/docs/graphql/queries/#searchacrossentities) +* You can try out the API on the demo instance's public GraphQL interface: [here](https://demo.datahubproject.io/api/graphiql) + +The same GraphQL API that powers the Search UI can be used +for integrations and programmatic use-cases. + +``` +# Example query +{ + searchAcrossEntities( + input: {types: [], query: "*", start: 0, count: 10, filters: [{field: "fieldTags", value: "urn:li:tag:Dimension"}]} + ) { + start + count + total + searchResults { + entity { + type + ... on Dataset { + urn + type + platform { + name + } + name + } + } + } + } +} +``` + + +### DataHub Blog +* [Using DataHub for Search & Discovery](https://blog.datahubproject.io/using-datahub-for-search-discovery-fa309089be22) + +## FAQ and Troubleshooting + +**How are the results ordered?** + +The order of the search results is based on the weight that DataHub's search algorithm assigns to each result. The current algorithm in OSS DataHub is based on a text-match score from Elasticsearch. + +**Where to find more information?** + The sample queries here are non exhaustive. [The link here](https://demo.datahubproject.io/tag/urn:li:tag:Searchable) shows the current list of indexed fields for each entity inside Datahub. Click on the fields inside each entity and see which field has the tag ```Searchable```. However, it does not tell you the specific attribute name to use for specialized searches. One way to do so is to inspect the ElasticSearch indices, for example: ```curl http://localhost:9200/_cat/indices``` returns all the ES indices in the ElasticSearch container. @@ -121,5 +223,16 @@ example information of a dataset: }, ``` + + +*Need more help? Join the conversation in [Slack](http://slack.datahubproject.io)!* +### Related Features +* [Metadata ingestion framework](../../metadata-ingestion/README.md)