fix(elasticsearch_index): create datahub_usage_event index where datahub_analytics_enabled set to false #5974

Merged

GyuhoonK
Contributor

@GyuhoonK GyuhoonK commented Sep 18, 2022

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

Setting global.datahub_analytics_enabled to false disables the DataHub Usage Analytics feature. However, when global.datahub_analytics_enabled is false, the elasticsearch-setup job doesn't create any index. As a result, not only DataHub Usage Analytics but also Data Landscape Summary becomes unavailable, because when you click the Analytics tab, GMS queries ElasticSearch for the index named datahub_usage_event (more precisely, a data_stream index).

So developers who want to turn off only DataHub Usage Analytics and keep Data Landscape Summary have no choice: they must turn both off or both on.

If the elasticsearch-setup job created a plain datahub_usage_event index (not a data_stream index) when global.datahub_analytics_enabled is false, we could keep Data Landscape Summary alive.
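The proposed behaviour can be sketched roughly as follows. This is only an illustration, not the actual contents of create-indices.sh: the function names `usage_event_target` and `usage_event_request`, the optional prefix argument, and the base URL are assumptions made for the example. It only builds the request line and does not contact ElasticSearch.

```shell
#!/bin/sh
# Hypothetical sketch: these helper names are illustrative and are not
# taken from create-indices.sh.

# Print the ElasticSearch path that should back datahub_usage_event
# for a given value of the analytics flag.
usage_event_target() {
  analytics_enabled="$1"
  prefix="${2:-}"
  if [ "$analytics_enabled" = "true" ]; then
    # Analytics on: a data stream (needs a matching index template).
    echo "_data_stream/${prefix}datahub_usage_event"
  else
    # Analytics off: a plain, empty index so that searches still resolve.
    echo "${prefix}datahub_usage_event"
  fi
}

# Build (but do not run) the PUT request for the chosen target.
usage_event_request() {
  base_url="$1"
  echo "curl -s -XPUT ${base_url}/$(usage_event_target "$2")"
}
```

For example, `usage_event_request http://localhost:9200 false` prints a request that creates the plain index, while passing `true` targets the data stream instead.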

Thank you.

@github-actions github-actions bot added the devops PR or Issue related to DataHub backend & deployment label Sep 18, 2022
@github-actions

github-actions bot commented Sep 18, 2022

Unit Test Results (build & test)

584 tests  ±0   580 ✔️ ±0   13m 18s ⏱️ +25s
143 suites ±0       4 💤 ±0 
143 files   ±0       0 ±0 

Results for commit 19263ba. ± Comparison against base commit 325b959.

♻️ This comment has been updated with latest results.

@pedro93 pedro93 self-assigned this Sep 19, 2022
@anshbansal anshbansal added the community-contribution PR or Issue raised by member(s) of DataHub Community label Sep 20, 2022
@swaroopjagadish swaroopjagadish added product PR or Issue related to the DataHub UI/UX and removed devops PR or Issue related to DataHub backend & deployment labels Sep 20, 2022
@swaroopjagadish
Contributor

@jjoyce0510 Please chime in here for next steps

@pedro93
Collaborator

pedro93 commented Sep 20, 2022

Hello @GyuhoonK

Could you explain the reasoning behind the PR?
Why would you want Data Landscape Summary but not DataHub Analytics?

@GyuhoonK
Contributor Author

@pedro93
ElasticSearch connected to DataHub cannot use data_stream feature in my case.
X-pack is not installed into my ElasticSearch.
So I have to turn off DataHub Analytics.

@pedro93
Collaborator

pedro93 commented Sep 21, 2022

Can you clarify what you mean by:
"ElasticSearch connected to DataHub cannot use data_stream feature in my case."

"X-pack is not installed into my ElasticSearch. So I have to turn off DataHub Analytics."
X-pack is a security and monitoring module for Elastic. Why does not having it mean you have to turn off DataHub's Analytics page?

@GyuhoonK
Contributor Author

Sorry for the confusion.
I thought ElasticSearch could use data_stream only if X-pack is installed.
As you said, data_stream is a basic feature in the current version (8.2).
My ElasticSearch is version 7.10, which doesn't include data_stream as a basic feature; X-pack is needed (ElasticSearch Guide [7.10]).

@pedro93
Collaborator

pedro93 commented Sep 21, 2022

How does wanting to use Data Landscape Summary and not DataHub Analytics relate to wanting to use ElasticSearch's data_stream?

@GyuhoonK
Contributor Author

GyuhoonK commented Sep 21, 2022

Actually, it is related to the datahub_usage_event index.
Data Landscape Summary and DataHub Analytics are both part of the Analytics tab, which forces me to turn both of them off.
I agree that if global.datahub_analytics_enabled is set to false, the datahub_usage_event index is not created and DataHub Analytics is disabled.
I think Data Landscape Summary is unrelated to global.datahub_analytics_enabled. It only shows a summary, not user logs.
However, when I click Analytics while global.datahub_analytics_enabled is set to false, meaning ElasticSearch has no datahub_usage_event index, the web UI shows an error.
image
I want to turn off only DataHub Analytics, but then I cannot use Data Landscape Summary either. This is the problem.
I also found this error log in the GMS pods:

│ java.lang.RuntimeException: Search query failed:                                                                                                                                                                                                                                                                    
│     at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.executeAndExtract(AnalyticsService.java:265)                                                                                                                                                                                                 
│     at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.getTimeseriesChart(AnalyticsService.java:99)                                                                                                                                                                                                 
│     at com.linkedin.datahub.graphql.analytics.resolver.GetChartsResolver.getProductAnalyticsCharts(GetChartsResolver.java:77)                                                                                                                                                                                       
│     at com.linkedin.datahub.graphql.analytics.resolver.GetChartsResolver.get(GetChartsResolver.java:50)                                                                                                                                                                                                             
│     at com.linkedin.datahub.graphql.analytics.resolver.GetChartsResolver.get(GetChartsResolver.java:37)                                                                                                                                                                                                             
│     at graphql.execution.ExecutionStrategy.fetchField(ExecutionStrategy.java:270)                                                                                                                                                                                                                                   
│     at graphql.execution.ExecutionStrategy.resolveFieldWithInfo(ExecutionStrategy.java:203)                                                                                                                                                                                                                         
│     at graphql.execution.AsyncExecutionStrategy.execute(AsyncExecutionStrategy.java:60)                                                                                                                                                                                                                             
│     at graphql.execution.Execution.executeOperation(Execution.java:165)                                                                                                                                                                                                                                             
│     at graphql.execution.Execution.execute(Execution.java:104)                                                                                                                                                                                                                                                      
│     at graphql.GraphQL.execute(GraphQL.java:557)                                                                                                                                                                                                                                                                    
│     at graphql.GraphQL.parseValidateAndExecute(GraphQL.java:482)                                                                                                                                                                                                                                                    
│     at graphql.GraphQL.executeAsync(GraphQL.java:446)                                                                                                                                                                                                                                                               
│     at graphql.GraphQL.execute(GraphQL.java:377)                                                                                                                                                                                                                                                                    
│     at com.linkedin.datahub.graphql.GraphQLEngine.execute(GraphQLEngine.java:90)                                                                                                                                                                                                                                    
│     at com.datahub.graphql.GraphQLController.lambda$postGraphQL$0(GraphQLController.java:94)                                                                                                                                                                                                                        
│     at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)                                                                                                                                                                                                                          
│     at java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1596)                                                                                                                                                                                                                         
│     at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)                                                                                                                                                                                                                                              
│     at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)                                                                                                                                                                                                                                  
│     at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)                                                                                                                                                                                                                                          
│     at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)                                                                                                                                                                                                                                 
│ Caused by: org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=index_not_found_exception, reason=no such index [datahub_usage_event]]                                                                                                                                                     
│     at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:187)                                                                                                                                                                                                                       
│     at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1892)                                                                                                                                                                                                                      
│     at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:1869)                                                                                                                                                                                                           
│     at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1626)                                                                                                                                                                                                           
│     at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1583)                                                                                                                                                                                                                   
│     at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1553)                                                                                                                                                                                                     
│     at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:1069)                                                                                                                                                                                                                           
│     at com.linkedin.datahub.graphql.analytics.service.AnalyticsService.executeAndExtract(AnalyticsService.java:260)                                                                                                                                                                                                 
│     ... 21 common frames omitted                                                                                                                                                                                                                                                                                    
│     Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [http://elasticsearch-master:9200], URI [/datahub_usage_event/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query  
│ Warnings: [Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.16/security-minimal-setup.html to enable security., [ignore_throttled] parameter is deprecated because froz  
│ {"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index [datahub_usage_event]","resource.type":"index_or_alias","resource.id":"datahub_usage_event","index_uuid":"_na_","index":"datahub_usage_event"}],"type":"index_not_found_exception","reason":"no such index [datahub_usage_even  
│         at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:302)                                                                                                                                                                                                                                 
│         at org.elasticsearch.client.RestClient.performRequest(RestClient.java:272)                                                                                                                                                                                                                                  
│         at org.elasticsearch.client.RestClient.performRequest(RestClient.java:246)                                                                                                                                                                                                                                  
│         at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1613)                                                                                                                                                                                                       
│         ... 25 common frames omitted                                                                                                                                                                                                                                                                                
│ 15:32:49.579 [ForkJoinPool.commonPool-worker-10] ERROR c.datahub.graphql.GraphQLController:98 - Errors while executing graphQL query: "query getAnalyticsCharts {\n  getAnalyticsCharts {\n    groupId\n    title\n    charts {\n      ...analyticsChart\n      __typename\n    }\n    __typename\n  }\n}\n\nfragm

This happens because ElasticSearch doesn't have the datahub_usage_event index (because elasticsearch-setup didn't create the template). If ElasticSearch has a datahub_usage_event index, I can see Data Landscape Summary, so I suggest creating a datahub_usage_event index (a plain index, not one created from the data_stream template) when global.datahub_analytics_enabled is set to false.
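As a rough illustration of the failure mode above, a setup script could probe for the index before anything queries it. The sketch below is an assumption about how such a check might look, not code from this PR: `index_action_for_status` and `index_status` are hypothetical names. It relies on `HEAD /<index>` returning 200 when the target exists and 404 when it does not, the latter corresponding to the index_not_found_exception in the log.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# Map the HTTP status of `HEAD /<index>` to the action a setup job could take.
index_action_for_status() {
  case "$1" in
    200) echo "skip" ;;    # index (or data stream) already exists
    404) echo "create" ;;  # corresponds to index_not_found_exception
    *)   echo "error" ;;   # e.g. auth or server errors: do not guess
  esac
}

# Fetch only the status code of a HEAD request; defined here but not called.
# $1 = base URL, $2 = index name
index_status() {
  curl -s -o /dev/null -w '%{http_code}' -I "$1/$2"
}
```

Against a live cluster this could be combined as `index_action_for_status "$(index_status http://elasticsearch-master:9200 datahub_usage_event)"`, using the host shown in the error log.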

data_stream is related to this question:
Q. Why don't you want to use DataHub Analytics?
A. I cannot use data_stream on my ElasticSearch because its version is 7.10 and X-pack is not installed. I have no choice.

@pedro93
Collaborator

pedro93 commented Sep 21, 2022

So if I understand you correctly… you cannot use DataHub Analytics because your Elastic cluster does not support X-pack, but you still want to have Data Landscape Summary by changing the way the datahub_usage_event index works?

@jjoyce0510
Collaborator

That is my interpretation. I think this PR makes sense. Landscape is indeed separate from usage tracking.

Going to approve!

@GyuhoonK
Contributor Author

@pedro93
Yes. Exactly what I mean.

Contributor

@shirshanka shirshanka left a comment

LGTM, but needs a tweak to the curl calls to align with the rest.

docker/elasticsearch-setup/create-indices.sh (two outdated, resolved review threads)
@GyuhoonK
Contributor Author

@shirshanka
I added insecure mode. Please check it!
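For readers unfamiliar with what an insecure mode means for curl calls, here is a minimal sketch. It assumes a hypothetical helper `curl_opts` that takes the flag as its first argument; the real script's variable names and wiring may differ. The only curl behaviour relied on is that `-k` (`--insecure`) skips TLS certificate verification.

```shell
#!/bin/sh
# Hypothetical helper: emit the curl options shared by every request.
# $1 = "true" to allow connections to hosts with unverifiable certificates.
curl_opts() {
  opts="-s"
  if [ "${1:-false}" = "true" ]; then
    # -k / --insecure: do not verify the server's TLS certificate.
    opts="$opts -k"
  fi
  echo "$opts"
}
```

Building the options in one place keeps every curl call in a script aligned, so a request such as `curl $(curl_opts true) https://host:9200/...` and its siblings all honour the same setting.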

Collaborator

@pedro93 pedro93 left a comment

LGTM let’s wait for green ci before merging

@GyuhoonK
Contributor Author

@pedro93
It didn't pass the smoke test:

Unable to run quickstart - the following issues were detected:
- kafka-setup is still running

If you think something went wrong, please file an issue at https://github.com/datahub-project/datahub/issues
or send a message in our Slack https://slack.datahubproject.io/
Be sure to attach the logs from /tmp/tmpqri39__g.log
Error: Process completed with exit code 1.

Is there any issue with the kafka setup?

Contributor

@shirshanka shirshanka left a comment

LGTM

@jjoyce0510 jjoyce0510 merged commit b45d5eb into datahub-project:master Sep 27, 2022