-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(elasticsearch) Analytics indices creation on AWS ES #5502
fix(elasticsearch) Analytics indices creation on AWS ES #5502
Conversation
The script contains many copy-pasting and is not easy to follow. Add comments, extract commonly used operations into functions, unify approaches.
Fix the issue where Amazon OpenSearch (AWS ES) indices are incorrectly initialised and the Analytics screen shows errors only.
mention USE_AWS_ELASTICSEARCH env value if it seems it's set the wrong way
fi | ||
|
||
# path where index definitions are stored | ||
INDEX_DEFINITIONS_ROOT=/index/usage-event |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you!
...ata-service/restli-servlet-impl/src/main/resources/index/usage-event/aws_es_usage_event.json
Outdated
Show resolved
Hide resolved
elif [ $RESOURCE_STATUS -eq 404 ]; then | ||
# resource doesn't exist -> need to create it | ||
echo -e "\n>>> creating $RESOURCE_ADDRESS ..." | ||
# use the given path as definition, but first replace all occurences of PREFIX with the actual prefix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these comments are super helper. thank you
|
||
elif [ $RESOURCE_STATUS -eq 401 ] || [ $RESOURCE_STATUS -eq 405 ]; then | ||
echo -e "\n>>> failed to GET $RESOURCE_ADDRESS ($RESOURCE_STATUS) !" | ||
echo "... make sure you have correct USE_AWS_ELASTICSEARCH env value set (current=$USE_AWS_ELASTICSEARCH)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why AWS specific?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not having USE_AWS_ELASTICSEARCH
set to true
is the common issue several people have met, which then causes Analytics to go wrong. I observed the 401/405 errors to be linked with this particular misconfiguration and figured it might be helpful for people to be aware of the obvious fix. (See the log in Testing / Case 2 section of the PR description.)
However I can see this looks too specific for a general-purpose function. Let me add a condition checking whether $ELASTICSEARCH_URL
contains some aws-specific substring and display the message only then.
USAGE_EVENT_STATUS=$(curl -o /dev/null -s -w "%{http_code}\n" --header "$ELASTICSEARCH_AUTH_HEADER" "$ELASTICSEARCH_URL/${PREFIX}datahub_usage_event") | ||
if [ $USAGE_EVENT_STATUS -eq 200 ]; then | ||
USAGE_EVENT_DEFINITION=$(curl -s --header "$ELASTICSEARCH_AUTH_HEADER" "$ELASTICSEARCH_URL/${PREFIX}datahub_usage_event") | ||
# the definition is expected to contain "datahub_usage_event-000001" string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Consider adding to comment here)
if [[ $USAGE_EVENT_DEFINITION != *"datahub_usage_event-000001"* ]]; then | ||
# ... if it doesn't, we need to drop it | ||
echo -e "\n>>> deleting invalid datahub_usage_event ..." | ||
curl -s -XDELETE --header "$ELASTICSEARCH_AUTH_HEADER" "$ELASTICSEARCH_URL/${PREFIX}datahub_usage_event" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleting an index in the setup job seems dangerous. We need to be EXTREMELY careful that this doesn't affect existing deployments. Can you explain a bit more why this drop is necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will analytics necessarily not work if the conditions is met to enter this block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, if the index was previously created with USE_AWS_ELASTICSEARCH
not set, and then later switched to true
, Analytics tab doesn't work and no analytics events get ever recorded. (The name matches, but the contents do not.) The only solution (and the core fix of this PR) is to drop the wrong index and recreate it anew using aws_es_usage_event.json
as source.
This block is executed only if USE_AWS_ELASTICSEARCH=true
(we are on AWS) and only if there is an existing datahub_usage_event
index which doesn't contain the AWS-specific part — indicating the case described above.
What could go wrong? Maybe if the name of the index is later changed and this condition is not adjusted appropriately. Let me extract it to a constant to be safe and add some more comments in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dropping is probably too harsh here, we have an alias. If it's a mapping that needs to be fixed it should use the reindex API rather than a full drop of the index.
# create indices for ES (non-AWS) | ||
function create_datahub_usage_event_datastream() { | ||
create_if_not_exists "_ilm/policy/${PREFIX}datahub_usage_event_policy" policy.json | ||
create_if_not_exists "_index_template/${PREFIX}datahub_usage_event_index_template" index_template.json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we not need a line like this here:
create_if_not_exists "${PREFIX}datahub_usage_event" aws_es_usage_event.json
to ensure that the index is actually created for non aws cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I do think there's a separate issue actively open where on fresh quickstarts without any usage events, the analytics index will be missing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how this works outside of AWS. The behavior in non-AWS environment should be the same as before refactoring.
Can you link the issue mentioned?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's this PR in question: #5974
- more comments - more defensive approach - index file renamed
🥅 Goal
Solve issue #5376 with analytics Elasticsearch indices being created incorrectly on AWS ES and the Analytics Datahub page then not working.
🔍 Details
When running against AWS Elasticsearch (aka Amazon OpenSearch), analytics indices tend to have problems (see issue #5376 or search Slack for
datahub_usage_event-000001
). This PR introduces three changes in thecreate-indices.sh
script:datahub_usage_event
index was created incorrectly (probably by GMS when running withUSE_AWS_ELASTICSEARCH
incorrectly not set), it drops it and recreates it. This is should help many struggling developers.USE_AWS_ELASTICSEARCH
should have been used after ES endpoint error and writes a hint about its usage.🧪 Testing
Building the modified
elasticsearch-setup-job
image and using it in my Datahub helm charts, then deploying using these charts.My setup uses Amazon Opensearch. Didn't test with the other case.
Case 1: clean slate
elasticsearch-setup-job log
Case 2: invalid index
USE_AWS_ELASTICSEARCH
not set -> elasticsearch-setup-job fails (see log below)elasticsearch-setup-job log
USE_AWS_ELASTICSEARCH=true
elasticsearch-setup-job log
Case 3: no-change
elasticsearch-setup-job log
☑️ Checklist