Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update template to use ES 5.x mapping #462

Merged
merged 11 commits into from
Aug 17, 2016

Conversation

suyograo
Copy link
Contributor

@suyograo suyograo commented Aug 4, 2016

Fixes #386

@untergeek
Copy link
Contributor

LGTM™. Let's see if the tests pass 😄

@untergeek
Copy link
Contributor

Super excited about this, btw. Definitely going to tweak some upgraders, but +1 for the clean start.

@suyograo
Copy link
Contributor Author

suyograo commented Aug 4, 2016

hmm, I have to use some rspec tag trickery to fix this.

@jordansissel
Copy link
Contributor

Let's block this until elastic/elasticsearch#19812 is resolved.

@suyograo
Copy link
Contributor Author

suyograo commented Aug 4, 2016

Aye. Stalling

@jordansissel
Copy link
Contributor

A few of us (@suyograo, @andrewvc, @talevy, etc) had discussions about this offline, and there were two main areas for discussion about the keyword name:

  1. The keyword name in this template was chosen to be consistent with Elasticsearch's use of the word. Further, the name keyword is maybe not the best word.
  2. The change from raw to keyword may negatively impact users who upgrade LS from 2.x to 5.x.

FOr the first part, we are discussing in elastic/elasticsearch#19812.

For the second part, we might leave this field named as raw to relieve stress on the upgrade experience. We may move to be consistent with Elasticsearch at a later date after we figure out how to minimize pains of upgrades.

@untergeek
Copy link
Contributor

In all honesty, I'd multi-field and have both raw and keyword. This forces users to adjust the template themselves if they're space conscious, and gives us both reverse and forward compatibility, should Kibana standardize on .keyword in future releases.

@suyograo
Copy link
Contributor Author

suyograo commented Aug 4, 2016

In all honesty, I'd multi-field and have both raw and keyword.

I am -1 on this. I wouldn't want users to incur extra disk space because of this. This multi-field is for every string, which is quite a bit.

@jordansissel
Copy link
Contributor

jordansissel commented Aug 8, 2016

Did some historical digging. Logstash v1.3.0 (December 2013) is the first version which shipped a template. Changing .raw has the opportunity to break schema with possibly 2.5 years of events stored in Elasticsearch with Logstash.

We discussed this last week somewhere I forget, but we had some thoughts for Logstash 5.0 timeframe:

Premise:

1.By default, Logstash will only install the template if there is no template already there.
2. We want to ship a template that more closely aligns with Elasticsearch's naming and defaults.
3. Elasticsearch now makes JSON strings dynamically mapped as both text and a multifield .keyword as keyword mapping type.

Therefore, I summarize a proposal as follows:

  1. We change the default template to change .raw to .keyword
  2. Advertise loudly in the changelog and release notes about this change.
  3. Document an upgrade path for existing 2.x users to upgrade to the new template, including instructions on how to reindex data to effect a rename of .raw to .keyword

Expected impact of default behavior as proposed:

  • New users: Get .keyword, align with Elasticsearch.
  • Upgrading users: Keep .raw, misalign with Elasticsearch.

@suyograo
Copy link
Contributor Author

@jordansissel I've changed this PR to reflect the proposal in #462 (comment). Can you take a look please? /cc @talevy

- INTEGRATION=true ES_VERSION=1.7.5
- INTEGRATION=true ES_VERSION=5.0.0-alpha1
- INTEGRATION=true ES_VERSION=2.3.4
- INTEGRATION=true ES_VERSION=5.0.0-alpha4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be alpha5 now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bump version to alpha5?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be increased to alpha5 since it has been released.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hahahaha. All of us saw the same thing 👍

@untergeek
Copy link
Contributor

LGTM™

"fields" : {
"raw" : {"type": "string", "index" : "not_analyzed", "ignore_above" : 256}
"keyword" : { "type": "keyword" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the idea was to let ES run through its defaults in the case that a stringed field is being dynamically mapped

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but we need "norms" : false, to be set, which ES does not by default. That's why I had to explicitly add this rule for strings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See 2 lines above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, I see. sounds good

@@ -1,6 +1,6 @@
require_relative "../../../spec/es_spec_helper"

describe "Ingest pipeline execution behavior", :integration => true, :version_5x => true do
describe "Ingest pipeline execution behavior", :integration => true, :version_greater_than_5x => true do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

greater_than? and equal to?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call

@talevy
Copy link
Contributor

talevy commented Aug 17, 2016

lgtm

@suyograo suyograo merged commit b8c336d into logstash-plugins:master Aug 17, 2016
@suyograo suyograo deleted the fix/386 branch August 17, 2016 20:13
suyograo added a commit to suyograo/logstash-output-elasticsearch that referenced this pull request Sep 8, 2016
In PR logstash-plugins#462 (comment)
we used `.keyword` to match ES's change in default behavior.

Both approaches, using `.keyword` to match ES for consistency and using `.raw` to match
existing logstash data has pros/cons. I think at this time it is wise to err on the side of
easy-upgrade-path so reverting this back to using `.raw` as before

Fixes logstash-plugins#466
neumaics added a commit to MapQuest/logstash-output-elasticsearch that referenced this pull request Dec 19, 2016
Fix typo

Update template to use ES 5.x mapping (logstash-plugins#462)

* Update template to use ES 5.x mapping

Fixes logstash-plugins#386

remove `force` as valid version type parameter

remove trailing comma
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants