Remove index pattern mapping cache #6498

Closed
rashidkpc opened this issue Mar 10, 2016 · 38 comments
Labels
enhancement New value added to drive a business result Feature:Data Views Data Views code and UI - index patterns before 8.0 Feature:http high hanging fruit

Comments

@rashidkpc
Contributor

Currently we cache a normalized view of the Elasticsearch mapping because large mappings are expensive to parse. This causes lots of other problems. For example, when a user adds a field, we don't see it unless they manually refresh the mapping cache, which is a non-obvious task. Basically, any time the mapping changes, it becomes painful for the user: #2236, #6362

And it's not just mapping changes: we still have issues with parsing large mappings in the first place, and the more indices the user has, the longer it takes. That's why we do things like restrict the parsing to 5 indices by default. #1540, #2928
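To illustrate what building a normalized view involves, here is a minimal Python sketch (not Kibana's actual code) that flattens nested `properties` into dotted field names and merges them across indices, marking a field `conflict` when indices disagree on its type. It assumes the typeless response shape of `GET /<index>/_mapping` from Elasticsearch 7.x; `flatten_properties` and `merge_index_mappings` are hypothetical names. The work grows with both the number of indices and the number of fields per mapping.

```python
from typing import Any, Dict, Set


def flatten_properties(properties: Dict[str, Any], prefix: str = "") -> Dict[str, str]:
    """Turn nested mapping `properties` into {"dotted.field.name": "type"}."""
    flat: Dict[str, str] = {}
    for name, spec in properties.items():
        path = prefix + name
        if "properties" in spec:
            # object or nested field: recurse into its sub-fields
            flat.update(flatten_properties(spec["properties"], path + "."))
        else:
            flat[path] = spec.get("type", "object")
        # multi-fields, e.g. "message.keyword"
        for sub_name, sub_spec in spec.get("fields", {}).items():
            flat[f"{path}.{sub_name}"] = sub_spec.get("type", "object")
    return flat


def merge_index_mappings(mappings_by_index: Dict[str, Any]) -> Dict[str, str]:
    """Merge per-index mappings into one field list; disagreeing types become "conflict"."""
    types_by_field: Dict[str, Set[str]] = {}
    for body in mappings_by_index.values():
        props = body.get("mappings", {}).get("properties", {})
        for field, field_type in flatten_properties(props).items():
            types_by_field.setdefault(field, set()).add(field_type)
    return {
        field: next(iter(types)) if len(types) == 1 else "conflict"
        for field, types in sorted(types_by_field.items())
    }
```

Every index's mapping must be walked in full before the merged list is known, so restricting the number of indices (as described above) directly caps this cost.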

The crux of the issue goes back to the pre-beta1 days, when Kibana 4 didn't have a server to offload this work to, so we do it in the browser. There are 3 things that would help:

  1. Move index parsing to the server. The work on the ingest API is a good first step.
  2. Don't cache the mappings forever. Retrieving them from Elasticsearch is cheap, so we can do it regularly on the server. We could cache them in memory for a short period, but we don't need to keep them forever.
  3. Get normalized mappings from Elasticsearch: Return an aggregated view of all mappings/properties of all types elasticsearch#15728
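Point 2 amounts to replacing a permanent cache with a short-lived one. Below is a minimal sketch of such a time-to-live cache; `TTLCache`, `get_or_load`, and the idea of a 60-second TTL are hypothetical, and `loader` stands in for whatever call actually fetches the mappings from Elasticsearch.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Hypothetical in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        # key -> (time the value was loaded, value)
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # still fresh: no round trip to Elasticsearch
        value = loader()  # stale or missing: fetch again
        self._entries[key] = (now, value)
        return value
```

Injecting the clock keeps expiry testable; defaulting to `time.monotonic` means wall-clock adjustments cannot make entries expire early or late. A newly added field then shows up after at most one TTL, with no manual refresh.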

The first 2 we can do immediately, the last one would be an amazing optimization that would make everyone's life a lot easier and remove a lot of load and code from the Kibana backend.

@Bargs
Contributor

Bargs commented Mar 11, 2016

I'll just leave this here: #5575

There's some extra cruft in there, but that PR already has most of 1 and 2.

@rfarley3

+1

@Evertras

Pre-baking Kibana instances becomes ugly with these caches. Being able to do it on the fly, even as an option for smaller instances, would be fantastic. +1

@hannayurkevich

+1

@droberts195
Contributor

+1

It would make life a lot easier for Prelert if Kibana just used mappings directly from Elasticsearch rather than having its own mappings.

@sophiec20
Contributor

+1

@Bargs Bargs mentioned this issue Nov 4, 2016
3 tasks
@tbragin tbragin added the Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc label Feb 7, 2017
@epixa epixa removed the P1 label Apr 25, 2017
@tbuching

tbuching commented Jul 5, 2017

+1
Do you have any idea when something like Solution 2 could land in the final product?

@epixa epixa added enhancement New value added to drive a business result and removed release_note:enhancement labels May 7, 2018
@JulienCarnec

+1
Refreshing the mapping automatically, or exposing an API to refresh index patterns, would be nice too.

@fakenine

fakenine commented Jun 6, 2018

+1
Do you have an ETA for a solution such as an automatic refresh or an API endpoint?

@Hariharan-Gandhi

Hariharan-Gandhi commented Jun 6, 2018

+1 API for refresh

@tarraschk

+1

@fwininger

Could someone please give us some help?

@AustinBGibbons

+1 for API Refresh - also preserving the "popularity"

Our current approach is going to be to directly call

GET _plugin/kibana/api/index_patterns/_fields_for_wildcard?pattern=...
PUT _plugin/kibana/api/saved_objects/index-pattern/

in imitation of the network requests that we see when clicking the refresh icon.

@sgarg7

sgarg7 commented Mar 28, 2019

We are upgrading our users from Kibana 3 to Kibana 6, and Kibana hangs when it tries to load an index with ~22K fields, even after the mappings have been cached (#32153).
That's a big difference in how the new Kibana handles large indexes. Moving forward on this would be great, thanks.

@elasticmachine
Contributor

Pinging @elastic/kibana-app-arch

@sgarg7

sgarg7 commented Jul 19, 2019

Are there any suggested workarounds for this error?

@Bargs Bargs removed their assignment Sep 18, 2019
@akshayurdh

akshayurdh commented Oct 17, 2019

+1
We need an index pattern refresh API.

@AndrewMcQuerry

+1

@alexios-y

+1. Just realized it's been 5 years since the first issue was raised.

@fabrei

fabrei commented Jan 15, 2020

> +1 for API Refresh - also preserving the "popularity"
>
> Our current approach is going to be to directly call
>
> GET _plugin/kibana/api/index_patterns/_fields_for_wildcard?pattern=...
> PUT _plugin/kibana/api/saved_objects/index-pattern/
>
> in imitation of the network requests that we see when clicking the refresh icon

That would be fine if it worked. I wrote these two lines of bash to send exactly the same requests that the browser sends to the Kibana backend (replace the values with your own).

refresh_payload=$(curl -X GET 'localhost:5601/api/index_patterns/_fields_for_wildcard?pattern=packets*&meta_fields=_source&meta_fields=_id&meta_fields=_type&meta_fields=_index&meta_fields=_score' | jq '.fields[] | . + {count: 0} | . + {scripted: false}' | jq -s '. | tostring | {"attributes": {"title": "packets*", "timeFieldName": "timestamp", "fields": . }}')

curl -X PUT 'localhost:5601/api/saved_objects/index-pattern/<index-id>' -H 'kbn-xsrf: true' -H 'Content-Type: application/json' -d "$refresh_payload"

I added a new field to my index template and updated all existing indices as well. After the curl requests, I got a response saying that the index pattern was updated, but the pattern was not updated in my Kibana dashboard. I checked that the added field is in the response of the first curl; it is. It was also not possible to filter by the added field or create a chart with it. So I think the code for refreshing a pattern makes another request that my developer tool does not show. However, after a day had passed, the pattern was successfully refreshed and I had access to the added field.

If you inspect the refresh button in the Kibana settings with a developer tool, you can see that the button calls the function refreshFields(). Looking into the code, I found that this method belongs to an IndexPattern object. In my case, it would be nice to call refreshFields() manually from a plugin I wrote. I am currently experimenting with how to instantiate an IndexPattern object. Does anyone already have an idea?
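The jq pipeline in the bash commands above does two things: it adds `count: 0` and `scripted: false` to every field, then serializes the whole list to a string and wraps it in an `attributes` object. A rough Python equivalent, assuming the same `packets*` title and `timestamp` time field, might look like this (`build_refresh_payload` is a hypothetical name):

```python
import json
from typing import Any, Dict, List


def build_refresh_payload(fields_response: Dict[str, Any], title: str,
                          time_field: str) -> Dict[str, Any]:
    """Mirror the jq pipeline: annotate each field, then stringify the list."""
    fields: List[Dict[str, Any]] = [
        # like jq's `. + {count: 0} | . + {scripted: false}`: later keys win
        {**field, "count": 0, "scripted": False}
        for field in fields_response["fields"]
    ]
    return {
        "attributes": {
            "title": title,
            "timeFieldName": time_field,
            # the field list is sent as a JSON *string*, like jq's `tostring`
            "fields": json.dumps(fields, separators=(",", ":")),
        }
    }
```

Like the jq version, this resets every field's `count` to 0, so the "popularity" counters mentioned earlier in the thread are not preserved.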

@mmguero

mmguero commented Jan 15, 2020

UPDATE: Thanks to @fabrei in idaholab/Malcolm#100, who suggested a way to make the script I pasted more robust. I've updated the link and the code here to reflect that (using _find to get the index ID from the index pattern name instead of assuming they're the same):


Here's a Python script I wrote to refresh the index pattern fields in my project:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from __future__ import print_function

import argparse
import json
import requests
import os
import sys

GET_STATUS_API = 'api/status'
GET_INDEX_PATTERN_INFO_URI = 'api/saved_objects/_find'
GET_FIELDS_URI = 'api/index_patterns/_fields_for_wildcard'
PUT_INDEX_PATTERN_URI = 'api/saved_objects/index-pattern'

###################################################################################################
debug = False
PY3 = (sys.version_info.major >= 3)
scriptName = os.path.basename(__file__)
scriptPath = os.path.dirname(os.path.realpath(__file__))
origPath = os.getcwd()

###################################################################################################
if not PY3:
  if hasattr(__builtins__, 'raw_input'): input = raw_input

try:
  FileNotFoundError
except NameError:
  FileNotFoundError = IOError

###################################################################################################
# print to stderr
def eprint(*args, **kwargs):
  print(*args, file=sys.stderr, **kwargs)

###################################################################################################
# convenient boolean argument parsing
def str2bool(v):
  if v.lower() in ('yes', 'true', 't', 'y', '1'):
    return True
  elif v.lower() in ('no', 'false', 'f', 'n', '0'):
    return False
  else:
    raise argparse.ArgumentTypeError('Boolean value expected.')

###################################################################################################
# main
def main():
  global debug

  parser = argparse.ArgumentParser(description=scriptName, add_help=False, usage='{} <arguments>'.format(scriptName))
  parser.add_argument('-v', '--verbose', dest='debug', type=str2bool, nargs='?', const=True, default=False, help="Verbose output")
  parser.add_argument('-i', '--index', dest='index', metavar='<str>', type=str, default='sessions2-*', help='Index Pattern Name')
  parser.add_argument('-k', '--kibana', dest='url', metavar='<protocol://host:port>', type=str, default='http://localhost:5601/kibana', help='Kibana URL')
  parser.add_argument('-n', '--dry-run', dest='dryrun', type=str2bool, nargs='?', const=True, default=False, help="Dry run (no PUT)")
  try:
    parser.error = parser.exit
    args = parser.parse_args()
  except SystemExit:
    parser.print_help()
    sys.exit(2)

  debug = args.debug
  if debug:
    eprint(os.path.join(scriptPath, scriptName))
    eprint("Arguments: {}".format(sys.argv[1:]))
    eprint("Arguments: {}".format(args))
  else:
    sys.tracebacklimit = 0

  # get version number so kibana doesn't think we're doing a XSRF when we do the PUT
  statusInfoResponse = requests.get('{}/{}'.format(args.url, GET_STATUS_API))
  statusInfoResponse.raise_for_status()
  statusInfo = statusInfoResponse.json()
  kibanaVersion = statusInfo['version']['number']
  if debug:
    eprint('Kibana version is {}'.format(kibanaVersion))

  # find the ID of the index name (probably will be the same as the name)
  getIndexInfoResponse = requests.get(
    '{}/{}'.format(args.url, GET_INDEX_PATTERN_INFO_URI),
    params={
      'type': 'index-pattern',
      'fields': 'id',
      'search': '"{}"'.format(args.index)
    }
  )
  getIndexInfoResponse.raise_for_status()
  getIndexInfo = getIndexInfoResponse.json()
  indexId = getIndexInfo['saved_objects'][0]['id'] if (len(getIndexInfo['saved_objects']) > 0) else None
  if debug:
    eprint('Index ID for {} is {}'.format(args.index, indexId))

  if indexId is not None:

    # get the fields list
    getFieldsResponse = requests.get('{}/{}'.format(args.url, GET_FIELDS_URI),
                                     params={ 'pattern': args.index,
                                              'meta_fields': ["_source","_id","_type","_index","_score"] })
    getFieldsResponse.raise_for_status()
    getFieldsList = getFieldsResponse.json()['fields']
    if debug:
      eprint('{} would have {} fields'.format(args.index, len(getFieldsList)))

    # set the index pattern with our complete list of fields
    if not args.dryrun:
      putIndexInfo = {}
      putIndexInfo['attributes'] = {}
      putIndexInfo['attributes']['title'] = args.index
      putIndexInfo['attributes']['fields'] = json.dumps(getFieldsList)

      putResponse = requests.put('{}/{}/{}'.format(args.url, PUT_INDEX_PATTERN_URI, indexId),
                                 headers={ 'Content-Type': 'application/json',
                                           'kbn-xsrf': 'true',
                                           'kbn-version': kibanaVersion, },
                                 data=json.dumps(putIndexInfo))
      putResponse.raise_for_status()

    # if we got this far, it probably worked!
    if args.dryrun:
      print("success (dry run only, no write performed)")
    else:
      print("success")

  else:
    print("failure (could not find Index ID for {})".format(args.index))

if __name__ == '__main__':
  main()
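The script above takes the first hit returned by `_find`. Because `search` is a text search, it may return more than one index pattern, and the first hit is not necessarily an exact title match. A hedged refinement, assuming the request asks for `'fields': 'title'` instead of `'fields': 'id'` so that each hit carries `attributes.title`, is to pick the hit whose title matches exactly (`pick_index_pattern_id` is a hypothetical helper):

```python
from typing import Any, Dict, Optional


def pick_index_pattern_id(find_response: Dict[str, Any], title: str) -> Optional[str]:
    """Return the id of the saved object whose title equals `title`, else None."""
    for obj in find_response.get("saved_objects", []):
        # only accept an exact title match, not just the best-scoring hit
        if obj.get("attributes", {}).get("title") == title:
            return obj["id"]
    return None
```

In `main()`, `indexId = pick_index_pattern_id(getIndexInfo, args.index)` would then replace the `saved_objects[0]` lookup, and the existing `indexId is not None` check still reports a missing pattern.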

@fabrei

fabrei commented Jan 16, 2020

Thanks for sharing :) I don't know what I did wrong yesterday, but today my script works as well. Yesterday I had looked at the index pattern management view in Kibana, and the number of indexed fields had not been updated. Maybe the reload did not go through completely yesterday. Now that I know which requests I have to make, I will integrate them into my plugin. But if someone knows how the existing refreshFields() function can be used in a plugin and shares that information, I would be really happy =)

@mbudge
Copy link

mbudge commented Feb 18, 2020

+1
This would be very useful!

@slmingol

slmingol commented Mar 6, 2020

A co-worker recently wrote a Golang app which does this for us:

@ikawalec

ikawalec commented Jul 3, 2020

+1

@skmizuho

skmizuho commented Aug 6, 2020

+1

@debu99

debu99 commented Aug 24, 2020

+1

@mattkime mattkime removed the Team:Visualizations Visualization editors, elastic-charts and infrastructure label Oct 2, 2020
@Nilubkal

+1

@mattkime mattkime closed this as completed Dec 2, 2020
@mattkime
Contributor

mattkime commented Dec 2, 2020

No longer needed, as the field list is no longer cached (#82223). This will be released in 7.11.
