Indices stats API should provide external refresh stats #36712

Closed
clandry94 opened this issue Dec 17, 2018 · 7 comments
Labels
:Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard.

Comments

@clandry94

clandry94 commented Dec 17, 2018

Describe the feature:

Currently, the refresh stats exposed through the _stats API are reported at the index level, aggregated over all shard refreshes within that index. Individual shards do collect refresh statistics, but only for INTERNAL refresh events, and these statistics are not publicly accessible through the REST API.

I'm proposing the following two features to be added to elasticsearch:

  1. A mechanism or data structure to distinguish between EXTERNAL and INTERNAL refreshes within shards
  2. Extending the _stats API to provide shard-level refresh statistics. This is already implemented, e.g.
GET /my_index/_stats/refresh

would respond with

{
  "_shards" : {
    "total" : 800,
    "successful" : 800,
    "failed" : 0
  },
  "_all" : {
    "primaries" : {
      "refresh" : {
        "total" : 3952969,
        "total_time_in_millis" : 325583210,
        "listeners" : 0
      }
    },
    "total" : {
      "refresh" : {
        "total" : 7898248,
        "total_time_in_millis" : 649856023,
        "listeners" : 0
      }
    }
  },
  "indices" : {
    "products.3" : {
      "uuid" : "rFXl6ARRXRIiXVjKlh4qLEw",
      "primaries" : {
        "refresh" : {
          "total" : 3952969,
          "total_time_in_millis" : 325583210,
          "listeners" : 0
        }
      },
      "total" : {
        "refresh" : {
          "total" : 7898248,
          "total_time_in_millis" : 649856023,
          "listeners" : 0
        }
      },
      "shards" : [
        {
          "id" : 1,
          "refresh" : {
            "total" : 9050918,
            "total_time_in_millis" : 948194819,
            "listeners" : 0
          }
        },
        {
          "id" : 2,
          "refresh" : {
            "total" : 412740,
            "total_time_in_millis" : 4921479849,
            "listeners" : 0
          }
        }
      ]
    }
  }
}
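
For feature 1, one possible shape for the data structure is a pair of counters per refresh scope. The following is only a minimal sketch under stated assumptions: `ShardRefreshStats`, `Scope`, and `record` are hypothetical names chosen for illustration and are not taken from the Elasticsearch code base.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch: per-shard refresh counters, kept separately per refresh scope. */
final class ShardRefreshStats {
    /** Assumed scope names, mirroring the INTERNAL/EXTERNAL wording in this issue. */
    enum Scope { INTERNAL, EXTERNAL }

    // One counter pair per scope; the maps are fully populated in the constructor
    // and never modified afterwards, so only the LongAdders see concurrent writes.
    private final Map<Scope, LongAdder> totals = new EnumMap<>(Scope.class);
    private final Map<Scope, LongAdder> millis = new EnumMap<>(Scope.class);

    ShardRefreshStats() {
        for (Scope scope : Scope.values()) {
            totals.put(scope, new LongAdder());
            millis.put(scope, new LongAdder());
        }
    }

    /** Records one completed refresh of the given scope and how long it took. */
    void record(Scope scope, long tookMillis) {
        totals.get(scope).increment();
        millis.get(scope).add(tookMillis);
    }

    /** Number of refreshes recorded for the given scope. */
    long total(Scope scope) {
        return totals.get(scope).sum();
    }

    /** Accumulated refresh time, in milliseconds, for the given scope. */
    long totalTimeInMillis(Scope scope) {
        return millis.get(scope).sum();
    }
}
```

With counters split this way, a stats response could report the external numbers next to the existing "total" and "total_time_in_millis" fields without changing what those fields mean.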

Related issue and prior discussion with @s1monw #36541

I can take on this and start working on it if all looks good 👍

@albertzaharovits albertzaharovits added the :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. label Dec 17, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@DaveCTurner
Contributor

@clandry94 could you clarify how the change to the stats API differs from the statistics returned by GET /my_index/_stats/refresh?level=shards? I don't think we should return shard-level statistics without ?level=shards enabled, but if that is enabled we already seem to return what you propose.

(I'm ok with adding more stats to distinguish the two kinds of refresh as you describe, but that doesn't seem to be the main proposal here)
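
For anyone wanting to try the shard-level request from a client, here is a rough sketch using the JDK's built-in HTTP client (Java 11+). The base URL `http://localhost:9200` in the usage note and the class and method names are assumptions for illustration; the sketch also assumes the index name needs no URL encoding.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical helper for requesting refresh stats at shard level. */
final class ShardLevelRefreshStats {

    /** Builds the URI for GET /{index}/_stats/refresh?level=shards. */
    static URI statsUri(String baseUrl, String index) {
        // Assumes baseUrl has no trailing slash and index is URL-safe.
        return URI.create(baseUrl + "/" + index + "/_stats/refresh?level=shards");
    }

    /** Sends the request and returns the raw JSON response body. */
    static String fetch(String baseUrl, String index) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(statsUri(baseUrl, index)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```

Calling `ShardLevelRefreshStats.fetch("http://localhost:9200", "my_index")` against a running node should return the per-shard breakdown, whereas omitting `level=shards` returns only the index-level aggregate.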

@clandry94
Author

Hey @DaveCTurner. I wasn't aware of the level flag or that the scope of stats could be restricted to the shards. Thank you for that info. Do these shard-level refresh statistics count internal or external refreshes?

@DaveCTurner
Contributor

I do not know this code intimately, but tracing through rather naïvely it seems they are internal refreshes:

Collections.singletonList(new RefreshMetricUpdater(refreshMetric)),
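
The quoted line suggests that the refresh metric is updated by a listener attached to a particular refresh path, so only refreshes going through that path get counted. A simplified, self-contained sketch of that listener pattern follows. `RefreshCallback` and `TimingRefreshListener` are hypothetical stand-ins written for illustration; they are not the actual `RefreshMetricUpdater` implementation, and counting only when `didRefresh` is true is an assumption.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a before/after refresh callback pair. */
interface RefreshCallback {
    void beforeRefresh();
    void afterRefresh(boolean didRefresh);
}

/** Counts and times only the refreshes of whatever component it is registered on. */
final class TimingRefreshListener implements RefreshCallback {
    private final AtomicLong count = new AtomicLong();
    private final AtomicLong totalNanos = new AtomicLong();
    // Start time of the refresh in flight; assumes both callbacks for one
    // refresh run on the same thread and that refreshes do not overlap.
    private long startNanos = -1;

    @Override
    public void beforeRefresh() {
        startNanos = System.nanoTime();
    }

    @Override
    public void afterRefresh(boolean didRefresh) {
        if (startNanos < 0) {
            return; // afterRefresh without a matching beforeRefresh: nothing to record
        }
        long elapsed = System.nanoTime() - startNanos;
        startNanos = -1;
        if (didRefresh) { // assumption: skip refreshes that did not actually happen
            count.incrementAndGet();
            totalNanos.addAndGet(elapsed);
        }
    }

    long count() {
        return count.get();
    }

    long totalNanos() {
        return totalNanos.get();
    }
}
```

Under this pattern, collecting external refresh metrics would amount to registering a second listener instance on the external refresh path and reporting its counters separately.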

@s1monw
Contributor

s1monw commented Jan 5, 2019

@clandry94 @DaveCTurner these are internal refreshes.

@clandry94
Author

Just a quick update on this: I've finally gotten some time to work on collecting external refresh metrics and have something that I think is good. I'm planning on opening a PR this coming week or next week. Sorry for the long delay 😄

@clandry94 clandry94 changed the title from "Indices stats API should provide shard level refresh stats" to "Indices stats API should provide external refresh stats" Feb 5, 2019
@dnhatn
Member

dnhatn commented Mar 15, 2019

Closing in favor of #38643.

@dnhatn dnhatn closed this as completed Mar 15, 2019
dnhatn pushed a commit that referenced this issue Mar 18, 2019
Right now, the stats API only provides refresh metrics regarding
internal refreshes. This isn't very useful and somewhat misleading for
cluster administrators since the internal refreshes are not indicative
of documents being available for search.

In this PR I added a new metric for collecting external refreshes as
they occur and exposing them through the stats API. Now, calling an
endpoint for stats will yield external refresh metrics as well.

Relates #36712
dnhatn pushed a commit to dnhatn/elasticsearch that referenced this issue Mar 23, 2019
dnhatn added a commit that referenced this issue Mar 25, 2019
dnhatn pushed a commit that referenced this issue Mar 25, 2019