Expand Solr plugin to capture DataImportHandler #3682

Closed
mkboudreau opened this issue Jan 16, 2018 · 2 comments

Comments

@mkboudreau
Contributor

Feature Request

DataImportHandler (DIH) data is needed to track the status of running imports, but the counts, timestamps, and status fields for this handler are all currently missing from the solr plugin.

Proposal:

Add a measurement, solr_dih, that (1) checks the queryhandler class to decide whether to parse additional data fields, and/or (2) makes a separate request per core to get dataimport statuses.

Current behavior:

The solr plugin currently submits a solr_queryhandler measurement to InfluxDB with the following fields:

	"15min_rate_reqs_per_second"
	"5min_rate_reqs_per_second"
	"75th_pc_request_time"
	"95th_pc_request_time"
	"999th_pc_request_time"
	"99th_pc_request_time"
	"avg_requests_per_second"
	"avg_time_per_request"
	"errors"
	"handler_start"
	"median_request_time"
	"requests"
	"timeouts"
	"total_time"

When the queryhandler value contains "class": "org.apache.solr.handler.dataimport.DataImportHandler", the stats object has additional data:

    "stats": {
      "Status": "IDLE",
      "Documents Processed": "java.util.concurrent.atomic.AtomicLong:107055",
      "Requests made to DataSource": "java.util.concurrent.atomic.AtomicLong:7",
      "Rows Fetched": "java.util.concurrent.atomic.AtomicLong:890632",
      "Documents Deleted": "java.util.concurrent.atomic.AtomicLong:0",
      "Documents Skipped": "java.util.concurrent.atomic.AtomicLong:0",
      "Total Documents Processed": "java.util.concurrent.atomic.AtomicLong:107055",
      "Total Requests made to DataSource": "java.util.concurrent.atomic.AtomicLong:7",
      "Total Rows Fetched": "java.util.concurrent.atomic.AtomicLong:890632",
      "Total Documents Deleted": "java.util.concurrent.atomic.AtomicLong:0",
      "Total Documents Skipped": "java.util.concurrent.atomic.AtomicLong:0",
      "handlerStart": 1516040508274,
      "requests": 28,
      "errors": 0,
      "serverErrors": 0,
      "clientErrors": 0,
      "timeouts": 0,
      "totalTime": 284413.586149,
      "avgRequestsPerSecond": 0.0003439662467888311,
      "5minRateRequestsPerSecond": 0.0029911294095871514,
      "15minRateRequestsPerSecond": 0.0010717048417562779,
      "avgTimePerRequest": 0.320567,
      "medianRequestTime": 0.320567,
      "75thPcRequestTime": 0.320567,
      "95thPcRequestTime": 0.320567,
      "99thPcRequestTime": 0.320567,
      "999thPcRequestTime": 0.320567
    }
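
The DIH counters in the stats object above arrive as strings such as "java.util.concurrent.atomic.AtomicLong:107055", so they would need unwrapping before they can be stored as integer fields. A minimal Python sketch of that step follows. The function names and the snake_case field names are assumptions made for illustration; they are not part of the existing plugin.

```python
import re

# DIH-specific keys copied from the sample stats object above.
DIH_COUNTER_KEYS = [
    "Documents Processed",
    "Requests made to DataSource",
    "Rows Fetched",
    "Documents Deleted",
    "Documents Skipped",
    "Total Documents Processed",
    "Total Requests made to DataSource",
    "Total Rows Fetched",
    "Total Documents Deleted",
    "Total Documents Skipped",
]

# Matches either a bare integer string or the AtomicLong-prefixed form.
_COUNTER = re.compile(r"^(?:java\.util\.concurrent\.atomic\.AtomicLong:)?(-?\d+)$")


def parse_counter(value):
    """Return an int for a number or an 'AtomicLong:<n>' string, else None."""
    if isinstance(value, bool):
        return None
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        match = _COUNTER.match(value.strip())
        if match:
            return int(match.group(1))
    return None


def to_field_name(key):
    """'Total Rows Fetched' -> 'total_rows_fetched' (naming is an assumption)."""
    return re.sub(r"\s+", "_", key.strip()).lower()


def extract_dih_fields(stats):
    """Pick the DIH-only entries out of a queryhandler stats object."""
    fields = {}
    status = stats.get("Status")
    if isinstance(status, str):
        fields["status"] = status
    for key in DIH_COUNTER_KEYS:
        number = parse_counter(stats.get(key))
        if number is not None:
            fields[to_field_name(key)] = number
    return fields
```

Keys that are absent or cannot be parsed are skipped rather than reported as zero, so a handler that is not a DataImportHandler yields an empty result.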

Desired behavior:

One of the following two options would work.

  1. Grab the additional fields from the queryhandler response:

       "Status": "IDLE",
       "Documents Processed": "java.util.concurrent.atomic.AtomicLong:107055",
       "Requests made to DataSource": "java.util.concurrent.atomic.AtomicLong:7",
       "Rows Fetched": "java.util.concurrent.atomic.AtomicLong:890632",
       "Documents Deleted": "java.util.concurrent.atomic.AtomicLong:0",
       "Documents Skipped": "java.util.concurrent.atomic.AtomicLong:0",
       "Total Documents Processed": "java.util.concurrent.atomic.AtomicLong:107055",
       "Total Requests made to DataSource": "java.util.concurrent.atomic.AtomicLong:7",
       "Total Rows Fetched": "java.util.concurrent.atomic.AtomicLong:890632",
       "Total Documents Deleted": "java.util.concurrent.atomic.AtomicLong:0",
       "Total Documents Skipped": "java.util.concurrent.atomic.AtomicLong:0",
    
  2. Make separate requests to http://<solr host>/solr/<core name>/<dih handler path>?wt=json which returns data in the following format:

{
  "responseHeader": {
    "status": 0,
    "QTime": 0
  },
  "initArgs": [
    "defaults",
    [
      "config",
      "data-config.xml",
      "datasource", []
    ]
  ],
  "status": "idle",
  "importResponse": "",
  "statusMessages": {
    "Total Requests made to DataSource": "7",
    "Total Rows Fetched": "890632",
    "Total Documents Processed": "107055",
    "Total Documents Skipped": "0",
    "Full Dump Started": "2018-01-16 06:24:02",
    "": "Indexing completed. Added/Updated: 107055 documents. Deleted 0 documents.",
    "Committed": "2018-01-16 06:24:07",
    "Time taken": "0:4:44.365"
  }
}
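
For option 2, the per-core response shown above could be reduced to a flat set of fields roughly as follows. This is a sketch under stated assumptions: the default handler path "dataimport", the helper names, and the output field names are all hypothetical, and the timestamp entries ("Full Dump Started", "Committed") are left out because the response does not state their time zone.

```python
import json
import urllib.request


def dih_status_url(base_url, core, handler_path="dataimport"):
    """Build http://<solr host>/solr/<core name>/<dih handler path>?wt=json."""
    return f"{base_url.rstrip('/')}/solr/{core}/{handler_path.strip('/')}?wt=json"


def parse_time_taken(text):
    """Convert an 'H:M:S.mmm' string such as '0:4:44.365' to seconds."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def parse_dih_status(body):
    """Extract the status and the numeric 'Total ...' entries from a response body."""
    doc = json.loads(body)
    messages = doc.get("statusMessages") or {}
    fields = {"status": doc.get("status", "")}
    for key, value in messages.items():
        # Only the "Total ..." entries are plain integer strings in the sample.
        if key.startswith("Total ") and str(value).isdigit():
            fields[key.lower().replace(" ", "_")] = int(value)
    if "Time taken" in messages:
        fields["time_taken_seconds"] = parse_time_taken(messages["Time taken"])
    return fields


def fetch_dih_status(base_url, core, handler_path="dataimport", timeout=5.0):
    """Request the status for one core and return the parsed fields."""
    url = dih_status_url(base_url, core, handler_path)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_dih_status(response.read().decode("utf-8"))
```

Keeping the parsing separate from the HTTP request means the parser can be exercised against a saved response without a running Solr instance.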

Use case:

Organizations that use the DataImportHandler to fetch and index data need to monitor the status of those imports.

@mkboudreau
Contributor Author

mkboudreau commented Jan 16, 2018

I have some working code that implements this feature; however, I'd like to open it up for discussion in case anyone (e.g. @ljagiello) has a strong opinion about the usefulness of this feature or how it should be implemented.

@sjwang90
Contributor

Closing this issue due to inactivity and its low priority.

If you would still like this feature, please check out our external plugins that can be used with execd to run seamlessly with Telegraf.
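
For anyone taking the external-plugin route, such a plugin would typically print its metrics in InfluxDB line protocol. The sketch below shows one way to render a solr_dih point. The function name, the "core" and "handler" tag names, and the field names are assumptions chosen for illustration, not an existing plugin's output.

```python
def _escape_key(text):
    """Escape commas, equals signs, and spaces in tag keys, tag values, and field keys."""
    return text.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def _escape_measurement(text):
    """Escape commas and spaces in the measurement name."""
    return text.replace(",", "\\,").replace(" ", "\\ ")


def _render_value(value):
    """Render one field value using line-protocol type syntax."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # the trailing "i" marks an integer field
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def format_line(measurement, tags, fields, timestamp_ns=None):
    """Build one line-protocol line; tags and fields are emitted in sorted key order."""
    if not fields:
        raise ValueError("line protocol requires at least one field")
    head = _escape_measurement(measurement)
    for key in sorted(tags):
        head += f",{_escape_key(key)}={_escape_key(str(tags[key]))}"
    body = ",".join(
        f"{_escape_key(key)}={_render_value(fields[key])}" for key in sorted(fields)
    )
    line = f"{head} {body}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


if __name__ == "__main__":
    print(format_line(
        "solr_dih",
        {"core": "main", "handler": "/dataimport"},
        {"status": "idle", "total_rows_fetched": 890632},
    ))
```

Omitting the timestamp leaves it to the receiving side to assign one at collection time.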
